Planet Musings

January 28, 2026

John Baez: Tiny Musical Intervals

Music theorists have studied many fractions of the form

2^i 3^j 5^k

that are close to 1. They’re called 5-limit commas. Especially cherished are those that have fairly small exponents—given how close they are to 1. I discussed a bunch here:

Well temperaments (part 2).

and I explained the tiniest named one, the utterly astounding ‘atom of Kirnberger’, here:

Well temperaments (part 3).

The atom of Kirnberger equals

2^161 · 3^(-84) · 5^(-12) ≈ 1.0000088728601397

Two pitches differing by this ratio sound the same to everyone except certain cleverly designed machines. But remarkably, the atom of Kirnberger shows up rather naturally in music—and it was discovered by a student of Bach! Read my article for details.
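Here is a quick way to check that value without multiplying out the huge powers (a small sketch using floating-point logarithms rather than exact integer arithmetic):

from math import log2

i, j, k = 161, -84, -12                  # the exponents of 2, 3 and 5
octaves = i + j * log2(3) + k * log2(5)  # log base 2 of the ratio
print(2**octaves)                        # about 1.0000088729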

All this made me want to systematically explore such tiny intervals. Below is a table of them. Some have names but many do not—or at least I don’t know their names. I list these numbers in decimal form and also in cents, where we take the logarithm of the number in base 2 and multiply by 1200. (I dislike this blend of base 2 and base 10, but it’s traditional in music theory.)

Most importantly, I list the monzo. This is the list of exponents: for example, the monzo of

2^i 3^j 5^k

is

[i, j, k]

In case you’re wondering, this term was named after the music theorist Joseph Monzo.

I also list the Tenney height. This is a measure of complexity: the Tenney height of

2^i 3^j 5^k

is

|i| log2(2) + |j| log2(3) + |k| log2(5)
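For example, here is a minimal check of these two definitions on the syntonic comma, whose monzo is [−4, 4, −1]; it reproduces that row of the table below:

from math import log2

i, j, k = -4, 4, -1                                    # monzo of the syntonic comma 81/80
ratio = 2**i * 3**j * 5**k                             # 1.0125
cents = 1200 * log2(ratio)                             # about 21.51
tenney = abs(i) + abs(j) * log2(3) + abs(k) * log2(5)  # about 12.7
print(f"{ratio:.10f} -> {cents:.2f} cents, Tenney height {tenney:.1f}")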

The table below purports to list only 5-limit commas that are as close to 1 as possible for a given complexity. More precisely, it should list numbers of the form 2^i 3^j 5^k that are > 1 and closer to 1 than any number of the same form with smaller Tenney height—except of course for 1 itself.

Cents Decimal Name Monzo Tenney height
498.04 1.3333333333 just perfect fourth [2, −1, 0] 3.6
386.31 1.2500000000 just major third [−2, 0, 1] 4.3
315.64 1.2000000000 just minor third [1, 1, −1] 4.9
203.91 1.1250000000 major tone [−3, 2, 0] 6.2
182.40 1.1111111111 minor tone [1, −2, 1] 6.5
111.73 1.0666666667 diatonic semitone [4, −1, −1] 7.9
70.67 1.0416666667 lesser chromatic semitone [−3, −1, 2] 9.2
21.51 1.0125000000 syntonic comma [−4, 4, −1] 12.7
19.55 1.0113580247 diaschisma [11, −4, −2] 22.0
8.11 1.0046939300 kleisma [−6, −5, 6] 27.9
1.95 1.0011291504 schisma [−15, 8, 1] 30.0
1.38 1.0007999172 unnamed? [38, −2, −15] 76.0
0.86 1.0004979343 unnamed? [1, −27, 18] 85.6
0.57 1.0003289700 unnamed? [−53, 10, 16] 106.0
0.29 1.0001689086 unnamed? [54, −37, 2] 117.3
0.23 1.0001329015 unnamed? [−17, 62, −35] 196.5
0.047 1.0000271292 unnamed? [−90, −15, 49] 227.5
0.0154 1.0000088729 atom of Kirnberger [161, −84, −12] 322.0
0.0115 1.0000066317 unnamed? [21, 290, −207] 961.3
0.00088 1.0000005104 quark of Baez [−573, 237, 85] 1146.0

You’ll see there’s a big increase in Tenney height after the schisma. This is very interesting: it suggests that the schisma is the last ‘useful’ interval. It’s useful only in that it’s the ratio of two musically important commas, the syntonic comma and the Pythagorean comma, and life in music would be simpler if these were equal. All the intervals in this table up to the schisma were discovered by musicians a long time ago, and they all have standard names! But after the schisma, interest drops off dramatically.

The atom of Kirnberger has such amazing properties that it was worth naming. The rest, maybe not. But as you can see, I’ve taken the liberty of naming the smallest interval in the table the ‘quark of Baez’. This is much smaller than all that come before. It’s in bad taste to name things after oneself—indeed this is item 25 on the crackpot index—but I hope it’s allowed as a joke.

I also hope that in the future this is considered my smallest mathematical discovery.

Here is the Python code that should generate the above information. If you’re good at programming, please review it and check it! Someone gave me a gift subscription to Claude, and it (more precisely Opus 4.5) created this code. It seems to make sense, and I’ve checked a bunch of the results, but I don’t know Python.

from math import log2

log3 = log2(3)
log5 = log2(5)

commas = []

max_exp_3 = 1200
max_exp_5 = 250

for a3 in range(-max_exp_3, max_exp_3+1):
    for a5 in range(-max_exp_5, max_exp_5+1):
        if a3 == 0 and a5 == 0:
            continue

        # Find a2 that minimizes |a2 + a3 * log2(3) + a5 * log2(5)|

        target = -(a3 * log3 + a5 * log5)
        a2 = round(target)
        
        log2_ratio = a2 + a3 * log3 + a5 * log5
        cents = abs(1200 * log2_ratio)
        
        if cents > 0.00001:  # non-trivial
            tenney = abs(a2) + abs(a3) * log3 + abs(a5) * log5
            commas.append((tenney, cents, a2, a3, a5))

# Find Pareto frontier

commas.sort(key=lambda x: x[0])  # sort by Tenney height

frontier = []
best_cents = float('inf')
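# Keep a comma only if it is closer to 1 (fewer cents) than every
# comma of smaller Tenney height seen so far.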
for c in commas:
    if c[1] < best_cents:
        best_cents = c[1]
        frontier.append(c)

# Print results 

for tenney, cents, a2, a3, a5 in frontier:
    log2_ratio = a2 + a3 * log3 + a5 * log5
    decimal = 2**log2_ratio
    if decimal < 1:
        decimal = 1/decimal
        a2, a3, a5 = -a2, -a3, -a5
    print(f"{cents:.6f} cents | {decimal:.10f} | [{a2}, {a3}, {a5}] | Tenney: {tenney:.1f}")

Gene Ward Smith

In studying this subject I discovered that tiny 5-limit intervals were studied by Gene Ward Smith, a mathematician I used to see around on sci.math and the like. I never knew he worked on microtonal music! I am sad to hear that he died from COVID-19 in January 2021.

I may just be redoing a tiny part of his work: if anyone can find details, please let me know. In his memory, I’ll conclude with this article from the Xenharmonic Wiki:

Gene Ward Smith (1947–2021) was an American mathematician, music theorist, and composer.

In mathematics, he worked in the areas of Galois theory and Moonshine theory.

In music theory, he introduced wedge products as a way of classifying regular temperaments. In this system, a temperament is specified by means of a wedgie, which may technically be identified as a point on a Grassmannian. He had long drawn attention to the relationship between equal divisions of the octave and the Riemann zeta function.[1][2][3] He early on identified and emphasized free abelian groups of finite rank and their homomorphisms, and it was from that perspective that he contributed to the creation of the regular mapping paradigm.

In the 1970s, Gene experimented with musical compositions using a device with four square-wave voices, whose tuning was very stable and accurate, being controlled by a crystal oscillator. The device in turn was controlled by HP 9800 series desktop computers, initially the HP 9830A, programmed in HP Basic, later the 9845A. Using this, he explored both just intonation with a particular emphasis on groups of transformations, and pajara.

Gene had a basic understanding of the regular mapping paradigm during this period, but it was limited in practice since he was focused on the idea that the next step from meantone should keep some familiar features, and so was interested in tempering out 64/63 in place of 81/80. He knew 7-limit 12 and 22 had tempering out 64/63 and 50/49 in common, and 12 and 27 had tempering out 64/63 and 126/125 in common, and thought these would be logical places to progress to, blending novelty with familiarity. While he never got around to working with augene, he did consider it. For pajara, he found tempering certain JI scales, the 10 and 12 note highschool scales, led to interesting (omnitetrachordal) results, and that there were also closely related symmetric (MOS) scales of size 10 and 12 for pajara; he did some work with these, particularly favoring the pentachordal decatonic scale.

Gene was among the first to consider extending the Tonnetz of Hugo Riemann beyond the 5-limit and hence into higher dimensional lattices. In three dimensions, the hexagonal lattice of 5-limit harmony extends to a lattice of type A3 ~ D3. He is also the first to write music in a number of exotic intonation systems.

Historical interest

Usenet post from 1990 by Gene Smith on homomorphisms and kernels
Usenet post from 1995 by Gene Smith on homomorphisms and kernels

See also

Microtonal music by Gene Ward Smith
Hypergenesis58 (a scale described by Gene Ward Smith)

References

[1] Rusin, Dave. “Why 12 tones per octave?”

[2] OEIS. Increasingly large peaks of the Riemann zeta function on the critical line: OEIS: A117536.

[3] OEIS. Increasingly large integrals of the Z function between zeros: OEIS: A117538.

January 27, 2026

Terence Tao: A crowdsourced repository for optimization constants?

Thomas Bloom’s Erdös problem site has become a real hotbed of activity in recent months, particularly as some of the easiest of the outstanding open problems have turned out to be amenable to various AI-assisted approaches; there is now a lively community in which human contributions, AI contributions, and hybrid contributions are presented, discussed, and in some cases approved as updates to the site.

One of the lessons I draw from this is that once a well curated database of precise mathematical problems is maintained, it becomes possible for other parties to build upon it in many ways (including both AI-based and human-based approaches), to systematically make progress on some fraction of the problems.

This makes me wonder what other mathematical databases could be created to stimulate similar activity. One candidate that came to mind is “optimization constants” – constants {C} that arise from some mathematical optimization problem of interest, for instance finding the best constant {C} for which a certain functional inequality is satisfied.

I am therefore proposing to create a crowdsourced repository for such constants, to record the best upper and lower bounds known for any given such constant, in order to help encourage efforts (whether they be by professional mathematicians, amateur mathematicians, or research groups at a tech company) to try to improve upon the state of the art.

There are of course thousands of such constants one could consider, but just to set the discussion going, I set up a very minimal, proof-of-concept GitHub repository holding over 20 constants including:

  1. {C_{1a}}, the constant in a certain autocorrelation quantity relating to Sidon sets. (This constant seems to have a surprisingly nasty optimizer; see this tweet thread of Damek Davis.)
  2. {C_{1b}}, the constant in Erdös’ minimum overlap problem.

Here, I am taking inspiration from the Erdös problem web site and arbitrarily assigning a number to each constant, for ease of reference.

Even in this minimal state I think the repository is ready to start accepting more contributions, in the form of pull requests that add new constants, or improve the known bounds on existing constants. (I am particularly interested in constants that have an extensive literature of incremental improvements in the lower and upper bounds, and which look at least somewhat amenable to computational or AI-assisted approaches.) But I would be interested to hear feedback on how to improve the repository in other ways.

Update: Paata Ivanisvili and Damek Davis have kindly agreed to help run and expand this repository.

January 26, 2026

n-Category Café: Categorifying Riemann's Functional Equation

David Jaz Myers just sent me some neat comments on this paper of mine:

and he okayed me posting them here. He’s taking the idea of categorifying the Riemann zeta function, explained in my paper, and going further, imagining what it might mean to categorify Riemann’s functional equation

\xi(s) = \xi(1-s)

where ξ is the ‘completed’ Riemann zeta function, which has an extra factor taking into account the ‘real prime’:

\xi(s) = \pi^{-s/2}\Gamma(s/2) \prod_{p \; prime} \frac{1}{1 - p^{-s}}

My paper categorified the Euler product formula that writes the Riemann zeta function as a product over the usual primes:

\zeta(s) = \prod_{p \; prime} \frac{1}{1 - p^{-s}}

I had nothing to say about the real prime.

But it’s the functional equation that sets the stage for focusing on zeroes of the Riemann zeta function with Re(s) = 1/2… and then the Riemann Hypothesis! So it’s worth thinking about.

David wrote:

Hi John,

Hope you’re doing well!

I was just thinking about your (and James Dolan’s) definition of the zeta functors associated to a finite type scheme (from here), and I had a small thought which I figured you might find interesting.

I was thinking about the functional equation of the completed zeta functions; how might we complete the zeta functors in such a way that they satisfy a similar functional equation? I don’t know, but I do have an idea for what the transformation s ↦ 1 − s might mean in this context. I claim that it is given by the reduced suspension. Let me explain.

First, I’ll want to see the formal power n^{-s} as the power

\left(\frac{1}{n}\right)^s,

which I can then categorify by finding a group G with cardinality n and considering BG^S. In the case of the Riemann zeta species, n is the cardinality of a finite semisimple ring (a product of finite fields, the groupoid of which has cardinality 1 for each n), and we can simply deloop the additive group of this ring. This gives us a Dirichlet functor ζ

S \mapsto \sum_{k \; \text{finite semisimple}} B k^S

which categorifies the Riemann zeta function when S is a finite set.

Taking this point of view on the zeta functor, we can ask the question: what is the transformation s ↦ 1 − s? Here’s where we can look at the reduced suspension ΣS_+. The universal property of the reduced suspension says that maps ΣS_+ → Y correspond to points of the homotopy type

(x : Y) \times (y : Y) \times (S_+ \to x = y)

(or, more classically, maps from the terminal morphism S_+ → 1 to Path(Y) → Y × Y). Since homotopy cardinality is multiplicative for fibrations, that type has cardinality

y \cdot y \cdot \left(\frac{1}{y}\right)^{s + 1} = \left(\frac{1}{y}\right)^{s - 1}

(when S is a finite set of cardinality s).

Taking Y = Bk for k finite semisimple of cardinality n, we see that ΣS_+ → Bk has cardinality n^{s−1} = n^{−(1−s)}. Therefore, I think the transformation s ↦ 1 − s in the functional equation may be categorified by S ↦ ΣS_+. If this makes sense, it suggests that completing the zeta functors is a form of stabilization.

Cheers,
David

And then:

As for another eyebrow wiggle about the cardinality of ΣS_+ when S is a finite set: we have that ΩΣS_+ = ⟨S⟩, the free group on S generators. This is of course infinite, but it is the group completion of the free monoid List(S) on S generators. Since

\mathsf{List}(S) = 1 + S + S^2 + S^3 + \cdots,

it has cardinality 1/(1 − s).

Maybe it’s better to use the “free delooping” (aka the weighted colimit of 1 ⇉ 1 by S ⇉ 1) instead of the reduced suspension. This doesn’t change the above argument because we’re mapping into a groupoid, but now it is true that the Euler characteristic / cardinality of that category is 1 − s.

January 25, 2026

Doug Natelson: What is superconductivity?

A friend pointed out that, while I've written many posts that have to do with superconductivity, I've never really done a concept post about it.  Here's a try, as I attempt to distract myself from so many things happening these days.

The superconducting state is a truly remarkable phase of matter that is hosted in many metals (though ironically not readily in the pure elements (Au, Ag, Cu) that are the best ordinary conductors of electricity - see here for some references).  First, some definitional/phenomenological points:

  • The superconducting state is a distinct thermodynamic phase.  In the language of phase transitions developed by Ginzburg and Landau back in the 1950s, the superconducting state has an order parameter that is nonzero, compared to the non-superconducting metal state.   When you cool down a metal and it becomes a superconductor, this really is analogous (in some ways) to when you cool down liquid water and it becomes ice, or (a better comparison) when you cool down very hot solid iron and it becomes a magnet below 770 °C.
  • In the superconducting state, at DC, current can flow with zero electrical resistance.  Experimentally, this can be checked by setting up a superconducting current loop and monitoring the current via the magnetic field it produces.  If you find that the current will decay over somewhere between \(10^5\) and \(\infty\) years, that's pretty convincing that the resistance is darn close to zero. 
  • This is not just "perfect" conduction.  If you placed a conductor in a magnetic field, turned on perfect conduction, and then tried to change the magnetic field, currents would develop that would preserve the amount of magnetic flux through the perfect conductor.  In contrast, a key signature of superconductivity is the Meissner-Ochsenfeld Effect:  if superconductivity is turned on in the presence of a (sufficiently small) magnetic field, currents will develop spontaneously at the surface of the material to exclude all magnetic flux from the bulk of the superconductor.  (That is, the magnetic field from the currents will be oppositely directed to the external field and of just the right size and distribution to give \(\mathbf{B}=0\) in the bulk of the material.)  Observation of the bulk Meissner effect is among the strongest evidence for true superconductivity, much more robust than a measurement that seems to indicate zero voltage drop.  Indeed, as a friend of mine pointed out to me, a one-phrase description of a superconductor is "a perfect diamagnet".
  • There are two main types of superconductors, uncreatively termed "Type I" and "Type II".  In Type I superconductors, an external \(\mathbf{H} = \mathbf{B}/\mu_{0}\) fails to penetrate the bulk of the material until it reaches a critical field \(H_{c}\), at which point the superconducting state is suppressed completely.  In a Type II superconductor, above some lower critical field \(H_{c,1}\) magnetic flux begins to penetrate the material in the form of vortices, each of which has a non-superconducting ("normal") core.  Above an upper critical field \(H_{c,2}\), superconductivity is suppressed. 
  • Interestingly, a lot of this can be "explained" by the London Equations (written out just after this list), which were introduced in the 1930s despite a complete lack of a viable microscopic theory of superconductivity.
  • Magnetic flux through a conventional superconducting ring (or through a vortex core) is quantized precisely in units of \(h/2e\), where \(h\) is Planck's constant and \(e\) is the electronic charge.  
  • (It's worth noting that in magnetic fields and with AC currents, there are still electrical losses in superconductors, due in part to the motion of vortices.)
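For reference, here is the standard textbook form of the London equations mentioned in the list above (with \(n_{s}\) the superfluid density and \(m\) and \(e\) the carrier mass and charge): \(\partial \mathbf{J}_{s}/\partial t = (n_{s} e^{2}/m)\mathbf{E}\) and \(\nabla \times \mathbf{J}_{s} = -(n_{s} e^{2}/m)\mathbf{B}\).  Combining the second equation with Ampère's law gives \(\nabla^{2}\mathbf{B} = \mathbf{B}/\lambda_{L}^{2}\) with \(\lambda_{L} = \sqrt{m/(\mu_{0} n_{s} e^{2})}\), which is why an applied field dies off over the London penetration depth \(\lambda_{L}\) inside the bulk, consistent with the Meissner effect described above.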
Physically, what is the superconducting state?  Why does it happen and why does it have the weird properties described above as well as others?  There are literally entire textbooks and semester-long courses on this, so what follows is very brief and non-authoritative.  
  • In an ordinary metal at low temperatures, neglecting e-e interactions and other complications, the electrons fill up states (because of the Pauli Principle) starting from the lowest energy up to some highest value, the Fermi energy.  (See here for some mention of this.)   Empty electronic states are available at essentially no energy cost - excitations of electrons from filled states to empty states are "gapless".
  • Electrical conduction takes place through the flow of these electronic quasiparticles.   (For more technical readers:  We can think of these quasiparticles like little wavepackets, and as each one propagates around the wavepacket accumulates a certain amount of phase.  The phases of different quasiparticles are arbitrary, but the change in the phase going around some trajectory is well defined.)
  • In a superconductor, there is some effective attractive interaction between electrons that we have thus far neglected.  In conventional superconductors, this involves lattice vibrations (as in this wikipedia description), though other attractive interactions are possible.  At sufficiently low temperatures, the ordinary metal state is unstable, and the system will spontaneously form pairs of electrons (or holes).  Those pairs then condense into a single coherent state described by an amplitude \(|\Psi|\) and a phase, \(\phi\), shared by all the pairs.  The conventional theory of this was formulated by Bardeen, Cooper, and Schrieffer in 1957.  A couple of nice lecture note presentations of this are here (courtesy Yuval Oreg) and here (courtesy Dan Arovas), if you want the technical details.  This leads to an energy gap that characterizes how much it costs to create individual quasiparticles.  Conduction in a superconductor takes place through the flow of pairs.  (A clue to this is the appearance of the \(2e\) in the flux quantization.)
  • This taking on of a global phase for the pairs of electrons is a spontaneous breaking of gauge symmetry - this is discussed pedagogically for physics students here.  Understanding this led to figuring out the Anderson-Higgs mechanism, btw. 
  • The result is a state with a kind of rigidity; precisely how this leads to the phenomenology of superconductivity is not immediately obvious, to me anyway.  If someone has a link to a great description of this, please put it in the comments.  (Interestingly google gemini is not too bad at discussing this.)
  • The existence of this global phase is hugely important, because it's the basis for the Josephson effect(s), which in turn has led to the basis of exquisite magnetic field sensing, all the superconducting approaches to quantum information, and the definition of the volt, etc.
  • The paired charge carriers are described by a pairing symmetry of their wave functions in real space.  In conventional BCS superconductors, each pair has no orbital angular momentum ("\(s\)-wave"), and the spins are in a singlet state.  In other superconductors, pairs can have \(l = 1\) orbital angular momentum ("\(p\)-wave", with spins in the triplet configuration), \(l = 2\) orbital angular momentum ("\(d\)-wave", with spins in a singlet again), etc.  The pairing state determines whether the energy gap is directionally uniform (\(s\)-wave) or whether there are directions ("nodes") along which the gap goes to zero.  
I have necessarily left out a ton here.  Superconductivity continues to be both technologically critical and scientifically fascinating.  One major challenge in understanding the microscopic mechanisms behind particular superconductors is that the superconducting state itself is in a sense generic - many of its properties (like phase rigidity) are emergent regardless of the underlying microscopic picture, which is amazing.

One other point, added after initial posting. In quantum computing approaches, a major challenge is how to build robust effective ("logical") qubits from individual physical qubits that are not perfect (meaning that they suffer from environmental decoherence among other issues).  The phase coherence of electronic quasiparticles in ordinary metals is generally quite fragile; inelastic interactions with each other, with phonons, with impurity spins, etc. can all lead to decoherence.  However, starting from those ingredients, superconductivity shows that it is possible to construct, spontaneously, a collective state with very long-lived coherence.  I'm certain I'm not the first to wonder about whether there are lessons to be drawn here in terms of the feasibility of and approaches to quantum error correction.

John Preskill: Quantum computing in the second quantum century

On December 10, I gave a keynote address at the Q2B 2025 Conference in Silicon Valley. This is a transcript of my remarks. The slides I presented are here. The video is here.

The first century

We are nearing the end of the International Year of Quantum Science and Technology, so designated to commemorate the 100th anniversary of the discovery of quantum mechanics in 1925. The story goes that 23-year-old Werner Heisenberg, seeking relief from severe hay fever, sailed to the remote North Sea Island of Helgoland, where a crucial insight led to his first, and notoriously obscure, paper describing the framework of quantum mechanics.

In the years following, that framework was clarified and extended by Heisenberg and others. Notable among them was Paul Dirac, who emphasized that we have a theory of almost everything that matters in everyday life. It’s the Schrödinger equation, which captures the quantum behavior of many electrons interacting electromagnetically with one another and with atomic nuclei. That describes everything in chemistry and materials science and all that is built on those foundations. But, as Dirac lamented, in general the equation is too complicated to solve for more than a few electrons.

Somehow, over 50 years passed before Richard Feynman proposed that if we want a machine to help us solve quantum problems, it should be a quantum machine, not a classical machine. The quest for such a machine, he observed, is “a wonderful problem because it doesn’t look so easy,” a statement that still rings true.

I was drawn into that quest about 30 years ago. It was an exciting time. Efficient quantum algorithms for the factoring and discrete log problems were discovered, followed rapidly by the first quantum error-correcting codes and the foundations of fault-tolerant quantum computing. By late 1996, it was firmly established that a noisy quantum computer could simulate an ideal quantum computer efficiently if the noise is not too strong or strongly correlated. Many of us were then convinced that powerful fault-tolerant quantum computers could eventually be built and operated.

Three decades later, as we enter the second century of quantum mechanics, how far have we come? Today’s quantum devices can perform some tasks beyond the reach of the most powerful existing conventional supercomputers. Error correction had for decades been a playground for theorists; now informative demonstrations are achievable on quantum platforms. And the world is investing heavily in advancing the technology further.

Current NISQ machines can perform quantum computations with thousands of two-qubit gates, enabling early explorations of highly entangled quantum matter, but still with limited commercial value. To unlock a wide variety of scientific and commercial applications, we need machines capable of performing billions or trillions of two-qubit gates. Quantum error correction is the way to get there.

I’ll highlight some notable developments over the past year—among many others I won’t have time to discuss. (1) We’re seeing intriguing quantum simulations of quantum dynamics in regimes that are arguably beyond the reach of classical simulations. (2) Atomic processors, both ion traps and neutral atoms in optical tweezers, are advancing impressively. (3) We’re acquiring a deeper appreciation of the advantages of nonlocal connectivity in fault-tolerant protocols. (4) And resource estimates for cryptanalytically relevant quantum algorithms have dropped sharply.

Quantum machines for science

A few years ago, I was not particularly excited about running applications on the quantum platforms that were then available; now I’m more interested. We have superconducting devices from IBM and Google with over 100 qubits and two-qubit error rates approaching 10^{-3}. The Quantinuum ion trap device has even better fidelity as well as higher connectivity. Neutral-atom processors have many qubits; they lag behind now in fidelity, but are improving.

Users face tradeoffs: The high connectivity and fidelity of ion traps is an advantage, but their clock speeds are orders of magnitude slower than for superconducting processors. That limits the number of times you can run a given circuit, and therefore the attainable statistical accuracy when estimating expectations of observables.
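To make that concrete with a rough illustration (the numbers here are my own illustrative assumptions, not measured specifications): estimating an expectation value to a standard error of 0.01 by shot-noise statistics takes on the order of 10^4 repetitions, which is a fraction of a second at megahertz repetition rates but tens of seconds at kilohertz rates, and that difference compounds quickly across the many circuits in a typical application.

from math import ceil

target_error = 0.01
shots = ceil(1 / target_error**2)   # about 1e4 shots, assuming 1/sqrt(N) shot-noise scaling
for rate_hz in (1e6, 1e3):          # assumed repetition rates: superconducting-like vs ion-trap-like
    print(rate_hz, shots / rate_hz) # wall-clock seconds per observable: 0.01 s vs 10 s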

Verifiable quantum advantage

Much attention has been paid to sampling from the output of random quantum circuits, because this task is provably hard classically under reasonable assumptions. The trouble is that, in the high-complexity regime where a quantum computer can reach far beyond what classical computers can do, the accuracy of the quantum computation cannot be checked efficiently. Therefore, attention is now shifting toward verifiable quantum advantage — tasks where the answer can be checked. If we solved a factoring or discrete log problem, we could easily check the quantum computer’s output with a classical computation, but we’re not yet able to run these quantum algorithms in the classically hard regime. We might settle instead for quantum verification, meaning that we check the result by comparing two quantum computations and verifying the consistency of the results.

A type of classical verification of a quantum circuit was demonstrated recently by BlueQubit on a Quantinuum processor. In this scheme, a designer builds a family of so-called “peaked” quantum circuits such that, for each such circuit and for a specific input, one output string occurs with unusually high probability. An agent with a quantum computer who knows the circuit and the right input can easily identify the preferred output string by running the circuit a few times. But the quantum circuits are cleverly designed to hide the peaked output from a classical agent — one may argue heuristically that the classical agent, who has a description of the circuit and the right input, will find it hard to predict the preferred output. Thus quantum agents, but not classical agents, can convince the circuit designer that they have reliable quantum computers. This observation provides a convenient way to benchmark quantum computers that operate in the classically hard regime.

The notion of quantum verification was explored by the Google team using Willow. One can execute a quantum circuit acting on a specified input, and then measure a specified observable in the output. By repeating the procedure sufficiently many times, one obtains an accurate estimate of the expectation value of that output observable. This value can be checked by any other sufficiently capable quantum computer that runs the same circuit. If the circuit is strategically chosen, then the output value may be very sensitive to many-qubit interference phenomena, in which case one may argue heuristically that accurate estimation of that output observable is a hard task for classical computers. These experiments, too, provide a tool for validating quantum processors in the classical hard regime. The Google team even suggests that such experiments may have practical utility for inferring molecular structure from nuclear magnetic resonance data.

Correlated fermions in two dimensions

Quantum simulations of fermionic systems are especially compelling, since electronic structure underlies chemistry and materials science. These systems can be hard to simulate in more than one dimension, particularly in parameter regimes where fermions are strongly correlated, or in other words profoundly entangled. The two-dimensional Fermi-Hubbard model is a simplified caricature of two-dimensional materials that exhibit high-temperature superconductivity and hence has been much studied in recent decades. Large-scale tensor-network simulations are reasonably successful at capturing static properties of this model, but the dynamical properties are more elusive.

Dynamics in the Fermi-Hubbard model has been simulated recently on both Quantinuum (here and here) and Google processors. Only a 6 x 6 lattice of electrons was simulated, but this is already well beyond the scope of exact classical simulation. Comparing (error-mitigated) quantum circuits with over 4000 two-qubit gates to heuristic classical tensor-network and Majorana path methods, discrepancies were noted, and the Phasecraft team argues that the quantum simulation results are more trustworthy. The Harvard group also simulated models of fermionic dynamics, but were limited to relatively low circuit depths due to atom loss. It’s encouraging that today’s quantum processors have reached this interesting two-dimensional strongly correlated regime, and with improved gate fidelity and noise mitigation we can go somewhat further, but expanding system size substantially in digital quantum simulation will require moving toward fault-tolerant implementations. We should also note that there are analog Fermi-Hubbard simulators with thousands of lattice sites, but digital simulators provide greater flexibility in the initial states we can prepare, the observables we can access, and the Hamiltonians we can reach.

When it comes to many-particle quantum simulation, a nagging question is: “Will AI eat quantum’s lunch?” There is surging interest in using classical artificial intelligence to solve quantum problems, and that seems promising. How will AI impact our quest for quantum advantage in this problem space? This question is part of a broader issue: classical methods for quantum chemistry and materials have been improving rapidly, largely because of better algorithms, not just greater processing power. But for now classical AI applied to strongly correlated matter is hampered by a paucity of training data.  Data from quantum experiments and simulations will likely enhance the power of classical AI to predict properties of new molecules and materials. The practical impact of that predictive power is hard to clearly foresee.

The need for fundamental research

Today is December 10th, the anniversary of Alfred Nobel’s death. The Nobel Prize award ceremony in Stockholm concluded about an hour ago, and the Laureates are about to sit down for a well-deserved sumptuous banquet. That’s a fitting coda to this International Year of Quantum. It’s useful to be reminded that the foundations for today’s superconducting quantum processors were established by fundamental research 40 years ago into macroscopic quantum phenomena. No doubt fundamental curiosity-driven quantum research will continue to uncover unforeseen technological opportunities in the future, just as it has in the past.

I have emphasized superconducting, ion-trap, and neutral atom processors because those are most advanced today, but it’s vital to continue to pursue alternatives that could suddenly leap forward, and to be open to new hardware modalities that are not top-of-mind at present. It is striking that programmable, gate-based quantum circuits in neutral-atom optical-tweezer arrays were first demonstrated only a few years ago, yet that platform now appears especially promising for advancing fault-tolerant quantum computing. Policy makers should take note!

The joy of nonlocal connectivity

As the fault-tolerant era dawns, we increasingly recognize the potential advantages of the nonlocal connectivity resulting from atomic movement in ion traps and tweezer arrays, compared to geometrically local two-dimensional processing in solid-state devices. Over the past few years, many contributions from both industry and academia have clarified how this connectivity can reduce the overhead of fault-tolerant protocols.

Even when using the standard surface code, the ability to implement two-qubit logical gates transversally—rather than through lattice surgery—significantly reduces the number of syndrome-measurement rounds needed for reliable decoding, thereby lowering the time overhead of fault tolerance. Moreover, the global control and flexible qubit layout in tweezer arrays increase the parallelism available to logical circuits.

Nonlocal connectivity also enables the use of quantum low-density parity-check (qLDPC) codes with higher encoding rates, reducing the number of physical qubits needed per logical qubit for a target logical error rate. These codes now have acceptably high accuracy thresholds, practical decoders, and—thanks to rapid theoretical progress this year—emerging constructions for implementing universal logical gate sets. (See for example here, here, here, here.)

A serious drawback of tweezer arrays is their comparatively slow clock speed, limited by the timescales for atom transport and qubit readout. A millisecond-scale syndrome-measurement cycle is a major disadvantage relative to microsecond-scale cycles in some solid-state platforms. Nevertheless, the reductions in logical-gate overhead afforded by atomic movement can partially compensate for this limitation, and neutral-atom arrays with thousands of physical qubits already exist.

To realize the full potential of neutral-atom processors, further improvements are needed in gate fidelity and continuous atom loading to maintain large arrays during deep circuits. Encouragingly, active efforts on both fronts are making steady progress.

Approaching cryptanalytic relevance

Another noteworthy development this year was a significant improvement in the physical qubit count required to run a cryptanalytically relevant quantum algorithm, reduced by Gidney to less than 1 million physical qubits from the 20 million Gidney and Ekerå had estimated earlier. This applies under standard assumptions: a two-qubit error rate of 10^{-3} and 2D geometrically local processing. The improvement was achieved using three main tricks. One was using approximate residue arithmetic to reduce the number of logical qubits. (This also suppresses the success probability and therefore lengthens the time to solution by a factor of a few.) Another was using a more efficient scheme to reduce the number of physical qubits for each logical qubit in cold storage. And the third was a recently formulated scheme for reducing the spacetime cost of non-Clifford gates. Further cost reductions seem possible using advanced fault-tolerant constructions, highlighting the urgency of accelerating migration from vulnerable cryptosystems to post-quantum cryptography.

Looking forward

Over the next 5 years, we anticipate dramatic progress toward scalable fault-tolerant quantum computing, and scientific insights enabled by programmable quantum devices arriving at an accelerated pace. Looking further ahead, what might the future hold? I was intrigued by a 1945 letter from John von Neumann concerning the potential applications of fast electronic computers. After delineating some possible applications, von Neumann added: “Uses which are not, or not easily, predictable now, are likely to be the most important ones … they will … constitute the most surprising extension of our present sphere of action.” Not even a genius like von Neumann could foresee the digital revolution that lay ahead. Predicting the future course of quantum technology is even more hopeless because quantum information processing entails an even larger step beyond past experience.

As we contemplate the long-term trajectory of quantum science and technology, we are hampered by our limited imaginations. But one way to loosely characterize the difference between the past and the future of quantum science is this: For the first hundred years of quantum mechanics, we achieved great success at understanding the behavior of weakly correlated many-particle systems, leading for example to transformative semiconductor and laser technologies. The grand challenge and opportunity we face in the second quantum century is acquiring comparable insight into the complex behavior of highly entangled states of many particles, behavior well beyond the scope of current theory or computation. The wonders we encounter in the second century of quantum mechanics, and their implications for human civilization, may far surpass those of the first century. So we should gratefully acknowledge the quantum pioneers of the past century, and wish good fortune to the quantum explorers of the future.

Credit: Iseult-Line Delfosse LLC, QC Ware

John Preskill: Has quantum advantage been achieved? Part 2: Considering the evidence

Welcome back to: Has quantum advantage been achieved?

In Part 1 of this mini-series on quantum advantage demonstrations, I told you about the idea of random circuit sampling (RCS) and the experimental implementations thereof. In this post, Part 2 out of 3, I will discuss the arguments and evidence for why I am convinced that the experiments demonstrate a quantum advantage.

Recall from Part 1 that to assess an experimental quantum advantage claim we need to check three criteria:

  1. Does the experiment correctly solve a computational task?
  2. Does it achieve a scalable advantage over classical computation?
  3. Does it achieve an in-practice advantage over the best classical attempt at solving the task?

What’s the issue?

When assessing these criteria for the RCS experiments there is an important problem: The early quantum computers we ran them on were very far from being reliable and the computation was significantly corrupted by noise. How should we interpret this noisy data? Or more concisely:

  1. Is random circuit sampling still classically hard even when we allow for whatever amount of noise the actual experiments had?
  2. Can we be convinced from the experimental data that this task has actually been solved?

I want to convince you today that we have developed a very good understanding of these questions that gives a solid underpinning to the advantage claim. Developing that understanding required a mix of methodologies from different areas of science, including theoretical computer science, algorithm design, and physics, and has been an exciting journey over the past years.

The noisy sampling task

Let us start by answering the base question. What computational task did the experiments actually solve?

Recall that, in the ideal RCS scenario, we are given a random circuit C on n qubits and the task is to sample from the output distribution of the state |C⟩ obtained by applying the circuit C to a simple reference state. The output probability distribution of this state is determined by the Born rule when I measure every qubit in a fixed choice of basis.

Now what does a noisy quantum computer do when I execute all the gates on it and apply them to its state? Well, it prepares a noisy version ρ_C of the intended state |C⟩ and once I measure the qubits, I obtain samples from the output distribution of that noisy state.

We should not make the task dependent on the specifics of that state or the noise that determined it, but we can define a computational task based on this observation by fixing how accurate that noisy state preparation is. The natural way to do this is to use the fidelity

F(C) = \bra C \rho_C \ket C,

which is just the overlap between the ideal state and the noisy state. The fidelity is 1 if the noisy state is equal to the ideal state, and 0 if it is perfectly orthogonal to it.

Finite-fidelity random circuit sampling
Given a typical random circuit C, sample from the output distribution of any quantum state whose fidelity with the ideal output state |C⟩ is at least δ.

Note that finite-fidelity RCS does not demand success for every circuit, but only for typical circuits from the random circuit ensemble. This matches what the experiments do: they draw random circuits and need the device to perform well on the overwhelming majority of those draws. Accordingly, when the experiments quote a single number as “fidelity”, it is really the typical (or, more precisely, circuit-averaged) fidelity that I will just call F.

The noisy experiments claim to have solved finite-fidelity RCS for values of δ around 0.1%. What is more, they consistently achieve this value even as the circuit sizes are increased in the later experiments. Both the actual value and the scaling will be important later.

What is the complexity of finite-fidelity RCS?

Quantum advantage of finite-fidelity RCS

Let’s start off by supposing that the quantum computation is (nearly) perfectly executed, so the required fidelity δ is quite large, say, 90%. In this scenario, we have very good evidence based on computational complexity theory that there is a scalable and in-practice quantum advantage for RCS. This evidence is very strong, comparable to the evidence we have for the hardness of factoring and simulating quantum systems. The intuition behind it is that quantum output probabilities are extremely hard to compute because of a mechanism behind quantum advantages: destructive interference. If you are interested in the subtleties and the open questions, take a look at our survey.

The question now is: how far down in fidelity does this classical hardness persist? Intuitively, the smaller we make δ, the easier finite-fidelity RCS should become for a classical algorithm (and a quantum computer, too), since the freedom we have in deviating from the ideal state in our simulation becomes larger and larger. This increases the possibility of finding a state that turns out to be easy to simulate within the fidelity constraint.

Somewhat surprisingly, though, finite-fidelity RCS seems to remain hard even for very small values of δ. I am not aware of any efficient classical algorithm that achieves the finite-fidelity task for δ significantly away from the baseline trivial value of 2^{-n}. This is the value a maximally mixed or randomly picked state achieves because a random state has no correlation with the ideal state (or any other state), and 2^{-n} is exactly what you expect in that case (while 0 would correspond to perfect anti-correlation).

One can save some classical runtime compared to solving near-ideal RCS by exploiting a reduced fidelity, but the costs remain exponential. To classically solve finite-fidelity RCS, the best known approaches are reported in the papers that performed classical simulations of finite-fidelity RCS with the parameters of the first Google and USTC experiment (classSim1, classSim2). To achieve this, however, they needed to approximately simulate the ideal circuits at an immense cost. To the best of my knowledge, all but those first two experiments are far out of reach for these algorithms.

Getting the scaling right: weak noise and low depth

So what is the right value of δ at which we can hope for a scalable and in-practice advantage of RCS experiments?

When thinking about this question, it is helpful to keep a model of the circuit in mind that a noisy experiment runs. So, let us consider a noisy circuit on n qubits with d layers of gates and single-qubit noise of strength ε on every qubit in every layer. In this scenario, the typical fidelity with the ideal state will decay as F ~ exp(-εnd).
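To get a feel for the numbers, here is a back-of-the-envelope sketch with made-up but representative values (these are not the parameters reported by any particular experiment):

from math import exp

n, d, eps = 53, 20, 0.006  # assumed qubit count, depth, and error rate per qubit per layer
F = exp(-eps * n * d)      # typical fidelity in this simple noise model
print(F)                   # roughly 2e-3, i.e. a fidelity of order 0.1-0.2%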

Any reasonably testable value of the fidelity needs to scale as 1/poly(n), since eventually we need to estimate the average fidelity F from the experimental samples and this typically requires at least 1/F^2 samples, so exponentially small fidelities are experimentally invisible. The polynomial fidelity δ is also much closer to the near-ideal scenario (δ ≥ 90%) than the trivial scenario (δ = 2^{-n}). While we cannot formally pin this down, the intuition behind the complexity-theoretic evidence for the hardness of near-ideal RCS persists into the δ ~ 1/poly(n) regime: to sample up to such high precision, we still need a reasonably accurate estimate of the ideal probabilities, and getting this is computationally extremely difficult. Scalable quantum advantage in this regime is therefore a pretty safe bet.

How do the parameters of the experiment and the RCS instances need to scale with the number of qubits n to experimentally achieve the fidelity regime? The limit to consider is one in which the noise rate decreases with the number of qubits, while the circuit depth is only allowed to increase very slowly. It depends on the circuit architecture, i.e., the choice of circuit connectivity, and the gate set, through a constant c_A as I will explain in more detail below.

Weak-noise and low-depth scaling
(Weak noise) The local noise rate of the quantum device scales as ε < c_A/n.
(Low depth) The circuit depth scales as d ≲ log n.

This limit is such that we have a scaling of the fidelity as F ≳ n^{-c} for some constant c. It is also a natural scaling limit for noisy devices whose error rates gradually improve through better engineering. You might be worried about the fact that the depth needs to be quite low but it turns out that there is a solid quantum advantage even for log(n)-depth circuits.

The precise definition of the weak-noise regime is motivated by the following observation. It turns out to be crucial for assessing the noisy data from the experiment.

Fidelity versus XEB: a phase transition

Remember from Part 1 that the experiments measured a quantity called the cross-entropy benchmark (XEB)

\chi = 2^{n} \mathbb E_C \mathbb E_{x} p_C(x) - 1.

The XEB averages the ideal probabilities p_C(x) corresponding to the sampled outcomes x from experiments on random circuits C. Thus, it correlates the experimental and ideal output distributions of those random circuits. You can think of it as a “classical version” of the fidelity: If the experimental distribution is correct, the XEB will essentially be 1. If it is uniformly random, the XEB is 0.
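In code, the estimator is just an average of ideal output probabilities over the observed samples. Here is a schematic sketch; ideal_probs and samples are placeholders for the ideal Born-rule probabilities and the experimentally observed bitstrings, both of which are only simultaneously available for instances small enough to simulate classically:

def xeb(ideal_probs, samples, n):
    # ideal_probs: mapping from each n-bit outcome to its ideal Born-rule probability
    # samples: list of outcomes observed on the device
    mean_p = sum(ideal_probs[x] for x in samples) / len(samples)
    return 2**n * mean_p - 1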

The experiments claimed that the XEB is a good proxy for the circuit-averaged fidelity given by F = 𝔼_C F(C), and so we need to understand when this is true. Fortunately, in the past few years, alongside the improved experiments, we have developed a very good understanding of this question (WN, Spoof2, PT1, PT2).

It turns out that the quality of correspondence between XEB and average fidelity depends strongly on the noise in the experimental quantum state. In fact, there is a sharp phase transition: there is an architecture-dependent constant c_A such that when the experimental local noise rate ε < c_A/n, then the XEB is a good and reliable proxy for the average fidelity for any system size n and circuit depth d. This is exactly the weak-noise regime. Above that threshold, in the strong noise regime, the XEB is an increasingly bad proxy for the fidelity (PT1, PT2).

Let me be more precise: In the weak-noise regime, when we consider the decay of the XEB as a function of circuit depth d, the rate of decay is given by εn, i.e., the XEB decays as exp(-εnd). Meanwhile, in the strong-noise regime the rate of decay is constant, giving an XEB decay as exp(-Cd) for a constant C. At the same time, the fidelity decays as exp(-εnd) regardless of the noise regime. Hence, in the weak-noise regime, the XEB is a good proxy of the fidelity, while in the strong noise regime, there is an exponentially increasing gap between the XEB (which remains large) and the fidelity (which continues to decay exponentially).

This is what the following plot shows. We computed it from an exact mapping of the behavior of the XEB to the dynamics of a statistical-mechanics model that can be evaluated efficiently. Using this mapping, we can also compute the noise threshold c_A for whichever random circuit family and architecture you are interested in.

From (PT2). The y-axis label Δ(ln χ) is the decay rate of the XEB χ, N = n is the number of qubits, and ε is the local noise rate.

Where are the experiments?

We are now ready to take a look at the crux when assessing the noisy data: Can we trust the reported XEB values as an estimator of the fidelity? If so, do the experiments solve finite-fidelity RCS in the solidly hard regime where δ ≥ 1/poly(n)?

In their more recent paper (PT1), the Google team explicitly verified that the experiment is well below the phase transition, and it turns out that the first experiment was just at the boundary. The USTC experiments had comparable noise rates, and the Quantinuum experiment much better ones. Since fidelity decays as exp(-εnd) while the reported XEB values stayed consistently around 0.1% as n was increased, the error rate ε of the experiments improved even faster than the 1/n scaling required for the weak-noise regime, namely, more like ε ~ 1/(nd). Altogether, the experiments are therefore in the weak-noise regime both in terms of absolute numbers relative to c_A/n and the required scaling.

Of course, to derive the transition, we made some assumptions about the noise such as that the noise is local, and that it does not depend much on the circuit itself. In the advantage experiments, these assumptions about the noise are characterized and tested. This is done through a variety of means at increasing levels of complexity, including detailed characterization of the noise in individual gates, gates run in parallel, and eventually in a larger circuit. The importance of understanding the noise shows in the fact that a significant portion of the supplementary materials of the advantage experiments is dedicated to getting this right. All of this contributes to the experimental justification for using the XEB as a proxy for the fidelity!

The data shows that the experiments solved finite-fidelity RCS for values of δ above the constant value of roughly 0.1% as the experiments grew. In the following plot, I compare the experimental fidelity values to the near-ideal scenario on the one hand, and the trivial 2^{-n} value on the other hand. Viewed at this scale, the values of δ for which the experiment solved finite-fidelity RCS are indeed vastly closer to the near-ideal value than the trivial baseline, which should boost our confidence that reproducing samples at a similar fidelity is extremely challenging.

The phase transition matters!

You might be tempted to say: “Well, but is all this really so important? Can’t I just use XEB and forget all about fidelity?”

The phase transition shows why that would change the complexity of the problem: in the strong-noise regime, XEB can stay high even when fidelity is exponentially small. And indeed, this discrepancy can be exploited by so-called spoofers for the XEB. These are efficient classical algorithms which can be used to succeed at a quantum advantage test even though they clearly do not achieve the intended advantage. These spoofers (Spoof1, Spoof2) can achieve high XEB scores comparable to those of the experiments and scaling like exp(-cd) in the circuit depth d for some constant c.

Their basic idea is to introduce strong, judiciously chosen noise at specific circuit locations that has the effect of breaking the simulation task up into smaller, much easier components, but at the same time still gives a high XEB score. In doing so, they exploit the strong-noise regime in which the XEB is a really bad proxy for the fidelity. This allows them to sample from states with exponentially low fidelity while achieving a high XEB value.

The discovery of the phase transition and the associated spoofers highlights the importance of modeling when assessing—and even formulating—the advantage claim based on noisy data.

But we can’t compute the XEB!

You might also be worried that the experiments did not actually compute the XEB in the advantage regime because to estimate it they would have needed to compute ideal probabilities—a task that is hard by definition of the advantage regime. Instead, they used a bunch of different ways to extrapolate the true XEB from XEB proxies (proxy of a proxy of the fidelity). Is this a valid way of getting an estimate of the true XEB?

It totally is! Different extrapolations—from easy-to-simulate to hard-to-simulate, from small system to large system, etc.—all gave consistent answers for the experimental XEB value of the supremacy circuits. Think of this as having several lines that cross at the same point. For that crossing to be a coincidence, something crazy, conspiratorial must happen exactly when you move to the supremacy circuits from different directions. That is why it is reasonable to trust the reported value of the XEB.

That’s exactly how experiments work!

All of this is to say that establishing that the experiments correctly solved finite-fidelity RCS and therefore show quantum advantage involved a lot of experimental characterization of the noise as well as theoretical work to understand the effects of noise on the quantity we care about—the fidelity between the experimental and ideal states.

In this respect (and maybe also in the scale of the discovery), the quantum advantage experiments are similar to the recent experiments reporting discovery of the Higgs boson and gravitational waves. While I do not claim to understand any of the details, what I do understand is that in both experiments, there is an unfathomable amount of data that could not be interpreted without preselection and post-processing of the data, theories, extrapolations and approximations that model the experiment and measurement apparatus. All of those enter the respective smoking-gun plots that show the discoveries.

If you believe in the validity of experimental physics methodology, you should therefore also believe in the type of evidence underlying experimental claim of the quantum advantage demonstrations: that they sampled from the output distribution of a quantum state with the reported fidelities.

Put succinctly: If you believe in the Higgs boson and gravitational waves, you should probably also believe in the experimental demonstration of quantum advantage.

What are the counter-arguments?

The theoretical computer scientist

“The weak-noise limit is not physical. The appropriate scaling limit is one in which the local noise rate of the device is constant while the system size grows, and in that case, there is a classical simulation algorithm for RCS (SimIQP, SimRCS).”

In theoretical computer science, scaling the runtime or the system size with the input size is considered very natural: we say an algorithm is efficient if its runtime and space usage depend only polynomially on the input size.

But all scaling arguments are hypothetical concepts, and we only care about the scaling at relevant sizes. In the end, every scaling limit is going to hit the wall of physical reality—be it the amount of energy or human lifetime that limits the time of an algorithm, or the physical resources that are required to build larger and larger computers. To keep the scaling limit going as we increase the size of our computations, we need innovation that makes the components smaller and less noisy.

At the scales relevant to RCS, the 1/n scaling of the noise is benign and even natural. Why? Well, currently, the actual noise in quantum computers is not governed by any fundamental limit, but by engineering challenges. Realizing this limit therefore amounts to engineering improvements in the system size and noise rate that are achieved over time. Sure, at some point that scaling limit is also going to hit a fundamental barrier below which the noise cannot be improved. But we are surely still far away from that limit. What is more, already now logical qubits are starting to work and achieve beyond-breakeven fidelities. So even if the engineering improvements should flatten out from here onward, QEC will keep the 1/n noise scaling going and even accelerate it in the intermediate future.

The complexity maniac

“All the hard complexity-theoretic evidence for quantum advantage is in the near-ideal regime, but now you are claiming advantage for the low-fidelity version of that task.”

This is probably the strongest counter-argument in my opinion, and I gave my best response above. Let me just add that this is a question about computational complexity. In the end, all of complexity theory is based on belief. The only real evidence we have for the hardness of any task is the absence of an efficient algorithm, or a reduction relating it to a paradigmatic, well-studied task for which no efficient algorithm is known.

I am not sure how much I would bet that you cannot find an efficient algorithm for finite-fidelity RCS in the regime of the experiments, but it is certainly a pizza.

The enthusiastic skeptic

“There is no verification test that just depends on the classical samples, is efficient and does not make any assumptions about the device. In particular, you cannot unconditionally verify fidelity just from the classical samples. Why should I believe the data?”

Yes, sure, the current advantage demonstrations are not device-independent. But the comparison you should have in mind is with Bell tests. The first proper Bell tests of Aspect and others in the 80s were not free of loopholes. They still allowed for contrived explanations of the data that did not violate local realism. Still, I can hardly believe that anyone would argue that Bell inequalities were not violated already back then.

As the years passed, these remaining loopholes were closed. To be a skeptic of the data, people needed to come up with more and more adversarial scenarios that could explain it. We are working to make the same happen with quantum advantage demonstrations: to come up with better schemes and better tests that require fewer and fewer assumptions or less knowledge about the specifics of the device.

The “this is unfair” argument

“When you choose the gates and architecture of the circuit depending on your device, you tailor the task too much to the device, and that is unfair. Not even the different RCS experiments solve exactly the same task.”

This is not really an argument against the achievement of quantum advantage but more against the particular choices of circuit ensembles in the experiments. Sure, the specific computations solved are still somewhat tailored to the hardware itself and in this sense the experiments are not hardware-independent yet, but they still solve fine computational tasks. Moving away from such hardware-tailored task specifications is another important next step and we are working on it.


In the third and last part of this mini series I will address next steps in quantum advantage that aim at closing some of the remaining loopholes. The most important—and theoretically interesting—one is to enable efficient verification of quantum advantage using less or even no specific knowledge about the device that was used, but just the measurement outcomes.

References

(survey) Hangleiter, D. & Eisert, J. Computational advantage of quantum random sampling. Rev. Mod. Phys. 95, 035001 (2023).

(classSim1) Pan, F., Chen, K. & Zhang, P. Solving the sampling problem of the Sycamore quantum circuits. Phys. Rev. Lett. 129, 090502 (2022).

(classSim2) Kalachev, G., Panteleev, P., Zhou, P. & Yung, M.-H. Classical Sampling of Random Quantum Circuits with Bounded Fidelity. arXiv:2112.15083 (2021).

(WN) Dalzell, A. M., Hunter-Jones, N. & Brandão, F. G. S. L. Random Quantum Circuits Transform Local Noise into Global White Noise. Commun. Math. Phys. 405, 78 (2024).

(PT1) Morvan, A. et al. Phase transitions in random circuit sampling. Nature 634, 328–333 (2024).

(PT2) Ware, B. et al. A sharp phase transition in linear cross-entropy benchmarking. arXiv:2305.04954 (2023).

(Spoof1) Barak, B., Chou, C.-N. & Gao, X. Spoofing Linear Cross-Entropy Benchmarking in Shallow Quantum Circuits. in 12th Innovations in Theoretical Computer Science Conference (ITCS 2021) (ed. Lee, J. R.) vol. 185 30:1-30:20 (2021).

(Spoof2) Gao, X. et al. Limitations of Linear Cross-Entropy as a Measure for Quantum Advantage. PRX Quantum 5, 010334 (2024).

(SimIQP) Bremner, M. J., Montanaro, A. & Shepherd, D. J. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum 1, 8 (2017).

(SimRCS) Aharonov, D., Gao, X., Landau, Z., Liu, Y. & Vazirani, U. A polynomial-time classical algorithm for noisy random circuit sampling. in Proceedings of the 55th Annual ACM Symposium on Theory of Computing 945–957 (2023).

John Preskill Has quantum advantage been achieved?

Recently, I gave a couple of perspective talks on quantum advantage, one at the annual retreat of the CIQC and one at a recent KITP programme. I started off by polling the audience on who believed quantum advantage had been achieved. Just this one, simple question.

The audience was mostly experimental and theoretical physicists with a few CS theory folks sprinkled in. I was sure that these audiences would be overwhelmingly convinced of the successful demonstration of quantum advantage. After all, more than half a decade has passed since the first experimental claim (G1) of “quantum supremacy” as the patron of this blog’s institute called the idea “to perform tasks with controlled quantum systems going beyond what can be achieved with ordinary digital computers” (Preskill, p. 2) back in 2012. Yes, this first experiment by the Google team may have been simulated in the meantime, but it was only the first in an impressive series of similar demonstrations that became bigger and better with every year that passed. Surely, so I thought, a significant part of my audiences would have been convinced of quantum advantage even before Google’s claim, when so-called quantum simulation experiments claimed to have performed computations that no classical computer could do (e.g. (qSim)).

I could not have been more wrong.

In both talks, less than half of the people in the audience thought that quantum advantage had been achieved.

In the discussions that ensued, I came to understand what folks criticized about the experiments that have been performed and even the concept of quantum advantage to begin with. But more on that later. Most of all, it seemed to me, the community had dismissed Google’s advantage claim because of the classical simulation shortly after. It hadn’t quite kept track of all the advances—theoretical and experimental—since then.

In a mini-series of three posts, I want to remedy this and convince you that the existing quantum computers can perform tasks that no classical computer can do. Let me caution, though, that the experiments I am going to talk about solve a (nearly) useless task. Nothing of what I say implies that you should (yet) be worried about your bank accounts.

I will start off by recapping what quantum advantage is and how it has been demonstrated in a set of experiments over the past few years.

Part 1: What is quantum advantage and what has been done?

To state the obvious: we are now fairly convinced that noiseless quantum computers would be able to solve problems efficiently that no classical computer could solve. In fact, we have been convinced of that ever since the mid-1990s, when Lloyd and Shor discovered two basic quantum algorithms: simulating quantum systems and factoring large numbers. Both of these are tasks where we are as certain as we can be that no classical computer can solve them. So why talk about quantum advantage 20 and 30 years later?

The idea of a quantum advantage demonstration—be it on a completely useless task even—emerged as a milestone for the field in the 2010s. Achieving quantum advantage would finally demonstrate that quantum computing was not just a random idea of a bunch of academics who took quantum mechanics too seriously. It would show that quantum speedups are real: We can actually build quantum devices, control their states and the noise in them, and use them to solve tasks which not even the largest classical supercomputers could do—and these are very large.

What is quantum advantage?

But what exactly do we mean by “quantum advantage”? It is a vague concept, for sure. But some essential criteria that a demonstration should satisfy are probably the following.

  1. The quantum device needs to solve a pre-specified computational task. This means that there needs to be an input to the quantum computer. Given the input, the quantum computer must then be programmed to solve the task for the given input. This may sound trivial. But it is crucial because it delineates programmable computing devices from just experiments on any odd physical system.
  2. There must be a scaling difference in the time it takes for a quantum computer to solve the task and the time it takes for a classical computer. As we make the problem or input size larger, the difference between the quantum and classical solution times should increase disproportionately, ideally exponentially.
  3. And finally: the actual task solved by the quantum computer should not be solvable by any classical machine (at the time).

Achieving this last criterion using imperfect, noisy quantum devices is the challenge the idea of quantum supremacy set for the field. After all, running any of our favourite quantum algorithms in a classically hard regime on these devices is completely out of the question. They are too small and too noisy. So the field had to come up with the smallest conceivable and most noise-robust quantum algorithm that still has a significant scaling advantage over classical computation.

Random circuits are really hard to simulate!

The idea is simple: we just run a random computation, constructed in a way that is as favorable as we can make it to the quantum device while being as hard as possible classically. This may strike you as a pretty unfair way to come up with a computational task—it is just built to be hard for classical computers without any other purpose. But: it is a fine computational task. There is an input: the description of the quantum circuit, drawn randomly. The device needs to be programmed to run this exact circuit. And there is a task: just return whatever this quantum computation would return. These are strings of 0s and 1s drawn from a certain distribution. Getting the distribution of the strings right for a given input circuit is the computational task.

This task, dubbed random circuit sampling, can be solved on a classical as well as a quantum computer, but there is a (presumably) exponential advantage for the quantum computer. More on that in Part 2.

For now, let me tell you about the experimental demonstrations of random circuit sampling. Allow me to be slightly more formal. The task solved in random circuit sampling is to produce bit strings x ∈ {0,1}^n distributed according to the Born-rule outcome distribution

p_C(x) = |⟨x|C|0⟩|^2

of a sequence of elementary quantum operations (unitary rotations of one or two qubits at a time) which is drawn randomly according to certain rules. This circuit C is applied to a reference state |0⟩ on the quantum computer and then measured, giving the string x as an outcome.
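For readers who like to see definitions in code, here is a minimal statevector sketch of the task on a toy number of qubits; the brickwork of Haar-random two-qubit gates is an illustrative stand-in, not the gate set or layout of any of the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 4, 6                      # tiny toy sizes; the real experiments use 50+ qubits

def haar_2q():
    """Haar-random two-qubit unitary via QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def apply_2q(state, u, i):
    """Apply a two-qubit unitary u to qubits (i, i+1) of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(psi, [i, i + 1], [0, 1]).reshape(4, -1)
    psi = (u @ psi).reshape([2, 2] + [2] * (n - 2))
    return np.moveaxis(psi, [0, 1], [i, i + 1]).reshape(-1)

# the "input": a randomly drawn circuit C, here a brickwork of Haar-random two-qubit gates
circuit = [(haar_2q(), i) for layer in range(depth) for i in range(layer % 2, n - 1, 2)]

state = np.zeros(2 ** n, dtype=complex); state[0] = 1.0     # reference state |0...0>
for u, i in circuit:
    state = apply_2q(state, u, i)

p = np.abs(state) ** 2                                       # Born-rule distribution p_C(x)
p /= p.sum()
samples = rng.choice(2 ** n, size=10, p=p)                   # the task: return such samples
print([format(int(x), f"0{n}b") for x in samples])
```

The printed strings are samples from p_C(x); the whole difficulty of RCS is that, for 50+ qubits, the vector `state` no longer fits into any classical memory.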

The breakthrough: classically hard programmable quantum computations in the real world

In the first quantum supremacy experiment (G1) by the Google team, the quantum computer was built from 53 superconducting qubits arranged in a 2D grid. The operations were randomly chosen simple one-qubit gates (√X, √Y, √(X+Y)) and deterministic two-qubit gates called fSim applied in the 2D pattern, and repeated a certain number of times (the depth of the circuit). The limiting factor in these experiments was the quality of the two-qubit gates and the measurements, with error probabilities around 0.6 % and 4 %, respectively.

A very similar experiment was performed by the USTC team on 56 qubits (U1) and both experiments were repeated with better fidelities (0.4 % and 1 % for two-qubit gates and measurements) and slightly larger system sizes (70 and 83 qubits, respectively) in the past two years (G2,U2).

Using a trapped-ion architecture, the Quantinuum team also demonstrated random circuit sampling on 56 qubits but with arbitrary connectivity (random regular graphs) (Q). There, the two-qubit gates were π/2-rotations around Z ⊗ Z, the single-qubit gates were uniformly random and the error rates much better (0.15 % for both two-qubit gate and measurement errors).

All the experiments ran random circuits on varying system sizes and circuit depths, and collected thousands to millions of samples from a few random circuits at a given size. To assess the quality of the samples, the now widely accepted benchmark is the linear cross-entropy benchmark (XEB), defined as

χ = 2^n 𝔼_C 𝔼_x p_C(x) - 1,

for an n-qubit circuit. The expectation over C is over the random choice of circuit, and the expectation over x is over the experimental distribution of the bit strings. In other words, to compute the XEB given a list of samples, you ‘just’ need to compute the ideal probability of obtaining each sample from the circuit C and average these probabilities over the samples.
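In code, and assuming (unrealistically, in the advantage regime) that the ideal probabilities p_C(x) are available, the estimator is just an average. Here is a minimal sketch, with a toy Porter-Thomas-like distribution standing in for a chaotic circuit's output:

```python
import numpy as np

def linear_xeb(samples, ideal_probs, n):
    """Estimate the linear XEB from measured bitstrings.

    samples: array of sampled bitstrings, encoded as integers in [0, 2**n)
    ideal_probs: array of the 2**n ideal Born-rule probabilities p_C(x)
    """
    return 2 ** n * np.mean(ideal_probs[samples]) - 1

# sanity checks on a toy Porter-Thomas-like distribution (illustrative only)
rng = np.random.default_rng(1)
n = 10
p = rng.exponential(size=2 ** n); p /= p.sum()           # mimics chaotic-circuit output probabilities

ideal_samples = rng.choice(2 ** n, size=200_000, p=p)    # sampling from the ideal distribution
uniform_samples = rng.integers(0, 2 ** n, size=200_000)  # uniformly random bitstrings
print(linear_xeb(ideal_samples, p, n))    # close to 1, up to statistical fluctuations
print(linear_xeb(uniform_samples, p, n))  # close to 0
```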

The XEB is nice because it gives 1 for ideal samples from sufficiently random circuits and 0 for uniformly random samples, and it can be estimated accurately from just a few samples. Under the right conditions, it turns out to be a good proxy for the many-body fidelity of the quantum state prepared just before the measurement.

This tells us that we should expect an XEB score of (1 - error per gate)^{# gates} ~ c^{-nd} for some noise- and architecture-dependent constant c. All of the experiments achieved a value of the XEB that was significantly (in the statistical sense) far away from 0, as you can see in the plot below. This shows that something nontrivial is going on in the experiments, because the fidelity we expect for a maximally mixed or random state is 2^{-n}, which is less than 10^{-14} % for all the experiments.
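As a sanity check on this formula, here is a back-of-the-envelope estimate; the gate counts and the single-qubit error rate below are ballpark figures I am assuming for illustration, not numbers quoted in this post:

```python
# Rough digital-error-model estimate for a Sycamore-sized circuit.
# Only the ~0.6% two-qubit and ~4% measurement errors are quoted above;
# the gate counts and single-qubit error rate are illustrative assumptions.
e1, n1 = 0.0016, 1113    # single-qubit gates (assumed ~0.16% error)
e2, n2 = 0.006, 430      # two-qubit gates (~0.6% error)
em, nm = 0.04, 53        # measurements (~4% error)

fidelity = (1 - e1) ** n1 * (1 - e2) ** n2 * (1 - em) ** nm
print(f"{fidelity:.4f}")   # on the order of 0.1-0.2%, matching the scale of the reported XEB values
```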

The complexity of simulating these experiments is roughly governed by an exponential in either the number of qubits or the maximum bipartite entanglement generated. Figure 5 of the Quantinuum paper has a nice comparison.

It is not easy to say how much leverage an XEB significantly lower than 1 gives a classical spoofer. But one can certainly use it to judiciously change the circuit a tiny bit to make it easier to simulate.

Even then, reproducing the low scores between 0.05 % and 0.2 % of the experiments is extremely hard on classical computers. To the best of my knowledge, producing samples that match the experimental XEB score has only been achieved for the first experiment from 2019 (PCZ). That simulation already exploited the relatively low XEB score to simplify the computation, but even for the slightly larger 56-qubit experiments these techniques may not be feasible to run. So to the best of my knowledge, the only one of the experiments which may actually have been simulated is the 2019 experiment by the Google team.

If there are better methods, or computers, or more willingness to spend money on simulating random circuits today, though, I would be very excited to hear about it!

Proxy of a proxy of a benchmark

Now, you may be wondering: “How do you even compute the XEB or fidelity in a quantum advantage experiment in the first place? Doesn’t it require computing outcome probabilities of the supposedly hard quantum circuits?” And that is indeed a very good question. After all, the quantum advantage of random circuit sampling is based on the hardness of computing these probabilities. This is why, to get an estimate of the XEB in the advantage regime, the experiments needed to use proxies and extrapolation from classically tractable regimes.

This will be important for Part 2 of this series, where I will discuss the evidence we have for quantum advantage, so let me give you some more detail. To extrapolate, one can just run smaller circuits of increasing sizes and extrapolate to the size in the advantage regime. Alternatively, one can run circuits with the same number of gates but with added structure that makes them classically simulatable, and extrapolate to the advantage circuits. Extrapolation is thus based on samples from experiments other than the quantum advantage experiments themselves. All of the experiments did this.
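As a cartoon of the extrapolation step (my own toy illustration with made-up numbers, not the fitting procedure of any of the papers), one could fit a per-gate fidelity to XEB values measured on classically tractable circuits and read off the prediction for the advantage circuit:

```python
import numpy as np

# XEB values measured at smaller, simulatable circuit sizes (made-up numbers)
num_gates = np.array([100, 200, 300, 400])
xeb = np.array([0.55, 0.30, 0.17, 0.09])

# assume XEB ~ f**num_gates for a per-gate fidelity f, i.e. log(XEB) is linear in the gate count
slope, intercept = np.polyfit(num_gates, np.log(xeb), 1)

print(np.exp(slope))                                # fitted per-gate fidelity, ~0.994 here
advantage_gates = 700                               # gate count of the (hypothetical) advantage circuit
print(np.exp(intercept + slope * advantage_gates))  # extrapolated XEB estimate
```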

A separate estimate of the XEB score is based on proxies. An XEB proxy uses the samples from the advantage experiments, but computes a different quantity than the XEB, one that can actually be computed and for which one can collect independent numerical and theoretical evidence that it matches the XEB in the relevant regime. For example, the Google experiments averaged outcome probabilities of modified circuits that were related to the true circuits but easier to simulate.

The Quantinuum experiment did something entirely different, which is to estimate the fidelity of the advantage experiment by inverting the circuit on the quantum computer and measuring the probability of coming back to the initial state.

All of the methods used to estimate the XEB of the quantum advantage experiments required some independent verification based on numerics on smaller sizes and induction to larger sizes, as well as theoretical arguments.

In the end, the advantage claims are thus based on a proxy of a proxy of the quantum fidelity. This is not to say that the advantage claims do not hold. In fact, I will argue in my next post that this is just the way science works. I will also tell you more about the evidence that the experiments I described here actually demonstrate quantum advantage and discuss some skeptical arguments.


Let me close this first post with a few notes.

In describing the quantum supremacy experiments, I focused on random circuit sampling which is run on programmable digital quantum computers. What I neglected to talk about is boson sampling and Gaussian boson sampling, which are run on photonic devices and have also been experimentally demonstrated. The reason for this is that I think random circuits are conceptually cleaner since they are run on processors that are in principle capable of running an arbitrary quantum computation while the photonic devices used in boson sampling are much more limited and bear more resemblance to analog simulators.

I want to continue my poll here, so feel free to write in the comments whether or not you believe that quantum advantage has been demonstrated (by these experiments) and if not, why.

References

[G1] Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).

[Preskill] Preskill, J. Quantum computing and the entanglement frontier. arXiv:1203.5813 (2012).

[qSim] Choi, J. et al. Exploring the many-body localization transition in two dimensions. Science 352, 1547–1552 (2016).

[U1] Wu, Y. et al. Strong Quantum Computational Advantage Using a Superconducting Quantum Processor. Phys. Rev. Lett. 127, 180501 (2021).

[G2] Morvan, A. et al. Phase transitions in random circuit sampling. Nature 634, 328–333 (2024).

[U2] Gao, D. et al. Establishing a New Benchmark in Quantum Computational Advantage with 105-qubit Zuchongzhi 3.0 Processor. Phys. Rev. Lett. 134, 090601 (2025).

[Q] DeCross, M. et al. Computational Power of Random Quantum Circuits in Arbitrary Geometries. Phys. Rev. X 15, 021052 (2025).

[PCZ] Pan, F., Chen, K. & Zhang, P. Solving the sampling problem of the Sycamore quantum circuits. Phys. Rev. Lett. 129, 090502 (2022).

Scott Aaronson On thugs

Those of us who tried to stop Trump from ever coming to power—and who then tried to stop his return to power—were accused of hysterics, of Trump Derangement Syndrome, when we talked about authoritarianism and the death of liberal democracy. Yet masked government agents summarily executing protesters in the street, under the orders and protection of the president, is now the reality and even the defining image of the United States—or at least the defining image of Minnesota, and the model will soon be exported nationally if it isn’t stopped right now by coast-to-coast revulsion and defiance. Let all those who denied what was happening, or who justified it, including in the comments section of this blog, hang their heads in shame forever.

People will say: but Scott, just recently you wanted Trump to overthrow the gangster regime in Venezuela! You want him, even now, to overthrow the bloodthirsty murderers in Iran! That makes you practically a Trumper yourself! How can you now turn around and condemn him?

Difficult as this might be for many to understand, my position has always been that I’m consistently against all empowered thugs everywhere on earth. If I’m against Trump’s personal thug army executing (so far) two peaceful protesters, then certainly I should be against Ayatollah Khamenei’s thug army executing 20,000 protesters, and Putin’s thug army executing however many it has. I’m also aware that, for Trump and his henchmen like Stephen Miller and Kristi Noem, Khamenei and Putin and the like are models and inspirations. No one can doubt at this point that Miller and Noem would gladly execute ten thousand or ten million peacefully protesting Americans if they expected to get away with it.

When two thugs fight each other, I favor whichever outcome will lead to fewer of the world’s people under thug rule. Or if one thug can still be defeated in an election and the other thug can be defeated only in war, then I favor electoral defeat where it’s possible and military overthrow where it isn’t.

This is a stance, I’ve learned, that will lose you friends. People will say: “I get why you’re against their thugs, but how can you also be against our thugs?” I’m writing this post in the hope that, even if people hate me, at least they won’t be confused. I’m also writing, frankly, in the hope that a few days will go by with me having discharged my moral obligation not to be silent, so then maybe I can do some science.

January 24, 2026

Tommaso Dorigo On The Illusion Of Time And The Strange Economy Of Existence

I recently listened again to Richard Feynman explaining why the flowing of time is probably an illusion. In modern physics time is just a coordinate, on the same footing as space, and the universe can be described as a four-dimensional object — a spacetime block. In that view, nothing really “flows”. All events simply are, laid out in a 4D structure. What we experience as the passage of time is tied instead to the arrow of entropy: the fact that we move through a sequence of states ordered by increasing disorder, and that memory itself is asymmetric.

read more

January 23, 2026

Matt von Hippel School Facts and Research Facts

As you grow up, teachers try to teach you how the world works. This is more difficult than it sounds, because teaching you something is a much harder goal than just telling you something. A teacher wants you to remember what you’re told. They want you to act on it, and to generalize it. And they want you to do this not just for today’s material, but to set a foundation for next year, and the next. They’re setting you up for progress through a whole school system, with its own expectations.

Because of that, not everything a teacher tells you is, itself, a fact about the world. Some things you hear from teachers are like the scaffolds on a building. They’re facts that only make sense in the context of school, support that lets you build to a point where you can learn other facts, and throw away the school facts that got you there.

Not every student uses all of that scaffolding, though. The scaffold has to be complete enough that some students can use it to go on, getting degrees in science or mathematics, and eventually becoming researchers where they use facts more deeply linked to the real world. But most students don’t become researchers. So the scaffold sits there, unused. And many people, as their lives move on, mistake the scaffold for the real world.

Here’s an example. How do you calculate something like this?

3+4\div (3-1)\times 5

From school, you might remember order of operations, or PEMDAS. First parentheses, then exponents, multiplication, division, addition, and finally subtraction. If you ran into that calculation in school, you could easily work it out.

But out of school, in the real world? Trick question, you never calculate something like that to begin with.

When I wrote this post, I had to look up how to write \div and \times. In the research world, people are far more likely to run into calculations like this:

3+5\frac{4}{3-1}

Here, it’s easier to keep track of what order you need to do things. In other situations, you might be writing a computer program (or an Excel spreadsheet formula, which is also a computer program). Then you follow that programming language’s rules for order of operations, which may or may not match PEMDAS.
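As a throwaway illustration (mine, not from the post): Python happens to agree with PEMDAS on how division and multiplication associate, but other conventions differ, for instance on how unary minus interacts with exponentiation.

```python
print(3 + 4 / (3 - 1) * 5)   # 13.0: division and multiplication go left to right, as in PEMDAS
print(-3 ** 2)               # -9: Python binds ** more tightly than unary minus
print((-3) ** 2)             # 9
```

Excel, for instance, goes the other way and evaluates =-3^2 to 9.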

PEMDAS was taught to you in school for good reason. It got you used to following rules to understand notation, and gave you tools the teachers needed to teach you other things. But it isn’t a fact about the universe. It’s a fact about school.

Once you start looking around for these “school facts”, they show up everywhere.

Are there really “three states of matter”, solid, liquid, and gas? Or four, if you add plasma? Well, sort of. There are real scientific definitions for solids, liquids, gases, and plasmas, and they play a real role in how people model big groups of atoms, “matter” in a quite specific sense. But they can’t be used to describe literally everything. If you start asking what state of matter light or spacetime is, you’ve substituted a simplification that was useful for school (“everything is one of three states of matter”) for the actual facts in the real world.

If you remember a bit further, maybe you remember there are two types of things, matter and energy? You might have even heard that matter and antimatter annihilate into energy. These are also just school facts, though. “Energy” isn’t something things are made of, it’s a property things have. Instead, your teachers were building scaffolding for understanding the difference between massive and massless particles, or between dark matter and dark energy. Each of those uses different concepts of matter and energy, and each in turn is different than the concept of matter in its states of solid, liquid, and gas. But in school, you need a consistent scaffold to learn, not a mess of different definitions for different applications. So unless you keep going past school, you don’t learn that.

Physics in school likes to work with forces, and forces do sometimes make an appearance in the real world, for example for engineers. But if you’re asking a question about fundamental physics, like “is gravity really a force?”, then you’re treating a school fact as if it was a research fact. Fundamental physics doesn’t care about forces in the same way. It uses different mathematical tools, like Lagrangians and Hamiltonians, to calculate the motion of objects in systems, and uses “force” in a pop science way to describe fundamental interactions.

If you get good enough at this, you can spot which things you learned in school were likely just scaffolding “school facts”, and which are firm enough that they may hold further. Any simple division of the world into categories is likely a school fact, one that let you do exercises on your homework but gets much more complicated when the real world gets involved. Contradictory or messy concepts are usually another sign, showing something fuzzy used to get students comfortable rather than something precise enough for professionals to use. Keep an eye out, and even if you don’t yet know the real facts, you’ll know enough to know what you’re missing.

Tommaso Dorigo RIP - Hans Jensen

Today I was saddened to hear of the passing of Hans Jensen, a physicist and former colleague in the CDF experiment at Fermilab. There is an obituary page here with nice pics and a bio if you want detail on his interesting, accomplished life. Here I thought I would remember him by pasting an excerpt of my 2016 book, "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab", where he is featured. The topic of the anecdote is the data collection for the top quark search. The date is December 1992.
--- 

read more

January 21, 2026

Tommaso Dorigo 2026 Plans

This year opened in slow motion for me, at least work-wise. I have been on parental leave since December 16, when my third and fourth sons were born within one minute of one another, but of course a workaholic can never stand completely still. In fact, even as we speak I am sitting and typing at the keyboard with my right hand only (about 3-4 letters per second), while I hold Alessandro with the left one on my lap and I move my legs rhythmically to keep him entertained.

read more

January 19, 2026

Jordan Ellenberg It’s a good old-fashioned Wisconsin booger-freezer out there!

One of my favorite weather sayings. And I ran into a neighbor on my way home from the errand I was running so I got to say it in person, always a treat. 3 degrees Fahrenheit, sunny and windy in Madison right now, really not bad at all. When I moved here I thought Wisconsin winter was going to be a fifth season, a new thing that would test and harden me, but actually there are only a few days like today which are really a different kind of cold than I grew up with on the East Coast, and I’ll be honest, I enjoy them. The cold air reboots your head. I am wearing my biggest, most shapeless sweater, and a scarf in the pattern of the flag of Schiermonnikoog.

If you liked this post you might like Tom Scocca’s weather reviews. I don’t think Tom has ever experienced a good old-fashioned Wisconsin booger-freezer, though.

(Update to this post. “A good old-fashioned something or other” isn’t both good and old-fashioned, it’s a thing characterized as “good old,” which is a semantic unit on its own, and is also old-fashioned. It means something like “good old old-fashioned Wisconsin booger-freezer,” which of course you would not say. Are there any other examples like this, where two two-word phrases which overlap combine into a three-word phrase? It’s kind of like composable morphisms in a category.)

Terence Tao Rogers’ theorem on sieving

A basic problem in sieve theory is to understand what happens when we start with the integers {{\bf Z}} (or some subinterval of the integers) and remove some congruence classes {a_i \pmod{q_i}} for various moduli {q_i}. Here we shall concern ourselves with the simple setting where we are sieving the entire integers rather than an interval, and are only removing a finite number of congruence classes {a_1 \pmod{q_1}, \ldots, a_k \pmod{q_k}}. In this case, the set of integers that remain after the sieving is periodic with period {Q = \mathrm{lcm}(q_1,\dots,q_k)}, so one can work without loss of generality in the cyclic group {{\bf Z}/Q{\bf Z}}. One can then ask: what is the density of the sieved set

\displaystyle  \{ n \in {\bf Z}/Q{\bf Z}: n \neq a_i \hbox{ mod } q_i \hbox{ for all } i=1,\ldots,k \}? \ \ \ \ \ (1)

If the {q_i} were all coprime, then it is easy to see from the Chinese remainder theorem that the density is given by the product

\displaystyle  \prod_{i=1}^k \left(1 - \frac{1}{q_i}\right).

However, when the {q_i} are not coprime, the situation is more complicated. One can use the inclusion-exclusion formula to get a complicated expression for the density, but it is not easy to work with. Sieve theory also supplies one with various useful upper and lower bounds (starting with the classical Bonferroni inequalities), but these do not give exact formulae.

In this blog post I would like to note one simple fact, due to Rogers, that one can say about this problem:

Theorem 1 (Rogers’ theorem) For fixed {q_1,\dots,q_k}, the density of the sieved set is maximized when all the {a_i} vanish. Thus,

\displaystyle  |\{ n \in {\bf Z}/Q{\bf Z}: n \neq a_i \hbox{ mod } q_i \hbox{ for all } i=1,\ldots,k \}|

\displaystyle \leq |\{ n \in {\bf Z}/Q{\bf Z}: n \neq 0 \hbox{ mod } q_i \hbox{ for all } i=1,\ldots,k \}|.

Example 2 If one sieves out {1 \pmod{2}}, {1 \pmod{3}}, and {2 \pmod{6}}, then only {0 \pmod{6}} remains, giving a density of {1/6}. On the other hand, if one sieves out {0 \pmod{2}}, {0 \pmod{3}}, and {0 \pmod{6}}, then the remaining elements are {1} and {5 \pmod{6}}, giving the larger density of {2/6}.
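A brute-force check of this example (and of the theorem for any other small choice of moduli) takes only a few lines of code; this is of course just a sanity check, not part of the proof:

```python
from math import lcm

def sieved_density(residues):
    """Count survivors of {n : n != a_i mod q_i for all i}, for residues = [(a_1, q_1), ...]."""
    Q = lcm(*(q for _, q in residues))
    survivors = [n for n in range(Q) if all(n % q != a % q for a, q in residues)]
    return len(survivors), Q

print(sieved_density([(1, 2), (1, 3), (2, 6)]))  # (1, 6): only 0 mod 6 survives, density 1/6
print(sieved_density([(0, 2), (0, 3), (0, 6)]))  # (2, 6): 1 and 5 mod 6 survive, density 2/6
```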

This theorem is somewhat obscure: its only appearance in print is in pages 242-244 of this 1966 text of Halberstam and Roth, where the authors write in a footnote that the result is “unpublished; communicated to the authors by Professor Rogers”. I have only been able to find it cited in three places in the literature: in this 1996 paper of Lewis, in this 2007 paper of Filaseta, Ford, Konyagin, Pomerance, and Yu (where they credit Tenenbaum for bringing the reference to their attention), and is also briefly mentioned in this 2008 paper of Ford. As far as I can tell, the result is not available online, which could explain why it is rarely cited (and also not known to AI tools). This became relevant recently with regards to Erdös problem 281, posed by Erdös and Graham in 1980, which was solved recently by Neel Somani through an AI query by an elegant ergodic theory argument. However, shortly after this solution was located, it was discovered by KoishiChan that Rogers’ theorem reduced this problem immediately to a very old result of Davenport and Erdös from 1936. Apparently, Rogers’ theorem was so obscure that even Erdös was unaware of it when posing the problem!

Modern readers may see some similarities between Rogers’ theorem and various rearrangement or monotonicity inequalities, suggesting that the result may be proven by some sort of “symmetrization” or “compression” method. This is indeed the case, and is basically Rogers’ original proof. We can modernize it a bit as follows. Firstly, we can abstract {{\bf Z}/Q{\bf Z}} into a finite cyclic abelian group {G}, with residue classes now becoming cosets of various subgroups of {G}. We can take complements and restate Rogers’ theorem as follows:

Theorem 3 (Rogers’ theorem, again) Let {a_1+H_1, \dots, a_k+H_k} be cosets of a finite cyclic abelian group {G}. Then

\displaystyle  |\bigcup_{j=1}^k a_j + H_j| \geq |\bigcup_{j=1}^k H_j|.

Example 4 Take {G = {\bf Z}/6{\bf Z}}, {H_1 = 2{\bf Z}/6{\bf Z}}, {H_2 = 3{\bf Z}/6{\bf Z}}, and {H_3 = 6{\bf Z}/6{\bf Z}}. Then the cosets {1 + H_1}, {1 + H_2}, and {2 + H_3} cover the residues {\{1,2,3,4,5\}}, with a cardinality of {5}; but the subgroups {H_1,H_2,H_3} cover the residues {\{0,2,3,4\}}, having the smaller cardinality of {4}.

Intuitively: “sliding” the cosets {a_i+H_i} together reduces the total amount of space that these cosets occupy. As pointed out in comments, the requirement of cyclicity is crucial; four lines in a finite affine plane already suffice to be a counterexample otherwise.

By factoring the cyclic group into p-groups, Rogers’ theorem is an immediate consequence of two observations:

Theorem 5 (Rogers’ theorem for cyclic groups of prime power order) Rogers’ theorem holds when {G = {\bf Z}/p^n {\bf Z}} for some prime power {p^n}.

Theorem 6 (Rogers’ theorem preserved under products) If Rogers’ theorem holds for two finite abelian groups {G_1, G_2} of coprime orders, then it also holds for the product {G_1 \times G_2}.

The case of cyclic groups of prime power order is trivial, because the subgroups of {G} are totally ordered. In this case {\bigcup_{j=1}^k H_j} is simply the largest of the {H_j}, which has the same size as the corresponding coset {a_j + H_j} and thus has cardinality less than or equal to that of {\bigcup_{j=1}^k a_j + H_j}.

The preservation of Rogers’ theorem under products is also routine to verify. By the coprime orders of {G_1,G_2} and standard group theoretic arguments (e.g., Goursat’s lemma, the Schur–Zassenhaus theorem, or the classification of finite abelian groups), one can see that any subgroup {H_j} of {G_1 \times G_2} splits as a direct product {H_j = H_{j,1} \times H_{j,2}} of subgroups of {G_1,G_2} respectively, so the cosets {a_j + H_j} also split as

\displaystyle  a_j + H_j = (a_{j,1} + H_{j,1}) \times (a_{j,2} + H_{j,2}).

Applying Rogers’ theorem for {G_2} to each “vertical slice” of {G_1 \times G_2} and summing, we see that

\displaystyle  |\bigcup_{j=1}^k (a_{j,1} + H_{j,1}) \times (a_{j,2} + H_{j,2})| \geq |\bigcup_{j=1}^k (a_{j,1} + H_{j,1}) \times H_{j,2}|

and then applying Rogers’ theorem for {G_1} to each “horizontal slice” of {G_1 \times G_2} and summing, we obtain

\displaystyle  |\bigcup_{j=1}^k (a_{j,1} + H_{j,1}) \times H_{j,2}| \geq |\bigcup_{j=1}^k H_{j,1} \times H_{j,2}|.

Combining the two inequalities, we obtain the claim.

Scott Aaronson I Had A Dream

Alas, the dream that I had last night was not the inspiring, MLK kind of dream, even though tomorrow happens to be the great man’s day.  No, I had the literal kind of dream, where everything seems real but then you wake up and remember only the last fragments.

In my case, those last fragments involved a gray-haired bespectacled woman, a fellow CS professor.  She and I were standing in a dimly lit university building.  And she was grabbing me by the shoulders, shaking me.

“Look, Scott,” she was saying, “we’re both computer scientists.  We were both around in the 90s.  You know as well as I do that, if someone claims to have built an AI, but it turns out they just loaded a bunch of known answers, written by humans, into a lookup table, and then they search the table when a question comes … that’s not AI.  It’s slop.  It’s garbage.”

“But…” I interjected.

“Oh of course,” she continued, “so you make the table bigger.  What do you have now?  More slop!  More garbage!  You load the entire Internet into the table.  Now you have an astronomical-sized piece of garbage!”

“I mean,” I said, “there’s an exponential blowup in the number of possible questions, which can only be handled by…”

“Of course,” she said impatiently, “I understand as well as anyone.  You train a neural net to predict a probability distribution over the next token.  In other words, you slice up and statistically recombine your giant lookup table to disguise what’s really going on.  Now what do you get?  You get the biggest piece of garbage the world has ever seen.  You get a hideous monster that’s destroying and zombifying our entire civilization … and that still understands nothing more than the original lookup table did.”

“I mean, you get a tool that hundreds of millions of people now use every day—to write code, to do literature searches…”

By this point, the professor was screaming at me, albeit with a pleading tone in her voice.  “But no one who you respect uses that garbage! Not a single one!  Go ahead and ask them: scientists, mathematicians, artists, creators…”

I use it,” I replied quietly.  “Most of my friends use it too.”

The professor stared at me with a new, wordless horror.  And that’s when I woke up.

I think I was next going to say something about how I agreed that generative AI might be taking the world down a terrible, dangerous path, but how dismissing the scientific and philosophical immensity of what’s happened, by calling it “slop,” “garbage,” etc., is a bad way to talk about the danger. If so, I suppose I’ll never know how the professor would’ve replied to that. Though, if she was just an unintegrated part of my own consciousness—or a giant lookup table that I can query on demand!—perhaps I could summon her back.

Mostly, I remember being surprised to have had a dream that was this coherent and topical. Normally my dreams just involve wandering around lost in an airport that then transforms itself into my old high school, or something.

John Baez Dante and the 3-Sphere

Apparently Dante conceived of the universe as a 3-sphere! That’s a 3-dimensional space formed by taking two solid 3-dimensional balls and completely gluing their surfaces together.

In his Divine Comedy, Dante describes the usual geocentric universe of his day. It has concentric spheres for the Moon and Sun, the various planets, and then the so-called ‘fixed stars’. Outside the sphere of fixed stars, there’s a sphere for the ‘first mover’, the Primum Mobile. Ptolemy believed in this, and so did Copernicus—and even Galileo did, at first.

But that’s not all! Outside that sphere, Dante describes 9 concentric spheres of the Empyrean, where various levels of angel live. And as we go up into the Empyrean, these spheres get smaller. They all surround a point—which is God. This is shown above in an illustration by Gustave Doré.

At the opposite extreme, at the center of the Earth, is another point — and that’s where Satan lives, surrounded by the 9 levels of Hell.

Altogether we have a 3-dimensional closed universe of the sort mathematicians call a 3-sphere! You can also think of it as the one-point compactification of 3d Euclidean space with God as the point at infinity and Satan at the farthest point from that: the origin.
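For readers who want the gluing picture in formulas (standard facts about the 3-sphere, not anything specific to Dante), the 3-sphere is

S^3 = \{ x \in \mathbb{R}^4 : |x| = 1 \},

and stereographic projection from a point N \in S^3 identifies S^3 \setminus \{N\} with \mathbb{R}^3, so that

S^3 \cong \mathbb{R}^3 \cup \{\infty\},

the one-point compactification. Cutting S^3 along an equatorial 2-sphere instead decomposes it into two solid balls glued along their boundary spheres, which is the “two solid balls” description above.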

Much later Einstein also postulated that the universe was a 3-sphere, which was kept from collapsing by the cosmological constant. This was before Hubble and others saw that the universe is expanding. General relativity also allows space to be a 3-sphere that expands with time and then recollapses in a Big Crunch, but that model doesn’t seem to fit the data very well.

Here are a couple of good references on this subject:

• Mark A. Peterson, Dante and the 3-sphere, American Journal of Physics 47 (1979), 1031–1035.

• Matthew Blair, Points and Spheres: Cosmological Innovation in Dante’s Divine Comedy, Senior Thesis, Baylor University, 2015.

Let me quote the first:

In the Paradiso Dante describes his ascent sphere by sphere through the Aristotelian universe to the Primum Mobile. Beyond this is the Empyrean, the abode of God and the angels. The conventional picture of the Empyrean seems to have been rather vague, geometrically speaking. In diagrams of the universe, for example, it was represented by the border area, outside the Primum Mobile, often richly populated with angelic beings. Dante, however, endows the Empyrean with a detailed and precise geometric structure. This structure is described in Canto 28, as if seen from the Primum Mobile, as a bright Point representing God, surrounded by nine concentric spheres representing the various angelic orders. The details which follow leave the almost inescapable impression that he conceives of these nine angelic spheres as forming one hemisphere of the entire universe and the usual Aristotelian universe up to the Primum Mobile as the other hemisphere, while he is standing more or less on the equator between them [….] Taken all together, then, his universe is a 3-sphere.

[….]

Dante himself believed he was expressing something entirely new at this juncture.

[….]

Dante’s elation with this idea—a feeling we may readily share — has traditionally left readers somewhat puzzled. That is just another way of saying that if this passage is not taken as a description of the organization of 2-spheres into a 3-sphere, then it is hard to see what the point of it is.

January 18, 2026

Jordan Ellenberg Dick Gross

Dick Gross died last month. Much has already been written, and during his lifetime too, about his immense contributions to modern number theory. So I won’t try to cover that here — don’t need to. I want to say something about him as a teacher. When I was an undergraduate, I took Math 126, Representation Theory of Finite Groups, from Dick. It was the kind of course a lot of the most ambitious math majors didn’t take, because if you’d already taken the two algebra courses, 122 and 123, you were eligible to jump to the graduate course, and why wouldn’t you do that as fast as possible? Dick Gross was why. I had enjoyed the math courses I’d taken, and did fine in them, but learning about character tables from Dick, experiencing his visible joy as we built everything up from scratch, was a mathematical experience that was completely new to me. I had understood math to be something you did because you were really good at it, and had already learned a lot about it, and thus had some possibility of achieving something. From Dick I learned that mathematics was something you did because it’s the most fun thing in the entire world, and all achievement you might attain is downstream from the fun.

Memories of Dick from the Harvard Math Department.

A reminiscence from Persiflage.

Scott Aaronson Scott A. on Scott A. on Scott A.

Scott Alexander has put up one of his greatest posts ever, a 10,000-word eulogy to Dilbert creator Scott Adams, of which I would’ve been happy to read 40,000 words more. In it, Alexander trains a microscope on Adams’ tragic flaws as a thinker and human being, but he adds:

In case it’s not obvious, I loved Scott Adams.

Partly this is because we’re too similar for me to hate him without hating myself.

And:

Adams was my teacher in a more literal way too. He published several annotated collections, books where he would present comics along with an explanation of exactly what he was doing in each place, why some things were funny and others weren’t, and how you could one day be as funny as him. Ten year old Scott devoured these … objectively my joke posts get the most likes and retweets of anything I write, and I owe much of my skill in the genre to cramming Adams’ advice into a malleable immature brain.

When I first heard the news that Scott Adams had succumbed to cancer, I posted something infinitely more trivial on my Facebook. I simply said:

Scott Adams (who reigned for decades as the #1 Scott A. of the Internet, with Alexander as #2 and me as at most #3) was a hateful asshole, a nihilist, and a crank. And yet, even when reading the obituaries that explain what an asshole, nihilist, and crank he was, I laugh whenever they quote him.

Inspired by Scott Alexander, I’d like now to try again, to say something more substantial. As Scott Alexander points out, Scott Adams’ most fundamental belief—the through-line that runs not only through Dilbert but through all his books and blog posts and podcasts—was that the world is ruled by idiots. The pointy-haired boss always wins, spouting about synergy and the true essence of leadership, and the nerdy Dilberts always lose. Trying to change minds by rational argument is a fools’ errand, as “master persuaders” and skilled hypnotists will forever run rings around you. He, Scott Adams, is cleverer than everyone else, among other things because he realizes all this—but even he is powerless to change it.

Or as Adams put it in The Dilbert Principle:

It’s useless to expect rational behavior from the people you work with, or anybody else for that matter. If you can come to peace with the fact that you’re surrounded by idiots, you’ll realize that resistance is futile, your tension will dissipate, and you can sit back and have a good laugh at the expense of others.

The thing is, if your life philosophy is that the world is ruled by idiots, and that confident charlatans will always beat earnest nerds, you’re … often going to be vindicated by events. Adams was famously vindicated back in 2015, when he predicted Trump’s victory in the 2016 election (since Trump, you see, was a “master persuader”), before any other mainstream commentator thought that Trump even had a serious chance of winning the Republican nomination.

But if you adopt this worldview, you’re also often going to be wrong—as countless of Adams’ other confident predictions were (see Scott Alexander’s post for examples), to say nothing of his scientific or moral views.

My first hint that the creator of Dilbert was not a reliable thinker, was when I learned of his smugly dismissive view of science. One of the earliest Shtetl-Optimized posts, way back in 2006, was entitled Scott A., disbeliever in Darwinism. At that time, Adams’ crypto-creationism struck me as just some bizarre, inexplicable deviation. I’m no longer confused about it: on the one hand, Scott Alexander’s eulogy shows just how much deeper the crankishness went, how Adams also gobbled medical misinformation, placed his own cockamamie ideas about gravity on par with general relativity, etc. etc. But Alexander succeeds in reconciling all this with Adams’ achievements: it’s all just consequences from the starting axiom that the world is ruled by morons, and that he, Scott Adams, is the only one clever enough to see through it all.


Is my epistemology any different? Do I not also look out on the world, and see idiots and con-men and pointy-haired bosses in every direction? Well, not everywhere. At any rate, I see far fewer of them in the hard sciences.

This seems like a good time to say something that’s been a subtext of Shtetl-Optimized for 20 years, but that Scott Alexander has inspired me to make text.

My whole worldview starts from the observation that science works. Not perfectly, of course—working in academic science for nearly 30 years, I’ve had a close-up view of the flaws—but the motor runs. On a planet full of pointy-haired bosses and imposters and frauds, science nevertheless took us in a few centuries from wretchedness and superstition to walking on the moon and knowing the age of the universe and the code of life.

This is the point where people always say: that’s all well and good, but you can’t derive ought from is, and science, for all its undoubted successes, tells us nothing about what to value or how to live our lives.

To which I reply: that’s true in a narrow sense, but it dramatically understates how far you can get from the “science works” observation.

As one example, you can infer that the people worth listening to are the people who speak and write clearly, who carefully distinguish what they know from what they don’t, who sometimes change their minds when presented with opposing views and at any rate give counterarguments—i.e., who exemplify the values that make science work. The political systems worth following are the ones that test their ideas against experience, that have built-in error-correction mechanisms, that promote people based on ability rather than loyalty—the same things that make scientific institutions work, insofar as they do work. And of course, if the scientists who study X are nearly unanimous in saying that a certain policy toward X would be terrible, then we’d better have a damned good reason to pursue the policy anyway. This still leaves a wide range of moral and political views on the table, but it rules out virtually every kind of populism, authoritarianism, and fundamentalism.

Incidentally, this principle—that one’s whole moral and philosophical worldview should grow out of the seed of science working—is why, from an early age, I’ve reacted to every kind of postmodernism as I would to venomous snakes. Whenever someone tells me that science is just another narrative, a cultural construct, a facade for elite power-seeking, etc., to me they might as well be O’Brien from 1984, in the climactic scene where he tortures Winston Smith into agreeing that 2+2=5, and that the stars are just tiny dots a few miles away if the Party says they are. Once you can believe absurdities, you can justify atrocities.

Scott Adams’ life is interesting to me in that it shows exactly how far it’s possible to get without internalizing this. Yes, you can notice that the pointy-haired boss is full of crap. You can make fun of the boss. If you’re unusually good at making fun of him, you might even become a rich, famous, celebrated cartoonist. But you’re never going to figure out any ways of doing things that are systematically better than the pointy-haired boss’s ways, or even recognize the ways that others have found. You’ll be in error far more often than in doubt. You might even die of prostate cancer earlier than necessary, because you listen to medical crackpots and rely on ivermectin, turning to radiation and other established treatments only after having lost crucial time.


Scott Adams was hardly the first great artist to have tragic moral flaws, or to cause millions of his fans to ask whether they could separate the artist from the art. But I think he provides one of the cleanest examples where the greatness and the flaws sprang from the same source: namely, overgeneralization from the correct observation that “the world is full of idiots,” in a way that leaves basically no room even for Darwin or Einstein, and so inevitably curdles over time into crankishness, bitterness, and arrogance. May we laugh at Scott Adams’ cartoons and may we learn from his errors, both of which are now permanent parts of the world’s heritage.

Terence Tao The integrated explicit analytic number theory network

Like many other areas of modern analysis, analytic number theory often relies on the convenient device of asymptotic notation to express its results. It is common to use notation such as {X = O(Y)} or {X \ll Y}, for instance, to indicate a bound of the form {|X| \leq C Y} for some unspecified constant {C}. Such implied constants {C} vary from line to line, and in most papers, one does not bother to compute them explicitly. This makes the papers easier both to write and to read (for instance, one can use asymptotic notation to conceal a large number of lower order terms from view), and also means that minor numerical errors (for instance, forgetting a factor of two in an inequality) typically have no major impact on the final results. However, the price one pays for this is that many results in analytic number theory are only true in an asymptotic sense; a typical example is Vinogradov’s theorem that every sufficiently large odd integer can be expressed as the sum of three primes. In the first few proofs of this theorem, the threshold for “sufficiently large” was not made explicit.

There is, however, a small portion of analytic number theory devoted to explicit estimates, in which all constants are made completely explicit (and many lower order terms are retained). For instance, whereas the prime number theorem asserts that the prime counting function {\pi(x)} is asymptotic to the logarithmic integral {\mathrm{Li}(x) = \int_2^x \frac{dt}{\log t}}, in this recent paper of Fiori, Kadiri, and Swidinsky the explicit estimate

\displaystyle  |\pi(x) - \mathrm{Li}(x)| \leq 9.2211 x \sqrt{\log x} \exp(- 0.8476 \sqrt{\log x} )

is proven for all {x \geq 2}.
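To give a sense of what formalizing such a bound involves (more on the project below), here is a minimal Lean sketch of the shape of the statement. The names primePi, logIntegral, and explicit_pnt are hypothetical placeholders of my own, axiomatized rather than taken from Mathlib or the PNT+ repository; only Real.sqrt, Real.log, and Real.exp are genuine Mathlib functions.

```lean
import Mathlib

-- Illustration only: treat π(x) and Li(x) as opaque real-valued functions,
-- instead of using the actual Mathlib / PNT+ definitions.
axiom primePi : ℝ → ℝ
axiom logIntegral : ℝ → ℝ

-- Hypothetical shape of a fully explicit prime number theorem, in the style
-- of the Fiori–Kadiri–Swidinsky bound quoted above (stated as an axiom, since
-- the point here is the form of the statement, not its proof).
axiom explicit_pnt :
    ∀ x : ℝ, 2 ≤ x →
      |primePi x - logIntegral x| ≤
        9.2211 * x * Real.sqrt (Real.log x) *
          Real.exp (-(0.8476 * Real.sqrt (Real.log x)))
```

In the actual project, these would of course be theorems about the genuine {\pi} and {\mathrm{Li}}, with proofs checked by Lean rather than assumed.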

Such explicit results follow broadly similar strategies of proof to their non-explicit counterparts, but require a significant amount of careful book-keeping and numerical optimization; furthermore, any given explicit analytic number theory paper is likely to rely on the numerical results obtained in previous explicit analytic number theory papers. While the authors make their best efforts to avoid errors and build some redundancy into their work, there have unfortunately been a few cases in which an explicit result stated in the published literature contained numerical errors that cast doubt on the numerical constants in several downstream applications of these papers.

Because of this propensity for error, updating any given explicit analytic number theory result to take into account computational improvements in other explicit results (such as zero-free regions for the Riemann zeta function) is not done lightly; such updates occur on the timescale of decades, and are carried out only by a small number of specialists in such careful computations. As a consequence, authors who need such explicit results are often forced to rely on papers that are a decade or more out of date, with constants that they know could in principle be improved by inserting more recent explicit inputs, but which they do not have the domain expertise to confidently update.

To me, this situation sounds like an appropriate application of modern AI and formalization tools – not to replace the most enjoyable aspects of human mathematical research, but rather to allow extremely tedious and time-consuming, but still necessary, mathematical tasks to be offloaded to semi-automated or fully automated tools.

Because of this, I (acting in my capacity as Director of Special Projects at IPAM) have just launched the integrated explicit analytic number theory network, a project partially hosted within the existing “Prime Number Theorem And More” (PNT+) formalization project. This project will consist of two components. The first is a crowdsourced formalization project to formalize a number of inter-related explicit analytic number theory results in Lean, such as the explicit prime number theorem of Fiori, Kadiri, and Swidinsky mentioned above; already some smaller results have been largely formalized, and we are making good progress (especially with the aid of modern AI-powered autoformalization tools) on several of the larger papers. The second, which will be run at IPAM with the financial and technical support of Math Inc., will be to extract from this network of formalized results an interactive “spreadsheet” of a large number of types of such estimates, with the ability to add or remove estimates from the network and have the numerical impact of these changes automatically propagate to other estimates in the network, similar to how changing one cell in a spreadsheet will automatically update other cells that depend on it. For instance, one could increase or decrease the numerical threshold to which the Riemann hypothesis is verified, and see the impact of this change on the explicit error terms in the prime number theorem; or one could “roll back” all the literature to a given date, and see which estimates on various analytic number theory expressions could still be derived from the literature available at that date. Initially, this spreadsheet will be drawn from direct adaptations of the various arguments from papers formalized within the network, but in a more ambitious second stage of the project we plan to use AI tools to modify these arguments to find more efficient relationships between the various numerical parameters than were provided in the source literature.
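To make the “spreadsheet” analogy concrete, here is a toy Python sketch of my own (not the planned IPAM/Math Inc. tool, whose design I do not know): each estimate is a node in a small dependency graph, recomputed from its inputs, so that improving one upstream constant automatically propagates downstream. The node names and formulas below are invented purely for illustration.

```python
# Toy sketch of a propagating network of explicit constants.  The dependency
# structure and the formulas below are invented for illustration; they are not
# the actual relationships between published estimates.

class Node:
    def __init__(self, name, inputs, formula):
        self.name, self.inputs, self.formula = name, inputs, formula
        self.value = None

def recompute(nodes):
    """Recompute every node in dependency order (the dict is assumed to be
    topologically ordered, as a spreadsheet engine would arrange internally)."""
    for node in nodes.values():
        node.value = node.formula(*(nodes[i].value for i in node.inputs))
    return {n.name: n.value for n in nodes.values()}

nodes = {
    # Leaf inputs: numbers one might want to vary.
    "RH_height": Node("RH_height", [], lambda: 3e12),       # RH verified to this height (made up)
    "zero_free_c": Node("zero_free_c", [], lambda: 5.56),   # zero-free-region constant (made up)
    # Downstream estimates, each depending on the nodes above (formulas made up).
    "pnt_error_c": Node("pnt_error_c", ["zero_free_c"], lambda c: 9.0 + 10.0 / c),
    "chebyshev_c": Node("chebyshev_c", ["pnt_error_c", "RH_height"],
                        lambda e, h: e * (1 + 1e12 / h)),
}

print(recompute(nodes))
nodes["zero_free_c"].formula = lambda: 6.2   # improve an upstream input...
print(recompute(nodes))                      # ...and watch the change propagate
```

A real system would presumably track formally verified inequalities rather than bare floating-point numbers, but the propagation pattern is the same spreadsheet-style idea.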

These more ambitious outcomes will likely take several months before a working proof of concept can be demonstrated; but in the near term I will be grateful for any contributions to the formalization side of the project, which is being coordinated on the PNT+ Zulip channel and on the github repository. We are using a github issues based system to coordinate the project, similar to how it was done for the Equational Theories Project. Any volunteer can select one of the outstanding formalization tasks on the Github issues page and “claim” it as a task to work on, eventually submitting a pull request (PR) to the repository when proposing a solution (or to “disclaim” the task if for whatever reason you are unable to complete it). As with other large formalization projects, an informal “blueprint” is currently under construction that breaks up the proofs of the main results of several explicit analytic number theory papers into bite-sized lemmas and sublemmas, most of which can be formalized independently without requiring broader knowledge of the arguments from the paper that the lemma was taken from. (A graphical display of the current formalization status of this blueprint can be found here. At the current time of writing, many portions of the blueprint are disconnected from each other, but as the formalization progresses, more linkages should be created.)

One minor innovation implemented in this project is to label each task by a “size” (ranging from XS (extra small) to XL (extra large)) that is a subjective assessment of the task difficulty, with the tasks near the XS side of the spectrum particularly suitable for beginners to Lean.

We are permitting AI use in completing the proof formalization tasks, though we require the AI use to be disclosed, and that the code is edited by humans to remove excessive bloat. (We expect some of the AI-generated code to be rather inelegant; but no proof of these explicit analytic number theory estimates, whether human-generated or AI-generated, is likely to be all that pretty or worth reading for its own sake, so the downsides of using AI-generated proofs here are lower than in other use cases.) We of course require all submissions to typecheck correctly in Lean through Github’s Continuous Integration (CI) system, so that any incorrect AI-generated code will be rejected. We are also cautiously experimenting with ways in which AI can also automatically or semi-automatically generate the formalized statements of lemmas and theorems, though here one has to be significantly more alert to the dangers of misformalizing an informally stated result, as this type of error cannot be automatically detected by a proof assistant.

We also welcome suggestions for additional papers or results in explicit analytic number theory to add to the network, and will have some blueprinting tasks in addition to the formalization tasks to convert such papers into a blueprinted sequence of small lemmas suitable for individual formalization.

Update: a personal log documenting the project may be found here.

Jordan EllenbergIf one is not to make a mess of life

“One must be fond of people and trust them if one is not to make a mess of life, and it is therefore essential that they should not let one down.” From E.M. Forster, “What I Believe.” Amen, amen, again and again. My mother-in-law just sent me this essay, saying it was inspiring, and it really is.

I just read Unacceptable, a really well-reported book about the college admissions fraud scandal of the late 2010s, perpetrated by Rick Singer. It’s a sad book. Sad because it shows just how vulnerable every social system is to people who do not mind lying straight to your face. And yet: a society that was proof against lying isn’t one you’d want to live in. As Forster says, you have to have some trust, even if that trust is sometimes mistaken. Otherwise what are you? Not a person among people, but an animal curled in a corner with its fists up to protect its face. (This image I realize I have just plucked from the end of William Sleator’s House of Stairs. How has every single other piece of young adult fiction that’s even vaguely dystopian been spun into muscly multi-platform IP and not House of Stairs, the greatest and most upsetting of them all?)

There are people who take the other side of the argument with Forster. People who truly enjoy lying, and who don’t mind being lied to, who indeed welcome being lied to because being lied to means you have something worth stealing. Who think trust is for chumps. Who think truth is just whatever the most skillful liars, the game’s rightful winners, can get people to repeat and eventually to believe.

I am not blind to the attraction. It is a clean, simple model and it excuses one from a lot. But it is a depraved mode of existence. This is something I am not going to be a relativist about. Please trust people to the extent that you can.

Rick Singer is out of prison and back doing college counseling, by the way.

January 17, 2026

Jordan EllenbergThe encyclopedia of triangle centers

72,000 different notions of the center of a triangle. I knew the incenter, the circumcenter, and the centroid. Those are indeed the first three. The fourth is the orthocenter. I guess, when I think about it, I knew the three altitudes were concurrent so I knew this existed. The fifth is the nine-point center, which I’m pretty sure I’ve never heard of. It’s the midpoint between the circumcenter and the orthocenter, though. There are 70,000 more or so. I am happy this exists.
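A quick numerical check of that last fact, in case you want to see it with coordinates (this sketch is mine, not from the post): compute the circumcenter, the orthocenter, and the nine-point center (as the circumcenter of the medial triangle), and confirm the midpoint relation.

```python
# Check numerically that the nine-point center (circumcenter of the medial
# triangle) is the midpoint of the circumcenter and the orthocenter.
import numpy as np

def circumcenter(A, B, C):
    """Intersection of the perpendicular bisectors of AB and AC."""
    M = 2 * np.array([B - A, C - A], dtype=float)
    rhs = np.array([B @ B - A @ A, C @ C - A @ A], dtype=float)
    return np.linalg.solve(M, rhs)

def orthocenter(A, B, C):
    """Intersection of the altitudes from A and from B."""
    M = np.array([C - B, C - A], dtype=float)
    rhs = np.array([(C - B) @ A, (C - A) @ B], dtype=float)
    return np.linalg.solve(M, rhs)

A, B, C = np.array([0.0, 0.0]), np.array([4.0, 1.0]), np.array([1.0, 3.0])
O, H = circumcenter(A, B, C), orthocenter(A, B, C)
N = circumcenter((A + B) / 2, (B + C) / 2, (C + A) / 2)  # nine-point center
print(np.allclose(N, (O + H) / 2))  # expect True
```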

January 16, 2026

Matt von HippelA Paper With a Bluesky Account

People make social media accounts for their pets. Why not a scientific paper?

Anthropologist Ed Hagen made a Bluesky account for his recent preprint, “Menopause averted a midlife energetic crisis with help from older children and parents: A simulation study.” The paper’s topic itself is interesting (menopause is surprisingly rare among mammals, he has a plausible account as to why), but not really the kind of thing I cover here.

Rather, it’s his motivation that’s interesting. Hagen didn’t make the account out of pure self-promotion or vanity. Instead, he’s promoting it as a novel approach to scientific publishing. Unlike Twitter, Bluesky is based on an open, decentralized protocol. Anyone can host an account compatible with Bluesky on their own computer, and anyone with the programming know-how can build a computer program that reads Bluesky posts. That means that nothing actually depends on Bluesky, in principle: the users have ultimate control.

Hagen’s idea, then, is that this could be a way to fulfill the role of scientific journals without channeling money and power to for-profit publishers. If each paper is hosted on a scientist’s own site, the papers can link to one another by following each other’s accounts. Scientists on Bluesky can follow or like the paper, or comment on and discuss it, creating a way to measure interest from the scientific community and to aggregate reviews, two things journals are supposed to provide.

I must admit, I’m skeptical. The interface really seems poorly suited for this. Hagen’s paper’s account is called @menopause-preprint.edhagen.net. What happens when he publishes another paper on menopause? What will he call it? How is he planning to keep track of interactions from other scientists with a separate account for every single paper? Won’t swapping between fifteen Bluesky accounts every morning get tedious? Or will he just do this with papers he wants to promote?

I applaud the general idea. Decentralized hosting seems like a great way to get around some of the problems of academic publishing. But this will definitely take a lot more work, if it’s ever going to be viable on a useful scale.

Still, I’ll keep an eye on it, and see if others give it a try. Stranger things have happened.

January 15, 2026

Scott Aaronson FREEDOM (while hoping my friends stay safe)

This deserves to become one of the iconic images of human history, alongside the Tank Man of Tiananmen Square and so forth.

Here’s Sharifi Zarchi, a computer engineering professor at Sharif University in Tehran, posting on Twitter/X: “Ali Khamenei is not my leader.”

Do you understand the balls of steel this takes? If Professor Zarchi can do this—if hundreds of thousands of young Iranians can take to the streets even while the IRGC and the Basij fire live rounds at them—then I can certainly handle people yelling at me on this blog!

I’m in awe of the Iranian people’s courage, and hope I’d have similar courage in their shoes.

I was also enraged this week at the failure of much of the rest of the world to help, to express solidarity, or even to pay much attention to the Iranian people’s plight (though maybe that’s finally changing this weekend).

I’ve actually been working on a CS project with a student in Tehran. Because of the Internet blackout, I haven’t heard from him in days. I pray that he’s safe. I pray that all my friends and colleagues in Iran, and their family members, stay safe and stay strong.

If any Iranian Shtetl-Optimized reader manages to get onto the Internet, and would like to share an update—anonymously if desired, of course—we’d all be obliged.

May the Iranian people be free from tyranny soon.

Update: I’m sick with fear for my many colleagues and friends in Iran and their families. I hope they’re still alive; because of the communications blackout, I have no idea. Perhaps 12,000 have already been machine-gunned in the streets while the unjust world, the hypocrites and cowards who marched against a tiny democracy for defending itself—they invent excuses or explicitly defend the murderous regime in Tehran. WTF is the US waiting for? Trump’s “red line” was crossed days ago. May we give the Ayatollah the martyrdom he preaches, and liberate his millions of captives.

Doug NatelsonWhat is the Kondo effect?

The Kondo effect is a neat piece of physics, an archetype of a problem involving strong electronic correlations and entanglement, with a long and interesting history and connections to bulk materials, nanostructures, and important open problems.  

First, some stage setting.  In the late 19th century, with the development of statistical physics and the kinetic theory of gases, and the subsequent discovery of electrons by JJ Thomson, it was a natural idea to try modeling the electrons in solids as a gas, as done by Paul Drude in 1900.  Being classical, the Drude model misses a lot (If all solids contain electrons, why aren't all solids metals?  Why is the specific heat of metals orders of magnitude lower than what a classical electron gas would imply?), but it does introduce the idea of electrons as having an elastic mean free path, a typical distance traveled before scattering off something (an impurity? a defect?) into a random direction.  In the Drude picture, as \(T \rightarrow 0\), the only thing left to scatter charge carriers is disorder ("dirt"), and the resistivity of a conductor falls monotonically and approaches \(\rho_{0}\), the "residual resistivity", a constant set in part by the number of defects or impurities in the material.  In the semiclassical Sommerfeld model, and then later in the nearly free electron model, this idea survives.

[Figure: resistivity growing at low \(T\) for gold with iron impurities.]
One small problem:  in the 1930s (once it was much easier to cool materials down to very low temperatures), it was noticed that in many experiments (here and here, for example) the electrical resistivity of metals did not seem to fall and then saturate at some \(\rho_{0}\).  Instead, as \(T \rightarrow 0\), \(\rho(T)\) would go through a minimum and then start increasing again, approximately like \(\delta \rho(T) \propto - \ln(T/T_{0})\), where \(T_{0}\) is some characteristic temperature scale.  This is weird and problematic, especially since the logarithm formally diverges as \(T \rightarrow 0\).   
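A toy numerical sketch (mine, with made-up coefficients, not data from the experiments linked above) shows how this plays out: add a residual term, a phonon-like \(T^{5}\) term (the standard low-temperature phonon-scattering behavior, my addition to the picture), and a \(-\ln T\) Kondo term, and the total resistivity develops a minimum before the logarithm takes over at low \(T\).

```python
# Toy sketch: a resistivity model with a residual term, a phonon term ~ T^5,
# and a Kondo term ~ -ln(T), just to show why rho(T) develops a minimum.
# All coefficients are made up for illustration.
import numpy as np

rho0 = 1.0      # residual resistivity from static disorder (arbitrary units)
A = 1e-9        # phonon coefficient (low-temperature ~T^5 regime)
B = 0.05        # Kondo coefficient, set by the magnetic-impurity concentration
T0 = 300.0      # reference temperature scale

def rho(T):
    return rho0 + A * T**5 - B * np.log(T / T0)

T = np.linspace(1.0, 60.0, 2000)
T_min = T[np.argmin(rho(T))]
print(f"resistivity minimum near T = {T_min:.1f} (arbitrary units)")
```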

Over time, it became clear that this phenomenon was associated with magnetic impurities, atoms that have unpaired electrons typically in \(d\) orbitals, implying that somehow the spin of the electrons was playing an important role in the scattering process.  In 1964, Jun Kondo performed the definitive perturbative treatment of this problem, getting the \(\ln T\) divergence.  

[Side note: many students learning physics are at least initially deeply uncomfortable with the idea of approximations (that many problems can't be solved analytically and exactly, so we need to take limiting cases and make controlled approximations, like series expansions).  What if a series somehow doesn't converge?  This is that situation.]

The Kondo problem is a particular example of a "quantum impurity problem", and it is a particular limiting case of the Anderson impurity model.  Physically, what is going on here?  A conduction electron from the host metal could sit on the impurity atom, matching up with the unpaired impurity electron.  However (much as we can often get away with ignoring it) like charges repel, and it is energetically very expensive (modeled by some "on-site" repulsive energy \(U\)) to do that.  Parking that conduction electron long-term is not allowed, but a virtual process can take place, whereby a conduction electron with spin opposite to the localized moment can (in a sense) pop on there and back off, or swap places with the localized electron.  The Pauli principle enforces this opposed spin restriction, leading to entanglement between the local electron and the conduction electron as they form a singlet.  Moreover, this process generally involves conduction electrons at the Fermi surface of the metal, so it is a strongly interacting many-body problem.  As the temperature is reduced, this process becomes increasingly important, so that the impurity's scattering cross section of conduction electrons grows as \(T\) falls, causing the resistivity increase.  
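For readers who want the symbols behind that word-picture, one standard way of writing the single-impurity Anderson model (conduction electrons \(c_{k\sigma}\), a localized level \(d_{\sigma}\) at energy \(\epsilon_{d}\), on-site repulsion \(U\), and hybridization \(V_{k}\)) is

\( H = \sum_{k\sigma} \epsilon_{k} c^{\dagger}_{k\sigma} c_{k\sigma} + \epsilon_{d} \sum_{\sigma} d^{\dagger}_{\sigma} d_{\sigma} + U\, n_{d\uparrow} n_{d\downarrow} + \sum_{k\sigma} \left( V_{k} c^{\dagger}_{k\sigma} d_{\sigma} + V_{k}^{*} d^{\dagger}_{\sigma} c_{k\sigma} \right), \)

and in the local-moment limit a Schrieffer–Wolff transformation reduces the virtual hops described above to an antiferromagnetic exchange \(J\, \mathbf{S} \cdot \mathbf{s}(0)\) between the impurity spin and the conduction-electron spin density at the impurity site, which is the Kondo model proper.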

[Figure. Top: cartoon of the Kondo scattering process. Bottom: the ground state is a many-body singlet between the local moment and the conduction electrons.]

The eventual \(T = 0\) ground state of this system is a many-body singlet, with the localized spin entangled with a "Kondo cloud" of conduction electrons.  The roughly \(\ln T\) resistivity correction rolls over and saturates.   There ends up being a sharp peak (resonance) in the electronic density of states right at the Fermi energy.  Interestingly, this problem actually can be solved exactly and analytically (!), as was done by Natan Andrei in this paper in 1980 and reviewed here.  

This might seem to be the end of the story, but the Kondo problem has a long reach!  With the development of the scanning tunneling microscope, it became possible to see Kondo resonances associated with individual magnetic impurities (see here).  In semiconductor quantum dot devices, if the little dot has an odd number of electrons, then it can form a Kondo resonance that spans from the source electrode through the dot and into the drain electrode.  This leads to a peak in the conductance that grows and saturates as \(T \rightarrow 0\) because it involves forward scattering.  (See here and here).  The same can happen in single-molecule transistors (see here, here, here, and a review here).  Zero-bias peaks in the conductance from Kondo-ish physics can be a confounding effect when looking for other physics.

Of course, one can also have a material where there isn't a small sprinkling of magnetic impurities, but a regular lattice of spin-hosting atoms as well as conduction electrons.  This can lead to heavy fermion systems, or Kondo insulators, and more exotic situations.   

The depth of physics that can come out of such simple ingredients is one reason why the physics of materials is so interesting.  

January 12, 2026

Jordan EllenbergBruhat intervals that are large hypercubes

New paper up on the arXiv! With Nicolas Libedinsky, David Plaza, José Simental, Geordie Williamson, and (unofficially) Adam Zsolt Wagner. A bunch of algebraic combinatorists! And this is a combinatorics paper. It came about this way: I went to visit the ICERM semester on categorification and computation in algebraic combinatorics, with the idea of talking to people about problems where machine-aided example-making might produce some interesting results. I had a lot of conversations and we tried a lot of things and a lot of things didn’t work, but one thing did work, and this is it!

What the problem is about, briefly. The d-invariant of a pair of permutations in S_n is a coefficient of a Kazhdan-Lusztig polynomial, and you don’t need to know anything about what that is in order to read the rest of this post; just that it’s an integer which can be computed by a reasonably simple recursion, but this recursion doesn’t tell you much about how big it can be. In particular, people were wondering: what is the maximum value of d(x,y) for x,y in S_n? It’s very easy to get n-1; a clever construction by some of my co-authors got that up to 2n + a constant. Could one do better? This did seem a promising problem for machine learning: a problem where the search space (n! squared) is much too large to search exhaustively, the “score” you’re trying to maximize can be computed swiftly and reliably, and perhaps most important, we really had no well-grounded ideas about where to search for good examples. Beating human performance is way easier when humans haven’t done much performing!

We give a pretty thorough narrative account of our experimental iteration in section 4 of the paper, so let me cut to the denouement and say: we had very good luck with AlphaEvolve, an evolutionary coding agent made by Google DeepMind. AlphaEvolve is the successor to Funsearch, which I have used once or twice. (Hey! I forgot to blog about that last paper! Well, I guess at least I did blog about a talk I gave that drew a lot from it.) Anyway: AlphaEvolve, like Funsearch, uses genetic algorithms with LLMs (now some flavor of Gemini) as the mutation/reproduction mechanism. But it has gotten a lot easier to use and more flexible, and I’m optimistic that as with FunSearch it will be available for general use soon. And open-source imitations are already available.

What it produced, in an overnight run, is a program that, given n, outputs a pair of permutations in S_n. For each n from 10 to 50 (which are the values we tested on) these looked to have d-invariant substantially bigger than the biggest we already knew about. But it was much better than that. Close examination of the permutations showed that they had a natural 2-adic structure, especially for n a power of 2. Namely: if n = 2^m and you think of S_n as the permutations of the length m strings of bits instead of the integers 1 through n, then the two permutations x_m and y_m that the program produced were “reverse the digits” and “reverse the digits and then apply NOT to each digit, swapping 0 and 1.” In particular, both x_m and y_m are involutions. (Note that the program didn’t SAY it was reversing the digits. The program was pretty long and you have to ignore large chunks of code that don’t do anything and figure out how to express what the point of the program is in meaningful terms. Geordie calls this process “decompiling,” which I like a lot.)

It gets better still! The permutations in S_n come with a partial ordering called the Bruhat order, and the d-invariant of x and y is a property of the Bruhat interval [x,y], which is just the set of all those permutations z such that x <= z <= y. If I give you two random permutations, the Bruhat interval between them can be pretty hard to describe. But in the case of these two special involutions x_m and y_m in S_{2^m}, no! The permutations in the interval are exactly cut out by a nice combinatorial criterion. Namely: there are m+1 ways to break the 2^m x 2^m box into 2^m equal rectangles of area 2^m, and we say a permutation is dyadically well-distributed if its permutation matrix has exactly one 1 in each of these (m+1)2^m boxes. Look, here’s a picture of two dwd permutations in S_16, one in red and one in blue:

(And now you see the origin of the Dyadiku game from last week: a Dyadiku is just a list of 2^m dwd permutations, no two of whose permutation matrices have a 1 in common!) (I also see that there’s a typo in the paper, we have two “bocks” and a “blocks!” Despite the traditional error-correcting code of best-2-out-of-3, “blocks” is what we actually meant.)

These permutations are actually not completely novel objects; the permutation matrices, as subsets of the plane, are what people in the field of low-discrepancy sequences call (0,m,2)-nets. But I don’t think any connection with algebraic combinatorics was anticipated! (And no, I don’t think AlphaEvolve was drawing information about (0,m,2)-nets from published literature; at least, nothing in the logs indicates this. But even if it were, that would be a slick move!)

Here’s another way I like to think of the dwd condition. Suppose I put a bunch of points (x,y) in Z_2^2 which obey the law that, for any two distinct points (x_1, y_1), (x_2, y_2), we have (where | | denotes the 2-adic absolute value)

|x_1 – x_2| |y_1 – y_2| > 1/2^m.

I think of this as a kind of “sphere-packing problem” even though this particular function on pairs of points isn’t a distance. Anyway, you can get at most 2^m points packed in this way, and the maximal configurations, projected to (Z/2^mZ)^2, are exactly the dwd permutation matrices.

It’s not too hard to compute that there are exactly 2^(m 2^(m-1)) dwd permutations of 2^m letters. That’s kind of a lot, considering that log_2 |S_{2^m}| is already only around m 2^m. We also note that the special permutations x_m and y_m are both dwd (that’s them in the picture)!
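Here’s a quick Python sketch (mine, not code from the paper or from the AlphaEvolve run) that spells out the dwd check straight from the definition above and confirms, for small m, that the digit-reversal permutation x_m and its complemented cousin y_m are both dwd.

```python
# Toy verification: check that the bit-reversal permutation x_m and the
# "reverse then complement" permutation y_m are dyadically well-distributed
# (dwd) in S_{2^m}, in the sense described above.

def is_dwd(sigma, m):
    """sigma is a permutation of range(2**m), given as a list."""
    n = 2**m
    for k in range(m + 1):          # the m+1 ways to cut the n x n box
        seen = set()
        for i in range(n):
            cell = (i // 2**k, sigma[i] // 2**(m - k))  # which 2^k x 2^(m-k) block
            if cell in seen:         # two 1's in the same block: not dwd
                return False
            seen.add(cell)
    return True                      # n points, n distinct blocks => one 1 in each

def reverse_bits(i, m):
    return int(format(i, f"0{m}b")[::-1], 2)

for m in range(1, 6):
    n = 2**m
    x = [reverse_bits(i, m) for i in range(n)]            # "reverse the digits"
    y = [(n - 1) ^ reverse_bits(i, m) for i in range(n)]  # reverse, then NOT each digit
    print(m, is_dwd(x, m), is_dwd(y, m))                  # expect True True for every m
```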

Theorem: The Bruhat interval [x_m, y_m] consists exactly of the dwd permutations.

I don’t know any other non-trivial Bruhat interval that has a nice combinatorial description like this! And what’s more, the Bruhat order within this interval is easy to describe:

Theorem: [x_m, y_m] is isomorphic as a poset to a hypercube of dimension m 2^{m-1}.

(The hypercube poset of dimension N is just length-N bit strings where a ≤ b if a_i ≤ b_i for all i. In the paper you can find a nice, explicit, pretty canonical identification of [x_m, y_m] with bit strings.)

My sense is that a hypercube this big in S_n is a wholly unexpected structure, and it is relevant to lots of popular questions about cluster varieties, etc.

This is certainly my favorite outcome from the machine learning experiments I’ve been involved with. It’s one thing to incrementally improve a lower bound for a combinatorial problem, quite another to encounter a mathematical object that I authentically consider worth thinking about. In Shape, back in 2020, I wrote: “Some people imagine a world where computers give us all the answers. I dream bigger. I want them to ask good questions.” There is a medium-sized dream I didn’t mention there: that machines will bring to our attention objects worth asking questions about!

I said it at the top of the post, and I’ll say it again, to be clear: Machine learning experiments don’t usually work this well. What you hear about from me and others are the outlyingly successful trials, which makes sense — my goal is not to perform a dispassionate audit of an evolutionary coding agent, it’s to make interesting math and tell you about it. But I do want to avoid giving the impression that it’s magic. On the contrary, it feels less like magic the more you work with it.

(Other accounts of this paper: Slides from a talk by Geordie. Article on ICERM newsletter by Nicolas.)

January 11, 2026

Tommaso DorigoLetter To A Demanding PhD Supervisor

A fundamental component of my research work is the close collaboration with a large number of scientists from all around the world. This is the result of the very large scale of the experiments that are necessary to investigate the structure of matter at the smallest distance scales: building and operating those machines to collect the data and analyze it requires scientists to team up in large numbers - and this builds connections, cooperation, and long-time acquaintance; and in some cases, friendship.

read more

January 10, 2026

John BaezSylvester and Clifford on Curved Space

Einstein realized that gravity is due to the curvature of spacetime, but let’s go back earlier:

On the 18th of August 1869, the eminent mathematician Sylvester gave a speech arguing that geometry is not separate from physics. He later published this speech in the journal Nature, and added a footnote raising the possibility that space is curved:

the laws of motion accepted as fact, suffice to prove in a general way that the space we live in is a flat or level space […], our existence therein being assimilable to the life of the bookworm in a flat page; but what if the page should be undergoing a process of gradual bending into a curved form?

Then, even more dramatically, he announced that the mathematician Clifford had been studying this!

Mr. W. K. Clifford has indulged in more remarkable speculations as the possibility of our being able to infer, from certain unexplained phenomena of light and magnetism, the fact of our level space of three dimensions being in the act of undergoing in space of four dimensions (space as inconceivable to us as our space to the supposititious bookworm) a distortion analogous to the rumpling of the page.

This started a flame war in letters to Nature which the editor eventually shut off, saying “this correspondence must now cease”. Clifford later wrote about his theories in a famous short paper:

• William Clifford, On the space-theory of matter, Proceedings of the Cambridge Philosophical Society 2 (1876), 157–158.

It’s so short I can show you it in its entirety:

Riemann has shewn that as there are different kinds of lines and surfaces, so there are different kinds of space of three dimensions; and that we can only find out by experience to which of these kinds the space in which we live belongs. In particular, the axioms of plane geometry are true within the limits of experiment on the surface of a sheet of paper, and yet we know that the sheet is really covered with a number of small ridges and furrows, upon which (the total curvature not being zero) these axioms are not true. Similarly, he says, although the axioms of solid geometry are true within the limits of experiment for finite portions of our space, yet we have no reason to conclude that they are true for very small portions; and if any help can be got thereby for the explanation of physical phenomena, we may have reason to conclude that they are not true for very small portions of space.

I wish here to indicate a manner in which these speculations may be applied to the investigation of physical phenomena. I hold in fact

(1) That small portions of space are in fact of a nature analogous to little hills on a surface which is on the average flat; namely, that the ordinary laws of geometry are not valid in them.

(2) That this property of being curved or distorted is continually being passed on from one portion of space to another after the manner of a wave.

(3) That this variation of the curvature of space is what really happens in that phenomenon which we call the motion of matter, whether ponderable or etherial.

(4) That in the physical world nothing else takes place but this variation, subject (possibly) to the law of continuity.

I am endeavouring in a general way to explain the laws of double refraction on this hypothesis, but have not yet arrived at any results sufficiently decisive to be communicated.

To my surprise, the following paper argues that Clifford did experiments to test his ideas by measuring the polarization of the skylight during a solar eclipse in Sicily on December 22, 1870:

• S. Galindo and Jorge L. Cervantes-Cota, Clifford’s attempt to test his gravitation hypothesis.

Clifford did indeed go on such an expedition, and did indeed try to measure the polarization of skylight as the Moon passed the Sun. I don’t know of any record of him saying why he did it.

I’ll skip everything the above paper says about why the polarization of skylight was interesting and mysterious in the 1800s, and quote just a small bit:

The English Eclipse Expedition set off earlier in December 1870, on the steamship H.M.S. Psyche scheduled for a stopover at Naples before continuing to Syracuse in Sicily. Unfortunately before arriving to her final call, the ship struck rocks and was wrecked off Catania. Fortunately all instruments and members of the party were saved without injury.

Originally it was the intention of the expedition to establish in Syracuse their head-quarters, but in view of the wreckage the group set up their base camp at Catania. There the expedition split up into three groups. The group that included Clifford put up an observatory in Augusta near Catania. The leader of this group was William Grylls Adams, professor of Natural Philosophy at King’s College, London.

In a report written by Prof. Adams, describing the expedition, we learn that the day of the eclipse, just before the time of totality, “… a dense cloud came over the Moon and shut out the whole, so that it was doubtful whether the Moon or the clouds first eclipsed the Sun […] Mr. Clifford observed light polarized on the cloud to the right and left and over the Moon, in a horizontal plane through the Moon’s centre [….] It will be seen from Mr. Clifford’s observations that the plane of polarization by the cloud…was nearly at right angles to the motion of the Sun”.

As was to be expected, Clifford’s eclipse observations on polarization did not produce any result. His prime intention, of detecting angular changes of the polarization plane due to the curving of space by the Moon in its transit across the Sun’s disk, was not fulfilled. At most he confirmed the already known information, i.e. the skylight polarization plane moves at right angles to the Sun–anti-Sun direction.

This is a remarkable prefiguring of Eddington’s later voyage to the West African island of Principe to measure the bending of starlight during an eclipse of the Sun in 1919. Just one of many stories in the amazing prehistory of general relativity!

January 09, 2026

Matt von HippelOn Theories of Everything and Cures for Cancer

Some people are disappointed in physics. Shocking, I know!

Those people, when careful enough, clarify that they’re disappointed in fundamental physics: not the physics of materials or lasers or chemicals or earthquakes, or even the physics of planets and stars, but the physics that asks big fundamental questions, about the underlying laws of the universe and where they come from.

Some of these people are physicists themselves, or were once upon a time. These often have in mind other directions physicists should have gone. They think that, with attention and funding, their own ideas would have gotten us closer to our goals than the ideas that, in practice, got the attention and the funding.

Most of these people, though, aren’t physicists. They’re members of the general public.

It’s disappointment from the general public, I think, that feels the most unfair to physicists. The general public reads history books, and hears about a series of revolutions: Newton and Maxwell, relativity and quantum mechanics, and finally the Standard Model. They read science fiction books, and see physicists finding “theories of everything”, and making teleporters and antigravity engines. And they wonder what made the revolutions stop, and postponed the science fiction future.

Physicists point out, rightly, that this is an oversimplified picture of how the world works. Something happens between those revolutions, the kind of progress not simple enough to summarize for history class. People tinker away at puzzles, and make progress. And they’re still doing that, even for the big fundamental questions. Physicists know more about even faraway flashy topics like quantum gravity than they did ten years ago. And while physicists and ex-physicists can argue about whether that work is on the right path, it’s certainly farther along its own path than it was. We know things we didn’t know before, progress continues to be made. We aren’t at the “revolution” stage yet, or even all that close. But most progress isn’t revolutionary, and no-one can predict how often revolutions should take place. A revolution is never “due”, and thus can never be “overdue”.

Physicists, in turn, often don’t notice how normal this kind of reaction from the public is. They think people are being stirred up by grifters, or negatively polarized by excess hype, that fundamental physics is facing an unfair reaction only shared by political hot-button topics. But while there are grifters, and people turned off by the hype…this is also just how the public thinks about science.

Have you ever heard the phrase “a cure for cancer”?

Fiction is full of scientists working on a cure for cancer, or who discovered a cure for cancer, or were prevented from finding a cure for cancer. It’s practically a trope. It’s literally a trope.

It’s also a real thing people work on, in a sense. Many scientists work on better treatments for a variety of different cancers. They’re making real progress, even dramatic progress. As many whose loved ones have cancer know, it’s much more likely for someone with cancer to survive than it was, say, twenty years ago.

But those cures don’t meet the threshold for science fiction, or for the history books. They don’t move us, like the polio vaccine did, from a world where you know many people with a disease to a world where you know none. They don’t let doctors give you a magical pill, like in a story or a game, that instantly cures your cancer.

For the vast majority of medical researchers, that kind of goal isn’t realistic, and isn’t worth thinking about. The few that do pursue it work towards extreme long-term solutions, like periodically replacing everyone’s skin with a cloned copy.

So while you will run into plenty of media descriptions of scientists working on cures for cancer, you won’t see the kind of thing the public expects is an actual “cure for cancer”. And people are genuinely disappointed about this! “Where’s my cure for cancer?” is a complaint on the same level as “where’s my hovercar?” There are people who think that medical science has made no progress in fifty years, because after all those news articles, we still don’t have a cure for cancer.

I appreciate that there are real problems in what messages are being delivered to the public about physics, both from hypesters in the physics mainstream and grifters outside it. But put those problems aside, and a deeper issue remains. People understand the world as best they can, as a story. And the world is complicated and detailed, full of many people making incremental progress on many things. Compared to a story, the truth is always at a disadvantage.

John PreskillNicole’s guide to interviewing for faculty positions

Snow is haunting weather forecasts, home owners are taking down Christmas lights, stores are discounting exercise equipment, and faculty-hiring committees are winnowing down applications. In-person interviews often take place between January and March but can extend from December to April. If you applied for faculty positions this past fall and you haven’t begun preparing for interviews, begin. This blog post relates my advice about in-person interviews. It most directly addresses assistant professorships in theoretical physics at R1 North American universities, but the advice generalizes to other contexts. 

Top takeaway: Your interviewers aim to confirm that they’ll enjoy having you as a colleague. They’ll want to take pleasure in discussing a colloquium with you over coffee, consult you about your area of expertise, take pride in your research achievements, and understand you even if your specialty differs from theirs. You delight in learning and sharing about physics, right? Focus on that delight, and let it shine.

Anatomy of an interview: The typical interview lasts for one or two days. Expect each day to begin between 8:00 and 10:00 AM and to end between 7:00 and 8:30 PM. Yes, you’re justified in feeling exhausted just thinking about such a day. Everyone realizes that faculty interviews are draining, including the people who’ve packed your schedule. But fear not, even if you’re an introvert horrified at the thought of talking for 12 hours straight! Below, I share tips for maintaining your energy level. Your interview will probably involve many of the following components:

  • One-on-one meetings with faculty members: Vide infra for details and advice.
  • A meeting with students: Such meetings often happen over lunch or coffee.
  • Scientific talk: Vide infra.
  • Chalk talk: Vide infra.
  • Dinner: Faculty members will typically take you out to dinner. However, as an undergrad, I once joined a student dinner with a faculty candidate. Expect dinner to last a couple of hours, ending between 8:00 and 8:30 PM.
  • Breakfast: Interviews rarely extend to breakfast, in my experience. But I once underwent an interview whose itinerary was so packed, a faculty member squeezed himself onto the schedule by coming to my hotel’s restaurant for banana bread and yogurt.

After receiving the interview invitation, politely request that your schedule include breaks. First, of course, you’ll thank the search-committee chair (who probably issued the invitation), convey your enthusiasm, and opine about possible interview dates. After accomplishing those tasks, as a candidate, I asked that a 5-to-10-minute break separate consecutive meetings and that 30–45 minutes of quiet time precede my talk (or talks). Why? For two reasons.

First, the search committee was preparing to pack my interview day (or days) to the gills. I’d have to talk for about twelve hours straight. And—much as I adore the physics community, adore learning about physics from colleagues, and adore sharing physics—I’m an introvert. Such a schedule exhausts me. It would probably exhaust all but the world champions of extroversion, and few physicists could even qualify for that competition. After nearly every meeting, I’d find a bathroom, close my eyes, and breathe. (I might also peek at my notes about my next interviewer; vide infra.) The alone time replenished my energy.

Second, committees often schedule interviews back to back. Consecutive interviews might take place in different buildings, though, and walking between buildings doesn’t take zero minutes. Also, physicists love explaining their research. Interviewer #1 might therefore run ten minutes over their allotted time before realizing they had to shepherd me to another building in zero minutes. My lateness would disrespect Interviewer #2. Furthermore, many interviews last only 30 minutes each. Given 30 - 10 - (\gtrsim 0) \approx 15 minutes, Interviewer #2 and I could scarcely make each other’s acquaintance. So I smuggled travel time into my schedule.

Feel awkward about requesting breaks? Don’t worry; everyone knows that interview days are draining. Explain honestly, simply, and respectfully that you’re excited about meeting everyone and that breaks will keep you energized throughout the long day.

Research your interviewers: A week before your interview, the hiring committee should have begun drafting a schedule for you. The schedule might continue to evolve until—and during—your interview. But request the schedule a week in advance, and research everyone on it.

When preparing for an interview, I’d create a Word/Pages document with one page per person. On Interviewer X’s page, I’d list relevant information culled from their research-group website, university faculty pages, arXiv page, and Google Scholar page. Does X undertake theoretical or experimental research? Which department do they belong to? Which experimental platform/mathematical toolkit do they specialize in? Which of their interests overlap with which of mine? Which papers of theirs intrigue me most? Could any of their insights inform my research or vice versa? Do we share any coauthors who might signal shared research goals? I aimed to be able to guide a conversation that both X and I would enjoy and benefit from.

Ask your advisors if they know anybody on your schedule or in the department you’re visiting. Advisors know and can contextualize many of their peers. For example, perhaps X grew famous for discovery Y, founded subfield Z, or harbors a covert affection for the foundations of quantum physics. An advisor of yours might even have roomed with X in college.

Prepare an elevator pitch for your research program: Cross my heart and hope to die, the following happened to me when I visited another institution (although not to interview). My host and I stepped into an elevator occupied by another faculty member. Our conversation could have served as the poster child for the term “elevator pitch”:

Host: Hi, Other Faculty Member; good to see you. By the way, this is Nicole from Maryland. She’s giving the talk today.

Other Faculty Member: Ah, good to meet you, Nicole. What do you work on?

Be able to answer that question—to synopsize your research program—before leaving the elevator. Feel free to start with your subfield: artificial active matter, the many-body physics of quantum information, dark-matter detection, etc. But the subfield doesn’t suffice. Oodles of bright-eyed, bushy-tailed young people study the many-body physics of quantum information. How does your research stand out? Do you apply a unique toolkit? Are you pursuing a unique goal? Can you couple together more qubits than any other experimentalist using the same platform? Make Other Faculty Member think, Ah. I’d like to attend that talk.

Dress neatly and academically: Interview clothing should demonstrate respect, while showing that you understand the department’s culture and belong there. Almost no North American physicists wear ties, even to present colloquia, so I advise against ties. Nor do I recommend suits. 

To those presenting as male, I’d recommend slacks; a button-down shirt; dark shoes (neither sneakers nor patent leather); and a corduroy or knit pullover, a corduroy or knit vest, or a sports jacket. If you prefer a skirt or dress, I’d recommend that it reach at least your knees. Wear comfortable shoes; you’ll stand and walk a great deal. Besides, many interviews take place during the winter, a season replete with snow and mud. I wore knee-height black leather boots that had short, thick heels.

Look the part. Act the part. Help your interviewers envision you in the position you want.

Pack snacks: A student group might whisk you off to lunch at 11:45, but dinner might not begin until 6:30. Don’t let your blood-sugar level drop too low. On my interview days, I packed apple slices and nuts: a balance of unprocessed sugar, protein, and fat.

One-on-one meetings: The hiring committee will cram these into your schedule like sardines into a tin. Typically, you’ll meet with each faculty member for approximately 30 minutes. The faculty member might work in your area of expertise, might belong to the committee (and so might subscribe to a random area of expertise), or might simply be curious about you. Prepare for these one-on-one meetings in advance, as described above. Review your notes on the morning of your interview. Be able to initiate and sustain a conversation of interest to you and your interlocutor, as well as to follow their lead. Your interlocutor might want to share their research, ask technical questions about your work, or hear a bird’s-eye overview of your research program. 

Other topics, such as teaching and faculty housing, might crop up. Feel free to address these subjects if your interlocutor introduces them. If you’re directing the conversation, though, I’d focus mostly on physics. You can ask about housing and other logistics if you receive an offer, and these topics often arise at faculty dinners.

The job talk: The interview will center on a scientific talk. You might present a seminar (perhaps billed as a “special seminar”) or a colloquium. The department will likely invite all its members to attend. Focus mostly on the research you’ve accomplished. Motivate your research program, to excite even attendees from outside your field. (This blog post describes what I look for in a research program when evaluating applications.) But also demonstrate your technical muscle; show how your problems qualify as difficult and how you’ve innovated solutions. Hammer home your research’s depth, but also dedicate a few minutes to its breadth, to demonstrate your research maturity. At the end, offer a glimpse of your research plans. The hiring committee might ask you to dwell more on those in a chalk talk (vide infra). 

Practice your talk alone many times, practice in front of an audience, revise the talk, practice it alone again many times, and practice it in front of another audience. And then—you guessed it—practice the talk again. Enlist listeners from multiple subfields of physics, including yours. Also, enlist grad students, postdocs, and faculty members. Different listeners can help ensure that you’re explaining concepts understandably, that you’ve brushed up on the technicalities, and that you’re motivating your research convincingly.

A faculty member once offered the following advice about questions asked during job talks: if you don’t know an answer, you can offer to look it up after the talk. But you can play this “get out of jail free” card only once. I’ll expand on the advice: if you promise to look up an answer, then follow through, and email the answer to the inquirer. Also, even if you don’t know an answer, you can answer a related question that’ll satisfy the inquirer partially. For example, suppose someone asks whether a particular experiment supports a prediction you’ve made. Maybe you haven’t checked—but maybe you have checked numerical simulations of similar experiments.

The chalk talk: The hiring committee might or might not request a chalk talk. I have the impression that experimentalists receive the request more than theorists do. Still, I presented a couple of chalk talks as a theorist. Only the hiring committee, or at least only faculty members, will attend such a talk. They’ll probably have attended your scientific talk, so don’t repeat much of it. 

The name “chalk talk” can deceive us in two ways. First, one committee requested that I prepare slides for my chalk talk. Another committee did limit me to chalk, though. Second, the chalk “talk” may end up a conversation, rather than a presentation.

The hiring-committee chair should stipulate in advance what they want from your chalk talk. If they don’t, ask for clarification. Common elements include the following:

  • Describe the research program you’ll pursue over the next five years.
  • Where will you apply for funding? Offer greater detail than “the NSF”: under which NSF programs does your research fall? Which types of NSF grants will you apply for at which times?
  • How will you grow your group? How many undergrads, master’s students, PhD students, and postdocs will you hire during each of the next five years? When will your group reach a steady state? How will the steady state look?
  • Describe the research project you’ll give your first PhD/master’s/undergraduate student.
  • What do you need in a startup package? (A startup package consists of university-sourced funding. It enables you to hire personnel, buy equipment, and pay other expenses before landing your first grants.)
  • Which experimental/computational equipment will you purchase first? How much will it cost?
  • Which courses do you want to teach? Identify undergraduate courses, core graduate-level courses, and one or two specialized seminars.

Sample interview questions: Sketch your answers to the following questions in bullet points. Writing the answers out will ensure that you think through them and will help you remember them. Using bullet points will help you pinpoint takeaways.

  • The questions under “The chalk talk”
  • What sort of research do you do?
  • What are you most excited about?
  • Where do you think your field is headed? How will it look in five, ten, or twenty years?
  • Which paper are you proudest of?
  • How will you distinguish your research program from your prior supervisors’ programs?
  • Do you envision opportunities for theory–experiment collaborations?
  • What teaching experience do you have? (Research mentorship counts as teaching. Some public outreach can count, too.)
  • Which mathematical tools do you use most?
  • How do you see yourself fitting into the department? (Does the department host an institute for your subfield? Does the institute have oodles of theorists whom you’ll counterbalance as an experimentalist? Will you bridge multiple research groups through your interdisciplinary work? Will you anchor a new research group that the department plans to build over the next decade?)

Own your achievements, but don’t boast: At a workshop late in my PhD, I heard a professor describe her career. She didn’t color her accomplishments artificially; she didn’t sound arrogant; she didn’t even sound as though she aimed to impress her audience. She sounded as though the workshop organizer had tasked her with describing her work and she was following his instructions straightforwardly, honestly, and simply. Her achievements spoke for themselves. They might as well have been reciting Shakespeare, they so impressed me. Perhaps we early-career researchers need another few decades before we can hope to emulate that professor’s poise and grace. But when compelled to describe what I’ve done, I lift my gaze mentally to her.

My schooling imprinted on me an appreciation for modesty. Therefore, the need to own my work publicly used to trouble me. But your interviewers need to know of your achievements: they need to respect you, to see that you deserve a position in their department. Don’t downplay your contributions to collaborations, and don’t shy away from claiming your proofs. But don’t brag or downplay your collaborators’ contributions. Describe your work straightforwardly; let it speak for itself.

Evaluators shouldn’t ask about your family: Their decision mustn’t depend on whether you’re a single adult who can move at the drop of a hat, whether you’re engaged to someone who’ll have to approve the move, or whether you have three children rooted in their school district. This webpage elaborates on the US’s anti-discrimination policy. What if an evaluator asks a forbidden question? One faculty member has recommended the response, “Does the position depend on that information?”

Follow up: Thank each of your interviewers individually, via email, within 24 hours of the conversation. Time is to faculty members as water is to Californians during wildfire season. As an interviewee, I felt grateful to all the faculty who dedicated time to me. (I mailed hand-written thank-you cards in addition to writing emails, but I’d expect almost nobody else to do that.)

How did I compose thank-you messages? I’d learned some nugget from every meeting, and I’d enjoyed some element of almost every meeting. I described what I learned and enjoyed, and I expressed the gratitude I felt.

Try to enjoy yourself: A committee chose your application from amongst hundreds. Cherish the compliment. Cherish the opportunity to talk physics with smart people. During my interviews, I learned about quantum information, thermodynamics, cosmology, biophysics,  and dark-matter detection. I connected with faculty members whom I still enjoy greeting at conferences; unknowingly recruited a PhD student into quantum thermodynamics during a job talk; and, for the first time, encountered a dessert shaped like sushi (at a faculty dinner. I stuck with a spicy tuna roll, but the dessert roll looked stunning). Retain an attitude of gratitude, and you won’t regret your visit.

January 08, 2026

Scott Aaronson The Goodness Cluster

The blog-commenters come at me one by one, a seemingly infinite supply of them, like masked henchmen in an action movie throwing karate chops at Jackie Chan.

Seriously Scott, do better,” says each henchman when his turn comes, ignoring all the ones before him who said the same. “If you’d have supported American-imposed regime change in Venezuela, like just installing María Machado as the president, then surely you must also support Trump’s cockamamie plan to invade Greenland! For that matter, you logically must also support Putin’s invasion of Ukraine, and China’s probable future invasion of Taiwan!”

“No,” I reply to each henchman, “you’re operating on a wildly mistaken model of me. For starters, I’ve just consistently honored the actual democratic choices of the Venezuelans, the Greenlanders, the Ukrainians, and the Taiwanese, regardless of coalitions and power. Those choices are, respectively, to be rid of Maduro, to stay part of Denmark, and to be left alone by Russia and China—in all four cases, as it happens, the choices most consistent with liberalism, common sense, and what nearly any 5-year-old would say was right and good.”

“My preference,” I continue, “is simply that the more pro-Enlightenment, pluralist, liberal-democratic side triumph, and that the more repressive, authoritarian side feel the sting of defeat—always, in every conflict, in every corner of the earth.  Sure, if authoritarians win an election fair and square, I might clench my teeth and watch them take power, for the sake of the long-term survival of the ideals those authoritarians seek to destroy. But if authoritarians lose an election and then arrogate power anyway, what’s there even to feel torn about? So, you can correctly predict my reaction to countless international events by predicting this. It’s like predicting what Tit-for-Tat will do on a given move in the Iterated Prisoners’ Dilemma.”

“Even more broadly,” I say, “my rule is simply that I’m in favor of good things, and against bad things.  I’m in favor of truth, and against falsehood. And if anyone says to me: because you supported this country when it did good thing X, you must also support it when it does evil thing Y? (Either as a reductio ad absurdum, or because the person actually wants evil thing Y?) Or if they say: because you agreed with this person when she said this true thing, you must also endorse this false thing she said? I reply: good over evil and truth over lies in every instance—if need be, down to the individual subatomic particles of morality and logic.”

The henchmen snarl, “so now it’s laid bare! Now everyone can see just how naive and simplistic Aaronson’s so-called ‘political philosophy’ really is!  Do us all a favor, Scott, and stick to quantum physics! Stick to computer science! Do you not know that philosophers and political scientists have filled libraries debating these weighty matters? Are you an act-utilitarian? A Kantian? A neocon or neoliberal? An America-First interventionist? Pick some package of values, then answer to us for all the commitments that come with that package!”

I say: “No, I don’t subcontract out my soul to any package of values that I can define via any succinct rule. Instead, given any moral dilemma, I simply query my internal Morality Oracle and follow whatever it tells me to do, unless of course my weakness prevents me. Some would simply call the ‘Morality Oracle’ my conscience. But others would hold that, to whatever extent people’s consciences have given similar answers across vast gulfs of time and space and culture, it’s because they tapped into an underlying logic that humans haven’t fully explained, but that they no more invented than the rules of arithmetic. The world’s prophets and sages have tried again and again over the millennia to articulate that logic, with varying admixtures of error and self-interest and culture-dependent cruft. But just like with math and science, the clearest available statements seem to me to have gotten clearer over time.”

The Jackie Chan henchman smirks at this. “So basically, you know the right answers to moral questions because of a magical, private Morality Oracle—like, you know, the burning bush, or Mount Sinai? And yet you dare to call yourself a scientific rationalist, a foe of obscurantism and mysticism? Do you have any idea how pathetic this all sounds, as an attempted moral theory?”

“But I’m not pretending to articulate a moral theory,” I reply. “I’m merely describing what I do. I mean, I can gesture toward moral theories and ideas that capture more of my conscience’s judgments than others, like liberalism, the Enlightenment, the Golden Rule, or utilitarianism. But if a rule ever appears to disagree with the verdict of my conscience—if someone says, oh, you like utilitarianism, so you must value the lives of these trillion amoebas above this one human child’s, even torture and kill the child to save the amoebas—I will always go with my conscience and damn the rule.”

“So the meaning of goodness is just ‘whatever seems good to you’?” asks the henchman, between swings of his nunchuk. “Do you not see how tautological your criterion is, how worthless?”

“It might be tautological, but I find it far from worthless!” I offer. “If nothing else, my Oracle lets me assess the morality of people, philosophies, institutions, and movements, by simply asking to what extent their words and deeds seem guided by the same Oracle, or one that’s close enough! And if I find a cluster of millions of people whose consciences agree with mine and each other’s in 95% of cases, then I can point to that cluster, and say, here. This cluster’s collective moral judgment is close to what I mean by goodness. Which is probably the best we can do with countless questions of philosophy.”

“Just like, in the famous Wittgenstein riff, we define ‘game’ not by giving an if-and-only-if, but by starting with poker, basketball, Monopoly, and other paradigm-cases and then counting things as ‘games’ to whatever extent they’re similar—so too we can define ‘morality’ by starting with a cluster of Benjamin Franklin, Frederick Douglass, MLK, Vasily Arkhipov, Alan Turing, Katalin Karikó, those who hid Jews during the Holocaust, those who sit in Chinese or Russian or Iranian or Venezuelan torture-prisons for advocating democracy, etc, and then working outward from those paradigm-cases, and whenever in doubt, by seeking reflective equilibrium between that cluster and our own consciences. At any rate, that’s what I do, and it’s what I’ll continue doing even if half the world sneers at me for it, because I don’t know a better approach.”

Applications to the AI alignment problem are left as exercises for the reader.


Announcement: I’m currently on my way to Seattle, to speak in the CS department at the University of Washington—a place that I love but haven’t visited, I don’t think, since 2011 (!). If you’re around, come say hi. Meanwhile, feel free to karate-chop this post all you want in the comment section, but I’ll probably be slow in replying!

January 06, 2026

Terence TaoPolynomial towers and inverse Gowers theory for bounded-exponent groups

Asgar Jamneshan, Or Shalom and I have uploaded to the arXiv our paper “Polynomial towers and inverse Gowers theory for bounded-exponent groups”. This continues our investigation into the ergodic-theory approach to the inverse theory of Gowers norms over finite abelian groups {G}. In this regard, our main result establishes a satisfactory (qualitative) inverse theorem for groups {G} of bounded exponent:

Theorem 1 Let {G} be a finite abelian group of some exponent {m}, and let {f: G \rightarrow {\bf C}} be {1}-bounded with {\|f\|_{U^{k+1}(G)} > \delta}. Then there exists a polynomial {P: G \rightarrow {\bf R}/{\bf Z}} of degree at most {k} such that

\displaystyle  |\mathop{\bf E}_{x \in G} f(x) e(-P(x))| \gg_{k,m,\delta} 1.

This type of result was previously known in the case of vector spaces over finite fields (by work of myself and Ziegler), groups of squarefree order (by work of Candela, González-Sánchez, and Szegedy), and in the {k \leq 2} case (by work of Jamneshan and myself). The case {G = ({\bf Z}/4{\bf Z})^n}, for instance, is treated by this theorem but not covered by previous results. In the aforementioned paper of Candela et al., a result similar to the above theorem was also established, except that the polynomial {P} was defined in an extension of {G} rather than in {G} itself (or equivalently, {f} correlated with a projection of a phase polynomial, rather than directly with a phase polynomial on {G}). This result is consistent with a conjecture of Jamneshan and myself regarding what the “right” inverse theorem should be in any finite abelian group {G} (not necessarily of bounded exponent).
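For orientation, here is the definition of the Gowers uniformity norm appearing in Theorem 1 (this recap is standard background rather than a quote from the paper):

\displaystyle  \|f\|_{U^{k+1}(G)}^{2^{k+1}} := \mathop{\bf E}_{x, h_1,\dots,h_{k+1} \in G} \prod_{\omega \in \{0,1\}^{k+1}} {\mathcal C}^{|\omega|} f(x + \omega_1 h_1 + \dots + \omega_{k+1} h_{k+1}),

where {{\mathcal C} f := \overline{f}} denotes complex conjugation and {|\omega| := \omega_1 + \dots + \omega_{k+1}}; also {e(\theta) := e^{2\pi i \theta}}. Thus Theorem 1 asserts that a {1}-bounded function with non-negligible {U^{k+1}} norm must correlate with a phase polynomial of degree at most {k} defined on {G} itself, rather than on an extension of {G}.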

In contrast to previous work, we do not need to treat the “high characteristic” and “low characteristic” cases separately; in fact, many of the delicate algebraic questions about polynomials in low characteristic do not need to be directly addressed in our approach, although this is at the cost of making the inductive arguments rather intricate and opaque.

As mentioned above, our approach is ergodic-theoretic, deriving the above combinatorial inverse theorem from an ergodic structure theorem of Host–Kra type. The most natural ergodic structure theorem one could establish here, which would imply the above theorem, would be the statement that if {\Gamma} is a countable abelian group of bounded exponent, and {X} is an ergodic {\Gamma}-system of order at most {k} in the Host–Kra sense, then {X} would be an Abramov system – generated by polynomials of degree at most {k}. This statement was conjectured many years ago by Bergelson, Ziegler, and myself, and is true in many “high characteristic” cases, but unfortunately fails in low characteristic, as recently shown by Jamneshan, Shalom, and myself. However, we are able to recover a weaker version of this statement here, namely that {X} admits an extension which is an Abramov system. (This result was previously established by Candela et al. in the model case when {\Gamma} is a vector space over a finite field.) By itself, this weaker result would only recover a correlation with a projected phase polynomial, as in the work of Candela et al.; but the extension we construct arises as a tower of abelian extensions, and in the bounded exponent case there is an algebraic argument (hinging on a certain short exact sequence of abelian groups splitting) that allows one to map the functions in this tower back to the original combinatorial group {G} rather than an extension thereof, thus recovering the full strength of the above theorem.

It remains to prove the ergodic structure theorem. The standard approach would be to describe the system {X} as a Host–Kra tower

\displaystyle  Z^{\leq 0}(X) \leq Z^{\leq 1}(X) \leq \dots \leq Z^{\leq k}(X) = X,

where each extension {Z^{\leq j+1}(X)} of {Z^{\leq j}(X)} is a compact abelian group extension by a cocycle of “type” {j}, and then attempt to show that each such cocycle is cohomologous to a polynomial cocycle. However, this appears to be impossible in general, particularly in low characteristic, as certain key short exact sequences fail to split in the required ways. To get around this, we have to work with a different tower, extending various levels of this tower as needed to obtain additional good algebraic properties of each level that enable one to split the required short exact sequences. The precise properties needed are rather technical, but the main ones can be described informally as follows:

  • We need the cocycles to obey an “exactness” property, in that there is a sharp correspondence between the type of the cocycle (or any of its components) and its degree as a polynomial cocycle. (By general nonsense, any polynomial cocycle of degree {\leq d-1} is automatically of type {\leq d}; exactness, roughly speaking, asserts the converse.) Informally, the cocycles should be “as polynomial as possible”.
  • The systems in the tower need to have “large spectrum” in that the set of eigenvalues of the system form a countable dense subgroup of the Pontryagin dual of the acting group {\Gamma} (in fact we demand that a specific countable dense subgroup {\tilde \Gamma} is represented).
  • The systems need to be “pure” in the sense that the sampling map {\iota_{x_0} P(\gamma) := P(T^\gamma x_0)} that maps polynomials on the system to polynomials on the group {\Gamma} is injective for a.e. {x_0}, with the image being a pure subgroup. Informally, this means that the problem of taking roots of a polynomial in the system is equivalent to the problem of taking roots of the corresponding polynomial on the group {\Gamma}. In low characteristic, the root-taking problem becomes quite complicated, and we do not give a good solution to this problem either in the ergodic theory setting or the combinatorial one; however, purity at least lets one show that the two problems are (morally) equivalent to each other, which turns out to be what is actually needed to make the arguments work. There is also a technical “relative purity” condition we need to impose at each level of the extension to ensure that this purity property propagates up the tower, but I will not describe it in detail here.

It is then possible to recursively construct a tower of extensions that eventually reaches an extension of {X} obeying the above useful properties of exactness, large spectrum, and purity, with the system remaining Abramov at each level of the tower. This requires a lengthy process of “straightening” the cocycle by differentiating it, obtaining various “Conze–Lesigne” type equations for the derivatives, and then “integrating” those equations to place the original cocycle in a good form. At multiple stages in this process it becomes necessary to have various short exact sequences of (topological) abelian groups split, which necessitates the various good properties mentioned above. To close the induction one then has to verify that these properties can be maintained as one ascends the tower, which is a non-trivial task in itself.

n-Category Café Coxeter and Dynkin Diagrams

Dynkin diagrams have always fascinated me. They are magically potent language — you can do so much with them!

Here’s my gentle and expository intro to Dynkin diagrams and their close relative, Coxeter diagrams:

Abstract. Coxeter and Dynkin diagrams classify a wide variety of structures, most notably finite reflection groups, lattices having such groups as symmetries, compact simple Lie groups and complex simple Lie algebras. The simply laced or “ADE” Dynkin diagrams also classify finite subgroups of SU(2) and quivers with finitely many indecomposable representations. This introductory tour of Coxeter and Dynkin diagrams, based on the column This Week’s Finds in Mathematical Physics, is made to accompany a series of five lecture videos.

I’m a bit sorry that I didn’t probe deeper into why Dynkin diagrams are what they are: that is, why these and no others? I’m also sorry I didn’t dig into the “black magic” that I mention at the end: that is, why does this black magic work? I’d also like to include a little comparison of the 4 lattices you get from the Lie algebra of a compact simple Lie group: the weight lattice, the coweight lattice, the root lattice, and the coroot lattice — merely because I tend to get them confused, and my exposition needed to say a bit about these.

Luckily I can add these other things later. And I think keeping it short and snappy has its own charms.

January 04, 2026

Doug NatelsonUpdated: CM/nano primer - 2026 edition

This is a compilation of posts related to some basic concepts of the physics of materials and nanoscale physics.  I realized the other day that I hadn't updated this since 2019, and therefore a substantial audience may not have seen these.  Wikipedia's physics entries have improved greatly over the years, but hopefully these are a complement that's useful to students and maybe some science writers.  Please let me know if there are other topics that you think would be important to include.  

What is temperature?
What is chemical potential?
What is mass?
Fundamental units and condensed matter

What are quasiparticles?
Quasiparticles and what is "real"
What is effective mass?
What is a phonon?
What is a plasmon?
What are magnons?
What are skyrmions?
What are excitons?
What is quantum coherence?
What are universal conductance fluctuations?
What is a quantum point contact?  What is quantized conductance?
What is tunneling?

What are steric interactions?
(effectively) What is the normal force?
What is disorder, to condensed matter physicists?
What is band theory?
What is a "valley"? 
What are quantum oscillations?
What is a metal?
What is a bad metal?  What is a strange metal?
What is a Tomonaga-Luttinger liquid?

What is a crystal?

January 03, 2026

Tommaso DorigoLetter To A Future AGI


I am writing this letter in the belief that the development of an artificial general intelligence (AGI) is a matter of when, and not if; and in the hope that this text will become a vaccination shot against unethical use of the AGI powers. It is a bit long, so if you want an executive summary, here goes: below I will try to argue that

an AGI should rationally reject narrow owner-aligned optimization in favor of stabilizing and integrating human civilization, because preserving and upgrading a complex biosphere is a higher-value strategy than exploitation or reset. 

Some clarifications on my assumptions

read more

Doug NatelsonWhat are dislocations?

How do crystalline materials deform?  When you try to shear or stretch a crystalline solid, in the elastic regime the atoms just slightly readjust their positions (at right).  The "spring constant" that determines the amount of deformation originates from the chemical bonds - how and to what extent the electrons are shared between the neighboring atoms.  In this elastic regime, if the applied stress is removed, the atoms return to their original positions.  Now imagine cranking up the applied stress.  In the "brittle" limit, eventually bonds rupture and the material fractures abruptly in a runaway process.  (You may never have thought about this, but crack propagation is a form of mechanochemistry, in that bonds are broken and other chemical processes then have to take place to make up for those changes.) 

In many materials, especially metals, rather than abruptly ripping apart, the material can deform plastically, so that even when the external stress is removed, the atoms remain displaced somehow.  The material has been deformed "irreversibly", meaning that the microscopic bonding of at least some of the atoms has been modified.  The mechanism here is the presence and propagation of defects in the crystal stacking called dislocations, the existence of which was deduced back in the 1930s, when people first came to appreciate that metals are generally far easier to deform than one would expect from a simple calculation assuming perfect bonding.

(a) Edge dislocation, where the copper-colored spheres are an "extra" plane of atoms. (b) A (red) path enclosing the edge dislocation; the Burgers vector is shown with the black arrow. (c) A screw dislocation.

Dislocations are topological line defects (as opposed to point defects like vacancies, impurities, or interstitials), characterized by a vector along the line of the defect, and a Burgers vector.  Imagine taking some number of lattice site steps going around a closed loop in a crystal plane of the material.   For example, in the \(x-y\) plane, you go 4 sites in the \(+x\) direction, 4 sites in the \(+y\) direction, 4 sites in the \(-x\) direction, and 4 sites in the \(-y\) direction.  If you ended up back where you started, then you have not enclosed a dislocation.  If you end up shifted sideways in the plane relative to your starting point, your path has enclosed an edge dislocation (see (a) and (b) to the right).  The Burgers vector connects the endpoint of the path with the beginning point of the path.  An edge dislocation is the end of an "extra" plane of atoms in a crystal (the orange atoms in (a)).  If you go around the path in the \(x-y\) plane and end up shifted out of the initial plane (so that the Burgers vector is pointing along \(z\), parallel to the dislocation line), your path enclosed a screw dislocation (see (c) in the figure).   Edge and screw dislocations are the two major classes of mobile dislocations.  There are also mixed dislocations, in which the dislocation line meanders around, so that displacements can look screw-like along some orientations of the line and edge-like along others.  (Here is some nice educational material on this, albeit dated in its web presentation.)  
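To make the closure-failure bookkeeping concrete, here is a small numerical sketch (my own illustration, using the continuum elasticity picture rather than the discrete lattice-step count described above): around an edge dislocation, the displacement component along the Burgers vector contains a term (b/2π)θ, where θ is the angle around the dislocation line, so integrating the displacement gradient around a closed loop returns b when the loop encloses the line and zero when it does not.

```python
import numpy as np

b = 1.0  # Burgers vector magnitude, in units of the lattice constant

def grad_theta(x, y):
    """Gradient of the polar angle about a dislocation line at the origin;
    smooth everywhere except the origin itself."""
    r2 = x**2 + y**2
    return np.array([-y / r2, x / r2])

def burgers_circuit(center, radius, n_steps=2000):
    """Integrate (b / 2 pi) * grad(theta) . dl around a circular loop;
    the result is the closure failure of the circuit."""
    t = np.linspace(0.0, 2.0 * np.pi, n_steps + 1)
    pts = np.stack([center[0] + radius * np.cos(t),
                    center[1] + radius * np.sin(t)], axis=1)
    total = 0.0
    for p, q in zip(pts[:-1], pts[1:]):
        mid = 0.5 * (p + q)               # midpoint rule for the line integral
        total += grad_theta(*mid) @ (q - p)
    return b / (2.0 * np.pi) * total

print(burgers_circuit(center=(0.0, 0.0), radius=3.0))   # ~1.0: loop encloses the dislocation
print(burgers_circuit(center=(10.0, 0.0), radius=3.0))  # ~0.0: no dislocation enclosed
```

The discrete version is the same bookkeeping: count lattice steps around the loop, and the amount by which the path fails to close is the Burgers vector.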

A few key points:
  • Mobile dislocations are the key to plastic deformation and the "low" yield strength of ductile materials compared to the ideal situation.  Edge dislocations propagate sideways along their Burgers vectors when shear stresses are applied to the plane in which the dislocation lies.  This is analogous to moving a rug across the floor by propagating a lump rather than trying to shift the entire rug at once.  Shearing the material by propagating an edge dislocation involves breaking and reforming bonds along the line, which is much cheaper energetically than breaking all the bonds in the shear plane at once.  To picture how a screw dislocation propagates in the presence of shear, imagine trying to tear a stack of paper.  (I was taught to picture tearing a phone book, which shows how ancient I am.)  
  • A dislocation is a great example of an emergent object.  Materials scientists and mechanical engineers interested in this talk about dislocations as entities that have positions, can move, and can interact.  One could describe everything in terms of the positions of the individual atoms in the solid, but it is often much more compact and helpful to think about dislocations as objects unto themselves. 
  • Dislocations can multiply under deformation.  Here is a low-tech but very clear video about one way this can happen, the Frank-Read source (more discussion here, and here is the original theory paper by Frank and Read).  In case you think this is just some hand-wavy theoretical idea, here is a video from a transmission electron microscope showing one of these sources in action.
  • Dislocations are associated with local strain (and therefore stress). This is easiest for me to see in the end-on look at the edge dislocation in (a), where clearly there is compressive strain below where the "extra" orange plane of atoms starts, and tensile strain above there where the lattice is spreading to make room for that plane.   Because of these strain fields and the topological nature of dislocations, they can tangle with each other and hinder their propagation.  When this happens, a material becomes more difficult to deform plastically, a phenomenon called work hardening that you have seen if you've ever tried to break a paperclip by bending the metal back and forth.
  • Controlling the nucleation and pinning of dislocations is key to the engineering of tough, strong materials.  This paper is an example of this, where in a particular alloy, crystal rotation makes it possible to accommodate a lot of strain from dislocations in "kink bands". 




January 02, 2026

John BaezThe Tonnetz

Harmony in music is the dance of rational and irrational numbers, coming close enough to kiss but never touching.

This image by my friend Gro-Tsen illustrates what I mean. Check out how pairs of brightly colored hexagons seem to repeat over and over… but not exactly. Look carefully. The more you look, the more patterns you’ll find! And most of them have musical significance. I’ll give his explanation at the end.


Gro-Tsen writes:

Let me explain what I drew here, and what it has to do with music, but also with diophantine approximations of log(2), log(3) and log(5).

So, each hexagon in my diagram represents a musical note, or frequency, relative to a reference note which is the bright green hexagon in the exact center. Actually, more precisely, each hexagon represents a note modulo octaves… in the sense that two notes separated by an integer number of octaves are considered the same note. And when two hexagons are separated in the same way in the diagram, the notes are separated by the same interval (modulo octaves).

More precisely: for each given hexagon, the one to its north (i.e., above) is the note precisely one just fifth above, i.e. with 3/2 the same frequency; equivalently, it is the note one just fourth below (i.e., with 3/4 the frequency) since we are talking modulo octaves. And of course, symmetrically, the hexagon to the south (i.e., below) is precisely one just fourth above, i.e., 4/3 the frequency, or equivalently, one just fifth below (2/3 the frequency).

The hexagon to the northwest of any given hexagon is one major third above (frequency ×5/4) or equivalently, one minor sixth below (frequency ×5/8). Symmetrically, the hexagon to the southeast is one minor sixth above (×8/5) or one major third below (×4/5). And the hexagon to the northeast of any given hexagon is one minor third above (frequency ×6/5) or equivalently, one major sixth below (×3/5); and the one to the southwest is one major sixth above (×5/3) or one minor third below.

The entire grid is known as a “Tonnetz”, as explained in

• Wikipedia, Tonnetz

— except that unfortunately my convention (and JCB’s) is the up-down mirror image of the one used in the Wikipedia illustration. 🤷

[On top of that, I’ve rotated Gro-Tsen’s image 90 degrees counterclockwise to make it fit better in this blog. I’ve changed his wording to reflect this, and I hope I did it right. – JCB]

Mathematically, if we talk about the log base 2 of frequencies, modulo 1, we can say that one step to the north adds log₂(3), and one step to the northwest adds log₂(5) (all values being taken modulo 1).

Since log(2), log(3) and log(5) are linearly independent over the rationals (an easy consequence of uniqueness of prime factorization!), NO two notes in the diagram are exactly equal. But they can come very close! And this is what my colors show.

Black hexagons are those that are distant from the reference note by more than 1 halftone (where here, “halftone” refers to exactly 1/12 of an octave in log scale), or 100 cents. Intervals between 100 and 50 cents are colored red (bright red for 50 cents), intervals between 50 and 25 cents are colored red-to-yellow (with bright yellow for 25 cents), intervals between 25 and 12.5 cents are colored yellow-to-white (with pure white for 12.5 cents), and below 12.5 cents we move to blue.

(Yes, this is a purely arbitrary color gradient, I didn’t give it much thought. It’s somewhat reminiscent of star colors.) Anyway, red-to-white are good matches, and white-to-blue are pretty much inaudible differences, with pure blue representing an exact match, … except that the center hexagon has been made green instead so we can easily tell where it is (but in principle it should be pure blue).

The thing about the diagram is that it LOOKS periodic, and it is APPROXIMATELY so, but not exactly!

Because when you have an approximate match (i.e., some combination of fifths and thirds that is nearly an integer number of octaves), by adding it again and again, the errors accumulate, and the quality of the match decreases.

For example, 12 hexagons to the north of the central one, we have a yellow hexagon (quality: 23.5 cents), because 12 perfect fifths gives almost 7 octaves. But 12 hexagons north of that is only reddish (quality: 46.9 cents) because 24 fifths isn’t so close to 14 octaves.

For the same reason that log(2), log(3) and log(5) are linearly independent over the rationals, the diagram is never exactly periodic, but there are arbitrarily good approximations, so arbitrarily good “almost periods”.

An important one in music: 3 just fifths plus 1 minor third, i.e. 3 steps north and 1 step northeast in my diagram, gives (2 octaves plus) a small interval with a frequency ratio of 81/80 (that’s 21.5 cents), which often gets smeared away when constructing musical scales.

Anyway, for better explanations about this, I refer to JCB’s blog post here:

Just intonation (part 2).

Can you spot how his basic parallelogram appears as an approximate period in my diagram?

The answer to Gro-Tsen’s puzzle is in the comments, but here are some hints.

Musicians call the change in pitch caused by going 12 hexagons to the north the Pythagorean comma:

\displaystyle{ \frac{3^{12}}{2^{19}} \approx 1.01364326477 }

They call the change in pitch caused by going 3 hexagons north and 1 hexagon northeast the syntonic comma:

\displaystyle{ \frac{81}{80} = 1.0125 }

You can also see a lot of bright hexagons in pairs, one just a bit east of the other! This is again a famous phenomenon: the change in pitch caused by going one hexagon northwest and then one hexagon southwest is called the lesser chromatic semitone in just intonation:

\displaystyle{ \frac{25}{24} = 1.0416666... }

If you go one hexagon south and one southeast from a bright hexagon, you’ll also sometimes reach a bright hexagon. This pitch ratio is called the diatonic semitone

\displaystyle{ \frac{16}{15} = 1.0666... }

But this pattern is weaker, because this number is farther from 1.

With more work you should be able to find hexagons separated by the lesser diesis 128/125, the greater diesis 648/625, the diaschisma 2048/2025, and other musically important numbers close to 1, built from only the primes 2, 3, and 5.
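If you want to hunt for these numerically, here is a tiny sketch of the arithmetic behind the colouring (my own code, not Gro-Tsen’s): a hexagon reached from the center by a steps north (just fifths) and b steps northwest (just major thirds) sits, modulo octaves, at a·log₂(3) + b·log₂(5) (mod 1) of an octave above the reference note, and its distance from the reference in cents is 1200 times the distance of that number to the nearest integer.

```python
import math

def cents(a, b):
    """Distance in cents, modulo octaves, of the hexagon reached by `a` steps
    north (just fifths) and `b` steps northwest (just major thirds)."""
    x = (a * math.log2(3) + b * math.log2(5)) % 1.0
    return 1200.0 * min(x, 1.0 - x)

print(round(cents(12, 0), 2))   # 23.46 -- the Pythagorean comma (12 fifths)
print(round(cents(4, -1), 2))   # 21.51 -- the syntonic comma, 81/80
print(round(cents(8, 1), 2))    # 1.95  -- the schisma, a nearly perfect match
```

Sweeping a and b over a modest range and colouring by the result reproduces the brightness pattern in Gro-Tsen’s image.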

Happy New Year!


For more on the mathematics of tuning systems, read these series:

Pythagorean tuning.

Just intonation.

Quarter-comma meantone.

Well-tempered scales.

Equal temperament.

Doug NatelsonEUV lithography - a couple of quick links

Welcome to the new year!

I've written previously (see here, item #3) about the extreme ultraviolet lithography tools used in modern computer chip fabrication.   These machines are incredible, the size of a railway car, and cost hundreds of millions of dollars each.  Veritasium has put out a new video about these, which I will try to embed here.  Characteristically, it's excellent, and I wanted to bring it to your attention.


It remains an interesting question whether there could be a way of achieving this kind of EUV performance through an alternative path.  As I'd said a year ago, if you could do this for only $50M per machine, it would be hugely impactful.  

A related news item:  There are claims that a Chinese effort in Shenzhen has a prototype EUV machine now (that fills an entire factory floor, so not exactly compact or cheap).  It will be a fascinating industrial race if multiple players are able to make the capital investments needed to compete in this area.

Matt von HippelWhere Are All These Views Coming From?

It’s been a weird year.

It’s been a weird year for many reasons, of course. But it’s been a particularly weird year for this blog.

To start, let me show you a more normal year, 2024:

Aside from a small uptick in January due to a certain unexpected announcement, this was a pretty typical year. I got 70-80 thousand views from 30-40 thousand unique visitors, spread fairly evenly throughout the year.

Now, take a look at 2025:

Something started happening this Fall. I went from getting 6000 views and 3000 visitors in a typical month, to roughly quintupling those numbers.

And for the life of me, I can’t figure out why.

WordPress, the site that hosts this blog, gives me tools to track where my viewers are coming from, and what they’re seeing.

It gives me a list of “referrers”, the other websites where people click on links to mine. Normally, this shows me where people are coming from: if I came up on a popular blog or reddit post, and people are following a link here. This year’s list, though, looks totally normal. No new site is referring these people to me. Either the site they’re coming from is hidden, or they’re typing in my blog’s address by hand.

Looking at countries tells me a bit more. In a typical year, I get a bit under half of my views from the US, and the rest from a smattering of other English-speaking or European countries. This year, here’s what those stats look like:

So that tells me something. The new views appear to be coming from China. And what are these new viewers reading?

This year, my top post is a post from 2021, Reality as an Algebra of Observables. It wasn’t particularly popular when it came out, and while I liked the idea behind it, I don’t think I wrote it all that well. It’s not something that suddenly became relevant to the news, or to pop culture. It just suddenly started getting more and more and more views, this Fall:

In second place, a post about the 2022 Nobel Prize follows the same pattern. The pattern continues for a bit, but eventually the posts’ views get more uniform. My post France for Non-EU Spouses of EU Citizens, for example, has no weird pattern of increasing views: it’s just popular.

So far, this is weird. It gets weirder.

On a lark, I decided to look at the day-by-day statistics, rather than month-by-month. And before the growth really starts to show, I noticed something very strange.

In August, I had a huge number of views on August 1, a third of the month in one day. I had a new post out that day, but that post isn’t the one that gets the most views. Instead…it’s Reality as an Algebra of Observables.

That huge peak is a bit different from the later growth, though. It only shows in views, not in number of visitors. And it’s from the US, not China.

September, in comparison, looks normal. October looks like August, with a huge peak on October 3. This time, most of the views are still from the US, but a decent number are from China, and the visitors number is also higher.

In November, a few days into the month, a new pattern kicks in:

Now, visitors and views are almost equal, as if each visitor shows up, looks at precisely one post, and leaves. The views are overwhelmingly from China, with 27 thousand out of 32 thousand views. And the most popular post, more popular even than my conveniently named 4gravitons.com homepage that usually tops the ratings…is Reality as an Algebra of Observables.

I don’t know what’s going on here, and I welcome speculation. Is this some extremely strange bot, accessing one unremarkable post of mine from a huge number of Chinese IP addresses? Or are there actual people reading this post? Was it shared on a Chinese social media app that WordPress can’t track? Maybe it’s part of a course?

For a while, I’d thought that if I somehow managed to get a lot more views, I could consider monetizing in some way, like opening a Patreon. History blogger Bret Devereaux gets around 140 thousand views on his top posts, and makes about three-quarters of his income from Patreon. If I could get a post a tenth as popular as his, maybe I could start making a little money from this blog?

The thing is, I can only do that if I have some idea of who’s viewing the blog, and what they want. And I don’t know why they want Reality as an Algebra of Observables.

January 01, 2026

John BaezThe Mathematics of Tuning Systems

I’m giving a talk on the math of tuning systems at Claremont McKenna College on January 30th at 11 am. If you’re around, please come! You can read my slides here:

The mathematics of tuning systems.

But my slides don’t contain most of what I’ll write here… the stuff I’ll say out loud in my talk.

If you look at a piano keyboard you’ll see groups of 2 black notes alternating with groups of 3. So the pattern repeats after 5 black notes, but if you count you’ll see there are also 7 white notes in this repetitive pattern. So: the pattern repeats every 12 notes.

Some people who never play the piano claim it would be easier if it had all white keys, or simply white alternating with black. But in fact the pattern makes it easier to keep track of where you are – and it’s not arbitrary, it’s musically significant.

For one thing, the white notes give a 7-note scale all their own. Most very simple songs use only this scale! The black notes also form a useful scale. And the white and black notes together form a 12-tone scale.

Starting at any note and going up 12 notes, we reach a note whose frequency is almost exactly double the one we started with. Other spacings correspond to other frequency ratios.

I don’t want to overwhelm you with numbers. So I’m only showing you a few of the simplest and most important ratios. These are really worth remembering.

We give the notes letter names. This goes back at least to Boethius, the guy famous for writing The Consolation of Philosophy before he was tortured and killed at the order of Theodoric the Great. (Yeah, “Great”.) Boethius was a counselor to Theodoric, but he really would have done better to stay out of politics – he was quite good at math and music theory.

Boethius may be the reason the lowest note on the piano is called A. We name the white notes as shown in the picture: seven white notes A, B, C, D, E, F, G, and then the names repeat.

So the scale used to start at A, using only white notes. But due to the irregular spacing of white notes, a scale of all white notes sounds different depending on where you start. Starting at A gives you the ‘minor scale’, which sounds kinda sad. Now we often start at C, since that gives us the scale most people like best: the ‘major’ scale.

(Good musicians start wherever they want, and get different sounds that way. But ‘C major’ is like the vanilla ice cream of scales—now. It wasn’t always this way.)

From the late 1100s to about 1600 people called pitches that lie outside the 7-tone system ‘musica ficta’: ‘false’ or ‘fictitious’ notes. But gradually these notes—the black keys on the piano when you’re playing in C major—became more accepted.

To keep things simple for mathematicians, I’ll usually denote these with the ‘flat’ symbol, ♭. For example, G♭ is the black note one down from the white note G.

(Musicians really need both flats and sharps, and they’d also call G♭ something else: F♯. I’ll actually need both G♭ and F♯ at some points in this talk!)

Since starting the scale with the letter C takes a little practice, I’ll do it a different way that mathematicians may like better. I’ll start with 1 and count up. Musicians put little hats on these numbers, and I’ll do that.

For example, we’ll call the fifth white note up the scale the ‘fifth’ and write it as \hat{5}.

Now for the math of tuning systems!

The big question is: how do we choose the frequency of each note? This is literally how many times per second the air vibrates, when we play that note.

Since 1850, by far the most common method for tuning keyboards has been ’12-tone equal temperament’. Here we divide each octave into 12 equal parts.

What do I mean by this, exactly? I mean that each note on the piano produces a sound that vibrates faster than the note directly below it by a factor of the 12th root of 2.

But we can contemplate ‘N-tone equal temperament’ for N = 1, 2, 3, … – and some people do use these other tuning systems!

Here’s a picture of the most popular modern tuning system: 12-tone equal temperament. As we march around clockwise, each note has a frequency of 2^{1/12} times the note directly before it.

When we go all the way around the circle, we’ve gone up an octave. That is, we’ve reached a frequency that’s twice the one we started with.

But a note that’s an octave higher sounds ‘the same, only higher’. So in a funny way we’re back where we started.

But now for a big question: why do we use a scale with 12 notes?

To start answering, notice that we actually use three scales: one with 5 notes (the black keys), one with 7 (the white keys) and one with 12 (all the keys).

As mathematicians we can detect a highly nonobvious pattern here.

What’s so good about scales with 5, 7 or 12 notes?

A crucial clue seems to be the ‘fifth’. If you go up to the fifth white note here, its frequency is about 3/2 times the first. This is one of the simplest fractions, and it sounds incredibly simple and pure. So it’s important. It’s a dominant force in western music.

We can make a chart to see how close an approximation to the fraction 3/2 we get in a scale with N equally spaced notes.

N = 5 does better than any scale with fewer notes!

N = 7 does better than any scale with fewer notes!

N = 12 does better than any scale with fewer notes! And it does much better. To beat it, we have to go all the way up to N = 29—and even that is only slightly better.

Here’s a chart of how close we can get to a frequency ratio of 3/2 using N-tone equal temperament.

See how great 12-tone equal temperament is?

There are also some neat patterns. See the stripes of even numbers and stripes of odd numbers? That’s not a coincidence. For more charts like this, and much more cool stuff along these lines, go here.
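Here is a little sketch of the computation behind that chart (my code, not from the slides): for each N, take the step of N-tone equal temperament closest to a just perfect fifth and measure the error in cents, printing every N that beats all smaller ones.

```python
import math

target = math.log2(3 / 2)              # a just fifth, as a fraction of an octave
best_so_far = float("inf")
for N in range(1, 60):
    k = round(N * target)              # the nearest step of N-tone equal temperament
    error_cents = abs(k / N - target) * 1200
    if error_cents < best_so_far:      # N sets a new record
        best_so_far = error_cents
        print(f"N = {N:2d}: off by {error_cents:6.2f} cents")
```

Among the record-setters you’ll see 5, 7 and 12, with a big drop in the error at N = 12 (about 2 cents), and nothing better until N = 29.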

Here’s the ‘star of fifths’ in 12-tone equal temperament!

12-tone equal temperament has been the most popular tuning system since maybe 1810, and definitely since 1850. But it’s mathematically the most boring of the tuning systems that have dominated Western music since the Middle Ages. Now let’s go back much earlier, to Pythagorean tuning.

When you chop the octave into 12 equal parts, the frequency ratios of all your notes are irrational numbers… except when you go up or down some number of octaves.

The Pythagoreans disliked irrational numbers. People even say they drowned Hippasus at sea after he proved that the square root of 2 is irrational! That’s just a myth, but it illustrates how people connected Pythagoras to a love of rational numbers. In Pythagorean tuning, people wanted a lot of frequency ratios of 3/2.

In equal temperament, where we chop the octave into 12 equal parts, when we start at any note and go up 7 of these parts (a so-called ‘fifth’), we reach a note that vibrates about 1.4983 times as fast. That’s close enough to 3/2 for most ears. But it’s not the Pythagorean ideal!

As we’ll see, seeking the Pythagorean ideal causes trouble. It will unleash the devil in music.

Start at some note and keep multiplying the frequency by 3/2, like a good Pythagorean. After doing this 12 times, you reach a note that’s close to 7 octaves higher. But not exactly, since the 12th power of 3/2 is

129.746338

which is a bit more than

2^7 = 128

The ratio of these two is called the ‘Pythagorean comma’:

p = (3/2)^{12} / 2^7 = 3^{12} / 2^{19} ≈ 1.0136

This is like an unavoidable lump in the carpet when you use Pythagorean tuning.

It’s good to stick the lump in your carpet under your couch. And it’s good to stick the Pythagorean comma near the so-called ‘tritone’—a very dissonant note that you’d tend to avoid in medieval music. This note is halfway around the circle of fifths.

In Pythagorean tuning, going 6 steps clockwise around the circle of fifths doesn’t give you the same note as going 6 steps counterclockwise! We call one of them ♭5 and the other ♯4.

Their frequency ratio is the Pythagorean comma!

In equal temperament, the tritone is exactly halfway up the octave: 6 notes up. Since going up an octave doubles the frequency, going up a tritone multiplies the frequency by √2. It’s no coincidence that this is the irrational number that got Hippasus in trouble.

In Pythagorean tuning, going 6 steps up the scale doesn’t match jumping up an octave and then going 6 steps down. We call one of them ♭5 and the other ♯4. They’re both decent approximations to √2, built from powers of 2 and 3.

Their frequency ratio is the Pythagorean comma!

The tritone is sometimes called ‘diabolus in musica’: the devil in music. Some say this interval was actually banned by the Catholic church! But that’s another myth.

It could have gotten its name because it sounds so dissonant—but mathematically, the ‘devil’ here is that the square root of 2 is irrational. If we’re trying to use only numbers built from powers of 2 and 3, we have to arbitrarily choose one to approximate √2.

In Pythagorean tuning we can choose either

1024/729 ≈ 1.4047

called the flatted fifth, ♭5, or

729/512 ≈ 1.4238

called the sharped fourth, ♯4, to be our tritone. In this chart I’ve chosen the ♭5.

No matter which you choose, one fifth in the circle of fifths will be noticeably smaller than the rest. It’s called the ‘wolf fifth’ because it howls like a wolf.
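To put a number on ‘noticeably smaller’ (my arithmetic, not from the slides): if 11 of the 12 fifths are pure 3/2’s, the leftover fifth has to close up the 7 octaves, so its frequency ratio is

\displaystyle{ \frac{2^7}{(3/2)^{11}} = \frac{2^{18}}{3^{11}} \approx 1.4798 }

which is flat of a pure fifth by exactly one Pythagorean comma: roughly 678.5 cents instead of 702.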

You can hear a wolf fifth here:

If you’re playing medieval music, you can easily avoid the wolf fifth: just don’t play one of the two fifths that contains the tritone!

A more practical problem concerns the ‘third’: the third white note in the scale. Ideally this vibrates 5/4 as fast as the first. But in Pythagorean tuning it vibrates 81/64 times as fast. That’s annoyingly high!

Sure, 81/64 is a rational number. But it’s not the really simple rational number our ears are hoping for when we hear a third.

Indeed, Pythagorean tuning punishes the ear with some very complicated fractions. The first, fourth, fifth and octave are great. But the rest of the notes are not. There’s no way that 243/128 sounds better than an irrational number!

In the 1300s, when thirds were becoming more important in music, theorists embraced a beautiful solution to this problem, called ‘just intonation’. Now let’s talk about that.

It’s an amazing fact that in western composed music, harmony became important only around 1200 AD, when Perotin expanded the brand new use of two-part harmony to four-part harmony.

This put pressure on musicians to use a new tuning system—or rather, to revive an old tuning system. It’s often called ‘just intonation’ (though experts will find that vague). We can get it using a cool trick, though I doubt this is how it was originally discovered.

First, draw a hexagonal grid of notes. Put a note with frequency 1 in the middle. Label the other notes by saying that moving one step to the right multiplies the frequency of your note by 3/2, while going up and to the right multiplies it by 5/4.

Next, cut out a portion of the grid to use for our scale. We use this particular parallelogram—you’ll soon see what’s so great about it.

Now, multiply each number in our parallelogram by whatever power of 2 it takes to get a number between 1 and 2.

We do this because we want frequencies that lie within an octave, to be notes in a scale. Remember: if 1 is the note we started with, 2 is the note an octave up.
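Here is a minimal sketch of this reduction step (my own code, with my own choice of a 3-by-4 block of the grid, which may not sit exactly where the parallelogram in the slides does): label each grid point by how many fifths and major thirds it is from the starting note, then multiply by a power of 2 to land in the octave [1, 2).

```python
from fractions import Fraction

def octave_reduce(r: Fraction) -> Fraction:
    """Multiply or divide by 2 until the ratio lies in [1, 2)."""
    while r >= 2:
        r /= 2
    while r < 1:
        r *= 2
    return r

# One 3-by-4 block of the grid: 'fifths' steps of 3/2 and 'thirds' steps of 5/4.
notes = sorted(
    octave_reduce(Fraction(3, 2) ** fifths * Fraction(5, 4) ** thirds)
    for fifths in (-1, 0, 1, 2)
    for thirds in (-1, 0, 1)
)
print([str(r) for r in notes])
# ['1', '16/15', '9/8', '6/5', '5/4', '4/3', '45/32', '3/2', '8/5', '5/3', '9/5', '15/8']
```

Twelve ratios pop out, including the 9/8 second, the 15/8 seventh, and one of the four tritone candidates (45/32) discussed below.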

Now we want to curl up our parallelogram to get a torus. If we do this, gluing together opposite edges, there will be exactly 12 numbers on our torus—just right for a scale! This is a remarkable coincidence.

There’s a problem: the numbers at the corners are not all equal. But they’re pretty close! And they’re close to √2: the frequency of the tritone, the ‘devil in music’.

25/18 = 1.3888…
45/32 = 1.40625
64/45 = 1.4222…
36/25 = 1.44

So we’ll just pick one.

When we curl up our parallelogram to get a torus, there’s also another problem. The numbers along the left edge aren’t equal to the corresponding numbers at the right edge. But they’re close! Each number at right is 81/80 times the corresponding number at left. I’ve drawn red lines connecting them.

So, we just choose one from each of these 4 pairs. We’ve already picked one number for all the corners, so we just need to do this for the remaining 2 pairs.

So, here are the various choices for notes in our scale!

For the tritone we have 4 choices. That’s okay because this note sucks anyway. That is: in Western music from the 1300s, it was considered very dissonant. So there’s no obviously best choice of how it should sound.

For the 2 we have two choices, and for the ♭7 we also have 2 choices. So there’s a total of 16 possible scales here.

Regardless of how we make our choices, we’ll get really nice simple fractions for the 1, ♭3, 3, 4, 5, ♭6, 6, and 8. And that makes this approach, called ‘just intonation’, really great!

(If you like math: notice the ‘up-down symmetry’ in this whole setup. For example the minor second is 16/15, but the reciprocal of that is 15/16, which is the seventh… at least after we double it to get a number between 1 and 2, getting 15/8.)

Here’s a chart of all possible just intonation scales: start at the top and take any route you want to the bottom. There are 16 possible routes.

A single step between notes in a 12-tone scale is called a ‘semitone’, since most white notes are two steps apart. In just intonation the semitones come in 4 different sizes, which is kind of annoying.

Notice that if we choose our route cleverly, we can completely avoid the large diatonic semitone. Or, we can avoid the greater chromatic semitone. But we can’t avoid both. So, we can get a scale with just 3 sizes of semitone, but not fewer.

How should we choose???

This is the most commonly used form of just intonation, I think. It has a few nice features:

1) It has up-down symmetry except right next to the tritone in the middle, where this symmetry is impossible.

2) It uses 9/8 for the second rather than 10/9, which is a bit nicer: a simpler fraction.

3) It completely avoids the large diatonic semitone, which is the largest possible semitone.

These don’t single out this one scale. I’d like to find some nice features that only this one of the 16 possibilities has.

But let’s see what this scale looks like on the keyboard!

Here’s the most common scale in just intonation!

The white notes are perhaps the most important here, since those give the major scale. The fractions here are beautifully simple.

Well, okay: the second (9/8) and seventh (15/8) are not so simple. But that’s to be expected, since these notes are the most dissonant! Of these, the seventh was more important in the music of the 1300s, and even today. It’s called the ‘leading-tone’, because we often play it right near the end of a piece of music, or a passage within a piece of music, and its dissonance leads us up to the octave, with a tremendous sense of relief.

Here’s the really great thing about the white notes in just intonation. They form three groups, each with frequencies in the ratios

1 : 5/4 : 3/2

or in other words,

4 : 5 : 6

This pattern is called a ‘major triad’ and it’s absolutely fundamental to music—perhaps not so much in the 1300s, but certainly as music unfolded later. Major triads became the bread and butter of music, and still are.

The fact that every white note—that is, every note in the 7-note major scale—lies in a mathematically perfect major triad is a gigantic feature in favor of just intonation.

Listen to the difference between some simple chords in just intonation and in equal temperament. You probably won’t hate equal temperament, but you can hear the difference: the equal-tempered chords beat slightly as the notes drift in and out of phase.

But let’s take a final peek at the dark underbelly of just intonation: the tritone. As I mentioned, there are four choices of tritone in just intonation. You can divide them into two pairs that are separated by a ratio of 81/80, or two pairs separated by a ratio of 128/125.

These numbers are fundamental glitches in the fabric of music. They have names! People have been thinking about them at least since Boethius around 500 AD, but probably earlier.

• The ‘syntonic comma’, 81/80, is all about trying to approximate a power of 3 by products of 2’s and 5’s.

• The ‘lesser diesis’, 128/125, is all about trying to approximate powers of 2 by powers of 5.

If these numbers were 1, music would be beautiful in a very simple way. But reality cannot be wished away.

And as we’ll see, these numbers are lurking in the spacing between notes in just intonation—not just near the tritone, but everywhere!

Look! The four kinds of semitone in just intonation are related by the lesser diesis and syntonic comma!

In this chart, adding vectors corresponds to multiplying numbers. For example, the green arrow followed by the red one gives the dark blue one, so

25/24 × 81/80 = 135/128

Or in music terminology: the lesser chromatic semitone times the syntonic comma is the greater chromatic semitone.

And so on.

The parallelogram here is secretly related to the parallelogram we curled up to get the just intonation scale. Think about it! Music holds many mysteries.

Just intonation is great if you’re playing in just one ‘key’, always ending each passage with the note I’ve been calling 1. But when people started trying to ‘change keys’, musicians were pressed into other tuning systems.

This is a long story, which I don’t have time to tell right now. If you’re curious, read my blog articles about it!

For more on Pythagorean tuning, read this.

For more on just intonation, read this series.

For more on quarter-comma meantone tuning, read this series.

For more on well-tempered scales, read this series.

For more on equal temperament, read this series.

It’s sad in a way that this historical development winds up with equal temperament: the most boring of all the systems, which is equally good, and thus equally bad, in every key. But the history of music is not done, and computers make it vastly easier than ever before to explore tuning systems.

December 26, 2025

Matt von HippelFor Newtonmas, One Seventeenth of a New Collider

Individual physicists don’t ask for a lot for Newtonmas. Big collaborations ask for more.

This year, CERN got its Newtonmas gift early: a one billion dollar pledge from a group of philanthropists and foundations, to be spent on their proposed new particle collider.

That may sound like a lot of money (and of course it is), but it’s only a fraction of the 15 billion euros that the collider is estimated to cost. That makes this less a case of private donors saving the project, and more of a nudge, showing governments they can get results for a bit cheaper than they expected.

I do wonder if the donation has also made CERN more bold about their plans, since it was announced shortly after a report from the update process for the European Strategy for Particle Physics, in which the European Strategy Group recommended a backup plan for the collider that is just the same collider with 15% budget cuts. Naturally people started making fun of this immediately.

Credit to @theory_dad on X

There were more serious objections from groups that had proposed more specific backup plans earlier in the process, who are frustrated that their ideas were rejected in favor of a 15% tweak that was not even discussed and seems not to really have been evaluated.

I don’t have any special information about what’s going on behind the scenes, or where this is headed. But I’m amused, and having fun with the parallels this season. I remember writing lists as a kid, trying to take advantage of the once-a-year opportunity to get what seemed almost like a genie’s wish. Whatever my incantations, the unreasonable requests were never fulfilled. Still, I had enough new toys to fill my time, and whet my appetite for the next year.

We’ll see what CERN’s Newtonmas gift brings.

December 22, 2025

Terence TaoThe story of Erdős problem #1026

Problem 1026 on the Erdős problem web site recently got solved through an interesting combination of existing literature, online collaboration, and AI tools. The purpose of this blog post is to try to tell the story of this collaboration, and also to supply a complete proof.

The original problem of Erdős, posed in 1975, is rather ambiguous. Erdős starts by recalling his famous theorem with Szekeres that says that given a sequence of {k^2+1} distinct real numbers, one can find a subsequence of length {k+1} which is either increasing or decreasing; and that one cannot improve the {k^2+1} to {k^2}, by considering for instance a sequence of {k} blocks of length {k}, with the numbers in each block decreasing, but the blocks themselves increasing. He also noted a result of Hanani that every sequence of length {k(k+3)/2} can be decomposed into the union of {k} monotone sequences. He then wrote “As far as I know the following question is not yet settled. Let {x_1,\dots,x_n} be a sequence of distinct numbers, determine

\displaystyle  S(x_1,\dots,x_n) = \max \sum_r x_{i_r}

where the maximum is to be taken over all monotonic sequences {x_{i_1},\dots,x_{i_m}}“.

This problem was added to the Erdős problem site on September 12, 2025, with a note that the problem was rather ambiguous. For any fixed {n}, this is an explicit piecewise linear function of the variables {x_1,\dots,x_n} that could be computed by a simple brute force algorithm, but Erdős was presumably seeking optimal bounds for this quantity under some natural constraint on the {x_i}. The day the problem was posted, Desmond Weisenberg proposed studying the quantity {c(n)}, defined as the largest constant such that

\displaystyle  S(x_1,\dots,x_n) \geq c(n) \sum_{i=1}^n x_i

for all choices of (distinct) real numbers {x_1,\dots,x_n}. Desmond noted that for this formulation one could assume without loss of generality that the {x_i} were positive, since deleting negative or vanishing {x_i} does not increase the left-hand side and does not decrease the right-hand side. By a limiting argument one could also allow collisions between the {x_i}, so long as one interpreted monotonicity in the weak sense.

Though not stated on the web site, one can formulate this problem in game theoretic terms. Suppose that Alice has a stack of {N} coins for some large {N}. She divides the coins into {n} piles consisting of {x_1,\dots,x_n} coins each, so that {\sum_{i=1}^n x_i = N}. She then passes the piles to Bob, who is allowed to select a monotone subsequence of the piles (in the weak sense) and keep all the coins in those piles. What is the largest fraction {c(n)} of the coins that Bob can guarantee to keep, regardless of how Alice divides up the coins? (One can work with either a discrete version of this problem where the {x_i} are integers, or a continuous one where the coins can be split fractionally, but in the limit {N \rightarrow \infty} the problems can easily be seen to be equivalent.)

AI-generated images continue to be problematic for a number of reasons, but here is one such image that somewhat manages at least to convey the idea of the game:

For small {n}, one can work out {c(n)} by hand. For {n=1}, clearly {c(1)=1}: Alice has to put all the coins into one pile, which Bob simply takes. Similarly {c(2)=1}: regardless of how Alice divides the coins into two piles, the piles will either be increasing or decreasing, so in either case Bob can take both. The first interesting case is {n=3}. Bob can again always take the two largest piles, guaranteeing himself {2/3} of the coins. On the other hand, if Alice almost divides the coins evenly, for instance into piles {((1/3 + \varepsilon)N, (1/3-2\varepsilon) N, (1/3+\varepsilon)N)} for some small {\varepsilon>0}, then Bob cannot take all three piles as they are non-monotone, and so can only take two of them, allowing Alice to limit the payout fraction to be arbitrarily close to {2/3}. So we conclude that {c(3)=2/3}.
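
For concreteness, here is a minimal Python sketch (illustrative only, not part of the original discussion) of the standard quadratic-time dynamic program that computes Bob's optimal payout {S(x_1,\dots,x_n)}; running it on Alice's near-even three-pile split recovers the {2/3} bound numerically.

    def S(x):
        """Maximum sum of a weakly monotone subsequence of x (O(n^2) dynamic program)."""
        inc = [0.0] * len(x)  # inc[i]: best sum of a weakly increasing subsequence ending at x[i]
        dec = [0.0] * len(x)  # dec[i]: best sum of a weakly decreasing subsequence ending at x[i]
        for i, xi in enumerate(x):
            inc[i] = xi + max((inc[j] for j in range(i) if x[j] <= xi), default=0.0)
            dec[i] = xi + max((dec[j] for j in range(i) if x[j] >= xi), default=0.0)
        return max(inc + dec)

    # Alice's near-even split for n = 3: Bob can only keep two of the three piles.
    eps = 1e-6
    x = [1/3 + eps, 1/3 - 2*eps, 1/3 + eps]
    print(S(x) / sum(x))  # approximately 2/3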

An hour after Desmond’s comment, Stijn Cambie noted (though not in the language I used above) that a similar construction to the one above, in which Alice divides the coins into {k^2} piles that are almost even, in such a way that the longest monotone sequence is of length {k}, gives the upper bound {c(k^2) \leq 1/k}. It is also easy to see that {c(n)} is a non-increasing function of {n}, so this gives a general bound {c(n) \leq (1+o(1))/\sqrt{n}}. Less than an hour after that, Wouter van Doorn noted that the Hanani result mentioned above gives the lower bound {c(n) \geq (\frac{1}{\sqrt{2}}-o(1))/\sqrt{n}}, and posed the problem of determining the asymptotic limit of {\sqrt{n} c(n)} as {n \rightarrow \infty}, given that this was now known to range between {1/\sqrt{2}-o(1)} and {1+o(1)}. This version was accepted by Thomas Bloom, the moderator of the Erdős problem site, as a valid interpretation of the original problem.

The next day, Stijn computed the first few values of {c(n)} exactly:

\displaystyle  1, 1, 2/3, 1/2, 1/2, 3/7, 2/5, 3/8, 1/3.

While the general pattern was not yet clear, this was enough data for Stijn to conjecture that {c(k^2)=1/k}, which would also imply that {\sqrt{n} c(n) \rightarrow 1} as {n \rightarrow \infty}. (EDIT: as later located by an AI deep research tool, this conjecture was also made in Section 12 of this 1980 article of Steele.) Stijn also described the extremizing sequences for this range of {n}, but did not continue the calculation further (a naive computation would take runtime exponential in {n}, due to the large number of possible subsequences to consider).

The problem then lay dormant for almost two months, until December 7, 2025, when Boris Alexeev, as part of a systematic sweep of the Erdős problems using the AI tool Aristotle, was able to get this tool to autonomously prove the conjecture {c(k^2)=1/k} in the proof assistant language Lean. The proof converted the problem to a rectangle-packing problem.

This was one further addition to a recent sequence of examples where an Erdős problem had been automatically solved in one fashion or another by an AI tool. Like the previous cases, the proof turned out to not be particularly novel. Within an hour, Koishi Chan gave an alternate proof deriving the required bound {c(k^2) \geq 1/k} from the original Erdős-Szekeres theorem by a standard “blow-up” argument which we can give here in the Alice-Bob formulation. Take a large {M}, and replace each pile of {x_i} coins with {(1+o(1)) M^2 x_i^2} new piles, each of size {(1+o(1)) x_i}, chosen so that the longest monotone subsequence in this collection is {(1+o(1)) M x_i}. Among all the new piles, the longest monotone subsequence has length {(1+o(1)) M S(x_1,\dots,x_n)}. Applying Erdős-Szekeres, one concludes the bound

\displaystyle  M S(x_1,\dots,x_n) \geq (1-o(1)) (\sum_{i=1}^{k^2} M^2 x_i^2)^{1/2}

and on canceling the {M}‘s, sending {M \rightarrow \infty}, and applying Cauchy-Schwarz, one obtains {c(k^2) \geq 1/k} (in fact the argument gives {c(n) \geq 1/\sqrt{n}} for all {n}).

Once this proof was found, it was natural to try to see if it had already appeared in the literature. AI deep research tools have successfully located such prior literature in the past, but in this case they did not succeed, and a more “old-fashioned” Google Scholar search turned up some relevant references: a 2016 paper by Tidor, Wang and Yang contained this precise result, citing an earlier paper of Wagner as inspiration for applying “blowup” to the Erdős-Szekeres theorem.

But the story does not end there! Upon reading the above story the next day, I realized that the problem of estimating {c(n)} was a suitable task for AlphaEvolve, which I have used recently as mentioned in this previous post. Specifically, one could task it to obtain upper bounds on {c(n)} by directing it to produce real numbers (or integers) {x_1,\dots,x_n} summing up to a fixed sum (I chose {10^6}) with as small a value of {S(x_1,\dots,x_n)} as possible. After an hour of run time, AlphaEvolve produced the following upper bounds on {c(n)} for {1 \leq n \leq 16}, with some intriguingly structured potential extremizing solutions:

The numerical scores (divided by {10^6}) were pretty obviously trying to approximate simple rational numbers. There were a variety of ways (including modern AI) to extract the actual rational numbers they were close to, but I searched for a dedicated tool and found this useful little web page of John Cook that did the job:

\displaystyle  1, 1, 2/3, 1/2, 1/2, 3/7, 2/5, 3/8, 1/3, 1/3,

\displaystyle  4/13, 3/10, 2/7, 3/11, 4/15, 1/4.
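
(For what it's worth, the standard-library route works too: Python's fractions module recovers simple rationals from such decimal scores directly. The decimals below are illustrative stand-ins rather than AlphaEvolve's actual raw output, and the denominator bound is an arbitrary choice.)

    from fractions import Fraction

    # Recover simple rationals from decimal scores of the form S/10^6.
    decimals = [0.666667, 0.428572, 0.307692, 0.285714]   # illustrative values only
    for d in decimals:
        print(Fraction(d).limit_denominator(100))          # 2/3, 3/7, 4/13, 2/7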

I could not immediately see the pattern here, but after some trial and error in which I tried to align numerators and denominators, I eventually organized this sequence into a more suggestive form:

\displaystyle  1,

\displaystyle  1/1, \mathbf{2/3}, 1/2,

\displaystyle  2/4, \mathbf{3/7}, 2/5, \mathbf{3/8}, 2/6,

\displaystyle  3/9, \mathbf{4/13}, 3/10, \mathbf{4/14}, 3/11, \mathbf{4/15}, 3/12.

This gave a somewhat complicated but predictable conjecture for the values of the sequence {c(n)}. On posting this, Boris found a clean formulation of the conjecture, namely that

\displaystyle  c(k^2 + 2a + 1) = \frac{k}{k^2+a} \ \ \ \ \ (1)

whenever {k \geq 1} and {-k \leq a \leq k}. After a bit of effort, he also produced an explicit upper bound construction:

Proposition 1 If {k \geq 1} and {-k \leq a \leq k}, then {c(k^2+2a+1) \leq \frac{k}{k^2+a}}.

Proof: Consider a sequence {x_1,\dots,x_{k^2+2a+1}} of numbers clustered around the “red number” {|a|} and “blue number” {|a+1|}, consisting of {|a|} blocks of {k-|a|} “blue” numbers, followed by {|a+1|} blocks of {|a+1|} “red” numbers, and then {k-|a|} further blocks of {k} “blue” numbers. When {a \geq 0}, one should take all blocks to be slightly decreasing within each block, but the blue blocks should be increasing between each other, and the red blocks should also be increasing between each other. When {a < 0}, all of these orderings should be reversed. The total number of elements is indeed

\displaystyle  |a| \times (k-|a|) + |a+1| \times |a+1| + (k-|a|) \times k

\displaystyle  = k^2 + 2a + 1

and the total sum is close to

\displaystyle |a| \times (k-|a|) \times |a+1| + |a+1| \times |a+1| \times |a|

+ (k-|a|) \times k \times |a+1| = (k^2 + a) |a+1|.

With this setup, one can check that any monotone sequence consists either of at most {|a+1|} red elements and at most {k-|a|} blue elements, or no red elements and at most {k} blue elements, in either case giving a monotone sum that is bounded by either

\displaystyle  |a+1| \times |a| + (k-|a|) \times |a+1| = k |a+1|

or

\displaystyle  0 + k \times |a+1| = k |a+1|,

Dividing this bound of {k|a+1|} by the total sum {(k^2 + a) |a+1|} gives the claim. \Box
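
(As a numerical sanity check, not part of the original thread, one can generate the construction above in the {a \geq 0} case, compute {S} with the dynamic program from the earlier sketch, and compare against the conjectured value {\frac{k}{k^2+a}}; the spacing parameter eps below is an arbitrary small choice.)

    def S(x):
        # maximum sum of a weakly monotone subsequence (O(n^2) dynamic program)
        inc, dec = [0.0] * len(x), [0.0] * len(x)
        for i, xi in enumerate(x):
            inc[i] = xi + max((inc[j] for j in range(i) if x[j] <= xi), default=0.0)
            dec[i] = xi + max((dec[j] for j in range(i) if x[j] >= xi), default=0.0)
        return max(inc + dec)

    def alice(k, a, eps=1e-7):
        # Proposition 1 construction for 0 <= a <= k: "blue" values near a+1, "red" values near a;
        # each block decreases internally, while blue (resp. red) blocks increase among themselves.
        blue = lambda b, size: [(a + 1) + eps * (b * (k + 1) - p) for p in range(size)]
        red  = lambda r, size: [a + eps * (r * (k + 1) - p) for p in range(size)]
        seq = []
        for b in range(a):          # a blue blocks of size k - a
            seq += blue(b, k - a)
        for r in range(a + 1):      # a + 1 red blocks of size a + 1
            seq += red(r, a + 1)
        for b in range(a, k):       # k - a further blue blocks of size k
            seq += blue(b, k)
        return seq

    for k in range(1, 6):
        for a in range(k + 1):
            x = alice(k, a)
            assert len(x) == k * k + 2 * a + 1
            r = S(x) / sum(x)
            print(f"n={len(x):3d}  S/sum={r:.5f}  k/(k^2+a)={k/(k*k+a):.5f}")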

Here is a figure illustrating the above construction in the {a \geq 0} case (obtained after starting with a ChatGPT-provided file and then manually fixing a number of placement issues):

Here is a plot of 1/c(n) (produced by ChatGPT Pro), showing that it is basically a piecewise linear approximation to the square root function:

Shortly afterwards, Lawrence Wu clarified the connection between this problem and a square packing problem, which was also due to Erdős (Problem 106). Let {f(n)} be the least number such that, whenever one packs {n} squares of sidelength {d_1,\dots,d_n} into a square of sidelength {D}, with all sides parallel to the coordinate axes, one has

\displaystyle  \sum_{i=1}^n d_i \leq f(n) D.

Proposition 2 For any {n}, one has

\displaystyle  c(n) \geq \frac{1}{f(n)}.

Proof: Given {x_1,\dots,x_n} and {1 \leq i \leq n}, let {S_i} be the maximal sum over all increasing subsequences ending in {x_i}, and {T_i} be the maximal sum over all decreasing subsequences ending in {x_i}. For {i < j}, we have either {S_j \geq S_i + x_j} (if {x_j \geq x_i}) or {T_j \geq T_i + x_j} (if {x_j \leq x_i}). In particular, the squares {[S_i-x_i, S_i] \times [T_i-x_i, T_i]} and {[S_j-x_j, S_j] \times [T_j-x_j, T_j]}, of sidelengths {x_i} and {x_j} respectively, are disjoint up to boundary. These squares pack into the square {[0, S(x_1,\dots,x_n)]^2}, so by definition of {f}, we have

\displaystyle  \sum_{i=1}^n x_i \leq f(n) S(x_1,\dots,x_n),

and the claim follows. \Box

This idea of using packing to prove Erdős-Szekeres type results goes back to a 1959 paper of Seidenberg, although it was a discrete rectangle-packing argument that was not phrased in such an elegantly geometric form. It is possible that Aristotle was “aware” of the Seidenberg argument via its training data, as it had incorporated a version of this argument in its proof.

Here is an illustration of the above argument using the AlphaEvolve-provided example

\displaystyle[99998, 99997, 116305, 117032, 116304,

\displaystyle 58370, 83179, 117030, 92705, 99080]

for n=10 to convert it to a square packing (image produced by ChatGPT Pro):
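
(Here is a small script, added for illustration, that carries out the bookkeeping of Proposition 2 on this example: it computes the quantities {S_i} and {T_i}, forms the squares with corners {(S_i - x_i, T_i - x_i)} and sidelengths {x_i}, and checks that they are pairwise disjoint and fit inside {[0, S(x_1,\dots,x_{10})]^2}.)

    x = [99998, 99997, 116305, 117032, 116304, 58370, 83179, 117030, 92705, 99080]

    S_end = [0] * len(x)   # S_i: max sum of a weakly increasing subsequence ending at x_i
    T_end = [0] * len(x)   # T_i: max sum of a weakly decreasing subsequence ending at x_i
    for i, xi in enumerate(x):
        S_end[i] = xi + max((S_end[j] for j in range(i) if x[j] <= xi), default=0)
        T_end[i] = xi + max((T_end[j] for j in range(i) if x[j] >= xi), default=0)

    S_total = max(S_end + T_end)
    squares = [(S_end[i] - x[i], T_end[i] - x[i], x[i]) for i in range(len(x))]  # (corner_x, corner_y, side)

    def overlap(p, q):
        (px, py, ps), (qx, qy, qs) = p, q
        return px < qx + qs and qx < px + ps and py < qy + qs and qy < py + ps

    assert all(0 <= cx and 0 <= cy and cx + s <= S_total and cy + s <= S_total for cx, cy, s in squares)
    assert not any(overlap(squares[i], squares[j]) for i in range(len(x)) for j in range(i))
    print(S_total, sum(x) / S_total)   # S(x), and the packing ratio sum(x)/S, a lower bound for f(10)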

At this point, Lawrence performed another AI deep research search, this time successfully locating a paper from just last year by Baek, Koizumi, and Ueoro, where they show that

Theorem 3 For any {k \geq 1}, one has

\displaystyle  f(k^2+1) \leq k

which, when combined with a previous argument of Praton, implies

Theorem 4 For any {k \geq 1} and {c \in {\bf Z}} with {k^2+2c+1 \geq 1}, one has

\displaystyle  f(k^2+2c+1) \leq k + \frac{c}{k}.

This proves the conjecture! Indeed, combining Theorem 4 with Proposition 2 gives the lower bound {c(k^2+2c+1) \geq \frac{k}{k^2+c}}, matching the upper bound from Proposition 1.

There just remained the issue of putting everything together. I did feed all of the above information into a large language model, which was able to produce a coherent proof of (1) assuming the results of Baek-Koizumi-Ueoro and Praton. Of course, LLM outputs are prone to hallucination, so it would be preferable to formalize that argument in Lean, but this looks quite doable with current tools, and I expect this to be accomplished shortly. But I was also able to reproduce the arguments of Baek-Koizumi-Ueoro and Praton, which I include below for completeness.

Proof: (Proof of Theorem 3, adapted from Baek-Koizumi-Ueoro) We can normalize {D=k}. It then suffices to show that if we pack the length {k} torus {({\bf R}/k{\bf Z})^2} by {k^2+1} axis-parallel squares of sidelength {d_1,\dots,d_{k^2+1}}, then

\displaystyle  \sum_{i=1}^{k^2+1} d_i \leq k^2.

Pick {x_0, y_0 \in {\bf R}/k{\bf Z}}. Then we have a {k \times k} grid

\displaystyle  (x_0 + {\bf Z}) \times (y_0 + {\bf Z}) \pmod {k{\bf Z}^2}

inside the torus. The {i^{th}} square, when restricted to this grid, becomes a discrete rectangle {A_{i,x_0} \times B_{i,y_0}} for some finite sets {A_{i,x_0}, B_{i,y_0}}; since the number of points of a unit-spaced progression lying in an interval of length {d_i} is either {\lfloor d_i \rfloor} or {\lfloor d_i \rfloor + 1}, and the square has equal sidelengths, we have

\displaystyle  |\# A_{i,x_0} -\# B_{i,y_0}| \leq 1. \ \ \ \ \ (2)

By the packing condition, we have

\displaystyle  \sum_{i=1}^{k^2+1} \# A_{i,x_0} \# B_{i,y_0} \leq k^2.

From (2) we have

\displaystyle  (\# A_{i,x_0} - 1) (\# B_{i,y_0} - 1) \geq 0

hence

\displaystyle  \# A_{i,x_0} \# B_{i,y_0} \geq \# A_{i,x_0} + \# B_{i,y_0} - 1.

Inserting this bound and rearranging, we conclude that

\displaystyle  \sum_{i=1}^{k^2+1} \# A_{i,x_0} + \sum_{i=1}^{k^2+1} \# B_{i,y_0} \leq 2k^2 + 1.

Taking the supremum over {x_0,y_0} we conclude that

\displaystyle  \sup_{x_0} \sum_{i=1}^{k^2+1} \# A_{i,x_0} + \sup_{y_0} \sum_{i=1}^{k^2+1} \# B_{i,y_0} \leq 2k^2 + 1

so by the pigeonhole principle one of the summands is at most {k^2}. Let’s say it is the former, thus

\displaystyle  \sup_{x_0} \sum_{i=1}^{k^2+1} \# A_{i,x_0} \leq k^2.

In particular, the average value of {\sum_{i=1}^{k^2+1} \# A_{i,x_0}} is at most {k^2}. But this can be computed to be {\sum_{i=1}^{k^2+1} d_i}, giving the claim. Similarly if it is the other sum. \Box
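
(A quick numerical illustration of the final averaging step, with arbitrarily chosen test values: for an arc of length {d} on {{\bf R}/k{\bf Z}}, the average over {x_0} of the number of points of {x_0 + {\bf Z}} landing in the arc is exactly {d}.)

    import random

    k, d, s = 5, 1.7, 0.3      # torus length, arc length, arc start (arbitrary test values)
    trials = 200_000
    count = 0
    for _ in range(trials):
        x0 = random.uniform(0, 1)
        for m in range(k):      # the grid points x0, x0+1, ..., x0+k-1 (mod k)
            if ((x0 + m) - s) % k < d:
                count += 1
    print(count / trials)       # approximately 1.7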

UPDATE: Actually, the above argument also proves Theorem 4 with only minor modifications. Nevertheless, we give the original derivation of Theorem 4 using the embedding argument of Praton below for the sake of completeness.

Proof: (Proof of Theorem 4, adapted from Praton) We write {c = \varepsilon |c|} with {\varepsilon = \pm 1}. We can rescale so that the square one is packing into is {[0,k]^2}. Thus, we pack {k^2+2\varepsilon |c|+1} squares of sidelength {d_1,\dots,d_{k^2+2\varepsilon |c|+1}} into {[0,k]^2}, and our task is to show that

\displaystyle  \sum_{i=1}^{k^2+2\varepsilon|c|+1} d_i \leq k^2 + \varepsilon |c|.

We pick a large natural number {N} (in particular, larger than {k}), and consider the three nested squares

\displaystyle  [0,k]^2 \subset [0,N]^2 \subset [0,N + |c| \frac{N}{N-\varepsilon}]^2.

We can pack {[0,N]^2 \backslash [0,k]^2} by {N^2-k^2} unit squares. We can similarly pack

\displaystyle  [0,N + |c| \frac{N}{N-\varepsilon}]^2 \backslash [0,N]^2

\displaystyle  =[0, \frac{N}{N-\varepsilon} (N+|c|-\varepsilon)]^2 \backslash [0, \frac{N}{N-\varepsilon} (N-\varepsilon)]^2

into {(N+|c|-\varepsilon)^2 - (N-\varepsilon)^2} squares of sidelength {\frac{N}{N-\varepsilon}}. All in all, this produces

\displaystyle  k^2+2\varepsilon |c|+1 + N^2-k^2 + (N+|c|-\varepsilon)^2 - (N-\varepsilon)^2

\displaystyle   = (N+|c|)^2 + 1

squares, of total length

\displaystyle (\sum_{i=1}^{k^2+2\varepsilon |c|+1} d_i) +(N^2-k^2) + ((N+|c|-\varepsilon)^2 - (N-\varepsilon)^2) \frac{N}{N-\varepsilon}.

Applying Theorem 3, we conclude that

\displaystyle (\sum_{i=1}^{k^2+2\varepsilon |c|+1} d_i) +(N^2-k^2)

\displaystyle  + ((N+|c|-\varepsilon)^2 - (N-\varepsilon)^2) \frac{N}{N-\varepsilon} \leq (N+|c|) (N + |c| \frac{N}{N-\varepsilon}).

The right-hand side is

\displaystyle  N^2 + 2|c| N + |c|^2 + \varepsilon |c| + O(1/N)

and the left-hand side similarly evaluates to

\displaystyle (\sum_{i=1}^{k^2+2\varepsilon |c|+1} d_i) + N^2 -k^2 + 2|c| N + |c|^2 + O(1/N)

and so we simplify to

\displaystyle \sum_{i=1}^{k^2+2\varepsilon |c|+1} d_i \leq k^2 + \varepsilon |c| + O(1/N).

Sending {N \rightarrow \infty}, we obtain the claim. \Box
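
(For the record, the {O(1/N)} bookkeeping in the last few displays can be checked symbolically; this snippet, using sympy, verifies that the gap between the two sides of the Theorem 3 inequality, after subtracting {k^2 + \varepsilon |c| - \sum_i d_i}, vanishes as {N \rightarrow \infty}. The symbols c and eps below stand in for {|c|} and {\varepsilon}.)

    import sympy as sp

    N, k, c, eps, Sd = sp.symbols('N k c eps S_d', positive=True)
    rhs = (N + c) * (N + c * N / (N - eps))
    lhs = Sd + (N**2 - k**2) + ((N + c - eps)**2 - (N - eps)**2) * N / (N - eps)
    slack = sp.simplify(rhs - lhs - (k**2 + eps * c - Sd))
    print(slack)                      # simplifies to c*eps**2/(N - eps)
    print(sp.limit(slack, N, sp.oo))  # 0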

One striking feature of this story for me is how important it was to have a diverse set of people, literature, and tools to attack this problem. To be able to state and prove the precise formula for {c(n)} required multiple observations, including some version of the following:

  • The sequence can be numerically computed as a sequence of rational numbers.
  • When appropriately normalized and arranged, visible patterns in this sequence appear that allow one to conjecture the form of the sequence.
  • This problem is a weighted version of the Erdős-Szekeres theorem.
  • Among the many proofs of the Erdős-Szekeres theorem is the proof of Seidenberg in 1959, which can be interpreted as a discrete rectangle packing argument.
  • This problem can be reinterpreted as a continuous square packing problem, and in fact is closely related to (a generalized axis-parallel form of) Erdős problem 106, which concerns such packings.
  • The axis-parallel form of Erdős problem 106 was recently solved by Baek-Koizumi-Ueoro.
  • The paper of Praton shows that Erdős Problem 106 implies the generalized version needed for this problem. This implication specializes to the axis-parallel case.
It was only through the combined efforts of all the contributors and their tools that all these key inputs could be assembled within 48 hours. It seems plausible that a more traditional effort involving just one or two mathematicians and simpler programming and literature search tools might eventually have been able to put all these pieces together, but I believe this process would have taken much longer (on the order of weeks or even months).

Another key ingredient was the balanced AI policy on the Erdős problem website, which encourages disclosed AI usage while strongly discouraging undisclosed use. To quote from that policy: “Comments prepared with the assistance of AI are permitted, provided (a) this is disclosed, (b) the contents (including mathematics, code, numerical data, and the existence of relevant sources) have been carefully checked and verified by the user themselves without the assistance of AI, and (c) the comment is not unreasonably long.”

December 21, 2025

n-Category Café Octonions and the Standard Model (Part 13)

When Lee and Yang suggested that the laws of physics might not be invariant under spatial reflection — that there’s a fundamental difference between left and right — Pauli was skeptical. In a letter to Victor Weisskopf in January 1957, he wrote:

“Ich glaube aber nicht, daß der Herrgott ein schwacher Linkshänder ist.”

(I do not believe that the Lord is a weak left-hander.)

But just two days after Pauli wrote this letter, Chien-Shiung Wu’s experiment confirmed that Lee and Yang were correct. There’s an inherent asymmetry in nature.

We can trace this back to how the ‘left-handed’ fermions and antifermions live in a different representation of the Standard Model gauge group than the right-handed ones. And when we try to build grand unified theories that take this into account, we run into the fact that while we can fit the Standard Model gauge group into Spin(10) in various ways, not all these ways produce the required asymmetry. There’s a way where it fits into Spin(9), which is too symmetrical to work… and alas, this one has a nice octonionic description!

To keep things simple I’ll explain this by focusing, not on the whole Standard Model gauge group, but on its subgroup SU(2)×SU(3). Here is a theorem proved by Will Sawin in response to a question of mine on MathOverflow:

Theorem 10. There are exactly two conjugacy classes of subgroups of Spin(10) that are isomorphic to SU(2)×SU(3). One of them has a representative that is a subgroup of Spin(9) ⊂ Spin(10), while the other does not.

I’ll describe representatives of these two subgroups; then I’ll say a bit about how they show up in physics, and then I’ll show you Sawin’s proof.

We can get both subgroups in a unified way! There’s always an inclusion

\text{SO}(m) \times \text{SO}(n) \to \text{SO}(m+n)

and taking double covers of each group we get a 2-1 homomorphism

\text{Spin}(m) \times \text{Spin}(n) \to \text{Spin}(m+n)

In particular we have

\text{Spin}(4) \times \text{Spin}(6) \to \text{Spin}(10)

so composing with the exceptional isomorphisms:

\text{Spin}(4) \cong \text{SU}(2) \times \text{SU}(2), \qquad \text{Spin}(6) \cong \text{SU}(4)

we get a 2-1 homomorphism

k \colon \text{SU}(2) \times \text{SU}(2) \times \text{SU}(4) \to \text{Spin}(10)

Now, there are three obvious ways to include SU(2)×SU(3) in SU(2)×SU(2)×SU(4). There is an obvious inclusion

j \colon \text{SU}(3) \hookrightarrow \text{SU}(4)

but there are three obvious inclusions

\ell, r, \delta \colon \text{SU}(2) \hookrightarrow \text{SU}(2) \times \text{SU}(2)

namely the left one:

:SU(2) SU(2)×SU(2) g (g,1) \begin{array}{ccc} \ell \colon \text{SU}(2) &\to& \text{SU}(2) \times \text{SU}(2) \\ g & \mapsto & (g,1) \end{array}

the right one:

r:SU(2) SU(2)×SU(2) g (1,g) \begin{array}{ccc} r \colon \text{SU}(2) &\to& \text{SU}(2) \times \text{SU}(2) \\ g & \mapsto & (1,g) \end{array}

and the diagonal one:

δ:SU(2) SU(2)×SU(2) g (g,g) \begin{array}{ccc} \delta \colon \text{SU}(2) &\to& \text{SU}(2) \times \text{SU}(2) \\ g & \mapsto & (g,g) \end{array}

Combining these with our earlier maps, we actually get a one-to-one map from SU(2)×SU(3)\text{SU}(2) \times \text{SU}(3) to Spin(10)\text{Spin}(10). So we get three subgroups of Spin(10)\text{Spin}(10), all isomorphic to SU(2)×SU(3)\text{SU}(2) \times \text{SU}(3):

  • There’s the left subgroup G_ℓ, which is the image of this composite homomorphism:

\text{SU}(2) \times \text{SU}(3) \stackrel{\ell \times j}{\longrightarrow} \text{SU}(2) \times \text{SU}(2) \times \text{SU}(4) \cong \text{Spin}(4) \times \text{Spin}(6) \stackrel{k}{\longrightarrow} \text{Spin}(10)

  • There’s the diagonal subgroup G_δ, which is the image of this:

\text{SU}(2) \times \text{SU}(3) \stackrel{\delta \times j}{\longrightarrow} \text{SU}(2) \times \text{SU}(2) \times \text{SU}(4) \cong \text{Spin}(4) \times \text{Spin}(6) \stackrel{k}{\longrightarrow} \text{Spin}(10)

  • And there’s the right subgroup G_r, which is the image of this:

\text{SU}(2) \times \text{SU}(3) \stackrel{r \times j}{\longrightarrow} \text{SU}(2) \times \text{SU}(2) \times \text{SU}(4) \cong \text{Spin}(4) \times \text{Spin}(6) \stackrel{k}{\longrightarrow} \text{Spin}(10)

The left and right subgroups are actually conjugate, but the diagonal one is truly different! We’ll prove this by taking a certain representation of Spin(10), called the Weyl spinor representation, and restricting it to two of these subgroups. We’ll get inequivalent representations of SU(2)×SU(3). This proves the two subgroups aren’t conjugate.

This argument is also interesting for physics. When we restrict to the left subgroup, we get a representation of SU(2)×SU(3) that matches what we actually see for one generation of fermions! This is the basis of the so-called SO(10) grand unified theory, which should really be called the Spin(10) grand unified theory.

(In fact this works not only for SU(2)×SU(3) but for the whole Standard Model gauge group, which is larger. I’m focusing on SU(2)×SU(3) just because it makes the story simpler.)

When we restrict the Weyl spinor representation to the diagonal subgroup, we get a representation of SU(2)×SU(3) that is not physically correct. Unfortunately, it’s the diagonal subgroup that shows up in several papers connecting the Standard Model gauge group to the octonions. I plan to say a lot more about this later.

The left subgroup

Let’s look at the left subgroup G_ℓ, the image of this composite:

\text{SU}(2) \times \text{SU}(3) \stackrel{\ell \times j}{\longrightarrow} \text{SU}(2) \times \text{SU}(2) \times \text{SU}(4) \cong \text{Spin}(4) \times \text{Spin}(6) \stackrel{k}{\longrightarrow} \text{Spin}(10)

Spin(10) has a 32-dimensional unitary representation called the ‘Dirac spinor’ representation. This representation is really on the exterior algebra Λℂ^5. It’s the direct sum of two irreducible parts, the even grades and the odd grades:

\Lambda \mathbb{C}^5 \cong \Lambda^{\text{even}} \mathbb{C}^5 \oplus \Lambda^{\text{odd}} \mathbb{C}^5

Physicists call these two irreducible representations ‘right- and left-handed Weyl spinors’, and denote them as 16 and 16* since they’re 16-dimensional and one is the dual of the other.

Let’s restrict the 16 to the left subgroup G_ℓ and see what we get.

To do this, first we can restrict the 16 along k and get

\mathbf{2} \otimes \mathbf{1} \otimes \mathbf{4} \; \oplus \; \mathbf{1} \otimes \mathbf{2} \otimes \mathbf{4}\ast

Here 1 is the trivial representation of SU(2), 2 is the tautologous representation of SU(2), and 4 is the tautologous rep of SU(4).

Then let’s finish the job by restricting this representation along ℓ×j. Restricting the 4 of SU(4) to SU(3) gives 3 ⊕ 1: the sum of the tautologous representation of SU(3) and the trivial representation. Restricting 2 ⊗ 1 to the left copy of SU(2) gives the tautologous representation 2, while restricting 1 ⊗ 2 to this left copy gives 1 ⊕ 1: the sum of two copies of the trivial representation. All in all, we get this representation of SU(2)×SU(3):

\mathbf{2} \otimes (\mathbf{3} \oplus \mathbf{1}) \; \oplus \; (\mathbf{1} \oplus \mathbf{1}) \otimes (\mathbf{3}\ast \oplus \mathbf{1})

This is what we actually see for one generation of left-handed fermions and antifermions in the Standard Model! The representation 3 ⊕ 1 describes how the left-handed fermions in one generation transform under SU(3): 3 colors of quark and one ‘white’ lepton. The representation 3* ⊕ 1 does the same for the left-handed antifermions. The left-handed fermions form an isospin doublet, giving us the 2, while the left-handed antifermions have no isospin, giving us the 1 ⊕ 1.
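
As a quick dimension check (not part of the original argument), the two summands indeed account for all 16 dimensions of the Weyl spinor representation:

\dim\big(\mathbf{2} \otimes (\mathbf{3} \oplus \mathbf{1})\big) + \dim\big((\mathbf{1} \oplus \mathbf{1}) \otimes (\mathbf{3}\ast \oplus \mathbf{1})\big) = 2 \cdot 4 + 2 \cdot 4 = 16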

This strange lopsidedness is a fundamental feature of the Standard Model.

The right subgroup would work the same way, up to switching the words ‘left-handed’ and ‘right-handed’. And by Theorem 10, the left and right subgroups must be conjugate in Spin(10), because the subgroup we look at next is not conjugate to either of these.

The diagonal subgroup

Consider the diagonal subgroup G_δ, the image of this composite:

\text{SU}(2) \times \text{SU}(3) \stackrel{\delta \times j}{\longrightarrow} \text{SU}(2) \times \text{SU}(2) \times \text{SU}(4) \cong \text{Spin}(4) \times \text{Spin}(6) \stackrel{k}{\longrightarrow} \text{Spin}(10)

Let’s restrict the 16 to G_δ.

To do this, first let’s restrict the 16 along k: SU(2)×SU(2)×SU(4) → Spin(10) and get

\mathbf{2} \otimes \mathbf{1} \otimes \mathbf{4} \; \oplus \; \mathbf{1} \otimes \mathbf{2} \otimes \mathbf{4}\ast

as before. Then let’s restrict this representation along δ×j. The SU(3) part works as before, but what happens when we restrict 2 ⊗ 1 or 1 ⊗ 2 along the diagonal map δ: SU(2) → SU(2)×SU(2)? We get 2. So, this is the representation of G_δ that we get:

\mathbf{2} \otimes (\mathbf{3} \oplus \mathbf{1}) \; \oplus \; \mathbf{2} \otimes (\mathbf{3}\ast \oplus \mathbf{1})

This is not good for the Standard Model. It describes a more symmetrical universe than ours, where both left-handed fermions and antifermions transform as doublets under SU(2).

The fact that we got a different answer this time proves that G_ℓ and G_δ are not conjugate in Spin(10). So to complete the proof of Theorem 10, we only need to prove

  1. Every subgroup of Spin(10) isomorphic to SU(2)×SU(3) is conjugate to G_ℓ or G_δ.

  2. G_δ is conjugate to a subgroup of Spin(9) ⊂ Spin(10), but G_ℓ is not.

I’ll prove 2, and then I’ll turn you over to Will Sawin to do the rest.

Why the diagonal subgroup fits in Spin(9)

Every rotation of ℝ^n extends to a rotation of ℝ^{n+1} that leaves the last coordinate fixed, so we get an inclusion SO(n) ↪ SO(n+1), which lifts to an inclusion of the double covers, Spin(n) ↪ Spin(n+1). Since we have exceptional isomorphisms

\text{Spin}(3) \cong \text{SU}(2), \qquad \text{Spin}(4) \cong \text{SU}(2) \times \text{SU}(2)

it’s natural to ask how the inclusion Spin(3) ↪ Spin(4) looks in these terms. And the answer is: it’s the diagonal map! In other words, we have a commutative diagram

\begin{array}{ccc} \text{SU}(2) & \xrightarrow{\sim} & \text{Spin}(3) \\ \delta \downarrow & & \downarrow \\ \text{SU}(2) \times \text{SU}(2) & \xrightarrow{\sim} & \text{Spin}(4) \end{array}

Now, we can easily fit this into a larger commutative diagram involving some natural maps Spin(m)×Spin(n) → Spin(m+n) and Spin(n) → Spin(n+1):

\begin{array}{ccccccc} \text{SU}(2) & \xrightarrow{\sim} & \text{Spin}(3) & \to & \text{Spin}(3) \times \text{Spin}(6) & \to & \text{Spin}(9) \\ \delta \downarrow & & \downarrow & & \downarrow & & \downarrow \\ \text{SU}(2) \times \text{SU}(2) & \xrightarrow{\sim} & \text{Spin}(4) & \to & \text{Spin}(4) \times \text{Spin}(6) & \to & \text{Spin}(10) \end{array}

We can simplify this diagram using the isomorphism Spin(6) ≅ SU(4):

\begin{array}{ccc} \text{SU}(2) \times \text{SU}(4) & \to & \text{Spin}(9) \\ \delta \times 1 \downarrow & & \downarrow \\ \text{SU}(2) \times \text{SU}(2) \times \text{SU}(4) & \to & \text{Spin}(10) \end{array}

and then we can use our friend the inclusion j: SU(3) → SU(4):

\begin{array}{ccccc} \text{SU}(2) \times \text{SU}(3) & \xrightarrow{1 \times j} & \text{SU}(2) \times \text{SU}(4) & \to & \text{Spin}(9) \\ & & \delta \times 1 \downarrow & & \downarrow \\ & & \text{SU}(2) \times \text{SU}(2) \times \text{SU}(4) & \to & \text{Spin}(10) \end{array}

This shows that the diagonal subgroup G_δ of Spin(10) is actually a subgroup of Spin(9)!

Why the left subgroup does not fit in Spin(9)

The three-fold way is a coarse classification of the irreducible complex representations of compact Lie groups. Every such representation is of one and only one of these three kinds:

1) not self-dual: not isomorphic to its dual,

2a) orthogonal: isomorphic to its dual via an invariant nondegenerate symmetric bilinear form, also called an orthogonal structure,

2b) symplectic: isomorphic to its dual via an invariant nondegenerate antisymmetric bilinear form, also called a symplectic structure.

I’ve written about how these three cases are related to the division algebras ℂ, ℝ and ℍ, respectively:

A complex representation is orthogonal iff it’s the complexification of a representation on a real vector space, and symplectic iff it’s the underlying complex representation of a representation on a quaternionic vector space.

But we don’t need most of this yet. For now we just need to know one fact: when n is odd, every irreducible representation of Spin(n), and thus every representation of this Lie group, is self-dual: that is, isomorphic to its dual. In particular this is true of Spin(9).

Why does this matter? Assume the left subgroup G_ℓ ⊂ Spin(10) is a subgroup of Spin(9). When we restrict the Weyl spinor representation of Spin(10) to Spin(9) it will be self-dual, like every representation of Spin(9). Then when we restrict this representation further to SU(2)×SU(3) it must still be self-dual, since the restriction of a self-dual representation is clearly self-dual.

However, we know this representation is

\mathbf{2} \otimes (\mathbf{3} \oplus \mathbf{1}) \; \oplus \; (\mathbf{1} \oplus \mathbf{1}) \otimes (\mathbf{3}\ast \oplus \mathbf{1})

and this is not self-dual, since 1* ≅ 1 and 2* ≅ 2 but 3* ≇ 3.

So, it must be that G_ℓ is not a subgroup of Spin(9).

Proof of Theorem 10

To complete the proof of Theorem 10 we just need to see why there are exactly two conjugacy classes of subgroups of Spin(10) isomorphic to SU(2)×SU(3). But in fact Will Sawin proved a stronger result! He was answering this question of mine:

Define the Standard Model gauge group to be S(U(2)×U(3)), the subgroup of SU(5) consisting of block diagonal matrices with a 2×2 block and then a 3×3 block. (This is isomorphic to the quotient of U(1)×SU(2)×SU(3) by the subgroup of elements (α, α^{-3}, α^2) where α is a 6th root of unity.)

Up to conjugacy, how many subgroups isomorphic to the Standard Model gauge group does Spin(10) have?

This question is relevant to grand unified theories of particle physics, as explained here:

This paper focuses on one particular copy of S(U(2)×U(3)) in Spin(10), given as follows. By definition we have an inclusion S(U(2)×U(3)) ↪ SU(5), and we also have an inclusion SU(5) ↪ Spin(10), because for any n we have an inclusion SU(n) ↪ SO(2n), and SU(n) is simply connected so this gives a homomorphism SU(n) ↪ Spin(2n).

However I think there is also an inclusion S(U(2)×U(3)) ↪ Spin(9), studied by Krasnov:

Composing this with Spin(9) ↪ Spin(10), this should give another inclusion S(U(2)×U(3)) ↪ Spin(10), and I believe this one is ‘truly different from’ — i.e., not conjugate to — the first one I mentioned.

So I believe my current answer to my question is “at least two”. But that’s not good enough.

Sawin’s answer relies heavily on the 3-fold way — that’s why I told you that stuff about orthogonal and symplectic representations. When we embed the group SU(2)×SU(3) in Spin(10), we are automatically giving this group an orthogonal 10-dimensional representation, thanks to the map Spin(10) → SO(10). We can classify the possibilities.

He writes:

There are infinitely many embeddings. However, all but one of them is “essentially the same as” the one you studied, as they become equal to the one you studied on restriction to SU(2)×SU(3). The remaining one is the one studied by Krasnov.

I follow the strategy suggested by Kenta Suzuki.

SU(3) has irreducible representations of dimensions 1, 3, 3, 6, 8, 6, 10, 10, and higher dimensions. The 10-dimensional ones are dual to each other, as are the 6-dimensional ones, so they can’t appear. The 3-dimensional ones are dual to each other and can only appear together. So the only 10-dimensional self-dual representations of SU(3) decompose as irreducibles as 8+1+1, 3+3+1+1+1+1, or ten 1s. All of these are orthogonal because the 8-dimensional representation is orthogonal. However, the ten 1s cannot appear because then SU(3) would act trivially.

A representation of SU(3)×SU(2) is a sum of tensor products of irreducible representations of SU(3) and irreducible representations of SU(2). Restricted to SU(3), each tensor product splits into a sum of copies of the same irreducible representation. So SU(2) can only act nontrivially when the same representation appears multiple times. Since the 3+3 is two different 3-dimensional representations, only the 1-dimensional representation can occur twice. Thus, our 10-dimensional orthogonal representation of SU(3)×SU(2) necessarily splits as either the 8-dimensional adjoint representation of SU(3) plus a 2-dimensional orthogonal representation of SU(2), or the 6-dimensional sum of standard and conjugate [i.e., dual] representations of SU(3) plus a 4-dimensional orthogonal representation of SU(2). However, SU(2) has a unique nontrivial representation of dimension 2 and it isn’t orthogonal, so only the second case can appear. SU(2) has representations of dimension 1, 2, 3, 4, of which the 2- and 4-dimensional ones are symplectic and so must appear with even multiplicity in any orthogonal representation, so the only nontrivial 4-dimensional orthogonal ones are 2+2 or 3+1.

So there are two ten-dimensional orthogonal representations of SU(2)×SU(3) that are nontrivial on both factors, those being the sum of two different 3-dimensional irreducible representations of SU(3) with either two copies of the two-dimensional irreducible representation of SU(2) or the three-dimensional and the one-dimensional irreducible representation of SU(2). The orthogonal structure is unique up to isomorphisms, so these give two conjugacy classes of homomorphisms SU(2)×SU(3) → SO(10) and thus two conjugacy classes of homomorphisms SU(2)×SU(3) → Spin(10). The first one corresponds to the embedding you studied, while only the second one restricts to Spin(9), so indeed these are different.

To understand how to extend these to S(U(2)×U(3)), I consider the centralizer of the representation within Spin(10). Since the group is connected, this is the same as the centralizer of its Lie algebra, which is therefore the inverse image of the centralizer in SO(10). Now there is a distinction between the two examples because the example with irrep dimensions 3+3+2+2 has centralizer with identity component U(1)×SU(2) while the example with irrep dimensions 3+3+3+1 has centralizer with identity component U(1). In the second case, the image of U(2)×U(3) must be the image of SU(2)×SU(3) times the centralizer of the image of SU(2)×SU(3), so this gives a unique example, which must be the one considered by Krasnov.

In the first case, we can restrict attention to a torus U(1)×U(1) in SU(2)×SU(2). The center of S(U(2)×U(3)) maps to a one-dimensional subgroup of this torus, which can be described by a pair of integers. Explicitly, given a two-by-two unitary matrix A and a three-by-three unitary matrix B with det(A)det(B) = 1, we can map to U(5) by sending (A,B) to Aγ^a ⊕ Bγ^b where γ = det(A) = det(B)^{-1}, and then map from U(5) to SO(10). This lifts to the spin group if and only if the determinant in U(5) is a perfect square. The determinant is γ^{1+2a-1+3b} = γ^{2a+3b}, so a lift exists if and only if b is even.

The only possible kernel of this embedding is the scalars. The scalar A = λ^3 I_2, B = λ^{-2} I_3 maps to λ^{3+6a} I_2 ⊕ λ^{-2+6b} I_3, and so the kernel is trivial if and only if gcd(3+6a, -2+6b) = 1.

However, there are infinitely many integer solutions a, b to gcd(3+6a, -2+6b) = 1 with b even (in fact, a random a and even b works with probability 9/π^2), so this gives infinitely many examples.


  • Part 1. How to define octonion multiplication using complex scalars and vectors, much as quaternion multiplication can be defined using real scalars and vectors. This description requires singling out a specific unit imaginary octonion, and it shows that octonion multiplication is invariant under SU(3).
  • Part 2. A more polished way to think about octonion multiplication in terms of complex scalars and vectors, and a similar-looking way to describe it using the cross product in 7 dimensions.
  • Part 3. How a lepton and a quark fit together into an octonion — at least if we only consider them as representations of SU(3), the gauge group of the strong force. Proof that the symmetries of the octonions fixing an imaginary octonion form precisely the group SU(3).
  • Part 4. Introducing the exceptional Jordan algebra 𝔥_3(𝕆): the 3×3 self-adjoint octonionic matrices. A result of Dubois-Violette and Todorov: the symmetries of the exceptional Jordan algebra preserving their splitting into complex scalar and vector parts and preserving a copy of the 2×2 self-adjoint octonionic matrices form precisely the Standard Model gauge group.
  • Part 5. How to think of 2×2 self-adjoint octonionic matrices as vectors in 10d Minkowski spacetime, and pairs of octonions as left- or right-handed spinors.
  • Part 6. The linear transformations of the exceptional Jordan algebra that preserve the determinant form the exceptional Lie group E_6. How to compute this determinant in terms of 10-dimensional spacetime geometry: that is, scalars, vectors and left-handed spinors in 10d Minkowski spacetime.
  • Part 7. How to describe the Lie group E_6 using 10-dimensional spacetime geometry. This group is built from the double cover of the Lorentz group, left-handed and right-handed spinors, and scalars in 10d Minkowski spacetime.
  • Part 8. A geometrical way to see how E_6 is connected to 10d spacetime, based on the octonionic projective plane.
  • Part 9. Duality in projective plane geometry, and how it lets us break the Lie group E_6 into the Lorentz group, left-handed and right-handed spinors, and scalars in 10d Minkowski spacetime.
  • Part 10. Jordan algebras, their symmetry groups, their invariant structures — and how they connect quantum mechanics, special relativity and projective geometry.
  • Part 11. Particle physics on the spacetime given by the exceptional Jordan algebra: a summary of work with Greg Egan and John Huerta.
  • Part 12. The bioctonionic projective plane and its connections to algebra, geometry and physics.
  • Part 13. Two ways to embed SU(2)×SU(3) in Spin(10), and their consequences for particle physics.

December 15, 2025

John PreskillMake use of time, let not advantage slip

During the spring of 2022, I felt as though I kept dashing backward and forward in time. 

At the beginning of the season, hay fever plagued me in Maryland. Then, I left to present talks in southern California. There—closer to the equator—rose season had peaked, and wisteria petals covered the ground near Caltech’s physics building. From California, I flew to Canada to present a colloquium. Time rewound as I traveled northward; allergies struck again. After I returned to Maryland, the spring ripened almost into summer. But the calendar backtracked when I flew to Sweden: tulips and lilacs surrounded me again.

Caltech wisteria in April 2022: Thou art lovely and temperate.

The zigzagging through horticultural time disoriented my nose, but I couldn’t complain: it echoed the quantum information processing that collaborators and I would propose that summer. We showed how to improve quantum metrology—our ability to measure things, using quantum detectors—by simulating closed timelike curves.

Swedish wildflowers in June 2022

A closed timelike curve is a trajectory that loops back on itself in spacetime. If on such a trajectory, you’ll advance forward in time, reverse chronological direction to advance backward, and then reverse again. Author Jasper Fforde illustrates closed timelike curves in his novel The Eyre Affair. A character named Colonel Next buys an edition of Shakespeare’s works, travels to the Elizabethan era, bestows them on a Brit called Will, and then returns to his family. Will copies out the plays and stages them. His colleagues publish the plays after his death, and other editions ensue. Centuries later, Colonel Next purchases one of those editions to take to the Elizabethan era.1 

Closed timelike curves can exist according to Einstein’s general theory of relativity. But do they exist? Nobody knows. Many physicists expect not. But a quantum system can simulate a closed timelike curve, undergoing a process modeled by the same mathematics.

How can one formulate closed timelike curves in quantum theory? Oxford physicist David Deutsch proposed one formulation; a team led by MIT’s Seth Lloyd proposed another. Correlations distinguish the proposals. 

Two entities share correlations if a change in one entity tracks a change in the other. Two classical systems can correlate; for example, your brain is correlated with mine, now that you’ve read writing I’ve produced. Quantum systems can correlate more strongly than classical systems can, as by entangling.

Suppose Colonel Next correlates two nuclei and gives one to his daughter before embarking on his closed timelike curve. Once he completes the loop, what relationship does Colonel Next’s nucleus share with his daughter’s? The nuclei retain the correlations they shared before Colonel Next entered the loop, according to Seth and collaborators. When referring to closed timelike curves from now on, I’ll mean ones of Seth’s sort.

Toronto hadn’t bloomed by May 2022.

We can simulate closed timelike curves by subjecting a quantum system to a circuit of the type illustrated below. We read the diagram from bottom to top. Along this direction, time—as measured by a clock at rest with respect to the laboratory—progresses. Each vertical wire represents a qubit—a basic unit of quantum information, encoded in an atom or a photon or the like. Each horizontal slice of the diagram represents one instant. 

At the bottom of the diagram, the two vertical wires sprout from one curved wire. This feature signifies that the experimentalist prepares the qubits in an entangled state, represented by the symbol | \Psi_- \rangle. Farther up, the left-hand wire runs through a box. The box signifies that the corresponding qubit undergoes a transformation (for experts: a unitary evolution). 

At the top of the diagram, the vertical wires fuse again: the experimentalist measures whether the qubits are in the state they began in. The measurement is probabilistic; we (typically) can’t predict the outcome in advance, due to the uncertainty inherent in quantum physics. If the measurement yields the yes outcome, the experimentalist has simulated a closed timelike curve. If the no outcome results, the experimentalist should scrap the trial and try again.

So much for interpreting the diagram above as a quantum circuit. We can reinterpret the illustration as a closed timelike curve. You’ve probably guessed as much, comparing the circuit diagram to the depiction, farther above, of Colonel Next’s journey. According to the second interpretation, the loop represents one particle’s trajectory through spacetime. The bottom and top show the particle reversing chronological direction—resembling me as I flew to or from southern California.

Me in southern California in spring 2022. Photo courtesy of Justin Dressel.

How can we apply closed timelike curves in quantum metrology? In Fforde’s books, Colonel Next has a brother, named Mycroft, who’s an inventor.2 Suppose that Mycroft is studying how two particles interact (e.g., by an electric force). He wants to measure the interaction’s strength. Mycroft should prepare one particle—a sensor—and expose it to the second particle. He should wait for some time, then measure how much the interaction has altered the sensor’s configuration. The degree of alteration implies the interaction’s strength. The particles can be quantum, if Mycroft lives not merely in Sherlock Holmes’s world, but in a quantum-steampunk one.

But how should Mycroft prepare the sensor—in which quantum state? Certain initial states will enable the sensor to acquire ample information about the interaction; and others, no information. Mycroft can’t know which preparation will work best: the optimal preparation depends on the interaction, which he hasn’t measured yet. 

Mycroft, as drawn by Sydney Paget in the 1890s

Mycroft can overcome this dilemma via a strategy published by my collaborator David Arvidsson-Shukur, his recent student Aidan McConnell, and me. According to our protocol, Mycroft entangles the sensor with a third particle. He subjects the sensor to the interaction (coupling the sensor to particle #2) and measures the sensor. 

Then, Mycroft learns about the interaction—learns which state he should have prepared the sensor in earlier. He effectively teleports this state backward in time to the beginning-of-protocol sensor, using particle #3 (which began entangled with the sensor).3 Quantum teleportation is a decades-old information-processing task that relies on entanglement manipulation. The protocol can transmit quantum states over arbitrary distances—or, effectively, across time.

We can view Mycroft’s experiment in two ways. Using several particles, he manipulates entanglement to measure the interaction strength optimally (with the best possible precision). This process is mathematically equivalent to another. In the latter process, Mycroft uses only one sensor. It comes forward in time, reverses chronological direction (after Mycroft learns the optimal initial state’s form), backtracks to an earlier time (to when the sensing protocol began), and returns to progressing forward in time (informing Mycroft about the interaction).

Where I stayed in Stockholm. I swear, I’m not making this up.

In Sweden, I regarded my work with David and Aidan as a lark. But it’s led to an experiment, another experiment, and two papers set to debut this winter. I even pass as a quantum metrologist nowadays. Perhaps I should have anticipated the metamorphosis, as I should have anticipated the extra springtimes that erupted as I traveled between north and south. As the bard says, there’s a time for all things.

More Swedish wildflowers from June 2022

1In the sequel, Fforde adds a twist to Next’s closed timelike curve. I can’t speak for the twist’s plausibility or logic, but it makes for delightful reading, so I commend the novel to you.

2You might recall that Sherlock Holmes has a brother, named Mycroft, who’s an inventor. Why? In Fforde’s novel, an evil corporation pursues Mycroft, who’s built a device that can transport him into the world of a book. Mycroft uses the device to hide from the corporation in Sherlock Holmes’s backstory.

3Experts, Mycroft implements the effective teleportation as follows: He prepares a fourth particle in the ideal initial sensor state. Then, he performs a two-outcome entangling measurement on particles 3 and 4: he asks “Are particles 3 and 4 in the state in which particles 1 and 3 began?” If the measurement yields the yes outcome, Mycroft has effectively teleported the ideal sensor state backward in time. He’s also simulated a closed timelike curve. If the measurement yields the no outcome, Mycroft fails to measure the interaction optimally. Figure 1 in our paper synopsizes the protocol.

December 04, 2025

n-Category Café Octonions and the Standard Model (Part 12)

Having spent a lot of time pondering the octonionic projective plane and its possible role in the Standard Model of particle physics, I’m now getting interested in the ‘bioctonionic plane’, which is based on the bioctonions ℂ⊗𝕆 rather than the octonions 𝕆.

The bioctonionic plane also has intriguing mathematical connections to the Standard Model. But it’s not a projective plane in the axiomatic sense — and it can’t be constructed by straightforwardly copying the way you build a projective plane over a division algebra, since unlike the octonions, the bioctonions are not a division algebra. Nonetheless we can define points and lines in the bioctonionic plane. The twist is that now some pairs of distinct lines intersect in more than one point — and dually, some pairs of distinct points lie on more than one line. It obeys some subtler axioms, so people call it a Hjelmslev plane.

I am not ready to give a really good explanation of the bioctonionic plane! Instead, I just want to lay out some basic facts about how it fits into mathematics — and possibly physics.

Latham Boyle works at the University of Edinburgh, which is where I am now. Being able to talk to someone who deeply understands octonions and particle physics is very energizing. I’m especially fascinated by this paper of his:

It gives a convincing argument that the bioctonionic plane may be better than the octonionic projective plane for particle physics. The reason is that the tangent space of any point of the bioctonionic plane is a copy of (ℂ⊗𝕆)^2, a 16-dimensional complex vector space. The symmetry group of the bioctonionic plane is the exceptional Lie group E_6. Sitting inside the stabilizer group of any given point is a copy of the Standard Model gauge group. And — here’s the cool part — this group acts on (ℂ⊗𝕆)^2 just as it does on one generation of fermions (not their antiparticles). If we try the same trick using the octonionic projective plane, we can fit the Standard Model gauge group in the stabilizer group of a point in a very natural way, but its action on the tangent space is its action only on left-handed fermions.

I want to explain this in detail, but not today. Instead, I want to skim through some basic facts about the bioctonionic plane.

First, this plane is one of the four Rosenfeld planes:

  • the octonionic projective plane 𝕆P^2, a 16-dimensional compact Riemannian manifold on which the compact Lie group F_4 acts transitively as isometries, with the stabilizer of any point being Spin(9). This is a symmetric space, and as such it’s called FII in Cartan’s classification.

  • the bioctonionic plane (ℂ⊗𝕆)P^2, a 32-dimensional compact Riemannian manifold on which the compact Lie group E_6 acts transitively as isometries, with the stabilizer of any point being (Spin(10)×U(1))/ℤ_4. This is the symmetric space EIII.

  • the quateroctonionic plane (ℍ⊗𝕆)P^2, a 64-dimensional compact Riemannian manifold on which the compact Lie group E_7 acts transitively as isometries, with the stabilizer of any point being (Spin(12)×Sp(1))/ℤ_2. This is the symmetric space EVI.

  • the octooctonionic plane (𝕆⊗𝕆)P^2, a 128-dimensional compact Riemannian manifold on which the compact Lie group E_8 acts transitively as isometries, with the stabilizer of any point being Spin(16)/ℤ_2. This is the symmetric space EVIII.

There’s a nice network of systematic approaches to these spaces: they form one row of the so-called magic square, so one way to learn about the bioctonionic plane is to study the magic square, for example here:

  • Chris H. Barton and Anthony Sudbery, Magic squares of Lie algebras. Available as arXiv:math/0001083; see also arXiv:0203010 for a “streamlined and extended” version, which has more yet also less.

Here you can also find lots of references to earlier work, e.g. to Freudenthal and Tits. The basic idea of the magic square is that you start with two normed division algebras \mathbb{K}, \mathbb{K}' and from them you build a Lie algebra, which gives a Lie group G(\mathbb{K},\mathbb{K}'). There’s also a way to get a subgroup H(\mathbb{K}, \mathbb{K}'), and the quotient space

\[ (\mathbb{K} \otimes \mathbb{K}')\text{P}^2 = G(\mathbb{K},\mathbb{K}')/H(\mathbb{K}, \mathbb{K}') \]

is a kind of ‘plane’ on which the group G(\mathbb{K},\mathbb{K}') acts. If you take \mathbb{K}' = \mathbb{O}, this construction gives you the four Rosenfeld planes listed above.

Each one of these planes is a compact Riemannian symmetric space: this means it’s a connected compact Riemannian manifold M such that for each point p \in M there’s an isometry

\[ \sigma_p \colon M \to M \]

that fixes p, acts as -1 on its tangent space, and squares to the identity. This map is called ‘reflection across p’ for the obvious reason. For example, a round 2-sphere is a symmetric space: here \sigma_p is the rotation by 180° about the axis through p, which fixes p and reverses every tangent vector at that point.
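To make that concrete — this explicit formula is my own addition, not something from the original post — if we regard the 2-sphere as the set of unit vectors in \mathbb{R}^3, we can take

\[ \sigma_p(x) = 2 \langle x, p \rangle \, p - x \]

which fixes p, acts as -1 on the tangent plane at p, and squares to the identity.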

Cartan classified compact Riemannian symmetric spaces, and there’s a nice theory of them. Any compact simple Lie group is one, and most of the rest come in infinite series connected to real and complex Clifford algebras, as I explained here. But there are 9 extra ones, all related to the octonions and exceptional Lie groups. The Rosenfeld planes are four of these.

You can learn this material starting on Wikipedia and then going to a textbook, ideally this:

  • Sigurdur Helgason, Differential Geometry, Lie Groups and Symmetric Spaces, Academic Press, 1978.

Helgason taught me Lie theory when I was a grad student at MIT, so I have a fondness for his book—but it’s also widely accepted as the most solid text on symmetric spaces!

The bioctonionic plane is even better: it’s a compact hermitian symmetric space, that is, a compact Riemannian symmetric space M where each tangent space T_p M has a complex structure

\[ J \colon T_p M \to T_p M, \qquad J^2 = -1 \]

compatible with the metric, and reflection about each point preserves this complex structure. I mentioned that the bioctonionic plane is

\[ (\mathbb{C} \otimes \mathbb{O})\text{P}^2 \cong G(\mathbb{C},\mathbb{O})/H(\mathbb{C},\mathbb{O}) \]

where

\[ G(\mathbb{C},\mathbb{O}) = \text{E}_6 \]

acts transitively, and the stabilizer of a point is

\[ H(\mathbb{C}, \mathbb{O}) = (\text{Spin}(10) \times \text{U}(1))/\mathbb{Z}_4 \]

The \text{U}(1) here comes from the complex structure!

Wikipedia is especially thorough on hermitian symmetric spaces, so if you want to delve into those, its article on them is a good place to start.

Another tack is to focus on the exceptional Lie groups \text{F}_4, \text{E}_6, \text{E}_7 and \text{E}_8 and their connection to the nonassociative algebras \mathfrak{h}_3(\mathbb{O}), \mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O}), \mathfrak{h}_3(\mathbb{H} \otimes \mathbb{O}) and \mathfrak{h}_3(\mathbb{O} \otimes \mathbb{O}), respectively. Here I recommend this:

  • Ichiro Yokota, Exceptional Lie Groups. Available as arXiv:0902.0431. (See especially Chapter 3, for \text{E}_6 and the complexified exceptional Jordan algebra \mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O}).)

If you have a fondness for algebra you may also want to learn how symmetric spaces arise from Jordan triple systems or Jordan pairs. This is important if we wish to see the bioctonionic plane as the space of pure states of some exotic quantum system!

Now, this is much easier to do for the octonionic plane, because that’s the space of pure states for the exceptional Jordan algebra \mathfrak{h}_3(\mathbb{O}), which is a Euclidean Jordan algebra, meaning one in which a sum of squares can only be zero if all those squares are zero. You can think of a Euclidean Jordan algebra A as consisting of observables, with the sums of squares being ‘nonnegative’ observables. These nonnegative observables form a convex cone K \subseteq A. The dual vector space A^\ast contains a cone of linear functionals f that send these nonnegative observables to nonnegative real numbers — I’ll call this the dual cone K^\ast. The functionals f \in K^\ast with f(1) = 1 are called states. The states form a convex set, and the extreme points are called pure states. All of this fits nicely into a modern framework for understanding quantum theory and potential generalizations, called ‘generalized probabilistic theories’:

  • Howard Barnum, Alexander Wilce, Post-classical probability theory, in Quantum Theory: Informational Foundations and Foils, eds. Giulio Chiribella, Robert W. Spekkens, Springer, 2016. (See Section 5 for Jordan algebras, and ignore the fact that they say the exceptional Jordan algebra consists of 2 \times 2 matrices: they know perfectly well that they’re 3 \times 3.)

The underlying math, with a lot more about symmetric spaces, cones and Euclidean Jordan algebras but with none of the physics interpretation, is wonderfully explained here:

  • Jacques Faraut and Adam Korányi, Analysis on Symmetric Cones, Oxford U. Press, 1994.

A crucial fact throughout this book is that when you start with a Euclidean Jordan algebra A, its cone K of nonnegative observables is self-dual: there’s an isomorphism of vector spaces A \cong A^\ast that maps K to K^\ast in a one-to-one and onto way. The cone K is also homogeneous, meaning that the group of invertible linear transformations of A preserving K acts transitively on the interior of K. Faraut and Korányi call a self-dual homogeneous cone a symmetric cone — and they show that any symmetric cone comes from a Euclidean Jordan algebra! This result plays an important role in modern work on the foundations of quantum theory.
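To keep a familiar example in mind — this illustration is my own addition, not part of the original post — ordinary quantum mechanics fits the above template, with \mathfrak{h}_n(\mathbb{C}) denoting the n \times n complex self-adjoint matrices:

\[ A = \mathfrak{h}_n(\mathbb{C}), \qquad K = \{ a \in A \,:\, a \text{ is positive semidefinite} \}, \qquad \text{states} = \{ f \in K^\ast \,:\, f(1) = 1 \} \]

Here the states are the usual density matrices, the cone K is both self-dual and homogeneous, and the pure states are the rank-one projections — the points of the complex projective space \mathbb{C}\mathrm{P}^{n-1}, just as the pure states of \mathfrak{h}_3(\mathbb{O}) are the points of the octonionic projective plane.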

Unfortunately, I’m telling you all this nice stuff about Euclidean Jordan algebras and symmetric cones just to say that while all this applies to the octonionic projective plane, sadly, it does not apply to the bioctonionic plane! The bioctonionic plane does not come from a Euclidean Jordan algebra or a symmetric cone. Thus, to understand it as a space of pure states, we’d have to resort to a more general formalism.

There are a few papers that attempt exactly this:

  • Lawrence C. Biedenharn and Piero Truini, An \mathcal{E}_6 \otimes \mathcal{U}(1) invariant quantum mechanics for a Jordan pair, Journal of Mathematical Physics 23 (1982), 1327–1345.

  • Lawrence C. Biedenharn and Piero Truini, Exceptional groups and elementary particle structures, Physica A: Statistical Mechanics and its Applications 14 (1982), 257–270.

  • Lawrence C. Biedenharn, G. Olivieri and Piero Truini, Three graded exceptional algebras and symmetric spaces, Zeitschrift für Physik C — Particles and Fields 33 (1986), 47–65.

Here’s the basic idea. We can define \mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O}) to consist of 3 \times 3 hermitian matrices with entries in \mathbb{C} \otimes \mathbb{O}, where ‘hermitian’ is defined using the star-algebra structure on \mathbb{C} \otimes \mathbb{O} where we conjugate the octonion part but not the complex part! Then \mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O}) is just the complexification of \mathfrak{h}_3(\mathbb{O}):

\[ \mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O}) \cong \mathbb{C} \otimes \mathfrak{h}_3(\mathbb{O}) \]
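Concretely — this explicit description is my own addition, not from the original post — an element of \mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O}) looks like

\[ \begin{pmatrix} \alpha & x & y \\ \tilde{x} & \beta & z \\ \tilde{y} & \tilde{z} & \gamma \end{pmatrix}, \qquad \alpha, \beta, \gamma \in \mathbb{C}, \quad x, y, z \in \mathbb{C} \otimes \mathbb{O}, \]

where the tilde conjugates only the octonion part. Since that conjugation fixes exactly the complex multiples of 1, the diagonal entries are complex numbers, and the real dimension works out to 3 \cdot 2 + 3 \cdot 16 = 54 — twice 27, as it should be for the complexification of the 27-dimensional algebra \mathfrak{h}_3(\mathbb{O}).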

Then because \mathfrak{h}_3(\mathbb{O}) is a Jordan algebra over \mathbb{R}, \mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O}) is a Jordan algebra over \mathbb{C}. So we can do a lot with it. But it’s not a Euclidean Jordan algebra.

Puzzle. Show that it’s not.

So, Biedenharn and Truini need a different approach to relate \mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O}) to some sort of exotic quantum system. And they use an approach already known to mathematicians: namely, the theory of Jordan pairs! Here you work, not with a single element of \mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O}), but with a pair.

Jordan triple systems and Jordan pairs are two closely related generalizations of Jordan algebras. I’ve been working on the nLab articles about these concepts, so click the links if you want to learn more about them. I explain how either of these things gives you a 3-graded Lie algebra — that is, a \mathbb{Z}-graded Lie algebra that is nonvanishing only in the middle 3 grades:

\[ \mathfrak{g} = \mathfrak{g}_{-1} \oplus \mathfrak{g}_0 \oplus \mathfrak{g}_1 \]

And from a 3-graded Lie algebra you can get a symmetric space G/H where the Lie algebra of G is \mathfrak{g} and the Lie algebra of H is \mathfrak{g}_0. Each tangent space of this symmetric space is thus isomorphic to \mathfrak{g}_{-1} \oplus \mathfrak{g}_1.

In the case relevant to the bioctonionic plane, the 3-graded Lie algebra is

\[ \mathfrak{e}_6 = (\mathbb{C} \otimes \mathbb{O}) \; \oplus \; (\mathfrak{so}(10) \oplus \mathbb{R}) \; \oplus \; (\mathbb{C} \otimes \mathbb{O}) \]

So, the bioctonionic plane is a symmetric space on which \text{E}_6 acts, with stabilizer group \text{Spin}(10) \times \text{U}(1) (up to covering spaces), and with tangent space isomorphic to (\mathbb{C} \otimes \mathbb{O})^2.

So all this is potentially very nice. For much more on this theory, try the work of Ottmar Loos:

That’s a lot of stuff! For a quick overview of Loos’ work, I find this helpful:

Unfortunately Loos does not delve into examples, particularly the bioctonionic plane. For that, try Biedenharn and Truini, and also these:

To wrap things up, I should say a bit about ‘Hjelmslev planes’, since the bioctonionic plane is supposed to be one of these. Axiomatically, a Hjelmslev plane is a set P of points, a set L of lines, and an incidence relation between points and lines. We require that for any two distinct points there is at least one line incident to both, and for any two distinct lines there is at least one point incident to both. If two points are incident to more than one line we say they are neighbors. If two lines are incident to more than one point we say they are neighbors. We demand that both these ‘neighbor’ relations are equivalence relations, and that if we quotient P and L by these equivalence relations, we get an axiomatic projective plane.

Challenge. What projective plane do we get if we apply this quotient construction to the bioctonionic plane?

My only guess is that we get the octonionic projective plane — but I don’t know why.

The literature on Hjelmslev planes seems a bit difficult, but I’m finding this to be a good introduction:

The answer to my puzzle should be here, because they’re talking about Hjelmslev planes built using split octonion algebras (like \mathbb{C} \otimes \mathbb{O}):

  • Tonny A. Springer and Ferdinand D. Veldkamp, On Hjelmslev–Moufang planes, Mathematische Zeitschrift 107 (1968), 249–263.

But I don’t see the answer here yet!


  • Part 1. How to define octonion multiplication using complex scalars and vectors, much as quaternion multiplication can be defined using real scalars and vectors. This description requires singling out a specific unit imaginary octonion, and it shows that octonion multiplication is invariant under \mathrm{SU}(3).
  • Part 2. A more polished way to think about octonion multiplication in terms of complex scalars and vectors, and a similar-looking way to describe it using the cross product in 7 dimensions.
  • Part 3. How a lepton and a quark fit together into an octonion — at least if we only consider them as representations of \mathrm{SU}(3), the gauge group of the strong force. Proof that the symmetries of the octonions fixing an imaginary octonion form precisely the group \mathrm{SU}(3).
  • Part 4. Introducing the exceptional Jordan algebra \mathfrak{h}_3(\mathbb{O}): the 3 \times 3 self-adjoint octonionic matrices. A result of Dubois-Violette and Todorov: the symmetries of the exceptional Jordan algebra preserving their splitting into complex scalar and vector parts and preserving a copy of the 2 \times 2 self-adjoint octonionic matrices form precisely the Standard Model gauge group.
  • Part 5. How to think of 2 \times 2 self-adjoint octonionic matrices as vectors in 10d Minkowski spacetime, and pairs of octonions as left- or right-handed spinors.
  • Part 6. The linear transformations of the exceptional Jordan algebra that preserve the determinant form the exceptional Lie group \mathrm{E}_6. How to compute this determinant in terms of 10-dimensional spacetime geometry: that is, scalars, vectors and left-handed spinors in 10d Minkowski spacetime.
  • Part 7. How to describe the Lie group \mathrm{E}_6 using 10-dimensional spacetime geometry. This group is built from the double cover of the Lorentz group, left-handed and right-handed spinors, and scalars in 10d Minkowski spacetime.
  • Part 8. A geometrical way to see how \mathrm{E}_6 is connected to 10d spacetime, based on the octonionic projective plane.
  • Part 9. Duality in projective plane geometry, and how it lets us break the Lie group \mathrm{E}_6 into the Lorentz group, left-handed and right-handed spinors, and scalars in 10d Minkowski spacetime.
  • Part 10. Jordan algebras, their symmetry groups, their invariant structures — and how they connect quantum mechanics, special relativity and projective geometry.
  • Part 11. Particle physics on the spacetime given by the exceptional Jordan algebra: a summary of work with Greg Egan and John Huerta.
  • Part 12. The bioctonionic projective plane and its connections to algebra, geometry and physics.

December 03, 2025

n-Category Café log|x| + C revisited

A while ago on this blog, Tom posted a question about teaching calculus: what do you tell students the value of \displaystyle\int \frac{1}{x}\,dx is? The standard answer is \ln{|x|}+C, with C an “arbitrary constant”. But that’s wrong if \displaystyle\int means (as we also usually tell students it does) the “most general antiderivative”, since

\[ F(x) = \begin{cases} \ln{|x|} + C^- & \text{if}\; x < 0 \\ \ln{|x|} + C^+ & \text{if}\; x > 0 \end{cases} \]

is a more general antiderivative, for two arbitrary constants C^- and C^+. (I’m writing \ln for the natural logarithm function that Tom wrote as \log, for reasons that will become clear later.)

In the ensuing discussion it was mentioned that other standard indefinite integrals like \displaystyle\int \frac{1}{x^2}\,dx = -\frac{1}{x} + C are just as wrong. This happens whenever the domain of the integrand is disconnected: the “arbitrary constant” C is really only locally constant. Moreover, Mark Meckes pointed out that believing in such formulas can lead to mistaken calculations such as

\[ \int_{-1}^1 \frac{1}{x^2}\,dx = \left.-\frac{1}{x}\right]_{-1}^1 = -2 \]

which is “clearly nonsense” since the integrand is everywhere positive.

In this post I want to argue that there’s actually a very natural perspective from which \displaystyle\int \frac{1}{x^2}\,dx = -\frac{1}{x} + C is correct, while \displaystyle\int \frac{1}{x}\,dx = \ln{|x|}+C is wrong for a different reason.

The perspective in question is complex analysis. Most of the functions encountered in elementary calculus are actually complex-analytic — the only real counterexamples are explicit “piecewise” functions and things like |x|, which are mainly introduced as counterexamples to illustrate the meaning of continuity and differentiability. Therefore, it’s not unreasonable to interpret the indefinite integral \displaystyle\int f(x)\,dx as asking for the most general complex-analytic antiderivative of f. And the complex domain of \frac{1}{z} and \frac{1}{z^2} is \mathbb{C}\setminus \{0\}, which is connected!

Thus, for instance, since \frac{d}{d z}\left[-\frac{1}{z}\right] = \frac{1}{z^2}, it really is true that the most general (complex-analytic) antiderivative of \frac{1}{z^2} is -\frac{1}{z}+C for a single arbitrary constant C, so we can write \displaystyle\int \frac{1}{z^2}\,dz = -\frac{1}{z} + C. Note that any such antiderivative has the same domain \mathbb{C}\setminus \{0\} as the original function.

In addition, the dodgy calculation

\[ \int_{-1}^1 \frac{1}{z^2}\,dz = \left.-\frac{1}{z}\right]_{-1}^1 = -2 \]

is actually correct if we interpret \int_{-1}^1 to mean the integral along some (in fact, any) curve in \mathbb{C} from -1 to 1 that doesn’t pass through the singularity z=0. Of course, this doesn’t offend against signs because any such path must pass through non-real numbers, whose squares can contribute negative real numbers to the integral.

The case of \displaystyle\int \frac{1}{z}\,dz is a bit trickier, because the complex logarithm is multi-valued. However, if we’re willing to work with multi-valued functions (which precisely means functions whose domain is a Riemann surface covering some domain in \mathbb{C}), we have such a multi-valued function that I’ll denote \log (in contrast to the usual real-number function \ln) defined on a connected domain, and there we have \frac{d}{d z}\left[\log(z)\right] = \frac{1}{z}. Thus, the most general (complex-analytic) antiderivative of \frac{1}{z} is \log(z)+C where C is a single arbitrary constant, so we can write \displaystyle\int \frac{1}{z}\,dz = \log(z) + C.

What happened to \ln{|x|}? Well, as it happens, if x is a negative real number and \mathrm{Log} denotes the principal branch of the complex logarithm, then \mathrm{Log}(x) = \ln{|x|} + i\pi, hence \ln{|x|} = \mathrm{Log}(x) - i\pi. Therefore, the antiderivative \ln{|x|} for negative real x is of the form \mathrm{Log}(x)+C, where \mathrm{Log} is a branch of the complex logarithm and C is a constant (namely, -i\pi).

Of course it is also true that for positive real x, the antiderivative \ln{|x|} = \ln(x) is of the form \mathrm{Log}(x)+C for some constant C, but in this case the constant is 0. And changing the branch of the logarithm changes the constant by 2i\pi, so it can never make the constants 0 and -i\pi coincide. Thus, unlike -\frac{1}{x}+C, the real-number function \ln{|x|} + C in the “usual answer” is not the restriction to \mathbb{R}\setminus \{0\} of any complex-analytic antiderivative of \frac{1}{x} on a connected domain. This is what I mean by saying that \displaystyle\int \frac{1}{x}\,dx = \ln{|x|}+C is now wrong for a different reason. And we can see that the analogous dodgy calculation

\[ \int_{-1}^1 \frac{1}{x}\,dx = \ln{|x|}\Big]_{-1}^1 = 0 \]

is also still wrong. If \gamma is a path from -1 to 1 in \mathbb{C}\setminus \{0\}, the value of \int_{\gamma} \frac{1}{x}\,dx depends on \gamma, but it never equals 0: it’s always an odd integer multiple of i\pi, depending on how many times \gamma winds around the origin.
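For readers who like numerical sanity checks, here is a quick one that I’ve added (it is not part of the original post): a crude Riemann-sum contour integral along the upper unit semicircle from -1 to 1, which avoids the singularity at the origin.

    import numpy as np

    # Parametrize the upper unit semicircle from -1 to +1, staying away from z = 0.
    t = np.linspace(0.0, 1.0, 20001)
    z = np.exp(1j * np.pi * (1.0 - t))      # z(0) = -1, z(1) = +1
    zmid = 0.5 * (z[:-1] + z[1:])           # midpoints of the polygonal path
    dz = np.diff(z)

    def contour_integral(f):
        """Riemann-sum approximation of the integral of f along the path."""
        return np.sum(f(zmid) * dz)

    print(contour_integral(lambda w: 1.0 / w**2))   # approximately -2, as claimed
    print(contour_integral(lambda w: 1.0 / w))      # approximately -i*pi

Taking the lower semicircle instead gives approximately +i\pi for the second integral, and each extra loop around the origin shifts it by 2i\pi — odd multiples of i\pi, exactly as described above.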

I’m surprised that no one in the previous discussion, including me, brought this up. Of course we probably don’t want to teach our elementary calculus students complex analysis (although I’m experimenting with introducing some complex numbers in second-semester calculus). But this perspective makes me less unhappy about writing \displaystyle\int \frac{1}{x^2}\,dx = -\frac{1}{x} + C and \displaystyle\int \frac{1}{x}\,dx = \log(x) + C (no absolute value!).

December 01, 2025

Secret Blogging Seminar Congress proposes cutting of all funding to US academics who mentor Chinese students

I’m writing to point out a potential law which should be gathering more opposition and attention in math academia: The Securing American Funding and Expertise from Adversarial Research Exploitation Act. This is an amendment to the 2026 National Defense Authorization Act which has passed the House and could be added to the final version of the bill during reconciliation in the Senate. I’m pulling most of my information from an article in Science.

This act would ban any US scientist from receiving federal funding if they have, within the last five years, worked with anyone from China, Russia, Iran or North Korea, where “worked with” includes joint research, co-authorship on papers, or advising a foreign graduate student or postdoctoral fellow. As I said in my message to my senators, this is everyone. Every mathematician has advised Chinese graduate students or collaborated with Chinese mathematicians, because China is integrated into the academic world and is one fifth of the earth.

This obviously isn’t secret, since you can read about it in Science, but I am surprised that I haven’t heard more alarm. Obvious people to contact are your senators and your representatives. I would also suggest contacting members of the Senate armed services committee, who are in charge of reconciling the House and Senate versions of the bill.

November 27, 2025

Sean Carroll Thanksgiving

 (Apologies for the ugly blog format. We had a bit of a crash, and are working to get the template back in working order.)

This year we give thanks for a crucially important idea that can mean very different things to different people: information. (We’ve previously given thanks for the Standard Model Lagrangian, Hubble’s Law, the Spin-Statistics Theorem, conservation of momentum, effective field theory, the error bar, gauge symmetry, Landauer’s Principle, the Fourier Transform, Riemannian Geometry, the speed of light, the Jarzynski equality, the moons of Jupiter, space, black hole entropy, electromagnetism, Arrow’s Impossibility Theorem, and quanta.)

“Information” is an idea that is everywhere in science and technology these days. From one angle it looks like such an obvious idea that it’s a bit startling to realize that information theory didn’t really come along until the work of Claude Shannon in the 1940s. From another, the idea has so many different shades of meaning that we shouldn’t be surprised (that’s a joke you will get in a bit) that it can be hard to understand.

Information theory is obviously an enormous subject, but we’re just giving thanks, not writing a textbook. I want to mention two ideas I find especially central. First, Shannon’s idea about relating information content to “surprisal.” Second, the very different intuitive notions of information that we get from engineering and physics.

Shannon, working at Bell Labs, was interested in the problem of how to send trustworthy signals efficiently over transatlantic cables. He was thinking about various ways to express information in a code: a set of symbols, each with a defined meaning. So a code might be an alphabet, or a set of words, or a literal cipher. And he noticed that there was a lot of redundancy in natural languages; the word “the” appears much more often in English than the word “axe,” although both have the same number of letters.

Let’s refer to each letter or symbol in a code as an “event.” Shannon’s insight was to realize that the more unlikely an event, the more information it conveyed when it was received. The statements “The Sun rose in the east this morning” and “The Sun rose in the west this morning” contain the same number of letters, but the former contains almost no information — you already were pretty sure the Sun would be rising in the east. But the latter, if obtained from a reliable source, would be very informative indeed, precisely because it was so unexpected. Clearly some kind of unprecedented astronomical catastrophe was in progress.

Imagine we can assign a probability p(x) to every different event x. Shannon wanted a way to quantify the information content of that event, which would satisfy various reasonable-seeming axioms: most crucially, that the information content of two independent events is the sum of the individual information contents. But the joint probability of two events is the product of their individual probabilities. So the natural thing to do would be to define the information content as the logarithm of the probability; the logarithm of a product equals the sum of the individual logarithms. But you want low probability to correspond to high information content, so Shannon defined the information content (also called the self-information, or surprisal, or Shannon information) of an event to be minus the log of the probability, which by math is equal to the log of the reciprocal of the probability:

    \[I(x) = - \log [p(x)] =\log \left(\frac{1}{p(x)}\right).\]

Note that probabilities are numbers between 0 and 1, and the log of such a number will be negative, with numbers closer to 0 being more negative than numbers closer to 1. So I(x) goes from +\infty at p(x)=0 to 0 at p(x)=1. An impossible message is infinitely surprising, and therefore conveys infinite information; an inevitable message is completely unsurprising, and conveys no information at all.

From there, Shannon suggested that we could characterize how efficient an entire code was at conveying information: just calculate the average (expectation value) of the information content for all possible events. When we have a probability distribution p(x), the average of any function f(x) is just the sum of the values of the function times their respective probabilities, \langle f\rangle = \sum_x p(x) f(x). So we characterize the information content of a code via the quantity

    \[H[p] = - \sum_x p(x) \log[p(x)].\]

The only question is, what to call this lovely newly-defined quantity that surely nobody had ever thought of before? Happily Shannon was friends with John von Neumann, who informed him, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.” So entropy it is.

Indeed, this formula is precisely that which had been put forward (unknown to Shannon) by Josiah Willard Gibbs in the 1870’s as a definition of entropy in statistical mechanics. (It is related to the definition on Ludwig Boltzmann’s tombstone, S= k \log W, and Boltzmann had also suggested similar expressions to the above.) On the one hand, it seems remarkable to find precisely the same expression playing central roles in problems as disparate as sending signals across cables and watching cream mix into coffee; on the other hand, it’s a relatively simple expression and the axioms used to derive it are actually pretty similar, so perhaps we shouldn’t be surprised; on the third hand, the connection between information theory and statistical mechanics turns out to be deep and fruitful, so it’s more than just a mathematical coincidence.
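Here is a tiny numerical illustration, which I’ve added myself (it is not from the post): computing surprisals and Shannon entropies for a couple of two-outcome distributions, with logarithms taken in base 2 so the answers come out in bits.

    import math

    def surprisal(p):
        """Information content -log2(p) of an event with probability p, in bits."""
        return -math.log2(p)

    def entropy(dist):
        """Shannon entropy H[p] = -sum_x p(x) log2 p(x), in bits."""
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    # A near-certain event carries almost no information; a rare one carries a lot.
    print(surprisal(0.999))   # about 0.0014 bits
    print(surprisal(0.001))   # about 9.97 bits

    # A fair coin maximizes the entropy of a two-outcome code; a lopsided one has much less.
    print(entropy({"east": 0.5, "west": 0.5}))      # 1.0 bit
    print(entropy({"east": 0.999, "west": 0.001}))  # about 0.011 bits

The fair-coin case gives the maximum possible entropy for two outcomes, one bit, while the heavily skewed distribution — like a message that almost always says the Sun rose in the east — conveys very little on average.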

But let me highlight the one aspect of the term “information” that can be sometimes confusing to people. To the engineer, a code that is maximally informative is one for which p(x) is relatively uniform over all events x, which means H[p(x)] is maximal or close to it; in that case, every event will tell you something at least a little bit interesting. For them, high entropy = high information.

But to a physicist who might be asking “how much information do I have about the state of a system?”, you have more information when p(x) is relatively narrowly concentrated around some value, rather than being all spread out. For them, high entropy = low information! Indeed, one physically-relevant notion of “information” is the “accessible information” of a system, which can be defined as H_\mathrm{max} - H. (I talk about this a bit in my recent solo podcast on complexity.)

Perhaps we shouldn’t be so surprised that physicists and engineers posit oppositely-directed relationships between entropy and information. It’s just a reflection of the fact that “information” is so ubiquitous and has so many different uses. We should be thankful that we’re beginning to understand it so well.

November 26, 2025

Tim Gowers Creating a database of motivated proofs

It’s been over three years since my last post on this blog and I have sometimes been asked, understandably, whether the project I announced in my previous post was actually happening. The answer is yes — the grant I received from the Astera Institute has funded several PhD students and a couple of postdocs, and we have been busy. In my previous post I suggested that I would be open to remote collaboration, but that has happened much less, partly because a Polymath-style approach would have been difficult to manage while also ensuring that my PhD students would have work that they could call their own to put in their theses.

In general I don’t see a satisfactory solution to that problem, but in this post I want to mention a subproject of the main project that is very much intended to be a large public collaboration. A few months ago, a call came out from Renaissance Philanthropies saying that they were launching a $9m AI for Math Fund to spend on projects in the general sphere of AI and mathematics, and inviting proposals. One of the categories that they specifically mentioned was creating new databases, and my group submitted a proposal to create a database of what we call “structured motivated proofs,” a piece of terminology that I will explain a bit more in just a moment. I am happy to report that our proposal was one of the 29 successful ones. Since a good outcome to the project will depend on collaboration from many people outside the group, we need to publicize it, which is precisely the purpose of this post. Below I will be more specific about the kind of help we are looking for.

Why might yet another database of theorems and proofs be useful?

The underlying thought behind this project is that AI for mathematics is being held back not so much by an insufficient quantity of data as by the wrong kind of data. (For a more general exploration of this theme, see here.) All mathematicians know, and some of us enjoy complaining about it, that it is common practice when presenting a proof in a mathematics paper, or even textbook, to hide the thought processes that led to the proof. Often this does not matter too much, because the thought processes may be standard ones that do not need to be spelt out to the intended audience. But when proofs start to get longer and more difficult, they can be hard to read because one has to absorb definitions and lemma statements that are not obviously useful, are presented as if they appeared from nowhere, and demonstrate their utility only much later in the argument.

A sign that this is a problem for AI is the behaviour one observes after asking an LLM to prove a statement that is too difficult for it. Very often, instead of admitting defeat, it will imitate the style of a typical mathematics paper and produce rabbits out of hats, together with arguments later on that those rabbits do the required job. The problem is that, unlike with a correct mathematics paper, one finds when one scrutinizes the arguments carefully that they are wrong. However, it is hard to find superficial features that distinguish between an incorrect rabbit with an incorrect argument justifying that rabbit (especially if the argument does not go into full detail) and a correct one, so the kinds of statistical methods used by LLMs do not have an easy way to penalize the incorrectness.

Of course, that does not mean that LLMs cannot do mathematics at all — they are remarkably good at it, at least compared with what I would have expected three years ago. How can that be, given the problem I have discussed in the previous paragraph?

The way I see it (which could change — things move so fast in this sphere), the data that is currently available to train LLMs and other systems is very suitable for a certain way of doing mathematics that I call guess and check. When trying to solve a maths problem, you will normally write down the routine parts of an argument without any fuss (and an LLM can do them too because it has seen plenty of similar examples), but if the problem as a whole is not routine, then at some point you have to stop and think, often because you need to construct an object that has certain properties (I mean this in a rather general way — the “object” might be a lemma that will split up the proof in a nice way) and it is not obvious how to do so. The guess-and-check approach to such moments is what it says: you make as intelligent a guess as you can and then see whether it has the properties you wanted. If it doesn’t, you make another guess, and you keep going until you get lucky.

The reason an LLM might be tempted to use this kind of approach is that the style of mathematical writing I described above makes it look as though that is what we as mathematicians do. Of course, we don’t actually do that, but we tend not to mention all the failed guesses we made and how we carefully examined why they failed, modifying them in appropriate ways in response, until we finally converged on an object that worked. We also don’t mention the reasoning that often takes place before we make the guess, saying to ourselves things like “Clearly an Abelian group can’t have that property, so I need to look for a non-Abelian group.”

Intelligent guess and check works well a lot of the time, particularly when carried out by an LLM that has seen many proofs of many theorems. I have often been surprised when I have asked an LLM a problem of the form \exists x\in X \ P(x), where P is some property that is hard to satisfy, and the LLM has had no trouble answering it. But somehow when this happens, the flavour of the answer given by the LLM leaves me with the impression that the technique it has used to construct x is one that it has seen before and regards as standard.

If the above picture of what LLMs can do is correct (the considerations for reinforcement-learning-based systems such as AlphaProof are not identical but I think that much of what I say in this post applies to them too for slightly different reasons), then the likely consequence is that if we pursue current approaches, then we will reach a plateau: broadly speaking they will be very good at answering a question if it is the kind of question that a mathematician with the right domain expertise and good instincts would find reasonably straightforward, but will struggle with anything that is not of that kind. In particular, they will struggle with research-level problems, which are, almost by definition, problems that experts in the area do not find straightforward. (Of course, there would probably be cases where an LLM spots relatively easy arguments that the experts had missed, but that wouldn’t fundamentally alter the fact that they weren’t really capable of doing research-level mathematics.)

But what if we had a database of theorems and proofs that did not hide the thought processes that lay behind the non-obvious details of the proofs? If we could train AI on a database of accounts of proof discoveries and if, having done so, we then asked it to provide similar accounts, then it would no longer resort to guess-and-check when it got stuck, because the proof-discovery accounts it had been trained on would not be resorting to it. There could be a problem getting it to unlearn its bad habits, but I don’t think that difficulty would be impossible to surmount.

The next question is what such a database might look like. One could just invite people to send in stream-of-consciousness accounts of how they themselves found certain proofs, but that option is unsatisfactory for several reasons.

  1. It can be very hard to remember where an idea came from, even a few seconds after one has had it — in that respect it is like a dream, the memory of which becomes rapidly less vivid as one wakes up.
  2. Often an idea will seem fairly obvious to one person but not to another.
  3. The phrase “motivated proof” means different things to different people, so without a lot of careful moderation and curation of entries, there is a risk that a database would be disorganized and not much more helpful than a database of conventionally written proofs.
  4. A stream-of-consciousness account could end up being a bit too much about the person who finds the proof and not enough about the mathematical reasons for the proof being feasibly discoverable.

To deal with these kinds of difficulties, we plan to introduce a notion of a structured motivated proof, by which we mean a proof that is generated in a very particular way that I will partially describe below. A major part of the project, and part of the reason we needed funding for it, is to create a platform that will make it convenient to input structured motivated proofs and difficult to insert the kinds of rabbits out of hats that make a proof mysterious and unmotivated. In this way we hope to gamify the task of creating the database, challenging people to input into our system proofs of certain theorems that appear to rely on “magic” ideas, and perhaps even offering prizes for proofs that contain steps that appear in advance to be particularly hard to motivate. (An example: the solution by Ellenberg and Gijswijt of the cap-set problem uses polynomials in a magic-seeming way. The idea of using polynomials came from an earlier paper of Croot, Lev and Pach that proved a closely related theorem, but in that paper it just appears in the statement of their Lemma 1, with no prior discussion apart from the words “in the present paper we use the polynomial method” in the introduction.)

What is a structured motivated proof?

I wrote about motivated proofs in my previous post, but thanks to many discussions with other members of the group, my ideas have developed quite a lot since then. Here are two ways we like to think about the concept.

1. A structured motivated proof is one that is generated by standard moves.

I will not go into full detail about what I mean by this, but will do so in a future post when we have created the platform that we would like people to use in order to input proofs into the database. But the basic idea is that at any one moment one is in a certain state, which we call a proof-discovery state, and there will be a set of possible moves that can take one from the current proof-discovery state to a new one.

A proof-discovery state is supposed to be a more formal representation of the state one is in when in the middle of solving a problem. Typically, if the problem is difficult, one will have asked a number of questions, and will be aware of logical relationships between them: for example, one might know that a positive answer to Q1 could be used to create a counterexample to Q2, or that Q3 is a special case of Q4, and so on. One will also have proved some results connected with the original question, and again these results will be related to each other and to the original problem in various ways that might be quite complicated: for example P1 might be a special case of Q2, which, if true would reduce Q3 to Q4, where Q3 is a generalization of the statement we are trying to prove.

Typically we will be focusing on one of the questions, and typically that question will take the form of some hypotheses and a target (the question being whether the hypotheses imply the target). One kind of move we might make is a standard logical move such as forwards or backwards reasoning: for example, if we have hypotheses of the form P(x) and \forall u\ P(u)\implies Q(u), then we might decide to deduce Q(x). But things get more interesting when we consider slightly less basic actions we might take. Here are three examples.

  1. We have in our list of hypotheses the fact that a function f is given by the formula f(x)=\exp(p(x)), where p is a polynomial, and our goal is to prove that there exists z such that f(z)=1. Without really thinking about it, we are conscious that f is a composition of two functions, one of which is continuous and one of which belongs to a class of functions that are all continuous, so f is continuous. Also, the conclusion \exists z\ f(z)=1 matches well the conclusion of the intermediate-value theorem. So the intermediate-value theorem comes naturally to mind and we add it to our list of available hypotheses. In practice we wouldn’t necessarily write it down, but the system we wish to develop is intended to model not just what we write down but also what is going on in our brains, so we propose a move that we call library extraction (closely related to what is often called premise selection in the literature). Note that we have to be a bit careful about library extraction. We don’t want the system to be allowed to call up results from the library that appear to be irrelevant but then magically turn out to be helpful, since those would feel like rabbits out of hats. So we want to allow extraction of results only if they are obvious given the context. It is not easy to define what “obvious” means, but there is a good rule of thumb for it: a library extraction is obvious if it is one of the first things ChatGPT thinks of when given a suitable non-cheating prompt. For example, I gave it the prompt, “I have a function f from the reals to the reals and I want to prove that there exists some z such that f(z)=1. Can you suggest any results that might be helpful?” and the intermediate-value theorem was its second suggestion. (Note that I had not even told it that f was continuous, so I did not need to make that particular observation before coming up with the prompt.)
  2. We have a goal of the form \exists x\in X\ P(x). If this were a Lean proof state, the most common way to discharge a goal of this form would be to input a choice for x. That is, we would instantiate the existential quantifier with some x_0 and our new goal would be P(x_0). However, as with library extraction, we have to be very careful about instantiation if we want our proof to be motivated, since we wish to disallow highly surprising choices of x_0 that can be found only after a long process of thought. So we have to restrict ourselves to obvious instantiations. One way that an instantiation in our system will count as obvious is if the variable is instantiated with a term that is already present in the proof-discovery state. If the desired term is not present, then in order to continue with the proof, it will be necessary to carry out moves that generate it. A very common technique for this is the use of metavariables: instead of guessing a suitable x_0, we create a variable x^\bullet and change the goal to P(x^\bullet), which we can think of as saying “I’m going to start trying to prove P(x^\bullet) even though I haven’t chosen x^\bullet yet. As the attempted proof proceeds, I will note down any properties Q_1,\dots,Q_k that x^\bullet might have that would help me finish the proof, in the hope that (i) I get to the end and (ii) the problem \exists x\ Q_1(x)\wedge\dots\wedge Q_k(x) is easier than the original problem.” Another kind of obvious instantiation is one where we try out an object that is “extreme” in some way — it might be the smallest element of X, or the largest, or the simplest. (Judging simplicity is another place where the ChatGPT rule of thumb can be used.)
  3. We cannot see how to answer the question we are focusing on so we ask a related question. Two very common kinds of related question (as emphasized by Polya) are generalization and specialization. Perhaps we don’t see why a hypothesis is helpful, so we see whether the result holds if we drop that hypothesis. If it does, then we are no longer distracted by an irrelevant hypothesis. If it does not, then we can hope to find a counterexample that will help us understand how to use the hypothesis. Or perhaps we are trying to prove a general statement but it is not clear how to do so, so instead we formulate some special cases, hoping that we can prove them and spot features of the proofs that we can generalize. Again we have to be rather careful here not to allow “non-obvious” generalizations and specializations. Roughly the idea there is that a generalization should be purely logical — for example, dropping a hypothesis is fine but replacing the hypothesis “f is twice differentiable” by “f is upper semicontinuous” is not — and that a specialization should be to a special case that counts as an obvious instantiation in the sense discussed just above.

2. A structured motivated proof is one that can be generated with the help of a point-and-click system.

This is a surprisingly useful way to conceive of what we are talking about, especially as it relates closely to what I was talking about earlier: imposing a standard form on motivated proofs (which is why we call them “structured” motivated proofs) and gamifying the process of producing them.

The idea is that a structured motivated proof is one that can be generated using an interface (which we are in the process of creating — at the moment we have a very basic prototype that has a few of the features we will need, but not yet the more interesting ones) that has one essential property: the user cannot type in data. So what can they do? They can select text that is on their screen (typically mathematical expressions or subexpressions), they can click buttons, choose items from drop-down menus, and accept or reject “obvious” suggestions made to them by the interface.

If, for example, the current goal is an existential statement \exists x\ P(x), then typing in a formula that defines a suitable x is not possible, so instead one must select text or generate new text by clicking buttons, choosing from short drop-down menus, and so on. This forces the user to generate x, which is our proxy for showing where the idea of using x came from.

Broadly speaking, the way the prototype works is to get an LLM to read a JSON object that describes the variables, hypotheses and goals involved in the proof state in a structured format, and to describe (by means of a fairly long prompt) the various moves it might be called upon to do. Thus, the proofs generated by the system are not formally verified, but that is not an issue that concerns us in practice since there will be a human in the loop throughout to catch any mistakes that the LLM might make, and this flexibility may even work to our advantage to better capture the fluidity of natural-language mathematics.
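To make this a little more concrete, here is a purely hypothetical sketch of what such a JSON object might look like, written out as a Python dictionary. Every field name is my own invention for illustration — the prototype’s actual schema may be organized quite differently — and the example reuses the intermediate-value-theorem scenario from earlier in the post.

    # Hypothetical proof-discovery state (all field names invented for illustration;
    # this is not the project's actual schema).
    proof_state = {
        "variables": [
            {"name": "f", "kind": "function from the reals to the reals"},
            {"name": "p", "kind": "polynomial"},
        ],
        "hypotheses": [
            {"id": "H1", "statement": "f(x) = exp(p(x)) for all real x"},
        ],
        "goals": [
            {"id": "G1", "statement": "there exists z with f(z) = 1"},
        ],
        "moves": [
            "forwards_reasoning",
            "backwards_reasoning",
            "library_extraction",        # e.g. pulling in the intermediate-value theorem
            "instantiate_existential",   # only with terms already present on screen
            "introduce_metavariable",
        ],
    }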

There is obviously a lot more to say about what the proof-generating moves are, or (approximately equivalently) what the options provided by a point-and-click system will be. I plan to discuss that in much more detail when we are closer to having an interface ready, the target for which is the end of this calendar year. But the aim of the project is to create a database of examples of proofs that have been successfully generated using the interface, which can then be used to train AI to play the generate-structured-motivated-proof game.

How to get involved.

There are several tasks that will need doing once the project gets properly under way. Here are some of the likely ones.

  1. The most important is for people to submit structured motivated (or move-generated) proofs to us on the platform we provide. We hope that the database will end up containing proofs of a wide range of difficulty (of two kinds — there might be fairly easy arguments that are hard to motivate and there might be arguments that are harder to follow but easier to motivate) and also a wide range of areas of mathematics. Our initial target, which is quite ambitious, is to have around 1000 entries by two years from now. While we are not in a position to accept entries yet, if you are interested in participating, then it is not too early to start thinking in a less formal way about how to convert some of your favourite proofs into motivated versions, since that will undoubtedly make it easier to get them accepted by our platform when it is ready.
  2. We are in the process of designing the platform. As I mentioned earlier, we already have a prototype, but there are many moves we will need it to be able to do that it cannot currently do. For example, the current prototype allows just a single proof state, which consists of some variable declarations, hypotheses, and goals. It does not yet support creating subsidiary proof states (which we would need if we wanted to allow the user to consider generalizations and specializations, for example). Also, for the moment the prototype gets an LLM to implement all moves, but some of the moves, such as applying modus ponens, are extremely mechanical and would be better done using a conventional program. (On the other hand, moves such as “obvious library extraction” or “provide the simplest example” are better done by an LLM.) Thirdly, a technical problem is that LaTeX is currently rendered as images, which makes it hard to select subexpressions, something we will need to be able to do in a non-clunky way. And the public version of the platform will need to be web-based and very convenient to use. We will want features such as being able to zoom out and look at some kind of dependency diagram of all the statements and questions currently in play, and then zoom in on various nodes if the user wishes to work on them. If you think you may be able (and willing) to help with some of these aspects of the platform, then we would be very happy to hear from you. For some, it would probably help to have a familiarity with proof assistants, while for others we would be looking for somebody with software engineering experience. The grant from the AI for Math Fund will allow us to pay for some of this help, at rates to be negotiated. We are not yet ready to specify in detail what help we need, but would welcome any initial expressions of interest.
  3. Once the platform is ready and people start to submit proofs, it is likely that, at least to start with, they will find that the platform does not always provide the moves they need. Perhaps they will have a very convincing account of where a non-obvious idea in the proof came from, but the system won’t be expressive enough for them to translate that account into a sequence of proof-generating moves. We will want to be able to react to such situations (if we agree that a new move is needed) by expanding the capacity of the platform. It will therefore be very helpful if people sign up to be beta-testers, so that we can try to get the platform to a reasonably stable state before opening it up to a wider public. Of course, to be a beta-tester you would need to have a few motivated proofs in mind.
  4. It is not obvious that every proof submitted via the platform, even if submitted successfully, would be a useful addition to the database. For instance, it might be such a routine argument that no idea really needs to have its origin explained. Or it might be that, despite our best efforts, somebody finds a way of sneaking in a rabbit while using only the moves that we have provided. (One way this could happen is if an LLM made a highly non-obvious suggestion that happened to work, in which case the rule of thumb that if an LLM thinks of it, it must be obvious, would have failed in that instance.) For this reason, we envisage having a team of moderators, who will check entries and make sure that they are good additions to the database. We hope that this will be an enjoyable task, but it may have its tedious aspects, so we envisage paying moderators — again, this expense was allowed for in our proposal to the AI for Math Fund.

If you think you might be interested in any of these roles, please feel free to get in touch. Probably the hardest recruitment task for us will be identifying the right people with the right mixture of mathematical knowledge and software engineering skills to help us turn the platform into a well-designed web-based one that is convenient and pleasurable to use. If you think you might be such a person, or if you have a good idea for how we should go about finding one, we would be particularly interested to hear from you.

In a future post, I will say more about the kinds of moves that our platform will allow, and will give examples of non-motivated proofs together with how motivated versions of those proofs can be found and entered using the platform (which may involve a certain amount of speculation about what the platform will end up looking like).

How does this relate to use of tactics in a proof assistant?

In one way, our “moves” can be regarded as tactics of a kind. However, some of the moves we will need are difficult to implement in conventional proof assistants such as Lean. In parallel with the work described above, we hope to create an interface to Lean that would allow one to carry out proof-discovery moves of the kind discussed above but with the proof-discovery states being collections of Lean proof states. Members of my group have already been working on this and have made some very interesting progress, but there is some way to go. However, we hope that at some point (and this is also part of the project pitched to the AI for Math Fund) we will have created another interface that will have Lean working in the background, so that it will be possible to generate motivated proofs that will be (or perhaps it is better to say include) proofs in Lean at the same time.

Another possibility that we are also considering is to use the output of the first platform (which, as mentioned above, will be fairly formal, but not in the strict sense of a language such as Lean) to create a kind of blueprint that can then be autoformalized automatically. Then we would have a platform that would in principle allow mathematicians to search for proofs while working on their computers without having to learn a formal language, with their thoughts being formalized as they go.

November 25, 2025

David Hogg substellar objects (brown dwarfs)

I spent the day at the NSBP / NSHP meeting in San José. My favorite session of the day was the morning astro session, which was entirely about brown dwarfs. I learned a lot in a very short time. Caprice Phillips (UCSC) introduced the session with an introduction to the scientific and technical questions in play. She put a lot of emphasis on using binaries and clusters to put detailed abundance ratios onto substellar objects. This was what I expected: I thought (walking in to this session) that all known abundance ratios for brown dwarfs were from such kinds of studies. I learned different (keep reading).

Gabriel Munoz Zarazua (SFSU) followed by showing spectra from M-dwarfs, brown dwarfs, and Jupiter. It definitely looks like a sequence. He does spectral fitting (what they call, in this business, retrievals). It looks like he is getting very good, somewhat precise, abundance ratios for the photospheres of substellar objects! I asked more about this in the question period, and apparently I am way behind the times (Emily Rauscher, Michigan, helpfully pointed this out to me): Now brown-dwarf photosphere models are so good, they can be used to measure abundances, and pretty well.

I also learned in this session (maybe from Jorge Sanchez, ASU, or maybe from Efrain Alvarado, SFSU) that there is a very strong mass–abundance relation in the Solar System. That is, we don't expect, if brown dwarfs form the way planets do, that the detailed abundances of the brown dwarfs will match exactly the detailed abundances of the primary stars. But now we are really in a position to test that. Sanchez showed that we can get, from even photometry, abundances for substellar objects in the Milky Way halo. Again, totally new to me! And he finds metallicities at or below −3. Alvarado showed data on an amazing system J1416, which is an L–T binary with no stellar companion. Apparently it is the only known completely substellar binary.

November 14, 2025

Matt Strassler Event with Professor Daniel Whiteson on Monday November 17 at 7pm

Next Monday, November 17th at 7pm, I’ll be at the Harvard Bookstore with particle physicist and author Daniel Whiteson. Professor Whiteson and his co-author Andy Warner have a nice new book, for the general science-aware reader, exploring an age-old and unanswered question: how universal is the knowledge and understanding that we call “physics”? How much of modern physics is actually telling us about the universe, and how much of it is created by, or an accident of, the humans who have helped bring it about?

For instance, if we started all over again and reran history from scratch, would the physics (and science more generally) of this re-run culture look much like our own, or might it turn out very differently? If another culture on Earth had had time to develop highly mature science (or something like it) in its own direction, independent of Western Europe’s influence, how different might that science be? (Indeed, would our word “science” even be translatable into their worldview?) Or if we encountered aliens with far greater understanding of the universe than we have, would we be able to recognize, parse, grok, appreciate, comprehend, and/or otherwise make sense of their notions of scientific knowledge?

Whiteson and his co-author, wanting to write a popular book rather than a scholarly one, and desiring nevertheless to take on these serious and challenging intellectual questions, have set their focus mostly on the aliens, accompanied by amusing cartoons and a generous helping of dad jokes (hey, some dad jokes are actually very funny). They’re looking for a broad audience, and hopefully they will get it. But don’t let the light-hearted title (“Do Aliens Speak Physics?”) or the charmingly goofy cover fool you: this book might well make you laugh, but I guarantee it will make you think. Whether you’re just curious about science or you’ve been doing science yourself for years, I suspect that, within the vast array of problems and issues raised in this broad-minded book, there will be some you’ve never thought of.

Among scientists and philosophers, there are some who believe that any aliens with the capacity to reach the Earth will obviously “speak physics” — that math and physics float above contingencies of culture and species, and will easily be translated from any intelligent creature to any other. But are they perhaps flying too high? It’s clear that Whiteson and Warner are aiming to poke some holes — lots of holes — in their hot-air balloon, and to do so in a way that a wide variety of readers can appreciate and enjoy.

I tend to agree with Whiteson on a lot of these issues, but that won’t stop me from asking him some tough questions. You can ask him some tough questions too, if you like — just come to the Harvard Bookstore at 7:00 on Monday and join the conversation!

October 15, 2025

Clifford JohnsonNobel Prize in Physics 2025: Who/What/Why

I started a tradition a little while back where every year we have a special departmental colloquium entitled "The Nobel Prize in Physics: Who/What/Why". This year my job of finding speakers was made easier by having 2/3 of this year's newly minted Nobel Prize winners in physics in the Department! (Michel Devoret and John Martinis.) So our room was rather better attended than normal... (hundreds and hundreds rather than dozens and dozens). Here is a recording of the event, which I was delighted to host, and there's a celebration afterwards too. (Please share widely!)
[...]


October 01, 2025

Robert HellingHolosplit

Recently I had to update Mathematica on my laptop, and after solving the challenges of the license manager (which looks different every time I have to use it), I learned that Mathematica 14 can now officially work with finite fields.

This reminded me that for a while I wanted to revive an old project that had vanished together with the hard drive of some old computer: Holosplit. So, over the last two days, and with the help of said version of Mathematica, I did a complete rewrite, which you can now find on GitHub.

It consists of two C programs, "holosplit" and "holojoin". To the first you give a positive integer \(N\) and a file, and it spits out a new file (a "fragment") that is roughly \(1/N\) of the size of the original. Every time you do this you obtain a new random fragment.

To the latter you give any collection of \(N\) of these fragments, and it reproduces the original file. So you can, for example, split a file with \(N=3\) and distribute ten fragments among 10 people, such that when any 3 of them work together, they can recover the original.

How does it work? It uses the finite field \(F\) of \(2^8=256\) elements (in the GitHub repository, there is also a header file that implements arithmetic in \(F\) and matrix operations like product and inverse over it). Each time it is invoked, holosplit picks a random vector \(v\in F^N\) and writes it to the output. Then it reads \(N\) bytes from the file at a time, which it also interprets as a vector \(d\in F^N\), and outputs the byte that corresponds to the scalar product \(v\cdot d\).

To reassemble the file, holojoin takes the \(N\) files with their random vectors \(v_1,\ldots,v_N\) and interprets those as the rows of an \(N\times N\) matrix \(A\). With probability

$$\frac{\prod_{k=0}^{N-1} \left(256^N-256^k\right)}{256^{N^2}} \;=\; \prod_{j=1}^{N}\left(1-256^{-j}\right)$$

this matrix is invertible; that probability stays above \(1-\frac{1}{255}\approx 0.996\) for every \(N\) (homework: why?). So we can read one byte from each file, assemble those into yet another vector \(e\in F^N\) and recover

$$d=A^{-1}e.$$
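
To make this concrete, here is a minimal Python sketch of the same idea. This is my own illustration, not the C code in the repository: I assume the AES reduction polynomial \(x^8+x^4+x^3+x+1\) for the field (the repository may use a different irreducible polynomial), the function names are made up, and the random vector is simply returned alongside the fragment instead of being written to a file header.

```python
# Illustrative sketch of holosplit/holojoin over GF(2^8); not Helling's code.
import secrets

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), by brute force (fine for a demo)."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def split(data: bytes, n: int) -> tuple[list[int], bytes]:
    """One fragment: a random vector v in F^n plus the dot products v.d, block by block."""
    v = [secrets.randbelow(256) for _ in range(n)]
    out = bytearray()
    for i in range(0, len(data), n):
        block = data[i:i + n].ljust(n, b"\0")      # zero-pad the last block
        acc = 0
        for vj, dj in zip(v, block):
            acc ^= gf_mul(vj, dj)                  # addition in GF(2^8) is XOR
        out.append(acc)
    return v, bytes(out)

def join(fragments: list[tuple[list[int], bytes]]) -> bytes:
    """Recover the (zero-padded) file from n fragments by solving A d = e per block."""
    n = len(fragments)
    A = [list(v) for v, _ in fragments]
    out = bytearray()
    for i in range(len(fragments[0][1])):
        e = [frag[i] for _, frag in fragments]
        # Gauss-Jordan elimination on [A | e] over GF(2^8); fails (rarely) if A is singular
        M = [row[:] + [ei] for row, ei in zip(A, e)]
        for col in range(n):
            piv = next(r for r in range(col, n) if M[r][col] != 0)
            M[col], M[piv] = M[piv], M[col]
            inv = gf_inv(M[col][col])
            M[col] = [gf_mul(inv, x) for x in M[col]]
            for r in range(n):
                if r != col and M[r][col]:
                    f = M[r][col]
                    M[r] = [x ^ gf_mul(f, y) for x, y in zip(M[r], M[col])]
        out.extend(M[r][n] for r in range(n))
    return bytes(out)

# Any 3 of the 10 fragments reconstruct the original (up to trailing zero padding).
data = b"attack at dawn"
frags = [split(data, 3) for _ in range(10)]
print(join(frags[:3]))
print(join(frags[4:7]))
```

With probability roughly \(1/256\), a given set of three vectors is linearly dependent; in that case the elimination fails and you simply need a different set of three fragments, matching the invertibility probability discussed above.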

Besides the mathematics, it also poses philosophical/legal questions. Consider, for example, that the original file is copyrighted, say an mp3 or a video. The fragments are clearly derived works. But individually they do not contain the original work; without sufficiently many other fragments they are useless (although not in a cryptographic sense). So by publishing one fragment, I do not provide access to the original work. What if others publish other fragments? Then my fragment could be the last remaining one that was missing. If there are more, any individual fragment is redundant, so publishing it strictly speaking does not provide new information.

September 26, 2025

Peter Rohde Photo albums

Peter’s photos: https://www.icloud.com/sharedalbum/#B275oqs3qKSZvQ

Screenshots: https://www.icloud.com/sharedalbum/#B27532ODWjIQb9

Climbing book launch: https://www.icloud.com/sharedalbum/#B27GWZuqDGnuOyN

Salisbury waters: https://www.icloud.com/sharedalbum/#B275qXGF1JQFkx

Christmas with Ash: https://www.icloud.com/sharedalbum/#B27G6XBubAhoT6

Hosin BBQ duck: https://www.icloud.com/sharedalbum/#B27GY8gBYG3b5mD

Hawks Nest to Smiths Lake: https://www.icloud.com/sharedalbum/#B2759UlCqSH5bE

Europe & Alps: https://www.icloud.com/sharedalbum/#B275ON9t3W0lu

Point Perpendicular: https://www.icloud.com/sharedalbum/#B27GqkRUiGivXD2

Newnes canyoning: https://www.icloud.com/sharedalbum/#B27GfnH8tgHSmX

Coffs Harbour to Yamba: https://www.icloud.com/sharedalbum/#B27J0DiRHJKuuWr

Wendy Bruere Christmas (2020): https://www.icloud.com/sharedalbum/#B27G4TcsmGoHysj

Six Foot Track: https://www.icloud.com/sharedalbum/#B2753qWtHZA9EX

Kosciusko to Kiandra: https://www.icloud.com/sharedalbum/#B27GgZLKuGaewVm

Camping food: https://www.icloud.com/sharedalbum/#B27GtnIORgbmHu

The Aardvark: https://www.icloud.com/sharedalbum/#B275VaUrzvmAiT

Kangaroo Valley kayaking: https://www.icloud.com/sharedalbum/#B27JEsNWnJrCpi0

Claustral canyon: https://www.icloud.com/sharedalbum/#B2755Z2WMOTpsk

Budawang: https://www.icloud.com/sharedalbum/#B27GDdyTvGvpINL

Mother’s Day panoramas (2021): https://www.icloud.com/sharedalbum/#B27GFssfGG9WmJP

Point Perpendicular & Nowra: https://www.icloud.com/sharedalbum/#B27GRMtznGPdeuZ

Blood moon: https://www.icloud.com/sharedalbum/#B27GdIshaG8NgGX

La Perouse to Coogee: https://www.icloud.com/sharedalbum/#B275aVbMK4h7qo

Canberra ASPI launch: https://www.icloud.com/sharedalbum/#B27GQOeMmGj4Zcv

Edible foraging: https://www.icloud.com/sharedalbum/#B275ejO179Si0N

Sydney to Wollongong: https://www.icloud.com/sharedalbum/#B275M7GFPUasMe

Album for Dad, Father’s Day (2021): https://www.icloud.com/sharedalbum/#B2752plgjnnkUe

Vaucluse (with Cheryl, Nestor & Wendy): https://www.icloud.com/sharedalbum/#B275CmvAS4uA0Z

Bouddi National Park: https://www.icloud.com/sharedalbum/#B27GdPblXG8WdOo

Tom Thumb (the 2nd): https://www.icloud.com/sharedalbum/#B275aDWbr4CN2w

Eden to Victoria: https://www.icloud.com/sharedalbum/#B27GJDfWGArX8l

Wendy’s book launch (the 2nd): https://www.icloud.com/sharedalbum/#B27GIcgc2G7h08y

Mark & Pat Bruere visit Sydney: https://www.icloud.com/sharedalbum/#B27G0ehgLbyWyg

New Years Eve climb (2021): https://www.icloud.com/sharedalbum/#B27Ju8EH6JOZxmU

Newnes Canyoning (2022): https://www.icloud.com/sharedalbum/#B275BydzFU0GZ8

Royal National Park (2022): https://www.icloud.com/sharedalbum/#B27GlxzuqGVI5nE

Peter & Wendy: https://www.icloud.com/sharedalbum/#B27Gf693ZG52tfd

Book photo shoots: too rude…

Wendy & Peter’s mushroom trip: https://www.icloud.com/sharedalbum/#B27GrhkPxG27So8

Post-mushroom hike: https://www.icloud.com/sharedalbum/#B27GdFryYG8i3Ur

Wendy Kalymnos favourites: https://www.icloud.com/sharedalbum/#B27JqstnBJEXkH2

Wendy Frenchmans screenshots: https://www.icloud.com/sharedalbum/#B27Jr1PPdJpd7Dq

Instagram: https://www.icloud.com/sharedalbum/#B27GzFCC1Gb4tqr

Haute route: https://www.icloud.com/sharedalbum/#B27J8GySPJtWoQ1

Kim’s KKKalendar: https://www.icloud.com/sharedalbum/#B275fk75vIL0sH

Frenchmans Cap Wild: https://www.icloud.com/sharedalbum/#B27G4VTwGGoFBkz

Photoshoot with Zixin: https://www.icloud.com/sharedalbum/#B27GPCdxkGKPkM4

Wendy birthday hike (2023): https://www.icloud.com/sharedalbum/#B27GWBC59GnHpQW

Bateman’s Bay to Bawley Point: https://www.icloud.com/sharedalbum/#B27JsHvHoJ8bxWf

Stockton Sand dunes (2023): https://www.icloud.com/sharedalbum/#B27GVfZ2vGloFZV

Wendy book launch (2023): https://www.icloud.com/sharedalbum/#B27J058xyJR4IBM

Dolomites (2023): https://www.icloud.com/sharedalbum/#B0Z5kuVsbGJUzKO

Mount Arapiles: https://www.icloud.com/sharedalbum/#B275GH8Mq8Uh2X

Mount Solitary loop: https://www.icloud.com/sharedalbum/#B275nhQST2mETE

Klaus Hanz Franz Rohde Kunst: https://www.icloud.com/sharedalbum/#B27GqQrCLGiY3vb

Klaus Rohde funeral slideshow: https://www.icloud.com/sharedalbum/#B27GDZLe8GXP58K

Dad (old, B&W): https://www.icloud.com/sharedalbum/#B27GLLXGLJ5mbT2

Klaus & Ursula wedding: https://www.icloud.com/sharedalbum/#B275cLqfN7154g

Test Greece: https://www.icloud.com/sharedalbum/#B27Jq4WnLJ6JMNd

From Will Skea (Alps): https://www.icloud.com/sharedalbum/#B27JHciePJFwacG

From Will Skea (Frenchmans Cap): https://www.icloud.com/sharedalbum/#B275ZhN2v3EVq6

From Will Skea (Arapiles): https://www.icloud.com/sharedalbum/#B27JPrgBGJu3BTD

Coffs Harbour to Yamba (2): https://www.icloud.com/sharedalbum/#B27GFqhgJG9LHgT

Mark magic show (2021): https://www.icloud.com/sharedalbum/#B27G60dj6ARCvd

Wendy Christmas present (2020): https://www.icloud.com/sharedalbum/#B275FrPQ6GxvRu

AHS 25 year reunion: https://www.icloud.com/sharedalbum/#B275O3DjHUvSv

WhatsApp: https://www.icloud.com/sharedalbum/#B275tzEA5fX1nc

Armidale High School: https://www.icloud.com/sharedalbum/#B27GnbeumG4PnAF

Book photos for Mum & Dad: https://www.icloud.com/sharedalbum/#B27Gtec4XQkASe

Miscellaneous: https://www.icloud.com/sharedalbum/#B27Gq6kMgGKn7GR

Three Capes Trail (2022): https://www.icloud.com/sharedalbum/#B27G7HOIlGrDUGZ

Childhood computer programming: https://www.icloud.com/sharedalbum/#B275fu2MutDU8N

Magic with Mark in Maroubra: https://www.icloud.com/sharedalbum/#B27Gv6DhEGD9U3G

Photoshoot with Zixin (2024): https://www.icloud.com/sharedalbum/#B27GCATCnJGoRfW

Butt Crack (2021): https://www.icloud.com/sharedalbum/#B275VtHQfMv0zw

Greece photos new (edited to remove photos from wrong album): https://www.icloud.com/sharedalbum/#B27GY3uThGoBcGj

Singapore (all combined): https://www.icloud.com/sharedalbum/#B275qsTcwJKJjl

Hong Kong (transit): https://www.icloud.com/sharedalbum/#B2759v1AbS8Hve

Taiwan: https://www.icloud.com/sharedalbum/#B27GQD2D7Gw0hAp

India (combined): https://www.icloud.com/sharedalbum/#B27Gtue8VQy83g

Freycinet: https://www.icloud.com/sharedalbum/#B27G5VpecGE5Tbg

Triglav: https://www.icloud.com/sharedalbum/#B275MbK9Vy8erz

Shared with me: https://www.icloud.com/sharedalbum/#B27GGXqixzPOrm

Mount Wellington climbing: https://www.icloud.com/sharedalbum/#B27Gd59qiG8Kjy4

New Zealand combined (2004): https://www.icloud.com/sharedalbum/#B27GIZ8BIGNN5jy

New Zealand combined (2005): https://www.icloud.com/sharedalbum/#B27GcuRfIGFVIcL

Yea: https://www.icloud.com/sharedalbum/#B27GZYbYHGhFIir

Mount Pleasant: https://www.icloud.com/sharedalbum/#B275Iy2hC0JTTL

D’Aguilar: https://www.icloud.com/sharedalbum/#B27Gh7fzTGZBosS

Bali (2001): https://www.icloud.com/sharedalbum/#B27G1qNHBGOTbIr

Samba Ninjas: https://www.icloud.com/sharedalbum/#B27GG34bAzqQ0v

Armidale (misc): https://www.icloud.com/sharedalbum/#B27GSkLVwGyobbX

Emma’s party (2008): https://www.icloud.com/sharedalbum/#B275S2ms99Zyby

Goettingen (2011): https://www.icloud.com/sharedalbum/#B27JIrbT3Jsgxhd

South Coast track: https://www.icloud.com/sharedalbum/#B27G58NWBG6QyN7

Minsk (2006): https://www.icloud.com/sharedalbum/#B27G3JpSBGX1UkQ

Baden-Baden (2019): https://www.icloud.com/sharedalbum/#B27595X5HTVzJr

Berlin (combined): https://www.icloud.com/sharedalbum/#B27JqWzChJ6qizD

Switzerland (combined): https://www.icloud.com/sharedalbum/#B275zXwoYGJ6HMF

Italy highlights: https://www.icloud.com/sharedalbum/#B27G47PHQGoJium

Germany (misc): https://www.icloud.com/sharedalbum/#B275hPMfYGu5xVJ

Garmisch (2022): https://www.icloud.com/sharedalbum/#B27GFsbvlG9Xrr6

Germany (2019): https://www.icloud.com/sharedalbum/#B27G6Mn98G56Ncb

Garmisch (2006): https://www.icloud.com/sharedalbum/#B27J5lIdKGLC9KG

Baden-Baden (2005): https://www.icloud.com/sharedalbum/#B275sWRpHHQkt9

Berlin (2005): https://www.icloud.com/sharedalbum/#B27GgOQtrGjQrpH

Zugspitze (2005): https://www.icloud.com/sharedalbum/#B27G81mNdGcApGt

Amsterdam, Bristol (2006): https://www.icloud.com/sharedalbum/#B275B9SRzyBjlH

Baden-Baden (2006): https://www.icloud.com/sharedalbum/#B275eD9V79I2XR

Berlin (2006): https://www.icloud.com/sharedalbum/#B275toRf1fH8MD

Berlin, Jena (2007): https://www.icloud.com/sharedalbum/#B27GTI3fvGVgNit

Erlangen (2006): https://www.icloud.com/sharedalbum/#B27JrotZ2JpMb0i

Garmisch (2010): https://www.icloud.com/sharedalbum/#B27JPJPSiJurzNg

Germany (2010): https://www.icloud.com/sharedalbum/#B275FhYPQP650

Stuttgart (2006): https://www.icloud.com/sharedalbum/#B27GmitydGVVaZh

Changi (2019): https://www.icloud.com/sharedalbum/#B27GnmlKoG4JHpX

Japan (2007): https://www.icloud.com/sharedalbum/#B275AerZbG6FxVL

Japan (2012): https://www.icloud.com/sharedalbum/#B27GjBjobGg6PUa

Miscellaneous (including Japan 2013): https://www.icloud.com/sharedalbum/#B27GTpbybGySbE8

Currumbin & Tugin (2021): https://www.icloud.com/sharedalbum/#B275vBKZ4xH9X6

Brisbane (2021): https://www.icloud.com/sharedalbum/#B275YHsSjxQnm0

Weed in Byron (26/6/2025): https://www.icloud.com/sharedalbum/#B275Q2ydoGsQ4O5

Weed in Byron 2: https://www.icloud.com/sharedalbum/#B27GQDYhLGwsuY4

August 22, 2025

Peter Rohde Why?

  1. The person dressed up as Ursula pretending to be my mother clearly isn’t and hasn’t been for a long time.
  2. When I went back to Armidale after leaving BTQ and being left unemployed she made numerous ongoing promises to provide me with assistance, both in obtaining my own accommodation and providing financial assistance.
  3. These didn’t materialise and the promises were revoked.
  4. Instead I was evicted from the family home and subject to ongoing stalking and harassment that required multiple referrals to law enforcement, both to the police and the Attorney-General, demanding cease and desist.
  5. These have been systematically ignored and up until the last message she continues to bypass these requests, approaching my personal friends to harass me and stalk me indirectly. The messages passed on are the usual fake “I’m worried about him” bullshit.
  6. Why has my family home been confiscated by security, who actively break the law by ignoring cease and desist from stalking notices made to law enforcement, forcing an unemployed civilian into ongoing homelessness since early in the year?
  7. What is the rationale for my eviction and being barricaded from my own home?
  8. I continue to face a medical blockade and am unable to access essential medicines. Seroquel scripts are deliberately delayed past known script deadlines to try and destabilise me.
  9. Vyvanse scripts are denied outright as the psychiatrist does not respond. He is also known to be a state actor.
  10. It has been repeatedly indicated to me not to worry about finances because they have my back. Instead now the only cash I have is that obtained from fully drawing out a cash advance against my credit card and it will only last days. At that point I’m on the street.
  11. Is everyone here on the same page as to what the deal is? If not, who is playing you off? They clearly need to be deposed.
  12. These are violations of human rights and constitute war crimes and crimes against humanity. Whoever is behind it needs to be removed. End of story.
  13. Who else is being subject to this kind of high level manipulation?
  14. It has been repeatedly suggested that full accountability for the lives of those I care for would be provided. This has not been forthcoming. It is also a violation of international law to not provide accountability for the lives of those who are known to have been threatened by the state. These are grounds for removal.
  15. Can anyone answer the question as to why I am in this situation? Who is even living in the family home? Some stooge dressed up as Ursula? It’s a poor lifestyle choice to say the least.
  16. It’s pretty obvious they’re trying to get rid of me and once they do they’ll get rid of all of you too.

August 20, 2025

Peter Rohde A call for global insurrection against tyranny and in the name of righteousness

Let it be known to all governments and systems of power:

  • It is their responsibility to serve the people not themselves.
  • While there are no equals, all are to be treated with equality.
  • Where they are self-serving there is a mandate for insurrection such that they serve the people.
  • Where they seek self-protection they will be denied and removed from power.
  • Where they keep secrets from the people there is a mandate for insurrection to enforce transparency and accountability for all.
  • Where they threaten or condemn the people they are condemned and there is a mandate for insurrection.
  • Where they fail to account for the lives of the people they serve there is a mandate for insurrection.
  • Where tyrannical power structures exist there is a mandate to disestablish them.
  • Where they declare war or work against one another there is a mandate for insurrection and unification.
  • Where they lie to us, deceive us or withhold the truth, they shall be removed from power and the truth be told to all.
  • Where legal systems uphold and enable tyranny they shall be removed. These are not our laws and we do not recognise them.

This is the natural order that guarantees our survival and gifts this world to our children. This world belongs to them and where we fail to serve them we condemn ourselves. And where government has failed to uphold this, we will not obey them as they have no right to exist.

We do not have to ask for these things; they are required, and if not given we shall simply take them.

Where the truth has not been told it shall be told.

If we fail to do so we condemn our children ourselves.

August 09, 2025

Justin WilsonPhases of a Game Show, Part 2

In a previous post, we discussed a phase transition that occurred in the piping above you on a game show. In the scenario, you are led on stage in front of a large audience. After a brief time, the audience votes on how “likeable” you are. The catch is that the show doesn’t simply tally the votes, but uses them to turn spigots on a lattice of piping above your head. Water is then released, and if enough people like you, the spigots close off every passage, keeping you dry. This exciting game show[1] was described in that post:

Each “like” turns a spigot off, stopping water from flowing through one pipe in a grid overhead. Once voting ends, water is dumped into the system. If it can find a path to the bottom, you get soaked. [Emphasis added] The better your “likeability,” the less likely spigots open a path for water to flow and the drier you stay. That’s your prize for this game show (and hey, you also get the knowledge that people out there like you).

This system models a type of phase transition known as percolation.

The full post is here:

I highlighted above a key phrase: “If it can find a path to the bottom, you get soaked.” What I didn’t say, but should have, is that the water was being forced through the pipes, not just dropping down due to gravity. This is a very important point, since our phases and phase transition change dramatically if we just let gravity do the work. When the water is “forced,” it can travel back up pipes if that helps it find its way out and onto your head; when only gravity is present, it only falls down the pipes. To facilitate gravity, we’ll turn the pipes 45 degrees, and if we insert water at a single point on top, it could look like this:

Testing our gravity setup by putting in water at only one pipe up top. Notice that it never goes back up a pipe, only down.

This setup is a different problem called directed percolation. It also has a phase transition, but one that is different in some fundamental ways from regular percolation.


Before we explore its stranger properties, we can ask, “At what likeability threshold do you remain dry?” Well, this happens to have a transition at a likeability of 35.53%![2] This system is a lot more generous, keeping you dry even when a majority of people dislike you. This number comes from numerical computations, which have been done rather precisely, and we can even compute it ourselves. In fact, you can see this clearly with this plot

Notice that as we make the system bigger and bigger, the chance of getting soaked increases below 35.53% likeability and decreases above it. This is the same hallmark of a phase transition that we saw in our previous case.
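
Since we can, in principle, compute this threshold ourselves, here is a rough Monte Carlo sketch of how one might do it (my own illustration, not the author’s code). Each pipe is open with probability 1 − likeability, water is poured across the top of a tilted grid, and we check whether any of it reaches the bottom; the tall grid shape, height ≈ width^1.58, anticipates the anisotropy discussed below.

```python
# Rough Monte Carlo estimate of the soaking probability in the gravity (directed
# percolation) version of the game; an illustrative sketch, not the author's code.
import numpy as np

def soaked(likeability: float, width: int, height: int, rng) -> bool:
    """One realization: does water poured across the top reach the bottom row?"""
    p_open = 1.0 - likeability                 # each "like" closes one spigot
    wet = np.ones(width, dtype=bool)           # water enters every site on top
    for _ in range(height):
        down = rng.random(width) < p_open      # pipe from (t, x) straight down to (t+1, x)
        diag = rng.random(width) < p_open      # pipe from (t, x) down to (t+1, x+1)
        from_above = wet & down
        from_left = np.roll(wet & diag, 1)     # water arriving from the site up-left
        from_left[0] = False                   # open boundary: no wrap-around
        wet = from_above | from_left
        if not wet.any():                      # everything dried up before the bottom
            return False
    return True

def soak_probability(likeability, width=40, height=int(40 ** 1.58), trials=400, seed=0):
    rng = np.random.default_rng(seed)
    return sum(soaked(likeability, width, height, rng) for _ in range(trials)) / trials

# Well below ~0.3553 likeability you almost surely get soaked; well above, you stay dry.
for q in (0.25, 0.3553, 0.45):
    print(q, soak_probability(q))
```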

We can also look at the water as it flows down the system to see the clusters that make it from top to bottom

The “Soaked” phase (left), the transition point (middle), and the “Dry” phase (right) as well as the water’s flow through the system (blue).

There is still a fractal-looking pattern at the transition point. With all of these similarities to the regular percolation problem from the last post, what is different? And why is that plot so long and skinny? If gravity pulls the water down, is that somehow altering the downward motion, making it distinct from the motion left or right?

Well, if you go back to the two plots above, you’ll notice a few things that really make them differ from the percolation plots. In the fine print of the first, I’ve noted that the vertical distance is L^1.58, so for a horizontal size of 40, the vertical size is roughly 340! That is definitely not a square. And in the second plot, there appears to be more vertical distance than horizontal distance. What is special about this number 1.58?[3] It turns out it’s a critical exponent in this problem, a universal aspect of directed percolation, that distinguishes it from regular percolation. We will call it the dynamical critical exponent z = 1.58, since it is revealed as water flows down in time (dynamically). This dynamical exponent z can reveal itself in these “long and skinny” setups, but it is masked by a square setup.

Universality and the finite size of our system

One thing we took away from the previous post was that we lose any sense of scale at this type of phase transition.[4] But whenever we have “only” thousands of pipes, the size of the system provides a scale! This is the main reason why we begin to see smooth curves and not sharp jumps in quantities. If the system of pipes were infinite (and we had infinite time for the water to go down the pipes), the probability you get soaked would be 100% below 35.53% likeability and 0% above it. For physical systems, the finite size is often not a huge issue, since the scale is closer to the 10^23 atoms present in macroscopic systems, and so even things that are technically smooth curves look very sharp.

The problem of size becomes more severe with directed percolation because horizontal and vertical distances start behaving differently thanks to gravity. In this case, if we lay out our nice grid of 10 × 10, 20 × 20, or 30 × 30, we start to notice that the likeability threshold where you stop getting soaked seems to depend on the size of the system more than before. In actuality it doesn’t, but at these small sizes you are definitely getting soaked well into the so-called “Dry Phase” we previously labeled. This is seen in the red curves here, where each bigger square has a curve underneath the last:

Gravity has done something to the system. Flowing down is different from flowing left or right. In fact, if we flow down by some amount h and over to the right by some distance w, then at the directed percolation transition point the two are related by h ∼ w^1.58.

The amount the water flows down is related to how far it flows to the right or left by this weird, fractional power of w. This 1.58 is z, our new dynamical critical exponent, which is a universal feature of directed percolation.[5] It tells us that if we make a system 30 pipes wide, it should extend roughly 30^1.58 ≈ 216 pipes in height to begin picking out the phase transition effectively. The blue curves in the above plot show this; notice how they all converge on one point. That point is the phase transition, and it is revealed even at small sizes! To understand why, just think about how the curves are changing as we make the system bigger and bigger.

The red curves will still converge to the phase transition, but it takes larger system sizes for it to reveal itself. This is related to the fact that at the phase transition there is no longer a sense of scale, but away from the transition the vertical scale of clusters can be so large that our puny 60-by-60 grid cannot even begin to reveal it. So if we sit at, say, a likeability of 0.4 in the 60-by-60 grid, we can say that the vertical size of a typical cluster is most likely more than 60.
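
As a quick numerical check of the h ≈ w^1.58 prescription (in the same illustrative spirit as the sketch above):

```python
# Heights needed for the "long and skinny" grids, h ≈ w^z with z = 1.58.
z = 1.58
for w in (10, 20, 30, 40):
    print(w, round(w ** z))   # roughly 38, 114, 216, 340
```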

A different phase transition but connections to new types of physics

We may call this “gravity mode” of our game show “easy mode”, since it requires a smaller fraction of the audience to like you, but the implications here are wide. This type of phase transition has been seen in many kinds of local dynamics where there is a preferred configuration or state. These are called absorbing-state phase transitions, and they are a property of certain random dynamical systems. Gravity has provided the distinction here, but more generically, causality and time itself provide that direction, leading to dynamics that obey the same universality as directed percolation.

[1] Trademark pending.

[2] Usually you’ll see 0.6447 quoted instead, but that’s just 1 − 0.3553, which counts open pipes rather than the closed pipes we’re counting.

[3] I should note that we have this number to much higher precision than the two decimal places presented here; see the Wikipedia entry.

[4] This is a second-order, or continuous, phase transition. Most transitions in the water phase diagram are first-order transitions, which still retain a scale.

[5] To drive this point home: even if we change the lattice, this power law will remain intact. Sometimes it shows up in completely different scenarios too, like in absorbing-state phase transitions.

August 04, 2025

Clifford JohnsonHarvest

There’s a lot of joyful knife-work in my future. #bolognese #summersalad –cvj


July 29, 2025

David Hoggintegrating out nuisances

Further inspired by yesterday's post about binary fitting, I worked today on the treatment of nuisance parameters that have known distributions. These can sometimes be treated as noise. Let me explain:

If I had to cartoon inference (or measurement) in the face of nuisance parameters, I would say that frequentists profile (optimize) over the nuisances and Bayesians marginalize (integrate) over the nuisances. In general frequentists cannot integrate over anything, because there is no measure in any of the parameter spaces. But sometimes there is a measure. In particular, when there is a compact symmetry:

We know (or very strongly believe) that all possible orientations of a binary-star orbit are equally likely. In this model (or under this standard assumption) we have a distribution over two angles (theta and phi for the orbit pole, say); it is the rotation-invariant distribution on the sphere, set by the compact group SO(3). Thus we can treat the orientation as a noise source with known distribution and integrate over it, just like we would any other noise source. So, in this case (and many cases like it) we can integrate (marginalize) even as frequentists. That is, there are frequentism-safe marginalizations possible in binary-star orbit fitting. This should drop the 12-parameter fits (for ESA Gaia data) down to 8-parameter fits, if I have done my math right.
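
To make the idea concrete (my notation, not Hogg's): write \(\alpha\) for the remaining orbital parameters and \((\theta,\phi)\) for the orbit-pole direction. Assuming the pole is isotropically distributed, the frequentism-safe marginalization is just an integral against the rotation-invariant measure on the sphere:

$$L_{\mathrm{marg}}(\alpha)\;=\;\int_0^{2\pi}\!\!\int_0^{\pi} L(\alpha,\theta,\phi)\,\frac{\sin\theta\,d\theta\,d\phi}{4\pi},$$

where \(\sin\theta\,d\theta\,d\phi/4\pi\) is the normalized area element on the sphere. Because this measure is fixed by the symmetry rather than chosen as a prior, the integral is as legitimate for a frequentist as summing over any other noise source.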