Planet Musings

January 21, 2025

Scott Aaronson Open letter to any Shtetl-Optimized readers who know Elon

Did Elon Musk make a Nazi salute? Well, not exactly. As far as I can tell, the truth is that he recklessly and repeatedly made a hand gesture that the world’s millions of Nazi sympathizers eagerly misinterpreted as a Nazi salute. He then (the worse part) declined to clarify or apologize in any way, opting instead for laugh emojis.

I hasten to add: just like with Trump’s Charlottesville dogwhistles, I find it ludicrous to imagine that Elon has any secret desire to reopen the gas chambers or whatever—and not only because of Elon’s many pro-Zionist and philosemitic actions, statements, and connections. That isn’t the issue, so don’t pretend I think it is.

Crucially, though, “not being a literal Nazi” isn’t fully exculpatory. I don’t want the overlords of the planet treating these matters as jokes. I want them to feel the crushing weight of history, exactly like I would feel it in their shoes.

Regardless of my distaste for everything that happened to reach this point, Elon is now in a unique position to nudge Trump in the direction of liberality and enlightenment on various issues.  And while I doubt Elon finds time to read Shtetl-Optimized between his CEOing, DOGEing, tweeting, and video game speedruns, I know for certain that there are multiple readers of this blog to whom Elon has listened in the past—and those people are now in a unique position too!

A public “clarification” from Elon—not an apology, not an admission of guilt, but just an acknowledgment that he knows why sleeping dragons like Nazism shouldn’t be poked for shits and giggles, that he’ll try to be careful in the future—would be a non-negligible positive update for me about the future of the world.

I understand exactly why he doesn’t want to do it: because he doesn’t want to grant any legitimacy to what he sees as the biased narrative of a legacy media that despises him. But granting some legitimacy to that narrative is precisely what I, a classically liberal Jewish scientist who bears the battle scars of attempted woke cancellation, am asking him to do. I’m asking him to acknowledge that he’s now by any measure one of the most powerful people on the planet, that with great power comes great responsibility, and that fascism is a well-known failure mode for powerful rightists, just like Communism is a well-known failure mode for leftists. I’m asking for reassurance that he takes that failure mode seriously, just like he correctly takes human extinction and catastrophic AI risk seriously.

Anyway, I figured it was worth a try, given how much I really believe might hinge on how Elon chooses to handle this. I don’t want to be kicking myself, for the rest of my life, that I had a chance to intervene in the critical moment and didn’t.

Scott Aaronson The mini-singularity

Err, happy MLK Day!

This week represents the convergence of so many plotlines that, if it were the season finale of some streaming show, I’d feel like the writers had too many balls in the air. For the benefit of the tiny part of the world that cares what I think, I offer the following comments.


My view of Trump is the same as it’s been for a decade—that he’s a con man, a criminal, and the most dangerous internal threat the US has ever faced in its history. I think Congress and Merrick Garland deserve eternal shame for not moving aggressively to bar Trump from office and then prosecute him for insurrection—that this was a catastrophic failure of our system, one for which we’ll now suffer the consequences. If this time Trump got 52% of some swing state rather than 48%, if the “zeitgeist” or the “vibes” have shifted, if the “Resistance” is so weary that it’s barely bothering to show up, if Bezos and Zuckerberg and Musk and even Sam Altman now find it expedient to placate the tyrant rather than standing up for what previously appeared to be their principles—well, I don’t see how any of that affects how I ought to feel.

All the same, I have no plans to flee the United States or anything, just like I didn’t the last time. I’ll even permit myself pleasure when the crazed strongman takes actions that I happen to agree with (like pushing the tottering Ayatollah regime toward its well-deserved end). And then I’ll vote for Enlightenment values (or the nearest available approximation) in 2026 and 2028, assuming the country survives until then.


The second plotline is the ceasefire in Gaza, and the beginning of the release of the Israeli hostages, in exchange for thousands of Palestinian prisoners. I have all the mixed emotions you might expect. I’m terrified about the precedent this reinforces and about the many mass-murderers it will free—as I was terrified in 2011 by the Gilad Shalit deal, the one that released Sinwar and thereby set the stage for October 7. Certainly World War II didn’t end with the Nazis marching triumphantly around Berlin, guns in the air, and vowing to repeat their conquest of Europe at the earliest opportunity. All the same, it’s not my place to be more Zionist than Netanyahu, or than the vast majority of the Israeli public that supported the deal. I’m obviously thrilled to see the hostages return, and even slightly touched by the ethic that would move heaven and earth to save these specific people, almost every consideration of game theory and utilitarianism be damned. I take solace that we’re not quite returning to the situation of October 6, since Hamas, Hezbollah, and Iran itself have all been severely degraded (and the Assad regime no longer exists). This is no longer 1944, when you can slaughter 1200 Jews without paying any price for it: that was the original promise of the State of Israel. All the same, I fear that bloodshed will continue from here until the Singularity, unless majorities on both sides choose coexistence—partition, the two-state solution, call it whatever you will. And that’s primarily a question of culture, and the education of children.


The third plotline was the end of TikTok, quickly followed by its (temporary?) return on Trump’s order. As far as I can tell, Instagram, Twitter/X, and TikTok have all been net negatives for the world; it would’ve been far better if none of them had been invented. But, OK, our society allows many things that are plausibly net-negative, like sports betting and Cheetos. In this case, however, the US Supreme Court ruled 9-0 (!!) that Congress has a legitimate interest in keeping Chinese Communist Party spyware off 170 million Americans’ phones—and that there’s no First Amendment concern that overrides this security interest, since the TikTok ban isn’t targeting speech on the basis of its content. I found the court’s argument convincing. I hope TikTok goes dark 90 days from now—or, second-best, that it gets sold to some entity that’s merely bad in the normal ways and not a hostile foreign power.


The fourth plotline is the still-ongoing devastation of much of Los Angeles. I heard from friends at Caltech and elsewhere who had to evacuate their homes—but at least they had homes to return to, as those in Altadena and the Palisades didn’t. It’s a sign of the times that even a disaster of this magnitude now brings only partisan bickering: was the cause climate change, reshaping the entire planet in terrifying ways, just like all those experts have been warning for decades? Or was it staggering lack of preparation from the California and LA governments? My own answers to these questions are “yes” and “yes.”

Maybe I’ll briefly highlight the role of the utilitarianism versus deontology debate. According to this article from back in October, widely shared once the fires started, the US Forest Service halted controlled burns in California because it lacked the manpower, but also this:

“I think the Forest Service is worried about the risk of something bad happening [with a prescribed burn]. And they’re willing to trade that risk — which they will be blamed for — for increased risks on wildfires,” Wara said. In the event of a wildfire, “if something bad happens, they’re much less likely to be blamed because they can point the finger at Mother Nature.”

We saw something similar with the refusal to allow challenge trials for the COVID vaccines, which could’ve moved the approval date up by months and saved millions of lives. Humans are really bad at trolley problems, at weighing a concrete, immediate risk against a diffuse future risk that might be orders of magnitude worse. (Come to think of it, Israel’s repeated hostage deals are another example—though that one has the defense that it demonstrates the lengths to which the state will go to protect its people.)


Oh, and on top of all the other plotlines, today—January 20th—is my daughter’s 12th birthday. Happy birthday Lily!!

January 20, 2025

John Preskill Ten lessons I learned from John Preskill

Last August, Toronto’s Centre for Quantum Information and Quantum Control (CQIQC) gave me 35 minutes to make fun of John Preskill in public. CQIQC was hosting its biannual conference, also called CQIQC, in Toronto. The conference features the awarding of the John Stewart Bell Prize for fundamental quantum physics. The prize derives its name from the thinker who transformed our understanding of entanglement. John received this year’s Bell Prize for identifying, with collaborators, how we can learn about quantum states from surprisingly few trials and measurements.

The organizers invited three Preskillites to present talks in John’s honor: Hoi-Kwong Lo, who’s helped steer quantum cryptography and communications; Daniel Gottesman, who’s helped lay the foundations of quantum error correction; and me. I believe that one of the most fitting ways to honor John is by sharing the most exciting physics you know of. I shared about quantum thermodynamics for (simple models of) nuclear physics, along with ten lessons I learned from John. You can watch the talk here and check out the paper, recently published in Physical Review Letters, for technicalities.

John has illustrated this lesson by wrestling with the black-hole-information paradox, including alongside Stephen Hawking. Quantum information theory has informed quantum thermodynamics, as Quantum Frontiers regulars know. Quantum thermodynamics is the study of work (coordinated energy that we can harness directly) and heat (the energy of random motion). Systems exchange heat with heat reservoirs—large, fixed-temperature systems. As I draft this blog post, for instance, I’m radiating heat into the frigid air in Montreal Trudeau Airport.

So much for quantum information. How about high-energy physics? I’ll include nuclear physics in the category, as many of my European colleagues do. Much of nuclear physics and condensed matter involves gauge theories. A gauge theory is a model that contains more degrees of freedom than the physics it describes. Similarly, a friend’s description of the CN Tower could last twice as long as necessary, due to redundancies. Electrodynamics—the theory behind light bulbs—is a gauge theory. So is quantum chromodynamics, the theory of the strong force that holds together a nucleus’s constituents.

Every gauge theory obeys Gauss’s law. Gauss’s law interrelates the matter at a site to the gauge field around the site. For example, imagine a positive electric charge in empty space. An electric field—a gauge field—points away from the charge at every spot in space. Imagine a sphere that encloses the charge. How much of the electric field is exiting the sphere? The answer depends on the amount of charge inside, according to Gauss’s law.
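To make Gauss’s law concrete, here’s a minimal numerical check of my own (not part of the talk), assuming SI units and an arbitrary charge and sphere radius: the flux of the Coulomb field through the sphere comes out to q/\epsilon_0, no matter what radius you pick.

```python
# Minimal numerical check of Gauss's law for a point charge at the origin.
# The charge and radius are made-up numbers; the answer doesn't depend on R.
import numpy as np
from scipy.constants import epsilon_0

q, R = 1.0, 0.37   # coulombs, metres

# On a sphere centred on the charge, E is radial with constant magnitude,
# so E . n = |E| everywhere on the surface.
E_mag = q / (4 * np.pi * epsilon_0 * R**2)

# Integrate |E| dA over the sphere in spherical coordinates.
thetas = np.linspace(0, np.pi, 400)
phis = np.linspace(0, 2 * np.pi, 400, endpoint=False)
dtheta, dphi = thetas[1] - thetas[0], phis[1] - phis[0]
TH, PH = np.meshgrid(thetas, phis, indexing="ij")
flux = np.sum(E_mag * R**2 * np.sin(TH)) * dtheta * dphi

print(flux, q / epsilon_0)   # the two numbers agree, up to the quadrature error
```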

Gauss’s law interrelates the matter at a site with the gauge field nearby…which is related to the matter at the next site…which is related to the gauge field farther away. So everything depends on everything else. So we can’t easily claim that over here are independent degrees of freedom that form a system of interest, while over there are independent degrees of freedom that form a heat reservoir. So how can we define the heat and work exchanged within a lattice gauge theory? If we can’t, we should start biting our nails: thermodynamics is the queen of the physical theories, a metatheory expected to govern all other theories. But how can we define the quantum thermodynamics of lattice gauge theories? My colleague Zohreh Davoudi and her group asked me this question.

I had the pleasure of addressing the question with five present and recent Marylanders…

…the mention of whom in my CQIQC talk invited…

I’m a millennial; social media took off with my generation. But I enjoy saying that my PhD advisor enjoys far more popularity on social media than I do.

How did we begin establishing a quantum thermodynamics for lattice gauge theories?

Someone who had a better idea than I, when I embarked upon this project, was my colleague Chris Jarzynski. So did Dvira Segal, a University of Toronto chemist and CQIQC’s director. So did everyone else who’d helped develop the toolkit of strong-coupling thermodynamics. I’d only heard of the toolkit, but I thought it sounded useful for lattice gauge theories, so I invited Chris to my conversations with Zohreh’s group.

I didn’t create this image for my talk, believe it or not. The picture already existed on the Internet, courtesy of this blog.

Strong-coupling thermodynamics concerns systems that interact strongly with reservoirs. System–reservoir interactions are weak, or encode little energy, throughout much of thermodynamics. For example, I exchange little energy with Montreal Trudeau’s air, relative to the amount of energy inside me. The reason is, I exchange energy only through my skin. My skin forms a small fraction of me because it forms my surface. My surface is much smaller than my volume, which is proportional to the energy inside me. So I couple to Montreal Trudeau’s air weakly.

My surface would be comparable to my volume if I were extremely small—say, a quantum particle. My interaction with the air would encode loads of energy—an amount comparable to the amount inside me. Should we count that interaction energy as part of my energy or as part of the air’s energy? Could we even say that I existed, and had a well-defined form, independently of that interaction energy? Strong-coupling thermodynamics provides a framework for answering these questions.

Kevin Kuns, a former Quantum Frontiers blogger, described how John explains physics through simple concepts, like a ball attached to a spring. John’s gentle, soothing voice resembles a snake charmer’s, Kevin wrote. John charms his listeners into returning to their textbooks and brushing up on basic physics.

Little is more basic than the first law of thermodynamics, synopsized as energy conservation. The first law governs how much a system’s internal energy changes during any process. The energy change equals the heat absorbed, plus the work absorbed, by the system. Every formulation of thermodynamics should obey the first law—including strong-coupling thermodynamics. 

Which lattice-gauge-theory processes should we study, armed with the toolkit of strong-coupling thermodynamics? My collaborators and I implicitly followed two more of John’s lessons.

We don’t want to irritate experimentalists by asking them to run difficult protocols. Tom Rosenbaum, on the left of the previous photograph, is a quantum experimentalist. He’s also the president of Caltech, so John has multiple reasons to want not to irritate him.

Quantum experimentalists have run quench protocols on many quantum simulators, or special-purpose quantum computers. During a quench protocol, one changes a feature of the system quickly. For example, many quantum systems consist of particles hopping across a landscape of hills and valleys. One might flatten a hill during a quench.

We focused on a three-step quench protocol: (1) Set the system up in its initial landscape. (2) Quickly change the landscape within a small region. (3) Let the system evolve under its natural dynamics for a long time. Step 2 should cost work. How can we define the amount of work performed? By following another of John’s lessons.

John wrote a blog post about how the typical physicist is a one-trick pony: they know one narrow subject deeply. John prefers to know two subjects. He can apply insights from one field to the other. A two-trick pony can show that Gauss’s law behaves like a strong interaction—that lattice gauge theories are strongly coupled thermodynamic systems. Using strong-coupling thermodynamics, the two-trick pony can define the work (and heat) exchanged within a lattice gauge theory. 
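To make “work” concrete without the full strong-coupling machinery, here’s a toy sketch, not the definition used in our paper (ours has to account for the strong matter–gauge-field coupling): for a sudden quench, the simplest textbook notion of average work is \langle W \rangle = \mathrm{Tr}[\rho (H_{\rm after} - H_{\rm before})], evaluated below for a made-up two-qubit Hamiltonian whose landscape is changed abruptly on one site.

```python
# Toy illustration of sudden-quench work (made-up two-qubit Hamiltonian,
# not the strong-coupling definition from the paper).
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def two_site(h1, h2, J=0.5):
    """H = h1 Z(x)I + h2 I(x)Z + J X(x)X, with made-up couplings."""
    return h1 * np.kron(Z, I2) + h2 * np.kron(I2, Z) + J * np.kron(X, X)

H_before = two_site(h1=1.0, h2=1.0)
H_after = two_site(h1=0.2, h2=1.0)        # "flatten a hill" on site 1 only

# start in the ground state of the pre-quench Hamiltonian
evals, evecs = np.linalg.eigh(H_before)
psi0 = evecs[:, 0]
rho = np.outer(psi0, psi0.conj())

avg_work = np.real(np.trace(rho @ (H_after - H_before)))
print(avg_work)   # average energy the sudden quench pumps into (or out of) the system
```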

An experimentalist can easily measure the amount of work performed,1 we expect, for two reasons. First, the experimentalist need measure only the small region where the landscape changed. Measuring the whole system would be tricky, because it’s so large and it can contain many particles. But an experimentalist can control the small region. Second, we proved an equation that should facilitate experimental measurements. The equation interrelates the work performed1 with a quantity that seems experimentally accessible.

My team applied our work definition to a lattice gauge theory in one spatial dimension—a theory restricted to living on a line, like a caterpillar on a thin rope. You can think of the matter as qubits2 and the gauge field as more qubits. The system looks identical if you flip it upside-down; that is, the theory has a \mathbb{Z}_2 symmetry. The system has two phases, analogous to the liquid and ice phases of H_2O. Which phase the system occupies depends on the chemical potential—the average amount of energy needed to add a particle to the system (while the system’s entropy, its volume, and more remain constant).

My coauthor Connor simulated the system numerically, calculating its behavior on a classical computer. During the simulated quench process, the system began in one phase (like H_2O beginning as water). The quench steered the system around within the phase (as though changing the water’s temperature) or across the phase transition (as though freezing the water). Connor computed the work performed during the quench.1 The amount of work changed dramatically when the quench started steering the system across the phase transition. 

Not only could we define the work exchanged within a lattice gauge theory, using strong-coupling quantum thermodynamics. Also, that work signaled a phase transition—a large-scale, qualitative behavior.

What future do my collaborators and I dream of for our work? First, we want an experimentalist to measure the work1 spent on a lattice-gauge-theory system in a quantum simulation. Second, we should expand our definitions of quantum work and heat beyond sudden-quench processes. How much work and heat do particles exchange while scattering in particle accelerators, for instance? Third, we hope to identify other phase transitions and macroscopic phenomena using our work and heat definitions. Fourth—most broadly—we want to establish a quantum thermodynamics for lattice gauge theories.

Five years ago, I didn’t expect to be collaborating on lattice gauge theories inspired by nuclear physics. But this work is some of the most exciting I can think of to do. I hope you think it exciting, too. And, more importantly, I hope John thought it exciting in Toronto.

I was a student at Caltech during “One Entangled Evening,” the campus-wide celebration of Richard Feynman’s 100th birthday. So I watched John sing and dance onstage, exhibiting no fear of embarrassing himself. That observation seemed like an appropriate note on which to finish with my slides…and invite questions from the audience.

Congratulations on your Bell Prize, John.

1Really, the dissipated work.

2Really, hardcore bosons.

January 19, 2025

John Baez Magnetohydrodynamics

Happy New Year!

I recently wrote about how the Parker Solar Probe crossed the Sun’s ‘Alfvén surface’: the surface outside which the outflowing solar wind becomes supersonic.

This is already pretty cool—but even better, the ‘sound’ here is not ordinary sound: it consists of vibrations in both the hot electrically conductive plasma of the Sun’s atmosphere and its magnetic field! These vibrations are called ‘Alfvén waves’.

To understand these waves, we need to describe how an electrically conductive fluid interacts with the electric and magnetic fields. We can do this—to a reasonable approximation—using the equations of ‘magnetohydrodynamics’:

• Wikipedia, Magnetohydrodynamics: equations.

These equations also describe other phenomena we see in stars, outer space, fusion reactors, the Earth’s liquid outer core, and the Earth’s upper atmosphere: the ionosphere, and above that the magnetosphere. These phenomena can be very difficult to understand, combining all the complexities of turbulence with new features arising from electromagnetism. Here’s an example, called the Orszag–Tang vortex, simulated by Philip Mocz:

I’ve never studied magnetohydrodynamics—I was afraid that if I learned a little, I’d never stop, because it’s endlessly complex. Now I’m getting curious. But all I want to do today is explain to you, and myself, what the equations of magnetohydrodynamics are—and where they come from.

To get these equations, we assume that our system is described by some time-dependent fields on 3-dimensional space, or whatever region of space our fluid occupies. I’ll write vector fields in boldface and scalar fields in non-bold:

• the velocity of our fluid, \mathbf{v}

• the density field of our fluid, \rho

• the pressure field of our fluid, P

• the electric current, \mathbf{J}

• the electric field, \mathbf{E}

• the magnetic field, \mathbf{B}

You may have noticed one missing: the charge density! We assume that this is zero, because in a highly conductive medium the positive and negative charges quickly even out unless the electric field is changing very rapidly. (When this assumption breaks down we use more complicated equations.)

So, we start with Maxwell’s equations, but with the charge density set to zero:

\begin{array}{lll}     \nabla\cdot\mathbf{E}= 0 & \qquad \quad &     \displaystyle{   \nabla\times\mathbf{E}= -\frac{\partial\mathbf{B} }{\partial t} } \\ \\     \nabla\cdot\mathbf{B}=0  & &  \displaystyle{   \nabla\times\mathbf{B}= \mu_0 \mathbf{J}+ \epsilon_0 \mu_0 \frac{\partial\mathbf{E} }{\partial t} }  \end{array}

I’m writing them in SI units, where they include two constants:

• the electric permittivity of the vacuum, \epsilon_0

• the magnetic permeability of the vacuum, \mu_0

The product of these two constants is the reciprocal of the square of the speed of light:

\mu_0 \epsilon_0 = 1/c^2
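A quick sanity check on this relation, using the CODATA values bundled with scipy (an aside, not part of the original derivation):

```python
from scipy.constants import mu_0, epsilon_0
print((mu_0 * epsilon_0) ** -0.5)   # ~2.998e8 m/s: the speed of light
```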

This makes the last term Maxwell added to his equations very small unless the electric field is changing very rapidly:

\displaystyle{ \nabla\times\mathbf{B}= \mu_0 \mathbf{J} + \epsilon_0 \mu_0 \frac{\partial\mathbf{E} }{\partial t} }

In magnetohydrodynamics we assume the electric field is changing slowly, so we drop this last term, getting a simpler equation:

\displaystyle{ \nabla\times\mathbf{B}= \mu_0 \mathbf{J} }

We also assume a sophisticated version of Ohm’s law: the electric current \mathbf{J} is proportional to the force on the charges at that location. But here the force involves not only the electric field but the magnetic field! So, it’s given by the Lorentz force law, namely

\mathbf{E} + \mathbf{v} \times \mathbf{B}

where notice we’re assuming the velocity of the charges is the fluid velocity \mathbf{v}. Thus we get

\eta \mathbf{J} = \mathbf{E} + \mathbf{v} \times \mathbf{B}

where \eta is the electrical resistivity of the fluid (the reciprocal of its conductivity).

Next we assume local conservation of mass: the increase (or decrease) of the fluid’s density at some point can only be caused by fluid flowing toward (or away from) that point. So, the time derivative of the density \rho is minus the divergence of the momentum density \rho \mathbf{v}:

\displaystyle{ \frac{\partial \rho}{\partial t} =   - \nabla \cdot (\rho \mathbf{v}) }

This is analogous to the equation describing local conservation of charge in electromagnetism: the so-called continuity equation.

We also assume that the pressure is some function of the density:

\displaystyle{ P = f(\rho) }

This is called the equation of state of our fluid. The function f depends on the fluid: for example, for an ideal gas P is simply proportional to \rho. We can use the equation of state to eliminate P and work with \rho.
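As a concrete (and made-up) instance of f, here’s the isothermal ideal gas, where P = \rho k_B T / m and the proportionality constant is the squared sound speed; the temperature and particle mass below are arbitrary illustrative values:

```python
from scipy.constants import k as k_B

T = 1.0e4      # temperature in kelvin (arbitrary)
m = 1.67e-27   # mean particle mass in kg (roughly a proton; also arbitrary)

def f(rho):
    """Isothermal ideal-gas equation of state: P proportional to rho."""
    return rho * k_B * T / m
```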

Last—but not least—we need an equation of motion saying how the fluid’s velocity \mathbf{v} changes with time! This equation follows from Newton’s law \mathbf{F} = m \mathbf{a}, or more precisely his actual law

\displaystyle{ \frac{d}{d t} \mathbf{p} = \mathbf{F}}

where \mathbf{p} is momentum. But we need to replace \mathbf{p} by the momentum density \rho \mathbf{v} and replace \mathbf{F} by the force density, which is

\displaystyle{ \mathbf{J} \times \mathbf{B} - \nabla P + \mu \nabla^2 \mathbf{v}}

The force density comes in three parts:

• The force of the magnetic field on the current, as described by the Lorentz force law: \mathbf{J} \times \mathbf{B}.

• The force caused by the gradient of pressure, pointing toward regions of lower pressure: - \nabla P

• The force caused by viscosity, where faster bits of fluid try to speed up their slower neighbors, and vice versa: \mu \nabla^2 \mathbf{v}.

Here \mu is called the coefficient of viscosity.

Thus, our equation of motion is this:

\displaystyle{ \frac{D}{D t} (\rho \mathbf{v}) = \mathbf{J} \times \mathbf{B} - \nabla P + \mu \nabla^2 \mathbf{v} }

Here we aren’t using just the ordinary time derivative of \rho \mathbf{v}: we want to keep track of how \rho \mathbf{v} is changing for a bit of fluid that’s moving along with the flow of the fluid, so we need to add in the derivative of \rho \mathbf{v} in the \mathbf{v} direction. For this we use the material derivative:

\displaystyle{ \frac{D}{Dt} = \frac{\partial}{\partial t} + \mathbf{v} \cdot \nabla   }

which also has many other names like ‘convective derivative’ or ‘substantial derivative’.
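Here’s a quick symbolic illustration (a toy example of mine, using sympy) of what the material derivative buys us: for a density profile carried along at constant velocity, the ordinary time derivative at a fixed point is nonzero, but the material derivative vanishes, since nothing changes for an observer riding with the flow.

```python
# Material derivative of an advected profile rho(x, t) = g(x - v t), in 1d.
import sympy as sp

x, t, v = sp.symbols('x t v')
g = sp.Function('g')
rho = g(x - v*t)

partial_t = sp.diff(rho, t)                       # change seen at a fixed point
material = sp.diff(rho, t) + v*sp.diff(rho, x)    # change seen moving with the flow

print(sp.simplify(partial_t))   # equals -v*g'(x - v*t): generally nonzero
print(sp.simplify(material))    # 0
```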

So, those are the equations of magnetohydrodynamics! Let’s see them in one place. I’ll use the equation of state to eliminate the pressure by writing it as a function of density:

MAGNETOHYDRODYNAMICS

Simplified Maxwell equations:

\begin{array}{lll}     \nabla\cdot\mathbf{E}= 0 & \qquad \quad &     \displaystyle{ \nabla\times\mathbf{E}= -\frac{\partial\mathbf{B} }{\partial t} } \\ \\     \nabla\cdot\mathbf{B}=0  & &  \displaystyle{  \nabla\times\mathbf{B}= \mu_0 \mathbf{J} }  \end{array}

Ohm’s law:

\eta \mathbf{J} = \mathbf{E} + \mathbf{v} \times \mathbf{B}

Local conservation of mass:

\displaystyle{ \frac{\partial \rho}{\partial t} = - \nabla \cdot (\rho \mathbf{v}) }

Equation of motion:

\displaystyle{ \left( \frac{\partial}{\partial t} + \mathbf{v} \cdot \nabla  \right) (\rho \mathbf{v}) = \mathbf{J} \times \mathbf{B} - \nabla (f(\rho)) +  \mu \nabla^2 \mathbf{v}}

Notice that in our simplified Maxwell equations, two terms involving the electric field are gone. That’s why these are called the equations of magnetohydrodynamics. You can even eliminate the current \mathbf{J} from these equations, replacing it with \nabla \times \mathbf{B} / \mu_0. The magnetic field reigns supreme!

Magnetic diffusion

It feels unsatisfying to quit right after I show you the equations of magnetohydrodynamics. Having gotten this far, I can’t resist showing you a couple of cool things we can do with these equations!

First, we can use Ohm’s law to see how the magnetic field tends to ‘diffuse’, like heat spreads out through a medium.

We start with Ohm’s law:

\eta \mathbf{J} = \mathbf{E} + \mathbf{v} \times \mathbf{B}

We take the curl of both sides:

\displaystyle{ \eta \nabla \times \mathbf{J} = \nabla \times \mathbf{E} + \nabla \times( \mathbf{v} \times \mathbf{B}) }

We can get every term here to involve \mathbf{B} if we use two of our simplified Maxwell equations:

\displaystyle{ \mathbf{J} = \frac{1}{\mu_0} \nabla \times \mathbf{B}, \qquad  \nabla\times\mathbf{E}= -\frac{\partial\mathbf{B} }{\partial t} }

We get this:

\displaystyle{ \frac{\eta}{\mu_0} \nabla \times (\nabla \times \mathbf{B}) = -\frac{\partial\mathbf{B} }{\partial t} + \nabla \times( \mathbf{v} \times \mathbf{B}) }

Then we can use this vector identity:

\displaystyle{ \nabla \times (\nabla \times \mathbf{B}) = \nabla (\nabla \cdot \mathbf{B}) - \nabla^2 \mathbf{B} }

Since another of the Maxwell equations says \nabla \cdot \mathbf{B} = 0, we get

\displaystyle{ \nabla \times (\nabla \times \mathbf{B}) = - \nabla^2 \mathbf{B} }

and thus

\displaystyle{ - \frac{\eta}{\mu_0}  \nabla^2 \mathbf{B} = -\frac{\partial\mathbf{B} }{\partial t} + \nabla \times( \mathbf{v} \times \mathbf{B}) }

and finally we get

THE MAGNETIC DIFFUSION EQUATION

\displaystyle{ \frac{\partial\mathbf{B} }{\partial t}  = \frac{\eta}{\mu_0}  \nabla^2 \mathbf{B} \, + \, \nabla \times( \mathbf{v} \times \mathbf{B}) }

Except for the last term, this is the heat equation—not for temperature, which is a scalar field, but for the magnetic field, which is a vector field! The constant \eta / \mu_0 says how fast the magnetic field spreads out, so it’s called the magnetic diffusivity.

The last term makes things more complicated and interesting.
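Here’s a minimal numerical sketch of the diffusive part of this equation (a toy example, with the fluid at rest so the last term drops out): a field B_z(x,t) that varies only along x then obeys a 1d heat equation, \partial B_z/\partial t = (\eta/\mu_0) \, \partial^2 B_z/\partial x^2, which we can step forward with explicit finite differences. The grid size, diffusivity, and initial profile are all made-up numbers.

```python
# 1d magnetic diffusion: an initial blob of B_z spreads out and decays,
# exactly like heat. (Toy example; all parameters are arbitrary.)
import numpy as np

nx, L = 200, 1.0
dx = L / nx
eta_over_mu0 = 1e-3                      # magnetic diffusivity
dt = 0.4 * dx**2 / eta_over_mu0          # respects the explicit-scheme stability limit

x = np.linspace(0, L, nx)
Bz = np.exp(-((x - 0.5) / 0.05) ** 2)    # initial blob of magnetic field

for step in range(2000):
    lap = np.zeros_like(Bz)
    lap[1:-1] = (Bz[2:] - 2*Bz[1:-1] + Bz[:-2]) / dx**2   # discrete Laplacian
    Bz += dt * eta_over_mu0 * lap        # dB/dt = (eta/mu0) d^2B/dx^2
    # endpoints are left alone: the field is essentially zero far from the blob

print(Bz.max())   # the peak field has decayed as the blob spread out
```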

Magnetic pressure and tension

Second, and finally, I want to give you more intuition for how the magnetic field exerts a force on the conductive fluid in magnetohydrodynamics. We’ll see that the magnetic field has both pressure and tension.

Remember, the magnetic field exerts a force

\mathbf{F}_m = \mathbf{J} \times \mathbf{B}

More precisely this is the force per volume: the magnetic force density. We can express this solely in terms of the magnetic field using one of our simplified Maxwell equations

\nabla \times \mathbf{B} = \mu_0 \mathbf{J}

We get

\displaystyle{ \mathbf{F}_m = \frac{1}{\mu_0} (\nabla \times \mathbf{B}) \times \mathbf{B} }

Next we can use a fiendish vector calculus identity: for any vector fields \mathbf{A}, \mathbf{B} in 3d space we have

\begin{array}{ccl}   \nabla (\mathbf{A} \cdot \mathbf{B}) &=&   (\mathbf{A} \cdot \nabla) \mathbf{B} + (\mathbf{B} \cdot \nabla) \mathbf{A} \\  & & +\mathbf{A} \times (\nabla \times \mathbf{B}) + \mathbf{B} \times (\nabla \times \mathbf{A})   \end{array}

When \mathbf{A} = \mathbf{B} this gives

\nabla (\mathbf{B} \cdot \mathbf{B}) = 2(\mathbf{B} \cdot \nabla) \mathbf{B} + 2 \mathbf{B} \times (\nabla \times \mathbf{B})

or what we actually need now:

(\nabla \times \mathbf{B}) \times \mathbf{B} = (\mathbf{B} \cdot \nabla) \mathbf{B} - \frac{1}{2} \nabla (B^2)

where B^2 = \mathbf{B} \cdot \mathbf{B}.
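If you don’t feel like verifying that fiendish identity by hand, here’s a quick symbolic check (using sympy’s vector module) on an arbitrary test field: both sides of (\nabla \times \mathbf{B}) \times \mathbf{B} = (\mathbf{B} \cdot \nabla)\mathbf{B} - \tfrac{1}{2}\nabla(B^2) agree componentwise.

```python
# Symbolic check of (curl B) x B = (B . grad) B - (1/2) grad(B^2).
import sympy as sp
from sympy.vector import CoordSys3D, curl, gradient

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z

# an arbitrary smooth test field (swap in any other and the check still passes)
B = (x*y)*N.i + (y*z**2)*N.j + sp.sin(x)*N.k

def directional(A, F):
    """(A . nabla) F, computed componentwise in Cartesian coordinates."""
    Ax, Ay, Az = A.to_matrix(N)
    comps = [Ax*sp.diff(c, x) + Ay*sp.diff(c, y) + Az*sp.diff(c, z)
             for c in F.to_matrix(N)]
    return comps[0]*N.i + comps[1]*N.j + comps[2]*N.k

lhs = curl(B).cross(B)
rhs = directional(B, B) - gradient(B.dot(B)) * sp.Rational(1, 2)
print(sp.simplify((lhs - rhs).to_matrix(N)))   # Matrix([[0], [0], [0]])
```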

This identity gives a nice formula for

THE MAGNETIC FORCE DENSITY

\displaystyle{ \mathbf{F}_m = \frac{1}{\mu_0}(\mathbf{B} \cdot \nabla) \mathbf{B} \, - \, \frac{1}{2\mu_0} \nabla (B^2) }

The point of all these manipulations was not merely to revive our flagging memories of vector calculus—though that was good too. The real point is that they reveal that the magnetic force density consists of two parts, each with a nice physical interpretation.

Of course the force acts on the fluid, not on the magnetic field lines themselves. But as you dig deeper into magnetohydrodynamics, you’ll see that sometimes the magnetic field lines get ‘frozen in’ to the fluid—that is, they get carried along by the fluid flow, like bits of rope in a raging river. Then the first term above tends to straighten out these field lines, while the second term tends to push them apart!

The second term is minus the gradient of

\displaystyle{ \frac{1}{2\mu_0} B^2 }

Since minus the gradient of ordinary pressure creates a force density on a fluid, we call the quantity above the magnetic pressure. Yes, the square of the magnitude of the magnetic field creates a kind of pressure! It pushes the fluid like this:


The other more subtle term is called magnetic tension:

\displaystyle{ \frac{1}{\mu_0}(\mathbf{B} \cdot \nabla) \mathbf{B}  }

It points like this:



Its magnitude is \kappa B^2/\mu_0 where 1/\kappa is the radius of curvature of the circle that best fits the magnetic field lines at the given point. And it points toward the center of that circle!

I’m getting tired of proving formulas, so I’ll leave the proofs of these facts as puzzles. If you get stuck and want a hint, go here.
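If you’d like a head start on those puzzles, here’s a symbolic check of the special case of circular field lines (a sketch of my own, not a proof): an azimuthal field of constant strength B_0 circling the z-axis has field lines of radius r, so the claim is that the tension has magnitude B_0^2/(\mu_0 r) and points straight at the axis. The computation below bears that out.

```python
# Magnetic tension for an azimuthal field B = B0 * phi_hat around the z-axis.
import sympy as sp
from sympy.vector import CoordSys3D

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z
B0, mu0 = sp.symbols('B_0 mu_0', positive=True)

r = sp.sqrt(x**2 + y**2)
B = (B0 / r) * (-y*N.i + x*N.j)   # constant magnitude B0, circular field lines

def directional(A, F):
    """(A . nabla) F, computed componentwise in Cartesian coordinates."""
    Ax, Ay, Az = A.to_matrix(N)
    comps = [Ax*sp.diff(c, x) + Ay*sp.diff(c, y) + Az*sp.diff(c, z)
             for c in F.to_matrix(N)]
    return comps[0]*N.i + comps[1]*N.j + comps[2]*N.k

tension = directional(B, B) * (1 / mu0)
print(sp.simplify(tension.to_matrix(N)))
# Up to rearrangement: (-B_0**2/(mu_0*(x**2 + y**2))) * (x, y, 0),
# i.e. magnitude B_0^2/(mu_0 r) = kappa B^2/mu_0 with kappa = 1/r,
# pointing toward the axis the field lines circle around.
```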

There’s a lot more to say about this, and you can find a lot of it here:

• Gordon I. Ogilvie, Lecture notes: astrophysical fluid dynamics.

I might or might not say more—but if I don’t, these notes will satisfy your curiosity about exactly when magnetic field lines get ‘frozen in’ to the motion of an electrically conductive fluid, and how magnetic pressure then tends to push these field lines apart, while magnetic tension tends to straighten them out!

All of this is very cool, and I think this subject is a place where all the subtler formulas of vector calculus really get put to good use.

January 17, 2025

Matt Strassler Double Trouble: The Quantum Two-Slit Experiment (1)

Happy New Year! 2025 is the centenary of some very important events in the development of quantum physics — the birth of new insights, of new mathematics, and of great misconceptions. For this reason, I’ve decided that this year I’ll devote more of this blog to quantum fundamentals, and take on some of the tricky issues that I carefully avoided in my recent book.

My focus will be on very basic questions, such as: How does quantum physics work, to the extent we humans understand it? Which of the widely-held and widely-promulgated ideas about quantum weirdness are true? And for those that aren’t, what is the right way to think about them?

I’ll frame some of this discussion in the context of the quantum two-slit experiment, because

  • it’s famous,
  • it’s often poorly explained
  • it’s often poorly understood,
  • it highlights (when properly understood) an extraordinarily strange aspect of quantum physics.

Not that I’ll cover this subject all in one post… far from it! It’s going to take quite some time.

The Visualization Problem

We humans often prefer to understand things visually. The problem with explaining quantum physics, aside from the fact that no one understands it 100%, is that all but the ultra-simplest problems are impossible to depict in an image or animation. This forces us to use words instead. Unfortunately, words are inherently misleading. Even when partial visual depictions are possible, they too are almost always misleading. (Math is helpful, but not as much as you’d think; it’s usually subtle and complicated, too.) So communication and clear thinking are big challenges throughout quantum physics.

These difficulties lead to many widespread misconceptions (some of which I myself suffered from when I was a student first learning the subject.) For instance, one of the most prevalent and problematic, common among undergraduates taking courses in chemistry or atomic physics, is the wrong idea that each elementary particle has its own wavefunction — a function which tells us the probability of where it might currently be located. This confusion arises, as much as anything else, from a visualization challenge.

Consider the quantum physics of the three electrons in a lithium atom. If you’ve read anything about quantum physics, you may have been led to believe that each of the three electrons has a wave function, describing its behavior in three-dimensional space. In other words,

  • naively the system would be described by three wave functions, each in three dimensions; each electron’s wave function tells us the probability that it is located at this point or that one;
  • but in fact the three electrons are described by one wave function in nine dimensions, telling us simultaneously the overall probability that the first electron is to be found at this point, the second at that point, and the third at some other point.

Unfortunately, drawing something that exists in nine dimensions is impossible! Three wave functions in three dimensions is much easier to draw, and so, as a compromise/approximation that has some merits but is very confusing, that method of depiction is widely used in images of multiple electrons. Here, for instance, two of the lithium atom’s electrons are depicted as though they have wave functions sharing the yellow region (the “inner shell”), while the third is drawn as though it has a wave function occupying the [somewhat overlapping] blue region (the “next shell”). [The atomic nucleus is shown in red, but far larger than it actually is.] (Something similar is done in this image of the electrons in oxygen from a chemistry class.)

Yet the meat of the quantum lithium atom lies in the fact that there’s actually only one wave function for the entire system, not three. Most notably, the Pauli exclusion principle, which is responsible for keeping the electrons from all doing the same things and leads to the shell-like structure, makes sense only because there’s only one wave function for the system. And so, the usual visual depictions of the three electrons in the atom are all inherently misleading.

Yet there’s no visual image that can replace them that is both correct and practical. And that’s a real problem.

That said, it is possible to use visual images for two objects traveling in one dimension, as I did in a recent article that explains what it means for a system of two particles to have only one wave function. But for today, we can set this particular issue aside.

What We Can’t Draw Can Hurt Our Brains

Like most interesting experiments, the underlying quantum physics of the quantum double-slit experiment cannot be properly drawn. But depicting it somehow, or at least parts of it, will be crucial in understanding how it works. Most existing images that are made to try to explain it leave out important conceptual points. The challenge for me — not yet solved — is to find a better one.

In this post, I’ll start the process, opening a conversation with readers about what people do and don’t understand about this experiment, about what’s often said about it that is misleading or even wrong, and about why it’s so hard to draw anything that properly represents it. Over the year I expect to come back to the subject occasionally. With luck, I’ll find a way to describe this experiment to my satisfaction, and maybe yours, before the end of the year. I don’t know if I’ll succeed. Even if I do, the end product won’t be short, sweet and simple.

But let’s start at the beginning, with the conventional story of the quantum double-slit experiment. The goal here is not so much to explain the experiment — there are many descriptions of it on the internet — but rather to focus on exactly what we say and think about it. So I encourage you to read slowly and pay very close attention; in this business, every word can matter.

Observing the Two Slits and the Screen

We begin by throwing an ultra-microscopic object — perhaps a photon, or an electron, or a neutrino — toward a wall with two narrow, closely spaced slits cut in it. (The details of how we do this are not very important, although we do need to choose the slits and the distance to the screen with some care.) If the object manages to pass through the wall, then on the other side it continues onward until it hits a phosphorescent screen. Where it strikes the screen, the screen lights up. This is illustrated in Fig. 1, where several such objects are shown being sent outward from the left; a few pass through the slits and cause the screen to light up where they arrive.

Figure 1: Microscopic objects are emitted from a device at left and travel (orange arrows) toward a wall (grey) with two narrow slits in it. Each object that passes through the slits reaches a screen (black) where it causes the screen to light up with an orange flash.

If we do this many times and watch the screen, we’ll see flashes randomly around the screen, something like what is shown in Fig. 2:

Figure 2: (click to animate if necessary): The screen flickers with little dots, one for each object that impacts it.

But now let’s keep a record of where the flashes on the screen appear; that’s shown in Fig. 3, where new flashes are shown in orange and past flashes are shown in blue. When we do this, we’ll see a strange pattern emerge, seen not in each individual flash but over many flashes, growing clearer as the number of flashes increases. This pattern is not simply a copy of the shape of the two slits.

Figure 3 (click to animate if necessary): Same as Fig. 2, except that we record the locations of past flashes, revealing a surprising pattern.

After a couple of thousand flashes, we’ll recognize that the pattern is characteristic of something known as interference (discussed further in Figs. 6-7 below):

Figure 4: The interference pattern that emerges after thousands of objects have passed through the slits.

By the way, there’s nothing hypothetical about this. Performing this experiment is not easy, because both the source of the objects and the screen are delicate and expensive. But I’ve seen it done, and I can confirm that what I’ve told you is exactly what one observes.

Trying to Interpret the Observations

The question is: given what is observed, what is actually happening as these microscopic objects proceed from source through slits to screen? and what can we infer about their basic properties?

We can conclude right away that the objects are not like bullets — not like “particles” in the traditional sense of a localized object that travels upon a definite path. If we fired bullets or threw tiny balls at the slitted wall, the bullets or balls would pass through the two slits and leave two slit-shaped images on the screen behind them, as in Fig. 5.

Figure 5: If balls, bullets or other particle-like objects are thrown at the wall, those that pass through the slits will arrive at the screen in two slit-shaped regions.

Nor are these objects ripples, meaning “waves” of some sort. Caution! Here I mean what scientists and recording engineers mean by “wave”: not a single wave crest such as you’d surf at a beach, but rather something that is typically a series of wave crests and troughs. (Sometimes we call this a “wave set” in ordinary English.)

If each object were like a wave, we’d see no dot-like flashes. Instead each object would leave the interference pattern seen in Fig. 4. This is illustrated in Fig. 6 and explained in Fig. 7. A wave (consisting of multiple crests and troughs) approaches the slits from the left in Fig. 6. After it passes through the slits, a striking pattern appears on the screen, with roughly equally spaced bright and dark regions, the brightest one in the center.

Figure 6: If a rippling pattern — perhaps one of sound waves or of water waves — is sent toward the wall, what appears on the screen will be an interference pattern similar to that of Fig. 4. See Fig. 7 for the explanation. The bright zones on the screen may flicker, but the dark zones will always be dark.

Where does the interference pattern come from? This is clearest if we look at the system from above, as in Fig. 7. The wave is coming in from the left, as a linear set of ripples, with crests in blue-green and troughs in red. The wall (represented in yellow) has two slits, from which emerge two sets of circular ripples. These ripples add and subtract from one another, making a complex, beautiful “interference” pattern. When this pattern reaches the screen at the opposite wall, it creates a pattern on the screen similar to that sketched in Fig. 6, with some areas that actively flicker separated by areas that are always dark.

Fig. 7: The interference pattern created by a linear wave pattern passing through two slits, as depicted from above. The two slits convert the linear ripples to two sets of circular ripples, which cross paths and interfere. When the resulting pattern arrives at the screen at right, some areas flicker, while others between them always remain quiet. A similar pattern of activity and darkness, though with some different details (notably fewer dark and bright areas), is seen in Figs. 3, 4 and 6. Credit: Lookang (with many thanks to Fu-Kwun Hwang and to Francisco Esquembre, author of Easy Java Simulation), CC BY-SA 3.0, via Wikimedia Commons.
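Here’s a toy numerical sketch (my own, not part of the experiment described above) tying Figs. 2-4 to Figs. 6-7: compute the standard far-field two-slit intensity on the screen, treat it as a probability distribution, and draw random “flash” positions from it. The wavelength, slit separation, and screen distance are made-up numbers.

```python
# Two-slit toy model: the wave pattern sets the odds; each "object" lands at
# one random spot drawn from those odds. (All numbers are arbitrary.)
import numpy as np

lam, d, Ldist = 500e-9, 20e-6, 1.0     # wavelength, slit separation, slit-to-screen distance (m)
xs = np.linspace(-0.1, 0.1, 2000)      # positions along the screen (m)

# far-field two-slit pattern, ignoring the single-slit envelope:
# path difference ~ d*x/Ldist, so intensity ~ cos^2(pi*d*x/(lam*Ldist))
intensity = np.cos(np.pi * d * xs / (lam * Ldist)) ** 2
prob = intensity / intensity.sum()     # treat the pattern as a probability distribution

rng = np.random.default_rng(0)
flashes = rng.choice(xs, size=5000, p=prob)   # one random landing spot per object

hist, edges = np.histogram(flashes, bins=80)
print(hist)   # bright and dark fringes emerge only after many flashes
```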

It’s important to notice that the center of the phosphorescent screen is dark in Fig. 5 and bright in Fig. 6. The difference between particle-like bullets and wave-like ripples is stark.

And yet, whatever objects we’re dealing with in Figs. 2-4, they are clearly neither like the balls of Fig. 5 nor the waves of Fig. 6. Their arrival is marked with individual flashes, and the interference pattern builds up flash by flash; one object alone does not reveal the pattern. Strangely, each object seems to “know” about the pattern. After all, each one, independently, manages to avoid the dark zones and to aim for one of the bright zones.

How can these objects do this? What are they?

What Are These Objects?!

According to the conventional wisdom, Fig. 2 proves that the objects are somewhat like particles. When each object hits the screen, it instantaneously causes a single, tiny, localized flash, showing that it is itself a single, tiny, point-like object. It’s like a bullet leaving a bullet-hole: localized, sudden, and individual.

According to the conventional wisdom, Figs. 3-4 prove that the objects are somewhat like waves. They leave the same pattern that we would see if ocean swell were passing through two gaps in a harbor’s breakwater, as in Fig. 7. Interference patterns are characteristic only of waves. And because the interference pattern builds up over many independent flashes, occurring at different times, each object seems to “know,” independent of the others, what the interference pattern is. The logical conclusion is that each object interferes with itself, just as the waves of Figs. 6-7 do; otherwise how could each object “know” anything about the pattern? Interfering with oneself is something a wave can do, but a bullet or ball or anything else particle-like certainly cannot.

To review:

  • A set of particles going through two slits wouldn’t leave an interference pattern; it would leave the pattern we’d expect of a set of bullets, as in Fig. 5.
  • But waves going through two slits wouldn’t leave individual flashes on a screen; each wave would interfere with itself and leave activity all over the screen, with stronger and weaker effects in a predictable interference pattern, as in Figs. 6-7.

It’s as though the object is a wave when it goes through and past the slits, and turns into a particle before it hits the screen. (Note my careful use of the words “as though”; I did not say that’s what actually happens.)

And thus, according to the conventional wisdom, each object going through the slits is… well… depending on who you talk to or read…

  • both a wave and a particle, or
  • sometimes a wave and sometimes a particle, or
  • equally wave and particle, or
  • a thing with wave-like properties and with particle-like properties, or
  • a particle-like thing described by a probability-wave (also known as a “wave function”), or
  • a wave-like thing that can only be absorbed like a particle, or…
  • ???

So… which is it?

Or is it any of the above?

Looking More Closely

We could try to explore this further. For instance, we could try to look more closely at what is going on, by asking whether our object is a particle that goes through one slit or is a wave that goes through both.

Figure 8: We might try to investigate further, by adding sensors just behind the slits, to see whether each object goes through one slit (as for a bullet) or goes through both (as for a sound wave). With certain sensors, we will find it goes through only one — but in this case, what appears on the screen will also change! We will see not what is in Fig. 4 but rather what appears in Fig. 9.

But the very process of looking at the object to see which slit it went through changes the interference pattern of Figs. 4 and 6 into the particle-like pattern of Fig. 5, as shown in Fig. 9. We find two blobs, one for each slit, and no noticeable interference. It’s as though, by looking at an ocean wave, we turned it into a bullet, whereas when we don’t look at the ocean wave, it remains an ocean wave as it goes through the gaps, and only somehow coalesces into a bullet before it hits (or as it hits) the screen.

Figure 9: If sensors are added to try to see which slit each object passes through (or both), the pattern seen on the screen changes to look more like that of Fig. 5, and no clarity as to the nature of the objects or the process they are undergoing is obtained.

Said another way: it seems we cannot passively look at the objects. Looking at them is an active process, and it changes how they behave.

So this really doesn’t clarify anything. If anything, it muddies the waters further.

What sense can we make of this?

Before we even begin to try to make a coherent understanding out of this diverse set of observations, we’d better double-check that the logic of the conventional wisdom is accurate in the first place. To do that, each of us should read very carefully and think very hard about what has been observed and what has been written about it. For instance, in the list of possible interpretations given above, do the words “particle” and “wave” always mean what we think they do? They have multiple meanings even in English, so are we all thinking and meaning the same thing when we describe something as, say, “sometimes a wave and sometimes a particle”?

If we are very careful about what is observed and what is inferred from what is observed, as well as the details of language used to communicate that information, we may well worry about secret and perhaps unjustified assumptions lurking in the conventional wisdom.

For instance, does the object’s behavior at the screen, as in Fig. 2, really resemble a bullet hitting a wall? Is its interaction with the screen really instantaneous and tiny? Are its effects really localized and sudden?

Exactly how localized and sudden are they?

All we saw at the screen is a flash that is fast by human standards, and localized by human standards. But why would we apply human standards to something that might be smaller than an atom? Should we instead be judging speed and size using atomic standards? Perhaps even the standards of tiny atomic nuclei?

If our objects are among those things usually called “elementary particles” — such as photons, electrons, or neutrinos — then the very naming of these objects as “elementary particles” seems to imply that they are smaller than an atom, and even than an atom’s nucleus. But do the observations shown in Fig. 2 actually give some evidence that this is true? And if not… well, what do they show?

What do we precisely mean by “particle”? By “elementary particle”? By “subatomic particle”?

What actually happened at the slits? at the screen? between them? Can we even say, or know?

These are among the serious questions that face us. Something strange is going on, that’s for sure. But if we can’t first get our language, our logic, and our thinking straight — and as a writer, if I don’t choose and place every single word with great care — we haven’t a hope of collectively making sense of quantum physics. And that’s why this on-and-off discussion will take us all of 2025, at a minimum. Maybe it will take the rest of the decade. This is a challenge for the human mind, both for novices and for experts.

Matt von Hippel Ways Freelance Journalism Is Different From Academic Writing

A while back, I was surprised when I saw the writer of a well-researched webcomic assume that academics are paid for their articles. I ended up writing a post explaining how academic publishing actually works.

Now that I’m out of academia, I’m noticing some confusion on the other side. I’m doing freelance journalism, and the academics I talk to tend to have some common misunderstandings. So academics, this post is for you: a FAQ of questions I’ve been asked about freelance journalism. Freelance journalism is more varied than academia, and I’ve only been doing it a little while, so all of my answers will be limited to my experience.

Q: What happens first? Do they ask you to write something? Do you write an article and send it to them?

Academics are used to writing an article, then sending it to a journal, which sends it out to reviewers to decide whether to accept it. In freelance journalism in my experience, you almost never write an article before it’s accepted. (I can think of one exception I’ve run into, and that was for an opinion piece.)

Sometimes, an editor reaches out to a freelancer and asks them to take on an assignment to write a particular sort of article. This happens more often for freelancers who have been working with particular editors for a long time. I’m new to this, so the majority of the time I have to “pitch”. That means I email an editor describing the kind of piece I want to write. I give a short description of the topic and why it’s interesting. If the editor is interested, they’ll ask some follow-up questions, then tell me what they want me to focus on, how long the piece should be, and how much they’ll pay me. (The last two are related; many places pay by the word.) After that, I can write a draft.

Q: Wait, you’re paid by the word? Then why not make your articles super long, like Victor Hugo?

I’m paid per word assigned, not per word in the finished piece. The piece doesn’t have to strictly stick to the word limit, but it should be roughly the right size, and I work with the editor to try to get it there. In practice, places seem to have a few standard size ranges and internal terminology for what they are (“blog”, “essay”, “short news”, “feature”). These aren’t always the same as the categories readers see online. Some places have a web page listing these categories for prospective freelancers, but many don’t, so you have to either infer them from the lengths of articles online or learn them over time from the editors.

Q: Why didn’t you mention this important person or idea?

Because pieces are paid by the word, longer pieces cost an outlet more, so it’s easier as a freelancer to sell shorter pieces than longer ones. For science news, favoring shorter pieces also makes some pedagogical sense. People usually take away only a few key messages from a piece; if you try to pack in too much, you run a serious risk of losing people. After I’ve submitted a draft, I work with the editor to polish it, and usually that means cutting side-stories and “by-the-ways” to make the key points as vivid as possible.

Q: Do you do those cool illustrations?

Academia has a big focus on individual merit. The expectation is that when you write something, you do almost all of the work yourself, to the extent that more programming-heavy fields like physics and math do their own typesetting.

Industry, including journalism, is more comfortable delegating. Places will generally have someone on-staff to handle illustrations. I suggest diagrams that could be helpful to the piece and do a sketch of what they could look like, but it’s someone else’s job to turn that into nice readable graphic design.

Q: Why is the title like that? Why doesn’t that sound like you?

Editors in journalistic outlets are much more involved than in academic journals. Editors won’t just suggest edits, they’ll change wording directly and even input full sentences of their own. The title and subtitle of a piece in particular can change a lot (in part because they impact SEO), and in some places these can be changed by the editor quite late in the process. I’ve had a few pieces whose title changed after I’d signed off on them, or even after they first appeared.

Q: Are your pieces peer-reviewed?

The news doesn’t have peer review, no. Some places, like Quanta Magazine, do fact-checking. Quanta pays independent fact-checkers for longer pieces, while for shorter pieces it’s the writer’s job to verify key facts, confirming dates and the accuracy of quotes.

Q: Can you show me the piece before it’s published, so I can check it?

That’s almost never an option. Journalists tend to have strict rules about showing a piece before it’s published, related to more political areas where they want to preserve the ability to surprise wrongdoers and the independence to form their own opinions. Science news seems like it shouldn’t require this kind of thing as much; it’s not like we normally write hit pieces. But we’re not publicists either.

In a few cases, I’ve had people who were worried about something being conveyed incorrectly, or misleadingly. For those, I offer to do more in the fact-checking stage. I can sometimes show you quotes or paraphrase how I’m describing something, to check whether I’m getting something wrong. But under no circumstances can I show you the full text.

Q: What can I do to make it more likely I’ll get quoted?

Pieces are short, and written for a general, if educated, audience. Long quotes are harder to use because they eat into word count, and quotes with technical terms are harder to use because we try to limit the number of terms we ask the reader to remember. Quotes that mention a lot of concepts can be harder to find a place for, too: concepts are introduced gradually over the piece, so a quote that mentions almost everything that comes up will only make sense to the reader at the very end.

In a science news piece, quotes can serve a couple different roles. They can give authority, an expert’s judgement confirming that something is important or real. They can convey excitement, letting the reader see a scientist’s emotions. And sometimes, they can give an explanation. This last only happens when the explanation is very efficient and clear. If the journalist can give a better explanation, they’re likely to use that instead.

So if you want to be quoted, keep that in mind. Try to say things that are short and don’t use a lot of technical jargon or bring in too many concepts at once. Convey judgement, which things are important and why, and convey passion, what drives you and excites you about a topic. I am allowed to edit quotes down, so I can take a piece of a longer quote that’s cleaner, or cut a long list of examples from an otherwise compelling statement. I can correct grammar and get rid of filler words and obvious mistakes. But I can’t put words in your mouth: I have to work with what you actually said, and if you don’t say anything I can use then you won’t get quoted.

John Baez Obelisks

Wow! Biologists seem to have discovered an entirely new kind of life form. They’re called ‘obelisks’, and you probably have some in you.

They were discovered in 2024—not by somebody actually seeing one, but by analyzing huge amounts of genetic data from the human gut. This search found 29,959 new RNA sequences, similar to each other, but very different from any previously known. Thus, we don’t know where these things fit into the tree of life!

Biologists found them when they were trying to solve a puzzle. Even smaller than viruses, there exist ‘viroids’: mere loops of RNA that cleverly manage to reproduce using the machinery of the cells they infect. Viruses have a protein coat; viroids are just bare RNA, RNA that doesn’t even code for any proteins!

But all known viroids only infect plants. The first one found causes a disease in potatoes; another causes a disease in avocados, and so on. This raised the puzzle: why aren’t there viroids that infect bacteria, or animals?

Now perhaps we’ve found them! But not quite: while obelisks may work in a similar way, they seem genetically unrelated. Also, their RNA seems to code for two proteins.

Given how little we know about this stuff, I think some caution is in order. Still, this is really cool. Do any of you biologists out there know any research going on now to learn more?

The original paper is free to read on the bioRxiv:

• Ivan N. Zheludev, Robert C. Edgar, Maria Jose Lopez-Galiano, Marcos de la Peña, Artem Babaian, Ami S. Bhatt, Andrew Z. Fire, Viroid-like colonists of human microbiomes, Cell 187 (2024), 6521-6536.e18.

I see just one other paper, about an automated system for detecting obelisks:

• Frederico Schmitt Kremer and Danielle Ribeiro de Barros, Tormentor: an obelisk prediction and annotation pipeline.

There’s also a budding Wikipedia article on obelisks:

• Wikipedia, Obelisk.

Thanks to Mike Stay for pointing this out!

January 16, 2025

John Baez The Formal Gardens, and Beyond

I visited an old estate today
Whose gardens, much acclaimed throughout the world,
Spread out beyond the gated entranceway
In scenic splendors gradually unfurled.
Bright potted blooms sprung beaming by the drive,
While further off, large topiary yews
Rose stoutly in the air. The site, alive
With summer, traded sunned and shaded views.
This composition—classical, restrained—
Bespoke the glories of a golden age:
An equilibrium perchance ordained
By god-directors on an earthly stage.
I set off lightly down a stone-laid path
With boxwood gathered cloudlike at each side;
Here, no hint of nature’s chastening wrath
Impinged upon man’s flourishes of pride.
Proceeding, I pressed forward on a walk
Of gravel, through an arch of climbing vines
And ramblers—where a long, inquiring stalk
Of fragrant musk rose, with its fuzzy spines,
Pressed toward my face with luscious pale-pink blooms.
Aha! At once, I almost could distill
A wayward slant within these outdoor rooms!
Though order reigned, I glimpsed some riot spill
From bush and bending bough. Each bordered sward
Spoke elegance, encasing gemlike pools—
Appointed, each, with cherubs keeping guard—
Yet moss had gathered round the sculpted stools.
Still rapt by these core plats, I shortly passed
Into the grounds beyond—deep green and vast.
Mincing, I issued from that inner fold
To find ahead a trilling rivulet
With flowers on either side; yet how controlled
Was even this—a cautious, cool vignette!
Nonetheless, as in some scattered spots
I’d seen before, there crept a shaggy clump
Of unmown grass—a few forgotten blots
Upon this bloom-besprinkled sphere; a bump
Of wild daylilies, mowed along with lawn;
And, further yet afield, a sprawling mire
Spread forth, less fettered still. Here berries’ brawn
Arose obscenely through the bulbs—each briar
Announcing more desuetude in this place—
Until, at once, stopped short all tended space.
There meadows shot up—shaggy, coarse, and plain;
Wiry weeds and scabbed grass, and the mad buzz
Of insects’ millions; here the splattering rain
Had mothered slug and bug and mugworts’ fuzz.
Beyond, thick woodlands, reckless and abrupt,
Loomed, calling, “Ho, enough of tended stuff!
We are what’s real!”—threatening to erupt—
Ah, dizzy, blowsy trees—nature’s high rough
Abandon!
And so I mused upon the human mind:
Its own mild garths and cool Augustan plots
Were laid for promenades—sedate, refined—
A genteel garden park, it seems, of thoughts.
Or so it might appear—yet gather close,
O marveling guest, round something slantwise spied:
An errant feature free of plan or pose—
A rankling thing you’d wish you hadn’t eyed!
Here, stark, the prankster stands—perhaps a spire
Of malice rising prideful in the air;
Perhaps a wild confusion of desire;
Perhaps a raw delusion, none too rare.
Unchecked, untrimmed, they hint at countless more
Uncomely details sprung at every edge
Of reason’s fair cross-axis, past the door
Of harmony’s last stand; truth’s final hedge.
Observe: my own best traits were raised by force
In soil hauled in from some more fertile strand.
My consciousness, when nature takes its course,
Still bristles, as if tended by no hand.
Stripped were the grounds from which my grace was carved;
Spaded, seeded, hoed its gaping womb—
Beaten and blazed its weeds; its vermin starved
To press my brain toward paths and gorgeous bloom.
Friend, pass no further from my watered spheres,
Well-groomed to please men’s civil hands and eyes!
For past these bounds, a roiling madness rears;
The way of chaos rules and wilds arise!


This was written by my sister Alexandra Baez, who died on Saturday, September 21st, 2024. She had cancer of the tongue, which she left untreated for too long, and it metastasized: in the end, despite radiation therapy and chemotherapy, she was asphyxiated by a tumor in her throat.

She had her own landscape maintenance business, which fit perfectly with her deep love and knowledge of plants. She put a lot of energy into formal poetry—that is, poetry with a strict meter and rhyme scheme. This poem is longer than most of hers, and I think this length helps bring the reader into the world of the formal garden, and then out of it into the wild world of nature: the real.

January 15, 2025

n-Category Café The Dual Concept of Injection

We’re brought up to say that the dual concept of injection is surjection, and of course there’s a perfectly good reason for this. The monics in the category of sets are the injections, the epics are the surjections, and monics and epics are dual concepts in the usual categorical sense.

But there’s another way of looking at things, which gives a different answer to the question “what is the dual concept of injection?”

This different viewpoint comes from the observation that in the category of sets, the injections are precisely the coproduct inclusions (coprojections)

\[ A \to A + B. \]

Certainly every coproduct inclusion in the category of sets is injective. And conversely, any injection is a coproduct inclusion, because every subset of a set has a complement.

So, the injections between sets are the specialization to \(\mathbf{Set}\) of the general categorical concept of coproduct coprojection. The dual of that general categorical concept is product projection. Hence one can reasonably say that the dual concept of injection is that of product projection

\[ A \times B \to A, \]

where \(A\) and \(B\) are sets.

Which maps of sets are product projections? It’s not hard to show that they’re exactly the functions \(f: X \to Y\) whose fibres are all isomorphic:

\[ f^{-1}(y) \cong f^{-1}(y') \]

for all \(y, y' \in Y\). You could reasonably call these “uniform” maps, or “even coverings”, or maybe there’s some other good name, but in this post I’ll just call them projections (with the slightly guilty feeling that this is already an overworked term).

Projections are always surjections, except in the trivial case where the fibres are all empty; this happens when our map is of the form \(\varnothing \to Y\). On the other hand, most surjections aren’t projections. A surjection is a function whose fibres are nonempty, but there’s no guarantee that all the fibres will be isomorphic, and usually they’re not.

A projection appears in a little puzzle I heard somewhere. Suppose someone hands you a tangle of string, a great knotted mass with who knows how many bits of string all jumbled together. How do you count the number of pieces of string?

The slow way is to untangle everything, separate the strands, then count them. But the fast way is to simply count the ends of the pieces of string, then divide by two. The point is that there’s a projection — specifically, a two-to-one map — from the set of ends to the set of pieces of string.
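To make the “even covering” condition and the string trick concrete, here is a small Python sketch; the function names and the toy data are mine, not from the post.

    from collections import Counter

    def is_projection(f, codomain):
        """f is a finite function given as a dict {x: f(x)}. True iff the fibres
        f^{-1}(y), over all y in the codomain, all have the same size."""
        fibre_size = Counter(f.values())
        sizes = {fibre_size.get(y, 0) for y in codomain}
        return len(sizes) <= 1

    # The string puzzle: the map sending each end to its piece of string is 2-to-1,
    # so the number of pieces is the number of ends divided by two.
    ends_to_piece = {"e1": "s1", "e2": "s1", "e3": "s2", "e4": "s2"}
    print(is_projection(ends_to_piece, {"s1", "s2"}))  # True
    print(len(ends_to_piece) // 2)                     # 2 pieces of string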

String theory aside, does this alternative viewpoint on the dual concept of injection have any importance? I think it does, at least a little. Let me give two illustrations.

Factorization, graphs and cographs

It’s a standard fact that any function between sets can be factorized as a surjection followed by an injection, and the same is true in many other familiar categories. But here are two other methods of factorization, dual to one another, that also work in many familiar categories. One involves injections, and the other projections.

  • In any category with finite coproducts, any map \(f: X \to Y\) factors as

    \[ X \to X + Y \stackrel{\binom{f}{1}}{\to} Y, \]

    where the first map is the coproduct coprojection. This is a canonical factorization of \(f\) as a coproduct coprojection followed by a (canonically) split epic. In \(\mathbf{Set}\), it’s a canonical factorization of a function \(f\) as an injection followed by a surjection (or “split surjection”, if you don’t want to assume the axiom of choice).

    Unlike the usual factorization involving a surjection followed by an injection, this one isn’t unique: there are many ways to factor \(f\) as an injection followed by a surjection. But it is canonical.

  • In any category with finite products, any map \(f: X \to Y\) factors as

    \[ X \stackrel{(1, f)}{\to} X \times Y \to Y, \]

    where the second map is the product projection. This is a canonical factorization of \(f\) as a split monic followed by a projection. In \(\mathbf{Set}\), “split monic” just means a function with a retraction, or concretely, an injection with the property that if the domain is empty then so is the codomain.

    (I understand that in some circles, this or a similar statement is called the “fundamental theorem of reversible computing”. This seems like quite a grand name, but I don’t know the context.)

    Again, even in \(\mathbf{Set}\), this factorization is not unique: most functions can be factored as a split monic followed by a projection in multiple ways. But again, it is canonical.

These factorizations have a role in basic set theory. Let’s consider the second one first.

Take a function \(f: X \to Y\). The resulting function

\[ (1, f): X \to X \times Y \]

is injective, which means it represents a subset of \(X \times Y\), called the graph of \(f\). A subset of \(X \times Y\) is also called a relation between \(X\) and \(Y\); so the graph of \(f\) is a relation between \(X\) and \(Y\).

The second factorization shows that \(f\) can be recovered from its graph, by composing with the projection \(X \times Y \to Y\). Thus, the set of functions from \(X\) to \(Y\) embeds into the set of relations between \(X\) and \(Y\). Which relations between \(X\) and \(Y\) correspond to functions? The answer is well-known: it’s the set of relations that are “functional”, meaning that each element of \(X\) is related to exactly one element of \(Y\).
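As a concrete sanity check, here is a small Python sketch of that embedding (the toy sets and helper names are mine): build the graph of a function, test functionality, and recover the function by projecting onto \(Y\).

    def graph(f):
        """The graph of a function f (a dict) as a subset of X x Y."""
        return {(x, y) for x, y in f.items()}

    def is_functional(R, X, Y):
        """A relation R inside X x Y comes from a function iff each x in X is
        related to exactly one y in Y."""
        return all(sum((x, y) in R for y in Y) == 1 for x in X)

    def from_graph(R):
        """Recover the function from its graph by projecting X x Y onto Y."""
        return {x: y for (x, y) in R}

    X, Y = {1, 2, 3}, {"a", "b"}
    f = {1: "a", 2: "a", 3: "b"}
    G = graph(f)
    print(is_functional(G, X, Y))  # True
    print(from_graph(G) == f)      # True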

This correspondence between functions and their graphs is very important for many reasons, which I won’t go into here. But it has a much less well known dual, which involves the first factorization.

Here I have to take a bit of a run-up. The injections into a set \(S\), taken up to isomorphism over \(S\), correspond to the subsets of \(S\). Dually, the surjections out of \(S\), taken up to isomorphism under \(S\), correspond to the equivalence relations on \(S\). (This is more or less the first isomorphism theorem for sets.) And just as we define a relation between sets \(X\) and \(Y\) to be a subset of \(X \times Y\), it’s good to define a corelation between \(X\) and \(Y\) to be an equivalence relation on \(X + Y\).

Now, given a function \(X \to Y\), the resulting function

\[ \binom{f}{1}: X + Y \to Y \]

is surjective, and so represents an equivalence relation on \(X + Y\). This equivalence relation on \(X + Y\) is called the cograph of \(f\), and is a corelation between \(X\) and \(Y\).

The first of the two factorizations above shows that \(f\) can be recovered from its cograph, by composing with the injection \(X \to X + Y\). Thus, the set of functions from \(X\) to \(Y\) embeds into the set of corelations between \(X\) and \(Y\). Which corelations between \(X\) and \(Y\) correspond to functions? This isn’t so well known, but — recalling that a corelation between \(X\) and \(Y\) is an equivalence relation on \(X + Y\) — you can show that it’s the corelations with the property that every equivalence class contains exactly one element of \(Y\).
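Dually, here is the same kind of sketch for cographs (again with my own toy data): the cograph is the partition of the disjoint union \(X + Y\) induced by \(\binom{f}{1}\), each class contains exactly one element of \(Y\), and \(f\) is recovered by sending each \(x\) to that element.

    def cograph(f, X, Y):
        """The cograph of f: the partition of X + Y induced by [f, 1]: X + Y -> Y.
        Elements of X are tagged ('X', x); elements of Y are tagged ('Y', y)."""
        return [frozenset({("X", x) for x in X if f[x] == y} | {("Y", y)}) for y in Y]

    def from_cograph(classes):
        """Recover f: each x goes to the unique element of Y in its class."""
        f = {}
        for cls in classes:
            (y,) = [b for tag, b in cls if tag == "Y"]  # exactly one Y element per class
            f.update({x: y for tag, x in cls if tag == "X"})
        return f

    X, Y = {1, 2, 3}, {"a", "b"}
    f = {1: "a", 2: "a", 3: "b"}
    print(from_cograph(cograph(f, X, Y)) == f)  # True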

I’ve digressed enough already, but I can’t resist adding that the dual graph/cograph approaches correspond to the two standard types of picture that we draw when talking about functions: graphs on the left, and cographs on the right.


For example, cographs are discussed in Lawvere and Rosebrugh’s book Sets for Mathematics.

Back to the injection-projection duality! I said I’d give two illustrations of the role it plays. That was the first one: the canonical factorization of a map as an injection followed by a split epic or a split monic followed by a projection. Here’s the second.

Free and cofree presheaves

Let \(A\) be a small category. If you’re given a set \(P(a)\) for each object \(a\) of \(A\) then this does not, by itself, constitute a functor \(A \to \mathbf{Set}\). A functor has to be defined on maps too. But \(P\) does generate a functor \(A \to \mathbf{Set}\). In fact, it generates two of them, in two dual universal ways.

More exactly, we’re starting with a family \((P(a))_{a \in A}\) of sets, which is an object of the category \(\mathbf{Set}^{ob\ A}\). There’s a forgetful functor

\[ \mathbf{Set}^A \to \mathbf{Set}^{ob\ A}, \]

and the general lore of Kan extensions tells us that it has both a left adjoint \(L\) and a right adjoint \(R\). Better still, it provides explicit formulas: the functors \(L(P), R(P): A \to \mathbf{Set}\) are given by

\[ (L(P))(a) = \sum_b A(b, a) \times P(b) \]

(where \(\sum\) means coproduct or disjoint union), and

\[ (R(P))(a) = \prod_b P(b)^{A(a, b)}. \]

I’ll call \(L(P)\) the free functor on \(P\), and \(R(P)\) the cofree functor on \(P\).

Example   Let \(G\) be a group, seen as a one-object category. Then we’re looking at the forgetful functor \(\mathbf{Set}^G \to \mathbf{Set}\) from \(G\)-sets to sets, which has both adjoints.

The left adjoint \(L\) sends a set \(P\) to \(G \times P\) with the obvious \(G\)-action, which is indeed usually called the free \(G\)-set on \(P\).

The right adjoint \(R\) sends a set \(P\) to the set \(P^G\) with its canonical action: acting on a family \((p_g)_{g \in G} \in P^G\) by an element \(u \in G\) produces the family \((p_{g u})_{g \in G}\). This is a cofree \(G\)-set.

Cofree group actions are important in some parts of the theory of dynamical systems, where \(G\) is often the additive group \(\mathbb{Z}\). A \(\mathbb{Z}\)-set is a set equipped with an automorphism. The cofree \(\mathbb{Z}\)-set on a set \(P\) is the set \(R(P) = P^{\mathbb{Z}}\) of double sequences of elements of the “alphabet” \(P\), where the automorphism shifts a sequence along by one. This is called the full shift on \(P\), and it’s a fundamental object in symbolic dynamics.
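A tiny Python sketch of the full shift (the lazy representation is my own choice): model a point of \(P^{\mathbb{Z}}\) as a function from the integers to the alphabet, so the shift automorphism is just precomposition with \(n \mapsto n + 1\).

    def shift(x):
        """The shift automorphism of the full shift: (shift(x))(n) = x(n + 1)."""
        return lambda n: x(n + 1)

    # A point of the full shift on the alphabet {'a', 'b'}: the periodic sequence ...ababab...
    x = lambda n: "a" if n % 2 == 0 else "b"
    y = shift(x)

    print([x(n) for n in range(-3, 4)])  # ['b', 'a', 'b', 'a', 'b', 'a', 'b']
    print([y(n) for n in range(-3, 4)])  # ['a', 'b', 'a', 'b', 'a', 'b', 'a']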

What’s this got to do with the injection-projection duality? The next example gives a strong clue.

Example   Let \(A\) be the category \((0 \to 1)\) consisting of a single nontrivial map. Then an object of \(\mathbf{Set}^{ob\ A}\) is a pair \((P_0, P_1)\) of sets, and an object of \(\mathbf{Set}^A\) is a function \(X_0 \stackrel{f}{\to} X_1\) between a pair of sets.

The free functor \(A \to \mathbf{Set}\) on \(P\) is the injection

\[ A \to A + B. \]

The cofree functor \(A \to \mathbf{Set}\) on \(P\) is the projection

\[ A \times B \to B. \]
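Here is a quick Python check of this example using the explicit formulas above (the hom-set encoding and the sample family \(P\) are mine). Reading the \(A\) and \(B\) of the displayed maps as \(P_0\) and \(P_1\), the free functor’s values have sizes \(|P_0|\) and \(|P_0| + |P_1|\), and the cofree functor’s have sizes \(|P_0| \cdot |P_1|\) and \(|P_1|\).

    from itertools import product

    objects = [0, 1]
    # Hom-sets of the arrow category (0 -> 1), recorded as sets of morphism names.
    hom = {(0, 0): {"id0"}, (0, 1): {"f"}, (1, 0): set(), (1, 1): {"id1"}}

    P = {0: {"p", "q"}, 1: {"r"}}  # a family of sets, one for each object

    def L(a):
        """Object part of the free functor: L(P)(a) = sum over b of A(b, a) x P(b)."""
        return {(u, p) for b in objects for u in hom[(b, a)] for p in P[b]}

    def R(a):
        """Object part of the cofree functor: R(P)(a) = product over b of P(b)^{A(a, b)},
        i.e. one choice of an element of P(b) for each arrow a -> b."""
        arrows = [(u, b) for b in objects for u in hom[(a, b)]]
        return {tuple(zip(arrows, choice)) for choice in product(*(P[b] for _, b in arrows))}

    print(len(L(0)), len(L(1)))  # 2 3  (that is, P0 and P0 + P1)
    print(len(R(0)), len(R(1)))  # 2 1  (that is, P0 x P1 and P1)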

More generally, it’s a pleasant exercise to prove:

  • Any free functor \(X: A \to \mathbf{Set}\) preserves monics. In other words, if \(u\) is a monic in \(A\) then \(X(u)\) is an injection.

  • Any cofree functor \(X: A \to \mathbf{Set}\) sends epics to projections. In other words, if \(u\) is an epic in \(A\) then \(X(u)\) is a projection.

These necessary conditions for being a free or cofree functor aren’t sufficient. (If you want a counterexample, think about group actions.) But they give you some feel for what free and cofree functors are like.

For instance, if \(A\) is the free category on a directed graph, then every map in \(A\) is both monic and epic. So when \(X: A \to \mathbf{Set}\) is free, the functions \(X(u)\) are all injections, for every map \(u\) in \(A\). When \(X\) is cofree, they’re all projections. This imposes a severe restriction on which functors can be free or cofree.

(Question: do quiver representation people ever look at cofree representations? Here we’d replace \(\mathbf{Set}\) by \(\mathbf{Vect}\).)

In another post, I’ll explain why I’ve been thinking about this.

Jordan Ellenberg Surprises of Mexico

We recently got back from a family trip to Mexico, a country I’d almost never been to (a couple of childhood day trips to Nogales, a walk across the border into Juárez in 1999, a day and a half in Cabo San Lucas giving a lecture.) I’m a fan! Some surprises:

  1. Drinking pulque under a highway bridge with our family (part of the food tour!), I heard this incredible banger playing on the radio:

“Reaktorn läck i Barsebäck” doesn’t sound like the name of a Mexican song, and indeed, this is a 1980 album track by Swedish raggare rocker Eddie Meduza, which, for reasons that seem to be completely unknown, is extremely popular in Mexico, where it is (for reasons that, etc etc) known as “Himno a la Banda.”

2. Mexico had secession movements. Yucatán was an independent state in the 1840s. (One of our guides, who moved there from Mexico City, told us it still feels like a different country.) And a Maya revolt in the peninsula created a de facto independent state for decades in the second half of the 19th century. Apparently the perceived weakness of the national government was one cause of the Pastry War.

3. Partly as a result of the above, sisal plantation owners in Yucatán were short of indentured workers, so they imported 1,000 desperate Korean workers in 1905. By the time their contracts ended, independent Korea had been overrun by Japan. So they stayed in Mexico and their descendants still live there today.

4. It is customary in Mexico, or at least in Mexico City, to put a coat rack right next to the table in a restaurant. I guess that makes sense! Why wouldn’t you want your coat right there, and isn’t it nicer for it to be on a rack than hung over the back of your chair?

5. The torta we usually see in America, on a big round roll, is a Central Mexico torta. In Yucatán, a torta is served on French bread, because the filling is usually cochinita pibil, which is so juicy it would make a soft roll soggy. A crusty French roll can soak up the juices without losing its structural integrity. The principle is essentially the same as that of an Italian beef.

6. The Mesoamerican ballgame is not only still played throughout the region, there’s a World Cup of it.

Andrew Jaffe The Only Gaijin in the Onsen

After living in Japan for about four months, we left in mid-December. We miss it already.

One of the pleasures we discovered is the onsen, or hot spring. The word originally referred to the natural volcanic springs themselves, and the villages around them, but there are now onsens all over Japan. Many hotels have an onsen, and most towns will have several. Some people still use them as their primary bath and shower for keeping clean. (Outside of actual volcanic locations, these are technically sento rather than onsen.) You don’t actually wash yourself in the hot baths themselves; they are just for soaking, and there are often several, at different temperatures and mineral contents, in indoor and outdoor locations, with whirlpools and even “electric baths” with muscle-stimulating currents. For actual cleaning, there is a bank of hand showers, usually with soap and shampoo. Some can be very basic, some much more like a posh spa, with massages, saunas, and a restaurant.

Our favourite, about 25 minutes away by bicycle, was Kirari Onsen Tsukuba. When not traveling, we tried to go every weekend, spending a day soaking in the hot water, eating the good food, staring at the gardens, snacking on Hokkaido soft cream — possibly the best soft-serve ice cream in the world (sorry, Carvel!), and just enjoying the quiet and peace. Even our seven- and nine-year old girls have found the onsen spirit, calming and quieting themselves down for at least a few hours.

Living in Tsukuba, lovely but not a common tourist destination, although with plenty of foreigners due to the constellation of laboratories and universities, we were often one of only one or two western families in our local onsen. It sometimes takes Americans (and those from other buttoned-up cultures) some time to get used to the baths’ sex-segregated but fully-naked policy. The communal areas, however, are mixed, and fully-clothed. In fact, many hotels and fancier onsen facilities supply a jinbei, a short-sleeve pyjama set in which you can softly pad around the premises during your stay. (I enjoyed wearing jinbei so much that I purchased a lightweight cotton set for home, and am also trying to get my hands on samue, a somewhat heavier style of traditional Japanese clothing.)

And my newfound love for the onsen is another reason not to get a tattoo beyond the sagging flesh and embarrassment of my future self: in Japan, tattoos are often a symbol of the yakuza, and are strictly forbidden in the onsen, even for foreigners.

Later in our sabbatical, we will be living in the Netherlands, which also has a good public bath culture, but it will be hard to match the calm of the Japanese onsen.

January 14, 2025

Scott Aaronson Above my pay grade: Jensen Huang and the quantum computing stock market crash

Update (Jan. 13): Readers might enjoy the Bankless Podcast, in which I and Justin Drake of the Ethereum engineering team discuss quantum computing and its impact on cryptocurrency. I learned something interesting from Justin—namely that Satoshi has about $90 billion worth of bitcoin that’s never been touched since the cryptocurrency’s earliest days, much of which (added: the early stuff, the stuff not additionally protected by a hash function) would be stealable by anyone who could break elliptic curve cryptography—for example, by using a scalable quantum computer. At what point in time, if any, would this stash acquire the moral or even legal status of (say) gold doubloons just lying on the bottom of the ocean? Arrr, ’tis avast Hilbert space!


Apparently Jensen Huang, the CEO of NVIDIA, opined on an analyst call this week that quantum computing was plausibly still twenty years away from being practical. As a direct result, a bunch of publicly-traded quantum computing companies (including IonQ, Rigetti, and D-Wave) fell 40% or more in value, and even Google/Alphabet stock fell on the news.

So then friends and family attuned to the financial markets started sending me messages asking for my reaction, as the world’s semi-unwilling Quantum Computing Opiner-in-Chief.

My reaction? Mostly just that it felt really weird for all those billions of dollars to change hands, or evaporate, based on what a microchip CEO offhandedly opined about my tiny little field, while I (like much of that field) could’ve remained entirely oblivious to it, were it not for all of their messages!

But was Jensen Huang right in his time estimate? And, relatedly, what is the “correct” valuation of quantum computing companies? Alas, however much more I know about quantum computing than Jensen Huang does, that knowledge does not enable me to answer either question.

I can, of course, pontificate about the questions, as I can pontificate about anything.

To start with the question of timelines: yes, there’s a lot still to be done, and twenty years might well be correct. But as I’ve pointed out before, within the past year we’ve seen 2-qubit gates with ~99.9% fidelity, which is very near the threshold for practical fault-tolerance. And of course, Google has now demonstrated fault-tolerance that becomes more and more of a win with increasing code size. So no, I can’t confidently rule out commercially useful quantum simulations within the next decade. Like, it sounds fanciful, but then I remember how fanciful it would’ve seemed in 2012 that we’d have conversational AI by 2022. I was alive in 2012! And speaking of which, if you really believe (as many people now do) AI will match or exceed human capabilities in most fields in the next decade, then that will scramble all the other timelines too. And presumably Jensen Huang understands these points as well as anyone.

Now for the valuation question. On the one hand, Shtetl-Optimized readers will know that there’s been plenty of obfuscation and even outright lying, to journalists, the public, and investors, about what quantum computing will be good for and how soon. To whatever extent the previous valuations were based on that lying, a brutal correction was of course in order, regardless of what triggered it.

On the other hand, I can’t say with certainty that high valuations are wrong! After all, even if there’s only a 10% chance that something will produce $100B in value, that would still justify a $10B valuation. It’s a completely different way of thinking than what we’re used to in academia.
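(The arithmetic in that sentence is just an expected value; here is a trivial sketch with the same illustrative numbers.)

    # Expected-value reasoning: a 10% chance of a $100B payoff "justifies" a $10B valuation.
    p_success = 0.10          # assumed probability of the big payoff
    payoff_in_billions = 100  # assumed size of the payoff
    print(p_success * payoff_in_billions)  # 10.0, i.e. $10B, before any discounting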

For whatever it’s worth, my own family’s money is just sitting in index funds and CDs. I have no quantum computing investments of any kind. I do sometimes accept consulting fees to talk to quantum computing startups and report back my thoughts. When I do, my highest recommendation is: “these people are smart and honest, everything they say about quantum algorithms is correct insofar as I can judge, and I hope they succeed. I wouldn’t invest my own money, but I’m very happy if you or anyone else does.” Meanwhile, my lowest recommendation is: “these people are hypesters and charlatans, and I hope they fail. But even then, I can’t say with confidence that their valuation won’t temporarily skyrocket, in which case investing in them would presumably have been the right call.”

So basically: it’s good that I became an academic rather than an investor.


Having returned from family vacation, I hope to get back to a more regular blogging schedule … let’s see how it goes!

January 13, 2025

Matt Strassler Tonight! Mars Meets the Moon (and Ducks Behind It)

Tonight (January 13th) offers a wonderful opportunity for all of us who love the night sky, and also for science teachers. For those living within the shaded region of Fig. 1, the planet Mars will disappear behind the Moon, somewhere between 9 and 10 pm Eastern (6 and 7 pm Pacific), before reappearing an hour later. Most easily enjoyed with binoculars. (And, umm, without clouds, which will be my own limitation, I believe…)

For everyone else, look up anyway! Mars and the Moon will appear very close together, a lovely pair.

Figure 1: the region of Earth’s surface where Mars will be seen to disappear behind the Moon. Elsewhere Mars and the Moon will appear very close together, itself a beautiful sight. Image from in-the-sky.org.

Why is this Cool?

“Occultations”, in which a planet or star disappears behind our Moon, are always cool. Normally, even though we know that the planets and the Moon move across the sky, we don’t get to actually see the motion. But here we can really watch the Moon close in on Mars — a way to visually experience the Moon’s motion around the Earth. You can see this minute by minute with the naked eye until Mars gets so close that the Moon’s brightness overwhelms it. Binoculars will allow you to see much more. With a small telescope, where you’ll see Mars as a small red disk, you can actually watch it gradually disappear as the Moon crosses in front of it. This takes less than a minute.

A particularly cool thing about this occultation is that it is happening at full Moon. Occultations like this can happen at any time of year or month, but when they happen at full Moon, it represents a very special geometry in the sky. In particular, it means that the Sun, Earth, Moon and Mars lie in almost a straight line, as shown (not to scale!!!) in Fig. 2.

  • The Moon is full because it is fully lit from our perspective, which means that it must lie almost directly behind the Earth relative to the Sun. [If it were precisely behind it, then it would be in Earth’s shadow, leading to a lunar eclipse; instead it is slightly offset, as it is at most full Moons.]
  • And when the Moon covers Mars from our perspective, that must mean Mars lies almost directly behind the Moon relative to the Earth.

So all four objects must lie nearly in a line, a relatively rare coincidence.

Figure 2: (Distances and sizes not to scale!!) For a full Moon to block our sight of Mars, it must be that the Sun, Earth, Moon and Mars lie nearly in a line, so that the night side of the Earth sees the Moon and Mars as both fully lit and in the same location in the sky. This is quite rare.

What Does This Occultation Teach Us?

Aside from the two things I’ve already mentioned — that an occultation is an opportunity to see the Moon’s motion, and that an occultation at full Moon implies the geometry of Fig. 2 — what else can we learn from this event, considered both on its own and in the context of others like it?

Distances and Sizes

Let’s start with one very simple thing: Mars is obviously farther from Earth than is the Moon, since it passes behind it. In fact, the Moon has occultations with all the planets, and all of them disappear behind the Moon instead of passing in front of it. This is why it has been understood for millennia that the Moon is closer to Earth than any of the planets.

Less obvious is that the map in Fig. 1 teaches us the size of the Moon. That’s because the width of the band where the Moon-Mars meeting is visible is approximately the diameter of the Moon. Why is that? Simple geometry. I’ve explained this here.

“Oppositions” and Orbital Periods

The moment when Mars is closest to Earth and brightest in the sky is approximately when the Sun, Earth and Mars lie in a straight line, known as “opposition”. Fig. 2 implies that an occultation of a planet at full Moon can only occur at or around that planet’s opposition. And indeed, while today’s occultation occurs on January 13th, Mars’ opposition occurs on January 15th.

Oppositions are very interesting for another reason; you can use them to learn a planet’s year. Mars’ most recent oppositions (and the next ones) are given in Fig. 3. You notice they occur about 25-26 months apart — just a bit more than two years.

Figure 3: A list of Martian oppositions (when Mars lies exactly opposite the Sun from Earth’s perspective, as in Fig. 2) showing they occur a bit more than two years apart. From nakedeyeplanets.com. [The different size and brightness of Mars from one opposition to the next reflects that the planetary orbits are not perfect circles.]

This, in turn, implies something interesting, but not instantly obvious: the time between Martian oppositions tells us that a Martian year is slightly less than two Earth years. Why?

Fig. 4 shows what would happen if (a) a Martian year (the time Mars takes to orbit the Sun) were exactly twice as long as an Earth year, and (b) both orbits were perfect circles around the Sun. Then the time between oppositions would be exactly two Earth years.

Figure 4: If Mars (red) took exactly twice as long to orbit the Sun (orange) as does Earth (blue), then an opposition (top left) would occur every two Earth years (bottom). Because oppositions occur slightly more than 24 months apart, we learn that Mars’ orbit of the Sun — its year — is slightly less than twice Earth’s year. (Yes, that’s right!) Oppositions for Jupiter and Saturn occur more often because their years are even longer.

But neither (a) nor (b) is exactly true. In fact a Martian year is 687 days, slightly less than two Earth years, whereas the time between oppositions is slightly more than two Earth years. Why? It takes a bit of thought, and is explained in detail here (for solar conjunctions rather than oppositions, but the argument is identical).
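For anyone who wants to check the numbers, here is a short Python calculation using the standard synodic-period relation (the relation itself isn’t spelled out in the post, but the 687-day figure and the 25-26 month spacing are):

    # How often Earth "laps" Mars: 1/S = 1/T_Earth - 1/T_Mars.
    T_earth = 365.25  # days
    T_mars = 687.0    # days, slightly less than two Earth years

    S = 1.0 / (1.0 / T_earth - 1.0 / T_mars)
    print(round(S))             # ~780 days between oppositions
    print(round(S / 30.44, 1))  # ~25.6 months: "a bit more than two years"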

The Planets, Sun and Moon are In a Line — Always!

And finally, one more thing about occultations of planets by the Moon: they happen for all the planets, and they actually happen pretty often, though some are much harder to observe than others. Here is a partial list, showing occultations of all planets [except Neptune is not listed for some unknown reason], as well as occultations of a few bright stars, in our current period. Why are these events so common?

Well (although the news media seems not to be aware of it!) the Moon and the planets are always laid out roughly in a (curved) line across the sky, though not all are visible at the same time. Since the Moon crosses the whole sky once a month, the chance of it passing in front of a planet is not particularly small!

Why are they roughly in a line? This is because the Sun and its planets lie roughly in a disk, with the Earth-Moon system also oriented in roughly the same disk. A disk, seen from someone sitting inside it, looks like a line that goes across the sky… or rather, a huge circle that goes round the Earth.

To get a sense of how this works, look at Fig. 5. It shows a flat disk, seen from three perspectives (left to right): first head on, then obliquely (where it appears as an ellipse), and finally from the side (where it appears as a line segment.) The closer we come to the disk, the larger it will appear — and thus the longer the line segment will appear in side view. If we actually enter the disk from the side, the line segment will appear to wrap all the way around us, as a circle that we sit within.

Figure 5: A disk, seen from three perspectives: (left) face on, (center) obliquely, and (right) from the side, where it appears as a line segment. The closer we approach the disk, the longer the line segment. If we actually enter the disk, the line segment will wrap all the way around us, and will appear as a circle that surrounds us. Upon the sky, that circle will appear as a curved line (not necessarily overhead) from one horizon to the other, before passing underneath us.

Specifically for the planets, this means the following. Most planetary systems with a single star have the star at the near-center and planets orbiting in near-circles, with all the orbits roughly in a disk around the star. This is shown in Fig. 6. Just as in Fig. 5, when the star and planets are viewed obliquely, their orbits form an ellipse; and when they are viewed from the side, their orbits form a line segment, as a result of which the planets lie in a line. When we enter the planetary disk, so that some planets sit farther from the Sun than we do, then this line becomes a circle that wraps around us. That circle is the ecliptic, and all the planets and the Sun always lie close to it.

Fig. 6: (Left) Planets (colored dots) orbiting a central star (orange) along orbits (black circles) that lie in a plane. (Center) the same system viewed obliquely. (Right) The same system viewed from the side, in which case the planets and the star always lie in a straight line. (See also Fig. 5.) Viewed from one of the inner planets, the other planets and the star would seem to lie on a circle wrapping around the planet, and thus on a line across the night sky.

Reversing the logic, the fact that we observe that the planets and Sun lie on a curved line across the sky teaches us that the planetary orbits lie in a disk. This, too, has been known for millennia, long before humans understood that the planets orbit the Sun, not the Earth.

(This is also true of our galaxy, the Milky Way, in which the Sun is just one of nearly a trillion stars. The fact that the Milky Way always forms a cloudy band across the sky provides evidence that our galaxy is in the shape of a disk, probably somewhat like this one.)

The Mysteries of the Moon

But why does the Moon also lie on the ecliptic? That is, since the Moon orbits the Earth and not the Sun, why does its orbit have to lie in the same disk as the planets all do?

This isn’t obvious at all! (Indeed it was once seen as evidence that the planets and Sun must, like the Moon, all orbit the Earth.) But today we know this orientation of the Moon’s orbit is not inevitable. The moons of the planet Uranus, for instance, don’t follow this pattern; they and Uranus’ rings orbit in the plane of Uranus’ equator, tipped almost perpendicular to the plane of planetary orbits.

Well, the fact that the Moon’s orbit is almost in the same plane as the planets’ orbits — and that of Earth’s equator — is telling us something important about Earth’s history and about how the Moon came to be. The current leading explanation for the Moon’s origin is that the current Earth and Moon were born from the collision of two planets. Those planets would have been traveling in the same plane as all the others, and if they suffered a glancing blow within that plane, then the debris from the collision would also have been mostly in that plane. As the debris coalesced to form the Earth and Moon we know, they would have ended up orbiting each other, and spinning around their axes, in roughly this very same plane. (Note: This is a consequence of the conservation of angular momentum.)

This story potentially explains the orientation of the Moon’s orbit, as well as many other strange things about the Earth-Moon system. But evidence in favor of this explanation is still not overwhelmingly strong, and so we should consider this as an important question that astronomy has yet to fully settle.

So occultations, oppositions, and their near-simultaneous occurrence have a great deal to teach us and our students. Let’s not miss the opportunity!

January 12, 2025

Jordan Ellenberg Generation Z will have its revenge on Seattle

I was just in Seattle for the Joint Meetings and had dinner with an old friend. I asked her 15-year-old daughter, who’s lived in Seattle all her life, if she knew who Kurt Cobain was, and she looked at me with a slight tinge of recognition and said, “Did…. he play basketball?”

I do really love Seattle. I love the rocks coming out of the water, I love the pointy trees. Breath of happiness whenever I show up there.

January 11, 2025

Clifford Johnson 93 minutes

Thanks to everyone who made all those kind remarks in various places last month after my mother died. I've not responded individually (I did not have the strength) but I did read them all and they were deeply appreciated. Yesterday would’ve been mum’s 93rd birthday. A little side-note occurred to me the other day: Since she left us a month ago, she was just short of having seen two perfect square years. (This year and 1936.) Anyway, still on the theme of playing with numbers, my siblings and I agreed that as a tribute to her on the day, we would all do some kind of outdoor activity for 93 minutes. Over in London, my brother and sister did a joint (probably chilly) walk together in Regents Park and surrounds. I decided to take out a piece of the afternoon at low tide and run along the beach. It went pretty well, [...] Click to continue reading this post

The post 93 minutes appeared first on Asymptotia.

January 10, 2025

Matt von Hippel Government Science Funding Isn’t a Precision Tool

People sometimes say there is a crisis of trust in science. In controversial subjects, from ecology to health, increasingly many people are rejecting not only mainstream ideas, but the scientists behind them.

I think part of the problem is media literacy, but not in the way you’d think. When we teach media literacy, we talk about biased sources. If a study on cigarettes is funded by the tobacco industry or a study on climate change is funded by an oil company, we tell students to take a step back and consider that the scientists might be biased.

That’s a worthwhile lesson, as far as it goes. But it naturally leads to another idea. Most scientific studies aren’t funded by companies, most studies are funded by the government. If you think the government is biased, does that mean the studies are too?

I’m going to argue here that government science funding is a very different thing than corporations funding individual studies. Governments do have an influence on scientists, and a powerful one, but that influence is diffuse and long-term. They don’t have control over the specific conclusions scientists reach.

If you picture a stereotypical corrupt scientist, you might imagine all sorts of perks. They might get extra pay from corporate consulting fees. Maybe they get invited to fancy dinners, go to corporate-sponsored conferences in exotic locations, and get gifts from the company.

Grants can’t offer any of that, because grants are filtered through a university. When a grant pays a scientist’s salary, the scientist doesn’t get extra money; the university just pays less to compensate, and the benefit comes instead as reduced teaching responsibilities or a slightly better chance at future raises. Any dinners or conferences have to obey not only rules from the grant agency (a surprising number of grants these days can’t pay for alcohol) but from the university as well, which can set a maximum on the price of a dinner or require people to travel economy using a specific travel agency. They also have to be applied for: scientists have to write out their planned travel and conference budget, and the committee evaluating grants will often ask if that budget is really necessary.

Actual corruption isn’t the only thing we teach news readers to watch out for. By funding research, companies can choose to support people who tend to reach conclusions they agree with, keep in contact through the project, then publicize the result with a team of dedicated communications staff.

Governments can’t follow up on that level of detail. Scientific work is unpredictable, and governments try to fund a wide breadth of scientific work, so they have to accept that studies will not usually go as advertised. Scientists pivot, finding new directions and reaching new opinions, and government grant agencies don’t have the interest or the staff to police them for it. They also can’t select very precisely, with committees that often only know bits and pieces about the work they’re evaluating because they have to cover so many different lines of research. And with the huge number of studies funded, the number that can be meaningfully promoted by their comparatively small communications staff is only a tiny fraction.

In practice, then, governments can’t choose what conclusions scientists come to. If a government grant agency funds a study, that doesn’t tell you very much about whether the conclusion of the study is biased.

Instead, governments have an enormous influence on the general type of research that gets done. This doesn’t work on the level of conclusions, but on the level of topics, as that’s about the most granular that grant committees can get. Grants work in a direct way, giving scientists more equipment and time to do work of a general type that the grant committees are interested in. It works in terms of incentives, not because researchers get paid more but because they get to do more, hiring more students and temporary researchers if they can brand their work in terms of the more favored type of research. And it works by influencing the future: by creating students and sustaining young researchers who don’t yet have temporary positions, and by encouraging universities to hire people more likely to get grants for their few permanent positions.

So if you’re suspicious the government is biasing science, try to zoom out a bit. Think about the tools they have at their disposal, about how they distribute funding and check up on how it’s used. The way things are set up currently, most governments don’t have detailed control over what gets done. They have to filter that control through grant committees of opinionated scientists, who have to evaluate proposals well outside of their expertise. Any control you suspect they’re using has to survive that.

Matt Strassler No, the Short Range of the Weak Nuclear Force Isn’t Due to Quantum Physics

When it comes to the weak nuclear force and why it is weak, there’s a strange story which floats around. It starts with a true but somewhat misleading statement:

  • The weak nuclear force (which is weak because its effects only extend over a short range) has its short range because the particles which mediate the force, the W and Z bosons, have mass [specifically, they have “rest mass”.] This is in contrast to electromagnetic forces which can reach out over great distances; that’s because photons, the particles of light which mediate that force, have no rest mass.

    This is misleading because fields mediate forces, not particles; it’s the W and Z fields that are the mediators for the weak nuclear force, just as the electromagnetic field is the mediator for the electromagnetic force. (When people speak of forces as due to exchange of “virtual particles” — which aren’t particles — they’re using fancy math language for a simple idea from first-year undergraduate physics.)

    Then things get worse, because it is stated that

    • The connection between the W and Z bosons’ rest mass and the short range of the weak nuclear force is that
      • the force is created by the exchange of virtual W and Z bosons, and
      • due to the quantum uncertainty principle, these virtual particles with mass can’t live as long and/or travel as far as virtual photons can, shortening their range.

    This is completely off-base. In fact, quantum physics plays no role in why the weak nuclear force is weak and short-range. (It plays a big role in why the strong nuclear force is strong and short-range, but that’s a tale for another day.)

    I’ve explained the real story in a new webpage that I’ve added to my site; it has a non-technical explanation, and then some first-year college math for those who want to see it. It’s gotten some preliminary comments that have helped me improve it, but I’m sure it could be even better, and I’d be happy to get your comments, suggestions, questions and critiques if you have any.
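    For readers who just want the gist of the alternative, here is one standard classical-field sketch; this is my summary, with my own symbols \(\phi\), \(g\) and \(L\), and it is not necessarily how the linked page presents things. Away from a small source of strength \(g\), a static massless field obeys \(\nabla^2 \phi = 0\), with the familiar long-range solution

    \[ \phi(r) = \frac{g}{4 \pi r} . \]

    Giving the field a “stiffness” term with a built-in length \(L\) changes the classical equation and its solution to

    \[ \nabla^2 \phi - \frac{\phi}{L^2} = 0 , \qquad \phi(r) = \frac{g \, e^{-r/L}}{4 \pi r} , \]

    so the force dies away beyond the distance \(L\) set by that term, with no appeal to the uncertainty principle. Relating \(L\) to the W and Z bosons’ rest mass is a separate translation between field language and particle language.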

    [P.S. — if you try but are unable to leave a comment on that page, please leave one here and tell me what went wrong; and if you try but are unable to leave a comment here too for some reason, please send me a message to let me know.]

January 05, 2025

Mark Goodsell Making back bacon

As a French citizen I should probably disavow the following post and remind myself that I have access to some of the best food in the world. Yet it's impossible to forget the tastes of your childhood. And indeed there are lots of British things that are difficult or very expensive to get hold of in France. Some of them (Marmite, Branston pickle ...) I can import via occasional trips across the channel, or in the luggage of visiting relatives. However, since Brexit this no longer works for fresh food like bacon and sausages. This is probably a good thing for my health, but every now and then I get a hankering for a fry-up or a bacon butty, and as a result of their rarity these are amongst the favourite breakfasts of my kids too. So I've learnt how to make bacon and sausages (it turns out that boudin noir is excellent with a fry-up and I even prefer it to black pudding). 

Sausages are fairly labour-intensive, but after about an hour or so's work it's possible to make one or two kilos worth. Back bacon, on the other hand, takes three weeks to make one batch, and I thought I'd share the process here.

1. Cut of meat

The first thing is to get the right piece of pork, since animals are divided up differently in different countries. I've made bacon several times now and keep forgetting which instructions I previously gave to the butcher at my local Grand Frais ... Now I have settled on asking for a carré de porc, and when they (nearly always) tell me that they don't have that in, I ask for côtes de porc première in one whole piece, and try to get them to give me a couple of kilos. As you can find on wikipedia, I need the same piece of meat used to make pork chops. I then ask them to remove the spine, but it should still have the ribs. So I start with this:



2. Cure

Next the meat has to be cured for 10 days (I essentially follow the River Cottage recipe). I mix up a 50-50 batch of PDV salt and brown sugar (1 kg in total here), and add some pepper, juniper berries and bay leaves:


Notice that this doesn't include any nitrites or nitrates. I have found that nitrates/nitrites are essential for the flavour in sausages, but in bacon the only thing that they will do (other than be a carcinogen) as far as I can tell is make the meat stay pink when you cook it. I can live without that. This cure makes delicious bacon as far as I'm concerned. 

The curing process involves applying 1/10th of the mixture each day for ten days and draining off the liquid produced at each step. After the first coating it looks like this:


The salt and sugar remove water from the meat, and penetrate into it, preserving it. Each day I get liquid at the bottom, which I drain off before applying the next day's cure. After one day it looks like this:


This time I still had liquid after 10 days:

3. Drying

After ten days, I wash/wipe off the cure and pat it down with some vinegar. If you leave cure on the meat it will be much too salty (and, to be honest, this cure always gives quite salty bacon). So at this point it looks like this:


I then cover the container with a muslin that has been doused with a bit more vinegar, and leave it in the fridge (at first) and then in the garage (since it's nice and cold this time of year) for ten days or so. This part removes extra moisture. It's possible that small amounts of white mould will appear during this stage, but these are totally benign: you only have to worry if it starts to smell or you get blue/black mould, and that has never happened to me so far.
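Since the whole batch takes about three weeks, here is a throwaway Python sketch of the calendar; the quantities are just the 1 kg, 50-50 cure and the ten-plus-ten day schedule described above, and the start date is a placeholder.

    from datetime import date, timedelta

    start = date(2025, 1, 4)          # pick your own start date
    total_cure_g = 1000               # 50-50 PDV salt and brown sugar, 1 kg in total
    daily_cure_g = total_cure_g / 10  # apply 1/10th of the mix each day for ten days

    for day in range(10):
        print(f"{start + timedelta(days=day)}: rub in {daily_cure_g:.0f} g of cure, drain off liquid")
    print(f"{start + timedelta(days=10)}: wash off the cure, wipe with vinegar, cover with muslin")
    print(f"{start + timedelta(days=20)}: drying done; eat as is, or cold smoke first")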

4. Smoking

After the curing/drying, the bacon is ready to eat and should in principle keep almost indefinitely. However, I prefer smoked bacon, so I cold smoke it. This involves sticking it in a smoker (essentially just a box where you can suspend the meat above some smouldering sawdust) for several hours:

The sawdust is beech wood and slowly burns round in the little spiral device you can see above. Of course, I close the smoker up and usually put it in the shed to protect against the elements:


5. All done!

And then that's it! Delicious back bacon that really doesn't take very long to eat:


As I mentioned above, it's usually still a bit salty, so when I slice it to cook I put the pieces in water for a few minutes before grilling/frying:

Here you see that the colour is just like frying pork chops ... but the flavour is exactly right!

January 04, 2025

Doug Natelson This week in the arXiv: quantum geometry, fluid momentum "tunneling", and pasta sauce

Three papers caught my eye the other day on the arXiv at the start of the new year:

arXiv:2501.00098 - J. Yu et al., "Quantum geometry in quantum materials" - I hope to write up something about quantum geometry soon, but I wanted to point out this nice review even if I haven't done my legwork yet.  The ultrabrief point:  The single-particle electronic states in crystalline solids may be written as Bloch waves, of the form \(u_{n \mathbf{k}}(\mathbf{r}) \exp(i \mathbf{k} \cdot \mathbf{r})\), where the (crystal) momentum is given by \(\hbar \mathbf{k}\) and \(u_{n \mathbf{k}}\) is a function with the real-space periodicity of the crystal lattice and contains an implicit \(\mathbf{k}\) dependence.  You can get very far in understanding solid-state physics without worrying about this, but it turns out that there are a number of very important phenomena that originate from the oft-neglected \(\mathbf{k}\) dependence of \(u_{n \mathbf{k}}\).  These include the anomalous Hall effect, the (intrinsic) spin Hall effect, the orbital Hall effect, etc.  Basically the \(\mathbf{k}\) dependence of \(u_{n \mathbf{k}}\) in the form of derivatives defines an internal "quantum" geometry of the electronic structure.  This review is a look at the consequences of quantum geometry on things like superconductivity, magnetic excitations, excitons, Chern insulators, etc. in quantum materials.
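(A brief aside on the formula behind that last sentence: the standard object built from those \(\mathbf{k}\) derivatives is the quantum geometric tensor of band \(n\), \(Q^{n}_{\mu \nu}(\mathbf{k}) = \langle \partial_{k_{\mu}} u_{n \mathbf{k}} | \left(1 - | u_{n \mathbf{k}} \rangle \langle u_{n \mathbf{k}} | \right) | \partial_{k_{\nu}} u_{n \mathbf{k}} \rangle\), whose real part is the quantum metric and whose imaginary part is \(-1/2\) times the Berry curvature.  This is the textbook definition rather than anything taken from the review, so see the paper itself for conventions.)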

Fig. 1 from arXiv:2501.01253
arXiv:2501.01253 - B. Coquinot et al., "Momentum tunnelling between nanoscale liquid flows" - In electronic materials there is a phenomenon known as Coulomb drag, in which a current driven through one electronic system (often a 2D electron gas) leads, through Coulomb interactions, to a current in adjacent but otherwise electrically isolated electronic system (say another 2D electron gas separated from the first by a few-nm insulating layer).  This paper argues that there should be a similar-in-spirit phenomenon when a polar liquid (like water) flows on one side of a thin membrane (like one or few-layer graphene, which can support electronic excitations like plasmons) - that this could drive flow of a polar fluid on the other side of the membrane (see figure).  They cast this in the language of momentum tunneling across the membrane, but the point is that it's some inelastic scattering process mediated by excitations in the membrane.  Neat idea.

arXiv:2501.00536 - G. Bartolucci et al., "Phase behavior of Cacio and Pepe sauce" - Cacio e pepe is a wonderful Italian pasta dish with a sauce made from pecorino cheese, pepper, and hot pasta cooking water that contains dissolved starch.  When prepared well, it's incredibly creamy, smooth, and satisfying.  The authors here perform a systematic study of the sauce properties as a function of temperature and starch concentration relative to cheese content, finding the part of parameter space to avoid if you don't want the sauce to "break" (condensing out clumps of cheese-rich material and ruining the sauce texture).  That's cool, but what is impressive is that they are actually able to model the phase stability mathematically and come up with a scientifically justified version of the recipe.  Very fun.


Tommaso Dorigo Holiday Chess Riddle

During Christmas holidays I tend to indulge in online chess playing a bit too much, wasting several hours a day that could be used to get back on track with the gazillion research projects I am currently trying to keep pushing. But at times it gives me pleasure, when I conceive some good tactical sequence. 
Take the position below, from a 5' game on chess.com today. White has obtained a winning position, but can you win it with the clock ticking? (I have less than two minutes left for the rest of the game...)


January 03, 2025

Matt von HippelFreelancing in [Country That Includes Greenland]

(Why mention Greenland? It’s a movie reference.)

I figured I’d give an update on my personal life.

A year ago, I resigned from my position in France and moved back to Denmark. I had planned to spend a few months as a visiting researcher in my old haunts at the Niels Bohr Institute, courtesy of the spare funding of a generous friend. There turned out to be more funding than expected, and what was planned as just a few months was extended to almost a year.

I spent that year learning something new. It was still an amplitudes project, trying to make particle physics predictions more efficient. But this time I used Python. I looked into reinforcement learning and PyTorch, played with using a locally hosted Large Language Model to generate random code, and ended up getting good results from a classic genetic programming approach. Along the way I set up a SQL database, configured Docker containers, and puzzled out interactions with CUDA. I’ve got a paper in the works, I’ll post about it when it’s out.

All the while, on the side, I’ve been seeking out stories. I’ve not just been a writer, but a journalist, tracking down leads and interviewing experts. I had three pieces in Quanta Magazine and one in Ars Technica.

Based on that, I know I can make money doing science journalism. What I don’t know yet is whether I can make a living doing it. This year, I’ll figure that out. With the project at the Niels Bohr Institute over, I’ll have more time to seek out leads and pitch to more outlets. I’ll see whether I can turn a skill into a career.

So if you’re a scientist with a story to tell, if you’ve discovered something or accomplished something or just know something that the public doesn’t, and that you want to share: do reach out. There’s a lot that can be of interest, passion that can be shared.

At the same time, I don’t know yet whether I can make a living as a freelancer. Many people try and don’t succeed. So I’m keeping my CV polished and my eyes open. I have more experience now with Data Science tools, and I’ve got a few side projects cooking that should give me a bit more. I have a few directions in mind, but ultimately, I’m flexible. I like being part of a team, and with enthusiastic and competent colleagues I can get excited about pretty much anything. So if you’re hiring in Copenhagen, if you’re open to someone with ten years of STEM experience who’s just starting to see what industry has to offer, then let’s chat. Even if we’re not a good fit, I bet you’ve got a good story to tell.

January 01, 2025

John PreskillHappy 200th birthday, Carnot’s theorem!

In Kenneth Grahame’s 1908 novel The Wind in the Willows, a Mole meets a Water Rat who lives on a River. The Rat explains how the River permeates his life: “It’s brother and sister to me, and aunts, and company, and food and drink, and (naturally) washing.” As the River plays many roles in the Rat’s life, so does Carnot’s theorem play many roles in a thermodynamicist’s.

Nicolas Léonard Sadi Carnot lived in France during the turn of the 19th century. His father named him Sadi after the 13th-century Persian poet Saadi Shirazi. Said father led a colorful life himself,1 working as a mathematician, engineer, and military commander for and before the Napoleonic Empire. Sadi Carnot studied in Paris at the École Polytechnique, whose members populate a “Who’s Who” list of science and engineering. 

As Carnot grew up, the Industrial Revolution was humming. Steam engines were producing reliable energy on vast scales; factories were booming; and economies were transforming. France’s old enemy Britain enjoyed two advantages. One consisted of inventors: Englishmen Thomas Savery and Thomas Newcomen invented the steam engine. Scotsman James Watt then improved upon Newcomen’s design until rendering it practical. Second, northern Britain contained loads of coal that industrialists could mine to power her engines. France had less coal. So if you were a French engineer during Carnot’s lifetime, you should have cared about engines’ efficiencies—how effectively engines used fuel.2

Carnot proved a fundamental limitation on engines’ efficiencies. His theorem governs engines that draw energy from heat—rather than from, say, the motional energy of water cascading down a waterfall. In Carnot’s argument, a heat engine interacts with a cold environment and a hot environment. (Many car engines fall into this category: the hot environment is burning gasoline. The cold environment is the surrounding air into which the car dumps exhaust.) Heat flows from the hot environment to the cold. The engine siphons off some heat and converts it into work. Work is coordinated, well-organized energy that one can directly harness to perform a useful task, such as turning a turbine. In contrast, heat is the disordered energy of particles shuffling about randomly. Heat engines transform random heat into coordinated work.

In The Wind in the Willows, Toad drives motorcars likely powered by internal combustion, rather than by a steam engine of the sort that powered the Industrial Revolution.

An engine’s efficiency is the bang we get for our buck—the upshot we gain, compared to the cost we spend. Running an engine costs the heat that flows between the environments: the more heat flows, the more the hot environment cools, so the less effectively it can serve as a hot environment in the future. An analogous statement concerns the cold environment. So a heat engine’s efficiency is the work produced, divided by the heat spent.

Carnot upper-bounded the efficiency achievable by every heat engine of the sort described above. Let T_{\rm C} denote the cold environment’s temperature; and T_{\rm H}, the hot environment’s. The efficiency can’t exceed 1 - \frac{ T_{\rm C} }{ T_{\rm H} }. What a simple formula for such an extensive class of objects! Carnot’s theorem governs not only many car engines (Otto engines), but also the Stirling engine that competed with the steam engine, its cousin the Ericsson engine, and more.
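To get a feel for the numbers, here is the bound in a few lines of code (the temperatures are my own illustrative values, not from the post):

```python
def carnot_bound(t_cold, t_hot):
    """Carnot's upper bound on a heat engine's efficiency; temperatures in kelvin."""
    return 1 - t_cold / t_hot

# Illustrative values: ~300 K surroundings, ~1000 K combustion gases.
print(carnot_bound(300.0, 1000.0))   # 0.7 -- no heat engine running between these can beat 70%
```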

In addition to generality and simplicity, Carnot’s bound boasts practical and fundamental significances. Capping engine efficiencies caps the output one can expect of a machine, factory, or economy. The cap also prevents engineers from wasting their time on daydreaming about more-efficient engines. 

More fundamentally than these applications, Carnot’s theorem encapsulates the second law of thermodynamics. The second law helps us understand why time flows in only one direction. And what’s deeper or more foundational than time’s arrow? People often cast the second law in terms of entropy, but many equivalent formulations express the law’s contents. The formulations share a flavor often synopsized with “You can’t win.” Just as we can’t grow younger, we can’t beat Carnot’s bound on engines. 

Video courtesy of FQxI

One might expect no engine to achieve the greatest efficiency imaginable: 1 - \frac{ T_{\rm C} }{ T_{\rm H} }, called the Carnot efficiency. This expectation is incorrect in one way and correct in another. Carnot did design an engine that could operate at his eponymous efficiency: an eponymous engine. A Carnot engine can manifest as the thermodynamicist’s favorite physical system: a gas in a box topped by a movable piston. The gas undergoes four strokes, or steps, to perform work. The strokes form a closed cycle, returning the gas to its initial conditions.3 

Steampunk artist Todd Cahill beautifully illustrated the Carnot cycle for my book. The gas performs useful work because a weight sits atop the piston. Pushing the piston upward, the gas lifts the weight.

The gas expands during stroke 1, pushing the piston and so outputting work. Maintaining contact with the hot environment, the gas remains at the temperature T_{\rm H}. The gas then disconnects from the hot environment. Yet the gas continues to expand throughout stroke 2, lifting the weight further. Forfeiting energy, the gas cools. It ends stroke 2 at the temperature T_{\rm C}.

The gas contacts the cold environment throughout stroke 3. The piston pushes on the gas, compressing it. At the end of the stroke, the gas disconnects from the cold environment. The piston continues compressing the gas throughout stroke 4, performing more work on the gas. This work warms the gas back up to T_{\rm H}.

In summary, Carnot’s engine begins hot, performs work, cools down, has work performed on it, and warms back up. The gas performs more work on the piston than the piston performs on it.

At what cost, if the engine operates at the Carnot efficiency? The engine mustn’t waste heat. One wastes heat by roiling up the gas unnecessarily—by expanding or compressing it too quickly. The gas must stay in equilibrium, a calm, quiescent state. One can keep the gas quiescent only by running the cycle infinitely slowly. The cycle will take an infinitely long time, outputting zero power (work per unit time). So one can achieve the perfect efficiency only in principle, not in practice, and only by sacrificing power. Again, you can’t win.

Efficiency trades off with power.

Carnot’s theorem may sound like the Eeyore of physics, all negativity and depression. But I view it as a companion and backdrop as rich, for thermodynamicists, as the River is for the Water Rat. Carnot’s theorem curbs diverse technologies in practical settings. It captures the second law, a foundational principle. The Carnot cycle provides intuition, serving as a simple example on which thermodynamicists try out new ideas, such as quantum engines. Carnot’s theorem also provides what physicists call a sanity check: whenever a researcher devises a new (for example, quantum) heat engine, they can check that the engine obeys Carnot’s theorem, to help confirm their proposal’s accuracy. Carnot’s theorem also serves as a school exercise and a historical tipping point: the theorem initiated the development of thermodynamics, which continues to this day.

So Carnot’s theorem is practical and fundamental, pedagogical and cutting-edge—brother and sister, and aunts, and company, and food and drink. I just wouldn’t recommend trying to wash your socks in Carnot’s theorem.

1To a theoretical physicist, working as a mathematician and an engineer amounts to leading a colorful life.

2People other than Industrial Revolution–era French engineers should care, too.

3A cycle doesn’t return the hot and cold environments to their initial conditions, as explained above.

December 31, 2024

Doug NatelsonEnd of the year thoughts - scientific philanthropy and impact

As we head into 2025, and the prospects for increased (US) government investment in science, engineering, and STEM education seem very limited, I wanted to revisit a topic that I wrote about over a decade ago (!!!), the role of philanthropy and foundations in these areas.  

Personally I think the case for government support of scientific research and education is overwhelmingly clear; while companies depend on having an educated technical workforce (at least for now) and continually improving technology, they are under great short-term financial pressures and genuinely long-term investment in research is rare.  Foundations are not a substitute for nation-state levels of support, but they are a critical component of the research and education landscape.  

Annual citations of the EPR paper from Web of Science, a case study in the long-term impact of some "pure" scientific research, and giving hope to practicing scientists that surely our groundbreaking work will be appreciated sixty years after publication.

A key question I've wondered about for a long time is how to properly judge the impact that research-supporting foundations are making.  The Science Philanthropy Alliance is a great organization that considers these issues deeply.

The nature of long-term research is that it often takes a long time for its true impact (I don't mean just citation counts, but those are an indicator of activity) to be felt.  One (admittedly extreme) example is shown here, the citations-per-year (from Web of Science) of the 1935 Einstein/Podolsky/Rosen paper about entanglement.  (Side note:  You have to love the provocative title, "Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?", which from the point of view of the authors at the time satisfies Betteridge's Law of Headlines.)  There are few companies that would be willing to invest in supporting research that won't have its true heyday for five or six decades.

One additional tricky bit is that grants are usually given to people and organizations who are already active.  It's often not simple to point to some clear result or change in output that absolutely would not have happened without foundation support.  This is exacerbated by the fact that grants in science and engineering are often given to people and organizations who are not just active but already very well supported - betting on an odds-on favorite is a low risk strategy. 

Many foundations do think very carefully about what areas to support, because they want to "move the needle".  For example, some scientific foundations are consciously reluctant to support closer-to-clinical-stage cancer research, since the total annual investment by governments and pharmaceutical companies in that area numbers in the many billions of dollars, and a modest foundation contribution would be a tiny delta on top of that.  

Here is a list of the wealthiest charitable foundations (only a few of which support scientific research and/or education) and their endowments.  Nearly all of the science-related ones are also plugged in here.  A rough estimate of annual expenditures from endowed entities is about 5% of their holdings.  Recently I've come to think about private universities as one crude comparator for impact.  If a foundation has the same size endowment as a prestigious research university, I think it's worth thinking about the relative downstream impacts of those entities.  (Novo Nordisk Foundation has an endowment three times the size of Harvard's endowment.)  

Another comparator would be the annual research expenditures of a relevant funding agency.  The US NSF put forward $234M into major research instrumentation and facilities in FY2024.  A foundation with a $5B endowment could in principle support all of that from endowment returns.  This lets me make my semiregular pitch about foundational or corporate support for research infrastructure and user facilities around the US.  The entire annual budget for the NSF's NNCI, which supports shared nanofabrication and characterization facilities around the US, is about $16M.   That's a niche where comparatively modest foundation (or corporate) support could have serious impact for interdisciplinary research and education across the country.  I'm sure there are other similar areas out there, and I hope someone is thinking about this.  
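The arithmetic behind that claim is worth spelling out; a back-of-envelope sketch using the 5% payout rule of thumb mentioned above (the implied endowment size for NNCI is my own extrapolation):

```python
def annual_payout(endowment, payout_rate=0.05):
    """Rough yearly spending sustainable from an endowment."""
    return endowment * payout_rate

def endowment_needed(annual_budget, payout_rate=0.05):
    """Endowment required to sustain a given annual budget."""
    return annual_budget / payout_rate

print(annual_payout(5e9))        # ~$250M/yr, above the ~$234M NSF instrumentation line
print(endowment_needed(16e6))    # ~$320M endowment could sustain NNCI's ~$16M/yr
```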

Anyway, thanks to my readers - this is now the 20th year of this blog's existence (!!! again), and I hope to be able to keep it up well in the new year.



December 30, 2024

Tommaso DorigoWhy Measure The Top Quark Production Cross Section?

As part of my self-celebrations for XX years of blogging activities, I am reposting here (very) old articles I wrote over the years on topics ranging from Physics to Chess to anything in between. The post I am recycling today is one that describes for laymen a reason why it is interesting to continue going after the top quark, many years (10, at the time the article was written) after the discovery of that particle. The piece appeared on July 10, 2005 in my column at the Quantum Diaries blog (https://qd.typepad.com/6/2005/07/ok_so_i_promise.html).


December 29, 2024

John BaezProblems to Sharpen the Young

 

A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. If left unattended together, the wolf would eat the goat, or the goat would eat the cabbage. How can they cross the river without anything being eaten?

You probably know this puzzle. There are two efficient solutions, related by a symmetry that switches the wolf and the cabbage.
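If you would rather let a computer rediscover them, here is a small search sketch (mine, obviously not Alcuin's): a depth-bounded search over which items sit on the starting bank prints both seven-crossing solutions.

```python
ITEMS = ("wolf", "goat", "cabbage")
UNSAFE = ({"wolf", "goat"}, {"goat", "cabbage"})     # pairs that can't be left alone

def safe(bank_without_farmer):
    return not any(pair <= bank_without_farmer for pair in UNSAFE)

def search(left, side, path, solutions, max_moves=7):
    """Enumerate crossing sequences of at most max_moves; 7 is the known minimum."""
    if not left and side == "far":
        solutions.append(" -> ".join(path))
        return
    if len(path) == max_moves:
        return
    here = left if side == "start" else set(ITEMS) - left
    for cargo in [None, *sorted(here)]:
        new_left = set(left)
        if cargo is not None:
            if side == "start":
                new_left.remove(cargo)
            else:
                new_left.add(cargo)
        new_side = "far" if side == "start" else "start"
        unattended = new_left if new_side == "far" else set(ITEMS) - new_left
        if safe(unattended):
            search(new_left, new_side, path + [cargo or "cross alone"],
                   solutions, max_moves)

solutions = []
search(set(ITEMS), "start", [], solutions)
print(*solutions, sep="\n")   # the two solutions, with wolf and cabbage swapped
```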

But what you might not know is that this puzzle goes back to a book written around 800 AD, sometimes attributed to Charlemagne’s advisor Alcuin! Charlemagne brought this monk from York to help set up the educational system of his empire. But Alcuin had a great fondness for logic. Nobody is sure if he wrote this book — but it’s fascinating nonetheless.

It has either 53 or 56 logic puzzles, depending on the version. It’s called Propositiones ad Acuendos Juvenes, or Problems to Sharpen the Young. If the wolf, goat and cabbage problem is too easy for you, you might like this one:

Three men, each with a sister, must cross a river in a boat which can carry only two people, so that a woman whose brother is not present is never left in the company of another man.

There are also trick puzzles, like this:

A man has 300 pigs. He ordered all of them slaughtered in 3 days, but with an odd number killed each day. What number were to be killed each day?

Wikipedia says this was given to punish unruly students — presumably students who didn’t know that the sum of three odd numbers is always odd.
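For an unruly student with a computer, a brute-force check (mine, obviously not period-appropriate) confirms the parity argument:

```python
# Three odd numbers always sum to an odd number, so they can never total 300.
ways = [(a, b, 300 - a - b)
        for a in range(1, 300, 2)          # odd pigs on day 1
        for b in range(1, 300 - a, 2)      # odd pigs on day 2
        if (300 - a - b) % 2 == 1]         # day 3 must also be odd
print(len(ways))   # 0
```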

It’s fascinating to think that while some Franks were fighting Saxons and Lombards, students in more peaceful parts of the empire were solving these puzzles!

The book also has some of the first recorded packing problems, like this:

There is a triangular city which has one side of 100 feet, another side of 100 feet, and a third of 90 feet. Inside of this, I want to build rectangular houses in such a way that each house is 20 feet in length, 10 feet in width. How many houses can I fit in the city?

This is hard! There’s a nice paper about all the packing problems in this book:

• Nikolai Yu. Zolotykh, Alcuin’s Propositiones de Civitatibus: the earliest packing problems.

He shows two solutions to the above problem, both with 16 houses. In the solution shown here, there’s a tiny space between houses 7 and 15.


However, Alcuin — or whoever wrote the book — didn’t really solve the problem. They just used an approximate Egyptian formula for the area of a triangle in terms of its side lengths, and divided that by the area of the houses! This is consistent with the generally crappy knowledge of math in Charlemagne’s empire.
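A quick check of how far area-counting alone gets you (the Heron computation is mine; the 16-house packings are Zolotykh's):

```python
from math import sqrt

a, b, c = 100.0, 100.0, 90.0                    # the city's sides, in feet
s = (a + b + c) / 2
area = sqrt(s * (s - a) * (s - b) * (s - c))    # Heron's formula: ~4018.6 square feet

house_area = 20 * 10                            # one 20 ft x 10 ft house
print(area / house_area)                        # ~20.1, the naive "divide the areas" count
print(16)                                       # what an explicit packing actually achieves
```

So even with the exact area, dividing areas overestimates what actually fits, and an approximate area formula can be even further off.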

There’s an even harder packing problem in this book, which again isn’t solved correctly.

For more on this book of puzzles, check out this:

• Wikipedia, Propositiones ad Acuendos Juvenes.

You can get a translation into English here:

• John Hadley and David Singmaster,
Problems to Sharpen the Young, The Mathematical Gazette 76 (1992), 102–126.

Finally, here is a nice paper on the question of whether Alcuin wrote the book, and various bits of evidence for Alcuin’s interest in mathematics, numerology, and related fun activities:

• Michael N. Brennan, Alcuin, mathematics and the rational mind, in Insular Iconographies: Essays in Honor of Jane Hawkes, eds. Meg Boulton and Michael D. J. Bintley, Boydell Press, 2019, pp. 203–216.

It’s so fascinating that I’ll quote a bunch of it. It starts this way:

A medieval mathematical manuscript

The medieval cleric Alcuin (AD 735–804) is credited in surviving manuscripts with being the originator of a collection of fifty-three mathematical and logical puzzles, the Propositiones ad acuendos iuvenes (“Problems for Sharpening the Young”). There is no direct evidence to connect the collection with Alcuin either as compiler or creator, but even modern commentators continue to associate his name with the puzzles. There are at least fourteen extant or partial copies of the Propositiones, which date from the ninth to the fifteenth century, suggesting that the collection was in popular use at least from the time of Alcuin onwards. Michael Gorman confidently listed the Propositiones among ninety spurious prose works of Alcuin for two reasons: firstly, because Alcuin always attached a dedicatory letter to his works, and there is none here, and secondly, because the work falls, Gorman thinks, among those documents which Alcuin would have had neither the time nor the energy to write in the period AD 782-800. Alcuin himself admitted to having only “stolen hours” at night in which to write a life of Willibrord. Despite Gorman’s view that the work is pseudo-Alcuin, it is reasonable to ask if there is internal evidence in the puzzles which would support or oppose the assigning of the collection to Alcuin himself, committed as he was to the promotion of educational subjects, including mathematics, in the course of the Carolingian “renaissance”.

The majority of the problem types in the Propositiones are known from earlier Chinese, Indian, Egyptian, Byzantine, Greek, and Roman sources, whilst others appear in works by Boethius, Metrodorus, and Isidore of Seville. Among the puzzles, however, there are a small number of important types that are not as yet known from earlier sources. These include the so-called “river-crossing problems”, “strange family problems”, a transportation (or “desert-crossing”) problem, and a problem that relies on the summation of an arithmetical series. The case for the mathematical creativity of the author, if there was only one author, rests on these. One puzzle has outstripped all the others in having popular appeal down to the present day. This puzzle, passing presumably from later medieval reproductions of the Propositiones into oral tradition, concerns a farmer who needs to ferry three items across a river in a boat.

He then goes on to discuss the problem of the wolf, the goat and the cabbage — though in this paper, presumably more historically accurate, it’s a wolf, a goat and a bag of oats. Then he returns to the big question:

Did Alcuin compose the Propositiones?

In a letter to Charlemagne in AD 799, three years after Alcuin had moved from the Palace School to an abbacy at Tours, Alcuin wrote that he was sending, along with some examples of grammar and correct expression, quidquid figuras arithmeticas laetitiae causa (“certain arithmetical curiosities for [your] pleasure”). He added that he would write these on an empty sheet that Charlemagne sent to him and suggested that “our friend and helper Beselel will […] be able to look up the problems in an arithmetic book”. Beselel was Alcuin’s nickname for the technically skilled Einhard, later Charlemagne’s biographer. It would be convenient if the reference in Alcuin’s letter were to the Propositiones, but for many reasons it is almost certainly not. For one thing, fifty-three propositions and their solutions would not fit on the blank side of a folio, given that they occupy ten folio sides in most of the manuscripts in which they are currently found. Secondly, since Alcuin’s letter to Charlemagne dealt primarily with the importance of exactness of language, a grammarian and self-conscious Latinist like Alcuin would not describe the Propositiones as figurae arithmeticae, or refer to an “arithmetic” book in a case where the solutions required both geometry and logic methods. Thirdly, the idea of Einhard looking up the problems in an arithmetic book is an odd one, given that the Propositiones is usually found with answers, even if in most cases it does not show how these answers were arrived at.

Apart from the reasons given by Gorman for Alcuin not having been the author of the Propositiones — the effort and time involved and the absence of a dedication — there are deeper, internal reasons for concluding that Alcuin was not the author. The river-crossing puzzles, along with the “transportation problem”, and a puzzle about the number of birds on a 100-step ladder took a considerable amount of time and mathematical sophistication to compose. Accompanying them are two types of puzzles that appear repeatedly in the collection and that demand far less sophistication: area problems, often fanciful (such as problem 29: “how many rectangular houses will fit into a town which has a circular wall?”); and division problems where a “lord of the manor” wishes to divide grain among his household (problems 32-5, and others). Repetition of problems suggests that the Propositiones was intended for use as a practice book, but this supposed pedagogical purpose stumbles (even if it does not fall) on two counts. Firstly, the solutions to most of the mensuration problems, like the one involving a circular town, rely on late Roman methods for approximating the areas of common figures (circles, triangles, etc.) and these methods can be quite wrong. Secondly, the lack of worked solutions in the Propositiones deprived the student of the opportunity to learn the method required to solve another question of the same type. Solution methods might have been lost in transcription, but their almost total absence makes this unlikely. Mathematical methods (algebra in particular) that post-dated the Carolingian era would have been necessary to provide elegant and instructive solutions to many of the problems, and one cannot escape the suspicion that trial-and-error was the method used by whoever supplied answers (without worked solutions) to the Propositiones: a guess was made, refined, and then a new guess made. Such an approach is difficult to systemise, and even more difficult to describe in writing for the benefit of students. We are left with a mixture of “complex” mathematical problems, such as those cited earlier, and simpler questions whose answers are mostly not justified.

This lack of uniformity suggests that it was a compiler rather than a composer who produced the first edition of the Propositiones. If Alcuin was involved, he was no more than a medium through which the fifty-three puzzles were assembled. No one person was the author of the Propositiones problems, because no mathematically sophisticated person would have authored the weaker ones, and nobody who was not mathematically sophisticated could have authored the others. Furthermore, with the more sophisticated problems, there is a noticeable absence of the kind of refinement and repetition that is common in modern textbooks.

He then goes on to discuss Alcuin’s fascination with numerology, acrostics, and making up nicknames for his friends. It’s too bad we’ll never know for sure what, if anything, Alcuin had to do with this book of puzzles!

John BaezThe Parker Solar Probe

Today, December 24th 2024, the Parker Solar Probe got 7 times closer to the Sun than any spacecraft ever has, going faster than any spacecraft ever has—690,000 kilometers per hour. WHEEEEEE!!!!!!!

But the newspapers are barely talking about the really cool part: what it’s like down there. The Sun doesn’t have a surface like the Earth does, since it’s all just hot ionized gas, called ‘plasma‘. But the Sun has an ‘Alfvén surface’—and the probe has penetrated that.

What’s the Alfvén surface? In simple terms, it’s where the solar wind—the hot ionized gas emitted by the Sun—breaks free of the Sun and shoots out into space. But to understand how cool this is, we need to dig a bit deeper.

After all, how can we say where the solar wind “breaks free of the Sun”?

Hot gas shoots up from the Sun, faster and faster due to its pressure, even though it’s pulled down by gravity. At some point it goes faster than the speed of sound! This is the Alfvén surface. Above this surface, the solar wind becomes supersonic, so no disturbances in its flow can affect the Sun below.

It’s sort of like the reverse of a black hole! Light emitted from within the event horizon of a black hole can’t get out. Sound emitted from outside the Alfvén surface of the Sun can’t get in.

Or, it’s like the edge of a waterfall, where the water starts flowing so fast that waves can’t make it back upstream.

That’s pretty cool. But it’s even cooler than this, because ‘sound’ in the solar wind is very different from sound on Earth. Here we have air. The Sun has ions—atoms of gas so hot that electrons have been ripped off—interacting with powerful magnetic fields. You can visualize these fields as tight rubber bands, with the ions stuck to them. They vibrate back and forth together!

You could call these vibrations ‘sound’, but the technical term is Alfvén waves. Alfvén was the one who figured out how fast these waves move. Parker studied the surface where the solar wind’s speed exceeds the speed of the Alfvén waves.

And now we’ve gone deep below that surface!

This realm is a strange one, and the more we study it, the more complex it seems to get.

You’ve probably heard the joke that ends “consider a spherical cow”. Parker’s original model of the solar wind was spherically symmetric, so he imagined the solar wind shooting straight out of the Sun in all directions. In this model, the Alfvén surface is the sphere where the wind becomes faster than the Alfvén waves. There are some nice simple formulas for all this.

But in fact the Sun’s surface is roiling and dynamic, with sunspots making solar flares, and all sorts of bizarre structures made of plasma and magnetic fields, like spicules, ‘coronal streamers’ and ‘pseudostreamers’… aargh, too complicated for me to understand. This is an entire branch of science!

So, the Alfvén surface is not a mere sphere: it’s frothy and randomly changing. The Parker Solar Probe will help us learn how it works—along with many other things.

Finally, here’s something mindblowing. There’s a red dwarf star 41 light years away from us, called TRAPPIST-1, which may have six planets beneath its Alfvén surface! This means these planets can create Alfvén waves in the star’s atmosphere. Truly the music of the spheres!

For more, check out these articles:

• Wikipedia, Alfvén wave.

• Wikipedia, Alfvén surface.

and this open-access article:

• Steven R. Cranmer, Rohit Chhiber, Chris R. Gilly, Iver H. Cairns, Robin C. Colaninno, David J. McComas, Nour E. Raouafi, Arcadi V. Usmanov, Sarah E. Gibson and Craig E. DeForest, The Sun’s Alfvén surface: recent insights and prospects for the Polarimeter to Unify the Corona and Heliosphere (PUNCH), Solar Physics 298 (2023).

A quote:

Combined with recent perihelia of Parker Solar Probe, these studies seem to indicate that the Alfvén surface spends most of its time at heliocentric distances between about 10 and 20 solar radii. It is becoming apparent that this region of the heliosphere is sufficiently turbulent that there often exist multiple (stochastic and time-dependent) crossings of the Alfvén surface along any radial ray.

December 27, 2024

Matt von HippelNewtonmas and the Gift of a Physics Background

This week, people all over the world celebrated the birth of someone whose universally attractive ideas spread around the globe. I’m talking, of course, about Isaac Newton.

For Newtonmas this year, I’ve been pondering another aspect of Newton’s life. There’s a story you might have heard that physicists can do basically anything, with many people going from a career in physics to a job in a variety of other industries. It’s something I’ve been trying to make happen for myself. In a sense, this story goes back to the very beginning, when Newton quit his academic job to work at the Royal Mint.

On the surface, there are a lot of parallels. At the Mint, a big part of Newton’s job was to combat counterfeiting and “clipping”, where people would carve small bits of silver off of coins. This is absolutely a type of job ex-physicists do today, at least in broad strokes. Working as Data Scientists for financial institutions, people look for patterns in transactions that give evidence of fraud.

Digging deeper, though, the analogy falls apart a bit. Newton didn’t apply any cunning statistical techniques to hunt down counterfeiters. Instead, the stories that get told about his work there are basically detective stories. He hung out in bars to catch counterfeiter gossip and interviewed counterfeiters in prison, not exactly the kind of thing you’d hire a physicist to do these days. The rest of the role was administrative: setting up new mint locations and getting people to work overtime to replace the country’s currency. Newton’s role at the mint was less like an ex-physicist going into Data Science and more like Steven Chu as Secretary of Energy: someone with a prestigious academic career appointed to a prestigious government role.

If you’re looking for a patron saint of physicists who went to industry, Newton’s contemporary Robert Hooke may be a better bet. Unlike many other scientists of the era, Hooke wasn’t independently wealthy, and for a while he was kept quite busy working for the Royal Society. But a bit later he had another, larger source of income: working as a surveyor and architect, where he designed several of London’s iconic buildings. While Newton’s work at the Mint drew on his experience as a person of power and influence, working as an architect drew much more on skills directly linked to Hooke’s work as a scientist: understanding the interplay of forces in quantitative detail.

While Newton and Hooke’s time was an era of polymaths, in some sense the breadth of skills imparted by a physics education has grown. Physicists learn statistics (which barely existed in Newton’s time), programming (which did not exist at all), and a wider range of mathematical and physical models. Having a physics background isn’t the ideal way to go into industry (that would be having an industry background). But for those of us making the jump, it’s still a Newtonmas gift to be grateful for.

Jordan EllenbergNotes towards a logic puzzle

You arrive at a gate with two guards. One guard likes big butts and he cannot lie. The other guard hates big butts and he cannot tell the truth.

December 26, 2024

Tommaso DorigoThe Buried Lottery

As part of my self-celebrations for having survived 20 years of blogging (the anniversary was a few days ago, see my previous post), I am re-posting a few representative, old articles I wrote in my column over the years. The selection will not be representative of the material I covered over all this time - that would be too tall an order. Rather, I will hand-pick a few pieces just to make a point or two about their content. 


December 25, 2024

Terence TaoQuaternions and spherical trigonometry

Hamilton’s quaternion number system {\mathbb{H}} is a non-commutative extension of the complex numbers, consisting of numbers of the form {t + xi + yj + zk} where {t,x,y,z} are real numbers, and {i,j,k} are anti-commuting square roots of {-1} with {ij=k}, {jk=i}, {ki=j}. While they are non-commutative, they do keep many other properties of the complex numbers:

  • Being non-commutative, the quaternions do not form a field. However, they are still a skew field (or division ring): multiplication is associative, and every non-zero quaternion has a unique multiplicative inverse.
  • Like the complex numbers, the quaternions have a conjugation

    \displaystyle  \overline{t+xi+yj+zk} := t-xi-yj-zk,

    although this is now an antihomomorphism rather than a homomorphism: {\overline{qr} = \overline{r}\ \overline{q}}. One can then split up a quaternion {t + xi + yj + zk} into its real part {t} and imaginary part {xi+yj+zk} by the familiar formulae

    \displaystyle  \mathrm{Re} q := \frac{q + \overline{q}}{2}; \quad \mathrm{Im} q := \frac{q - \overline{q}}{2}

    (though we now leave the imaginary part purely imaginary, as opposed to dividing by {i} in the complex case).
  • The inner product

    \displaystyle  \langle q, r \rangle := \mathrm{Re} q \overline{r}

    is symmetric and positive definite (with {1,i,j,k} forming an orthonormal basis). Also, for any {q}, {q \overline{q}} is real, hence equal to {\langle q, q \rangle}. Thus we have a norm

    \displaystyle  |q| = \sqrt{q\overline{q}} = \sqrt{\langle q,q \rangle} = \sqrt{t^2 + x^2 + y^2 + z^2}.

    Since the real numbers commute with all quaternions, we have the multiplicative property {|qr| = |q| |r|}. In particular, the unit quaternions {U(1,\mathbb{H}) := \{ q \in \mathbb{H}: |q|=1\}} (also known as {SU(2)}, {Sp(1)}, or {Spin(3)}) form a compact group.
  • We have the cyclic trace property

    \displaystyle  \mathrm{Re}(qr) = \mathrm{Re}(rq)

    which allows one to take adjoints of left and right multiplication:

    \displaystyle  \langle qr, s \rangle = \langle q, s\overline{r}\rangle; \quad \langle rq, s \rangle = \langle q, \overline{r}s \rangle

  • As {i,j,k} are square roots of {-1}, we have the usual Euler formulae

    \displaystyle  e^{i\theta} = \cos \theta + i \sin \theta, e^{j\theta} = \cos \theta + j \sin \theta, e^{k\theta} = \cos \theta + k \sin \theta

    for real {\theta}, together with other familiar formulae such as {\overline{e^{i\theta}} = e^{-i\theta}}, {e^{i(\alpha+\beta)} = e^{i\alpha} e^{i\beta}}, {|e^{i\theta}| = 1}, etc.
We will use these sorts of algebraic manipulations in the sequel without further comment.

The unit quaternions {U(1,\mathbb{H}) = \{ q \in \mathbb{H}: |q|=1\}} act on the imaginary quaternions {\{ xi + yj + zk: x,y,z \in {\bf R}\} \equiv {\bf R}^3} by conjugation:

\displaystyle  v \mapsto q v \overline{q}.

This action is by orientation-preserving isometries, hence by rotations. It is not quite faithful, since conjugation by the unit quaternion {-1} is the identity, but one can show that this is the only loss of faithfulness, reflecting the well known fact that {U(1,\mathbb{H}) \equiv SU(2)} is a double cover of {SO(3)}.
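As a quick numerical sanity check (my own, not part of the argument that follows), one can verify that this conjugation action really is a rotation of the imaginary quaternions, i.e., an orthogonal map with determinant +1:

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as (t, x, y, z)."""
    t1, x1, y1, z1 = p
    t2, x2, y2, z2 = q
    return np.array([t1*t2 - x1*x2 - y1*y2 - z1*z2,
                     t1*x2 + x1*t2 + y1*z2 - z1*y2,
                     t1*y2 - x1*z2 + y1*t2 + z1*x2,
                     t1*z2 + x1*y2 - y1*x2 + z1*t2])

def conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

rng = np.random.default_rng(0)
q = rng.normal(size=4)
q /= np.linalg.norm(q)                       # a random unit quaternion

basis = np.eye(4)[1:]                        # the imaginary units i, j, k
R = np.array([qmul(qmul(q, v), conj(q))[1:] for v in basis]).T

print(np.allclose(R.T @ R, np.eye(3)))       # True: lengths and angles preserved
print(np.isclose(np.linalg.det(R), 1.0))     # True: orientation preserved, so a rotation
```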

For instance, for any real {\theta}, conjugation by {e^{i\theta/2} = \cos(\theta/2) + i \sin(\theta/2)} is a rotation by {\theta} around {i}:

\displaystyle  e^{i\theta/2} i e^{-i\theta/2} = i \ \ \ \ \ (1)

\displaystyle  e^{i\theta/2} j e^{-i\theta/2} = \cos(\theta) j - \sin(\theta) k \ \ \ \ \ (2)

\displaystyle  e^{i\theta/2} k e^{-i\theta/2} = \cos(\theta) k + \sin(\theta) j. \ \ \ \ \ (3)

Similarly for cyclic permutations of {i,j,k}. The doubling of the angle here can be explained from the Lie algebra fact that {[i,j]=ij-ji} is {2k} rather than {k}; it is also closely related to the aforementioned double cover. We also of course have {U(1,\mathbb{H})\equiv Spin(3)} acting on {\mathbb{H}} by left multiplication; this is known as the spinor representation, but will not be utilized much in this post. (Giving {\mathbb{H}} the right action of {{\bf C}} makes it a copy of {{\bf C}^2}, and the spinor representation then also becomes the standard representation of {SU(2)} on {{\bf C}^2}.)

Given how quaternions relate to three-dimensional rotations, it is not surprising that they can also be used to recover the basic laws of spherical trigonometry – the study of spherical triangles on the unit sphere. This is fairly well known, but it took a little effort for me to locate the required arguments, so I am recording the calculations here.

The first observation is that every unit quaternion {q} induces a unit tangent vector {qj\overline{q}} on the unit sphere {S^2 \subset {\bf R}^3}, located at {qi\overline{q} \in S^2}; the third unit vector {qk\overline{q}} is then another tangent vector orthogonal to the first two (and oriented to the left of the original tangent vector), and can be viewed as the cross product of {qi\overline{q} \in S^2} and {qj\overline{q} \in S^2}. Right multiplication of this quaternion then corresponds to various natural operations on this unit tangent vector:

  • Right multiplying {q} by {e^{i\theta/2}} does not affect the location {qi\overline{q}} of the tangent vector, but rotates the tangent vector {qj\overline{q}} anticlockwise by {\theta} in the direction of the orthogonal tangent vector {qk\overline{q}}, as it replaces {qj\overline{q}} by {\cos(\theta) qj\overline{q} + \sin(\theta) qk\overline{q}}.
  • Right multiplying {q} by {e^{k\theta/2}} advances the tangent vector by geodesic flow by angle {\theta}, as it replaces {qi\overline{q}} by {\cos(\theta) qi\overline{q} + \sin(\theta) qj\overline{q}}, and replaces {qj\overline{q}} by {\cos(\theta) qj\overline{q} - \sin(\theta) qi\overline{q}}.

Now suppose one has a spherical triangle with vertices {A,B,C}, with the spherical arcs {AB, BC, CA} subtending angles {c, a, b} respectively, and the vertices {A,B,C} subtending angles {\alpha,\beta,\gamma} respectively; suppose also that {ABC} is oriented in an anti-clockwise direction for sake of discussion. Observe that if one starts at {A} with a tangent vector oriented towards {B}, advances that vector by {c}, and then rotates by {\pi - \beta}, the tangent vector is now at {B} and pointing towards {C}. If one advances by {a} and rotates by {\pi - \gamma}, one is now at {C} pointing towards {A}; and if one then advances by {b} and rotates by {\pi - \alpha}, one is back at {A} pointing towards {B}. This gives the fundamental relation

\displaystyle  e^{kc/2} e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2} e^{kb/2} e^{i(\pi-\alpha)/2} = 1 \ \ \ \ \ (4)

relating the three sides and three angles of this triangle. (A priori, due to the lack of faithfulness of the {U(1,\mathbb{H})} action, the right-hand side could conceivably have been {-1} rather than {1}; but for extremely small triangles the right-hand side is clearly {1}, and so by continuity it must be {1} for all triangles.) Indeed, a moment's thought will reveal that the condition (4) is necessary and sufficient for the data {a,b,c,\alpha,\beta,\gamma} to be associated with a spherical triangle. Thus one can view (4) as a “master equation” for spherical trigonometry: in principle, it can be used to derive all the other laws of this subject.

Remark 1 The law (4) has an evident symmetry {(a,b,c,\alpha,\beta,\gamma) \mapsto (\pi-\alpha,\pi-\beta,\pi-\gamma,\pi-a,\pi-b,\pi-c)}, which corresponds to the operation of replacing a spherical triangle with its dual triangle. Also, there is nothing particularly special about the choice of imaginaries {i,k} in (4); one can conjugate (4) by various quaternions and replace {i,k} here by any other orthogonal pair of unit quaternions.

Remark 2 If we work in the small scale regime, replacing {a,b,c} by {\varepsilon a, \varepsilon b, \varepsilon c} for some small {\varepsilon>0}, then we expect spherical triangles to behave like Euclidean triangles. Indeed, (4) to zeroth order becomes

\displaystyle  e^{i(\pi-\beta)/2} e^{i(\pi-\gamma)/2} e^{i(\pi-\alpha)/2} = 1

which reflects the classical fact that the sum of angles of a Euclidean triangle is equal to {\pi}. To first order, one obtains

\displaystyle  c + a e^{i(\pi-\gamma)/2} e^{i(\pi-\alpha)/2} + b e^{i(\pi-\alpha)/2} = 0

which reflects the evident fact that the vector sum of the sides of a Euclidean triangle sum to zero. (Geometrically, this correspondence reflects the fact that the action of the (projective) quaternion group on the unit sphere converges to the action of the special Euclidean group {SE(2)} on the plane, in a suitable asymptotic limit.)

The identity (4) is an identity of two unit quaternions; as the unit quaternion group {U(1,\mathbb{H})} is three-dimensional, this thus imposes three independent constraints on the six real parameters {a,b,c,\alpha,\beta,\gamma} of the spherical triangle. One can manipulate this constraint in various ways to obtain various trigonometric identities involving some subsets of these six parameters. For instance, one can rearrange (4) to get

\displaystyle  e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2} = e^{-kc/2} e^{-i(\pi-\alpha)/2} e^{-kb/2}. \ \ \ \ \ (5)

Conjugating by {i} to reverse the sign of {k}, we also have

\displaystyle  e^{i(\pi-\beta)/2} e^{-ka/2} e^{i(\pi-\gamma)/2} = e^{kc/2} e^{-i(\pi-\alpha)/2} e^{kb/2}.

Taking the inner product of both sides of these identities, we conclude that

\displaystyle  \langle e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2}, e^{i(\pi-\beta)/2} e^{-ka/2} e^{i(\pi-\gamma)/2} \rangle

is equal to

\displaystyle  \langle e^{-kc/2} e^{-i(\pi-\alpha)/2} e^{-kb/2}, e^{kc/2} e^{-i(\pi-\alpha)/2} e^{kb/2} \rangle.

Using the various properties of inner product, the former expression simplifies to {\mathrm{Re} e^{ka} = \cos a}, while the latter simplifies to

\displaystyle  \mathrm{Re} \langle e^{-i(\pi-\alpha)/2} e^{-kb} e^{i(\pi-\alpha)/2}, e^{kc} \rangle.

We can write {e^{kc} = \cos c + (\sin c) k} and

\displaystyle  e^{-i(\pi-\alpha)/2} e^{-kb} e^{i(\pi-\alpha)/2} = \cos b - (\sin b) (\cos(\pi-\alpha) k + \sin(\pi-\alpha) j)

so on substituting and simplifying we obtain

\displaystyle  \cos b \cos c + \sin b \sin c \cos \alpha = \cos a

which is the spherical cosine rule. Note in the infinitesimal limit (replacing {a,b,c} by {\varepsilon a, \varepsilon b, \varepsilon c}) this rule becomes the familiar Euclidean cosine rule

\displaystyle  a^2 = b^2 + c^2 - 2bc \cos \alpha.

In a similar fashion, from (5) we see that the quantity

\displaystyle  \langle e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2} i e^{-i(\pi-\gamma)/2} e^{-ka/2} e^{-i(\pi-\beta)/2}, k \rangle

is equal to

\displaystyle  \langle e^{-kc/2} e^{-i(\pi-\alpha)/2} e^{-kb/2} i e^{kb/2} e^{i(\pi-\alpha)/2} e^{kc/2}, k \rangle.

The first expression simplifies by (1) and properties of the inner product to

\displaystyle  \langle e^{ka/2} i e^{-ka/2}, e^{-i(\pi-\beta)/2} k e^{i(\pi-\beta)/2} \rangle,

which by (2), (3) simplifies further to {-\sin a \sin \beta}. Similarly, the second expression simplifies to

\displaystyle  \langle e^{-kb/2} i e^{kb/2} , e^{i(\pi-\alpha)/2} k e^{-i(\pi-\alpha)/2}\rangle,

which by (2), (3) simplifies to {-\sin b \sin \alpha}. Equating the two and rearranging, we obtain

\displaystyle  \frac{\sin \alpha}{\sin a} = \frac{\sin \beta}{\sin b}

which is the spherical sine rule. Again, in the infinitesimal limit we obtain the familiar Euclidean sine rule

\displaystyle  \frac{\sin \alpha}{a} = \frac{\sin \beta}{b}.
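Both rules are easy to spot-check numerically without any quaternion bookkeeping (a sketch of my own): draw three random points on the unit sphere, measure the sides and vertex angles directly, and test the identities.

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (v / np.linalg.norm(v) for v in rng.normal(size=(3, 3)))   # random points on the sphere

def arc(P, Q):
    """Spherical distance between two unit vectors."""
    return np.arccos(np.clip(P @ Q, -1.0, 1.0))

def vertex_angle(P, Q, R):
    """Angle at P between the great-circle arcs PQ and PR."""
    tQ = Q - (P @ Q) * P
    tR = R - (P @ R) * P
    cosang = tQ @ tR / (np.linalg.norm(tQ) * np.linalg.norm(tR))
    return np.arccos(np.clip(cosang, -1.0, 1.0))

a, b, c = arc(B, C), arc(C, A), arc(A, B)                 # sides opposite A, B, C
alpha, beta = vertex_angle(A, B, C), vertex_angle(B, C, A)

# spherical cosine rule
print(np.isclose(np.cos(a), np.cos(b)*np.cos(c) + np.sin(b)*np.sin(c)*np.cos(alpha)))
# spherical sine rule
print(np.isclose(np.sin(alpha)/np.sin(a), np.sin(beta)/np.sin(b)))
```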

As a variant of the above analysis, we have from (5) again that

\displaystyle  \langle e^{i(\pi-\beta)/2} e^{ka/2} e^{i(\pi-\gamma)/2} i e^{-i(\pi-\gamma)/2} e^{-ka/2} e^{-i(\pi-\beta)/2}, j \rangle

is equal to

\displaystyle  \langle e^{-kc/2} e^{-i(\pi-\alpha)/2} e^{-kb/2} i e^{kb/2} e^{i(\pi-\alpha)/2} e^{kc/2}, j \rangle.

As before, the first expression simplifies to

\displaystyle  \langle e^{ka/2} i e^{-ka/2}, e^{-i(\pi-\beta)/2} j e^{i(\pi-\beta)/2} \rangle

which equals {\sin a \cos \beta}. Meanwhile, the second expression can be rearranged as

\displaystyle  \langle e^{-i(\pi-\alpha)/2} e^{-kb/2} i e^{kb/2} e^{i(\pi-\alpha)/2}, e^{kc/2} j e^{-kc/2} \rangle.

By (2), (3) we can simplify to

\displaystyle  e^{-i(\pi-\alpha)/2} e^{-kb/2} i e^{kb/2} e^{i(\pi-\alpha)/2}

\displaystyle= (\cos b) i - (\sin b) \cos(\pi-\alpha) j + (\sin b) \sin(\pi-\alpha) k

and so the inner product is {\cos b \sin c - \sin b \cos c \cos \alpha}, leading to the “five part rule”

\displaystyle  \cos b \sin c - \sin b \cos c \cos \alpha = \sin a \cos \beta.

In the case of a right-angled triangle {\beta=\pi/2}, this simplifies to one of Napier’s rules

\displaystyle  \cos \alpha = \frac{\tan c}{\tan b}, \ \ \ \ \ (6)

which in the infinitesimal limit is the familiar {\cos \alpha = \frac{c}{b}}. The other rules of Napier can be derived in a similar fashion.

Example 3 One application of Napier’s rule (6) is to determine the sunrise equation for when the sun rises and sets at a given location on the Earth, and a given time of year. For sake of argument let us work in summer, in which the declination {\delta} of the Sun is positive (due to axial tilt, it reaches a maximum of {23.5^\circ} at the summer solstice). Then the Sun subtends an angle of {\pi/2-\delta} from the pole star (Polaris in the northern hemisphere, Sigma Octantis in the southern hemisphere), and appears to rotate around that pole star once every {24} hours. On the other hand, if one is at a latitude {\phi}, then the pole star is at an elevation of {\phi} above the horizon. At extremely high latitudes {\phi > \pi/2-\delta}, the sun will never set (a phenomenon known as “midnight sun”); but in all other cases, at sunrise or sunset, the sun, pole star, and horizon point below the pole star will form a right-angled spherical triangle, with hypotenuse subtending an angle {\pi/2-\delta} and vertical side subtending an angle {\phi}. The angle subtended by the pole star in this triangle is {\pi-\omega}, where {\omega} is the solar hour angle – the angle by which the sun deviates from its noon position. Equation (6) then gives the sunrise equation

\displaystyle  \cos(\pi-\omega) = \frac{\tan \phi}{\tan(\pi/2-\delta)}

or equivalently

\displaystyle  \cos \omega = - \tan \phi \tan \delta.

A similar rule determines the time of sunset. In particular, the number of daylight hours in summer (assuming one is not in the midnight sun scenario {\phi > \pi/2 -\delta}) is given by

\displaystyle  24 - \frac{24}{\pi} \mathrm{arccos}(\tan \phi \tan \delta).

The situation in winter is similar, except that {\delta} is now negative, and polar night (no sunrise) occurs when {\phi > \pi/2+\delta}.
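Plugging numbers into the sunrise equation (the latitudes below are my own examples, chosen only to exercise the formula):

```python
from math import radians, tan, acos, pi

def daylight_hours(latitude_deg, declination_deg):
    x = -tan(radians(latitude_deg)) * tan(radians(declination_deg))   # cos(omega)
    if x <= -1.0:
        return 24.0            # midnight sun: the sun never sets
    if x >= 1.0:
        return 0.0             # polar night: the sun never rises
    omega = acos(x)            # solar hour angle at sunset, in radians
    return (24 / pi) * omega   # hours between sunrise and sunset

print(daylight_hours(45.0, 23.5))    # ~15.4 h at 45 N on the June solstice
print(daylight_hours(45.0, -23.5))   # ~8.6 h at the December solstice
print(daylight_hours(75.0, 23.5))    # 24.0: the midnight-sun case
```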

December 23, 2024

Tommaso DorigoTwenty Years Blogging

Twenty years ago today I got access for the first time to the interface that allowed me to publish blog posts for the Quantum Diaries web site, a science outreach endeavor that involved some 12 (then 15, then 25 or so IIRC) researchers around the world. A week before I had been contacted by the Fermilab outreach team, who were setting the thing up, and at that time I did not even know what a blog was!


Matt Strassler The Standard Model More Deeply: The Magic Angle Nailed Down

In a previous post, I showed you that the Standard Model, armed with its special angle θw of approximately 30 degrees, does a pretty good job of predicting a whole host of processes in the Standard Model. I focused attention on the decays of the Z boson, but there were many more processes mentioned in the bonus section of that post.

But the predictions aren’t perfect. They’re not enough to convince a scientist that the Standard Model might be the whole story. So today let’s bring these predictions into better focus.

There are two major issues that we have to correct in order to make more precise predictions using the Standard Model:

  • In contrast to what I assumed in the last post, θw isn’t exactly 30 degrees (i.e. sin θw isn’t 1/2)
  • Although I ignored them so far, the strong nuclear force makes small but important effects

But before we deal with these, we have to fix something with the experimental measurements themselves.

Knowledge and Uncertainty: At the Center of Science

No one complained — but everyone should have — that when I presented the experimental results in my previous post, I expressed them without the corresponding uncertainties. I did that to keep things simple. But it wasn’t professional. As every well-trained scientist knows, when you are comparing an experimental result to a theoretical prediction, the uncertainties, both experimental and theoretical, are absolutely essential in deciding whether your prediction works or not. So we have to discuss this glaring omission.

Here’s how to read typical experimental uncertainties (see Figure 1). Suppose a particle physicist says that a quantity is measured to be x ± y — for instance, that the top quark mass is measured to be 172.57± 0.29 GeV/c2. Usually (unless explicitly noted) that means that the true value has a 68% chance of lying between x-y and x+y — “within one standard deviation” — and a 95% chance of lying between x-2y and x+2y — “within two standard deviations.” (See Figure 1, where x and y are called  \mu and  \sigma .) The chance of the true value being more than two standard deviations away from x is about 5% — about 1/20. That’s not rare! It will happen several times if you make a hundred different measurements.

Figure 1: Experimental uncertainties corresponding to  \mu \pm \sigma , where  \mu is the “central value” and “ \sigma ” is a “standard deviation.”

But the chance of being more than three standard deviations away from x is a small fraction of a percent — as long as the cause is purely a statistical fluke — and that is indeed rare. (That said, one has to remember that big differences between prediction and measurement can also be due to an unforeseen measurement problem or feature. That won’t be an issue today.)
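For the record, those percentages come straight from the Gaussian bell curve of Figure 1; a quick check (mine, not from the post):

```python
from math import erf, sqrt

# Probability that a Gaussian quantity lands within n standard deviations of its center.
for n in (1, 2, 3):
    print(n, round(erf(n / sqrt(2)), 4))   # 0.6827, 0.9545, 0.9973
```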

W Boson Decays, More Precisely

Let’s first look at W decays, where we don’t have the complication of θw, and see what happens when we account for the effect of the strong nuclear force and the impact of experimental uncertainties.

The strong nuclear force slightly increases the rate for the W boson to decay to any quark/anti-quark pair, by about 3%. This is due to the same effect discussed in the “Understanding the Remaining Discrepancy” and “Strength of a Force” sections of this post… though the effect here is a little smaller (as it decreases at shorter distances and higher energies.) This slightly increases the percentages for quarks and, to compensate, slightly reduces the percentages for the electron, muon and tau (the “leptons”).
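To see roughly where that few-percent shift comes from, here is a tree-level counting sketch (mine, not Strassler's actual calculation; the value of the strong coupling is an assumed round number): the W has three lepton channels and six quark channels (two open quark doublets times three colors), all equal before the correction.

```python
from math import pi

alpha_s = 0.12                      # assumed rough strong coupling at the W mass
qcd = 1 + alpha_s / pi              # leading strong-force enhancement of quark channels

lepton_channels = 3                 # e nu, mu nu, tau nu
quark_channels = 2 * 3              # two open quark doublets, times three colors
total = lepton_channels + quark_channels * qcd

print(f"each lepton channel: {1 / total:.1%}")                       # ~10.8%, down from 1/9 = 11.1%
print(f"all quark channels:  {quark_channels * qcd / total:.1%}")    # ~67.5%, up from 6/9 = 66.7%
```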

In Figure 2 are shown predictions of the Standard Model for the probabilities of the W- boson’s various decays:

  • At left are the predictions made in the previous post.
  • At center are better predictions that account for the strong nuclear force.

(To do this properly, uncertainties on these predictions should also be provided. But I don’t think that doing so would add anything to this post, other than complications.) These predictions are then compared with the experimental measurements of several quantities, shown at right: certain combinations of these decays that are a little easier to measure are also shown. (The measurements and uncertainties are published by the Particle Data Group here.)

Figure 2: The decay probabilities for W bosons, showing the percentage of W bosons that decay to certain particles. Predictions are given both before (left) and after (center) accounting for effects of the strong nuclear force. Experimental results are given at right, showing all measurements that can be directly performed.

The predictions and measurements do not perfectly agree. But that’s fine; because of the uncertainties in the measurements, they shouldn’t perfectly agree! All of the differences are less than two standard deviations, except for the probability for decay of a W to a tau and its anti-neutrino. That deviation is less than three standard deviations — and as I noted, if you have enough measurements, you’ll occasionally get one that differs by more than two standard deviations. We still might wonder if something funny is up with the tau, but we don’t have enough evidence of that yet. Let’s see what the Z boson teaches us later.

In any case, to a physicist’s eye, there is no sign here of any notable disagreement between theory and experiment in these results. Within current uncertainties, the Standard Model correctly predicts the data.

Z Boson Decays, More Precisely

Now let’s do the same for the Z boson, but here we have three steps:

  • first, the predictions when we take sin θw = 1/2, as we did in the previous post;
  • second, the predictions when we take sin θw = 0.48;
  • third, the better predictions when we also include the effect of the strong nuclear force.

And again Figure 3 compares predictions with the data.

Figure 3: The decay probabilities for Z bosons, showing the percentage of Z bosons that decay to certain particles. Predictions are given (left to right) for sin θw = 0.5, for sin θw =0.48, and again sin θw = 0.48 with the effect of strong nuclear force accounted for. Experimental results are given at right, showing all measurements that can be directly performed.
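To see where numbers in this ballpark come from, here is a textbook tree-level sketch (my own, with an assumed strong coupling; it is not the precise calculation behind Figure 3): each fermion species contributes to the Z's decays in proportion to N_colors x (gV^2 + gA^2), with gA = T3 and gV = T3 - 2 Q sin^2 θw, and the quark channels get the same few-percent strong-force boost as before.

```python
from math import pi

def z_branching(sin_thetaw, alpha_s=0.0):
    s2 = sin_thetaw ** 2
    qcd = 1 + alpha_s / pi
    # (name, colors, T3, charge Q, number of open flavors, strong-force factor)
    species = [("neutrinos",        1,  0.5,  0.0, 3, 1.0),
               ("charged leptons",  1, -0.5, -1.0, 3, 1.0),
               ("up-type quarks",   3,  0.5,  2/3, 2, qcd),
               ("down-type quarks", 3, -0.5, -1/3, 3, qcd)]
    widths = {}
    for name, nc, t3, q, nf, factor in species:
        gv, ga = t3 - 2 * q * s2, t3
        widths[name] = nf * nc * factor * (gv ** 2 + ga ** 2)
    total = sum(widths.values())
    return {name: w / total for name, w in widths.items()}

# Mirror the three columns: sin = 0.5, sin = 0.48, and sin = 0.48 plus the strong force.
for s, a in [(0.50, 0.0), (0.48, 0.0), (0.48, 0.12)]:
    print(s, a, {k: f"{v:.1%}" for k, v in z_branching(s, alpha_s=a).items()})
# divide the "charged leptons" entry by three for the per-lepton probability (~3.4%)
```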

You notice that some of the experimental measurements have extremely small uncertainties! This is especially true of the decays to electrons, to muons, to taus, and (collectively) to the three types of neutrinos. Let’s look at them closely.

If you look at the predictions with sin θw = 1/2 for the electrons, muons and taus, they are in disagreement with the measurements by a lot. For example, in Z decay to muons, the initial prediction differs from the data by 19 standard deviations!! Not even close. For sin θw = 0.48 but without accounting for the strong nuclear force, the disagreement drops to 11 standard deviations; still terrible. But once we account also for the strong nuclear force, the predictions agree with data to within 1 to 2 standard deviations for all three types of particles.

As for the decays to neutrinos, the three predictions differ from the data by 16 standard deviations, 9 standard deviations, and… less than 2 standard deviations.

My reaction, when this data came in in the 1990s, was “Wow.” I hope yours is similar. Such close matching of the Standard Model’s predictions with highly precise measurements is a truly stunning success.

Notice that the successful prediction requires three of the Standard Model’s forces: the mixture of the electromagnetic and weak nuclear forces given by the magic angle, with a small effect from the strong nuclear force. Said another way, all of the Standard Model’s particles except the Higgs boson and top quark play a role in Figs. 2 and 3. (The Higgs field, meanwhile, is secretly in the background, giving the W and Z bosons their masses and affecting the Z boson’s interactions with the other particles; and the top quark is hiding in the background too, since it can’t be removed without changing how the Z boson interacts with bottom quarks.) You can’t take any part of the Standard Model out without messing up these predictions completely.

Oh, and by the way, remember how the probability for W decay to a tau and a neutrino in Fig. 2 was off the prediction by more than two standard deviations? Well, there’s nothing weird about the tau or the neutrinos in Fig. 3 — predictions and measurements agree just fine — and indeed, no numbers in Z decay differ from predictions by more than two standard deviations. As I said earlier, the expectation is that about one in every twenty measurements should differ from its true value by more than two standard deviations. Since we have over a dozen measurements in Figs. 2 and 3, it’s no surprise that one of them might be more than two standard deviations off… and so we can’t use that single disagreement as evidence that the Standard Model doesn’t work.

Asymmetries, Precisely

Let’s do one more case: one of the asymmetries that I mentioned in the bonus section of the previous post. Consider a forward-backward asymmetry shown in Fig. 4. Take all collisions in which an electron strikes a positron (the anti-particle of an electron) and turns into a muon and an anti-muon. Now compare the probability that the muon goes “forward” (roughly the direction that the electron is heading) to the probability that it goes “backward” (roughly the direction that the positron is heading). If the two probabilities are equal, then the asymmetry is zero; if the muon always goes forward, the asymmetry is 100%; if it always goes backward, the asymmetry is -100%.

Figure 4: In electron-positron collisions that make a muon/anti-muon pair, the forward-backward asymmetry compares the rate for “forward” production (where the muon travels roughly in the same direction as the electron) to “backward” production.
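In terms of event counts, this asymmetry is just (forward - backward)/(forward + backward). A minimal sketch, with invented counts purely to illustrate the definition:

```python
def forward_backward_asymmetry(n_forward, n_backward):
    """A_FB = (F - B) / (F + B), expressed as a percentage."""
    return 100.0 * (n_forward - n_backward) / (n_forward + n_backward)

# Invented event counts, only to illustrate the three limiting cases described above:
print(forward_backward_asymmetry(50_900, 49_100))   # 1.8 (%): a slight forward excess
print(forward_backward_asymmetry(1, 0))             # 100.0: muon always forward
print(forward_backward_asymmetry(0, 1))             # -100.0: muon always backward
```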

Asymmetries are special because the effect of the strong nuclear force cancels out of them completely, and so they only depend on sin θw. And this particular “leptonic forward-backward” asymmetry is an example with a special feature: if sin θw were exactly 1/2, this asymmetry for lepton production would be predicted to be exactly zero.

But the measured value of this asymmetry, while quite small (less than 2%), is definitely not zero, and so this is another confirmation that sin θw is not exactly 1/2. So let’s instead compare the prediction for this asymmetry using sin θw = 0.48, the choice that worked so well for the Z boson’s decays in Fig. 3, with the data.

In Figure 5, the horizontal axis shows the lepton forward-backward asymmetry. The prediction of 1.8% that one obtains for sin θw = 0.48, widened slightly to cover 1.65% to 2.0%, which is what obtains for sin θw between 0.479 and 0.481, is shown in pink. The four open circles represent four measurements of the asymmetry by the four experiments that were located at the LEP collider; the dashes through the circles show the standard deviations on their measurements. The dark circle shows what one gets when one combines the four experiments’ data together, obtaining an even better statistical estimate: 1.71 ± 0.10%, the uncertainty being indicated both as the dash going through the solid circle and as the yellow band. Since the yellow band extends to just above 1.8%, we see that the data differs from the sin θw = 0.480 prediction (the center of the pink band) by less than one standard deviation… giving precise agreement of the Standard Model with this very small but well-measured asymmetry.

Figure 5: The data from four experiments at the LEP collider (open circles, with uncertainties shown as dashes), and the combination of their results (closed circle) giving an asymmetry of 1.70% with an uncertainty of ±0.10% (yellow bar.) The prediction of the Standard Model for sin θw between 0.479 and 0.481 is shown in pink; its central value of 1.8% is within one standard deviation of the data.
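The combination of the four LEP measurements is, in essence, an inverse-variance weighted average. Here is a sketch of that procedure; the four individual inputs below are placeholders (only the combined result, 1.71 ± 0.10%, is quoted above), followed by the pull of that combined value against the 1.8% prediction.

```python
import math

def combine(measurements):
    """Inverse-variance weighted average of (value, sigma) pairs."""
    weights = [1.0 / sigma**2 for _, sigma in measurements]
    value = sum(w * v for w, (v, _) in zip(weights, measurements)) / sum(weights)
    sigma = math.sqrt(1.0 / sum(weights))
    return value, sigma

# Placeholder inputs standing in for the four LEP experiments (not the real numbers):
lep_inputs = [(1.65, 0.20), (1.80, 0.21), (1.59, 0.19), (1.82, 0.20)]
value, sigma = combine(lep_inputs)
print(round(value, 2), round(sigma, 2))     # a combined value with a smaller uncertainty

# Pull of the quoted combination (1.71 +/- 0.10 %) against the 1.8% prediction:
print(round((1.80 - 1.71) / 0.10, 1))       # 0.9, i.e. less than one standard deviation
```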

Predictions of other asymmetries show similar success, as do numerous other measurements.

The Big Picture

Successful predictions like these, especially ones in which both theory and experiment are highly precise, explain why particle physicists have such confidence in the Standard Model, despite its clear limitations.

What limitations of the Standard Model am I referring to? They are many, but one of them is simply that the Standard Model does not predict θw. No one can say why θw takes the value that it has, or whether the fact that it is close to 30 degrees is a clue to its origin or a mere coincidence. Instead, of the many measurements, we use a single one (such as one of the asymmetries) to extract its value, and then can predict many other quantities.

One thing I’ve neglected to do is to convey the complexity of the calculations that are needed to compare the Standard Model predictions to data. To carry out these computations much more carefully than I did in Figs. 2, 3 and 5, in order to make them as precise as the measurements, demands specialized knowledge and experience. (As an example of how tricky these computations can be: even defining what one means by sin θw can be ambiguous in precise enough calculations, and so one needs considerable expertise [which I do not have] to define it correctly and use that definition consistently.) So there are actually still more layers of precision that I could go into…!

But I think perhaps I’ve done enough to convince you that the Standard Model is a fortress. Sure, it’s not a finished construction. Yet neither will it be easily overthrown.

John Preskill Finding Ed Jaynes's ghost

You might have heard of the conundrum “What do you give the man who has everything?” I discovered a variation on it last October: how do you celebrate the man who studied (nearly) everything? Physicist Edwin Thompson Jaynes impacted disciplines from quantum information theory to biomedical imaging. I almost wrote “theoretical physicist,” instead of “physicist,” but a colleague insisted that Jaynes had a knack for electronics and helped design experiments, too. Jaynes worked at Washington University in St. Louis (WashU) from 1960 to 1992. I’d last visited the university in 2018, as a newly minted postdoc collaborating with WashU experimentalist Kater Murch. I’d scoured the campus for traces of Jaynes like a pilgrim seeking a saint’s forelock or humerus. The blog post “Chasing Ed Jaynes’s ghost” documents that hunt.

I found his ghost this October.

Kater and colleagues hosted the Jaynes Centennial Symposium on a brilliant autumn day when the campus’s trees were still contemplating shedding their leaves. The agenda featured researchers from across the sciences and engineering. We described how Jaynes’s legacy has informed 21st-century developments in quantum information theory, thermodynamics, biophysics, sensing, and computation. I spoke about quantum thermodynamics and information theory—specifically, incompatible conserved quantities, about which my research-group members and I have blogged many times.

Irfan Siddiqi spoke about quantum technologies. An experimentalist at the University of California, Berkeley, Irfan featured on Quantum Frontiers seven years ago. His lab specializes in superconducting qubits, tiny circuits in which current can flow forever, without dissipating. How can we measure a superconducting qubit? We stick the qubit in a box. Light bounces back and forth across the box. The light interacts with the qubit while traversing the box, in accordance with the Jaynes–Cummings model. We can’t seal any box perfectly, so some light will leak out. That light carries off information about the qubit. We can capture the light using a photodetector and infer the qubit’s state.

The first half of Jaynes–Cummings
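For reference, the Jaynes–Cummings Hamiltonian in one common convention (sign and rotating-wave conventions vary), with cavity frequency $\omega_c$, qubit frequency $\omega_q$, and coupling strength $g$:

$$ H = \hbar \omega_c \, a^\dagger a \;+\; \tfrac{1}{2} \hbar \omega_q \, \sigma_z \;+\; \hbar g \left( a^\dagger \sigma_- + a \, \sigma_+ \right) $$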

Bill Bialek, too, spoke about inference. But Bill is a Princeton biophysicist, so fruit flies preoccupy him more than qubits do. A fruit fly metamorphoses from a maggot that hatches from an egg. As the maggot develops, its cells differentiate: some form a head, some form a tail, and so on. Yet all the cells contain the same genetic information. How can a head ever emerge, to differ from a tail? 

A fruit-fly mother, Bill revealed, injects molecules into an egg at certain locations. These molecules diffuse across the egg, triggering the synthesis of more molecules. The knock-on molecules’ concentrations can vary strongly across the egg: a maggot’s head cells contain molecules at certain concentrations, and the tail cells contain the same molecules at other concentrations.

At this point in Bill’s story, I was ready to take my hat off to biophysicists for answering the question above, which I’ll rephrase here: if we find that a certain cell belongs to a maggot’s tail, why does the cell belong to the tail? But I enjoyed even more how Bill turned the question on its head (pun perhaps intended): imagine that you’re a maggot cell. How can you tell where in the maggot you are, to ascertain how to differentiate? Nature asks this question (loosely speaking), whereas human observers ask Bill’s first question.

To answer the second question, Bill recalled which information a cell accesses. Suppose you know four molecules’ concentrations: c_1, c_2, c_3, and c_4. How accurately can you predict the cell’s location? That is, what probability does the cell have of sitting at some particular site, conditioned on the c’s? That probability is large only at one site, biophysicists have found empirically. So a cell can accurately infer its position from its molecules’ concentrations.
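A toy version of that inference, as a sketch: a one-dimensional “egg” with four invented concentration profiles and assumed Gaussian read-out noise (all placeholders, not anything from the actual fruit-fly analysis), in which the cell computes the posterior probability of each position given its measured concentrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-dimensional "egg": positions x in [0, 1] and four invented
# concentration profiles c_1(x), ..., c_4(x). These are placeholders.
x = np.linspace(0.0, 1.0, 200)
profiles = np.stack([
    np.exp(-x / 0.3),
    np.exp(-(1.0 - x) / 0.3),
    np.sin(np.pi * x),
    np.cos(np.pi * x) ** 2,
])

sigma = 0.05           # assumed Gaussian read-out noise on each concentration
true_position = 0.37   # the cell's actual location, unknown to the cell

# The cell's noisy measurements of c_1, ..., c_4 at its own location:
true_c = np.array([np.interp(true_position, x, profile) for profile in profiles])
measured = true_c + rng.normal(0.0, sigma, size=4)

# Posterior over position, assuming a flat prior and independent Gaussian noise:
log_likelihood = -0.5 * np.sum((profiles.T - measured) ** 2, axis=1) / sigma**2
posterior = np.exp(log_likelihood - log_likelihood.max())
posterior /= posterior.sum()

print("most probable position:", round(float(x[np.argmax(posterior)]), 3))
```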

I’m no biophysicist (despite minor evidence to the contrary), but I enjoyed Bill’s story as I enjoyed Irfan’s. Probabilities, information, and inference are abstract notions; yet they impact physical reality, from insects to quantum science. This tension between abstraction and concreteness arrested me when I first encountered entropy, in a ninth-grade biology lecture. The tension drew me into information theory and thermodynamics. These toolkits permeate biophysics as they permeate my disciplines. So, throughout the symposium, I spoke with engineers, medical-school researchers, biophysicists, thermodynamicists, and quantum scientists. They all struck me as my kind of people, despite our distribution across the intellectual landscape. Jaynes reasoned about distributions—probability distributions—and I expect he’d have approved of this one. The man who studied nearly everything deserves a celebration that illuminates nearly everything.

December 22, 2024

Jordan Ellenberg Live at the Lunchable

Much-needed new housing is going up on Madison’s north side where the Oscar Mayer plant used to stand, with more to come. The View and The Victoria will join other new apartment buildings in town, like Verve, and Chapter, and The Eastern. I think it would be a shame if the redevelopment failed to honor the greatest innovation Oscar Mayer ever devised at its Madison facility. There should be a luxury apartment building called The Lunchable.

December 21, 2024

John Preskill Beyond NISQ: The Megaquop Machine

On December 11, I gave a keynote address at the Q2B 2024 Conference in Silicon Valley. This is a transcript of my remarks. The slides I presented are here.

NISQ and beyond

I’m honored to be back at Q2B for the 8th year in a row.

The Q2B conference theme is “The Roadmap to Quantum Value,” so I’ll begin by showing a slide from last year’s talk. As best we currently understand, the path to economic impact is the road through fault-tolerant quantum computing. And that poses a daunting challenge for our field and for the quantum industry.

We are in the NISQ era. And NISQ technology already has noteworthy scientific value. But as of now there is no proposed application of NISQ computing with commercial value for which quantum advantage has been demonstrated when compared to the best classical hardware running the best algorithms for solving the same problems. Furthermore, currently there are no persuasive theoretical arguments indicating that commercially viable applications will be found that do not use quantum error-correcting codes and fault-tolerant quantum computing.

NISQ, meaning Noisy Intermediate-Scale Quantum, is a deliberately vague term. By design, it has no precise quantitative meaning, but it is intended to convey an idea: We now have quantum machines such that brute force simulation of what the quantum machine does is well beyond the reach of our most powerful existing conventional computers. But these machines are not error-corrected, and noise severely limits their computational power.

In the future we can envision FASQ* machines, Fault-Tolerant Application-Scale Quantum computers that can run a wide variety of useful applications, but that is still a rather distant goal. What term captures the path along the road from NISQ to FASQ? Various terms retaining the ISQ format of NISQ have been proposed [here, here, here], but I would prefer to leave ISQ behind as we move forward, so I’ll speak instead of a megaquop or gigaquop machine and so on, meaning one capable of executing a million or a billion quantum operations, with the understanding that mega means not precisely a million but somewhere in the vicinity of a million.

Naively, a megaquop machine would have an error rate per logical gate of order 10^{-6}, which we don’t expect to achieve anytime soon without using error correction and fault-tolerant operation. Or maybe the logical error rate could be somewhat larger, as we expect to be able to boost the simulable circuit volume using various error mitigation techniques in the megaquop era just as we do in the NISQ era. Importantly, the megaquop machine would be capable of achieving some tasks beyond the reach of classical, NISQ, or analog quantum devices, for example by executing circuits with of order 100 logical qubits and circuit depth of order 10,000.
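A back-of-the-envelope reading of those numbers (a sketch, using only the round figures quoted above): with about a million logical operations and a logical error rate near 10^-6, roughly one logical fault per run is expected.

```python
logical_qubits = 100          # "of order 100 logical qubits"
circuit_depth = 10_000        # "circuit depth of order 10,000"
logical_error_rate = 1e-6     # naive megaquop-regime error rate per logical operation

quops = logical_qubits * circuit_depth            # ~10^6 logical operations
expected_faults = quops * logical_error_rate      # ~1 expected logical fault per run
prob_fault_free = (1.0 - logical_error_rate) ** quops

print(quops, expected_faults, round(prob_fault_free, 3))   # 1000000 1.0 0.368 (~ 1/e)
```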

What resources are needed to operate it? That depends on many things, but a rough guess is that tens of thousands of high-quality physical qubits could suffice. When will we have it? I don’t know, but if it happens in just a few years a likely modality is Rydberg atoms in optical tweezers, assuming they continue to advance in both scale and performance.

What will we do with it? I don’t know, but as a scientist I expect we can learn valuable lessons by simulating the dynamics of many-qubit systems on megaquop machines. Will there be applications that are commercially viable as well as scientifically instructive? That I can’t promise you.

The road to fault tolerance

To proceed along the road to fault tolerance, what must we achieve? We would like to see many successive rounds of accurate error syndrome measurement such that when the syndromes are decoded the error rate per measurement cycle drops sharply as the code increases in size. Furthermore, we want to decode rapidly, as will be needed to execute universal gates on protected quantum information. Indeed, we will want the logical gates to have much higher fidelity than physical gates, and for the logical gate fidelities to improve sharply as codes increase in size. We want to do all this at an acceptable overhead cost in both the number of physical qubits and the number of physical gates. And speed matters — the time on the wall clock for executing a logical gate should be as short as possible.

A snapshot of the state of the art comes from the Google Quantum AI team. Their recently introduced Willow superconducting processor has improved transmon lifetimes, measurement errors, and leakage correction compared to its predecessor Sycamore. With it they can perform millions of rounds of surface-code error syndrome measurement with good stability, each round lasting about a microsecond. Most notably, they find that the logical error rate per measurement round improves by a factor of 2 (a factor they call Lambda) when the code distance increases from 3 to 5 and again from 5 to 7, indicating that further improvements should be achievable by scaling the device further. They performed accurate real-time decoding for the distance 3 and 5 codes. To further explore the performance of the device they also studied the repetition code, which corrects only bit flips, out to a much larger code distance. As the hardware continues to advance we hope to see larger values of Lambda for the surface code, larger codes achieving much lower error rates, and eventually not just quantum memory but also logical two-qubit gates with much improved fidelity compared to the fidelity of physical gates.
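One rough way to read the Lambda figure (a sketch under the stated scaling, with a placeholder distance-3 error rate rather than the published number): if each increase of the code distance by 2 divides the logical error per round by Lambda, the error rate falls off exponentially with distance.

```python
def logical_error_per_round(distance, eps_distance3, Lambda=2.0):
    """Extrapolated logical error rate per syndrome-measurement round, assuming
    each step of 2 in code distance divides the error rate by Lambda."""
    assert distance >= 3 and distance % 2 == 1
    return eps_distance3 / Lambda ** ((distance - 3) / 2)

eps_distance3 = 3e-3   # placeholder distance-3 rate, purely illustrative
for d in (3, 5, 7, 9, 11, 13):
    print(d, f"{logical_error_per_round(d, eps_distance3):.2e}")
```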

Last year I expressed concern about the potential vulnerability of superconducting quantum processors to ionizing radiation such as cosmic ray muons. In these events, errors occur in many qubits at once, too many errors for the error-correcting code to fend off. I speculated that we might want to operate a superconducting processor deep underground to suppress the muon flux, or to use less efficient codes that protect against such error bursts.

The good news is that the Google team has demonstrated that so-called gap engineering of the qubits can reduce the frequency of such error bursts by orders of magnitude. In their studies of the repetition code they found that, in the gap-engineered Willow processor, error bursts occurred about once per hour, as opposed to once every ten seconds in their earlier hardware.  Whether suppression of error bursts via gap engineering will suffice for running deep quantum circuits in the future is not certain, but this progress is encouraging. And by the way, the origin of the error bursts seen every hour or so is not yet clearly understood, which reminds us that not only in superconducting processors but in other modalities as well we are likely to encounter mysterious and highly deleterious rare events that will need to be understood and mitigated.

Real-time decoding

Fast real-time decoding of error syndromes is important because when performing universal error-corrected computation we must frequently measure encoded blocks and then perform subsequent operations conditioned on the measurement outcomes. If it takes too long to decode the measurement outcomes, that will slow down the logical clock speed. That may be a more serious problem for superconducting circuits than for other hardware modalities where gates can be orders of magnitude slower.

For distance 5, Google achieves a latency, meaning the time from when data from the final round of syndrome measurement is received by the decoder until the decoder returns its result, of about 63 microseconds on average. In addition, it takes about another 10 microseconds for the data to be transmitted via Ethernet from the measurement device to the decoding workstation. That’s not bad, but considering that each round of syndrome measurement takes only a microsecond, faster would be preferable, and the decoding task becomes harder as the code grows in size.
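Putting those timescales side by side (simple arithmetic on the numbers just quoted): syndrome rounds arrive roughly every microsecond, so during one decode-plus-transmission delay on the order of seventy rounds of new syndrome data accumulate.

```python
round_time_us = 1.0        # one round of surface-code syndrome measurement, roughly
decode_latency_us = 63.0   # average decoding latency quoted for distance 5
transmit_us = 10.0         # Ethernet transfer to the decoding workstation

reaction_delay_us = decode_latency_us + transmit_us
rounds_backlogged = reaction_delay_us / round_time_us
print(reaction_delay_us, rounds_backlogged)   # 73.0 microseconds, ~73 rounds of backlog
```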

Riverlane and Rigetti have demonstrated in small experiments that the decoding latency can be reduced by running the decoding algorithm on FPGAs rather than CPUs, and by integrating the decoder into the control stack to reduce communication time. Adopting such methods may become increasingly important as we scale further. Google DeepMind has shown that a decoder trained by reinforcement learning can achieve a lower logical error rate than a decoder constructed by humans, but it’s unclear whether that will work at scale because the cost of training rises steeply with code distance. Also, the Harvard / QuEra team has emphasized that performing correlated decoding across multiple code blocks can reduce the depth of fault-tolerant constructions, but this also increases the complexity of decoding, raising concern about whether such a scheme will be scalable.

Trading simplicity for performance

The Google processors use transmon qubits, as do superconducting processors from IBM and various other companies and research groups. Transmons are the simplest superconducting qubits and their quality has improved steadily; we can expect further improvement with advances in materials and fabrication. But a logical qubit with very low error rate surely will be a complicated object due to the hefty overhead cost of quantum error correction. Perhaps it is worthwhile to fashion a more complicated physical qubit if the resulting gain in performance might actually simplify the operation of a fault-tolerant quantum computer in the megaquop regime or well beyond. Several versions of this strategy are being pursued.

One approach uses cat qubits, in which the encoded 0 and 1 are coherent states of a microwave resonator, well separated in phase space, such that the noise afflicting the qubit is highly biased. Bit flips are exponentially suppressed as the mean photon number of the resonator increases, while the error rate for phase flips induced by loss from the resonator increases only linearly with the photon number. This year the AWS team built a repetition code to correct phase errors for cat qubits that are passively protected against bit flips, and showed that increasing the distance of the repetition code from 3 to 5 slightly improves the logical error rate. (See also here.)
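Schematically, the appeal is a strongly biased noise channel. Here is a toy scaling model (the prefactors and the exact exponential constant are placeholders; only the exponential-versus-linear trend reflects the description above):

```python
import math

def cat_qubit_error_rates(nbar, bitflip_prefactor=1e-2, phaseflip_prefactor=1e-4):
    """Toy noise-bias model for a cat qubit: bit flips exponentially suppressed in
    the mean photon number nbar, phase flips growing linearly with nbar.
    All constants here are placeholders, not measured values."""
    p_bit = bitflip_prefactor * math.exp(-2.0 * nbar)
    p_phase = phaseflip_prefactor * nbar
    return p_bit, p_phase

for nbar in (1, 2, 4, 8):
    p_bit, p_phase = cat_qubit_error_rates(nbar)
    print(nbar, f"p_bit={p_bit:.2e}", f"p_phase={p_phase:.2e}", f"bias={p_phase / p_bit:.1e}")
```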

Another helpful insight is that error correction can be more effective if we know when and where the errors occur in a quantum circuit. We can apply this idea using a dual-rail encoding of the qubits. With two microwave resonators, for example, we can encode a qubit by placing a single photon in either the first resonator (the 10 state) or the second resonator (the 01 state). The dominant error is loss of a photon, causing either the 01 or 10 state to decay to 00. One can check whether the state is 00, detecting whether the error occurred without disturbing a coherent superposition of 01 and 10. In a device built by the Yale / QCI team, loss errors are detected over 99% of the time and all undetected errors are relatively rare. Similar results were reported by the AWS team, encoding a dual-rail qubit in a pair of transmons instead of resonators.

Another idea is encoding a finite-dimensional quantum system in a state of a resonator that is highly squeezed in two complementary quadratures, a so-called GKP encoding. This year the Yale group used this scheme to encode 3-dimensional and 4-dimensional systems with decay rate better by a factor of 1.8 than the rate of photon loss from the resonator. (See also here.)

A fluxonium qubit is more complicated than a transmon in that it requires a large inductance which is achieved with an array of Josephson junctions, but it has the advantage of larger anharmonicity, which has enabled two-qubit gates with better than three 9s of fidelity, as the MIT team has shown.

Whether this trading of simplicity for performance in superconducting qubits will ultimately be advantageous for scaling to large systems is still unclear. But it’s appropriate to explore such alternatives which might pay off in the long run.

Error correction with atomic qubits

We have also seen progress on error correction this year with atomic qubits, both in ion traps and optical tweezer arrays. In these platforms qubits are movable, making it possible to apply two-qubit gates to any pair of qubits in the device. This opens the opportunity to use more efficient coding schemes, and in fact logical circuits are now being executed on these platforms. The Harvard / MIT / QuEra team sampled circuits with 48 logical qubits on a 280-qubit device; that big news broke during last year’s Q2B conference. Atom Computing and Microsoft ran an algorithm with 28 logical qubits on a 256-qubit device. Quantinuum and Microsoft prepared entangled states of 12 logical qubits on a 56-qubit device.

However, so far in these devices it has not been possible to perform more than a few rounds of error syndrome measurement, and the results rely on error detection and postselection. That is, circuit runs are discarded when errors are detected, a scheme that won’t scale to large circuits. Efforts to address these drawbacks are in progress. Another concern is that the atomic movement slows the logical cycle time. If all-to-all coupling enabled by atomic movement is to be used in much deeper circuits, it will be important to speed up the movement quite a lot.

Toward the megaquop machine

How can we reach the megaquop regime? More efficient quantum codes like those recently discovered by the IBM team might help. These require geometrically nonlocal connectivity and are therefore better suited for Rydberg optical tweezer arrays than superconducting processors, at least for now. Error mitigation strategies tailored for logical circuits, like those pursued by Qedma, might help by boosting the circuit volume that can be simulated beyond what one would naively expect based on the logical error rate. Recent advances from the Google team, which reduce the overhead cost of logical gates, might also be helpful.

What about applications? Impactful applications to chemistry typically require rather deep circuits so are likely to be out of reach for a while yet, but applications to materials science provide a more tempting target in the near term. Taking advantage of symmetries and various circuit optimizations like the ones Phasecraft has achieved, we might start seeing informative results in the megaquop regime or only slightly beyond.

As a scientist, I’m intrigued by what we might conceivably learn about quantum dynamics far from equilibrium by doing simulations on megaquop machines, particularly in two dimensions. But when seeking quantum advantage in that arena we should bear in mind that classical methods for such simulations are also advancing impressively, including in the past year (for example, here and here).

To summarize, advances in hardware, control, algorithms, error correction, error mitigation, etc. are bringing us closer to megaquop machines, raising a compelling question for our community: What are the potential uses for these machines? Progress will require innovation at all levels of the stack.  The capabilities of early fault-tolerant quantum processors will guide application development, and our vision of potential applications will guide technological progress. Advances in both basic science and systems engineering are needed. These are still the early days of quantum computing technology, but our experience with megaquop machines will guide the way to gigaquops, teraquops, and beyond and hence to widely impactful quantum value that benefits the world.

I thank Dorit Aharonov, Sergio Boixo, Earl Campbell, Roland Farrell, Ashley Montanaro, Mike Newman, Will Oliver, Chris Pattison, Rob Schoelkopf, and Qian Xu for helpful comments.

*The acronym FASQ was suggested to me by Andrew Landahl.

The megaquop machine (image generated by ChatGPT).

n-Category Café Random Permutations (Part 14)

I want to go back over something from Part 11, but in a more systematic and self-contained way.

Namely, I want to prove a wonderful known fact about random permutations, the Cycle Length Lemma, using a bit of category theory. The idea here is that the number of $k$-cycles in a random permutation of $n$ things is a random variable. Then comes a surprise: in the limit as $n \to \infty$, this random variable approaches a Poisson distribution with mean $1/k$. And even better, for different choices of $k$ these random variables become independent in the $n \to \infty$ limit.

I’m stating these facts roughly now, to not get bogged down. But I’ll state them precisely, prove them, and categorify them. That is, I’ll state equations involving random variables — but I’ll prove that these equations come from equivalences of groupoids!

First I’ll state the Cycle Length Lemma, which summarizes a lot of interesting facts about random permutations. Then I’ll state and prove a categorified version of the Cycle Length Lemma, which asserts an equivalence of groupoids. Then I’ll derive the original version of the lemma from this categorified version by taking the cardinalities of these groupoids. The categorified version contains more information, so it’s not just a trick for proving the original lemma.

What do groupoids have to do with random permutations? You’ll see, but it’s an example of ‘principle of indifference’, especially in its modern guise, called the ‘principle of transformation groups’: the idea that outcomes related by a symmetry should have the same probability. This sets up a connection between groupoids and probability theory — and as we’ll see, we can “go down” from groupoids to probabilities using the theory of groupoid cardinalities.

The Cycle Length Lemma

In the theory of random permutations, we treat the symmetric group $S_n$ as a probability measure space where each element has the same measure, namely $1/n!$. Functions $f \colon S_n \to \mathbb{R}$ then become random variables, and we can study their expected values:

$$E(f) = \frac{1}{n!} \sum_{\sigma \in S_n} f(\sigma).$$

An important example is the function

$$C_k \colon S_n \to \mathbb{N}$$

that counts, for any permutation $\sigma \in S_n$, its number of cycles of length $k$, also called $k$-cycles. A well-known but striking fact about random permutations is that whenever $k \le n$, the expected number of $k$-cycles is $1/k$:

$$E(C_k) = \frac{1}{k}$$

For example, a random permutation of any finite set has, on average, one fixed point!
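(A quick Monte Carlo sanity check of that claim, by sampling rather than by proof:)

```python
import random

def average_fixed_points(n, trials=200_000):
    """Estimate E[C_1], the mean number of fixed points of a uniformly random
    permutation of an n-element set."""
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)
        total += sum(1 for i, image in enumerate(perm) if i == image)
    return total / trials

for n in (3, 10, 100):
    print(n, average_fixed_points(n))   # each estimate should hover near 1
```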

Another striking fact is that whenever $j \ne k$ and $j + k \le n$, so that it’s possible for a permutation $\sigma \in S_n$ to have both a $j$-cycle and a $k$-cycle, the random variables $C_j$ and $C_k$ are uncorrelated in the following sense:

$$E(C_j C_k) = E(C_j) E(C_k).$$

You might at first think that having lots of $j$-cycles for some large $j$ would tend to inhibit the presence of $k$-cycles for some other large value of $k$, but that’s not true unless $j + k \gt n$, when it suddenly becomes impossible to have both a $j$-cycle and a $k$-cycle!

These two facts are special cases of the Cycle Length Lemma. To state this lemma in full generality, recall that the number of ordered $p$-tuples of distinct elements of an $n$-element set is the falling power

$$n^{\underline{p}} = n(n-1)(n-2) \, \cdots \, (n-p+1).$$

It follows that the function

$$C_k^{\underline{p}} \colon S_n \to \mathbb{N}$$

counts, for any permutation in $S_n$, its ordered $p$-tuples of distinct $k$-cycles. We can also replace the word ‘distinct’ here by ‘disjoint’, without changing the meaning, since distinct cycles must be disjoint.

The two striking facts mentioned above generalize as follows:

1) First, whenever $p k \le n$, so that it is possible for a permutation in $S_n$ to have $p$ distinct $k$-cycles, then

$$E(C_k^{\underline{p}}) = \frac{1}{k^p}.$$

If you know about the moments of a Poisson distribution, here’s a nice equivalent way to state this equation: when $p k \le n$, the $p$th moment of the random variable $C_k$ equals that of a Poisson distribution with mean $1/k$.

2) Second, the random variables $C_k$ are better and better approximated by independent Poisson distributions. To state this precisely we need a bit of notation. Let $\vec{p}$ denote an $n$-tuple $(p_1, \dots, p_n)$ of natural numbers, and let

$$|\vec{p}| = p_1 + 2p_2 + \cdots + n p_n.$$

If $|\vec{p}| \le n$, it is possible for a permutation $\sigma \in S_n$ to have a collection of distinct cycles, with $p_1$ cycles of length 1, $p_2$ cycles of length 2, and so on up to $p_n$ cycles of length $n$. If $|\vec{p}| \gt n$, this is impossible. In the former case, where $|\vec{p}| \le n$, we always have

$$E\left( \prod_{k=1}^n C_k^{\underline{p}_k} \right) = \prod_{k=1}^n E( C_k^{\underline{p}_k} ).$$

Taken together, 1) and 2) are equivalent to the Cycle Length Lemma, which may be stated in a unified way as follows:

Cycle Length Lemma. Suppose $p_1, \dots, p_n \in \mathbb{N}$. Then

$$E\left( \prod_{k=1}^n C_k^{\underline{p}_k} \right) = \left\{ \begin{array}{ccc} \displaystyle{ \prod_{k=1}^n \frac{1}{k^{p_k}} } & & \mathrm{if} \; |\vec{p}| \le n \\ \\ 0 & & \mathrm{if} \; |\vec{p}| \gt n \end{array} \right.$$

This appears, for example, in Ford’s course notes on random permutations and the statistical properties of prime numbers [Lemma 1.1, F]. The most famous special case is when $|\vec{p}| = n$. Apparently this goes back to Cauchy, but I don’t know where he proved it. I believe he would have phrased it in terms of counting permutations, not probabilities.

I won’t get into details of precisely the sense in which the random variables $C_k$ approach independent Poisson distributions. For that, see Arratia and Tavaré [AT].
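Before categorifying, here is a quick brute-force check of the lemma for a small value of $n$, summing over all of $S_n$ exactly as in the definition of the expectation above:

```python
from itertools import permutations
from math import factorial
from collections import Counter

def cycle_counts(perm):
    """Counter {cycle length: number of cycles} for perm, where perm[i] is the image of i."""
    seen, counts = set(), Counter()
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        counts[length] += 1
    return counts

def falling(x, p):
    """Falling power x(x-1)...(x-p+1); equals 1 when p = 0."""
    result = 1
    for j in range(p):
        result *= (x - j)
    return result

def expectation(n, pvec):
    """E( prod_k C_k^{falling p_k} ), computed by summing over all of S_n."""
    total = 0
    for perm in permutations(range(n)):
        counts = cycle_counts(perm)
        product = 1
        for k, p in enumerate(pvec, start=1):
            product *= falling(counts[k], p)
        total += product
    return total / factorial(n)

def lemma_prediction(n, pvec):
    """The right-hand side of the Cycle Length Lemma."""
    weight = sum(k * p for k, p in enumerate(pvec, start=1))
    if weight > n:
        return 0.0
    result = 1.0
    for k, p in enumerate(pvec, start=1):
        result /= k ** p
    return result

n = 6
for pvec in [(1, 0, 0, 0, 0, 0), (0, 1, 1, 0, 0, 0), (2, 2, 0, 0, 0, 0), (1, 0, 0, 0, 0, 1)]:
    print(pvec, expectation(n, pvec), lemma_prediction(n, pvec))
```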

The Categorified Cycle Length Lemma

To categorify the Cycle Length Lemma, the key is to treat a permutation as an extra structure that we can put on a set, and then consider the groupoid of $n$-element sets equipped with this extra structure:

Definition. Let $\mathsf{Perm}(n)$ be the groupoid in which

  • an object is an $n$-element set equipped with a permutation $\sigma \colon X \to X$

and

  • a morphism from $\sigma \colon X \to X$ to $\sigma' \colon X' \to X'$ is a bijection $f \colon X \to X'$ that is permutation-preserving in the following sense:

$$f \circ \sigma \circ f^{-1} = \sigma'.$$

We’ll need this strange fact below: if $n \lt 0$ then $\mathsf{Perm}(n)$ is the empty groupoid (that is, the groupoid with no objects and no morphisms).

More importantly, we’ll need a fancier groupoid where a set is equipped with a permutation together with a list of distinct cycles of specified lengths. For any $n \in \mathbb{N}$ and any $n$-tuple of natural numbers $\vec{p} = (p_1, \dots, p_n)$, recall that we have defined

$$|\vec{p}| = p_1 + 2p_2 + \cdots + n p_n.$$

Definition. Let $\mathsf{A}_{\vec{p}}$ be the groupoid of $n$-element sets $X$ equipped with a permutation $\sigma \colon X \to X$ that is in turn equipped with a choice of an ordered $p_1$-tuple of distinct $1$-cycles, an ordered $p_2$-tuple of distinct $2$-cycles, and so on up to an ordered $p_n$-tuple of distinct $n$-cycles. A morphism in this groupoid is a bijection that is permutation-preserving and also preserves the ordered tuples of distinct cycles.

Note that if $|\vec{p}| \gt n$, no choice of disjoint cycles with the specified property exists, so $\mathsf{A}_{\vec{p}}$ is the empty groupoid.

Finally, we need a bit of standard notation. For any group $G$ we write $\mathsf{B}(G)$ for its delooping: that is, the groupoid that has one object $\star$ and $\mathrm{Aut}(\star) = G$.

The Categorified Cycle Length Lemma. For any $\vec{p} = (p_1, \dots, p_n) \in \mathbb{N}^n$ we have

$$\mathsf{A}_{\vec{p}} \simeq \mathsf{Perm}(n - |\vec{p}|) \; \times \; \prod_{k = 1}^n \mathsf{B}(\mathbb{Z}/k)^{p_k}$$

Proof. Both sides are empty groupoids when $n - |\vec{p}| \lt 0$, so assume $n - |\vec{p}| \ge 0$. A groupoid is equivalent to any full subcategory of that groupoid containing at least one object from each isomorphism class. So, fix an $n$-element set $X$ and a subset $Y \subseteq X$ with $n - |\vec{p}|$ elements. Partition $X - Y$ into subsets $S_{k \ell}$, where $S_{k \ell}$ has cardinality $k$, $1 \le k \le n$, and $1 \le \ell \le p_k$. Every object of $\mathsf{A}_{\vec{p}}$ is isomorphic to the chosen set $X$ equipped with some permutation $\sigma \colon X \to X$ that has each subset $S_{k \ell}$ as a $k$-cycle. Thus $\mathsf{A}_{\vec{p}}$ is equivalent to its full subcategory containing only objects of this form.

An object of this form consists of an arbitrary permutation $\sigma_Y \colon Y \to Y$ and a cyclic permutation $\sigma_{k \ell} \colon S_{k \ell} \to S_{k \ell}$ for each $k, \ell$ as above. Consider a second object of this form, say $\sigma'_Y \colon Y \to Y$ equipped with cyclic permutations $\sigma'_{k \ell}$. Then a morphism from the first object to the second consists of two pieces of data. First, a bijection

$$f \colon Y \to Y$$

such that

$$\sigma'_Y = f \circ \sigma_Y \circ f^{-1}.$$

Second, for each $k, \ell$ as above, bijections

$$f_{k \ell} \colon S_{k \ell} \to S_{k \ell}$$

such that

$$\sigma'_{k \ell} = f_{k \ell} \circ \sigma_{k \ell} \circ f_{k \ell}^{-1}.$$

Since $Y$ has $n - |\vec{p}|$ elements, while $\sigma_{k \ell}$ and $\sigma'_{k \ell}$ are cyclic permutations of $k$-element sets, it follows that $\mathsf{A}_{\vec{p}}$ is equivalent to

$$\mathsf{Perm}(n - |\vec{p}|) \; \times \; \prod_{k = 1}^n \mathsf{B}(\mathbb{Z}/k)^{p_k}. \qquad \qquad ▮$$

The case where $|\vec{p}| = n$ is especially pretty, since then our chosen cycles completely fill up our $n$-element set and we have

$$\mathsf{A}_{\vec{p}} \simeq \prod_{k = 1}^n \mathsf{B}(\mathbb{Z}/k)^{p_k}.$$

Groupoid Cardinality

The cardinality of finite sets has a natural extension to finite groupoids, and this turns out to be the key to extracting results on random permutations from category theory. Let’s briefly recall the idea of ‘groupoid cardinality’ [BD, BHW]. Any finite groupoid $\mathsf{G}$ is equivalent to a coproduct of finitely many one-object groupoids, which are deloopings of finite groups $G_1, \dots, G_m$:

$$\mathsf{G} \simeq \sum_{i = 1}^m \mathsf{B}(G_i),$$

and then the cardinality of $\mathsf{G}$ is defined to be

$$|\mathsf{G}| = \sum_{i = 1}^m \frac{1}{|G_i|}.$$

This concept of groupoid cardinality has various nice properties. For example it’s additive:

$$|\mathsf{G} + \mathsf{H}| = |\mathsf{G}| + |\mathsf{H}|$$

and multiplicative:

$$|\mathsf{G} \times \mathsf{H}| = |\mathsf{G}| \times |\mathsf{H}|$$

and invariant under equivalence of groupoids:

$$\mathsf{G} \simeq \mathsf{H} \implies |\mathsf{G}| = |\mathsf{H}|.$$

But none of these three properties require that we define $|\mathsf{G}|$ as the sum of the reciprocals of the cardinalities $|G_i|$: any other power of these cardinalities would work just as well. What makes the reciprocal cardinalities special is that if $G$ is a finite group acting on a set $S$, we have

$$|S \sslash G| = |S|/|G|$$

where the groupoid $S \sslash G$ is the weak quotient or homotopy quotient of $S$ by $G$, also called the action groupoid. This is the groupoid with elements of $S$ as objects and one morphism from $s$ to $s'$ for each $g \in G$ with $g s = s'$, with composition of morphisms coming from multiplication in $G$.

The groupoid of $n$-element sets equipped with permutation, $\mathsf{Perm}(n)$, has a nice description in terms of weak quotients:

Lemma. For all $n \in \mathbb{N}$ we have an equivalence of groupoids

$$\mathsf{Perm}(n) \simeq S_n \sslash S_n$$

where the group $S_n$ acts on the underlying set of $S_n$ by conjugation.

Proof. We use the fact that $\mathsf{Perm}(n)$ is equivalent to any full subcategory of $\mathsf{Perm}(n)$ containing at least one object from each isomorphism class. For $\mathsf{Perm}(n)$ we can get such a subcategory by fixing an $n$-element set, say $X = \{1, \dots, n\}$, and taking only objects of the form $\sigma \colon X \to X$, i.e. $\sigma \in S_n$. A morphism from $\sigma \in S_n$ to $\sigma' \in S_n$ is then a permutation $\tau \in S_n$ such that

$$\sigma' = \tau \sigma \tau^{-1}.$$

But this subcategory is precisely $S_n \sslash S_n$.       ▮

Corollary. For all $n \in \mathbb{N}$ we have

$$|\mathsf{Perm}(n)| = 1$$

Proof. We have $|\mathsf{Perm}(n)| = |S_n \sslash S_n| = |S_n|/|S_n| = 1$.       ▮

It should now be clear why we can prove results on random permutations using the groupoid $\mathsf{Perm}(n)$: this groupoid is equivalent to $S_n \sslash S_n$, a groupoid with one object for each permutation $\sigma \in S_n$, and with each object contributing $1/n!$ to the groupoid cardinality.
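As a concrete numerical check of the Corollary (using the standard facts that the components of $\mathsf{Perm}(n)$ correspond to partitions of $n$, i.e. cycle types, and that the automorphism group of an object with $m_k$ cycles of length $k$ is the centralizer, of order $\prod_k k^{m_k}\, m_k!$): summing the reciprocals of those orders should give 1.

```python
from math import factorial, prod

def partitions(n, max_part=None):
    """Yield the partitions of n as dicts {part size: multiplicity}."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            partition = dict(rest)
            partition[k] = partition.get(k, 0) + 1
            yield partition

def centralizer_order(partition):
    """Order of the centralizer in S_n of a permutation with this cycle type."""
    return prod(k**m * factorial(m) for k, m in partition.items())

for n in range(1, 9):
    total = sum(1 / centralizer_order(p) for p in partitions(n))
    print(n, round(total, 12))   # should print 1.0 for every n, matching |Perm(n)| = 1
```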

Now let us use this idea to derive the original Cycle Length Lemma from the categorified version.

Cycle Length Lemma. Suppose $p_1, \dots, p_n \in \mathbb{N}$. Then

$$E\left( \prod_{k=1}^n C_k^{\underline{p}_k} \right) = \left\{ \begin{array}{ccc} \displaystyle{ \prod_{k=1}^n \frac{1}{k^{p_k}} } & & \mathrm{if} \; |\vec{p}| \le n \\ \\ 0 & & \mathrm{if} \; |\vec{p}| \gt n \end{array} \right.$$

Proof. We know that

$$\mathsf{A}_{\vec{p}} \simeq \mathsf{Perm}(n - |\vec{p}|) \; \times \; \prod_{k = 1}^n \mathsf{B}(\mathbb{Z}/k)^{p_k}$$

So, to prove the Cycle Length Lemma it suffices to show three things:

$$|\mathsf{A}_{\vec{p}}| = E\left( \prod_{k=1}^n C_k^{\underline{p}_k} \right)$$

$$|\mathsf{Perm}(n - |\vec{p}|)| = \left\{ \begin{array}{ccc} 1 & & \mathrm{if} \; |\vec{p}| \le n \\ \\ 0 & & \mathrm{if} \; |\vec{p}| \gt n \end{array} \right.$$

and

$$|\mathsf{B}(\mathbb{Z}/k)| = 1/k$$

The last of these is immediate from the definition of groupoid cardinality. The second follows from the Corollary above, together with the fact that $\mathsf{Perm}(n - |\vec{p}|)$ is the empty groupoid when $|\vec{p}| \gt n$. Thus we are left needing to show that

$$|\mathsf{A}_{\vec{p}}| = E\left( \prod_{k=1}^n C_k^{\underline{p}_k} \right).$$

We prove this by computing the cardinality of a groupoid equivalent to $\mathsf{A}_{\vec{p}}$. This groupoid is of the form

$$Q(\vec{p}) \sslash S_n$$

where $Q(\vec{p})$ is a set on which $S_n$ acts. As a result we have

$$|\mathsf{A}_{\vec{p}}| = |Q(\vec{p}) \sslash S_n| = |Q(\vec{p})| / n!$$

and to finish the proof we will need to show

$$E\left( \prod_{k=1}^n C_k^{\underline{p}_k} \right) = |Q(\vec{p})| / n!.$$

What is the set $Q(\vec{p})$, and how does $S_n$ act on this set? An element of $Q(\vec{p})$ is a permutation $\sigma \in S_n$ equipped with an ordered $p_1$-tuple of distinct $1$-cycles, an ordered $p_2$-tuple of distinct $2$-cycles, and so on up to an ordered $p_n$-tuple of distinct $n$-cycles. Any element $\tau \in S_n$ acts on $Q(\vec{p})$ in a natural way, by conjugating the permutation $\sigma \in S_n$ to obtain a new permutation, and mapping the chosen cycles of $\sigma$ to the corresponding cycles of this new permutation $\tau \sigma \tau^{-1}$.

Recalling the definition of the groupoid $\mathsf{A}_{\vec{p}}$, it is clear that any element of $Q(\vec{p})$ gives an object of $\mathsf{A}_{\vec{p}}$, and any object is isomorphic to one of this form. Furthermore any permutation $\tau \in S_n$ gives a morphism between such objects, all morphisms between such objects are of this form, and composition of these morphisms is just multiplication in $S_n$. It follows that

$$\mathsf{A}_{\vec{p}} \simeq Q(\vec{p}) \sslash S_n.$$

To finish the proof, note that

$$E\left( \prod_{k=1}^n C_k^{\underline{p}_k} \right)$$

is $1/n!$ times the number of ways of choosing a permutation $\sigma \in S_n$ and equipping it with an ordered $p_1$-tuple of distinct $1$-cycles, an ordered $p_2$-tuple of distinct $2$-cycles, and so on. This is the same as $|Q(\vec{p})| / n!$.       ▮

References

[AT] Richard Arratia and Simon Tavaré, The cycle structure of random permutations, The Annals of Probability (1992), 1567–1591.

[BD] John C. Baez and James Dolan, From finite sets to Feynman diagrams, in Mathematics Unlimited—2001 and Beyond, vol. 1, eds. Björn Engquist and Wilfried Schmid, Springer, Berlin, 2001, pp. 29–50.

[BHW] John C. Baez, Alexander E. Hoffnung and Christopher D. Walker, Higher-dimensional algebra VII: groupoidification, Theory and Applications of Categories 24 (2010), 489–553.

[F] Kevin Ford, Anatomy of Integers and Random Permutations—Course Lecture Notes.

December 20, 2024

Matt von Hippel How Small Scales Can Matter for Large Scales

For a certain type of physicist, nothing matters more than finding the ultimate laws of nature for its tiniest building-blocks, the rules that govern quantum gravity and tell us where the other laws of physics come from. But because they know very little about those laws at this point, they can predict almost nothing about observations on the larger distance scales we can actually measure.

“Almost nothing” isn’t nothing, though. Theoretical physicists don’t know nature’s ultimate laws. But some things about them can be reasonably guessed. The ultimate laws should include a theory of quantum gravity. They should explain at least some of what we see in particle physics now, explaining why different particles have different masses in terms of a simpler theory. And they should “make sense”, respecting cause and effect, the laws of probability, and Einstein’s overall picture of space and time.

All of these are assumptions, of course. Further assumptions are needed to derive any testable consequences from them. But a few communities in theoretical physics are willing to take the plunge, and see what consequences their assumptions have.

First, there’s the Swampland. String theorists posit that the world has extra dimensions, which can be curled up in a variety of ways to hide from view, with different observable consequences depending on how the dimensions are curled up. This list of different observable consequences is referred to as the Landscape of possibilities. Based on that, some string theorists coined the term “Swampland” to represent an area outside the Landscape, containing observations that are incompatible with quantum gravity altogether, and tried to figure out what those observations would be.

In principle, the Swampland includes the work of all the other communities on this list, since a theory of quantum gravity ought to be consistent with other principles as well. In practice, people who use the term focus on consequences of gravity in particular. The earliest such ideas argued from thought experiments with black holes, finding results that seemed to demand that gravity be the weakest force for at least one type of particle. Later researchers would more frequently use string theory as an example, looking at what kinds of constructions people had been able to make in the Landscape to guess what might lie outside of it. They’ve used this to argue that dark energy might be temporary, and to try to figure out what traits new particles might have.

Second, I should mention naturalness. When talking about naturalness, people often use the analogy of a pen balanced on its tip. While possible in principle, it must have been set up almost perfectly, since any small imbalance would cause it to topple, and that perfection demands an explanation. Similarly, in particle physics, things like the mass of the Higgs boson and the strength of dark energy seem to be carefully balanced, so that a small change in how they were set up would lead to a much heavier Higgs boson or much stronger dark energy. The need for an explanation for the Higgs’ careful balance is why many physicists expected the Large Hadron Collider to discover additional new particles.

As I’ve argued before, this kind of argument rests on assumptions about the fundamental laws of physics. It assumes that the fundamental laws explain the mass of the Higgs, not merely by giving it an arbitrary number but by showing how that number comes from a non-arbitrary physical process. It also assumes that we understand well how physical processes like that work, and what kinds of numbers they can give. That’s why I think of naturalness as a type of argument, much like the Swampland, that uses the smallest scales to constrain larger ones.

Third is a host of constraints that usually go together: causality, unitarity, and positivity. Causality comes from cause and effect in a relativistic universe. Because two distant events can appear to happen in different orders depending on how fast you’re going, any way to send signals faster than light is also a way to send signals back in time, causing all of the paradoxes familiar from science fiction. Unitarity comes from quantum mechanics. If quantum calculations are supposed to give the probability of things happening, those probabilities should make sense as probabilities: for example, they should never go above one.

You might guess that almost any theory would satisfy these constraints. But if you extend a theory to the smallest scales, some theories that otherwise seem sensible end up failing this test. Actually linking things up takes other conjectures about the mathematical form theories can have, conjectures that seem more solid than the ones underlying Swampland and naturalness constraints but that still can’t be conclusively proven. If you trust the conjectures, you can derive restrictions, often called positivity constraints when they demand that some set of observations is positive. There has been a renaissance in this kind of research over the last few years, including arguments that certain speculative theories of gravity can’t actually work.

Doug Natelson Technological civilization and losing object permanence

In the grand tradition of physicists writing about areas outside their expertise, I wanted to put down some thoughts on a societal trend.  This isn't physics or nanoscience, so feel free to skip this post.

Object permanence is a term from developmental psychology.  A person (or animal) has object permanence if they understand that something still exists even if they can't directly see it or interact with it in the moment.  If a kid realizes that their toy still exists even though they can't see it right now, they've got the concept.  

I'm wondering if modern technological civilization has an issue with an analog of object permanence.  Let me explain what I mean, why it's a serious problem, and end on a hopeful note by pointing out that even if this is the case, we have the tools needed to overcome this.

By the standards of basically any previous era, a substantial fraction of humanity lives in a golden age.  We have a technologically advanced, globe-spanning civilization.  A lot of people (though geographically very unevenly distributed) have grown up with comparatively clean water; comparatively plentiful food available through means other than subsistence agriculture; electricity; access to radio, television, and for the last couple of decades nearly instant access to communications and a big fraction of the sum total of human factual knowledge.  

Whether it's just human nature or a consequence of relative prosperity, there seems to be some timescale on the order of a few decades over which a non-negligible fraction of even the most fortunate seem to forget the hard lessons that got us to this point.  If they haven't seen something with their own eyes or experienced it directly, they decide it must not be a real issue.  I'm not talking about Holocaust deniers or conspiracy theorists who think the moon landings were fake.  There are a bunch of privileged people who have never personally known a time when tens of thousands of their neighbors died from childhood disease (you know, like 75 years ago, when 21,000 Americans were paralyzed every year from polio (!), proportionately like 50,000 today), who now think we should get rid of vaccines, and maybe germs aren't real.  Most people alive today were not alive the last time nuclear weapons were used, so some of them argue that nuclear weapons really aren't that bad (e.g. setting off 2000 one megaton bombs spread across the US would directly destroy less than 5% of the land area, so we're good, right?).  Or, we haven't had massive bank runs in the US since the 1930s, so some people now think that insuring bank deposits is a waste of resources and should stop.  I'll stop the list here, before veering into even more politically fraught territory.  I think you get my point, though - somehow chunks of modern society seem to develop collective amnesia, as if problems that we've never personally witnessed must have been overblown before or don't exist at all.  (Interestingly, this does not seem to happen for most technological problems.  You don't see many people saying, you know, maybe building fires weren't that big a deal, let's go back to the good old days before smoke alarms and sprinklers.)  

While the internet has downsides, including the ability to spread disinformation very effectively, all the available and stored knowledge also has an enormous benefit:  It should make it much harder than ever before for people to collectively forget the achievements of our species.  Sanitation, pasteurization, antibiotics, vaccinations - these are absolutely astonishing technical capabilities that were hard-won and have saved many millions of lives.  It's unconscionable that we are literally risking mass death by voluntarily forgetting or ignoring that.  Nuclear weapons are, in fact, terrible.  Insuring bank deposits with proper supervision of risk is a key factor that has helped stabilize economies for the last century.  We need to remember historical problems and their solutions, and make sure that the people setting policy are educated about these things. They say that those who cannot remember the past are doomed to repeat it.  As we look toward the new year, I hope that those who are familiar with the hard-earned lessons of history are able to make themselves heard over the part of the populace who simply don't believe that old problems were real and could return.



Jordan Ellenberg Three ways to apply AI to mathematics

If you wanted to train a machine to play Go, there are three ways you could do it, at decreasing levels of “from-scratchness.”

You could tell the machine the rules of the game, and have it play many millions of games against itself; your goal is to learn a function that does a good job assigning a value to a game state, and you evaluate such a function by seeing how often it wins in this repeated arena. This is an oversimplified account of what AlphaGo does.

Or you could have the machine try to learn a state function from some database of games actually played by expert human competitors — those games would be entered in some formal format and the machine would try to learn to imitate those expert players. Of course, you could then combine this with simulated internal games as in the first step, but you’d be starting with a leg up from accumulated human knowledge.

The third way would be to train on every natural-language book ever written about Go and try to produce natural-language response to natural-language questions that just tells you what to do.

I don’t actually care about Go, but I do care about math, and I think all three of these approaches have loose analogues as we ask what it might look like for machines to help mathematicians. The first, “from scratch” approach, is the side of things I’ve worked on in projects like PatternBoost and FunSearch. (OK, maybe FunSearch has aspects of both method 1 and method 2.) Here you actively try to keep prior human knowledge away from the machine, because you want to see what it can do on its own.

The second approach is where I’d put formalized proof. If we try to train a machine to get from one assertion to another using a chain of proven theorems in a formal system like Lean, we’re starting from a high platform: a huge repository of theorems and even more importantly definitions which guide the machine along channels which people have already figured out are rich in meaning. AlphaProof is like this.

The third approach is more like what GPT o1 is doing — asking whether you can circumvent the formal language entirely and just generate text which kindles mathematical insight in the mind of the human reader.

I think all of these are reasonable things to try. I guess my own mostly unjustified prejudice is that the first approach is the one that has the most to teach us about what the scope of machine learning actually is, while the second is the one that will probably end up being most useful in practice. The third? So far I think it doesn’t work. I don’t think it couldn’t work. But I also don’t think it’s on an obvious trajectory towards working, if words like “trajectory” even make sense in this context. At some point I’ll post an o1-preview dialogue which I found very illuminating in this respect.

Clifford Johnson A Long Goodbye

I've been very quiet here over the last couple of weeks. My mother, Delia Maria Johnson, already in hospital since 5th November or so, took a turn for the worse and began a rapid decline. She died peacefully after some days, and to be honest I’ve really not been myself since then.

My mother Delia at a wedding in 2012

There's an extra element to the sense of loss when (as it approaches) you are powerless to do anything because of being thousands of miles away. On the plus side, because of the ease of using video calls, and with the help of my sister being there, I was able to be somewhat present during what turned out to be the last moments when she was aware of people around her, and therefore was able to tell her I loved her one last time.

Rather than charging across the world on planes, trains, and in automobiles, probably being out of reach during any significant changes in the situation (the doctors said I would likely not make it in time), I did a number of things locally that I am glad I got to do.

It began with visiting (and sending a photo from) the Santa Barbara mission, a place she dearly loved and was unable to visit again after 2019, along with the pier. These are both places we walked together so much back when I first lived here in what feels like another life.

Then, two nights before mum passed away, but well after she’d seemed already beyond reach of anyone, although perhaps (I’d like to think) still able to hear things, my sister contacted me from her bedside asking if I’d like to read mum a psalm, perhaps one of her favourites, 23 or 91. At first I thought she was already planning the funeral, and expressed my surprise at this since mum was still alive and right next to her. But I’d misunderstood, and she’d in fact had a rather great idea. This suggestion turned into several hours of, having sent on recordings of the two psalms, my digging into the poetry shelf in the study and discovering long neglected collections through which I searched (sometimes accompanied by my wife and son) for additional things to read. I recorded some and sent them along, as well as one from my son, I’m delighted to say. Later, the whole thing turned into me singing various songs while playing my guitar and sending recordings of those along too.

Incidentally, the guitar-playing was an interesting turn of events since not many months ago I decided after a long lapse to start playing guitar again, and try to move the standard of my playing (for vocal accompaniment) to a higher level than I’d previously done, by playing and practicing for a little bit on a regular basis. I distinctly recall thinking at one point during one practice that it would be nice to play for mum, although I did not imagine that playing to her while she was on her actual death-bed would be the circumstance under which I’d eventually play for her, having (to my memory) never directly done so back when I used to play guitar in my youth. (Her overhearing me picking out bits of Queen songs behind my room door when I was a teenager doesn’t count as direct playing for her.)

Due to family circumstances I’ll perhaps go into another time... Click to continue reading this post

The post A Long Goodbye appeared first on Asymptotia.

December 19, 2024

Terence Tao On the distribution of eigenvalues of GUE and its minors at fixed index

I’ve just uploaded to the arXiv the paper “On the distribution of eigenvalues of GUE and its minors at fixed index“. This is a somewhat technical paper establishing some estimates regarding one of the most well-studied random matrix models, the Gaussian Unitary Ensemble (GUE), that were not previously in the literature, but which will be needed for some forthcoming work of Hariharan Narayanan on the limiting behavior of “hives” with GUE boundary conditions (building upon our previous joint work with Sheffield).

For sake of discussion we normalize the GUE model to be the random {N \times N} Hermitian matrix {H} whose probability density function is proportional to {e^{-\mathrm{tr} H^2}}. With this normalization, the famous Wigner semicircle law will tell us that the eigenvalues {\lambda_1 \leq \dots \leq \lambda_N} of this matrix will almost all lie in the interval {[-\sqrt{2N}, \sqrt{2N}]}, and after dividing by {\sqrt{2N}}, will asymptotically be distributed according to the semicircle distribution

\displaystyle  \rho_{\mathrm{sc}}(x) := \frac{2}{\pi} (1-x^2)_+^{1/2}.

In particular, the normalized {i^{th}} eigenvalue {\lambda_i/\sqrt{2N}} should be close to the classical location {\gamma_{i/N}}, where {\gamma_{i/N}} is the unique element of {[-1,1]} such that

\displaystyle  \int_{-\infty}^{\gamma_{i/N}} \rho_{\mathrm{sc}}(x)\ dx = \frac{i}{N}.

Eigenvalues can be described by their index {i} or by their (normalized) energy {\lambda_i/\sqrt{2N}}. In principle, the two descriptions are related by the classical map {i \mapsto \gamma_{i/N}} defined above, but there are microscopic fluctuations from the classical location that create subtle technical difficulties between “fixed index” results in which one focuses on a single index {i} (and neighboring indices {i+1, i-1}, etc.), and “fixed energy” results in which one focuses on a single energy {x} (and eigenvalues near this energy). The phenomenon of eigenvalue rigidity does give some control on these fluctuations, allowing one to relate “averaged index” results (in which the index {i} ranges over a mesoscopic range) with “averaged energy” results (in which the energy {x} is similarly averaged over a mesoscopic interval), but there are technical issues in passing back from averaged control to pointwise control, either for the index or energy.

We will be mostly concerned in the bulk region where the index {i} is in an interval of the form {[\delta N, (1-\delta)N]} for some fixed {\delta>0}, or equivalently the energy {x} is in {[-1+c, 1-c]} for some fixed {c > 0}. In this region it is natural to introduce the normalized eigenvalue gaps

\displaystyle  g_i := \sqrt{N/2} \rho_{\mathrm{sc}}(\gamma_{i/N}) (\lambda_{i+1} - \lambda_i).

The semicircle law predicts that these gaps {g_i} have mean close to {1}; however, due to the aforementioned fluctuations around the classical location, this type of claim is only easy to establish in the “fixed energy”, “averaged energy”, or “averaged index” settings; the “fixed index” case was only achieved by myself as recently as 2013, where I showed that each such gap in fact asymptotically had the expected distribution of the Gaudin law, using manipulations of determinantal processes. A significantly more general result, avoiding the use of determinantal processes, was subsequently obtained by Erdos and Yau.
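As a quick numerical illustration of this normalization (my own sketch, not from the paper; it assumes numpy and scipy are available), one can sample a GUE matrix with the density above, invert the semicircle CDF to get the classical locations {\gamma_{i/N}}, and check that the bulk gaps {g_i} average to about {1}:

```python
# Sanity check of the normalization: with density proportional to exp(-tr H^2),
# the off-diagonal entries satisfy E|H_ij|^2 = 1/2 and the diagonal has variance 1/2,
# and the normalized bulk gaps g_i should have mean close to 1.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
N = 400
A = rng.normal(scale=np.sqrt(0.5), size=(N, N)) \
    + 1j * rng.normal(scale=np.sqrt(0.5), size=(N, N))
H = (A + A.conj().T) / 2
lam = np.linalg.eigvalsh(H)                  # sorted eigenvalues

rho = lambda x: (2 / np.pi) * np.sqrt(np.clip(1 - x * x, 0, None))
F = lambda x: 0.5 + (x * np.sqrt(1 - x * x) + np.arcsin(x)) / np.pi   # semicircle CDF

idx = np.arange(int(0.2 * N), int(0.8 * N))  # stay in the bulk
gamma = np.array([brentq(lambda x, q=i / N: F(x) - q, -1.0, 1.0) for i in idx])
g = np.sqrt(N / 2) * rho(gamma) * (lam[idx + 1] - lam[idx])
print("mean normalized bulk gap:", g.mean())  # should be close to 1
```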

However, these results left open the possibility of bad tail behavior at extremely large or small values of the gaps {g_i}; in particular, moments of the {g_i} were not directly controlled by previous results. The first result of the paper is to push the determinantal analysis further, and obtain such results. For instance, we obtain moment bounds

\displaystyle  \mathop{\bf E} g_i^p \ll_p 1

for any fixed {p > 0}, as well as an exponential decay bound

\displaystyle  \mathop{\bf P} (g_i > h) \ll \exp(-h/4)

for {0 < h \ll \log\log N}, and a lower tail bound

\displaystyle  \mathop{\bf P} (g_i \leq h) \ll h^{2/3} \log^{1/2} \frac{1}{h}

for any {h>0}. We also obtain good control on sums {g_i + \dots + g_{i+m-1}} of {m} consecutive gaps for any fixed {m}, showing that this sum has mean {m + O(\log^{4/3} (2+m))} and variance {O(\log^{7/3} (2+m))}. (This is significantly less variance than one would expect from a sum of {m} independent random variables; this variance reduction phenomenon is closely related to the eigenvalue rigidity phenomenon alluded to earlier, and reflects the tendency of eigenvalues to repel each other.)

A key point is that no factors of {\log N} occur in these estimates, in contrast to what one would obtain if one tried to use existing eigenvalue rigidity theorems. (In particular, if one normalized the eigenvalues {\lambda_i} at the same scale as the gap {g_i}, they would fluctuate by a standard deviation of about {\sqrt{\log N}}; it is only the gaps between eigenvalues that exhibit much smaller fluctuation.) On the other hand, the dependence on {h} is not optimal, although it was sufficient for the applications I had in mind.

As with my previous paper, the strategy is to try to replace fixed index events such as {g_i > h} with averaged energy events. For instance, if {g_i > h} and {i} has classical location {x}, then there is an interval of normalized energies {t} of length {\gg h}, with the property that there are precisely {N-i} eigenvalues to the right of {f_x(t)} and no eigenvalues in the interval {[f_x(t), f_x(t+h/2)]}, where

\displaystyle  f_x(t) = \sqrt{2N}( x + \frac{t}{N \rho_{\mathrm{sc}}(x)})

is an affine rescaling to the scale of the eigenvalue gap. So matters soon reduce to controlling the probability of the event

\displaystyle  (N_{x,t} = N-i) \wedge (N_{x,t,h/2} = 0)

where {N_{x,t}} is the number of eigenvalues to the right of {f_x(t)}, and {N_{x,t,h/2}} is the number of eigenvalues in the interval {[f_x(t), f_x(t+h/2)]}. These are fixed energy events, and one can use the theory of determinantal processes to control them. For instance, each of the random variables {N_{x,t}}, {N_{x,t,h/2}} separately have the distribution of sums of independent Boolean variables, which are extremely well understood. Unfortunately, the coupling is a problem; conditioning on the event {N_{x,t} = N-i}, in particular, affects the distribution of {N_{x,t,h/2}}, so that it is no longer the sum of independent Boolean variables. However, it is still a mixture of such sums, and with this (and the Plancherel-Rotach asymptotics for the GUE determinantal kernel) it is possible to proceed and obtain the above estimates after some calculation.

For the intended application to GUE hives, it is important to not just control gaps {g_i} of the eigenvalues {\lambda_i} of the GUE matrix {M}, but also the gaps {g'_i} of the eigenvalues {\lambda'_i} of the top left {N-1 \times N-1} minor {M'} of {M}. This minor of a GUE matrix is basically again a GUE matrix, so the above theorem applies verbatim to the {g'_i}; but it turns out to be necessary to control the joint distribution of the {g_i} and {g'_i}, and also of the interlacing gaps {\tilde g_i} between the {\lambda_i} and {\lambda'_i}. For fixed energy, these gaps are in principle well understood, due to previous work of Adler-Nordenstam-van Moerbeke and of Johansson-Nordenstam which show that the spectrum of both matrices is asymptotically controlled by the Boutillier bead process. This also gives averaged energy and averaged index results without much difficulty, but to get to fixed index information, one needs some universality result in the index {i}. For the gaps {g_i} of the original matrix, such a universality result is available due to the aforementioned work of Erdos and Yau, but this does not immediately imply the corresponding universality result for the joint distribution of {g_i} and {g'_i} or {\tilde g_i}. For this, we need a way to relate the eigenvalues {\lambda_i} of the matrix {M} to the eigenvalues {\lambda'_i} of the minors {M'}. By a standard Schur’s complement calculation, one can obtain the equation

\displaystyle a_{NN} - \lambda_i - \sum_{j=1}^{N-1}\frac{|X_j|^2}{\lambda'_j - \lambda_i} = 0

for all {i}, where {a_{NN}} is the bottom right entry of {M}, and {X_1,\dots,X_{N-1}} are complex gaussians independent of {\lambda'_j}. This gives a random system of equations to solve for {\lambda_i} in terms of {\lambda'_j}. Using the previous bounds on eigenvalue gaps (particularly the concentration results for sums of consecutive gaps), one can localize this equation to the point where a given {\lambda_i} is mostly controlled by a bounded number of nearby {\lambda'_j}, and hence a single gap {g_i} is mostly controlled by a bounded number of {g'_j}. From this, it is possible to leverage the existing universality result of Erdos and Yau to obtain universality of the joint distribution of {g_i} and {g'_i} (or of {\tilde g_i}). (The result can also be extended to more layers of the minor process than just two, as long as the number of minors is held fixed.)
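The Schur complement identity above is easy to check numerically (a sanity check of the formula only, my own and not part of the paper's argument); here {X_j} is realized as the component of the last column of {M} along the {j}-th eigenvector of the minor {M'}:

```python
# Numerical check: for a random Hermitian M, every eigenvalue lambda_i of M satisfies
#   a_NN - lambda_i - sum_j |X_j|^2 / (lambda'_j - lambda_i) = 0,
# where lambda'_j are the eigenvalues of the top-left minor M' and X_j is the
# component of the last column of M in the j-th eigenvector of M'.
import numpy as np

rng = np.random.default_rng(2)
N = 6
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
M = (A + A.conj().T) / 2                    # a random Hermitian matrix

lam = np.linalg.eigvalsh(M)                 # eigenvalues lambda_i of M
lam_minor, U = np.linalg.eigh(M[:-1, :-1])  # eigenvalues/vectors of the minor M'
X = U.conj().T @ M[:-1, -1]                 # the coefficients X_j
a_NN = M[-1, -1].real

residual = [a_NN - l - np.sum(np.abs(X) ** 2 / (lam_minor - l)) for l in lam]
print(np.max(np.abs(residual)))             # ~1e-12: the identity holds for every i
```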

This at last brings us to the final result of the paper, which is the one actually needed for the application to GUE hives. Here, one is interested in controlling the variance of a linear combination {\sum_{l=1}^m a_l \tilde g_{i+l}} of a fixed number {m} of consecutive interlacing gaps {\tilde g_{i+l}}, where the {a_l} are arbitrary deterministic coefficients. An application of the triangle and Cauchy-Schwarz inequalities, combined with the previous moment bounds on gaps, shows that this random variable has variance {\ll m \sum_{l=1}^m |a_l|^2}. However, this bound is not expected to be sharp, due to the expected decay between correlations of eigenvalue gaps. In this paper, I improve the variance bound to

\displaystyle  \ll_A \frac{m}{\log^A(2+m)} \sum_{l=1}^m |a_l|^2

for any {A>0}, which is what was needed for the application.

This improvement reflects some decay in the covariances between distant interlacing gaps {\tilde g_i, \tilde g_{i+h}}. I was not able to establish such decay directly. Instead, using some Fourier analysis, one can reduce matters to studying the case of modulated linear statistics such as {\sum_{l=1}^m e(\xi l) \tilde g_{i+l}} for various frequencies {\xi}. In “high frequency” cases one can use the triangle inequality to reduce matters to studying the original eigenvalue gaps {g_i}, which can be handled by a (somewhat complicated) determinantal process calculation, after first using universality results to pass from fixed index to averaged index, thence to averaged energy, then to fixed energy estimates. For low frequencies the triangle inequality argument is unfavorable, and one has to instead use the determinantal kernel of the full minor process, and not just an individual matrix. This requires some classical, but tedious, calculation of certain asymptotics of sums involving Hermite polynomials.

The full argument is unfortunately quite complex, but it seems that the combination of having to deal with minors, as well as fixed indices, places this result out of reach of many prior methods.

December 17, 2024

Matt Strassler Science Book of The Year (!?!)

Well, gosh… what nice news as 2024 comes to a close… My book has received a ringing endorsement from Ethan Siegel, the science writer and Ph.D. astrophysicist who hosts the well-known, award-winning blog “Starts with a Bang”. Siegel’s one of the most reliable and prolific science writers around — he writes for BigThink and has published in Forbes, among others — and it’s a real honor to read what he’s written about Waves in an Impossible Sea.

His brief review serves as an introduction to an interview that he conducted with me recently, which I think many of you will enjoy. We discussed science — the nature of particles/wavicles, the Higgs force, the fabric (if there is one) of the universe, and the staying power of the idea of supersymmetry among many theoretical physicists — and science writing, including novel approaches to science communication that I used in the book.

If you’re a fan of this blog or of the book, please consider sharing his review on social media (as well as the Wall Street Journal’s opinion.) The book has sold well this year, but I am hoping that in 2025 it will reach an even broader range of people who seek a better understanding of the cosmos, both in the large and in the small.

Andrew Jaffe Discovering Japan

My old friend Marc Weidenbaum, curator and writer of disquiet.com, reminded me, in his latest post, of the value of blogging. So, here I am (again).

Since September, I have been on sabbatical in Japan, working mostly at QUP (International Center for Quantum-field Measurement Systems for Studies of the Universe and Particles) at the KEK accelerator lab in Tsukuba, Japan, and spending time as well at the Kavli IPMU, about halfway into Tokyo from here. Tsukuba is a “science city” about 30 miles northeast of Tokyo, home to multiple Japanese scientific establishments (such as a University and a major lab for JAXA, the Japanese space agency).

Scientifically, I’ve spent a lot of time thinking and talking about the topology of the Universe, future experiments to measure the cosmic microwave background, and statistical tools for cosmology experiments. And I was honoured to be asked to deliver a set of lectures on probability and statistics in cosmology, a topic which unites most of my research interests nowadays.

Japan, and Tsukuba in particular, is a very nice place to live. It’s close enough to Tokyo for regular visits (by the rapid Tsukuba Express rail line), but quiet enough for our local transport to be dominated by cycling around town. We love the food, the Japanese schools that have welcomed our children, the onsens, and our many views of Mount Fuji.

Fuji with buildings

Fuji through windows

And after almost four months in Japan, it’s beginning to feel like home.

Unfortunately, we’re leaving our short-term home in Japan this week. After a few weeks of travel in Southeast Asia, we’ll be decamped to the New York area for the rest of the Winter and early Spring. But (as further encouragement to myself to continue blogging) I’ll have much more to say about Japan — science and life — in upcoming posts.

December 15, 2024

Doug Natelson Items for discussion, including google's latest quantum computing result

As we head toward the end of the calendar year, a few items:

  • Google published a new result in Nature a few days ago.  This made a big news splash, including this accompanying press piece from google themselves, this nice article in Quanta, and the always thoughtful blog post by Scott Aaronson.  The short version:  Physical qubits as made today in the superconducting platform favored by google don't have the low error rates that you'd really like if you want to run general quantum algorithms on a quantum computer, which could certainly require millions of steps.  The hope of the community is to get around this using quantum error correction, where some number of physical qubits are used to function as one "logical" qubit.  If physical qubit error rates are sufficiently low, and these errors can be corrected with enough efficacy, the logical qubits can function better than the physical qubits, ideally being able to undergo sequential operations indefinitely without degradation of their information.   One technique for this is called a surface code.  Google has implemented this in their most recent 105-physical-qubit chip ("Willow"), and they seem to have crossed a huge threshold:  When they increase the size of their correction scheme (going from 3 (physical qubit) \(\times\) 3 (physical qubit) to 5 \(\times\) 5 to 7 \(\times\) 7), the error rates of the resulting logical qubits fall as hoped.  (A rough sketch of this error-suppression scaling appears just after this list.)  This is a big deal, as it implies that larger chips, if they could be implemented, should scale toward the desired performance.  This does not mean that general purpose quantum computers are just around the corner, but it's very encouraging.  There are many severe engineering challenges still in place.  For example, the present superconducting qubits must be tweaked and tuned.  The reason google only has 105 of them on the Willow chip is not that they can't fit more - it's that they have to have wires and control capacity to tune and run them.  A few thousand really good logical qubits would be needed to break RSA encryption, and there is no practical way to put millions of wires down a dilution refrigerator.  Rather, one will need cryogenic control electronics.
  • On a closely related point, google's article talks about how it would take a classical computer ten septillion years to do what its Willow chip can do.  This is based on a very particular choice of problem (as I mentioned here five years ago) called random circuit sampling, looking at the statistical properties of the outcome of applying random gate sequences to a quantum computer.  From what I can tell, this is very different from what most people mean when they think of a problem to benchmark a quantum computer's advantage over a classical computer.  I suspect the typical tech-literate person considering quantum computing wants to know, if I ask a quantum computer and a classical computer to factor huge numbers or do some optimization problem, how much faster is the quantum computer for a given size of problem?  Random circuit sampling feels instead much more to me like comparing an experiment to a classical theory calculation.  For a purely classical analog, consider putting an airfoil in a wind tunnel and measuring turbulent flow, and comparing with a computational fluid dynamics calculation.  Yes, the wind tunnel can get you an answer very quickly, but it's not "doing" a calculation, from my perspective.  This doesn't mean random circuit sampling is a poor benchmark, just that people should understand it's rather different from the kind of quantum/classical comparison they may envision.
  • On one unrelated note:  Thanks to a timely inquiry from a reader, I have now added a search bar to the top of the blog.  (Just in time to capture the final decline of science blogging?)
  • On a second unrelated note:  I'd be curious to hear from my academic readers on how they are approaching generative AI, both on the instructional side (e.g., should we abandon traditional assignments and take-home exams?  How do we check to see if students are really learning vs. becoming dependent on tools that have dubious reliability?) and on the research side (e.g., what level of generative AI tool use is acceptable in paper or proposal writing?  What aspects of these tools are proving genuinely useful to PIs?  To students?  Clearly generative AI's ability to help with coding is very nice indeed!)
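As promised in the first item above, here is a rough sketch of the error-suppression heuristic behind the 3 \(\times\) 3 to 5 \(\times\) 5 to 7 \(\times\) 7 result: below threshold, the logical error rate per round of a distance-\(d\) surface code is expected to fall roughly like \((p/p_{\mathrm{th}})^{(d+1)/2}\).  The threshold, physical error rate, and prefactor in the sketch are placeholder values, not Google's numbers.

```python
# Standard error-suppression heuristic for a distance-d surface code:
# logical error per round ~ A * (p / p_th)^((d+1)/2) once p is below threshold.
# p_th, p, and A below are placeholder values chosen only for illustration.
p_th, p, A = 1e-2, 3e-3, 0.1

def logical_error_per_round(d):
    return A * (p / p_th) ** ((d + 1) / 2)

for d in (3, 5, 7):
    print(f"distance {d}: logical error per round ~ {logical_error_per_round(d):.1e}")
```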

December 13, 2024

n-Category Café Martianus Capella

I’ve been blogging a bit about medieval math, physics and astronomy over on Azimuth. I’ve been writing about medieval attempts to improve Aristotle’s theory that velocity is proportional to force, understand objects moving at constant acceleration, and predict the conjunctions of Jupiter and Saturn. A lot of interesting stuff was happening back then!

As a digression from our usual fare on the n-Café, here’s one of my favorites, about an early theory of the Solar System, neither geocentric nor heliocentric, that became popular thanks to a quirk of history around the time of Charlemagne. The more I researched this, the more I wanted to know.

In 1543, Nicolaus Copernicus published a book arguing that the Earth revolves around the Sun: De revolutionibus orbium coelestium.

This is sometimes painted as a sudden triumph of rationality over the foolish yet long-standing belief that the Sun and all the planets revolve around the Earth. As usual, this triumphalist narrative is oversimplified. In the history of science, everything is always more complicated than you think.

First, Aristarchus had come up with a heliocentric theory way back around 250 BC. While Copernicus probably didn’t know all the details, he did know that Aristarchus said the Earth moves. Copernicus mentioned this in an early unpublished version of De revolutionibus.

Copernicus also had some precursors in the Middle Ages, though it’s not clear whether he was influenced by them.

In the 1300’s, the philosopher Jean Buridan argued that the Earth might not be at the center of the Universe, and that it might be rotating. He claimed — correctly in the first case, and only a bit incorrectly in the second — that there’s no real way to tell. But he pointed out that it makes more sense to have the Earth rotating than have the Sun, Moon, planets and stars all revolving around it, because

it is easier to move a small thing than a large one.

In 1377 Nicole Oresme continued this line of thought, making the same points in great detail, only to conclude by saying

Yet everyone holds, and I think myself, that the heavens do move and not the Earth, for “God created the orb of the Earth, which will not be moved” [Psalms 104:5], notwithstanding the arguments to the contrary.

Everyone seems to take this last-minute reversal of views at face value, but I have trouble believing he really meant it. Maybe he wanted to play it safe with the Church. I think I detect a wry sense of humor, too.

Martianus Capella

I recently discovered another fascinating precursor of Copernicus’ heliocentric theory: a theory that is neither completely geocentric nor completely heliocentric! And that’s what I want to talk about today.

Sometime between 410 and 420 AD, Martianus Capella came out with a book saying Mercury and Venus orbit the Sun, while the other planets orbit the Earth!

This picture is from a much later book by the German astronomer Valentin Naboth, in 1573. But it illustrates Capella’s theory — and as we’ll see, his theory was rather well-known in western Europe starting in the 800s.

First of all, take a minute to think about how reasonable this theory is. Mercury and Venus are the two planets closer to the Sun than we are. So, unlike the other planets, we can never possibly see them more than 90° away from the Sun. In fact Venus never gets more than 48° from the Sun, and Mercury stays even closer. So it looks like these planets are orbiting the Sun, not the Earth!
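As a quick modern sanity check (using orbital radii that Capella of course did not have): for a planet on a roughly circular orbit of radius a inside Earth’s orbit a_E, the greatest apparent angle between the planet and the Sun satisfies

\displaystyle  \sin \theta_{\max} = \frac{a}{a_E}, \qquad \arcsin(0.72) \approx 46^\circ \ \text{(Venus)}, \qquad \arcsin(0.39) \approx 23^\circ \ \text{(Mercury)},

in line with the figures quoted above (the observed maxima wander by a few degrees because the orbits aren't perfect circles).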

But who was this guy, and why did he matter?

Martianus Capella was a jurist and writer who lived in the city of Madauros, which is now in Algeria, but in his day was in Numidia, one of six African provinces of the Roman Empire. He’s famous for a book with the wacky title De nuptiis Philologiae et Mercurii, which means On the Marriage of Philology and Mercury. It was an allegorical story, in prose and verse, describing the courtship and wedding of Mercury (who stood for “intelligent or profitable pursuit”) and the maiden Philologia (who stood for “the love of letters and study”). Among the wedding gifts are seven maids who will be Philology’s servants. They are the seven liberal arts: grammar, dialectic, rhetoric, geometry, arithmetic, astronomy, and harmony.

In seven chapters, the seven maids explain these subjects. What matters for us is the chapter on astronomy, which explains the structure of the Solar System.

This book De nuptiis Philologiae et Mercurii became very important after the decline and fall of the Roman Empire, mainly as a guide to the liberal arts. In fact, if you went to a college that claimed to offer a liberal arts education, you were indirectly affected by this book!

Here is a painting by Botticelli from about 1485, called A Young Man Being Introduced to the Seven Liberal Arts:

The Carolingian Renaissance

But why did Martianus Capella’s book become so important?

I’m no expert on this, but it seems that as the Roman Empire declined there was a gradual dumbing down of scholarship, with original and profound works by folks like Aristotle, Euclid, and Ptolemy eventually being lost in western Europe — though preserved in more civilized parts of the world, like Baghdad and the Byzantine Empire. In the west, eventually all that was left were easy-to-read popularizations by people like Pliny the Elder, Boethius, Macrobius, Cassiodorus… and Martianus Capella!

By the end of the 800s, many copies of Capella’s book De nuptiis Philologiae et Mercurii were available. Let’s see how that happened!


Expansion of the Franks


To set the stage: Charlemagne became King of the Franks in 768 AD. Being a forward-looking fellow, he brought in Alcuin, headmaster of the cathedral school in York and “the most learned man anywhere to be found”, to help organize education in his kingdom.

Alcuin set up schools for boys and girls, systematized the curriculum, raised the standards of scholarship, and encouraged the study of liberal arts. Yes: the liberal arts as described by Martianus Capella! For Alcuin this was all in the service of Christianity. But scholars, being scholars, took advantage of this opportunity to start copying the ancient books that were available, writing commentaries on them, and the like.

In 800, Charlemagne became emperor of what’s now called the Carolingian Empire. When Charlemagne died in 814 a war broke out, but it ended in 847. Though divided into three parts, the empire flourished until about 877, when it began sinking due to internal struggles, attacks from Vikings in the north, etc.

The heyday of culture in the Carolingian Empire, roughly 768–877, is sometimes called the Carolingian Renaissance because of the flourishing of culture and learning brought about by Alcuin and his successors. To get a sense of this: between 550 and 750 AD, only 265 books have been preserved from Western Europe. From the Carolingian Renaissance we have over 7000.

However, there was still a huge deficit of the classical texts we now consider most important. As far as I can tell, the works of Aristotle, Eratosthenes, Euclid, Ptolemy and Archimedes were completely missing in the Carolingian Empire. I seem to recall that from Plato only the Timaeus was available at this time. But Martianus Capella’s De nuptiis Philologiae et Mercurii was very popular. Hundreds of copies were made, and many survive even to this day! Thus, his theory of the Solar System, where Mercury and Venus orbited the Sun but other planets orbited the Earth, must have had an out-sized impact on cosmology at this time.

Here is part of a page from one of the first known copies of De nuptiis Philologiae et Mercurii:

It’s called VLF 48, and it’s now at the university library in Leiden. Most scholars say it dates to 850 AD, though Mariken Teeuwen has a paper claiming it goes back to 830 AD.

You’ll notice that in addition to the main text, there’s a lot of commentary in smaller letters! This may have been added later. Nobody knows who wrote it, or even whether it was a single person. It’s called the Anonymous Commentary. This commentary was copied into many of the later versions of the book, so it’s important.

The Anonymous Commentary

So far my tale has been a happy one: even in the time of Charlemagne, the heliocentric revolt against the geocentric cosmology was brewing, with a fascinating ‘mixed’ cosmology being rather well-known.

Alas, now I need to throw a wet blanket on that, and show how poorly Martianus Capella’s cosmology was understood at this time!

The Anonymous Commentary actually describes three variants of Capella’s theory of the orbits of Mercury and Venus. One of them is good, one seems bad, and one seems very bad. Yet subsequent commentators in the Carolingian Empire didn’t seem to recognize this fact and discard the bad ones.

These three variants were drawn as diagrams in the margin of VLF 48, but Robert Eastwood has nicely put them side by side here:

The one at right, which the commentary attributes to the “Platonists”, shows the orbit of Mercury around the Sun surrounded by the larger orbit of Venus. This is good.

The one in the middle, which the commentary attributes to Martianus Capella himself, shows the orbits of Mercury and Venus crossing each other. This seems bad.

The one at left, which the commentary attributes to Pliny, shows orbits for Mercury and Venus that are cut off when they meet the orbit of the Sun, not complete circles. This seems very bad — so bad that I can’t help but hope there’s some reasonable interpretation that I’m missing. (Maybe just that these planets get hidden when they go behind the Sun?)

Robert Eastwood attributes the two bad models to a purely textual approach to astronomy, where commentators tried to interpret texts and compare them to other texts, without doing observations. I’m still puzzled.

Copernicus

Luckily, we’ve already seen that by 1573, Valentin Naboth had settled on the good version of Capella’s cosmology:

That’s 30 years after Copernicus came out with his book… but the clarification probably happened earlier. And Copernicus did mention Martianus Capella’s work. In fact, he used it to argue for a heliocentric theory! In Chapter 10 of De Revolutionibus he wrote:

In my judgement, therefore, we should not in the least disregard what was familiar to Martianus Capella, the author of an encyclopedia, and to certain other Latin writers. For according to them, Venus and Mercury revolve around the sun as their center. This is the reason, in their opinion, why these planets diverge no farther from the sun than is permitted by the curvature of their revolutions. For they do not encircle the earth, like the other planets, but “have opposite circles”. Then what else do these authors mean but that the center of their spheres is near the sun? Thus Mercury’s sphere will surely be enclosed within Venus’, which by common consent is more than twice as big, and inside that wide region it will occupy a space adequate for itself. If anyone seizes this opportunity to link Saturn, Jupiter, and Mars also to that center, provided he understands their spheres to be so large that together with Venus and Mercury the earth too is enclosed inside and encircled, he will not be mistaken, as is shown by the regular pattern of their motions.

For [these outer planets] are always closest to the earth, as is well known, about the time of their evening rising, that is, when they are in opposition to the sun, with the earth between them and the sun. On the other hand, they are at their farthest from the earth at the time of their evening setting, when they become invisible in the vicinity of the sun, namely, when we have the sun between them and the earth. These facts are enough to show that their center belongs more to the sun, and is identical with the center around which Venus and Mercury likewise execute their revolutions.

Conclusion

What’s the punchline? For me, it’s that there was not a purely binary choice between geocentric and heliocentric cosmologies. Instead, many options were in play around the time of Copernicus:

  • In classic geocentrism, the Earth was non-rotating and everything revolved around it.

  • Buridan and Oresme strongly considered the possibility that the Earth rotated… but not, apparently, that it revolved around the Sun.

  • Capella believed Mercury and Venus revolved around the Sun… but the Sun revolved around the Earth.

  • Copernicus believed the Earth rotates, and also revolves around the Sun.

  • And to add to the menu, Tycho Brahe, coming after Copernicus, argued that all the planets except Earth revolve around the Sun, but the Sun and Moon revolve around the Earth, which is fixed.

And Capella’s theory actually helped Copernicus!

This diversity of theories is fascinating… even though everyone holds, and I think myself, that the Earth revolves around the Sun.

Above is a picture of the “Hypothesis Tychonica”, from a book written in 1643.

References

We know very little about Aristarchus’ heliocentric theory. Much comes from Archimedes, who wrote in his Sand-Reckoner that

You King Gelon are aware the ‘universe’ is the name given by most astronomers to the sphere the centre of which is the centre of the earth, while its radius is equal to the straight line between the centre of the sun and the centre of the earth. This is the common account as you have heard from astronomers. But Aristarchus has brought out a book consisting of certain hypotheses, wherein it appears, as a consequence of the assumptions made, that the universe is many times greater than the ‘universe’ just mentioned. His hypotheses are that the fixed stars and the sun remain unmoved, that the earth revolves about the sun on the circumference of a circle, the sun lying in the middle of the orbit, and that the sphere of fixed stars, situated about the same centre as the sun, is so great that the circle in which he supposes the earth to revolve bears such a proportion to the distance of the fixed stars as the centre of the sphere bears to its surface.

The last sentence, which Archimedes went on to criticize, seems to be a way of saying that the fixed stars are at an infinite distance from us.

For Aristarchus’ influence on Copernicus, see:

An unpublished early version of Copernicus’ De revolutionibus, preserved at the Jagiellonian Library in Kraków, contains this passage:

And if we should admit that the motion of the Sun and Moon could be demonstrated even if the Earth is fixed, then with respect to the other wandering bodies there is less agreement. It is credible that for these and similar causes (and not because of the reasons that Aristotle mentions and rejects), Philolaus believed in the mobility of the Earth and some even say that Aristarchus of Samos was of that opinion. But since such things could not be comprehended except by a keen intellect and continuing diligence, Plato does not conceal the fact that there were very few philosophers in that time who mastered the study of celestial motions.

For Buridan on the location and possible motion of the Earth, see:

  • John Buridan, Questions on the Four Books on the Heavens and the World of Aristotle, Book II, Question 22, trans. Marshall Clagett, in The Science of Mechanics in the Middle Ages, University of Wisconsin Press, Madison, Wisconsin, 1961, pp. 594–599.

For Oresme on similar issues, see:

  • Nicole Oresme, On the Book on the Heavens and the World of Aristotle, Book II, Chapter 25, trans. Marshall Clagett, in The Science of Mechanics in the Middle Ages, University of Wisconsin Press, Madison, Wisconsin, 1961, pp. 600–609.

Both believed in a principle of relativity for rotational motion, so they thought there’d be no way to tell whether the Earth was rotating. This of course got revisited in Newton’s rotating bucket argument, and then Mach’s principle, frame-dragging in general relativity, and so on.

You can read Martianus Capella’s book in English translation here:

William Harris Stahl, Evan Laurie Burge and Richard Johnson, eds., Martianus Capella and the Seven Liberal Arts: The Marriage of Philology and Mercury. Vol. 2., Columbia University Press, 1971.

I got my figures on numbers of books available in the early Middle Ages from here:

This is the best source I’ve found on Martianus Capella’s impact on cosmology in the Carolingian Renaissance:

  • Bruce S. Eastwood, Ordering the Heavens: Roman Astronomy and Cosmology in the Carolingian Renaissance, Brill, 2007.

This is also good:

  • Mariken Teeuwen and Sínead O’Sullivan, eds., Carolingian Scholarship and Martianus Capella: Ninth-Century Commentary Traditions on De nuptiis in Context, The Medieval Review (2012).

In this book, the essay most relevant to Capella’s cosmology is again by Eastwood:

  • Bruce S. Eastwood, The power of diagrams: the place of the anonymous commentary in the development of Carolingian astronomy and cosmology.

However, this seems subsumed by the more detailed information in his book. There’s also an essay with a good discussion about Carolingian manuscripts of De nuptiis, especially the one called VLF 48 that I showed you, which may be the earliest:

  • Mariken Teeuwen, Writing between the lines: reflections of a scholarly debate in a Carolingian commentary tradition.

For the full text of Copernicus’ book, translated into English, go here.

December 11, 2024

Scott Aaronson The Google Willow thing

Yesterday I arrived in Santa Clara for the Q2B (Quantum 2 Business) conference, which starts this morning, and where I’ll be speaking Thursday on “Quantum Algorithms in 2024: How Should We Feel?” and also closing the conference via an Ask-Us-Anything session with John Preskill. (If you’re at Q2B, reader, come and say hi!)

And to coincide with Q2B, yesterday Google’s Quantum group officially announced “Willow,” its new 105-qubit superconducting chip with which it’s demonstrated an error-corrected surface code qubit as well as a new, bigger quantum supremacy experiment based on Random Circuit Sampling. I was lucky to be able to attend Google’s announcement ceremony yesterday afternoon at the Computer History Museum in Mountain View, where friend-of-the-blog-for-decades Dave Bacon and other Google quantum people explained exactly what was done and took questions (the technical level was surprisingly high for this sort of event). I was also lucky to get a personal briefing last week from Google’s Sergio Boixo on what happened.

Meanwhile, yesterday Sundar Pichai tweeted about Willow, and Elon Musk replied “Wow.” It cannot be denied that those are both things that happened.

Anyway, all yesterday, I then read comments on Twitter, Hacker News, etc. complaining that, since there wasn’t yet a post on Shtetl-Optimized, how could anyone possibly know what to think of this?? For 20 years I’ve been trying to teach the world how to fish in Hilbert space, but (sigh) I suppose I’ll just hand out some more fish. So, here are my comments:

  1. Yes, this is great. Yes, it’s a real milestone for the field. To be clear: for anyone who’s been following experimental quantum computing these past five years (say, since Google’s original quantum supremacy milestone in 2019), there’s no particular shock here. Since 2019, Google has roughly doubled the number of qubits on its chip and, more importantly, increased the qubits’ coherence time by a factor of 5. Meanwhile, their 2-qubit gate fidelity is now roughly 99.7% (for controlled-Z gates) or 99.85% (for “iswap” gates), compared to ~99.5% in 2019. They then did the more impressive demonstrations that predictably become possible with more and better qubits. And yet, even if the progress is broadly in line with what most of us expected, it’s still of course immensely gratifying to see everything actually work! Huge congratulations to everyone on the Google team for a well-deserved success.
  2. I already blogged about this!!! Specifically, I blogged about Google’s fault-tolerance milestone when its preprint appeared on the arXiv back in August. To clarify, what we’re all talking about now is the same basic technical advance that Google already reported in August, except now with the PR blitz from Sundar Pichai on down, a Nature paper, an official name for the chip (“Willow”), and a bunch of additional details about it.
  3. Scientifically, the headline result is that, as they increase the size of their surface code, from 3×3 to 5×5 to 7×7, Google finds that their encoded logical qubit stays alive for longer rather than shorter. So, this is a very important threshold that’s now been crossed. As Dave Bacon put it to me, “eddies are now forming”—or, to switch metaphors, after 30 years we’re now finally tickling the tail of the dragon of quantum fault-tolerance, the dragon that (once fully awoken) will let logical qubits be preserved and acted on for basically arbitrary amounts of time, allowing scalable quantum computation.
  4. Having said that, Sergio Boixo tells me that Google will only consider itself to have created a “true” fault-tolerant qubit, once it can do fault-tolerant two-qubit gates with an error of ~10^-6 (and thus, on the order of a million fault-tolerant operations before suffering a single error). We’re still some ways from that milestone: after all, in this experiment Google created only a single encoded qubit, and didn’t even try to do encoded operations on it, let alone on multiple encoded qubits. But all in good time. Please don’t ask me to predict how long, though empirically, the time from one major experimental QC milestone to the next now seems to be measured in years, which are longer than weeks but shorter than decades.
  5. Google has also announced a new quantum supremacy experiment on its 105-qubit chip, based on Random Circuit Sampling with 40 layers of gates. Notably, they say that, if you use the best currently-known simulation algorithms (based on Johnnie Gray’s optimized tensor network contraction), as well as an exascale supercomputer, their new experiment would take ~300 million years to simulate classically if memory is not an issue, or ~10^25 years if memory is an issue (note that a mere ~10^10 years have elapsed since the Big Bang). Probably some people have come here expecting me to debunk those numbers, but as far as I know they’re entirely correct, with the caveats stated. Naturally it’s possible that better classical simulation methods will be discovered, but meanwhile the experiments themselves will also rapidly improve.
  6. Having said that, the biggest caveat to the “10^25 years” result is one to which I fear Google drew insufficient attention. Namely, for the exact same reason why (as far as anyone knows) this quantum computation would take ~10^25 years for a classical computer to simulate, it would also take ~10^25 years for a classical computer to directly verify the quantum computer’s results!! (For example, by computing the “Linear Cross-Entropy” score of the outputs.) For this reason, all validation of Google’s new supremacy experiment is indirect, based on extrapolations from smaller circuits, ones for which a classical computer can feasibly check the results. To be clear, I personally see no reason to doubt those extrapolations. But for anyone who wonders why I’ve been obsessing for years about the need to design efficiently verifiable near-term quantum supremacy experiments: well, this is why! We’re now deeply into the unverifiable regime that I warned about. (A toy illustration of the Linear Cross-Entropy score appears just after this list.)
  7. In his remarks yesterday, Google Quantum AI leader Hartmut Neven talked about David Deutsch’s argument, way back in the 1990s, that quantum computers should force us to accept the reality of the Everettian multiverse, since “where else could the computation have happened, if it wasn’t being farmed out to parallel universes?” And naturally there was lots of debate about that on Hacker News and so forth. Let me confine myself here to saying that, in my view, the new experiment doesn’t add anything new to this old debate. It’s yet another confirmation of the predictions of quantum mechanics. What those predictions mean for our understanding of reality can continue to be argued as it’s been since the 1920s.
  8. Cade Metz did a piece about Google’s announcement for the New York Times. Alas, when Cade reached out to me for comment, I decided that it would be too awkward, after what Cade did to my friend Scott Alexander almost four years ago. I talked to several other journalists, such as Adrian Cho for Science.
  9. No doubt people will ask me what this means for superconducting qubits versus trapped-ion or neutral-atom or photonic qubits, or for Google versus its many competitors in experimental QC. And, I mean, it’s not bad for Google or for superconducting QC! These past couple years I’d sometimes commented that, since Google’s 2019 announcement of quantum supremacy via superconducting qubits, the trapped-ion and neutral-atom approaches had seemed to be pulling ahead, with spectacular results from Quantinuum (trapped-ion) and QuEra (neutral atoms) among others. One could think of Willow as Google’s reply, putting the ball in its competitors’ courts to likewise demonstrate better logical qubit lifetime with increasing code size (or, better yet, full operations on logical qubits exceeding that threshold, without resorting to postselection). The great advantage of trapped-ion qubits continues to be that you can move the qubits around (and also, their two-qubit gate fidelities seem somewhat ahead of superconducting). But to compensate, superconducting qubits have the advantage that the gates are a thousand times faster, which makes it feasible to do experiments that require collecting millions of samples.
  10. Of course the big question, the one on everyone’s lips, was always how quantum computing skeptic Gil Kalai was going to respond. But we need not wonder! On his blog, Gil writes: “We did not study yet these particular claims by Google Quantum AI but my general conclusion apply to them ‘Google Quantum AI’s claims (including published ones) should be approached with caution, particularly those of an extraordinary nature. These claims may stem from significant methodological errors and, as such, may reflect the researchers’ expectations more than objective scientific reality.’ ”  Most of Gil’s post is devoted to re-analyzing data from Google’s 2019 quantum supremacy experiment, which Gil continues to believe can’t possibly have done what was claimed. Gil’s problem is that the 2019 experiment was long ago superseded anyway: besides the new and more inarguable Google result, IBM, Quantinuum, QuEra, and USTC have now all also reported Random Circuit Sampling experiments with good results. I predict that Gil, and others who take it as axiomatic that scalable quantum computing is impossible, will continue to have their work cut out for them in this new world.
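As a toy illustration of point 6: the Linear Cross-Entropy (XEB) score is just 2^n times the average ideal probability of the observed bitstrings, minus 1.  The sketch below uses a stand-in “ideal” distribution of my own rather than a real circuit simulation; producing the true p(x) for the 105-qubit circuit is exactly the ~10^25-year part.

```python
# Toy linear cross-entropy benchmark: F_XEB = 2^n * mean(p_ideal over samples) - 1.
# The "ideal" distribution here is a made-up Porter-Thomas-like stand-in, not the
# output distribution of any actual circuit.
import numpy as np

rng = np.random.default_rng(0)
n = 10                                   # toy qubit count, easy to handle classically
dim = 2 ** n
amps = rng.normal(size=dim) + 1j * rng.normal(size=dim)
p_ideal = np.abs(amps) ** 2
p_ideal /= p_ideal.sum()

def xeb(samples):
    return dim * p_ideal[samples].mean() - 1

good = rng.choice(dim, size=5000, p=p_ideal)   # sampler that follows the ideal distribution
junk = rng.integers(0, dim, size=5000)         # uniformly random guesses
print("XEB of faithful sampler:", round(xeb(good), 2))   # ~1
print("XEB of random guesses:  ", round(xeb(junk), 2))   # ~0
```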

Update: Here’s Sabine Hossenfelder’s take. I don’t think she and I disagree about any of the actual facts; she just decided to frame things much more negatively. Ironically, I guess 20 years of covering hyped, dishonestly-presented non-milestones in quantum computing has inclined me to be pretty positive when a group puts in this much work, demonstrates a real milestone, and talks about it without obvious falsehoods!

December 09, 2024

Scott Aaronson Podcasts!

Update (Dec. 9): For those who still haven’t gotten enough, check out a 1-hour Zoom panel discussion about quantum algorithms, featuring yours truly along with my distinguished colleagues Eddie Farhi, Aram Harrow, and Andrew Childs, moderated by Barry Sanders, as part of the QTML’2024 conference held in Melbourne (although, it being Thanksgiving week, none of the four panelists were actually there in person). Part of the panel devolves into a long debate between me and Eddie about how interesting quantum algorithms are if they don’t achieve speedups over classical algorithms, and whether some quantum algorithms papers mislead people by not clearly addressing the speedup question (you get one guess as to which side I took). I resolved going in to keep my comments as civil and polite as possible—you can judge for yourself how well I succeeded! Thanks very much to Barry and the other QTML organizers for making this happen.


Do you like watching me spout about AI alignment, watermarking, my time at OpenAI, the P versus NP problem, quantum computing, consciousness, Penrose’s views on physics and uncomputability, university culture, wokeness, free speech, my academic trajectory, and much more, despite my slightly spastic demeanor and my many verbal infelicities? Then holy crap are you in luck today! Here’s 2.5 hours of me talking to former professional poker players (and now wonderful Austin-based friends) Liv Boeree and her husband Igor Kurganov about all of those topics. (Or 1.25 hours if you watch at 2x speed, as I strongly recommend.)

But that’s not all! Here I am talking to Harvard’s Hrvoje Kukina, in a much shorter (45-minute) podcast focused on quantum computing, cosmological bounds on information processing, and the idea of the universe as a computer:

Last but not least, here I am in an hour-long podcast (this one audio-only) with longtime friend Kelly Weinersmith and her co-host Daniel Whiteson, talking about quantum computing.

Enjoy!

David Hogg possible Trojan planet?

In group meeting last week, Stefan Rankovic (NYU undergrad) presented results on a very low-amplitude possible transit in the lightcurve of a candidate long-period eclipsing binary system found in the NASA Kepler data. The weird thing is that (even though the period is very long) the transit of the possible planet looks just like the transit of the secondary star in the eclipsing binary. Like just like it, only lower in amplitude (smaller in radius).

If the transit looks identical, only lower in amplitude, it suggests that it is taking an extremely similar chord across the primary star, at the same speed, with no difference in inclination. How could that be? Well if they are moving at the same speed on the same path, maybe we have a 1:1 resonance, like a Trojan? If so, there are so many cool things about this system. It was an exciting group meeting, to be sure.

December 07, 2024

Doug Natelson Seeing through your head - diffuse imaging

From the medical diagnostic perspective (and for many other applications), you can understand why it might be very convenient to be able to perform some kind of optical imaging of the interior of what you'd ordinarily consider opaque objects.  Even when a wavelength range is chosen so that absorption is minimized, photons can scatter many times as they make their way through dense tissue like a breast.  We now have serious computing power and extremely sensitive photodetectors, which has led to the development of techniques for imaging through media that absorb and diffuse photons.  Here is a review of this topic from 2005, and another more recent one (pdf link here).  There are many cool approaches that can be combined, including using pulsed lasers to do time-of-flight measurements (review here), and using "structured illumination" (review here).

Sure, point that laser at my head.  (Adapted from Figure 1 of this paper.)

I mention all of this to set the stage for this fun preprint, titled "Photon transport through the entire adult human head".  Sure, you think your head is opaque, but it only attenuates photon fluxes by a factor of around \(10^{18}\).  With 1 Watt of incident power at 800 nm wavelength spread out over a 25 mm diameter circle and pulsed 80 million times a second, time-resolved single-photon detectors like photomultiplier tubes can readily detect the many-times-scattered photons that straggle their way out of your head around 2 nanoseconds later.  (The distribution of arrival times contains a bunch of information.  Note that the speed of light in free space is around 30 cm/ns; even accounting for the index of refraction of tissue, those photons have bounced around a lot before getting through.)  The point of this is that those photons have passed through parts of the brain that are usually considered inaccessible.  This shows that one could credibly use spectroscopic methods to get information out of there, like blood oxygen levels.
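Some back-of-the-envelope arithmetic behind those numbers (my own; the tissue refractive index is an assumed round number, while the power, wavelength, and attenuation are as quoted above):

```python
# Photon budget and path length for light sneaking through a human head.
h, c = 6.626e-34, 3.0e8            # Planck's constant (J s), speed of light (m/s)
wavelength = 800e-9                # m
power = 1.0                        # W, incident
attenuation = 1e18                 # quoted attenuation through the head
n_tissue = 1.4                     # rough index of refraction of tissue (assumed)

photons_per_s = power / (h * c / wavelength)
print(f"incident photons per second: ~{photons_per_s:.1e}")                 # ~4e18
print(f"detected photons per second: ~{photons_per_s / attenuation:.0f}")   # a few

arrival = 2e-9                     # s, typical straggler arrival time
path_cm = (c / n_tissue) * arrival * 100
print(f"path length of a ~2 ns photon: ~{path_cm:.0f} cm")                  # ~43 cm, >> head size
```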

December 06, 2024

Terence Tao AI for Math fund

Renaissance Philanthropy and XTX Markets have announced the launch of the AI for Math Fund, a new grant program supporting projects that apply AI and machine learning to mathematics, with a focus on automated theorem proving, with an initial $9.2 million in funding. The project funding categories, and examples of projects in such categories, are:

1. Production Grade Software Tools

  • AI-based autoformalization tools for translating natural-language mathematics into the formalisms of proof assistants (see the toy illustration just after this list)
  • AI-based auto-informalization tools for translating proof-assistant proofs into interpretable natural-language mathematics
  • AI-based models for suggesting tactics/steps or relevant concepts to the user of a proof assistant, or for generating entire proofs
  • Infrastructure to connect proof assistants with computer algebra systems, calculus, and PDEs
  • A large-scale, AI-enhanced distributed collaboration platform for mathematicians
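As a toy illustration of the first bullet above, an autoformalization tool has to turn a natural-language statement into something a proof assistant can check.  The Lean rendering below is my own made-up example, not drawn from any particular dataset:

```lean
-- Hypothetical autoformalization pair (a made-up example):
-- informal statement: "there are arbitrarily large natural numbers"
-- one possible Lean 4 formalization, with a proof for good measure:
theorem arbitrarily_large : ∀ n : Nat, ∃ m : Nat, n < m :=
  fun n => ⟨n + 1, Nat.lt_succ_self n⟩
```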

2. Datasets

  • Datasets of formalized theorems and proofs in a proof assistant
  • Datasets that would advance AI for theorem proving as applied to program verification and secure code generation
  • Datasets of (natural-language) mathematical problems, theorems, proofs, exposition, etc.
  • Benchmarks and training environments associated with datasets and model tasks (autoformalization, premise selection, tactic or proof generation, etc.)

3. Field Building

  • Textbooks
  • Courses
  • Documentation and support for proof assistants, and interfaces/APIs to integrate with AI tools

4. Breakthrough Ideas

  • Expected difficulty estimation (of sub-problems of a proof)
  • Novel mathematical implications of proofs formalized type-theoretically
  • Formalization of proof complexity in proof assistants

The deadline for initial expressions of interest is Jan 10, 2025.

[Disclosure: I have agreed to serve on the advisory board for this fund.]

Update: See also this discussion thread on possible projects that might be supported by this fund.

December 04, 2024

n-Category Café ACT 2025

The Eighth International Conference on Applied Category Theory (https://easychair.org/cfp/ACT2025) will take place at the University of Florida on June 2-6, 2025. The conference will be preceded by the Adjoint School on May 26-30, 2025.

This conference follows previous events at Oxford (2024, 2019), University of Maryland (2023), Strathclyde (2022), Cambridge (2021), MIT (2020), and Leiden (2018).

Applied category theory is important to a growing community of researchers who study computer science, logic, engineering, physics, biology, chemistry, social science, systems, linguistics and other subjects using category-theoretic tools. The background and experience of our members is as varied as the systems being studied. The goal of the Applied Category Theory conference series is to bring researchers together, strengthen the applied category theory community, disseminate the latest results, and facilitate further development of the field.

If you want to give a talk, read on!

Submission

Important dates

All deadlines are AoE (Anywhere on Earth).

  • February 26: title and brief abstract submission
  • March 3: paper submission
  • April 7: notification of authors
  • May 19: Pre-proceedings ready versions
  • June 2-6: conference

Submissions

The submission URL is: https://easychair.org/conferences/?conf=act2025

We accept submissions in English of original research papers, talks about work accepted/submitted/published elsewhere, and demonstrations of relevant software. Accepted original research papers will be published in a proceedings volume. The conference will include an industry showcase event and community meeting. We particularly encourage people from underrepresented groups to submit their work and the organizers are committed to non-discrimination, equity, and inclusion.

  • Conference Papers should present original, high-quality work in the style of a computer science conference paper (up to 12 pages, not counting the bibliography; more detailed parts of proofs may be included in an appendix for the convenience of the reviewers). Such submissions should not be an abridged version of an existing journal article although pre-submission arXiv preprints are permitted. These submissions will be adjudicated for both a talk and publication in the conference proceedings.

  • Talk proposals not to be published in the proceedings, e.g. about work accepted/submitted/published elsewhere, should be submitted as abstracts, one or two pages long. Authors are encouraged to include links to any full versions of their papers, preprints or manuscripts. The purpose of the abstract is to provide a basis for determining the topics and quality of the anticipated presentation.

  • Software demonstration proposals should also be submitted as abstracts, one or two pages. The purpose of the abstract is to provide the program committee with enough information to assess the content of the demonstration.

The selected conference papers will be published in a volume of Proceedings. Authors are advised to use EPTCS style; files are available at https://style.eptcs.org/

Reviewing will be single-blind, and we will not be making public the reviews, reviewer names, discussions, or the list of under-review submissions. This is the same as in previous instances of ACT.

In order to give our reviewers enough time to bid on submissions, we ask for a title and brief abstract of your submission by February 26. The full two-page PDF extended abstract submissions and the up-to-12-page proceedings submissions are both due by the submission deadline of March 3, 11:59pm AoE (Anywhere on Earth).

Please contact the Programme Committee Chairs for more information: Amar Hadzihasanovic (amar.hadzihasanovic@taltech.ee) and JS Lemay (js.lemay@mq.edu.au).

Programme Committee

See conference website for full list:

https://gataslab.org/act2025/act2025cfp

December 02, 2024

Terence Tao On several irrationality problems for Ahmes series

Vjeko Kovac and I have just uploaded to the arXiv our paper “On several irrationality problems for Ahmes series“. This paper resolves (or at least makes partial progress on) some open questions of Erdős and others on the irrationality of Ahmes series, which are infinite series of the form {\sum_{k=1}^\infty \frac{1}{a_k}} for some increasing sequence {a_k} of natural numbers. Of course, since most real numbers are irrational, one expects such series to “generically” be irrational, and we make this intuition precise (in both a probabilistic sense and a Baire category sense) in our paper. However, it is often difficult to establish the irrationality of any specific series. For example, it is already a non-trivial result of Erdős that the series {\sum_{k=1}^\infty \frac{1}{2^k-1}} is irrational, while the irrationality of {\sum_{p \hbox{ prime}} \frac{1}{2^p-1}} (equivalent to Erdős problem #69) remains open, although very recently Pratt established this conditionally on the Hardy–Littlewood prime tuples conjecture. Finally, the irrationality of {\sum_n \frac{1}{n!-1}} (Erdős problem #68) is completely open.

On the other hand, it has long been known that if the sequence {a_k} grows faster than {C^{2^k}} for any {C}, then the Ahmes series is necessarily irrational, basically because the fractional parts of {a_1 \dots a_m \sum_{k=1}^\infty \frac{1}{a_k}} can be arbitrarily small positive quantities, which is inconsistent with {\sum_{k=1}^\infty \frac{1}{a_k}} being rational. This growth rate is sharp, as can be seen by iterating the identity {\frac{1}{n} = \frac{1}{n+1} + \frac{1}{n(n+1)}} to obtain a rational Ahmes series of growth rate {(C+o(1))^{2^k}} for any fixed {C>1}.
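
Iterating the identity starting from the representation {1 = \frac{1}{1}} (a quick numerical sketch, not code from the paper) produces the denominators {2, 3, 7, 43, 1807, \dots} of Sylvester's sequence: each step roughly squares the previous denominator, and the resulting Ahmes series sums exactly to the rational number {1}.

  from fractions import Fraction

  # Iterate 1/d = 1/(d+1) + 1/(d(d+1)) starting from d = 1.  Each step peels
  # off the unit fraction 1/(d+1) and leaves remainder 1/(d(d+1)), so the
  # denominators a_k = d_k + 1 grow like C^(2^k) while the series sums to 1.
  d = 1
  partial = Fraction(0)
  for k in range(1, 8):
      a = d + 1                      # next Ahmes denominator a_k
      partial += Fraction(1, a)
      d = d * a                      # remainder of the series is exactly 1/d
      assert Fraction(1) - partial == Fraction(1, d)
      print(k, a, round(a ** (1 / 2 ** k), 4))   # a_k and a_k^(1/2^k)

  # Output: a_k = 2, 3, 7, 43, 1807, 3263443, ... with a_k^(1/2^k) settling
  # near 1.264, i.e. growth of the form (C + o(1))^(2^k).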

In our paper we show that if {a_k} grows somewhat slower than the above sequences in the sense that {a_{k+1} = o(a_k^2)}, for instance if {a_k \asymp 2^{(2-\varepsilon)^k}} for a fixed {0 < \varepsilon < 1}, then one can find a comparable sequence {b_k \asymp a_k} for which {\sum_{k=1}^\infty \frac{1}{b_k}} is rational. This partially addresses Erdős problem #263, which asked if the sequence {a_k = 2^{2^k}} had this property, and whether any sequence of exponential or slower growth (but with {\sum_{k=1}^\infty 1/a_k} convergent) had this property. Unfortunately we barely miss a full solution of both parts of the problem, since the condition {a_{k+1} = o(a_k^2)} we need just fails to cover the case {a_k = 2^{2^k}}, and also does not quite hold for all sequences going to infinity at an exponential or slower rate.

We also show the following variant: if {a_k} has exponential growth in the sense that {a_{k+1} = O(a_k)}, with {\sum_{k=1}^\infty \frac{1}{a_k}} convergent, then there exist nearby natural numbers {b_k = a_k + O(1)} such that {\sum_{k=1}^\infty \frac{1}{b_k}} is rational. This answers the first part of Erdős problem #264, which asked about the case {a_k = 2^k}, although the second part (which asks about {a_k = k!}) is slightly out of reach of our methods. Indeed, we show that the exponential growth hypothesis is best possible in the sense that a random sequence {a_k} that grows faster than exponentially will not have this property; however, this result does not address any specific superexponential sequence such as {a_k = k!}, although it does apply to some sequence {a_k} of the shape {a_k = k! + O(\log\log k)}.

Our methods can also handle higher dimensional variants in which multiple series are simultaneously set to be rational. Perhaps the most striking result is this: we can find an increasing sequence {a_k} of natural numbers with the property that {\sum_{k=1}^\infty \frac{1}{a_k + t}} is rational for every rational {t} (excluding the cases {t = - a_k} to avoid division by zero)! This answers (in the negative) a question of Stolarsky (Erdős problem #266), and also reproves Erdős problem #265 (and in the latter case one can even make {a_k} grow double exponentially fast).

Our methods are elementary and avoid any number-theoretic considerations, relying primarily on the countable dense nature of the rationals and an iterative approximation technique. The first observation is that the task of representing a given number {q} as an Ahmes series {\sum_{k=1}^\infty \frac{1}{a_k}} with each {a_k} lying in some interval {I_k} (with the {I_k} disjoint, and going to infinity fast enough to ensure convergence of the series), is possible if and only if the infinite sumset

\displaystyle  \frac{1}{I_1} + \frac{1}{I_2} + \dots

contains {q}, where {\frac{1}{I_k} = \{ \frac{1}{a}: a \in I_k \}}. More generally, to represent a tuple of numbers {(q_t)_{t \in T}} indexed by some set {T} of numbers simultaneously as {\sum_{k=1}^\infty \frac{1}{a_k+t}} with {a_k \in I_k}, this is the same as asking for the infinite sumset

\displaystyle  E_1 + E_2 + \dots

to contain {(q_t)_{t \in T}}, where now

\displaystyle  E_k = \{ (\frac{1}{a+t})_{t \in T}: a \in I_k \}. \ \ \ \ \ (1)

So the main problem is to get control on such infinite sumsets. Here we use a very simple observation:

Proposition 1 (Iterative approximation) Let {V} be a Banach space, let {E_1,E_2,\dots} be sets with each {E_k} contained in the ball of radius {\varepsilon_k>0} around the origin for some {\varepsilon_k} with {\sum_{k=1}^\infty \varepsilon_k} convergent, so that the infinite sumset {E_1 + E_2 + \dots} is well-defined. Suppose that one has some convergent series {\sum_{k=1}^\infty v_k} in {V}, and sets {B_1,B_2,\dots} converging in norm to zero, such that

\displaystyle  v_k + B_k \subset E_k + B_{k+1} \ \ \ \ \ (2)

for all {k \geq 1}. Then the infinite sumset {E_1 + E_2 + \dots} contains {\sum_{k=1}^\infty v_k + B_1}.

Informally, the condition (2) asserts that {E_k} occupies all of {v_k + B_k} “at the scale {B_{k+1}}“.

Proof: Let {w_1 \in B_1}. Our task is to express {\sum_{k=1}^\infty v_k + w_1} as a series {\sum_{k=1}^\infty e_k} with {e_k \in E_k}. From (2) we may write

\displaystyle  \sum_{k=1}^\infty v_k + w_1 = \sum_{k=2}^\infty v_k + e_1 + w_2

for some {e_1 \in E_1} and {w_2 \in B_2}. Iterating this, we may find {e_k \in E_k} and {w_k \in B_k} such that

\displaystyle  \sum_{k=1}^\infty v_k + w_1 = \sum_{k=m+1}^\infty v_k + e_1 + e_2 + \dots + e_m + w_{m+1}

for all {m}. Sending {m \rightarrow \infty}, we obtain

\displaystyle  \sum_{k=1}^\infty v_k + w_1 = e_1 + e_2 + \dots

as required. \Box

In one dimension, sets of the form {\frac{1}{I_k}} are dense enough that the condition (2) can be satisfied in a large number of situations, leading to most of our one-dimensional results. In higher dimension, the sets {E_k} lie on curves in a high-dimensional space, and so do not directly obey usable inclusions of the form (2); however, for suitable choices of intervals {I_k}, one can take some finite sums {E_{k+1} + \dots + E_{k+d}} which will become dense enough to obtain usable inclusions of the form (2) once {d} reaches the dimension of the ambient space, basically thanks to the inverse function theorem (and the non-vanishing curvatures of the curve in question). For the Stolarsky problem, which is an infinite-dimensional problem, it turns out that one can modify this approach by letting {d} grow slowly to infinity with {k}.

December 01, 2024

Tommaso Dorigo Tracking Particles With Neuromorphic Computing

At the IV Workshop in Valencia a student from my group, Emanuele Coradin, presented the results of a novel algorithm for the identification of charged particles in a silicon tracker. The novelty is due to the use of neuromorphic computing, which works by encoding detector hits in the time of arrival of current impulses at neurons, and by letting neurons "learn" the true patterns of hits produced by charged particles from the noise due to random hits.
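
For readers unfamiliar with the paradigm, here is a minimal, generic sketch (in Python) of the kind of time-coded computation involved: a single leaky integrate-and-fire neuron that fires only when several input spikes arrive close together in time. This is a textbook toy to illustrate time-of-arrival encoding, not the algorithm presented in the talk, and all parameter values are made up for the example.

  from math import exp

  def first_spike_time(hit_times_ns, tau=5.0, weight=1.0,
                       threshold=2.5, t_max=50.0, dt=0.01):
      """Leaky integrate-and-fire toy: each detector hit injects a unit of
      'current'; the membrane potential leaks with time constant tau (ns).
      Returns the time of the first output spike, or None if none occurs."""
      hits = sorted(hit_times_ns)
      v, i, t = 0.0, 0, 0.0
      while t < t_max:
          v *= exp(-dt / tau)                # membrane leak
          while i < len(hits) and hits[i] <= t:
              v += weight                    # incoming hit
              i += 1
          if v >= threshold:
              return t                       # neuron fires
          t += dt
      return None

  # Three nearly simultaneous hits (track-like) make the neuron fire,
  # while the same hits spread out in time (noise-like) do not.
  print(first_spike_time([10.0, 10.5, 11.0]))   # fires shortly after 11 ns
  print(first_spike_time([5.0, 20.0, 40.0]))    # stays below threshold -> None

In this picture the detector hits themselves supply the input spike times, which is the encoding described in the paragraph above.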


Clifford Johnson Magic Ingredients Exist!

I’m a baker, as you probably know. I’ve regularly made bread, cakes, pies, and all sorts of things for friends and family. About a year ago, someone in the family was diagnosed with a severe allergy to gluten, and within days we removed all gluten products from the kitchen, began …

November 30, 2024

Clifford Johnson Hope’s Benefits

The good news (following from the last post) is that it worked out! I was almost short of the amount I needed to cover the pie, and so that left nothing for my usual decoration... but it was a hit at dinner and for leftovers today, so that's good!

--cvj

November 28, 2024

Clifford Johnson Hope

The delicious chaos that (almost always) eventually tames into a tasty flaky pastry crust… it’s always a worrying mess to start out, but you trust to your experience, and you carry on, with hope. #thanksgiving

November 22, 2024

n-Category Café Axiomatic Set Theory 9: The Axiom of Choice

Previously: Part 8. Next: Part 10.

It’s the penultimate week of the course, and up until now we’ve abstained from using the axiom of choice. But this week we gorged on it.

We proved that all the usual things are equivalent to the axiom of choice: Zorn’s lemma, the well ordering principle, cardinal comparability (given two sets, one must inject into the other), and the souped-up version of cardinal comparability that compares not just two sets but an arbitrary collection of them: for any nonempty family of sets (X_i)_{i \in I}, there is some X_i that injects into all the others.

The section I most enjoyed writing and teaching was the last one, on unnecessary uses of the axiom of choice. I’m grateful to Todd Trimble for explaining to me years ago how to systematically remove dependence on choice from arguments in basic general topology. (For some reason, it’s very tempting in that subject to use choice unnecessarily.) I talk about this at the very end of the chapter.

Section of a surjection

November 12, 2024

Terence Tao Higher uniformity of arithmetic functions in short intervals II. Almost all intervals

Kaisa Matomäki, Maksym Radziwill, Fernando Xuancheng Shao, Joni Teräväinen, and myself have (finally) uploaded to the arXiv our paper “Higher uniformity of arithmetic functions in short intervals II. Almost all intervals“. This is a sequel to our previous paper from 2022. In that paper, discorrelation estimates such as

\displaystyle  \sum_{x \leq n \leq x+H} (\Lambda(n) - \Lambda^\sharp(n)) \bar{F}(g(n)\Gamma) = o(H)

were established, where {\Lambda} is the von Mangoldt function, {\Lambda^\sharp} was some suitable approximant to that function, {F(g(n)\Gamma)} was a nilsequence, and {[x,x+H]} was a reasonably short interval in the sense that {H \sim x^{\theta+\varepsilon}} for some {0 < \theta < 1} and some small {\varepsilon>0}.  In that paper, we were able to obtain non-trivial estimates for {\theta} as small as {5/8}, and for some other functions such as divisor functions {d_k} for small values of {k}, we could lower {\theta} somewhat, to values such as {3/5}, {5/9}, and {1/3}.  This had a number of analytic number theory consequences, for instance in obtaining asymptotics for additive patterns in primes in such intervals.  However, there were multiple obstructions to lowering {\theta} much further.  Even for the model problem when {F(g(n)\Gamma) = 1}, that is to say the study of primes in short intervals, until recently the best value of {\theta} available was {7/12}, although this was very recently improved to {17/30} by Guth and Maynard.

However, the situation is better when one is willing to consider estimates that are valid for almost all intervals, rather than all intervals, so that one now studies local higher order uniformity estimates of the form

\displaystyle  \int_X^{2X} \sup_{F,g} | \sum_{x \leq n \leq x+H} (\Lambda(n) - \Lambda^\sharp(n)) \bar{F}(g(n)\Gamma)|\ dx = o(XH)

where {H = X^{\theta+\varepsilon}} and the supremum is over all nilsequences of a certain Lipschitz constant on a fixed nilmanifold {G/\Gamma}. This generalizes local Fourier uniformity estimates of the form

\displaystyle  \int_X^{2X} \sup_{\alpha} | \sum_{x \leq n \leq x+H} (\Lambda(n) - \Lambda^\sharp(n)) e(-\alpha n)|\ dx = o(XH).

There is particular interest in such estimates in the case of the Möbius function {\mu(n)} (where, as per the Möbius pseudorandomness conjecture, the approximant {\mu^\sharp} should be taken to be zero, at least in the absence of a Siegel zero). This is because if one could get estimates of this form for any {H} that grows sufficiently slowly in {X} (in particular {H = \log^{o(1)} X}), this would imply the (logarithmically averaged) Chowla conjecture, as I showed in a previous paper.

While one can lower {\theta} somewhat, there are still barriers. For instance, in the model case {F \equiv 1}, that is to say prime number theorems in almost all short intervals, until very recently the best value of {\theta} was {1/6}, recently lowered to {2/15} by Guth and Maynard (and can be lowered all the way to zero on the Density Hypothesis). Nevertheless, we are able to get some improvements at higher orders:

  • For the von Mangoldt function, we can get {\theta} as low as {1/3}, with an arbitrary logarithmic saving {\log^{-A} X} in the error terms; for divisor functions, one can even get power savings in this regime.
  • For the Möbius function, we can get {\theta=0}, recovering our previous result with Tamar Ziegler, but now with {\log^{-A} X} type savings in the exceptional set (though not in the pointwise bound outside of the set).
  • We can now also get comparable results for the divisor function.

As sample applications, we can obtain Hardy-Littlewood conjecture asymptotics for arithmetic progressions of almost all given steps {h \sim X^{1/3+\varepsilon}}, and divisor correlation estimates on arithmetic progressions for almost all {h \sim X^\varepsilon}.

Our proofs are rather long, but broadly follow the “contagion” strategy of Walsh, generalized from the Fourier setting to the higher order setting. Firstly, by standard Heath–Brown type decompositions, and previous results, it suffices to control “Type II” discorrelations such as

\displaystyle  \sup_{F,g} | \sum_{x \leq n \leq x+H} \alpha*\beta(n) \bar{F}(g(n)\Gamma)|

for almost all {x}, and some suitable functions {\alpha,\beta} supported on medium scales. So the bad case is when for most {x}, one has a discorrelation

\displaystyle  |\sum_{x \leq n \leq x+H} \alpha*\beta(n) \bar{F_x}(g_x(n)\Gamma)| \gg H

for some nilsequence {F_x(g_x(n) \Gamma)} that depends on {x}.

The main issue is the dependency of the polynomial {g_x} on {x}. By using a “nilsequence large sieve” introduced in our previous paper, and removing degenerate cases, we can show a functional relationship amongst the {g_x} that is very roughly of the form

\displaystyle  g_x(an) \approx g_{x'}(a'n)

whenever {n \sim x/a \sim x'/a'} (and I am being extremely vague as to what the relation “{\approx}” means here). By a higher order (and quantitatively stronger) version of Walsh’s contagion analysis (which is ultimately to do with separation properties of Farey sequences), we can show that this implies that these polynomials {g_x(n)} (which exert influence over intervals {[x,x+H]}) can “infect” longer intervals {[x', x'+Ha]} with some new polynomials {\tilde g_{x'}(n)} for various {x' \sim Xa}, which are related to many of the previous polynomials by a relationship that looks very roughly like

\displaystyle  g_x(n) \approx \tilde g_{ax}(an).

This can be viewed as a rather complicated generalization of the following vaguely “cohomological”-looking observation: if one has some real numbers {\alpha_i} and some primes {p_i} with {p_j \alpha_i \approx p_i \alpha_j} for all {i,j}, then one should have {\alpha_i \approx p_i \alpha} for some {\alpha}, where I am being vague here about what {\approx} means (and why it might be useful to have primes). By iterating this sort of contagion relationship, one can eventually get the {g_x(n)} to behave like an Archimedean character {n^{iT}} for some {T} that is not too large (polynomial size in {X}), and then one can use relatively standard (but technically a bit lengthy) “major arc” techniques based on various integral estimates for zeta and {L} functions to conclude.
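
To unpack the “cohomological” observation in the exact-equality toy case (this rewriting is mine, included only to make the heuristic concrete): if {p_j \alpha_i = p_i \alpha_j} holds for all {i,j}, then taking {j=1} and setting {\alpha := \alpha_1/p_1} gives

\displaystyle  \alpha_i = \frac{p_i \alpha_1}{p_1} = p_i \alpha

for every {i}, so a single frequency {\alpha} accounts for all of the {\alpha_i} at once. The difficulty in the paper is to run a robust version of this argument in which the relations are only approximate and the role of the {\alpha_i} is played by the polynomials {g_x}, which is where the separation properties of Farey sequences and the quantitative strength of the contagion estimates enter.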

October 28, 2024

John Preskill Announcing the quantum-steampunk creative-writing course!

Why not run a quantum-steampunk creative-writing course?

Quantum steampunk, as Quantum Frontiers regulars know, is the aesthetic and spirit of a growing scientific field. Steampunk is a subgenre of science fiction. In it, futuristic technologies invade Victorian-era settings: submarines, time machines, and clockwork octopodes populate La Belle Époque, a recently liberated Haiti, and Sherlock Holmes’s London. A similar invasion characterizes my research field, quantum thermodynamics: thermodynamics is the study of heat, work, temperature, and efficiency. The Industrial Revolution spurred the theory’s development during the 1800s. The theory’s original subjects—nineteenth-century engines—were large, were massive, and contained enormous numbers of particles. Such engines obey the classical mechanics developed during the 1600s. Hence thermodynamics needs re-envisioning for quantum systems. To extend the theory’s laws and applications, quantum thermodynamicists use mathematical and experimental tools from quantum information science. Quantum information science is, in part, the understanding of quantum systems through how they store and process information. The toolkit is partially cutting-edge and partially futuristic, as full-scale quantum computers remain under construction. So applying quantum information to thermodynamics—quantum thermodynamics—strikes me as the real-world incarnation of steampunk.

But the thought of a quantum-steampunk creative-writing course had never occurred to me, and I hesitated over it. Quantum-steampunk blog posts, I could handle. A book, I could handle. Even a short-story contest, I’d handled. But a course? The idea yawned like the pitch-dark mouth of an unknown cavern in my imagination.

But the more I mulled over Edward Daschle’s suggestion, the more I warmed to it. Edward was completing a master’s degree in creative writing at the University of Maryland (UMD), specializing in science fiction. His mentor Emily Brandchaft Mitchell had sung his praises via email. In 2023, Emily had served as a judge for the Quantum-Steampunk Short-Story Contest. She works as a professor of English at UMD, writes fiction, and specializes in the study of genre. I reached out to her last spring about collaborating on a grant for quantum-inspired art, and she pointed to her protégé.

Who won me over. Edward and I are co-teaching “Writing Quantum Steampunk: Science-Fiction Workshop” during spring 2025.

The course will alternate between science and science fiction. Under Edward’s direction, we’ll read and discuss published fiction. We’ll also learn about what genres are and how they come to be. Students will try out writing styles by composing short stories themselves. Everyone will provide feedback about each other’s writing: what works, what’s confusing, and opportunities for improvement. 

The published fiction chosen will mirror the scientific subjects we’ll cover: quantum physics; quantum technologies; and thermodynamics, including quantum thermodynamics. I’ll lead this part of the course. The scientific studies will interleave with the story reading, writing, and workshopping. Students will learn about the science behind the science fiction while contributing to the growing subgenre of quantum steampunk.

We aim to attract students from across campus: physics, English, the Jiménez-Porter Writers’ House, computer science, mathematics, and engineering—plus any other departments whose students have curiosity and creativity to spare. The course already has four cross-listings—Arts and Humanities 270, Physics 299Q, Computer Science 298Q, and Mechanical Engineering 299Q—and will probably acquire a fifth (Chemistry 298Q). You can earn a Distributive Studies: Scholarship in Practice (DSSP) General Education requirement, and undergraduate and graduate students are welcome. QuICS—the Joint Center for Quantum Information and Computer Science, my home base—is paying Edward’s salary through a seed grant. Ross Angelella, the director of the Writers’ House, arranged logistics and doused us with enthusiasm. I’m proud of how organizations across the university are uniting to support the course.

The diversity we seek, though, poses a challenge. The course lacks prerequisites, so I’ll need to teach at a level comprehensible to the non-science students. I’d enjoy doing so, but I’m concerned about boring the science students. Ideally, the science students will help me teach, while the non-science students will challenge us with foundational questions that force us to rethink basic concepts. Also, I hope that non-science students will galvanize discussions about ethical and sociological implications of quantum technologies. But how can one ensure that conversation will flow?

This summer, Edward and I traded candidate stories for the syllabus. Based on his suggestions, I recommend touring science fiction under an expert’s guidance. I enjoyed, for a few hours each weekend, sinking into the worlds of Ted Chiang, Ursula K. Le Guin, N. K. Jemisin, Ken Liu, and others. My scientific background informed my reading more than I’d expected. Some authors, I could tell, had researched their subjects thoroughly. When they transitioned from science into fiction, I trusted and followed them. Other authors tossed jargon into their writing but evidenced a lack of deep understanding. One author nailed technical details about quantum computation, initially impressing me, but missed the big picture: his conflict hinged on a misunderstanding about entanglement. I see all these stories as affording opportunities for learning and teaching, in different ways.

Students can begin registering for “Writing Quantum Steampunk: Science-Fiction Workshop” on October 24. We can offer only 15 seats, due to Writers’ House standards, so secure yours as soon as you can. Part of me still wonders how in the Hilbert space I came to be co-teaching a quantum-steampunk creative-writing course.1 But I look forward to reading with you next spring!


1A Hilbert space is a mathematical object that represents a quantum system. But you needn’t know that to succeed in the course.

Matt Leifer Doctoral Position

Funding is available for a Doctor of Science Studentship with Dr. Matthew Leifer at the Institute for Quantum Studies, Chapman University, California, USA.  It is in Chapman’s unique interdisciplinary Math, Physics, and Philosophy (MPP) program, which emphasizes research that encompasses two or more of the three core disciplines.  This is a 3-year program that focuses on research, and students are expected to have a terminal Master’s degree before they start.

This position is part of the Southern California Quantum Foundations Hub, funded by the John Templeton Foundation.  The research project must be in quantum foundations, particularly in one of the three theme areas of the grant:

  1. The Nature of the Quantum State
  2. Past and Future Boundary Conditions
  3. Agency in Quantum Observers. 

The university also provides other scholarships for the MPP program.  Please apply before January 15, 2025, to receive full consideration for the available funding.

Please follow the “Graduate Application” link on the MPP website to apply.

For informal inquiries about the position and research projects, please get in touch with me.