Planet Musings

April 19, 2024

Terence Tao Two announcements: AI for Math resources, and erdosproblems.com

This post contains two unrelated announcements. Firstly, I would like to promote a useful list of resources for AI in Mathematics, that was initiated by Talia Ringer (with the crowdsourced assistance of many others) during the National Academies workshop on “AI in mathematical reasoning” last year. This list is now accepting new contributions, updates, or corrections; please feel free to submit them directly to the list (which I am helping Talia to edit). Incidentally, next week there will be a followup webinar to the aforementioned workshop, building on the topics covered there.

Secondly, I would like to advertise the website erdosproblems.com, recently launched by Thomas Bloom. This is intended to be a living repository of the many mathematical problems proposed in various venues by Paul Erdős, who was particularly noted for his influential posing of such problems. For a tour of the site and an explanation of its purpose, I can recommend Thomas’s recent talk on this topic at a conference last week in honor of Timothy Gowers.

Thomas is currently issuing a call for help to develop the website in a number of ways (quoting directly from that page):

  • You know Github and could set a suitable project up to allow people to contribute new problems (and corrections to old ones) to the database, and could help me maintain the Github project;
  • You know things about web design and have suggestions for how this website could look or perform better;
  • You know things about Python/Flask/HTML/SQL/whatever and want to help me code cool new features on the website;
  • You know about accessibility and have an idea how I can make this website more accessible (to any group of people);
  • You are a mathematician who has thought about some of the problems here and wants to write an expanded commentary for one of them, with lots of references, comparisons to other problems, and other miscellaneous insights (mathematician here is interpreted broadly, in that if you have thought about the problems on this site and are willing to write such a commentary you qualify);
  • You knew Erdős and have any memories or personal correspondence concerning a particular problem;
  • You have solved an Erdős problem and I’ll update the website accordingly (and apologies if you solved this problem some time ago);
  • You have spotted a mistake, typo, or duplicate problem, or anything else that has confused you and I’ll correct things;
  • You are a human being with an internet connection and want to volunteer a particular Erdős paper or problem list to go through and add new problems from (please let me know before you start, to avoid duplicate efforts);
  • You have any other ideas or suggestions – there are probably lots of things I haven’t thought of, both in ways this site can be made better, and also what else could be done from this project. Please get in touch with any ideas!

I for instance contributed a problem to the site (#587) that Erdős himself gave to me personally (this was the topic of a somewhat well known photo of Paul and myself, and which he communicated again to me shortly afterwards on a postcard; links to both images can be found by following the above link). As it turns out, this particular problem was essentially solved in 2010 by Nguyen and Vu.

(Incidentally, I also spoke at the same conference that Thomas spoke at, on my recent work with Gowers, Green, and Manners; here is the video of my talk, and here are my slides.)

Scott Aaronson That IACR preprint

Update (April 19): Apparently a bug has been found, and the author has withdrawn the claim (see the comments).

For those who don’t yet know from their other social media: a week ago the cryptographer Yilei Chen posted a preprint, claiming to give a polynomial-time quantum algorithm to solve lattice problems. For example, it claims to solve the GapSVP problem, which asks to approximate the length of the shortest nonzero vector in a given n-dimensional lattice, to within an approximation ratio of ~n^4.5. The best approximation ratio previously known to be achievable in classical or quantum polynomial time was exponential in n.

If it’s correct, this is an extremely big deal. It doesn’t quite break the main lattice-based cryptosystems, but it would put those cryptosystems into a precarious position, vulnerable to a mere further polynomial improvement in the approximation factor. And, as we learned from the recent NIST competition, if the lattice-based and LWE-based systems were to fall, then we really don’t have many great candidates left for post-quantum public-key cryptography! On top of that, a full quantum break of LWE (which, again, Chen is not claiming) would lay waste (in a world with scalable QCs, of course) to a large fraction of the beautiful sandcastles that classical and quantum cryptographers have built up over the last couple decades—everything from Fully Homomorphic Encryption schemes, to Mahadev’s protocol for proving the output of any quantum computation to a classical skeptic.

So on the one hand, this would substantially enlarge the scope of exponential quantum speedups beyond what we knew a week ago: yet more reason to try to build scalable QCs! But on the other hand, it could also fuel an argument for coordinating to slow down the race to scalable fault-tolerant QCs, until the world can get its cryptographic house into better order. (Of course, as we’ve seen with the many proposals to slow down AI scaling, this might or might not be possible.)

So then, is the paper correct? I don’t know. It’s very obviously a serious effort by a serious researcher, a world away from the P=NP proofs that fill my inbox every day. But it might fail anyway. I’ve asked the world experts in quantum algorithms for lattice problems, and they’ve been looking at it, and none of them is ready yet to render a verdict. The central difficulty is that the algorithm is convoluted, and involves new tools that seem to come from left field, including complex Gaussian functions, the windowed quantum Fourier transform, and Karst waves (whatever those are). The algorithm has 9 phases by the author’s count. In my own perusal, I haven’t yet extracted even a high-level intuition—I can’t tell any little story like for Shor’s algorithm, e.g. “first you reduce factoring to period-finding, then you solve period-finding by applying a Fourier transform to a vector of amplitudes.”

So, the main purpose of this post is simply to throw things open to commenters! I’m happy to provide a public clearinghouse for questions and comments about the preprint, if those studying it would like that. You can even embed LaTeX in your comments, as will probably be needed to get anywhere.

Unrelated Update: Connor Tabarrok and his friends just put a podcast with me up on YouTube, in which they interview me in my office at UT Austin about watermarking of large language models and other AI safety measures.

Matt von Hippel No Unmoved Movers

Economists must find academics confusing.

When investors put money in a company, they have some control over what that company does. They vote to decide a board, and the board votes to hire a CEO. If the company isn’t doing what the investors want, the board can fire the CEO, or the investors can vote in a new board. Everybody is incentivized to do what the people who gave the money want to happen. And usually, those people want the company to increase its profits, since most of those people are themselves companies with their own investors.

Academics are paid by universities and research centers, funded in the aggregate by governments and student tuition and endowments from donors. But individually, they’re also often funded by grants.

What grant-givers want is more ambiguous. The money comes in big lumps from governments and private foundations, which generally want something vague like “scientific progress”. The actual decisions about who gets the money are made by committees of senior scientists. These people aren’t experts in every topic, so they have to extrapolate, much as investors have to guess whether a new company will be profitable based on past experience. At their best, they use their deep familiarity with scientific research to judge which projects are most likely to work, and which have the most interesting payoffs. At their weakest, though, they stick with ideas they’ve heard of, things they know work because they’ve seen them work before. That, in a nutshell, is why mainstream research prevails: not because the mainstream wants to suppress alternatives, but because sometimes the only way to guess if something will work is raw familiarity.

(What “works” means is another question. The cynical answers are “publishes papers” or “gets citations”, but that’s a bit unfair: in Europe and the US, most funders know that these numbers don’t tell the whole story. The trivial answer is “achieves what you said it would”, but that can’t be the whole story, because some goals are more pointless than others. You might want the answer to be “benefits humanity”, but that’s almost impossible to judge. So in the end the answer is “sounds like good science”, which is vulnerable to all the fads you can imagine…but is pretty much our only option, regardless.)

So are academics incentivized to do what the grant committees want? Sort of.

Science never goes according to plan. Grant committees are made up of scientists, so they know that. So while many grants have a review process afterwards to see whether you achieved what you planned, they aren’t all that picky about it. If you can tell a good story, you can explain why you moved away from your original proposal. You can say the original idea inspired a new direction, or that it became clear that a new approach was necessary. I’ve done this with an EU grant, and they were fine with it.

Looking at this, you might imagine that an academic who’s a half-capable storyteller could get away with anything they wanted. Propose a fashionable project, work on what you actually care about, and tell a good story afterwards to avoid getting in trouble. As long as you’re not literally embezzling the money (like the guy who was paying himself rent out of his visitor funding), what could go wrong? You get the money without the incentives: you move the scientific world, and nobody gets to move you.

It’s not quite that easy, though.

Sabine Hossenfelder told herself she could do something like this. She got grants for fashionable topics she thought were pointless, and told herself she’d spend time on the side on the things she felt were actually important. Eventually, she realized she wasn’t actually doing the important things: the faddish research ended up taking all her time. Not able to get grants doing what she actually cared about (and being in one of those weird temporary European positions that only last until you run out of grants), she now has to make a living from her science popularization work.

I can’t speak for Hossenfelder, but I’ve also put some thought into how to choose what to research, about whether I could actually be an unmoved mover. A few things get in the way:

First, applying for grants doesn’t just take storytelling skills, it takes scientific knowledge. Grant committees aren’t experts in everything, but they usually send grants to be reviewed by much more appropriate experts. These experts will check if your grant makes sense. In order to make the grant make sense, you have to know enough about the faddish topic to propose something reasonable. You have to keep up with the fad. You have to spend time reading papers, and talking to people in the faddish subfield. This takes work, but also changes your motivation. If you spend time around people excited by an idea, you’ll either get excited too, or be too drained by the dissonance to get any work done.

Second, you can’t change things that much. You still need a plausible story as to how you got from where you are to where you are going.

Third, you need to be a plausible person to do the work. If the committee looks at your CV and sees that you’ve never actually worked on the faddish topic, they’re more likely to give a grant to someone who’s actually worked on it.

Fourth, you have to choose what to do when you hire people. If you never hire any postdocs or students working on the faddish topic, then it will be very obvious that you aren’t trying to research it. If you do hire them, then you’ll be surrounded by people who actually care about the fad, and want your help to understand how to work with it.

Ultimately, to avoid the grant committee’s incentives, you need a golden tongue and a heart of stone, and even then you’ll need to spend some time working on something you think is pointless.

Even if you don’t apply for grants, even if you have a real permanent position or even tenure, you still feel some of these pressures. You’re still surrounded by people who care about particular things, by students and postdocs who need grants and jobs and fellow professors who are confident the mainstream is the right path forward. It takes a lot of strength, and sometimes cruelty, to avoid bowing to that.

So despite the ambiguous rules and lack of oversight, academics still respond to incentives: they can’t just do whatever they feel like. They aren’t bound by shareholders, and they aren’t expected to make a profit. But ultimately, the things that do constrain them (expertise and cognitive load, social pressure and compassion for those they mentor) can be even stronger.

I suspect that those pressures dominate the private sector as well. My guess is that for all that companies think of themselves as trying to maximize profits, the all-too-human motivations we share are more powerful than any corporate governance structure or org chart. But I don’t know yet. Likely, I’ll find out soon.

n-Category Café The Modularity Theorem as a Bijection of Sets

guest post by Bruce Bartlett

John has been making some great posts on counting points on elliptic curves (Part 1, Part 2, Part 3). So I thought I’d take the opportunity and float my understanding here of the Modularity Theorem for elliptic curves, which frames it as an explicit bijection between sets. To my knowledge, it is not stated exactly in this form in the literature. There are aspects of this that I don’t understand (the explicit isogeny); perhaps someone can assist.

Bijection statement

Here is the statement as I understand it to be, framed as a bijection of sets. My chief reference is the wonderful book Elliptic Curves, Modular Forms and their L-Functions by Álvaro Lozano-Robledo (and references therein), as well as the standard reference A First Course in Modular Forms by Diamond and Shurman.

I will first make the statement as succinctly as I can, then I will ask the question I want to ask, then I will briefly explain the terminology I’ve used.

Modularity Theorem (Bijection version). The following maps are well-defined and inverse to each other, and give rise to an explicit bijection of sets:

$$\left\{\begin{array}{c} \text{Elliptic curves defined over}\ \mathbb{Q} \\ \text{with conductor}\ N \end{array}\right\} \;/\; \text{isogeny} \quad \leftrightarrows \quad \left\{\begin{array}{c} \text{Integral normalized newforms} \\ \text{of weight 2 for}\ \Gamma_0(N) \end{array}\right\}$$

  • In the forward direction, given an elliptic curve $E$ defined over the rationals, we build the modular form $$f_E(z) = \sum_{n=1}^\infty a_n q^n, \quad q = e^{2\pi i z}$$ where the coefficients $a_n$ are obtained by expanding out the following product over all primes as a Dirichlet series, $$\prod_p \exp\left( \sum_{k=1}^\infty \frac{|E(\mathbb{F}_{p^k})|}{k} p^{-ks} \right) = \frac{a_1}{1^s} + \frac{a_2}{2^s} + \frac{a_3}{3^s} + \frac{a_4}{4^s} + \cdots,$$ where $|E(\mathbb{F}_{p^k})|$ counts the number of solutions to the equation for the elliptic curve over the finite field $\mathbb{F}_{p^k}$ (including the point at infinity). So for example, as John taught us in Part 3, for good primes $p$ (which is almost all of them), $$a_p = p + 1 - |E(\mathbb{F}_p)|.$$ But the above description tells you how to compute $a_n$ for any natural number $n$. (By the way, the nontrivial content of the theorem is proving that $f_E$ is indeed a modular form for any elliptic curve $E$.)

  • In the reverse direction, given an integral normalized newform $f$ of weight $2$ for $\Gamma_0(N)$, we interpret it as a differential form on the genus $g$ modular surface $X_0(N)$, and then compute its period lattice $\Lambda \subset \mathbb{C}$ by integrating it over all the 1-cycles in the first homology group of $X_0(N)$. Then the resulting elliptic curve is $E_f = \mathbb{C}/\Lambda$.
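The good-prime coefficient $a_p = p + 1 - |E(\mathbb{F}_p)|$ from the forward direction can be checked by brute force for small primes. Here is a minimal Python sketch; the curve $y^2 = x^3 - x$ is an illustrative choice of mine, not one singled out in the post:

```python
# Brute-force point counting over F_p for a curve in Weierstrass form
# y^2 = x^3 + A*x + B.  |E(F_p)| includes the point at infinity, and
# a_p = p + 1 - |E(F_p)| for good primes p.

def count_points(A, B, p):
    """Number of points of y^2 = x^3 + A*x + B over F_p, plus infinity."""
    sqrt_count = [0] * p          # sqrt_count[v] = number of y with y^2 = v (mod p)
    for y in range(p):
        sqrt_count[y * y % p] += 1
    total = 1                     # start with the point at infinity
    for x in range(p):
        total += sqrt_count[(x * x * x + A * x + B) % p]
    return total

def a_p(A, B, p):
    return p + 1 - count_points(A, B, p)

# Illustrative curve y^2 = x^3 - x (A = -1, B = 0):
assert count_points(-1, 0, 5) == 8
assert a_p(-1, 0, 5) == -2
assert a_p(-1, 0, 7) == 0     # supersingular at p = 7
assert a_p(-1, 0, 13) == 6
```

This is exponentially slow in the size of $p$, of course; it is only meant to make the definition of $a_p$ concrete.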

An explicit isogeny?

My question to the experts is the following. Suppose we start with an elliptic curve $E$ defined over $\mathbb{Q}$, then compute the modular form $f_E$, and then compute its period lattice $\Lambda$ to arrive at the elliptic curve $E' = \mathbb{C}/\Lambda$. The theorem says that $E$ and $E'$ are isogenous. What is the explicit isogeny?


  • An elliptic curve is a complex curve $E \subset \mathbb{CP}^2$ defined by a cubic polynomial $F(X,Y,Z) = 0$, such that $E$ is smooth, i.e. the gradient $(\frac{\partial F}{\partial X}, \frac{\partial F}{\partial Y}, \frac{\partial F}{\partial Z})$ does not vanish at any point $p \in E$. If the coefficients are all rational, then we say that $E$ is defined over $\mathbb{Q}$. We can always make a transformation of variables and write the equation for $E$ in an affine chart in Weierstrass form, $$y^2 = x^3 + Ax + B.$$ Importantly, every elliptic curve is isomorphic to one of the form $\mathbb{C}/\Lambda$ where $\Lambda$ is a rank 2 sublattice of $\mathbb{C}$. So, an elliptic curve is topologically a doughnut $S^1 \times S^1$, and it has an addition law making it into an abelian group.

  • An isogeny from $E$ to $E'$ is a surjective holomorphic homomorphism. Being isogenous is an equivalence relation on the class of elliptic curves.

  • The conductor of an elliptic curve $E$ defined over the rationals is $$N = \prod_p p^{f_p}$$ where $$f_p = \begin{cases} 0 & \text{if}\ E\ \text{remains smooth over}\ \mathbb{F}_p \\ 1 & \text{if}\ E\ \text{gets a node over}\ \mathbb{F}_p \\ 2 & \text{if}\ E\ \text{gets a cusp over}\ \mathbb{F}_p\ \text{and}\ p \neq 2, 3 \\ 2 + \delta_p & \text{if}\ E\ \text{gets a cusp over}\ \mathbb{F}_p\ \text{and}\ p = 2\ \text{or}\ 3 \end{cases}$$ where $\delta_p$ is a technical invariant that describes whether there is wild ramification in the action of the inertia group at $p$ of $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ on the Tate module $T_p(E)$.

  • The modular curve $X_0(N)$ is a certain compact Riemann surface which parametrizes isomorphism classes of pairs $(E, C)$ where $E$ is an elliptic curve and $C$ is a cyclic subgroup of $E$ of order $N$. The genus of $X_0(N)$ depends on $N$.

  • A modular form $f$ for $\Gamma_0(N)$ of weight $k$ is a certain kind of holomorphic function $f : \mathbb{H} \to \mathbb{C}$. The number $N$ is called the level of the modular form.

  • Every modular form $f(z)$ can be expanded as a Fourier series $$f(z) = \sum_{n=0}^\infty a_n q^n, \quad q = e^{2\pi i z}.$$ We say that $f$ is integral if all its Fourier coefficients $a_n$ are integers. We say $f$ is a cusp form if $a_0 = 0$. A cusp form is called normalized if $a_1 = 1$.

  • Geometrically, a cusp form of weight $k$ can be interpreted as a holomorphic section of a certain line bundle $L_k$ over $X_0(N)$. Since $X_0(N)$ is compact, this implies that the vector space of cusp modular forms is finite-dimensional. (In particular, this means that $f$ is determined by only finitely many of its Fourier coefficients.)

  • In particular, $L_2$ is the cotangent bundle of $X_0(N)$. This means that the cusp modular forms for $\Gamma_0(N)$ of weight 2 can be interpreted as differential forms on $X_0(N)$. That is to say, they are things that can be integrated along curves on $X_0(N)$.

  • If you have a modular form of level $M$, where $M$ divides $N$, then there is a way to build a new modular form of level $N$. We call level $N$ forms of this type old. They form a subspace of the vector space $S_2(\Gamma_0(N))$. If we’re at level $N$, then we are really interested in the new forms: these are the forms in $S_2(\Gamma_0(N))$ which are orthogonal to the old forms, with respect to a certain natural inner product.

  • If you have a weight 2 newform $f$, and you interpret it as a differential form on $X_0(N)$, then integrating $f$ along 1-cycles $\gamma$ in $X_0(N)$ will give a nonzero result only for cycles living in a rank 2 sublattice of $H_1(X_0(N))$. So, the period integrals of $f$ will form a rank-2 sublattice $\Lambda \subset \mathbb{C}$.

  • So, given a weight 2 newform $f$, we get a canonical integration map $$I : X_0(N) \to \mathbb{C}/\Lambda$$ obtained by fixing a basepoint $x_0 \in X_0(N)$ and then defining $$I(x) = \int_\gamma f$$ where $\gamma$ is any path from $x_0$ to $x$ in $X_0(N)$. The answer won’t depend on the choice of path, because different choices will differ by a 1-cycle, and we are modding out by the periods of 1-cycles!

  • The Jacobian of a Riemann surface $X$ is the quotient group $$\mathrm{Jac}(X) = \Omega^1_{\mathrm{hol}}(X)^\vee / H_1(X; \mathbb{Z}).$$ This is why one version of the Modularity Theorem says:

    Modularity Theorem (Diamond and Shurman’s Version $J_C$). There exists a surjective holomorphic homomorphism of the (higher-dimensional) complex torus $\mathrm{Jac}(X_0(N))$ onto $E$.

    I would like to ask the same question here as I asked before: is there an explicit description of this map?

April 18, 2024

Tommaso Dorigo On Rating Universities

In a world where we live as hostages of advertisement, where our email addresses and phone numbers are sold and bought by companies eager to intrude in our lives and command our actions, preferences, tastes; in a world where appearance trumps substance ten to zero, where your knowledge and education are less valued than your looks, a world where truth is worth dimes and myths earn you millions - in this XXI century world, that is, Universities look increasingly out of place.


n-Category Café The Quintic, the Icosahedron, and Elliptic Curves

Old-timers here will remember the days when Bruce Bartlett and Urs Schreiber were regularly talking about 2-vector spaces and the like. Later I enjoyed conversations with Bruce and Greg Egan on quintics and the icosahedron. And now Bruce has come out with a great article linking those topics to elliptic curves!

It’s expository and fun to read.

I can’t do better than quoting the start:

There is a remarkable relationship between the roots of a quintic polynomial, the icosahedron, and elliptic curves. This discovery is principally due to Felix Klein (1878), but Klein’s marvellous book misses a trick or two, and doesn’t tell the whole story. The purpose of this article is to present this relationship in a fresh, engaging, and concise way. We will see that there is a direct correspondence between:

  • “evenly ordered” roots $(x_1, \dots, x_5)$ of a Brioschi quintic $x^5 - 10bx^3 + 45b^2x - b^2 = 0$,
  • points on the icosahedron, and
  • elliptic curves equipped with a primitive basis for their 5-torsion, up to isomorphism.

Moreover, this correspondence gives us a very efficient direct method to actually calculate the roots of a general quintic! For this, we’ll need some tools both new and old, such as Cremona and Thongjunthug’s complex arithmetic geometric mean, and the Rogers–Ramanujan continued fraction. These tools are not found in Klein’s book, as they had not been invented yet!

If you are impatient, skip to the end to see the algorithm.

If not, join me on a mathematical carpet ride through the mathematics of the last four centuries. Along the way we will marvel at Kepler’s Platonic model of the solar system from 1597, witness Gauss’ excitement in his diary entry from 1799, and experience the atmosphere in Trinity College Hall during the wonderful moment Ramanujan burst onto the scene in 1913.

The prose sizzles with excitement, and the math lives up to this.

April 17, 2024

n-Category Café Pythagorean Triples and the Projective Line

Pythagorean triples like $3^2 + 4^2 = 5^2$ may seem merely cute, but they’re connected to some important ideas in algebra. To start seeing this, note that rescaling any Pythagorean triple $m^2 + n^2 = k^2$ gives a point with rational coordinates on the unit circle:

$$(m/k)^2 + (n/k)^2 = 1$$

Conversely any point with rational coordinates on the unit circle can be scaled up to get a Pythagorean triple.

Now, if you’re a topologist or differential geometer you’ll know the unit circle is isomorphic to the real projective line $\mathbb{RP}^1$ as a topological space, and as a smooth manifold. You may even know they’re isomorphic as real algebraic varieties. But you may never have wondered whether the points with rational coordinates on the unit circle form a variety isomorphic to the rational projective line $\mathbb{QP}^1$.

It’s true! And since $\mathbb{QP}^1$ is $\mathbb{Q}$ plus a point at infinity, this means there’s a way to turn rational numbers into Pythagorean triples. Working this out gives a nice explicit way to get our hands on all Pythagorean triples. And as a side benefit, we see that points with rational coordinates are dense in the unit circle.

The basic idea is simple, but there’s a bit of suspense involved, and thus a bit to be learned.

First we set up an explicit isomorphism between $\mathbb{RP}^1$ and the unit circle $S^1$. To do this, we think of $\mathbb{RP}^1$ as the $x$ axis in the plane together with a point at infinity. Then we map each point on the $x$ axis to a point on the unit circle by drawing a straight line through that point and the ‘north pole’ $(0,1)$ and seeing where it hits the circle:

This extends to a bijection $f \colon \mathbb{RP}^1 \to S^1$. Then we can restrict this to an isomorphism between $\mathbb{QP}^1$ and what I’ll call the rational unit circle

$$\{(x,y) \in \mathbb{Q}^2 \mid x^2 + y^2 = 1\}$$

But there’s something to check here! Why does $f$ map rational numbers $q \in \mathbb{Q}$ to points on $S^1$ with rational coordinates?

For this we need to think about $f$. The line through the north pole and a point $(q,0)$ on the $x$ axis is given by

$$x = q(1-y)$$

since this equation gives $x = 0$ when $y = 1$ and $x = q$ when $y = 0$. This line goes through some point $f(q)$ on the unit circle. To figure out this point $f(q) = (x,y)$, notice it obeys the equation for the line and also

$$x^2 + y^2 = 1$$

So plop the equation for the line into the equation for the circle and get

$$q^2(1-y)^2 + y^2 = 1$$

Now things get sort of interesting. It’s easy to solve this using the quadratic formula, but it’s not instantly obvious the solution will be rational when $q$ is rational! We could work it out and see, but a general principle saves us from this demeaning labor:

Lemma. If $ax^2 + bx + c$ is a quadratic with coefficients $a, b, c$ in some field $k$, and this quadratic has one root $r_1$ in $k$, then the quadratic factors as $a(x - r_1)(x - r_2)$ in this field $k$.

Our equation $q^2(1-y)^2 + y^2 = 1$ has rational coefficients when $q$ is rational, and we know it has one rational solution, namely $y = 1$, since the north pole lies both on the line and the circle. So the lemma says the other solution, the one we care about, must also be rational!

In short: whenever $q$ is rational so is $y$, and then $x = q(1-y)$ is as well, so $f(q) = (x,y)$ is a point on the rational unit circle.

Since lines from the north pole to rational points on the $x$ axis hit the unit circle in a dense set, it follows that the rational unit circle is dense in the unit circle. And these rational points correspond precisely to Pythagorean triples mod rescaling by integers. So we get all the Pythagorean triples this way, and a lot of them.

That makes it more interesting to see an explicit formula for $f(q) = (x,y)$. I joked that this would require “demeaning labor”. It’s just some algebra, but it’s a bit annoying. The main step is to solve the quadratic equation

$$q^2(1-y)^2 + y^2 = 1$$

or in other words

$$(q^2 + 1)y^2 - 2q^2 y + (q^2 - 1) = 0$$

We could solve this using the quadratic formula, but that makes it seem a bit miraculous that in addition to the root $y = 1$, the root we care about will also be rational. It’s actually more fun to solve this equation using this proof of the Lemma:

Lemma. If $ax^2 + bx + c$ is a quadratic with coefficients $a, b, c$ in some field $k$, and this quadratic has one root $r_1$ in $k$, then the quadratic factors as $a(x - r_1)(x - r_2)$ for some $r_2 \in k$.

Proof. There’s a general abstract nonsense way to prove this, but it’s even quicker to show it directly and get a formula for $r_2$. The quadratic must factor in the algebraic closure of $k$, so

$$ax^2 + bx + c = a(x - r_1)(x - r_2)$$

for $r_1, r_2 \in \overline{k}$. Thus

$$b = -a(r_1 + r_2)$$

which lets us solve for one root if we know the other:

$$r_2 = -\frac{b}{a} - r_1$$

If the coefficients $a, b$ are in $k$ and the root $r_1$ is in $k$, it follows that the root $r_2$ is also in $k$. $\blacksquare$
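The lemma’s formula $r_2 = -b/a - r_1$ is easy to sanity-check with exact rational arithmetic. A small Python sketch; the choice $q = 7/3$ is mine, just some arbitrary rational number:

```python
# Check the lemma over Q with exact rational arithmetic: if a quadratic
# a*x^2 + b*x + c has one rational root r1, Vieta (r1 + r2 = -b/a) gives
# the other root r2 = -b/a - r1, which is then automatically rational.
from fractions import Fraction

def other_root(a, b, r1):
    """Second root of a*x^2 + b*x + c, given one root r1."""
    return -Fraction(b) / Fraction(a) - Fraction(r1)

# The quadratic from the post: (q^2+1) y^2 - 2 q^2 y + (q^2-1) = 0,
# with known root y = 1; q is an arbitrary illustrative rational.
q = Fraction(7, 3)
a, b, c = q**2 + 1, -2 * q**2, q**2 - 1

r2 = other_root(a, b, 1)
assert a * r2**2 + b * r2 + c == 0        # it really is a root
assert r2 == (q**2 - 1) / (q**2 + 1)      # matches the closed form for y
```

Since everything stays inside `Fraction`, no floating-point square roots ever appear, mirroring the “no miracle needed” point of the proof.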

Since our quadratic

$$(q^2 + 1)y^2 - 2q^2 y + (q^2 - 1) = 0$$

has one rational root $r_1 = 1$, we see that the other root must be

$$r_2 = \frac{2q^2}{q^2 + 1} - 1$$

That’s the root we want, so

$$y = \frac{2q^2}{q^2 + 1} - 1$$

See, this was more fun than using the quadratic formula and discovering that “miraculously” the answer is rational despite the square root! Simplifying a bit we get

$$y = \frac{q^2 - 1}{q^2 + 1}$$

which is much more cute. Then we get

$$x = q(1-y) = q\left(1 - \frac{q^2 - 1}{q^2 + 1}\right)$$

Simplifying this we again get something much more cute:

$$x = \frac{2q}{q^2 + 1}$$

So, our map from the rational projective line to the rational circle is

$$f(q) = \left(\frac{2q}{q^2 + 1}, \frac{q^2 - 1}{q^2 + 1}\right)$$

You can see this formula on Wikipedia, but I find the journey much more interesting than the destination… and we’re not even done with the journey.

But before we go on, we should at least take a moment to childishly flex our muscles. Take any rational number, like $q = 100$, and pop it into our formula for $f$. We get

$$f(100) = \left(\frac{200}{10001}, \frac{9999}{10001}\right)$$

This means that

$$\left(\frac{200}{10001}\right)^2 + \left(\frac{9999}{10001}\right)^2 = 1$$

Rescaling, we get a Pythagorean triple:

$$200^2 + 9999^2 = 10001^2$$
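The recipe just demonstrated can be mechanized: writing $q = m/n$ in lowest terms and clearing denominators in $f(q) = (2q/(q^2+1), (q^2-1)/(q^2+1))$ gives the triple $(2mn, m^2 - n^2, m^2 + n^2)$. A short Python sketch of this (my own transcription of the formula; note the middle entry is negative when $|q| < 1$, which just flips a sign in the triple):

```python
# Turn a rational q = m/n into a Pythagorean triple by clearing the
# denominators in f(q) = (2q/(q^2+1), (q^2-1)/(q^2+1)).
from fractions import Fraction

def triple_from_q(q):
    """Pythagorean triple (2mn, m^2 - n^2, m^2 + n^2) from q = m/n in lowest terms."""
    m, n = q.numerator, q.denominator
    return (2 * m * n, m * m - n * n, m * m + n * n)

# The worked example q = 100 from the post:
a, b, c = triple_from_q(Fraction(100))
assert (a, b, c) == (200, 9999, 10001)
assert a * a + b * b == c * c

# Any rational q works, e.g. q = 2 recovers the classic 3-4-5 triangle:
assert triple_from_q(Fraction(2)) == (4, 3, 5)
assert triple_from_q(Fraction(3, 2)) == (12, 5, 13)
```

Because `Fraction` keeps $m$ and $n$ coprime automatically, distinct rationals give genuinely distinct rational points on the circle, matching the density claim above.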


Further adventures

The unit circle is an example of a ‘conic’. Other examples include ellipses, hyperbolas and parabolas. Conics can be studied over any field $k$, but it’s easiest to study them as projective varieties rather than affine varieties. To do that we take a quadratic form in 3 variables, see where it vanishes, and then ‘projectivize’: mod out by rescaling to get a subvariety of the projective plane. For example we get our friend the unit circle by projectivizing the real solutions of

x^2 + y^2 = z^2

and the same idea works for the rational unit circle if we use rational solutions.

Working projectively, we include ‘points at infinity’ which make our conics better behaved. The ordinary real hyperbola is obviously not isomorphic to \mathbb{R}\mathrm{P}^1, but when we work projectively it gets two points at infinity that fix this problem: you can sail out along one branch of the hyperbola to a point at infinity, sail back in along another branch, go out along that branch to another point at infinity, and then sail back to where you started. It’s really a circle viewed in a funny way!

I got interested in this stuff while writing about elliptic curves over finite fields. I suddenly realized I didn’t even understand conics over finite fields! This is like writing about cubic equations when you don’t even understand the quadratic equation. So I started learning about conics.

Among other things, I re-read Chapter 1 of Gille and Szamuely’s nice book Central Simple Algebras and Galois Cohomology, which explains the correspondence between projective conics over an arbitrary field and ‘quaternion algebras’ over that field, which are central simple algebras of dimension 4. I believe this should clarify the relation between Pythagorean triples and the algebra of 2 \times 2 integer matrices lurking behind Trautman’s work on Pythagorean spinors and the modular group. At least the algebra of 2 \times 2 rational matrices is a quaternion algebra over \mathbb{Q}. But there seems to be something more going on here: some extension of the correspondence between conics and quaternion algebras over fields to certain more general commutative rings, including \mathbb{Z} but also maybe all integral domains or something.

Why are Gille and Szamuely talking about conics and quaternion algebras? Because the correspondence between these led people to the more general correspondence between central simple algebras over a field k and Severi–Brauer varieties over that field, which are varieties that become isomorphic to projective space when we pass to the algebraic closure \overline{k}. Attempting to classify these things led folks like Brauer, Hasse and Noether to invent what we’d now call Galois cohomology, leading inexorably to modern homological algebra and descent theory. A great historical account is here:

I learned a lot of algebra from this paper; it’s a good supplement to Gille and Szamuely’s book.

But I digress. Gille and Szamuely write:

It is a well-known fact from algebraic geometry that a smooth projective conic defined over a field k is isomorphic to the projective line \mathrm{P}^1 over k if and only if it has a k-rational point. The isomorphism is given by taking the line joining a point P of the conic to some fixed k-rational point O and then taking the intersection of this line with \mathrm{P}^1 embedded as, say, some coordinate axis in \mathrm{P}^2.

As far as I can tell, a ‘k-rational point’ is just a point defined over the field k (rather than some algebraic extension). So they’re saying a smooth projective conic over a field k that has any points in this simple sense must be isomorphic to the projective line k\mathrm{P}^1. And their sketched proof of this generalizes the argument I presented above for k = \mathbb{Q}. Discussing this over on the Category Theory Community Server, I got some help from people like Morgan Rogers, who pointed out the lemma used above. This lemma should guarantee that if a line in k\mathrm{P}^2 intersects a conic transversally at one point it intersects it in some other point.

Matt Strassler Speaking Today in Seattle, Tomorrow near Portland

A quick reminder, to those in the northwest’s big cities, that I will be giving two talks about my book in the next 48 hours:

Hope to see some of you there! (You can keep track of my speaking events at my events page.)

John BaezAgent-Based Models (Part 8)

Last time I presented a class of agent-based models where agents hop around a graph in a stochastic way. Each vertex of the graph is some ‘state’ agents can be in, and each edge is called a ‘transition’. In these models, the probability per time of an agent making a transition and leaving some state can depend on when it arrived at that state. It can also depend on which agents are in other states that are ‘linked’ to that edge—and when those agents arrived.

I’ve been trying to generalize this framework to handle processes where agents are born or die—or perhaps more generally, processes where some number of agents turn into some other number of agents. There’s already a framework that does something sort of like this. It’s called ‘stochastic Petri nets’, and we explained this framework here:

• John Baez and Jacob Biamonte, Quantum Techniques for Stochastic Mechanics, World Scientific Press, Singapore, 2018. (See also blog articles here.)

However, in their simplest form, stochastic Petri nets are designed for agents whose only distinguishing information is which state they’re in. They don’t have ‘names’—that is, individual identities. Thus, even calling them ‘agents’ is a bit of a stretch: usually they’re called ‘tokens’, since they’re drawn as black dots.

We could try to enhance the Petri net framework to give tokens names and other identifying features. There are various imaginable ways to do this, such as ‘colored Petri nets’. But so far this approach seems rather ill-adapted for processes where agents have identities—perhaps because I’m not thinking about the problem the right way.

So, at some point I decided to try something less ambitious. It turns out that in applications to epidemiology, general processes where n agents come in and m go out are not often required. So I’ve been trying to minimally enhance the framework from last time to include ‘birth’ and ‘death’ processes as well as transitions from state to state.

As I thought about this, some questions kept plaguing me:

When an agent gets created, or ‘born’, which one actually gets born? In other words, what is its name? Its precise name may not matter, but if we want to keep track of it after it’s born, we need to give it a name. And this name had better be ‘fresh’: not already the name of some other agent.

There’s also the question of what happens when an agent gets destroyed, or ‘dies’. This feels less difficult: there just stops being an agent with the given name. But probably we want to prevent a new agent from having the same name as that dead agent.

Both these questions seem fairly simple, but so far they’re making it hard for me to invent a truly elegant framework. At first I tried to separately describe transitions between states, births, and deaths. But this seemed to triplicate the amount of work I needed to do.

Then I tried models that have

• a finite set S of states,

• a finite set T of transitions,

• maps u, d \colon T \to S + \{\textrm{undefined}\} mapping each transition to its upstream and downstream states.

Here S + \{\textrm{undefined}\} is the disjoint union of S and a singleton whose one element is called undefined. Maps from T to S + \{\textrm{undefined}\} are a standard way to talk about partially defined maps from T to S. We get four cases:

1) If the downstream of a transition is defined (i.e. in S) but its upstream is undefined we call this transition a birth transition.

2) If the upstream of a transition is defined but its downstream is undefined we call this transition a death transition.

3) If the upstream and downstream of a transition are both defined we call this transition a transformation. In practice most transitions will be of this sort.

4) We never need transitions whose upstream and downstream are undefined: these would describe agents that pop into existence and instantly disappear.
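In code, the partial maps u, d \colon T \to S + \{\textrm{undefined}\} are naturally modeled with an option type, with None playing the role of undefined. Here is a minimal Python sketch of the four-way classification above (all names are my own):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Transition:
    upstream: Optional[str]     # None plays the role of 'undefined'
    downstream: Optional[str]

def kind(t: Transition) -> str:
    """Classify a transition according to cases 1)-4) above."""
    if t.upstream is None and t.downstream is None:
        raise ValueError("disallowed: both upstream and downstream undefined")
    if t.upstream is None:
        return "birth"
    if t.downstream is None:
        return "death"
    return "transformation"
```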

This is sort of nice, except for the fourth case. Unfortunately when I go ahead and try to actually describe a model based on this paradigm, I seem still to wind up needing to handle births, deaths and transformations quite differently.

For example, last time my models had a fixed set A of agents. To handle births and deaths, I wanted to make this set time-dependent. But I need to separately say how this works for transformations, birth transitions and death transitions. For transformations we don’t change A. For birth transitions we add a new element to A. And for death transitions we remove an element from A, and maybe record its name on a ledger or drive a stake through its heart to make sure it can never be born again!

So far this is tolerable, but things get worse. Our model also needs ‘links’ from states to transitions, to say how agents present in those states affect the timing of those transitions. These are used in the ‘jump function’, a stochastic function that answers this question:

If at time t agent a arrives at the state upstream to some transition e, and the agents at states linked to the transition e form some set S_e, when will agent a make the transition e given that it doesn’t do anything else first?

This works fine for transformations, meaning transitions e that have both an upstream and downstream state. It works just a tiny bit differently for death transitions. But birth transitions are quite different: since newly born agents don’t have a previous upstream state u(e), they don’t have a time at which they arrived at that state.

Perhaps this is just how modeling works: perhaps the search for a staggeringly beautiful framework is a distraction. But another approach just occurred to me. Today I just want to briefly state it. I don’t want to write a full blog article on it yet, since I’ve already spent a lot of time writing two articles that I deleted when I became disgusted with them—and I might become disgusted with this approach too!

Briefly, this approach is exactly the approach I described last time. There are fundamentally no births and no deaths: all transitions have an upstream and a downstream state. There is a fixed set A of agents that does not change with time. We handle births and deaths using a dirty trick.

Namely, births are transitions out of an ‘unborn’ state. Agents hang around in this state until they are born.

Similarly, deaths are transitions to a ‘dead’ state.

There can be multiple ‘unborn’ states and ‘dead’ states. Having multiple unborn states makes it easy to have agents with different characteristics enter the model. Having multiple dead states makes it easy for us to keep tallies of different causes of death. We should make the unborn states distinct from the dead states to prevent ‘reincarnation’—that is, the birth of a new agent that happens to equal an agent that previously died.

I’m hoping that when we proceed this way, we can shoehorn birth and death processes into the framework described last time, without really needing to modify it at all! All we’re doing is exploiting it in a new way.

Here’s one possible problem: if we start with a finite number of agents in the ‘unborn’ states, the population of agents can’t grow indefinitely! But this doesn’t seem very dire. For most agent-based models we don’t feel a need to let the number of agents grow arbitrarily large. Or we can relax the requirement that the set of agents is finite, and put an infinite number of agents u_1, u_2, u_3, \dots in an unborn state. This can be done without using an infinite amount of memory: it’s a ‘potential infinity’ rather than an ‘actual infinity’.
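A lazily generated stream of fresh names realizes this ‘potential infinity’ of unborn agents without ever storing anything infinite. A minimal Python sketch (all names are my own):

```python
import itertools

# The 'unborn' state: an inexhaustible lazy stream of fresh agent names.
unborn = (f"u_{n}" for n in itertools.count(1))

alive = set()
dead = set()   # names stay here forever, so no 'reincarnation' is possible

def birth():
    a = next(unborn)   # fresh by construction: never used before
    alive.add(a)
    return a

def death(a):
    alive.remove(a)
    dead.add(a)
```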

There could be other problems. So I’ll post this now before I think of them.

April 16, 2024

Matt Strassler Why The Higgs Field is Nothing Like Molasses, Soup, or a Crowd

The idea that a field could be responsible for the masses of particles (specifically the masses of photon-like [“spin-one”] particles) was proposed in several papers in 1964. They included one by Peter Higgs, one by Robert Brout and Francois Englert, and one, slightly later but independent, by Gerald Guralnik, C. Richard Hagen, and Tom Kibble. This general idea was then incorporated into a specific theory of the real world’s particles; this was accomplished in 1967-1968 in two papers, one written by Steven Weinberg and one by Abdus Salam. The bare bones of this “Standard Model of Particle Physics” was finally confirmed experimentally in 2012.

How precisely can mass come from a field? There’s a short answer to this question, invented a couple of decades ago. It’s the kind of answer that serves if time is short and attention spans are limited; it is intended to sound plausible, even though the person delivering the “explanation” knows that it is wrong. In my recent book, I called this type of little lie, a compromise that physicists sometimes have to make between giving no answer and giving a correct but long answer, a “phib” — a physics fib. Phibs are usually harmless, as long as people don’t take them seriously. But the Higgs field’s phib is particularly problematic.

The Higgs Phib

The Higgs phib comes in various forms. Here’s a particularly short one:

There’s this substance, like a soup, that fills the universe; that’s the Higgs field. As objects move through it, the soup slows them down, and that’s how they get mass.

Some variants replace the soup with other thick substances, or even imagine the field as though it were a crowd of people.

How bad is this phib, really? Well, here’s the problem with it. This phib violates several basic laws of physics. These include foundational laws that have had a profound impact on human culture and are the first ones taught in any physics class. It also badly misrepresents what a field is and what it can do. As a result, taking the phib seriously makes it literally impossible to understand the universe, or even daily human experience, in a coherent way. It’s a pedagogical step backwards, not forwards.

What’s Wrong With The Higgs Phib

So here are my seven favorite reasons to put a flashing red warning sign next to any presentation of the Higgs phib.

1. Against The Principle of Relativity

The phib brazenly violates the principle of relativity — both Galileo’s original version and Einstein’s updates to it. That principle, the oldest law of physics that has never been revised, says that if your motion is steady and you are in a closed room, no experiment can tell you your speed, your direction of motion, or even whether you are in motion at all. The phib directly contradicts this principle. It claims that

  • if an object moves, the Higgs field affects it by slowing it down, while
  • if it doesn’t move, the Higgs field does nothing to it.

But if that were true, the action of the Higgs field could easily allow you to distinguish steady motion from being stationary, and the principle of relativity would be false.

2. Against Newton’s First Law of Motion

The phib violates Newton’s first law of motion — that an object in motion not acted on by any force will remain in steady motion. If the Higgs field slowed things down, it could only do so, according to this law, by exerting a force.

But Newton, in predicting the motions of the planets, assumed that the only force acting on the planets was that of gravity. If the Higgs field exerted an additional force on the planets simply because they have mass (or because it was giving them mass), Newton’s methods for predicting planetary motions would have failed.

Worse, the slowing from the Higgs field would have acted like friction over billions of years, and would by now have caused the Earth to slow down and spiral into the Sun.

3. Against Newton’s Second Law of Motion

The phib also violates Newton’s second law of motion, by completely misrepresenting what mass is. It makes it seem as though mass makes motion difficult, or at least has something to do with inhibiting motion. But this is wrong.

As Newton’s second law states, mass is something that inhibits changes in motion. It does not inhibit motion, or cause things to slow down, or arise from things being slowed down. Mass is the property that makes it hard both to speed something up and to slow it down. It makes it harder to throw a lead ball compared to a plastic one, and it also makes the lead ball harder to catch bare-handed than a plastic one. It also makes it difficult to change something’s direction.

To say this another way, Newton’s second law F=ma says that to make a change in an object’s motion (an acceleration a) requires a force (F); the larger the object’s mass (m), the larger the required force must be. Notice that it does not have anything to say about an object’s motion (its velocity v).

To suggest that mass has to do with motion, and not with change in motion, is to suggest that Newton’s law should be F=mv — which, in fact, many pre-Newtonian physicists once believed. Let’s not let a phib throw us back to the misguided science of the Middle Ages!

4. Not a Universal Mass-Giver

The phib implies that the Higgs field gives mass to all objects with mass, causing all of them to slow down. After all, if there were a universal “soup” found everywhere, then every object would encounter it. If it were true that the Higgs field acted on all objects in the same way — “universally”, similar to gravity, which pulls on all objects — then every object in our world would get its mass from the Higgs field.

But in fact, the Higgs field only generates the masses of the known elementary particles. More complex particles such as protons and neutrons — and therefore the atoms, molecules, humans and planets that contain them — get most of their mass in another way. The phib, therefore, can’t be right about how the Higgs field does its job.

5. Not Like a Substance

As is true of all fields, the Higgs field is not like a substance, in contrast to soup, molasses, or a crowd. It has no density or materiality, as soup would have. Instead, the Higgs field (like any field!) is more like a property of a substance.

As an analogue, consider air pressure (which is itself an example of an ordinary field). Air is a substance; it is made of molecules, and has density and weight. But air’s pressure is not a thing; it is a property of air, and is not itself a substance. Pressure has no density or weight, and is not made from anything. It just tells you what the molecules of air are doing.

The Higgs field is much more like air pressure than it is like air itself. It simply is not a substance, despite what the phib suggests.

6. Not Filling the Universe

The Higgs field does not “fill” the universe any more than pressure fills the atmosphere. Pressure is found throughout the atmosphere, yes, but it is not what makes the atmosphere full. Air is what constitutes the atmosphere, and is the only thing that can be said, in any sense, to fill it.

While a substance could indeed make the universe more full than it would otherwise be, a field of the universe is not a substance. Like the magnetic field or any other cosmic field, the Higgs field exists everywhere — but the universe would be just as empty (and just as full) if the Higgs field did not exist.

7. Not Merely By Its Presence

Finally, the phib doesn’t mention the thing that makes the Higgs field special, and that actually allows it to affect the masses of particles. This is not merely that it is present everywhere across the universe, but that it is, in a sense, “on.” To give you a sense of what this might mean, consider the wind.

On a day with a steady breeze, we can all feel the wind. But even when the wind is calm, physicists would say that the wind exists, though it is inactive. In the language I’m using here, I would say that the wind is something that can always be measured — it always exists — but

  • on a calm day it is “off” or “zero”, while
  • on a day with a steady breeze, it is “on” or “non-zero”.

In other words, the wind is always present, whether it is calm or steady; it can always be measured.

In rough analogy, the Higgs field, though switched on in our universe, might in principle have been off. A switched-off Higgs field would not give mass to anything. The Higgs field affects the masses of elementary particles in our universe only because, in addition to being present, it is on. (Physicists would say it has a “non-zero average value” or a “non-zero vacuum expectation value”.)

Why is it on? Great question. From the theoretical point of view, it could have been either on or off, and we don’t know why the universe arranged for the former.

Beyond the Higgs Phib

I don’t think we can really view a phib with so many issues as an acceptable pseudo-explanation. It causes more problems and confusions than it resolves.

But I wish it were as easy to replace the Higgs phib as it is to criticize it. No equally short story can do the job. If such a brief tale were easy to imagine, someone would have invented it by now.

Some years ago, I found a way to explain how the Higgs field works that is non-technical and yet correct — one that I would be happy to present to my professional physics colleagues without apology or embarrassment. (In fact, I did just that in my recent talks at the physics departments at Vanderbilt and Irvine.) Although I tried delivering it to non-experts in an hour-long talk, I found that it just doesn’t fit. But it did fit quite well in a course for non-experts, in which I had several hours to lay out the basics of particle physics before addressing the Higgs field’s role.

That experience motivated me to write a book that contains this explanation. It isn’t brief, and it’s not a light read — the universe is subtle, and I didn’t want to water the explanation down. But it does deliver what it promises. It first carefully explains what “elementary particles” and fields really are [here’s more about fields] and what it means for such a “particle” to have mass. Then it gives the explanation of the Higgs field’s effects — to the extent we understand them. (Readers of the book are welcome to ask me questions about its content; I am collecting Q&A and providing additional resources for readers on this part of the website.)

A somewhat more technical explanation of how the Higgs field works is given elsewhere on this website: check out this series of pages followed by this second series, with additional technical information available in this third series. These pages do not constitute a light read either! But if you are comfortable with first-year university math and physics, you should be able to follow them. Ask questions as need be.

Between the book, the above-mentioned series of webpages, and my answers to your questions, I hope that most readers who want to know more about the Higgs field can find the explanation that best fits their interests and background.

John BaezAgent-Based Models (Part 7)

Last time I presented a simple, limited class of agent-based models where each agent independently hops around a graph. I wrote:

Today the probability for an agent to hop from one vertex of the graph to another by going along some edge will be determined the moment the agent arrives at that vertex. It will depend only on the agent and the various edges leaving that vertex. Later I’ll want this probability to depend on other things too—like whether other agents are at some vertex or other. When we do that, we’ll need to keep updating this probability as the other agents move around.

Let me try to figure out that generalization now.

Last time I discovered something surprising to me. To describe it, let’s bring in some jargon. The conditional probability per time of an agent making a transition from its current state to a chosen other state (given that it doesn’t make some other transition) is called the hazard function of that transition. In a Markov process, the hazard function is actually a constant, independent of how long the agent has been in its current state. In a semi-Markov process, the hazard function is a function only of how long the agent has been in its current state.

For example, people like to describe radioactive decay using a Markov process, since experimentally it doesn’t seem that ‘old’ radioactive atoms decay at a higher or lower rate than ‘young’ ones. (Quantum theory says this can’t be exactly true, but nobody has seen deviations yet.) On the other hand, the death rate of people is highly non-Markovian, but we might try to describe it using a semi-Markov process. Shortly after birth it’s high—that’s called ‘infant mortality’. Then it goes down, and then it gradually increases.

We definitely want our agent-based models to have the ability to describe semi-Markov processes. What surprised me last time is that I could do it without explicitly keeping track of how long the agent has been in its current state, or when it entered its current state!

The reason is that we can decide which state an agent will transition to next, and when, as soon as it enters its current state. This decision is random, of course. But using random number generators we can make this decision the moment the agent enters the given state—because there is nothing more to be learned by waiting! I described an algorithm for doing this.
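In the special (Markov) case of constant hazard rates, deciding everything at entry amounts to sampling one exponential waiting time per outgoing transition and taking the minimum. Here is a minimal Python sketch of that special case only, not the general algorithm from last time; the names are my own:

```python
import random

def schedule_at_entry(hazards, arrival_time):
    """At the moment of entry, decide which outgoing transition fires and when.
    hazards: dict mapping each outgoing transition to its constant hazard rate.
    With constant hazards the waiting times are exponential, so the smallest
    sample determines both the winning transition and its firing time."""
    waits = {e: random.expovariate(rate) for e, rate in hazards.items()}
    e = min(waits, key=waits.get)
    return e, arrival_time + waits[e]
```

The key point survives in the general case: nothing about the agent’s future in its current state depends on information that arrives later, so the random decision can be drawn once, at entry.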

I’m sure this is well-known, but I had fun rediscovering it.

But today I want to allow the hazard function for a given agent to make a given transition to depend on the states of other agents. In this case, if some other agent randomly changes state, we will need to recompute our agent’s hazard function. There is probably no computationally feasible way to avoid this, in general. In some analytically solvable models there might be—but we’re simulating systems precisely because we don’t know how to solve them analytically.

So now we’ll want to keep track of the residence time of each agent—that is, how long it’s been in its current state. But William Waites pointed out a clever way to do this: it’s cheaper to keep track of the agent’s arrival time, i.e. when it entered its current state. This way you don’t need to keep updating the residence time. Whenever you need to know the residence time, you can just subtract the arrival time from the current clock time.

Even more importantly, our model should now have ‘informational links’ from states to transitions. If we want the presence or absence of agents in some state to affect the hazard function of some transition, we should draw a ‘link’ from that state to that transition! Of course you could say that anything is allowed to affect anything else. But this would create an undisciplined mess where you can’t keep track of the chains of causation. So we want to see explicit ‘links’.

So, here’s my new modeling approach, which generalizes the one we saw last time. For starters, a model should have:

• a finite set V of vertices or states,

• a finite set E of edges or transitions,

• maps u, d \colon E \to V mapping each edge to its source and target, also called its upstream and downstream,

• a finite set A of agents,

• a finite set L of links,

• maps s \colon L \to V and t \colon L \to E mapping each link to its source (a state) and its target (a transition).

All of this stuff, except for the set of agents, is exactly what we had in our earlier paper on stock-flow models, where we treated people en masse instead of as individual agents. You can see this in Section 2.1 here:

• John Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood, Evan Patterson, Compositional modeling with stock and flow models.

So, I’m trying to copy that paradigm, and eventually unify the two paradigms as much as possible.

But they’re different! In particular, our agent-based models will need a ‘jump function’. This says when each agent a \in A will undergo a transition e \in E if it arrives at the state upstream to that transition at a specific time t \in \mathbb{R}. This jump function will not be deterministic: it will be a stochastic function, just as it was in yesterday’s formalism. But today it will depend on more things! Yesterday it depended only on a, e and t. But now the links will come into play.

For each transition e \in E, there is a set of links whose target is that transition, namely

t^{-1}(e) = \{\ell \in L \; \vert \; t(\ell) = e \}

Each link \ell \in t^{-1}(e) will have one state v as its source. We say this state affects the transition e via the link \ell.

We want the jump function for the transition e to depend on the presence or absence of agents in each state that affects this transition.

Which agents are in a given state? Well, it depends! But those agents will always form some subset of A, and thus an element of 2^A. So, we want the jump function for the transition e to depend on an element of

\prod_{\ell \in t^{-1}(e)} 2^A = 2^{A \times t^{-1}(e)}

I’ll call this element S_e. And as mentioned earlier, the jump function will also depend on a choice of agent a \in A and on the arrival time of the agent a.
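Concretely, S_e can be stored as a dictionary from the links in t^{-1}(e) to the subset of agents currently sitting in each link’s source state. A minimal Python sketch (the function names are my own):

```python
def links_into(t_map, e):
    """The preimage t^{-1}(e): all links whose target is the transition e."""
    return [l for l, target in t_map.items() if target == e]

def compute_S_e(t_map, s_map, sigma, e):
    """S_e as a dict: link -> set of agents in that link's source state.
    t_map: link -> transition, s_map: link -> state, sigma: agent -> state."""
    return {l: frozenset(a for a, v in sigma.items() if v == s_map[l])
            for l in links_into(t_map, e)}
```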

So, we’ll say there’s a jump function j_e for each transition e, which is a stochastic function

j_e \colon A \times 2^{A \times t^{-1}(e)} \times \mathbb{R} \rightsquigarrow \mathbb{R}

The idea, then, is that j_e(a, S_e, t) is the answer to this question:

If at time t agent a arrived at the vertex u(e), and the agents at states linked to the edge e are described by the set S_e, when will agent a move along the edge e to the vertex d(e), given that it doesn’t do anything else first?

The answer to this question can keep changing as agents other than a move around, since the set S_e can keep changing. This is the big difference between today’s formalism and yesterday’s.

Here’s how we run our model. At every moment in time we keep track of some information about each agent a \in A, namely:

• Which vertex is it at now? We call this vertex the agent’s state, \sigma(a).

• When did it arrive at this vertex? We call this time the agent’s arrival time, \alpha(a).

• For each edge e whose upstream is \sigma(a), when will agent a move along this edge if it doesn’t do anything else first? Call this time T(a,e).

I need to explain how we keep updating these pieces of information (supposing we already have them). Let’s assume that at some moment in time t_i an agent makes a transition. More specifically, suppose agent \underline{a} \in A makes a transition \underline{e} from the state

\underline{v} = u(\underline{e}) \in V

to the state

\underline{v}' = d(\underline{e}) \in V.

At this moment we update the following information:

1) We set

\alpha(\underline{a}) := t_i

(So, we update the arrival time of that agent.)

2) We set

\sigma(\underline{a}) := \underline{v}'

(So, we update the state of that agent.)

3) We recompute the subset of agents in the state \underline{v} (by removing \underline{a} from this subset) and in the state \underline{v}' (by adding \underline{a} to this subset).

4) For every transition f that’s affected by the state \underline{v} or the state \underline{v}', and for every agent a in the upstream state of that transition, we set

T(a,f) := j_f(a, S_f, \alpha(a))

where S_f is the element of 2^{A \times t^{-1}(f)} saying which subset of agents is in each state affecting the transition f. (So, we update our table of times at which agent a will make the transition f, given that it doesn’t do anything else first.)

Now we need to compute the next time at which something happens, namely t_{i+1}. And we need to compute what actually happens then!

To do this, we look through our table of times T(a,e) for each agent a and all transitions out of the state that agent is in, and see which time is smallest. If there’s a tie, break it. Then we reset \underline{a} and \underline{e} to be the agent-edge pair that minimizes T(a,e).

5) We set

t_{i+1} := T(\underline{a},\underline{e})

Then we loop back around to step 1), but with i+1 replacing i.
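Steps 1)–5) can be collected into one small event loop. This is only a sketch of one possible implementation; all the function and variable names are my own, and step 3) is implicit in the sigma dictionary:

```python
def simulate(E, u, d, A, affected, jump, sigma, alpha, t_end):
    """Run the event loop described above.
    u, d: dicts mapping each transition to its upstream/downstream state.
    affected(v): the transitions whose hazard depends on state v.
    jump(a, e, sigma, t): stochastic jump function (may inspect sigma for S_e).
    sigma: agent -> current state; alpha: agent -> arrival time."""
    # Tentative firing times T(a, e) for each agent's outgoing transitions.
    T = {(a, e): jump(a, e, sigma, alpha[a])
         for a in A for e in E if u[e] == sigma[a]}
    while T:
        (a_, e_), t_next = min(T.items(), key=lambda kv: kv[1])  # step 5)
        if t_next > t_end:
            break
        alpha[a_] = t_next                   # step 1): update arrival time
        v_old = sigma[a_]
        sigma[a_] = d[e_]                    # step 2): update state
        # step 4): recompute times for transitions affected by either state
        for f in set(affected(v_old)) | set(affected(d[e_])):
            for a in A:
                if sigma[a] == u[f]:
                    T[(a, f)] = jump(a, f, sigma, alpha[a])
        # drop stale entries for a_ and schedule its new outgoing transitions
        T = {(a, e): s for (a, e), s in T.items() if u[e] == sigma[a]}
        for e in E:
            if u[e] == sigma[a_]:
                T[(a_, e)] = jump(a_, e, sigma, alpha[a_])
    return sigma
```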

Whew! I hope you followed that. If not, please ask questions.

Doug NatelsonThe future of the semiconductor industry, + The Mechanical Universe

 Three items of interest:

  • This article is a nice review of present semiconductor memory technology.  The electron micrographs in Fig. 1 and the scaling history in Fig. 3 are impressive.
  • This article in IEEE Spectrum is a very interesting look at how some people think we will get to chips for AI applications that contain a trillion (\(10^{12}\)) transistors.  For perspective, the processor in the laptop I'm using to write this has about 40 billion transistors.  (The article is nice, though the first figure commits the terrible sin of having no y-axis numbers or label; clearly it's supposed to represent exponential growth as a function of time in several different parameters.)
  • Caltech announced the passing of David Goodstein, renowned author of States of Matter and several books about the energy transition.  I'd written about my encounter with him, and I wanted to take this opportunity to pass along a working link to the youtube playlist for The Mechanical Universe.  While the animation can look a little dated, it's worth noting that when this was made in the 1980s, the CGI was cutting-edge work that was presented at SIGGRAPH.

April 15, 2024

John PreskillHow I didn’t become a philosopher (but wound up presenting a named philosophy lecture anyway)

Many people ask why I became a theoretical physicist. The answer runs through philosophy—which I thought, for years, I’d left behind in college.

My formal relationship with philosophy originated with Mr. Bohrer. My high school classified him as a religion teacher, but he co-opted our junior-year religion course into a philosophy course. He introduced us to Plato’s cave, metaphysics, and the pursuit of the essence beneath the skin of appearance. The essence of reality overlaps with quantum theory and relativity, which fascinated him. Not that he understood them, he’d hasten to clarify. But he passed along that fascination to me. I’d always loved dealing in abstract ideas, so the notion of studying the nature of the universe attracted me. A friend and I joked about growing up to be philosophers and—on account of not being able to find jobs—living in cardboard boxes next to each other.

After graduating from high school, I searched for more of the same in Dartmouth College’s philosophy department. I began with two prerequisites for the philosophy major: Moral Philosophy and Informal Logic. I adored those courses, but I adored all my courses.

As a sophomore, I embarked upon Dartmouth’s philosophy-of-science course. I was one of the course’s youngest students, but the professor assured me that I’d accumulated enough background information in science and philosophy classes. Yet he and the older students threw around technical terms, such as qualia, that I’d never heard of. Those terms resurfaced in the assigned reading, again without definitions. I struggled to follow the conversation.

Meanwhile, I’d been cycling through the sciences. I’d taken my high school’s highest-level physics course, senior year—AP Physics C: Mechanics and Electromagnetism. So, upon enrolling in college, I made the rounds of biology, chemistry, and computer science. I cycled back to physics at the beginning of sophomore year, taking Modern Physics I in parallel with Informal Logic. The physics professor, Miles Blencowe, told me, “I want to see physics in your major.” I did, too, I assured him. But I wanted to see most subjects in my major.

Miles, together with department chair Jay Lawrence, helped me incorporate multiple subjects into a physics-centric program. The major, called “Physics Modified,” stood halfway between the physics major and the create-your-own major offered at some American liberal-arts colleges. The program began with heaps of prerequisite courses across multiple departments. Then, I chose upper-level physics courses, a math course, two history courses, and a philosophy course. I could scarcely believe that I’d planted myself in a physics department; although I’d loved physics since my first course in it, I loved all subjects, and nobody in my family did anything close to physics. But my major would provide a well-rounded view of the subject.

From shortly after I declared my Physics Modified major. Photo from outside the National Academy of Sciences headquarters in Washington, DC.

The major’s philosophy course was an independent study on quantum theory. In one project, I dissected the “EPR paper” published by Einstein, Podolsky, and Rosen (EPR) in 1935. It introduced the paradox that now underlies our understanding of entanglement. But who reads the EPR paper in physics courses nowadays? I appreciated having the space to grapple with the original text. Still, I wanted to understand the paper more deeply; the philosophy course pushed me toward upper-level physics classes.

What I thought of as my last chance at philosophy evaporated during my senior spring. I wanted to apply to graduate programs soon, but I hadn’t decided which subject to pursue. The philosophy and history of physics remained on the table. A history-of-physics course, taught by cosmologist Marcelo Gleiser, settled the matter. I worked my rear off in that course, and I learned loads—but I already knew some of the material from physics courses. Moreover, I knew the material more deeply than the level at which the course covered it. I couldn’t stand the thought of understanding the rest of physics only at this surface level. So I resolved to burrow into physics in graduate school. 

Appropriately, Marcelo published a book with a philosopher (and an astrophysicist) this March.

Burrow I did: after a stint in condensed-matter research, I submerged up to my eyeballs in quantum field theory and differential geometry at the Perimeter Scholars International master’s program. My research there bridged quantum information theory and quantum foundations. I appreciated the balance of fundamental thinking and possible applications to quantum-information-processing technologies. The rigorous mathematical style (lemma-theorem-corollary-lemma-theorem-corollary) appealed to my penchant for abstract thinking. Eating lunch with the Perimeter Institute’s quantum-foundations group, I felt at home.

Craving more research at the intersection of quantum thermodynamics and information theory, I enrolled at Caltech for my PhD. As I’d scarcely believed that I’d committed myself to my college’s physics department, I could scarcely believe that I was enrolling in a tech school. I was such a child of the liberal arts! But the liberal arts include the sciences, and I ended up wrapping Caltech’s hardcore vibe around myself like a favorite denim jacket.

Caltech kindled interests in condensed matter; atomic, molecular, and optical physics; and even high-energy physics. Theorists at Caltech thought not only abstractly, but also about physical platforms; so I started to, as well. I began collaborating with experimentalists as a postdoc, and I’m now working with as many labs as I can interface with at once. I’ve collaborated on experiments performed with superconducting qubits, photons, trapped ions, and jammed grains. Developing an abstract idea, then nursing it from mathematics to reality, satisfies me. I’m even trying to redirect quantum thermodynamics from foundational insights to practical applications.

At the University of Toronto in 2022, with my experimental collaborator Batuhan Yılmaz—and a real optics table!

So I did a double-take upon receiving an invitation to present a named lecture at the University of Pittsburgh Center for Philosophy of Science. Even I, despite not being a philosopher, had heard of the cachet of Pitt’s philosophy-of-science program. Why on Earth had I received the invitation? I felt the same incredulity as when I’d handed my heart to Dartmouth’s physics department and then to a tech school. But now, instead of laughing at the image of myself as a physicist, I couldn’t see past it.

Why had I received that invitation? I did a triple-take. At Perimeter, I’d begun undertaking research on resource theories—simple, information-theoretic models for situations in which constraints restrict the operations one can perform. Hardly anyone worked on resource theories then, although they form a popular field now. Philosophers like them, and I’ve worked with multiple classes of resource theories by now.

More recently, I’ve worked with contextuality, a feature that distinguishes quantum theory from classical theories. And I’ve even coauthored papers about closed timelike curves (CTCs), hypothetical worldlines that travel backward in time. CTCs are consistent with general relativity, but we don’t know whether they exist in reality. Regardless, one can simulate CTCs, using entanglement. Collaborators and I applied CTC simulations to metrology—to protocols for measuring quantities precisely. So we kept a foot in practicality and a foot in foundations.

Perhaps the idea of presenting a named lecture on the philosophy of science wasn’t hopelessly bonkers. All right, then. I’d present it.

Presenting at the Center for Philosophy of Science

This March, I presented an ALS Lecture (an Annual Lecture Series Lecture, redundantly) entitled “Field notes on the second law of quantum thermodynamics from a quantum physicist.” Scientists formulated the second law in the early 1800s. It helps us understand why time appears to flow in only one direction. I described three enhancements of that understanding, which have grown from quantum thermodynamics and nonequilibrium statistical mechanics: resource-theory results, fluctuation theorems, and thermodynamic applications of entanglement. I also enjoyed talking with Center faculty and graduate students during the afternoon and evening. Then—being a child of the liberal arts—I stayed in Pittsburgh for half the following Saturday to visit the Carnegie Museum of Art.

With a copy of a statue of the goddess Sekhmet. She lives in the Carnegie Museum of Natural History, which shares a building with the art museum, from which I detoured to see the natural-history museum’s ancient-Egypt area (as Quantum Frontiers regulars won’t be surprised to hear).

Don’t get me wrong: I’m a physicist, not a philosopher. I don’t have the training to undertake philosophy, and I have enough work to do in pursuit of my physics goals. But my high-school self would approve—that self is still me.

Matt Strassler Update to the Higgs FAQ

Although I’ve been slowly revising the Higgs FAQ 2.0, this seemed an appropriate time to bring the Higgs FAQ on this website fully into the 2020’s. You will find the Higgs FAQ 3.0 here; it explains the basics of the Higgs boson and Higgs field, along with some of the wider context.

For deeper explanations of the Higgs field:

  • if you are comfortable with math, you can find this series of pages useful (but you will probably want to read this series first.)
  • if you would prefer to avoid the math, a full and accurate conceptual explanation of the Higgs field is given in my book.

Events: this week I am speaking Tuesday in Berkeley, CA; Wednesday in Seattle, WA (at Town Hall); and Thursday outside of Portland, OR (at the Powell’s bookstore in Cedar Hills). Click here for more details.

n-Category Café Semi-Simplicial Types, Part II: The Main Results

(Jointly written by Astra Kolomatskaia and Mike Shulman)

This is part two of a three part series of expository posts on our paper Displayed Type Theory and Semi-Simplicial Types. In this part, we cover the main results of the paper.

The Geometric Intuition

The central motivating definition of our paper is the following:

A semi-simplicial type $X$ consists of a type $X_0$ together with, for every $x : X_0$, a displayed semi-simplicial type over $X$.

The purpose of the 100+ pages is to formulate a type theory in which we can make sense of this as a kind of “coinductive definition”. The key is to figure out what “displayed semi-simplicial type over $X$” should mean. Intuitively, it should be an “indexed” reformulation of a morphism of semi-simplicial types $Y \to X$, but how do we do that?

The idea behind our new type theory, Displayed Type Theory (dTT), is that if $X : \mathsf{Type}$ is any notion of mathematical object (such as a group, category, or semi-simplicial type), then there should exist a notion of displayed elements of $X$ living over the elements of $X$. Thus, for example, if $\mathcal{C} : \mathsf{Cat}$ is a category, then there should be a type $\mathsf{Cat^d}\;\mathcal{C}$ of categories displayed over $\mathcal{C}$, or alternatively dependent categories over $\mathcal{C}$. The particular case of displayed categories was introduced by Ahrens and Lumsdaine in an eponymous 2019 paper, and the idea has since been generalized to bicategories by Ahrens, Frumin, Maggesi, Veltri, and van der Weide; dTT posits that such displayed structures exist for any kind of mathematical object, with a definition that can be derived algorithmically from the definition of the object.

We will return to this below; but first, as suggested above, we explain how such a notion enables a coinductive definition of semi-simplicial types. Indeed, we say that a semi-simplicial type $A$ consists of the following: First, $A$ defines a type $\mathsf{Z}\;A : \mathsf{Type}$, called the $0$-simplices of $A$. Second, each $0$-simplex $x : \mathsf{Z}\;A$ defines a semi-simplicial type $\mathsf{S}\;A\;x : \mathsf{SST^d}\;A$ displayed over $A$, called the slice of $A$ over $x$.

We can phrase this in Agda-esque syntax for dTT as follows:

codata SST : Type where
  Z : SST → Type
  S : (X : SST) → Z X → SSTᵈ X

To see why this definition is correct, we have to understand what, in general, a semi-simplicial type $B$ displayed over $A$ consists of. The first part of this answer is that $B$ defines a type of $0$-simplices of $B$ displayed over $0$-simplices of $A$, i.e. a family $\mathsf{Z^d}\;B : \mathsf{Z}\;A \to \mathsf{Type}$. Then, every displayed $0$-simplex $z : \mathsf{Z^d}\;B\;y$ over a $0$-simplex $y : \mathsf{Z}\;A$ should define $\mathsf{S^d}\;B\;y\;z$, a doubly displayed semi-simplicial type, for whatever this means.

Now, instead of working out the definition of a doubly dependent semi-simplicial type, let’s circle back and think geometrically. Semi-simplicial types should have families of $n$-simplex types. If $A : \mathsf{SST}$, then we write that

\begin{aligned} &A_0 : \mathsf{Type} \\ &A_0 \equiv \mathsf{Z}\;A \end{aligned}

is the type of $0$-simplices in $A$. Similarly, if $B : \mathsf{SST^d}\;A$ is a semi-simplicial type displayed over $A$, then for $y : A_0$, we write

\begin{aligned} &B_0 : A_0 \to \mathsf{Type} \\ &B_0\;y \equiv \mathsf{Z^d}\;B\;y \end{aligned}

for the type of $0$-simplices of $B$ displayed over the $0$-simplex $y$ of $A$. Putting this together, if we have two $0$-simplices $x_{01},\;x_{10} : A_0$ of $A$, then we may form

\begin{aligned} &A_1 : (x_{01} : A_0)\;(x_{10} : A_0) \to \mathsf{Type} \\ &A_1\;x_{01}\;x_{10} \equiv \mathsf{Z^d}\;(\mathsf{S}\;A\;x_{01})\;x_{10}, \end{aligned}

which is the type of $1$-simplices in $A$ joining $x_{01}$ to $x_{10}$.

It therefore stands to reason that $B$ should have a type of dependent $1$-simplices living over the $1$-simplices of $A$. Thus if $\beta_{11} : A_1\;y_{01}\;y_{10}$, then given dependent endpoints $z_{01} : B_0\;y_{01}$ and $z_{10} : B_0\;y_{10}$, we should get a type $B_1\;y_{01}\;z_{01}\;y_{10}\;z_{10}\;\beta_{11}$. The formula for this happens to take the following form:

\begin{aligned} &B_1 : (y_{01} : A_0)\;(z_{01} : B_0\;y_{01})\;(y_{10} : A_0)\;(z_{10} : B_0\;y_{10})\;(\beta_{11} : A_1\;y_{01}\;y_{10}) \to \mathsf{Type} \\ &B_1\;y_{01}\;z_{01}\;y_{10}\;z_{10}\;\beta_{11} \equiv \mathsf{Z^{dd}}\;(\mathsf{S^d}\;B\;y_{01}\;z_{01})\;y_{10}\;z_{10}\;\beta_{11}, \end{aligned}

which mirrors the formula for $A_1$.

Then, putting all of this together again, if we have a $0$-simplex $x_{001} : A_0$, then we take $B \equiv \mathsf{S}\;A\;x_{001}$. For $x_{010} : A_0$, we have that $B_0\;x_{010} \equiv \mathsf{Z^d}\;(\mathsf{S}\;A\;x_{001})\;x_{010} \equiv A_1\;x_{001}\;x_{010}$. We thus get that:

\begin{aligned} &A_2 : (x_{001} : A_0)\;(x_{010} : A_0)\;(\beta_{011} : A_1\;x_{001}\;x_{010})\;(x_{100} : A_0) \\ &\quad\quad(\beta_{101} : A_1\;x_{001}\;x_{100})\;(\beta_{110} : A_1\;x_{010}\;x_{100}) \to \mathsf{Type} \\ &A_2\;x_{001}\;x_{010}\;\beta_{011}\;x_{100}\;\beta_{101}\;\beta_{110} \equiv \mathsf{Z^{dd}}\;(\mathsf{S^d}\;(\mathsf{S}\;A\;x_{001})\;x_{010}\;\beta_{011})\;x_{100}\;\beta_{101}\;\beta_{110} \end{aligned}

In general, this pattern continues in higher dimensions, and the process described lets us extract $n$-simplex types.

We can visualise what’s going on in two different ways. The first visualisation shows how the $n$-simplices of the slice of $A$ over $x$ live dependently over simplices of $A$.


For example, if $z_1$ is a $0$-simplex of the slice of $A$ over $x$ displayed over the $0$-simplex $y_1$ of $A$, then $z_1$ is a $1$-simplex of $A$ joining $x$ to $y_1$. Similarly, suppose $z_{01}$ and $z_{10}$ are $0$-simplices of the slice of $A$ over $x$ displayed over $y_{01}$ and $y_{10}$, respectively, and $\beta_{11}$ is a $1$-simplex of $A$ joining $y_{01}$ to $y_{10}$. Then if $\gamma_{11}$ is a $1$-simplex of the slice of $A$ over $x$ displayed over $\beta_{11}$ and joining $z_{01}$ to $z_{10}$, then $\gamma_{11}$ is a $2$-simplex of $A$ with the specified boundary.

Geometrically, we are using the fact that the $n$-simplex is the cone of the $(n-1)$-simplex. Thus, with the above definition, assuming we know inductively that every semi-simplicial type has a type of $(n-1)$-simplices, then for every $0$-simplex $x$, the displayed semi-simplicial type $\mathsf{S}\;A\;x$ has a type of “displayed $(n-1)$-simplices” over this. Such a displayed $(n-1)$-simplex depends on $x$ (the cone vertex) as well as on an $(n-1)$-simplex of $A$ (the base face, opposite the cone vertex), and thus can be viewed geometrically as an $n$-simplex.

The second visualisation explains our formulas in terms of iterated slicing.


Note that, to form each successive slice, you have to provide $2^n$ simplex data points. The true dependent $n$-simplices of a slice may then be viewed as matching objects.
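As a concrete check on the binary-string indexing used above, here is a short Python sketch (the helper name is ours, not from the paper) that enumerates the boundary data of an $n$-simplex: all nonzero binary strings of length $n+1$ except the top string, recovering exactly the arguments of $A_1$ and $A_2$.

```python
from itertools import product

def boundary_labels(n):
    """Boundary data of an n-simplex in the binary-string indexing:
    all nonzero binary strings of length n+1 except the top string 1...1."""
    labels = ("".join(bits) for bits in product("01", repeat=n + 1))
    return [s for s in labels if "1" in s and s != "1" * (n + 1)]

# A 1-simplex has two endpoints; a 2-simplex has the six arguments of A_2,
# in the same order in which they appear in the formula above.
print(boundary_labels(1))       # ['01', '10']
print(boundary_labels(2))       # ['001', '010', '011', '100', '101', '110']
print(len(boundary_labels(3)))  # 2^4 - 2 = 14 boundary cells of a 3-simplex
```

Lexicographic order on the labels reproduces the argument order of the $A_2$ formula, which is one way to see why this indexing is convenient.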

The Homotopical Problem of Constructing a Semi-Simplicial Classifier

Now, to motivate the construction of “display”, let us return to classical homotopy theory. Suppose that we are working in $\mathcal{C}$, some setting for homotopy theory. This could be a model category or a fibration category, which we usually think of as representing a Grothendieck $\infty$-topos. In our paper (section 4.1) we refer to the objects of $\mathcal{C}$ as contexts, denoted $\Gamma\;\mathsf{ctx}$, the fibrations over a context $\Gamma$ as types, denoted $\gamma : \Gamma \vdash A\;\gamma\;\mathsf{type}$, and sections of a type $A$ as terms, denoted by $\gamma : \Gamma \vdash t\;\gamma : A\;\gamma$.

We assume that this setting has an object classifier. This means that in any context $\Gamma$, there is $\gamma : \Gamma \vdash \mathsf{Type}\;\gamma\;\mathsf{type}$, as well as $\gamma : \Gamma,\;A : \mathsf{Type}\;\gamma \vdash \mathsf{El}\;A\;\gamma\;\mathsf{type}$. This has the property that any “small” fibration $\gamma : \Gamma \vdash A\;\gamma\;\mathsf{type}$ gives rise to a code in the universe $\gamma : \Gamma \vdash \mathsf{Code}\;A\;\gamma : \mathsf{Type}\;\gamma$, such that the pullback of the $\mathsf{El}$ fibration along that section exactly yields the type $A$, that is: $\gamma : \Gamma \vdash \mathsf{El}\;(\mathsf{Code}\;A)\;\gamma \equiv A\;\gamma$.

We then consider the problem of constructing a classifier for semi-simplicial diagrams. Specifically, we are interested in Reedy fibrant semi-simplicial diagrams, which are the homotopical counterpart of the indexed formulation in syntax. Thus, such a classifier would consist of a generic fibration in the empty context $\cdot \vdash \mathsf{SST}\;\mathsf{type}$, along with a simplicial diagram tower of the form:

\begin{aligned}
& A : \mathsf{SST} \vdash \mathsf{El}_0\;A\;\mathsf{type} \\
& A : \mathsf{SST},\; a_{01} : \mathsf{El}_0\;A,\; a_{10} : \mathsf{El}_0\;A \vdash \mathsf{El}_1\;A\;a_{01}\;a_{10}\;\mathsf{type} \\
& A : \mathsf{SST},\; a_{001} : \mathsf{El}_0\;A,\; a_{010} : \mathsf{El}_0\;A,\; a_{011} : \mathsf{El}_1\;A\;a_{001}\;a_{010}, \\
&\quad\quad a_{100} : \mathsf{El}_0\;A,\; a_{101} : \mathsf{El}_1\;A\;a_{001}\;a_{100},\; a_{110} : \mathsf{El}_1\;A\;a_{010}\;a_{100} \\
&\quad\quad \vdash \mathsf{El}_2\;A\;a_{001}\;a_{010}\;a_{011}\;a_{100}\;a_{101}\;a_{110}\;\mathsf{type} \\
&\ldots
\end{aligned}

such that for any “small” simplicial diagram data over a context $\Gamma$, this data arises uniquely as the appropriate series of pullbacks constructed from some term $\gamma : \Gamma \vdash A\;\gamma : \mathsf{SST}$.

Note that, stated in this way, this is an infinitary or non-elementary universal property: it refers to infinite diagrams indexed by the external set of natural numbers (as opposed to any internal natural-numbers object that may exist in $\mathcal{C}$). The problem of defining semi-simplicial types can roughly be thought of as one of giving a finitary universal property for such an object, so that it could be characterized and even constructed in a finitary syntactic type theory.

A Finitary Universal Property for the Classifier

We have not, strictly speaking, solved this problem as originally stated. Indeed, we suspect that the classifier $\mathsf{SST}$ does not, on its own, have a finitary universal characterisation. However, we discovered that there is an enhancement of it that does have a finitary universal property, if we change the setting $\mathcal{C}$ in which we were working to the augmented semi-simplicial diagram model $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$. Then, there is a universally characterised diagram $\mathsf{SST}$ in $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$, of which the desired classifier in $\mathcal{C}$ is the discrete part, denoted $\lozenge\;\mathsf{SST}$.

If we playfully refer to the problem of categorically constructing a classifier for semi-simplicial objects as answering the question “what is a triangle?”, then we discovered that $\mathsf{SST}$ is naturally characterised in the model $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$, which is the setting for homotopy theory in which “everything is an augmented triangle”. In a sense, then, $\mathsf{SST}$ is the “augmented triangle of triangles”.

Intuitively, by working in the setting of infinitely coherent diagrams, the desired augmented diagram of diagrams can be assembled in a way that accounts for all the coherences. At first this may appear to be a convenient but inessential strengthening of the inductive hypothesis. However, passing to the world of augmented diagrams seems to play a much more essential role than this, because the finitary universal characterisation of $\mathsf{SST}$ uses properties of augmented diagrams in an essential way.

Specifically, the world of augmented diagrams is the place where we can make sense of the general notion of a “displayed element” of a type. This happens through the existence of an operation known as décalage, which is a kind of backwards shift operation. It’s very classical in homotopy theory (you can read much more about it starting on the nLab), but the basic idea is just that if you take a (fibred) semi-simplicial diagram $X$ and throw away the bottom object and the last face operators in each dimension, and relabel, you get another semi-simplicial diagram. In other words, on objects we have $(X^{\mathsf{D}})_n = X_{n+1}$. The face operators that we threw away now assemble into a map of semi-simplicial diagrams $\rho_X : X^{\mathsf{D}} \to X$.

If we convert this fibred formulation to an indexed one, this means that any object $X$ comes with an object $X^{\mathsf{d}}$ over itself, such that $X^{\mathsf{D}} \cong \sum_{x:X} X^{\mathsf{d}}\;x$. In practice, we keep the fibred formulation for contexts, but use the indexed version for types, and call it display. This gives the following rule:
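To make the shift concrete, here is a toy Python sketch of décalage on a semi-simplicial set presented combinatorially; the names and the representation are ours, and the augmented setting is elided.

```python
# A semi-simplicial set is modelled as a dict mapping dimension n to its list
# of n-simplices, each a tuple of vertices, with faces given by deleting vertices.

def decalage(X):
    """(X^D)_n = X_{n+1}: throw away the bottom level and reindex."""
    return {n: X[n + 1] for n in range(max(X))}

def rho(simplex):
    """The comparison map rho_X : X^D -> X, assembled from the discarded
    last face operators: here, drop the last vertex."""
    return simplex[:-1]

# The full 2-simplex on vertices 0, 1, 2:
X = {0: [(0,), (1,), (2,)],
     1: [(0, 1), (0, 2), (1, 2)],
     2: [(0, 1, 2)]}

XD = decalage(X)
print(XD[0])           # the edges of X become the 0-simplices of X^D
print(rho((0, 1, 2)))  # rho sends the top simplex to its last face (0, 1)
```

Summing the fibres of rho over a fixed base simplex recovers the indexed picture $X^{\mathsf{D}} \cong \sum_{x:X} X^{\mathsf{d}}\;x$ described above.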

\frac{\gamma : \Gamma \vdash A\;\gamma\;\mathsf{type}}{\gamma^+ : \Gamma^{\mathsf{D}},\; a : A\;(\rho_\Gamma\;\gamma^+) \vdash A^{\mathsf{d}}\;\gamma^+\;a\;\mathsf{type}}

The fact that display alters the context is the source of much technical difficulty, which we deal with in the paper using a modal type theory. We have two modes, the simplicial one corresponding to $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$, and the discrete one corresponding to $\mathcal{C}$ itself. Then we have three basic modalities. The first, denoted $\lozenge$, picks out the $(-1)$-simplices of an augmented semi-simplicial diagram, mapping the simplicial mode to the discrete mode. The second, denoted $\triangle$, builds a constant augmented semi-simplicial diagram, mapping the discrete mode to the simplicial mode. And the third, denoted $\Box$, takes the limit of a semi-simplicial diagram, mapping the simplicial mode to the discrete mode. Then décalage and display act only on types at the simplicial mode, but we can avoid décalaging the context if it comes from the discrete mode, giving a rule something like:

\frac{\gamma : \triangle\Gamma \vdash A\;\gamma\;\mathsf{type}}{\gamma : \triangle\Gamma,\; a : A\;\gamma \vdash A^{\mathsf{d}}\;\gamma\;a\;\mathsf{type}}

To be more precise, here $\triangle\Gamma$ refers to a “modal context lock”, and we actually allow part of the context to be simplicial and get décalaged. However, for the present we can ignore these modal issues and just work in the empty context $(\,)$, for which we have $(\,)^{\mathsf{D}} \equiv (\,)$. Ignoring this and other subtleties involving telescopes, we can now state more formally the universal property of $\mathsf{SST}$. Suppose that $Y$ is a type in the empty context (at the simplicial mode), a.k.a. a “closed type”. We define an endofunctor on closed types (at the simplicial mode) by:

\mathsf{F}(Y) \equiv \sum_{\upsilon : Y} \sum_{A : \mathsf{Type}} (A \to Y^{\mathsf{d}}\;\upsilon).

This endofunctor comes with an evident copointing $\epsilon_Y : \mathsf{F}(Y) \to Y$ by way of projection. Our proposed characterization of $\mathsf{SST}$ is that it is a terminal (copointed) coalgebra of the copointed endofunctor $(\mathsf{F},\,\epsilon)$. Thus, it is the universal object equipped with a map

\mathsf{SST} \to \sum_{X : \mathsf{SST}} \sum_{A : \mathsf{Type}} (A \to \mathsf{SST}^{\mathsf{d}}\;X)

whose first component is the identity. What remains, therefore, is two components $\mathsf{Z} : \mathsf{SST} \to \mathsf{Type}$ and $\mathsf{S} : (X : \mathsf{SST}) \to \mathsf{Z}\;X \to \mathsf{SST}^{\mathsf{d}}\;X$, exactly as proposed above. Once again, this corresponds to the following Agda-esque dTT code:

codata SST : Type where
  Z : SST → Type
  S : (X : SST) → Z X → SSTᵈ X


The model structure on $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$ is well known in the literature as the Reedy model structure. However, in the dTT paper, we present an explicit construction in the case of the augmented semi-simplex category. This presentation defines the relevant concepts mutually with décalage and display, and allows for definitionally strict commutation laws for display without the use of coherence theorems. These computation laws, for example, say that:

\mathsf{Type}^{\mathsf{d}}\;A \equiv A \to \mathsf{Type}

((a : A) \to B\;[\,a\,])^{\mathsf{d}}\;f \equiv (a : A^\rho)\,(a' : A^{\mathsf{d}}\;a) \to B^{\mathsf{d}}\;[\,a,\,a'\,]\;(f\;a)

These laws provide the promised algorithmic computation of displayed structures in terms of ordinary ones. If X denotes a kind of mathematical structure, the first law above says that for any type appearing in an X, a displayed X has a type dependent on that type, and the second says that similarly any function appearing in an X is tracked in a displayed X by a function lying above it. Thus, for instance, if $C$ is a category with object set $C_0$ and homs $\hom_C : C_0 \to C_0 \to \mathsf{Type}$, a displayed category has a dependent family of objects $D_0 : C_0 \to \mathsf{Type}$ and dependent hom-types lying over those of $C$, with composition and identity operations lying over those of $C$, and so on, exactly as in the original definition of displayed category.

However, these are also the classical observational computation laws for “unary parametricity”! These say that in a unary relational model, computability witnesses for a type are predicates on the type, and computability witnesses for a function $f$ are constructions that transform computability witnesses of an input $a$ into computability witnesses of the corresponding output $f\;a$. This basic structure is what underlies, for example, the normalisation proof for the STLC in Pierce’s Types and Programming Languages.

Consider the type of polymorphic identity functions. We have that:

((A : \mathsf{Type}) \to A \to A)^{\mathsf{d}}\;\mathsf{id} \equiv (A : \mathsf{Type})\,(P : A \to \mathsf{Type})\,(a : A) \to P\;a \to P\;(\mathsf{id}\;a)

Hence a computability witness of a polymorphic identity function $\mathsf{id}$ is a proof that $\mathsf{id}$ preserves arbitrary predicates. In displayed type theory, then, we have (with some abuse of notation for the modalities that we are ignoring):

id-thm : (f : △□ ((A : Type) → A → A)) (A : Type) (a : A) → f A a ≡ a
id-thm (△ (□ f)) A a = fᵈ A (λ x → x ≡ a) a refl

Thus any closed term of polymorphic identity function type is indeed an identity function at all types.

Parametricity means that, for example, the simplicial mode of dTT is incompatible with the existence of features such as universal decidable equality. Indeed, suppose that we had decidable equality on the universe. Then one could construct a bad polymorphic identity function which is everywhere the identity function, except at the type $\mathsf{Nat}$, where it is the constant function at $2$. Such a construction would violate the theorem above. Indeed, the spirit of most parametricity-violating results is either classical or non-constructive. This is because, morally, if you do everything constructively, then any definition that you write down should be natural and respect all relations.

Using the Universal Property

Note that our definition/construction of $\mathsf{SST}$ yields a type at the simplicial mode, i.e. an object of $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$. But the classical homotopy theorist is interested in our original homotopy theory $\mathcal{C}$. Thus, recalling that $\lozenge$ picks out the $(-1)$-simplices of an augmented semi-simplicial diagram, from the classical perspective the question of interest in working with semi-simplicial types is:

Given a context $\Gamma$ in the model being studied (i.e. the discrete mode), construct a term $\gamma : \Gamma \vdash A\;\gamma : \lozenge\;\mathsf{SST}$ representing the desired semi-simplicial type.

But in order to solve this problem, we have to work in the simplicial mode, where the universal property is stated. This means that we have to find a context $\widetilde{\Gamma}$ at the simplicial mode such that $\lozenge\;\widetilde{\Gamma} \equiv \Gamma$, i.e. to extend $\Gamma$ to a diagram of which it is the discrete part. Doing so is always possible, as we may consider $\triangle\;\Gamma$, which extends $\Gamma$ coskeletally. However, the issue is that any categorical universal properties that held of $\Gamma$ in the discrete mode will not necessarily continue to hold of $\triangle\;\Gamma$ as a diagram; in particular, any properties of $\Gamma$ that are expressed in the syntax of type theory vanish. It is thus necessary to extend $\Gamma$ to a diagram in a bespoke way, such that the relevant properties that hold of $\Gamma$ discretely continue to hold of $\widetilde{\Gamma}$ simplicially.

Fortunately, every construction of constructive intensional type theory (e.g. \Pi-types, universes, and inductive types) makes sense for diagrams. Thus, metatheoretically, if \Gamma is a purely syntactic entity, one can construct \widetilde{\Gamma} by structural recursion on the definition of \Gamma, replacing all discrete syntax with its simplicial counterpart. However, from the perspective of the homotopy theorist, the model category can have many semantic entities not captured by the syntax of type theory. For example, the semantics of dTT are compatible with the model at the discrete mode having universal decidable equality. If \Gamma were to use such a parametricity-violating construction in an essential way in its definition, the obstacle would be that \Gamma could not be lifted to a \widetilde{\Gamma} reflecting all of the same categorical properties.

Parametricity thus serves as a kind of filter regarding what categorical universal properties of \Gamma can be invoked when applying the universal property of semi-simplicial types. The categorical meaning of parametricity is then the question of which universal constructions in an arbitrary model category \mathcal{C} exist in the diagram model \mathcal{C}^{\Delta^{\mathrm{op}}_+}.

Simplices, cubes, and symmetry

It is a curious fact that to give a universal property to the type \mathsf{SST} of semi-simplicial types, we find ourselves needing to work in the model \mathcal{C}^{\Delta^{\mathrm{op}}_+} whose objects are augmented semi-simplicial types in \mathcal{C} — curious that the two are closely related and yet not identical. It’s possible there is something deeper going on here, but one explanation is that this is a coincidence arising from another coincidence: the fact that the augmented semi-simplex category coincides with the unary semi-cube category.

Normally when we think of a cubical set, we think of cubes as powers of an interval object that has two endpoints. Thus, an n-dimensional cube has 2n faces of dimension (n-1), arising by choosing one of the n dimensions and then an endpoint in that dimension. However, essentially the same formal construction works for an “interval” having k endpoints for any natural number k. We can visualize the case k=3, for instance, as consisting of powers of an interval with its midpoint also distinguished along with its endpoints, so that a 2-cube looks like a window subdivided into four panes.

In the case k=1, it turns out that the faces of a “unary” n-cube have exactly the same combinatorial structure as those of an augmented (n-1)-simplex. We can even see this geometrically: the unary cubes can be thought of as powers of a half-open interval [0,1) or ray [0,\infty) with only one endpoint, and there is a face-respecting embedding of the (n-1)-simplex in the first n-orthant [0,\infty)^n as \{ (x_1,\dots,x_n) \mid x_1+\cdots+x_n = 1\}. Thus, we can equivalently think of dTT as having semantics in the model of unary semi-cubical diagrams.

This point of view is also suggested by the connection to parametricity, where one can also consider k-ary parametricity for any k. Indeed, binary parametricity seems to be more useful than unary parametricity: many of the “free theorems” arising from parametricity require the binary version. (Nullary parametricity is also possible, and is closely related to nominal sets.) Ordinary internal k-ary parametricity has semantics in k-ary cubical diagrams (having degeneracies as well as faces), and one might expect that there is an analogous sort of “k-ary dTT” with semantics in k-ary semi-cubical diagrams.

For those having experience with cubical type theories, we should emphasize that unlike the cubes often used there, these cubes do not have diagonals or connections. This is essential to get the correct behavior of \Pi-types, for instance. (However, cubes without diagonals and connections were used in the first BCH cubical model of homotopy type theory, and also in the more recent Higher Observational Type Theory.)

Another question this perspective emphasizes is the presence or absence of symmetries. Here the simplest sort of symmetry in dTT would be an isomorphism A^{\mathsf{dd}}\,x\,y_0\,y_1 \cong A^{\mathsf{dd}}\,x\,y_1\,y_0. Our current theory does not admit such operations, in contrast to internal parametricity and cubical type theories, where they have been found essential; the modal guards on the display operation allow us to omit them. This makes certain things easier, such as our explicit construction of the diagram model \mathcal{C}^{\Delta^{\mathrm{op}}_+}, but may ultimately be a limitation: in particular, it seems that we cannot give a useful universal property to \mathsf{SST}^{\mathsf{d}} without symmetries.

Conclusion and Vistas

In conclusion, what we achieve is to specify a new type theory, displayed type theory (dTT), which contains a unary parametricity operator (-)^\mathsf{d} that behaves “observationally” (computes on type-formers), and which is “modally guarded” so that it can only be applied to types in the empty context, or more generally in a modal context. Inside this type theory, we specify a general notion of “displayed coinductive type”, of which our proposed definition of \mathsf{SST} is an instance.

We then construct, from any model of type theory having limits of countable towers of fibrations, a model of dTT, and show that in this model displayed coinductive types are modeled by terminal coalgebras of copointed endofunctors, which can be constructed as countable inverse limits. Moreover, we identify the corresponding tower for \mathsf{SST} as consisting of the classifiers for n-truncated semi-simplicial types, so that the limit object \mathsf{SST} is, indeed, a classifier of semi-simplicial types.

Intriguingly, this leaves open the question of models of dTT and \mathsf{SST} without countable limits. We conjecture that there are models of dTT, perhaps obtained from realizability, in which an object satisfying our universal property for \mathsf{SST} exists, but is not an external classifier of semi-simplicial types. Instead, we expect it should be a classifier of “internal” or “uniform” semi-simplicial types, which is exactly what one would hope to be able to talk about in a realizability model. If this were true, then our universal property for semi-simplicial types would be more general than the classical external infinitary one, with possible implications for the notion of “elementary (\infty,1)-topos”.

n-Category Café Semi-Simplicial Types, Part I: Motivation and History

(Jointly written by Astra Kolomatskaia and Mike Shulman)

This is part one of a three-part series of expository posts on our paper Displayed Type Theory and Semi-Simplicial Types. In this part, we motivate the problem of constructing SSTs and recap its history.

A Prospectus

There are different ways to describe the relationship between type theory and set theory, but one analogy views set theory as like machine code on a specific CPU architecture, and type theory as like a high level programming language. From this perspective, set theory has its place as a foundation because almost any structure that one thinks about can be encoded through a series of representation choices. However, since the underlying reasoning of set theory is untyped, it can violate the principle of equivalence. Thus, for example, there is no guarantee that theorems proved in set theory about groups automatically translate to theorems about group objects internal to a category.

Within the programming language analogy, one can fully define a high level programming language and its operational semantics without specifying any particular compiler or any concept of a CPU architecture. Similarly, type theory allows one to reason with concepts defined in a purely operational, as opposed to representational, manner. The goal of type theory is to create expressive and semantically general languages for reasoning about mathematics.

Homotopy Type Theory (HoTT) is a perspective on intensional dependent type theory which regards types as homotopical spaces. In HoTT, one is only allowed to speak of concepts “up to homotopy”. This feature allows one to interpret HoTT into any \infty-topos. This is a fascinating state of affairs, because, in general, the constructions of higher category theory, among all those in mathematics, are the ones that sit least comfortably in a set-theoretic foundation. Thus, much of the excitement about HoTT has involved its promise to provide a language capable of reasoning about higher structures.

So far, however, the type theories used for HoTT have been limited in the generality of the higher structures they can discuss. With types as homotopical spaces, structures defined using a finite number of these and maps between them can be represented. For instance, the language of HoTT has been great for formulating 1-category theory, and there exist large formalised libraries, such as the 1lab, with such results; a lot of abstract homotopy theory turns out to be doable in this way as well, sometimes by using wild categories. But 1-categories and wild categories have only two layers of structure, objects and morphisms, while we would hope also to reason internally about structures that have infinite towers of layered structure, such as \infty-categories. However, such structures have thus far resisted all attempts at definition!

One simple case of such an infinitary structure is a semi-simplicial type (SST). This is particularly important because many notions of classical higher category theory are traditionally formulated using simplicial or semi-simplicial objects. Thus, if we had a tractable approach to SSTs in HoTT, we could expect that many, if not all, other infinitary structures could also be encoded. This is one reason that the problem of defining SSTs, which was originally proposed by Vladimir Voevodsky over a decade ago, has become one of the most important open problems in Homotopy Type Theory.

SSTs: The Fibred Perspective

To explain the problem of defining SSTs, we start with a classical perspective grounded first in set theory and later in homotopy theory. A semi-simplicial set is defined to consist of sets X_n for n \geq 0, along with face maps \partial_k : X_n \to X_{n-1}, for k \in \{ 0, \ldots, n \}, satisfying the relations: \partial_k \circ \partial_l = \partial_{l-1} \circ \partial_k \quad \text{for} \quad k \lt l. One thinks of X_n as the set of n-simplices, and of the face maps as giving the boundary components of a given n-simplex. For example, X_0 is the set of points, X_1 is the set of lines, and X_2 is the set of triangles. A triangle has three boundary lines which share three boundary points in common.
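The 2-truncated case of this definition can be transcribed directly as a record; here is a sketch in Lean 4 (the field names `d`, `e`, `rel` are ours):

```lean
-- A 2-truncated semi-simplicial set: points, lines, and triangles,
-- with face maps satisfying ∂ₖ ∘ ∂ₗ = ∂_{l-1} ∘ ∂ₖ for k < l.
structure SemiSimplicialSet₂ where
  X₀ : Type                       -- points
  X₁ : Type                       -- lines
  X₂ : Type                       -- triangles
  d₀ : X₁ → X₀                    -- ∂₀ on lines
  d₁ : X₁ → X₀                    -- ∂₁ on lines
  e₀ : X₂ → X₁                    -- ∂₀ on triangles
  e₁ : X₂ → X₁                    -- ∂₁ on triangles
  e₂ : X₂ → X₁                    -- ∂₂ on triangles
  rel₀₁ : ∀ f : X₂, d₀ (e₁ f) = d₀ (e₀ f)   -- ∂₀∂₁ = ∂₀∂₀
  rel₀₂ : ∀ f : X₂, d₀ (e₂ f) = d₁ (e₀ f)   -- ∂₀∂₂ = ∂₁∂₀
  rel₁₂ : ∀ f : X₂, d₁ (e₂ f) = d₁ (e₁ f)   -- ∂₁∂₂ = ∂₁∂₁
```

Note that in Lean the `rel` fields live in the proof-irrelevant `Prop`, which sidesteps exactly the coherence problem the post goes on to describe for homotopy types.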

The problem of constructing semi-simplicial types can intuitively be thought of as the problem of constructing semi-simplicial sets, but with homotopy types replacing sets. Here, the key is that the semi-simplicial identities from before would now read: \alpha_{k,\,l} : \partial_k \circ \partial_l \simeq \partial_{l-1} \circ \partial_k \quad \text{for} \quad k \lt l. Thus, we have replaced the strict set-theoretic notion of equality with a homotopical, proof-relevant form of equality, meaning that the choice of \alpha_{k,\,l} now carries data. In order for this notion to give something useful, we must impose coherences on these data. At the first level, for k \lt l \lt m, we can prove that \partial_k \circ \partial_l \circ \partial_m \simeq \partial_{m-2} \circ \partial_{l-1} \circ \partial_k in two different ways, and we would like those proofs to themselves be equal. This requires providing coherences \beta_{k,\,l,\,m} such that: \beta_{k,\,l,\,m} : \alpha_{k,\,l} \star \partial_m \cdot \partial_{l-1} \star \alpha_{k,\,m} \cdot \alpha_{l-1,\,m-1} \star \partial_k \simeq \partial_k \star \alpha_{l,\,m} \cdot \alpha_{k,\,m-1} \star \partial_l \cdot \partial_{m-2} \star \alpha_{k,\,l}, which we can visualise by the following diagram:


Of course, now the β k,l,m\beta_{k,\,l,\,m} themselves carry data, and we have to impose coherences on those. These identities come up in the context of quadruples of deletions. The first identity is given by a square diagram that says that the α k,l\alpha_{k,\,l} homotopies applied to non-interacting indices commute. The second identity is given one dimension up, and describes a filler for the following figure, called a permutohedron, whose faces are the previously mentioned hexagons and squares. It can be visualised via the illustration (by Tilman Piesk, from Wikimedia):


Writing down a formula for this is complicated, and things only become worse when you consider sequences of five or more deletions! We have thus run into the fundamental obstacle to defining infinitary structures. This goes by the technical name of Higher Coherence Issues.

When doing homotopy theory based on set theory, it is possible to overcome this problem, because we always have at our disposal the strict notion of set-theoretic equality for any objects, including homotopy types. Thus, for instance, we can talk about strict semi-simplicial spaces, where the X_n are now spaces and the maps \partial_m are continuous, but the equalities \partial_k \circ \partial_l = \partial_{l-1} \circ \partial_k hold strictly, on the nose, and therefore all the higher coherences also hold strictly. Alternatively, we can explicitly define all the higher permutohedra and say what a “homotopy coherent semi-simplicial space” is, using the strict equality to specify how the permutohedra fit together. But in ordinary homotopy type theory, these approaches are unavailable.

SSTs: The Indexed Perspective

The previous section demonstrates a general phenomenon related to infinitary structures: as soon as the symbol (=) gets used in the definition, one is plunged into also constraining the values of this data-carrying equality by way of an infinite tower of coherences, each depending on the definitions of all prior ones and growing in complexity as the dimensions increase.

One promising approach, then, would be to try to define higher structures without reference to equality. In the case of semi-simplicial types, one can think of an intuitive definition which promises to do so, by breaking up “the set of n-simplices” into a family of sets indexed by their boundaries. For example, we split up the total space of lines into many separate spaces of lines, each joining two definite endpoints (although the dependence of these indexed spaces on the endpoints is continuous). This is analogous to the two basic ways to define a category: with one collection of morphisms, or with a family of collections of morphisms.
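The category analogy can be made concrete. Restricting to just objects and morphisms, the two presentations look as follows in a Lean 4 sketch (names ours):

```lean
-- Fibred presentation: one total collection of morphisms,
-- with source and target maps picking out the boundary.
structure FibredGraph where
  Obj : Type
  Mor : Type
  src : Mor → Obj
  tgt : Mor → Obj

-- Indexed presentation: a family of morphism collections,
-- one for each pair of endpoints.
structure IndexedGraph where
  Obj : Type
  Hom : Obj → Obj → Type
```

In the indexed presentation the boundary of a morphism is part of its type, so no equations about sources and targets ever need to be stated.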

\begin{aligned}
&A_0 : \mathsf{Type} \\
&A_1 : A_0 \to A_0 \to \mathsf{Type} \\
&A_2 : (x\;y\;z : A_0) \to A_1\;x\;y \to A_1\;x\;z \to A_1\;y\;z \to \mathsf{Type} \\
&A_3 : (x\;y\;z\;w : A_0)\; (\alpha : A_1\;x\;y)\; (\beta : A_1\;x\;z)\; (\gamma : A_1\;y\;z)\; (\delta : A_1\;x\;w)\; (\epsilon : A_1\;y\;w)\; (\zeta : A_1\;z\;w) \\
&\quad \to A_2\;x\;y\;z\;\alpha\;\beta\;\gamma \to A_2\;x\;y\;w\;\alpha\;\delta\;\epsilon \to A_2\;x\;z\;w\;\beta\;\delta\;\zeta \to A_2\;y\;z\;w\;\gamma\;\epsilon\;\zeta \to \mathsf{Type} \\
&\ldots
\end{aligned}

Roughly, then, in this approach a semi-simplicial type is an “infinite record type” whose fields specify notions of points, lines, triangles, etc. When comparing this to the notion from the previous section, we call the previous one fibred and this one indexed. The face maps of the fibred formulation simply become index lookups in the indexed formulation, and this non-data is automatically infinitely coherent.

Of course, this is not yet precise either: there is the problem of what the ellipsis represents, and the lack of a notion of an infinite record type. But there is evidently some kind of pattern, so it seems intuitive that this direction would be more promising as an approach to defining SSTs in type theory.
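While the full infinite record is the open problem, any finite truncation of it is unproblematic. A Lean 4 transcription of the first three fields above (names ours):

```lean
-- A 2-truncated indexed semi-simplicial type: each simplex type is
-- indexed by its full boundary, so no face-map equations are needed.
structure SST₂ where
  A₀ : Type
  A₁ : A₀ → A₀ → Type
  A₂ : (x y z : A₀) → A₁ x y → A₁ x z → A₁ y z → Type
```

The difficulty lies entirely in internalizing the pattern that generates the n-th field for every n at once.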

Equivalence of the Indexed and Fibred Formulations

We can begin to argue that the indexed and fibred definitions are equivalent, when all coherences are included in the latter, by considering the truncated cases that only go up to n-simplices for some finite n. For example, suppose that we are given the data \partial_0,\,\partial_1 : X_1 \to X_0 and \partial_0,\,\partial_1,\,\partial_2 : X_2 \to X_1. We would like to use this data to define indexed types. At the first two stages, we define:

\begin{aligned}
&\widetilde{X}_0 \equiv X_0 \\
&\widetilde{X}_1\;(x,\,y) \equiv \sum_{\alpha \,:\, X_1}(\partial_1\;\alpha = x) \times (\partial_0\;\alpha = y)
\end{aligned}

For the next stage, we are defining \widetilde{X}_2\,\big(x,\,y,\,z,\,(\alpha,\,p_0,\,q_0),\,(\beta,\,p_1,\,q_1),\,(\gamma,\,p_2,\,q_2)\big).

One may be tempted to say that this consists of \mathfrak{f} : X_2 with boundary data \alpha,\,\beta,\,\gamma, by asserting, for example, that \partial_2\;\mathfrak{f} = \alpha. However, this equality in X_1 leaves the endpoints free. For example, in the case of the singular semi-simplicial type of a type X, so long as the lines \partial_2\;\mathfrak{f} and \alpha lived in the same connected component of X, they could be identified by this criterion. We see, then, that this comparison should be performed in the type \widetilde{X}_1\,(x,\,y).

In order to create an element of \widetilde{X}_1\,(x,\,z), we want to use \partial_1\;\mathfrak{f}. We then have that r_2 : \partial_0\;\partial_1\;\mathfrak{f} = z, giving us a proof that the right endpoint is z. However, for the left endpoint, we only have r_0 : \partial_1\;\partial_2\;\mathfrak{f} = x, and we need to concatenate this on the left with an equality \partial_1\;\partial_1\;\mathfrak{f} = \partial_1\;\partial_2\;\mathfrak{f} in order to show that \partial_1\;\partial_1\;\mathfrak{f} = x, as required. A similar analysis applies for \widetilde{X}_1\,(y,\,z). Thus we require the commutation identities: \alpha_{k,\,l} : \partial_k \circ \partial_l = \partial_{l-1} \circ \partial_k \quad \text{for} \quad k \lt l. Provided these identities as part of our starting data, we would then complete the definition as follows:

\begin{aligned}
\widetilde{X}_2\,&\big(x,\,y,\,z,\,(\alpha,\,p_0,\,q_0),\,(\beta,\,p_1,\,q_1),\,(\gamma,\,p_2,\,q_2)\big) \equiv \\
&\sum_{(\mathfrak{f}\,:\,X_2)}\,\sum_{(r_0\,:\,\partial_1\,\partial_2\,\mathfrak{f}\,=\,x)}\,\sum_{(r_1\,:\,\partial_0\,\partial_2\,\mathfrak{f}\,=\,y)}\,\sum_{(r_2\,:\,\partial_0\,\partial_1\,\mathfrak{f}\,=\,z)}\, \\
&\quad (\partial_2\;\mathfrak{f},\,r_0,\,r_1) =_{\widetilde{X}_1\,(x,\,y)} (\alpha,\,p_0,\,q_0) \\
&\quad \times\; (\partial_1\;\mathfrak{f},\,\mathsf{apeq}\;\alpha_{1,\,2}\;\mathfrak{f} \cdot r_0,\,r_2) =_{\widetilde{X}_1\,(x,\,z)} (\beta,\,p_1,\,q_1) \\
&\quad \times\; (\partial_0\;\mathfrak{f},\,\mathsf{apeq}\;\alpha_{0,\,2}\;\mathfrak{f} \cdot r_1,\,\mathsf{apeq}\;\alpha_{0,\,1}\;\mathfrak{f} \cdot r_2) =_{\widetilde{X}_1\,(y,\,z)} (\gamma,\,p_2,\,q_2)
\end{aligned}

Using contractible singletons and path algebra, one can show that forming the total spaces of the resulting indexed types leads to types equivalent to X_0,\,X_1,\,X_2. Similarly, starting off with indexed types \widetilde{X}_0,\,\widetilde{X}_1,\,\widetilde{X}_2, forming their total spaces, and then performing the above construction results in equivalent types. This demonstrates an equivalence between the indexed and fibred definitions up to the second stage.
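The first stage of this fibred-to-indexed translation can be transcribed as follows; a Lean 4 sketch (names ours). Note that Lean's propositional equality is proof-irrelevant, unlike the identity type of HoTT, so this only approximates the homotopical construction:

```lean
-- Fibred data: lines X₁ over points X₀, with the two face maps.
variable (X₀ X₁ : Type) (d₀ d₁ : X₁ → X₀)

-- X̃₁ x y: the lines whose faces are (propositionally) x and y.
-- This is the Σ-type from the text, written as a Lean subtype.
def indexedLines (x y : X₀) : Type :=
  { α : X₁ // d₁ α = x ∧ d₀ α = y }
```

The second stage, `indexedTriangles`, would take six arguments of these types and impose the three comparisons in the displayed formula above.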

The Essence of the Problem

One can continue the above analysis to the third stage, although writing out the details would be exceptionally painful. But we can at least extrapolate the way in which the higher coherences would play a role in the definition. This leads us to two conclusions. Firstly, that the higher coherences are necessary in the fibred formulation, if we would like to extract indexed simplex types from it (which we undoubtedly would). And secondly, that since the indexed perspective is equivalent to the fibred perspective, solving the problem of defining indexed SSTs in type theory would be tantamount to solving the coherence issues in the fibred perspective; thus we should expect this problem to be more difficult than it seems.

Indeed, every naive approach to defining SSTs through the indexed formulation seems to run into the same kinds of higher coherence issues. Almost without exception, whatever clever scheme one comes up with for formalizing the pattern, it eventually transpires that in order to complete the construction, one needs to simultaneously prove a lemma about the construction. And then in order to complete that lemma, one needs to prove a meta-lemma about the proof of the lemma. In order to prove the meta-lemma, one needs to prove a meta-meta-lemma about the proof of the meta-lemma. And so on…

It’s difficult to give any more details in general. It seems that to truly appreciate this, one almost has to come up with one’s own idea for defining SSTs and try to implement it in a proof assistant. From the outside, there appears to be an obvious pattern to the structure of the n-simplex types, so one doesn’t expect it to be so hard going in. And the infinite regress tends to pop up in surprising places, when proving lemmas that seem so obvious that one tends to leave them for last (or neglects to write them down at all on paper), assuming their proofs will be easy.


The problem of semi-simplicial types, and higher coherence more generally, is also closely connected to the problem of autophagy, or “HoTT eating itself”. In fact both of us ran into this connection independently, Mike in a blog post from almost exactly ten years ago, and Astra in a second attempt to understand why the problem was hard.

The idea is that since the pattern in the indexed n-simplex types can be defined syntactically, if we could define the syntax and typing rules of HoTT inside of HoTT, and write a self-interpreter that takes an internally defined well-typed type or term and returns an actual type or term, then we could define the n-simplex types syntactically and then apply the self-interpreter. However, in the course of trying to write a self-interpreter, one encounters essentially the same permutohedral identities described above. Not every approach to constructing SSTs has to go through syntax, of course, but this suggests that the problem of SSTs is closely related to the problem of self-interpreters and a notion of infinitely coherent syntax for type theories. Indeed, one may hope that perhaps solving SSTs would be sufficient to enable self-interpretation, as we hope it would be for other higher coherence problems.

We now discuss two alternative approaches to solving this related collection of problems.

The Two-Level Approach

As noted above, in classical homotopy theory, it is possible to define (fibred) semi-simplicial types without needing infinite coherences, by using the ambient strict set-theoretic notion of equality. Thus, one way to avoid the problem of infinite coherences in HoTT is to re-introduce a stricter notion of equality. Two-level type theory (2LTT), formulated by Annenkov, Capriotti, Kraus, and Sattler, following an idea of Voevodsky, achieves this by stratifying types into “inner” or “fibrant” types, which are homotopical, and “outer” or “non-fibrant” “exo-types”, which are not. The non-fibrant equality exo-type then plays the role of the strict set-level equality in classical homotopy theory, enabling a correct definition of semi-simplicial types without incorporating higher coherences… under an additional hypothesis.

Specifically, in two-level type theory there is both a fibrant natural numbers type (“nat”) and a non-fibrant natural numbers exo-type (“exo-nat”). Without additional assumptions on the relation between these two, all we can define (apparently) is the family of types of n-truncated semi-simplicial types, indexed by n in exo-nat. The “limit” of these types can be easily constructed, but without further assumptions it is only an exo-type, not a fibrant type.

A sufficient assumption for this is that the two kinds of natural numbers coincide, or equivalently that exo-nat is fibrant. This appears to be a fairly strong axiom, however; it holds in the “classical” simplicial model, but it is unknown whether all (\infty,1)-toposes can be presented by a model in which it holds. A better axiom, therefore, is that exo-nat is “cofibrant”, a technical term from 2LTT essentially saying that \Pi-types with it as their domain preserve fibrancy, and therefore in particular that the limit of a tower of fibrant types is fibrant. Elif Uskuplu has recently shown that any model of type theory whose types are closed under externally indexed countable products (including models for all (\infty,1)-toposes) can be enhanced to a model of 2LTT in which exo-nat is cofibrant.

Thus, this approach has reasonable semantic generality. However, it is unclear how practical it is for formalization in proof assistants. Paper proofs in 2LTT often assume that the exo-equality satisfies the “reflection rule” and hence coincides with definitional equality. But this is very difficult to achieve in a proof assistant, so implementations of 2LTT (such as Agda’s recent two-level flag) usually instead assume merely that the exo-equality satisfies Uniqueness of Identity Proofs. Unfortunately, this means we have to transport across exo-equalities explicitly in terms, which tends to lead to large combinatorial blowups in proofs.

Informally, one can argue that 2LTT is a “brute force” solution: we internalize the entire metatheory (the universe of exo-types), and then assume that whatever infinite constructions we want (e.g. exo-nat-indexed products) can be reflected into the original type theory. We would like a solution that is more closely tailored to our goal, allowing more external equalities to be represented definitionally in the syntax.

The Synthetic Approach

Another approach is to give up on the goal of defining (semi-)simplicial types and instead axiomatize their behavior. This is analogous to how ordinary homotopy type theory axiomatizes the behavior of \infty-groupoids rather than defining them in terms of sets. In type theory we call this a “synthetic” approach, in contrast to the “analytic” approach of defining them out of sets, making an analogy to the contrast between Euclid’s “synthetic geometry” of undefined points and lines and the “analytic geometry” of pairs of real numbers.

Mike and Emily Riehl formulated a “simplicial type theory” like this in A type theory for synthetic ∞-categories, where the types behave like simplicial objects. Specifically, there is a “directed interval” type that can be used to detect this simplicial structure, analogous to the undirected interval in cubical type theory that detects the homotopical structure. One can then define internally which types are “Segal” and “Rezk” and start to develop “synthetic higher category theory” with these types.

This sort of synthetic higher category theory is under active investigation, and shows a lot of promise. In particular, there is now a proof assistant called Rzk implementing it, and many of the basic results have been formalized by Nikolai Kudasov, Jonathan Weinberger, and Emily. Many of us regard this theory (and its relatives such as “bicubical” type theory) as the most practical approach to “directed type theory” currently available.

However, taking a synthetic approach has also undeniably changed the question. For various reasons, it would be interesting and valuable to have a type theory in which we can define (semi-)simplicial types rather than postulating them. This is the problem addressed in our paper, to which we will turn in the second post of this series.

April 14, 2024

John BaezProtonium

It looks like they’ve found protonium in the decay of a heavy particle!

Protonium is made of a proton and an antiproton orbiting each other. It lasts a very short time before they annihilate each other.

It’s a bit like a hydrogen atom where the electron has been replaced with an antiproton! But it’s much smaller than a hydrogen atom. And unlike a hydrogen atom, which is held together by the electric force, protonium is mainly held together by the strong nuclear force.

There are various ways to make protonium. One is to make a bunch of antiprotons and mix them with protons. This was done accidentally in 2002. They only realized this upon carefully analyzing the data 4 years later.

This time, people were studying the decay of the J/psi particle. The J/psi is made of a heavy quark and its antiparticle. It’s 3.3 times as heavy as a proton, so it’s theoretically able to decay into protonium. And careful study showed that yes, it does this sometimes!

The new paper on this has a rather dry title—not “We found protonium!” But it has over 550 authors, which hints that it’s a big deal. I won’t list them.

• BESIII Collaboration, Observation of the anomalous shape of X(1840) in J/ψ→γ3(π+π−), Phys. Rev. Lett. 132 (2024), 151901.

The idea here is that sometimes the J/ψ particle decays into a gamma ray and 3 pion-antipion pairs. When they examined this decay, they found evidence that an intermediate step involved a particle of mass 1880 MeV/c², a bit more than an already known intermediate of mass 1840 MeV/c².

This new particle’s mass is within a few MeV of twice the mass of a proton, 938 MeV/c². So, there’s a good chance that it’s protonium!
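As a quick sanity check on these numbers (the proton mass 938.272 MeV/c² is the CODATA value; 1880 MeV/c² is the figure quoted above), a minimal Python snippet:

```python
# Compare the candidate's mass with the proton-antiproton rest mass
# (all values in MeV/c^2).
proton_mass = 938.272      # CODATA proton mass
candidate_mass = 1880.0    # reported mass of the intermediate state

threshold = 2 * proton_mass          # rest mass of a proton + antiproton
print(threshold)                     # 1876.544
print(candidate_mass - threshold)    # about 3.5 MeV from the pair threshold
```

The candidate sits only a few MeV from the two-proton threshold, which is why a proton-antiproton bound-state interpretation is plausible.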

But how did physicists make protonium by accident in 2002? They were trying to make antihydrogen, which is a positron orbiting an antiproton. To do this, they used the Antiproton Decelerator at CERN. This is just one of the many cool gadgets they keep near the Swiss-French border.

You see, to create antiprotons you need to smash particles at each other at almost the speed of light—so the antiprotons usually shoot out really fast. It takes serious cleverness to slow them down and catch them without letting them bump into matter and annihilate.

That’s what the Antiproton Decelerator does. So they created a bunch of antiprotons and slowed them down. Once they managed to do this, they caught the antiprotons in a Penning trap. This holds charged particles using magnetic and electric fields. Then they cooled the antiprotons—slowed them even more—by letting them interact with a cold gas of electrons. Then they mixed in some positrons. And they got antihydrogen!

But apparently some protons got in there too, so they also made some protonium, by accident. They only realized this when they carefully analyzed the data 4 years later, in a paper with only a few authors:

• N. Zurlo, M. Amoretti, C. Amsler, G. Bonomi, C. Carraro, C. L. Cesar, M. Charlton, M. Doser, A. Fontana, R. Funakoshi, P. Genova, R. S. Hayano, L. V. Jorgensen, A. Kellerbauer, V. Lagomarsino, R. Landua, E. Lodi Rizzini, M. Macri, N. Madsen, G. Manuzio, D. Mitchard, P. Montagna, L. G. Posada, H. Pruys, C. Regenfus, A. Rotondi, G. Testera, D. P. Van der Werf, A. Variola, L. Venturelli and Y. Yamazaki, Production of slow protonium in vacuum, Hyperfine Interactions 172 (2006), 97–105.

Protonium is sometimes called an ‘exotic atom’—though personally I’d consider it an exotic nucleus. The child in me thinks it’s really cool that there’s an abbreviation for protonium, Pn, just like a normal element.

John Preskill“Once Upon a Time”…with a twist

The Noncommuting-Charges World Tour (Part 1 of 4)

This is the first in a four-part series covering the recent Perspectives article on noncommuting charges. I’ll be posting one part every 6 weeks leading up to my PhD thesis defence.

Thermodynamics problems have surprisingly many similarities with fairy tales. For example, most of them begin with a familiar opening. In thermodynamics, the phrase “Consider an isolated box of particles” serves a similar purpose to “Once upon a time” in fairy tales—both serve as a gateway to their respective worlds. Additionally, both have been around for a long time. Thermodynamics emerged in the Victorian era to help us understand steam engines, while Beauty and the Beast and Rumpelstiltskin, for example, originated about 4000 years ago. Moreover, each concludes with important lessons. In thermodynamics, we learn hard truths such as the futility of defying the second law, while fairy tales often impart morals like the risks of accepting apples from strangers. The parallels go on; both feature archetypal characters—such as wise old men and fairy godmothers versus ideal gases and perfect insulators—and simplified models of complex ideas, like portraying clear moral dichotomies in narratives versus assuming non-interacting particles in scientific models.1

Of all the ways thermodynamic problems are like fairy tales, one is most relevant to me: both have experienced modern reimaginings. Sometimes, all you need is a little twist to liven things up. In thermodynamics, noncommuting conserved quantities, or charges, have added a twist.


First, let me recap some of my favourite thermodynamic stories before I highlight the role that the noncommuting-charge twist plays. The first is the inevitability of the thermal state. Roughly, this means that, at most times, the state of almost any sufficiently small subsystem within the box will be close to a specific form (the thermal state).

The second is an apparent paradox that arises in quantum thermodynamics: How do the reversible processes inherent in quantum dynamics lead to irreversible phenomena such as thermalization? If you’ve been keeping up with Nicole Yunger Halpern‘s (my PhD co-advisor and fellow fan of fairy tales) recent posts on the eigenstate thermalization hypothesis (ETH) (part 1 and part 2) you already know the answer. The expectation value of a quantum observable is often a sum of terms with various phases. As time passes, these phases tend to interfere destructively, leading to a stable expectation value over long periods. This stable value tends to align with that of a thermal state. Thus, despite the apparent paradox, stationary dynamics in quantum systems are commonplace.
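This dephasing story is easy to see numerically. Here is a toy sketch (not from the post; the weights and frequencies are made up): an expectation value built from many oscillating terms starts at an atypical value and settles, by destructive interference, into small fluctuations around a steady value.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
weights = rng.random(N)
weights /= weights.sum()       # weights of the oscillating terms, summing to 1
omegas = rng.normal(size=N)    # generic (incommensurate) frequencies

def expectation(t):
    # All terms are in phase at t = 0; at late times the phases interfere destructively.
    return float(np.sum(weights * np.cos(omegas * t)))

print(expectation(0.0))        # atypical initial value (all terms aligned)
late = [abs(expectation(t)) for t in np.linspace(500, 1500, 200)]
print(np.mean(late))           # small residual fluctuations around the steady value
```

With 200 terms the late-time value hovers near zero with fluctuations of order 1/√N, which is the blog post's "stable expectation value over a longer period" in miniature.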

The third story is about how concentrations of one quantity can cause flows in another. Imagine a box of charged particles that’s initially out of equilibrium, such that there exist gradients in particle concentration and temperature across the box. The temperature gradient will cause a flow of heat (Fourier’s law) and charged particles (Seebeck effect) and the particle-concentration gradient will cause the same—a flow of particles (Fick’s law) and heat (Peltier effect). These movements are encompassed within Onsager’s theory of transport dynamics…if the gradients are very small. If you’re reading this post on your computer, the Peltier effect is likely at work for you right now by cooling your computer.

What do the various derivations of the thermal state’s form, the eigenstate thermalization hypothesis (ETH), and the Onsager coefficients have in common? Each concept is founded on the assumption that the system we’re studying contains charges that commute with each other (e.g. particle number, energy, and electric charge). It’s only recently that physicists have acknowledged that this assumption was even present.

This is important to note because not all charges commute. In fact, the noncommutation of charges underlies fundamental quantum phenomena, such as the Einstein–Podolsky–Rosen (EPR) paradox, uncertainty relations, and disturbances during measurement. This raises an intriguing question: how would the above-mentioned stories change if we introduce the following twist?

“Consider an isolated box with charges that do not commute with one another.” 

This question is at the core of a burgeoning subfield that intersects quantum information, thermodynamics, and many-body physics. I had the pleasure of co-authoring a recent perspective article in Nature Reviews Physics that centres on this topic. Collaborating with me in this endeavour were three members of Nicole’s group: the avid mountain climber, Billy Braasch; the powerlifter, Aleksander Lasek; and Twesh Upadhyaya, known for his prowess in street basketball. Completing our authorship team were Nicole herself and Amir Kalev.

To give you a touchstone, let me present a simple example of a system with noncommuting charges. Imagine a chain of qubits, where each qubit interacts with its nearest and next-nearest neighbours, such as in the image below.

The figure is courtesy of the talented team at Nature. Two qubits form the system S of interest, and the rest form the environment E. A qubit’s three spin components, σ_a for a = x, y, z, form the local noncommuting charges. The dynamics locally transport and globally conserve the charges.

In this interaction, the qubits exchange quanta of spin angular momentum, forming what is known as a Heisenberg spin chain. This chain is characterized by three charges: the total spin components in the x, y, and z directions, which I’ll refer to as Qx, Qy, and Qz, respectively. The Hamiltonian H conserves these charges, satisfying [H, Qa] = 0 for each a, and the three charges fail to commute with one another: [Qa, Qb] ≠ 0 for any pair a, b ∈ {x,y,z} with a ≠ b. It’s noteworthy that Hamiltonians can be constructed to transport various other kinds of noncommuting charges. I have discussed the procedure to do so in more detail here (to summarize that post: it essentially involves constructing a Koi pond).
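These commutation relations are easy to verify directly. Below is a minimal two-site check (my own sketch, not from the article; it uses bare Pauli matrices rather than spin-1/2 operators, which only changes normalizations, and a single Heisenberg bond rather than the full chain):

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
paulis = {"x": sx, "y": sy, "z": sz}

def comm(A, B):
    return A @ B - B @ A

# Two-site Heisenberg coupling: H = sum_a sigma_a (x) sigma_a
H = sum(np.kron(s, s) for s in paulis.values())

# Total spin components Q_a = sigma_a (x) I + I (x) sigma_a
Q = {a: np.kron(s, I2) + np.kron(I2, s) for a, s in paulis.items()}

for a in "xyz":
    assert np.allclose(comm(H, Q[a]), 0)          # H conserves each charge
assert not np.allclose(comm(Q["x"], Q["y"]), 0)   # but the charges don't commute
print("[H, Q_a] = 0 for all a, yet [Q_x, Q_y] != 0")
```

The same algebra ([Q_x, Q_y] = 2i Q_z for Pauli normalization) holds on chains of any length, since the commutators act site by site.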

This is the first in a series of blog posts where I will highlight key elements discussed in the perspective article. Motivated by requests from peers for a streamlined introduction to the subject, I’ve designed this series for a specific target audience: graduate students in physics. Additionally, I’m gearing up to defend my PhD thesis on noncommuting-charge physics next semester, and these blog posts will double as a fun way to prepare for that.

  1. This opening text was taken from the draft of my thesis. ↩

April 13, 2024

Doug NatelsonElectronic structure and a couple of fun links

Real life has been very busy recently.  Posting will hopefully pick up soon.  

One brief item.  Earlier this week, Rice hosted Gabi Kotliar for a distinguished lecture, and he gave a very nice, pedagogical talk about different approaches to electronic structure calculations.  When we teach undergraduate chemistry on the one hand and solid state physics on the other, we largely neglect electron-electron interactions (except for very particular issues, like Hund's Rules).  Trying to solve the many-electron problem fully is extremely difficult.  Often, approximating by solving the single-electron problem (e.g. finding the allowed single-electron states for a spatially periodic potential as in a crystal) and then "filling up"* those states gives decent results.   As we see in introductory courses, one can try different types of single-electron states.  We can start with atomic-like orbitals localized to each site, and end up doing tight binding / LCAO / Hückel (when applied to molecules).  Alternately, we can do the nearly-free electron approach and think about Bloch waves.  Density functional theory, discussed here, is more sophisticated but can struggle in situations where electron-electron interactions are strong.

One of Prof. Kotliar's big contributions is something called dynamical mean field theory, an approach to strongly interacting problems.  In a "mean field" theory, the idea is to reduce a many-particle interacting problem to an effective single-particle problem, where that single particle feels an interaction based on the averaged response of the other particles.  Arguably the most famous example is in models of magnetism.  We know how to write the energy of a spin \(\mathbf{s}_{i}\) in terms of its interactions \(J\) with other spins \(\mathbf{s}_{j}\) as \(\sum_{j} J \mathbf{s}_{i}\cdot \mathbf{s}_{j}\).  If there are \(z\) such neighbors that interact with spin \(i\), then we can try instead writing that energy as \(zJ \mathbf{s}_{i} \cdot \langle \mathbf{s}_{i}\rangle\), where the angle brackets signify the average.  From there, we can get a self-consistent equation for \(\langle \mathbf{s}_{i}\rangle\).  
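The self-consistent equation from that mean-field picture can be solved in a few lines. Here is a sketch for the simplest scalar-spin (Ising-like) version, where the self-consistency condition becomes ⟨s⟩ = tanh(zJ⟨s⟩/T) with k_B = 1; the values of z, J, and T below are my own illustrative choices:

```python
import math

def mean_field_magnetization(T, z=4, J=1.0, tol=1e-10):
    """Solve <s> = tanh(z*J*<s>/T) by fixed-point iteration (Ising mean field, k_B = 1)."""
    m = 1.0  # start from the fully polarized state
    for _ in range(100000):
        m_new = math.tanh(z * J * m / T)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

# Mean-field theory predicts ordering below T_c = z*J (here T_c = 4):
print(mean_field_magnetization(T=2.0))  # ordered phase: |m| > 0
print(mean_field_magnetization(T=5.0))  # disordered phase: m -> 0
```

The fixed-point structure is the whole content of the mean-field approximation: below T_c = zJ the trivial solution m = 0 becomes unstable and a self-consistent nonzero magnetization appears.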

Dynamical mean field theory is rather similar in spirit.  There are non-perturbative ways to solve some strong-interaction "quantum impurity" problems, and DMFT is a way of approximating a whole lattice of strongly interacting sites as a self-consistent quantum impurity problem for one site.  The solutions are not for wave functions but for the spectral function.  We still can't solve every strongly interacting problem, but Prof. Kotliar makes a good case that we have made real progress in how to think about many systems, and about when the atomic details matter.

*Here, "filling up" means writing the many-electron wave function as a totally antisymmetric linear combination of single-electron states, including the spin states.

PS - two fun links:

April 12, 2024

Matt von HippelThe Hidden Higgs

Peter Higgs, the theoretical physicist whose name graces the Higgs boson, died this week.

Peter Higgs, after the Higgs boson discovery was confirmed

This post isn’t an obituary: you can find plenty of those online, and I don’t have anything special to say that others haven’t. Reading the obituaries, you’ll notice they summarize Higgs’s contribution in different ways. Higgs was one of the people who proposed what today is known as the Higgs mechanism, the principle by which most (perhaps all) elementary particles gain their mass. He wasn’t the only one: Robert Brout and François Englert proposed essentially the same idea in a paper that was published two months earlier, in August 1964. Two other teams came up with the idea slightly later than that: Gerald Guralnik, Carl Richard Hagen, and Tom Kibble were published one month after Higgs, while Alexander Migdal and Alexander Polyakov found the idea independently in 1965 but couldn’t get it published till 1966.

Higgs did, however, do something that Brout and Englert didn’t. His paper doesn’t just propose a mechanism, involving a field which gives particles mass. It also proposes a particle one could discover as a result. Read the more detailed obituaries, and you’ll discover that this particle was not in the original paper: Higgs’s paper was rejected at first, and he added the discussion of the particle to make it more interesting.

At this point, I bet some of you are wondering what the big deal was. You’ve heard me say that particles are ripples in quantum fields. So shouldn’t we expect every field to have a particle?

Tell that to the other three Higgs bosons.

Electromagnetism has one type of charge, with two signs: plus, and minus. There are electrons, with negative charge, and their anti-particles, positrons, with positive charge.

Quarks have three types of charge, called colors: red, green, and blue. Each of these also has two “signs”: red and anti-red, green and anti-green, and blue and anti-blue. So for each type of quark (like an up quark), there are six different versions: red, green, and blue, and anti-quarks with anti-red, anti-green, and anti-blue.

Diagram of the colors of quarks

When we talk about quarks, we say that the force under which they are charged, the strong nuclear force, is an “SU(3)” force. The “S” and “U” there are shorthand for mathematical properties that are a bit too complicated to explain here, but the “(3)” is quite simple: it means there are three colors.

The Higgs boson’s primary role is to make the weak nuclear force weak, by making the particles that carry it from place to place massive. (That way, it takes too much energy for them to go anywhere, a feeling I think we can all relate to.) The weak nuclear force is an “SU(2)” force. So there should be two “colors” of particles that interact with the weak nuclear force…which includes Higgs bosons. For each, there should also be an anti-color, just like the quarks had anti-red, anti-green, and anti-blue. So we need two “colors” of Higgs bosons, and two “anti-colors”, for a total of four!

But the Higgs boson discovered at the LHC was a neutral particle. It didn’t have any electric charge, or any color. There was only one, not four. So what happened to the other three Higgs bosons?

The real answer is subtle, one of those physics things that’s tricky to concisely explain. But a partial answer is that they’re indistinguishable from the W and Z bosons.

Normally, the fundamental forces have transverse waves, with two polarizations. Light can wiggle along its path back and forth, or up and down, but it can’t wiggle forward and backward. A fundamental force with massive particles is different, because those particles can have longitudinal waves: they have an extra direction in which they can wiggle. There are two W bosons (plus and minus) and one Z boson, and they all get one more polarization when they become massive due to the Higgs.

That’s three new ways the W and Z bosons can wiggle. That’s the same number as the number of Higgs bosons that went away, and that’s no coincidence. We physicists like to say that the W and Z bosons “ate” the extra Higgs, which is evocative but may sound mysterious. Instead, you can think of it as the two wiggles being secretly the same, mixing together in a way that makes them impossible to tell apart.

The “count”, of how many wiggles exist, stays the same. You start with four Higgs wiggles, and two wiggles each for the precursors of the W+, W-, and Z bosons, giving ten. You end up with one Higgs wiggle, and three wiggles each for the W+, W-, and Z bosons, which still adds up to ten. But which fields match with which wiggles, and thus which particles we can detect, changes. It takes some thought to look at the whole system and figure out, for each field, what kind of particle you might find.
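The bookkeeping in that paragraph is simple enough to write out explicitly (just a tally of the counts stated above):

```python
# Physical polarizations ("wiggles") before and after the Higgs mechanism.
before = {
    "Higgs field (4 real components)": 4,
    "W+ precursor (massless)": 2,
    "W- precursor (massless)": 2,
    "Z precursor (massless)": 2,
}
after = {
    "Higgs boson": 1,
    "W+ (massive)": 3,
    "W- (massive)": 3,
    "Z (massive)": 3,
}

assert sum(before.values()) == sum(after.values())
print("total wiggles, before and after:", sum(before.values()))
```

Three of the four Higgs wiggles are relabeled as the longitudinal polarizations of the W+, W-, and Z, and only the one leftover wiggle shows up as a separate particle.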

Higgs did that work. And now, we call it the Higgs boson.

Matt Strassler Peter Higgs versus the “God Particle”

The particle physics community is mourning the passing of Peter Higgs, the influential theoretical physicist and 2013 Nobel Prize laureate. Higgs actually wrote very few papers in his career, but he made them count.

It’s widely known that Higgs deeply disapproved of the term “God Particle”. That’s the nickname that has been given to the type of particle (the “Higgs boson”) whose existence he proposed. But what’s not as widely appreciated is why he disliked it, as do most other scientists I know.

It’s true that Higgs himself was an atheist. Still, no matter what your views on such subjects, it might bother you that the notion of a “God Particle” emerged neither from science nor from religion, and could easily be viewed as disrespectful to both of them. Instead, it arose out of marketing and advertising in the publishing industry, and it survives due to another industry: the news media.

But there’s something else more profound — something quite sad, really. The nickname puts the emphasis entirely in the wrong place. It largely obscures what Higgs (and his colleagues/competitors) actually accomplished, and why they are famous among scientists.

Let me ask you this. Imagine a type of particle that

  • once created, vanishes in a billionth of a trillionth of a second,
  • is not found naturally on Earth, nor anywhere in the universe for billions of years,
  • has no influence on daily life — in fact it has never had any direct impact on the human species — and
  • only was discovered when humans started making examples artificially.

This doesn’t seem very God-like to me. What do you think?

Perhaps this does seem spiritual or divine to you, and in that case, by all means call the “Higgs boson” the “God Particle”. But otherwise, you might want to consider alternatives.

For most humans, and even for most professional physicists, the only importance of the Higgs boson is this: it gives us insight into the Higgs field. This field

  • exists everywhere, including within the Earth and within every human body,
  • has existed throughout the history of the known universe,
  • has been reliably constant and steady since the earliest moments of the Big Bang, and
  • is crucial for the existence of atoms, and therefore for the existence of Earth and all its life.

It may even be capable of bringing about the universe’s destruction, someday in the distant future. So if you’re going to assign some divinity to Higgs’ insights, this is really where it belongs.

In short, what’s truly consequential in Higgs’ work (and that of others who had the same basic idea: Robert Brout and Francois Englert, and Gerald Guralnik, C. Richard Hagen and Tom Kibble) is the Higgs field. Your life depends upon the existence and stability of this field. The discovery in 2012 of the Higgs boson was important because it proved that the Higgs field really exists in nature. Study of this type of particle continues at the Large Hadron Collider, not because we are fascinated by the particle per se, but because measuring its properties is the most effective way for us to learn more about the all-important Higgs field.

Professor Higgs helped reveal one of the universe’s great secrets, and we owe him a great deal. I personally feel that we would honor his legacy, in a way that would have pleased him, through better explanations of what he achieved — ones that clarify how he earned a place in scientists’ Hall of Fame for eternity.

April 11, 2024

Scott Aaronson Avi Wigderson wins Turing Award!

Back in 2006, in the midst of an unusually stupid debate in the comment section of Lance Fortnow and Bill Gasarch’s blog, someone chimed in:

Since the point of theoretical computer science is solely to recognize who is the most badass theoretical computer scientist, I can only say:



Avi Wigderson: central unifying figure of theoretical computer science for decades; consummate generalist who’s contributed to pretty much every corner of the field; advocate and cheerleader for the field; postdoc adviser to a large fraction of all theoretical computer scientists, including both me and my wife Dana; derandomizer of BPP (provided E requires exponential-size circuits). Now, Avi not only “owns you,” he also owns a well-deserved Turing Award (on top of his well-deserved Nevanlinna, Abel, Gödel, and Knuth prizes). As Avi’s health has been a matter of concern to those close to him ever since his cancer treatment, which he blogged about a few years ago, I’m sure today’s news will do much to lift his spirits.

I first met Avi a quarter-century ago, when I was 19, at a PCMI summer school on computational complexity at the Institute for Advanced Study in Princeton. Then I was lucky enough to visit Avi in Israel when he was still a professor at the Hebrew University (and I was a grad student at Berkeley)—first briefly, but then Avi invited me back to spend a whole semester in Jerusalem, which ended up being one of my most productive semesters ever. Then Avi, having by then moved to the IAS in Princeton, hosted me for a one-year postdoc there, and later he and I collaborated closely on the algebrization paper. He’s had a greater influence on my career than all but a tiny number of people, and I’m far from the only one who can say that.

Summarizing Avi’s scientific contributions could easily fill a book, but Quanta and New Scientist and Lance’s blog can all get you started if you’re interested. Eight years ago, I took a stab at explaining one tiny little slice of Avi’s impact—namely, his decades-long obsession with “why the permanent is so much harder than the determinant”—in my IAS lecture Avi Wigderson’s “Permanent” Impact On Me, to which I refer you now (I can’t produce a new such lecture on one day’s notice!).

Huge congratulations to Avi.

Jordan EllenbergRoad trip to totality 2024

The last time we did this it was so magnificent that I said, on the spot, “see you again in 2024,” and seven years didn’t dim my wish to see the sun wink out again. It was easier this time — the path went through Indiana, which is a lot closer to home than St. Louis. More importantly, CJ can drive now, and likes to, so the trip is fully chauffeured. We saw the totality in Zionsville, IN, in a little park at the end of a residential cul-de-sac.

It was a smaller crowd than the one at Festus, MO in 2017; and unlike last time there weren’t a lot of travelers. These were just people who happened to live in Zionsville, IN and who were home in the middle of the day to see the eclipse. There were clouds, and a lot of worries about the clouds, but in the end it was just thin cirrus strips that blocked the sun, and then the non-sun, not at all.

To me it was a little less dramatic this time — because the crowd was more casual, because the temperature drop was less stark in April than it was in August, and of course because it was never again going to be the first time. But CJ and AB thought this one was better. We had very good corona. You could see a tiny red dot on the edge of the sun which was in fact a plasma prominence much bigger than the Earth.

Some notes:

  • We learned our lesson last time when we got caught in a massive traffic jam in the middle of a cornfield. We chose Zionsville because it was in the northern half of the totality, right on the highway, so we could be in the car zipping north on I-65 before the massive wave of northbound traffic out of Indianapolis caught up with us. And we were! Very satisfying, to watch on Google Maps as the traffic jam got longer and longer behind us, but was never quite where we were, as if we were depositing it behind us.
  • We had lunch in downtown Indianapolis where there is a giant Kurt Vonnegut Jr. painted on a wall. CJ is reading Slaughterhouse Five for school — in fact, to my annoyance, it’s the only full novel they’ve read in their American Lit elective. But it’s a pretty good choice for high school assigned reading. In the car I tried to explain Vonnegut’s theory of the granfaloon as it applied to “Hoosier” but neither kid was really interested.
  • We’ve done a fair number of road trips in the Mach-E and this was the first time charging created any annoyance. The Electrify America station we wanted on the way down had two chargers in use and the other two broken, so we had to detour quite a ways into downtown Lafayette to charge at a Cadillac dealership. On the way back, the station we planned on was full with one person waiting in line, so we had to change course and charge at the Whole Foods parking lot, and even there we got lucky as one person was leaving just as we arrived. The charging process probably added an hour to our trip each way.
  • While we charged at the Whole Foods in Schaumburg we hung out at the Woodfield Mall. Nostalgic feelings, for this suburban kid, to be in a thriving, functioning mall, with groups of kids just hanging out and vaguely shopping, the way we used to. The malls in Madison don’t really work like this any more. Is it a Chicago thing?
  • CJ is off to college next year. Sad to think there may not be any more roadtrips, or at least any more roadtrips where all of us are starting from home.
  • I was wondering whether total eclipses in the long run are equidistributed on the Earth’s surface and the answer is no: Ernie Wright at NASA made an image of the last 5000 years of eclipse paths superimposed:

There are more in the northern hemisphere than the southern because there are more eclipses in the summer (sun’s up longer!) and the sun is a little farther (whence visually a little smaller and more eclipsible) during northern hemisphere summer than southern hemisphere summer.

See you again in 2045!

April 09, 2024

Tommaso DorigoGoodbye Peter Higgs, And Thanks For The Boson

Peter Higgs passed away yesterday, at the age of 94. The Scottish physicist, a winner of the 2013 Nobel Prize in Physics together with Francois Englert, hypothesized in 1964 the existence of the most mysterious elementary particle we know of, the Higgs boson, which was only discovered 48 years later by the ATLAS and CMS collaborations at the CERN Large Hadron Collider.

read more

Matt Strassler Star Power

A quick note today, as I am flying to Los Angeles in preparation for

and other events next week.

I hope many of you were able, as I was, to witness the total solar eclipse yesterday. This was the third I’ve seen, and each one is different; the corona, prominences, stars, planets, and sky color all vary greatly, as do the sounds of animals. (I have written about my adventures going to my first one back in 1999; yesterday was a lot easier.)

Finally, of course, the physics world is mourning the loss of Peter Higgs. Back in 1964, Higgs proposed the particle known as the Higgs boson, as a consequence of what we often call the Higgs field. (Note that the field was also proposed, at the same time, by Robert Brout and Francois Englert.) Much is being written about Higgs today, and I’ll leave that to the professional journalists. But if you want to know what Higgs actually did (rather than the pseudo-descriptions that you’ll find in the press) then you have come to the right place. More on that later in the week.

April 05, 2024

Terence TaoMarton’s conjecture in abelian groups with bounded torsion

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “Marton’s conjecture in abelian groups with bounded torsion“. This paper fully resolves the bounded torsion case of the polynomial Freiman–Ruzsa conjecture, which was first proposed by Katalin Marton:

Theorem 1 (Marton’s conjecture) Let {G = (G,+)} be an abelian {m}-torsion group (thus, {mx=0} for all {x \in G}), and let {A \subset G} be such that {|A+A| \leq K|A|}. Then {A} can be covered by at most {(2K)^{O(m^3)}} translates of a subgroup {H} of {G} of cardinality at most {|A|}. Moreover, {H} is contained in {\ell A - \ell A} for some {\ell \ll (2 + m \log K)^{O(m^3 \log m)}}.

We had previously established the {m=2} case of this result, with the number of translates bounded by {(2K)^{12}} (which was subsequently improved to {(2K)^{11}} by Jyun-Jie Liao), but without the additional containment {H \subset \ell A - \ell A}. It remains a challenge to replace {\ell} by a bounded constant (such as {2}); this is essentially the “polynomial Bogolyubov conjecture”, which is still open. The {m=2} result has been formalized in the proof assistant language Lean, as discussed in this previous blog post. As a consequence of this result, many of the applications of the previous theorem may now be extended from characteristic {2} to higher characteristic.
Our proof techniques are a modification of those in our previous paper, and in particular continue to be based on the theory of Shannon entropy. For inductive purposes, it turns out to be convenient to work with the following version of the conjecture (which, up to {m}-dependent constants, is actually equivalent to the above theorem):

Theorem 2 (Marton’s conjecture, entropy form) Let {G} be an abelian {m}-torsion group, and let {X_1,\dots,X_m} be independent finitely supported random variables on {G}, such that

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i] \leq \log K,

where {{\bf H}} denotes Shannon entropy. Then there is a uniform random variable {U_H} on a subgroup {H} of {G} such that

\displaystyle \frac{1}{m} \sum_{i=1}^m d[X_i; U_H] \ll m^3 \log K,

where {d} denotes the entropic Ruzsa distance (see previous blog post for a definition); furthermore, if all the {X_i} take values in some symmetric set {S}, then {H} lies in {\ell S} for some {\ell \ll (2 + \log K)^{O(m^3 \log m)}}.

As a first approximation, one should think of all the {X_i} as identically distributed, and having the uniform distribution on {A}, as this is the case that is actually relevant for implying Theorem 1; however, the recursive nature of the proof of Theorem 2 requires one to manipulate the {X_i} separately. It also is technically convenient to work with {m} independent variables, rather than just a pair of variables as we did in the {m=2} case; this is perhaps the biggest additional technical complication needed to handle higher characteristics.
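As a toy numerical illustration of the entropy functional in Theorem 2 (my own sketch, for the m = 2 case over the group (Z/2)², with elements encoded as 2-bit integers and group addition as bitwise XOR; this is not code from the paper): the entropic doubling H[X₁+X₂] − (H[X₁]+H[X₂])/2 vanishes exactly when the support is a subgroup, and is positive otherwise.

```python
import math
from collections import Counter
from itertools import product

def entropy(dist):
    """Shannon entropy (in bits) of a {outcome: probability} dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def xor_sum_dist(d1, d2):
    """Distribution of X1 + X2 for independent X1, X2 in (Z/2)^2, addition = XOR."""
    out = Counter()
    for (x, p), (y, q) in product(d1.items(), d2.items()):
        out[x ^ y] += p * q
    return dict(out)

def entropic_doubling(support):
    """H[X1 + X2] - (H[X1] + H[X2])/2 for X1, X2 iid uniform on the support."""
    X = {s: 1.0 / len(support) for s in support}
    return entropy(xor_sum_dist(X, X)) - entropy(X)

print(entropic_doubling([0b00, 0b01]))        # a subgroup: doubling is 0
print(entropic_doubling([0b00, 0b01, 0b10]))  # not a coset of a subgroup: doubling > 0
```

This is precisely the quantity that vanishes if and only if the variables are translates of a uniform random variable on a subgroup, which is why it serves as the progress metric in the argument.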
The strategy, as with the previous paper, is to attempt an entropy decrement argument: to try to locate modifications {X'_1,\dots,X'_m} of {X_1,\dots,X_m} that are reasonably close (in Ruzsa distance) to the original random variables, while decrementing the “multidistance”

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i]

which turns out to be a convenient metric for progress (for instance, this quantity is non-negative, and vanishes if and only if the {X_i} are all translates of a uniform random variable {U_H} on a subgroup {H}). In the previous paper we modified the corresponding functional to minimize by some additional terms in order to improve the exponent {12}, but as we are not attempting to completely optimize the constants, we did not do so in the current paper (and as such, our arguments here give a slightly different way of establishing the {m=2} case, albeit with somewhat worse exponents).
As before, we search for such improved random variables {X'_1,\dots,X'_m} by introducing more independent random variables – we end up taking an array of {m^2} random variables {Y_{i,j}} for {i,j=1,\dots,m}, with each {Y_{i,j}} a copy of {X_i}, and forming various sums of these variables and conditioning them against other sums. Thanks to the magic of Shannon entropy inequalities, it turns out that it is guaranteed that at least one of these modifications will decrease the multidistance, except in an “endgame” situation in which certain random variables are nearly (conditionally) independent of each other, in the sense that certain conditional mutual informations are small. In particular, in the endgame scenario, the row sums {\sum_j Y_{i,j}} of our array will end up being close to independent of the column sums {\sum_i Y_{i,j}}, subject to conditioning on the total sum {\sum_{i,j} Y_{i,j}}. Not coincidentally, this type of conditional independence phenomenon also shows up when considering row and column sums of iid Gaussian random variables, as a specific feature of the Gaussian distribution. It is related to the more familiar observation that if {X,Y} are two independent copies of a Gaussian random variable, then {X+Y} and {X-Y} are also independent of each other.
Up until now, the argument does not use the {m}-torsion hypothesis, nor the fact that we work with an {m \times m} array of random variables as opposed to some other shape of array. But now the torsion enters in a key role, via the obvious identity

\displaystyle \sum_{i,j} i Y_{i,j} + \sum_{i,j} j Y_{i,j} + \sum_{i,j} (-i-j) Y_{i,j} = 0.

In the endgame, any pair of these three random variables is close to independent (after conditioning on the total sum {\sum_{i,j} Y_{i,j}}). Applying some “entropic Ruzsa calculus” (and in particular an entropic version of the Balog–Szemerédi–Gowers inequality), one can then arrive at a new random variable {U} of small entropic doubling that is reasonably close to all of the {X_i} in Ruzsa distance, which provides the final way to reduce the multidistance.
Besides the polynomial Bogolyubov conjecture mentioned above (which we do not know how to address by entropy methods), the other natural question is to try to develop a characteristic zero version of this theory in order to establish the polynomial Freiman–Ruzsa conjecture over torsion-free groups, which in our language asserts (roughly speaking) that random variables of small entropic doubling are close (in Ruzsa distance) to a discrete Gaussian random variable, with good bounds. The above machinery is consistent with this conjecture, in that it produces lots of independent variables related to the original variable, various linear combinations of which obey the same sort of entropy estimates that Gaussian random variables would exhibit, but what we are missing is a way to get back from these entropy estimates to an assertion that the random variables really are close to Gaussian in some sense. In continuous settings, Gaussians are known to extremize the entropy for a given variance, and of course we have the central limit theorem that shows that averages of random variables typically converge to a Gaussian, but it is not clear how to adapt these phenomena to the discrete Gaussian setting (without the circular reasoning of assuming the polynomial Freiman–Ruzsa conjecture to begin with).

Matt von Hippel Making More Nails

They say when all you have is a hammer, everything looks like a nail.

Academics are a bit smarter than that. Confidently predict a world of nails, and you fall to the first paper that shows evidence of a screw. There are limits to how long you can delude yourself when your job is supposed to be all about finding the truth.

You can make your own nails, though.

Suppose there’s something you’re really good at. Maybe, like many of my past colleagues, you can do particle physics calculations faster than anyone else, even when the particles are super-complicated hypothetical gravitons. Maybe you know more than anyone else about how to make a quantum computer, or maybe you just know how to build a “quantum computer“. Maybe you’re an expert in esoteric mathematics, who can re-phrase anything in terms of the arcane language of category theory.

That’s your hammer. Get good enough with it, and anyone with a nail-based problem will come to you to solve it. If nails are trendy, then you’ll impress grant committees and hiring committees, and your students will too.

When nails aren’t trendy, though, you need to try something else. If your job is secure, and you don’t have students with their own insecure jobs banging down your door, then you could spend a while retraining. You could form a reading group, pick up a textbook or two about screwdrivers and wrenches, and learn how to use different tools. Eventually, you might find a screwdriving task you have an advantage with, something you can once again do better than everyone else, and you’ll start getting all those rewards again.

Or, maybe you won’t. You’ll get less funding to hire people, so you’ll do less research, so your work will get less impressive and you’ll get less funding, and so on and so forth.

Instead of risking that, most academics take another path. They take what they’re good at, and invent new problems in the new trendy area to use that expertise.

If everyone is excited about gravitational waves, you turn a black hole calculation into a graviton calculation. If companies are investing in computation in the here-and-now, then you find ways those companies can use insights from your quantum research. If everyone wants to know how AI works, you build a mathematical picture that sort of looks like one part of how AI works, and do category theory to it.

At first, you won’t be competitive. Your hammer isn’t going to work nearly as well as the screwdrivers people have been using forever for these problems, and there will be all sorts of new issues you have to solve just to get your hammer in position in the first place. But that doesn’t matter so much, as long as you’re honest. Academic research is expected to take time; applications aren’t supposed to be obvious. Grant committees care about what you’re trying to do, as long as you have a reasonably plausible story about how you’ll get there.

(Investors are also not immune to a nice story. Customers are also not immune to a nice story. You can take this farther than you might think.)

So, unlike the re-trainers, you survive. And some of the time, you make it work. Your hammer-based screwdriving ends up morphing into something that, some of the time, actually does something the screwdrivers can’t. Instead of delusionally imagining nails, you’ve added a real ersatz nail to the world, where previously there was just a screw.

Making nails is a better path for you. Is it a better path for the world? I’m not sure.

If all those grants you won, all those jobs you and your students got, all that money from investors or customers drawn in by a good story, if that all went to the people who had the screwdrivers in the first place, could they have done a better job?

Sometimes, no. Sometimes you happen upon some real irreproducible magic. Your hammer is Thor’s hammer, and when hefted by the worthy it can do great things.

Sometimes, though, your hammer was just the hammer that got the funding. Now every screwdriver kit has to have a space for a little hammer, when it could have had another specialized screwdriver that fit better in the box.

In the end, the world is built out of these kinds of ill-fitting toolkits. We all try to survive, both as human beings and by our sub-culture’s concept of the good life. We each have our hammers, and regardless of whether the world is full of screws, we have to convince people they want a hammer anyway. Everything we do is built on a vast rickety pile of consequences, the end-results of billions of people desperate to be wanted. For those of us who love clean solutions and ideal paths, this is maddening and frustrating and terrifying. But it’s life, and in a world where we never know the ideal path, screw-nails and nail-screws are the best way we’ve found to get things done.

Scott Aaronson And yet quantum computing continues to progress

Pissing away my life in a haze of doomscrolling, sporadic attempts to “parent” two rebellious kids, and now endless conversations about AI safety, I’m liable to forget for days that I’m still mostly known (such as I am) as a quantum computing theorist, and this blog is still mostly known as a quantum computing blog. Maybe it’s just that I spent a quarter-century on quantum computing theory. As an ADHD sufferer, anything could bore me after that much time, even one of the a-priori most exciting things in the world.

It’s like, some young whippersnappers proved another monster 80-page theorem that I’ll barely understand tying together the quantum PCP conjecture, area laws, and Gibbs states? Another company has a quantum software platform, or hardware platform, and they’ve issued a press release about it? Another hypester claimed that QC will revolutionize optimization and machine learning, based on the usual rogues’ gallery of quantum heuristic algorithms that don’t seem to outperform classical heuristics? Another skeptic claimed that scalable quantum computing is a pipe dream—mashing together the real reasons why it’s difficult with basic misunderstandings of the fault-tolerance theorem? In each case, I’ll agree with you that I probably should get up, sit at my laptop, and blog about it (it’s hard to blog with two thumbs), but as likely as not I won’t.

And yet quantum computing continues to progress. In December we saw Harvard and QuEra announce a small net gain from error-detection in neutral atoms, and accuracy that increased with the use of larger error-correcting codes. Today, a collaboration between Microsoft and Quantinuum has announced what might be the first demonstration of error-corrected two-qubit entangling gates with substantially lower error than the same gates applied to the bare physical qubits. (This is still at the stage where you need to be super-careful in how you phrase every such sentence—experts should chime in if I’ve already fallen short; I take responsibility for any failures to error-correct this post.)

You can read the research paper here, or I’ll tell you the details to the best of my understanding (I’m grateful to Microsoft’s Krysta Svore and others from the collaboration for briefing me by Zoom). The collaboration used a trapped-ion system with 32 fully-connected physical qubits (meaning, the qubits can be shuttled around a track so that any qubit can directly interact with any other). One can apply an entangling gate to any pair of qubits with ~99.8% fidelity.

What did they do with this system? They created up to 4 logical encoded qubits, using the Steane code and other CSS codes. Using logical CNOT gates, they then created logical Bell pairs — i.e., (|00⟩+|11⟩)/√2 — and verified that they did this.
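At the level of ideal, unencoded statevectors, the Bell-pair preparation they verify is a two-gate circuit: a Hadamard followed by a CNOT. A minimal numpy sketch (my own illustration; the experiment of course performs the logical, encoded version of these gates):

```python
import numpy as np

# Ideal statevector sketch: start in |00>, apply Hadamard to the first
# qubit, then CNOT, yielding the Bell state (|00> + |11>)/sqrt(2).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

state = np.zeros(4)
state[0] = 1.0                        # |00>
state = CNOT @ np.kron(H, I) @ state  # Bell pair
print(np.allclose(state, [1 / np.sqrt(2), 0, 0, 1 / np.sqrt(2)]))  # True
```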

That’s in the version of their experiment that uses “preselection but not postselection.” In other words, they have to try many times until they prepare the logical initial states correctly—as with magic state factories. But once they do successfully prepare the initial states, there’s no further cheating involving postselection (i.e., throwing away bad results): they just apply the logical CNOT gates, measure, and see what they got.

For me personally, that’s the headline result. But then they do various further experiments to “spike the football.” For one thing, they show that when they do allow postselected measurement outcomes, the decrease in the effective error rate can be much much larger, as large as 800x. That allows them (again, under postselection!) to demonstrate up to two rounds of error syndrome extraction and correction while still seeing a net gain, or three rounds albeit with unclear gain. The other thing they demonstrate is teleportation of fault-tolerant qubits—so, a little fancier than just preparing an encoded Bell pair and then measuring it.

They don’t try to do (e.g.) a quantum supremacy demonstration with their encoded qubits, like Harvard/QuEra did—they don’t have nearly enough qubits for that. But this is already extremely cool, and it sets a new bar in quantum error-correction experiments for others to meet or exceed (superconducting, neutral atom, and photonics people, that means you!). And I wasn’t expecting it! Indeed, I’m so far behind the times that I still imagined Microsoft as committed to a strategy of “topological qubits or bust.” While Microsoft is still pursuing the topological approach, their strategy has clearly pivoted over the last few years towards “whatever works.”

Anyway, huge congratulations to the teams at Microsoft and Quantinuum for their accomplishment!

Stepping back, what is the state of experimental quantum computing, 42 years after Feynman’s lecture, 30 years after Shor’s algorithm, 25 years after I entered the field, 5 years after Google’s supremacy experiment? There’s one narrative that quantum computing is already being used to solve practical problems that couldn’t be solved otherwise (look at all the hundreds of startups! they couldn’t possibly exist without providing real value, could they?). Then there’s another narrative that quantum computing has been exposed as a fraud, an impossibility, a pipe dream. Both narratives seem utterly disconnected from the reality on the ground.

If you want to track the experimental reality, my one-sentence piece of advice would be to focus relentlessly on the fidelity with which experimenters can apply a single physical 2-qubit gate. When I entered the field in the late 1990s, ~50% would’ve been an impressive fidelity. At some point it became ~90%. With Google’s supremacy experiment in 2019, we saw 1000 gates applied to 53 qubits, each gate with ~99.5% fidelity. Now, in superconducting, trapped ions, and neutral atoms alike, we’re routinely seeing ~99.8% fidelities, which is what made possible (for example) the new Microsoft/Quantinuum result. The best fidelities I’ve heard reported this year are more like ~99.9%.

Meanwhile, on paper, it looks like known methods for quantum fault-tolerance, for example using the surface code, should start to become practical once you have 2-qubit fidelities around ~99.99%—i.e., one more “9” from where we are now. And then there should “merely” be the practical difficulty of maintaining that 99.99% fidelity while you scale up to millions or hundreds of millions of physical qubits!

What I’m trying to say is: this looks like a pretty good trajectory! It looks like, if we plot the infidelity on a log scale, the experimentalists have already gone three-quarters of the distance. It now looks like it would be a surprise if we couldn’t have hundreds of fault-tolerant qubits and millions of gates on them within the next decade, if we really wanted that—like something unexpected would have to go wrong to prevent it.
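The “three-quarters” figure can be reproduced with back-of-the-envelope arithmetic on log-infidelities, using the numbers quoted above (my own calculation; the endpoints are rough):

```python
import math

# Progress on a log-infidelity scale, from the ~50% fidelities of the
# late 1990s toward a rough ~99.99% fault-tolerance threshold.
start  = 1 - 0.50    # late-1990s fidelity
now    = 1 - 0.999   # best reported this year
target = 1 - 0.9999  # rough threshold for practical fault tolerance

frac = (math.log10(start) - math.log10(now)) / \
       (math.log10(start) - math.log10(target))
print(round(frac, 2))  # 0.73, i.e. roughly three-quarters of the way
```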

Wouldn’t it be ironic if all that was true, but it will simply matter much less than we hoped in the 1990s? Either just because the set of problems for which a quantum computer is useful has remained stubbornly more specialized than the world wants it to be (for more on that, see the entire past 20 years of this blog) … or because advances in classical AI render what was always quantum computing’s most important killer app, the simulation of quantum chemistry and materials, increasingly superfluous (as AlphaFold may have already done for protein folding) … or simply because civilization descends further into barbarism, or the unaligned AGIs start taking over, and we all have bigger things to worry about than fault-tolerant quantum computing.

But, you know, maybe fault-tolerant quantum computing will not only work, but matter—and its use to design better batteries and drugs and photovoltaic cells and so on will pass from science-fiction fantasy to quotidian reality so quickly that much of the world (weary from the hypesters crying wolf too many times?) will barely even notice it when it finally happens, just like what we saw with Large Language Models a few years ago. That would be worth getting out of bed for.

April 04, 2024

Tommaso Dorigo Significance Of Counting Experiments With Background Uncertainty

In the course Statistics for Data Analysis, which I give every spring to PhD students in Physics, I spend some time discussing the apparently trivial problem of evaluating the significance of an excess of observed events N over expected background B.

This is a quite common setup in many searches in Physics and Astrophysics: you have some detection apparatus that records the number of phenomena of a specified kind, and you let it run for some time, whereafter you declare that you have observed N of them. If the occurrence of each phenomenon has equal probability and they do not influence one another, that number N is understood to be sampled from a Poisson distribution of mean B. 
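In the simplest version of the problem, where the background B is known exactly, the significance is just the Poisson tail probability of seeing N or more events, converted to a Gaussian-equivalent Z-score. A sketch (my own, not from the course; the numbers are illustrative):

```python
from scipy.stats import norm, poisson

# Significance of observing N events when B are expected, with B known
# exactly (i.e., ignoring background uncertainty for now).
N, B = 25, 10.0
p = poisson.sf(N - 1, B)  # one-sided tail: P(n >= N | mean B)
z = norm.isf(p)           # Gaussian-equivalent significance in sigma
print(f"p = {p:.2e}, Z = {z:.2f} sigma")
```

Note that the exact Poisson tail gives a noticeably smaller Z than the naive (N − B)/√B estimate; folding in an uncertainty on B reduces it further, which is the subject of the post.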

read more

April 03, 2024

Scott Aaronson Open Letter to Anti-Zionists on Twitter

Dear Twitter Anti-Zionists,

For five months, ever since Oct. 7, I’ve read you obsessively. While my current job is supposed to involve protecting humanity from the dangers of AI (with a side of quantum computing theory), I’m ashamed to say that half the days I don’t do any science; instead I just scroll and scroll, reading anti-Israel content and then pro-Israel content and then more anti-Israel content. I thought refusing to post on Twitter would save me from wasting my life there as so many others have, but apparently it doesn’t, not anymore. (No, I won’t call it “X.”)

At the high end of the spectrum, I religiously check the tweets of Paul Graham, a personal hero and inspiration to me ever since he wrote Why Nerds Are Unpopular twenty years ago, and a man with whom I seem to resonate deeply on every important topic except for two: Zionism and functional programming. At the low end, I’ve read hundreds of the seemingly infinite army of Tweeters who post images of hook-nosed rats with black hats and sidecurls and dollar signs in their eyes, sneering as they strangle the earth and stab Palestinian babies. I study their detailed theories about why the October 7 pogrom never happened, and also it was secretly masterminded by Israel just to create an excuse to mass-murder Palestinians, and also it was justified and thrilling (exactly the same melange long ago embraced for the Holocaust).

I’m aware, of course, that the bottom-feeders make life too easy for me, and that a single Paul Graham who endorses the anti-Zionist cause ought to bother me more than a billion sharers of hook-nosed rat memes. And he does. That’s why, in this letter, I’ll try to stay at the higher levels of Graham’s Disagreement Hierarchy.

More to the point, though, why have I spent so much time on such a depressing, unproductive reading project?

Damned if I know. But it’s less surprising when you recall that, outside theoretical computer science, I’m (alas) mostly known to the world for having once confessed, in a discussion deep in the comment section of this blog, that I spent much of my youth obsessively studying radical feminist literature. I explained that I did that because my wish, for a decade, was to confront progressivism’s highest moral authorities on sex and relationships, and make them tell me either that

(1) I, personally, deserved to die celibate and unloved, as a gross white male semi-autistic STEM nerd and stunted emotional and aesthetic cripple, or else
(2) no, I was a decent human being who didn’t deserve that.

One way or the other, I sought a truthful answer, one that emerged organically from the reigning morality of our time and that wasn’t just an unprincipled exception to it. And I felt ready to pursue progressive journalists and activists and bloggers and humanities professors to the ends of the earth before I’d let them leave this one question hanging menacingly over everything they’d ever written, with (I thought) my only shot at happiness in life hinging on their answer to it.

You might call this my central character flaw: this need for clarity from others about the moral foundations of my own existence. I’m self-aware enough to know that it is a severe flaw, but alas, that doesn’t mean that I ever figured out how to fix it.

It’s been exactly the same way with the anti-Zionists since October 7. Every day I read them, searching for one thing and one thing only: their own answer to the “Jewish Question.” How would they ensure that the significant fraction of the world that yearns to murder all Jews doesn’t get its wish in the 21st century, as to a staggering extent it did in the 20th? I confess to caring about that question, partly (of course) because of the accident of having been born a Jew, and having an Israeli wife and family in Israel and so forth, but also because, even if I’d happened to be a Gentile, the continued survival of the world’s Jews would still seem remarkably bound up with science, Enlightenment, minority rights, liberal democracy, meritocracy, and everything else I’ve ever cared about.

I understand the charges against me. Namely: that if I don’t call for Israel to lay down its arms right now in its war against Hamas (and ideally: to dissolve itself entirely), then I’m a genocidal monster on the wrong side of history. That I value Jewish lives more than Palestinian lives. That I’m a hasbara apologist for the IDF’s mass-murder and apartheid and stealing of land. That if images of children in Gaza with their limbs blown off, or dead in their parents’ arms, or clawing for bread, don’t cause me to admit that Israel is evil, then I’m just as evil as the Israelis are.

Unsurprisingly I contest the charges. As a father of two, I can no longer see any images of child suffering without thinking about my own kids. For all my supposed psychological abnormality, the part of me that’s horrified by such images seems to be in working order. If you want to change my mind, rather than showing me more such images, you’ll need to target the cognitive part of me: the part that asks why so many children are suffering, and what causal levers we’d need to push to reach a place where neither side’s children ever have to suffer like this ever again.

At risk of stating the obvious: my first-order model is that Hamas, with the diabolical brilliance of a Marvel villain, successfully contrived a situation where Israel could prevent the further massacring of its own population only by fighting a gruesome urban war, of a kind that always, anywhere in the world, kills tens of thousands of civilians. Hamas, of course, was helped in this plan by an ideology that considers martyrdom the highest possible calling for the innocents who it rules ruthlessly and hides underneath. But Hamas also understood that the images of civilian carnage would (rightly!) shock the consciences of Israel’s Western allies and many Israelis themselves, thereby forcing a ceasefire before the war was over, thereby giving Hamas the opportunity to regroup and, with God’s and of course Iran’s help, finally finish the job of killing all Jews another day.

And this is key: once you remember why Hamas launched this war and what its long-term goals are, every detail of Twitter’s case against Israel has to be reexamined in a new light. Take starvation, for example. Clearly the only explanation for why Israelis would let Gazan children starve is the malice in their hearts? Well, until you think through the logistical challenges of feeding 2.3 million starving people whose sole governing authority is interested only in painting the streets red with Jewish blood. Should we let that authority commandeer the flour and water for its fighters, while innocents continue to starve? No? Then how about UNRWA? Alas, we learned that UNRWA, packed with employees who cheered the Oct. 7 massacre in their Telegram channels and in some cases took part in the murders themselves, capitulates to Hamas so quickly that it effectively is Hamas. So then Israel should distribute the food itself! But as we’ve dramatically witnessed, Israel can’t distribute food without imposing order, which would seem to mean reoccupying Gaza and earning the world’s condemnation for it. Do you start to appreciate the difficulty of the problem—and why the Biden administration was pushed to absurd-sounding extremes like air-dropping food and then building a floating port?

It all seems so much easier, once you remove the constraint of not empowering Hamas in its openly-announced goal of completing the Holocaust. And hence, removing that constraint is precisely what the global left does.

For all that, by Israeli standards I’m firmly in the anti-Netanyahu, left-wing peace camp—exactly where I’ve been since the 1990s, as a teenager mourning the murder of Rabin. And I hope even the anti-Israel side might agree with me that, if all the suffering since Oct. 7 has created a tiny opening for peace, then walking through that opening depends on two things happening:

  1. the removal of Netanyahu, and
  2. the removal of Hamas.

The good news is that Netanyahu, the catastrophically failed “Protector of Israel,” not only can, but plausibly will (if enough government ministers show some backbone), soon be removed in a democratic election.

Hamas, by contrast, hasn’t allowed a single election since it took power in 2006, in a process notable for its opponents being thrown from the roofs of tall buildings. That’s why even my left-leaning Israeli colleagues—the ones who despise Netanyahu, who marched against him last year—support Israel’s current war. They support it because, even if the Israeli PM were Fred Rogers, how can you ever get to peace without removing Hamas, and how can you remove Hamas except by war, any more than you could cut a deal with Nazi Germany?

I want to see the IDF do more to protect Gazan civilians—despite my bitter awareness of survey data suggesting that many of those civilians would murder my children in front of me if they ever got a chance. Maybe I’d be the same way if I’d been marinated since birth in an ideology of Jew-killing, and blocked from other sources of information. I’m heartened by the fact that despite this, indeed despite the risk to their lives for speaking out, a full 15% of Gazans openly disapprove of the Oct. 7 massacre. I want a solution where that 15% becomes 95% with the passing of generations. My endgame is peaceful coexistence.

But to the anti-Zionists I say: I don’t even mind you calling me a baby-eating monster, provided you honestly field one question. Namely:

Suppose the Palestinian side got everything you wanted for it; then what would be your plan for the survival of Israel’s Jews?

Let’s assume that not only has Netanyahu lost the next election in a landslide, but is justly spending the rest of his life in Israeli prison. Waving my wand, I’ve made you Prime Minister in his stead, with an overwhelming majority in the Knesset. You now get to go down in history as the liberator of Palestine. But you’re now also in charge of protecting Israel’s 7 million Jews (and 2 million other residents) from near-immediate slaughter at the hands of those who you’ve liberated.

Granted, it seems pretty paranoid to expect such a slaughter! Or rather: it would seem paranoid, if the Palestinians’ Grand Mufti (progenitor of the Muslim Brotherhood and hence Hamas) hadn’t allied himself with Hitler in WWII, enthusiastically supported the Nazi Final Solution, and tried to export it to Palestine; if in 1947 the Palestinians hadn’t rejected the UN’s two-state solution (the one Israel agreed to) and instead launched another war to exterminate the Jews (a war they lost); if they hadn’t joined the quest to exterminate the Jews a third time in 1967; etc., or if all this hadn’t happened back before there were any settlements or occupation, when the only question on the table was Israel’s existence. It would seem paranoid if Arafat had chosen a two-state solution when Israel offered it to him at Camp David, rather than suicide bombings. It would seem paranoid if not for the candies passed out in the streets in celebration on October 7.

But if someone has a whole ideology, which they teach their children and from which they’ve never really wavered for a century, about how murdering you is a religious honor, and also they’ve actually tried to murder you at every opportunity—what more do you want them to do, before you’ll believe them?

So, you tell me your plan for how to protect Israel’s 7 million Jews from extermination at the hands of neighbors who have their extermination—my family’s extermination—as their central political goal, and who had that as their goal long before there was any occupation of the West Bank or Gaza. Tell me how to do it while protecting Palestinian innocents. And tell me your fallback plan if your first plan turns out not to work.

We can go through the main options.


Maybe your plan is that Israel should unilaterally dismantle West Bank settlements, recognize a Palestinian state, and retreat to the 1967 borders.

This is an honorable plan. It was my preferred plan—until the horror of October 7, and then the even greater horror of the worldwide left reacting to that horror by sharing celebratory images of paragliders, and by tearing down posters of kidnapped Jewish children.

Today, you might say October 7 has sort of put a giant flaming-red exclamation point on what’s always been the central risk of unilateral withdrawal. Namely: what happens if, afterward, rather than building a peaceful state on their side of the border, the Palestinian leadership chooses instead to launch a new Iran-backed war on Israel—one that, given the West Bank’s proximity to Israel’s main population centers, makes October 7 look like a pillow fight?

If that happens, will you admit that the hated Zionists were right and you were wrong all along, that this was never about settlements but always, only about Israel’s existence? Will you then agree that Israel has a moral prerogative to invade the West Bank, to occupy and pacify it as the Allies did Germany and Japan after World War II? Can I get this in writing from you, right now? Or, following the future (October 7)² launched from a Judenfrei West Bank, will your creativity once again set to work constructing a reason to blame Israel for its own invasion—because you never actually wanted a two-state solution at all, but only Israel’s dismantlement?


So, what about a two-state solution negotiated between the parties? Israel would uproot all West Bank settlements that prevent a Palestinian state, and resettle half a million Jews in pre-1967 Israel—in exchange for the Palestinians renouncing their goal of ending Israel’s existence, via a “right of return” or any other euphemism.

If so: congratulations, your “anti-Zionism” now seems barely distinguishable from my “Zionism”! If they made me the Prime Minister of Israel, and put you in charge of the Palestinians, I feel optimistic that you and I could reach a deal in an hour and then go out for hummus and babaganoush.


In my experience, in the rare cases they deign to address the question directly, most anti-Zionists advocate a “secular, binational state” between the Jordan and Mediterranean, with equal rights for all inhabitants. Certainly, that would make sense if you believe that Israel is an apartheid state just like South Africa.

To me, though, this analogy falls apart on a single question: who’s the Palestinian Nelson Mandela? Who’s the Palestinian leader who’s ever said to the Jews, “end your Jewish state so that we can live together in peace,” rather than “end your Jewish state so that we can end your existence”? To impose a binational state would be to impose something, not only that Israelis regard as an existential horror, but that most Palestinians have never wanted either.

But, suppose we do it anyway. We place 7 million Jews, almost half the Jews who remain on Earth, into a binational state where perhaps a third of their fellow citizens hold the theological belief that all Jews should be exterminated, and that a heavenly reward follows martyrdom in blowing up Jews. The exterminationists don’t quite have a majority, but they’re the second-largest voting bloc. Do you predict that the exterminationists will give up their genocidal ambition because of new political circumstances that finally put their ambition within reach? If October-7 style pogroms against Jews turn out to be a regular occurrence in our secular binational state, how will its government respond—like the Palestinian Authority? like UNRWA? like the British Mandate? like Tsarist Russia?

In such a case, perhaps the Jews (along with those Arabs and Bedouins and Druze and others who cast their lot with the Jews) would need to form a country-within-a-country: their own little autonomous zone within the binational state, with its own defense force. But of course, such a country-within-a-country already formed, for pretty much this exact reason. It’s called Israel. A cycle has been detected in your arc of progress.


We come now to the anti-Zionists who are plainspoken enough to say: Israel’s creation was a grave mistake, and that mistake must now be reversed.

This is a natural option for anyone who sees Israel as an “illegitimate settler-colonial project,” like British India or French Algeria, but who isn’t quite ready to call for another Jewish genocide.

Again, the analogy runs into obvious problems: Israelis would seem to be the first “settler-colonialists” in the history of the world who not only were indigenous to the land they colonized, as much as anyone was, but who weren’t colonizing on behalf of any mother country, and who have no obvious such country to which they can return.

Some say spitefully: then let the Jews go back to Poland. These people might be unaware that, precisely because of how thorough the Holocaust was, more Israeli Jews trace their ancestry to Muslim countries than to Europe. Is there to be a “right of return” to Egypt, Iraq, Morocco, and Yemen, for all the Jews forcibly expelled from those places and for their children and grandchildren?

Others, however, talk about evacuating the Jews from Israel with goodness in their hearts. They say: we’d love the Israelis’ economic dynamism here in Austin or Sydney or Oxfordshire, joining their many coreligionists who already call these places home. What’s more, they’ll be safer here—who wants to live with missiles raining down on their neighborhood? Maybe we could even set aside some acres in Montana for a new Jewish homeland.

Again, if this is your survival plan, I’m a billion times happier to discuss it openly than to have it as unstated subtext!

Except, maybe you could say a little more about the logistics. Who will finance the move? How confident are you that the target country will accept millions of defeated, desperate Jews, as no country on earth was the last time this question arose?

I realize it’s no longer the 1930s, and Israel now has friends, most famously in America. But—what’s a good analogy here? I’ve met various Silicon Valley gazillionaires. I expect that I could raise millions from them, right now, if I got them excited about a new project in quantum computing or AI or whatever. But I doubt I could raise a penny from them if I came to them begging for their pity or their charity.

Likewise: for all the anti-Zionists’ loudness, a solid majority of Americans continue to support Israel (which, incidentally, provides a much simpler explanation than the hook-nosed perfidy of AIPAC for why Congress and the President mostly support it). But it seems to me that Americans support Israel in the “exciting project” sense, rather than in the “charity” sense. They like that Israelis are plucky underdogs who made the deserts bloom, and built a thriving tech industry, and now produce hit shows like Shtisel and Fauda, and take the fight against a common foe to the latter’s doorstep, and maintain one of the birthplaces of Western civilization for tourists and Christian pilgrims, and restarted the riveting drama of the Bible after a 2000-year hiatus, which some believe is a crucial prerequisite to the Second Coming.

What’s important, for present purposes, is not whether you agree with any of these rationales, but simply that none of them translate into a reason to accept millions of Jewish refugees.

But if you think dismantling Israel and relocating its seven million Jews is a workable plan—OK then, are you doing anything to make that more than a thought experiment, as the Zionists did a century ago with their survival plan? Have even I done more to implement your plan than you have, by causing one Israeli (my wife) to move to the US?

Suppose you say it’s not your job to give me a survival plan for Israel’s Jews. Suppose you say the request is offensive, an attempt to distract from the suffering of the Palestinians, so you change the subject.

In that case, fine, but you can now take off your cloak of righteousness, your pretense of standing above me and judging me from the end of history. Your refusal to answer the question amounts to a confession that, for you, the goal of “a free Palestine from the river to the sea” doesn’t actually require the physical survival of Israel’s Jews.

Which means, we’ve now established what you are. I won’t give you the satisfaction of calling you a Nazi or an antisemite. Thousands of years before those concepts existed, Jews already had terms for you. The terms tended toward a liturgical register, as in “those who rise up in every generation to destroy us.” The whole point of all the best-known Jewish holidays, like Purim yesterday, is to talk about those wicked would-be destroyers in the past tense, with the very presence of live Jews attesting to what the outcome was.

(Yesterday, I took my kids to a Purim carnival in Austin. Unlike in previous years, there were armed police everywhere. It felt almost like … visiting Israel.)

If you won’t answer the question, then it wasn’t Zionist Jews who told you that their choices are either to (1) oppose you or else (2) go up in black smoke like their grandparents did. You just told them that yourself.

Many will ask: why don’t I likewise have an obligation to give you my Palestinian survival plan?

I do. But the nice thing about my position is that I can tell you my Palestinian survival plan cheerfully, immediately, with zero equivocating or changing the subject. It’s broadly the same plan that David Ben-Gurion and Yitzchak Rabin and Ehud Barak and Bill Clinton and the UN put on the table over and over and over, only for the Palestinians’ leaders to sweep it off.

I want the Palestinians to have a state, comprising the West Bank and Gaza, with a capital in East Jerusalem. I want Israel to uproot all West Bank settlements that prevent such a state. I want this to happen the instant there arises a Palestinian leadership genuinely committed to peace—one that embraces liberal values and rejects martyr values, in everything from textbooks to street names.

And I want more. I want the new Palestinian state to be as prosperous and free and educated as modern Germany and Japan are. I want it to embrace women’s rights and LGBTQ+ rights and the rest of the modern package, so that “Queers for Palestine” would no longer be a sick joke. I want the new Palestine to be as intertwined with Israel, culturally and economically, as the US and Canada are.

Ironically, if this ever became a reality, then Israel-as-a-Jewish-state would no longer be needed—but it’s certainly needed in the meantime.

Anti-Zionists on Twitter: can you be equally explicit about what you want?

I come, finally, to what many anti-Zionists regard as their ultimate trump card. Look at all the anti-Zionist Jews and Israelis who agree with us, they say. Jewish Voice for Peace. IfNotNow. Noam Chomsky. Norman Finkelstein. The Neturei Karta.

Intellectually, of course, the fact of anti-Zionist Jews makes not the slightest difference to anything. My question for them remains exactly the same as for anti-Zionist Gentiles: what is your Jewish survival plan, for the day after we dismantle the racist supremacist apartheid state that’s currently the only thing standing between half the world’s remaining Jews and their slaughter by their neighbors? Feel free to choose from any of the four options above, or suggest a fifth.

But in the event that Jewish anti-Zionists evade that conversation, or change the subject from it, maybe some special words are in order. You know the famous Golda Meir line, “If we have to choose between being dead and pitied and being alive with a bad image, we’d rather be alive and have the bad image”?

It seems to me that many anti-Zionist Jews considered Golda Meir’s question carefully and honestly, and simply decided it the other way, in favor of Jews being dead and pitied.

Bear with me here: I won’t treat this as a reductio ad absurdum of their position. Not even if the anti-Zionist Jews themselves wish to remain safely ensconced in Berkeley or New Haven, while the Israelis fulfill the “dead and pitied” part for them.

In fact, I’ll go further. Again and again in life I’ve been seized by a dark thought: if half the world’s Jews can only be kept alive, today, via a militarized ethnostate that constantly needs to defend its existence with machine guns and missiles, racking up civilian deaths and destabilizing the world’s geopolitics—if, to put a fine point on it, there are 16 million Jews in the world, but at least a half billion antisemites who wake up every morning and go to sleep every night desperately wishing those Jews dead—then, from a crude utilitarian standpoint, might it not be better for the world if we Jews vanished after all?

Remember, I’m someone who spent a decade asking myself whether the rapacious, predatory nature of men’s sexual desire for women, which I experienced as a curse and an affliction, meant that the only moral course for me was to spend my life as a celibate mathematical monk. But I kept stumbling over one point: why should such a moral obligation fall on me alone? Why doesn’t it fall on other straight men, particularly the ones who presume to lecture me on my failings?

And also: supposing I did take the celibate monk route, would even that satisfy my haters? Would they come after me anyway for glancing at a woman too long or making an inappropriate joke? And also: would the haters soon say I shouldn’t have my scientific career either, since I’ve stolen my coveted academic position from the underprivileged? Where exactly does my self-sacrifice end?

When I did, finally, start approaching women and asking them out on dates, I worked up the courage partly by telling myself: I am now going to do the Zionist thing. I said: if other nerdy Jews can risk death in war, then this nerdy Jew can risk ridicule and contemptuous stares. You can accept that half the world will denounce you as a monster for living your life, so long as your own conscience (and, hopefully, the people you respect the most) continue to assure you that you’re nothing of the kind.

This took more than a decade of internal struggle, but it’s where I ended up. And today, if anyone tells me I had no business ever forming any romantic attachments, I have two beautiful children as my reply. I can say: forget about me, you’re asking for my children never to have existed—that’s why I’m confident you’re wrong.

Likewise with the anti-Zionists. When the Twitter-warriors share their memes of hook-nosed Jews strangling the planet, innocent Palestinian blood dripping from their knives, when the global protests shut down schools and universities and bridges and parliament buildings, there’s a part of me that feels eager to commit suicide if only it would appease the mob, if only it would expiate all the cosmic guilt they’ve loaded onto my shoulders.

But then I remember that this isn’t just about me. It’s about Einstein and Spinoza and Feynman and Erdős and von Neumann and Weinberg and Landau and Michelson and Rabi and Tarski and Asimov and Sagan and Salk and Noether and Meitner, and Irving Berlin and Stan Lee and Rodney Dangerfield and Steven Spielberg. Even if I didn’t happen to be born Jewish—if I had anything like my current values, I’d still think that so much of what’s worth preserving in human civilization, so much of math and science and Enlightenment and democracy and humor, would seem oddly bound up with the continued survival of this tiny people. And conversely, I’d think that so much of what’s hateful in civilization would seem oddly bound up with the quest to exterminate this tiny people, or to deny it any means to defend itself from extermination.

So that’s my answer, both to anti-Zionist Gentiles and to anti-Zionist Jews. The problem of Jewish survival, on a planet much of which yearns for the Jews’ annihilation and much of the rest of which is indifferent, is both hard and important, like P versus NP. And so a radical solution was called for. The solution arrived at a century ago, at once brand-new and older than Homer and Hesiod, was called the State of Israel. If you can’t stomach that solution—if, in particular, you can’t stomach the violence needed to preserve it, so long as Israel’s neighbors retain their annihilationist dream—then your response ought to be to propose a better solution. I promise to consider your solution in good faith—asking, just like with P vs. NP provers, how you overcome the problems that doomed all previous attempts. But if you throw my demand for a better solution back in my face, then you might as well be pushing my kids into a gas chamber yourself, for all the moral authority that I now recognize you to have over me.

Possibly the last thing Einstein wrote was a speech celebrating Israel’s seventh Independence Day; he died a week before he was to deliver it. So let’s turn the floor over to Mr. Albert, the leftist pacifist internationalist:

This is the seventh anniversary of the establishment of the State of Israel. The establishment of this State was internationally approved and recognised largely for the purpose of rescuing the remnant of the Jewish people from unspeakable horrors of persecution and oppression.

Thus, the establishment of Israel is an event which actively engages the conscience of this generation. It is, therefore, a bitter paradox to find that a State which was destined to be a shelter for a martyred people is itself threatened by grave dangers to its own security. The universal conscience cannot be indifferent to such peril.

It is anomalous that world opinion should only criticize Israel’s response to hostility and should not actively seek to bring an end to the Arab hostility which is the root cause of the tension.

I love Einstein’s use of “anomalous,” as if this were a physics problem. From the standpoint of history, what’s anomalous about the Israeli-Palestinian conflict is not, as the Twitterers claim, the brutality of the Israelis—if you think that’s anomalous, you really haven’t studied history—but something different. In other times and places, an entity like Palestine, which launches a war of total annihilation against a much stronger neighbor, and then another and another, would soon disappear from the annals of history. Israel, however, is held to a different standard. Again and again, bowing to international pressure and pressure from its own left flank, the Israelis have let their would-be exterminators off the hook, bruised but mostly still alive and completely unrepentant, to have another go at finishing the Holocaust in a few years. And after every bout, sadly but understandably, Israeli culture drifts more to the right, becomes 10% more like the other side always was.

I don’t want Israel to drift to the right. I find the values of Theodor Herzl and David Ben-Gurion to be almost as good as any human values have ever been, and I’d like Israel to keep them. Of course, Israel will need to continue defending itself from genocidal neighbors, until the day that a leader arises among the Palestinians with the moral courage of Egypt’s Anwar Sadat or Jordan’s King Hussein: a leader who not only talks peace but means it. Then there can be peace, and an end of settlements in the West Bank, and an independent Palestinian state. And however much like dark comedy that seems right now, I’m actually optimistic that it will someday happen, conceivably even soon depending on what happens in the current war. Unless nuclear war or climate change or AI apocalypse makes the whole question moot.

Anyway, thanks for reading—a lot built up these past months that I needed to get off my chest. When I told a friend that I was working on this post, he replied “I agree with you about Israel, of course, but I choose not to die on that hill in public.” I answered that I’ve already died on that hill and on several other hills, yet am somehow still alive!

Meanwhile, I was gratified that other friends, even ones who strongly disagree with me about Israel, told me that I should not disengage, but continue to tell it like I see it, trying civilly to change minds while being open to having my own mind changed.

And now, maybe, I can at last go back to happier topics, like how to prevent the destruction of the world by AI.


April 02, 2024

Terence Tao: AI Mathematical Olympiad – Progress Prize Competition now open

The first progress prize competition for the AI Mathematical Olympiad has now launched. (Disclosure: I am on the advisory committee for the prize.) This is a competition in which contestants submit an AI model which, after the submissions deadline on June 27, will be tested (on a fixed computational resource, without internet access) on a set of 50 “private” test math problems, each of which has an answer that is an integer between 0 and 999. Prior to the close of submissions, the models can be tested on 50 “public” test math problems (where the results of the model are public, but not the problems themselves), as well as on 10 training problems that are available to all contestants. As of this writing, the leaderboard shows that the best-performing model has solved 4 out of 50 of the questions (a standard benchmark, Gemma 7B, had previously solved 3 out of 50). A total of $2^{20} = $1,048,576 (about $1.048 million) has been allocated for the various prizes associated with this competition. More detailed rules can be found here.

Jordan Ellenberg: Orioles 13, Angels 4

I had the great privilege to be present at Camden Yards last weekend for what I believe to be the severest ass-whupping I have ever personally seen the Orioles administer. The Orioles went into the 6th winning 3-1 but the game felt like they were winning by more than that. Then suddenly they actually were — nine batters, nine runs, no outs (though in the middle of it all there was an easy double-play ball by Ramon Urias that the Angels’ shortstop Zach Neto just inexplicably dropped — it was that kind of day.) We had pitching (Grayson Rodriguez almost unhittable for six innings but for one mistake pitch), defense (Urias snagging a line drive at third almost before I saw it leave the bat) and of course a three-run homer, by Anthony Santander, to plate the 7th, 8th, and 9th of those nine runs.

Is being an Angels fan the saddest kind of fan to be right now? The Mets and the Padres, you have more of a “we spent all the money and built what should have been a superteam and didn’t win.” The A’s, you have the embarrassment of the on-field performance and the fact that your owner screwed your city and moved the team out of town. But the Angels? Somehow they just put together the two generational talents of this era of baseball and — didn’t do anything with them. There’s a certain heaviness to the sadness.

As good as the Orioles have been so far, taking three out of their first four and massively outscoring the opposition, I still think they weren’t really a 101-win team last year, and everything will have to go right again for them to be as good this year as they were last year. Our Felix Bautista replacement, Craig Kimbrel, has already blown his first and only save opportunity, which is to say he’s not really a Felix Bautista replacement. But it’s a hell of a team to watch.

The only downside — Gunnar Henderson, with a single, a triple and a home run already, is set to lead off the ninth but Hyde brings in Tony Kemp to pinch hit. Why? The fans want to see Gunnar on second for the cycle, let the fans see Gunnar on second for the cycle.

March 30, 2024

Andrew Jaffe: The Milky Way

Doug Natelson: Thoughts on undergrad solid-state content

Figuring out what to include in an undergraduate introduction to solid-state physics course is always a challenge.   Books like the present incarnation of Kittel are overstuffed with more content than can readily fit in a one-semester course, and because that book has grown organically from edition to edition, it's organizationally not the most pedagogical.  I'm a big fan of and have been teaching from my friend Steve Simon's Oxford Solid State Basics, which is great but a bit short for a (US) one-semester class.  Prof. Simon is interested in collecting opinions on what other topics would be good to include in a hypothetical second edition or second volume, and we thought that crowdsourcing it to this blog's readership could be fun.  As food for thought, some possibilities that occurred to me were:

  • A slightly longer discussion of field-effect transistors, since they're the basis for so much modern technology
  • A chapter or two on materials of reduced dimensionality (2D electron gas, 1D quantum wires, quantum point contacts, quantum dots; graphene and other 2D materials)
  • A discussion of fermiology (Shubnikov-de Haas, de Haas-van Alphen oscillations) - this is in Kittel, but it's difficult to explain in an accessible way
  • An introduction to the quantum Hall effect
  • Some mention of topology (anomalous velocity?  Berry connection?)
  • An intro to superconductivity (though without second quantization and the gap equation, this ends up being phenomenology)
  • Some discussion of Ginzburg-Landau treatment of phase transitions (though I tend to think of that as a topic for a statistical/thermal physics course)
  • An intro to Fermi liquid theory
  • Some additional discussion of electronic structure methods beyond the tight binding and nearly-free electron approaches in the present book (Wannier functions, an intro to density functional theory)
What do people think about this?

March 29, 2024

Matt von Hippel: Generalizing a Black Box Theory

In physics and in machine learning, we have different ways of thinking about models.

A model in physics, like the Standard Model, is a tool to make predictions. Using statistics and a whole lot of data (from particle physics experiments), we fix the model’s free parameters (like the mass of the Higgs boson). The model then lets us predict what we’ll see next: when we turn on the Large Hadron Collider, what will the data look like? In physics, when a model works well, we think that model is true, that it describes the real way the world works. The Standard Model isn’t the ultimate truth: we expect that a better model exists that makes better predictions. But it is still true, in an in-between kind of way. There really are Higgs bosons, even if they’re a result of some more mysterious process underneath, just like there really are atoms, even if they’re made out of protons, neutrons, and electrons.

A model in machine learning, like the Large Language Model that fuels ChatGPT, is also a tool to make predictions. Using statistics and a whole lot of data (from text on the internet, or images, or databases of proteins, or games of chess…) we fix the model’s free parameters (called weights, numbers for the strengths of connections between metaphorical neurons). The model then lets us predict what we’ll see next: when a text begins “Q: How do I report a stolen card? A:”, how does it end?
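In both cases, “fixing the free parameters” is ordinary statistical fitting. A toy sketch of that step, with made-up data and a one-parameter-per-coefficient linear model standing in for the Standard Model or an LLM:

```python
import numpy as np

# Toy "model": y = a*x + b, with free parameters a and b.
# The "data" here are fabricated: a known line plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.size)

# Fix the free parameters by least squares...
a, b = np.polyfit(x, y, deg=1)

# ...and then use the fitted model to predict what we'll see next.
prediction_at_12 = a * 12 + b
print(a, b, prediction_at_12)  # a near 2.0, b near 1.0
```

Whether we then call the fitted model “true” is exactly the question the rest of the post takes up; the fitting machinery itself is the same either way.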

So far, that sounds a lot like physics. But in machine learning, we don’t generally think these models are true, at least not in the same way. The thing producing language isn’t really a neural network like a Large Language Model. It’s the sum of many human brains, many internet users, spread over many different circumstances. Each brain might be sort of like a neural network, but they’re not like the neural networks sitting on OpenAI’s servers. A Large Language Model isn’t true in some in-between kind of way, like atoms or Higgs bosons. It just isn’t true. It’s a black box, a machine that makes predictions, and nothing more.

But here’s the rub: what do we mean by true?

I want to be a pragmatist here. I don’t want to get stuck in a philosophical rabbit-hole, arguing with metaphysicists about what “really exists”. A true theory should be one that makes good predictions, that lets each of us know, based on our actions, what we should expect to see. That’s why science leads to technology, why governments and companies pay people to do it: because the truth lets us know what will happen, and make better choices. So if Large Language Models and the Standard Model both make good predictions, why is only one of them true?

Recently, I saw Dan Elton of More is Different make the point that there is a practical reason to prefer the “true” explanations: they generalize. A Large Language Model might predict what words come next in a text. But it doesn’t predict what happens when you crack someone’s brain open and see how the neurons connect to each other, even if that person is the one who made the text. A good explanation, a true model, can be used elsewhere. The Standard Model tells you what data from the Large Hadron Collider will look like, but it also tells you what data from the muon g-2 experiment will look like. It also, in principle, tells you things far away from particle physics: what stars look like, what atoms look like, what the inside of a nuclear reactor looks like. A black box can’t do that, even if it makes great predictions.

It’s a good point. But thinking about it, I realized things are a little murkier.

You can’t generalize a Large Language Model to tell you how human neurons are connected. But you can generalize it in other ways, and people do. There’s a huge industry in trying to figure out what GPT and its relatives “know”. How much math can they do? How much do they know about geography? Can they predict the future?

These generalizations don’t work the way that they do in physics, or the rest of science, though. When we generalize the Standard Model, we aren’t taking a machine that makes particle physics predictions and trying to see what those particle physics predictions can tell us. We’re taking something “inside” the machine, the fields and particles, and generalizing that, seeing how the things around us could be made of those fields and those particles. In contrast, when people generalize GPT, they typically don’t look inside the “black box”. They use the Large Language Model to make predictions, and see what those predictions “know about”.

On the other hand, we do sometimes generalize scientific models that way too.

If you’re simulating the climate, or a baby star, or a colony of bacteria, you typically aren’t using your simulation like a prediction machine. You don’t plug in exactly what is going on in reality, then ask what happens next. Instead, you run many simulations with different conditions, and look for patterns. You see how a cloud of sulfur might cool down the Earth, or how baby stars often form in groups, leading them to grow up into systems of orbiting black holes. Your simulation is kind of like a black box, one that you try out in different ways until you uncover some explainable principle, something your simulation “knows” that you can generalize.

And isn’t nature that kind of black box, too? When we do an experiment, aren’t we just doing what the Large Language Models are doing, prompting the black box in different ways to get an idea of what it knows? Are scientists who do experiments that picky about finding out what’s “really going on”, or do they just want a model that works?

We want our models to be general, and to be usable. Building a black box can’t be the whole story, because a black box, by itself, isn’t general. But it can certainly be part of the story. Going from the black box of nature to the black box of a machine lets you run tests you couldn’t previously do, lets you investigate faster and ask stranger questions. With a simulation, you can blow up stars. With a Large Language Model, you can ask, for a million social media comments, whether the average internet user would call them positive or negative. And if you make sure to generalize, and try to make better decisions, then it won’t be just the machine learning. You’ll be learning too.

March 27, 2024

John Baez: T Corona Borealis


Sometime this year, the star T Corona Borealis will go nova and become much brighter! At least that’s what a lot of astronomers think. So examine the sky between Arcturus and Vega now—and look again if you hear this event has happened. Normally this star is magnitude 10, too dim to see. When it goes nova it should reach magnitude 2 for a week—as bright as the North Star. So you will see a new star, which is the original meaning of ‘nova’.
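To get a feel for how big a jump that is: the magnitude scale is logarithmic, with 5 magnitudes corresponding to a factor of 100 in brightness. A quick sketch of the arithmetic (just the standard magnitude formula, nothing specific to this star):

```python
# Brightness ratio between two apparent magnitudes.
# By definition, a difference of 5 magnitudes is a factor of 100,
# so ratio = 100 ** (delta_m / 5).
def brightness_ratio(m_faint, m_bright):
    return 100 ** ((m_faint - m_bright) / 5)

# T CrB: quiescence near magnitude 10, predicted nova peak near magnitude 2.
print(brightness_ratio(10, 2))  # about 1585, i.e. roughly 1600x brighter
```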

But why do they think T Corona Borealis will go nova this year? How could they possibly know that?

It’s done this before. It’s a binary star with a white dwarf orbiting a red giant. The red giant is spewing out gas. The much denser white dwarf collects some of this gas on its surface until there’s enough fuel to cause a runaway thermonuclear reaction—a nova!

We’ve seen it happen twice. T Corona Borealis went nova on May 12, 1866 and again on February 9, 1946. What’s happening now is a lot like what happened in 1946.

In February 2015, there was a sustained brightening of T Corona Borealis: it went from magnitude 10.5 to about 9.2. The same thing happened eight years before it went nova the last time.

In June 2018, the star dimmed slightly but still remained at an unusually high level of activity. Then in April 2023 it dimmed to magnitude 12.3. The same thing happened one year before it went nova the last time.

If this pattern continues, T Corona Borealis should erupt sometime between now and September 2024. I’m not completely confident that it will follow the same pattern! But we can just wait and see.

This is one of only 5 known repeating novas in the Milky Way, so we’re lucky to have this chance.

Here’s how it might work:

The description at NASA’s blog:

A red giant star and white dwarf orbit each other in this animation of a nova. The red giant is a large sphere in shades of red, orange, and white, with the side facing the white dwarf the lightest shades. The white dwarf is hidden in a bright glow of white and yellows, which represent an accretion disk around the star. A stream of material, shown as a diffuse cloud of red, flows from the red giant to the white dwarf. The animation opens with the red giant on the right side of the screen, co-orbiting the white dwarf. When the red giant moves behind the white dwarf, a nova explosion on the white dwarf ignites, filling the screen with white light. After the light fades, a ball of ejected nova material is shown in pale orange. A small white spot remains after the fog of material clears, indicating that the white dwarf has survived the explosion.

For more details, try this:

• B. E. Schaefer, B. Kloppenborg, E. O. Waagen and the AAVSO observers, Announcing T CrB pre-eruption dip, AAVSO News and Announcements.

March 25, 2024

John Preskill: My experimental adventures in quantum thermodynamics

Imagine a billiard ball bouncing around on a pool table. High-school level physics enables us to predict its motion until the end of time using simple equations for energy and momentum conservation, as long as you know the initial conditions – how fast the ball is moving at launch, and in which direction.

What if you add a second ball? This makes things more complicated, but predicting the future state of this system would still be possible based on the same principles. What if you had a thousand balls, or a million? Technically, you could still apply the same equations, but the problem would not be tractable in any practical sense.

Billiard balls bouncing around on a pool table are a good analogy for a many-body system like a gas of molecules. Image credit

Thermodynamics lets us make precise predictions about averaged (over all the particles) properties of complicated, many-body systems, like millions of billiard balls or atoms bouncing around, without needing to know the gory details. We can make these predictions by introducing the notion of probabilities. Even though the system is deterministic – we can in principle calculate the exact motion of every ball – there are so many balls in this system that the properties of the whole will be very close to the average properties of the balls. If you throw a six-sided die, the result is in principle deterministic and predictable, based on the way you throw it, but it’s in practice completely random to you – it could be 1 through 6, equally likely. But you know that if you cast a thousand dice, the average will be close to 3.5 – the average of all possibilities. Statistical physics enables us to calculate a probability distribution over the energies of the balls, which tells us everything about the average properties of the system. And because of entropy – the tendency for the system to go from ordered to disordered configurations – even if the probability distribution of the initial system is far from the one statistical physics predicts, after the system is allowed to bounce around and settle, the final distribution will be extremely close to a generic distribution that depends on average properties only. We call this the thermal distribution, and the process of the system mixing and settling to one of the most likely configurations – thermalization.
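The dice claim is easy to check numerically; here is a quick sketch (the sample size and seed are arbitrary choices):

```python
import random

random.seed(1)  # fixed seed so the "experiment" is repeatable

# Cast a thousand six-sided dice. Each individual throw is unpredictable
# to us, but the average settles near (1+2+3+4+5+6)/6 = 3.5.
throws = [random.randint(1, 6) for _ in range(1000)]
average = sum(throws) / len(throws)
print(average)  # close to 3.5
```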

For a practical example – instead of billiard balls, consider a gas of air molecules bouncing around. The average energy of this gas is proportional to its temperature, which we can calculate from the probability distribution of energies. Being able to predict the temperature of a gas is useful for practical things like weather forecasting, cooling your home efficiently, or building an engine. The important properties of the initial state we needed to know – energy and number of particles – are conserved during the evolution, and we call them “thermodynamic charges”. They don’t actually need to be electric charges, although electric charge is a good example of a conserved quantity.

Let’s cross from the classical world – balls bouncing around – to the quantum one, which deals with elementary particles that can be entangled, or in a superposition. What changes when we introduce this complexity? Do systems even thermalize in the quantum world? Because of the above differences, we cannot in principle be sure that the mixing and settling of the system will happen just like in the classical cases of balls or gas molecules colliding.

A visualization of a complex pattern called a quantum scar that can develop in quantum systems.

It turns out that we can predict the thermal state of a quantum system using principles and equations very similar to those that work in the classical case. Well, with one exception – what if we cannot simultaneously measure our critical quantities – the charges?

One of the quirks of quantum mechanics is that observing the state of the system can change it. Before the observation, the system might be in a quantum superposition of many states. After the observation, a definite classical value will be recorded on our instrument – we say that the system has collapsed to this state, and thus changed its state. There are certain observables that are mutually incompatible – we cannot know their values simultaneously, because observing one definite value collapses the system to a state in which the other observable is in a superposition. We call these observables noncommuting, because the order of observation matters – unlike in multiplication of numbers, which is a commuting operation you’re familiar with. 2 * 3 = 6, and also 3 * 2 = 6 – the order of multiplication doesn’t matter.

Electron spin is a common example that entails noncommutation. In a simplified picture, we can think of spin as an axis of rotation of our electron in 3D space. The electron doesn’t actually rotate in space, but it is a useful analogy – the property is called “spin” for a reason. We can measure the spin along the x-, y-, or z-axis of a 3D coordinate system and obtain a definite positive or negative value, but this observation results in a complete loss of information about the spin along the other two perpendicular directions.
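The noncommutation of spin components can be seen concretely in the standard Pauli-matrix representation of spin-1/2 (a textbook illustration, not specific to our experiment):

```python
import numpy as np

# Pauli matrices representing spin measurements along x, y, z (units of hbar/2).
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# The order of operations matters: sx and sz do not commute.
print(np.allclose(sx @ sz, sz @ sx))           # False

# In fact the commutator [sx, sy] = sx@sy - sy@sx equals 2i * sz.
print(np.allclose(sx @ sy - sy @ sx, 2j * sz))  # True
```

Because these matrices fail to commute, knowing the spin definitely along one axis forces a superposition along the others – exactly the situation described above.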

An illustration of electron spin. We can imagine it as an axis in 3D space that points in a particular direction. Image from Wikimedia Commons.

If we investigate a system that conserves the three spin components independently, we will be in a situation where the three conserved charges do not commute. We call them “non-Abelian” charges, because they enjoy a non-Abelian, that is, noncommuting, algebra. Will such a system thermalize, and if so, to what kind of final state?

This is precisely what we set out to investigate. Noncommutation of charges breaks the usual derivations of the thermal state, but researchers have managed to show that, with non-Abelian charges, a subtly different non-Abelian thermal state (NATS) should emerge. Nicole Yunger Halpern and I, at the Joint Center for Quantum Information and Computer Science (QuICS) at the University of Maryland, collaborated with Amir Kalev from the Information Sciences Institute (ISI) at the University of Southern California, and with experimentalists from the University of Innsbruck (Florian Kranzl, Manoj Joshi, Rainer Blatt and Christian Roos) to observe thermalization in a non-Abelian system – and we’ve recently published this work in PRX Quantum.

The experimentalists used a device that can trap ions with electric fields, as well as manipulate and read out their states using lasers. Only select energy levels of these ions are used, which effectively makes them behave like electrons. The laser field can couple the ions in a way that approximates the Heisenberg Hamiltonian – an interaction that conserves the three total spin components individually. We thus construct the quantum system we want to study – multiple particles coupled with interactions that conserve noncommuting charges.

We conceptually divide the ions into a system of interest and an environment. The system of interest, which consists of two particles, is what we want to measure and compare to theoretical predictions. Meanwhile, the other ions act as the effective environment for our pair of ions – the environment ions interact with the pair in a way that simulates a large bath exchanging heat and spin.

Photo of our University of Maryland group. From left to right: Twesh Upadhyaya, Billy Braasch, Shayan Majidy, Nicole Yunger Halpern, Aleks Lasek, Jose Antonio Guzman, Anthony Munson.

If we start this total system in some initial state, and let it evolve under our engineered interaction for a long enough time, we can then measure the final state of the system of interest. To make the NATS distinguishable from the usual thermal state, I designed an initial state that is easy to prepare, and has the ions pointing in directions that result in high charge averages and relatively low temperature. High charge averages make the noncommuting nature of the charges more pronounced, and low temperature makes the state easy to distinguish from the thermal background. However, we also show that our experiment works for a variety of more-arbitrary states.

We let the system evolve from this initial state for as long as possible given experimental limitations, which was 15 ms. The experimentalists then used quantum state tomography to reconstruct the state of the system of interest. Quantum state tomography makes multiple measurements over many experimental runs to approximate the average quantum state of the system measured. We then check how close the measured state is to the NATS. We have found that it’s about as close as one can expect in this experiment!
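For readers curious how “closeness” between quantum states can be quantified, here is a minimal sketch of one standard measure, the trace distance, applied to made-up single-qubit density matrices (the matrices below are purely illustrative and are not our experimental data; the actual analysis is in the paper):

```python
import numpy as np

def trace_distance(rho, sigma):
    """Trace distance 0.5 * ||rho - sigma||_1 between two density matrices."""
    eigenvalues = np.linalg.eigvalsh(rho - sigma)  # Hermitian difference
    return 0.5 * np.sum(np.abs(eigenvalues))

# Hypothetical states, for illustration only: a "measured" state and a
# nearby "thermal" state of a single qubit.
rho_measured = np.array([[0.60, 0.10], [0.10, 0.40]], dtype=complex)
rho_thermal = np.array([[0.55, 0.05], [0.05, 0.45]], dtype=complex)

d = trace_distance(rho_measured, rho_thermal)
print(round(d, 4))  # 0.0707 -- a small distance means the states are close
```

A trace distance of 0 means identical states, and 1 means perfectly distinguishable states; comparing such a distance against competitor thermal states is the spirit of the check described above.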

And we know this because we also implemented a different coupling scheme, one that doesn’t have non-Abelian charges. The expected thermal state in the latter case was reached to within a distance a little smaller than in our non-Abelian case. This tells us that the NATS is almost reached in our experiment, and that it is a good – indeed the best known – thermal state for the non-Abelian system; we have compared it to competitor thermal states.

Working with the experimentalists directly has been a new experience for me. While I was focused on the theory and analyzing the tomography results they obtained, they needed to figure out practical ways to realize what we asked of them. I feel like each group has learned a lot about the tasks of the other. I have become well acquainted with the trapped-ion experiment and its capabilities and limitations. Overall, it has been great collaborating with the Austrian group.

Our result is exciting, as it’s the first experimental observation within the field of non-Abelian thermodynamics! This result was observed in a realistic, non-fine-tuned system that experiences non-negligible errors due to noise. So the system does thermalize after all. We have also demonstrated that the trapped ion experiment of our Austrian friends can be used to simulate interesting many-body quantum systems. With different settings and programming, other types of couplings can be simulated in different types of experiments.

The experiment also opened avenues for future work. The distance to the NATS was greater than the analogous distance to the thermal state in the Abelian case. This suggests that thermalization is inhibited by the noncommutation of charges, but more evidence is needed to justify this claim. In fact, our other recent paper in Physical Review B suggests the opposite!

As noncommutation is one of the core features that distinguishes classical and quantum physics, it is of great interest to unravel the fine differences non-Abelian charges can cause. But we also hope that this research can have practical uses. If thermalization is disrupted by noncommutation of charges, engineered systems featuring them could possibly be used to build quantum memory that is more robust, or maybe even reduce noise in quantum computers. We continue to explore noncommutation, looking for interesting effects that we can pin on it. I am currently working on testing a hypothesis that explains when and why quantum systems thermalize internally.

Doug NatelsonItems of interest

The time since the APS meeting has been very busy, hence the lack of posting.  A few items of interest:

  • The present issue of Nature Physics has several articles about physics education that I really want to read. 
  • This past week we hosted N. Peter Armitage for a really fun colloquium "On Ising's Model of Magnetism" (a title that he acknowledged borrowing from Peierls).  In addition to some excellent science about spin chains, the talk included a lot of history of science about Ising that I hadn't known.  An interesting yet trivial tidbit: when he was in Germany and later Luxembourg, the pronunciation was "eeesing", while after emigrating to the US, he changed it to "eye-sing", so however you've been saying it to yourself, you're not wrong.  The fact that the Isings survived the war in Europe is amazing, given that he was a Jew in an occupied country.  Someone should write a biography....
  • When I participated in a DOD-related program 13 years ago, I had the privilege to meet General Al Gray, former commandant of the US Marine Corps.  He just passed away this week, and people had collected Grayisms (pdf), his takes on leadership and management.  I'm generally not a big fan of leadership guides and advice books, but this is good stuff, told concisely.
  • It took a while, but a Scientific American article that I wrote is now out in the April issue.
  • Integrating nitrogen-vacancy centers for magnetic field sensing directly into the diamond anvils seems like a great way to make progress on characterizing possible superconductivity in hydrides at high pressures.
  • Congratulations to Peter Woit on 20 (!!) years of blogging at Not Even Wrong.  

March 24, 2024

Tommaso DorigoThe Analogy: A Powerful Instrument For Physics Outreach

About a month ago I was contacted by a colleague who invited me to write a piece on the topic of science outreach for an electronic journal (Ithaca). I was happy to accept, but when I later pondered what I would like to write, I could not help thinking back to a piece on the power and limits of the use of analogies in the explanation of physics, which I wrote 12 years ago as a proceedings paper for a conference on physics outreach in Torino. It dawned on me that although 12 years had gone by, my understanding of what constitutes good techniques for engaging the public and for effectively communicating scientific concepts had not widened very significantly. 

read more

March 22, 2024

Matt von HippelHow Subfields Grow

A commenter recently asked me about the different “tribes” in my sub-field. I’ve been working in an area called “amplitudeology”, where we try to find more efficient ways to make predictions (calculate “scattering amplitudes”) for particle physics and gravitational waves. I plan to do a longer post on the “tribes” of amplitudeology…but not this week.

This week, I’ve got a simpler goal. I want to talk about where these kinds of “tribes” come from, in general. A sub-field is a group of researchers focused on a particular idea, or a particular goal. How do those groups change over time? How do new sub-groups form? For the amplitudes fans in the audience, I’ll use amplitudeology examples to illustrate.

The first way subfields gain new tribes is by differentiation. Do a PhD or a Postdoc with someone in a subfield, and you’ll learn that subfield’s techniques. That’s valuable, but probably not enough to get you hired: if you’re just a copy of your advisor, then the field just needs your advisor: research doesn’t need to be done twice. You need to differentiate yourself, finding a variant of what your advisor does where you can excel. The most distinct such variants go on to form distinct tribes of their own. This can also happen for researchers at the same level who collaborate as Postdocs. Each has to show something new, beyond what they did as a team. In my sub-field, it’s the source of some of the bigger tribes. Lance Dixon, Zvi Bern, and David Kosower made their names working together, but when they found long-term positions they made new tribes of their own. Zvi Bern focused on supergravity, and later on gravitational waves, while Lance Dixon was a central figure in the symbology bootstrap.

(Of course, if you differentiate too far you end up in a different sub-field, or a different field altogether. Jared Kaplan was an amplitudeologist, but I wouldn’t call Anthropic an amplitudeology project, although it would help my job prospects if it was!)

The second way subfields gain new tribes is by bridges. Sometimes, a researcher in a sub-field needs to collaborate with someone outside of that sub-field. These collaborations can just be one-and-done, but sometimes they strike up a spark, and people in each sub-field start realizing they have a lot more in common than they realized. They start showing up to each other’s conferences, and eventually identifying as two tribes in a single sub-field. An example from amplitudeology is the group founded by Dirk Kreimer, with a long track record of interesting work on the boundary between math and physics. They didn’t start out interacting with the “amplitudeology” community itself, but over time they collaborated with them more and more, and now I think it’s fair to say they’re a central part of the sub-field.

A third way subfields gain new tribes is through newcomers. Sometimes, someone outside of a subfield will decide they have something to contribute. They’ll read up on the latest papers, learn the subfield’s techniques, and do something new with them: applying them to a new problem of their own interest, or applying their own methods to a problem in the subfield. Because these people bring something new, either in what they work on or how they do it, they often spin off new tribes. Many new tribes in amplitudeology have come from this process, from Edward Witten’s work on the twistor string bringing in twistor approaches to Nima Arkani-Hamed’s idiosyncratic goals and methods.

There are probably other ways subfields gain new tribes, but these are the ones I came up with. If you think of more, let me know in the comments!

March 18, 2024

Terence TaoTalks at the JMM

Earlier this year, I gave a series of lectures at the Joint Mathematics Meetings at San Francisco. I am uploading here the slides for these talks:

I also have written a text version of the first talk, which has been submitted to the Notices of the American Mathematical Society.

Terence TaoBounding sums or integrals of non-negative quantities

A common task in analysis is to obtain bounds on sums

\displaystyle  \sum_{n \in A} f(n)

or integrals

\displaystyle  \int_A f(x)\ dx

where {A} is some simple region (such as an interval) in one or more dimensions, and {f} is an explicit (and elementary) non-negative expression involving one or more variables (such as {n} or {x}, and possibly also some additional parameters). Often, one would be content with an order of magnitude upper bound such as

\displaystyle  \sum_{n \in A} f(n) \ll X

or

\displaystyle  \int_A f(x)\ dx \ll X

where we use {X \ll Y} (or {Y \gg X} or {X = O(Y)}) to denote the bound {|X| \leq CY} for some constant {C}; sometimes one wishes to also obtain the matching lower bound, thus obtaining

\displaystyle  \sum_{n \in A} f(n) \asymp X

or

\displaystyle  \int_A f(x)\ dx \asymp X

where {X \asymp Y} is synonymous with {X \ll Y \ll X}. Finally, one may wish to obtain a more precise bound, such as

\displaystyle  \sum_{n \in A} f(n) = (1+o(1)) X

where {o(1)} is a quantity that goes to zero as the parameters of the problem go to infinity (or some other limit). (For a deeper dive into asymptotic notation in general, see this previous blog post.)

Here are some typical examples of such estimation problems, drawn from recent questions on MathOverflow:

  • (i) (From this question) If {d,p \geq 1} and {a>d/p}, is the expression

    \displaystyle  \sum_{j \in {\bf Z}} 2^{(\frac{d}{p}+1-a)j} \int_0^\infty e^{-2^j s} \frac{s^a}{1+s^{2a}}\ ds
    finite?

  • (ii) (From this question) If {h,m \geq 1}, how can one show that

    \displaystyle  \sum_{d=0}^\infty \frac{2d+1}{2h^2 (1 + \frac{d(d+1)}{h^2}) (1 + \frac{d(d+1)}{h^2m^2})^2} \ll 1 + \log(m^2)?

  • (iii) (From this question) Can one show that

    \displaystyle  \sum_{k=1}^{n-1} \frac{k^{2n-4k-3}(n^2-2nk+2k^2)}{(n-k)^{2n-4k-1}} = (c+o(1)) \sqrt{n}

    as {n \rightarrow \infty} for an explicit constant {c}, and what is this constant?

Compared to other estimation tasks, such as that of controlling oscillatory integrals, exponential sums, singular integrals, or expressions involving one or more unknown functions (that are only known to lie in some function spaces, such as an {L^p} space), high-dimensional geometry (or alternatively, large numbers of random variables), or number-theoretic structures (such as the primes), estimation of sums or integrals of non-negative elementary expressions is a relatively straightforward task, and can be accomplished by a variety of methods. The art of obtaining such estimates is typically not explicitly taught in textbooks, other than through some examples and exercises; it is typically picked up by analysts (or those working in adjacent areas, such as PDE, combinatorics, or theoretical computer science) as graduate students, while they work through their thesis or their first few papers in the subject.

Somewhat in the spirit of this previous post on analysis problem solving strategies, I am going to try here to collect some general principles and techniques that I have found useful for these sorts of problems. As with the previous post, I hope this will be something of a living document, and encourage others to add their own tips or suggestions in the comments.

— 1. Asymptotic arithmetic —

Asymptotic notation is designed so that many of the usual rules of algebra and inequality manipulation continue to hold, with the caveat that one has to be careful if subtraction or division is involved. For instance, if one knows that {A \ll X} and {B \ll Y}, then one can immediately conclude that {A + B \ll X+Y} and {AB \ll XY}, even if {A,B} are negative (note that the notation {A \ll X} or {B \ll Y} automatically forces {X,Y} to be non-negative). Equivalently, we have the rules

\displaystyle  O(X) + O(Y) = O(X+Y); \quad O(X) \cdot O(Y) = O(XY)

and more generally we have the triangle inequality

\displaystyle  \sum_\alpha O(X_\alpha) = O( \sum_\alpha X_\alpha ).

Again, we stress that this sort of rule implicitly requires the {X_\alpha} to be non-negative, and that claims such as {O(X) - O(Y) = O(X-Y)} and {O(X)/O(Y) = O(X/Y)} are simply false. As a rule of thumb, if your calculations have arrived at a situation where a signed or oscillating sum or integral appears inside the big-O notation, or on the right-hand side of an estimate, without being “protected” by absolute value signs, then you have probably made a serious error in your calculations.

Another rule of inequalities that is inherited by asymptotic notation is that if one has two bounds

\displaystyle  A \ll X; \quad A \ll Y \ \ \ \ \ (1)

for the same quantity {A}, then one can combine them into the unified asymptotic bound

\displaystyle  A \ll \min(X, Y). \ \ \ \ \ (2)

This is an example of a “free move”: a replacement of bounds that does not lose any of the strength of the original bounds, since of course (2) implies (1). In contrast, other ways to combine the two bounds (1), such as taking the geometric mean

\displaystyle  A \ll X^{1/2} Y^{1/2}, \ \ \ \ \ (3)

while often convenient, are not “free”: the bounds (1) imply the averaged bound (3), but the bound (3) does not imply (1). On the other hand, the inequality (2), while it does not concede any logical strength, can require more calculation to work with, often because one ends up splitting up cases such as {X \ll Y} and {X \gg Y} in order to simplify the minimum. So in practice, when trying to establish an estimate, one often starts with using conservative bounds such as (2) in order to maximize one’s chances of getting any proof (no matter how messy) of the desired estimate, and only after such a proof is found, one tries to look for more elegant approaches using less efficient bounds such as (3).

For instance, suppose one wanted to show that the sum

\displaystyle  \sum_{n=-\infty}^\infty \frac{2^n}{(1+n^2) (1+2^{2n})}

was convergent. Lower bounding the denominator term {1+2^{2n}} by {1} or by {2^{2n}}, one obtains the bounds

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{2^n}{1+n^2} \ \ \ \ \ (4)

and also

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{2^n}{(1+n^2) 2^{2n}} = \frac{2^{-n}}{1+n^2} \ \ \ \ \ (5)

so by applying (2) we obtain the unified bound

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{\min(2^n,2^{-n})}{1+n^2}.

To deal with this bound, we can split into the two contributions {n \geq 0}, where {2^{-n}} dominates, and {n < 0}, where {2^n} dominates. In the former case we see (from the ratio test, for instance) that the sum

\displaystyle  \sum_{n=0}^\infty \frac{2^{-n}}{1+n^2}

is absolutely convergent, and in the latter case we see that the sum

\displaystyle  \sum_{n=-\infty}^{-1} \frac{2^{n}}{1+n^2}

is also absolutely convergent, so the entire sum is absolutely convergent. But once one has this argument, one can try to streamline it, for instance by taking the geometric mean of (4), (5) rather than the minimum to obtain the weaker bound

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{1}{1+n^2} \ \ \ \ \ (6)

and now one can conclude without decomposition just by observing the absolute convergence of the doubly infinite sum {\sum_{n=-\infty}^\infty \frac{1}{1+n^2}}. This is a less “efficient” estimate, because one has conceded a lot of the decay in the summand by using (6) (the summand used to be exponentially decaying in {n}, but is now only polynomially decaying), but it is still sufficient for the purpose of establishing absolute convergence.
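As a quick numerical sanity check of these estimates (an illustration, not part of the argument), one can verify that every term of the original series is dominated by the bound (6) – here in fact with implied constant {1}, since {2^n \leq 1 + 2^{2n}} by AM-GM – and that the partial sums stabilize:

```python
def term(n):
    """Summand 2^n / ((1 + n^2) * (1 + 2^(2n)))."""
    return 2.0**n / ((1 + n**2) * (1 + 2.0**(2 * n)))

def bound(n):
    """The geometric-mean bound (6): 1 / (1 + n^2)."""
    return 1.0 / (1 + n**2)

# Every term is dominated by the bound (implied constant 1)...
for n in range(-50, 51):
    assert term(n) <= bound(n)

# ...and the partial sums over n = -N..N stabilize as N grows,
# consistent with absolute convergence.
partial = sum(term(n) for n in range(-50, 51))
print(round(partial, 6))
```

The terms decay exponentially in {|n|}, so the partial sum over {|n| \leq 50} is already an excellent approximation of the full doubly infinite sum.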

One of the key advantages of dealing with order of magnitude estimates, as opposed to sharp inequalities, is that the arithmetic becomes tropical. More explicitly, we have the important rule

\displaystyle  X + Y \asymp \max(X,Y)

whenever {X,Y} are non-negative, since we clearly have

\displaystyle  \max(X,Y) \leq X+Y \leq 2 \max(X,Y).

In particular, if {Y \leq X}, then {O(X) + O(Y) = O(X)}. That is to say, given two orders of magnitudes, any term {O(Y)} of equal or lower order to a “main term” {O(X)} can be discarded. This is a very useful rule to keep in mind when trying to estimate sums or integrals, as it allows one to discard many terms that are not contributing to the final answer. It also interacts well with monotone operations, such as raising to a power {p}; for instance, we have

\displaystyle  (X+Y)^p \asymp \max(X,Y)^p = \max(X^p,Y^p) \asymp X^p + Y^p

if {X,Y \geq 0} and {p} is a fixed positive constant, whilst

\displaystyle  \frac{1}{X+Y} \asymp \frac{1}{\max(X,Y)} = \min(\frac{1}{X}, \frac{1}{Y})

if {X,Y>0}. Finally, this relation also sets up the fundamental divide and conquer strategy for estimation: if one wants to prove a bound such as {A \ll X}, it will suffice to obtain a decomposition

\displaystyle  A = A_1 + \dots + A_k

or at least an upper bound

\displaystyle  A \ll A_1 + \dots + A_k

of {A} by some bounded number of components {A_1,\dots,A_k}, and establish the bounds {A_1 \ll X, \dots, A_k \ll X} separately. Typically the {A_1,\dots,A_k} will be (morally at least) smaller than the original quantity {A} – for instance, if {A} is a sum of non-negative quantities, each of the {A_i} might be a subsum of those same quantities – which means that such a decomposition is a “free move”, in the sense that it does not risk making the problem harder. (This is because, if the original bound {A \ll X} is to be true, each of the new objectives {A_1 \ll X, \dots, A_k \ll X} must also be true, and so the decomposition can only make the problem logically easier, not harder.) The only costs to such decomposition are that your proofs might be {k} times longer, as you may be repeating the same arguments {k} times, and that the implied constants in the {A_1 \ll X, \dots, A_k \ll X} bounds may be worse than the implied constant in the original {A \ll X} bound. However, in many cases these costs are well worth the benefits of being able to simplify the problem into smaller pieces. As mentioned above, once one successfully executes a divide and conquer strategy, one can go back and try to reduce the number of decompositions, for instance by unifying components that are treated by similar methods, or by replacing strong but unwieldy estimates with weaker, but more convenient estimates.
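The tropical rule {X + Y \asymp \max(X,Y)} underpinning this strategy is elementary enough to check mechanically (a trivial numerical illustration):

```python
import random

random.seed(1)  # reproducible sampling

# Tropical rule: for non-negative X, Y,
#   max(X, Y) <= X + Y <= 2 * max(X, Y),
# so X + Y and max(X, Y) agree up to a factor of 2.
for _ in range(1000):
    X = random.uniform(0.001, 100.0)
    Y = random.uniform(0.001, 100.0)
    assert max(X, Y) <= X + Y <= 2 * max(X, Y)

print("X + Y agrees with max(X, Y) up to a factor of 2")
```

The factor of 2 is exactly the kind of implied constant that order-of-magnitude notation is designed to absorb.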

The above divide and conquer strategy does not directly apply when one is decomposing into an unbounded number of pieces {A_j}, {j=1,2,\dots}. In such cases, one needs an additional gain in the index {j} that is summable in {j} in order to conclude. For instance, if one wants to establish a bound of the form {A \ll X}, and one has located a decomposition or upper bound

\displaystyle  A \ll \sum_{j=1}^\infty A_j

that looks promising for the problem, then it would suffice to obtain exponentially decaying bounds such as

\displaystyle  A_j \ll 2^{-cj} X

for all {j \geq 1} and some constant {c>0}, since this would imply

\displaystyle  A \ll \sum_{j=1}^\infty 2^{-cj} X \ll X \ \ \ \ \ (7)

thanks to the geometric series formula. (Here it is important that the implied constants in the asymptotic notation are uniform in {j}; a {j}-dependent bound such as {A_j \ll_j 2^{-cj} X} would be useless for this application, as then the growth of the implied constant in {j} could overwhelm the exponential decay in the {2^{-cj}} factor). Exponential decay is in fact overkill; polynomial decay such as

\displaystyle  A_j \ll \frac{X}{j^{1+c}}

would already be sufficient, although harmonic decay such as

\displaystyle  A_j \ll \frac{X}{j} \ \ \ \ \ (8)

is not quite enough (the sum {\sum_{j=1}^\infty \frac{1}{j}} diverges logarithmically), although in many such situations one could still try to salvage the bound by working a lot harder to squeeze some additional logarithmic factors out of one’s estimates. For instance, if one can improve (8) to

\displaystyle  A_j \ll \frac{X}{j \log^{1+c} j}

for all {j \geq 2} and some constant {c>0}, then one recovers the desired bound {A \ll X}, since (by the integral test) the sum {\sum_{j=2}^\infty \frac{1}{j\log^{1+c} j}} converges (and one can treat the {j=1} term separately if one already has (8)).

Sometimes, when trying to prove an estimate such as {A \ll X}, one has identified a promising decomposition with an unbounded number of terms

\displaystyle  A \ll \sum_{j=1}^J A_j

(where {J} is finite but unbounded) but is unsure of how to proceed next. Often the next thing to do is to study the extreme terms {A_1} and {A_J} of this decomposition, and first try to establish (the presumably simpler) tasks of showing that {A_1 \ll X} and {A_J \ll X}. Often once one does so, it becomes clear how to combine the treatments of the two extreme cases to also treat the intermediate cases, obtaining a bound {A_j \ll X} for each individual term, leading to the inferior bound {A \ll JX}; this can then be used as a starting point to hunt for additional gains, such as the exponential or polynomial gains mentioned previously, that could be used to remove this loss of {J}. (There are more advanced techniques, such as those based on controlling moments such as the square function {(\sum_{j=1}^J |A_j|^2)^{1/2}}, or trying to understand the precise circumstances in which a “large values” scenario {|A_j| \gg X} occurs, and how these scenarios interact with each other for different {j}, but these are beyond the scope of this post, as they are rarely needed when dealing with sums or integrals of elementary functions.)

If one is faced with the task of estimating a doubly infinite sum {\sum_{j=-\infty}^\infty A_j}, it can often be useful to first think about how one would proceed in estimating {A_j} when {j} is very large and positive, and how one would proceed when {j} is very large and negative. In many cases, one can simply decompose the sum into two pieces such as {\sum_{j=1}^\infty A_j} and {\sum_{j=-\infty}^{-1} A_j} and use whatever methods you came up with to handle the two extreme cases; in some cases one also needs a third argument to handle the case when {j} is of bounded (or somewhat bounded) size, in which case one may need to divide into three pieces such as {\sum_{j=J_+}^\infty A_j}, {\sum_{j=-\infty}^{J_-} A_j}, and {\sum_{j=J_-+1}^{J_+-1} A_j}. Sometimes there will be a natural candidate for the places {J_-, J_+} where one is cutting the sum, but in other situations it may be best to just leave these cut points as unspecified parameters initially, obtain bounds that depend on these parameters, and optimize at the end. (Typically, the optimization proceeds by trying to balance the magnitude of a term that is increasing with respect to a parameter, with one that is decreasing. For instance, if one ends up with a bound such as {A \lambda + B/\lambda} for some parameter {\lambda>0} and quantities {A,B>0}, it makes sense to select {\lambda = \sqrt{B/A}} to balance the two terms. Or, if faced with something like {A e^{-\lambda} + \lambda} for some {A > 2}, then something like {\lambda = \log A} would be close to the optimal choice of parameter. And so forth.)
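The balancing trick just mentioned can be sketched numerically; with illustrative values {A = 3}, {B = 12}, the choice {\lambda = \sqrt{B/A}} minimizes {A\lambda + B/\lambda}, attaining the value {2\sqrt{AB}}:

```python
import math

def bound(lam, A, B):
    """The two-term bound A*lambda + B/lambda to be balanced."""
    return A * lam + B / lam

A, B = 3.0, 12.0
lam_balanced = math.sqrt(B / A)            # sqrt(12/3) = 2.0
optimal_value = bound(lam_balanced, A, B)  # 2*sqrt(A*B) = 12.0

# The balanced choice beats a grid of other positive parameters
# (by AM-GM, A*lam + B/lam >= 2*sqrt(A*B) for all lam > 0).
for lam in [0.1 * k for k in range(1, 200)]:
    assert bound(lam, A, B) >= optimal_value - 1e-9

print(lam_balanced, optimal_value)  # 2.0 12.0
```

The same one-line optimization applies whenever one term grows and the other shrinks in the parameter.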

— 1.1. Psychological distinctions between exact and asymptotic arithmetic —

The adoption of the “divide and conquer” strategy requires a certain mental shift from the “simplify, simplify” strategy that one is taught in high school algebra. In the latter strategy, one tries to collect terms in an expression to make them as short as possible, for instance by working with a common denominator, with the idea that unified and elegant-looking expressions are “simpler” than sprawling expressions with many terms. In contrast, the divide and conquer strategy is intentionally extremely willing to greatly increase the total length of the expressions to be estimated, so long as each individual component of the expressions appears easier to estimate than the original one. Both strategies are still trying to reduce the original problem to a simpler problem (or collection of simpler sub-problems), but the metric by which one judges whether the problem has become simpler is rather different.

A related mental shift that one needs to adopt in analysis is to move away from the exact identities that are so prized in algebra (and in undergraduate calculus), as the precision they offer is often unnecessary and distracting for the task at hand, and often fail to generalize to more complicated contexts in which exact identities are no longer available. As a simple example, consider the task of estimating the expression

\displaystyle  \int_0^a \frac{dx}{1+x^2}

where {a > 0} is a parameter. With a trigonometric substitution, one can evaluate this expression exactly as {\mathrm{arctan}(a)}, however the presence of the arctangent can be inconvenient if one has to do further estimation tasks (for instance, if {a} depends in a complicated fashion on other parameters, which one then also wants to sum or integrate over). Instead, by observing the trivial bounds

\displaystyle  \int_0^a \frac{dx}{1+x^2} \leq \int_0^a\ dx = a

and

\displaystyle  \int_0^a \frac{dx}{1+x^2} \leq \int_0^\infty\ \frac{dx}{1+x^2} = \frac{\pi}{2}

one can combine them using (2) to obtain the upper bound

\displaystyle  \int_0^a \frac{dx}{1+x^2} \leq \min( a, \frac{\pi}{2} ) \asymp \min(a,1)

and similar arguments also give the matching lower bound, thus

\displaystyle  \int_0^a \frac{dx}{1+x^2} \asymp \min(a,1). \ \ \ \ \ (9)

This bound, while cruder than the exact answer of {\mathrm{arctan}(a)}, is often good enough for many applications (particularly in situations where one is willing to concede constants in the bounds), and can be more tractable to work with than the exact answer. Furthermore, these arguments can be adapted without difficulty to treat similar expressions such as

\displaystyle  \int_0^a \frac{dx}{(1+x^2)^\alpha}

for any fixed exponent {\alpha>0}, which need not have closed form exact expressions in terms of elementary functions such as the arctangent when {\alpha} is non-integer.
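A quick numeric sanity check (not a proof, and not part of the argument above) confirms the comparability {\mathrm{arctan}(a) \asymp \min(a,1)}: the ratio of the two quantities stays within the fixed window {[\pi/4, \pi/2]} across several orders of magnitude of {a}:

```python
import math

# arctan(a) = ∫_0^a dx/(1+x^2); check it is comparable to min(a,1),
# with implied constants lying in [pi/4, pi/2].
for a in [0.01, 0.1, 1.0, 10.0, 1000.0]:
    exact = math.atan(a)
    crude = min(a, 1.0)
    ratio = exact / crude
    assert math.pi / 4 - 1e-9 <= ratio <= math.pi / 2 + 1e-9
```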

As a general rule, instead of relying exclusively on exact formulae, one should seek approximations that are valid up to the degree of precision that one seeks in the final estimate. For instance, suppose one wishes to establish the bound

\displaystyle  \sec(x) - \cos(x) = x^2 + O(x^3)

for all sufficiently small {x}. If one was clinging to the exact identity mindset, one could try to look for some trigonometric identity to simplify the left-hand side exactly, but the quicker (and more robust) way to proceed is just to use Taylor expansion up to the specified accuracy {O(x^3)} to obtain

\displaystyle  \cos(x) = 1 - \frac{x^2}{2} + O(x^3)

which one can invert using the geometric series formula {(1-y)^{-1} = 1 + y + y^2 + \dots} to obtain

\displaystyle  \sec(x) = 1 + \frac{x^2}{2} + O(x^3)

from which the claim follows. (One could also have computed the Taylor expansion of {\sec(x)} by repeatedly differentiating the secant function, but as this is a series that is usually not memorized, this can take a little bit more time than just computing it directly to the required accuracy as indicated above.) Note that the notion of “specified accuracy” may have to be interpreted in a relative sense if one is planning to multiply or divide several estimates together. For instance, if one wishes to establish the bound

\displaystyle  \sin(x) \cos(x) = x + O(x^3)

for small {x}, one needs an approximation

\displaystyle  \sin(x) = x + O(x^3)

to the sine function that is accurate to order {O(x^3)}, but one only needs an approximation

\displaystyle  \cos(x) = 1 + O(x^2)

to the cosine function that is accurate to order {O(x^2)}, because the cosine is to be multiplied by {\sin(x)= O(x)}. Here the key is to obtain estimates that have a relative error of {O(x^2)}, compared to the main term (which is {1} for cosine, and {x} for sine).
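The expansion {\sec(x) - \cos(x) = x^2 + O(x^3)} is easy to check numerically: the error term, divided by {x^3}, should remain bounded as {x \rightarrow 0}. A minimal sketch (sample points chosen arbitrarily):

```python
import math

# Check sec(x) - cos(x) = x^2 + O(x^3) for small x: the error should be
# dominated by |x|^3 (here the implied constant 1 already suffices).
for x in [0.4, 0.2, 0.1, 0.05]:
    err = abs(1 / math.cos(x) - math.cos(x) - x**2)
    assert err <= abs(x)**3
```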

The following table lists some common approximations that can be used to simplify expressions when one is only interested in order of magnitude bounds (with {c>0} an arbitrary small constant):

The quantity… has magnitude comparable to … provided that…
{X+Y} {X} {0 \leq Y \ll X} or {|Y| \leq (1-c)X}
{X+Y} {\max(X,Y)} {X,Y \geq 0}
{\sin z}, {\tan z}, {e^{iz}-1} {|z|} {|z| \leq \frac{\pi}{2} - c}
{\cos z} {1} {|z| \leq \pi/2 - c}
{\sin x} {\mathrm{dist}(x, \pi {\bf Z})} {x} real
{e^{ix}-1} {\mathrm{dist}(x, 2\pi {\bf Z})} {x} real
{\mathrm{arcsin} x} {|x|} {|x| \leq 1-c}
{\log(1+z)} {|z|} {|z| \leq 1-c}
{e^z-1}, {\sinh z}, {\tanh z} {|z|} {|z| \leq \frac{\pi}{2}-c}
{\cosh z} {1} {|z| \leq \frac{\pi}{2}-c}
{\sinh x}, {\cosh x} {e^x} {|x| \gg 1}
{\tanh x} {\min(|x|, 1)} {x} real
{(1+x)^a-1} {a|x|} {a \gg 1}, {a |x| \ll 1}
{n!} {n^n e^{-n} \sqrt{n}} {n \geq 1}
{\Gamma(s)} {|s^s e^{-s}| / |s|^{1/2}} {|s| \gg 1}, {|\mathrm{arg} s| \leq \frac{\pi}{2} - c}
{\Gamma(\sigma+it)} {|t|^{\sigma-1/2} e^{-\pi |t|/2}} {\sigma = O(1)}, {|t| \gg 1}
{\binom{n}{m}} {e^{n (p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p})} / n^{1/2}} {m=pn}, {c < p < 1-c}
{\binom{n}{m}} {2^n e^{-2(m-n/2)^2/n} / n^{1/2}} {m = n/2 + O(n^{2/3})}
{\binom{n}{m}} {n^m/m!} {m \ll \sqrt{n}}

On the other hand, some exact formulae are still very useful, particularly if the end result of that formula is clean and tractable to work with (as opposed to involving somewhat exotic functions such as the arctangent). The geometric series formula, for instance, is an extremely handy exact formula, so much so that it is often desirable to control summands by a geometric series purely to use this formula (we already saw an example of this in (7)). Exact integral identities, such as

\displaystyle  \frac{1}{a} = \int_0^\infty e^{-at}\ dt

or more generally

\displaystyle  \frac{\Gamma(s)}{a^s} = \int_0^\infty e^{-at} t^{s-1}\ dt

for {a,s>0} (where {\Gamma} is the Gamma function) are also quite commonly used, and fundamental exact integration rules such as the change of variables formula, the Fubini-Tonelli theorem or integration by parts are all essential tools for an analyst trying to prove estimates. Because of this, it is often desirable to estimate a sum by an integral. The integral test is a classic example of this principle in action: a more quantitative version of this test is the bound

\displaystyle  \int_{a}^{b+1} f(t)\ dt \leq \sum_{n=a}^b f(n) \leq \int_{a-1}^b f(t)\ dt \ \ \ \ \ (10)

whenever {a \leq b} are integers and {f: [a-1,b+1] \rightarrow {\bf R}} is monotone decreasing, or the closely related bound

\displaystyle  \sum_{a \leq n \leq b} f(n) = \int_a^b f(t)\ dt + O( |f(a)| + |f(b)| ) \ \ \ \ \ (11)

whenever {a \leq b} are reals and {f: [a,b] \rightarrow {\bf R}} is monotone (either increasing or decreasing); see Lemma 2 of this previous post. Such bounds allow one to switch back and forth quite easily between sums and integrals as long as the summand or integrand behaves in a mostly monotone fashion (for instance, if it is monotone increasing on one portion of the domain and monotone decreasing on the other). For more precision, one could turn to more advanced relationships between sums and integrals, such as the Euler-Maclaurin formula or the Poisson summation formula, but these are beyond the scope of this post.
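The two-sided integral-test bound (10) can be checked directly for a concrete decreasing function; the sketch below does so for {f(t) = 1/t^2} (endpoints chosen arbitrarily), using a homemade midpoint-rule quadrature rather than any library integrator:

```python
import math

def integral(f, lo, hi, steps=100000):
    """Simple midpoint-rule quadrature, adequate for this sanity check."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

# Check (10) for the decreasing f(t) = 1/t^2 with a = 2, b = 50:
#   ∫_a^{b+1} f ≤ Σ_{n=a}^b f(n) ≤ ∫_{a-1}^b f.
f = lambda t: 1 / t**2
a, b = 2, 50
s = sum(f(n) for n in range(a, b + 1))
assert integral(f, a, b + 1) <= s <= integral(f, a - 1, b)
```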

Exercise 1 Suppose {f: {\bf R} \rightarrow {\bf R}^+} obeys the quasi-monotonicity property {f(x) \ll f(y)} whenever {y-1 \leq x \leq y}. Show that {\int_a^{b-1} f(t)\ dt \ll \sum_{n=a}^b f(n) \ll \int_a^{b+1} f(t)\ dt} for any integers {a < b}.

Exercise 2 Use (11) to obtain the “cheap Stirling approximation”

\displaystyle  n! = \exp( n \log n - n + O(\log n) )

for any natural number {n \geq 2}. (Hint: take logarithms to convert the product {n! = 1 \times 2 \times \dots \times n} into a sum.)
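Without spoiling the exercise, one can at least confirm the shape of the cheap Stirling approximation numerically: the quantity {\log n! - (n \log n - n)} should be {O(\log n)}. A quick check using the standard-library `lgamma` function:

```python
import math

# log(n!) - (n log n - n) should stay bounded after dividing by log n.
for n in [10, 100, 1000]:
    excess = math.lgamma(n + 1) - (n * math.log(n) - n)
    assert 0 < excess <= 2 * math.log(n)
```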

With practice, you will be able to identify any term in a computation which is already “negligible” or “acceptable” in the sense that its contribution is always going to lead to an error that is smaller than the desired accuracy of the final estimate. One can then work “modulo” these negligible terms and discard them as soon as they appear. This can help remove a lot of clutter in one’s arguments. For instance, if one wishes to establish an asymptotic of the form

\displaystyle  A = X + O(Y)

for some main term {X} and lower order error {O(Y)}, any component of {A} that one can already identify to be of size {O(Y)} is negligible and can be removed from {A} “for free”. Conversely, it can be useful to add negligible terms to an expression, if it makes the expression easier to work with. For instance, suppose one wants to estimate the expression

\displaystyle  \sum_{n=1}^N \frac{1}{n^2}. \ \ \ \ \ (12)

This is a partial sum for the zeta function

\displaystyle  \sum_{n=1}^\infty \frac{1}{n^2} = \zeta(2) = \frac{\pi^2}{6}

so it can make sense to add and subtract the tail {\sum_{n=N+1}^\infty \frac{1}{n^2}} to the expression (12) to rewrite it as

\displaystyle  \frac{\pi^2}{6} - \sum_{n=N+1}^\infty \frac{1}{n^2}.

To deal with the tail, we switch from a sum to the integral using (10) to bound

\displaystyle  \sum_{n=N+1}^\infty \frac{1}{n^2} \ll \int_N^\infty \frac{1}{t^2}\ dt = \frac{1}{N}

giving us the reasonably accurate bound

\displaystyle  \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} - O(\frac{1}{N}).

One can sharpen this approximation somewhat using (11) or the Euler–Maclaurin formula; we leave this to the interested reader.
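The asymptotic {\sum_{n=1}^N 1/n^2 = \pi^2/6 - O(1/N)} is easy to verify numerically; in fact the deficit lies strictly between {1/(N+1)} and {1/N}, consistent with the integral-test bounds:

```python
import math

# The deficit pi^2/6 - Σ_{n<=N} 1/n^2 should be positive and at most 1/N.
for N in [10, 100, 1000]:
    partial = sum(1 / n**2 for n in range(1, N + 1))
    deficit = math.pi**2 / 6 - partial
    assert 0 < deficit < 1 / N
```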

Another psychological shift when switching from algebraic simplification problems to estimation problems is that one has to be prepared to let go of constraints in an expression that complicate the analysis. Suppose for instance we now wish to estimate the variant

\displaystyle  \sum_{1 \leq n \leq N, \hbox{ square-free}} \frac{1}{n^2}

of (12), where we are now restricting {n} to be square-free. An identity from analytic number theory (the Euler product identity) lets us calculate the exact sum

\displaystyle  \sum_{n \geq 1, \hbox{ square-free}} \frac{1}{n^2} = \frac{\zeta(2)}{\zeta(4)} = \frac{15}{\pi^2}

so as before we can write the desired expression as

\displaystyle  \frac{15}{\pi^2} - \sum_{n > N, \hbox{ square-free}} \frac{1}{n^2}.

Previously, we applied the integral test (10), but this time we cannot do so, because the restriction to square-free integers destroys the monotonicity. But we can simply remove this restriction:

\displaystyle  \sum_{n > N, \hbox{ square-free}} \frac{1}{n^2} \leq \sum_{n > N} \frac{1}{n^2}.

Heuristically at least, this move only “costs us a constant”, since a positive fraction ({1/\zeta(2)= 6/\pi^2}, in fact) of all integers are square-free. Now that this constraint has been removed, we can use the integral test as before and obtain the reasonably accurate asymptotic

\displaystyle  \sum_{1 \leq n \leq N, \hbox{ square-free}} \frac{1}{n^2} = \frac{15}{\pi^2} + O(\frac{1}{N}).
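This square-free variant can also be sanity-checked numerically, using a naive trial-division test for square-freeness (adequate at this scale):

```python
import math

def squarefree(n):
    """True if no perfect square greater than 1 divides n."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

# Check Σ_{n<=N squarefree} 1/n^2 = 15/pi^2 + O(1/N): the discrepancy is
# at most the full tail Σ_{n>N} 1/n^2 < 1/N.
for N in [100, 1000]:
    partial = sum(1 / n**2 for n in range(1, N + 1) if squarefree(n))
    assert abs(partial - 15 / math.pi**2) < 1 / N
```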

— 2. More on decomposition —

The way in which one decomposes a sum or integral such as {\sum_{n \in A} f(n)} or {\int_A f(x)\ dx} is often guided by the “geometry” of {f}, and in particular where {f} is large or small (or whether various component terms in {f} are large or small relative to each other). For instance, if {f(x)} comes close to a maximum at some point {x=x_0}, then it may make sense to decompose based on the distance {|x-x_0|} to {x_0}, or perhaps to treat the cases {x \leq x_0} and {x>x_0} separately. (Note that {x_0} does not literally have to be the maximum in order for this to be a reasonable decomposition; if it is “within reasonable distance” of the maximum, this could still be a good move. As such, it is often not worthwhile to try to compute the maximum of {f} exactly, especially if this exact formula ends up being too complicated to be useful.)

If an expression involves a distance {|X-Y|} between two quantities {X,Y}, it is sometimes useful to split into the case {|X| \leq |Y|/2} where {X} is much smaller than {Y} (so that {|X-Y| \asymp |Y|}), the case {|Y| \leq |X|/2} where {Y} is much smaller than {X} (so that {|X-Y| \asymp |X|}), or the case when neither of the two previous cases apply (so that {|X| \asymp |Y|}). The factors of {2} here are not of critical importance; the point is that in each of these three cases, one has some hope of simplifying the expression into something more tractable. For instance, suppose one wants to estimate the expression

\displaystyle  \int_{-\infty}^\infty \frac{dx}{(1+(x-a)^2) (1+(x-b)^2)} \ \ \ \ \ (13)

in terms of the two real parameters {a, b}, which we will take to be distinct for sake of this discussion. This particular integral is simple enough that it can be evaluated exactly (for instance using contour integration techniques), but in the spirit of Principle 1, let us avoid doing so and instead try to decompose this expression into simpler pieces. A graph of the integrand reveals that it peaks when {x} is near {a} or near {b}. Inspired by this, one can decompose the region of integration into three pieces:

  • (i) The region where {|x-a| \leq \frac{|a-b|}{2}}.
  • (ii) The region where {|x-b| \leq \frac{|a-b|}{2}}.
  • (iii) The region where {|x-a|, |x-b| > \frac{|a-b|}{2}}.

(This is not the only way to cut up the integral, but it will suffice. Often there is no “canonical” or “elegant” way to perform the decomposition; one should just try to find a decomposition that is convenient for the problem at hand.)

The reason why we want to perform such a decomposition is that in each of the three cases, one can simplify how the integrand depends on {x}. For instance, in region (i), we see from the triangle inequality that {|x-b|} is now comparable to {|a-b|}, so that this contribution to (13) is comparable to

\displaystyle  \asymp \int_{|x-a| \leq |a-b|/2} \frac{dx}{(1+(x-a)^2) (1+(a-b)^2)}.

Using a variant of (9), this expression is comparable to

\displaystyle  \asymp \min( 1, |a-b|/2) \frac{1}{1+(a-b)^2} \asymp \frac{\min(1, |a-b|)}{1+(a-b)^2}. \ \ \ \ \ (14)

The contribution of region (ii) can be handled similarly, and is also comparable to (14). Finally, in region (iii), we see from the triangle inequality that {|x-a|, |x-b|} are now comparable to each other, and so the contribution of this region is comparable to

\displaystyle  \asymp \int_{|x-a|, |x-b| > |a-b|/2} \frac{dx}{(1+(x-a)^2)^2}.

Now that we have centered the integral around {x=a}, we will discard the {|x-b| > |a-b|/2} constraint, upper bounding this integral by

\displaystyle  \asymp \int_{|x-a| > |a-b|/2} \frac{dx}{(1+(x-a)^2)^2}.

On the one hand this integral is bounded by

\displaystyle  \int_{-\infty}^\infty \frac{dx}{(1+(x-a)^2)^2} = \int_{-\infty}^\infty \frac{dx}{(1+x^2)^2} \asymp 1

and on the other hand we can bound

\displaystyle  \int_{|x-a| > |a-b|/2} \frac{dx}{(1+(x-a)^2)^2} \leq \int_{|x-a| > |a-b|/2} \frac{dx}{(x-a)^4}

\displaystyle \asymp |a-b|^{-3}

and so we can bound the contribution of (iii) by {O( \min( 1, |a-b|^{-3} ))}. Putting all this together, and dividing into the cases {|a-b| \leq 1} and {|a-b| > 1}, one can soon obtain a total bound of {O(\min( 1, |a-b|^{-2}))} for the entire integral. One can also adapt this argument to show that this bound is sharp up to constants, thus

\displaystyle  \int_{-\infty}^\infty \frac{dx}{(1+(x-a)^2) (1+(x-b)^2)} \asymp \min( 1, |a-b|^{-2})

\displaystyle  \asymp \frac{1}{1+|a-b|^2}.
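The two-sided bound just obtained can be verified numerically for sample parameter values (chosen arbitrarily below). The sketch uses a homemade midpoint quadrature over a window wide enough that the {O(x^{-4})} tails are negligible, and checks that the integral times {1+(a-b)^2} stays within a fixed constant window:

```python
import math

def integrand(x, a, b):
    return 1 / ((1 + (x - a)**2) * (1 + (x - b)**2))

def integral(a, b, lo=-500.0, hi=500.0, steps=200000):
    """Midpoint-rule quadrature; the tails beyond |x|=500 are ~1e-8."""
    h = (hi - lo) / steps
    return sum(integrand(lo + (i + 0.5) * h, a, b) for i in range(steps)) * h

# Check ∫ dx/((1+(x-a)^2)(1+(x-b)^2)) ≍ 1/(1+|a-b|^2) for several gaps.
for a, b in [(0.0, 0.5), (0.0, 2.0), (0.0, 10.0), (3.0, 40.0)]:
    ratio = integral(a, b) * (1 + (a - b)**2)
    assert 0.5 <= ratio <= 10.0
```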

A powerful and common type of decomposition is dyadic decomposition. If the summand or integrand involves some quantity {Q} in a key way, it is often useful to break up into dyadic regions such as {2^{j-1} \leq Q < 2^{j}}, so that {Q \sim 2^j}, and then sum over {j}. (One can tweak the dyadic range {2^{j-1} \leq Q < 2^{j}} here with minor variants such as {2^{j} < Q \leq 2^{j+1}}, or replace the base {2} by some other base, but these modifications usually have at most a minor aesthetic impact on the arguments.) For instance, one could break up a sum

\displaystyle  \sum_{n=1}^{\infty} f(n) \ \ \ \ \ (15)

into dyadic pieces

\displaystyle  \sum_{j=1}^\infty \sum_{2^{j-1} \leq n < 2^{j}} f(n)

and then seek to estimate each dyadic block {\sum_{2^{j-1} \leq n < 2^{j}} f(n)} separately (hoping to get some exponential or polynomial decay in {j}). The classical technique of Cauchy condensation is a basic example of this strategy. But one can also dyadically decompose other quantities than {n}. For instance one can perform a “vertical” dyadic decomposition (in contrast to the “horizontal” one just performed) by rewriting (15) as

\displaystyle  \sum_{k \in {\bf Z}} \sum_{n \geq 1: 2^{k-1} \leq f(n) < 2^k} f(n);

since the summand {f(n)} is {\asymp 2^k}, we may simplify this to

\displaystyle  \asymp \sum_{k \in {\bf Z}} 2^k \# \{ n \geq 1: 2^{k-1} \leq f(n) < 2^k\}.
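As a concrete sanity check of this vertical decomposition (with {f(n) = 1/n^2} and a truncation chosen purely for the demo), the level-set expression {\sum_k 2^k \#\{n: 2^{k-1} \leq f(n) < 2^k\}} should agree with the direct sum up to a factor of {2}:

```python
# Compare Σ f(n) with its vertical dyadic decomposition for f(n) = 1/n^2.
f = lambda n: 1 / n**2
N = 10000
direct = sum(f(n) for n in range(1, N + 1))

dyadic = 0.0
for k in range(1, -29, -1):  # here f(n) ranges over (1e-8, 1], so these k suffice
    lo_t, hi_t = 2.0**(k - 1), 2.0**k
    count = sum(1 for n in range(1, N + 1) if lo_t <= f(n) < hi_t)
    dyadic += hi_t * count

# On each block, f(n) lies in [2^{k-1}, 2^k), so the two sums agree up to 2.
assert direct <= dyadic <= 2 * direct
```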

This now converts the problem of estimating the sum (15) to the more combinatorial problem of estimating the size of the dyadic level sets {\{ n \geq 1: 2^{k-1} \leq f(n) < 2^k\}} for various {k}. In a similar spirit, we have

\displaystyle  \int_A f(x)\ dx \asymp \sum_{k \in {\bf Z}} 2^k | \{ x \in A: 2^{k-1} \leq f(x) < 2^k \}|

where {|E|} denotes the Lebesgue measure of a set {E}, and now we are faced with a geometric problem of estimating the measure of some explicit set. This allows one to use geometric intuition to solve the problem, instead of multivariable calculus:

Exercise 3 Let {S} be a smooth compact submanifold of {{\bf R}^d}. Establish the bound

\displaystyle  \int_{B(0,C)} \frac{dx}{\varepsilon^2 + \mathrm{dist}(x,S)^2} \ll \varepsilon^{-1}

for all {0 < \varepsilon < C}, where the implied constants are allowed to depend on {C, d, S}. (This can be accomplished either by a vertical dyadic decomposition, or a dyadic decomposition of the quantity {\mathrm{dist}(x,S)}.)

Exercise 4 Solve problem (ii) from the introduction to this post by dyadically decomposing in the {d} variable.

Remark 5 By such tools as (10), (11), or Exercise 1, one could convert the dyadic sums one obtains from dyadic decomposition into integral variants. However, if one wished, one could “cut out the middle-man” and work with continuous dyadic decompositions rather than discrete ones. Indeed, from the integral identity

\displaystyle  \int_0^\infty 1_{\lambda < Q \leq 2\lambda} \frac{d\lambda}{\lambda} = \log 2

for any {Q>0}, together with the Fubini–Tonelli theorem, we obtain the continuous dyadic decomposition

\displaystyle  \sum_{n \in A} f(n) = \frac{1}{\log 2} \int_0^\infty \sum_{n \in A: \lambda < Q(n) \leq 2\lambda} f(n)\ \frac{d\lambda}{\lambda}

for any quantity {Q(n)} that is positive whenever {f(n)} is positive. Similarly if we work with integrals {\int_A f(x)\ dx} rather than sums. This version of dyadic decomposition is occasionally a little more convenient to work with, particularly if one then wants to perform various changes of variables in the {\lambda} parameter which would be tricky to execute if this were a discrete variable.
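The integral identity underlying this remark is elementary: the indicator {1_{\lambda < Q \leq 2\lambda}} is supported on {\lambda \in [Q/2, Q)}, and {\int_{Q/2}^{Q} d\lambda/\lambda = \log 2} regardless of {Q}. A quick numerical confirmation (sample values of {Q} chosen arbitrarily):

```python
import math

# ∫_0^∞ 1_{λ < Q ≤ 2λ} dλ/λ = ∫_{Q/2}^{Q} dλ/λ = log 2 for any Q > 0.
for Q in [0.3, 1.0, 7.5]:
    lo, hi, steps = Q / 2, Q, 100000
    h = (hi - lo) / steps
    val = sum(h / (lo + (i + 0.5) * h) for i in range(steps))
    assert abs(val - math.log(2)) < 1e-6
```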

— 3. Exponential weights —

Many sums involve expressions that are “exponentially large” or “exponentially small” in some parameter. A basic rule of thumb is that any quantity that is “exponentially small” will likely give a negligible contribution when compared against quantities that are not exponentially small. For instance, if an expression involves a term of the form {e^{-Q}} for some non-negative quantity {Q}, which can be bounded on at least one portion of the domain of summation or integration, then one expects the region where {Q} is bounded to provide the dominant contribution. For instance, if one wishes to estimate the integral

\displaystyle  \int_0^\infty e^{-\varepsilon x} \frac{dx}{1+x}

for some {0 < \varepsilon < 1/2}, this heuristic suggests that the dominant contribution should come from the region {x = O(1/\varepsilon)}, in which one can bound {e^{-\varepsilon x}} simply by {1} and obtain an upper bound of

\displaystyle  \ll \int_{x = O(1/\varepsilon)} \frac{dx}{1+x} \ll \log \frac{1}{\varepsilon}.

To make such a heuristic precise, one can perform a dyadic decomposition in the exponential weight {e^{-\varepsilon x}}, or equivalently perform an additive decomposition in the exponent {\varepsilon x}, for instance writing

\displaystyle  \int_0^\infty e^{-\varepsilon x} \frac{dx}{1+x} = \sum_{j=1}^\infty \int_{j-1 \leq \varepsilon x < j} e^{-\varepsilon x} \frac{dx}{1+x}.

Exercise 6 Use this decomposition to rigorously establish the bound

\displaystyle  \int_0^\infty e^{-\varepsilon x} \frac{dx}{1+x} \ll \log \frac{1}{\varepsilon}

for any {0 < \varepsilon < 1/2}.
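Without carrying out the exercise, one can at least confirm the logarithmic growth numerically. The sketch below uses a homemade midpoint quadrature truncated at {x = 50/\varepsilon} (beyond which the exponential weight makes the remainder negligible):

```python
import math

def integral(eps, steps=300000):
    """Midpoint rule for ∫_0^{50/eps} e^{-eps x}/(1+x) dx."""
    hi = 50.0 / eps
    h = hi / steps
    return sum(math.exp(-eps * (i + 0.5) * h) / (1 + (i + 0.5) * h)
               for i in range(steps)) * h

# The ratio of the integral to log(1/eps) should stay bounded as eps -> 0.
for eps in [0.1, 0.01, 0.001]:
    ratio = integral(eps) / math.log(1 / eps)
    assert 0.5 <= ratio <= 2.0
```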

Exercise 7 Solve problem (i) from the introduction to this post.

More generally, if one is working with a sum or integral such as

\displaystyle  \sum_{n \in A} e^{\phi(n)} \psi(n)

or

\displaystyle  \int_A e^{\phi(x)} \psi(x)\ dx

with some exponential weight {e^\phi} and a lower order amplitude {\psi}, then one typically expects the dominant contribution to come from the region where {\phi} comes close to attaining its maximal value. If this maximum is attained on the boundary, then the weight typically decays like a geometric series away from the boundary, and one can often get a good estimate by comparison with such a series. For instance, suppose one wants to estimate the error function

\displaystyle  \mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\ dt

for {z \geq 1}. In view of the complete integral

\displaystyle  \int_0^\infty e^{-t^2}\ dt = \frac{\sqrt{\pi}}{2}

we can rewrite this as

\displaystyle  \mathrm{erf}(z) = 1 - \frac{2}{\sqrt{\pi}} \int_z^\infty e^{-t^2}\ dt.

The exponential weight {e^{-t^2}} attains its maximum at the left endpoint {t=z} and decays quickly away from that endpoint. One could estimate this by dyadic decomposition of {e^{-t^2}} as discussed previously, but a slicker way to proceed here is to use the convexity of {t^2} to obtain a geometric series upper bound

\displaystyle  e^{-t^2} \leq e^{-z^2 - 2 z (t-z)}

for {t \geq z}, which on integration gives

\displaystyle  \int_z^\infty e^{-t^2}\ dt \leq \int_z^\infty e^{-z^2 - 2 z (t-z)}\ dt = \frac{e^{-z^2}}{2z}

giving the asymptotic

\displaystyle  \mathrm{erf}(z) = 1 - O( \frac{e^{-z^2}}{z})

for {z \geq 1}.
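This asymptotic is easy to test numerically with the standard-library `erf`: the deficit {1 - \mathrm{erf}(z)}, rescaled by {z e^{z^2}}, should stay bounded for {z \geq 1} (in fact it tends to {1/\sqrt{\pi} \approx 0.564}):

```python
import math

# Check erf(z) = 1 - O(e^{-z^2}/z) at a few sample points.
for z in [1.0, 2.0, 3.0]:
    deficit = 1 - math.erf(z)
    scaled = deficit * z * math.exp(z * z)
    assert 0.3 <= scaled <= 0.6
```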

Exercise 8 In the converse direction, establish the upper bound

\displaystyle  \mathrm{erf}(z) \leq 1 - c \frac{e^{-z^2}}{z}

for some absolute constant {c>0} and all {z \geq 1}.

Exercise 9 If {\theta n \leq m \leq n} for some {1/2 < \theta < 1}, show that

\displaystyle  \sum_{k=m}^n \binom{n}{k} \ll \frac{1}{2\theta-1} \binom{n}{m}.

(Hint: estimate the ratio between consecutive binomial coefficients {\binom{n}{k}} and then control the sum by a geometric series).

When the maximum of the exponent {\phi} occurs in the interior of the region of summation or integration, then one can get good results by some version of Laplace’s method. For simplicity we will discuss this method in the context of one-dimensional integrals

\displaystyle  \int_a^b e^{\phi(x)} \psi(x)\ dx

where {\phi} attains a non-degenerate global maximum at some interior point {x = x_0}. The rule of thumb here is that

\displaystyle \int_a^b e^{\phi(x)} \psi(x)\ dx \approx \sqrt{\frac{2\pi}{|\phi''(x_0)|}} e^{\phi(x_0)} \psi(x_0).

The heuristic justification is as follows. The main contribution should be when {x} is close to {x_0}. Here we can perform a Taylor expansion

\displaystyle  \phi(x) \approx \phi(x_0) - \frac{1}{2} |\phi''(x_0)| (x-x_0)^2

since at a non-degenerate maximum we have {\phi'(x_0)=0} and {\phi''(x_0) < 0}. Also, if {\psi} is continuous, then {\psi(x) \approx \psi(x_0)} when {x} is close to {x_0}. Thus we should be able to estimate the above integral by the gaussian integral

\displaystyle  \int_{\bf R} e^{\phi(x_0) - \frac{1}{2} |\phi''(x_0)| (x-x_0)^2} \psi(x_0)\ dx

which can be computed to equal {\sqrt{\frac{2\pi}{|\phi''(x_0)|}} e^{\phi(x_0)} \psi(x_0)} as desired.
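The rule of thumb can be tested on a simple example (chosen for the demo, not from the text): {\phi(x) = -50(x-1/2)^2} and {\psi(x) = 1+x} on {[0,2]}, where the maximum of {\phi} is at the interior point {x_0 = 1/2}. A homemade midpoint quadrature should then agree with {\sqrt{2\pi/|\phi''(x_0)|}\, e^{\phi(x_0)} \psi(x_0)} to within a few percent:

```python
import math

def laplace_rule(phi_x0, phi2_x0, psi_x0):
    """The heuristic ∫ e^{phi} psi ≈ sqrt(2π/|phi''(x0)|) e^{phi(x0)} psi(x0)."""
    return math.sqrt(2 * math.pi / abs(phi2_x0)) * math.exp(phi_x0) * psi_x0

phi = lambda x: -50 * (x - 0.5)**2   # phi(0.5) = 0, phi''(0.5) = -100
psi = lambda x: 1 + x                # psi(0.5) = 1.5

steps, lo, hi = 200000, 0.0, 2.0
h = (hi - lo) / steps
numeric = sum(math.exp(phi(lo + (i + 0.5) * h)) * psi(lo + (i + 0.5) * h)
              for i in range(steps)) * h
approx = laplace_rule(0.0, -100.0, 1.5)

assert abs(numeric / approx - 1) < 0.05
```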

Let us illustrate how this argument can be made rigorous by considering the task of estimating the factorial {n!} of a large number. In contrast to what we did in Exercise 2, we will proceed using a version of Laplace’s method, relying on the integral representation

\displaystyle  n! = \Gamma(n+1) = \int_0^\infty x^n e^{-x}\ dx.

As {n} is large, we will consider {x^n} to be part of the exponential weight rather than the amplitude, writing this expression as

\displaystyle  \int_0^\infty e^{-\phi(x)}\ dx

where

\displaystyle  \phi(x) = x - n \log x.

The function {\phi} attains a global minimum at {x_0 = n} (so that the weight {e^{-\phi}} attains its global maximum there), with {\phi'(n) = 0} and {\phi''(n) = 1/n}. We will therefore decompose this integral into three pieces

\displaystyle  \int_0^{n-R} e^{-\phi(x)}\ dx + \int_{n-R}^{n+R} e^{-\phi(x)}\ dx + \int_{n+R}^\infty e^{-\phi(x)}\ dx \ \ \ \ \ (16)

where {0 < R < n} is a radius parameter which we will choose later, as it is not immediately obvious what the optimal value of this parameter is (although the previous heuristics do suggest that {R \approx 1 / |\phi''(x_0)|^{1/2}} might be a reasonable choice).

The main term is expected to be the middle term, so we shall use crude methods to bound the other two terms. For the first part where {0 < x \leq n-R}, {\phi} is decreasing (so that {e^{-\phi}} is increasing), so we can crudely bound {e^{-\phi(x)} \leq e^{-\phi(n-R)}} and thus

\displaystyle  \int_0^{n-R} e^{-\phi(x)}\ dx \leq (n-R) e^{-\phi(n-R)} \leq n e^{-\phi(n-R)}.

(We expect {R} to be much smaller than {n}, so there is not much point to saving the tiny {-R} term in the {n-R} factor.) For the third part where {x \geq n+R}, {\phi} is increasing (so that {e^{-\phi}} is decreasing), but bounding {e^{-\phi(x)}} by {e^{-\phi(n+R)}} would not work because of the unbounded nature of {x}; some additional decay is needed. Fortunately, we have the derivative lower bound

\displaystyle  \phi'(x) = 1 - \frac{n}{x} \geq 1 - \frac{n}{n+R} = \frac{R}{n+R}

for {x \geq n+R}, so by the mean value theorem we have

\displaystyle  \phi(x) \geq \phi(n+R) + \frac{R}{n+R} (x-n-R)

and after a short calculation this gives

\displaystyle  \int_{n+R}^\infty e^{-\phi(x)}\ dx \leq \frac{n+R}{R} e^{-\phi(n+R)} \ll \frac{n}{R} e^{-\phi(n+R)}.

Now we turn to the important middle term. If we assume {R \leq n/2}, then we will have {\phi'''(x) = O( 1/n^2 )} in the region {n-R \leq x \leq n+R}, so by Taylor’s theorem with remainder

\displaystyle  \phi(x) = \phi(n) + \phi'(n) (x-n) + \frac{1}{2} \phi''(n) (x-n)^2 + O( \frac{|x-n|^3}{n^2} )

\displaystyle  = \phi(n) + \frac{(x-n)^2}{2n} + O( \frac{R^3}{n^2} ).

If we assume that {R = O(n^{2/3})}, then the error term is bounded and we can exponentiate to obtain

\displaystyle  e^{-\phi(x)} = (1 + O(\frac{R^3}{n^2})) e^{-\phi(n) - \frac{(x-n)^2}{2n}} \ \ \ \ \ (17)

for {n-R \leq x \leq n+R} and hence

\displaystyle \int_{n-R}^{n+R} e^{-\phi(x)}\ dx = (1 + O(\frac{R^3}{n^2})) e^{-\phi(n)} \int_{n-R}^{n+R} e^{-(x-n)^2/2n}\ dx.

If we also assume that {R \gg \sqrt{n}}, we can use the error function type estimates from before to estimate

\displaystyle  \int_{n-R}^{n+R} e^{-(x-n)^2/2n}\ dx = \sqrt{2\pi n} + O( \frac{n}{R} e^{-R^2/2n} ).

Putting all this together, and using (17) to estimate {e^{-\phi(n \pm R)} \ll e^{-\phi(n) - \frac{R^2}{2n}}}, we conclude that

\displaystyle  n! = e^{-\phi(n)} ( (1 + O(\frac{R^3}{n^2})) \sqrt{2\pi n} + O( \frac{n}{R} e^{-R^2/2n})

\displaystyle  + O( n e^{-R^2/2n} ) + O( \frac{n}{R} e^{-R^2/2n} ) )

\displaystyle  = e^{-n+n \log n} (\sqrt{2\pi n} + O( \frac{R^2}{n} + n e^{-R^2/2n} ))

so if we select {R=n^{2/3}} for instance, we obtain the Stirling approximation

\displaystyle  n! = \frac{n^n}{e^n} (\sqrt{2\pi n} + O( n^{1/3}) ).

One can improve the error term by a finer decomposition than (16); we leave this as an exercise to the interested reader.
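The final Stirling approximation can be checked numerically: the quantity {n!\, e^n / n^n} should differ from {\sqrt{2\pi n}} by at most {O(n^{1/3})} (in fact the true discrepancy is much smaller, of size {\asymp n^{-1/2}}):

```python
import math

# Check n! = (n^n/e^n)(sqrt(2πn) + O(n^{1/3})) at a few sample points.
for n in [5, 10, 30, 60]:
    lhs = math.factorial(n) * math.exp(n) / n**n
    err = abs(lhs - math.sqrt(2 * math.pi * n))
    assert err <= n ** (1 / 3)
```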

Remark 10 It can be convenient to do some initial rescalings to this analysis to achieve a nice normalization; see this previous blog post for details.

Exercise 11 Solve problem (iii) from the introduction. (Hint: extract out the term {\frac{k^{2n-4k}}{(n-k)^{2n-4k}}} to write as the exponential factor {e^{\phi(k)}}, placing all the other terms (which are of polynomial size) in the amplitude function {\psi(k)}. The function {\phi} will then attain a maximum at {k=n/2}; perform a Taylor expansion and mimic the arguments above.)

John PreskillThe quantum gold rush

Even if you don’t recognize the name, you probably recognize the saguaro cactus. It’s the archetype of the cactus, a column from which protrude arms bent at right angles like elbows. As my husband pointed out, the cactus emoji is a saguaro: 🌵. In Tucson, Arizona, even the airport has a saguaro crop sufficient for staging a Western short film. I didn’t have a film to shoot, but the garden set the stage for another adventure: the ITAMP winter school on quantum thermodynamics.

Tucson airport

ITAMP is the Institute for Theoretical Atomic, Molecular, and Optical Physics (the Optical is silent). Harvard University and the Smithsonian Institution share ITAMP, where I worked as a postdoc. ITAMP hosted the first quantum-thermodynamics conference to take place on US soil, in 2017. Also, ITAMP hosts a winter school in Arizona every February. (If you lived in the Boston area, you might want to escape to the southwest then, too.) The winter school’s topic varies from year to year. 

How about a winter school on quantum thermodynamics? ITAMP’s director, Hossein Sadeghpour, asked me when I visited Cambridge, Massachusetts last spring.

Let’s do it, I said. 

Lecturers came from near and far. Kanu Sinha, of the University of Arizona, spoke about how electric charges fluctuate in the quantum vacuum. Fluctuations feature also in extensions of the second law of thermodynamics, which helps explain why time flows in only one direction. Gabriel Landi, from the University of Rochester, lectured about these fluctuation relations. ITAMP Postdoctoral Fellow Ceren Dag explained why many-particle quantum systems register time’s arrow. Ferdinand Schmidt-Kaler described the many-particle quantum systems—the trapped ions—in his lab at the University of Mainz.

Ronnie Kosloff, of Hebrew University in Jerusalem, lectured about quantum engines. Nelly Ng, an Assistant Professor at Nanyang Technological University, has featured on Quantum Frontiers at least three times. She described resource theories—information-theoretic models—for thermodynamics. Information and energy both serve as resources in thermodynamics and computation, I explained in my lectures.

The 2024 ITAMP winter school

The winter school took place at the conference center adjacent to Biosphere 2. Biosphere 2 is an enclosure that contains several miniature climate zones, including a coastal fog desert, a rainforest, and an ocean. You might have heard of Biosphere 2 due to two experiments staged there during the 1990s: in each experiment, a group of people was sealed in the enclosure. The experimentalists harvested their own food and weren’t supposed to receive any matter from outside. The first experiment lasted for two years. The group, though, ran out of oxygen, which a support crew pumped in. Research at Biosphere 2 contributes to our understanding of ecosystems and space colonization.

Fascinating as the landscape inside Biosphere 2 is, so is the landscape outside. The winter school included an afternoon hike, and my husband and I explored the territory around the enclosure.

Did you see any snakes? my best friend asked after I returned home.

No, I said. But we were chased by a vicious beast. 

On our first afternoon, my husband and I followed an overgrown path away from the biosphere to an almost deserted-looking cluster of buildings. We eventually encountered what looked like a warehouse from which noises were emanating. Outside hung a sign with which I resonated.

Scientists, I thought. Indeed, a researcher emerged from the warehouse and described his work to us. His group was preparing to seal off a building where they were simulating a Martian environment. He also warned us about the territory we were about to enter, especially the creature that roosted there. We were too curious to retreat, though, so we set off into a ghost town.

At least, that’s what the other winter-school participants called the area, later in the week—a ghost town. My husband and I had already surveyed the administrative offices, conference center, and other buildings used by biosphere personnel today. Personnel in the 1980s used a different set of buildings. I don’t know why one site gave way to the other. But the old buildings survive—as what passes for ancient ruins to many Americans. 

Weeds have grown up in the cracks in an old parking lot’s tarmac. A sign outside one door says, “Classroom”; below it is a sign that must not have been correct in decades: “Class in progress.” Through the glass doors of the old visitors’ center, we glimpsed cushioned benches and what appeared to be a diorama exhibit; outside, feathers and bird droppings covered the ground. I searched for a tumbleweed emoji, to illustrate the atmosphere, but found only a tumbler one: 🥃.

After exploring, my husband and I rested in the shade of an empty building, drank some of the water we’d brought, and turned around. We began retracing our steps past the defunct visitors’ center. Suddenly, a monstrous Presence loomed on our right. 

I can’t tell you how large it was; I only glimpsed it before turning and firmly not running away. But the Presence loomed. And it confirmed what I’d guessed upon finding the feathers and droppings earlier: the old visitors’ center now served as the Lair of the Beast.

The Mars researcher had warned us about the aggressive male turkey who ruled the ghost town. The turkey, the researcher had said, hated men—especially men wearing blue. My husband, naturally, was wearing a blue shirt. You might be able to outrun him, the researcher added pensively.

My husband zipped up his black jacket over the blue shirt. I advised him to walk confidently and not too quickly. Hikes in bear country, as well as summers at Busch Gardens Zoo Camp, gave me the impression that we mustn’t run; the turkey would probably chase us, get riled up, and excite himself to violence. So we walked, and the monstrous turkey escorted us. For surprisingly and frighteningly many minutes. 

The turkey kept scolding us in monosyllabic squawks, which sounded increasingly close to the back of my head. I didn’t turn around to look, but he sounded inches away. I occasionally responded in the soothing voice I was taught to use on horses. But my husband and I marched increasingly quickly.

We left the old visitors’ center, curved around, and climbed most of a hill before ceasing to threaten the turkey—or before he ceased to threaten us. He squawked a final warning and fell back. My husband and I found ourselves amid the guest houses of workshops past, shaky but unmolested. Not that the turkey wreaks much violence, according to the Mars researcher: at most, he beats his wings against people and scratches up their cars (especially blue ones). But we were relieved to return to civilization.

Afternoon hike at Catalina State Park, a drive away from Biosphere 2. (Yes, that’s a KITP hat.)

The ITAMP winter school reminded me of Roughing It, a Mark Twain book I finished this year. Twain chronicled the adventures he’d experienced out West during the 1860s. The Gold Rush, he wrote, attracted the top young men of all nations. The quantum-technologies gold rush has been attracting the top young people of all nations, and the winter school evidenced their eagerness. Yet the winter school also evidenced how many women have risen to the top: 10 of the 24 registrants were women, as were four of the seven lecturers.1 

The winter-school participants in the shuttle I rode from the Tucson airport to Biosphere 2

We’ll see to what extent the quantum-technologies gold rush plays out like Mark Twain’s. Ours at least involves a ghost town and ferocious southwestern critters.

1For reference, when I applied to graduate programs, I was told that approximately 20% of physics PhD students nationwide were women. The percentage of women drops as one progresses up the academic chain to postdocs and then to faculty members. And primarily PhD students and postdocs registered for the winter school.

March 16, 2024

David Hogg submitted!

OMG I actually just submitted an actual paper, with me as first author. I submitted to the AAS Journals, with a preference for The Astronomical Journal. I don't write all that many first-author papers, so I am stoked about this. If you want to read it: It should come out on arXiv within days, or if you want to type pdflatex a few times, it is available at this GitHub repo. It is about how to combine many shifted images into one combined, mean image.
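
The shift-and-coadd idea can be sketched in a toy example (my own illustration of the generic technique, not the algorithm in the paper):

```python
# Toy sketch: combine integer-shifted copies of a 1D "image" into one
# mean image by undoing each known shift (not the paper's algorithm).
def roll(seq, s):
    # Circularly shift a list to the right by s (negative s shifts left).
    s %= len(seq)
    return seq[-s:] + seq[:-s]

truth = [0.0] * 16
truth[8] = 1.0  # a single bright pixel at index 8
shifts = [-2, 0, 1, 3]
frames = [roll(truth, s) for s in shifts]  # the "observed" shifted images

# Undo each shift to land on a common grid, then average pixel by pixel.
aligned = [roll(f, -s) for f, s in zip(frames, shifts)]
mean_image = [sum(px) / len(px) for px in zip(*aligned)]
print(mean_image.index(max(mean_image)))  # peak recovered at index 8
```

With noisy frames one would average with inverse-variance weights; the plain mean here is just the simplest case.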

John Preskill Noncommuting charges are much like Batman

The Noncommuting-Charges World Tour Part 2 of 4

This is the second part in a four-part series covering the recent Perspective on noncommuting charges. I’ll be posting one part every 6 weeks leading up to my PhD thesis defence. You can find part 1 here.

Understanding a character’s origins enriches their narrative and motivates their actions. Take Batman as an example: without knowing his backstory, he appears merely as a billionaire who might achieve more by donating his wealth rather than masquerading as a bat to combat crime. However, with the context of his tragic past, Batman transforms into a symbol designed to instill fear in the hearts of criminals. Another example involves noncommuting charges. Without understanding their origins, the question “What happens when charges don’t commute?” might appear contrived or simply devised to occupy quantum information theorists and thermodynamicists. However, understanding the context of their emergence, we find that numerous established results unravel, for various reasons, in the face of noncommuting charges. In this light, noncommuting charges are much like Batman; their backstory adds to their intrigue and clarifies their motivation. Admittedly, noncommuting charges come with fewer costumes, outside the occasional steampunk top hat my advisor Nicole Yunger Halpern might sport.

Growing up, television was my constant companion. Of all the shows I’d get lost in, ‘Batman: The Animated Series’ stands the test of time. I highly recommend giving it a watch.

In the early works I’m about to discuss, a common thread emerges: the initial breakdown of some well-understood derivations and the effort to establish a new derivation that accommodates noncommuting charges. These findings will illuminate, yet not fully capture, the multitude of results predicated on the assumption that charges commute. Removing this assumption is akin to pulling a piece from a Jenga tower, triggering a cascade of other results. Critics might argue, “If you’re merely rederiving known results, this field seems uninteresting.” However, the reality is far more compelling. As researchers diligently worked to reconstruct this theoretical framework, they have continually uncovered ways in which noncommuting charges might pave the way for new physics. That said, the exploration of these novel phenomena will be the subject of my next post, where we delve into the emerging physics. So, I invite you to stay tuned. Back to the history…

E.T. Jaynes’s 1957 formalization of the maximum entropy principle has a blink-and-you’ll-miss-it reference to noncommuting charges. Consider a quantum system, similar to the box discussed in Part 1, where our understanding of the system’s state is limited to the expectation values of certain observables. Our aim is to deduce a probability distribution for the system’s potential pure states that accurately reflects our knowledge without making unjustified assumptions. According to the maximum entropy principle, this objective is met by maximizing the entropy of the distribution, which serves as a measure of uncertainty. The resulting state is known as the generalized Gibbs ensemble. Jaynes noted that this information-theoretic reasoning behind the generalized Gibbs ensemble remains valid even when our knowledge is restricted to the expectation values of noncommuting charges. However, later scholars have highlighted that physically substantiating the generalized Gibbs ensemble becomes significantly more challenging when the charges do not commute. For this and other reasons, when the system’s charges do not commute, the generalized Gibbs ensemble is specifically referred to as the non-Abelian thermal state (NATS).
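
For concreteness, here is the maximum-entropy result being invoked (my own recap, in notation standard in this literature rather than taken from the Perspective). Maximizing the von Neumann entropy S(ρ) = -Tr(ρ ln ρ) subject to fixed expectation values Tr(ρ Q_a) = ⟨Q_a⟩ of charges Q_a gives

```latex
\rho = \frac{1}{Z} \exp\!\Big(-\sum_a \mu_a Q_a\Big),
\qquad
Z = \operatorname{Tr} \exp\!\Big(-\sum_a \mu_a Q_a\Big),
```

where the Lagrange multipliers μ_a are fixed by the constraints. The formula is the same whether or not the Q_a commute; when they do not, this state is the non-Abelian thermal state just mentioned.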

For approximately 60 years, discussions about noncommuting charges remained dormant, aside from a few mentions here and there. This changed when two studies highlighted how noncommuting charges break commonplace thermodynamics derivations. The first of these, conducted by Matteo Lostaglio as part of his 2014 thesis, challenged expectations about a system’s free energy—a measure of the system’s capacity for performing work. Interestingly, one can define a free energy for each charge within a system. Imagine a scenario where a system with commuting charges comes into contact with an environment that also has commuting charges. We then evolve the system such that the total charges in both the system and the environment are conserved. This evolution alters the system’s information content and its correlation with the environment. This change in information content depends on a sum of terms, each of which depends on the average change in one of the environment’s charges and the change in the system’s free energy for that same charge. However, this neat distinction of terms according to each charge breaks down when the system and environment exchange noncommuting charges. In such cases, the terms cannot be cleanly attributed to individual charges, and the conventional derivation falters.

The second work delved into resource theories, a topic discussed at length in Quantum Frontiers blog posts. In short, resource theories are frameworks used to quantify how effectively an agent can perform a task subject to some constraints. For example, consider all allowed evolutions (those conserving energy and other charges) one can perform on a closed system. Under these evolutions, from which systems can you extract no work? The answer is systems in thermal equilibrium. The method used to determine the thermal state’s structure also fails when the system includes noncommuting charges. Building on this result, three groups (one, two, and three) presented physically motivated derivations of the form of the thermal state for systems with noncommuting charges using resource-theory-related arguments. Ultimately, the form of the NATS was recovered in each work.

Just as re-examining Batman’s origin story unveils a deeper, more compelling reason behind his crusade against crime, diving into the history and implications of noncommuting charges reveals their untapped potential for new physics. Behind every mask—or theory—there can lie an untold story. Earlier, I hinted at how reevaluating results with noncommuting charges opens the door to new physics. A specific example, initially veiled in Part 1, involves the violation of the Onsager coefficients’ derivation by noncommuting charges. By recalculating these coefficients for systems with noncommuting charges, we discover that their noncommutation can decrease entropy production. In Part 3, we’ll delve into other new physics that stems from charges’ noncommutation, exploring how noncommuting charges, akin to Batman, can really pack a punch.

David Hogg IAIFI Symposium, day two

Today was day two of a meeting on generative AI in physics, hosted by MIT. My favorite talks today were by Song Han (MIT) and Thea Aarestad (ETH), both of whom are working on making ML systems run ultra-fast on extremely limited hardware. Themes were: Work at low precision. Even 4-bit number representations! Radical. And bandwidth is way more expensive than compute: Never move data, latents, or weights to new hardware; work as locally as you can. They both showed amazing performance on terrible, tiny hardware. In addition, Han makes really cute 3d-printed devices! A conversation at the end that didn't quite happen is about how Aarestad's work might benefit from equivariant methods: Her application area is triggers in the CMS device at the LHC; her symmetry group is the Lorentz group (plus permutations, etc.). The day started with me on a panel in which my co-panelists said absolutely unhinged things about the future of physics and artificial intelligence. I learned that many people think we are only years away from having independently operating, fully functional artificial physicists that are more capable than we are.

David Hogg IAIFI Symposium, day one

Today was the first day of a two-day symposium on the impact of Generative AI in physics. It is hosted by IAIFI and A3D3, two interdisciplinary and inter-institutional entities working on things related to machine learning. I really enjoyed the content today. One example was Anna Scaife (Manchester) telling us that all the different methods they have used for uncertainty quantification in astronomy-meets-ML contexts give different and inconsistent answers. It is very hard to know your uncertainty when you are doing ML. Another example was Simon Batzner (DeepMind) explaining that equivariant methods were absolutely required for the materials-design projects at DeepMind, and that introducing the equivariance absolutely did not bork optimization (as many believe it will). Those materials-design projects have been ridiculously successful. He said the amusing thing “Machine learning is IID, science is OOD”. I couldn't agree more. In a panel at the end of the day I learned that learned ML controllers now beat hand-built controllers in some robotics applications. That's interesting and surprising.

March 15, 2024

Scott Aaronson Never go to “Planet Word” in Washington DC

In fact, don’t try to take kids to Washington DC if you can possibly avoid it.

This is my public service announcement. This is the value I feel I can add to the world today.

Dana and I decided to take the kids to DC for spring break. The trip, alas, has been hell—a constant struggle against logistical failures. The first days were mostly spent sitting in traffic or searching for phantom parking spaces that didn’t exist. (So then we switched to the Metro, and promptly got lost, and had our metro cards rejected by the machines.) Or, at crowded cafes, I spent the time searching for a table so my starving kids could eat—and then when I finally found a table, a woman, smug and sure-faced, evicted us from the table because she was “going to” sit there, and my kids had to see that their dad could not provide for their basic needs, and that woman will never face any consequence for what she did.

Anyway, this afternoon, utterly frazzled and stressed and defeated, we entered “Planet Word,” a museum about language. Sounds pretty good, right? Except my soon-to-be 7-year-old son got bored by numerous exhibits that weren’t for him. So I told him he could lead the way and find any exhibit he liked.

Finally my son found an exhibit that fascinated him, one where he could weigh plastic fruits on a balancing scale. He was engrossed by it, he was learning, he was asking questions, I reflected that maybe the trip wasn’t a total loss … and that’s when a museum employee pointed at us, and screamed at us to leave the room, because “this exhibit was sold out.”

The room was actually almost empty (!). No one had stopped us from entering the room. No one else was waiting to use the balancing scale. There was no sign to warn us we were doing anything wrong. I would’ve paid them hundreds of dollars in that moment if only we could stay. My son didn’t understand why he was suddenly treated as a delinquent. He then wanted to leave the whole museum, and so did I. The day was ruined for us.

Mustering my courage to do something uncharacteristic for me, I complained at the front desk. They sneered and snickered at me, basically told me to go to hell. Looking deeply into their dumb, blank expressions, I realized that I had as much chance of any comprehension or sympathy as I’d have from a warthog. It’s true that, on the scale of all the injustices in the history of the world, this one surely didn’t crack the top quadrillion. But for me, in that moment, it came to stand for all the others. Which has always been my main weakness as a person, that injustice affects me in that way.

Speaking of which, there was one part of the DC trip that went exactly like it was supposed to. That was our visit to the United States Holocaust Memorial Museum. Why? Because I feel like that museum, unlike all the rest, tells me the truth about the nature of the world that I was born into—and seeing the truth is perversely comforting. I was born into a world that right now, every day, is filled with protesters screaming for my death, for my family’s death—and this is accepted as normal, and those protesters sleep soundly at night, congratulating themselves for their progressivism and enlightenment. And thinking about those protesters, and their predecessors 80 years ago who perpetrated the Holocaust or who stood by and let it happen, is the only thing that really puts blankfaced museum employees into perspective for me. Like, of course a world with the former is also going to have the latter—and I should count myself immeasurably lucky if the latter is all I have to deal with, if the empty-skulled and the soul-dead can only ruin my vacation and lack the power to murder my family.

And to anyone who reached the end of this post and who feels like it was an unwelcome imposition on their time: I’m sorry. But the truth is, posts like this are why I started this blog and why I continue it. If I’ve ever imparted any interesting information or ideas, that’s a byproduct that I’m thrilled about. But I’m cursed to be someone who wakes up every morning, walks around every day, and goes to sleep every night crushed by the weight of the world’s injustice, and outside of technical subjects, the only thing that’s ever motivated me to write is that words are the only justice available to me.

March 14, 2024

John Baez The Probability of the Law of Excluded Middle

The Law of Excluded Middle says that for any statement P, “P or not P” is true.

Is this law true? In classical logic it is. But in intuitionistic logic it’s not.

So, in intuitionistic logic we can ask what’s the probability that a randomly chosen statement obeys the Law of Excluded Middle. And the answer is “at most 2/3—or else your logic is classical”.

This is a very nice new result by Benjamin Bumpus and Zoltan Kocsis:

• Benjamin Bumpus, Degree of classicality, Merlin’s Notebook, 27 February 2024.

Of course they had to make this more precise before proving it. Just as classical logic is described by Boolean algebras, intuitionistic logic is described by something a bit more general: Heyting algebras. They proved that in a finite Heyting algebra, if more than 2/3 of the statements obey the Law of Excluded Middle, then it must be a Boolean algebra!

Interestingly, nothing like this is true for “not not P implies P”. They showed this can hold for an arbitrarily high fraction of statements in a Heyting algebra that is still not Boolean.
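
The 2/3 bound can be checked by hand on the smallest non-Boolean Heyting algebra, the three-element chain, where exactly 2/3 of the elements satisfy the Law of Excluded Middle (a toy computation of mine, not taken from the paper):

```python
from fractions import Fraction

# The three-element chain 0 < a < 1 is the smallest Heyting algebra that is
# not Boolean.  Encode bottom, a, top as the integers 0, 1, 2.
elements = [0, 1, 2]
TOP, BOTTOM = 2, 0

def join(x, y):
    # Least upper bound; in a chain this is just the maximum.
    return max(x, y)

def implies(x, y):
    # Heyting implication in a chain: x -> y is top when x <= y, else y.
    return TOP if x <= y else y

def neg(x):
    # Intuitionistic negation is the pseudo-complement: not-x = (x -> bottom).
    return implies(x, BOTTOM)

# Fraction of elements satisfying the Law of Excluded Middle, x or not-x = top.
lem = [x for x in elements if join(x, neg(x)) == TOP]
fraction = Fraction(len(lem), len(elements))
print(fraction)  # 2/3 -- the bound is attained exactly
```

The middle element a fails: not-a is bottom, so a or not-a is just a, not top.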

Here’s a piece of the free Heyting algebra on one generator, which some call the Rieger–Nishimura lattice:

Taking the principle of excluded middle from the mathematician would be the same, say, as proscribing the telescope to the astronomer or to the boxer the use of his fists. — David Hilbert

I disagree with this statement, but boy, Hilbert sure could write!

March 13, 2024

Tommaso Dorigo On The Utility Function Of Future Experiments

At a recent meeting of the board of editors of a journal I am an editor of, it was decided to produce a special issue (to commemorate an important anniversary). As I liked the idea I got carried away a bit, and proposed to write an article for it. 


March 12, 2024

David Hogg black holes as the dark matter

Today Cameron Norton (NYU) gave a great brown-bag talk on the possibility that the dark matter might be asteroid-mass-scale black holes. This is allowed by all constraints at present: If the masses were much smaller, the black holes would evaporate or emit observably. If the masses were much larger, they would create observable microlensing or dynamical signatures.

She and Kleban (NYU) are working on methods for creating such black holes primordially, by modifying the potential at inflation, creating opportunities for bubble nucleations in inflation that would subsequently collapse into small black holes after the Universe exits inflation. It's speculative obviously, but not ruled out at present!

An argument broke out during and after the talk about whether you would be injured if you were intersected by a 10^20 g black hole! My position is that you would be totally fine! Everyone else in the room disagreed with me, for many different reasons. Time to get calculating.
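
For scale, a quick back-of-envelope number of my own (not from the talk): the horizon of such a black hole is roughly atomic-sized.

```python
# Back-of-envelope: Schwarzschild radius r_s = 2 G M / c^2 of a
# 10^20 g (= 10^17 kg) black hole.  (My own scale-setting, not from the talk.)
G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8    # speed of light, m/s
M = 1e17       # mass in kg

r_s = 2 * G * M / c**2
print(r_s)  # ~1.5e-10 m, i.e. roughly the size of an atom
```

Whatever such an encounter would do to a person, the horizon itself is only about an angstrom across.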

Another great idea: Could we find stars that have captured low-mass black holes by looking for the radial-velocity signal? I got really interested in this one at the end.

David Hogg The Cannon and El Cañon

At the end of the day I got a bit of quality time in with Danny Horta (Flatiron) and Adrian Price-Whelan (Flatiron), who have just (actually just before I met with them) created a new implementation of The Cannon (the data-driven model of stellar photospheres originally created by Melissa Ness and me back in 2014/2015). Why!? Not because the world needs another implementation. We are building a new implementation because we plan to extend out to El Cañon, which will extend the probabilistic model into the label domain: It will properly generate or treat noisy and missing labels. That will permit us to learn latent labels, and de-noise noisy labels.

March 07, 2024

Doug Natelson APS March Meeting 2024, Day 4 and wrap-up

Because of the timing of my flight back to Houston, I really only went to one session today, in which my student spoke as did some collaborators.  It was a pretty interesting collection of contributed talks.  

  • The work that's been done on spin transport in multiferroic insulators is particularly interesting to me.  A relevant preprint is this one, in which electric fields are used to reorient \(\mathbf{P}\) in BiFeO\(_{3}\), which correspondingly switches the magnetization in this system (which is described by a complicated spin cycloid order) and therefore modulates the transmission of spin currents (as seen in ferromagnetic resonance).  
  • Similarly, adding a bit of La to BiFeO\(_{3}\) to favor single ferroelectric domain formation was a neat complement to this.
  • There were also multiple talks showing the utility of the spin Hall magnetoresistance as a way to characterize spin transport between magnetic insulators and strong spin-orbit coupled metals.
Some wrap-up thoughts:
  • This meeting venue and environment was superior in essentially every way relative to last year's mess in Las Vegas.  Nice facilities, broadly good rooms, room sizes, projectors, and climate control.  Lots of hotels.  Lots of restaurants that are not absurdly expensive.  I'd be very happy to have the meeting in Minneapolis again at some point.  There was even a puppy-visiting booth at the exhibit hall on Tuesday and Thursday.
  • Speaking of the exhibit hall, I think this is the first time I've been at a meeting where a vendor was actually running a dilution refrigerator on the premises.  
  • Only one room that I was in had what I would describe as a bad projector (poor color balance, loud fan, not really able to be focused crisply).  I also did not see any session chair this year blow it by allowing speakers to blow past their allotted times.
  • We really lucked out on the weather.  
  • Does anyone know what happens if someone ignores the "Warning: Do Not Drive Over Plate" label on the 30 cm by 40 cm yellow floor plate in the main lobby?  Like, does it trigger a self-destruct mechanism, or the apocalypse or something?
  • Next year's combined March/April meeting in Anaheim should be interesting - hopefully the venue is up to the task, and likewise I hope there are good, close housing and food options.

February 19, 2024

Mark Goodsell Rencontres de Physique des Particules 2024

Just over a week ago the annual meeting of theoretical particle physicists (RPP 2024) was held at Jussieu, the campus of Sorbonne University where I work. I wrote about the 2020 edition (held just outside Paris) here; in keeping with tradition, this year's version also contained similar political sessions with the heads of the CNRS' relevant physics institutes and members of CNRS committees, although they were perhaps less spicy (despite rumours of big changes in the air). 

One of the roles of these meetings is as a shop window for young researchers looking to be hired in France, and a great way to demonstrate that they are interested and have a connection to the system. Of course, this isn't and shouldn't be obligatory by any means; I wasn't really aware of this prior to entering the CNRS though I had many connections to the country. But that sort of thing seems especially important after the problems described by 4gravitons recently, and his post about getting a permanent job in France -- being able to settle in a country is non-trivial, it's a big worry for both future employers and often not enough for candidates fighting tooth and nail for the few jobs there are. There was another recent case of someone getting a (CNRS) job -- to come to my lab, even -- who much more quickly decided to leave the entire field for personal reasons. Both these stories saddened me. I can understand -- there is the well-known Paris syndrome for one thing -- and the current political anxiety about immigration and the government's response to the rise of the far right (across the world), coupled with Brexit, is clearly leading to things getting harder for many. These stories are especially worrying because we expect to be recruiting for university positions in my lab this year.

I was obviously very lucky and my experience was vastly different; I love both the job and the place, and I'm proud to be a naturalised citizen. Permanent jobs in the CNRS are amazing, especially in terms of the time and freedom you have, and there are all sorts of connections between the groups throughout the country such as via the IRN Terascale or GdR Intensity Frontier; or IRN Quantum Fields and Strings and French Strings meetings for more formal topics. I'd recommend anyone thinking about working here to check out these meetings and the communities built around them, as well as taking the opportunity to find out about life here. For those moving with family, France also offers a lot of support (healthcare, childcare, very generous holidays, etc) once you have got into the system.

The other thing to add that was emphasised in the political sessions at the RPP (reinforcing the message that we're hearing a lot) is that the CNRS is very keen to encourage people from under-represented groups to apply and be hired. One of the ways they see to help this is to put pressure on the committees to hire researchers (even) earlier after their PhD, in order to reduce the length of the leaky pipeline.

Back to physics

Coming back to the RPP, this year was particularly well attended and had an excellent program of reviews of hot topics, invited and contributed talks, put together very carefully by my colleagues. It was particularly poignant for me because two former students in my lab who I worked with a lot, one of whom recently got a permanent job, were talking; and in addition both a former student of mine and his current PhD student were giving talks: this made me feel old. (All these talks were fascinating, of course!)

One review that stood out as relevant for this blog was Bogdan Malaescu's review of progress in understanding the problem with muon g-2. As I discussed here, there is currently a lot of confusion about what the Standard Model prediction should be for that quantity. This is obviously very concerning for the experiments measuring muon g-2, who in a paper last year reduced their uncertainty by a factor of 2 to $$a_\mu (\mathrm{exp}) = 116\,592\,059(22)\times 10^{-11}. $$

The Lattice calculation (which has been confirmed now by several groups) disagrees with the prediction using the data-driven R-ratio method, however, and there is a race on to understand why. While new data from the CMD-3 experiment seems to agree with the lattice result, combining all global data on measurements of \(e^+ e^- \rightarrow \pi^+ \pi^- \) still gives a discrepancy of more than \(5\sigma\). There is clearly a significant disagreement within the data samples used (indeed, CMD-3 significantly disagrees with their own previous measurement, CMD-2). The confusion is summarised by this plot:

As can be seen, the finger of blame is often pointed at the KLOE data; excluding it but including the others in the plot gives agreement with the lattice result and a significance of non-zero \(\Delta a_\mu\) compared to experiment of \(2.8\sigma\) (or for just the dispersive method without the lattice data \( \Delta a_\mu \equiv a_\mu^{\rm SM} - a_\mu^{\rm exp} = -123 \pm 33 \pm 29 \pm 22 \times 10^{-11} \), a discrepancy of \(2.5\sigma\)). In Bogdan's talk (see also his recent paper) he discusses these tensions and also the tensions between the data and the evaluation of \(a_\mu^{\rm win}\), which is the contribution coming from a narrow "window" (when the total contribution to the Hadronic Vacuum Polarisation is split into short-, medium- and long-distance pieces, the medium-range part should be the one most reliable for lattice calculations -- at short distances the lattice spacing may not be fine enough, and at long ones the lattice may not be large enough). There he shows that, if we exclude the KLOE data and just include the BABAR, CMD-3 and Tau data, while the overall result agrees with the BMW lattice result, the window one disagrees by \(2.9 \sigma\) [thanks Bogdan for the correction to the original post]. It's clear that there is still a lot to be understood in the discrepancies of the data, and perhaps, with the added experimental precision on muon g-2, there is even still a hint of new physics ...
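
As a quick sanity check on the dispersive-method numbers quoted above, adding the three quoted uncertainties in quadrature (assuming they are independent) reproduces the 2.5σ figure:

```python
import math

# Quoted dispersive-method discrepancy, in units of 1e-11:
# Delta a_mu = -123 +/- 33 +/- 29 +/- 22.
delta = -123.0
errors = [33.0, 29.0, 22.0]

# Combine the three uncertainties in quadrature, assuming independence.
sigma = math.sqrt(sum(e * e for e in errors))
significance = abs(delta) / sigma
print(f"{sigma:.1f} -> {significance:.1f} sigma")  # 49.1 -> 2.5 sigma
```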

February 13, 2024

Jordan Ellenberg Alphabetical Diaries

Enough of this. Enough. Equivocal or vague principles, as a rule, will make your life an uninspired, undirected, and meaningless act.

This is taken from Alphabetical Diaries, a remarkable book I am reading by Sheila Heti, composed of many thousands of sentences drawn from her decades of diaries and presented in alphabetical order. It starts like this:

A book about how difficult it is to change, why we don’t want to, and what is going on in our brain. A book can be about more than one thing, like a kaleidoscope, it can have many things that coalesce into one thing, different strands of a story, the attempt to do several, many, more than one thing at a time, since a book is kept together by the binding. A book like a shopping mart, all the selections. A book that does only one thing, one thing at a time. A book that even the hardest of men would read. A book that is a game. A budget will help you know where to go.

How does a simple, one might even say cheap, technique, one might even say gimmick, work so well? I thrill to the aphorisms even when I don’t believe them, as with the aphorism above: principles must be equivocal or at least vague to work as principles; without the necessary vagueness they are axioms, which are not good for making one’s life a meaningful act, only good for arguing on the Internet. I was reading Alphabetical Diaries while I walked home along the southwest bike path. I stopped for a minute and went up a muddy slope into the cemetery where there was a gap in the fence, and it turned out this gap opened on the area of infant graves, graves about the size of a book, graves overlaying people who were born and then did what they did for a week and then died — enough of this.

January 24, 2024

Robert Helling How do magnets work?

I came across this excerpt from a Christian home-schooling book:

which is of course funny in so many ways, not least because the whole process of "seeing" is electromagnetic at its very core, and of course most people will have felt electricity at some point in their lives. Even historically, this is pretty much how it was discovered by Galvani (using frogs' legs) at a time when electricity was about cat skins and amber.

It also brings to mind this quite famous YouTube video that shows Feynman being interviewed by the BBC, first getting somewhat angry about the question of how magnets work, and then going into a quite deep explanation of what it means to explain something.

But how do magnets work? When I look at what my kids are taught in school, it basically boils down to "a magnet is made up of tiny magnets that all align" which if you think about it is actually a non-explanation. Can we do better (using more than layman's physics)? What is it exactly that makes magnets behave like magnets?

I would define magnetism as the force that moving charges feel in an electromagnetic field (the part proportional to the velocity) or, said the other way round: the magnetic field is the field that is caused by moving charges. Using this definition, my interpretation of the question about magnets is then why permanent magnets feel this force. For the permanent magnets, I want to use the "they are made of tiny magnets" line of thought, but remove the circularity of the argument by replacing it with "they are made of tiny spins".

This transforms the question to "Why do the elementary particles that make up matter feel the same force as moving charges even if they are not moving?".

And this question has an answer: Because they are Dirac particles! At small energies, the Dirac equation reduces to the Pauli equation which involves the term (thanks to minimal coupling)
$$(\vec\sigma\cdot(\vec p+q\vec A))^2$$
and when you expand the square, it contains the cross term (in Coulomb gauge)
$$(\vec\sigma\cdot \vec p)(\vec\sigma\cdot q\vec A)= q\vec A\cdot\vec p + i\,(\vec p\times q\vec A)\cdot\vec\sigma$$
Here, the first term is the one responsible for the interaction of the magnetic field and moving charges, while the second one couples $$\nabla\times\vec A$$ to the operator $$\vec\sigma$$, i.e. the spin. And since you need to have both terms, this links the force on moving charges to this property we call spin. If you like, the fact that the g-factor is not vanishing is the core of the explanation of how magnets work.
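The algebra behind that expansion is the Pauli-matrix identity $$(\vec\sigma\cdot\vec a)(\vec\sigma\cdot\vec b)=(\vec a\cdot\vec b)\,\mathbb{1}+i\,(\vec a\times\vec b)\cdot\vec\sigma$$, which is easy to check numerically for ordinary commuting vectors (a sketch only: in the Pauli equation $$\vec p$$ and $$\vec A$$ are operators, so ordering matters there):

```python
import numpy as np

# The three Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sigma = [sx, sy, sz]

def sigma_dot(v):
    """sigma . v for a 3-vector v of numbers (returns a 2x2 matrix)."""
    return sum(v[i] * sigma[i] for i in range(3))

rng = np.random.default_rng(0)
a, b = rng.standard_normal(3), rng.standard_normal(3)

lhs = sigma_dot(a) @ sigma_dot(b)
rhs = np.dot(a, b) * np.eye(2) + 1j * sigma_dot(np.cross(a, b))

assert np.allclose(lhs, rhs)
```

For operator-valued $$\vec p$$ and $$\vec A$$ the same identity holds, but the cross product no longer vanishes when the two arguments are equal, which is exactly where the spin coupling comes from.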

And if you want, you can add spin-statistics, which then implies the full "stability of matter" story that in the end is responsible for the fact that you can form macroscopic objects out of Dirac particles that can be magnets.

January 20, 2024

Jacques Distler Responsibility

Many years ago, when I was an assistant professor at Princeton, there was a cocktail party at Curt Callan’s house to mark the beginning of the semester. There, I found myself in the kitchen, chatting with Sacha Polyakov. I asked him what he was going to be teaching that semester, and he replied that he was very nervous because — for the first time in his life — he would be teaching an undergraduate course. After my initial surprise that he had gotten this far in life without ever having taught an undergraduate course, I asked which course it was. He said it was the advanced undergraduate Mechanics course (chaos, etc.) and we agreed that would be a fun subject to teach. We chatted some more, and then he said that, on reflection, he probably shouldn’t be quite so worried. After all, it wasn’t as if he was going to teach Quantum Field Theory, “That’s a subject I’d feel responsible for.”

This remark stuck with me, but it never seemed quite so poignant until this semester, when I find myself teaching the undergraduate particle physics course.

The textbooks (and I mean all of them) start off by “explaining” that relativistic quantum mechanics (e.g. replacing the Schrödinger equation with Klein-Gordon) makes no sense (negative probabilities and all that …). And they then proceed to use it anyway (supplemented by some Feynman rules pulled out of thin air).

This drives me up the #@%^ing wall. It is precisely wrong.

There is a perfectly consistent quantum mechanical theory of free particles. The problem arises when you want to introduce interactions. In Special Relativity, there is no interaction-at-a-distance; all forces are necessarily mediated by fields. Those fields fluctuate and, when you want to study the quantum theory, you end up having to quantize them.

But the free particle is just fine. Of course it has to be: free field theory is just the theory of an (indefinite number of) free particles. So it better be true that the quantum theory of a single relativistic free particle makes sense.

So what is that theory?

  1. It has a Hilbert space, $\mathcal{H}$, of states. To make the action of Lorentz transformations as simple as possible, it behoves us to use a Lorentz-invariant inner product on that Hilbert space. This is most easily done in the momentum representation $$\langle\chi|\phi\rangle = \int \frac{d^3\vec{k}}{(2\pi)^3\, 2\sqrt{\vec{k}^2+m^2}}\, \chi(\vec{k})^*\, \phi(\vec{k})$$
  2. As usual, the time-evolution is given by a Schrödinger equation
$$i\partial_t |\psi\rangle = H_0 |\psi\rangle \tag{1}$$

where $H_0 = \sqrt{\vec{p}^2+m^2}$. Now, you might object that it is hard to make sense of a pseudo-differential operator like $H_0$. Perhaps. But it’s not any harder than making sense of $U(t)= e^{-i \vec{p}^2 t/2m}$, which we routinely pretend to do in elementary quantum mechanics. In both cases, we use the fact that, in the momentum representation, the operator $\vec{p}$ is represented as multiplication by $\vec{k}$.
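A quick numerical sketch of that last point: in the momentum representation $H_0$ is just multiplication by $\sqrt{\vec{k}^2+m^2}$, which for $|\vec{k}|\ll m$ reduces to the familiar nonrelativistic $m + \vec{k}^2/2m$ (units with $\hbar = c = 1$):

```python
import numpy as np

m = 1.0
k = np.linspace(0, 0.3, 100)  # momenta small compared to the mass

# In the momentum representation, H_0 acts as multiplication by sqrt(k^2 + m^2)
H0 = np.sqrt(k**2 + m**2)

# Non-relativistic approximation: rest energy plus the usual kinetic term
H_nr = m + k**2 / (2 * m)

# The two agree up to the O(k^4 / m^3) term dropped from the expansion
assert np.max(np.abs(H0 - H_nr)) < np.max(k)**4 / m**3
```

The same multiplication-operator picture is what lets one define $e^{-i H_0 t}$ with no more difficulty than its nonrelativistic counterpart.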

I could go on, but let me leave the rest of the development of the theory as a series of questions.

  1. The self-adjoint operator, $\vec{x}$, satisfies $$[x^i,p_j] = i \delta^{i}_j$$ Thus it can be written in the form $$x^i = i\left(\frac{\partial}{\partial k_i} + f_i(\vec{k})\right)$$ for some real function $f_i$. What is $f_i(\vec{k})$?
  2. Define $J^0(\vec{r})$ to be the probability density. That is, when the particle is in state $|\phi\rangle$, the probability for finding it in some Borel subset $S\subset\mathbb{R}^3$ is given by $$\text{Prob}(S) = \int_S d^3\vec{r}\, J^0(\vec{r})$$ Obviously, $J^0(\vec{r})$ must take the form $$J^0(\vec{r}) = \int\frac{d^3\vec{k}\, d^3\vec{k}'}{(2\pi)^6\, 4\sqrt{\vec{k}^2+m^2}\sqrt{\vec{k}'^2+m^2}}\, g(\vec{k},\vec{k}')\, e^{i(\vec{k}-\vec{k}')\cdot\vec{r}}\,\phi(\vec{k})\,\phi(\vec{k}')^*$$ Find $g(\vec{k},\vec{k}')$. (Hint: you need to diagonalize the operator $\vec{x}$ that you found in problem 1.)
  3. The conservation of probability says $$0=\partial_t J^0 + \partial_i J^i$$ Use the Schrödinger equation (1) to find $J^i(\vec{r})$.
  4. Under Lorentz transformations, $H_0$ and $\vec{p}$ transform as the components of a 4-vector. For a boost in the $z$-direction, of rapidity $\lambda$, we should have $$\begin{split} U_\lambda \sqrt{\vec{p}^2+m^2}\, U_\lambda^{-1} &= \cosh(\lambda) \sqrt{\vec{p}^2+m^2} + \sinh(\lambda)\, p_3\\ U_\lambda p_1 U_\lambda^{-1} &= p_1\\ U_\lambda p_2 U_\lambda^{-1} &= p_2\\ U_\lambda p_3 U_\lambda^{-1} &= \sinh(\lambda) \sqrt{\vec{p}^2+m^2} + \cosh(\lambda)\, p_3 \end{split}$$ and we should be able to write $U_\lambda = e^{i\lambda B}$ for some self-adjoint operator, $B$. What is $B$? (N.B.: by contrast, the $x^i$ introduced above do not transform in a simple way under Lorentz transformations.)
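A cheap consistency check on the boost rules in problem 4: applied to ordinary numbers rather than operators, they preserve the mass shell $E^2 - \vec{p}^2 = m^2$ (a sketch only; the actual content is of course the operator statement):

```python
import numpy as np

m, lam = 1.0, 0.7           # mass and rapidity (arbitrary test values)
rng = np.random.default_rng(1)
p = rng.standard_normal(3)  # a sample momentum
E = np.sqrt(p @ p + m**2)

# Boost along z with rapidity lam, following the transformation rules above
E_boosted = np.cosh(lam) * E + np.sinh(lam) * p[2]
p_boosted = np.array([p[0], p[1], np.sinh(lam) * E + np.cosh(lam) * p[2]])

# cosh^2 - sinh^2 = 1 guarantees the mass shell is preserved
assert np.isclose(E_boosted**2 - p_boosted @ p_boosted, m**2)
```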

The Hilbert space of a free scalar field is now $\bigoplus_{n=0}^\infty \text{Sym}^n\mathcal{H}$. That’s perhaps not the easiest way to get there. But it is a way …


Yike! Well, that went south pretty fast. For the first time (ever, I think) I’m closing comments on this one, and calling it a day. To summarize, for those who still care,

  1. There is a decomposition of the Hilbert space of a free scalar field as $$\mathcal{H}_\phi = \bigoplus_{n=0}^\infty \mathcal{H}_n$$ where $\mathcal{H}_n = \text{Sym}^n \mathcal{H}$ and $\mathcal{H}$ is the 1-particle Hilbert space described above (also known as the spin-$0$, mass-$m$, irreducible unitary representation of the Poincaré group).
  2. The Hamiltonian of the free scalar field is the direct sum of the Hamiltonians induced on $\mathcal{H}_n$ by the Hamiltonian, $H=\sqrt{\vec{p}^2+m^2}$, on $\mathcal{H}$. In particular, it (along with the other Poincaré generators) is block-diagonal with respect to this decomposition.
  3. There are other interesting observables which are also block-diagonal with respect to this decomposition (i.e., they don’t change the particle number), and hence we can discuss their restriction to $\mathcal{H}_n$.
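For intuition about the sizes of these sectors, consider a toy model in which the one-particle space is finite-dimensional, of dimension $d$ (unlike the actual $\mathcal{H}$). Then $\dim \text{Sym}^n$ counts the $n$-element multisets of basis states, $\binom{d+n-1}{n}$:

```python
from math import comb

def sym_dim(d, n):
    """Dimension of Sym^n of a d-dimensional space:
    the number of n-element multisets drawn from d basis states."""
    return comb(d + n - 1, n)

# Toy check: with d = 2 "modes", the n-particle sector has n + 1 states
assert [sym_dim(2, n) for n in range(5)] == [1, 2, 3, 4, 5]
```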

Gotta keep reminding myself why I decided to forswear blogging…

December 20, 2023

Richard EastherA Bigger Sky

Amongst everything else that happened in 2023, a key anniversary of a huge leap in our understanding of the Universe passed largely unnoticed – the centenary of the realisation that not only was our Sun one of many stars in the Milky Way galaxy but that our galaxy was one of many galaxies in the Universe.

I had been watching the approaching anniversary for over a decade, thanks to teaching the cosmology section of the introductory astronomy course at the University of Auckland. My lectures come at the end of the semester and each October finds me showing this image – with its “October 1923” inscription – to a roomful of students.

The image was captured by the astronomer Edwin Hubble, using the world’s then-largest telescope, on top of Mt Wilson, outside Los Angeles. At first glance, it may not even look like a picture of the night sky: raw photographic images are reversed, so stars show up as dark spots against a light background. However, this odd-looking picture changed our sense of where we live in the Universe.

My usual approach when I share this image with my students is to ask for a show of hands by people with a living relative born before 1923. It’s a decent-sized class and this year a few of them had a centenarian in the family. However, I would get far more hands a decade ago when I asked about mere 90-year-olds. And sometime soon no hands will rise at this prompt and I will have to come up with a new shtick. But it is remarkable to me that there are people alive today who were born before we understood the overall arrangement of the Universe.

For tens of thousands of years, the Milky Way – the band of light that stretches across the night sky – would have been one of the most striking sights on a dark night, once you stepped away from the fire.

Milky Way — via Unsplash

Ironically, the same technological prowess that has allowed us to explore the farthest reaches of the Universe also gives us cities and electric lights. I always ask whether my students have seen the Milky Way for themselves with another show of hands and each year quite a few of them disclose that they have not. I encourage them (and everyone) to find chances to sit out under a cloudless, moonless sky and take in the full majesty of the heavens as it slowly reveals itself to you as your eyes adapt to the dark.

In the meantime, though, we make do with a projector and a darkened lecture theatre.

It was over 400 years ago that Galileo pointed the first, small telescope at the sky. In that moment the apparent clouds of the Milky Way revealed themselves to be composed of many individual stars. By the 1920s, we understood that our Sun is a star and that the Milky Way is a collection of billions of stars, with our Sun inside it. But the single biggest question in astronomy in 1923 — which, with hindsight, became known as the “Great Debate” — was whether the Milky Way was an isolated island of stars in an infinite and otherwise empty ocean of space, or if it was one of many such islands, sprinkled across the sky.

In other words, for Hubble and his contemporaries the question was whether our galaxy was the galaxy, or one of many.

More specifically, the argument was whether nebulae, which are visible as extended patches of light in the night sky, were themselves galaxies or contained within the Milky Way. These objects, almost all of which are only detectable in telescopes, had been catalogued by astronomers as they mapped the sky with increasingly capable instruments. There are many kinds of nebulae, but the white nebulae had the colour of starlight and looked like little clouds through the eyepiece. Since the 1750s these had been proposed as possible galaxies. But until 1923 nobody knew with certainty whether they were small objects on the outskirts of our galaxy – or much larger, far more distant objects on the same scale as the Milky Way itself.

To human observers, the largest and most impressive of the nebulae is Andromeda. This was the object at which Hubble had pointed his telescope in October 1923. Hubble was renowned for his ability to spot interesting details in complex images [1] and after the photographic plate was developed his eye alighted on a little spot that had not been present in an earlier observation [2].

Hubble’s original guess was that this was a nova, a kind of star that sporadically flares in brightness by a factor of 1,000 or more, so he marked it and a couple of other candidates with an “N”. However, after looking back at images that he had already taken and monitoring the star through the following months Hubble came to realise that he had found a Cepheid variable – a star whose brightness changes rhythmically over weeks or months.

Stars come in a huge range of sizes and big stars are millions of times brighter than little ones, so simply looking at a star in the sky tells us little about its distance from us. But Cepheids have a useful property [3]: brighter Cepheids take longer to pass through a single cycle than their smaller siblings.

Imagine a group of people holding torches (flashlights, if you are North American), each of which has a bulb with its own distinctive brightness. If this group fans out across a field at night and turns on their torches, we cannot tell how far away each person is simply by looking at the resulting pattern of lights. Is that torch faint because it is further from us than most, or because its bulb is dimmer than most? But if each person were to flash the wattage of their bulb in Morse code, we could estimate distances by comparing their apparent brightness (since distant objects appear fainter) to their actual intensity (which is encoded in the flashing light).

In the case of Cepheids, they are not flashing in Morse code; instead, nature provides us with the requisite information via the time it takes for their brightness to vary from maximum to minimum and back to maximum again.
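The resulting arithmetic is simple enough to sketch. The period-luminosity coefficients below are representative approximate modern V-band values, not the calibration Hubble had available, so the numbers are illustrative only:

```python
import math

def cepheid_distance_pc(period_days, apparent_mag):
    """Estimate the distance to a Cepheid (in parsecs) from its
    pulsation period and apparent magnitude.

    Uses an approximate V-band Leavitt law, M = -2.43*(log10 P - 1) - 4.05;
    real calibrations vary with waveband and metallicity."""
    M = -2.43 * (math.log10(period_days) - 1.0) - 4.05  # absolute magnitude
    # Distance modulus: m - M = 5 * log10(d / 10 pc)
    return 10.0 ** ((apparent_mag - M + 5.0) / 5.0)

# A 30-day Cepheid seen at apparent magnitude 18 comes out at several
# hundred kiloparsecs: far outside the Milky Way, as Hubble concluded
d = cepheid_distance_pc(30.0, 18.0)
assert d > 100_000
```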

Hubble used this knowledge to estimate the distance to Andromeda. While the number he found was lower than the best present-day estimates, it was still large enough to show that Andromeda lay far outside the Milky Way and was roughly the same size as our galaxy.

The immediate implication, given that Andromeda is the brightest of the many nebulae we see in big telescopes, was that our Milky Way was neither alone nor unique in the Universe. Thus we confirmed that our galaxy was just one of an almost uncountable number of islands in the ocean of space – and the full scale of the cosmos yielded to human measurement for the first time, through Hubble’s careful lens on a curious star.

A modern image (made by Richard Gentler) of the Andromeda galaxy with a closeup on what is now called “Hubble’s star” taken using the (appropriately enough) Hubble Space Telescope, in the white circle. A “positive” image from Hubble’s original plate is shown at the bottom right.

Illustration Credit: NASA, ESA and Z. Levay (STScI). Credit: NASA, ESA and the Hubble Heritage Team (STScI/AURA)

[1] Astronomers in Hubble’s day used a gizmo called a “Blink Comparator” that chops quickly between two images viewed through an eyepiece, so objects changing in brightness draw attention to themselves by flickering.

[2] In most reproductions of the original plate I am hard put to spot it at all, even more so when it is projected on a screen in a lecture theatre. A bit of mild image processing makes it a little clearer, but it hardly calls attention to itself.


[3] This “period-luminosity law” had been described just 15 years earlier by Henrietta Swan Leavitt and it is still key to setting the overall scale of the Universe.

December 18, 2023

Jordan EllenbergShow report: Bug Moment, Graham Hunt, Dusk, Disq at High Noon Saloon

I haven’t done a show report in a long time because I barely go to shows anymore! Actually, though, this fall I went to three. First, The Beths, opening for The National, but I didn’t stay for The National because I don’t know or care about them; I just wanted to see the latest geniuses of New Zealand play “Expert in a Dying Field”

Next was the Violent Femmes, playing their self-titled debut in order. They used to tour a lot and I used to see them a lot, four or five times in college and grad school I think. They never really grow old and Gordon Gano never stops sounding exactly like Gordon Gano. A lot of times I go to reunion shows and there are a lot of young people who must have come to the band through their back catalogue. Not Violent Femmes! 2000 people filling the Sylvee and I’d say 95% were between 50 and 55. One of the most demographically narrowcast shows I’ve ever been to. Maybe beaten out by the time I saw Black Francis at High Noon and not only was everybody exactly my age they were also all men. (Actually, it was interesting to me there were a lot of women at this show! I think of Violent Femmes as a band for the boys.)

But I came in to write about the show I saw this weekend, four Wisconsin acts playing the High Noon. I really came to see Disq, whose single “Daily Routine” I loved when it came out and I still haven’t gotten tired of. Those chords! Sevenths? They’re something:

Dusk was an Appleton band that played funky/stompy/indie, Bug Moment had an energetic frontwoman named Rosenblatt and were one of those bands where no two members looked like they were in the same band. But the real discovery of the night, for me, was Graham Hunt, who has apparently been a Wisconsin scene fixture forever. Never heard of the guy. But wow! Indie power-pop of the highest order. When Hunt’s voice cracks and scrapes the high notes he reminds me a lot of the other great Madison noisy-indie genius named Graham, Graham Smith, aka Kleenex Girl Wonder, who recorded the last great album of the 1990s in his UW-Madison dorm room. Graham Hunt’s new album, Try Not To Laugh, is out this week. ”Emergency Contact” is about as pretty and urgent as this kind of music gets. 

And from his last record, If You Knew Would You Believe it, “How Is That Different,” which rhymes blanket, eye slit, left it, and orbit. Love it! Reader, I bought a T-shirt.

November 27, 2023

Sean Carroll New Course: The Many Hidden Worlds of Quantum Mechanics

In past years I’ve done several courses for The Great Courses/Wondrium (formerly The Teaching Company): Dark Matter and Dark Energy, Mysteries of Modern Physics: Time, and The Higgs Boson and Beyond. Now I’m happy to announce a new one, The Many Hidden Worlds of Quantum Mechanics.

This is a series of 24 half-hour lectures, given by me with impressive video effects from the Wondrium folks.

The content will be somewhat familiar if you’ve read my book Something Deeply Hidden — the course follows a similar outline, with a few new additions and elaborations along the way. So it’s both a general introduction to quantum mechanics, and also an in-depth exploration of the Many Worlds approach in particular. It’s meant for absolutely everybody — essentially no equations this time! — but 24 lectures is plenty of time to go into depth.

Check out this trailer:

As I type this on Monday 27 November, I believe there is some kind of sale going on! So move quickly to get your quantum mechanics at unbelievably affordable prices.