Planet Musings

September 24, 2017

Scott Aaronson: My Big Numbers talk at Festivaletteratura

Last weekend, I gave a talk on big numbers, as well as a Q&A about quantum computing, at Festivaletteratura: one of the main European literary festivals, held every year in beautiful and historic Mantua, Italy.  (For those who didn’t know, as I didn’t: this is the city where Virgil was born, and where Romeo gets banished in Romeo and Juliet.  Its layout hasn’t substantially changed since the Middle Ages.)

I don’t know how much big numbers or quantum computing have to do with literature, but I relished the challenge of explaining these things to an audience that was not merely “popular” but humanistically rather than scientifically inclined.  In this case, there was not only a math barrier, but also a language barrier, as the festival was mostly in Italian and only some of the attendees knew English, to varying degrees.  The quantum computing session was live-translated into Italian (the challenge faced by the translator in not mangling this material provided a lot of free humor), but the big numbers talk wasn’t.  What’s more, the talk was held outdoors, on the steps of a cathedral, with tons of background noise, including a bell that loudly chimed halfway through the talk.  So if my own words weren’t simple and clear, forget it.

Anyway, in the rest of this post, I’ll share a writeup of my big numbers talk.  The talk has substantial overlap with my “classic” Who Can Name The Bigger Number? essay from 1999.  While I don’t mean to supersede or displace that essay, the truth is that I think and write somewhat differently than I did as a teenager (whuda thunk?), and I wanted to give Scott2017 a crack at material that Scott1999 has been over already.  If nothing else, the new version is more up-to-date and less self-indulgent, and it includes points (for example, the relation between ordinal generalizations of the Busy Beaver function and the axioms of set theory) that I didn’t understand back in 1999.

For regular readers of this blog, I don’t know how much will be new here.  But if you’re one of those people who keeps introducing themselves at social events by saying “I really love your blog, Scott, even though I don’t understand anything that’s in it”—something that’s always a bit awkward for me, because, uh, thanks, I guess, but what am I supposed to say next?—then this lecture is for you.  I hope you’ll read it and understand it.

Thanks so much to Festivaletteratura organizer Matteo Polettini for inviting me, and to Fabrizio Illuminati for moderating the Q&A.  I had a wonderful time in Mantua, although I confess there’s something about being Italian that I don’t understand.  Namely: how do you derive any pleasure from international travel, if anywhere you go, the pizza, pasta, bread, cheese, ice cream, coffee, architecture, scenery, historical sights, and pretty much everything else all fall short of what you’re used to?

Big Numbers

by Scott Aaronson
Sept. 9, 2017

My four-year-old daughter sometimes comes to me and says something like: “daddy, I think I finally figured out what the biggest number is!  Is it a million million million million million million million million thousand thousand thousand hundred hundred hundred hundred twenty eighty ninety eighty thirty a million?”

So I reply, “I’m not even sure exactly what number you named—but whatever it is, why not that number plus one?”

“Oh yeah,” she says.  “So is that the biggest number?”

Of course there’s no biggest number, but it’s natural to wonder what are the biggest numbers we can name in a reasonable amount of time.  Can I have two volunteers from the audience—ideally, two kids who like math?

[Two kids eventually come up.  I draw a line down the middle of the blackboard, and place one kid on each side of it, each with a piece of chalk.]

So the game is, you each have ten seconds to write down the biggest number you can.  You can’t write anything like “the other person’s number plus 1,” and you also can’t write infinity—it has to be finite.  But other than that, you can write basically anything you want, as long as I’m able to understand exactly what number you’ve named.  [These instructions are translated into Italian for the kids.]

Are you ready?  On your mark, get set, GO!

[The kid on the left writes something like: 999999999

While the kid on the right writes something like: 11111111111111111

Looking at these, I comment:]

9 is bigger than 1, but 1 is a bit faster to write, and as you can see that makes the difference here!  OK, let’s give our volunteers a round of applause.

[I didn’t plant the kids, but if I had, I couldn’t have designed a better jumping-off point.]

I’ve been fascinated by how to name huge numbers since I was a kid myself.  When I was a teenager, I even wrote an essay on the subject, called Who Can Name the Bigger Number?  That essay might still get more views than any of the research I’ve done in all the years since!  I don’t know whether to be happy or sad about that.

I think the reason the essay remains so popular is that it shows up on Google whenever someone types something like “what is the biggest number?”  Some of you might know that Google itself was named after the huge number called a googol: 10^100, or 1 followed by a hundred zeroes.

Of course, a googol isn’t even close to the biggest number we can name.  For starters, there’s a googolplex, which is 1 followed by a googol zeroes.  Then there’s a googolplexplex, which is 1 followed by a googolplex zeroes, and a googolplexplexplex, and so on.  But one of the most basic lessons you’ll learn in this talk is that, when it comes to naming big numbers, whenever you find yourself just repeating the same operation over and over and over, it’s time to step back, and look for something new to do that transcends everything you were doing previously.  (Applications to everyday life left as exercises for the listener.)

One of the first people to think about systems for naming huge numbers was Archimedes, who was Greek but lived in what’s now Italy (specifically Syracuse, Sicily) in the 200s BC.  Archimedes wrote a sort of pop-science article—possibly history’s first pop-science article—called The Sand-Reckoner.  In this remarkable piece, which was addressed to the King of Syracuse, Archimedes sets out to calculate an upper bound on the number of grains of sand needed to fill the entire universe, or at least the universe as known in antiquity.  He thereby seeks to refute people who use “the number of sand grains” as a shorthand for uncountability and unknowability.

Of course, Archimedes was just guessing about the size of the universe, though he did use the best astronomy available in his time—namely, the work of Aristarchus, who anticipated Copernicus.  Besides estimates for the size of the universe and of a sand grain, the other thing Archimedes needed was a way to name arbitrarily large numbers.  Since he didn’t have Arabic numerals or scientific notation, his system was basically just to compose the word “myriad” (which means 10,000) into bigger and bigger chunks: a “myriad myriad” gets its own name, a “myriad myriad myriad” gets another, and so on.  Using this system, Archimedes estimated that ~10^63 sand grains would suffice to fill the universe.  Ancient Hindu mathematicians were able to name similarly large numbers using similar notations.  In some sense, the next really fundamental advances in naming big numbers wouldn’t occur until the 20th century.

We’ll come to those advances, but before we do, I’d like to discuss another question that motivated Archimedes’ essay: namely, what are the biggest numbers relevant to the physical world?

For starters, how many atoms are in a human body?  Anyone have a guess?  About 10^28.  (If you remember from high-school chemistry that a “mole” is 6×10^23, this is not hard to ballpark.)
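To see how the ballpark works, here’s a minimal back-of-the-envelope sketch.  The body mass and composition figures are round textbook numbers of my own choosing, not from the talk:

```python
# Ballpark the number of atoms in a human body.
# Assumptions (mine, for illustration): a 70 kg body composed of oxygen,
# carbon, hydrogen, and nitrogen in roughly these mass fractions.
AVOGADRO = 6.022e23          # atoms per mole

mass_g = 70_000              # 70 kg body, in grams
composition = {              # element: (mass fraction, atomic mass in g/mol)
    "O": (0.65, 16.0),
    "C": (0.18, 12.0),
    "H": (0.10, 1.0),
    "N": (0.03, 14.0),
}

moles = sum(mass_g * frac / amass for frac, amass in composition.values())
atoms = moles * AVOGADRO
print(f"{atoms:.1e}")        # roughly 7e27, i.e. ~10^28
```

Note that hydrogen, despite being only 10% of the mass, contributes most of the atom count, since it’s 16 times lighter per atom than oxygen.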

How many stars are in our galaxy?  Estimates vary, but let’s say a few hundred billion.

How many stars are in the entire observable universe?  Something like 10^23.

How many subatomic particles are in the observable universe?  No one knows for sure—for one thing, because we don’t know what the dark matter is made of—but 10^90 is a reasonable estimate.

Some of you might be wondering: but for all anyone knows, couldn’t the universe be infinite?  Couldn’t it have infinitely many stars and particles?  The answer to that is interesting: indeed, no one knows whether space goes on forever or curves back on itself, like the surface of the earth.  But because of the dark energy, discovered in 1998, it seems likely that even if space is infinite, we can only ever see a finite part of it.  The dark energy is a force that pushes the galaxies apart.  The further away they are from us, the faster they’re receding—with galaxies far enough away from us receding faster than light.

Right now, we can see the light from galaxies that are up to about 45 billion light-years away.  (Why 45 billion light-years, you ask, if the universe itself is “only” 13.8 billion years old?  Well, when the galaxies emitted the light, they were a lot closer to us than they are now!  The universe expanded in the meantime.)  If, as seems likely, the dark energy has the form of a cosmological constant, then there’s a somewhat further horizon, such that it’s not just that the galaxies beyond that can’t be seen by us right now—it’s that they can never be seen.

In practice, many big numbers come from the phenomenon of exponential growth.  Here’s a graph showing the three functions n, n^2, and 2^n:

The difference is, n and even n^2 grow in a more-or-less manageable way, but 2^n just shoots up off the screen.  The shooting-up has real-life consequences—indeed, more important consequences than just about any other mathematical fact one can think of.

The current human population is about 7.5 billion (when I was a kid, it was more like 5 billion).  Right now, the population is doubling about once every 64 years.  If it continues to double at that rate, and humans don’t colonize other worlds, then you can calculate that, less than 3000 years from now, the entire earth, all the way down to the core, will be made of human flesh.  I hope the people use deodorant!
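The “earth of flesh” arithmetic is easy to check.  The round numbers below (70 kg per person, 6×10^24 kg for the earth) are my own plug-ins, but any reasonable choices give the same conclusion:

```python
import math

# Sanity-check the claim: how many 64-year doublings until the total
# mass of humanity exceeds the mass of the earth?
# Assumed round numbers: 7.5e9 people at ~70 kg each, earth ~6e24 kg.
population = 7.5e9
person_kg = 70.0
earth_kg = 6.0e24

max_people = earth_kg / person_kg                 # ~8.6e22 people
doublings = math.log2(max_people / population)    # ~43.5 doublings
years = doublings * 64
print(round(years))                               # comfortably under 3000
```

The punchline of exponential growth: even a target as absurd as the mass of the planet is only about 43 doublings away.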

Nuclear chain reactions are a second example of exponential growth: one uranium or plutonium nucleus fissions and emits neutrons that cause, let’s say, two other nuclei to fission, which then cause four nuclei to fission, then 8, 16, 32, and so on, until boom, you’ve got your nuclear weapon (or your nuclear reactor, if you do something to slow the process down).  A third example is compound interest, as with your bank account, or for that matter an entire country’s GDP.  A fourth example is Moore’s Law, which is the thing that said that the number of components in a microprocessor doubled every 18 months (with other metrics, like memory, processing speed, etc., on similar exponential trajectories).  Here at Festivaletteratura, there’s a “Hack Space,” where you can see state-of-the-art Olivetti personal computers from around 1980: huge desk-sized machines with maybe 16K of usable RAM.  Moore’s Law is the thing that took us from those (and the even bigger, weaker computers before them) to the smartphone that’s in your pocket.

However, a general rule is that any time we encounter exponential growth in our observed universe, it can’t last for long.  It will stop, if not before, then when it runs out of whatever resource it needs to continue: for example, food or land in the case of people, or fuel in the case of a nuclear reaction.  OK, but what about Moore’s Law: what physical constraint will stop it?

By some definitions, Moore’s Law has already stopped: computers aren’t getting that much faster in terms of clock speed; they’re mostly just getting more and more parallel, with more and more cores on a chip.  And it’s easy to see why: the speed of light is finite, which means the speed of a computer will always be limited by the size of its components.  And transistors are now just 15 nanometers across; a couple orders of magnitude smaller and you’ll be dealing with individual atoms.  And unless we leap really far into science fiction, it’s hard to imagine building a transistor smaller than one atom across!

OK, but what if we do leap really far into science fiction?  Forget about engineering difficulties: is there any fundamental principle of physics that prevents us from making components smaller and smaller, and thereby making our computers faster and faster, without limit?

While no one has tested this directly, it appears from current physics that there is a fundamental limit to speed, and that it’s about 10^43 operations per second, or one operation per Planck time.  Likewise, it appears that there’s a fundamental limit to the density with which information can be stored, and that it’s about 10^69 bits per square meter, or one bit per Planck area. (Surprisingly, the latter limit scales only with the surface area of a region, not with its volume.)
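Both quoted rates follow, up to order-one factors, from the Planck units themselves.  A quick check, plugging in the standard CODATA values (the constants are my inputs, not the talk’s):

```python
# Recover the quoted limits from the Planck time and Planck length.
# Up to order-one factors that I'm ignoring here, "one operation per
# Planck time" and "one bit per Planck area" give the talk's numbers.
planck_time = 5.39e-44        # seconds (CODATA value, rounded)
planck_length = 1.616e-35     # meters

ops_per_second = 1 / planck_time        # ~1.9e43, matching the ~10^43 quoted
bits_per_m2 = 1 / planck_length**2      # ~3.8e69, matching the ~10^69 quoted
print(f"{ops_per_second:.1e} ops/s, {bits_per_m2:.1e} bits/m^2")
```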

What would happen if you tried to build a faster computer than that, or a denser hard drive?  The answer is: cycling through that many different states per second, or storing that many bits, would involve concentrating so much energy in so small a region, that the region would exceed what’s called its Schwarzschild radius.  If you don’t know what that means, it’s just a fancy way of saying that your computer would collapse to a black hole.  I’ve always liked that as Nature’s way of telling you not to do something!

Note that, on the modern view, a black hole itself is not only the densest possible object allowed by physics, but also the most efficient possible hard drive, storing ~10^69 bits per square meter of its event horizon—though the bits are not so easy to retrieve! It’s also, in a certain sense, the fastest possible computer, since it really does cycle through 10^43 states per second—though it might not be computing anything that anyone would care about.

We can also combine these fundamental limits on computer speed and storage capacity, with the limits that I mentioned earlier on the size of the observable universe, which come from the cosmological constant.  If we do so, we get an upper bound of ~10^122 on the number of bits that can ever be involved in any computation in our world, no matter how large: if we tried to do a bigger computation than that, the far parts of it would be receding away from us faster than the speed of light.  In some sense, this 10^122 is the most fundamental number that sets the scale of our universe: on the current conception of physics, everything you’ve ever seen or done, or will see or will do, can be represented by a sequence of at most 10^122 ones and zeroes.

Having said that, in math, computer science, and many other fields (including physics itself), many of us meet bigger numbers than 10^122 dozens of times before breakfast! How so? Mostly because we choose to ask, not about the number of things that are, but about the number of possible ways they could be—not about the size of ordinary 3-dimensional space, but the sizes of abstract spaces of possible configurations. And the latter are subject to exponential growth, continuing way beyond 10^122.

As an example, let’s ask: how many different novels could possibly be written (say, at most 400 pages long, with a normal-size font, yadda yadda)? Well, we could get a lower bound on the number just by walking around here at Festivaletteratura, but the number that could be written certainly far exceeds the number that have been written or ever will be. This was the subject of Jorge Luis Borges’ famous story The Library of Babel, which imagined an immense library containing every book that could possibly be written up to a certain length. Of course, the vast majority of the books are filled with meaningless nonsense, but among their number one can find all the great works of literature, books predicting the future of humanity in perfect detail, books predicting the future except with a single error, etc. etc. etc.

To get more quantitative, let’s simply ask: how many different ways are there to fill the first page of a novel?  Let’s go ahead and assume that the page is filled with intelligible (or at least grammatical) English text, rather than arbitrary sequences of symbols, at a standard font size and page size.  In that case, using standard estimates for the entropy (i.e., compressibility) of English, I estimated this morning that there are maybe ~10^700 possibilities.  So, forget about the rest of the novel: there are astronomically more possible first pages than could fit in the observable universe!
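An estimate of this flavor is a one-liner.  The inputs below are my own round numbers (not the talk’s actual morning calculation): a couple thousand characters per page, and roughly one bit of entropy per character of English, in the spirit of Shannon’s classic estimates:

```python
import math

# Order-of-magnitude count of distinct intelligible first pages.
# Assumptions (mine, for illustration): ~2300 characters on a page,
# ~1 bit of entropy per character of English text.
chars_per_page = 2300
bits_per_char = 1.0

# 2^(total bits) possibilities; convert the exponent to base 10.
distinct_pages_exponent = chars_per_page * bits_per_char * math.log10(2)
print(f"~10^{distinct_pages_exponent:.0f} possible first pages")
```

Changing the inputs within reason moves the exponent by a factor of two or so, but the conclusion—vastly more than 10^122—is robust.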

We could likewise ask: how many chess games could be played?  I’ve seen estimates from 10^40 up to 10^120, depending on whether we count only “sensible” games or also “absurd” ones (though in all cases, with a limit on the length of the game as might occur in a real competition). For Go, by contrast, which is played on a larger board (19×19 rather than 8×8) the estimates for the number of possible games seem to start at 10^800 and only increase from there. This difference in magnitudes has something to do with why Go is a “harder” game than chess, why computers were able to beat the world chess champion already in 1997, but the world Go champion not until last year.

Or we could ask: given a thousand cities, how many routes are there for a salesman that visit each of the cities exactly once? We write the answer as 1000!, pronounced “1000 factorial,” which just means 1000×999×998×…×2×1: there are 1000 choices for the first city, then 999 for the second city, 998 for the third, and so on.  This number is about 4×10^2567.  So again, more possible routes than atoms in the visible universe, yadda yadda.
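Python’s arbitrary-precision integers let us check the 4×10^2567 figure exactly rather than by estimate:

```python
import math

# Compute 1000! exactly and read off its size.
routes = math.factorial(1000)
digits = len(str(routes))
print(digits, str(routes)[:2])   # 2568 digits, leading digits "40" -> ~4x10^2567
```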

But suppose the salesman is interested only in the shortest route that visits each city, given the distance between every city and every other.  We could then ask: to find that shortest route, would a computer need to search exhaustively through all 1000! possibilities—or, maybe not all 1000!, maybe it could be a bit more clever than that, but at any rate, a number that grew exponentially with the number of cities n?  Or could there be an algorithm that zeroed in on the shortest route dramatically faster: say, using a number of steps that grew only linearly or quadratically with the number of cities?

This, modulo a few details, is one of the most famous unsolved problems in all of math and science.  You may have heard of it; it’s called P versus NP.  P (Polynomial-Time) is the class of problems that an ordinary digital computer can solve in a “reasonable” amount of time, where we define “reasonable” to mean, growing at most like the size of the problem (for example, the number of cities) raised to some fixed power.  NP (Nondeterministic Polynomial-Time) is the class for which a computer can at least recognize a solution in polynomial-time.  If P=NP, it would mean that for every combinatorial problem of this sort, for which a computer could recognize a valid solution—Sudoku puzzles, scheduling airline flights, fitting boxes into the trunk of a car, etc. etc.—there would be an algorithm that cut through the combinatorial explosion of possible solutions, and zeroed in on the best one.  If P≠NP, it would mean that at least some problems of this kind required astronomical time, regardless of how cleverly we programmed our computers.
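To make the combinatorial explosion concrete, here’s a brute-force Traveling Salesman solver (a sketch of the naive exhaustive method, not a serious algorithm): fixing the starting city, it examines all (n−1)! complete tours.  Fine for 8 cities; hopeless for 1000.

```python
import itertools
import math
import random

random.seed(0)
n = 8
# Hypothetical cities: random points in the unit square.
cities = [(random.random(), random.random()) for _ in range(n)]

def tour_length(order):
    # Total length of the closed tour: city 0 -> order... -> back to city 0.
    path = (0,) + order + (0,)
    return sum(math.dist(cities[a], cities[b]) for a, b in zip(path, path[1:]))

examined = 0
best = float("inf")
for perm in itertools.permutations(range(1, n)):   # all (n-1)! orderings
    examined += 1
    best = min(best, tour_length(perm))

print(examined, math.factorial(n - 1))   # both 5040
```

Adding one more city multiplies the running time by n; the P vs. NP question asks, in essence, whether anything fundamentally better than this kind of search is possible.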

Most of us believe that P≠NP—indeed, I like to say that if we were physicists, we would’ve simply declared P≠NP a “law of nature,” and given ourselves Nobel Prizes for the discovery of the law!  And if it turned out that P=NP, we’d just give ourselves more Nobel Prizes for the law’s overthrow.  But because we’re mathematicians and computer scientists, we call it a “conjecture.”

Another famous example of an NP problem is: I give you (say) a 2000-digit number, and I ask you to find its prime factors.  Multiplying two 1000-digit numbers is easy, at least for a computer, but factoring the product back into primes seems astronomically hard—at least, with our present-day computers running any known algorithm.  Why does anyone care?  Well, you might know that, any time you order something online—in fact, every time you see a little padlock icon in your web browser—your personal information, like (say) your credit card number, is being protected by a cryptographic code that depends on the belief that factoring huge numbers is hard, or a few closely-related beliefs.  If P=NP, then those beliefs would be false, and indeed all cryptography that depends on hard math problems would be breakable in “reasonable” amounts of time.

In the special case of factoring, though—and of the other number theory problems that underlie modern cryptography—it wouldn’t even take anything as shocking as P=NP for them to fall.  Actually, that provides a good segue into another case where exponentials, and numbers vastly larger than 10122, regularly arise in the real world: quantum mechanics.

Some of you might have heard that quantum mechanics is complicated or hard.  But I can let you in on a secret, which is that it’s incredibly simple once you take the physics out of it!  Indeed, I think of quantum mechanics as not exactly even “physics,” but more like an operating system that the rest of physics runs on as application programs.  It’s a certain generalization of the rules of probability.  In one sentence, the central thing quantum mechanics says is that, to fully describe a physical system, you have to assign a number called an “amplitude” to every possible configuration that the system could be found in.  These amplitudes are used to calculate the probabilities that the system will be found in one configuration or another if you look at it.  But the amplitudes aren’t themselves probabilities: rather than just going from 0 to 1, they can be positive or negative or even complex numbers.

For us, the key point is that, if we have a system with (say) a thousand interacting particles, then the rules of quantum mechanics say we need at least 2^1000 amplitudes to describe it—which is way more than we could write down on pieces of paper filling the entire observable universe!  In some sense, chemists and physicists knew about this immensity since 1926.  But they knew it mainly as a practical problem: if you’re trying to simulate quantum mechanics on a conventional computer, then as far as we know, the resources needed to do so increase exponentially with the number of particles being simulated.  Only in the 1980s did a few physicists, such as Richard Feynman and David Deutsch, suggest “turning the lemon into lemonade,” and building computers that themselves would exploit the exponential growth of amplitudes.  Supposing we built such a computer, what would it be good for?  At the time, the only obvious application was simulating quantum mechanics itself!  And that’s probably still the most important application today.
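The exponential cost of classical simulation is easy to make tangible.  Assuming we store one complex amplitude per configuration at 16 bytes each (two 64-bit floats—my choice of precision, for illustration):

```python
# Memory needed to store the state of n quantum bits classically,
# assuming one complex amplitude per configuration at 16 bytes each
# (two 64-bit floats -- an illustrative choice of precision).
def statevector_bytes(n_qubits: int) -> int:
    return 16 * 2**n_qubits

print(statevector_bytes(30) / 2**30)          # 16.0 -- 30 qubits already need 16 GiB
print(len(str(statevector_bytes(1000))))      # the byte count for 1000 qubits has 303 digits
```

Thirty particles fill a laptop’s memory; a thousand particles need a number of bytes that itself takes hundreds of digits to write down.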

In 1994, though, a guy named Peter Shor made a discovery that dramatically increased the level of interest in quantum computers.  That discovery was that a quantum computer, if built, could factor an n-digit number using a number of steps that grows only like about n^2, rather than exponentially with n.  The upshot is that, if and when practical quantum computers are built, they’ll be able to break almost all the cryptography that’s currently used to secure the Internet.

(Right now, only small quantum computers have been built; the record for using Shor’s algorithm is still to factor 21 into 3×7 with high statistical confidence!  But Google is planning within the next year or so to build a chip with 49 quantum bits, or qubits, and other groups around the world are pursuing parallel efforts.  Almost certainly, 49 qubits still won’t be enough to do anything useful, including codebreaking, but it might be enough to do something classically hard, in the sense of taking at least ~2^49, or 563 trillion, steps to simulate classically.)

I should stress, though, that for other NP problems—including breaking various other cryptographic codes, and solving the Traveling Salesman Problem, Sudoku, and the other combinatorial problems mentioned earlier—we don’t know any quantum algorithm analogous to Shor’s factoring algorithm.  For these problems, we generally think that a quantum computer could solve them in roughly the square root of the number of steps that would be needed classically, because of another famous quantum algorithm called Grover’s algorithm.  But getting an exponential quantum speedup for these problems would, at the least, require an additional breakthrough.  No one has proved that such a breakthrough in quantum algorithms is impossible: indeed, no one has proved that it’s impossible even for classical algorithms; that’s the P vs. NP question!  But most of us regard it as unlikely.

If we’re right, then the upshot is that quantum computers are not magic bullets: they might yield dramatic speedups for certain special problems (like factoring), but they won’t tame the curse of exponentiality, cut through to the optimal solution, every time we encounter a Library-of-Babel-like profusion of possibilities.  For (say) the Traveling Salesman Problem with a thousand cities, even a quantum computer—which is the most powerful kind of computer rooted in known laws of physics—might, for all we know, take longer than the age of the universe to find the shortest route.

The truth is, though, the biggest numbers that show up in math are way bigger than anything we’ve discussed until now: bigger than 10^122, or even

$$ 2^{10^{122}}, $$

which is a rough estimate for the number of quantum-mechanical amplitudes needed to describe our observable universe.

For starters, there’s Skewes’ number, which the mathematician G. H. Hardy once called “the largest number which has ever served any definite purpose in mathematics.”  Let π(x) be the number of prime numbers up to x: for example, π(10)=4, since we have 2, 3, 5, and 7.  Then there’s a certain estimate for π(x) called li(x).  It’s known that li(x) overestimates π(x) for an enormous range of x’s (up to trillions and beyond)—but then at some point, it crosses over and starts underestimating π(x) (then overestimates again, then underestimates, and so on).  Skewes’ number is an upper bound on the location of the first such crossover point.  In 1955, Skewes proved that the first crossover must happen before

$$ x = 10^{10^{10^{964}}}. $$

Note that this bound has since been substantially improved, to about 1.4×10^316.  But no matter: there are numbers vastly bigger even than Skewes’ original estimate, which have since shown up in Ramsey theory and other parts of logic and combinatorics to take Skewes’ number’s place.

Alas, I won’t have time here to delve into specific (beautiful) examples of such numbers, such as Graham’s number.  So in lieu of that, let me just tell you about the sorts of processes, going far beyond exponentiation, that tend to yield such numbers.

The starting point is to remember a sequence of operations we all learn about in elementary school, and then ask why the sequence suddenly and inexplicably stops.

As long as we’re only talking about positive integers, “multiplication” just means “repeated addition.”  For example, 5×3 means 5 added to itself 3 times, or 5+5+5.

Likewise, “exponentiation” just means “repeated multiplication.”  For example, 5^3 means 5×5×5.

But what’s repeated exponentiation?  For that we introduce a new operation, which we call tetration, and write like so: ^{3}5 means 5 raised to itself 3 times, or

$$ ^{3} 5 = 5^{5^5} = 5^{3125} \approx 1.9 \times 10^{2184}. $$

But we can keep going. Let x pentated to the y, or xPy, mean x tetrated to itself y times.  Let x sextated to the y, or xSy, mean x pentated to itself y times, and so on.

Then we can define the Ackermann function, invented by the mathematician Wilhelm Ackermann in 1928, which cuts across all these operations to get more rapid growth than we could with any one of them alone.  In terms of the operations above, we can give a slightly nonstandard, but perfectly serviceable, definition of the Ackermann function as follows:

A(1) is 1+1=2.

A(2) is 2×2=4.

A(3) is 3 to the 3rd power, or 3^3=27.

Not very impressive so far!  But wait…

A(4) is 4 tetrated to the 4, or

$$ ^{4}4 = 4^{4^{4^4}} = 4^{4^{256}} = BIG $$

A(5) is 5 pentated to the 5, which I won’t even try to simplify.  A(6) is 6 sextated to the 6.  And so on.
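The whole tower of operations can be collapsed into one recursive definition, which also gives the talk’s (slightly nonstandard) Ackermann function directly.  A minimal sketch, with `hyper` and `A` as my own illustrative names:

```python
# One recursive "hyperoperation": level 1 is addition, level 2 is
# multiplication, level 3 exponentiation, level 4 tetration, and so on.
# Each level applies the level below it y times.
def hyper(level: int, x: int, y: int) -> int:
    if level == 1:
        return x + y
    if y == 1:
        return x
    return hyper(level - 1, x, hyper(level, x, y - 1))

# The talk's Ackermann function: A(n) = n <level-n operation> n.
def A(n: int) -> int:
    return hyper(n, n, n)

print(A(1), A(2), A(3))   # 2 4 27 -- but don't even think about calling A(4)!
```

A(4) already equals 4^(4^256), a number with roughly 10^154 digits, so the recursion is strictly for the small cases.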

More than just a curiosity, the Ackermann function actually shows up sometimes in math and theoretical computer science.  For example, the inverse Ackermann function—a function α such that α(A(n))=n, which therefore grows as slowly as the Ackermann function grows quickly, and which is at most 4 for any n that would ever arise in the physical universe—sometimes appears in the running times of real-world algorithms.

In the meantime, though, the Ackermann function also has a more immediate application.  Next time you find yourself in a biggest-number contest, like the one with which we opened this talk, you can just write A(1000), or even A(A(1000)) (after specifying that A means the Ackermann function above).  You’ll win—period—unless your opponent has also heard of something Ackermann-like or beyond.

OK, but Ackermann is very far from the end of the story.  If we want to go incomprehensibly beyond it, the starting point is the so-called “Berry Paradox”, which was first described by Bertrand Russell, though he said he learned it from a librarian named Berry.  The Berry Paradox asks us to imagine leaping past exponentials, the Ackermann function, and every other particular system for naming huge numbers.  Instead, why not just go straight for a single gambit that seems to beat everything else:

The biggest number that can be specified using a hundred English words or fewer

Why is this called a paradox?  Well, do any of you see the problem here?

Right: if the above made sense, then we could just as well have written

Twice the biggest number that can be specified using a hundred English words or fewer

But we just specified that number—one that, by definition, takes more than a hundred words to specify—using far fewer than a hundred words!  Whoa.  What gives?

Most logicians would say the resolution of this paradox is simply that the concept of “specifying a number with English words” isn’t precisely defined, so phrases like the ones above don’t actually name definite numbers.  And how do we know that the concept isn’t precisely defined?  Why, because if it was, then it would lead to paradoxes like the Berry Paradox!

So if we want to escape the jaws of logical contradiction, then in this gambit, we ought to replace English by a clear, logical language: one that can be used to specify numbers in a completely unambiguous way.  Like … oh, I know!  Why not write:

The biggest number that can be specified using a computer program that’s at most 1000 bytes long

To make this work, there are just two issues we need to get out of the way.  First, what does it mean to “specify” a number using a computer program?  There are different things it could mean, but for concreteness, let’s say a computer program specifies a number N if, when you run it (with no input), the program runs for exactly N steps and then stops.  A program that runs forever doesn’t specify any number.

The second issue is, which programming language do we have in mind: BASIC? C? Python?  The answer is that it won’t much matter!  The Church-Turing Thesis, one of the foundational ideas of computer science, implies that every “reasonable” programming language can emulate every other one.  So the story here can be repeated with just about any programming language of your choice.  For concreteness, though, we’ll pick one of the first and simplest programming languages, namely “Turing machine”—the language invented by Alan Turing all the way back in 1936!

In the Turing machine language, we imagine a one-dimensional tape divided into squares, extending infinitely in both directions, and with all squares initially containing a “0.”  There’s also a tape head with n “internal states,” moving back and forth on the tape.  Each internal state contains an instruction, and the only allowed instructions are: write a “0” in the current square, write a “1” in the current square, move one square left on the tape, move one square right on the tape, jump to a different internal state, halt, and do any of the previous conditional on whether the current square contains a “0” or a “1.”

Using Turing machines, in 1962 the mathematician Tibor Radó invented the so-called Busy Beaver function, or BB(n), which allowed naming by far the largest numbers anyone had yet named.  BB(n) is defined as follows: consider all Turing machines with n internal states.  Some of those machines run forever, when started on an all-0 input tape.  Discard them.  Among the ones that eventually halt, there must be some machine that runs for a maximum number of steps before halting.  However many steps that is, that’s what we call BB(n), the nth Busy Beaver number.

The first few values of the Busy Beaver function have actually been calculated, so let’s see them.

BB(1) is 1.  For a 1-state Turing machine on an all-0 tape, the choices are limited: either you halt in the very first step, or else you run forever.

BB(2) is 6, which isn’t too hard to verify by trying things out with pen and paper.

BB(3) is 21: that determination was already a research paper.

BB(4) is 107 (another research paper).
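
These small values are within reach of brute force. Here is a minimal sketch (my own illustration, not from the talk) that enumerates every n-state, 2-symbol Turing machine, runs each on an all-0 tape, and returns the maximum halting time. I'm assuming the standard convention that the transition into the halt state itself counts as a step, which matches the values quoted above; the search is only feasible for n ≤ 2 or so, since max_steps must exceed the true BB(n) and non-halting machines must be run that long.

```python
from itertools import product

def run(machine, max_steps):
    """Run a machine on an all-0 tape; return its step count if it
    halts within max_steps, else None."""
    tape = {}          # sparse tape; unvisited squares hold 0
    pos, state, steps = 0, 0, 0
    while steps < max_steps:
        write, move, nxt = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1     # the halting transition itself counts as a step
        if nxt == 'H':
            return steps
        state = nxt
    return None

def busy_beaver(n, max_steps):
    """Brute-force BB(n) over all n-state, 2-symbol machines.
    max_steps must exceed the true BB(n) for the answer to be exact."""
    actions = list(product((0, 1), (-1, 1), list(range(n)) + ['H']))
    keys = list(product(range(n), (0, 1)))
    best = 0
    for choice in product(actions, repeat=len(keys)):
        steps = run(dict(zip(keys, choice)), max_steps)
        if steps is not None:
            best = max(best, steps)
    return best
```

With max_steps = 100, busy_beaver(1, …) returns 1 and busy_beaver(2, …) returns 6, matching the values above; already at n = 3 the enumeration, and the need to rule out non-halting runs, makes this naive approach strain.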

Much like with the Ackermann function, not very impressive yet!  But wait:

BB(5) is not yet known, but it’s known to be at least 47,176,870.

BB(6) is at least $7.4 \times 10^{36,534}$.

BB(7) is at least

$$ 10^{10^{10^{10^{18,000,000}}}}. $$

Clearly we’re dealing with a monster here, but can we understand just how terrifying of a monster?  Well, call a sequence f(1), f(2), … computable, if there’s some computer program that takes n as input, runs for a finite time, then halts with f(n) as its output.  To illustrate, $f(n)=n^2$, $f(n)=2^n$, and even the Ackermann function that we saw before are all computable.

But I claim that the Busy Beaver function grows faster than any computable function.  Since this talk should have at least some math in it, let’s see a proof of that claim.

Maybe the nicest way to see it is this: suppose, to the contrary, that there were a computable function f that grew at least as fast as the Busy Beaver function.  Then by using that f, we could take the Berry Paradox from before, and turn it into an actual contradiction in mathematics!  So for example, suppose the program to compute f were a thousand bytes long.  Then we could write another program, not much longer than a thousand bytes, to run for (say) 2×f(1000000) steps: that program would just need to include a subroutine for f, plus a little extra code to feed that subroutine the input 1000000, and then to run for 2×f(1000000) steps.  But by assumption, f(1000000) is at least the maximum number of steps that any program up to a million bytes long can run for—even though we just wrote a program, less than a million bytes long, that ran for more steps!  This gives us our contradiction.  The only possible conclusion is that the function f, and the program to compute it, couldn’t have existed in the first place.

(As an alternative, rather than arguing by contradiction, one could simply start with any computable function f, and then build programs that compute f(n) for various “hardwired” values of n, in order to show that BB(n) must grow at least as rapidly as f(n).  Or, for yet a third proof, one can argue that, if any upper bound on the BB function were computable, then one could use that to solve the halting problem, which Turing famously showed to be uncomputable in 1936.)

In some sense, it’s not so surprising that the BB function should grow uncomputably quickly—because if it were computable, then huge swathes of mathematical truth would be laid bare to us.  For example, suppose we wanted to know the truth or falsehood of the Goldbach Conjecture, which says that every even number 4 or greater can be written as a sum of two prime numbers.  Then we’d just need to write a program that checked each even number one by one, and halted if and only if it found one that wasn’t a sum of two primes.  Suppose that program corresponded to a Turing machine with N states.  Then by definition, if it halted at all, it would have to halt after at most BB(N) steps.  But that means that, if we knew BB(N)—or even any upper bound on BB(N)—then we could find out whether our program halts, by simply running it for the requisite number of steps and seeing.  In that way we’d learn the truth or falsehood of Goldbach’s Conjecture—and similarly for the Riemann Hypothesis, and every other famous unproved mathematical conjecture (there are a lot of them) that can be phrased in terms of a computer program never halting.
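
The Goldbach-checking program described above is easy to write down. Here is a sketch in Python rather than as a Turing machine; the cutoff argument is my own addition so that the demonstration terminates, whereas the version relevant to BB(N) would loop forever unless a counterexample exists.

```python
def is_prime(n):
    """Trial division; fine for a demonstration."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def has_goldbach_split(n):
    """True if the even number n is a sum of two primes."""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

def goldbach_search(bound=None):
    """Halts (returning n) iff some even n >= 4 violates Goldbach's
    Conjecture; 'bound' is an artificial cutoff so the demo terminates."""
    n = 4
    while bound is None or n <= bound:
        if not has_goldbach_split(n):
            return n   # counterexample found: the program halts
        n += 2
    return None        # no counterexample below the cutoff
```

Compiled down to a Turing machine with N states, the unbounded version halts if and only if Goldbach’s Conjecture fails, which is exactly what ties BB(N) to the conjecture.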

(Here, admittedly, I’m using “we could find” in an extremely theoretical sense.  Even if someone handed you an N-state Turing machine that ran for BB(N) steps, the number BB(N) would be so hyper-mega-astronomical that, in practice, you could probably never distinguish the machine from one that simply ran forever.  So the aforementioned “strategy” for proving Goldbach’s Conjecture, or the Riemann Hypothesis, would probably never yield fruit before the heat death of the universe, even though in principle it would reduce the task to a “mere finite calculation.”)

OK, you wanna know something else wild about the Busy Beaver function?  In 2015, my former student Adam Yedidia and I wrote a paper where we proved that BB(8000)—i.e., the 8000th Busy Beaver number—can’t be determined using the usual axioms for mathematics, which are called Zermelo-Fraenkel (ZF) set theory.  Nor can BB(8001) or any larger Busy Beaver number.

To be sure, BB(8000) has some definite value: there are finitely many 8000-state Turing machines, and each one either halts or runs forever, and among the ones that halt, there’s some maximum number of steps that any of them runs for.  What we showed is that math, if it limits itself to the currently-accepted axioms, can never prove the value of BB(8000), even in principle.

The way we did that was by explicitly constructing an 8000-state Turing machine, which (in effect) enumerates all the consequences of the ZF axioms one after the next, and halts if and only if it ever finds a contradiction—that is, a proof of 0=1.  Presumably set theory is actually consistent, and therefore our program runs forever.  But if you proved the program ran forever, you’d also be proving the consistency of set theory.  And has anyone heard of any obstacle to doing that?  Of course, Gödel’s Incompleteness Theorem!  Because of Gödel, if set theory is consistent (well, technically, also arithmetically sound), then it can’t prove our program either halts or runs forever.  But that means set theory can’t determine BB(8000) either—because if it could do that, then it could also determine the behavior of our program.

To be clear, it was long understood that there’s some computer program that halts if and only if set theory is inconsistent—and therefore, that the axioms of set theory can determine at most k values of the Busy Beaver function, for some positive integer k.  “All” Adam and I did was to prove the first explicit upper bound, k≤8000, which required a lot of optimizations and software engineering to get the number of states down to something reasonable (our initial estimate was more like k≤1,000,000).  Since then, Stefan O’Rear has improved our bound—most recently, he says, to k≤1000, meaning that, at least by the lights of ZF set theory, fewer than a thousand values of the BB function can ever be known.

Meanwhile, let me remind you that, at present, only four values of the function are known!  Could the value of BB(100) already be independent of set theory?  What about BB(10)?  BB(5)?  Just how early in the sequence do you leap off into Platonic hyperspace?  I don’t know the answer to that question but would love to.

Ah, you ask, but is there any number sequence that grows so fast, it blows even the Busy Beavers out of the water?  There is!

Imagine a magic box into which you could feed in any positive integer n, and it would instantly spit out BB(n), the nth Busy Beaver number.  Computer scientists call such a box an “oracle.”  Even though the BB function is uncomputable, it still makes mathematical sense to imagine a Turing machine that’s enhanced by the magical ability to access a BB oracle any time it wants: call this a “super Turing machine.”  Then let SBB(n), or the nth super Busy Beaver number, be the maximum number of steps that any n-state super Turing machine makes before halting, if given no input.

By simply repeating the reasoning for the ordinary BB function, one can show that, not only does SBB(n) grow faster than any computable function, it grows faster than any function computable by super Turing machines (for example, BB(n), BB(BB(n)), etc).

Let a super duper Turing machine be a Turing machine with access to an oracle for the super Busy Beaver numbers.  Then you can use super duper Turing machines to define a super duper Busy Beaver function, which you can use in turn to define super duper pooper Turing machines, and so on!

Let “level-1 BB” be the ordinary BB function, let “level-2 BB” be the super BB function, let “level-3 BB” be the super duper BB function, and so on.  Then clearly we can go to “level-k BB,” for any positive integer k.

But we need not stop even there!  We can then go to level-ω BB.  What’s ω?  Mathematicians would say it’s the “first infinite ordinal”—the ordinals being a system where you can pass from any set of numbers you can possibly name (even an infinite set), to the next number larger than all of them.  More concretely, the level-ω Busy Beaver function is simply the Busy Beaver function for Turing machines that are able, whenever they want, to call an oracle to compute the level-k Busy Beaver function, for any positive integer k of their choice.

But why stop there?  We can then go to level-(ω+1) BB, which is just the Busy Beaver function for Turing machines that are able to call the level-ω Busy Beaver function as an oracle.  And thence to level-(ω+2) BB, level-(ω+3) BB, etc., defined analogously.  But then we can transcend that entire sequence and go to level-2ω BB, which involves Turing machines that can call level-(ω+k) BB as an oracle for any positive integer k.  In the same way, we can pass to level-3ω BB, level-4ω BB, etc., until we transcend that entire sequence and pass to level-ω2 BB, which can call any of the previous ones as oracles.  Then we have level-ω3 BB, level-ω4 BB, etc., until we transcend that whole sequence with level-ωω BB.  But we’re still not done!  For why not pass to level

$$ \omega^{\omega^{\omega}} $$,


$$ \omega^{\omega^{\omega^{\omega}}} $$,

etc., until we reach level

$$ \left. \omega^{\omega^{\omega^{.^{.^{.}}}}}\right\} _{\omega\text{ times}} $$?

(This last ordinal is also called ε0.)  And mathematicians know how to keep going even to way, way bigger ordinals than ε0, which give rise to ever more rapidly-growing Busy Beaver sequences.  Ordinals achieve something that on its face seems paradoxical, which is to systematize the concept of transcendence.

So then just how far can you push this?  Alas, ultimately the answer depends on which axioms you assume for mathematics.  The issue is this: once you get to sufficiently enormous ordinals, you need some systematic way to specify them, say by using computer programs.  But then the question becomes which ordinals you can “prove to exist,” by giving a computer program together with a proof that the program does what it’s supposed to do.  The more powerful the axiom system, the bigger the ordinals you can prove to exist in this way—but every axiom system will run out of gas at some point, only to be transcended, in Gödelian fashion, by a yet more powerful system that can name yet larger ordinals.

So for example, if we use Peano arithmetic—invented by the Italian mathematician Giuseppe Peano—then Gentzen proved in the 1930s that we can name any ordinals below ε0, but not ε0 itself or anything beyond it.  If we use ZF set theory, then we can name vastly bigger ordinals, but once again we’ll eventually run out of steam.

(Technical remark: some people have claimed that we can transcend this entire process by passing from first-order to second-order logic.  But I fundamentally disagree, because with second-order logic, which number you’ve named could depend on the model of set theory, and therefore be impossible to pin down.  With the ordinal Busy Beaver numbers, by contrast, the number you’ve named might be breathtakingly hopeless ever to compute—but provided the notations have been fixed, and the ordinals you refer to actually exist, at least we know there is a unique positive integer that you’re talking about.)

Anyway, the upshot of all of this is that, if you try to hold a name-the-biggest-number contest between two actual professionals who are trying to win, it will (alas) degenerate into an argument about the axioms of set theory.  For the stronger the set theory you’re allowed to assume consistent, the bigger the ordinals you can name, therefore the faster-growing the BB functions you can define, therefore the bigger the actual numbers.

So, yes, in the end the biggest-number contest just becomes another Gödelian morass, but one can get surprisingly far before that happens.

In the meantime, our universe seems to limit us to at most $10^{122}$ choices that could ever be made, or experiences that could ever be had, by any one observer.  Or fewer, if you believe that you won’t live until the heat death of the universe in some post-Singularity computer cloud, but for at most about $10^2$ years.  Meanwhile, the survival of the human race might hinge on people’s ability to understand much smaller numbers than $10^{122}$: for example, a billion, a trillion, and other numbers that characterize the exponential growth of our civilization and the limits that we’re now running up against.

On a happier note, though, if our goal is to make math engaging to young people, or to build bridges between the quantitative and literary worlds, the way this festival is doing, it seems to me that it wouldn’t hurt to let people know about the vastness that’s out there.  Thanks for your attention.

September 23, 2017

John PreskillStanding back at Stanford

T-shirt 1

This T-shirt came to mind last September. I was standing in front of a large silver-colored table littered with wires, cylinders, and tubes. Greg Bentsen was pointing at components and explaining their functions. He works in Monika Schleier-Smith’s lab, as a PhD student, at Stanford.

Monika’s group manipulates rubidium atoms. A few thousand atoms sit in one of the cylinders. That cylinder contains another cylinder, an optical cavity, that contains the atoms. A mirror caps each of the cavity’s ends. Light in the cavity bounces off the mirrors.

Light bounces off your bathroom mirror similarly. But we can describe your bathroom’s light accurately with Maxwellian electrodynamics, a theory developed during the 1800s. We describe the cavity’s light with quantum electrodynamics (QED). Hence we call the lab’s set-up cavity QED.

The light interacts with the atoms, entangling with them. The entanglement imprints information about the atoms on the light. Suppose that light escaped from the cavity. Greg and friends could measure the light, then infer about the atoms’ quantum state.

A little light leaks through the mirrors, though most light bounces off. From leaked light, you can infer about the ensemble of atoms. You can’t infer about individual atoms. For example, consider an atom’s electrons. Each electron has a quantum property called a spin. We sometimes imagine the spin as an arrow that points upward or downward. Together, the electrons’ spins form the atom’s joint spin. You can tell, from leaked light, whether one atom’s spin points upward. But you can’t tell which atom’s spin points upward. You can’t see the atoms for the ensemble.

Monika’s team can. They’ve cut a hole in their cylinder. Light escapes the cavity through the hole. The light from the hole’s left-hand edge carries information about the leftmost atom, and so on. The team develops a photograph of the line of atoms. Imagine holding a photograph of a line of people. You can point to one person, and say, “Aha! She’s the xkcd fan.” Similarly, Greg and friends can point to one atom in their photograph and say, “Aha! That atom has an upward-pointing spin.” Monika’s team is developing single-site imaging.


Aha! She’s the xkcd fan.

Monika’s team plans to image atoms in such detail, they won’t need light to leak through the mirrors. Light leakage creates problems, including by entangling the atoms with the world outside the cavity. Suppose you had to diminish the amount of light that leaks from a rubidium cavity. How should you proceed?

Tell the mirrors,

T-shirt 2

You should lengthen the cavity. Why? Imagine a photon, a particle of light, in the cavity. It zooms down the cavity’s length, hits a mirror, bounces off, retreats up the cavity’s length, hits the other mirror, and bounces off. The photon repeats this process until a mirror hit fails to generate a bounce. The mirror transmits the photon to the exterior; the photon leaks out. How can you reduce leaks? By preventing photons from hitting mirrors so often, by forcing the photons to zoom longer, by lengthening the cavity, by shifting the mirrors outward.

So Greg hinted, beside that silver-colored table in Monika’s lab. The hint struck a chord: I recognized the impulse to

T-shirt 3

The impulse had led me to Stanford.

Weeks earlier, I’d written my first paper about quantum chaos and information scrambling. I’d sat and read and calculated and read and sat and emailed and written. I needed to stand up, leave my cavity, and image my work from other perspectives.

Stanford physicists had written quantum-chaos papers I admired. So I visited, presented about my work, and talked. Patrick Hayden introduced me to a result that might help me apply my result to another problem. His group helped me simplify a mathematical expression. Monika reflected that a measurement scheme I’d proposed sounded not unreasonable for cavity QED.

And Greg led me to recognize the principle behind my visit: Sometimes, you have to

T-shirt 4

to move forward.

With gratitude to Greg, Monika, Patrick, and the rest of Monika’s and Patrick’s groups for their time, consideration, explanations, and feedback. With thanks to Patrick and Stanford’s Institute for Theoretical Physics for their hospitality.

September 22, 2017

Doug NatelsonLab question - Newport NPC3SG

Anyone out there using a Newport NPC3SG controller to drive a piezo positioning stage, with computer communication successfully talking to the NPC3SG?  If so, please leave a comment so that we can get in touch, as I have questions.

n-Category Café Schröder Paths and Reverse Bessel Polynomials

I want to show you a combinatorial interpretation of the reverse Bessel polynomials which I learnt from Alan Sokal. The sequence of reverse Bessel polynomials begins as follows.

$$ \begin{aligned} \theta_0(R)&=1\\ \theta_1(R)&=R+1\\ \theta_2(R)&=R^2+3R+3\\ \theta_3(R)&= R^3 +6R^2+15R+15 \end{aligned} $$

To give you a flavour of the combinatorial interpretation we will prove, you can see that the second reverse Bessel polynomial can be read off the following set of ‘weighted Schröder paths’: multiply the weights together on each path and add up the resulting monomials.

Schroeder paths

In this post I’ll explain how to prove the general result, using a certain result about weighted Dyck paths that I’ll also prove. At the end I’ll leave some further questions for the budding enumerative combinatorialists amongst you.

These reverse Bessel polynomials have their origins in the theory of Bessel functions, but I encountered them in the theory of magnitude, where they are key to a formula for the magnitude of an odd-dimensional ball which I have just posted on the arXiv.

In that paper I use the combinatorial expression for these Bessel polynomials to prove facts about the magnitude.

Here, to simplify things slightly, I have used the standard reverse Bessel polynomials whereas in my paper I use a minor variant (see below).

I should add that a very similar expression can be given for the ordinary, unreversed Bessel polynomials; you just need a minor modification to the way the weights on the Schröder paths are defined. I will leave that as an exercise.

The reverse Bessel polynomials

The reverse Bessel polynomials have many properties. In particular they satisfy the recursion relation

$$ \theta_{i+1}(R)=R^2\theta_{i-1}(R) + (2i+1)\theta_{i}(R) $$

and $\theta_i(R)$ satisfies the differential equation

$$ R\theta_i^{\prime\prime}(R)-2(R+i)\theta_i^\prime(R)+2i\theta_i(R)=0. $$

There’s an explicit formula:

$$ \theta_i(R) = \sum_{t=0}^i \frac{(i+t)!}{(i-t)!\, t!\, 2^t}\,R^{i-t}. $$

I’m interested in them because they appear in my formula for the magnitude of odd dimensional balls. To be more precise, in my formula I use the associated Sheffer polynomials $(\chi_i(R))_{i=0}^\infty$; they are related by $\chi_i(R)=R\,\theta_{i-1}(R)$, so the coefficients are the same, but just moved around a bit. These polynomials have a similar but slightly more complicated combinatorial interpretation.

In my paper I prove that the magnitude of the $(2p+1)$-dimensional ball of radius $R$ has the following expression:

$$ \left|B^{2p+1}_R \right| = \frac{\det[\chi_{i+j+2}(R)]_{i,j=0}^p}{(2p+1)!\, R\,\det[\chi_{i+j}(R)]_{i,j=0}^p} $$

As each polynomial $\chi_i(R)$ has a path counting interpretation, one can use the rather beautiful Lindström-Gessel-Viennot Lemma to give a path counting interpretation to the determinants in the above formula and find some explicit expression. I will probably blog about this another time. (Fellow host Qiaochu has also blogged about the LGV Lemma.)

Weighted Dyck paths

Before getting on to Bessel polynomials and weighted Schröder paths, we need to look at counting weighted Dyck paths, which are simpler and more classical.

A Dyck path is a path in the lattice $\mathbb{Z}^2$ which starts at $(0,0)$, stays in the upper half plane, ends back on the $x$-axis at $(2i,0)$ and has steps going either diagonally right and up or right and down. The integer $2i$ is called the length of the path. Let $D_i$ be the set of length $2i$ Dyck paths.

For each Dyck path $\sigma$, we will weight each edge going right and down, from $(x,y)$ to $(x+1,y-1)$, by $y$; then we will take $w(\sigma)$, the weight of $\sigma$, to be the product of all the weights on its steps. Here are all five weighted Dyck paths of length six.

Dyck paths

Famously, the number of Dyck paths of length $2i$ is given by the $i$th Catalan number; here, however, we are interested in the number of paths weighted by the weighting(!). If we sum over the weights of each of the above diagrams we get $6+4+2+2+1=15$. Note that this is $5\times 3\times 1$. This is a pattern that holds in general.

Theorem A. (Françon and Viennot) The weighted count of length $2i$ Dyck paths is equal to the double factorial of $2i-1$:

$$ \begin{aligned} \sum_{\sigma\in D_{i}} w(\sigma)&= (2i-1)\cdot (2i-3)\cdot (2i-5)\cdot \cdots\cdot 3\cdot 1 \\ &\eqqcolon (2i-1)!!. \end{aligned} $$

The following is a nice combinatorial proof of this theorem that I found in a survey paper by Callan. (I was only previously aware of a high-tech proof involving continued fractions and a theorem of Gauss.)

The first thing to note is that the weight of a Dyck path is actually counting something. It is counting the ways of labelling each of the down steps in the diagram by a positive integer less than or equal to the height (i.e. the weight) of that step. We call such a labelling a height labelling. Note that we have no choice of weighting but we often have a choice of height labelling. Here’s a height labelled Dyck path.

height labelled Dyck path

So the weighted count of Dyck paths of length $2i$ is precisely the number of height labelled Dyck paths of length $2i$:

$$ \sum_{\sigma\in D_{i}} w(\sigma) = \#\{\text{height labelled paths of length }\,2i\} $$

We are going to consider marked Dyck paths, which just means we single out a specific vertex. A path of length $2i$ has $2i+1$ vertices. Thus

$$ \#\{\text{height labelled, MARKED paths of length }\,2i\} = (2i+1)\times\#\{\text{height labelled paths of length }\,2i\}. $$

Hence the theorem will follow by induction if we find a bijection

$$ \{\text{height labelled paths of length }\,2i\} \cong \{\text{height labelled, MARKED paths of length }\,2i-2\}. $$

Such a bijection can be constructed in the following way. Given a height labelled Dyck path, remove the left-hand step and the first step that has a label of one on it. On each down step between these two deleted steps decrease the label by one. Now join the two separated parts of the path together and mark the vertex at which they are joined. Here is an example of the process.

dyck bijection

Working backwards it is easy to describe the inverse map. And so the theorem is proved.
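
Theorem A is also easy to test by machine for small $i$. The following sketch (my own, not from the post) enumerates $\pm 1$ step sequences, keeps the Dyck paths, and weights each down step by the height it starts from:

```python
from itertools import product
from math import prod

def weighted_dyck_count(i):
    """Sum of w(sigma) over Dyck paths of length 2i, where a down step
    leaving height y carries weight y."""
    total = 0
    for steps in product((1, -1), repeat=2 * i):
        height, weight = 0, 1
        for s in steps:
            if s == -1:
                if height == 0:       # path would leave the upper half plane
                    weight = 0
                    break
                weight *= height      # down step from height y has weight y
            height += s
        if weight and height == 0:    # must end back on the x-axis
            total += weight
    return total

def odd_double_factorial(i):
    """(2i-1)!! = (2i-1)(2i-3)...3.1, with the empty product equal to 1."""
    return prod(range(1, 2 * i, 2))
```

For $i=3$ this gives $15 = 5\times 3\times 1$, matching the five paths pictured above, and the counts agree with $(2i-1)!!$ for every small $i$ one cares to check.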

Schröder paths and reverse Bessel polynomials

In order to give a path theoretic interpretation of reverse Bessel polynomials we will need to use Schröder paths. These are like Dyck paths except we allow a certain kind of flat step.

A Schröder path is a path in the lattice $\mathbb{Z}^2$ which starts at $(0,0)$, stays in the upper half plane, ends back on the $x$-axis at $(2i,0)$ and has steps going either diagonally right and up, diagonally right and down, or horizontally two units to the right. The integer $2i$ is called the length of the path. Let $S_i$ be the set of all length $2i$ Schröder paths.

For each Schröder path $\sigma$, we will weight each edge going right and down, from $(x,y)$ to $(x+1,y-1)$, by $y$, and we will weight each flat edge by the indeterminate $R$. Then we will take $w(\sigma)$, the weight of $\sigma$, to be the product of all the weights on its steps.

Here is the picture of all six length four weighted Schröder paths again.

Schroeder paths

You were asked at the top of this post to check that the sum of the weights equals the second reverse Bessel polynomial. Of course that result generalizes!

The following theorem was shown to me by Alan Sokal; he proved it using continued fraction methods, but these essentially amount to the combinatorial proof I’m about to give.

Theorem B. The weighted count of length $2i$ Schröder paths is equal to the $i$th reverse Bessel polynomial:

$$ \sum_{\sigma\in S_{i}} w(\sigma)= \theta_{i}(R). $$

The idea is to observe that you can remove the flat steps from a weighted Schröder path to obtain a weighted Dyck path. If a Schröder path has length $2i$ and $t$ upward steps then it has $t$ downward steps and $i-t$ flat steps, so it has a total of $i+t$ steps. This means that there are $\binom{i+t}{i-t}$ length $2i$ Schröder paths with the same underlying length $2t$ Dyck path (we just choose where to insert the flat steps). Let’s write $S^t_i$ for the set of Schröder paths of length $2i$ with $t$ upward steps. Then

$$ \begin{aligned} \sum_{\sigma\in S_{i}} w(\sigma) &= \sum_{t=0}^{i} \sum_{\sigma\in S^t_{i}} w(\sigma) = \sum_{t=0}^{i} \binom{i+t}{i-t}\sum_{\sigma'\in D_t} w(\sigma')\,R^{i-t}\\ &= \sum_{t=0}^{i} \binom{i+t}{i-t}(2t-1)!!\,R^{i-t}\\ &= \sum_{t=0}^{i} \frac{(i+t)!}{(i-t)!\,(2t)!}\cdot\frac{(2t)!}{2^t\, t!}\,R^{i-t}\\ &= \theta_{i}(R), \end{aligned} $$

where the last equality comes from the formula for $\theta_i(R)$ given at the beginning of the post.

Thus we have the required combinatorial interpretation of the reverse Bessel polynomials.
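
As with Theorem A, Theorem B can be checked by brute force for small $i$. This sketch (my illustration) walks over all Schröder paths, tracking the power of $R$ contributed by flat steps and the integer weight contributed by down steps, and returns the polynomial as a dictionary of coefficients:

```python
def schroder_polynomial(i):
    """Sum of weights over Schröder paths of length 2i, returned as
    {power of R: coefficient}; each flat step contributes a factor R,
    and a down step leaving height y contributes a factor y."""
    coeffs = {}

    def walk(x, height, weight, r_power):
        if x == 2 * i:
            if height == 0:   # path must end back on the x-axis
                coeffs[r_power] = coeffs.get(r_power, 0) + weight
            return
        walk(x + 1, height + 1, weight, r_power)                # up step
        if height > 0:
            walk(x + 1, height - 1, weight * height, r_power)   # down step
        if x + 2 <= 2 * i:
            walk(x + 2, height, weight, r_power + 1)            # flat step

    walk(0, 0, 1, 0)
    return coeffs
```

For $i=2$ this returns {2: 1, 1: 3, 0: 3}, i.e. $R^2+3R+3=\theta_2(R)$, matching the six pictured paths.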

Further questions

The first question that springs to mind for me is whether it is possible to give a bijective proof of Theorem B, similar in style, perhaps (or perhaps not), to the proof given of Theorem A, basically using the recursion relation

$$ \theta_{i+1}(R)=R^2\theta_{i-1}(R) + (2i+1)\theta_{i}(R) $$

rather than the explicit formula for them.

The second question would be whether the differential equation

$$ R\theta_i^{\prime\prime}(R)-2(R+i)\theta_i^\prime(R)+2i\theta_i(R)=0 $$

has some sort of combinatorial interpretation in terms of paths.

I’m interested to hear if anyone has any thoughts.

n-Category Café Lattice Paths and Continued Fractions II

Last time we proved Flajolet’s Fundamental Lemma about enumerating Dyck paths. This time I want to give some examples, in particular to relate this to what I wrote previously about Dyck paths, Schröder paths and what they have to do with reverse Bessel polynomials.

We’ll see that the generating function of the sequence of reverse Bessel polynomials $(\theta_i(R))_{i=0}^\infty$ has the following continued fraction expansion.

$$ \sum_{i=0}^\infty \theta_i(R) \,t^i = \frac{1}{1-Rt- \frac{t}{1-Rt - \frac{2t}{1-Rt- \frac{3t}{1-\dots}}}} $$

I’ll even give you a snippet of SageMath code so you can have a play around with this if you like.

Flajolet’s Fundamental Lemma

Let’s just recall from last time that if we take Motzkin paths weighted by $a_i$s, $b_i$s and $c_i$s as in this example,

weighted Motzkin path

then when we sum the weightings of all Motzkin paths together we have the following continued fraction expression.

$$ \sum_{\sigma\,\,\mathrm{Motzkin}} w_{a,b,c}(\sigma) = \frac{1}{1- c_{0} - \frac{a_{1} b_{1}}{1-c_{1} - \frac{a_{2} b_{2}}{1- c_2 - \frac{a_3 b_3}{1-\dots}}}} \in\mathbb{Z}[[a_i, b_i, c_i]] $$

Jacobi continued fractions and Motzkin paths

Flajolet’s Fundamental Lemma is very beautiful, but we want a power series going up in terms of path length. So let’s use another variable $t$ to keep track of path length. All three types of step in a Motzkin path have length one. We can set $a_i=\alpha_i t$, $b_i=\beta_i t$ and $c_i=\gamma_i t$. Then $\sum_{\sigma} w_{a,b,c}(\sigma)\in \mathbb{Z}[\alpha_i, \beta_i, \gamma_i][[t]]$, and the coefficient of $t^\ell$ will be the sum of the weights of Motzkin paths of length $\ell$. This coefficient will be a polynomial (rather than a power series) as there are only finitely many paths of a given length.

$$ \sum_{\ell=0}^\infty\left(\sum_{\sigma\,\,\text{Motzkin length}\,\,\ell} w_{\alpha,\beta,\gamma}(\sigma)\right)t^\ell = \frac{1}{1- \gamma_{0}t - \frac{\alpha_{1}\beta_1 t^2}{1-\gamma_{1}t - \frac{\alpha_{2}\beta_2 t^2}{1- \gamma_2 t - \frac{\alpha_3 \beta_3 t^2}{1-\dots}}}} $$

Such a continued fraction is called a Jacobi (or J-type) continued fraction. They crop up in the study of moments of orthogonal polynomials and also in birth-death processes.

For example, I believe that Euler proved the following Jacobi continued fraction expansion of the generating function of the factorials.

$$\sum_{\ell=0}^\infty \ell!\, t^\ell = \frac{1}{1- t - \frac{t^2}{1-3t - \frac{4t^2}{1- 5t - \frac{9t^2}{1-\dots}}}}$$

We can get the right-hand side by taking $\alpha_i=\beta_i=i$ and $\gamma_i=2i+1$. Here is a Motzkin path weighted in that way.

weighted Motzkin path

The equation above is telling us that if we weight Motzkin paths in that way, then the weighted count of Motzkin paths of length $\ell$ is $\ell!$, and that deserves an exclamation mark! (You’re invited to verify this for Motzkin paths of length 4.)
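If you’d rather not check the length-4 case by hand, here is a short plain-Python sketch (separate from the SageMath code in the appendix) that enumerates Motzkin paths directly and compares the weighted count with $\ell!$. The step-by-step weighting conventions in the code are my reading of $\alpha_i=\beta_i=i$, $\gamma_i=2i+1$.

```python
from math import factorial

def motzkin_paths(length, level=0):
    """Yield every Motzkin path of the given length (as a list of
    'U'/'D'/'F' steps) that starts and ends at level 0 and never
    dips below it."""
    if length == 0:
        if level == 0:
            yield []
        return
    if level > length:   # can't get back down to level 0 in time
        return
    for step, new_level in (('U', level + 1), ('D', level - 1), ('F', level)):
        if new_level < 0:
            continue
        for rest in motzkin_paths(length - 1, new_level):
            yield [step] + rest

def weight(path):
    """Weight with alpha_i = beta_i = i and gamma_i = 2i + 1: an up step
    arriving at level h contributes h, a down step leaving level h
    contributes h, and a flat step at level h contributes 2h + 1."""
    w, h = 1, 0
    for step in path:
        if step == 'U':
            h += 1
            w *= h
        elif step == 'D':
            w *= h
            h -= 1
        else:
            w *= 2 * h + 1
    return w

for ell in range(7):
    total = sum(weight(p) for p in motzkin_paths(ell))
    assert total == factorial(ell)   # weighted Motzkin paths of length ell count ell!
```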

I’ve put some SageMath code at the bottom of this post if you want to check the continued fraction equality numerically.

Stieltjes continued fractions and Dyck paths

A Dyck path is a Motzkin path with no flat steps. So if we weight the flat steps in Motzkin paths with $0$, then the weighted count just counts the weighted Dyck paths. This means setting $\gamma_i=0$.

Also, the weight $\alpha_i$ on an up step always appears with the weight $\beta_i$ on a corresponding down step (what goes up must come down!), so we can simplify things by putting the combined weight $\alpha_i\beta_i$ (which we’ll rename as $\alpha_i$) on the down step from level $i$, and a weight of $1$ on each up step. We can call this weighting $w_\alpha$.

Putting this together we get the following, where we’ve noted that there are no Dyck paths of odd length.

$$\sum_{n=0}^\infty\left(\sum_{\sigma\,\,\text{Dyck, length}\,2n} w_\alpha(\sigma)\right)t^{2n} = \frac{1}{1- \frac{\alpha_{1} t^2}{1- \frac{\alpha_{2} t^2}{1- \frac{\alpha_3 t^2}{1-\dots}}}}$$

This kind of continued fraction is called a Stieltjes (or S-type) continued fraction. Of course, we could replace $t^2$ by $t$ in the above, without any ill effect.
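As a quick sanity check on the S-fraction, note that taking every $\alpha_i=1$ just counts Dyck paths, so the coefficient of $t^{2n}$ should be the $n$th Catalan number. Here’s a rough pure-Python sketch (truncated power series over the rationals; the depth-8 cutoff is exact for the orders shown) that you can compare with the SageMath code in the appendix.

```python
from fractions import Fraction
from math import comb

N = 12  # truncate all power series at order t^N

def mul(a, b):
    """Product of two truncated power series given as coefficient lists."""
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j in range(N - i):
                c[i + j] += ai * b[j]
    return c

def inv(a):
    """Multiplicative inverse of a truncated power series with a[0] != 0."""
    b = [Fraction(0)] * N
    b[0] = 1 / a[0]
    for n in range(1, N):
        b[n] = -sum(a[k] * b[n - k] for k in range(1, n + 1)) / a[0]
    return b

def s_fraction(alphas):
    """Truncated S-fraction 1/(1 - alphas[0] t^2/(1 - alphas[1] t^2/...)),
    cut off after len(alphas) levels; coefficients below t^(2*len) are exact,
    since deeper levels only affect higher-order terms."""
    f = [Fraction(0)] * N
    f[0] = Fraction(1)               # innermost level of the truncation
    for a in reversed(alphas):
        t2a = [Fraction(0)] * N
        t2a[2] = Fraction(a)         # the series a * t^2
        num = mul(t2a, f)
        d = [Fraction(1) - num[0]] + [-num[k] for k in range(1, N)]
        f = inv(d)                   # f = 1 / (1 - a t^2 f)
    return f

series = s_fraction([1] * 8)         # all alpha_i = 1: plain Dyck path counting
catalan = [comb(2 * n, n) // (n + 1) for n in range(6)]
assert [series[2 * n] for n in range(6)] == catalan   # 1, 1, 2, 5, 14, 42
```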

Previously we proved combinatorially that with the weighting $\alpha_i=i$ the weighted count of Dyck paths of length $2n$ was precisely $(2n-1)!!$. This means that we have proved the following continued fraction expansion of the generating function of the odd double factorials.

$$\sum_{n=0}^\infty (2n-1)!!\, t^{2n} = \frac{1}{1- \frac{t^2}{1- \frac{2t^2}{1- \frac{3t^2}{1-\dots}}}}$$

I believe this was originally proved by Gauss, but I have no idea how.
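While I can’t reproduce Gauss’s argument either, the combinatorial side is easy to check by machine. Here’s a small brute-force Python sketch confirming that Dyck paths with weight $\alpha_i=i$ on down steps have weighted count $(2n-1)!!$.

```python
def dyck_paths(length, level=0):
    """Yield every Dyck path of the given length as a list of 'U'/'D' steps."""
    if length == 0:
        if level == 0:
            yield []
        return
    if level > length:   # can't get back down to level 0 in time
        return
    for step, new_level in (('U', level + 1), ('D', level - 1)):
        if new_level < 0:
            continue
        for rest in dyck_paths(length - 1, new_level):
            yield [step] + rest

def weight(path):
    """Up steps weigh 1; a down step from level h weighs h (alpha_h = h)."""
    w, h = 1, 0
    for step in path:
        if step == 'U':
            h += 1
        else:
            w *= h
            h -= 1
    return w

def odd_double_factorial(n):
    """(2n - 1)!! = 1 * 3 * 5 * ... * (2n - 1), the empty product being 1."""
    result = 1
    for k in range(1, 2 * n, 2):
        result *= k
    return result

for n in range(6):
    total = sum(weight(p) for p in dyck_paths(2 * n))
    assert total == odd_double_factorial(n)
```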

Again there’s some SageMath code at the end for you to see this in action.

Thron continued fractions and Schröder paths

What I’m really interested in, you’ll remember, is reverse Bessel polynomials, and these are giving weighted counts of Schröder paths. Using continued fractions in this context is less standard than for Dyck paths and Motzkin paths as above, but it only requires a minor modification. I learnt about this from Alan Sokal.

The difference between Motzkin paths and Schröder paths is that the flat steps have length $2$ in Schröder paths. Remember that the power of $t$ was encoding the length, so we just have to assign $t^2$ to each flat step rather than $t$. So if we put $a_i=t$, $b_i=\alpha_i t$ and $c_i=\gamma_i t^2$ in Flajolet’s Fundamental Lemma then we get the following.

$$\sum_{n=0}^\infty\left(\sum_{\sigma\,\,\text{Schröder, length}\,2n} w_{\alpha,\gamma}(\sigma)\right)t^{2n} = \frac{1}{1- \gamma_{0}t^2 - \frac{\alpha_{1} t^2}{1-\gamma_{1}t^2 - \frac{\alpha_{2} t^2}{1- \gamma_2 t^2 - \frac{\alpha_3 t^2}{1-\dots}}}}$$

Here $w_{\alpha,\gamma}$ is the weighting where we put $\alpha_i$s on the down steps and $\gamma_i$s on the flat steps.

This kind of continued fraction is called a Thron (or T-type) continued fraction. Again, we could replace $t^2$ by $t$ in the above, without any ill effect.

We saw before that if we take the weighting $w_{rBp}$, with $\alpha_i:=i$ and $\gamma_i:=R$, such as in the following picture,

weighted Schröder path

then the weighted sum of Schröder paths of length $2n$ is precisely the $n$th reverse Bessel polynomial:

$$\theta_n(R) = \sum_{\sigma\,\,\text{Schröder, length}\,2n} w_{rBp}(\sigma).$$
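Here’s a brute-force Python sketch of that claim: it enumerates Schröder paths (flat steps counting as length 2), reads off the weighted sum as a polynomial in $R$, and compares with $\theta_n(R)$ computed from the standard reverse Bessel recurrence $\theta_n = (2n-1)\theta_{n-1} + R^2\theta_{n-2}$. The enumeration details are my own reconstruction of the weighting described above.

```python
def schroder_paths(length, level=0):
    """Yield every Schröder path of the given length as a list of steps:
    'U' and 'D' have length 1, flat steps 'F' have length 2."""
    if length == 0:
        if level == 0:
            yield []
        return
    if level > length:   # can't get back down to level 0 in time
        return
    for step, new_level, used in (('U', level + 1, 1),
                                  ('D', level - 1, 1),
                                  ('F', level, 2)):
        if new_level < 0 or used > length:
            continue
        for rest in schroder_paths(length - used, new_level):
            yield [step] + rest

def weighted_count(n):
    """Sum of w_rBp over Schröder paths of length 2n, returned as the
    list of coefficients of R^0, R^1, ..., R^n."""
    coeffs = [0] * (n + 1)
    for path in schroder_paths(2 * n):
        w, h, flats = 1, 0, 0
        for step in path:
            if step == 'U':
                h += 1
            elif step == 'D':
                w *= h        # alpha_h = h on the down step from level h
                h -= 1
            else:
                flats += 1    # each flat step contributes a factor of R
        coeffs[flats] += w
    return coeffs

def reverse_bessel(n):
    """Coefficient list of theta_n(R) (index k = coefficient of R^k),
    via theta_n = (2n - 1) theta_{n-1} + R^2 theta_{n-2}."""
    if n == 0:
        return [1]
    prev, cur = [1], [1, 1]           # theta_0 = 1, theta_1 = R + 1
    for m in range(2, n + 1):
        nxt = [0] * (m + 1)
        for k, c in enumerate(cur):
            nxt[k] += (2 * m - 1) * c
        for k, c in enumerate(prev):
            nxt[k + 2] += c
        prev, cur = cur, nxt
    return cur

for n in range(5):
    assert weighted_count(n) == reverse_bessel(n)
```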

Putting that together with the Thron continued fraction above we get the following Thron continued fraction expansion for the generating function of the reverse Bessel polynomials.

$$\sum_{n=0}^\infty \theta_n(R)\, t^n = \frac{1}{1-Rt- \frac{t}{1-Rt - \frac{2t}{1-Rt- \frac{3t}{1-\dots}}}}$$

This expression is given by Paul Barry, without any reference, in the formulas section of the corresponding entry in the On-Line Encyclopedia of Integer Sequences.

See the end of the post for some SageMath code to check this numerically.

In my recent magnitude paper I actually work backwards: I start with the continued fraction expansion as a given, and use Flajolet’s Fundamental Lemma to give the Schröder path interpretation of the reverse Bessel polynomials. Of course, I now know that I can bypass the use of continued fractions completely, and have a purely combinatorial proof of this interpretation. Regardless of that, however, the theory of lattice paths and continued fractions remains beautiful.

Appendix: Some SageMath code

It’s quite easy to play around with these continued fractions in SageMath, at least to some finite order. I thought I’d let you have some code to get you started…

Here’s some SageMath code for you to check the Jacobi continued fraction expansion of the generating function of the factorials.

# T = Z[t]
T.<t> = PolynomialRing(ZZ)
# We'll take the truncated continued fraction to be in the 
# ring of rational functions, P = Z(t)
P = Frac(T)

def j_ctd_frac(alphas, gammas):
    # base case: cut the continued fraction off here
    if alphas == [] or gammas == []:
        return 1
    return P(1/(1 - gammas[0]*t - alphas[0]*t^2 *
                j_ctd_frac(alphas[1:], gammas[1:])))

cf(t) = j_ctd_frac([1, 4, 9, 16, 25, 36], [1, 3, 5, 7, 9, 11]) 
print(cf(t).series(t, 10))

The above code can be used to define a Stieltjes continued fraction and check out the expansion of Gauss on the odd double factorials.

def s_ctd_frac(alphas):
    gammas = [0]*len(alphas)
    return j_ctd_frac(alphas, gammas)

cf(t) = s_ctd_frac([1, 2, 3, 4, 5, 6])
print(cf(t).series(t, 13))

Here’s the code for getting the reverse Bessel polynomials from a Thron continued fraction.

S.<R> = PolynomialRing(ZZ)
T.<t> = PowerSeriesRing(S)

def t_ctd_frac(alphas, gammas):
    # base case: cut the continued fraction off here
    if alphas == [] or gammas == []:
        return 1
    return (1/(1 - gammas[0]*t^2 - alphas[0]*t^2 *
               t_ctd_frac(alphas[1:], gammas[1:])))

print(T(t_ctd_frac([1, 2, 3, 4, 5, 6], [R, R, R, R, R, R])))

Backreaction: The Quantum Quartet

I made some drawings recently. For no particular purpose, really, other than to distract myself. And here is the joker:

September 21, 2017

John Baez: Applied Category Theory at UCR (Part 2)

I’m running a special session on applied category theory, and now the program is available:

Applied category theory, Fall Western Sectional Meeting of the AMS, 4-5 November 2017, U.C. Riverside.

This is going to be fun.

My former student Brendan Fong is now working with David Spivak at MIT, and they’re both coming. My collaborator John Foley at Metron is also coming: we’re working on the CASCADE project for designing networked systems.

Dmitry Vagner is coming from Duke: he wrote a paper with David and Eugene Lerman on operads and open dynamical systems. Christina Vasilakopoulou, who has worked with David and Patrick Schultz on dynamical systems, has just joined our group at UCR, so she will also be here. And the three of them have worked with Ryan Wisnesky on algebraic databases. Ryan will not be here, but his colleague Peter Gates will: together with David they have a startup called Categorical Informatics, which uses category theory to build sophisticated databases.

That’s not everyone—for example, most of my students will be speaking at this special session, and other people too—but that gives you a rough sense of some people involved. The conference is on a weekend, but John Foley and David Spivak and Brendan Fong and Dmitry Vagner are staying on for longer, so we’ll have some long conversations… and Brendan will explain decorated corelations in my Tuesday afternoon network theory seminar.

Here’s the program. Click on talk titles to see abstracts. For a multi-author talk, the person with the asterisk after their name is doing the talking. All the talks will be in Room 268 of the Highlander Union Building or ‘HUB’.

Saturday November 4, 2017, 9:00 a.m.-10:50 a.m.

9:00 a.m.
A higher-order temporal logic for dynamical systems.
David I. Spivak, MIT

10:00 a.m.
Algebras of open dynamical systems on the operad of wiring diagrams.
Dmitry Vagner*, Duke University
David I. Spivak, MIT
Eugene Lerman, University of Illinois at Urbana-Champaign

10:30 a.m.
Abstract dynamical systems.
Christina Vasilakopoulou*, University of California, Riverside
David Spivak, MIT
Patrick Schultz, MIT

Saturday November 4, 2017, 3:00 p.m.-5:50 p.m.

3:00 p.m.
Black boxes and decorated corelations.
Brendan Fong, MIT

4:00 p.m.
Compositional modelling of open reaction networks.
Blake S. Pollard*, University of California, Riverside
John C. Baez, University of California, Riverside

4:30 p.m.
A bicategory of coarse-grained Markov processes.
Kenny Courser, University of California, Riverside

5:00 p.m.
A bicategorical syntax for pure state qubit quantum mechanics.
Daniel M. Cicala, University of California, Riverside

5:30 p.m.
Open systems in classical mechanics.
Adam Yassine, University of California Riverside

Sunday November 5, 2017, 9:00 a.m.-10:50 a.m.

9:00 a.m.
Controllability and observability: diagrams and duality.
Jason Erbele, Victor Valley College

9:30 a.m.
Frobenius monoids, weak bimonoids, and corelations.
Brandon Coya, University of California, Riverside

10:00 a.m.
Compositional design and tasking of networks.
John D. Foley*, Metron, Inc.
John C. Baez, University of California, Riverside
Joseph Moeller, University of California, Riverside
Blake S. Pollard, University of California, Riverside

10:30 a.m.
Operads for modeling networks.
Joseph Moeller*, University of California, Riverside
John Foley, Metron Inc.
John C. Baez, University of California, Riverside
Blake S. Pollard, University of California, Riverside

Sunday November 5, 2017, 2:00 p.m.-4:50 p.m.

2:00 p.m.
Reeb graph smoothing via cosheaves.
Vin de Silva, Department of Mathematics, Pomona College

3:00 p.m.
Knowledge representation in bicategories of relations.
Evan Patterson*, Stanford University, Statistics Department

3:30 p.m.
The multiresolution analysis of flow graphs.
Steve Huntsman*, BAE Systems

4:00 p.m.
Categorical logic as a foundation for reasoning under uncertainty.
Ralph L. Wojtowicz*, Shepherd University

4:30 p.m.
Data modeling and integration using the open source tool Algebraic Query Language (AQL).
Peter Y. Gates*, Categorical Informatics
Ryan Wisnesky, Categorical Informatics

n-Category Café: Applied Category Theory 2018

We’re having a conference on applied category theory!

The plenary speakers will be:

  • Samson Abramsky (Oxford)
  • John Baez (UC Riverside)
  • Kathryn Hess (EPFL)
  • Mehrnoosh Sadrzadeh (Queen Mary)
  • David Spivak (MIT)

There will be a lot more to say as this progresses, but for now let me just quote from the conference website.

Applied Category Theory (ACT 2018) is a five-day workshop on applied category theory running from April 30 to May 4 at the Lorentz Center in Leiden, the Netherlands.

Towards an integrative science: in this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one scientific discipline can be reused in another. The aim of the workshop is to (1) explore the use of category theory within and across different disciplines, (2) create a more cohesive and collaborative ACT community, especially among early-stage researchers, and (3) accelerate research by outlining common goals and open problems for the field.

While the workshop will host talks on a wide range of applications of category theory, there will be four special tracks on exciting new developments in the field:

  1. Dynamical systems and networks
  2. Systems biology
  3. Cognition and AI
  4. Causality

Accompanying the workshop will be an Adjoint Research School for early-career researchers. This will comprise a 16 week online seminar, followed by a 4 day research meeting at the Lorentz Center in the week prior to ACT 2018. Applications to the school will open prior to October 1, and are due November 1. Admissions will be notified by November 15.

The organizers

Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford)

We welcome any feedback! Please send comments to this link.

About Applied Category Theory

Category theory is a branch of mathematics originally developed to transport ideas from one branch of mathematics to another, e.g. from topology to algebra. Applied category theory refers to efforts to transport the ideas of category theory from mathematics to other disciplines in science, engineering, and industry.

This site originated from discussions at the Computational Category Theory Workshop at NIST on Sept. 28-29, 2015. It serves to collect and disseminate research, resources, and tools for the development of applied category theory, and hosts a blog for those involved in its study.

The Proposal: Towards an Integrative Science

Category theory was developed in the 1940s to translate ideas from one field of mathematics, e.g. topology, to another field of mathematics, e.g. algebra. More recently, category theory has become an unexpectedly useful and economical tool for modeling a range of different disciplines, including programming language theory [10], quantum mechanics [2], systems biology [12], complex networks [5], database theory [7], and dynamical systems [14].

A category consists of a collection of objects together with a collection of maps between those objects, satisfying certain rules. Topologists and geometers use category theory to describe the passage from one mathematical structure to another, while category theorists are also interested in categories for their own sake. In computer science and physics, many types of categories (e.g. topoi or monoidal categories) are used to give a formal semantics of domain-specific phenomena (e.g. automata [3], or regular languages [11], or quantum protocols [2]). In the applied category theory community, a long-articulated vision understands categories as mathematical workspaces for the experimental sciences, similar to how they are used in topology and geometry [13]. This has proved true in certain fields, including computer science and mathematical physics, and we believe that these results can be extended in an exciting direction: we believe that category theory has the potential to bridge specific different fields, and moreover that developments in such fields (e.g. automata) can be transferred successfully into other fields (e.g. systems biology) through category theory. Already, for example, the categorical modeling of quantum processes has helped solve an important open problem in natural language processing [9].

In this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one discipline can be reused in another. Tangibly and in the short-term, we will bring together people from different disciplines in order to write an expository survey paper that grounds the varied research in applied category theory and lays out the parameters of the research program.

In formulating this research program, we are motivated by recent successes where category theory was used to model a wide range of phenomena across many disciplines, e.g. open dynamical systems (including open Markov processes and open chemical reaction networks), entropy and relative entropy [6], and descriptions of computer hardware [8]. Several talks will address some of these new developments. But we are also motivated by an open problem in applied category theory, one which was observed at the most recent workshop in applied category theory (Dagstuhl, Germany, in 2015): “a weakness of semantics/CT is that the definitions play a key role. Having the right definitions makes the theorems trivial, which is the opposite of hard subjects where they have combinatorial proofs of theorems (and simple definitions). […] In general, the audience agrees that people see category theorists only as reconstructing the things they knew already, and that is a disadvantage, because we do not give them a good reason to care enough” [1, pg. 61].

In this workshop, we wish to articulate a natural response to the above: instead of treating the reconstruction as a weakness, we should treat the use of categorical concepts as a natural part of transferring and integrating knowledge across disciplines. The restructuring employed in applied category theory cuts through jargon, helping to elucidate common themes across disciplines. Indeed, the drive for a common language and comparison of similar structures in algebra and topology is what led to the development of category theory in the first place, and recent hints show that this approach is not only useful between mathematical disciplines, but between scientific ones as well. For example, the ‘Rosetta Stone’ of Baez and Stay demonstrates how symmetric monoidal closed categories capture the common structure between logic, computation, and physics [4].

[1] Samson Abramsky, John C. Baez, Fabio Gadducci, and Viktor Winschel. Categorical methods at the crossroads. Report from Dagstuhl Perspectives Workshop 14182, 2014.

[2] Samson Abramsky and Bob Coecke. A categorical semantics of quantum protocols. In Handbook of Quantum Logic and Quantum Structures. Elsevier, Amsterdam, 2009.

[3] Michael A. Arbib and Ernest G. Manes. A categorist’s view of automata and systems. In Ernest G. Manes, editor, Category Theory Applied to Computation and Control. Springer, Berlin, 2005.

[4] John C. Baez and Mike Stay. Physics, topology, logic and computation: a Rosetta stone. In Bob Coecke, editor, New Structures for Physics. Springer, Berlin, 2011.

[5] John C. Baez and Brendan Fong. A compositional framework for passive linear networks. arXiv e-prints, 2015.

[6] John C. Baez, Tobias Fritz, and Tom Leinster. A characterization of entropy in terms of information loss. Entropy, 13(11):1945-1957, 2011.

[7] Michael Fleming, Ryan Gunther, and Robert Rosebrugh. A database of categories. Journal of Symbolic Computing, 35(2):127-135, 2003.

[8] Dan R. Ghica and Achim Jung. Categorical semantics of digital circuits. In Ruzica Piskac and Muralidhar Talupur, editors, Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design. Springer, Berlin, 2016.

[9] Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Stephen Pulman, and Bob Coecke. Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Logic and Algebraic Structures in Quantum Computing and Information. Cambridge University Press, Cambridge, 2013.

[10] Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55-92, 1991.

[11] Nicholas Pippenger. Regular languages and Stone duality. Theory of Computing Systems 30(2):121-134, 1997.

[12] Robert Rosen. The representation of biological systems from the standpoint of the theory of categories. Bulletin of Mathematical Biophysics, 20(4):317-341, 1958.

[13] David I. Spivak. Category Theory for Scientists. MIT Press, Cambridge MA, 2014.

[14] David I. Spivak, Christina Vasilakopoulou, and Patrick Schultz. Dynamical systems and sheaves. arXiv e-prints, 2016.

John BaezApplied Category Theory 2018

There will be a conference on applied category theory!

Applied Category Theory (ACT 2018). School 23–27 April 2018 and conference 30 April–4 May 2018 at the Lorentz Center in Leiden, the Netherlands. Organized by Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford).

The plenary speakers will be:

• Samson Abramsky (Oxford)
• John Baez (UC Riverside)
• Kathryn Hess (EPFL)
• Mehrnoosh Sadrzadeh (Queen Mary)
• David Spivak (MIT)

There will be a lot more to say as this progresses, but for now let me just quote from the conference website:

Applied Category Theory (ACT 2018) is a five-day workshop on applied category theory running from April 30 to May 4 at the Lorentz Center in Leiden, the Netherlands.

Towards an Integrative Science: in this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one scientific discipline can be reused in another. The aim of the workshop is to (1) explore the use of category theory within and across different disciplines, (2) create a more cohesive and collaborative ACT community, especially among early-stage researchers, and (3) accelerate research by outlining common goals and open problems for the field.

While the workshop will host discussions on a wide range of applications of category theory, there will be four special tracks on exciting new developments in the field:

1. Dynamical systems and networks
2. Systems biology
3. Cognition and AI
4. Causality

Accompanying the workshop will be an Adjoint Research School for early-career researchers. This will comprise a 16 week online seminar, followed by a 4 day research meeting at the Lorentz Center in the week prior to ACT 2018. Applications to the school will open prior to October 1, and are due November 1. Admissions will be notified by November 15.

The organizers

Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford)

We welcome any feedback! Please send comments to this link.

About Applied Category Theory

Category theory is a branch of mathematics originally developed to transport ideas from one branch of mathematics to another, e.g. from topology to algebra. Applied category theory refers to efforts to transport the ideas of category theory from mathematics to other disciplines in science, engineering, and industry.

This site originated from discussions at the Computational Category Theory Workshop at NIST on Sept. 28-29, 2015. It serves to collect and disseminate research, resources, and tools for the development of applied category theory, and hosts a blog for those involved in its study.

The proposal: Towards an Integrative Science

Category theory was developed in the 1940s to translate ideas from one field of mathematics, e.g. topology, to another field of mathematics, e.g. algebra. More recently, category theory has become an unexpectedly useful and economical tool for modeling a range of different disciplines, including programming language theory [10], quantum mechanics [2], systems biology [12], complex networks [5], database theory [7], and dynamical systems [14].

A category consists of a collection of objects together with a collection of maps between those objects, satisfying certain rules. Topologists and geometers use category theory to describe the passage from one mathematical structure to another, while category theorists are also interested in categories for their own sake. In computer science and physics, many types of categories (e.g. topoi or monoidal categories) are used to give a formal semantics of domain-specific phenomena (e.g. automata [3], or regular languages [11], or quantum protocols [2]). In the applied category theory community, a long-articulated vision understands categories as mathematical workspaces for the experimental sciences, similar to how they are used in topology and geometry [13]. This has proved true in certain fields, including computer science and mathematical physics, and we believe that these results can be extended in an exciting direction: we believe that category theory has the potential to bridge specific different fields, and moreover that developments in such fields (e.g. automata) can be transferred successfully into other fields (e.g. systems biology) through category theory. Already, for example, the categorical modeling of quantum processes has helped solve an important open problem in natural language processing [9].

In this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one discipline can be reused in another. Tangibly and in the short-term, we will bring together people from different disciplines in order to write an expository survey paper that grounds the varied research in applied category theory and lays out the parameters of the research program.

In formulating this research program, we are motivated by recent successes where category theory was used to model a wide range of phenomena across many disciplines, e.g. open dynamical systems (including open Markov processes and open chemical reaction networks), entropy and relative entropy [6], and descriptions of computer hardware [8]. Several talks will address some of these new developments. But we are also motivated by an open problem in applied category theory, one which was observed at the most recent workshop in applied category theory (Dagstuhl, Germany, in 2015): “a weakness of semantics/CT is that the definitions play a key role. Having the right definitions makes the theorems trivial, which is the opposite of hard subjects where they have combinatorial proofs of theorems (and simple definitions). […] In general, the audience agrees that people see category theorists only as reconstructing the things they knew already, and that is a disadvantage, because we do not give them a good reason to care enough” [1, pg. 61].

In this workshop, we wish to articulate a natural response to the above: instead of treating the reconstruction as a weakness, we should treat the use of categorical concepts as a natural part of transferring and integrating knowledge across disciplines. The restructuring employed in applied category theory cuts through jargon, helping to elucidate common themes across disciplines. Indeed, the drive for a common language and comparison of similar structures in algebra and topology is what led to the development category theory in the first place, and recent hints show that this approach is not only useful between mathematical disciplines, but between scientific ones as well. For example, the ‘Rosetta Stone’ of Baez and Stay demonstrates how symmetric monoidal closed categories capture the common structure between logic, computation, and physics [4].

[1] Samson Abramsky, John C. Baez, Fabio Gadducci, and Viktor Winschel. Categorical methods at the crossroads. Report from Dagstuhl Perspectives Workshop 14182, 2014.

[2] Samson Abramsky and Bob Coecke. A categorical semantics of quantum protocols. In Handbook of Quantum Logic and Quantum Structures. Elsevier, Amsterdam, 2009.

[3] Michael A. Arbib and Ernest G. Manes. A categorist’s view of automata and systems. In Ernest G. Manes, editor, Category Theory Applied to Computation and Control. Springer, Berlin, 2005.

[4] John C. Baez. Physics, topology, logic and computation: a Rosetta stone. In Bob Coecke, editor, New Structures for Physics. Springer, Berlin, 2011.

[5] John C. Baez and Brendan Fong. A compositional framework for passive linear networks. arXiv e-prints, 2015.

[6] John C. Baez, Tobias Fritz, and Tom Leinster. A characterization of entropy in terms of information loss. Entropy, 13(11):1945–1957, 2011.

[7] Michael Fleming, Ryan Gunther, and Robert Rosebrugh. A database of categories. Journal of Symbolic Computation, 35(2):127–135, 2003.

[8] Dan R. Ghica and Achim Jung. Categorical semantics of digital circuits. In Ruzica Piskac and Muralidhar Talupur, editors, Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design. Springer, Berlin, 2016.

[9] Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Stephen Pulman, and Bob Coecke. Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Logic and Algebraic Structures in Quantum Computing and Information. Cambridge University Press, Cambridge, 2013.

[10] Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55–92, 1991.

[11] Nicholas Pippenger. Regular languages and Stone duality. Theory of Computing Systems, 30(2):121–134, 1997.

[12] Robert Rosen. The representation of biological systems from the standpoint of the theory of categories. Bulletin of Mathematical Biophysics, 20(4):317–341, 1958.

[13] David I. Spivak. Category Theory for Scientists. MIT Press, Cambridge MA, 2014.

[14] David I. Spivak, Christina Vasilakopoulou, and Patrick Schultz. Dynamical systems and sheaves. arXiv e-prints, 2016.

September 19, 2017

Tim GowersTwo infinities that are surprisingly equal

It has been in the news recently — or rather, the small corner of the news that is of particular interest to mathematicians — that Maryanthe Malliaris and Saharon Shelah had an unexpected breakthrough: they stumbled on a proof that two infinities, long conjectured and widely believed to be distinct, are in fact equal. Or rather, since both lie between the cardinality of the natural numbers and the cardinality of the reals, they were widely believed to be distinct in some models of set theory where the continuum hypothesis fails.

A couple of days ago, John Baez was sufficiently irritated by a Quanta article on this development that he wrote a post on Google Plus in which he did a much better job of explaining what was going on. As a result of reading that, and following and participating in the ensuing discussion, I have got interested in the problem. In particular, as a complete non-expert, I am struck that a problem that looks purely combinatorial (though infinitary) should, according to Quanta, have a solution that involves highly non-trivial arguments in proof theory and model theory. It makes me wonder, again as a complete non-expert so probably very naively, whether there is a simpler purely combinatorial argument that the set theorists missed because they believed too strongly that the two infinities were different.

I certainly haven’t found such an argument, but I thought it might be worth at least setting out the problem, in case it appeals to anyone, and giving a few preliminary thoughts about it. I’m not expecting much from this, but if there’s a small chance that it leads to a fruitful mathematical discussion, then it’s worth doing. As I said above, I am indebted to John Baez and to several commenters on his post for being able to write much of what I write in this post, as can easily be checked if you read that discussion as well.

A few definitions and a statement of the result

The problem concerns the structure you obtain when you take the power set of the natural numbers and quotient out by the relation “has a finite symmetric difference with”. That is, we regard two sets A and B as equivalent if you can turn A into B by removing finitely many elements and adding finitely many other elements.

It’s easy to check that this is an equivalence relation. We can also define a number of the usual set-theoretic operations. For example, writing [A] for the equivalence class of A, we can set [A]\cap[B] to be [A\cap B], [A]\cup [B] to be [A\cup B], [A]^c to be [A^c], etc. It is easy to check that these operations are well-defined.

What about the subset relation? That too has an obvious definition. We don’t want to say that [A]\subset[B] if A\subset B, since that is not well-defined. However, we can define A to be almost contained in B if the set A\setminus B is finite, and then say that [A]\subset[B] if A is almost contained in B. This is well-defined and it’s also easy to check that it is true if and only if [A]\cap[B]=[A], which is the sort of thing we’d like to happen if our finite-fuzz set theory is to resemble normal set theory as closely as possible.
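These definitions are concrete enough to play with in code. Below is a minimal Python sketch (my own illustration, not part of the post): subsets of \mathbb N are modelled as predicates, and almost-containment is checked heuristically over a finite window. The cutoffs are arbitrary demo values, and of course no finite check can genuinely decide a statement about infinite sets.

```python
# Heuristic sketch: model a subset of the naturals as a predicate and
# declare A almost contained in B if every witness of A \ B found in a
# large inspection window already lies below a fixed finite cutoff.
# The cutoff and bound are arbitrary demo values.

def almost_contained(A, B, cutoff=100, bound=10_000):
    """Heuristically check that A \\ B is finite."""
    return all(not (A(n) and not B(n)) for n in range(cutoff, bound))

evens = lambda n: n % 2 == 0
evens_plus_junk = lambda n: n % 2 == 0 or n in {3, 7, 11}  # finite symmetric difference

assert almost_contained(evens_plus_junk, evens)      # the junk is finite
assert almost_contained(evens, evens_plus_junk)      # genuine containment
assert not almost_contained(lambda n: True, evens)   # infinitely many odd witnesses
```

In this model, [evens] and [evens_plus_junk] are the same f-set: each is almost contained in the other.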

I will use a non-standard piece of terminology and refer to an equivalence class of sets as an f-set, the “f” standing for “finite” or “fuzzy” (though these fuzzy sets are not to be confused with the usual definition of fuzzy sets, which I don’t know and probably never will know). I’ll also say things like “is f-contained in” (which means the same as “is almost contained in” except that it refers to the f-sets rather than to representatives of their equivalence classes).

So far so good, but things start to get a bit less satisfactory when we consider infinite intersections and unions. How are we to define \bigcap_{n=1}^\infty[A_n], for example?

An obvious property we would like is that the intersection should be the largest f-set that is contained in all the [A_n]. However, simple examples show that there doesn’t have to be a largest f-set contained in all the [A_n]. Indeed, let A_1\supset A_2\supset\dots be an infinite sequence of subsets of \mathbb N such that A_n\setminus A_{n+1} is infinite for every n. Then a set A is almost contained in every A_n if and only if A\setminus A_n is finite for every n. Given any such set A, we can find for each n an element b_n of A_n\setminus A_{n+1} that is not contained in A (since A_n\setminus A_{n+1} is infinite but A\setminus A_{n+1} is finite). Then the set B=A\cup\{b_1,b_2,\dots\} is also almost contained in every A_n, and [A] is properly contained in [B] (in the obvious sense).

OK, we don’t seem to have a satisfactory definition of infinite intersections, but we could at least hope for a satisfactory definition of “has an empty intersection”. And indeed, there is an obvious one. Given a collection of f-sets [A_\gamma], we say that its intersection is empty if the only f-set that is f-contained in every [A_\gamma] is [\emptyset]. (Note that [\emptyset] is the equivalence class of the empty set, which consists of all finite subsets of \mathbb N.) In terms of the sets rather than their equivalence classes, this is saying that there is no infinite set that is almost contained in every A_\gamma.

An important concept that appears in many places in mathematics, but particularly in set theory, is the finite-intersection property. A collection \mathcal A of subsets of a set X is said to have this property if A_1\cap\dots\cap A_n is non-empty whenever A_1,\dots,A_n\in\mathcal A. This definition carries over to f-sets with no problem at all, since finite f-intersections were easy to define.

Let’s ask ourselves a little question here: can we find a collection of f-sets with the finite-intersection property but with an empty intersection? That is, no finite intersection is empty, but the intersection of all the f-sets is empty.

That should be pretty easy. For sets, there are very simple examples like A_n=\{n,n+1,\dots\} — finitely many of those have a non-empty intersection, but there is no set that’s contained in all of them.

Unfortunately, all those sets are the same if we turn them into f-sets. But there is an obvious way of adjusting the example: we just take sets A_1\supset A_2\supset\dots such that A_n\setminus A_{n+1} is infinite for each n and \bigcap_{n=1}^\infty A_n=\emptyset. That ought to do the job once we turn each A_n into its equivalence class [A_n].

Except that it doesn’t do the job. In fact, we’ve already observed that we can just pick a set B=\{b_1,b_2,\dots\} with b_n\in A_n\setminus A_{n+1} and then [B] will be a non-empty f-intersection of the A_n.

However, here’s an example that does work. We’ll take all f-sets [A] such that A has density 1. (This means that n^{-1}|A\cap\{1,2,\dots,n\}| tends to 1 as n tends to infinity.) Since the intersection of any two sets of density 1 has density 1 (a simple exercise), this collection of f-sets has the finite-intersection property. I claim that any f-set contained in all these f-sets must be [\emptyset].

Indeed, let B be an infinite set and (b_1,b_2,\dots) the enumeration of its elements in increasing order. We can pick a subsequence (c_1,c_2,\dots) such that c_n\geq 2^n for every n, and the corresponding subset C is an infinite subset of B with density zero. Therefore, \mathbb N\setminus C is a set of density 1 that does not almost contain B.
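The density-zero subsequence in that argument is easy to see numerically. The following sketch (mine, not from the post, with B taken to be all of \mathbb N for simplicity) keeps only elements that have at least doubled since the last kept one, so the n-th kept element is at least 2^{n-1}, and confirms that the resulting C has tiny density in a large window.

```python
# Sketch of the density-zero subsequence trick: from an (enumerated) set B,
# keep an element only once it reaches a threshold that doubles after each
# kept element, forcing geometric growth of the kept subsequence C.

def density_upto(S, N):
    """Proportion of {1, ..., N} lying in S."""
    return sum(1 for n in range(1, N + 1) if n in S) / N

B = range(1, 10 ** 6)        # stand-in for an infinite set, truncated for the demo
C = set()
threshold = 1
for b in B:
    if b >= threshold:       # n-th kept element is >= 2^(n-1)
        C.add(b)
        threshold = 2 * b

assert density_upto(C, 10 ** 5) < 0.001   # C has (empirically) negligible density
```

So \mathbb N\setminus C has density 1 yet fails to almost contain B, exactly as in the proof.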

The number of f-sets we took there in order to achieve an f-empty intersection was huge: the cardinality of the continuum. (That’s another easy exercise.) Did we really need that many? This innocent question leads straight to a definition that is needed in order to understand what Malliaris and Shelah did.

Definition. The cardinal p is the smallest cardinality of a collection F of f-sets such that F has the finite-intersection property but also F has an empty f-intersection.

It is simple to prove that this cardinal is uncountable, but it is also known not to be as big as the cardinality of the continuum (where again this means that there are models of set theory — necessarily ones where CH fails — for which it is strictly smaller). So it is a rather nice intermediate cardinal, which partially explains its interest to set theorists.

The cardinal p is one of the two infinities that Malliaris and Shelah proved were the same. The other one is closely related. Define a tower to be a collection of f-sets that does not contain [\emptyset] and is totally ordered by inclusion. Note that a tower T trivially satisfies the finite-intersection property: if [A_1],\dots,[A_n] belong to T, then the smallest of the f-sets [A_i] is the f-intersection and it isn’t f-empty. So let’s make another definition.

Definition. The cardinal t is the smallest cardinality of a tower T that has an empty f-intersection.

Since a tower has the finite-intersection property, we are asking for something strictly stronger than before, so strictly harder to obtain. It follows that t is at least as large as p.

And now we have the obvious question: is the inequality strict? As I have said, it was widely believed that it was, and a big surprise when Malliaris and Shelah proved that the two infinities were in fact equal.

What does this actually say? It says that if you can find a bunch F of f-sets with the finite-intersection property and an empty f-intersection, then you can find a totally ordered example T that has at most the cardinality of F.

Why is the problem hard?

I don’t have a sophisticated answer to this that would explain why it is hard to experts in set theory. I just want to think about why it might be hard to prove the statement using a naive approach.

An immediate indication that things might be difficult is that it isn’t terribly easy to give any example of a tower with an empty f-intersection, let alone one with small cardinality.

An indication of the problem we face was already present when I gave a failed attempt to construct a system of sets with the finite-intersection property and empty intersection. I took a nested sequence [A_1]\supset[A_2]\supset\dots such that the sets A_n had empty intersection, but that didn’t work because I could pick an element from each A_n\setminus A_{n+1} and put those together to make a non-empty f-intersection. (I’m using “f-intersection” to mean any f-set f-contained in all the given f-sets. In general, we can’t choose a largest one, so it’s far from unique. The usual terminology would be to say that if A is almost contained in every set from a collection of sets, then A is a pseudointersection of that collection. But I’m trying to express as much as possible in terms of f-sets.)

Anyone who is familiar with ordinal hierarchies will see that there is an obvious thing we could do here. We could start as above, and then when we find the annoying f-intersection we simply add it to the tower and call it [A_\omega]. And then inside [A_\omega] we can find another nested decreasing sequence of sets and call those [A_{\omega+1}], [A_{\omega+2}],\dots and so on. Those will also have a non-empty f-intersection, which we could call [A_{2\omega}], and so on.

Let’s use this idea to prove that there do exist towers with empty f-intersections. I shall build a collection of non-empty f-sets [A_\alpha] by transfinite induction. If I have already built [A_\alpha], I let [A_{\alpha+1}] be any non-empty f-set that is strictly f-contained in [A_\alpha]. That tells me how to build my sets at successor ordinals. If \alpha is a limit ordinal, then I’ll take A_\alpha to be a non-empty f-intersection of all the [A_\beta] with \beta<\alpha.

But how am I so sure that such an f-intersection exists? I’m not, but if it doesn’t exist, then I’m very happy, as that means that the f-sets [A_\beta] with \beta<\alpha form a tower with empty f-intersection.

Since all the f-sets in this tower are distinct, the process has to terminate at some point, and that implies that a tower with empty f-intersection must exist.

For a lot of ordinal constructions like this, one can show that the process terminates at the first uncountable ordinal, \omega_1. To set theorists, this has extremely small cardinality — by definition, the smallest one after the cardinality of the natural numbers. In some models of set theory, there will be a dizzying array of cardinals between this and the cardinality of the continuum.

In our case it is not too hard to prove that the process doesn’t terminate before we get to the first uncountable ordinal. Indeed, if \alpha is a countable limit ordinal, then we can take an increasing sequence of ordinals \alpha_n that tend to \alpha, pick an element b_n from A_{\alpha_n}\setminus A_{\alpha_{n+1}}, and define A_\alpha to be \{b_1,b_2,\dots\}.

However, there doesn’t seem to be any obvious argument to say that the f-sets [A_\alpha] with \alpha<\omega_1 have an empty f-intersection, even if we make some effort to keep our sets small (for example, by defining A_{\alpha+1} to consist of every other element of A_\alpha). In fact, we sort of know that there won’t be such an argument, because if there were, then it would show that there was a tower whose cardinality was that of the first uncountable ordinal. That would prove that t had this cardinality, and since p is uncountable (that is easy to check) we would immediately know that p and t were equal.

So that’s already an indication that something subtle is going on that you need to be a proper set theorist to understand properly.

But do we need to understand these funny cardinalities to solve the problem? We don’t need to know what they are — just to prove that they are the same. Perhaps that can still be done in a naive way.

So here’s a very naive idea. Let’s take a set F of f-sets with the finite intersection property and empty f-intersection, and let’s try to build a tower T with empty intersection using only sets from F. This would certainly be sufficient for showing that T has cardinality at most that of F, and if F has minimal cardinality it would show that p=t.

There’s almost no chance that this will work, but let’s at least see where it goes wrong, or runs into a brick wall.

At first things go swimmingly. Let [A]\in F. Then there must exist an f-set [A']\in F that does not f-contain [A], since otherwise [A] itself would be a non-empty f-intersection for F. But then [A]\cap [A'] is a proper f-subset of [A], and by the finite-intersection property it is not f-empty.

By iterating this argument, we can therefore obtain a nested sequence [A_1]\supset[A_2]\supset\dots of f-sets in F.

The next thing we’d like to do is create [A_\omega]. And this, unsurprisingly, is where the brick wall is. Consider, for example, the case where F consists of all sets of density 1. What if we stupidly chose A_n in such a way that \min A_n\geq 2^n for every n? Then our diagonal procedure — picking an element from each set A_n\setminus A_{n+1} — would yield a set of density zero. Of course, we could go for a different diagonal procedure. We would need to prove that for this particular F and any nested sequence we can always find an f-intersection that belongs to F. That’s equivalent to saying that for any nested sequence A_1\supset A_2\supset\dots of sets of density 1 we can find a set A such that A\setminus A_n is finite for every n and A has density 1.
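For what it’s worth, here is a computational sketch of that last statement (my own construction, not from the post): choose boundaries N_1 < N_2 < \dots and let A agree with A_k on the block [N_k, N_{k+1}). Above N_n the set A lies inside some A_k with k\geq n, hence inside A_n, so A\setminus A_n is finite; and if the blocks are long enough, A keeps density 1. The particular nested density-1 sets used below (naturals that are not perfect powers of low exponent) and the block boundaries are arbitrary choices for the demo.

```python
# Demo of the block-gluing construction of a density-1 pseudointersection.
# A_n = naturals that are not perfect k-th powers for any 2 <= k <= n+1;
# each A_n has density 1 and the sequence is nested.

def is_kth_power(m, k):
    r = round(m ** (1.0 / k))
    return any(c ** k == m for c in (r - 1, r, r + 1) if c >= 1)

def A(n, m):
    return not any(is_kth_power(m, k) for k in range(2, n + 2))

N = [10 ** k for k in range(1, 6)]               # block boundaries N_1 < N_2 < ...
pseudo = set()
for k in range(len(N) - 1):
    pseudo |= {m for m in range(N[k], N[k + 1]) if A(k + 1, m)}

# almost contained in A_3: every exception lies below N_3
exceptions = [m for m in pseudo if m >= N[2] and not A(3, m)]
assert exceptions == []

# and the glued set still has density close to 1 on the window we built
assert len(pseudo) / (N[-1] - N[0]) > 0.99
```

In the genuine proof the N_k must of course be chosen using the densities of the actual sets A_n, not fixed in advance.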

That’s a fairly simple (but not trivial) exercise I think, but when I tried to write a proof straight down I failed — it’s more like a pen-and-paper job until you get the construction right. But here’s the real question I’d like to know the answer to right at this moment. It splits into two questions actually.

Question 1. Let F be a collection of f-sets with the finite-intersection property and no non-empty f-intersection. Let [A_1]\supset[A_2]\supset\dots be a nested sequence of elements of F. Must this sequence have an f-intersection that belongs to F?

Question 2. If, as seems likely, the answer to Question 1 is no, must it at least be the case that there exists a nested sequence in F with an f-intersection that also belongs to F?

If the answer to Question 2 turned out to be yes, it would naturally lead to the following further question.

Question 3. If the answer to Question 2 is yes, then how far can we go with it? For example, must F contain a nested transfinite sequence of uncountable length?

Unfortunately, even a positive answer to Question 3 would not be enough for us, for reasons I’ve already given. It might be the case that we can indeed build nice big towers in F, but that the arguments stop working once we reach the first uncountable ordinal. Indeed, it might well be known that there are sets F with the finite-intersection property and no non-empty f-intersection that do not contain towers that are bigger than this. If that’s the case, it would give at least one serious reason for the problem being hard. It would tell us that we can’t prove the equality by just finding a suitable tower inside F: instead, we’d need to do something more indirect, constructing a tower T and some non-obvious injection from T to F. (It would be non-obvious because it would not preserve the subset relation.)

Another way the problem might be difficult is if F does contain a tower with no non-empty f-intersection, but we can’t extend an arbitrary tower in F to a tower with this property. Perhaps if we started off building our tower the wrong way, it would lead us down a path that had a dead end long before the tower was big enough, even though good paths and good towers did exist.

But these are just pure speculations on my part. I’m sure the answers to many of my questions are known. If so, I’ll be interested to hear about it, and to understand better why Malliaris and Shelah had to use big tools and a much less obvious argument than the kind of thing I was trying to do above.


I’m still writing on the book. After not much happened for almost a year, my publisher now rather suddenly asked for the final version of the manuscript. Until that’s done not much will be happening on this blog. We do seem to have settled on a title though: “Lost in Math: How Beauty Leads Physics Astray.” The title is my doing, the subtitle isn’t. I just hope it won’t lead too many readers

September 18, 2017

Doug NatelsonFaculty position at Rice - theoretical astro-particle/cosmology

Assistant Professor Position at Rice University in

Theoretical Astro-Particle Physics/Cosmology

The Department of Physics and Astronomy at Rice University in Houston, Texas, invites applications for a tenure-track faculty position (Assistant Professor level) in Theoretical Astro-Particle physics and/or Cosmology. The department seeks an outstanding individual whose research will complement and connect existing activities in Nuclear/Particle physics and Astrophysics groups at Rice University (see This is the second position in a Cosmic Frontier effort that may eventually grow to three members. The successful applicant will be expected to develop an independent and vigorous research program, and teach graduate and undergraduate courses. A PhD in Physics, Astrophysics or related field is required.

Applicants should send the following: (i) cover letter; (ii) curriculum vitae (including electronic links to 2 relevant publications); (iii) research statement (4 pages or less); (iv) teaching statement (2 pages or less); and (v) the names, professional affiliations, and email addresses of three references.  To apply, please visit:  Applications will be accepted until the position is filled, but only those received by Dec 15, 2017 will be assured full consideration. The appointment is expected to start in July 2018.  Further inquiries should be directed to the chair of the search committee, Prof. Paul Padley (

Rice University is an Equal Opportunity Employer with commitment to diversity at all levels, and considers for employment qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national or ethnic origin, genetic information, disability or protected veteran status.


n-Category Café Lattice Paths and Continued Fractions I

In my last post I talked about certain types of lattice paths with weightings on them, and formulas for the weighted count of such paths; in particular, I was interested in expressing the reverse Bessel polynomials as a certain weighted count of Schröder paths. I alluded to some connection with continued fractions, and it is this connection that I want to explain here and in my next post.

In this post I want to prove Flajolet’s Fundamental Lemma. Alan Sokal calls this Flajolet’s Master Theorem, but Viennot takes the stance that it deserves the high accolade of being described as a ‘Fundamental Lemma’, citing Aigner and Ziegler in Proofs from THE BOOK:

“The essence of mathematics is proving theorems – and so, that is what mathematicians do: They prove theorems. But to tell the truth, what they really want to prove, once in their lifetime, is a Lemma, like the one by Fatou in analysis, the Lemma of Gauss in number theory, or the Burnside-Frobenius Lemma in combinatorics.

“Now what makes a mathematical statement a true Lemma? First, it should be applicable to a wide variety of instances, even seemingly unrelated problems. Secondly, the statement should, once you have seen it, be completely obvious. The reaction of the reader might well be one of faint envy: Why haven’t I noticed this before? And thirdly, on an esthetic level, the Lemma – including its proof – should be beautiful!”

Interestingly, Aigner and Ziegler were building up to describing a result of Viennot’s – the Gessel-Lindström-Viennot Lemma – as a fundamental lemma! (I hope to talk about that lemma in a later post.)

Anyway, Flajolet’s Fundamental Lemma that I will describe and prove below is about expressing the weighted count of paths that look like

[Figure: a weighted Motzkin path]

as a continued fraction

\frac{1}{1- c_0 - \frac{a_1 b_1}{1-c_1 - \frac{a_2 b_2}{1- c_2 - \frac{a_3 b_3}{1-\dots}}}}

Next time I’ll give a few examples, including the connection with reverse Bessel polynomials.

Motzkin paths

We consider Motzkin paths, which are like the Dyck paths and Schröder paths we considered last time, but here the flat steps have length 1.

A Motzkin path, then, is a lattice path in \mathbb{N}^2 starting at (0,0), having steps in the directions (1,1), (1,-1) or (1,0). The path finishes at some (\ell, 0). Here is a Motzkin path.

[Figure: a Motzkin path]

(Actually at this point the length of each step is a bit of a red herring, but let’s not worry about that.)

We want to count weighted paths, so we’re going to have to weight them. We’ll do it in a universal way to start with. Let \{a_i\}_{i=1}^\infty, \{b_i\}_{i=1}^\infty and \{c_i\}_{i=0}^\infty be three sets of commuting indeterminates. Now weight each step in a path in the following way. Each step going up to level i will be given the weight a_i; each step going down from level i will be given the weight b_i; and each flat step at level i will be given weight c_i. Here’s the path from above with the weights marked on it.

[Figure: the Motzkin path above with weights marked]

The weight w_{a,b,c}(\sigma) of a path \sigma is just the product of the weights of each of its steps, so the weight of the above path is c_0 a_1^2 b_1^2 c_1 a_2 b_2.

If you try to start writing down the sum of the weightings of all Motzkhin paths you’ll get a power series that begins

1 + c_0 + a_1b_1 + c_0^2 + 2a_1b_1c_0 + a_1b_1c_1 + c_0^3 + \dots \in \mathbb{Z}[[a_i, b_i, c_i]]

Flajolet’s Fundamental Lemma will give us a formula for this power series.
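Before stating it, a quick sanity check of mine (not in the original post): setting every weight a_i = b_i = c_i = 1 turns the weighted sum, grouped by path length, into the sequence of Motzkin numbers 1, 1, 2, 4, 9, 21, …, which a brute-force recursion reproduces.

```python
from functools import lru_cache

# Count Motzkin paths of length n by recursing over the three step types:
# up (+1), flat (0), and down (-1, only when above height 0).
@lru_cache(maxsize=None)
def motzkin(n, h=0):
    """Number of paths of length n from height h down to height 0."""
    if n == 0:
        return 1 if h == 0 else 0
    total = motzkin(n - 1, h + 1) + motzkin(n - 1, h)   # up, flat
    if h > 0:
        total += motzkin(n - 1, h - 1)                  # down
    return total

assert [motzkin(n) for n in range(6)] == [1, 1, 2, 4, 9, 21]
```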

Flajolet’s Fundamental Lemma

In order to prove the result about the enumeration of weightings of all paths we will need to consider slightly more general paths that don’t just start on the x-axis. So define an (h,k)-path to be like a Motzkin path except that it starts at some point (0,h), for h\ge 0, does not go below the line y=h nor above the line y=k, and finishes at some point (\ell, h). Let P_h^k denote the set of all (h,k)-paths.

Here is a (2,4)-path with the weights marked on. Of course this is also, for instance, a (2,13)-path.

[Figure: a (2,4)-path with weights marked]

We want the weighted sum of all Motzkin paths, so in order to calculate that we will take p_h^k to be the sum of all weights of (h,k)-paths: p_h^k \coloneqq \sum_{\sigma\in P_h^k} w_{a,b,c}(\sigma) \in \mathbb{Z}[[a_i, b_i, c_i]]. There is a beautifully simple expression for p_h^k.

Observe first that any path in P_k^k is constrained to lie at level k, so must simply be a product of flat steps, which all have weight c_k; thus

p^k_k = 1 + c_k + c_k^2 + c_k^3 + \dots = \frac{1}{1-c_k}.

Given two paths \sigma_1, \sigma_2 \in P^k_h we can multiply them together simply by placing \sigma_2 after \sigma_1. The above pictured example is the product of three paths in P_2^4, the middle one being a flat path. Weighting is clearly preserved by this multiplication: w_{a,b,c}(\sigma_1\sigma_2) = w_{a,b,c}(\sigma_1)\,w_{a,b,c}(\sigma_2).

An indecomposable (h,k)-path is a path which only returns to level h at its finishing point, i.e., as the name suggests, it cannot be decomposed into a non-trivial product. It is clear that any path decomposes uniquely as a product of indecomposable paths. There are two types of non-trivial indecomposable (h,k)-paths: there is the single flat step; and there are the paths which are an up step, followed by a path in P_{h+1}^k, followed by a down step back to level h. We let I_h^k be the set of non-trivial indecomposable (h,k)-paths.

This all leads to the following argument to deduce an expression for the weighted count of all (h,k)(h,k)-paths.

\begin{aligned} p^k_h &= \sum_{\sigma\in P^k_h} w_{a,b,c}(\sigma)\\ &= \sum_{n=0}^\infty \sum_{\pi_1,\dots,\pi_n \in I^k_h} w_{a,b,c}(\pi_1\dots \pi_n)\\ &= \sum_{n=0}^\infty \sum_{\pi_1,\dots,\pi_n \in I^k_h} w_{a,b,c}(\pi_1)\dots w_{a,b,c}(\pi_n)\\ &= \frac{1}{1- \sum_{\pi\in I^k_h} w_{a,b,c}(\pi)} \\ &= \frac{1}{1- c_h - \sum_{\sigma\in P^k_{h+1}} a_{h+1}\, w_{a,b,c}(\sigma)\, b_{h+1}} \\ &= \frac{1}{1- c_h - a_{h+1} b_{h+1}\sum_{\sigma\in P^k_{h+1}} w_{a,b,c}(\sigma)} \\ &= \frac{1}{1- c_h - a_{h+1} b_{h+1}\, p_{h+1}^k} \end{aligned}

This is a lovely recursive expression for the weighted count p_h^k. Using the fact p^k_k = \frac{1}{1-c_k} that we gave above, we obtain the following.

Lemma. p_h^k = \frac{1}{1- c_{h} - \frac{a_{h+1} b_{h+1}}{1-c_{h+1} - \frac{a_{h+2} b_{h+2}}{\qquad \frac{\vdots}{1- c_{k-1}-\frac{a_k b_k}{1-c_k}}}}}

Now taking h=0 and letting k\to\infty we get the following continued fraction expansion for the weighted count of all Motzkin paths starting at level 0.

Flajolet’s Fundamental Lemma. \sum_{\sigma\,\,\mathrm{Motzkin}} w_{a,b,c}(\sigma) = \frac{1}{1- c_{0} - \frac{a_{1} b_{1}}{1-c_{1} - \frac{a_{2} b_{2}}{1- c_2 - \frac{a_3 b_3}{1-\dots}}}} \in \mathbb{Z}[[a_i, b_i, c_i]]

How lovely and simple is that?
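Lovely indeed, and also easy to check numerically (a quick verification of mine, not part of the post): with constant weights a_i = b_i = c_i = t for a small t, both sides reduce to the Motzkin generating function evaluated at t, and a truncated path sum agrees with a truncated continued fraction to machine precision.

```python
from functools import lru_cache

t = 0.1  # constant weights a_i = b_i = c_i = t, so a path of length n has weight t^n

@lru_cache(maxsize=None)
def paths(n, h=0):
    """Number of Motzkin paths of length n from height h down to 0."""
    if n == 0:
        return 1 if h == 0 else 0
    s = paths(n - 1, h + 1) + paths(n - 1, h)   # up, flat
    if h > 0:
        s += paths(n - 1, h - 1)                # down
    return s

# left side of the lemma: weighted sum over paths, truncated at length 40
lhs = sum(paths(n) * t ** n for n in range(40))

# right side: the continued fraction, truncated at depth 60
tail = 0.0
for _ in range(60):
    tail = t * t / (1 - t - tail)
rhs = 1 / (1 - t - tail)

assert abs(lhs - rhs) < 1e-12
```

Both truncations converge geometrically here, which is why 40 path lengths and 60 continued-fraction levels already agree far beyond the stated tolerance.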

Next time I’ll give some examples and applications, which include the Dyck paths and Schröder paths we looked at previously.

Doug NatelsonFaculty position at Rice - experimental condensed matter

Faculty Position in Experimental Condensed Matter Physics Rice University

The Department of Physics and Astronomy at Rice University in Houston, TX invites applications for a tenure-track faculty position in experimental condensed matter physics.  The department expects to make an appointment at the assistant professor level. This search seeks an outstanding individual whose research interest is in hard condensed matter systems, who will complement and extend existing experimental and theoretical activities in condensed matter physics on semiconductor and nanoscale structures, strongly correlated systems, topological matter, and related quantum materials (see A PhD in physics or related field is required. 

Applicants to this search should submit the following: (1) cover letter; (2) curriculum vitae; (3) research statement; (4) teaching statement; and (5) the names, professional affiliations, and email addresses of three references. For full details and to apply, please visit: Applications will be accepted until the position is filled. The review of applications will begin October 15 2017, but all those received by December 1 2017 will be assured full consideration. The appointment is expected to start in July 2018.  Further inquiries should be directed to the chair of the search committee, Prof. Emilia Morosan (  

Rice University is an Equal Opportunity Employer with commitment to diversity at all levels, and considers for employment qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national or ethnic origin, genetic information, disability or protected veteran status.

September 17, 2017

David Hoggregression

Our data-driven model for stars, The Cannon, is a regression. That is, it figures out how the labels generate the spectral pixels with a model for possible functional forms for that generation. I spent part of today building a Jupyter notebook to demonstrate that—when the assumptions underlying the regression are correct—the results of the regression are accurate (and precise). That is, the maximum-likelihood regression estimator is a good one. That isn't surprising; there are very general proofs; but it answers some questions (that my collaborators have) about cases where the labels (the regressors) are correlated in the training set.
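The claim in that last sentence is easy to illustrate. Here is a small NumPy sketch of my own (not Hogg's notebook, and with made-up names): ordinary least squares, which is the maximum-likelihood estimator under Gaussian noise, recovers the true coefficients even when the regressors ("labels") are strongly correlated in the training set.

```python
import numpy as np

# Illustrative sketch: OLS (the ML estimator for Gaussian noise) stays
# accurate when the regressors are strongly correlated, as long as the
# assumed generative model is correct. All names are demo inventions.

rng = np.random.default_rng(17)
n_stars, true_coeffs = 20_000, np.array([2.0, -1.0])

z = rng.normal(size=n_stars)
labels = np.column_stack([z, 0.9 * z + 0.1 * rng.normal(size=n_stars)])  # correlated labels

flux = labels @ true_coeffs + 0.1 * rng.normal(size=n_stars)  # one "spectral pixel"
fit, *_ = np.linalg.lstsq(labels, flux, rcond=None)

assert np.allclose(fit, true_coeffs, atol=0.05)  # estimates land near the truth
```

Correlation inflates the estimator's variance (the (X^T X)^{-1} factor), but with enough training data the maximum-likelihood fit remains accurate and unbiased.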

Terence TaoSzemeredi’s proof of Szemeredi’s theorem

Szemerédi’s theorem asserts that all subsets of the natural numbers of positive density contain arbitrarily long arithmetic progressions.  Roth’s theorem is the special case when one considers arithmetic progressions of length three.  Both theorems have many important proofs using tools from additive combinatorics, (higher order) Fourier analysis, (hyper) graph regularity theory, and ergodic theory.  However, the original proof by Endre Szemerédi, while extremely intricate, was purely combinatorial (and in particular “elementary”) and almost entirely self-contained, except for an invocation of the van der Waerden theorem.  It is also notable for introducing a prototype of what is now known as the Szemerédi regularity lemma.

Back in 2005, I rewrote Szemerédi’s original proof in order to understand it better; however, my rewrite ended up being about the same length as the original argument and was probably only usable to myself.  In 2012, after Szemerédi was awarded the Abel prize, I revisited this argument with the intention of writing up a more readable version of the proof, but ended up just presenting some ingredients of the argument in a blog post, rather than trying to rewrite the whole thing.  In that post, I suspected that the cleanest way to write up the argument would be through the language of nonstandard analysis (perhaps in an iterated hyperextension that could handle various hierarchies of infinitesimals), but was unable to actually achieve any substantial simplifications by passing to the nonstandard world.

A few weeks ago, I participated in a week-long workshop at the American Institute of Mathematics on “Nonstandard methods in combinatorial number theory”, and spent some time in a working group with Shabnam Akhtari, Irfan Alam, Renling Jin, Steven Leth, Karl Mahlburg, Paul Potgieter, and Henry Towsner to try to obtain a manageable nonstandard version of Szemerédi’s original proof.  We didn’t end up being able to do so – in fact there are now signs that perhaps nonstandard analysis is not the optimal framework in which to place this argument – but we did at least clarify the existing standard argument, to the point that I was able to go back to my original rewrite of the proof and present it in a more civilised form, which I am now uploading here as an unpublished preprint.   There are now a number of simplifications to the proof.  Firstly, one no longer needs the full strength of the regularity lemma; only the simpler “weak” regularity lemma of Frieze and Kannan is required.  Secondly, the proof has been “factored” into a number of stand-alone propositions of independent interest, in particular involving just (families of) one-dimensional arithmetic progressions rather than the complicated-looking multidimensional arithmetic progressions that occur so frequently in the original argument of Szemerédi.  Finally, the delicate manipulations of densities and epsilons via double counting arguments in Szemerédi’s original paper have been abstracted into a certain key property of families of arithmetic progressions that I call the “double counting property”.

The factoring mentioned above is particularly simple in the case of proving Roth’s theorem, which is now presented separately in the above writeup.  Roth’s theorem seeks to locate a length three progression {(P(1),P(2),P(3)) = (a, a+r, a+2r)} in which all three elements lie in a single set.  This will be deduced from an easier variant of the theorem in which one locates (a family of) length three progressions in which just the first two elements {P(1), P(2)} of the progression lie in a good set (and some other properties of the family are also required).  This is in turn derived from an even easier variant in which now just the first element of the progression is required to be in the good set.

More specifically, Roth’s theorem is now deduced from

Theorem 1.5.  Let {L} be a natural number, and let {S} be a set of integers of upper density at least {1-1/10L}.  Then, whenever {S} is partitioned into finitely many colour classes, there exists a colour class {A} and a family {(P_l(1),P_l(2),P_l(3))_{l=1}^L} of 3-term arithmetic progressions with the following properties:

  1. For each {l}, {P_l(1)} and {P_l(2)} lie in {A}.
  2. For each {l}, {P_l(3)} lies in {S}.
  3. The {P_l(3)} for {l=1,\dots,L} are in arithmetic progression.

The situation in this theorem is depicted by the following diagram, in which elements of A are in blue and elements of S are in grey:

Theorem 1.5 is deduced in turn from the following easier variant:

Theorem 1.6.  Let {L} be a natural number, and let {S} be a set of integers of upper density at least {1-1/10L}.  Then, whenever {S} is partitioned into finitely many colour classes, there exists a colour class {A} and a family {(P_l(1),P_l(2),P_l(3))_{l=1}^L} of 3-term arithmetic progressions with the following properties:

  1. For each {l}, {P_l(1)} lies in {A}.
  2. For each {l}, {P_l(2)} and {P_l(3)} lie in {S}.
  3. The {P_l(2)} for {l=1,\dots,L} are in arithmetic progression.

The situation here is described by the figure below.

Theorem 1.6 is easy to prove.  To derive Theorem 1.5 from Theorem 1.6, or to derive Roth’s theorem from Theorem 1.5, one uses double counting arguments, van der Waerden’s theorem, and the weak regularity lemma, largely as described in this previous blog post; see the writeup for the full details.  (I would, though, be interested in seeing a shorter proof of Theorem 1.5 that did not go through these arguments, and did not use the more powerful theorems of Roth or Szemerédi.)


Filed under: expository, math.CO Tagged: regularity lemma, Roth's theorem, Szemeredi's theorem

Tommaso Dorigo: Letters From Indochina, 1952-54: The Tragic Story Of An 18-Year-Old Enrolled In The Légion Étrangère

In 1952 my uncle Antonio, then 18 years old, left his family home in Venice, Italy, never to return, running away from the humiliation of a failure at school. With a friend he reached the border with France and crossed it during the night, chased by border patrols and wolves. Caught by the French police, Toni - that was the short name by which he was known to everybody - was offered a choice: be sent back to Italy, facing three months of jail, or enrol in the Foreign Legion. Afraid of the humiliation and the consequences, he tragically chose the latter.


Terence Tao: Inverting the Schur complement, and large-dimensional Gelfand-Tsetlin patterns

Suppose we have an {n \times n} matrix {M} that is expressed in block-matrix form as

\displaystyle  M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}

where {A} is an {(n-k) \times (n-k)} matrix, {B} is an {(n-k) \times k} matrix, {C} is an {k \times (n-k)} matrix, and {D} is a {k \times k} matrix for some {1 < k < n}. If {A} is invertible, we can use the technique of Schur complementation to express the inverse of {M} (if it exists) in terms of the inverse of {A}, and the other components {B,C,D} of course. Indeed, to solve the equation

\displaystyle  M \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix},

where {x, a} are {(n-k) \times 1} column vectors and {y,b} are {k \times 1} column vectors, we can expand this out as a system

\displaystyle  Ax + By = a

\displaystyle  Cx + Dy = b.

Using the invertibility of {A}, we can write the first equation as

\displaystyle  x = A^{-1} a - A^{-1} B y \ \ \ \ \ (1)

and substituting this into the second equation yields

\displaystyle  (D - C A^{-1} B) y = b - C A^{-1} a

and thus (assuming that {D - CA^{-1} B} is invertible)

\displaystyle  y = - (D - CA^{-1} B)^{-1} CA^{-1} a + (D - CA^{-1} B)^{-1} b

and then inserting this back into (1) gives

\displaystyle  x = (A^{-1} + A^{-1} B (D - CA^{-1} B)^{-1} C A^{-1}) a - A^{-1} B (D - CA^{-1} B)^{-1} b.

Comparing this with

\displaystyle  \begin{pmatrix} x \\ y \end{pmatrix} = M^{-1} \begin{pmatrix} a \\ b \end{pmatrix},

we have managed to express the inverse of {M} as

\displaystyle  M^{-1} =

\displaystyle  \begin{pmatrix} A^{-1} + A^{-1} B (D - CA^{-1} B)^{-1} C A^{-1} & - A^{-1} B (D - CA^{-1} B)^{-1} \\ - (D - CA^{-1} B)^{-1} CA^{-1} & (D - CA^{-1} B)^{-1} \end{pmatrix}. \ \ \ \ \ (2)
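As a quick numerical sanity check (not part of the original post; dimensions are arbitrary), formula (2) can be verified against a direct matrix inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 3
M = rng.normal(size=(n, n))          # generic, hence the blocks are invertible
A, B = M[:n - k, :n - k], M[:n - k, n - k:]
C, D = M[n - k:, :n - k], M[n - k:, n - k:]

Ai = np.linalg.inv(A)
Si = np.linalg.inv(D - C @ Ai @ B)   # inverse of the Schur complement

# assemble formula (2) block by block
M_inv = np.block([
    [Ai + Ai @ B @ Si @ C @ Ai, -Ai @ B @ Si],
    [-Si @ C @ Ai, Si],
])
assert np.allclose(M_inv, np.linalg.inv(M))
```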

One can consider the inverse problem: given the inverse {M^{-1}} of {M}, does one have a nice formula for the inverse {A^{-1}} of the minor {A}? Trying to recover this directly from (2) looks somewhat messy. However, one can proceed as follows. Let {U} denote the {n \times k} matrix

\displaystyle  U := \begin{pmatrix} 0 \\ I_k \end{pmatrix}

(with {I_k} the {k \times k} identity matrix), and let {V} be its transpose:

\displaystyle  V := \begin{pmatrix} 0 & I_k \end{pmatrix}.

Then for any scalar {t} (which we identify with {t} times the identity matrix), one has

\displaystyle  M + UtV = \begin{pmatrix} A & B \\ C & D+t \end{pmatrix},

and hence by (2)

\displaystyle  (M+UtV)^{-1} =

\displaystyle \begin{pmatrix} A^{-1} + A^{-1} B (D + t - CA^{-1} B)^{-1} C A^{-1} & - A^{-1} B (D + t - CA^{-1} B)^{-1} \\ - (D + t - CA^{-1} B)^{-1} CA^{-1} & (D + t - CA^{-1} B)^{-1} \end{pmatrix},

noting that the inverses here will exist for {t} large enough. Taking limits as {t \rightarrow \infty}, we conclude that

\displaystyle  \lim_{t \rightarrow \infty} (M+UtV)^{-1} = \begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix}.

On the other hand, by the Woodbury matrix identity (discussed in this previous blog post), we have

\displaystyle  (M+UtV)^{-1} = M^{-1} - M^{-1} U (t^{-1} + V M^{-1} U)^{-1} V M^{-1}

and hence on taking limits and comparing with the preceding identity, one has

\displaystyle  \begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix} = M^{-1} - M^{-1} U (V M^{-1} U)^{-1} V M^{-1}.

This achieves the aim of expressing the inverse {A^{-1}} of the minor in terms of the inverse of the full matrix. Taking traces and rearranging, we conclude in particular that

\displaystyle  \mathrm{tr} A^{-1} = \mathrm{tr} M^{-1} - \mathrm{tr} (V M^{-2} U) (V M^{-1} U)^{-1}. \ \ \ \ \ (3)

In the {k=1} case, this can be simplified to

\displaystyle  \mathrm{tr} A^{-1} = \mathrm{tr} M^{-1} - \frac{e_n^T M^{-2} e_n}{e_n^T M^{-1} e_n} \ \ \ \ \ (4)

where {e_n} is the {n^{th}} basis column vector.
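Identity (4) is likewise easy to check numerically (a sketch, for a generic random matrix where the relevant inverses exist and {e_n^T M^{-1} e_n} is nonzero):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.normal(size=(n, n))
Minv = np.linalg.inv(M)
e_n = np.zeros(n)
e_n[-1] = 1.0                                       # the n-th basis vector

lhs = np.trace(np.linalg.inv(M[:n - 1, :n - 1]))    # tr A^{-1}, A the top-left minor
rhs = np.trace(Minv) - (e_n @ Minv @ Minv @ e_n) / (e_n @ Minv @ e_n)
assert np.allclose(lhs, rhs)
```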

We can apply this identity to understand how the spectrum of an {n \times n} random matrix {M} relates to that of its top left {(n-1) \times (n-1)} minor {A}. Subtracting any complex multiple {z} of the identity from {M} (and hence from {A}), we can relate the Stieltjes transform {s_M(z) := \frac{1}{n} \mathrm{tr}(M-z)^{-1}} of {M} with the Stieltjes transform {s_A(z) := \frac{1}{n-1} \mathrm{tr}(A-z)^{-1}} of {A}:

\displaystyle  s_A(z) = \frac{n}{n-1} s_M(z) - \frac{1}{n-1} \frac{e_n^T (M-z)^{-2} e_n}{e_n^T (M-z)^{-1} e_n} \ \ \ \ \ (5)

At this point we begin to proceed informally. Assume for sake of argument that the random matrix {M} is Hermitian, with distribution that is invariant under conjugation by the unitary group {U(n)}; for instance, {M} could be drawn from the Gaussian Unitary Ensemble (GUE), or alternatively {M} could be of the form {M = U D U^*} for some real diagonal matrix {D} and {U} a unitary matrix drawn randomly from {U(n)} using Haar measure. To fix normalisations we will assume that the eigenvalues of {M} are typically of size {O(1)}. Then {A} is also Hermitian and {U(n)}-invariant. Furthermore, the law of {e_n^T (M-z)^{-1} e_n} will be the same as the law of {u^* (M-z)^{-1} u}, where {u} is now drawn uniformly from the unit sphere (independently of {M}). Diagonalising {M} into eigenvalues {\lambda_j} and eigenvectors {v_j}, we have

\displaystyle u^* (M-z)^{-1} u = \sum_{j=1}^n \frac{|u^* v_j|^2}{\lambda_j - z}.

One can think of {u} as a random (complex) Gaussian vector, divided by the magnitude of that vector (which, by the Chernoff inequality, will concentrate to {\sqrt{n}}). Thus the coefficients {u^* v_j} with respect to the orthonormal basis {v_1,\dots,v_n} can be thought of as independent (complex) Gaussian variables, divided by that magnitude. Using this and the Chernoff inequality again, we see (for {z} distance {\sim 1} away from the real axis at least) that one has the concentration of measure

\displaystyle  u^* (M-z)^{-1} u \approx \frac{1}{n} \sum_{j=1}^n \frac{1}{\lambda_j - z}

and thus

\displaystyle  e_n^T (M-z)^{-1} e_n \approx \frac{1}{n} \mathrm{tr} (M-z)^{-1} = s_M(z)

(that is to say, the diagonal entries of {(M-z)^{-1}} are roughly constant). Similarly we have

\displaystyle  e_n^T (M-z)^{-2} e_n \approx \frac{1}{n} \mathrm{tr} (M-z)^{-2} = \frac{d}{dz} s_M(z).

Inserting this into (5) and discarding terms of size {O(1/n^2)}, we thus conclude the approximate relationship

\displaystyle  s_A(z) \approx s_M(z) + \frac{1}{n} ( s_M(z) - s_M(z)^{-1} \frac{d}{dz} s_M(z) ).

This can be viewed as a difference equation for the Stieltjes transform of top left minors of {M}. Iterating this equation, and formally replacing the difference equation by a differential equation in the large {n} limit, we see that when {n} is large and {k \approx e^{-t} n} for some {t \geq 0}, one expects the top left {k \times k} minor {A_k} of {M} to have Stieltjes transform

\displaystyle  s_{A_k}(z) \approx s( t, z ) \ \ \ \ \ (6)

where {s(t,z)} solves the Burgers-type equation

\displaystyle  \partial_t s(t,z) = s(t,z) - s(t,z)^{-1} \frac{d}{dz} s(t,z) \ \ \ \ \ (7)

with initial data {s(0,z) = s_M(z)}.

Example 1 If {M} is a constant multiple {M = cI_n} of the identity, then {s_M(z) = \frac{1}{c-z}}. One checks that {s(t,z) = \frac{1}{c-z}} is a steady state solution to (7), which is unsurprising given that all minors of {M} are also {c} times the identity.

Example 2 If {M} is GUE normalised so that each entry has variance {\sigma^2/n}, then by the semi-circular law (see previous notes) one has {s_M(z) \approx \frac{-z + \sqrt{z^2-4\sigma^2}}{2\sigma^2} = -\frac{2}{z + \sqrt{z^2-4\sigma^2}}} (using an appropriate branch of the square root). One can then verify the self-similar solution

\displaystyle  s(t,z) = \frac{-z + \sqrt{z^2 - 4\sigma^2 e^{-t}}}{2\sigma^2 e^{-t}} = -\frac{2}{z + \sqrt{z^2 - 4\sigma^2 e^{-t}}}

to (7), which is consistent with the fact that a top {k \times k} minor of {M} also has the law of GUE, with each entry having variance {\sigma^2 / n \approx \sigma^2 e^{-t} / k} when {k \approx e^{-t} n}.
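One can also check by finite differences (a numerical sketch, with an arbitrary evaluation point chosen in the upper half-plane away from the branch cut) that the self-similar solution of Example 2 satisfies the Burgers-type equation (7):

```python
import numpy as np

sigma = 1.0

def s(t, z):
    # the self-similar solution of Example 2; the principal square root
    # is the correct branch near the test point below
    a = sigma**2 * np.exp(-t)
    return (-z + np.sqrt(z**2 - 4 * a)) / (2 * a)

t0, z0 = 0.3, 1.0 + 2.0j             # arbitrary point in the upper half-plane
h = 1e-6
ds_dt = (s(t0 + h, z0) - s(t0 - h, z0)) / (2 * h)
ds_dz = (s(t0, z0 + h) - s(t0, z0 - h)) / (2 * h)

# equation (7): d_t s = s - s^{-1} d_z s
residual = ds_dt - (s(t0, z0) - ds_dz / s(t0, z0))
assert abs(residual) < 1e-6
```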

One can justify the approximation (6) given a sufficiently good well-posedness theory for the equation (7). We will not do so here, but will note that (as with the classical inviscid Burgers equation) the equation can be solved exactly (formally, at least) by the method of characteristics. For any initial position {z_0}, we consider the characteristic flow {t \mapsto z(t)} formed by solving the ODE

\displaystyle  \frac{d}{dt} z(t) = s(t,z(t))^{-1} \ \ \ \ \ (8)

with initial data {z(0) = z_0}, ignoring for this discussion the problems of existence and uniqueness. Then from the chain rule, the equation (7) implies that

\displaystyle  \frac{d}{dt} s( t, z(t) ) = s(t,z(t))

and thus {s(t,z(t)) = e^t s(0,z_0)}. Inserting this back into (8) we see that

\displaystyle  z(t) = z_0 + s(0,z_0)^{-1} (1-e^{-t})

and thus (7) may be solved implicitly via the equation

\displaystyle  s(t, z_0 + s(0,z_0)^{-1} (1-e^{-t}) ) = e^t s(0, z_0) \ \ \ \ \ (9)

for all {t} and {z_0}.

Remark 3 In practice, the equation (9) may stop working when {z_0 + s(0,z_0)^{-1} (1-e^{-t})} crosses the real axis, as (7) does not necessarily hold in this region. It is a cute exercise (ultimately coming from the Cauchy-Schwarz inequality) to show that this crossing always happens, for instance if {z_0} has positive imaginary part then {z_0 + s(0,z_0)^{-1}} necessarily has negative or zero imaginary part.

Example 4 Suppose we have {s(0,z) = \frac{1}{c-z}} as in Example 1. Then (9) becomes

\displaystyle  s( t, z_0 + (c-z_0) (1-e^{-t}) ) = \frac{e^t}{c-z_0}

for any {t,z_0}, which after making the change of variables {z = z_0 + (c-z_0) (1-e^{-t}) = c - e^{-t} (c - z_0)} becomes

\displaystyle  s(t, z ) = \frac{1}{c-z}

as in Example 1.

Example 5 Suppose we have

\displaystyle  s(0,z) = \frac{-z + \sqrt{z^2-4\sigma^2}}{2\sigma^2} = -\frac{2}{z + \sqrt{z^2-4\sigma^2}}.

as in Example 2. Then (9) becomes

\displaystyle  s(t, z_0 - \frac{z_0 + \sqrt{z_0^2-4\sigma^2}}{2} (1-e^{-t}) ) = e^t \frac{-z_0 + \sqrt{z_0^2-4\sigma^2}}{2\sigma^2}.

If we write

\displaystyle  z := z_0 - \frac{z_0 + \sqrt{z_0^2-4\sigma^2}}{2} (1-e^{-t})

\displaystyle  = \frac{(1+e^{-t}) z_0 - (1-e^{-t}) \sqrt{z_0^2-4\sigma^2}}{2}

one can calculate that

\displaystyle  z^2 - 4 \sigma^2 e^{-t} = (\frac{(1-e^{-t}) z_0 - (1+e^{-t}) \sqrt{z_0^2-4\sigma^2}}{2})^2

and hence

\displaystyle  \frac{-z + \sqrt{z^2 - 4\sigma^2 e^{-t}}}{2\sigma^2 e^{-t}} = e^t \frac{-z_0 + \sqrt{z_0^2-4\sigma^2}}{2\sigma^2}

which gives

\displaystyle  s(t,z) = \frac{-z + \sqrt{z^2 - 4\sigma^2 e^{-t}}}{2\sigma^2 e^{-t}}. \ \ \ \ \ (10)

One can recover the spectral measure {\mu} from the Stieltjes transform {s(z)} as the weak limit of {x \mapsto \frac{1}{\pi} \mathrm{Im} s(x+i\varepsilon)} as {\varepsilon \rightarrow 0}; we write this informally as

\displaystyle  d\mu(x) = \frac{1}{\pi} \mathrm{Im} s(x+i0^+)\ dx.

In this informal notation, we have for instance that

\displaystyle  \delta_c(x) = \frac{1}{\pi} \mathrm{Im} \frac{1}{c-x-i0^+}\ dx

which can be interpreted as the fact that the Cauchy distributions {\frac{1}{\pi} \frac{\varepsilon}{(c-x)^2+\varepsilon^2}} converge weakly to the Dirac mass at {c} as {\varepsilon \rightarrow 0}. Similarly, the spectral measure associated to (10) is the semicircular measure {\frac{1}{2\pi \sigma^2 e^{-t}} (4 \sigma^2 e^{-t}-x^2)_+^{1/2}}.
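The recovery of the density from the Stieltjes transform can be illustrated numerically (a sketch with arbitrary parameters): evaluating {\frac{1}{\pi} \mathrm{Im}\, s(x+i\varepsilon)} for small {\varepsilon} using (10) reproduces the semicircular density.

```python
import numpy as np

sigma, t = 1.0, 0.5
a = sigma**2 * np.exp(-t)

def s(z):
    # Stieltjes transform (10) at fixed time t (principal branch,
    # evaluated just above the real axis)
    return (-z + np.sqrt(z**2 - 4 * a)) / (2 * a)

x, eps = 0.3, 1e-5                   # x chosen inside the bulk [-2 sqrt(a), 2 sqrt(a)]
density = np.imag(s(x + 1j * eps)) / np.pi
semicircle = np.sqrt(4 * a - x**2) / (2 * np.pi * a)
assert abs(density - semicircle) < 1e-3
```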

If we let {\mu_t} be the spectral measure associated to {s(t,\cdot)}, then the curve {e^{-t} \mapsto \mu_t} from {(0,1]} to the space of measures is the high-dimensional limit {n \rightarrow \infty} of a Gelfand-Tsetlin pattern (discussed in this previous post), if the pattern is randomly generated amongst all matrices {M} with spectrum asymptotic to {\mu_0} as {n \rightarrow \infty}. For instance, if {\mu_0 = \delta_c}, then the curve is {\alpha \mapsto \delta_c}, corresponding to a pattern that is entirely filled with {c}‘s. If instead {\mu_0 = \frac{1}{2\pi \sigma^2} (4\sigma^2-x^2)_+^{1/2}} is a semicircular distribution, then the pattern is

\displaystyle  \alpha \mapsto \frac{1}{2\pi \sigma^2 \alpha} (4\sigma^2 \alpha -x^2)_+^{1/2},

thus at height {\alpha} from the top, the pattern is semicircular on the interval {[-2\sigma \sqrt{\alpha}, 2\sigma \sqrt{\alpha}]}. The interlacing property of Gelfand-Tsetlin patterns translates to the claim that {\alpha \mu_\alpha(-\infty,\lambda)} (resp. {\alpha \mu_\alpha(\lambda,\infty)}) is non-decreasing (resp. non-increasing) in {\alpha} for any fixed {\lambda}. In principle one should be able to establish these monotonicity claims directly from the PDE (7) or from the implicit solution (9), but it was not clear to me how to do so.

An interesting example of such a limiting Gelfand-Tsetlin pattern occurs when {\mu_0 = \frac{1}{2} \delta_{-1} + \frac{1}{2} \delta_1}, which corresponds to {M} being {2P-I}, where {P} is an orthogonal projection to a random {n/2}-dimensional subspace of {{\bf C}^n}. Here we have

\displaystyle  s(0,z) = \frac{1}{2} \frac{1}{-1-z} + \frac{1}{2} \frac{1}{1-z} = \frac{z}{1-z^2}

and so (9) in this case becomes

\displaystyle  s(t, z_0 + \frac{1-z_0^2}{z_0} (1-e^{-t}) ) = \frac{e^t z_0}{1-z_0^2}

A tedious calculation then gives the solution

\displaystyle  s(t,z) = \frac{(2e^{-t}-1)z + \sqrt{z^2 - 4e^{-t}(1-e^{-t})}}{2e^{-t}(1-z^2)}. \ \ \ \ \ (11)

For {\alpha = e^{-t} > 1/2}, there are simple poles at {z=-1,+1}, and the associated measure is

\displaystyle  \mu_\alpha = \frac{2\alpha-1}{2\alpha} \delta_{-1} + \frac{2\alpha-1}{2\alpha} \delta_1 + \frac{1}{2\pi \alpha(1-x^2)} (4\alpha(1-\alpha)-x^2)_+^{1/2}\ dx.

This reflects the interlacing property, which forces {\frac{2\alpha-1}{2\alpha} \alpha n} of the {\alpha n} eigenvalues of the {\alpha n \times \alpha n} minor to be equal to {-1} (resp. {+1}). For {\alpha = e^{-t} \leq 1/2}, the poles disappear and one just has

\displaystyle  \mu_\alpha = \frac{1}{2\pi \alpha(1-x^2)} (4\alpha(1-\alpha)-x^2)_+^{1/2}\ dx.

For {\alpha=1/2}, one has an inverse semicircle distribution

\displaystyle  \mu_{1/2} = \frac{1}{\pi} (1-x^2)_+^{-1/2}.

There is presumably a direct geometric explanation of this fact (basically describing the singular values of the product of two random orthogonal projections to half-dimensional subspaces of {{\bf C}^n}), but I do not know of one off-hand.
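As a numerical check (with an arbitrary test point), formula (11) at {\alpha = 1/2} does reproduce the inverse semicircle (arcsine) density:

```python
import numpy as np

def s(t, z):
    # formula (11) for the minors of 2P - I (principal branch)
    et = np.exp(-t)
    return ((2 * et - 1) * z + np.sqrt(z**2 - 4 * et * (1 - et))) / (2 * et * (1 - z**2))

t = np.log(2.0)                      # alpha = e^{-t} = 1/2
x, eps = 0.4, 1e-6                   # arbitrary point in (-1, 1)
density = np.imag(s(t, x + 1j * eps)) / np.pi
arcsine = 1.0 / (np.pi * np.sqrt(1 - x**2))
assert abs(density - arcsine) < 1e-4
```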

The evolution of {s(t,z)} can also be understood using the {R}-transform and {S}-transform from free probability. Formally, let {z(t,s)} be the inverse of {s(t,z)}, thus

\displaystyle  s(t,z(t,s)) = s

for all {t,s}, and then define the {R}-transform

\displaystyle  R(t,s) := z(t,-s) - \frac{1}{s}.

The equation (9) may be rewritten as

\displaystyle  z( t, e^t s ) = z(0,s) + s^{-1} (1-e^{-t})

and hence

\displaystyle  R(t, -e^t s) = R(0, -s)

or equivalently

\displaystyle  R(t,s) = R(0, e^{-t} s). \ \ \ \ \ (12)

See these previous notes for a discussion of free probability topics such as the {R}-transform.

Example 6 If {s(t,z) = \frac{1}{c-z}} then the {R} transform is {R(t,s) = c}.

Example 7 If {s(t,z)} is given by (10), then the {R} transform is

\displaystyle  R(t,s) = \sigma^2 e^{-t} s.

Example 8 If {s(t,z)} is given by (11), then the {R} transform is

\displaystyle  R(t,s) = \frac{-1 + \sqrt{1 + 4 s^2 e^{-2t}}}{2 s e^{-t}}.

This simple relationship (12) is essentially due to Nica and Speicher (thanks to Dima Shlyakhtenko for this reference). It has the remarkable consequence that when {\alpha = 1/m} is the reciprocal of a natural number {m}, then {\mu_{1/m}} is the free arithmetic mean of {m} copies of {\mu}, that is to say {\mu_{1/m}} is the free convolution {\mu \boxplus \dots \boxplus \mu} of {m} copies of {\mu}, pushed forward by the map {\lambda \rightarrow \lambda/m}. In terms of random matrices, this is asserting that the top {n/m \times n/m} minor of a random matrix {M} has spectral measure approximately equal to that of an arithmetic mean {\frac{1}{m} (M_1 + \dots + M_m)} of {m} independent copies of {M}, so that the process of taking top left minors is in some sense a continuous analogue of the process of taking freely independent arithmetic means. There ought to be a geometric proof of this assertion, but I do not know of one. In the limit {m \rightarrow \infty} (or {\alpha \rightarrow 0}), the {R}-transform becomes linear and the spectral measure becomes semicircular, which is of course consistent with the free central limit theorem.
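The minor-versus-free-arithmetic-mean statement can be tested in simulation for GUE with {m=2} (a rough Monte Carlo sketch; the dimension, seed, and tolerance are arbitrary), comparing second moments of the empirical spectra, which should both be close to {\sigma^2/2}:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n = 1.0, 400

def gue(k, entry_var):
    # k x k Hermitian matrix whose entries have variance entry_var
    X = (rng.normal(size=(k, k)) + 1j * rng.normal(size=(k, k))) / np.sqrt(2)
    return (X + X.conj().T) / np.sqrt(2) * np.sqrt(entry_var)

# top-left n/2 minor of an n x n GUE, entries of variance sigma^2 / n
M = gue(n, sigma**2 / n)
minor_eigs = np.linalg.eigvalsh(M[: n // 2, : n // 2])

# arithmetic mean of m = 2 independent copies at the matching dimension n/2
M1 = gue(n // 2, sigma**2 / (n // 2))
M2 = gue(n // 2, sigma**2 / (n // 2))
mean_eigs = np.linalg.eigvalsh((M1 + M2) / 2)

# both empirical second moments should be close to sigma^2 / 2
m_minor = np.mean(minor_eigs**2)
m_mean = np.mean(mean_eigs**2)
assert abs(m_minor - 0.5) < 0.05 and abs(m_mean - 0.5) < 0.05
```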

In a similar vein, if one defines the function

\displaystyle  \omega(t,z) := \alpha \int_{\bf R} \frac{zx}{1-zx}\ d\mu_\alpha(x) = e^{-t} (- 1 - z^{-1} s(t, z^{-1}))

and inverts it to obtain a function {z(t,\omega)} with

\displaystyle  \omega(t, z(t,\omega)) = \omega

for all {t, \omega}, then the {S}-transform {S(t,\omega)} is defined by

\displaystyle  S(t,\omega) := \frac{1+\omega}{\omega} z(t,\omega).


Since

\displaystyle  s(t,z) = - z^{-1} ( 1 + e^t \omega(t, z^{-1}) )

for any {t}, {z}, we have

\displaystyle  z_0 + s(0,z_0)^{-1} (1-e^{-t}) = z_0 \frac{\omega(0,z_0^{-1})+e^{-t}}{\omega(0,z_0^{-1})+1}

and so (9) becomes

\displaystyle  - z_0^{-1} \frac{\omega(0,z_0^{-1})+1}{\omega(0,z_0^{-1})+e^{-t}} (1 + e^{t} \omega(t, z_0^{-1} \frac{\omega(0,z_0^{-1})+1}{\omega(0,z_0^{-1})+e^{-t}}))

\displaystyle = - e^t z_0^{-1} (1 + \omega(0, z_0^{-1}))

which simplifies to

\displaystyle  \omega(t, z_0^{-1} \frac{\omega(0,z_0^{-1})+1}{\omega(0,z_0^{-1})+e^{-t}}) = \omega(0, z_0^{-1});

replacing {z_0} by {z(0,\omega)^{-1}} we obtain

\displaystyle  \omega(t, z(0,\omega) \frac{\omega+1}{\omega+e^{-t}}) = \omega

and thus

\displaystyle  z(0,\omega)\frac{\omega+1}{\omega+e^{-t}} = z(t, \omega)

and hence

\displaystyle  S(0, \omega) = \frac{\omega+e^{-t}}{\omega+1} S(t, \omega).

One can compute {\frac{\omega+e^{-t}}{\omega+1}} to be the {S}-transform of the measure {(1-\alpha) \delta_0 + \alpha \delta_1}; from the link between {S}-transforms and free products (see e.g. these notes of Guionnet), we conclude that {(1-\alpha)\delta_0 + \alpha \mu_\alpha} is the free product of {\mu_1} and {(1-\alpha) \delta_0 + \alpha \delta_1}. This is consistent with the random matrix theory interpretation, since {(1-\alpha)\delta_0 + \alpha \mu_\alpha} is also the spectral measure of {PMP}, where {P} is the orthogonal projection to the span of the first {\alpha n} basis elements, so in particular {P} has spectral measure {(1-\alpha) \delta_0 + \alpha \delta_1}. If {M} is unitarily invariant then (by a fundamental result of Voiculescu) it is asymptotically freely independent of {P}, so the spectral measure of {PMP = P^{1/2} M P^{1/2}} is asymptotically the free product of that of {M} and of {P}.

Filed under: expository, math.PR, math.RA, math.SP Tagged: free probability, Gelfand-Tsetlin patterns, Schur complement

September 16, 2017

David Hogg: new parallel-play workshop

Today was the first try at a new group-meeting idea for my group. I invited my NYC close collaborators to my (new) NYU office (which is also right across the hall from Huppenkothen and Leistedt) to work on whatever they are working on. The idea is that we will work in parallel (and independently), but we are all there to answer questions, discuss, debug, and pair-code. It was intimate today, but successful. Megan Bedell (Flatiron) and I debugged a part of her code that infers the telluric absorption spectrum (in a data-driven way, of course). And Elisabeth Andersson (NYU) got kplr and batman installed inside the sandbox that runs her Jupyter notebooks.

David Hogg: latent variable models, weak lensing

The day started with a call with Bernhard Schölkopf (MPI-IS), Hans-Walter Rix (MPIA), and Markus Bonse (Darmstadt) to discuss taking Christina Eilers's (MPIA) problem of modeling spectra with partial labels over to a latent-variable model, probably starting with the GPLVM. We discussed data format and how we might start. There is a lot of work in astronomy using GANs and deep learning to make data generators. These are great, but we are betting it will be easier to put causal structure that we care about into the latent-variable model.

At Cosmology & Data Group Meeting at Flatiron, the whole group discussed the big batch of weak lensing results released by the Dark Energy Survey last month. A lot of the discussion was about understanding the covariances of the likelihood information coming from the weak lensing. This is a bit hard to understand, because everyone uses highly informative priors (for good reasons, of course) from prior data. We also discussed the multiplicative bias and other biases in shape measurement; how might we constrain these independently from the cosmological parameters themselves? Data simulations, of course, but most of us would like to see a measurement to constrain them.

At the end of Cosmology Meeting, Ben Wandelt (Flatiron) and I spent time discussing projects of mutual interest. In particular we discussed dimensionality reduction related to galaxy morphologies and spatially resolved spectroscopy, in part inspired by the weak-lensing discussion, and also the future of Euclid.

September 15, 2017

Terence Tao: Continuous analogues of the Schur and skew Schur polynomials

Fix a non-negative integer {k}. Define a (weak) integer partition of length {k} to be a tuple {\lambda = (\lambda_1,\dots,\lambda_k)} of non-increasing non-negative integers {\lambda_1 \geq \dots \geq \lambda_k \geq 0}. (Here our partitions are “weak” in the sense that we allow some parts of the partition to be zero. Henceforth we will omit the modifier “weak”, as we will not need to consider the more usual notion of “strong” partitions.) To each such partition {\lambda}, one can associate a Young diagram consisting of {k} left-justified rows of boxes, with the {i^{th}} row containing {\lambda_i} boxes. A semi-standard Young tableau (or Young tableau for short) {T} of shape {\lambda} is a filling of these boxes by integers in {\{1,\dots,k\}} that is weakly increasing along rows (moving rightwards) and strictly increasing along columns (moving downwards). The collection of such tableaux will be denoted {{\mathcal T}_\lambda}. The weight {|T|} of a tableau {T} is the tuple {(n_1,\dots,n_k)}, where {n_i} is the number of occurrences of the integer {i} in the tableau. For instance, if {k=3} and {\lambda = (6,4,2)}, an example of a Young tableau of shape {\lambda} would be

\displaystyle  \begin{tabular}{|c|c|c|c|c|c|} \hline 1 & 1 & 1 & 2 & 3 & 3 \\ \cline{1-6} 2 & 2 & 2 &3\\ \cline{1-4} 3 & 3\\ \cline{1-2} \end{tabular}

The weight here would be {|T| = (3,4,5)}.

To each partition {\lambda} one can associate the Schur polynomial {s_\lambda(u_1,\dots,u_k)} on {k} variables {u = (u_1,\dots,u_k)}, which we will define as

\displaystyle  s_\lambda(u) := \sum_{T \in {\mathcal T}_\lambda} u^{|T|}

using the multinomial convention

\displaystyle (u_1,\dots,u_k)^{(n_1,\dots,n_k)} := u_1^{n_1} \dots u_k^{n_k}.

Thus for instance the Young tableau {T} given above would contribute a term {u_1^3 u_2^4 u_3^5} to the Schur polynomial {s_{(6,4,2)}(u_1,u_2,u_3)}. In the case of partitions of the form {(n,0,\dots,0)}, the Schur polynomial {s_{(n,0,\dots,0)}} is just the complete homogeneous symmetric polynomial {h_n} of degree {n} on {k} variables:

\displaystyle  s_{(n,0,\dots,0)}(u_1,\dots,u_k) := \sum_{n_1,\dots,n_k \geq 0: n_1+\dots+n_k = n} u_1^{n_1} \dots u_k^{n_k},

thus for instance

\displaystyle  s_{(3,0)}(u_1,u_2) = u_1^3 + u_1^2 u_2 + u_1 u_2^2 + u_2^3.

Schur polynomials are ubiquitous in the algebraic combinatorics of “type {A} objects” such as the symmetric group {S_k}, the general linear group {GL_k}, or the unitary group {U_k}. For instance, one can view {s_\lambda} as the character of an irreducible polynomial representation of {GL_k({\bf C})} associated with the partition {\lambda}. However, we will not focus on these interpretations of Schur polynomials in this post.

This definition of Schur polynomials allows for a way to describe the polynomials recursively. If {k > 1} and {T} is a Young tableau of shape {\lambda = (\lambda_1,\dots,\lambda_k)}, taking values in {\{1,\dots,k\}}, one can form a sub-tableau {T'} of some shape {\lambda' = (\lambda'_1,\dots,\lambda'_{k-1})} by removing all the appearances of {k} (which, among other things, necessarily deletes the {k^{th}} row). For instance, with {T} as in the previous example, the sub-tableau {T'} would be

\displaystyle  \begin{tabular}{|c|c|c|c|} \hline 1 & 1 & 1 & 2 \\ \cline{1-4} 2 & 2 & 2 \\ \cline{1-3} \end{tabular}

and the reduced partition {\lambda'} in this case is {(4,3)}. As Young tableaux are required to be strictly increasing down columns, we can see that the reduced partition {\lambda'} must intersperse the original partition {\lambda} in the sense that

\displaystyle  \lambda_{i+1} \leq \lambda'_i \leq \lambda_i \ \ \ \ \ (1)

for all {1 \leq i \leq k-1}; we denote this interspersion relation as {\lambda' \prec \lambda} (though we caution that this is not intended to be a partial ordering). In the converse direction, if {\lambda' \prec \lambda} and {T'} is a Young tableau with shape {\lambda'} with entries in {\{1,\dots,k-1\}}, one can form a Young tableau {T} with shape {\lambda} and entries in {\{1,\dots,k\}} by appending to {T'} an entry of {k} in all the boxes that appear in the {\lambda} shape but not the {\lambda'} shape. This one-to-one correspondence leads to the recursion

\displaystyle  s_\lambda(u) = \sum_{\lambda' \prec \lambda} s_{\lambda'}(u') u_k^{|\lambda| - |\lambda'|} \ \ \ \ \ (2)

where {u = (u_1,\dots,u_k)}, {u' = (u_1,\dots,u_{k-1})}, and the size {|\lambda|} of a partition {\lambda = (\lambda_1,\dots,\lambda_k)} is defined as {|\lambda| := \lambda_1 + \dots + \lambda_k}.

One can use this recursion (2) to prove some further standard identities for Schur polynomials, such as the determinant identity

\displaystyle  s_\lambda(u) V(u) = \det( u_i^{\lambda_j+k-j} )_{1 \leq i,j \leq k} \ \ \ \ \ (3)

for {u=(u_1,\dots,u_k)}, where {V(u)} denotes the Vandermonde determinant

\displaystyle  V(u) := \prod_{1 \leq i < j \leq k} (u_i - u_j), \ \ \ \ \ (4)

or the Jacobi-Trudi identity

\displaystyle  s_\lambda(u) = \det( h_{\lambda_j - j + i}(u) )_{1 \leq i,j \leq k}, \ \ \ \ \ (5)

with the convention that {h_d(u) = 0} if {d} is negative. Thus for instance

\displaystyle s_{(1,1,0,\dots,0)}(u) = h_1^2(u) - h_0(u) h_2(u) = \sum_{1 \leq i < j \leq k} u_i u_j.

We review the (standard) derivation of these identities via (2) below the fold. Among other things, these identities show that the Schur polynomials are symmetric, which is not immediately obvious from their definition.
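To make the recursion concrete, here is a short script (a sketch, not from the post) that evaluates {s_\lambda(u)} by iterating (2) over interspersed partitions and checks it against the determinant identity (3):

```python
import numpy as np
from itertools import product

def schur(lam, u):
    # evaluate s_lambda(u) via the branching rule (2): sum over
    # partitions lam' interspersing lam, i.e. lam_{i+1} <= lam'_i <= lam_i
    if not lam:
        return 1.0
    ranges = [range(lam[i + 1], lam[i] + 1) for i in range(len(lam) - 1)]
    return sum(
        schur(lamp, u[:-1]) * u[-1] ** (sum(lam) - sum(lamp))
        for lamp in product(*ranges)
    )

lam, u = (6, 4, 2), (1.0, 2.0, 3.0)
k = len(lam)

# determinant identity (3): s_lambda(u) V(u) = det(u_i^{lam_j + k - j})
det = np.linalg.det(np.array([[ui ** (lam[j] + k - 1 - j) for j in range(k)] for ui in u]))
vand = np.prod([u[i] - u[j] for i in range(k) for j in range(i + 1, k)])
assert np.isclose(schur(lam, u), det / vand)

# the complete homogeneous example s_{(3,0)}(u_1,u_2) from above
assert np.isclose(schur((3, 0), (2.0, 3.0)), 2.0**3 + 2.0**2 * 3 + 2.0 * 3**2 + 3.0**3)
```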

One can also iterate (2) to write

\displaystyle  s_\lambda(u) = \sum_{() = \lambda^0 \prec \lambda^1 \prec \dots \prec \lambda^k = \lambda} \prod_{j=1}^k u_j^{|\lambda^j| - |\lambda^{j-1}|} \ \ \ \ \ (6)

where the sum is over all tuples {\lambda^1,\dots,\lambda^k}, where each {\lambda^j} is a partition of length {j} that intersperses the next partition {\lambda^{j+1}}, with {\lambda^k} set equal to {\lambda}. We will call such a tuple an integral Gelfand-Tsetlin pattern based at {\lambda}.

One can generalise (6) by introducing the skew Schur functions

\displaystyle  s_{\lambda/\mu}(u) := \sum_{\mu = \lambda^i \prec \dots \prec \lambda^k = \lambda} \prod_{j=i+1}^k u_j^{|\lambda^j| - |\lambda^{j-1}|} \ \ \ \ \ (7)

for {u = (u_{i+1},\dots,u_k)}, whenever {\lambda} is a partition of length {k} and {\mu} a partition of length {i} for some {0 \leq i \leq k}, thus the Schur polynomial {s_\lambda} is also the skew Schur polynomial {s_{\lambda /()}} with {i=0}. (One could relabel the variables here to be something like {(u_1,\dots,u_{k-i})} instead, but this labeling seems slightly more natural, particularly in view of identities such as (8) below.)

By construction, we have the decomposition

\displaystyle  s_{\lambda/\nu}(u_{i+1},\dots,u_k) = \sum_\mu s_{\mu/\nu}(u_{i+1},\dots,u_j) s_{\lambda/\mu}(u_{j+1},\dots,u_k) \ \ \ \ \ (8)

whenever {0 \leq i \leq j \leq k}, and {\nu, \mu, \lambda} are partitions of lengths {i,j,k} respectively. This gives another recursive way to understand Schur polynomials and skew Schur polynomials. For instance, one can use it to establish the generalised Jacobi-Trudi identity

\displaystyle  s_{\lambda/\mu}(u) = \det( h_{\lambda_j - j - \mu_i + i}(u) )_{1 \leq i,j \leq k}, \ \ \ \ \ (9)

with the convention that {\mu_i = 0} for {i} larger than the length of {\mu}; we do this below the fold.
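As a sanity check on (9), one can also compare the determinant against a brute-force evaluation of the defining sum (7); for {\lambda = (3,2,1)}, {\mu = (1)} and {(u_2,u_3) = (2,3)} both sides come out to {180}. A throwaway sketch:

```python
from itertools import combinations_with_replacement, product
from math import prod

def h(d, u):
    """Complete homogeneous symmetric polynomial h_d(u); h_d = 0 for d < 0."""
    if d < 0:
        return 0
    return sum(prod(c) for c in combinations_with_replacement(u, d))

def det(m):
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def skew_schur(lam, mu, u):
    """s_{lam/mu}(u) via the chains mu = lam^i < ... < lam^k = lam of (7);
    len(u) must equal len(lam) - len(mu)."""
    if len(lam) == len(mu):
        return 1 if tuple(lam) == tuple(mu) else 0
    ranges = [range(lam[i + 1], lam[i] + 1) for i in range(len(lam) - 1)]
    return sum(skew_schur(lp, mu, u[:-1]) * u[-1] ** (sum(lam) - sum(lp))
               for lp in product(*ranges))

lam, mu, u = (3, 2, 1), (1,), (2, 3)
k = len(lam)
mu_padded = mu + (0,) * (k - len(mu))   # convention: mu_i = 0 beyond its length

direct = skew_schur(lam, mu, u)
jacobi_trudi = det([[h(lam[j] - (j + 1) - mu_padded[i] + (i + 1), u)
                     for j in range(k)] for i in range(k)])
print(direct, jacobi_trudi)  # 180 180
```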

The Schur polynomials (and skew Schur polynomials) are “discretised” (or “quantised”) in the sense that their parameters {\lambda, \mu} are required to be integer-valued, and their definition similarly involves summation over a discrete set. It turns out that there are “continuous” (or “classical”) analogues of these functions, in which the parameters {\lambda,\mu} now take real values rather than integers, and are defined via integration rather than summation. One can view these continuous analogues as a “semiclassical limit” of their discrete counterparts, in a manner that can be made precise using the machinery of geometric quantisation, but we will not do so here.

The continuous analogues can be defined as follows. Define a real partition of length {k} to be a tuple {\lambda = (\lambda_1,\dots,\lambda_k)} where {\lambda_1 \geq \dots \geq \lambda_k \geq 0} are now real numbers. We can define the relation {\lambda' \prec \lambda} of interspersion between a length {k-1} real partition {\lambda' = (\lambda'_1,\dots,\lambda'_{k-1})} and a length {k} real partition {\lambda = (\lambda_1,\dots,\lambda_{k})} precisely as before, by requiring that the inequalities (1) hold for all {1 \leq i \leq k-1}. We can then define the continuous Schur functions {S_\lambda(x)} for {x = (x_1,\dots,x_k) \in {\bf R}^k} recursively by defining

\displaystyle  S_{()}() = 1


\displaystyle  S_\lambda(x) = \int_{\lambda' \prec \lambda} S_{\lambda'}(x') \exp( (|\lambda| - |\lambda'|) x_k )\ d\lambda' \ \ \ \ \ (10)

for {k \geq 1} and {\lambda} of length {k}, where {x' := (x_1,\dots,x_{k-1})} and the integral is with respect to {k-1}-dimensional Lebesgue measure, and {|\lambda| = \lambda_1 + \dots + \lambda_k} as before. Thus for instance

\displaystyle  S_{(\lambda_1)}(x_1) = \exp( \lambda_1 x_1 )


\displaystyle  S_{(\lambda_1,\lambda_2)}(x_1,x_2) = \int_{\lambda_2}^{\lambda_1} \exp( \lambda'_1 x_1 + (\lambda_1+\lambda_2-\lambda'_1) x_2 )\ d\lambda'_1.

More generally, we can define the continuous skew Schur functions {S_{\lambda/\mu}(x)} for {\lambda} of length {k}, {\mu} of length {j \leq k}, and {x = (x_{j+1},\dots,x_k) \in {\bf R}^{k-j}} recursively by defining

\displaystyle  S_{\mu/\mu}() = 1


\displaystyle  S_{\lambda/\mu}(x) = \int_{\lambda' \prec \lambda} S_{\lambda'/\mu}(x') \exp( (|\lambda| - |\lambda'|) x_k )\ d\lambda'

for {k > j}. Thus for instance

\displaystyle  S_{(\lambda_1,\lambda_2,\lambda_3)/(\mu_1,\mu_2)}(x_3) = 1_{\lambda_3 \leq \mu_2 \leq \lambda_2 \leq \mu_1 \leq \lambda_1} \exp( x_3 (\lambda_1+\lambda_2+\lambda_3 - \mu_1 - \mu_2 ))


\displaystyle  S_{(\lambda_1,\lambda_2,\lambda_3)/(\mu_1)}(x_2, x_3) = \int_{\lambda_3 \leq \lambda'_2 \leq \lambda_2, \mu_1} \int_{\mu_1, \lambda_2 \leq \lambda'_1 \leq \lambda_1}

\displaystyle \exp( x_2 (\lambda'_1+\lambda'_2 - \mu_1) + x_3 (\lambda_1+\lambda_2+\lambda_3 - \lambda'_1 - \lambda'_2))\ d\lambda'_1 d\lambda'_2.

By expanding out the recursion, one obtains the analogue

\displaystyle  S_\lambda(x) = \int_{() = \lambda^0 \prec \lambda^1 \prec \dots \prec \lambda^k = \lambda} \exp( \sum_{j=1}^k x_j (|\lambda^j| - |\lambda^{j-1}|))\ d\lambda^1 \dots d\lambda^{k-1},

of (6), and more generally one has

\displaystyle  S_{\lambda/\mu}(x) = \int_{\mu = \lambda^i \prec \dots \prec \lambda^k = \lambda} \exp( \sum_{j=i+1}^k x_j (|\lambda^j| - |\lambda^{j-1}|))\ d\lambda^{i+1} \dots d\lambda^{k-1}.

We will call the tuples {(\lambda^1,\dots,\lambda^k)} in the first integral real Gelfand-Tsetlin patterns based at {\lambda}. The analogue of (8) is then

\displaystyle  S_{\lambda/\nu}(x_{i+1},\dots,x_k) = \int S_{\mu/\nu}(x_{i+1},\dots,x_j) S_{\lambda/\mu}(x_{j+1},\dots,x_k)\ d\mu

where the integral is over all real partitions {\mu} of length {j}, with Lebesgue measure.

By approximating various integrals by their Riemann sums, one can relate the continuous Schur functions to their discrete counterparts by the limiting formula

\displaystyle  N^{-k(k-1)/2} s_{\lfloor N \lambda \rfloor}( \exp[ x/N ] ) \rightarrow S_\lambda(x) \ \ \ \ \ (11)

as {N \rightarrow \infty} for any length {k} real partition {\lambda = (\lambda_1,\dots,\lambda_k)} and any {x = (x_1,\dots,x_k) \in {\bf R}^k}, where

\displaystyle  \lfloor N \lambda \rfloor := ( \lfloor N \lambda_1 \rfloor, \dots, \lfloor N \lambda_k \rfloor )


\displaystyle  \exp[x/N] := (\exp(x_1/N), \dots, \exp(x_k/N)).

More generally, one has

\displaystyle  N^{j(j-1)/2-k(k-1)/2} s_{\lfloor N \lambda \rfloor / \lfloor N \mu \rfloor}( \exp[ x/N ] ) \rightarrow S_{\lambda/\mu}(x)

as {N \rightarrow \infty} for any length {k} real partition {\lambda}, any length {j} real partition {\mu} with {0 \leq j \leq k}, and any {x = (x_{j+1},\dots,x_k) \in {\bf R}^{k-j}}.
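For {k=2} the convergence in (11) is easy to observe numerically: the discrete Schur polynomial is a single geometric sum and the normalising power is just {N^{-1}}. A rough sketch (test values arbitrary; the limit uses the closed-form evaluation of the {k=2} integral above):

```python
import math

def schur2(a, b, u1, u2):
    """s_{(a,b)}(u1, u2) = sum_{j=b}^{a} u1^j u2^{a+b-j}, from the recursion (2)."""
    return sum(u1 ** j * u2 ** (a + b - j) for j in range(b, a + 1))

l1, l2, x1, x2 = 1.7, 0.4, 0.9, -0.3
# S_{(l1,l2)}(x1,x2), evaluating the k=2 integral in closed form:
S = (math.exp(x1 * l1 + x2 * l2) - math.exp(x1 * l2 + x2 * l1)) / (x1 - x2)

for N in (100, 1000, 10000):
    approx = schur2(int(N * l1), int(N * l2),
                    math.exp(x1 / N), math.exp(x2 / N)) / N   # N^{-k(k-1)/2} = 1/N
    print(N, abs(approx - S) / S)   # relative error shrinks like O(1/N)
```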

As a consequence of these limiting formulae, one expects all of the discrete identities above to have continuous counterparts. This is indeed the case; below the fold we shall prove the discrete and continuous identities in parallel. These are not new results by any means, but I was not able to locate a good place in the literature where they are explicitly written down, so I thought I would try to do so here (primarily for my own internal reference, but perhaps the calculations will be worthwhile to some others also).

— 1. Proofs of identities —

We first prove the determinant identity (3), by induction on {k}. The case {k=0} is trivial (one could also use {k=1} as the base case if desired); now suppose {k \geq 1} and the claim has already been proven for {k-1}. Writing {u = (u',u_k)} with {u' = (u_1,\dots,u_{k-1})}, we have from (4) that

\displaystyle  V(u) = V(u') \prod_{i=1}^{k-1} (u_i - u_k) \ \ \ \ \ (12)

so by (2) it will suffice to show that

\displaystyle  \sum_{\lambda' \prec \lambda} \det( u_i^{\lambda'_j+k-1-j} )_{1 \leq i,j \leq k-1} u_k^{|\lambda| - |\lambda'|} \prod_{i=1}^{k-1} (u_i - u_k) = \det( u_i^{\lambda_j+k-j} )_{1 \leq i,j \leq k}.

By continuity we may assume {u_k} is non-zero. Both sides are homogeneous in {u} of degree {|\lambda|+k(k-1)/2}, so without loss of generality we may normalise {u_k=1}, thus we need to show

\displaystyle  \sum_{\lambda' \prec \lambda} \det( u_i^{\lambda'_j+k-1-j} )_{1 \leq i,j \leq k-1} \prod_{i=1}^{k-1} (u_i - 1) = \det( u_i^{\lambda_j+k-j} )_{1 \leq i,j \leq k} \ \ \ \ \ (13)

where the bottom row of the matrix on the right-hand side consists entirely of {1}s.

The sum {\sum_{\lambda' \prec \lambda}} can be factored into {k-1} sums {\sum_{\lambda_{j+1} \leq \lambda'_j \leq \lambda_j}} for {j=1,\dots,k-1}. By the multilinearity of the determinant, the left-hand side of (13) may thus be written as

\displaystyle  \det( (u_i-1) \sum_{\lambda_{j+1} \leq \lambda'_j \leq \lambda_j} u_i^{\lambda'_j+k-1-j} )_{1 \leq i,j \leq k-1}.

This telescopes to

\displaystyle  \det( u_i^{\lambda_j+k-j} - u_i^{\lambda_{j+1}+k-(j+1)} )_{1 \leq i,j \leq k-1}.

By multilinearity, this expands out to an alternating sum of {2^{k-1}} terms; however, all but {k} of these terms vanish because two of their columns are identical. The {k} terms that survive are of the form

\displaystyle  (-1)^{k-a} \det( u_i^{\lambda_j+k-j} )_{1 \leq i \leq k-1; j \in \{1,\dots,k\} \backslash \{a\}}

for {a=1,\dots,k} (where we enumerate {\{1,\dots,k\} \backslash \{a\}} in increasing order); but this sums to {\det( u_i^{\lambda_j+k-j} )_{1 \leq i,j \leq k}} after performing cofactor expansion on the bottom row of the latter determinant. This proves (3).

The continuous analogue of (3) is

\displaystyle  S_\lambda(x) V(x) = \det( \exp( x_i \lambda_j ) )_{1 \leq i,j \leq k}

and can either be proven from (3) and (11), or by mimicking the proof of (3) (replacing sums by integrals). We do the latter, leaving the former as an exercise for the reader. (This identity is also discussed at this MathOverflow question of mine, where it was noted that it essentially appears in this paper of Shatashvili; Apoorva Khare and I also used it in this recent paper.) Again we induct on {k}; the {k=0} case is trivial, so suppose {k \geq 1} and the claim has already been proven for {k-1}. Since

\displaystyle  S_\lambda(x) = \int_{\lambda' \prec \lambda} S_{\lambda'}(x') \exp( x_k (|\lambda| - |\lambda'|) )\ d\lambda'

it will suffice by (10) and (12) to prove that

\displaystyle  \int_{\lambda' \prec \lambda} \det( \exp( x_i \lambda'_j ) )_{1 \leq i,j \leq k-1} \exp( x_k (|\lambda| - |\lambda'|)) \prod_{i=1}^{k-1} (x_i - x_k)\ d\lambda'

\displaystyle = \det( \exp( x_i \lambda_j ) )_{1 \leq i,j \leq k}.

If we shift all of the {x_i} by the same shift {h}, both sides of this identity multiply by {\exp( h |\lambda| )}, so we may normalise {x_k=0}. Our task is now to show that

\displaystyle  \int_{\lambda' \prec \lambda} \det( \exp( x_i \lambda'_j ) )_{1 \leq i,j \leq k-1} \prod_{i=1}^{k-1} x_i\ d\lambda'

\displaystyle  = \det( \exp( x_i \lambda_j ) )_{1 \leq i,j \leq k}, \ \ \ \ \ (14)

where the matrix on the right-hand side has a bottom row consisting entirely of {1}s.

The integral {\int_{\lambda' \prec \lambda}\ d\lambda'} can be factored into {k-1} integrals {\int_{\lambda_{j+1}}^{\lambda_j}\ d\lambda'_j} for {j=1,\dots,k-1}. By the multilinearity of the determinant, the left-hand side of (14) may thus be written as

\displaystyle  \det( x_i \int_{\lambda_{j+1}}^{\lambda_j} \exp( x_i \lambda'_j )\ d\lambda'_j )_{1 \leq i,j \leq k-1}.

By the fundamental theorem of calculus, this evaluates to

\displaystyle  \det( \exp( x_i \lambda_j ) - \exp( x_i \lambda_{j+1} ) )_{1 \leq i,j \leq k-1}.

Again, this expands to {2^{k-1}} terms, all but {k} of which vanish, and the remaining {k} terms form the cofactor expansion of the right-hand side of (14).

Remark 1 Comparing (13) with (14) we obtain a relation between the discrete and continuous Schur functions, namely that

\displaystyle  s_\lambda(\exp[x]) V(\exp[x]) = S_{(\lambda_1+k-1,\dots,\lambda_k)}(x) V(x)

for any integer partition {\lambda} and any {x \in {\bf R}^k}. One can use this identity to obtain an alternate proof of the limiting relation (11).

Now we turn to (5), which can be proven by a similar argument to (3). Again, the base case {k=0} (or {k=1}, if one prefers) is trivial, so suppose {k \geq 1} and the claim has already been proven for {k-1}. By (2) it will suffice to show that

\displaystyle  \sum_{\lambda' \prec \lambda} \det( h_{\lambda'_j - j + i}(u') )_{1 \leq i,j \leq k-1} u_k^{|\lambda|-|\lambda'|} = \det( h_{\lambda_j - j + i}(u) )_{1 \leq i,j \leq k}. \ \ \ \ \ (15)

Both sides are homogeneous of degree {|\lambda|}, so as before we may normalise {u_k=1}. Factoring the left-hand side summation into {k-1} summations and using multilinearity as before, the left-hand side may be written as

\displaystyle  \det( \sum_{\lambda_{j+1} \leq \lambda'_j \leq \lambda_j} h_{\lambda'_j - j + i}(u') )_{1 \leq i,j \leq k-1}.

Now one observes the identities

\displaystyle  h_{\lambda_j - j + i}(u) = \sum_{\lambda'_j \leq \lambda_j} h_{\lambda'_j - j + i}(u')

and similarly

\displaystyle  h_{\lambda_{j+1} - (j+1) + i}(u) = \sum_{\lambda'_j < \lambda_{j+1}} h_{\lambda'_j - j + i}(u')

(where {\lambda'_j} is understood to range over the integers), hence on subtracting

\displaystyle  h_{\lambda_j - j+i}(u) - h_{\lambda_{j+1} - (j+1) + i}(u) = \sum_{\lambda_{j+1} \leq \lambda'_j \leq \lambda_j} h_{\lambda'_j - j + i}(u')

and so the above determinant may be written as

\displaystyle  \det( h_{\lambda_j - j+i}(u) - h_{\lambda_{j+1} - (j+1) + i}(u) )_{1 \leq i,j \leq k-1}.

Again, this expands into {2^{k-1}} terms, all but {k} of which vanish, and which can be collected by cofactor expansion to become the determinant of the {k \times k} matrix whose top {k-1} rows are {(h_{\lambda_j - j+i}(u))_{1 \leq i \leq k-1; 1 \leq j \leq k}}, and whose bottom row {(1)_{1 \leq j \leq k}} consists entirely of {1}s.

Now we use the identity

\displaystyle  1 = \sum_{S \subset \{1,\dots,k-1\}} (-1)^{|S|} h_{d-|S|}(u) \prod_{i \in S} u_i

for any {d \geq 0}. To verify this identity, we observe that the {u^n} coefficient of the right-hand side is equal to

\displaystyle  \sum_{S \subset \{1 \leq i \leq k-1: n_i \neq 0\}} (-1)^{|S|}

if {|n| \leq d}, and zero otherwise; but from the binomial theorem we see that this coefficient is {1} when {n=0} and {0} otherwise, giving the claim. Using this identity with {d = \lambda_j - j + k}, we can write the bottom row {(1)_{1 \leq j \leq k}} as {(h_{\lambda_j-j+k}(u))} plus a linear combination of {(h_{\lambda_j-j+i}(u))} for {i=1,\dots,k-1}, so after some row operations we conclude (15). The generalised Jacobi-Trudi identity (9) is proven similarly (keeping {\mu} fixed, and inducting on the length of {\lambda}); we leave this to the interested reader.

The continuous analogue of the Jacobi-Trudi identity (5) is a little less intuitive. The analogue of the complete homogeneous polynomials

\displaystyle  h_n(u_1,\dots,u_k) = \sum_{n_1+\dots+n_k=n: n_1,\dots,n_k \geq 0} u_1^{n_1} \dots u_k^{n_k}

for {n \geq 0} an integer, will be the functions

\displaystyle  H_t(x_1,\dots,x_k) := \int_{t_1+\dots+t_k=t: t_1,\dots,t_k \geq 0} \exp( t_1 x_1 + \dots + t_k x_k)\ dt_1 \dots dt_{k-1}

for {t \geq 0} a real number. Thus for example {H_t(x_1) = \exp(tx_1)} when {k=1}, while {H_t(x_1,0) = \frac{\exp(tx_1) - 1}{x_1}} when {k=2} and {t \geq 0}. By rescaling one may write

\displaystyle  H_t(x_1,\dots,x_k)

\displaystyle = t^{k-1} \int_{t_1+\dots+t_k=1: t_1,\dots,t_k \geq 0} \exp( t_1 t x_1 + \dots + t_k t x_k)\ dt_1 \dots dt_{k-1},

at which point it is clear that these expressions are smooth in {t} for any {t \geq 0}, so we may form derivatives {H^{(j)}_t(x) = \frac{d^j}{dt^j} H_t(x)} for any non-negative integer {j} and any {t \geq 0}; here our differentiation will always be in the {t} variable rather than the {x} variables. The analogue of (5) is then

\displaystyle  S_\lambda(x) = \det( H^{(i-1)}_{\lambda_j}(x) )_{1 \leq i,j \leq k}, \ \ \ \ \ (16)

thus for instance

\displaystyle  S_{(\lambda_1)}(x_1) = H_{\lambda_1}(x_1)


\displaystyle  S_{(\lambda_1,\lambda_2)}(x_1,x_2) = H_{\lambda_1}(x_1,x_2) H^{(1)}_{\lambda_2}(x_1,x_2) - H_{\lambda_2}(x_1,x_2) H^{(1)}_{\lambda_1}(x_1,x_2)

and so forth.
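For {k=2} everything is available in closed form, so this instance of (16) can be checked directly: from the definition, {H_t(x_1,x_2) = \frac{\exp(tx_1)-\exp(tx_2)}{x_1-x_2}}, while the integral defining {S_{(\lambda_1,\lambda_2)}} evaluates to {\frac{\exp(\lambda_1 x_1 + \lambda_2 x_2) - \exp(\lambda_2 x_1 + \lambda_1 x_2)}{x_1 - x_2}}. A quick numerical check with arbitrary test values:

```python
import math

x1, x2 = 0.9, -0.3
d = x1 - x2

def H(t):
    """H_t(x1, x2) = int_0^t exp(t1*x1 + (t-t1)*x2) dt1, in closed form."""
    return (math.exp(t * x1) - math.exp(t * x2)) / d

def H1(t):
    """d/dt of H_t(x1, x2)."""
    return (x1 * math.exp(t * x1) - x2 * math.exp(t * x2)) / d

l1, l2 = 1.7, 0.4
determinant = H(l1) * H1(l2) - H(l2) * H1(l1)                 # (16) for k=2
schur_value = (math.exp(x1 * l1 + x2 * l2)
               - math.exp(x1 * l2 + x2 * l1)) / d             # S_{(l1,l2)}(x)
print(abs(determinant - schur_value) < 1e-12)  # True
```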

As before, we may prove (16) by induction on {k}. The cases {k=0,1} are easy, so let us suppose {k \geq 2} and that the claim already holds for {k-1} (actually the inductive argument will also work for {k=1} if one pays careful attention to the conventions). By (10), it suffices to show that

\displaystyle  \int_{\lambda' \prec \lambda} \det( H^{(i-1)}_{\lambda'_j}(x') )_{1 \leq i,j \leq k-1} \exp(x_k (|\lambda|-|\lambda'|))\ d\lambda'

\displaystyle  = \det( H^{(i-1)}_{\lambda_j}(x) )_{1 \leq i,j \leq k} \ \ \ \ \ (17)

whenever {x = (x_1,\dots,x_k) \in {\bf R}^k}, {\lambda} is a real partition of length {k}, and {x' := (x_1,\dots,x_{k-1})}. Shifting all the {x_j} by {h} will multiply each {H_t(x)} by {\exp( ht)}, and (after some application of the Leibniz rule and row operations) can be seen to multiply both sides here by {\exp(h|\lambda|)}; thus we may normalise {x_k=0}. We can then factor the integral and use multilinearity of the determinant to write the left-hand side of (17) as

\displaystyle  \det( \int_{\lambda_{j+1}}^{\lambda_j} H^{(i-1)}_{\lambda'_j}(x')\ d\lambda'_j )_{1 \leq i,j \leq k-1}.

From construction we see that

\displaystyle  H_t(x) = \int_0^t H_{t'}(x')\ dt'

for any {t \geq 0}, and hence

\displaystyle  H_{t_2}(x) - H_{t_1}(x) = \int_{t_1}^{t_2} H_t(x')\ dt

for any {t_2 \geq t_1 \geq 0}; actually with the convention that {H_t = 0} for negative {t}, this identity holds for all {t_2 \geq t_1}. Shifting {t_1,t_2,t} by {h} and then differentiating repeatedly at {h=0}, we conclude that

\displaystyle  H^{(i-1)}_{t_2}(x) - H^{(i-1)}_{t_1}(x) = \int_{t_1}^{t_2} H^{(i-1)}_t(x')\ dt

for any natural number {i}. Thus we can rewrite the preceding determinant as

\displaystyle  \det( H^{(i-1)}_{\lambda_j}(x) - H^{(i-1)}_{\lambda_{j+1}}(x) )_{1 \leq i,j \leq k-1}.

Performing the now familiar manoeuvre of expanding out into {2^{k-1}} terms, observing that all but {k} of them vanish, and interpreting the surviving terms as cofactors, this is the determinant of the {k \times k} matrix whose top {k-1} rows are {( H^{(i-1)}_{\lambda_j}(x))_{1 \leq i \leq k-1; 1 \leq j \leq k}}, and whose bottom row is {(1)_{1 \leq j \leq k}}.

Next, we observe from definition that

\displaystyle  H_t(x_1,\dots,x_k) = \int_0^t H_{t'}(x_2,\dots,x_k) \exp( (t-t') x_1 )\ dt'

for any {t \geq 0} and {(x_1,\dots,x_k) \in {\bf R}^k}, and hence by the fundamental theorem of calculus

\displaystyle  (\frac{d}{dt} - x_1) H_t(x_1,\dots,x_k) = H_t(x_2,\dots,x_k).

Iterating this identity we conclude that

\displaystyle  (\frac{d}{dt}-x_{k-1}) \dots (\frac{d}{dt}-x_1) H_t(x_1,\dots,x_k) = H_t(x_k)

and in particular when {x_k=0} we have

\displaystyle  (\frac{d}{dt}-x_{k-1}) \dots (\frac{d}{dt}-x_1) H_t(x) = 1.

Thus we can write {1} as {H^{(k-1)}_t(x)} plus a linear combination of the {H^{(i-1)}_t(x)} for {i=1,\dots,k-1}, where the coefficients are independent of {t}. This allows us to write the bottom row {(1)_{1 \leq j \leq k}} as {(H^{(k-1)}_{\lambda_j}(x))_{1 \leq j \leq k}} plus a linear combination of the {(H^{(i-1)}_{\lambda_j}(x))_{1 \leq j \leq k}} for {i=1,\dots,k}, and (17) follows.

A similar argument gives the more general Jacobi-Trudi identity

\displaystyle  S_{\lambda/\mu}(x) = \det( ( H_{\lambda_j-\mu_i}(x) )_{1 \leq j \leq k; 1 \leq i \leq k'}, (H^{(i-1)}_{\lambda_j}(x))_{1 \leq j \leq k; 1 \leq i \leq k-k'} ),

whenever {\lambda} is a real partition of length {k}, {\mu} is a real partition of length {0 \leq k' \leq k}, {x = (x_{k'+1},\dots,x_k) \in {\bf R}^{k-k'}}, and one adopts the convention that {H_t} (and its first {k-1} derivatives) vanish for {t < 0}. Thus for instance

\displaystyle  S_{(\lambda_1,\lambda_2)/(\mu_1)}(x_2) = \det \begin{pmatrix} H_{\lambda_1-\mu_1}(x_2) & H_{\lambda_1}(x_2) \\ H_{\lambda_2 - \mu_1}(x_2) & H_{\lambda_2}(x_2) \end{pmatrix},

\displaystyle  S_{(\lambda_1,\lambda_2,\lambda_3)/(\mu_1)}(x_2,x_3) = \det \begin{pmatrix} H_{\lambda_1-\mu_1}(x_2,x_3) & H_{\lambda_1}(x_2,x_3) & H^{(1)}_{\lambda_1}(x_2,x_3) \\ H_{\lambda_2 - \mu_1}(x_2,x_3) & H_{\lambda_2}(x_2,x_3) & H^{(1)}_{\lambda_2}(x_2,x_3) \\ H_{\lambda_3 - \mu_1}(x_2,x_3) & H_{\lambda_3}(x_2,x_3) & H^{(1)}_{\lambda_3}(x_2,x_3) \end{pmatrix},

and so forth.

Exercise 2 If {\lambda,\mu} are real partitions of length {k} with positive entries, and {k' \geq k}, show that

\displaystyle  \det( H_{\lambda_i-\mu_j}(x))_{1 \leq i,j \leq k} = \lim_{\nu \rightarrow 0} \frac{1}{V(\nu)} S_{(\lambda,\nu)/\mu}(x)

for any {x \in {\bf R}^{k'-k}}, where {\nu} ranges over real partitions of length {k'-k} with distinct entries, and {(\lambda,\nu)} is the length {k'} partition formed by concatenating {\lambda} and {\nu} (this will also be a partition if {\nu} is sufficiently small).

(Sep 14: updated with several suggestions and corrections supplied by Darij Grinberg.)

Filed under: expository, math.CO, math.RA Tagged: determinants, Schur polynomials, skew-Schur functions

Doug NatelsonDOE experimental condensed matter physics PI meeting, day 3

And from the last half-day of the meeting:

  • Because the mobile electrons in graphene have an energy-momentum relationship similar to that of relativistic particles, the physics of electrons bound to atomic-scale defects in graphene has much in common with the physics that sets the limits on the stability of heavy atoms - when the kinetic energy of the electrons in the innermost orbitals is high enough that relativistic effects become very important.  It is possible to examine single defect sites with a scanning tunneling microscope and look at the energies of bound states, and see this kind of physics in 2d.  
  • There is a ton of activity concentrating on realizing Majorana fermions, expected to show up in the solid state when topologically interesting "edge states" are coupled to superconducting leads.  One way to do this would be to use the edge states of the quantum Hall effect, but usually the magnetic fields required to get in the quantum Hall regime don't play well with superconductivity.  Graphene can provide a way around this, with amorphous MoRe acting as very efficient superconducting contact material.  The results are some rather spectacular and complex superconducting devices (here and here).
  • With an excellent transmission electron microscope, it's possible to carve out atomically well defined holes in boron nitride monolayers, and then use those to create confined potential wells for carriers in graphene.  Words don't do justice to the fabrication process - it's amazing.  See here and here.
  • It's possible to induce and see big collective motions of a whole array of molecules on a surface that each act like little rotors.
  • In part due to the peculiar band structure of some topologically interesting materials, they can have truly remarkable nonlinear optical properties.
My apologies for not including everything - side discussions made it tough to take notes on everything, and the selection in these postings is set by that and not any judgment of excitement.  Likewise, the posters at the meeting were very informative, but I did not take notes on those.

September 14, 2017

David HoggGaia, asteroseismology, robots

In our panic about upcoming Gaia DR2, Adrian Price-Whelan and I have established a weekly workshop on Wednesdays, in which we discuss, hack, and parallel-work on Gaia projects in the library at the Flatiron CCA. In our first meeting we just said what we wanted to do, jointly edited a big shared google doc, and then started working. At each workshop meeting, we will spend some time talking and some time working. My plan is to do data-driven photometric parallaxes, and maybe infer some dust.

At the Stars Group Meeting, Stephen Feeney (Flatiron) talked about asteroseismology, where we are trying to get the seismic parameters without ever taking a Fourier Transform. Some of the crowd (Cantiello in particular) suggested that we have started on stars that are too hard; we should choose super-easy, super-bright, super-standard stars to start. Others in the crowd (Hawkins in particular) pointed out that we could be using asteroseismic H-R diagram priors on our inference. Why not be physically motivated? Duh.

At the end of Group Meeting, Kevin Schawinski (ETH) said a few words about auto-encoders. We discussed imposing more causal structure on them, and seeing what happens. He is going down this path. We also veered off into networks-of-autonomous-robots territory for LSST follow-up, keying off remarks from Or Graur (CfA) about time-domain and spectroscopic surveys. Building robots that know about scientific costs and utility is an incredibly promising direction, but hard.

David Hoggstatistics of power spectra

Daniela Huppenkothen (NYU) came to talk about power spectra and cross-spectra today. The idea of the cross-spectrum is that you multiply one signal's Fourier transform against the complex conjugate of the other's. If the signals are identical, this is the power spectrum. If they differ by phase lags, the answer has an imaginary part, and so on. We then launched into a long conversation about the distribution of cross-spectrum components given distributions for the original signals. In the simplest case, this is about distributions of sums of products of Gaussian-distributed variables, where analytic results are rare. And that's the simplest case!
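As a toy illustration of the phase-lag statement (the lag, seed, and frequency bin below are arbitrary choices of mine): delaying a signal puts a linear phase ramp into the cross-spectrum, while the power spectrum stays purely real.

```python
import cmath, random

random.seed(7)
n = 256
x = [random.gauss(0, 1) for _ in range(n)]
y = x[-3:] + x[:-3]                      # the same signal, delayed by 3 samples

def dft_bin(sig, k):
    """One bin of the discrete Fourier transform."""
    return sum(sig[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))

k = 5
X, Y = dft_bin(x, k), dft_bin(y, k)
power = X * X.conjugate()                # power spectrum: purely real
cross = X * Y.conjugate()                # cross-spectrum: complex
print(power.imag)                                    # 0.0
print(cmath.phase(cross), 2 * cmath.pi * 3 * k / n)  # agree: the lag is a phase
```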

One paradox or oddity that we discussed is the following: In a long time series, imagine that every time point gets a value (flux value, say) that is drawn from a very skew or very non-Gaussian distribution. Now take the Fourier transform. By central-limit reasoning, all the Fourier amplitudes must be very close to Gaussian-distributed! Where did the non-Gaussianity go? After all, the FT is simply a rotation in data space. I think it probably all went into the correlations of the Fourier amplitudes, but how to see that? These are old ideas that are well understood in signal processing, I am sure, but not by me!
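One can watch this happen in a small simulation (all parameters arbitrary): draw heavily skewed noise, compute a band of Fourier coefficients with a naive DFT, and compare sample skewnesses.

```python
import math, random

random.seed(42)
n = 1024
x = [random.expovariate(1.0) for _ in range(n)]     # very skewed marginal

def skew(a):
    """Sample skewness: third central moment over variance^(3/2)."""
    m = sum(a) / len(a)
    c = [v - m for v in a]
    m2 = sum(v * v for v in c) / len(c)
    return sum(v ** 3 for v in c) / len(c) / m2 ** 1.5

# real parts of a band of Fourier coefficients (naive DFT; fine for small n)
coeffs = [sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
          for k in range(1, 201)]

print(skew(x))       # close to 2, the skewness of the exponential distribution
print(skew(coeffs))  # close to 0: each coefficient is a sum of ~n terms
```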

September 13, 2017

Doug NatelsonDOE experimental condensed matter PI meeting, day 2

More things I learned:

  • I've talked about skyrmions before.  It turns out that by coupling a ferromagnet to a strong spin-orbit coupling metal, one can stabilize skyrmions at room temperature.  They can be visualized using magnetic transmission x-ray microscopy - focused, circularly polarized x-ray studies.   The skyrmion motion can show its own form of the Hall effect.  Moreover, it is possible to create structures where skyrmions can be created one at a time on demand, and moved back and forth in a strip of that material - analogous to a racetrack memory.
  • Patterned arrays of little magnetic islands continue to be a playground for looking at analogs of complicated magnetic systems.  They're a kind of magnetic metamaterial.  See here.  It's possible to build in frustration, and to look at how topologically protected magnetic excitations (rather like skyrmions) stick around and can't relax.
  • Topological insulator materials, with their large spin-orbit effects and surface spin-momentum locking, can be used to pump spin and flip magnets.  However, the electronic structure of both the magnet and the TI are changed when one is deposited on the other, due in part to interfacial charge transfer.
  • There continues to be remarkable progress on the growth and understanding of complex oxide heterostructures and interfaces - too many examples and things to describe.
  • The use of nonlinear optics to reveal complicated internal symmetries (talked about here) continues to be very cool.
  • Antiferromagnetic layers can be surprisingly good at passing spin currents.  Also, I want to start working on yttrium iron garnet, so that I can use this at some point in a talk.
  • It's possible to do some impressive manipulation of the valley degree of freedom in 2d transition metal dichalcogenides, creating blobs of complete valley polarization, for example.  It's possible to use an electric field to break inversion symmetry in bilayers and turn some of these effects on and off electrically.
  • The halide perovskites actually can make fantastic nanocrystals in terms of optical properties and homogeneity.

John BaezComplex Adaptive Systems (Part 5)

When we design a complex system, we often start with a rough outline and fill in details later, one piece at a time. And if the system is supposed to be adaptive, these details may need to be changed as the system is actually being used!

The use of operads should make this easier. One reason is that an operad typically has more than one algebra.

Remember from Part 3: an operad has operations, which are abstract ways of sticking things together. An algebra makes these operations concrete: it specifies some sets of actual things, and how the operations in the operad get implemented as actual ways to stick these things together.

So, an operad O can have one algebra in which things are described in a bare-bones, simplified way, and another algebra in which things are described in more detail. Indeed it will typically have many algebras, corresponding to many levels of detail, but let’s just think about two for a minute.

When we have a ‘less detailed’ algebra A and a ‘more detailed’ algebra A', they will typically be related by a map

f : A' \to A

which ‘forgets the extra details’. This map should be a ‘homomorphism’ of algebras, but I’ll postpone the definition of that concept.

What we often want to do, when designing a system, is not forget extra detail, but rather add extra detail to some rough specification. There is not always a systematic way to do this. If there is, then we may have a homomorphism

g : A \to A'

going back the other way. This is wonderful, because it lets us automate the process of filling in the details. But we can’t always count on being able to do this—especially not if we want an optimal or even acceptable result. So, often we may have to start with an element of A and search for elements of A' that are mapped to it by f : A' \to A.

Let me give some examples. I’ll take the operad that I described last time, and describe some of its algebras, and homomorphisms between these.

I’ll start with an algebra that has very little detail: its elements will be simple graphs. As the name suggests, these are among the simplest possible ways of thinking about networks. They just look like this:

Then I’ll give an algebra with more detail, where the vertices of our simple graphs are points in the plane. There’s nothing special about the plane: we could replace the plane by any other set, and get another algebra of our operad. For example, we could use the set of points on the surface of the Caribbean Sea, the blue stuff in the rectangle here:

That’s what we might use in a search and rescue operation. The points could represent boats, and the edges could represent communication channels.

Then I’ll give an algebra with even more detail, where two points connected by an edge can’t be too far apart. This would be good for range-limited communication channels.

Then I’ll give an algebra with still more detail, where the locations of the points are functions of time. Now our boats are moving around!

Okay, here we go.

The operad from last time was called O_G. Here G is the network model of simple graphs. The best way to picture an operation of O_G is as a way of sticking together a list of simple graphs to get a new simple graph.

For example, an operation

f \in O_G(3,4,2;9)

is a way of sticking together a simple graph with 3 vertices, one with 4 vertices and one with 2 vertices to get one with 9 vertices. Here’s a picture of such an operation:

Note that this operation is itself a simple graph. An operation in O_G(3,4,2;9) is just a simple graph with 9 vertices, where we have labelled the vertices from 1 to 9.

This operad comes with a very obvious algebra A where the operations do just what I suggested. In this algebra, an element of A(t) is a simple graph with t vertices, listed in order. Here t is any natural number, which I’m calling ‘t’ for ‘type’.

We also need to say how the operations in O_G act on these sets A(t). If we take simple graphs in A(3), A(4), and A(2):

we can use our operation f to stick them together and get this:
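To make this concrete, here is a small Python sketch of the algebra A. The encoding is my own, purely for illustration: a simple graph on vertices 0, ..., t-1 is a set of two-element frozensets, and an operation in O_G(t_1, ..., t_k; t) is itself such an edge set on the combined vertex set.

```python
def act(op_edges, graphs):
    """Glue a list of simple graphs together using an operation.

    graphs is a list of (t_i, edge_set) pairs; each graph's vertices are
    shifted into the combined numbering, and then the operation's own
    edges are overlaid on top.
    """
    offset, result = 0, set()
    for t, edges in graphs:
        result |= {frozenset({u + offset, v + offset})
                   for u, v in map(tuple, edges)}
        offset += t
    return result | {frozenset(e) for e in op_edges}

# An operation in O_G(2, 2; 4) that adds one edge between the two middle
# vertices, applied to two copies of the one-edge graph on two vertices:
glued = act([{1, 2}], [(2, [{0, 1}]), (2, [{0, 1}])])
# glued == {frozenset({0, 1}), frozenset({2, 3}), frozenset({1, 2})}
```

The operation's own edges are simply overlaid on the shifted inputs, which is exactly what "sticking graphs together" means here.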

But we can also make up a more interesting algebra of O_G. Let’s call this algebra A'. We’ll let an element of A'(t) be a simple graph whose t vertices, listed in order, are points in the plane.

My previous pictures can be reused to show how operations in O_G act on this new algebra A'. The only difference is that now we treat the vertices literally as points in the plane! Before, you should have been imagining them as abstract points not living anywhere; now they have locations.

Now let’s make up an even more detailed algebra A''.

What if our communication channels are ‘range-limited’? For example, what if two boats can’t communicate if they are more than 100 kilometers apart?

Then we can let an element of A''(t) be a simple graph with t vertices in the plane such that no two vertices connected by an edge have distance > 100.

Now the operations of our operad O_G act in a more interesting way. If we have an operation, and we apply it to elements of our algebra, it ‘tries’ to put in new edges as it did before, but it ‘fails’ for any edge that would have length > 100. In other words, we just leave out any edges that would be too long.
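Here is a hypothetical sketch of this range-limited action, again in my own encoding: an element of A''(t) is a list of t points in the plane together with an edge set, and any edge the operation tries to create between points that are too far apart is simply dropped.

```python
import math

def act_limited(op_edges, graphs, max_dist=100.0):
    # graphs is a list of (points, edge_set) pairs; shift each graph's
    # vertices into the combined numbering, then try the operation's edges.
    points, edges, offset = [], set(), 0
    for pts, es in graphs:
        edges |= {frozenset({u + offset, v + offset})
                  for u, v in map(tuple, es)}
        points += list(pts)
        offset += len(pts)
    for e in op_edges:
        u, v = tuple(e)
        if math.dist(points[u], points[v]) <= max_dist:
            edges.add(frozenset({u, v}))
        # else: the edge silently fails -- no crash, just no channel
    return points, edges

# Three boats; the operation tries to connect boat 0 to boats 1 and 2.
pts, es = act_limited(
    [{0, 1}, {0, 2}],
    [([(0.0, 0.0)], []), ([(50.0, 0.0)], []), ([(500.0, 0.0)], [])])
# The 50 km edge is established; the 500 km edge is dropped.
```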

It took me a while to figure this out. At first I thought the result of the operation would need to be undefined whenever we tried to create an edge that violated the length constraint. But in fact it acts in a perfectly well-defined way: we just don’t put in edges that would be too long!

This is good. This means that if you tell two boats to set up a communication channel, and they’re too far apart, you don’t get the ‘blue screen of death’: your setup doesn’t crash and burn. Instead, you just get a polite warning—‘communication channel not established’—and you can proceed.

The nontrivial part is to check that if we do this, we really get an algebra of our operad! There are some laws that must hold in any algebra. But since I haven’t yet described those laws, I won’t check them here. You’ll have to wait for our paper to come out.

Let’s do one more algebra today. For lack of creativity I’ll call it A'''. Now an element of A'''(t) is a time-dependent graph in the plane with t vertices, listed in order. Namely, the positions of the vertices depend on time, and the presence or absence of an edge between two vertices can also depend on time. Furthermore, let’s impose the requirement that any two vertices can only be connected by an edge at times when their distance is ≤ 100.

When I say ‘functions of time’ here, what do I mean by ‘time’? We can model time by some interval [T_1, T_2]. But if you don’t like that, you can change it.

This algebra A''' works more or less like A''. The operations of O_G try to create edges, but these edges only ‘take’ at times when the vertices they connect have distance ≤ 100.

There’s something here you might not like. Our operations can only try to create edges ‘for all times’… and succeed at times when the vertices are close enough. We can’t try to set up a communication channel for a limited amount of time.

But fear not: this is just a limitation in our chosen network model, ‘simple graphs’. With a fancier network model, we’d get a fancier operad, with fancier operations. Right now I’m trying to keep the operad simple (pun not intended), and show you a variety of different algebras.

As you might expect, we have algebra homomorphisms going from more detailed algebras to less detailed ones:

f_T : A''' \to A'', \quad h : A' \to A

The homomorphism h takes a simple graph in the plane and forgets the location of its vertices. The homomorphism f_T depends on a choice of time T \in [T_1, T_2]. For any time T, it takes a time-dependent graph in the plane and evaluates it at that time, getting a graph in the plane (which obeys the distance constraints, since the time-dependent graph obeyed those constraints at any time).

We do not have a homomorphism g: A'' \to A' that takes a simple graph in the plane obeying our distance constraints and forgets about those constraints. There’s a map g sending elements of A'' to elements of A' in this way. But it’s not an algebra homomorphism! The problem is that first trying to connect two graphs with an edge and then applying g may give a different result than first applying g and then connecting two graphs with an edge.
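To see that failure concretely, here is a toy calculation (my own encoding, not the paper's): two boats 200 km apart, and an operation that tries to connect them with a single edge.

```python
import math

points = [(0.0, 0.0), (200.0, 0.0)]
op_edge = frozenset({0, 1})

# Acting in A'' first: the edge fails to be created, since 200 > 100;
# applying g afterwards just forgets the (now vacuous) constraint.
edges_after_g = set() if math.dist(*points) > 100 else {op_edge}

# Applying g first lands us in A', where the operation then succeeds
# unconditionally.
edges_g_first = {op_edge}

# The two orders disagree, so g is not an algebra homomorphism.
print(edges_after_g == edges_g_first)   # prints False
```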

In short: a single operad has many algebras, which we can use to describe our desired system at different levels of detail. Algebra homomorphisms relate these different levels of detail.

Next time I’ll look at some more interesting algebras of the same operad. For example, there’s one that describes a system of interacting mobile agents, which move around in some specific way, determined by their location and the locations of the agents they’re communicating with.

Even this is just the tip of the iceberg—that is, still a rather low level of detail. We can also introduce stochasticity (that is, randomness). And to go even further, we could switch to a more sophisticated operad, based on a fancier ‘network model’.

But not today.

BackreactionAway Note

I'm in Switzerland this week, for a conference on "Thinking about Space and Time: 100 Years of Applying and Interpreting General Relativity." I am also behind with several things and blogging will remain slow for the next weeks. If you miss my writing all too much, here is a new paper.

Richard EastherSet The Controls For The Heart of Saturn


It has been a bitter-sweet month for solar system explorers.

As a teenager and a space-geek, I had a poster of this iconic montage of Saturn's moons, composed from images taken by the two Voyager probes as they rushed past the ringed planet.

Time passes. In August, the Voyager mission celebrated its 40th anniversary; the twin spacecraft are heading into interstellar space but still regularly dispatch data to Earth, their signals growing fainter and fainter as they travel further and further from home. With luck they will survive another decade, but they must eventually fall silent and will journey mutely to the stars.

This week, on September 15, the Cassini spacecraft – the most recent visitor human beings have sent to Saturn – will meet a far more emphatic demise. Launched 20 years ago, Cassini arrived at Saturn in 2004 and took up orbit around the giant planet, making hundreds of loops through its retinue of moons and skimming the iconic rings. It's about to run out of the propellant needed for manoeuvring; theoretically it might circle Saturn forever, but we could no longer steer it.

So, as I write this, Cassini has climbed away from Saturn for the final time to make a flyby of Titan, Saturn's largest moon, which has nudged it onto a collision course with the giant planet.

Cassini's spectacular finale is, in part, a tribute to its success. All spacecraft carry stowaways: bacterial spores which can, remarkably, remain viable in the vacuum of space. And one of Cassini's many discoveries was that three of Saturn's moons appear to have oceans of water beneath their solid crusts. Leaving the spacecraft to wander unguided around the Saturnian system would risk a collision with one of these moons. So, rather than let its microbial hitchhikers disembark onto a pristine world, Cassini will not "go gentle into that good night" as its propellant runs low but has been steered towards a fiery demise.

Geysers of warm water from the sub-surface oceans of Enceladus.

For cosmologists like myself, the contents of our own solar system (and even our own galaxy, much of the time) are things to look past rather than to look at. The complex worlds that circle our sun can seem to be a motley collection of adorable oddballs when compared to the deep simplicity of the universe itself,  like Shakespearean comic turns in the foreground with the serious actors behind.

If cosmologists were architects, I suspect most of us would be austere modernists, whereas planetary scientists might prefer rococo delights adorned with complicated facades and gratuitous flourishes.

Saturn's moon Mimas, with the impact crater Herschel. 

Saturn is a prime example: beyond its gaudy rings and complex cloud tops, it boasts an astonishingly diverse collection of moons. These include Titan, the only known moon with its own atmosphere, to which Cassini dispatched the Huygens lander; and Mimas, a battered icy world sporting a giant impact crater that makes it look uncannily like the Death Star. ("That's no moon, it's a space station" – and yet this, indeed, is a moon.)

NASA has an undoubted ability to sell a story, and it has been making the most of the anthropomorphic appeal of this brave little $3 billion, 5 ton, plutonium-powered spacecraft on its two-decade mission. But the hype is not misplaced: Saturn has a key place in the evolving human understanding of the cosmos. "Childlike wonder" is both a cliché and the literal truth when we speak of space. The most distant planet easily visible to the naked eye, Saturn once marked the apparent edge of our solar system. Its rings are visible through even the smallest of telescopes, and seeing them this way still takes my breath away. Cassini has shown us Saturn with its rings and its moons up close and personal, with astonishing clarity and precision. The spacecraft and the team of scientists responsible for it have written themselves into the history books.

Beyond revealing the universe to us, space exploration exposes our own  small place in the big picture. Saturn's rings, backlit by the distant sun, dominate what may be the most haunting image returned by Cassini. After taking in the planet's dark bulk and golden rings, our eye drifts to the pale blue dot in the lower right hand corner of the frame, and the image's full weight is revealed: it is a lovely, lonely, long-distance portrait of the Earth and of humanity itself.

The earth, as seen from Saturn. 

CODA: All images courtesy of NASA, JPL and/or the ESA. The header image is from a visualisation of Cassini's final plunge into Saturn's atmosphere. 

As a physicist, I am always stunned by the complex orbital dynamics of Saturn's moons, the chaotically braided and filigreed rings, and the planet's astonishing atmospheric dynamics. Truly, there is something here for everyone.

No introduction necessary
A hexagon storm at Saturn's north pole. Fluid dynamics at its best.
Braided structure in the rings.
Hyperion
Enceladus
Titan
The surface of Titan, from the Huygens lander.

September 12, 2017

Tommaso DorigoCMS Reports Evidence For Higgs Decays To B-Quark Pairs

Another chapter in the saga of the search for the elusive, but dominant, decay mode of the Higgs boson was reported by the CMS collaboration last month. This is one of those specific sub-fields of research where fierce competition arises over the answer to a relatively minor scientific question. That the Higgs boson couples to b-quarks is already well demonstrated indirectly by a number of other measurements - its coupling to (third generation) quarks being demonstrated by its production rate, for example. Yet, being the first ones to "observe" the H->bb decay is a coveted goal.

read more

September 11, 2017

Richard EastherPop Science

This week I acquired a copy of Steven Weinberg's 1977 book The First Three Minutes, courtesy of an emeritus colleague downsizing his library. It was the first detailed popularisation of the Big Bang and is a pop sci classic, written by one of the leading theoretical physicists of the modern era.

An absolute classic, even if this copy has seen better days. 

As you can guess from the title, The First Three Minutes tells the story of the moments following the Big Bang. The early universe sets the stage for the development of the cosmos we see around us now, and the Cosmic Microwave Background is a key link between the distant past and the present day. Discovered just a dozen years before the book appeared in 1977, the microwave background is a time capsule buried moments after the Big Bang, and Weinberg explains how it reveals the nature of the infant universe.

And, as it happens, the latest addition to my library fell open to reveal these words:

Page 77

This text is almost a time capsule on its own. In 1992 CoBE made headlines by providing a map of the microwave background sensitive enough to reveal minute variations in the temperature of different regions of the sky. In 2006, Mather shared the Nobel prize for his work on CoBE. Meanwhile, Rai Weiss moved on from CoBE to become a founder of LIGO which earned its own place in history by successfully detecting gravitational waves in 2015.

Bon voyage indeed. 

CODA: And it goes without saying that Weiss is an odds-on favourite to get the call from Stockholm a few weeks from now when this year's prizes are announced.

IMAGE: The header image shows the temperature differences across the sky, as measured by the CoBE satellite. The temperature range corresponds to changes of a few parts in 100,000. 

John BaezA Compositional Framework for Reaction Networks

For a long time Blake Pollard and I have been working on ‘open’ chemical reaction networks: that is, networks of chemical reactions where some chemicals can flow in from an outside source, or flow out. The picture to keep in mind is something like this:

where the yellow circles are different kinds of chemicals and the aqua boxes are different reactions. The purple dots in the sets X and Y are ‘inputs’ and ‘outputs’, where certain kinds of chemicals can flow in or out.

Here’s our paper on this stuff:

• John Baez and Blake Pollard, A compositional framework for reaction networks, Reviews in Mathematical Physics 29, 1750028.

Blake and I gave talks about this stuff in Luxembourg this June, at a nice conference called Dynamics, thermodynamics and information processing in chemical networks. So, if you’re the sort who prefers talk slides to big scary papers, you can look at those:

• John Baez, The mathematics of open reaction networks.

• Blake Pollard, Black-boxing open reaction networks.

But I want to say here what we do in our paper, because it’s pretty cool, and it took a few years to figure it out. To get things to work, we needed my student Brendan Fong to invent the right category-theoretic formalism: ‘decorated cospans’. But we also had to figure out the right way to think about open dynamical systems!

In the end, we figured out how to first ‘gray-box’ an open reaction network, converting it into an open dynamical system, and then ‘black-box’ it, obtaining the relation between input and output flows and concentrations that holds in steady state. The first step extracts the dynamical behavior of an open reaction network; the second extracts its static behavior. And both these steps are functors!

Lawvere had the idea that the process of assigning ‘meaning’ to expressions could be seen as a functor. This idea has caught on in theoretical computer science: it’s called ‘functorial semantics’. So, what we’re doing here is applying functorial semantics to chemistry.

Now Blake has passed his thesis defense based on this work, and he just needs to polish up his thesis a little before submitting it. This summer he’s doing an internship at the Princeton branch of the engineering firm Siemens. He’s working with Arquimedes Canedo on ‘knowledge representation’.

But I’m still eager to dig deeper into open reaction networks. They’re a small but nontrivial step toward my dream of a mathematics of living systems. My working hypothesis is that living systems seem ‘messy’ to physicists because they operate at a higher level of abstraction. That’s what I’m trying to explore.

Here’s the idea of our paper.

The idea

Reaction networks are a very general framework for describing processes where entities interact and transform into other entities. While they first showed up in chemistry, and are often called ‘chemical reaction networks’, they have lots of other applications. For example, a basic model of infectious disease, the ‘SIRS model’, is described by this reaction network:

S + I \stackrel{\iota}{\longrightarrow} 2 I  \qquad  I \stackrel{\rho}{\longrightarrow} R \stackrel{\lambda}{\longrightarrow} S

We see here three types of entity, called species:

S: susceptible,
I: infected,
R: resistant.

We also have three `reactions’:

\iota : S + I \to 2 I: infection, in which a susceptible individual meets an infected one and becomes infected;
\rho : I \to R: recovery, in which an infected individual gains resistance to the disease;
\lambda : R \to S: loss of resistance, in which a resistant individual becomes susceptible.

In general, a reaction network involves a finite set of species, but reactions go between complexes, which are finite linear combinations of these species with natural number coefficients. The reaction network is a directed graph whose vertices are certain complexes and whose edges are called reactions.

If we attach a positive real number called a rate constant to each reaction, a reaction network determines a system of differential equations saying how the concentrations of the species change over time. This system of equations is usually called the rate equation. In the example I just gave, the rate equation is

\begin{array}{ccl} \displaystyle{\frac{d S}{d t}} &=& r_\lambda R - r_\iota S I \\ \\ \displaystyle{\frac{d I}{d t}} &=&  r_\iota S I - r_\rho I \\  \\ \displaystyle{\frac{d R}{d t}} &=& r_\rho I - r_\lambda R \end{array}

Here r_\iota, r_\rho and r_\lambda are the rate constants for the three reactions, and S, I, R now stand for the concentrations of the three species, which are treated in a continuum approximation as smooth functions of time:

S, I, R: \mathbb{R} \to [0,\infty)

The rate equation can be derived from the law of mass action, which says that any reaction occurs at a rate equal to its rate constant times the product of the concentrations of the species entering it as inputs.
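As a quick sanity check, the SIRS rate equation can be integrated numerically. Here is a minimal Euler-method sketch, with made-up rate constants and initial concentrations (nothing below comes from the paper):

```python
def sirs_step(S, I, R, r_i, r_r, r_l, dt):
    # Right-hand sides of the SIRS rate equation, via mass action.
    dS = r_l * R - r_i * S * I
    dI = r_i * S * I - r_r * I
    dR = r_r * I - r_l * R
    return S + dS * dt, I + dI * dt, R + dR * dt

S, I, R = 0.99, 0.01, 0.0
for _ in range(10_000):            # integrate up to t = 100
    S, I, R = sirs_step(S, I, R, r_i=0.5, r_r=0.1, r_l=0.05, dt=0.01)

# The reactions only move individuals between species, so the total
# concentration S + I + R is conserved: the three right-hand sides
# sum to zero.
print(round(S + I + R, 6))   # prints 1.0
```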

But a reaction network is more than just a stepping-stone to its rate equation! Interesting qualitative properties of the rate equation, like the existence and uniqueness of steady state solutions, can often be determined just by looking at the reaction network, regardless of the rate constants. Results in this direction began with Feinberg and Horn’s work in the 1960’s, leading to the Deficiency Zero and Deficiency One Theorems, and more recently to Craciun’s proof of the Global Attractor Conjecture.

In our paper, Blake and I present a ‘compositional framework’ for reaction networks. In other words, we describe rules for building up reaction networks from smaller pieces, in such a way that the rate equation of the whole can be figured out from those of the pieces. But this framework requires that we view reaction networks in a somewhat different way, as ‘Petri nets’.

Petri nets were invented by Carl Petri in 1939, when he was just a teenager, for the purposes of chemistry. Much later, they became popular in theoretical computer science, biology and other fields. A Petri net is a bipartite directed graph: vertices of one kind represent species, vertices of the other kind represent reactions. The edges into a reaction specify which species are inputs to that reaction, while the edges out specify its outputs.

You can easily turn a reaction network into a Petri net and vice versa. For example, the reaction network above translates into this Petri net:

Beware: there are a lot of different names for the same thing, since the terminology comes from several communities. In the Petri net literature, species are called places and reactions are called transitions. In fact, Petri nets are sometimes called ‘place-transition nets’ or ‘P/T nets’. On the other hand, chemists call them ‘species-reaction graphs’ or ‘SR-graphs’. And when each reaction of a Petri net has a rate constant attached to it, it is often called a ‘stochastic Petri net’.

While some qualitative properties of a rate equation can be read off from a reaction network, others are more easily read from the corresponding Petri net. For example, properties of a Petri net can be used to determine whether its rate equation can have multiple steady states.
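The mass-action recipe—each reaction runs at a speed equal to its rate constant times the product of its input concentrations—is mechanical enough to sketch in code. Here is a hypothetical encoding (mine, purely for illustration) of a Petri net with rate constants as a list of (rate, inputs, outputs) triples, with the SIRS net as the example:

```python
def rate_rhs(net, conc):
    # Right-hand side of the rate equation, from the law of mass action.
    rhs = {s: 0.0 for s in conc}
    for rate, inputs, outputs in net:
        speed = rate
        for s, m in inputs.items():
            speed *= conc[s] ** m      # product of input concentrations
        for s, m in inputs.items():
            rhs[s] -= m * speed        # inputs are consumed
        for s, m in outputs.items():
            rhs[s] += m * speed        # outputs are produced
    return rhs

sirs = [
    (0.5,  {'S': 1, 'I': 1}, {'I': 2}),   # infection
    (0.1,  {'I': 1},         {'R': 1}),   # recovery
    (0.05, {'R': 1},         {'S': 1}),   # loss of resistance
]
rhs = rate_rhs(sirs, {'S': 0.8, 'I': 0.2, 'R': 0.0})
# rhs matches the SIRS rate equation evaluated at these concentrations.
```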

Petri nets are also better suited to a compositional framework. The key new concept is an ‘open’ Petri net. Here’s an example:

The box at left is a set X of ‘inputs’ (which happens to be empty), while the box at right is a set Y of ‘outputs’. Both inputs and outputs are points at which entities of various species can flow in or out of the Petri net. We say the open Petri net goes from X to Y. In our paper, we show how to treat it as a morphism f : X \to Y in a category we call \textrm{RxNet}.

Given an open Petri net with rate constants assigned to each reaction, our paper explains how to get its ‘open rate equation’. It’s just the usual rate equation with extra terms describing inflows and outflows. The above example has this open rate equation:

\begin{array}{ccr} \displaystyle{\frac{d S}{d t}} &=&  - r_\iota S I - o_1 \\ \\ \displaystyle{\frac{d I}{d t}} &=&  r_\iota S I - o_2  \end{array}

Here o_1, o_2 : \mathbb{R} \to \mathbb{R} are arbitrary smooth functions describing outflows as a function of time.

Given another open Petri net g: Y \to Z, for example this:

it will have its own open rate equation, in this case

\begin{array}{ccc} \displaystyle{\frac{d S}{d t}} &=& r_\lambda R + i_2 \\ \\ \displaystyle{\frac{d I}{d t}} &=& - r_\rho I + i_1 \\  \\ \displaystyle{\frac{d R}{d t}} &=& r_\rho I - r_\lambda R  \end{array}

Here i_1, i_2: \mathbb{R} \to \mathbb{R} are arbitrary smooth functions describing inflows as a function of time. Now for a tiny bit of category theory: we can compose f and g by gluing the outputs of f to the inputs of g. This gives a new open Petri net gf: X \to Z, as follows:

But this open Petri net gf has an empty set of inputs, and an empty set of outputs! So it amounts to an ordinary Petri net, and its open rate equation is a rate equation of the usual kind. Indeed, this is the Petri net we have already seen.

As it turns out, there’s a systematic procedure for combining the open rate equations for two open Petri nets to obtain that of their composite. In the example we’re looking at, we just identify the outflows of f with the inflows of g (setting i_1 = o_1 and i_2 = o_2) and then add the right hand sides of their open rate equations.
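This bookkeeping can be checked with a toy symbolic calculation. The encoding is my own: each right-hand side is a dict from formal terms to coefficients, with the flow terms written so that each outflow of f is already identified with the inflow of g it gets glued to.

```python
def add_rhs(a, b):
    # Add two right-hand sides term by term, dropping cancelled terms.
    out = dict(a)
    for term, c in b.items():
        out[term] = out.get(term, 0) + c
    return {t: c for t, c in out.items() if c != 0}

f_S = {'r_iota*S*I': -1, 'o1': -1}      # dS/dt in f
f_I = {'r_iota*S*I': 1, 'o2': -1}       # dI/dt in f
g_S = {'r_lambda*R': 1, 'o1': 1}        # dS/dt in g, inflow identified
g_I = {'r_rho*I': -1, 'o2': 1}          # dI/dt in g, inflow identified
g_R = {'r_rho*I': 1, 'r_lambda*R': -1}  # dR/dt in g

gf_S = add_rhs(f_S, g_S)   # {'r_iota*S*I': -1, 'r_lambda*R': 1}
gf_I = add_rhs(f_I, g_I)   # {'r_iota*S*I': 1, 'r_rho*I': -1}
gf_R = add_rhs({}, g_R)
# The arbitrary flow terms cancel, leaving the closed SIRS rate equation.
```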

The first goal of our paper is to precisely describe this procedure, and to prove that it defines a functor

\diamond: \textrm{RxNet} \to \textrm{Dynam}

from \textrm{RxNet} to a category \textrm{Dynam} where the morphisms are ‘open dynamical systems’. By a dynamical system, we essentially mean a vector field on \mathbb{R}^n, which can be used to define a system of first-order ordinary differential equations in n variables. An example is the rate equation of a Petri net. An open dynamical system allows for the possibility of extra terms that are arbitrary functions of time, such as the inflows and outflows in an open rate equation.

In fact, we prove that \textrm{RxNet} and \textrm{Dynam} are symmetric monoidal categories and that \diamond is a symmetric monoidal functor. To do this, we use Brendan Fong’s theory of ‘decorated cospans’.

Decorated cospans are a powerful general tool for describing open systems. A cospan in any category is just a diagram like this:

We are mostly interested in cospans in \mathrm{FinSet}, the category of finite sets and functions between these. The set S, the so-called apex of the cospan, is the set of states of an open system. The sets X and Y are the inputs and outputs of this system. The legs of the cospan, meaning the morphisms i: X \to S and o: Y \to S, describe how these inputs and outputs are included in the system. In our application, S is the set of species of a Petri net.

For example, we may take this reaction network:

A+B \stackrel{\alpha}{\longrightarrow} 2C \quad \quad C \stackrel{\beta}{\longrightarrow} D

treat it as a Petri net with S = \{A,B,C,D\}:

and then turn that into an open Petri net by choosing any finite sets X,Y and maps i: X \to S, o: Y \to S, for example like this:

(Notice that the maps including the inputs and outputs into the states of the system need not be one-to-one. This is technically useful, but it introduces some subtleties that I don’t feel like explaining right now.)

An open Petri net can thus be seen as a cospan of finite sets whose apex S is ‘decorated’ with some extra information, namely a Petri net with S as its set of species. Fong’s theory of decorated cospans lets us define a category with open Petri nets as morphisms, with composition given by gluing the outputs of one open Petri net to the inputs of another.
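Here is a small sketch of that gluing for bare cospans of finite sets, ignoring the Petri net decorations (the code and its encoding are mine): given f with legs i1: X → S1, o1: Y → S1 and g with legs i2: Y → S2, o2: Z → S2, the composite's set of states is the pushout, computed by gluing o1(y) to i2(y) for each y in Y.

```python
def compose_cospans(i1, o1, S1, i2, o2, S2, Y):
    # Union-find over the disjoint union of S1 and S2; legs are dicts.
    parent = {('L', s): ('L', s) for s in S1}
    parent.update({('R', s): ('R', s) for s in S2})

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for y in Y:                    # glue o1(y) with i2(y)
        parent[find(('L', o1[y]))] = find(('R', i2[y]))

    states = {find(x) for x in parent}
    legs_in = {x: find(('L', s)) for x, s in i1.items()}
    legs_out = {z: find(('R', s)) for z, s in o2.items()}
    return states, legs_in, legs_out

# Gluing the two halves of the SIRS net: f's species {S, I} are glued
# onto g's {S, I, R} along two shared points, leaving three states.
states, f_in, g_out = compose_cospans(
    {}, {1: 'S', 2: 'I'}, {'S', 'I'},
    {1: 'S', 2: 'I'}, {}, {'S', 'I', 'R'}, {1, 2})
print(len(states))   # prints 3
```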

We call the functor

\diamond: \textrm{RxNet} \to \textrm{Dynam}

gray-boxing because it hides some but not all the internal details of an open Petri net. (In the paper we draw it as a gray box, but that’s too hard here!)

We can go further and black-box an open dynamical system. This amounts to recording only the relation between input and output variables that must hold in steady state. We prove that black-boxing gives a functor

\square: \textrm{Dynam} \to \mathrm{SemiAlgRel}

(yeah, the box here should be black, and in our paper it is). Here \mathrm{SemiAlgRel} is a category where the morphisms are semi-algebraic relations between real vector spaces, meaning relations defined by polynomials and inequalities. This relies on the fact that our dynamical systems involve algebraic vector fields, meaning those whose components are polynomials; more general dynamical systems would give more general relations.

That semi-algebraic relations are closed under composition is a nontrivial fact, a spinoff of the Tarski–Seidenberg theorem. This says that a subset of \mathbb{R}^{n+1} defined by polynomial equations and inequalities can be projected down onto \mathbb{R}^n, and the resulting set is still definable in terms of polynomial identities and inequalities. This wouldn’t be true if we didn’t allow inequalities. It’s neat to see this theorem, important in mathematical logic, showing up in chemistry!

Structure of the paper

Okay, now you’re ready to read our paper! Here’s how it goes:

In Section 2 we review and compare reaction networks and Petri nets. In Section 3 we construct a symmetric monoidal category \textrm{RNet} where an object is a finite set and a morphism is an open reaction network (or more precisely, an isomorphism class of open reaction networks). In Section 4 we enhance this construction to define a symmetric monoidal category \textrm{RxNet} where the transitions of the open reaction networks are equipped with rate constants. In Section 5 we explain the open dynamical system associated to an open reaction network, and in Section 6 we construct a symmetric monoidal category \textrm{Dynam} of open dynamical systems. In Section 7 we construct the gray-boxing functor

\diamond: \textrm{RxNet} \to \textrm{Dynam}

In Section 8 we construct the black-boxing functor

\square: \textrm{Dynam} \to \mathrm{SemiAlgRel}

We show both of these are symmetric monoidal functors.

Finally, in Section 9 we fit our results into a larger ‘network of network theories’. This is where various results in various papers I’ve been writing in the last few years start assembling to form a big picture! But this picture needs to grow….

September 09, 2017

Tommaso DorigoOn Turtles, Book Writing, And Overcommitments

Back from vacations, I think I need to report a few random things before I get back into physics blogging. So I'll peruse the science20 article category aptly called "Random Thoughts" for this one occasion.
My summer vacations took place just after a week spent in Ecuador, where I gave 6 hours of lectures on LHC physics and statistics for data analysis to astrophysics PhD students. I reported on that, and on an eventful hike, in the last post. Unfortunately, the first week of my alleged rest was mostly spent fixing a few documents that the European Commission expected to receive by August 31st. As a coordinator of a training network, I do indeed have certain obligations that I cannot escape.

read more

September 08, 2017

Sean CarrollJoe Polchinski’s Memories, and a Mark Wise Movie

Joe Polchinski, a universally-admired theoretical physicist at the Kavli Institute for Theoretical Physics in Santa Barbara, recently posted a 150-page writeup of his memories of doing research over the years.

Memories of a Theoretical Physicist
Joseph Polchinski

While I was dealing with a brain injury and finding it difficult to work, two friends (Derek Westen, a friend of the KITP, and Steve Shenker, with whom I was recently collaborating), suggested that a new direction might be good. Steve in particular regarded me as a good writer and suggested that I try that. I quickly took to Steve’s suggestion. Having only two bodies of knowledge, myself and physics, I decided to write an autobiography about my development as a theoretical physicist. This is not written for any particular audience, but just to give myself a goal. It will probably have too much physics for a nontechnical reader, and too little for a physicist, but perhaps there will be different things for each. Parts may be tedious. But it is somewhat unique, I think, a blow-by-blow history of where I started and where I got to. Probably the target audience is theoretical physicists, especially young ones, who may enjoy comparing my struggles with their own. Some disclaimers: This is based on my own memories, jogged by the arXiv and Inspire. There will surely be errors and omissions. And note the title: this is about my memories, which will be different for other people. Also, it would not be possible for me to mention all the authors whose work might intersect mine, so this should not be treated as a reference work.

As the piece explains, it’s a bittersweet project, as it was brought about by Joe struggling with a serious illness and finding it difficult to do physics. We all hope he fully recovers and gets back to leading the field in creative directions.

I had the pleasure of spending three years down the hall from Joe when I was a postdoc at the ITP (it didn’t have the “K” at that time). You’ll see my name pop up briefly in his article, sadly in the context of an amusing anecdote rather than an exciting piece of research, since I stupidly spent three years in Santa Barbara without collaborating with any of the brilliant minds on the faculty there. Not sure exactly what I was thinking.

Joe is of course a world-leading theoretical physicist, and his memories give you an idea why, while at the same time being very honest about setbacks and frustrations. His style has never been to jump on a topic while it was hot, but to think deeply about fundamental issues and look for connections others have missed. This approach led him to such breakthroughs as a new understanding of the renormalization group, the discovery of D-branes in string theory, and the possibility of firewalls in black holes. It’s not necessarily a method that would work for everyone, especially because it doesn’t necessarily lead to a lot of papers being written at a young age. (Others who somehow made this style work for them, and somehow survived, include Ken Wilson and Alan Guth.) But the purity and integrity of Joe’s approach to doing science is an example for all of us.

Somehow over the course of 150 pages Joe neglected to mention perhaps his greatest triumph, as a three-time guest blogger (one, two, three). Too modest, I imagine.

His memories make for truly compelling reading, at least for physicists — he’s an excellent stylist and pedagogue, but the intended audience is people who have already heard about the renormalization group. This kind of thoughtful but informal recollection is an invaluable resource, as you get to see not only the polished final product of a physics paper, but the twists and turns of how it came to be, especially the motivations underlying why the scientist chose to think about things one way rather than some other way.

(Idea: there is a wonderful online magazine called The Players’ Tribune, which gives athletes an opportunity to write articles expressing their views and experiences, e.g. the raw feelings after you are traded. It would be great to have something like that for scientists, or for academics more broadly, to write about the experiences [good and bad] of doing research. Young people in the field would find it invaluable, and non-scientists could learn a lot about how science really works.)

You also get to read about many of the interesting friends and colleagues of Joe’s over the years. A prominent one is my current Caltech colleague Mark Wise, a leading physicist in his own right (and someone I was smart enough to collaborate with — with age comes wisdom, or at least more wisdom than you used to have). Joe and Mark got to know each other as postdocs, and have remained friends ever since. When it came time for a scientific gathering to celebrate Joe’s 60th birthday, Mark contributed a home-made movie showing (in inimitable style) how much progress he had made over the years in the activities they had enjoyed together in their relative youth. And now, for the first time, that movie is available to the general public. It’s seven minutes long, but don’t make the mistake of skipping the blooper reel that accompanies the end credits. Many thanks to Kim Boddy, the former Caltech student who directed and produced this lost masterpiece.

When it came time for his own 60th, Mark being Mark, he didn’t want the usual conference, and decided instead to gather physicist friends from over the years and take them to a local ice rink for a bout of curling. (Canadian heritage showing through.) Joe being Joe, this was an invitation he couldn’t resist, and we had a grand old time, free of any truly serious injuries.

We don’t often say it out loud, but one of the special privileges of being in this field is getting to know brilliant and wonderful people, and interacting with them over periods of many years. I owe Joe a lot — even if I wasn’t smart enough to collaborate with him when he was down the hall, I learned an enormous amount from his example, and often wonder how he would think about this or that issue in physics.


John BaezPostdoc in Applied Category Theory

guest post by Spencer Breiner

One Year Postdoc Position at Carnegie Mellon/NIST

We are seeking an early-career researcher with a background in category theory, functional programming and/or electrical engineering for a one-year post-doctoral position supported by an Early-concept Grant (EAGER) from the NSF’s Systems Science program. The position will be managed through Carnegie Mellon University (PI: Eswaran Subrahmanian), but the position itself will be located at the US National Institute of Standards and Technology (NIST) in Gaithersburg, Maryland, outside of Washington, DC.

The project aims to develop a compositional semantics for electrical networks which is suitable for system prediction, analysis and control. This work will extend existing methods for linear circuits (featured on this blog!) to include (i) probabilistic estimates of future consumption and (ii) top-down incentives for load management. We will model a multi-layered system of such “distributed energy resources” including loads and generators (e.g., solar array vs. power plant), different types of resource aggregation (e.g., apartment to apartment building), and across several time scales. We hope to demonstrate that such a system can balance local load and generation in order to minimize expected instability at higher levels of the electrical grid.

This post is available full-time (40 hours/5 days per week) for 12 months, and can begin as early as October 1st.

For more information on this position, please contact Dr. Eswaran Subrahmanian ( or Dr. Spencer Breiner (

September 07, 2017

John PreskillWhat Clocks have to do with Quantum Computation

Have you ever played the game “telephone”? You might remember it from your nursery days, blissfully oblivious to the fact that quantum mechanics governs your existence, and not yet wondering why Fox canceled Firefly. For everyone who forgot, here is the gist of the game: sit in a circle with your friends. Now you think of a story (prompt: a spherical weapon that can destroy planets). Once you have the story laid out in your head, tell it to your neighbor on your left. She takes the story and tells it to her friend on her left. It is important to master the art of whispering for this game: you don’t want to be overheard when the story is passed on. After one round, the friend on your right tells you what he heard from his friend on his right. Does the story match your masterpiece?

If your story is generic, it probably survived without alterations. Tolstoy’s War and Peace, on the other hand, might turn into a version of Game of Thrones. Passing along complex stories seems to be more difficult than passing on easy ones, and it also becomes more prone to errors the more friends join your circle—which makes intuitive sense.

So what does this have to do with physics or quantum computation?

Let’s add maths to this game, because why not. Take a difficult calculation that follows a certain procedure, such as long division of two integer numbers.


Now you perform one step of the division and pass the piece of paper on to your left. Your friend there is honest and trusts you: she doesn’t check what you did, but happily performs the next step in the division. Once she’s done, she passes the piece of paper on to her left, and so on. By the time the paper reaches you again, you hopefully have the result of the calculation, given you have enough friends to divide your favorite numbers, and given that everyone performed their steps accurately.
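As a toy illustration of these snapshots (my own sketch, not from the post), the short Python below performs long division digit by digit and records the partial remainder and quotient after each step; each recorded pair plays the role of one snapshot | \psi_t \rangle handed from one friend to the next.

```python
# Toy sketch: long division of 1234 by 7, recording a "snapshot"
# of the partial state after each step of the computation.
def division_snapshots(dividend, divisor):
    snapshots = [(0, 0)]          # (remainder, quotient-so-far) before any step
    remainder, quotient = 0, 0
    for digit in str(dividend):   # bring down one digit per step
        remainder = remainder * 10 + int(digit)
        quotient = quotient * 10 + remainder // divisor
        remainder = remainder % divisor
        snapshots.append((remainder, quotient))
    return snapshots

for t, (rem, quo) in enumerate(division_snapshots(1234, 7)):
    print(f"t={t}: remainder={rem}, quotient={quo}")
# final snapshot: remainder=2, quotient=176, i.e. 1234 = 7*176 + 2
```

Each friend in the circle needs only the previous snapshot to produce the next one, which is exactly the structure the history state will encode.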

I’m not sure if Feynman thought about telephone when he, in 1986, proposed a method of embedding computation into eigenstates (e.g. the ground state) of a Hamiltonian, but the fact remains that the similarity is striking. Remember that writing down a Hamiltonian is a way of describing a quantum-mechanical system, for instance how the constituents of a multi-body system are coupled with each other. The ground state of such a Hamiltonian describes the lowest energy state that a system assumes when it is cooled down as far as possible. Before we dive into how the Hamiltonian looks, let’s try to understand how, in Feynman’s construction, a game of telephone can be represented as a quantum state of a physical system.

\frac{1}{\sqrt{T}} \sum_{t=1}^{T} |t\rangle \otimes |\psi_t\rangle = \frac{1}{\sqrt{T}} \big( |1\rangle\otimes|\psi_1\rangle + |2\rangle\otimes|\psi_2\rangle + \ldots + |T\rangle\otimes|\psi_T\rangle \big)
In this picture, | \psi_t \rangle represents a snapshot of the story or calculation at time t—in the division example, this would be the current divisor and remainder terms; so e.g. the snapshot | \psi_1 \rangle represents the initial dividend and divisor, and the person next to you is thinking of | \psi_2 \rangle, one step into the calculation. The label |t\rangle in front of the tensor sign \otimes is like a tag that you put on files on your computer, and uniquely associates the snapshot | \psi_t \rangle with the t-th time step. We say that the story snapshot is entangled with its label.

This is also an example of quantum superposition: all the |t\rangle\otimes|\psi_t\rangle are distinct states (the time labels, if not the story snapshots, are all unique), and by adding these states up we put them into superposition. So if we were to measure the time label, we would obtain one of the snapshots uniformly at random—it’s as if you had a cloth bag full of cards, and you blindly pick one. One side of the card will have the time label on it, while the other side contains the story snapshot. But don’t be fooled—you cannot access all story snapshots by successive measurements! Quantum states collapse; whatever measurement outcome you have dictates what the quantum state will look like after the measurement. In our example, this means that we burn the cloth bag after you pick your card; in this sense, the quantum state behaves differently than a simple juxtaposition of scraps of paper.

Nonetheless, this is the reason why we call such a quantum state a history state: it preserves the history of the computation, where every step that is performed is appropriately tagged. If we manage to compare all pairs of successively-labeled snapshots (without measuring them!), we can verify that the end result does, in fact, stem from a valid computation—and not just a random guess. In the division example, this would correspond to checking that each of your friends performs a correct division step.

So history states are clearly useful. But how do you design a Hamiltonian with a history state as the ground state? Is it even possible? The answer is yes, and it all boils down to verifying that two successive snapshots | \psi_t \rangle and | \psi_{t+1} \rangle are related to each other in the correct manner, e.g. that your friend on seat t+1 performs a valid division step from the snapshot prepared by the person on seat t. In fancy physics speak (aka Bra-Ket notation), we can for example write

|2\rangle\langle 1| \otimes | \psi_2 \rangle \langle \psi_1 |
The actual Hamiltonian will then be a sum of such terms, and one can verify that its ground state is indeed the one representing the history state we introduced above.

I’m glossing over a few details here: there is a minus sign in front of this term, and we have to add its Hermitian conjugate (flip the labels and snapshots around). But this is not essential for the argument, so let’s not go there for now. However, you’re totally right with one thing: it wouldn’t make sense to write down all snapshots themselves into the Hamiltonian! After all, if we had to calculate every snapshot transition like | \psi_2 \rangle \langle \psi_1 | in advance, there would be no use to this construction. So instead, we can write

|t+1\rangle\langle t| \otimes \mathbf U_\text{DIVISION}
Perfect. We now have a Hamiltonian which, in its ground state, can encode the history of a computation, and if we replace the transition operator \mathbf U_\text{DIVISION} with another desired transition operator (a unitary matrix), we can perform any computation we want (more precisely, any computation that can be written as a unitary matrix; this includes anything your laptop can do). However, this is only half of the story, since we need to have a way of reading out the final answer. So let’s step back for a moment, and go back to the telephone game.
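To see this construction in action, here is a small numerical sketch (my own, not from the post). It uses the standard Feynman-Kitaev form of the propagation term, which supplements the transition term quoted above with identity pieces so that the Hamiltonian is positive semidefinite, and checks that the history state of a one-qubit computation (a chain of NOT gates) has zero energy. Clock steps are labelled 0 to T-1 here.

```python
import numpy as np

T = 4                           # number of clock steps
d = 2                           # one qubit of "story" space
U = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # one computational step: a NOT gate

def ket(t):
    """Clock basis state |t> as a column vector."""
    v = np.zeros((T, 1))
    v[t] = 1.0
    return v

I = np.eye(d)
H = np.zeros((T * d, T * d))
for t in range(T - 1):
    # Standard propagation term: enforces |psi_{t+1}> = U |psi_t>.
    H += 0.5 * np.kron(ket(t) @ ket(t).T + ket(t + 1) @ ket(t + 1).T, I)
    H -= 0.5 * (np.kron(ket(t + 1) @ ket(t).T, U)
                + np.kron(ket(t) @ ket(t + 1).T, U.conj().T))

# History state for initial snapshot |0>: (1/sqrt(T)) sum_t |t> (x) U^t |0>
psi0 = np.array([[1.0], [0.0]])
hist = sum(np.kron(ket(t), np.linalg.matrix_power(U, t) @ psi0)
           for t in range(T)) / np.sqrt(T)

print(np.linalg.norm(H @ hist))   # essentially 0: zero-energy ground state
```

Any state that fails one of the step-checks picks up positive energy, which is exactly the sense in which the Hamiltonian “verifies” the computation.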

Can you motivate your friends to cheat?

Your friends playing telephone make mistakes.


Ok, let’s assume we give them a little incentive: offer $1 to the person on your right in case the result is an even number. Will he cheat? With so much at stake?


In fact, maybe your friend is not only greedy but also dishonest: he wants to hide the fact that he miscalculates on purpose, and sometimes tells his friend on his right to make a mistake instead (maybe giving him a share of the money). So for a few of your friends close to the person at the end of the chain, there is a real incentive to cheat!


Can we motivate spins to cheat?

We already discussed how to write down a Hamiltonian that verifies valid computational steps. But can we do the same thing as bribing your friends to procure a certain outcome? Can we give an energy bonus to certain outcomes of the computation?

In fact, we can. Alexei Kitaev proposed adding a term to Feynman’s Hamiltonian which raises the energy of an unwanted outcome, relative to a desirable outcome. How? Again in fancy physics language,

-|T\rangle\langle T| \otimes \Pi_\text{even}, where \Pi_\text{even} is the projector onto snapshots representing an even number.
What this term does is that it takes the history state and yields a negative energy contribution (signaled by the minus sign in front) if the last snapshot | \psi_T \rangle is an even number. If it isn’t, no bonus is felt; this would correspond to you keeping the dollar you promised to your friend. This simply means that in case the computation has a desirable outcome—i.e. an even number—the Hamiltonian allows a lower energy ground state than for any other output. Et voilà, we can distinguish between different outputs of the computation.
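A minimal numerical sketch of the bonus term (my own, with clock steps labelled 0 to T-1 and “even” modelled as the single-qubit state |0>): a history whose final snapshot is |0> picks up an energy bonus of -1/T, while one ending in |1> gets none.

```python
import numpy as np

T, d = 4, 2

def ket(i, dim):
    """Basis state |i> in a dim-dimensional space, as a column vector."""
    v = np.zeros((dim, 1))
    v[i] = 1.0
    return v

# Bonus term: -|T-1><T-1| (x) |0><0|, rewarding histories whose last
# snapshot lies in the "even" state |0>.
H_bonus = -np.kron(ket(T - 1, T) @ ket(T - 1, T).T,
                   ket(0, d) @ ket(0, d).T)

def history_state(final):
    # Trivial history: every snapshot equals the final snapshot.
    return sum(np.kron(ket(t, T), final) for t in range(T)) / np.sqrt(T)

good = history_state(ket(0, d))   # computation "outputs" an even number
bad = history_state(ket(1, d))    # odd output: no bonus

print((good.T @ H_bonus @ good).item())   # -0.25, i.e. a bonus of -1/T
print((bad.T @ H_bonus @ bad).item())     #  0.0, no bonus
```

Only a 1/T fraction of the history state’s weight sits on the final clock step, which is why the bonus (and, in the penalty version, the energy shift one must resolve) shrinks with the length of the computation.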

The true picture is, of course, a tad more complicated; generally, we give penalty terms to unwanted states instead of bonus terms to desirable ones. The reason for this is somewhat subtle, but can potentially be explained with an analogy: humans fear loss much more than they value gains of the same magnitude. Quantum systems behave in a completely opposite manner: the promise of a bonus at the end of the computation is such a great incentive that most of the weight of the history state will flock to the bonus term (for the physicists: the system now has a bound state, meaning that the wave function is localized around a specific site, and drops off exponentially quickly away from it). This makes it difficult to verify the computation far away from the bonus term.

So the Feynman-Kitaev Hamiltonian consists of three parts: one which checks each step of the computation, one which penalizes invalid outcomes—and obviously we also need to make sure the input of the computation is valid. Why? Well, are you saying you are more honest than your friends?


Physical Implications of History State Hamiltonians

If there is one thing I’ve learned throughout my PhD it is that we should always ask what use a theory is. So what can we learn from this construction? Almost 20 years ago, Alexei Kitaev used Feynman’s idea to prove that estimating the ground state energy of a physical system with local interactions is hard, even on a quantum computer (for the experts: QMA-hard under the assumption of a 1/\text{poly} promise gap splitting the embedded YES and NO instances). Why is estimating the ground state energy hard? The energy shift induced by the output penalty depends on the outcome of the computation that we embed (e.g. even or odd outcome). And as fun as long division is, there are much more difficult tasks we can write down as a history state Hamiltonian—in fact, it is this very freedom which makes estimating the ground state energy difficult: if we can embed any computation we want, estimating the induced energy shift should be at least as hard as actually performing the computation on a quantum computer. This has one curious implication: if we don’t expect that we can estimate the ground state energy efficiently, the physical system will take a long time to actually assume its ground state when cooled down, and potentially behave like a spin glass!

Feynman’s history state construction and the QMA-hardness proof of Kitaev were a big part of the research I did for my PhD. I formalized the case where the message is not passed on along a unique path from neighbor to neighbor, but can take an arbitrary path between beginning and end in a more complicated graph; in this way, computation can in some sense be parallelized.

Well, to be honest, the last statement is not entirely true: while there can be parallel tracks of computation from A to B, these tracks have to perform the same computation (albeit in potentially different steps); otherwise the system becomes much more complicated to analyze. The reason why this admittedly quite restricted form of branching might still be an advantage is somewhat subtle: if your computation has a lot of classical if-else cases, but you don’t have enough space on your piece of paper to store all the variables to check the conditions, it might be worth just taking a gamble: pass your message down one branch, in the hope that the condition is met. The only thing that you have to be careful about is that in case the condition isn’t met, you don’t produce invalid results. What use is that in physics? If you don’t have to store a lot of information locally, it means you can get away using a much lower local spin dimension for the system you describe.

Such small and physically realistic models have as of late been proposed as actual computational devices (called Hamiltonian quantum computers), where a prepared initial state is evolved under such a history state Hamiltonian for a specific time, in contrast to the static property of a history ground state we discussed above. Yet whether or not this is something one could actually build in a lab remains an open question.

Last year, Thomas Vidick invited me to visit Caltech, and I worked with IQIM postdoc Elizabeth Crosson to improve the analysis of the energy penalty that is assigned to any history state that cheats the constraints in the Feynman-Kitaev Hamiltonian. We identified some open problems and also proved limitations on the extent of the energetic penalty that these kinds of Hamiltonians can have. This summer I went back to Caltech to further develop these ideas and make progress towards a complete understanding of such “clock” Hamiltonians, which Elizabeth and I are putting together in a follow-up work that should appear soon.

It is striking how such a simple idea can have so profound an implication across fields, and remain relevant, even 30 years after its first proposal.


Feynman concludes his 1986 Foundations of Physics paper with the following words.

At any rate, it seems that the laws of physics present no barrier to reducing the size of computers until bits are the size of atoms, and quantum behavior holds dominant sway.

For my part, I hope that he was right and that history state constructions will play a part in this future.

Matt StrasslerWatch for Auroras

Those of you who remember my post on how to keep track of opportunities to see northern (and southern) lights will be impressed by this image from .

The latest space weather overview plot

The top plot shows the number of X-rays (high-energy photons [particles of light]) coming from the sun, and that huge spike in the middle of the plot indicates that a very powerful solar flare occurred about 24 hours ago.  It should take about 2 days from the time of the flare for its other effects — the cloud of electrically-charged particles expelled from the Sun’s atmosphere — to arrive at Earth.  The electrically-charged particles are what generate the auroras: directed by Earth’s magnetic field, they enter the atmosphere near the magnetic poles and crash into atoms in the upper atmosphere, exciting them and causing them to radiate visible light.

The flare was very powerful, but its cloud of particles didn’t head straight for Earth.  We might get only a glancing blow.  So we don’t know how big an effect to expect here on our planet.  All we can do for now is be hopeful, and wait.

In any case, auroras borealis and australis are possible in the next day or so.  Watch for the middle plot to go haywire, and for the bars in the lower plot to jump higher; then you know the time has arrived.

Filed under: Astronomy Tagged: astronomy, auroras

Matt StrasslerAn Experience of a Lifetime: My 1999 Eclipse Adventure

Back in 1999 I saw a total solar eclipse in Europe, and it was a life-altering experience.  I wrote about it back then, but was never entirely happy with the article.  This week I’ve revised it.  It could still benefit from some editing and revision (comments welcome), but I think it’s now a good read.  It’s full of intellectual observations, but there are powerful emotions too.

If you’re interested, you can read it as a pdf, or just scroll down.



A Luminescent Darkness: My 1999 Eclipse Adventure

© Matt Strassler 1999

After two years of dreaming, two months of planning, and two hours of packing, I drove to John F. Kennedy airport, took the shuttle to the Air France terminal, and checked in.  I was brimming with excitement. In three days time, with a bit of luck, I would witness one of the great spectacles that a human being can experience: a complete, utter and total eclipse of the Sun.

I had missed one eight years earlier. In July 1991, a total solar eclipse crossed over Baja California. I had thought seriously about driving the fourteen hundred miles from the San Francisco area, where I was a graduate student studying theoretical physics, to the very southern tip of the peninsula. But worried about my car’s ill health and scared by rumors of gasoline shortages in Baja, I chickened out. Four of my older colleagues, more worldly and more experienced, and supplied with a more reliable vehicle, drove down together. When they returned, exhilarated, they regaled us with stories of their magical adventure. Hearing their tales, I kicked myself for not going, and had been kicking myself ever since. Life is not so long that such opportunities can be rationalized or procrastinated away.

A total eclipse of the Sun is an event of mythic significance, so rare and extraordinary and unbelievable that it really ought to exist only in ancient legends, in epic poems, and in science fiction stories. There are other types of eclipses — partial and total eclipses of the Moon, in which the Earth blocks sunlight that normally illuminates the Moon, and various eclipses of the Sun in which the Moon blocks sunlight that normally illuminates the Earth. But total solar eclipses are in a class all their own. Only during the brief moments of totality does the Sun vanish altogether, leaving the shocked spectator in a suddenly darkened world, gazing uncomprehendingly at a black disk of nothingness.

Our species relies on daylight. Day is warm; day grows our food; day permits travel with a clear sense of what lies ahead. We are so fearful of the night — of what lurks there unseen, of the sounds that we cannot interpret. Horror films rely on this fear; demons and axe murderers are rarely found walking about in bright sunshine. Dark places are dangerous places; sudden unexpected darkness is worst of all. These are the conventions of cinema, born of our inmost psychology. But the Sun and the Moon are not actors projected on a screen. The terror is real.

It has been said that if the Earth were a member of a federation of a million planets, it would be a famous tourist attraction, because this home of ours would be the only one in the federation with such beautiful eclipses. For our skies are witness to a coincidence truly of cosmic proportions. It is a stunning accident that although the Sun is so immense that it could hold a million Earths, and the Moon so small that dozens could fit inside our planet, these two spheres, the brightest bodies in Earth’s skies, appear the same size. A faraway giant may seem no larger than a nearby child. And this perfect match of their sizes and distances makes our planet’s eclipses truly spectacular, visually and scientifically. They are described by witnesses as a sight of weird and unique beauty, a visual treasure completely unlike anything else a person will ever see, or even imagine.

But total solar eclipses are uncommon, occurring only once every year or two. Even worse, totality only occurs in a narrow band that sweeps across the Earth — often just across its oceans. Only a small fraction of the Earth sees a total eclipse in any century. And so these eclipses are precious; only the lucky, or the devoted, will experience one before they die.

In my own life, I’d certainly been more devoted than lucky. I knew it wasn’t wise to wait for the Moon’s shadow to find me by chance. Instead I was going on a journey to place myself in its path.

The biggest challenge in eclipse-chasing is the logistics. The area in which totality is visible is very long but very narrow. For my trip, in 1999, it was a long strip running west to east all across Europe, but only a hundred miles wide from north to south. A narrow zone crossing heavily populated areas is sure to attract a massive crowd, so finding hotels and transport can be difficult. Furthermore, although eclipses are precisely predictable, governed by the laws of gravity worked out by Isaac Newton himself, weather and human beings are far less dependable.

But I had a well-considered plan. I would travel by train to a small city east of Paris, where I had reserved a rental car. Keeping a close watch on the weather forecast, I would drive on back roads, avoiding clogged highways. I had no hotel reservations. It would have been pointless to make them for the night before the event, since it was well known that everything within two hours drive of the totality zone was booked solid. Moreover, I wanted the flexibility to adjust to the weather and couldn’t know in advance where I’d want to stay. So my idea was that on the night prior to the eclipse, I would drive to a good location in the path of the lunar shadow, and sleep in the back of my car. I had a sleeping bag with me to keep me warm, and enough lightweight clothing for the week — and not much else.

Oh, it was such a good plan, clean and simple, and that’s why my heart had so far to sink and my brain so ludicrous a calamity to contemplate when I checked my wallet, an hour before flight time, and saw a gaping black emptiness where my driver’s license was supposed to be. I was struck dumb. No license meant no car rental; no car meant no flexibility and no place to sleep. Sixteen years of driving and I had never lost it before; why, why, of all times, now, when it was to play a central role in a once-in-a-lifetime adventure?

I didn’t panic. I walked calmly back to the check-in counters, managed to get myself rescheduled for a flight on the following day, drove the three hours back to New Jersey, and started looking. It wasn’t in my car. Nor was it in the pile of unneeded items I’d removed from my wallet. Not in my suitcase, not under my bed, not in my office. As it was Sunday, I couldn’t get a replacement license. Hope dimmed, flickered, and went dark.

Deep breaths. Plan B?

I didn’t have a tent, and couldn’t easily have found one. But I did have a rain poncho, large enough to keep my sleeping bag off the ground. As long as it didn’t rain too hard, I could try, the night before the eclipse, to find a place to camp outdoors; with luck I’d find lodging for the other nights. I doubted this would be legal, but I was willing to take the chance. But what about my suitcase? I couldn’t carry that around with me into the wilderness. Fortunately, I knew a solution. For a year after college, I had studied music in France, and had often gone sightseeing by rail. On those trips I had commonly made use of the ubiquitous lockers at the train stations, leaving some luggage while I explored the nearby town. As for flexibility of location, that was unrecoverable; the big downside of Plan B was that I could no longer adjust to the weather. I’d just have to be lucky. I comforted myself with the thought that the worst that could happen to me would be a week of eating French food.

So the next day, carrying the additional weight of a poncho and an umbrella, but having in compensation discarded all inessential clothing and tourist information, I headed back to the airport, this time by bus. Without further misadventures, I was soon being carried across the Atlantic.

As usual I struggled to nap amid the loud silence of a night flight. But my sleeplessness was rewarded with one of those good omens that makes you think that you must be doing the right thing. As we approached the European coastline, and I gazed sleepily out my window, I suddenly saw a bright glowing light. It was the rising white tip of the thin crescent Moon.

Solar eclipses occur at New Moon, always. This is nothing but simple geometry; the Moon must place itself exactly between the Sun and the Earth to cause an eclipse, and that means the half of the Moon that faces us must be in shadow. (At Full Moon, the opposite is true; the Earth is between the Sun and the Moon, so the half of the Moon that faces us is in full sunlight. That’s when lunar eclipses can occur.) And just before a New Moon, the Moon is close to the Sun’s location in the sky. It becomes visible, as the Earth turns, just before the Sun does, rising as a morning crescent shortly before sunrise. (Similarly, we get an evening crescent just after a New Moon.)

There, out over the vast Atlantic, from a dark ocean of water into a dark sea of stars, rose the delicate thin slip of Luna the lover, on her way to her mystical rendezvous with Sol. Her crescent smiled at me and winked a greeting. I smiled back, and whispered, “see you in two days…” For totality is not merely the only time you can look straight at the Sun and see its crown. It is the only time you can see the New Moon.

We landed in Paris at 6:30 Monday morning, E-day-minus-two. I headed straight to the airport train station, and pored over rail maps and my road maps, trying to guess a good location to use as a base. Eventually I chose a medium-sized town with the name Charleville-Mezieres. It was on the northern edge of the totality zone, at the end of a large spoke of the Paris-centered rail system, and was far enough from Paris, Brussels, and all large German towns that I suspected it might escape the worst of the crowds. It would then be easy, the night before the eclipse, to take a train back into the center of the zone, where totality would last the longest.

Two hours later I was in the Paris-East rail station and had purchased my ticket for Charleville-Mezieres. With ninety minutes to wait, I wandered around the station. It was evident that France had gone eclipse-happy. Every magazine had a cover story; every newspaper had a special insert; signs concerning the event were everywhere. Many of the magazines carried free eclipse glasses, with a black opaque metallic material for lenses that only the Sun can penetrate. Warnings against looking at the Sun without them were to be found on every newspaper front page. I soon learned that there had been a dreadful scandal in which a widely distributed shipment of imported glasses was discovered to be dangerously defective, leading the government to make a hurried and desperate attempt to recall them. There were also many leaflets advertising planned events in towns lying in the totality zone, and information about extra trains that would be running. A chaotic rush out of Paris was clearly expected.

Before noon I was on a train heading through the Paris suburbs into the farmlands of the Champagne region. The rocking of the train put me right to sleep, but the shrieking children halfway up the rail car quickly ended my nap. I watched the lovely sunlit French countryside as it rolled by. The Sun was by now well overhead — or rather, the Earth had rotated so that France was nearly facing the Sun head on. Sometimes, when the train banked on a turn, the light nearly blinded me, and I had to close my eyes.

With my eyelids shut, I thought about how I’d managed, over decades, to avoid ever once accidentally staring at the Sun for even a second… and about how almost every animal with eyes manages to do this during its entire life. It’s quite a feat, when you think about it. But it’s essential, of course. The Sun’s ferocious blaze is even worse than it appears, for it contains more than just visible light. It also radiates light too violet for us to see — ultraviolet — which is powerful enough to destroy our vision. Any animal lacking instincts powerful enough to keep its eyes off the Sun will go blind, soon to starve or be eaten. But humans are in danger during solar eclipses, because our intense curiosity can make us ignore our instincts. Many of us will suffer permanent eye damage, not understanding when and how it is safe to look at the Sun… which is almost, but not quite, never.

In fact the only time it is safe to look with the naked eye is during totality, when the Sun’s disk is completely blocked by the New Moon, and the world is dark. Then, and only then, can one see that the Sun is not a sphere, and that it has a sort of atmosphere, immense and usually unseen.

At the heart of the Sun, and source of its awesome power, is its nuclear furnace, some fifteen million degrees Celsius and nearly five billion years old. All that heat gradually filters and boils out of the Sun’s core toward its visible surface, which is a mere six thousand degrees… still white-hot. Outside this region is a large irregular halo of material that is normally too dim to see against the blinding disk. The inner part of that halo is called the chromosphere; there, giant eruptions called “prominences” loop outward into space. The outer part of the halo is the “corona”, Latin for “crown.” The opportunity to see the Sun’s corona is one of the main reasons to seek totality.

Still very drowsy, but in a good mood, I arrived in Charleville. Wanting to leave my bags in the station while I looked for a hotel room, I searched for the luggage lockers. After three tiring trips around the station, I asked at a ticket booth. “Oh,” said the woman behind the desk, “we haven’t had them available since the Algerian terrorism of a few years ago.”

I gulped. This threatened plan B, for what was I to do with my luggage on eclipse day? I certainly couldn’t walk out into the French countryside looking for a place to camp while carrying a full suitcase and a sleeping bag! And even the present problem of looking for a hotel would be daunting. The woman behind the desk was sympathetic, but her only suggestion was to try one of the hotels near the station. Since the tourist information office was a mile away, it seemed the only good option, and I lugged my bags across the street.

Here, finally, luck smiled. The very first place I stopped at had a room for that night, reasonably priced and perfectly clean, if spartan. It was also available the night after the eclipse. My choice of Charleville had been wise. Unfortunately, even here, Eclipse Eve — Tuesday evening — was as bad as I imagined. The hotelière assured me that all of Charleville was booked (and my later attempts to find a room, even a last-minute cancellation, proved fruitless). Still, she was happy for me to leave my luggage at the hotel while I tramped through the French countryside. Thus was Plan B saved.

Somewhat relieved, I wandered around the town. Charleville is not unattractive, and the orange sandstone 16th century architecture of its central square is very pleasing to the eye. By dusk I was exhausted and collapsed on my bed. I slept long and deep, and awoke refreshed. I took a short sightseeing trip by train, ate a delicious lunch, and tried one more time to find a room in Charleville for Eclipse Eve. Failing once again, I resolved to camp in the heart of the totality zone.

But where? I had several criteria in mind. For the eclipse, I wanted to be far from any large town or highway, so that streetlights, often automatically triggered by darkness, would not spoil the experience. Also I wanted hills and farmland; I wanted to be at a summit, with no trees nearby, in order to have the best possible view. It didn’t take long to decide on a location. About five miles south of the unassuming town of Rethel, rebuilt after total destruction in the first world war, my map showed a high hill. It seemed perfect.

Fortunately, I learned just in time that this same high hill had attracted the attention of the local authorities, and they had decided to designate this very place the “official viewing site” in the region. A hundred thousand people were expected to descend on Rethel and take shuttles from the town to the “site.” Clearly this was not where I wanted to be!

So instead, when I arrived in Rethel, I walked in another direction. I aimed for an area a few miles west of town, quiet hilly farmland.

Yet again, my luck seemed to be on the wane. By four it was drizzling, and by five it was raining. Darkness would settle at around eight, and I had little time to find a site for unobtrusive camping, much less a dry one. The rain stopped, restarted, hesitated, spat, but refused to go away. An unending mass of rain clouds could be seen heading toward me from the west. I had hoped to use trees for some shelter against rain, but now the trees were drenched and dripping, even worse than the rain itself.

Still completely unsure what I would do, I continued walking into the evening. I must have cut a very odd figure, carrying an open umbrella, a sleeping bag, and a small black backpack. I took a break in a village square, taking shelter at a church’s side door, where I munched on French bread and cheese. Maybe one of these farmers would let me sleep in a dry spot in his barn, I thought to myself. But I still hadn’t reached the hills I was aiming for, so I kept walking.

After another mile, I came to a hilltop with a dirt farm track crossing the road. There, just off the road to the right, was a large piece of farm machinery. And underneath it, a large, flat, sheltered spot. Hideous, but I could sleep there. Since it wasn’t quite nightfall yet and I could see a hill on the other side of the road along the same track, one which looked like it might be good for watching the eclipse, I took a few minutes to explore it. There I found another piece of farm equipment, also with a sheltered underbelly. This one was much further from the road, looked unused, and presumably offered both safer and quieter shelter. It was sitting just off the dirt track in a fallow field. The field was of thick, sticky, almost hard mud, the kind you don’t slip in and which doesn’t ooze but which gloms onto the sides of your shoe.

And so it was that Eclipse Eve found me spreading my poncho in a friendly unknown farmer’s field, twisting my body so as not to hit my head on the metal bars of my shelter, carefully unwrapping my sleeping bag and removing my shoes so as not to cover everything in mud, brushing my teeth in bottled water, and bedding down for the night. The whole scene was so absurd that I found myself sporting a slightly manic grin and giggling. But still, I was satisfied. Despite the odds, I was in the zone at the appointed time; when I awoke the next morning I would be scarcely two miles from my final destination. If the clouds were against me, so be it. I had done my part.

I slept pretty well, considering both my excitement and the uneven ground. At daybreak I was surrounded by fog, but by 8 a.m. the fog was lifting, revealing a few spots of blue sky amid low clouds. My choice of shelter was also confirmed; my sleeping bag was dry, and across the road the other piece of machinery I had considered was already in use.

I packed up and started walking west again. The weather seemed uncertain, with three layers of clouds — low stratus, medium cumulus, and high cirrus — crossing over each other. Blue patches would appear, then close up. I trudged to the base of my chosen hill, then followed another dirt track to the top, where I was graced with a lovely view. The rolling landscape of fertile France stretched before me, blotched here and there with sunshine.  Again I had chosen well, better than I realized, as it turned out, for I was not alone on the hill. A Belgian couple had chosen it too — and they had a car…

There I waited. The minutes ticked by. The temperature fluctuated, and the fields changed color, as the Sun played hide and seek. I didn’t need these reminders of the Sun’s importance — that without its heat the Earth would freeze, and without its light, plants would not grow and the cycle of life would quickly end. I thought about how pre-scientific cultures had viewed the Sun. In cultures and religions around the world, the blazing disk has often been credited with divine power and regal authority. And why not? In the past century, we’ve finally learned what the Sun is made from and why it shines. But we are no less in awe than our ancestors, for the Sun is much larger, much older, and much more powerful than most of them imagined.

For a while, I listened to the radio. Crowds were assembling across Europe. Special events — concerts, art shows, contests — were taking place, organized by towns in the zone to coincide with the eclipse. This was hardly surprising. All those tourists had come for totality. But totality is brief, never more than a handful of minutes.  It’s the luck of geometry, the details of the orbits of the Earth and Moon, that set its duration. For my eclipse, the Moon’s shadow was only about a hundred miles wide. Racing along at three thousand miles per hour, it would darken any one location for at most two minutes. Now if a million people are expected to descend on your town for a two-minute event, I suppose it is a good idea to give them something else to do while they wait. And of course, the French cultural establishment loves this kind of opportunity. Multimedia events are their specialty, and they often give commissions to contemporary artists. I was particularly amused to discover later that an old acquaintance of mine — I met him in 1987 at the composers’ entrance exams for the Paris Conservatory — had been commissioned to write an orchestral piece, called “Eclipse,” for the festival in the large city of Reims. It was performed just before the moment of darkness.

Finally, around 11:30, the eclipse began. The Moon nibbled a tiny notch out of the Sun. I looked at it briefly through my eclipse glasses, and felt the first butterflies of anticipation. The Belgian couple, in their late forties, came up to the top of the hill and stood alongside me. They were Flemish, but the man spoke French, and we chatted for a while. It turned out he was a scientist also, and had spent some time in the United States, so we had plenty to talk about. But our discussion kept turning to the clouds, which showed no signs of dissipating. The Sun was often veiled by thin cirrus or completely hidden by thick cumulus. We kept a nervous watch.

Time crawled as the Moon inched across the brilliant disk. It passed the midway point and the Sun became a crescent. With only twenty minutes before totality, my Belgian friends conversed in Dutch. The man turned to me. “We have decided to drive toward that hole in the clouds back to the east,” he said in French. “It’s really not looking so good here. Do you want to come with us?” I paused to think. How far away was that hole? Would we end up back at the town? Would we get caught in traffic? Would we end up somewhere low? What were my chances if I stayed where I was? I hesitated, unsure. If I went with them, I was subject to their whims, not my own. But after looking at the oncoming clouds one more time, I decided my present location was not favorable. I joined them.

We descended the dirt track and turned left onto the road I’d taken so long to walk. It was completely empty. We kept one eye on where we were going and five eyes on the sky. After two miles, the crescent sun became visible through a large gap in the low clouds. There were still high thin clouds slightly veiling it, but the sky around it was a pale blue. We went a bit further, and then stopped… at the very same dirt track where I had slept the night before. A line of ten or fifteen cars now stretched along it, but there was plenty of room for our vehicle.

By now, with ten minutes to go, the light was beginning to change. When only five percent of the Sun remains, your eye can really tell. The blues become deeper, the whites become milkier, and everything is more subdued. Also it becomes noticeably cooler. I’d seen this light before, in New Mexico in 1994. I had gone there to watch an “annular” eclipse of the Sun. An annular eclipse occurs when the Moon passes directly in front of the Sun but is just a bit too far away from the Earth for its shadow to reach the ground. In such an eclipse, the Moon fails to completely block the Sun; a narrow ringlet, or “annulus”, often called the “ring of fire,” remains visible. That day I watched from a mountain top, site of several telescopes, in nearly clear skies. But imagine the dismay of the spectators as the four-and-a-half minutes of annularity were blocked by a five-minute cloud! Fortunately there was a bright spot. For a brief instant — no more than three seconds — the cloud became thin, and a perfect circle of light shone through, too dim to penetrate eclipse glasses but visible with the naked eye… a veiled, surreal vision.

On the dirt track in the middle of French fields, we started counting down the minutes. There was more and more tension in the air. I loaded faster film into my camera. The light became still milkier, and as the crescent became a fingernail, all eyes were focused either on the Sun itself or on a small but thick and dangerous-looking cloud heading straight for it. Except mine. I didn’t care if I saw the last dot of sunlight disappear. What I wanted to watch was the coming of Moon-shadow.

One of my motivations for seeking a hill was that I wanted to observe the approach of darkness. Three thousand miles an hour is just under a mile per second, so if one had a view extending out five miles or so, I thought, one could really see the edge coming. I expected it would be much like watching the shadow of a cloud coming toward me, with the darkness sweeping along the ground, only much darker and faster. I looked to the west and waited for the drama to unfold.

And it did, but it was not what I was expecting. Even though observing the shadow is a common thing for eclipse watchers to do, nothing I had ever read about eclipses prepared me in the slightest for what I was about to witness. I’ve never seen it photographed, or even described. Maybe it was an effect of all the clouds around us. Or maybe others, just as I do, find it difficult to convey.

For how can one relate the sight of daylight sliding swiftly, like a sigh, to deep twilight? of the western sky, seen through scattered clouds, changing seamlessly and inexorably from blue to pink to slate gray to the last yellow of sunset? of colors rising up out of the horizon and spreading across the sky like water from a broken dyke flooding onto a field?

I cannot find the right combination of words to capture the sense of being swept up, of being overwhelmed, of being transfixed with awe, as one might be before the summoning of a great wave or a great wind by the command of a god, yet all in utter silence and great beauty. Reliving it as I write this brings a tear. In the end I have nothing to compare it to.

The great metamorphosis passed. The light stabilized. Shaken, I looked up.

And quickly looked away. I had seen a near-disk of darkness, the fuzzy whiteness of the corona, and some bright dots around the disk’s edge, one especially bright where the Sun still clearly shone through. Accidentally I had seen with my naked eyes the “diamond ring,” a moment when the last brilliant drop of Sun and the glistening corona are simultaneously visible. It’s not safe to look at. I glanced again. Still several bright dots. I glanced again. Still there — but the Sun had to be covered by now…

So I looked longer, and realized that the Sun was indeed covered, that those bright dots were there to stay. There it was. The eclipsed Sun, or rather, the dark disk of the New Moon, surrounded by the Sun’s crown, studded at its edge with seven bright pink jewels. It was bizarre, awe-inspiring, a spooky hallucination. It shimmered.

The Sun’s corona didn’t really resemble what I had seen in photographs, and I could immediately see why. The corona looked as though it were made of glistening white wispy hair, billowing outward like a mop of whiskers. It gleamed with a celestial light, a shine resembling that of well-lit tinsel. No camera could capture that glow, no photograph reproduce it.

But the greatest, most delightful surprise was the seven beautiful gems. I knew they had to be the great eruptions on the surface of the Sun, prominences, huge magnetic storms larger than our planet and more violent than anything else in the solar system. However, nobody ever told me they were bright pink! I always assumed they were orange (silly of me, since the whole Sun looks orange if you look at it through an orange filter, which the photographs always do.) They were arranged almost symmetrically around the Sun, with one of them actually well separated from its surface and halfway out into the lovely soft filaments of the corona. I explored them with my binoculars. The colors, the glistening timbre, the rich detail: it was a visual delight. The scene was living, vibrant, delicate and soft; by comparison, all the photographs and films seemed dry, flat, deadened.

I was surprised at my calm. After the great rush of the shadow, the stasis of totality had caught me off guard.  Around me it was much lighter than I had expected. The sense was of late twilight, with a deep blue-purple sky; yet it was still bright enough to read by. The yellow light of late sunset stretched all the way around the horizon. The planet Venus was visible, but no stars peeked through the clouds. Perhaps longer eclipses have darker skies, a larger Moon-shadow putting daylight further away.

I had scarcely had time to absorb all of this when, just at the halfway point of totality, the dangerous-looking cumulus cloud finally arrived, and blotted out the view. A groan, but only a half-hearted one, emerged from the spectators; after all we’d seen what we’d come to see. I took in the colors emanating from the different parts of the sky, and then looked west again, waiting for the light to return. A thin red glow touched the horizon. I waited. Suddenly the red began to grow furiously. I yelled “Il revient!” — it is returning! — and then watched in awe as the reds became pinks, swarmed over us, turned yellow-white…

And then… it was daylight again. Normality, or a slightly muted version of it. The magical show was over, heavenly love had been consummated, we who had traveled far had been rewarded. The weather had been kind to us. There was a pause as we savored the experience, and waited for our brains to resume functioning. Then congratulations were passed around as people shook hands and hugged each other. I thanked my Belgian friends, who like me were smiling broadly. They offered me a ride back to town. I almost accepted, but stopped short, and instead thanked them again and told them I somehow wanted to be outside for a while longer. We exchanged addresses, said goodbyes, they drove off.

I started retracing my steps from the previous evening. As I walked back to the town of Rethel in the returning sunshine, the immensity of what I had seen began gradually to make its way through my skin into my blood, making me teary-eyed. I thought about myself, a scientist, educated and knowledgeable about the events that had just taken place, and tried to imagine what would have happened to me today if I had not had that knowledge and had found myself, unexpectedly, in the Moon’s shadow.

It was not difficult; I had only to imagine what I would feel if the sky suddenly, without any warning, turned a fiery red instead of blue and began to howl. It would have been a living nightmare. The terror that I would have felt would have penetrated my bones. I would have fallen on my knees in panic; I would have screamed and wept; I would have called on every deity I knew and others I didn’t know for help; I would have despaired; I would have thought death or hell had come; I would have assumed my life was about to end. The two minutes of darkness, filled with the screams and cries of my neighbors, would have been timeless, maddening. When the Sun just as suddenly returned, I would have collapsed onto the ground with relief, profusely and weepingly thanking all of the deities for restoring the world to its former condition, and would have rushed home to relatives and friends, hoping to find some comfort and solace.

I would have sought explanations. I would have been willing to consider anything: dragons eating the Sun, spirits seeking to punish our village or country for its transgressions, evil and spiteful monsters trying to freeze the Earth, gods warning us of terrible things to come in future. But above all, I could never, never have imagined that this brief spine-chilling extinction and transformation of the Sun was a natural phenomenon. Nothing so spectacular and sudden and horrifying could have been the work of mere matter. It would once and for all have convinced me of the existence of creatures greater and more powerful than human beings, if I had previously had any doubt.

And I would have been forever changed. No longer could I have entirely trusted the regularity of days and nights, of seasons, of years. For the rest of my life I would have always found myself glancing at the sky, wanting to make sure that all, for the moment, was well. For if the Sun could suddenly vanish for two minutes, perhaps the next time it could vanish for two hours, or two days… or two centuries. Or forever.

I pondered the impact that eclipses, both solar and lunar, have had throughout human history. They have shaped civilizations. Wars and slaughters were begun and ended on their appearance; they sent ordinary people to their deaths as appeasement sacrifices; new gods and legends were invoked to give meaning to them. The need to predict them, and the coincidences which made their prediction possible, helped give birth to astronomy as a mathematically precise science, in China, in Greece, in modern Europe — developments without which my profession, and even my entire technologically-based culture, might not exist.

It was an hour’s walk to Rethel, but that afternoon it was a long journey. It took me across the globe to nations ancient and distant. By the time I reached the town, I’d communed with my ancestors, reconsidered human history, and examined anew my tiny place in the universe.  If I’d been a bit calm during totality itself, I wasn’t anymore. What I’d seen was gradually filtering, with great potency, into my soul.

I took the train back to Charleville, and slept dreamlessly. The next two days were an opportunity to unwind, to explore, and to eat well. On my last evening I returned to Paris to visit my old haunts. I managed to sneak into the courtyard of the apartment house where I had had a one-room garret up five flights of stairs, with its spartan furnishings and its one window that looked over the roofs of Paris to the Eiffel Tower. I wandered past the old Music Conservatory, since moved to the northeast corner of town, and past the bookstore where I bought so much music. My favorite bakery was still open.

That night I slept in an airport hotel, and the next day flew happily home to the American continent. I never did find my driver’s license.

But psychological closure came already on the day following the eclipse. I spent that day in Laon, a small city perched magnificently atop a rocky hill that rises vertically out of the French plains. I wandered its streets and visited its sights — an attractive church, old houses, pleasant old alleyways, ancient walls and gates. As evening approached I began walking about, looking for a restaurant, and I came to the northwestern edge of town overlooking the new city and the countryside beyond. The clouds had parted, and the Sun, looking large and dull red, was low in the sky. I leaned on the city wall and watched as the turning Earth carried me, and Laon, and all of France, at hundreds of miles an hour, intent on placing itself between me and the Sun. Yet another type of solar eclipse, one we call “sunset.”

The ruddy disk touched the horizon. I remembered the wispy white mane and the brilliant pink jewels. In my mind the Sun had always been grand and powerful, life-giver and taker, essential and dangerous. It could blind, burn, and kill.  I respected it, was impressed and awed by it, gave thanks for it, swore at it, feared it. But in the strange light of totality, I had seen beyond its unforgiving, blazing sphere, and glimpsed a softer side of the Sun. With its feathery hair blowing in a dark sky, it had seemed delicate, even vulnerable. It is, I thought to myself, as mortal as we.

The distant French hills rose across its face. As it waned, I found myself feeling a warmth, even a tenderness — affection for this giant glowing ball of hydrogen, this protector of our planet, this lonely beacon in a vast emptiness… the only star you and I will ever know.


September 06, 2017

BackreactionInterpretations of Quantum Mechanics: The Cat’s View

Something else I made for the book but later removed.

September 02, 2017

Scott AaronsonGapP, Oracles, and Quantum Supremacy

Let me start with a few quick announcements before the main entrée:

First, the website is now live!  Thanks so much to my friend Adam Chalmers for setting it up.  Please try it out on your favorite P vs. NP solution paper—I think you’ll be impressed by how well our secret validation algorithm performs.

Second, some readers might enjoy a YouTube video of me lecturing about the computability theory of closed timelike curves, from the Workshop on Computational Complexity and High Energy Physics at the University of Maryland a month ago.  Other videos from the workshop—including of talks by John Preskill, Daniel Harlow, Stephen Jordan, and other names known around Shtetl-Optimized, and of a panel discussion in which I participated—are worth checking out as well.  Thanks so much to Stephen for organizing such a great workshop!

Third, thanks to everyone who’s emailed to ask whether I’m holding up OK with Hurricane Harvey, and whether I know how to swim (I do).  As it happens, I haven’t been in Texas for two months—I spent most of the summer visiting NYU and doing other travel, and this year, Dana and I are doing an early sabbatical at Tel Aviv University.  However, I understand from friends that Austin, being several hours’ drive further inland, got nothing compared to what Houston did, and that UT is open on schedule for the fall semester.  Hopefully our house is still standing as well!  Our thoughts go to all those affected by the disaster in Houston.  Eventually, the Earth’s rapidly destabilizing climate almost certainly means that Austin will be threatened as well by “500-year events” happening every year or two, as for that matter will a large portion of the earth’s surface.  For now, though, Austin lives to be weird another day.

GapP, Oracles, and Quantum Supremacy

by Scott Aaronson

Stuart Kurtz 60th Birthday Conference, Columbia, South Carolina

August 20, 2017

It’s great to be here, to celebrate the life and work of Stuart Kurtz, which could never be … eclipsed … by anything.

I wanted to say something about work in structural complexity and counting complexity and oracles that Stuart was involved with “back in the day,” and how that work plays a major role in issues that concern us right now in quantum computing.  A major goal for the next few years is the unfortunately-named Quantum Supremacy.  What this means is to get a clear quantum speedup, for some task: not necessarily a useful task, but something that we can be as confident as possible is classically hard.  For example, consider the 49-qubit superconducting chip that Google is planning to fabricate within the next year or so.  This won’t yet be good enough for running Shor’s algorithm, to factor numbers of any interesting size, but it hopefully will be good enough to sample from a probability distribution over n-bit strings—in this case, 49-bit strings—that’s hard to sample from classically, taking somewhere on the order of 2^49 steps.

Furthermore, the evidence that that sort of thing is indeed classically hard, might actually be stronger than the evidence that factoring is classically hard.  As I like to say, a fast classical factoring algorithm would “merely” collapse the world’s electronic commerce—as far as we know, it wouldn’t collapse the polynomial hierarchy!  By contrast, a fast classical algorithm to simulate quantum sampling would collapse the polynomial hierarchy, assuming the simulation is exact.  Let me first go over the argument for that, and then explain some of the more recent things we’ve learned.

Our starting point will be two fundamental complexity classes, #P and GapP.

#P is the class of all nonnegative integer functions f, for which there exists a nondeterministic polynomial-time Turing machine M such that f(x) equals the number of accepting paths of M(x).  Less formally, #P is the class of problems that boil down to summing up an exponential number of nonnegative terms, each of which is efficiently computable individually.

GapP—introduced by Fenner, Fortnow, and Kurtz in 1992—can be defined as the set {f-g : f,g∈#P}; that is, the closure of #P under subtraction.  Equivalently, GapP is the class of problems that boil down to summing up an exponential number of terms, each of which is efficiently computable individually, but which could be either positive or negative, and which can therefore cancel each other out.  As you can see, GapP is a class that in some sense anticipates quantum computing!
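
To make the two definitions concrete, here is a brute-force toy in Python (purely illustrative, not from the talk; the predicate phi is an arbitrary choice): a #P-style sum of nonnegative terms, and the corresponding GapP-style signed sum obtained by letting the terms be ±1.

```python
from itertools import product

# A small, efficiently computable predicate phi : {0,1}^n -> {0,1}.
# (Arbitrary choice for illustration: accept iff the bits have odd parity.)
def phi(bits):
    return sum(bits) % 2

n = 4

# #P-style quantity: an exponential sum of NONNEGATIVE terms.
f = sum(phi(bits) for bits in product([0, 1], repeat=n))

# GapP-style quantity: terms are +1 or -1 and can cancel each other out.
# With s(z) = (-1)^phi(z), the sum equals (#rejecting z) - (#accepting z),
# i.e. a difference of two #P functions.
g = sum((-1) ** phi(bits) for bits in product([0, 1], repeat=n))

print(f, g)
```

For this phi, exactly half of the 16 strings are accepted, so f = 8 while the signed sum cancels to g = 0; that cancellation is precisely what makes GapP the harder class to approximate.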

For our purposes, the most important difference between #P and GapP is that #P functions can at least be multiplicatively approximated in the class BPP^NP, by using Stockmeyer’s technique of approximating counting with universal hash functions.  By contrast, even if you just want to approximate a GapP function to within (say) a factor of 2—or for that matter, just decide whether a GapP function is positive or negative—it’s not hard to see that that’s already a #P-hard problem.  For, supposing we had an oracle to solve this problem, we could then shift the sum this way and that by adding positive and negative dummy terms, and use binary search, to zero in on the sum’s exact value in polynomial time.
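
That binary-search trick can be sketched in a few lines (a hypothetical illustration: the hidden value and its range are made up for the demo, and the oracle stands in for the sign-decision procedure, with the shift t playing the role of the dummy terms):

```python
def recover_gap_value(sign_oracle, lo, hi):
    """Recover the hidden integer F in [lo, hi], given only
    sign_oracle(t) = (F + t > 0), i.e. the sign of the shifted sum."""
    while lo < hi:
        mid = (lo + hi) // 2
        # Asking whether F + (-mid) > 0 is asking whether F > mid.
        if sign_oracle(-mid):
            lo = mid + 1   # F is at least mid + 1
        else:
            hi = mid       # F is at most mid
    return lo

# Demo with a made-up hidden GapP value:
F = -37
print(recover_gap_value(lambda t: F + t > 0, -2**10, 2**10))  # prints -37
```

Polynomially many queries suffice, since each query halves a range that is at most exponential in the input size.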

It’s also not hard to see that a quantum computation can encode an arbitrary GapP function in one of its amplitudes.  Indeed, let s:{0,1}^n→{1,-1} be any Boolean function that’s given by a polynomial-size circuit.  Then consider the quantum circuit that starts in the state |0…0⟩, applies a Hadamard gate to each of the n qubits, applies the diagonal phase |z⟩→s(z)|z⟩, applies a Hadamard to each qubit once more, and then measures in the computational basis.

When we run this circuit, the probability that we see the all-0 string as output is

$$ \left( \frac{1}{2^n} \sum_{z\in \{0,1\}^n} s(z) \right)^2 = \frac{1}{4^n} \sum_{z,w\in \{0,1\}^n} s(z) s(w) $$

which is clearly in GapP, and clearly #P-hard even to approximate to within a multiplicative factor.
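
One can check the all-zeros probability numerically with a tiny state-vector simulation (a sketch in plain Python; the particular s below is an arbitrary ±1-valued example, not from the talk):

```python
from itertools import product
from math import sqrt

n = 3
N = 2 ** n

# An arbitrary example of s : {0,1}^n -> {+1,-1}.
def s(bits):
    return 1 if sum(bits) % 3 == 0 else -1

basis = list(product([0, 1], repeat=n))

# Start in |0...0> and apply a Hadamard to every qubit: uniform superposition.
state = [1 / sqrt(N)] * N

# Apply the diagonal phase |z> -> s(z)|z>.
state = [s(z) * a for z, a in zip(basis, state)]

# Apply a Hadamard to every qubit again and read off the |0...0> amplitude;
# the first row of H^n is all 1/sqrt(N), so this amplitude is sum(state)/sqrt(N).
amp0 = sum(state) / sqrt(N)

p_all_zero = amp0 ** 2
predicted = (sum(s(z) for z in basis) / N) ** 2  # ((1/2^n) sum_z s(z))^2
print(p_all_zero, predicted)
```

With this s, the signed sum over the 8 basis states is 2 - 6 = -4, so the all-zeros probability comes out to (-4/8)^2 = 1/4, matching the formula.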

By contrast, suppose we had a probabilistic polynomial-time classical algorithm, call it M, to sample the output distribution of the above quantum circuit.  Then we could rewrite the above probability as Pr_r[M(r) outputs 0…0], where r consists of the classical random bits used by M.  This is again an exponentially large sum, with one term for each possible r value—but now it’s a sum of nonnegative terms (probabilities), which is therefore approximable in BPP^NP.
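
The contrast can be made concrete with a toy sketch (illustrative only; the sampler M below is a made-up stand-in, not a simulation of the circuit): for any classical sampler, the probability of a fixed outcome is an exponential sum of nonnegative, individually easy-to-compute terms, one per random string r.

```python
from itertools import product

m = 8  # number of classical random bits used by the sampler

# A made-up classical sampler M: maps its random bits r to an output string.
def M(r):
    # Output all-zeros iff the random bits have even parity (arbitrary rule).
    return (0,) * 4 if sum(r) % 2 == 0 else (1,) * 4

# Pr[M outputs 0...0] written as a sum of NONNEGATIVE terms, one per r:
# each term is (1/2^m) * [M(r) = 0...0], and each is easy to compute alone.
p = sum(
    (1 / 2 ** m) * (M(r) == (0,) * 4)
    for r in product([0, 1], repeat=m)
)
print(p)
```

Because every term is nonnegative, Stockmeyer-style approximate counting applies to this sum, which is exactly the leverage the argument exploits.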

We can state the upshot as follows.  Let ExactSampBPP be the class of sampling problems—that is, families of probability distributions {D_x}_x, one for each input x∈{0,1}^n—for which there exists a polynomial-time randomized algorithm that outputs a sample exactly from D_x, in time polynomial in |x|.  Let ExactSampBQP be the same thing except that we allow a polynomial-time quantum algorithm.  Then we have that, if ExactSampBPP = ExactSampBQP, then squared sums of both positive and negative terms could efficiently be rewritten as sums of nonnegative terms only—and hence P^#P = BPP^NP.  This, in turn, would collapse the polynomial hierarchy to the third level, by Toda’s Theorem that PH⊆P^#P, together with the result BPP^NP⊆Σ_3.  To summarize:

Theorem 1.  Quantum computers can efficiently solve exact sampling problems that are classically hard unless the polynomial hierarchy collapses.

(In fact, the argument works not only if the classical algorithm exactly samples Dx, but if it samples from any distribution in which the probabilities are multiplicatively close to Dx‘s.  If we really only care about exact sampling, then we can strengthen the conclusion to get that PH collapses to the second level.)

This sort of reasoning was implicit in several early works, including those of Fenner et al. and Terhal and DiVincenzo.  It was made fully explicit in my paper with Alex Arkhipov on BosonSampling in 2011, and in the independent work of Bremner, Jozsa, and Shepherd on the IQP model.  These works actually showed something stronger, which is that we get a collapse of PH, not merely from a fast classical algorithm to simulate arbitrary quantum systems, but from fast classical algorithms to simulate various special quantum systems.  In the case of BosonSampling, that special system is a collection of identical, non-interacting photons passing through a network of beamsplitters, then being measured at the very end to count the number of photons in each mode.  In the case of IQP, the special system is a collection of qubits that are prepared, subjected to some commuting Hamiltonians acting on various subsets of the qubits, and then measured.  These special systems don’t seem to be capable of universal quantum computation (or for that matter, even universal classical computation!)—and correspondingly, many of them seem easier to realize in the lab than a full universal quantum computer.

From an experimental standpoint, though, all these results are unsatisfactory, because they all talk only about the classical hardness of exact (or very nearly exact) sampling—and indeed, the arguments are based around the hardness of estimating just a single, exponentially-small amplitude.  But any real experiment will have tons of noise and inaccuracy, so it seems only fair to let the classical simulation be subject to serious noise and inaccuracy as well—but as soon as we do, the previous argument collapses.

Thus, from the very beginning, Alex Arkhipov and I took it as our “real” goal to show, under some reasonable assumption, that there’s a distribution D that a polynomial-time quantum algorithm can sample from, but such that no polynomial-time classical algorithm can sample from any distribution that’s even ε-close to D in variation distance.  Indeed, this goal is what led us to BosonSampling in the first place: we knew that we needed amplitudes that were not only #P-hard but “robustly” #P-hard; we knew that the permanent of an n×n matrix (at least over finite fields) was the canonical example of a “robustly” #P-hard function; and finally, we knew that systems of identical non-interacting bosons, such as photons, gave rise to amplitudes that were permanents in an extremely natural way.  The fact that photons actually exist in the physical world, and that our friends with quantum optics labs like to do experiments with them, was just a nice bonus!

A bit more formally, let ApproxSampBPP be the class of sampling problems for which there exists a classical algorithm that, given an input x∈{0,1}n and a parameter ε>0, samples a distribution that’s at most ε away from Dx in variation distance, in time polynomial in n and 1/ε.  Let ApproxSampBQP be the same except that we allow a quantum algorithm.  Then the “dream” result that we’d love to prove—both then and now—is the following.

Strong Quantum Supremacy Conjecture.  If ApproxSampBPP = ApproxSampBQP, then the polynomial hierarchy collapses.

Unfortunately, Alex and I were only able to prove this conjecture assuming a further hypothesis, about the permanents of i.i.d. Gaussian matrices.

Theorem 2 (A.-Arkhipov).  Given an n×n matrix X of independent complex Gaussian entries, each of mean 0 and variance 1, assume it’s a #P-hard problem to approximate |Per(X)|2 to within ±ε⋅n!, with probability at least 1-δ over the choice of X, in time polynomial in n, 1/ε, and 1/δ.  Then the Strong Quantum Supremacy Conjecture holds.  Indeed, more than that: in such a case, even a fast approximate classical simulation of BosonSampling, in particular, would imply P#P=BPPNP and hence a collapse of PH.

Alas, after some months of effort, we were unable to prove the needed #P-hardness result for Gaussian permanents, and it remains an outstanding open problem—there’s not even a consensus as to whether it should be true or false.  Note that there is a famous polynomial-time classical algorithm to approximate the permanents of nonnegative matrices, due to Jerrum, Sinclair, and Vigoda, but that algorithm breaks down for matrices with negative or complex entries.  This is once again the power of cancellations, the difference between #P and GapP.
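For intuition, one can compute such permanents exactly (in exponential time, of course) via Ryser’s formula; a small sketch, with the brute-force definition as a cross-check:

```python
import itertools
import numpy as np

def permanent(A):
    """Permanent via Ryser's formula:
    Per(A) = (-1)^n * sum over column subsets S of
             (-1)^|S| * prod_i (sum of row i restricted to S).
    Runs in O(2^n * n^2) time, versus O(n! * n) for the definition."""
    n = A.shape[0]
    total = 0.0
    for k in range(1, n + 1):
        for S in itertools.combinations(range(n), k):
            row_sums = A[:, list(S)].sum(axis=1)
            total += (-1) ** k * np.prod(row_sums)
    return (-1) ** n * total

# An i.i.d. complex Gaussian matrix, as in the hardness assumption above.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
# Cross-check against the sum-over-permutations definition.
brute = sum(np.prod([X[i, p[i]] for i in range(3)])
            for p in itertools.permutations(range(3)))
assert np.isclose(permanent(X), brute)
```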

Frustratingly, if we want the exact permanents of i.i.d. Gaussian matrices, we were able to prove that that’s #P-hard; and if we want the approximate permanents of arbitrary matrices, we also know that that’s #P-hard—it’s only when we have approximation and random inputs in the same problem that we no longer have the tools to prove #P-hardness.

In the meantime, one can also ask a meta-question.  How hard should it be to prove the Strong Quantum Supremacy Conjecture?  Were we right to look at slightly exotic objects, like the permanents of Gaussian matrices?  Or could Strong Quantum Supremacy have a “pure, abstract complexity theory proof”?

Well, one way to formalize that question is to ask whether Strong Quantum Supremacy has a relativizing proof, a proof that holds in the presence of an arbitrary oracle.  Alex and I explicitly raised that as an open problem in our BosonSampling paper.

Note that “weak” quantum supremacy—i.e., the statement that ExactSampBPP = ExactSampBQP collapses the polynomial hierarchy—has a relativizing proof, namely the proof that I sketched earlier.  All the ingredients that we used—Toda’s Theorem, Stockmeyer approximate counting, simple manipulations of quantum circuits—were relativizing ingredients.  By contrast, all the way back in 1998, Fortnow and Rogers proved the following.

Theorem 3 (Fortnow and Rogers).  There exists an oracle relative to which P=BQP and yet PH is infinite.

In other words, if you want to prove that P=BQP collapses the polynomial hierarchy, the proof can’t be relativizing.  This theorem was subsequently generalized in a paper by Fenner, Fortnow, Kurtz, and Li, which used concepts like “generic oracles” that seem powerful but that I don’t understand.

The trouble is, Fortnow and Rogers’s construction was extremely tailored to making P=BQP.  It didn’t even make PromiseBPP=PromiseBQP (that is, it allowed that quantum computers might still be stronger than classical ones for promise problems), let alone collapse quantum with classical for sampling problems.

We can organize the various quantum/classical collapse possibilities as follows:

ExactSampBPP = ExactSampBQP

⇓

ApproxSampBPP = ApproxSampBQP   ⇔   FBPP = FBQP

⇓

PromiseBPP = PromiseBQP

Here FBPP is the class of relation problems solvable in randomized polynomial time—that is, problems where given an input x∈{0,1}n and a parameter ε>0, the goal is to produce any output in a certain set Sx, with success probability at least 1-ε, in time polynomial in n and 1/ε.  FBQP is the same thing except for quantum polynomial time.

The equivalence between the two equalities ApproxSampBPP = ApproxSampBQP and FBPP=FBQP is not obvious, and was the main result in my 2011 paper The Equivalence of Sampling and Searching.  While it’s easy to see that ApproxSampBPP = ApproxSampBQP implies FBPP=FBQP, the opposite direction requires us to take an arbitrary sampling problem S, and define a relation problem RS that has “essentially the same difficulty” as S (in the sense that RS has an efficient classical algorithm iff S does, RS has an efficient quantum algorithm iff S does, etc.).  This, in turn, we do using Kolmogorov complexity: basically, RS asks us to output a tuple of samples that have large probabilities according to the requisite probability distribution from the sampling problem; and that also, conditioned on that, are close to algorithmically random.  The key observation is that, if a probabilistic Turing machine of fixed size can solve that relation problem for arbitrarily large inputs, then it must be doing so by sampling from a probability distribution close in variation distance to the target distribution Dx—since any other approach would lead to outputs that were algorithmically compressible.

Be that as it may, staring at the chain of implications above, a natural question is which equalities in the chain collapse the polynomial hierarchy in a relativizing way, and which equalities collapse PH (if they do) only for deeper, non-relativizing reasons.

This is one of the questions that Lijie Chen and I took up, and settled, in our paper Complexity-Theoretic Foundations of Quantum Supremacy Experiments, which was presented at this summer’s Computational Complexity Conference (CCC) in Riga.  The “main” results in our paper—or at least, the results that the physicists care about—were about how confident we can be in the classical hardness of simulating quantum sampling experiments with random circuits, such as the experiments that the Google group will hopefully be able to do with its 49-qubit device in the near future.  This involved coming up with a new hardness assumption, which was tailored to those sorts of experiments, and giving a reduction from that new assumption, and studying how far existing algorithms come toward breaking the new assumption (tl;dr: not very far).

But our paper also had what I think of as a “back end,” containing results mainly of interest to complexity theorists, about what kinds of quantum supremacy theorems we can and can’t hope for in principle.  When I’m giving talks about our paper to physicists, I never have time to get to this back end—it’s always just “blah, blah, we also did some stuff involving structural complexity and oracles.”  But given that a large fraction of all the people on earth who enjoy those things are probably right here in this room, in the rest of this talk, I’d like to tell you about what was in the back end.

The first thing there was the following result.

Theorem 4 (A.-Chen).  There exists an oracle relative to which ApproxSampBPP = ApproxSampBQP and yet PH is infinite. In other words, any proof of the Strong Quantum Supremacy Conjecture will require non-relativizing techniques.

Theorem 4 represents a substantial generalization of Fortnow and Rogers’s Theorem 3, in that it makes quantum and classical equivalent not only for promise problems, but even for approximate sampling problems.  There’s also a sense in which Theorem 4 is the best possible: there are no oracles relative to which ExactSampBPP = ExactSampBQP and yet PH is infinite, because, as we already saw, the proof that ExactSampBPP = ExactSampBQP collapses PH relativizes.

So how did we prove Theorem 4?  Well, we learned at this workshop that Stuart Kurtz pioneered the development of principled ways to prove oracle results just like this one, with multiple “nearly conflicting” requirements.  But, because we didn’t know that at the time, we basically just plunged in and built the oracle we wanted by hand!

In more detail, you can think of our oracle construction as proceeding in three steps.

  1. We throw in an oracle for a PSPACE-complete problem.  This collapses ApproxSampBPP with ApproxSampBQP, which is what we want.  Unfortunately, it also collapses the polynomial hierarchy down to P, which is not what we want!
  2. So then we need to add in a second part of the oracle that makes PH infinite again.  From Håstad’s seminal work in the 1980s until recently, even if we just wanted any oracle that makes PH infinite, without doing anything else at the same time, we only knew how to achieve that with quite special oracles.  But in their 2015 breakthrough, Rossman, Servedio, and Tan have shown that even a random oracle makes PH infinite with probability 1.  So for simplicity, we might as well take this second part of the oracle to be random.  The “only” problem is that, along with making PH infinite, a random oracle will also re-separate ApproxSampBPP and ApproxSampBQP (and for that matter, even ExactSampBPP and ExactSampBQP)—for example, because of the Fourier sampling task performed by the quantum circuit I showed you earlier!  So we once again seem back where we started.
    (To ward off confusion: ever since Fortnow and Rogers posed the problem in 1998, it remains frustratingly open whether BPP and BQP can be separated by a random oracle—that’s a problem that I and others have worked on, making partial progress that makes a query complexity separation look unlikely without definitively ruling one out.  But separating the sampling versions of BPP and BQP by a random oracle is much, much easier.)
  3. So, finally, we need to take the random oracle that makes PH infinite, and “scatter its bits around randomly” in such a way that a PH machine can still find the bits, but an ApproxSampBQP machine can’t.  In other words: given our initial random oracle A, we can make a new oracle B such that B(y,r)=(1,A(y)) if r is equal to a single randomly-chosen “password” ry, depending on the query y, and B(y,r)=(0,0) otherwise.  In that case, it takes just one more existential quantifier to guess the password ry, so PH can do it, but a quantum algorithm is stuck, basically because the linearity of quantum mechanics makes the algorithm not very sensitive to tiny random changes to the oracle string (i.e., the same reason why Grover’s algorithm can’t be arbitrarily sped up).  Incidentally, the reason why the password ry needs to depend on the query y is that otherwise the input x to the quantum algorithm could hardcode a password, and thereby reveal exponentially many bits of the random oracle A.
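A toy version of this “scattered” oracle B, with small made-up stand-ins for the random oracle A and the random passwords, might look like:

```python
import secrets

def make_scattered_oracle(A, query_bits, password_bits):
    """Hide each bit A(y) behind a per-query random password r_y:
    B(y, r) = (1, A(y)) if r equals the password for y, and (0, 0) otherwise.
    A PH machine can existentially guess r_y; a quantum algorithm cannot
    find it much faster than Grover search."""
    passwords = {y: secrets.randbits(password_bits)
                 for y in range(2 ** query_bits)}
    def B(y, r):
        return (1, A(y)) if r == passwords[y] else (0, 0)
    return B, passwords

A = lambda y: y % 2            # toy stand-in for the random oracle A
B, passwords = make_scattered_oracle(A, query_bits=4, password_bits=16)
y = 7
assert B(y, passwords[y]) == (1, A(y))   # the right password reveals A(y)
assert B(y, passwords[y] ^ 1) == (0, 0)  # a wrong guess reveals nothing
```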

We should now check: why does the above oracle “only” collapse ApproxSampBPP and ApproxSampBQP?  Why doesn’t it also collapse ExactSampBPP and ExactSampBQP—as we know that it can’t, by our previous argument?  The answer is: because a quantum algorithm does have an exponentially small probability of correctly guessing a given password ry.  And that’s enough to make the distribution sampled by the quantum algorithm differ, by 1/exp(n) in variation distance, from the distribution sampled by any efficient classical simulation of the algorithm—an error that doesn’t matter for approximate sampling, but does matter for exact sampling.

Anyway, it’s then just like seven pages of formalizing the above intuitions and you’re done!

OK, since there seems to be time, I’d like to tell you about one more result from the back end of my and Lijie’s paper.

If we can work relative to whatever oracle A we like, then it’s easy to get quantum supremacy, and indeed BPPA≠BQPA.  We can, for example, use Simon’s problem, or Shor’s period-finding problem, or Forrelation, or other choices of black-box problems that admit huge, provable quantum speedups.  In the unrelativized world, by contrast, it’s clear that we have to make some complexity assumption for quantum supremacy—even if we just want ExactSampBPP ≠ ExactSampBQP.  For if (say) P=P#P, then ExactSampBPP and ExactSampBQP would collapse as well.

Lijie and I were wondering: what happens if we try to “interpolate” between the relativized and unrelativized worlds?  More specifically, what happens if our algorithms are allowed to query a black box, but we’re promised that whatever’s inside the black box is efficiently computable (i.e., has a small circuit)?  How hard is it to separate BPP from BQP, or ApproxSampBPP from ApproxSampBQP, relative to an oracle A that’s constrained to lie in P/poly?

Here, we’ll start with a beautiful observation that’s implicit in 2004 work by Servedio and Gortler, as well as 2012 work by Mark Zhandry.  In our formulation, this observation is as follows:

Theorem 5.  Suppose there exist cryptographic one-way functions (even just against classical adversaries).  Then there exists an oracle A∈P/poly such that BPPA≠BQPA.

While we still need to make a computational hardness assumption here, to separate quantum from classical computing, the surprise is that the assumption is so much weaker than what we’re used to.  We don’t need to assume the hardness of factoring or discrete log—or for that matter, of any “structured” problem that could be a basis for, e.g., public-key cryptography.  Just a one-way function that’s hard to invert, that’s all!

The intuition here is really simple.  Suppose there’s a one-way function; then it’s well-known, by the HILL and GGM Theorems of classical cryptography, that we can bootstrap it to get a cryptographic pseudorandom function family.  This is a family of polynomial-time computable functions fs:{0,1}n→{0,1}n, parameterized by a secret seed s, such that fs can’t be distinguished from a truly random function f by any polynomial-time algorithm that’s given oracle access to the function and that doesn’t know s.  Then, as our efficiently computable oracle A that separates quantum from classical computing, we take an ensemble of functions like

gs,r(x) = fs(x mod r),

where r is an exponentially large integer that serves as a “hidden period,” and s and r are both secrets stored by the oracle that are inaccessible to the algorithm that queries it.
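As a toy illustration, with a hash function as a heuristic (not provably pseudorandom) stand-in for f_s:

```python
import hashlib

def make_oracle(s, r):
    """g_{s,r}(x) = f_s(x mod r), with a hash-based stand-in for f_s.
    The seed s and the period r are secrets held inside the oracle,
    inaccessible to the algorithm that queries g."""
    def f_s(z):
        return hashlib.sha256(s + z.to_bytes(16, "big")).digest()
    def g(x):
        return f_s(x % r)
    return g

s = b"secret seed"
r = 2 ** 40 + 17          # an (illustratively) large hidden period
g = make_oracle(s, r)
assert g(123) == g(123 + r) == g(123 + 5 * r)   # g is r-periodic
assert g(123) != g(124)                          # but not constant
```

Shor’s period-finding algorithm recovers r from query access to g; any classical algorithm that did the same would either work on a truly random f (contradicting classical query lower bounds for period-finding) or distinguish f_s from random (contradicting pseudorandomness).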

The reasoning is now as follows: certainly there’s an efficient quantum algorithm to find r, or to solve some decision problem involving r, which we can use to define a language that’s in BQPA but not in BPPA.  That algorithm is just Shor’s period-finding algorithm!  (Technically, Shor’s algorithm needs certain assumptions on the starting function fs to work—e.g., it couldn’t be a constant function—but if those assumptions aren’t satisfied, then fs wasn’t pseudorandom anyway.)  On the other hand, suppose there were an efficient classical algorithm to find the period r.  In that case, we have a dilemma on our hands: would the classical algorithm still have worked, had we replaced fs by a truly random function?  If so, then the classical algorithm would violate well-known lower bounds on the classical query complexity of period-finding.  But if not, then by working on pseudorandom functions but not on truly random functions, the algorithm would be distinguishing the two—so fs wouldn’t have been a cryptographic pseudorandom function at all, contrary to assumption!

This all caused Lijie and me to wonder whether Theorem 5 could be strengthened even further, so that it wouldn’t use any complexity assumption at all.  In other words, why couldn’t we just prove unconditionally that there’s an oracle A∈P/poly such that BPPA≠BQPA?  By comparison, it’s not hard to see that we can unconditionally construct an oracle A∈P/poly such that PA≠NPA.

Alas, with the following theorem, we were able to explain why BPP vs. BQP (and even ApproxSampBPP vs. ApproxSampBQP) are different from P vs. NP in this respect, and why some computational assumption is still needed to separate quantum from classical, even if we’re working relative to an efficiently computable oracle.

Theorem 6 (A.-Chen).  Suppose that, in the real world, ApproxSampBPP = ApproxSampBQP and NP⊆BPP (granted, these are big assumptions!).  Then ApproxSampBPPA = ApproxSampBQPA for all oracles A∈P/poly.

Taking the contrapositive, this is saying that you can’t separate ApproxSampBPP from ApproxSampBQP relative to an efficiently computable oracle, without separating some complexity classes in the real world.  This contrasts not only with P vs. NP, but even with ExactSampBPP vs. ExactSampBQP, which can be separated unconditionally relative to efficiently computable oracles.

The proof of Theorem 6 is intuitive and appealing.  Not surprisingly, we’re going to heavily exploit the assumptions ApproxSampBPP = ApproxSampBQP and NP⊆BPP.  Let Q be a polynomial-time quantum algorithm that queries an oracle A∈P/poly.  Then we need to simulate Q—and in particular, sample close to the same probability distribution over outputs—using a polynomial-time classical algorithm that queries A.


Let

$$ \sum_{x,w} \alpha_{x,w} \left|x,w\right\rangle $$

be the state of Q immediately before its first query to the oracle A, where x is the input to be submitted to the oracle.  Then our first task is to get a bunch of samples from the probability distribution D={|αx,w|2}x,w, or something close to D in variation distance.  But this is easy to do, using the assumption ApproxSampBPP = ApproxSampBQP.

Let x1,…,xk be our samples from D, marginalized to the x part.  Then next, our classical algorithm queries A on each of x1,…,xk, getting responses A(x1),…,A(xk).  The next step is to search for a function f∈P/poly—or more specifically, a function of whatever fixed polynomial size is relevant—that agrees with A on the sample data, i.e. such that f(xi)=A(xi) for all i∈[k].  This is where we’ll use the assumption NP⊆BPP (together, of course, with the fact that at least one such f exists, namely A itself!), to make the task of finding f efficient.  We’ll also appeal to a fundamental fact about the sample complexity of PAC-learning.  The fact is that, if we find a polynomial-size circuit f that agrees with A on a bunch of sample points drawn independently from a distribution, then f will probably agree with A on most further points drawn from the same distribution as well.

So, OK, we then have a pretty good “mock oracle,” f, that we can substitute for the real oracle on the first query that Q makes.  Of course f and A won’t perfectly agree, but the small fraction of disagreements won’t matter much, again because of the linearity of quantum mechanics (i.e., the same thing that prevents us from speeding up Grover’s algorithm arbitrarily).  So we can basically simulate Q’s first query, and now our classical simulation is good to go until Q’s second query!  But now you can see where this is going: we iterate the same approach, and reuse the same assumptions ApproxSampBPP = ApproxSampBQP and NP⊆BPP, to find a new “mock oracle” that lets us simulate Q’s second query, and so on until all of Q’s queries have been simulated.
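A toy version of the learning step, with parity functions standing in for small circuits and brute-force search standing in for the NP oracle, might look like:

```python
import random

random.seed(1)
n = 8
target_mask = 0b10110001                 # the secret behind the "oracle" A
A = lambda x: bin(x & target_mask).count("1") % 2

# Draw samples from the query distribution and record the oracle's answers.
data = [(x, A(x)) for x in (random.randrange(2 ** n) for _ in range(40))]

# Find ANY hypothesis in the class that agrees with A on the sample data
# (in the proof, this search is what the NP-in-BPP assumption makes fast).
for mask in range(2 ** n):
    f = lambda x, m=mask: bin(x & m).count("1") % 2
    if all(f(x) == y for x, y in data):
        break

# PAC-style guarantee: a hypothesis consistent with enough samples agrees
# with A on most further points drawn from the same distribution.
fresh = [random.randrange(2 ** n) for _ in range(1000)]
agreement = sum(f(x) == A(x) for x in fresh) / len(fresh)
assert agreement > 0.95
```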

OK, I’ll stop there.  I don’t have a clever conclusion or anything.  Thank you.

August 31, 2017

Terence TaoDodgson condensation from Schur complementation

The determinant {\det_n(A)} of an {n \times n} matrix (with coefficients in an arbitrary field) obeys many useful identities, starting of course with the fundamental multiplicativity {\det_n(AB) = \det_n(A) \det_n(B)} for {n \times n} matrices {A,B}. This multiplicativity can in turn be used to establish many further identities; in particular, as shown in this previous post, it implies the Schur determinant identity

\displaystyle  \det_{n+k}\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det_n(A) \det_k( D - C A^{-1} B ) \ \ \ \ \ (1)

whenever {A} is an invertible {n \times n} matrix, {B} is an {n \times k} matrix, {C} is a {k \times n} matrix, and {D} is a {k \times k} matrix. The matrix {D - CA^{-1} B} is known as the Schur complement of the block {A}.
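The identity (1) is easy to check numerically; a quick sketch with random real blocks (which are invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, k))
C = rng.normal(size=(k, n))
D = rng.normal(size=(k, k))

M = np.block([[A, B], [C, D]])           # the (n+k) x (n+k) block matrix
schur = D - C @ np.linalg.inv(A) @ B     # the Schur complement of the block A
# det(M) = det(A) * det(D - C A^{-1} B)
assert np.isclose(np.linalg.det(M),
                  np.linalg.det(A) * np.linalg.det(schur))
```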

I only recently discovered that this identity in turn immediately implies what I always found to be a somewhat curious identity, namely the Dodgson condensation identity (also known as the Desnanot-Jacobi identity)

\displaystyle  \det_n(M) \det_{n-2}(M^{1,n}_{1,n}) = \det_{n-1}( M^1_1 ) \det_{n-1}(M^n_n)

\displaystyle - \det_{n-1}(M^1_n) \det_{n-1}(M^n_1)

for any {n \geq 3} and {n \times n} matrix {M}, where {M^i_j} denotes the {n-1 \times n-1} matrix formed from {M} by removing the {i^{th}} row and {j^{th}} column, and similarly {M^{i,i'}_{j,j'}} denotes the {n-2 \times n-2} matrix formed from {M} by removing the {i^{th}} and {(i')^{th}} rows and {j^{th}} and {(j')^{th}} columns. Thus for instance when {n=3} we obtain

\displaystyle  \det_3 \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \cdot e

\displaystyle  = \det_2 \begin{pmatrix} e & f \\ h & i \end{pmatrix} \cdot \det_2 \begin{pmatrix} a & b \\ d & e \end{pmatrix}

\displaystyle  - \det_2 \begin{pmatrix} b & c \\ e & f \end{pmatrix} \cdot \det_2 \begin{pmatrix} d & e \\ g & h \end{pmatrix}

for any scalars {a,b,c,d,e,f,g,h,i}. (Charles Dodgson, better known by his pen name Lewis Carroll, is of course also known for writing “Alice in Wonderland” and “Through the Looking Glass”.)
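A quick numerical sanity check of the condensation identity on a random matrix (indices 0-based in the code):

```python
import numpy as np

def minor(M, rows, cols):
    """Delete the given rows and columns (0-indexed) from M."""
    keep_r = [i for i in range(M.shape[0]) if i not in rows]
    keep_c = [j for j in range(M.shape[1]) if j not in cols]
    return M[np.ix_(keep_r, keep_c)]

det = np.linalg.det
rng = np.random.default_rng(2)
n = 5
M = rng.normal(size=(n, n))
first, last = 0, n - 1

# det(M) det(M^{1,n}_{1,n}) = det(M^1_1) det(M^n_n) - det(M^1_n) det(M^n_1)
lhs = det(M) * det(minor(M, [first, last], [first, last]))
rhs = (det(minor(M, [first], [first])) * det(minor(M, [last], [last]))
       - det(minor(M, [first], [last])) * det(minor(M, [last], [first])))
assert np.isclose(lhs, rhs)
```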

The derivation is not new; it is for instance noted explicitly in this paper of Brualdi and Schneider, though I do not know if this is the earliest place in the literature where it can be found. (EDIT: Apoorva Khare has pointed out to me that the original arguments of Dodgson can be interpreted as implicitly following this derivation.) I thought it was worth presenting the short derivation here, though.

Firstly, by swapping the first and {(n-1)^{th}} rows, and similarly for the columns, it is easy to see that the Dodgson condensation identity is equivalent to the variant

\displaystyle  \det_n(M) \det_{n-2}(M^{n-1,n}_{n-1,n}) = \det_{n-1}( M^{n-1}_{n-1} ) \det_{n-1}(M^n_n) \ \ \ \ \ (2)

\displaystyle  - \det_{n-1}(M^{n-1}_n) \det_{n-1}(M^n_{n-1}).

Now write

\displaystyle  M = \begin{pmatrix} A & B_1 & B_2 \\ C_1 & d_{11} & d_{12} \\ C_2 & d_{21} & d_{22} \end{pmatrix}

where {A} is an {n-2 \times n-2} matrix, {B_1, B_2} are {n-2 \times 1} column vectors, {C_1, C_2} are {1 \times n-2} row vectors, and {d_{11}, d_{12}, d_{21}, d_{22}} are scalars. If {A} is invertible, we may apply the Schur determinant identity repeatedly to conclude that

\displaystyle  \det_n(M) = \det_{n-2}(A) \det_2 \begin{pmatrix} d_{11} - C_1 A^{-1} B_1 & d_{12} - C_1 A^{-1} B_2 \\ d_{21} - C_2 A^{-1} B_1 & d_{22} - C_2 A^{-1} B_2 \end{pmatrix}

\displaystyle  \det_{n-2} (M^{n-1,n}_{n-1,n}) = \det_{n-2}(A)

\displaystyle  \det_{n-1}( M^{n-1}_{n-1} ) = \det_{n-2}(A) (d_{22} - C_2 A^{-1} B_2 )

\displaystyle  \det_{n-1}( M^{n-1}_{n} ) = \det_{n-2}(A) (d_{21} - C_2 A^{-1} B_1 )

\displaystyle  \det_{n-1}( M^{n}_{n-1} ) = \det_{n-2}(A) (d_{12} - C_1 A^{-1} B_2 )

\displaystyle  \det_{n-1}( M^{n}_{n} ) = \det_{n-2}(A) (d_{11} - C_1 A^{-1} B_1 )

and the claim (2) then follows by a brief calculation (and the explicit form {\det_2 \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad-bc} of the {2 \times 2} determinant). To remove the requirement that {A} be invertible, one can use a limiting argument, noting that one can work without loss of generality in an algebraically closed field, and in such a field, the set of invertible matrices is dense in the Zariski topology. (In the case when the scalars are reals or complexes, one can just use density in the ordinary topology instead if desired.)

The same argument gives the more general determinant identity of Sylvester

\displaystyle  \det_n(M) \det_{n-k}(M^S_S)^{k-1} = \det_k \left( \det_{n-k+1}(M^{S \backslash \{i\}}_{S \backslash \{j\}}) \right)_{i,j \in S}

whenever {n > k \geq 1}, {S} is a {k}-element subset of {\{1,\dots,n\}}, and {M^S_{S'}} denotes the matrix formed from {M} by removing the rows associated to {S} and the columns associated to {S'}. (The Dodgson condensation identity is basically the {k=2} case of this identity.)
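Sylvester’s identity can be checked numerically the same way; a sketch with 0-based indices, where minor(M, R, C) deletes the rows in R and the columns in C:

```python
import numpy as np

def minor(M, rows, cols):
    keep_r = [i for i in range(M.shape[0]) if i not in rows]
    keep_c = [j for j in range(M.shape[1]) if j not in cols]
    return M[np.ix_(keep_r, keep_c)]

det = np.linalg.det
rng = np.random.default_rng(3)
n, k = 5, 3
M = rng.normal(size=(n, n))
S = [0, 2, 4]                           # a k-element subset of the indices

# det(M) det(M^S_S)^{k-1} = det_k( det(M^{S\{i}}_{S\{j}}) )_{i,j in S}
lhs = det(M) * det(minor(M, S, S)) ** (k - 1)
inner = [[det(minor(M, [a for a in S if a != i], [b for b in S if b != j]))
          for j in S] for i in S]
rhs = det(np.array(inner))
assert np.isclose(lhs, rhs)
```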

A closely related proof of (2) proceeds by elementary row and column operations. Observe that if one adds some multiple of one of the first {n-2} rows of {M} to one of the last two rows of {M}, then the left and right sides of (2) do not change. If the minor {A} is invertible, this allows one to reduce to the case where the components {C_1,C_2} of the matrix vanish. Similarly, using elementary column operations instead of row operations we may assume that {B_1,B_2} vanish. All matrices involved are now block-diagonal and the identity follows from a routine computation.

The latter approach can also prove the cute identity

\displaystyle  \det_2 \begin{pmatrix} \det_n( X_1, Y_1, A ) & \det_n( X_1, Y_2, A ) \\ \det_n(X_2, Y_1, A) & \det_n(X_2,Y_2, A) \end{pmatrix} = \det_n( X_1,X_2,A) \det_n(Y_1,Y_2,A)

for any {n \geq 2}, any {n \times 1} column vectors {X_1,X_2,Y_1,Y_2}, and any {n \times n-2} matrix {A}, which can for instance be found on page 7 of this text of Karlin. Observe that both sides of this identity are unchanged if one adds some multiple of any column of {A} to one of {X_1,X_2,Y_1,Y_2}; for generic {A}, this allows one to reduce {X_1,X_2,Y_1,Y_2} to have only the first two entries allowed to be non-zero, at which point the determinants split into {2 \times 2} and {n-2 \times n-2} determinants and we can reduce to the {n=2} case (eliminating the role of {A}). One can now either proceed by a direct computation, or by observing that the left-hand side is quadrilinear in {X_1,X_2,Y_1,Y_2} and antisymmetric in {X_1,X_2} and {Y_1,Y_2}, which forces it to be a scalar multiple of {\det_2(X_1,X_2) \det_2(Y_1,Y_2)}, at which point one can test the identity at a single point (e.g. {X_1=Y_1 = e_1} and {X_2=Y_2=e_2} for the standard basis {e_1,e_2}) to conclude the argument. (One can also derive this identity from the Sylvester determinant identity but I think the calculations are a little messier if one goes by that route. Conversely, one can recover the Dodgson condensation identity from Karlin’s identity by setting {X_1=e_1}, {X_2=e_2} (for instance) and then permuting some rows and columns.)
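Karlin’s identity is likewise easy to test numerically; here det_n(X, Y, A) is implemented as the determinant of the n×n matrix whose columns are X, Y, and the columns of A:

```python
import numpy as np

det = np.linalg.det
rng = np.random.default_rng(4)
n = 4
X1, X2, Y1, Y2 = (rng.normal(size=(n, 1)) for _ in range(4))
A = rng.normal(size=(n, n - 2))

def d(u, v):
    """det of the n x n matrix with columns u, v, and the columns of A."""
    return det(np.hstack([u, v, A]))

# det_2 of the 2x2 matrix of bordered determinants, expanded out:
lhs = d(X1, Y1) * d(X2, Y2) - d(X1, Y2) * d(X2, Y1)
rhs = d(X1, X2) * d(Y1, Y2)
assert np.isclose(lhs, rhs)
```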

Filed under: expository, math.RA Tagged: Dodgson condensation, matrix identities, Schur complement

August 30, 2017

BackreactionThe annotated math of (almost) everything

Have you heard of the principle of least action? It’s the most important idea in physics, and it underlies everything. According to this principle, our reality is optimal in a mathematically exact way: it minimizes a function called the “action.” The universe that we find ourselves in is the one for which the action takes on the smallest value. In quantum mechanics, reality isn’t quite that

August 29, 2017

John PreskillDecoding (the allure of) the apparent horizon

I took 32 hours to unravel why Netta Engelhardt’s talk had struck me.

We were participating in Quantum Information in Quantum Gravity III, a workshop hosted by the University of British Columbia (UBC) in Vancouver. Netta studies quantum gravity as a Princeton postdoc. She discussed a feature of black holes—an apparent horizon—I’d not heard of. After hearing of it, I had to grasp it. I peppered Netta with questions three times in the following day. I didn’t understand why, for 32 hours.

After 26 hours, I understood apparent horizons like so.

Imagine standing beside a glass sphere, an empty round shell. Imagine light radiating from a point source in the sphere’s center. Think of the point source as a minuscule flash light. Light rays spill from the point source.

Which paths do the rays follow through space? They fan outward from the sphere’s center, hit the glass, and fan out more. Imagine turning your back to the sphere and looking outward. Light rays diverge as they pass you.

At least, rays diverge in flat space-time. We live in nearly flat space-time. We wouldn’t if we neighbored a supermassive object, like a black hole. Mass curves space-time, as described by Einstein’s theory of general relativity.

Sphere 2

Imagine standing beside the sphere near a black hole. Let the sphere have roughly the black hole’s diameter—around 10 kilometers, according to astrophysical observations. You can’t see much of the sphere. So—imagine—you recruit your high-school-physics classmates. You array yourselves around the sphere, planning to observe light and compare observations. Imagine turning your back to the sphere. Light rays would converge, or flow toward each other. You’d know yourself to be far from Kansas.

Picture you, your classmates, and the sphere falling into the black hole. When would everyone agree that the rays switch from diverging to converging? Sometime after you passed the event horizon, the point of no return.1 Before you reached the singularity, the black hole’s center, where space-time warps infinitely. The rays would switch when you reached an in-between region, the apparent horizon.

Imagine pausing at the apparent horizon with your sphere, facing away from the sphere. Light rays would neither diverge nor converge; they’d point straight. Continue toward the singularity, and the rays would converge. Reverse away from the singularity, and the rays would diverge.


UBC near twilight

Rays diverged from the horizon beyond UBC at twilight. Twilight suits UBC as marble suits the Parthenon; and UBC’s twilight suits musing. You can reflect while gazing on reflections in glass buildings, or reflections in a pool by a rose garden. Your mind can roam as you roam paths lined by elms, oaks, and willows. I wandered while wondering why the sphere intrigued me.

Science thrives on instrumentation. Galileo improved the telescope, which unveiled Jupiter’s moons. Alexander von Humboldt measured temperatures and pressures with thermometers and barometers, charting South America at the turn of the 19th century. The Large Hadron Collider revealed the Higgs particle’s mass in 2012.

The sphere reminded me of a thermometer. As thermometers register temperature, so does the sphere register space-time curvature. Not that you’d need a sphere to distinguish a black hole from Kansas. Nor do you need a thermometer to distinguish Vancouver from a Brazilian jungle. But thermometers quantify the distinction. A sphere would sharpen your observations’ precision.

A sphere and a light source—free of supercolliders, superconductors, and superfridges. The instrument boasts not only profundity, but also simplicity.


Alexander von Humboldt

Netta proved a profound theorem about apparent horizons, with coauthor Aron Wall. Jacob Bekenstein and Stephen Hawking had studied event horizons during the 1970s. An event horizon’s area, Bekenstein and Hawking showed, is proportional to the black hole’s thermodynamic entropy. Netta and Aron proved a proportionality between another area and another entropy.

They calculated an apparent horizon’s area, A. The math that represents their black hole also represents a quantum system, by a duality called AdS/CFT. The quantum system can occupy any of several states. Different states encode different information about the black hole. Consider the information needed to describe, fully and only, the region outside the apparent horizon. Some quantum state \rho encodes this information. \rho encodes no information about the region behind the apparent horizon, closer to the black hole. How would you quantify this lack of information? With the von Neumann entropy S(\rho). This entropy is proportional to the apparent horizon’s area: S(\rho) \propto A.
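The von Neumann entropy can be computed directly from a density matrix’s eigenvalues. Here is a minimal numerical sketch (a generic recipe for illustration, not the calculation in Netta and Aron’s paper):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho), computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop zeros: 0 * log 0 = 0 by convention
    return float(-np.sum(evals * np.log(evals)))

# A pure state encodes complete information about its system: zero entropy.
rho_pure = np.array([[1.0, 0.0],
                     [0.0, 0.0]])

# The maximally mixed qubit encodes none: entropy log 2.
rho_mixed = np.eye(2) / 2
```

The entropy is zero exactly when \rho encodes everything there is to know, and grows as information is missing, which is why it serves as the quantifier above.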

Netta and Aron entitled their paper “Decoding the apparent horizon.” Decoding the apparent horizon’s allure took me 32 hours and took me to an edge of campus. But I didn’t mind. Edges and horizons suited my visit as twilight suits UBC. Where can we learn, if not at edges, as where quantum information meets other fields?


With gratitude to Mark van Raamsdonk and UBC for hosting Quantum Information in Quantum Gravity III; to Mark, the other organizers, and the “It from Qubit” Simons Foundation collaboration for the opportunity to participate; and to Netta Engelhardt for sharing her expertise.

1Nothing that draws closer to a black hole than the event horizon can turn around and leave, according to general relativity. The black hole’s gravity pulls too strongly. Quantum mechanics implies that information leaves, though, in Hawking radiation.

August 27, 2017

Scott AaronsonHTTPS / Kurtz / eclipse / Charlottesville / Blum / P vs. NP

This post has a grab bag of topics, unified only by the fact that I can no longer put off blogging about them. So if something doesn’t interest you, just scroll down till you find something that does.

Great news, everyone: following a few reader complaints about the matter, the domain now supports https—and even automatically redirects to it! I’m so proud that Shtetl-Optimized has finally entered the technological universe of 1994. Thanks so much to heroic reader Martin Dehnel-Wild for setting this up for me.

Update 26/08/2017: Comments should now be working again; comments are now coming through to the moderated view in the blog’s control panel, so if they don’t show up immediately it might just be awaiting moderation. Thanks for your patience.

Last weekend, I was in Columbia, South Carolina, for a workshop to honor the 60th birthday of Stuart Kurtz, theoretical computer scientist at the University of Chicago.  I gave a talk about how work Kurtz was involved in from the 1990s—for example, on defining the complexity class GapP, and constructing oracles that satisfy conflicting requirements simultaneously—plays a major role in modern research on quantum computational supremacy: as an example, my recent paper with Lijie Chen.  (Except, what a terrible week to be discussing the paths to supremacy!  I promise there are no tiki torches involved, only much weaker photon sources.)

Coincidentally, I don’t know if you read anything about this on social media, but there was this total solar eclipse that passed right over Columbia at the end of the conference.

I’d always wondered why some people travel to remote corners of the earth to catch these.  So the sky gets dark for two minutes, and then it gets light again, in a way that’s been completely understood and predictable for centuries?

Having seen it, I can now tell you the deal, if you missed it and prefer to read about it here rather than in 10^500 other places online.  At risk of stating the obvious: it’s not the dark sky; it’s the sun’s corona visible around the moon.  Ironically, it’s only when the sun’s blotted out that you can actually look at the sun, at all the weird stuff going on around its disk.

OK, but totality is “only” to eclipses as orgasms are to sex.  There’s also the whole social experience of standing around outside with friends for an hour as the moon gradually takes a bigger bite out of the sun, staring up from time to time with eclipse-glasses to check its progress—and then everyone breaking into applause as the sky finally goes mostly dark, and you can look at the corona with the naked eye.  And then, if you like, standing around for another hour as the moon gradually exits the other way.  (If you’re outside the path of totality, this standing around and checking with eclipse-glasses is the whole experience.)

One cool thing is that, a little before and after totality, shadows on the ground have little crescents in them, as if the eclipse is imprinting its “logo” all over the earth.

For me, the biggest lesson the eclipse drove home was the logarithmic nature of perceived brightness (see also Scott Alexander’s story).  Like, the sun can be more than 90% occluded, and yet it’s barely a shade darker outside.  And you can still only look up with glasses so dark that they blot out everything except the sliver of sun, which still looks pretty much like the normal sun if you catch it out of the corner of your unaided eye.  Only during totality, and a few minutes before and after, is the darkening obvious.
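That logarithmic response can be made rough-and-ready quantitative with the astronomers’ magnitude scale (a sketch assuming a Weber–Fechner-style logarithmic model of perceived brightness; the numbers are illustrative):

```python
import math

def magnitude_drop(visible_fraction):
    """Dimming in astronomical magnitudes when only a fraction of the
    sun's light remains: 5 magnitudes corresponds to a factor of 100 in flux."""
    return -2.5 * math.log10(visible_fraction)

# 90% occlusion leaves 10% of the light: only 2.5 magnitudes of dimming,
# small next to the roughly 10 magnitudes separating full daylight from twilight,
# which is why the sky barely looks darker until just before totality.
drop_at_90_percent = magnitude_drop(0.10)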

Another topic at the workshop, unsurprisingly, was the ongoing darkening of the United States.  If it wasn’t obvious from my blog’s name, and if saying so explicitly will make any difference for anything, let the record state:

Shtetl-Optimized condemns Nazis, as well as anyone who knowingly marches with Nazis or defends them as “fine people.”

For a year, this blog has consistently described the now-president as a thug, liar, traitor, bully, sexual predator, madman, racist, and fraud, and has urged decent people everywhere to fight him by every peaceful and legal means available.  But if there’s some form of condemnation that I accidentally missed, then after Charlottesville, and Trump’s unhinged quasi-defenses of violent neo-Nazis, and defenses of his previous defenses, etc.—please consider Shtetl-Optimized to have condemned Trump that way also.

At least Charlottesville seems to have set local decisionmakers on an unstoppable course toward removing the country’s remaining Confederate statues—something I strongly supported back in May, before it had become the fully thermonuclear issue that it is now.  In an overnight operation, UT Austin has taken down its statues of Robert E. Lee, Albert Johnston, John Reagan, and Stephen Hogg.  (I confess, the postmaster general of the Confederacy wouldn’t have been my #1 priority for removal.  And, genuine question: what did Texas governor Stephen Hogg do that was so awful for his time, besides naming his daughter Ima Hogg?)

A final thing to talk about—yeah, we can’t avoid it—is Norbert Blum’s claimed proof of P≠NP.  I suppose I should be gratified that, after my last post, there were commenters who said, “OK, but enough about gender politics—what about P vs. NP?”  Here’s what I wrote on Tuesday the 15th:

To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.

Many people misunderstood me to be saying that I’d again bet $200,000, even though the sentence said the exact opposite.  Maybe I should’ve said: I’m searching in vain for the right way to teach the nerd world to get less excited about these claims, to have the same reaction that the experts do, which is ‘oh boy, not another one’—which doesn’t mean that you know the error, or even that there is an error, but just means that you know the history.

Speaking of which, some friends and I recently had an awesome idea.  Just today, I registered the domain name  I’d like to set this up with a form that lets you type in the URL of a paper claiming to resolve the P vs. NP problem.  The site will then take 30 seconds or so to process the paper—with a status bar, progress updates, etc.—before finally rendering a verdict about the paper’s correctness.  Do any readers volunteer to help me create this?  Don’t worry, I’ll supply the secret algorithm to decide correctness, and will personally vouch for that algorithm for as long as the site remains live.

I have nothing bad to say about Norbert Blum, who made important contributions including the 3n circuit size lower bound for an explicit Boolean function—something that stood until very recently as the world record—and whose P≠NP paper was lucidly written, passing many of the most obvious checks.  And I received a bit of criticism for my “dismissive” stance.  Apparently, some right-wing former string theorist who I no longer read, whose name rhymes with Mubos Lotl, even accused me of being a conformist left-wing ideologue, driven to ignore Blum’s proof by an irrational conviction that any P≠NP proof will necessarily be so difficult that it will need to “await the Second Coming of Christ.”  Luca Trevisan’s reaction to that is worth quoting:

I agree with [Mubos Lotl] that the second coming of Jesus Christ is not a necessary condition for a correct proof that P is different from NP. I am keeping an open mind as to whether it is a sufficient condition.

On reflection, though, Mubos has a point: all of us, including me, should keep an open mind.  Maybe P≠NP (or P=NP!) is vastly easier to prove than most experts think, and is susceptible to a “fool’s mate.”

That being the case, it’s only intellectual honesty that compels me to report that, by about Friday of last week—i.e., exactly on my predicted schedule—a clear consensus had developed among experts that Blum’s P≠NP proof was irreparably flawed, and the consensus has stood since that time.

I’ve often wished that, even just for an hour or two, I could be free from this terrifying burden that I’ve carried around since childhood: the burden of having the right instincts about virtually everything.  Trust me, this “gift” is a lot less useful than it sounds, especially when reality so often contradicts what’s popular or expedient to say.

The background to Blum’s attempt, the counterexample that shows the proof has to fail somewhere, and the specifics of what appears to go wrong have already been covered at length elsewhere: see especially Luca’s post, Dick Lipton’s post, John Baez’s post, and the CS Theory StackExchange thread.

Very briefly, though: Blum claims to generalize some of the most celebrated complexity results of the 1980s—namely, superpolynomial lower bounds on the sizes of monotone circuits, which consist entirely of Boolean AND and OR gates—so that they also work for general (non-monotone) circuits, consisting of AND, OR, and NOT gates.  Everyone agrees that, if this succeeded, it would imply P≠NP.

Alas, another big discovery from the 1980s was that there are monotone Boolean functions (like Perfect Matching) that require superpolynomial-size monotone circuits, even though they have polynomial-size non-monotone circuits.  Why is that such a bummer?  Because it means our techniques for proving monotone circuit lower bounds can’t possibly work in as much generality as one might’ve naïvely hoped: if they did, they’d imply not merely that P doesn’t contain NP, but also that P doesn’t contain itself.

Blum was aware of all this, and gave arguments as to why his approach evades the Matching counterexample.  The trouble is, there’s another counterexample, which Blum doesn’t address, called Tardos’s function.  This is a weird creature: it’s obtained by starting with a graph invariant called the Lovász theta function, then looking at a polynomial-time approximation scheme for the theta function, and finally rounding the output of that PTAS to get a monotone function.  But whatever: in constructing this function, Tardos achieved her goal, which was to produce a monotone function that all known lower bound techniques for monotone circuits work perfectly fine for, but which is nevertheless in P (i.e., has polynomial-size non-monotone circuits).  In particular, if Blum’s proof worked, then it would also work for Tardos’s function, and that gives us a contradiction.

Of course, this merely tells us that Blum’s proof must have one or more mistakes; it doesn’t pinpoint where they are.  But the latter question has now been addressed as well.  On CS StackExchange, an anonymous commenter who goes variously by “idolvon” and “vloodin” provides a detailed analysis of the proof of Blum’s crucial Theorem 6.  I haven’t gone through every step myself, and there might be more to say about the matter than “vloodin” has, but several experts who are at once smarter, more knowledgeable, more cautious, and more publicity-shy than me have confirmed for me that vloodin correctly identified the erroneous region.

To those who wonder what gave me the confidence to call this immediately, without working through the details: besides the Cassandra-like burden that I was born with, I can explain something that might be helpful.  When Razborov achieved his superpolynomial monotone lower bounds in the 1980s, there was a brief surge of excitement: how far away could a P≠NP proof possibly be?  But then people, including Razborov himself, understood much more deeply what was going on—an understanding that was reflected in the theorems they proved, but also wasn’t completely captured by those theorems.

What was going on was this: monotone circuits are an interesting and nontrivial computational model.  Indeed for certain Boolean functions, such as the “slice functions,” they’re every bit as powerful as general circuits.  However, insofar as it’s possible to prove superpolynomial lower bounds on monotone circuit size, it’s possible only because monotone circuits are ridiculously less expressive than general Boolean circuits for the problems in question.  E.g., it’s possible only because monotone circuits aren’t expressing pseudorandom functions, and therefore aren’t engaging the natural proofs barrier or most of the other terrifying beasts that we’re up against.

So what can we say about the prospect that a minor tweak to the monotone circuit lower bound techniques from the 1980s would yield P≠NP?  If, like Mubos Lotl, you took the view that discrete math and theory of computation are just a mess of disconnected, random statements, then such a prospect would seem as likely to you as not.  But if you’re armed with the understanding above, then this possibility is a lot like the possibility that the OPERA experiment discovered superluminal neutrinos: no, not a logical impossibility, but something that’s safe to bet against at 10,000:1 odds.

During the discussion of Deolalikar’s earlier P≠NP claim, I once compared betting against a proof that all sorts of people are calling “formidable,” “solid,” etc., to standing in front of a huge pendulum—behind the furthest point that it reached the last time—even as it swings toward your face.  Just as certain physics teachers stake their lives on the conservation of energy, so I’m willing to stake my academic reputation, again and again, on the conservation of circuit-lower-bound difficulty.  And here I am, alive to tell the tale.

Jordan EllenbergRoad trip to totality

My kids both wanted to see the eclipse and I said “that sounds fun but it’s too far” and I kept thinking about it and thinking about it and finally, Saturday night, I looked inward and asked myself is there really a reason we can’t do this? And the answer was no.  Or rather the answer was “it might be the case that it’s totally impossible to find a place to sleep in the totality zone within 24 hours for a non-insane amount of money, and that would be a reason” so I said, if I can get a room, we’re going.  Hotel Tonight did the rest.  (Not the first time this last-minute hotel app has saved my bacon, by the way.  I don’t use it a lot, but when I need it, it gets the job done.)

Notes on the trip:

  • We got to St. Louis Sunday night; the only sight still open was my favorite one, the Gateway Arch.  The arch is one of those things whose size and physical strangeness a photo really doesn’t capture, like Mt. Rushmore.  It works for me in the same way a Richard Serra sculpture works; it cuts the sky up in a way that doesn’t quite make sense.
  • I thought I was doing this to be a good dad, but in fact the total eclipse was more spectacular than I’d imagined, worth it in its own right.  From the photos I imagined the whole sky going nighttime dark.  But no, it’s more like twilight. That makes it better.  A dark blue sky with a flaming hole in it.
  • Underrated aspect:  the communality of it all.  An experience now rare in everyday life.  You’re in a field with thousands of other people there for the same reason as you, watching the same thing you’re watching.  Like a baseball game!  No radio call can compare with the feeling of jumping up with the crowd for a home run.  You’re just one in an array of sensors, all focused on a sphere briefly suspended in the sky.
  • People thought it was going to be cloudy.  I never read so many weather blogs as I did Monday morning.  Our Hotel Tonight room was in O’Fallon, MO, right at the edge of the totality.  Our original plan was to meet Patrick LaVictoire in Hermann, west of where we were.  But the weather blogs said south, go south, as far as you can.  That was a problem, because at the end of the day we had to drive back north.  We got as far as Festus.  There were still three hours to totality and we thought it might be smart to drive further, maybe even all the way to southern Illinois.  But a guy outside the Comfort Inn with a telescope, who seemed to know what he was doing, told us not to bother, it was a crapshoot either way and we weren’t any better off there than here.  I always trust a man with a telescope.
  • Google Maps (or the Waze buried within Google Maps) not really adequate to handle the surge of traffic after a one-time event.  Its estimates for how long it would take us to traverse I-55 through southern Illinois were … unduly optimistic.  Google sent us off the highway onto back roads, but here’s the thing — it sent the same suggestion to everyone else, which meant that instead of being in a traffic jam on the interstate we were in a traffic jam on a gravel road in the middle of a cornfield.  When Google says “switch to this road, it’ll save you ten minutes,” does it take into account the effect of its own suggestion, broadcast to thousands of cars in the same jam?  My optimization friends tell me this kind of secondary prediction is really hard.  It would have been much better, in retrospect, for us to have chosen a back road at random; if everybody injected stochasticity that way, the traffic would have been better-distributed, you have to figure.  Should Google build that stochasticity into its route suggestions?
  • It became clear around Springfield we weren’t going to get home until well after midnight, so we stopped for the night in David Foster Wallace’s hometown, Normal, IL, fitting, considering we did a supposedly fun thing that turned out to be an actual fun thing which we will hardly ever have the chance to, and thus may never, do again.
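The routing point above, that broadcasting one detour to every driver just moves the jam, can be illustrated with a toy congestion model (entirely hypothetical numbers; real traffic assignment and the secondary-prediction problem are far subtler):

```python
import random

random.seed(0)
N_CARS = 1000

def travel_time(cars_on_road):
    """Hypothetical linear congestion model: more cars, slower road."""
    return 10 + 0.1 * cars_on_road

# Everyone obeys the same broadcast suggestion: all cars pile onto one back road.
herd_time = travel_time(N_CARS)

# Each driver instead picks one of two equivalent back roads at random.
loads = [0, 0]
for _ in range(N_CARS):
    loads[random.randrange(2)] += 1
worst_random_time = travel_time(max(loads))

# The randomized crowd splits roughly 50/50, so even the worse-off road
# carries only about half the traffic, and everyone moves faster.
```

This is the sense in which injected stochasticity spreads the load: no coordination is needed, just independent coin flips.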



August 26, 2017

Jordan EllenbergNot to exceed 25%

Supreme Court will hear a math case!

At issue in Murphy v. Smith:  the amount of a judgment that a court can apply to covering attorney’s fees.  Here’s the relevant statute:

Whenever a monetary judgment is awarded in an action described in paragraph (1), a portion of the judgment (not to exceed 25 percent) shall be applied to satisfy the amount of attorney’s fees awarded against the defendant.

To be clear: there are two amounts of money here.  The first is the amount of attorney’s fees awarded against the defendant; the second is the portion of the judgment which the court applies towards that first amount.  This case concerns the discretion of the court to decide on the second number.

In Murphy’s case, the court decided to apply just 10% of the judgment to attorney’s fees.  Other circuit courts have licensed this practice, interpreting the law to allow the court discretion to apply any portion between 0 and 25% of the judgment to attorney’s fees.  The 7th circuit disagreed, saying that, given that the amount of attorney’s fees awarded exceeded 25% of the judgment, the court was obligated to apply the full 25% maximum.

The cert petition to the Supreme Court hammers this view, which it calls “non-literal”:

The Seventh Circuit is simply wrong in interpreting this language to mean “exactly 25 percent.” “Statutory interpretation, as we always say, begins with the text.” Ross v. Blake, 136 S. Ct. 1850, 1856 (2016). Here, the text is so clear that interpretation should end with the text as well. “Not to exceed” does not mean “exactly.”

This seems pretty clearly correct:  “not to exceed 25%” means what it means, not “exactly 25%.”  So the 7th circuit just blew it, right?

Nope!  The 7th circuit is right, the other circuits and the cert are wrong, and the Supreme Court should affirm.  At least that’s what I say.  Here’s why.

I can imagine at least three interpretations of the statute.

  1.  The court has to apply exactly 25% of the judgment to attorney’s fees.
  2.  The court has to apply the smaller of the following numbers:  the total amount awarded in attorney’s fees, or 25% of the judgment.
  3.  The court has full discretion to apply any nonnegative amount of the judgment to attorney’s fees.

Cert holds that 3 is correct, that the 7th circuit applied 1, and that 1 is absurdly wrong.  In fact, the 7th circuit applied 2, which is correct, and 1 and 3 are both wrong.

1 is wrong:  1 is wrong for two reasons.  One is pointed out by the cert petition:  “Not to exceed 25%” doesn’t mean “Exactly 25%.”  Another reason is that “Exactly 25%” might be more than the amount awarded in attorney’s fees, in which case it would be ridiculous to apply more money than was actually owed.

7th circuit applied 2, not 1:  The opinion reads:

In Johnson v. Daley, 339 F.3d 582, 585 (7th Cir. 2003) (en banc), we explained that § 1997e(d)(2) required that “attorneys’ compensation come[] first from the damages.” “[O]nly  if 25% of the award is inadequate to compensate counsel fully” does the defendant contribute more to the fees. Id. We continue to believe that is the most natural reading of the statutory text. We do not think the statute contemplated a discretionary decision by the district court. The statute neither uses discretionary language nor provides any guidance for such discretion.

The attorney’s compensation comes first out of the damages, but if that compensation is less than 25% of the damages, then less than 25% of the damages will be applied.  This is interpretation 2.  In the case at hand, 25% of the damages was $76,933.46, while the attorney’s fees awarded were $108,446.54.  So, in this case, the results of applying 1 and 2 are the same; but the court’s interpretation is clearly 2, not the absurd 1.
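Interpretation 2 is just the minimum of two dollar amounts. A quick sketch with the figures from the case (the judgment total is back-computed here from the 25% figure quoted above):

```python
def portion_applied(judgment, fees_awarded):
    """Interpretation 2: apply the smaller of the fee award
    and 25% of the judgment toward attorney's fees."""
    return min(fees_awarded, 0.25 * judgment)

# Murphy v. Smith: 25% of the judgment was $76,933.46, so the judgment
# was $307,733.84; the attorney's fees awarded were $108,446.54.
judgment = 307_733.84
fees = 108_446.54
applied = portion_applied(judgment, fees)            # capped at 25% of the judgment

# A smaller fee award fits under the 25% cap and is satisfied in full.
applied_small = portion_applied(judgment, 60_000.00)
```

Because the fee award here exceeds the cap, interpretations 1 and 2 happen to coincide; only a fee award below 25% of the judgment separates them.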

3 is wrong:  Interpretation 3 is on first glance appealing.  Why shouldn’t “a portion of the judgment (not to exceed 25%)” mean any portion satisfying that inequality?  The reason comes later in the statute; that portion is required to “satisfy the amount of attorney’s fees awarded against the defendant.”  To “satisfy” a claim is to pay it in full, not in part.  Circuits that have adopted interpretation 3, as the 8th did in Boesing v. Spiess, are adopting a reading at least as non-literal as the one cert accuses the 7th of.

Of course, in cases like Murphy v. Smith, the two clauses are in conflict:  25% of the judgment is insufficient to satisfy the amount awarded.  In this case, one requirement must bend.  Under interpretation 2, when the two clauses are in conflict, “satisfy” is the one to give way.  The 7th circuit recognizes this, correctly describing the 25% awarded as “toward satisfying the attorney fee the court awarded,” not “satisfying” it.

Under interpretation 3, on the other hand, the requirement to “satisfy” has no force even when it is not in conflict with the first clause.  In other words, they interpret the law as if the word “satisfy” were absent, and the clause read “shall be applied to the amount of attorney’s fees.”

Suppose the attorney’s fees awarded in Murphy had been $60,000.  Under interpretation 3, the court would be free to ignore the requirement to satisfy entirely, and apply only 10% of the judgment to the attorneys, despite the fact that satisfaction was achievable within the statutory 25% limit.

Even worse:  imagine that the statute didn’t have the parenthetical, and said just

Whenever a monetary judgment is awarded in an action described in paragraph (1), a portion of the judgment shall be applied to satisfy the amount of attorney’s fees awarded against the defendant.

It would be crystal clear that the court was required to apply $60,000, the amount necessary to satisfy the award.  On interpretation 3, the further constraint imposed by the statute gives the court more discretion rather than less in a case like this one!  This can’t be right.

You could imagine switching to an interpretation 3′, in which the court is required to satisfy the amount awarded if it can do so without breaking the 25% limit, but is otherwise totally unconstrained.  Under this theory, an increase in award from $60,000 to $100,000 lessens the amount the court is required to contribute — indeed, lessens it to essentially zero.  This also can’t be right.


2 is right:  When two clauses of a statute can’t simultaneously be satisfied, the court’s job is to find some balance which satisfies each requirement to the greatest extent possible in a range of possible cases.  Interpretation 2 seems the most reasonable choice.  The Supreme Court should recognize that, contra the cert petition, this is the interpretation actually adopted by the 7th Circuit, and should go along with it.



Terence TaoAn addendum to “amplification, arbitrage, and the tensor power trick”

In one of the earliest posts on this blog, I talked about the ability to “arbitrage” a disparity of symmetry in an inequality, and in particular to “amplify” such an inequality into a stronger one. (The principle can apply to other mathematical statements than inequalities, with the “hypothesis” and “conclusion” of that statement generally playing the role of the “right-hand side” and “left-hand side” of an inequality, but for sake of discussion I will restrict attention here to inequalities.) One can formalise this principle as follows. Many inequalities in analysis can be expressed in the form

\displaystyle A(f) \leq B(f) \ \ \ \ \ (1)

for all {f} in some space {X} (in many cases {X} will be a function space, and {f} a function in that space), where {A(f)} and {B(f)} are some functionals of {f} (that is to say, real-valued functions of {f}). For instance, {B(f)} might be some function space norm of {f} (e.g. an {L^p} norm), and {A(f)} might be some function space norm of some transform of {f}. In addition, we assume we have some group {G} of symmetries {T: X \rightarrow X} acting on the underlying space. For instance, if {X} is a space of functions on some spatial domain, the group might consist of translations (e.g. {Tf(x) = f(x-h)} for some shift {h}), or perhaps dilations with some normalisation (e.g. {Tf(x) = \frac{1}{\lambda^\alpha} f(\frac{x}{\lambda})} for some dilation factor {\lambda > 0} and some normalisation exponent {\alpha \in {\bf R}}, which can be thought of as the dimensionality of length one is assigning to {f}). If we have

\displaystyle A(Tf) = A(f)

for all symmetries {T \in G} and all {f \in X}, we say that {A} is invariant with respect to the symmetries in {G}; otherwise, it is not.

Suppose we know that the inequality (1) holds for all {f \in X}, but that there is an imbalance of symmetry: either {A} is {G}-invariant and {B} is not, or vice versa. Suppose first that {A} is {G}-invariant and {B} is not. Substituting {f} by {Tf} in (1) and taking infima, we can then amplify (1) to the stronger inequality

\displaystyle A(f) \leq \inf_{T \in G} B(Tf).

In particular, it is often the case that there is a way to send {T} off to infinity in such a way that the functional {B(Tf)} has a limit {B_\infty(f)}, in which case we obtain the amplification

\displaystyle A(f) \leq B_\infty(f) \ \ \ \ \ (2)

of (1). Note that these amplified inequalities will now be {G}-invariant on both sides (assuming that the way in which we take limits as {T \rightarrow \infty} is itself {G}-invariant, which it often is in practice). Similarly, if {B} is {G}-invariant but {A} is not, we may instead amplify (1) to

\displaystyle \sup_{T \in G} A(Tf) \leq B(f)

and in particular (if {A(Tf)} has a limit {A_\infty(f)} as {T \rightarrow \infty})

\displaystyle A_\infty(f) \leq B(f). \ \ \ \ \ (3)

If neither {A(f)} nor {B(f)} has a {G}-symmetry, one can still use the {G}-symmetry by replacing {f} by {Tf} and taking a limit to conclude that

\displaystyle A_\infty(f) \leq B_\infty(f),

though now this inequality is not obviously stronger than the original inequality (1) (for instance it could well be trivial). In some cases one can also average over {G} instead of taking a limit as {T \rightarrow \infty}, thus averaging a non-invariant inequality into an invariant one.

As discussed in the previous post, this use of amplification gives rise to a general principle about inequalities: the most efficient inequalities are those in which the left-hand side and right-hand side enjoy the same symmetries. It is certainly possible to have true inequalities that have an imbalance of symmetry, but as shown above, such inequalities can always be amplified to more efficient and more symmetric inequalities. In the case when limits such as {A_\infty} and {B_\infty} exist, the limiting functionals {A_\infty(f)} and {B_\infty(f)} are often simpler in form, or more tractable analytically, than their non-limiting counterparts {A(f)} and {B(f)} (this is one of the main reasons why we take limits at infinity in the first place!), and so in many applications there is really no reason to use the weaker and more complicated inequality (1), when stronger, simpler, and more symmetric inequalities such as (2), (3) are available. Among other things, this explains why many of the most useful and natural inequalities one sees in analysis are dimensionally consistent.
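A concrete instance of this amplification, in the spirit of the earlier post, uses the scaling symmetry {(f,g) \mapsto (\lambda f, \lambda^{-1} g)} on a real inner product space (sketched here in LaTeX):

```latex
% Start from the elementary bound, a consequence of 0 \le \|f - g\|^2:
%   \langle f, g \rangle \le \tfrac{1}{2}\|f\|^2 + \tfrac{1}{2}\|g\|^2.
% The left-hand side is invariant under (f,g) \mapsto (\lambda f, \lambda^{-1} g)
% for \lambda > 0; the right-hand side is not. Substituting and taking the
% infimum over \lambda amplifies the bound to the Cauchy--Schwarz inequality:
\[
  \langle f, g \rangle
    \;\le\; \inf_{\lambda > 0}
      \left( \frac{\lambda^2}{2} \|f\|^2 + \frac{1}{2\lambda^2} \|g\|^2 \right)
    \;=\; \|f\| \, \|g\|,
\]
% the infimum being attained at \lambda^2 = \|g\| / \|f\| when f, g \neq 0.
```

Note that both sides of the amplified inequality are now invariant under the scaling, in line with the principle just stated.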

One often tries to prove inequalities (1) by directly chaining together simpler inequalities. For instance, one might attempt to prove (1) by first bounding {A(f)} by some auxiliary quantity {C(f)}, and then bounding {C(f)} by {B(f)}, thus obtaining (1) by chaining together two inequalities

\displaystyle A(f) \leq C(f) \leq B(f). \ \ \ \ \ (4)

A variant of the above principle then asserts that when proving inequalities by such direct methods, one should, whenever possible, try to maintain the symmetries that are present in both sides of the inequality. Why? Well, suppose that we ignored this principle and tried to prove (1) by establishing (4) for some {C} that is not {G}-invariant. Assuming for sake of argument that (4) were actually true, we could amplify the first half {A(f) \leq C(f)} of this inequality to conclude that

\displaystyle A(f) \leq \inf_{T \in G} C(Tf)

and also amplify the second half {C(f) \leq B(f)} of the inequality to conclude that

\displaystyle \sup_{T \in G} C(Tf) \leq B(f)

and hence (4) amplifies to

\displaystyle A(f) \leq \inf_{T \in G} C(Tf) \leq \sup_{T \in G} C(Tf) \leq B(f). \ \ \ \ \ (5)

Let’s say for sake of argument that all the quantities involved here are positive numbers (which is often the case in analysis). Then we see in particular that

\displaystyle \frac{\sup_{T \in G} C(Tf)}{\inf_{T \in G} C(Tf)} \leq \frac{B(f)}{A(f)}. \ \ \ \ \ (6)

Informally, (6) asserts that in order for the strategy (4) used to prove (1) to work, the extent to which {C} fails to be {G}-invariant cannot exceed the amount of “room” present in (1). In particular, when dealing with those “extremal” {f} for which the left and right-hand sides of (1) are comparable to each other, one can only have a bounded amount of non-{G}-invariance in the functional {C}. If {C} fails so badly to be {G}-invariant that one does not expect the left-hand side of (6) to be at all bounded in such extremal situations, then the strategy of proving (1) using the intermediate quantity {C} is doomed to failure – even if one has already produced some clever proof of one of the two inequalities {A(f) \leq C(f)} or {C(f) \leq B(f)} needed to make this strategy work. And even if it did work, one could amplify (4) to a simpler inequality

\displaystyle A(f) \leq C_\infty(f) \leq B(f) \ \ \ \ \ (7)

(assuming that the appropriate limit {C_\infty(f) = \lim_{T \rightarrow \infty} C(Tf)} existed) which would likely also be easier to prove (one can take whatever proofs one had in mind of the inequalities in (4), conjugate them by {T}, and take a limit as {T \rightarrow \infty} to extract a proof of (7)).
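To make the abstract amplification scheme concrete, here is a small numerical sketch (an illustration of ours, not taken from the post): starting from the elementary bound {\langle f, g \rangle \leq \frac{1}{2}(\|f\|^2 + \|g\|^2)} and taking the infimum of the right-hand side over the scaling group {(f,g) \mapsto (tf, g/t)} recovers the sharp Cauchy-Schwarz bound {\|f\| \|g\|}. The left-hand side is invariant under the scaling while the right-hand side is not, so amplification strengthens the bound to its sharp, scale-invariant form:

```python
import math, random

# random vectors standing in for f and g
random.seed(0)
f = [random.uniform(-1, 1) for _ in range(5)]
g = [random.uniform(-1, 1) for _ in range(5)]

dot = sum(a * b for a, b in zip(f, g))   # A(f,g) = <f,g>, invariant under scaling
nf = math.sqrt(sum(a * a for a in f))    # ||f||
ng = math.sqrt(sum(b * b for b in g))    # ||g||

B = 0.5 * (nf ** 2 + ng ** 2)            # the unamplified right-hand side
# amplify: take the infimum of B over the scaling group (f, g) -> (t f, g / t)
B_amp = min(0.5 * ((t * nf) ** 2 + (ng / t) ** 2)
            for t in (0.1 + 0.001 * k for k in range(3000)))

print(dot <= B_amp <= B)                 # the amplified bound still holds
print(abs(B_amp - nf * ng) < 1e-3)       # ... and now equals ||f|| ||g|| (sharp)
```

The grid minimum sits near {t = \sqrt{\|g\|/\|f\|}}, where the two terms on the right balance; at that point the right-hand side collapses to {\|f\| \|g\|}, which is Cauchy-Schwarz.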

Here are some simple (but somewhat contrived) examples to illustrate these points. Suppose one wishes to prove the inequality

\displaystyle xy \leq x^2 + y^2 \ \ \ \ \ (8)

for all {x,y>0}. Both sides of this inequality are invariant with respect to interchanging {x} with {y}, so the principle suggests that when proving this inequality directly, one should only use sub-inequalities that are also invariant with respect to this interchange. However, in this particular case there is enough “room” in the inequality that it is possible (though somewhat unnatural) to violate this principle. For instance, one could decide (for whatever reason) to start with the inequality

\displaystyle 0 \leq (x - y/2)^2 = x^2 - xy + y^2/4

to conclude that

\displaystyle xy \leq x^2 + y^2/4

and then use the obvious inequality {x^2 + y^2/4 \leq x^2+y^2} to conclude the proof. Here, the intermediate quantity {x^2 + y^2/4} is not invariant with respect to interchange of {x} and {y}, but the failure is fairly mild (changing {x} and {y} only modifies the quantity {x^2 + y^2/4} by a multiplicative factor of {4} at most), and disappears completely in the most extremal case {x=y}, which helps explain why one could get away with using this quantity in the proof here. But it would be significantly harder (though still not impossible) to use non-symmetric intermediaries to prove the sharp version

\displaystyle xy \leq \frac{x^2 + y^2}{2}

of (8) (that is to say, the arithmetic mean-geometric mean inequality). Try it!
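For contrast (and without spoiling the exercise, which asks for non-symmetric intermediaries), the fully symmetric route to the sharp inequality starts from the interchange-invariant inequality

\displaystyle 0 \leq (x-y)^2 = x^2 - 2xy + y^2

and rearranges directly to {xy \leq \frac{x^2+y^2}{2}}; every quantity appearing in this argument is invariant under swapping {x} and {y}, in accordance with the principle.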

Similarly, consider the task of proving the triangle inequality

\displaystyle |z+w| \leq |z| + |w| \ \ \ \ \ (9)

for complex numbers {z, w}. One could try to leverage the triangle inequality {|x+y| \leq |x| + |y|} for real numbers by using the crude estimate

\displaystyle |z+w| \leq |\hbox{Re}(z+w)| + |\hbox{Im}(z+w)|

and then use the real triangle inequality to obtain

\displaystyle |\hbox{Re}(z+w)| \leq |\hbox{Re}(z)| + |\hbox{Re}(w)|


and

\displaystyle |\hbox{Im}(z+w)| \leq |\hbox{Im}(z)| + |\hbox{Im}(w)|

and then finally use the inequalities

\displaystyle |\hbox{Re}(z)|, |\hbox{Im}(z)| \leq |z| \ \ \ \ \ (10)


and

\displaystyle |\hbox{Re}(w)|, |\hbox{Im}(w)| \leq |w| \ \ \ \ \ (11)

but when one puts this all together at the end of the day, one loses a factor of two:

\displaystyle |z+w| \leq 2(|z| + |w|).

One can “blame” this loss on the fact that while the original inequality (9) was invariant with respect to phase rotation {(z,w) \mapsto (e^{i\theta} z, e^{i\theta} w)}, the intermediate expressions we tried to use when proving it were not, leading to inefficient estimates. One can try to be smarter than this by using Pythagoras’ theorem {|z|^2 = |\hbox{Re}(z)|^2 + |\hbox{Im}(z)|^2}; this reduces the loss from {2} to {\sqrt{2}} but does not eliminate it completely, which is to be expected as one is still using non-invariant estimates in the proof. But one can remove the loss completely by using amplification; see the previous blog post for details (we also give a reformulation of this amplification below).
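A quick numerical illustration of the loss (a sketch of ours, plain Python): at the extremal configuration {z = w = e^{i\pi/4}}, the very first step of the crude chain already overshoots the target right-hand side {|z|+|w|}, so no amount of cleverness downstream can recover (9):

```python
import cmath, math

z = w = cmath.exp(1j * math.pi / 4)     # unit modulus, parallel: extremal for (9)

lhs = abs(z + w)                        # |z+w| = 2
target = abs(z) + abs(w)                # sharp right-hand side = 2
crude = abs((z + w).real) + abs((z + w).imag)   # first step of the crude chain

print(lhs, target, crude)               # crude = 2*sqrt(2), already above target
```

Rotating {z} and {w} by {e^{-i\pi/4}} before splitting into real and imaginary parts makes the crude chain lossless for this pair, which is precisely the phase-rotation amplification at work.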

Here is a slight variant of the above example. Suppose that you had just learned in class to prove the triangle inequality

\displaystyle (\sum_{n=1}^\infty |a_n+b_n|^2)^{1/2} \leq (\sum_{n=1}^\infty |a_n|^2)^{1/2} + (\sum_{n=1}^\infty |b_n|^2)^{1/2} \ \ \ \ \ (12)

for (say) real square-summable sequences {(a_n)_{n=1}^\infty}, {(b_n)_{n=1}^\infty}, and was tasked to conclude the corresponding inequality

\displaystyle (\sum_{n \in {\bf Z}} |a_n+b_n|^2)^{1/2} \leq (\sum_{n \in {\bf Z}} |a_n|^2)^{1/2} + (\sum_{n \in {\bf Z}} |b_n|^2)^{1/2} \ \ \ \ \ (13)

for doubly infinite square-summable sequences {(a_n)_{n \in {\bf Z}}, (b_n)_{n \in {\bf Z}}}. The quickest way to do this is of course to exploit a bijection between the natural numbers {1,2,\dots} and the integers, but let us say for sake of argument that one was unaware of such a bijection. One could then proceed instead by splitting the integers into the positive integers and the non-positive integers, and use (12) on each component separately; this is very similar to the strategy of proving (9) by splitting a complex number into real and imaginary parts, and will similarly lose a factor of {2} or {\sqrt{2}}. In this case, one can “blame” this loss on the abandonment of translation invariance: both sides of the inequality (13) are invariant with respect to shifting the sequences {(a_n)_{n \in {\bf Z}}}, {(b_n)_{n \in {\bf Z}}} by some shift {h} to arrive at {(a_{n-h})_{n \in {\bf Z}}, (b_{n-h})_{n \in {\bf Z}}}, but the intermediate quantities caused by splitting the integers into two subsets are not invariant. Another way of thinking about this is that the splitting of the integers gives a privileged role to the origin {n=0}, whereas the inequality (13) treats all values of {n} equally thanks to the translation invariance, and so using such a splitting is unnatural and not likely to lead to optimal estimates. On the other hand, one can deduce (13) from (12) by sending this symmetry to infinity; indeed, after applying a shift to (12) we see that

\displaystyle (\sum_{n=-N}^\infty |a_n+b_n|^2)^{1/2} \leq (\sum_{n=-N}^\infty |a_n|^2)^{1/2} + (\sum_{n=-N}^\infty |b_n|^2)^{1/2}

for any {N}, and on sending {N \rightarrow \infty} we obtain (13) (one could invoke the monotone convergence theorem here to justify the limit, though in this case it is simple enough that one can just use first principles).

Note that the principle of preserving symmetry only applies to direct approaches to proving inequalities such as (1). There is a complementary approach, discussed for instance in this previous post, which is to spend the symmetry to place the variable {f} “without loss of generality” in a “normal form”, “convenient coordinate system”, or a “good gauge”. Abstractly: suppose that there is some subset {Y} of {X} with the property that every {f \in X} can be expressed in the form {f = Tg} for some {T \in G} and {g \in Y} (that is to say, {X = GY}). Then, if one wishes to prove an inequality (1) for all {f \in X}, and one knows that both sides {A(f), B(f)} of this inequality are {G}-invariant, then it suffices to check (1) just for those {f} in {Y}, as this together with the {G}-invariance will imply the same inequality (1) for all {f} in {GY=X}. By restricting to those {f} in {Y}, one has given up (or spent) the {G}-invariance, as the set {Y} will typically not be preserved by the group action {G}. But by the same token, by eliminating the invariance, one also eliminates the prohibition on using non-invariant proof techniques, and one is now free to use a wider range of inequalities in order to try to establish (1). Of course, such inequalities should make crucial use of the restriction {f \in Y}, for if they did not, then the arguments would work in the more general setting {f \in X}, and then the previous principle would again kick in and warn us that the use of non-invariant inequalities would be inefficient. Thus one should “spend” the symmetry wisely to “buy” a restriction {f \in Y} that will be of maximal utility in calculations (for instance by setting as many annoying factors and terms in one’s analysis to be {0} or {1} as possible).

As a simple example of this, let us revisit the complex triangle inequality (9). As already noted, both sides of this inequality are invariant with respect to the phase rotation symmetry {(z,w) \mapsto (e^{i\theta} z, e^{i\theta} w)}. This seems to limit one to using phase-rotation-invariant techniques to establish the inequality, in particular ruling out the use of real and imaginary parts as discussed previously. However, we can instead spend the phase rotation symmetry to restrict to a special class of {z} and {w}. It turns out that the most efficient way to spend the symmetry is to achieve the normalisation of {z+w} being a nonnegative real; this is of course possible since any complex number {z+w} can be turned into a nonnegative real by multiplying by an appropriate phase {e^{i\theta}}. Once {z+w} is a nonnegative real, the imaginary part disappears and we have

\displaystyle |z+w| = \hbox{Re}(z+w) = \hbox{Re}(z) + \hbox{Re}(w),

and the triangle inequality (9) is now an immediate consequence of (10), (11). (But note that if one had unwisely spent the symmetry to normalise, say, {z} to be a non-negative real, then one is no closer to establishing (9) than before one had spent the symmetry.)
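The normalisation step is easy to check numerically; this sketch (ours, standard library only) spends the phase symmetry on random pairs and confirms that the triangle inequality then follows from the real-part bounds (10), (11) alone:

```python
import cmath, random

random.seed(1)
for _ in range(1000):
    z = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    w = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    if abs(z + w) < 1e-9:
        continue
    # spend the symmetry: rotate so that z + w is a nonnegative real
    phase = cmath.exp(-1j * cmath.phase(z + w))
    zr, wr = phase * z, phase * w               # moduli are unchanged
    assert abs((zr + wr).imag) < 1e-9           # imaginary part is gone
    # |z+w| = Re(zr) + Re(wr), and each Re is dominated by the modulus
    assert abs(abs(z + w) - (zr.real + wr.real)) < 1e-9
    assert zr.real <= abs(zr) + 1e-12 and wr.real <= abs(wr) + 1e-12
print("(9) verified in the rotated gauge")
```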

Filed under: math.CA, tricks Tagged: amplification, arbitrage, inequalities

Tommaso DorigoA Narrow Escape From The Cumbre Rucu Volcano

If I am alive, I probably owe it to my current very good physical shape.

That does not mean I narrowly escaped certain death; rather, it means that if I had been slower, there is a good chance I would have been hit by lightning, under arduous conditions, at 4300 meters of altitude.


August 24, 2017

John PreskillTeacher Research at Caltech

The Yeh Lab group’s research activities at Caltech have been instrumental in studying semiconductors and making two-dimensional materials such as graphene, as highlighted on a BBC Horizons show.  

An emerging sub-field of semiconductor and two-dimensional research is that of transition metal dichalcogenide (TMDC) monolayers. In particular, a monolayer of Tungsten disulfide, a TMDC, is believed to exhibit interesting semiconductor properties when exposed to circularly polarized light. My role in the Yeh Lab, as a visiting high school Physics Teacher intern for the Summer of 2017, has been to help research and set up a vacuum chamber to study Tungsten disulfide samples under circularly polarized light.

What makes semiconductors unique is that their conductivity can be controlled by doping or changes in temperature. Higher temperatures or doping can bridge the energy gap between the valence and conduction bands; in other words, electrons can start moving from one side of the material to the other. Like graphene, Tungsten disulfide has a hexagonal, symmetric crystal structure. Monolayers of transition metal dichalcogenides in such a honeycomb structure have two distinct energy valleys, and circularly polarized light can be used to populate one valley rather than the other. This gives a degree of control over the electron population by polarized light.

The Yeh Lab Group prides itself on making in-house the materials and devices needed for research. For example, in order to study high temperature superconductors, the Yeh Group designed and built their own scanning tunneling microscope. When they began researching graphene, instead of buying vast quantities of graphene, they pioneered new ways of fabricating it. This research topic has been no different: Wei-hsiang Lin, a Caltech graduate student, has been busy fabricating Tungsten disulfide samples via chemical vapor deposition (CVD) using Tungsten oxide and sulfur powder.  


Wei-hsiang Lin’s area for using CVD to form the TMDC samples

The first portion of my assignment was spent learning more about vacuum chambers and researching what to order in order to mount our sample inside the chamber. One must determine how the electronic feedthroughs should be attached, how many are necessary, which vacuum pump will be used, and how many flanges and gaskets of each size must be purchased in order to prepare the vacuum chamber.

There were also a number of flanges and parts already in the lab that needed to be examined for possible use. After triple-checking the details, the order was placed with Kurt J. Lesker. With a sufficient amount of anti-seize lubricant and numerous nuts, washers, and bolts, we assembled the vacuum chamber that will hold the TMDC sample.


The original vacuum chamber


Fun in the lab


The prepped vacuum chamber


The second part of my assignment was spent researching how to set up the optics for our experiment and ordering the necessary equipment. Once the experiment is up and running, we will use a milliwatt broad-spectrum light source directed into a monochromator to narrow the light down to specific wavelengths for testing. Ultimately we will be evaluating the wide wavelength range of 300 nm through 1800 nm. After the monochromator, the light will be refocused by a plano-convex lens. Next, the light will pass through a linear polarizer and then a circular polarizer (quarter-wave plate). Lastly, the light will be refocused by a biconvex lens into the vacuum chamber and onto a 1 mm by 1 mm area of the sample.
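For a sense of scale (our back-of-the-envelope aside, not part of the lab write-up): with photon energy {E = hc/\lambda \approx 1239.84 \textrm{ eV} \cdot \textrm{nm} / \lambda}, that 300 nm to 1800 nm sweep covers roughly 0.7 eV to 4.1 eV, comfortably bracketing typical semiconductor band gaps (the direct gap of monolayer Tungsten disulfide is around 2 eV):

```python
HC_EV_NM = 1239.84   # h*c in eV*nm

def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a vacuum wavelength given in nanometres."""
    return HC_EV_NM / wavelength_nm

print(photon_energy_ev(300))    # ~4.13 eV (near-ultraviolet end of the sweep)
print(photon_energy_ev(1800))   # ~0.69 eV (near-infrared end of the sweep)
```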

Soon, we are excited to verify how tungsten disulfide responds to circularly polarized light.  Does our sample resonate at the exact same wavelengths as the first labs found? Why or why not?  What other unique properties are observed?  How can they be explained?  How is the Hall Effect observed?  What does this mean for the possible applications of semiconductors? How can the transfer of information from one valley to another be used in advanced electronics for communication?  Then, similar exciting experimentation will take place with graphene under circularly polarized light.

I love the sharp contrast of the high-energy, adolescent classroom to the quiet, calm of the lab.  I am grateful for getting to learn a different and new-to-me area of Physics during the summer.  Yes, I remember studying polarization and semiconductors in high school and as an undergraduate.  But it is completely different to set up an experiment from scratch, to be a part of groundbreaking research in these areas.  And it is just fun to get to work with your hands and build research equipment at a world leading research university.  Sometimes Science teachers can get bogged down with all the paperwork and meetings.  I am grateful to have had this fabulous opportunity during the summer to work on applied Science and to be re-energized in my love for Physics.  I look forward to meeting my new batch of students in a few short weeks to share my curiosity and joy for learning how the world works with them.

August 23, 2017

Scott AaronsonAmsterdam art museums plagiarizing my blog?

This past week I had the pleasure of attending COLT (Conference on Learning Theory) 2017 in Amsterdam, and of giving an invited talk on “PAC-Learning and Reconstruction of Quantum States.”  You can see the PowerPoint slides here; videos were also made, but don’t seem to be available yet.

This was my first COLT, but almost certainly not the last.  I learned lots of cool new tidbits, from the expressive power of small-depth neural networks, to a modern theoretical computer science definition of “non-discriminatory” (namely, your learning algorithm’s output should be independent of protected categories like race, sex, etc. after conditioning on the truth you’re trying to predict), to the inapproximability of VC dimension (assuming the Exponential Time Hypothesis).  You can see the full schedule here.  Thanks so much to the PC chairs, Ohad Shamir and Satyen Kale, for inviting me and for putting on a great conference.

And one more thing: I’m not normally big on art museums, but Amsterdam turns out to have two in close proximity to each other—the Rijksmuseum and the Stedelijk—each containing something that Shtetl-Optimized readers might recognize.


Photo credits: Ronald de Wolf and Marijn Heule.

Tommaso DorigoRevenge Of The Slimeballs Part 5: When US Labs Competed For Leadership In HEP

This is the fifth and final part of Chapter 3 of the book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab". (The beginning of the chapter is omitted, since it described a different story.) The chapter recounts the pioneering measurement of the Z mass by the CDF detector, and the competition with SLAC during the summer of 1989. The title of the post is the same as that of chapter 3; it refers to the way some SLAC physicists called their Fermilab colleagues, whose hadron collider was to their eyes obviously inferior to the electron-positron linear collider.


August 22, 2017

Chad OrzelThe Age Math Game

I keep falling down on my duty to provide cute-kid content here; I also keep forgetting to post something about a nerdy bit of our morning routine. So, let’s maximize the bird-to-stone ratio, and do them at the same time.

The Pip can be a Morning Dude at times, but SteelyKid is never very happy to get up. So on weekday mornings, we’ve developed a routine to ease the two of them into the day: SteelyKid has a radio alarm, and then I go in and gently shake her out of bed. I usually carry her downstairs to the couch, where she burrows into the cushions a bit; The Pip mostly comes downstairs under his own power, though occasionally he needs a lot of badgering to get him out of bed.

Once on the couch, we play one level of Candy Crush on my phone, often while SteelyKid has a small snack. At the end of this, if we beat the level, we get a leaderboard showing our place among my Facebook friends who play, and also Kate’s ranking (she’s something like a hundred levels ahead of us, so she always has a ranking…).

Once we get those two numbers, we play a math game with them: The kids have to figure out how to combine those two numbers to get their ages (currently five and nine). Allowed operations are all ordinary arithmetic (addition, subtraction, multiplication, division), and also operations between the digits of two-digit numbers. Extra pairs of the starting numbers can be brought in as needed.

So, for example, if we’re in eighth place and Kate’s one spot ahead, the process of getting to 5 would be something like:

“Seven plus eight is fifteen, and one times five is five, then you’re done.”

And to get to nine would be:

“Eight minus seven is one, then add that to another eight, and you get nine.”

SteelyKid learned about square roots at some point, and she’s now a big fan of taking the square root of nine to get a three– so if we end up in second and Kate was seventh, she’ll go for:

“Two plus seven is nine, and the square root of nine is three, then three plus another two is five.”
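For fun, the kids’ head-search can be sketched as a tiny brute-force program (our sketch; the exact operation set, e.g. whether subtraction order matters, is an assumption based on the rules above):

```python
import math
from itertools import combinations

def expand(pool):
    """All numbers reachable in one move from the current pool."""
    new = set(pool)
    for a, b in combinations(pool, 2):
        new.update({a + b, abs(a - b), a * b})
        if b and a % b == 0:
            new.add(a // b)
        if a and b % a == 0:
            new.add(b // a)
    for n in pool:
        if n >= 10:                     # split a multi-digit number into digits
            new.update(int(d) for d in str(n))
        s = math.isqrt(n)
        if s * s == n:                  # exact square roots (SteelyKid's favorite)
            new.add(s)
    return new

def solvable(a, b, target, moves=3):
    """Can `target` be reached from leaderboard places a and b in a few moves?"""
    pool = {a, b}
    for _ in range(moves):
        pool = expand(pool)
        if target in pool:
            return True
    return target in pool

# eighth place, with Kate one spot ahead in seventh:
print(solvable(8, 7, 5), solvable(8, 7, 9))   # True True
```

(Both ages fall out within two moves, matching the spoken solutions above: 7 + 8 = 15 splits into a 1 and a 5, and 8 − 7 = 1 plus another 8 gives 9.)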

I have no recollection of how I started doing this with SteelyKid (it used to be just her, but The Pip decided a few months back that he wanted in on the game), but this works amazingly well to get them to wake up a bit. It’s a nice introduction to math-as-a-game, too, which I hope will serve them well down the line.

And there’s your cute-and-nerdy kid content. Also here’s a bonus photo of the two of them wearing eclipse glasses in preparation for yesterday’s solar spectacle:

Sillyheads modeling eclipse glasses. Photo by Kate Nepveu.

(They were duly impressed by the Sun looking like a crescent moon, up here in 60-odd-percent country. They saw it at day camp; I was waiting for an eye doctor appointment at a local mall, and shared around a set of eclipse glasses with random shoppers and retail workers.)

August 21, 2017

John PreskillTopological qubits: Arriving in 2018?

Editor‘s note: This post was prepared jointly by Ryan Mishmash and Jason Alicea.

Physicists appear to be on the verge of demonstrating proof-of-principle “usefulness” of small quantum computers.  Preskill’s notion of quantum supremacy spotlights a particularly enticing goal: use a quantum device to perform some computation—any computation in fact—that falls beyond the reach of the world’s best classical computers.  Efforts along these lines are being vigorously pursued along many fronts, from academia to large corporations to startups.  IBM’s publicly accessible 16-qubit superconducting device, Google’s pursuit of a 7×7 superconducting qubit array, and the recent synthesis of a 51-qubit quantum simulator using rubidium atoms are a few of many notable highlights.  While the number of qubits obtainable within such “conventional” approaches has steadily risen, synthesizing the first “topological qubit” remains an outstanding goal.  That ceiling may soon crumble however—vaulting topological qubits into a fascinating new chapter in the quest for scalable quantum hardware.

Why topological quantum computing?

As quantum computing progresses from minimalist quantum supremacy demonstrations to attacking real-world problems, hardware demands will naturally steepen.  In, say, a superconducting-qubit architecture, a major source of overhead arises from quantum error correction needed to combat decoherence.  Quantum-error-correction schemes such as the popular surface-code approach encode a single fault-tolerant logical qubit in many physical qubits, perhaps thousands.  The number of physical qubits required for practical applications can thus rapidly balloon.

The dream of topological quantum computing (introduced by Kitaev) is to construct hardware inherently immune to decoherence, thereby mitigating the need for active error correction.  In essence, one seeks physical qubits that by themselves function as good logical qubits.  This lofty objective requires stabilizing exotic phases of matter that harbor emergent particles known as “non-Abelian anyons”.  Crucially, nucleating non-Abelian anyons generates an exponentially large set of ground states that cannot be distinguished from each other by any local measurement.  Topological qubits encode information in those ground states, yielding two key virtues:

(1) Insensitivity to local noise.  For reference, consider a conventional qubit encoded in some two-level system, with the 0 and 1 states split by an energy \hbar \omega.  Local noise sources—e.g., random electric and magnetic fields—cause that splitting to fluctuate stochastically in time, dephasing the qubit.  In practice one can engender immunity against certain environmental perturbations.  One famous example is the transmon qubit (see “Charge-insensitive qubit design derived from the Cooper pair box” by Koch et al.) used extensively at IBM, Google, and elsewhere.  The transmon is a superconducting qubit that cleverly suppresses the effects of charge noise by operating in a regime where Josephson couplings are sizable compared to charging energies.  Transmons remain susceptible, however, to other sources of randomness such as flux noise and critical-current noise.  By contrast, topological qubits embed quantum information in global properties of the system, building in immunity against all local noise sources.  Topological qubits thus realize “perfect” quantum memory.

(2) Perfect gates via braiding.  By exploiting the remarkable phenomenon of non-Abelian statistics, topological qubits further enjoy “perfect” quantum gates: Moving non-Abelian anyons around one another reshuffles the system among the ground states—thereby processing the qubits—in exquisitely precise ways that depend only on coarse properties of the exchange.

Disclaimer: Adjectives like “perfect” should come with the qualifier “up to exponentially small corrections”, a point that we revisit below.

Experimental status

The catch is that systems supporting non-Abelian anyons are not easily found in nature.  One promising topological-qubit implementation exploits exotic 1D superconductors whose ends host “Majorana modes”—novel zero-energy degrees of freedom that underlie non-Abelian-anyon physics.  In 2010, two groups (Lutchyn et al. and Oreg et al.) proposed a laboratory realization that combines semiconducting nanowires, conventional superconductors, and modest magnetic fields.

Since then, the materials-science progress on nanowire-superconductor hybrids has been remarkable.  Researchers can now grow extremely clean, versatile devices featuring various manipulation and readout bells and whistles.  These fabrication advances paved the way for experiments that have reported increasingly detailed Majorana characteristics: tunneling signatures including recent reports of long-sought quantized response, evolution of Majorana modes with system size, mapping out of the phase diagram as a function of external parameters, etc.  Alternate explanations are still being debated though.  Perhaps the most likely culprits are conventional localized fermionic levels (“Andreev bound states”) that can imitate Majorana signatures under certain conditions; see in particular Liu et al.  Still, the collective experimental effort on this problem over the last 5+ years has provided mounting evidence for the existence of Majorana modes.  Revealing their prized quantum-information properties poses a logical next step.

Validating a topological qubit

Ideally one would like to verify both hallmarks of topological qubits noted above—“perfect” insensitivity to local noise and “perfect” gates via braiding.  We will focus on the former property, which can be probed in simpler device architectures.  Intuitively, noise insensitivity should imply long qubit coherence times.  But how do you pinpoint the topological origin of long coherence times, and in any case what exactly qualifies as “long”?

Here is one way to sharply address these questions (for more details, see our work in Aasen et al.).  As alluded to in our disclaimer above, logical 0 and 1 topological-qubit states aren’t exactly degenerate.  In nanowire devices they’re split by an energy \hbar \omega that is exponentially small in the separation distance L between Majorana modes divided by the superconducting coherence length \xi.  Correspondingly, the qubit states are not quite locally indistinguishable either, and hence not perfectly immune to local noise.  Now imagine pulling apart Majorana modes to go from a relatively poor to a perfect topological qubit.  During this process two things transpire in tandem: The topological qubit’s oscillation frequency, \omega, vanishes exponentially while the dephasing time T_2 becomes exponentially long.  That is,

\displaystyle T_2 \propto \frac{1}{\omega}.

This scaling relation could in fact be used as a practical definition of a topologically protected quantum memory.  Importantly, mimicking this property in any non-topological qubit would require some form of divine intervention.  For example, even if one fine-tuned conventional 0 and 1 qubit states (e.g., resulting from the Andreev bound states mentioned above) to be exactly degenerate, local noise could still readily produce dephasing.
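As a toy numerical illustration (our construction, not the modeling in Aasen et al.): suppose parameter noise feeds through to splitting fluctuations proportional to the splitting itself, {\delta\omega \sim \varepsilon \omega}, as the exponential flatness suggests. A Monte-Carlo free-induction-decay estimate of the 1/e coherence time then grows by exactly the exponential factor {e^{(L_2 - L_1)/\xi}} by which the splitting shrinks:

```python
import math, random

def t2_one_over_e(omega, eps=0.05, samples=20_000, seed=1):
    """1/e coherence time of a qubit whose splitting fluctuates shot-to-shot
    as delta ~ N(0, eps*omega); coherence(t) = |average of exp(i*delta*t)|."""
    random.seed(seed)
    deltas = [random.gauss(0.0, eps * omega) for _ in range(samples)]
    t, dt = 0.0, 0.02 / (eps * omega)
    while True:
        t += dt
        re = sum(math.cos(d * t) for d in deltas) / samples
        im = sum(math.sin(d * t) for d in deltas) / samples
        if math.hypot(re, im) < 1.0 / math.e:
            return t

xi = 1.0                                 # coherence length (arbitrary units)
w_short = math.exp(-4.0 / xi)            # splitting for Majorana separation L = 4
w_long = math.exp(-6.0 / xi)             # splitting for separation L = 6
ratio = t2_one_over_e(w_long) / t2_one_over_e(w_short)
print(ratio, w_short / w_long)           # both ~ e^2: T2 scales as 1/omega
```

Because the noise amplitude in this toy model scales with the splitting, the whole decay curve is self-similar in {\omega}, and the dephasing time comes out inversely proportional to the splitting, as in the scaling relation above.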

As discussed in Aasen et al., this topological-qubit scaling relation can be tested experimentally via Ramsey-like protocols in a setup that might look something like the following:


This device contains two adjacent Majorana wires (orange rectangles) with couplings controlled by local gates (“valves” represented by black switches).  Incidentally, the design was inspired by a gate-controlled variation of the transmon pioneered in Larsen et al. and de Lange et al.  In fact, if only charge noise was present, we wouldn’t stand to gain much in the way of coherence times: both the transmon and topological qubit would yield exponentially long T_2 times.  But once again, other noise sources can efficiently dephase the transmon, whereas a topological qubit enjoys exponential protection from all sources of local noise.  Mathematically, this distinction occurs because the splitting for transmon qubit states is exponentially flat only with respect to variations in a “gate offset” n_g.  For the topological qubit, the splitting is exponentially flat with respect to variations in all external parameters (e.g., magnetic field, chemical potential, etc.), so long as Majorana modes still survive.  (By “exponentially flat” we mean constant up to exponentially small deviations.)  Plotting the energies of the qubit states in the two respective cases versus external parameters, the situation can be summarized as follows:


Outlook: Toward “topological quantum ascendancy”

These qubit-validation experiments constitute a small stepping stone toward building a universal topological quantum computer.  Explicitly demonstrating exponentially protected quantum information as discussed above would, nevertheless, go a long way toward establishing practical utility of Majorana-based topological qubits.  One might even view this goal as single-qubit-level “topological quantum ascendancy”.  Completion of this milestone would further set the stage for implementing “perfect” quantum gates, which requires similar capabilities albeit in more complex devices.  Researchers at Microsoft and elsewhere have their sights set on bringing a prototype topological qubit to life in the very near future.  It is not unreasonable to anticipate that 2018 will mark the debut of the topological qubit.  We could of course be off target.  There is, after all, still plenty of time in 2017 to prove us wrong.

August 18, 2017

Scott AaronsonWhat I believe II (ft. Sarah Constantin and Stacey Jeffery)

Unrelated Update: To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.

In my post “The Kolmogorov Option,” I tried to step back from current controversies, and use history to reflect on the broader question of how nerds should behave when their penchant for speaking unpopular truths collides head-on with their desire to be kind and decent and charitable, and to be judged as such by their culture.  I was gratified to get positive feedback about this approach from men and women all over the ideological spectrum.

However, a few people who I like and respect accused me of “dogwhistling.” They warned, in particular, that if I wouldn’t just come out and say what I thought about the James Damore Google memo thing, then people would assume the very worst—even though, of course, my friends themselves knew better.

So in this post, I’ll come out and say what I think.  But first, I’ll do something even better: I’ll hand the podium over to two friends, Sarah Constantin and Stacey Jeffery, both of whom were kind enough to email me detailed thoughts in response to my Kolmogorov post.

Sarah Constantin completed her PhD in math at Yale. I don’t think I’ve met her in person yet, but we have a huge number of mutual friends in the so-called “rationalist community.”  Whenever Sarah emails me about something I’ve written, I pay extremely close attention, because I have yet to read a single thing by her that wasn’t full of insight and good sense.  I strongly urge anyone who likes her beautiful essay below to check out her blog, which is called Otium.

Sarah Constantin’s Commentary:

I’ve had a women-in-STEM essay brewing in me for years, but I’ve been reluctant to actually write publicly on the topic for fear of stirring up a firestorm of controversy.  On the other hand, we seem to be at a cultural inflection point on the issue, especially in the wake of the leaked Google memo, and other people are already scared to speak out, so I think it’s past time for me to put my name on the line, and Scott has graciously provided me a platform to do so.

I’m a woman in tech myself. I’m a data scientist doing machine learning for drug discovery at Recursion Pharmaceuticals, and before that I was a data scientist at Palantir. Before that I was a woman in math — I got my PhD from Yale, studying applied harmonic analysis. I’ve been in this world all my adult life, and I obviously don’t believe my gender makes me unfit to do the work.

I’m also not under any misapprehension that I’m some sort of exception. I’ve been mentored by Ingrid Daubechies and Maryam Mirzakhani (the first female Fields Medalist, who died tragically young last month).  I’ve been lucky enough to work with women who are far, far better than me.  There are a lot of remarkable women in math and computer science — women just aren’t the majority in those fields. But “not the majority” doesn’t mean “rare” or “unknown.”

I even think diversity programs can be worthwhile. I went to the Institute for Advanced Study’s Women and Math Program, which would be an excellent graduate summer school even if it weren’t all-female, and taught at its sister program for high school girls, which likewise is a great math camp independent of the gender angle. There’s a certain magic, if you’re in a male-dominated field, of once in a while being in a room full of women doing math, and I hope that everybody gets to have that experience once.

But (you knew the “but” was coming), I think the Google memo was largely correct, and the way people conventionally talk about women in tech is wrong.

Let’s look at some of his claims. From the beginning of the memo:

  • Google’s political bias has equated the freedom from offense with psychological safety, but shaming into silence is the antithesis of psychological safety.
  • This silencing has created an ideological echo chamber where some ideas are too sacred to be honestly discussed.
  • The lack of discussion fosters the most extreme and authoritarian elements of this ideology.
  • Extreme: all disparities in representation are due to oppression
  • Authoritarian: we should discriminate to correct for this oppression

Okay, so there’s a pervasive assumption that any deviation from 50% representation of women in technical jobs is a.) due to oppression, and b.) ought to be corrected by differential hiring practices. I think it is basically true that people widely believe this, and that people can lose their jobs for openly contradicting it (as James Damore, the author of the memo, did).  I have heard people I work with advocating hiring quotas for women (i.e. explicitly earmarking a number of jobs for women candidates only).  It’s not a strawman.

Then, Damore disagrees with this assumption:

  • Differences in distributions of traits between men and women may in part explain why we don’t have 50% representation of women in tech and leadership. Discrimination to reach equal representation is unfair, divisive, and bad for business.

Again, I agree with Damore. Note that this doesn’t mean that I must believe that sexism against women isn’t real and important (I’ve heard enough horror stories to be confident that some work environments are toxic to women).  It doesn’t even mean that I must be certain that the different rates of men and women in technical fields are due to genetics.  I’m very far from certain, and I’m not an expert in psychology. I don’t think I can do justice to the science in this post, so I’m not going to cover the research literature.

But I do think it’s irresponsible to assume a priori that there are no innate sex differences that might explain what we see.  It’s an empirical matter, and a topic for research, not dogma.

Moreover, I think discrimination on the basis of sex to reach equal representation is unfair and unproductive.  It’s unfair, because it’s not meritocratic.  You’re not choosing the best human for the job regardless of gender.

I think women might actually benefit from companies giving genuine meritocracy a chance. “Blind” auditions (in which the evaluator doesn’t see the performer) gave women a better chance of landing orchestra jobs; apparently, orchestras were prejudiced against female musicians, and the blinding canceled out that prejudice. Google’s own research has actually shown that the single best predictor of work performance is a work sample — testing candidates with a small project similar to what they’d do on the job. Work samples are easy to anonymize to reduce gender bias, and they’re more effective than traditional interviews, where split-second first impressions usually decide who gets hired, but don’t correlate at all with job performance. A number of tech companies have switched to work samples as part of their interview process.  I used work samples myself when I was hiring for a startup, just because they seemed more accurate at predicting who’d be good at the job; entirely without intending to, I got a 50% gender ratio.  If you want to reduce gender bias in tech, it’s worth at least considering blinded hiring via work samples.

Moreover, thinking about “representation” in science and technology reflects underlying assumptions that I think are quite dangerous.

You expect interest groups to squabble over who gets a piece of the federal budget. In politics, people will band together in blocs, and try to get the biggest piece of the spoils they can.  “Women should get such-and-such a percent of tech jobs” sounds precisely like this kind of politicking; women are assumed to be a unified bloc who will vote together, and the focus is on what size chunk they can negotiate for themselves. If a tech job (or a university position) were a cushy sinecure, a ticket to privilege, and nothing more, you might reasonably ask “how come some people get more goodies than others? Isn’t meritocracy just an excuse to restrict the goodies to your preferred group?”

Again, this is not a strawman. Here’s one Vox response to the memo stating explicitly that she believes women are a unified bloc:

The manifesto’s sleight-of-hand delineation between “women, on average” and the actual living, breathing women who have had to work alongside this guy failed to reassure many of those women — and failed to reassure me. That’s because the manifesto’s author overestimated the extent to which women are willing to be turned against their own gender.

Speaking for myself, it doesn’t matter to me how soothingly a man coos that I’m not like most women, when those coos are accompanied by misogyny against most women. I am a woman. I do not stop being one during the parts of the day when I am practicing my craft. There can be no realistic chance of individual comfort for me in an environment where others in my demographic categories (or, really, any protected demographic categories) are subjected to skepticism and condescension.

She can’t be comfortable unless everybody in any protected demographic category — note that this is a legal, governmental category — is given the benefit of the doubt?  That’s a pretty collectivist commitment!

Or, look at Piper Harron, an assistant professor in math who blogged on the American Mathematical Society’s website that universities should simply “stop hiring white cis men”, and explicitly says “If you are on a hiring committee, and you are looking at applicants and you see a stellar white male applicant, think long and hard about whether your department needs another white man. You are not hiring a researching robot who will output papers from a dark closet. You are hiring an educator, a role model, a spokesperson, an advisor, a committee person … There is no objectivity. There is no meritocracy.”

Piper Harron reflects an extreme, of course, but she’s explicitly saying, on America’s major communication channel for and by mathematicians, that whether you get to work in math should not be based on whether you’re actually good at math. For her, it’s all politics.  Life itself is political, and therefore a zero-sum power struggle between groups.  

But most of us, male or female, didn’t fall in love with science and technology for that. Science is the mission to explore and understand our universe. Technology is the project of expanding human power to shape that universe. What we do towards those goals will live longer than any “protected demographic category”, any nation, any civilization.  We know how the Babylonians mapped the stars.

Women deserve an equal chance at a berth on the journey of exploration not because they form a political bloc but because some of them are discoverers and can contribute to the human mission.

Maybe, in a world corrupted by rent-seeking, the majority of well-paying jobs have some element of unearned privilege; perhaps almost all of us got at least part of our salaries by indirectly expropriating someone who had as good a right to it as us.

But that’s not a good thing, and that’s not what we hope for science and engineering to be, and I truly believe that this is not the inevitable fate of the human race — that we can only squabble over scraps, and never create.  

I’ve seen creation, and I’ve seen discovery. I know they’re real.

I care a lot more about whether my company achieves its goal of curing 100 rare diseases in 10 years than about the demographic makeup of our team.  We have an actual mission; we are trying to do something beyond collecting spoils.  

Do I rely on brilliant work by other women every day? I do. My respect for myself and my female colleagues is not incompatible with primarily caring about the mission.

Am I “turning against my own gender” because I see women as individuals first? I don’t think so. We’re half the human race, for Pete’s sake! We’re diverse. We disagree. We’re human.

When you think of “women-in-STEM” as a talking point on a political agenda, you mention Ada Lovelace and Grace Hopper in passing, and move on to talking about quotas.  When you think of women as individuals, you start to notice how many genuinely foundational advances were made by women — just in my own field of machine learning, Adele Cutler co-invented random forests, Corinna Cortes co-invented support vector machines, and Fei-Fei Li created the famous ImageNet benchmark dataset that started a revolution in image recognition.

As a child, my favorite book was Carl Sagan’s Contact, a novel about Ellie Arroway, an astronomer loosely based on his wife Ann Druyan. The name is not an accident; like the title character in Sinclair Lewis’ Arrowsmith, Ellie is a truth-seeking scientist who battles corruption, anti-intellectualism, and blind prejudice.  Sexism is one of the challenges she faces, but the essence of her life is about wonder and curiosity. She’s what I’ve always tried to become.

I hope that, in seeking to encourage the world’s Ellies in science and technology, we remember why we’re doing that in the first place. I hope we remember humans are explorers.

Now let’s hear from another friend who wrote to me recently, and who has a slightly different take.  Stacey Jeffery is a quantum computing theorist at one of my favorite research centers, CWI in Amsterdam.  She completed her PhD at University of Waterloo, and has done wonderful work on quantum query complexity and other topics close to my heart.  When I was being viciously attacked in the comment-171 affair, Stacey was one of the first people to send me a note of support, and I’ve never forgotten it.

Stacey Jeffery’s Commentary

I don’t think Google was right to fire Damore. This makes me a minority among people with whom I have discussed this issue.  Hopefully some people come out in the comments in support of the other position, so it’s not just me presenting that view, but the main argument I encountered was that what he said just sounded way too sexist for Google to put up with.  I agree with part of that, it did sound sexist to me.  In fact it also sounded racist to me. But that’s not because he necessarily said anything actually sexist or actually racist, but because he said the kinds of things that you usually only hear from sexist people, and in particular, the kind of sexist people who are also racist.  I’m very unlikely to try to pursue further interaction with a person who says these kinds of things for those reasons, but I think firing him for what he said between the lines sets a very bad precedent.  It seems to me he was fired for associating himself with the wrong ideas, and it does feel a bit like certain subjects are not up for rational discussion.  If Google wants an open environment, where employees can feel safe discussing company policy, I don’t think this contributes to that.  If they want their employees, and the world, to think that they aim for diversity because it’s the most rational course of action to achieve their overall objectives, rather than because it serves some secret agenda, like maintaining a PC public image, then I don’t think they’ve served that cause either.  Personally, this irritates me the most, because I feel they have damaged the image for a cause I feel strongly about.

My position is independent of the validity of Damore’s attempt at scientific argument, which is outside my area of expertise.  I personally don’t think it’s very productive for non-social-scientists to take authoritative positions on social science issues, especially ones that appear to be controversial within the field (but I say this as a layperson).  This may include some of the other commentary in this blog post, which I have not yet read, and might even extend to Scott’s decision to comment on this issue at all (but this bridge was crossed in the previous blog post).  However, I think one of the reasons that many of us do this is that the burden of solving the problem of too few women in STEM is often placed on us.  Some people in STEM feel they are blamed for not being welcoming enough to women (in fact, in my specific field, it’s my experience that the majority of people are very sympathetic).  Many scientific funding applications even ask applicants how they plan to address the issue of diversity, as if they should be the ones to come up with a solution for this difficult problem that nobody knows the answer to, and is not even within their expertise.  So it’s not surprising when these same people start to think about and form opinions on these social science issues.  Obviously, we working in STEM have valuable insight into how we might encourage women to pursue STEM careers, and we should be pushed to think about this, but we don’t have all the answers (and maybe we should remember that the next time we consider authoring an authoritative memo on the subject).

Scott’s Mansplaining Commentary

I’m incredibly grateful to Sarah and Stacey for sharing their views.  Now it’s time for me to mansplain my own thoughts in light of what they said.  Let me start with a seven-point creed.

1. I believe that science and engineering, both in academia and in industry, benefit enormously from contributions from people of every ethnic background and gender identity.  This sort of university-president-style banality shouldn’t even need to be said, but in a world where the President of the US criticizes neo-Nazis only under extreme pressure from his own party, I suppose it does.

2. I believe that there’s no noticeable difference in average ability between men and women in STEM fields—or if there’s some small disparity, for all I know the advantage goes to women. I have enough Sheldon Cooper in me that, if this hadn’t been my experience, I’d probably let it slip that it hadn’t been, but it has been.  When I taught 6.045 (undergrad computability and complexity) at MIT, women were only 20% or so of the students, but for whatever reasons they were wildly overrepresented among the top students.

3. I believe that women in STEM face obstacles that men don’t.  These range from the sheer awkwardness of sometimes being the only woman in a room full of guys, to challenges related to pregnancy and childcare, to actual belittlement and harassment.  Note that, even if men in STEM fields are no more sexist on average than men in other fields—or are less sexist, as one might expect from their generally socially liberal views and attitudes—the mere fact of the gender imbalance means that women in STEM will have many more opportunities to be exposed to whatever sexists there are.  This puts a special burden on us to create a welcoming environment for women.

4. Given that we know that gender gaps in interest and inclination appear early in life, I believe in doing anything we can to encourage girls’ interest in STEM fields.  Trust me, my four-year-old daughter Lily wishes I didn’t believe so fervently in working with her every day on her math skills.

5. I believe that gender diversity is valuable in itself.  It’s just nicer, for men and women alike, to have a work environment with many people of both sexes—especially if (as is often the case in STEM) so much of our lives revolves around our work.  I think that affirmative action for women, women-only scholarships and conferences, and other current efforts to improve gender diversity can all be defended and supported on that ground alone.

6. I believe that John Stuart Mill’s The Subjection of Women is one of the masterpieces of history, possibly the highest pinnacle that moral philosophy has ever reached.  Everyone should read it carefully and reflect on it if they haven’t already.

7. I believe it’s a tragedy that the current holder of the US presidency is a confessed sexual predator, who’s full of contempt not merely for feminism, but for essentially every worthwhile human value. I believe those of us on the “pro-Enlightenment side” now face the historic burden of banding together to stop this thug by every legal and peaceful means available. I believe that, whenever the “good guys” tear each other down in internecine warfare—e.g. “nerds vs. feminists”—it represents a wasted opportunity and an unearned victory for the enemies of progress.

OK, now for the part that might blow some people’s minds.  I hold that every single belief above is compatible with what James Damore wrote in his now-infamous memo—at least, if we’re talking about the actual words in it.  In some cases, Damore even makes the above points himself.  In particular, there’s nothing in what he wrote about female Googlers being less qualified on average than male Googlers, or being too neurotic to code, or anything like that: the question at hand is just why there are fewer women in these positions, and that in turn becomes a question about why there are fewer women earlier in the CS pipeline.  Reasonable people need not agree about the answers to those questions, or regard them as known or obvious, to see that the failure to make this one elementary distinction, between quality and quantity, already condemns 95% of Damore’s attackers as not having read or understood what he wrote.

Let that be the measure of just how terrifyingly efficient the social-media outrage machine has become at twisting its victims’ words to fit a clickbait narrative—a phenomenon with which I happen to be personally acquainted.  Strikingly, it seems not to make the slightest difference if (as in this case) the original source text is easily available to everyone.

Still, while most coverage of Damore’s memo was depressing in its monotonous incomprehension, dissent was by no means confined to the right-wingers eager to recruit Damore to their side.  Peter Singer—the legendary leftist moral philosopher, and someone whose fearlessness and consistency I’ve always admired whether I’ve agreed with him or not—wrote a powerful condemnation of Google’s decision to fire Damore.  Scott Alexander was brilliant as usual in picking apart bad arguments.  Megan McArdle drew on her experiences to illustrate some of Damore’s contentions.  Steven Pinker tweeted that Damore’s firing “makes [the] job of anti-Trumpists harder.”

Like Peter Singer, and also like Sarah Constantin and Stacey Jeffery above, I have no plans to take any position on biological differences in male and female inclinations and cognitive styles, and what role (if any) such differences might play in 80% of Google engineers being male—or, for that matter, what role they might play in 80% of graduating veterinarians now being female, or other striking gender gaps.  I decline to take a position not only because I’m not an expert, but also because, as Singer says, doing so isn’t necessary to reach the right verdict about Damore’s firing.  It suffices to note that the basic thesis being discussed—namely, that natural selection doesn’t stop at the neck, and that it’s perfectly plausible that it acted differently on women and men in ways that might help explain many of the population-level differences that we see today—can also be found in, for example, The Blank Slate by Steven Pinker, and other mainstream works by some of the greatest thinkers alive.

And therefore I say: if James Damore deserves to be fired from Google, for treating evolutionary psychology as potentially relevant to social issues, then Steven Pinker deserves to be fired from Harvard for the same offense.

Yes, I realize that an employee of a private company is different from a tenured professor.  But I don’t see why it’s relevant here.  For if someone really believes that mooting the hypothesis of an evolutionary reason for average differences in cognitive styles between men and women, is enough by itself to create a hostile environment for women—well then, why should tenure be a bar to firing, any more than it is in cases of sexual harassment?

But the reductio needn’t stop there.  It seems to me that, if Damore deserves to be fired, then so do the 56% of Googlers who said in a poll that they opposed his firing.  For isn’t that 56% just as responsible for maintaining a hostile environment as Damore himself was? (And how would Google find out which employees opposed the firing? Well, if there’s any company on earth that could…)  Furthermore, after those 56% of Googlers are fired, any of the remaining 44% who think the 56% shouldn’t have been fired should be fired as well!  And so on iteratively, until only an ideologically reliable core remains, which might or might not be the empty set.

OK, but while the wider implications of Damore’s firing have frightened and depressed me all week, as I said, I depart from Damore on the question of affirmative action and other diversity policies.  Fundamentally, what I want is a sort of negotiated agreement or bargain, between STEM nerds and the wider culture in which they live.  The agreement would work like this: STEM nerds do everything they can to foster diversity, including by creating environments that are welcoming for women, and by supporting affirmative action, women-only scholarships and conferences, and other diversity policies.  The STEM nerds also agree never to talk in public about possible cognitive-science explanations for gender disparities in which careers people choose, or overlapping bell curves,  or anything else potentially inflammatory.  In return, just two things:

  1. Male STEM nerds don’t regularly get libelled as misogynist monsters, who must be scaring all the women away with their inherently gross, icky, creepy, discriminatory brogrammer maleness.
  2. The fields beloved by STEM nerds are suffered to continue to exist, rather than getting destroyed and rebuilt along explicitly ideological lines, as already happened with many humanities and social science fields.

So in summary, neither side advances its theories about the causes of gender gaps; both sides simply agree that there are more interesting topics to explore.  In concrete terms, the social-justice side gets to retain 100% of what it has now, or maybe even expand it.  And all it has to offer in exchange is “R-E-S-P-E-C-T”!  Like, don’t smear and shame male nerds as a class, or nerdy disciplines themselves, for gender gaps that the male nerds would be as happy as anybody to see eradicated.

The trouble is that, fueled by outrage-fests on social media, I think the social-justice side is currently failing to uphold its end of this imagined bargain.  Nearly every day the sun rises to yet another thinkpiece about the toxic “bro culture” of Silicon Valley: a culture so uniquely and incorrigibly misogynist, it seems, that it still intentionally keeps women out, even after law and biology and most other white-collar fields have achieved or exceeded gender parity, their own “bro cultures” notwithstanding.  The trouble with this slander against male STEM nerds, besides its fundamental falsity (which Scott Alexander documented), is that it puts the male nerds into an impossible position.  For how can they refute the slander without talking about other possible explanations for fields like CS being 80% male, which is the very thing we all know they’re not supposed to talk about?

In Europe, in the Middle Ages, the Church would sometimes enjoy forcing the local Jews into “disputations” about whose religion was the true one.  At these events, a popular tactic on the Church’s side was to make statements that the Jews couldn’t possibly answer without blaspheming the name of Christ—which, of course, could lead to the Jews’ expulsion or execution if they dared it.

Maybe I have weird moral intuitions, but it’s hard for me to imagine a more contemptible act of intellectual treason, than deliberately trapping your opponents between surrender and blasphemy.  I’d actually rather have someone force me into one or the other, than make me choose, and thereby make me responsible for whichever choice I made.  So I believe the social-justice left would do well to forswear this trapping tactic forever.

Ironically, I suspect that in the long term, doing so would benefit no entity more than the social-justice left itself.  If I had to steelman, in one sentence, the argument that in the space of one year propelled the “alt-right” from obscurity in dark and hateful corners of the Internet, to the improbable and ghastly ascent of Donald Trump and his white-nationalist brigade to the most powerful office on earth, the argument would be this:

If the elites, the technocrats, the “Cathedral”-dwellers, were willing to lie to the masses about humans being blank slates—and they obviously were—then why shouldn’t we assume that they also lied to us about healthcare and free trade and guns and climate change and everything else?

We progressives deluded ourselves that we could permanently shame our enemies into silence, on pain of sexism, racism, xenophobia, and other blasphemies.  But the “victories” won that way were hollow and illusory, and the crumbling of the illusion brings us to where we are now: with a vindictive, delusional madman in the White House who has a non-negligible chance of starting a nuclear war this week.

The Enlightenment was a specific historical period in 18th-century Europe.  But the term can also be used much more broadly, to refer to every trend in human history that’s other than horrible.  Seen that way, the Enlightenment encompasses the scientific revolution, the abolition of slavery, the decline of all forms of violence, the spread of democracy and literacy, and the liberation of women from domestic drudgery to careers of their own choosing.  The invention of Google, which made the entire world’s knowledge just a search bar away, is now also a permanent part of the story of the Enlightenment.

I fantasize that, within my lifetime, the Enlightenment will expand further to tolerate a diversity of cognitive styles—including people on the Asperger’s and autism spectrum, with their penchant for speaking uncomfortable truths—as well as a diversity of natural abilities and inclinations.  Society might or might not get the “demographically correct” percentage of Ellie Arroways—Ellie might decide to become a doctor or musician rather than an astronomer, and that’s fine too—but most important, it will nurture all the Ellie Arroways that it gets, all the misfits and explorers of every background.  I wonder whether, while disagreeing on exactly what’s meant by it, all parties to this debate could agree that diversity represents a next frontier for the Enlightenment.

Comment Policy: Any comment, from any side, that attacks people rather than propositions will be deleted.  I don’t care if the comment also makes useful points: if it contains a single ad hominem, it’s out.

As it happens, I’m at a quantum supremacy workshop in Bristol, UK right now—yeah, yeah, I’m a closet supremacist after all, hur hur—so I probably won’t participate in the comments until later.

August 13, 2017

Chad OrzelKid Art Update

Our big home renovation has added a level of chaos to everything, which has gotten in the way of my doing more regular cute-kid updates, and of even more routine tasks, like photographing the giant pile of kid art that we had to move out of the dining room. Clearing things up for the next big stage of the renovation– cabinets arrive tomorrow– led me back to that pile, though, so I finally took pictures of a whole bunch of good stuff. (On the spiffy new tile floor in the kitchen, because the light was good there…)

The kids’ school sends home portfolios of what they’ve done in art class for the year, and I collected those photos together into a Google Photos album for easy sharing, because they’re pretty cool. My favorite piece of the lot is this polar bear by SteelyKid:

Polar bear by SteelyKid.

That’s in pastel chalk on construction paper; you can see some preliminary sketches of the bear in the album. She drew the scene in pencil, colored it in chalk, then traced important lines with a marker. It’s very cool.

The Pip has some neat stuff in his portfolio, too– I especially like that they had the kids making Mondrians out of strips of construction paper– but my favorite of his was a non-art-class drawing that was in the pile:

The Pigeon, by The Pip.

That’s a very credible rendering of Mo Willems’s Pigeon for a kindergartener…

Anyway, other than that, life continues in the usual whirl. I’m getting really tired of living out of a mini-fridge in the living room and a temporary sink in the kitchen that’s at about knee level to me (when I have to wash dishes, I pull up a chair and sit down, which takes the stress on my back from “agonizing” down to “annoying”). But, cabinets this week, so we can see the oncoming train at the end of this tunnel…

August 12, 2017

Tim GowersIntransitive dice VII — aiming for further results

While Polymath13 has (barring a mistake that we have not noticed) led to an interesting and clearly publishable result, there are some obvious follow-up questions that we would be wrong not to try to answer before finishing the project, especially as some of them seem to be either essentially solved or promisingly close to a solution. The ones I myself have focused on are the following.

  1. Is it true that if two random elements A and B of [n]^n are chosen, then A beats B with very high probability if it has a sum that is significantly larger? (Here “significantly larger” should mean larger by f(n) for some function f(n)=o(n^{3/2}) — note that the standard deviation of the sum has order n^{3/2}, so the idea is that this condition should be satisfied one way or the other with probability 1-o(1)).
  2. Is it true that the stronger conjecture, which is equivalent (given what we now know) to the statement that for almost all pairs (A,B) of random dice, the event that A beats a random die C has almost no correlation with the event that B beats C, is false?
  3. Can the proof of the result obtained so far be modified to show a similar result for the multisets model?
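To make the underlying "beats" relation concrete for readers who want to experiment, here is a small sketch (my own illustrative code, not part of the Polymath project; the function names are invented) of drawing dice from the [n]^n model and comparing them, with ties between faces counted half each way:

```python
import random

def beats(A, B):
    """A beats B if a_i > b_j for more pairs (i, j) than a_i < b_j (ties split evenly)."""
    return sum((a > b) - (a < b) for a in A for b in B) > 0

def random_die(n, rng):
    """A random element of [n]^n: n faces, each uniform on {1, ..., n}."""
    return [rng.randint(1, n) for _ in range(n)]

rng = random.Random(0)
n = 100
A, B, C = (random_die(n, rng) for _ in range(3))
# An intransitive triple would have, e.g., A beats B, B beats C, C beats A.
print(beats(A, B), beats(B, C), beats(C, A))
```

Sampling many triples this way is how one gathers the kind of experimental evidence referred to below.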

The status of these three questions, as I see it, is as follows. The first is basically solved, a claim I shall try to justify later in the post. For the second there is a promising approach that I think will lead to a solution; again I shall try to back up this assertion. The third feels as though it shouldn’t be impossibly difficult, but we have so far made very little progress on it, apart from experimental evidence suggesting that all the results should be similar to those for the balanced-sequences model. [Added after finishing the post: I may possibly have made significant progress on the third question as a result of writing this post, but I haven’t checked carefully.]

The strength of a die depends strongly on the sum of its faces.

Let A=(a_1,\dots,a_n) and B=(b_1,\dots,b_n) be elements of [n]^n chosen uniformly and independently at random. I shall now show that the average of

\sum_jf_A(b_j)-\frac {n^2}2-\sum_jb_j+\sum_ia_i

(with f_A as defined below) is zero, and that the probability that this quantity differs from its average by substantially more than n\log n is very small. Since typically the modulus of \sum_ia_i-\sum_jb_j has order n^{3/2}, it follows that whether or not A beats B is almost always determined by which has the bigger sum.

As in the proof of the main theorem, it is convenient to define the functions

f_A(j)=|\{i:a_i<j\}|+\frac 12|\{i:a_i=j\}|

and

g_A(j)=f_A(j)-j+\frac 12.

Then

\sum_jf_A(b_j)=\sum_{i,j}\mathbf 1_{a_i<b_j}+\frac 12\sum_{i,j}\mathbf 1_{a_i=b_j},

from which it follows that B beats A if and only if \sum_jf_A(b_j)>n^2/2. Note also that

\sum_jg_A(b_j)=\sum_jf_A(b_j)-\sum_jb_j+\frac n2.
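The identity for \sum_jf_A(b_j), and the resulting beats criterion, are easy to check numerically; the following sketch (my own illustrative code) compares the sum of f_A over the faces of B with the direct pairwise count:

```python
import random

def f_A(A, j):
    """f_A(j) = #{i : a_i < j} + (1/2) #{i : a_i = j}, as defined in the post."""
    return sum(a < j for a in A) + 0.5 * sum(a == j for a in A)

rng = random.Random(1)
n = 50
A = [rng.randint(1, n) for _ in range(n)]
B = [rng.randint(1, n) for _ in range(n)]

# Summing f_A over the faces of B should equal the direct pairwise count.
lhs = sum(f_A(A, b) for b in B)
direct = sum((a < b) + 0.5 * (a == b) for a in A for b in B)
print(lhs == direct)             # the two counts agree exactly
print("B beats A:", lhs > n * n / 2)
```

(All the terms are multiples of 1/2, so the floating-point comparison is exact.)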

If we choose A purely at random from [n]^n, then the expectation of f_A(j) is j-1/2, and Chernoff’s bounds imply that the probability that there exists j with |g_A(j)|=|f_A(j)-j+1/2|\geq C\sqrt{n\log n} is, for suitable C, at most n^{-10}. Let us now fix some A for which there is no such j, but keep B as a purely random element of [n]^n.

Then \sum_jg_A(b_j) is a sum of n independent random variables, each bounded in modulus by C\sqrt{n\log n}. The expectation of this sum is \sum_jg_A(j)=\sum_jf_A(j)-n^2/2. Moreover,

\sum_jf_A(j)=\sum_{i,j}\mathbf 1_{a_i<j}+\frac 12\sum_{i,j}\mathbf 1_{a_i=j}

=\sum_i(n-a_i)+\frac n2=n^2+\frac n2-\sum_ia_i,

so the expectation of \sum_jg_A(b_j) is n(n+1)/2-\sum_ia_i.

By standard probabilistic estimates for sums of independent random variables, with probability at least 1-n^{-10} the difference between \sum_jg_A(b_j) and its expectation \sum_jf_A(j)-n^2/2 is at most Cn\log n. Writing this out, we have

|\sum_jf_A(b_j)-\sum_jb_j+\frac n2-n(n+1)/2+\sum_ia_i|\leq Cn\log n,

which works out as

|\sum_jf_A(b_j)-\frac {n^2}2-\sum_jb_j+\sum_ia_i|\leq Cn\log n.

Therefore, if \sum_ia_i>\sum_jb_j+Cn\log n, it follows that with high probability \sum_jf_A(b_j)<n^2/2, which implies that A beats B, and if \sum_jb_j>\sum_ia_i+Cn\log n, then with high probability B beats A. But one or other of these two cases almost always happens, since the standard deviations of \sum_ia_i and \sum_jb_j are of order n^{3/2}. So almost always the die that wins is the one with the bigger sum, as claimed. And since “has a bigger sum than” is a transitive relation, we get transitivity almost all the time.
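This conclusion is easy to probe experimentally. The following rough Monte Carlo sketch (my own code, with arbitrary parameters and seed, not part of the project) estimates how often the die with the larger face-sum wins in the unconditioned [n]^n model:

```python
import random

def score(A, B):
    """Positive if A beats B, negative if B beats A, zero for a tie (face ties split)."""
    return sum((a > b) - (a < b) for a in A for b in B)

rng = random.Random(2)
n, trials = 60, 200
agree = total = 0
for _ in range(trials):
    A = [rng.randint(1, n) for _ in range(n)]
    B = [rng.randint(1, n) for _ in range(n)]
    s = score(A, B)
    diff = sum(A) - sum(B)
    if s != 0 and diff != 0:     # ignore exact ties in either quantity
        total += 1
        agree += (s > 0) == (diff > 0)
print(f"larger-sum die won {agree} of {total} decisive trials")
```

For modest n the agreement is already a large majority, and the argument above says it tends to certainty as n grows.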

Why the strong conjecture looks false

As I mentioned, the experimental evidence seems to suggest that the strong conjecture is false. But there is also the outline of an argument that points in the same direction. I’m going to be very sketchy about it, and I don’t expect all the details to be straightforward. (In particular, it looks to me as though the argument will be harder than the argument in the previous section.)

The basic idea comes from a comment of Thomas Budzinski. It is to base a proof on the following structure.

  1. With probability bounded away from zero, two random dice A and B are “close”.
  2. If A and B are two fixed dice that are close to each other and C is random, then the events “A beats C” and “B beats C” are positively correlated.

Here is how I would imagine going about defining “close”. First of all, note that the function g_A is somewhat like a random walk that is constrained to start and end at zero. There are results showing that random walks have a positive probability of never deviating very far from the origin (at most half a standard deviation, say), so something like the following idea ought to work for proving the first step (remaining agnostic for the time being about the precise definition of “close”). We choose some fixed positive integer k and let x_1<\dots<x_k be integers evenly spread through the interval \{1,2,\dots,n\}. Then we argue — and this should be very straightforward — that with probability bounded away from zero, the values of f_A(x_i) and f_B(x_i) are close to each other, where here I mean that the difference is at most some small (but fixed) fraction of a standard deviation.

If that holds, it should also be the case, since the intervals between x_{i-1} and x_i are short, that f_A and f_B are uniformly close with positive probability.

I’m not quite sure whether proving the second part would require the local central limit theorem in the paper or whether it would be an easier argument that could just use the fact that since f_A and f_B are close, the sums \sum_jf_A(c_j) and \sum_jf_B(c_j) are almost certainly close too. Thomas Budzinski sketches an argument of the first kind, and my guess is that that is indeed needed. But either way, I think it ought to be possible to prove something like this.
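For anyone who wants to explore this numerically, here is a rough sketch (my own code; for simplicity it samples unconditioned dice from [n]^n, whereas the conjecture proper concerns the balanced model) estimating the correlation between the events “A beats C” and “B beats C” for one fixed pair (A,B):

```python
import random

def beats(X, Y):
    """X beats Y, with ties between faces counted half each way."""
    return sum((x > y) - (x < y) for x in X for y in Y) > 0

rng = random.Random(3)
n, m = 40, 300

def random_die():
    return [rng.randint(1, n) for _ in range(n)]

A, B = random_die(), random_die()
pa = pb = pab = 0
for _ in range(m):
    C = random_die()
    a_wins, b_wins = beats(A, C), beats(B, C)
    pa += a_wins
    pb += b_wins
    pab += a_wins and b_wins
pa, pb, pab = pa / m, pb / m, pab / m
print("P(A beats C) ~", pa, " P(B beats C) ~", pb)
print("covariance estimate:", pab - pa * pb)
```

Averaging this covariance over many pairs (A,B) is the natural way to test the strong conjecture experimentally.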

What about the multisets model?

We haven’t thought about this too hard, but there is a very general approach that looks to me promising. However, it depends on something happening that should be either quite easy to establish or not true, and at the moment I haven’t worked out which, and as far as I know neither has anyone else.

The difficulty is that while we still know in the multisets model that A beats B if and only if \sum_jf_A(b_j)<n^2/2 (since this depends just on the dice and not on the model that is used to generate them randomly), it is less easy to get traction on the sum because it isn’t obvious how to express it as a sum of independent random variables.

Of course, we had that difficulty with the balanced-sequences model too, but there we got round the problem by considering purely random sequences B and conditioning on their sum, having established that certain events held with sufficiently high probability for the conditioning not to stop them holding with high probability.

But with the multisets model, there isn’t an obvious way to obtain the distribution over random dice B by choosing b_1,\dots,b_n independently (according to some distribution) and conditioning on some suitable event. (A quick thought here is that it would be enough if we could approximate the distribution of B in such a way, provided the approximation was good enough. The obvious distribution to take on each b_i is the marginal distribution of that b_i in the multisets model, and the obvious conditioning would then be on the sum, but it is far from clear to me whether that works.)

A somewhat different approach that I have not got far with myself is to use the standard one-to-one correspondence between increasing sequences of length n taken from [n] and subsets of [2n-1] of size n. (Given such a sequence (a_1,\dots,a_n) one takes the subset \{a_1,a_2+1,\dots,a_n+n-1\}, and given a subset S=\{s_1,\dots,s_n\}\subset[2n-1], where the s_i are written in increasing order, one takes the multiset of all values s_i-i+1, with multiplicity.) Somehow a subset of [2n-1] of size n feels closer to a bunch of independent random variables. For example, we could model it by choosing each element with probability n/(2n-1) and conditioning on the number of elements being exactly n, which will happen with non-tiny probability.
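The correspondence described above is simple to implement; here is a sketch (my own code) of the bijection and its inverse, with a round trip on a small example:

```python
def multiset_to_subset(a):
    """Map an increasing sequence (a_1 <= ... <= a_n) from [n] to an n-subset of
    [2n-1] via a_i -> a_i + (i - 1), as in the post (0-indexed: a[i] + i)."""
    return {ai + i for i, ai in enumerate(a)}

def subset_to_multiset(s):
    """Inverse map: the sorted subset {s_1 < ... < s_n} goes back to the
    multiset of values s_i - (i - 1)."""
    return [si - i for i, si in enumerate(sorted(s))]

a = [1, 1, 2, 4, 4]            # an increasing sequence of length 5 from [5]
s = multiset_to_subset(a)
print(sorted(s))               # -> [1, 2, 4, 7, 8], a 5-subset of [9]
print(subset_to_multiset(s))   # -> [1, 1, 2, 4, 4], the original multiset
```

Note how each element of the multiset corresponds to one element of the subset, which is the feature exploited below.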

Actually, now that I’m writing this, I’m coming to think that I may have accidentally got closer to a solution. The reason is that earlier I was using a holes-and-pegs approach to defining the bijection between multisets and subsets, whereas with this approach, which I had wrongly assumed was essentially the same, there is a nice correspondence between the elements of the multiset and the elements of the set. So I suddenly feel more optimistic that the approach for balanced sequences can be adapted to the multisets model.

I’ll end this post on that optimistic note: no doubt it won’t be long before I run up against some harsh reality.

August 08, 2017

Chad OrzelPhysics Blogging Round-Up: July

Another month, another collection of blog posts for Forbes:

The Physics Of Century-Old Mirror Selfies: Back in the early 1900s there was a brief vogue for trick pictures showing the same person from five different angles; this post explains how to do that with mirrors.

Why Research By Undergraduates Is Important For Science And Students: A reply to an essay talking up the products of undergraduate research projects, arguing that the most valuable part of research is the effect on students.

What Does It Mean To Share ‘Raw Data’?: Some thoughts on the uselessness of much “raw data” in my field to anyone outside the lab where it was produced.

Breaking Stuff Is An Essential Part Of The Scientific Process: Thoughts on how the most important year of my grad school career was the frustrating one in which I broke and then repaired everything in the lab.

Measuring The Speed Of Quantum Tunneling: A couple of recent experiments use a clever trick to look at whether there’s a time delay as electrons tunnel out of an atom in a strong electric field. Unfortunately, they get very different results…

I was a little disappointed that the photo-multigraph thing didn’t get more traction, but it was fun to do, so that’s okay. The quantum tunneling post did surprisingly well– I thought it was likely to be a little too technical to really take off, but it did. Always nice when that happens.

The other three are closely related to a development at work, namely that on July 1 I officially added “Director of Undergraduate Research” to the many hats I wear. I’m in charge of supervising the research program at Union, disbursing summer fellowships and small grants for research projects and conference travel, and arranging a number of research-oriented events on campus. This involves a certain amount of administrative hassle, but then again, it’s hassle in the service of helping students do awesome stuff, so I’m happy to do it.

Anyway, that’s where things are. Blogging will very likely tail off dramatically for the fall, possibly as soon as this month (though I already have one post up), as I have a book on contract due Dec. 1, and a review article due to a journal a month later. And, you know, classes to teach and research to direct…

August 02, 2017

Jordan EllenbergMaryland flag, my Maryland flag

The Maryland flag is, in my opinion as a Marylander, the greatest state flag.

Ungepotch?  Yes.  But it has that ineffable “it shouldn’t work but it does” that marks really great art.

But here’s something I didn’t know about my home state’s flag:


Despite the antiquity of its design, the Maryland flag is of post-Civil War origin. Throughout the colonial period, only the yellow-and-black Calvert family colors are mentioned in descriptions of the Maryland flag. After independence, the use of the Calvert family colors was discontinued. Various banners were used to represent the state, although none was adopted officially as a state flag. By the Civil War, the most common Maryland flag design probably consisted of the great seal of the state on a blue background. These blue banners were flown at least until the late 1890s….

Reintroduction of the Calvert coat of arms on the great seal of the state [in 1854] was followed by a reappearance at public events of banners in the yellow-and-black Calvert family colors. Called the “Maryland colors” or “Baltimore colors,” these yellow-and-black banners lacked official sanction of the General Assembly, but appear to have quickly become popular with the public as a unique and readily identifiable symbol of Maryland and its long history.

The red-and-white Crossland arms gained popularity in quite a different way. Probably because the yellow-and-black “Maryland colors” were popularly identified with a state which, reluctantly or not, remained in the Union, Marylanders who sympathized with the South adopted the red-and-white of the Crossland arms as their colors. Following Lincoln’s election in 1860, red and white “secession colors” appeared on everything from yarn stockings and cravats to children’s clothing. People displaying these red-and-white symbols of resistance to the Union and to Lincoln’s policies were vigorously prosecuted by Federal authorities.

During the war, Maryland-born Confederate soldiers used both the red-and-white colors and the cross bottony design from the Crossland quadrants of the Calvert coat of arms as a unique way of identifying their place of birth. Pins in the cross bottony shape were worn on uniforms, and the headquarters flag of the Maryland-born Confederate general Bradley T. Johnson was a red cross bottony on a white field.

By the end of the Civil War, therefore, both the yellow-and-black Calvert arms and the red-and-white colors and bottony cross design of the Crossland arms were clearly identified with Maryland, although they represented opposing sides in the conflict.

In 4th grade, in Maryland history, right after having to memorize the names of the counties, we learned about the flag’s origin in the Calvert coat of arms

but not about the symbolic meaning of the flag’s adoption, as an explicit gesture of reconciliation between Confederate sympathizers and Union loyalists sharing power in a post-war border state.

The Howard County flag is based on the Crossland arms.  (There’s also a sheaf of wheat and a silhouette of Howard County nosing its way through a golden triangle.)  The city of Baltimore, on the other hand, uses the Calvert yellow-and-black only.

Oh, and there’s one more flag:

That’s the flag of the Republic of Maryland, an independent country in West Africa settled mostly by free black Marylanders.  It existed only from 1854 to 1857, when it was absorbed into Liberia, of which it’s still a part, called Maryland County.  The county flag still has Lord Baltimore’s yellow, but not the black.





August 01, 2017

Jordan EllenbergGood math days

I have good math days and bad math days; we all do.  An outsider might think the good math days are the days when you have good ideas.  That’s not how it works, at least for me.  You have good ideas on the bad math days, too; but one at a time.  You have an idea, you try it, you make some progress, it doesn’t work, your mind says “too bad.”

On the good math days, you have an idea, you try it, it doesn’t work, you click over to the next idea, you get over the obstacle that was blocking you, then you’re stuck again, you ask your mind “What’s the next thing to do?” you get the next idea, you take another step, and you just keep going.

You don’t feel smarter on the good math days.  It’s not even momentum, exactly, because it’s not a feeling of speed.  More like:  the feeling of being a big, heavy, not very fast vehicle, with very large tires, that’s just going to keep on traveling, over a bump, across a ditch, through a river, continually and inexorably moving in a roughly fixed direction.


July 29, 2017

Jordan EllenbergShow report: Camp Friends and Omni at the Terrace

Beautiful weather last night so I decided, why not, go to the Terrace for the free show WUD put on:  Camp Friends (Madison) and Omni (Atlanta).

Missed most of Camp Friends, who were billed as experimental but in fact played genial, not-real-tight college indie.  Singer took his shirt off.

Omni, though — this is the real thing.  Everyone says it sounds like 1981 (specifically:  1981), and they’re right, but it rather wonderfully doesn’t sound like any particular thing in 1981.  There’s the herky-jerky-shoutiness and clipped chords (but on some songs that sounds like Devo and on others like Joe Jackson) and the jazz chords high on the neck (the Fall?  The Police?) and weird little technical guitar runs that sound like Genesis learning to play new wave guitar on Abacab and arpeggios that sound like Peter Buck learning to play guitar in the first place (these guys are from Georgia, after all.)  What I kind of love about young people is this.  To me, all these sounds are separate styles; to a kid picking up these records now, they’re just 1981, they’re all material to work from, you can put them together and something kind of great comes out of it.

You see a lot of bands with a frontman but not that many which, like Omni, have a frontman and a backman.  Philip Frobos sings and plays bass and mugs and talks to the audience.  Frankie Broyles, the guitar player, is a slight guy who looks like a librarian and stands still and almost expressionless while he plays his tight little runs.  Then, every once in a while, he unleashes an absolute storm of noise.  But still doesn’t grimace, still doesn’t move!  Amazing.  Penn and Teller is the only analogue I can think of.

Omni plays “Jungle Jenny,” live in Atlanta:

And here’s “Wire,” to give a sense of their more-dance-less-rock side:


Both songs are on Omni’s debut album, Deluxe, listenable at Bandcamp.

Best show I’ve seen at the Terrace in a long time.  Good job, WUD.