Planet Musings

August 18, 2019

David Hogg: writing, talking

I'm stuck in bed with a bad back. I have been for a few days now. I am using the time to write in my summer writing projects, and talk to students and postdocs by Skype (tm). But it is hard to work when out sick, and it isn't necessarily a good idea. I'm not advocating it!

David Hogg: more catalog likelihoods

I resolved (for now, to my current satisfaction) my issues from a few days ago, about likelihoods for catalogs. I showed that the likelihood that I advocate does not give biased inferences, and does permit inference of the selection function (censoring process) along with the inference of the world. I did this with my first ever use of the Google (tm) Colaboratory (tm). I wanted to see if it works, and it does. My notebook is here (subject to editing and changing, so no promises about its state when you go there). If your model includes the censoring process—that is, if you want to parameterize and learn the catalog censoring along with the model of the world—then (contra Loredo, 2004) you have to use a likelihood function that depends on the selection function at the individual-source level. And I think it is justified, because it is the assumption that the universe plus the censoring is the thing which is generating your catalog. That's a reasonable position to take.
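To make the point concrete, here is a minimal sketch (in Python; this is not Hogg's actual notebook — the exponential flux distribution, the soft detection threshold, and all parameter values are invented for illustration) of a likelihood in which the selection function enters at the individual-source level, so the censoring parameter can be inferred jointly with the world parameter:

```python
import numpy as np

rng = np.random.default_rng(0)

# "World": source fluxes f ~ alpha * exp(-alpha * f); truth alpha = 2.0.
# "Censoring": a source enters the catalog with probability
# p_det(f) = 1 - exp(-f / s); truth s = 0.5.  (Toy model, invented here.)
alpha_true, s_true = 2.0, 0.5
f_all = rng.exponential(1.0 / alpha_true, size=20000)
detected = rng.random(f_all.size) < (1.0 - np.exp(-f_all / s_true))
f = f_all[detected]

def loglik(alpha, s):
    # Per-source likelihood: world density times the per-source detection
    # probability, normalized by Z = integral of their product, which here
    # has the closed form (1/s) / (alpha + 1/s).
    Z = (1.0 / s) / (alpha + 1.0 / s)
    return (np.sum(np.log(alpha) - alpha * f + np.log1p(-np.exp(-f / s)))
            - f.size * np.log(Z))

# Maximize jointly over the world parameter AND the selection parameter.
alphas = np.linspace(1.2, 3.0, 46)
ss = np.linspace(0.2, 1.0, 41)
grid = [(loglik(a, s), a, s) for a in alphas for s in ss]
_, alpha_hat, s_hat = max(grid)
print(alpha_hat, s_hat)
```

With the detection probability multiplied in per source (and the normalization adjusted to match), the joint maximum lands near the true (alpha, s), consistent with the claim that such a likelihood lets you learn the censoring along with the world.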

August 17, 2019

n-Category Café: Graphical Regular Logic

guest post by Sophie Libkind and David Jaz Myers

This post continues the series from the Adjoint School of Applied Category Theory 2019.

Now that we have self-driving cars and talking watches, holographic data analysis should be the next innovation inspired by Hollywood spy films. Think of when Tom Cruise in Minority Report shuffles information around a 3D screen. As a step towards this future, Fong and Spivak formulated a graphical system for doing regular logic in their paper Graphical Regular Logic. You can see Spivak give a talk about it here. This visual knowledge representation system has the potential to be an intuitive user-interface for regular logic. It would allow laypeople to harness the power of regular logic without the clunkiness of the formal language.

The road from regular logic to graphical regular logic has two major stops: regular categories and regular calculi.

In this post, we’ll start on the right with graphical regular logic and work our way leftwards to regular categories.

Graphical Regular Logic: A User’s Guide

Graphical Regular Logic is meant to be broadly applicable and easy to use. So in this user’s guide we will apply it to something as non-mathy as a neighborhood watch program that is run by beings as gibberishy as a clan of minions.

Building blocks

When we imagine the headquarters of the neighborhood watch program, there’s a big screen with glowing blue dots to move, connect, and zoom in and out of. In graphical regular logic, these orbs are called shells. A shell has finitely many ports labeled with a type and wires can connect ports of the same type. In the neighborhood watch program, we care about types like person and location and use shells like the following.

A shell is the foundation for talking about a relationship between the types labeled on the ports. A relationship between a person and a location (more formally a predicate in context \Gamma_{\text{person}, \text{location}}) is like a Mad Libs with a blank space for a person and a location:

To represent such a relationship, we take a shell with a port for each blank space and fill the shell with the corresponding predicate.

We would read the right cell as “There exists a person p and a location \ell such that p works at \ell”. Every shell represents a possible relationship between the wires coming out of it. There is no implied directionality on these wires.


Now comes the fun part where we really get to feel like Tom Cruise. How do we manipulate the filled shells in a way that reflects logical moves?

Operation 1: Substitution

The most basic operation is to substitute one filled shell for another. If someone lives in a location, then it’s also true that they have visited that location. Therefore, if we see the filled shell corresponding to (person) lives in (location) then we should be able to replace it with the filled shell corresponding to (person) has visited (location). The red arrows below indicate when we can substitute one filled shell with another.

In general, we are allowed to substitute in the midst of a more complicated diagram.
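As a toy finite model of substitution (a sketch of ours, not the paper's formalism; the names are the minions'), predicates filling a shell can be read as sets of tuples, with entailment as containment:

```python
# Predicates filling a (person, location) shell, modeled as sets of tuples;
# entailment is containment, so substitution = weakening along an entailment.
lives_in    = {("Bob", "the park"), ("Margo", "Gru's house")}
has_visited = lives_in | {("Agnes", "Gru's house")}

def entails(p, q):
    """p |- q: every instance satisfying p also satisfies q."""
    return p <= q

# lives_in |- has_visited, so anywhere a diagram is filled with lives_in
# we may substitute has_visited — but not the other way around.
assert entails(lives_in, has_visited)
assert not entails(has_visited, lives_in)
```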

Operation 2: Zooming out

Sometimes we want a coarser view of the neighborhood, so we will zoom out and blur two shells together. For example, suppose we have two relationships:

  • Bob works at Gru’s house
  • Edith has a roommate

And we want to turn them into one relationship:

  • Bob works at Gru’s house and Edith has a roommate

So we start with a relationship between a person and a location and a relationship on one person. And we want to transform these into one relationship about two people and a location. This process is called trench coating and goes as follows: (1) stack the two shells, (2) enclose them in a trench coat, (3) ignore the coat’s interior.

At last the true purpose of this example comes to light:

Now filling the inner shells with predicates determines a predicate for the trench coat in a way that corresponds with logical AND.

We will discuss this parallel more deeply when we show how to put a logic on the graphical calculus.
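In the finite toy model of predicates as sets of tuples (our illustration, not the paper's), the trench-coat predicate is exactly this conjunction — the product of the two fillings on the concatenated context:

```python
# Filling the two inner shells with "works_at" (person, location) and
# "has_roommate" (person,) determines the trench-coat predicate on
# (person, location, person): the conjunction of the two fillings.
works_at     = {("Bob", "Gru's house")}
has_roommate = {("Edith",)}

def trench_coat(p, q):
    """Predicate on the concatenated context: p AND q, variables kept apart."""
    return {a + b for a in p for b in q}

assert trench_coat(works_at, has_roommate) == {("Bob", "Gru's house", "Edith")}
```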

Operation 3: Zooming in

If we can zoom out, then we should also be able to zoom in to get a more fine-grained view of the neighborhood. Suppose we have a predicate which fills a trench coat. Then it defines a predicate on the inner shells by a process called dotting off.

Note that zooming in and out do not compose to the identity. In fact, in the next section we will show that they are adjoint operations.

Operation 4: Connecting Shells

To really tell a story about the neighborhood we need to be able to combine information and build complicated relationships out of the basic ones. For example, suppose we have two relationships:

  • Agnes lives with Margo
  • Margo lives in Gru’s house

And we want to derive the relationship:

  • Agnes lives in Gru’s house

Step 1 Draw the shells corresponding to the input, inside of the shell corresponding to the output

Step 2 Draw an intermediate shell labeled with ports for the wires to attach through.

Step 3 Connect the wires.

Once we have hooked up shells, any choice of predicates for the inner shells defines a relationship for the outer shell. In the example above, we have the entailment:

However if we fill the inner shells differently, we may get a different filling for the outer shell.

We would read the bottom left diagram as “There exists a person p_1, a person p_2 and a location \ell such that p_1 lives with p_2 and p_1 works at \ell”.

With these four operations, we can visually represent any deduction in regular logic. Next, we give a taste of how a graphical calculus presents a regular logic.

Putting a Logic on a Graphical Calculus

Before we dive into the particular relationship between the graphical calculus and regular categories, let’s take a little bit to dwell on some abstract ruminations.

Lawvere divides logic into two opposing modes: subjective and objective.

  • Subjective logic is the logic we use to reason with – the syntax and the rules of inference. For example, the axioms of a boolean algebra.
  • Objective logic is the logical arrangement of the objects we are interested in studying. For example, the power set of a given set.

We use the subjective logic to describe and reason about the objective logic at hand. This gives us a transformation, sometimes called semantics, from subjective to objective reasoning. For example, we can use the axioms of a boolean algebra to reason about the subsets of a given set.

In nice cases, we can “objectify” our subjective logic by freely generating objective situations of our syntax. By stretching what we mean by syntax, we can also build subjective logics out of our objective situations. If everything goes perfectly, these two processes are inverse. A case where everything does go perfectly is Seely’s equivalence between first order (constructive) theories and hyperdoctrines, where the subjective first order constructive logic is interpreted into the objective logic of a hyperdoctrine.

In the situation at hand, the graphical calculus is our subjective logic for dealing with the objective logic of a regular category. We would like to put a logic back on our calculus – that is, build a regular category from a graphical calculus. However, we can’t do it with the pictures alone; we have to give a way for the pictures to mean something. How should we go about this?

Ajax Functors

The objects under consideration in predicate logic are the parts of a domain of discourse. The association of each object to its order of parts gives something akin to a hyperdoctrine. A hyperdoctrine is just an axiomatization of this association for first order constructive logic. We need something like this to add meaning to our pictures.

But hyperdoctrines are notoriously complex; they have a number of axioms that on a first read seem unmotivated (or, at least, they seemed that way to us). While they all turn out to directly correspond to features of the logic being modeled, it would be nice to have a more basic, conceptual axiomatization from which the desired properties follow. In the case of regular logic—the logic of “true”, “and”, and “exists”—David and Brendan find this conceptual axiomatization in the form of ajax functors. The definition of an ajax functor is not terribly complicated, but to understand why it does what we want takes a bit of work. So we’ll meander our way to the definition.

Regular logic is the logic of “truth”, “and” and “exists” – let’s bring them into the ajax frame in that order. Our ambient setting will be the category \textbf{Order} of ordered sets and order preserving maps. In fact, we will think of this as a 2-category where the arrows between order preserving maps f, g : A \to B are dominations f \vdash g, that is, \forall a.\, f(a) \vdash g(a) (since we are thinking logically, I’ll denote the order relation by a \vdash b, which you can read as a entails b if you like). If you are familiar with enriched category theory, this is the 2-category of Boolean-enriched categories.

“Truth”, or \top, in an order theoretic sense, is the most general proposition. It has a mapping-in property; namely, from any proposition p, one can infer \top. If E is our ordered set of propositions and entailments, then \top : 1 \to E would be an order preserving function from the terminal order. To say that \top has the universal property we expect, we can ask that \top : 1 \to E be right adjoint to the terminal functor ! : E \to 1. In other words, we ask that the top holds if and only if the bottom holds

\frac{p \vdash \top}{\ast \vdash \ast}

where \vdash is our ordering and \ast is the unique element of 1. Since, per the axioms of an ordering, \ast \vdash \ast always holds, we always have that p \vdash \top, as we wanted.

“And”, or \wedge, in an order theoretic sense, is the meet of two elements. The proposition p \wedge q is the most general proposition which proves both p and q. Its mapping-in property can be expressed by

\frac{x \vdash p,\quad x \vdash q}{x \vdash p \wedge q}

or, in other words, that \wedge : E \times E \to E is right adjoint to the diagonal map \Delta : E \to E \times E.
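This mapping-in property is easy to check exhaustively in a small model. The following sketch (our illustration, not from the post) takes E to be a powerset order, where entailment is subset and meet is intersection, and verifies the adjunction defining \wedge:

```python
from itertools import combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# E = the powerset order on {1, 2, 3}: entailment x |- y is x <= y (subset),
# meet is intersection, top is the whole universe.
E = subsets({1, 2, 3})
for x in E:
    for p in E:
        for q in E:
            # x |- p /\ q  iff  (x |- p and x |- q):
            # meet is right adjoint to the diagonal.
            assert (x <= (p & q)) == (x <= p and x <= q)
```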

Now, it won’t come as a surprise that “truth” and “and” together form a monoid, but we can give a slick proof of this knowing that they are right adjoints. This is because ! : E \to 1 and \Delta : E \to E \times E give E the structure of a comonoid and taking right adjoints is contravariantly functorial. Monoids like this are common enough that we have a name for them:

Definition: In a monoidal 2-category, an adjoint monoid is a monoid whose unit and multiplication are right adjoints.

We just saw that all meet semi-lattices are adjoint monoids in the 2-category of orders, but we can flip the argument on its head to see that every adjoint monoid is a meet semi-lattice as well. This is because the 2-category of orders is cartesian, and in a cartesian category every object has a unique comonoid structure given by ! and \Delta. Since the left adjoints of the structure maps of an adjoint monoid form a comonoid, the right adjoints must be the top and meet.

“Exists” is a quantifier, and quantification is all about changing the domain of discourse. So we can’t really talk about “exists” without bringing in a system of domains of discourse, and this means bringing in a regular category. Let’s get back to regular categories now. A regular category is a category in which we can use the logic of “truth” (a terminal object), “and” (pullbacks), and “exists” (images), and it all works the right way (see the nLab page for a formal definition).

Recall that a subobject of an object A is a monomorphism S \hookrightarrow A, considered up to isomorphism over A. A subobject S is contained in T when there is a map S \to T over A; this makes the subobjects of any object into an order. A relation R : A \to B is a subobject R \hookrightarrow A \times B of a product; using variables, we say R(a, b) when the pair (a, b) is in R.

In a regular category, relations can be composed by “existing out the middle”: RS(a, c) \coloneqq \exists b.\, R(a, b) \wedge S(b, c). This gives a category of relations, which we will call \mathcal{R}.
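In the toy model of relations as sets of pairs (a sketch of ours, reusing the minions' names), “existing out the middle” is a one-liner:

```python
def compose(R, S):
    """Composite relation: RS(a, c) := exists b. R(a, b) and S(b, c)."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

# Composing "lives with" and "lives in" existentially quantifies out the
# shared middle person (Margo).
lives_with = {("Agnes", "Margo")}
lives_in = {("Margo", "Gru's house")}
assert compose(lives_with, lives_in) == {("Agnes", "Gru's house")}
```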

Note that a subobject of A is equivalently a relation S \in \mathcal{R}(I, A), where I is the terminal object of the regular category. Therefore, the functor

\text{Sub}(-) \coloneqq \mathcal{R}(I, -) : \mathcal{R} \to \textbf{Order}

represented by I sends an object to its order of subobjects.

This functor (or, really, 2-functor, since it respects the ordering of relations) gets a lax monoidal structure from the cartesian product of the regular category from which we built the category of relations \mathcal{R}. I’ll write this product as \otimes as a reminder that it isn’t cartesian in the category of relations; that is, A \otimes B = A \times B on objects, but it doesn’t have the same universal property in the category of relations. So, I \otimes I \cong I, and therefore we get a lax monoidal structure

\mathcal{R}(I, A) \times \mathcal{R}(I, B) \to \mathcal{R}(I \otimes I, A \otimes B) \cong \mathcal{R}(I, A \otimes B)

which on the level of logic looks like (\phi(a),\, \psi(b)) \mapsto \phi(a) \wedge \psi(b). We also have the identity relation on I

\top : \ast \to \mathcal{R}(I, I)

which is logically “truth”.

There’s something special about these laxators: they have universal properties! In particular, they are both right adjoints, with left adjoints

(\pi_1, \pi_2) : \mathcal{R}(I, A \otimes B) \to \mathcal{R}(I, A) \times \mathcal{R}(I, B)


\mathcal{R}(I, I) \to \ast

or, logically, \chi(a, b) \mapsto (\exists b.\, \chi(a, b),\, \exists a.\, \chi(a, b)), and \nu \mapsto \ast. That these are adjoint means we have the following correspondences:

\frac{\exists b.\, \chi(a, b) \vdash \varphi(a) \quad \exists a.\, \chi(a, b) \vdash \psi(b)}{\chi(a, b) \vdash \varphi(a) \wedge \psi(b)}

These are just the introduction rules for truth and existential quantification!
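In the finite model of relations as subsets, this correspondence can be verified exhaustively (a sketch of ours, not the paper's): the two existential projections jointly entail the components if and only if the relation entails their conjunction, i.e. the projections are left adjoint to the pairing:

```python
from itertools import combinations, product

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

A, B = {0, 1}, {0, 1}
pairs = list(product(A, B))
for chi in subsets(pairs):            # all relations chi(a, b)
    for phi in subsets(A):            # all predicates phi(a)
        for psi in subsets(B):        # all predicates psi(b)
            exists_b = frozenset(a for (a, b) in chi)   # exists b. chi(a, b)
            exists_a = frozenset(b for (a, b) in chi)   # exists a. chi(a, b)
            top = exists_b <= phi and exists_a <= psi
            bottom = chi <= frozenset(product(phi, psi))  # chi |- phi /\ psi
            assert top == bottom       # the two-sided rule above
```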

We call these kinds of functors ajax for adjoint lax.

Definition: A lax monoidal 2-functor F : \mathcal{A} \to \mathcal{B} between monoidal 2-categories is ajax if its laxators are right adjoints.

Ajax functors are very interesting objects in their own right, and we’d love to just ramble on about them for a good while. But we won’t.

Except for one thing. Just as lax functors preserve monoids, ajax functors preserve adjoint monoids. Now a regular category is in particular cartesian, so every object has a unique comonoid structure given by ! and \Delta, and the regular category can be embedded contravariantly in its category of relations by sending f : A \to B to its “backwards graph” b = f(a). This gives every object of the category of relations a (not necessarily unique) monoid structure. But there is also a covariant embedding into the category of relations, sending f : A \to B to its “graph” f(a) = b, and the “graph” and “backwards graph” of a function are adjoint in the category of relations. So every object of \mathcal{R} is in fact an adjoint monoid, and is therefore preserved by the ajax functor \text{Sub}!

This shows that \text{Sub}(A) is in fact always a meet semi-lattice. So, in conclusion, the fact that \text{Sub} is ajax not only gives us “exists”, but “truth” and “and” as well. This is the structure we will adorn our graphical calculus with.

Regular Calculi, There and Back Again

If you’re a mathematician (and not a minion), then while reading the user’s guide to graphical regular logic you may have wondered “where does the entailment between predicates come from?” or “when a choice of predicates for some shells determines the predicate for another shell, where does that mapping live?” A regular calculus explicitly defines the structure between predicates that we use implicitly in the game of graphical calculus. In this section, we will see how different aspects of a regular calculus underpin the building blocks and operations of the graphical calculus. The major parallels are:

  • labels for ports and wires - a set of types T
  • shells - objects of \mathbb{F}\text{Rb}(T), the category of relations of the free regular category on the set of “types” T
  • filling shells with predicates - a pofunctor P : \mathbb{F}\text{Rb}(T) \to \textbf{Order}
  • zooming in and out - P satisfies the ajax condition
  • connecting shells - morphisms in \mathbb{F}\text{Rb}(T)

Intuitively, \mathbb{F}\text{Rb}(T) is isomorphic to the category with (1) objects given by shells, (2) morphisms given by connecting shells, and (3) monoidal structure given by trench coating. This isomorphism is described in detail in Graphical Regular Logic and in pictures below:

So this is the regular categorical incarnation of our pictures, and we will adorn it with a structure that mimics the behavior of the ajax functor \text{Sub} and corresponds to the graphical actions of filling shells.

Definition: A regular calculus is a pair (T, P) where T is a set and P : \mathbb{F}\text{Rb}(T) \to \textbf{Order} is an ajax functor. A morphism of regular calculi is a pair (F, F^{\sharp}) where F : T \to T' and F^{\sharp} is a monoidal natural transformation

that is strict in every respect.

The functor P defines an ordering on the predicates that can fill each shell. Given a context \Gamma, the elements of P(\Gamma) behave as predicates on \Gamma, and the partial ordering corresponds to entailment. For example,

Thus filling a shell \Gamma with a predicate is equivalent to choosing an element of the poset P(\Gamma). Furthermore, we are allowed to substitute one predicate for another according to the ordering of P(\Gamma).

The next operations are trench coating and dotting off. Recall that in graphical regular logic, filling the inner shells \Gamma and \Gamma' of a trench coat defines a predicate for the coat \Gamma \oplus \Gamma'. Now that we know filling a shell \Gamma corresponds to picking an element of the poset P(\Gamma), the process of trench coating translates to: picking an element of P(\Gamma) \times P(\Gamma') gives an element of P(\Gamma \oplus \Gamma'). In other words, we have a monotonic laxator \rho : P(\Gamma) \times P(\Gamma') \to P(\Gamma \oplus \Gamma'). Monotonicity implies that \rho commutes with entailment.

Since dotting off is a pseudo-inverse to trench coating, it corresponds to an adjoint \lambda : P(\Gamma \oplus \Gamma') \to P(\Gamma) \times P(\Gamma').

Note that dotting off and then trench coating possibly loses information. In the example below, this composition maps the predicate “live together” to the weaker predicate “both have roommates”.

Since a entails \rho \circ \lambda(a), the map \lambda is left adjoint to \rho. This is why P must be ajax!

The last operation is connecting shells. Suppose we have a wiring diagram corresponding to \omega : \Gamma_{\text{in}} \to \Gamma_{\text{out}}. As in trench coating, filling the inner shells of a wired diagram defines a predicate for the outer shell. Thus we have a map from P(\Gamma_{\text{in}}) to P(\Gamma_{\text{out}}), which is given by P(\omega).

If the free regular category is an incarnation of our graphical syntax, then we can think of a regular calculus as the way we stretch our definition of syntax to encompass semantics.

In particular, every regular category gives us an example of a regular calculus. If \mathcal{R} is the category of relations of a regular category, then there is an “interpretation” functor \mathbb{F}\text{Rb}(\text{ob}\,\mathcal{R}) \to \mathcal{R}; composing this with \text{Sub} : \mathcal{R} \to \textbf{Order} gives us a regular calculus whose “types” are just the objects of \mathcal{R}.

Definition: The functor \text{prd} : \text{RegCat} \to \text{RegCalc} sends a regular category whose category of relations is \mathcal{R} to \text{Sub} \circ \text{Interpret} : \mathbb{F}\text{Rb}(\text{ob}\,\mathcal{R}) \to \textbf{Order}.

The bulk of the paper is devoted to constructing a left adjoint to this functor, and proving that it is fully faithful. We’ll just give a sketch of the idea.

If we have a regular calculus P : \mathbb{F}\text{Rb}(T) \to \textbf{Order}, we can try to think of it as assigning to each context (object of \mathbb{F}\text{Rb}(T)) its order of “subobjects”. So the objects of our “syntactic” regular category \text{syn}(P) should be these “subobjects”, reified. So, for any \Gamma \in \mathbb{F}\text{Rb}(T) and any \phi \in P(\Gamma), we should have an object \{x : \Gamma \mid \phi(x)\} of \text{syn}(P). This is just a fancy name for the pair (\Gamma, \phi), one that tells us how to think of this pair.

A relation R : \{x : \Gamma_1 \mid \phi_1(x)\} \to \{x : \Gamma_2 \mid \phi_2(x)\} should be a subobject of the product of the two objects represented by \phi_1 and \phi_2, that is, R \in P(\Gamma_1 \oplus \Gamma_2). This should restrict nicely to \phi_1 and \phi_2 in the sense that P(\pi_1)(R) \vdash \phi_1 and P(\pi_2)(R) \vdash \phi_2. Functions can now be defined as functional relations, and with no small effort one can check that this gives a regular category.

The syntactic regular category of the predicate regular calculus of a regular category is equivalent to the original regular category. This means that \text{prd} is a reflective inclusion of regular categories into regular calculi, and it justifies our use of the graphical calculus, equipped with an order of “propositions” for each object varying with each signature, as a tool for reasoning about regular categories.

The reason that \text{prd} is not an equivalence is that \text{syn} changes the “types” of a graphical calculus. Perhaps if types were considered up to isomorphism in a “univalent” way, we could get an equivalence of categories; but the closer we move toward an equivalence between our subjective and objective logics, the less “syntactical” our subjective logic becomes.

August 16, 2019

Matt von Hippel: At Aspen

I’m at the Aspen Center for Physics this week, for a workshop on Scattering Amplitudes and the Conformal Bootstrap.

A place even greener than its ubiquitous compost bins

Aspen is part of a long and illustrious tradition of physics conference sites located next to ski resorts. It’s ten years younger than its closest European counterpart, the Les Houches School of Physics, but if anything its traditions are stricter: all blackboard talks, and a minimum two-week visit. Instead of the summer schools of Les Houches, Aspen’s goal is to inspire collaboration: to get physicists to spend time working and hiking around each other until inspiration strikes.

This workshop is a meeting between two communities: people who study the Conformal Bootstrap (nice popular description here) and my own field of Scattering Amplitudes. The Conformal Bootstrap is one of our closest sister fields, so there may be a lot of potential for collaboration. This week’s talks have been amplitudes-focused; I’m looking forward to next week’s talks, which will highlight connections between the two fields.

Doug Natelson: "Seeing" chemistry - another remarkable result

I have written before (here, here) about the IBM Zurich group that has used atomic force microscopy in ultrahigh vacuum to image molecules on surfaces with amazing resolution.  They've done it again.  Starting with an elaborate precursor molecule, the group has been able to use voltage pulses to strip off side groups, so that in the end they leave behind a ring of eighteen carbon atoms, each bound to its neighbor on either side.  The big question was, would it be better to think of this molecule as having double bonds between each carbon, or would it be energetically favorable to break that symmetry and have the bonds alternate between triple and single.  It turns out to be the latter, as the final image shows a nonagon with nine-fold rotational symmetry.  Here is a video where the scientists describe the work themselves (the video is non-embeddable for some reason).  Great stuff.

n-Category Café: Evil Questions About Equalizers

I have a few questions about equalizers. I have my own reasons for wanting to know the answers, but I’ll admit right away that these questions are evil in the technical sense. So, investigating them requires a certain morbid curiosity… and I have a feeling that some of you will be better at this than I am.

Here are the categories:

Rex = [categories with finite colimits, functors preserving finite colimits]

SMC = [symmetric monoidal categories, strong symmetric monoidal functors]

Both are brutally truncated stumps of very nice 2-categories!

My questions:

1) Does Rex have equalizers?

2) Does SMC have equalizers?

3) We can get a functor F \colon Rex \to SMC by arbitrarily choosing an initial object for each category in Rex to serve as the unit of the tensor product and arbitrarily choosing a binary coproduct for each pair of objects to serve as their tensor product. Do this. Does the resulting F preserve equalizers?

You see, I have some equalizers that exist in Rex, and I know F preserves them, but I know this by direct computation. If 1) were true I’d know these equalizers exist on general grounds, and if 3) were true I’d know they’re preserved on general grounds. This might help me avoid a bit of extra fiddliness as I walk on the dark side of mathematics and engage in some evil maneuvers… especially if there’s some reference I can just cite.

2) is just my morbid curiosity acting up. I’m pretty sure the category of symmetric monoidal categories and strict symmetric monoidal functors has equalizers, but strong?
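Baez's questions concern Rex and SMC, where an equalizer must also respect the extra (co)limit or monoidal structure; as a one-category-level-down warm-up (our sketch, not part of the post), in Set the equalizer of two maps always exists and is simply the subset where they agree:

```python
def equalizer(f, g, domain):
    """In Set, the equalizer of f, g : A -> B is the subset of A on which
    f and g agree, together with its inclusion into A."""
    eq = [a for a in domain if f(a) == g(a)]
    return eq, (lambda a: a)  # the inclusion map

f = lambda n: n % 6
g = lambda n: n % 4
eq, include = equalizer(f, g, range(12))
assert eq == [0, 1, 2, 3]  # f and g agree exactly on 0..3
```

The hard part in Rex or SMC is not finding this underlying subcategory but showing it inherits finite colimits (resp. a symmetric monoidal structure) and that the inclusion preserves them — which is why the question is not settled by the Set-level picture.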

August 15, 2019

Terence Tao: Quantitative bounds for critically bounded solutions to the Navier-Stokes equations

I’ve just uploaded to the arXiv my paper “Quantitative bounds for critically bounded solutions to the Navier-Stokes equations“, submitted to the proceedings of the Linde Hall Inaugural Math Symposium. (I unfortunately had to cancel my physical attendance at this symposium for personal reasons, but was still able to contribute to the proceedings.) In recent years I have been interested in working towards establishing the existence of classical solutions for the Navier-Stokes equations

\displaystyle \partial_t u + (u \cdot \nabla) u = \Delta u - \nabla p

\displaystyle \nabla \cdot u = 0

that blow up in finite time, but this time for a change I took a look at the other side of the theory, namely the conditional regularity results for this equation. There are several such results that assert that if a certain norm of the solution stays bounded (or grows at a controlled rate), then the solution stays regular; taken in the contrapositive, they assert that if a solution blows up at a certain finite time {T_*}, then certain norms of the solution must also go to infinity. Here are some examples (not an exhaustive list) of such blowup criteria:

  • (Leray blowup criterion, 1934) If {u} blows up at a finite time {T_*}, and {3 < p \leq \infty}, then {\liminf_{t \rightarrow T_*} (T_* - t)^{\frac{1}{2}-\frac{3}{2p}} \| u(t) \|_{L^p_x({\bf R}^3)} \geq c} for an absolute constant {c>0}.
  • (Prodi-Serrin-Ladyzhenskaya blowup criterion, 1959-1967) If {u} blows up at a finite time {T_*}, and {3 < p \leq \infty}, then {\| u \|_{L^q_t L^p_x([0,T_*) \times {\bf R}^3)} =+\infty}, where {\frac{1}{q} := \frac{1}{2} - \frac{3}{2p}}.
  • (Beale-Kato-Majda blowup criterion, 1984) If {u} blows up at a finite time {T_*}, then {\| \omega \|_{L^1_t L^\infty_x([0,T_*) \times {\bf R}^3)} = +\infty}, where {\omega := \nabla \times u} is the vorticity.
  • (Kato blowup criterion, 1984) If {u} blows up at a finite time {T_*}, then {\liminf_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} \geq c} for some absolute constant {c>0}.
  • (Escauriaza-Seregin-Sverak blowup criterion, 2003) If {u} blows up at a finite time {T_*}, then {\limsup_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} = +\infty}.
  • (Seregin blowup criterion, 2012) If {u} blows up at a finite time {T_*}, then {\lim_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} = +\infty}.
  • (Phuc blowup criterion, 2015) If {u} blows up at a finite time {T_*}, then {\limsup_{t \rightarrow T_*} \|u(t) \|_{L^{3,q}_x({\bf R}^3)} = +\infty} for any {q < \infty}.
  • (Gallagher-Koch-Planchon blowup criterion, 2016) If {u} blows up at a finite time {T_*}, then {\limsup_{t \rightarrow T_*} \|u(t) \|_{\dot B_{p,q}^{-1+3/p}({\bf R}^3)} = +\infty} for any {3 < p, q < \infty}.
  • (Albritton blowup criterion, 2016) If {u} blows up at a finite time {T_*}, then {\lim_{t \rightarrow T_*} \|u(t) \|_{\dot B_{p,q}^{-1+3/p}({\bf R}^3)} = +\infty} for any {3 < p, q < \infty}.

My current paper is most closely related to the Escauriaza-Seregin-Sverak blowup criterion, which was the first to show a critical (i.e., scale-invariant, or dimensionless) spatial norm, namely {L^3_x({\bf R}^3)}, had to become large. This result now has many proofs; for instance, many of the subsequent blowup criterion results imply the Escauriaza-Seregin-Sverak one as a special case, and there are also additional proofs by Gallagher-Koch-Planchon (building on ideas of Kenig-Koch), and by Dong-Du. However, all of these proofs rely on some form of a compactness argument: given a finite time blowup, one extracts some suitable family of rescaled solutions that converges in some weak sense to a limiting solution that has some additional good properties (such as almost periodicity modulo symmetries), which one can then rule out using additional qualitative tools, such as unique continuation and backwards uniqueness theorems for parabolic heat equations. In particular, all known proofs use some version of the backwards uniqueness theorem of Escauriaza, Seregin, and Sverak. Because of this reliance on compactness, the existing proofs of the Escauriaza-Seregin-Sverak blowup criterion are qualitative, in that they do not provide any quantitative information on how fast the {\|u(t)\|_{L^3_x({\bf R}^3)}} norm will go to infinity (along a subsequence of times).

On the other hand, it is a general principle that qualitative arguments established using compactness methods ought to have quantitative analogues that replace the use of compactness by more complicated substitutes that give effective bounds; see for instance these previous blog posts for more discussion. I therefore was interested in trying to obtain a quantitative version of this blowup criterion that gave reasonably good effective bounds (in particular, my objective was to avoid truly enormous bounds such as tower-exponential or Ackermann function bounds, which often arise if one “naively” tries to make a compactness argument effective). In particular, I obtained the following triple-exponential quantitative regularity bounds:

Theorem 1 If {u} is a classical solution to Navier-Stokes on {[0,T) \times {\bf R}^3} with

\displaystyle \| u \|_{L^\infty_t L^3_x([0,T) \times {\bf R}^3)} \leq A \ \ \ \ \ (1)


for some {A \geq 2}, then

\displaystyle | \nabla^j u(t,x)| \leq \exp\exp\exp(A^{O(1)}) t^{-\frac{j+1}{2}}


\displaystyle | \nabla^j \omega(t,x)| \leq \exp\exp\exp(A^{O(1)}) t^{-\frac{j+2}{2}}

for {(t,x) \in [0,T) \times {\bf R}^3} and {j=0,1}.

As a corollary, one can now improve the Escauriaza-Seregin-Sverak blowup criterion to

\displaystyle \limsup_{t \rightarrow T_*} \frac{\|u(t) \|_{L^3_x({\bf R}^3)}}{(\log\log\log \frac{1}{T_*-t})^c} = +\infty

for some absolute constant {c>0}, which to my knowledge is the first (very slightly) supercritical blowup criterion for Navier-Stokes in the literature.

The proof uses many of the same quantitative inputs as previous arguments, most notably the Carleman inequalities used to establish unique continuation and backwards uniqueness theorems for backwards heat equations, but also some additional techniques that make the quantitative bounds more efficient. The proof focuses initially on points of concentration of the solution, which we define as points {(t_0,x_0)} where there is a frequency {N_0} for which one has the bound

\displaystyle |N_0^{-1} P_{N_0} u(t_0,x_0)| \geq A^{-C_0} \ \ \ \ \ (2)


for a large absolute constant {C_0}, where {P_{N_0}} is a Littlewood-Paley projection to frequencies {\sim N_0}. (This can be compared with the upper bound of {O(A)} for the quantity on the left-hand side that follows from (1).) The factor of {N_0^{-1}} normalises the left-hand side of (2) to be dimensionless (i.e., critical). The main task is to show that the dimensionless quantity {t_0 N_0^2} cannot get too large; in particular, we end up establishing a bound of the form

\displaystyle t_0 N_0^2 \lesssim \exp\exp\exp A^{O(C_0^6)}

from which the above theorem ends up following from a routine adaptation of the local well-posedness and regularity theory for Navier-Stokes.

The strategy is to show that any concentration such as (2) when {t_0 N_0^2} is large must force a significant component of the {L^3_x} norm of {u(t_0)} to also show up at many other locations than {x_0}, which eventually contradicts (1) if one can produce enough such regions of non-trivial {L^3_x} norm. (This can be viewed as a quantitative variant of the “rigidity” theorems in some of the previous proofs of the Escauriaza-Seregin-Sverak theorem that rule out solutions that exhibit too much “compactness” or “almost periodicity” in the {L^3_x} topology.) The chain of causality that leads from a concentration (2) at {(t_0,x_0)} to significant {L^3_x} norm at other regions of the time slice {\{t_0\} \times {\bf R}^3} is somewhat involved (though simpler than the much more convoluted schemes I initially envisaged for this argument):

  1. Firstly, by using Duhamel’s formula, one can show that a concentration (2) can only occur (with {t_0 N_0^2} large) if there was also a preceding concentration

    \displaystyle |N_1^{-1} P_{N_1} u(t_1,x_1)| \geq A^{-C_0} \ \ \ \ \ (3)


    at some slightly previous point {(t_1,x_1)} in spacetime, with {N_1} also close to {N_0} (more precisely, we have {t_1 = t_0 - A^{-O(C_0^3)} N_0^{-2}}, {N_1 = A^{O(C_0^2)} N_0}, and {x_1 = x_0 + O( A^{O(C_0^4)} N_0^{-1})}). This can be viewed as a sort of contrapositive of a “local regularity theorem”, such as the ones established by Caffarelli, Kohn, and Nirenberg. A key point here is that the lower bound {A^{-C_0}} in the conclusion (3) is precisely the same as the lower bound in (2), so that this backwards propagation of concentration can be iterated.

  2. Iterating the previous step, one can find a sequence of concentration points

    \displaystyle |N_n^{-1} P_{N_n} u(t_n,x_n)| \geq A^{-C_0} \ \ \ \ \ (4)


    with the {(t_n,x_n)} propagating backwards in time; by using estimates ultimately resulting from the dissipative term in the energy identity, one can extract such a sequence in which the {t_0-t_n} increase geometrically with time, the {N_n} are comparable (up to polynomial factors in {A}) to the natural frequency scale {(t_0-t_n)^{-1/2}}, and one has {x_n = x_0 + O( |t_0-t_n|^{1/2} )}. Using the “epochs of regularity” theory that ultimately dates back to Leray, and tweaking the {t_n} slightly, one can also place the times {t_n} in intervals {I_n} (of length comparable to a small multiple of {|t_0-t_n|}) in which the solution is quite regular (in particular, {u, \nabla u, \omega, \nabla \omega} enjoy good {L^\infty_t L^\infty_x} bounds on {I_n \times {\bf R}^3}).

  3. The concentration (4) can be used to establish a lower bound for the {L^2_t L^2_x} norm of the vorticity {\omega} near {(t_n,x_n)}. As is well known, the vorticity obeys the vorticity equation

    \displaystyle \partial_t \omega = \Delta \omega - (u \cdot \nabla) \omega + (\omega \cdot \nabla) u.

    In the epoch of regularity {I_n \times {\bf R}^3}, the coefficients {u, \nabla u} of this equation obey good {L^\infty_x} bounds, allowing the machinery of Carleman estimates to come into play. Using a Carleman estimate that is used to establish unique continuation results for backwards heat equations, one can propagate this lower bound to also give lower {L^2} bounds on the vorticity (and its first derivative) in annuli of the form {\{ (t,x) \in I_n \times {\bf R}^3: R \leq |x-x_n| \leq R' \}} for various radii {R,R'}, although the lower bounds decay at a gaussian rate with {R}.

  4. Meanwhile, using an energy pigeonholing argument of Bourgain (which, in this Navier-Stokes context, is actually an enstrophy pigeonholing argument), one can locate some annuli {\{ x \in {\bf R}^3: R \leq |x-x_n| \leq R'\}} where (a slightly normalised form of) the enstrophy is small at time {t=t_n}; using a version of the localised enstrophy estimates from a previous paper of mine, one can then propagate this sort of control forward in time, obtaining an “annulus of regularity” of the form {\{ (t,x) \in [t_n,t_0] \times {\bf R}^3: R_n \leq |x-x_n| \leq R'_n\}} in which one has good estimates; in particular, one has {L^\infty_x} type bounds on {u, \nabla u, \omega, \nabla \omega} in this cylindrical annulus.
  5. By intersecting the previous epoch of regularity {I_n \times {\bf R}^3} with the above annulus of regularity, we have some lower bounds on the {L^2} norm of the vorticity (and its first derivative) in the annulus of regularity. Using a Carleman estimate first introduced by Escauriaza, Seregin, and Sverak, as well as a second application of the Carleman estimate used previously, one can then propagate this lower bound back up to time {t=t_0}, establishing a lower bound for the vorticity on the spatial annulus {\{ (t_0,x): R_n \leq |x-x_n| \leq R'_n \}}. By some basic Littlewood-Paley theory one can parlay this lower bound to a lower bound on the {L^3} norm of the velocity {u}; crucially, this lower bound is uniform in {n}.
  6. If {t_0 N_0^2} is very large (triple exponential in {A}!), one can then find enough scales {n} with disjoint {\{ (t_0,x): R_n \leq |x-x_n| \leq R'_n \}} annuli that the total lower bound on the {L^3_x} norm of {u(t_0)} provided by the above arguments is inconsistent with (1), thus establishing the claim.

The chain of causality is summarised in the following image:


It seems natural to conjecture that similar triply logarithmic improvements can be made to several of the other blowup criteria listed above, but I have not attempted to pursue this question. It seems difficult to improve the triple logarithmic factor using only the techniques here; the Bourgain pigeonholing argument inevitably costs one exponential, the Carleman inequalities cost a second, and the stacking of scales at the end to contradict the {L^3} upper bound costs the third.


David Hoggstruggling with likelihoods

I worked more on my selection-function paper with Rix. I continued to struggle with understanding the controversy (between Loredo on one hand and various collaborations of my own on the other) about the likelihood function for a catalog. In my view, if you take a variable-rate Poisson process, and then censor it, where the censoring depends only on the individual properties of the individual objects being censored, you get a new variable-rate Poisson process with just a different rate function. If I am right, then there is at least one way of thinking about things such that the likelihood functions in the Bovy et al and Foreman-Mackey et al papers are correct. My day ended with a very valuable phone discussion of this with Foreman-Mackey. He (and I) would like to understand what is the difference in assumptions between us and Loredo.

I also worked today with Soledad Villar (NYU) to develop capstone projects for the masters program in the Center for Data Science. The Masters students do research projects, and we have lots of ideas about mashing up deep learning and astrophysics.

John BaezCarbon Offsets

A friend asks:

A quick question: if somebody wants to donate money to reduce his or her carbon footprint, which org(s) would you recommend that he or she donate to?

Do you have a good answer to this? I don’t want answers that deny the premise. We’re assuming someone wants to donate money to reduce his or her carbon footprint, and choosing an organization based on this. We’re not comparing this against other activities, like cutting personal carbon emissions or voting for politicians who want to cut carbon emissions.

Here’s my best answer so far:

The Gold Standard Foundation is one organization that tackles my friend’s question. See for example:

• Gold Standard, Offset your emissions.

Here they list various ways to offset your carbon emissions, currently with prices between $11 and $18 per tonne.

The Gold Standard Foundation is a non-profit foundation headquartered in Geneva that tries to ensure that carbon credits are real and verifiable and that projects make measurable contributions to sustainable development.

August 14, 2019

Terence TaoEigenvectors from eigenvalues

Peter Denton, Stephen Parke, Xining Zhang, and I have just uploaded to the arXiv the short unpublished note “Eigenvectors from eigenvalues“. This note gives two proofs of a general eigenvector identity observed recently by Denton, Parke and Zhang in the course of some quantum mechanical calculations. The identity is as follows:

Theorem 1 Let {A} be an {n \times n} Hermitian matrix, with eigenvalues {\lambda_1(A),\dots,\lambda_n(A)}. Let {v_i} be a unit eigenvector corresponding to the eigenvalue {\lambda_i(A)}, and let {v_{i,j}} be the {j^{th}} component of {v_i}. Then

\displaystyle  |v_{i,j}|^2 \prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A)) = \prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j))

where {M_j} is the {n-1 \times n-1} Hermitian matrix formed by deleting the {j^{th}} row and column from {A}.

For instance, if we have

\displaystyle  A = \begin{pmatrix} a & X^* \\ X & M \end{pmatrix}

for some real number {a}, {n-1}-dimensional vector {X}, and {n-1 \times n-1} Hermitian matrix {M}, then we have

\displaystyle  |v_{i,1}|^2 = \frac{\prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M))}{\prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A))} \ \ \ \ \ (1)

assuming that the denominator is non-zero.

Once one is aware of the identity, it is not so difficult to prove it; we give two proofs, each about half a page long, one of which is based on a variant of the Cauchy-Binet formula, and the other based on properties of the adjugate matrix. But perhaps it is surprising that such a formula exists at all; one does not normally expect to learn much information about eigenvectors purely from knowledge of eigenvalues. In the random matrix theory literature, for instance in this paper of Erdos, Schlein, and Yau, or this later paper of Van Vu and myself, a related identity has been used, namely

\displaystyle  |v_{i,1}|^2 = \frac{1}{1 + \| (M-\lambda_i(A))^{-1} X \|^2}, \ \ \ \ \ (2)

but it is not immediately obvious that one can derive the former identity from the latter. (I do so below the fold; we ended up not putting this proof in the note as it was longer than the two other proofs we found. I also give two other proofs below the fold, one from a more geometric perspective and one proceeding via Cramer’s rule.) It was certainly something of a surprise to me that there is no explicit appearance of the {a,X} components of {A} in the formula (1) (though they do indirectly appear through their effect on the eigenvalues {\lambda_k(A)}; for instance from taking traces one sees that {a = \sum_{k=1}^n \lambda_k(A) - \sum_{k=1}^{n-1} \lambda_k(M)}).
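Since everything in (1) and (2) is available in closed form in the 2×2 case, both identities are easy to sanity-check numerically. Here is a minimal sketch in Python (the particular entries a, m, x below are arbitrary illustrative choices, not anything from the note):

```python
import math

# 2x2 real symmetric (hence Hermitian) test matrix A = [[a, x], [x, m]];
# these entries are arbitrary illustrative choices.
a, m, x = 2.0, 0.0, 1.0

# Closed-form eigenvalues of A.
half_trace = (a + m) / 2
disc = math.sqrt(((a - m) / 2) ** 2 + x ** 2)
eigs = [half_trace + disc, half_trace - disc]

for i, lam in enumerate(eigs):
    other = eigs[1 - i]  # the other eigenvalue of A

    # Unit eigenvector for lam: (A - lam I) v = 0 gives v proportional to (x, lam - a).
    norm = math.hypot(x, lam - a)
    v1, v2 = x / norm, (lam - a) / norm

    # Theorem 1: |v_{i,j}|^2 * prod_{k != i} (lam_i(A) - lam_k(A))
    #          = prod_k (lam_i(A) - lam_k(M_j)),
    # where M_1 = [m] (delete row/column 1 of A) and M_2 = [a].
    assert abs(v1 ** 2 * (lam - other) - (lam - m)) < 1e-12
    assert abs(v2 ** 2 * (lam - other) - (lam - a)) < 1e-12

    # Identity (2): |v_{i,1}|^2 = 1 / (1 + ||(M - lam_i(A))^{-1} X||^2),
    # with M = [m] and X = [x] in the block decomposition of A.
    assert abs(v1 ** 2 - 1 / (1 + (x / (m - lam)) ** 2)) < 1e-12

print("both identities hold for each eigenvalue")
```

The same check goes through for any entries with x ≠ 0, which guarantees distinct eigenvalues and hence a non-zero denominator in (1).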

One can get some feeling of the identity (1) by considering some special cases. Suppose for instance that {A} is a diagonal matrix with all distinct entries. The upper left entry {a} of {A} is one of the eigenvalues of {A}. If it is equal to {\lambda_i(A)}, then the eigenvalues of {M} are the other {n-1} eigenvalues of {A}, and now the left and right-hand sides of (1) are equal to {1}. At the other extreme, if {a} is equal to a different eigenvalue of {A}, then {\lambda_i(A)} now appears as an eigenvalue of {M}, and both sides of (1) now vanish. More generally, if we order the eigenvalues {\lambda_1(A) \leq \dots \leq \lambda_n(A)} and {\lambda_1(M) \leq \dots \leq \lambda_{n-1}(M)}, then the Cauchy interlacing inequalities tell us that

\displaystyle  0 \leq \lambda_i(A) - \lambda_k(M) \leq \lambda_i(A) - \lambda_k(A)

for {1 \leq k < i}, and

\displaystyle  \lambda_i(A) - \lambda_{k+1}(A) \leq \lambda_i(A) - \lambda_k(M) < 0

for {i \leq k \leq n-1}, so that the right-hand side of (1) lies between {0} and {1}, which is of course consistent with (1) as {v_i} is a unit vector. Thus the identity relates the coefficient sizes of an eigenvector with the extent to which the Cauchy interlacing inequalities are sharp.

— 1. Relating the two identities —

We now show how (1) can be deduced from (2). By a limiting argument, it suffices to prove (1) in the case when {\lambda_i(A)} is not an eigenvalue of {M}. Without loss of generality we may take {i=n}. By subtracting the matrix {\lambda_n(A) I_n} from {A} (and {\lambda_n(A) I_{n-1}} from {M_j}), thus shifting all the eigenvalues down by {\lambda_n(A)}, we may also assume without loss of generality that {\lambda_n(A)=0}. So now we wish to show that

\displaystyle  |v_{n,1}|^2 \prod_{k=1}^{n-1} \lambda_k(A) = \prod_{k=1}^{n-1} \lambda_k(M).

The right-hand side is just {\mathrm{det}(M)}. If one differentiates the characteristic polynomial

\displaystyle \mathrm{det}(A - \lambda I_n) = \prod_{k=1}^n (\lambda_k(A) - \lambda) = - \lambda \prod_{k=1}^{n-1} (\lambda_k(A) - \lambda)

at {\lambda=0}, one sees that

\displaystyle \prod_{k=1}^{n-1} \lambda_k(A) = -\frac{d}{d\lambda}\mathrm{det}(A - \lambda I_n)|_{\lambda=0}.

Finally, (2) can be rewritten as

\displaystyle  |v_{n,1}|^2 = \frac{1}{1 + X^* M^{-2} X}

so our task is now to show that

\displaystyle  \frac{d}{d\lambda}\mathrm{det}(A - \lambda I_n)|_{\lambda=0} = - \mathrm{det}(M) ( 1 + X^* M^{-2} X ). \ \ \ \ \ (3)

By Schur complement, we have

\displaystyle  \mathrm{det}(A - \lambda I_n) = \mathrm{det}(M - \lambda I_{n-1}) ( a - \lambda - X^* (M_1 - \lambda I_{n-1})^{-1} X ). \ \ \ \ \ (4)

Since {\lambda=0} is an eigenvalue of {A}, but not of {M_1} (by hypothesis), the factor {a - \lambda - X^* (M_1 - \lambda I_{n-1})^{-1} X} vanishes when {\lambda=0}. If we then differentiate (4) in {\lambda} and set {\lambda=0} we obtain (3) as desired.

— 2. A geometric proof —

Here is a more geometric way to think about the identity. One can view {\lambda_i(A) - A} as a linear operator on {{\bf C}^n} (mapping {w} to {\lambda_i(A) w - Aw} for any vector {w}); it then also acts on all the exterior powers {\bigwedge^k {\bf C}^n} by mapping {w_1 \wedge \dots \wedge w_k} to {(\lambda_i(A) w_1 - Aw_1) \wedge \dots \wedge (\lambda_i(A) w_k - Aw_k)} for all vectors {w_1,\dots,w_k}. In particular, if one evaluates {\lambda_i(A) - A} on the basis {v_1 \wedge \dots \wedge v_{j-1} \wedge v_{j+1} \wedge \dots \wedge v_n} of {\bigwedge^{n-1} {\bf C}^n} induced by the orthogonal eigenbasis {v_1,\dots,v_n}, we see that the action of {\lambda_i(A) - A} on {\bigwedge^{n-1} {\bf C}^n} is rank one, with

\displaystyle  \langle (\lambda_i(A) - A) \omega, \omega \rangle = \prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A)) |\omega \wedge v_i|^2

for any {\omega \in \bigwedge^{n-1} {\bf C}^n}, where {\langle,\rangle} is the inner product on {\bigwedge^{n-1} {\bf C}^n} induced by the standard inner product on {{\bf C}^n}. If we now apply this to the {n-1}-form {\omega = e_1 \wedge \dots \wedge e_{j-1} \wedge e_{j+1} \wedge \dots \wedge e_n}, we have {|\omega \wedge v_i| = |v_{i,j}|}, while {(\lambda_i(A)-A) \omega} is equal to {\mathrm{det}(\lambda_i(A) I_{n-1} - M) \omega} plus some terms orthogonal to {\omega}. Since {\mathrm{det}(\lambda_i(A) I_{n-1} - M ) = \prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M))}, Theorem 1 follows.

— 3. A proof using Cramer’s rule —

By a limiting argument we can assume that all the eigenvalues of {A} are simple. From the spectral theorem we can compute the resolvent {(\lambda I_n - A)^{-1}} for {\lambda \neq \lambda_1(A),\dots,\lambda_n(A)} as

\displaystyle  (\lambda I_n - A)^{-1} = \sum_{k=1}^n \frac{1}{\lambda - \lambda_k(A)} v_k v_k^*.

Extracting the {(j,j)} component of both sides and using Cramer’s rule, we conclude that

\displaystyle  \frac{\mathrm{det}(\lambda I_{n-1} - M_j)}{\mathrm{det}(\lambda I_n - A)} = \sum_{k=1}^n \frac{1}{\lambda - \lambda_k(A)} |v_{k,j}|^2

or in terms of eigenvalues

\displaystyle  \frac{\prod_{k=1}^{n-1} (\lambda - \lambda_k(M_j)) }{\prod_{k=1}^{n} (\lambda - \lambda_k(A)) } = \sum_{k=1}^n \frac{1}{\lambda - \lambda_k(A)} |v_{k,j}|^2.

Both sides are rational functions with a simple pole at the eigenvalues {\lambda_i(A)}. Extracting the residue at {\lambda = \lambda_i(A)} we conclude that

\displaystyle  \frac{\prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j)) }{\prod_{k=1; k \neq i}^{n} (\lambda_i(A) - \lambda_k(A)) } = |v_{i,j}|^2

and Theorem 1 follows. (Note that this approach also gives a formula for {v_{i,j} \overline{v_{i,j'}}} for {j,j'=1,\dots,n}, although the formula becomes messier when {j \neq j'} because the relevant minor of {\lambda I_n} is no longer a scalar multiple of the identity {I_{n-1}}.)

David Hoggcomparing points to points

For a number of projects, my group has been trying to compare point sets to point sets, to determine transformations. Some contexts have been calibration (like photometric and astrometric calibration of images, where stars need to align, either on the sky or in magnitude space) and others have been in dynamics. Right now Suroor Gandhi (NYU), Adrian Price-Whelan (Flatiron), and I have been trying to find transformations that align phase-space structures (and especially the Snail) observed in different tracers: What transformation between tracers matches the phase-space structure? These projects are going by our code name MySpace.

Projects like these tend to have a pathology, however, related to a pathology that Robyn Sanderson (Flatiron) and I found in a different context in phase space: If you write down a naive objective for matching two point clouds, the optimal match often has one point cloud shrunk down to zero size and put on top of the densest location on the other point cloud! Indeed, Gandhi is finding this so we decided (today) to try symmetrizing the objective function to stop it. That is, don't just compare points A to points B, but also symmetrically compare points B to points A. Then (I hope) neither set can shrink to zero usefully. I hope this works! Now to make a symmetric objective function...
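To make the pathology and the proposed fix concrete, here is a minimal sketch in Python (not the actual MySpace code; the Gaussian point clouds and the nearest-neighbour "chamfer-style" loss are illustrative assumptions): a one-directional objective scores a cloud shrunk to zero size deceptively well, while the symmetrized objective penalizes it heavily.

```python
import random

def one_way(P, Q):
    """Mean squared distance from each point of P to its nearest neighbour in Q."""
    return sum(min((px - qx) ** 2 + (py - qy) ** 2 for qx, qy in Q)
               for px, py in P) / len(P)

def symmetric(P, Q):
    """Symmetrized objective: compare points P to points Q AND points Q to points P."""
    return one_way(P, Q) + one_way(Q, P)

random.seed(0)
A = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]

# A genuinely good match: a slightly jittered copy of A.
B_good = [(px + random.gauss(0, 0.05), py + random.gauss(0, 0.05)) for px, py in A]

# The pathology: cloud B shrunk to zero size at a dense spot of A.
B_shrunk = [(0.0, 0.0)] * 200

# The one-directional loss is fooled: the collapsed cloud scores deceptively well...
assert one_way(B_shrunk, A) < 0.2

# ...but the symmetrized objective rejects it, because A's outer points
# are now far from everything in B_shrunk.
assert symmetric(B_shrunk, A) > 10 * symmetric(B_good, A)

print("symmetric objective rejects the collapsed cloud")
```

The second term of the symmetric objective is exactly the hoped-for safeguard: a shrunken cloud leaves most of the other cloud unmatched, so neither set can usefully collapse.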

August 13, 2019

David Hogglikelihood function for a catalog

I spent my research time today writing in a paper Rix (MPIA) and I are preparing about selecting sources for a catalog or target selection. The fundamental story is that you need to make a likelihood function at the end of the day. And this, in turn, means that you need a tractable and relatively accurate selection function. This all took me down old paths I have traveled with Bovy (Toronto) and Foreman-Mackey (Flatiron).

In email correspondence, Foreman-Mackey reminded me of past correspondence with Loredo (Cornell), who disagrees with our work on these things for very technical reasons. His (very nice) explanation of his point is around equations (8) through (10) in this paper: It has to do with how to factorize a probability distribution for a collection of objects obtained in a censored, variable-rate Poisson process. But our historical view of this (and my restored view after a day of struggling) is that the form of the likelihood depends on fine details of how you believe the objects of study were selected for the catalog, or censored. If they were censored only by your detector, I think Loredo's form is correct. But if they were censored for physical reasons over which you have no dominion (for example a planet transit obscured by a tiny fluctuation in a star's brightness), the selection can come in to the likelihood function differently. That is, it depends on the causal chain involved in the source censoring.
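The thinning claim above can be illustrated with a tiny simulation (a sketch with a made-up rate and selection function, not anything from the papers under discussion): censoring each object of a Poisson process independently, with a probability depending only on that object's own properties, yields counts consistent with a Poisson process of rate p(x)·λ(x).

```python
import math
import random

random.seed(42)

lam0 = 20000.0                 # underlying homogeneous rate on [0, 1] (made up)
def p(xx):                     # detection probability (made-up selection function)
    return 0.2 + 0.6 * xx

# Draw the underlying Poisson process on [0, 1]; for a large mean, a normal
# approximation to the Poisson count is adequate for this illustration.
n_true = max(0, round(random.gauss(lam0, math.sqrt(lam0))))
xs = [random.random() for _ in range(n_true)]

# Censor: keep each object independently with probability p(x), depending
# only on that object's own properties.
detected = [xx for xx in xs if random.random() < p(xx)]

# The detected catalog should be Poisson with rate p(x) * lam0:
# expected total count = lam0 * integral_0^1 p(x) dx = lam0 * 0.5.
assert abs(len(detected) - lam0 * 0.5) / (lam0 * 0.5) < 0.05

# The x-dependence matches too: expected count on [0.5, 1] is
# lam0 * integral_{0.5}^1 p(x) dx = lam0 * 0.325.
upper = sum(1 for xx in detected if xx >= 0.5)
assert abs(upper - lam0 * 0.325) / (lam0 * 0.325) < 0.05

print("censored process matches the thinned rate p(x) * lam0")
```

This only exercises the easy case where the censoring probability depends on the object alone; the disputed cases are precisely the ones where the censoring is entangled with something else.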

August 12, 2019

Tommaso DorigoThe Terrascope - Using Earth As A Lens

Ever since telescopes were first invented, by some Dutch lens grinder in the late XVIth century, and then demonstrated to be invaluable tools for investigating the cosmos around us by Galileo Galilei in the early 1600s, there has been a considerable, steady effort to construct bigger and better ones. Particularly bigger ones.


August 11, 2019

Doug NatelsonAPS Division of Condensed Matter Physics invited symposium nominations

While I've rotated out of my "member-at-large" spot on the APS DCMP executive committee, I still want to pass this on.  Having helped evaluate invited symposium proposals for the last three years, I can tell you that the March Meeting benefits greatly when there is a rich palette.


The DCMP invited speaker program at March Meeting 2020 is dependent on the invited sessions that are nominated by DCMP members. All invited speakers that we host must be nominated in advance by a DCMP member.

The deadline to submit a nomination is coming up soon: August 23, 2019.

Please take a moment to submit an invited session nomination.

Notes regarding nominations:
  • All nominations must be submitted through ScholarOne by August 23, 2019.
  • In ScholarOne, an invited session should be submitted as an "Invited Symposium Nomination".
  • An invited session consists of 5 speakers. You may include up to 2 alternate speakers in your nomination, in the event that one of the original 5 does not work out.
  • While no invited speakers will be guaranteed placement in the program until after all nominations have been reviewed, please get a tentative confirmation of interest from your nominated speakers. There will be a place in the nomination to indicate this.
  • A person cannot give technical invited talks in two consecutive years. A list of people that gave technical invited talks in 2019, and are therefore ineligible for 2020, can be found on this page.
  • Nominations of women, members of underrepresented minority groups, and scientists from outside the United States are especially encouraged. 
Be sure to select a DCMP Category in your nomination. DCMP categories are:

Thank you for your prompt attention to this matter.

Daniel Arovas, DCMP Chair, and Eva Andrei, DCMP Chair-Elect

n-Category Café Even-Dimensional Balls

Some of the oddballs on the $n$-Café are interested in odd-dimensional balls, but here’s a nice thing about even-dimensional balls: the volume of the $2n$-dimensional ball of radius $r$ is

$$\frac{(\pi r^2)^n}{n!}$$

Dillon Berger pointed out that summing up over all $n$ we get

$$\sum_{n=0}^\infty \frac{(\pi r^2)^n}{n!} = e^{\pi r^2}$$

It looks nice. But what does it mean?

First, why is it true?

We can show the volume of the $2n$-dimensional ball of radius $r$ is $(\pi r^2)^n/n!$ inductively. How? It’s enough to show that the volume of the unit $2(n+1)$-ball is $\pi/(n+1)$ times the volume of the unit $2n$-ball. Wikipedia gives a proof, and Greg Egan recently summarized the argument in a couple of tweets.

The idea is to map the unit $2(n+1)$-ball to the unit disk in the plane via a projection $\mathbb{R}^{2(n+1)} \to \mathbb{R}^2$. Imagine a $2(n+1)$-ball hovering over a disk! I can’t draw it, because the first interesting example is 4-dimensional. But imagine it.

The points over any point in the disk at distance $r$ from the origin form a $2n$-ball of radius $\sqrt{1 - r^2}$. We can get the volume of the $2(n+1)$-ball by integrating the volumes of these $2n$-balls over the disk.

That’s the idea. The rest is just calculus. Let the volume of the unit $2n$-ball be $V_{2n}$. We do the integral using polar coordinates:

$$V_{2(n+1)} = \int_0^1 V_{2n} (1 - r^2)^n \; 2 \pi r \, d r$$

Make the substitution $u = 1 - r^2$, so $d u = -2 r \, d r$. Some minus signs and 2’s cancel and we get

$$V_{2(n+1)} = \pi V_{2n} \int_0^1 u^n \, d u = \frac{\pi}{n+1} V_{2n}$$

as desired!

So that’s simple enough: since $V_0 = 1$ we get

$$V_{2n} = \frac{\pi^n}{n!}$$

And maybe that’s all that needs to be said. But suppose we follow Dillon Berger and sum the volumes of all even-dimensional balls of radius $r$. We get

$$\sum_{n=0}^\infty V_{2n} r^{2n} = \sum_{n=0}^\infty \frac{(\pi r^2)^n}{n!} = e^{\pi r^2}$$

This looks cute, but what does it mean?
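Before getting to that: both the closed form for the volumes and the sum are easy to sanity-check numerically. A quick sketch in Python (the cutoff at n = 60 is an arbitrary truncation of the series):

```python
import math

def ball_volume_2n(n):
    """Volume of the unit 2n-ball, built up via the recurrence V_{2(n+1)} = pi/(n+1) * V_{2n}."""
    v = 1.0  # V_0 = 1: the 0-ball is a point
    for k in range(n):
        v *= math.pi / (k + 1)
    return v

# The recurrence reproduces the closed form pi^n / n!.
for n in range(12):
    assert math.isclose(ball_volume_2n(n), math.pi ** n / math.factorial(n))

# Summing the volumes of all even-dimensional balls of radius r gives e^{pi r^2};
# truncating at n = 60 already agrees to high precision for modest r.
r = 1.3
total = sum(ball_volume_2n(n) * r ** (2 * n) for n in range(60))
assert math.isclose(total, math.exp(math.pi * r * r), rel_tol=1e-9)

print("sum of even-ball volumes matches e^(pi r^2)")
```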

It seems odd to sum the volumes of shapes that have different dimensions. But we do it in classical statistical mechanics when considering a system with a variable number of particles: a so-called ‘grand canonical ensemble’, like an open container of gas, where molecules can flow in and out.

If we have a single classical harmonic oscillator, with momentum $p$ and position $q$, its energy is

$$\frac{1}{2}(p^2 + q^2)$$

where I’m choosing units that get rid of distracting constants.

If we impose the constraint that the energy is $\le E$ for some number $E$, the state of the oscillator is described by a point $(p,q)$ with

$$p^2 + q^2 \le 2 E$$

This is a point in the disk of radius $r = \sqrt{2E}$.

Now suppose we have an arbitrary collection of such oscillators—and we don’t know how many there are! Then the state of the system is described first by a natural number $n$, the number of oscillators, and then momenta $p_1, \dots, p_n$ and positions $q_1, \dots, q_n$ obeying

$$p_1^2 + q_1^2 + \cdots + p_n^2 + q_n^2 \le 2 E$$

In other words, it’s a point in the union of all even-dimensional balls of radius $r = \sqrt{2E}$.

Physicists would call this union the ‘phase space’ of our collection of oscillators. Its volume is the sum of the volumes of the balls:

$$\sum_{n=0}^\infty V_{2n} r^{2n} = e^{\pi r^2} = e^{2 \pi E}$$

Actually physicists would include the constants that I’m hiding, and this is actually somewhat interesting. For example, to make the area 2-form $d p \wedge d q$ dimensionless, they divide it by Planck’s ‘quantum of action’. This is what allows them to add up volumes of balls (or other symplectic manifolds) of different dimensions and get something that doesn’t break the rules of dimensional analysis. But this also brings Planck’s constant into a question of classical statistical mechanics—a surprise, but a surprise that’s well-known among experts: it shows up in computations such as the entropy of an ideal gas.

And indeed, entropy is exactly what I’m interested in now! Why would a physicist care about the volume of phase space for a collection of arbitrarily many oscillators with total energy $\le E$? Here’s why: if all we know about this collection of oscillators is that its energy is $\le E$, its entropy is just the logarithm of this phase space volume! So, it’s

$$\ln (e^{2 \pi E}) = 2 \pi E$$


Physicists will not like how I tucked all the constants under the rug, getting a formula for entropy that’s missing things like Boltzmann’s constant and Planck’s constant, as well as the mass of the particles in our harmonic oscillators, and their spring constant. I leave it as a challenge to restore all these constants correctly. I suspect that the $2\pi$ will get eaten by the $2\pi$ hiding in Planck’s constant $\hbar = h/2\pi$. This I know: we will get something proportional to $E$, but with units of entropy.
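For what it’s worth, here is one way the challenge seems to work out: a sketch, assuming each oscillator has mass $m$ and angular frequency $\omega = \sqrt{k/m}$ (neither of which appears in the dimensionless setup above), and measuring phase-space volume in units of $h^n$:

```latex
% n oscillators of mass m and frequency omega, total energy at most E:
%   sum_i ( p_i^2 / 2m + m omega^2 q_i^2 / 2 ) <= E.
% The rescaling p_i = sqrt(m*omega) P_i, q_i = Q_i / sqrt(m*omega) has
% Jacobian 1 and turns this region into a 2n-ball of radius sqrt(2E/omega), so
\mathrm{Vol}_n \;=\; \frac{1}{h^n}\,\frac{\bigl(\pi \cdot 2E/\omega\bigr)^n}{n!}
\;=\; \frac{(E/\hbar\omega)^n}{n!},
\qquad
\sum_{n=0}^{\infty} \mathrm{Vol}_n \;=\; e^{E/\hbar\omega},
\qquad
S \;=\; k_B \ln e^{E/\hbar\omega} \;=\; \frac{k_B\, E}{\hbar\,\omega}.
```

So the $2\pi$ is indeed eaten by the one hiding in $\hbar$, and the entropy comes out proportional to $E$, with $k_B$ supplying the units.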

This is what I’ve got so far. It’s not yet a new proof that the volumes of all even-dimensional balls of radius $r$ sum to

$$e^{\pi r^2}$$

but rather a physical interpretation of this fact. Maybe someone can keep the ball rolling….

Jordan EllenbergWe are a long way from such sentiments

All of us living at a certain time on this planet together, and together experiencing all its earthly joys and sorrows, seeing the same sky, loving and hating what are, after all, the same things, each and every one of us condemned to suffer the same sentence, the same disappearance off the face of the earth, should really nurture the greatest tenderness towards each other, a feeling of the most heart-rending closeness, and should be literally screaming with terror and pain whenever we are parted by a fate which at any moment is fully capable of transforming every one of our separations, even if only meant to last ten minutes, into an eternal one. But, as you know, for most of the time, we are a long way from such sentiments, and often take leave of even those closest to us in the most thoughtless manner imaginable.

Ivan Bunin, “Long Ago,” 1921 (Sophie Lund, trans.)

n-Category Café The Conway 2-Groups

I recently bumped into this nice paper:

• Theo Johnson-Freyd and David Treumann, $\mathrm{H}^4(\mathrm{Co}_0,\mathbb{Z}) = \mathbb{Z}/24$.

which proves just what it says: the 4th integral cohomology of the Conway group $\mathrm{Co}_0$, in the sense of group cohomology, is $\mathbb{Z}/24$. I want to point out a few immediate consequences.

The Leech lattice $\Lambda \subset \mathbb{R}^{24}$ is, up to rotations and reflections, the only lattice in 24 dimensions that is:

  • unimodular: the volume of $\mathbb{R}^{24}/\Lambda$ is 1,
  • even: $v \cdot v$ is even for every vector $v$ in the lattice,
  • contains no roots: that is, no vectors with $v \cdot v = 2$.

It comes in two chiral forms: that is, no transformation of determinant $-1$ maps this lattice to itself.

The group of rotations that preserves the Leech lattice is called the Conway group $\mathrm{Co}_0$. It has 8,315,553,613,086,720,000 elements. It’s not simple, because the transformation $v \mapsto -v$ lies in the Conway group and commutes with everything else. If you mod out the Conway group by its center, which is just $\mathbb{Z}/2$, you get a finite simple group called $\mathrm{Co}_1$. But never mind, that’s just a distraction for now: I’m more interested in $\mathrm{Co}_0$.

In fact you don’t need to understand anything I just said, except $\mathrm{Co}_0$ is a group. As soon as Johnson-Freyd and Treumann tell us that

$$\mathrm{H}^4(\mathrm{Co}_0,\mathbb{Z}) = \mathbb{Z}/24,$$

we know that there are 24 inequivalent ways to extend this group $\mathrm{Co}_0$ to a 3-group. Let us call these the Conway 3-groups. (Beware: they may not all be inequivalent as 3-groups. They are inequivalent as extensions.)

We can think of a 3-group as a one-object weak 3-groupoid. In any of the Conway 3-groups, the one object has 8,315,553,613,086,720,000 automorphisms. These 1-morphisms compose in a manner given by the group Co 0\mathrm{Co}_0. The only 2-morphisms are identities, but each 2-morphism has \mathbb{Z}’s worth of automorphisms, so there are infinitely many 3-morphisms.

In a general 3-group the composition of 1-morphisms is associative only up to an ‘associator’, which obeys the pentagon equation up to a ‘pentagonator’, which obeys an equation coming from the 3d associahedron. In the Conway 3-groups, the associator is trivial because there are only identity 2-morphisms. The 24 fundamentally different integral 4-cocycles on Co 0\mathrm{Co}_0 give us 24 choices for the pentagonator. The cocycle condition means these pentagonators obey the equation coming from the associahedron.

We can play various games with these data. For example, any group $G$ has a ‘Bockstein homomorphism’ relating its 4th cohomology with $\mathbb{Z}$ coefficients and its 3rd cohomology with $\mathrm{U}(1)$ coefficients:

$$\beta \colon H^3(G, \mathrm{U}(1)) \to H^4(G, \mathbb{Z})$$

and for some reason I think this is an isomorphism when $G$ is finite. Could someone check this out? I think it’s because $H^n(G,\mathbb{R})$ vanishes for $n \geq 1$, and I vaguely seem to recall that’s because of some group averaging argument, but it’s been a long time since I’ve thought about this stuff.
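
For reference, here is a sketch of where that isomorphism should come from (this is the standard argument as far as I can reconstruct it, and should itself be checked):

```latex
% The exponential short exact sequence of trivial G-modules
%     0 --> Z --> R --> U(1) --> 0
% induces a long exact sequence in group cohomology:
\[
\cdots \to \mathrm{H}^3(G,\mathbb{R}) \to \mathrm{H}^3(G,\mathrm{U}(1))
  \xrightarrow{\beta} \mathrm{H}^4(G,\mathbb{Z}) \to \mathrm{H}^4(G,\mathbb{R}) \to \cdots
\]
% For finite G, multiplication by |G| annihilates H^n(G,M) for n >= 1
% (restrict to the trivial subgroup and transfer back -- the averaging
% argument), while it is invertible on the R-vector space H^n(G,R).
% Hence H^n(G,R) = 0 for n >= 1, and the two flanking terms above vanish,
% so beta is an isomorphism.
```

If this is right, the 24 integral 4-classes on $\mathrm{Co}_0$ correspond to 24 classes of $\mathrm{U}(1)$-valued 3-cocycles.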

If so, we also get a bunch of Conway 2-groups: 24 different ways of extending $\mathrm{Co}_0$ to a 2-group. These are one-object weak 2-groupoids where the morphisms form the group $\mathrm{Co}_0$, each morphism has a circle’s worth of automorphisms, and while composition of morphisms is associative there is still an associator given by some 3-cocycle $\alpha \colon \mathrm{Co}_0^3 \to \mathrm{U}(1)$.

For either the 2-groups or the 3-groups, we can also start with not just $\mathrm{Co}_0$ but with the whole semidirect product $\mathrm{Co}_0 \ltimes \Lambda$, which is the full symmetry group of the Leech lattice including translations.

Perhaps the 2-groups are a bit more intuitive. Here we start with the Leech lattice symmetries, but then decree that they compose associatively only up to a phase, which is the $\mathrm{U}(1)$-valued 3-cocycle. There are, nicely, 24 consistent ways to do this!

There’s a lot more to say, but maybe this will get the ball rolling.

August 09, 2019

Tommaso DorigoMy Take On The Breakthrough Prizes

I'll admit, I wanted to title this post "Billionaire Awards Prizes To Failed Theories" instead, just for the sake of being flippant. But in any joke there is a little bit of truth, as I wish to discuss below.
The (not-so-anymore) news is that the "Special Breakthrough Prize" in fundamental physics, instituted a decade ago by Russian philanthropist Yuri Milner, and then co-funded by other filthy wealthy folks, recently went to three brilliant theoretical physicists: Sergio Ferrara, Dan Freedman, and Peter van Nieuwenhuizen, who in the seventies developed an elegant quantum field theory, supergravity.


Matt von HippelBreakthrough Prize for Supergravity

This week, $3 Million was awarded by the Breakthrough Prize to Sergio Ferrara, Daniel Z. Freedman and Peter van Nieuwenhuizen, the discoverers of the theory of supergravity, part of a special award separate from their yearly Fundamental Physics Prize. There’s a nice interview with Peter van Nieuwenhuizen on the Stony Brook University website, about his reaction to the award.

The Breakthrough Prize was designed to complement the Nobel Prize, rewarding deserving researchers who wouldn’t otherwise get the Nobel. The Nobel Prize is only awarded to theoretical physicists when they predict something that is later observed in an experiment. Many theorists are instead renowned for their mathematical inventions, concepts that other theorists build on and use but that do not by themselves make testable predictions. The Breakthrough Prize celebrates these theorists, and while it has also been awarded to others who the Nobel committee could not or did not recognize (various large experimental collaborations, Jocelyn Bell Burnell), this has always been the physics prize’s primary focus.

The Breakthrough Prize website describes supergravity as a theory that combines gravity with particle physics. That’s a bit misleading: while the theory does treat gravity in a “particle physics” way, unlike string theory it doesn’t solve the famous problems with combining quantum mechanics and gravity. (At least, as far as we know.)

It’s better to say that supergravity is a theory that links gravity to other parts of particle physics, via supersymmetry. Supersymmetry is a relationship between two types of particles: bosons, like photons, gravitons, or the Higgs, and fermions, like electrons or quarks. In supersymmetry, each type of boson has a fermion “partner”, and vice versa. In supergravity, gravity itself gets a partner, called the gravitino. Supersymmetry links the properties of particles and their partners together: both must have the same mass and the same charge. In a sense, it can unify different types of particles, explaining both under the same set of rules.

In the real world, we don’t see bosons and fermions with the same mass and charge. If gravitinos exist, then supersymmetry would have to be “broken”, giving them a high mass that makes them hard to find. Some hoped that the Large Hadron Collider could find these particles, but now it looks like it won’t, so there is no evidence for supergravity at the moment.

Instead, supergravity’s success has been as a tool to understand other theories of gravity. When the theory was proposed in the 1970’s, it was thought of as a rival to string theory. Instead, over the years it consistently managed to point out aspects of string theory that the string theorists themselves had missed, for example noticing that the theory needed not just strings but higher-dimensional objects called “branes”. Now, supergravity is understood as one part of a broader string theory picture.

In my corner of physics, we try to find shortcuts for complicated calculations. We benefit a lot from toy models: simpler, unrealistic theories that let us test our ideas before applying them to the real world. Supergravity is one of the best toy models we’ve got, a theory that makes gravity simple enough that we can start to make progress. Right now, colleagues of mine are developing new techniques for calculations at LIGO, the gravitational wave telescope. If they hadn’t worked with supergravity first, they would never have discovered these techniques.

The discovery of supergravity by Ferrara, Freedman, and van Nieuwenhuizen is exactly the kind of work the Breakthrough Prize was created to reward. Supergravity is a theory with deep mathematics, rich structure, and wide applicability. There is of course no guarantee that such a theory describes the real world. What is guaranteed, though, is that someone will find it useful.

n-Category Café 2020 Category Theory Conferences

Here are some dates to help you plan your carbon emissions.

It seems ACT2019 just ended, but we’re already planning next year’s applied category theory conference and school! I’m happy to say that Brendan Fong and David Spivak have volunteered to run it at MIT on these dates:

  • Applied Category Theory School: June 29–July 3, 2020.
  • Applied Category Theory Conference: July 6–10, 2020.

The precise dates for the other big category theory conference, CT2020, have not yet been decided. However, it will take place in Genoa sometime in the interval June 18–28, 2020.

There may also be an additional applied category theory school in Marrakesh during May 25–29, 2020. More on that later, with any luck!

Also: don’t forget to submit your abstracts for the November 2019 applied category theory special session at U. C. Riverside by September 3rd! We’ve got a great lineup of speakers, but anyone who wants to give a talk—including the invited speakers—needs to submit an abstract to the AMS website by September 3rd. The AMS has no mercy about this.

John Baez2020 Category Theory Conferences


Yes, my last post was about ACT2019, but we’re already planning next year’s applied category theory conference and school! I’m happy to say that Brendan Fong and David Spivak have volunteered to run it at MIT on these dates:

• Applied Category Theory School: June 29–July 3, 2020.
• Applied Category Theory Conference: July 6–10, 2020.

The precise dates for the other big category theory conference, CT2020, have not yet been decided. However, it will take place in Genoa sometime in the interval June 18–28, 2020.

There may also be an additional applied category theory school in Marrakesh from May 25–29, 2020. More on that later, with any luck!

And don’t forget to submit your abstracts for the November 2019 applied category theory special session at U. C. Riverside by September 3rd! We’ve got a great lineup of speakers, but anyone who wants to give a talk—including the invited speakers—needs to submit an abstract to the AMS website by September 3rd. The AMS has no mercy about this.

John BaezApplied Category Theory Meeting at UCR


The American Mathematical Society is having their Fall Western meeting here at U. C. Riverside during the weekend of November 9th and 10th, 2019. Joe Moeller and I are organizing a session on Applied Category Theory! We already have some great speakers lined up:

• Tai-Danae Bradley
• Vin de Silva
• Brendan Fong
• Nina Otter
• Evan Patterson
• Blake Pollard
• Prakash Panangaden
• David Spivak
• Brad Theilman
• Dmitry Vagner
• Zhenghan Wang

Alas, we have no funds for travel and lodging. If you’re interested in giving a talk, please submit an abstract here:

General information about abstracts, American Mathematical Society.

More precisely, please read the information there and then click on the link on that page to submit an abstract. It should then magically fly through the aether to me! Abstracts are due September 3rd, but the sooner you submit one, the greater the chance that we’ll have space.

For the program of the whole conference, go here:

Fall Western Sectional Meeting, U. C. Riverside, Riverside, California, 9–10 November 2019.

I will also be running a special meeting on diversity and excellence in mathematics on Friday November 8th. There will be a banquet that evening, and at some point I’ll figure out how tickets for that will work.

We had a special session like this in 2017, and it’s fun to think about how things have evolved since then.

David Spivak had already written Category Theory for the Sciences, but more recently he’s written another book on applied category theory, Seven Sketches, with Brendan Fong. He already had a company, but now he’s helping run Conexus, which plans to award grants of up to $1.5 million to startups that use category theory (in exchange for equity). Proposals are due June 30th, by the way!

I guess Brendan Fong was already working with David Spivak at MIT in the fall of 2017, but since then they’ve written Seven Sketches and developed a graphical calculus for logic in regular categories. He’s also worked on a functorial approach to machine learning—and now he’s using category theory to unify learners and lenses.

Blake Pollard had just finished his Ph.D. work at U.C. Riverside back in 2018. He will now talk about his work with Spencer Breiner and Eswaran Subrahmanian at the National Institute of Standards and Technology, using category theory to help develop the “smart grid”—the decentralized power grid we need now. Above he’s talking to Brendan Fong at the Centre for Quantum Technologies, in Singapore. I think that’s where they first met.

Eswaran Subrahmanian will also be giving a talk this time! He works at NIST and Carnegie Mellon; he’s an engineer who specializes in using mathematics to help design smart networks and other complex systems.

Nina Otter was a grad student at Oxford in 2017, but now she’s at UCLA and the University of Leipzig. She worked with Ulrike Tillmann and Heather Harrington on stratifying multiparameter persistent homology, and is now working on a categorical formulation of positional and role analysis in social networks. Like Brendan, she’s on the executive board of the applied category theory journal Compositionality.

I first met Tai-Danae Bradley at ACT2018. Now she will talk about her work at Tunnel Technologies, a startup run by her advisor John Terilla. They model sequences—of letters from an alphabet, for instance—using quantum states and tensor networks.

Vin de Silva works on topological data analysis using persistent cohomology so he’ll probably talk about that. He’s studied the “interleaving distance” between persistence modules, using category theory to treat it and the Gromov-Hausdorff metric in the same setting. He came to the last meeting and it will be good to have him back.

Evan Patterson is a statistics grad student at Stanford. He’s worked on knowledge representation in bicategories of relations, and on teaching machines to understand data science code by the semantic enrichment of dataflow graphs. He too came to the last meeting.

Dmitry Vagner was also at the last meeting, where he spoke about his work with Spivak on open dynamical systems and the operad of wiring diagrams. He is now working on mathematically defining and implementing (in Idris) wiring diagrams for symmetric monoidal categories.

Prakash Panangaden has long been a leader in applied category theory, focused on semantics and logic for probabilistic systems and languages, machine learning, and quantum information theory.

Brad Theilman is a grad student in computational neuroscience at U.C. San Diego. I first met him at ACT2018. He’s using algebraic topology to design new techniques for quantifying the spatiotemporal structure of neural activity in the auditory regions of the brain of the European starling. (I bet you didn’t see those last two words coming!)

Last but not least, Zhenghan Wang works on condensed matter physics and modular tensor categories at U.C. Santa Barbara. At Microsoft’s Station Q, he is using this research to help design topological quantum computers.

In short: a lot has been happening in applied category theory, so it will be good to get together and talk about it!

August 07, 2019

Cylindrical OnionOne unforgettable week with Nobel laureates

Guest post by Nadezda Chernyavskaya - ETH Zurich

  Bavarian Evening at the Lindau Nobel Laureate Meetings - Picture Credit: Christian Flemming

August 06, 2019

Matt StrasslerA Catastrophic Weekend for Theoretical High Energy Physics

It is beyond belief that not only am I again writing a post about the premature death of a colleague whom I have known for decades, but that I am doing it about two of them.

Over the past weekend, two of the world’s most influential and brilliant theoretical high-energy physicists — Steve Gubser of Princeton University and Ann Nelson of the University of Washington — fell to their deaths in separate mountain accidents, one in the Alps and one in the Cascades.

Theoretical high energy physics is a small community, and within the United States itself the community is tiny.  Ann and Steve were both justifiably famous and highly respected as exceptionally bright lights in their areas of research. Even for those who had not met them personally, this is a stunning and irreplaceable loss of talent and of knowledge.

But most of us did know them personally.  For me, and for others with a personal connection to them, the news is devastating and tragic. I encountered Steve when he was a student and I was a postdoc in the Princeton area, and later helped bring him into a social group where he met his future wife (a great scientist in her own right, and a friend of mine going back decades).  As for Ann, she was one of my teachers at Stanford in graduate school, then my senior colleague on four long scientific papers, and then my colleague (along with her husband David B. Kaplan) for five years at the University of Washington, where she had the office next to mine. I cannot express what a privilege it always was to work with her, learn from her, and laugh with her.

I don’t have the heart or energy right now to write more about this, but I will try to do so at a later time. Right now I join their spouses and families, and my colleagues, in mourning.

August 04, 2019

Jordan EllenbergA lot of affluent liberals

Binyamin Appelbaum wrote an article in the New York Times about my native county, Montgomery County, Maryland, and this is what he tweeted about it:

A lot of affluent liberals in Montgomery County, outside Washington, D.C., fiercely opposed a plan to build a little more affordable housing. “Affordable is not what people move here for,” one of them told me.

The plan in question was approved unanimously by the County Council, all nine of whom were Democrats, but, as Appelbaum reported, not everyone in progressive Montgomery County was happy about it. He quoted several residents making remarks that made them look like, well, uptight snobs:

Ellen Paul, 59, said in-law suites were bad enough: “It’s changing suburbia to allow two homes on each lot. You’ll have strangers walking by your house all the time now.”

“That’s where the backyard trailers are going to go,” said Dale Barnhard, one of the more than 1,500 people who signed a petition opposing the “dramatic” change in rules.

or worse:

One county resident, Katherine C. Gugulis, wrote a protest letter in The Washington Post that concluded, “Just because others flee crime-ridden and poverty-stricken areas doesn’t mean Montgomery County has to be turned into a slum to accommodate them.”

I was interested in these affluent liberals and wanted to learn about them. A few minutes of Googling later, here’s what I found out. Katherine Gugulis is a Republican appointed official. Ellen Paul, according to her public LinkedIn profile, is a former staff assistant to a Republican member of the House of Representatives, and her most recent listed activity was public relations for a Republican candidate for Montgomery County Board of Education in 2014. Dale Barnhard doesn’t have any political career, as far as I know, but he wrote a letter to the Washington Post last year complaining about their biased coverage of Donald Trump. Hessie Harris, who worries aloud in Appelbaum’s piece about “flophouses” and literally utters the words “There goes the neighborhood,” is listed by the FEC as contributing thousands of dollars a year to Americans for Legal Immigration; that’s a PAC which describes its mission as “our fight AGAINST the costly and deadly illegal immigration & illegal immigrant invasion of America.”

These people aren’t liberals!

I don’t doubt there are liberals in Montgomery County who oppose relaxation of zoning. (I grew up there, and I live in Madison, WI: liberal NIMBYism is not a foreign idea to me.) But why weren’t any of those people in Appelbaum’s article? Why run a piece featuring a bunch of conservatives protesting a decision by an all-Democratic county council and bill it as a portrait of progressivism failing to live up to its ideals?

Here’s my theory. I don’t think Appelbaum purposely gathered quotes from Montgomery County’s small but nonzero Republican population for his piece. I think he had a story already in mind, a story of rich liberals who profess a commitment to affordable housing but really have a lot of contempt for the kind of person who lives there, and who would certainly under no circumstances stand for such people residing in Potomac or the nice parts of Bethesda, you know, the Whitman part. Those people might say their opposition to density had to do with something other than snobbery. But their words would show how they truly felt about the poorly-to-mediocrely-heeled.

And he got the quotes he wanted, the quotes that supported this story. Good, salty quotes. But the people who said those things weren’t self-styled progressives. They were Republicans.

Maybe there’s a reason for that!

August 03, 2019

Jordan EllenbergSnappy comeback

AB has curly hair, really curly hair, and strangers comment on it all the time. My stance on this is to tell her “it’s not really polite for people to randomly comment on your appearance, but it’s not impolite enough for you to be impolite back — just say thanks and move on.” Is that the right stance?

Anyway, though, today someone at the farmer’s market said “I would die to have hair like yours” and AB said, in a non-combative, sunny way, “How would that help you if you were dead?” and I was super proud.

August 02, 2019

Clifford JohnsonNews from the Front XIX: A-Masing de Sitter

(Image: diamond maser. Credit: Jonathan Breeze, Imperial College.) This is part 2 of a chat about some recent thoughts and results I had about de Sitter black holes, reported in this arxiv preprint. Part 1 is here, so maybe best to read that first.

Now let us turn to de Sitter black holes. I mean here any black hole for which the asymptotic spacetime is de Sitter spacetime, which is to say it has positive cosmological constant. This is of course also interesting since one of the most natural (to some minds) possible explanations for the accelerating expansion of our universe is a cosmological constant, so maybe all black holes in our universe are de Sitter black holes in some sense. This is also interesting because you often read here about explorations of physics involving negative cosmological constant, so this is a big change!

One of the things people find puzzling about applying the standard black hole thermodynamics is that there are two places where the standard techniques tell you there should be a temperature associated with them. There's the black hole horizon itself, and there's also the cosmological horizon. These each have temperature, and they are not necessarily the same. For the Schwarzschild-de Sitter black hole, for example, (so, no spins or charges... just a mass with an horizon associated with it, like in flat space), the black hole's temperature is always larger than that of the cosmological horizon. In fact, it runs from very large (where the black hole is small) all the way (as the black hole grows) to zero, where the two horizons coincide.

You might wonder, as many have, how to make sense of the two temperatures. This cannot, for a start, be an equilibrium thermodynamics system. Should there be dynamics where the two temperatures try to equalise? Is there heat flow from one horizon to another, perhaps? Maybe there's some missing ingredient needed to make sense of this - do we have any right to be writing down temperatures (an equilibrium thermodynamics concept, really) when the system is not in equilibrium? (Actually, you could ask that about Schwarzschild in flat space - you compute the temperature and then discover that it depends upon the mass in such a way that the system wants to move to a different temperature. But I digress.)

The point of my recent work is that it is entirely within the realm of physics we have to hand to make sense of this. The simple system described in the previous post - the three-level maser - has certain key interconnected features that seem relevant:

  • admits two distinct temperatures and
  • a maximum energy, and
  • a natural instability (population inversion) and a channel for doing work - the maser output.

My point is that these features are all present for de Sitter black holes too, starting with the two temperatures. But you won't see the rest by staring at just the Schwarzschild case, you need to add rotation, or charge (or both). As we shall see, the ability to reduce angular momentum, or to reduce charge, will be the work channel. I'll come back to the maximum [...]

The post News from the Front XIX: A-Masing de Sitter appeared first on Asymptotia.

Terence TaoTwisted convolution and the sensitivity conjecture

Earlier this month, Hao Huang (who, incidentally, was a graduate student here at UCLA) gave a remarkably short proof of a long-standing problem in theoretical computer science known as the sensitivity conjecture. See for instance this blog post of Gil Kalai for further discussion and links to many other online discussions of this result. One formulation of the theorem proved is as follows. Define the {n}-dimensional hypercube graph {Q_n} to be the graph with vertex set {({\bf Z}/2{\bf Z})^n}, and with every vertex {v \in ({\bf Z}/2{\bf Z})^n} joined to the {n} vertices {v + e_1,\dots,v+e_n}, where {e_1,\dots,e_n} is the standard basis of {({\bf Z}/2{\bf Z})^n}.

Theorem 1 (Lower bound on maximum degree of induced subgraphs of hypercube) Let {E} be a set of at least {2^{n-1}+1} vertices in {Q_n}. Then there is a vertex in {E} that is adjacent (in {Q_n}) to at least {\sqrt{n}} other vertices in {E}.

The bound {\sqrt{n}} (or more precisely, {\lceil \sqrt{n} \rceil}) is completely sharp, as shown by Chung, Füredi, Graham, and Seymour; we describe this example below the fold. When combined with earlier reductions of Gotsman-Linial and Nisan-Szegedy, this theorem implies the sensitivity conjecture; we give these reductions below the fold also.

Let {A = (a_{vw})_{v,w \in ({\bf Z}/2{\bf Z})^n}} be the adjacency matrix of {Q_n} (where we index the rows and columns directly by the vertices in {({\bf Z}/2{\bf Z})^n}, rather than selecting some enumeration {1,\dots,2^n}), thus {a_{vw}=1} when {w = v+e_i} for some {i=1,\dots,n}, and {a_{vw}=0} otherwise. The above theorem then asserts that if {E} is a set of at least {2^{n-1}+1} vertices, then the {E \times E} minor {(a_{vw})_{v,w \in E}} of {A} has a row (or column) that contains at least {\sqrt{n}} non-zero entries.

The key step to prove this theorem is the construction of a rather curious variant {\tilde A} of the adjacency matrix {A}:

Proposition 2 There exists a {({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n} matrix {\tilde A = (\tilde a_{vw})_{v,w \in ({\bf Z}/2{\bf Z})^n}} which is entrywise dominated by {A} in the sense that

\displaystyle  |\tilde a_{vw}| \leq a_{vw} \hbox{ for all } v,w \in ({\bf Z}/2{\bf Z})^n \ \ \ \ \ (1)

and such that {\tilde A} has {\sqrt{n}} as an eigenvalue with multiplicity {2^{n-1}}.

Assuming this proposition, the proof of Theorem 1 can now be quickly concluded. If we view {\tilde A} as a linear operator on the {2^n}-dimensional space {\ell^2(({\bf Z}/2{\bf Z})^n)} of functions of {({\bf Z}/2{\bf Z})^n}, then by hypothesis this space has a {2^{n-1}}-dimensional subspace {V} on which {\tilde A} acts by multiplication by {\sqrt{n}}. If {E} is a set of at least {2^{n-1}+1} vertices in {Q_n}, then the space {\ell^2(E)} of functions on {E} has codimension at most {2^{n-1}-1} in {\ell^2(({\bf Z}/2{\bf Z})^n)}, and hence intersects {V} non-trivially. Thus the {E \times E} minor {\tilde A_E} of {\tilde A} also has {\sqrt{n}} as an eigenvalue (this can also be derived from the Cauchy interlacing inequalities), and in particular this minor has operator norm at least {\sqrt{n}}. By Schur’s test, this implies that one of the rows or columns of this matrix has absolute values summing to at least {\sqrt{n}}, giving the claim.
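
For small {n}, Theorem 1 can also be confirmed by exhaustive search. Here is a quick illustrative sketch in Python (not part of the argument above; all names are ad hoc), checking every 5-element vertex set in {Q_3}:

```python
from itertools import combinations
from math import ceil, sqrt

n = 3
N = 1 << n  # vertices of Q_n encoded as n-bit integers

def neighbors(v):
    # flipping one coordinate moves to an adjacent vertex of Q_n
    return [v ^ (1 << i) for i in range(n)]

# degrees are integers, so "at least sqrt(n)" means "at least ceil(sqrt(n))"
bound = ceil(sqrt(n))
ok = all(
    any(sum(w in E for w in neighbors(v)) >= bound for v in E)
    for E in (set(c) for c in combinations(range(N), 2 ** (n - 1) + 1))
)
print(ok)  # True
```

Every one of the {\binom{8}{5} = 56} subsets indeed contains a vertex with at least {\lceil \sqrt{3} \rceil = 2} neighbors inside the subset.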

Remark 3 The argument actually gives a strengthening of Theorem 1: there exists a vertex {v_0} of {E} with the property that for every natural number {k}, there are at least {n^{k/2}} paths of length {k} in the restriction {Q_n|_E} of {Q_n} to {E} that start from {v_0}. Indeed, if we let {(u_v)_{v \in E}} be an eigenfunction of {\tilde A} on {\ell^2(E)}, and let {v_0} be a vertex in {E} that maximises the value of {|u_{v_0}|}, then for any {k} we have that the {v_0} component of {\tilde A_E^k (u_v)_{v \in E}} is equal to {n^{k/2} |u_{v_0}|}; on the other hand, by the triangle inequality, this component is at most {|u_{v_0}|} times the number of length {k} paths in {Q_n|_E} starting from {v_0}, giving the claim.

This argument can be viewed as an instance of a more general “interlacing method” to try to control the behaviour of a graph {G} on all large subsets {E} by first generating a matrix {\tilde A} on {G} with very good spectral properties, which are then partially inherited by the {E \times E} minor of {\tilde A} by interlacing inequalities. In previous literature using this method (see e.g., this survey of Haemers, or this paper of Wilson), either the original adjacency matrix {A}, or some non-negatively weighted version of that matrix, was used as the controlling matrix {\tilde A}; the novelty here is the use of signed controlling matrices. It will be interesting to see what further variants and applications of this method emerge in the near future. (Thanks to Anurag Bishoi in the comments for these references.)

The “magic” step in the above argument is constructing {\tilde A}. In Huang’s paper, {\tilde A} is constructed recursively in the dimension {n} in a rather simple but mysterious fashion. Very recently, Roman Karasev gave an interpretation of this matrix in terms of the exterior algebra on {{\bf R}^n}. In this post I would like to give an alternate interpretation in terms of the operation of twisted convolution, which originated in the theory of the Heisenberg group in quantum mechanics.

Firstly note that the original adjacency matrix {A}, when viewed as a linear operator on {\ell^2(({\bf Z}/2{\bf Z})^n)}, is a convolution operator

\displaystyle  A f = f * \mu

where

\displaystyle  \mu(x) := \sum_{i=1}^n 1_{x=e_i}

is the counting measure on the standard basis {e_1,\dots,e_n}, and {*} denotes the ordinary convolution operation

\displaystyle  f * g(x) := \sum_{y \in ({\bf Z}/2{\bf Z})^n} f(y) g(x-y) = \sum_{y_1+y_2 = x} f(y_1) g(y_2).

As is well known, this operation is commutative and associative. Thus for instance the square {A^2} of the adjacency operator {A} is also a convolution operator

\displaystyle  A^2 f = f * (\mu * \mu)

where the convolution kernel {\mu * \mu} is moderately complicated:

\displaystyle  \mu*\mu(x) = n \times 1_{x=0} + \sum_{1 \leq i < j \leq n} 2 \times 1_{x = e_i + e_j}.

The factor {2} in this expansion comes from combining the two terms {1_{x=e_i} * 1_{x=e_j}} and {1_{x=e_j} * 1_{x=e_i}}, which both evaluate to {1_{x=e_i+e_j}}.

More generally, given any bilinear form {B: ({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n \rightarrow {\bf Z}/2{\bf Z}}, one can define the twisted convolution

\displaystyle  f *_B g(x) := \sum_{y \in ({\bf Z}/2{\bf Z})^n} (-1)^{B(y,x-y)} f(y) g(x-y)

\displaystyle  = \sum_{y_1+y_2=x} (-1)^{B(y_1,y_2)} f(y_1) g(y_2)

of two functions {f,g \in \ell^2(({\bf Z}/2{\bf Z})^n)}. This operation is no longer commutative (unless {B} is symmetric). However, it remains associative; indeed, one can easily compute that

\displaystyle  (f *_B g) *_B h(x) = f *_B (g *_B h)(x)

\displaystyle = \sum_{y_1+y_2+y_3=x} (-1)^{B(y_1,y_2)+B(y_1,y_3)+B(y_2,y_3)} f(y_1) g(y_2) h(y_3).
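
This associativity is also easy to confirm numerically. A small illustrative sketch (the particular bilinear form below is one arbitrary non-symmetric choice, and the function values are random integers):

```python
import itertools
import random

n = 3
V = list(itertools.product([0, 1], repeat=n))

def B(x, y):
    # one arbitrary (non-symmetric) bilinear form on (Z/2Z)^n
    return sum(x[i] * y[j] for i in range(n) for j in range(i + 1, n)) % 2

def add(x, y):
    return tuple(a ^ b for a, b in zip(x, y))

def tconv(f, g):
    # twisted convolution: f *_B g (x) = sum_y (-1)^{B(y, x-y)} f(y) g(x-y)
    return {x: sum((-1) ** B(y, add(x, y)) * f[y] * g[add(x, y)] for y in V)
            for x in V}

random.seed(1)
f, g, h = ({v: random.randint(-3, 3) for v in V} for _ in range(3))
print(tconv(tconv(f, g), h) == tconv(f, tconv(g, h)))  # True
```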

In particular, if we define the twisted convolution operator

\displaystyle  A_B f(x) := f *_B \mu(x)

then the square {A_B^2} is also a twisted convolution operator

\displaystyle  A_B^2 f = f *_B (\mu *_B \mu)

and the twisted convolution kernel {\mu *_B \mu} can be computed as

\displaystyle  \mu *_B \mu(x) = (\sum_{i=1}^n (-1)^{B(e_i,e_i)}) 1_{x=0}

\displaystyle + \sum_{1 \leq i < j \leq n} ((-1)^{B(e_i,e_j)} + (-1)^{B(e_j,e_i)}) 1_{x=e_i+e_j}.

For general bilinear forms {B}, this twisted convolution is just as messy as {\mu * \mu} is. But if we take the specific bilinear form

\displaystyle  B(x,y) := \sum_{1 \leq i < j \leq n} x_i y_j \ \ \ \ \ (2)

then {B(e_i,e_i)=0} for {1 \leq i \leq n} and {B(e_i,e_j)=1, B(e_j,e_i)=0} for {1 \leq i < j \leq n}, and the above twisted convolution simplifies to

\displaystyle  \mu *_B \mu(x) = n 1_{x=0}

and now {A_B^2} is very simple:

\displaystyle  A_B^2 f = n f.

Thus the only eigenvalues of {A_B} are {+\sqrt{n}} and {-\sqrt{n}}. The matrix {A_B} is entrywise dominated by {A} in the sense of (1), and in particular has trace zero; thus the {+\sqrt{n}} and {-\sqrt{n}} eigenvalues must occur with equal multiplicity, so in particular the {+\sqrt{n}} eigenvalue occurs with multiplicity {2^{n-1}} since the matrix has dimensions {2^n \times 2^n}. This establishes Proposition 2.
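These algebraic facts are easy to confirm numerically for a small {n} (a sketch of our own, not from the post): build the matrix of {A_B} with {B} given by (2), and check {A_B^2 = nI}, zero trace, and the {\pm\sqrt{n}} eigenvalues with equal multiplicity.

```python
import itertools

import numpy as np

n = 4
cube = list(itertools.product([0, 1], repeat=n))
index = {x: k for k, x in enumerate(cube)}
N = len(cube)  # 2^n

def B(x, y):
    # The bilinear form (2): B(x, y) = sum_{i < j} x_i y_j mod 2.
    return sum(x[i] * y[j] for i in range(n) for j in range(i + 1, n)) % 2

# A_B f(x) = f *_B mu(x) = sum_i (-1)^{B(x + e_i, e_i)} f(x + e_i),
# so the matrix entry linking x and y = x + e_i is (-1)^{B(y, e_i)}.
A = np.zeros((N, N))
for x in cube:
    for i in range(n):
        e_i = tuple(1 if j == i else 0 for j in range(n))
        y = tuple((a + b) % 2 for a, b in zip(x, e_i))
        A[index[x], index[y]] = (-1) ** B(y, e_i)

# A is symmetric, since B(x + e_i, e_i) = B(x, e_i) when B(e_i, e_i) = 0.
eigs = np.linalg.eigvalsh(A)
```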

Remark 4 Twisted convolution {*_B} is actually just a component of ordinary convolution, but not on the original group {({\bf Z}/2{\bf Z})^n}; instead it relates to convolution on a Heisenberg group extension of this group. More specifically, define the Heisenberg group {H} to be the set of pairs {(x, t) \in ({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})} with group law

\displaystyle  (x,t) \cdot (y,s) := (x+y, t+s+B(x,y))

and inverse operation

\displaystyle  (x,t)^{-1} = (-x, -t+B(x,x))

(one can dispense with the negative signs here if desired, since we are in characteristic two). Convolution on {H} is defined in the usual manner: one has

\displaystyle  F*G( (x,t) ) := \sum_{(y,s) \in H} F(y,s) G( (y,s)^{-1} (x,t) )

for any {F,G \in \ell^2(H)}. Now if {f \in \ell^2(({\bf Z}/2{\bf Z})^n)} is a function on the original group {({\bf Z}/2{\bf Z})^n}, we can define the lift {\tilde f \in \ell^2(H)} by the formula

\displaystyle  \tilde f(x,t) := (-1)^t f(x)

and then by chasing all the definitions one soon verifies that

\displaystyle  \tilde f * \tilde g = 2 \widetilde{f *_B g}

for any {f,g \in \ell^2(({\bf Z}/2{\bf Z})^n)}, thus relating twisted convolution {*_B} to Heisenberg group convolution {*}.

Remark 5 With the twisting by the specific bilinear form {B} given by (2), convolution by {1_{x=e_i}} and {1_{x=e_j}} now anticommute rather than commute. This makes the twisted convolution algebra {(\ell^2(({\bf Z}/2{\bf Z})^n), *_B)} isomorphic to a Clifford algebra {Cl({\bf R}^n,I_n)} (the real or complex algebra generated by formal generators {v_1,\dots,v_n} subject to the relations {(v_iv_j+v_jv_i)/2 = 1_{i=j}} for {i,j=1,\dots,n}) rather than the commutative algebra more familiar to abelian Fourier analysis. This connection to Clifford algebra (also observed independently by Tom Mrowka and by Daniel Matthews) may be linked to the exterior algebra interpretation of the argument in the recent preprint of Karasev mentioned above.
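The anticommutation can also be seen directly (a small numerical sketch of ours): under {*_B}, the delta functions at the basis vectors {e_i} satisfy exactly the Clifford relations {\delta_i *_B \delta_j + \delta_j *_B \delta_i = 2 \cdot 1_{i=j} \delta_0}.

```python
import itertools

import numpy as np

n = 3
cube = list(itertools.product([0, 1], repeat=n))
index = {x: k for k, x in enumerate(cube)}

def add(x, y):
    return tuple((a + b) % 2 for a, b in zip(x, y))

def B(x, y):
    # The bilinear form (2) on (Z/2Z)^n.
    return sum(x[i] * y[j] for i in range(n) for j in range(i + 1, n)) % 2

def tconv(f, g):
    # Twisted convolution f *_B g(x) = sum_y (-1)^{B(y, x - y)} f(y) g(x - y).
    h = np.zeros(len(cube))
    for x in cube:
        h[index[x]] = sum(
            (-1) ** B(y, add(x, y)) * f[index[y]] * g[index[add(x, y)]]
            for y in cube
        )
    return h

def delta(x):
    d = np.zeros(len(cube))
    d[index[x]] = 1.0
    return d

basis = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
zero = (0,) * n

# Check d_i *_B d_j + d_j *_B d_i = 2 * 1_{i=j} * delta_0 for all i, j.
ok = all(
    np.array_equal(
        tconv(delta(basis[i]), delta(basis[j]))
        + tconv(delta(basis[j]), delta(basis[i])),
        (2.0 if i == j else 0.0) * delta(zero),
    )
    for i in range(n)
    for j in range(n)
)
```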

Remark 6 One could replace the form (2) in this argument by any other bilinear form {B'} that obeyed the relations {B'(e_i,e_i)=0} and {B'(e_i,e_j) + B'(e_j,e_i)=1} for {i \neq j}. However, this additional level of generality does not add much; any such {B'} will differ from {B} by an antisymmetric form {C} (so that {C(x,x) = 0} for all {x}, which in characteristic two implies that {C(x,y) = C(y,x)} for all {x,y}), and such forms can always be decomposed as {C(x,y) = C'(x,y) + C'(y,x)}, where {C'(x,y) := \sum_{i<j} C(e_i,e_j) x_i y_j}. As such, the matrices {A_B} and {A_{B'}} are conjugate, with the conjugation operator being the diagonal matrix with entries {(-1)^{C'(x,x)}} at each vertex {x}.

Remark 7 (Added later) This remark combines the two previous remarks. One can view any of the matrices {A_{B'}} in Remark 6 as components of a single canonical matrix {A_{Cl}} that is still of dimensions {({\bf Z}/2{\bf Z})^n \times ({\bf Z}/2{\bf Z})^n}, but takes values in the Clifford algebra {Cl({\bf R}^n,I_n)} from Remark 5; with this “universal algebra” perspective, one no longer needs to make any arbitrary choices of form {B}. More precisely, let {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))} denote the vector space of functions {f: ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n)} from the hypercube to the Clifford algebra; as a real vector space, this is a {2^{2n}} dimensional space, isomorphic to the direct sum of {2^n} copies of {\ell^2(({\bf Z}/2{\bf Z})^n)}, as the Clifford algebra is itself {2^n} dimensional. One can then define a canonical Clifford adjacency operator {A_{Cl}} on this space by

\displaystyle  A_{Cl} f(x) := \sum_{i=1}^n f(x+e_i) v_i

where {v_1,\dots,v_n} are the generators of {Cl({\bf R}^n,I_n)}. This operator can either be identified with a Clifford-valued {2^n \times 2^n} matrix or as a real-valued {2^{2n} \times 2^{2n}} matrix. In either case one still has the key algebraic relations {A_{Cl}^2 = n} and {\mathrm{tr} A_{Cl} = 0}, ensuring that when viewed as a real {2^{2n} \times 2^{2n}} matrix, half of the eigenvalues are equal to {+\sqrt{n}} and half equal to {-\sqrt{n}}. One can then use this matrix in place of any of the {A_{B'}} to establish Theorem 1 (noting that Schur’s test continues to work for Clifford-valued matrices because of the norm structure on {Cl({\bf R}^n,I_n)}).

To relate {A_{Cl}} to the real {2^n \times 2^n} matrices {A_{B'}}, first observe that each point {x} in the hypercube {({\bf Z}/2{\bf Z})^n} can be associated with a one-dimensional real subspace {\ell_x} (i.e., a line) in the Clifford algebra {Cl({\bf R}^n,I_n)} by the formula

\displaystyle  \ell_{e_{i_1} + \dots + e_{i_k}} := \mathrm{span}_{\bf R}( v_{i_1} \dots v_{i_k} )

for any {i_1,\dots,i_k \in \{1,\dots,n\}} (note that this definition is well-defined even if the {i_1,\dots,i_k} are out of order or contain repetitions). This can be viewed as a discrete line bundle over the hypercube. Since {\ell_{x+e_i} = \ell_x e_i} for any {i}, we see that the {2^n}-dimensional real linear subspace {V} of {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))} of sections of this bundle, that is to say the space of functions {f: ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n)} such that {f(x) \in \ell_x} for all {x \in ({\bf Z}/2{\bf Z})^n}, is an invariant subspace of {A_{Cl}}. (Indeed, using the left-action of the Clifford algebra on {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))}, which commutes with {A_{Cl}}, one can naturally identify {\ell^2( ({\bf Z}/2{\bf Z})^n \rightarrow Cl({\bf R}^n,I_n))} with {Cl({\bf R}^n,I_n) \otimes V}, with the left action of {Cl({\bf R}^n,I_n)} acting purely on the first factor and {A_{Cl}} acting purely on the second factor.) Any trivialisation of this line bundle lets us interpret the restriction {A_{Cl}|_V} of {A_{Cl}} to {V} as a real {2^n \times 2^n} matrix. In particular, given one of the bilinear forms {B'} from Remark 6, we can identify {V} with {\ell^2(({\bf Z}/2{\bf Z})^n)} by identifying any real function {f \in \ell^2( ({\bf Z}/2{\bf Z})^n)} with the lift {\tilde f \in V} defined by

\displaystyle  \tilde f(e_{i_1} + \dots + e_{i_k}) := (-1)^{\sum_{1 \leq j < j' \leq k} B'(e_{i_j}, e_{i_{j'}})}

\displaystyle f(e_{i_1} + \dots + e_{i_k}) v_{i_1} \dots v_{i_k}

whenever {1 \leq i_1 < \dots < i_k \leq n}. A somewhat tedious computation using the properties of {B'} then eventually gives the intertwining identity

\displaystyle  A_{Cl} \tilde f = \widetilde{A_{B'} f}

and so {A_{B'}} is conjugate to {A_{Cl}|_V}.

— 1. The Chung, Furedi, Graham, and Seymour example —

The paper of Chung, Furedi, Graham, and Seymour gives, for any {n \geq 1}, an example of a subset {E} of {Q_n} of cardinality {2^{n-1}+1} for which the maximum degree of the subgraph of {Q_n} induced by {E} is at most {\lceil \sqrt{n} \rceil}, thus showing that Theorem 1 cannot be improved (beyond the trivial improvement of upgrading {\sqrt{n}} to {\lceil \sqrt{n} \rceil}, because the maximum degree is obviously a natural number).

Define the “Möbius function” {\mu: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}} to be the function

\displaystyle  \mu(x) := (-1)^{x_1+\dots+x_n}

for {x = (x_1,\dots,x_n) \in ({\bf Z}/2{\bf Z})^n}. This function is extremely balanced on coordinate spaces. Indeed, from the binomial theorem (which uses the convention {0^0=1}) we have

\displaystyle  \sum_{x \in ({\bf Z}/2{\bf Z})^n} \mu(x) = (1-1)^n = 1_{n=0}.

More generally, given any index set {I \subset \{1,\dots,n\}} of cardinality {|I|}, we have

\displaystyle  \sum_{x \in ({\bf Z}/2{\bf Z})^n: x_i=0 \forall i \in I} \mu(x) = (1-1)^{n-|I|} = 1_{n=|I|}.

Now let {I_1,\dots,I_k} be a partition of {\{1,\dots,n\}} into disjoint non-empty sets. For each {j = 1,\dots,k}, let {V_j} be the subspace of {({\bf Z}/2{\bf Z})^n} consisting of those {x = (x_1,\dots,x_n)} such that {x_i = 0} for all {i \in I_j}. Then for any {J \subset \{1,\dots,k\}}, we have

\displaystyle  \sum_{x \in \bigcap_{j \in J} V_j} \mu(x) = (1-1)^{n - \sum_{j \in J} |I_j|} = 1_{n = \sum_{j \in J} |I_j|},

and the right-hand side vanishes if {J \neq \{1,\dots,k\}} and equals {1} when {J = \{1,\dots,k\}}. Applying the inclusion-exclusion principle, we conclude that

\displaystyle  \sum_{x \in ({\bf Z}/2{\bf Z})^n \backslash \bigcup_{j=1}^k V_j} \mu(x) = (-1)^k

and thus also (assuming {n \geq 1})

\displaystyle  \sum_{x \in \bigcup_{j=1}^k V_j} \mu(x) = (-1)^{k+1}

so that

\displaystyle  \sum_{x \in ({\bf Z}/2{\bf Z})^n \backslash \bigcup_{j=1}^k V_j} (-1)^k \mu(x) + \sum_{x \in \bigcup_{j=1}^k V_j} (-1)^{k+1} \mu(x) = 2.

Thus, if {E} denotes the set of those {x \in ({\bf Z}/2{\bf Z})^n \backslash \bigcup_{j=1}^k V_j} with {(-1)^k \mu(x) = 1}, together with those {x \in \bigcup_{j=1}^k V_j} with {(-1)^{k+1} \mu(x) = 1}, then {E} has to have two more elements than its complement {({\bf Z}/2{\bf Z})^n \backslash E}, and hence {E} has cardinality {2^{n-1}+1}.

Now observe that, if {x \in \bigcup_{j=1}^k V_j} with {(-1)^{k+1} \mu(x)=1} and {i = 1,\dots,n}, then {(-1)^k \mu(x+e_i)=1}, and if {x \in V_j} then {x+e_i \in V_j} unless {i \in I_j}. Thus in this case the total number of {i} for which {x+e_i \in E} is at most {\max_j |I_j|}. Conversely, if {x \not \in \bigcup_{j=1}^k V_j} with {(-1)^k \mu(x)=1} and {i=1,\dots,n}, then {(-1)^{k+1} \mu(x+e_i)=1}, and for each {j} there is at most one {i} that will make {x+e_i} lie in {V_j}. Hence in this case the total number of {i} for which {x+e_i \in E} is at most {k}. Thus the maximum degree of the subgraph of {Q_n} induced by {E} is at most {\max( \max_j |I_j|, k )}. By choosing the {I_j} to be a partition of {\{1,\dots,n\}} into {k=\lceil \sqrt{n} \rceil} pieces, each of cardinality at most {\lceil \sqrt{n} \rceil}, we obtain the claim.
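The construction is concrete enough to verify by computer. The following sketch (our own; coordinates are 0-indexed stand-ins for {I_1,\dots,I_k}) builds {E} for {n=9}, {k=3} and confirms both the cardinality {2^{n-1}+1} and the maximum induced degree {\lceil \sqrt{n} \rceil = 3}.

```python
import itertools

n, k = 9, 3  # n = k^2 a perfect square, so ceil(sqrt(n)) = k
# Partition of the coordinates into k blocks of size k, playing the role of I_1, ..., I_k.
blocks = [set(range(j * k, (j + 1) * k)) for j in range(k)]

def mu(x):
    # The "Mobius function" (-1)^{x_1 + ... + x_n}.
    return (-1) ** sum(x)

def in_union(x):
    # x lies in V_j iff x vanishes on every coordinate of block j.
    return any(all(x[i] == 0 for i in blk) for blk in blocks)

cube = list(itertools.product([0, 1], repeat=n))
# E: points outside the union with (-1)^k mu(x) = 1, points inside with (-1)^{k+1} mu(x) = 1.
E = set(
    x for x in cube
    if ((-1) ** k) * mu(x) == (1 if not in_union(x) else -1)
)

def flip(x, i):
    return tuple(b ^ (1 if j == i else 0) for j, b in enumerate(x))

# Maximum degree of the subgraph of Q_n induced by E.
max_degree = max(sum(flip(x, i) in E for i in range(n)) for x in E)
```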

Remark 8 Suppose that {n = k^2} is a perfect square; then the lower bound here exactly matches the upper bound in Theorem 1. In particular, the {E \times E} minor of the matrix {\tilde A} must have an eigenvector of eigenvalue {\sqrt{n}}. Such an eigenvector can be explicitly constructed as follows. Let {f \in \ell^2(E)} be the vector defined by setting

\displaystyle  f(x) := (-1)^{\sum_{1 \leq j < j' \leq k} B( e_{i_j}, e_{i_{j'}} )}

whenever {x} is of the form

\displaystyle  x = \sum_{j=1}^k e_{i_j} \ \ \ \ \ (3)

for some {i_j \in I_j}, {j =1,\dots,k}, and

\displaystyle  f(x) := (-1)^{\sum_{1 \leq j < j' \leq k: j,j' \neq j_0} B( e_{i_j}, e_{i_{j'}} ) + k - j_0}

whenever {x} is of the form

\displaystyle  x = \sum_{1 \leq j \leq k: j \neq j_0} e_{i_j} \ \ \ \ \ (4)

for some {i_j \in I_j}, {j =1,\dots,k}, {j \neq j_0}, and {f(x)=0} for all other {x \in E} (one easily verifies that the previous types of {x} lie in {E}). We claim that

\displaystyle  A_B f(x) = k f(x)

for all {x \in E}. Expanding out the left-hand side, we wish to show that

\displaystyle  \sum_{1 \leq i \leq n: x+e_i \in E} (-1)^{B(x,e_i)} f(x+e_i) = k f(x) \ \ \ \ \ (5)

for all {x \in E}.

First suppose that {x} is of the form (3). One checks that {x+e_i} lies in {E} precisely when {i = i_{j_0}} for one of the {j_0=1,\dots,k}, in which case

\displaystyle  f(x+e_{i_{j_0}}) = f(x) (-1)^{\sum_{1 \leq j < j_0} B( e_{i_j}, e_{i_{j_0}}) + \sum_{j_0 < j \leq k} B(e_{i_{j_0}}, e_{i_j}) + k-j_0}.

Since {B(e_{i_{j_0}}, e_{i_j}) = B(e_{i_j}, e_{i_{j_0}}) + 1}, this simplifies using (3) to

\displaystyle  (-1)^{B(x,e_{i_{j_0}})} f(x + e_{i_{j_0}}) = f(x)

giving (5) in this case. Similarly, if {x} is of the form (4), then {x+e_i} lies in {E} precisely when {i \in I_{j_0}}, in which case one can argue as before to show that

\displaystyle  (-1)^{B(x,e_{i})} f(x + e_{i}) = f(x)

and (5) again follows. Finally, if {x \in E} is not of either of the two forms (3), (4), one can check that {x+e_i} is never of these forms either, and so both sides of (5) vanish.

The same analysis works for any of the other bilinear forms {B'} in Remark 6. Using the Clifford-valued operator {A_{Cl}} from Remark 7, the eigenfunction {f \in \ell^2( E \rightarrow Cl({\bf R}^n,I_n) )} is cleaner; it is defined by

\displaystyle  f(x) := v_{i_1} \dots v_{i_k}

when {x} is of the form (3), and

\displaystyle  f(x) := (-1)^{k-j_0} v_{i_1} \dots v_{i_{j_0-1}} v_{i_{j_0+1}} \dots v_{i_k}

when {x} is of the form (4), with {f(x)=0} otherwise.

— 2. From induced subgraph bounds to the sensitivity conjecture —

On the hypercube {({\bf Z}/2{\bf Z})^n}, let {X_1,\dots,X_n: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}} denote the functions

\displaystyle  X_i(x) := (-1)^{x_i}.

The monomials {\prod_{i \in I} X_i} in {X_i} are then the characters of {({\bf Z}/2{\bf Z})^n}, so by Fourier expansion every function {f \in \ell^2(({\bf Z}/2{\bf Z})^n)} can be viewed as a polynomial in the {X_i} (with each monomial containing at most one copy of {X_i}; higher powers of each {X_i} are unnecessary since {X_i^2=1}). In particular, one can meaningfully talk about the degree of a function {f \in \ell^2(({\bf Z}/2{\bf Z})^n)}. Observe also that the Möbius function {\mu} from the preceding section is just the monomial {X_1 \dots X_n}.

Define the sensitivity {s(f)} of a Boolean function {f: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}} to be the largest number for which there is an {x} such that there are at least {s(f)} values of {i = 1,\dots,n} with {f(x+e_i) \neq f(x)}. Using an argument of Gotsman and Linial, we can now relate the sensitivity of a function to its degree:

Corollary 9 (Lower bound on sensitivity) For any boolean function {f: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}}, one has {s(f) \geq \sqrt{\mathrm{deg}(f)}}.

Proof: Write {m = \mathrm{deg}(f)}. By permuting the indices, we may assume that {f} contains a non-trivial multiple of the monomial {X_1 \dots X_m}. By restricting to the subspace {({\bf Z}/2{\bf Z})^m \times \{0\}^{n-m} \equiv ({\bf Z}/2{\bf Z})^m} (which cannot increase the sensitivity), we may then assume without loss of generality that {\mathrm{deg}(f)=m=n}. The Fourier coefficient of {X_1 \dots X_n} is just the mean value

\displaystyle  \frac{1}{2^n} \sum_{x \in ({\bf Z}/2{\bf Z})^n} f(x) X_1 \dots X_n = \frac{1}{2^n} \sum_{x \in ({\bf Z}/2{\bf Z})^n} f(x) \mu(x)

of {f} times the Möbius function {\mu}, so this mean value is non-zero. This means that one of the sets {\{ \mu(x) f(x) = +1\}} or {\{\mu(x) f(x)=-1\}} has cardinality at least {2^{n-1}+1}. Let {E} denote the larger of these two sets. By Theorem 1, there is an {x \in E} such that {x + e_i \in E} for at least {\sqrt{n}} values of {i}; since {\mu(x+e_i) = -\mu(x)}, this implies that {f(x+e_i) \neq f(x)} for at least {\sqrt{n}} values of {i}, giving the claim. \Box
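Corollary 9 is easy to test exhaustively in small dimension. The sketch below (ours, not from the post) computes the degree of a Boolean function from its Fourier coefficients and its sensitivity by direct search, and checks {s(f)^2 \geq \mathrm{deg}(f)} on random functions for {n=4}.

```python
import itertools
import random

n = 4
cube = list(itertools.product([0, 1], repeat=n))

def flip(x, i):
    return tuple(b ^ (1 if j == i else 0) for j, b in enumerate(x))

def degree(f):
    # deg(f) = largest |S| whose character chi_S(x) = (-1)^{sum_{i in S} x_i}
    # has a nonzero Fourier coefficient (exact integer arithmetic throughout).
    d = 0
    for S in itertools.product([0, 1], repeat=n):
        coeff = sum(f[x] * (-1) ** sum(s * a for s, a in zip(S, x)) for x in cube)
        if coeff != 0:
            d = max(d, sum(S))
    return d

def sensitivity(f):
    # Largest number of sensitive coordinates at any single point x.
    return max(sum(f[flip(x, i)] != f[x] for i in range(n)) for x in cube)

random.seed(0)
holds = all(
    sensitivity(f) ** 2 >= degree(f)
    for f in ({x: random.choice([-1, 1]) for x in cube} for _ in range(200))
)
```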

The construction of Chung, Furedi, Graham, and Seymour from the previous section can be easily adapted to show that this lower bound is tight (other than the trivial improvement of replacing {\sqrt{\mathrm{deg}(f)}} by {\lceil \sqrt{\mathrm{deg}(f)} \rceil}).

Now we need to digress on some bounds involving polynomials of one variable. We begin with an inequality of Bernstein concerning trigonometric polynomials:

Lemma 10 (Bernstein inequality) Let {P(\theta)} be a trigonometric polynomial of degree at most {n}, that is to say a complex linear combination of {\sin(k \theta), \cos(k \theta)} for {k=0,\dots,n}. Then

\displaystyle  \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P'(\theta)| \leq n \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P(\theta)|.

Observe that equality holds when {P(\theta)=\sin(n\theta)} or {P(\theta) = \cos(n\theta)}. Specialising to linear combinations of {e^{ik\theta} = \cos(k\theta) + i \sin(k \theta)}, we obtain the classical Bernstein inequality

\displaystyle  \sup_{|z|=1} |P'(z)| \leq n \sup_{|z|=1} |P(z)|

for complex polynomials of degree at most {n}.

Proof: If one is willing to lose a constant factor in this estimate, this bound can be easily established from modern Littlewood-Paley theory (see e.g., Exercise 52 of these lecture notes). Here we use an interlacing argument due to Boas. We first restrict to the case when {P} has real coefficients. We may normalise {\sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P(\theta)| =1}. Let {\lambda} be a real parameter in {(-1,1)}. The trigonometric polynomial {\cos(n\theta)} alternately takes the values {+1} and {-1} at the {2n} values {\frac{j\pi}{n}, j=0,\dots,2n-1}. Thus the trigonometric polynomial {\cos(n\theta) + \lambda P(\theta)} alternates in sign at these values, and thus by the intermediate value theorem has a zero on each of the intervals {[\frac{j\pi}{n}, \frac{(j+1)\pi}{n}]}. On the other hand, a trigonometric polynomial {P(\theta)} of degree at most {n} can be expressed by de Moivre’s theorem as {e^{-in\theta}} times a complex polynomial in {e^{i\theta}} of degree at most {2n}, and thus has at most {2n} zeroes. Thus we see that {\cos(n\theta) + \lambda P(\theta)} has exactly one zero in each {[\frac{j\pi}{n}, \frac{(j+1)\pi}{n}]}. Furthermore, at this zero, the derivative {-n \sin(n \theta) + \lambda P'(\theta)} of this function must be positive if {\cos(n\theta)} is increasing on this interval, and negative if {\cos(n\theta)} is decreasing on this interval. In summary, we have shown that if {\lambda \in (-1,1)} and {\theta} is such that {\cos(n\theta) + \lambda P(\theta)=0}, then {-n \sin(n \theta) + \lambda P'(\theta)} has the same sign as {-n\sin(n\theta)}. By translating the function {\cos(n\theta)}, we also conclude that if {\lambda \in (-1,1)} and {\theta} is such that {\cos(n(\theta+\alpha)) + \lambda P(\theta)=0} for some {\alpha \in {\bf R}}, then {-n \sin(n(\theta+\alpha)) + \lambda P'(\theta)} has the same sign as {-n\sin(n(\theta+\alpha))}.

If {\lambda \in (-1,1)}, then we can find {\alpha} such that {\cos(n(\theta+\alpha)) + \lambda P(\theta)=0} and {n\sin(n(\theta+\alpha))} is positive, and we conclude that {\lambda P'(\theta) < n \sin(n(\theta+\alpha)) \leq n}; thus we have the upper bound

\displaystyle  P'(\theta) \leq n \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P(\theta)|.

A similar argument (with {n \sin(n(\theta+\alpha))} now chosen to be negative) bounds {-P'(\theta)}. This gives the claim for real-valued trigonometric polynomials {P}. (Indeed, this argument even gives the slightly sharper bound {\sup_{\theta} \frac{1}{n^2} |P'(\theta)|^2 + |P(\theta)|^2 = \sup_{\theta} |P(\theta)|^2}.)

To amplify this to complex valued polynomials, we take advantage of phase rotation invariance. If {P} is a complex trigonometric polynomial, then by applying Bernstein’s inequality to the real part {\mathrm{Re} P} we have

\displaystyle  \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |\mathrm{Re} P'(\theta)| \leq n \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P(\theta)|.

But then we can multiply {P} by any complex phase {e^{i\alpha}} and conclude that

\displaystyle  \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |\mathrm{Re} e^{i\alpha} P'(\theta)| \leq n \sup_{\theta \in {\bf R}/2\pi {\bf Z}} |P(\theta)|.

Taking suprema in {\alpha}, one obtains the claim for complex polynomials {P}. \Box
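As a numerical illustration (our own, not from the post), one can compare {\sup |P'|} with {n \sup |P|} on a fine grid for random real trigonometric polynomials, and confirm that {\sin(n\theta)} essentially attains equality.

```python
import numpy as np

n = 5
theta = np.linspace(0.0, 2.0 * np.pi, 4001)
k = np.arange(n + 1)
rng = np.random.default_rng(0)

ok = True
for _ in range(50):
    a = rng.normal(size=n + 1)  # coefficients of cos(k theta)
    b = rng.normal(size=n + 1)  # coefficients of sin(k theta)
    P = a @ np.cos(np.outer(k, theta)) + b @ np.sin(np.outer(k, theta))
    # P'(theta) = sum_k (-a_k k sin(k theta) + b_k k cos(k theta)).
    dP = (b * k) @ np.cos(np.outer(k, theta)) - (a * k) @ np.sin(np.outer(k, theta))
    # Grid suprema; a small multiplicative slack absorbs discretisation error.
    ok = ok and np.max(np.abs(dP)) <= n * np.max(np.abs(P)) * (1 + 1e-3)

# Near-equality for P(theta) = sin(n theta): sup|P'| = n = n sup|P|.
gap = abs(np.max(np.abs(n * np.cos(n * theta))) - n * np.max(np.abs(np.sin(n * theta))))
```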

The analogue of Bernstein’s inequality for the unit interval is known as Markov’s inequality for polynomials:

Lemma 11 (Markov’s inequality for polynomials) Let {P: {\bf R} \rightarrow {\bf R}} be a polynomial of degree {n}. Then

\displaystyle  \sup_{x \in [-1,1]} |P'(x)| \leq n^2 \sup_{x \in [-1,1]} |P(x)|.

This bound is sharp, as is seen by inspecting the Chebyshev polynomial {T_n}, defined as the unique polynomial giving the trigonometric identity

\displaystyle  T_n(\cos \theta) := \cos(n \theta). \ \ \ \ \ (6)

Differentiating (6) using the chain rule, we see that

\displaystyle  T'_n(\cos \theta) = n \frac{\sin(n \theta)}{\sin(\theta)}; \ \ \ \ \ (7)

the right-hand side approaches {n^2} as {\theta \rightarrow 0}, demonstrating that the {n^2} factor here is sharp.

Proof: We again use an argument of Boas. We may normalise so that

\displaystyle  \sup_{x \in [-1,1]} |P(x)| = 1. \ \ \ \ \ (8)

The function {P(\cos \theta)} is a trigonometric polynomial of degree at most {n}, so by Bernstein’s inequality and the chain rule we have

\displaystyle  |\sin \theta| |P'(\cos \theta)| \leq n \ \ \ \ \ (9)

for all {\theta}, and thus

\displaystyle  |P'(x)| \leq \frac{n}{\sqrt{1-x^2}}

for all {x \in [-1,1]}. This already gives Markov’s inequality except in the edge regions {|x| \geq \sqrt{1 - \frac{1}{n^2}} \geq \cos \frac{\pi}{2n}} (since {\sin \frac{\pi}{2n} \geq \frac{1}{n} \sin \frac{\pi}{2} = \frac{1}{n}}). By reflection symmetry, it then suffices to verify Markov’s inequality in the region {\cos \frac{\pi}{2n} \leq x \leq 1}.

From (6), the Chebyshev polynomial {T_n} attains the values {\pm 1} alternately at the {n+1} different points {\cos(\frac{j\pi}{n}), j=0,\dots,n}. Thus, if {\lambda \in (-1,1)}, the polynomial {T_n(x)+\lambda P(x)} changes sign at least {n} times on {[-1,1]}, and thus must have all {n} zeroes inside this interval by the intermediate value theorem; furthermore, {n-1} of these zeroes will lie to the left of {\cos(\frac{\pi}{n})}. By Rolle’s theorem, the derivative {T'_n(x) + \lambda P'(x)} then has all {n-1} zeroes in the interval {[-1,1]}, and at least {n-2} of these will lie to the left of {\cos( \frac{\pi}{n} )}. In particular, the derivative {T'_n(x) + \lambda P'(x)} can have at most one zero to the right of {\cos(\frac{\pi}{2n} ) \geq \cos(\frac{\pi}{n})}.

Since {T_n(1)=1}, {T_n(x)+\lambda P(x)} is positive at {x=+1}, and hence positive as {x \rightarrow \infty} since there are no zeroes outside of {[-1,1]}. Thus the leading coefficient of {T_n(x) + \lambda P(x)} is positive, which implies the same for its derivative {T'_n(x) + \lambda P'(x)}. Thus {T'_n(x) + \lambda P'(x)} is positive when {x \rightarrow \infty}.

From (9) one has {|P'(\cos(\frac{\pi}{2n}))| \leq \frac{n}{\sin(\pi/2n)}}, hence by (7) we see that {T'_n(x) + \lambda P'(x)} is also positive at {x = \cos(\frac{\pi}{2n})}. Thus {T'_n(x) + \lambda P'(x)} cannot become negative for {x > \cos(\frac{\pi}{2n})}, as this would create at least two zeroes to the right of {\cos(\frac{\pi}{2n})}. We conclude that in this region we have

\displaystyle  |P'(x)| \leq T'_n(x).

From (7) we have {|T'_n(x)| \leq n^2}, and the claim follows. \Box
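Again a quick numerical check is possible (our sketch): random degree-{n} polynomials on a grid in {[-1,1]}, together with the exact value {T_n'(1) = n^2} for the Chebyshev polynomial, which shows the constant is sharp.

```python
import numpy as np

n = 6
x = np.linspace(-1.0, 1.0, 4001)
rng = np.random.default_rng(1)

ok = True
for _ in range(50):
    P = np.polynomial.Polynomial(rng.normal(size=n + 1))  # random degree-n polynomial
    dP = P.deriv()
    # Grid suprema with a small multiplicative slack for discretisation error.
    ok = ok and np.max(np.abs(dP(x))) <= n ** 2 * np.max(np.abs(P(x))) * (1 + 1e-3)

# Sharpness: the Chebyshev polynomial T_n has |T_n| <= 1 on [-1,1] and T_n'(1) = n^2.
T = np.polynomial.Chebyshev.basis(n)
tn_prime_at_1 = T.deriv()(1.0)
```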

Remark 12 The following slightly shorter argument gives the slightly weaker bound {\sup_{x \in [-1,1]} |P'(x)| \leq n \mathrm{cosec}(1/n) \sup_{x \in [-1,1]} |P(x)|}. We again use the normalisation (8). By two applications of Bernstein’s inequality, the function {\theta \mapsto P(\cos(\theta))} has first derivative bounded in magnitude by {n}, and second derivative bounded in magnitude by {n^2}. As this function also has vanishing first derivative at {0}, we conclude the bounds

\displaystyle  |\frac{d}{d\theta} P(\cos \theta)| \leq \min( n^2 |\theta|, n )

and thus by the chain rule

\displaystyle  |P'(\cos \theta)| \leq \min( n^2 |\theta|, n ) \mathrm{cosec}(\theta).

For {|\theta| \leq \pi}, one easily checks that the right-hand side is at most {n \mathrm{cosec}(1/n)}, giving the claim.

This implies a result of Ehlich-Zeller and of Rivlin-Cheney:

Corollary 13 (Discretised Markov inequality) Let {P: {\bf R} \rightarrow {\bf R}} be a polynomial of degree {d}. If

\displaystyle  \sup_{x \in [0,n]} |P'(x)| > A \sup_{j =0,\dots,n} |P(j)| \ \ \ \ \ (10)

then we have {d > \sqrt{\frac{An}{A+2}}}.

Proof: We use an argument of Nisan and Szegedy. Assume for sake of contradiction that {d \leq \sqrt{\frac{An}{A+2}}}, so in particular {d < \sqrt{n}}. From the fundamental theorem of calculus and the triangle inequality one has

\displaystyle  \sup_{x \in [0,n]} |P(x)| \leq \sup_{j =0,\dots,n} |P(j)| + \frac{1}{2} \sup_{x \in [0,n]} |P'(x)|.

By a rescaled and translated version of Markov’s inequality we have

\displaystyle  \sup_{x \in [0,n]} |P'(x)| \leq \frac{2d^2}{n} \sup_{x \in [0,n]} |P(x)| \ \ \ \ \ (11)

which when inserted into the preceding inequality gives after some rearranging

\displaystyle  \sup_{x \in [0,n]} |P(x)| \leq (1 - \frac{d^2}{n})^{-1} \sup_{j =0,\dots,n} |P(j)|

and then after a second application of (11) gives

\displaystyle  \sup_{x \in [0,n]} |P'(x)| \leq \frac{2d^2}{n} (1 - \frac{d^2}{n})^{-1} \sup_{j =0,\dots,n} |P(j)|.

Comparing with (10), we conclude that

\displaystyle  A < \frac{2d^2}{n} (1 - \frac{d^2}{n})^{-1}

and the claim follows after some rearranging. \Box

Nisan and Szegedy observed that this one-dimensional degree bound can be lifted to the hypercube by a symmetrisation argument:

Corollary 14 (Multidimensional Markov inequality bound) Let {f: ({\bf Z}/2{\bf Z})^n \rightarrow [-1,1]} be such that {f(0)=1} and {f(e_i)=-1} for {i=1,\dots,n}. Then

\displaystyle  \mathrm{deg}(f) \geq \sqrt{n/2}.

Proof: By averaging {f} over all permutations of the indices (which can decrease the degree of {f}, but not increase it), we may assume that {f} is a symmetric function of the inputs {x_1,\dots,x_n}. Using the Newton identities, we can then write

\displaystyle  f(x) = P(|x|)

for some real polynomial {P: {\bf R} \rightarrow {\bf R}} of degree at most {\mathrm{deg}(f)}, where

\displaystyle  |x| = \# \{ i =1,\dots,n: x_i = 1 \}

is the Hamming length of {x}. By hypothesis, {\sup_{j = 0,\dots,n} |P(j)| \leq 1}, {P(0)=1}, and {P(1)=-1}, hence by the mean-value theorem {\sup_{x \in [0,n]} |P'(x)| \geq 2}. Applying Corollary 13 with {A=2}, we obtain the claim. \Box

Define the block sensitivity {bs(f)} of a Boolean function {f: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}} to be the largest number for which there is an {x} such that there are at least {bs(f)} disjoint subsets {I_1,\dots,I_{bs(f)}} of {\{1,\dots,n\}} with {f(x+\sum_{i \in I_j} e_i) \neq f(x)} for {j=1,\dots,bs(f)}. We have

Theorem 15 (Sensitivity conjecture) One has

\displaystyle  s(f) \leq bs(f) \leq 2 s(f)^4.

More precisely, the sensitivity conjecture of Nisan and Szegedy asserted the bound {bs(f) = O( s(f)^{O(1)})}; Huang’s result thus gives explicit values for the exponents. It is still open whether the exponent {4} in this theorem can be improved; it is known that one cannot improve it to below {2}, by analysing a variant of the Chung-Furedi-Graham-Seymour example (see these notes of Regan for details).

Proof: The lower bound for {bs(f)} is immediate from the definitions, since the sensitivity arises by restricting the {I_j} in the definition of block sensitivity to singleton sets. To prove the upper bound, it suffices from Corollary 9 to establish the bound

\displaystyle  \mathrm{deg}(f) \geq \sqrt{bs(f)/2}.

Let {m = bs(f)}. By hypothesis, there are {x \in ({\bf Z}/2{\bf Z})^n} and {m} disjoint subsets {I_1,\dots,I_m} of {\{1,\dots,n\}} such that {f(x+\sum_{i \in I_j} e_i) \neq f(x)} for {j=1,\dots,m}. We may normalise {x=0}, {f(0)=1} and {f(\sum_{i\in I_j}e_i)=-1}. If we then define the pullback boolean function {\tilde f: ({\bf Z}/2{\bf Z})^m \rightarrow \{-1,1\}} by the formula

\displaystyle  \tilde f( y_1,\dots,y_m) := f( \sum_{j=1}^m y_j \sum_{i \in I_j} e_i )

then it is easy to see that {\mathrm{deg}(\tilde f) \leq \mathrm{deg}(f)}, {\tilde f(0)=1}, and {\tilde f(e_i)=-1} for {i=1,\dots,m}. The claim now follows from Corollary 14. \Box
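For small {n}, all three quantities can be computed by brute force, which makes Theorem 15 easy to spot-check (our own sketch, not from the post; block sensitivity is found by recursive search over disjoint sensitive blocks, which is feasible only for tiny {n}).

```python
import itertools
import random

n = 4
cube = list(itertools.product([0, 1], repeat=n))

def flip_block(x, block):
    # Flip the coordinates of x indexed by `block`.
    return tuple(b ^ (1 if i in block else 0) for i, b in enumerate(x))

def sensitivity(f):
    return max(
        sum(f[flip_block(x, {i})] != f[x] for i in range(n)) for x in cube
    )

def block_sensitivity(f):
    def best(x, coords):
        # Max number of pairwise disjoint sensitive blocks drawn from `coords`.
        m = 0
        for r in range(1, len(coords) + 1):
            for blk in itertools.combinations(coords, r):
                if f[flip_block(x, set(blk))] != f[x]:
                    rest = tuple(c for c in coords if c not in blk)
                    m = max(m, 1 + best(x, rest))
        return m
    return max(best(x, tuple(range(n))) for x in cube)

random.seed(3)
holds = True
for _ in range(10):
    f = {x: random.choice([-1, 1]) for x in cube}
    s, bs = sensitivity(f), block_sensitivity(f)
    holds = holds and (s <= bs <= 2 * s ** 4)
```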

Remark 16 The following slightly shorter variant of the argument lets one remove the factor {2}. Let {m, I_1,\dots,I_m} be as above. We again may normalise {f(0)=1} and {f(\sum_{i \in I_j} e_i)=-1}. For any {\theta}, let {\xi_{\theta,1},\dots,\xi_{\theta,m} \in \{0,1\}} be iid Bernoulli random variables that equal {1} with probability {\sin^2 \theta} and {0} with probability {\cos^2 \theta}. The quantity

\displaystyle  F(\theta) := \mathop{\bf E} f( \sum_{j=1}^m \xi_{\theta,j} \sum_{i \in I_j} e_i )

is a trigonometric polynomial of degree at most {2\mathrm{deg}(f)} that is bounded in magnitude by {1}, so by two applications of Bernstein’s inequality

\displaystyle  |F''(0)| \leq 4 \mathrm{deg}(f)^2.

On the other hand, for small {\theta}, the random variable {\sum_{j=1}^m \xi_{\theta,j} \sum_{i \in I_j} e_i} is equal to zero with probability {1 - m \theta^2 + O(\theta^3)} and equal to each {\sum_{i \in I_j} e_i} with probability {\theta^2 + O(\theta^3)}, hence

\displaystyle  F(\theta) = 1 - 2m \theta^2 + O(\theta^3)

and hence {F''(0) = - 4m}. Combining these estimates we obtain {\mathrm{deg}(f) \geq \sqrt{bs(f)}} and hence {bs(f) \leq s(f)^4}.

Remark 17 The sensitivity {s(f)} of a Boolean function {f: ({\bf Z}/2{\bf Z})^n \rightarrow \{-1,1\}} can be split as {s(f) = \max( s_{+1}(f), s_{-1}(f))}, where {s_c(f)} is the largest number for which there is an {x} such that there are at least {s_c(f)} values of {i = 1,\dots,n} with {f(x+e_i) \neq f(x) = c}. It is not difficult to use the {k=2} case of Remark 3 to improve Corollary 9 slightly to {s_{+1}(f) s_{-1}(f) \geq \mathrm{deg}(f)}. Combining this with the previous remark, we can thus improve the upper bound in Theorem 15 slightly to

\displaystyle  bs(f) \leq s_{+1}(f)^2 s_{-1}(f)^2.

Matt von HippelCommunicating the Continuum Hypothesis

I have a friend who is, shall we say, pessimistic about science communication. He thinks it’s too much risk for too little gain: too many misunderstandings, while the most important stuff is so abstract that the public will never understand it anyway. When I asked him for an example, he started telling me about a professor who works on the continuum hypothesis.

The continuum hypothesis is about different types of infinity. You might have thought there was only one type of infinity, but in the nineteenth century the mathematician Georg Cantor showed there were more, the most familiar of which are countable and uncountable. If you have a countably infinite number of things, then you can “count” them, “one, two, three…”, assigning a number to each one (even if, since they’re still infinite, you never actually finish). To imagine something uncountably infinite, think of a continuum, like distance on a meter stick, where you can always look at smaller and smaller distances. Cantor proved, using various ingenious arguments, that these two types of infinity are different: the continuum is “bigger” than a mere countable infinity.

Cantor wondered if there could be something in between, a type of infinity bigger than countable and smaller than uncountable. His hypothesis (now called the continuum hypothesis) was that there wasn’t: he thought there was no type of infinity between countable and uncountable.

(If you think you have an easy counterexample, you’re wrong. In particular, fractions are countable.)

Kurt Gödel didn’t prove the continuum hypothesis, but in 1940 he showed that at least it couldn’t be disproved, which you’d think would be good enough. In 1964, though, another mathematician named Paul Cohen showed that the continuum hypothesis also can’t be proved, at least with mathematicians’ usual axioms.

In science, if something can’t be proved or disproved, then we shrug our shoulders and say we don’t know. Math is different. In math, we choose the axioms. All we have to do is make sure they’re consistent.

What Cohen and Gödel really showed is that mathematics is consistent either way: whether the continuum hypothesis is true or false, the rest of mathematics still works just as well. You can add it as an extra axiom, an add-on that gives you different types of infinity but doesn’t change everyday arithmetic.

You might think that this, finally, would be the end of the story. Instead, it was the beginning of a lively debate that continues to this day. It’s a debate that touches on what mathematics is for, whether infinity is merely a concept or something out there in the world, whether some axioms are right or wrong and what happens when you change them. It involves attempts to codify intuition, arguments about which rules “make sense” that blur the boundary between philosophy and mathematics. It also involves the professor my friend mentioned, W. H. Woodin.

Now, can I explain Woodin’s research to you?

No. I don’t understand it myself; it’s far more abstract and weird than any mathematics I’ve ever touched.

Despite that, I can tell you something about it. I can tell you about the quest he’s on, its history and its relevance, what is and is not at stake. I can get you excited for the same reasons that I’m excited; I can show you it’s important for the same reasons I think it’s important. I can give you the “flavor” of the topic, and broaden your view of the world you live in, one containing a hundred-year conversation about the nature of infinity.

My friend is right that the public will never understand everything. I’ll never understand everything either. But what we can do, what I strive to do, is to appreciate this wide weird world in all its glory. That, more than anything, is why I communicate science.

July 31, 2019

Doug NatelsonMore brief items

Writing writing writing.  In the meantime:

Scott AaronsonLinks, proofs, talks, jokes

For those who haven’t yet seen it, Erica Klarreich has a wonderful article in Quanta on Hao Huang’s proof of the Sensitivity Conjecture. This is how good popular writing about math can be.

Klarreich quotes my line from this blog, “I find it hard to imagine that even God knows how to prove the Sensitivity Conjecture in any simpler way than this.” However, even if God doesn’t know a simpler proof, that of course doesn’t rule out the possibility that Don Knuth does! And indeed, a couple days ago Knuth posted his own variant of Huang’s proof on his homepage—in Knuth’s words, fleshing out the argument that Shalev Ben-David previously posted on this blog—and then left a comment about it here, the first comment by Knuth that I know about on this blog or any other blog. I’m honored—although as for whether the variants that avoid the Cauchy Interlacing Theorem are actually “simpler,” I guess I’ll leave that between Huang, Ben-David, Knuth, and God.

In Communications of the ACM, Samuel Greengard has a good, detailed article on Ewin Tang and her dequantization of the quantum recommendation systems algorithm. One warning (with thanks to commenter Ted): the sentence “The only known provable separation theorem between quantum and classical is sqrt(n) vs. n” is mistaken, though it gestures in the direction of a truth. In the black-box setting, we can rigorously prove all sorts of separations: sqrt(n) vs. n (for Grover search), exponential (for period-finding), and more. In the non-black-box setting, we can’t prove any such separations at all.
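To put rough numbers on the sqrt(n)-vs-n gap in the black-box setting: classical unstructured search needs on the order of n queries in the worst case, while Grover's algorithm needs about (π/4)·sqrt(n). A toy comparison (my sketch, not from the article):

```python
import math

def grover_queries(n):
    # Grover's algorithm: about (pi/4) * sqrt(n) black-box queries suffice
    # to find one marked item among n.
    return math.ceil(math.pi / 4 * math.sqrt(n))

for n in (10**4, 10**6, 10**8):
    # Classical worst case is n queries; the quantum advantage is quadratic.
    print(f"n = {n:>9}: classical ~ {n}, Grover ~ {grover_queries(n)}")
```

The quadratic gap is provable in this model; outside the black-box model, as the correction notes, no such separation has been proved.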

Last week I returned to the US from the FQXi meeting in the Tuscan countryside. This year’s theme was “Mind Matters: Intelligence and Agency in the Physical World.” I gave a talk entitled “The Search for Physical Correlates of Consciousness: Lessons from the Failure of Integrated Information Theory” (PowerPoint slides here), which reprised my blog posts critical of IIT from five years ago. There were thought-provoking talks by many others who might be known to readers of this blog, including Sean Carroll, David Chalmers, Max Tegmark, Seth Lloyd, Carlo Rovelli, Karl Friston … you can see the full schedule here. Apparently video of the talks is not available yet but will be soon.

Let me close this post by sharing two important new insights about quantum mechanics that emerged from my conversations at the FQXi meeting:

(1) In Hilbert space, no one can hear you scream. Unless, that is, you scream the exact same way everywhere, or unless you split into separate copies, one for each different way of screaming.

(2) It’s true that, as a matter of logic, the Schrödinger equation does not imply the Born Rule. Having said that, if the Schrödinger equation were leading a rally, and the crowd started a chant of “BORN RULE! BORN RULE! BORN RULE!”—the Schrödinger equation would just smile and wait 13 seconds for the chant to die down before continuing.

July 29, 2019

Terence TaoSharp bounds for multilinear curved Kakeya, restriction and oscillatory integral estimates away from the endpoint

I have just uploaded to the arXiv my paper “Sharp bounds for multilinear curved Kakeya, restriction and oscillatory integral estimates away from the endpoint“, submitted to Mathematika. In this paper I return (after more than a decade’s absence) to one of my first research interests, namely the Kakeya and restriction family of conjectures. The starting point is the following “multilinear Kakeya estimate” first established in the non-endpoint case by Bennett, Carbery, and myself, and then in the endpoint case by Guth (with further proofs and extensions by Bourgain-Guth and Carbery-Valdimarsson):

Theorem 1 (Multilinear Kakeya estimate) Let {\delta > 0} be a radius. For each {j = 1,\dots,d}, let {\mathbb{T}_j} denote a finite family of infinite tubes {T_j} in {{\bf R}^d} of radius {\delta}. Assume the following axiom:

  • (i) (Transversality) whenever {T_j \in \mathbb{T}_j} is oriented in the direction of a unit vector {n_j} for {j =1,\dots,d}, we have

    \displaystyle  \left|\bigwedge_{j=1}^d n_j\right| \geq A^{-1}

    for some {A>0}, where we use the usual Euclidean norm on the wedge product {\bigwedge^d {\bf R}^d}.

Then, for any {p \geq \frac{1}{d-1}}, one has

\displaystyle  \left\| \prod_{j=1}^d \sum_{T_j \in \mathbb{T}_j} 1_{T_j} \right\|_{L^p({\bf R}^d)} \lesssim_{A,p} \delta^{\frac{d}{p}} \prod_{j \in [d]} \# \mathbb{T}_j. \ \ \ \ \ (1)

where {L^p({\bf R}^d)} are the usual Lebesgue norms with respect to Lebesgue measure, {1_{T_j}} denotes the indicator function of {T_j}, and {\# \mathbb{T}_j} denotes the cardinality of {\mathbb{T}_j}.
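In coordinates, for d unit vectors in {\bf R}^d the wedge-product norm in axiom (i) is just the absolute value of the determinant of the matrix whose columns are the tube directions, so transversality is easy to check numerically. A quick sketch for d = 3 (my illustration, not from the paper):

```python
import math

def transversality(n1, n2, n3):
    """|n1 ^ n2 ^ n3| for three unit vectors in R^3 equals |det[n1 n2 n3]|;
    axiom (i) asks for this to be at least 1/A."""
    def unit(v):
        r = math.sqrt(sum(x * x for x in v))
        return [x / r for x in v]
    a, b, c = unit(n1), unit(n2), unit(n3)
    # Expansion of the 3x3 determinant with columns a, b, c.
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return abs(det)

# Coordinate axes are maximally transverse (value 1) ...
print(transversality([1, 0, 0], [0, 1, 0], [0, 0, 1]))
# ... while nearly parallel directions give a tiny value (large A).
print(transversality([1, 0, 0], [1, 0.01, 0], [0, 0, 1]))
```

So the transversality parameter A measures how far the families of tubes are from degenerating into fewer than d independent directions.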

The original proof of this proceeded using a heat flow monotonicity method, which in my previous post I reinterpreted using a “virtual integration” concept on a fractional Cartesian product space. It turns out that this machinery is somewhat flexible, and can be used to establish some other estimates of this type. The first result of this paper is to extend the above theorem to the curved setting, in which one localises to a ball of radius {O(1)} (and sets {\delta} to be small), but allows the tubes {T_j} to be curved in a {C^2} fashion. If one runs the heat flow monotonicity argument, one now picks up some additional error terms arising from the curvature, but as the spatial scale approaches zero, the tubes become increasingly linear, and as such the error terms end up being an integrable multiple of the main term, at which point one can conclude by Gronwall’s inequality (actually for technical reasons we use a bootstrap argument instead of Gronwall). A key point in this approach is that one obtains optimal bounds (not losing factors of {\delta^{-\varepsilon}} or {\log^{O(1)} \frac{1}{\delta}}), so long as one stays away from the endpoint case {p=\frac{1}{d-1}} (which does not seem to be easily treatable by the heat flow methods). Previously, the paper of Bennett, Carbery, and myself was able to use an induction on scale argument to obtain a curved multilinear Kakeya estimate losing a factor of {\log^{O(1)} \frac{1}{\delta}} (after optimising the argument); later arguments of Bourgain-Guth and Carbery-Valdimarsson, based on algebraic topology methods, could also obtain a curved multilinear Kakeya estimate without such losses, but only in the algebraic case when the tubes were neighbourhoods of algebraic curves of bounded degree.

Perhaps more interestingly, we are also able to extend the heat flow monotonicity method to apply directly to the multilinear restriction problem, giving the following global multilinear restriction estimate:

Theorem 2 (Multilinear restriction theorem) Let {\frac{1}{d-1} < p \leq \infty} be an exponent, and let {A \geq 2} be a parameter. Let {M} be a sufficiently large natural number, depending only on {d}. For {j \in [d]}, let {U_j} be an open subset of {B^{d-1}(0,A)}, and let {h_j: U_j \rightarrow {\bf R}} be a smooth function obeying the following axioms:

Let {U_{j,1/A} \subset U_j} be the sets

\displaystyle  U_{j,1/A} := \{ \xi \in U_j: B^{d-1}(\xi,1/A) \subset U_j \}. \ \ \ \ \ (3)

Then one has

\displaystyle  \left\| \prod_{j \in [d]} {\mathcal E}_j f_j \right\|_{L^{2p}({\bf R}^d)} \leq A^{O(1)} \left(d-1-\frac{1}{p}\right)^{-O(1)} \prod_{j \in [d]} \|f_j \|_{L^2(U_{j,1/A})}

for any {f_j \in L^2(U_{j,1/A} \rightarrow {\bf C})}, {j \in [d]}, extended by zero outside of {U_{j,1/A}}, and {{\mathcal E}_j} denotes the extension operator

\displaystyle  {\mathcal E}_j f_j( x', x_d ) := \int_{U_j} e^{2\pi i (x' \cdot \xi + x_d h_j(\xi))} f_j(\xi)\ d\xi.

Local versions of such estimates, in which {L^{2p}({\bf R}^d)} is replaced with {L^{2p}(B^d(0,R))} for some {R \geq 2}, and one accepts a loss of the form {\log^{O(1)} R}, were already established by Bennett, Carbery, and myself using an induction on scale argument. In a later paper of Bourgain-Guth these losses were removed by “epsilon removal lemmas” to recover Theorem 2, but only in the case when all the hypersurfaces involved had curvatures bounded away from zero.

There are two main new ingredients in the proof of Theorem 2. The first is to replace the usual induction on scales scheme to establish multilinear restriction by a “ball inflation” induction on scales scheme that more closely resembles the proof of decoupling theorems. In particular, we actually prove the more general family of estimates

\displaystyle  \left\| \prod_{j \in [d]} E_{r}[{\mathcal E}_j f_j] \right\|_{L^{p}({\bf R}^d)} \leq A^{O(1)} \left(d-1 - \frac{1}{p}\right)^{-O(1)} r^{\frac{d}{p}} \prod_{j \in [d]} \| f_j \|_{L^2(U_{j,1/A})}^2

where {E_r} denotes the local energies

\displaystyle  E_{r}[f](x',x_d) := \int_{B^{d-1}(x',r)} |f(y',x_d)|^2\ dy'

(actually for technical reasons it is more convenient to use a smoother weight than the strict cutoff to the disk {B^{d-1}(x',r)}). With logarithmic losses, it is not difficult to establish this estimate by an upward induction on {r}. To avoid such losses we use the heat flow monotonicity method. Here we run into the issue that the extension operators {{\mathcal E}_j f_j} are complex-valued rather than non-negative, and thus would not be expected to obey many good monotonicity properties. However, the local energies {E_r[{\mathcal E}_j f_j]} can be expressed in terms of the magnitude squared of what is essentially the Gabor transform of {{\mathcal E}_j f_j}, and these are non-negative; furthermore, the dispersion relation associated to the extension operators {{\mathcal E}_j f_j} implies that these Gabor transforms propagate along tubes, so that the situation becomes quite similar (up to several additional lower order error terms) to that in the multilinear Kakeya problem. (This can be viewed as a continuous version of the usual wave packet decomposition method used to relate restriction and Kakeya problems, which when combined with the heat flow monotonicity method allows for one to use a continuous version of induction on scales methods that do not concede any logarithmic factors.)

Finally, one can combine the curved multilinear Kakeya result with the multilinear restriction result to obtain estimates for multilinear oscillatory integrals away from the endpoint. Again, this sort of implication was already established in the previous paper of Bennett, Carbery, and myself, but the arguments there had some epsilon losses in the exponents; here we were able to run the argument more carefully and avoid these losses.

Clifford JohnsonNews from the Front, XVIII: de Sitter Black Holes and Continuous Heat Engines

[Image: Hubble photo of Jupiter's aurorae.] Another title for this could be "Making sense of de Sitter black hole thermodynamics", I suppose. What I'm going to tell you about is either a direct correspondence or a series of remarkable inspiring coincidences. Either way, I think you will come away agreeing that there is certainly something interesting afoot.

It is an idea I'd been tossing around in my head from time to time over years, but somehow did not put it all together, and then something else I was working on years later, that was seemingly irrelevant, helped me complete the puzzle, resulting in my new paper, which (you guessed it) I'm excited about.

It all began when I was thinking about heat engines, for black holes in anti-de Sitter, which you may recall me talking about in posts here, here, and here, for example. Those are reciprocating heat engines, taking the system through a cycle that -through various stages- takes in heat, does work, and exhausts some heat, then repeats and repeats. And repeats.

I've told you the story about my realisation that there's this whole literature on quantum heat engines that I'd not known about, that I did not even know of a thing called a quantum heat engine, and my wondering whether my black hole heat engines could have a regime where they could be considered quantum heat engines, maybe enabling them to be useful tools in that arena...(resulting in the paper I described here)... and my delight in combining 18th Century physics with 21st Century physics in this interesting way.

All that began back in 2017. One thing I kept coming back to that really struck me as lovely is what can be regarded as the prototype quantum heat engine. It was recognized as such as far back as 1959!! It is a continuous heat engine, meaning that it does its heat intake and work and heat output all at the same time, as a continuous flow. It is, in fact, a familiar system: the three-level maser! (A basic laser also uses the key elements.)

A maser can be described as taking in energy as heat from an external source, and giving out energy in the form of heat and work. The work is the desired [...] Click to continue reading this post

The post News from the Front, XVIII: de Sitter Black Holes and Continuous Heat Engines appeared first on Asymptotia.

Tommaso DorigoXENON Reports New Strong Limits On Light Dark Matter

Our current understanding of the Universe includes the rather unsettling notion that most of its matter is not luminous - it does not clump into stars, that is. Nobody has a clue of what this Dark Matter (DM) really is, and hypotheses on what it could be made of are a dime a dozen. 
On the other hand, we clearly see the gravitational effects of DM on galaxies and clusters of galaxies, so the consensus of the scientific community is that one of those cheap theories must be true. What makes this very close to a dream situation for an experimental scientist is the fact that we do have instruments capable of detecting, or ruling out, dark matter behaving according to the majority of those possibilities. 

read more

July 26, 2019

Matt von HippelThe Real E=mc^2

It’s the most famous equation in all of physics, written on thousands of chalkboard stock photos. Part of its charm is its simplicity: E for energy, m for mass, c for the speed of light, just a few simple symbols in a one-line equation. Despite its simplicity, E=mc^2 is deep and important enough that there are books dedicated to explaining it.

What does E=mc^2 mean?

Some will tell you it means mass can be converted to energy, enabling nuclear power and the atomic bomb. This is a useful picture for chemists, who like to think about balancing ingredients: this much mass on one side, this much energy on the other. It’s not the best picture for physicists, though. It makes it sound like energy is some form of “stuff” you can pour into your chemistry set flask, and energy really isn’t like that.

There’s another story you might have heard, in older books. In that story, E=mc^2 tells you that in relativity mass, like distance and time, is relative. The more energy you have, the more mass you have. Those books will tell you that this is why you can’t go faster than light: the faster you go, the greater your mass, and the harder it is to speed up.

Modern physicists don’t talk about it that way. In fact, we don’t even write E=mc^2 that way. We’re more likely to write:

E = mc^2 / √(1 − v^2/c^2)

“v” here stands for the velocity, how fast the mass is moving. The faster the mass moves, the more energy it has. Take v to zero, and you get back the familiar E=mc^2.

The older books weren’t lying to you, but they were thinking about a different notion of mass: “relativistic mass” m_r instead of “rest mass” m_0, related like this:

m_r = m_0 / √(1 − v^2/c^2)

which explains the difference in how we write E=mc^2.

Why the change? In part, it’s because of particle physics. In particle physics, we care about the rest mass of particles. Different particles have different rest mass: each electron has one rest mass, each top quark has another, regardless of how fast they’re going. They still get more energy, and harder to speed up, the faster they go, but we don’t describe it as a change in mass. Our equations match the old books, we just talk about them differently.
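The standard relativistic energy formula, E = mc^2/√(1 − v^2/c^2), is easy to check numerically. A small sketch (mine, not from the post; the constants are the usual CODATA values):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def total_energy(m, v):
    """E = m c^2 / sqrt(1 - v^2/c^2); at v = 0 this reduces to E = mc^2."""
    return m * C**2 / math.sqrt(1.0 - (v / C) ** 2)

m_e = 9.109e-31                      # electron rest mass, kg
rest = total_energy(m_e, 0.0)        # the familiar E = mc^2
fast = total_energy(m_e, 0.99 * C)   # roughly 7x the rest energy
```

At 99% of the speed of light the energy is about seven times the rest energy, and it diverges as v approaches c, which is the modern way of saying why you can't reach light speed.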

Of course, you can dig deeper, and things get stranger. You might hear that mass does change with energy, but in a very different way. You might hear that mass is energy, that they’re just two perspectives on the same thing. But those are stories for another day.

I titled this post “The Real E=mc^2”, but to clarify, none of these explanations are more “real” than the others. They’re words, useful in different situations and for different people. “The Real E=mc^2” isn’t the E=mc^2 of nuclear chemists, or old books, or modern physicists. It’s the theory itself, the mathematical rules and principles that all the rest are just trying to describe.

July 23, 2019

Doug NatelsonFerromagnetic droplets

Ferromagnets are solids, in pretty nearly every instance I can recall (though I suppose it's not impossible to imagine an itinerant Stoner magnet that's a liquid below its Curie temperature, and here is one apparent example). There's a neat paper in Science this week, reporting liquid droplets that act like ferromagnets and can be reshaped. 

The physics at work here is actually a bit more interesting than just a single homogeneous material that happens to be liquid below its magnetic ordering temperature.  The liquid in this case is a suspension of magnetite nanoparticles.  Each nanoparticle is magnetic, as the microscopic ordering temperature for Fe3O4 is about 858 K.  However, the individual particles are so small (22 nm in diameter) that they are superparamagnetic at room temperature, meaning that thermal fluctuations are energetic enough to reorient how the little north/south poles of the single-domain particles are pointing.  Now, if the interface at the surface of the suspension droplet confines the nanoparticles sufficiently, they jam together with such small separations that their magnetic interactions are enough to lock their magnetizations, killing the superparamagnetism and leading to a bulk magnetic response from the aggregate.  Pretty cool!  (Extra-long-time readers of this blog will note that this hearkens waaaay back to this post.)
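The competition described above, anisotropy energy locking the moment versus thermal energy flipping it, is conventionally captured by the Néel relaxation time τ = τ0·exp(KV/k_BT). A rough sketch with illustrative numbers (the anisotropy constant K ≈ 1.1e4 J/m^3 and attempt time τ0 ≈ 1 ns are my assumed ballpark values, not taken from the paper):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def neel_relaxation_time(diameter, K, T, tau0=1e-9):
    """Neel relaxation time tau = tau0 * exp(K V / k_B T): the typical time
    for a single-domain particle's moment to flip by thermal activation."""
    V = math.pi * diameter**3 / 6.0  # volume of a sphere of given diameter
    return tau0 * math.exp(K * V / (K_B * T))

# 22 nm magnetite particle at room temperature, with the assumed K and tau0:
tau = neel_relaxation_time(22e-9, K=1.1e4, T=300.0)
# tau comes out well below a second, so the moment flips constantly on
# laboratory timescales: superparamagnetism.
```

Jamming the particles together adds inter-particle magnetic interactions to the barrier, which is what tips the droplet's surface layer from superparamagnetic to blocked.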

July 22, 2019

John BaezApplied Category Theory 2019 Talks

Applied Category Theory 2019 happened last week! It was very exciting: about 120 people attended, and they’re pushing forward to apply category theory in many different directions. The topics ranged from ultra-abstract to ultra-concrete, sometimes in the same talk.

The talks are listed above — click for a more readable version. Below you can read what Jules Hedges and I wrote about all those talks:

• Jules Hedges, Applied Category Theory 2019.

I tend to give terse summaries of the talks, with links to the original papers or slides. Jules tends to give his impressions of their overall significance. They’re nicely complementary.

You can also see videos of some talks, created by Jelle Herold with help from Fabrizio Genovese:

• Giovanni de Felice, Functorial question answering.

• Antonin Delpeuch, Autonomization of monoidal categories.

• Colin Zwanziger, Natural model semantics for comonadic and adjoint modal type theory.

• Nicholas Behr, Tracelets and tracelet analysis of compositional rewriting systems.

• Dan Marsden, No-go theorems for distributive laws.

• Christian Williams, Enriched Lawvere theories for operational semantics.

• Walter Tholen, Approximate composition.

• Erwan Beurier, Interfacing biology, category theory & mathematical statistics.

• Stelios Tsampas, Categorical contextual reasoning.

• Fabrizio Genovese, idris-ct: A library to do category theory in Idris.

• Michael Johnson, Machine learning and bidirectional transformations.

• Bruno Gavranović, Learning functors using gradient descent.

• Zinovy Diskin, Supervised learning as change propagation with delta lenses.

• Bryce Clarke, Internal lenses as functors and cofunctors.

• Ryan Wisnewsky, Conexus AI.

• Ross Duncan, Cambridge Quantum Computing.

• Erwan Beurier, Memoryless systems generate the class of all discrete systems.

• Blake Pollard, Compositional models for power systems.

• Martti Karvonen, A comonadic view of simulation and quantum resources.

• Quanlong Wang, ZX-Rules for 2-qubit Clifford+T quantum circuits, and beyond.

• James Fairbanks, A compositional framework for scientific model augmentation.

• Titouan Carette, Completeness of graphical languages for mixed state quantum mechanics.

• Antonin Delpeuch, A complete language for faceted dataflow programs.

• John van de Wetering, An effect-theoretic reconstruction of quantum mechanics.

• Vladimir Zamdzhiev, Inductive datatypes for quantum programming.

• Octavio Malherbe, A categorical construction for the computational definition of vector spaces.

• Vladimir Zamdzhiev, Mixed linear and non-linear recursive types.

John PreskillWhat distinguishes quantum thermodynamics from quantum statistical mechanics?

Yoram Alhassid asked the question at the end of my Yale Quantum Institute colloquium last February. I knew two facts about Yoram: (1) He belongs to Yale’s theoretical-physics faculty. (2) His PhD thesis’s title—“On the Information Theoretic Approach to Nuclear Reactions”—ranks among my three favorites.1 

Over the past few months, I’ve grown to know Yoram better. He had reason to ask about quantum statistical mechanics, because his research stands up to its ears in the field. If forced to synopsize quantum statistical mechanics in five words, I’d say, “study of many-particle quantum systems.” Examples include gases of ultracold atoms. If given another five words, I’d add, “Calculate and use partition functions.” A partition function is a measure of the number of states, or configurations, accessible to the system. Calculate a system’s partition function, and you can calculate the system’s average energy, the average number of particles in the system, how the system responds to magnetic fields, etc.
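The "calculate the partition function, then read off averages" step can be made concrete in a few lines. A minimal sketch for a two-level system (my illustration; the energy gap is an arbitrary value):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def partition_function(levels, T):
    """Z = sum_i exp(-E_i / k_B T) over the energy levels E_i (in joules)."""
    beta = 1.0 / (K_B * T)
    return sum(math.exp(-beta * E) for E in levels)

def average_energy(levels, T):
    """<E> = sum_i E_i exp(-beta E_i) / Z, equivalently -d(ln Z)/d(beta)."""
    beta = 1.0 / (K_B * T)
    Z = partition_function(levels, T)
    return sum(E * math.exp(-beta * E) for E in levels) / Z

# A two-level system with an illustrative gap of 1e-21 J:
levels = [0.0, 1e-21]
# At high temperature both levels are equally occupied, so <E> -> gap/2;
# at low temperature the system sits in the ground state, so <E> -> 0.
print(average_energy(levels, 1e6), average_energy(levels, 1.0))
```

The same recipe, with sums over many-particle configurations instead of two levels, is what "calculate and use partition functions" amounts to in practice.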

Line in the sand

My colloquium concerned quantum thermodynamics, which I’ve blogged about many times. So I should have been able to distinguish quantum thermodynamics from its neighbors. But the answer I gave Yoram didn’t satisfy me. I mulled over the exchange for a few weeks, then emailed Yoram a 502-word essay. The exercise grew my appreciation for the question and my understanding of my field. 

An adaptation of the email appears below. The adaptation should suit readers who’ve majored in physics, but don’t worry if you haven’t. Bits of what distinguishes quantum thermodynamics from quantum statistical mechanics should come across to everyone—as should, I hope, the value of question-and-answer sessions:

One distinction is a return to the operational approach of 19th-century thermodynamics. Thermodynamicists such as Sadi Carnot wanted to know how effectively engines could operate. Their practical questions led to fundamental insights, such as the Carnot bound on an engine’s efficiency. Similarly, quantum thermodynamicists often ask, “How can this state serve as a resource in thermodynamic tasks?” This approach helps us identify what distinguishes quantum theory from classical mechanics.

For example, quantum thermodynamicists found an advantage in charging batteries via nonlocal operations. Another example is the “MBL-mobile” that I designed with collaborators. Many-body localization (MBL), we found, can enhance an engine’s reliability and scalability. 

Asking, “How can this state serve as a resource?” leads quantum thermodynamicists to design quantum engines, ratchets, batteries, etc. We analyze how these devices can outperform classical analogues, identifying which aspects of quantum theory power the outperformance. This question and these tasks contrast with the questions and tasks of many non-quantum-thermodynamicists who use statistical mechanics. They often calculate response functions and (e.g., ground-state) properties of Hamiltonians.

These goals of characterizing what nonclassicality is and what it can achieve in thermodynamic contexts resemble upshots of quantum computing and cryptography. As a 21st-century quantum information scientist, I understand what makes quantum theory quantum partially by understanding which problems quantum computers can solve efficiently and classical computers can’t. Similarly, I understand what makes quantum theory quantum partially by understanding how much more work you can extract from a singlet \frac{1}{ \sqrt{2} } ( | 0 1 \rangle - |1 0 \rangle ) (a maximally entangled state of two qubits) than from a product state in which the reduced states have the same forms as in the singlet, \frac{1}{2} ( | 0 \rangle \langle 0 | + | 1 \rangle \langle 1 | ).

As quantum thermodynamics shares its operational approach with quantum information theory, quantum thermodynamicists use mathematical tools developed in quantum information theory. An example consists of generalized entropies. Entropies quantify the optimal efficiency with which we can perform information-processing and thermodynamic tasks, such as data compression and work extraction.

Most statistical-mechanics researchers use just the Shannon and von Neumann entropies, H_{\rm Sh} and H_{\rm vN}, and perhaps the occasional relative entropy. These entropies quantify optimal efficiencies in large-system limits, e.g., as the number of messages compressed approaches infinity and in the thermodynamic limit.

Other entropic quantities have been defined and explored over the past two decades, in quantum and classical information theory. These entropies quantify the optimal efficiencies with which tasks can be performed (i) if the number of systems processed or the number of trials is arbitrary, (ii) if the systems processed share correlations, (iii) in the presence of “quantum side information” (if the system being used as a resource is entangled with another system, to which an agent has access), or (iv) if you can tolerate some probability \varepsilon that you fail to accomplish your task. Instead of limiting ourselves to H_{\rm Sh} and H_{\rm vN}, we use also “\varepsilon-smoothed entropies,” Rényi divergences, hypothesis-testing entropies, conditional entropies, etc.
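For probability distributions (or the eigenvalues of a density matrix, which is all you need in the von Neumann case) these quantities are one-liners. A sketch using natural logarithms (my illustration, not from the post):

```python
import math

def shannon(p):
    """Shannon entropy H(p) = -sum_i p_i ln(p_i); applied to a density
    matrix's eigenvalues this is the von Neumann entropy."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, alpha):
    """Renyi entropy H_alpha(p) = ln(sum_i p_i^alpha) / (1 - alpha), for
    alpha != 1; it recovers the Shannon entropy as alpha -> 1."""
    return math.log(sum(x**alpha for x in p)) / (1.0 - alpha)

uniform = [0.5, 0.5]     # reduced state of the singlet: maximally mixed
print(shannon(uniform))  # ln 2; every Renyi entropy agrees on uniform p
print(renyi(uniform, 2.0))
print(renyi([0.9, 0.1], 2.0))  # a biased state carries less entropy
```

The different orders alpha answer different operational questions, which is exactly why the one-shot and smoothed variants matter outside the large-system limit.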

Another hallmark of quantum thermodynamics is results’ generality and simplicity. Thermodynamics characterizes a system with a few macroscopic observables, such as temperature, volume, and particle number. The simplicity of some quantum thermodynamics served a chemist collaborator and me, as explained in the introduction of

Yoram’s question reminded me of one reason why, as an undergrad, I adored studying physics in a liberal-arts college. I ate dinner and took walks with students majoring in economics, German studies, and Middle Eastern languages. They described their challenges, which I analyzed with the physics mindset that I was acquiring. We then compared our approaches. Encountering other disciplines’ perspectives helped me recognize what tools I was developing as a budding physicist. How can we know our corner of the world without stepping outside it and viewing it as part of a landscape?


1The title epitomizes clarity and simplicity. And I have trouble resisting anything advertised as “the information-theoretic approach to such-and-such.”

July 20, 2019

John BaezApplied Category Theory 2019 Program

Bob Coecke, David Spivak, Christina Vasilakopoulou and I are running a conference on applied category theory:

Applied Category Theory 2019, 15–19 July, 2019, Lecture Theatre B of the Department of Computer Science, 10 Keble Road, Oxford.

You can now see the program here, or below. Hope to see you soon!

Tommaso DorigoThe Plot Of The Week - CMS Searches For Higgs Decays To Charm Quarks

While you and I may have been lagging behind a bit as of late, excused by a particularly hot July, the CMS collaboration has kept grinding its data, producing exquisite new results from the large amounts of proton-proton collisions that the experiment has been collecting during Run 2, i.e. until last year. 
Of course, the process of going from subatomic collisions to submitted papers is a long and complex one. The first part of the journey involves triggering, storage, and reconstruction of the acquired datasets, and reduction of the data into an analysis-friendly format. While this might sound like a partly automatic and painless procedure, it involves the expense of liters of sweat by conscientious collaborators who oversee the data acquisition and their processing.

read more

July 19, 2019

Scott AaronsonFake it till you make it (to the moon)

While I wait to board a flight at my favorite location on earth—Philadelphia International Airport—I figured I might as well blog something to mark the 50th anniversary of Apollo 11. (Thanks also to Joshua Zelinsky for a Facebook post that inspired this.)

I wasn’t alive for Apollo, but I’ve been alive for 3/4 of the time after it, even though it now seems like ancient history—specifically, like a Roman cathedral being gawked at by a medieval peasant, like an achievement by some vanished, more cohesive civilization that we can’t even replicate today, let alone surpass.

Which brings me to a depressing mystery: why do so many people now deny that humans walked on the moon at all? Like, why that specifically? While they’re at it, why don’t they also deny that WWII happened, or that the Beatles existed?

Surprisingly, skepticism of the reality of Apollo seems to have gone all the way back to the landings themselves. One of my favorite stories growing up was of my mom, as a teenager, working as a waitress at an Israeli restaurant in Philadelphia, on the night of the Apollo 11 landing. My mom asked for a few minutes off to listen to news of the landing on the radio. The owners wouldn’t grant it—explaining that it was all Hollywood anyway, just some actors in spacesuits on a sound stage, and obviously my mom wasn’t so naïve as to think anyone was actually walking to the moon?

Alas, as we get further and further from the event, with no serious prospect of ever replicating it past the stage of announcing an optimistic timetable (nor, to be honest, any scientific reason to replicate it), as the people involved die off, and as our civilization becomes ever more awash in social-media-fueled paranoid conspiracies, I fear that moon-landing denialism will become more common.

Because here’s the thing: Apollo could happen, but only because of a wildly improbable, once-in-history confluence of social and geopolitical factors. It was economically insane, taking 400,000 people and 4% of the US federal budget for some photo-ops, a flag-planting, some data and returned moon rocks that had genuine scientific value but could’ve been provided much more cheaply by robots. It was dismantled immediately afterwards like a used movie set, rather than leading to any greater successes. Indeed, manned spaceflight severely regressed afterwards, surely mocking the expectations of every last science fiction fan and techno-utopian who was alive at that time.

One could summarize the situation by saying that, in certain respects, the Apollo program really was “faked.” It’s just that the way they “faked” it, involved actually landing people on the moon!

Matt von HippelReader Background Poll Reflections

A few weeks back I posted a poll, asking you guys what sort of physics background you have. The idea was to follow up on a poll I did back in 2015, to see how this blog’s audience has changed.

One thing that immediately leaped out of the data was how many of you are physicists. As of writing this, 66% of readers say they either have a PhD in physics or a related field, or are currently in grad school. This includes 7% specifically from my sub-field, “amplitudeology” (though this number may be higher than usual since we just had our yearly conference, and more amplitudeologists were reminded my blog exists).

I didn’t use the same categories in 2015, so the numbers can’t be easily compared. In 2015 only 2.5% of readers described themselves as amplitudeologists. Adding these up with the physics PhDs and grad students gives 59%, which goes up to 64.5% if I include the mathematicians (who this year might have put either “PhD in a related field” or “Other Academic”). So overall the percentages are pretty similar, though now it looks like more of my readers are grad students.

Despite the small difference, I am a bit worried: it looks like I’m losing non-physicist readers. I could flatter myself and think that I inspired those non-physicists to go to grad school, but more realistically I should admit that fewer of my posts have been interesting to a non-physics audience. In 2015 I worked at the Perimeter Institute, and helped out with their public lectures. Now I’m at the Niels Bohr Institute, and I get fewer opportunities to hear questions from non-physicists. I get fewer ideas for interesting questions to answer.

I want to keep this blog’s language accessible and its audience general. I appreciate that physicists like this blog and view it as a resource, but I don’t want it to turn into a blog for physicists only. I’d like to encourage the non-physicists in the audience: ask questions! Don’t worry if it sounds naive, or if the question seems easy: if you’re confused, likely others are too.

July 16, 2019

Richard EastherOne Small Step

One of my earliest memories is standing with my father on the balcony of my grandmother’s house in Auckland. “Ma’s House” had a spectacular view northwards, across Auckland’s Hauraki Gulf, and the Moon was visible in the early evening sky. Following my eye, my father pointed and said, “I think there are people there at the moment.”

I would love to describe this as the father-son moment that lit my lifelong passion for astronomy, but doing so would take me far from the truth. My love of astronomy must have already been well aflame, because my actual recollection is that this was the first time I saw my father – a man frequently summoned to the hospital to tend the sick, who could fix any broken toy and who was able to split logs with an axe – as imperfect, fallible and human. How could any adult, I wondered, not know exactly how many people were standing on that pale little ball at any moment? 

My memory is that this moment took place during a pre-Christmas visit. Looking at the list of Apollo missions I can work out the date: the only landing in any December was Apollo 17, the last mission, just after my sixth birthday in 1972. While it was clear that there would be a hiatus in lunar travel after Apollo 17, it is still a shock to realise that I am now in my 50s and that no one significantly younger than me can have a clear memory of a moment when human beings were standing on another world. Ever since that day there have always been precisely zero people standing on the surface of the Moon.

The Apollo landings took place alongside the Vietnam War and the Civil Rights struggle and constituted the winning entry in a race with the Soviet Union, a competition animated by superpower rivalry. But they are also a story of commitment, courage, teamwork and vision, a literal moonshot that shines as a moment of optimism and purpose. Perhaps even more so when you set it against the wider turmoil of the 1960s. I am not the first person to say this, but before Apollo “flying to the Moon” was a byword for an impossible, ridiculous dream – afterwards it was a metaphor for what can be done when we set ourselves to achieve lofty goals.

Sooner or later we will go back. Several countries and even SpaceX, a private company, are developing concrete plans to return human beings to the Moon. However, even with the most optimistic schedules those new voyages are still a few years off. 

If it seems that we have had an unduly long wait, it is worth recalling that after the first trips to the South Pole — with sleds pulled by men and dogs — it was close to 50 years before human beings again stood at 90 degrees South. Those subsequent visits, however, were made with tractors and planes, and marked the beginning of a permanent human presence at the Pole.

Similarly, the Apollo missions pushed their 1960s technology to the absolute limit, and Apollo 13 escaped tragedy by the narrowest of margins. While the final missions spent several days on the lunar surface, the Apollo programme would have been hard-pressed to provide the foundation for a permanent lunar base. When humans do go back to the Moon, they will do so using spacecraft and rockets that have been developed to the point where lunar travel can conceivably become routine. 

And then – perhaps in a decade or two – a permanent human presence on the Moon will develop to the point where even the best-informed parents might be forgiven for not knowing exactly how many people are standing on the Moon.


IMAGE: The header image is from the Apollo 16 mission. The launch photo above is Apollo 11, on its way to the moon atop a Saturn V rocket. Both images via NASA.

The analogy between lunar and polar exploration has been made by a number of people — I believe I first heard it from the (now sadly deceased) polymathic Columbia astronomer Arlin Crotts, when we were colleagues in the early 2000s, and it appears in his book The New Moon.

July 15, 2019

John PreskillIntroducing a new game: Quantum TiqTaqToe

A passing conversation with my supervisor

Video games have been a part of my life for about as long as I can remember. From Paperboy and The Last Ninja on the Commodore 64 when I was barely old enough to operate a keyboard, to Mario Kart 8 and Zelda on the Nintendo Switch now that I am a postdoc at Caltech working on quantum computing and condensed matter physics. Up until recently, I have kept my two lives separate: my love of video games and my career in quantum physics.

The realization that I could combine quantum physics with games came during an entertaining discussion with my current supervisor, Gil Refael. Gil and I were brainstorming approaches to develop a quantum version of Tetris. Instead of stopping and laughing it off, or even keeping the idea on the horizon, Gil suggested that we talk to Spyridon (Spiros) Michalakis for some guidance. 

This is not the story of Quantum Tetris (yet), but rather the story of how we made a quantum version of a much older, and possibly more universally known, game. This is a new game that Spiros and I have been testing at elementary schools.

And so I am super excited to be able to finally present to you: Quantum TiqTaqToe! As of right now, the app is available both for Android devices and iPhone/iPad:

Get it on Google Play


Previous quantum games

Gil and I knew that Spiros had been involved in prior quantum games (most notably qCraft and Quantum Chess), so he seemed like the perfect contact point. He was conveniently located on the same campus, and even in the same department. But more importantly, he was curious about the idea and eager to talk. 

After introducing the idea of Quantum Tetris, Spiros came up with an alternative approach. Seeing as this was going to be my first attempt at creating a video game, not to mention building a game from the ground up with quantum physics, he proposed to put me in touch with Chris Cantwell and help him improve the AI for Quantum Chess.

I thought long and hard about this proposition. Like five seconds. It was an amazing opportunity. I would get to look under the hood of a working and incredibly sophisticated video game, unlike any game ever made: the only game in the world I knew of that was truly based on quantum physics. And I would be solving a critical problem that I would have to deal with eventually, by adapting a conventional, classical rules-based game AI for quantum. 

Fun and Games

My first focus was to jump on Quantum Chess full-force, with the aim of helping Chris implement a new AI player for the game. After evaluating some possible chess-playing AI engines, including state-of-the-art players based off of Google’s AlphaZero, we landed on Stockfish as our best candidate for integration. The AI is currently hot-swappable though, so users can try to develop their own! 

While some of the work for implementing the AI could be done directly using Chris’s C++ implementation of Quantum Chess, other aspects of the work required me to learn the program he had used to develop the user interface. That program is called Unity. Unity is a free game development program that I would highly recommend trying out and playing around with. 

This experience was essential to the birth of Quantum TiqTaqToe. In my quest to understand Unity and Quantum Games, I set out to implement a “simple” game to get a handle on how all the different game components worked together. Having a game based on quantum mechanics is one thing; making sure it is fun to play requires an entirely different skill set.


Classic Tic-Tac-Toe is a game in which two players, called X and O, take turns in placing their symbols on a 3×3 grid. The first player to get 3 of their symbols in a line (diagonally, vertically or horizontally) wins. The game goes as far back as ancient Egypt, and evidence of the game has been found on roof tiles dating to 1300 BC [1].

Many variations of the game have existed across many cultures. The first print reference to a game called “tick-tack-toe” was in 1884. In the US the game was renamed “tic-tac-toe” sometime in the 20th century. Here’s a random fun fact: in Dutch, the game is most often referred to as “Butter-Cheese-and-Eggs” [2]. In 1952, computer scientist Alexander S. Douglas at the University of Cambridge turned it into one of the first computer games, featuring an AI player that could play perfect games against a human opponent.

Combinatorics has determined that whoever plays first will win 91 out of 138 possible board combinations. The second player will win in 44 boards. However, if both players play optimally, looking ahead through all the possible future outcomes, neither player should ever win and the game always ends in a draw, in one of only 3 board combinations.
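
The claim that optimal play always ends in a draw can be checked by brute force. Here is a minimal sketch (not part of the app) that scores every reachable classical position with minimax:

```python
# Minimax over classical tic-tac-toe: +1 if X can force a win,
# -1 if O can, 0 if best play is a draw.
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    w = winner(board)
    if w:
        return 1 if w == 'X' else -1
    if '.' not in board:
        return 0  # full board, no line: a draw
    nxt = 'O' if player == 'X' else 'X'
    moves = [value(board[:i] + player + board[i + 1:], nxt)
             for i, s in enumerate(board) if s == '.']
    return max(moves) if player == 'X' else min(moves)

print(value('.' * 9, 'X'))  # 0: perfect play from the empty board is a draw
```

With memoization this explores only a few thousand distinct positions, so it runs in a fraction of a second.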

In Quantum TiqTaqToe, with the current ruleset, we don’t yet know if a winning strategy exists.

I explicitly refer to the current ruleset because we currently limit the amount of quantumness in the game. We want to make sure the game is fun to play and ‘graspable’ for now. In addition, it turns out there already is a game called Quantum TicTacToe, developed by Allan Goff [3]. That version of TicTacToe has similar concepts but has a different set of rules. 

The Game

A typical game of Quantum TiqTaqToe will look very much like regular Tic-Tac-Toe until one of the players decides to make a quantum move:

A quantum superposition is made by dragging from one empty square to another.

At this point, the game board enters into a superposition. The X is in each position with 50/50 chance; in one universe the X is on the left and in the other it is on the right. Neither player knows how things will play out. And the game only gets more interesting from here. The opponent can choose to place their O in a superposition between an empty square and a square occupied by a quantum X.

A quantum entanglement move.

Et voilà, player O has entangled his fate with his opponent’s. Once the two squares become entangled, the only outcomes are X-O or O-X, each with probability ½. Interestingly, since the game is fully quantum, the phase between the two entangled outcomes can in principle be leveraged to create interesting plays through destructive and constructive interference. The app features a simple tutorial (to be updated) that teaches you these moves and a few others. There are boards that classically result in a draw but are quantumly “winnable”.

A quick note on the quantumness

The squares in TiqTaqToe are all fully quantum. I represent them as qutrits (like qubits, but instead of having states 0 and 1 my qutrits have states 0, 1 and 2), and moves made by the players are unitary operations acting on them. So the game consists of these essential elements:

  1. The squares of the 3×3 grid are turned into qutrits (Empty, X, O). Each move is a unitary gate operation on those qutrits. I’ll leave the details of the math out, but for the case of qubits check out Chris’ detailed writeup on Quantum Chess [4].
  2. Quantum TiqTaqToe allows you to select two squares in the grid, providing you with the option of creating a superposition or an entangled state. For the sake of simplicity (i.e. keeping the game fun to play and ‘graspable’ for now), no more than 3 squares can be involved in a given entangled state.

I chose to explicitly track sets of qutrits that share a Hilbert space. The entire quantum state of the game combines these sets with classical strings of the form “XEEOXEOXE”, indicating that the first square is an X, the second is Empty, etc.
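
As a rough illustration of this bookkeeping (a toy Python sketch, not the game's actual implementation — all names here are made up), one can represent the full state as a map from classical board strings to amplitudes; a superposition move then splits every branch in two:

```python
# Toy amplitude-map sketch of the state tracking described above.
# Boards are 9-character strings over {E, X, O}; the state maps each
# classical board to a complex amplitude.
import math

def superposition_move(state, piece, sq1, sq2):
    """Place `piece` in an equal superposition over two empty squares."""
    new_state = {}
    for board, amp in state.items():
        if board[sq1] != 'E' or board[sq2] != 'E':
            raise ValueError("both squares must be empty in every branch")
        for sq in (sq1, sq2):
            b = board[:sq] + piece + board[sq + 1:]
            new_state[b] = new_state.get(b, 0) + amp / math.sqrt(2)
    return new_state

state = {'E' * 9: 1.0}
state = superposition_move(state, 'X', 0, 1)  # X split over squares 0 and 1
# Two branches, each with amplitude 1/sqrt(2); total probability still 1.
```

Keeping amplitudes (rather than just probabilities) is what allows the interference effects mentioned earlier.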

Victory in the multiverse

So, when does the game end if these quantum states are in play? In Quantum TiqTaqToe, the board collapses to a single classical state as soon as it is full (i.e. every square is non-empty). The resulting state is randomly chosen from all the possible outcomes, with a probability that is equal to the (square of the) wave-function amplitude (basic quantum mechanics). If there is a winner after the collapse, the game ends. Otherwise, the game continues until either there is a winner or until there are no more moves to be made (ending in a draw). On top of this, players get the option to forfeit their move for the opportunity to cause a partial collapse of the state, by using the collapse-mode. Future versions may include other ways of collapse, including one that does not involve rolling dice! [5]
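
In the same toy amplitude-map picture (again an illustrative sketch, not the app's code), the collapse is just a weighted sample over the classical branches, with weights given by the squared amplitudes:

```python
# Born-rule collapse: sample one classical board with probability
# proportional to |amplitude|^2.
import random

def collapse(state, rng=random):
    boards = list(state)
    weights = [abs(state[b]) ** 2 for b in boards]
    return rng.choices(boards, weights=weights, k=1)[0]

state = {'XOEEEEEEE': 1 / 2 ** 0.5, 'OXEEEEEEE': 1 / 2 ** 0.5}
outcome = collapse(state)  # 'XOEEEEEEE' or 'OXEEEEEEE', each with probability 1/2
```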

Can you beat the quantum AI?

Due to quantum physics and the collapse of the state, the game is inherently statistical. So instead of asking: “Can I beat my opponent in a game of Quantum TiqTaqToe?” one should ask “If I play 100 games against my opponent, can I consistently win more than 50 of them?”

You can test your skill against the in-game quantum AI to see if you’ve indeed mastered Quantum TiqTaqToe yet. At the hardest setting, winning even 30% of the time after, say, 20 games may be extraordinary. The implementation of this AI, by the way, would have been a blog-post by itself. For the curious, I can say it is based on the ExpectiMiniMax algorithm. As of the moment of this writing, the hardest AI setting is not available in the app yet. Keep your eyes out for an update soon though!
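
The in-game AI's internals aren't described beyond the algorithm's name, but the general shape of expectiminimax is easy to sketch: ordinary minimax extended with chance nodes that average their children's values by probability, which is how a collapse looks to the AI. The tree below is illustrative, not taken from the app:

```python
# Generic expectiminimax sketch: MAX and MIN nodes as in minimax, plus
# CHANCE nodes whose value is the probability-weighted average of children.

def expectiminimax(node):
    kind, payload = node
    if kind == 'leaf':
        return payload  # terminal utility
    if kind == 'max':
        return max(expectiminimax(c) for c in payload)
    if kind == 'min':
        return min(expectiminimax(c) for c in payload)
    if kind == 'chance':  # payload is a list of (probability, child) pairs
        return sum(p * expectiminimax(c) for p, c in payload)
    raise ValueError(kind)

# A collapse is a chance node: here, 50/50 between a win (+1) and a draw (0).
tree = ('max', [
    ('chance', [(0.5, ('leaf', 1)), (0.5, ('leaf', 0))]),  # risky quantum move
    ('leaf', 0.25),                                         # safer classical move
])
print(expectiminimax(tree))  # 0.5: the risky move has the higher expected value
```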

The future

“Perhaps kids who grow up playing quantum games will acquire a visceral understanding of quantum phenomena that our generation lacks.” – John Preskill, in his recent article [6].

From the get-go, Quantum TiqTaqToe (and Quantum Chess) have had outreach as a core motivation. Perhaps future quantum engineers and quantum programmers will look back on their youth and remember playing Quantum TiqTaqToe as I remember my Commodore 64 games. I am convinced that these small steps into the realm of Quantum Games are only just the beginning of an entirely new genre of fun and useful games.

In the meantime, we are hard at work implementing an Online mode so you can play with your fellow human friends remotely too. This online mode, plus the option of fighting a strong quantum AI, will be unlockable in-game through a small fee (unless you are an educator who wishes to introduce quantum physics in class through this game; those use cases are fee-free courtesy of IQIM and NSF). Each purchase will go towards supporting the future development of exciting new Quantum TiqTaqToe features, as well as other exciting Quantum Games (Tetris, anyone?)

Just in case you missed it: the app is available both for Android devices and iPhone/iPad right now:

Get it on Google Play


I really hope you enjoy the game, and perhaps use it to get your friends and family excited about quantum physics. Oh, and start practicing! You never know if the online mode will bring along with it a real Quantum TiqTaqToe Tournament down the road 😉



[2] The origin of this name in Dutch isn’t really certain as far as I know. Allegedly, it is a left-over from the period in which butter, cheese and eggs were sold at the door (so was milk, but that was done separately since it was sold daily). The salesman had a list with columns for each of these three products, and would jot down a cross or a zero whenever a customer at an address bought or declined a product. Three crosses in a row would earn them praise from the boss.





Doug NatelsonBrief items

I just returned from some travel, and I have quite a bit of writing I need to do, but here are a few items of interest:

  • No matter how many times I see them (here I discussed a result from ten years ago), I'm still impressed by images taken of molecular orbitals, as in the work by IBM Zurich that has now appeared in Science.  Here is the relevant video.
  • Speaking of good videos, here is a talk by Tadashi Tokieda, presently at Stanford, titled "Science from a Sheet of Paper".  Really nicely done, and it shows a great example of how surprising general behavior can emerge from simple building blocks.
  • It's a couple of years old now, but this is a nice overview of the experimental state of the problem of high temperature superconductivity, particularly in the cuprates.
  • Along those lines, here is a really nice article from SciAm by Greg Boebinger about achieving the promise of those materials.  
  • Arguments back and forth continue about the metallization of hydrogen.
  • And Sean Carroll shows how remunerative it can be to be a science adviser for a Hollywood production.

July 14, 2019

Scott AaronsonOn two blog posts of Jerry Coyne

A few months ago, I got to know Jerry Coyne, the recently-retired biologist at the University of Chicago who writes the blog “Why Evolution Is True.” The interaction started when Jerry put up a bemused post about my thoughts on predictability and free will, and I pointed out that if he wanted to engage me on those topics, there was more to go on than an 8-minute YouTube video. I told Coyne that it would be a shame to get off on the wrong foot with him, since perusal of his blog made it obvious that whatever he and I disputed, it was dwarfed by our areas of agreement. He and I exchanged more emails and had lunch in Chicago.

By way of explaining how he hadn’t read “The Ghost in the Quantum Turing Machine,” Coyne emphasized the difference in my and his turnaround times: while these days I update my blog only a couple times per month, Coyne often updates multiple times per day. Indeed the sheer volume of material he posts, on subjects from biology to culture wars to Chicago hot dogs, would take months to absorb.

Today, though, I want to comment on just two posts of Jerry’s.

The first post, from back in May, concerns David Gelernter, the computer science professor at Yale who was infamously injured in a 1993 attack by the Unabomber, and who’s now mainly known as a right-wing commentator. I don’t know Gelernter, though I did once attend a small interdisciplinary workshop in the south of France that Gelernter also attended, wherein I gave a talk about quantum computing and computational complexity in which Gelernter showed no interest. Anyway, Gelernter, in an essay in May for the Claremont Review of Books, argued that recent work has definitively disproved Darwinism as a mechanism for generating new species, and until something better comes along, Intelligent Design is the best available alternative.

Curiously, I think that Gelernter’s argument falls flat not for detailed reasons of biology, but mostly just because it indulges in bad math and computer science—in fact, in precisely the sorts of arguments that I was trying to answer in my segment on Morgan Freeman’s Through the Wormhole (see also Section 3.2 of Why Philosophers Should Care About Computational Complexity). Gelernter says that

  1. a random change to an amino acid sequence will pretty much always make it worse,
  2. the probability of finding a useful new such sequence by picking one at random is at most ~1 in 10^77, and
  3. there have only been maybe ~10^40 organisms in earth’s history.

Since 10^77 >> 10^40, Darwinism is thereby refuted—not in principle, but as an explanation for life on earth. QED.

The most glaring hole in the above argument, it seems to me, is that it simply ignores intermediate possible numbers of mutations. How hard would it be to change, not 1 or 100, but 5 amino acids in a given protein to get a usefully different one—as might happen, for example, with local optimization methods like simulated annealing run at nonzero temperature? And how many chances were there for that kind of mutation in the earth’s history?
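
To make the point concrete, here is a toy experiment in the spirit of Dawkins’ classic “weasel” demonstration (an illustration of cumulative selection, not a biological model, and not anything from Gelernter’s essay): accepting single-site mutations that don’t hurt reaches a 28-character target in a few thousand steps, whereas drawing whole sequences uniformly at random would need on the order of 27^28 ≈ 10^40 draws.

```python
# Toy cumulative-selection demo: random one-site mutations plus a
# "keep it if it's no worse" rule find the target astronomically faster
# than blind whole-sequence guessing would.
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # 27 characters

def fitness(s):
    return sum(a == b for a, b in zip(s, TARGET))

def hill_climb(rng, max_steps=100_000):
    current = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for step in range(max_steps):
        if current == TARGET:
            return step
        i = rng.randrange(len(TARGET))  # mutate one random site
        candidate = current[:i] + rng.choice(ALPHABET) + current[i + 1:]
        if fitness(candidate) >= fitness(current):  # keep if no worse
            current = candidate
    return None

steps = hill_climb(random.Random(0))  # typically a few thousand steps
```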

Gelernter can’t personally see how a path could cut through the exponentially large solution space in a polynomial amount of time, so he asserts that it’s impossible. Many of the would-be P≠NP provers who email me every week do the same. But this particular kind of “argument from incredulity” has an abysmal track record: it would’ve applied equally well, for example, to problems like maximum matching that turned out to have efficient algorithms. This is why, in CS, we demand better evidence of hardness—like completeness results or black-box lower bounds—neither of which seem however to apply to the case at hand. Surely Gelernter understands all this, but had he not, he could’ve learned it from my lecture at the workshop in France!

Alas, online debate, as it’s wont to do, focused less on Gelernter’s actual arguments and the problems with them, than on the tiresome questions of “standing” and “status.” In particular: does Gelernter’s authority, as a noted computer science professor, somehow lend new weight to Intelligent Design? Or conversely: does the very fact that a computer scientist endorsed ID prove that computer science itself isn’t a real science at all, and that its practitioners should never be taken seriously in any statements about the real world?

It’s hard to say which of these two questions makes me want to bury my face deeper into my hands. Serge Lang, the famous mathematician and textbook author, spent much of his later life fervently denying the connection between HIV and AIDS. Lynn Margulis, the discoverer of the origin of mitochondria (and Carl Sagan’s first wife), died a 9/11 truther. What broader lesson should we draw from any of this? And anyway, what percentage of computer scientists actually do doubt evolution, and how does it compare to the percentage in other academic fields and other professions? Isn’t the question of how divorced we computer scientists are from the real world an … ahem … empirical matter, one hard to answer on the basis of armchair certainties and anecdotes?

Speaking of empiricism, if you check Gelernter’s publication list on DBLP and his Google Scholar page, you’ll find that he did influential work in programming languages, parallel computing, and other areas from 1981 through 1997, and then in the past 22 years published a grand total of … two papers in computer science. One with four coauthors, the other a review/perspective piece about his earlier work. So it seems fair to say that, some time after receiving tenure in a CS department, Gelernter pivoted (to put it mildly) away from CS and toward conservative punditry. His recent offerings, in case you’re curious, include the book America-Lite: How Imperial Academia Dismantled Our Culture (and Ushered In the Obamacrats).

Some will claim that this case underscores what’s wrong with the tenure system itself, while others will reply that it’s precisely what tenure was designed for, even if in this instance you happen to disagree with what Gelernter uses his tenured freedom to say. The point I wanted to make is different, though. It’s that the question “what kind of a field is computer science, anyway, that a guy can do high-level CS research on Monday, and then on Tuesday reject Darwinism and unironically use the word ‘Obamacrat’?”—well, even if I accepted the immense weight this question places on one atypical example (which I don’t), and even if I dismissed the power of compartmentalization (which I again don’t), the question still wouldn’t arise in Gelernter’s case, since getting from “Monday” to “Tuesday” seems to have taken him 15+ years.

Anyway, the second post of Coyne’s that I wanted to talk about is from just yesterday, and is about Jeffrey Epstein—the financier, science philanthropist, and confessed sex offender, whose appalling crimes you’ll have read all about this week if you weren’t on a long sea voyage without Internet or something.

For the benefit of my many fair-minded friends on Twitter, I should clarify that I’ve never met Jeffrey Epstein, let alone accepted any private flights to his sex island or whatever. I doubt he has any clue who I am either—even if he did once claim to be “intrigued” by quantum information.

I do know a few of the scientists who Epstein once hung out with, including Seth Lloyd and Steven Pinker. Pinker, in particular, is now facing vociferous attacks on Twitter, similar in magnitude perhaps to what I faced in the comment-171 affair, for having been photographed next to Epstein at a 2014 luncheon that was hosted by Lawrence Krauss (a physicist who later faced sexual harassment allegations of his own). By the evidentiary standards of social media, this photo suffices to convict Pinker as basically a child molester himself, and is also a devastating refutation of any data that Pinker might have adduced in his books about the Enlightenment’s contributions to human flourishing.

From my standpoint, what’s surprising is not that Pinker is up against this, but that it took this long to happen, given that Pinker’s pro-Enlightenment, anti-blank-slate views have had the effect of painting a giant red target on his back. Despite the near-inevitability, though, you can’t blame Pinker for wanting to defend himself, as I did when it was my turn for the struggle session.

Thus, in response to an emailed inquiry by Jerry Coyne, Pinker shared some detailed reflections about Epstein; Pinker then gave Coyne permission to post those reflections on his blog (though they were originally meant for Coyne only). Like everything Pinker writes, they’re worth reading in full. Here’s the opening paragraph:

The annoying irony is that I could never stand the guy [Epstein], never took research funding from him, and always tried to keep my distance. Friends and colleagues described him to me as a quantitative genius and a scientific sophisticate, and they invited me to salons and coffee klatches at which he held court. But I found him to be a kibitzer and a dilettante — he would abruptly change the subject ADD style, dismiss an observation with an adolescent wisecrack, and privilege his own intuitions over systematic data.

Pinker goes on to discuss his record of celebrating, and extensively documenting, the forces of modernity that led to dramatic reductions in violence against women and that have the power to continue doing so. On Twitter, Pinker had already written: “Needless to say I condemn Epstein’s crimes in the strongest terms.”

I probably should’ve predicted that Pinker would then be attacked again—this time, for having prefaced his condemnation with the phrase “needless to say.” The argument, as best I can follow, runs like this: given all the isms of which woke Twitter has already convicted Pinker—scientism, neoliberalism, biological determinism, etc.—how could Pinker’s being against Epstein’s crimes (which we recently learned probably include the rape, and not only statutorily, of a 15-year-old) possibly be assumed as a given?

For the record, just as Epstein’s friends and enablers weren’t confined to one party or ideology, so the public condemnation of Epstein strikes me as a matter that is (or should be) beyond ideology, with all reasonable dispute now confined to the space between “very bad” and “extremely bad,” between “lock away for years” and “lock away for life.”

While I didn’t need Pinker to tell me that, one reason I personally appreciated his comments is that they helped to answer a question that had bugged me, and that none of the mountains of other condemnations of Epstein had given me a clear sense about. Namely: supposing, hypothetically, that I’d met Epstein around 2002 or so—without, of course, knowing about his crimes—would I have been as taken with him as many other academics seem to have been? (Would you have been? How sure are you?)

Over the last decade, I’ve had the opportunity to meet some titans and semi-titans of finance and business, to discuss quantum computing and other nerdy topics. For a few (by no means all) of these titans, my overriding impression was precisely their unwillingness to concentrate on any one point for more than about 20 seconds—as though they wanted the crust of a deep intellectual exchange without the meat filling. My experience with them fit Pinker’s description of Epstein to a T (though I hasten to add that, as far as I know, none of these others ran teenage sex rings).

Anyway, given all the anger at Pinker for having intersected with Epstein, it’s ironic that I could easily imagine Pinker’s comments rattling Epstein the most of anyone’s, if Epstein hears of them from his prison cell. It’s like: Epstein must have developed a skin like a rhinoceros’s by this point about being called a child abuser, a creep, and a thousand similar (and similarly deserved) epithets. But “a kibitzer and a dilettante” who merely lured famous intellectuals into his living room, with wads of cash not entirely unlike the ones used to lure teenage girls to his massage table? Ouch!

OK, but what about Alan Dershowitz—the man who apparently used to be Epstein’s close friend, who still is Pinker’s friend, and who played a crucial role in securing Epstein’s 2008 plea bargain, the one now condemned as a travesty of justice? I’m not sure how I feel about Dershowitz.  It’s like: I understand that our system requires attorneys willing to mount a vociferous defense even for clients who they privately know or believe to be guilty—and even to get those clients off on technicalities or bargaining whenever they can.  I’m also incredibly grateful that I chose CS rather than law school, because I don’t think I could last an hour advocating causes that I knew to be unjust. Just like my fellow CS professor, the intelligent design advocate David Gelernter, I have the privilege and the burden of speaking only for myself.

July 11, 2019

Dave BaconThe open access wars

Vox has just published an excellent article on open access scientific publishing, “The open access wars“. While that is too provocative of a title, the article still manages to give an accurate assessment of the current state of play. Although I like the article, I can’t help but nitpick a few points.

From the article,

“We spoke with executives at both Elsevier and Springer Nature, and they maintain their companies still provide a lot of value in ensuring the quality of academic research.”

This is false. Publishers do not add any significant value in ensuring the quality of the academic research. Peer reviewers do that, and even then not consistently. True, the publishers facilitate finding peer reviewers, but this has nothing to do with the actual publishing of the research. The role of the journal itself (sans peer review) is just to market and advertise the research, not to ensure quality. The journal also provides a venue for signaling for the prestige-smitten researcher. It is a personal value judgement how much these other things matter, but they certainly don’t impact the quality of the research.

Organizing the peer review (as opposed to actually reviewing) is the job of a middleman: it may provide a service, but it doesn’t add value to the product and it only drives up prices. This is why non-profit overlay journals like Quantum only cost between 0 and €200 to publish. The average cost per submission for hosting a preprint on the arxiv is less than $7.

Which brings me to my next point: I was also a little disappointed in the article’s treatment of preprints. They do mention the rise of preprints, but only via a preprint server that apparently came online in 2007. By contrast, the oldest arxiv preprint that I’m aware of is Paul Ginsparg’s notes on conformal field theory, which were posted in 1988!

That might be the oldest timestamp, but the arxiv only started having regular preprint service in 1991. Still, this means that essentially all physics research in the last 25+ years is available for free online. In practice, this means that any time you need a physics paper, you simply find the associated preprint and read that instead of the journal version. This is especially convenient for a field like quantum information, where all but a handful of papers are available on the arxiv.

Any article on open access should lead with a discussion of the arxiv. It’s one of the most important science-related developments of the last 30 years.

July 09, 2019

Scott AaronsonJohn Wright joins UT Austin

I’m delighted to announce that quantum computing theorist John Wright will be joining the computer science faculty at UT Austin in Fall 2020, after he finishes a one-year postdoc at Caltech.

John made an appearance on this blog a few months ago, when I wrote about the new breakthrough by him and Anand Natarajan: namely, that MIP* (multi-prover interactive proofs with entangled provers) contains NEEXP (nondeterministic double-exponential time). Previously, MIP* had only been known to contain NEXP (nondeterministic single exponential time). So, this is an exponential expansion in the power of entangled provers over what was previously known and believed, and the first proof that entanglement actually increases the power of multi-prover protocols, rather than decreasing it (as it could’ve done a priori). Even more strikingly, there seems to be no natural stopping point: MIP* might soon swallow up arbitrary towers of exponentials or even the halting problem (!). For more, see for example this Quanta article, or this post by Thomas Vidick, or this short story [sic] by Henry Yuen.

John grew up in Texas, so he’s no stranger to BBQ brisket or scorching weather. He did his undergrad in computer science at UT Austin—my colleagues remember him as a star—and then completed his PhD with Ryan O’Donnell at Carnegie Mellon, followed by a postdoc at MIT. Besides the work on MIP*, John is also well-known for his 2015 work with O’Donnell pinning down the sample complexity of quantum state tomography. Their important result, a version of which was independently obtained by Haah et al., says that if you want to learn an unknown d-dimensional quantum mixed state ρ to a reasonable precision, then ~d^2 copies of ρ are both necessary and sufficient. This solved a problem that had personally interested me, and already plays a role in, e.g., my work on shadow tomography and gentle measurements.

Our little quantum information center at UT Austin is growing rapidly. Shyam Shankar, a superconducting qubits guy who previously worked in Michel Devoret’s group at Yale, will also be joining UT’s Electrical and Computer Engineering department this fall. I’ll have two new postdocs—Andrea Rocchetto and Yosi Atia—as well as new PhD students. We’ll continue recruiting this coming year, with potential opportunities for students, postdocs, faculty, and research scientists across the CS, physics, and ECE departments as well as the Texas Advanced Computing Center (TACC). I hope you’ll consider applying to join us.

With no evaluative judgment attached, I can honestly say that this is an unprecedented time for quantum computing as a field. Where once faculty applicants struggled to make a case for quantum computing (physics departments: “but isn’t this really CS?” / CS departments: “isn’t it really physics?” / everyone: “couldn’t this whole QC thing, like, all blow over in a year?”), today departments are vying with each other and with industry players and startups to recruit talented people. In such an environment, we’re fortunate to be doing as well as we are. We hope to continue to expand.

Meanwhile, this was an unprecedented year for CS hiring at UT Austin more generally. John Wright is one of at least four new faculty (probably more) who will be joining us. It’s a good time to be in CS.

A huge welcome to John, and hook ’em Hadamards!

(And for US readers: have a great 4th! Though how could any fireworks match the proof of the Sensitivity Conjecture?)

July 08, 2019

Sean CarrollSpacetime and Geometry: Now at Cambridge University Press

Hard to believe it’s been 15 years since the publication of Spacetime and Geometry: An Introduction to General Relativity, my graduate-level textbook on everyone’s favorite theory of gravitation. The book has become quite popular, being used as a text in courses around the world. There are a lot of great GR books out there, but I felt another one was needed that focused solely on the goal of teaching students general relativity. That might seem like an obvious goal, but many books also try to serve as reference books, or to put forward a particular idiosyncratic take on the subject. All I want to do is to teach you GR.

And now I’m pleased to announce that the book is changing publishers, from Pearson to Cambridge University Press. Even with a new cover, shown above.

I must rush to note that it’s exactly the same book, just with a different publisher. Pearson was always good to me, I have no complaints there, but they are moving away from graduate physics texts, so it made sense to try to find S&G a safe permanent home.

Well, there is one change: it’s cheaper! You can order the book either from CUP directly, or from other outlets such as Amazon. Copies had been going for roughly $100, but the new version lists for only $65 — and if the Amazon page is to be believed, it’s currently on sale for an amazing $46. That’s a lot of knowledge for a minuscule price. I’d rush to snap up copies for you and your friends, if I were you.

My understanding is that copies of the new version are not quite in stores yet, but they’re being printed and should be there momentarily. Plenty of time for courses being taught this Fall. (Apologies to anyone who has been looking for the book over the past couple of months, when it’s been stuck between publishers while we did the handover.)

Again: it’s precisely the same book. I have thought about doing revisions to produce an actually new edition, but I think about many things, and that’s not a super-high priority right now. Maybe some day.

Thanks to everyone who has purchased Spacetime and Geometry over the years, and said such nice things about it. Here’s to the next generation!

July 06, 2019

Jordan EllenbergOrioles optimism update

My Orioles optimism from the end of April hasn’t held up too well. When I wrote that, the team was 10-18. Since then, they’ve won 16 more games, and lost — it kind of hurts to type this — 43.

Why so bad? The team ERA has dropped almost half a run since I wrote that post, from 6.15 to 5.75. Their RS and RA for June were about the same as they were for May, but they went 6-20 instead of 8-19.

I can’t believe I’m saying this, but — maybe the Orioles aren’t really that bad? Their Pythagorean record is 28-59, which is terrible, but not even worst in MLB right now. (That honor belongs to the Tigers.) John Means continues to be great and Andrew Cashner and Dylan Bundy have now been pretty consistently turning in utterly acceptable starts.

The thing about baseball is, things happen suddenly. On Tuesday, September 5, 2017, less than two years ago, Manny Machado hit a 2-run homer in the bottom of the 9th to give the Orioles a 7-6 win against the Yankees. The Orioles were 71-68.

The next game after that, they lost 9-1. And then went 4-18 the rest of the way. They haven’t had a full month since then with a record better than .360. The Orioles became terrible in an instant. I don’t see why it can’t go the other way.

July 05, 2019

Jordan EllenbergParis June 2019

Back from nearly two weeks at the Institut Henri Poincare, where we were reinventing rational points, though they actually seem pretty much as they have always been. But lots of new ideas floating around and in particular lots of problems I see as potentially rich ones for students.

Last week featured the hottest temperatures ever recorded in France, reminding one that when you move the mean of a distribution even a little, the frequency of formerly rare events might jump quite a lot. Paris was spared the worst of the heat; after initial predictions of temperatures going over 100F, the hottest day of the conference was 97 and the rest of the week was in the mid-90s, regular old East Coast US summer weather. But of course France doesn’t have regular old East Coast US summer air-conditioning. Faiblement climatisé (“weakly air-conditioned”) is the order of the day. The word for heatwave in French is “canicule,” from the Latin canicula (“little dog”), a name for Sirius, the star thought to be a bringer of hot weather.

It’s also the Women’s World Cup. Tickets for the US-France quarterfinal, held the night before I left, were going at 350 euros for the very cheapest, but I don’t think I’d have wanted to go, anyway. The Orioles are the only team I love enough to really enjoy rooting for them as the visiting team. Instead I went to Scotland-Argentina, which looked like a laugher 70 minutes in with Scotland up 3-0, but ended in a controversial tie after Scotland’s apparent save of a last-minute penalty kick was called back when VAR showed the goalie jumping off the line a moment before the ball was kicked. The ref called end of time directly after the second kick went in to tie the game, to the confusion and dismay of the players on the field; both teams needed a win to have a real chance of advancing past the group stage, and the tie left them both out. Scottish forward Erin Cuthbert pulled something out of her sock and kissed it after her goal; later I found out it was a picture of herself as a baby. I like her style!

I ate well. I ate whelks. They’re OK. I ate thiebou djienne at this place near IHP which was much better than OK. I ate a watermelon-chevre salad that was so good I went to a spice store and bought the pepper they used, piment d’espelette, and now I have a spice Penzey’s doesn’t sell. Favorite new cheese I ate on this trip was Soumaintrain.

I went to the museum of Jewish history where I saw this campaign poster:

And I saw the computer the teenaged Blaise Pascal built for his dad in 1642, which is at the Musée des arts et métiers, along with a revolutionary 10-hour clock:

And right there at the museum, later that night, just by my good luck, there was a free Divine Comedy concert as part of the Fête de la Musique. It was sold out but, my good luck part deux, someone’s friend didn’t show up and in I went. Great set. Sort of a beautifully multinational moment to watch an Irish guy play a They Might Be Giants song in Paris in front of a cast of the Statue of Liberty:

I also learned on this trip that when French kids play Capture the Flag they use an actual French flag:

and that “Good Grief!” in French is “Bon sang!”

Scott AaronsonSensitivity Conjecture resolved

The Sensitivity Conjecture, which I blogged about here, says that, for every Boolean function f:{0,1}^n→{0,1}, the sensitivity of f—that is, the maximum, over all 2^n input strings x∈{0,1}^n, of the number of input bits such that flipping them changes the value of f—is at most polynomially smaller than a bunch of other complexity measures of f, including f’s block sensitivity, degree as a real polynomial, and classical and quantum query complexities. (For more, see for example this survey by Buhrman and de Wolf. Or for quick definitions of the relevant concepts, see here.)
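For concreteness, both quantities can be computed by brute force on small examples. The sketch below is my own illustration, not from the post; the function `OR3` and the helper names are made up. It computes sensitivity directly from the definition, and degree via Möbius inversion of the truth table into a multilinear polynomial:

```python
from itertools import product

def sensitivity(f, n):
    # s(f): max over inputs x of the number of bits whose flip changes f(x)
    return max(
        sum(f(x[:i] + (1 - x[i],) + x[i + 1:]) != f(x) for i in range(n))
        for x in product((0, 1), repeat=n)
    )

def degree(f, n):
    # deg(f): degree of the unique multilinear real polynomial agreeing with f
    # on {0,1}^n; coefficient of prod_{i in S} x_i via Mobius inversion.
    deg = 0
    for S in product((0, 1), repeat=n):
        idx = [i for i in range(n) if S[i]]
        coeff = 0
        for bits in product((0, 1), repeat=len(idx)):
            T = [0] * n
            for j, i in enumerate(idx):
                T[i] = bits[j]
            coeff += (-1) ** (len(idx) - sum(bits)) * f(tuple(T))
        if coeff != 0:
            deg = max(deg, len(idx))
    return deg

OR3 = lambda x: int(any(x))        # OR of 3 bits: sensitivity = degree = 3
print(sensitivity(OR3, 3), degree(OR3, 3))
```

For OR on n bits both measures equal n (the all-zeros input is fully sensitive), consistent with the conjectured polynomial relationship.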

Ever since it was posed by Nisan and Szegedy in 1989, this conjecture has stood as one of the most frustrating and embarrassing open problems in all of combinatorics and theoretical computer science. It seemed so easy, and so similar to other statements that had 5-line proofs. But a lot of the best people in the field sank months into trying to prove it. For whatever it’s worth, I also sank … well, at least weeks into it.

Now Hao Huang, a mathematician at Emory University, has posted a 6-page preprint on his homepage that finally proves the Sensitivity Conjecture, in the form s(f)≥√deg(f). (I thank Ryan O’Donnell for tipping me off to this.) Within the preprint, the proof itself is about a page and a half.

Whenever there’s an announcement like this, ~99% of the time either the proof is wrong, or at any rate it’s way too complicated for outsiders to evaluate it quickly. This is one of the remaining 1% of cases. I’m rather confident that the proof is right. Why? Because I read and understood it. It took me about half an hour. If you’re comfortable with concepts like induced subgraph and eigenvalue, you can do the same.

From pioneering work by Gotsman and Linial in 1992, it was known that to prove the Sensitivity Conjecture, it suffices to prove the following even simpler combinatorial conjecture:

Let S be any subset of the n-dimensional Boolean hypercube, {0,1}^n, which has size 2^{n-1}+1. Then there must be a point in S with at least ~n^c neighbors in S.

Here c>0 is some constant (say 1/2), and two points in S are “neighbors” if and only if they differ in a single coordinate. Note that if S had size 2^{n-1}, then the above statement would be false—as witnessed, for example, by the set of all n-bit strings with an even number of 1’s.
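In tiny dimensions, both the statement and the tightness of the 2^{n-1} threshold can be verified exhaustively. This is my own sketch, not from the post; for n=3, Huang’s bound √3 ≈ 1.73 forces some vertex with at least 2 neighbors in the subset:

```python
from itertools import combinations

n = 3
cube = range(2 ** n)                      # vertices of {0,1}^3 as bitmasks

def max_degree(S):
    # largest number of neighbors (bit-flips) any vertex of S has inside S
    Sset = set(S)
    return max(sum((v ^ (1 << i)) in Sset for i in range(n)) for v in S)

# Every subset of size 2^(n-1)+1 = 5 contains a vertex with at least 2
# neighbors inside the subset:
worst = min(max_degree(S) for S in combinations(cube, 2 ** (n - 1) + 1))
print(worst)  # 2

# At size 2^(n-1) the statement fails: even-parity strings are independent.
parity = [v for v in cube if bin(v).count("1") % 2 == 0]
print(max_degree(parity))  # 0
```

The 56 five-element subsets of the 3-cube all contain a vertex of induced degree at least 2, while the even-parity set of size 4 has induced degree 0 everywhere, exactly as the paragraph above describes.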

Huang proceeds by proving the Gotsman-Linial Conjecture. And the way he proves Gotsman-Linial is … well, at this point maybe I should just let you read the damn preprint yourself. I can’t say it more simply than he does.

If I had to try anyway, I’d say: Huang constructs a 2^n×2^n matrix, called A_n, that has 0’s where there are no edges between the corresponding vertices of the Boolean hypercube, and either 1’s or -1’s where there are edges—with a simple, weird pattern of 1’s and -1’s that magically makes everything work. He then lets H be an induced subgraph of the Boolean hypercube of size 2^{n-1}+1. He lower-bounds the maximum degree of H by the largest eigenvalue of the corresponding (2^{n-1}+1)×(2^{n-1}+1) submatrix of A_n. Finally, he lower-bounds that largest eigenvalue by … no, I don’t want to spoil it! Read it yourself!
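The key spectral facts are easy to see numerically. The sketch below is my own, following the recursive construction in the preprint as I understand it: a signed adjacency matrix of the n-cube satisfying A_n² = nI, so its eigenvalues are ±√n with equal multiplicity; Cauchy interlacing then gives any principal submatrix on 2^{n-1}+1 vertices a top eigenvalue of at least √n.

```python
import numpy as np

def A(n):
    # Signed adjacency matrix of the n-cube: same zero/nonzero pattern as the
    # hypercube graph, entries +/-1 on edges, built recursively.
    if n == 1:
        return np.array([[0.0, 1.0], [1.0, 0.0]])
    B = A(n - 1)
    I = np.eye(2 ** (n - 1))
    return np.block([[B, I], [I, -B]])

n = 4
M = A(n)
assert np.allclose(M @ M, n * np.eye(2 ** n))   # the magic identity A_n^2 = n I

# Any principal submatrix on 2^(n-1)+1 rows has top eigenvalue >= sqrt(n)
# by Cauchy interlacing; spot-check a random one:
rng = np.random.default_rng(0)
idx = rng.choice(2 ** n, size=2 ** (n - 1) + 1, replace=False)
top = np.linalg.eigvalsh(M[np.ix_(idx, idx)])[-1]
print(top >= np.sqrt(n) - 1e-9)  # True
```

Since the submatrix has the same support as the induced subgraph’s adjacency matrix, its top eigenvalue lower-bounds the maximum degree, which is what the paragraph above describes.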

Paul Erdös famously spoke of a book, maintained by God, in which was written the simplest, most beautiful proof of each theorem. The highest compliment Erdös could give a proof was that it “came straight from the book.” In this case, I find it hard to imagine that even God knows how to prove the Sensitivity Conjecture in any simpler way than this.

Indeed, the question is: how could such an elementary 1.5-page argument have been overlooked for 30 years? I don’t have a compelling answer to that, besides noting that “short” and “elementary” often have little to do with “obvious.” Once you start looking at the spectral properties of this matrix An, the pieces snap together in precisely the right way—but how would you know to look at that?

By coincidence, earlier today I finished reading my first PG Wodehouse novel (Right Ho, Jeeves!), on the gushing recommendation of a friend. I don’t know how I’d missed Wodehouse for 38 years. His defining talent is his ability to tie together five or six plot threads in a way that feels perfect and inevitable even though you didn’t see it coming. This produces a form of pleasure that’s nearly indistinguishable from the pleasure one feels in reading a “proof from the book.” So my pleasure centers are pretty overloaded today—but in such depressing times for the world, I’ll take pleasure wherever I can get it.

Huge congratulations to Hao!

Added thought: What this really is, is one of the purest illustrations I’ve seen in my career of the power and glory of the P≠NP phenomenon. We talk all the time about how proofs are easier to verify than to find. In practice, though, it can be far from obvious that that’s true. Consider your typical STOC/FOCS paper: writing it probably took the authors several months, while fully understanding the thing from scratch would probably take … also several months! If there’s a gap, it’s only by a factor of 4 or 5 or something. Whereas in this case, I don’t know how long Huang spent searching for the proof, but the combined search efforts of the community add up to years or decades. The ratio of the difficulty of finding to the difficulty of completely grasping is in the hundreds of thousands or millions.

Another added thought: Because Hao actually proves a stronger statement than the original Sensitivity Conjecture, it has additional implications, a few of which Hao mentions in his preprint. Here’s one he didn’t mention: any randomized algorithm to guess the parity of an n-bit string, which succeeds with probability at least 2/3 on the majority of strings, must make at least ~√n queries to the string, while any such quantum algorithm must make at least ~n^{1/4} queries. For more, see the paper Weak Parity by me, Ambainis, Balodis, and Bavarian (Section 6).

Important Update: Hao Huang himself has graciously visited the comment section to satisfy readers’ curiosity by providing a detailed timeline of his work on the Sensitivity Conjecture. (tl;dr: he was introduced to the problem by Mike Saks in 2012, and had been attacking it on and off since then, until he finally had the key insight this past month while writing a grant proposal. Who knew that grant proposals could ever be useful for anything?!?)

Another Update: In the comments section, my former student Shalev Ben-David points out a simplification of Huang’s argument, which no longer uses Cauchy’s interlacing theorem. I thought there was no way this proof could possibly be made any simpler, and I was wrong!

July 04, 2019

Tommaso DorigoGuest Post: A. Kovacs, "Is There A Simpler Perspective On Some Fundamental Laws Of Physics?"

Andras Kovacs studied Physics at Columbia University. He currently works as CTO of BroadBit Batteries company. Andras recently wrote an interesting book, which I asked him to summarize and introduce here. The text below is from him [T.D.]

This blog post introduces a newly published book, titled "Maxwell-Dirac Theory and Occam's Razor: Unified Field, Elementary Particles, and Nuclear Interactions".


July 03, 2019

Cylindrical OnionSubatomic Desire at CMS

Les Atomes Dansants at CMS experimental site

Many philosophers have held that Desire is the driving force of the world, on both microscopic and macroscopic scales: the same Desire that moves scientists in their demanding and endless quest for knowledge.

July 02, 2019

Terence TaoSymmetric functions in a fractional number of variables, and the multilinear Kakeya conjecture

Let {\Omega} be some domain (such as the real numbers). For any natural number {p}, let {L(\Omega^p)_{sym}} denote the space of symmetric real-valued functions {F^{(p)}: \Omega^p \rightarrow {\bf R}} on {p} variables {x_1,\dots,x_p \in \Omega}, thus

\displaystyle  F^{(p)}(x_{\sigma(1)},\dots,x_{\sigma(p)}) = F^{(p)}(x_1,\dots,x_p)

for any permutation {\sigma: \{1,\dots,p\} \rightarrow \{1,\dots,p\}}. For instance, for any natural numbers {k,p}, the elementary symmetric polynomials

\displaystyle  e_k^{(p)}(x_1,\dots,x_p) = \sum_{1 \leq i_1 < i_2 < \dots < i_k \leq p} x_{i_1} \dots x_{i_k}

will be an element of {L({\bf R}^p)_{sym}}. With the pointwise product operation, {L(\Omega^p)_{sym}} becomes a commutative real algebra. We include the case {p=0}, in which case {L(\Omega^0)_{sym}} consists solely of the real constants.

Given two natural numbers {k,p}, one can “lift” a symmetric function {F^{(k)} \in L(\Omega^k)_{sym}} of {k} variables to a symmetric function {[F^{(k)}]_{k \rightarrow p} \in L(\Omega^p)_{sym}} of {p} variables by the formula

\displaystyle  [F^{(k)}]_{k \rightarrow p}(x_1,\dots,x_p) = \sum_{1 \leq i_1 < i_2 < \dots < i_k \leq p} F^{(k)}(x_{i_1}, \dots, x_{i_k})

\displaystyle  = \frac{1}{k!} \sum_\pi F^{(k)}( x_{\pi(1)}, \dots, x_{\pi(k)} )

where {\pi} ranges over all injections from {\{1,\dots,k\}} to {\{1,\dots,p\}} (the latter formula making it clearer that {[F^{(k)}]_{k \rightarrow p}} is symmetric). Thus for instance

\displaystyle  [F^{(1)}(x_1)]_{1 \rightarrow p} = \sum_{i=1}^p F^{(1)}(x_i)

\displaystyle  [F^{(2)}(x_1,x_2)]_{2 \rightarrow p} = \sum_{1 \leq i < j \leq p} F^{(2)}(x_i,x_j)


\displaystyle  e_k^{(p)}(x_1,\dots,x_p) = [x_1 \dots x_k]_{k \rightarrow p}.

Also we have

\displaystyle  [1]_{k \rightarrow p} = \binom{p}{k} = \frac{p(p-1)\dots(p-k+1)}{k!}.

With these conventions, we see that {[F^{(k)}]_{k \rightarrow p}} vanishes for {p=0,\dots,k-1}, and is equal to {F^{(k)}} if {k=p}. We also have the transitivity

\displaystyle  [F^{(k)}]_{k \rightarrow p} = \frac{1}{\binom{p-k}{p-l}} [[F^{(k)}]_{k \rightarrow l}]_{l \rightarrow p}

if {k \leq l \leq p}.
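The lifting map and the transitivity identity can be spot-checked numerically. The sketch below is my own illustration (the helper names are made up), representing a symmetric function of k variables simply as a Python function of k arguments:

```python
from itertools import combinations
from math import comb

def lift(F, k, p):
    # [F]_{k -> p}: sum F over all increasing k-element subsets of p variables
    return lambda *xs: sum(F(*(xs[i] for i in S))
                           for S in combinations(range(p), k))

xs = (2.0, 3.0, 5.0, 7.0)            # a test point, p = 4
prod2 = lambda a, b: a * b           # F^{(2)}(x_1, x_2) = x_1 x_2

e2 = lift(prod2, 2, 4)(*xs)          # e_2^{(4)} = [x_1 x_2]_{2 -> 4}
print(e2)                            # 101.0

# Transitivity with k=2, l=3, p=4: correction factor C(p-k, p-l) = C(2, 1),
# since each 2-subset of {1,...,4} lies in exactly 2 of the 3-subsets.
twice = lift(lift(prod2, 2, 3), 3, 4)(*xs)
print(twice / comb(2, 1) == e2)      # True
```

The correction factor counts the overcounting: lifting first to l variables and then to p variables hits each k-subset once per intermediate l-subset containing it.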

The lifting map {[]_{k \rightarrow p}} is a linear map from {L(\Omega^k)_{sym}} to {L(\Omega^p)_{sym}}, but it is not a ring homomorphism. For instance, when {\Omega={\bf R}}, one has

\displaystyle  [x_1]_{1 \rightarrow p} [x_1]_{1 \rightarrow p} = (\sum_{i=1}^p x_i)^2 \ \ \ \ \ (1)

\displaystyle  = \sum_{i=1}^p x_i^2 + 2 \sum_{1 \leq i < j \leq p} x_i x_j

\displaystyle  = [x_1^2]_{1 \rightarrow p} + 2 [x_1 x_2]_{2 \rightarrow p}

\displaystyle  \neq [x_1^2]_{1 \rightarrow p}.

In general, one has the identity

\displaystyle  [F^{(k)}(x_1,\dots,x_k)]_{k \rightarrow p} [G^{(l)}(x_1,\dots,x_l)]_{l \rightarrow p} = \sum_{k,l \leq m \leq k+l} \frac{1}{k! l!} \ \ \ \ \ (2)

\displaystyle [\sum_{\pi, \rho} F^{(k)}(x_{\pi(1)},\dots,x_{\pi(k)}) G^{(l)}(x_{\rho(1)},\dots,x_{\rho(l)})]_{m \rightarrow p}

for all natural numbers {k,l,p} and {F^{(k)} \in L(\Omega^k)_{sym}}, {G^{(l)} \in L(\Omega^l)_{sym}}, where {\pi, \rho} range over all injections {\pi: \{1,\dots,k\} \rightarrow \{1,\dots,m\}}, {\rho: \{1,\dots,l\} \rightarrow \{1,\dots,m\}} with {\pi(\{1,\dots,k\}) \cup \rho(\{1,\dots,l\}) = \{1,\dots,m\}}. Combinatorially, the identity (2) follows from the fact that given any injections {\tilde \pi: \{1,\dots,k\} \rightarrow \{1,\dots,p\}} and {\tilde \rho: \{1,\dots,l\} \rightarrow \{1,\dots,p\}} with total image {\tilde \pi(\{1,\dots,k\}) \cup \tilde \rho(\{1,\dots,l\})} of cardinality {m}, one has {k,l \leq m \leq k+l}, and furthermore there exist precisely {m!} triples {(\pi, \rho, \sigma)} of injections {\pi: \{1,\dots,k\} \rightarrow \{1,\dots,m\}}, {\rho: \{1,\dots,l\} \rightarrow \{1,\dots,m\}}, {\sigma: \{1,\dots,m\} \rightarrow \{1,\dots,p\}} such that {\tilde \pi = \sigma \circ \pi} and {\tilde \rho = \sigma \circ \rho}.

Example 1 When {\Omega = {\bf R}}, one has

\displaystyle  [x_1 x_2]_{2 \rightarrow p} [x_1]_{1 \rightarrow p} = [\frac{1}{2! 1!}( 2 x_1^2 x_2 + 2 x_1 x_2^2 )]_{2 \rightarrow p} + [\frac{1}{2! 1!} 6 x_1 x_2 x_3]_{3 \rightarrow p}

\displaystyle  = [x_1^2 x_2 + x_1 x_2^2]_{2 \rightarrow p} + [3x_1 x_2 x_3]_{3 \rightarrow p}

which is just a restatement of the identity

\displaystyle  (\sum_{i < j} x_i x_j) (\sum_k x_k) = \sum_{i<j} x_i^2 x_j + x_i x_j^2 + \sum_{i < j < k} 3 x_i x_j x_k.
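The identity in Example 1 can be confirmed numerically at any test point. This is my own self-contained sketch (helper names made up), again representing a k-variable symmetric function as a function of k arguments:

```python
from itertools import combinations

def lift(F, k, p):
    # [F]_{k -> p}: sum F over all increasing k-element subsets of p variables
    return lambda *xs: sum(F(*(xs[i] for i in S))
                           for S in combinations(range(p), k))

xs = (1.0, 2.0, 3.0, 4.0)  # p = 4

# [x1 x2]_{2 -> 4} * [x1]_{1 -> 4}
lhs = lift(lambda a, b: a * b, 2, 4)(*xs) * lift(lambda a: a, 1, 4)(*xs)

# [x1^2 x2 + x1 x2^2]_{2 -> 4} + [3 x1 x2 x3]_{3 -> 4}
rhs = (lift(lambda a, b: a * a * b + a * b * b, 2, 4)(*xs)
       + lift(lambda a, b, c: 3 * a * b * c, 3, 4)(*xs))

print(lhs, rhs, lhs == rhs)  # 350.0 350.0 True
```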

Note that the coefficients appearing in (2) do not depend on the final number of variables {p}. We may therefore abstract the role of {p} from the law (2) by introducing the real algebra {L(\Omega^*)_{sym}} of formal sums

\displaystyle  F^{(*)} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}

where for each {k}, {F^{(k)}} is an element of {L(\Omega^k)_{sym}} (with only finitely many of the {F^{(k)}} being non-zero), and with the formal symbol {[]_{k \rightarrow *}} being formally linear, thus

\displaystyle  [F^{(k)}]_{k \rightarrow *} + [G^{(k)}]_{k \rightarrow *} := [F^{(k)} + G^{(k)}]_{k \rightarrow *}


\displaystyle  c [F^{(k)}]_{k \rightarrow *} := [cF^{(k)}]_{k \rightarrow *}

for {F^{(k)}, G^{(k)} \in L(\Omega^k)_{sym}} and scalars {c \in {\bf R}}, and with multiplication given by the analogue

\displaystyle  [F^{(k)}(x_1,\dots,x_k)]_{k \rightarrow *} [G^{(l)}(x_1,\dots,x_l)]_{l \rightarrow *} = \sum_{k,l \leq m \leq k+l} \frac{1}{k! l!} \ \ \ \ \ (3)

\displaystyle [\sum_{\pi, \rho} F^{(k)}(x_{\pi(1)},\dots,x_{\pi(k)}) G^{(l)}(x_{\rho(1)},\dots,x_{\rho(l)})]_{m \rightarrow *}

of (2). Thus for instance, in this algebra {L(\Omega^*)_{sym}} we have

\displaystyle  [x_1]_{1 \rightarrow *} [x_1]_{1 \rightarrow *} = [x_1^2]_{1 \rightarrow *} + 2 [x_1 x_2]_{2 \rightarrow *}


\displaystyle  [x_1 x_2]_{2 \rightarrow *} [x_1]_{1 \rightarrow *} = [x_1^2 x_2 + x_1 x_2^2]_{2 \rightarrow *} + [3 x_1 x_2 x_3]_{3 \rightarrow *}.

Informally, {L(\Omega^*)_{sym}} is an abstraction (or “inverse limit”) of the concept of a symmetric function of an unspecified number of variables, which are formed by summing terms that each involve only a bounded number of these variables at a time. One can check (somewhat tediously) that {L(\Omega^*)_{sym}} is indeed a commutative real algebra, with a unit {[1]_{0 \rightarrow *}}. (I do not know if this algebra has previously been studied in the literature; it is somewhat analogous to the abstract algebra of finite linear combinations of Schur polynomials, with multiplication given by a Littlewood-Richardson rule. )

For natural numbers {p}, there is an obvious specialisation map {[]_{* \rightarrow p}} from {L(\Omega^*)_{sym}} to {L(\Omega^p)_{sym}}, defined by the formula

\displaystyle  [\sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}]_{* \rightarrow p} := \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow p}.

Thus, for instance, {[]_{* \rightarrow p}} maps {[x_1]_{1 \rightarrow *}} to {[x_1]_{1 \rightarrow p}} and {[x_1 x_2]_{2 \rightarrow *}} to {[x_1 x_2]_{2 \rightarrow p}}. From (2) and (3) we see that this map {[]_{* \rightarrow p}: L(\Omega^*)_{sym} \rightarrow L(\Omega^p)_{sym}} is an algebra homomorphism, even though the maps {[]_{k \rightarrow *}: L(\Omega^k)_{sym} \rightarrow L(\Omega^*)_{sym}} and {[]_{k \rightarrow p}: L(\Omega^k)_{sym} \rightarrow L(\Omega^p)_{sym}} are not homomorphisms. By inspecting the {p^{th}} component of {L(\Omega^*)_{sym}} we see that the homomorphism {[]_{* \rightarrow p}} is in fact surjective.

Now suppose that we have a measure {\mu} on the space {\Omega}, which then induces a product measure {\mu^p} on every product space {\Omega^p}. To avoid degeneracies we will assume that the integral {\int_\Omega \mu} is strictly positive. Assuming suitable measurability and integrability hypotheses, a function {F \in L(\Omega^p)_{sym}} can then be integrated against this product measure to produce a number

\displaystyle  \int_{\Omega^p} F\ d\mu^p.

In the event that {F} arises as a lift {[F^{(k)}]_{k \rightarrow p}} of another function {F^{(k)} \in L(\Omega^k)_{sym}}, then from Fubini’s theorem we obtain the formula

\displaystyle  \int_{\Omega^p} F\ d\mu^p = \binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ d\mu)^{p-k}.

Thus for instance, if {\Omega={\bf R}},

\displaystyle  \int_{{\bf R}^p} [x_1]_{1 \rightarrow p}\ d\mu^p = p (\int_{\bf R} x\ d\mu(x)) (\int_{\bf R} \mu)^{p-1} \ \ \ \ \ (4)


\displaystyle  \int_{{\bf R}^p} [x_1 x_2]_{2 \rightarrow p}\ d\mu^p = \binom{p}{2} (\int_{{\bf R}^2} x_1 x_2\ d\mu(x_1) d\mu(x_2)) (\int_{\bf R} \mu)^{p-2}. \ \ \ \ \ (5)

On summing, we see that if

\displaystyle  F^{(*)} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}

is an element of the formal algebra {L(\Omega^*)_{sym}}, then

\displaystyle  \int_{\Omega^p} [F^{(*)}]_{* \rightarrow p}\ d\mu^p = \sum_{k=0}^\infty \binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ d\mu)^{p-k}. \ \ \ \ \ (6)

Note that by hypothesis, only finitely many terms on the right-hand side are non-zero.

Now for a key observation: whereas the left-hand side of (6) only makes sense when {p} is a natural number, the right-hand side is meaningful when {p} takes a fractional value (or even when it takes negative or complex values!), interpreting the binomial coefficient {\binom{p}{k}} as a polynomial {\frac{p(p-1) \dots (p-k+1)}{k!}} in {p}. As such, this suggests a way to introduce a “virtual” concept of a symmetric function on a fractional power space {\Omega^p} for such values of {p}, and even to integrate such functions against product measures {\mu^p}, even if the fractional power {\Omega^p} does not exist in the usual set-theoretic sense (and {\mu^p} similarly does not exist in the usual measure-theoretic sense). More precisely, for arbitrary real or complex {p}, we now define {L(\Omega^p)_{sym}} to be the space of abstract objects

\displaystyle  F^{(p)} = [F^{(*)}]_{* \rightarrow p} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow p}

with {F^{(*)} \in L(\Omega^*)_{sym}} and {[]_{* \rightarrow p}} (and {[]_{k \rightarrow p}}) now interpreted as formal symbols, with the structure of a commutative real algebra inherited from {L(\Omega^*)_{sym}}, thus

\displaystyle  [F^{(*)}]_{* \rightarrow p} + [G^{(*)}]_{* \rightarrow p} := [F^{(*)} + G^{(*)}]_{* \rightarrow p}

\displaystyle  c [F^{(*)}]_{* \rightarrow p} := [c F^{(*)}]_{* \rightarrow p}

\displaystyle  [F^{(*)}]_{* \rightarrow p} [G^{(*)}]_{* \rightarrow p} := [F^{(*)} G^{(*)}]_{* \rightarrow p}.

In particular, the multiplication law (2) continues to hold for such values of {p}, thanks to (3). Given any measure {\mu} on {\Omega}, we formally define a measure {\mu^p} on {\Omega^p} with regards to which we can integrate elements {F^{(p)}} of {L(\Omega^p)_{sym}} by the formula (6) (providing one has sufficient measurability and integrability to make sense of this formula), thus providing a sort of “fractional dimensional integral” for symmetric functions. Thus, for instance, with this formalism the identities (4), (5) now hold for fractional values of {p}, even though the formal space {{\bf R}^p} no longer makes sense as a set, and the formal measure {\mu^p} no longer makes sense as a measure. (The formalism here is somewhat reminiscent of the technique of dimensional regularisation employed in the physical literature in order to assign values to otherwise divergent integrals. See also this post for an unrelated abstraction of the integration concept involving integration over supercommutative variables (and in particular over fermionic variables).)

Example 2 Suppose {\mu} is a probability measure on {\Omega}, and {X: \Omega \rightarrow {\bf R}} is a random variable; on any power {\Omega^k}, we let {X_1,\dots,X_k: \Omega^k \rightarrow {\bf R}} be the usual independent copies of {X} on {\Omega^k}, thus {X_j(\omega_1,\dots,\omega_k) := X(\omega_j)} for {(\omega_1,\dots,\omega_k) \in \Omega^k}. Then for any real or complex {p}, the formal integral

\displaystyle  \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p

can be evaluated by first using the identity

\displaystyle  [X_1]_{1 \rightarrow p}^2 = [X_1^2]_{1 \rightarrow p} + 2[X_1 X_2]_{2 \rightarrow p}

(cf. (1)) and then using (6) and the probability measure hypothesis {\int_\Omega\ d\mu = 1} to conclude that

\displaystyle  \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p = \binom{p}{1} \int_{\Omega} X^2\ d\mu + 2 \binom{p}{2} \int_{\Omega^2} X_1 X_2\ d\mu^2

\displaystyle  = p (\int_\Omega X^2\ d\mu - (\int_\Omega X\ d\mu)^2) + p^2 (\int_\Omega X\ d\mu)^2

or in probabilistic notation

\displaystyle  \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p = p \mathbf{Var}(X) + p^2 \mathbf{E}(X)^2. \ \ \ \ \ (7)

For {p} a natural number, this identity has the probabilistic interpretation

\displaystyle  \mathbf{E}( X_1 + \dots + X_p)^2 = p \mathbf{Var}(X) + p^2 \mathbf{E}(X)^2 \ \ \ \ \ (8)

whenever {X_1,\dots,X_p} are jointly independent copies of {X}, which reflects the well known fact that the sum {X_1 + \dots + X_p} has expectation {p \mathbf{E} X} and variance {p \mathbf{Var}(X)}. One can thus view (7) as an abstract generalisation of (8) to the case when {p} is fractional, negative, or even complex, despite the fact that there is no sensible way in this case to talk about {p} independent copies {X_1,\dots,X_p} of {X} in the standard framework of probability theory.
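As a quick sanity check (an illustrative addition, not part of the original argument), the identity (8) can be verified exactly for small natural {p} by brute-force enumeration over a finite distribution; the biased coin below is an arbitrary choice of test distribution:

```python
from itertools import product

# X takes the value 1 with probability 0.3 and 0 otherwise (an arbitrary
# finite test distribution).
vals = {1: 0.3, 0: 0.7}

EX = sum(x * w for x, w in vals.items())
EX2 = sum(x * x * w for x, w in vals.items())
var = EX2 - EX ** 2

for p in range(1, 6):
    # exact expectation of (X_1 + ... + X_p)^2 over p independent copies
    lhs = 0.0
    for outcome in product(vals, repeat=p):
        weight = 1.0
        for x in outcome:
            weight *= vals[x]
        lhs += sum(outcome) ** 2 * weight
    rhs = p * var + p ** 2 * EX ** 2
    assert abs(lhs - rhs) < 1e-12
```

The point of (7) is precisely that the right-hand side above remains meaningful when {p} is no longer a natural number, even though the enumeration on the left does not.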

In this particular case, the quantity (7) is non-negative for every nonnegative {p}, which looks plausible given the form of the left-hand side. Unfortunately, this sort of non-negativity does not always hold; for instance, if {X} has mean zero, one can check that

\displaystyle  \int_{\Omega^p} [X_1]_{1 \rightarrow p}^4\ d\mu^p = p \mathbf{Var}(X^2) + p(3p-2) (\mathbf{E}(X^2))^2

and the right-hand side can become negative for {p < 2/3}. This is a shame, because otherwise one could hope to start endowing {L(\Omega^p)_{sym}} with some sort of commutative von Neumann algebra type structure (or the abstract probability structure discussed in this previous post) and then interpret it as a genuine measure space rather than as a virtual one. (This failure of positivity is related to the fact that the characteristic function of a random variable, when raised to the {p^{th}} power, need not be a characteristic function of any random variable once {p} is no longer a natural number: “fractional convolution” does not preserve positivity!) However, one vestige of positivity remains: if {F: \Omega \rightarrow {\bf R}} is non-negative, then so is

\displaystyle  \int_{\Omega^p} [F]_{1 \rightarrow p}\ d\mu^p = p (\int_\Omega F\ d\mu) (\int_\Omega\ d\mu)^{p-1}.
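The failure of positivity above is easy to see concretely (again an illustrative addition): taking {X} uniform on {\pm 1}, so that {\mathbf{Var}(X^2)=0} and {\mathbf{E}(X^2)=1}, the fourth-moment expression becomes {p(3p-2) = 3p^2-2p}, which agrees with brute-force enumeration for natural {p} but dips below zero at, say, {p = 1/2}:

```python
from itertools import product

def fourth_moment(p):
    # exact E[(X_1 + ... + X_p)^4] for X uniform on {-1, +1}
    return sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=p)) / 2 ** p

# For natural p this matches p*Var(X^2) + p*(3p-2)*E(X^2)^2 = 3p^2 - 2p,
# since here Var(X^2) = 0 and E(X^2) = 1.
for p in range(1, 7):
    assert fourth_moment(p) == 3 * p ** 2 - 2 * p

# The same polynomial evaluated at a fractional exponent goes negative:
p = 0.5
fractional_value = p * 0 + p * (3 * p - 2) * 1
assert fractional_value < 0
```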

One can wonder what the point is to all of this abstract formalism and how it relates to the rest of mathematics. For me, this formalism originated implicitly in an old paper I wrote with Jon Bennett and Tony Carbery on the multilinear restriction and Kakeya conjectures, though we did not have a good language for working with it at the time, instead working first with the case of natural number exponents {p} and appealing to a general extrapolation theorem to then obtain various identities in the fractional {p} case. The connection between these fractional dimensional integrals and more traditional integrals ultimately arises from the simple identity

\displaystyle  (\int_\Omega\ d\mu)^p = \int_{\Omega^p}\ d\mu^p

(where the right-hand side should be viewed as the fractional dimensional integral of the unit {[1]_{0 \rightarrow p}} against {\mu^p}). As such, one can manipulate {p^{th}} powers of ordinary integrals using the machinery of fractional dimensional integrals. A key lemma in this regard is

Lemma 3 (Differentiation formula) Suppose that a positive measure {\mu = \mu(t)} on {\Omega} depends on some parameter {t} and varies by the formula

\displaystyle  \frac{d}{dt} \mu(t) = a(t) \mu(t) \ \ \ \ \ (9)

for some function {a(t): \Omega \rightarrow {\bf R}}. Let {p} be any real or complex number. Then, assuming sufficient smoothness and integrability of all quantities involved, we have

\displaystyle  \frac{d}{dt} \int_{\Omega^p} F^{(p)}\ d\mu(t)^p = \int_{\Omega^p} F^{(p)} [a(t)]_{1 \rightarrow p}\ d\mu(t)^p \ \ \ \ \ (10)

for all {F^{(p)} \in L(\Omega^p)_{sym}} that are independent of {t}. If we allow {F^{(p)}(t)} to now depend on {t} also, then we have the more general total derivative formula

\displaystyle  \frac{d}{dt} \int_{\Omega^p} F^{(p)}(t)\ d\mu(t)^p \ \ \ \ \ (11)

\displaystyle  = \int_{\Omega^p} \frac{d}{dt} F^{(p)}(t) + F^{(p)}(t) [a(t)]_{1 \rightarrow p}\ d\mu(t)^p,

again assuming sufficient amounts of smoothness and regularity.

Proof: We just prove (10), as (11) then follows by the same argument used to prove the usual product rule. By linearity it suffices to verify this identity in the case {F^{(p)} = [F^{(k)}]_{k \rightarrow p}} for some natural number {k} and some symmetric function {F^{(k)} \in L(\Omega^k)_{sym}}. By (6), the left-hand side of (10) is then

\displaystyle  \frac{d}{dt} [\binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu(t)^k) (\int_\Omega\ d\mu(t))^{p-k}]. \ \ \ \ \ (12)

Differentiating under the integral sign using (9) we have

\displaystyle  \frac{d}{dt} \int_\Omega\ d\mu(t) = \int_\Omega\ a(t)\ d\mu(t)

and similarly

\displaystyle  \frac{d}{dt} \int_{\Omega^k} F^{(k)}\ d\mu(t)^k = \int_{\Omega^k} F^{(k)}(a_1+\dots+a_k)\ d\mu(t)^k

where {a_1,\dots,a_k} are the standard {k} copies of {a = a(t)} on {\Omega^k}:

\displaystyle  a_j(\omega_1,\dots,\omega_k) := a(\omega_j).

By the product rule, we can thus expand (12) as

\displaystyle  \binom{p}{k} (\int_{\Omega^k} F^{(k)}(a_1+\dots+a_k)\ d\mu^k ) (\int_\Omega\ d\mu)^{p-k}

\displaystyle  + \binom{p}{k} (p-k) (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ a\ d\mu) (\int_\Omega\ d\mu)^{p-k-1}

where we have suppressed the dependence on {t} for brevity. Since {\binom{p}{k} (p-k) = \binom{p}{k+1} (k+1)}, we can write this expression using (6) as

\displaystyle  \int_{\Omega^p} [F^{(k)} (a_1 + \dots + a_k)]_{k \rightarrow p} + [ F^{(k)} \ast a ]_{k+1 \rightarrow p}\ d\mu^p

where {F^{(k)} \ast a \in L(\Omega^{k+1})_{sym}} is the symmetric function

\displaystyle F^{(k)} \ast a(\omega_1,\dots,\omega_{k+1}) := \sum_{j=1}^{k+1} F^{(k)}(\omega_1,\dots,\omega_{j-1},\omega_{j+1} \dots \omega_{k+1}) a(\omega_j).

But from (2) one has

\displaystyle  [F^{(k)} (a_1 + \dots + a_k)]_{k \rightarrow p} + [ F^{(k)} \ast a ]_{k+1 \rightarrow p} = [F^{(k)}]_{k \rightarrow p} [a]_{1 \rightarrow p}

and the claim follows. \Box

Remark 4 It is also instructive to prove this lemma in the special case when {p} is a natural number, in which case the fractional dimensional integral {\int_{\Omega^p} F^{(p)}\ d\mu(t)^p} can be interpreted as a classical integral. In this case, the identity (10) is immediate from applying the product rule to (9) to conclude that

\displaystyle  \frac{d}{dt} d\mu(t)^p = [a(t)]_{1 \rightarrow p} d\mu(t)^p.

One could in fact derive (10) for arbitrary real or complex {p} from the case when {p} is a natural number by an extrapolation argument; see the appendix of my paper with Bennett and Carbery for details.
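For natural {p}, the differentiation formula (10) can also be checked numerically against a finite difference quotient on a finite space; the following Python sketch (an illustrative addition, with arbitrarily chosen weights, rates and test function on a two-point space) does this for {p=3}:

```python
from itertools import product
import math

# a two-point space with arbitrary weights, growth rates and test function
omega = [0, 1]
mu0 = {0: 0.4, 1: 0.6}
a = {0: 1.0, 1: -0.5}    # d/dt mu(t) = a * mu(t), as in (9)
F = {0: 2.0, 1: 3.0}
p = 3

def mu(t):
    return {w: mu0[w] * math.exp(a[w] * t) for w in omega}

def integral(G, t):
    # classical integral of a function G on Omega^p against mu(t)^p
    m = mu(t)
    total = 0.0
    for ws in product(omega, repeat=p):
        weight = 1.0
        for w in ws:
            weight *= m[w]
        total += G(ws) * weight
    return total

lift_F = lambda ws: sum(F[w] for w in ws)                # [F]_{1 -> p}
lift_Fa = lambda ws: lift_F(ws) * sum(a[w] for w in ws)  # [F]_{1 -> p} [a]_{1 -> p}

t, h = 0.7, 1e-5
numeric = (integral(lift_F, t + h) - integral(lift_F, t - h)) / (2 * h)
formula = integral(lift_Fa, t)
assert abs(numeric - formula) < 1e-6
```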

Let us give a simple PDE application of this lemma as illustration:

Proposition 5 (Heat flow monotonicity) Let {u: [0,+\infty) \times {\bf R}^d \rightarrow {\bf R}} be a solution to the heat equation {u_t = \Delta u} with initial data {\mu_0} a rapidly decreasing finite non-negative Radon measure, or more explicitly

\displaystyle  u(t,x) = \frac{1}{(4\pi t)^{d/2}} \int_{{\bf R}^d} e^{-|x-y|^2/4t}\ d\mu_0(y)

for all {t>0}. Then for any {p>0}, the quantity

\displaystyle  Q_p(t) := t^{\frac{d}{2} (p-1)} \int_{{\bf R}^d} u(t,x)^p\ dx

is monotone non-decreasing in {t \in (0,+\infty)} for {1 < p < \infty}, constant for {p=1}, and monotone non-increasing for {0 < p < 1}.

Proof: By a limiting argument we may assume that {d\mu_0} is absolutely continuous, with Radon-Nikodym derivative a test function; this is more than enough regularity to justify the arguments below.

For any {(t,x) \in (0,+\infty) \times {\bf R}^d}, let {\mu(t,x)} denote the Radon measure

\displaystyle  d\mu(t,x)(y) := \frac{1}{(4\pi)^{d/2}} e^{-|x-y|^2/4t}\ d\mu_0(y).

Then the quantity {Q_p(t)} can be written as a fractional dimensional integral

\displaystyle  Q_p(t) = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p}\ d\mu(t,x)^p\ dx.

Observe that

\displaystyle  \frac{\partial}{\partial t} d\mu(t,x) = \frac{|x-y|^2}{4t^2} d\mu(t,x)

and thus by Lemma 3 and the product rule

\displaystyle  \frac{d}{dt} Q_p(t) = -\frac{d}{2t} Q_p(t) + t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} [\frac{|x-y|^2}{4t^2}]_{1 \rightarrow p} d\mu(t,x)^p\ dx \ \ \ \ \ (13)

where we use {y} for the variable of integration in the factor space {{\bf R}^d} of {({\bf R}^d)^p}.

To simplify this expression we will take advantage of integration by parts in the {x} variable. Specifically, in any direction {x_j}, we have

\displaystyle  \frac{\partial}{\partial x_j} d\mu(t,x) = -\frac{x_j-y_j}{2t} d\mu(t,x)

and hence by Lemma 3

\displaystyle  \frac{\partial}{\partial x_j} \int_{({\bf R}^d)^p}\ d\mu(t,x)^p = - \int_{({\bf R}^d)^p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p.

Multiplying by {x_j} and integrating by parts, we see that

\displaystyle  d\ Q_p(t) = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} x_j [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx

where we use the Einstein summation convention in {j}. Similarly, if {F_j(y)} is any reasonable function depending only on {y}, we have

\displaystyle  \frac{\partial}{\partial x_j} \int_{({\bf R}^d)^p}[F_j(y)]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx

\displaystyle = - \int_{({\bf R}^d)^p} [F_j(y)]_{1 \rightarrow p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx

and hence on integration by parts

\displaystyle  0 = \int_{{\bf R}^d} \int_{({\bf R}^d)^p} [F_j(y)]_{1 \rightarrow p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx.

We conclude that

\displaystyle  \frac{d}{2t} Q_p(t) = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} (x_j - [F_j(y)]_{1 \rightarrow p}) [\frac{x_j-y_j}{4t^2}]_{1 \rightarrow p} d\mu(t,x)^p\ dx

and thus by (13)

\displaystyle  \frac{d}{dt} Q_p(t) = \frac{1}{4t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \int_{({\bf R}^d)^p}

\displaystyle [(x_j-y_j)(x_j-y_j)]_{1 \rightarrow p} - (x_j - [F_j(y)]_{1 \rightarrow p}) [x_j - y_j]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx.

The choice of {F_j} that then achieves the most cancellation turns out to be {F_j(y) = \frac{1}{p} y_j} (this cancels the terms that are linear or quadratic in the {x_j}), so that {x_j - [F_j(y)]_{1 \rightarrow p} = \frac{1}{p} [x_j - y_j]_{1 \rightarrow p}}. Repeating the calculations establishing (7), one has

\displaystyle  \int_{({\bf R}^d)^p} [(x_j-y_j)(x_j-y_j)]_{1 \rightarrow p}\ d\mu^p = p \mathop{\bf E} |x-Y|^2 (\int_{{\bf R}^d}\ d\mu)^{p}


and

\displaystyle  \int_{({\bf R}^d)^p} [x_j-y_j]_{1 \rightarrow p} [x_j-y_j]_{1 \rightarrow p}\ d\mu^p

\displaystyle = (p \mathbf{Var}(x-Y) + p^2 |\mathop{\bf E}(x-Y)|^2) (\int_{{\bf R}^d}\ d\mu)^{p}

where {Y} is the random variable drawn from {{\bf R}^d} with the normalised probability measure {\mu / \int_{{\bf R}^d}\ d\mu}. Since {\mathop{\bf E} |x-Y|^2 = \mathbf{Var}(x-Y) + |\mathop{\bf E}(x-Y)|^2}, one thus has

\displaystyle  \frac{d}{dt} Q_p(t) = (p-1) \frac{1}{4t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \mathbf{Var}(x-Y) (\int_{{\bf R}^d}\ d\mu)^{p}\ dx. \ \ \ \ \ (14)

This expression is clearly non-negative for {p>1}, equal to zero for {p=1}, and non-positive for {0 < p < 1}, giving the claim. (One could simplify {\mathbf{Var}(x-Y)} here as {\mathbf{Var}(Y)} if desired, though it is not strictly necessary to do so for the proof.) \Box
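Proposition 5 is also easy to test numerically (an illustrative addition, not part of the proof). The following Python sketch takes {d=1} and {\mu_0} a sum of two Dirac masses at {0} and {1}, computes {Q_p(t)} by quadrature, and confirms the three monotonicity regimes:

```python
import math

def u(t, x, masses=((1.0, 0.0), (1.0, 1.0))):
    # heat evolution of a sum of Dirac masses (weight, position) in d=1
    return sum(w * math.exp(-(x - y) ** 2 / (4 * t)) for w, y in masses) / math.sqrt(4 * math.pi * t)

def Q(p, t, lo=-30.0, hi=30.0, n=6000):
    # Q_p(t) = t^{(d/2)(p-1)} \int u(t,x)^p dx with d=1, via the midpoint rule
    dx = (hi - lo) / n
    integral = sum(u(t, lo + (i + 0.5) * dx) ** p for i in range(n)) * dx
    return t ** ((p - 1) / 2) * integral

ts = [0.25, 0.5, 1.0, 2.0, 4.0]
q2 = [Q(2.0, t) for t in ts]   # should be non-decreasing
qh = [Q(0.5, t) for t in ts]   # should be non-increasing
q1 = [Q(1.0, t) for t in ts]   # should be constant (the total mass, here 2)

assert all(a <= b + 1e-9 for a, b in zip(q2, q2[1:]))
assert all(a >= b - 1e-9 for a, b in zip(qh, qh[1:]))
assert all(abs(q - 2.0) < 1e-6 for q in q1)
```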

Remark 6 As with Remark 4, one can also establish the identity (14) first for natural numbers {p} by direct computation avoiding the theory of fractional dimensional integrals, and then extrapolate to the case of more general values of {p}. This particular identity is also simple enough that it can be directly established by integration by parts without much difficulty, even for fractional values of {p}.

A more complicated version of this argument establishes the non-endpoint multilinear Kakeya inequality (without any logarithmic loss in a scale parameter {R}); this was established in my previous paper with Jon Bennett and Tony Carbery, but using the “natural number {p} first” approach rather than using the current formalism of fractional dimensional integration. However, the arguments can be translated into this formalism without much difficulty; we do so below the fold. (To simplify the exposition slightly we will not address issues of establishing enough regularity and integrability to justify all the manipulations, though in practice this can be done by standard limiting arguments.)

— 1. Multilinear heat flow monotonicity —

Before we give a multilinear variant of Proposition 5 of relevance to the multilinear Kakeya inequality, we first need to briefly set up the theory of finite products

\displaystyle  \Omega_1^{p_1} \times \dots \times \Omega_k^{p_k}

of fractional powers of spaces {\Omega_1,\dots,\Omega_k}, where {p_1,\dots,p_k} are real or complex numbers. The functions {F^{(p_1,\dots,p_k)}} to integrate here lie in the tensor product space

\displaystyle  L(\Omega_1^{p_1})_{sym} \otimes \dots \otimes L(\Omega_k^{p_k})_{sym}, \ \ \ \ \ (15)

which is generated by tensor powers

\displaystyle  F^{(p_1,\dots,p_k)} = F_1^{(p_1)} \otimes \dots \otimes F_k^{(p_k)}

with {F_j^{(p_j)} \in L(\Omega_j^{p_j})_{sym}}, with the usual tensor product identifications and algebra operations. One can evaluate fractional dimensional integrals of such functions against “virtual product measures” {d\mu_1^{p_1} \dots d\mu_k^{p_k}}, with {\mu_j} a measure on {\Omega_j}, by the natural formula

\displaystyle  \int_{\Omega_1^{p_1} \times \dots \times \Omega_k^{p_k}} F_1^{(p_1)} \otimes \dots \otimes F_k^{(p_k)} d\mu_1^{p_1} \dots d\mu_k^{p_k} := \prod_{j=1}^k ( \int_{\Omega_j^{p_j}} F_j^{(p_j)}\ d\mu_j^{p_j} )

assuming sufficient measurability and integrability hypotheses. We can lift functions {F_j^{(m)} \in L(\Omega_j^m)_{sym}} to an element {[F_j^{(m)}]_{m \rightarrow p_j; j}} of the space (15) by the formula

\displaystyle  [F_j^{(m)}]_{m \rightarrow p_j; j} := 1^{\otimes j-1} \otimes [F_j^{(m)}]_{m \rightarrow p_j} \otimes 1^{\otimes k-j}.

This is easily seen to be an algebra homomorphism.

Example 7 If {F_1: \Omega_1 \rightarrow {\bf R}} and {F_2: \Omega_2 \rightarrow {\bf R}} are functions and {\mu_1, \mu_2} are measures on {\Omega_1, \Omega_2} respectively, then (assuming sufficient measurability and integrability) the multiple fractional dimensional integral

\displaystyle  \int_{\Omega_1^{p_1} \times \Omega_2^{p_2}} [F_1]_{1 \rightarrow p_1; 1} [F_2]_{1 \rightarrow p_2;2}\ d\mu_1^{p_1} d\mu_2^{p_2}

is equal to

\displaystyle  p_1 (\int_{\Omega_1} F_1\ d\mu_1) (\int_{\Omega_1}\ d\mu_1)^{p_1-1} p_2 (\int_{\Omega_2} F_2\ d\mu_2) (\int_{\Omega_2}\ d\mu_2)^{p_2-1}.

In the case that {p_1,p_2} are natural numbers, one can view the “virtual” integrand {[F_1]_{1 \rightarrow p_1; 1} [F_2]_{1 \rightarrow p_2;2}} here as an actual function on {\Omega_1^{p_1} \times \Omega_2^{p_2}}, namely

\displaystyle  (\omega_{1;1},\dots,\omega_{p_1;1}), (\omega_{1;2},\dots,\omega_{p_2;2}) \mapsto \sum_{i_1=1}^{p_1} F_1(\omega_{i_1;1}) \sum_{i_2=1}^{p_2} F_2(\omega_{i_2;2})

in which case the above evaluation of the integral can be achieved classically.
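For natural {p_1,p_2} this classical evaluation can be confirmed by direct enumeration; here is a short Python sketch (an illustrative addition, with arbitrary finite spaces, weights and functions):

```python
from itertools import product

# arbitrary finite test data
mu1 = {0: 0.5, 1: 1.5}
F1 = {0: 2.0, 1: -1.0}
mu2 = {0: 1.0, 1: 0.5, 2: 2.0}
F2 = {0: 1.0, 1: 3.0, 2: 0.5}
p1, p2 = 2, 3

# left side: classical integral of [F1]_{1->p1;1} [F2]_{1->p2;2}
lhs = 0.0
for ws1 in product(mu1, repeat=p1):
    for ws2 in product(mu2, repeat=p2):
        weight = 1.0
        for w in ws1:
            weight *= mu1[w]
        for w in ws2:
            weight *= mu2[w]
        lhs += sum(F1[w] for w in ws1) * sum(F2[w] for w in ws2) * weight

# right side: the closed-form evaluation from Example 7
int_F1 = sum(F1[w] * mu1[w] for w in mu1)
int_F2 = sum(F2[w] * mu2[w] for w in mu2)
mass1 = sum(mu1.values())
mass2 = sum(mu2.values())
rhs = p1 * int_F1 * mass1 ** (p1 - 1) * p2 * int_F2 * mass2 ** (p2 - 1)
assert abs(lhs - rhs) < 1e-9
```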

From a routine application of Lemma 3 and various forms of the product rule, we see that if each {\mu_j(t)} varies with respect to a time parameter {t} by the formula

\displaystyle  \frac{d}{dt} \mu_j(t) = a_j(t) \mu_j(t)

and {F^{(p_1,\dots,p_k)}(t)} is a time-varying function in (15), then (assuming sufficient regularity and integrability), the time derivative

\displaystyle  \frac{d}{dt} \int_{\Omega_1^{p_1} \times \dots \times \Omega_k^{p_k}} F^{(p_1,\dots,p_k)}(t)\ d\mu_1(t)^{p_1} \dots d\mu_k(t)^{p_k}

is equal to

\displaystyle  \int_{\Omega_1^{p_1} \times \dots \times \Omega_k^{p_k}} \frac{d}{dt} F^{(p_1,\dots,p_k)}(t) + F^{(p_1,\dots,p_k)}(t) \sum_{j=1}^k [a_j(t)]_{1 \rightarrow p_j; j}\ d\mu_1(t)^{p_1} \dots d\mu_k(t)^{p_k}. \ \ \ \ \ (16)

Now suppose that for each space {\Omega_j} one has a non-negative measure {\mu_j^0}, a vector-valued function {y_j: \Omega_j \rightarrow {\bf R}^d}, and a matrix-valued function {A_j: \Omega_j \rightarrow {\bf R}^{d \times d}} taking values in real symmetric positive semi-definite {d \times d} matrices. Let {p_1,\dots,p_k} be positive real numbers; we make the abbreviations

\displaystyle  \vec p := (p_1,\dots,p_k)

\displaystyle  \Omega^{\vec p} := \Omega_1^{p_1} \times \dots \times \Omega_k^{p_k}.

For any {t > 0} and {x \in {\bf R}^d}, we define the modified measures

\displaystyle  d\mu_j(t,x) := e^{-\pi \langle A_j(x-y_j), (x-y_j) \rangle/t}\ d\mu_j^0

and then the product fractional power measure

\displaystyle  d\mu(t,x)^{\vec p} := d\mu_1(t,x)^{p_1} \dots d\mu_k(t,x)^{p_k}.

If we then define the heat-type functions

\displaystyle  u_j(t,x) := \int_{{\bf R}^d}\ d\mu_j(t,x) = \int_{{\bf R}^d} e^{-\pi \langle A_j(x-y_j), (x-y_j) \rangle/t}\ d\mu_j^0

(where we drop the normalising power of {t} for simplicity) we see in particular that

\displaystyle  \int_{{\bf R}^d} \prod_{j=1}^k u_j(t,x)^{p_j}\ dx = \int_{{\bf R}^d} \int_{\Omega^{\vec p}}\ d\mu(t,x)^{\vec p}\ dx \ \ \ \ \ (17)

hence we can interpret the multilinear integral in the left-hand side of (17) as a product fractional dimensional integral. (We remark that in my paper with Bennett and Carbery, a slightly different parameterisation is used, replacing {x} with {t x}, and also replacing {t} with {1/t}.)

If the functions {A_j: \Omega_j \rightarrow {\bf R}^{d \times d}} were constant in {\Omega_j}, then the functions {u_j(t,x)} would obey some heat-type partial differential equation, and the situation is now very analogous to Proposition 5 (and is also closely related to Brascamp-Lieb inequalities, as discussed for instance in this paper of Carlen, Lieb, and Loss, or this paper of mine with Bennett, Carbery, and Christ). However, for applications to the multilinear Kakeya inequality, we permit {A_j} to vary slightly in the {\Omega_j} variable, and now the {u_j} do not directly obey any PDE.

A naive extension of Proposition 5 would then seek to establish monotonicity of the quantity (17). While such monotonicity is available in the “Brascamp-Lieb case” of constant {A_j}, as discussed in the above papers, this does not quite seem to be true for variable {A_j}. To fix this problem, a weight is introduced in order to avoid having to take matrix inverses (which are not always available in this algebra). On the product fractional dimensional space {\Omega^{\vec p}}, we have a matrix-valued function {A_*} defined by

\displaystyle  A_* := \sum_{j=1}^k [A_j]_{1 \rightarrow p_j; j}.

The determinant {\mathrm{det}(A_*)} is then a scalar element of the algebra (15). We then define the quantity

\displaystyle  Q_{\vec p}(t) := t^{-d/2} \int_{{\bf R}^d} \int_{\Omega^{\vec p}}\mathrm{det}(A_*)\ d\mu(t,x)^{\vec p}\ dx. \ \ \ \ \ (18)

Example 8 Suppose we take {d=k=2} and let {p_1,p_2} be natural numbers. Then {A_*} can be viewed as the {2 \times 2}-matrix valued function

\displaystyle  A_*(\omega_{1;1},\dots,\omega_{p_1;1},\omega_{1;2},\dots,\omega_{p_2;2}) = \sum_{i=1}^{p_1} A_1(\omega_{i;1}) + \sum_{i=1}^{p_2} A_2(\omega_{i;2}).

By slight abuse of notation, we write the determinant {\mathrm{det}(A)} of a {2 \times 2} matrix as {X \wedge Y}, where {X} and {Y} are the first and second rows of {A}. Then

\displaystyle  \mathrm{det}(A_*) = \sum_{1 \leq i,i' \leq p_1} X_1(\omega_{i;1}) \wedge Y_1(\omega_{i';1})

\displaystyle  + \sum_{i=1}^{p_1} \sum_{i'=1}^{p_2} X_1(\omega_{i;1}) \wedge Y_2(\omega_{i';2}) + X_2(\omega_{i';2}) \wedge Y_1(\omega_{i;1})

\displaystyle  + \sum_{1 \leq i,i' \leq p_2} X_2(\omega_{i;2}) \wedge Y_2(\omega_{i';2})

and after some calculation, one can then write {Q_{\vec p}(t)} as

\displaystyle  p_1 t^{-d/2} \int_{{\bf R}^d} (\int_{\Omega_1} X_1 \wedge Y_1\ d\mu_1(t,x)) u_1(t,x)^{p_1-1} u_2(t,x)^{p_2}\ dx

\displaystyle  + p_1(p_1-1)t^{-d/2} \int_{{\bf R}^d} (\int_{\Omega_1^2} X_1(\omega_1) \wedge Y_1(\omega_2)\ d\mu^2_1(t,x)(\omega_1,\omega_2))

\displaystyle u_1(t,x)^{p_1-2} u_2(t,x)^{p_2}\ dx

\displaystyle  + p_1 p_2 t^{-d/2} \int_{{\bf R}^d} (\int_{\Omega_1} X_1\ d\mu_1(t,x) \wedge \int_{\Omega_2} Y_2\ d\mu_2(t,x)

\displaystyle + \int_{\Omega_2} X_2\ d\mu_2(t,x) \wedge \int_{\Omega_1} Y_1\ d\mu_1(t,x)) u_1(t,x)^{p_1-1} u_2(t,x)^{p_2-1}\ dx

\displaystyle  + p_2 t^{-d/2}\int_{{\bf R}^d} (\int_{\Omega_2} X_2 \wedge Y_2\ d\mu_2(t,x)) u_1(t,x)^{p_1} u_2(t,x)^{p_2-1}\ dx

\displaystyle  + p_2(p_2-1)t^{-d/2} \int_{{\bf R}^d} (\int_{\Omega_2^2} X_2(\omega_1) \wedge Y_2(\omega_2)\ d\mu^2_2(t,x)(\omega_1,\omega_2))

\displaystyle u_1(t,x)^{p_1} u_2(t,x)^{p_2-2}\ dx.

By a polynomial extrapolation argument, this formula is then also valid for fractional values of {p_1,p_2}; this can also be checked directly from the definitions after some tedious computation. Thus we see that while the compact-looking fractional dimensional integral (18) can be expressed in terms of more traditional integrals, the formulae get rather messy, even in the {d=2} case. As such, the fractional dimensional calculus (based heavily on derivative identities such as (16)) gives a more convenient framework to manipulate these otherwise quite complicated expressions.
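For natural {p_1,p_2}, the agreement between the classical integral of {\mathrm{det}(A_*)} and the five-term expansion above can be confirmed by brute force (an illustrative addition; this checks the identity pointwise in {(t,x)}, i.e. without the {x} integration and {t^{-d/2}} factor, with arbitrarily chosen two-point spaces, weights and symmetric matrices):

```python
from itertools import product

wedge = lambda X, Y: X[0] * Y[1] - X[1] * Y[0]  # det of the 2x2 matrix with rows X, Y

# arbitrary two-point spaces, weights, and symmetric 2x2 matrices
mu1 = {0: 0.5, 1: 1.5}
mu2 = {0: 2.0, 1: 0.25}
A1 = {0: [[2.0, 0.1], [0.1, 1.0]], 1: [[1.0, -0.2], [-0.2, 3.0]]}
A2 = {0: [[0.5, 0.0], [0.0, 2.0]], 1: [[1.5, 0.3], [0.3, 0.5]]}
p1 = p2 = 2

# classical integral of det(A_*) over Omega_1^{p_1} x Omega_2^{p_2}
lhs = 0.0
for ws1 in product(mu1, repeat=p1):
    for ws2 in product(mu2, repeat=p2):
        M = [[sum(A1[w][i][j] for w in ws1) + sum(A2[w][i][j] for w in ws2)
              for j in range(2)] for i in range(2)]
        weight = 1.0
        for w in ws1:
            weight *= mu1[w]
        for w in ws2:
            weight *= mu2[w]
        lhs += wedge(M[0], M[1]) * weight

# the five-term expansion; u_j here is just the total mass of mu_j
u1, u2 = sum(mu1.values()), sum(mu2.values())
int1 = lambda f: sum(f(w) * mu1[w] for w in mu1)
int2 = lambda f: sum(f(w) * mu2[w] for w in mu2)
X1 = lambda w: A1[w][0]
Y1 = lambda w: A1[w][1]
X2 = lambda w: A2[w][0]
Y2 = lambda w: A2[w][1]
vint1 = lambda f: [int1(lambda w: f(w)[0]), int1(lambda w: f(w)[1])]
vint2 = lambda f: [int2(lambda w: f(w)[0]), int2(lambda w: f(w)[1])]

rhs = (p1 * int1(lambda w: wedge(X1(w), Y1(w))) * u1 ** (p1 - 1) * u2 ** p2
       + p1 * (p1 - 1) * wedge(vint1(X1), vint1(Y1)) * u1 ** (p1 - 2) * u2 ** p2
       + p1 * p2 * (wedge(vint1(X1), vint2(Y2)) + wedge(vint2(X2), vint1(Y1)))
       * u1 ** (p1 - 1) * u2 ** (p2 - 1)
       + p2 * int2(lambda w: wedge(X2(w), Y2(w))) * u1 ** p1 * u2 ** (p2 - 1)
       + p2 * (p2 - 1) * wedge(vint2(X2), vint2(Y2)) * u1 ** p1 * u2 ** (p2 - 2))

assert abs(lhs - rhs) < 1e-9
```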

Suppose the functions {A_j: \Omega_j \rightarrow {\bf R}^{d \times d}} are close to constant {d \times d} matrices {A_j^0 \in {\bf R}^{d \times d}}, in the sense that

\displaystyle  A_j = A_j^0 + O(\varepsilon) \ \ \ \ \ (19)

uniformly on {\Omega_j} for some small {\varepsilon>0} (where we use for instance the operator norm to measure the size of matrices, and we allow implied constants in the {O()} notation to depend on {d, \vec p}, and the {A_j^0}). Then we can write {A_j = A_j^0 + \varepsilon C_j} for some bounded matrix {C_j}, and then we can write

\displaystyle  A_* = \sum_{j=1}^k [A_j^0]_{1 \rightarrow p_j;j} + \varepsilon [C_j]_{1 \rightarrow p_j;j} = \sum_{j=1}^k p_j A_j^0 + \varepsilon \sum_{j=1}^k [C_j]_{1 \rightarrow p_j;j}.

We can therefore write

\displaystyle \mathrm{det}(A_*) = \mathrm{det}(A_*^0) + \varepsilon C_*

where {A_*^0 := \sum_{j=1}^k p_j A_j^0} and the coefficients of the matrix {C_*} are some polynomial combination of the coefficients of {[C_j]_{1 \rightarrow p_j;j}}, with all coefficients in this polynomial of bounded size. As a consequence, and on expanding out all the fractional dimensional integrals, one obtains a formula of the form

\displaystyle  Q_{\vec p}(t) = t^{-d/2} (\mathrm{det}(A_*^0) + O(\varepsilon)) \int_{\Omega^{\vec p}} \ d\mu(t,x)^{\vec p}

\displaystyle  = t^{-d/2} (\mathrm{det}(A_*^0) + O(\varepsilon)) \int_{{\bf R}^d} \prod_{j=1}^k u_j(t,x)^{p_j}\ dx .

Thus, as long as {A_*^0} is strictly positive definite and {\varepsilon} is small enough, this quantity {Q_{\vec p}(t)} is comparable to the classical integral

\displaystyle t^{-d/2} \int_{{\bf R}^d} \prod_{j=1}^k u_j(t,x)^{p_j}\ dx.

Now we compute the time derivative of {Q_{\vec p}(t)}. We have

\displaystyle  \frac{\partial}{\partial t} \mu_j(t,x) = \frac{\pi}{t^2} \langle A_j(x-y_j),(x-y_j) \rangle \mu_j(t,x)

so by (16), one can write {\frac{d}{dt} Q_{\vec p}(t)} as

\displaystyle  -\frac{d}{2t} Q_{\vec p}(t) + \frac{\pi}{t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \int_{\Omega^{\vec p}} \ \ \ \ \ (20)

\displaystyle  \mathrm{det}(A_*) \sum_{j=1}^k [\langle A_j(x-y_j),(x-y_j) \rangle]_{1 \rightarrow p_j;j}\ d\mu(t,x)^{\vec p}\ dx

where we use {y_j} as the coordinate for the copy of {{\bf R}^d} that is being lifted to {({\bf R}^d)^{p_j}}.

As before, we can take advantage of some cancellation in this expression using integration by parts. Since

\displaystyle  \frac{\partial}{\partial x_i} \mu_j(t,x) = -\frac{2\pi}{t} \langle A_j(x-y_j), e_i\rangle \mu_j(t,x)

where {e_1,\dots,e_d} are the standard basis for {{\bf R}^d}, we see from (16) and integration by parts that

\displaystyle  d\ Q_{\vec p}(t) = \frac{2\pi}{t^{\frac{d}{2}+1}} \int_{{\bf R}^d} \int_{\Omega^{\vec p}}\mathrm{det}(A_*) \sum_{j=1}^k x_i [\langle A_j(x-y_j), e_i\rangle]_{1 \rightarrow p_j;j}\ d\mu(t,x)^{\vec p}\ dx

with the usual summation conventions on the index {i}. Also, similarly to before, we suppose we have an element {F_i} of (15) for each {i} that does not depend on {x}, then by (16) and integration by parts

\displaystyle  \int_{{\bf R}^d} \int_{\Omega^{\vec p}} \sum_{j=1}^k F_i [\langle A_j(x-y_j), e_i\rangle]_{1 \rightarrow p_j;j}\ d\mu(t,x)^{\vec p}\ dx = 0

or, writing {F = (F_1,\dots,F_d)},

\displaystyle  \int_{{\bf R}^d} \int_{\Omega^{\vec p}} \sum_{j=1}^k \langle [A_j(x-y_j)]_{1 \rightarrow p_j;j}, F \rangle\ d\mu(t,x)^{\vec p}\ dx = 0.

We can thus write (20) as

\displaystyle  \frac{d}{dt} Q_{\vec p}(t) = \frac{\pi}{t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \int_{\Omega^{\vec p}} G\ d\mu(t,x)^{\vec p}\ dx \ \ \ \ \ (21)

where {G = G(x)} is the element of (15) given by

\displaystyle  G := \mathrm{det}(A_*) \sum_{j=1}^k [\langle A_j(x-y_j),(x-y_j) \rangle]_{1 \rightarrow p_j;j} \ \ \ \ \ (22)

\displaystyle  - \langle \sum_{j=1}^k [A_j(x-y_j)]_{1 \rightarrow p_j;j}, \mathrm{det}(A_*) x - F \rangle.

The terms in {G} that are quadratic in {x} cancel. The linear term can be rearranged as

\displaystyle  \langle x, A_* F - \mathrm{det}(A_*) \sum_{j=1}^k [A_j y_j]_{1 \rightarrow p_j; j} \rangle.

To cancel this, one would like to set {F} equal to

\displaystyle  F = A_*^{-1}\mathrm{det}(A_*) \sum_{j=1}^k [A_j y_j]_{1 \rightarrow p_j; j} .

Now in the commutative algebra (15), the inverse {A_*^{-1}} does not necessarily exist. However, because of the weight factor {\mathrm{det}(A_*)}, one can work instead with the adjugate matrix {\mathrm{adj}(A_*)}, which is such that {\mathrm{adj}(A_*) A_* = A_* \mathrm{adj}(A_*) = \mathrm{det}(A_*) I} where {I} is the identity matrix. We therefore set {F} equal to the expression

\displaystyle  F := \mathrm{adj}(A_*) \sum_{j=1}^k [A_j y_j]_{1 \rightarrow p_j; j}

and now the expression in (22) does not contain any linear or quadratic terms in {x}. In particular it is completely independent of {x}, and thus we can write

\displaystyle  G = \mathrm{det}(A_*) \sum_{j=1}^k [\langle A_j(\overline{y}-y_j),(\overline{y}-y_j) \rangle]_{1 \rightarrow p_j;j}

\displaystyle  - \langle \sum_{j=1}^k [A_j(\overline{y}-y_j)]_{1 \rightarrow p_j;j}, \mathrm{det}(A_*) \overline{y} - F \rangle

where {\overline{y} = \overline{y}(t,x)} is an arbitrary element of {{\bf R}^d} that we will select later to obtain a useful cancellation. We can rewrite this a little as

\displaystyle  G = \mathrm{det}(A_*) \sum_{j=1}^k [\langle A_j(\overline{y}-y_j),(\overline{y}-y_j) \rangle]_{1 \rightarrow p_j;j}

\displaystyle - \langle \sum_{j=1}^k [A_j(\overline{y}-y_j)]_{1 \rightarrow p_j;j}, \mathrm{adj}(A_*) \sum_{j'=1}^k [A_{j'}(\overline{y} - y_{j'})]_{1 \rightarrow p_{j'}; j'} \rangle.
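The defining property of the adjugate used here, {\mathrm{adj}(A) A = A \mathrm{adj}(A) = \mathrm{det}(A) I}, is easy to confirm concretely; the following small self-contained Python check via cofactor expansion (an illustrative addition, with an arbitrary integer matrix) verifies it exactly:

```python
def minor(M, i, j):
    # delete row i and column j
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    # Laplace expansion along the first row (fine for small matrices)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adjugate(M):
    # transpose of the cofactor matrix
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]  # an arbitrary symmetric integer matrix
d = det(A)
scaled_identity = [[d if i == j else 0 for j in range(3)] for i in range(3)]
assert matmul(adjugate(A), A) == scaled_identity
assert matmul(A, adjugate(A)) == scaled_identity
```

The point, as in the text, is that the adjugate is a polynomial in the entries of {A}, so it makes sense in the commutative algebra (15) even when {A_*^{-1}} does not.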

If we now introduce the matrix functions

\displaystyle  B_j := A_j^{1/2}

and the vector functions

\displaystyle  w_j := B_j( \overline{y} - y_j)

then this can be rewritten as

\displaystyle  G = \mathrm{det}(A_*) \sum_{j=1}^k [\|w_j\|^2]_{1 \rightarrow p_j;j}

\displaystyle - \langle \sum_{j=1}^k [B_j w_j]_{1 \rightarrow p_j;j}, \mathrm{adj}(A_*) \sum_{j'=1}^k [B_{j'} w_{j'}]_{1 \rightarrow p_{j'}; j'} \rangle.

Similarly to (19), suppose that we have

\displaystyle  B_j = B_j^0 + O(\varepsilon)

uniformly on {\Omega_j}, where {B_j^0 := (A_j^0)^{1/2}}, thus we can write

\displaystyle  B_j = B_j^0 + \varepsilon D_j \ \ \ \ \ (23)

for some bounded matrix-valued functions {D_j}. Inserting this into the previous expression (and expanding out {A_*} appropriately) one can eventually write

\displaystyle  G = G^0 + \varepsilon H

where

\displaystyle  G^0 = \mathrm{det}(A^0_*) (\sum_{j=1}^k [\|w_j\|^2]_{1 \rightarrow p_j;j}

\displaystyle  - \langle \sum_{j=1}^k B^0_j [w_j]_{1 \rightarrow p_j;j}, (A^0_*)^{-1} \sum_{j'=1}^k B^0_{j'} [w_{j'}]_{1 \rightarrow p_{j'}; j'} \rangle )

and {H} is some polynomial combination of the {D_j} and {w_j} (or more precisely, of the quantities {[D_j]_{1 \rightarrow p_j;j}}, {[w_j]_{1 \rightarrow p_j;j}}, {[D_j w_j]_{1 \rightarrow p_j;j}}, {[\|w_j\|^2]_{1 \rightarrow p_j;j}}) that is quadratic in the {w_j} variables, with bounded coefficients. As a consequence, after expanding out the product fractional dimensional integrals and applying some Cauchy-Schwarz to control cross-terms, we have

\displaystyle  \frac{d}{dt} Q_{\vec p}(t) = \frac{\pi}{t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \int_{\Omega^{\vec p}} G^0\ d\mu(t,x)^{\vec p}\ dx

\displaystyle  + O( \varepsilon t^{-\frac{d}{2}-2} \int_{{\bf R}^d} \int_{\Omega^{\vec p}} \sum_{j=1}^k [\| w_j \|^2]_{1 \rightarrow p_j;j}d\mu(t,x)^{\vec p}\ dx).

Now we simplify {G^0}. We let

\displaystyle  \overline{w_j} := \frac{\int_{\Omega_j} w_j\ d\mu_j}{\int_{\Omega_j}\ d\mu_j}

be the average value of {w_j}; for each {t,x} this is just a vector in {{\bf R}^d}. We then split {w_j = \overline{w_j} + (w_j - \overline{w_j})}, leading to the identities

\displaystyle  [\|w_j\|^2]_{1 \rightarrow p_j;j} = p_j \|\overline{w_j}\|^2 + 2 \langle \overline{w_j}, [w_j - \overline{w_j}]_{1 \rightarrow p_j;j}\rangle

\displaystyle + [\| w_j - \overline{w_j} \|^2]_{1 \rightarrow p_j;j}


and

\displaystyle  \sum_{j=1}^k B^0_j [w_j]_{1 \rightarrow p_j;j} = \sum_{j=1}^k p_j B^0_j \overline{w_j} + \sum_{j=1}^k B^0_j [(w_j - \overline{w_j})]_{1 \rightarrow p_j;j}.

The term {\sum_{j=1}^k p_j B^0_j \overline{w_j}} is problematic, but we can eliminate it as follows. By construction one has (suppressing the dependence on {t,x})

\displaystyle  \sum_{j=1}^k p_j B^0_j \overline{w_j} \int_{\Omega^p} \ d\mu^{\vec p} = \int_{\Omega^p} \sum_{j=1}^k B^0_j [w_j]_{1 \rightarrow p_j; j} \ d\mu^{\vec p}

\displaystyle  = \int_{\Omega^p}\sum_{j=1}^k B^0_j [B_j(\overline{y} - y_j)]_{1 \rightarrow p_j; j} \ d\mu^{\vec p}

\displaystyle  = \int_{\Omega^p}\sum_{j=1}^k B^0_j [B_j]_{1 \rightarrow p_j;j} \overline{y}\ d\mu^{\vec p} - \int_{\Omega^p}\sum_{j=1}^k B^0_j [B_j y_j]_{1 \rightarrow p_j; j} \ d\mu^{\vec p}.

By construction, one has

\displaystyle  \int_{\Omega^p}\sum_{j=1}^k B^0_j [B_j]_{1 \rightarrow p_j;j}\ d\mu^{\vec p} = (\sum_{j=1}^k p_j B^0_j B^0_j + O(\varepsilon)) \int_{\Omega^p}\ d\mu^{\vec p}

\displaystyle  = (A^0_* + O(\varepsilon)) \int_{\Omega^p}\ d\mu^{\vec p}.

Thus if {A^0_*} is positive definite and {\varepsilon} is small enough, this matrix is invertible, and we can choose {\overline{y}} so that the expression {\sum_{j=1}^k p_j B^0_j \overline{w_j} } vanishes. Making this choice, we then have

\displaystyle  \sum_{j=1}^k [B^0_j w_j]_{1 \rightarrow p_j;j} = \sum_{j=1}^k [B^0_j (w_j - \overline{w_j})]_{1 \rightarrow p_j;j}.

Observe that the fractional dimensional integral of

\displaystyle \langle \overline{w_j}, [w_j - \overline{w_j}]_{1 \rightarrow p_j;j} \rangle


and

\displaystyle  \langle [w_j - \overline{w_j}]_{1 \rightarrow p_j;j}, M [w_{j'} - \overline{w_{j'}}]_{1 \rightarrow p_{j'};j'} \rangle

for {j \neq j'} and arbitrary constant matrices {M} against {d\mu^{\vec p}} vanishes. As a consequence, we can now simplify the integral

\displaystyle  \int_{\Omega^p} G^0\ d\mu^{\vec p} \ \ \ \ \ (24)

as

\displaystyle  \mathrm{det}(A^0_*) \int_{\Omega^p} \sum_{j=1}^k p_j \| \overline{w_j}\|^2

\displaystyle + \langle [w_j - \overline{w_j}]_{1 \rightarrow p_j;j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) [w_j - \overline{w_j}]_{1 \rightarrow p_j; j} \rangle\ d\mu^{\vec p}.

Using (2), we can split

\displaystyle  \langle [w_j - \overline{w_j}]_{1 \rightarrow p_j;j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) [w_j - \overline{w_j}]_{1 \rightarrow p_j; j} \rangle

as the sum of

\displaystyle  [\langle w_j - \overline{w_j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) (w_j - \overline{w_j}) \rangle ]_{1 \rightarrow p_j; j}

and

\displaystyle  2[\langle w_j(\omega_1) - \overline{w_j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) (w_j(\omega_2) - \overline{w_j}) \rangle ]_{2 \rightarrow p_j; j}.

The latter also integrates to zero by the mean zero nature of {w_j - \overline{w_j}}. Thus we have simplified (24) to

\displaystyle  \mathrm{det}(A^0_*) \int_{\Omega^p} \sum_{j=1}^k p_j \| \overline{w_j}\|^2

\displaystyle  + [\langle w_j - \overline{w_j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) (w_j - \overline{w_j}) \rangle ]_{1 \rightarrow p_j; j} \ d\mu^{\vec p}.

Now let us make the key hypothesis that the matrix

\displaystyle  1 - B^0_j (A^0_*)^{-1} B^0_j

is strictly positive definite, or equivalently that

\displaystyle  \sum_{j'=1}^k p_{j'} A_{j'}^0 > A_j^0

for all {j=1,\dots,k}, where the ordering is in the sense of positive definite matrices. Then we have the pointwise bound

\displaystyle \langle w_j - \overline{w_j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) (w_j - \overline{w_j}) \rangle \gtrsim \| w_j - \overline{w_j} \|^2

and thus

\displaystyle  \frac{d}{dt} Q_{\vec p}(t) \gtrsim t^{-\frac{d}{2}+2} \int_{{\bf R}^d} \int_{\Omega^p}

\displaystyle \sum_{j=1}^k [\|\overline{w_j}\|^2 + \| w_j - \overline{w_j} \|^2 - O(\varepsilon) \|w_j\|^2 ]_{1 \rightarrow p_j;j}\ d\mu^{\vec p}\ dx.

For {\varepsilon} small enough, the expression inside the {[]_{1 \rightarrow p_j;j}} is non-negative, and we conclude the monotonicity

\displaystyle  \frac{d}{dt} Q_{\vec p}(t) \geq 0.

We have thus proven the following statement, which is essentially Proposition 4.1 of my paper with Bennett and Carbery:

Proposition 9 Let {d,k \geq 1}, let {A_1^0,\dots,A_k^0} be positive semi-definite real symmetric {d \times d} matrices, and let {p_1,\dots,p_k>0} be such that

\displaystyle  p_1 A_1^0 + \dots + p_k A_k^0 > A_j^0 \ \ \ \ \ (25)

for {j=1,\dots,k}. Then for any positive measure spaces {\Omega_1,\dots,\Omega_k} with measures {\mu_1^0,\dots,\mu_k^0} and any functions {A_j, y_j} on {\Omega_j} with {A_j = A_j^0 + O(\varepsilon)} for a sufficiently small {\varepsilon>0}, the quantity {Q_{\vec p}(t)} is non-decreasing in {t \in (0,+\infty)}, and is also equal to

\displaystyle  t^{-d/2} (\mathrm{det}(A_*^0) + O(\varepsilon)) \int_{{\bf R}^d} \prod_{j=1}^k u_j(t,x)^{p_j}\ dx.

In particular, we have

\displaystyle  t^{-d/2} \int_{{\bf R}^d} \prod_{j=1}^k u_j(t,x)^{p_j}\ dx \lesssim \limsup_{T \rightarrow \infty} T^{-d/2} \int_{{\bf R}^d} \prod_{j=1}^k u_j(T,x)^{p_j}\ dx

for any {t>0}.

A routine calculation shows that for reasonable choices of {\mu_j^0} (e.g. discrete measures of finite support), one has

\displaystyle  \limsup_{T \rightarrow \infty} T^{-d/2} \int_{{\bf R}^d} \prod_{j=1}^k u_j(T,x)^{p_j}\ dx \lesssim \prod_{j=1}^k \mu_j^0({\bf R}^d)^{p_j}

and hence (setting {t=1}) we have

\displaystyle  \int_{{\bf R}^d} \prod_{j=1}^k u_j(1,x)^{p_j}\ dx \lesssim \prod_{j=1}^k \mu_j^0({\bf R}^d)^{p_j}.

If we choose the {\mu_j^0} to be the sum of {N_j} Dirac masses, and each {A_j^0} to be the diagonal matrix {A_j^0 = I - e_j e_j^T}, then the key condition (25) is obeyed for {p_1=\dots=p_k = p > \frac{1}{d-1}}, and one arrives at the multilinear Kakeya inequality

\displaystyle  \int_{{\bf R}^d} \prod_{j=1}^k (\sum_{i=1}^{N_j} 1_{T_{j,i}})^p\ dx \lesssim \prod_{j=1}^k N_j^{p}

whenever {T_{j,i}} are infinite tubes in {{\bf R}^d} of width {1} and oriented within {\varepsilon} of the basis vector {e_j}, for a sufficiently small absolute constant {\varepsilon}. (The hypothesis on the directions can then be relaxed to a transversality hypothesis by applying some linear transformations and the triangle inequality.)
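
As a small numerical sanity check (mine, not part of the argument above), one can verify that the choice {A_j^0 = I - e_j e_j^T} with {k=d} and {p_1=\dots=p_k=p} satisfies the key condition (25) precisely when {p > \frac{1}{d-1}}:

```python
import numpy as np

def kakeya_condition_holds(d, p):
    """Check condition (25) for the multilinear Kakeya choice:
    A_j^0 = I - e_j e_j^T, with k = d and p_1 = ... = p_k = p."""
    I = np.eye(d)
    A = [I - np.outer(I[j], I[j]) for j in range(d)]
    S = p * sum(A)  # = p (d-1) I, since the e_j e_j^T sum to I
    # (25) demands that S - A_j^0 be strictly positive definite for every j
    return all(np.linalg.eigvalsh(S - Aj).min() > 0 for Aj in A)

d = 4  # threshold: p > 1/(d-1) = 1/3
print(kakeya_condition_holds(d, 0.34))  # just above the threshold
print(kakeya_condition_holds(d, 0.33))  # just below the threshold
```

Since {\sum_j e_j e_j^T = I}, the sum {p \sum_j A_j^0} is {p(d-1)} times the identity, so the check reduces to comparing {p(d-1)} with the largest eigenvalue of each {A_j^0}, which is {1}.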

June 28, 2019

John PreskillThe importance of being open

Barcelona refused to stay indoors this May.

Merchandise spilled outside shops onto the streets, restaurateurs parked diners under trees, and ice-cream cones begged to be eaten on park benches. People thronged the streets, markets filled public squares, and the scents of flowers wafted from vendors’ stalls. I couldn’t blame the city. Its sunshine could have drawn Merlin out of his crystal cave. Insofar as a city lives, Barcelona epitomized a quotation by thermodynamicist Ilya Prigogine: “The main character of any living system is openness.”

Prigogine (1917–2003), who won the Nobel Prize for chemistry, had brought me to Barcelona. I was honored to receive, at the Joint European Thermodynamics Conference (JETC) there, the Ilya Prigogine Prize for a thermodynamics PhD thesis. The JETC convenes and awards the prize biennially; the last conference had taken place in Budapest. Barcelona suited the legacy of a thermodynamicist who illuminated open systems.


The conference center. Not bad, eh?

Ilya Prigogine began his life in Russia, grew up partially in Germany, settled in Brussels, and worked at American universities. His biography reveals a mind open to many influences and disciplines: Before entering university, his “interest was more focused on history and archaeology, not to mention music, especially piano.” Yet Prigogine pursued chemistry. 

He helped extend thermodynamics outside equilibrium. Thermodynamics is the study of energy, order, and time’s arrow in terms of large-scale properties, such as temperature, pressure, and volume. Many physicists think that thermodynamics describes only equilibrium. Equilibrium is a state of matter in which (1) large-scale properties remain mostly constant and (2) stuff (matter, energy, electric charge, etc.) doesn’t flow in any particular direction much. Apple pies reach equilibrium upon cooling on a countertop. When I’ve described my research as involving nonequilibrium thermodynamics, some colleagues have asked whether I’ve used an oxymoron. But “nonequilibrium thermodynamics” appears in Prigogine’s Nobel Lecture. 

Prigogine photo

Ilya Prigogine

Another Nobel laureate, Lars Onsager, helped extend thermodynamics a little outside equilibrium. He imagined poking a system gently, as by putting a pie on a lukewarm stovetop or a magnet in a weak magnetic field. (Experts: Onsager studied the linear-response regime.) You can read about his work in my blog post “Long live Yale’s cemetery.” Systems poked slightly out of equilibrium tend to return to equilibrium: Equilibrium is stable. Systems flung far from equilibrium, as Prigogine showed, can behave differently. 

A system can stay far from equilibrium by interacting with other systems. Imagine placing an apple pie atop a blistering stove. Heat will flow from the stove through the pie into the air. The pie will stay out of equilibrium due to interactions with what we call a “hot reservoir” (the stove) and a “cold reservoir” (the air). Systems (like pies) that interact with other systems (like stoves and air), we call “open.”

You and I are open: We inhale air, ingest food and drink, expel waste, and radiate heat. Matter and energy flow through us; we remain far from equilibrium. A bumper sticker in my high-school chemistry classroom encapsulated our status: “Old chemists don’t die. They come to equilibrium.” We remain far from equilibrium—alive—because our environment provides food and absorbs heat. If I’m an apple pie, the yogurt that I ate at breakfast serves as my stovetop, and the living room in which I breakfasted serves as the air above the stove. We live because of our interactions with our environments, because we’re open. Hence Prigogine’s claim, “The main character of any living system is openness.”

Apple pie

The author

JETC 2019 fostered openness. The conference sessions spanned length scales and mass scales, from quantum thermodynamics to biophysics to gravitation. One could arrive as an expert in cell membranes and learn about astrophysics.

I remain grateful for the prize-selection committee’s openness. The topics of earlier winning theses include desalination, colloidal suspensions, and falling liquid films. If you tipped those topics into a tube, swirled them around, and capped the tube with a kaleidoscope glass, you might glimpse my thesis’s topic, quantum steampunk. Also, of the nine foregoing Prigogine Prize winners, only one had earned his PhD in the US. I’m grateful for the JETC’s consideration of something completely different.

When Prigogine said, “openness,” he referred to exchanges of energy and mass. Humans can exhibit openness also to ideas. The JETC honored Prigogine’s legacy in more ways than one. Here’s hoping I live up to their example.


Outside La Sagrada Familia

June 25, 2019

June 20, 2019

Cylindrical OnionFinding the Higgs boson within minutes in the cloud

Guest post by Clemens Lange

Lukas Heinrich presenting the keynote. In the background, you can see the Higgs boson mass peak emerge from the background in simulation and data as the jobs running in the cloud are completing.

June 19, 2019

Tim GowersThe fate of combinatorics at Strathclyde

I have just received an email from Sergey Kitaev, one of the three combinatorialists at Strathclyde. As in many universities, they belong not to the mathematics department but to the computer science department. Kitaev informs me that the administrators of that department, in their infinite wisdom, have decided that the future of the department is best served by axing discrete mathematics. I won’t write a long post about this, but instead refer you to a post by Peter Cameron that says everything I would want to say about the decision, and does so extremely cogently. I recommend that you read it if this kind of decision worries you.

June 16, 2019

Matt StrasslerA Ring of Controversy Around a Black Hole Photo

[Note Added: Thanks to some great comments I’ve received, I’m continuing to add clarifying remarks to this post.  You’ll find them in green.]

It’s been a couple of months since the `photo’ (a false-color image created to show the intensity of radio waves, not visible light) of the black hole at the center of the galaxy M87, taken by the Event Horizon Telescope (EHT) collaboration, was made public. Before it was shown, I wrote an introductory post explaining what the ‘photo’ is and isn’t. There I cautioned readers that I thought it might be difficult to interpret the image, and controversies about it might erupt.

So far, the claim that the image shows the vicinity of M87’s black hole (which I’ll call `M87bh’ for short) has not been challenged, and I’m not expecting it to be. But what and where exactly is the material that is emitting the radio waves and thus creating the glow in the image? And what exactly determines the size of the dark region at the center of the image? These have been problematic issues from the beginning, but discussion is starting to heat up. And it’s important: it has implications for the measurement of the black hole’s mass (which EHT claims is that of 6.5 billion Suns, with an uncertainty of about 15%), and for any attempt to estimate its rotation rate.

Over the last few weeks I’ve spent some time studying the mathematics of spinning black holes, talking to my Harvard colleagues who are among the world’s experts on the relevant math and physics, and learning from colleagues who produced the `photo’ and interpreted it. So I think I can now clearly explain what most journalists and scientist-writers (including me) got wrong at the time of the photo’s publication, and clarify what the photo does and doesn’t tell us.

One note before I begin: this post is long. But it starts with a summary of the situation that you can read quickly, and then comes the long part: a step-by-step non-technical explanation of an important aspect of the black hole ‘photo’ that, to my knowledge, has not yet been given anywhere else.

[I am heavily indebted to Harvard postdocs Alex Lupsasca and Shahar Hadar for assisting me as I studied the formulas and concepts relevant for fast-spinning black holes. Much of what I learned comes from early 1970s papers, especially those by my former colleague Professor Jim Bardeen (see this one written with Press and Teukolsky), and from papers written in the last couple of years, especially this one by my present and former Harvard colleagues.]

What Does the EHT Image Show?

Scientists understand the black hole itself — the geometric dimple in space and time — pretty well. If one knows the mass and the rotation rate of the black hole, and assumes Einstein’s equations for gravity are mostly correct (for which we have considerable evidence, for example from LIGO measurements and elsewhere), then the equations tell us what the black hole does to space and time and how its gravity works.

But for the `photo’, ​that’s not enough information. We don’t get to observe the black hole itself (it’s black, after all!) What the `photo’ shows is a blurry ring of radio waves, emitted from hot material (a plasma of mostly electrons and protons) somewhere around the black hole — material whose location, velocity, and temperature we do not know. That material and its emission of radio waves are influenced by powerful gravitational forces (whose details depend on the rotation rate of the M87bh, which we don’t know yet) and powerful magnetic fields (whose details we hardly know at all.) The black hole’s gravity then causes the paths on which the radio waves travel to bend, even more than a glass lens will bend the path of visible light, so that where things appear in the ‘photo’ is not where they are actually located.

The only insights we have into this extreme environment come from computer simulations and a few other `photos’ at lower magnification. The simulations are based on well-understood equations, but the equations have to be solved approximately, using methods that may or may not be justified. And the simulations don’t tell you where the matter is; they tell you where the material will go, but only after you make a guess as to where it is located at some initial point in time. (In the same sense: computers can predict the national weather tomorrow only when you tell them what the national weather was yesterday.) No one knows for sure how accurate or misleading these simulations might be; they’ve been tested against some indirect measurements, but no one can say for sure what flaws they might have.

However, there is one thing we can certainly say, and it has just been said publicly in a paper by Samuel Gralla, Daniel Holz and Robert Wald.

Two months ago, when the EHT `photo’ appeared, it was widely reported in the popular press and on blogs that the photo shows the image of a photon sphere at the edge of the shadow of the M87bh. (Instead of `shadow’, I suggested the term ‘quasi-silhouette‘, which I viewed as somewhat less misleading to a non-expert.)

Unfortunately, it seems these statements are not true; and this was well-known to (but poorly communicated by, in my opinion) the EHT folks. This lack of clarity might annoy some scientists and science-loving non-experts; but does this issue also matter scientifically? Gralla et al., in their new preprint, suggest that it does (though they were careful not to make a precise claim yet.)

The Photon Sphere Doesn’t Exist

Indeed, if you happened to be reading my posts carefully when the `photo’ first appeared, you probably noticed that I was quite vague about the photon-sphere — I never defined precisely what it was. You would have been right to read this as a warning sign, for indeed I wasn’t getting clear explanations of it from anyone. Studying the equations and conversing with expert colleagues, I soon learned why: for a rotating black hole, the photon sphere doesn’t really exist.

But let’s first define what the photon sphere is for a non-rotating black hole! Like the Earth’s equator, the photon sphere is a location, not an object. This location is the surface of an imaginary ball, lying well outside the black hole’s horizon. On the photon sphere, photons (the particles that make up light, radio waves, and all other electromagnetic waves) travel on special circular or spherical orbits around the black hole.
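
To put rough numbers on this (my own back-of-the-envelope estimate, not EHT's analysis): for a non-rotating black hole the photon sphere sits at radius 3GM/c², and light rays that skim it reach a distant camera as though coming from a radius of √27 GM/c². Using EHT's quoted mass and an assumed distance to M87 of about 16.8 Mpc, the corresponding ring on the sky is a few tens of microarcseconds across:

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8            # speed of light, m/s
M_sun = 1.989e30       # solar mass, kg

M = 6.5e9 * M_sun      # EHT's quoted mass for the M87 black hole
D = 16.8 * 3.086e22    # assumed distance to M87 (~16.8 Mpc), in meters

r_g = G * M / c**2     # gravitational radius GM/c^2
r_photon = 3 * r_g     # photon-sphere radius (non-rotating case)
b_crit = math.sqrt(27) * r_g  # apparent radius of the photon ring

# angular diameter on the sky, converted from radians to microarcseconds
theta_uas = (2 * b_crit / D) * (180 / math.pi) * 3600 * 1e6
print(f"photon ring angular diameter: about {theta_uas:.0f} microarcseconds")
```

That lands at the same general scale as the ring in the EHT image (roughly forty microarcseconds), which is part of why the photon-ring interpretation seemed so natural at first.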

By contrast, a rotating black hole has a larger, broader `photon-zone’ where photons can have special orbits. But you won’t ever see the whole photon zone in any image of a rotating black hole. Instead, a piece of the photon zone will appear as a `photon ring‘, a bright and very thin loop of radio waves. However, the photon ring is not the edge of anything spherical, is generally not perfectly circular, and generally is not even perfectly centered on the black hole.

… and the Photon Ring Isn’t What We See…

It seems likely that the M87bh is rotating quite rapidly, so it has a photon-zone rather than a photon-sphere, and images of it will have a photon ring. Ok, fine; but then, can we interpret EHT’s `photo’ simply as showing the photon ring, blurred by the imperfections in the `telescope’? Although some of the EHT folks have seemed to suggest the answer is “yes”, Gralla et al. suggest the answer is likely “no” (and many of their colleagues have been pointing out the same thing in private.) The circlet of radio waves that appears in the EHT `photo’ is probably not simply a blurred image of M87bh’s photon ring; it probably shows a combination of the photon ring with something brighter (as explained below). That’s where the controversy starts.

…so the Dark Patch May Not Be the Full Shadow…

The term `shadow’ is confusing (which is why I prefer `quasi-silhouette’ in describing it in public contexts, though that’s my own personal term) but no matter what you call it, in its ideal form it is supposed to be an absolutely dark area whose edge is the photon ring. But in reality the perfectly dark area need not appear so dark after all; it may be partly filled in by various effects. Furthermore, since the `photo’ may not show us the photon ring, it’s far from clear that the dark patch in the center is the full shadow anyway. The EHT folks are well aware of this, but at the time the photo came out, many science writers and scientist-writers (including me) were not.

…so EHT’s Measurement of the M87bh’s Mass is Being Questioned

It was wonderful that EHT could make a picture that could travel round the internet at the speed of light, and generate justifiable excitement and awe that human beings could indirectly observe such an amazing thing as a black hole with a mass of several billion Sun-like stars. Qualitatively, they achieved something fantastic in showing that yes, the object at the center of M87 really is as compact and dark as such a black hole would be expected to be! But the EHT telescope’s main quantitative achievement was a measurement of the mass of the M87bh, with a claimed precision of about 15%.

Naively, one could imagine that the mass is measured by looking at the diameter of the dark spot in the black hole ‘photo’, under the assumption that it is the black hole’s shadow. So here’s the issue: Could interpreting the dark region incorrectly perhaps lead to a significant mistake in the mass measurement, and/or an underestimate of how uncertain the mass measurement actually is?

I don’t know.  The EHT folks are certainly aware of these issues; their simulations show them explicitly.  The mass of the M87bh isn’t literally measured by putting a ruler on the ‘photo’ and measuring the size of the dark spot! The actual methods are much more sophisticated than that, and I don’t understand them well enough yet to explain, evaluate or criticize them. All I can say with confidence right now is that these are important questions that experts currently are debating, and consensus on the answer may not be achieved for quite a while.


The Appearance of a Black Hole With Nearby Matter

Ok, now I’m going to explain the most relevant points, step-by-step. Grab a cup of coffee or tea, find a comfy chair, and bear with me.

Because fast-rotating black holes are more complicated, I’m going to start illuminating the controversy by looking at a non-rotating black hole’s properties, which is also what Gralla et al. mainly do in their paper. It turns out the qualitative conclusion drawn from the non-rotating case largely applies in the rotating case too, at least in the case of the M87bh as seen from our perspective; that’s important because the M87bh may well be rotating at a very good clip.

A little terminology first: for a rotating black hole there’s a natural definition of the poles and the equator, just as there is for the Earth: there’s an axis of rotation, and the poles are where that axis intersects with the black hole horizon. The equator is the circle that lies halfway between the poles. For a non-rotating black hole, there’s no such axis and no such automatic definition, but it will be useful to define the north pole of the black hole to be the point on the horizon closest to us.

A Single Source of Electromagnetic Waves

Let’s imagine placing a bright light bulb on the same plane as the equator, outside the black hole horizon but rather close to it. (The bulb could emit radio waves or visible light or any other form of electromagnetic waves, at any frequency; for what I’m about to say, it doesn’t matter at all, so I’ll just call it `light’.) See Figure 1. Where will the light from the bulb go?

Some of it, heading inward, ends up in the black hole, while some of it heads outward toward distant observers. The gravity of the black hole will bend the path of the light. And here’s something remarkable: a small fraction of the light, aimed just so, can actually spiral around the black hole any number of times before heading out. As a result, you will see the bulb not once but multiple times!

There will be a direct image — light that comes directly to us — from near the bulb’s true location (displaced because gravity bends the light a bit, just as a glass lens will distort the appearance of what’s behind it.) The path of that light is the orange arrow in Figure 1. But then there will be an indirect image (the green arrow in Figure 1) from light that goes halfway around the black hole before heading in our direction; we will see that image of the bulb on the opposite side of the black hole. Let’s call that the `first indirect image.’ Then there will be a second indirect image from light that orbits the black hole once and comes out near the direct image, but further out; that’s the blue arrow in Figure 1. Then there will be a third indirect image from light that goes around one and a half times (not shown), and so on. In short, Figure 1 shows the paths of the direct, first indirect, and second indirect images of the bulb as they head toward our location at the top of the image.


Figure 1: A light bulb (yellow) outside but near the non-rotating black hole’s horizon (in black) can be seen by someone at the top of the image not only through the light that goes directly upward (orange line) — a “direct image” — but also through light that makes partial or complete orbits of the black hole — “indirect images.” The first indirect and second indirect images are from light taking the green and blue paths. For light to make orbits of the black hole, it must travel near the grey-dashed circle that indicates the location of a “photon-sphere.” (A rotating black hole has no such sphere, but when seen from the north or south pole, the light observed takes similar paths to what is shown in this figure.) [The paths of the light rays were calculated carefully using Mathematica 11.3.]

What you can see in Figure 1 is that both the first and second indirect images are formed by light that spends part of its time close to a special radius around the black hole, shown as a dotted line. This imaginary surface, the edge of a ball, is an honest “photon-sphere” in the case of a non-rotating black hole.

In the case of a rotating black hole, something very similar happens when you’re looking at the black hole from its north (or south) pole; there’s a special circle then too. But that circle is not the edge of a photon-sphere! In general, photons can have special orbits in a wide region, which I called the “photon-zone” earlier, and only a small set of them are on this circle. You’ll see photons from other parts of the photon zone if you look at the black hole not from the poles but from some other angle.

[If you’d like to learn a bit more about the photon zone, and you have a little bit of knowledge of black holes already, you can profit from exploring this demo by Professor Leo Stein.]

Back to the non-rotating case: What our camera will see, looking at what is emitted from the light bulb, is shown in Figure 2: an infinite number of increasingly squished `indirect’ images, half on one side of the black hole near the direct image, and the other half on the other side. What is not obvious, but true, is that only the first of the indirect images is large and bright; this is one of Gralla et al.‘s main points. We can, therefore, separate the images into the direct image, the first indirect image, and the remaining indirect images. The total amount of light coming from the direct image and the first indirect image can be large, but the total amount of light from the remaining indirect images is typically (according to Gralla et al.) less than 5% of the light from the first indirect image. And so, unless we have an extremely high-powered camera, we’ll never pick those other images up. Let’s therefore focus our attention on the direct image and the first indirect image.


Figure 2: What the drawing in Figure 1 actually looks like to the observer peering toward the black hole; all the indirect images lie at almost exactly the same distance from the black hole’s center.
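
A rough way to see why the remaining images are so faint (a standard estimate for the non-rotating case; the numbers here are mine, not Gralla et al.'s): each extra half-orbit around the black hole demagnifies the image by roughly a factor of e^(-π), so the second, third, and later indirect images together carry only a few percent of the first's light:

```python
import math

# Successive indirect images of a source are demagnified by roughly
# e^{-pi} per extra half-orbit (non-rotating case), so each one is
# only about 4.3% as bright as the previous one.
demag = math.exp(-math.pi)
print(f"suppression per extra half-orbit: {demag:.4f}")

# Summing the geometric series: all images beyond the first indirect
# one, taken together, relative to the first indirect image alone.
tail = demag / (1 - demag)
print(f"everything beyond the first indirect image: {tail:.3f}")
```

The tail comes out just under 5%, matching the figure quoted above for the light in the remaining indirect images.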

WARNING (since this seems to be a common confusion):


A Circular Source of Electromagnetic Waves

Proceeding step by step toward a more realistic situation, let’s replace our ordinary bulb by a circular bulb (Figure 3), again set somewhat close to the horizon, sitting in the plane that contains the equator. What would we see now?


Figure 3: if we replace the light bulb with a circle of light, the paths of the light are the same as in Figure 1, except now for each point along the circle. That means each direct and indirect image itself forms a circle, as shown in the next figure.

That’s shown in Figure 4: the direct image is a circle (looking somewhat larger than it really is); outside it sits the first indirect image of the ring; and then come all the other indirect images, looking quite dim and all piling up at one radius. We’re going to call all those piled-up images the “photon ring”.


Figure 4: The circular bulb’s direct image is the bright circle, but a somewhat dimmer first indirect image appears further out, and just beyond one finds all the other indirect images, forming a thin `photon ring’.

Importantly, if we consider circular bulbs of different diameter [yellow, red and blue in Figure 5], then although the direct images reflect the differences in the bulbs’ diameters (somewhat enlarged by lensing), the first indirect images all are about the same diameter, just a tad larger or smaller than the photon ring.  The remaining indirect images all sit together at the radius of the photon ring.


Figure 5: Three bulbs of different diameter (yellow, blue, red) create three distinct direct images, but their first indirect images are located much closer together, and very close to the photon ring where all their remaining indirect images pile up.

These statements are also essentially true for a rotating black hole seen from the north or south pole; a circular bulb generates a series of circular images, and the indirect images all pile more or less on top of each other, forming a photon ring. When viewed off the poles, the rotating black hole becomes a more complicated story, but as long as the viewing angle is small enough, the changes are relatively minor and the picture is qualitatively somewhat similar.

A Disk as a Source of Electromagnetic Waves

And what if you replaced the circular bulb with a disk-shaped bulb, a sort of glowing pancake with a circular hole at its center, as in Figure 6? That’s relevant because black holes are thought to have `accretion disks’ made of material orbiting the black hole, and eventually spiraling in. The accretion disk may well be the dominant source emitting radio waves at the M87bh. (I’m showing a very thin uniform disk for illustration, but a real accretion disk is not uniform, changes rapidly as clumps of material move within it and then spiral into the black hole, and may be quite thick — as thick as the black hole is wide, or even thicker.)

Well, we can think of the disk as many concentric circles of light placed together. The direct images of the disk (shown in Figure 6 left, on one side of the disk, as an orange wash) would form a disk in your camera, the dim red region in Figure 6 right; the hole at its center would appear larger than it really is due to the bending caused by the black hole’s gravity, but the shape would be similar. However, the indirect images would all pile up in almost the same place from your perspective, forming a bright and quite thin ring, the bright yellow circle in Figure 6 right. (The path of the disk’s first indirect image is shown in Figure 6 left, going halfway about the black hole as a green wash; notice how it narrows as it travels, which is why it appears as a narrow ring in the image at right.) This circle — the full set of indirect images of the whole disk — is the edge of the photon-sphere for a non-rotating black hole, and the circular photon ring for a rotating black hole viewed from its north or south pole.


Figure 6: A glowing disk of material (note it does not touch the black hole) looks like a version of Figure 5 with many more circular bulbs. The direct image of the disk forms a disk (illustrated at left, for a piece of the disk, as an orange wash) while the first indirect image becomes highly compressed (illustrated, for a piece of the disk, as a green wash) and is seen as a narrow circle of bright light.  (It is expected that the disk is mostly transparent in radio waves, so the indirect image can pass through it.) That circle, along with the other indirect images, forms the photon ring. In this case, because the disk’s inner edge lies close to the black hole horizon, the photon ring sits within the disk’s direct image, but we’ll see a different example in Figure 9.

[Gralla et al. call the first indirect image the `lensed ring’ and the remaining indirect images, currently unobservable at EHT, the `photon ring’, while EHT refers to all the indirect images as the `photon ring’. Just letting you know in case you hear `lensed ring’ referred to in future.]

So the conclusion is that if we had a perfect camera, the direct image of a disk makes a disk, but the indirect images (mainly just the first one, as Gralla et al. emphasize) make a bright, thin ring that may be superposed upon the direct image of the disk, depending on the disk’s shape.

And this conclusion, with some important adjustments, applies also for a spinning black hole viewed from above its north or south pole — i.e., along its axis of rotation — or from near that axis; I’ll mention the adjustments in a moment.

But EHT is not a perfect camera. To make the black hole image, technology had to be pushed to its absolute limits. Someday we’ll see both the disk and the ring, but right now, they’re all blurred together. So which one is more important?

From a Blurry Image to Blurry Knowledge

What does a blurry camera do to this simple image? You might think that the disk is so dim and the ring so bright that the camera will mainly show you a blurry image of the bright photon ring. But that’s wrong. The ring isn’t bright enough. A simple calculation reveals that the photo will show mainly the disk, not the photon ring! This is shown in Figure 7, which you can compare with the Black Hole `photo’ (Figure 8). (Figure 7 is symmetric around the ring, but the photo is not, for multiple reasons — Doppler-like effect from rotation, viewpoint off the rotation axis, etc. — which I’ll have to defer until another post.)

More precisely, the ring and disk blur together, but the brightness of the image is dominated by the disk, not the ring.
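
Here is a toy version of that simple calculation (all numbers invented purely for illustration: a broad disk of unit brightness, plus a ring that is locally much brighter but, being thin, carries less total light):

```python
import numpy as np

# Toy 1-D radial brightness profile: a broad, dim disk plus a very
# thin, locally bright "photon ring" carrying less total light.
r = np.linspace(0, 10, 2001)
dr = r[1] - r[0]
disk = np.where((r > 2) & (r < 6), 1.0, 0.0)        # broad direct image
ring = np.where(abs(r - 5.2) < 0.025, 20.0, 0.0)    # thin bright ring

def blur(profile, sigma=1.0):
    """Convolve with a normalized Gaussian kernel (the blurry camera)."""
    x = np.arange(-4 * sigma, 4 * sigma + dr, dr)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(profile, kernel, mode="same")

b_disk, b_ring = blur(disk), blur(ring)
total = b_disk + b_ring
peak = np.argmax(total)
print(f"disk share of brightness at the blurred peak: "
      f"{b_disk[peak] / total[peak]:.2f}")
```

After blurring, well over half the brightness at the peak of the combined profile comes from the disk, even though the unblurred ring was locally twenty times brighter: the ring's light gets spread over a region far wider than the ring itself.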


Figure 7: At left is repeated the image in Figure 6, as seen in a perfect camera, while at right the same image is shown when observed using a camera with imperfect vision. The disk and ring blur together into a single thick ring, whose brightness is dominated by the disk. Note that the shadow — the region surrounded by the yellow photon ring — is not the same as the dark patch in the right-hand image; the dark patch is considerably smaller than the shadow.

Let’s say that again: the black hole `photo’ may mainly show the M87bh’s accretion disk, with the photon ring contributing only some of the light, and therefore the photon ring does not completely and unambiguously determine the radius of the observed dark patch in the `photo​.’ In general, the patch could be considerably smaller than what is usually termed the `shadow’ of the black hole.


Figure 8: (Left) We probably observe the M87bh at a small angle off its south pole. Its accretion disk has an unknown size and shape (it may be quite thick and non-uniform), and it may not even lie in the black hole’s equatorial plane. The disk and the black hole interact to create outward-going jets of material (observed many years ago, but not clearly visible in the EHT `photo’). (Right) The EHT `photo’ of the M87bh (taken in radio waves and shown in false color!). Compare with Figure 7; the most important difference is that one side of the image is brighter than the other. This likely arises from (a) our view being slightly off from the south pole, combined with (b) rotation of the black hole and its disk, and (c) possibly other more subtle issues.

This is important. The photon ring’s diameter, and thus the width of the `shadow’ too, barely depend on the rotation rate of the black hole; they depend almost exclusively on the black hole’s mass. So if the ring in the photo were simply the photon ring of the M87bh, you’d have a very simple way to measure the black hole’s mass without knowing its rotation rate: you’d look at how large the dark patch is, or equivalently, the diameter of the blurry ring, and that would give you the answer to within 10%. But it’s nowhere near so simple if the blurry ring shows the accretion disk, because the accretion disk’s properties and appearance can vary much more than the photon ring; they can depend strongly on the black hole’s rotation rate, and also on magnetic fields and other details of the black hole’s vicinity.
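For a sense of scale, here is the standard non-rotating (Schwarzschild) estimate of the photon ring’s apparent size, evaluated with rough published values for M87’s mass and distance. This is a back-of-the-envelope sketch, not EHT’s actual analysis: photons that skim the photon sphere reach us with impact parameter √27 GM/c², so the ring’s angular diameter is roughly 2√27 GM/(c²D).

```python
# Back-of-the-envelope: angular diameter of the photon ring for a
# non-rotating (Schwarzschild) black hole with M87-like parameters.
# Photons that graze the photon sphere arrive with impact parameter
# b = sqrt(27) * GM/c^2, so the ring's angular diameter is
# theta = 2 * sqrt(27) * GM / (c^2 * D).

import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8            # speed of light, m/s
M_sun = 1.989e30       # solar mass, kg

M = 6.5e9 * M_sun      # approximate mass of the M87 black hole
D = 16.8 * 3.086e22    # approximate distance to M87, m (16.8 Mpc)

r_g = G * M / c**2                     # gravitational radius GM/c^2
theta_rad = 2 * math.sqrt(27) * r_g / D
theta_uas = theta_rad * 206265 * 1e6   # radians -> microarcseconds

print(f"gravitational radius: {r_g:.2e} m")
print(f"photon-ring angular diameter: {theta_uas:.0f} microarcseconds")
```

The answer comes out near 40 microarcseconds, in the same ballpark as the roughly 42 microarcsecond ring diameter EHT reported, which is why a clean photon-ring measurement would pin down the mass so directly.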

The Important Role of Rotation

If we conclude that EHT is seeing a mix of the accretion disk with the photon ring, with the former dominating the brightness, then this makes EHT’s measurement of the M87bh’s mass more confusing and even potentially suspect. Hence: controversy. Is it possible that EHT underestimated their uncertainties, and that their measurement of the black hole’s mass is subject to more ambiguity, and is less precise, than they currently claim?

Here’s where the rotation rate is important. Despite what I showed (for pedagogical simplicity) in Figure 7, for a non-rotating black hole the accretion disk’s central gap is actually expected to lie outside the photon ring; this is shown at the top of Figure 9.  But  the faster the black hole rotates, the smaller this central gap is expected to be, to the point that for a fast-rotating black hole the gap will lie inside the photon ring, as shown at the bottom of Figure 9. (This tendency is not obvious; it requires understanding details of the black hole geometry.) And if that is true, the dark patch in the EHT image may not be the black hole’s full shadow (i.e. quasi-silhouette), which is the region inside the photon ring. It may be just the inner portion of it, with the outer portion obscured by emission from the accretion disk.

The effect of blurring in the two cases of slow (or zero) and fast rotation is illustrated in Figure 9, where the photon ring’s size is taken to be the same in each case but the disk’s inner edge is close in or far out. (The black holes, not illustrated since they aren’t visible anyway, differ in mass by about 10% in order to have the photon ring the same size.) This shows why the size of the dark patch can be quite different, depending on the disk’s shape, even when the photon ring’s size is the same.


Figure 9: Comparing the appearance of slightly more realistically-shaped disks around slowly rotating or non-rotating black holes (top) to those around fast-rotating black holes (bottom) of the same mass, as seen from the north or south pole. (Left) The view in a perfect camera; (right) a rough illustration of the effect of blurring in the current version of the EHT. The faster the black hole is spinning, the smaller the central gap in the accretion disk is likely to be. No matter what the extent of the accretion disk (dark red), the photon ring (yellow) remains at roughly the same location, changing only by 10% between a non-rotating black hole and a maximally rotating black hole of the same mass. But blurring in the camera combines the disk and photon ring into a thick ring whose brightness is dominated by the disk rather than the ring, and which can therefore be of different size even though the mass is the same. This implies that the radius of the blurry ring in the EHT `photo’, and the size of the dark region inside it, cannot by themselves tell us the black hole’s mass; at a minimum we must also know the rotation rate (which we do not).

Gralla et al. subtly raise these questions but are careful not to overstate their case, perhaps because they have not yet completed their study of rotating black holes. But the question is now in the air.

I’m interested to hear what the EHT folks have to say about it, as I’m sure they have detailed arguments in favor of their procedures. In particular, EHT’s simulations show all of the effects mentioned above; none of this is unknown to them. (In fact, the reason I know my illustrations above are reasonable is partly because you can see similar pictures in the EHT papers.) As long as the EHT folks correctly accounted for all the issues, they should have been able to properly measure the mass and estimate their uncertainties correctly. In fact, they don’t really use the photo itself; they apply more subtle techniques to their telescope data directly. Thus it’s not enough to argue that the photo itself is ambiguous; one has to argue that EHT’s more subtle analysis methods are flawed. No one has argued that yet, as far as I am aware.

But the one thing that’s clear right now is that science writers almost uniformly got it wrong [because the experts didn’t explain these points well] when they tried to describe the image two months ago. The `photo’ probably does not show “a photon ring surrounding a shadow.” That would be nice and simple and impressive-sounding, since it refers to fundamental properties of the black hole’s warping effects on space. But it’s far too glib, as Figures 7 and 9 show. We’re probably seeing an accretion disk supplemented by a photon ring, all blurred together, and the dark region may well be smaller than the black hole’s shadow.

(Rather than, or in addition to, the accretion disk, it is also possible that the dominant emission in the photo comes from the inner portion of one of the jets that emerges from the vicinity of the black hole; see Figure 8 above. This is another detail that makes the situation more difficult to interpret, but doesn’t change the main point I’m making.)

Someday in the not distant future, improved imaging should allow EHT to separately image the photon ring and the disk, so both can be observed easily, as in the left side of Figure 9. Then all these questions will be answered definitively.

Why the Gargantua Black Hole from Interstellar is Completely Different

Just as a quick aside, what would you see if an accretion disk were edge-on rather than face-on? Then, in a perfect camera, you’d see something like the famous picture of Gargantua, the black hole from the movie Interstellar: a direct image of the front edge of the disk, and a strongly lensed indirect image of the back side of the disk, appearing both above and below the black hole, as illustrated in Figure 10. And that leads to the Gargantua image from the movie, also shown in Figure 10. Notice the photon ring (which is, as I cautioned you earlier, off-center!)   [Note added: this figure has been modified; in the original version I referred to the top and bottom views of the disk’s far side as the “1st indirect image”, but as pointed out by Professor Jean-Pierre Luminet, that’s not correct terminology here.]


Figure 10: The movie Interstellar features a visit to an imaginary black hole called Gargantua, and the simulated images in the movie (from 2014) are taken from near the equator, not the pole. As a result, the direct image of the disk cuts across the black hole, and indirect images of the back side of the disk are seen above and below the black hole. There is also a bright photon ring, slightly off center; this is well outside the surface of the black hole, which is not visible. A real image would not be symmetric left-to-right; it would be brighter on the side that is rotating toward the viewer. At the bottom is shown a much more realistic visual image (albeit of lower quality) from 1994 by Jean-Alain Marck, in which this asymmetry can be seen clearly.

However, the movie image leaves out an important Doppler-like effect (which I’ll explain someday when I understand it 100%). This makes the part of the disk that is rotating toward us bright, and the part rotating away from us dim… and so a real image from this vantage point would be very asymmetric, bright on the left and dim on the right, unlike the movie image. At the suggestion of Professor Jean-Pierre Luminet I have added, at the bottom of Figure 10, a very early simulation by Jean-Alain Marck that shows this effect.

I mention this because a number of expert science journalists incorrectly explained the M87 image by referring to Gargantua — but that image has essentially nothing to do with the recent black hole `photo’. M87’s accretion disk is certainly not edge-on. The movie’s Gargantua image is taken from the equator, not from near the pole.

Final Remarks: Where a Rotating Black Hole Differs from a Non-Rotating One

Before I quit for the week, I’ll just summarize a few big differences for fast-rotating black holes compared to non-rotating ones.

1) As I’ve just emphasized, what a rotating black hole looks like to a distant observer depends not only on where the matter around the black hole is located but also on how the black hole’s rotation axis is oriented relative to the observer. A pole observer, an equatorial observer, and a near-pole observer see quite different things. (As noted in Figure 8, we are apparently near-south-pole observers for M87’s black hole.)

Let’s assume that the accretion disk lies in the same plane as the black hole’s equator — there are some reasons to expect this. Even then, the story is complex.

2) As I mentioned above, instead of a photon-sphere, there is a ‘photon-zone’ — a region where specially aimed photons can travel round the black hole multiple times. For high-enough spin (greater than about 80% of maximum as I recall), an accretion disk’s inner edge can lie within the photon zone, or even closer to the black hole than the photon zone; and this can cause a filling-in of the ‘shadow’.
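For readers who like formulas: the equatorial circular photon orbits of a rotating (Kerr) black hole bracket this photon zone, and their radii follow from the classic Bardeen (1972) expression. Here is a minimal sketch in units where G = c = M = 1, so that a non-rotating black hole’s photon sphere sits at r = 3:

```python
# Equatorial circular photon-orbit radii around a Kerr black hole with
# spin parameter a (Bardeen 1972), in units G = c = M = 1. For a = 0
# both orbits sit at r = 3 (the photon sphere); as a -> 1 the prograde
# orbit moves inward toward r = 1 and the retrograde orbit outward to
# r = 4. That spread is one way to see how rotation turns the photon
# sphere into a photon "zone".

import math

def photon_orbit_radius(a, prograde=True):
    """Radius of the equatorial circular photon orbit for spin 0 <= a <= 1."""
    sign = -1.0 if prograde else 1.0
    return 2.0 * (1.0 + math.cos((2.0 / 3.0) * math.acos(sign * a)))

for a in (0.0, 0.5, 0.9, 0.999):
    r_pro = photon_orbit_radius(a, prograde=True)
    r_ret = photon_orbit_radius(a, prograde=False)
    print(f"a = {a:5.3f}:  prograde r = {r_pro:.3f},  retrograde r = {r_ret:.3f}")
```

These equatorial orbits are only the boundary cases; off the equatorial plane there is a continuous family of spherical photon orbits at intermediate radii, which is why the zone is a three-dimensional region rather than a single sphere.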

3) Depending on the viewing angle, the indirect images of the disk that form the photon ring may not be circular, and may not be concentric with the direct image of the disk. Only when viewed from along the rotation axis (i.e., above the north or south pole) will the direct and indirect images of the disk all be circular and concentric. We’re not viewing the M87bh on its axis, and that further complicates interpretation of the blurry image.

4) When the viewing angle is not along the rotation axis, the image will be asymmetric, brighter on one side than the other. (This is true of EHT’s `photo’.) However, I know of at least four potential causes of this asymmetry, any or all of which might play a role, and the degree of asymmetry depends on properties of the accretion disk and the rotation rate of the black hole, both of which are currently unknown. Claims about the asymmetry made by the EHT folks seem, at least to me, to be based on certain assumptions that I cannot currently check.

Each of these complexities is a challenge to explain, so I’ll give both you and me a substantial break while I figure out how best to convey what is known (at least to me) about these issues.