Planet Musings

February 23, 2018

Terence Tao: Long gaps in sieved sets

Kevin Ford, Sergei Konyagin, James Maynard, Carl Pomerance, and I have uploaded to the arXiv our paper “Long gaps in sieved sets”, submitted to J. Europ. Math. Soc.

This paper originated from the MSRI program in analytic number theory last year, and was centred around variants of the question of finding large gaps between primes. As discussed for instance in this previous post, it is now known that within the set of primes {{\mathcal P} = \{2,3,5,\dots\}}, one can find infinitely many adjacent elements {a,b} whose gap {b-a} obeys a lower bound of the form

\displaystyle b-a \gg \log a \frac{\log_2 a \log_4 a}{\log_3 a}

where {\log_k} denotes the {k}-fold iterated logarithm. This compares with the trivial bound of {b-a \gg \log a} that one can obtain from the prime number theorem and the pigeonhole principle. Several years ago, Pomerance posed the question of whether analogous improvements to the trivial bound can be obtained for such sets as

\displaystyle {\mathcal P}_2 = \{ n \in {\bf N}: n^2+1 \hbox{ prime} \}.

Here there is the obvious initial issue that this set is not even known to be infinite (this is the fourth Landau problem), but let us assume for the sake of discussion that this set is indeed infinite, so that we have an infinite number of gaps to speak of. Standard sieve theory techniques give upper bounds for the density of {{\mathcal P}_2} that are comparable (up to an absolute constant) to the prime number theorem bounds for {{\mathcal P}}, so again we can obtain a trivial bound of {b-a \gg \log a} for the gaps of {{\mathcal P}_2}. In this paper we improve this to

\displaystyle b-a \gg \log a \log^c_2 a

for an absolute constant {c>0}; this is not as strong as the corresponding bound for {{\mathcal P}}, but still improves over the trivial bound. In fact we can handle more general “sifted sets” than just {{\mathcal P}_2}. Recall from the sieve of Eratosthenes that the elements of {{\mathcal P}} in, say, the interval {[x/2, x]} can be obtained by removing from {[x/2, x]} one residue class modulo {p} for each prime up to {\sqrt{x}}, namely the class {0} mod {p}. In a similar vein, the elements of {{\mathcal P}_2} in {[x/2,x]} can be obtained by removing for each prime {p} up to {x} zero, one, or two residue classes modulo {p}, depending on whether {-1} is a quadratic residue modulo {p}. On the average, one residue class will be removed (this is a very basic case of the Chebotarev density theorem), so this sieving system is “one-dimensional on the average”. Roughly speaking, our arguments apply to any other set of numbers arising from a sieving system that is one-dimensional on average. (One can consider other dimensions also, but unfortunately our methods seem to give results that are worse than a trivial bound when the dimension is less than or greater than one.)
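To make the "one-dimensional on average" claim concrete, here is a small Python check of my own (not from the paper): for each prime {p} it counts the residue classes removed when sieving for {{\mathcal P}_2}, namely the roots of {n^2+1} mod {p}, and averages them.

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p, is_prime in enumerate(sieve) if is_prime]

# classes removed mod p: the roots of n^2 + 1 = 0 (mod p), which is
# 1 class for p = 2, 2 classes if p = 1 (mod 4), 0 classes if p = 3 (mod 4)
removed = [sum(1 for n in range(p) if (n * n + 1) % p == 0)
           for p in primes_up_to(10000)]
print(sum(removed) / len(removed))   # close to 1: one class per prime on average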

The standard “Erd\H{o}s-Rankin” method for constructing long gaps between primes proceeds by trying to line up some residue classes modulo small primes {p} so that they collectively occupy a long interval. Key tools in doing so are the smooth number estimates of de Bruijn and others, which among other things assert that if one removes from an interval such as {[1,x]} all the residue classes {0} mod {p} for {p} between {x^{1/u}} and {x} for some fixed {u>1}, then the set of survivors has exceptionally small density (roughly of the order of {u^{-u}}, with the precise density given by the Dickman function), in marked contrast to the situation in which one randomly removes one residue class for each such prime {p}, in which case the density is more like {1/u}. One generally exploits this phenomenon to sieve out almost all the elements of a long interval using some of the primes available, and then uses the remaining primes to cover up the remaining elements that have not already been sifted out. In the more recent work on this problem, advanced combinatorial tools such as hypergraph covering lemmas are used for the latter task.

In the case of {{\mathcal P}_2}, there does not appear to be any analogue of smooth numbers, in the sense that there is no obvious way to arrange the residue classes so that they have significantly fewer survivors than a random arrangement. Instead we adopt the following semi-random strategy to cover an interval {[1,y]} by residue classes. Firstly, we randomly remove residue classes for primes {p} up to some intermediate threshold {z} (smaller than {y} by a logarithmic factor), leaving behind a preliminary sifted set {S_{[2,z]}}. Then, for each prime {p} between {z} and another intermediate threshold {x/2}, we remove a residue class mod {p} that maximises (or nearly maximises) its intersection with {S_{[2,z]}}. This ends up reducing the number of survivors to be significantly below what one would achieve if one selected residue classes randomly, particularly if one also uses the hypergraph covering lemma from our previous paper. Finally, we cover each of the remaining survivors by a residue class from a remaining available prime.
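Here is a toy Python sketch of that semi-random strategy (my own illustration; the thresholds and the plain greedy rule are simplifications, and the actual argument also needs the hypergraph covering step). It sieves {[1,y]} with random residue classes for the small primes and then, for each medium prime, removes the class meeting the most survivors, and compares the result with a fully random choice of classes.

import random

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def sieve_interval(y, z, P, greedy, seed=0):
    rng = random.Random(seed)
    ps = primes_up_to(P)
    # random residue classes for the small primes p <= z, giving S_{[2,z]}
    alive = set(range(1, y + 1))
    for p in (q for q in ps if q <= z):
        a = rng.randrange(p)
        alive = {n for n in alive if n % p != a}
    # for the medium primes z < p <= P, remove either the residue class
    # meeting the most survivors (greedy) or a random class
    for p in (q for q in ps if q > z):
        if greedy:
            counts = [0] * p
            for n in alive:
                counts[n % p] += 1
            a = counts.index(max(counts))
        else:
            a = rng.randrange(p)
        alive = {n for n in alive if n % p != a}
    return alive

y, z, P = 20000, 30, 400
print("greedy classes :", len(sieve_interval(y, z, P, greedy=True)))
print("random classes :", len(sieve_interval(y, z, P, greedy=False)))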

February 22, 2018

John Baez: Complex Adaptive System Design (Part 7)

In March, I’ll be talking at Spencer Breiner‘s workshop on Applied Category Theory at the National Institute of Standards and Technology. I’ll be giving a joint talk with John Foley about our work using operads to design networks. This work is part of the Complex Adaptive System Composition and Design Environment project being done by Metron Scientific Solutions and managed by John Paschkewitz at DARPA.

I’ve written about this work before:

• Complex Adaptive Systems: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6.

But we’ve done a lot more, and my blog articles are having trouble keeping up! So I’d like to sketch out the big picture as it stands today.

If I had to summarize, I’d say we’ve developed a formalism for step-by-step compositional design and tasking, using commitment networks. But this takes a while to explain.

Here’s a very simple example of a commitment network:

It has four nodes, which represent agents: a port, a helicopter, a UAV (an unmanned aerial vehicle, or drone) and a target. The edges between these nodes describe relationships between these agents. Some of these relationships are ‘commitments’. For example, the edges labelled ‘SIR’ say that one agent should ‘search, intervene and rescue’ the other.

Our framework for dealing with commitment networks has some special features. It uses operads, but this isn’t really saying much. An ‘operad’ is a bunch of ways of sticking things together. An ‘algebra’ of the operad gives a particular collection of these things, and says what we get when we stick them together. These concepts are extremely general, so there’s a huge diversity of operads, each with a huge diversity of algebras. To say one is using operads to solve a problem is a bit like saying one is using math. What matters more is the specific kind of operad one is using, and how one is using it.

For our work, we needed to develop a new class of operads called network operads, which are described here:

• John Baez, John Foley, Joseph Moeller and Blake Pollard, Network models.

In this paper we mainly discuss communication networks. Subsequently we’ve been working on a special class of network operads that describe how to build commitment networks.

Here are some of the key ideas:

• Using network operads we can build bigger networks from smaller ones by overlaying them. David Spivak’s operad of wiring diagrams only lets us ‘wire together’ smaller networks to form bigger ones:

Here networks X1, X2 and X3 are being wired together to form Y.

Network operads also let us wire together networks, but in addition they let us take one network:

and overlay another:

to create a larger network:

This is a new methodology for designing systems. We’re all used to building systems by wiring together subsystems: anyone who has a home stereo system has done this. But overlaying systems lets us do more. For example, we can take two plans of action involving the same collection of agents, and overlay them to get a new plan. We’ve all done this, too: you tell a bunch of people to do things… and then tell the same people, or an overlapping set of people, to do some other things. But lots of problems can arise if you aren’t careful. A mathematically principled approach can avoid some of these problems.

• The nodes of our networks represent agents of various types. The edges represent various relationships between agents. For example, they can represent communication channels. But more interestingly, they can represent commitments. For example, we can have an edge from A to B saying that agent A has to go rescue agent B. We call this kind of network a commitment network.

• By overlaying commitment networks, we can not only build systems out of smaller pieces but also build complicated plans by overlaying smaller pieces of plans. Since ‘tasking’ means telling a system what to do, we call this compositional tasking.

• If one isn’t careful, overlaying commitment networks can produce conflicts. Suppose we have a network with an edge saying that agent A has to rescue agent B. On top of this we overlay a network with an edge saying that A has to rescue agent C. If A can’t do both of these tasks at once, what should A do? There are various choices. We need to build a specific choice into the framework, so we can freely overlay commitment networks and get a well-defined result that doesn’t overburden the agents involved. We call this automatic deconflicting.

• Our approach to automatic deconflicting uses an idea developed by the famous category theorist Bill Lawvere: graphic monoids. I’ll explain these later, along with some of their side-benefits.

• Network operads should let us do step-by-step compositional tasking. In other words, they should let us partially automate the process of tasking networks of agents, both

1) compositionally: tasking smaller networks and then sticking them together, e.g. by overlaying them, to get larger networks,

and

2) in a step-by-step way, starting at a low level of detail and then increasing the amount of detail.

To do this we need not just operads but their algebras.

• Remember, a network operad is a bunch of ways to stick together networks of some kind, e.g. by overlaying them. An algebra of this operad specifies a particular collection of networks of this kind, and says what we actually get when we stick them together.

So, a network operad can have one algebra in which things are described in a bare-bones, simplified way, and another algebra in which things are described in more detail. Indeed it will typically have many algebras, corresponding to many levels of detail, but for simplicity let’s just think about two.

When we have a ‘less detailed’ algebra A and a ‘more detailed’ algebra A', they will typically be related by a map

f \colon A' \to A

which ‘forgets the extra details’. This sort of map is called a homomorphism of algebras. We give examples in our paper Network models.

But what we usually want to do, when designing a system, is not forget extra detail, but rather add extra detail to a rough specification. There is not always a systematic way to do this. If there is, then we may have a homomorphism

g \colon A \to A'

going back the other way. This lets us automate the process of filling in the details. But we can’t usually count on being able to do this. So, often we may have to start with an element of A and search for an element of A' that is mapped to it by f : A' \to A. And typically we want this element to be optimal, or at least ‘good enough’, according to some chosen criteria. Expressing this idea formally helps us figure out how to automate the search. John Foley, in particular, has been working on this.

That’s an overview of our ideas.

Next, for the mathematically inclined, I want to give a few more details on one of the new elements not mentioned in our Network models paper: ‘graphic monoids’.

Graphic monoids

In our paper Network models we explain how the ‘overlay’ operation makes the collection of networks involving a given set of agents into a monoid. A monoid is a set M with a product that is associative and has an identity element 1:

(xy)z = x(yz)
1 x = x = x 1

In our application, this product is overlaying two networks.

A graphic monoid is one in which the graphic identity

x y x = x y

holds for all x,y.

To understand this identity, let us think of the elements of the monoid as “commitments”. The product x y means “first committing to do x, then committing to do y”. The graphic identity says that if we first commit to do x, then y, and then x again, it’s the same as first committing to do x and then y. Committing to do x again doesn’t change anything!

In particular, in any graphic monoid we have

xx = x 1 x = x 1 = x

so making the same commitment twice is the same as making it once. Mathematically we say every element x of a graphic monoid is idempotent:

x^2 = x

A commutative monoid obeying this law x^2 = x automatically obeys the graphic identity, since then

x y x = x^2 y = x y

But for a noncommutative monoid, the graphic identity is stronger than x^2 = x. It says that after committing to x, no matter what intervening commitments one might have made, committing to x again has no further effect. In other words: the intervening commitments did not undo the original commitment, so making the original commitment a second time has no effect! This captures the idea of how promises should behave.
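Here is a small Python sketch of my own (not from the post) of a graphic monoid: words with no repeated letters, multiplied by concatenating and then deleting every letter that has already occurred. Repeating a commitment changes nothing, exactly as described above.

from itertools import product

def mult(x, y):
    """Concatenate two repetition-free words, keeping only first occurrences."""
    seen, out = set(), []
    for letter in x + y:
        if letter not in seen:
            seen.add(letter)
            out.append(letter)
    return tuple(out)

letters = "abc"
# all repetition-free words on {a, b, c}, including the empty word (the unit 1)
words = [w for n in range(len(letters) + 1)
         for w in product(letters, repeat=n) if len(set(w)) == n]

assert all(mult(mult(x, y), z) == mult(x, mult(y, z))
           for x in words for y in words for z in words)   # associativity
assert all(mult(mult(x, y), x) == mult(x, y)
           for x in words for y in words)                   # graphic identity x y x = x y
assert all(mult(x, x) == x for x in words)                  # hence idempotence x^2 = x
print("graphic identity checked on", len(words), "elements")

In this toy example the partial order defined below works out to: x \le y exactly when the word x is an initial segment of the word y.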

As I said, for any network model, the set of all networks involving a fixed set of agents is a monoid. In a commitment network model, this monoid is required to be a graphic monoid. Joseph Moeller is writing a paper that shows how to construct a large class of commitment network models. We will follow this with a paper illustrating how to use these in compositional tasking.

For now, let me just mention a side-benefit. In any graphic monoid we can define a relation x \le y by

x \le y  \; \iff \; x a = y for some a

This makes the graphic monoid into a partially ordered set, meaning that these properties hold:

reflexivity: x \le x

transitivity: x \le y , y \le z \; \implies \; x \le z

antisymmetry: x \le y, y \le x \; \implies x = y

In the context of commitment networks, x \le y means that starting from x we can reach y by making some further commitment a: that is, x a = y for some a. So, as we ‘task’ a collection of agents by giving them more and more commitments, we move up in this partial order.

Tommaso Dorigo: Diboson Resonances Review Available Online

This is just a short note - a record-keeping, if you like - to report that my long review on "Collider Searches for Diboson Resonances" has now appeared on the online Elsevier site of the journal "Progress in Particle and Nuclear Physics".
I had previously pointed to the preprint version of the same article on this blog, with the aim of getting feedback from experts in the field, and I am happy that this has indeed happened: I was able to integrate some corrections from Robert Shrock, a theorist at SUNY, as well as some additions to the reference list from a couple of other colleagues.


Backreaction: Shut up and simulate. (In which I try to understand how dark matter forms galaxies, and end up very confused.)

[Image: galactic structures from the Illustris Simulation.] Most of the mass in the universe isn’t a type of matter we are familiar with. Instead, it’s a mysterious kind of “dark matter” that doesn’t emit or absorb light. It also interacts rarely both with itself and with normal matter, too rarely to have left any trace in our detectors. We know dark matter is out there because we see its

Secret Blogging Seminar: Why single variable analysis is easier

Jadagul writes:

Got a draft of the course schedule for next year. Looks like I might get to teach real analysis.

I probably need someone to talk me out of trying to do everything in R^n.

A subsequent update indicates that the more standard alternative is teaching one variable analysis.

This is my second go around teaching rigorous multivariable analysis — key points are the multivariate chain rule, the inverse and implicit function theorems, Fubini’s theorem, the multivariate change of variables formula, the definition of manifolds, differential forms, Stokes’ theorem, the degree of a differentiable map and some preview of de Rham cohomology. I wouldn’t say I’m doing a great job, but at least I know why it’s hard to do. I haven’t taught single variable, but I have read over the day-to-day syllabus and problem sets of our most experienced instructor.

Here is the conceptual difference: It is quite doable to start with the axioms of an ordered field and build rigorously to the Fundamental Theorem of Calculus. Doing this gives students a real appreciation for the nontrivial power of mathematical reasoning. I don’t want to say that it is actually impossible to do the same for Stokes’ theorem (according to rumor, Brian Conrad did it), but I never manage — there comes a point where I start waving my hands and saying “Oh yes, and throw in a partition of unity” or “Yes, there is an inverse function theorem for maps between n-folds just like the one for maps between open subsets of \mathbb{R}^n.” I think most students probably benefit from seeing things done carefully for a term first.

Below the fold, a list of specific topics much harder in more than one variable. If you have found ways not to make them harder, please chime in in the comments!

• No need for linear algebra. Just defining the multivariate derivative uses the concept of a linear map, and stating the chain rule requires you to compose them. If you want your students to ever be able to check the hypotheses of the inverse function theorem, they have to be able to check if matrices are invertible.

• One variable Riemann sums are so nice! If u : [a,b] \to [c,d] is an increasing bijection, and a=x_0 < x_1 < \cdots < x_N = b is a partition of [a,b], then c=u(x_0) < u(x_1) < \cdots < u(x_N) =d is a partition of [c,d]; the u-substitution formula for integrals follows immediately. If u : [a_1, b_1] \times [a_2, b_2] \to [c_1, d_1] \times [c_2, d_2] is a smooth bijection, and we have a partition of [a_1, b_1] \times [a_2, b_2] into rectangles, its image in [c_1, d_1] \times [c_2, d_2] is quite hard to describe. This is why the change of variables formula is such a pain.

• Regions of integration: In one variable we always integrate over an interval. In many variables, we integrate over complicated regions, so we need to describe the geometry of complicated regions. If you want to break a region up into simpler pieces, you need to introduce some rudimentary notion of "measure zero", to make sure the boundaries you cut along aren't too large.

• Improper integrals: In one variable, we always take limits as the bounds of the integral go somewhere. In many variables, there are uncountably many different limiting processes which could define \int_{\mathbb{R}^2} f(x,y).
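A standard worked example of this (my addition, not from the post): for f(x,y) = (x^2-y^2)/(x^2+y^2)^2, which blows up at the origin, different limiting processes over the unit square already disagree,

\int_0^1 \int_0^1 \frac{x^2-y^2}{(x^2+y^2)^2}\, dy\, dx = \int_0^1 \frac{dx}{1+x^2} = \frac{\pi}{4}, \qquad \int_0^1 \int_0^1 \frac{x^2-y^2}{(x^2+y^2)^2}\, dx\, dy = -\frac{\pi}{4},

while exhausting the square by the symmetric regions [\epsilon,1]^2 gives 0 by antisymmetry, so there is no single value one can sensibly assign to the improper integral.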

And that's not even getting into manifolds, or multilinear algebra…

February 21, 2018

n-Category Café Cartesian Bicategories

guest post by Daniel Cicala and Jules Hedges

We continue the Applied Category Theory Seminar with a discussion of Carboni and Walters’ paper Cartesian Bicategories I. The star of this paper is the notion of ‘bicategories of relations’. This is an abstraction of relations internal to a category. As such, this paper provides excellent, if technical, examples of internal relations and other internal category theory concepts. In this post, we discuss bicategories of relations while occasionally pausing to enjoy some internal category theory such as relations, adjoints, monads, and the Kleisli construction.

We’d like to thank Brendan Fong and Nina Otter for running such a great seminar. We’d also like to thank Paweł Sobociński and John Baez for helpful discussions.

Shortly after Bénabou introduced bicategories, a program was initiated to study these through profunctor bicategories. Carboni and Walters, however, decided to study bicategories with a more relational flavor. This is not quite as far a departure as one might think. Indeed, relations and profunctors are connected. Let’s recall two facts:

  • a profunctor from C to D is a functor from D^{op} \times C to \mathbf{Set}, and

  • a relation between sets x and y can be described by a \{0,1\}-valued matrix of size x \times y.

Heuristically, profunctors can be thought of as a generalization of relations, viewing a profunctor as a "\mathbf{Set}-valued matrix of size \text{ob}(C) \times \text{ob}(D)". As such, a line between profunctors and relations appears. In Cartesian Bicategories I, Carboni and Walters walk this line and continue the study of bicategories from a relational viewpoint.

The primary accomplishment of this paper is to characterize the bicategories of internal relations \mathbf{Rel}(E) and of ordered objects \mathbf{Ord}(E) in a regular category E. To do this, the authors begin by introducing the notion of a Cartesian bicategory, an early example of a bicategory with a monoidal product. They then explore bicategories of relations, which are Cartesian bicategories whose objects are Frobenius monoids. The name "bicategories of relations" indicates their close relationship with the classical bicategory of relations \mathbf{Rel}.

We begin by defining the two most important examples of a bicategory of relations: \mathbf{Rel}(E) and \mathbf{Ord}(E). Knowing these bicategories will ground us as we wade through the theory of Cartesian bicategories. We finish by characterizing \mathbf{Rel}(E) and \mathbf{Ord}(E) in terms of the developed theory.

Internal relations

In set theory, a relation from x to y is a subset of x \times y. In category theory, things become more subtle. A relation r from x to y internal to a category C is a 'jointly monic' C-span x \xleftarrow{r_0} \hat{r} \xrightarrow{r_1} y. That is, for any arrows a, b \colon u \to \hat{r} such that r_0 a = r_0 b and r_1 a = r_1 b, we have a = b. In a category with products, this definition simplifies substantially: it is merely a monic arrow r \colon \hat{r} \to x \times y.

Given a span x \xleftarrow{c} w \xrightarrow{d} y and the relation r \coloneqq \langle r_0 , r_1 \rangle from above, we say that c is r-related to d if there is an arrow w \to \hat{r} so that

commutes. We will write r \colon c \nrightarrow d when c is r-related to d.

While we can talk about relations internal to any category, we cannot generally assemble them into another category. However, if we start with a regular category E, then there is a bicategory \mathbf{Rel}(E) of relations internal to E. The objects are those of E. The arrows are the relations internal to E, with composition given by pullback:

Additionally, we have a unique 2-cell, written r \leq s, whenever s \colon r_0 \nrightarrow r_1. Diagrammatically, r \leq s if there exists a commuting diagram
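To make composition by pullback concrete, here is a small sketch of my own (not from the post) for the case E = \mathbf{Set}, with relations encoded as sets of pairs: the pullback composite is ordinary relational composition, and the unique 2-cell r \leq s is simply containment.

def compose(s, r):
    """Composite of r from x to y and s from y to z, as a relation from x to z."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def two_cell(r, s):
    """There is a (unique) 2-cell r <= s exactly when r is contained in s."""
    return r <= s

r = {(1, 'a'), (2, 'a'), (2, 'b')}     # a relation from {1, 2, 3} to {'a', 'b'}
s = {('a', 'x'), ('b', 'y')}           # a relation from {'a', 'b'} to {'x', 'y'}
print(compose(s, r))                   # {(1, 'x'), (2, 'x'), (2, 'y')}
print(two_cell(r, r | {(3, 'b')}))     # True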

Internal ordered objects

We are quite used to the idea of having an order on a set. But what about an order on an object of a category? This is captured by \mathbf{Ord}(E), the bicategory of ordered objects and ideals in a regular category E.

The objects of \mathbf{Ord}(E) are ordered objects in E. An ordered object is a pair (x,r) consisting of an E-object x and a reflexive and transitive relation r \colon x \to x internal to E.

(Puzzle: r is a monic of type r \colon \hat{r} \to x \times x. Both reflexivity and transitivity can be defined using morphisms. What are the domains and codomains? What properties should be satisfied?)

The arrows of \mathbf{Ord}(E) are a sort of 'order preserving relation' called an ideal. Precisely, an ideal f \colon (x,r) \to (y,s) between ordered objects is a relation f \colon x \nrightarrow y such that given

  • morphisms a, a', b, b' with a common domain z, and

  • relations r \colon a \nrightarrow a', f \colon a' \nrightarrow b', and s \colon b' \nrightarrow b,

then f \colon a \nrightarrow b.

In \mathbf{Set}, an ordered object is a preordered set and an ideal f \colon (x,r) \to (y,s) is a directed subset of x \times y with the property that if it contains an element t and t' \leq t, then it contains t'.

There is at most a single 2-cell between parallel arrows in \mathbf{Ord}(E). Given f, g \colon (x,r) \to (y,s), write f \leq g whenever g \colon f_0 \nrightarrow f_1.

Cartesian Bicategories

Now that we know what bicategories we have the pleasure of working with, we can move forward with the theoretical aspects. As we work through the upcoming definitions, it is helpful to recall our motivating examples \mathbf{Rel}(E) and \mathbf{Ord}(E).

As mentioned above, in the early days of bicategory theory, mathematicians would study bicategories as V-enriched profunctor bicategories for some suitable V. A shrewd observation was made that when V is Cartesian, a V-profunctor bicategory has several important commonalities with \mathbf{Rel}(E) and \mathbf{Ord}(E). Namely, there is a Cartesian product \otimes, plus, for each object x, a diagonal arrow \Delta \colon x \to x \otimes x and an arrow \epsilon \colon x \to I to the unit. With this insight, Carboni and Walters decided to take this structure as primitive.

To simplify coherence, we only look at locally posetal bicategories (i.e. \mathbf{Pos}-enriched categories). This renders 2-dimensional coherences redundant, as all parallel 2-cells manifestly commute. This assumption also endows each hom-poset with 2-cells \leq and, as we will see, local meets \wedge. For the remainder of this article, all bicategories will be locally posetal unless otherwise stated.

Definition. A locally posetal bicategory B is Cartesian when equipped with

  • a symmetric tensor product B \otimes B \to B,

  • a cocommutative comonoid structure, \Delta_x \colon x \to x \otimes x and \epsilon_x \colon x \to I, on every B-object x,

such that

  • every 1-arrow r \colon x \to y is a lax comonoid homomorphism, i.e.

\Delta_y r \leq ( r \otimes r ) \Delta_x \quad \text{and} \quad \epsilon_y r \leq \epsilon_x

  • for all objects x, both \Delta_x and \epsilon_x have right adjoints \Delta^\ast_x and \epsilon^\ast_x.

Moreover, (\Delta_x , \epsilon_x) is the only cocommutative comonoid structure on x admitting right adjoints.

(Question: This definition contains a slight ambiguity in the authors' use of the term "moreover". Is the uniqueness property of the cocommutative comonoid structure an additional axiom, or does it follow from the other axioms?)

If you’re not accustomed to thinking about adjoints internal to a general bicategory, place yourself for a moment in \mathbf{Cat}. Recall that an adjunction is merely a pair of arrows (the adjoint functors) together with a pair of 2-cells (the unit and counit) obeying certain equations. But this sort of data can exist in any bicategory, not just \mathbf{Cat}. It is worth spending a minute to feel comfortable with this concept because, in what follows, adjoints play an important role.

Observe that the right adjoints \Delta^\ast_x and \epsilon^\ast_x turn x into a commutative monoid object, hence a bimonoid. The (co)commutative (co)monoid structures on objects extend to a tensor product on x \otimes x, as seen in this string diagram:

Ultimately, we want to think of arrows in a Cartesian bicategory as generalized relations. What other considerations are required to do this? To answer this, it is helpful to first think about what a generalized function should be.

For the moment, let’s use our \mathbf{Set}-based intuition. For a relation to be a function, we ask that every element of the domain is related to some element of the codomain (entireness) and that the relationship is unique (determinism). How do we encode these requirements into this new, general situation? Again, let’s use intuition from relations in \mathbf{Set}. Let r \colon x \nrightarrow y be a relation and r^\circ \colon y \nrightarrow x the opposite relation, defined by letting r^\circ relate b to a whenever r relates a to b. To say that r is entire is equivalent to saying that the composite relation r^\circ r contains the identity relation on x (puzzle). To say that r is deterministic is to say that the composite relation r r^\circ is contained in the identity (another puzzle). These two containments are concisely expressed by writing 1 \leq r^\circ r and r r^\circ \leq 1. Hence r and r^\circ form an adjoint pair! This leads us to the following definition.

Definition. An arrow of a Cartesian bicategory is a map when it has a right adjoint. Maps are closed under identity and composition. Hence, for any Cartesian bicategory B, there is the full subbicategory \mathbf{Map}(B) whose arrows are the maps in B.

(Puzzle: There is an equivalence of categories E \simeq \mathbf{Map}(\mathbf{Rel}(E)) for a regular category E. What does this say for E := \mathbf{Set}?)
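As a sanity check related to the puzzle (again a sketch of my own, with relations as sets of pairs), in \mathbf{Set} the two containments 1 \leq r^\circ r and r r^\circ \leq 1 single out exactly the graphs of functions:

def compose(s, r):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def op(r):
    return {(b, a) for (a, b) in r}

def identity(xs):
    return {(a, a) for a in xs}

def is_map(r, x, y):
    entire = identity(x) <= compose(op(r), r)          # 1 <= r^o r
    deterministic = compose(r, op(r)) <= identity(y)   # r r^o <= 1
    return entire and deterministic

x, y = {1, 2, 3}, {'a', 'b'}
print(is_map({(1, 'a'), (2, 'a'), (3, 'b')}, x, y))    # True: the graph of a function
print(is_map({(1, 'a'), (2, 'a')}, x, y))              # False: 3 is related to nothing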

We can now state what appears as Theorem 1.6 of the paper. Recall that (-)^\ast refers to the right adjoint.

Theorem. Let B be a locally posetal bicategory. If B is Cartesian, then

  • \mathbf{Map}(B) has finite bicategorical products \otimes,

  • the hom-posets have finite meets \wedge (i.e. categorical products) and the identity arrow in B(I,I) is maximal (i.e. a terminal object), and

  • the bicategorical products and biterminal object in \mathbf{Map}(B) may be chosen so that r \otimes s = (p^\ast r p) \wedge (p^\ast s p), where p denotes the appropriate projection.

Conversely, if the first two properties are satisfied and the third defines a tensor product, then B is Cartesian.

This theorem gives a nice characterisation of Cartesian bicategories. The first two axioms are straightforward enough, but what is the significance of the above tensor product equation?

It's actually quite painless when you break it down. Note that every bicategorical product \otimes comes with projections p and inclusions p^\ast. Now, let r \colon w \to y and s \colon x \to z, which gives r \otimes s \colon w \otimes x \to y \otimes z. One canonical arrow of type w \otimes x \to y \otimes z is p^\ast r p, which first projects to w, arrives at y via r, and then includes into y \otimes z. The other arrow is similar, except we first project to x. The above theorem says that by combining these two arrows with a meet \wedge, the only available operation, we get our tensor product.

Bicategories of relations

The next stage is to add to Cartesian bicategories the property that each object is a Frobenius monoid. In this section we will study such bicategories and see that Cartesian plus Frobenius provides a reasonable axiomatization of relations.

Recall that an object with monoid and comonoid structures is called a Frobenius monoid if the equation

holds. If you’re not familiar with this equation, it has an interesting history, as outlined by Walters. Now, if every object in a Cartesian bicategory is a Frobenius monoid, we call it a bicategory of relations. This term is a bit overworked, as it commonly refers to \mathbf{Rel}(E). Therefore, we will be careful to call the latter a "bicategory of internal relations".

Why are bicategories of relations better than simply Cartesian bicategories? For one, they admit a compact closed structure! This appears as Theorem 2.4 in the paper.

Theorem. A bicategory of relations has a compact closed structure. Objects are self-dual via the unit

\Delta \epsilon^\ast_x \colon I \to x \otimes x

and counit

\epsilon \Delta^\ast_x \colon x \otimes x \to I.

Moreover, the dual r^\circ of any arrow r \colon x \to y satisfies

(r \otimes id) \Delta \leq (1 \otimes r^\circ) \Delta r

and

\Delta^\ast (r \otimes id) \leq r \Delta^\ast (1 \otimes r^\circ).

Or if you prefer string diagrams, the above inequalities are respectively

and

Because a bicategory of relations is Cartesian, maps are still present. In fact, they have a very nice characterization here.

Lemma. In a bicategory of relations, an arrow r is a map iff it is a (strict) comonoid homomorphism iff r \dashv r^\circ.

As one would hope, the adjoint of a map corresponds with the involution coming from the compact closed structure. The following corollary provides further evidence that maps are well-behaved.

Corollary. In a bicategory of relations:

  • f is a map implies f^\circ = f^\ast. In particular, multiplication is adjoint to comultiplication and the unit is adjoint to the counit.

  • for maps f and g, if f \leq g then f = g.

But maps don’t merely behave in a nice way. They also contain a lot of the information about a Cartesian bicategory and, when working with bicategories of relations, the local information is quite fruitful too. This is made precise in the following corollary.

Corollary. Let F \colon B \to D be a pseudofunctor between bicategories of relations. The following are equivalent:

  • F strictly preserves the Frobenius structure.

  • The restriction F \colon \mathbf{Map}(B) \to \mathbf{Map}(D) strictly preserves the comonoid structure.

  • F preserves local meets and I.

Characterizing bicategories of internal relations

The entire point of the theory developed above is to be able to prove things about certain classes of bicategories. In this section, we provide a characterization theorem for bicategories of internal relations. Freyd had already given this characterization using allegories. However, he relied on a proof by contradiction whereas using bicategories of relations allows for a constructive proof.

A bicategory of relations is meant to generalize bicategories of internal relations. Given a bicategory of relations, we’d like to know when an arrow is “like an internal relation”.

Definition. An arrow r \colon x \to y is a tabulation if there exist maps f \colon z \to x and g \colon z \to y such that r = g f^\ast and f^\ast f \wedge g^\ast g = 1_z.

This definition seems bizarre on its face, but it really is analogous to the jointly monic span definition of an internal relation. That r = g f^\ast is saying that r is like a span x \xleftarrow{f} z \xrightarrow{g} y. The equation f^\ast f \wedge g^\ast g = 1_z implies that this span is jointly monic (puzzle).

A bicategory of relations is called functionally complete if every arrow r \colon x \to I has a tabulation i \colon x_r \to x and t \colon x_r \to I. One can show that the existence of these tabulations together with compact closedness is sufficient to obtain a unique (up to isomorphism) tabulation for every arrow. We now provide the characterization, presented as Theorem 3.5.

Theorem. Let B be a functionally complete bicategory of relations. Then:

  • \mathbf{Map}(B) is a regular category (all 2-arrows are trivial by an above corollary),

  • There is a biequivalence of bicategories \mathbf{Rel}(\mathbf{Map}(B)) \simeq B obtained by sending the relation \langle f, g \rangle of \mathbf{Rel}(\mathbf{Map}(B)) to the arrow g f^\circ of B.

So all functionally complete bicategories of relations are bicategories of internal relations. An interesting question is whether any regular category can be realized as \mathbf{Map}(B) for some functionally complete bicategory of relations. Perhaps a knowledgeable passerby will gift us with an answer in the comments!

From this theorem, we can classify some important types of categories. For instance, bicategories of relations internal to a Heyting category are exactly the functionally complete bicategories of relations having all right Kan extensions. Bicategories of relations internal to an elementary topos are exactly the functionally complete bicategories of relations B such that B(x,-) is representable in \mathbf{Map}(B) for all objects x.

Characterizing ordered object bicategories

The goal of this section is to characterize the bicategory \mathbf{Ord}(E) of ordered objects and ideals. We already introduced \mathbf{Ord}(E) earlier, but that definition isn't quite abstract enough for our purposes. An equivalent way of defining an ordered object in E is as an E-object x together with a relation r on x such that 1 \leq r and r r \leq r. Does this data look familiar? An ordered object in E is simply a monad in \mathbf{Rel}(E)!

(Puzzle: What is a monad in a general bicategory? Hint: how are adjoints defined in a general bicategory?)

Quite a bit is known about monads, and we can now apply that knowledge to our study of \mathbf{Ord}(E).

Recall that any monad in \mathbf{Cat} gives rise to a category of adjunctions. The initial object of this category is the Kleisli category. Since the Kleisli category can be defined using a universal property, we can define a Kleisli object in any bicategory. In general, a Kleisli object for a monad t \colon x \to x need not exist, but when it does, it is defined as an arrow k \colon x \to x_t plus a 2-arrow \theta \colon k t \to k such that, given any arrow f \colon x \to y and 2-arrow \alpha \colon f t \to f, there exists a unique arrow h \colon x_t \to y such that h k = f. The pasting diagrams involved also commute:

As in the case of working inside \mathbf{Cat}, we would expect k to be the left half of an adjoint pair, and indeed it is. We get a right adjoint k^\ast such that the composite k^\ast k is our original monad t. The benefit of working in the locally posetal case is that we also have k k^\ast = 1. This realizes t as an idempotent:

t t = k^\ast k k^\ast k = k^\ast k = t.

It follows that the Kleisli object construction is exactly an idempotent splitting of t! This means we can start with an exact category E and construct \mathbf{Ord}(E) by splitting the idempotents of \mathbf{Rel}(E). With this in mind, we move on to the characterization, presented as Theorem 4.6.

Theorem. A bicategory B is biequivalent to a bicategory \mathbf{Ord}(E) if and only if

  • B is Cartesian,

  • every monad in B has a Kleisli object,

  • for each object x, there is a monad on x and a Frobenius monoid x_0 that is isomorphic to the monad's Kleisli object,

  • given a Frobenius monoid x and f \colon x \to x with f \leq 1, f splits.

Final words

The authors go on to look closer at bicategories of relations inside Grothendieck topoi and abelian categories. Both of these are regular categories, and so fit into the picture we've just painted. However, each has additional properties and structure that compel further study.

Much of what we have done can be done in greater generality. For instance, we can drop the locally posetal requirement. However, this would greatly complicate matters by requiring non-trivial coherence conditions.

February 20, 2018

Jonathan Shock: Reflections

It's been three years or so since my last post. It would be cliche to talk of all that has changed, but perhaps the more crucial aspects surround what has remained the same. Ironically, the changes have often been associated with the realisation of themes and patterns which recur again and again...perhaps it is simply the spotting of these patterns which leads to potential growth, though growth often feels like a multi-directional random walk, where over time you get further from where you started, but the directions only proliferate as you find more corners to turn down. The perpendicular nature of Euclidean dimensions is itself probably illusory as here, directions may curve back on themselves and become a lost corridor which you had previously explored and long since left behind, with dust hiding your former footsteps...but the motes still floating reminds you that you've been here before and that there are mistakes still to learn from what may be a narrow and uncomfortable passageway. So perhaps it is all some giant torus, and the trick is counting the closed cycles and trying to find ones which are enjoyable to go around, rather than trying to escape from them all, which is probably futile.

A cycle which recurs is hidden truth, or perhaps, more accurately the hiding of truth...it may be deeper than the hiding of truth from the map as the map-maker himself may be unaware of what is resistance/compromise/expectation/honesty/pain/frustration or joy in a given journey. Google Maps has to go back to the matrix to find any reality hidden within the neurons, and maybe the reality has been erased long ago...'safe passage', or 'dead end' being all that is left...where in fact it may not be a dead end at all, and the paralysis that ensues would be eased with only the simplest of secret code words being divulged...I don't like that...there is a better way...can you do this for me?

This is not to be read as deep or meaningful, but really as a little gush of words, to ease the fingers into writing some more, which I hope to do. This is a deep breath and a stretch after a long long sleep...


John Baez: Applied Category Theory at NIST

I think it’s really cool how applied category theory is catching on. My former student Blake Pollard is working at the National Institute of Standards and Technology on applications of category theory to electrical engineering. He’s working with Spencer Breiner… and now Breiner is running a workshop on this stuff:

• Applied Category Theory: Bridging Theory & Practice, March 15–16, 2018, NIST, Gaithersburg, Maryland, USA.

It’s by invitation only, but I can’t resist mentioning its existence. Here’s the idea:

What: The Information Technology Laboratory at NIST is pleased to announce a workshop on Applied Category Theory to be held at NIST’s Gaithersburg, Maryland campus on March 15 & 16, 2018. The meeting will focus on practical avenues for introducing methods from category theory into real-world applications, with an emphasis on examples rather than theorems.

Who: The workshop aims to bring together two distinct groups. First, category theorists interested in pursuing applications outside of the usual mathematical fields. Second, domain experts and research managers from industry, government, science and engineering who have in mind potential domain applications for categorical methods.

Intended Outcomes: A proposed landscape of potential CT applications and the infrastructure needed to realize them, together with a 5-10 year roadmap for developing the field of applied category theory. This should include perspectives from industry, academia and government as well as major milestones, potential funding sources, avenues for technology transfer and necessary improvements in tool support and methodology. Exploratory collaborations between category theorists and domain experts. We will ask that each group come prepared to meet the other side. Mathematicians should be prepared with concrete examples that demonstrate practical applications of CT in an intuitive way. Domain experts should bring to the table specific problems to which they can devote time and/or funding as well as some reasons about why they think CT might be relevant to this application.

Invited Speakers:
John Baez (University of California at Riverside) and John Foley (Metron Scientific Solutions).
Bob Coecke (University of Oxford).
Dusko Pavlovic (University of Hawaii).

Some other likely participants include Chris Boner (Metron), Arquimedes Canedo (Siemens at Princeton), Stephane Dugowson (Supméca), William Edmonson (North Carolina A&T), Brendan Fong (MIT), Mark Fuge (University of Maryland), Jack Gray (Penumbra), Steve Huntsman (BAE Systems), Patrick Johnson (Dassault Systèmes), Al Jones (NIST), Cliff Joslyn (Pacific Northwest National Laboratory), Richard Malek (NSF), Tom Mifflin (Metron), Ira Monarch (Carnegie Mellon), John Paschkewitz (DARPA), Evan Patterson (Stanford), Blake Pollard (NIST), Emilie Purvine (Pacific Northwest National Laboratory), Mark Raugas (Pacific Northwest National Laboratory), Bill Regli (University of Maryland), Michael Robinson (American U.), Alberto Speranzon (Honeywell Aerospace), David Spivak (MIT), Eswaran Subrahmanian (Carnegie Mellon), Jamie Vicary (Birmingham and Oxford), and Ryan Wisnesky (Categorical Informatics).

A bunch of us will stay on into the weekend and talk some more. I hope we make a lot of progress—and I plan to let you know how it goes!

n-Category Café A Categorical Semantics for Causal Structure

guest post by Joseph Moeller and Dmitry Vagner

We begin the Applied Category Theory Seminar by discussing the paper A categorical semantics for causal structure by Aleks Kissinger and Sander Uijlen.

Compact closed categories have been used in categorical quantum mechanics to give a structure for talking about quantum processes. However, they prove insufficient to handle higher order processes, in other words, processes of processes. This paper offers a construction for a \ast-autonomous extension of a given compact closed category which allows one to reason about higher order processes in a non-trivial way.

We would like to thank Brendan Fong, Nina Otter, Joseph Hirsh and Tobias Heindel as well as the other participants for the discussions and feedback.

Preliminaries

We begin with a discussion about the types of categories which we will be working with, and the diagrammatic language we use to reason about these categories.

Diagrammatics

Recall the following diagrammatic language we use to reason about symmetric monoidal categories. Objects are represented by wires. Arrows can be graphically encoded as

Composition \circ and \otimes depicted vertically and horizontally

satisfying the properties

and the interchange law

If I is the unit object and A is an object, we call arrows I \to A states, arrows A \to I effects, and arrows I \to I numbers.

The identity morphism on an object is displayed as just a wire, and neither I nor its identity morphism is displayed.

Compact closed categories

A symmetric monoidal category is compact closed if each object A has a dual object A^\ast with arrows

\eta_A \colon I \to A^\ast \otimes A,

and

\epsilon_A \colon A \otimes A^\ast \to I,

depicted as \cup and \cap and obeying the zigzag identities:

Given a process f \colon A \to B in a compact closed category, we can construct a state \rho_f \colon I \to A^\ast \otimes B by defining

\rho_f = (1_{A^\ast} \otimes f) \circ \eta_A.

This gives a correspondence which is called “process-state duality”.

An example

Let Mat(\mathbb{R}_+) be the category in which objects are natural numbers and morphisms m \to n are n \times m matrices with entries in \mathbb{R}_+, with composition given by the usual multiplication of matrices. This category is made symmetric monoidal with tensor defined by n \otimes m = nm on objects and the Kronecker product of matrices on arrows, (f \otimes g)^{kl}_{ij} = f^k_i g^l_j. For example,

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \otimes \begin{bmatrix} 0 & 5 \\ 6 & 7 \end{bmatrix} = \begin{bmatrix} 1\cdot \begin{bmatrix} 0 & 5 \\ 6 & 7 \end{bmatrix} & 2\cdot \begin{bmatrix} 0 & 5 \\ 6 & 7 \end{bmatrix} \\ 3 \cdot \begin{bmatrix} 0 & 5 \\ 6 & 7 \end{bmatrix} & 4\cdot \begin{bmatrix} 0 & 5 \\ 6 & 7 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 0 & 5 & 0 & 10 \\ 6 & 7 & 12 & 14 \\ 0 & 15 & 0 & 20 \\ 18 & 21 & 24 & 28 \end{bmatrix}

The unit with respect to this tensor is 1. States in this category are column vectors, effects are row vectors, and numbers are 1 \times 1 matrices, in other words, numbers. Composing a state \rho \colon 1 \to n with an effect \pi \colon n \to 1 is the dot product. To define a compact closed structure on this category, let n^\ast := n. Then \eta_n \colon 1 \to n^2 and \varepsilon_n \colon n^2 \to 1 are given by the Kronecker delta.
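A quick numerical illustration of this example (my own, not from the paper), using NumPy's kron for the tensor product; the last line composes the cup with the cap, giving the dimension of the system as a scalar.

import numpy as np

f = np.array([[1, 2], [3, 4]])
g = np.array([[0, 5], [6, 7]])
print(np.kron(f, g))                 # reproduces the 4x4 matrix displayed above

rho = np.array([[0.2], [0.8]])       # a state 1 -> 2 (column vector)
pi = np.array([[3.0, 1.0]])          # an effect 2 -> 1 (row vector)
print(pi @ rho)                      # a number: the dot product, [[1.4]]

eta = np.eye(2).reshape(4, 1)        # eta_2 : 1 -> 4, the Kronecker delta as a state
eps = np.eye(2).reshape(1, 4)        # eps_2 : 4 -> 1, the Kronecker delta as an effect
print(eps @ eta)                     # the dimension of the system, [[2.]]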

A Categorical Framework for Causality

Encoding causality

The main construction in this paper requires what is called a precausal category. In a precausal category, we demand that every system has a discard effect, which is a process ⏚_A \colon A \to I. This collection of effects must be compatible with \otimes:

  • ⏚_{A \otimes B} = ⏚_A \otimes ⏚_B

  • ⏚_I = 1

A process \Phi \colon A \to B is called causal if discarding B after having done \Phi is the same as just discarding A, i.e. ⏚_B \circ \Phi = ⏚_A.

If A^\ast has discarding, we can produce a state for A by spawning an A and A^\ast pair, then discarding the A^\ast:

In the case of Mat(\mathbb{R}_+), the discard effect is given by the row vector of 1's: \mathbf{1} := (1 \cdots 1). Composing a matrix with the discard effect sums the entries of each column. So if a matrix is a causal process, then each of its columns sums to 1. Thus causal processes in Mat(\mathbb{R}_+) are stochastic maps.
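And a one-line check of that last claim (again a sketch of my own): composing with the row vector of 1's sums each column, so a matrix is causal exactly when it is stochastic.

import numpy as np

discard = np.ones((1, 2))
phi = np.array([[0.3, 1.0],
                [0.7, 0.0]])                    # columns sum to 1: a stochastic map
print(np.allclose(discard @ phi, discard))      # True: discarding after phi = discarding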

A process \Phi \colon A \otimes B \to A' \otimes B' is one-way signalling with A \preceq B if

and B \preceq A if

and non-signalling if both A \preceq B and B \preceq A.

The intuition here is that A \preceq B means B cannot signal to A; the formal condition encodes the fact that had B influenced the transformation from A to A', then it couldn't have been discarded prior to it occurring.

Consider the following example: let A be a cup of tea and B a glass of milk. Let \Phi be the process of pouring half of B into A and then mixing, to form A', milk tea, and B', a half-glass of milk. Clearly this process would not be the same if we started by discarding the milk. Our intuition is that the milk "signalled" to, or influenced, the tea, and hence intuitively we do not have A \preceq B.

A compact closed category C is precausal if

1) Every system A has a discard process ⏚_A,

2) For every system A, the dimension is invertible,

3) C has enough causal states,

4) Second order causal processes factor.

From the definition, we can begin to exclude certain causal situations from systems in precausal categories. In Theorem 3.12, we see that precausal categories do not admit ‘time-travel’.

Theorem. If there are systems A, B, C such that

then A \cong I.

In precausal categories, we have processes that we call first order causal. However, higher order processes collapse into first order processes, because precausal categories are compact closed. For example, letting A \Rightarrow B := A^\ast \otimes B,

(A \Rightarrow B) \Rightarrow C = (A^\ast \otimes B)^\ast \otimes C \cong B \Rightarrow A \otimes C

We can see this happens because of the condition A \otimes B \cong (A^\ast \otimes B^\ast)^\ast. Weakening this condition of compact closed categories yields \ast-autonomous categories. From a precausal category C, we construct a category Caus[C] of higher order causal relations.

The category of higher order causal processes

Given a set of states c \subseteq C(I,A) for a system A, define its dual by

c^\ast = \{\pi \in A^\ast \mid \forall \rho \in c,\, \pi \circ \rho = 1\} \subseteq C(I, A^\ast)

Then we say a set of states c \subseteq C(I,A) is closed if c = c^{\ast\ast}, and flat if there are invertible scalars \lambda, \mu such that \lambda\,⏚^\ast \in c and \mu\,⏚ \in c^\ast.

Now we can define the category Caus[C]. Let the objects be pairs (A, c_A) where c_A is a closed and flat set of states of the system A \in C. A morphism f \colon (A, c_A) \to (B, c_B) is a morphism f \colon A \to B in C such that if \rho \in c_A, then f \circ \rho \in c_B. This category is a symmetric monoidal category with (A, c_A) \otimes (B, c_B) = (A \otimes B, c_{A \otimes B}). Further, it's \ast-autonomous, so higher order processes won't necessarily collapse into first order.

A first order system in Caus[C] is one of the form (A, \{⏚_A\}^\ast). First order systems are closed under \otimes. In fact, the causal processes of C form a subcategory that admits a full and faithful monoidal embedding into Caus[C], by assigning each system to its corresponding first order system: A \mapsto (A, \{⏚_A\}^\ast).

For an example of a higher order process in Caus[Mat(\mathbb{R}_+)], consider a classical switch. Let

\rho_0 = \begin{bmatrix} 1\\0 \end{bmatrix}, \quad \rho_1 = \begin{bmatrix} 0\\1 \end{bmatrix}, \quad \rho'_i = \rho_i^T,

and let s be the second order process

This process is of type X \otimes C \multimap (A \multimap B) \otimes (B \multimap A) \multimap C', where the two middle inputs take types A \multimap B on the left and B \multimap A on the right. Since \rho'_i \circ \rho_j is 1 if i = j and 0 otherwise, plugging in either \rho_0 or \rho_1 to the bottom left input switches the order in which the A \multimap B and B \multimap A processes are composed in the final output process. This second order process is causal because

The authors go on to prove in Theorem 6.17 that a switch cannot be causally ordered, indicating that this process is genuinely second order.

February 19, 2018

n-Category Café Gradual Typing

(Guest post by Max New)

Dan Licata and I have just put up a paper on the arxiv with a syntax and semantics for a gradually typed programming language, which is a kind of synthesis of statically typed and dynamically typed programming styles. The central insight of the paper is to show that the dynamic type checking used in gradual typing has the structure of a proarrow equipment. Using this we can show that some traditional definitions of dynamic type checks can be proven to be in fact unique solutions to the specifications provided by the structure of an equipment. It’s a classic application of category theory: finding a universal property to better understand what was previously an ad-hoc construction.

The paper is written for computer scientists, so I’ll try to provide a more category-theorist-accessible intro here.

Dynamic Typing

First, a very brief introduction to what dynamic typing is. To put it very simply, dynamically typed languages are languages where there is only one type, the “dynamic type” which I’ll call ? (more details here). They have the same relationship to typed languages that monoids do to categories or operads to multicategories. So for example, in a dynamically typed language the function application rule looks like

\frac{\Gamma \vdash t : ?\,\, \Gamma \vdash u : ?}{\Gamma \vdash t(u) : ?}

That is, it always type-checks, whereas in a typed language the t above would need to be of a function type A \to B and u would have to be of type A. However, these languages are not like the pure untyped lambda calculus where everything is a function: they will also include other types of data like numbers, pairs, strings etc. So whereas in a statically typed language the syntax 3(3) would be rejected at type-checking time, we can think of the dynamic language as delaying this type error until run-time, when the program 3(3) will crash and say “3 is not a function”. This is implemented essentially by representing a value in the dynamic language as an element of a coproduct of all the primitive types in the language. Then 3 becomes (number, 3) and the implementation of the function application form will see that the value is not tagged as a function and crash. On the other hand, if we have an application (function, \lambda x. t)((number, 3)), the implementation will see that the tag is a function and then run t[(number,3)/x], which may itself raise a dynamic type error later if t uses its input as a pair, or a function.
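Since Python is itself dynamically typed, here is a small self-contained sketch (illustrative only, not the paper's formal semantics) of the tagged-value implementation just described:

def number(n):
    return ("number", n)

def function(f):
    return ("function", f)

def apply(t, u):
    """Application in the dynamic language: check the tag at run time."""
    tag, payload = t
    if tag != "function":
        raise TypeError(f"{payload!r} is not a function")
    return payload(u)

succ = function(lambda v: number(v[1] + 1))
print(apply(succ, number(3)))       # ('number', 4)

try:
    apply(number(3), number(3))     # the dynamic analogue of 3(3)
except TypeError as err:
    print("run-time type error:", err)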

So with all that ugliness, why do we care about these languages? Well, it is very easy to write down a program and just run it without fighting with the type checker. Anyone who's had to restructure their Coq or Agda program to convince the type checker that it is terminating should be able to relate here. Furthermore, even if I do think these languages are ugly and awful, it is a fact that these languages are extremely popular: JavaScript, Python, Perl, Ruby, PHP, LISP/Scheme are all used to write real working code and we certainly don't want to chuck out all of those codebases and laboriously rewrite them in a typed language.

Gradual Typing

Gradually typed languages give us a better way: they embed the dynamic language in them, but we can also program in a statically typed style and have it interoperate with the dynamically typed code. This means new code can be written in a statically typed language while still being able to call the legacy dynamically typed code without having to sacrifice all of the benefits of static typing.

To a first approximation, a gradually typed language is a statically typed language with dynamic type errors, a distinguished dynamic type ? and a distinguished way to coerce values from any type to and from the dynamic type (possibly causing type errors). Then a dynamically typed program can be compiled to the gradual language by translating the implicit dynamic type checking to explicit casts.

For a gradually typed language to deserve the name, it should be on the one hand typed, meaning the types have their intended categorical semantics as products, exponentials, etc and on the other hand it should satisfy graduality. Graduality of a language means that the transition from a very loose, dynamic style to a precise, statically typed style should be as smooth as possible. More concretely, it means that changing the types used in the program to be less dynamic should lead to a refinement of the program’s behavior: if the term satisfies the new type, it should behave as before, but otherwise it should produce a dynamic type error.

We can formalize this idea by modeling our gradually typed language as a category internal to preorders: types and terms have related “dynamism” orderings (denoted by \sqsubseteq) and all type and term constructors are monotone with respect to these orderings. Then we can characterize the dynamic type as being the most dynamic type and the type error as a least dynamic term of each type. Making everything internal to preorders reproduces exactly the rules of type and term dynamism that programming language researchers have developed based on their operational intuitions.

We can then make a simple logic of type and term dynamism that we call “Preorder Type Theory”. Since we are doing cartesian type theory, it is the internal logic of virtually cartesian preorder categories, rather than plain preorder categories, due to the structure of contexts. In the paper I present VCP categories by a direct definition, but behind the scenes I used Cruttwell-Shulman’s notion of normalized T-monoid to figure out what the definition should be.

Casts and Equipments

While preorder type theory can express the basic ideas of a dynamic type and type errors, the key ingredient of gradual typing that we are missing is that we should be able to cast terms of one type to another so that we can still program in a dynamically typed style.

Gradually typed languages in the literature do this by having a cast form \langle B \Leftarrow A\rangle t in their language whose semantics is defined by induction on A,B. Our approach is to carve out 2 subsets of these casts which have universal properties with respect to the preorder category structure, which are the upcasts \langle B \leftarrowtail A\rangle t and downcasts \langle A \twoheadleftarrow B\rangle u which can only be formed when A \sqsubseteq B. Then one of those “oblique” casts can be defined using the dynamic type: \langle B \Leftarrow A\rangle t = \langle B \twoheadleftarrow ?\rangle \langle ? \leftarrowtail A\rangle t.

What universal property do the upcasts and downcasts have? Well, first we notice that term dynamism gives a relational profunctor between terms of type A and type B (i.e. a relation that is monotone in B and antitone in A). Then we characterize the upcast and downcast as left- and right-representables for that profunctor, which consequently means the upcast is left adjoint to the downcast. More explicitly, this means for t : A, u : B we have \langle B \leftarrowtail A\rangle t \sqsubseteq u \iff t \sqsubseteq u \iff t \sqsubseteq \langle A \twoheadleftarrow B\rangle u.

By uniqueness of representables, this defines the upcasts and downcasts uniquely up to order-equivalence, giving us a specification for the casts. We show that this specification, combined with the monotonicity of all the term connectives and the \beta,\eta rules of the connectives, allows us to derive the standard implementations of these casts as the unique solutions to this specification.
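As a toy sanity check of this specification, here is a small Python sketch (my own illustration with a single base type Nat sitting below ?, not the paper’s formal development). It verifies by brute force, on a finite fragment of values, that the usual tag/untag casts satisfy the three-way equivalence displayed above, with the type error playing the role of the least dynamic term.

```python
from itertools import product

ERR = "error"                      # the least dynamic term of each type

# A small fragment of closed values: Nat-terms and ?-terms (tagged values).
nat_terms = [ERR, 0, 1, 2]
dyn_terms = [ERR, ("nat", 0), ("nat", 1), ("nat", 2), ("bool", True)]

# Term dynamism within each type: error is below everything, otherwise discrete.
def sqsubseteq_nat(t1, t2): return t1 == ERR or t1 == t2
def sqsubseteq_dyn(u1, u2): return u1 == ERR or u1 == u2

# Heterogeneous term dynamism between Nat and ? (using Nat \sqsubseteq ?).
def sqsubseteq_het(t, u):  return t == ERR or u == ("nat", t)

# Candidate casts: the upcast tags, the downcast untags or errors.
def upcast(t):   return ERR if t == ERR else ("nat", t)
def downcast(u): return u[1] if isinstance(u, tuple) and u[0] == "nat" else ERR

# Spec: upcast(t) below u  iff  t below u (heterogeneously)  iff  t below downcast(u).
for t, u in product(nat_terms, dyn_terms):
    assert sqsubseteq_dyn(upcast(t), u) == sqsubseteq_het(t, u) == sqsubseteq_nat(t, downcast(u))
print("upcast/downcast satisfy the adjunction spec on this fragment")
```

The point of the sketch is only that the upcast is pinned down by being left adjoint to the downcast over heterogeneous term dynamism; the paper of course works with open terms and the full (virtually cartesian) preorder category structure.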

If you’re familiar with equipments this structure should look familiar: it gives a functor from the thin category ordering the objects to the category of adjunctions in the term category. This is a special case of a proarrow equipment where we view our preorder category as a “vertically thin” double category. So we extend our syntax to be the internal logic of vertically thin cartesian preorder categories that are equipments, and can model type error, dynamic type, functions and products and we call that system “Gradual Type Theory”. Then in that syntax we give some synthetic proofs of the uniqueness theorems for the casts and a few other theorems about equipments with certain universal properties.

Constructing Models

Finally, we give a construction for models of gradual type theory that extends Dana Scott’s classical models of dynamic typing to gradual typing. This connects our semantics to previous programming languages approaches and proves consistency for the system.

Briefly, we start with a locally thin 2-category C (such as the category of domains) and construct an equipment by defining the vertical arrows to be coreflections in C, which are adjunctions where the right adjoint is a retract of the left. Then we pick an object d in C and take a kind of vertical slice category C/d. So the objects are coreflections into d and the vertical arrows are commuting triangles of coreflections.

We use coreflections rather than just adjunctions for 2 reasons: first, a sociological reason: the casts people have developed are coreflections; and second, a technical reason: every coreflection is a monomorphism, so when we take the slice category we get a thin category and our double category becomes a preorder category. This allows us to interpret type and term dynamism as orderings rather than a more complex category structure.

We then show how Dana Scott’s domain theoretic models of dynamic typing extend to a model of gradual type theory (where the type error is a diverging program) and another model where the type error and diverging program are different.

John PreskillThe Ground Space of Babel

Librarians are committing suicide.

So relates the narrator of the short story “The Library of Babel.” The Argentine magical realist Jorge Luis Borges wrote the story in 1941.

Librarians are committing suicide partially because they can’t find the books they seek. The librarians are born in, and curate, a library called “infinite” by the narrator. The library consists of hexagonal cells, of staircases, of air shafts, and of closets for answering nature’s call. The narrator has never heard of anyone’s finding an edge of the library. Each hexagon houses 20 shelves, each of which houses 32 books, each of which contains 410 pages, each of which contains 40 lines, each of which consists of about 80 symbols. Every symbol comes from a set of 25: 22 letters, the period, the comma, and the space.

The library, a sage posited, contains every combination of the 25 symbols that satisfy the 410-40-and-80-ish requirement. His compatriots rejoiced:

All men felt themselves to be the masters of an intact and secret treasure. There was no personal or world problem whose eloquent solution did not exist in some hexagon. [ . . . ] a great deal was said about the Vindications: books of apology and prophecy which vindicated for all time the acts of every man in the universe and retained prodigious arcana for his future. Thousands of the greedy abandoned their sweet native hexagons and rushed up the stairways, urged on by the vain intention of finding their Vindication.

Probability punctured their joy: “the possibility of a man’s finding his Vindication, or some treacherous variation thereof, can be computed as zero.”

Many-body quantum physicists can empathize with Borges’s librarian.

A handful of us will huddle over a table or cluster in front of a chalkboard.

“Has anyone found this Hamiltonian’s ground space?” someone will ask.1

Library of Babel

A Hamiltonian is an observable, a measurable property. Consider a quantum system S, such as a set of particles hopping between atoms. We denote the system’s Hamiltonian by H. H determines how the system’s state changes in time. A musical about H swept Broadway last year.

A quantum system’s energy, E, might assume any of many possible values. H encodes the possible values. The least possible value, E0, we call the ground-state energy.

Under what condition does S have an amount E0 of energy? S must occupy a ground state. Consider Olympic snowboarder Shaun White in a half-pipe. He has kinetic energy, or energy of motion, when sliding along the pipe. He gains gravitational energy upon leaving the ground. He has little energy when sitting still on the snow. A quantum analog of that sitting constitutes a ground state.2

Consider, for example, electrons in a magnetic field. Each electron has a property called spin, illustrated with an arrow. The arrow’s direction represents the spin’s state. The system occupies a ground state when every arrow points in the same direction as the magnetic field.

Shaun White has as much energy, sitting on the ground in the half-pipe’s center, as he has sitting at the bottom of an edge of the half-pipe. Similarly, a quantum system might have multiple ground states. These states form the ground space.

“Has anyone found this Hamiltonian’s ground space?”

Olympic crashes

“Find” means, here, “identify the form of.” We want to derive a mathematical expression for the quantum analog of “sitting still, at the bottom of the half-pipe.”

“Find” often means “locate.” How do we locate an object such as a library? By identifying its spatial coordinates. We specify coordinates relative to directions, such as north, east, and up. We specify coordinates also when “finding” ground states.

Libraries occupy the physical space we live in. Ground states occupy an abstract mathematical space, a Hilbert space. The Hilbert space consists of the (pure) quantum states accessible to the system—loosely speaking, how the spins can orient themselves.

Libraries occupy a three-dimensional space. An N-spin system corresponds to a 2^N-dimensional Hilbert space. Finding a ground state amounts to identifying 2^N coordinates. The problem’s size grows exponentially with the number of particles.
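To make “finding the ground space” concrete, here is a small numpy sketch (my own toy example, not from the post). It builds the Hamiltonian H = -Σ_i Z_i for N spins in a field along z — the aligned-spins example above — diagonalizes it in the full 2^N-dimensional Hilbert space, and reads off the ground-state energy and the ground space. The exponential growth of the matrix size is exactly why this brute-force approach dies quickly as N grows.

```python
# A toy illustration: find the ground space of N spins in a field along +z,
# with H = -sum_i Z_i, by brute-force diagonalization.  The Hilbert space has
# dimension 2**N, so this only works for small N.
import numpy as np

def ground_space(N):
    Z = np.diag([1.0, -1.0])
    I = np.eye(2)
    dim = 2**N
    H = np.zeros((dim, dim))
    for i in range(N):
        ops = [Z if j == i else I for j in range(N)]
        term = ops[0]
        for op in ops[1:]:
            term = np.kron(term, op)
        H -= term                          # H = -sum_i Z_i
    evals, evecs = np.linalg.eigh(H)
    E0 = evals[0]                          # ground-state energy
    gs = evecs[:, np.isclose(evals, E0)]   # columns spanning the ground space
    return E0, gs

E0, gs = ground_space(N=4)
print(E0, gs.shape)   # -4.0 (16, 1): a unique ground state, all spins aligned
```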

An exponential quantifies also the size of the librarian’s problem. Imagine trying to locate some book in the Library of Babel. How many books should you expect to have to check? How many books does the library hold? Would you have more hope of finding the book, wandering the Library of Babel, or finding a ground state, wandering the Hilbert space? (Please take this question with a grain of whimsy, not as instructions for calculating ground states.)

A book’s first symbol has one of 25 possible values. So does the second symbol. The pair of symbols has one of 25 \times 25 = 25^2 possible values. A trio has one of 25^3 possible values, and so on.

How many symbols does a book contain? About \frac{ 410 \text{ pages} }{ 1 \text{ book} }  \:  \frac{ 40 \text{ lines} }{ 1 \text{ page} }  \:  \frac{ 80 \text{ characters} }{ 1 \text{ line} }  \approx  10^6 \, , or a million. The number of books grows exponentially with the number of symbols per book: The library contains about 25^{ 10^6 } books. You contain only about 10^{24} atoms. No wonder librarians are committing suicide.
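The bookkeeping is easy to check in a couple of lines of Python (using the exact product 410 × 40 × 80 ≈ 1.3 million symbols, which the estimate above rounds to a million):

```python
# Counting the Library of Babel: symbols per book and (the log of) the number
# of books.  The book count itself has over a million digits, so we print only
# its base-10 logarithm.
from math import log10

symbols_per_book = 410 * 40 * 80            # pages * lines * characters per line
books_log10 = symbols_per_book * log10(25)  # log10 of 25 ** symbols_per_book

print(symbols_per_book)    # 1312000, i.e. roughly 10**6 symbols per book
print(round(books_log10))  # 1834097: the library holds about 10**1834097 books
```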

Do quantum physicists deserve more hope? Physicists want to find ground states of chemical systems. Example systems are discussed here and here. The second paper refers to 65 electrons distributed across 57 orbitals (spatial regions). How large a Hilbert space does this system have? Each electron has a spin that, loosely speaking, can point upward or downward (that corresponds to a two-dimensional Hilbert space). One might expect each electron to correspond to a Hilbert space of dimensionality (57 \text{ orbitals}) \frac{ 2 \text{ spin states} }{ 1 \text{ orbital} } = 114. The 65 electrons would correspond to a Hilbert space \mathcal{H}_{\rm tot} of dimensionality 114^{65}.

But no two electrons can occupy the same one-electron state, due to Pauli’s exclusion principle. Hence \mathcal{H}_{\rm tot} has dimensionality {114 \choose 65} (“114 choose 65”), the number of ways in which you can select 65 states from a set of 114 states.

{114 \choose 65} equals approximately 10^{34}. Mathematica (a fancy calculator) can print a one followed by 34 zeroes. Mathematica refuses to print the number 25^{ 10^6 } of Babel’s books. Pity the librarians more than the physicists.

Catalogue

Pity us less when we have quantum computers (QCs). They could find ground states far more quickly than today’s supercomputers. But building QCs is taking about as long as Borges’s narrator wandered the library, searching for “the catalogue of catalogues.”

What would Borges and his librarians make of QCs? QCs will be able to search unstructured databases quickly, via Grover’s algorithm. Babel’s library lacks structure. Grover’s algorithm outperforms classical algorithms just when fed large databases. 25^{ 10^6 } books constitute a large database. Researchers seek a “killer app” for QCs. Maybe Babel’s librarians could vindicate quantum computing and quantum computing could rescue the librarians. If taken with a grain of magical realism.

 

1Such questions remind me of an Uncle Alfred who’s misplaced his glasses. I half-expect an Auntie Muriel to march up to us physicists. She, sensible in plaid, will cross her arms.

“Where did you last see your ground space?” she’ll ask. “Did you put it on your dresser before going to bed last night? Did you use it at breakfast, to read the newspaper?”

We’ll bow our heads and shuffle off to double-check the kitchen.

2More accurately, a ground state parallels Shaun White’s lying on the ground, stone-cold.

February 18, 2018

David Hoggnegatory

Tried to write; total fail. Doing stuff for my two jobs. Not complaining! Just not researching, not today.

David Hoggcomoving and coeval stars; and the pre-infall Sagittarius

At Gaia DR2 prep meeting, I discussed comoving stars and related matters with Oh and Price-Whelan. We discussed moving from our work (in DR1) that made use of marginalized likelihoods for catalog generation to a parameter-estimation method. What would that look like? As my loyal reader knows, I prefer parameter-estimation methods, for both pragmatic and philosophical reasons. But once you go to parameter-estimation methods, there are lots of parameters you could in principle estimate. For example: You can look at the space-time event to which the two stars made their closest approach in the past, and how far apart they would be at that point. If the separation is small, then coeval? That might be much more interesting than co-moving, in the long run.

At Stars group meeting, Allyson Sheffield (CUNY) and Jeff Carlin (LSST) showed us results on abundances of M-type giant stars in the Sagittarius tidal streams. They can clearly see that the progenitor of the stream had element-abundance gradients in it prior to tidal stripping. They also show that the stream matches onto the abundance trend of the Sagittarius dwarf body. But the coolest thing they showed is that there are two different types of alpha elements, which they called explosive and hydrostatic, and the two types have different trends. I need to check this in APOGEE! Sheffield also mentioned some (possibly weak) evidence that the bifurcation in the stream is not from multiple wraps of the stream but rather because the object that was tidally shredded was a binary galaxy (galaxy and satellite) pair! I hope that's true, because it's super cool.

John PreskillFractons, for real?

“Fractons” is my favorite new toy (short for quantum many-body toy models). It has amazing functions that my old toys do not have; it is so new that there are tons of questions waiting to be addressed; it is perfectly situated at the interface between quantum information and condensed matter and has attracted a lot of interest and efforts from both sides; and it gives me excuses and new incentives to learn more math. I have been having a lot of fun playing with it in the last couple years and in the process, I had the great opportunity to work with some amazing collaborators: Han Ma and Mike Hermele at Boulder, Ethan Lake at MIT, Wilbur Shirley at Caltech, Kevin Slagle at U Toronto and Zhenghan Wang at Station Q. Together we have written a few papers on this subject, but I always felt there are more interesting stories and more excitement in me than what can be properly contained in scientific papers. Hence this blog post.

How I first learned about Fractons


Back in the early 2000s, a question that kept attracting and frustrating people in quantum information is how to build a quantum hard drive to store quantum information. This is of course a natural question to ask as quantum computing has been demonstrated to be possible, at least in theory, and experimental progress has shown great potential. It turned out, however, that the question is one of those deceptively enticing ones which are super easy to state, but extremely hard to answer. In a classical hard drive, information is stored using magnetism. Quantum information, instead of being just 0 and 1, is represented using superpositions of 0 and 1, and can be probed in non-commutative ways (that is, measuring along different directions can alter previous answers). To store quantum information, we need “magnetism” in all such non-commutative channels. But how can we do that?

At that time, some proposals had been made, but they either involve actively looking for and correcting errors throughout the time during which information is stored (which is something we never have to do with our classical hard drives) or going into four spatial dimensions. Reliable passive storage of quantum information seemed out of reach in the three-dimensional world we live in, even at the level of a proof of principle toy model!

Given all the previously failed attempts and without a clue about where else to look, this problem probably looked like a dead-end to many. But not to Jeongwan Haah, a fearless graduate student in Preskill’s group at IQIM at that time, who turned the problem from guesswork into a systematic computer search (over a constrained set of models). The result of the search surprised everyone. Jeongwan found a three-dimensional quantum spin model with physical properties that had never been seen before, making it a better quantum hard drive than any other model that we know of!

The model looks surprising not only to the quantum information community, but even more so to the condensed matter community. It is a strongly interacting quantum many-body model, a subject that has been under intense study in condensed matter physics. Yet it exhibits some very strange behaviors whose existence had not even been suspected. It is a condensed matter discovery made not from real materials in real experiments, but through computer search!


Excitations (stars) in Haah’s code live at the corner of a fractal.

In condensed matter systems, what we know can happen is that elementary excitations can come in the form of point particles – usually called quasi-particles – which can then move around and interact with other excitations. In Jeongwan’s model, now commonly referred to as Haah’s code, elementary excitations still come in the form of point particles, but they cannot freely move around. Instead, if they want to move, four of them have to coordinate with each other to move together, so that they stay at the vertices of a fractal shaped structure! The restricted motion of the quasi-particles leads to slower dynamics at low energy, making the model much better suited for the purpose of storing quantum information.

But how can something like this happen? This is the question that I want to yell out loud every time I read Jeongwan’s papers or listen to his talks. Leaving aside the motivation of building a quantum hard drive, this model presents a grand challenge to the theoretical framework we now have in condensed matter. All of our intuitions break down in predicting the behavior of this model; even some of the most basic assumptions and definitions do not apply.


The interactions in Haah’s code involve eight spins at a time (the eight Z’s and eight X’s in each cube).

I felt so uncomfortable and so excited at the same time because there was something out there that should be related to things I know, yet I totally did not understand how. And there was an even bigger problem. I was like a sick person going to a doctor but unable to pinpoint what was wrong. Something must have been wrong, but I didn’t know what that was and I didn’t know how to even begin to look for it. The model looked so weird. The interaction involved eight spins at a time; there was no obvious symmetry other than translation. Jeongwan, with his magic math power, worked out explicitly many of the amazing properties of the model, but that to me only added to the mystery. Where did all these strange properties come from?

From the unfathomable to the seemingly approachable

I remained in this superposition of excited state and powerless state for several years, until Jeongwan moved to MIT and posted some papers with Sagar Vijay and Liang Fu in 2015 and 2016.


Interaction terms in a nicer looking fracton model.

In these papers, they listed several other models, which, similar to Haah’s code, contain quasi-particle excitations whose motion is constrained. The constraints are weaker and these models do not make good quantum hard drives, but they still represent new condensed matter phenomena. What’s nice about these models is that the form of interaction is more symmetric, takes a simpler form, or is similar to some other models we are familiar with. The quasi-particles do not need a fractal-shaped structure to move around; instead they move along a line, in a plane, or at the corner of a rectangle. In fact, as early as 2005 – six years before Haah’s code – Claudio Chamon at Boston University had already proposed a model of this kind. Together with the previous fractal examples, these models are what’s now being referred to as the fracton models. If the original Haah’s code looks like an ET from beyond the Milky Way, these models at least seem to live somewhere in the solar system. So there must be something that we can do to understand them better!

Obviously, I was not the only one who felt this way. A flurry of papers appeared on these “fracton” models. People came at these models armed with their favorite tools in condensed matter, looking for an entry point to crack them open. The two approaches that I found most attractive were the coupled layer construction and the higher rank gauge theory, and I worked on these ideas together with Han Ma, Ethan Lake and Michael Hermele. Each approach comes from a different perspective and establishes a connection between fractons and physical models that we are familiar with. In the coupled layer construction, the connection is to the 2D discrete gauge theories, while in the higher rank approach it is to the 3D gauge theory of electromagnetism.

I was excited about these results. They each point to simple physical mechanisms underlying the existence of fractons in some particular models. By relating these models to things I already know, I feel a bit relieved. But deep down, I know that this is far from the complete story. Our understanding barely goes beyond the particular models discussed in the paper. In condensed matter, we spend a lot of time studying toy models; but toy models are not the end goal. Toy models are only meaningful if they represent some generic feature in a whole class of models. It is not clear at all to what extent this is the case for fractons.

Step zero: define “order”, define “topological order”

I gave a talk about these results at KITP last fall under the title “Fracton Topological Order”. It was actually too big a title because all we did was to study specific realizations of individual models and analyze their properties. To claim topological order, one needs to show much more. The word “order” refers to the common properties of a continuous range of models within the same phase. For example, crystalline order refers to the regular lattice organization of atoms in the solid phase within a continuous range of temperature and pressure. When the word “topological” is added in front of “order”, it signifies that such properties are usually directly related to the topology of the system. A prototypical example is the fractional quantum Hall system, whose ground state degeneracy is directly determined by the topology of the manifold the system lives in. For fractons, we are far from an understanding at this level. We cannot answer basic questions like what range of models form a phase, what is the order (the common properties of this whole range of models) characterizing each phase, and in what sense is the order topological. So, the title was more about what I hope will happen than what has already happened.

But it did lead to discussions that could make things happen. After my talk, Zhenghan Wang, a mathematician at Microsoft Station Q, said to me, “I would agree these fracton models are topological if you can show me how to define them on different three manifolds”. Of course! How can I claim anything related to topology if all I know is one model on a cubic lattice with periodic boundary condition? It is like claiming a linear relation between two quantities with only one data point.

But how to get more data points? Well, from the paper by Haah, Vijay and Fu, we knew how to define the model on cubic lattices. With periodic boundary conditions, the underlying manifold is a three torus. Is it possible to have a cubic lattice, or something similar, in other three manifolds as well? Usually, this kind of request would be too much to ask. But it turns out that if you whisper your wish to the right mathematician, even the craziest ones can come true. With insightful suggestions from Michael Freedman (the Fields medalist leading Station Q) and Zhenghan, and through the amazing work of Kevin Slagle (U Toronto) and Wilbur Shirley (Caltech), we found that if we make use of a structure called Total Foliation, one of the fracton models can be generalized to different kinds of three manifolds and we can see how the properties of the model are related to certain topological features of the manifold!


Foliation.

Foliation is the process of dividing a manifold into parallel planes. Total foliation is a set of three foliations which intersect each other in a transversal way. The xy, yz, and zx planes in a cubic lattice form a total foliation and similar constructions can be made for other three manifolds as well.

Things start to get technical from here, but the basic lesson we learned about some of the fracton models is that structure-wise, they pretty much look like an onion. Even though onions look like a three-dimensional object from the outside, they actually grow in a layered structure. Some of the properties of the fracton models are simply determined by the layers, and related to the topology of the layers. Once we peel off all the layers, we find that for some, there is nothing left while for others, there is a nontrivial core. This observation allows us to better address the previous questions: we defined a fracton phase (one type of it) as models smoothly related to each other after adding or removing layers; the topological nature of the order is manifested in how the properties of the model are determined by the topology of the layers.

The onion structure is nice, because it allows us to reduce much of the story from 3D to 2D, where we understand things much better. It clarifies many of the weirdnesses of the fracton model we studied, and there is indication that it may apply to a much wider range of fracton models, so we have an exciting road ahead of us. On the other hand, it is also clear that the onion structure does not cover everything. In particular, it does not cover Haah’s code! Haah’s code cannot be built in a layered way and its properties are in a sense intrinsically three dimensional. So, after finishing this whole journey through the onion field, I will be back to staring at Haah’s code again and wondering what to do with it, like what I have been doing in the eight years since Jeongwan’s paper first came out. But maybe this time I will have some better ideas.

Tommaso DorigoSearching For Light Dark Matter: A Gedankenexperiment

Dark Matter (DM), the mysterious substance that vastly dominates the total mass of our universe, is certainly one of the most surprising and tough puzzles of contemporary science. We do not know what DM is, but on the other hand we have a large body of evidence that there must be "something" in the universe that causes a host of effects we observe and which would have no decent explanation otherwise. 


February 17, 2018

David Hoggwriting on dimensionality

Because of work Bedell did (on a Sunday!) in support of the Milky Way Mapper meeting, I got renewed excitement about our element-abundance-space dimensionality and diversity work: She was able to show that we can see aspects of the low dimensionality of the space in the spectra themselves, mirroring work done by Price-Jones (Toronto) in APOGEE, but with more specificity about the abundance origins of the dimensionality. That got me writing text in a document. As my loyal reader knows, I am a strong believer in writing text during (not after) the data-analysis phases. I'm also interested in looking at information-theoretic or prediction or measurement approaches to dimensionality.

February 16, 2018

Matt von HippelValentine’s Day Physics Poem 2018

Valentine’s Day was this week, so long-time readers should know what to expect. To continue this blog’s tradition, I’m posting another one of my old physics poems.

 

Winding Number One

 

When you feel twisted up inside, you may be told to step back

That after a long time, from a long distance

All things fall off.

 

So I stepped back.

 

But looking in from a distance

On the border (at infinity)

A shape remained

Etched deep

In the equation of my being

 

A shape that wouldn’t fall off

Even at infinity.

 

And they may tell you to wait and see,

That you will evolve in time

That all things change, continuously.

 

So I let myself change.

 

But no matter how long I waited

How much I evolved

I could not return

My new state cannot be deformed

To what I was before.

 

The shape at my border

Is basic, immutable.

 

Faced with my thoughts

I try to draw a map

And run out of space.

 

I need two selves

Two lives

To map my soul.

 

A double cover.

 

And now, faced by my dual

Tracing each index

Integrated over manifold possibilities

We do not vanish

We have winding number one.

 

Doug NatelsonPhysics in the kitchen: Jamming

Last weekend while making dinner, I came across a great example of emergent physics.  What you see here are a few hundred grams of vacuum-packed arborio rice:
The rice consists of a few thousand oblong grains whose only important interaction here is a mutual "hard core" repulsion.  A chemist would say they are "sterically hindered".  An average person would say that the grains can't overlap.  The vacuum packing means that the whole ensemble of grains is being squeezed by the pressure of the surrounding air, a pressure of around 101,000 N/m² or 14.7 pounds per in².  The result is readily seen in the right-hand image:  The ensemble of rice forms a mechanically rigid rectangular block.  Take my word for it, it was hard as a rock.
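For a sense of scale, a one-line estimate (with a made-up package size, since I don't know the actual dimensions): the surrounding air pushes on each face of the bag with a force of order a kilonewton.

```python
# Rough estimate of the squeezing force on the vacuum-packed rice.  The package
# face is a hypothetical 10 cm x 15 cm for illustration.
atm_pressure = 101_000           # N/m^2, atmospheric pressure from the post
face_area = 0.10 * 0.15          # m^2, assumed package face
force = atm_pressure * face_area
print(f"{force:.0f} N")          # ~1515 N, about the weight of a 150 kg mass
```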

However, as soon as I cut a little hole in the plastic packaging and thus removed the external pressure on the rice, the ensemble of rice grains lost all of its rigidity and integrity, and was soft and deformable as a beanbag, as shown here. 

So, what is going on here?  How come this collection of little hard objects acts as a single mechanically integral block when squeezed under pressure?  How much pressure does it take to get this kind of emergent rigidity?  Does that pressure depend on the size and shape of the grains, and whether they are deformable? 

This onset of collective resistance to deformation is called jamming.  This situation is entirely classical, and yet the physics is very rich.  This problem is clearly one of classical statistical physics, since it is only well defined in the aggregate and quantum mechanics is unimportant.  At the same time, it's very challenging, because systems like this are inherently not in thermal equilibrium.  When jammed, the particles are mechanically hindered and therefore can't explore lots of possible configurations.   It is possible to map out a kind of phase diagram of how rigid or jammed a system is, as a function of free volume, mechanical load from the outside, and temperature (or average kinetic energy of the particles).   For good discussions of this, try here (pdf), or more technically here and here.   Control over jamming can be very useful, as in this kind of gripping manipulator (see here for video).  



February 15, 2018

BackreactionWhat does it mean for string theory that the LHC has not seen supersymmetric particles?

And what will become of supersymmetric string theory if Susy is bumped off? — Suhail Manzoor (@suhailski) February 13, 2018

What are the consequences of no SUSY? — John Deer (@johnxdeer) February 13, 2018

The LHC data so far have not revealed any evidence for supersymmetric particles, or any other new particles. For all we know at present, the standard model of particle

Jordan EllenbergScott Walker and the Let’s Eat Grandma theory of legislative interpretation

How do you know when to call a special election for an empty legislative seat in Wisconsin?  It’s right there in the statutes, 8.50 (4) (d):

Any vacancy in the office of state senator or representative to the assembly occurring before the 2nd Tuesday in May in the year in which a regular election is held to fill that seat shall be filled as promptly as possible by special election. However, any vacancy in the office of state senator or representative to the assembly occurring after the close of the last regular floorperiod of the legislature held during his or her term shall be filled only if a special session or extraordinary floorperiod of the legislature is called or a veto review period is scheduled during the remainder of the term. The special election to fill the vacancy shall be ordered, if possible, so the new member may participate in the special session or floorperiod.

Pretty clear, right?  If a Senate or Assembly seat comes open before May of an election year, the governor has to call a special election, unless the last legislative session has already taken place and no extra legislative business is scheduled before November.  You hold an election unless the duration of the vacancy would be so short as to make the election essentially meaningless.

There are two seats in the Capitol open as we speak, the Senate seat formerly held by Frank Lasee and the Assembly seat once occupied by Keith Ripp; both of them left to take jobs in the Walker administration in January.  But the governor has asserted that no special election will be held, and residents of those districts will go unrepresented in the legislature for almost a full year.

What’s Walker’s excuse for ignoring the law?  Are you sitting down?  The state’s claim is that the phrase “in the year” does not refer to “May,” but rather to “any vacancy.”  So a vacancy arising in March 2018 is required by state law to be filled “as promptly as possible,” despite the severely limited amount of lawmaking the new representative would have a chance to undertake; but if an assembly rep drops dead on the second day of the legislative term, the governor can leave the seat empty for two whole years if he wants.

I kid you not! That is the claim!

Do you think that’s really what the law says?

As this long, well-researched WisContext article makes clear, Walker’s “interpretation” of the law is, well, a novelty.  For fifty years, Wisconsin has been filling legislative vacancies promptly by special elections.  Most of these elections, according to Scott Walker, were optional, some kind of gubernatorial whim.  And it’s definitely not the case that the governor is leaving the seats empty because he’s spooked by the current lust-to-vote of Wisconsin’s Democratic electorate, which has already cost Republicans a long-held seat in Senate District 10.

The Walker administration would like us to read the law as if the phrases came in the opposite order:

Any vacancy in the office of state senator or representative to the assembly occurring in the year in which a regular election is held to fill that seat, before the 2nd Tuesday in May

But English is non-commutative; that sentence says one thing, and 8.50 (4)(d) says a different thing.

Even an extra comma would make Walker’s interpretation reasonable:

Any vacancy in the office of state senator or representative to the assembly occurring before the 2nd Tuesday in May, in the year in which a regular election is held to fill that seat

Commas change meaning.  As the old T-shirt says:  let’s eat grandma!

I suppose we should count ourselves lucky.  Given the syntactic latitude Walker has granted himself, where a prepositional phrase can wander freely throughout a sentence modifying whatever catches its fancy, he might have claimed a special election is required only if a legislative vacancy occurs in May of an election year!  That would make just as much sense as the interpretation Walker’s claiming now.  Which is to say:  none.

What’s the remedy here?  I’m not sure there is one.  Someone in one of the affected districts could sue the state, but I don’t think there’s any prospect a lawsuit would conclude in time to make any difference.  I can’t see a court ordering an emergency halt to a legislative session on the grounds that two seats were illegally unfilled.

So there’s not much to stop the governor from breaking state law in this way.  Except natural human embarrassment.  A government that has lost the capacity to be embarrassed can be very difficult to constrain.

 

 

 

David HoggFML, and the Big Bounce

The day started with a realization by Price-Whelan (Princeton) and me that, in our project The Joker, because of how we do our sampling, we have everything we need at the end of the sampling to compute precisely the fully marginalized likelihood of the input model. That's useful, because we are not just making posteriors, we are also making decisions (about, say, what to put in a table or what to follow up). Of course (and as my loyal reader knows), I don't think it is ever a good idea to compute the FML!

At lunch, Paul Steinhardt (Princeton) gave a great black-board talk about the idea that the Universe might have started in a bounce from a previously collapsing universe. His main point (from my perspective; he also has particle-physics objectives) is that the work that inflation does with a quantum mechanism might be possible to achieve with a classical mechanism, if you could design the bounce right. I like that, of course, because I am skeptical that the original fluctuations are fundamentally quantum in nature. I have many things to say here, but I'll just say a few random thoughts: One is that the strongest argument for inflation is the causality argument, and that can be achieved with other space-time histories, like a bounce. That is, the causality (and related problems) are fundamentally about the geometry of the space and the horizon as a function of time, and there are multiple possible universe-histories that would address the problem. So that's a good idea. Another random thought is that there is no way to make the bounce happen (people think) without violating the null-energy condition. That's bad, but so are various things about inflation! A third thought is that the pre-universe (the collapsing one) probably has to be filled with something very special, like a few scalar fields. That's odd, but so is the inflaton! And those fields could be classical. I walked into this talk full of skepticism, and ended up thinking it's a pretty good program to be pursuing.

David Hoggwelcome to the Milky Way Mapper

Today was the (unfortunately Sunday) start to the first full meeting of the Milky Way Mapper team, where MWM is a sub-part of the proposed project SDSS-V, of which I will be a part. It was very exciting! The challenge is to map a large fraction of the Milky Way in red-giant stars (particularly cool, luminous giants), but also get a full census of binary stars in different states of evolution, and follow up exoplanets and other scientific goals. Rix was in town, and pointed out that the survey needs a description that can be stated in two sentences. Right now it is a mix of projects, and doesn't have a description shorter than two dense slides! But it's really exciting and will support an enormous range of science.

There were many highlights of the meeting for me, most of them about technical issues like selection function, adaptive survey design, and making sensitive statistical tests of exoplanet systems. There was also a lot of good talk about how to do non-trivial inferences about binary-star populations with very few radial-velocity measurements per star. That is where Price-Whelan and I shine! Another subject that I was excited about is how one can design a survey that is simultaneously simple to operate but also adaptive as it goes: Can we algorithmically modify what we observe and when based on past results, increase efficiency (on, say, binary stars or exoplanets), but nonetheless produce a survey that is possible to model and understand for population statistics? Another subject was validation of stellar parameter estimates: How to know that we are getting good answers? As my loyal reader can anticipate, I was arguing that such tests ought to be made in the space of the data. Can they be?

n-Category Café Physics and 2-Groups

The founding vision of this blog, higher gauge theory, progresses. Three physicists have just brought out Exploring 2-Group Global Symmetries. Two of the authors have worked with Nathan Seiberg, a highly influential physicist, who, along with three others, had proposed in 2014 a program to study higher-form field symmetries in QFT (Generalized Global Symmetries), without apparently being aware that this idea had been considered before.

Eric Sharpe then published an article the following year, Notes on generalized global symmetries in QFT, explaining how much work had already been done along these lines:

The recent paper [1] proposed a more general class of symmetries that should be studied in quantum field theories: in addition to actions of ordinary groups, it proposed that we should also consider ‘groups’ of gauge fields and higher-form analogues. For example, Wilson lines can act as charged objects under such symmetries. By using various defects, the paper [1] described new characterizations of gauge theory phases.

Now, one can ask to what extent it is natural for n-forms as above to form a group. In particular, because of gauge symmetries, the group multiplication will not be associative in general, unless perhaps one restricts to suitable equivalence classes, which does not seem natural in general. A more natural understanding of such symmetries is in terms of weaker structures known as 2-groups and higher groups, in which associativity is weakened to hold only up to isomorphisms.

There are more 2-groups and higher groups than merely, ‘groups’ of gauge fields and higher-form tensor potentials (connections on bundles and gerbes), and in this paper we will give examples of actions of such more general higher groups in quantum field theory and string theory. We will also propose an understanding of certain anomalies as transmutations of symmetry groups of classical theories into higher group actions on quantum theories.

In the new paper by Cordova, Dumitrescu, and Intriligator, along with references to some early work by John and Urs, Sharpe’s article is acknowledged, and yet taken to be unusual in considering more than discrete 2-groups:

The continuous 2-group symmetries analyzed in this paper have much in common with their discrete counterparts. Most discussions of 2-groups in the literature have focused on the discrete case (an exception is [17]). (p. 26)

Maybe it’s because they’ve largely encountered people extending Dijkgraaf-Witten theory, such as in Higher symmetry and gapped phases of gauge theories where the authors:

… study topological field theory describing gapped phases of gauge theories where the gauge symmetry is partially Higgsed and partially confined. The TQFT can be formulated both in the continuum and on the lattice and generalizes Dijkgraaf-Witten theory by replacing a finite group by a finite 2-group,

but then Sharpe had clearly stated that

The purpose of this paper is merely to link the recent work [1] to other work on 2-groups, to review a few highlights, and to provide a few hopefully new results, proposals, and applications,

and he refers to a great many items on higher gauge theory, much of it using continuous, and smooth, 2-groups and higher groups.

Still, even if communication channels aren’t as fast as they might be, things do seem to be moving.

February 14, 2018

Terence TaoPolymath15, third thread: computing and approximating H_t

This is the third “research” thread of the Polymath15 project to upper bound the de Bruijn-Newman constant {\Lambda}, continuing this previous thread. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

We are making progress on the following test problem: can one show that {H_t(x+iy) \neq 0} whenever {t = 0.4}, {x \geq 0}, and {y \geq 0.4}? This would imply that

\displaystyle  \Lambda \leq 0.4 + \frac{1}{2} (0.4)^2 = 0.48

which would be the first quantitative improvement over the de Bruijn bound of {\Lambda \leq 1/2} (or the Ki-Kim-Lee refinement of {\Lambda < 1/2}). Of course we can try to lower the two parameters of {0.4} later on in the project, but this seems as good a place to start as any. One could also potentially try to use finer analysis of dynamics of zeroes to improve the bound {\Lambda \leq 0.48} further, but this seems to be a less urgent task.

Probably the hardest case is {y=0.4}, as there is a good chance that one can then recover the {y>0.4} case by a suitable use of the argument principle. Here we appear to have a workable Riemann-Siegel type formula that gives a tractable approximation for {H_t}. To describe this formula, first note that in the {t=0} case we have

\displaystyle  H_0(z) = \frac{1}{8} \xi( \frac{1+iz}{2} )

and the Riemann-Siegel formula gives

\displaystyle  \xi(s) = \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(s/2) \sum_{n=1}^N \frac{1}{n^s}

\displaystyle  + \frac{s(s-1)}{2} \pi^{-(1-s)/2} \Gamma((1-s)/2) \sum_{m=1}^M \frac{1}{m^{1-s}}

\displaystyle  + \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(s/2) \frac{e^{-i\pi s} \Gamma(1-s)}{2\pi i} \int_{C_M} \frac{w^{s-1} e^{-Nw}}{e^w-1}\ dw

for any natural numbers {N,M}, where {C_M} is a contour from {+\infty} to {+\infty} that winds once anticlockwise around the zeroes {e^{2\pi im}, |m| \leq M} of {e^w-1} but does not wind around any other zeroes. A good choice of {N,M} to use here is

\displaystyle  N=M=\lfloor \sqrt{\mathrm{Im}(s)/2\pi}\rfloor = \lfloor \sqrt{\mathrm{Re}(z)/4\pi} \rfloor. \ \ \ \ \ (1)

In this case, a classical steepest descent computation (see wiki) yields the approximation

\displaystyle  \int_{C_M} \frac{w^{s-1} e^{-Nw}}{e^w-1}\ dw \approx - (2\pi i M)^{s-1} \Psi( \frac{s}{2\pi i M} - N )

where

\displaystyle  \Psi(\alpha) := 2\pi \frac{\cos \pi(\frac{1}{2}\alpha^2 - \alpha - \pi/8)}{\cos(\pi \alpha)} \exp( \frac{i\pi}{2} \alpha^2 - \frac{5\pi}{8} ).

Thus we have

\displaystyle  H_0(z) \approx A^{(0)} + B^{(0)} - C^{(0)}

where

\displaystyle  A^{(0)} := \frac{1}{8} \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(s/2) \sum_{n=1}^N \frac{1}{n^s}

\displaystyle  B^{(0)} := \frac{1}{8} \frac{s(s-1)}{2} \pi^{-(1-s)/2} \Gamma((1-s)/2) \sum_{m=1}^M \frac{1}{m^{1-s}}

\displaystyle  C^{(0)} := \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(s/2) \frac{e^{-i\pi s} \Gamma(1-s)}{2\pi i} (2\pi i M)^{s-1} \Psi( \frac{s}{2\pi i M} - N )

with {s := \frac{1+iz}{2}} and {N,M} given by (1).

Heuristically, we have derived (see wiki) the more general approximation

\displaystyle  H_t(z) \approx A + B - C

for {t>0} (and in particular for {t=0.4}), where

\displaystyle  A := \frac{1}{8} \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(s/2) \sum_{n=1}^N \frac{\exp(\frac{t}{16} \log^2 \frac{s+4}{2\pi n^2} )}{n^s}

\displaystyle  B := \frac{1}{8} \frac{s(s-1)}{2} \pi^{-(1-s)/2} \Gamma((1-s)/2) \sum_{m=1}^M \frac{\exp(\frac{t}{16} \log^2 \frac{5-s}{2\pi m^2} )}{m^{1-s}}

\displaystyle  C := \exp(-\frac{t \pi^2}{64}) C^{(0)}.

In practice it seems that the {C} term is negligible once the real part {x} of {z} is moderately large, so one also has the approximation

\displaystyle  H_t(z) \approx A + B.

For large {x}, and for fixed {t,y>0}, e.g. {t=y=0.4}, the sums {A,B} converge fairly quickly (in fact the situation seems to be significantly better here than the much more intensively studied {t=0} case), and we expect the first term

\displaystyle  B_0 := \frac{1}{8} \frac{s(s-1)}{2} \pi^{-(1-s)/2} \Gamma((1-s)/2) \exp( \frac{t}{16} \log^2 \frac{5-s}{2\pi} )

of the {B} series to dominate. Indeed, analytically we know that {\frac{A+B-C}{B_0} \rightarrow 1} (or {\frac{A+B}{B_0} \rightarrow 1}) as {x \rightarrow \infty} (holding {y} fixed), and it should also be provable that {\frac{H_t}{B_0} \rightarrow 1} as well. Numerically with {t=y=0.4}, it seems in fact that {\frac{A+B-C}{B_0}} (or {\frac{A+B}{B_0}}) stay within a distance of about {1/2} of {1} once {x} is moderately large (e.g. {x \geq 2 \times 10^5}). This raises the hope that one can solve the toy problem of showing {H_t(x+iy) \neq 0} for {t=y=0.4} by numerically controlling {H_t(x+iy) / B_0} for small {x} (e.g. {x \leq 2 \times 10^5}), numerically controlling {(A+B)/B_0} and analytically bounding the error {(H_t - A - B)/B_0} for medium {x} (e.g. {2 \times 10^5 \leq x \leq 10^7}), and analytically bounding both {(A+B)/B_0} and {(H_t-A-B)/B_0} for large {x} (e.g. {x \geq 10^7}). (These numbers {2 \times 10^5} and {10^7} are arbitrarily chosen here and may end up being optimised to something else as the computations become clearer.)

Thus, we now have four largely independent tasks (for suitable ranges of “small”, “medium”, and “large” {x}):

  1. Numerically computing {H_t(x+iy) / B_0} for small {x} (with enough accuracy to verify that there are no zeroes)
  2. Numerically computing {(A+B)/B_0} for medium {x} (with enough accuracy to keep it away from zero)
  3. Analytically bounding {(A+B)/B_0} for large {x} (with enough accuracy to keep it away from zero); and
  4. Analytically bounding {(H_t - A - B)/B_0} for medium and large {x} (with a bound that is better than the bound away from zero in the previous two tasks).

Note that tasks 2 and 3 do not directly require any further understanding of the function {H_t}.
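Since tasks 2 and 3 involve only the explicit sums {A}, {B} and the leading term {B_0}, they are easy to experiment with directly. The following short mpmath sketch (my own toy evaluation, not the project code discussed below) computes {(A+B)/B_0} straight from the displayed formulas, with {t=y=0.4}; the common prefactor of {B} and {B_0} cancels in the ratio, so the astronomically small Gamma factors cause no numerical trouble.

```python
# Toy evaluation of (A+B)/B_0 from the formulas above (t = y = 0.4 by default).
# Uses mpmath for the complex Gamma function and complex logarithms.
from mpmath import mp, mpc, pi, gamma, exp, log, sqrt, floor, power

mp.dps = 30  # working precision (decimal digits)

def A_plus_B_over_B0(x, y=0.4, t=0.4):
    z = mpc(x, y)
    s = (1 + mpc(0, 1) * z) / 2
    N = M = int(floor(sqrt(z.real / (4 * pi))))          # the choice (1)
    pref_A = (s * (s - 1) / 2) * power(pi, -s / 2) * gamma(s / 2) / 8
    pref_B = (s * (s - 1) / 2) * power(pi, -(1 - s) / 2) * gamma((1 - s) / 2) / 8
    A = pref_A * sum(exp((t / 16) * log((s + 4) / (2 * pi * n**2))**2) / power(n, s)
                     for n in range(1, N + 1))
    B = pref_B * sum(exp((t / 16) * log((5 - s) / (2 * pi * m**2))**2) / power(m, 1 - s)
                     for m in range(1, M + 1))
    B0 = pref_B * exp((t / 16) * log((5 - s) / (2 * pi))**2)
    return (A + B) / B0

print(A_plus_B_over_B0(10**4))   # expected to stay well away from zero
```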

Below we will give a progress report on the numeric and analytic sides of these tasks.

— 1. Numerics report (contributed by Sujit Nair) —

There is some progress on the code side but not at the pace I was hoping. Here are a few things which happened (rather, mistakes which were taken care of).

  1. We got rid of code which wasn’t being used. For example, @dhjpolymath computed {H_t} based on an old version but only realized it after the fact.
  2. We implemented tests to catch human/numerical bugs before a computation starts. Again, we lost some numerical cycles but moving forward these can be avoided.
  3. David got set up on GitHub and he is able to compare his output (in C) with the Python code. That is helping a lot.

Two areas which were worked on were

  1. Computing {H_t} and zeroes for {t} around {0.4}
  2. Computing quantities like {(A+B-C)/B_0}, {(A+B)/B_0}, {C/B_0}, etc. with the goal of understanding the zero free regions.

Some observations for {t=0.4}, {y=0.4}, {x \in ( 10^4, 10^7)} include:

  • {(A+B) / B_0} does seem to avoid the negative real axis
  • {|(A+B) / B_0| > 0.4} (based on the oscillations and trends in the plots)
  • {|C/B_0|} seems to be settling around the {10^{-4}} range.

See the figure below. The top plot is on the complex plane and the bottom plot is the absolute value. The code to play with this is here.

— 2. Analysis report —

The Riemann-Siegel formula and some manipulations (see wiki) give {H_0 = A^{(0)} + B^{(0)} - \tilde C^{(0)}}, where

\displaystyle  A^{(0)} = \frac{2}{8} \sum_{n=1}^N \int_C \exp( \frac{s+4}{2} u - e^u - \frac{s}{2} \log(\pi n^2) )\ du

\displaystyle  - \frac{3}{8} \sum_{n=1}^N \int_C \exp( \frac{s+2}{2} u - e^u - \frac{s}{2} \log(\pi n^2) )\ du

\displaystyle  B^{(0)} = \frac{2}{8} \sum_{m=1}^M \int_{\overline{C}} \exp( \frac{5-s}{2} u - e^u - \frac{1-s}{2} \log(\pi m^2) )\ du

\displaystyle  - \frac{3}{8} \sum_{m=1}^M \int_C \exp( \frac{3-s}{2} u - e^u - \frac{1-s}{2} \log(\pi m^2) )\ du

\displaystyle  \tilde C^{(0)} := -\frac{2}{8} \sum_{n=0}^\infty \frac{e^{-i\pi s/2} e^{i\pi s n}}{2^s \pi^{1/2}} \int_{\overline{C}} \int_{C_M} \frac{w^{s-1} e^{-Nw}}{e^w-1} \exp( \frac{5-s}{2} u - e^u)\ du dw

\displaystyle  +\frac{3}{8} \sum_{n=0}^\infty \frac{e^{-i\pi s/2} e^{i\pi s n}}{2^s \pi^{1/2}} \int_{\overline{C}} \int_{C_M} \frac{w^{s-1} e^{-Nw}}{e^w-1} \exp( \frac{3-s}{2} u - e^u)\ du dw

where {C} is a contour that goes from {+i\infty} to {+\infty} staying a bounded distance away from the upper imaginary and right real axes, and {\overline{C}} is the complex conjugate of {C}. (In each of these sums, it is the first term that should dominate, with the second one being about {O(1/x)} as large.) One can then evolve by the heat flow to obtain {H_t = \tilde A + \tilde B - \tilde C}, where

\displaystyle  \tilde A := \frac{2}{8} \sum_{n=1}^N \int_C \exp( \frac{s+4}{2} u - e^u - \frac{s}{2} \log(\pi n^2) + \frac{t}{16} (u - \log(\pi n^2))^2)\ du

\displaystyle  - \frac{3}{8} \sum_{n=1}^N \int_C \exp( \frac{s+2}{2} u - e^u - \frac{s}{2} \log(\pi n^2) + \frac{t}{16} (u - \log(\pi n^2))^2)\ du

\displaystyle  \tilde B := \frac{2}{8} \sum_{m=1}^M \int_{\overline{C}} \exp( \frac{5-s}{2} u - e^u - \frac{1-s}{2} \log(\pi m^2) + \frac{t}{16} (u - \log(\pi m^2))^2)\ du

\displaystyle  - \frac{3}{8} \sum_{m=1}^M \int_C \exp( \frac{3-s}{2} u - e^u - \frac{1-s}{2} \log(\pi m^2) + \frac{t}{16} (u - \log(\pi m^2))^2)\ du

\displaystyle  \tilde C := -\frac{2}{8} \sum_{n=0}^\infty \frac{e^{-i\pi s/2} e^{i\pi s n}}{2^s \pi^{1/2}} \int_{\overline{C}} \int_{C_M}

\displaystyle \frac{w^{s-1} e^{-Nw}}{e^w-1} \exp( \frac{5-s}{2} u - e^u + \frac{t}{4} (i \pi(n-1/2) + \log \frac{w}{2\sqrt{\pi}} - \frac{u}{2})^2) \ du dw

\displaystyle  +\frac{3}{8} \sum_{n=0}^\infty \frac{e^{-i\pi s/2} e^{i\pi s n}}{2^s \pi^{1/2}} \int_{\overline{C}} \int_{C_M}

\displaystyle \frac{w^{s-1} e^{-Nw}}{e^w-1} \exp( \frac{3-s}{2} u - e^u + \frac{t}{4} (i \pi(n-1/2) + \log \frac{w}{2\sqrt{\pi}} - \frac{u}{2})^2)\ du dw.

Steepest descent heuristics then predict that {\tilde A \approx A}, {\tilde B \approx B}, and {\tilde C \approx C}. For the purposes of this project, we will need effective error estimates here, with explicit error terms.

A start has been made towards this goal at this wiki page. Firstly, there is an “effective Laplace method” lemma that gives effective bounds on integrals of the form {\int_I e^{\phi(x)} \psi(x)\ dx} if the real part of {\phi(x)} is either monotone with large derivative, or has a critical point and is decreasing on both sides of that critical point. In principle, all one has to do is manipulate expressions such as {\tilde A - A}, {\tilde B - B}, {\tilde C - C} by change of variables, contour shifting and integration by parts until they are of a form to which the above lemma can be profitably applied. As one may imagine, though, the computations are messy, particularly for the {\tilde C} term. As a warm-up, I have begun by trying to estimate integrals of the form

\displaystyle  \int_C \exp( s (1+u-e^u) + \frac{t}{16} (u+b)^2 )\ du

for smallish complex numbers {b}, as integrals of this sort appear in the formulae for {\tilde A}, {\tilde B}, {\tilde C}. At the time of writing, there are effective bounds for the {b=0} case, and I am currently working on extending them to the {b \neq 0} case, which should give enough control to approximate {\tilde A - A} and {\tilde B-B}. The most complicated task will be that of upper bounding {\tilde C}, but that also looks eventually doable.
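For readers who want to experiment, here is a rough numerical sanity check of integrals of this shape (it is emphatically not an effective bound, and it is not part of the project's code). The contour used below, a vertical leg on the line Re u = 1 followed by a horizontal leg on Im u = 1, and the truncation parameters are my own illustrative choices; the evaluation is only expected to behave well for modest real {s > 0} and {t > 0}.

import numpy as np

def warmup_integral(s, t, b=0.0, Y=60.0, X=12.0, n=200000):
    # Toy evaluation of int_C exp( s(1+u-e^u) + (t/16)(u+b)^2 ) du along a
    # truncated contour: down the line Re u = 1 from 1+iY to 1+i, then out
    # along Im u = 1 from 1+i to X+i.  Illustrative only; reliable just for
    # modest real s > 0 and t > 0, with no rigorous error control.
    def f(u):
        return np.exp(s * (1.0 + u - np.exp(u)) + (t / 16.0) * (u + b) ** 2)

    # Vertical leg, traversed downward: u = 1 + i*y, du = i*dy, y from Y to 1.
    y = np.linspace(1.0, Y, n)
    leg1 = -1j * np.trapz(f(1.0 + 1j * y), y)

    # Horizontal leg: u = x + i, du = dx, x from 1 to X.
    x = np.linspace(1.0, X, n)
    leg2 = np.trapz(f(x + 1j), x)

    return leg1 + leg2

print(warmup_integral(s=2.0, t=0.4, b=0.1 + 0.05j))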

February 13, 2018

Doug NatelsonRice Cleanroom position

In case someone out there is interested, Rice is hiring a cleanroom research scientist.  The official job listing is here.  To be clear:  This is not a soft money position.

The Cleanroom Facility at Rice University is a shared equipment facility for enabling micro- and nanofabrication research in the Houston metropolitan area. Current equipment includes deposition, lithography, etching and a number of characterization tools. This facility attracts users from the George R. Brown School of Engineering and the Wiess School of Natural Sciences, as well as regional universities and corporations whose research programs require advanced fabrication and patterning at the micro- and nanoscale. A new state-of-the-art facility is currently being constructed and is expected to be in operation in summer 2018. Additionally, with new initiatives in Molecular Nanotechnology, the Rice University cleanroom is poised to see significant growth in the next 5-10 years. This job announcement seeks a motivated individual who can lead, manage, teach and grow this advanced facility.

The job responsibilities of a Cleanroom Research Scientist include conducting periodic and scheduled maintenance and safety checks of equipment and running qualification and calibration recipes. The incumbent will be expected to maintain the highest safety standards, author and update standard operating procedures (SOPs), and maintain and calibrate processes for all equipment. The Cleanroom Research Scientist will help facilitate new equipment installation, contact vendors and manufacturers, and work in tandem with them to resolve equipment issues in a timely and safe manner. Efficient inventory management of parts, chemicals and supplies will be required. The Cleanroom Research Scientist will also oversee one-to-one training of users. Additionally, the incumbent will help develop cleanroom laboratory short courses that provide lectures to small groups of students. The incumbent will also coordinate with technical staff members in the Rice SEA (Shared Equipment Authority).



February 12, 2018

BackreactionBook Update: First Review!

The final proofs are done and review copies were sent out. One of the happy recipients, Emmanuel Rayner, read the book within two days, and so we have a first review on Goodreads now. That’s not counting the two-star review by someone who I am very sure hasn’t read the book, because he “reviewed” it before there were review copies. Tells you all you need to know about online ratings. The German

February 11, 2018

John BaezA Categorical Semantics for Causal Structure

 

The school for Applied Category Theory 2018 is up and running! Students are blogging about papers! The first blog article is about a diagrammatic method for studying causality:

• Joseph Moeller and Dmitry Vagner, A categorical semantics for causal structure, The n-Category Café, 22 January 2018.

Make sure to read the whole blog conversation, since it helps a lot. People were confused about some things at first.

Joseph Moeller is a grad student at U.C. Riverside working with me and a company called Metron Scientific Solutions on “network models”—a framework for designing networks, which the Coast Guard is already interested in using for their search and rescue missions:

• John C. Baez, John Foley, Joseph Moeller and Blake S. Pollard, Network Models. (Blog article here.)

Dmitry Vagner is a grad student at Duke who is very enthusiastic about category theory. Dmitry has worked with David Spivak and Eugene Lerman on open dynamical systems and the operad of wiring diagrams.

It’s great to see these students digging into the wonderful world of category theory and its applications.

To discuss this stuff, please go to The n-Category Café.

John BaezLinguistics Using Category Theory

 

Now students in the Applied Category Theory 2018 school are reading about categories applied to linguistics. Read the blog article here for more:

• Jade Master and Cory Griffith, Linguistics using category theory, The n-Category Café, 6 February 2018.

This was written by my grad student Jade Master along with Cory Griffith, an undergrad at Stanford. Jade is currently working with me on the semantics of open Petri nets.

What’s the basic idea of this linguistics and category theory stuff? I don’t know much about this, but I can say a bit.

Since category theory is great for understanding the semantics of programming languages, it makes sense to try it for human languages, even though they’re much harder. The first serious attempt I know was by Jim Lambek, who introduced pregroup grammars in 1958:

• Joachim Lambek, The mathematics of sentence structure, Amer. Math. Monthly 65 (1958), 154–170.

In this article he hid the connection to category theory. But when you start diagramming sentences or phrases using his grammar, as below, you get planar string diagrams as shown above. So it’s not surprising—if you’re in the know—that he’s secretly using monoidal categories where every object has a right dual and, separately, a left dual.

This fact is just barely mentioned in the Wikipedia article:

Pregroup grammar.

but it’s explained in more detail here:

• A. Preller and J. Lambek, Free compact 2-categories, Mathematical Structures in Computer Science 17 (2005), 309-340.

This stuff is hugely fun, so I’m wondering why I never looked into it before! When I talked to Lambek, who is sadly no longer with us, it was mainly about his theories relating particle physics to quaternions.

Recently Mehrnoosh Sadrzadeh and Bob Coecke have taken up Lambek’s ideas, relating them to the category of finite-dimensional vector spaces. Choosing a monoidal functor from a pregroup grammar to this category allows one to study linguistics using linear algebra! This simplifies things, perhaps a bit too much—but it makes it easy to do massive computations, which is very popular in this age of “big data” and machine learning.

It also sets up a weird analogy between linguistics and quantum mechanics, which I’m a bit suspicious of. While the category of finite-dimensional vector spaces with its usual tensor product is monoidal, and has duals, it’s symmetric, so the difference between writing a word to the left of another and writing it to the right of another gets washed out! I think instead of using vector spaces one should use modules of some noncommutative Hopf algebra, or something like that. Hmm… I should talk to those folks.

To discuss this, please visit The n-Category Café, since there’s a nice conversation going on there and I don’t want to split it. There has also been a conversation on Google+, and I’ll quote some of it here, so you don’t have to run all over the internet.

Noam Zeilberger wrote:

You might have been simplifying things for the post, but a small comment anyways: what Lambek introduced in his original paper are these days usually called “Lambek grammars”, and not exactly the same thing as what Lambek later introduced as “pregroup grammars”. Lambek grammars actually correspond to monoidal biclosed categories in disguise (i.e., based on left/right division rather than left/right duals), and may also be considered without a unit (as in his original paper). (I only have a passing familiarity with this stuff, though, and am not very clear on the difference in linguistic expressivity between grammars based on division vs grammars based on duals.)

Noam Zeilberger wrote:

If you haven’t seen it before, you might also like Lambek’s followup paper “On the calculus of syntactic types”, which generalized his original calculus by dropping associativity (so that sentences are viewed as trees rather than strings). Here are the first few paragraphs from the introduction:

…and here is a bit near the end of the 1961 paper, where he made explicit how derivations in the (original) associative calculus can be interpreted as morphisms of a monoidal biclosed category:

John Baez wrote:

Noam Zeilberger wrote: “what Lambek introduced in his original paper are these days usually called “Lambek grammars”, and not exactly the same thing as what Lambek later introduced as “pregroup grammars”.”

Can you say what the difference is? I wasn’t simplifying things on purpose; I just don’t know this stuff. I think monoidal biclosed categories are great, and if someone wants to demand that the left or right duals be inverses, or that the category be a poset, I can live with that too…. though if I ever learned more linguistics, I might ask why those additional assumptions are reasonable. (Right now I have no idea how reasonable the whole approach is to begin with!)

Thanks for the links! I will read them in my enormous amounts of spare time. :-)

Noam Zeilberger wrote:

As I said it’s not clear to me what the linguistic motivations are, but the way I understand the difference between the original “Lambek” grammars and (later introduced by Lambek) pregroup grammars is that it is precisely analogous to the difference between a monoidal category with left/right residuals and a monoidal category with left/right duals. Lambek’s 1958 paper was building off the idea of “categorial grammar” introduced earlier by Ajdukiewicz and Bar-Hillel, where the basic way of combining types was left division A\B and right division B/A (with no product).

Noam Zeilberger wrote:

At least one seeming advantage of the original approach (without duals) is that it permits interpretations of the “semantics” of sentences/derivations in cartesian closed categories. So it’s in harmony with the approach of “Montague semantics” (mentioned by Richard Williamson over at the n-Cafe) where the meanings of natural language expressions are interpreted using lambda calculus. What I understand is that this is one of the reasons Lambek grammar started to become more popular in the 80s, following a paper by Van Benthem where he observed that such lambda terms denoting the meanings of expressions could be computed via “homomorphism” from syntactic derivations in Lambek grammar.

Jason Nichols wrote:

John Baez, as someone with a minimal understanding of set theory, lambda calculus, and information theory, what would you recommend as background reading to try to understand this stuff?

It’s really interesting, and looks relevant to work I do with NLP and even abstract syntax trees, but reading the papers and wiki pages, I feel like there’s a pretty big gap to cross between where I am and where I’d need to be to begin to understand this stuff.

John Baez wrote:

Jason Nichols: I suggest trying to read some of Lambek’s early papers, like this one:

• Joachim Lambek, The mathematics of sentence structure, Amer. Math. Monthly 65 (1958), 154–170.

(If you have access to the version at the American Mathematical Monthly, it’s better typeset than this free version.) I don’t think you need to understand category theory to follow them, at least not this first one. At least for starters, knowing category theory mainly makes it clear that the structures he’s trying to use are not arbitrary, but “mathematically natural”. I guess that as the subject develops further, people take more advantage of the category theory and it becomes more important to know it. But anyway, I recommend Lambek’s papers!

Borislav Iordanov wrote:

Lambek was an amazing teacher; I was lucky to have him in my undergrad. There is a small and very approachable book on his pregroup treatment of grammar that he wrote shortly before he passed away: “From Word to Sentence: a computational algebraic approach to grammar”. It’s plain algebra and very fun. Sadly it looks to be out of print on Amazon, but if you can find it, it’s well worth it.

Andreas Geisler wrote:

One immediate concern for me here is that this seems (don’t have the expertise to be sure) to repeat a very old mistake of linguistics, long abandoned:

Words do not have atomic meanings. They are not a part of some 1:1 lookup table.

The most likely scenario right now is that our brains store meaning as a continuously accumulating set of connections that ultimately are impacted by every instance of a form we’ve ever heard/seen.

So, you shall know a word by all the company you’ve ever seen it in.

Andreas Geisler wrote:

John Baez I am a linguist by training, you’re welcome to borrow my brain if you want. You just have to figure out the words to use to get my brain to index what you need, as I don’t know the category theory stuff at all.

It’s a question of interpretation. I am also a translator, so I might be of some small assistance there as well, but it’s not going to be easy either way, I am afraid.

John Baez wrote:

Andreas Geisler wrote: “I might be of some small assistance there as well, but it’s not going to be easy either way I am afraid.”

No, it wouldn’t. Alas, I don’t really have time to tackle linguistics myself. Mehrnoosh Sadrzadeh is seriously working on category theory and linguistics. She’s one of the people leading a team of students at this Applied Category Theory 2018 school. She’s the one who assigned this paper by Lambek, which 2 students blogged about. So she would be the one to talk to.

So, you shall know a word by all the company you’ve ever seen it in.

Yes, that quote appears in the blog article by the students, which my post here was merely an advertisement for.

February 10, 2018

Doug NatelsonThis week in the arxiv

Back when my blogging was young, I had a semi-regular posting of papers that caught my eye that week on the condensed matter part of the arxiv.  As I got busy doing many things, I'd let that fall by the wayside, but I'm going to try to restart it at some rate.  I generally haven't had the time to read these in any detail, and my comments should not be taken too seriously, but these jumped out at me.

arxiv:1802.01045 - Sangwan and Hersam; Electronic transport in two-dimensional materials
If you've been paying any attention to condensed matter and materials physics in the last 14 years, you've noticed a huge amount of work on genuinely two-dimensional materials, often exfoliated from the bulk as in the scotch tape method, or grown by chemical vapor deposition.  This looks like a nice review of many of the relevant issues, and contains lots of references for interested students to chase if they want to learn more.

arxiv:1802.01385 - Fröhlich; Chiral Anomaly, Topological Field Theory, and Novel States of Matter
While quite mathematical (relativistic field theory always has a certain intimidating quality, at least to me), this also looks like a reasonably pedagogical introduction to topological aspects of condensed matter.  This is not for the general reader, but I'm hopeful that if I put in the time and read it carefully, I will gain a better understanding of some of the topological discussions I hear these days about things like axion insulators and chiral anomalies.

arXiv:1802.01339 - Ugeda et al.; Observation of Topologically Protected States at Crystalline Phase Boundaries in Single-layer WSe2
arXiv:1802.02999 - Huang et al.; Emergence of Topologically Protected Helical States in Minimally Twisted Bilayer Graphene
arXiv:1802.02585 - Schindler et al.; Higher-Order Topology in Bismuth
Remember back when people didn't think about topology in the band structure of materials?  Seems like a million years ago, now that a whole lot of systems (often 2d materials or interfaces between materials) seem to show evidence of topologically special edge states.   These are three examples just this week of new measurements (all using scanning tunneling microscopy as part of the tool-set, to image edge states directly) reporting previously unobserved topological states at edges or surface features.





n-Category Café Linguistics Using Category Theory

guest post by Cory Griffith and Jade Master

Most recently, the Applied Category Theory Seminar took a step into linguistics by discussing the 2010 paper Mathematical Foundations for a Compositional Distributional Model of Meaning, by Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark.

Here is a summary and discussion of that paper.

In recent years, well-known advances in AI, such as the development of AlphaGo and the ongoing development of self-driving cars, have sparked interest in the general idea of machines examining and trying to understand complex data. In particular, a variety of accounts of successes in natural language processing (NLP) have reached wide audiences (see, for example, The Great AI Awakening).

One key tool for NLP practitioners is the concept of distributional semantics. There is a saying due to Firth that is so often repeated in NLP papers and presentations that even mentioning its ubiquity has become a cliche:

“You shall know a word by the company it keeps.”

The idea is that if we want to know if two words have similar meanings, we should examine the words they are used in conjunction with, and in some way measure how much overlap there is. While direct ancestry of this concept can be traced at least back to Wittgenstein, and the idea of characterizing an object by its relationship with other objects is one category theorists are already fond of, distributional semantics is distinguished by its essentially statistical methods. The variations are endless and complex, but in the cases relevant to our discussion, one starts with a corpus, a suitable way of determining what the context of a word is (simply being nearby, having a grammatical relationship, being in the same corpus at all, etc.) and ends up with a vector space in which the words in the corpus each specify a point. The distance between vectors (for an appropriate definition of distance) then corresponds to relationships in meaning, often in surprising ways. The creators of the GloVe algorithm give the example of a vector space in which king - man + woman = queen.
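To make the distributional picture concrete, here is a toy illustration in Python with invented three-dimensional vectors. Real embeddings such as GloVe's have hundreds of dimensions and are learned from corpus co-occurrence statistics, so the numbers below carry no linguistic content; they are chosen only so that the famous analogy comes out.

import numpy as np

def cosine(u, v):
    # Cosine similarity, a standard notion of "closeness in meaning" here.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Invented embeddings, purely for illustration.
vec = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "queen": np.array([0.8, 0.1, 0.9]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

target = vec["king"] - vec["man"] + vec["woman"]
best = max((w for w in vec if w != "king"), key=lambda w: cosine(vec[w], target))
print(best)  # with these made-up vectors, "queen" comes out nearest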

There is also a “top down,” relatively syntax oriented analysis of meaning called categorial grammar. Categorial grammar has no accepted formal definition, but the underlying philosophy, called the principle of compositionality, is this: a meaningful sentence is composed of parts, each of which itself has a meaning. To determine the meaning of the sentence as a whole, we may combine the meanings of the constituent parts according to rules which are specified by the syntax of the sentence. Mathematically, this amounts to constructing some algebraic structure which represents grammatical rules. When this algebraic structure is a category, we call it a grammar category.

The Paper

Preliminaries

Pregroups are the algebraic structure that this paper uses to model grammar. A pregroup P is a type of partially ordered monoid. Writing x \to y to specify that x \leq y in the order relation, we require the following additional property: for each p \in P, there exists a left adjoint p^l and a right adjoint p^r, such that p^l p \to 1 \to p p^l and p p^r \to 1 \to p^r p. Since pregroups are partial orders, we can regard them as categories. The monoid multiplication and adjoints then upgrade the category of a pregroup to a compact closed category. The equations referenced above are exactly the snake equations.

We can define a pregroup generated by a set X by freely adding adjoints, units and counits to the free monoid on X. Our grammar categories will be constructed as follows: take certain symbols, such as n for noun and s for sentence, to be primitive. We call these "word classes." Generate a pregroup from them. The morphisms in the resulting category represent "grammatical reductions" of strings of word classes, with a particular string being deemed "grammatical" if it reduces to the word class s. For example, construct the pregroup Preg(\{n,s\}) generated by n and s. A transitive verb can be thought of as accepting two nouns, one on the left and one on the right, and returning a sentence. Using the powerful graphical language for compact closed categories, we can represent this as

Using the adjunctions, we can turn the two inputs into outputs to get

Therefore the type of a verb is n^r s n^l. Multiplying this on the left and right by n allows us to apply the counits of n to reduce n \cdot (n^r s n^l) \cdot n to the type s, as witnessed by the corresponding string diagram.
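As a small aside on how such reductions can be mechanised, here is a toy type reducer (a sketch for illustration only, not something from the paper) that repeatedly cancels adjacent pairs of the form p^l p and p p^r. A real pregroup parser would have to search over the possible cancellations rather than greedily taking the first one.

def reduce_types(types):
    # Greedily cancel adjacent pairs  x^l x -> 1  and  x x^r -> 1.
    changed = True
    while changed:
        changed = False
        for i in range(len(types) - 1):
            a, b = types[i], types[i + 1]
            if a == b + "^l" or b == a + "^r":
                del types[i:i + 2]
                changed = True
                break
    return types

print(reduce_types(["n", "n^r", "s", "n^l", "n"]))  # -> ['s'], i.e. a grammatical sentence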

Let (\mathbf{FVect}, \otimes, \mathbb{R}) be the symmetric monoidal category of finite dimensional vector spaces and linear transformations with the standard tensor product. Since any vector space we use in our applications will always come equipped with a basis, these vector spaces are all endowed with an inner product. Note that \mathbf{FVect} has a compact closed structure. The unit is the diagonal

\begin{array}{cccc} \eta_l = \eta_r \colon & \mathbb{R} & \to & V \otimes V \\ & 1 & \mapsto & \sum_i \overrightarrow{e_i} \otimes \overrightarrow{e_i} \end{array}

and the counit is a linear extension of the inner product

\begin{array}{cccc} \epsilon^l = \epsilon^r \colon & V \otimes V & \to & \mathbb{R} \\ & \sum_{i,j} c_{i j} \vec{v_{i}} \otimes \vec{w_j} & \mapsto & \sum_{i,j} c_{i j} \langle \vec{v_i}, \vec{w_j} \rangle. \end{array}

The Model of Meaning

Let (P, \cdot) be a pregroup. The ingenious idea that the authors of this paper had was to combine categorial grammar with distributional semantics. We can rephrase their construction in more general terms by using a compact closed functor

F \colon (P, \cdot) \to (\mathbf{FVect}, \otimes, \mathbb{R}).

Unpacking this a bit, we assign each word class a vector space whose basis is a chosen finite set of context words. To each type reduction in P, we assign a linear transformation. Because F is strictly monoidal, a string of word classes p_1 p_2 \cdots p_n maps to a tensor product of vector spaces V_1 \otimes V_2 \otimes \cdots \otimes V_n.

To compute the meaning of a string of words you must:

  1. Assign to each word a string of symbols p_1 p_2 \cdots p_n according to the grammatical types of the word and your choice of pregroup formalism. This is nontrivial. For example, many nouns can also be used as adjectives.

  2. Compute the correlations between each word in your string and the context words of the chosen vector space (see the example below) to get a vector v_1 \otimes \cdots \otimes v_n \in V_1 \otimes \cdots \otimes V_n,

  3. choose a type reduction f \colon p_1 p_2 \cdots p_n \to q_1 q_2 \cdots q_n in your grammar category (there may not always be a unique type reduction) and,

  4. apply F(f) to your vector v_1 \otimes \cdots \otimes v_n.

  5. You now have a vector in whatever space you reduced to. This is the "meaning" of the string of words, according to your model.

This sweeps some things under the rug, because A. Preller proved that strict monoidal functors from a pregroup to \mathbf{FVect} actually force the relevant spaces to have dimension at most one. So for each word type, the best we can do is one context word. This is bad news, but the good news is that this problem disappears when more complicated grammar categories are used. In Lambek vs. Lambek, monoidal bi-closed categories are used, which allow for this functorial description. So even though we are not really dealing with a functor when the domain is a pregroup, it is a functor in spirit, and thinking of it this way will allow for generalization into more complicated models.

An Example

As before, we use the pregroup Preg(\{n,s\}). The nouns that we are interested in are

\{ Maria, John, Cynthia \}

These nouns form the basis vectors of our noun space. In the order they are listed, they can be represented as

\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.

The "sentence space" F(s) is taken to be a one dimensional space in which 0 corresponds to false and the basis vector 1_S corresponds to true. As before, transitive verbs have type n^r s n^l, so using our functor F, verbs will live in the vector space N \otimes S \otimes N. In particular, the verb "like" can be expressed uniquely as a linear combination of its basis elements. With knowledge of who likes who, we can encode this information into a matrix where the ij-th entry corresponds to the coefficient in front of v_i \otimes 1_S \otimes v_j. Specifically, we have

\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.

The ij-th entry is 1 if person i likes person j and 0 otherwise. To compute the meaning of the sentence "Maria likes Cynthia", you compute the matrix product

\begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = 1

This means that the sentence “Maria likes Cynthia” is true.
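For concreteness, the same computation can be done as plain linear algebra, for instance in numpy, with rows and columns ordered Maria, John, Cynthia as above:

import numpy as np

maria   = np.array([1, 0, 0])
john    = np.array([0, 1, 0])
cynthia = np.array([0, 0, 1])

# Entry (i, j) is 1 exactly when person i likes person j.
likes = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [1, 0, 1]])

print(maria @ likes @ cynthia)  # 1: "Maria likes Cynthia" is true
print(john @ likes @ cynthia)   # 0: "John likes Cynthia" is false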

Food for Thought

As we said above, this model does not always give a unique meaning to a string of words, because at various points there are choices that need to be made. For example, the phrase "squad helps dog bite victim" has a different meaning depending on whether you take "bite" to be a verb or a noun. Also, if you reduce "dog bite victim" before applying it to the verb, you will get a different meaning than if you reduce "squad helps dog" and apply it to the verb "bite". On the one hand, this is a good thing, because those sentences should have different meanings. On the other hand, the presence of choices makes it harder to use this model in a practical algorithm.

Some questions arose which we did not have a clear way to address. Tensor products of spaces of high dimension quickly achieve staggering dimensionality — can this be addressed? How would one actually fit empirical data into this model? The "likes" example, which required us to know exactly who likes who, illustrates the potentially inaccessible information that seems to be necessary to assign vectors to words in a way compatible with the formalism. Admittedly, this is a necessary consequence of the fact that the evaluation is of the truth or falsity of the statement, but the issue also arises in general cases. Can this be resolved? In the paper, the authors are concerned with determining the meaning of grammatical sentences (although we can just as easily use non-grammatical strings of words), so that the computed meaning is always a vector in the sentence space F(s). What are the useful choices of structure for the sentence space?

This paper was not without precedent — suggestions and models related to the concepts of this paper had been floating around beforehand, and could be helpful in understanding the development of the central ideas. For example, Aerts and Gabora proposed elaborating on vector space models of meaning, incidentally using tensors as part of an elaborate quantum mechanical framework. Notably, they claimed their formalism solved the "pet fish" problem: English speakers rate goldfish as very poor representatives of fish as a whole, and of pets as a whole, but consider goldfish to be excellent representatives of "pet fish." Existing descriptions of meaning in compositional terms struggled with this. In The Harmonic Mind, first published in 2005, Smolensky and Legendre argued for the use of tensor products in marrying linear algebra and formal grammar models of meaning. Mathematical Foundations for a Compositional Distributional Model of Meaning represents a crystallization of all this into a novel and exciting construction, which continues to be widely cited and discussed.

We would like to thank Martha Lewis, Brendan Fong, Nina Otter, and the other participants in the seminar.

February 09, 2018

Matt von HippelWhy Your Idea Is Bad

By A. Physicist

 

Your idea is bad…

 

…because it disagrees with precision electroweak measurements

…………………………………..with bounds from ATLAS and CMS

…………………………………..with the power spectrum of the CMB

…………………………………..with Eötvös experiments

…because it isn’t gauge invariant

………………………….Lorentz invariant

………………………….diffeomorphism invariant

………………………….background-independent, whatever that means

…because it violates unitarity

…………………………………locality

…………………………………causality

…………………………………observer-independence

…………………………………technical naturalness

…………………………………international treaties

…………………………………cosmic censorship

…because you screwed up the calculation

…because you didn’t actually do the calculation

…because I don’t understand the calculation

…because you predict too many magnetic monopoles

……………………………………too many proton decays

……………………………………too many primordial black holes

…………………………………..remnants, at all

…because it’s fine-tuned

…because it’s suspiciously finely-tuned

…because it’s finely tuned to be always outside of experimental bounds

…because you’re misunderstanding quantum mechanics

…………………………………………………………..black holes

………………………………………………………….effective field theory

…………………………………………………………..thermodynamics

…………………………………………………………..the scientific method

…because Condensed Matter would contribute more to Chinese GDP

…because the approximation you’re making is unjustified

…………………………………………………………………………is not valid

…………………………………………………………………………is wildly overoptimistic

………………………………………………………………………….is just kind of lazy

…because there isn’t a plausible UV completion

…because you care too much about the UV

…because it only works in polynomial time

…………………………………………..exponential time

…………………………………………..factorial time

…because even if it’s fast it requires more memory than any computer on Earth

…because it requires more bits of memory than atoms in the visible universe

…because it has no meaningful advantages over current methods

…because it has meaningful advantages over my own methods

…because it can’t just be that easy

…because it’s not the kind of idea that usually works

…because it’s not the kind of idea that usually works in my field

…because it isn’t canonical

…because it’s ugly

…because it’s baroque

…because it ain’t baroque, and thus shouldn’t be fixed

…because only a few people work on it

…because far too many people work on it

…because clearly it will only work for the first case

……………………………………………………………….the first two cases

……………………………………………………………….the first seven cases

……………………………………………………………….the cases you’ve published and no more

…because I know you’re wrong

…because I strongly suspect you’re wrong

…because I strongly suspect you’re wrong, but saying I know you’re wrong looks better on a grant application

…….in a blog post

…because I’m just really pessimistic about something like that ever actually working

…because I’d rather work on my own thing, that I’m much more optimistic about

…because if I’m clear about my reasons

……and what I know

…….and what I don’t

……….then I’ll convince you you’re wrong.

 

……….or maybe you’ll convince me?

 

February 08, 2018

Sean CarrollWhy Is There Something, Rather Than Nothing?

A good question!

Or is it?

I’ve talked before about the issue of why the universe exists at all (1, 2), but now I’ve had the opportunity to do a relatively careful job with it, courtesy of Eleanor Knox and Alastair Wilson. They are editing an upcoming volume, the Routledge Companion to the Philosophy of Physics, and asked me to contribute a chapter on this topic. Final edits aren’t done yet, but I’ve decided to put the draft on the arxiv:

Why Is There Something, Rather Than Nothing?
Sean M. Carroll

It seems natural to ask why the universe exists at all. Modern physics suggests that the universe can exist all by itself as a self-contained system, without anything external to create or sustain it. But there might not be an absolute answer to why it exists. I argue that any attempt to account for the existence of something rather than nothing must ultimately bottom out in a set of brute facts; the universe simply is, without ultimate cause or explanation.

As you can see, my basic tack hasn’t changed: this kind of question might be the kind of thing that doesn’t have a sensible answer. In our everyday lives, it makes sense to ask “why” this or that event occurs, but such questions have answers only because they are embedded in a larger explanatory context. In particular, the world of our everyday experience is an emergent approximation with an extremely strong arrow of time, one in which we can safely associate “causes” with subsequent “effects.” The universe, considered as all of reality (i.e. let’s include the multiverse, if any), isn’t like that. The right question to ask isn’t “Why did this happen?”, but “Could this have happened in accordance with the laws of physics?” As far as the universe and our current knowledge of the laws of physics are concerned, the answer is a resounding “Yes.” The demand for something more — a reason why the universe exists at all — is a relic piece of metaphysical baggage we would be better off discarding.

This perspective gets pushback from two different sides. On the one hand we have theists, who believe that they can answer why the universe exists, and the answer is God. As we all know, this raises the question of why God exists; but aha, say the theists, that’s different, because God necessarily exists, unlike the universe which could plausibly have not. The problem with that is that nothing exists necessarily, so the move is pretty obviously a cheat. I didn’t have a lot of room in the paper to discuss this in detail (in what after all was meant as a contribution to a volume on the philosophy of physics, not the philosophy of religion), but the basic idea is there. Whether or not you want to invoke God, you will be left with certain features of reality that have to be explained by “and that’s just the way it is.” (Theism could possibly offer a better account of the nature of reality than naturalism — that’s a different question — but it doesn’t let you wiggle out of positing some brute facts about what exists.)

The other side are those scientists who think that modern physics explains why the universe exists. It doesn’t! One purported answer — “because Nothing is unstable” — was never even supposed to explain why the universe exists; it was suggested by Frank Wilczek as a way of explaining why there is more matter than antimatter. But any such line of reasoning has to start by assuming a certain set of laws of physics in the first place. Why is there even a universe that obeys those laws? This, I argue, is not a question to which science is ever going to provide a snappy and convincing answer. The right response is “that’s just the way things are.” It’s up to us as a species to cultivate the intellectual maturity to accept that some questions don’t have the kinds of answers that are designed to make us feel satisfied.

BackreactionWhich problems make good research problems?

Scientists solve problems; that’s their job. But which problems are promising topics of research? This is the question I set out to answer in Lost in Math, at least concerning the foundations of physics. A first, rough classification of research problems can be made using Thomas Kuhn’s cycle of scientific theories. Kuhn’s cycle consists of a phase of “normal science

February 07, 2018

Tommaso DorigoRoberto Carlin To Lead CMS Experiment In 2019-20

Great news for the CMS experiment - and for Italy, and for my institution, Padova, where I coordinate accelerator-based physics research for INFN. Professor Roberto Carlin, a longtime member of the CMS experiment, where he has taken many important roles in the construction and operations of the experiment, and recently was deputy spokesperson, has now been elected spokesperson. This consolidates a "rule" which sees Italian physicists at the lead of the experiment every other term, after Tonelli (2010-12) and Camporesi (2014-16). 


read more


February 06, 2018

Terence TaoPolymath15, second thread: generalising the Riemann-Siegel approximate functional equation

This is the second “research” thread of the Polymath15 project to upper bound the de Bruijn-Newman constant {\Lambda}, continuing this previous thread. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

We now have the following proposition (see this page for a proof sketch) that looks like it can give a numerically feasible approach to bound {\Lambda}:

Proposition 1 Suppose that one has parameters {t_0, T, \varepsilon > 0} obeying the following properties:

  • All the zeroes of {H_0(x+iy)=0} with {0 \leq x \leq T} are real.
  • There are no zeroes {H_t(x+iy)=0} with {0 \leq t \leq t_0} in the region {\{ x+iy: x \geq T; 1-2t \geq y^2 \geq \varepsilon^2 + (T-x)^2 \}}.
  • There are no zeroes {H_{t_0}(x+iy)=0} with {x > T} and {y \geq \varepsilon}.

Then one has {\Lambda \leq t_0 + \frac{1}{2} \varepsilon^2}.

The first hypothesis is already known for {T} up to about {10^{12}} (we should find out exactly what we can reach here). Preliminary calculations suggest that we can obtain the third item provided that {t_0, \varepsilon \gg \frac{1}{\log T}}. The second hypothesis requires good numerical calculation for {H_t}, to which we now turn.

The initial definition of {H_t} is given by the formula

\displaystyle  H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du

where

\displaystyle  \Phi(u) := \sum_{n=1}^\infty (2\pi^2 n^4 e^{9u} - 3\pi n^2 e^{5u} ) \exp(-\pi n^2 e^{4u}).

This formula has proven numerically computable to acceptable error up until about the first hundred zeroes of {H_t}, but degrades after that, and so other exact or approximate formulae for {H_t} are needed. One possible exact formula that could be useful is

\displaystyle  H_t(z) = \frac{1}{2} (K_{t,\theta}(z) + \overline{K_{t,\theta}(\overline{z})})

where

\displaystyle  K_{t,\theta}(z) := \sum_{n=1}^\infty (2\pi^2 n^4 I_{t,\theta}(z-9i, \pi n^2) - 3\pi n^2I_{t,\theta}(z-5i, \pi n^2))

and

\displaystyle  I_{t,\theta}(b,\beta) := \int_{i\theta}^{i\theta+i\infty} \exp(tu^2 - \beta e^{4u} + ibu)\ du

and {-\pi/8 < \theta < \pi/8} can be chosen arbitrarily. We are still trying to see if this can be implemented numerically to give better accuracy than the previous formula.
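For concreteness, here is a crude quadrature sketch of the original defining formula for {H_t} (essentially the direct approach that degrades beyond the first hundred or so zeroes). The truncation of the sum and of the integral, and the sample count, are illustrative choices with no rigorous error control:

import numpy as np

def Phi(u, nmax=20):
    # Truncation of Phi(u) = sum_n (2 pi^2 n^4 e^{9u} - 3 pi n^2 e^{5u}) exp(-pi n^2 e^{4u}).
    u = np.asarray(u, dtype=float)
    total = np.zeros_like(u)
    for n in range(1, nmax + 1):
        total += (2 * np.pi**2 * n**4 * np.exp(9 * u)
                  - 3 * np.pi * n**2 * np.exp(5 * u)) * np.exp(-np.pi * n**2 * np.exp(4 * u))
    return total

def Ht(t, z, U=3.0, samples=200000):
    # H_t(z) = int_0^infty e^{t u^2} Phi(u) cos(z u) du; Phi decays so rapidly
    # that truncating the integral at u = U loses nothing in double precision.
    u = np.linspace(0.0, U, samples)
    return np.trapz(np.exp(t * u**2) * Phi(u) * np.cos(z * u), u)

# In this normalisation the real zeroes of H_0 should sit at twice the ordinates
# of the zeta zeroes, so a sign change is expected near x = 28.27:
print(Ht(0.0, 28.0), Ht(0.0, 28.5))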

It seems particularly promising to develop a generalisation of the Riemann-Siegel approximate functional equation for {H_0}. Preliminary computations suggest in particular that we have the approximation

\displaystyle  H_t(x+iy) \approx \frac{1}{4} (F_t(\frac{1+ix-y}{2}) + \overline{F_t(\frac{1+ix+y}{2})})

where

\displaystyle  F_t(s) := \pi^{-s/2} \Gamma(\frac{s+4}{2}) \sum_{n \leq \sqrt{\mathrm{Im}(s)/2\pi}} \frac{\exp( \frac{t}{16} \log^2 \frac{s+4}{2\pi n^2})}{n^s}.

Some very preliminary numerics suggest that this formula is reasonably accurate even for moderate values of {x}, though further numerical verification is needed. As a proof of concept, one could take this approximation as exact for the purposes of seeing what ranges of {T} one can feasibly compute with (and for extremely large values of {T}, we will presumably have to introduce some version of the Odlyzko-Schönhage algorithm). Of course, to obtain a rigorous result, we will eventually need a rigorous version of this formula with explicit error bounds. It may also be necessary to add more terms to the approximation to reduce the size of the error.
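For experimentation, here is a quick and entirely non-rigorous implementation of this approximation in Python, using mpmath for the complex Gamma factor; the precision setting and the handling of the empty sum for small {x} are arbitrary choices of mine:

import mpmath as mp

mp.mp.dps = 30  # working precision; an arbitrary choice

def F(t, s):
    # F_t(s) = pi^{-s/2} Gamma((s+4)/2) sum_{n <= sqrt(Im(s)/(2 pi))} exp((t/16) log^2((s+4)/(2 pi n^2))) / n^s
    N = int(mp.sqrt(mp.im(s) / (2 * mp.pi)))  # the sum is empty (F = 0) when Im(s) < 2*pi
    total = mp.mpc(0)
    for n in range(1, N + 1):
        total += mp.exp((t / 16) * mp.log((s + 4) / (2 * mp.pi * n**2)) ** 2) / mp.power(n, s)
    return mp.pi ** (-s / 2) * mp.gamma((s + 4) / 2) * total

def Ht_approx(t, x, y=0.0):
    # H_t(x+iy) ~ (1/4) ( F_t((1+ix-y)/2) + conj( F_t((1+ix+y)/2) ) ), for x > 0
    return (F(t, mp.mpc((1 - y + 1j * x) / 2)) + mp.conj(F(t, mp.mpc((1 + y + 1j * x) / 2)))) / 4

print(Ht_approx(0.0, 1000.0))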

Sujit Nair has kindly summarised for me the current state of affairs with the numerics as follows:

  • We need a real milestone and work backward to set up intermediate goals. This will definitely help bring in focus!
  • So far, we have some utilities to compute zeroes of {H_t} with a nonlinear solver which uses roots of {H_0} as an initial condition. The solver is a wrapper around MINPACK’s implementation of Powell’s method. There is some room for optimization. For example, we aren’t providing the solver with the analytical Jacobian, which would speed up the computation and increase accuracy.
  • We have some results in the output folder which contains the first 1000 roots of {H_t} for some small values of {t \in \{0.01, 0.1, 0.22\}}, etc. They need some more organization and visualization.

We have a decent initial start but we have some ways to go. Moving forward, here is my proposition for some areas of focus. We should expand and prioritize after some open discussion.

  1. Short term Optimize the existing framework and aim to have the first million zeros of {H_t} (for a reasonable range of {t}) and the corresponding plots. With better engineering practice and discipline, I am confident we can get to the few tens of millions range. Some things which will help include parallelization, iterative approaches (using zeroes of {H_t} to compute zeroes of {H_{t + \delta t}}; a toy sketch of this step is given after this list), etc.
  2. Medium term We need to explore better ways to represent the zeros and compute them. An analogy is the computation of Riemann zeroes up to height {T}: one computes the sign changes of {Z(t)} (page 119 of Edwards) and exploits the {\sqrt T} speed-up of the Riemann-Siegel formulation (over Euler-Maclaurin). For larger values of {j}, I am not sure the root-solver-based approach is going to work to understand the gaps between zeroes.
  3. Long term We also need a better understanding of the errors involved in the computation — truncation, hardware/software, etc.
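As flagged in the short-term item above, here is a toy sketch of the iterative root-tracking step. It assumes a real-valued evaluator Ht(t, x), such as the quadrature sketch earlier in this post, and the bracket width is an arbitrary illustrative choice:

from scipy.optimize import brentq

def track_zeros(zeros_t, t_new, width=0.5):
    # Refine known real zeroes of H_t into zeroes of H_{t_new} by bracketing
    # each old zero and looking for a sign change of H_{t_new} there.
    tracked = []
    for x0 in zeros_t:
        a, b = x0 - width, x0 + width
        if Ht(t_new, a) * Ht(t_new, b) < 0:
            tracked.append(brentq(lambda x: Ht(t_new, x), a, b))
    return tracked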

Matt StrasslerIn Memory of Joe Polchinski, the Brane Master

This week, the community of high-energy physicists — of those of us fascinated by particles, fields, strings, black holes, and the universe at large — is mourning the loss of one of the great theoretical physicists of our time, Joe Polchinski. It pains me deeply to write these words.

Everyone who knew him personally will miss his special qualities — his boyish grin, his slightly wicked sense of humor, his charming way of stopping mid-sentence to think deeply, his athleticism and friendly competitiveness. Everyone who knew his research will feel the absence of his particular form of genius, his exceptional insight, his unique combination of abilities, which I’ll try to sketch for you below. Those of us who were lucky enough to know him both personally and scientifically — well, we lose twice.


Polchinski — Joe, to all his colleagues — had one of those brains that works magic, and works magically. Scientific minds are as individual as personalities. Each physicist has a unique combination of talents and skills (and weaknesses); in modern lingo, each of us has a superpower or two. Rarely do you find two scientists who have the same ones.

Joe had several superpowers, and they were really strong. He had a tremendous knack for looking at old problems and seeing them in a new light, often overturning conventional wisdom or restating that wisdom in a new, clearer way. And he had prodigious technical ability, which allowed him to follow difficult calculations all the way to the end, on paths that would have deterred most of us.

One of the greatest privileges of my life was to work with Joe, not once but four times. I think I can best tell you a little about him, and about some of his greatest achievements, through the lens of that unforgettable experience.

[To my colleagues: this post was obviously written in trying circumstances, and it is certainly possible that my memory of distant events is foggy and in error.  I welcome any corrections that you might wish to suggest.]

Our papers between 1999 and 2006 were a sequence of sorts, aimed at understanding more fully the profound connection between quantum field theory — the language of particle physics — and string theory — best-known today as a candidate for a quantum theory of gravity. In each of those papers, as in many thousands of others written after 1995, Joe’s most influential contribution to physics played a central role. This was the discovery of objects known as “D-branes”, which he found in the context of string theory. (The term is a generalization of the word `membrane’.)

I can already hear the Lee Smolins and Peter Woits of the world screaming at me. ‘A discovery in string theory,’ some will shout, pounding the table, ‘an untested theory that’s not even wrong, should not be called a discovery in physics.’ Pay them no mind; they’re not even close, as you’ll see by the end of my remarks.

The Great D-scovery

In 1989, Joe, working with two young scientists, Jin Dai and Rob Leigh, was exploring some details of string theory, and carrying out a little mathematical exercise. Normally, in string theory, strings are little lines or loops that are free to move around anywhere they like, much like particles moving around in this room. But in some cases, particles aren’t in fact free to move around; you could, for instance, study particles that are trapped on the surface of a liquid, or trapped in a very thin whisker of metal. With strings, there can be a new type of trapping that particles can’t have — you could perhaps trap one end, or both ends, of the string within a surface, while allowing the middle of the string to move freely. The place where a string’s end may be trapped — whether a point, a line, a surface, or something more exotic in higher dimensions — is what we now call a “D-brane”.  [The `D’ arises for uninteresting technical reasons.]

Joe and his co-workers hit the jackpot, but they didn’t realize it yet. What they discovered, in retrospect, was that D-branes are an automatic feature of string theory. They’re not optional; you can’t choose to study string theories that don’t have them. And they aren’t just surfaces or lines that sit still. They’re physical objects that can roam the world. They have mass and create gravitational effects. They can move around and scatter off each other. They’re just as real, and just as important, as the strings themselves!


Fig. 1: D branes (in green) are physical objects on which a fundamental string (in red) can terminate.

It was as though Joe and his collaborators started off trying to understand why the chicken crossed the road, and ended up discovering the existence of bicycles, cars, trucks, buses, and jet aircraft.  It was that unexpected, and that rich.

And yet, nobody, not even Joe and his colleagues, quite realized what they’d done. Rob Leigh, Joe’s co-author, had the office next to mine for a couple of years, and we wrote five papers together between 1993 and 1995. Yet I think Rob mentioned his work on D-branes to me just once or twice, in passing, and never explained it to me in detail. Their paper had less than twenty citations as 1995 began.

In 1995 the understanding of string theory took a huge leap forward. That was the moment when it was realized that all five known types of string theory are different sides of the same die — that there’s really only one string theory.  A flood of papers appeared in which certain black holes, and generalizations of black holes — black strings, black surfaces, and the like — played a central role. The relations among these were fascinating, but often confusing.

And then, on October 5, 1995, a paper appeared that changed the whole discussion, forever. It was Joe, explaining D-branes to those of us who’d barely heard of his earlier work, and showing that many of these black holes, black strings and black surfaces were actually D-branes in disguise. His paper made everything clearer, simpler, and easier to calculate; it was an immediate hit. By the beginning of 1996 it had 50 citations; twelve months later, the citation count was approaching 300.

So what? Great for string theorists, but without any connection to experiment and the real world.  What good is it to the rest of us? Patience. I’m just getting to that.

What’s it Got to Do With Nature?

Our current understanding of the make-up and workings of the universe is in terms of particles. Material objects are made from atoms, themselves made from electrons orbiting a nucleus; and the nucleus is made from neutrons and protons. We learned in the 1970s that protons and neutrons are themselves made from particles called quarks and antiquarks and gluons — specifically, from a “sea” of gluons and a few quark/anti-quark pairs, within which sit three additional quarks with no anti-quark partner… often called the `valence quarks’.  We call protons and neutrons, and all other particles with three valence quarks, `baryons’.   (Note that there are no particles with just one valence quark, or two, or four — all you get is baryons, with three.)

In the 1950s and 1960s, physicists discovered short-lived particles much like protons and neutrons, with a similar sea, but which  contain one valence quark and one valence anti-quark. Particles of this type are referred to as “mesons”.  I’ve sketched a typical meson and a typical baryon in Figure 2.  (The simplest meson is called a “pion”; it’s the most common particle produced in the proton-proton collisions at the Large Hadron Collider.)

 


Fig. 2: Baryons (such as protons and neutrons) and mesons each contain a sea of gluons and quark-antiquark pairs; baryons have three unpaired “valence” quarks, while mesons have a valence quark and a valence anti-quark.  (What determines whether a quark is valence or sea involves subtle quantum effects, not discussed here.)

But the quark/gluon picture of mesons and baryons, back in the late 1960s, was just an idea, and it was in competition with a proposal that mesons are little strings. These are not, I hasten to add, the “theory of everything” strings that you learn about in Brian Greene’s books, which are a billion billion times smaller than a proton. In a “theory of everything” string theory, often all the types of particles of nature, including electrons, photons and Higgs bosons, are tiny tiny strings. What I’m talking about is a “theory of mesons” string theory, a much less ambitious idea, in which only the mesons are strings.  They’re much larger: just about as long as a proton is wide. That’s small by human standards, but immense compared to theory-of-everything strings.

Why did people think mesons were strings? Because there was experimental evidence for it! (Here’s another example.)  And that evidence didn’t go away after quarks were discovered. Instead, theoretical physicists gradually understood why quarks and gluons might produce mesons that behave a bit like strings. If you spin a meson fast enough (and this can happen by accident in experiments), its valence quark and anti-quark may separate, and the sea of objects between them forms what is called a “flux tube.” See Figure 3. [In certain superconductors, somewhat similar flux tubes can trap magnetic fields.] It’s kind of a thick string rather than a thin one, but still, it shares enough properties with a string in string theory that it can produce experimental results that are similar to string theory’s predictions.


Fig. 3: One reason mesons behave like strings in experiment is that a spinning meson acts like a thick string, with the valence quark and anti-quark at the two ends.

And so, from the mid-1970s onward, people were confident that quantum field theories like the one that describes quarks and gluons can create objects with stringy behavior. A number of physicists — including some of the most famous and respected ones — made a bolder, more ambitious claim: that quantum field theory and string theory are profoundly related, in some fundamental way. But they weren’t able to be precise about it; they had strong evidence, but it wasn’t ever entirely clear or convincing.

In particular, there was an important unresolved puzzle. If mesons are strings, then what are baryons? What are protons and neutrons, with their three valence quarks? What do they look like if you spin them quickly? The sketches people drew looked something like Figure 3. A baryon would perhaps become three joined flux tubes (with one possibly much longer than the other two), each with its own valence quark at the end.  In a stringy cartoon, that baryon would be three strings, each with a free end, with the strings attached to some sort of junction. This junction of three strings was called a “baryon vertex.”  If mesons are little strings, the fundamental objects in a string theory, what is the baryon vertex from the string theory point of view?!  Where is it hiding — what is it made of — in the mathematics of string theory?


Fig. 4: A fast-spinning baryon looks vaguely like the letter Y — three valence quarks connected by flux tubes to a “baryon vertex”.  A cartoon of how this would appear from a stringy viewpoint, analogous to Fig. 3, leads to a mystery: what, in string theory, is this vertex?!

[Experts: Notice that the vertex has nothing to do with the quarks. It’s a property of the sea — specifically, of the gluons. Thus, in a world with only gluons — a world whose strings naively form loops without ends — it must still be possible, with sufficient energy, to create a vertex-antivertex pair. Thus field theory predicts that these vertices must exist in closed string theories, though they are linearly confined.]


The baryon puzzle: what is a baryon from the string theory viewpoint?

No one knew. But isn’t it interesting that the most prominent feature of this vertex is that it is a location where a string’s end can be trapped?

Everything changed in the period 1997-2000. Following insights from many other physicists, and using D-branes as the essential tool, Juan Maldacena finally made the connection between quantum field theory and string theory precise. He was able to relate strings with gravity and extra dimensions, which you can read about in Brian Greene’s books, with the physics of particles in just three spatial dimensions, similar to those of the real world, with only non-gravitational forces.  It was soon clear that the most ambitious and radical thinking of the ’70s was correct — that almost every quantum field theory, with its particles and forces, can alternatively be viewed as a string theory. It’s a bit analogous to the way that a painting can be described in English or in Japanese — fields/particles and strings/gravity are, in this context, two very different languages for talking about exactly the same thing.

The saga of the baryon vertex took a turn in May 1998, when Ed Witten showed how a similar vertex appears in Maldacena’s examples. [Note added: I had forgotten that two days after Witten’s paper, David Gross and Hirosi Ooguri submitted a beautiful, wide-ranging paper, whose section on baryons contains many of the same ideas.] Not surprisingly, this vertex was a D-brane — specifically a D-particle, an object on which the strings extending from freely-moving quarks could end. It wasn’t yet quite satisfactory, because the gluons and quarks in Maldacena’s examples roam free and don’t form mesons or baryons. Correspondingly the baryon vertex isn’t really a physical object; if you make one, it quickly diffuses away into nothing. Nevertheless, Witten’s paper made it obvious what was going on. To the extent real-world mesons can be viewed as strings, real-world protons and neutrons can be viewed as strings attached to a D-brane.


The baryon puzzle, resolved.  A baryon is made from three strings and a point-like D-brane. [Note there is yet another viewpoint in which a baryon is something known as a skyrmion, a soliton made from meson fields — but that is an issue for another day.]

It didn’t take long for more realistic examples, with actual baryons, to be found by theorists. I don’t remember who found one first, but I do know that one of the earliest examples showed up in my first paper with Joe, in the year 2000.

 

Working with Joe

That project arose during my September 1999 visit to the KITP (Kavli Institute for Theoretical Physics) in Santa Barbara, where Joe was a faculty member. Some time before that I happened to have studied a field theory (called N=1*) that differed from Maldacena’s examples only slightly, but in which meson-like objects do form. One of the first talks I heard when I arrived at KITP was by Rob Myers, about a weird property of D-branes that he’d discovered. During that talk I made a connection between Myers’ observation and a feature of the N=1* field theory, and I had one of those “aha” moments that physicists live for. I suddenly knew what the string theory that describes the N=1*  field theory must look like.

But for me, the answer was bad news. To work out the details was clearly going to require a very difficult set of calculations, using aspects of string theory about which I knew almost nothing [non-holomorphic curved branes in high-dimensional curved geometry.] The best I could hope to do, if I worked alone, would be to write a conceptual paper with lots of pictures, and far more conjectures than demonstrable facts.

But I was at KITP.  Joe and I had had a good personal rapport for some years, and I knew that we found similar questions exciting. And Joe was the brane-master; he knew everything about D-branes. So I decided my best hope was to persuade Joe to join me. I engaged in a bit of persistent cajoling. Very fortunately for me, it paid off.

I went back to the east coast, and Joe and I went to work. Every week or two Joe would email some research notes with some preliminary calculations in string theory. They had such a high level of technical sophistication, and so few pedagogical details, that I felt like a child; I could barely understand anything he was doing. We made slow progress. Joe did an important warm-up calculation, but I found it really hard to follow. If the warm-up string theory calculation was so complex, had we any hope of solving the full problem?  Even Joe was a little concerned.

And then one day, I received a message that resounded with a triumphant cackle — a sort of “we got ’em!” that anyone who knew Joe will recognize. Through a spectacular trick, he’d figured out how to use his warm-up example to make the full problem easy! Instead of months of work ahead of us, we were essentially done.

From then on, it was great fun! Almost every week had the same pattern. I’d be thinking about a quantum field theory phenomenon that I knew about, one that should be visible from the string viewpoint — such as the baryon vertex. I knew enough about D-branes to develop a heuristic argument about how it should show up. I’d call Joe and tell him about it, and maybe send him a sketch. A few days later, a set of notes would arrive by email, containing a complete calculation verifying the phenomenon. Each calculation was unique, a little gem, involving a distinctive investigation of exotically-shaped D-branes sitting in a curved space. It was breathtaking to witness the speed with which Joe worked, the breadth and depth of his mathematical talent, and his unmatched understanding of these branes.

[Experts: It’s not instantly obvious that the N=1* theory has physical baryons, but it does; you have to choose the right vacuum, where the theory is partially Higgsed and partially confining. Then to infer, from Witten’s work, what the baryon vertex is, you have to understand brane crossings (which I knew about from Hanany-Witten days): Witten’s D5-brane baryon vertex operator creates a  physical baryon vertex in the form of a D3-brane 3-ball, whose boundary is an NS 5-brane 2-sphere located at a point in the usual three dimensions. And finally, a physical baryon is a vertex with n strings that are connected to nearby D5-brane 2-spheres. See chapter VI, sections B, C, and E, of our paper from 2000.]

Throughout our years of collaboration, it was always that way when we needed to go head-first into the equations; Joe inevitably left me in the dust, shaking my head in disbelief. That’s partly my weakness… I’m pretty average (for a physicist) when it comes to calculation. But a lot of it was Joe being so incredibly good at it.

Fortunately for me, the collaboration was still enjoyable, because I was almost always able to keep pace with Joe on the conceptual issues, sometimes running ahead of him. Among my favorite memories as a scientist are moments when I taught Joe something he didn’t know; he’d be silent for a few seconds, nodding rapidly, with an intent look — his eyes narrow and his mouth slightly open — as he absorbed the point.  “Uh-huh… uh-huh…”, he’d say.

But another side of Joe came out in our second paper. As we stood chatting in the KITP hallway, before we’d even decided exactly which question we were going to work on, Joe suddenly guessed the answer! And I couldn’t get him to explain which problem he’d solved, much less the solution, for several days!! It was quite disorienting.

This was another classic feature of Joe. Often he knew he’d found the answer to a puzzle (and he was almost always right), but he couldn’t say anything comprehensible about it until he’d had a few days to think and to turn his ideas into equations. During our collaboration, this happened several times. (I never said “Use your words, Joe…”, but perhaps I should have.) Somehow his mind was working in places that language doesn’t go, in ways that none of us outside his brain will ever understand. In him, there was something of an oracle.

Looking Toward The Horizon

Our interests gradually diverged after 2006; I focused on the Large Hadron Collider [also known as the Large D-brane Collider], while Joe, after some other explorations, ended up thinking about black hole horizons and the information paradox. But I enjoyed his work from afar, especially when, in 2012, Joe and three colleagues (Ahmed Almheiri, Don Marolf, and James Sully) blew apart the idea of black hole complementarity, widely hoped to be the solution to the paradox. [I explained this subject here, and also mentioned a talk Joe gave about it here.]  The wreckage is still smoldering, and the paradox remains.

Then Joe fell ill, and we began to lose him, at far too young an age.  One of his last gifts to us was his memoirs, which taught each of us something about him that we didn’t know.  Finally, on Friday last, he crossed the horizon of no return.  If there’s no firewall there, he knows it now.

What, we may already wonder, will Joe’s scientific legacy be, decades from now?  It’s difficult to foresee how a theorist’s work will be viewed a century hence; science changes in unexpected ways, and what seems unimportant now may become central in future… as was the path for D-branes themselves in the course of the 1990s.  For those of us working today, D-branes in string theory are clearly Joe’s most important discovery — though his contributions to our understanding of black holes, cosmic strings, and aspects of field theory aren’t soon, if ever, to be forgotten.  But who knows? By the year 2100, string theory may be the accepted theory of quantum gravity, or it may just be a little-known tool for the study of quantum fields.

Yet even if the latter were to be string theory’s fate, I still suspect it will be D-branes that Joe is remembered for. Because — as I’ve tried to make clear — they’re real.  Really real.  There’s one in every proton, one in every neutron. Our bodies contain them by the billion billion billions. For that insight, that elemental contribution to human knowledge, our descendants can blame Joseph Polchinski.

Thanks for everything, Joe.  We’ll miss you terribly.  You so often taught us new ways to look at the world — and even at ourselves.


 

February 04, 2018

John BaezPyrofex

Mike Stay is applying category theory to computation at a new startup called Pyrofex. And this startup has now entered a deal with RChain.

But let me explain why I’m interested. I’m interested in applied category theory… but this is special.

Mike Stay came to work with me at U.C. Riverside after getting a master’s in computer science at the University of Auckland, where he worked with Cristian Calude on algorithmic randomness. For example:

• Cristian S. Calude and Michael A. Stay, From Heisenberg to Gödel via Chaitin, International Journal of Theoretical Physics 44 (2005), 1053–1065.

• Cristian S. Calude and Michael A. Stay, Most programs stop quickly or never halt, Advances in Applied Mathematics, 40 (2008), 295–308.

It seems like ages ago, but I dimly remember looking at his application, seeing the title of the first of these two papers, and thinking “he’s either a crackpot, or I’m gonna like him”.

You see, the lure of relating Gödel’s theorem to Heisenberg’s uncertainty principle is fatally attractive to many people who don’t really understand either. I looked at the paper, decided he wasn’t a crackpot… and yes, it turned out I liked him.

(A vaguely similar thing happened with my student Chris Rogers, who’d written some papers applying techniques from general relativity to the study of crystals. As soon as I assured myself this stuff was for real, I knew I’d like him!)

Since Mike knew a lot about computer science, his presence at U. C. Riverside emboldened me to give a seminar on classical versus quantum computation. I used this as an excuse to learn the old stuff about the lambda-calculus and cartesian closed categories. When I first started, I thought the basic story would be obvious: people must be making up categories where the morphisms describe processes of computation.

But I soon learned I was wrong: people were making up categories where objects were data types, but the morphisms were equivalence classes of things going between data types—and this equivalence relation completely washed out the difference between, say, a program that actually computes 237 × 419 and a program that just prints out 99303, which happens to be the answer to that problem.

In other words, the actual process of computation was not visible in the category-theoretic framework. I decided that to make it visible, what we really need are 2-categories in which 2-morphisms are ‘processes of computation’. Or in the jargon: objects are types, morphisms are terms, and 2-morphisms are rewrites.
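To make the distinction concrete, here is a toy sketch in Python (my illustration, not anything from the papers below). The two programs denote the same value, so the usual 1-categorical semantics identifies them; the 2-categorical viewpoint instead remembers the rewrite sequence by which each term reaches that value.

# Two terms denoting the same morphism: both compute the number 99303.
def by_multiplying():
    return 237 * 419          # actually performs the multiplication

def by_constant():
    return 99303              # just returns the answer

# In the 1-categorical picture these are identified, since they determine
# the same value:
assert by_multiplying() == by_constant()

# In the 2-categorical picture we also keep track of the rewrites, i.e. the
# process of computation: one term needs an evaluation step, the other none.
rewrites_of_product = ["237 * 419", "99303"]
rewrites_of_constant = ["99303"]
print(" -> ".join(rewrites_of_product))
print(" -> ".join(rewrites_of_constant))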

It turned out people had already thought about this:

• Barnaby P. Hilken, Towards a proof theory of rewriting: the simply-typed 2λ-calculus, Theor. Comp. Sci. 170 (1996), 407–444.

• R. A. G. Seely, Weak adjointness in proof theory in Proc. Durham Conf. on Applications of Sheaves, Springer Lecture Notes in Mathematics 753, Springer, Berlin, 1979, pp. 697–701.

• R. A. G. Seely, Modeling computations: a 2-categorical framework, in Proc. Symposium on Logic in Computer Science 1987, Computer Society of the IEEE, pp. 65—71.

But I felt this viewpoint wasn’t nearly as popular as it should be. It should be very popular, at least among theoretical computer scientists, because it describes what’s actually going on in the lambda-calculus. If you read a big fat book on the lambda-calculus, like Barendregt’s The Lambda Calculus: Its Syntax and Semantics, you’ll see it spends a lot of time on reduction strategies: that is, ways of organizing the process of computation. All this is buried in the usual 1-categorical treatment. It’s living up at the 2-morphism level!

Mike basically agreed with me. We started by writing an introduction to the usual 1-categorical stuff, and how it shows up in many different fields:

• John Baez and Michael Stay, Physics, topology, logic and computation: a Rosetta Stone, in New Structures for Physics, ed. Bob Coecke, Lecture Notes in Physics vol. 813, Springer, Berlin, 2011, pp. 95–172.

For financial reasons he had to leave U. C. Riverside and take a job at Google. But he finished his Ph.D. at the University of Auckland, with Cristian Calude and me as co-advisors. And a large chunk of his thesis dealt with cartesian closed 2-categories and their generalizations suitable for quantum computation:

• Michael Stay, Compact closed bicategories, Theory and Applications of Categories 31 (2016), 755–798.

Great stuff! My students these days are building on this math and marching ahead.

I said Mike ‘basically’ agreed with me. He agreed that we need to go beyond the usual 1-categorical treatment to model the process of computation. But when it came to applying this idea to computer science, Mike wasn’t satisfied with thinking about the lambda-calculus. That’s an old model of computation: it’s okay for a single primitive computer, but not good for systems where different parts are sending messages to each other, like the internet, or even multiprocessing in a single computer. In other words, the lambda-calculus doesn’t really handle the pressing issues of concurrency and distributed computation.

So, Mike wanted to think about more modern formalisms for computation, like the pi-calculus, using 2-categories.

He left Google and wrote some papers with Greg Meredith on these ideas, for example:

• Michael Stay and Lucius Gregory Meredith, Higher category models of the pi-calculus.

• Michael Stay and Lucius Gregory Meredith, Representing operational semantics with enriched Lawvere theories.

The second one takes a key step: moving away from 2-categories to graph-enriched categories, which are simpler and perhaps better.

Then, after various twists and turns, he started a company called Pyrofex with Nash Foster. Or perhaps I should say Foster started a company with Mike, since Foster is the real bigshot of the two. Here’s what their webpage says:

Hello—

My name is Nash Foster, and together with my friend and colleague Mike Stay, I recently founded a company called Pyrofex. We founded our company for one simple reason: we love to build large-scale distributed systems that are always reliable and secure and we wanted to help users like yourself do it more easily.

When Mike and I founded the company, we felt strongly that several key advances in programming language theory would ease the development of every day large-scale systems. However, we were not sure exactly how to expose the APIs to users or how to build the tool-sets. Grand visions compete with practical necessity, so we spent months talking to users just like you to discover what kinds of things you were most interested in. We spent many hours at white-boards, checking our work against the best CS theory we know. And, of course, we have enjoyed many long days of hacking in the hopes that we can produce something new and useful, for you.

I think this is the most exciting time in history to be a programmer (and I wrote my first program in the early 1980’s). The technologies available today are varied and compelling and exciting. I hope that you’ll be as excited as I am about some of the ideas we discovered while building our software.

And on January 8th, 2018, their company entered into an arrangement with Greg Meredith’s company Rchain. Below I’ll quote part of an announcement. I don’t know much about this stuff—at least, not yet. But I’m happy to see some good ideas getting applied in the real world, and especially happy to see Mike doing it.

The RChain Cooperative & Pyrofex Corporation announce Strategic Partnership

The RChain Cooperative and Pyrofex Corporation today announced strategically important service contracts and an equity investment intended to deliver several mutually beneficial blockchain solutions. RChain will acquire 1.1 million shares of Pyrofex Common Stock as a strategic investment. The two companies will ink separate service contracts to reinforce their existing relationship and help to align their business interests.

Pyrofex will develop critical tools and platform components necessary for the long-term success of the RChain platform. These tools are designed to leverage RChain’s unique blockchain environment and make blockchain development simpler, faster, and more effective than ever before. Under these agreements, Pyrofex will develop the world’s first decentralized IDE for writing blockchain smart contracts on the RChain blockchain.

Pyrofex also commits to continuing the core development of RChain’s blockchain platform and to organizing RChain’s global developer events and conferences.

Comments on the News

“We’re thrilled to have an opportunity to strengthen our relationship with the RChain Cooperative in 2018. Their commitment to open-source development mirrors our own corporate values. It’s a pleasure to have such a close relationship with a vibrant open-source community. I’ve rarely seen the kind of excitement the Coop’s members share and we look forward to delivering some great new technology this year.” — Nash E. Foster, Cofounder & CEO, Pyrofex Corp.

“Intuitive development tools are important for us and the blockchain ecosystem as a whole; we’re incredibly glad Pyrofex intends to launch their tools on RChain first. But, Ethereum has been a huge supporter of RChain and we’re pleased that Pyrofex intends to support Solidity developers as well. Having tools that will make it possible for developers to migrate smart contracts between blockchains is going to create tremendous possibilities.” — Lucius Greg Meredith, President, RChain Cooperative

Background

Pyrofex is a software development company co-founded by Dr. Michael Stay, PhD and Nash Foster in 2016. Dr. Stay and Greg Meredith are long-time colleagues and collaborators whose mutual research efforts form the mathematical foundations of RChain’s technology. One example of the work that Greg and Mike have collaborated on is the work on the LADL (Logic as Distributed Law) algorithm. You can watch Dr. Stay present the latest research from the RChain Developers retreat.

Pyrofex and its development team should be familiar to those who follow the RChain Cooperative. They currently employ 14 full-time and several part-time developers dedicated to RChain platform development. Pyrofex CEO Nash Foster and Lead Project Manager Medha Parlikar have helped grow RChain’s development team to an impressive 20+ Core devs, with plans to double by mid-2018. The team now includes multiple PhDs, ex-Googlers, and other world-class talent.

Every Wednesday, you will find Medha on our debrief updating the community with the latest developments in RChain. Here she is announcing the recent node.hello release along with a demo from core developer Chris Kirkwood-Watts.

The working relationship between the RChain Cooperative and Pyrofex has gone so well that the Board of Directors and the community at large have supported Pyrofex’s proposal to develop Cryptofex, the much needed developer tool kit for the decentralized world.

The RChain Cooperative is ecstatic to further develop its relationship with the Pyrofex team.

“As we fly towards Mercury and beyond, we all could use better tools.”
— The RChain Co-op Team.

Listen to the announcement from Greg Meredith as well as a short Q&A with Pyrofex CEO Nash Foster, from a recent community debrief.

The following is an excerpt from the soon to be released Cryptofex Whitepaper.

The Problem: Writing Software is Hard, Compiling is Harder

In 1983, Borland Software Corporation acquired a small compiler called Compas Pascal and released it in the United States as Turbo Pascal. It was the first product to integrate a compiler and the editor in which software was written, and for nearly a decade Borland’s products defined the market for integrated development environments (IDEs).

The year after Borland released Turbo Pascal, Ken Thompson observed the distinct and unique dangers associated with compiler technologies. In his famous Turing Award acceptance speech, Thompson described a mechanism by which a virus can be injected into a compiler such that every binary compiled with that compiler will replicate the virus.

“In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well installed microcode bug will be almost impossible to detect.” — Ken Thompson

Unfortunately, many developers today remain stuck in a world constructed in the early 1980’s. IDEs remain essentially the same, able to solve only those problems that neatly fit onto their laptop’s single Intel CPU. But barely a month ago, on 22nd November 2017, the Intel Corporation released a critical firmware update to the Intel Management Engine and in the intervening weeks, the public at large has become aware of the “Meltdown” bug. The IME and other components are exactly the sort of low-level microcode applications that Thompson warned about. Intel has demonstrated perfectly that in the past 33 years, we have learned little and gone nowhere.

Ironically, we have had a partial solution to these problems for nearly a decade. In 2009, David A. Wheeler published his PhD dissertation, in which he proposed a mechanism by which multiple compilers can be used to verify the correctness of a compiler output. Such a mechanism turns out to be tailor-made for the decentralized blockchain environment. Combining Wheeler’s mechanism with a set of economic incentives for compile farms to submit correct outputs gives us a very real shot at correcting a problem that has plagued us for more than 30 years.

The Solution: A Distributed and Decentralized Toolchain

If we crack open the development environments at companies like Google and Amazon, many of us would be surprised to discover that very few programs are compiled on a single machine. Already, the most sophisticated organizations in the world have moved to a distributed development environment. This allows them to leverage the cloud, bringing high-performance distributed computing to bear on software development itself. At Google, many thousands of machines churn away compiling code, checking it for correctness, and storing objects to be re-used later. Through clever use of caching and “hermetic” builds, Google makes its builds faster and more computationally efficient than could possibly be done on individual developer workstations. Unfortunately, most of us cannot afford to dedicate thousands of machines to compilation.

The open-source community might be able to build large scale shared compilation environments on the Internet, but Ken Thompson explained to us why we could not trust a shared environment for these workloads. However, in the age of blockchain, it’s now possible to build development environments that harness the power of large-scale compute to compile and check programs against programmer intent. Secure, cheap, and fast — we can get all three.

CryptoFex is just such a Decentralized Integrated Development Environment (DIDE) allowing software engineers to author, test, compile, and statically check their code to ensure that it is secure, efficient, and scalable.
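(An aside from me, not part of the whitepaper excerpt: the Wheeler-style idea mentioned above, of using multiple independent compilers to check each other, is easy to sketch. The toy Python fragment below is purely illustrative, with made-up names like accept_build; it simply accepts a build artifact only when a quorum of independent builders produce bit-identical output.)

import hashlib
from collections import Counter

def digest(binary: bytes) -> str:
    # identify a build artifact by its cryptographic hash
    return hashlib.sha256(binary).hexdigest()

def accept_build(outputs, quorum):
    """outputs: binaries returned by independent, mutually distrusting builders.
    Accept an artifact only if at least `quorum` builders agree bit-for-bit."""
    votes = Counter(digest(b) for b in outputs)
    best_digest, count = votes.most_common(1)[0]
    return best_digest if count >= quorum else None

# three hypothetical builders: two agree, one produced something different
outputs = [b"\x7fELF-aaa", b"\x7fELF-aaa", b"\x7fELF-bbb"]
print(accept_build(outputs, quorum=2))   # hash of the majority artifact
print(accept_build(outputs, quorum=3))   # None: not enough agreement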

Doug NatelsonNew readers: What is condensed matter physics? What is special about the nanoscale?

If you're a new reader, perhaps brought here by the mention of this blog in the Washington Post, welcome!   Great to have you here.  Just a couple of quick FAQs to get you oriented:

What is condensed matter physics?  Condensed matter (once known as "solid state") is a branch of physics that deals with the properties of matter consisting of large numbers of particles (usually atoms or (electrons+the rest of the atoms)) in "condensed" states like liquids and solids - basically the materials that make up an awful lot of the stuff you interact with all the time.  New properties can emerge when you bring lots of particles together.  See here for an example involving plastic balls, or here (pdf) for a famous essay about this general point.  Condensed matter physicists are often interested in identifying the different types of states or phases that can arise, and understanding transitions between those states (like how does water boil, or how does magnetism turn on in iron as its temperature is lowered from the melting point, or how does a ceramic copper oxide suddenly start letting electricity flow without resistance below some particular temperature).  Hard condensed matter typically deals with systems where quantum mechanics is directly important (electronic, magnetic, and optical properties of materials, for example), while soft condensed matter describes systems where the main actors (while quantum deep down like all matter) are not acting in a quantum way - examples include the jamming of grains of sand when you build a sand castle, or the spontaneous alignment of rod-like molecules in the liquid crystal display you're using to read this.

While particle physics tries to look at the tiniest bits of stuff, condensed matter hits on some of the same (literally the same concepts and math) deep ideas about symmetry, and often has direct implications for technologies that affect your daily life.   Understanding this stuff has given us things like the entire electronics industry, the telecommunications industry, and soon probably quantum computers.  

A powerful concept in physics in general and condensed matter in particular is universality.  For example, materials built out of all kinds of different ingredients can be mechanically rigid solids; there is something universal about mechanical rigidity that makes it emerge independent of the microscopic details.  Another example:  Lots of very different systems (metallic lead; waxy crystals of buckyball molecules with some alkaline metal atoms in between; ceramic copper oxides; hydrogen sulfide gas under enormous pressure) can conduct electricity without resistance at low temperatures - why and how is superconductivity an emergent property?

What is special about the nanoscale?  Because it's about collective properties, traditional condensed matter physics often uses a lot of nice approximations to describe systems, like assuming they're infinite in extent, or at least larger than lots of physically important scales.   When you get down to the nanoscale (recall that a typical atom is something like 0.3 nanometers in diameter), a lot of the typical approximations can fail.  As the size of the material or system becomes small compared to the length scales associated with various physical processes, new things can happen and the properties of materials can change dramatically.  Tightly confined liquids can act like solids.  Colorless materials can look brilliantly chromatic when structured on small scales.  Two electrical insulators brought together can produce a nanoscale-thick metallic layer.   We now have different techniques for structuring materials on the nanoscale and for seeing what we're doing down there, where the building blocks are often far smaller than the wavelengths of light.  Investigations at the nanoscale are tied to some of the most active topics in condensed matter, and verge into the interdisciplinary boundaries with chemistry, biology, materials science, electrical engineering, and chemical engineering.   That, and it's fun.

Please browse around through the archives, and I hope you find it interesting.


February 02, 2018

Doug NatelsonWhy science blogging still matters

Nature has a piece up about science blogging.  It's pretty much on target.  I'm a bit surprised that there wasn't more discussion of blogging vs. twitter vs. other social media platforms, or the interactions between blogs and formal journalism.

Matt von HippelUnreasonably Big Physics

The Large Hadron Collider is big, eight and a half kilometers across. It’s expensive, with a cost to construct and operate in the billions. And with an energy of 6.5 TeV per proton, it’s the most powerful collider in the world, accelerating protons to 0.99999999 of the speed of light.

The LHC is reasonable. After all, it was funded, and built. What does an unreasonable physics proposal look like?

It’s probably unfair to call the Superconducting Super Collider unreasonable, after all, it did almost get built. It would have been a 28 kilometer-wide circle in the Texas desert, accelerating protons to an energy of 20 TeV, three times the energy of the LHC. When it was cancelled in 1993, it was projected to cost twelve billion dollars, and two billion had already been spent digging the tunnel. The US hasn’t invested in a similarly sized project since.

A better example of an unreasonable proposal might be the Collider-in-the-Sea. (If that link is paywalled, this paper covers most of the same information.)


If you run out of room on land, why not build your collider underwater?

Ok, there are pretty obvious reasons why not. Surprisingly, the people proposing the Collider-in-the-Sea do a decent job of answering them. They plan to put it far enough out that it won’t disrupt shipping, and deep enough down that it won’t interfere with fish. Apparently at those depths even a hurricane barely ripples the water, and they argue that the technology exists to keep a floating ring stable under those conditions. All in all, they’re imagining a collider 600 kilometers in diameter, accelerating protons to 250 TeV, all for a cost they claim would be roughly comparable to the (substantially smaller) new colliders that China and Europe are considering.

I’m sure that there are reasons I’ve overlooked why this sort of project is impossible. (I mean, just look at the map!) Still, it’s impressive that they can marshal this much of an argument.

Besides, there are even more impossible projects, like this one, by Sugawara, Hagura, and Sanami. Their proposal for a 1000 TeV neutrino beam isn’t intended for research: rather, the idea is a beam powerful enough to send neutrinos through the Earth to destroy nuclear bombs. Such a beam could cause the bombs to detonate prematurely, “fizzling” with about 3% the explosion they would have normally.

In this case, Sugawara and co. admit that their proposal is pure fantasy. With current technology they would need a ring larger than the Collider-in-the-Sea, and the project would cost hundreds of billions of dollars. It’s not even clear who would want to build such a machine, or who could get away with building it: the authors imagine a science fiction-esque world government to foot the bill.

There’s a spectrum of papers that scientists write, from whimsical speculation to serious work. The press doesn’t always make the difference clear, so it’s a useful skill to see the clues in the writing that show where a given proposal lands. In the case of the Sugawara and co. proposal, the paper is littered with caveats, explicitly making it clear that it’s just a rough estimate. Even the first line, dedicating the paper to another professor, should get you to look twice: while this sometimes happens on serious papers, often it means the paper was written as a fun gift for the professor in question. The Collider-in-the-Sea doesn’t have these kinds of warning signs, and it’s clear its authors take it a bit more seriously. Nonetheless, comparing the level of detail to other accelerator proposals, even those from the same people, should suggest that the Collider-in-the-Sea isn’t entirely on the same level. As wacky as it is to imagine, we probably won’t get a collider that takes up most of the Gulf of Mexico, or a massive neutrino beam capable of blowing up nukes around the world.

February 01, 2018

Terence TaoPolymath15, first thread: computing H_t, asymptotics, and dynamics of zeroes

This is the first official “research” thread of the Polymath15 project to upper bound the de Bruijn-Newman constant {\Lambda}. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

The proposal naturally splits into at least three separate (but loosely related) topics:

  • Numerical computation of the entire functions {H_t(z)}, with the ultimate aim of establishing zero-free regions of the form {\{ x+iy: 0 \leq x \leq T, y \geq \varepsilon \}} for various {T, \varepsilon > 0}.
  • Improved understanding of the dynamics of the zeroes {z_j(t)} of {H_t}.
  • Establishing the zero-free nature of {H_t(x+iy)} when {y \geq \varepsilon > 0} and {x} is sufficiently large depending on {t} and {\varepsilon}.

Below the fold, I will present each of these topics in turn, to initiate further discussion in each of them. (I thought about splitting this post into three to have three separate discussions, but given the current volume of comments, I think we should be able to manage for now having all the comments in a single post. If this changes then of course we can split up some of the discussion later.)

To begin with, let me present some formulae for computing {H_t} (inspired by similar computations in the Ki-Kim-Lee paper) which may be useful. The initial definition of {H_t} is

\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du

where

\displaystyle \Phi(u) := \sum_{n=1}^\infty (2\pi^2 n^4 e^{9u} - 3 \pi n^2 e^{5u}) \exp(- \pi n^2 e^{4u} )

is a variant of the Jacobi theta function. We observe that {\Phi} in fact extends analytically to the strip

\displaystyle \{ u \in {\bf C}: -\frac{\pi}{8} < \mathrm{Im} u < \frac{\pi}{8} \}, \ \ \ \ \ (1)

 

as {e^{4u}} has positive real part on this strip. One can use the Poisson summation formula to verify that {\Phi} is even, {\Phi(-u) = \Phi(u)} (see this previous post for details). This lets us obtain a number of other formulae for {H_t}. Most obviously, one can unfold the integral to obtain

\displaystyle H_t(z) = \frac{1}{2} \int_{-\infty}^\infty e^{tu^2} \Phi(u) e^{izu}\ du.

In my previous paper with Brad, we used this representation, combined with Fubini’s theorem to swap the sum and integral, to obtain a useful series representation for {H_t} in the {t<0} case. Unfortunately this representation does not seem to be directly usable in the {t>0} case, because expressions such as {e^{tu^2} e^{9u} \exp( -\pi n^2 e^{4u} ) e^{izu}} diverge as {u} approaches {-\infty}. Nevertheless we can still perform the following contour integration manipulation. Let {0 \leq \theta < \frac{\pi}{8}} be fixed. The function {\Phi} decays super-exponentially fast (much faster than {e^{tu^2}}, in particular) as {\mathrm{Re} u \rightarrow +\infty} with {0 \leq \mathrm{Im} u \leq \theta}; as {\Phi} is even, we also have this decay as {\mathrm{Re} u \rightarrow -\infty} with {0 \leq \mathrm{Im} u \leq \theta} (this is despite each of the summands in {\Phi} having much slower decay in this direction – there is considerable cancellation!). Hence by the Cauchy integral formula we have

\displaystyle H_t(z) = \frac{1}{2} \int_{i\theta-\infty}^{i\theta+\infty} e^{tu^2} \Phi(u) e^{izu}\ du.

Splitting the horizontal line from {i\theta-\infty} to {i\theta+\infty} at {i\theta} and using the even nature of {\Phi(u)}, we thus have

\displaystyle H_t(z) = \frac{1}{2} (\int_{i\theta}^{i\theta+\infty} e^{tu^2} \Phi(u) e^{izu}\ du + \int_{-i\theta}^{-i\theta+\infty} e^{tu^2} \Phi(u) e^{-izu}\ du).

Using the functional equation {\Phi(\overline{u}) = \overline{\Phi(u)}}, we thus have the representation

\displaystyle H_t(z) = \frac{1}{2} ( K_{t,\theta}(z) + \overline{K_{t,\theta}(\overline{z})} ) \ \ \ \ \ (2)

 

where

\displaystyle K_{t,\theta}(z) := \int_{i\theta}^{i \theta+\infty} e^{tu^2} \Phi(u) e^{izu}\ du

\displaystyle = \sum_{n=1}^\infty 2 \pi^2 n^4 I_{t, \theta}( z - 9i, \pi n^2 ) - 3 \pi n^2 I_{t,\theta}( z - 5i, \pi n^2 )

where {I_{t,\theta}(b,\beta)} is the oscillatory integral

\displaystyle I_{t,\theta}(b,\beta) := \int_{i\theta}^{i\theta+\infty} \exp( tu^2 - \beta e^{4u} + i b u )\ du. \ \ \ \ \ (3)

 

The formula (2) is valid for any {0 \leq \theta < \frac{\pi}{8}}. Naively one would think that it would be simplest to take {\theta=0}; however, when {z=x+iy} and {x} is large (with {y} bounded), it seems asymptotically better to take {\theta} closer to {\pi/8}, in particular something like {\theta = \frac{\pi}{8} - \frac{1}{4x}} seems to be a reasonably good choice. This is because the integrand in (3) becomes significantly less oscillatory and also much lower in amplitude; the {\exp(ibu)} term in (3) now generates a factor roughly comparable to {\exp( - \pi x/8 )} (which, as we will see below, is the main term in the decay asymptotics for {H_t(x+iy)}), while the {\exp( - \beta e^{4u} )} term still exhibits a reasonable amount of decay as {u \rightarrow \infty}. We will use the representation (2) in the asymptotic analysis of {H_t} below, but it may also be a useful representation to use for numerical purposes.

— 1. Numerical investigation of {H_t(z)} —

Python and Matlab code to compute {H_t(x+iy)} for medium-sized values of {x} are now available in the Polymath15 github repository. It is expected that all the zeroes of these functions for {t \geq 0} are real (this is equivalent to the Riemann hypothesis). For {t=0}, {H_0(x)} is zero precisely when {\frac{1}{2}+\frac{ix}{2}} is a non-trivial zero of the Riemann zeta function, so the first few zeroes of {H_0} occur at approximately

\displaystyle 28.25, 42.05, 50.05, 60.85, 65.85, 75.15, 81.85, 86.65, 96.05, 99.55.
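As a quick illustration of what such a computation involves (a back-of-the-envelope Python sketch, not the repository code), one can evaluate {H_0(x)} for moderate real {x} directly from the defining integral, truncating the sum defining {\Phi} and the range of integration:

import numpy as np
from scipy.integrate import quad

def Phi(u, nmax=10):
    # truncated theta-type sum; the terms decay like exp(-pi n^2 e^{4u})
    n = np.arange(1, nmax + 1)
    return np.sum((2 * np.pi**2 * n**4 * np.exp(9 * u)
                   - 3 * np.pi * n**2 * np.exp(5 * u))
                  * np.exp(-np.pi * n**2 * np.exp(4 * u)))

def H(t, x, umax=6.0):
    # H_t(x) = int_0^infty e^{t u^2} Phi(u) cos(x u) du, truncated at u = umax
    integrand = lambda u: np.exp(t * u**2) * Phi(u) * np.cos(x * u)
    value, _ = quad(integrand, 0.0, umax, limit=200)
    return value

# the first zero of H_0 should sit near x = 28.25, so these two values
# should have opposite signs
print(H(0.0, 28.0), H(0.0, 28.5))

Naive quadrature of this kind degrades for large {x}, for the cancellation and roundoff reasons discussed below; it is only meant to show the shape of the computation.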

As {t} increases, we have observed that the zeroes drift slightly to the left. This is consistent with theory, for instance the number {N_t(T)} of zeroes of {H_t} of real part between {0} and {T} is known to be asymptotically

\displaystyle N_t(T) = \frac{T}{4\pi} \log \frac{T}{4\pi} - \frac{T}{4\pi} + \frac{t}{16} \log \frac{T}{4\pi} + O_t(1) \ \ \ \ \ (4)

 

so as {t} increases we should expect a few more zeroes in this region. (I had incorrectly omitted the {16} denominator in a previous version of (4).) It seems like a reasonable near-term aim to improve the numerics to the point where we can confirm this asymptotic.

A related theoretical result is that the gaps between zeroes {z_j(t)} should behave locally like an arithmetic progression for large {j}, in the sense that

\displaystyle z_{j+1}(t)-z_j(t) = (1+o_t(1)) \frac{4\pi}{\log z_j(t)} = (1+o_t(1)) \frac{4\pi}{\log j}.

This would also be nice to confirm numerically.

Theory also gives that the functions {H_t(x+iy)} decay roughly like {\exp(-\pi x/8)} as {x \rightarrow \infty}; see later sections for more precise asymptotics. To see this decay numerically for large {x}, it may be necessary to switch over to a representation such as (2) with {\theta} close to {\pi/8}, otherwise the effect of numerical roundoff error may become unmanageably large.

— 2. Dynamics of zeroes —

Let {z_j(t)} denote the zeroes of {H_t(z)}. For sake of discussion let us suppose that the zeroes are always simple (this is in fact predicted by theory assuming RH; see Corollary 2 of Csordas-Smith-Varga. It may in fact be possible to prove this claim unconditionally, but we may not need this claim for the present project). If we implicitly differentiate the equation

\displaystyle H_t(z_j(t)) = 0

in time, we (formally, at least) obtain the equation

\displaystyle (\partial_t H_t)(z_j(t)) + \partial_t z_j(t) (\partial_z H_t)(z_j(t)) = 0.

The functions {H_t} obey the backwards heat equation

\displaystyle \partial_t H_t = - \partial_{zz} H_t

and thus we have

\displaystyle \partial_t z_j(t) = \frac{(\partial_{zz} H_t)(z_j(t))}{ (\partial_{z} H_t)(z_j(t))}.

If we Taylor expand {H_t} around the zero {z_j(t)} as

\displaystyle H_t(z) = a_j (z - z_j(t)) + b_j (z-z_j(t))^2 + \dots

for some coefficients {a_j,b_j} with {a_j} non-zero (because we are assuming the zeroes to be simple) then we have after a brief calculation

\displaystyle \frac{(\partial_{zz} H_t)(z_j(t))}{ (\partial_{z} H_t)(z_j(t))} = \frac{2 b_j}{a_j}

and also

\displaystyle \frac{\partial_z H_t(z)}{H_t(z)} = \frac{1}{z-z_j(t)} + \frac{b_j}{a_j} + \dots.

On the other hand, from Weierstrass factorisation we expect (formally at least) that

\displaystyle \frac{\partial_z H_t(z)}{H_t(z)} = \sum_k \frac{1}{z-z_k(t)}

and thus we should have

\displaystyle \frac{b_j}{a_j} = \sum_{k \neq j} \frac{1}{z_j(t) - z_k(t)}.

Putting all this together, we should obtain the dynamics

\displaystyle \partial_t z_j(t) = - 2 \sum_{k \neq j} \frac{1}{z_k(t) - z_j(t)}.

This is not rigorous for a number of reasons, most notably that the sum here is not absolutely convergent, but these equations should hold in a principal value sense at least. In the regime {t > \Lambda} this was established in Lemma 2.4 of Csordas-Smith-Varga; it may be possible to establish this in the entire region {t>0}.

If we write {z_j = x_j + i y_j}, we obtain the dynamics

\displaystyle \partial_t x_j = - 2 \sum_{k \neq j} \frac{x_k - x_j}{(x_k-x_j)^2 + (y_k - y_j)^2} \ \ \ \ \ (5)

 

and

\displaystyle \partial_t y_j = 2 \sum_{k \neq j} \frac{y_k - y_j}{(x_k-x_j)^2 + (y_k - y_j)^2}. \ \ \ \ \ (6)

 

Informally, the real parts {x_j} repel each other, while the imaginary parts attract each other. In particular, once a zero is real, it should stay real.
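One can get a feel for this behaviour from a toy simulation (a rough sketch for intuition only, truncating to finitely many zeroes and ignoring the principal value issues):

import numpy as np

def velocities(z):
    # dz_j/dt = -2 sum_{k != j} 1/(z_k - z_j) for a finite configuration
    v = np.zeros_like(z)
    for j in range(len(z)):
        v[j] = -2 * np.sum(1.0 / (np.delete(z, j) - z[j]))
    return v

# a few real zeroes plus one complex-conjugate pair
z = np.array([10.0, 20.0, 30.0 + 2.0j, 30.0 - 2.0j, 40.0], dtype=complex)
dt, steps = 1e-3, 500
for _ in range(steps):
    z = z + dt * velocities(z)   # crude forward Euler step
print(np.round(z, 3))
# the real zeroes spread apart but stay real, while the imaginary parts of
# the conjugate pair shrink towards the real axis, as described above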

If a zero {x_j + i y_j} is not real, then it has a complex conjugate {x_{j'} + iy_{j'} = x_j - iy_j}. Isolating the attraction that the imaginary part {y_j} feels to its complex conjugate {y_{j'} = -y_j}, we obtain

\displaystyle \partial_t y_j = - \frac{1}{y_j} + 2 \sum_{k \neq j, j'} \frac{y_k - y_j}{(x_k-x_j)^2 + (y_k - y_j)^2}. \ \ \ \ \ (7)

 

Suppose that {x_j + iy_j} is a zero with maximal imaginary part {y_j} (it is a result of Ki, Kim, and Lee that there are only finitely many non-real zeroes for {H_t} with {t>0}). Then all the summands in (7) are non-positive, hence we have the differential inequality

\displaystyle \partial_t y_j \leq - \frac{1}{y_j}. \ \ \ \ \ (8)

 

Hence if {\sigma_{max}(t)} denotes the maximal imaginary part of a zero of {H_t}, we (formally, at least) have

\displaystyle \partial_t \sigma_{max}(t) \leq - \frac{1}{\sigma_{max}(t)}

or equivalently

\displaystyle \partial_t \frac{1}{2} \sigma_{max}^2(t) \leq - 1

whenever {\sigma_{max}(t)} is positive. Thus the zeroes will be forced into the real axis in finite time, and in fact we can establish the bound

\displaystyle \Lambda \leq t + \frac{1}{2} \sigma_{max}^2(t)

from this reasoning (this is a result of de Bruijn).
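To spell out the final step: integrating the differential inequality, for {t' > t} one has

\displaystyle \frac{1}{2} \sigma_{max}^2(t') \leq \frac{1}{2} \sigma_{max}^2(t) - (t'-t)

for as long as {\sigma_{max}} remains positive, so {\sigma_{max}} must vanish by time {t' = t + \frac{1}{2} \sigma_{max}^2(t)}; all zeroes of {H_{t'}} are then real for such {t'}, which is exactly the claimed bound on {\Lambda}.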

One could presumably do better by a more careful analysis of the sum in (7). The only way it seems that the inequality could be close to sharp is if the offending non-real zero {x_j + iy_j} is somehow isolated far away from all other zeroes except for its complex conjugate {x_j - iy_j}. For large {x_j}, this is presumably in contradiction with the asymptotics (4); for small {x_j}, perhaps one could use numerics to exclude this possibility?

At time {t=0}, we know that the first {10^{13}} zeroes are real (a result of Gourdon). Thus any non-real zero will initially have a very large value of {x_j}. It would be nice to somehow be able to say that these zeroes continue to have very large real part for positive values of {t} as well, but unfortunately the velocity in (5) could be large and negative if the zero is just to the left of another zero that is repelling it towards the origin. Is there some way to stop this? One may have to work with “clusters” of zeroes and study their centre of mass (or some similar statistic) as this may behave in a better way than an individual position function {x_j}. Perhaps some of the identities in this previous post could be adapted for this purpose?

— 3. Asymptotics of {H_t} —

Standard calculations give the gaussian integral identity

\displaystyle \int \exp( \alpha_0 + \frac{\alpha_2}{2!} (u - u_0)^2 ) a_0\ du = \sqrt{\frac{2\pi}{-\alpha_2}} a_0 \exp(\alpha_0)

whenever {a_0, \alpha_0, \alpha_2, u_0} are complex numbers with {\mathrm{Re} \alpha_2 < 0}, where we integrate along a horizontal contour such as {{\bf R}} and we use the standard branch of the complex logarithm. More generally, Laplace’s method suggests that one has the approximation

\displaystyle \int \exp( f(u) ) a(u)\ du \approx \sqrt{\frac{2\pi}{(-f''(u_0))}} a(u_0) \exp( f(u_0) )

whenever {f} is an oscillating phase that has a single stationary point {u_0} (with {\mathrm{Re} f''(u_0) < 0}) and {a} is a slowly varying amplitude, and the integral is along a contour that does not start or end too close to {u_0}. (Here one uses the standard branch of the square root function.) There are several ways to make this approximation rigorous, such as the method of steepest descent, the saddle point method, or the method of stationary phase, but for this discussion let us work completely informally. One can apply this method to analyse the integrals (3). For the highest accuracy, one should use the phase {f(u) = tu^2 - \beta e^{4u} + ibu}; this is the approach taken for instance in my paper with Brad (in the {t<0} case), but has the drawback that the stationary point {u_0} has to be defined using the Lambert {W}-function. A more “quick and dirty” approach, which seems to give worse error bounds but still gives broadly correct qualitative conclusions, is to take only the phase {f(u) = - \beta e^{4u} + ibu} and treat the factor {a(u) = \exp( t u^2 )} as an amplitude. In this case, the stationary point occurs at

\displaystyle u_0 = \frac{1}{4} \log \frac{ib}{4\beta}

(where we use the branch of the logarithm that makes {u_0} lie in the strip (1)), with

\displaystyle f(u_0) = - \frac{ib}{4} + \frac{ib}{4} \log \frac{ib}{4\beta}

and

\displaystyle f''(u_0) = -4ib.

This suggests that we have the approximation

\displaystyle I_{t,\theta}(b,\beta) \approx \sqrt{\frac{\pi}{2ib}} \exp( \frac{t}{16} \log^2 \frac{ib}{4\beta} - \frac{ib}{4} + \frac{ib}{4} \log \frac{ib}{4\beta} ).
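(As a check on the stationary point data used here: with {f(u) = -\beta e^{4u} + ibu} one has

\displaystyle f'(u) = -4\beta e^{4u} + ib, \qquad f''(u) = -16 \beta e^{4u},

so {f'(u_0)=0} forces {e^{4u_0} = \frac{ib}{4\beta}}, recovering the formula for {u_0}; substituting back gives {f(u_0) = -\frac{ib}{4} + \frac{ib}{4} \log \frac{ib}{4\beta}}, {f''(u_0) = -4ib}, and {a(u_0) = \exp(t u_0^2) = \exp( \frac{t}{16} \log^2 \frac{ib}{4\beta} )}, matching the displayed approximation.)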

(I think that in order for this approximation to be rigorous, one needs to take {\theta} to be close to the imaginary part of {u_0}, in particular close to {\pi/8}.) Now, from (2) one has

\displaystyle H_t(x+iy) = \frac{1}{2} \sum_{n=1}^\infty \ \ \ \ \ (9)

 

\displaystyle 2 \pi^2 n^4 I_{t, \theta}( x - (9-y)i, \pi n^2 ) - 3 \pi n^2 I_{t,\theta}( x - (5-y)i, \pi n^2 )

\displaystyle + 2 \pi^2 n^4 \overline{I_{t, \theta}( x - (9+y)i, \pi n^2 )} - 3 \pi n^2 \overline{I_{t,\theta}( x - (5+y)i, \pi n^2 )}.

Here we consider the regime where {x} is large and {y} is positive but fairly small. Let’s just look at the term

\displaystyle I_{t, \theta}( x - (9+y)i, \pi n^2 ).

Taking {b = x - (9+y)i} and {\beta = \pi n^2}, we approximately have

\displaystyle \log \frac{ib}{4\beta} \approx \log \frac{x}{4\pi n^2} + i ( \frac{\pi}{2} - \frac{9+y}{x} )

and so (after some calculation, and dropping terms of size {o(1)}) we have the somewhat crude approximation

\displaystyle I_{t,\theta}( x - (9+y)i, \pi n^2 ) \approx \sqrt{\frac{\pi}{2ix}} \exp( A + iB )

where

\displaystyle A := \frac{t}{16} \log^2 \frac{x}{4\pi n^2} - \frac{t \pi^2}{64} + \frac{9+y}{4} \log \frac{x}{4\pi n^2} - \frac{\pi x}{8}

and

\displaystyle B := \frac{\pi t}{16} \log \frac{x}{4\pi n^2} - \frac{x}{4} + \frac{x}{4} \log \frac{x}{4\pi n^2} + \frac{\pi (9+y)}{8}.

In particular, we expect the magnitude {|I_{t,\theta}( x - (9+y)i, \pi n^2 )|} to behave like

\displaystyle |I_{t,\theta}( x - (9+y)i, \pi n^2 )| \approx \sqrt{\frac{\pi}{2x}} (\frac{x}{4\pi n^2})^{(9+y)/4} e^{-\pi x/8}

\displaystyle \times \exp( \frac{t}{16} \log^2 \frac{x}{4\pi n^2} - \frac{t \pi^2}{64} ).

Similarly for the other three {I_{t,\theta}} expressions that appear in (9). If {t, y>0} and {x} is large, this suggests that the {n=1} terms in (9) dominate, and furthermore of the four terms, it is the third term which dominates the other three. Thus we expect to have

\displaystyle H_t(x+iy) \approx \pi^2 \overline{I_{t, \theta}( x - (9+y)i, \pi )}

\displaystyle \approx \pi^2 \sqrt{\frac{\pi}{-2ix}} (\frac{x}{4\pi})^{(9+y)/4} e^{-\pi x/8} \exp( \frac{t}{16} \log^2 \frac{x}{4\pi} - \frac{t \pi^2}{64} ) \exp(- iB )

where

\displaystyle B := \frac{\pi t}{16} \log \frac{x}{4\pi} - \frac{x}{4} + \frac{x}{4} \log \frac{x}{4\pi} + \frac{\pi (9+y)}{8}.

Dividing the phase {B} by {\pi} and using the argument principle, we now see where the asymptotic (4) is supposed to come from. Taking magnitudes we also expect

\displaystyle |H_t(x+iy)| \approx \pi^2 \sqrt{\frac{\pi}{2x}} (\frac{x}{4\pi})^{(9+y)/4} e^{-\pi x/8} \exp( \frac{t}{16} \log^2 \frac{x}{4\pi} - \frac{t \pi^2}{64} ),

in particular {H_t(x+iy)} should be non-zero for fixed {t,y>0} and large {x}.
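To spell out the connection with (4): dividing the phase {B} above by {\pi} gives

\displaystyle \frac{B}{\pi} = \frac{x}{4\pi} \log \frac{x}{4\pi} - \frac{x}{4\pi} + \frac{t}{16} \log \frac{x}{4\pi} + \frac{9+y}{8},

so as {x} runs from {0} to {T} the phase should wind through roughly {N_t(T)} multiples of {\pi}, consistent with (4) up to the {O_t(1)} error.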

Hopefully we will be able to make these asymptotics more precise, and also they can be confirmed by numerics (in particular there may be some sign errors or other numerical inaccuracies in my calculations above which numerics might be able to detect).

January 31, 2018

BackreactionPhysics Facts and Figures

Physics is old. Together with astronomy, it’s the oldest scientific discipline. And the age shows. Compared to other scientific areas, physics is a slowly growing field. I learned this from a 2010 paper by Larsen and van Ins. The authors counted the number of publications per scientific areas. In physics, the number of publications grows at an annual rate of 3.8%. This means it currently takes 18

January 29, 2018

John PreskillWhat makes extraordinary science extraordinary?

My article for this month appears on Sean Carroll’s blog, Preposterous Universe. Sean is a theoretical physicist who practices cosmology at Caltech. He interfaces with philosophy, which tinges the question I confront: What distinguishes extraordinary science from good science? The topic seemed an opportunity to take Sean up on an invitation to guest-post on Preposterous Universe. Head there for my article. Thanks to Sean for hosting!


Sean CarrollGuest Post: Nicole Yunger Halpern on What Makes Extraordinary Science Extraordinary

Nicole Yunger Halpern is a theoretical physicist at Caltech’s Institute for Quantum Information and Matter (IQIM).  She blends quantum information theory with thermodynamics and applies the combination across science, including to condensed matter; black-hole physics; and atomic, molecular, and optical physics. She writes for Quantum Frontiers, the IQIM blog, every month.


What makes extraordinary science extraordinary?

Political junkies watch C-SPAN. Sports fans watch ESPN. Art collectors watch Christie’s. I watch scientists respond to ideas.

John Preskill—Caltech professor, quantum-information theorist, and my PhD advisor—serves as the Chief Justice John Roberts of my C-SPAN. Ideas fly during group meetings, at lunch outside a campus cafeteria, and in John’s office. Many ideas encounter a laconicism compared with which Ernest Hemingway babbles. “Hmm,” I hear. “Ok.” “Wait… What?”

The occasional idea provokes an “mhm.” The final syllable has a higher pitch than the first. Usually, the inflection change conveys agreement and interest. Receiving such an “mhm” brightens my afternoon like a Big Dipper sighting during a 9 PM trudge home.

Hearing “That’s cool,” “Nice,” or “I’m excited,” I cartwheel internally.

What distinguishes “ok” ideas from “mhm” ideas? Peeling the Preskillite trappings off this question reveals its core: What distinguishes good science from extraordinary science?

I’ve been grateful for opportunities to interview senior scientists, over the past few months, from coast to coast. The opinions I collected varied. Several interviewees latched onto the question as though they pondered it daily. A couple of interviewees balked (I don’t know; that’s tricky…) but summoned up a sermon. All the responses fired me up: The more wisps of mist withdrew from the nature of extraordinary science, the more I burned to contribute.

I’ll distill, interpret, and embellish upon the opinions I received. Italics flag lines that I assembled to capture ideas that I heard, as well as imperfect memories of others’ words. Quotation marks surround lines that others constructed. Feel welcome to chime in, in the “comments” section.

One word surfaced in all, or nearly all, my conversations: “impact.” Extraordinary science changes how researchers across the world think. Extraordinary science reaches beyond one subdiscipline.

This reach reminded me of answers to a question I’d asked senior scientists when in college: “What do you mean by ‘beautiful’?”  Replies had varied, but a synopsis had crystallized: “Beautiful science enables us to explain a lot with a little.” Schrodinger’s equation, which describes how quantum systems evolve, fits on one line. But the equation describes electrons bound to atoms, particles trapped in boxes, nuclei in magnetic fields, and more. Beautiful science, which overlaps with extraordinary science, captures much of nature in a small net.

Inventing a field constitutes extraordinary science. Examples include the fusion of quantum information with high-energy physics. Entanglement, quantum computation, and error correction are illuminating black holes, wormholes, and space-time.

Extraordinary science surprises us, revealing faces that we never expected nature to wear. Many extraordinary experiments generate data inexplicable with existing theories. Some extraordinary theory accounts for puzzling data; some extraordinary theory provokes experiments. I graduated from the Perimeter Scholars International Masters program,  at the Perimeter Institute for Theoretical Physics, almost five years ago. Canadian physicist Art McDonald presented my class’s commencement address. An interest in theory, he said, brought you to this institute. Plunge into theory, if you like. Theorem away. But keep a bead on experiments. Talk with experimentalists; work to understand them. McDonald won a Nobel Prize, two years later, for directing the Sudbury Neutrino Observatory (SNO). (SNOLab, with the Homestake experiment, revealed properties of subatomic particles called “neutrinos.” A neutrino’s species can change, and neutrinos have tiny masses. Neutrinos might reveal why the universe contains more matter than antimatter.)

Not all extraordinary theory clings to experiment like bubblegum to hair. Elliott Lieb and Mary Beth Ruskai proved that quantum entropies obey an inequality called “strong subadditivity” (SSA).  Entropies quantify uncertainty about which outcomes measurements will yield. Experimentalists could test SSA’s governance of atoms, ions, and materials. But no physical platform captures SSA’s essence.
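(For the record, the inequality in question states that the von Neumann entropy {S} of any tripartite quantum state {\rho_{ABC}} satisfies

\displaystyle S(\rho_{ABC}) + S(\rho_B) \leq S(\rho_{AB}) + S(\rho_{BC}),

where the subscripts denote the reduced states on the indicated subsystems.)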

Abstract mathematics underlies Lieb and Ruskai’s theorem: convexity and concavity (properties of functions), the Golden-Thompson inequality (a theorem about exponentials of matrices), etc. Some extraordinary theory dovetails with experiment; some wings away.

One interviewee sees extraordinary science in foundational science. At our understanding’s roots lie ideas that fertilize diverse sprouts. Other extraordinary ideas provide tools for calculating, observing, or measuring. Richard Feynman sped up particle-physics computations, for instance, by drawing diagrams.  Those diagrams depict high-energy physics as the creation, separation, recombination, and annihilation of particles. Feynman drove not only a technical, but also a conceptual, advance. Some extraordinary ideas transform our views of the world.

Difficulty preoccupied two experimentalists. An experiment isn’t worth undertaking, one said, if it isn’t difficult. A colleague, said another, “does the impossible and makes it look easy.”

Simplicity preoccupied two theorists. I wrung my hands, during year one of my PhD, in an email to John. The results I’d derived—now that I’d found them— looked as though I should have noticed them months earlier. What if the results lacked gristle? “Don’t worry about too simple,” John wrote back. “I like simple.”

Another theorist agreed: Simplification promotes clarity. Not all simple ideas “go the distance.” But ideas run farther when stripped down than when weighed down by complications.

Extraordinary scientists have a sense of taste. Not every idea merits exploration. Identifying the ideas that do requires taste, or style, or distinction. What distinguishes extraordinary science? More of the theater critic and Julia Child than I expected five years ago.

With gratitude to the thinkers who let me pick their brains.

Georg von HippelLooking for guest blogger(s) to cover LATTICE 2018

Since I will not be attending LATTICE 2018 for some excellent personal reasons, I am looking for a guest blogger or even better several guest bloggers from the lattice community who would be interested in covering the conference. Especially for advanced PhD students or junior postdocs, this might be a great opportunity to get your name some visibility. If you are interested, drop me a line either in the comment section or by email (my university address is easy to find).

Jordan EllenbergNot even the most poorly paid shipping clerk

One more from Why Men Fail:

Not even the most poorly paid shipping clerk would dream of trying to make his own shirts, and confidential investigation would probably reveal that mighty few darn their own socks.  Yet the cities are full of women on much larger salaries who not only make their own clothes, but cook their own meals and do their own laundry.

So in 1927, it was more unusual to cook for yourself than it was to make your own clothes?  When did that flip?

 

Jordan EllenbergI’m tryin’, I’m tryin’, I’m tryin’, I’m tryin’

“I’m tryin, I’m tryin’, I’m tryin’, I’m tryin'”

reappears, 25 years after Slanted and Enchanted, in Selena Gomez’s “Bad Liar”:

Both songs are lopey and talky.  Stephen Malkmus is talking over the Fall’s “A New Face in Hell.”   Gomez is talking over “Psycho Killer.” Gomez, unlike Malkmus, tells you what she’s trying to do, or trying to not do.  I don’t think this blunts the basic ambiguity of the line — I’m trying to do something, but also, yeah, I’m a little trying, aren’t I?

Bonus track:  Julian Cope, “Try Try Try.”  Your famous victory will be no victory!

January 28, 2018

Terence TaoEmbedding the Boussinesq equations in the incompressible Euler equations on a manifold

The Boussinesq equations for inviscid, incompressible two-dimensional fluid flow in the presence of gravity are given by

\displaystyle (\partial_t + u_x \partial_x+ u_y \partial_y) u_x = -\partial_x p \ \ \ \ \ (1)

\displaystyle (\partial_t + u_x \partial_x+ u_y \partial_y) u_y = \rho - \partial_y p \ \ \ \ \ (2)

\displaystyle (\partial_t + u_x \partial_x+ u_y \partial_y) \rho = 0 \ \ \ \ \ (3)

\displaystyle \partial_x u_x + \partial_y u_y = 0 \ \ \ \ \ (4)

where {u: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}^2} is the velocity field, {p: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}} is the pressure field, and {\rho: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}} is the density field (or, in some physical interpretations, the temperature field). In this post we shall restrict ourselves to formal manipulations, assuming implicitly that all fields are regular enough (or sufficiently decaying at spatial infinity) that the manipulations are justified. Using the material derivative {D_t := \partial_t + u_x \partial_x + u_y \partial_y}, one can abbreviate these equations as

\displaystyle D_t u_x = -\partial_x p

\displaystyle D_t u_y = \rho - \partial_y p

\displaystyle D_t \rho = 0

\displaystyle \partial_x u_x + \partial_y u_y = 0.

One can eliminate the role of the pressure {p} by working with the vorticity {\omega := \partial_x u_y - \partial_y u_x}. A standard calculation then leads us to the equivalent “vorticity-stream” formulation

\displaystyle D_t \omega = \partial_x \rho

\displaystyle D_t \rho = 0

\displaystyle \omega = \partial_x u_y - \partial_y u_x

\displaystyle \partial_x u_x + \partial_y u_y = 0

of the Boussinesq equations. The latter two equations can be used to recover the velocity field {u} from the vorticity {\omega} by the Biot-Savart law

\displaystyle u_x := -\partial_y \Delta^{-1} \omega; \quad u_y = \partial_x \Delta^{-1} \omega.
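(For completeness, here is a sketch of the standard calculation just alluded to, filling in a step the post takes for granted: apply {\partial_x} to (2) and {\partial_y} to (1) and subtract. The pressure terms cancel, while the transport terms combine using (4):

\displaystyle \partial_x (D_t u_y) - \partial_y (D_t u_x) = D_t \omega + (\partial_x u_x + \partial_y u_y) \omega = D_t \omega,

and the right-hand sides contribute {\partial_x \rho}, giving {D_t \omega = \partial_x \rho}; the remaining equations of the vorticity-stream system are just the definition of {\omega} and the divergence-free condition.)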

It has long been observed (see e.g. Section 5.4.1 of Bertozzi-Majda) that the Boussinesq equations are very similar, though not quite identical, to the three-dimensional inviscid incompressible Euler equations under the hypothesis of axial symmetry (with swirl). The Euler equations are

\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p

\displaystyle \nabla \cdot u = 0

where now the velocity field {u: {\bf R} \times {\bf R}^3 \rightarrow {\bf R}^3} and pressure field {p: {\bf R} \times {\bf R}^3 \rightarrow {\bf R}} are over the three-dimensional domain {{\bf R}^3}. If one expresses {{\bf R}^3} in cylindrical coordinates {(z,r,\theta)}, then one can write the velocity vector field {u} in these coordinates as

\displaystyle u = u^z \frac{d}{dz} + u^r \frac{d}{dr} + u^\theta \frac{d}{d\theta}.

If we make the axial symmetry assumption that these components, as well as {p}, do not depend on the {\theta} variable, thus

\displaystyle \partial_\theta u^z, \partial_\theta u^r, \partial_\theta u^\theta, \partial_\theta p = 0,

then after some calculation (which we give below the fold) one can eventually reduce the Euler equations to the system

\displaystyle \tilde D_t \omega = \frac{1}{r^4} \partial_z \rho \ \ \ \ \ (5)

\displaystyle \tilde D_t \rho = 0 \ \ \ \ \ (6)

\displaystyle \omega = \frac{1}{r} (\partial_z u^r - \partial_r u^z) \ \ \ \ \ (7)

\displaystyle \partial_z(ru^z) + \partial_r(ru^r) = 0 \ \ \ \ \ (8)

where {\tilde D_t := \partial_t + u^z \partial_z + u^r \partial_r} is the modified material derivative, and {\rho} is the field {\rho := (r u^\theta)^2}. This is almost identical with the Boussinesq equations except for some additional powers of {r}; thus, the intuition is that the Boussinesq equations are a simplified model for axially symmetric Euler flows when one stays away from the axis {r=0} and also does not wander off to {r=\infty}.

However, this heuristic is not rigorous; the above calculations do not actually give an embedding of the Boussinesq equations into Euler. (The equations do match on the cylinder {r=1}, but this is a measure zero subset of the domain, and so is not enough to give an embedding on any non-trivial region of space.) Recently, while playing around with trying to embed other equations into the Euler equations, I discovered that it is possible to make such an embedding into a four-dimensional Euler equation, albeit on a slightly curved manifold rather than in Euclidean space. More precisely, we use the Ebin-Marsden generalisation

\displaystyle \partial_t u + \nabla_u u = - \mathrm{grad}_g p

\displaystyle \mathrm{div}_g u = 0

of the Euler equations to an arbitrary Riemannian manifold {(M,g)} (ignoring any issues of boundary conditions for this discussion), where {u: {\bf R} \rightarrow \Gamma(TM)} is a time-dependent vector field, {p: {\bf R} \rightarrow C^\infty(M)} is a time-dependent scalar field, and {\nabla_u} is the covariant derivative along {u} using the Levi-Civita connection {\nabla}. In Penrose abstract index notation (using the Levi-Civita connection {\nabla}, and raising and lowering indices using the metric {g = g_{ij}}), the equations of motion become

\displaystyle \partial_t u^i + u^j \nabla_j u^i = - \nabla^i p \ \ \ \ \ (9)

 

\displaystyle \nabla_i u^i = 0;

in coordinates, this becomes

\displaystyle \partial_t u^i + u^j (\partial_j u^i + \Gamma^i_{jk} u^k) = - g^{ij} \partial_j p

\displaystyle \partial_i u^i + \Gamma^i_{ik} u^k = 0 \ \ \ \ \ (10)

where the Christoffel symbols {\Gamma^i_{jk}} are given by the formula

\displaystyle \Gamma^i_{jk} := \frac{1}{2} g^{il} (\partial_j g_{lk} + \partial_k g_{lj} - \partial_l g_{jk}),

where {g^{il}} is the inverse to the metric tensor {g_{il}}. If the coordinates are chosen so that the volume form {dg} is the Euclidean volume form {dx}, thus {\mathrm{det}(g)=1}, then on differentiating we have {g^{ij} \partial_k g_{ij} = 0}, and hence {\Gamma^i_{ik} = 0}, and so the divergence-free equation (10) simplifies in this case to {\partial_i u^i = 0}. The Ebin-Marsden Euler equations are the natural generalisation of the Euler equations to arbitrary manifolds; for instance, they (formally) conserve the kinetic energy

\displaystyle \frac{1}{2} \int_M |u|_g^2\ dg = \frac{1}{2} \int_M g_{ij} u^i u^j\ dg

and can be viewed as the formal geodesic flow equation on the infinite-dimensional manifold of volume-preserving diffeomorphisms on {M} (see this previous post for a discussion of this in the flat space case).
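(A quick aside, filling in the determinant step used above: by Jacobi’s formula for the derivative of a determinant, differentiating the normalisation {\mathrm{det}(g)=1} gives

\displaystyle 0 = \partial_k \mathrm{det}(g) = \mathrm{det}(g) g^{ij} \partial_k g_{ij},

and since the contracted Christoffel symbol simplifies to {\Gamma^i_{ik} = \frac{1}{2} g^{il} \partial_k g_{il}} (the other two terms in the defining formula cancel against each other), it vanishes in such coordinates.)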

The specific four-dimensional manifold in question is the space {{\bf R} \times {\bf R}^+ \times {\bf R}/{\bf Z} \times {\bf R}/{\bf Z}} with metric

\displaystyle dx^2 + dy^2 + y^{-1} dz^2 + y dw^2

and solutions to the Boussinesq equation on {{\bf R} \times {\bf R}^+} can be transformed into solutions to the Euler equations on this manifold. This is part of a more general family of embeddings into the Euler equations in which passive scalar fields (such as the field {\rho} appearing in the Boussinesq equations) can be incorporated into the dynamics via fluctuations in the Riemannian metric {g}. I am writing the details below the fold (partly for my own benefit).

Firstly, it is convenient to transform the Euler equations on an arbitrary Riemannian manifold to a covelocity formulation, by introducing the covelocity {1}-form {v_i := g_{ij} u^j}, as this formulation allows one to largely avoid having to work with covariant derivatives or Christoffel symbols. Lowering indices in the Euler equation (9) then gives the system

\displaystyle \partial_t v_i + u^j \nabla_j v_i = - \partial_i p

\displaystyle u^j = g^{ij} v_i

\displaystyle \nabla_i u^i = 0.

Noting that {u^j \nabla_i v_j = \frac{1}{2} \partial_i ( u^j v_j)}, and introducing the modified pressure {p' := p + \frac{1}{2} u^j v_j}, we arrive at the system

\displaystyle \partial_t v_i + u^j (\nabla_j v_i - \nabla_i v_j) = - \partial_i p'

\displaystyle u^j = g^{ij} v_i

\displaystyle \nabla_i u^i = 0.

As the Levi-Civita connection is torsion-free (or equivalently, one has the symmetry {\Gamma^i_{jk} = \Gamma^i_{kj})}, we have {\nabla_j v_i - \nabla_i v_j = \partial_j v_i - \partial_i v_j}, thus we arrive at the system

\displaystyle \partial_t v_i + u^j (\partial_j v_i - \partial_i v_j) = - \partial_i p' \ \ \ \ \ (11)

\displaystyle u^j = g^{ij} v_i \ \ \ \ \ (12)

\displaystyle \nabla_i u^i = 0 \ \ \ \ \ (13)

which is equivalent to (and thus embeddable in) the Euler equations. The advantage of this formulation is that the metric {g} now plays no role whatsoever in the main equation (11), and only appears in (12) and (13). One can also interpret the expression {u^j (\partial_j v_i - \partial_i v_j)} as the Lie derivative of the covelocity {v} along the velocity {u}.

If one works in a coordinate system in which the volume form {dg} is Euclidean (that is to say, {\mathrm{det} g = 1}), then the Riemannian divergence {\nabla_i u^i} is the same as the ordinary divergence {\partial_i u^i}; this can be seen either by viewing the divergence as the adjoint of the gradient operator with respect to the volume form, or else by differentiating the condition {\mathrm{det} g = 1} to conclude that {g^{ij} \partial_k g_{ij} = 0}, which implies that {\Gamma^i_{ik}=0} and hence {\nabla_i u^i = \partial_i u^i}. But actually, as already observed in my previous paper, one can replace {\nabla_i u^i} with {\partial_i u^i} “for free”, even if one does not have the Euclidean volume form condition {\mathrm{det} g = 1}, if one is prepared to add an additional “dummy” dimension to the manifold {M}. More precisely, if {u, v, g, p'} solves the system

\displaystyle \partial_t v_i + u^j (\partial_j v_i - \partial_i v_j) = - \partial_i p' \ \ \ \ \ (14)

\displaystyle u^j = g^{ij} v_i \ \ \ \ \ (15)

\displaystyle \partial_i u^i = 0 \ \ \ \ \ (16)

on some {d}-dimensional Riemannian manifold {(M,g)}, then one can introduce modified fields {\tilde u, \tilde v, \tilde g, \tilde p'} on a {d+1}-dimensional manifold {(\tilde M,\tilde g)} by defining

\displaystyle \tilde M := M \times {\bf R}/{\bf Z}

\displaystyle d\tilde g^2 := dg^2 + \mathrm{det}(g)^{-1} ds^2

\displaystyle \tilde u^i(x,s) := u^i(x)

\displaystyle \tilde u^s(x,s) := 0

\displaystyle \tilde v_i(x,s) := v_i(x)

\displaystyle \tilde v_s(x,s) := 0

\displaystyle \tilde p'(x,s) := p'(x)

then these fields obey the same system, and hence (since {\mathrm{det}(\tilde g)=1}) solve (11), (12), (13). Thus the above system is embeddable into the Euler equations in one higher dimension. To embed the Boussinesq equations into the four-dimensional Euler equations mentioned previously, it thus suffices to embed these equations into the system (14)–(16) for the three-dimensional manifold {{\bf R} \times {\bf R}^+ \times {\bf R}/{\bf Z}} with metric

\displaystyle dx^2 + dy^2 + y^{-1} dz^2. \ \ \ \ \ (17)

 

Let us more generally consider the system (14)–(16) under the assumption that {M} splits as a product {M_1 \times M_2} of two manifolds {M_1,M_2}, with all data independent of the {M_2} coordinates (but, for added flexibility, we do not assume that the metric on {M} splits as the direct sum of metrics on {M_1} and {M_2}, allowing for twists and shears). Thus, if we use Roman indices to denote the {M_1} coordinates and Greek indices to denote the {M_2} coordinates (with summation only being restricted to these coordinates), and denote the inverse metric by the tensor with components {g^{ij}, g^{i\beta}, g^{\beta \gamma}}, then we have

\displaystyle \partial_\alpha g^{ij}, \partial_\alpha g^{i\beta}, \partial_\alpha g^{\beta \gamma} = 0 \ \ \ \ \ (18)

\displaystyle \partial_\alpha v_i, \partial_\alpha v_\beta = 0 \ \ \ \ \ (19)

\displaystyle \partial_\alpha u^i, \partial_\alpha u^\beta = 0 \ \ \ \ \ (20)

\displaystyle \partial_\alpha p' = 0, \ \ \ \ \ (21)

and then the system (14)–(16) in these split coordinates becomes

\displaystyle \partial_t v_i + u^j (\partial_j v_i - \partial_i v_j) - u^\alpha \partial_i v_\alpha = - \partial_i p'

\displaystyle \partial_t v_\beta + u^j \partial_j v_\beta = 0

\displaystyle u^j = g^{ij} v_i + g^{\alpha j} v_\alpha

\displaystyle u^\alpha = g^{\alpha \beta} v_\beta + g^{\alpha j} v_j

\displaystyle \partial_i u^i = 0.

We can view this as a system of PDE on the smaller manifold {M_1}, which is then embedded into the Euler equations. Introducing the material derivative {D_t := \partial_t + u^j \partial_j}, this simplifies slightly to

\displaystyle D_t v_i - u^j \partial_i v_j - u^\alpha \partial_i v_\alpha = - \partial_i p'

\displaystyle D_t v_\beta = 0

\displaystyle u^j = g^{ij} v_i + g^{\alpha j} v_\alpha

\displaystyle u^\alpha = g^{\alpha \beta} v_\beta + g^{\alpha j} v_j

\displaystyle \partial_i u^i = 0.

We substitute the third and fourth equations into the first, then drop the fourth (as it can be viewed as a definition of the field {u^\alpha}, which no longer plays any further role), to obtain

\displaystyle D_t v_i - g^{jk} v_k \partial_i v_j - g^{\alpha j} v_\alpha \partial_i v_j - g^{\alpha \beta} v_\beta \partial_i v_\alpha - g^{\alpha j} v_j \partial_i v_\alpha = - \partial_i p'

\displaystyle D_t v_\beta = 0

\displaystyle u^j = g^{ij} v_i + g^{\alpha j} v_\alpha

\displaystyle \partial_i u^i = 0.

We can reverse the pressure modification by writing

\displaystyle p := p' - \frac{1}{2} g^{jk} v_j v_k - g^{\alpha j} v_j v_\alpha - \frac{1}{2} g^{\alpha \beta} v_\alpha v_\beta,

to move some derivatives off of the covelocity fields and onto the metric, so that the system now becomes

\displaystyle D_t v_i + \frac{1}{2} v_k v_j \partial_i g^{jk} + v_j v_\alpha \partial_i g^{j\alpha} + \frac{1}{2} v_\alpha v_\beta \partial_i g^{\alpha \beta} = - \partial_i p \ \ \ \ \ (22)

\displaystyle D_t v_\beta = 0 \ \ \ \ \ (23)

\displaystyle u^j = g^{ij} v_i + g^{\alpha j} v_\alpha \ \ \ \ \ (24)

\displaystyle \partial_i u^i = 0. \ \ \ \ \ (25)
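(The identity doing the work in this last step is just the Leibniz rule; for instance, for the first of the quadratic terms one has

\displaystyle - g^{jk} v_k \partial_i v_j = - \frac{1}{2} \partial_i ( g^{jk} v_j v_k ) + \frac{1}{2} v_j v_k \partial_i g^{jk},

and the cross terms and the {\alpha\beta} terms are handled in exactly the same way, with the total-derivative pieces absorbed into the modified pressure.)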

 

At this point one can specialise to various special cases to obtain some possibly simpler dynamics. For instance, one could set {M_2} to be flat (so that {g^{\alpha \beta}} is constant), and set {v_i} and {p} to both vanish; then we obtain the simple-looking (but somewhat overdetermined) system

\displaystyle D_t v_\beta = 0

\displaystyle u^j = g^{\alpha j} v_\alpha

\displaystyle \partial_i u^i = 0.

This is basically the system I worked with in my previous paper. For instance, one could set one of the components of {v_\alpha}, say {v_\zeta}, to be identically {1}, and {g^{\zeta j}} to be an arbitrary divergence-free vector field for that component; then {u^j = g^{\zeta j}}, and all the other components {v_\beta} of {v} are transported by this static velocity field, leading for instance to exponential growth of vorticity if {g^{\zeta j}} has a hyperbolic fixed point and the initial data of the components of {v} other than {v_\zeta} are generic. (Alas, I was not able to modify this example to obtain something more dramatic than exponential growth, such as finite time blowup.)

Alternatively, one can set {g^{j\alpha}} to vanish, leaving one with

\displaystyle D_t v_i + \frac{1}{2} v_k v_j \partial_i g^{jk} + \frac{1}{2} v_\alpha v_\beta \partial_i g^{\alpha \beta} = - \partial_i p. \ \ \ \ \ (26)

\displaystyle D_t v_\beta = 0 \ \ \ \ \ (27)

\displaystyle u^j = g^{ij} v_i \ \ \ \ \ (28)

\displaystyle \partial_i u^i = 0. \ \ \ \ \ (29)

If {M_2} consists of a single coordinate {\zeta}, then on setting {\rho := \frac{1}{2} v_\zeta^2}, this simplifies to

\displaystyle D_t v_i + \frac{1}{2} v_k v_j \partial_i g^{jk} = - \partial_i p - \rho \partial_i g^{\zeta\zeta}

\displaystyle D_t \rho = 0

\displaystyle u^j = g^{ij} v_i

\displaystyle \partial_i u^i = 0.

If we take {M_1} to be {{\bf R} \times {\bf R}^+} with the Euclidean metric {g^{ij}}, and set {g^{\zeta\zeta} = y} (so that {M} has the metric (17)), then one obtains the Boussinesq system (1)–(3), giving the claimed embedding.

Now we perform a similar analysis for the axially symmetric Euler equations. The cylindrical coordinate system {(z,r,\theta)} is slightly inconvenient to work with because the volume form {r\ dz dr d\theta} is not Euclidean. We therefore introduce Turkington coordinates

\displaystyle (x,y,\zeta) := (z,r^2/2,\theta)

to rewrite the metric {dz^2 + dr^2 + r^2 d\theta^2} as

\displaystyle dx^2 + \frac{1}{2y} dy^2 + 2y d\zeta^2

so that the volume form {dx dy d\zeta} is now Euclidean, and the Euler equations become (14)–(16). Splitting as before, with {M_1} being the two-dimensional manifold parameterised by {x,y}, and {M_2} the one-dimensional manifold parameterised by {\zeta}, the symmetry reduction (18)–(21) gives us (26)–(29) as before. Explicitly, one has

\displaystyle D_t v_x = - \partial_x p

\displaystyle D_t v_y + v_y^2 - \frac{1}{4y^2} v_\zeta^2 = - \partial_y p

\displaystyle D_t v_\zeta = 0

\displaystyle u^x = v_x; u^y = 2y v_y

\displaystyle \partial_x u^x + \partial_y u^y = 0.

Setting {\omega := \partial_x v_y - \partial_y v_x} to eliminate the pressure {p}, we obtain

\displaystyle D_t \omega = \frac{1}{4y^2} \partial_x (v_\zeta^2)

\displaystyle D_t v_\zeta = 0

\displaystyle \omega := \partial_x(\frac{1}{2y} u^y) - \partial_y u^x

\displaystyle \partial_x u^x + \partial_y u^y = 0.

Since {u^y = r u^r}, {u^x = u^z}, {\partial_x = \partial_z}, {\partial_y = \frac{1}{r} \partial_r}, and {\rho = v_\zeta^2}, we obtain the system (5)–(8).
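(As a quick sanity check on this change of variables: since {y = r^2/2} one has {dy = r\,dr}, so that

\displaystyle dr^2 = \frac{1}{r^2} dy^2 = \frac{1}{2y} dy^2, \qquad r^2 d\theta^2 = 2y\, d\zeta^2,

recovering the metric above, while the volume form {r\, dz\, dr\, d\theta} becomes {dx\, dy\, d\zeta} as claimed.)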

Returning to the general form of (22)–(25), one can obtain an interesting transformation of this system by writing {g_{ij}} for the inverse of {g^{ij}} (caution: in general, this is not the restriction of the original metric on {M} to {M_1}), and defining the modified covelocity

\displaystyle \tilde v_i := g_{ij} u^j = v_i + g_{ij} g^{j\alpha} v_\alpha,

then by the Leibniz rule

\displaystyle D_t \tilde v_i = D_t v_i + v_\alpha D_t (g_{ij} g^{j\alpha})

\displaystyle = - \frac{1}{2} v_k v_j \partial_i g^{jk} - v_j v_\alpha \partial_i g^{j\alpha} - v_\alpha v_\beta \frac{1}{2} \partial_i g^{\alpha \beta} - \partial_i p

\displaystyle + v_\alpha u^k \partial_k( g_{ij} g^{j\alpha} )

Replacing the covelocity with the modified covelocity, this becomes

\displaystyle = - \frac{1}{2} \tilde v_k \tilde v_j \partial_i g^{jk} + \tilde v_k g_{jl} g^{l\alpha} v_\alpha \partial_i g^{jk} - \frac{1}{2} g_{km} g^{m\beta} v_\beta g_{jl} g^{l\alpha} v_\alpha \partial_i g^{jk}

\displaystyle - \tilde v_j v_\alpha \partial_i g^{j\alpha} + g_{jl} g^{l \beta} v_\alpha v_\beta \partial_i g^{j\alpha} - v_\alpha v_\beta \frac{1}{2} \partial_i g^{\alpha \beta} - \partial_i p

\displaystyle + v_\alpha u^k \partial_k( g_{ij} g^{j\alpha} ).

We thus have the system

\displaystyle D_t \tilde v_i + \frac{1}{2} \tilde v_k \tilde v_j \partial_i g^{jk} = - \partial_i p + R_i^{j\alpha} \tilde v_j v_\alpha + S_i^{\alpha \beta} v_\alpha v_\beta

\displaystyle D_t v_\alpha = 0

\displaystyle u^i = g^{ij} \tilde v_j

\displaystyle \partial_i u^i = 0

where

\displaystyle R_i^{j\alpha} := g_{kl} g^{l\alpha} \partial_i g^{jk} - \partial_i g^{j\alpha} + g^{jk} \partial_k( g_{il} g^{l\alpha} )

\displaystyle = g^{jk} (\partial_k (g^{l\alpha} g_{il}) - \partial_i(g^{l\alpha} g_{kl}))

\displaystyle S_i^{\alpha \beta} := -\frac{1}{2} g_{km} g^{m\beta} g_{jl} g^{l\alpha} \partial_i g^{jk} + g_{jl} g^{l\beta} \partial_i g^{j\alpha} - \frac{1}{2} \partial_i g^{\alpha\beta} = \frac{1}{2} \partial_i (g^{m\beta} g^{l\alpha} g_{ml} - g^{\alpha \beta})

and so if one writes

\displaystyle p' := p + \frac{1}{2} \tilde v_k \tilde v_j g^{jk}

\displaystyle \theta^\alpha_i := g^{l\alpha} g_{il}

\displaystyle \omega^\alpha_{ki} := \partial_k \theta^\alpha_i - \partial_i \theta^\alpha_k

\displaystyle F^{\alpha \beta} := \frac{1}{2} (g^{\alpha \beta} - g^{m\beta} g^{l\alpha} g_{ml})

we obtain

\displaystyle D_t \tilde v_i - u^j \partial_i \tilde v_j = - \partial_i p' + u^k v_\alpha \omega^\alpha_{ki} - v_\alpha v_\beta \partial_i F^{\alpha \beta}

\displaystyle D_t v_\alpha = 0

\displaystyle u^i = g^{ij} \tilde v_j

\displaystyle \partial_i u^i = 0.

For each {\alpha,\beta}, we can specify {F^{\alpha\beta}} as an arbitrary smooth function of space (it has to be positive definite to keep the manifold {M} Riemannian, but one can add an arbitrary constant to delete this constraint), and {\omega^\alpha_{ki}} as an arbitrary time-independent exact {2}-form. Thus we obtain an incompressible Euler system with two new forcing terms, one term {v_\alpha v_\beta \partial_i F^{\alpha \beta}} coming from passive scalars {v_\alpha, v_\beta}, and another term {u^k v_\alpha \omega^\alpha_{ki}} that sets up some rotation between the components {\tilde v_i}, with the rotation speed determined by a passive scalar {v_\alpha}.

Remark 1 As a sanity check, one can observe that one still has conservation of the kinetic energy, which is equal to

\displaystyle \frac{1}{2} \int g^{jk} v_j v_k + 2 g^{j\alpha} v_j v_\alpha + g^{\alpha \beta} v_\alpha v_\beta

and can be expressed in terms of {u^j} and {v_\alpha} as

\displaystyle \int g_{jk} u^j u^k + v_\alpha v_\beta (g^{\alpha \beta} - g^{j\alpha} g^{k\beta} g_{jk})

\displaystyle = \int u^j \tilde v_j + 2 v_\alpha v_\beta F^{\alpha \beta}.

One can check this is conserved by the above system (mainly due to the antisymmetry of {\omega}).

As one special case of this system, one can work with a one-dimensional fibre manifold {M_2}, and set {v_\zeta=1} and {F^{\zeta\zeta}=0} for the single coordinate {\zeta} of this manifold. This leads to the system

\displaystyle D_t \tilde v_i - u^j \partial_i \tilde v_j = - \partial_i p' + u^k \omega_{ki}

\displaystyle u^i = g^{ij} \tilde v_j

\displaystyle \partial_i u^i = 0.

where {\omega_{ki}} is some smooth time-independent exact {2}-form that one is free to specify. This resembles an Euler equation in the presence of a “magnetic field” that rotates the velocity of the fluid. I am currently experimenting with trying to use this to force some sort of blowup, though I have not succeeded so far (one would obviously have to use the pressure term at some point, for if the pressure vanished then one could keep things bounded using the method of characteristics).

January 27, 2018

Jordan EllenbergHall of Fame ballots: some quick and dirty clustering

Since all the public Hall of Fame ballots are now available online in machine-readable form, thanks to Ryan Thibodeaux, I thought I’d mess around with the built-in clustering functions in sklearn and see what I could see.

The natural metric on ballots is Hamming distance, I guess.  I first tried the AgglomerativeClustering package.  I didn’t tell it what metric to use on the ballots, but I’m assuming it’s using Euclidean distance, which for 0-1 vectors like these carries the same information as Hamming distance.  I asked AgglomerativeClustering to break the Hall of Fame voters into 2 clusters.  Guess what I found?  There’s a cluster of 159 voters who almost entirely voted for Barry Bonds and Roger Clemens, and a cluster of 83 who universally didn’t.  You won’t be surprised to hear that those who voted for Bonds and Clemens were also a lot more likely to vote for Manny Ramirez, Sammy Sosa, and Curt Schilling than the other cluster.

Which candidate was most notably unpopular among the Bonds-forgivers?  That would be Omar Vizquel.  He was on 53% of the steroid-rejector ballots!  Only 24% of the Bonds cluster thought Omar deserved Cooperstown.

Then I tried asking AgglomerativeClustering for four clusters.  The 83 anti-steroids folks all stayed together.  But the bigger group now split into Cluster 1 (61 ballots), Cluster 2 (46), and Cluster 3 (52).  Cluster 1 is the Joe Posnanski cluster.  Cluster 2 is the Nick Cafardo cluster.  Cluster 3 is the Tim Kurkjian cluster.

What differentiates these?  Cluster 1 is basically “people who voted for Larry Walker.”  The difference between Cluster 2 and Cluster 3 is more complicated.  The Cluster 2 ballots were much more likely to have:

Manny Ramirez, Sammy Sosa

and much less likely to have

Mike Mussina, Edgar Martinez, Curt Schilling

I’m not sure how to read this!  My best guess is that there’s no animus towards pitchers and DHs here; if you’re voting for Bonds and Clemens and Sosa and Ramirez and the guys who basically everybody voted for, you just don’t have that many votes left.  So I’d call Cluster 2 the “2000s-slugger loving cluster” and Cluster 3 everybody else.

Maybe I should say how you actually do this?  OK, first of all you munge the spreadsheet until you have a 0-1 matrix X where the rows are voters and the columns are baseball players.  Then your code looks like:

from sklearn.cluster import AgglomerativeClustering

model = AgglomerativeClustering(n_clusters=4)

model.fit(X)

model.labels_

which outputs

array([1, 0, 3, 1, 1, 1, 0, 0, 0, 0, 2, 1, 2, 1, 3, 0, 0, 0, 2, 1, 0, 3, 2,
1, 2, 1, 1, 3, 1, 3, 3, 0, 2, 2, 0, 1, 1, 1, 0, 2, 0, 0, 1, 2, 1, 3,
2, 2, 1, 3, 0, 2, 0, 3, 1, 0, 0, 2, 0, 2, 1, 2, 1, 0, 0, 0, 1, 0, 2,
0, 1, 1, 2, 0, 1, 3, 0, 0, 2, 0, 0, 0, 0, 0, 1, 1, 0, 0, 2, 0, 1, 0,
0, 0, 3, 1, 1, 0, 1, 0, 3, 1, 3, 3, 2, 0, 2, 1, 0, 2, 2, 3, 2, 3, 1,
3, 0, 3, 1, 0, 2, 1, 0, 0, 0, 1, 3, 1, 1, 3, 2, 3, 3, 2, 2, 0, 3, 3,
1, 0, 0, 2, 2, 3, 1, 3, 1, 2, 0, 1, 3, 1, 0, 0, 2, 3, 0, 2, 1, 0, 2,
1, 3, 3, 0, 1, 3, 1, 1, 0, 0, 2, 0, 1, 2, 0, 2, 1, 0, 0, 3, 3, 1, 1,
2, 3, 2, 0, 2, 0, 0, 1, 2, 1, 0, 3, 1, 3, 0, 3, 3, 3, 0, 3, 3, 3, 0,
2, 0, 3, 3, 0, 1, 0, 1, 2, 3, 2, 2, 0, 0, 0, 1, 3, 3, 1, 0, 0, 1, 3,
0, 2, 3, 1, 0, 0, 0, 0, 0, 3, 3, 3])

i.e. a partition of the voters into four groups.
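If you want to recompute the kind of summaries quoted above (cluster sizes, and how often each cluster voted for a given player), something like the following works; I’m assuming here that X is a numpy array and that you kept a list called players with the column names in the same order as the columns of X.

import numpy as np

labels = model.labels_
print(np.bincount(labels))                # number of voters in each cluster

def vote_rate(player):
    # fraction of each cluster's ballots that included the given player
    col = X[:, players.index(player)]
    return [round(col[labels == k].mean(), 2) for k in range(model.n_clusters)]

print(vote_rate("Larry Walker"))          # e.g. the Larry Walker split mentioned above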

(Agglomerative clustering naturally generates a hierarchical clustering, i.e. a tree with the HoF voters on the leaves; there must be some way to get sklearn to output this directly, but I don’t know it!)
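For what it’s worth, here is one way I believe you can coax that tree out of sklearn (a sketch, and an assumption about the sklearn version in use: fitting with distance_threshold=0 computes the full merge tree and exposes it through the children_ and distances_ attributes, which can then be packed into a scipy linkage matrix).

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster import hierarchy

full = AgglomerativeClustering(distance_threshold=0, n_clusters=None).fit(X)

# scipy wants a linkage matrix: each row is (child 1, child 2, merge distance, subtree size)
n = len(full.labels_)
sizes = np.zeros(len(full.children_))
for i, (a, b) in enumerate(full.children_):
    sizes[i] = (1 if a < n else sizes[a - n]) + (1 if b < n else sizes[b - n])
Z = np.column_stack([full.children_, full.distances_, sizes]).astype(float)
hierarchy.dendrogram(Z)                   # the tree, with the HoF voters on the leaves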

Of course, if you have a 0-1 matrix, you don’t have to cluster the rows — you can cluster the columns! This time, just for kicks, I used the hierarchical clustering package in scipy.  I think this one is just agglomerating too.  But it has a nice output package!  Here, Y is the transpose of X above, a 0-1 matrix telling us which players were on which ballots.  Then:

>> import scipy
>>> Z = scipy.cluster.hierarchy.linkage(Y)
>>> Dend = scipy.cluster.hierarchy.dendrogram(Z,labels=(a list of player names))
>>> plt.xticks(ha=’right’)
>>> plt.show()

gives

Not bad! You can see that Bonds and Clemens form their own little cluster in red.  There is not that much structure here — maybe worth noting that this method may be dominated by the variable “number of votes received.”  Still, the two stories we’ve told here do seem to have some coarse features in common:  Bonds/Clemens are a cluster, and Larry Walker voting is kind of its own variable independent of the rest of the ballot.

OK, this picture was nice so I couldn’t resist doing one for the voters:

Pretty hard to read!  I think that black cluster on the end is probably the no-Bonds-no-Clemens gang.  But what I want you to notice is that there’s one outlying node all the way over to the left, which the clustering algorithm identifies as the weirdest ballot made public.  It’s Sadiel LeBron, who voted for Clemens, Sosa, and Ramirez, but not Bonds.  And also he voted for Andruw Jones and Omar Vizquel!  Yeah, that’s a weird ballot.

I’m sure this isn’t the right way to visualize a 0-1 matrix.  Here’s what I’d do if I felt like spending a little more time:  write something up to look for a positive definite rank-2 matrix A such that

A_{ij} > A_{ik}

whenever voter i voted for player j but not player k.  That models the idea of each player being a point in R^2 and each voter being a point in R^2 and then voters vote for every player whose dot product with them is large enough.  My intuition is that this would be a better way of plotting ballots in a plane than just doing PCA on the 0-1 matrix.  But maybe it would come out roughly the same, who knows?
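Here is roughly what that might look like as a numpy sketch (a rough version of the idea, with made-up parameter names): fit X[i, j] ≈ sigmoid(u_i · v_j − t_i) by gradient descent, so every voter and every player gets a point in the plane, plus a per-voter threshold.

import numpy as np

def embed_ballots(X, dim=2, steps=5000, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_voters, n_players = X.shape
    U = 0.1 * rng.standard_normal((n_voters, dim))     # voter points in the plane
    V = 0.1 * rng.standard_normal((n_players, dim))    # player points in the plane
    t = np.zeros((n_voters, 1))                        # per-voter vote thresholds
    for _ in range(steps):
        P = 1 / (1 + np.exp(-(U @ V.T - t)))           # predicted vote probabilities
        G = (P - X) / X.size                           # gradient of the mean logistic loss
        dU, dV, dt = G @ V, G.T @ U, -G.sum(axis=1, keepdims=True)
        U -= lr * dU
        V -= lr * dV
        t -= lr * dt
    return U, V, t

# U, V, t = embed_ballots(X)    # X is the 0-1 voters-by-players matrix from above
# plt.scatter(*V.T)             # the players, plotted in the plane

Plotting the rows of V would then be the analogue of the PCA picture.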

Presumably there are known best practices for clustering subsets of a fixed set (or, more generally, finding good embeddings into visualizable metric spaces like the plane.)  Tell me about them!

 

January 26, 2018

Matt von HippelThe Rippling Pond Universe

[Background: Someone told me they couldn’t imagine popularizing Quantum Field Theory in the same flashy way people popularize String Theory. Naturally I took this as a challenge. Please don’t take any statements about what “really exists” here too seriously, this isn’t intended as metaphysics, just metaphor.]

 

You probably learned about atoms in school.

Your teacher would have explained that these aren’t the same atoms the ancient Greeks imagined. Democritus thought of atoms as indivisible, unchanging spheres, the fundamental constituents of matter. We know, though, that atoms aren’t indivisible. They’re clouds of electrons, buzzing in their orbits around a nucleus of protons and neutrons. Chemists can divide the electrons from the rest, nuclear physicists can break the nucleus. The atom is not indivisible.

And perhaps your teacher remarked on how amazing it is, that the nucleus is such a tiny part of the atom, that the atom, and thus all solid matter, is mostly empty space.

 

You might have learned that protons and neutrons, too, are not indivisible. That each proton, and each neutron, is composed of three particles called quarks, particles which can be briefly freed by powerful particle colliders.

And you might have wondered, then, even if you didn’t think to ask: are quarks atoms? The real atoms, the Greek atoms, solid indestructible balls of fundamental matter?

 

They aren’t, by the way.

 

You might have gotten an inkling of this, learning about beta decay. In beta decay, a neutron transforms, becoming a proton, an electron, and a neutrino. Look for an electron inside a neutron, and you won’t find one. Even if you look at the quarks, you see the same transformation: a down quark becomes an up quark, plus an electron, plus a neutrino. If quarks were atoms, indivisible and unchanging, this couldn’t happen. There’s nowhere for the electron to hide.

 

In fact, there are no atoms, not the way the Greeks imagined. Just ripples.


Picture the universe as a pond. This isn’t a still pond: something has disturbed it, setting ripples and whirlpools in motion. These ripples and whirlpools skim along the surface of the pond, eddying together and scattering apart.

Our universe is not a simple pond, and so these are not simple ripples. They shine and shimmer, each with their own bright hue, colors beyond our ordinary experience that mix in unfamiliar ways. The different-colored ripples interact, merge and split, and the pond glows with their light.

Stand back far enough, and you notice patterns. See that red ripple, that stays together and keeps its shape, that meets other ripples and interacts in predictable ways. You might imagine the red ripple is an atom, truly indivisible…until it splits, transforms, into ripples of new colors. The quark has changed, down to up, an electron and a neutrino rippling away.

All of our world is encoded in the colors of these ripples, each kind of charge its own kind of hue. With a wink (like your teacher’s, telling you of empty atoms), I can tell you that distance itself is just a kind of ripple, one that links other ripples together. The pond’s very nature as a place is defined by the ripples on it.

 

This is Quantum Field Theory, the universe of ripples. Democritus said that in truth there are only atoms and the void, but he was wrong. There are no atoms. There is only the void. It ripples and shimmers, and each of us lives as a collection of whirlpools, skimming the surface, seeming concrete and real and vital…until the ripples dissolve, and a new pattern comes.

January 25, 2018

Alexey PetrovRapid-response (non-linear) teaching: report

Some of you might remember my previous post about non-linear teaching, where I described a new teaching strategy that I came up with and was about to implement in teaching my undergraduate Classical Mechanics I class. Here I want to report on the outcomes of this experiment and share some of my impressions on teaching.

Course description

Our Classical Mechanics class is a gateway class for our physics majors. It is the first class they take after they are done with general physics lectures. So the students are already familiar with the (simpler version of the) material they are going to be taught. The goal of this class is to start molding physicists out of physics students. It is a rather small class (max allowed enrollment is 20 students; I had 22 in my class), which makes professor-student interaction rather easy.

Rapid-response (non-linear) teaching: generalities

To motivate the method that I proposed, I looked at some studies in experimental psychology, in particular in memory and learning studies. What I was curious about is how much is currently known about the process of learning and what suggestions I can take from the psychologists who know something about the way our brain works in retaining the knowledge we receive.

As it turns out, there are some studies on this subject (I have references, if you are interested). The earliest ones go back to the 1880s, when German psychologist Hermann Ebbinghaus hypothesized about the way our brain retains information over time. The “forgetting curve” that he introduced gives an approximate representation of information retention as a function of time. His studies have been replicated with similar conclusions in recent experiments.

The upshot of these studies is that loss of learned information is pretty much exponential; as can be seen from the figure on the left, in about a day we only retain about 40% of what we learned.
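To put a toy formula on that: the exponential form R(t) = exp(-t/S) is just the commonly quoted fit for the forgetting curve, not something taken from the studies mentioned here, but calibrating it to the 40%-after-one-day figure gives a feel for the numbers.

import math

# Toy forgetting curve: assume R(t) = exp(-t / S) and calibrate S so that
# retention after one day matches the ~40% figure above.
S = -1.0 / math.log(0.40)                  # time constant, roughly 1.1 days
for days in (0.25, 1, 2, 7):
    print(f"after {days} day(s): about {math.exp(-days / S):.0%} retained")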

Psychologists also learned that one of the ways to overcome the loss of information is to (meaningfully) retrieve it: this is how learning  happens. Retrieval is critical for robust, durable, and long-term learning. It appears that every time we retrieve learned information, it becomes more accessible in the future. It is, however, important how we retrieve that stored information: simple re-reading of notes or looking through the examples will not be as effective as re-working the lecture material. It is also important how often we retrieve the stored info.

So, here is what I decided to change in the way I teach my class in light of the above-mentioned information (no pun intended).

Rapid-response (non-linear) teaching: details

To counter the single-day information loss, I changed the way homework is assigned: instead of assigning homework sets with 3-4-5 problems per week, I introduced two types of homework assignments: short homeworks and projects.

Short homework assignments are single-problem assignments given after each class that must be done by the next class. They are designed such that a student needs to re-derive material that was discussed previously in class (with small new twist added). For example, if the block-down-to-incline problem was discussed in class, the short assignment asks to redo the problem with a different choice of coordinate axes. This way, instead of doing an assignment in the last minute at the end of the week, the students are forced to work out what they just learned in class every day (meaningful retrieval)!

The second type of assignment, the project homework, is designed to develop an understanding of how the topics in a given chapter relate to each other. There are as many project assignments as there are chapters. Students get two weeks to complete them.

At the end, the students get to solve approximately the same number of problems over the course of the semester.

For a professor, the introduction of short homework assignments changes the way class material is presented. Depending on how students performed on the previous short homework, I adjusted the material (both speed and volume) that we discussed in class. I also designed examples for the future sections in such a way that I could repeat parts of the topic that posed some difficulties in comprehension. Overall, instead of a usual “linear” propagation of the course, we moved along something akin to helical motion, returning and spending more time on topics that students found more difficult (hence “rapid-response or non-linear” teaching).

Other things were easy to introduce: for instance, using Socrates’ method in doing examples. The lecture itself was an open discussion between the prof and students.

Outcomes

So, I implemented this method in teaching the Classical Mechanics I class in the Fall 2017 semester. It was not an easy exercise, mostly because it was the first time I was teaching this class and had no grader help. I would say the results confirmed my expectations: the introduction of short homework assignments helps students perform better on the exams. Now, my statistics are still limited: I only had 20 students in my class. Yet, among the students there were several who decided either to largely ignore the short homework assignments or to do them irregularly. They were given zero points for each missed short assignment. All students generally did well on their project assignments, yet there appears to be some correlation (see graph above) between the total number of points acquired on short homework assignments and exam performance (measured by the total score on the Final and two midterms). This makes me think that the short assignments were beneficial for students. I plan to teach this course again next year, which will increase my statistics.

I was quite surprised that my students generally liked this way of teaching. In fact, they were disappointed that I decided not to apply this method for the Mechanics II class that I am teaching this semester. They also found that problems assigned in projects were considerably harder than the problems from the short assignments (this is how it was supposed to be).

For me, this was not an easy semester. I had to develop my set of lectures — so big thanks go to my colleagues Joern Putschke and Rob Harr, who made their notes available. I spent a lot of time preparing this course, which, I think, affected my research output last semester. Yet, most of the difficulties were Wayne State-specific: Wayne State does not provide TAs for small classes, so I had not only to design all the homework assignments, but also to grade them (on top of developing the lectures from the ground up). During the semester it was important to grade the short assignments the same day I received them in order to re-tune the lectures, and this did take a lot of my time. I would say TAs would certainly help to run this course — so I’ll be applying for some internal WSU educational grants to continue development of this method. I plan to employ it again next year to teach Classical Mechanics.

January 19, 2018

Matt von HippelTutoring at GGI

I’m still at the Galileo Galilei Institute this week, tutoring at the winter school.

At GGI’s winter school, each week is featuring a pair of lecturers. This week, the lectures alternate between Lance Dixon covering the basics of amplitudeology and Csaba Csaki, discussing ways in which the Higgs could be a composite made up of new fundamental particles.

Most of the students at this school are phenomenologists, physicists who make predictions for particle physics. I’m an amplitudeologist, I study the calculation tools behind those predictions. You’d think these would be very close areas, but it’s been interesting seeing how different our approaches really are.

Some of the difference is apparent just from watching the board. In Csaki’s lectures, the equations that show up are short, a few terms long at most. When amplitudes show up, it’s for their general properties: how many factors of the coupling constant, or the multipliers that show up with loops. There aren’t any long technical calculations, and in general they aren’t needed: he’s arguing about the kinds of physics that can show up, not the specifics of how they give rise to precise numbers.

In contrast, Lance’s board filled up with longer calculations, each with many moving parts. Even things that seem simple from our perspective take a decent amount of board space to derive, and involve no small amount of technical symbol-shuffling. For most of the students, working out an amplitude this complicated was an unfamiliar experience. There are a few applications for which you need the kind of power that amplitudeology provides, and a few students were working on them. For the rest, it was a bit like learning about a foreign culture, an exercise in understanding what other people are doing rather than picking up a new skill themselves. Still, they made a strong go at it, and it was enlightening to see the pieces that ended up mattering to them, and to hear the kinds of questions they asked.

January 17, 2018

Sean CarrollBeyond Falsifiability

I have a backlog of fun papers that I haven’t yet talked about on the blog, so I’m going to try to work through them in reverse chronological order. I just came out with a philosophically-oriented paper on the thorny issue of the scientific status of multiverse cosmological models:

Beyond Falsifiability: Normal Science in a Multiverse
Sean M. Carroll

Cosmological models that invoke a multiverse – a collection of unobservable regions of space where conditions are very different from the region around us – are controversial, on the grounds that unobservable phenomena shouldn’t play a crucial role in legitimate scientific theories. I argue that the way we evaluate multiverse models is precisely the same as the way we evaluate any other models, on the basis of abduction, Bayesian inference, and empirical success. There is no scientifically respectable way to do cosmology without taking into account different possibilities for what the universe might be like outside our horizon. Multiverse theories are utterly conventionally scientific, even if evaluating them can be difficult in practice.

This is well-trodden ground, of course. We’re talking about the cosmological multiverse, not its very different relative the Many-Worlds interpretation of quantum mechanics. It’s not the best name, as the idea is that there is only one “universe,” in the sense of a connected region of space, but of course in an expanding universe there will be a horizon past which it is impossible to see. If conditions in far-away unobservable regions are very different from conditions nearby, we call the collection of all such regions “the multiverse.”

There are legitimate scientific puzzles raised by the multiverse idea, but there are also fake problems. Among the fakes is the idea that “the multiverse isn’t science because it’s unobservable and therefore unfalsifiable.” I’ve written about this before, but shockingly not everyone immediately agreed with everything I have said.

Back in 2014 the Edge Annual Question was “What Scientific Theory Is Ready for Retirement?”, and I answered Falsifiability. The idea of falsifiability, pioneered by philosopher Karl Popper and adopted as a bumper-sticker slogan by some working scientists, is that a theory only counts as “science” if we can envision an experiment that could potentially return an answer that was utterly incompatible with the theory, thereby consigning it to the scientific dustbin. Popper’s idea was to rule out so-called theories that were so fuzzy and ill-defined that they were compatible with literally anything.

As I explained in my short write-up, it’s not so much that falsifiability is completely wrong-headed, it’s just not quite up to the difficult task of precisely demarcating the line between science and non-science. This is well-recognized by philosophers; in my paper I quote Alex Broadbent as saying

It is remarkable and interesting that Popper remains extremely popular among natural scientists, despite almost universal agreement among philosophers that – notwithstanding his ingenuity and philosophical prowess – his central claims are false.

If we care about accurately characterizing the practice and principles of science, we need to do a little better — which philosophers work hard to do, while some physicists can’t be bothered. (I’m not blaming Popper himself here, nor even trying to carefully figure out what precisely he had in mind — the point is that a certain cartoonish version of his views has been elevated to the status of a sacred principle, and that’s a mistake.)

After my short piece came out, George Ellis and Joe Silk wrote an editorial in Nature, arguing that theories like the multiverse served to undermine the integrity of physics, which needs to be defended from attack. They suggested that people like me think that “elegance [as opposed to data] should suffice,” that sufficiently elegant theories “need not be tested experimentally,” and that I wanted to “to weaken the testability requirement for fundamental physics.” All of which is, of course, thoroughly false.

Nobody argues that elegance should suffice — indeed, I explicitly emphasized the importance of empirical testing in my very short piece. And I’m not suggesting that we “weaken” anything at all — I’m suggesting that we physicists treat the philosophy of science with the intellectual care that it deserves. The point is not that falsifiability used to be the right criterion for demarcating science from non-science, and now we want to change it; the point is that it never was, and we should be more honest about how science is practiced.

Another target of Ellis and Silk’s ire was Richard Dawid, a string theorist turned philosopher, who wrote a provocative book called String Theory and the Scientific Method. While I don’t necessarily agree with Dawid about everything, he does make some very sensible points. Unfortunately he coins the term “non-empirical theory confirmation,” which was an extremely bad marketing strategy. It sounds like Dawid is saying that we can confirm theories (in the sense of demonstrating that they are true) without using any empirical data, but he’s not saying that at all. Philosophers use “confirmation” in a much weaker sense than that of ordinary language, to refer to any considerations that could increase our credence in a theory. Of course there are some non-empirical ways that our credence in a theory could change; we could suddenly realize that it explains more than we expected, for example. But we can’t simply declare a theory to be “correct” on such grounds, nor was Dawid suggesting that we could.

In 2015 Dawid organized a conference on “Why Trust a Theory?” to discuss some of these issues, which I was unfortunately not able to attend. Now he is putting together a volume of essays, both from people who were at the conference and some additional contributors; it’s for that volume that this current essay was written. You can find other interesting contributions on the arxiv, for example from Joe Polchinski, Eva Silverstein, and Carlo Rovelli.

Hopefully with this longer format, the message I am trying to convey will be less amenable to misconstrual. Nobody is trying to change the rules of science; we are just trying to state them accurately. The multiverse is scientific in an utterly boring, conventional way: it makes definite statements about how things are, it has explanatory power for phenomena we do observe empirically, and our credence in it can go up or down on the basis of both observations and improvements in our theoretical understanding. Most importantly, it might be true, even if it might be difficult to ever decide with high confidence whether it is or not. Understanding how science progresses is an interesting and difficult question, and should not be reduced to brandishing bumper-sticker mottos to attack theoretical approaches to which we are not personally sympathetic.

Mark Chu-CarrollInference with Sets in Type Theory

It’s been quite a while since I did any meaningful writing on type theory. I spent a lot of time introducing the basic concepts, and trying to explain the intuition behind them. I also spent some time describing what I think of as the platform: stuff like arity theory, which we need for talking about type construction. But once we moved beyond the basic concepts, I ran into a bit of a barrier – not so much in understanding type theory, but in finding ways of presenting it that will be approachable.

I’ve been struggling to figure out how to move forward in my exploration of type theory. The logical next step is working through the basics of intuitionistic logic with type theory semantics. The problem is, that’s pretty dry material. I’ve tried to put together a couple of approaches that skip over this, but it’s really necessary.

For someone like me, coming from a programming language background, type theory really hits its stride when we look at type systems and particularly type inference. But you can’t understand type inference without understanding the basic logic. In fact, you can’t describe the algorithm for type inference without referencing the basic inference rules of the underlying logic. Type inference is nothing but building type theoretic proofs based on a program.

So here we are: we need to spend some time looking at the basic logic of type theory. We’ve looked at the basic concepts that underlie the syntax and semantics, so what we need to do next is learn the basic rules that we use to build logical statements in the realm of type theory. (If you’re interested in more detail, this is material from chapter 5 of “Programming in Martin-Löf’s Type Theory”, which is the text I’m using to learn this material.)

Martin-Löf’s type theory is a standard intuitionistic predicate logic, so we’ll go through the rules using standard sequent notation. Each rule is a sequent, which looks sort of like a long fraction. The “numerator” section is a collection of things which we already know are true; the “denominator” is something that we can infer given those truths. Each statement has the form A[\Gamma], where A is a statement, and \Gamma is a set of assumptions. For example, F(a) [a \in A, \Delta] means that F(a) is true, provided we’re in a context that includes a \in A.

Personally, I find that this stuff feels very abstract until you take the logical statements, and interpret them in terms of programming. So throughout this post, I’ll do that for each of the rules.

With that introduction out of the way, let’s dive in to the first set of rules.

Simple Introductions

We’ll start off with a couple of really easy rules, which allow us to introduce a variable given a type, or a type given a variable.

Introducing Elements

\frac{A \,\text{type}}{a \in A \,[a \in A]}

This is an easy one. It says that if we know that A is a type, then we can introduce the statement that a \in A, and add that as an assumption in our context. What this means is also simple: since our definition of type says that it’s only a type if it has an element, then if we know that A is a type, we know that there must be an element of A, and so we can write statements using it.

If you think of this in programming terms, the statement A \text{type} is saying that A is a type. To be a valid type, there must be at least one value that belongs to the type. So you’re allowed to introduce a variable that can be assigned a value of the type.

Introducing Propositions as Types

 \frac{a \in A \, []}{A \, \text{true}}

This is almost the mirror image of the previous. A type and a true proposition are the same thing in our type theory: a proposition is just a type, which is a set with at least one member. So if we know that there’s a member of the set A, then A is both a type and a true proposition.

Equality Rules

We start with the three basic rules of equality: equality is reflexive, symmetric, and transitive.

Reflexivity

 \frac{a \in A}{a = a \in A}

 \frac{A \, \text{type}}{A = A}

If a is an element of a type A, then a is equal to itself in type A; and if A is a type, then A is equal to itself.

The only confusing thing about this is just that when we talk about an object in a type, we make reference to the type that it’s a part of. This makes sense if you think in terms of programming: you need to declare the type of your variables. “3: Int” doesn’t necessarily mean the same object as “3: Real”; you need the type to disambiguate the statement. So within type theory, we always talk about values with reference to the type that they’re a part of.
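A tiny Python aside that captures some of the flavor (my own illustration, and only an analogy, since Python is not dependently typed): whether two values count as interchangeable depends on the type you view them in.

x, y = 3, 3.0
print(x == y)                 # True: viewed simply as numbers, they are "equal"
print(type(x) is type(y))     # False: as an int and a float, they are different objects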

Symmetry

\frac{a = b \in A}{b = a \in A}

\frac{A = B}{B = A}

No surprises here – standard symmetry.

Transitivity

\frac{a = b \in A \quad b = c \in A}{a = c \in A}

\frac{A = B \quad B = C}{A = C}

Type Equality

\frac{a \in A \quad A = B}{a \in B}

\frac{a = b \in A \quad A = B}{a = b \in B}

These are pretty simple, and follow from the basic equality rules. If we know that a is a member of the type A, and we know that the type A equals the type B, then obviously a is also a member of B. Along the same lines, if we know that a=b in type A, and A equals B, then a=b in the type B.

Substitution Rules

We’ve got some basic rules about how to formulate some simple meaningful statements in the logic of our type theory. We still can’t do any interesting reasoning; we haven’t built up enough inference rules. In particular, we’ve only been looking at simple, atomic statements using parameterless predicates.

We can use those basic rules to start building upwards, to get to parametric statements, by using substitution rules that allow us to take a parametric statement and reason with it using the non-parametric rules we talked about above.

For example, a parametric statement can be something like C(x) \, \text{type} [x \in A], which says that applying C to a value x which is a member of type A produces a value which is a type. We can use that to produce new inference rules like the ones below.

 \frac{C(x) \, \text{type} [x \in A] \quad a \in A}{C(a) \, \text{type}}

This says that if we know that given a of type A, C will produce a type; and we know that the value a is of type A, then C(a) will be a type. In logical terms, it’s pretty straightforward; in programming terms it’s even clearer: if C is a function on type A, and we pass it a value of type A, it will produce a result. In other words, C(a) is defined for all values of type A.

 \frac{C(x) \, \text{type}[x \in A] \quad a = b \in A}{C(a) = C(b)}

This is even simpler: if C is a function on type A, then given two values that are equal in type A, C will produce the same result for those values.

Of course, I’m lying a bit. In this stuff, C isn’t really a function. It’s a logical statement; C(x) isn’t quite a function. It’s a logical statement which includes the symbol x; when we say C(a), what we mean is the logical statement C, with the object a substituted for the symbol x. But I think the programming metaphors help clarify what it means.

Using those two, we can generate more:

 \frac{c(x) \in C(x) [x \in A] \quad a \in A}{c(a) \in C(a)}

This one becomes interesting. C(x) is a proposition which is parametric in x. Then c(x) is a proof-element: it’s an instance of C(x) which proves that C(x) is a type, and we can see c as a computation which, given an element a of A, produces an instance of C(a). Then what this judgement says is that given an instance a of type A, we know that c(a) is an instance of type C(a). This will become very important later on, when we really get into type inference and quantification and parametric types.

\frac{c(x) \in C(x) [x \in A] \quad a = b \in A}{c(a)=c(b)\in C(a)}

This is just a straightforward application of equality to proof objects.
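To push the programming metaphor a little further, here is a minimal Python sketch of these substitution judgements (entirely my own illustration, not from the book): model a type as the set of its elements, C as a function from elements of A to types, and c as a function producing a proof element of C(a) for each a in A.

A = {1, 2, 3}                             # a type A, given by its elements

def C(x):
    # C(x) type [x in A]: for each x in A, produce a (nonempty) type
    return {(x, k) for k in range(x)}

def c(x):
    # c(x) in C(x) [x in A]: a proof element of C(x) for each x in A
    return (x, 0)

a, b = 2, 2                               # a = b in A
assert a in A
assert isinstance(C(a), set) and C(a)     # C(a) is a type (a nonempty set)
assert c(a) in C(a)                       # the judgement c(a) in C(a)
assert C(a) == C(b) and c(a) == c(b)      # equal arguments give equal propositions and proofs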

There’s more of these kinds of rules, but I’m going to stop here. My goal isn’t to get you to know every single judgement in the intuitionistic logic of type theory, but to give you a sense of what they mean.

That brings us to the end of the basic inference rules. The next things we’ll need to cover are ways of constructing new types from existing ones. The two main tools for that are enumeration types (basically, types consisting of a group of ordered values) and cartesian products of multiple types. With those, we’ll be able to construct most of the types we’ll want to use in programming languages.

January 04, 2018

Mark Chu-CarrollA Gentle Explanation of the Intel Speculative Execution CPU Bug

There’s a lot of talk today about the recent discovery of a significant bug in the Intel CPUs that so many of us use every day. It’s an interesting problem, and understanding it requires knowing a little bit about how CPUs work, so I thought I’d take a shot at writing an explainer.

Let me start with a huge caveat: this involves a lot of details about how CPUs work, and in order to explain it, I’m going to simplify things to an almost ridiculous degree, to try to come up with an explanation that’s comprehensible to a lay person. I’m never deliberately lying about how things work, but at times, I’m simplifying enough that it will be infuriating to an expert. I’m doing my best to explain my understanding of this problem in a way that most people will be able to understand, but I’m bound to oversimplify in some places, and get details wrong in others. I apologize in advance.

It’s also early days for this problem. Intel is still trying to keep the exact details of the bug quiet, to make it harder for dishonest people to exploit it. So I’m working from the information I’ve been able to gather about the issue so far. I’ll do my best to correct this post as new information comes out, but I can only do that when I’m not at work!

That said: what we know so far is that the Intel bug involves non-kernel code being able to access cached kernel memory through the use of something called speculative execution.

To an average person, that means about as much as a problem in the flux thruster atom pulsar electrical ventury space-time implosion field generator coil.

Let’s start with a quick overview of a modern CPU.

The CPU is, in simple terms, the brain of a computer. It’s the component that actually does computations. It reads a sequence of instructions from memory, and then follows those instructions to perform some computation on some values, which are also stored in memory. This is a massive simplification, but basically, you can think of the CPU as a pile of hardware that runs a fixed program:

def simplified_cpu_main_loop():
  IP = 0
  while True:
     # fetch the next instruction from the address in the instruction pointer
     (op, in1, in2, out) = fetch(IP)
     # fetch the instruction's two operands from memory
     val1 = fetch(in1)
     val2 = fetch(in2)
     # perform the operation; it produces a result and the next IP
     result, IP = perform(op, val1, val2)
     # store the result back into memory
     store(result, out)

There’s a variable called the instruction pointer (abbreviated IP) built into the CPU that tells it where to fetch the next instruction from. Each time the clock ticks, the CPU fetches an instruction from the memory address stored in the instruction pointer, fetches the arguments to that instruction from cells in memory, performs the operation on those arguments, and then stores the result into another cell in the computer memory. Each individual operation produces both a result, and a new value for the instruction pointer. Most of the time, you just increment the instruction pointer to look at the next instruction, but for comparisons or branches, you can change it to something else.

What I described above is how older computers really worked. But as CPUs got faster, chipmakers ran into a huge problem: the CPU can perform its operations faster and faster every year. But retrieving a value from memory hasn’t gotten faster at the same rate as executing instructions. The exact numbers don’t matter, but to give you an idea, a modern CPU can execute an instruction in less than one nanosecond, but fetching a single value from memory takes more than 100 nanoseconds. In the scheme we described above, you need to fetch the instruction from memory (one fetch), and then fetch two parameters from memory (another two fetches), execute the instruction (1 nanosecond), and then store the result back into memory (one store). Assuming a store is no slower than a fetch, that means that for one nanosecond of computation time, the CPU needs to do 3 fetches and one store for each instruction. That means that the CPU is waiting, idle, for at least 400ns, during which it could have executed another 400 instructions, if it didn’t need to wait for memory.

That’s no good, obviously. There’s no point in making a fast CPU if all it’s going to do is sit around and wait for slow memory. But designers found ways to work around that, by creating ways to do a lot of computation without needing to pause to wait for things to be retrieved from or stored to memory.

One of those tricks was to add registers to the CPUs. A register is a cell of memory inside of the CPU itself. Early processors (like the 6502 that was used by the Apple II) had one main register called an accumulator. Every arithmetic instruction would work by retrieving a value from memory, and then performing some arithmetic operation on the value in the accumulator and the value retrieved from memory, and leave the result in the accumulator. (So, for example, if 0x1234 were the address of the variable X, you could add the value of X to the accumulator with the instruction “ADD (1234)”.) More modern CPUs added many registers, so that you can keep all of the values that you need for some computation in different registers. Reading values from or writing values to registers is lightning fast – in fact, it’s effectively free. So you structure your computations so that they load up the registers with the values they need, then do the computation in registers, and then dump the results out to memory. Your CPU can run a fairly long sequence of instructions without ever pausing for a memory fetch.

Expanding on the idea of putting memory into the CPU, they added ways of reducing the cost of working with memory by creating copies of the active memory regions on the CPU. These are called caches. When you try to retrieve something from memory, if it’s in the cache, then you can access it much faster. When you access something from a memory location that isn’t currently in the cache, the CPU will copy a chunk of memory including that location into the cache.

You might ask: if you can make the cache fast, why not just make all of memory like the cache? The answer is that the time it takes in hardware to retrieve something from memory increases with the amount of memory that you can potentially access. Pointing at a cache with 1K of memory is lightning fast. Pointing at a cache with 1 megabyte of memory is much slower than the 1K cache, but much faster than a 100MB cache; pointing at a cache with 100MB is even slower, and so on.

So what we actually do in practice is have multiple tiers of memory. We have the registers (a very small set – a dozen or so memory cells, which can be accessed instantly); a level-0 cache (on the order of 8k in Intel’s chips), which is pretty fast; a level-1 cache (100s of kilobytes), an L2 cache (megabytes), L3 (tens of megabytes), and now even L4 (100s of megabytes). If something isn’t in L0 cache, then we look for it in L1; if we can’t find it in L1, we look in L2, and so on, until if we can’t find it in any cache, we actually go out to the main memory.
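To make that tiered lookup concrete, here’s a toy sketch in the same pseudo-Python style as the code above. This is nothing like how the hardware actually does it (real caches work on fixed-size cache lines, in parallel circuitry), but it captures the logic of “check each level in turn, and fall back to main memory”:

def load(address, caches, main_memory):
  # caches is a dict of per-level dicts, e.g. {"L0": {...}, "L1": {...}, ...}
  for level in ["L0", "L1", "L2", "L3", "L4"]:
    if address in caches[level]:
      return caches[level][address]   # cache hit: fast
  value = main_memory[address]        # missed every level: slow trip to RAM
  caches["L0"][address] = value       # copy it into the closest cache
  return value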

There’s more we can do to make things faster. In the CPU, you can’t actually execute an entire instruction all at once – it’s got multiple steps. For a (vastly simplified) example, in the pseudocode above, you can think of each instruction as four phases: (1) decode the instruction (figuring out what operation it performs, and what its parameters are), (2) fetch the parameters, (3) perform the operation’s internal computation, and (4) write out the result. By taking advantage of that, you can set up your CPU to actually do a lot of work in parallel. If, to keep the example small, there are three phases to executing an instruction, then you can execute phase one of instruction one in one cycle; phase one of instruction two and phase two of instruction one in the next cycle; phase one of instruction three, phase two of instruction two, and phase three of instruction one in the third cycle. This process is called pipelining.
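Here’s a little toy demo of how a four-stage pipeline overlaps work (my own illustration, in the same pseudo-Python spirit as the other code here). In cycle t, instruction i is in stage t - i, so once the pipeline is full, one instruction finishes every cycle:

def show_pipeline(num_instructions=4, num_cycles=8):
  stages = ["decode", "fetch args", "execute", "write back"]
  for cycle in range(num_cycles):
    in_flight = []
    for instr in range(num_instructions):
      stage = cycle - instr          # which stage this instruction is in now
      if 0 <= stage < len(stages):
        in_flight.append("instr %d: %s" % (instr, stages[stage]))
    print("cycle %d: %s" % (cycle, "; ".join(in_flight)))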

To really take advantage of pipelining, you need to keep the pipeline full. If your CPU has a four-stage pipeline, then ideally, you always know what the next four instructions you’re going to execute are. If you’ve got the machine code version of an if-then-else branch, when you start the comparison, you don’t know what’s going to come next until you finish it, because there are two possibilities. That means that when you get to phase 2 of your branch instruction, you can’t start phase one of the next instruction until the current one is finished – which means that you’ve lost the advantage of your pipeline.

That leads to another neat trick that people play in hardware, called branch prediction. You can make a guess about which way a branch is going to go. An easy way to understand this is to think of some numerical code:

import math

def run_branch_prediction_demo(a, b):
  # a and b are assumed to be 1000x1000 tables indexed from 1
  for i in range(1, 1001):
     for j in range(1, 1001):
          q = a[i][j] * math.sqrt(b[i][j])

After each iteration of the inner loop, you check to see if j == 1000. If it isn’t, you branch back to the beginning of that loop. 999 times out of 1000, you branch back to the beginning of the loop, and only once you don’t. So you can predict that you’ll take the backward branch, and you can start executing the early phases of the first instructions of the next iteration. That means that, most of the time you’re running the loop, your pipeline is full, and you’re executing your computation quickly!

The catch is that you can’t let any of that predicted work actually store a result until you know the prediction was right. You need to be able to say “Oops, everything that I started after that branch was wrong, so throw it away!”. Alongside branch prediction, most CPUs also provide speculative execution, which is a way of continuing to execute instructions in the pipeline, but being able to discard the results if they’re the result of an incorrect branch prediction.

Ok, we’re close. We’ve got to talk about just another couple of basic ideas before we can get to just what the problem is with these Intel chips.

We’re going to jump up the stack a bit, and instead of talking directly about the CPU hardware, we’re going to talk about the operating system, and how it’s implemented on the CPU.

An operating system is just a program that runs on your computer. The operating system can load and run other programs (your end-user applications), and it manages all of the resources that those other programs can work with. When an application wants to allocate memory, it sends a request called a syscall to the operating system, asking it for some memory. If your application wants to read data from a disk drive, it makes a syscall to open a file and read data. The operating system is responsible for really controlling all of those resources, and making sure that each program that’s running only accesses the things that it should be allowed to access. Program A can only use memory allocated by program A; if it tries to access memory allocated by program B, it should cause an error.

The operating system is, therefore, a special program. It’s allowed to touch any piece of memory, any resource owned by anything on the computer. How does that work?

There are two pieces. First, there’s something called memory protection. The hardware provides a mechanism that the CPU can use to say something like “This piece of memory is owned by program A”. When the CPU is running program A, the memory protection system will arrange the way that memory looks to the program so that it can access that piece of memory; anything else just doesn’t exist to A. That’s called memory mapping: the system memory of the computer is mapped for A so that it can see certain pieces of memory, and not see others. In addition to memory mapping, the memory protection system can mark certain pieces of memory as only being accessible by privileged processes.

Privileged processes get us to the next point. In the CPU, there’s something called an execution mode: a program can run in a privileged mode (sometimes called kernel space execution), or it can run in a non-privileged mode (sometimes called user-space execution). Only code that’s running in kernel-space can do things like send commands to the memory manager, or change memory protection settings.
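As a cartoon of how those two mechanisms fit together, you can picture every chunk of memory carrying an owner and a “privileged” flag, and every access being checked against the current program and the current execution mode. This is just my sketch of the idea (real hardware does it with page tables and protection bits, and it’s vastly more complicated), but the logic looks like this:

def check_access(page, current_program, kernel_mode):
  # page is a dict like {"owner": "A", "privileged": False, "data": ...}
  if page["privileged"] and not kernel_mode:
    raise PermissionError("user-space code touched privileged memory")
  if page["owner"] != current_program and not kernel_mode:
    raise PermissionError("program touched memory it doesn't own")
  return page["data"]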

When your program makes a syscall, what really happens is that your program puts the syscall parameters into a special place, and then sends a signal called an interrupt. The interrupt switches the CPU into system space, and gives control to the operating system, which reads the interrupt parameters, and does whatever it needs to. Then it puts the result where the user space program expects it, switches back to user-space, and then allows the user space program to continue.

That process of switching from the user space program to the kernel space, doing something, and then switching back is called a context switch. Context switches are very expensive. Implemented naively, you need to redo the memory mapping every time you switch. So the interrupt consists of “stop what you’re doing, switch to privileged mode, switch to the kernel memory map, run the syscall, switch to the user program memory map, switch to user mode”.
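Spelled out as pseudocode, the naive syscall path looks something like this (my own sketch, with made-up helper names for each of the steps described above):

def naive_syscall(syscall_number, args):
  switch_to_kernel_mode()           # switch to privileged execution
  switch_to_kernel_memory_map()     # expensive: redo the memory mapping
  result = run_kernel_handler(syscall_number, args)
  switch_to_user_memory_map()       # expensive again on the way back
  switch_to_user_mode()
  return result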

Ok. So. We’re finally at the point where we can actually talk about the Intel bug.

Intel chips contain a trick to make syscalls less expensive. Instead of having to switch memory maps on a syscall, they allow the kernel memory to be mapped into the memory map of every process running in the system. But since kernel memory can contain all sorts of secret stuff (passwords, data belonging to other processes, among other things), you can’t let user space programs look at it – so the kernel memory is mapped, but it’s marked as privileged. With things set up that way, a syscall can drop the two “switch memory map” steps in the syscall scenario. Now all a syscall needs to do is switch to kernel mode, run the syscall, and switch back to user mode. It’s dramatically faster!
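Continuing that sketch, with the kernel memory already mapped into every process (just marked privileged), the same path loses its two expensive steps:

def fast_syscall(syscall_number, args):
  switch_to_kernel_mode()           # kernel memory is already mapped, just marked privileged
  result = run_kernel_handler(syscall_number, args)
  switch_to_user_mode()
  return result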

Here’s the problem, as best I understand from the information that’s currently available:

Code that’s running under speculative execution doesn’t check whether memory accesses from the cache are touching privileged memory. It starts running the instructions without the privilege check, and the check only happens when it’s time to commit to whether or not the speculative execution should be kept. But during that window, you’ve got the opportunity to run a batch of instructions against the cache without privilege checks. So you can write code with the right sequence of branch instructions to get branch prediction to work the way you want it to; and then you can use that to read memory that you shouldn’t be able to read.

With that as a starting point, you can build up interesting exploits that can ultimately allow you to do almost anything you want. It’s not a trivial exploit, but with a bit of work, you can use a user space program to make a sequence of syscalls to get information you want into memory, and then write that information wherever you want to – and that means that you can acquire root-level access, and do anything you want.

The only fix for this is to stop doing that trick where you map the kernel memory into every user space memory map, because there’s no way to enforce the privileged memory property in speculative execution. In other words, drop the whole syscall performance optimization. That’ll avoid the security issue, but it’s a pretty expensive fix: requiring a full context switch for every syscall will slow down the execution of user space programs by something between 5 and 30 percent.

January 02, 2018

Mark Chu-CarrollZombie Math in the Vortex

Paul Krugman has taken to calling certain kinds of economic ideas zombie economics, because no matter how many times they’re shown to be false, they just keep coming back from the dead. I certainly don’t have stature that compares in any way to Krugman, but I’m still going to use his terminology for some bad math. There are some crackpot ideas that you just can’t kill.

For example, vortex math. I wrote about vortex math for the first time in 2012, again in early 2013, and again in late 2013. But like a zombie in a bad movie, its fans won’t let it stay dead. There must have been a discussion on some vortex-math fan forum recently, because over the last month, I’ve been getting comments on the old posts, and emails taking me to task for supposedly being unfair, closed-minded, ignorant, and generally a very nasty person.

Before I look at any of their criticisms, let’s start with a quick refresher. What is vortex math?

We’re going to create a pattern of single-digit numbers using multiples of 2. Take the number 1. Multiply it by 2, and you get 2. Multiply it by 2, and you get 4. Again, you get 8. Again, and you get 16. 16 is two digits, but we only want one-digit numbers, so we add them together, getting 7. Double, you get 14, so add the digits, and you get 5. Double, you get 10, add the digits, and you get 1. So you’ve got a repeating sequence: 1, 2, 4, 8, 7, 5, …


Take the numbers 1 through 9, and put them at equal distances around the perimeter of a circle. Draw an arrow from a number to its single-digit double. You end up with something that looks kinda-sorta like the infinity symbol. You can also fit those numbers onto the surface of a torus.

That’s really all there is to vortex math. This guy named Marko Rodin noticed that there’s a repeating pattern, that if you draw it on a circle it looks kinda-like the infinity symbol, and decided that there must be something incredibly profound and important about it. Launching from there, he came up with numerous claims about what that means. According to vortex math, there’s something deeply significant about that pattern:

  1. If you make metallic windings on a toroidal surface according to that pattern and use it as a generator, it will generate free energy.
  2. Take that same coil, and run a current through it, and you have a perfect, reactionless space drive (called “the flux thruster atom pulsar electrical ventury space time implosion field generator coil”).
  3. If you use those numbers as a pattern in a medical device, it will cure cancer, as well as every other disease.
  4. If you use that numerical pattern, you can devise better compression algorithms that can compress any string of bits.
  5. and so on…

Essentially, according to vortex math, that repeated pattern of numbers defines a “vortex”, which is the deepest structure in the universe, and it’s the key to understanding all of math, all of physics, all of metaphysics, all of medicine. It’s the fundamental pattern of everything, and by understanding it, you can do absolutely anything.

As a math geek, the problem with stuff like vortex math is that it’s difficult to refute mathematically, because even though Rodin calls it math, there’s really no math to it. There’s a pattern, and therefore magic! Beyond the observation that there’s a pattern, there’s nothing but claims of things that must be true because there’s a pattern, without any actual mathematical argument.

Let me show you an example, from one of Rodin’s followers, named Randy Powell.

I call my discovery the ABHA Torus. It is now the full completion of how to engineer Marko Rodin’s Vortex Based Mathematics. The ABHA Torus as I have discovered it is the true and perfect Torus and it has the ability to reveal in 3-D space any and all mathematical/geometric relationships possible allowing it to essentially accomplish any desired functional application in the world of technology. This is because the ABHA Torus provides us a mathematical framework where the true secrets of numbers (qualitative relationships based on angle and ratio) are revealed in fullness.

This is why I believe that the ABHA Torus as I have calculated is the most powerful mathematical tool in existence because it presents proof that numbers are not just flat imaginary things. To the contrary, numbers are stationary vector interstices that are real and exhibiting at all times spatial, temporal, and volumetric qualities. Being stationary means that they are fixed constants. In the ABHA Torus the numbers never move but the functions move through the numbers modeling vibration and the underlying fractal circuitry that natures uses to harness living energy.

The ABHA Torus as revealed by the Rodin/Powell solution displays a perfectly symmetrical spin array of numbers (revealing even prime number symmetry), a feat that has baffled countless scientists and mathematicians throughout the ages. It even uncovers the secret of bilateral symmetry as actually being the result of a diagonal motion along the surface and through the internal volume of the torus in an expanding and contracting polarized logarithmic spiral diamond grain reticulation pattern produced by the interplay of a previously unobserved Positive Polarity Energetic Emanation (so-called ‘dark’ or ‘zero-point’ energy) and a resulting Negative Polarity Back Draft Counter Space (gravity).

If experimentally proven correct such a model would for example replace the standard approach to toroidal coils used in energy production today by precisely defining all the proportional and angular relationships existent in a moving system and revealing not only the true pathway that all accelerated motion seeks (be it an electron around the nucleus of an atom or water flowing down a drain) but in addition revealing this heretofore unobserved, undefined point energetic source underlying all space-time, motion, and vibration.

Lots of impressive sounding words, strung together in profound sounding ways, but what does it mean? Sure, gravity is a “back draft” of an unobserved “positive polarity energetic emanation”, and therefore we’ve unified dark energy and gravity, and unified all of the forces of our universe. That sounds terrific, except that it doesn’t mean anything! How can you test that? What evidence would be consistent with it? What evidence would be inconsistent with it? No one can answer those questions, because none of it means anything.

As I’ve said lots of times before: there’s a reason for the formal framework of mathematics. There’s a reason for the painful process of mathematical proof. There’s a reason why mathematicians and scientists have devised an elaborate language and notation for expressing mathematical ideas. And that reason is because it’s easy to string together words in profound sounding ways. It’s easy to string together reasoning in ways that look like they might be compelling if you took the time to understand them. But to do actual mathematics or actual science, you need to do more that string together something that sounds good. You need to put together something that is precise. The point of mathematical notation and mathematical reasoning is to take complex ideas and turn them into precisely defined, unambiguous structures that have the same meaning to everyone who looks at them.

“positive polarity energetic emanation” is a bunch of gobbledegook wordage that doesn’t mean anything to anyone. I can’t refute the claim that gravity is a back-draft negative polarity energetic reaction to dark energy. I can’t support that claim, either. I can’t do much of anything with it, because Randy Powell hasn’t said anything meaningful. It’s vague and undefined in ways that make it impossible to reason about in any way.

And that’s the way that things go throughout all of vortex math. There’s this cute pattern, and it must mean something! Therefore… endless streams of words, without any actual mathematical, physical, or scientific argument.

There’s so much wrong with vortex math, but it all comes down to the fact that it takes some arbitrary artifacts of human culture, and assigns them deep, profound meaning for no reason.

There’s this pattern in the doubling of numbers and reducing them to one digit. Why multiply by two? Because we like it, and it produces a pretty pattern. Why not use 3? Well, because in base-10, it won’t produce a good pattern: [1, 3, 9, 9, 9, 9, ….] But we can pick another number like 7: [1, 7, 4, 1, 7, 4, ….], and get a perfectly good repeating series: why is that series less compelling than [1, 2, 4, 8, 7, 5]?

There’s nothing magical about base-10. We can do the same thing in base-8: [1, 2, 4, 1, 2, 4…] How about base-12, which was used for a lot of stuff in Egypt? [1, 2, 4, 8, 5, 10, 9, 7, 3, 6, 1] – that gives us a longer pattern! What makes base-10 special? Why does the base-10 pattern mean something that other bases, or other numerical representations, don’t? The vortex math folks can’t answer that. (Note: I made an arithmetic error in the initial version of the base-12 sequence above. It was pointed out in comments by David Wallace. Thanks!)
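If you want to check these cycles for yourself, it only takes a few lines of code. Here’s a quick sketch (mine, not Rodin’s): repeatedly multiply by some number, and reduce the result to a single digit in whatever base you like.

def digital_root(n, base=10):
  # repeatedly sum the digits of n in the given base until one digit remains
  while n >= base:
    total = 0
    while n > 0:
      total += n % base
      n //= base
    n = total
  return n

def cycle(multiplier, base=10, steps=12):
  # the "vortex" construction: start at 1, multiply, reduce, repeat
  values, n = [], 1
  for _ in range(steps):
    values.append(n)
    n = digital_root(n * multiplier, base)
  return values

# cycle(2)          -> [1, 2, 4, 8, 7, 5, 1, 2, 4, 8, 7, 5]
# cycle(7)          -> [1, 7, 4, 1, 7, 4, 1, 7, 4, 1, 7, 4]
# cycle(2, base=12) -> [1, 2, 4, 8, 5, 10, 9, 7, 3, 6, 1, 2]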

If we plot the numbers on a circle, we get something that looks kind-of like an infinity symbol! What does that mean? Why should the infinity symbol (which was invented in the 17th century, and chosen because it looked sort of like a number, and sort-of like the last letter of the Greek alphabet) have any intrinsic meaning to the universe?

It’s giving profound meaning to arbitrary things, for unsupported reasons.

So what’s in the recent flood of criticism from the vortex math guys?

Well, there’s a lot of “You’re mean, so you’re wrong.” And there’s a lot of “Why don’t you prove that they’re wrong instead of making fun of them?”. And last but not least, there’s a lot of “Yeah, well, the Fibonacci series is just a pattern of numbers too, but it’s really important”.

On the first: Yeah, fine, I’m mean. But I get pretty pissed at seeing people get screwed over by charlatans. The vortex math guys use this stuff to take money from “investors” based on their claims about producing limitless free energy, UFO space drives, and cancer cures. This isn’t abstract: this kind of nonsense hurts people. The people who are pushing these scams deserve to be mocked, without mercy. They don’t deserve kindness or respect, and they’re not going to get it from me.

I’d love to be proved wrong on this. One of my daughter’s friends is currently dying of cancer. I’d give up nearly anything to be able to stop her, and other children like her, from dying an awful death. If the vortex math folks could do anything for this poor kid, I would gladly grovel and humiliate myself at their feet. I would dedicate the rest of my life to nothing but helping them in their work.

But the fact is, when they talk about the miraculous things vortex math can do? At best, they’re delusional; more likely, they’re just lying. There is no cure for cancer in [1, 2, 4, 8, 7, 5, 1].

As for the Fibonacci series: well. It’s an interesting pattern. It does appear to show up in some interesting places in nature. But there are a few really important differences.

  1. The Fibonacci series shows up in every numeric notation, in every number base, no matter how you do numbers.
  2. It does show up in nature. This is key: there’s more to it than just words and vague assertions. You can really find fragments of the Fibonacci series in nature. By doing a careful mathematical analysis, you can find the Fibonacci series in numerous places in mathematics, such as the solutions to a range of interesting dynamic optimization problems. When you find a way of observing the vortex math pattern in nature, or a way of producing actual numeric solutions for real problems, in a way that anyone can reproduce, I’ll happily give it another look.
  3. The Fibonacci series does appear in nature – but it’s also been used by numerous crackpots to make ridiculous assertions about how the world must work!

Peter RohdePhD positions available

A fully-funded PhD position is available at the University of Technology Sydney, Australia, to conduct forefront theoretical research in the quantum information sciences, working with Dr Peter Rohde within the Centre for Quantum Software & Information (QSI).

The research topics are flexible, including:

* Quantum machine learning
* Quantum cryptography (especially homomorphic encryption and blind quantum computing)
* Quantum networking (especially cloud quantum computing)
* Quantum computing (with an emphasis on optical quantum computing, boson-sampling and quantum random walks)
* Quantum information theory
* Quantum metrology
* Your own suggestions for exciting projects in quantum technology

The candidate should have a background in physics, computer science, engineering, mathematics or a related discipline, and be highly creative, independent and passionate about quantum technology. The duration of the position is for 3 years, and includes a scholarship for $25,861/year (tax free). The position is research-only, with no teaching or coursework obligations.

QSI is a leading research centre in the quantum information sciences, and the candidate will have the opportunity to collaborate with leading researchers within the centre, as well as with other researchers domestically and internationally. Sydney is home to several major quantum research centres, presenting outstanding local collaboration opportunities.

The candidate will conduct theoretical research to be published in international journals, present research findings at major conferences, and build collaboration networks. Travel opportunities will be available.

To apply for the position or request further information, please contact Dr Peter Rohde (peter.rohde@uts.edu.au) by January 14. When applying, please provide a resume, academic record, contact details for two academic referees, and a statement of your research interests and passions. Applications are now open.

Please distribute this advert amongst your colleagues, students, mailing lists and Facebook groups.

– Ad astra per alas fideles. Scientia potentia est.