Planet Musings

June 29, 2017

Georg von HippelLattice 2017, Day Six

On the last day of the 2017 lattice conference, there were plenary sessions only. The first plenary session opened with a talk by Antonio Rago, who gave a "community review" of lattice QCD on new chips. New chips in the case of lattice QCD means mostly Intel's new Knight's Landing architecture, to whose efficient use significant effort is devoted by the community. Different groups pursue very different approaches, from purely OpenMP-based C codes to mixed MPI/OpenMP-based codes maximizing the efficiency of the SIMD pieces using assembler code. The new NVidia Tesla Volta and Intel's OmniPath fabric also featured in the review.

The next speaker was Zoreh Davoudi, who reviewed lattice inputs for nuclear physics. While simulating heavier nuclei directly in the lattice is still infeasible, nuclear phenomenologists appear to be very excited about the first-principles lattice QCD simulations of multi-baryon systems now reaching maturity, because these can be use to tune and validate nuclear models and effective field theories, from which predictions for heavier nuclei can then be derived so as to be based ultimately on QCD. The biggest controversy in the multi-baryon sector at the moment is due to HALQCD's claim that the multi-baryon mass plateaux seen by everyone except HALQCD (who use their own method based on Bethe-Salpeter amplitudes) are probably fakes or "mirages", and that using the Lüscher method to determine multi-baryon binding would require totally unrealistic source-sink separations of over 10 fm. The volume independence of the bound-state energies determined from the allegedly fake plateaux, as contrasted to the volume dependence of the scattering-state energies so extracted, provides a fairly strong defence against this claim, however. There are also new methods to improve the signal-to-noise ratio for multi-baryon correlation functions, such as phase reweighting.

This was followed by a talk on the tetraquark candidate Zc(3900) by Yoichi Ikeda, who spent a large part of his talk on reiterating the HALQCD claim that the Lüscher method requires unrealistically large time separations. During the questions, William Detmold raised the important point that there would be no excited-state contamination at all if the interpolating operator created an eigenstate of the QCD Hamiltonian, and that for improved interpolating operators (such as generated by the variational method) one can get rather close to this situation, so that the HLAQCD criticism seems hardly applicable. As for the Zc(3900), HALQCD find it to be not a resonance, but a kinematic cusp, although this conclusion is based on simulations at rather heavy pion masses (mπ> 400 MeV).

The final plenary session was devoted to the anomalous magnetic moment of the muon, which is perhaps the most pressing topic for the lattice community, since the new (g-2) experiment is now running, and theoretical predictions matching the improved experimental precision will be needed soon. The first speaker was Christoph Lehner, who presented RBC/UKQCD's efforts to determine the hadronic vacuum polarization contribution to aμ with high precision. The strategy for this consists of two main ingredients: one is to minimize the statistical and systematic errors of the lattice calculation by using a full-volume low-mode average via a multigrid Lanczos method, explicitly including the leading effects of strong isospin breaking and QED, and the contribution from disconnected diagrams, and the other is to combine lattice and phenomenology to take maximum advantage of their respective strengths. This is achieved by using the time-momentum representation with a continuum correlator reconstructed from the R-ratio, which turns out to be quite precise at large times, but more uncertain at shorter times, which is exactly the opposite of the situation for the lattice correlator. Using a window which continuously switches over from the lattice to the continuum at time separations around 1.2 fm then minimizes the overall error on aμ.

The last plenary talk was given by Gilberto Colangelo, who discussed the new dispersive approach to the hadronic light-by-light scattering contribution to aμ. Up to now the theory results for this small, but important, contribution have been based on models, which will always have an a priori unknown and irreducible systematic error, although lattice efforts are beginning to catch up. For a dispersive approach based on general principles such as analyticity and unitarity, the hadronic light-by-light tensor first needs to be Lorentz decomposed, which gives 138 tensors, of which 136 are independent, and of which gauge invariance permits only 54, of which 7 are distinct, with the rest related by crossing symmetry; care has to be taken to choose the tensor basis such that there are no kinematic singularities. A master formula in terms of 12 linear combinations of these components has been derived by Gilberto and collaborators, and using one- and two-pion intermediate states (and neglecting the rest) in a systematic fashion, they have been able to produce a model-independent theory result with small uncertainties based on experimental data for pion form factors and scattering amplitudes.

The closing remarks were delivered by Elvira Gamiz, who advised participants that the proceedings deadline of 18 October will be strict, because this year's proceedings will not be published in PoS, but in EPJ Web of Conferences, who operate a much stricter deadline policy. Many thanks to Elvira for organizing such a splendid lattice conference! (I can appreciate how much work that is, and I think you should have received far more applause.)

Huey-Wen Lin invited the community to East Lansing, Michigan, USA, for the Lattice 2018 conference, which will take place 22-28 July 2018 on the campus of Michigan State University.

The IAC announced that Lattice 2019 will take place in Wuhan, China.

And with that the conference ended. I stayed in Granada for a couple more days of sightseeing and relaxation, but the details thereof will be of legitimate interest only to a very small subset of my readership (whom I keep updated via different channels), and I therefore conclude my coverage and return the blog to its accustomed semi-hiatus state.

June 28, 2017

David HoggMCMC

The research highlight of the day was Marla Geha (Yale) dropping in to Flatiron to chat about MCMC sampling. She is working through the tutorial that Foreman-Mackey (UW) and I are putting together and she is doing the exercises.
I'm impressed! She gave lots of valuable feedback for our first draft.

June 27, 2017

Jordan EllenbergWhen random people give money to random other people

A post on Decision Science about a problem of Uri Wilensky‘s has been making the rounds:

Imagine a room full of 100 people with 100 dollars each. With every tick of the clock, every person with money gives a dollar to one randomly chosen other person. After some time progresses, how will the money be distributed?

People often expect the distribution to be close to uniform.  But this isn’t right; the simulations in the post show clearly that inequality of wealth rapidly appears and then persists (though each individual person bobs up and down from rich to poor.)  What’s going on?  Why would this utterly fair and random process generate winners and losers?

Here’s one way to think about it.  The possible states of the system are the sets of nonnegative integers (m_1, .. m_100) summing to 10,000; if you like, the lattice points inside a simplex.  (From now on, let’s write N for 100 because who cares if it’s 100?)

The process is a random walk on a graph G, whose vertices are these states and where two vertices are connected if you can get from one to the other by taking a dollar from one person and giving it to another.  We are asking:  when you run the random walk for a long time, where are you on this graph?  Well, we know what the stationary distribution for random walk on an undirected graph is; it gives each vertex a probability proportional to its degree.  On a regular graph, you get uniform distribution.

Our state graph G isn’t regular, but it almost is; most nodes have degree N, where by “most” I mean “about 1-1/e”; since the number of states is

N^2 + N - 1 \choose N-1

and, of these, the ones with degree N are exactly those in which nobody’s out of money; if each person has a dollar, the number of ways to distribute the remaining N^2 – N dollars is

N^2  - 1 \choose N-1

and so the proportion of states where someone’s out of money is about

\frac{(N^2 - 1)^N}{(N^2 + N - 1)^N} \sim (1-1/N)^N \sim 1/e.

So, apart from those states where somebody’s broke, in the long run every possible state is equally likely;  we are just as likely to see $9,901 in one person’s hands and everybody else with $1 as we are to see exact equidistribution again.

What is a random lattice point in this simplex like?  Good question!  An argument just like the one above shows that the probability nobody goes below $c is on order e^-c, at least when c is small relative to N; in other words, it’s highly likely that somebody’s very nearly out of money.

If X is the maximal amount of money held by any player, what’s the distribution of X?  I didn’t immediately see how to figure this out.  You might consider the continuous version, where you pick a point at random from the real simplex

(x_1, .. x_N) \in \mathbf{R}^N:   \sum x_i = N^2.

Equivalently; break a stick at N-1 randomly chosen points; what is the length of the longest piece?  This is a well-studied problem; the mean size of the longest piece is about N log N.  So I guess I think maybe that’s the expected value of the net worth of the richest player?

But it’s not obvious to me whether you can safely approximate the finite problem by its continuous limit (which corresponds to the case where we keep the number of players at N but reduce the step size so that each player can give each other a cent, or a picocent, or whatever.)

What happens if you give each of the N players just one dollar?  Now the uniformity really breaks down, because it’s incredibly unlikely that nobody’s broke.  The probability distribution on the set of (m_1, .. m_N) summing to N assigns each vector a probability proportional to the size of its support (i.e. the number of m_i that are nonzero.)  That must be a well-known distribution, right?  What does the corresponding distribution on partitions of N look like?

Update:  Kenny Easwaran points out that this is basically the same computation physicists do when they compute the Boltzmann distribution, which was new to me.



David Hogglearning

I spent time working through the last bits of a paper by Dun Wang (NYU) about image modeling for time-domain astrophysics. I asked him to send it to our co-authors.

The rest of the day was spent in discussions of Bayesian inference with the Flatiron Astronomical Data Group reading group. We are doing elementary exercises in data analysis and yet we are not finding it easy to discuss and understand, especially some of the details and conceptual arguments. In other words: No matter how much experience you have with data analysis, there are always things to learn!

David Hoggcosmic rays, alien technology

I helped Justin Alsing (Flatiron) and Maggie Lieu (ESA) search for HST data relevant to their project for training a model to find cosmic rays and asteroids. They started to decide that HST's cosmic-ray identification methods that they are already using might be good enough to just rely upon, which drops their requirements down to asteroids. That's good! But it's hard to make a good training set.

Jia Liu (Columbia) swung by to discuss the possibility of finding things at exo-L1 or exo-L2 (or the other Lagrange points). Some of the Lagrange points are unstable, so anything we find would be clear signs of alien technology. We looked at the relevant literature; we may be fully scooped, but I think there are probably things to do still. One thing we discussed is the observability; it is somehow going to depend on the relative density of the planet and star!

June 26, 2017

Tommaso DorigoPaola Nicoletti's Artwork

Tormented water masses with distant horizons in flames. Multicoloured reflexions over living oceans. Moving images with an astounding visual impact. These are the first words that come to my mind if I try to describe Paola Nicoletti's paintings. Needless to say, I like them a lot - but judge for yourself from the three images below, or by visiting her web site. And if you do, please drop her a line!
Paola is a friend, who has recently taken her painting quite seriously. I believe these pictures give ample justification for it. Here is what Paola herself writes to describe some of her works:

read more

BackreactionDear Dr B: Is science democratic?

“Hi Bee, One of the often repeated phrases here in Italy by so called “science enthusiasts” is that “science is not democratic”, which to me sounds like an excuse for someone to justify some authoritarian or semi-fascist fantasy. We see this on countless “Science pages”, one very popular example being Fare Serata Con Galileo. It's not a bad page per se, quite the contrary, but the level of

June 25, 2017

Georg von HippelLattice 2017, Day Five

The programme for today took account of the late end of the conference dinner in the early hours of the day, by moving the plenary sessions by half an hour. The first plenary talk of the day was given by Ben Svetitsky, who reviewed the status of BSM investigations using lattice field theory. An interesting point Ben raised was that these studies go not so much "beyond" the Standard Model (like SUSY, dark matter, or quantum gravity would), but "behind" or "beneath" it by seeking for a deeper explanation of the seemingly unnaturally small Higgs mass, flavour hierarchies, and other unreasonable-looking features of the SM. The original technicolour theory is quite dead, being Higgsless, but "walking" technicolour models are an area of active investigation. These models have a β-function that comes close to zero at some large coupling, leading to an almost conformal behaviour near the corresponding IR almost-fixed point. In such almost conformal theories, a light scalar (i.e. the Higgs) could arise naturally as the pseudo-Nambu-Goldstone boson of the approximate dilatation symmetry of the theory. A range of different gauge groups, numbers of flavours, and fermion representations are being investigated, with the conformal or quasi-conformal status of some of these being apparently controversial. An alternative approach to Higgs compositeness has the Higgs appear as the exact Nambu-Goldstone boson of some spontaneous symmetry breaking which keeps SU(2)L⨯U(1) intact, with the Higgs potential being generated at the loop level by the coupling to the SM sector. There are also some models of this type being actively investigated.

The next plenary speaker was Stefano Forte, who reviewed the status and prospects of determining the strong coupling αs from sources other than the lattice. The PDG average for αs is a weighted average of six values, four of which are the pre-averages of the determinations from the lattice, from τ decays, from jet rates and shapes, and from parton distribution functions, and two of which are the determinations from the global electroweak fit and from top production at the LHC. Each of these channels has its own systematic issues, and one problem can be that overaggressive error estimates give too much weight to the corresponding determination, leading to statistically implausible scatter of results in some channels. It should be noted, however, that the lattice results are all quite compatible, with the most precise results by ALPHA and by HPQCD (which use different lattice formulations and completely different analysis methods) sitting right on top of each other.

This was followed by a presentation by Thomas Korzec of the determination of αs by the ALPHA collaboration. I cannot really attempt to do justice to this work in a blog post, so I encourage you to look at their paper. By making use of both the Schrödinger functional and the gradient flow coupling in finite volume, they are able to non-perturbatively run αs between hadronic and perturbative scales with high accuracy.

After the coffee break, Erhard Seiler reviewed the status of the complex Langevin method, which is one of the leading methods for simulating actions with a sign problem, e.g. at finite chemical potential or with a θ term. Unfortunately, it is known that the complex Langevin method can sometimes converge to wrong results, and this can be traced to the violation by the complexification of the conditions under which the (real) Langevin method is justified, of which the development of zeros in e-S seems to be the most important case, giving rise to poles in the force which will violate ergodicity. There seems to be a lack of general theorems for situations like this, although the complex Langevin method has apparently been shown to be correct under certain difficult-to-check conditions. One of the best hopes for simulating with complex Langevin seems to be the dynamical stabilization proposed by Benjamin Jäger and collaborators.

This was followed by Paulo Bedaque discussing the prospects of solving the sign problem using the method of thimbles and related ideas. As far as I understand, thimbles are permissible integration regions in complexified configuration space on which the imaginary part of the action is constant, and which can thus be integrated over without a sign problem. A holomorphic flow that is related both to the gradient flow and the Hamiltonian flow can be constructed so as to flow from the real integration region to the thimbles, and based on this it appears to have become possible to solve some toy models with a sign problem, even going so far as to perform real-time simulations in the Keldysh-Schwinger formalism in Euclidean space (if I understood correctly).

In the afternoon, there was a final round of parallel sessions, one of which was again dedicated to the anomalous magnetic moment of the muon, this time focusing on the very difficult hadronic light-by-light contribution, for which the Mainz group has some very encouraging first results.

June 23, 2017

Tommaso DorigoLucid Dreams: Exploring The Dark Side

No, this post is not about some exotic new physics model predicting dark photons or other useless concoctions which physicists sometimes entertain with, in their frustration for the lack of guidance from experimental data of what really is it  that the Standard Model is an effective theory of. For that kind of stuff, please wait and check out my blog at some other time.

read more

Georg von HippelLattice 2017, Days Three and Four

Wednesday was the customary short day, with parallel sessions in the morning, and time for excursions in the afternoon. I took the "Historic Granada" walking tour, which included visits to the Capilla Real and the very impressive Cathedral of Granada.

The first plenary session of today had a slightly unusual format in that it was a kind of panel discussion on the topic of axions and QCD topology at finite temperature.

After a brief outline by Mikko Laine, the session chair, the session started off with a talk by Guy Moore on the role of axions in cosmology and the role of lattice simulations in this context. Axions arise in the Peccei-Quinn solution to the strong CP problem and are a potential dark matter candidate. Guy presented some of his own real-time lattice simulations in classical field theory for axion fields, which exhibit the annihilation of cosmic-string-like vortex defects and associated axion production, and pointed out the need for accurate lattice QCD determinations of the topological susceptibility in the temperature range of 500-1200 MeV in order to fix the mass of the axion more precisely from the dark matter density (assuming that dark matter consists of axions).

The following talks were all fairly short. Claudio Bonati presented algorithmic developments for simulations of the topological properties of high-temperature QCD. The long autocorrelations of the topological charge at small lattice spacing are a problem. Metadynamics, which bias the Monte Carlo evolution in a non-Markovian manner so as to more efficiently sample the configuration space, appear to be of help.

Hidenori Fukaya reviewed the question of whether U(1)A remains anomalous at high temperature, which he claimed (both on theoretical grounds and based on numerical simulation results) it doesn't. I didn't quite understand this, since as far as I understand the axial anomaly, it is an operator identity, which will remain true even if both sides of the identity were to happen to vanish at high enough temperature, which is all that seemed to be shown; but this may just be my ignorance showing.

Tamas Kovacs showed recent results on the temperature-dependence of the topological susceptibility of QCD. By a careful choice of algorithms based on physical considerations, he could measure the topological susceptibility over a wide range of temperatures, showing that it becomes tiny at large temperature.

Then the speakers all sat on the stage as a panel and fielded questions from the audience. Perhaps it might have been a good idea to somehow force the speakers to engage each other; as it was, the advantage of this format over simply giving each speaker a longer time for answering questions didn't immediately become apparent to me.

After the coffee break, things returned to the normal format. Boram Yoon gave a review of lattice determinations of the neutron electric dipole moment. Almost any BSM source of CP violation must show up as a contribution to the neutron EDM, which is therefore a very sensitive probe of new physics. The very strong experimental limits on any possible neutron EDM imply e.g. |θ|<10-10 in QCD through lattice measurements of the effects of a θ term on the neutron EDM. Similarly, limits can be put on any quark EDMs or quark chromoelectric dipole moments. The corresponding lattice simulations have to deal with sign problems, and the usual techniques (Taylor expansions, simulations at complex θ) are employed to get past this, and seem to be working very well.

The next plenary speaker was Phiala Shanahan, who showed recent results regarding the gluon structure of hadrons and nuclei. This line of research is motivated by the prospect of an electron-ion collider that would be particularly sensitive to the gluon content of nuclei. For gluonic contributions to the momentum and spin decomposition of the nucleon, there are some fresh results from different groups. For the gluonic transversity, Phiala and her collaborators have performed first studies in the φ system. The gluonic radii of small nuclei have also been looked at, with no deviation from the single-nucleon case visible at the present level of accuracy.

The 2017 Kenneth Wilson Award was awarded to Raúl Briceño for his groundbreaking contributions to the study of resonances in lattice QCD. Raúl has been deeply involved both in the theoretical developments behind extending the reach of the Lüscher formalism to more and more complicated situations, and in the numerical investigations of resonance properties rendered possible by those developments.

After the lunch break, there were once again parallel sessions, two of which were dedicated entirely to the topic of the hadronic vacuum polarization contribution to the anomalous magnetic moment of the muon, which has become one of the big topics in lattice QCD.

In the evening, the conference dinner took place. The food was excellent, and the Flamenco dancers who arrived at midnight (we are in Spain after all, where it seems dinner never starts before 9pm) were quite impressive.

Terence TaoQuantitative continuity estimates

Suppose {F: X \rightarrow Y} is a continuous (but nonlinear) map from one normed vector space {X} to another {Y}. The continuity means, roughly speaking, that if {x_0, x \in X} are such that {\|x-x_0\|_X} is small, then {\|F(x)-F(x_0)\|_Y} is also small (though the precise notion of “smallness” may depend on {x} or {x_0}, particularly if {F} is not known to be uniformly continuous). If {F} is known to be differentiable (in, say, the Fréchet sense), then we in fact have a linear bound of the form

\displaystyle  \|F(x)-F(x_0)\|_Y \leq C(x_0) \|x-x_0\|_X

for some {C(x_0)} depending on {x_0}, if {\|x-x_0\|_X} is small enough; one can of course make {C(x_0)} independent of {x_0} (and drop the smallness condition) if {F} is known instead to be Lipschitz continuous.

In many applications in analysis, one would like more explicit and quantitative bounds that estimate quantities like {\|F(x)-F(x_0)\|_Y} in terms of quantities like {\|x-x_0\|_X}. There are a number of ways to do this. First of all, there is of course the trivial estimate arising from the triangle inequality:

\displaystyle  \|F(x)-F(x_0)\|_Y \leq \|F(x)\|_Y + \|F(x_0)\|_Y. \ \ \ \ \ (1)

This estimate is usually not very good when {x} and {x_0} are close together. However, when {x} and {x_0} are far apart, this estimate can be more or less sharp. For instance, if the magnitude of {F} varies so much from {x_0} to {x} that {\|F(x)\|_Y} is more than (say) twice that of {\|F(x_0)\|_Y}, or vice versa, then (1) is sharp up to a multiplicative constant. Also, if {F} is oscillatory in nature, and the distance between {x} and {x_0} exceeds the “wavelength” of the oscillation of {F} at {x_0} (or at {x}), then one also typically expects (1) to be close to sharp. Conversely, if {F} does not vary much in magnitude from {x_0} to {x}, and the distance between {x} and {x_0} is less than the wavelength of any oscillation present in {F}, one expects to be able to improve upon (1).

When {F} is relatively simple in form, one can sometimes proceed simply by substituting {x = x_0 + h}. For instance, if {F: R \rightarrow R} is the squaring function {F(x) = x^2} in a commutative ring {R}, one has

\displaystyle  F(x_0+h) = (x_0+h)^2 = x_0^2 + 2x_0 h+ h^2

and thus

\displaystyle  F(x_0+h) - F(x_0) = 2x_0 h + h^2

or in terms of the original variables {x, x_0} one has

\displaystyle  F(x) - F(x_0) = 2 x_0 (x-x_0) + (x-x_0)^2.

If the ring {R} is not commutative, one has to modify this to

\displaystyle  F(x) - F(x_0) = x_0 (x-x_0) + (x-x_0) x_0 + (x-x_0)^2.

Thus, for instance, if {A, B} are {n \times n} matrices and {\| \|_{op}} denotes the operator norm, one sees from the triangle inequality and the sub-multiplicativity {\| AB\|_{op} \leq \| A \|_{op} \|B\|_{op}} of operator norm that

\displaystyle  \| A^2 - B^2 \|_{op} \leq \| A - B \|_{op} ( 2 \|B\|_{op} + \|A - B \|_{op} ). \ \ \ \ \ (2)

If {F(x)} involves {x} (or various components of {x}) in several places, one can sometimes get a good estimate by “swapping” {x} with {x_0} at each of the places in turn, using a telescoping series. For instance, if we again use the squaring function {F(x) = x^2 = x x} in a non-commutative ring, we have

\displaystyle  F(x) - F(x_0) = x x - x_0 x_0

\displaystyle  = (x x - x_0 x) + (x_0 x - x_0 x_0)

\displaystyle  = (x-x_0) x + x_0 (x-x_0)

which for instance leads to a slight improvement of (2):

\displaystyle  \| A^2 - B^2 \|_{op} \leq \| A - B \|_{op} ( \| A\|_{op} + \|B\|_{op} ).

More generally, for any natural number {n}, one has the identity

\displaystyle  x^n - x_0^n = (x-x_0) (x^{n-1} + x^{n-2} x_0 + \dots + x x_0^{n-2} + x_0^{n-1}) \ \ \ \ \ (3)

in a commutative ring, while in a non-commutative ring one must modify this to

\displaystyle  x^n - x_0^n = \sum_{i=0}^{n-1} x_0^i (x-x_0) x^{n-1-i},

and for matrices one has

\displaystyle  \| A^n - B^n \|_{op} \leq \| A-B\|_{op} ( \|A\|_{op}^{n-1} + \| A\|_{op}^{n-2} \| B\|_{op} + \dots + \|B\|_{op}^{n-1} ).

Exercise 1 If {U} and {V} are unitary {n \times n} matrices, show that the commutator {[U,V] := U V U^{-1} V^{-1}} obeys the inequality

\displaystyle  \| [U,V] - I \|_{op} \leq 2 \| U - I \|_{op} \| V - I \|_{op}.

(Hint: first control {\| UV - VU \|_{op}}.)

Now suppose (for simplicity) that {F: {\bf R}^d \rightarrow {\bf R}^{d'}} is a map between Euclidean spaces. If {F} is continuously differentiable, then one can use the fundamental theorem of calculus to write

\displaystyle  F(x) - F(x_0) = \int_0^1 \frac{d}{dt} F( \gamma(t) )\ dt

where {\gamma: [0,1] \rightarrow Y} is any continuously differentiable path from {x_0} to {x}. For instance, if one uses the straight line path {\gamma(t) := (1-t) x_0 + tx}, one has

\displaystyle  F(x) - F(x_0) = \int_0^1 ((x-x_0) \cdot \nabla F)( (1-t) x_0 + t x )\ dt.

In the one-dimensional case {d=1}, this simplifies to

\displaystyle  F(x) - F(x_0) = (x-x_0) \int_0^1 F'( (1-t) x_0 + t x )\ dt. \ \ \ \ \ (4)

Among other things, this immediately implies the factor theorem for {C^k} functions: if {F} is a {C^k({\bf R})} function for some {k \geq 1} that vanishes at some point {x_0}, then {F(x)} factors as the product of {x-x_0} and some {C^{k-1}} function {G}. Another basic consequence is that if {\nabla F} is uniformly bounded in magnitude by some constant {C}, then {F} is Lipschitz continuous with the same constant {C}.

Applying (4) to the power function {x \mapsto x^n}, we obtain the identity

\displaystyle  x^n - x_0^n = n (x-x_0) \int_0^1 ((1-t) x_0 + t x)^{n-1}\ dt \ \ \ \ \ (5)

which can be compared with (3). Indeed, for {x_0} and {x} close to {1}, one can use logarithms and Taylor expansion to arrive at the approximation {((1-t) x_0 + t x)^{n-1} \approx x_0^{(1-t) (n-1)} x^{t(n-1)}}, so (3) behaves a little like a Riemann sum approximation to (5).

Exercise 2 For each {i=1,\dots,n}, let {X^{(1)}_i} and {X^{(0)}_i} be random variables taking values in a measurable space {R_i}, and let {F: R_1 \times \dots \times R_n \rightarrow {\bf R}^m} be a bounded measurable function.

  • (i) (Lindeberg exchange identity) Show that

    \displaystyle  \mathop{\bf E} F(X^{(1)}_1,\dots,X^{(1)}_n) - \mathop{\bf E} F(X^{(0)}_1,\dots,X^{(0)}_n)

    \displaystyle = \sum_{i=1}^n \mathop{\bf E} F( X^{(1)}_1,\dots, X^{(1)}_{i-1}, X^{(1)}_i, X^{(0)}_{i+1}, \dots, X^{(0)}_n)

    \displaystyle - \mathop{\bf E} F( X^{(1)}_1,\dots, X^{(1)}_{i-1}, X^{(0)}_i, X^{(0)}_{i+1}, \dots, X^{(0)}_n).

  • (ii) (Knowles-Yin exchange identity) Show that

    \displaystyle  \mathop{\bf E} F(X^{(1)}_1,\dots,X^{(1)}_n) - \mathop{\bf E} F(X^{(0)}_1,\dots,X^{(0)}_n)

    \displaystyle = \int_0^1 \sum_{i=1}^n \mathop{\bf E} F( X^{(t)}_1,\dots, X^{(t)}_{i-1}, X^{(1)}_i, X^{(t)}_{i+1}, \dots, X^{(t)}_n)

    \displaystyle - \mathop{\bf E} F( X^{(t)}_1,\dots, X^{(t)}_{i-1}, X^{(0)}_i, X^{(t)}_{i+1}, \dots, X^{(t)}_n)\ dt,

    where {X^{(t)}_i = 1_{I_i \leq t} X^{(0)}_i + 1_{I_i > t} X^{(1)}_i} is a mixture of {X^{(0)}_i} and {X^{(1)}_i}, with {I_1,\dots,I_n} uniformly drawn from {[0,1]} independently of each other and of the {X^{(0)}_1,\dots,X^{(0)}_n, X^{(1)}_0,\dots,X^{(1)}_n}.

  • (iii) Discuss the relationship between the identities in parts (i), (ii) with the identities (3), (5).

(The identity in (i) is the starting point for the Lindeberg exchange method in probability theory, discussed for instance in this previous post. The identity in (ii) can also be used in the Lindeberg exchange method; the terms in the right-hand side are slightly more symmetric in the indices {1,\dots,n}, which can be a technical advantage in some applications; see this paper of Knowles and Yin for an instance of this.)

Exercise 3 If {F: {\bf R}^d \rightarrow {\bf R}^{d'}} is continuously {k} times differentiable, establish Taylor’s theorem with remainder

\displaystyle  F(x) = \sum_{j=0}^{k-1} \frac{1}{j!} (((x-x_0) \cdot \nabla)^j F)( x_0 )

\displaystyle + \int_0^1 \frac{(1-t)^{k-1}}{(k-1)!} (((x-x_0) \cdot \nabla)^k F)((1-t) x_0 + t x)\ dt.

If {\nabla^k F} is bounded, conclude that

\displaystyle  |F(x) - \sum_{j=0}^{k-1} \frac{1}{j!} (((x-x_0) \cdot \nabla)^j F)( x_0 )|

\displaystyle \leq \frac{|x-x_0|^k}{k!} \sup_{y \in {\bf R}^d} |\nabla^k F(y)|.

For real scalar functions {F: {\bf R}^d \rightarrow {\bf R}}, the average value of the continuous real-valued function {(x - x_0) \cdot \nabla F((1-t) x_0 + t x)} must be attained at some point {t} in the interval {[0,1]}. We thus conclude the mean-value theorem

\displaystyle  F(x) - F(x_0) = ((x - x_0) \cdot \nabla F)((1-t) x_0 + t x)

for some {t \in [0,1]} (that can depend on {x}, {x_0}, and {F}). This can for instance give a second proof of fact that continuously differentiable functions {F} with bounded derivative are Lipschitz continuous. However it is worth stressing that the mean-value theorem is only available for real scalar functions; it is false for instance for complex scalar functions. A basic counterexample is given by the function {e(x) := e^{2\pi i x}}; there is no {t \in [0,1]} for which {e(1) - e(0) = e'(t)}. On the other hand, as {e'} has magnitude {2\pi}, we still know from (4) that {e} is Lipschitz of constant {2\pi}, and when combined with (1) we obtain the basic bounds

\displaystyle  |e(x) - e(y)| \leq \min( 2, 2\pi |x-y| )

which are already very useful for many applications.

Exercise 4 Let {H_0, V} be {n \times n} matrices, and let {t} be a non-negative real.

  • (i) Establish the Duhamel formula

    \displaystyle  e^{t(H_0+V)} = e^{tH_0} + \int_0^t e^{(t-s) H_0} V e^{s (H_0+V)}\ ds

    \displaystyle  = e^{tH_0} + \int_0^t e^{(t-s) (H_0+V)} V e^{s H_0}\ ds

    where {e^A} denotes the matrix exponential of {A}. (Hint: Differentiate {e^{(t-s) H_0} e^{s (H_0+V)}} or {e^{(t-s) (H_0+V)} e^{s H_0}} in {s}.)

  • (ii) Establish the iterated Duhamel formula

    \displaystyle  e^{t(H_0+V)} = e^{tH_0} + \sum_{j=1}^k \int_{0 \leq t_1 \leq \dots \leq t_j \leq t}

    \displaystyle e^{(t-t_j) H_0} V e^{(t_j-t_{j-1}) H_0} V \dots e^{(t_2-t_1) H_0} V e^{t_1 H_0}\ dt_1 \dots dt_j

    \displaystyle  + \int_{0 \leq t_1 \leq \dots \leq t_{k+1} \leq t}

    \displaystyle  e^{(t-t_{k+1}) (H_0+V)} V e^{(t_{k+1}-t_k) H_0} V \dots e^{(t_2-t_1) H_0} V e^{t_1 H_0}\ dt_1 \dots dt_{k+1}

    for any {k \geq 0}.

  • (iii) Establish the infinitely iterated Duhamel formula

    \displaystyle  e^{t(H_0+V)} = e^{tH_0} + \sum_{j=1}^\infty \int_{0 \leq t_1 \leq \dots \leq t_j \leq t}

    \displaystyle e^{(t-t_j) H_0} V e^{(t_j-t_{j-1}) H_0} V \dots e^{(t_2-t_1) H_0} V e^{t_1 H_0}\ dt_1 \dots dt_j.

  • (iv) If {H(t)} is an {n \times n} matrix depending in a continuously differentiable fashion on {t}, establish the variation formula

    \displaystyle  \frac{d}{dt} e^{H(t)} = (F(\mathrm{ad}(H(t))) H'(t)) e^{H(t)}

    where {\mathrm{ad}(H)} is the adjoint representation {\mathrm{ad}(H)(V) = HV - VH} applied to {H}, and {F} is the function

    \displaystyle  F(z) := \int_0^1 e^{sz}\ ds

    (thus {F(z) = \frac{e^z-1}{z}} for non-zero {z}), with {F(\mathrm{ad}(H(t)))} defined using functional calculus.

We remark that further manipulation of (iv) of the above exercise using the fundamental theorem of calculus eventually leads to the Baker-Campbell-Hausdorff-Dynkin formula, as discussed in this previous blog post.

Exercise 5 Let {A, B} be positive definite {n \times n} matrices, and let {Y} be an {n \times n} matrix. Show that there is a unique solution {X} to the Sylvester equation

\displaystyle  AX + X B = Y

which is given by the formula

\displaystyle  X = \int_0^\infty e^{-tA} Y e^{-tB}\ dt.

In the above examples we had applied the fundamental theorem of calculus along linear curves {\gamma(t) = (1-t) x_0 + t x}. However, it is sometimes better to use other curves. For instance, the circular arc {\gamma(t) = \cos(\pi t/2) x_0 + \sin(\pi t/2) x} can be useful, particularly if {x_0} and {x} are “orthogonal” or “independent” in some sense; a good example of this is the proof by Maurey and Pisier of the gaussian concentration inequality, given in Theorem 8 of this previous blog post. In a similar vein, if one wishes to compare a scalar random variable {X} of mean zero and variance one with a Gaussian random variable {G} of mean zero and variance one, it can be useful to introduce the intermediate random variables {\gamma(t) := (1-t)^{1/2} X + t^{1/2} G} (where {X} and {G} are independent); note that these variables have mean zero and variance one, and after coupling them together appropriately they evolve by the Ornstein-Uhlenbeck process, which has many useful properties. For instance, one can use these ideas to establish monotonicity formulae for entropy; see e.g. this paper of Courtade for an example of this and further references. More generally, one can exploit curves {\gamma} that flow according to some geometrically natural ODE or PDE; several examples of this occur famously in Perelman’s proof of the Poincaré conjecture via Ricci flow, discussed for instance in this previous set of lecture notes.

In some cases, it is difficult to compute {F(x)-F(x_0)} or the derivative {\nabla F} directly, but one can instead proceed by implicit differentiation, or some variant thereof. Consider for instance the matrix inversion map {F(A) := A^{-1}} (defined on the open dense subset of {n \times n} matrices consisting of invertible matrices). If one wants to compute {F(B)-F(A)} for {B} close to {A}, one can write temporarily write {F(B) - F(A) = E}, thus

\displaystyle  B^{-1} - A^{-1} = E.

Multiplying both sides on the left by {B} to eliminate the {B^{-1}} term, and on the right by {A} to eliminate the {A^{-1}} term, one obtains

\displaystyle  A - B = B E A

and thus on reversing these steps we arrive at the basic identity

\displaystyle  B^{-1} - A^{-1} = B^{-1} (A - B) A^{-1}. \ \ \ \ \ (6)

For instance, if {H_0, V} are {n \times n} matrices, and we consider the resolvents

\displaystyle  R_0(z) := (H_0 - z I)^{-1}; \quad R_V(z) := (H_0 + V - zI)^{-1}

then we have the resolvent identity

\displaystyle  R_V(z) - R_0(z) = - R_V(z) V R_0(z) \ \ \ \ \ (7)

as long as {z} does not lie in the spectrum of {H_0} or {H_0+V} (for instance, if {H_0}, {V} are self-adjoint then one can take {z} to be any strictly complex number). One can iterate this identity to obtain

\displaystyle  R_V(z) = \sum_{j=0}^k (-R_0(z) V)^j R_0(z) + (-R_V(z) V) (-R_0(z) V)^k R_0(z)

for any natural number {k}; in particular, if {R_0(z) V} has operator norm less than one, one has the Neumann series

\displaystyle  R_V(z) = \sum_{j=0}^\infty (-R_0(z) V)^j R_0(z).

Similarly, if {A(t)} is a family of invertible matrices that depends in a continuously differentiable fashion on a time variable {t}, then by implicitly differentiating the identity

\displaystyle  A(t) A(t)^{-1} = I

in {t} using the product rule, we obtain

\displaystyle  (\frac{d}{dt} A(t)) A(t)^{-1} + A(t) \frac{d}{dt} A(t)^{-1} = 0

and hence

\displaystyle  \frac{d}{dt} A(t)^{-1} = - A(t)^{-1} (\frac{d}{dt} A(t)) A(t)^{-1}

(this identity may also be easily derived from (6)). One can then use the fundamental theorem of calculus to obtain variants of (6), for instance by using the curve {\gamma(t) = (1-t) A + tB} we arrive at

\displaystyle  B^{-1} - A^{-1} = \int_0^1 ((1-t)A + tB)^{-1} (A-B) ((1-t)A + tB)^{-1}\ dt

assuming that the curve stays entirely within the set of invertible matrices. While this identity may seem more complicated than (6), it is more symmetric, which conveys some advantages. For instance, using this identity it is easy to see that if {A, B} are positive definite with {A>B} in the sense of positive definite matrices (that is, {A-B} is positive definite), then {B^{-1} > A^{-1}}. (Try to prove this using (6) instead!)

Exercise 6 If {A} is an invertible {n \times n} matrix and {u, v} are {n \times 1} vectors, establish the Sherman-Morrison formula

\displaystyle  (A + t uv^T)^{-1} = A^{-1} - \frac{t}{1 + t v^T A^{-1} u} A^{-1} uv^T A^{-1}

whenever {t} is a scalar such that {1 + t v^T A^{-1} u} is non-zero. (See also this previous blog post for more discussion of these sorts of identities.)

One can use the Cauchy integral formula to extend these identities to other functions of matrices. For instance, if {F: {\bf C} \rightarrow {\bf C}} is an entire function, and {\gamma} is a counterclockwise contour that goes around the spectrum of both {H_0} and {H_0+V}, then we have

\displaystyle  F(H_0+V) = \frac{-1}{2\pi i} \int_\gamma F(z) R_V(z)\ dz

and similarly

\displaystyle  F(H_0) = \frac{-1}{2\pi i} \int_\gamma F(z) R_0(z)\ dz

and hence by (7) one has

\displaystyle  F(H_0+V) - F(H_0) = \frac{1}{2\pi i} \int_\gamma F(z) R_V(z) V F_0(z)\ dz;

similarly, if {H(t)} depends on {t} in a continuously differentiable fashion, then

\displaystyle  \frac{d}{dt} F(H(t)) = \frac{1}{2\pi i} \int_\gamma F(z) (H(t) - zI)^{-1} H'(t) (z) (H(t)-zI)^{-1}\ dz

as long as {\gamma} goes around the spectrum of {H(t)}.

Exercise 7 If {H(t)} is an {n \times n} matrix depending continuously differentiably on {t}, and {F: {\bf C} \rightarrow {\bf C}} is an entire function, establish the tracial chain rule

\displaystyle  \frac{d}{dt} \hbox{tr} F(H(t)) = \hbox{tr}(F'(H(t)) H'(t)).

In a similar vein, given that the logarithm function is the antiderivative of the reciprocal, one can express the matrix logarithm {\log A} of a positive definite matrix by the fundamental theorem of calculus identity

\displaystyle  \log A = \int_0^\infty (I + sI)^{-1} - (A + sI)^{-1}\ ds

(with the constant term {(I+tI)^{-1}} needed to prevent a logarithmic divergence in the integral). Differentiating, we see that if {A(t)} is a family of positive definite matrices depending continuously on {t}, that

\displaystyle  \frac{d}{dt} \log A(t) = \int_0^\infty (A(t) + sI)^{-1} A'(t) (A(t)+sI)^{-1}\ dt.

This can be used for instance to show that {\log} is a monotone increasing function, in the sense that {\log A> \log B} whenever {A > B > 0} in the sense of positive definite matrices. One can of course integrate this formula to obtain some formulae for the difference {\log A - \log B} of the logarithm of two positive definite matrices {A,B}.

To compare the square root {A^{1/2} - B^{1/2}} of two positive definite matrices {A,B} is trickier; there are multiple ways to proceed. One approach is to use contour integration as before (but one has to take some care to avoid branch cuts of the square root). Another to express the square root in terms of exponentials via the formula

\displaystyle  A^{1/2} = \frac{1}{\Gamma(-1/2)} \int_0^\infty (e^{-tA} - I) t^{-1/2} \frac{dt}{t}

where {\Gamma} is the gamma function; this formula can be verified by first diagonalising {A} to reduce to the scalar case and using the definition of the Gamma function. Then one has

\displaystyle  A^{1/2} - B^{1/2} = \frac{1}{\Gamma(-1/2)} \int_0^\infty (e^{-tA} - e^{-tB}) t^{-1/2} \frac{dt}{t}

and one can use some of the previous identities to control {e^{-tA} - e^{-tB}}. This is pretty messy though. A third way to proceed is via implicit differentiation. If for instance {A(t)} is a family of positive definite matrices depending continuously differentiably on {t}, we can differentiate the identity

\displaystyle  A(t)^{1/2} A(t)^{1/2} = A(t)

to obtain

\displaystyle  A(t)^{1/2} \frac{d}{dt} A(t)^{1/2} + (\frac{d}{dt} A(t)^{1/2}) A(t)^{1/2} = \frac{d}{dt} A(t).

This can for instance be solved using Exercise 5 to obtain

\displaystyle  \frac{d}{dt} A(t)^{1/2} = \int_0^\infty e^{-sA(t)^{1/2}} A'(t) e^{-sA(t)^{1/2}}\ ds

and this can in turn be integrated to obtain a formula for {A^{1/2} - B^{1/2}}. This is again a rather messy formula, but it does at least demonstrate that the square root is a monotone increasing function on positive definite matrices: {A > B > 0} implies {A^{1/2} > B^{1/2} > 0}.

Several of the above identities for matrices can be (carefully) extended to operators on Hilbert spaces provided that they are sufficiently well behaved (in particular, if they have a good functional calculus, and if various spectral hypotheses are obeyed). We will not attempt to do so here, however.

Filed under: expository, math.CA, math.RA Tagged: estimates, matrix identities, stability

Terence TaoPCMI lecture notes on random matrix theory

In July I will be spending a week at Park City, being one of the mini-course lecturers in the Graduate Summer School component of the Park City Summer Session on random matrices.  I have chosen to give some lectures on least singular values of random matrices, the circular law, and the Lindeberg exchange method in random matrix theory; this is a slightly different set of topics than I had initially advertised (which was instead about the Lindeberg exchange method and the local relaxation flow method), but after consulting with the other mini-course lecturers I felt that this would be a more complementary set of topics.  I have uploaded an draft of my lecture notes (some portion of which is derived from my monograph on the subject); as always, comments and corrections are welcome.

<I>[Update, June 23: notes revised and reformatted to PCMI format. -T.]</I>



Filed under: math.PR, math.RA, paper, talk Tagged: Park City, random matrices

June 22, 2017

David HoggBayesian basics; red clump

A research highlight today was the first meeting of our Bayesian Data Analysis, 3ed reading group. It lasted a lot longer than an hour! We ended up going off into a tangent on the Fully Marginalized Likelihood vs cross-validation and Bayesian equivalents. We came up with some possible research projects there! The rest of the meeting was Bayesian basics. We decided on some problems we would do in Chapter 2. I hate to admit that the idea of having a problem set to do makes me nervous!

In the afternoon, Lauren Anderson (Flatiron) and I discussed our project to separate red-clump stars from red-giant-branch stars in the spectral domain. We have two approaches: The first is unsupervised: Can we see two spectral populations where the RC and RGB overlap? The second is supervised: Can we predict relevant asteroseismic parameters ina training set using the spectra?

Jordan EllenbergDriftless Father’s Day

This Father’s Day I found that, by some kind of unanticipated-gap-in-the-Red-Sea-level miracle, neither of my children had any events scheduled, so I gave myself a present and did something I’d been meaning to do for a year; take them to Dubuque.

It’s not far from Madison.  You drive southwest through the Driftless Zone, where the glaciers somehow looped around and missed a spot while they were grinding the rest of the Midwest flat.

At the exit to Platteville there was a sign for a “Mining Museum.”  We had about six seconds to decide whether we all wanted to go to a mining museum but that was plenty of time because obviously we all totally wanted to go to a mining museum.  And it was great!  Almost the platonic ideal of a small-town museum.  Our guide took us down into the old lead mine from the 1850s, now with electric lights and a lot of mannequins caught in the act of blasting holes in the rock.  (One of the mannequins was black; our guide told us that there were African-American miners in southwestern Wisconsin, but not that some of them were enslaved.)

This museum did a great job of conveying the working conditions of those miners; ankle-deep in water, darkness broken only by the candle wired to the front of their hat, the hammers on the rock so loud you couldn’t talk, and had to communicate by hand signals.  Riding up and down to the surface with one leg in the bucket and one leg out so more men could fit in one load, just hoping the bucket didn’t swing wrong and crush your leg against the rock wall.  There’s nothing like an industrial museum to remind you that everything you buy in a store has hours of difficult, dangerous labor built into it.  But it was also labor people traveled miles to get the chance to do!

Only twenty miles further to the Mississippi, my daughter’s first time seeing the river, and across it Dubuque.  Which has a pretty great Op-Art flag:



Our main goal was the National Mississippi River Museum; slick where the Platteville museum was homespun, up-to-date where the Plateville Museum was old-fashioned.  The kids really liked both.  I wanted fewer interactive screens, more actual weird river creatures.

The museum is on the Riverwalk; Dubuque, like just about every city on a body of water, is reinventing its shoreline as a tourist hub.  Every harbor a Harborplace.  OK, I snark, but it was a lovely walk; lots of handsome bridges in view, all different, an old-timey band playing in the gazebo, Illinois and Wisconsin and Iowa invisibly meeting across the water….

Only disappointment of the afternoon; the famous funicular railway was closed.  Maybe they could have posted that on their website or something.  But in a way it’s good they didn’t; if I’d known it was closed, I probably would have decided to put off the trip, and who knows if we’d ever have gone?

On the way back we stopped in Dickeyville to get gas but missed the Dickeyville Grotto; would have stopped there for sure if I’d known about it.  Dinner in Dodgeville at Culver’s, the Midwest’s superior version of In-N-Out, where I got my free Father’s Day turtle.   I like cheese curds and brats as much as the next guy, but I gotta say, I think the turtle is my favorite of the many foods I’d never heard of before I moved to Wisconsin.



June 21, 2017

Scott AaronsonAlex Halderman testifying before the Senate Intelligence Committee

This morning, my childhood best friend Alex Halderman testified before the US Senate about the proven ease of hacking electronic voting machines without leaving any record, the certainty that Russia has the technical capability to hack American elections, and the urgency of three commonsense (and cheap) countermeasures:

  1. a paper trail for every vote cast in every state,
  2. routine statistical sampling of the paper trail—enough to determine whether large-scale tampering occurred, and
  3. cybersecurity audits to instill general best practices (such as firewalling election systems).

You can watch Alex on C-SPAN here—his testimony begins at 2:16:13, and is followed by the Q&A period.  You can also read Alex’s prepared testimony here, as well as his accompanying Washington Post editorial (joint with Justin Talbot-Zorn).

Alex’s testimony—its civic, nonpartisan nature, right down to Alex’s flourish of approvingly quoting President Trump in support of paper ballots—reflects a moving optimism that, even in these dark times for democracy, Congress can be prodded into doing the right thing merely because it’s clearly, overwhelmingly in the national interest.  I wish I could say I shared that optimism.  Nevertheless, when called to testify, what can one do but act on the assumption that such optimism is justified?  Here’s hoping that Alex’s urgent message is heard and acted on.

John BaezThe Theory of Devices

I’m visiting the University of Genoa and talking to two category theorists: Marco Grandis and Giuseppe Rosolini. Grandis works on algebraic topology and higher categories, while Rosolini works on the categorical semantics of programming languages.

Yesterday, Marco Grandis showed me a fascinating paper by his thesis advisor:

• Gabriele Darbo, Aspetti algebrico-categoriali della teoria dei dispotivi, Symposia Mathematica IV (1970), Istituto Nazionale di Alta Matematica, 303–336.

It’s closely connected to Brendan Fong’s thesis, but also different—and, of course, much older. According to Grandis, Darbo was the first person to work on category theory in Italy. He’s better known for other things, like ‘Darbo’s fixed point theorem’, but this piece of work is elegant, and, it seems to me, strangely ahead of its time.

The paper’s title translates as ‘Algebraic-categorical aspects of the theory of devices’, and its main concept is that of a ‘universe of devices’: a collection of devices of some kind that can be hooked up using wires to form more devices of this kind. Nowadays we might study this concept using operads—but operads didn’t exist in 1970, and Darbo did quite fine without them.

The key is the category \mathrm{FinCorel}, which has finite sets as objects and ‘corelations’ as morphisms. I explained corelations here:

Corelations in network theory, 2 February 2016.

Briefly, a corelation from a finite set X to a finite set Y is a partition of the disjoint union of X and Y. We can get such a partition from a bunch of wires connecting points of X and Y. The idea is that two points lie in the same part of the partition iff they’re connected, directly or indirectly, by a path of wires. So, if we have some wires like this:

they determine a corelation like this:

There’s an obvious way to compose corelations, giving a category \mathrm{FinCorel}.

Gabriele Darbo doesn’t call them ‘corelations’: he calls them ‘trasduttori’. A literal translation might be ‘transducers’. But he’s definitely talking about corelations, and like Fong he thinks they are basic for studying ways to connect systems.

Darbo wants a ‘universe of devices’ to assign to each finite set X a set D(X) of devices having X as their set of ‘terminals’. Given a device in D(X) and a corelation f \colon X \to Y, thought of as a bunch of wires, he wants to be able to attach these wires to the terminals in X and get a new device with Y as its set of terminals. Thus he wants a map D(f): D(X) \to D(Y). If you draw some pictures, you’ll see this should give a functor

D : \mathrm{FinCorel} \to \mathrm{Set}

Moreover, if we have device with a set X of terminals and a device with a set Y of terminals, we should be able to set them side by side and get a device whose set of terminals form the set X + Y, meaning the disjoint union of X and Y. So, Darbo wants to have maps

\delta_{X,Y} : D(X) \times D(Y) \to D(X + Y)

If you draw some more pictures you can convince yourself that \delta should be a lax symmetric monoidal functor… if you’re one of the lucky few who knows what that means. If you’re not, you can look it up in many places, such as Section 1.2 here:

• Brendan Fong, The Algebra of Open and Interconnected Systems, Ph.D. thesis, University of Oxford, 2016. (Blog article here.)

Darbo does not mention lax symmetric monoidal functors, perhaps because such concepts were first introduced by Mac Lane only in 1968. But as far as I can tell, Darbo’s definition is almost equivalent to this:

Definition. A universe of devices is a lax symmetric monoidal functor D \colon \mathrm{FinCorel} \to \mathrm{Set}.

One difference is that Darbo wants there to be exactly one device with no terminals. Thus, he assumes D(\emptyset) is a one-element set, say 1, while the definition above would only demand the existence of a map \delta \colon 1 \to D(\emptyset) obeying a couple of axioms. That gives a particular ‘favorite’ device with no terminals. I believe we get Darbo’s definition from the above one if we further assume \delta is the identity map. This makes sense if we take the attitude that ‘a device is determined by its observable behavior’, but not otherwise. This attitude is called ‘black-boxing’.

Darbo does various things in his paper, but the most exciting to me is his example connected to linear electrical circuits. He defines, for any pair of objects V and I in an abelian category C, a particular universe of devices. He calls this the universe of linear devices having V as the object of potentials and I as the object of currents.

If you don’t like abelian categories, think of C as the category of finite-dimensional real vector spaces, and let V = I = \mathbb{R}. Electric potential and electric current are described by real numbers so this makes sense.

The basic idea will be familiar to Fong fans. In an electrical circuit made of purely conductive wires, when two wires merge into one we add the currents to get the current on the wire going out. When one wire splits into two we duplicate the potential to get the potentials on the wires going out. Working this out further, any corelation f : X \to Y between finite set determines two linear relations, one

f_* : I^X \rightharpoonup I^Y

relating the currents on the wires coming in to the currents on the wires going out, and one

f^* : V^Y \rightharpoonup V^X

relating the potentials on the wires going out to the potentials on the wires coming in. Here I^X is the direct sum of X copies of I, and so on; the funky arrow indicates that we have a linear relation rather than a linear map. Note that f_* goes forward while f^* goes backward; this is mainly just conventional, since you can turn linear relations around, but we’ll see it’s sort of nice.

If we let \mathrm{Rel}(A,B) be the set of linear relations between two objects A, B \in C, we can use the above technology to get a universe of devices where

D(X) = \mathrm{Rel}(V^X, I^X)

In other words, a device of this kind is simply a linear relation between the potentials and currents at its terminals!

How does D get to be a functor D : \mathrm{FinCorel} \to \mathrm{FinSet}? That’s pretty easy. We’ve defined it on objects (that is, finite sets) by the above formula. So, suppose we have a morphism (that is, a corelation) f \colon X \to Y. How do we define D(f) : D(X) \to D(Y)?

To answer this question, we need a function

D(f) : \mathrm{Rel}(V^X, I^X) \to \mathrm{Rel}(V^Y, I^Y)

Luckily, we’ve got linear relations

f_* : I^X \rightharpoonup I^Y


f^* : V^Y \rightharpoonup V^X

So, given any linear relation R \in \mathrm{Rel}(V^X, I^X), we just define

D(f)(R) = f_* \circ R \circ f^*


People who have read Fong’s thesis, or my paper with Blake Pollard on reaction networks:

• John Baez and Blake Pollard, A compositional framework for reaction networks.

will find many of Darbo’s ideas eerily similar. In particular, the formula

D(f)(R) = f_* \circ R \circ f^*

appears in Lemma 16 of my paper with Blake, where we are defining a category of open dynamical systems. We prove that D is a lax symmetric monoidal functor, which is just what Darbo proved—though in a different context, since our R is not linear like his, and for a different purpose, since he’s trying to show D is a ‘universe of devices’, while we’re trying to construct the category of open dynamical systems as a ‘decorated cospan category’.

In short: if this work of Darbo had become more widely known, the development of network theory could have been sped up by three decades! But there was less interest in a general theory of networks at the time, lax monoidal functors were brand new, operads unknown… and, sadly, few mathematicians read Italian.

Darbo has other papers, and so do his students. We should read them and learn from them! Here are a few open-access ones:

• Franco Parodi, Costruzione di un universo di dispositivi non lineari su una coppia di gruppi abeliani , Rendiconti del Seminario Matematico della Università di Padova 58 (1977), 45–54.

• Franco Parodi, Categoria degli universi di dispositivi e categoria delle T-algebre, Rendiconti del Seminario Matematico della Università di Padova 62 (1980), 1–15.

• Stefano Testa, Su un universo di dispositivi monotoni, Rendiconti del Seminario Matematico della Università di Padova 65 (1981), 53–57.

At some point I will scan in G. Darbo’s paper and make it available here.

BackreactionIf tensions in cosmological data are not measurement problems, they probably mean dark energy changes

Galaxy pumpkin.Src: The Swell Designer According to physics, the universe and everything in it can be explained by but a handful of equations. They’re difficult equations, all right, but their simplest feature is also the most mysterious one. The equations contain a few dozen parameters that are – for all we presently know – unchanging, and yet these numbers determine everything about the

Doug NatelsonAbout grants: What are "indirect costs"?

Before blogging further about science, I wanted to explain something about the way research grants work in the US.  Consider this part of my series of posts intended to educate students (and perhaps the public) about careers in academic research.

When you write a proposal to a would-be source of research funding, you have to include a budget.  As anyone would expect, that budget will list direct costs - these are items that are clear research expenses.  Examples would include, say, $30K/yr for a graduate student's stipend, and $7K for a piece of laboratory electronics essential to the work, and $2K/yr to support travel of the student and the principal investigator (PI) to conferences.   However, budgets also include indirect costs, sometimes called overhead.  The idea is that research involves certain costs that aren't easy to account for directly, like the electricity to run the lights and air conditioning in the lab, or the costs to keep the laboratory building maintained so that the research can get done, or the (meta)costs for the university to administer the grant.  

So, how does the university to figure out how much to tack on for indirect costs?  For US federal grants, the magic (ahem) is all hidden away in OMB Circular A21 (wiki about it, pdf of the actual doc).  Universities periodically go through an elaborate negotiation process with the federal government (see here for a description of this regarding MIT), and determine an indirect cost rate for that university.  The idea is you take the a version of the direct costs ("modified total direct costs" - for example, a piece of equipment that costs more than $5K is considered a capital expense and not subject to indirect costs) and multiply by a negotiated factor (in the case of Rice right now, 56.5%) to arrive at the indirect costs.  The cost rates are lower for research done off campus (like at CERN), with the argument that this should be cheaper for the university.  (Effective indirect cost rates at US national labs tend to be much higher.)

Foundations and industry negotiate different rates with universities.  Foundations usually limit their indirect cost payments, arguing that they just can't afford to pay at the federal level.  The Bill and Melinda Gates Foundation, for example, only allows (pdf) 10% for indirect costs.   The effective indirect rate for a university, averaged over the whole research portfolio, is always quite a bit lower than the nominal A21 negotiated rate.  Vice provosts/presidents/chancellors for research at major US universities would be happy to explain at length that indirect cost recovery doesn't come close to covering the actual costs associated with doing university-based research.  

Indirect cost rates in the US are fraught with controversy, particularly now.  The current system is definitely complicated, and reasonable people can ask whether it makes sense (and adds administrative costs) to have every university negotiate its own rate with the feds.   It remains to be seen whether there are changes in the offing.

June 20, 2017

Georg von HippelLattice 2017, Day Two

Welcome back to our blog coverage of the Lattics 2017 conference in Granada.

Today's first plenary session started with an experimental talk by Arantza Oyanguren of the LHCb collaboration on B decay anomalies at LHCb. LHCb have amassed a huge number of b-bbar pairs, which allow them to search for and study in some detail even the rarest of decay modes, and they are of course still collecting more integrated luminosity. Readers of this blog will likely recall the Bs → μ+μ- branching ratio result from LHCb, which agreed with the Standard Model prediction. In the meantime, there are many similar results for branching ratios that do not agree with Standard Model predictions at the 2-3σ level, e.g. the ratios of branching fractions like Br(B+→K+μ+μ-)/Br(B+→K+e+e-), in which lepton flavour universality appears to be violated. Global fits to data in these channels appear to favour the new physics hypothesis, but one should be cautious because of the "look-elsewhere" effect: when studying a very large number of channels, some will show an apparently significant deviation simply by statistical chance. On the other hand, it is very interesting that all the evidence indicating potential new physics (including the anomalous magnetic moment of the muon and the discrepancy between the muonic and electronic determinations of the proton electric charge radius) involve differences between processes involving muons and analogous processes involving electrons, an observation I'm sure model-builders have made a long time ago.

This was followed by a talk on flavour physics anomalies by Damir Bečirević. Expanding on the theoretical interpretation of the anomalies discussed in the previous talk, he explained how the data seem to indicate a violation of lepton flavour universality at the level where the Wilson coefficient C9 in the effective Hamiltonian is around zero for electrons, and around -1 for muons. Experimental data seem to favour the situation where C10=-C9, which can be accommodated is certain models with a Z' boson coupling preferentially to muons, or in certain special leptoquark models with corrections at the loop level only. Since I have little (or rather no) expertise in phenomenological model-building, I have no idea how likely these explanations are.

The next speaker was Xu Feng, who presented recent progress in kaon physics simulations on the lattice. The "standard" kaon quantities, such as the kaon decay constant or f+(0), are by now very well-determined from the lattice, with overall errors at the sub-percent level, but beyond that there are many important quantities, such as the CP-violating amplitudes in K → ππ decays, that are still poorly known and very challenging. RBC/UKQCD have been leading the attack on many of these observables, and have presented a possible solution to the ΔI=1/2 rule, which consists in non-perturbative effects making the amplitude A0 much larger relative to A2 than what would be expected from naive colour counting. Making further progress on long-distance contributions to the KL-KS mass difference or εK will require working at the physical pion mass and treating the charm quark with good control of discretization effects. For some processes, such as KL→π0+-, even the sign of the coefficient would be desirable.

After the coffee break, Luigi Del Debbio talked about parton distributions in the LHC era. The LHC data reduce the error on the NNLO PDFs by around a factor of two in the intermediate-x region. Conversely, the theory errors coming from the PDFs are a significant part of the total error from the LHC on Higgs physics and BSM searches. In particular the small-x and large-x regions remain quite uncertain. On the lattice, PDFs can be determined via quasi-PDFs, in which the Wilson line inside the non-local bilinear is along a spatial direction rather than in a light-like direction. However, there are still theoretical issues to be settled in order to ensure that the renormalization and matching the the continuum really lead to the determination of continuum PDFs in the end.

Next was a talk about chiral perturbation theory results on the multi-hadron state contamination of nucleon observables by Oliver Bär. It is well known that until very recently, lattice calculations of the nucleon axial charge underestimated its value relative to experiment, and this has been widely attributed to excited-state effects. Now, Oliver has calculated the corrections from nucleon-pion states on the extraction of the axial charge in chiral perturbation theory, and has found that they actually should lead to an overestimation of the axial charge from the plateau method, at least for source-sink separations above 2 fm, where ChPT is applicable. Similarly, other nucleon charges should be overestimated by 5-10%. Of course, nobody is currently measuring in that distance regime, and so it is quite possible that higher-order corrections or effects not captured by ChPT overcompensate this and lead to an underestimation, which would however mean that there is some instermediate source-sink separation for which one gets the experimental result by accident, as it were.

The final plenary speaker of the morning was Chia-Cheng Chang, who discussed progress towards a precise lattice determination of the nucleon axial charge, presenting the results of the CalLAT collaboration from using what they refer to as the Feynman-Hellmann method, a novel way of implementing what is essentially the summation method through ideas based in the Feynman-Hellmann theorem (but which doesn't involve simulating with a modified action, as a straightforward applicaiton of the Feynman-Hellmann theorem would demand).

After the lunch break, there were parallel sessions, and in the evening, the poster session took place. A particular interesting and entertaining contribution was a quiz about women's contributions to physics and computer science, the winner of which will win a bottle of wine and a book.

Georg von HippelLattice 2017, Day One

Hello from Granada and welcome to our coverage of the 2017 lattice conference.

After welcome addresses by the conference chair, a representative of the government agency in charge of fundamental research, and the rector of the university, the conference started off in a somewhat sombre mood with a commemoration of Roberto Petronzio, a pioneer of lattice QCD, who passed away last year. Giorgio Parisi gave a memorial talk summarizing Roberto's many contributions to the development of the field, from his early work on perturbative QCD and the parton model, through his pioneering contributions to lattice QCD back in the days of small quenched lattices, to his recent work on partially twisted boundary conditions and on isospin breaking effects, which is very much at the forefront of the field at the moment, not to omit Roberto's role as director of the Italian INFN in politically turbulent times.

This was followed by a talk by Martin Lüscher on stochastic locality and master-field simulations of very large lattices. The idea of a master-field simulation is based on the observation of volume self-averaging, i.e. that the variance of volume-averaged quantities is much smaller on large lattices (intuitively, this would be because an infinitely-extended properly thermalized lattice configuration would have to contain any possible finite sub-configuration with a frequency corresponding to its weight in the path integral, and that thus a large enough typical lattice configuration is itself a sort of ensemble). A master field is then a huge (e.g. 2564) lattice configuration, on which volume averages of quantities are computed, which have an expectation value equal to the QCD expectation value of the quantity in question, and a variance which can be estimated using a double volume sum that is doable using an FFT. To generate such huge lattice, algorithms with global accept-reject steps (like HMC) are unsuitable, because ΔH grows with the square root of the volume, but stochastic molecular dynamics (SMD) can be used, and it has been rigorously shown that for short-enough trajectory lengths SMD converges to a unique stationary state even without an accept-reject step.

After the coffee break, yet another novel simulation method was discussed by Ignacio Cirac, who presented techniques to perform quantum simulations of QED and QCD on alattice. While quantum computers of the kind that would render RSA-based public-key cryptography irrelevant remain elusive at the moment, the idea of a quantum simulator (which is essentially an analogue quantum computer), which goes back to Richard Feynman, can already be realized in practice: optical lattices allow trapping atoms on lattice sites while fine-tuning their interactions so as to model the couplings of some other physical system, which can thus be simulated. The models that are typically simulated in this way are solid-state models such as the Hubbard model, but it is of course also possible to setup a quantum simulator for a lattice field theory that has been formulated in the Hamiltonian framework. In order to model a gauge theory, it is necessary to model the gauge symmetry by some atomic symmetry such as angular momentum conservation, and this has been done at least in theory for QED and QCD. The Schwinger model has been studied in some detail. The plaquette action for d>1+1 additionally requires a four-point interaction between the atoms modelling the link variables, which can be realized using additional auxiliary variables, and non-abelian gauge groups can be encoded using multiple species of bosonic atoms. A related theoretical tool that is still in its infancy, but shows significant promise, is the use of tensor networks. This is based on the observation that for local Hamiltonians the entanglement between a region and its complement grows only as the surface of the region, not its volume, so only a small corner of the total Hilbert space is relevant; this allows one to write the coefficients of the wavefunction in a basis of local states as a contraction of tensors, from where classical algorithms that scale much better than the exponential growth in the number of variables that would naively be expected can be derived. Again, the method has been successfully applied to the Schwinger model, but higher dimensions are still challenging, because the scaling, while not exponential, still becomes very bad.

Staying with the topic of advanced simulation techniques, the next talk was Leonardo Giusti speaking about the block factorization of fermion determinants into local actions for multi-boson fields. By decomposing the lattice into three pieces, of which the middle one separates the other by a distance Δ large enough to render e-MπΔ small, and by applying a domain decomposition similar to the one used in Lüscher's DD-HMC algorithm to the Dirac operator, Leonardo and collaborators have been able to derive a multi-boson algorithm that allows to perform multilevel integration with dynamical fermions. For hadronic observables, the quark propagator also needs to be factorized, which Leonardo et al. also have achieved, making a significant decrease in statistical error possible.

After the lunch break there were parallel sessions, in one of which I gave my own talk and another one of which I chaired, thus finishing all of my duties other than listening (and blogging) on day one.

In the evening, there was a reception followed by a special guided tour of the truly stunning Alhambra (which incidentally contains a great many colourful - and very tasteful - lattices in the form of ornamental patterns).

Chad OrzelPhysics Blogging Round-Up: May

Much delayed, but this works out well because it’ll give you something to read while we’re away in Mexico on a family vacation. Here’s what I wrote for Forbes in the merry month of May:

In Science, Probability Is More Certain Than You Think: Some thoughts on the common mistake people make in saying that science only predicts probabilities of future outcomes.

A “Cosmic Controversy” Is Mostly A Distraction: A lament about the neglect of science we know to be true versus more speculative stuff.

Why Do We Invent Historical Roots For Modern Science?: Claims of ancient origins for current ideas in science often have more to do with modern concerns than historical reality.

What Things Should Every Physics Major Know?: A look at the very broad topics that are truly essential for an undergraduate physics degree.

Science Communication Is A Two-Way Street: The calmer version of a Twitter rant about how failures in science communication can’t be blamed only on scientists; the non-scientists who actively push us away also bear some responsibility.

Kind of a lot of noodle-y stuff in this month, largely because of my day job. I was team-teaching our Integrated Math and Physics class with a colleague from Math, and the class met for a couple of hours a day four days a week. It also used a book that I’d never used before, which means that even though the subject matter (introductory E&M) was familiar, it was essentially a new prep because all my notes needed to be converted to match the notation and language of the new book. That didn’t leave an enormous amount of mental energy for blogging.

Traffic-wise, the physics major post was a big hit, and most of the feedback I got was positive. Many of the others were a little too inside-baseball to get read all that widely, which is a Thing.

Anyway, that’s what I was blogging about not all that long ago.

John BaezThe Geometric McKay Correspondence (Part 1)

The ‘geometric McKay correspondence’, actually discovered by Patrick du Val in 1934, is a wonderful relation between the Platonic solids and the ADE Dynkin diagrams. In particular, it sets up a connection between two of my favorite things, the icosahedron:

and the \mathrm{E}_8 Dynkin diagram:

When I recently gave a talk on this topic, I realized I didn’t understand it as well as I’d like. Since then I’ve been making progress with the help of this book:

• Alexander Kirillov Jr., Quiver Representations and Quiver Varieties, AMS, Providence, Rhode Island, 2016.

I now think I glimpse a way forward to a very concrete and vivid understanding of the relation between the icosahedron and E8. It’s really just a matter of taking the ideas in this book and working them out concretely in this case. But it takes some thought, at least for me. I’d like to enlist your help.

The rotational symmetry group of the icosahedron is a subgroup of \mathrm{SO}(3) with 60 elements, so its double cover up in \mathrm{SU}(2) has 120. This double cover is called the binary icosahedral group, but I’ll call it \Gamma for short.

This group \Gamma is the star of the show, the link between the icosahedron and E8. To visualize this group, it’s good to think of \mathrm{SU}(2) as the unit quaternions. This lets us think of the elements of \Gamma as 120 points in the unit sphere in 4 dimensions. They are in fact the vertices of a 4-dimensional regular polytope, which looks like this:

It’s called the 600-cell.

Since \Gamma is a subgroup of \mathrm{SU}(2) it acts on \mathbb{C}^2, and we can form the quotient space

S = \mathbb{C}^2/\Gamma

This is a smooth manifold except at the origin—that is, the point coming from 0 \in \mathbb{C}^2. There’s a singularity at the origin, and this where \mathrm{E}_8 is hiding! The reason is that there’s a smooth manifold \widetilde{S} and a map

\pi : \widetilde{S} \to S

that’s one-to-one and onto except at the origin. It maps 8 spheres to the origin! There’s one of these spheres for each dot here:

Two of these spheres intersect in a point if their dots are connected by an edge; otherwise they’re disjoint.

The challenge is to find a nice concrete description of \widetilde{S}, the map \pi : \widetilde{S} \to S, and these 8 spheres.

But first it’s good to get a mental image of S. Each point in this space is a \Gamma orbit in \mathbb{C}^2, meaning a set like this:

\{g x : \; g \in \Gamma \}

for some x \in \mathbb{C}^2. For x = 0 this set is a single point, and that’s what I’ve been calling the ‘origin’. In all other cases it’s 120 points, the vertices of a 600-cell in \mathbb{C}^2. This 600-cell is centered at the point 0 \in \mathbb{C}^2, but it can be big or small, depending on the magnitude of x.

So, as we take a journey starting at the origin in S, we see a point explode into a 600-cell, which grows and perhaps also rotates as we go. The origin, the singularity in S, is a bit like the Big Bang.

Unfortunately not every 600-cell centered at the origin is of the form I’ve shown:

\{g x : \; g \in \Gamma \}

It’s easiest to see this by thinking of points in 4d space as quaternions rather than elements of \mathbb{C}^2. Then the points g \in \Gamma are unit quaternions forming the vertices of a 600-cell, and multiplying g on the right by x dilates this 600-cell and also rotates it… but we don’t get arbitrary rotations this way. To get an arbitrarily rotated 600-cell we’d have to use both a left and right multiplication, and consider

\{x g y : \; g \in \Gamma \}

for a pair of quaternions x, y.

Luckily, there’s a simpler picture of the space S. It’s the space of all regular icosahedra centered at the origin in 3d space!

To see this, we start by switching to the quaternion description, which says

S = \mathbb{H}/\Gamma

Specifying a point x \in \mathbb{H} amounts to specifying the magnitude \|x\| together with x/\|x\|, which is a unit quaternion, or equivalently an element of \mathrm{SU}(2). So, specifying a point in

\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma

amounts to specifying the magnitude \|x\| together with a point in \mathrm{SU}(2)/\Gamma. But \mathrm{SU}(2) modulo the binary icosahedral group \Gamma is the same as \mathrm{SO(3)} modulo the icosahedral group (the rotational symmetry group of an icosahedron). Furthermore, \mathrm{SO(3)} modulo the icosahedral group is just the space of unit-sized icosahedra centered at the origin of \mathbb{R}^3.

So, specifying a point

\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma

amounts to specifying a nonnegative number \|x\| together with a unit-sized icosahedron centered at the origin of \mathbb{R}^3. But this is the same as specifying an icosahedron of arbitrary size centered at the origin of \mathbb{R}^3. There’s just one subtlety: we allow the size of this icosahedron to be zero, but then the way it’s rotated no longer matters.

So, S is the space of icosahedra centered at the origin, with the ‘icosahedron of zero size’ being a singularity in this space. When we pass to the smooth manifold \widetilde{S}, we replace this singularity with 8 spheres, intersecting in a pattern described by the \mathrm{E}_8 Dynkin diagram.

Points on these spheres are limiting cases of icosahedra centered at the origin. We can approach these points by letting an icosahedron centered at the origin shrink to zero size in a clever way, perhaps spinning about wildly as it does.

I don’t understand this last paragraph nearly as well as I’d like! I’m quite sure it’s true, and I know a lot of relevant information, but I don’t see it. There should be a vivid picture of how this works, not just an abstract argument. Next time I’ll start trying to assemble the material that I think needs to go into building this vivid picture.

n-Category Café The Geometric McKay Correspondence (Part 1)

The ‘geometric McKay correspondence’, actually discovered by Patrick du Val in 1934, is a wonderful relation between the Platonic solids and the ADE Dynkin diagrams. In particular, it sets up a connection between two of my favorite things, the icosahedron:

and the E 8\mathrm{E}_8 Dynkin diagram:

When I recently gave a talk on this topic, I realized I didn’t understand it as well as I’d like. Since then I’ve been making progress with the help of this book:

  • Alexander Kirillov Jr., Quiver Representations and Quiver Varieties, AMS, Providence, Rhode Island, 2016.

I now think I glimpse a way forward to a very concrete and vivid understanding of the relation between the icosahedron and E8. It’s really just a matter of taking the ideas in this book and working them out concretely in this case. But it takes some thought, at least for me. I’d like to enlist your help.

The rotational symmetry group of the icosahedron is a subgroup of SO(3)\mathrm{SO}(3) with 60 elements, so its double cover up in SU(2)\mathrm{SU}(2) has 120. This double cover is called the binary icosahedral group, but I’ll call it Γ\Gamma for short.

This group Γ\Gamma is the star of the show, the link between the icosahedron and E8. To visualize this group, it’s good to think of SU(2)\mathrm{SU}(2) as the unit quaternions. This lets us think of the elements of Γ\Gamma as 120 points in the unit sphere in 4 dimensions. They are in fact the vertices of a 4-dimensional regular polytope, which looks like this:

It’s called the 600-cell.

Since Γ\Gamma is a subgroup of SU(2)\mathrm{SU}(2) it acts on 2\mathbb{C}^2, and we can form the quotient space

S= 2/ΓS = \mathbb{C}^2/\Gamma

This is a smooth manifold except at the origin—that is, the point coming from 0 20 \in \mathbb{C}^2. There’s a singularity at the origin, and this where E 8\mathrm{E}_8 is hiding! The reason is that there’s a smooth manifold S˜\widetilde{S} and a map

π:S˜S\pi : \widetilde{S} \to S

that’s one-to-one and onto except at the origin. It maps 8 spheres to the origin! There’s one of these spheres for each dot here:

Two of these spheres intersect in a point if their dots are connected by an edge; otherwise they’re disjoint.

The challenge is to find a nice concrete description of S˜\widetilde{S}, the map π:S˜S\pi : \widetilde{S} \to S, and these 8 spheres.

But first it’s good to get a mental image of SS. Each point in this space is a Γ\Gamma orbit in 2\mathbb{C}^2, meaning a set like this:

{gx:gΓ}\{g x : \; g \in \Gamma \}

for some x 2x \in \mathbb{C}^2. For x=0x = 0 this set is a single point, and that’s what I’ve been calling the ‘origin’. In all other cases it’s 120 points, the vertices of a 600-cell in 2\mathbb{C}^2. This 600-cell is centered at the point 0 20 \in \mathbb{C}^2, but it can be big or small, depending on the magnitude of xx.

So, as we take a journey starting at the origin in SS, we see a point explode into a 600-cell, which grows and perhaps also rotates as we go. The origin, the singularity in SS, is a bit like the Big Bang.

Unfortunately not every 600-cell centered at the origin is of the form I’ve shown:

{gx:gΓ}\{g x : \; g \in \Gamma \}

It’s easiest to see this by thinking of points in 4d space as quaternions rather than elements of 2\mathbb{C}^2. Then the points gΓg \in \Gamma are unit quaternions forming the vertices of a 600-cell, and multiplying gg on the right by xx dilates this 600-cell and also rotates it… but we don’t get arbitrary rotations this way. To get an arbitrarily rotated 600-cell we’d have to use both a left and right multiplication, and consider

{xgy:gΓ}\{x g y : \; g \in \Gamma \}

for a pair of quaternions x,yx, y.

Luckily, there’s a simpler picture of the space SS. It’s the space of all regular icosahedra centered at the origin in 3d space!

To see this, we start by switching to the quaternion description, which says

S=/ΓS = \mathbb{H}/\Gamma

Specifying a point xx \in \mathbb{H} amounts to specifying the magnitude x\|x\| together with x/xx/\|x\|, which is a unit quaternion, or equivalently an element of SU(2)\mathrm{SU}(2). So, specifying a point in

{gx:gΓ}/Γ\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma

amounts to specifying the magnitude x\|x\| together with a point in SU(2)/Γ\mathrm{SU}(2)/\Gamma. But SU(2)\mathrm{SU}(2) modulo the binary icosahedral group Γ\Gamma is the same as SO(3)\mathrm{SO}(3) modulo the icosahedral group (the rotational symmetry group of an icosahedron). Furthermore, SO(3)\mathrm{SO}(3) modulo the icosahedral group is just the space of unit-sized icosahedra centered at the origin of 3\mathbb{R}^3.

So, specifying a point

{gx:gΓ}/Γ\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma

amounts to specifying a nonnegative number x\|x\| together with a unit-sized icosahedron centered at the origin of 3\mathbb{R}^3. But this is the same as specifying an icosahedron of arbitrary size centered at the origin of 3\mathbb{R}^3. There’s just one subtlety: we allow the size of this icosahedron to be zero, but then the way it’s rotated no longer matters.

So, SS is the space of icosahedra centered at the origin, with the ‘icosahedron of zero size’ being a singularity in this space. When we pass to the smooth manifold S˜\widetilde{S}, we replace this singularity with 8 spheres, intersecting in a pattern described by the E 8\mathrm{E}_8 Dynkin diagram.

Points on these spheres are _limiting cases_ of icosahedra centered at the origin. We can approach these points by letting an icosahedron centered at the origin shrink to zero size in a clever way, perhaps spinning about wildly as it does.

I don’t understand this last paragraph nearly as well as I’d like! I’m quite sure it’s true, and I know a lot of relevant information, but I don’t see it. There should be a vivid picture of how this works, not just an abstract argument. Next time I’ll start trying to assemble the material that I think needs to go into building this vivid picture.

June 19, 2017

David Hoggcryo-electron-microscopy biases

At the Stars group meeting, I proposed a new approach for asteroseismology, that could work for TESS. My approach depends on the modes being (effectively) coherent, which is only true for short survey durations, where “short” can still mean years. Also, Mike Blanton (NYU) gave us an update on the APOGEE-S spectrograph, being commissioned now at LCO in Chile. Everything is nominal, which bodes very well for SDSS-IV and is great for AS-4. David Weinberg (OSU) showed up and told us about chemical-abundance constraints on a combination of yields and gas-recycling fractions.

In the afternoon I missed Cosmology group meeting, because of an intense discussion about marginalization (in the context of cryo-EM) with Leslie Greengard (Flatiron) and Marina Spivak (Flatiron). In the conversation, Charlie Epstein (Penn) came up with a very simple argument that is highly relevant. Imagine you have many observations of the function f(x), but for each one your x value has had noise applied. If you take as your estimate of the true f(x) the empirical mean of your observations, the bias you get will be (for small scatter in x) proportional to the variance in x times the second derivative of f. That's a useful and intuitive argument for why you have to marginalize.

Chad OrzelKids Update, Programming Note

I’ve skipped a few weeks of cute-kid updates, largely because I was at DAMOP for a week, and then catching on stuff I missed while I was at DAMOP for a week. The principal activity during this stretch has been SteelyKid’s softball, with a mad flurry of games at the end of the season to make up for all the rained-out games. This has been sort of stressful, but it also led to the greatest Google Photos animation ever, so…

Anyway, softball was fun, providing the opportunity for me to take no end of photos with my telephoto lens, some of which are pretty good. SteelyKid was way into running the bases, which ended up providing material for a blog post, so everybody wins. And I got a lot of photos like this one:

SteelyKid is pretty intense when she runs the bases.

Of course, while the intense running and dark helmet make her look a little intimidating in that, she’s still a cheerful eight-year-old, which means there’s really not a lot of killer instinct going on. For example, while she was playing first base, she high-fived every player on the other team who reached base safely:


Softball’s kind of a slow game, so boredom is always a danger when you have an eight-year-old’s attention span. She finds ways to pass the slower moments, though:

SteelyKid working on her fitness while her teammates bat.

(That’s not the greatest photo because the sun is setting more or less directly behind that dugout, but GIMP makes it tolerably clear…)

Speaking of short attention spans, The Pip has also come to a lot of the games, though he mostly just runs around and pays no attention to softball. Here he is stalking Kate, who’s watching SteelyKid play:

Stealthy Pip.

He’s like a ninja. In safety orange.

Much of the time, though, he’s pretty effective at keeping one or both of us from watching the game, frequently roping us into games of hide and seek or tag:

The Pip was It. Now Kate is.

(I also let him chase me around, but I’m the one who knows how to work the good camera. And more importantly, I’m the one who has editorial control over what pictures get posted here…)

And when all else fails, he can plop down on the ground and play in the grass:

“Dad, do I have grass in my hair?”

On the “programming note” side of things, I’m also aware that I haven’t posted the Forbes blog recap from May. I’m planning to type that up and post it probably Wednesday morning. Wednesday afternoon, we’re leaving on vacation for a while, going to Mexico with family, so you can expect a lot of photos of the kids doing tropical things in a few weeks…

BackreactionWhat’s new in high energy physics? Clockworks.

Clockworks. [Img via dwan1509]. High energy physics has phases. I don’t mean phases like matter has – solid, liquid, gaseous and so on. I mean phases like cranky toddlers have: One week they eat nothing but noodles, the next week anything as long as it’s white, then toast with butter but it must be cut into triangles. High energy physics is like this. Twenty years ago, it was extra

Doug NatelsonInteresting reading material

Summer travel and other activities have slowed blogging, but I'll pick back up again soon.  In the meantime, here are a couple of interesting things to read:

  • Ignition!  An Informal History of Liquid Rocket Propellants (pdf) is fascinating, if rather chemistry-heavy.  Come for discussions of subtle side reactions involved in red fuming nitric acid slowly eating its storage containers and suggested (then rejected) propellants like dimethyl mercury (!!), and stay for writing like, "Miraculously, nobody was killed, but there was one casualty — the man who had been steadying the cylinder when it split. He was found some five hundred feet away, where he had reached Mach 2 and was still picking up speed when he was stopped by a heart attack."  This is basically the story from start to finish (in practical terms) of the development of liquid propellants for rockets.   That book also led me to stumbling onto this library of works, most of which are waaaaay too chemistry-oriented for me.  Update:  for a directly relevant short story, see here.
  • Optogenetics is the idea of using light to control and trigger the activation/inactivation of genes.  More recently, there is a big upswing in the idea of magnetogenetics, using magnetic fields to somehow do similar things.  One question at play is, what is the physical mechanism whereby magnetic fields can really do much at room temperature, since magnetic effects tend to be weak.  (Crudely speaking, the energy scale of visible photons is eV, much larger than the thermal energy scale of \(k_{\mathrm{B}}T \sim ~\)26 meV, and readily able to excite vibrations or drive electronic excitations.  However, one electron spin in a reasonably accessible magnetic field of 1 Tesla is \(g \mu_{\mathrm{B}}B \sim ~\) 0.1 meV.)  Here is a nice survey article about the constraints on how magnetogenetics could operate.
  • For a tutorial in how not to handle academic promotion cases, see here.  

John PreskillTime capsule at the Dibner Library

The first time I met Lilla Vekerdy, she was holding a book.

“What’s that?” I asked.

“A second edition of Galileo’s Siderius nuncius. Here,” she added, thrusting the book into my hands. “Take it.”

So began my internship at the Smithsonian Institution’s Dibner Library for the History of Science and Technology.

Many people know the Smithsonian for its museums. The Smithsonian, they know, houses the ruby slippers worn by Dorothy in The Wizard of Oz. The Smithsonian houses planes constructed by Orville and Wilbur Wright, the dresses worn by First Ladies on presidential inauguration evenings, a space shuttle, and a Tyrannosaurus Rex skeleton. Smithsonian museums line the National Mall in Washington, D.C.—the United States’ front lawn—and march beyond.

Most people don’t know that the Smithsonian has 21 libraries.

Lilla heads the Smithsonian Libraries’ Special-Collections Department. She also directs a library tucked into a corner of the National Museum of American History. I interned at that library—the Dibner—in college. Images of Benjamin Franklin, of inventor Eli Whitney, and of astronomical instruments line the walls. The reading room contains styrofoam cushions on which scholars lay crumbling rare books. Lilla and the library’s technician, Morgan Aronson, find references for researchers, curate exhibitions, and introduce students to science history. They also care for the vault.

The vault. How I’d missed the vault.


A corner of the Dibner’s reading room and part of the vault

The vault contains manuscripts and books from the past ten centuries. We handle the items without gloves, which reduce our fingers’ sensitivities: Interpose gloves between yourself and a book, and you’ll raise your likelihood of ripping a page. A temperature of 65°F inhibits mold from growing. Redrot mars some leather bindings, though, and many crowns—tops of books’ spines—have collapsed. Aging carries hazards.

But what the ages have carried to the Dibner! We1 have a survey filled out by Einstein and a first edition of Newton’s Principia mathematica. We have Euclid’s Geometry in Latin, Arabic, and English, from between 1482 and 1847. We have a note, handwritten by quantum physicist Erwin Schödinger, about why students shouldn’t fear exams.

I returned to the Dibner one day this spring. Lilla and I fetched out manuscripts and books related to quantum physics and thermodynamics. “Hermann Weyl” labeled one folder.

Weyl contributed to physics and mathematics during the early 1900s. I first encountered his name when studying particle physics. The Dibner, we discovered, owns a proof for part of his 1928 book Gruppentheorie und Quantenmechanik. Weyl appears to have corrected a typed proof by hand. He’d handwritten also spin matrices.

Electrons have a property called “spin.” Spin resembles a property of yours, your position relative to the Earth’s center. We represent your position with three numbers: your latitude, your longitude, and your distance above the Earth’s surface. We represent electron spin with three blocks of numbers, three 2 \times 2 matrices. Today’s physicists write the matrices as2

S_x  = \begin{bmatrix}  0  &  1  \\  1  &  0  \end{bmatrix}  \, , \quad  S_y  = \begin{bmatrix}  0  &  -i  \\  i  &  0  \end{bmatrix}  \, , \quad \text{and} \quad  S_z  = \begin{bmatrix}  -1  &  0  \\  0  &  1  \end{bmatrix} \, .

We needn’t write these matrices. We could represent electron spin with different 2 \times 2 matrices, so long as the matrices obey certain properties. But most physicists choose the above matrices, in my experience. We call our choice “a convention.”

Weyl chose a different convention:

S_x  = \begin{bmatrix}  1  &  0  \\  0  &  -1  \end{bmatrix}  \, , \quad  S_y  = \begin{bmatrix}  0  &  1  \\  1  &  0  \end{bmatrix}  \, , \quad \text{and} \quad  S_z  = \begin{bmatrix}  0  &  i  \\  -i  &  0  \end{bmatrix} \, .

The difference surprised me. Perhaps it shouldn’t have: Conventions change. Approaches to quantum physics change. Weyl’s matrices differ from ours little: Permute our matrices and negate one matrix, and you recover Weyl’s.

But the electron-spin matrices play a role, in quantum physics, like the role played by T. Rex in paleontology exhibits: All quantum scientists recognize electron spin. We illustrate with electron spin in examples. Students memorize spin matrices in undergrad classes. Homework problems feature electron spin. Physicists have known of electron spin’s importance for decades. I didn’t expect such a bedrock to have changed its trappings.

How did scientists’ convention change? When did it? Why? Or did the convention not change—did Weyl’s contemporaries use today’s convention, and did Weyl stand out?

Weyl 2

A partially typed, partially handwritten, proof of a book by Hermann Weyl.

I intended to end this article with these questions. I sent a draft to John Preskill, proposing to post soon. But he took up the questions like a knight taking up arms.

Wolfgang Pauli, John emailed, appears to have written the matrices first. (Physicist call the matrices “Pauli matrices.”) A 1927 paper by Pauli contains the notation used today. Paul Dirac copied the notation in a 1928 paper, acknowledging Pauli. Weyl’s book appeared the same year. The following year, Weyl used Pauli’s notation in a paper.

No document we know of, apart from the Dibner proof, contains the Dibner-proof notation. Did the notation change between the proof-writing and publication? Does the Dibner hold the only anomalous electron-spin matrices? What accounts for the anomaly?

If you know, feel free to share. If you visit DC, drop Lilla and Morgan a line. Bring a research project. Bring a class. Bring zeal for the past. You might find yourself holding a time capsule by Galileo.

Lilla and me

Dibner librarian Lilla Vekerdy and a former intern

With thanks to Lilla and Morgan for their hospitality, time, curiosity, and expertise. With thanks to John for burrowing into the Pauli matrices’ history.

1I continue to count myself as part of the Dibner community. Part of me refuses to leave.

2I’ll omit factors of \hbar / 2 \, .

June 18, 2017

Sean CarrollA Response to “On the time lags of the LIGO signals” (Guest Post)

This is a special guest post by Ian Harry, postdoctoral physicist at the Max Planck Institute for Gravitational Physics, Potsdam-Golm. You may have seen stories about a paper that recently appeared, which called into question whether the LIGO gravitational-wave observatory had actually detected signals from inspiralling black holes, as they had claimed. Ian’s post is an informal response to these claims, on behalf of the LIGO Scientific Collaboration. He argues that there are data-analysis issues that render the new paper, by James Creswell et al., incorrect. Happily, there are sufficient online tools that this is a question that interested parties can investigate for themselves. Here’s Ian:

On 13 Jun 2017 a paper appeared on the arXiv titled “On the time lags of the LIGO signals” by Creswell et al. This paper calls into question the 5-sigma detection claim of GW150914 and following detections. In this short response I will refute these claims.

Who am I? I am a member of the LIGO collaboration. I work on the analysis of LIGO data, and for 10 years have been developing searches for compact binary mergers. The conclusions I draw here have been checked by a number of colleagues within the LIGO and Virgo collaborations. We are also in touch with the authors of the article to raise these concerns directly, and plan to write a more formal short paper for submission to the arXiv explaining in more detail the issues I mention below. In the interest of timeliness, and in response to numerous requests from outside of the collaboration, I am sharing these notes in the hope that they will clarify the situation.

In this article I will go into some detail to try to refute the claims of Creswell et al. Let me start though by trying to give a brief overview. In Creswell et al. the authors take LIGO data made available through the LIGO Open Science Data from the Hanford and Livingston observatories and perform a simple Fourier analysis on that data. They find the noise to be correlated as a function of frequency. They also perform a time-domain analysis and claim that there are correlations between the noise in the two observatories, which is present after removing the GW150914 signal from the data. These results are used to cast doubt on the reliability of the GW150914 observation. There are a number of reasons why this conclusion is incorrect: 1. The frequency-domain correlations they are seeing arise from the way they do their FFT on the filtered data. We have managed to demonstrate the same effect with simulated Gaussian noise. 2. LIGO analyses use whitened data when searching for compact binary mergers such as GW150914. When repeating the analysis of Creswell et al. on whitened data these effects are completely absent. 3. Our 5-sigma significance comes from a procedure of repeatedly time-shifting the data, which is not invalidated if correlations of the type described in Creswell et al. are present.

Section II: The curious case of the Fourier phase correlations?

The main result (in my opinion) from section II of Creswell et al. is Figure 3, which shows that, when one takes the Fourier transform of the LIGO data containing GW150914, and plots the Fourier phases as a function of frequency, one can see a clear correlation (ie. all the points line up, especially for the Hanford data). I was able to reproduce this with the LIGO Open Science Center data and a small ipython notebook. I make the ipython notebook available so that the reader can see this, and some additional plots, and reproduce this.

For Gaussian noise we would expect the Fourier phases to be distributed randomly (between -pi and pi). Clearly in the plot shown above, and in Creswell et al., this is not the case. However, the authors overlooked one critical detail here. When you take a Fourier transform of a time series you are implicitly assuming that the data are cyclical (i.e. that the first point is adjacent to the last point). For colored Gaussian noise this assumption will lead to a discontinuity in the data at the two end points, because these data are not causally connected. This discontinuity can be responsible for misleading plots like the one above.

To try to demonstrate this I perform two tests. First I whiten the colored LIGO noise by measuring the power spectral density (see the LOSC example, which I use directly in my ipython notebook, for some background on colored noise and noise power spectral density), then dividing the data in the Fourier domain by the power spectral density, and finally converting back to the time domain. This process will corrupt some data at the edges so after whitening we only consider the middle half of the data. Then we can make the same plot:

And we can see that there are now no correlations visible in the data. For white Gaussian noise there is no correlation between adjacent points, so no discontinuity is introduced when treating the data as cyclical. I therefore assert that Figure 3 of Creswell et al. actually has no meaning when generated using anything other than whitened data.

I would also like to mention that measuring the noise power spectral density of LIGO data can be challenging when the data are non-stationary and include spectral lines (as Creswell et al. point out). Therefore it can be difficult to whiten data in many circumstances. For the Livingston data some of the spectral lines are still clearly present after whitening (using the methods described in the LOSC example), and then mild correlations are present in the resulting plot (see ipython notebook). This is not indicative of any type of non-Gaussianity, but demonstrates that measuring the noise power-spectral density of LIGO noise is difficult, and, especially for parameter inference, a lot of work has been spent on answering this question.

To further illustrate that features like those seen in Figure 3 of Creswell et al. can be seen in known Gaussian noise I perform an additional check (suggested by my colleague Vivien Raymond). I generate a 128 second stretch of white Gaussian noise (using numpy.random.normal) and invert the whitening procedure employed on the LIGO data above to produce 128 seconds of colored Gaussian noise. Now the data, previously random, are ‘colored’ Coloring the data in the manner I did makes the full data set cyclical (the last point is correlated with the first) so taking the Fourier transform of the complete data set, I see the expected random distribution of phases (again, see the ipython notebook). However, If I select 32s from the middle of this data, introducing a discontinuity as I mention above, I can produce the following plot:

In other words, I can produce an even more extremely correlated example than on the real data, with actual Gaussian noise.

Section III: The data is strongly correlated even after removing the signal

The second half of Creswell et al. explores correlations between the data taken from Hanford and Livingston around GW150914. For me, the main conclusion here is communicated in Figure 7, where Creswell et al. claim that even after removal of the GW150914 best-fit waveform there is still correlation in the data between the two observatories. This is a result I have not been able to reproduce. Nevertheless, if such a correlation were present it would suggest that we have not perfectly subtracted the real signal from the data, which would not invalidate any detection claim. There could be any number of reasons for this, for example the fact that our best-fit waveform will not exactly match what is in the data as we cannot measure all parameters with infinite precision. There might also be some deviations because the waveform models we used, while very good, are only approximations to the real signal (LIGO put out a paper quantifying this possibility). Such deviations might also be indicative of a subtle deviation from general relativity. These are of course things that LIGO is very interested in pursuing, and we have published a paper exploring potential deviations from general relativity (finding no evidence for that), which includes looking for a residual signal in the data after subtraction of the waveform (and again finding no evidence for that).

Finally, LIGO runs “unmodelled” searches, which do not search for specific signals, but instead look for any coherent non-Gaussian behaviour in the observatories. These searches actually were the first to find GW150914, and did so with remarkably consistent parameters to the modelled searches, something which we would not expect to be true if the modelled searches are “missing” some parts of the signal.

With that all said I try to reproduce Figure 7. First I begin by cross-correlating the Hanford and Livingston data, after whitening and band-passing, in a very narrow 0.02s window around GW150914. This produces the following:

There is a clear spike here at 7ms (which is GW150914), with some expected “ringing” behaviour around this point. This is a much less powerful method to extract the signal than matched-filtering, but it is completely signal independent, and illustrates how loud GW150914 is. Creswell et al. however, do not discuss their normalization of this cross-correlation, or how likely a deviation like this is to occur from noise alone. Such a study would be needed before stating that this is significant—In this case we know this signal is significant from other, more powerful, tests of the data. Then I repeat this but after having removed the best-fit waveform from the data in both observatories (using only products made available in the LOSC example notebooks). This gives:

This shows nothing interesting at all.

Section IV: Why would such correlations not invalidate the LIGO detections?

Creswell et al. claim that correlations between the Hanford and Livingston data, which in their results appear to be maximized around the time delay reported for GW150914, raised questions on the integrity of the detection. They do not. The authors claim early on in their article that LIGO data analysis assumes that the data are Gaussian, independent and stationary. In fact, we know that LIGO data are neither Gaussian nor stationary and if one reads through the technical paper accompanying the detection PRL, you can read about the many tests we run to try to distinguish between non-Gaussianities in our data and real signals. But in doing such tests, we raise an important question: “If you see something loud, how can you be sure it is not some chance instrumental artifact, which somehow was missed in the various tests that you do”. Because of this we have to be very careful when assessing the significance (in terms of sigmas—or the p-value, to use the correct term). We assess the significance using a process called time-shifts. We first look through all our data to look for loud events within the 10ms time-window corresponding to the light travel time between the two observatories. Then we look again. Except the second time we look we shift ALL of the data from Livingston by 0.1s. This delay is much larger than the light travel time so if we see any interesting “events” now they cannot be genuine astrophysical events, but must be some noise transient. We then repeat this process with a 0.2s delay, 0.3s delay and so on up to time delays on the order of weeks long. In this way we’ve conducted of order 10 million experiments. For the case of GW150914 the signal in the non-time shifted data was louder than any event we saw in any of the time-shifted runs—all 10 million of them. In fact, it was still a lot louder than any event in the time-shifted runs as well. Therefore we can say that this is a 1-in-10-million event, without making any assumptions at all about our noise. Except one. The assumption is that the analysis with Livingston data shifted by e.g. 8s (or any of the other time shifts) is equivalent to the analysis with the Livingston data not shifted at all. Or, in other words, we assume that there is nothing special about the non-time shifted analysis (other than it might contain real signals!). As well as the technical papers, this is also described in the science summary that accompanied the GW150914 PRL.

Nothing in the paper “On the time lags of the LIGO signals” suggests that the non-time shifted analysis is special. The claimed correlations between the two detectors due to resonance and calibration lines in the data would be present also in the time-shifted analyses—The calibration lines are repetitive lines, and so if correlated in the non-time shift analyses, they will also be correlated in the time-shift analyses as well. I should also note that potential correlated noise sources was explored in another of the companion papers to the GW150914 PRL. Therefore, taking the results of this paper at face value, I see nothing that calls into question the “integrity” of the GW150914 detection.

Section V: Wrapping up

I have tried to reproduce the results quoted in “On the time lags of the LIGO signals”. I find the claims of section 2 are due to an issue in how the data is Fourier transformed, and do not reproduce the correlations claimed in section 3. Even if taking the results at face value, it would not affect the 5-sigma confidence associated with GW150914. Nevertheless I am in contact with the authors and we will try to understand these discrepancies.

For people interested in trying to explore LIGO data, check out the LIGO Open Science Center tutorials. As someone who was involved in the internal review of the LOSC data products it is rewarding to see these materials being used. It is true that these tutorials are intended as an introduction to LIGO data analysis, and do not accurately reflect many of the intricacies of these studies. For the interested reader a number of technical papers, for example this one, accompany the main PRL and within this paper and its references you can find all the nitty-gritty about how our analyses work. Finally, the PyCBC analysis toolkit, which was used to obtain the 5-sigma confidence, and of which I am one of the primary developers, is available open-source on git-hub. There are instructions here and also a number of examples that illustrate a number of aspects of our data analysis methods.

This article was circulated in the LIGO-Virgo Public Outreach and Education mailing list before being made public, and I am grateful to comments and feedback from: Christopher Berry, Ofek Birnholtz, Alessandra Buonanno, Gregg Harry, Martin Hendry, Daniel Hoak, Daniel Holz, David Keitel, Andrew Lundgren, Harald Pfeiffer, Vivien Raymond, Jocelyn Read and David Shoemaker.

June 16, 2017

Robert HellingI got this wrong

In yesterday's post, I totally screwed up when identifying the middle part of the spectrum as low frequency. It is not. Please ignore what I said or better take it as a warning what happens when you don't double check.

Apologies to everybody that I stirred up!

Robert HellingSome DIY LIGO data analysis

UPDATE: After some more thinking about this, I have very serious doubt about my previous conclusions. From looking at the power spectrum, I (wrongly) assumed that the middle part of the spectrum is the low frequency part (my original idea was, that the frequencies should be symmetric around zero but the periodicity of the Bloch cell bit me). So quite to the opposite, when taking into account the wrapping, this is the high frequency part (at almost the sample rate). So this is neither physics nor noise but the sample rate. For documentation, I do not delete the original post but leave it with this comment.

Recently, in the Arnold Sommerfeld Colloquium, we had Andrew Jackson of NBI talk about his take on the LIGO gravitational wave data, see this announcement with link to a video recording. He encouraged the audience to download the freely available raw data and play with it a little bit. This sounded like fun, so I had my go at it. Now, that his paper is out, I would like to share what I did with you and ask for your comments.

I used mathematica for my experiments, so I guess the way to proceed is to guide you to an html export of my (admittedly cleaned up) notebook (Source for your own experiments here).

The executive summary is that apparently, you can eliminate most of the "noise" at the interesting low frequency part by adding to the signal its time reversal casting some doubt about the stochasticity of this "noise".

I would love to hear what this is supposed to mean or what I am doing wrong, in particular from my friends in the gravitational wave community.

Tommaso DorigoHiggs Boson-Inspired Artwork By High-School Students

The "Art&Science" project is coming to the final phase as far as the activities in Venice are concerned. About 100 15 to 17-year-old students from high schools in Venice have assisted to lessons on particle physics and the Higgs boson in the past months, and have been challenged to produce, alone or in groups of up to three, artistic compositions inspired by what they had learned. This resulted in 38 artworks, many of which are really interesting. The 17 best works will be exposed at the Palazzo del Casinò of the Lido of Venice, the site of the international EPS conference, next July 5-12, and the three best among them will receive prizes during a public event on July 8th, in presence of the CERN director general Fabiola Gianotti.

read more

June 15, 2017

John BaezInformation Processing in Chemical Networks (Part 2)

I’m in Luxembourg, and I’ll be blogging a bit about this workshop:

Dynamics, Thermodynamics and Information Processing in Chemical Networks, 13-16 June 2017, Complex Systems and Statistical Mechanics Group, University of Luxembourg. Organized by Massimiliano Esposito and Matteo Polettini.

I’ll do it in the comments!

I explained the idea of this workshop here:

Information processing in chemical networks.

and now you can see the program here.

Jordan EllenbergWhat I ate in Toronto

A hamburger, black sesame gelato, breakfast (incl. baked beans) at a diner from the 1940s, hand-pulled noodles, poutine.

Update:  “Related posts” reminds me that the last time I went to a conference in Toronto, I learned a lot of interesting math from Julia Wolf, and the same was true this time!

June 13, 2017

n-Category Café Eliminating Binders for Easier Operational Semantics

guest post by Mike Stay

Last year, my son’s math teacher introduced the kids to the concept of a function. One of the major points of confusion in the class was the idea that it didn’t matter whether he wrote f(x)=x 2f(x) = x^2 or f(y)=y 2f(y) = y^2, but it did matter whether he wrote f(x)=xyf(x) = x y or f(x)=xzf(x) = x z. The function declaration binds some of the variables appearing on the right to the ones appearing on the left; the ones that don’t appear on the left are “free”. In a few years when he takes calculus, my son will learn about the quantifiers “for all” and “there exists” in the “epsilon-delta” definition of limit; quantifiers also bind variables in expressions.

Reasoning formally about languages with binders is hard:

“The problem of representing and reasoning about inductively-defined structures with binders is central to the PoplMark challenges. Representing binders has been recognized as crucial by the theorem proving community, and many different solutions to this problem have been proposed. In our (still limited) experience, none emerge as clear winners.” – Aydemir, Bohannon, Fairbairn, Foster, Pierce, Sewell, Vytiniotis, Washburn, Weirich, and Zdancewic, Mechanized metatheory for the masses: The PoplMark challenge. (2005)

The paper quoted above reviews around a dozen approaches in section 2.3, and takes pains to point out that their review is incomplete. However, recently Andrew Pitts and his students (particularly Murdoch Gabbay) developed the notion of a nominal set (introductory slides, book) that has largely solved this problem. Bengston and Barrow use a nominal datatype package in Isabell/HOL to formalize π\pi-calculus, and Clouston defined nominal Lawvere theories. It’s my impression that pretty much everyone now agrees that using nominal sets to formally model binders is the way forward.

Sometimes, though, it’s useful to look backwards; old techniques can lead to new ways of looking at a problem. The earliest approach to the problem of formally modeling bound variables was to eliminate them.

Abstraction elimination

λ\lambda-calculus is named for the binder in that language. The language itself is very simple. We start with an infinite set of variables x,y,z,x, y, z, \ldots and then define the terms to be

t,t::={x variable λx.t abstraction (tt) applicationt, t' ::= \left\{ \begin{array}{lr}x & variable \\ \lambda x.t & abstraction \\ (t\; t') & application\end{array}\right.

Schönfinkel’s idea was roughly to “sweep the binders under the rug”. We’ll allow binders, but only in the definition of a “combinator”, one of a finite set of predefined terms. We don’t allow binders in any expression using the combinators themselves; the binders will all be hidden “underneath” the name of the combinator.

To eliminate all the binders in a term, we start at the “lowest level”, a term of the form t=λx.u,t = \lambda x.u, where uu only has variables, combinators, and applications; no abstractions! Then we’ll try to find a way of rewriting tt using combinators instead. Since the lambda term λx.(vx)\lambda x.(v\; x) behaves the same as vv itself, if we can find some vv such that vx=uv x = u, then the job’s done.

Suppose u=xu=x. What term can we apply to xx and get xx itself back? The identity function, obviously, so our first combinator is

I=λx.x.I = \lambda x.x.

What if uu doesn’t contain xx at all? We need a “konstant” term K uK_u such that (K ux)(K_u\; x) just discards xx and returns u.u. At the same time, we don’t want to have to specify a different combinator for each uu that doesn’t contain xx, so we define our second combinator KK to first read in which uu to return, then read in xx, throw it away, and return u:u:

K=λux.u.K = \lambda u x.u.

Finally, suppose uu is an application u=(ww).u = (w\; w'). The variable xx might occur in ww or in ww' or both. Note that if we recurse on each of the terms in the application, we’ll have terms r,rr, r' such that (rx)=w(r\; x) = w and (rx)=w(r'\; x) = w', so we can write u=((rx)(rx)).u = ((r\; x)\; (r'\; x)). This suggests our final combinator should read in r,rr, r', and xx and “share” xx with them:

S=λrrx.((rx)(rx)).S = \lambda r r'x.((r\; x)\; (r'\; x)).

If we look at the types of the terms S,K,S, K, and II, we find something interesting:

I: ZZ K: ZYZ S: (ZYX)(ZY)ZX\begin{array}{rl}I:&Z \to Z\\K:&Z \to Y \to Z\\S:&(Z \to Y\to X)\to (Z\to Y) \to Z \to X\end{array}

The types correspond exactly to the axiom schemata for positive implicational logic!

The SKIS K I-calculus is a lot easier to model formally than the λ\lambda-calculus; we can use a tiny Gph-enriched Lawvere theory (see the appendix) to capture the operational semantics and then derive the denotational semantics from it.


As computers got smaller and telephony became cheaper, people started to connect them to each other. ARPANET went live in 1969, grew dramatically over the next twenty years, and eventually gave rise to the internet. ARPANET was decommissioned in 1990; that same year, Robin Milner published a paper introducing the π\pi-calculus. He designed it to model the new way computation was occurring in practice: instead of serially on a single computer, concurrently on many machines via the exchange of messages. Instead of applying one term to another, as in the lambda and SK calculi, terms (now called “processes”) get juxtaposed and then exchange messages. Also, whereas in λ\lambda-calculus a variable can be replaced by an entire term, in the π\pi-calculus names can only be replaced by other names.

Here’s a “good parts version” of the asynchronous π\pi-calculus; see the appendix for a full description.

P,Q::= 0 donothing | P|Q concurrency | for(xy)P input | x!y output | νx.P newname | !P replication\begin{array}{rll}P, Q ::= \quad& 0 & do\; nothing \\ \;|\; & P|Q & concurrency \\ \;|\; &for (x \leftarrow y) P & input \\ \;|\; & x!y & output \\ \;|\; & \nu x.P & new\; name \\ \;|\; & !P& replication \end{array}

x!z|for(yx).PP{z/y}communicationrulex!z \;|\; for(y \leftarrow x).P \Rightarrow P\{z/y\}\quad communication\; rule

There are six term constructors for π\pi-calculus instead of the three in λ\lambda-calculus. Concurrency is represented with a vertical bar |, which forms a commutative monoid with 0 as the monoidal unit. There are two binders, one for input and one for introducing a new name into scope. The rewrite rule is reminiscent of a trace in a compact closed category: xx appears in an input term and an output term on the left-hand side, while on the right-hand side xx doesn’t appear at all. I’ll explore that relationship in another post.

The syntax we use for the input prefix is not Milner’s original syntax. Instead, we borrowed from Scala, where the same syntax is syntactic sugar for M(λx.P)(y)M(\lambda x.P)(y) for some monad MM that describes a collection. We read it as “for a message xx drawn from the set of messages sent on yy, do PP with it”.

For many years after Milner proposed the π\pi-calculus, researchers tried to come up with a way to eliminate the bound names from a π\pi-calculus term. Yoshida was able to give an algorithm for eliminating the bound names that come from input prefixes, but not those from new names. Like the abstraction elimination algorithm above, Yoshida’s algorithm produced a set of concurrent combinators. There’s one combinator m(a,x)m(a, x) for sending a message aa on the name xx, and several others that interact with mm’s to move the computation forward (see the appendix for details):

d(a,b,c)|m(a,x) m(b,x)|m(c,x) (fanout) k(a)|m(a,x) 0 (drop) fw(a,b)|m(a,x) m(b,x) (forward) br(a,b)|m(a,x) fw(b,x) (branchright) bl(a,b)|m(a,x) fw(x,b) (branchleft) s(a,b,c)|m(a,x) fw(b,c) (synchronize)\begin{array}{rlr} d(a,b,c) | m(a,x) & \Rightarrow m(b,x) | m(c,x) & (fanout)\\ k(a) | m(a,x) & \Rightarrow 0 & (drop) \\ fw(a,b) | m(a,x) & \Rightarrow m(b,x) & (forward) \\ br(a,b) | m(a,x) & \Rightarrow fw(b,x) & (branch\; right)\\ bl(a,b) | m(a,x) & \Rightarrow fw(x,b) & (branch\; left) \\ s(a,b,c) | m(a,x) & \Rightarrow fw(b,c) & (synchronize)\end{array}

Unlike the SKIS K I combinators, no one has shown a clear connection between some notion of type for these combinators and a system of logic.


Several years ago, Greg Meredith had the idea to combine the set of π\pi-calculus names and the set of π\pi-calculus terms recursively. In a paper with Radestock he introduced a “quoting” operator I’ll write & that turns processes into names and a “dereference” operator I’ll write * that turns names into processes. They also made the calculus higher-order: they send processes on a name and receive the quoted process on the other side.

x!Q|for(yx).PP{&Q/y}communicationrulex!\langle Q\rangle \;|\; for(y \leftarrow x).P \Rightarrow P\{\&Q/y\}\quad communication\; rule

The smallest process is 0, so the smallest name is &0. The next smallest processes are

&00andfor(&0&0)0,\&0\langle 0 \rangle \quad and \quad for(\&0 \leftarrow \&0) \; 0,

which in turn can be quoted to produce more names, and so on.

Together, these two changes let them demonstrate a ν\nu-elimination transformation from π\pi calculus to their reflective higher-order (RHO) calculus: since a process never contains its own name (&P\&P cannot occur in PP), one can use that fact to generate names that are fresh with respect to a process.

Another benefit of reflection is the concept of “namespaces”: since the names have internal structure, we can ask whether they satisfy propositions. This lets Meredith and Radestock define a spatial-behavioral type system like that of Caires, but more powerful. Greg demonstrated that the type system is strong enough to prevent attacks on smart contracts like the $50M one last year that caused the fork in Ethereum.

In our most recent pair of papers, Greg and I consider two different reflective higher-order concurrent combinator calculi where we eliminate all the bound variables. In the first paper, we present a reflective higher-order version of Yoshida’s combinators. In the second, we note that we can think of each of the term constructors as combinators and apply them to each other. Then we can use SKIS K I combinators to eliminate binders from input prefixes and Greg’s reflection idea to eliminate those from ν.\nu. Both calculi can be expressed concisely using Gph-enriched Lawvere theories.

In future work, we intend to present a type system for the resulting combinators and show how the types give axiom schemata.


Gph-theory of SKIS K I-calculus

  • objects
    • TT
  • morphisms
    • S,K,I:1TS, K, I\colon 1 \to T
    • ():T×TT(-\; -)\colon T \times T \to T
  • equations
    • none
  • edges
    • σ:(((Sx)y)z)((xz)(yz))\sigma\colon (((S\; x)\; y)\; z) \Rightarrow ((x\; z)\; (y\; z))
    • κ:((Kx)z)x\kappa\colon ((K\; x)\; z) \Rightarrow x
    • ι:(Iz)z\iota\colon (I\; z) \Rightarrow z


  • grammar
    • P,Q::= 0 donothing | P|Q concurrency | for(xy).P input | x!y output | νx.P newname | !P replication\begin{array}{rll}P, Q ::= \quad & 0 & do\; nothing \\ \;|\; & P|Q & concurrency \\ \;|\; &for (x \leftarrow y).P & input \\ \;|\; & x!y & output \\ \;|\; & \nu x.P & new\; name \\ \;|\; & !P& replication \end{array}
  • structural equivalence
    • free names
      • FN(0)={}FN(0) = \{\}
      • FN(P|Q)=FN(P)FN(Q)FN(P|Q) = FN(P) \cup FN(Q)
      • FN(for(xy).P)={y}FN(P){x}FN(for (x \leftarrow y).P) = \{y\} \cup FN(P) - \{x\}
      • FN(x!y)={x,y}FN(x!y) = \{x,y\}
      • FN(νx.P)=FN(P){x}FN(\nu x.P) = FN(P) - \{x\}
      • FN(!P)=FN(P)FN(!P) = FN(P)
    • 0 and | form a commutative monoid
      • P|0PP|0 \equiv P
      • P|QQ|PP|Q \equiv Q|P
      • (P|Q)|RP|(Q|R)(P|Q)|R \equiv P|(Q|R)
    • replication
      • !PP|!P!P \equiv P\;|\;!P
    • α\alpha-equivalence
      • for(xy).Pfor(zy).P{z/x}for (x \leftarrow y).P\equiv for (z \leftarrow y).P\{z/x\} where zz is not free in PP
    • new names
      • νx.νx.Pνx.P\nu x.\nu x.P \equiv \nu x.P
      • νx.νy.Pνy.νx.P\nu x.\nu y.P \equiv \nu y.\nu x.P
      • νx.(P|Q)(νx.P)|Q\nu x.(P|Q) \equiv (\nu x.P)|Q when xx is not free in QQ
  • rewrite rules
    • x!z|for(yx).PP{z/y}x!z \;|\; for(y \leftarrow x).P \Rightarrow P\{z/y\}
    • if PPP \Rightarrow P', then P|QP|QP\;|\; Q \Rightarrow P'\;|\; Q
    • if PPP \Rightarrow P', then νx.Pνx.P\nu x.P \Rightarrow \nu x.P'

To be clear, the following is not an allowed reduction, because it occurs under an input prefix:

for(vu).(x!z|for(yx).P)for(vu).P{z/y}for(v \leftarrow u).(x!z \;|\; for(y \leftarrow x).P) \nRightarrow for(v \leftarrow u).P\{z/y\}

Yoshida’s combinators

  • grammar
    • atom: Q::=0|m(a,b)|d(a,b,c)|k(a)|fw(a,b)|br(a,b)|bl(a,b)|s(a,b,c)Q ::= 0 \;|\; m(a,b) \;|\; d(a,b,c) \;|\; k(a) \;|\; fw(a,b) \;|\; br(a,b) \;|\; bl(a,b) \;|\; s(a,b,c)
    • process: P::=Q|νa.P|P|P|!PP ::= Q \;|\; \nu a.P \;|\; P|P \;|\; !P
  • structural congruence
    • free names
      • FN(0)={}FN(0) = \{\}
      • FN(k(a))={a}FN(k(a)) = \{a\}
      • FN(m(a,b))=FN(fw(a,b))=FN(br(a,b))=FN(bl(a,b))={a,b}FN(m(a,b)) = FN(fw(a,b)) = FN(br(a,b)) = FN(bl(a,b)) = \{a, b\}
      • FN(d(a,b,c))=FN(s(a,b,c))={a,b,c}FN(d(a,b,c)) = FN(s(a,b,c)) = \{a,b,c\}
      • FN(νa.P)=FN(P){x}FN(\nu a.P) = FN(P) - \{x\}
      • FN(P|Q)=FN(P)FN(Q)FN(P|Q) = FN(P)\cup FN(Q)
      • FN(!P)=FN(P)FN(!P) = FN(P)
    • 0 and | form a commutative monoid
      • P|0PP|0 \equiv P
      • P|QQ|PP|Q \equiv Q|P
      • (P|Q)|RP|(Q|R)(P|Q)|R \equiv P|(Q|R)
    • replication
      • !PP|!P!P \equiv P\;|\;!P
    • new names
      • νx.νx.Pνx.P\nu x.\nu x.P \equiv \nu x.P
      • νx.νy.Pνy.νx.P\nu x.\nu y.P \equiv \nu y.\nu x.P
      • νx.(P|Q)(νx.P)|Q\nu x.(P|Q) \equiv (\nu x.P)|Q when xx is not free in QQ
  • rewrite rules
    • d(a,b,c)|m(a,x)m(b,x)|m(c,x)d(a,b,c) | m(a,x) \Rightarrow m(b,x) | m(c,x) (fanout)
    • k(a)|m(a,x)0k(a) | m(a,x) \Rightarrow 0 (drop)
    • fw(a,b)|m(a,x)m(b,x)fw(a,b) | m(a,x) \Rightarrow m(b,x) (forward)
    • br(a,b)|m(a,x)fw(b,x)br(a,b) | m(a,x) \Rightarrow fw(b,x) (branch right)
    • bl(a,b)|m(a,x)fw(x,b)bl(a,b) | m(a,x) \Rightarrow fw(x,b) (branch left)
    • s(a,b,c)|m(a,x)fw(b,c)s(a,b,c) | m(a,x) \Rightarrow fw(b,c) (synchronize)
    • !PP|!P!P \Rightarrow P|!P
    • if PPP \Rightarrow P' then for any term context CC, C[P]C[P]C[P] \Rightarrow C[P'].

Gph-theory of RHO Yoshida combinators

  • objects
    • N,TN, T
  • morphisms
    • 0:1T0\colon 1 \to T
    • k:TTk\colon T \to T
    • m:N×TTm\colon N \times T \to T
    • fw,br,bl:N 2Tfw,br,bl\colon N^2 \to T
    • d,s:N 3Td,s\colon N^3 \to T
    • |:T 2T|\colon T^2 \to T
    • *:NT*\colon N \to T
    • &:TN\&\colon T \to N
  • equations
    • 0 and | form a commutative monoid
      • P|0=PP|0 = P
      • P|Q=Q|PP|Q = Q|P
      • (P|Q)|R=P|(Q|R)(P|Q)|R = P|(Q|R)
    • *&P=P*\&P = P
  • edges
    • δ:d(a,b,c)|m(a,P)m(b,P)|m(c,P)\delta\colon d(a,b,c) | m(a,P) \Rightarrow m(b,P) | m(c,P)
    • κ:k(a)|m(a,P)0\kappa\colon k(a) | m(a,P) \Rightarrow 0
    • ϕ:fw(a,b)|m(a,P)m(b,P)\phi\colon fw(a,b) | m(a,P) \Rightarrow m(b,P)
    • ρ:br(a,b)|m(a,P)fw(b,&P)\rho\colon br(a,b) | m(a,P) \Rightarrow fw(b,\&P)
    • λ:bl(a,b)|m(a,P)fw(&P,b)\lambda\colon bl(a,b) | m(a,P) \Rightarrow fw(\&P,b)
    • σ:s(a,b,c)|m(a,P)fw(b,c)\sigma\colon s(a,b,c) | m(a,P) \Rightarrow fw(b,c)

Gph-theory of RHO π\pi-like combinators

  • objects
    • TT
  • morphisms
    • C,0,|,for,!,&,*,S,K,I:1TC, 0, |, for, !, \&, *, S, K, I\colon 1 \to T
    • ():T×TT(-\; -)\colon T\times T \to T
  • equations
    • 0 and | form a commutative monoid
      • ((|0)P)=P((|\; 0)\; P) = P
      • ((|((|P)Q))R)=((|P)((|Q)R)((|\; ((|\; P)\; Q))\; R)=((|\; P)\; ((|\; Q)\; R)
      • ((|P)Q)=((|Q)P)((|\; P)\; Q) = ((|\; Q)\; P)
  • edges
    • σ:(((SP)Q)R)((PR)(QR))\sigma\colon(((S\; P)\; Q)\; R) \Rightarrow ((P\; R)\; (Q\; R))
    • κ:((KP)Q)P\kappa\colon((K\; P)\; Q) \Rightarrow P
    • ι:(IP)P\iota\colon(I\; P) \Rightarrow P
    • ξ:((|C)((|((for(&P))Q))((!(&P))R)))((|C)(Q(&R)))\xi\colon((|\; C)\; ((|\; ((for\; (\&\; P))\; Q))\; ((!\; (\&\; P))\; R))) \Rightarrow ((|\; C)\; (Q\; (\&\; R)))
    • ϵ:((|C)(*(&P)))((|C)P)\epsilon\colon((|\; C)\;(*\; (\&\; P))) \Rightarrow ((|\; C)\; P)

June 10, 2017

n-Category Café Enriched Lawvere Theories for Operational Semantics

guest post by Mike Stay

Programs are an expression of programmer intent. We want the computer to do something for us, so we need to tell it what to do. We make mistakes, though, so we want to be able to check somehow that the program will do what we want. The idea of semantics for a programming language is that we assign some meaning to programs in such a way that we can reason about the behavior of a program. There are two main approaches to this: denotational semantics and operational semantics. I’ll discuss both below, but the post will focus for the most part on operational semantics.

There’s a long history of using 2-categories and related structures for term rewriting and operational semantics, but Greg Meredith and I are particularly fond of an approach using multisorted Lawvere theories enriched over the category of reflexive directed graphs, which we call Gph. Such enriched Lawvere theories are equal in power to, for instance, Sassone and Sobociński’s reactive systems, but in my opinion they have a cleaner categorical presentation. We wrote a paper on them:

Here I’ll just sketch the basic ideas.

Denotational Semantics

Denotational semantics works really well for functional programming languages. The actual process of computation is largely ignored in denotational semantics; it doesn’t matter how you compute the function, just what the function is. John Baez’s seminar eleven years ago explored Lambek and Scott’s approach to the denotational semantics of lambda calculus, and there was extensive discussion on this blog. Lambek and Scott constructed a cartesian closed category of types and α\alpha-β\beta-η\eta-equivalence classes of terms with one free variable, and then assigned meaning to the types and terms with a cartesian closed functor into Set.

Denotational semantics gets a lot harder once we move away from functional programming languages. Modern programs run on multiple computers at the same time and each computer has several cores. The computers are connected by networks that can mix up the order in which messages are received. A program may run perfectly by itself, but deadlock when you run it in parallel with another copy of itself. The notion of “composition” begins to change, too: we run programs in parallel with each other and let them interact by passing messages back and forth, not simply by feeding the output of one function into the input of another. All of this makes it hard to think of such programs as functions.

Operational Semantics

Operational semantics is the other end of the spectrum, concerned with the rules by which the state of a computer changes. Whereas denotational semantics is inspired by Church and the lambda calculus, operational semantics is inspired by Turing and his machines. All of computational complexity lives here (e.g. P=?NPP \stackrel{?}{=} NP).

To talk about the operational semantics of a programming language, there are five things we need to define.

First, we have to describe the layout of the state of the computer. For each kind of data that goes into a description of the state, we have a sort. If we’re using a programming language like lambda calculus, we have a sort for variables and a sort for terms, and the term is the entire state of the computer. If we’re using a Turing machine, there are more parts: the tape, the state transition table, the current state, and the position of the read/write head on the tape. If we’re using a modern language like JavaScript, the state is very complex: there are a couple of stacks, the heap, the lexical environment, the this binding, and more.

Second, we have to build up the state itself using term constructors. For example, in lambda calculus, we start with variables and use abstraction and application to build up a specific term.

Third, we say what rearrangements of the state we’re going to ignore; this is called structural congruence. In lambda calculus, we say that two terms are the same if they only differ in the choice of bound variables. In pi calculus, it doesn’t matter in what order we list the processes that are all running at the same time.

Fourth, we give reduction rules describing how the state is allowed to change. In lambda calculus, the state only changes via β\beta-reduction, substituting the argument of a function for the bound variable. In a Turing machine, each state leads to one of five others (change the bit to 0 or 1, then move left or right; or halt). In pi calculus, there may be more than one transition possible out of a particular state: if a process is listening on a channel and there are two messages, then either message may be processed first. Computational complexity theory is all about how many steps it takes to compute a result, so we do not have equations between sequences of rewrites.

Finally, the reduction rules themselves may only apply in certain contexts; for example, in all modern programming languages based on the lambda calculus, no reductions happen under an abstraction. That is, even if a term tt reduces to tt', it is never the case that λx.t\lambda x.t reduces to λx.t\lambda x.t'. The resulting normal form is called “weak head normal form”.

Here’s an example from Boudol’s paper “The π\pi-calculus in direct style.” There are two sorts: xx or zz for variables and LL or NN for terms. The first line, labeled “syntax,” defines four term constructors. There are equations for structural congruence, and there are two reduction rules followed by the contexts in which the rules apply:

Category theory

We’d like to formalize this using category theory. For our first attempt, we capture almost all of this information in a multisorted Gph-enriched Lawvere theory: we have a generating object for each sort, a generating morphism for each term constructor, an equation between morphisms encoding structural congruence, and an edge for each rewrite.

We interpret the theory in Gph. Sorts map to graphs, term constructors to graph homomorphisms, equations to equations, and rewrites map to things I call “graph transformations”, which are the obvious generalization of a natural transformation to the category of graphs: a graph transformation between two graph homomorphisms α:FG\alpha:F\Rightarrow G assigns to each vertex vv an edge α v:FvGv\alpha_v:Fv \to Gv. There’s nothing about a commuting square in the definition because it doesn’t even parse: we can’t compose edges to get a new edge.

This initial approach doesn’t quite work because of the way reduction contexts are usually presented. The reduction rules assume that we have a “global view” of the term being reduced, but the category theory insists on a “local view”. By “local” I mean that we can always whisker a reduction with a term constructor: if KK is an endomorphism on a graph, then given any edge e:vve:v\to v', there’s necessarily an edge Ke:KvKv.K e:K v \to K v'. These two requirements conflict: to model reduction to weak head normal form, if we have a reduction tt,t \to t', we don’t want a reduction λx.tλx.t.\lambda x.t \to \lambda x.t'.

One solution is to introduce “context constructors”, unary morphisms for marking reduction contexts. These contexts become part of the rewrite rules and the structural congruence; for example, taking CC to be the context constructor for weak head normal form, we add a structural congruence rule that says that to reduce an application of one term to another, we have to reduce the term on the left first:

C(TU)(CTU).C(T\; U) \equiv (C T\; U).

We also modify the reduction reduction rule to involve the context constructors. Here’s β\beta reduction when reducing to weak head normal form:

β:(C(λx.T)U)CT{U/x}. \beta: (C(\lambda x.T)\; U) \Rightarrow C T\{U/x\}.

Now β\beta reduction can’t happen just anywhere; it can only happen in the presence of the “catalyst” CC.

With context constructors, we can capture all of the information about operational semantics using a multisorted Gph-enriched Lawvere theory: we have a generating object for each sort, a generating morphism for each term constructor and for each context constructor, equations between morphisms encoding structural congruence and context propagation, and an edge for each rewrite in its appropriate context.

Connecting to Lambek/Scott

The SKI combinator calculus is a formal system invented by Schönfinkel and Curry. It allows for universal computation, and expressions in this calculus can easily be translated into the lambda calculus, but it’s simpler because it doesn’t include variables. The SK calculus is a fragment of the SK calculus that is still computationally universal.

We can recover the Lambek/Scott-style denotational semantics of the SK calculus (see the appendix) by taking the Gph-theory, modding out by the edges, and taking the monoid of endomorphisms on the generating object. The monoid is the cartesian closed category with only the “untyped” type. Using Melliès and Zeilberger’s notion of a functor as a type refinement system, we “-oidify” the monoid into a category of types and equivalence classes of terms.

However, modding out by edges utterly destroys the semantics of concurrent languages, and composition of endomorphisms doesn’t line up particularly well with composition of processes, so neither of those operations are desirable in general. That doesn’t stop us from considering Gph-enriched functors as type refinement systems, though.

Let GG be the free model of a theory on the empty graph. Our plan for future work is to show how different notions of a collection of edges of GG give rise to different kinds of logics. For example, if we take subsets of the edges of G,G, we get subgraphs of G,G, which form a Heyting algebra. On the other hand, if we consider sets of lists of composable edges in G,G, we get quantale semantics for linear logic. Specific collections will be the types in the type system, and proofs should be graph homomorphisms mapped over the collection. Edges will feature in proof normalization.

At the end, we should have a system where given a formal semantics for a language, we algorithmically derive a type system tailored to the language. We should also get a nice Curry-Howard style approach to operational semantics that even denotational semantics people won’t turn up their noses at!



For our purposes, a graph is a set EE of edges and a set VV of vertices together with three functions s:EVs\colon E \to V for the source of the edge, t:EVt\colon E \to V for the target, and a:VEa\colon V \to E such that sa=Vs\circ a = V and ta=Vt \circ a = V—that is, aa assigns a chosen self-loop to each vertex. A graph homomorphism maps vertices to vertices and edges to edges such that sources, targets and chosen self-loops are preserved. Gph is the category of graphs and graph homomorphisms. Gph has finite products: the terminal graph is the graph with one vertex and one loop, while the product of two graphs (E,V,s,t,a)×(E,V,s,t,a)(E , V , s, t, a) \times (E' , V' , s' , t' , a') is (E×E,V×V,s×s,t×t,a×a).(E \times E', V \times V', s \times s', t\times t', a \times a').

Gph is a topos; the subobject classifier has two vertices t,ft, f and five edges: the two self-loops, an edge from tt to f,f, an edge from ff to t,t, and an extra self-loop on tt. Any edge in a subgraph maps to the chosen self-loop on t,t, while an edge not in the subgraph maps to one of the other four edges depending on whether the source and target vertex are included or not.

A Gph-enriched category consists of

  • a set of objects;
  • for each pair of objects x,y,x, y, a graph hom(x,y);\hom(x,y);
  • for each triple of objects x,y,z,x, y, z, a composition graph homomorphism :hom(y,z)×hom(x,y)hom(x,z);\quad \circ\colon \hom(y, z) \times \hom(x, y) \to \hom(x, z); and
  • for each object x,x, a vertex of hom(x,x),\hom(x, x), the identity on x,x,

such that composition is associative, and composition and the identity obey the unit laws. A Gph-enriched category has finite products if the underlying category does.

Any category is trivially Gph-enrichable by treating the elements of the hom sets as vertices and adjoining a self loop to each vertex. The category Gph is nontrivially Gph-enriched: Gph is a topos, and therefore cartesian closed, and therefore enriched over itself. Given two graph homomorphisms F,F:(E,V,s,t,a)(E,V,s,t,a),F, F'\colon (E, V, s, t, a) \to (E', V', s', t', a'), a graph transformation assigns to each vertex vv in VV an edge ee' in EE' such that s(e)=F(v)s'(e') = F(v) and t(e)=F(v).t'(e') = F'(v). Given any two graphs GG and G,G', there is an exponential graph G GG'^G whose vertices are graph homomorphisms between them and whose edges are graph transformations. There is a natural isomorphism between the graphs C A×BC^{A\times B} and (C B) A.(C^B)^A.

A Gph-enriched functor between two Gph-enriched categories C,DC, D is a functor between the underlying categories such that the graph structure on each hom set is preserved, i.e. the functions between hom sets are graph homomorphisms between the hom graphs.

Let SS be a finite set, FinSet be a skeleton of the category of finite sets and functions between them, and FinSet/SFinSet/S be the category of functions into SS and commuting triangles. A multisorted Gph-enriched Lawvere theory, hereafter Gph-theory is a Gph-enriched category with finite products Th equipped with a finite set SS of sorts and a Gph-enriched functor θ:FinSet op/STh\theta\colon FinSet^{op}/S \to Th that preserves products strictly. Any Gph-theory has an underlying multisorted Lawvere theory given by forgetting the edges of each hom graph.

A model of a Gph-theory Th is a Gph-enriched functor from Th to Gph that preserves products up to natural isomorphism. A homomorphism of models is a braided Gph-enriched natural transformation between the functors. Let FPGphCat be the 2-category of small Gph-enriched categories with finite products, product-preserving Gph-functors, and braided Gph-natural transformations. The forgetful functor U:FPGphCat[Th,Gph]GphU\colon FPGphCat[\Th, \Gph] \to \Gph that picks out the underlying graph of a model has a left adjoint that picks out the free model on a graph.

Gph-enriched categories are part of a spectrum of 2-category-like structures. A strict 2-category is a category enriched over Cat with its usual product. Sesquicategories are categories enriched over Cat with the “funny” tensor product; a sesquicategory can be thought of as a generalized 2-category where the interchange law does not always hold. A Gph-enriched category can be thought of as a generalized sesquicategory where 2-morphisms (now edges) cannot always be composed. Any strict 2-category has an underlying sesquicategory, and any sesquicategory has an underlying Gph-enriched category; these forgetful functors have left adjoints.

Some examples

The SK calculus

Here’s a presentation of the Gph-theory for the SK calculus:

  • objects
    • TT
  • morphisms
    • S:1TS\colon 1 \to T
    • K:1TK\colon 1 \to T
    • ():T×TT(-\; -)\colon T \times T \to T
  • equations
    • none
  • edges
    • σ:(((Sx)y)z)((xz)(yz))\sigma\colon (((S\; x)\; y)\; z) \Rightarrow ((x\; z)\; (y\; z))
    • κ:((Kx)z)x\kappa\colon ((K\; x)\; z) \Rightarrow x

The free model of this theory on the empty graph has a vertex for every term in the SK calculus and an edge for every reduction.

The SK calculus with the weak head normal form reduction strategy

Here’s the theory above modified to account for weak head normal form:

  • objects
    • TT
  • morphisms
    • S:1TS\colon 1 \to T
    • K:1TK\colon 1 \to T
    • ():T×TT(-\; -)\colon T \times T \to T
    • R:TTR\colon T \to T
  • equations
    • R(xy)=(Rxy)R(x\; y) = (Rx\; y)
  • edges
    • σ:(((RSx)y)z)((Rxz)(yz))\sigma\colon (((RS\; x)\; y)\; z) \Rightarrow ((Rx\; z)\; (y\; z))
    • κ:((RKx)z)Rx\kappa\colon ((RK\; x)\; z) \Rightarrow Rx

If MM is an SKSK term with no uses of RR and MM' is its weak head normal form, then RMRM reduces to RM.RM'.

Once we have RR, we can think about other things to do with it. The rewrites treat the morphism RR as a linear resource. If we treat RR as “fuel” rather than “catalyst”—that is, if we modify σ\sigma and κ\kappa so that RR only appears on the left-hand side—then the reduction of R nMR^n M counts how many steps the computation takes; this is similar to Ethereum’s notion of “gasoline.”

June 09, 2017

Robert HellingRelativistic transformation of temperature

Apparently, there is a long history of controversy going back to Einstein an Planck about the proper way to deal with temperature relativistically. And I admit, I don't know what exactly the modern ("correct") point of view is. So I would like to ask your opinion about a puzzle we came up during yesterday's after colloquium dinner with Erik Verlinde:

Imagine a long rail of a railroad track. It is uniformly heated to a temperature T and is in thermodynamic equilibrium (if you like a mathematical language: it is in a KMS state). On this railroad track travels Einstein's relativistic train at velocity v. From the perspective of the conductor, the track in front of the train is approaching the train with velocity v, so one might expect that the temperature T appears blue shifted while behind the train, the track is moving away with v and so the temperature appears red-shifted. 

Following this line of thought, one would conclude that the conductor thinks the rail has different temperatures in different places and thus is out of equilibrium.  

On the other hand, the question of equilibrium should be independent of the observer. So, is the assumption of the Doppler shift wrong? 

A few remarks: If you are worried that Doppler shifts should apply to radiation then you are free to assume that both in front and in the back, there are black bodies in thermal contact with the rail and thus exhibiting a photon gas at the same temperature as the rail.

You could probably also make the case for the temperature transforming like the time component of a four vector (since it is essentially an energy). Then the transformed temperature would be independent of the sign of v. This you could for example argue for by assuming the temperature is so high that in your black body photon gas you also create electron-positron pairs which would be heavier due to their relativistic speed relative to the train and thus requiring more energy (and thus temperature) for their creation.

A final remark is about an operational definition of temperature at relativistic speeds: It might be difficult to bring a relativistic thermometer in equilibrium with a system if there is a large relative velocity (when we define temperature as the criterium for two systems in contact to be in equilibrium). Or to operate a heat engine between he front part of the rail and the back while moving along at relativistic speed and then arguing about the efficiency (and defining the temperature  that way).

Update one day later:
Thanks for all your comments. We also had some further discussions here and I would like to share my conclusions:

1) It probably boils down to what exactly you mean when you say ("temperature"). Of course, you want that his at least in familiar situations agrees with what thermometers of this type or another measure. (In the original text I had hinted at two possible definitions that I learned about from a very interesting paper by Buchholz and Solveen discussing the Unruh effect and what would actually be observed there: Either you define temperature that the property that characterises equilibrium states of systems such there is no heat exchange when you bring in contact two systems of the same temperature. This is for example close to what a mercury thermometer measures. Alternatively, you operate a perfect heat engine between two reservoirs and define your temperatures via
$$\eta = \frac{T_h - T_c}{T_h}.$$
This is for example hinted at in the Feynamn lectures on physics.

One of the commentators suggested using the ratio of eigenvalues of the energy momentum tensor as definition of temperature. Even though this might give the usual thing for a perfect fluid I am not really convinced that this generalises in the right way.

2) I would rather define the temperature as the parameter in the Gibbs (or rather KMS) state (it should only exist in equilibrium, anyway). So if your state is described by density matrix $\rho$, and it can be written as
$$\rho = \frac{e^{-\beta H}}{tr(e^{-\beta H})}$$
then $1/\beta$ is the temperature. Obviously, this requires the a priori knowledge of what the Hamiltonian is.

For such states, under mild assumptions, you can prove nice things: Energy-entropy inequalities ("minimisation of free energy"), stability, return to equilibrium and most important here: passivity, i.e. the fact you cannot extract mechanical work out of this state in a cyclic process.

2) I do not agree that it is out of the question to have a thermometer with a relative velocity in thermal equilibrium with a heat bath at rest. You could for example imagine a mirror fixed next to the track and in thermal equilibrium with the track. A second mirror is glued to the train (and again in thermal equilibrium, this time with a thermometer). Between the mirrors is is a photon gas (black body) that you could imagine equilibrating with the mirrors on both ends. The question is if that is the case.

3) Maybe rails and trains a a bit too non-spherical cows, so lets better look at an infinitely extended free quantum gas (bosons or fermions, your pick). You put it in a thermal state at rest, i.e. up to normalisation, its density matrix is given by
$$\rho = e^{-\beta P^0}.$$
Here $P^0$ is the Poincaré generator of time translations.

Now, the question above can be rephrased as: Is there a $\beta'$ such that also
$$\rho = e^{-\beta' (\cosh\alpha P^0 + \sinh \alpha P^1)}?$$
And to the question formulated this way, the answer is pretty clearly "No". A thermal state singles out  a rest frame and that's it. It is not thermal in the moving frame and thus there is no temperature.

It's also pretty easy to see this state is not passive (in the above sense): You could operate a windmill in the slipstream of particles coming more likely from the front than the back. So in particular, this state is not KMS (this argument I learned from Sven Bachmann).

4) Another question would be about gravitational redshift: Let's take some curve space-time and for simplicity assume it has no horizons (for example, let the far field be Schwarzschild but in the center, far outside the Schwarzschild radius, you smooth it out. Like the space-time created by the sun). Make it static, so it contains a timeline Killing vector (otherwise no hope for a thermal state). Now prepare a scalar field in the thermal state with temperature T. Couple to it a harmonic oscillator via
$$ H_{int}(r) = a^\dagger a + \phi(t, r) (a^\dagger + a).$$
You could now compute a "local temperature" by computing the probability that the harmonic oscillator is in the first excited state. Then, how does this depend on $r$?

Tommaso DorigoPhysics World's Review Of "Anomaly!"

A new review of my book, "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab", has appeared on the June issue of "Physics World". It is authored by Gavin Hesketh, a lecturer at University College London, and you can read it here.

read more

John BaezThe Mathematics of Open Reaction Networks

Next week, Blake Pollard and I will talk about our work on reaction networks. We’ll do this at Dynamics, Thermodynamics and Information Processing in Chemical Networks, a workshop at the University of Luxembourg organized by Massimiliano Esposito and Matteo Polettini. We’ll do it on Tuesday, 13 June 2017, from 11:00 to 13:00, in room BSC 3.03 of the Bâtiment des Sciences. If you’re around, please stop by and say hi!

Here are the slides for my talk:

The mathematics of open reaction networks.

Abstract. To describe systems composed of interacting parts, scientists and engineers draw diagrams of networks: flow charts, electrical circuit diagrams, signal-flow graphs, Feynman diagrams and the like. In principle all these different diagrams fit into a common framework: the mathematics of monoidal categories. This has been known for some time. However, the details are more challenging, and ultimately more rewarding, than this basic insight. Here we explain how various applications of reaction networks and Petri nets fit into this framework.

If you see typos or other problems please let me know now!

I hope to blog a bit about the workshop… it promises to be very interesting.

June 07, 2017

BackreactionDear Dr B: What are the chances of the universe ending out of nowhere due to vacuum decay?

“Dear Sabine, my names [-------]. I'm an anxiety sufferer of the unknown and have been for 4 years. I've recently came across some articles saying that the universe could just end out of no where either through false vacuum/vacuum bubbles or just ending and I'm just wondering what the chances of this are occurring anytime soon. I know it sounds silly but I'd be dearly greatful for your reply

June 06, 2017

Scott AaronsonHigher-level causation exists (but I wish it didn’t)

Unrelated Update (June 6): It looks like the issues we’ve had with commenting have finally been fixed! Thanks so much to Christie Wright and others at WordPress Concierge Services for handling this. Let me know if you still have problems. In the meantime, I also stopped asking for commenters’ email addresses (many commenters filled that field with nonsense anyway).  Oops, that ended up being a terrible idea, because it made commenting impossible!  Back to how it was before.

Update (June 5): Erik Hoel was kind enough to write a 5-page response to this post (Word .docx format), and to give me permission to share it here.  I might respond to various parts of it later.  For now, though, I’ll simply say that I stand by what I wrote, and that requiring the macro-distribution to arise by marginalizing the micro-distribution still seems like the correct choice to me (and is what’s assumed in, e.g., the proof of the data processing inequality).  But I invite readers to read my post along with Erik’s response, form their own opinions, and share them in the comments section.

This past Thursday, Natalie Wolchover—a math/science writer whose work has typically been outstanding—published a piece in Quanta magazine entitled “A Theory of Reality as More Than the Sum of Its Parts.”  The piece deals with recent work by Erik Hoel and his collaborators, including Giulio Tononi (Hoel’s adviser, and the founder of integrated information theory, previously critiqued on this blog).  Commenter Jim Cross asked me to expand on my thoughts about causal emergence in a blog post, so: your post, monsieur.

In their new work, Hoel and others claim to make the amazing discovery that scientific reductionism is false—or, more precisely, that there can exist “causal information” in macroscopic systems, information relevant for predicting the systems’ future behavior, that’s not reducible to causal information about the systems’ microscopic building blocks.  For more about what we’ll be discussing, see Hoel’s FQXi essay “Agent Above, Atom Below,” or better yet, his paper in Entropy, When the Map Is Better Than the Territory.  Here’s the abstract of the Entropy paper:

The causal structure of any system can be analyzed at a multitude of spatial and temporal scales. It has long been thought that while higher scale (macro) descriptions may be useful to observers, they are at best a compressed description and at worse leave out critical information and causal relationships. However, recent research applying information theory to causal analysis has shown that the causal structure of some systems can actually come into focus and be more informative at a macroscale. That is, a macroscale description of a system (a map) can be more informative than a fully detailed microscale description of the system (the territory). This has been called “causal emergence.” While causal emergence may at first seem counterintuitive, this paper grounds the phenomenon in a classic concept from information theory: Shannon’s discovery of the channel capacity. I argue that systems have a particular causal capacity, and that different descriptions of those systems take advantage of that capacity to various degrees. For some systems, only macroscale descriptions use the full causal capacity. These macroscales can either be coarse-grains, or may leave variables and states out of the model (exogenous, or “black boxed”) in various ways, which can improve the efficacy and informativeness via the same mathematical principles of how error-correcting codes take advantage of an information channel’s capacity. The causal capacity of a system can approach the channel capacity as more and different kinds of macroscales are considered. Ultimately, this provides a general framework for understanding how the causal structure of some systems cannot be fully captured by even the most detailed microscale description.

Anyway, Wolchover’s popular article quoted various researchers praising the theory of causal emergence, as well as a single inexplicably curmudgeonly skeptic—some guy who sounded like he was so off his game (or maybe just bored with debates about ‘reductionism’ versus ’emergence’?), that he couldn’t even be bothered to engage the details of what he was supposed to be commenting on.

Hoel’s ideas do not impress Scott Aaronson, a theoretical computer scientist at the University of Texas, Austin. He says causal emergence isn’t radical in its basic premise. After reading Hoel’s recent essay for the Foundational Questions Institute, “Agent Above, Atom Below” (the one that featured Romeo and Juliet), Aaronson said, “It was hard for me to find anything in the essay that the world’s most orthodox reductionist would disagree with. Yes, of course you want to pass to higher abstraction layers in order to make predictions, and to tell causal stories that are predictively useful — and the essay explains some of the reasons why.”

After the Quanta piece came out, Sean Carroll tweeted approvingly about the above paragraph, calling me a “voice of reason [yes, Sean; have I ever not been?], slapping down the idea that emergent higher levels have spooky causal powers.”  Then Sean, in turn, was criticized for that remark by Hoel and others.

Hoel in particular raised a reasonable-sounding question.  Namely, in my “curmudgeon paragraph” from Wolchover’s article, I claimed that the notion of “causal emergence,” or causality at the macro-scale, says nothing fundamentally new.  Instead it simply reiterates the usual worldview of science, according to which

  1. the universe is ultimately made of quantum fields evolving by some Hamiltonian, but
  2. if someone asks (say) “why has air travel in the US gotten so terrible?”, a useful answer is going to talk about politics or psychology or economics or history rather than the movements of quarks and leptons.

But then, Hoel asks, if there’s nothing here for the world’s most orthodox reductionist to disagree with, then how do we find Carroll and other reductionists … err, disagreeing?

I think this dilemma is actually not hard to resolve.  Faced with a claim about “causation at higher levels,” what reductionists disagree with is not the object-level claim that such causation exists (I scratched my nose because it itched, not because of the Standard Model of elementary particles).  Rather, they disagree with the meta-level claim that there’s anything shocking about such causation, anything that poses a special difficulty for the reductionist worldview that physics has held for centuries.  I.e., they consider it true both that

  1. my nose is made of subatomic particles, and its behavior is in principle fully determined (at least probabilistically) by the quantum state of those particles together with the laws governing them, and
  2. my nose itched.

At least if we leave the hard problem of consciousness out of it—that’s a separate debate—there seems to be no reason to imagine a contradiction between 1 and 2 that needs to be resolved, but “only” a vast network of intervening mechanisms to be elucidated.  So, this is how it is that reductionists can find anti-reductionist claims to be both wrong and vacuously correct at the same time.

(Incidentally, yes, quantum entanglement provides an obvious sense in which “the whole is more than the sum of its parts,” but even in quantum mechanics, the whole isn’t more than the density matrix, which is still a huge array of numbers evolving by an equation, just different numbers than one would’ve thought a priori.  For that reason, it’s not obvious what relevance, if any, QM has to reductionism versus anti-reductionism.  In any case, QM is not what Hoel invokes in his causal emergence theory.)

From reading the philosophical parts of Hoel’s papers, it was clear to me that some remarks like the above might help ward off the forehead-banging confusions that these discussions inevitably provoke.  So standard-issue crustiness is what I offered Natalie Wolchover when she asked me, not having time on short notice to go through the technical arguments.

But of course this still leaves the question: what is in the mathematical part of Hoel’s Entropy paper?  What exactly is it that the advocates of causal emergence claim provides a new argument against reductionism?

To answer that question, yesterday I (finally) read the Entropy paper all the way through.

Much like Tononi’s integrated information theory was built around a numerical measure called Φ, causal emergence is built around a different numerical quantity, this one supposed to measure the amount of “causal information” at a particular scale.  The measure is called effective information or EI, and it’s basically the mutual information between a system’s initial state sI and its final state sF, assuming a uniform distribution over sI.  Much like with Φ in IIT, computations of this EI are then used as the basis for wide-ranging philosophical claims—even though EI, like Φ, has aspects that could be criticized as arbitrary, and as not obviously connected with what we’re trying to understand.

Once again like with Φ, one of those assumptions is that of a uniform distribution over one of the variables, sI, whose relatedness we’re trying to measure.  In my IIT post, I remarked on that assumption, but I didn’t harp on it, since I didn’t see that it did serious harm, and in any case my central objection to Φ would hold regardless of which distribution we chose.  With causal emergence, by contrast, this uniformity assumption turns out to be the key to everything.

For here is the argument from the Entropy paper, for the existence of macroscopic causality that’s not reducible to causality in the underlying components.  Suppose I have a system with 8 possible states (called “microstates”), which I label 1 through 8.  And suppose the system evolves as follows: if it starts out in states 1 through 7, then it goes to state 1.  If, on the other hand, it starts in state 8, then it stays in state 8.  In such a case, it seems reasonable to “coarse-grain” the system, by lumping together initial states 1 through 7 into a single “macrostate,” call it A, and letting the initial state 8 comprise a second macrostate, call it B.

We now ask: how much information does knowing the system’s initial state tell you about its final state?  If we’re talking about microstates, and we let the system start out in a uniform distribution over microstates 1 through 8, then 7/8 of the time the system goes to state 1.  So there’s just not much information about the final state to be predicted—specifically, only 7/8×log2(8/7) + 1/8×log2(8) ≈ 0.54 bits of entropy—which, in this case, is also the mutual information between the initial and final microstates.  If, on the other hand, we’re talking about macrostates, and we let the system start in a uniform distribution over macrostates A and B, then A goes to A and B goes to B.  So knowing the initial macrostate gives us 1 full bit of information about the final state, which is more than the ~0.54 bits that looking at the microstate gave us!  Ergo reductionism is false.

Once the argument is spelled out, it’s clear that the entire thing boils down to, how shall I put this, a normalization issue.  That is: we insist on the uniform distribution over microstates when calculating microscopic EI, and we also insist on the uniform distribution over macrostates when calculating macroscopic EI, and we ignore the fact that the uniform distribution over microstates gives rise to a non-uniform distribution over macrostates, because some macrostates can be formed in more ways than others.  If we fixed this, demanding that the two distributions be compatible with each other, we’d immediately find that, surprise, knowing the complete initial microstate of a system always gives you at least as much power to predict the system’s future as knowing a macroscopic approximation to that state.  (How could it not?  For given the microstate, we could in principle compute the macroscopic approximation for ourselves, but not vice versa.)

The closest the paper comes to acknowledging the problem—i.e., that it’s all just a normalization trick—seems to be the following paragraph in the discussion section:

Another possible objection to causal emergence is that it is not natural but rather enforced upon a system via an experimenter’s application of an intervention distribution, that is, from using macro-interventions.  For formalization purposes, it is the experimenter who is the source of the intervention distribution, which reveals a causal structure that already exists.  Additionally, nature itself may intervene upon a system with statistical regularities, just like an intervention distribution.  Some of these naturally occurring input distributions may have a viable interpretation as a macroscale causal model (such as being equal to Hmax [the maximum entropy] at some particular macroscale).  In this sense, some systems may function over their inputs and outputs at a microscale or macroscale, depending on their own causal capacity and the probability distribution of some natural source of driving input.

As far as I understand it, this paragraph is saying that, for all we know, something could give rise to a uniform distribution over macrostates, so therefore that’s a valid thing to look at, even if it’s not what we get by taking a uniform distribution over microstates and then coarse-graining it.  Well, OK, but unknown interventions could give rise to many other distributions over macrostates as well.  In any case, if we’re directly comparing causal information at the microscale against causal information at the macroscale, it still seems reasonable to me to demand that in the comparison, the macro-distribution arise by coarse-graining the micro one.  But in that case, the entire argument collapses.

Despite everything I said above, the real purpose of this post is to announce that I’ve changed my mind.  I now believe that, while Hoel’s argument might be unsatisfactory, the conclusion is fundamentally correct: scientific reductionism is false.  There is higher-level causation in our universe, and it’s 100% genuine, not just a verbal sleight-of-hand.  In particular, there are causal forces that can only be understood in terms of human desires and goals, and not in terms of subatomic particles blindly bouncing around.

So what caused such a dramatic conversion?

By 2015, after decades of research and diplomacy and activism and struggle, 196 nations had finally agreed to limit their carbon dioxide emissions—every nation on earth besides Syria and Nicaragua, and Nicaragua only because it thought the agreement didn’t go far enough.  The human race had thereby started to carve out some sort of future for itself, one in which the oceans might rise slowly enough that we could adapt, and maybe buy enough time until new technologies were invented that changed the outlook.  Of course the Paris agreement fell far short of what was needed, but it was a start, something to build on in the coming decades.  Even in the US, long the hotbed of intransigence and denial on this issue, 69% of the public supported joining the Paris agreement, compared to a mere 13% who opposed.  Clean energy was getting cheaper by the year.  Most of the US’s largest corporations, including Google, Microsoft, Apple, Intel, Mars, PG&E, and ExxonMobil—ExxonMobil, for godsakes—vocally supported staying in the agreement and working to cut their own carbon footprints.  All in all, there was reason to be cautiously optimistic that children born today wouldn’t live to curse their parents for having brought them into a world so close to collapse.

In order to unravel all this, in order to steer the heavy ship of destiny off the path toward averting the crisis and toward the path of existential despair, a huge number of unlikely events would need to happen in succession, as if propelled by some evil supernatural force.

Like what?  I dunno, maybe a fascist demagogue would take over the United States on a campaign based on willful cruelty, on digging up and burning dirty fuels just because and even if it made zero economic sense, just for the fun of sticking it to liberals, or because of the urgent need to save the US coal industry, which employs fewer people than Arby’s.  Such a demagogue would have no chance of getting elected, you say?

So let’s suppose he’s up against a historically unpopular opponent.  Let’s suppose that even then, he still loses the popular vote, but somehow ekes out an Electoral College win.  Maybe he gets crucial help in winning the election from a hostile foreign power—and for some reason, pro-American nationalists are totally OK with that, even cheer it.  Even then, we’d still probably need a string of additional absurd coincidences.  Like, I dunno, maybe the fascist’s opponent has an aide who used to be married to a guy who likes sending lewd photos to minors, and investigating that guy leads the FBI to some emails that ultimately turn out to mean nothing whatsoever, but that the media hyperventilate about precisely in time to cause just enough people to vote to bring the fascist to power, thereby bringing about the end of the world.  Something like that.

It’s kind of like, you know that thing where the small population in Europe that produced Einstein and von Neumann and Erdös and Ulam and Tarski and von Karman and Polya was systematically exterminated (along with millions of other innocents) soon after it started producing such people, and the world still hasn’t fully recovered?  How many things needed to go wrong for that to happen?  Obviously you needed Hitler to be born, and to survive the trenches and assassination plots; and Hindenburg to make the fateful decision to give Hitler power.  But beyond that, the world had to sleep as Germany rebuilt its military; every last country had to turn away refugees; the UK had to shut down Jewish immigration to Palestine at exactly the right time; newspapers had to bury the story; government record-keeping had to have advanced just to the point that rounding up millions for mass murder was (barely) logistically possible; and finally, the war had to continue long enough for nearly every European country to have just enough time to ship its Jews to their deaths, before the Allies showed up to liberate mostly the ashes.

In my view, these simply aren’t the sort of outcomes that you expect from atoms blindly interacting according to the laws of physics.  These are, instead, the signatures of higher-level causation—and specifically, of a teleological force that operates in our universe to make it distinctively cruel and horrible.

Admittedly, I don’t claim to know the exact mechanism of the higher-level causation.  Maybe, as the physicist Yakir Aharonov has advocated, our universe has not only a special, low-entropy initial state at the Big Bang, but also a “postselected final state,” toward which the outcomes of quantum measurements get mysteriously “pulled”—an effect that might show up in experiments as ever-so-slight deviations from the Born rule.  And because of the postselected final state, even if the human race naïvely had only (say) a one-in-thousand chance of killing itself off, even if the paths to its destruction all involved some improbable absurdity, like an orange clown showing up from nowhere—nevertheless, the orange clown would show up.  Alternatively, maybe the higher-level causation unfolds through subtle correlations in the universe’s initial state, along the lines I sketched in my 2013 essay The Ghost in the Quantum Turing Machine.  Or maybe Erik Hoel is right after all, and it all comes down to normalization: if we looked at the uniform distribution over macrostates rather than over microstates, we’d discover that orange clowns destroying the world predominated.  Whatever the details, though, I think it can no longer be doubted that we live, not in the coldly impersonal universe that physics posited for centuries, but instead in a tragicomically evil one.

I call my theory reverse Hollywoodism, because it holds that the real world has the inverse of the typical Hollywood movie’s narrative arc.  Again and again, what we observe is that the forces of good have every possible advantage, from money to knowledge to overwhelming numerical superiority.  Yet somehow good still fumbles.  Somehow a string of improbable coincidences, or a black swan or an orange Hitler, show up at the last moment to let horribleness eke out a last-minute victory, as if the world itself had been rooting for horribleness all along.  That’s our universe.

I’m fine if you don’t believe this theory: maybe you’re congenitally more optimistic than I am (in which case, more power to you); maybe the full weight of our universe’s freakish awfulness doesn’t bear down on you as it does on me.  But I hope you’ll concede that, if nothing else, this theory is a genuinely non-reductionist one.

Scott AaronsonThe Social Justice Warriors are right

As you might know, I haven’t been exactly the world’s most consistent fan of the Social Justice movement, nor has it been the most consistent fan of me.

I cringe when I read about yet another conservative college lecture shut down by mob violence; or student protesters demanding the firing of a professor for trying gently to argue and reason with them; or an editor forced from his position for writing a (progressive) defense of “cultural appropriation”—a practice that I take to have been ubiquitous for all of recorded history, and without which there wouldn’t be any culture at all.  I cringe not only because I know that I was in the crosshairs once before and could easily be again, but also because, it seems to me, the Social Justice scalp-hunters are so astoundingly oblivious to the misdirection of their energies, to the power of their message for losing elections and neutering the progressive cause, to the massive gift their every absurdity provides to the world’s Fox Newses and Breitbarts and Trumps.

Yet there’s at least one issue where it seems to me that the Social Justice Warriors are 100% right, and their opponents 100% wrong. This is the moral imperative to take down every monument to Confederate “war heroes,” and to rename every street and school and college named after individuals whose primary contribution to the world was to defend chattel slavery.  As a now-Southerner, I have a greater personal stake here than I did before: UT Austin just recently removed its statue of Jefferson Davis, while keeping up its statue of Robert E. Lee.  My kids will likely attend what until very recently was called Robert E. Lee Elementary—this summer renamed Russell Lee Elementary.  (My suggestion, that the school be called T. D. Lee Parity Violation Elementary, was sadly never considered.)

So I was gratified that last week, New Orleans finally took down its monuments to slavers.  Mayor Mitch Landrieu’s speech, setting out the reasons for the removal, is worth reading.

I used to have little patience for “merely symbolic” issues: would that offensive statues and flags were the worst problems!  But it now seems to me that the fight over Confederate symbols is just a thinly-veiled proxy for the biggest moral question that’s faced the United States through its history, and also the most urgent question facing it in 2017.  Namely: Did the Union actually win the Civil War? Were the anti-Enlightenment forces—the slavers, the worshippers of blood and land and race and hierarchy—truly defeated? Do those forces acknowledge the finality and the rightness of their defeat?

For those who say that, sure, slavery was bad and all, but we need to keep statues to slavers up so as not to “erase history,” we need only change the example. Would we similarly defend statues of Hitler, Himmler, and Goebbels, looming over Berlin in heroic poses?  Yes, let Germans reflect somberly and often on this aspect of their heritage—but not by hoisting a swastika over City Hall.

For those who say the Civil War wasn’t “really” about slavery, I reply: this is the canonical example of a “Mount Stupid” belief, the sort of thing you can say only if you’ve learned enough to be wrong but not enough to be unwrong.  In 1861, the Confederate ringleaders themselves loudly proclaimed to future generations that, indeed, their desire to preserve slavery was their overriding reason to secede. Here’s CSA Vice-President Alexander Stephens, in his famous Cornerstone Speech:

Our new government is founded upon exactly the opposite ideas; its foundations are laid, its cornerstone rests, upon the great truth that the negro is not equal to the white man; that slavery, subordination to the superior race, is his natural and normal condition. This, our new government, is the first, in the history of the world, based upon this great physical, philosophical, and moral truth.

Here’s Texas’ Declaration of Secession:

We hold as undeniable truths that the governments of the various States, and of the confederacy itself, were established exclusively by the white race, for themselves and their posterity; that the African race had no agency in their establishment; that they were rightfully held and regarded as an inferior and dependent race, and in that condition only could their existence in this country be rendered beneficial or tolerable. That in this free government all white men are and of right ought to be entitled to equal civil and political rights; that the servitude of the African race, as existing in these States, is mutually beneficial to both bond and free, and is abundantly authorized and justified by the experience of mankind, and the revealed will of the Almighty Creator, as recognized by all Christian nations; while the destruction of the existing relations between the two races, as advocated by our sectional enemies, would bring inevitable calamities upon both and desolation upon the fifteen slave-holding states.

It was only when defeat looked inevitable that the slavers started changing their story, claiming that their real grievance was never about slavery per se, but only “states’ rights” (states’ right to do what, exactly?). So again, why should we take the slavers’ rationalizations any more seriously than we take the postwar epiphanies of jailed Nazis that actually, they’d never felt any personal animus toward Jews, that the Final Solution was just the world’s biggest bureaucratic mishap?  Of course there’s a difference: when the Allies occupied Germany, they insisted on de-Nazification.  They didn’t suffer streets to be named after Hitler. And today, incredibly, fascism and white nationalism are greater threats here in the US than they are in Germany.  One reads about the historic irony of some American Jews, who are eligible for German citizenship because of grandparents expelled from there, now seeking to move there because they’re terrified about Trump.

By contrast, after a brief Reconstruction, the United States lost its will to continue de-Confederatizing the South.  The leaders were left free to write book after book whitewashing their cause, even to hold political office again.  And probably not by coincidence, we then got nearly a hundred years of Jim Crow—and still today, a half-century after the civil rights movement, southern governors and legislatures that do everything in their power to disenfranchise black voters.

For those who ask: but wasn’t Robert E. Lee a great general who was admired by millions? Didn’t he fight bravely for a cause he believed in?  Maybe it’s just me, but I’m allergic to granting undue respect to history’s villains just because they managed to amass power and get others to go along with them.  I remember reading once in some magazine that, yes, Genghis Khan might have raped thousands and murdered millions, but since DNA tests suggest that ~1% of humanity is now descended from him, we should also celebrate Khan’s positive contribution to “peopling the world.” Likewise, Hegel and Marx and Freud and Heidegger might have been wrong in nearly everything they said, sometimes with horrific consequences, but their ideas still need to be studied reverently, because of the number of other intellectuals who took them seriously.  As I reject those special pleas, so I reject the analogous ones for Jefferson Davis, Alexander Stephens, and Robert E. Lee, who as far as I can tell, should all (along with the rest of the Confederate leadership) have been sentenced for treason.

This has nothing to do with judging the past by standards of the present. By all means, build statues to Washington and Jefferson even though they held slaves, to Lincoln even though he called blacks inferior even while he freed them, to Churchill even though he fought the independence of India.  But don’t look for moral complexity where there isn’t any.  Don’t celebrate people who were terrible even for their own time, whose public life was devoted entirely to what we now know to be evil.

And if, after the last Confederate general comes down, the public spaces are too empty, fill them with monuments to Alan Turing, Marian Rejewski, Bertrand Russell, Hypatia of Alexandria, Emmy Noether, Lise Meitner, Mark Twain, Srinivasa Ramanujan, Frederick Douglass, Vasili Arkhipov, Stanislav Petrov, Raoul Wallenberg, even the inventors of saltwater taffy or Gatorade or the intermittent windshield wiper.  There are, I think, enough people who added value to the world to fill every city square and street sign.

Tommaso DorigoA Giant Fireball Thrown At Venice

In the evening of May 30 a giant fireball lit up the skies south of Venice, Italy. The object, which was traveling very slowly along a south-north trajectory, was captured by three video stations in the area, plus observed by countless bystanders and recorded in pictures. The video data allowed to precisely measure the trajectory, which made it clear that the rock was headed straight toward the Venice metropolitan area, and that it would have landed there if it had not disintegrated in flight.

read more

Doug NatelsonFollow-up: More 5nm/7nm/10nm transistors

Two years ago I wrote this post about IBM's announcement of "7 nm transistors", and it still gets many pageviews every week.   So, what's been going on since then?

I encourage you to read this article from Semiconductor Engineering - it's a nice, in-depth look at what the major manufacturers are doing to get very small transistors actually to market.  It also explains what those numerical designations of size really mean, and changes in lithography technology.  (This is one part of my book that I really will have to revise at some point.)

Likewise, here is a nice article about IBM's latest announcement of "5 nm" transistors.  The writing at Ars Technica on this topic is usually of high quality.  The magic words that IBM is now using are "gate all around", which means designing the device geometry so that the gate electrode, the one that actually controls the conduction, affects the channel (where the current flows) from all sides.  In old-school planar transistors, the gate only couples to the channel from one direction.

Later I will write a long, hopefully accessible article about transistors, as this year is the 70th anniversary of the Bell Labs invention that literally reshaped global technology.

June 05, 2017

Jordan EllenbergRational points on solvable curves over Q via non-abelian Chabauty (with Daniel Hast)

New paper up!  With my Ph.D. student Daniel Hast (last seen on the blog here.)

We prove that hyperelliptic curves over Q of genus at least 2 have only finitely many rational points.  Actually, we prove this for a more general class of high-genus curves over Q, including all solvable covers of P^1.

But wait, don’t we already know that, by Faltings?  Of course we do.  So the point of the paper is to show that you can get this finiteness in a different way, via the non-abelian Chabauty method pioneered by Kim.  And I think it seems possible in principle to get Faltings for all curves over Q this way; though I don’t know how to do it.

Chabauty’s theorem tells you that if X/K is a curve over a global field, and the rank of J(X)(K) is strictly smaller than g(X), then X(K) is finite.  That theorem statement is superseded by Faltings, but the method is not; and it’s been the subject of active interest recently, in part because Chabauty gives you more control over the number of points.  (See e.g. this much-lauded recent paper of Katz, Rabinoff, and Zureick-Brown.)

But of course the Chabauty condition on the Mordell-Weil rank of J(X) isn’t always satisfied.  So what do you do with curves outside the Chabauty range?  A natural and popular idea is to exploit an etale cover Y -> X.  All the rational points of X are covered by those of a finite list of twists of Y; so if you can show all the relevant twists of Y satisfy Chabauty, you get finiteness of X(K).  And that seems pretty reasonable; there’s no reason two distinct points of X should lie on the same twist Y_e of Y, and once a twist Y_e has only one rational point, the stuff in J(Y) that’s not in J(X) doesn’t have any reason to have large Mordell-Weil rank.  Even if J(X) has pretty big rank, it should get “diluted” by all that empty Prym-like space.

About 15 years ago (amazing how projects that old still feel like “things I was recently thinking about”) I spent a lot of time working on this.  What I wanted to show was something like this:  if X is a curve (let’s say with a given rational basepoint), there’s a natural etale cover X_n obtained by mapping X to its Jacobian and then pulling back the multiplication map [p^n]:J(X) -> J(X).  In fact, the X_n fit together into a tower of curves over X.  Is it the case that

\mbox{rank} J(X_n)(K) < g(X_n)

for n large enough?  And what about the twists of X_n?  The idea was to work at the “top” of the tower, where you have actions of lots of p-adic groups; the action of T_p J(X) by deck transformations, the action of some symplectic quotient of Gal(K) on T_p J(X), making the Selmer group at the top a great big non-abelian Iwasawa module.  I still think that makes sense!  But I ended up not being able to prove anything good about it.  (This preprint was as far as I got.)

That project was about abelian covers (the Jacobians) of abelian covers of X; so it was really about the metabelian quotient of the fundamental group of X.  What Minhyong Kim did was quite different, working with the much smaller nilpotent quotient; and it turns out that here you can indeed show that, for certain X, some version of Chabauty applies with Jac(X) replaced by a “unipotent Albanese,” which is to the quotient of pi_1(X) by some term of the lower central series as Jac(X) is to the Jacobian.  Very very awesome.

What are “certain X”?  Well, Kim’s method applies to all curves whose Jacobians have CM, as proved in this paper by him and Coates.

Most hyperelliptic curves don’t have CM Jacobians.  But now the etale cover trick comes to the rescue, because, by an offbeat result of Bogomolov and Tschinkel I have always admired, every hyperelliptic curve X has an etale cover Y which geometrically dominates the CM genus 2 curve with model y^2 = x^6 – 1!  So the main point of our paper is to generalize the argument of Coates and Kim to apply to curves whose Jacobian has a nontrivial CM part.  (This is familiar from Chabauty; you don’t actually need the Jacobian to have small rank, it’s enough for just one chunk of it to have small rank relative to its dimension.)  Having done this, we get finiteness for Y and all its twists, whence for X.

There are further results by Poonen of Bogomolov-Tschinkel flavor; these allow us to go from hyperelliptic curves to a more general class of curves, including all curves which are solvable covers of P^1.  But of course here’s the natural question:

Question:  Does every algebraic curve over Q admit an etale cover which geometrically dominates a curve Y whose Jacobian has CM?

An affirmative answer would extend the reach of non-abelian Chabauty to all curves over Q, which would be cool!


June 02, 2017

Jordan EllenbergMultiple height zeta functions?

Idle speculation ensues.

Let X be a projective variety over a global field K, which is Fano — that is, its anticanonical bundle is ample.  Then we expect, and in lots of cases know, that X has lots of rational points over K.  We can put these points together into a height zeta function

\zeta_X(s) = \sum_{x \in X(K)} H(x)^{-s}

where H(x) is the height of x with respect to the given projective embedding.  The height zeta function organizes information about the distribution of the rational points of X, and which in favorable circumstances (e.g. if X is a homogeneous space) has the handsome analytic properties we have come to expect from something called a zeta function.  (Nice survey by Chambert-Loir.)

What if X is a variety with two (or more) natural ample line bundles, e.g. a variety that sits inside P^m x P^n?  Then there are two natural height functions H_1 and H_2 on X(K), and we can form a “multiple height zeta function”

\zeta_X(s,t) = \sum_{x \in X(K)} H_1(x)^{-s} H_2(x)^{-t}

There is a whole story of “multiple Dirichlet series” which studies functions like

\sum_{m,n} (\frac{m}{n}) m^{-s} n^{-t}

where (\frac{m}{n}) denotes the Legendre symbol.  These often have interesting analytic properties that you wouldn’t see if you fixed one variable and let the other move; for instance, they sometimes have finite groups of functional equations that commingle the s and the t!

So I just wonder:  are there situations where the multiple height zeta function is an “analytically interesting” multiple Dirichlet series?

Here’s a case to consider:  what if X is the subvariety of P^2 x P^2 cut out by the equation

x_0 y_0 + x_1 y_1 + x_2 y_2 = 0?

This has something to do with Eisenstein series on GL_3 but I am a bit confused about what exactly to say.

June 01, 2017

Doug NatelsonWhat should everyone know about physics?

A couple of weeks ago, Sean Carroll made an offhand remark (the best kind) on twitter about what every physics major should know.  That prompted a more thoughtful look at our expectations for physics majors by Chad Orzel, with which I broadly agree, as does ZapperZ, who points out that most physics majors don't actually go on to be physics PhDs (and that's fine, btw.)  

Entertaining and thought-provoking as this was, it seems like it's worth having a discussion among practicing physicist popularizers about what we'd like everyone to know about physics.  (For the pedants in the audience, by "everyone" I'm not including very young children, and remember, this is aspirational.  It's what we'd like people to know, not what we actually expect people to know.)  I'm still thinking about this, but here are some basic ingredients that make the list, including some framing topics about science overall.
  • Science is a reason- and logic-based way to look at the natural world; part of science is figuring out "models" (ways of describing the natural world and how it works) that have explanatory (retrodictive) and predictive power.   
  • Basic science is about figuring out how the world works - figuring out the "rules of the game".  Engineering is about using that knowledge to achieve some practical goal.  The line is very blurry; lots of scientists are motivated by and think about eventual applications.
  • Different branches of science deal with different levels of complexity and have their own vocabularies.  When trying to answer a scientific question, it's important to use the appropriate level of complexity and the right vocabulary.  You wouldn't try to describe how a bicycle works by starting with the molecular composition of the grease on the chain....
  • Physics in particular uses mathematics as its language and a tool.  You can develop good intuition for how physics works, but to make quantitative predictions, you need math.
  • Ultimately, observation and experiment are the arbiters of whether a scientific model/theory is right and wrong, scientifically.  "Right" means "agrees with observation/experiment whenever checked", "wrong" means "predicts results at odds with reality".  Usually this means the model/theory needs an additional correction, or has only a limited range of applicability.  (Our commonplace understanding of how bicycles work doesn't do so well at speeds of thousands of miles an hour.  That doesn't mean we don't understand how bikes work at low speeds; it means that additional effects have to be considered at very high speeds.)
  • There are many branches of physics - it's not all particle physics and astrophysics, despite the impression you might get from TV or movies.
  • Physics explains light in all its forms (the microwaves that heat your food; the radio waves that carry your wifi and cell phone traffic; the light you see with your eye and which carries your internet data over fibers; the x-rays that can go through your skin; and the really high energy gamma rays that do not, in fact, turn you into an enormous green ragemonster).  
  • Physics includes not just looking at the tiniest building blocks of matter, but also understanding what happens when those building blocks come together in very large numbers - it can explain that diamond is hard and transparent, how/why water freezes and boils, and how little pieces of silicon in your computer can be used to switch electric current.  Physics provides a foundation for chemistry and biology, but in those fields often it makes much more sense to use chemistry and biology vocabulary and models.  
  • Quantum mechanics can be unintuitive and weird, but that doesn't mean it's magic, and it doesn't mean that everything we don't understand (e.g., consciousness) is deeply connected to quantum physics.  Quantum is often most important at small scales, like at the level of individual atoms and molecules.  That's one reason it can seem weird - your everyday world is much larger.
  • Relativity can also be unintuitive and weird - that's because it's most important at speeds near the speed of light, and again those are far from your everyday experience.   
  • We actually understand a heck of a lot of physics, and that's directly responsible for our enormous technological progress in the last hundred and fifty years.
  • Physicists enjoy being creative and speculative, but good and honest ones are careful to point out when they're hand waving or being fanciful. 
I'll add more to this list over time, but that's a start.... 

Chad OrzelNerd Parenting Update

We took a rare long weekend to go to a family party for Memorial Day at my parents’ (Kate and the kids always get the day off, but I usually have to teach; this year, I’m doing a team-taught class, and the other person was willing to cover my Monday spot), thus the lack of a weekly update post. The delay, however, allows for some quality nerd-parenting content, so it’s a win all around…

SteelyKid’s third-grade class has been doing long-ish projects approximately monthly through the year, and the May project was to design and make a board game based on a book. SteelyKid picked the Bone graphic novels, which she and a bunch of her classmates tore through earlier in the year, and she finished the game early this week. She’s very proud of how it came out:

SteelyKid mugging with her board game.

This is not without justification, as she put a lot of thought into it, and it came out really nicely:

SteelyKid’s board game based on the Bone graphic novels.

(Obviously, she had a bit of help from us in finding and printing out images from the books, but the concept and general design are all her…)

We play-tested it last night, and The Pip won the first game, with SteelyKid coming in second. She’s taking it in to school today, and very excited to show it to her friends.

And so the Little Dude isn’t left out, he has also provided some quality nerd-parenting moments. SteelyKid’s work on the game pushed well past bedtime on Tuesday night, and The Pip was getting restless, so I offered to go upstairs and read with him. To my surprise, he accepted enthusiastically, mostly because we’ve started reading the Dragonbreath series by Ursula Vernon.

SteelyKid read these a while back, and I’ve been jokingly calling the kids “Cousin Spencer” whenever they critique my playing of video games. The Pip is old enough now to get the books, so we started the series, and he was very fired up for it. I have him read the dialogue bubbles on the comic-style pages that are scattered through the books, and I read the regular prose sections. We finished the first book last night, and he asked me to go get the second volume from SteelyKid’s room, even though we weren’t going to start it right away, just so he could see it and have it in his room for tonight.

The Pip started kindergarten at a lower reading level than SteelyKid did for a variety of reasons, chiefly that his birthday is three months later in the year than hers, so he was starting younger. He’s made remarkable progress, though, and is very smug about being a couple of reading levels past the goal for the end of the year. This has largely been driven by reading guidebooks about Pokemon and, more recently, Skylanders, which has done wonders for his reading but is not a real thrill for whichever parent has to read lists of character attacks to him at bedtime. Broadening his literary interests to some quality books with actual plot is a huge win.

So, it’s been a good week for nerd parenting. And now, I have page proofs of a couple of forthcoming things to review, so I need to go somewhere with no wi-fi for a while…

BackreactionDoes parametric resonance solve the cosmological constant problem?

An oscillator too.Source: Giphy. Tl;dr: Ask me again in ten years. A lot of people asked for my opinion about a paper by Wang, Zhu, and Unruh that recently got published in Physical Reviews D, one of the top journals in the field. How the huge energy of quantum vacuum gravitates to drive the slow accelerating expansion of the Universe Qingdi Wang, Zhen Zhu, William G. UnruhPhys. Rev. D 95,

Andrew JaffePython Bug Hunting

This is a technical, nerdy post, mostly so I can find the information if I need it later, but possibly of interest to others using a Mac with the Python programming language, and also since I am looking for excuses to write more here. (See also updates below.)

It seems that there is a bug in the latest (mid-May 2017) release of Apple’s macOS Sierra 10.12.5 (ok, there are plenty of bugs, as there in any sufficiently complex piece of software).

It first manifested itself (to me) as an error when I tried to load the jupyter notebook, a web-based graphical front end to Python (and other languages). When the command is run, it opens up a browser window. However, after updating macOS from 10.12.4 to 10.12.5, the browser didn’t open. Instead, I saw an error message:

    0:97: execution error: "http://localhost:8888/tree?token=<removed>" doesn't understand the "open location" message. (-1708)

A little googling found that other people had seen this error, too. I was able to figure out a workaround pretty quickly: this behaviour only happens when I wanted to use the “default” browser, which is set in the “General” tab of the “System Preferences” app on the Mac (I have it set to Apple’s own “Safari” browser, but you can use Firefox or Chrome or something else). Instead, there’s a text file you can edit to explicitly set the browser that you want jupyter to use, located at ~/.jupyter/, by including the line

c.NotebookApp.browser = u'Safari'

(although an unrelated bug in Python means that you can’t currently use “Chrome” in this slot).

But it turns out this isn’t the real problem. I went and looked at the code in jupyter that is run here, and it uses a Python module called webbrowser. Even outside of jupyter, trying to use this module to open the default browser fails, with exactly the same error message (though I’m picking a simpler URL at instead of the jupyter-related one above):

>>> import webbrowser
>>> br = webbrowser.get()
0:33: execution error: "" doesn't understand the "open location" message. (-1708)

So I reported this as an error in the Python bug-reporting system, and hoped that someone with more experience would look at it.

But it nagged at me, so I went and looked at the source code for the webbrowser module. There, it turns out that the programmers use a macOS command called “osascript” (which is a command-line interface to Apple’s macOS automation language “AppleScript”) to launch the browser, with a slightly different syntax for the default browser compared to explicitly picking one. Basically, the command is osascript -e 'open location ""'. And this fails with exactly the same error message. (The similar code osascript -e 'tell application "Safari" to open location ""' which picks a specific browser runs just fine, which is why explicitly setting “Safari” back in the jupyter file works.)

But there is another way to run the exact same AppleScript command. Open the Mac app called “Script Editor”, type open location "" into the window, and press the “run” button. From the experience with “osascript”, I expected it to fail, but it didn’t: it runs just fine.

So the bug is very specific, and very obscure: it depends on exactly how the offending command is run, so appears to be a proper bug, and not some sort of security patch from Apple (and it certainly doesn’t appear in the 10.12.5 release notes). I have filed a bug report with Apple, but these are not publicly accessible, and are purported to be something of a black hole, with little feedback from the still-secretive Apple development team.


  • The bug has been marked as a duplicate, which means I won’t get any more direct information from Apple, but at least they acknowledge that it’s a bug of some sort.
  • It appears to have been corrected in the latest beta of macOS Sierra 10.12.6, and so should be fixed in the next official release.

May 31, 2017

Tim GowersIntransitive dice V: we want a local central limit theorem

It has become clear that what we need in order to finish off one of the problems about intransitive dice is a suitable version of the local central limit theorem. Roughly speaking, we need a version that is two-dimensional — that is, concerning a random walk on \mathbb Z^2 — and completely explicit — that is, giving precise bounds for error terms so that we can be sure that they get small fast enough.

There is a recent paper that does this in the one-dimensional case, though it used an elementary argument, whereas I would prefer to use Fourier analysis. Here I’d like to begin the process of proving a two-dimensional result that is designed with our particular application in mind. If we are successful in doing that, then it would be natural to try to extract from the proof a more general statement, but that is not a priority just yet.

As people often do, I’ll begin with a heuristic argument, and then I’ll discuss how we might try to sharpen it up to the point where it gives us good bounds for the probabilities of individual points of \mathbb Z^2. Much of this post is cut and pasted from comments on the previous post, since it should be more convenient to have it in one place.

Using characteristic functions

The rough idea of the characteristic-functions approach, which I’ll specialize to the 2-dimensional case, is as follows. (Apologies to anyone who knows about this properly for anything idiotic I might accidentally write.) Let (X,Y) be a random variable on \mathbb Z^2 and write f(x,y) for \mathbb P[(X,Y)=(x,y)]. If we take n independent copies of (X,Y) and add them together, then the probability of being at (x,y) is


where that denotes the n-fold convolution.

Now let’s define the Fourier transform of f, which probabilists call the characteristic function, in the usual way by

\hat f(\alpha,\beta)=\sum_{x,y}f(x,y)e^{2\pi i(\alpha x+\beta y)}.

Here \alpha and \beta belong to \mathbb R/\mathbb Z, but I’ll sometimes think of them as belonging to [0,1) too.

We have the convolution law that \widehat{f*g}=\hat f\hat g and the inversion formula

f(x,y)=\int\int \hat f(\alpha,\beta)e^{-2\pi i(\alpha x+\beta y)}\,d\alpha\,d\beta.

Putting these together, we find that if random variables (X_i,Y_i) are n independent copies of (X,Y), then the probability that their sum is (x,y) is

\int\int \hat f(\alpha,\beta)^n e^{-2\pi i(\alpha x+\beta y)}\,d\alpha\,d\beta.

The very rough reason that we should now expect a Gaussian formula is that we consider a Taylor expansion of f. We can assume for our application that X and Y have mean zero. From that one can argue that the coefficients of the linear terms in the Taylor expansion are zero. (I’ll give more details in a subsequent comment.) The constant term is 1, and the quadratic terms give us the covariance matrix of X and Y. If we assume that we can approximate f(\alpha,\beta) by an expression of the form 1-q(\alpha,\beta) for some suitable quadratic form in \alpha and \beta, then the nth power should be close to \exp(-nq(\alpha,\beta)), and then, since Fourier transforms (and inverse Fourier transforms) take Gaussians to Gaussians, when we invert this one, we should get a Gaussian-type formula for f(x,y). So far I’m glossing over the point that Gaussians are defined on \mathbb R^2, whereas \alpha and \beta live in \mathbb T and x and y live in \mathbb Z^2, but if most of \hat f^n is supported in a small region around 0, then this turns out not to be too much of a problem.

The Taylor-expansion part

If we take the formula

\hat f(\alpha,\beta)=\sum_{x,y}f(x,y)e^{2\pi i(\alpha x+\beta y)}

and partially differentiate r times with respect to \alpha and s times with respect to \beta we obtain the expression

(2\pi i)^{r+s}\sum_{x,y}x^ry^sf(x,y)e^{2\pi i(\alpha x+\beta y)}.

Setting \alpha=\beta=0 turns this into (2\pi i)^{r+s}\mathbb E(X^rY^s). Also, for every \alpha and \beta the absolute value of the partial derivative is at most (2\pi)^{r+s}\mathbb E(X^rY^s). This allows us to get a very good handle on the Taylor expansion of \hat f when \alpha and \beta are close to the origin.

Recall that the two-dimensional Taylor expansion of \hat f about (0,0) is given by the formula

\hat f(\alpha,\beta)=\hat f(0,0)+D_1\hat f(0,0)\alpha+D_2\hat f(0,0)\beta+\frac 12D_{11}\hat f(0,0)\alpha^2
+D_{12}\hat f(0,0)\alpha\beta+\frac 12 D_{22}\hat f(0,0)\beta^2+\mbox{error term}

where D_1 is the partial derivative operator with respect to the first coordinate, D_{12} the mixed partial derivative, and so on.

In our case, \hat f(0,0)=\sum_{x,y}f(x,y)=1, D_1\hat f(0,0)=2\pi i\mathbb EX=0, and D_2\hat f(0,0)=2\pi i\mathbb EY=0.

As in the one-dimensional case, the error term has an integral representation, namely

\sum_{j=0}^3\binom 3j  \alpha^j\beta^{3-j}\int_0^1 (1-t)^2 (D_1^jD_2^{3-j}\hat f)(t\alpha,t\beta)\,dt,

which has absolute value at most 8\pi^3\sum_{j=1}^3\binom 3j|\alpha^j\beta^{3-j}\mathbb E X^jY^{3-j}|, which in turn is at most

8\pi^3\sum_{j=1}^3\binom 3j|\alpha|^j|\beta|^{3-j}\|X\|_\infty^j\|Y\|_\infty^{3-j}=(|\alpha|\|X\|_\infty+|\beta|\|Y\|_\infty)^3.

When (X,Y) is the random variable (h_A(j),j-(n+1)/2) (where A is a fixed die and j is chosen randomly from [n]), we have that \|Y\|_\infty=(n-1)/2<n.

With very slightly more effort we can get bounds for the moments of h_A as well. For any particular j and a purely random sequence A=(a_1,\dots,a_n)\in [n]^n, the probability that |h_A(j)|\geq t\sqrt n is bounded above by e^{-ct^2} for an absolute constant c>0. (Something like 1/8 will do.) So the probability that there exists such a j conditional on \sum_ia_i=n(n+1)/2 (which happens with probability about 1/n) is at most Cn^2e^{-ct^2}, and in particular is small when t=100\sqrt{\log n}. I think that with a bit more effort we could probably prove that \mathbb E_j |h_A(j)|^3 is at most Cn^{3/2}, which would allow us to improve the bound for the error term, but I think we can afford the logarithmic factor here, so I won’t worry about this. So we get an error of C(|\alpha|\sqrt{n\log n}+\beta n)^3.

For this error to count as small, we want it to be small compared with the second moments. For the time being I’m just going to assume that the rough size of the second-moment contribution is around c(|\alpha|\sqrt n+|\beta|n)^2. So for our error to be small, we want \alpha to be o(1/\sqrt{n}(\log n)^{3/2}) and \beta to be o(1/n).

That is giving us a rough idea of the domain in which we can say confidently that the terms up to the quadratic ones give a good approximation to \hat f, and hence that \hat f^n is well approximated by a Gaussian.

The remaining part

Outside the domain, we have to do something different, and that something is fairly simple: we shall show that \hat f^n is very small. This is equivalent to showing that |\hat f| is bounded away from 1 by significantly more than 1/n. This we do by looking more directly at the formula for the Fourier transform:

\hat f(\alpha,\beta)=\sum_{x,y}f(x,y)e^{2\pi i(\alpha x+\beta y)}.

We would like this to have absolute value bounded away from 1 by significantly more than 1/n except when \alpha is quite a bit smaller than n^{-1/2}(\log n)^{-3/2} and \beta is quite a bit smaller than n^{-1}.

Now in our case f is uniformly distributed on the n points (h_A(j),j-(n+1)/2). So we can write \hat f as

\hat f(\alpha,\beta)=n^{-1}\sum_je^{2\pi i(\alpha h_A(j)+\beta(j-(n+1)/2))}.

Here’s a possible way that we might try to bound that sum. Let m=n/2 and let us split up the sum into pairs of terms with j and j+m, for j\leq m. So each pair of terms will take the form

e^{2\pi i(\alpha h_A(j)+\beta(j-(n+1)/2))}+e^{2\pi i(\alpha h_A(j+m)+\beta(j+m-(n+1)/2))}.

The ratio of these two terms is

e^{2\pi i(\alpha(h_A(j)-h_A(j+m))-\beta m)}.

And if the ratio is e^{i\theta}, then the modulus of the sum of the two terms is at most 2(1-c\theta^2).

Now let us suppose that as j varies, the differences h_A(j)-h_A(j+m) are mostly reasonably well distributed in an interval between -\sqrt n and \sqrt n, as seems very likely to be the case. Then the ratios above vary in a range from about -\alpha\sqrt n to \alpha\sqrt n. But that should imply that the entire sum, when divided by n, has modulus at most 1-c\alpha^2n. (This analysis obviously isn’t correct when \alpha is bigger than n^{-1/2}, since the modulus can’t be negative, but once we’re in that regime, then it really is easy to establish the bounds we want.)

If \alpha is, say n^{-2/3}, then this gives us 1-cn^{-1/3}, and raising that to the power n gives us e^{-cn^{2/3}}, which is tiny.

As a quick sanity check, note that for (1-c\alpha^2n)^n not to be tiny we need \alpha to be not much more than n^{-1}. This reflects the fact that a random walk of n steps of typical size about \sqrt n will tend to be at a distance comparable to n from the origin, and when you take the Fourier transform, you take the reciprocals of the distance scales.

If \alpha is quite a bit smaller than n^{-1/2} and \beta is not too much smaller than n^{-1}, then the numbers \alpha h_A(j) are all small but the numbers \beta(j-(n+1)/2) vary quite a bit, so a similar argument can be used to show that in this case too the Fourier transform is not close enough to 1 for its nth power to be large. I won’t give details here.

What is left to do?

If the calculations above are not too wide of the mark, then the main thing that needs to be done is to show that for a typical die A the numbers h_A(j) are reasonably uniform in a range of width around \sqrt n, and more importantly that the numbers h_A(j)-h_A(j+m) are not too constant: basically I’d like them to be pretty uniform too.

It’s possible that we might want to try a slightly different approach, which is to take the uniform distribution on the set of points (h_A(j),j-(n+1)/2), convolve it once with itself, and argue that the resulting probability distribution is reasonably uniform in a rectangle of width around n and height around \sqrt n. By that I mean that a significant proportion of the points are hit around \sqrt n times each (because there are n^2 sums and they lie in a rectangle of area n^{3/2}). But one way or another, I feel pretty confident that we will be able to bound this Fourier transform and get the local central limit theorem we need.

May 30, 2017

Tim GowersIntransitive dice IV: first problem more or less solved?

I hope, but am not yet sure, that this post is a counterexample to Betteridge’s law of headlines. To back up that hope, let me sketch an argument that has arisen from the discussion so far, which appears to get us close to showing that if A,B and C are three n-sided dice chosen independently at random from the sequences model (I will recap some of these definitions in a moment), then the probability that A beats C given that A beats B and B beats C is 1/2+o(1). Loosely speaking, the information that A beats B and B beats C tells you nothing about how likely A is to beat C. Having given the argument, I will try to isolate a statement that looks plausible, not impossible to prove, and sufficient to finish off the argument.

Basic definitions

An nsided die in the sequence model is a sequence (a_1,\dots,a_n) of elements of [n]=\{1,2,\dots,n\} such that \sum_ia_i=n(n+1)/2, or equivalently such that the average value of a_i is (n+1)/2, which is of course the average value of a random element of [n]. A random n-sided die in this model is simply an n-sided die chosen uniformly at random from the set of all such dice.

Given n-sided dice A=(a_1,\dots,a_n) and B=(b_1,\dots,b_n), we say that A beats B if


If the two sets above have equal size, then we say that A ties with B.

A first reduction

When looking at this problem, it is natural to think about the following directed graph: the vertex set is the set of all n-sided dice and we put an arrow from A to B if A beats B.

We believe (and even believe we can prove) that ties are rare. Assuming that to be the case, then the conjecture above is equivalent to the statement that if A,B,C are three vertices chosen independently at random in this graph, then the probability that ABC is a directed cycle is what you expect for a random tournament, namely 1/8.

One can also make a more general conjecture, namely that the entire (almost) tournament is quasirandom in a sense defined by Chung and Graham, which turns out to be equivalent to the statement that for almost all pairs (A,B) of dice, the four possible pairs of truth values for the pair of statements

(A beats C, B beats C)

each occur with probability approximately 1/4. If this is true, then given k random dice A_1,\dots,A_k, all the 2^{\binom k2} possibilities for which A_i beat which A_j have probability approximately 2^{-\binom k2}. This would imply, for example, that if A_1,\dots,A_k are independent random n-sided dice, then the probability that A_1 beats A_k given that A_i beats A_j for all other pairs (i,j) with i<j is still 1/2+o(1).

Several of us have done computer experiments to test these conjectures, and it looks as though the first one is true and the second one false. A further reason to be suspicious of the stronger conjecture is that a natural approach to prove it appears to be morally equivalent to a relationship between the correlations of certain random variables that doesn’t seem to have any heuristic justification or to fit with experimental evidence. So although we don’t have a disproof of the stronger conjecture (I think it would be very interesting to find one), it doesn’t seem like a good idea to spend a lot of effort trying to prove it, unless we can somehow explain away the evidence that appears to be stacking up against it.

The first conjecture turns out to be equivalent to a statement that doesn’t mention transitivity. The very quick proof I’ll give here was supplied by Luke Pebody. Suppose we have a tournament (that is, a complete graph with each edge directed in one of the two possible directions) and write d_+(x) for the out-degree of a vertex x (that is, the number of y such that there is an arrow from x to y) and d_-(x) for the in-degree. Then let us count the number of ordered triples (x,y,z) such that x\to y\to z. Any directed triangle xyz in the tournament will give rise to three such triples, namely (x,y,z), (y,z,x) and (z,x,y). And any other triangle will give rise to just one: for example, if x\to y, x\to z and y\to z we get just the triple (x,y,z). So the number of ordered triples (x,y,z) such that x\to y and y\to z is \binom n3 plus twice the number of directed triangles. Note that \binom n3 is approximately n^3/6.

But the number of these ordered triples is also \sum_yd_+(y)d_-(y). If almost all in-degrees and almost all out-degrees are roughly n/2, then this is approximately n^3/4, which means that the number of directed triangles is approximately (n^3/4-n^3/6)/2\approx\binom n3/4. That is, in this case, the probability that three dice form an intransitive triple is approximately 1/4, as we are hoping from the conjecture. If on the other hand several in-degrees fail to be roughly n/2, then \sum_yd_+(y)d_-(y) is substantially lower than n^3/4 and we get a noticeably smaller proportion of intransitive triples.

Thus, the weaker conjecture is equivalent to the statement that almost every die beats approximately half the other dice.

Why should a “typical” die beat approximately half the other dice?

The answer to this is fairly simple, heuristically at least. Let A be an arbitrary die. For j=1,2,\dots,n define g_A(j) to be the number of i with a_i\leq j and define h_A(j) to be g_A(j)-j. Then

\sum_jg_A(j)=\sum_j\sum_i\mathbf 1_{[a_i\leq j]}=\sum_i(n-a_i+1)


from which it follows that \sum_jh_A(j)=0.

We also have that if B is another die, then

\sum_jh_A(b_j)=\sum_j(\sum_i\mathbf 1_{a_i\leq b_j}-b_j)=\sum_{i,j}\mathbf 1_{[a_i\leq b_j]}-n(n+1)/2.

If we make the simplifying assumption that a_i=b_j sufficiently infrequently to make no real difference to what is going on (which is not problematic, as a slightly more complicated but still fairly simple function can be used instead of h_A to avoid this problem), then we find that to a reasonable approximation B beats A if and only if \sum_jh_A(b_j) is positive.

So what we would like to prove is that if b_1,\dots,b_n are chosen independently at random from [n], then

\mathbb P\Bigl[\sum_jh_A(b_j)>0\Bigm|\sum_jb_j=n(n+1)/2\Bigr]\approx 1/2.

We are therefore led to consider the random variable


where now B=(b_1,\dots,b_n) is chosen uniformly at random from [n]^n without any condition on the sum. To write this in a more transparent way, let (U,V) be the random variable (h_A(x), x-(n+1)/2), where x is chosen uniformly at random from [n]. Then (X,Y) is a sum of n independent copies of (U,V). What we are interested in is the distribution we obtain when we condition the random variable (X,Y) on Y=0.

This should mean that we are in an excellent position, since under appropriate conditions, a lot is known about sums of independent random variables, and it looks very much as though those conditions are satisfied by (U,V), at least when A is “typical”. Indeed, what we would expect, by the central limit theorem, is that (X,Y) will approximate a bivariate normal distribution with mean 0 (since both U and V have mean zero). But a bivariate normal distribution is centrally symmetric, so we expect the distribution of [X|Y=0] to be approximately centrally symmetric, which would imply what we wanted above, since that is equivalent to the statement that \mathbb P[X>0|Y=0]\approx 1/2.

Why are we not immediately done?

How can we make the above argument rigorous? The central limit theorem on its own is not enough, for two reasons. The first is that it does not give us information about the speed of convergence to a normal distribution, whereas we need a sum of n copies of (U,V) to be close to normal. The second is that the notion of “close to normal” is not precise enough for our purposes: it will allow us to approximate the probability of an event such as [X\geq x\wedge Y\geq y] but not of a “probability zero” event such as [X>0\wedge Y=0].

The first of these difficulties is not too worrying, since plenty of work has been done on the speed of convergence in the central limit theorem. In particular, there is a famous theorem of Berry and Esseen that is often used when this kind of information is needed.

However, the Berry-Esseen theorem still suffers from the second drawback. To get round that one needs to turn to more precise results still, known as local central limit theorems, often abbreviated to LCLTs. With a local central limit theorem, one can even talk about the probability that (X,Y) takes a specific value after a specific number of steps. Roughly speaking, it says (in its 2-dimensional version) that if (U,V) is a random variable of mean zero taking values in \mathbb Z^2 and if (U,V) satisfies suitable moment conditions and is not supported in a proper sublattice of \mathbb Z^2, then writing (X,Y) for a sum of n copies of (U,V), we have that the probability that (X,Y) takes a particular value differs from the “expected” probability (given by a suitable Gaussian formula) by O(n^{-2}). (I’m not 100% sure I’ve got that right: the theorem in question is Theorem 2.1.1 from this book.)

That looks very close to what we want, but it still falls short. The problem is that the implied constant depends on the random variable (U,V). A simple proof of this is that if (U,V) is not supported in a sublattice but very nearly is — for example, if the probability that it takes a value outside the sublattice is 2^{-2^n} — then one will have to add together an extremely large number of copies of (U,V) before the sum ceases to be concentrated in the sublattice.

So the situation we appear to be in is the following. We have more precise information about the random variable (U,V) than is assumed in the LCLT in the reference above, and we want to use that to obtain an explicit constant in the theorem.

It could be that out there in the literature is exactly the result we need, which would be nice, but it also seems possible that we will have to prove an appropriate version of the LCLT for ourselves. I’d prefer the first, but the second wouldn’t be too disappointing, as the problem is quite appealing and even has something of an additive-combinatorial flavour (since it is about describing an iterated convolution of a subset of \mathbb Z^2 under appropriate assumptions).

What properties can we establish about the random variable (U,V)?

I said above, with no justification, that we have more precise information about the random variable (U,V). Let me now try to give the justification.

First of all, we know everything we could possibly want to know about V: it is the uniform distribution on [n] - (n+1)/2. (In particular, if n is odd, then it is the uniform distribution on the set of integers in [-(n-1)/2, (n-1)/2].)

How about the distribution of U? That question is equivalent to asking about the values taken by h_A(j), and their multiplicities. There is quite a lot one can say about those. For example, I claim that with high probability (if A is a random n-sided die) h_A(j) is never bigger than C\sqrt{n\log n}. That is because if we choose a fully random sequence (a_1,\dots,a_n)\in[n]^n, then the expected number of i such that a_i\leq j is j, and the probability that this number differs from j by more than t\sqrt n is \exp(-ct^2), by standard probabilistic estimates, so if we set t=C\sqrt{\log n}, then this is at most n^{-cC}, which we can make a lot smaller than n^{-1} by choosing C to be, say, 10c^{-1}. (I think c can be taken to be 1/8 if you want me to be more explicit.) Since the probability that \sum_ia_i=n(n+1)/2 is proportional to 1/n, it follows that this conclusion continues to hold even after we condition on that event.

Another simple observation is that the values taken by U are not contained in a sublattice (assuming, that is, that h_A is ever non-zero). That is simply because h_A(j+1)\geq h_A(j)-1 and h_A averages zero.

A third simple observation is that with probability 1-o(1) h_A will take a value of at least \sqrt n/\log n at least somewhere. I’ll sketch a proof of this. Let k be around \log n and let m_1<\dots<m_k be evenly spaced in [n], staying away from the end points 1 and n. Let A be a purely random sequence in [n]^n. Then the standard deviation of h_A(m_1) is around \sqrt{n/\log n}, so the probability that it is less than \sqrt n/\log n is around (\log n)^{-1/2}. The same is true of the conditional probability that h_A(m_i) is less than \sqrt n/\log n conditioned on the value of h_A(m_{i-1}) (the worst case being when this value is 0). So the probability that this happens for every i is at most (\log n)^{-\log n/2}. This is much smaller than n^{-1}, so the conclusion remains valid when we condition on the sum of the a_i being n(n+1)/2. So the claim follows. Note that because of the previous simple observation, it follows that h_A must be at least c\sqrt{n/\log n} in magnitude at least c\sqrt{n/\log n} times, so up to log factors we get that \sum_jh_A(j)^2 is at least n^{3/2}. With a bit more effort, it should be possible to push this up to something more like n^2, since one would expect that h_A(j) would have rough order of magnitude \sqrt n for a positive fraction of the j. Maybe this would be a good subproblem to think about, and ideally not too difficult.

How about the joint distribution (U,V)? It seems highly likely that for typical A this will not be concentrated in a lattice, and that elementary arguments such as the above can be used to prove this. But let me indicate the kind of situation that we would have to prove is not typical. Suppose that n=15 and A=(1,1,1,1,1,8,8,8,8,8,15,15,15,15,15). Then as j runs from 1 to 15 the values taken by g_A are (5,5,5,5,5,5,5,10,10,10,10,10,10,10,15) and the values taken by h_A are (4,3,2,1,0,-1,-2,2,1,0,-1,-2,-3,-4,0). For this example, all the points (h_A(x),x) live in the lattice of points (u,v) such that u+v is a multiple of 5.

This wouldn’t necessarily be a disaster for us actually, since the LCLT can be restricted to a sublattice and if after conditioning on Y=0 we happen to have that X is always a multiple of 5, that isn’t a problem if we still have the central symmetry. But it would probably be nicer to prove that it is an atypical occurrence, so that we don’t have to worry about (U,V) living inside a sublattice (or even being concentrated in one).

My guess is that if we were to pursue these kinds of thoughts, we would end up being able to prove a statement that would say something like that (U,V) takes a pretty representative sample of values with V being between -(n-1)/2 and (n-1)/2 and U being in a range of width around \sqrt n. I would expect, for example, that if we add three or four independent copies of (U,V), then we will have a distribution that is similar in character to the uniform distribution on a rectangle of width of order of magnitude n and height of order of magnitude \sqrt n. And if that’s true, then adding n of them should give us something very close to normal (in an appropriate discrete sense of the word “normal”).

What next?

There are two obvious tasks here. One is to try to prove as much as we can about the random variable (U,V). The other is to try to prove a suitable LCLT that is strong enough to give us that the probability that X>0 given that Y=0 is approximately 1/2, under suitable assumptions about (U,V). And then we have to hope that what we achieve for the first is sufficient for the second.

It’s possible that the second task can be achieved by simply going through one of the existing proofs of the LCLT and being more careful about the details. But if that’s the case, then we should spend some time trying to find out whether anyone has done it already, since there wouldn’t be much point in duplicating that work. I hope I’ve set out what we want clearly enough for any probabilist who might stumble upon this blog post to be able to point us in the right direction if indeed the result we want is out there somewhere.

May 26, 2017

Geraint F. LewisFalling into a black hole: Just what do you see?

Everyone loves black holes. Immense gravity, a one-way space-time membrane, the possibility of links to other universes. All lovely stuff.

A little trawl of the internets reveals an awful lot of web pages discussing black holes, and discussions about spaghettification, firewalls, lost information, and many other things. Actually, a lot of the stuff out there on the web is nonsense, hand-waving, partly informed guesswork. And one of the questions that gets asked is "What would you see looking out into the universe?"

Some (incorrectly) say that you would never cross the event horizon, a significant mis-understanding of the coordinates of relativity. Other (incorrectly) conclude from this that you actually see the entire future history of the universe play out in front of your eyes.

What we have to remember, of course, is that relativity is a mathematical theory, and instead of hand waving, we can use mathematics to work out what we will see. And that's what I did.

I won't go through the details here, but it is based upon correctly calculating redshifts in relativity and conservation laws embodied in Killing vectors. But the result is an equation, an equation that looks like this

Here,  rs is the radius from which you start to fall, re is the radius at which the photon was emitted, and ro is the radius at which you receive the photon. On the left-hand-side is the ratio of the frequencies of the photon at the time of observation compared to emission. If this is bigger than one, then the photon is observed to have more energy than emitted, and the photon is blueshifted. If it is less than one, then it has less energy, and the photon is redshifted. Oh, and m is the mass of the black hole.

One can throw this lovely equation into python and plot it up. What do you get.

So, falling from a radius of 2.1m, we get
And falling from 3m
And from 5m
And 10m
and finally at 50m

In each of these, each line is a photon starting from different differences.

The key conclusion is that within the event horizon (r=2m) photons are generally seen to be redshifted, irrespective of where you start falling from. In fact in the last moment before you meet your ultimate end in the central singularity, the energy of the observed photon goes to zero and the outside universe is infinitely reshifted and vanishes from view.

How cool is that?

May 25, 2017

Chad OrzelLong-Overdue Photo-a-Day Wrap-Up

During my sabbatical last year, I decided to try to do a photo-a-day project, taking and sharing at least one good picture a day of something or another. The strict photo-a-day format fell victim to my general busy-ness and disorganization, but I did eventually complete the whole thing. In the final post of the series, I said I might have some wrap-up thoughts in a few days; that was several months ago.

Every time I need to clear some disk space, though, I’m reminded that I left that hanging, so I finally did something in the direction of a summative post, by uploading all the edited pictures from the project to a Google Photos album. There are way more than the promised 366 photos in here, because there were sometimes multiple images on a single day, and some others are photos I took and never used for anything.

Screen shot of the Google Photos album of my photo-a-day pictures.

(That photo of Emmy in the first row still makes me a little sad…)

So, looking back on this, what are my thoughts?

I think it was definitely worthwhile, especially in the earlier part when I had more time to play around and think about what pictures to take. I learned a lot about the camera and what sort of shots work well, so I take somewhat fewer total pictures to get a handful of good ones these days. (Obviously, for things like SteelyKid’s softball games, I’m still doing a lot of burst-mode shooting in hopes of getting a really good moment. For landscape-ish things, though, I feel like I have a better sense of what will work, and don’t waste as many shots trying to get stuff that never turns out right.

It’s also interesting to look back at that whole collection of stuff, and remember what a busy year that was. We had a bunch of out-of-town trips, including a Disney cruise with the kids and a trip to Rome for a friend’s wedding. (There are many more Rome pictures on Google Photos, in other albums– I only pulled a few for the 366 project…)

Excluding the cute-kid photos (of which there are many) and the science-of-photography experiments, there’s a reasonably consistent aesthetic to the kinds of pictures I liked enough to include in this. I’ve gotten used to shooting mostly with the 22mm “pancake” lens, which has a large aperture and a large-ish field of view. It’s a fixed length, which means no zoom, but the speed factor mostly makes up for it. I’ve been switching back and forth between that and the telephoto lens lately, because SteelyKid’s playing softball again, and screwing around with the camera keeps me from getting too bored at the games.

There were a couple of things I had hoped to do, but never really found the time, mostly involving processing. I was going to spend a day or so taking photos with an eye toward making them black and white; there are a couple of B&W shots in the album, mostly where the lighting conditions made it difficult to get a decent color image. I thought it might be fun to try to do that deliberately; still do, really, but it’s hard to find time.

I also wanted to get some time to play around more with GIMP (my image-processing program of choice, for quite a few years now) and try more processing than just cropping and color-correcting. In particular, I’ve never done much with selectively processing parts of images, or trying to edit stuff out of photos. It’d be nice to know how to do those things, because people who know what they’re doing get some cool effects that way, but again, it’s hard to find the time.

Anyway, Looking at that whole collection, I’m pleased to see that the number of real duds is pretty small. And it was a lot of fun to spend time walking around specifically looking at things as potential photographs; I need to do more of that.

And that is pretty much all I have to say about that…

n-Category Café A Type Theory for Synthetic ∞-Categories

One of the observations that launched homotopy type theory is that the rule of identity-elimination in Martin-Löf’s identity types automatically generates the structure of an \infty-groupoid. In this way, homotopy type theory can be viewed as a “synthetic theory of \infty-groupoids.”

It is natural to ask whether there is a similar directed type theory that describes a “synthetic theory of (,1)(\infty,1)-categories” (or even higher categories). Interpreting types directly as (higher) categories runs into various problems, such as the fact that not all maps between categories are exponentiable (so that not all \prod-types exist), and that there are numerous different kinds of “fibrations” given the various possible functorialities and dimensions of categories appearing as fibers. The 2-dimensional directed type theory of Licata and Harper has semantics in 1-categories, with a syntax that distinguishes between co- and contra-variant dependencies; but since the 1-categorical structure is “put in by hand”, it’s not especially synthetic and doesn’t generalize well to higher categories.

An alternative approach was independently suggested by Mike and by Joyal, motivated by the model of homotopy type theory in the category of Reedy fibrant simplicial spaces, which contains as a full subcategory the \infty-cosmos of complete Segal spaces, which we call Rezk spaces. It is not possible to model ordinary homotopy type theory directly in the Rezk model structure, which is not right proper, but we can model it in the Reedy model structure and then identify internally some “types with composition,” which correspond to Segal spaces, and “types with composition and univalence,” which correspond to the Rezk spaces.

Almost five years later, we are finally developing this approach in more detail. In a new paper now available on the arXiv, Mike and I give definitions of Segal and Rezk types motivated by these semantics, and demonstrate that these simple definitions suffice to develop the synthetic theory of (,1)(\infty,1)-categories. So far this includes functors, natural transformations, co- and contravariant type families with discrete fibers (\infty-groupoids), the Yoneda lemma (including a “dependent” Yoneda lemma that looks like “directed identity-elimination”), and the theory of coherent adjunctions.

Cofibrations and extension types

One of the reasons this took so long to happen is that it required a technical innovation to become feasible. To develop the synthetic theory of Segal and Rezk types, we need to detect the semantic structure of the simplicial spaces model internally, and it seems that the best way to do this is to axiomatize the presence of a strict interval 22 (a totally ordered set with distinct top and bottom elements). This is the geometric theory of which simplicial sets are the classifying topos (and of which simplicial spaces are the classifying (,1)(\infty,1)-topos). We can then define an arrow in a type AA to be a function 2A2\to A.

However, often we want to talk about arrows with specified source and target. We can of course define the type hom A(x,y)\hom_A(x,y) of such arrows to be f:2A(f(0)=x)×(f(1)=y)\sum_{f:2\to A} (f(0)=x)\times (f(1)=y), but since we are in homotopy type theory, the equalities f0=xf0=x and f1=yf1=y are data, i.e. homotopical paths, that have to be carried around everywhere. When we start talking about 2-simplices and 3-simplices with specified boundaries as well, the complexity becomes unmanageable.

The innovation that solves this problem is to introduce a notion of cofibration in type theory, with a corresponding type of extensions. If i:ABi:A\to B is a cofibration and X:B𝒰X:B\to \mathcal{U} is a type family dependent on BB, while f: a:AX(i(a))f:\prod_{a:A} X(i(a)) is a section of XX over ii, then we introduce an extension type b:BX(b) f i\langle \prod_{b:B} X(b) \mid^i_f\rangle consisting of “those dependent functions g: b:BX(b)g:\prod_{b:B} X(b) such that g(i(a))f(a)g(i(a)) \equiv f(a) — note the strict judgmental equality! — for any a:Aa:A”. This is modeled semantically by a “Leibniz” or “pullback-corner” map. In particular, we can define hom A(x,y)= t:2A [x,y] 0,1\hom_A(x,y) = \langle \prod_{t:2} A \mid^{0,1}_{[x,y]} \rangle, the type of functions f:2Af:2\to A such that f(0)xf(0)\equiv x and f(1)yf(1) \equiv y strictly, and so on for higher simplices.

General extension types along cofibrations were first considered by Mike and Peter Lumsdaine for a different purpose. In addition to the pullback-corner semantics, they are inspired by the path-types of cubical type theory, which replace the inductively specified identity types of ordinary homotopy type theory with a similar sort of restricted function-type out of the cubical interval. Our paper introduces a general notion of “type theory with shapes” and extension types that includes the basic setup of cubical type theory as well as our simplicial type theory, along with potential generalizations to Joyal’s “disks” for a synthetic theory of (,n)(\infty,n)-categories.

Simplices in the theory of a strict interval

In simplicial type theory, the cofibrations are the “inclusions of shapes” generated by the coherent theory of a strict interval, which is axiomatized by the interval 22, top and bottom elements 0,1:20,1 : 2, and an inequality relation \le satisfying the strict interval axioms.

Simplices can then be defined as

Δ n:={t 1,,t nt nt n1t 2t 1} \Delta^n := \{ \langle t_1,\ldots, t_n\rangle \mid t_n \leq t_{n-1} \cdots t_2 \leq t_1 \}

Note that the 1-simplex Δ 1\Delta^1 agrees with the interval 22.

Boundaries, e.g. of the 2-simplex, can be defined similarly Δ 2:={t 1,t 2:2×2(0t 2t 1)(t 2t 1)(t 2t 11)} \partial\Delta^2 :=\{&#10216;t_1,t_2&#10217;: 2 \times 2 \mid (0 \equiv t_2 \leq t_1) \vee (t_2 \equiv t_1) \vee (t_2 \leq t_1 \equiv 1)\} making the inclusion of the boundary of a 2-simplex into a cofibration.

Segal types

For any type AA with terms x,y:Ax,y : A define

hom A(x,y):=2A [x,y] Δ 1 hom_A(x,y) := \langle 2 \to A \mid^{\partial\Delta^1}_{ [x,y]} \rangle

That is, a term f:hom A(x,y)f : hom_A(x,y), which we call an arrow from xx to yy in AA, is a function f:2Af: 2 \to A so that f(0)xf(0) \equiv x and f(1)yf(1) \equiv y. For f:hom A(x,y)f : hom_A(x,y), g:hom A(y,z)g : hom_A(y,z), and h:hom A(x,z)h : hom_A(x,z), a similar extension type

hom A(x,y,z,f,g,h):=Δ 2A [x,y,z,f,g,h] Δ 2 hom_A(x,y,z,f,g,h) := \langle \Delta^2 \to A \mid^{\partial\Delta^2}_{[x,y,z,f,g,h]}\rangle

has terms that we interpret as witnesses that hh is the composite of ff and gg. We define a Segal type to be a type in which any composable pair of arrows admits a unique (composite, witness) pair. In homotopy type theory, this may be expressed by saying that AA is Segal if and only if for all f:hom A(x,y)f : hom_A(x,y) and g:hom A(y,z)g : hom_A(y,z) the type

h:hom A(x,z)hom A(x,y,z,f,g,h) \sum_{h : hom_A(x,z)} hom_A(x,y,z,f,g,h)

is contractible. A contractible type is in particular inhabited, and an inhabitant in this case defines a term gf:hom A(x,z)g \circ f : hom_A(x,z) that we refer to as the composite of ff and gg, together with a 2-simplex witness comp(g,f):hom A(x,y,z,f,g,gf)comp(g,f) : hom_A(x,y,z,f,g,g \circ f).

Somewhat surprisingly, this single contractibility condition characterizing Segal types in fact ensures coherent categorical structure at all dimensions. The reason is that if AA is Segal, then the type XAX \to A is also Segal for any type or shape XX. For instance, applying this result in the case X=2X=2 allows us to prove that the composition operation in any Segal type is associative. In an appendix we prove a conjecture of Joyal that in the simplical spaces model this condition really does characterize exactly the Segal spaces, as usually defined.

Discrete types

An example of a Segal type is a discrete type, which is one for which the map

idtoarr: x,y:A(x= Ay)hom A(x,y) idtoarr : \prod_{x,y: A} (x=_A y) \to hom_A(x,y)

defined by identity elimination by sending the reflexivity term to the identity arrow, is an equivalence. In a discrete type, the \infty-groupoid structure encoded by the identity types and equivalent to the (,1)(\infty,1)-category structure encoded by the hom types. More precisely, a type AA is discrete if and only if it is Segal, as well as Rezk-complete (in the sense to be defined later on), and moreover “every arrow is an isomorphism”.

The dependent Yoneda lemma

If AA and BB are Segal types, then any function f:ABf:A\to B is automatically a “functor”, since by composition it preserves 2-simplices and hence witnesses of composition. However, not every type family C:A𝒰C:A\to \mathcal{U} is necessarily functorial; in particular, the universe 𝒰\mathcal{U} is not Segal — its hom-types hom 𝒰(X,Y)\hom_{\mathcal{U}}(X,Y) consist intuitively of “spans and higher spans”. We say that C:A𝒰C:A\to \mathcal{U} is covariant if for any f:hom A(x,y)f:\hom_A(x,y) and u:C(x)u:C(x), the type

v:C(y) t:2C(f(t)) [u,v] Δ 1 \sum_{v:C(y)} \langle \prod_{t:2} C(f(t)) \mid^{\partial\Delta^1}_{[u,v]}\rangle

of “liftings of ff starting at uu” is contractible. An inhabitant of this type consists of a point f *(u):C(y)f_\ast(u):C(y), which we call the (covariant) transport of uu along ff, along with a witness trans(f,u)trans(f,u). As with Segal types, this single contractibility condition suffices to ensure that this action is coherently functorial. It also ensures that the fibers C(x)C(x) are discrete, and that the total space x:AC(x)\sum_{x:A} C(x) is Segal.

In particular, for any Segal type AA and any a:Aa:A, the hom-functor hom A(a,):A𝒰\hom_A(a,-) :A \to \mathcal{U} is covariant. The Yoneda lemma says that for any covariant C:A𝒰C:A\to \mathcal{U}, evaluation at (a,id a)(a,id_a) defines an equivalence

( x:Ahom A(a,x)C(x))C(a) \Big(\prod_{x:A} \hom_A(a,x) \to C(x)\Big) \simeq C(a)

The usual proof of the Yoneda lemma applies, except that it’s simpler since we don’t need to check naturality or functoriality; in the “synthetic” world all of that comes for free.

More generally, we have a dependent Yoneda lemma, which says that for any covariant C:( x:Ahom A(a,x))𝒰C : \Big(\sum_{x:A} \hom_A(a,x)\Big) \to \mathcal{U}, we have a similar equivalence

( x:A f:hom A(a,x)C(x,f))C(a,id a). \Big(\prod_{x:A} \prod_{f:\hom_A(a,x)} C(x,f)\Big) \simeq C(a,id_a).

This should be compared with the universal property of identity-elimination (path induction) in ordinary homotopy type theory, which says that for any type family C:( x:A(a=x))𝒰C : \Big(\sum_{x:A} (a=x)\Big) \to \mathcal{U}, evaluation at (a,refl a)(a,refl_a) defines an equivalence

( x:A f:a=xC(x,f))C(a,refl a). \Big(\prod_{x:A} \prod_{f:a=x} C(x,f)\Big) \simeq C(a,refl_a).

In other words, the dependent Yoneda lemma really is a “directed” generalization of path induction.

Rezk types

When is an arrow f:hom A(x,y)f : hom_A(x,y) in a Segal type an isomorphism? Classically, ff is an isomorphism just when it has a two-sided inverse, but in homotopy type theory more care is needed, for the same reason that we have to be careful when defining what it means for a function to be an equivalence: we want the notion of “being an isomorphism” to be a mere proposition. We could use analogues of any of the equivalent notions of equivalence in Chapter 4 of the HoTT Book, but the simplest is the following:

isiso(f):=( g:hom A(y,x)gf=id x)×( h:hom A(y,x)fh=id y) isiso(f) := \left(\sum_{g : hom_A(y,x)} g \circ f = id_x\right) \times \left(\sum_{h : hom_A(y,x)} f \circ h = id_y \right)

An element of this type consists of a left inverse and a right inverse together with witnesses that the respective composites with ff define identities. It is easy to prove that g=hg = h, so that ff is an isomorphism if and only if it admits a two-sided inverse, but the point is that any pair of terms in the type isiso(f)isiso(f) are equal (i.e., isiso(f)isiso(f) is a mere proposition), which would not be the case for the more naive definition.

The type of isomorphisms from xx to yy in AA is then defined to be

(x Ay):= f:hom A(x,y)isiso(f). (x \cong_A y) := \sum_{f : \hom_A(x,y)} isiso(f).

Identity arrows are in particular isomorphisms, so by identity-elimination there is a map

x,y:A(x= Ay)(x Ay) \prod_{x,y: A} (x =_A y) \to (x \cong_A y)

and we say that a Segal type AA is Rezk complete if this map is an equivalence, in which case AA is a Rezk type.

Coherent adjunctions

Similarly, it is somewhat delicate to define homotopy correct types of adjunction data that are contractible when they are inhabited. In the final section to our paper, we compare transposing adjunctions, by which we mean functors f:ABf : A \to B and u:BAu : B \to A (i.e. functions between Segal types) together with a fiberwise equivalence

a:A,b:Bhom A(a,ub)hom B(fa,b) \prod_{a :A, b: B} \hom_A(a,u b) \simeq \hom_B(f a,b)

with various notions of diagrammatic adjunctions, specified in terms of units and counits and higher coherence data.

The simplest of these, which we refer to as a quasi-diagrammatic adjunction is specified by a pair of functors as above, natural transformations η:Id Auf\eta: Id_A \to u f and ϵ:fuId B\epsilon : f u \to Id_B (a “natural transformation” is just an arrow in a function-type between Segal types), and witnesses α\alpha and β\beta to both of the triangle identities. The incoherence of this type of data has been observed in bicategory theory (it is not cofibrant as a 2-category) and in (,1)(\infty,1)-catgory theory (as a subcomputad of the free homotopy coherent adjunction it is not parental). One homotopically correct type of adjunction data is a half-adjoint diagrammatic adjunction, which has additionally a witness that fα:ϵfuϵfηuϵf \alpha : \epsilon \circ f u\epsilon \circ f\eta u \to \epsilon and βu:ϵϵfufηu\beta u: \epsilon \circ \epsilon f u \circ f \eta u commute with the naturality isomorphism for ϵ\epsilon.

We prove that given Segal types AA and BB and functors f:ABf : A \to B and u:BAu : B \to A, the type of half-adjoint diagrammatic adjunctions between them is equivalent to the type of transposing adjunctions. More precisely, if in the notion of transposing adjunction we interpret “equivalence” as a “half-adjoint equivalence”, i.e. a pair of maps ϕ\phi and ψ\psi with homotopies ϕψ=1\phi \psi = 1 and ψϕ=1\psi \phi = 1 and a witness to one of the triangle identities for an adjoint equivalence (this is another of the coherent notions of equivalence from the HoTT Book), then these data correspond exactly under the Yoneda lemma to those of a half-adjoint diagrammatic adjunction.

This suggests that similar correspondences for other kinds of coherent equivalences. For instance, if we interpret transposing adjunctions using the “bi-invertibility” notion of coherent equivalence (specification of a separate left and right inverse, as we used above to define isomorphisms in a Segal type), we obtain upon Yoneda-fication a new notion of coherent diagrammatic adjunction, consisting of a unit η\eta and two counits ϵ,ϵ\epsilon,\epsilon', together with witnesses that η,ϵ\eta,\epsilon satisfy one triangle identity and η,ϵ\eta,\epsilon' satisfy the other triangle identity.

Finally, if the types AA and BB are not just Segal but Rezk, we can show that adjoints are literally unique, not just “unique up to isomorphism”. That is, given a functor u:BAu:B\to A between Rezk types, the “type of left adjoints to uu” is a mere proposition.

May 24, 2017

Doug NatelsonHot electrons and a connection to thermoelectricity

The two recent posts about the Seebeck effect and hot electrons give some context so that I can talk about a paper we published last month.

We started out playing around with metal nanowires, and measuring the open-circuit voltage (that is, hook up a volt meter across the device, which nominally doesn't allow current to flow) across those wires as a function of where we illuminated them with a near-IR laser.  Because the metal absorbs some of the light, that laser spot acts like a local heat source (though figuring out the temperature profile requires some modeling of the heat transfer processes).   As mentioned here, particles tend to diffuse from hot locations to cold locations; in an open circuit, a voltage builds up to balance out this tendency, because in the steady state no net current flows in an open circuit; and in a metal, the way electron motion and scattering depend on the energy of the electrons gives you the magnitude and sign of this process.   If the metal is sufficiently nanoscale that boundary scattering matters, you end up with a thermoelectric response that depends on the metal geometry.  The end result is shown in the left portion of the figure.  If you illuminate the center of the metal wire, you measure no net voltage - you shouldn't, because the whole system is symmetric.  The junction where the wire fans out to a bigger pad acts like a thermocouple because of that boundary scattering, and if you illuminate it you get a net thermoelectric voltage (sign depends on how you pick ground and which end you're illuminating).   Bottom line:  Illumination heats the electrons a bit (say a few Kelvin), and you get a thermoelectric voltage because of that, to offset the tendency of the electrons to diffuse due to the temperature gradient.  In this system, the size of the effect is small - microvolts at our illumination conditions.

Now we can take that same nanowire, and break it to make a tunnel junction somewhere in there - a gap between the two electrodes where the electrons are able to "tunnel" across from one side to the other.  When we illuminate the tunnel junction, we now see open-circuit photovoltages that are much larger, and very localized to the gap region.  So, what is going on here?  The physics is related, but not true thermoelectricity (which assumes that it always makes sense to define temperature everywhere).   What we believe is happening is something that was discussed theoretically here, and was reported in molecule-containing junctions here.   As I said when talking about hot electrons, when light gets absorbed, it is possible to kick electrons way up in energy.  Usually that energy gets dissipated by being spread among other electrons very quickly.  However, if hot electrons encounter the tunnel junction before they've lost most of that energy, they have a higher likelihood of getting across the tunnel junction, because quantum tunneling is energy-dependent.  Producing more hot electrons on one side of the junction than the other will drive a tunneling current.  We still have an open circuit, though, so some voltage has to build up so that the net current in the steady state adds up to zero.  Bottom line:  Illumination here can drive a "hot" electron tunneling current, and you get a photovoltage to offset that process.  This isn't strictly a thermoelectric effect because the electrons aren't thermally distributed - it's the short-lived high energy tail that matters most.

It's fun to think about ways to try to better understand and maximize such effects, perhaps for applications in photodetection or other technologies....

Tim GowersIntransitive dice III

I now feel more optimistic about the prospects for this project. I don’t know whether we’ll solve the problem, but I think there’s a chance. But it seems that there is after all enough appetite to make it an “official” Polymath project. Perhaps we could also have an understanding that the pace of the project will be a little slower than it has been for most other projects. I myself have various other mathematical projects on the boil, so can’t spend too much time on this one, but quite like the idea of giving it an occasional go when the mood takes me, and trying to make slow but steady progress. So I’ve created a polymath13 category, into which this post now fits. I’ve also retrospectively changed the category for the previous two posts. I don’t think we’ve got to the stage where a wiki will be particularly useful, but I don’t rule that out at some point in the future.

In this post I want to expand on part of the previous one, to try to understand better what would need to be true for the quasirandomness assertion to be true. I’ll repeat a few simple definitions and simple facts needed to make the post more self-contained.

By an nsided die I mean a sequence in [n]^n (where [n] is shorthand for \{1,2,\dots,n\}) that adds up to n(n+1)/2. Given an n-sided die A=(a_1,\dots,a_n) and j\in[n], I define g_A(j) to be the number of i such that a_i\leq j and h_A(j) to be g_A(j)-j.

We can write h_A(j) as \sum_i\mathbf 1_{[a_i\leq j]}-j. Therefore, if C=(c_1,\dots,c_n) is another die, or even just an arbitrary sequence in [n]^n, we have that
\sum_jh_A(c_j)=\sum_j(\sum_i\mathbf 1_{[a_i\leq c_j]}-c_j)=\sum_{i,j}\mathbf 1_{[a_i\leq c_j]} - \sum_jc_j.
If \sum_jc_j=n(n+1)/2 and no a_i is equal to any c_j, then the sign of this sum therefore tells us whether A beats C. For most A, we don’t expect many ties, so the sign of the sum is a reasonable, but not perfect, proxy for which of the two dice wins. (With a slightly more complicated function we can avoid the problem of ties: I shall stick with the simpler one for ease of exposition, but would expect that if proofs could be got to work, then we would switch to the more complicated functions.)

This motivates the following question. Let A and B be two random dice. Is it the case that with high probability the remaining dice C are split into four sets of roughly equal size according to the signs of h_A(C) and h_B(C)? I expect the answer to this question to be the same as the answer to the original transitivity question, but I haven’t checked as carefully as I should that my cavalier approach to ties isn’t problematic.

I propose the following way of tackling this question. We fix A and B and then choose a purely random sequence C=(c_1,\dots,c_n) (that is, with no constraint on the sum) and look at the 3D random variable
Each coordinate separately is a sum of independent random variables with mean zero, so provided not too many of the h_A or h_B are zero, which for random A and B is a reasonable assumption, we should get something that approximates a trivariate normal distribution.

Therefore, we should expect that when we condition on \sum_j(c_j-(n+1)/2) being zero, we will get something that approximates a bivariate normal distribution. Although that may not be completely straightforward to prove rigorously, tools such as the Berry-Esseen theorem ought to be helpful, and I’d be surprised if this was impossibly hard. But for now I’m aiming at a heuristic argument, so I want simply to assume it.

What we want is for the signs of the first two coordinates to be approximately independent, which I think is equivalent to saying (assuming normality) that the first two coordinates themselves are approximately independent.

However, what makes the question interesting is that the first two coordinates are definitely not independent without the conditioning: the random variables \sum_jh_A(c_j) and \sum_jh_B(c_j) are typically quite strongly correlated. (There are good reasons to expect this to be the case, and I’ve tested it computationally too.) Also, we expect correlations between these variables and \sum_j(c_j-(n+1)/2). So what we are asking for is that all these correlations should disappear when we condition appropriately. More geometrically, there is a certain ellipsoid, and we want its intersection with a certain plane to be a circle.

The main aim of this post is to make the last paragraph more precise. That is, I want to take three standard normal random variables X, Y and Z that are not independent, and understand precisely the circumstances that guarantee that X and Y become independent when we condition on Z.

The joint distribution of (X,Y,Z) is determined by the matrix of correlations. Let this matrix be split up as \begin{pmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\\ \end{pmatrix}, where \Sigma_{11} is the 2\times 2 covariance matrix of (X,Y), \Sigma_{12} is a 2\times 1 matrix, \Sigma_{21} is a 1\times 2 matrix and \Sigma_{22} is the matrix (1). A general result about conditioning joint normal distributions on a subset of the variables tells us, if I understand the result correctly, that the covariance matrix of (X,Y) when we condition on the value of Z is \Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. (I got this from Wikipedia. It seems to be quite tricky to prove, so I hope it really can be used as a black box.) So in our case if we have a covariance matrix \begin{pmatrix}1&\lambda&\mu\\ \lambda&1&\nu\\ \mu&\nu&1\\ \end{pmatrix} then the covariance matrix of (X,Y) conditioned on Z should be \begin{pmatrix}1-\mu^2&\lambda-\mu\nu\\ \lambda-\mu\nu&1-\nu^2\\ \end{pmatrix}.

That looks dimensionally odd because I normalized the random variables to have variance 1. If instead I had started with the more general covariance matrix \begin{pmatrix}a&\lambda&\mu\\ \lambda&b&\nu\\ \mu&\nu&c\\ \end{pmatrix} I would have ended up with \begin{pmatrix}a-c^{-1}\mu^2&\lambda-c^{-1}\mu\nu\\ \lambda-c^{-1}\mu\nu&b-c^{-1}\nu^2\\ \end{pmatrix}.

So after the conditioning, if we want X and Y to become independent, we appear to want \lambda c to equal \mu\nu. That is, we want
\langle X,Z\rangle\langle Y,Z\rangle=\langle X,Y\rangle\langle Z,Z\rangle
where I am using angle brackets for covariances.

If we divide each variable by its standard deviation, that gives us that the correlation between X and Y should be the product of the correlation between X and Z and the correlation between Y and Z.

I wrote some code to test this, and it seemed not to be the case, anything like, but I am not confident that I didn’t make careless mistakes in the code. (However, my correlations were reasonable numbers in the range [-1,1], so any mistakes there might have been didn’t jump out at me. I might just rewrite the code from scratch without looking at the old version.)

One final remark I’d like to make is that if you feel there is something familiar about the expression \langle x,z\rangle\langle y,z\rangle-\langle x,y\rangle\langle z,z\rangle, then you are not completely wrong. The formula for the vector triple product is

x\wedge(y\wedge z)=\langle x,z\rangle y - \langle x,y\rangle z.

Therefore, the expression can be condensed to \langle x\wedge(y\wedge z),z\rangle. Now this is the scalar triple product of the three vectors x, y\wedge z, and z. For this to be zero, we need x to lie in the plane generated by y\wedge z and z. Note that y\wedge z is orthogonal to both y and z. So if P is the orthogonal projection to the subspace generated by z, we want x-Px to be orthogonal to y. Actually, that can be read out of the original formula too, since it is \langle\langle x,z\rangle z - \langle z,z\rangle x,y\rangle. A nicer way of thinking of it (because more symmetrical) is that we want the orthogonal projections of x and y to the subspace orthogonal to z to be orthogonal. To check that, assuming (WLOG) that \|z\|_2=1,
\langle x-\langle x,z\rangle z,y-\langle y,z\rangle z\rangle=\langle x,y\rangle-\langle x,z\rangle\langle y,z\rangle.

So what I’d like to see done (but I’m certainly not saying it’s the only thing worth doing) is the following.

1. Test experimentally whether for a random pair (A,B) of n-sided dice we find that the correlations of the random variables X=\sum_jh_A(j), Y=\sum_jh_B(j) and Z=\sum_j(c_j-(n+1)/2) really do appear to satisfy the relationship

corr(X,Z).corr(Y,Z)= corr(X,Y).

Here the c_j are chosen randomly without any conditioning on their sum. My experiment seemed to indicate not, but I’m hoping I made a mistake.

2. If they do satisfy that relationship, then we can start to think about why.

3. If they do not satisfy it, then we can start to think about why not. In particular, which of the heuristic assumptions used to suggest that they should satisfy that relationship is wrong — or is it my understanding of multivariate normals that is faulty?

If we manage to prove that they typically do satisfy that relationship, at least approximately, then we can think about whether various distributions become sufficiently normal sufficiently quickly for that to imply that intransitivity occurs with probability 1/4.

Scott AaronsonYet more errors in papers

Following up on my posts PostBQP Postscripts and More Wrong Things I Said In Papers, it felt like time for another post in which I publicly flog myself for mistakes in my research papers.  [Warning: The rest of this post is kinda, sorta technical.  Read at your own risk.]

(1) In my 2006 paper “Oracles are subtle but not malicious,” I claimed to show that if PP is contained in BQP/qpoly, then the counting hierarchy collapses to QMA (Theorem 5).  But on further reflection, I only know how to show a collapse of the counting hierarchy under the stronger assumption that PP is in BQP/poly.  If PP is in BQP/qpoly, then certainly P#P=PP=QMA, but I don’t know how to collapse any further levels of the counting hierarchy.  The issue is this: in QMA, we can indeed nondeterministically guess an (amplified) quantum advice state for a BQP/qpoly algorithm.  We can then verify that the advice state works to solve PP problems, by using (for example) the interactive protocol for the permanent, or some other #P-complete problem.  But having done that, how do we then unravel the higher levels of the counting hierarchy?  For example, how do we simulate PPPP in PPBQP=PP?  We don’t have any mechanism to pass the quantum advice up to the oracle PP machine, since queries to a PP oracle are by definition classical strings.  We could try to use tools from my later paper with Andy Drucker, passing a classical description of the quantum advice up to the oracle and then using the description to reconstruct the advice for ourselves.  But doing so just doesn’t seem to give us a complexity class that’s low for PP, which is what would be needed to unravel the counting hierarchy.  I still think this result might be recoverable, but a new idea is needed.

(2) In my 2008 algebrization paper with Avi Wigderson, one of the most surprising things we showed was a general connection between communication complexity lower bounds and algebraic query complexity lower bounds.  Specifically, given a Boolean oracle A:{0,1}n→{0,1}, let ~A be a low-degree extension of A over a finite field F (that is, ~A(x)=A(x) whenever x∈{0,1}n).  Then suppose we have an algorithm that’s able to learn some property of A, by making k black-box queries to ~A.  We observed that, in such a case, if Alice is given the top half of the truth table of A, and Bob is given the bottom half of the truth table, then there’s necessarily a communication protocol by which Alice and Bob can learn the same property of A, by exchanging at most O(kn log|F|) bits.  This connection is extremely model-independent: a randomized query algorithm gives rise to a randomized communication protocol, a quantum query algorithm gives rise to a quantum communication protocol, etc. etc.  The upshot is that, if you want to lower-bound the number of queries that an algorithm needs to make to the algebraic extension oracle ~A, in order to learn something about A, then it suffices to prove a suitable communication complexity lower bound.  And the latter, unlike algebraic query complexity, is a well-studied subject with countless results that one can take off the shelf.  We illustrated how one could use this connection to prove, for example, that there exists an oracle A such that NPA ⊄ BQP~A, for any low-degree extension ~A of A—a separation that we didn’t and don’t know how to prove any other way. Likewise, there exists an oracle B such that BQPB ⊄ BPP~B for any low-degree extension ~B of B.

The trouble is, our “proof sketches” for these separations (in Theorem 5.11) are inadequate, even for “sketches.”  They can often be fixed, but only by appealing to special properties of the communication complexity separations in question, properties that don’t necessarily hold for an arbitrary communication separation between two arbitrary models.

The issue is this: while it’s true, as we claimed, that a communication complexity lower bound implies an algebraic query complexity lower bound, it’s not true in general that a communication complexity upper bound implies an algebraic query complexity upper bound!  So, from a communication separation between models C and D, we certainly obtain a query complexity problem that’s not in D~A, but then the problem might not be in CA.  What tripped us up was that, in the cases we had in mind (e.g. Disjointness), it’s obvious that the query problem is in CA.  In other cases, however, such as Raz’s separation between quantum and randomized communication complexity, it probably isn’t even true.  In the latter case, to recover the desired conclusion about algebraic query complexity (namely, the existence of an oracle B such that BQPB ⊄ BPP~B), what seems to be needed is to start from a later quantum vs. classical communication complexity separation due to Klartag and Regev, and then convert their communication problem into a query problem using a recent approach by myself and Shalev Ben-David (see Section 4).  Unfortunately, my and Shalev’s approach only tells us nonconstructively that there exists a query problem with the desired separation, with no upper bound on the gate complexity of the quantum algorithm.  So strictly speaking, I still don’t know how to get a separation between the relativized complexity classes BQPB and BPP~B defined in terms of Turing machines.

In any case, I of course should have realized this issue with the algebrization paper the moment Shalev and I encountered the same issue when writing our later paper.  Let me acknowledge Shalev, as well as Robin Kothari, for helping to spur my realization of the issue.

In case it wasn’t clear, the mistakes I’ve detailed here have no effect on the main results of the papers in question (e.g., the existence of an oracle relative to which PP has linear-sized circuits; the existence and pervasiveness of the algebrization barrier).  The effect is entirely on various “bonus” results—results that, because they’re bonus, were gone over much less carefully by authors and reviewers alike.

Nevertheless, I’ve always felt like in science, the louder you are about your own mistakes, the better.  Hence this post.

May 23, 2017

Andrew JaffeNot-quite hacked

This week, the New York Times, The Wall Street Journal and Twitter, along with several other news organizations, have all announced that they were attacked by (most likely) Chinese hackers.

I am not quite happy to join their ranks: for the last few months, the traffic on this blog has been vastly dominated by attempts to get into the various back-end scripts that run this site, either by direct password hacks or just denial-of-service attacks. In fact, I only noticed it because the hackers exceeded my bandwidth allowance by a factor of a few (and costing me a few hundred bucks in over-usage charged by my host in the process, unfortunately).

I’ve since attempted to block the attacks by denying access to the IP addresses which have been the most active (mostly from domains that look like, for what it’s worth). So, my apologies if any of this results in any problems for anyone else trying to access the blog.

Andrew JaffeJSONfeed

More technical stuff, but I’m trying to re-train myself to actually write on this blog, so here goes…

For no good reason other than it was easy, I have added a JSONfeed to this blog. It can be found at, and accessed from the bottom of the right-hand sidebar if you’re actually reading this at

What does this mean? JSONfeed is an idea for a sort-of successor to something called RSS, which may stand for really simple syndication, a format for encapsulating the contents of a blog like this one so it can be indexed, consumed, and read in a variety of ways without explicitly going to my web page. RSS was created by developer, writer, and all around web-and-software guru Dave Winer, who also arguably invented — and was certainly part of the creation of — blogs and podcasting. Five or ten years ago, so-called RSS readers were starting to become a common way to consume news online. NetNewsWire was my old favourite on the Mac, although its original versions by Brent Simmons were much better than the current incarnation by a different software company; I now use something called Reeder. But the most famous one was Google Reader, which Google discontinued in 2013, thereby killing off most of the RSS-reader ecosystem.

But RSS is not dead: RSS readers still exist, and it is still used to store and transfer information between web pages. Perhaps most importantly, it is the format behind subscriptions to podcasts, whether you get them through Apple or Android or almost anyone else.

But RSS is kind of clunky, because it’s built on something called XML, an ugly but readable format for structuring information in files (HTML, used for the web, with all of its < and > “tags”, is a close cousin). Nowadays, people use a simpler family of formats called JSON for many of the same purposes as XML, but it is quite a bit easier for humans to read and write, and (not coincidentally) quite a bit easier to create computer programs to read and write.

So, finally, two more web-and-software developers/gurus, Brent Simmons and Manton Reece realised they could use JSON for the same purposes as RSS. Simmons is behind NewNewsWire and Reece’s most recent project is an “indie microblogging” platform (think Twitter without the giant company behind it), so they both have an interest in these things. And because JSON is so comparatively easy to use, there is already code that I could easily add to this blog so it would have its own JSONfeed. So I did it.

So it’s easy to create a JSONfeed. What there isn’t — so far — are any newsreaders like NetNewsWire or Reeder that can ingest them. (In fact, Maxime Vaillancourt apparently wrote a web-based reader in about an hour, but it may already be overloaded…). Still, looking forward to seeing what happens.