Planet Musings

October 21, 2017

Tommaso Dorigo: What Is Statistical Significance?

Yesterday, October 20, was the international day of Statistics. I took inspiration from it to select a clip from chapter 7 of my book "Anomaly! Collider physics and the quest for new phenomena at Fermilab" which attempts to explain how physicists use the concept of statistical significance to give a quantitative meaning to their measurements of new effects. I hope you will enjoy it....


October 20, 2017

David Hogg: self-calibrating pulsar arrays, and much more

I had a great conversation with Chiara Mingarelli (Flatiron) and Ellie Schwab (CUNY) today about pulsar-timing arrays and gravitational-wave sources. We are developing some ideas about self-calibration of the arrays, such that we might be able to simultaneously search for coherent sources (that is: not just stochastic backgrounds) and also precisely determine the distances to the individual pulsars to many digits of accuracy! It is futuristic stuff, and there are lots of ways it might fail badly, but if I am right that the self-calibration of the arrays is possible, it would make the arrays a few to tens of times more sensitive to sources! We started with Mingarelli assigning us some reading homework.

In the Stars group meeting, we had a productive discussion led by Megan Bedell (Flatiron), Andrew Mann (Columbia), and John Brewer (Yale) about things learned at the recent #KnowThyStar conference. There are some new uses of machine learning and data-driven models that I might need to spend some time criticizing! And it appears that there are some serious discrepancies between asteroseismic scaling relations for stellar radii and interferometric measurements. Not bigger than those expected by the stellar experts, apparently, but much bigger than assumed by some of the exoplanet community.

Prior to that, in our weekly Gaia DR2 prep working session, we discussed the use of proper motion as a distance indicator in a post-reduced-proper-motion world. That is: The assumptions underlying reduced proper motion are not great, and will be strongly violated in the DR2 data set. So let's replace it with a much better thing!

Adrian Price-Whelan (Princeton) showed some incredible properties of (flowing from beautiful design of) the astropy coordinates package. Damn!

Terence Tao: The logarithmically averaged and non-logarithmically averaged Chowla conjectures

Let {\lambda: {\bf N} \rightarrow \{-1,1\}} be the Liouville function, thus {\lambda(n)} is defined to equal {+1} when {n} is the product of an even number of primes, and {-1} when {n} is the product of an odd number of primes. The Chowla conjecture asserts that {\lambda} has the statistics of a random sign pattern, in the sense that

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (1)

for all {k \geq 1} and all distinct natural numbers {h_1,\dots,h_k}, where we use the averaging notation

\displaystyle  \mathbb{E}_{n \leq N} f(n) := \frac{1}{N} \sum_{n \leq N} f(n).

For {k=1}, this conjecture is equivalent to the prime number theorem (as discussed in this previous blog post), but the conjecture remains open for any {k \geq 2}.

In recent years, it has been realised that one can make more progress on this conjecture if one works instead with the logarithmically averaged version

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (2)

of the conjecture, where we use the logarithmic averaging notation

\displaystyle  \mathbb{E}_{n \leq N}^{\log} f(n) := \frac{\sum_{n \leq N} \frac{f(n)}{n}}{\sum_{n \leq N} \frac{1}{n}}.
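As a quick, purely illustrative aside (not part of the original post), both averaging notions are easy to estimate numerically for the simplest two-point correlation, between {\lambda(n)} and {\lambda(n+1)}; the sketch below evaluates the Liouville function via sympy's factorisation, and all function names are ad hoc.

    # Illustrative numerical check: ordinary vs logarithmic averages of
    # lambda(n) * lambda(n+1), which the Chowla conjecture predicts tend to 0.
    from sympy import factorint

    def liouville(n):
        # lambda(n) = (-1)^Omega(n), with Omega(n) the number of prime factors
        # of n counted with multiplicity
        return (-1) ** sum(factorint(n).values())

    def averages(N, h=1):
        lam = [liouville(n) for n in range(1, N + h + 1)]      # lam[i-1] = lambda(i)
        prod = [lam[n - 1] * lam[n + h - 1] for n in range(1, N + 1)]
        ordinary = sum(prod) / N
        log_num = sum(p / n for n, p in enumerate(prod, start=1))
        log_den = sum(1.0 / n for n in range(1, N + 1))
        return ordinary, log_num / log_den

    for N in (10**3, 10**4, 10**5):
        print(N, averages(N))

Both estimates come out small, consistent with (1) and (2), though of course such experiments prove nothing.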

Using the summation by parts (or telescoping series) identity

\displaystyle  \sum_{n \leq N} \frac{f(n)}{n} = \sum_{M < N} \frac{1}{M(M+1)} (\sum_{n \leq M} f(n)) + \frac{1}{N} \sum_{n \leq N} f(n) \ \ \ \ \ (3)

it is not difficult to show that the Chowla conjecture (1) for a given {k,h_1,\dots,h_k} implies the logarithmically averaged conjecture (2). However, the converse implication is not at all clear. For instance, for {k=1}, we have already mentioned that the Chowla conjecture

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n) = 0

is equivalent to the prime number theorem; but the logarithmically averaged analogue

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}^{\log}_{n \leq N} \lambda(n) = 0

is significantly easier to show (a proof with the Liouville function {\lambda} replaced by the closely related Möbius function {\mu} is given in this previous blog post). And indeed, significantly more is now known for the logarithmically averaged Chowla conjecture; in this paper of mine I had proven (2) for {k=2}, and in this recent paper with Joni Teravainen, we proved the conjecture for all odd {k} (with a different proof also given here).

In view of this emerging consensus that the logarithmically averaged Chowla conjecture was easier than the ordinary Chowla conjecture, it was thus somewhat of a surprise for me to read a recent paper of Gomilko, Kwietniak, and Lemanczyk who (among other things) established the following statement:

Theorem 1 Assume that the logarithmically averaged Chowla conjecture (2) is true for all {k}. Then there exists a sequence {N_i} going to infinity such that the Chowla conjecture (1) is true for all {k} along that sequence, that is to say

\displaystyle  \lim_{N_i \rightarrow \infty} \mathbb{E}_{n \leq N_i} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

for all {k} and all distinct {h_1,\dots,h_k}.

This implication does not use any special properties of the Liouville function (other than that it is bounded), and in fact proceeds by ergodic theoretic methods, focusing in particular on the ergodic decomposition of invariant measures of a shift into ergodic measures. Ergodic methods have proven remarkably fruitful in understanding these sorts of number theoretic and combinatorial problems, as could already be seen by the ergodic theoretic proof of Szemerédi’s theorem by Furstenberg, and more recently by the work of Frantzikinakis and Host on Sarnak’s conjecture. (My first paper with Teravainen also uses ergodic theory tools.) Indeed, many other results in the subject were first discovered using ergodic theory methods.

On the other hand, many results in this subject that were first proven ergodic theoretically have since been reproven by more combinatorial means; my second paper with Teravainen is an instance of this. As it turns out, one can also prove Theorem 1 by a standard combinatorial (or probabilistic) technique known as the second moment method. In fact, one can prove slightly more:

Theorem 2 Let {k} be a natural number. Assume that the logarithmically averaged Chowla conjecture (2) is true for {2k}. Then there exists a set {{\mathcal N}} of natural numbers of logarithmic density {1} (that is, {\lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} 1_{n \in {\mathcal N}} = 1}) such that

\displaystyle  \lim_{N \rightarrow \infty: N \in {\mathcal N}} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

for any distinct {h_1,\dots,h_k}.

It is not difficult to deduce Theorem 1 from Theorem 2 using a diagonalisation argument. Unfortunately, the known cases of the logarithmically averaged Chowla conjecture ({k=2} and odd {k}) are currently insufficient to use Theorem 2 for any purpose other than to reprove what is already known to be true from the prime number theorem. (Indeed, the even cases of Chowla, in either logarithmically averaged or non-logarithmically averaged forms, seem to be far more powerful than the odd cases; see Remark 1.7 of this paper of myself and Teravainen for a related observation in this direction.)

We now sketch the proof of Theorem 2. For any distinct {h_1,\dots,h_k}, we take a large number {H} and consider the limiting second moment

\displaystyle  \limsup_{N \rightarrow \infty} \mathop{\bf E}_{n \leq N}^{\log} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2.

We can expand this as

\displaystyle  \limsup_{N \rightarrow \infty} \mathop{\bf E}_{m,m' \leq H} \mathop{\bf E}_{n \leq N}^{\log} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)

\displaystyle  \lambda(n+m'+h_1) \dots \lambda(n+m'+h_k).

If all the {m+h_1,\dots,m+h_k,m'+h_1,\dots,m'+h_k} are distinct, the hypothesis (2) tells us that the inner averages go to zero as {N \rightarrow \infty}. The remaining averages are {O(1)}, and there are {O( k^2 H )} of these averages (out of the {H^2} pairs {(m,m')} in total). We conclude that

\displaystyle  \limsup_{N \rightarrow \infty} \mathop{\bf E}_{n \leq N}^{\log} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k^2 / H.

By Markov’s inequality (and (3)), we conclude that for any fixed {h_1,\dots,h_k, H}, there exists a set {{\mathcal N}_{h_1,\dots,h_k,H}} of upper logarithmic density at least {1-k/H^{1/2}}, thus

\displaystyle  \limsup_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} 1_{n \in {\mathcal N}_{h_1,\dots,h_k,H}} \geq 1 - k/H^{1/2}

such that

\displaystyle  \mathop{\bf E}_{n \leq N} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k / H^{1/2}

for all {N \in {\mathcal N}_{h_1,\dots,h_k,H}}.

By deleting at most finitely many elements, we may assume that {{\mathcal N}_{h_1,\dots,h_k,H}} consists only of elements of size at least {H^2} (say).

For any {H_0}, if we let {{\mathcal N}_{h_1,\dots,h_k, \geq H_0}} be the union of {{\mathcal N}_{h_1,\dots,h_k, H}} for {H \geq H_0}, then {{\mathcal N}_{h_1,\dots,h_k, \geq H_0}} has logarithmic density {1}. By a diagonalisation argument (using the fact that the set of tuples {(h_1,\dots,h_k)} is countable), we can then find a set {{\mathcal N}} of natural numbers of logarithmic density {1}, such that for every {h_1,\dots,h_k,H_0}, every sufficiently large element of {{\mathcal N}} lies in {{\mathcal N}_{h_1,\dots,h_k,\geq H_0}}. Thus for every sufficiently large {N} in {{\mathcal N}}, one has

\displaystyle  \mathop{\bf E}_{n \leq N} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k / H^{1/2}.

for some {H \geq H_0} with {N \geq H^2}. By Cauchy-Schwarz, this implies that

\displaystyle  \mathop{\bf E}_{n \leq N} \mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \ll k^{1/2} / H^{1/4};

interchanging the sums and using {N \geq H^2} and {H \geq H_0}, this implies that

\displaystyle  \mathop{\bf E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) \ll k^{1/2} / H^{1/4} \leq k^{1/2} / H_0^{1/4}.

We conclude on taking {H_0} to infinity that

\displaystyle  \lim_{N \rightarrow \infty; N \in {\mathcal N}} \mathop{\bf E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

as required.


Filed under: expository, math.CO, math.DS, math.NT, math.PR Tagged: Chowla conjecture, second moment method

David Hogg: writing projects

Coming off my personal success of (finally) getting a paper on the arXiv yesterday (check the footnote on the cover page), I worked through two projects that are close to being writeable or finishable. The first is a paper with Stephen Feeney (Flatiron) on the Lutz-Kelker correction, when to use it (never) and what it is (a correction from ML to MAP). The second is a document I wrote many months ago about finding similar or identical objects in noisy data. After I read through both, I got daunted by the work that needs to happen! So I borked. I love my job! But writing is definitely hard.

Terence Tao: Heat flow and zeroes of polynomials

Let {P(z) = z^n + a_{n-1} z^{n-1} + \dots + a_0} be a monic polynomial of degree {n} with complex coefficients. Then by the fundamental theorem of algebra, we can factor {P} as

\displaystyle  P(z) = (z-z_1) \dots (z-z_n) \ \ \ \ \ (1)

for some complex zeroes {z_1,\dots,z_n} (possibly with repetition).

Now suppose we evolve {P} with respect to time by heat flow, creating a function {P(t,z)} of two variables with given initial data {P(0,z) = P(z)} for which

\displaystyle  \partial_t P(t,z) = \partial_{zz} P(t,z). \ \ \ \ \ (2)

On the space of polynomials of degree at most {n}, the operator {\partial_{zz}} is nilpotent, and one can solve this equation explicitly both forwards and backwards in time by the Taylor series

\displaystyle  P(t,z) = \sum_{j=0}^\infty \frac{t^j}{j!} \partial_z^{2j} P(0,z).

For instance, if one starts with a quadratic {P(0,z) = z^2 + bz + c}, then the polynomial evolves by the formula

\displaystyle  P(t,z) = z^2 + bz + (c+2t).
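As an illustrative aside (not part of the original post), the finite Taylor expansion above is straightforward to implement numerically; the sketch below evolves an arbitrary quartic under heat flow with numpy and tracks how its zeroes move.

    # Minimal sketch: evolve a polynomial under heat flow P_t = P_zz via the
    # finite Taylor series P(t,.) = sum_j t^j/j! d^{2j}P(0,.), then track roots.
    import numpy as np
    from math import factorial

    def heat_flow(coeffs, t):
        # coeffs: highest-degree-first, as used by numpy.polyder and numpy.roots
        out = np.zeros_like(coeffs, dtype=float)
        term = np.array(coeffs, dtype=float)
        j = 0
        while term.size > 0:
            out[-term.size:] += (t ** j / factorial(j)) * term
            term = np.polyder(term, 2)       # apply two more z-derivatives
            j += 1
        return out

    P0 = [1.0, 0.0, -3.0, 0.0, 1.0]          # z^4 - 3 z^2 + 1, four simple real roots
    for t in (0.0, 0.05, 0.1, 0.2):
        print(t, np.sort_complex(np.roots(heat_flow(P0, t))))
        # the real roots drift together; by t = 0.2 they have collided in pairs
        # and moved off the real axis

Starting instead from the quadratic {z^2+bz+c} reproduces the formula {z^2 + bz + (c+2t)} above.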

As the polynomial {P(t)} evolves in time, the zeroes {z_1(t),\dots,z_n(t)} evolve also. Assuming for sake of discussion that the zeroes are simple, the inverse function theorem tells us that the zeroes will (locally, at least) evolve smoothly in time. What are the dynamics of this evolution?

For instance, in the quadratic case, the quadratic formula tells us that the zeroes are

\displaystyle  z_1(t) = \frac{-b + \sqrt{b^2 - 4(c+2t)}}{2}

and

\displaystyle  z_2(t) = \frac{-b - \sqrt{b^2 - 4(c+2t)}}{2}

after arbitrarily choosing a branch of the square root. If {b,c} are real and the discriminant {b^2 - 4c} is initially positive, we see that we start with two real zeroes centred around {-b/2}, which then approach each other until time {t = \frac{b^2-4c}{8}}, at which point the roots collide and then move off from each other in an imaginary direction.

In the general case, we can obtain the equations of motion by implicitly differentiating the defining equation

\displaystyle  P( t, z_i(t) ) = 0

in time using (2) to obtain

\displaystyle  \partial_{zz} P( t, z_i(t) ) + \partial_t z_i(t) \partial_z P(t,z_i(t)) = 0.

To simplify notation we drop the explicit dependence on time, thus

\displaystyle  \partial_{zz} P(z_i) + (\partial_t z_i) \partial_z P(z_i)= 0.

From (1) and the product rule, we see that

\displaystyle  \partial_z P( z_i ) = \prod_{j:j \neq i} (z_i - z_j)

and

\displaystyle  \partial_{zz} P( z_i ) = 2 \sum_{k:k \neq i} \prod_{j:j \neq i,k} (z_i - z_j)

(where all indices are understood to range over {1,\dots,n}) leading to the equations of motion

\displaystyle  \partial_t z_i = \sum_{k:k \neq i} \frac{2}{z_k - z_i}, \ \ \ \ \ (3)

at least when one avoids those times in which there is a repeated zero. In the case when the zeroes {z_i} are real, each term {\frac{2}{z_k-z_i}} represents a (first-order) attraction in the dynamics between {z_i} and {z_k}, but the dynamics are more complicated for complex zeroes (e.g. purely imaginary zeroes will experience repulsion rather than attraction, as one already sees in the quadratic example). Curiously, this system resembles that of Dyson brownian motion (except with the brownian motion part removed, and time reversed). I learned of the connection between the ODE (3) and the heat equation from this paper of Csordas, Smith, and Varga, but perhaps it has been mentioned in earlier literature as well.
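As a further illustrative aside (not from the original post), one can integrate the ODE (3) numerically for a handful of real zeroes and watch the mutual attraction; the initial zeroes below are arbitrary.

    # Minimal sketch: integrate dx_i/dt = sum_{k != i} 2/(x_k - x_i) with scipy.
    import numpy as np
    from scipy.integrate import solve_ivp

    def velocity(t, x):
        diff = x[:, None] - x[None, :]       # diff[i, k] = x_i - x_k
        np.fill_diagonal(diff, np.inf)       # drop the k = i term
        return (-2.0 / diff).sum(axis=1)     # equals sum_k 2/(x_k - x_i)

    x0 = np.array([-3.0, -1.0, 0.5, 2.5])    # arbitrary simple real zeroes
    sol = solve_ivp(velocity, (0.0, 0.1), x0, max_step=1e-3)
    print("t = 0.0:", x0)
    print("t = 0.1:", np.round(sol.y[:, -1], 4))   # the zeroes have drawn closer

Running the integration further eventually hits the finite-time collision discussed below.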

One interesting consequence of these equations is that if the zeroes are real at some time, then they will stay real as long as the zeroes do not collide. Let us now restrict attention to the case of real simple zeroes, in which case we will rename the zeroes as {x_i} instead of {z_i}, and order them as {x_1 < \dots < x_n}. The evolution

\displaystyle  \partial_t x_i = \sum_{k:k \neq i} \frac{2}{x_k - x_i}

can now be thought of as reverse gradient flow for the “entropy”

\displaystyle  H := -\sum_{i,j: i \neq j} \log |x_i - x_j|,

(which is also essentially the logarithm of the discriminant of the polynomial) since we have

\displaystyle  \partial_t x_i = \frac{\partial H}{\partial x_i}.

In particular, we have the monotonicity formula

\displaystyle  \partial_t H = 4E

where {E} is the “energy”

\displaystyle  E := \frac{1}{4} \sum_i (\frac{\partial H}{\partial x_i})^2

\displaystyle  = \sum_i (\sum_{k:k \neq i} \frac{1}{x_k-x_i})^2

\displaystyle  = \sum_{i,k: i \neq k} \frac{1}{(x_k-x_i)^2} + 2 \sum_{i,j,k: i,j,k \hbox{ distinct}} \frac{1}{(x_k-x_i)(x_j-x_i)}

\displaystyle  = \sum_{i,k: i \neq k} \frac{1}{(x_k-x_i)^2}

where in the last line we use the antisymmetrisation identity

\displaystyle  \frac{1}{(x_k-x_i)(x_j-x_i)} + \frac{1}{(x_i-x_j)(x_k-x_j)} + \frac{1}{(x_j-x_k)(x_i-x_k)} = 0.

Among other things, this shows that as one goes backwards in time, the entropy decreases, and so no collisions can occur to the past, only in the future, which is of course consistent with the attractive nature of the dynamics. As {H} is a convex function of the positions {x_1,\dots,x_n}, one expects {H} to also evolve in a convex manner in time, that is to say the energy {E} should be increasing. This is indeed the case:

Exercise 1 Show that

\displaystyle  \partial_t E = 2 \sum_{i,j: i \neq j} (\frac{2}{(x_i-x_j)^2} - \sum_{k: i,j,k \hbox{ distinct}} \frac{1}{(x_k-x_i)(x_k-x_j)})^2.

Symmetric polynomials of the zeroes are polynomial functions of the coefficients and should thus evolve in a polynomial fashion. One can compute this explicitly in simple cases. For instance, the center of mass is an invariant:

\displaystyle  \partial_t \frac{1}{n} \sum_i x_i = 0.

The variance decreases linearly:

Exercise 2 Establish the virial identity

\displaystyle  \partial_t \sum_{i,j} (x_i-x_j)^2 = - 4n^2(n-1).

As the variance (which is proportional to {\sum_{i,j} (x_i-x_j)^2}) cannot become negative, this identity shows that “finite time blowup” must occur – that the zeroes must collide at or before the time {\frac{1}{4n^2(n-1)} \sum_{i,j} (x_i-x_j)^2}.
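For the record, here is a tiny numerical check of this identity (an aside, not from the original post): for any configuration of simple real zeroes, the rate of change of {\sum_{i,j} (x_i-x_j)^2} computed from the equations of motion comes out to exactly {-4n^2(n-1)}.

    # Check the virial identity d/dt sum_{i,j} (x_i - x_j)^2 = -4 n^2 (n-1).
    import numpy as np

    def velocity(x):
        diff = x[:, None] - x[None, :]
        np.fill_diagonal(diff, np.inf)
        return (-2.0 / diff).sum(axis=1)     # dx_i/dt = sum_{k != i} 2/(x_k - x_i)

    x = np.array([-3.0, -1.0, 0.5, 2.5])     # any simple real zeroes will do
    v = velocity(x)
    dS_dt = (2 * (x[:, None] - x[None, :]) * (v[:, None] - v[None, :])).sum()
    n = len(x)
    print(dS_dt, -4 * n**2 * (n - 1))        # both are -192 here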

Exercise 3 Show that the Stieltjes transform

\displaystyle  s(t,z) = \sum_i \frac{1}{x_i - z}

solves the viscous Burgers equation

\displaystyle  \partial_t s = \partial_{zz} s - 2 s \partial_z s,

either by using the original heat equation (2) and the identity {s = - \partial_z P / P}, or else by using the equations of motion (3). This relation between the Burgers equation and the heat equation is known as the Cole-Hopf transformation.

The paper of Csordas, Smith, and Varga mentioned previously gives some other bounds on the lifespan of the dynamics; roughly speaking, they show that if there is one pair of zeroes that are much closer to each other than to the other zeroes then they must collide in a short amount of time (unless there is a collision occurring even earlier at some other location). Their argument extends also to situations where there are an infinite number of zeroes, which they apply to get new results on Newman’s conjecture in analytic number theory. I would be curious to know of further places in the literature where this dynamics has been studied.


Filed under: expository, math.AP, math.CV, math.DS Tagged: entropy, heat flow, polynomials, zeroes

Steinn Sigurðsson"In the beginning..."


"In the beginning was the command line..."

a lot of people who ought to have read Neal Stephenson's essay on the UI as a metaphor have not done so.

This is a public service.

Go get a copy, then carry it with you until you get stuck at O'Hare long enough to read it, or whatever works for you.


Doug Natelson: Neutron stars and condensed matter physics

In the wake of the remarkable results reported earlier this week regarding colliding neutron stars, I wanted to write just a little bit about how a condensed matter physics concept is relevant to these seemingly exotic systems.

When you learn high school chemistry, you learn about atomic orbitals, and you learn that electrons "fill up" those orbitals starting with the lowest energy (most deeply bound) states, two electrons of opposite spin per orbital.  (This is a shorthand way of talking about a more detailed picture, involving words like "linear combination of Slater determinants", but that's a detail in this discussion.)  The Pauli principle, the idea that (because electrons are fermions) all the electrons can't just fall down into the lowest energy level, leads to this.  In solid state systems we can apply the same ideas.  In a metal like gold or copper, the density of electrons is high enough that the highest kinetic energy electrons are moving around at ~ 0.5% of the speed of light (!).  

If you heat up the electrons in a metal, they get more spread out in energy, with some electrons occupying higher energy levels and some lower energy levels being left empty.   To decide whether the metal is really "hot" or "cold", you need a point of comparison, and the energy scale gives you that.  If most of the low energy levels are still filled, the metal is cold.  If the ratio of the thermal energy scale, \(k_{\mathrm{B}}T\), to the depth of the lowest energy levels (essentially the Fermi energy, \(E_{\mathrm{F}}\)) is much less than one, then the electrons are said to be "degenerate".  In common metals, \(E_{\mathrm{F}}\) is several eV, corresponding to a temperature of tens of thousands of Kelvin.  That means that even near the melting point of copper, the electrons are effectively very cold.

Believe it or not, a neutron star is a similar system.  If you squeeze a bit more than one solar mass into a sphere 10 km across, the gravitational attraction is so strong that the electrons and protons in the matter are crushed together to form a degenerate ball of neutrons.  Amazingly, by our reasoning above, the neutrons are actually very very cold.  The Fermi energy for those neutrons corresponds to a temperature of nearly \(10^{12}\) K.  So, right up until they smashed into each other, those two neutron stars spotted by the LIGO observations were actually incredibly cold, condensed objects.   It's also worth noting that the properties of neutron stars are likely affected by another condensed matter phenomenon, superfluidity.   Just as electrons can pair up and condense into a superconducting state under some circumstances, it is thought that cold, degenerate neutrons can do the same thing, even when "cold" here might mean \(5 \times 10^{8}\) K.
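As a back-of-the-envelope aside (not part of the original post), the numbers quoted above are easy to reproduce with the non-relativistic free-fermion formula \(E_{\mathrm{F}} = \hbar^2 (3\pi^2 n)^{2/3}/2m\).  The neutron-star estimate below assumes a radius of roughly 10 km, and the non-relativistic approximation is only marginal there, so treat it as an order-of-magnitude check.

    # Back-of-the-envelope Fermi temperatures: conduction electrons in copper
    # vs neutrons packed into a ~10 km-radius neutron star (rough estimate).
    import numpy as np

    hbar, kB = 1.055e-34, 1.381e-23          # J*s, J/K
    m_e, m_n = 9.109e-31, 1.675e-27          # kg
    M_sun = 1.989e30                         # kg

    def fermi_temperature(number_density, mass):
        E_F = hbar**2 * (3 * np.pi**2 * number_density) ** (2 / 3) / (2 * mass)
        return E_F / kB

    # Copper has roughly 8.5e28 conduction electrons per cubic meter.
    print("electrons in Cu:  T_F ~ %.0e K" % fermi_temperature(8.5e28, m_e))

    # Neutron star: ~1.4 solar masses of neutrons in a sphere of radius ~10 km.
    N_neutrons = 1.4 * M_sun / m_n           # about 1.7e57 neutrons
    n_density = N_neutrons / (4 / 3 * np.pi * 1.0e4**3)
    print("neutrons in star: T_F ~ %.0e K" % fermi_temperature(n_density, m_n))

The first number comes out in the tens of thousands of Kelvin and the second near \(10^{12}\) K, matching the figures in the text.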

Backreaction: Space may not be as immaterial as we thought

[Image: Galaxy slime]

Physicists have gathered evidence that space-time can behave like a fluid. Mathematical evidence, that is, but still evidence. If this relation isn’t a coincidence, then space-time – like a fluid – may have substructure. We shouldn’t speak of space and time as if the two were distant cousins. We have known at least since Einstein that space and time are inseparable,

October 19, 2017

Steinn Sigurðsson: the best things in life are free

The arXiv wants your $

arXiv  is an e-print service in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics

each day it receives several hundred e-prints, mostly preprints, categorizes them and distributes the list of papers, with full access to the pdf and, when available, the TeX source  - which is most of the time, TeX rocks

authors submit the e-prints, with license to distribute, and users receive the list of the day's papers, for the categories they express interest in, and access to content, free, originally by e-mail, now web

almost all papers in theoretical physics, mathematics and astrophysics now go on the arXiv, as does an increasing fraction from the newer fields
there are multiple other *Xivs  covering other subject areas with varying degree of success

the arXiv now holds over a million e-prints, going back to 1991, and a little bit beyond, as people have sent in old stuff to archive on the arXiv, e-prints are coming at about 10,000 per month and growing

the service is slim, by design, almost minimalistic, but oh so powerful

you all use it, obsessively, you couldn't do without it!

arXiv is not actually cost free, it has an IT staff, its own server and a management team and development team
a lot of the current cost is provided by member institutions, and Cornell University

but... we could use more, so the annual fundraising drive is under way, this week only
 - ok, you can give any time, but this is sorta special

Steinn Sigurðsson: Why, yes, it is all about me...


Dynamics of Cats is back.

This is the original, Ye Olde, blog, started back in March of 2005; it had a decent run through June 2006, at which point I was invited to join the Scienceblogs collective at what was then SEED magazine.

A fun decade ensued, as blogs boomed,  markets crashed, SEED realized Sb was keeping the rest of the group going, and then National Geographic ate Sb, which then started to shrivel.

I blog strictly to amuse myself, no promises.

I find blogging is good warmup for serious writing.  Actual output is very sensitive to Real Life,  I tend to be more prolific when busy doing stuff, and less prolific when subsumed by administrivia and other intrusions from The World.

I'm a physicist, educated in England, PhD from a small private university in California.
Postdocs in NoCal and back to UK, and now a Professor of Astronomy and Astrophysics at The Pennsylvania State University.
Which is Good In Parts.


I have never taken a class in astronomy, for credit.

In my copious spare time I am a Science Editor for the AAS Journals,
I am also a member of the Aspen Center for Physics 
and most recently I became the Scientific Director of arXiv

Steinn Sigurdsson Appointed as arXiv Scientific Director

As noted above (you did read the disclaimer...),
I do not speak for any of these Institutions upon this here blog.

We also have a dog.
Gunnar.
Named after the poet-warrior of the Sagas.
Don't ask, he had the name when he moved in with us.



We used to have cats, but they, sadly, died.

October 18, 2017

Backreaction: I totally mean it: Inflation never solved the flatness problem.

I’ve had many interesting reactions to my recent post about inflation, this idea that the early universe expanded exponentially and thereby flattened and smoothed itself. The maybe most interesting response to my pointing out that inflation doesn’t solve the problems it was invented to solve is a flabbergasted: “But everyone else says it does.” Not like I don’t know that. But, yes, most people

October 17, 2017

Matt Strassler: The Significance of Yesterday’s Gravitational Wave Announcement: an FAQ

Yesterday’s post on the results from the LIGO/VIRGO network of gravitational wave detectors was aimed at getting information out, rather than providing the pedagogical backdrop.  Today I’m following up with a post that attempts to answer some of the questions that my readers and my personal friends asked me.  Some wanted to understand better how to visualize what had happened, while others wanted more clarity on why the discovery was so important.  So I’ve put together a post which  (1) explains what neutron stars and black holes are and what their mergers are like, (2) clarifies why yesterday’s announcement was important — and there were many reasons, which is why it’s hard to reduce it all to a single soundbite.  And (3) there are some miscellaneous questions at the end.

First, a disclaimer: I am *not* an expert in the very complex subject of neutron star mergers and the resulting explosions, called kilonovas.  These are much more complicated than black hole mergers.  I am still learning some of the details.  Hopefully I’ve avoided errors, but you’ll notice a few places where I don’t know the answers … yet.  Perhaps my more expert colleagues will help me fill in the gaps over time.

Please, if you spot any errors, don’t hesitate to comment!!  And feel free to ask additional questions whose answers I can add to the list.

BASIC QUESTIONS ABOUT NEUTRON STARS, BLACK HOLES, AND MERGERS

What are neutron stars and black holes, and how are they related?

Every atom is made from a tiny atomic nucleus, made of neutrons and protons (which are very similar), and loosely surrounded by electrons. Most of an atom is empty space, so it can, under extreme circumstances, be crushed — but only if every electron and proton convert to a neutron (which remains behind) and a neutrino (which heads off into outer space.) When a giant star runs out of fuel, the pressure from its furnace turns off, and it collapses inward under its own weight, creating just those extraordinary conditions in which the matter can be crushed. Thus: a star’s interior, with a mass one to several times the Sun’s mass, is all turned into a several-mile(kilometer)-wide ball of neutrons — the number of neutrons approaching a 1 with 57 zeroes after it.

If the star is big but not too big, the neutron ball stiffens and holds its shape, and the star explodes outward, blowing itself to pieces in what is called a core-collapse supernova. The ball of neutrons remains behind; this is what we call a neutron star. It’s a ball of the densest material that we know can exist in the universe — a pure atomic nucleus many miles(kilometers) across. It has a very hard surface; if you tried to go inside a neutron star, your experience would be a lot worse than running into a closed door at a hundred miles per hour.

If the star is very big indeed, the neutron ball that forms may immediately (or soon) collapse under its own weight, forming a black hole. A supernova may or may not result in this case; the star might just disappear. A black hole is very, very different from a neutron star. Black holes are what’s left when matter collapses irretrievably upon itself under the pull of gravity, shrinking down endlessly. While a neutron star has a surface that you could smash your head on, a black hole has no surface — it has an edge that is simply a point of no return, called a horizon. In Einstein’s theory, you can just go right through, as if passing through an open door. You won’t even notice the moment you go in. [Note: this is true in Einstein’s theory. But there is a big controversy as to whether the combination of Einstein’s theory with quantum physics changes the horizon into something novel and dangerous to those who enter; this is known as the firewall controversy, and would take us too far afield into speculation.]  But once you pass through that door, you can never return.

Black holes can form in other ways too, but not those that we’re observing with the LIGO/VIRGO detectors.

Why are their mergers the best sources for gravitational waves?

One of the easiest and most obvious ways to make gravitational waves is to have two objects orbiting each other.  If you put your two fists in a pool of water and move them around each other, you’ll get a pattern of water waves spiraling outward; this is in rough (very rough!) analogy to what happens with two orbiting objects, although, since the objects are moving in space, the waves aren’t in a material like water.  They are waves in space itself.

To get powerful gravitational waves, you want objects each with a very big mass that are orbiting around each other at very high speed. To get the fast motion, you need the force of gravity between the two objects to be strong; and to get gravity to be as strong as possible, you need the two objects to be as close as possible (since, as Isaac Newton already knew, gravity between two objects grows stronger when the distance between them shrinks.) But if the objects are large, they can’t get too close; they will bump into each other and merge long before their orbit can become fast enough. So to get a really fast orbit, you need two relatively small objects, each with a relatively big mass — what scientists refer to as compact objects. Neutron stars and black holes are the most compact objects we know about. Fortunately, they do indeed often travel in orbiting pairs, and do sometimes, for a very brief period before they merge, orbit rapidly enough to produce gravitational waves that LIGO and VIRGO can observe.

Why do we find these objects in pairs in the first place?

Stars very often travel in pairs… they are called binary stars. They can start their lives in pairs, forming together in large gas clouds, or even if they begin solitary, they can end up pairing up if they live in large densely packed communities of stars where it is common for multiple stars to pass nearby. Perhaps surprisingly, their pairing can survive the collapse and explosion of either star, leaving two black holes, two neutron stars, or one of each in orbit around one another.

What happens when these objects merge?

Not surprisingly, there are three classes of mergers which can be detected: two black holes merging, two neutron stars merging, and a neutron star merging with a black hole. The first class was observed in 2015 (and announced in 2016), the second was announced yesterday, and it’s a matter of time before the third class is observed. The two objects may orbit each other for billions of years, very slowly radiating gravitational waves (an effect observed in the 70’s, leading to a Nobel Prize) and gradually coming closer and closer together. Only in the last day of their lives do their orbits really start to speed up. And just before these objects merge, they begin to orbit each other once per second, then ten times per second, then a hundred times per second. Visualize that if you can: objects a few dozen miles (kilometers) across, a few miles (kilometers) apart, each with the mass of the Sun or greater, orbiting each other 100 times each second. It’s truly mind-boggling — a spinning dumbbell beyond the imagination of even the greatest minds of the 19th century. I don’t know any scientist who isn’t awed by this vision. It all sounds like science fiction. But it’s not.

How do we know this isn’t science fiction?

We know, if we believe Einstein’s theory of gravity (and I’ll give you a very good reason to believe in it in just a moment.) Einstein’s theory predicts that such a rapidly spinning, large-mass dumbbell formed by two orbiting compact objects will produce a telltale pattern of ripples in space itself — gravitational waves. That pattern is both complicated and precisely predicted. In the case of black holes, the predictions go right up to and past the moment of merger, to the ringing of the larger black hole that forms in the merger. In the case of neutron stars, the instants just before, during and after the merger are more complex and we can’t yet be confident we understand them, but during tens of seconds before the merger Einstein’s theory is very precise about what to expect. The theory further predicts how those ripples will cross the vast distances from where they were created to the location of the Earth, and how they will appear in the LIGO/VIRGO network of three gravitational wave detectors. The prediction of what to expect at LIGO/VIRGO thus involves not just one prediction but many: the theory is used to predict the existence and properties of black holes and of neutron stars, the detailed features of their mergers, the precise patterns of the resulting gravitational waves, and how those gravitational waves cross space. That LIGO/VIRGO have detected the telltale patterns of these gravitational waves, and that these wave patterns agree with Einstein’s theory in every detail, is the strongest evidence ever obtained that there is nothing wrong with Einstein’s theory when used in these combined contexts.  That then in turn gives us confidence that our interpretation of the LIGO/VIRGO results is correct, confirming that black holes and neutron stars really exist and really merge. (Notice the reasoning is slightly circular… but that’s how scientific knowledge proceeds, as a set of detailed consistency checks that gradually and eventually become so tightly interconnected as to be almost impossible to unwind.  Scientific reasoning is not deductive; it is inductive.  We do it not because it is logically ironclad but because it works so incredibly well — as witnessed by the computer, and its screen, that I’m using to write this, and the wired and wireless internet and computer disk that will be used to transmit and store it.)

THE SIGNIFICANCE(S) OF YESTERDAY’S ANNOUNCEMENT OF A NEUTRON STAR MERGER

What makes it difficult to explain the significance of yesterday’s announcement is that it consists of many important results piled up together, rather than a simple takeaway that can be reduced to a single soundbite. (That was also true of the black hole mergers announcement back in 2016, which is why I wrote a long post about it.)

So here is a list of important things we learned.  No one of them, by itself, is earth-shattering, but each one is profound, and taken together they form a major event in scientific history.

First confirmed observation of a merger of two neutron stars: We’ve known these mergers must occur, but there’s nothing like being sure. And since these things are too far away and too small to see in a telescope, the only way to be sure these mergers occur, and to learn more details about them, is with gravitational waves.  We expect to see many more of these mergers in coming years as gravitational wave astronomy increases in its sensitivity, and we will learn more and more about them.

New information about the properties of neutron stars: Neutron stars were proposed almost a hundred years ago and were confirmed to exist in the 60’s and 70’s.  But their precise details aren’t known; we believe they are like a giant atomic nucleus, but they’re so vastly larger than ordinary atomic nuclei that we can’t be sure we understand all of their internal properties, and there are debates in the scientific community that can’t be easily answered… until, perhaps, now.

From the detailed pattern of the gravitational waves of this one neutron star merger, scientists have already learned two things. First, we confirm that Einstein’s theory correctly predicts the basic pattern of gravitational waves from orbiting neutron stars, as it does for orbiting and merging black holes. Unlike black holes, however, there are more questions about what happens to neutron stars when they merge. The question of what happened to this pair after they merged is still open — did they form a neutron star, an unstable neutron star that, slowing its spin, eventually collapsed into a black hole, or a black hole straightaway?

But something important was already learned about the internal properties of neutron stars. The stresses of being whipped around at such incredible speeds would tear you and me apart, and would even tear the Earth apart. We know neutron stars are much tougher than ordinary rock, but how much more? If they were too flimsy, they’d have broken apart at some point during LIGO/VIRGO’s observations, and the simple pattern of gravitational waves that was expected would have suddenly become much more complicated. That didn’t happen until perhaps just before the merger.   So scientists can use the simplicity of the pattern of gravitational waves to infer some new things about how stiff and strong neutron stars are.  More mergers will improve our understanding.  Again, there is no other simple way to obtain this information.

First visual observation of an event that produces both immense gravitational waves and bright electromagnetic waves: Black hole mergers aren’t expected to create a brilliant light display, because, as I mentioned above, they’re more like open doors to an invisible playground than they are like rocks, so they merge rather quietly, without a big bright and hot smash-up.  But neutron stars are big balls of stuff, and so the smash-up can indeed create lots of heat and light of all sorts, just as you might naively expect.  By “light” I mean not just visible light but all forms of electromagnetic waves, at all wavelengths (and therefore at all frequencies.)  Scientists divide up the range of electromagnetic waves into categories. These categories are radio waves, microwaves, infrared light, visible light, ultraviolet light, X-rays, and gamma rays, listed from lowest frequency and largest wavelength to highest frequency and smallest wavelength.  (Note that these categories and the dividing lines between them are completely arbitrary, but the divisions are useful for various scientific purposes.  The only fundamental difference between yellow light, a radio wave, and a gamma ray is the wavelength and frequency; otherwise they’re exactly the same type of thing, a wave in the electric and magnetic fields.)

So if and when two neutron stars merge, we expect both gravitational waves and electromagnetic waves, the latter of many different frequencies created by many different effects that can arise when two huge balls of neutrons collide.  But just because we expect them doesn’t mean they’re easy to see.  These mergers are pretty rare — perhaps one every hundred thousand years in each big galaxy like our own — so the ones we find using LIGO/VIRGO will generally be very far away.  If the light show is too dim, none of our telescopes will be able to see it.

But this light show was plenty bright.  Gamma ray detectors out in space detected it instantly, confirming that the gravitational waves from the two neutron stars led to a collision and merger that produced very high frequency light.  Already, that’s a first.  It’s as though one had seen lightning for years but never heard thunder; or as though one had observed the waves from hurricanes for years but never observed one in the sky.  Seeing both allows us a whole new set of perspectives; one plus one is often much more than two.

Over time — hours and days — effects were seen in visible light, ultraviolet light, infrared light, X-rays and radio waves.  Some were seen earlier than others, which itself is a story, but each one contributes to our understanding of what these mergers are actually like.

Confirmation of the best guess concerning the origin of “short” gamma ray bursts:  For many years, bursts of gamma rays have been observed in the sky.  Among them, there seems to be a class of bursts that are shorter than most, typically lasting just a couple of seconds.  They come from all across the sky, indicating that they come from distant intergalactic space, presumably from distant galaxies.  Among other explanations, the most popular hypothesis concerning these short gamma-ray bursts has been that they come from merging neutron stars.  The only way to confirm this hypothesis is with the observation of the gravitational waves from such a merger.  That test has now been passed; it appears that the hypothesis is correct.  That in turn means that we have, for the first time, both a good explanation of these short gamma ray bursts and, because we know how often we observe these bursts, a good estimate as to how often neutron stars merge in the universe.

First distance measurement to a source using both a gravitational wave measure and a redshift in electromagnetic waves, allowing a new calibration of the distance scale of the universe and of its expansion rate:  The pattern over time of the gravitational waves from a merger of two black holes or neutron stars is complex enough to reveal many things about the merging objects, including a rough estimate of their masses and the orientation of the spinning pair relative to the Earth.  The overall strength of the waves, combined with the knowledge of the masses, reveals how far the pair is from the Earth.  That by itself is nice, but the real win comes when the discovery of the object using visible light, or in fact any light with frequency below gamma-rays, can be made.  In this case, the galaxy that contains the neutron stars can be determined.

Once we know the host galaxy, we can do something really important.  We can, by looking at the starlight, determine how rapidly the galaxy is moving away from us.  For distant galaxies, the speed at which the galaxy recedes should be related to its distance because the universe is expanding.

How rapidly the universe is expanding has been recently measured with remarkable precision, but the problem is that there are two different methods for making the measurement, and they disagree.   This disagreement is one of the most important problems for our understanding of the universe.  Maybe one of the measurement methods is flawed, or maybe — and this would be much more interesting — the universe simply doesn’t behave the way we think it does.

What gravitational waves do is give us a third method: the gravitational waves directly provide the distance to the galaxy, and the electromagnetic waves directly provide the speed of recession.  There is no other way to make this type of joint measurement directly for distant galaxies.  The method is not accurate enough to be useful in just one merger, but once dozens of mergers have been observed, the average result will provide important new information about the universe’s expansion.  When combined with the other methods, it may help resolve this all-important puzzle.
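The arithmetic behind this third method is simple; here is a hedged sketch with placeholder numbers of roughly the right size (they are not measured values), just to show how an expansion rate would come out of one joint measurement.

    # Illustrative 'standard siren' arithmetic: H0 ~ recession velocity / distance.
    # The two inputs below are placeholders of plausible magnitude, not data.
    Mpc_in_km = 3.086e19

    distance_Mpc = 40.0        # distance inferred from the gravitational waveform
    recession_v_km_s = 3000.0  # recession speed from the host galaxy's redshift

    H0 = recession_v_km_s / distance_Mpc              # km/s per Mpc
    print("H0 ~ %.0f km/s/Mpc" % H0)
    H0_per_s = H0 / Mpc_in_km                         # convert to 1/s
    print("1/H0 ~ %.1f billion years" % (1 / H0_per_s / 3.156e7 / 1e9))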

Best test so far of Einstein’s prediction that the speed of light and the speed of gravitational waves are identical: Since gamma rays from the merger and the peak of the gravitational waves arrived within two seconds of one another after traveling 130 million years — that is, about 5 thousand million million seconds — we can say that the speed of light and the speed of gravitational waves are both equal to the cosmic speed limit to within one part in 2 thousand million million.  Such a precise test requires the combination of gravitational wave and gamma ray observations.
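That figure is just the ratio of the two-second arrival gap to the total travel time; a quick check of the arithmetic (illustrative only):

    # Ratio of the ~2 s gamma-ray vs gravitational-wave arrival gap to roughly
    # 130 million years of travel time bounds any fractional speed difference.
    travel_time_s = 130e6 * 3.156e7          # 130 million years in seconds (~4e15 s)
    print("fractional speed difference < ~1 part in %.0e" % (travel_time_s / 2.0))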

Efficient production of heavy elements confirmed:  It’s long been said that we are star-stuff, or stardust, and it’s been clear for a long time that it’s true.  But there’s been a puzzle when one looks into the details.  While it’s known that all the chemical elements from hydrogen up to iron are formed inside of stars, and can be blasted into space in supernova explosions to drift around and eventually form planets, moons, and humans, it hasn’t been quite as clear how the other elements with heavier atoms — atoms such as iodine, cesium, gold, lead, bismuth, uranium and so on — predominantly formed.  Yes they can be formed in supernovas, but not so easily; and there seem to be more atoms of heavy elements around the universe than supernovas can explain.  There are many supernovas in the history of the universe, but the efficiency for producing heavy chemical elements is just too low.

It was proposed some time ago that the mergers of neutron stars might be a suitable place to produce these heavy elements.  Even though these mergers are rare, they might be much more efficient, because the nuclei of heavy elements contain lots of neutrons and, not surprisingly, a collision of two neutron stars would produce lots of neutrons in its debris, suitable perhaps for making these nuclei.   A key indication that this is going on would be the following: if a neutron star merger could be identified using gravitational waves, and if its location could be determined using telescopes, then one would observe a pattern of light that would be characteristic of what is now called a “kilonova” explosion.   Warning: I don’t yet know much about kilonovas and I may be leaving out important details. A kilonova is powered by the process of forming heavy elements; most of the nuclei produced are initially radioactive — i.e., unstable — and they break down by emitting high energy particles, including the particles of light (called photons) which are in the gamma ray and X-ray categories.  The resulting characteristic glow would be expected to have a pattern of a certain type: it would be initially bright but would dim rapidly in visible light, with a long afterglow in infrared light.  The reasons for this are complex, so let me set them aside for now.  The important point is that this pattern was observed, confirming that a kilonova of this type occurred, and thus that, in this neutron star merger, enormous amounts of heavy elements were indeed produced.  So we now have a lot of evidence, for the first time, that almost all the heavy chemical elements on and around our planet were formed in neutron star mergers.  Again, we could not know this if we did not know that this was a neutron star merger, and that information comes only from the gravitational wave observation.

MISCELLANEOUS QUESTIONS

Did the merger of these two neutron stars result in a new black hole, a larger neutron star, or an unstable rapidly spinning neutron star that later collapsed into a black hole?

We don’t yet know, and maybe we won’t know.  Some scientists involved appear to be leaning toward the possibility that a black hole was formed, but others seem to say the jury is out.  I’m not sure what additional information can be obtained over time about this.

If the two neutron stars formed a black hole, why was there a kilonova?  Why wasn’t everything sucked into the black hole?

Black holes aren’t vacuum cleaners; they pull things in via gravity just the same way that the Earth and Sun do, and don’t suck things in some unusual way.  The only crucial thing about a black hole is that once you go in you can’t come out.  But just as when trying to avoid hitting the Earth or Sun, you can avoid falling in if you orbit fast enough or if you’re flung outward before you reach the edge.

The point in a neutron star merger is that the forces at the moment of merger are so intense that one or both neutron stars are partially ripped apart.  The material that is thrown outward in all directions, at an immense speed, somehow creates the bright, hot flash of gamma rays and eventually the kilonova glow from the newly formed atomic nuclei.  Those details I don’t yet understand, but I know they have been carefully studied both with approximate equations and in computer simulations such as this one and this one.  However, the accuracy of the simulations can only be confirmed through the detailed studies of a merger, such as the one just announced.  It seems, from the data we’ve seen, that the simulations did a fairly good job.  I’m sure they will be improved once they are compared with the recent data.

Filed under: Astronomy, Gravitational Waves Tagged: black holes, Gravitational Waves, LIGO, neutron stars

Matt Strassler: LIGO and VIRGO Announce a Joint Observation of a Black Hole Merger

Welcome, VIRGO!  Another merger of two big black holes has been detected, this time by both LIGO’s two detectors and by VIRGO as well.

Aside from the fact that this means that the VIRGO instrument actually works, which is great news, why is this a big deal?  By adding a third gravitational wave detector, built by the VIRGO collaboration, to LIGO’s Washington and Louisiana detectors, the scientists involved in the search for gravitational waves now can determine fairly accurately the direction from which a detected gravitational wave signal is coming.  And this allows them to do something new: to tell their astronomer colleagues roughly where to look in the sky, using ordinary telescopes, for some form of electromagnetic waves (perhaps visible light, gamma rays, or radio waves) that might have been produced by whatever created the gravitational waves.

The point is that with three detectors, one can triangulate.  The gravitational waves travel for billions of years, traveling at the speed of light, and when they pass by, they are detected at both LIGO detectors and at VIRGO.  But because it takes light a few hundredths of a second to travel the diameter of the Earth, the waves arrive at slightly different times at the LIGO Washington site, the LIGO Louisiana site, and the VIRGO site in Italy.  The precise timing tells the scientists what direction the waves were traveling in, and therefore roughly where they came from.  In a similar way, using the fact that sound travels at a known speed, the times that a gunshot is heard at multiple locations can be used by police to determine where the shot was fired.
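To make the triangulation idea concrete, here is a toy sketch (the detector coordinates are rough placeholders, not the real LIGO/VIRGO site positions) that recovers a sky direction from arrival-time differences under a plane-wave approximation.

    # Toy triangulation: recover a sky direction from arrival-time differences
    # at three detectors. Positions below are made-up placeholders in km.
    import numpy as np

    c = 3.0e5                                # speed of light, km/s
    detectors = np.array([[ 5000.0,     0.0, 3000.0],
                          [ 1000.0, -5000.0, 3500.0],
                          [ 4500.0,  1000.0, 4300.0]])

    def delays(direction):
        # relative arrival times (s) of a plane wave from unit vector `direction`,
        # measured against the first detector
        t = -detectors @ direction / c
        return t[1:] - t[0]

    true_dir = np.array([0.3, -0.5, 0.81])
    true_dir /= np.linalg.norm(true_dir)
    observed = delays(true_dir)              # a few milliseconds for these positions

    # Brute-force scan of a coarse sky grid for the best-matching direction.
    # With only three detectors there is a mirror-image ambiguity across the
    # plane of the sites, so the scan may land on the reflected direction.
    best, best_err = None, np.inf
    for theta in np.linspace(0.0, np.pi, 181):
        for phi in np.linspace(0.0, 2 * np.pi, 361):
            d = np.array([np.sin(theta) * np.cos(phi),
                          np.sin(theta) * np.sin(phi),
                          np.cos(theta)])
            err = np.sum((delays(d) - observed) ** 2)
            if err < best_err:
                best, best_err = d, err
    print("true:", np.round(true_dir, 3), "recovered:", np.round(best, 3))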

You can see the impact in the picture below, which is an image of the sky drawn as a sphere, as if seen from outside the sky looking in.  In previous detections of black hole mergers by LIGO’s two detectors, the scientists could only determine a large swath of sky where the observed merger might have occurred; those are the four colored regions that stretch far across the sky.  But notice the green splotch at lower left.  That’s the region of sky where the black hole merger announced today occurred.  The fact that this region is many times smaller than the other four reflects what including VIRGO makes possible.  It’s a small enough region that one can search using an appropriate telescope for something that is making visible light, or gamma rays, or radio waves.

Skymap of the LIGO/Virgo black hole mergers.

Image credit: LIGO/Virgo/Caltech/MIT/Leo Singer (Milky Way image: Axel Mellinger)

 

While a black hole merger isn’t expected to be observable by other telescopes, and indeed nothing was observed by other telescopes this time, other events that LIGO might detect, such as a merger of two neutron stars, may create an observable effect. We can hope for such exciting news over the next year or two.


Filed under: Astronomy, Gravitational Waves Tagged: black holes, Gravitational Waves, LIGO

David Hogg: discovery! submission!

It was an important day for physics: The LIGO/VIRGO collaboration and a huge group of astronomical observational facilities and teams announced the discovery of a neutron-star–neutron-star binary inspiral. It has all the properties it needs to have to be the source of r-process elements, as the theorists have been telling us it would. Incredible. And a huge win for everyone involved. Lots of questions remain (for me, anyway) about the 2-s delay between GW and EM, and about the confidence with which we can say we are seeing the r process!

It was also an unusual day for me: After working a long session on the weekend, Dan Foreman-Mackey (Flatiron) and I finished our pedagogical document about MCMC sampling. I ended the day by posting it to arXiv and submitting it (although this seems insane) to a special issue of the ApJ. I don't write many first-author publications, so this was a very, very good day.

David Hogg: calibration of ZTF; interpolation

I am loving the Friday-morning parallel working sessions in my office. I am not sure that anyone else is getting anything out of them! Today Anna Ho (Caltech) and I discussed things in my work on calibration and data-driven models (two extremely closely related subjects) that might be of use to the ZTF and SEDM projects going on at Caltech.

Late in the morning, an argument broke out about using deep learning to interpolate model grids. Many projects are doing this, and it is interesting (and odd) to me that you would choose a hard-to-control deep network when you could use an easy-to-control function space (like a Gaussian Process, stationary or non-stationary). But the deep-learning toothpaste is hard to put back into the tube! That said, it does have its uses. One of my medium-term goals is to write something about what those uses are.
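For what it is worth, here is the kind of easy-to-control interpolator being contrasted with a deep network, sketched with scikit-learn on a fake one-dimensional model grid; the grid, kernel, and length scale below are placeholders, not anyone's actual pipeline.

    # Sketch: interpolate a (fake) 1-D model grid with a Gaussian Process, an
    # easy-to-control alternative to a deep network for this job.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    grid_x = np.linspace(4000.0, 7000.0, 16)[:, None]    # e.g. effective temperature
    grid_y = np.sin(grid_x[:, 0] / 500.0) + 0.1 * grid_x[:, 0] / 1000.0   # fake model output

    kernel = 1.0 * RBF(length_scale=500.0) + WhiteKernel(noise_level=1e-4)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(grid_x, grid_y)

    x_new = np.linspace(4000.0, 7000.0, 5)[:, None]
    mean, std = gp.predict(x_new, return_std=True)       # interpolant plus uncertainty
    print(np.round(mean, 3), np.round(std, 3))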

Terence Tao: Odd order cases of the logarithmically averaged Chowla conjecture

Joni Teräväinen and I have just uploaded to the arXiv our paper “Odd order cases of the logarithmically averaged Chowla conjecture“, submitted to J. Numb. Thy. Bordeaux. This paper gives an alternate route to one of the main results of our previous paper, and more specifically reproves the asymptotic

\displaystyle \sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = o(\log x) \ \ \ \ \ (1)

 

for all odd {k} and all integers {h_1,\dots,h_k} (that is to say, all the odd order cases of the logarithmically averaged Chowla conjecture). Our previous argument relies heavily on some deep ergodic theory results of Bergelson-Host-Kra, Leibman, and Le (and was applicable to more general multiplicative functions than the Liouville function {\lambda}); here we give a shorter proof that avoids ergodic theory (but instead requires the Gowers uniformity of the (W-tricked) von Mangoldt function, established in several papers of Ben Green, Tamar Ziegler, and myself). The proof follows the lines sketched in the previous blog post. In principle, due to the avoidance of ergodic theory, the arguments here have a greater chance to be made quantitative; however, at present the known bounds on the Gowers uniformity of the von Mangoldt function are qualitative, except at the {U^2} level, which is unfortunate since the first non-trivial odd case {k=3} requires quantitative control on the {U^3} level. (But it may be possible to make the Gowers uniformity bounds for {U^3} quantitative if one assumes GRH, although when one puts everything together, the actual decay rate obtained in (1) is likely to be poor.)


Filed under: math.NT, paper Tagged: Chowla conjecture, Joni Teravainen

Tommaso Dorigo: Merging Neutron Stars: Why It's A Breakthrough, And Why You Should Stand Your Ground

Like many others, I listened to yesterday's (10/16/17) press release at the NSF without a special prior insight in the physics of neutron star mergers, or in the details of the measurements we can extract from the many observations that the detected event made possible. My knowledge of astrophysics is quite incomplete and piecemeal, so in some respects I could be considered a "layman" listening to a science outreach seminar.

Yet, of course, as a physicist I have a good basic understanding of the processes at the heart of the radiation emissions that took place two hundred million years ago in that faint, otherwise inconspicuous galaxy in Hydra. 

read more

David Hoggage-velocity; finishing

I had a great, long call with Jonathan Bird (Vandy) to discuss his nearly-finished paper on the age–velocity relation of stars in the Gaia DR1 data. We discussed adding an old, hot population on top of the population that shows the age–velocity relation. That's a good idea, and accords with our beliefs, hence even gooder.

I spent the rest of my research time today working through the text of the MCMC tutorial that Dan Foreman-Mackey (Flatiron) and I are writing. We are trying to finish it this week (after five-ish years)!

Steinn SigurðssonGW170817


The list of all the preprints from the LIGO/Virgo Collaboration and the electromagnetic-counterpart follow-up collaboration that were sent to the arXiv is now available at:

arXiv blog: GW170817


October 16, 2017

Matt StrasslerA Scientific Breakthrough! Combining Gravitational and Electromagnetic Waves

Gravitational waves are now the most important new tool in the astronomer’s toolbox.  Already they’ve been used to confirm that large black holes — with masses ten or more times that of the Sun — and mergers of these large black holes to form even larger ones, are not uncommon in the universe.   Today it goes a big step further.

It’s long been known that neutron stars, remnants of collapsed stars that have exploded as supernovas, are common in the universe.  And it’s been known almost as long that sometimes neutron stars travel in pairs.  (In fact that’s how gravitational waves were first discovered, indirectly, back in the 1970s.)  Stars often form in pairs, and sometimes both stars explode as supernovas, leaving their neutron star relics in orbit around one another.  Neutron stars are small — just ten or so kilometers (about six miles) across.  According to Einstein’s theory of gravity, a pair of stars should gradually lose energy by emitting gravitational waves into space, and slowly but surely the two objects should spiral in on one another.   Eventually, after many millions or even billions of years, they collide and merge into a larger neutron star, or into a black hole.  This collision does two things.

  1. It makes some kind of brilliant flash of light — electromagnetic waves — whose details are only guessed at.  Some of those electromagnetic waves will be in the form of visible light, while much of it will be in invisible forms, such as gamma rays.
  2. It makes gravitational waves, whose details are easier to calculate and which are therefore distinctive, but couldn’t have been detected until LIGO and VIRGO started taking data, LIGO over the last couple of years, VIRGO over the last couple of months.

It’s possible that we’ve seen the light from neutron star mergers before, but no one could be sure.  Wouldn’t it be great, then, if we could see gravitational waves AND electromagnetic waves from a neutron star merger?  It would be a little like seeing the flash and hearing the sound from fireworks — seeing and hearing is better than either one separately, with each one clarifying the other.  (Caution: scientists are often speaking as if detecting gravitational waves is like “hearing”.  This is only an analogy, and a vague one!  It’s not at all the same as acoustic waves that we can hear with our ears, for many reasons… so please don’t take it too literally.)  If we could do both, we could learn about neutron stars and their properties in an entirely new way.

Today, we learned that this has happened.  LIGO, with the world’s first two gravitational-wave observatories, detected the waves from two merging neutron stars, 130 million light years from Earth, on August 17th.  (Neutron star mergers last much longer than black hole mergers, so the two are easy to distinguish; and this one was so close, relatively speaking, that it was seen for a long while.)  VIRGO, with the third detector, allows scientists to triangulate and determine roughly where mergers have occurred.  They saw only a very weak signal, but that was extremely important, because it told the scientists that the merger must have occurred in a small region of the sky where VIRGO has a relative blind spot.  That told scientists where to look.

The merger was detected for more than a full minute… to be compared with black holes whose mergers can be detected for less than a second.  It’s not exactly clear yet what happened at the end, however!  Did the merged neutron stars form a black hole or a neutron star?  The jury is out.
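
The minute-versus-second contrast follows directly from the leading-order (Newtonian) chirp formula for the time from a given gravitational-wave frequency to coalescence. Here is a rough back-of-the-envelope check in Python, using illustrative chirp masses (about 1.2 solar masses for a neutron-star binary, a few tens of solar masses for the heavy black-hole binaries) rather than the measured values:

    import math

    G = 6.674e-11      # m^3 kg^-1 s^-2
    c = 2.998e8        # m/s
    Msun = 1.989e30    # kg

    def time_to_merger(f_gw_hz, chirp_mass_msun):
        # Leading-order time (seconds) from GW frequency f to coalescence.
        gm = G * chirp_mass_msun * Msun / c**3   # chirp mass expressed in seconds
        return (5.0 / 256.0) * (math.pi * f_gw_hz) ** (-8.0 / 3.0) * gm ** (-5.0 / 3.0)

    print(time_to_merger(30.0, 1.2))    # neutron-star binary: about a minute in band
    print(time_to_merger(30.0, 30.0))   # heavy black-hole binary: a fraction of a second

Heavier binaries sweep through the detector band far faster, which is the whole difference between "a full minute" and "less than a second."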

At almost exactly the moment at which the gravitational waves reached their peak, a blast of gamma rays — electromagnetic waves of very high frequencies — was detected by a different scientific team, the one from FERMI. FERMI detects gamma rays from the distant universe every day, and a two-second gamma-ray burst is not unusual.  And INTEGRAL, another gamma ray experiment, also detected it.   The teams communicated within minutes.   The FERMI and INTEGRAL gamma ray detectors can only indicate the rough region of the sky from which their gamma rays originate, and LIGO/VIRGO together also only give a rough region.  But the scientists saw that those regions overlapped.  The evidence was clear.  And with that, astronomy entered a new, highly anticipated phase.

Already this was a huge discovery.  Brief gamma-ray bursts have been a mystery for years.  One of the best guesses as to their origin has been neutron star mergers.  Now the mystery is solved; that guess is apparently correct. (Or is it?  Probably, but the gamma-ray signal is surprisingly dim, given how close the source is.  So there are still questions to ask.)

Also confirmed by the fact that these signals arrived within a couple of seconds of one another, after traveling for over 100 million years from the same source, is that, indeed, the speed of light and the speed of gravitational waves are exactly the same — both of them equal to the cosmic speed limit, just as Einstein’s theory of gravity predicts.
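
Taking the numbers in the post at face value (an arrival-time difference of a couple of seconds after a journey of roughly 130 million years, which is about {4 \times 10^{15}} seconds), the implied bound on any fractional difference between the two speeds is roughly

\displaystyle \frac{|v_{\rm gw}-c|}{c} \lesssim \frac{\Delta t}{t_{\rm travel}} \approx \frac{2\ {\rm s}}{4\times 10^{15}\ {\rm s}} \sim 5\times 10^{-16}.

(The published analysis is more careful, since part of the delay could be intrinsic to how the gamma rays are produced, but even so the constraint is extraordinarily tight.)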

Next, these teams quickly told their astronomer friends to train their telescopes in the general area of the source. Dozens of telescopes, from every continent and from space, and looking for electromagnetic waves at a huge range of frequencies, pointed in that rough direction and scanned for anything unusual.  (A big challenge: the object was near the Sun in the sky, so it could be viewed in darkness only for an hour each night!) Light was detected!  At all frequencies!  The object was very bright, making it easy to find the galaxy in which the merger took place.  The brilliant glow was seen in gamma rays, ultraviolet light, infrared light, X-rays, and radio.  (Neutrinos, particles that can serve as another way to observe distant explosions, were not detected this time.)

And with so much information, so much can be learned!

Most important, perhaps, is this: from the pattern of the spectrum of light, the conjecture seems to be confirmed that the mergers of neutron stars are important sources, perhaps the dominant one, for many of the heavy chemical elements — iodine, iridium, cesium, gold, platinum, and so on — that are forged in the intense heat of these collisions.  It used to be thought that the same supernovas that form neutron stars in the first place were the most likely source.  But now it seems that this second stage of neutron star life — merger, rather than birth — is just as important.  That’s fascinating, because neutron star mergers are much more rare than the supernovas that form them.  There’s a supernova in our Milky Way galaxy every century or so, but it’s tens of millennia or more between these “kilonovas”, created in neutron star mergers.

If there’s anything disappointing about this news, it’s this: almost everything that was observed by all these different experiments was predicted in advance.  Sometimes it’s more important and useful when some of your predictions fail completely, because then you realize how much you have to learn.  Apparently our understanding of gravity, of neutron stars, and of their mergers, and of all sorts of sources of electromagnetic radiation that are produced in those mergers, is even better than we might have thought. But fortunately there are a few new puzzles.  The X-rays were late; the gamma rays were dim… we’ll hear more about this shortly, as NASA is holding a second news conference.

Some highlights from the second news conference:

  • New information about neutron star interiors, which affects how large they are and therefore how exactly they merge, has been obtained
  • The first ever visible-light image of a gravitational wave source, from the Swope telescope, at the outskirts of a distant galaxy; the galaxy’s center is the blob of light, and the arrow points to the explosion.

  • The theoretical calculations for a kilonova explosion suggest that debris from the blast should rather quickly block the visible light, so the explosion dims quickly in visible light — but infrared light lasts much longer.  The observations by the visible and infrared light telescopes confirm this aspect of the theory; and you can see evidence for that in the picture above, where four days later the bright spot is both much dimmer and much redder than when it was discovered.
  • Estimate: the total mass of the gold and platinum produced in this explosion is vastly larger than the mass of the Earth.
  • Estimate: these neutron stars were formed about 10 or so billion years ago.  They’ve been orbiting each other for most of the universe’s history, and ended their lives just 130 million years ago, creating the blast we’ve so recently detected.
  • Big Puzzle: all of the previous gamma-ray bursts seen up to now have always shone in ultraviolet light and X-rays as well as gamma rays.   But X-rays didn’t show up this time, at least not initially.  This was a big surprise.  It took 9 days for the Chandra telescope to observe X-rays, too faint for any other X-ray telescope.  Does this mean that the two neutron stars created a black hole, which then created a jet of matter that points not quite directly at us but off-axis, and shines by illuminating the matter in interstellar space?  This had been suggested as a possibility twenty years ago, but this is the first time there’s been any evidence for it.
  • One more surprise: it took 16 days for radio waves from the source to be discovered, with the Very Large Array, the most powerful existing radio telescope.  The radio emission has been growing brighter since then!  As with the X-rays, this seems also to support the idea of an off-axis jet.
  • Nothing quite like this gamma-ray burst has been seen — or rather, recognized — before.  When a gamma ray burst doesn’t have an X-ray component showing up right away, it simply looks odd and a bit mysterious.  It’s harder to observe than most bursts, because without a jet pointing right at us, its afterglow fades quickly.  Moreover, a jet pointing at us is bright, so it blinds us to the more detailed and subtle features of the kilonova.  But this time, LIGO/VIRGO told scientists that “Yes, this is a neutron star merger”, leading to detailed study from all electromagnetic frequencies, including patient study over many days of the X-rays and radio.  In other cases those observations would have stopped after just a short time, and the whole story couldn’t have been properly interpreted.

 

 


Filed under: Astronomy, Gravitational Waves

Sean CarrollStandard Sirens

Everyone is rightly excited about the latest gravitational-wave discovery. The LIGO observatory, recently joined by its European partner VIRGO, had previously seen gravitational waves from coalescing black holes. Which is super-awesome, but also a bit lonely — black holes are black, so we detect the gravitational waves and little else. Since our current gravitational-wave observatories aren’t very good at pinpointing source locations on the sky, we’ve been completely unable to say which galaxy, for example, the events originated in.

This has changed now, as we’ve launched the era of “multi-messenger astronomy,” detecting both gravitational and electromagnetic radiation from a single source. The event was the merger of two neutron stars, rather than black holes, and all that matter coming together in a giant conflagration lit up the sky in a large number of wavelengths simultaneously.

Look at all those different observatories, and all those wavelengths of electromagnetic radiation! Radio, infrared, optical, ultraviolet, X-ray, and gamma-ray — soup to nuts, astronomically speaking.

A lot of cutting-edge science will come out of this, see e.g. this main science paper. Apparently some folks are very excited by the fact that the event produced an amount of gold equal to several times the mass of the Earth. But it’s my blog, so let me highlight the aspect of personal relevance to me: using “standard sirens” to measure the expansion of the universe.

We’re already pretty good at measuring the expansion of the universe, using something called the cosmic distance ladder. You build up distance measures step by step, determining the distance to nearby stars, then to more distant clusters, and so forth. Works well, but of course is subject to accumulated errors along the way. This new kind of gravitational-wave observation is something else entirely, allowing us to completely jump over the distance ladder and obtain an independent measurement of the distance to cosmological objects. See this LIGO explainer.

The simultaneous observation of gravitational and electromagnetic waves is crucial to this idea. You’re trying to compare two things: the distance to an object, and the apparent velocity with which it is moving away from us. Usually velocity is the easy part: you measure the redshift of light, which is easy to do when you have an electromagnetic spectrum of an object. But with gravitational waves alone, you can’t do it — there isn’t enough structure in the spectrum to measure a redshift. That’s why the exploding neutron stars were so crucial; in this event, GW170817, we can for the first time determine the precise redshift of a distant gravitational-wave source.

Measuring the distance is the tricky part, and this is where gravitational waves offer a new technique. The favorite conventional strategy is to identify “standard candles” — objects for which you have a reason to believe you know their intrinsic brightness, so that by comparing to the brightness you actually observe you can figure out the distance. To discover the acceleration of the universe, for example,  astronomers used Type Ia supernovae as standard candles.

Gravitational waves don’t quite give you standard candles; every one will generally have a different intrinsic gravitational “luminosity” (the amount of energy emitted). But by looking at the precise way in which the source evolves — the characteristic “chirp” waveform in gravitational waves as the two objects rapidly spiral together — we can work out precisely what that total luminosity actually is. Here’s the chirp for GW170817, compared to the other sources we’ve discovered — much more data, almost a full minute!
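
To make the “work out the luminosity” step slightly more concrete (at leading post-Newtonian order, and glossing over the orbital inclination and other complications that the real analysis handles), the rate at which the gravitational-wave frequency {f} sweeps upward fixes the “chirp mass” of the binary,

\displaystyle \mathcal{M} = \frac{(m_1 m_2)^{3/5}}{(m_1+m_2)^{1/5}} = \frac{c^3}{G}\left[\frac{5}{96}\,\pi^{-8/3} f^{-11/3} \dot f\right]^{3/5},

while the intrinsic amplitude of the waves scales as

\displaystyle h \propto \frac{(G\mathcal{M})^{5/3} (\pi f)^{2/3}}{c^4\, d_L}.

So once {f} and {\dot f} pin down {\mathcal{M}}, the observed amplitude hands you the luminosity distance {d_L} directly, with no rungs of the distance ladder involved. (This is a schematic version of the argument, not the collaboration’s actual parameter estimation.)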

So we have both distance and redshift, without using the conventional distance ladder at all! This is important for all sorts of reasons. An independent way of getting at cosmic distances will allow us to measure properties of the dark energy, for example. You might also have heard that there is a discrepancy between different ways of measuring the Hubble constant, which either means someone is making a tiny mistake or there is something dramatically wrong with the way we think about the universe. Having an independent check will be crucial in sorting this out. Just from this one event, we are able to say that the Hubble constant is 70 kilometers per second per megaparsec, albeit with large error bars (+12, -8 km/s/Mpc). That will get much better as we collect more events.
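
As a sanity check on that number, here is the zeroth-order arithmetic in Python, with round illustrative values (a Hubble-flow velocity of roughly 3,000 km/s for the host galaxy and a gravitational-wave distance of roughly 43 Mpc, close to the published figures but placeholders here):

    # Standard-siren Hubble constant, to zeroth order:
    # H0 = (recession velocity of the host galaxy) / (distance from the GW amplitude).
    v_hubble_km_s = 3000.0   # ~Hubble-flow velocity of the host (illustrative)
    d_gw_mpc = 43.0          # ~luminosity distance from the waveform (illustrative)

    H0 = v_hubble_km_s / d_gw_mpc
    print(f"H0 ~ {H0:.0f} km/s/Mpc")   # ~70, consistent with the quoted value

The real measurement marginalizes over the binary’s inclination and the host’s peculiar velocity, which is where the lopsided error bars come from.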

So here is my (infinitesimally tiny) role in this exciting story. The idea of using gravitational-wave sources as standard sirens was put forward by Bernard Schutz all the way back in 1986. But it’s been developed substantially since then, especially by my friends Daniel Holz and Scott Hughes. Years ago Daniel told me about the idea, as he and Scott were writing one of the early papers. My immediate response was “Well, you have to call these things `standard sirens.'” And so a useful label was born.

Sadly for my share of the glory, my Caltech colleague Sterl Phinney also suggested the name simultaneously, as the acknowledgments to the paper testify. That’s okay; when one’s contribution is this extremely small, sharing it doesn’t seem so bad.

By contrast, the glory attaching to the physicists and astronomers who pulled off this observation, and the many others who have contributed to the theoretical understanding behind it, is substantial indeed. Congratulations to all of the hard-working people who have truly opened a new window on how we look at our universe.

Steinn SigurðssonEnds with a Bang


as you know, the LIGO/Virgo collaboration discovered a splendid thing

at long last the source we all expected LIGO to actually detect, for sure like, was actually observed,
and a real wowser it was

not just a coalescing binary neutron star, but one right in our backyard, ridiculously bright, and with a gorgeous all band electro-magnetic counterpart

for once theorists were right, there were γ-rays, x-rays, jets, optical and infra-red glows, radio emission and, apparently, a significant chunk of high atomic number metals, including a lot
of r-process elements, ejected from outer layers of the neutron star debris as the bulk of the neutron star mass merged

it was a team effort, rather jolly, and I expect it may earn some people a trip or two to Stockholm, if The Academy is its usual sensible self, that is.

As part of the flap, Josh Bloom and I wrote a "perspective" piece on the discovery,
the link for which is appended, for your perusal

A cosmic multimessenger gold rush




October 15, 2017

Doug NatelsonGravitational waves again - should be exciting

There is going to be a big press conference tomorrow, apparently to announce that LIGO/VIRGO has seen an event (binary neutron star collision) directly associated with a gamma ray burst in NGC 4993.  Fun stuff, and apparently the worst-kept secret in science right now.  This may seem off-topic for a condensed matter blog, but there's physics in there which isn't broadly appreciated, and I'll write a bit about it after the announcement.

October 14, 2017

Richard EastherThe Unforgivable Curses

Fabrication and plagiarism are the unforgivable curses of science – crimes of no return. If you are caught committing them you will not wind up in an academic Azkaban but you would be hard put to find another job in a university as a parking warden, much less a research role. Ironically, to outsiders these infractions may appear to be relatively victimless crimes. Do a few faked graphs really hurt anyone? If music can be downloaded with impunity is plagiarism a terrible sin? However, these transgressions are unforgivable because they undermine the integrity of the system, not as a result of their impact on individuals. An expert counterfeiter whose bogus Benjamins are never spotted by banks might claim that no-one was hurt by their escapades, but financial systems can be damaged by a flood of fake notes. Likewise, we trust the integrity of our colleagues when we build on their work. We tell students to "check everything" but this is an impossible goal, since at some point you would do nothing but verify the work of others, so dishonesty undermines science just as debased coinage threatens an economy. 

Last month the American Geophysical Union revised its understanding of "scientific misconduct", a term encompassing plagiarism and data-faking, to explicitly include a new category of crime – discrimination, sexual harassment, and bullying. These are transgressions against individuals, but the AGU's decision recognises that they weaken science itself; systematically burdening those who are disproportionately on the receiving end of "poor behaviour", blighting lives and careers, boosting inequality, and robbing the field of talent. This recognises that many female geoscientists experience harassment or worse in the field, often while they are physically isolated and far from help. Just last week sickening allegations of bullying and assault during trips to remote Antarctic valleys were levelled against David Marchant, a geoscientist at Boston University. It was telling that while many of the worst allegations were corroborated by others, some of Marchant's defenders pointed out that they themselves had never witnessed such behaviour by Marchant and that these infractions were "historical", with no recent allegations of misconduct coming to light. However, if this had been a case of data-faking there would be no ill-defined statute of limitations and "some of his work is legitimate" would in no way constitute a defence.

A similar paradox was visible last year, thanks to a defamation case pursued by astrophysicist Mike Bode against Carole Mundell, who intervened after he wrote a glowing letter of recommendation for a mutual colleague facing an active harassment investigation. The claim that Mundell had defamed Bode by this action was witheringly rejected but I cannot imagine anyone writing a letter of reference – much less a good one – for a person facing live allegations of intellectual misconduct. Moreover, the position in question was Chief Scientist in the Square Kilometre Array - South Africa, an organisation which will have a key role supporting South Africa's engagement with a multi-hundred million dollar international collaboration involving vast amounts of public money from a half-dozen countries. This hire fell through, but the job was later filled by a scientist who had left (and was apparently "dismissed" from) a leadership position at the Arecibo Observatory while "under a cloud", demonstrating just how hard it can be for a senior scientist to definitively torpedo his own career.* [This situation may also lead one to draw inferences about the institutional health of SKA-South Africa, but that is another matter.]

This is the same month that the Harvey Weinstein story broke and his serial sexual assaults appear to have been an open secret within the entertainment industry. Despite this, any number of male stars who had benefited from their association with Weinstein were shocked, shocked to hear the news, or to think that their industry may suffer from an endemic harassment problem. And while we lack Hollywood's glamour, senior academics have a similar ability to make or break the careers of young people vying for their big chance, and academia is similarly a breeding ground for abusive behaviour. Likewise, just as many men averred that they had personally never seen poor behaviour by Weinstein, many scientists assert that, because they have never witnessed a colleague harass, that person cannot be a harasser – a stunning lack of logic for people who spend their professional lives drawing inferences about events we cannot hope to witness with our own eyes. Likewise, we all know that many senior harassers in science are yet to have their "Geoff Marcy moment" and, like Weinstein, some are rumoured to have aggressively lawyered up to keep a lid on simmering scandals.

If this is our truth, all scientists – and in particular all white men in science whose progress has never been potentially impeded by our gender, our race, or our sexual identity – are complicit in having allowed it to happen. And it is on us to sort it out. 



FOOTNOTE: *To be clear, the investigation of the original applicant was apparently never completed so these allegations were never formally substantiated. However, published sources refer to them as "sexual assault" so it does appear that they were of a serious nature. Likewise, the specific employment issues faced by the person now in the role have not been fully disclosed.

COMMENTS: Comments off on this one. Getting way more than normal and they didn't seem to be heading anywhere constructive.

October 13, 2017

Scott AaronsonNot the critic who counts

There’s a website called Stop Timothy Gowers! !!! —yes, that’s the precise name, including the exclamation points.  The site is run by a mathematician who for years went under the pseudonym “owl / sowa,” but who’s since outed himself as Nikolai Ivanov.

For those who don’t know, Sir Timothy Gowers is a Fields Medalist, known for seminal contributions including the construction of Banach spaces with strange properties, the introduction of the Gowers norm, explicit bounds for the regularity lemma, and more—but who’s known at least as well for explaining math, in his blog, books, essays, MathOverflow, and elsewhere, in a remarkably clear, friendly, and accessible way.  He’s also been a leader in the fight to free academia from predatory publishers.

So why on earth would a person like that need to be stopped?  According to sowa, because Gowers, along with other disreputable characters like Terry Tao and Endre Szemerédi and the late Paul Erdös, represents a dangerous style of doing mathematics: a style that’s just as enamored of concrete problems as it is of abstract theory-building, and that doesn’t even mind connections to other fields like theoretical computer science.  If that style becomes popular with young people, it will prevent faculty positions and prestigious prizes from going to the only deserving kind of mathematics: the kind exemplified by Bourbaki and by Alexander Grothendieck, which builds up theoretical frameworks with principled disdain for the solving of simple-to-state problems.  Mathematical prizes going to the wrong people—or even going to the right people but presented by the wrong people—are constant preoccupations of sowa’s.  Read his blog and let me know if I’ve unfairly characterized it.


Now for something totally unrelated.  I recently discovered a forum on Reddit called SneerClub, which, as its name suggests, is devoted to sneering.  At whom?  Basically, at anyone who writes anything nice about nerds or Silicon Valley, or who’s associated with the “rationalist community,” or the Effective Altruist movement, or futurism or AI risk.  Typical targets include Scott Alexander, Eliezer Yudkowsky, Robin Hanson, Michael Vassar, Julia Galef, Paul Graham, Ray Kurzweil, Elon Musk … and with a list like that, I guess I should be honored to be a regular target too.

The basic SneerClub M.O. is to seize on a sentence that, when ripped from context and reflected through enough hermeneutic funhouse mirrors, can make nerds out to look like right-wing villains, oppressing the downtrodden with rays of disgusting white maleness (even, it seems, ones who aren’t actually white or male).  So even if the nerd under discussion turns out to be, say, a leftist or a major donor to anti-Trump causes or malaria prevention or whatever, readers can feel reassured that their preexisting contempt was morally justified after all.

Thus: Eliezer Yudkowsky once wrote a piece of fiction in which a character, breaking the fourth wall, comments that another character seems to have no reason to be in the story.  This shows that Eliezer is a fascist who sees people unlike himself as having no reason to exist, and who’d probably exterminate them if he could.  Or: many rationalist nerds spend a lot of effort arguing against Trumpists, alt-righters, and neoreactionaries.  The fact that they interact with those people, in order to rebut them, shows that they’re probably closet neoreactionaries themselves.


When I browse sites like “Stop Timothy Gowers! !!!” or SneerClub, I tend to get depressed about the world—and yet I keep browsing, out of a fascination that I don’t fully understand.  I ask myself: how can a person read Gowers’s blog, or Slate Star Codex, without seeing what I see, which is basically luminous beacons of intellectual honesty and curiosity and clear thought and sparkling prose and charity to dissenting views, shining out far across the darkness of online discourse?

(Incidentally, Gowers lists “Stop Timothy Gowers! !!!” in his blogroll, and I likewise learned of SneerClub only because Scott Alexander linked to it.)

I’m well aware that this very question will only prompt more sneers.  From the sneerers’ perspective, they and their friends are the beacons, while Gowers or Scott Alexander are the darkness.  How could a neutral observer possibly decide who was right?

But then I reflect that there’s at least one glaring asymmetry between the sides.

If you read Timothy Gowers’s blog, one thing you’ll constantly notice is mathematics.  When he’s not weighing in on current events—for example, writing against Brexit, Elsevier, or the destruction of a math department by cost-cutting bureaucrats—Gowers is usually found delighting in exploring a new problem, or finding a new way to explain a known result.  Often, as with his dialogue with John Baez and others about the recent “p=t” breakthrough, Gowers is struggling to understand an unfamiliar piece of mathematics—and, completely unafraid of looking like an undergrad rather than a Fields Medalist, he simply shares each step of his journey, mistakes and all, inviting you to follow for as long as you can keep up.  Personally, I find it electrifying: why can’t all mathematicians write like that?

By contrast, when you read sowa’s blog, for all the anger about the sullying of mathematics by unworthy practitioners, there’s a striking absence of mathematical exposition.  Not once does sowa ever say: “OK, forget about the controversy.  Since you’re here, instead of just telling you about the epochal greatness of Grothendieck, let me walk you through an example.  Let me share a beautiful little insight that came out of his approach, in so self-contained a way that even a physicist or computer scientist will understand it.”  In other words, sowa never uses his blog to do what Gowers does every day.  Sowa might respond that that’s what papers are for—but the thing about a blog is that it gives you the chance to reach a much wider readership than your papers do.  If someone is already blogging anyway, why wouldn’t they seize that chance to share something they love?

Similar comments apply to Slate Star Codex versus r/SneerClub.  When I read an SSC post, even if I vehemently disagree with the central thesis (which, yes, happens sometimes), I always leave the diner intellectually sated.  For the rest of the day, my brain is bloated with new historical tidbits, or a deep-dive into the effects of a psychiatric drug I’d never heard of, or a jaw-dropping firsthand account of life as a medical resident, or a different way to think about a philosophical problem—or, if nothing else, some wicked puns and turns of phrase.

But when I visit r/SneerClub—well, I get exactly what’s advertised on the tin.  Once you’ve read a few, the sneers become pretty predictable.  I thought that for sure, I’d occasionally find something like: “look, we all agree that Eliezer Yudkowsky and Elon Musk and Nick Bostrom are talking out their asses about AI, and are coddled white male emotional toddlers to boot.  But even granting that, what do we think about AI?  Are intelligences vastly smarter than humans possible?  If not, then what principle rules them out?  What, if anything, can be said about what a superintelligent being would do, or want?  Just for fun, let’s explore this a little: I mean the actual questions themselves, not the psychological reasons why others explore them.”

That never happens.  Why not?


There’s another fascinating Reddit forum called “RoastMe”, where people submit a photo of themselves holding a sign expressing their desire to be “roasted”—and then hundreds of Redditors duly oblige, savagely mocking the person’s appearance and anything else they can learn about the person from their profile.  Many of the roasts are so merciless that one winces vicariously for the poor schmucks who signed up for this, hopes that they won’t be driven to self-harm or suicide.  But browse enough roasts, and a realization starts to sink in: there’s no person, however beautiful or interesting they might’ve seemed a priori, for whom this roasting can’t be accomplished.  And that very generality makes the roasting lose much of its power—which maybe, optimistically, was the point of the whole exercise?

In the same way, spend a few days browsing SneerClub, and the truth hits you: once you’ve made their enemies list, there’s nothing you could possibly say or do that they wouldn’t sneer at.  Like, say it’s a nice day outside, and someone will reply:

“holy crap how much of an entitled nerdbro do you have to be, to erase all the marginalized people for whom the day is anything but ‘nice’—or who might be unable to go outside at all, because of limited mobility or other factors never even considered in these little rich white boys’ geek utopia?”

For me, this realization is liberating.  If appeasement of those who hate you is doomed to fail, why bother even embarking on it?


I’ve spent a lot of time on this blog criticizing D-Wave, and cringeworthy popular articles about quantum computing, and touted arXiv preprints that say wrong things.  But I hope regular readers feel like I’ve also tried to offer something positive: y’know, actual progress in quantum computing that actually excites me, or a talk about big numbers, or an explanation of the Bekenstein bound, whatever.  My experience with sites like “Stop Timothy Gowers! !!!” and SneerClub makes me feel like I ought to be doing less criticizing and more positive stuff.

Why, because I fear turning into a sneerer myself?  No, it’s subtler than that: because reading the sneerers drives home for me that it’s a fool’s quest to try to become what Scott Alexander once called an “apex predator of the signalling world.”

At the risk of stating the obvious: if you write, for example, that Richard Feynman was a self-aggrandizing chauvinist showboater, then even if your remarks have a nonzero inner product with the truth, you don’t thereby “transcend” Feynman and stand above him, in the same way that set theory transcends and stands above arithmetic by constructing a model for it.  Feynman’s achievements don’t thereby become your achievements.

When I was in college, I devoured Ray Monk’s two-volume biography of Bertrand Russell.  This is a superb work of scholarship, which I warmly recommend to everyone.  But there’s one problem with it: Monk is constantly harping on his subject’s failures, and he has no sense of humor, and Russell does.  The result is that, whenever Monk quotes Russell’s personal letters at length to prove what a jerk Russell was, the quoted passages just leap off the page—as if old Bertie has come back from the dead to share a laugh with you, the reader, while his biographer looks on sternly and says, “you two think this is funny?”

For a writer, I can think of no higher aspiration than that: to write like Bertrand Russell or like Scott Alexander—in such a way that, even when people quote you to stand above you, your words break free of the imprisoning quotation marks, wiggle past the critics, and enter the minds of readers of your generation and of generations not yet born.


Update (Nov. 13): Since apparently some people didn’t know (?!), the title of this post comes from the famous Teddy Roosevelt quote:

It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat.

Sean CarrollMind-Blowing Quantum Mechanics

Trying to climb out from underneath a large pile of looming (and missed) deadlines, and in the process I’m hoping to ramp back up the real blogging. In the meantime, here are a couple of videos to tide you over.

First, an appearance a few weeks ago on Joe Rogan’s podcast. Rogan is a professional comedian and mixed-martial arts commentator, but has built a great audience for his wide-ranging podcast series. One of the things that makes him a good interviewer is his sincere delight in the material, as evidenced here by noting repeatedly that his mind had been blown. We talked for over two and a half hours, covering cosmology and quantum mechanics but also some bits about AI and pop culture.

And here’s a more straightforward lecture, this time at King’s College in London. The topic was “Extracting the Universe from the Wave Function,” which I’ve used for a few talks that ended up being pretty different in execution. This one was aimed at undergraduate physics students, some of whom hadn’t even had quantum mechanics. So the first half is a gentle introduction to many-worlds theory and why it’s the best version of quantum mechanics, and the second half tries to explain our recent efforts to emerge space itself out of quantum entanglement.

I was invited to King’s by Eugene Lim, one of my former grad students and now an extremely productive faculty member in his own right. It’s always good to see your kids grow up to do great things!

BackreactionIs the inflationary universe a scientific theory? Not anymore.

[Image: “Living in a Bubble?” — YouTube]
We are made from stretched quantum fluctuations. At least that’s cosmologists’ currently most popular explanation. According to their theory, the history of our existence began some billions of years ago with a – now absent – field that propelled the universe into a phase of rapid expansion called “inflation.” When inflation ended, the field decayed and its…

October 11, 2017

BackreactionWhat could we learn from quantum gravity? [Video]

Tommaso DorigoTrevor Hastie Lectures In Padova

Trevor Hastie, the Stanford University guru on Statistical Learning (he coined the term together with his colleagues Tibshirani and Friedman), is in Padova this week, where he is giving a short course on his pet topic and a seminar. I am happy to report this, as it was partly made possible by the European Union-funded network of which I am the project coordinator, AMVA4NewPhysics. But most of the merit goes to Prof. Giovanna Menardi, PI of the Padova node of the network, who organized it... And of course I am happy because I am learning from his insightful lectures!

(Above, prof. Menardi introduces the lectures).

read more

October 10, 2017

Doug NatelsonPiezo controller question - followup.

A couple of weeks ago I posted:

Anyone out there using a Newport NPC3SG controller to drive a piezo positioning stage, with computer communication successfully talking to the NPC3SG?  If so, please leave a comment so that we can get in touch, as I have questions.

No responses so far.  This is actually the same unit as this thing:
https://www.piezosystem.com/products/piezo_controller/piezo_controller_3_channel_version/nv_403_cle/

In our unit from Newport, communications simply don't work properly.  Timeout problems.  The LabVIEW code supplied by Newport (the same code paired with the link above) has these problems, as do many other ways of trying to talk with the instrument.  Has anyone out there had success in using a computer to control and read this thing?   At issue is whether this is a hardware problem with our unit, or whether there is a general problem with these.  The vendor has been verrrrrrrrry slow to figure this out.

October 09, 2017

John PreskillA Few Words With Caltech Research Scientist, David Boyd

Twenty years ago, David Boyd began his career at Caltech as a Postdoctoral Scholar with Dave Goodwin, and since 2012 has held the position of Research Scientist in the Division of Physics, Mathematics and Astronomy.  A 20 year career at Caltech is in itself a significant achievement considering Caltech’s flair for amassing the very best scientists from around the world.  Throughout Boyd’s career he has secured 7 patents, and most recently discovered a revolutionary single-step method for growing graphene.  The method allows for unprecedented continuity in graphene growth essential to significantly scaling-up production capacity.  Boyd worked with a number of great scientists at the outset of his career.  Notably, he gained a passion for science from Professor Thomas Wdowiak (Mars’ Wdowiak Ridge is named in his honor) at the University of Alabama at Birmingham as an undergraduate, and worked as David Goodwin’s (best known for developing methods for growing thin film high-purity diamonds) postdoc at Caltech.  Currently, Boyd is formulating a way to apply Goodwin’s reaction modeling code to graphene.  Considering Boyd’s accomplishments and extensive scientific knowledge, I feel fortunate to have been afforded the opportunity to work in his lab the past six summers. I have learned much from Boyd, but I still have more questions (not all scientific), so I requested an interview and he graciously accepted.

On the day of the interview, I meet Boyd at his office on campus at Caltech.  We walk a ways down a sunlit hallway and out to a balcony through two glass doors.  There’s a slight breeze in the air, a smell of nearby roses, and the temperature is perfect.  It’s a picturesque day in Pasadena.  We sit at a table and I ask my first question.

How many patents do you own?

I have seven patents.  The graphene patent was really hard to get, but we got it.  We just got it executed in China, so they are allowed to use it.  This is particularly exciting because of all the manufacturing in China.  The patent system has changed a bit, so it’s getting harder and harder.  You can come up with the idea, but if disparate components have already been patented, then you can’t get the patent for combining them in a unique way.  The invention has to provide a result that is unexpected or not obvious, and the patent for growing graphene with a one step process was just that.  The one step process refers to cleaning the copper substrate and growing graphene under the same chemistry in a continuous manner.  What used to be a two step process can be done in one.

You don’t have to anneal the substrate to 1000 degrees before growing.

Exactly.  Annealing the copper first and then growing doesn’t allow for a nice continuous process.  Removing the annealing step means the graphene is growing in an environment with significantly lower temperatures, which is important for CMOS or computer chip manufacturing.

Which patents do you hold most dear?

Usually in the research areas that are really cutting edge.  I have three patents in plasmonics, and that was a fun area 10 years ago.  It was a new area and we were doing something really exciting.  When you patent something, an application may never be realized, sometimes they get used and sometimes they don’t.  The graphene patent has already been licensed, so we’ve received quite a bit of traction.  As far as commercial success, the graphene has been much more successful than the other ones, but plasmonics were a lot of fun.  Water desalinization may be one application, and now there is a whole field of plasmonic chemistry.  A company has not yet licensed it, so it may have been too far ahead of its time for application anytime soon.

When did you realize you wanted to be a scientist?

I liked Physics in high school, and then I had a great mentor in college, Thomas Wdowiak.  Wdowiak showed me how to work in the lab.  Science is one of those things where an initial spark of interest drives you into action.  I became hooked, because of my love for science, the challenge it offers, and the simple fact I have fun with it.  I feel it’s very important to get into the lab and start learning science as early as possible in your education.

Were you identified as a gifted student?

I don’t think that’s a good marker.  I went to a private school early on, but no, I don’t think I was good at what they were looking for, no I wasn’t.  It comes down to what you want to do.  If you want to do something and you’re motivated to do it, you’ll find ways to make it happen.  If you want to code, you start coding, and that’s how you get good at it.  If you want to play music and have a passion for it, at first it may be your parents saying you have to go practice, but in the end it’s the passion that drives everything else.

Did you like high school?

I went to high school in Alabama and I had a good Physics teacher.  It was not the most academic of places, and if you were into academics the big thing there was to go to medical school.  I just hated memorizing things so I didn’t go that route.

Were AP classes offered at your high school, and if so, were you an AP student?

Yeah, I did take AP classes.  My high school only had AP English and AP Math, but it was just coming onboard at that time.  I took AP English because I liked the challenge and I love reading.

Were you involved in any extracurricular activities in school?

I earned the rank of Eagle Scout in the Boy Scouts.  I also raced bicycles in high school, and I was a several time state champion.  I finished high school (in America) and wanted to be a professional cyclist.  So, I got involved in the American Field Service (AFS), and did an extra year of high school in Italy as an exchange student where I ended up racing with some of the best cyclists in the world all through Italy.  It was a fantastic experience.

Did you have a college in mind for your undergraduate studies?  

No, I didn’t have a school in mind.  I had thought about the medical school path, so I considered taking pre-med courses at the local college, University of Alabama at Birmingham (UAB), because they have a good medical school.  Then UAB called me and said I earned an academic scholarship.  My father advised me that it would be a good idea to go there since it’s paid for.  I could take pre-med courses and then go to medical school afterwards if I wanted.  Well, I was in an honors program at the university and met an astronomer by the name Thomas Wdowiak.  I definitely learned from him how to be a scientist.  He also gave me a passion for being a scientist.  So, after working with Wdowiak for a while, I decided I didn’t want to go to medical school, I wanted to study Physics.  They just named a ridge on Mars after him, Wdowiak Ridge.  He was a very smart guy, and a great experimentalist who really grew my interest in science… he was great.

Did you do research while earning your undergraduate degree?  

Yes, Wdowiak had me in the lab working all the time.  We were doing real stuff in the lab.  I did a lot of undergraduate research in Astronomy, and the whole point was to get in the lab and work on science.  Because I worked with Wdowiak I had one or two papers published by the time I graduated.  Wdowiak taught me how to do science.   And that’s the thing, you have to want to do science, have a lab or a place to practice, and then start working.  

So, he was professor and experimentalist.

He was a very hands-on lab guy.  I was in the lab breaking things and fixing things. Astronomers are fun to work with.  He was an experimental astronomer who taught me, among other things, spectroscopy, vacuum technology, and much about the history of science.  In fact, it was Professor Wdowiak who told me about Millikan’s famous “Machine Shop in a Vacuum” experiment that inspired the graphene discovery… it all comes back to Caltech!

Name another scientist, other than Wdowiak, who has influenced you.

Richard Feynman also had a big influence on me.  I did not know him, but I love his books.

Were you focused solely on academics in college, or did you have a social life as well?

I was part of a concert committee that brought bands to the college.  We had some great bands like R.E.M. and the Red Hot Chili Peppers play, and I would work as a stagehand and a roadie for the shows.

So, you weren’t doing keg stands at fraternity parties?

No, it wasn’t like that.  I liked to go out and socialize, but no keg stands.  Though, I have had friends that were very successful that did do keg stands.

What’s your least favorite part of your job?

You’re always having to raise funds for salaries, equipment, and supplies.  It can be difficult, but once you get the funding it is a relief for the moment.  As a scientist, your focus isn’t always on just the science.

What are your responsibilities related to generating revenue for the university?

I raise funds for my projects via grants.  Part of the money goes to Caltech as overhead to pay for the facilities, lab space, and to keep the lights on.

What do you wish you could do more of in your job?

Less raising money.  I like working in the lab, which is fun.  Now that I have worked out the technique to grow graphene, I’m looking for applications.  I’m searching for the next impactful thing, and then I’ll figure out the necessary steps that need to be taken to get there.

Is there an aspect of your job that you believe would surprise people?

You have to be entrepreneurial, you have to sell your ideas to raise money for these projects.  You have to go with what’s hot in research.  There are certain things that get funded and things that don’t.

There may be some things you’re interested in, but other people aren’t, so there’s no funding.

Yeah, there may not be a need, therefore, no funding.  Right now, graphene is a big thing, because there are many applications and problems to be solved.  For example, diamonds were huge back in the ‘80’s.  But once they solved all the problems, research cooled off and industrial application took over.

Is there something else you’d really rather be researching, or are the trending ideas right now in line with your interests?

There is nothing else I’d rather be researching.  I’m in a good place right now.  We’re trying to commercialize the graphene research.  You try to do research projects that are complementary to one another.  For example, there’s a project underway, where graphene is being used for hydrogen storage in cars, that really interests me.  I do like the graphene work, it’s exciting, we’ll see where that goes.

What are the two most important personality traits essential to being a good scientist?

Creativity.  You have to think outside the box.  Perseverance.  I’m always reading and trying to understand something better.  Curiosity is, of course, a huge part of it as well. You gotta be obsessive too, I guess.  That’s more than two, sorry.

What does it take for someone to become a scientist?

You must have the desire to be a scientist, otherwise you’ll go be a stockbroker or something else.  It’s more of a passion thing, your personality.  You do have to have an aptitude for it though.  If you’re getting D’s in math, physics is probably not the place for you.  There’s an old joke, the medical student in physics class asks the professor, “Why do we have to take physics?  We’ll never use it.”  The Physics professor answers, “Physics saves lives, because it keeps idiots out of medical school.”  If you like science, but you’re not so good at math, then look at less quantitative areas of science where math is not as essential.  Computational physics and experimental physics will require you to be very good at math.  It takes a different temperament, a different set of skills.  Same curiosity, same drive and intelligence, but different temperament.

Do you ever doubt your own abilities?  Do you have insecurities about not being smart enough?

Sure, but there’s always going to be someone out there smarter.  Although, you really don’t want to ask yourself these types of questions.  If you do, you’re looking down the wrong end of the telescope.  Everyone has their doubts, but you need to listen to the feedback from the universe.  If you’re doing something for a long time and not getting results, then that’s telling you something.  Like I said, you must have a passion for what you’re doing.  If people are in doubt they should read biographies of scientists and explore their mindset to discover if science seems to be a good fit for them.  For a lot of people, it’s not the most fun job, it’s not the most social job, and certainly not the most glamorous type of job.  Some people need more social interaction, researchers are usually a little more introverted.  Again, it really depends on the person’s temperament. There are some very brilliant people in business, and it’s definitely not the case that only the brilliant people in a society go into science.  It doesn’t mean you can’t be doing amazing things just because you’re not in a scientific field.  If you like science and building things, then follow that path.  It’s also important not to force yourself to study something you don’t enjoy.

Scientists are often thought to work with giant math problems that are far above the intellectual capabilities of mere mortals.  Have you ever been in a particular situation where the lack of a solution to a math problem was impeding progress in the lab?  If so, what was the problem and did you discover the solution?

I’m attempting to model the process of graphene growth, so I’m facing this situation right now.  That’s why I have this book here.  I’m trying to adapt Professor Dave Goodwin’s Cantera reactor modeling code to model the reaction kinetics in graphene (Goodwin originally developed and wrote the modeling software called Cantera).  Dave was a big pioneer in diamond and he died almost 5 years ago here in Pasadena.  He developed a reaction modeling code for diamond, and I’m trying to apply that to graphene.  So, yeah, it’s a big math problem that I’ve been spending weeks on trying to figure out.  It’s not that I’m worried about the algebra or the coding, it’s trying to figure things out conceptually.
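
For readers curious what “reactor modeling code” means in practice, here is a minimal, generic Cantera example. It is emphatically not Boyd’s graphene model: it uses the GRI-Mech 3.0 methane mechanism that ships with Cantera purely as stand-in chemistry (older Cantera versions use “gri30.cti” instead of the YAML file).

    # Generic Cantera zero-D reactor sketch (illustrative only; not a graphene model).
    # It integrates gas-phase kinetics for a hot methane/hydrogen mixture, the kind
    # of chemistry a CVD growth model would start from.
    import cantera as ct

    gas = ct.Solution("gri30.yaml")                  # bundled mechanism as a stand-in
    gas.TPX = 1300.0, ct.one_atm, "CH4:1, H2:20"     # rough CVD-like temperature and feed

    reactor = ct.IdealGasConstPressureReactor(gas)
    sim = ct.ReactorNet([reactor])

    # March forward in time and watch a radical species (CH3) build up.
    for t_ms in range(1, 6):
        sim.advance(t_ms * 1e-3)
        ch3 = reactor.thermo["CH3"].X[0]
        print(f"t = {t_ms} ms, T = {reactor.T:.0f} K, X(CH3) = {ch3:.3e}")

Adapting this kind of machinery to graphene means supplying the surface chemistry (the rate expressions for carbon species landing on and leaving the copper), which is the conceptual part Boyd describes wrestling with.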

Do you love your job?

I do, I’ve done it for a while, it’s fun, and I really enjoy it.  When it works, it’s great. Discovering stuff is fun and brings a great sense of satisfaction.  But it’s not always that way, it can be very frustrating.  Like any good love affair, it has its peaks and valleys.  Sometimes you hate it, but that’s part of the relationship, it’s like… aaarrgghh!!

 


Alexey PetrovNon-linear teaching

I wanted to share some ideas about a teaching method I am trying to develop and implement this semester. Please let me know if you’ve heard of someone doing something similar.

This semester I am teaching our undergraduate mechanics class. This is the first time I am teaching it, so I started looking into the possibility of shaking things up and maybe applying some new method of teaching. And there are plenty offered: flipped classroom, peer instruction, Just-in-Time teaching, etc.  They all look to “move away from the inefficient old model” where the professor is lecturing and students are taking notes. I have things to say about that, but not in this post. It suffices to say that most of those approaches are essentially trying to make students work (both with the lecturer and their peers) in class and outside of it. At the same time those methods attempt to “compartmentalize teaching”, i.e. make large classes “smaller” by bringing up each individual student’s contribution to class activities (by using “clickers”, small discussion groups, etc). For several reasons those approaches did not fit my goal this semester.

Our Classical Mechanics class is a gateway class for our physics majors. It is the first class they take after they are done with general physics lectures. So the students are already familiar with the (simpler version of the) material they are going to be taught. The goal of this class is to start molding physicists out of students: they learn to simplify problems so physics methods can be properly applied (that’s how “a Ford Mustang improperly parked at the top of the icy hill slides down…” turns into “a block slides down the incline…”), learn to always derive the final formula before plugging in numbers, look at the asymptotics of their solutions as a way to see if the solution makes sense, and many other wonderful things.

So with all that I started doing something I’d like to call non-linear teaching. The gist of it is as follows. I give a lecture (and don’t get me wrong, I do make my students talk and work: I ask questions, we do “duels” (students argue different sides of a question), etc — all of that can be done efficiently in a class of 20 students). But instead of one homework with 3-4 problems per week I have two types of homework assignments for them: short homeworks and projects.

Short homework assignments are single-problem assignments given after each class that must be done by the next class. They are designed such that a student needs to re-derive material that we discussed previously in class, with a small new twist added. For example, in the block-down-the-incline problem discussed in class I ask them to choose coordinate axes in a different way and prove that the result is independent of the choice of the coordinate system. Or I ask them to find the angle at which one should throw a stone to get the maximal possible range (including air resistance), etc.  This way, instead of doing an assignment at the last minute at the end of the week, students have to work out what they just learned in class every day! More importantly, I get to change how I teach. Depending on how they did on the previous short homework, I adjust the material (both speed and volume) discussed in class. I also design examples for future sections in such a way that I can repeat parts of a topic that was hard for the students previously. Hence, instead of a linear progression through the course, we are moving along something akin to helical motion, returning to and spending more time on topics that students find more difficult. That’s why my teaching is “non-linear”.
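To give a flavour of what such a short assignment can look like once it outgrows pencil and paper, here is a small sketch (mine, not from the actual course) of the stone-throwing problem above: with linear air drag, one can integrate the equations of motion and brute-force scan launch angles to find the maximal range.

    # Sketch of the "maximal range with air resistance" exercise: linear drag,
    # crude Euler time-stepping, and a brute-force scan over launch angles.
    # With drag the optimum falls below the textbook 45 degrees.
    import numpy as np

    g, b_over_m, v0 = 9.81, 0.3, 30.0    # gravity, drag per unit mass [1/s], launch speed

    def horizontal_range(theta, dt=1e-3):
        x, y = 0.0, 0.0
        vx, vy = v0 * np.cos(theta), v0 * np.sin(theta)
        while y >= 0.0:                   # integrate until the stone lands
            ax, ay = -b_over_m * vx, -g - b_over_m * vy
            vx, vy = vx + ax * dt, vy + ay * dt
            x, y = x + vx * dt, y + vy * dt
        return x

    angles = np.radians(np.arange(20.0, 61.0, 0.5))
    best = max(angles, key=horizontal_range)
    print(f"optimal launch angle ~ {np.degrees(best):.1f} degrees")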

Project homework assignments are designed to develop understanding of how topics in a given chapter relate to each other. There are as many project assignments as there are chapters. Students get two weeks to complete them.

Overall, students solve exactly the same number of problems they would in a normal lecture class. Yet those problems are scheduled in a different way. In my approach, students are forced to learn by constantly re-working what was just discussed in a lecture. And I can quickly react (by adjusting lecture material and speed) using the constant feedback I get from students in the form of short homeworks. Win-win!

I will do benchmarking at the end of the class by comparing my class performance to aggregate data from previous years. I’ll report on it later. But for now I would be interested to hear your comments!

 


Doug NatelsonSeveral items - the arxiv, "axial-gravitational" fun, topology

Things have been a bit busy, but here are a few items that have popped up recently:
  • Symmetry magazine is generally insightful and well-written.   Recently they posted this amusing article looking at various fun papers on the arxiv.  Their first example reminds me of this classic.
  • Speaking of the arxiv, its creator, Paul Ginsparg, posted this engaging overview recently.  It's not an overstatement to say that the arxiv has had an enormous impact on science over the last 25 years.
  • There has been a huge amount of media attention on this paper (arxiv version).  The short version:  In high energy physics there is a certain conservation principle regarding chiral (meaning that the particle spin is directed along its momentum) massless fermions, so that ordinarily these things are produced with no net excess of one handedness of spin over the other.  There is a long-standing high energy theory argument that in curved spacetime, the situation changes and you can get an excess of one handedness - a "chiral anomaly".  It is difficult to see how one could test this directly via experiment, since in our daily existence spacetime curvature is pretty minimal, unlike, say, near the event horizon of a small black hole.  However, solid state materials can provide a playground for some wild ideas.  The spatial arrangement of atoms in a crystalline solid strongly affects the dispersion relation, the relationship between energy and (the crystal analog of) momentum.  For example, the linear dispersion relation between energy and momentum in (neutral) graphene makes the electrons behave in some ways analogous to massless relativistic particles, and lets people do experiments that test the math behind things like Klein tunneling (a small numerical sketch of this linear dispersion appears after this list).  As a bonus, you can add in spin-orbit coupling in solids to bring spin into the picture.  In this particular example, the electronic structure of NbP is such that, once one accounts for the spatial symmetries and spin-orbit effects, and if the number of electrons in there is right, the low-energy electronic excitations are supposed to act mathematically like massless chiral fermions (Weyl fermions).  Moreover, in a temperature gradient, the math looks like that used to describe that gravitational anomaly I'd mentioned above, and this is a system where one can actually do measurements.  However, there is a lot of hype about this, so it's worth stating clearly:  gravity itself does not play a role in NbP or this experiment.  Also, I have heard concerns about the strength of the experimental interpretation, because of issues about anisotropy in the NbP material and the aspect ratio of the sample.  
  • Similarly, there is going to be a lot of media attention around this paper, where researchers have combined a material ((Cr0.12Bi0.26Sb0.62)2Te3) that acts like a kind of topological insulator (a quantum anomalous Hall insulator, to use the authors' particular language) and a superconductor (Nb).  The result is predicted to be a system where there is conduction around the edges, with the low-energy current-carrying excitations acting like Majorana fermions, another concept originally invented in the context of high energy physics.  
  • Both of these are examples of a kind of topology mania going on in condensed matter physics these days, as described here.  This deserves a longer discussion later.  
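To make the "linear dispersion" remark in the first bullet a little more concrete, here is a small numerical sketch (mine, not from the post) of the standard nearest-neighbour tight-binding band of graphene: it just checks that the band energy vanishes at a Dirac point and grows linearly as you move away from it, with the textbook slope.

    # Nearest-neighbour tight-binding band of graphene: E(k) = t * |f(k)| with
    # f(k) = sum_j exp(i k . delta_j).  Near the Dirac point K the band vanishes
    # linearly, E ~ (3 t a / 2) |q|, the "massless" dispersion mentioned above.
    # Hopping t and carbon-carbon distance a are set to 1 here.
    import numpy as np

    t, a = 1.0, 1.0
    deltas = a * np.array([[1.0, 0.0],
                           [-0.5,  np.sqrt(3) / 2],
                           [-0.5, -np.sqrt(3) / 2]])   # the three bond vectors
    K = np.array([2 * np.pi / (3 * a), 2 * np.pi / (3 * np.sqrt(3) * a)])  # Dirac point

    def band(k):
        return t * abs(np.exp(1j * deltas @ k).sum())

    print(f"E(K) = {band(K):.1e}  (vanishes at the Dirac point)")
    for q in (1e-1, 1e-2, 1e-3):
        slope = band(K + np.array([q, 0.0])) / q
        print(f"|q| = {q:.0e}:  E(K+q)/|q| = {slope:.4f}  (approaches 3*t*a/2 = 1.5)")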

October 08, 2017

Scott AaronsonComing to Nerd Central

While I’m generally on sabbatical in Tel Aviv this year, I’ll be in the Bay Area from Saturday Oct. 14 through Wednesday Oct. 18, where I look forward to seeing many friends new and old.  On Wednesday evening, I’ll be giving a public talk in Berkeley, through the Simons Institute’s “Theoretically Speaking” series, entitled Black Holes, Firewalls, and the Limits of Quantum Computers.  I hope to see at least a few of you there!  (I do have readers in the Bay Area, don’t I?)

But there’s more: on Saturday Oct. 14, I’m thinking of having a first-ever Shtetl-Optimized meetup, somewhere near the Berkeley campus.  Which will also be a Slate Star Codex meetup, because Scott Alexander will be there too.  We haven’t figured out many details yet, except that it will definitively involve getting fruit smoothies from one of the places I remember as a grad student.  Possible discussion topics include what the math, CS, and physics research communities could be doing better; how to advance Enlightenment values in an age of recrudescent totalitarianism; and (if we’re feeling really ambitious) the interpretation of quantum mechanics.  If you’re interested, shoot me an email, let me know if there are times that don’t work; then other Scott and I will figure out a plan and make an announcement.

On an unrelated note, some people might enjoy my answer to a MathOverflow question about why one should’ve expected number theory to be so rife with ridiculously easy-to-state yet hard-to-prove conjectures, like Fermat’s Last Theorem and the Goldbach Conjecture.  As I’ve discussed on this blog before, I’ve been deeply impressed with MathOverflow since the beginning, but never more so than today, when a decision to close the question as “off-topic” was rightfully overruled.  If there’s any idea that unites all theoretical computer scientists, I’d say it’s the idea that what makes a given kind of mathematics “easy” or “hard” is, itself, a proper subject for mathematical inquiry.

Doug NatelsonThe Abnormal Force

How does the chair actually hold you up when you sit down?  What is keeping your car tires from sinking through the road surface?  What is keeping my coffee mug from falling through my desk?  In high school and first-year undergrad physics, we teach people about the normal force - that is a force that acts normal (perpendicular) to a surface, and it takes on whatever value is needed so that solid objects don't pass through each other.

The microscopic explanation of the normal force is that the electrons in the atoms of my coffee mug (etc.) interact with the electrons in the atoms of the desk surface, through a combination of electrostatics (electrons repel each other) and quantum statistics (the Pauli principle means that you can't just shuffle electrons around willy-nilly).  The normal force is "phenomenological" shorthand.  We take the observation that solid objects don't pass through each other, deduce that whatever is happening microscopically, the effect is that there is some force normal to surfaces that touch each other, and go from there, rather than trying to teach high school students how to calculate it from first principles.  The normal force is an emergent effect that makes sense on macroscopic scales without knowing the details.  This is just like how we teach high school students about pressure as a useful macroscopic concept, without actually doing a statistical calculation of the average perpendicular force per area on a surface due to collisions with molecules of a gas or a liquid.  

You can actually estimate the maximum reasonable normal force per unit area.  If you tried to squeeze the electrons of two adjacent atoms into the volume occupied by one atom, even without the repulsion of like charges adding to the cost, the Pauli principle means you'd have to kick some of those electrons into higher energy levels.  If a typical energy scale for doing that for each electron was something like 1 eV, and you had a few electrons per atom, and the areal density of atoms is around \(10^{14}\) per cm\(^2\), then we can find the average force \(F_{\mathrm{av}}\) required to make a 1 cm\(^2\) area of two surfaces overlap with each other.   We'd have \(F_{\mathrm{av}} d \sim 10^{15}\) eV, where \(d\) is the thickness of an atom, around 0.3 nm.   That's around 534000 Newtons/cm\(^2\), or around 5.3 GPa.   That's above almost all of the yield stresses for materials (usually worrying about tension rather than compression) - that just means that the atoms themselves will move around before you really push electrons around.
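For anyone who wants to check the arithmetic, the same back-of-the-envelope estimate fits in a few lines of Python (same round numbers as above):

    # Reproducing the order-of-magnitude estimate above with the same round numbers.
    eV = 1.602e-19                 # J
    energy_per_cm2 = 1e15 * eV     # ~1 eV per electron, a few electrons per atom, 1e14 atoms per cm^2
    d = 0.3e-9                     # "thickness" of an atom, m
    F_av = energy_per_cm2 / d      # average force for 1 cm^2 of overlap, in newtons
    pressure_GPa = F_av / 1e-4 / 1e9   # spread over 1 cm^2 = 1e-4 m^2
    print(f"F_av ~ {F_av:.2e} N per cm^2, i.e. ~ {pressure_GPa:.1f} GPa")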

Very occasionally, when two surfaces are brought together, there is a force that arises at the interface that is not along the normal direction.  A great example of that is in this video, which shows two graphite surfaces that spontaneously slide in the plane so that they are crystallographically aligned.  That work comes from this paper.

As far as I can tell, there is no official terminology for such a spontaneous in-plane force.  In the spirit of one of my professional heroes David Mermin, who coined the scientific term boojum, I would like to suggest that such a transverse force be known as the abnormal force.  (Since I don't actually work in this area and I'm not trying to name the effect after myself, hopefully the barrier to adoption will be lower than the one faced by Mermin, who actually worked on boojums :-)  ).

Richard EastherBike Bridges of Auckland


A city can simply happen, its location and form shaped by millions of people choosing where to live and what to build. This is a classic example of emergence or complex phenomena arising from simple ingredients, one of the key ideas science uses to make sense of the world.

Of course, cities don't just happen; they reflect decisions made by a society or, more accurately, by those with power within a society. Where will the roads go? What are the planning rules and regulations? What activities and assets will be taxed? What do the laws say? Where are the schools and hospitals?

Ironically these decisions bring their own, additional layer of emergence. No-one decides that traffic jams make for a better city but the roads in many cities are clogged morning and evening, a consequence of policy choices which see private cars as the primary means of transportation. To my physicist's eyes, efficient transport is a matter of density and energy: cars take up more space per passenger than bikes, buses and trains, so building infrastructure to properly support cars is expensive* and moving people inside cars costs more energy than the alternatives.  

Unpicking the unintended consequences of design decisions is hard; new resources must be threaded through a complex environment. To its credit, Auckland is solving this problem with trains, buses and bicycles, although "mode share" for bikes is still a small slice of the transport pie. However, cycling is perhaps the most energy efficient transport option known to human beings, better even than walking**, and usage of the Northwestern bike path (my regular commute) is growing at an annual rate of 20%. This may even pick up a notch in the coming year, as the Northwestern is connected to the new Waterview Shared Path, which opened in the last few days. I went for a ride along the new tracks today and was stunned at how well this set of cycling arteries has been stitched into the suburbs of Waterview and Mount Albert.

Te Whitinga Bridge across SH 20; south of the Waterview Tunnel.

Unlike almost any other form of transport, regular cycling leaves you fitter and healthier than sitting in a car, a bus, or a train. Not only that, Auckland's best cycle paths run through parks and incorporate a series of stunning bridges, so you are likely to arrive at work with a smile on your face.


Another highlight is the Light Path (Te Ara I Whiti to its designers), which was built on a disused motorway off-ramp, and in the coming years "Skypath" will hopefully add walking and cycling options to the eight lanes of traffic on our Harbour Bridge. Personally, I can't wait.

[Images: the Lightpath on its first birthday; mock-ups of SkyPath spanning the harbour and of the SkyPath observation deck.]

The community campaigned for it, both to mitigate the impact of the motorway construction and to deliver more comprehensive transport connectivity, which was the overall goal of the project.

* Ironically, building better roads to beat traffic jams tends to increase the amount of driving people want to do, making the traffic jams come back – what planners call induced demand. 

** Including the embodied energy used to make the cars and bicycles as well as the energy costs of making both food and fuel complicates this simple picture. But for many short journeys cycling is a time and energy efficient way to get around. 

IMAGES: All by me except the Lightpath picture (BikeAuckland) and the Skypath mock-ups (Reset Urban Design)

John BaezVladimir Voevodsky, 1966 — 2017



Vladimir Voevodsky died last week. He won the Fields Medal in 2002 for proving the Milnor conjecture in a branch of algebra known as algebraic K-theory. He continued to work on this subject until he helped prove the more general Bloch–Kato conjecture in 2010.

Proving these results—which are too technical to easily describe to nonmathematicians!—required him to develop a dream of Grothendieck: the theory of motives. Very roughly, this is a way of taking the space of solutions of some polynomial equations and chopping it apart into building blocks. But this process of ‘chopping’ and also these building blocks, called ‘motives’, are very abstract—nothing easy to visualize.

It’s a bit like how a proton is made of quarks. You never actually see a quark in isolation, so you have to think very hard to realize they are there at all. But once you know this, a lot of things become clear.

This is wonderful, profound mathematics. But in the process of proving the Bloch-Kato conjecture, Voevodsky became tired of this stuff. He wanted to do something more useful… and more ambitious. He later said:

It was very difficult. In fact, it was 10 years of technical work on a topic that did not interest me during the last 5 of these 10 years. Everything was done only through willpower.

Since the autumn of 1997, I already understood that my main contribution to the theory of motives and motivic cohomology was made. Since that time I have been quite consciously and actively searching: I was looking for a topic that I would deal with after I fulfilled my obligations related to the Bloch-Kato hypothesis.

I quickly realized that if I wanted to do something really serious, then I should make the most of my accumulated knowledge and skills in mathematics. On the other hand, seeing the trends in the development of mathematics as a science, I realized that the time is coming when the proof of yet another conjecture won’t have much of an effect. I realized that mathematics is on the verge of a crisis, or rather, two crises.

The first is connected with the separation of “pure” and applied mathematics. It is clear that sooner or later there will be a question about why society should pay money to people who are engaged in things that do not have any practical applications.

The second, less obvious, is connected with the complication of pure mathematics, which leads to the fact that, sooner or later, the articles will become too complicated for detailed verification and the process of accumulating undetected errors will begin. And since mathematics is a very deep science, in the sense that the results of one article usually depend on the results of many and many previous articles, this accumulation of errors for mathematics is very dangerous.

So, I decided, you need to try to do something that will help prevent these crises. For the first crisis, this meant that it was necessary to find an applied problem that required for its solution the methods of pure mathematics developed in recent years or even decades.

He looked for such a problem. He studied biology and found an interesting candidate. He worked on it very hard, but then decided he’d gone down a wrong path:

Since childhood I have been interested in natural sciences (physics, chemistry, biology), as well as in the theory of computer languages, and since 1997, I have read a lot on these topics, and even took several student and post-graduate courses. In fact, I “updated” and deepened, to a very large extent, the knowledge that I had. All this time I was looking for open problems that I recognized as being of interest to me and to which I could apply modern mathematics.

As a result, I chose, I now understand incorrectly, the problem of recovering the history of populations from their modern genetic composition. I took on this task for a total of about two years, and in the end, already by 2009, I realized that what I was inventing was useless. In my life, so far, it was, perhaps, the greatest scientific failure. A lot of work was invested in the project, which completely failed. Of course, there was some benefit—I learned a lot of probability theory, which I knew badly, and also learned a lot about demography and demographic history.

But he bounced back! He came up with a new approach to the foundations of mathematics, and helped organize a team at the Institute of Advanced Studies at Princeton to develop it further. This approach is now called homotopy type theory or univalent foundations. It’s fundamentally different from set theory. It treats the fundamental concept of equality in a brand new way! And it’s designed to be done with the help of computers.

It seems he started down this new road when the mathematician Carlos Simpson pointed out a serious mistake in a paper he’d written.

I think it was at this moment that I largely stopped doing what is called “curiosity-driven research” and started to think seriously about the future. I didn’t have the tools to explore the areas where curiosity was leading me and the areas that I considered to be of value and of interest and of beauty.

So I started to look into what I could do to create such tools. And it soon became clear that the only long-term solution was somehow to make it possible for me to use computers to verify my abstract, logical, and mathematical constructions. The software for doing this has been in development since the sixties. At the time, when I started to look for a practical proof assistant around 2000, I could not find any. There were several groups developing such systems, but none of them was in any way appropriate for the kind of mathematics for which I needed a system.

When I first started to explore the possibility, computer proof verification was almost a forbidden subject among mathematicians. A conversation about the need for computer proof assistants would invariably drift to Gödel’s incompleteness theorem (which has nothing to do with the actual problem) or to one or two cases of verification of already existing proofs, which were used only to demonstrate how impractical the whole idea was. Among the very few mathematicians who persisted in trying to advance the field of computer verification in mathematics during this time were Tom Hales and Carlos Simpson. Today, only a few years later, computer verification of proofs and of mathematical reasoning in general looks completely practical to many people who work on univalent foundations and homotopy type theory.

The primary challenge that needed to be addressed was that the foundations of mathematics were unprepared for the requirements of the task. Formulating mathematical reasoning in a language precise enough for a computer to follow meant using a foundational system of mathematics not as a standard of consistency to establish a few fundamental theorems, but as a tool that can be employed in ­everyday mathematical work. There were two main problems with the existing foundational systems, which made them inadequate. Firstly, existing foundations of mathematics were based on the languages of predicate logic and languages of this class are too limited. Secondly, existing foundations could not be used to directly express statements about such objects as, for example, the ones in my work on 2-theories.

Still, it is extremely difficult to accept that mathematics is in need of a completely new foundation. Even many of the people who are directly connected with the advances in homotopy type theory are struggling with this idea. There is a good reason: the existing foundations of mathematics—ZFC and category theory—have been very successful. Overcoming the appeal of category theory as a candidate for new foundations of mathematics was for me personally the most challenging.

Homotopy type theory is now a vital and exciting area of mathematics. It’s far from done, and to make it live up to Voevodsky’s dreams will require brand new ideas—not just incremental improvements, but actual sparks of genius. For some of the open problems, see Mike Shulman’s comment on the n-Category Café, and some replies to that.

I only met him a few times, but as far as I can tell Voevodsky was a completely unpretentious person. You can see that in the picture here.

He was also a very complex person. For example, you might not guess that he took great wildlife photos:



You also might not guess at this side of him:

In 2006-2007 a lot of external and internal events happened to me, after which my point of view on the questions of the “supernatural” changed significantly. What happened to me during these years, perhaps, can be compared most closely to what happened to Carl Jung in 1913-14. Jung called it “confrontation with the unconscious”. I do not know what to call it, but I can describe it in a few words. Remaining more or less normal, apart from the fact that I was trying to discuss what was happening to me with people whom I should not have discussed it with, I had in a few months acquired a very considerable experience of visions, voices, periods when parts of my body did not obey me, and a lot of incredible accidents. The most intense period was in mid-April 2007 when I spent 9 days (7 of them in the Mormon capital of Salt Lake City), never falling asleep for all these days.

Almost from the very beginning, I found that many of these phenomena (voices, visions, various sensory hallucinations), I could control. So I was not scared and did not feel sick, but perceived everything as something very interesting, actively trying to interact with those “beings” in the auditorial, visual and then tactile spaces that appeared around me (by themselves or by invoking them). I must say, probably, to avoid possible speculations on this subject, that I did not use any drugs during this period, tried to eat and sleep a lot, and drank diluted white wine.

Another comment: when I say “beings”, naturally I mean what in modern terminology are called complex hallucinations. The word “beings” emphasizes that these hallucinations themselves “behaved”, possessed a memory independent of my memory, and reacted to attempts at communication. In addition, they were often perceived in concert in various sensory modalities. For example, I played several times with a (hallucinated) ball with a (hallucinated) girl—and I saw this ball, and felt it with my palm when I threw it.

Despite the fact that all this was very interesting, it was very difficult. It happened for several periods, the longest of which lasted from September 2007 to February 2008 without breaks. There were days when I could not read, and days when coordination of movements was broken to such an extent that it was difficult to walk.

I managed to get out of this state due to the fact that I forced myself to start math again. By the middle of spring 2008 I could already function more or less normally and even went to Salt Lake City to look at the places where I wandered, not knowing where I was, in the spring of 2007.

In short, he was a genius akin to Cantor or Grothendieck, at times teetering on the brink of sanity, yet gripped by an immense desire for beauty and clarity, engaging in struggles that gripped his whole soul. From the fires of this volcano, truly original ideas emerge.

This last quote, and the first few quotes, are from some interviews in Russian, done by Roman Mikhailov, which Mike Stay pointed out to me. I used Google Translate and polished the results a bit:

Интервью Владимира Воеводского (часть 1), 1 July 2012. English version via Google Translate: Interview with Vladimir Voevodsky (Part 1).

Интервью Владимира Воеводского (часть 2), 5 July 2012. English version via Google Translate: Interview with Vladimir Voevodsky (Part 2).

The quote about the origins of ‘univalent foundations’ comes from his nice essay here:

• Vladimir Voevodsky, The origins and motivations of univalent foundations, 2014.

There’s also a good obituary of Voevodsky explaining the relation of his work to Grothendieck’s ideas in simple terms:

• Institute for Advanced Studies, Vladimir Voevodsky 1966–2017, 4 October 2017.

The photograph of Voevodsky is from Andrej Bauer’s website:

• Andrej Bauer, Photos of mathematicians.

To learn homotopy type theory, try this great book:

• The Univalent Foundations Program, Homotopy Type Theory: Univalent Foundations of Mathematics, Institute for Advanced Study.


October 06, 2017

John BaezApplied Category Theory 2018

There will be a conference on applied category theory!

Applied Category Theory (ACT 2018). School 23–27 April 2018 and workshop 30 April–4 May 2018 at the Lorentz Center in Leiden, the Netherlands. Organized by Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford).

The plenary speakers will be:

• Samson Abramsky (Oxford)
• John Baez (UC Riverside)
• Kathryn Hess (EPFL)
• Mehrnoosh Sadrzadeh (Queen Mary)
• David Spivak (MIT)

There will be a lot more to say as this progresses, but for now let me just quote from the conference website:

Applied Category Theory (ACT 2018) is a five-day workshop on applied category theory running from April 30 to May 4 at the Lorentz Center in Leiden, the Netherlands.

Towards an Integrative Science: in this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one scientific discipline can be reused in another. The aim of the workshop is to (1) explore the use of category theory within and across different disciplines, (2) create a more cohesive and collaborative ACT community, especially among early-stage researchers, and (3) accelerate research by outlining common goals and open problems for the field.

While the workshop will host discussions on a wide range of applications of category theory, there will be four special tracks on exciting new developments in the field:

1. Dynamical systems and networks
2. Systems biology
3. Cognition and AI
4. Causality

Accompanying the workshop will be an Adjoint Research School for early-career researchers. This will comprise a 16 week online seminar, followed by a 4 day research meeting at the Lorentz Center in the week prior to ACT 2018. Applications to the school will open prior to October 1, and are due November 1. Admissions will be notified by November 15.

Sincerely,
The organizers

Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford)

We welcome any feedback! Please send comments to this link.

About Applied Category Theory

Category theory is a branch of mathematics originally developed to transport ideas from one branch of mathematics to another, e.g. from topology to algebra. Applied category theory refers to efforts to transport the ideas of category theory from mathematics to other disciplines in science, engineering, and industry.

This site originated from discussions at the Computational Category Theory Workshop at NIST on Sept. 28-29, 2015. It serves to collect and disseminate research, resources, and tools for the development of applied category theory, and hosts a blog for those involved in its study.

The proposal: Towards an Integrative Science

Category theory was developed in the 1940s to translate ideas from one field of mathematics, e.g. topology, to another field of mathematics, e.g. algebra. More recently, category theory has become an unexpectedly useful and economical tool for modeling a range of different disciplines, including programming language theory [10], quantum mechanics [2], systems biology [12], complex networks [5], database theory [7], and dynamical systems [14].

A category consists of a collection of objects together with a collection of maps between those objects, satisfying certain rules. Topologists and geometers use category theory to describe the passage from one mathematical structure to another, while category theorists are also interested in categories for their own sake. In computer science and physics, many types of categories (e.g. topoi or monoidal categories) are used to give a formal semantics of domain-specific phenomena (e.g. automata [3], or regular languages [11], or quantum protocols [2]). In the applied category theory community, a long-articulated vision understands categories as mathematical workspaces for the experimental sciences, similar to how they are used in topology and geometry [13]. This has proved true in certain fields, including computer science and mathematical physics, and we believe that these results can be extended in an exciting direction: we believe that category theory has the potential to bridge specific different fields, and moreover that developments in such fields (e.g. automata) can be transferred successfully into other fields (e.g. systems biology) through category theory. Already, for example, the categorical modeling of quantum processes has helped solve an important open problem in natural language processing [9].

In this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one discipline can be reused in another. Tangibly and in the short-term, we will bring together people from different disciplines in order to write an expository survey paper that grounds the varied research in applied category theory and lays out the parameters of the research program.

In formulating this research program, we are motivated by recent successes where category theory was used to model a wide range of phenomena across many disciplines, e.g. open dynamical systems (including open Markov processes and open chemical reaction networks), entropy and relative entropy [6], and descriptions of computer hardware [8]. Several talks will address some of these new developments. But we are also motivated by an open problem in applied category theory, one which was observed at the most recent workshop in applied category theory (Dagstuhl, Germany, in 2015): “a weakness of semantics/CT is that the definitions play a key role. Having the right definitions makes the theorems trivial, which is the opposite of hard subjects where they have combinatorial proofs of theorems (and simple definitions). […] In general, the audience agrees that people see category theorists only as reconstructing the things they knew already, and that is a disadvantage, because we do not give them a good reason to care enough” [1, pg. 61].

In this workshop, we wish to articulate a natural response to the above: instead of treating the reconstruction as a weakness, we should treat the use of categorical concepts as a natural part of transferring and integrating knowledge across disciplines. The restructuring employed in applied category theory cuts through jargon, helping to elucidate common themes across disciplines. Indeed, the drive for a common language and comparison of similar structures in algebra and topology is what led to the development of category theory in the first place, and recent hints show that this approach is not only useful between mathematical disciplines, but between scientific ones as well. For example, the ‘Rosetta Stone’ of Baez and Stay demonstrates how symmetric monoidal closed categories capture the common structure between logic, computation, and physics [4].

[1] Samson Abramsky, John C. Baez, Fabio Gadducci, and Viktor Winschel. Categorical methods at the crossroads. Report from Dagstuhl Perspectives Workshop 14182, 2014.

[2] Samson Abramsky and Bob Coecke. A categorical semantics of quantum protocols. In Handbook of Quantum Logic and Quantum Structures. Elsevier, Amsterdam, 2009.

[3] Michael A. Arbib and Ernest G. Manes. A categorist’s view of automata and systems. In Ernest G. Manes, editor, Category Theory Applied to Computation and Control. Springer, Berlin, 2005.

[4] John C. Baez and Mike Stay. Physics, topology, logic and computation: a Rosetta stone. In Bob Coecke, editor, New Structures for Physics. Springer, Berlin, 2011.

[5] John C. Baez and Brendan Fong. A compositional framework for passive linear networks. arXiv e-prints, 2015.

[6] John C. Baez, Tobias Fritz, and Tom Leinster. A characterization of entropy in terms of information loss. Entropy, 13(11):1945–1957, 2011.

[7] Michael Fleming, Ryan Gunther, and Robert Rosebrugh. A database of categories. Journal of Symbolic Computing, 35(2):127–135, 2003.

[8] Dan R. Ghica and Achim Jung. Categorical semantics of digital circuits. In Ruzica Piskac and Muralidhar Talupur, editors, Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design. Springer, Berlin, 2016.

[9] Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Stephen Pulman, and Bob Coecke. Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Logic and Algebraic Structures in Quantum Computing and Information. Cambridge University Press, Cambridge, 2013.

[10] Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55–92, 1991.

[11] Nicholas Pippenger. Regular languages and Stone duality. Theory of Computing Systems, 30(2):121–134, 1997.

[12] Robert Rosen. The representation of biological systems from the standpoint of the theory of categories. Bulletin of Mathematical Biophysics, 20(4):317–341, 1958.

[13] David I. Spivak. Category Theory for Scientists. MIT Press, Cambridge MA, 2014.

[14] David I. Spivak, Christina Vasilakopoulou, and Patrick Schultz. Dynamical systems and sheaves. arXiv e-prints, 2016.


Jordan EllenbergTrace test

Jose Rodriguez gave a great seminar here yesterday about his work on the trace test, a numerical way of identifying irreducible components of varieties.  In Jose’s world, you do a lot of work with homotopy; if a variety X intersects a linear subspace V in points p1, p2, .. pk, you can move V a little bit and numerically follow those k points around.  If you move V more than a little bit — say in a nice long path in the Grassmannian that loops around and returns to its starting point — you’ll come back to p1, p2, .. pk, but maybe in a different order.  In this way you can compute the monodromy of those points; if it’s transitive, and if you’re careful about avoiding some kind of discriminant locus, you’ve proven that p1,p2…pk are all on the same component of X.

But the trace test is another thing; it’s about short paths, not long paths.  For somebody like me, who never thinks about numerical methods, this means “oh we should work in the local ring.”  And then it says something interesting!  It comes down to this.  Suppose F(x,y) is a form (not necessarily homogeneous) of degree at most d over a field k.  Hitting it with a linear transformation if need be, we can assume the x^d term is nonzero.  Now think of F as an element of k((y))[x]:  namely

F = x^d + a_1(y) x^{d-1} + \ldots + a_d(y).

Letting K be the algebraic closure of k((y)), we can then factor F as (x-r_1) … (x-r_d).  Each of these roots can be computed explicitly to any desired precision by Hensel’s lemma.  While the r_i may be complicated power series in y, living in k((y)) (or in some algebraic extension), the sum of the r_i is -a_1(y), which is a linear function A+by.

Suppose you are wondering whether F factors in k[x,y], and whether, for instance, r_1 and r_2 are the roots of an irreducible factor of F.  For that to be true, r_1 + r_2 must be a linear function of y!  (In Jose’s world, you grab a set of points, you homotopy them around, and observe that if they lie on an irreducible component, their centroid moves linearly as you translate the plane V.)

Anyway, you can falsify this easily; it’s enough for e.g. the quadratic term of r_1 + r_2 to be nonzero.  If you want to prove F is irreducible, you just check that every proper subset of the r_i sums to something nonlinear.
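A tiny worked example of the criterion (mine, not from the talk): take F(x,y) = x^2 - 2x + y, so d = 2 and a_1(y) = -2.  Over K the roots are r_{1,2} = 1 \pm \sqrt{1-y} = 1 \pm (1 - y/2 - y^2/8 - \ldots), and indeed r_1 + r_2 = 2 = -a_1(y) is linear.  The only proper nonempty subsets of the roots are the singletons, and each of r_1 = 2 - y/2 - y^2/8 - \ldots and r_2 = y/2 + y^2/8 + \ldots has a nonzero quadratic term in y; so no proper subset sums to a linear function, certifying that F is irreducible in k[x,y] (as it should be, since F is linear in y with coprime coefficients 1 and x^2 - 2x).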

  1.  Is this something I already know in another guise?
  2.  Is there a nice way to express the condition (which implies irreducibility) that no proper subset of the r_i sums to something with zero quadratic term?

 


October 05, 2017

John BaezAzimuth Backup Project (Part 5)

I haven’t spoken much about the Azimuth Climate Data Backup Project, but it’s going well, and I’ll be speaking about it soon, here:

International Open Access Week, Wednesday 25 October 2017, 9:30–11:00 a.m., University of California, Riverside, Orbach Science Library, Room 240.

“Open in Order to Save Data for Future Research” is the 2017 event theme.

Open Access Week is an opportunity for the academic and research community to learn about the potential benefits of sharing what they’ve learned with colleagues, and to help inspire wider participation in helping to make “open access” a new norm in scholarship, research and data planning and preservation.

The Open Access movement is made of up advocates (librarians, publishers, university repositories, etc.) who promote the free, immediate, and online publication of research.

The program will provide information on issues related to saving open data, including climate change and scientific data. The panelists will also describe open access projects in which they have participated to save climate data and to preserve end-of-term presidential data, information likely to be utilized by the university community for research and scholarship.

The program includes:

• Brianna Marshall, Director of Research Services, UCR Library: Brianna welcomes guests and introduces panelists.

• John Baez, Professor of Mathematics, UCR: John will describe his activities to save US government climate data through his collaborative effort, the Azimuth Climate Data Backup Project. All of the saved data is now open access for everyone to utilize for research and scholarship.

• Perry Willett, Digital Preservation Projects Manager, California Digital Library: Perry will discuss the open data initiatives in which CDL participates, including the end-of-term presidential web archiving that is done in partnership with the Library of Congress, Internet Archive and University of North Texas.

• Kat Koziar, Data Librarian, UCR Library: Kat will give an overview of DASH, the UC system data repository, and provide suggestions for researchers interested in making their data open.

This will be the eighth International Open Access Week program hosted by the UCR Library.

The event is free and open to the public. Light refreshments will be served.


Scott AaronsonBecause you asked: the Simulation Hypothesis has not been falsified; remains unfalsifiable

By email, by Twitter, even as the world was convulsed by tragedy, the inquiries poured in yesterday about a different topic entirely: Scott, did physicists really just prove that the universe is not a computer simulation—that we can’t be living in the Matrix?

What prompted this was a rash of popular articles like this one (“Researchers claim to have found proof we are NOT living in a simulation”).  The articles were all spurred by a recent paper in Science Advances: Quantized gravitational responses, the sign problem, and quantum complexity, by Zohar Ringel of Hebrew University and Dmitry L. Kovrizhin of Oxford.

I’ll tell you what: before I comment, why don’t I just paste the paper’s abstract here.  I invite you to read it—not the whole paper, just the abstract, but paying special attention to the sentences—and then make up your own mind about whether it supports the interpretation that all the popular articles put on it.

Ready?  Set?

Abstract: It is believed that not all quantum systems can be simulated efficiently using classical computational resources.  This notion is supported by the fact that it is not known how to express the partition function in a sign-free manner in quantum Monte Carlo (QMC) simulations for a large number of important problems.  The answer to the question—whether there is a fundamental obstruction to such a sign-free representation in generic quantum systems—remains unclear.  Focusing on systems with bosonic degrees of freedom, we show that quantized gravitational responses appear as obstructions to local sign-free QMC.  In condensed matter physics settings, these responses, such as thermal Hall conductance, are associated with fractional quantum Hall effects.  We show that similar arguments also hold in the case of spontaneously broken time-reversal (TR) symmetry such as in the chiral phase of a perturbed quantum Kagome antiferromagnet.  The connection between quantized gravitational responses and the sign problem is also manifested in certain vertex models, where TR symmetry is preserved.

For those tuning in from home, the “sign problem” is an issue that arises when, for example, you’re trying to use the clever trick known as Quantum Monte Carlo (QMC) to learn about the ground state of a quantum system using a classical computer—but where you needed probabilities, which are real numbers from 0 to 1, your procedure instead spits out numbers some of which are negative, and which you can therefore no longer use to define a sensible sampling process.  (In some sense, it’s no surprise that this would happen when you’re trying to simulate quantum mechanics, which of course is all about generalizing the rules of probability in a way that involves negative and even complex numbers!  The surprise, rather, is that QMC lets you avoid the sign problem as often as it does.)
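To make the sign problem slightly more concrete, here is a toy numerical illustration (mine, with no connection to the paper): if you estimate an average against quasi-weights that can go negative by sampling from their absolute values and reweighting by the signs, the estimator is formally fine, but its statistical error grows as the average sign shrinks toward zero, which is the practical face of the problem.

    # Toy illustration of sign reweighting (not from the paper).  We want an
    # average against weights w_i that can be negative; we can only sample from
    # the normalized |w_i|, so we reweight by sign(w_i):
    #     <O>_w = <sign * O>_{|w|} / <sign>_{|w|}.
    # As cancellations drive <sign> toward zero, the relative error blows up.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)                    # "configurations"
    w = np.exp(-0.5 * x**2) * np.cos(3.0 * x)    # quasi-weights, some negative

    p = np.abs(w) / np.abs(w).sum()              # what we can actually sample from
    idx = rng.choice(len(w), size=200_000, p=p)
    sign, obs = np.sign(w[idx]), x[idx] ** 2     # observable O(x) = x^2

    avg_sign = sign.mean()
    estimate = (sign * obs).mean() / avg_sign
    exact = (w * x**2).sum() / w.sum()
    print(f"<sign> = {avg_sign:.3f}   estimate = {estimate:.3f}   exact = {exact:.3f}")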

Anyway, this is all somewhat far from my expertise, but insofar as I understand the paper, it looks like a serious contribution to our understanding of the sign problem, and why local changes of basis can fail to get rid of it when QMC is used to simulate certain bosonic systems.  It will surely interest QMC experts.

OK, but does any of this prove that the universe isn’t a computer simulation, as the popular articles claim (and as the original paper does not)?

It seems to me that, to get from here to there, you’d need to overcome four huge difficulties, any one of which would be fatal by itself, and which are logically independent of each other.

  1. As a computer scientist, one thing that leapt out at me, is that Ringel and Kovrizhin’s paper is fundamentally about computational complexity—specifically, about which quantum systems can and can’t be simulated in polynomial time on a classical computer—yet it’s entirely innocent of the language and tools of complexity theory.  There’s no BQP, no QMA, no reduction-based hardness argument anywhere in sight, and no clearly-formulated request for one either.  Instead, everything is phrased in terms of the failure of one specific algorithmic framework (namely QMC)—and within that framework, only “local” transformations of the physical degrees of freedom are considered, not nonlocal ones that could still be accessible to polynomial-time algorithms.  Of course, one does whatever one needs to do to get a result.
    To their credit, the authors do seem aware that a language for discussing all possible efficient algorithms exists.  They write, for example, of a “common understanding related to computational complexity classes” that some quantum systems are hard to simulate, and specifically of the existence of systems that support universal quantum computation.  So rather than criticize the authors for this limitation of their work, I view their paper as a welcome invitation for closer collaboration between the quantum complexity theory and quantum Monte Carlo communities, which approach many of the same questions from extremely different angles.  As official ambassador between the two communities, I nominate Matt Hastings.
  2. OK, but even if the paper did address computational complexity head-on, about the most it could’ve said is that computer scientists generally believe that BPP≠BQP (i.e., that quantum computers can solve more decision problems in polynomial time than classical probabilistic ones); and that such separations are provable in the query complexity and communication complexity worlds; and that at any rate, quantum computers can solve exact sampling problems that are classically hard unless the polynomial hierarchy collapses (as pointed out in the BosonSampling paper, and independently by Bremner, Jozsa, Shepherd).  Alas, until someone proves P≠PSPACE, there’s no hope for an unconditional proof that quantum computers can’t be efficiently simulated by classical ones.
    (Incidentally, the paper comments, “Establishing an obstruction to a classical simulation is a rather ill-defined task.”  I beg to differ: it’s not ill-defined; it’s just ridiculously hard!)
  3. OK, but suppose it were proved that BPP≠BQP—and for good measure, suppose it were also experimentally demonstrated that scalable quantum computing is possible in our universe.  Even then, one still wouldn’t by any stretch have ruled out that the universe was a computer simulation!  For as many of the people who emailed me asked themselves (but as the popular articles did not), why not just imagine that the universe is being simulated on a quantum computer?  Like, duh?
  4. Finally: even if, for some reason, we disallowed using a quantum computer to simulate the universe, that still wouldn’t rule out the simulation hypothesis.  For why couldn’t God, using Her classical computer, spend a trillion years to simulate one second as subjectively perceived by us?  After all, what is exponential time to She for whom all eternity is but an eyeblink?

Anyway, if it weren’t for all four separate points above, then sure, physicists would have now proved that we don’t live in the Matrix.

I do have a few questions of my own, for anyone who came here looking for my reaction to the ‘news’: did you really need me to tell you all this?  How much of it would you have figured out on your own, just by comparing the headlines of the popular articles to the descriptions (however garbled) of what was actually done?  How obvious does something need to be, before it no longer requires an ‘expert’ to certify it as such?  If I write 500 posts like this one, will the 501st post basically just write itself?

Asking for a friend.


Comment Policy: I welcome discussion about the Ringel-Kovrizhin paper; what might’ve gone wrong with its popularization; QMC; the sign problem; the computational complexity of condensed-matter problems more generally; and the relevance or irrelevance of work on these topics to broader questions about the simulability of the universe.  But as a little experiment in blog moderation, I won’t allow comments that just philosophize in general about whether or not the universe is a simulation, without making further contact with the actual content of this post.  We’ve already had the latter conversation here—probably, like, every week for the last decade—and I’m ready for something new.

October 03, 2017

BackreactionYet another year in which you haven’t won a Nobel Prize!

“Do you hope to win a Nobel Prize?” asked an elderly man who had come to shake my hand after the lecture. I laughed, but he was serious. Maybe I had been a little too successful explaining how important quantum gravity is. No, I don’t hope to win a Nobel Prize. If that’s what I’d been after, I certainly would have chosen a different field. Condensed matter physics, say, or quantum things. At

October 02, 2017

Richard EastherYou Ain't Seen Nothing Yet...

This is the week the Nobel Prizes are announced, and today is the day (at least in New Zealand; first place in the world to see the light, as the tourist people say) the Physics prize is announced. And this year the odds-on favourite will be LIGO and the discovery of gravitational waves.

It is possible that the famously cautious Nobel committee will wait another year but their biggest debate may be exactly which people should get the nod, as the prize can only be split three ways. At one point the LIGO experiment was driven by "the troika" of Rainer Weiss, Kip Thorne and Ronald Drever, which would have simplified matters. However, Drever was sidelined early in the process and died this year, rendering him definitively ineligible and leaving an empty chair. One candidate to fill it is Barry Barish, who is credited with laying much of the groundwork for the overall LIGO collaboration, but there is no shortage of claimants.

This gets to the heart of a dilemma with prizes and the Nobel in particular, thanks to the almost infinite prestige it confers upon winners and the iron rule that it can be shared only three ways: we award accolades to individuals for what are ultimately the accomplishments of a community. One solution would be to give prizes to ideas; there is often debate about who should win the prize but it is rare to hear deep disagreement as to what should win the prize. Nor do we honour the often unsung heroes who make the community tick. [My proposal for a George Bailey Prize has gone nowhere, however. C'est la vie.]

But ahead of the big reveal, here are a few quick thoughts: 

  • Since I am in New Zealand, it would be remiss of me not to mention my compatriot Roy Kerr, who solved the Einstein field equations for spinning black holes in 1963. A black hole born in a merger inherits the spin of its progenitors, so the end-stages of the events detected by LIGO are expected to match the "Kerr metric", and the detailed signals appear to confirm this. For whatever reason, the smart money is not backing Kerr but it would be a pleasant surprise if he gets a phone call from Stockholm late this evening, and not undeserved. (Local time!) That said, gravitational wave observations are not yet sufficiently sensitive to confirm that the final state is exactly described by Kerr's solution, which is probably enough to prevent a major antipodean celebration later tonight. 
  • Gravitational waves have already been the subject of a Nobel Prize. In 1993 Hulse and Taylor were honoured for discovering a binary neutron star whose mutual orbit is slowly contracting; this behaviour is exactly consistent with energy carried away by gravitational waves emitted by this system. These gravitational waves are far too weak to detect directly on Earth, but more than enough for a win.
  • Hulse was a student and Taylor his supervisor. Neutron stars themselves were first detected via the discovery of pulsars in the 1960s. Jocelyn Bell's work as a PhD student was key to this earlier breakthrough but she (in)famously missed out on the prize awarded to her supervisor. So an empty slot this year could be used to correct this shameful mistake, but sadly that is also an unlikely result.
  • A key test of any scientific discovery is an independent confirmation. LIGO has two detectors but they were built by the same team from the same design and work in tandem. However, the latest detection (just last week) was the first in which the European VIRGO instrument also played an important role so that announcement should put many remaining qualms about LIGO's discovery to rest.
  • And, in fact, the LIGO-VIRGO news came as something of an anti-climax since many people were expecting an announcement of the first detection of two neutron stars merging into a single black hole. Unlike black holes, colliding neutron stars produce a strong burst of energy at wavelengths ranging from gamma rays down to optical and below, providing an exquisite opportunity to complement gravitational wave observations with conventional telescopes. Stay tuned on that one. 

As it happens, I have been teaching about gravitational waves over the last few weeks. At the University of Auckland (where I am Head of the Department of Physics, or "Chair", in North American parlance) we were grappling with a paradox at the heart of all physics programmes: we like to think of physics as a cutting edge discipline but the foundational material we teach to undergraduates has almost all been known for decades or centuries. In fact, even "modern physics", quantum mechanics and relativity, was itself developed in the 1920s, and is perhaps "modern" in the same way that Virginia Woolf and TS Eliot are "modernist writers".

Our solution to this dilemma was to develop a second year / sophomore course we called "Frontiers of Physics", which treats cutting edge topics but assumes only introductory physics. This comes at the cost of the finer details, but in the case of gravitational waves you can get a lot of traction with the key ideas of energy conservation, the wave equation, the interference of light beams, and a little special relativity. This year is the first time we have taught the course and gravitational waves have truly been ripped from the headlines: I had to change my lectures to keep up with breaking news from LIGO and discoveries in black hole physics. And I couldn't be happier. 


CODA: The header image is from the Simulating Extreme Spacetimes (SXS) Project. And this post was edited a little to clarify the reasons why Roy Kerr is not tipped to win the Prize today. (Thanks to Christopher Berry for comments on Twitter about this.) 

Terence TaoAn update to “On the sign patterns of entrywise positivity preservers in fixed dimension”

Apoorva Khare and I have updated our paper “On the sign patterns of entrywise positivity preservers in fixed dimension“, announced at this post from last month. The quantitative results are now sharpened using a new monotonicity property of ratios {s_{\lambda}(u)/s_{\mu}(u)} of Schur polynomials, namely that such ratios are monotone non-decreasing in each coordinate of {u} if {u} is in the positive orthant, and the partition {\lambda} is larger than that of {\mu}. (This monotonicity was also independently observed by Rachid Ait-Haddou, using the theory of blossoms.) In the revised version of the paper we give two proofs of this monotonicity. The first relies on a deep positivity result of Lam, Postnikov, and Pylyavskyy, which uses a representation-theoretic positivity result of Haiman to show that the polynomial combination

\displaystyle s_{(\lambda \wedge \nu) / (\mu \wedge \rho)} s_{(\lambda \vee \nu) / (\mu \vee \rho)} - s_{\lambda/\mu} s_{\nu/\rho} \ \ \ \ \ (1)

of skew-Schur polynomials is Schur-positive for any partitions {\lambda,\mu,\nu,\rho} (using the convention that the skew-Schur polynomial {s_{\lambda/\mu}} vanishes if {\mu} is not contained in {\lambda}, and where {\lambda \wedge \nu} and {\lambda \vee \nu} denote the pointwise min and max of {\lambda} and {\nu} respectively). It is fairly easy to derive the monotonicity of {s_\lambda(u)/s_\mu(u)} from this, by using the expansion

\displaystyle s_\lambda(u_1,\dots, u_n) = \sum_k u_1^k s_{\lambda/(k)}(u_2,\dots,u_n)

of Schur polynomials into skew-Schur polynomials (as was done in this previous post).
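To get a concrete feel for this monotonicity, here is a small numerical sketch (my own, not from the paper): it computes Schur polynomials via the bialternant formula and tracks the ratio {s_\lambda(u)/s_\mu(u)} as one coordinate of {u} increases, for an arbitrarily chosen pair of partitions with {\lambda} dominating {\mu} entrywise.

# A numerical sketch (not from the paper): Schur polynomials via the
# bialternant formula, and the ratio s_lambda(u)/s_mu(u) along a path
# that increases one coordinate of u in the positive orthant.
import numpy as np

def schur(partition, u):
    # s_lambda(u) = det(u_i^(lambda_j + n - j)) / det(u_i^(n - j))
    n = len(u)
    lam = list(partition) + [0] * (n - len(partition))
    exps = [lam[j] + n - 1 - j for j in range(n)]
    num = np.linalg.det(np.array([[ui ** e for e in exps] for ui in u]))
    den = np.linalg.det(np.array([[ui ** (n - 1 - j) for j in range(n)] for ui in u]))
    return num / den

lam, mu = (3, 1), (2, 1)               # lambda >= mu coordinatewise (arbitrary example)
u = np.array([0.5, 1.7, 2.9])          # a point in the positive orthant with distinct entries
ratios = []
for s in np.linspace(0.0, 1.0, 11):
    v = u.copy()
    v[0] += s                          # increase the first coordinate only
    ratios.append(schur(lam, v) / schur(mu, v))
print(all(b >= a - 1e-12 for a, b in zip(ratios, ratios[1:])))   # True: the ratio never decreases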

The second proof of monotonicity avoids representation theory by a more elementary argument establishing the weaker claim that the above expression (1) is non-negative on the positive orthant. In fact we prove a more general determinantal log-supermodularity claim which may be of independent interest:

Theorem 1 Let {A} be any {n \times n} totally positive matrix (thus, every minor has a non-negative determinant). Then for any {k}-tuples {I_1,I_2,J_1,J_2} of increasing elements of {\{1,\dots,n\}}, one has

\displaystyle \det( A_{I_1 \wedge I_2, J_1 \wedge J_2} ) \det( A_{I_1 \vee I_2, J_1 \vee J_2} ) - \det(A_{I_1,J_1}) \det(A_{I_2,J_2}) \geq 0

where {A_{I,J}} denotes the {k \times k} minor formed from the rows in {I} and columns in {J}.

For instance, if {A} is the matrix

\displaystyle A = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}

for some real numbers {a,\dots,p}, one has

\displaystyle a h - de\geq 0

(corresponding to the case {k=1}, {I_1 = (1), I_2 = (2), J_1 = (4), J_2 = (1)}), or

\displaystyle \det \begin{pmatrix} a & c \\ i & k \end{pmatrix} \det \begin{pmatrix} f & h \\ n & p \end{pmatrix} - \det \begin{pmatrix} e & h \\ i & l \end{pmatrix} \det \begin{pmatrix} b & c \\ n & o \end{pmatrix} \geq 0

(corresponding to the case {k=2}, {I_1 = (2,3)}, {I_2 = (1,4)}, {J_1 = (1,4)}, {J_2 = (2,3)}). It turns out that this claim can be proven relatively easily by an induction argument, relying on the Dodgson and Karlin identities from this previous post; the difficulties are largely notational in nature. Combining this result with the Jacobi-Trudi identity for skew-Schur polynomials (discussed in this previous post) gives the non-negativity of (1); it can also be used to directly establish the monotonicity of ratios {s_\lambda(u)/s_\mu(u)} by applying the theorem to a generalised Vandermonde matrix.
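For a quick numerical illustration of Theorem 1 (a sketch of my own, not from the paper), one can take a Cauchy matrix, a standard example of a totally positive matrix, and test the inequality on the {k=2} index sets from the example above; the nodes below are arbitrary.

# Numerical check of the log-supermodularity inequality for a totally positive
# matrix (my sketch, not from the paper). Cauchy matrices 1/(x_i + y_j) with
# increasing positive nodes are totally positive.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.5, 1.5, 2.5, 3.5])
A = 1.0 / (x[:, None] + y[None, :])

def minor(M, rows, cols):
    return np.linalg.det(M[np.ix_(rows, cols)])

# k = 2 example from the text: I1=(2,3), I2=(1,4), J1=(1,4), J2=(2,3), here 0-indexed.
I1, I2, J1, J2 = [1, 2], [0, 3], [0, 3], [1, 2]
lhs = (minor(A, np.minimum(I1, I2), np.minimum(J1, J2))
       * minor(A, np.maximum(I1, I2), np.maximum(J1, J2)))
rhs = minor(A, I1, J1) * minor(A, I2, J2)
print(lhs - rhs >= 0)   # True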

(Log-supermodularity also arises as the natural hypothesis for the FKG inequality, though I do not know of any interesting application of the FKG inequality in this current setting.)



Scott AaronsonAlso against individual IQ worries

Scott Alexander recently blogged “Against Individual IQ Worries.”  Apparently, he gets many readers writing to him terrified that they scored too low on an IQ test, and therefore they’ll never be able to pursue their chosen career, or be a full-fledged intellectual or member of the rationalist community or whatever.  Amusingly, other Scott says, some of these readers have even performed their own detailed Bayesian analysis of what it might mean that their IQ score is only 90, cogently weighing the arguments and counterarguments while deploying the full vocabulary of statistical research.  It somehow reminds me of the joke about the talking dog, who frets to his owner that he doesn’t think he’s articulate enough to justify all the media attention he’s getting.

I’ve long had mixed feelings about the entire concept of IQ.

On the one hand, I know all the studies that show that IQ is highly heritable, that it’s predictive of all sorts of life outcomes, etc. etc.  I’m also aware of the practical benefits of IQ research, many of which put anti-IQ leftists into an uncomfortable position: for example, the world might never have understood the risks of lead poisoning without studies showing how they depressed IQ.  And as for the thousands of writers who dismiss the concept of IQ in favor of grit, multiple intelligences, emotional intelligence, or whatever else is the flavor of the week … well, I can fully agree about the importance of the latter qualities, but can’t go along with many of those writers’ barely-concealed impulse to lower the social status of STEM nerds even further, or to enforce a world where the things nerds are good at don’t matter.

On the other hand … well, have you actually looked at an IQ test?  To anyone with a scientific or mathematical bent, the tests are vaguely horrifying.  “Which of these pictures is unlike the others?”  “What number comes next in the sequence?”  Question after question that could have multiple defensible answers, but only one that “counts”—and that, therefore, mostly tests the social skill of reverse-engineering what the test-writer had in mind.  As a teacher, I’d be embarrassed to put such questions on an exam.

I sometimes get asked what my IQ is.  The truth is that, as far as I know, I was given one official IQ test, when I was four years old, and my score was about 106.  The tester earnestly explained to my parents that, while I scored off the chart on some subtests, I completely bombed others, and averaging yielded 106.  As a representative example of what I got wrong, the tester offered my parents the following:

Tester: “Suppose you came home, and you saw smoke coming out of your neighbor’s roof.  What would you do?”

Me: “Probably nothing, because it’s just the chimney, and they have a fire in their fireplace.”

Tester: “OK, but suppose it wasn’t the chimney.”

Me: “Well then, I’d either call for help or not, depending on how much I liked my neighbor…”

Apparently, my parents later consulted other psychologists who were of the opinion that my IQ was higher.  But the point remains: if IQ is defined as your score on a professionally administered IQ test, then mine is about 106.

Richard Feynman famously scored only 124 on a childhood IQ test—above average, but below the cutoff for most schools’ “gifted and talented” programs.  After he won the Nobel Prize in Physics, he reportedly said that the prize itself was no big deal; what he was really proud of was to have received one despite a merely 124 IQ.  If so, then it seems to me that I can feel equally proud, to have completed a computer science PhD at age 22, become a tenured MIT professor, etc. etc. despite a much lower IQ even than Feynman’s.

But seriously: how do we explain Feynman’s score?  Well, when you read IQ enthusiasts, you find what they really love is not IQ itself, but rather “g“, a statistical construct derived via factor analysis: something that positively correlates with just about every measurable intellectual ability, but that isn’t itself directly measurable (at least, not by any test yet devised).  An IQ test is merely one particular instrument that happens to correlate well with g—not necessarily the best one for all purposes.  The SAT also correlates with g—indeed, almost as well as IQ tests themselves do, despite the idea (or pretense?) that the SAT measures “acquired knowledge.”  These correlations are important, but they allow for numerous and massive outliers.

So, not for the first time, I find myself in complete agreement with Scott Alexander, when he advises people to stop worrying.  We can uphold every statistical study that’s ever been done correlating IQ with other variables, while still affirming that fretting about your own low IQ score is almost as silly as fretting that you must be dumb because your bookshelf is too empty (a measurable variable that also presumably correlates with g).

More to the point: if you want to know, let’s say, whether you can succeed as a physicist, then surely the best way to find out is to start studying physics and see how well you do.  That will give you a much more accurate signal than a gross consumer index like IQ will—and conditioned on that signal, I’m guessing that your IQ score will provide almost zero additional information.  (Though then again, what would a guy with a 106 IQ know about such things?)

n-Category Café Vladimir Voevodsky, June 4, 1966 - September 30, 2017

Vladimir Voevodsky died this Saturday. He was 51.

I met him a couple of times, and have a few stories, but I think I’ll just quote the Institute for Advanced Study obituary and open up the thread for memories and conversation.

The Institute for Advanced Study is deeply saddened by the passing of Vladimir Voevodsky, Professor in the School of Mathematics.

Voevodsky, a truly extraordinary and original mathematician, made many contributions to the field of mathematics, earning him numerous honors and awards, including the Fields Medal.

Celebrated for tackling the most difficult problems in abstract algebraic geometry, Voevodsky focused on the homotopy theory of schemes, algebraic K-theory, and interrelations between algebraic geometry, and algebraic topology. He made one of the most outstanding advances in algebraic geometry in the past few decades by developing new cohomology theories for algebraic varieties. Among the consequences of his work are the solutions of the Milnor and Bloch-Kato Conjectures.

More recently he became interested in type-theoretic formalizations of mathematics and automated proof verification. He was working on new foundations of mathematics based on homotopy-theoretic semantics of Martin-Löf type theories. His new “Univalence Axiom” has had a dramatic impact in both mathematics and computer science.

A gathering to celebrate Voevodsky’s life and legacy is being planned and more information will be available soon.

October 01, 2017

Tommaso DorigoThe Physics Of Boson Pairs

At 10:00 AM this morning, my smartphone alerted me that in two months I will have to deliver a thorough review on the physics of boson pairs - a 50 page thing which does not yet even exist in the world of ideas. So I had better start planning my time carefully over the next 60 days, to find at least two clean weeks where I may cram in the required concentration. That will be the hard part!

read more

September 30, 2017

Terence TaoOn the sign patterns of entrywise positivity preservers in fixed dimension

Apoorva Khare and I have just uploaded to the arXiv our paper “On the sign patterns of entrywise positivity preservers in fixed dimension“. This paper explores the relationship between positive definiteness of Hermitian matrices, and entrywise operations on these matrices. The starting point for this theory is the Schur product theorem, which asserts that if {A = (a_{ij})_{1 \leq i,j \leq N}} and {B = (b_{ij})_{1 \leq i,j \leq N}} are two {N \times N} Hermitian matrices that are positive semi-definite, then their Hadamard product

\displaystyle  A \circ B := (a_{ij} b_{ij})_{1 \leq i,j \leq N}

is also positive semi-definite. (One should caution that the Hadamard product is not the same as the usual matrix product.) To prove this theorem, first observe that the claim is easy when {A = {\bf u} {\bf u}^*} and {B = {\bf v} {\bf v}^*} are rank one positive semi-definite matrices, since in this case {A \circ B = ({\bf u} \circ {\bf v}) ({\bf u} \circ {\bf v})^*} is also a rank one positive semi-definite matrix. The general case then follows by noting from the spectral theorem that a general positive semi-definite matrix can be expressed as a non-negative linear combination of rank one positive semi-definite matrices, and using the bilinearity of the Hadamard product and the fact that the set of positive semi-definite matrices form a convex cone. A modification of this argument also lets one replace “positive semi-definite” by “positive definite” in the statement of the Schur product theorem.
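As a quick numerical sanity check of the Schur product theorem (a sketch of my own, assuming numpy is available, and not part of the paper), one can build two random positive semi-definite matrices and inspect the smallest eigenvalue of their Hadamard product:

# Sanity check (not from the paper): the Hadamard product of two PSD matrices is PSD.
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T            # X X^T is always positive semi-definite

A, B = random_psd(5), random_psd(5)
H = A * B                     # entrywise (Hadamard) product, not the matrix product A @ B
print(np.linalg.eigvalsh(H).min() >= -1e-10)   # True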

One corollary of the Schur product theorem is that any polynomial {P(z) = c_0 + c_1 z + \dots + c_d z^d} with non-negative coefficients {c_n \geq 0} is entrywise positivity preserving on the space {{\mathbb P}_N({\bf C})} of {N \times N} positive semi-definite Hermitian matrices, in the sense that for any matrix {A = (a_{ij})_{1 \leq i,j \leq N}} in {{\mathbb P}_N({\bf C})}, the entrywise application

\displaystyle  P[A] := (P(a_{ij}))_{1 \leq i,j \leq N}

of {P} to {A} is also positive semi-definite. (As before, one should caution that {P[A]} is not the application {P(A)} of {P} to {A} by the usual functional calculus.) Indeed, one can expand

\displaystyle  P[A] = c_0 A^{\circ 0} + c_1 A^{\circ 1} + \dots + c_d A^{\circ d},

where {A^{\circ i}} is the Hadamard product of {i} copies of {A}, and the claim now follows from the Schur product theorem and the fact that {{\mathbb P}_N({\bf C})} is a convex cone.
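Concretely (again only a sketch of mine, with an arbitrary example polynomial {P(z) = 1 + 2z + 3z^2}), the entrywise application can be written with Hadamard powers, where {A^{\circ 0}} is the all-ones matrix:

# Entrywise application of P(z) = 1 + 2z + 3z^2 to a PSD matrix (a sketch, not
# from the paper); numpy's elementwise operations match the Hadamard calculus:
# the constant 1 broadcasts to c_0 times A^{o0}, the all-ones matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
A = X @ X.T                               # positive semi-definite
P_of_A = 1 + 2 * A + 3 * A**2             # all operations are entrywise
print(np.linalg.eigvalsh(P_of_A).min() >= -1e-10)   # True, as the corollary predicts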

A slight variant of this argument, already observed by Pólya and Szegö in 1925, shows that if {I} is any subset of {{\bf C}} and

\displaystyle  f(z) = \sum_{n=0}^\infty c_n z^n \ \ \ \ \ (1)

is a power series with non-negative coefficients {c_n \geq 0} that is absolutely and uniformly convergent on {I}, then {f} will be entrywise positivity preserving on the set {{\mathbb P}_N(I)} of positive definite matrices with entries in {I}. (In the case that {I} is of the form {I = [0,\rho]}, such functions are precisely the absolutely monotonic functions on {I}.)

In the work of Schoenberg and of Rudin, we have a converse: if {f: (-1,1) \rightarrow {\bf C}} is a function that is entrywise positivity preserving on {{\mathbb P}_N((-1,1))} for all {N}, then it must be of the form (1) with {c_n \geq 0}. Variants of this result, with {(-1,1)} replaced by other domains, appear in the work of Horn, Vasudeva, and Guillot-Khare-Rajaratnam.

This gives a satisfactory classification of functions {f} that are entrywise positivity preservers in all dimensions {N} simultaneously. However, the question remains as to what happens if one fixes the dimension {N}, in which case one may have a larger class of entrywise positivity preservers. For instance, in the trivial case {N=1}, a function would be entrywise positivity preserving on {{\mathbb P}_1((0,\rho))} if and only if {f} is non-negative on {(0,\rho)}. For higher {N}, there is a necessary condition of Horn (refined slightly by Guillot-Khare-Rajaratnam) which asserts (at least in the case of smooth {f}) that all derivatives of {f} at zero up to {(N-1)^{th}} order must be non-negative in order for {f} to be entrywise positivity preserving on {{\mathbb P}_N((0,\rho))} for some {0 < \rho < \infty}. In particular, if {f} is of the form (1), then {c_0,\dots,c_{N-1}} must be non-negative. In fact, a stronger assertion can be made, namely that the first {N} non-zero coefficients in (1) (if they exist) must be positive, or equivalently any negative term in (1) must be preceded (though not necessarily immediately) by at least {N} positive terms. If {f} of the form (1) is entrywise positivity preserving on the larger set {{\mathbb P}_N((0,+\infty))}, one can furthermore show that any negative term in (1) must also be followed (though not necessarily immediately) by at least {N} positive terms.
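To see why some sign condition is unavoidable, here is a toy counterexample of my own (not from the paper) for {N=2}: the function {f(z) = 1-z} has only one positive coefficient before its negative one, and it indeed destroys positivity entrywise.

# Toy counterexample (mine, not from the paper): f(z) = 1 - z applied entrywise
# to a 2x2 positive definite matrix with entries in (0, 2).
import numpy as np

A = np.array([[1.0, 0.1],
              [0.1, 1.0]])
f_of_A = 1.0 - A                          # entrywise application of f(z) = 1 - z
print(np.linalg.eigvalsh(A).min() > 0)        # True: A is positive definite
print(np.linalg.eigvalsh(f_of_A).min() < 0)   # True: positivity is destroyed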

The main result of this paper is that these sign conditions are the only constraints for entrywise positivity preserving power series. More precisely:

Theorem 1 For each {n}, let {\epsilon_n \in \{-1,0,+1\}} be a sign.

  • Suppose that any negative sign {\epsilon_M = -1} is preceded by at least {N} positive signs (thus there exists {0 \leq n_0 < \dots < n_{N-1}< M} with {\epsilon_{n_0} = \dots = \epsilon_{n_{N-1}} = +1}). Then, for any {0 < \rho < \infty}, there exists a convergent power series (1) on {(0,\rho)}, with each {c_n} having the sign of {\epsilon_n}, which is entrywise positivity preserving on {{\bf P}_N((0,\rho))}.
  • Suppose in addition that any negative sign {\epsilon_M = -1} is followed by at least {N} positive signs (thus there exists {M < n_N < \dots < n_{2N-1}} with {\epsilon_{n_N} = \dots = \epsilon_{n_{2N-1}} = +1}). Then there exists a convergent power series (1) on {(0,+\infty)}, with each {c_n} having the sign of {\epsilon_n}, which is entrywise positivity preserving on {{\bf P}_N((0,+\infty))}.

One can ask the same question with {(0,\rho)} or {(0,+\infty)} replaced by other domains such as {(-\rho,\rho)}, or the complex disk {D(0,\rho)}, but it turns out that there are far fewer entrywise positivity preserving functions in those cases basically because of the non-trivial zeroes of Schur polynomials in these ranges; see the paper for further discussion. We also have some quantitative bounds on how negative some of the coefficients can be compared to the positive coefficients, but they are a bit technical to state here.

The heart of the proofs of these results is an analysis of the determinants {\mathrm{det} P[ {\bf u} {\bf u}^* ]} of polynomials {P} applied entrywise to rank one matrices {uu^*}; the positivity of these determinants can be used (together with a continuity argument) to establish the positive definiteness of {P[uu^*]} for various ranges of {P} and {u}. Using the Cauchy-Binet formula, one can rewrite such determinants as linear combinations of squares of magnitudes of generalised Vandermonde determinants

\displaystyle  \mathrm{det}( u_i^{n_j} )_{1 \leq i,j \leq N},

where {{\bf u} = (u_1,\dots,u_N)} and the signs of the coefficients in the linear combination are determined by the signs of the coefficients of {P}. The task is then to find upper and lower bounds for the magnitudes of such generalised Vandermonde determinants. These determinants oscillate in sign, which makes the problem look difficult; however, an algebraic miracle intervenes, namely the factorisation

\displaystyle  \mathrm{det}( u_i^{n_j} )_{1 \leq i,j \leq N} = \pm V({\bf u}) s_\lambda({\bf u})

of the generalised Vandermonde determinant into the ordinary Vandermonde determinant

\displaystyle  V({\bf u}) = \prod_{1 \leq i < j \leq N} (u_j - u_i)

and a Schur polynomial {s_\lambda} applied to {{\bf u}}, where the weight {\lambda} of the Schur polynomial is determined by {n_1,\dots,n_N} in a simple fashion. The problem then boils down to obtaining upper and lower bounds for these Schur polynomials. Because we are restricting attention to matrices taking values in {(0,\rho)} or {(0,+\infty)}, the entries of {{\bf u}} can be taken to be non-negative. One can then take advantage of the total positivity of the Schur polynomials to compare these polynomials with a monomial, at which point one can obtain good criteria for {P[A]} to be positive definite when {A} is a rank one positive definite matrix {A = {\bf u} {\bf u}^*}.

If we allow the exponents {n_1,\dots,n_N} to be real numbers rather than integers (thus replacing polynomials or power series by Puiseux series or Hahn series), then we lose the above algebraic miracle, but we can replace it with a geometric miracle, namely the Harish-Chandra-Itzykson-Zuber identity, which I discussed in this previous blog post. This factors the above generalised Vandermonde determinant as the product of the ordinary Vandermonde determinant and an integral of a positive quantity over the orthogonal group, which one can again compare with a monomial after some fairly elementary estimates.

It remains to understand what happens for more general positive semi-definite matrices {A}. Here we use a trick of FitzGerald and Horn to amplify the rank one case to the general case, by expressing a general positive semi-definite matrix {A} as a linear combination of a rank one matrix {{\bf u} {\bf u}^*} and another positive semi-definite matrix {B} that vanishes on the last row and column (and is thus effectively a positive definite {(N-1) \times (N-1)} matrix). Using the fundamental theorem of calculus to continuously deform the rank one matrix {{\bf u} {\bf u}^*} to {A} in the direction {B}, one can then obtain positivity results for {P[A]} from positivity results for {P[{\bf u} {\bf u}^*]} combined with an induction hypothesis on {N}.



September 29, 2017

Scott AaronsonMichael Cohen (1992-2017)


I first encountered Michael Cohen when, as a freshman newly arrived at MIT, he walked into my office unannounced to ask if I had any open problems for him to solve.  My first reaction was bemused annoyance: who does this punk think he is?  he’s ready to do theory, but doesn’t even know enough to make an appointment first?  also, why doesn’t he shave?

And then, five minutes later, I was practically begging him to do research with me.  “OK, so you didn’t like that problem?  Wait, I’ve got another one!  Hey, where are you going…”

Within those five minutes, it had become obvious that this was a freshman who I could—must—talk to like an advanced grad student or professor.  Sadly for quantum computing, Michael ultimately decided to go into classical parts of theoretical computer science, such as low-rank approximation and fast algorithms for geometry and linear-algebra problems.  But that didn’t stop him from later taking my graduate course on quantum complexity theory, where he sat in the front and loudly interrupted me every minute, stream-of-consciousness style, so that my “lectures” often turned into dialogues with him.  Totally unforgivable—all the more so because his musings were always on point, constantly catching me in errors or unjustified claims (one of which I blogged about previously).

Not once did I ever suspect he did it to show off: he was simply so overtaken by his urge to understand the point at hand, as to be oblivious to all niceties.  Yet somehow, that social obliviousness didn’t stop him from accumulating a huge circle of friends.  (Well, it was MIT.)

Michael stayed on at MIT as a grad student, racking up an incredible publication list by age 25.  This semester, he went to visit the Simons Institute for Theory of Computing in Berkeley.

Three days ago, Michael was found dead in his apartment in Berkeley, after having cancelled a scheduled talk because he was feeling unwell.  No cause has been given.

The horrible news came just as I was arriving in Germany for the Heidelberg Laureate Forum, to speak about quantum supremacy.  So I barely had time to process the tragedy—yet it was always in the background, especially as I learned that in his brief life, Michael had also touched many of the other computer scientists who I spoke with in Heidelberg, such as Dan Spielman, whose approach to Ramanujan graphs (with Marcus and Srivastava) Michael had made constructive in one of his most celebrated works.  Only now is the full weight of what happened bearing down on me.

I understand that memorial events are being planned at both MIT and Berkeley.  Feel free to share memories of Michael in the comments; see also Luca’s post and Lance’s post.

This is an unfathomable loss for Michael’s family, for his many friends and colleagues, and for a field that’s been robbed of decades of breakthroughs.

September 28, 2017

n-Category Café Applied Category Theory at UCR (Part 2)

I’m running a special session on applied category theory, and now the program is available:

Applied category theory, Fall Western Sectional Meeting of the AMS, 4-5 November 2017, U.C. Riverside.

This is going to be fun.

My former student Brendan Fong is now working with David Spivak at M.I.T., and they’re both coming. My collaborator John Foley at Metron is also coming: we’re working on the CASCADE project for designing networked systems.

Dmitry Vagner is coming from Duke: he wrote a paper with David and Eugene Lerman on operads and open dynamical systems. Christina Vasilakopoulou, who has worked with David and Patrick Schultz on dynamical systems, has just joined our group at UCR, so she will also be here. And the three of them have worked with Ryan Wisnesky on algebraic databases. Ryan will not be here, but his colleague Peter Gates will: together with David they have a startup called Categorical Informatics, which uses category theory to build sophisticated databases.

That’s not everyone — for example, most of my students will be speaking at this special session, and other people too — but that gives you a rough sense of some people involved. The conference is on a weekend, but John Foley and David Spivak and Brendan Fong and Dmitry Vagner are staying on for longer, so we’ll have some long conversations… and Brendan will explain decorated corelations in my Tuesday afternoon network theory seminar.

Wanna see what the talks are about?

Here’s the program. Click on talk titles to see abstracts. For a multi-author talk, the person with the asterisk after their name is doing the talking. All the talks will be in Room 268 of the Highlander Union Building or ‘HUB’.

Saturday November 4, 2017, 9:00 a.m.-10:50 a.m.

9:00 a.m.
A higher-order temporal logic for dynamical systems.
David I. Spivak, M.I.T.


10:00 a.m.
Algebras of open dynamical systems on the operad of wiring diagrams.
Dmitry Vagner*, Duke University
David I. Spivak, M.I.T.
Eugene Lerman, University of Illinois at Urbana-Champaign


10:30 a.m.
Abstract dynamical systems.
Christina Vasilakopoulou*, University of California, Riverside
David Spivak, M.I.T.
Patrick Schultz, M.I.T.


Saturday November 4, 2017, 3:00 p.m.-5:50 p.m.

3:00 p.m.
Black boxes and decorated corelations.
Brendan Fong, M.I.T.


4:00 p.m.
Compositional modelling of open reaction networks.
Blake S. Pollard*, University of California, Riverside
John C. Baez, University of California, Riverside


4:30 p.m.
A bicategory of coarse-grained Markov processes.
Kenny Courser, University of California, Riverside


5:00 p.m.
A bicategorical syntax for pure state qubit quantum mechanics.
Daniel Michael Cicala, University of California, Riverside


5:30 p.m.
Open systems in classical mechanics.
Adam Yassine, University of California Riverside


Sunday November 5, 2017, 9:00 a.m.-10:50 a.m.

9:00 a.m.
Controllability and observability: diagrams and duality.
Jason Erbele, Victor Valley College


9:30 a.m.
Frobenius monoids, weak bimonoids, and corelations.
Brandon Coya, University of California, Riverside


10:00 a.m.
Compositional design and tasking of networks.
John D. Foley*, Metron, Inc.
John C. Baez, University of California, Riverside
Joseph Moeller, University of California, Riverside
Blake S. Pollard, University of California, Riverside


10:30 a.m.
Operads for modeling networks.
Joseph Moeller*, University of California, Riverside
John Foley, Metron Inc.
John C. Baez, University of California, Riverside
Blake S. Pollard, University of California, Riverside


Sunday November 5, 2017, 2:00 p.m.-4:50 p.m.

2:00 p.m.
Reeb graph smoothing via cosheaves.
Vin de Silva, Department of Mathematics, Pomona College


3:00 p.m.
Knowledge representation in bicategories of relations.
Evan Patterson*, Stanford University, Statistics Department


3:30 p.m.
The multiresolution analysis of flow graphs.
Steve Huntsman*, BAE Systems


4:00 p.m.
Data modeling and integration using the open source tool Algebraic Query Language (AQL).
Peter Y. Gates*, Categorical Informatics
Ryan Wisnesky, Categorical Informatics


September 25, 2017

Jordan EllenbergGame report: Cubs 5, Brewers 0

  • I guess the most dominant pitching performance I’ve seen in person?  Quintana never seemed dominant.  The Brewers hit a lot of balls hard.  But a 3-hit complete game shutout is a 3-hit complete game shutout.
  • A lot of Cubs fans. A lot a lot.  My kids both agreed there were more Cubs than Brewers fans there, in a game that probably mattered more to Milwaukee.
  • For Cubs fans to boo Ryan Braun in Wrigley Field is OK, I guess.  To come to Miller Park and boo Ryan Braun is classless.  Some of those people were wearing Sammy Sosa jerseys!
  • This is the first time I’ve sat high up in the outfield.  And the view was great, as it’s been from every other seat I’ve ever occupied there.  A really nice design.  If only the food were better.

September 23, 2017

John PreskillStanding back at Stanford

T-shirt 1

This T-shirt came to mind last September. I was standing in front of a large silver-colored table littered with wires, cylinders, and tubes. Greg Bentsen was pointing at components and explaining their functions. He works in Monika Schleier-Smith’s lab, as a PhD student, at Stanford.

Monika’s group manipulates rubidium atoms. A few thousand atoms sit in one of the cylinders. That cylinder contains another cylinder, an optical cavity, that contains the atoms. A mirror caps each of the cavity’s ends. Light in the cavity bounces off the mirrors.

Light bounces off your bathroom mirror similarly. But we can describe your bathroom’s light accurately with Maxwellian electrodynamics, a theory developed during the 1800s. We describe the cavity’s light with quantum electrodynamics (QED). Hence we call the lab’s set-up cavity QED.

The light interacts with the atoms, entangling with them. The entanglement imprints information about the atoms on the light. Suppose that light escaped from the cavity. Greg and friends could measure the light, then infer about the atoms’ quantum state.

A little light leaks through the mirrors, though most light bounces off. From leaked light, you can infer about the ensemble of atoms. You can’t infer about individual atoms. For example, consider an atom’s electrons. Each electron has a quantum property called a spin. We sometimes imagine the spin as an arrow that points upward or downward. Together, the electrons’ spins form the atom’s joint spin. You can tell, from leaked light, whether one atom’s spin points upward. But you can’t tell which atom’s spin points upward. You can’t see the atoms for the ensemble.

Monika’s team can. They’ve cut a hole in their cylinder. Light escapes the cavity through the hole. The light from the hole’s left-hand edge carries information about the leftmost atom, and so on. The team develops a photograph of the line of atoms. Imagine holding a photograph of a line of people. You can point to one person, and say, “Aha! She’s the xkcd fan.” Similarly, Greg and friends can point to one atom in their photograph and say, “Aha! That atom has an upward-pointing spin.” Monika’s team is developing single-site imaging.

Solvay

Aha! She’s the xkcd fan.

Monika’s team plans to image atoms in such detail, they won’t need for light to leak through the mirrors. Light leakage creates problems, including by entangling the atoms with the world outside the cavity. Suppose you had to diminish the amount of light that leaks from a rubidium cavity. How should you proceed?

Tell the mirrors,

T-shirt 2

You should lengthen the cavity. Why? Imagine a photon, a particle of light, in the cavity. It zooms down the cavity’s length, hits a mirror, bounces off, retreats up the cavity’s length, hits the other mirror, and bounces off. The photon repeats this process until a mirror hit fails to generate a bounce. The mirror transmits the photon to the exterior; the photon leaks out. How can you reduce leaks? By preventing photons from hitting mirrors so often, by forcing the photons to zoom longer, by lengthening the cavity, by shifting the mirrors outward.

So Greg hinted, beside that silver-colored table in Monika’s lab. The hint struck a chord: I recognized the impulse to

T-shirt 3

The impulse had led me to Stanford.

Weeks earlier, I’d written my first paper about quantum chaos and information scrambling. I’d sat and read and calculated and read and sat and emailed and written. I needed to stand up, leave my cavity, and image my work from other perspectives.

Stanford physicists had written quantum-chaos papers I admired. So I visited, presented about my work, and talked. Patrick Hayden introduced me to a result that might help me apply my result to another problem. His group helped me simplify a mathematical expression. Monika reflected that a measurement scheme I’d proposed sounded not unreasonable for cavity QED.

And Greg led me to recognize the principle behind my visit: Sometimes, you have to

T-shirt 4

to move forward.

With gratitude to Greg, Monika, Patrick, and the rest of Monika’s and Patrick’s groups for their time, consideration, explanations, and feedback. With thanks to Patrick and Stanford’s Institute for Theoretical Physics for their hospitality.


September 22, 2017

Tommaso DorigoTop Quarks Observed In Proton-Nucleus Collisions For The First Time

The top quark is the heaviest known matter corpuscle we consider elementary. 
Elementary is an overloaded word in English, so I need to explain what it means in the context of subatomic particles. If we grab a dictionary we get several possibilities, e.g.:

- elementary: pertaining to or dealing with elements, rudiments, or first principles
- elementary: of the nature of an ultimate constituent; uncompounded
- elementary: not decomposable into elements or other primary constituents
- elementary: simple

read more

n-Category Café Schröder Paths and Reverse Bessel Polynomials

I want to show you a combinatorial interpretation of the reverse Bessel polynomials which I learnt from Alan Sokal. The sequence of reverse Bessel polynomials begins as follows.

\begin{aligned}
\theta_0(R) &= 1 \\
\theta_1(R) &= R+1 \\
\theta_2(R) &= R^2+3R+3 \\
\theta_3(R) &= R^3+6R^2+15R+15
\end{aligned}

To give you a flavour of the combinatorial interpretation we will prove, you can see that the second reverse Bessel polynomial can be read off the following set of ‘weighted Schröder paths’: multiply the weights together on each path and add up the resulting monomials.

Schroeder paths

In this post I’ll explain how to prove the general result, using a certain result about weighted Dyck paths that I’ll also prove. At the end I’ll leave some further questions for the budding enumerative combinatorialists amongst you.

These reverse Bessel polynomials have their origins in the theory of Bessel functions, but I've encountered them in the theory of magnitude, and they are key to a formula I give for the magnitude of an odd dimensional ball in a paper which I have just posted on the arXiv.

In that paper I use the combinatorial expression for these Bessel polynomials to prove facts about the magnitude.

Here, to simplify things slightly, I have used the standard reverse Bessel polynomials whereas in my paper I use a minor variant (see below).

I should add that a very similar expression can be given for the ordinary, unreversed Bessel polynomials; you just need a minor modification to the way the weights on the Schröder paths are defined. I will leave that as an exercise.

The reverse Bessel polynomials

The reverse Bessel polynomials have many properties. In particular they satisfy the recursion relation
\theta_{i+1}(R) = R^2\theta_{i-1}(R) + (2i+1)\theta_{i}(R)
and \theta_i(R) satisfies the differential equation
R\theta_i^{\prime\prime}(R) - 2(R+i)\theta_i^\prime(R) + 2i\theta_i(R) = 0.
There's an explicit formula:
\theta_i(R) = \sum_{t=0}^i \frac{(i+t)!}{(i-t)!\, t!\, 2^t} R^{i-t}.
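As a quick sanity check (a sketch of mine using sympy, not from the post), the explicit formula reproduces the polynomials listed at the top and satisfies the recursion relation:

# Reverse Bessel polynomials from the explicit formula, plus a check of the
# recursion theta_{i+1}(R) = R^2 theta_{i-1}(R) + (2i+1) theta_i(R).
from sympy import symbols, factorial, expand

R = symbols('R')

def theta(i):
    return sum(factorial(i + t) / (factorial(i - t) * factorial(t) * 2**t) * R**(i - t)
               for t in range(i + 1))

print([expand(theta(i)) for i in range(4)])
# [1, R + 1, R**2 + 3*R + 3, R**3 + 6*R**2 + 15*R + 15]
for i in range(1, 6):
    assert expand(theta(i + 1) - R**2 * theta(i - 1) - (2*i + 1) * theta(i)) == 0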

I’m interested in them because they appear in my formula for the magnitude of odd dimensional balls. To be more precise, in my formula I use the associated Sheffer polynomials, (\chi_i(R))_{i=0}^\infty; they are related by \chi_i(R) = R\theta_{i-1}(R), so the coefficients are the same, but just moved around a bit. These polynomials have a similar but slightly more complicated combinatorial interpretation.

In my paper I prove that the magnitude of the (2p+1)-dimensional ball of radius R has the following expression:

\left|B^{2p+1}_R \right| = \frac{\det[\chi_{i+j+2}(R)]_{i,j=0}^p}{(2p+1)!\, R\, \det[\chi_{i+j}(R)]_{i,j=0}^p}

As each polynomial \chi_i(R) has a path counting interpretation, one can use the rather beautiful Lindström-Gessel-Viennot Lemma to give a path counting interpretation to the determinants in the above formula and find some explicit expression. I will probably blog about this another time. (Fellow host Qiaochu has also blogged about the LGV Lemma.)

Weighted Dyck paths

Before getting on to Bessel polynomials and weighted Schröder paths, we need to look at counting weighted Dyck paths, which are simpler and more classical.

A Dyck path is a path in the lattice \mathbb{Z}^2 which starts at (0,0), stays in the upper half plane, ends back on the x-axis at (2i,0) and has steps going either diagonally right and up or right and down. The integer 2i is called the length of the path. Let D_{i} be the set of length 2i Dyck paths.

For each Dyck path \sigma, we will weight each edge going right and down, from (x,y) to (x+1,y-1), by y; then we will take w(\sigma), the weight of \sigma, to be the product of all the weights on its steps. Here are all five weighted Dyck paths of length six.

Dyck paths

Famously, the number of Dyck paths of length 2i is given by the ith Catalan number; here, however, we are interested in the number of paths weighted by the weighting(!). If we sum over the weights of each of the above diagrams we get 6+4+2+2+1=15. Note that this is 5\times 3\times 1. This is a pattern that holds in general.

Theorem A. (Françon and Viennot) The weighted count of length 2i Dyck paths is equal to the double factorial of 2i-1:
\begin{aligned}
\sum_{\sigma\in D_{i}} w(\sigma) &= (2i-1)\cdot(2i-3)\cdot(2i-5)\cdots 3\cdot 1 \\
&\eqqcolon (2i-1)!!.
\end{aligned}

The following is a nice combinatorial proof of this theorem that I found in a survey paper by Callan. (I was only previously aware of a high-tech proof involving continued fractions and a theorem of Gauss.)

The first thing to note is that the weight of a Dyck path is actually counting something. It is counting the ways of labelling each of the down steps in the diagram by a positive integer less than or equal to the height (i.e. the weight) of that step. We call such a labelling a height labelling. Note that we have no choice of weighting but we often have a choice of height labelling. Here's a height labelled Dyck path.

height labelled Dyck path

So the weighted count of Dyck paths of length 2i is precisely the number of height labelled Dyck paths of length 2i:
\sum_{\sigma\in D_{i}} w(\sigma) = \#\{\text{height labelled paths of length } 2i\}

We are going to consider marked Dyck paths, which just means we single out a specific vertex. A path of length 2i has 2i+1 vertices. Thus

\#\{\text{height labelled, MARKED paths of length } 2i\} = (2i+1)\times\#\{\text{height labelled paths of length } 2i\}.

Hence the theorem will follow by induction if we find a bijection

\{\text{height labelled paths of length } 2i\} \cong \{\text{height labelled, MARKED paths of length } 2i-2\}.

Such a bijection can be constructed in the following way. Given a height labelled Dyck path, remove the left-hand step and the first step that has a label of one on it. On each down step between these two deleted steps decrease the label by one. Now join the two separated parts of the path together and mark the vertex at which they are joined. Here is an example of the process.

dyck bijection

Working backwards it is easy to describe the inverse map. And so the theorem is proved.
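If you would like to see Theorem A confirmed by brute force for small i, here is a short Python sketch of mine (not from the post): it enumerates Dyck paths as sequences of up and down steps, weights each down step by its starting height, and compares the total with (2i-1)!!.

# Brute-force check of Theorem A for small i (my sketch, not from the post).
from itertools import product

def weighted_dyck_count(i):
    total = 0
    for steps in product((1, -1), repeat=2 * i):
        heights = [0]
        for s in steps:
            heights.append(heights[-1] + s)
        if min(heights) < 0 or heights[-1] != 0:
            continue                      # not a Dyck path
        weight = 1
        for y, s in zip(heights, steps):
            if s == -1:
                weight *= y               # a down step from height y has weight y
        total += weight
    return total

def double_factorial(n):
    return 1 if n <= 0 else n * double_factorial(n - 2)

print([weighted_dyck_count(i) for i in range(1, 6)])    # [1, 3, 15, 105, 945]
print(all(weighted_dyck_count(i) == double_factorial(2 * i - 1) for i in range(1, 6)))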

Schröder paths and reverse Bessel polynomials

In order to give a path theoretic interpretation of reverse Bessel polynomials we will need to use Schröder paths. These are like Dyck paths except we allow a certain kind of flat step.

A Schröder path is a path in the lattice \mathbb{Z}^2 which starts at (0,0), stays in the upper half plane, ends back on the x-axis at (2i,0) and has steps going either diagonally right and up, diagonally right and down, or horizontally two units to the right. The integer 2i is called the length of the path. Let S_{i} be the set of all length 2i Schröder paths.

For each Schröder path \sigma, we will weight each edge going right and down, from (x,y) to (x+1,y-1), by y, and we will weight each flat edge by the indeterminate R. Then we will take w(\sigma), the weight of \sigma, to be the product of all the weights on its steps.

Here is the picture of all six length four weighted Schröder paths again.

Schroeder paths

You were asked at the top of this post to check that the sum of the weights equals the second reverse Bessel polynomial. Of course that result generalizes!

The following theorem was shown to me by Alan Sokal; he proved it using continued fraction methods, but these essentially amount to the combinatorial proof I'm about to give.

Theorem B. The weighted count of length 2i Schröder paths is equal to the ith reverse Bessel polynomial:
\sum_{\sigma\in S_{i}} w(\sigma) = \theta_{i}(R).

The idea is to observe that you can remove the flat steps from a weighted Schröder path to obtain a weighted Dyck path. If a Schröder path has length 2i and t upward steps then it has t downward steps and i-t flat steps, so it has a total of i+t steps. This means that there are \binom{i+t}{i-t} length 2i Schröder paths with the same underlying length 2t Dyck path (we just choose where to insert the flat steps). Let's write S^t_{i} for the set of Schröder paths of length 2i with t upward steps.
\begin{aligned}
\sum_{\sigma\in S_{i}} w(\sigma) &= \sum_{t=0}^{i} \sum_{\sigma\in S^t_{i}} w(\sigma) = \sum_{t=0}^{i} \binom{i+t}{i-t}\sum_{\sigma'\in D_t} w(\sigma')\,R^{i-t}\\
&= \sum_{t=0}^{i} \binom{i+t}{i-t}(2t-1)!!\,R^{i-t}\\
&= \sum_{t=0}^{i} \frac{(i+t)!}{(i-t)!\,(2t)!}\frac{(2t)!}{2^t t!}R^{i-t}\\
&= \theta_{i}(R),
\end{aligned}
where the last equality comes from the formula for \theta_{i}(R) given at the beginning of the post.

Thus we have the required combinatorial interpretation of the reverse Bessel polynomials.
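For small i you can also confirm Theorem B by brute force. Here is a sketch of mine (not from the post) that sums the weights of Schröder paths recursively and compares the result with the explicit formula for \theta_i(R), using sympy for the polynomial algebra.

# Brute-force check of Theorem B for small i (my sketch, not from the post).
from sympy import symbols, factorial, expand

R = symbols('R')

def theta(i):
    # Explicit formula for the i-th reverse Bessel polynomial.
    return sum(factorial(i + t) / (factorial(i - t) * factorial(t) * 2**t) * R**(i - t)
               for t in range(i + 1))

def schroder_weight_sum(i, height=0, length=0):
    # Sum of weights of ways to finish a Schröder path of total length 2i from
    # the current height: flat steps (length 2) carry weight R, up steps weight 1,
    # and a down step from height y carries weight y.
    if length == 2 * i:
        return 1 if height == 0 else 0
    total = 0
    if length + 2 <= 2 * i:
        total += R * schroder_weight_sum(i, height, length + 2)
    total += schroder_weight_sum(i, height + 1, length + 1)
    if height > 0:
        total += height * schroder_weight_sum(i, height - 1, length + 1)
    return total

print(expand(schroder_weight_sum(2)))    # R**2 + 3*R + 3
print(all(expand(schroder_weight_sum(i) - theta(i)) == 0 for i in range(5)))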

Further questions

The first question that springs to mind for me is whether it is possible to give a bijective proof of Theorem B, similar in style, perhaps (or perhaps not), to the proof given of Theorem A, basically using the recursion relation
\theta_{i+1}(R) = R^2\theta_{i-1}(R) + (2i+1)\theta_{i}(R)
rather than the explicit formula for them.

The second question would be whether the differential equation
R\theta_i^{\prime\prime}(R) - 2(R+i)\theta_i^\prime(R) + 2i\theta_i(R) = 0
has some sort of combinatorial interpretation in terms of paths.

I’m interested to hear if anyone has any thoughts.

n-Category Café Lattice Paths and Continued Fractions II

Last time we proved Flajolet’s Fundamental Lemma about enumerating Dyck paths. This time I want to give some examples, in particular to relate this to what I wrote previously about Dyck paths, Schröder paths and what they have to do with reverse Bessel polynomials.

We’ll see that the generating function of the sequence of reverse Bessel polynomials \left(\theta_i(R)\right)_{i=0}^\infty has the following continued fraction expansion.

\sum_{i=0}^\infty \theta_i(R)\, t^i = \frac{1}{1-Rt- \frac{t}{1-Rt - \frac{2t}{1-Rt- \frac{3t}{1-\dots}}}}

I’ll even give you a snippet of SageMath code so you can have a play around with this if you like.

Flajolet’s Fundamental Lemma

Let’s just recall from last time that if we take Motzkin paths weighted by a_i's, b_i's and c_i's as in this example,

weighted Motzkin path

then when we sum the weightings of all Motzkin paths together we have the following continued fraction expression.
\sum_{\sigma\,\,\mathrm{Motzkin}} w_{a,b,c}(\sigma) = \frac{1}{1- c_{0} - \frac{a_{1} b_{1}}{1-c_{1} - \frac{a_{2} b_{2}}{1- c_2 - \frac{a_3 b_3}{1-\dots}}}} \in \mathbb{Z}[[a_i, b_i, c_i]]

Jacobi continued fractions and Motzkin paths

Flajolet’s Fundamental Lemma is very beautiful, but we want a power series going up in terms of path length. So let's use another variable t to keep track of path length. All three types of step in a Motzkin path have length one. We can set a_i = \alpha_i t, b_i = \beta_i t and c_i = \gamma_i t. Then \sum_{\sigma} w_{a,b,c}(\sigma) \in \mathbb{Z}[\alpha_i, \beta_i, \gamma_i][[t]], and the coefficient of t^\ell will be the sum of the weights of Motzkin paths of length \ell. This coefficient will be a polynomial (rather than a power series) as there are only finitely many paths of a given length.

\sum_{\ell=0}^\infty\left(\sum_{\sigma\,\,\text{Motzkin length}\,\,\ell} w_{\alpha,\beta,\gamma}(\sigma)\right)t^\ell = \frac{1}{1- \gamma_{0}t - \frac{\alpha_{1}\beta_1 t^2}{1-\gamma_{1}t - \frac{\alpha_{2}\beta_2 t^2}{1- \gamma_2 t - \frac{\alpha_3 \beta_3 t^2}{1-\dots}}}}

Such a continued fraction is called a Jacobi (or J-type) continued fraction. They crop up in the study of moments of orthogonal polynomials and also in birth-death processes.

For example, I believe that Euler proved the following Jacobi continued fraction expansion of the generating function of the factorials.
\sum_{\ell=0}^\infty \ell!\, t^\ell = \frac{1}{1- t - \frac{t^2}{1-3t - \frac{4t^2}{1- 5t - \frac{9t^2}{1-\dots}}}}
We can get the right hand side by taking \alpha_i = \beta_i = i and \gamma_i = 2i+1. Here is a Motzkin path weighted in that way.

weighted Motzkin path

The equation above is telling us that if we weight Motzkin paths in that way, then the weighted count of Motzkin paths of length \ell is \ell!, and that deserves an exclamation mark! (You're invited to verify this for Motzkin paths of length 4.)
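If you would rather let a computer take up that invitation, here is a brute-force sketch of mine (not from the post): it enumerates Motzkin paths with an up step to level i weighted by i, a down step from level i weighted by i, and a flat step at level i weighted by 2i+1, and checks the totals against \ell!.

# Brute-force check that the weighted count of Motzkin paths of length ell is ell!
# (my sketch, not from the post).
from itertools import product
from math import factorial

def weighted_motzkin_count(ell):
    total = 0
    for steps in product((1, 0, -1), repeat=ell):   # up, flat, down
        h, weight, ok = 0, 1, True
        for s in steps:
            if s == 1:
                h += 1
                weight *= h              # alpha_i = i for an up step ending at level i
            elif s == -1:
                if h == 0:
                    ok = False           # would dip below the axis
                    break
                weight *= h              # beta_i = i for a down step starting at level i
                h -= 1
            else:
                weight *= 2 * h + 1      # gamma_i = 2i + 1 for a flat step at level i
        if ok and h == 0:
            total += weight
    return total

print(weighted_motzkin_count(4))                                        # 24
print(all(weighted_motzkin_count(ell) == factorial(ell) for ell in range(1, 7)))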

I’ve put some SageMath code at the bottom of this post if you want to check the continued fraction equality numerically.

Stieltjes continued fractions and Dyck paths

A Dyck path is a Motzkin path with no flat steps. So if we weight the flat steps in Motzkin paths with 0, then when we do a weighted count we just count the weighted Dyck paths. This means setting \gamma_i = 0.

Also the weight \alpha_i on an up step always appears with the weight \beta_i on a corresponding down step (what goes up must come down!), so we can simplify things by just putting a weighting \alpha_i\beta_i — which we'll rename as \alpha_i — on the down step from level i and putting a weighting of 1 on each up step. We can call this weighting w_\alpha.

Putting this together we get the following, where we’ve noted that there are no Dyck paths of odd length.

\sum_{n=0}^\infty\left(\sum_{\sigma\,\,\text{Dyck length}\,\,2n} w_\alpha(\sigma)\right)t^{2n} = \frac{1}{1- \frac{\alpha_{1} t^2}{1- \frac{\alpha_{2} t^2}{1- \frac{\alpha_3 t^2}{1-\dots}}}}

This kind of continued fraction is called a Stieltjes (or S-type) continued fraction. Of course, we could replace t^2 by t in the above, without any ill effect.

Previously we proved combinatorially that with the weighting where \alpha_i = i the weighted count of Dyck paths of length 2n was precisely (2n-1)!!. This means that we have proved the following continued fraction expansion of the generating function of the odd double factorials.

\sum_{n=0}^\infty (2n-1)!!\, t^{2n} = \frac{1}{1- \frac{t^2}{1- \frac{2t^2}{1- \frac{3t^2}{1-\dots}}}}

I believe this was originally proved by Gauss, but I have no idea how.

Again there’s some SageMath code at the end for you to see this in action.

Thron continued fractions and Schröder paths

What I’m really interested in, you'll remember, is reverse Bessel polynomials, and these are giving weighted counts of Schröder paths. Using continued fractions in this context is less standard than for Dyck paths and Motzkin paths as above, but it only requires a minor modification. I learnt about this from Alan Sokal.

The difference between Motzkin paths and Schröder paths is that the flat steps have length 2 in Schröder paths. Remember that the power of t was encoding the length, so we just have to assign t^2 to each flat step rather than t. So if we put a_i = t, b_i = \alpha_i t and c_i = \gamma_i t^2 in Flajolet's Fundamental Lemma then we get the following.

\sum_{n=0}^\infty\left(\sum_{\sigma\,\,\text{Schröder length}\,\,2n} w_{\alpha,\gamma}(\sigma)\right)t^{2n} = \frac{1}{1- \gamma_{0}t^2 - \frac{\alpha_{1} t^2}{1-\gamma_{1}t^2 - \frac{\alpha_{2} t^2}{1- \gamma_2 t^2 - \frac{\alpha_3 t^2}{1-\dots}}}}

Here w_{\alpha,\gamma} is the weighting where we put \alpha_i's on the down steps and \gamma_i's on the flat steps.

This kind of continued fraction is called a Thron (or T-type) continued fraction. Again, we could replace t^2 by t in the above, without any ill effect.

We saw before that if we take the weighting w_{rBp}, with \alpha_i := i and \gamma_i := R, such as in the following picture,

weighted Schröder path

then the weighted sum of Schröder paths of length 2n is precisely the nth reverse Bessel polynomial:
\theta_n(R) = \sum_{\sigma\,\,\text{Schröder length}\,\,2n} w_{rBp}(\sigma).

Putting that together with the Thron continued fraction above we get the following Thron continued fraction expansion for the generating function of the reverse Bessel polynomials.

\sum_{n=0}^\infty \theta_n(R)\, t^n = \frac{1}{1-Rt- \frac{t}{1-Rt - \frac{2t}{1-Rt- \frac{3t}{1-\dots}}}}

This expression is given by Paul Barry, without any reference, in the formulas section of the entry in the Online Encyclopedia of Integer Sequences.

See the end of the post for some SageMath code to check this numerically.

In my recent magnitude paper I actually work backwards. I start with the continued fraction expansion as a given, and use Flajolet's Fundamental Lemma to give the Schröder path interpretation of the reverse Bessel polynomials. Of course, I now know that I can bypass the use of continued fractions completely, and have a purely combinatorial proof of this interpretation. Regardless of that, however, the theory of lattice paths and continued fractions remains beautiful.

Appendix: Some SageMath code

It’s quite easy to play around with these continued fractions in SageMath, at least to some finite order. I thought I’d let you have some code to get you started…

Here’s some SageMath code for you to check the Jacobi continued fraction expansion of the generating function of the factorials.

# T = Z[t]
T.<t> = PolynomialRing(ZZ)
# We'll take the truncated continued fraction to be in the 
# ring of rational functions, P = Z(t)
P = Frac(T)

def j_ctd_frac(alphas, gammas):
    if alphas == [] or gammas == []:
        return 1
    else:
        return P(1/(1 - gammas[0]*t - alphas[0]*t^2* 
                        j_ctd_frac(alphas[1:], gammas[1:])))

cf(t) = j_ctd_frac([1, 4, 9, 16, 25, 36], [1, 3, 5, 7, 9, 11]) 
print cf(t).series(t, 10)

The above code can be used to define a Stieltjes continued fraction and check out the expansion of Gauss on the odd double factorials.

def s_ctd_frac(alphas):
    gammas = [0]*len(alphas)
    return j_ctd_frac(alphas, gammas)

cf(t) = s_ctd_frac([1, 2, 3, 4, 5, 6])
print cf(t).series(t, 13)

Here’s the code for getting the reverse Bessel polynomials from a Thron continued fraction.

S.<R> = PolynomialRing(ZZ)
T.<t> = PowerSeriesRing(S)

def t_ctd_frac(alphas, gammas):
    if alphas == [] or gammas == []:
        return 1
    else: 
        return (1/(1- gammas[0]*t^2 - alphas[0]*t^2* 
                      t_ctd_frac(alphas[1:], gammas[1:])))

print(T(t_ctd_frac([1, 2, 3, 4, 5, 6], [R, R, R, R, R, R]),
        prec=13))
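
Finally, here is a small cross-check that does not go through continued fractions at all: a brute-force enumeration of weighted Schröder paths, summing the weights directly. This is my own sketch rather than code from the original post, and the helper name schroeder_weight_sum is mine; it assumes the conventions above (a down step from height h weighted by h, a flat step of length 2 weighted by R).

# Brute-force check, independent of the continued fraction: sum w_rBp over all
# Schröder paths of length 2n.
S.<R> = PolynomialRing(ZZ)

def schroeder_weight_sum(n):
    def walk(pos, height):
        # weighted count of path-completions from (pos, height) to (2*n, 0)
        # that never go below the horizontal axis
        if pos == 2*n:
            return S(1) if height == 0 else S(0)
        total = walk(pos + 1, height + 1)                # up step, weight 1
        if height > 0:
            total += height * walk(pos + 1, height - 1)  # down step, weight alpha_h = h
        if pos + 2 <= 2*n:
            total += R * walk(pos + 2, height)           # flat step, weight gamma_h = R
        return total
    return walk(0, 0)

for n in range(5):
    print(schroeder_weight_sum(n))
# expect 1, R + 1, R^2 + 3*R + 3, R^3 + 6*R^2 + 15*R + 15, ...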

September 21, 2017

n-Category Café Applied Category Theory 2018

We’re having a conference on applied category theory!

The plenary speakers will be:

  • Samson Abramsky (Oxford)
  • John Baez (UC Riverside)
  • Kathryn Hess (EPFL)
  • Mehrnoosh Sadrzadeh (Queen Mary)
  • David Spivak (MIT)

There will be a lot more to say as this progresses, but for now let me just quote from the conference website.

Applied Category Theory (ACT 2018) is a five-day workshop on applied category theory running from April 30 to May 4 at the Lorentz Center in Leiden, the Netherlands.

Towards an integrative science: in this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one scientific discipline can be reused in another. The aim of the workshop is to (1) explore the use of category theory within and across different disciplines, (2) create a more cohesive and collaborative ACT community, especially among early-stage researchers, and (3) accelerate research by outlining common goals and open problems for the field.

While the workshop will host talks on a wide range of applications of category theory, there will be four special tracks on exciting new developments in the field:

  1. Dynamical systems and networks
  2. Systems biology
  3. Cognition and AI
  4. Causality

Accompanying the workshop will be an Adjoint Research School for early-career researchers. This will comprise a 16 week online seminar, followed by a 4 day research meeting at the Lorentz Center in the week prior to ACT 2018. Applications to the school will open prior to October 1, and are due November 1. Admissions will be notified by November 15.

Sincerely,
The organizers

Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford)

We welcome any feedback! Please send comments to this link.

About Applied Category Theory

Category theory is a branch of mathematics originally developed to transport ideas from one branch of mathematics to another, e.g. from topology to algebra. Applied category theory refers to efforts to transport the ideas of category theory from mathematics to other disciplines in science, engineering, and industry.

This site originated from discussions at the Computational Category Theory Workshop at NIST on Sept. 28-29, 2015. It serves to collect and disseminate research, resources, and tools for the development of applied category theory, and hosts a blog for those involved in its study.

The Proposal: Towards an Integrative Science

Category theory was developed in the 1940s to translate ideas from one field of mathematics, e.g. topology, to another field of mathematics, e.g. algebra. More recently, category theory has become an unexpectedly useful and economical tool for modeling a range of different disciplines, including programming language theory [10], quantum mechanics [2], systems biology [12], complex networks [5], database theory [7], and dynamical systems [14].

A category consists of a collection of objects together with a collection of maps between those objects, satisfying certain rules. Topologists and geometers use category theory to describe the passage from one mathematical structure to another, while category theorists are also interested in categories for their own sake. In computer science and physics, many types of categories (e.g. topoi or monoidal categories) are used to give a formal semantics of domain-specific phenomena (e.g. automata [3], or regular languages [11], or quantum protocols [2]). In the applied category theory community, a long-articulated vision understands categories as mathematical workspaces for the experimental sciences, similar to how they are used in topology and geometry [13]. This has proved true in certain fields, including computer science and mathematical physics, and we believe that these results can be extended in an exciting direction: we believe that category theory has the potential to bridge specific different fields, and moreover that developments in such fields (e.g. automata) can be transferred successfully into other fields (e.g. systems biology) through category theory. Already, for example, the categorical modeling of quantum processes has helped solve an important open problem in natural language processing [9].

In this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one discipline can be reused in another. Tangibly and in the short-term, we will bring together people from different disciplines in order to write an expository survey paper that grounds the varied research in applied category theory and lays out the parameters of the research program.

In formulating this research program, we are motivated by recent successes where category theory was used to model a wide range of phenomena across many disciplines, e.g. open dynamical systems (including open Markov processes and open chemical reaction networks), entropy and relative entropy [6], and descriptions of computer hardware [8]. Several talks will address some of these new developments. But we are also motivated by an open problem in applied category theory, one which was observed at the most recent workshop in applied category theory (Dagstuhl, Germany, in 2015): “a weakness of semantics/CT is that the definitions play a key role. Having the right definitions makes the theorems trivial, which is the opposite of hard subjects where they have combinatorial proofs of theorems (and simple definitions). […] In general, the audience agrees that people see category theorists only as reconstructing the things they knew already, and that is a disadvantage, because we do not give them a good reason to care enough” [1, pg. 61].

In this workshop, we wish to articulate a natural response to the above: instead of treating the reconstruction as a weakness, we should treat the use of categorical concepts as a natural part of transferring and integrating knowledge across disciplines. The restructuring employed in applied category theory cuts through jargon, helping to elucidate common themes across disciplines. Indeed, the drive for a common language and comparison of similar structures in algebra and topology is what led to the development of category theory in the first place, and recent hints show that this approach is not only useful between mathematical disciplines, but between scientific ones as well. For example, the ‘Rosetta Stone’ of Baez and Stay demonstrates how symmetric monoidal closed categories capture the common structure between logic, computation, and physics [4].

[1] Samson Abramsky, John C. Baez, Fabio Gadducci, and Viktor Winschel. Categorical methods at the crossroads. Report from Dagstuhl Perspectives Workshop 14182, 2014.

[2] Samson Abramsky and Bob Coecke. A categorical semantics of quantum protocols. In Handbook of Quantum Logic and Quantum Structures. Elsevier, Amsterdam, 2009.

[3] Michael A. Arbib and Ernest G. Manes. A categorist’s view of automata and systems. In Ernest G. Manes, editor, Category Theory Applied to Computation and Control. Springer, Berlin, 2005.

[4] John C. Baez and Mike Stay. Physics, topology, logic and computation: a Rosetta stone. In Bob Coecke, editor, New Structures for Physics. Springer, Berlin, 2011.

[5] John C. Baez and Brendan Fong. A compositional framework for passive linear networks. arXiv e-prints, 2015.

[6] John C. Baez, Tobias Fritz, and Tom Leinster. A characterization of entropy in terms of information loss. Entropy, 13(11):1945-1957, 2011.

[7] Michael Fleming, Ryan Gunther, and Robert Rosebrugh. A database of categories. Journal of Symbolic Computing, 35(2):127-135, 2003.

[8] Dan R. Ghica and Achim Jung. Categorical semantics of digital circuits. In Ruzica Piskac and Muralidhar Talupur, editors, Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design. Springer, Berlin, 2016.

[9] Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Stephen Pulman, and Bob Coecke. Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Logic and Algebraic Structures in Quantum Computing and Information. Cambridge University Press, Cambridge, 2013.

[10] Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55-92, 1991.

[11] Nicholas Pippenger. Regular languages and Stone duality. Theory of Computing Systems 30(2):121-134, 1997.

[12] Robert Rosen. The representation of biological systems from the standpoint of the theory of categories. Bulletin of Mathematical Biophysics, 20(4):317-341, 1958.

[13] David I. Spivak. Category Theory for Scientists. MIT Press, Cambridge MA, 2014.

[14] David I. Spivak, Christina Vasilakopoulou, and Patrick Schultz. Dynamical systems and sheaves. arXiv e-prints, 2016.

September 19, 2017

Tim GowersTwo infinities that are surprisingly equal

It has been in the news recently — or rather, the small corner of the news that is of particular interest to mathematicians — that Maryanthe Malliaris and Saharon Shelah recently had an unexpected breakthrough when they stumbled on a proof that two infinities were equal that had been conjectured, and widely believed, to be distinct. Or rather, since both were strictly between the cardinality of the natural numbers and the cardinality of the reals, they were widely believed to be distinct in some models of set theory where the continuum hypothesis fails.

A couple of days ago, John Baez was sufficiently irritated by a Quanta article on this development that he wrote a post on Google Plus in which he did a much better job of explaining what was going on. As a result of reading that, and following and participating in the ensuing discussion, I have got interested in the problem. In particular, as a complete non-expert, I am struck that a problem that looks purely combinatorial (though infinitary) should, according to Quanta, have a solution that involves highly non-trivial arguments in proof theory and model theory. It makes me wonder, again as a complete non-expert so probably very naively, whether there is a simpler purely combinatorial argument that the set theorists missed because they believed too strongly that the two infinities were different.

I certainly haven’t found such an argument, but I thought it might be worth at least setting out the problem, in case it appeals to anyone, and giving a few preliminary thoughts about it. I’m not expecting much from this, but if there’s a small chance that it leads to a fruitful mathematical discussion, then it’s worth doing. As I said above, I am indebted to John Baez and to several commenters on his post for being able to write much of what I write in this post, as can easily be checked if you read that discussion as well.

A few definitions and a statement of the result

The problem concerns the structure you obtain when you take the power set of the natural numbers and quotient out by the relation “has a finite symmetric difference with”. That is, we regard two sets A and B as equivalent if you can turn A into B by removing finitely many elements and adding finitely many other elements.

It’s easy to check that this is an equivalence relation. We can also define a number of the usual set-theoretic operations. For example, writing [A] for the equivalence class of A, we can set [A]\cap[B] to be [A\cap B], [A]\cup [B] to be [A\cup B], [A]^c to be [A^c], etc. It is easy to check that these operations are well-defined.

What about the subset relation? That too has an obvious definition. We don’t want to say that [A]\subset[B] if A\subset B, since that is not well-defined. However, we can define A to be almost contained in B if the set A\setminus B is finite, and then say that [A]\subset[B] if A is almost contained in B. This is well-defined and it’s also easy to check that it is true if and only if [A]\cap[B]=[A], which is the sort of thing we’d like to happen if our finite-fuzz set theory is to resemble normal set theory as closely as possible.

I will use a non-standard piece of terminology and refer to an equivalence class of sets as an f-set, the “f” standing for “finite” or “fuzzy” (though these fuzzy sets are not to be confused with the usual definition of fuzzy sets, which I don’t know and probably never will know). I’ll also say things like “is f-contained in” (which means the same as “is almost contained in” except that it refers to the f-sets rather than to representatives of their equivalence classes).

So far so good, but things start to get a bit less satisfactory when we consider infinite intersections and unions. How are we to define \bigcap_{n=1}^\infty[A_n], for example?

An obvious property we would like is that the intersection should be the largest f-set that is contained in all the [A_n]. However, simple examples show that there doesn’t have to be a largest f-set contained in all the [A_n]. Indeed, let A_1\supset A_2\supset\dots be an infinite sequence of subsets of \mathbb N such that A_n\setminus A_{n+1} is infinite for every n. Then A is almost contained in every A_n if and only if A\setminus A_n is finite for every n. Given any such set, we can find for each n an element b_n of A_n\setminus A_{n+1} that is not contained in A (since A_n\setminus A_{n+1} is infinite but A\setminus A_{n+1} is finite). Then the set B=A\cup\{b_1,b_2,\dots\} is also almost contained in every A_n, and [A] is properly contained in [B] (in the obvious sense).

OK, we don’t seem to have a satisfactory definition of infinite intersections, but we could at least hope for a satisfactory definition of “has an empty intersection”. And indeed, there is an obvious one. Given a collection of f-sets [A_\gamma], we say that its intersection is empty if the only f-set that is f-contained in every [A_\gamma] is [\emptyset]. (Note that [\emptyset] is the equivalence class of the empty set, which consists of all finite subsets of \mathbb N.) In terms of the sets rather than their equivalence classes, this is saying that there is no infinite set that is almost contained in every A_\gamma.

An important concept that appears in many places in mathematics, but particularly in set theory, is the finite-intersection property. A collection \mathcal A of subsets of a set X is said to have this property if A_1\cap\dots\cap A_n is non-empty whenever A_1,\dots,A_n\in\mathcal A. This definition carries over to f-sets with no problem at all, since finite f-intersections were easy to define.

Let’s ask ourselves a little question here: can we find a collection of f-sets with the finite-intersection property but with an empty intersection? That is, no finite intersection is empty, but the intersection of all the f-sets is empty.

That should be pretty easy. For sets, there are very simple examples like A_n=\{n,n+1,\dots\} — finitely many of those have a non-empty intersection, but there is no set that’s contained in all of them.

Unfortunately, all those sets are the same if we turn them into f-sets. But there is an obvious way of adjusting the example: we just take sets A_1\supset A_2\supset\dots such that A_n\setminus A_{n+1} is infinite for each n and \bigcap_{n=1}^\infty A_n=\emptyset. That ought to do the job once we turn each A_n into its equivalence class [A_n].

Except that it doesn’t do the job. In fact, we’ve already observed that we can just pick a set B=\{b_1,b_2,\dots\} with b_n\in A_n\setminus A_{n+1} and then [B] will be a non-empty f-intersection of the A_n.
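
To see that failure in a concrete toy case (my own example, not one from the post), take A_n to be the set of multiples of 2^n: the sets are nested, each A_n\setminus A_{n+1} is infinite, and the A_n have empty intersection, yet the diagonally chosen set B=\{2,4,8,\dots\} is almost contained in every A_n. A quick sketch in Python:

# Toy instance of the failed attempt: A_n = multiples of 2^n, with b_n = 2^n
# picked from A_n \ A_{n+1}, so B = {2, 4, 8, ...}.
B = {2**k for k in range(1, 40)}

def B_minus_A(n):
    """B minus A_n, where A_n is the set of multiples of 2^n."""
    return sorted(b for b in B if b % 2**n != 0)

for n in range(1, 6):
    print(n, B_minus_A(n))   # always the finite set {2, 4, ..., 2^(n-1)}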

However, here’s an example that does work. We’ll take all f-sets [A] such that A has density 1. (This means that n^{-1}|A\cap\{1,2,\dots,n\}| tends
to 1 as n tends to infinity.) Since the intersection of any two sets of density 1 has density 1 (a simple exercise), this collection of f-sets has the finite-intersection property. I claim that any f-set contained in all these f-sets must be [\emptyset].

Indeed, let B be an infinite set and (b_1,b_2,\dots) the enumeration of its elements in increasing order. We can pick a subsequence (c_1,c_2,\dots) such that c_n\geq 2^n for every n, and the corresponding subset C is an infinite subset of B with density zero. Therefore, \mathbb N\setminus C is a set of density 1 that does not almost contain B.
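
As a quick numerical illustration of that last step (my own sketch, taking B to be all of \mathbb N as the worst case and c_n = 2^n), the density of C along initial segments visibly tends to zero:

# Worst case: B = {1, 2, 3, ...}. Choosing c_n = 2^n gives an infinite subset
# C of B whose density tends to zero, so its complement has density 1 yet
# fails to almost-contain B.
C = [2**n for n in range(1, 40)]

def density_up_to(A, N):
    """The proportion of {1, ..., N} that lies in A."""
    return sum(1 for a in A if a <= N) / N

for N in (10**3, 10**4, 10**5, 10**6):
    print(N, density_up_to(C, N))
# densities 9/1000, 13/10000, 16/100000, 19/1000000: tending to 0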

The number of f-sets we took there in order to achieve an f-empty intersection was huge: the cardinality of the continuum. (That’s another easy exercise.) Did we really need that many? This innocent question leads straight to a definition that is needed in order to understand what Malliaris and Shelah did.

Definition. The cardinal p is the smallest cardinality of a collection F of f-sets such that F has the finite-intersection property but also F has an empty f-intersection.

It is simple to prove that this cardinal is uncountable, and it is also known that it need not be as big as the cardinality of the continuum (meaning, again, that there are models of set theory — necessarily ones where CH fails — in which it is strictly smaller). So it is a rather nice intermediate cardinal, which partially explains its interest to set theorists.

The cardinal p is one of the two infinities that Malliaris and Shelah proved were the same. The other one is closely related. Define a tower to be a collection of f-sets that does not contain [\emptyset] and is totally ordered by inclusion. Note that a tower T trivially satisfies the finite-intersection property: if [A_1],\dots,[A_n] belong to T, then the smallest of the f-sets [A_i] is the f-intersection and it isn’t f-empty. So let’s make another definition.

Definition. The cardinal t is the smallest cardinality of a tower T that has an empty f-intersection.

Since a tower has the finite-intersection property, we are asking for something stronger than before, and hence at least as hard to obtain. It follows that t is at least as large as p.

And now we have the obvious question: is the inequality strict? As I have said, it was widely believed that it was, and a big surprise when Malliaris and Shelah proved that the two infinities were in fact equal.

What does this actually say? It says that if you can find a bunch F of f-sets with the finite-intersection property and an empty f-intersection, then you can find a totally ordered example T that has at most the cardinality of F.

Why is the problem hard?

I don’t have a sophisticated answer to this that would explain why it is hard to experts in set theory. I just want to think about why it might be hard to prove the statement using a naive approach.

An immediate indication that things might be difficult is that it isn’t terribly easy to give any example of a tower with an empty f-intersection, let alone one with small cardinality.

An indication of the problem we face was already present when I gave a failed attempt to construct a system of sets with the finite-intersection property and empty intersection. I took a nested sequence [A_1]\supset[A_2]\supset such that the sets A_n had empty intersection, but that didn’t work because I could pick an element from each A_n\setminus A_{n+1} and put those together to make a non-empty f-intersection. (I’m using “f-intersection” to mean any f-set f-contained in all the given f-sets. In general, we can’t choose a largest one, so it’s far from unique. The usual terminology would be to say that if A is almost contained in every set from a collection of sets, then A is a pseudointersection of that collection. But I’m trying to express as much as possible in terms of f-sets.)

Anyone who is familiar with ordinal hierarchies will see that there is an obvious thing we could do here. We could start as above, and then when we find the annoying f-intersection we simply add it to the tower and call it [A_\omega]. And then inside [A_\omega] we can find another nested decreasing sequence of sets and call those [A_{\omega+1}], [A_{\omega+2}],\dots and so on. Those will also have a non-empty f-intersection, which we could call [A_{2\omega}], and so on.

Let’s use this idea to prove that there do exist towers with empty f-intersections. I shall build a collection of non-empty f-sets [A_\alpha] by transfinite induction. If I have already built [A_\alpha], I let [A_{\alpha+1}] be any non-empty f-set that is strictly f-contained in [A_\alpha]. That tells me how to build my sets at successor ordinals. If \alpha is a limit ordinal, then I’ll take A_\alpha to be a non-empty f-intersection of all the [A_\beta] with \beta<\alpha.

But how am I so sure that such an f-intersection exists? I’m not, but if it doesn’t exist, then I’m very happy, as that means that the f-sets [A_\beta] with \beta<\alpha form a tower with empty f-intersection.

Since all the f-sets in this tower are distinct, the process has to terminate at some point, and that implies that a tower with empty f-intersection must exist.

For a lot of ordinal constructions like this, one can show that the process terminates at the first uncountable ordinal, \omega_1. To set theorists, this has extremely small cardinality — by definition, the smallest one after the cardinality of the natural numbers. In some models of set theory, there will be a dizzying array of cardinals between this and the cardinality of the continuum.

In our case it is not too hard to prove that the process doesn’t terminate before we get to the first uncountable ordinal. Indeed, if \alpha is a countable limit ordinal, then we can take an increasing sequence of ordinals \alpha_n that tend to \alpha, pick an element b_n from A_{\alpha_n}\setminus A_{\alpha_{n+1}}, and define A_\alpha to be \{b_1,b_2,\dots\}.

However, there doesn’t seem to be any obvious argument to say that the f-sets [A_\alpha] with \alpha<\omega_1 have an empty f-intersection, even if we make some effort to keep our sets small (for example, by defining A_{\alpha+1} to consist of every other element of A_\alpha). In fact, we sort of know that there won’t be such an argument, because if there were, then it would show that there was a tower whose cardinality was that of the first uncountable ordinal. That would prove that t had this cardinality, and since p is uncountable (that is easy to check) we would immediately know that p and t were equal.

So that’s already an indication that something subtle is going on that you need to be a proper set theorist to understand properly.

But do we need to understand these funny cardinalities to solve the problem? We don’t need to know what they are — just to prove that they are the same. Perhaps that can still be done in a naive way.

So here’s a very naive idea. Let’s take a set F of f-sets with the finite intersection property and empty f-intersection, and let’s try to build a tower T with empty intersection using only sets from F. This would certainly be sufficient for showing that T has cardinality at most that of F, and if F has minimal cardinality it would show that p=t.

There’s almost no chance that this will work, but let’s at least see where it goes wrong, or runs into a brick wall.

At first things go swimmingly. Let [A]\in F. Then there must exist an f-set [A']\in F that does not f-contain [A], since otherwise [A] itself would be a non-empty f-intersection for F. But then [A]\cap [A'] is a proper f-subset of [A], and by the finite-intersection property it is not f-empty.

By iterating this argument, we can therefore obtain a nested sequence [A_1]\supset[A_2]\supset\dots of f-sets in F.

The next thing we’d like to do is create [A_\omega]. And this, unsurprisingly, is where the brick wall is. Consider, for example, the case where F consists of all sets of density 1. What if we stupidly chose A_n in such a way that \min A_n\geq 2^n for every n? Then our diagonal procedure — picking an element from each set A_n\setminus A_{n+1} — would yield a set of density zero. Of course, we could go for a different diagonal procedure. We would need to prove that for this particular F and any nested sequence we can always find an f-intersection that belongs to F. That’s equivalent to saying that for any nested sequence A_1\supset A_2\supset\dots of sets of density 1 we can find a set A such that A\setminus A_n is finite for every n and A has density 1.

That’s a fairly simple (but not trivial) exercise I think, but when I tried to write a proof straight down I failed — it’s more like a pen-and-paper job until you get the construction right. But here’s the real question I’d like to know the answer to right at this moment. It splits into two questions actually.

Question 1. Let F be a collection of f-sets with the finite-intersection property and no non-empty f-intersection. Let [A_1]\supset[A_2]\supset\dots be a nested sequence of elements of F. Must this sequence have an f-intersection that belongs to F?

Question 2. If, as seems likely, the answer to Question 1 is no, must it at least be the case that there exists a nested sequence in F with an f-intersection that also belongs to F?

If the answer to Question 2 turned out to be yes, it would naturally lead to the following further question.

Question 3. If the answer to Question 2 is yes, then how far can we go with it? For example, must F contain a nested transfinite sequence of uncountable length?

Unfortunately, even a positive answer to Question 3 would not be enough for us, for reasons I’ve already given. It might be the case that we can indeed build nice big towers in F, but that the arguments stop working once we reach the first uncountable ordinal. Indeed, it might well be known that there are sets F with the finite-intersection property and no non-empty f-intersection that do not contain towers that are bigger than this. If that’s the case, it would give at least one serious reason for the problem being hard. It would tell us that we can’t prove the equality by just finding a suitable tower inside F: instead, we’d need to do something more indirect, constructing a tower T and some non-obvious injection from T to F. (It would be non-obvious because it would not preserve the subset relation.)

Another way the problem might be difficult is if F does contain a tower with no non-empty f-intersection, but we can’t extend an arbitrary tower in F to a tower with this property. Perhaps if we started off building our tower the wrong way, it would lead us down a path that had a dead end long before the tower was big enough, even though good paths and good towers did exist.

But these are just pure speculations on my part. I’m sure the answers to many of my questions are known. If so, I’ll be interested to hear about it, and to understand better why Malliaris and Shelah had to use big tools and a much less obvious argument than the kind of thing I was trying to do above.


September 13, 2017

John BaezComplex Adaptive Systems (Part 5)

When we design a complex system, we often start with a rough outline and fill in details later, one piece at a time. And if the system is supposed to be adaptive, these details may need to be changed as the system is actually being used!

The use of operads should make this easier. One reason is that an operad typically has more than one algebra.

Remember from Part 3: an operad has operations, which are abstract ways of sticking things together. An algebra makes these operations concrete: it specifies some sets of actual things, and how the operations in the operad get implemented as actual ways to stick these things together.

So, an operad O can have one algebra in which things are described in a bare-bones, simplified way, and another algebra in which things are described in more detail. Indeed it will typically have many algebras, corresponding to many levels of detail, but let’s just think about two for a minute.

When we have a ‘less detailed’ algebra A and a ‘more detailed’ algebra A', they will typically be related by a map

f : A' \to A

which ‘forgets the extra details’. This map should be a ‘homomorphism’ of algebras, but I’ll postpone the definition of that concept.

What we often want to do, when designing a system, is not forget extra detail, but rather add extra detail to some rough specification. There is not always a systematic way to do this. If there is, then we may have a homomorphism

g : A \to A'

going back the other way. This is wonderful, because it lets us automate the process of filling in the details. But we can’t always count on being able to do this—especially not if we want an optimal or even acceptable result. So, often we may have to start with an element of A and search for elements of A' that are mapped to it by f : A' \to A.

Let me give some examples. I’ll take the operad that I described last time, and describe some of its algebras, and homomorphisms between these.

I’ll start with an algebra that has very little detail: its elements will be simple graphs. As the name suggests, these are among the simplest possible ways of thinking about networks. They just look like this:

Then I’ll give an algebra with more detail, where the vertices of our simple graphs are points in the plane. There’s nothing special about the plane: we could replace the plane by any other set, and get another algebra of our operad. For example, we could use the set of points on the surface of the Caribbean Sea, the blue stuff in the rectangle here:

That’s what we might use in a search and rescue operation. The points could represent boats, and the edges could represent communication channels.

Then I’ll give an algebra with even more detail, where two points connected by an edge can’t be too far apart. This would be good for range-limited communication channels.

Then I’ll give an algebra with still more detail, where the locations of the points are functions of time. Now our boats are moving around!

Okay, here we go.

The operad from last time was called O_G. Here G is the network model of simple graphs. The best way to picture an operation of O_G is as a way of sticking together a list of simple graphs to get a new simple graph.

For example, an operation

f \in O_G(3,4,2;9)

is a way of sticking together a simple graph with 3 vertices, one with 4 vertices and one with 2 vertices to get one with 9 vertices. Here’s a picture of such an operation:

Note that this operation is itself a simple graph. An operation in O_G(3,4,2;9) is just a simple graph with 9 vertices, where we have labelled the vertices from 1 to 9.

This operad comes with a very obvious algebra A where the operations do just what I suggested. In this algebra, an element of A(t) is a simple graph with t vertices, listed in order. Here t is any natural number, which I’m calling ‘t’ for ‘type’.

We also need to say how the operations in O_G act on these sets A(t). If we take simple graphs in A(3), A(4), and A(2):

we can use our operation f to stick them together and get this:

But we can also make up a more interesting algebra of O_G. Let’s call this algebra A'. We’ll let an element of A'(t) be a simple graph with t vertices, listed in order, which are points in the plane.

My previous pictures can be reused to show how operations in O_G act on this new algebra A'. The only difference is that now we treat the vertices literally as points in the plane! Before, you should have been imagining them as abstract points not living anywhere; now they have locations.

Now let’s make up an even more detailed algebra A''.

What if our communication channels are ‘range-limited’? For example, what if two boats can’t communicate if they are more than 100 kilometers apart?

Then we can let an element of A''(t) be a simple graph with t vertices in the plane such that no two vertices connected by an edge have distance > 100.

Now the operations of our operad O_G act in a more interesting way. If we have an operation, and we apply it to elements of our algebra, it ‘tries’ to put in new edges as it did before, but it ‘fails’ for any edge that would have length > 100. In other words, we just leave out any edges that would be too long.

It took me a while to figure this out. At first I thought the result of the operation would need to be undefined whenever we tried to create an edge that violated the length constraint. But in fact it acts in a perfectly well-defined way: we just don’t put in edges that would be too long!

This is good. This means that if you tell two boats to set up a communication channel, and they’re too far apart, you don’t get the ‘blue screen of death’: your setup doesn’t crash and burn. Instead, you just get a polite warning—‘communication channel not established’—and you can proceed.
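
To make the mechanism concrete, here is a minimal sketch of how such an algebra could be represented in code. This is purely my own toy model, not code from the paper: an element of A''(t) is a list of t planar points together with a set of edges, an operation is a set of edges on the combined vertex list, and any attempted edge longer than the range limit is silently dropped. The names act and MAX_RANGE are mine.

from math import dist
from itertools import accumulate

MAX_RANGE = 100.0   # the hypothetical range limit (the post's 100 kilometres)

def act(operation_edges, graphs):
    """Act with an operation (edges on the combined vertices) on a list of
    (points, edges) pairs, dropping any edge that would be out of range."""
    offsets = [0] + list(accumulate(len(pts) for pts, _ in graphs))
    points = [p for pts, _ in graphs for p in pts]
    edges = {(i + off, j + off)
             for (_, es), off in zip(graphs, offsets)
             for (i, j) in es}
    edges |= set(operation_edges)
    # 'try' to add every edge, but keep only those within communication range
    edges = {(i, j) for (i, j) in edges if dist(points[i], points[j]) <= MAX_RANGE}
    return points, edges

# two nearby boats and one far-away boat
g1 = ([(0.0, 0.0), (30.0, 40.0)], {(0, 1)})
g2 = ([(500.0, 0.0)], set())
print(act({(1, 2)}, [g1, g2]))   # the attempted edge from vertex 1 to vertex 2 is dropped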

The nontrivial part is to check that if we do this, we really get an algebra of our operad! There are some laws that must hold in any algebra. But since I haven’t yet described those laws, I won’t check them here. You’ll have to wait for our paper to come out.

Let’s do one more algebra today. For lack of creativity I’ll call it A'''. Now an element of A'''(t) is a time-dependent graph in the plane with t vertices, listed in order. Namely, the positions of the vertices depend on time, and the presence or absence of an edge between two vertices can also depend on time. Furthermore, let’s impose the requirement that any two vertices can only be connected by an edge at times when their distance is ≤ 100.

When I say ‘functions of time’ here, what do I mean by ‘time’? We can model time by some interval [T_1, T_2]. But if you don’t like that, you can change it.

This algebra A''' works more or less like A''. The operations of O_G try to create edges, but these edges only ‘take’ at times when the vertices they connect have distance ≤ 100.

There’s something here you might not like. Our operations can only try to create edges ‘for all times’… and succeed at times when the vertices are close enough. We can’t try to set up a communication channel for a limited amount of time.

But fear not: this is just a limitation in our chosen network model, ‘simple graphs’. With a fancier network model, we’d get a fancier operad, with fancier operations. Right now I’m trying to keep the operad simple (pun not intended), and show you a variety of different algebras.

As you might expect, we have algebra homomorphisms going from more detailed algebras to less detailed ones:

f_T : A''' \to A'', \quad h : A' \to A

The homomorphism h takes a simple graph in the plane and forgets the location of its vertices. The homomorphism f_T depends on a choice of time T \in [T_1, T_2]. For any time T, it takes a time-dependent graph in the plane and evaluates it at that time, getting a graph in the plane (which obeys the distance constraints, since the time-dependent graph obeyed those constraints at any time).
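
Continuing the same toy representation from the sketch above (again my own names, purely illustrative), an element of A''' can be modelled as a function of time returning points and edges, and f_T just evaluates it at the chosen time T:

MAX_RANGE = 100.0   # same hypothetical range limit as in the earlier sketch

def moving_boats(T):
    """A toy element of A'''(2): one fixed boat and one drifting east over time."""
    points = [(0.0, 0.0), (50.0 + 10.0 * T, 0.0)]
    # the communication channel only 'takes' while the boats are within range
    edges = {(0, 1)} if points[1][0] - points[0][0] <= MAX_RANGE else set()
    return points, edges

def f_T(time_dependent_graph, T):
    """Evaluation at time T, modelling the homomorphism f_T from the post."""
    return time_dependent_graph(T)

print(f_T(moving_boats, 2))   # distance 70: the edge is present
print(f_T(moving_boats, 8))   # distance 130: the edge is absent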

We do not have a homomorphism g: A'' \to A' that takes a simple graph in the plane obeying our distance constraints and forgets about those constraints. There’s a map g sending elements of A'' to elements of A' in this way. But it’s not an algebra homomorphism! The problem is that first trying to connect two graphs with an edge and then applying g may give a different result than first applying g and then connecting two graphs with an edge.

In short: a single operad has many algebras, which we can use to describe our desired system at different levels of detail. Algebra homomorphisms relate these different levels of detail.

Next time I’ll look at some more interesting algebras of the same operad. For example, there’s one that describes a system of interacting mobile agents, which move around in some specific way, determined by their location and the locations of the agents they’re communicating with.

Even this is just the tip of the iceberg—that is, still a rather low level of detail. We can also introduce stochasticity (that is, randomness). And to go even further, we could switch to a more sophisticated operad, based on a fancier ‘network model’.

But not today.


Richard EastherSet The Controls For The Heart of Saturn

PIA01482.jpg

It has been a bitter-sweet month for solar system explorers.

As a teenager and a space-geek, I had a poster of this iconic montage of Saturn's moons, composed from images taken by the two Voyager probes as they rushed past the ringed planet.

Time passes. In August, the Voyager mission celebrated its 40th anniversary; the twin spacecraft are heading into interstellar space but still regularly dispatch data to Earth, their signals growing fainter and fainter as they travel further and further from home. With luck they will survive another decade, but they must eventually fall silent and will journey mutely to the stars.

This week on September 15, the Cassini spacecraft – the most recent visitor human beings have sent to Saturn – will meet a far more emphatic demise. Launched 20 years ago, Cassini arrived at Saturn in 2004 and took up orbit around the giant planet, making hundreds of loops through its retinue of moons and skimming the iconic rings. It's about to run out of the propellant needed for manoeuvring; theoretically it might circle Saturn forever, but we could no longer steer it. 

So, as I write this, Cassini has climbed away from Saturn for the final time to make a flyby of Titan, Saturn's largest moon, which has nudged it into a collision course with the giant planet. 

Cassini's spectacular finale is, in part, a tribute to its success. All spacecraft carry stowaways: bacterial spores which can, remarkably, remain viable in the vacuum of space. And one of Cassini's many discoveries was that three of Saturn's moons appear to have oceans of water beneath their solid crusts. Leaving the spacecraft to wander unguided around the Saturnian system would risk a collision with one of these moons. So, rather than let its microbial hitchhikers disembark onto a pristine world, Cassini will not "go gentle into that good night" as its propellant runs low but has been steered towards a fiery demise.

Geysers of warm water from the sub-surface oceans of Enceladus.

For cosmologists like myself, the contents of our own solar system (and even our own galaxy, much of the time) are things to look past rather than to look at. The complex worlds that circle our sun can seem to be a motley collection of adorable oddballs when compared to the deep simplicity of the universe itself,  like Shakespearean comic turns in the foreground with the serious actors behind.

If cosmologists were architects, I suspect most of us would be austere modernists, whereas planetary scientists might prefer rococo delights adorned with complicated facades and gratuitous flourishes.

Saturn's moon Mimas, with the impact crater Herschel.

Saturn is a prime example: beyond its gaudy rings and complex cloud tops, it boasts an astonishingly diverse collection of moons. These include Titan, the only known moon with its own atmosphere, to which Cassini dispatched the Huygens lander; and Mimas, a battered icy world sporting a giant impact crater that makes it look uncannily like the Death Star. ("That's no moon, it's a space station" – and yet this, indeed, is a moon.)

NASA has an undoubted ability to sell a story, and it has been making the most of the anthropomorphic appeal of this brave little $3 billion, 5 ton, plutonium-powered spacecraft on its two-decade mission. But the hype is not misplaced: Saturn has a key place in the evolving human understanding of the cosmos. "Childlike wonder" is both a cliché and the literal truth when we speak of space. The most distant planet easily visible to the naked eye, Saturn once marked the apparent edge of our solar system. Its rings are visible through even the smallest of telescopes, and seeing them this way still takes my breath away. Cassini has shown us Saturn with its rings and its moons up close and personal, with astonishing clarity and precision. The spacecraft and the team of scientists responsible for it have written themselves into the history books.

Beyond revealing the universe to us, space exploration exposes our own  small place in the big picture. Saturn's rings, backlit by the distant sun, dominate what may be the most haunting image returned by Cassini. After taking in the planet's dark bulk and golden rings, our eye drifts to the pale blue dot in the lower right hand corner of the frame, and the image's full weight is revealed: it is a lovely, lonely, long-distance portrait of the Earth and of humanity itself.

The earth, as seen from Saturn.


CODA: All images courtesy of NASA, JPL and/or the ESA. The header image is from a visualisation of Cassini's final plunge into Saturn's atmosphere. 

As a physicist, Saturn always stuns me with the complex orbital dynamics of its moons, the chaotically braided and filigreed rings, and its astonishing atmospheric dynamics. Truly, there is something here for everyone. 

No introduction necessary.
A hexagon storm at Saturn's north pole. Fluid dynamics at its best.
Braided structure in the rings.
Hyperion.
Enceladus.
Titan.
The surface of Titan, from the Huygens lander.