Planet Musings

January 16, 2018

Backreaction: Book Review: “The Dialogues” by Clifford Johnson

Clifford Johnson is a veteran of the science blogosphere, a long-term survivor, around already when I began blogging and one of the few still at it today. He is a professor in the Department of Physics and Astronomy at the University of Southern California (in LA). I had the pleasure of meeting Clifford in 2007. Who’d have thought back then that 10 years later we would both be in the midst of

Jordan Ellenberg: Linger

Dolores O’Riordan, singer in the Cranberries, died today.  In the fall of 1993 I was living in an apartment by myself for the first time, the Baltimorean on N. Charles Street.  I was devoting myself full-time to being a writer and kind of hating it.  I didn’t know anyone in Baltimore and the people in my program were mostly older than me and socially inaccessible and I was lonely.   The apartment was always too hot.  I ate spaghetti with jar sauce for dinner by myself and listened to “Linger.”  It’s still the sound of loneliness to me, after all these years.

 

Jordan Ellenberg: Farblandia

The job fell to me of giving an overview talk about Benson Farb’s entire career at his birthday conference last fall.  Hard task!  I didn’t cover nearly everything but I think I gave a decent impression of what Farbisme is all about.

 

John Baez: The Kepler Problem (Part 1)

Johannes Kepler loved geometry, so of course he was fascinated by Platonic solids. His early work Mysterium Cosmographicum, written in 1596, includes pictures showing how the 5 Platonic solids correspond to the 5 elements:

Five elements? Yes, besides earth, air, water and fire, he includes a fifth element that doesn’t feel the Earth’s gravitational pull: the ‘quintessence’, or ‘aether’, from which heavenly bodies are made.

In the same book he also tried to use the Platonic solids to explain the orbits of the planets:

The six planets are Mercury, Venus, Earth, Mars, Jupiter and Saturn. And the tetrahedron and cube, in case you’re wondering, sit outside the largest sphere shown above. You can see them in another picture from Kepler’s book:

These ideas may seem goofy now, but studying the exact radii of the planets’ orbits led him to discover that these orbits aren’t circular: they’re ellipses! By 1619 this led him to what we call Kepler’s laws of planetary motion. And those, in turn, helped Newton verify Hooke’s hunch that the force of gravity goes as the inverse square of the distance between bodies!

In honor of this, the problem of a particle orbiting in an inverse square force law is called the Kepler problem.

So, I’m happy that Greg Egan, Layra Idarani and I have come across a solid mathematical connection between the Platonic solids and the Kepler problem.

But this involves a detour into the 4th dimension!

It’s a remarkable fact that the Kepler problem has not just the expected conserved quantities—energy and the 3 components of angular momentum—but also 3 more: the components of the Runge–Lenz vector. To understand those extra conserved quantities, go here:

• Greg Egan, The ellipse and the atom.

Noether proved that conserved quantities come from symmetries. Energy comes from time translation symmetry. Angular momentum comes from rotation symmetry. Since the group of rotations in 3 dimensions, called SO(3), is itself 3-dimensional, it gives 3 conserved quantities, which are the 3 components of angular momentum.

None of this is really surprising. But if we take the angular momentum together with the Runge–Lenz vector, we get 6 conserved quantities—and these turn out to come from the group of rotations in 4 dimensions, SO(4), which is itself 6-dimensional. The obvious symmetries in this group just rotate a planet’s elliptical orbit, while the unobvious ones can also squash or stretch it, changing the eccentricity of the orbit.

(To be precise, all this is true only for the ‘bound states’ of the Kepler problem: the circular and elliptical orbits, not the parabolic or hyperbolic ones, which work in a somewhat different way. I’ll only be talking about bound states in this post!)
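For readers who like seeing conservation laws concretely, here is a minimal numerical sketch (mine, not from the post) that integrates a bound Kepler orbit in units with GM = 1 and checks that the energy, the angular momentum L = r × v and the Runge–Lenz vector A = v × L − r/|r| all stay constant along the orbit; numpy and scipy are assumed.

    # Sketch: conservation of energy, angular momentum and the Runge-Lenz
    # vector along a bound Kepler orbit (units with GM = m = 1).
    import numpy as np
    from scipy.integrate import solve_ivp

    def rhs(t, s):
        r, v = s[:3], s[3:]
        return np.concatenate([v, -r / np.linalg.norm(r)**3])

    s0 = np.array([1.0, 0.0, 0.0, 0.0, 0.8, 0.0])   # speed below escape velocity: an ellipse
    sol = solve_ivp(rhs, (0, 50), s0, rtol=1e-10, atol=1e-12, dense_output=True)

    for t in np.linspace(0, 50, 6):
        r, v = sol.sol(t)[:3], sol.sol(t)[3:]
        E = 0.5 * v @ v - 1 / np.linalg.norm(r)      # energy
        L = np.cross(r, v)                           # angular momentum
        A = np.cross(v, L) - r / np.linalg.norm(r)   # Runge-Lenz vector
        print(f"t={t:5.1f}  E={E:+.6f}  L={L}  A={A}")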

Why should the Kepler problem have symmetries coming from rotations in 4 dimensions? This is a fascinating puzzle—we know a lot about it, but I doubt the last word has been spoken. For an overview, go here:

• John Baez, Mysteries of the gravitational 2-body problem.

This SO(4) symmetry applies not only to the classical mechanics of the inverse square force law, but also to the quantum mechanics! Nobody cares much about the quantum mechanics of two particles attracting gravitationally via an inverse square force law—but people care a lot about the quantum mechanics of hydrogen atoms, where the electron and proton attract each other via their electric field, which also obeys an inverse square force law.

So, let’s talk about hydrogen. And to keep things simple, let’s pretend the proton stays fixed while the electron orbits it. This is a pretty good approximation, and experts will know how to do things exactly right. It requires only a slight correction.

It turns out that wavefunctions for bound states of hydrogen can be reinterpreted as functions on the 3-sphere, S3. The sneaky SO(4) symmetry then becomes obvious: it just rotates this sphere! And the Hamiltonian of the hydrogen atom is closely connected to the Laplacian on the 3-sphere. The Laplacian has eigenspaces of dimension n^2 where n = 1,2,3,…, and these correspond to the eigenspaces of the hydrogen atom Hamiltonian. The number n is called the principal quantum number, and the hydrogen atom’s energy is proportional to -1/n^2.

If you don’t know all this jargon, don’t worry! All you need to know is this: if we find an eigenfunction of the Laplacian on the 3-sphere, it will give a state where the hydrogen atom has a definite energy. And if this eigenfunction is invariant under some subgroup of SO(4), so is the corresponding state of the hydrogen atom!
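As a quick arithmetic check of those degeneracies (my own aside, not part of the post): the n-th level has one orbital for each ℓ = 0, …, n−1 and m = −ℓ, …, ℓ, and these add up to n^2.

    # Sketch: the n-th hydrogen level has sum_{l=0}^{n-1} (2l+1) = n^2 states,
    # all with energy proportional to -1/n^2.
    for n in range(1, 6):
        degeneracy = sum(2 * l + 1 for l in range(n))
        print(n, degeneracy, degeneracy == n**2, -1.0 / n**2)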

The biggest finite subgroup of SO(4) is the rotational symmetry group of the 600-cell, a wonderful 4-dimensional shape with 120 vertices and 600 tetrahedral cells. The rotational symmetry group of this shape has a whopping 7,200 elements! And here is a marvelous moving image, made by Greg Egan, of an eigenfunction of the Laplacian on S3 that’s invariant under this 7,200-element group:

We’re seeing the wavefunction on a moving slice of the 3-sphere, which is a 2-sphere. This wavefunction is actually real-valued. Blue regions are where this function is positive, yellow regions where it’s negative—or maybe the other way around—and black is where it’s almost zero. When the image fades to black, our moving slice is passing through a 2-sphere where the wavefunction is almost zero.

For a full explanation, go here:

• Greg Egan, In the chambers with seven thousand symmetries, 2 January 2018.
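If you want to play with that 7,200-element group yourself, here is a small sketch (mine, using the standard coordinates for the 600-cell) that builds its 120 vertices as unit quaternions, i.e. the binary icosahedral group 2I; the rotation group of the 600-cell is then (2I × 2I)/±1, which has 120·120/2 = 7,200 elements.

    # Sketch: the 120 vertices of the 600-cell as unit quaternions.
    from itertools import permutations, product

    phi = (1 + 5**0.5) / 2

    def even_perms(seq):
        out = []
        for p in permutations(range(4)):
            inversions = sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4))
            if inversions % 2 == 0:                  # keep only even permutations
                out.append(tuple(seq[i] for i in p))
        return out

    verts = set()
    for signs in product([1, -1], repeat=4):         # 16 vertices (+-1/2, +-1/2, +-1/2, +-1/2)
        verts.add(tuple(s * 0.5 for s in signs))
    for i in range(4):                               # 8 vertices: permutations of (+-1, 0, 0, 0)
        for s in (1.0, -1.0):
            v = [0.0] * 4
            v[i] = s
            verts.add(tuple(v))
    for s1, s2, s3 in product([1, -1], repeat=3):    # 96 even permutations of (+-phi, +-1, +-1/phi, 0)/2
        for v in even_perms((s1 * phi / 2, s2 * 0.5, s3 / (2 * phi), 0.0)):
            verts.add(tuple(round(c, 9) for c in v))

    print(len(verts))                                                 # 120
    print(all(abs(sum(c * c for c in v) - 1) < 1e-8 for v in verts))  # all on the unit 3-sphere
    print(120 * 120 // 2)                                             # 7200 rotational symmetries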

Layra Idarani has come up with a complete classification of all eigenfunctions of the Laplacian on S3 that are invariant under this group… or more generally, eigenfunctions of the Laplacian on a sphere of any dimension that are invariant under the even part of any Coxeter group. For the details, go here:

• Layra Idarani, SG-invariant polynomials, 4 January 2018.

All that is a continuation of a story whose beginning is summarized here:

• John Baez, Quantum mechanics and the dodecahedron.

So, there’s a lot of serious math under the hood. But right now I just want to marvel at the fact that we’ve found a wavefunction for the hydrogen atom that not only has a well-defined energy, but is also invariant under this 7,200-element group. This group includes the usual 60 rotational symmetries of a dodecahedron, but also other much less obvious symmetries.

I don’t have a good picture of what these less obvious symmetries do to the wavefunction of a hydrogen atom. I understand them a bit better classically—where, as I said, they squash or stretch an elliptical orbit, changing its eccentricity while not changing its energy.

We can have fun with this using the old quantum theory—the approach to quantum mechanics that Bohr and Sommerfeld developed from 1913 to 1925, before Schrödinger introduced wavefunctions.

In the old Bohr–Sommerfeld approach to the hydrogen atom, the quantum states with specified energy, total angular momentum and angular momentum about a fixed axis were drawn as elliptical orbits. In this approach, the symmetries that squash or stretch elliptical orbits are a bit easier to visualize:

This picture by Pieter Kuiper shows some orbits at the 5th energy level, n = 5: namely, those with different eigenvalues of the total angular momentum, ℓ.

While the old quantum theory was superseded by the approach using wavefunctions, it’s possible to make it mathematically rigorous for the hydrogen atom. So, we can draw elliptical orbits that rigorously correspond to a basis of wavefunctions for the hydrogen atom, and I believe we can draw the orbits corresponding to the basis elements whose linear combination gives the wavefunction shown as a function on the 3-sphere in Greg’s picture above!

We should get a bunch of ellipses forming a complicated picture with dodecahedral symmetry. This would make Kepler happy.
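Here is a minimal sketch (mine, not Kuiper’s actual figure) of what such a picture looks like at n = 5, using the usual old-quantum-theory relations: the semi-major axis is n^2 Bohr radii and the orbit with azimuthal quantum number k = 1, …, n has eccentricity e = sqrt(1 − (k/n)^2). (This ignores the off-by-one subtlety discussed in the postscript below.)

    # Sketch: Bohr-Sommerfeld ellipses at the n = 5 level, one per azimuthal
    # quantum number k = 1..n, all with the same energy but different shapes.
    import numpy as np
    import matplotlib.pyplot as plt

    n = 5
    a = n**2                                           # semi-major axis in Bohr radii
    theta = np.linspace(0, 2 * np.pi, 400)
    for k in range(1, n + 1):
        e = np.sqrt(1 - (k / n)**2)
        r = a * (1 - e**2) / (1 + e * np.cos(theta))   # conic with the nucleus at a focus
        plt.plot(r * np.cos(theta), r * np.sin(theta), label=f"k = {k}")
    plt.scatter([0], [0], marker="+")                  # the nucleus
    plt.gca().set_aspect("equal")
    plt.legend()
    plt.show()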

As a first step in this direction, Greg drew the collection of orbits that results when we take a circle and apply all the symmetries of the 600-cell:

For more details, read this:

• Greg Egan, Kepler orbits with the symmetries of the 600-cell.

Postscript

To do this really right, one should learn a bit about ‘old quantum theory’. I believe people have been getting it a bit wrong for quite a while—starting with Bohr and Sommerfeld!

If you look at the ℓ = 0 orbit in the picture above, it’s a long skinny ellipse. But I believe it really should be a line segment straight through the proton: that’s what an orbit with no angular momentum looks like.

There’s a paper about this:

• Manfred Bucher, Rise and fall of the old quantum theory.

Matt McIrvin had some comments on this:

This paper from 2008 is the kind of thing I really like: an exploration of an old, incomplete theory that takes it further than anyone actually did at the time.

It has to do with the Bohr-Sommerfeld “old quantum theory”, in which electrons followed definite orbits in the atom, but these were quantized–not all orbits were permitted. Bohr managed to derive the hydrogen spectrum by assuming circular orbits, then Sommerfeld did much more by extending the theory to elliptical orbits with various shapes and orientations. But there were some problems that proved maddeningly intractable with this analysis, and it eventually led to the abandonment of the “orbit paradigm” in favor of Heisenberg’s matrix mechanics and Schrödinger’s wave mechanics, what we know as modern quantum theory.

The paper argues that the old quantum theory was abandoned prematurely. Many of the problems Bohr and Sommerfeld had came not from the orbit paradigm per se, but from a much simpler bug in the theory: namely, their rejection of orbits in which the electron moves entirely radially and goes right through the nucleus! Sommerfeld called these orbits “unphysical”, but they actually correspond to the s orbital states in the full quantum theory, with zero angular momentum. And, of course, in the full theory the electron in these states does have some probability of being inside the nucleus.

So Sommerfeld’s orbital angular momenta were always off by one unit. The hydrogen spectrum came out right anyway because of the happy accident of the energy degeneracy of certain orbits in the Coulomb potential.

I guess the states they really should have been rejecting as “unphysical” were Bohr’s circular orbits: no radial motion would correspond to exactly zero radial momentum in the full theory, and we can’t have that for a confined electron because of the uncertainty principle.

January 15, 2018

n-Category Café: On the Magnitude Function of Domains in Euclidean Space, I

Guest post by Heiko Gimperlein and Magnus Goffeng.

The magnitude of a metric space was born, nearly ten years ago, on this blog, although it went by the name of cardinality back then. There has been much development since (for instance, see Tom Leinster and Mark Meckes’ survey of what was known in 2016). Basic geometric questions about magnitude, however, remain open even for compact subsets of $\mathbb{R}^n$: Tom Leinster and Simon Willerton suggested that magnitude could be computed from intrinsic volumes, and the algebraic origin of magnitude created hopes for an inclusion-exclusion principle.

In this sequence of three posts we would like to discuss our recent article, which is about asymptotic geometric content in the magnitude function and also how it relates to scattering theory.

For “nice” compact domains in n\mathbb{R}^n we prove an asymptotic variant of Leinster and Willerton’s conjecture, as well as an asymptotic inclusion-exclusion principle. Starting from ideas by Juan Antonio Barceló and Tony Carbery, our approach connects the magnitude function with ideas from spectral geometry, heat kernels and the Atiyah-Singer index theorem.

We will also address the location of the poles in the complex plane of the magnitude function. For example, here is a plot of the poles and zeros of the magnitude function of the 21-dimensional ball.

poles and zeros of the magnitude function of the 21-dim ball

We thank Simon for inviting us to write this post and also for his paper on the magnitude of odd balls as the computations in it rescued us from some tedious combinatorics.

The plan for the three café posts is as follows:

  1. State the recent results on the asymptotic behaviour as a metric space is scaled up and on the meromorphic extension of the magnitude function.

  2. Discuss the proof in the toy case of a compact domain $X \subseteq \mathbb{R}$ and indicate how it generalizes to arbitrary odd dimension.

  3. Consider the relationship of the methods to geometric analysis and potential ramifications; also state some open problems that could be interesting.

Asymptotic results

As you may recall, the magnitude $\mathrm{mag}(X)$ of a finite subset $X \subseteq \mathbb{R}^n$ is easy to define: let $w \colon X \to \mathbb{R}$ be a function which satisfies

$$\sum_{y\in X} \mathrm{e}^{-\mathrm{d}(x,y)} w(y) = 1 \quad \text{for all}\ x \in X.$$

Such a function is called a weighting. Then the magnitude is defined as the sum of the weights:

$$\mathrm{mag}(X) = \sum_{x \in X} w(x).$$

For a compact subset $X$ of $\mathbb{R}^n$, Mark Meckes shows that all reasonable definitions of magnitude are equal to what you get by taking the supremum of the magnitudes of all finite subsets of $X$:

$$\mathrm{mag}(X) = \sup \{\mathrm{mag}(\Xi) : \Xi \subset X \ \text{finite} \}.$$

Unfortunately, few explicit computations of the magnitude for a compact subset of $\mathbb{R}^n$ are known.
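To make the finite definition above concrete, here is a minimal numerical sketch (mine, not from the post) that solves the weighting equation for evenly spaced points on a line segment; the magnitude of an interval of length ℓ is known to be 1 + ℓ/2, and the finite approximations approach it from below.

    # Sketch: magnitude of a finite metric space, used to approximate the
    # magnitude of an interval (known to equal 1 + length/2).
    import numpy as np

    def magnitude(points):
        D = np.abs(points[:, None] - points[None, :])   # pairwise distances
        Z = np.exp(-D)                                  # similarity matrix
        w = np.linalg.solve(Z, np.ones(len(points)))    # weighting: Z w = 1
        return w.sum()                                  # magnitude = sum of the weights

    length = 4.0
    for N in (10, 100, 1000):
        print(N, magnitude(np.linspace(0.0, length, N)), "exact:", 1 + length / 2)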

Instead of the magnitude of an individual set $X$, it proves fruitful to study the magnitude of dilates $R\cdot X$ of $X$, for $R > 0$. Here the dilate $R\cdot X$ means the space $X$ with the metric scaled by a factor of $R$. We can vary $R$, and this gives rise to the magnitude function of $X$:

$$\mathcal{M}_X\colon (0,\infty)\to \mathbb{R};\quad \mathcal{M}_X(R) := \mathrm{mag}(R\cdot X)\quad\text{for } R > 0.$$

Tom and Simon conjectured a relation between the magnitude function of $X$ and its intrinsic volumes $(V_i(X))_{i=0}^n$. The intrinsic volumes of subsets of $\mathbb{R}^n$ generalize notions such as volume, surface area and Euler characteristic, with $V_n(X)=\mathrm{vol}_n(X)$ and $V_0(X)=\chi(X)$.

Convex Magnitude Conjecture. Suppose $X \subseteq \mathbb{R}^n$ is compact and convex. Then the magnitude function is a polynomial whose coefficients involve the intrinsic volumes:
$$\mathcal{M}_X(R) = \sum_{i=0}^n \frac{V_i(X)}{i! \,\omega_i} R^i,$$ where $V_i(X)$ is the $i$-th intrinsic volume of $X$ and $\omega_i$ the volume of the $i$-dimensional unit ball.

The conjecture was disproved by Barceló and Carbery (see also this post). They computed the magnitude function of the 5-dimensional ball $B_5$ to be the rational function $$\mathcal{M}_{B_5}(R)=\frac{R^5}{5!} +\frac{3R^5+27R^4+105R^3+216R+72}{24(R+3)}.$$ In particular, the magnitude function of $B_5$ is not even a polynomial. Also, the coefficient of $R^4$ in the asymptotic expansion of $\mathcal{M}_{B_5}(R)$ as $R \to \infty$ does not agree with the conjecture.
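To see the disagreement explicitly, here is a small sympy sketch (mine, not from the article) that extracts the polynomial part of the large-$R$ expansion of the formula above and compares its $R^4$ coefficient with the conjectured value $V_4(B_5)/(4!\,\omega_4)$; if I have set this up correctly, they come out as $1/8$ and $1/9$ respectively.

    # Sketch: polynomial part of the large-R expansion of M_{B_5}(R), and its
    # R^4 coefficient versus the value predicted by the convex magnitude conjecture.
    import sympy as sp

    R = sp.symbols('R', positive=True)
    num = sp.Poly(3*R**5 + 27*R**4 + 105*R**3 + 216*R + 72, R)
    den = sp.Poly(24*(R + 3), R)
    quotient, remainder = sp.div(num, den)
    poly_part = sp.expand(R**5 / sp.factorial(5) + quotient.as_expr())
    print(poly_part)

    # omega_k = volume of the unit k-ball; V_4(B_5) = 5 * omega_5 / omega_1
    omega = lambda k: sp.pi**sp.Rational(k, 2) / sp.gamma(sp.Rational(k, 2) + 1)
    conjectured = 5 * omega(5) / omega(1) / (sp.factorial(4) * omega(4))
    print(sp.simplify(conjectured), "versus", poly_part.coeff(R, 4))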

Nevertheless, for any smooth, compact domain $X\subseteq\mathbb{R}^n$ in odd-dimensional Euclidean space (in other words, the closure of an open bounded subset with smooth boundary), with $n=2m-1$, our main result shows that a modified form of the conjecture holds asymptotically as $R \to \infty$.

Theorem A. Suppose $n=2m-1$ and that $X\subseteq \mathbb{R}^n$ is a smooth domain.

  1. There exists an asymptotic expansion of the magnitude function: $$\mathcal{M}_X(R) \sim \frac{1}{n!\,\omega_n}\sum_{j=0}^\infty c_j(X) R^{n-j} \quad \text{as } R\to \infty$$ with coefficients $c_j(X)\in\mathbb{R}$ for $j=0,1,2,\ldots$.

  2. The first three coefficients are given by $$\begin{aligned} c_0(X)&=\mathrm{vol}_n(X),\\ c_1(X)&=m\,\mathrm{vol}_{n-1}(\partial X),\\ c_2(X)&=\frac{m^2}{2} (n-1)\int_{\partial X} H \,\mathrm{d}S, \end{aligned}$$ where $H$ is the mean curvature of $\partial X$. (The coefficient $c_0$ was computed by Barceló and Carbery.) If $X$ is convex, these coefficients are proportional to the intrinsic volumes $V_{n}(X)$, $V_{n-1}(X)$ and $V_{n-2}(X)$ respectively.

  3. For $j\geq 1$, the coefficient $c_j(X)$ is determined by the second fundamental form $L$ and covariant derivative $\nabla_{\partial X}$ of $\partial X$: $c_j(X)$ is an integral over $\partial X$ of a universal polynomial in the entries of $\nabla_{\partial X}^k L$, $0 \leq k\leq j-2$. The total number of covariant derivatives appearing in each term of the polynomial is $j-2$.

  4. The previous part implies an asymptotic inclusion-exclusion principle: if $A$, $B$ and $A \cap B \subset \mathbb{R}^n$ are smooth, compact domains, then $$\mathcal{M}_{A \cup B}(R) - \mathcal{M}_A(R) - \mathcal{M}_B(R) + \mathcal{M}_{A \cap B}(R) \to 0 \quad \text{as } R \to \infty$$ faster than $R^{-N}$ for all $N$.

If you’re not familiar with the second fundamental form, you should think of it as the container for curvature information of the boundary relative to the ambient Euclidean space. Since Euclidean space is flat, any curvature invariant of the boundary (satisfying reasonable symmetry conditions) will only depend on the second fundamental form and its derivatives.

Note that part 4 of the theorem does not imply that an asymptotic inclusion-exclusion principle holds for all $A$ and $B$, even if $A$ and $B$ are smooth, since the intersection $A \cap B$ usually is not smooth. In fact, it seems unlikely that an asymptotic inclusion-exclusion principle holds for general $A$ and $B$ without imposing curvature conditions, for example by assuming convexity of $A$ and $B$.

The key idea of the short proof relates the computation of the magnitude to classical techniques from geometric and semiclassical analysis, applied to a reformulated problem already studied by Meckes and by Barceló and Carbery. Meckes proved that the magnitude can be computed from the solution to a partial differential equation in the exterior domain $\mathbb{R}^n\setminus X$ with prescribed values in $X$. A careful analysis by Barceló and Carbery refined Meckes’ results and expressed the magnitude by means of the solution to a boundary value problem. We refer to this boundary value problem as the “Barceló-Carbery boundary value problem” below.

Meromorphic extension of the magnitude function

Intriguingly, we find that the magnitude function $\mathcal{M}_X$ extends meromorphically to complex values of the scaling factor $R$. The meromorphic extension was noted by Tom for finite metric spaces and was observed in all previously known examples.

Theorem B. Suppose $n=2m-1$ and that $X\subseteq \mathbb{R}^n$ is a smooth domain.

  • The magnitude function $\mathcal{M}_X$ admits a meromorphic continuation to the complex plane.

  • The poles of $\mathcal{M}_X$ are generalized scattering resonances, and each sector $\{z : |\arg(z)| < \theta \}$ with $\theta < \frac{\pi}{2}$ contains at most finitely many of them (all of them outside $\{z : |\arg(z)| < \frac{\pi}{n+1}\}$).

The ordinary notion of scattering resonances comes from studying waves scattering at a compact obstacle $X\subseteq \mathbb{R}^n$. A scattering resonance is a pole of the meromorphic extension of the solution operator $(R^2-\Delta)^{-1}$ to the Helmholtz equation on $\mathbb{R}^n\setminus X$, with suitable boundary conditions. These resonances determine the long-time behaviour of solutions to the wave equation and are well studied in geometric analysis as well as throughout physics. The Barceló-Carbery boundary value problem is a higher order version of this problem and studies solutions to $(R^2-\Delta)^{m}u=0$ outside $X$. In dimension $n=1$ (i.e. $m=1$), the Barceló-Carbery problem coincides with the Helmholtz problem, and the poles of the magnitude function are indeed scattering resonances. As in scattering theory, one might hope to find detailed structure in the location of the poles. A plot of the poles and zeros in the case of the 21-dimensional ball is given at the top of the post.

The second part of this theorem is sharp. In fact, the poles need not lie in any half plane. Using the techniques of Barceló and Carbery we observe that the magnitude function of the 3-dimensional spherical shell $X=(2B_3)\setminus B_3^\circ$ is not rational and contains an infinite discrete sequence of poles which approaches the curve $\mathrm{Re}(R)= \log(|\mathrm{Im}(R)|)$ as $\mathrm{Re}(R) \to \infty$. Here’s a plot of the poles of $\mathcal{M}_{X}$ with $|\mathrm{Im}(R)| < 30$.

poles 3-d spherical shell

The magnitude function of $X$ is given by $$\mathcal{M}_X(R)=\tfrac{7}{6}R^3 +5R^2 + 2R + 2 + \frac{e^{-2R}(R^2+1)+2R^3-3R^2+2R-1}{\sinh 2R -2R}.$$

[EDIT: The above formula has been corrected, following comments below.]

Our techniques extend to compact domains with a $C^k$ boundary, as long as $k$ is large enough. In this case, the asymptotic inclusion-exclusion principle takes the form $$\mathcal{M}_{A \cup B}(R) - \mathcal{M}_A(R) - \mathcal{M}_B(R) + \mathcal{M}_{A \cap B}(R) \to 0 \quad \text{as } R \to \infty$$ faster than $R^{-N}$ for an $N=N(k)$.

John Baez: Retrotransposons

This article is very interesting:

• Ed Yong, Brain cells share information with virus-like capsules, Atlantic, January 12, 2018.

Your brain needs a protein called Arc. If you have trouble making this protein, you’ll have trouble forming new memories. The neuroscientist Jason Shepherd noticed something weird:

He saw that these Arc proteins assemble into hollow, spherical shells that look uncannily like viruses. “When we looked at them, we thought: What are these things?” says Shepherd. They reminded him of textbook pictures of HIV, and when he showed the images to HIV experts, they confirmed his suspicions. That, to put it bluntly, was a huge surprise. “Here was a brain gene that makes something that looks like a virus,” Shepherd says.

That’s not a coincidence. The team showed that Arc descends from an ancient group of genes called gypsy retrotransposons, which exist in the genomes of various animals, but can behave like their own independent entities. They can make new copies of themselves, and paste those duplicates elsewhere in their host genomes. At some point, some of these genes gained the ability to enclose themselves in a shell of proteins and leave their host cells entirely. That was the origin of retroviruses—the virus family that includes HIV.

It’s worth pointing out that gypsy is the name of a specific kind of retrotransposon. A retrotransposon is a gene that can make copies of itself by first transcribing itself from DNA into RNA and then converting itself back into DNA and inserting itself at other places in your chromosomes.

About 40% of your genome consists of retrotransposons! They seem to mainly be ‘selfish genes’, focused on their own self-reproduction. But some are also useful to you.

So, Arc genes are the evolutionary cousins of these viruses, which explains why they produce shells that look so similar. Specifically, Arc is closely related to a viral gene called gag, which retroviruses like HIV use to build the protein shells that enclose their genetic material. Other scientists had noticed this similarity before. In 2006, one team searched for human genes that look like gag, and they included Arc in their list of candidates. They never followed up on that hint, and “as neuroscientists, we never looked at the genomic papers so we didn’t find it until much later,” says Shepherd.

I love this because it confirms my feeling that viruses are deeply entangled with our evolutionary past. Computer viruses are just the latest phase of this story.

As if that wasn’t weird enough, other animals seem to have independently evolved their own versions of Arc. Fruit flies have Arc genes, and Shepherd’s colleague Cedric Feschotte showed that these descend from the same group of gypsy retrotransposons that gave rise to ours. But flies and back-boned animals co-opted these genes independently, in two separate events that took place millions of years apart. And yet, both events gave rise to similar genes that do similar things: Another team showed that the fly versions of Arc also send RNA between neurons in virus-like capsules. “It’s exciting to think that such a process can occur twice,” says Atma Ivancevic from the University of Adelaide.

This is part of a broader trend: Scientists have in recent years discovered several ways that animals have used the properties of virus-related genes to their evolutionary advantage. Gag moves genetic information between cells, so it’s perfect as the basis of a communication system. Viruses use another gene called env to merge with host cells and avoid the immune system. Those same properties are vital for the placenta—a mammalian organ that unites the tissues of mothers and babies. And sure enough, a gene called syncytin, which is essential for the creation of placentas, actually descends from env. Much of our biology turns out to be viral in nature.

Here’s something I wrote in 1998 when I was first getting interested in this business:

RNA reverse transcribing viruses

RNA reverse transcribing viruses are usually called retroviruses. They have a single-stranded RNA genome. They infect animals, and when they get inside the cell’s nucleus, they copy themselves into the DNA of the host cell using reverse transcriptase. In the process they often cause tumors, presumably by damaging the host’s DNA.

Retroviruses are important in genetic engineering because they raised for the first time the possibility that RNA could be transcribed into DNA, rather than the reverse. In fact, some of them are currently being deliberately used by scientists to add new genes to mammalian cells.

Retroviruses are also important because AIDS is caused by a retrovirus: the human immunodeficiency virus (HIV). This is part of why AIDS is so difficult to treat. Most usual ways of killing viruses have no effect on retroviruses when they are latent in the DNA of the host cell.

From an evolutionary viewpoint, retroviruses are fascinating because they blur the very distinction between host and parasite. Their genome often contains genetic information derived from the host DNA. And once they are integrated into the DNA of the host cell, they may take a long time to reemerge. In fact, so-called endogenous retroviruses can be passed down from generation to generation, indistinguishable from any other cellular gene, and evolving along with their hosts, perhaps even from species to species! It has been estimated that up to 1% of the human genome consists of endogenous retroviruses! Furthermore, not every endogenous retrovirus causes a noticeable disease. Some may even help their hosts.

It gets even spookier when we notice that once an endogenous retrovirus lost the genes that code for its protein coat, it would become indistinguishable from a long terminal repeat (LTR) retrotransposon—one of the many kinds of “junk DNA” cluttering up our chromosomes. Just how much of us is made of retroviruses? It’s hard to be sure.

For my whole article, go here:

Subcellular life forms.

It’s about the mysterious subcellular entities that stand near the blurry border between the living and the non-living—like viruses, viroids, plasmids, satellites, transposons and prions. I need to update it, since a lot of new stuff is being discovered!

Jason Shepherd’s new paper has a few other authors:

• Elissa D. Pastuzyn, Cameron E. Day, Rachel B. Kearns, Madeleine Kyrke-Smith, Andrew V. Taibi, John McCormick, Nathan Yoder, David M. Belnap, Simon Erlendsson, Dustin R. Morado, John A.G. Briggs, Cédric Feschotte and Jason D. Shepherd, The neuronal gene Arc encodes a repurposed retrotransposon gag protein that mediates intercellular RNA transfer, Cell 172 (2018), 275–288.

January 13, 2018

Doug Natelson: About grants: What is cost sharing?

In addition to science, I occasionally use this forum as a way to try to explain to students and the public how sponsored research works in academia.  Previously I wrote about the somewhat mysterious indirect costs.  This time I'd like to discuss cost sharing.

Cost sharing is what it sounds like - when researchers at a university propose a research project, and the funding agency or foundation wants to see the university kick in funding as well (beyond obvious things like the lab space where the investigators work).  Many grants, such as NSF single-investigator awards, expressly forbid explicit cost sharing.  That has certain virtues:  To some extent, it levels the playing field, so that particularly wealthy universities don't have an even larger advantage.  Agencies would all like to see their money leveraged as far as possible, and if cost sharing were unrestricted on grants, you could imagine a situation where wealthy institutions would effectively have an incentive to try to buy their way to grant success by offering big matching funds.   

In other programs, such as the NSF's major research instrumentation program, cost sharing is mandated, but the level is set at a fixed percentage of the total budget.  Similarly, some foundations make it known that they expect university matching at a certain percentage level.  While that might be a reach for some smaller, less-well-off universities when the budget is large, at least it's well-defined.    

Sometimes agencies try to finesse things, forbidding explicit cost sharing but still trying to get universities to invest "skin in the game".  For the NSF materials research science and engineering center program, for example, cost sharing is forbidden (in the sense that explicit promises of $N matching or institutional funding are not allowed), but proposals are required to include a discussion of "organizational commitment":  "Provide a description of the resources that the organization will provide to the project, should it be funded. Resources such as space, faculty release time, faculty and staff positions, capital equipment, access to existing facilities, collaborations, and support of outreach programs should be discussed, but not given as dollar equivalents."

First and foremost the science and broader impacts drive the merit review, but there's no question that an institution that happens to be investing synergistically with the topic of such a proposal would look good.

The big challenge for universities is grants where cost sharing is not forbidden, and no guidance is given about expectations.  There is a game theory dilemma at work, where institutions try to guess what level of cost sharing is really needed to be competitive.   

So where does the money for cost sharing come from on the university side?  Good question.  The details depend on the university.  Departments, deans, and the central administration typically have some financial resources that they can use to support cost sharing, but how these responsibilities get arranged and distributed varies.  

For the open-ended cost sharing situations, one question that comes up is, how much is too much?  As I'd discussed before, university administrations often argue that research is already a money-losing proposition, in the sense that the amount of indirect costs that they bring in does not actually come close to covering the true expenses of supporting the research enterprise.  That would argue in favor of minimizing cost sharing offers, except that schools really do want to land some of these awards.  (Clearly there are non-financial or indirect benefits to doing research, such as scholarly reputation, or universities would stop supporting that kind of work.)  It would be very interesting if someone would set up a rumor-mill-style site, so that institutions could share with peers roughly what they are offering up for certain programs - it would be revealing to see what it takes to be competitive.  

David Hogg: nonlinear dimensionality reduction

Today Brice Ménard (JHU) showed me a new dimensionality-reduction method by him and Dalya Baron. He claims it has no free parameters and good performance. But no paper yet!

David Hogg: #hackAAS number 6

Today was the 6th annual AAS Hack Day, at #AAS231 in Washington DC. (I know it was 6th because of this post.) It was an absolutely great day, organized by Kelle Cruz (CUNY), Meg Schwamb (Gemini), and Jim Davenport (UW & WWU), and sponsored by Northrop Grumman and LSST. The Hack Day has become an integral part of the AAS winter meetings, and it is now a sustainable activity that is easy to organize and sponsor.

My hack for the day was to work on dimensionality and structure in the element-abundance data for Solar Twins (we need a name for this data set) created by Megan Bedell (Flatiron). I was reminded how good it is to bring data sets to the Hack Day! Several others took the data to play with, and Martin Henze (SDSU) went to town, visualizing (in R) the covariances (empirical correlations) among the elements. Some of these correlations are positive, some are negative, and some are tiny. Indeed, his analysis sort-of chunks the elements up into blocks that are related in their formation! This is very promising for my long-term goals of obtaining empirical nucleosynthetic yields.

What I did for Hack Day was to visualize the data in the natural (element-abundance) coordinates, and then again in PCA coordinates, where many of the variances really vanish, so the data really are low dimensional (dammit; my loyal reader knows that I don't want this to be true). And then I also visualized in random orthonormal coordinates (which was fun); this shows that the low-variance PCA space is indeed a very rare or hard-to-find subspace in the full 33-dimensional element space. I also visualized some rotations in the space, which forced me to do some 33-dimensional geometry, which is a bit challenging in a room of enthusiastic hackers!
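For concreteness, here is a rough sketch (not Hogg’s actual code; the data matrix is faked) of the comparison described above: per-coordinate variances of a centered data matrix in PCA coordinates versus in a random orthonormal coordinate system.

    # Sketch: tiny variances show up along principal components but not along
    # random orthonormal directions.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(80, 33)) @ np.diag(np.r_[np.ones(5), 0.05 * np.ones(28)])
    X -= X.mean(axis=0)                              # fake 33-dimensional abundance table

    _, s, _ = np.linalg.svd(X, full_matrices=False)  # PCA via SVD of the centered data
    var_pca = s**2 / len(X)

    Q, _ = np.linalg.qr(rng.normal(size=(33, 33)))   # a random orthonormal basis
    var_rand = ((X @ Q)**2).mean(axis=0)

    print(np.sort(var_pca)[:5])                      # several near-zero variances
    print(np.sort(var_rand)[:5])                     # nothing comparably small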

But so much happened at the hack day. There was a project (led by aforementioned Bedell) to make interactive web-based plots of the exoplanet systems, to visualize multiplicity, insolation, and stellar properties. There was a project to find the “Kevin Bacon of astronomy” which was obviously flawed, since it didn't identify yours truly. But it did make a huge network graph of astronomers who use ORCID. Get using that, people! Erik Tollerud did more work on his hack-of-hacks to build great tools for collecting and recording hacks, but he also was working on a white paper for NASA about software licensing. I gave him some content and co-signed it. MIT for the win. Foreman-Mackey led a set of hacks in which astronomers learned how to use TensorFlow, which uses NVIDIA GPUs for insane speed-up in linear-algebra operations. Usually people use TensorFlow for machine learning, but it is a full linear-algebra library, with auto-differentiation baked in.

The AAS-WWT people were in the house, and Jonathan Fay, as per usual at Hack Days (what a hacker!), pushed a substantial change to the software, to make it understand and visualize velocity maps. Another group (including Schwamb, mentioned above) visualized sky polygons in WWT, and used a citizen-science K2 discovery as its test case for visualizing a telescope focal-plane footprint. There were nice hacks with APIs, with people learning to use the NASA Astrophysics Data System API and Virtual Observatory APIs, and getting different APIs to talk together. One hack was to visualize Julia Sets using Julia! It took the room a few minutes to get the joke, but the visualization was great, and very few lines of code in the end. And there were at least two sewing hacks.

None of this does justice: It was a packed room, about 1/3 of the participants completely new to Hack Days, and great atmosphere and energy. I love my job!

January 12, 2018

Terence Tao: Bi-invariant metrics of linear growth on the free group

Here is a curious question posed to me by Apoorva Khare that I do not know the answer to. Let {F_2} be the free group on two generators {a,b}. Does there exist a metric {d} on this group which is

  • bi-invariant, thus {d(xg,yg)=d(gx,gy) = d(x,y)} for all {x,y,g \in F_2}; and
  • linear growth in the sense that {d(x^n,1) = n d(x,1)} for all {x \in F_2} and all natural numbers {n}?

By defining the “norm” of an element {x \in F_2} to be {\| x\| := d(x,1)}, an equivalent formulation of the problem asks if there exists a non-negative norm function {\| \|: F_2 \rightarrow {\bf R}^+} that obeys the conjugation invariance

\displaystyle  \| gxg^{-1} \| = \|x \| \ \ \ \ \ (1)

for all {x,g \in F_2}, the triangle inequality

\displaystyle  \| xy \| \leq \| x\| + \| y\| \ \ \ \ \ (2)

for all {x,y \in F_2}, and the linear growth

\displaystyle  \| x^n \| = |n| \|x\| \ \ \ \ \ (3)

for all {x \in F_2} and {n \in {\bf Z}}, and such that {\|x\| > 0} for all non-identity {x \in F_2}. Indeed, if such a norm exists then one can just take {d(x,y) := \| x y^{-1} \|} to give the desired metric.

One can normalise the norm of the generators to be at most {1}, thus

\displaystyle  \| a \|, \| b \| \leq 1.

This can then be used to upper bound the norm of other words in {F_2}. For instance, from (1), (3) one has

\displaystyle  \| aba^{-1} \|, \| b^{-1} a b \|, \| a^{-1} b^{-1} a \|, \| bab^{-1}\| \leq 1.

A bit less trivially, from (3), (2), (1) one can bound commutators as

\displaystyle  \| aba^{-1} b^{-1} \| = \frac{1}{3} \| (aba^{-1} b^{-1})^3 \|

\displaystyle  = \frac{1}{3} \| (aba^{-1}) (b^{-1} ab) (a^{-1} b^{-1} a) (b a^{-1}b^{-1}) \|

\displaystyle  \leq \frac{4}{3}.

In a similar spirit one has

\displaystyle  \| aba^{-2} b^{-1} \| = \frac{1}{2} \| (aba^{-2} b^{-1})^2 \|

\displaystyle  = \frac{1}{2} \| (aba^{-1}) (a^{-1} b^{-1} a) (ba^{-1} b^{-1}) (ba^{-1} b^{-1}) \|

\displaystyle  \leq 2.
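As a sanity check (my own, not part of the post), the two factorizations used above are genuine identities in the free group, which can be confirmed with sympy’s free-group arithmetic:

    # Sketch: verify the word identities behind the two commutator bounds above.
    from sympy.combinatorics.free_group import free_group

    F, a, b = free_group("a, b")

    lhs1 = (a * b * a**-1 * b**-1)**3
    rhs1 = (a * b * a**-1) * (b**-1 * a * b) * (a**-1 * b**-1 * a) * (b * a**-1 * b**-1)
    print(lhs1 == rhs1)   # True

    lhs2 = (a * b * a**-2 * b**-1)**2
    rhs2 = (a * b * a**-1) * (a**-1 * b**-1 * a) * (b * a**-1 * b**-1) * (b * a**-1 * b**-1)
    print(lhs2 == rhs2)   # True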

What is not clear to me is if one can keep arguing like this to continually improve the upper bounds on the norm {\| g\|} of a given non-trivial group element {g} to the point where this norm must in fact vanish, which would demonstrate that no metric with the above properties on {F_2} would exist (and in fact would impose strong constraints on similar metrics existing on other groups as well). It is also tempting to use some ideas from geometric group theory (e.g. asymptotic cones) to try to understand these metrics further, though I wasn’t able to get very far with this approach. Anyway, this feels like a problem that might be somewhat receptive to a more crowdsourced attack, so I am posing it here in case any readers wish to try to make progress on it.

Terence Tao: Bi-invariant metrics of linear growth on the free group, II

This post is a continuation of the previous post, which has attracted a large number of comments. I’m recording here some calculations that arose from those comments (particularly those of Pace Nielsen, Lior Silberman, Tobias Fritz, and Apoorva Khare). Please feel free to either continue these calculations or to discuss other approaches to the problem, such as those mentioned in the remaining comments to the previous post.

Let {F_2} be the free group on two generators {a,b}, and let {\| \|: F_2 \rightarrow {\bf R}^+} be a quantity obeying the triangle inequality

\displaystyle \| xy\| \leq \|x \| + \|y\|

and the linear growth property

\displaystyle \| x^n \| = |n| \| x\|

for all {x,y \in F_2} and integers {n \in {\bf Z}}; this implies the conjugation invariance

\displaystyle \| y^{-1} x y \| = \|x\|

or equivalently

\displaystyle \| xy \| = \| yx\|

We consider inequalities of the form

\displaystyle \| xyx^{-1}y^{-1} \| \leq \alpha \|x\| + \beta \| y\| \ \ \ \ \ (1)

or

\displaystyle \| xyx^{-2}y^{-1} \| \leq \gamma \|x\| + \delta \| y\| \ \ \ \ \ (2)

for various real numbers {\alpha,\beta,\gamma,\delta}. For instance, since

\displaystyle \| xyx^{-1}y^{-1} \| \leq \| xyx^{-1}\| + \|y^{-1} \| = \|y\| + \|y\|

we have (1) for {(\alpha,\beta) = (0,2)}. We also have the following further relations:

Proposition 1

  • (i) If (1) holds for {(\alpha,\beta)}, then it holds for {(\beta,\alpha)}.
  • (ii) If (1) holds for {(\alpha,\beta)}, then (2) holds for {(\alpha+1, \frac{\beta}{2})}.
  • (iii) If (2) holds for {(\gamma,\delta)}, then (1) holds for {(\frac{2\gamma}{3}, \frac{2\delta}{3})}.
  • (iv) If (1) holds for {(\alpha,\beta)} and (2) holds for {(\gamma,\delta)}, then (1) holds for {(\frac{2\alpha+1+\gamma}{4}, \frac{\delta+\beta}{4})}.

Proof: For (i) we simply observe that

\displaystyle \| xyx^{-1} y^{-1} \| = \| (xyx^{-1} y^{-1})^{-1} \| = \| y^{-1} x^{-1} y x \| = \| y x y^{-1} x^{-1} \|.

For (ii), we calculate

\displaystyle \| xyx^{-2}y^{-1} \| = \frac{1}{2}\| (xyx^{-2}y^{-1})^2 \|

\displaystyle = \frac{1}{2} \| (xyx^{-2}y^{-1} x) (yx^{-2} y^{-1}) \|

\displaystyle \leq \frac{1}{2} (\| xyx^{-2}y^{-1} x\| + \|yx^{-2} y^{-1}\|)

\displaystyle \leq \frac{1}{2} ( \| x^2 y x^{-2} y^{-1} \| + 2 \|x\| )

\displaystyle \leq \frac{1}{2} ( 2 \alpha \|x\| + \beta \|y\| + 2 \|x\|)

giving the claim.

For (iii), we calculate

\displaystyle \| xyx^{-1}y^{-1}\| = \frac{1}{3} \| (xyx^{-1}y^{-1})^3 \|

\displaystyle = \frac{1}{3} \| (xyx) (x^{-2} y^{-1} xy) (xyx)^{-1} (x^2 y x^{-1} y^{-1}) \|

\displaystyle \leq \frac{1}{3} ( \| x^{-2} y^{-1} xy\| + \| x^2 y x^{-1} y^{-1}\| )

\displaystyle = \frac{1}{3} ( \| xy x^{-2} y^{-1} \| + \|x^{-1} y^{-1} x^2 y \| )

\displaystyle \leq \frac{1}{3} ( \gamma \|x\| + \delta \|y\| + \gamma \|x\| + \delta \|y\|)

giving the claim.

For (iv), we calculate

\displaystyle \| xyx^{-1}y^{-1}\| = \frac{1}{4} \| (xyx^{-1}y^{-1})^4 \|

\displaystyle = \frac{1}{4} \| (xy) (x^{-1} y^{-1} x) (y x^{-1} y^{-1}) (xyx^{-1}) (xy)^{-1} (x^2yx^{-1}y^{-1}) \|

\displaystyle \leq \frac{1}{4} ( \| (x^{-1} y^{-1} x) (y x^{-1} y^{-1}) (xyx^{-1}) \| + \|x^2yx^{-1}y^{-1}\| )

\displaystyle \leq \frac{1}{4} ( \|(y x^{-1} y^{-1}) (xy^{-1}x^{-1})(x^{-1} y x) \| + \gamma \|x\| + \delta \|y\|)

\displaystyle \leq \frac{1}{4} ( \|x\| + \|(xy^{-1}x^{-1})(x^{-1} y x) \| + \gamma \|x\| + \delta \|y\|)

\displaystyle = \frac{1}{4} ( \|x\| + \|x^{-2} y x^2 y^{-1} \|+ \gamma \|x\| + \delta \|y\|)

\displaystyle \leq \frac{1}{4} ( \|x\| + 2\alpha \|x\| + \beta \|y\| + \gamma \|x\| + \delta \|y\|)

giving the claim. \Box
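As an aside (my own check, not part of the post), the word decompositions used in the proofs of (ii), (iii) and (iv) can be verified mechanically in the free group:

    # Sketch: check the free-group identities behind parts (ii), (iii) and (iv).
    from sympy.combinatorics.free_group import free_group

    F, x, y = free_group("x, y")
    c = x * y * x**-1 * y**-1                        # the commutator [x, y]

    # part (ii)
    print((x*y*x**-2*y**-1)**2 == (x*y*x**-2*y**-1*x) * (y*x**-2*y**-1))
    # part (iii)
    w = x * y * x
    print(c**3 == w * (x**-2*y**-1*x*y) * w**-1 * (x**2*y*x**-1*y**-1))
    # part (iv)
    u = x * y
    print(c**4 == u * (x**-1*y**-1*x) * (y*x**-1*y**-1) * (x*y*x**-1) * u**-1 * (x**2*y*x**-1*y**-1))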

Here is a typical application of the above estimates. If (1) holds for {(\alpha,\beta)}, then by part (i) it holds for {(\beta,\alpha)}, then by (ii) (2) holds for {(\beta+1,\frac{\alpha}{2})}, then by (iv) (1) holds for {(\frac{3\beta+2}{4}, \frac{3\alpha}{8})}. The map {(\alpha,\beta) \mapsto (\frac{3\beta+2}{4}, \frac{3\alpha}{8})} has fixed point {(\alpha,\beta) = (\frac{16}{23}, \frac{6}{23})}, thus

\displaystyle \| xyx^{-1}y^{-1} \| \leq \frac{16}{23} \|x\| + \frac{6}{23} \|y\|.

For instance, if {\|a\|, \|b\| \leq 1}, then {\|aba^{-1}b^{-1} \| \leq 22/23 = 0.95652\dots}.
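Here is a quick numerical check of that fixed-point computation (my own sketch): iterating the map from the trivial starting bound converges to (16/23, 6/23), and the resulting bound on the commutator is 22/23.

    # Sketch: iterate (alpha, beta) -> ((3*beta + 2)/4, 3*alpha/8) and watch it
    # converge to the fixed point (16/23, 6/23).
    from fractions import Fraction

    alpha, beta = Fraction(0), Fraction(2)            # seed: ||xyx^-1y^-1|| <= 2||y||
    for _ in range(80):
        alpha, beta = (3 * beta + 2) / 4, 3 * alpha / 8
    print(float(alpha), float(beta))                  # ~ 16/23 = 0.6956..., 6/23 = 0.2608...
    print(float(Fraction(16, 23) + Fraction(6, 23)))  # bound on ||aba^-1b^-1||: 22/23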

Terence Tao: Metrics of linear growth – the solution

In the tradition of “Polymath projects“, the problem posed in the previous two blog posts has now been solved, thanks to the cumulative effect of many small contributions by many participants (including, but not limited to, Sean Eberhard, Tobias Fritz, Siddharta Gadgil, Tobias Hartnick, Chris Jerdonek, Apoorva Khare, Antonio Machiavelo, Pace Nielsen, Andy Putman, Will Sawin, Alexander Shamov, Lior Silberman, and David Speyer). In this post I’ll write down a streamlined resolution, eliding a number of important but ultimately removable partial steps and insights made by the above contributors en route to the solution.

Theorem 1 Let {G = (G,\cdot)} be a group. Suppose one has a “seminorm” function {\| \|: G \rightarrow [0,+\infty)} which obeys the triangle inequality

\displaystyle \|xy \| \leq \|x\| + \|y\|

for all {x,y \in G}, with equality whenever {x=y}. Then the seminorm factors through the abelianisation map {G \mapsto G/[G,G]}.

Proof: By the triangle inequality, it suffices to show that {\| [x,y]\| = 0} for all {x,y \in G}, where {[x,y] := xyx^{-1}y^{-1}} is the commutator.

We first establish some basic facts. Firstly, by hypothesis we have {\|x^2\| = 2 \|x\|}, and hence {\|x^n \| = n \|x\|} whenever {n} is a power of two. On the other hand, by the triangle inequality we have {\|x^n \| \leq n\|x\|} for all positive {n}, and hence by the triangle inequality again we also have the matching lower bound, thus

\displaystyle \|x^n \| = n \|x\|

for all {n > 0}. The claim is also true for {n=0} (apply the preceding bound with {x=1} and {n=2}). By replacing {\|x\|} with {\max(\|x\|, \|x^{-1}\|)} if necessary we may now also assume without loss of generality that {\|x^{-1} \| = \|x\|}, thus

\displaystyle \|x^n \| = |n| \|x\| \ \ \ \ \ (1)

 

for all integers {n}.

Next, for any {x,y \in G}, and any natural number {n}, we have

\displaystyle \|yxy^{-1} \| = \frac{1}{n} \| (yxy^{-1})^n \|

\displaystyle = \frac{1}{n} \| y x^n y^{-1} \|

\displaystyle \leq \frac{1}{n} ( \|y\| + n \|x\| + \|y^{-1} \| )

so on taking limits as {n \rightarrow \infty} we have {\|yxy^{-1} \| \leq \|x\|}. Replacing {x,y} by {yxy^{-1},y^{-1}} gives the matching lower bound, thus we have the conjugation invariance

\displaystyle \|yxy^{-1} \| = \|x\|. \ \ \ \ \ (2)

 

Next, we observe that if {x,y,z,w} are such that {x} is conjugate to both {wy} and {zw^{-1}}, then one has the inequality

\displaystyle \|x\| \leq \frac{1}{2} ( \|y \| + \| z \| ). \ \ \ \ \ (3)

 

Indeed, if we write {x = swys^{-1} = t zw^{-1} t^{-1}} for some {s,t \in G}, then for any natural number {n} one has

\displaystyle \|x\| = \frac{1}{2n} \| x^n x^n \|

\displaystyle = \frac{1}{2n} \| swy \dots wy s^{-1}t zw^{-1} \dots zw^{-1} t^{-1} \|

where the {wy} and {zw^{-1}} terms each appear {n} times. From (2) we see that conjugation by {w} does not affect the norm. Using this and the triangle inequality several times, we conclude that

\displaystyle \|x\| \leq \frac{1}{2n} ( \|s\| + n \|y\| + \| s^{-1} t\| + n \|z\| + \|t^{-1} \| ),

and the claim (3) follows by sending {n \rightarrow \infty}.

The following special case of (3) will be of particular interest. Let {x,y \in G}, and for any integers {m,k}, define the quantity

\displaystyle f(m,k) := \| x^m [x,y]^k \|.

Observe that {x^m [x,y]^k} is conjugate to both {x (x^{m-1} [x,y]^k)} and to {(y^{-1} x^m [x,y]^{k-1} xy) x^{-1}}, hence by (3) one has

\displaystyle \| x^m [x,y]^k \| \leq \frac{1}{2} ( \| x^{m-1} [x,y]^k \| + \| y^{-1} x^{m} [x,y]^{k-1} xy \|)

which by (2) leads to the recursive inequality

\displaystyle f(m,k) \leq \frac{1}{2} (f(m-1,k) + f(m+1,k-1)).

We can write this in probabilistic notation as

\displaystyle f(m,k) \leq {\bf E} f( (m,k) + X )

where {X} is a random vector that takes the values {(-1,0)} and {(1,-1)} with probability {1/2} each. Iterating this, we conclude in particular that for any large natural number {n}, one has

\displaystyle f(0,n) \leq {\bf E} f( Z )

where {Z := (0,n) + X_1 + \dots + X_{2n}} and {X_1,\dots,X_{2n}} are iid copies of {X}. We can write {Z = (1,-1/2) (Y_1 + \dots + Y_{2n})} where Y_1,\dots,Y_{2n} = \pm 1 are iid signs.  By the triangle inequality, we thus have

\displaystyle f( Z ) \leq |Y_1+\dots+Y_{2n}| (\|x\| + \frac{1}{2} \| [x,y] \|),

noting that Y_1+\dots+Y_{2n} is an even integer.  On the other hand, Y_1+\dots+Y_{2n} has mean zero and variance 2n, hence by Cauchy-Schwarz

\displaystyle f(0,n) \leq \sqrt{2n}( \|x\| + \frac{1}{2} \| [x,y] \|).

But by (1), the left-hand side is equal to {n \| [x,y]\|}. Dividing by {n} and then sending {n \rightarrow \infty}, we obtain the claim. \Box
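To see the Cauchy–Schwarz step in action, here is a small Monte Carlo sketch (mine): the expected absolute value of a sum of 2n independent signs is at most sqrt(2n), so the bound on {\| [x,y]\|} obtained from f(0,n) is O(1/sqrt(n)) and tends to zero.

    # Sketch: E|Y_1 + ... + Y_{2n}| <= sqrt(2n) for iid signs, which is what
    # makes f(0, n) / n, and hence ||[x, y]||, go to zero.
    import numpy as np

    rng = np.random.default_rng(0)
    for n in (10, 100, 1000):
        Y = rng.choice([-1, 1], size=(4000, 2 * n))      # 4000 sample walks of length 2n
        mean_abs = np.abs(Y.sum(axis=1)).mean()
        print(n, round(mean_abs, 2), round(float(np.sqrt(2 * n)), 2))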

The above theorem reduces such seminorms to abelian groups. It is easy to see from (1) that any torsion element of such groups has zero seminorm, so we can in fact restrict to torsion-free groups, which we now write using additive notation {G = (G,+)}, thus for instance {\| nx \| = |n| \|x\|} for {n \in {\bf Z}}. We think of {G} as a {{\bf Z}}-module. One can then extend the seminorm to the associated {{\bf Q}}-vector space {G \otimes_{\bf Z} {\bf Q}} by the formula {\|\frac{a}{b} x\| := \frac{a}{b} \|x\|}, and then to the associated {{\bf R}}-vector space {G \otimes_{\bf Z} {\bf R}} by continuity, at which point it becomes a genuine seminorm (provided we have ensured the symmetry condition {\|x\| = \|x^{-1}\|}). Conversely, any seminorm on {G \otimes_{\bf Z} {\bf R}} induces a seminorm on {G}. (These arguments also appear in this paper of Khare and Rajaratnam.)

 

Terence Tao: Homogeneous length functions on groups

The Polymath14 online collaboration has uploaded to the arXiv its paper “Homogeneous length functions on groups“, submitted to Algebra & Number Theory. The paper completely classifies homogeneous length functions {\| \|: G \rightarrow {\bf R}^+} on an arbitrary group {G = (G,\cdot,e,()^{-1})}, that is to say non-negative functions that obey the symmetry condition {\|x^{-1}\| = \|x\|}, the non-degeneracy condition {\|x\|=0 \iff x=e}, the triangle inequality {\|xy\| \leq \|x\| + \|y\|}, and the homogeneity condition {\|x^2\| = 2\|x\|}. It turns out that these norms can only arise from pulling back the norm of a Banach space by an isometric embedding of the group. Among other things, this shows that {G} can support a homogeneous length function if and only if it is abelian and torsion free, thus giving a metric description of this property.

The proof is based on repeated use of the homogeneous length function axioms, combined with elementary identities of commutators, to obtain increasingly good bounds on quantities such as {\|[x,y]\|}, until one can show that such norms have to vanish. See the previous post for a full proof. The result is robust in that it allows for some loss in the triangle inequality and homogeneity condition, allowing for some new results on “quasinorms” on groups that relate to quasihomomorphisms.

As there are now a large number of comments on the previous post on this project, this post will also serve as the new thread for any final discussion of this project as it winds down.

David Hogg: streams and dust

At Stars Group meeting, Lauren Anderson (Flatiron) and Denis Erkal (Surrey) both spoke about stellar streams. Anderson spoke about finding them with a new search technique that looks at proper motions for stars found in great-circle segments; this is being prepared for Gaia DR2. Erkal spoke about constraining the Milky-Way potential using only configurational aspects of streams: if the plane defined locally by a small stream segment doesn't contain the Galactic Center, there must be asphericity in the gravitational potential.

Late in the day, Yi-Kuan Chiang (JHU) showed me absolutely beautiful results cross-correlating various Milky-Way dust maps with high-redshift objects. There ought to be no correlations, at least in the low-extinction regions. But there are correlations, and it is because the dust maps are all contaminated by high-redshift dust in the extragalactic objects themselves (or objects correlated therewith). He can conclude nice things about different dust-map techniques. We discussed (inconclusively) whether his work could be turned around and used to improve Milky-Way map-making.

January 11, 2018

Jordan Ellenberg: Wanlin Li, “Vanishing of hyperelliptic L-functions at the central point”

My Ph.D. student Wanlin Li has posted her first paper!  And it’s very cool.  Here’s the idea.  If chi is a real quadratic Dirichlet character, there’s no reason the special value L(1/2,chi) should vanish; the functional equation doesn’t enforce it, there’s no group whose rank is supposed to be the order of vanishing, etc.  And there’s an old conjecture of Chowla which says the special value never vanishes.  On the very useful principle that what needn’t happen doesn’t happen.

Alexandra Florea (last seen on the blog here)  gave a great seminar here last year about quadratic L-functions over function fields, which gave Wanlin the idea of thinking about Chowla’s conjecture in that setting.  And something interesting developed — it turns out that Chowla’s conjecture is totally false!  OK, well, maybe not totally false.  Let’s put it this way.  If you count quadratic extensions of F_q(t) up to conductor N, Wanlin shows that at least c N^a of the corresponding L-functions vanish at the center of the critical strip.  The exponent a is either 1/2,1/3, or 1/5, depending on q.  But it is never 1.  Which is to say that Wanlin’s theorem leaves open the possibility that o(N) of the first N hyperelliptic L-functions vanishes at the critical point.  In other words, a density form of Chowla’s conjecture over function fields might still be true — in fact, I’d guess it probably is.

The main idea is to use some algebraic geometry.  To say an L-function vanishes at 1/2 is to say some Frobenius eigenvalue which has to have absolute value q^{1/2} is actually equal to q^{1/2}.  In turn, this is telling you that the hyperelliptic curve over F_q whose L-function you’re studying has a map to some fixed elliptic curve.  Well, that’s something you can make happen by physically writing down equations!  Of course you also need a lower bound for the number of distinct quadratic extensions of F_q(t) that arise this way; this is the most delicate part.

I think it’s very interesting to wonder what the truth of the matter is.  I hope I’ll be back in a few months to tell you what new things Wanlin has discovered about it!

 

Jordan Ellenberg: A rejection from Gordon Lish

At some point I’m going to go through my gigantic file of rejection letters from the mid-1990s, when I was trying to be a fiction writer, and write a post about them, but for now, here’s the one I have from Gordon Lish, from 27 June 1996, when he was editing The Quarterly:

Excellent Jordan — you only impress me further with your force. I
would take the cut if I liked it. I don’t — but I do admire the man
who made it. Cheers, Gordon.

 

January 10, 2018

Backreaction: Superfluid dark matter gets seriously into business

very dark fluid

Most matter in the universe isn’t like the stuff we are made of. Instead, it’s a thinly distributed, cold medium which rarely interacts, both with itself and with other kinds of matter. It also doesn’t emit light, which is why physicists refer to it as “dark matter.” A recently proposed idea, according to which dark matter may be superfluid, has now become more concrete,

David Hogg: stream search

Lauren Anderson (Flatiron) and Vasily Belokurov (Cambridge) have been developing a (relatively) model-free search for compact structures in phase-space, to be used on Gaia DR2 in April. Anderson showed me the current results, which highlight many possible streams, in pre-Gaia test data from SDSS (created by Sergey Koposov at CMU). She requires that the phase-space over-density be consistent with a section of a great circle, at a (somewhat) well-defined Galactocentric distance (because: reflex proper-motion from the Sun's velocity), moving in a direction along the circle. She finds lots of putative streams, but many of them seem to highlight edges and issues with the spatial selection function, so there are still issues to work out. The nice thing is that if we can get something working here, it will crush on the forthcoming Gaia data, which will have a much simpler selection function.

David Hoggloose talk of #aliens

[Warning: This post is not, strictly, a research post. It is a response to events in the astronomical community in the recent past.] Word on the street (I can't find out, because it is not open) is that some argument broke out on the Facebook(tm) astronomy group about loose discussion on the internets about #aliens, and things like the Boyajian star or the 'Oumuamua asteroid. Since I am partially responsible for this loose talk, here is my position:

First, I want to separate informal discussion (like on twitter or blogs) from formal discussion in scientific papers (like what might be submitted to arXiv) from press releases. These are three different things, and I think we need to treat them differently. Second, I am going to assert that it is reasonable and normal for astronomers to discuss in scientific papers (sometimes) the possibility that there is alien life or alien technology with visible impact on observations. Third, I am going to presume that the non-expert public deserves our complete respect and cooperation. If you disagree with any of these things, my argument might not appeal to you.

On the second assumption (aliens are worthy of discussion), you can ask yourself: Was it a reasonable use of telescope time to look at 'Oumuamua in the radio, to search for technological radio transmissions? If you think that this was a reasonable thing to do with our resources, then you agree with the second assumption. Similarly if you think SETI is worth doing. If you don't think these uses of telescopes are reasonable—and it is understandable and justifiable not to—then you might think all talk of aliens is illegitimate. Fine. But I think that most of us think that it is legitimate to study SETI and related matters. I certainly do.

Now if we accept the second assumption, and if we accept the third assumption (and I really don't have any time for you if you don't accept the third assumption), then I think it is legitimate (and perhaps even ethically required) that we have our discussions about aliens out in the open, visible to the public! The argument (that we shouldn't be talking about such things) appears to be: “Some people (and in particular some news outlets) go ape-shit when there is talk of aliens, so we all need to stop talking about aliens!” But now let's move this into another context: Imagine someone in a role in which they serve the public and are partially responsible for X saying: “Some people (and in particular some news outlets) go ape-shit when there is talk of X, so we all need to stop talking about X!” Obviously that would be a completely unethical position for any public servant to take. And it wouldn't just be fodder for conspiracy theorists, it would actually be evidence of a conspiracy.

Imagine we, as a community, decided to only discuss alien technology in private, and never in public. Would that help or hurt with the wild speculation or ape-shit reactions? In the long run, I think it would hurt us, and hurt the public, and be unethical to boot. Informal discussions of all matters of importance to astronomers are legitimately held in the open. We are public servants, ultimately.

Now, I have two caveats to this. The first is that it is possible for papers and press releases and news articles to be irresponsible about their discussion of aliens. For example, the reportage claiming (example here)—and it may originate in the paper itself—that the reddening observed in the Boyajian Star rules out alien megastructures was debatably irresponsible in two ways. For one, it implied that the megastructure hypothesis was a leading hypothesis, which it was not, and for two, it implied that the megastructure hypothesis was specific enough to be ruled out by reddening, which it wasn't. Indeed, the chatter on Twitter(tm) led to questions about whether aliens could ever be ruled out by observations, and that is an interesting question, which relates to the second assumption (aliens are worthy of discussion) given above. Either way, the paper and resulting press implied that the observational result constrained aliens, which it did not; the posterior probability of aliens (extremely low to begin with) is almost completely unchanged by the observations in that paper. To imply otherwise is to imply that alien technology is a mature scientific hypothesis, which it isn't.

Note, in the above paragraph, that I hold papers and press releases to a higher standard than loose, informal discussion! That is my first assumption, above. You might disagree with it, but note that it would be essentially completely chilling to all informal, open discussion of science if we required refereed-publication-quality backing for anything we say, anywhere. It would effectively re-create the conspiracy that I reject above.

I don't mean to be too critical here: the Boyajian-star paper was overall extremely responsible and careful and sensible. As are many other papers about planet results, even ones that end up getting illustrated with an artist's impression of a rocky planet with ocean shores and/or raging surf. If I have a complaint about exoplanet science as a community (and I count myself a member of this community; I am not casting blame elsewhere), it is about the paper-to-press interface, where artist's conceptions and small signals are amplified into luscious and misleading story-time by perfectly sensible reporters. We (as a community, and as a set of funded projects) are complicit in this.

The second caveat to what I have written above is that I (for one) and many others talk on Twitter(tm) with tongue in cheek and with sarcasm, irony, and exaggeration. It takes knowledge of the medium, of scientists, and of the individuals involved to decode it properly. When I tweeted that it was “likely” that 'Oumuamua was an alien spaceship, I was obviously (to me) exaggerating, for the purposes of having a fun and interesting discussion. And indeed, the asteroid looks different in color, shape, and spin rate (and maybe therefore composition and tensile strength) from other asteroids in our own Solar System. But it might have been irresponsible to use my exaggeration and humor when it comes to aliens, because aliens do set off some people, especially those who might not know the conventions of scientists and twitter. I take that criticism, and I'll try to be more careful.

One last point: The underlying idea of those who say we should keep alien discussion behind closed doors (or cut it off completely) is at least partly that the public can't handle it. I find that attitude disturbing and wrong: In my experience, ordinary people are very wise readers of the news, with good sense and responsibility, and they are just as good at reading arguments on Twitter(tm). The fact that there are some exceptions—or that the Daily Mail is an irresponsible news outlet—does not change the truth of my third assumption (people deserve our respect). We should just ignore and deprecate irresponsible news, and continue to have our discussions out in the open!

In the long run, astronomy will benefit from open-ness, honesty, and carefully circumscribed reporting of goals and results. We won't benefit from hiding our legitimate scientific discussions from the public for fear that they will be mis-interpreted.

January 09, 2018

Tommaso DorigoJ4BnUHZMGRqHevJ7QB72vtaM

The title of this post is the password code required to connect to my wireless network at home. The service, provided by the Italian internet provider Fastweb, has been discontinued as I moved to Padova from Venice, so you can't really do much with the code itself. On the other hand, it should give you hints on a couple of things.


BackreactionMe, elsewhere

Beginning 2018, I will no longer write for Ethan Siegel’s Forbes collection “Starts With a Bang.” Instead, I will write a semi-regular column for Quanta Magazine, the first of which -- about asymptotically safe gravity -- appeared yesterday. In contrast to Forbes, Quanta Magazine keeps the copyright, which means that the articles I write for them will not be mirrored on this blog. You

January 08, 2018

Doug NatelsonSelected items

A few recent items that caught my eye:

  • The ever-creative McEuen and Cohen groups at Cornell worked together to make graphene-based origami widgets.   Access to the paper seems limited right now, but here is a link that has some of the figures.
  • Something else that the Cohen group has worked on in the past is complex fluids, such as colloidal suspensions.  The general statistical physics problem of large ensembles of interacting classical objects (e.g., maybe short-range rigid interactions, as in grains of sand, or perhaps M&Ms) is incredibly rich.  Sure, there are no quantum effects, but often you have to throw out the key simplifying assumption of statistical physics (that your system can readily explore all microscopic states compatible with overall constraints).  This can lead to some really weird effects, like dice packing themselves into an ordered array when stirred properly.
  • When an ensemble of (relatively) hard classical objects really locks up collectively and starts acting like a solid, that's called jamming.  It's still a very active subject of study, and is of huge industrial importance.  It also explains why mayonnaise gets much more viscous all of a sudden as egg yolk is added.
  • I'd be remiss if I didn't highlight a really nice article in Quanta about one of the grand challenges of (condensed matter) physics:  Classifying all possible thermodynamic phases of matter.   While the popular audience thinks of a handful of phases (solid, liquid, gas, maybe plasma), the physics perspective is broader, because of ideas about order and symmetries.  Now we understand more than ever before that we need to consider phases with different  topological properties as well.  Classification is not just "stamp collecting".

January 07, 2018

Jordan EllenbergHow I won the talent show

So I don’t want to brag or anything but I won the neighborhood talent show tonight.  Look, here’s my trophy:

My act was “The Great Squarerootio.”  I drank beer and computed square roots in my head.  So I thought it might be fun to explain how to do this!  Old hat for my mathematician readers, but this is elementary enough for everyone.

Here’s how it works.  Somebody in the audience said “752.”  First of all, you need to know your squares.  I think I know them all up to 36^2 = 1296.  In this case, I can see that 752 is between 27^2 = 729 and 28^2 = 784, a little closer to 729.  So my first estimate is 27.  This is not too bad:  27^2 = 729 is 23 away from 752, so my error is 23.

Now here’s the rule that makes things go:

new estimate = old estimate + error/(2*old estimate).

In this case, the error is 23 and my old estimate is 27, so I should calculate 23/(2*27) = 23/54; well, 23 is just a bit less than 27, between 10 and 20% less, so 23/54 is a bit less than 1/2, but should be bigger than 0.4.

So our new estimate is “27.4 something.”

Actual answer: 27.4226…

Actual value of 27 + 23/54: 27.4259…

so I probably would have gotten pretty close on the hundredths place if I’d taken more care with estimating the fraction.  Of course, another way to get even closer is to take 27.4 as your “old estimate” and repeat the process!  But then I’d have to know the square of 27.4, which I don’t know by heart, and computing it mentally would give me some trouble.
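If you want to try Great Squarerootio at home, here is a minimal Python sketch of the one-step rule (the function name is mine, and it searches for the nearest square instead of having the table memorized):

def mental_sqrt(n):
    # Nearest integer square root, using the squares up to 37^2 = 1369.
    x = min(range(1, 38), key=lambda k: abs(k * k - n))
    e = n - x * x                  # the (signed) error
    return x + e / (2 * x)         # old estimate + error/(2 * old estimate)

print(mental_sqrt(752))   # 27.4259..., versus the true value 27.4226...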

Why does this work?  The two-word answer is “Newton’s method.”  But let me give a more elementary answer.

Suppose x is our original estimate, and n is the number we’re trying to find the square root of, and e is the error; that is, n = x^2 + e.

We want to know how much we should change x in order to get a better estimate.  So let’s say we modify x by d.  Then

(x+d)^2 = x^2 + 2xd + d^2

Now let’s say we’ve wisely chosen x to be the closest integer to the square root of n, so d should be pretty small, less than 1 at any rate; then d^2 is really small.  So small I will decide to ignore it, and estimate

(x+d)^2 = x^2 + 2xd.

What we want is

(x+d)^2 = n = x^2 +e.

For this to be the case, we need e = 2xd, which tells us we should take d = e/2x; that’s our rule above!

Here’s another way to think about it.  We’re trying to compute 752^(1/2), right?  And 752 is 729+23.  So what is (729+23)^(1/2)?

If you remember the binomial theorem from Algebra 2, you might remember that it tells you how to compute (x+y)^n for any n.  Well, any positive whole number n, right?  That’s the thing!  No!  Any n!  It works for fractions too!  Your Algebra 2 teacher may have concealed this awesome fact from you!  I will not be so tight-lipped.

The binomial theorem says

(x+y)^n = x^n + {n \choose 1} x^{n-1} y + {n \choose 2} x^{n-2} y^2 + \ldots

Plugging in x=729,y=23,n=1/2 we get

(729+23)^{1/2} = 729^{1/2} + {1/2 \choose 1} 729^{-1/2} \cdot 23 + {1/2 \choose 2} 729^{-3/2} \cdot 23^2 + \ldots

Now 729^{1/2} we know; that’s 27.  What is the binomial coefficient 1/2 choose 1?  Is it the number of ways to choose 1 item out of 1/2 choices?  No, because that makes no sense.  Rather:  by “n choose 1” we just mean n, by “n choose 2” we just mean (1/2)n(n-1), etc.

So we get

(729+23)^{1/2} = 27 + (1/2) \cdot 23 / 27 + (-1/8) \cdot 23^2 / 27^3 + \ldots

And look, that second term is 23/54 again!  So the Great Squarerootio algorithm is really just “use the first two terms in the binomial expansion.”
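Here is a quick numerical check of that claim (names are mine; the generalized binomial coefficient is hand-rolled because the standard library’s comb only accepts whole numbers):

def binom(n, k):
    # Generalized binomial coefficient, valid for fractional n such as n = 1/2.
    out = 1.0
    for i in range(k):
        out *= (n - i) / (i + 1)
    return out

def sqrt_series(square, remainder, terms):
    # Partial sum of the binomial series for (square + remainder)**(1/2).
    root = square ** 0.5
    return sum(binom(0.5, k) * root ** (1 - 2 * k) * remainder ** k
               for k in range(terms))

# 752 = 729 + 23: two terms give 27 + 23/54, and three terms are closer still.
print(sqrt_series(729, 23, 2), sqrt_series(729, 23, 3), 752 ** 0.5)

Two terms give 27.4259…, three give 27.4226…, which already matches the true value to four decimal places.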

To sum up:  if you know your squares up to 1000, and you can estimate fractions reasonably fast, you can get a pretty good mental approximation to the square root of any three-digit number.  Even while drinking beer!  You might even be able to beat out cute kids in your neighborhood and win a big honking cup!

 

January 04, 2018

Mark Chu-CarrollA Gentle Explanation of the Intel Speculative Execution CPU Bug

There’s a lot of talk today about the recent discovery of a significant bug in the Intel CPUs that so many of us use every day. It’s an interesting problem, and understanding it requires knowing a little bit about how CPUs work, so I thought I’d take a shot at writing an explainer.

Let me start with a huge caveat: this involves a lot of details about how CPUs work, and in order to explain it, I’m going to simplify things to an almost ridiculous degree in order to try to come up with an explanation that’s comprehensible to a lay person. I’m never deliberately lying about how things work, but at times, I’m simplifying enough that it will be infuriating to an expert. I’m doing my best to explain my understanding of this problem in a way that most people will be able to understand, but I’m bound to oversimplify in some places, and get details wrong in others. I apologize in advance.

It’s also early days for this problem. Intel is still trying to keep the exact details of the bug quiet, to make it harder for dishonest people to exploit it. So I’m working from the information I’ve been able to gather about the issue so far. I’ll do my best to correct this post as new information comes out, but I can only do that when I’m not at work!

That said: what we know so far is that the Intel bug involves non-kernel code being able to access cached kernel memory through the use of something called speculative execution.

To an average person, that means about as much as a problem in the flux thruster atom pulsar electrical ventury space-time implosion field generator coil.

Let’s start with a quick overview of a modern CPU.

The CPU is, in simple terms, the brain of a computer. It’s the component that actually does computations. It reads a sequence of instructions from memory, and then follows those instructions to perform some computation on some values, which are also stored in memory. This is a massive simplification, but basically, you can think of the CPU as a pile of hardware that runs a fixed program:

def simplified_cpu_main_loop():
  IP = 0
  while True:
     # Fetch the instruction at the instruction pointer, then its two operands.
     (op, in1, in2, out) = fetch(IP)
     val1 = fetch(in1)
     val2 = fetch(in2)
     # Perform the operation on the operand values; get a result and the next IP.
     result, IP = perform(op, val1, val2)
     store(result, out)

There’s a variable called the instruction pointer (abbreviated IP) built-in to the CPU that tells it where to fetch the next instruction from. Each time the clock ticks, the CPU fetches an instruction from the memory address stored in the instruction pointer, fetches the arguments to that instruction from cells in memory, performs the operation on those arguments, and then stores the result into another cell in the computer memory. Each individual operation produces both a result, and a new value for the instruction pointer. Most of the time, you just increment the instruction pointer to look at the next instruction, but for comparisons or branches, you can change it to something else.

What I described above is how older computers really worked. But as CPUs got faster, chipmakers ran into a huge problem: the CPU can perform its operations faster and faster every year. But retrieving a value from memory hasn’t gotten faster at the same rate as executing instructions. The exact numbers don’t matter, but to give you an idea, a modern CPU can execute an instruction in less than one nanosecond, but fetching a single value from memory takes more than 100 nanoseconds. In the scheme we described above, you need to fetch the instruction from memory (one fetch), and then fetch two parameters from memory (another two fetches), execute the instruction (1 nanosecond), and then store the result back into memory (one store). Assuming a store is no slower than a fetch, that means that for one nanosecond of computation time, the CPU needs to do 3 fetches and one store for each instruction. That means that the CPU is waiting, idle, for at least 400ns, during which it could have executed another 400 instructions, if it didn’t need to wait for memory.

That’s no good, obviously. There’s no point in making a fast CPU if all it’s going to do is sit around and wait for slow memory. But designers found ways to work around that, by creating ways to do a lot of computation without needing to pause to wait for things to be retrieved from or stored to memory.

One of those tricks was to add registers to the CPUs. A register is a cell of memory inside of the CPU itself. Early processors (like the 6502 that was used by the Apple II) had one main register called an accumulator. Every arithmetic instruction would work by retrieving a value from memory, and then performing some arithmetic operation on the value in the accumulator and the value retrieved from memory, and leaving the result in the accumulator. (So, for example, if 0x1234 were the address of variable X, you could add the value of X to the accumulator with the instruction “ADD (1234)”.) More modern CPUs added many registers, so that you can keep all of the values that you need for some computation in different registers. Reading values from or writing values to registers is lightning fast – in fact, it’s effectively free. So you structure your computations so that they load up the registers with the values they need, then do the computation in registers, and then dump the results out to memory. Your CPU can run a fairly long sequence of instructions without ever pausing for a memory fetch.

Expanding on the idea of putting memory into the CPU, they added ways of reducing the cost of working with memory by creating copies of the active memory regions on the CPU. These are called caches. When you try to retrieve something from memory, if it’s in the cache, then you can access it much faster. When you access something from a memory location that isn’t currently in the cache, the CPU will copy a chunk of memory including that location into the cache.

You might ask: if you can make the cache fast, why not just make all of memory like the cache? The answer is that the time it takes in hardware to retrieve something from memory increases with the amount of memory that you can potentially access. Pointing at a cache with 1K of memory is lightning fast. Pointing at a cache with 1 megabyte of memory is much slower than the 1K cache, but much faster than a 100MB cache; pointing at a cache with 100MB is even slower, and so on.

So what we actually do in practice is have multiple tiers of memory. We have the registers (a very small set – a dozen or so memory cells, which can be accessed instantly); a level-0 cache (on the order of 8k in Intel’s chips), which is pretty fast; a level-1 cache (100s of kilobytes), an L2 cache (megabytes), L3 (tens of megabytes), and now even L4 (100s of megabytes). If something isn’t in L0 cache, then we look for it in L1; if we can’t find it in L1, we look in L2, and so on, until if we can’t find it in any cache, we actually go out to the main memory.
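As a toy illustration of why the tiers pay off (all of the hit rates and latencies below are invented for the example, not real Intel numbers), you can compute the expected access time of a little cache hierarchy:

def average_access_ns(hit_rates, latencies_ns, memory_ns):
    # Expected access time: each level catches some fraction of the requests
    # that got past the levels above it; the rest fall through to main memory.
    expected, p_reach = 0.0, 1.0
    for hit, lat in zip(hit_rates, latencies_ns):
        expected += p_reach * hit * lat
        p_reach *= 1 - hit
    return expected + p_reach * memory_ns

# Made-up hierarchy: about 3 ns on average, versus 100 ns if every access hit memory.
print(average_access_ns([0.8, 0.7, 0.6, 0.5], [1, 3, 10, 30], 100))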

There’s more we can do to make things faster. In the CPU, you can’t actually execute an entire instruction all at once – it’s got multiple steps. For a (vastly simplified) example, in the pseudocode above, you can think of each instruction as four phases: (1) decode the instruction (figuring out what operation it performs, and what its parameters are), (2) fetch the parameters, (3) perform the operation’s internal computation, and (4) write out the result. By taking advantage of that, you can set up your CPU to actually do a lot of work in parallel. If there are, say, three phases to executing an instruction, then you can execute phase one of instruction one in one cycle; phase one of instruction two and phase two of instruction one in the next cycle; phase one of instruction three, phase two of instruction two, and phase three of instruction one in the third cycle. This process is called pipelining.
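A back-of-the-envelope cycle count shows the payoff (my own toy model, with an idealized pipeline that never stalls):

def cycles(num_instructions, phases=4, pipelined=True):
    # A full pipeline retires one instruction per cycle once it has filled,
    # instead of spending `phases` cycles on each instruction in turn.
    return num_instructions + phases - 1 if pipelined else num_instructions * phases

print(cycles(1000, pipelined=False), cycles(1000, pipelined=True))   # 4000 vs 1003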

To really take advantage of pipelining, you need to keep the pipeline full. If your CPU has a four-stage pipeline, then ideally, you always know what the next four instructions you’re going to execute are. If you’ve got the machine code version of an if-then-else branch, when you start the comparison, you don’t know what’s going to come next until you finish it, because there are two possibilities. That means that when you get to phase 2 of your branch instruction, you can’t start phase one of the next instruction until the current one is finished – which means that you’ve lost the advantage of your pipeline.

That leads to another neat trick that people play in hardware, called branch prediction. You can make a guess about which way a branch is going to go. An easy way to understand this is to think of some numerical code:

import math

def run_branch_prediction_demo(a, b):
  # The inner loop's closing branch is taken 999 times for every time it falls
  # through, so a branch predictor guesses it correctly almost every iteration.
  for i in range(1, 1001):
     for j in range(1, 1001):
          q = a[i][j] * math.sqrt(b[i][j])

After each iteration of the inner loop, you check to see if j == 1000. If it isn’t, you branch back to the beginning of that loop. 999 times, you branch back to the beginning of the loop, and one time, you won’t. So you can predict that you take the backward branch, and you can start executing the early phases of the first instructions of the next iteration. That way, most of the time you’re running the loop, your pipeline is full, and you’re executing your computation quickly!

The catch is that you can’t commit anything that stores a result. You need to be able to say “Oops, everything that I started after that branch was wrong, so throw it away!”. Along with branch prediction, most CPUs also provide speculative execution, which is a way of continuing to execute instructions in the pipeline, but being able to discard the results if they’re the result of an incorrect branch prediction.

Ok, we’re close. We’ve got to talk about just another couple of basic ideas before we can get to just what the problem is with these Intel chips.

We’re going to jump up the stack a bit, and instead of talking directly about the CPU hardware, we’re going to talk about the operating system, and how it’s implemented on the CPU.

An operating system is just a program that runs on your computer. The operating system can load and run other programs (your end-user applications), and it manages all of the resources that those other programs can work with. When you use an application that allocates memory, it sends a request called a syscall to the operating system asking it for some memory. If your application wants to read data from a disk drive, it makes a syscall to open a file and read data. The operating system is responsible for really controlling all of those resources, and making sure that each program that’s running only accesses the things that it should be allowed to access. Program A can only use memory allocated by program A; if it tries to access memory allocated by program B, it should cause an error.

The operating system is, therefore, a special program. It’s allowed to touch any piece of memory, any resource owned by anything on the computer. How does that work?

There are two pieces. First, there’s something called memory protection. The hardware provides a mechanism that the CPU can use to say something like “This piece of memory is owned by program A”. When the CPU is running program A, the memory protection system will arrange the way that memory looks to the program so that it can access that piece of memory; anything else just doesn’t exist to A. That’s called memory mapping: the system memory of the computer is mapped for A so that it can see certain pieces of memory, and not see others. In addition to memory mapping, the memory protection system can mark certain pieces of memory as only being accessible by privileged processes.

Privileged processes get us to the next point. In the CPU, there’s something called an execution mode: programs can run in a privileged mode (sometimes called kernel-space execution), or they can run in a non-privileged mode (sometimes called user-space execution). Only code that’s running in kernel-space can do things like send commands to the memory manager, or change memory protection settings.

When your program makes a syscall, what really happens is that your program puts the syscall parameters into a special place, and then sends a signal called an interrupt. The interrupt switches the CPU into system space, and gives control to the operating system, which reads the interrupt parameters, and does whatever it needs to. Then it puts the result where the user space program expects it, switches back to user-space, and then allows the user space program to continue.

That process of switching from the user space program to the kernel space, doing something, and then switching back is called a context switch. Context switches are very expensive. Implemented naively, you need to redo the memory mapping every time you switch. So the interrupt consists of “stop what you’re doing, switch to privileged mode, switch to the kernel memory map, run the syscall, switch to the user program memory map, switch to user mode”.

Ok. So. We’re finally at the point where we can actually talk about the Intel bug.

Intel chips contain a trick to make syscalls less expensive. Instead of having to switch memory maps on a syscall, they allow the kernel memory to be mapped into the memory map of every process running in the system. But since kernel memory can contain all sorts of secret stuff (passwords, data belonging to other processes, among other things), you can’t let user space programs look at it – so the kernel memory is mapped, but it’s marked as privileged. With things set up that way, a syscall can drop the two “switch memory map” steps in the syscall scenario. Now all a syscall needs to do is switch to kernel mode, run the syscall, and switch back to user mode. It’s dramatically faster!

Here’s the problem, as best I understand from the information that’s currently available:

Code that’s running under speculative execution doesn’t check whether memory accesses from the cache are touching privileged memory. It starts running the instructions without the privilege check, and only when it’s time to commit to whether or not the speculative execution should be kept does the check occur. But during that window, you’ve got the opportunity to run a batch of instructions against the cache without privilege checks. So you can write code with the right sequence of branch instructions to get branch prediction to work the way you want it to; and then you can use that to read memory that you shouldn’t be able to read.

With that as a starting point, you can build up interesting exploits that can ultimately allow you to do almost anything you want. It’s not a trivial exploit, but with a bit of work, you can use a user space program to make a sequence of syscalls to get information you want into memory, and then write that information wherever you want to – and that means that you can acquire root-level access, and do anything you want.

The only fix for this is to stop doing that trick where you map the kernel memory into every user space memory map, because there’s no way to enforce the privileged memory property in speculative execution. In other words, drop the whole syscall performance optimization. That’ll avoid the security issue, but it’s a pretty expensive fix: requiring a full context switch for every syscall will slow down the execution of user space programs by something between 5 and 30 percent.

January 03, 2018

BackreactionSometimes I believe in string theory. Then I wake up.

They talk about me. Grumpy Rainbow Unicorn.[Image Source.] And I can’t blame them. Because nothing else is happening on this planet. There’s just me and my attempt to convince physicists that beauty isn’t truth. Yes, I know it’s not much of an insight that pretty ideas aren’t always correct. That’s why I objected when my editor suggested I title my book “Why Beauty isn’t Truth.” Because,

BackreactionHow do you prove that Earth is older than 10,000 years?

Mesosaurus fossil. Probably dating back to the early Permian Period, roughly 300 million years ago. [Image source] Planet Earth formed around 4.5 billion years ago. The first primitive forms of life appeared about 4 billion years ago. Natural selection did the rest, giving rise to species increasingly better adapted to their environment. Evidence, as they say, is overwhelming. Or is it?

January 02, 2018

Mark Chu-CarrollZombie Math in the Vortex

Paul Krugman has taken to calling certain kinds of economic ideas zombie economics, because no matter how many times they’re shown to be false, they just keep coming back from the dead. I certainly don’t have stature that compares in any way to Krugman, but I’m still going to use his terminology for some bad math. There are some crackpot ideas that you just can’t kill.

For example, vortex math. I wrote about vortex math for the first time in 2012, again in early 2013, and again in late 2013. But like a zombie in a bad movie, its fans won’t let it stay dead. There must have been a discussion on some vortex-math fan forum recently, because over the last month, I’ve been getting comments on the old posts, and emails taking me to task for supposedly being unfair, closed-minded, ignorant, and generally a very nasty person.

Before I look at any of their criticisms, let’s start with a quick refresher. What is vortex math?

We’re going to create a pattern of single-digit numbers using multiples of 2. Take the number 1. Multiply it by 2, and you get 2. Multiply it by 2, and you get 4. Again, you get 8. Again, and you get 16. 16 is two digits, but we only want one-digit numbers, so we add them together, getting 7. Double, you get 14, so add the digits, and you get 5. Double, you get 10, add the digits, and you get 1. So you’ve got a repeating sequence: 1, 2, 4, 8, 7, 5, …


Take the numbers 1 through 9, and put them at equal distances around the perimeter of a circle. Draw an arrow from a number to its single-digit double. You end up with something that looks kinda-sorta like the infinity symbol. You can also fit those numbers onto the surface of a torus.

That’s really all there is to vortex math. This guy named Marko Rodin discovered that there’s a repeating pattern, and if you draw it on a circle, it looks kinda-like the infinity symbol, and that there must be something incredibly profound and important about it. Launching from there, he came up with numerous claims about what that means. According to vortex math, there’s something deeply significant about that pattern:

  1. If you make metallic windings on a toroidal surface according to that pattern and use it as a generator, it will generate free energy.
  2. Take that same coil, and run a current through it, and you have a perfect, reactionless space drive (called “the flux thruster atom pulsar electrical ventury space time implosion field generator coil”).
  3. If you use those numbers as a pattern in a medical device, it will cure cancer, as well as every other disease.
  4. If you use that numerical pattern, you can devise better compression algorithms that can compress any string of bits.
  5. and so on…

Essentially, according to vortex math, that repeated pattern of numbers defines a “vortex”, which is the deepest structure in the universe, and it’s the key to understanding all of math, all of physics, all of metaphysics, all of medicine. It’s the fundamental pattern of everything, and by understanding it, you can do absolutely anything.

As a math geek, the problem with stuff like vortex math is that it’s difficult to refute mathematically, because even though Rodin calls it math, there’s really no math to it. There’s a pattern, and therefore magic! Beyond the observation that there’s a pattern, there’s nothing but claims of things that must be true because there’s a pattern, without any actual mathematical argument.

Let me show you an example, from one of Rodin’s followers, named Randy Powell.

I call my discovery the ABHA Torus. It is now the full completion of how to engineer Marko Rodin’s Vortex Based Mathematics. The ABHA Torus as I have discovered it is the true and perfect Torus and it has the ability to reveal in 3-D space any and all mathematical/geometric relationships possible allowing it to essentially accomplish any desired functional application in the world of technology. This is because the ABHA Torus provides us a mathematical framework where the true secrets of numbers (qualitative relationships based on angle and ratio) are revealed in fullness.

This is why I believe that the ABHA Torus as I have calculated is the most powerful mathematical tool in existence because it presents proof that numbers are not just flat imaginary things. To the contrary, numbers are stationary vector interstices that are real and exhibiting at all times spatial, temporal, and volumetric qualities. Being stationary means that they are fixed constants. In the ABHA Torus the numbers never move but the functions move through the numbers modeling vibration and the underlying fractal circuitry that natures uses to harness living energy.

The ABHA Torus as revealed by the Rodin/Powell solution displays a perfectly symmetrical spin array of numbers (revealing even prime number symmetry), a feat that has baffled countless scientists and mathematicians throughout the ages. It even uncovers the secret of bilateral symmetry as actually being the result of a diagonal motion along the surface and through the internal volume of the torus in an expanding and contracting polarized logarithmic spiral diamond grain reticulation pattern produced by the interplay of a previously unobserved Positive Polarity Energetic Emanation (so-called ‘dark’ or ‘zero-point’ energy) and a resulting Negative Polarity Back Draft Counter Space (gravity).

If experimentally proven correct such a model would for example replace the standard approach to toroidal coils used in energy production today by precisely defining all the proportional and angular relationships existent in a moving system and revealing not only the true pathway that all accelerated motion seeks (be it an electron around the nucleus of an atom or water flowing down a drain) but in addition revealing this heretofore unobserved, undefined point energetic source underlying all space-time, motion, and vibration.

Lots of impressive sounding words, strung together in profound sounding ways, but what does it mean? Sure, gravity is a “back draft” of an unobserved “positive polarity energetic emanation”, and therefore we’ve unified dark energy and gravity, and unified all of the forces of our universe. That sounds terrific, except that it doesn’t mean anything! How can you test that? What evidence would be consistent with it? What evidence would be inconsistent with it? No one can answer those questions, because none of it means anything.

As I’ve said lots of times before: there’s a reason for the formal framework of mathematics. There’s a reason for the painful process of mathematical proof. There’s a reason why mathematicians and scientists have devised an elaborate language and notation for expressing mathematical ideas. And that reason is because it’s easy to string together words in profound sounding ways. It’s easy to string together reasoning in ways that look like they might be compelling if you took the time to understand them. But to do actual mathematics or actual science, you need to do more than string together something that sounds good. You need to put together something that is precise. The point of mathematical notation and mathematical reasoning is to take complex ideas and turn them into precisely defined, unambiguous structures that have the same meaning to everyone who looks at them.

“positive polarity energetic emanation” is a bunch of gobbledegook wordage that doesn’t mean anything to anyone. I can’t refute the claim that gravity is a back-draft negative polarity energetic reaction to dark energy. I can’t support that claim, either. I can’t do much of anything with it, because Randy Powell hasn’t said anything meaningful. It’s vague and undefined in ways that make it impossible to reason about in any way.

And that’s the way that things go throughout all of vortex math. There’s this cute pattern, and it must mean something! Therefore… endless streams of words, without any actual mathematical, physical, or scientific argument.

There’s so much wrong with vortex math, but it all comes down to the fact that it takes some arbitrary artifacts of human culture, and assigns them deep, profound meaning for no reason.

There’s this pattern in the doubling of numbers and reducing them to one digit. Why multiply by two? Because we like it, and it produces a pretty pattern. Why not use 3? Well, because in base-10, it won’t produce a good pattern: [1, 3, 9, 9, 9, 9, ….] But we can pick another number like 7: [1, 7, 4, 1, 7, 4, ….], and get a perfectly good repeating series: why is that series less compelling than [1, 2, 4, 8, 7, 5]?

There’s nothing magical about base-10. We can do the same thing in base-8: [1, 2, 4, 1, 2, 4…] How about base-12, which was used for a lot of stuff in Egypt? [1, 2, 4, 8, 5, 10, 9, 7, 3, 6, 1] – that gives us a longer pattern! What makes base-10 special? Why does the base-10 pattern mean something that other bases, or other numerical representations, don’t? The vortex math folks can’t answer that. (Note: I made an arithmetic error in the initial version of the base-12 sequence above. It was pointed out in comments by David Wallace. Thanks!)
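If you want to check any of these cycles yourself, here is a small sketch (the helper name is mine); it uses the fact that repeatedly summing digits in base b is the same as reducing mod b-1, writing b-1 in place of 0:

def digit_root_cycle(multiplier, base):
    # Repeatedly multiply by `multiplier` and reduce to a single digit in `base`.
    seq, x = [], 1
    while x not in seq:
        seq.append(x)
        x = (x * multiplier - 1) % (base - 1) + 1
    return seq

print(digit_root_cycle(2, 10))   # [1, 2, 4, 8, 7, 5]
print(digit_root_cycle(2, 12))   # [1, 2, 4, 8, 5, 10, 9, 7, 3, 6]
print(digit_root_cycle(7, 10))   # [1, 7, 4]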

If we plot the numbers on a circle, we get something that looks kind-of like an infinity symbol! What does that mean? Why should the infinity symbol (which was invented in the 17th century, and chosen because it looked sort of like a number, and sort-of like the last letter of the Greek alphabet) have any intrinsic meaning to the universe?

It’s giving profound meaning to arbitrary things, for unsupported reasons.

So what’s in the recent flood of criticism from the vortex math guys?

Well, there’s a lot of “You’re mean, so you’re wrong.” And there’s a lot of “Why don’t you prove that they’re wrong instead of making fun of them?”. And last but not least, there’s a lot of “Yeah, well, the fibonacci series is just a pattern of numbers too, but it’s really important”.

On the first: Yeah, fine, I’m mean. But I get pretty pissed at seeing people get screwed over by charlatans. The vortex math guys use this stuff to take money from “investors” based on their claims about producing limitless free energy, UFO space drives, and cancer cures. This isn’t abstract: this kind of nonsense hurts people. The people who are pushing these scams deserve to be mocked, without mercy. They don’t deserve kindness or respect, and they’re not going to get it from me.

I’d love to be proved wrong on this. One of my daughter’s friends is currently dying of cancer. I’d give up nearly anything to be able to stop her, and other children like her, from dying an awful death. If the vortex math folks could do anything for this poor kid, I would gladly grovel and humiliate myself at their feet. I would dedicate the rest of my life to nothing but helping them in their work.

But the fact is, when they talk about the miraculous things vortex math can do? At best, they’re delusional; more likely, they’re just lying. There is no cure for cancer in [1, 2, 4, 8, 7, 5, 1].

As for the Fibonacci series: well. It’s an interesting pattern. It does appear to show up in some interesting places in nature. But there are a few really important differences.

  1. The Fibonacci series shows up in every numeric notation, in every number base, no matter how you do numbers.
  2. It does show up in nature. This is key: there’s more to it than just words and vague assertions. You can really find fragments of the Fibonacci series in nature. By doing a careful mathematical analysis, you can find the Fibonacci series in numerous places in mathematics, such as the solutions to a range of interesting dynamic optimization problems. When you find a way of observing the vortex math pattern in nature, or a way of producing actual numeric solutions for real problems, in a way that anyone can reproduce, I’ll happily give it another look.
  3. The Fibonacci series does appear in nature – but it’s also been used by numerous crackpots to make ridiculous assertions about how the world must work!

Tommaso DorigoVenice --> Padova

I was born and have lived in Venice for over 51 years now (omitting to mention some 2 years of interruption when I worked for Harvard University, 18 years ago), but this came to an end on December 31st, when I concluded a rather complex move to Padova, 35 kilometers west.
Venice is a wonderful city and quite a special place, if you ask me. A city with a millenary history, crammed with magnificent palaces and churches. A place where one could write a book about every stone. Walking through the maze of narrow streets or making one's way through a tight network of canals is an unforgettable experience, but living there for decades is something else - it makes you a part of it. I feel I own the place, in some way. So why did I leave it?


Peter RohdePhD positions available

A fully-funded PhD position is available at the University of Technology Sydney, Australia, to conduct forefront theoretical research in the quantum information sciences, working with Dr Peter Rohde within the Centre for Quantum Software & Information (QSI).

The research topics are flexible, including:

* Quantum machine learning
* Quantum cryptography (especially homomorphic encryption and blind quantum computing)
* Quantum networking (especially cloud quantum computing)
* Quantum computing (with an emphasis on optical quantum computing, boson-sampling and quantum random walks)
* Quantum information theory
* Quantum metrology
* Your own suggestions for exciting projects in quantum technology

The candidate should have a background in physics, computer science, engineering, mathematics or a related discipline, and be highly creative, independent and passionate about quantum technology. The duration of the position is for 3 years, and includes a scholarship for $25,861/year (tax free). The position is research-only, with no teaching or coursework obligations.

QSI is a leading research centre in the quantum information sciences, and the candidate will have the opportunity to collaborate with leading researchers within the centre, as well as with other researchers domestically and internationally. Sydney is home to several major quantum research centres, presenting outstanding local collaboration opportunities.

The candidate will conduct theoretical research to be published in international journals, present research findings at major conferences, and build collaboration networks. Travel opportunities will be available.

To apply for the position or request further information, please contact Dr Peter Rohde (peter.rohde@uts.edu.au) by January 14. When applying, please provide a resume, academic record, contact details for two academic referees, and a statement of your research interests and passions. Applications are now open.

Please distribute this advert amongst your colleagues, students, mailing lists and Facebook groups.

– Ad astra per alas fideles. Scientia potentia est.

January 01, 2018

John BaezQuantum Mechanics and the Dodecahedron

This is an expanded version of my G+ post, which was a watered-down version of Greg Egan’s G+ post and the comments on that. I’ll start out slow, and pick up speed as I go.

Quantum mechanics meets the dodecahedron

In quantum mechanics, the position of a particle is not a definite thing: it’s described by a ‘wavefunction’. This says how probable it is to find the particle at any location… but it also contains other information, like how probable it is to find the particle moving at any velocity.

Take a hydrogen atom, and look at the wavefunction of the electron.

Question 1. Can we make the electron’s wavefunction have all the rotational symmetries of a dodecahedron—that wonderful Platonic solid with 12 pentagonal faces?

Yes! In fact it’s too easy: you can make the wavefunction look like whatever you want.

So let’s make the question harder. Like everything else in quantum mechanics, angular momentum can be uncertain. In fact you can never make all 3 components of angular momentum take definite values simultaneously! However, there are lots of wavefunctions where the magnitude of an electron’s angular momentum is completely definite.

This leads naturally to the next question, which was first posed by Gerard Westendorp:

Question 2. Can an electron’s wavefunction have a definite magnitude for its angular momentum while having all the rotational symmetries of a dodecahedron?

Yes! And there are infinitely many ways for this to happen! This is true even if we neglect the radial dependence of the wavefunction—that is, how it depends on the distance from the proton. Henceforth I’ll always do that, which lets us treat the wavefunction as a function on a sphere. And by the way, I’m also ignoring the electron’s spin! So, whenever I say ‘angular momentum’ I mean orbital angular momentum: the part that depends only on the electron’s position and velocity.

Question 2 has a trivial solution that’s too silly to bother with. It’s the spherically symmetric wavefunction! That’s invariant under all rotations. The real challenge is to figure out the simplest nontrivial solution. Egan figured it out, and here’s what it looks like:

The rotation here is just an artistic touch. Really the solution should be just sitting there, or perhaps changing colors while staying the same shape.

In what sense is this the simplest nontrivial solution? Well, the magnitude of the angular momentum is equal to

\hbar \sqrt{\ell(\ell+1)}

where the number \ell is quantized: it can only take values 0, 1, 2, 3,… and so on.

The trivial solution to Question 2 has \ell = 0. The first nontrivial solution has \ell = 6. Why 6? That’s where things get interesting. We can get it using the 6 lines connecting opposite faces of the dodecahedron!

I’ll explain later how this works. For now, let’s move straight on to a harder question:

Question 3. What’s the smallest choice of \ell where we can find two linearly independent wavefunctions that both have the same \ell and both have all the rotational symmetries of a dodecahedron?

It turns out to be \ell = 30. And Egan created an image of a wavefunction oscillating between these two possibilities!

But we can go a lot further:

Question 4. For each \ell, how many linearly independent functions on the sphere have that value of \ell and all the rotational symmetries of a dodecahedron?

For \ell ranging from 0 to 29 there are either none or one. There are none for these numbers:

1, 2, 3, 4, 5, 7, 8, 9, 11, 13, 14, 17, 19, 23, 29

and one for these numbers:

0, 6, 10, 12, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28

The pattern continues as follows. For \ell ranging from 30 to 59 there are either one or two. There is one for these numbers:

31, 32, 33, 34, 35, 37, 38, 39, 41, 43, 44, 47, 49, 53, 59

and two for these numbers:

30, 36, 40, 42, 45, 46, 48, 50, 51, 52, 54, 55, 56, 57, 58

The numbers in these two lists are just 30 more than the numbers in the first two lists! And it continues on like this forever: there’s always one more linearly independent solution for \ell + 30 than there is for \ell.

Question 5. What’s special about these numbers from 0 to 29?

0, 6, 10, 12, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28

You don’t need to know tons of math to figure this out—but I guess it’s a sort of weird pattern-recognition puzzle unless you know which patterns are likely to be important here. So I’ll give away the answer.

Here’s the answer: these are the numbers below 30 that can be written as sums of the numbers 6, 10 and 15.
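You can confirm that characterization with a brute-force search (nothing clever here, just trying all small coefficients):

sums = {6 * a + 10 * b + 15 * c
        for a in range(6) for b in range(4) for c in range(3)}
print(sorted(ell for ell in range(30) if ell in sums))
# prints [0, 6, 10, 12, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28]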

But the real question is why? Also: what’s so special about the number 30?

The short, cryptic answer is this. The dodecahedron has 6 axes connecting the centers of opposite faces, 10 axes connecting opposite vertices, and 15 axes connecting the centers of opposite edges. The least common multiple of these numbers is 30.

But this requires more explanation!

For this, we need more math. You may want to get off here. But first, let me show you the solutions for \ell = 6, \ell = 10, and \ell = 15, as drawn by Greg Egan. I’ve already showed you \ell = 6, which we could call the quantum dodecahedron:

Here is \ell = 10, which looks like a quantum icosahedron:

And here is \ell = 15:

Maybe this deserves to be called a quantum Coxeter complex, since the Coxeter complex for the group of rotations and reflections of the dodecahedron looks like this:

Functions with icosahedral symmetry

The dodecahedron and icosahedron have the same symmetries, but for some reason people talk about the icosahedron when discussing symmetry groups, so let me do that.

So far we’ve been looking at the rotational symmetries of the icosahedron. These form a group called \mathrm{A}_5, or \mathrm{I} for short, with 60 elements. We’ve been looking for certain functions on the sphere that are invariant under the action of this group. To get them all, we’ll first get ahold of all polynomials on \mathbb{R}^3 that are invariant under the action of this group. Then we’ll restrict these to the sphere.

To save time, we’ll use the work of Claude Chevalley. He looked at rotation and reflection symmetries of the icosahedron. These form the group \mathrm{I} \times \mathbb{Z}/2, also known as \mathrm{H}_3, but let’s call it \hat{\mathrm{I}} for short. It has 120 elements, but never confuse it with two other groups with 120 elements: the symmetric group on 5 letters, and the binary icosahedral group.

Chevalley found all polynomials on \mathbb{R}^3 that are invariant under the action of this bigger group \hat{\mathrm{I}}. These invariant polynomials form an algebra, and Chevalley showed that this algebra is freely generated by 3 homogeneous polynomials:

P(x,y,z) = x^2 + y^2 + z^2, of degree 2.

Q(x,y,z), of degree 6. To get this we take the dot product of (x,y,z) with each of the 6 vectors joining antipodal vertices of the icosahedron, and multiply them together.

R(x,y,z), of degree 10. To get this we take the dot product of (x,y,z) with each of the 10 vectors joining antipodal face centers of the icosahedron, and multiply them together.

So, linear combinations of products of these give all polynomials on \mathbb{R}^3 invariant under all rotation and reflection symmetries of the icosahedron.
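Here is a small numerical sanity check of that construction (my own sketch, with my own normalization, so the post’s Q may differ by a constant factor): build one representative vector for each of the 6 vertex axes, form the product of dot products, and verify that it is unchanged by a 72-degree rotation about a vertex axis, which is a rotational symmetry of the icosahedron.

import numpy as np

phi = (1 + 5 ** 0.5) / 2
# One vector from each of the 6 antipodal pairs of icosahedron vertices
# (the 12 vertices are the cyclic permutations of (0, ±1, ±phi)).
axes = np.array([np.roll((0.0, 1.0, s * phi), k) for k in range(3) for s in (1, -1)])

def Q(p):
    # Degree-6 invariant: the product of the dot products of p with the 6 vertex axes.
    return np.prod(axes @ p)

def rotation(axis, angle):
    # Rodrigues' formula for the rotation about `axis` by `angle`.
    k = np.asarray(axis) / np.linalg.norm(axis)
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

# A 72-degree turn about a vertex axis is an icosahedral rotation, so Q is unchanged:
R72 = rotation(axes[0], 2 * np.pi / 5)
p = np.random.default_rng(0).normal(size=3)
print(Q(p), Q(R72 @ p))   # the two values agree up to floating-point error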

But we want the polynomials that are invariant under just rotational symmetries of the icosahedron! To get all these, we need an extra generator:

S(x,y,z), of degree 15. To get this we take the dot product of (x,y,z) with each of the 15 vectors joining antipodal edge centers of the icosahedron, and multiply them together.

You can check that this is invariant under rotational symmetries of the icosahedron. But unlike our other polynomials, this one is not invariant under reflection symmetries! Because 15 is an odd number, S switches sign under ‘total inversion’—that is, replacing (x,y,z) with -(x,y,z). This is a product of three reflection symmetries of the icosahedron.

Thanks to Egan’s extensive computations, I’m completely convinced that P,Q,R and S generate the algebra of all \mathrm{I}-invariant polynomials on \mathbb{R}^3. I’ll take this as a fact, even though I don’t have a clean, human-readable proof. But someone must have proved it already—do you know where?

Since we now have 4 polynomials on \mathbb{R}^3, they must obey a relation. Egan figured it out:

S^2 = 500 P^9 Q^2 - 2275 P^6 Q^3 + 3440 P^3 Q^4 - 1728 Q^5 + 200 P^7 Q R - 795 P^4 Q^2 R + 720 P Q^3 R + 4 P^5 R^2 - 65 P^2 Q R^2 - R^3

The exact coefficients depend on some normalization factors used in defining Q,R and S. Luckily the details don’t matter much. All we’ll really need is that this relation expresses S^2 in terms of the other generators. And this fact is easy to see without any difficult calculations!

How? Well, we’ve seen S is unchanged by rotations, while it changes sign under total inversion. So, the most any rotation or reflection symmetry of the icosahedron can do to S is change its sign. This means that S^2 is invariant under all these symmetries. So, by Chevalley’s result, it must be a polynomial in P, Q, and R.

So, we now have a nice description of the \mathrm{I}-invariant polynomials on \mathbb{R}^3, in terms of generators and relations. Each of these gives an \mathrm{I}-invariant function on the sphere. And Leo Stein, a postdoc at Caltech who has a great blog on math and physics, has kindly created some images of these.

The polynomial P is spherically symmetric so it’s too boring to draw. The polynomial Q, of degree 6, looks like this when restricted to the sphere:

Since it was made by multiplying linear functions, one for each axis connecting opposite vertices of an icosahedron, it shouldn’t be surprising that we see blue blobs centered at these vertices.

The polynomial R, of degree 10, looks like this:

Here the blue blobs are centered on the icosahedron’s 20 faces.

Finally, here’s S, of degree 15:

This time the blue blobs are centered on the icosahedron’s 30 edges.

Now let’s think a bit about functions on the sphere that arise from polynomials on \mathbb{R}^3. Let’s call them algebraic functions on the sphere. They form an algebra, and it’s just the algebra of polynomials on \mathbb{R}^3 modulo the relation P = 1, since the sphere is the set \{P = 1\}.

It makes no sense to talk about the ‘degree’ of an algebraic function on the sphere, since the relation P = 1 equates polynomials of different degree. What makes sense is the number \ell that I was talking about earlier!

The group \mathrm{SO}(3) acts by rotation on the space of algebraic functions on the sphere, and we can break this space up into irreducible representations of \mathrm{SO}(3). It’s a direct sum of irreps, one of each ‘spin’ \ell = 0, 1, 2, \dots.

So, we can’t talk about the degree of a function on the sphere, but we can talk about its \ell value. On the other hand, it’s very convenient to work with homogeneous polynomials on \mathbb{R}^3, which have a definite degree—and these restrict to functions on the sphere. How can we relate the degree and the quantity \ell?

Here’s one way. The polynomials on \mathbb{R}^3 form a graded algebra. That means it’s a direct sum of vector spaces consisting of homogeneous polynomials of fixed degree, and if we multiply two homogeneous polynomials their degrees add. But the algebra of polynomials restricted to the sphere is merely a filtered algebra.

What does this mean? Let F be the algebra of all algebraic functions on the sphere, and let F_\ell \subset F consist of those that are restrictions of polynomials of degree \le \ell. Then:

1) F_\ell \subseteq F_{\ell + 1}

and

2) \displaystyle{ F = \bigcup_{\ell = 0}^\infty F_\ell }

and

3) if we multiply a function in F_\ell by one in F_m, we get one in F_{\ell + m}.

That’s what a filtered algebra amounts to.

But starting from a filtered algebra, we can get a graded algebra! It’s called the associated graded algebra.

To do this, we form

G_\ell = F_\ell / F_{\ell - 1}

and let

\displaystyle{ G = \bigoplus_{\ell = 0}^\infty G_\ell }

Then G has a product where multiplying a guy in G_\ell and one in G_m gives one in G_{\ell + m}. So, it’s indeed a graded algebra! For the details, see Wikipedia, which manages to make it look harder than it is. The basic idea is that we multiply in F and then ‘ignore terms of lower degree’. That’s what G_\ell = F_\ell / F_{\ell - 1} is all about.
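
To make the ‘ignore terms of lower degree’ idea concrete, here is a tiny worked example (my own illustration, not from the original discussion). On the sphere we have P = x^2 + y^2 + z^2 = 1, so the function z^2 lies in F_2, and in G_2 = F_2/F_1 we are free to subtract anything of lower degree. In particular

[z^2] = [z^2 - \textstyle{\frac{1}{3}}(x^2 + y^2 + z^2)]

since \frac{1}{3}(x^2 + y^2 + z^2) restricts to the constant function \frac{1}{3}, which lies in F_0 \subseteq F_1. The representative on the right is a harmonic polynomial, which fits with the description of the spin-\ell representations via harmonic polynomials in the postscript below.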

Now I want to use two nice facts. First, G_\ell is the spin-\ell representation of \mathrm{SO}(3). Second, there’s a natural map from any filtered algebra to its associated graded algebra, which is an isomorphism of vector spaces (though not of algebras). So, we get a natural isomorphism of vector spaces

\displaystyle{  F \cong G = \bigoplus_{\ell = 0}^\infty G_\ell }

from the algebraic functions on the sphere to the direct sum of all the spin-\ell representations!

Now to the point: because this isomorphism is natural, it commutes with symmetries, so we can also use it to study algebraic functions on the sphere that are invariant under a group of linear transformations of \mathbb{R}^3.

Before tackling the group we’re really interested in, let’s try the group of rotation and reflection symmetries of the icosahedron, \hat{\mathrm{I}}. As I mentioned, Chevalley worked out the algebra of polynomials on \mathbb{R}^3 that are invariant under this bigger group. It’s a graded commutative algebra, and it’s free on three generators: P of degree 2, Q of degree 6, and R of degree 10.

Starting from here, to get the algebra of \hat{\mathrm{I}}-invariant algebraic functions on the sphere, we mod out by the relation P = 1. This gives a filtered algebra which I’ll call F^{\hat{\mathrm{I}}}. (It’s common to use a superscript with the name of a group to indicate that we’re talking about the stuff that’s invariant under some action of that group.) From this we can form the associated graded algebra

\displaystyle{ G^{\hat{\mathrm{I}}} = \bigoplus_{\ell = 0}^\infty G_\ell^{\hat{\mathrm{I}}} }

where

G_\ell^{\hat{\mathrm{I}}} = F_\ell^{\hat{\mathrm{I}}} / F_{\ell - 1}^{\hat{\mathrm{I}}}

If you’ve understood everything I’ve been trying to explain, you’ll see that G_\ell^{\hat{\mathrm{I}}} is the space of all functions on the sphere that transform in the spin-\ell representation and are invariant under the rotation and reflection symmetries of the icosahedron.

But now for the fun part: what is this space like? By the work of Chevalley, the algebra F^{\hat{\mathrm{I}}} is spanned by products

P^p Q^q R^r

but since we have the relation P = 1, and no other relations, it has a basis given by products

Q^q R^r

So, the space F_\ell^{\hat{\mathrm{I}}} has a basis of products like this whose degree is \le \ell, meaning

6 q + 10 r \le \ell

Thus, the space we’re really interested in:

G_\ell^{\hat{\mathrm{I}}} = F_\ell^{\hat{\mathrm{I}}} / F_{\ell - 1}^{\hat{\mathrm{I}}}

has a basis consisting of equivalence classes

[Q^q R^r]

where

6 q + 10 r = \ell

So, we get:

Theorem 1. The dimension of the space of functions on the sphere that lie in the spin-\ell representation of \mathrm{SO}(3) and are invariant under the rotation and reflection symmetries of the icosahedron equals the number of ways of writing \ell as an unordered sum of 6’s and 10’s.
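
If you want to check these numbers mechanically, here is a quick Python sketch (mine, not part of the original argument) that counts the ways of writing \ell as an unordered sum of 6’s and 10’s; it reproduces the table worked out below.

    # Number of ways to write ell as an unordered sum of 6's and 10's,
    # i.e. the number of pairs (q, r) of nonnegative integers with 6q + 10r = ell.
    def dim_G(ell):
        return sum(1 for r in range(ell // 10 + 1) if (ell - 10 * r) % 6 == 0)

    for ell in range(31):
        print(ell, dim_G(ell))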

Let’s see how this goes:

\ell = 0: dimension 1, with basis [1]

\ell = 1: dimension 0

\ell = 2: dimension 0

\ell = 3: dimension 0

\ell = 4: dimension 0

\ell = 5: dimension 0

\ell = 6: dimension 1, with basis [Q]

\ell = 7: dimension 0

\ell = 8: dimension 0

\ell = 9: dimension 0

\ell = 10: dimension 1, with basis [R]

\ell = 11: dimension 0

\ell = 12: dimension 1, with basis [Q^2]

\ell = 13: dimension 0

\ell = 14: dimension 0

\ell = 15: dimension 0

\ell = 16: dimension 1, with basis [Q R]

\ell = 17: dimension 0

\ell = 18: dimension 1, with basis [Q^3]

\ell = 19: dimension 0

\ell = 20: dimension 1, with basis [R^2]

\ell = 21: dimension 0

\ell = 22: dimension 1, with basis [Q^2 R]

\ell = 23: dimension 0

\ell = 24: dimension 1, with basis [Q^4]

\ell = 25: dimension 0

\ell = 26: dimension 1, with basis [Q R^2]

\ell = 27: dimension 0

\ell = 28: dimension 1, with basis [Q^3 R]

\ell = 29: dimension 0

\ell = 30: dimension 2, with basis [Q^5], [R^3]

So, the story starts out boring, with long gaps. The odd numbers are completely uninvolved. But it heats up near the end, and reaches a thrilling climax at \ell = 30. At this point we get two linearly independent solutions, because 30 is the least common multiple of the degrees of Q and R.

It’s easy to see that from here on the story ‘repeats’ with period 30, with the dimension growing by 1 each time:

\mathrm{dim}(G_{\ell+30}^{\hat{\mathrm{I}}}) = \mathrm{dim}(G_{\ell}^{\hat{\mathrm{I}}}) + 1

Now, finally, we’re ready to tackle Question 4 from the first part of this post: for each \ell, how many linearly independent functions on the sphere have that value of \ell and all the rotational symmetries of a dodecahedron?

We just need to repeat our analysis with \mathrm{I}, the group of rotational symmetries of the dodecahedron, replacing the bigger group \hat{\mathrm{I}}.

We start with the algebra of polynomials on \mathbb{R}^3 that are invariant under \mathrm{I}. As we’ve seen, this is a graded commutative algebra with four generators: P,Q,R as before, but also S of degree 15. To make up for this extra generator there’s an extra relation, which expresses S^2 in terms of the other generators.

Starting from here, to get the algebra of \mathrm{I}-invariant algebraic functions on the sphere, we mod out by the relation P = 1. This gives a filtered algebra I’ll call F^{\mathrm{I}}. Then we form the associated graded algebra

\displaystyle{ G^{\mathrm{I}} = \bigoplus_{\ell = 0}^\infty G_\ell^{\mathrm{I}} }

where

G_\ell^{\mathrm{I}} = F_\ell^{\mathrm{I}} / F_{\ell - 1}^{\mathrm{I}}

What we really want to know is the dimension of G_\ell^{\mathrm{I}}, since this is the space of functions on the sphere that transform in the spin-\ell representation and are invariant under the rotational symmetries of the icosahedron.

So, what’s this space like? The algebra F^{\mathrm{I}} is spanned by products

P^p Q^q R^r S^t

but since we have the relation P = 1, and a relation expressing S^2 in terms of other generators, it has a basis given by products

Q^q R^r S^s where s = 0, 1

So, the space F_\ell^{\mathrm{I}} has a basis of products like this whose degree is \le \ell, meaning

6 q + 10 r + 15 s \le \ell and s = 0, 1

Thus, the space we’re really interested in:

G_\ell^{\mathrm{I}} = F_\ell^{\mathrm{I}} / F_{\ell - 1}^{\mathrm{I}}

has a basis consisting of equivalence classes

[Q^q R^r S^s]

where

6 q + 10 r + 15 s = \ell and s = 0, 1

So, we get:

Theorem 2. The dimension of the space of functions on the sphere that lie in the spin-\ell representation of \mathrm{SO}(3) and are invariant under the rotational symmetries of the icosahedron equals the number of ways of writing \ell as an unordered sum of 6’s, 10’s and at most one 15.
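
Here is the analogous Python sketch (again mine, just a check): it counts the ways of writing \ell as an unordered sum of 6’s and 10’s plus at most one 15, reproducing the table below. Swapping in the parts 4, 6 with at most one 9, or 3, 4 with at most one 6, gives the counts for the cube and the tetrahedron quoted from Egan near the end of this post.

    # Ways to write ell as an unordered sum of the given parts,
    # plus at most one copy of the optional extra part.
    def dim_G(ell, parts=(6, 10), extra=15):
        def count(n):
            ways = [1] + [0] * n        # ways[m] = partitions of m into the given parts
            for p in parts:
                for m in range(p, n + 1):
                    ways[m] += ways[m - p]
            return ways[n]
        return count(ell) + (count(ell - extra) if extra is not None and ell >= extra else 0)

    for ell in range(31):
        print(ell, dim_G(ell))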

Let’s work out these dimensions explicitly, and see how the extra generator S changes the story! Since it has degree 15, it contributes some solutions for odd values of \ell. But when we reach the magic number 30, this extra generator loses its power: S^2 has degree 30, but it’s a linear combination of other things.

\ell = 0: dimension 1, with basis [1]

\ell = 1: dimension 0

\ell = 2: dimension 0

\ell = 3: dimension 0

\ell = 4: dimension 0

\ell = 5: dimension 0

\ell = 6: dimension 1, with basis [Q]

\ell = 7: dimension 0

\ell = 8: dimension 0

\ell = 9: dimension 0

\ell = 10: dimension 1, with basis [R]

\ell = 11: dimension 0

\ell = 12: dimension 1, with basis [Q^2]

\ell = 13: dimension 0

\ell = 14: dimension 0

\ell = 15: dimension 1, with basis [S]

\ell = 16: dimension 1, with basis [Q R]

\ell = 17: dimension 0

\ell = 18: dimension 1, with basis [Q^3]

\ell = 19: dimension 0

\ell = 20: dimension 1, with basis [R^2]

\ell = 21: dimension 1, with basis [Q S]

\ell = 22: dimension 1, with basis [Q^2 R]

\ell = 23: dimension 0

\ell = 24: dimension 1, with basis [Q^4]

\ell = 25: dimension 1, with basis [R S]

\ell = 26: dimension 1, with basis [Q R^2]

\ell = 27: dimension 1, with basis [Q^2 S]

\ell = 28: dimension 1, with basis [Q^3 R]

\ell = 29: dimension 0

\ell = 30: dimension 2, with basis [Q^5], [R^3]

From here on the story ‘repeats’ with period 30, with the dimension growing by 1 each time:

\mathrm{dim}(G_{\ell+30}^{\mathrm{I}}) = \mathrm{dim}(G_{\ell}^{\mathrm{I}}) + 1

So, we’ve more or less proved everything that I claimed in the first part. So we’re done!

Postscript

But I can’t resist saying a bit more.

First, there’s a very different and somewhat easier way to compute the dimensions in Theorems 1 and 2. It uses the theory of characters, and Egan explained it in a comment on the blog post on which this is based.

Second, if you look in these comments, you’ll also see a lot of material about harmonic polynomials on \mathbb{R}^3—that is, those obeying the Laplace equation. These polynomials are very nice when you’re trying to decompose the space of functions on the sphere into irreps of \mathrm{SO}(3). The reason is that the harmonic homogeneous polynomials of degree \ell, when restricted to the sphere, give you exactly the spin-\ell representation!

If you take all homogeneous polynomials of degree \ell and restrict them to the sphere you get a lot of ‘redundant junk’. You get the spin-\ell rep, plus the spin-(\ell-2) rep, plus the spin-(\ell-4) rep, and so on. The reason is the polynomial

P = x^2 + y^2 + z^2

and its powers: if you have a polynomial living in the spin-\ell rep and you multiply it by P, you get another one living in the spin-\ell rep, but you’ve boosted the degree by 2.

Layra Idarani pointed out that this is part of a nice general theory. But I found all this stuff slightly distracting when I was trying to prove Theorems 1 and 2 assuming that we had explicit presentations of the algebras of \hat{\mathrm{I}}– and \mathrm{I}-invariant polynomials on \mathbb{R}^3. So, instead of introducing facts about harmonic polynomials, I decided to use the ‘associated graded algebra’ trick. This is a more algebraic way to ‘eliminate the redundant junk’ in the algebra of polynomials and chop the space of functions on the sphere into irreps of \mathrm{SO}(3).

Also, Egan and Idarani went ahead and considered what happens when we replace the icosahedron by another Platonic solid. It’s enough to consider the cube and tetrahedron. These cases are actually subtler than the icosahedron! For example, when we take the dot product of (x,y,z) with a vector along each of the 3 axes joining antipodal face centers of the cube, and multiply these together, we get a polynomial that’s not invariant under rotations of the cube! Up to a constant it’s just x y z, and this changes sign under some rotations.

People call this sort of quantity, which gets multiplied by a number under transformations instead of staying the same, a semi-invariant. The reason we run into semi-invariants for the cube and tetrahedron is that their rotational symmetry groups, \mathrm{S}_4 and \mathrm{A}_4, have nontrivial abelianizations, namely \mathbb{Z}/2 and \mathbb{Z}/3. The abelianization of \mathrm{I} \cong \mathrm{A}_5 is trivial.

Egan summarized the story as follows:

Just to sum things up for the cube and the tetrahedron, since the good stuff has ended up scattered over many comments:

For the cube, we define:

A of degree 4 from the cube’s vertex-axes, a full invariant
B of degree 6 from the cube’s edge-centre-axes, a semi-invariant
C of degree 3 from the cube’s face-centre-axes, a semi-invariant

We have full invariants:

A of degree 4
C^2 of degree 6
BC of degree 9

B^2 can be expressed in terms of A, C and P, so we never use it, and we use BC at most once.

So the number of copies of the trivial rep of the rotational symmetry group of the cube in spin ℓ is the number of ways to write ℓ as an unordered sum of 4, 6 and at most one 9.

For the tetrahedron, we embed its vertices as four vertices of the cube. We then define:

V of degree 4 from the tet’s vertices, a full invariant
E of degree 3 from the tet’s edge-centre axes, a full invariant

And the B we defined for the embedding cube serves as a full invariant of the tet, of degree 6.

B^2 can be expressed in terms of V, E and P, so we use B at most once.

So the number of copies of the trivial rep of the rotational symmetry group of the tetrahedron in spin ℓ is the number of ways to write ℓ as a sum of 3, 4 and at most one 6.

All of this stuff reminds me of a baby version of the theory of modular forms. For example, the algebra of modular forms is graded by ‘weight’, and it’s the free commutative algebra on a guy of weight 4 and a guy of weight 6. So, the dimension of the space of modular forms of weight k is the number of ways of writing k as an unordered sum of 4’s and 6’s. Since the least common multiple of 4 and 6 is 12, we get a pattern that ‘repeats’, in a certain sense, mod 12. Here I’m talking about the simplest sort of modular forms, based on the group \mathrm{SL}_2(\mathbb{Z}). But there are lots of variants, and I have the feeling that this post is secretly about some sort of variant based on finite subgroups of \mathrm{SL}(2,\mathbb{C}) instead of infinite discrete subgroups.

There’s a lot more to say about all this, but I have to stop or I’ll never stop. Please ask questions if you want me to say more!

Doug NatelsonThe new year and another arbitrary milestone

Happy new year to all!  I'm sure 2018 will bring some exciting developments in the discipline - at minimum, there will surely be a lot of talk about quantum computing.  I will attempt to post more often, and to work further on ways to bring condensed matter and nanoscale physics to a broader audience, though other responsibilities continue to make that a challenge.  Still, to modify a quote from Winston Churchill, "Writing a [blog] is like having a friend and companion at your side, to whom you can always turn for comfort and amusement, and whose society becomes more attractive as a new and widening field of interest is lighted in the mind."

By the way, this is the 1000th post on Nanoscale Views.  As we all know, this has special significance because 1000 is a big, round number.

n-Category Café Entropy Modulo a Prime (Continued)

In the comments last time, a conversation got going about p-adic entropy. But here I’ll return to the original subject: entropy modulo p. I’ll answer the question:

Given a “probability distribution” mod p, that is, a tuple \pi = (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n summing to 1, what is the right definition of its entropy H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}?

How will we know when we’ve got the right definition? As I explained last time, the acid test is whether it satisfies the chain rule

H_p(\gamma \circ (\pi^1, \ldots, \pi^n)) = H_p(\gamma) + \sum_{i = 1}^n \gamma_i H_p(\pi^i).

This is supposed to hold for all \gamma = (\gamma_1, \ldots, \gamma_n) \in \Pi_n and \pi^i = (\pi^i_1, \ldots, \pi^i_{k_i}) \in \Pi_{k_i}, where \Pi_n is the hyperplane

\Pi_n = \{ (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n : \pi_1 + \cdots + \pi_n = 1\},

whose elements we’re calling “probability distributions” mod p. And if God is smiling on us, H_p will be essentially the only quantity that satisfies the chain rule. Then we’ll know we’ve got the right definition.

Black belts in functional equations will be able to use the chain rule and nothing else to work out what H_p must be. But the rest of us might like an extra clue, and we have one in the definition of real Shannon entropy:

H_\mathbb{R}(\pi) = - \sum_{i: \pi_i \neq 0} \pi_i \log \pi_i.

Now, we saw last time that there is no logarithm mod p; that is, there is no nontrivial group homomorphism

(\mathbb{Z}/p\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}.

But there is a next-best thing: a homomorphism

(\mathbb{Z}/p^2\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}.

This is called the Fermat quotient q_p, and it’s given by

q_p(n) = \frac{n^{p - 1} - 1}{p} \in \mathbb{Z}/p\mathbb{Z}.

Let’s go through why this works.

The elements of (\mathbb{Z}/p^2\mathbb{Z})^\times are the congruence classes mod p^2 of the integers not divisible by p. Fermat’s little theorem says that whenever n is not divisible by p,

\frac{n^{p - 1} - 1}{p}

is an integer. This, or rather its congruence class mod p, is the Fermat quotient. The congruence class of n mod p^2 determines the congruence class of n^{p - 1} - 1 mod p^2, and it therefore determines the congruence class of (n^{p - 1} - 1)/p mod p. So, q_p defines a function (\mathbb{Z}/p^2\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}. It’s a pleasant exercise to show that it’s a homomorphism. In other words, q_p has the log-like property

q_p(m n) = q_p(m) + q_p(n)

for all integers m, n not divisible by p.

In fact, it’s essentially unique as such. Any other homomorphism (\mathbb{Z}/p^2\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z} is a scalar multiple of q_p. (This follows from the classical theorem that the group (\mathbb{Z}/p^2\mathbb{Z})^\times is cyclic.) It’s just like the fact that up to a scalar multiple, the real logarithm is the unique measurable function \log : (0, \infty) \to \mathbb{R} such that \log(x y) = \log x + \log y, but here there’s nothing like measurability complicating things.
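
Here is a tiny Python sketch (mine, purely illustrative) of the Fermat quotient and its log-like property:

    # Fermat quotient q_p(n) = (n^(p-1) - 1)/p mod p, for n not divisible by p.
    def fermat_quotient(n, p):
        assert n % p != 0
        return ((n ** (p - 1) - 1) // p) % p

    p = 7
    # Homomorphism property: q_p(m n) = q_p(m) + q_p(n) mod p.
    for m in range(1, 30):
        for n in range(1, 30):
            if m % p and n % p:
                assert fermat_quotient(m * n, p) == (fermat_quotient(m, p) + fermat_quotient(n, p)) % p
    # q_p only depends on n mod p^2, so it really is a function on (Z/p^2 Z)^x.
    assert fermat_quotient(3, p) == fermat_quotient(3 + p * p, p)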

So: q_p functions as a kind of logarithm. Given a mod p probability distribution \pi \in \Pi_n, we might therefore guess that the right definition of its entropy is

- \sum_{i : \pi_i \neq 0} \pi_i q_p(a_i),

where a_i is an integer representing \pi_i \in \mathbb{Z}/p\mathbb{Z}.

However, this doesn’t work. It depends on the choice of representatives a_i.

To get the right answer, we’ll look at real entropy in a slightly different way. Define \partial_\mathbb{R} : [0, 1] \to \mathbb{R} by

\partial_\mathbb{R}(x) = \begin{cases} - x \log x & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}

Then \partial_\mathbb{R} has the derivative-like property

\partial_\mathbb{R}(x y) = x \partial_\mathbb{R}(y) + \partial_\mathbb{R}(x) y.

A linear map with this property is called a derivation, so it’s reasonable to call \partial_\mathbb{R} a nonlinear derivation.

The observation that \partial_\mathbb{R} is a nonlinear derivation turns out to be quite useful. For instance, real entropy is given by

H_\mathbb{R}(\pi) = \sum_{i = 1}^n \partial_\mathbb{R}(\pi_i)

(\pi \in \Pi_n), and verifying the chain rule for H_\mathbb{R} is done most neatly using the derivation property of \partial_\mathbb{R}.

An equivalent formula for real entropy is

H_\mathbb{R}(\pi) = \sum_{i = 1}^n \partial_\mathbb{R}(\pi_i) - \partial_\mathbb{R}\biggl( \sum_{i = 1}^n \pi_i \biggr).

This is a triviality: \sum \pi_i = 1, so \partial_\mathbb{R}\bigl( \sum \pi_i \bigr) = 0, so this is the same as the previous formula. But it’s also quite suggestive: H_\mathbb{R}(\pi) measures the extent to which the nonlinear derivation \partial_\mathbb{R} fails to preserve the sum \sum \pi_i.

Now let’s try to imitate this in \mathbb{Z}/p\mathbb{Z}. Since q_p plays a similar role to \log, it’s natural to define

\partial_p(n) = -n q_p(n) = \frac{n - n^p}{p}

for integers n not divisible by p. But the last expression makes sense even if n is divisible by p. So, we can define a function

\partial_p : \mathbb{Z}/p^2\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}

by \partial_p(n) = (n - n^p)/p. (This is called a p-derivation.) It’s easy to check that \partial_p has the derivative-like property

\partial_p(m n) = m \partial_p(n) + \partial_p(m) n.

And now we arrive at the long-awaited definition. The entropy mod p of \pi = (\pi_1, \ldots, \pi_n) is

H_p(\pi) = \sum_{i = 1}^n \partial_p(a_i) - \partial_p\biggl( \sum_{i = 1}^n a_i \biggr),

where a_i \in \mathbb{Z} represents \pi_i \in \mathbb{Z}/p\mathbb{Z}. This is independent of the choice of representatives a_i. And when you work it out explicitly, it gives

H_p(\pi) = \frac{1}{p} \biggl( 1 - \sum_{i = 1}^n a_i^p \biggr).

Just as in the real case, H_p satisfies the chain rule, which is most easily shown using the derivation property of \partial_p.
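
Here is a short Python sketch (my own check, with invented names) of this definition; it verifies on a small example that H_p is independent of the choice of representatives, that the chain rule holds, and that the uniform distribution u_3 has entropy q_p(3):

    def entropy_mod_p(pi, p):
        """H_p of a mod-p 'distribution': a tuple of integer representatives summing to 1 mod p."""
        assert sum(pi) % p == 1
        total = sum(a ** p for a in pi)
        assert (1 - total) % p == 0        # automatic, since a^p = a mod p
        return ((1 - total) // p) % p

    p = 5
    # Independent of representatives: shifting an entry by p changes nothing.
    assert entropy_mod_p((3, 3), p) == entropy_mod_p((3, 3 + p), p)

    # Chain rule: H_p(gamma o (pi1, pi2)) = H_p(gamma) + gamma_1 H_p(pi1) + gamma_2 H_p(pi2).
    gamma, pi1, pi2 = (2, 4), (3, 3), (2, 2, 2)       # each sums to 1 mod 5
    composite = tuple(g * x for g, block in zip(gamma, (pi1, pi2)) for x in block)
    lhs = entropy_mod_p(composite, p)
    rhs = (entropy_mod_p(gamma, p)
           + gamma[0] * entropy_mod_p(pi1, p)
           + gamma[1] * entropy_mod_p(pi2, p)) % p
    assert lhs == rhs

    # Uniform distribution u_3 = (1/3, 1/3, 1/3): mod 5 this is (2, 2, 2), and
    # H_5(u_3) = q_5(3) = (3^4 - 1)/5 mod 5 = 1.
    assert entropy_mod_p((2, 2, 2), p) == 1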

Before I say any more, let’s have some examples.

  • In the real case, the uniform distribution u_n = (1/n, \ldots, 1/n) has entropy \log n. Mod p, this distribution only makes sense if p does not divide n (otherwise 1/n is undefined); but assuming that, we do indeed have H_p(u_n) = q_p(n), as we’d expect.

  • When we take our prime p to be 2, a probability distribution \pi is just a sequence of bits like (0, 0, 1, 0, 1, 1, 1, 0, 1) with an odd number of 1s. Its entropy H_2(\pi) \in \mathbb{Z}/2\mathbb{Z} turns out to be 0 if the number of 1s is congruent to 1 mod 4, and 1 if the number of 1s is congruent to 3 mod 4.

  • What about distributions on two elements? In other words, let \alpha \in \mathbb{Z}/p\mathbb{Z} and put \pi = (\alpha, 1 - \alpha). What is H_p(\pi)?

    It takes a bit of algebra to figure this out, but it’s not too hard, and the outcome is that for p \neq 2, H_p(\alpha, 1 - \alpha) = \sum_{r = 1}^{p - 1} \frac{\alpha^r}{r}. This function was, in fact, the starting point of Kontsevich’s note, and it’s what he called the 1\tfrac{1}{2}-logarithm.

We’ve now succeeded in finding a definition of entropy mod p that satisfies the chain rule. That’s not quite enough, though. In principle, there could be loads of things satisfying the chain rule, in which case, what special status would ours have?

But in fact, up to the inevitable constant factor, our definition of entropy mod p is the one and only definition satisfying the chain rule:

Theorem   Let (I: \Pi_n \to \mathbb{Z}/p\mathbb{Z}) be a sequence of functions. Then I satisfies the chain rule if and only if I = c H_p for some c \in \mathbb{Z}/p\mathbb{Z}.

This is precisely analogous to the characterization theorem for real entropy, except that in the real case some analytic condition on I has to be imposed (continuity in Faddeev’s theorem, and measurability in the stronger theorem of Lee). So, this is excellent justification for calling H_p the entropy mod p.

I’ll say nothing about the proof except the following. In Faddeev’s theorem over \mathbb{R}, the tricky part of the proof involves the fact that the sequence (\log n)_{n \geq 1} is not uniquely characterized up to a constant factor by the equation \log(m n) = \log m + \log n; to make that work, you have to introduce some analytic condition. Over \mathbb{Z}/p\mathbb{Z}, the tricky part involves the fact that the domain of the “logarithm” (Fermat quotient) is not \mathbb{Z}/p\mathbb{Z}, but \mathbb{Z}/p^2\mathbb{Z}. So, analytic difficulties are replaced by number-theoretic difficulties.

Kontsevich didn’t actually write down a definition of entropy mod p in his two-and-a-half page note. He did exactly enough to show that there must be a unique sensible such definition… and left it there! Of course he could have worked it out if he’d wanted to, and maybe he even did, but he didn’t write it up here.

Anyway, let’s return to the quotation from Kontsevich that I began my first post with:

Conclusion: If we have a random variable \xi which takes finitely many values with all probabilities in \mathbb{Q} then we can define not only the transcendental number H(\xi) but also its “residues modulo p” for almost all primes p!

In the notation of these posts, he’s saying the following. Let

\pi = (\pi_1, \ldots, \pi_n)

be a real probability distribution in which each \pi_i is rational. There are only finitely many primes that divide one or more of the denominators of \pi_1, \ldots, \pi_n. For primes p not belonging to this exceptional set, we can interpret \pi as a probability distribution in \mathbb{Z}/p\mathbb{Z}. We can therefore take its mod p entropy, H_p(\pi).

Kontsevich is playfully suggesting that we view H_p(\pi) \in \mathbb{Z}/p\mathbb{Z} as the residue class mod p of H_\mathbb{R}(\pi) \in \mathbb{R}.

There is more to this than meets the eye! Different real probability distributions can have the same real entropy, so there’s a question of consistency. Kontsevich’s suggestion only makes sense if

H_\mathbb{R}(\pi) = H_\mathbb{R}(\pi') \implies H_p(\pi) = H_p(\pi').

And this is true! I have a proof, though I’m not convinced it’s optimal. Does anyone see an easy argument for this?

Let’s write \mathcal{H}^{(p)} for the set of real numbers of the form H_\mathbb{R}(\pi), where \pi is a real probability distribution whose probabilities \pi_i can all be expressed as fractions with denominator not divisible by p. We’ve just seen that there’s a well-defined map

[.] : \mathcal{H}^{(p)} \to \mathbb{Z}/p\mathbb{Z}

defined by

[H_\mathbb{R}(\pi)] = H_p(\pi).

For x \in \mathcal{H}^{(p)} \subseteq \mathbb{R}, we view [x] as the congruence class mod p of x. This notion of “congruence class” even behaves something like the ordinary notion, in the sense that [.] preserves addition.

(We can even go a bit further. Accompanying the characterization theorem for entropy mod p, there is a characterization theorem for information loss mod p, strictly analogous to the theorem that John Baez, Tobias Fritz and I proved over \mathbb{R}. I won’t review that stuff here, but the point is that an information loss is a difference of entropies, and this enables us to define the congruence class mod p of the difference of two elements of \mathcal{H}^{(p)}. The same additivity holds.)

There’s just one more thing. In a way, the definition of entropy mod p is unsatisfactory. In order to define it, we had to step outside the world of \mathbb{Z}/p\mathbb{Z} by making arbitrary choices of representing integers, and then we had to show that the definition was independent of those choices. Can’t we do it directly?

In fact, we can. It’s a well-known miracle about finite fields K that any function K \to K is a polynomial. It’s a slightly less well-known miracle that any function K^n \to K, for any n \geq 0, is also a polynomial.

Of course, multiple polynomials can induce the same function. For instance, the polynomials x^p and x induce the same function \mathbb{Z}/p\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}. But it’s still possible to make a uniqueness statement. Given a function F : K^n \to K, there’s a unique polynomial f \in K[x_1, \ldots, x_n] that induces F and is of degree less than the order of K in each variable separately.

So, there must be a polynomial representing entropy, of degree less than p in each variable. And as it turns out, it’s this one:

H_p(\pi_1, \ldots, \pi_n) = - \sum_{\substack{0 \leq r_1, \ldots, r_n < p:\\ r_1 + \cdots + r_n = p}} \frac{\pi_1^{r_1} \cdots \pi_n^{r_n}}{r_1! \cdots r_n!}.

You can check that when n = 2, this is in fact the same polynomial \sum_{r = 1}^{p - 1} \pi_1^r/r as we met before — Kontsevich’s 1\tfrac{1}{2}-logarithm.
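
As a sanity check (mine, not from the post), here are a few lines of Python confirming that for n = 2 this polynomial agrees with \sum_{r = 1}^{p - 1} \alpha^r / r in \mathbb{Z}/p\mathbb{Z} (using Python 3.8's modular-inverse form of pow):

    from math import factorial

    def poly_entropy_2(alpha, p):
        """The n = 2 case of the displayed polynomial, evaluated mod p."""
        beta = (1 - alpha) % p
        total = 0
        for r1 in range(1, p):                        # r1 + r2 = p with 0 <= r1, r2 < p
            r2 = p - r1
            inv_fact = pow(factorial(r1) * factorial(r2), -1, p)   # invertible since r1, r2 < p
            total += pow(alpha, r1, p) * pow(beta, r2, p) * inv_fact
        return (-total) % p

    def half_log(alpha, p):
        """Kontsevich's 1 1/2-logarithm: sum of alpha^r / r for r = 1, ..., p - 1."""
        return sum(pow(alpha, r, p) * pow(r, -1, p) for r in range(1, p)) % p

    for p in (3, 5, 7, 11):
        assert all(poly_entropy_2(a, p) == half_log(a, p) for a in range(p))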

It’s striking that this direct formula for entropy modulo a prime looks quite unlike the formula for real entropy,

H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i.

It’s also striking that in the case n = 2, the formula for real entropy is

H_\mathbb{R}(\alpha, 1 - \alpha) = - \alpha \log \alpha - (1 - \alpha) \log(1 - \alpha),

whereas mod p, we get

H_p(\alpha, 1 - \alpha) = \sum_{r = 1}^{p - 1} \frac{\alpha^r}{r},

which is a truncation of the Taylor series of -\log(1 - \alpha). And yet, the characterization theorems for entropy over \mathbb{R} and over \mathbb{Z}/p\mathbb{Z} are strictly analogous.

As I see it, there are two or three big open questions:

  • Entropy over \mathbb{R} can be understood, interpreted and applied in many ways. How can we understand, interpret or apply entropy mod p?

  • Entropy over \mathbb{R} and entropy mod p are defined in roughly analogous ways, and uniquely characterized by strictly analogous theorems. Is there a common generalization? That is, can we unify the two definitions and characterization theorems, perhaps proving a theorem about entropy over suitable fields?

December 28, 2017

John BaezThe 600-Cell (Part 3)

There are still a few more things I want to say about the 600-cell. Last time I described the ‘compound of five 24-cells’. David Richter built a model of this, projected from 4 dimensions down to 3:

It’s nearly impossible to tell from this picture, but it’s five 24-cells inscribed in the 600-cell, with each vertex of the 600-cell being the vertex of just one of these five 24-cells. The trick for constructing it is to notice that the vertices of the 600-cell form a group sitting in the sphere of unit quaternions, and to find a 24-cell whose vertices form a subgroup.

The left cosets of a subgroup H \subset G are the sets

gH = \{gh : \; h \in H\}

They look like copies of H ‘translated’, or in our case ‘rotated’, inside G. Every point of G lies in exactly one coset.

In our example there are five cosets. Each is the set of vertices of a 24-cell inscribed in the 600-cell. Every vertex of the 600-cell lies in exactly one of these cosets. This gives our ‘compound of five 24-cells’.

It turns out this trick is part of a family of three tricks, each of which gives a nice compound of 4d regular polytopes. While I’ve been avoiding coordinates, I think they’ll help get the idea across now. Here’s a nice description of the 120 vertices of the 600-cell. We take these points:

\displaystyle{ (\pm \textstyle{\frac{1}{2}}, \pm \textstyle{\frac{1}{2}},\pm \textstyle{\frac{1}{2}},\pm \textstyle{\frac{1}{2}}) }

\displaystyle{ (\pm 1, 0, 0, 0) }

\displaystyle{ \textstyle{\frac{1}{2}} (\pm \Phi, \pm 1 , \pm 1/\Phi, 0 )}

and all those obtained by even permutations of the coordinates. So, we get

2^4 = 16

points of the first kind,

2 \times 4 = 8

points of the second kind, and

2^3 \times 4! / 2 = 96

points of the third kind, for a total of

16 + 8 + 96 = 120

points.

The 16 points of the first kind are the vertices of a 4-dimensional hypercube, the 4d analogue of a cube:

The 8 points of the second kind are the vertices of a 4-dimensional orthoplex, the 4d analogue of an octahedron:

The hypercube and orthoplex are dual to each other. Taking both their vertices together we get the 16 + 8 = 24 vertices of the 24-cell, which is self-dual:

The hypercube, orthoplex and 24-cell are regular polytopes, as is the 600-cell.

Now let’s think of any point in 4-dimensional space as a quaternion:

(a,b,c,d) = a + b i + c j + d k

If we do this, we can check that the 120 vertices of the 600-cell form a group under quaternion multiplication. As mentioned in Part 1, this group is called the binary icosahedral group or 2\mathrm{I}, because it’s a double cover of the rotational symmetry group of an icosahedron (or dodecahedron).

We can also check that the 24 vertices of the 24-cell form a group under quaternion multiplication. As mentioned in Part 1, this is called the binary tetrahedral group or 2\mathrm{T}, because it’s a double cover of the rotational symmetry group of a tetrahedron.

All this is old news. But it’s even easier to check that the 8 vertices of the orthoplex form a group under quaternion multiplication: they’re just

\pm 1, \pm i, \pm j, \pm k

This group is often called the quaternion group or \mathrm{Q}. It too is a double cover of a group of rotations! The 180° rotations about the x, y and z axes square to 1 and commute with each other; up in the double cover of the rotation group (the unit quaternions, or \mathrm{SU}(2)) they give elements that square to -1 and anticommute with each other.

Furthermore, the 180° rotations about the x, y and z axes are symmetries of a regular tetrahedron! This is easiest to visualize if you inscribe the tetrahedron in a cube thus:

So, up in the double cover of the 3d rotation group we get a chain of subgroups

\mathrm{Q} \subset 2\mathrm{T} \subset 2\mathrm{I}

which explains why we’re seeing an orthoplex inscribed in a 24-cell inscribed in a 600-cell! This explanation is more satisfying to me than the one involving coordinates.

Alas, I don’t see how to understand the hypercube inscribed in the 24-cell in quite this way, since the hypercube is not a subgroup of the unit quaternions. It certainly wasn’t in the coordinates I gave before—but worse, there’s no way to rotate the hypercube so that it becomes a subgroup. There must be something interesting to say here, but I don’t know it. So, I’ll forget the hypercube for now.

Instead, I’ll use group theory to do something nice with the orthoplex.

First, look at the orthoplexes sitting inside the 24-cell! We’ve got an 8-element subgroup of a 24-element group:

\mathrm{Q} \subset 2\mathrm{T}

so it has three right cosets, each forming the vertices of an orthoplex inscribed in the 24-cell. So, we get a compound of three orthoplexes: a way of partitioning the vertices of the 24-cell into those of three orthoplexes.

Second, look at the orthoplexes sitting inside the 600-cell! We’ve got an 8-element subgroup of a 120-element group:

\mathrm{Q} \subset 2\mathrm{I}

so it has 15 right cosets, each forming the vertices of an orthoplex inscribed in the 600-cell. So, we get a compound of 15 orthoplexes: a way of partitioning the vertices of the 600-cell into those of 15 orthoplexes.

And third, these fit nicely with what we saw last time: the 24-cells sitting inside the 600-cell! We saw a 24-element subgroup of a 120-element group

2\mathrm{T} \subset 2\mathrm{I}

so it has 5 right cosets, each forming the vertices of a 24-cell inscribed in the 600-cell. That gave us the compound of five 24-cells: a way of partitioning the vertices of the 600-cell into those of five 24-cells.
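
For anyone who would like to see these coset counts fall out of a computation, here is a rough numerical sketch in Python (floating-point coordinates; all the names are my own, invented for this illustration). It builds the 120 vertices from the coordinates given above, checks that they are closed under quaternion multiplication, and counts cosets of \mathrm{Q} and 2\mathrm{T}:

    from itertools import permutations, product
    import math

    PHI = (1 + math.sqrt(5)) / 2

    def qmul(a, b):
        # quaternion product, with (a0, a1, a2, a3) standing for a0 + a1 i + a2 j + a3 k
        a0, a1, a2, a3 = a
        b0, b1, b2, b3 = b
        return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
                a0*b1 + a1*b0 + a2*b3 - a3*b2,
                a0*b2 - a1*b3 + a2*b0 + a3*b1,
                a0*b3 + a1*b2 - a2*b1 + a3*b0)

    def key(q):
        # every coordinate is one of 0, ±1/2, ±1, ±PHI/2, ±1/(2 PHI), so coarse rounding is safe
        return tuple(round(x, 3) + 0.0 for x in q)

    def even_perms():
        for p in permutations(range(4)):
            if sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4)) % 2 == 0:
                yield p

    # The 120 vertices of the 600-cell, i.e. the binary icosahedral group 2I.
    vertices = {}
    for signs in product((1, -1), repeat=4):
        v = tuple(s / 2 for s in signs)
        vertices.setdefault(key(v), v)
    for base in [(1, 0, 0, 0), (PHI / 2, 1 / 2, 1 / (2 * PHI), 0)]:
        for perm in even_perms():
            for signs in product((1, -1), repeat=4):
                v = tuple(signs[i] * base[perm[i]] for i in range(4))
                vertices.setdefault(key(v), v)

    group_2I = list(vertices.values())
    assert len(group_2I) == 120
    assert all(key(qmul(a, b)) in vertices for a in group_2I for b in group_2I)   # closed

    # The 24-cell subgroup 2T: the 16 + 8 points whose coordinates are 0, ±1/2, ±1.
    group_2T = [v for v in group_2I if all(abs(c) in (0.0, 0.5, 1.0) for c in key(v))]
    # The quaternion group Q = {±1, ±i, ±j, ±k}.
    group_Q = [v for v in group_2I if sorted(abs(c) for c in key(v)) == [0.0, 0.0, 0.0, 1.0]]
    assert (len(group_2T), len(group_Q)) == (24, 8)

    def number_of_cosets(G, H):
        return len({frozenset(key(qmul(g, h)) for h in H) for g in G})

    print(number_of_cosets(group_2T, group_Q))   # 3: the compound of three orthoplexes
    print(number_of_cosets(group_2I, group_Q))   # 15: the compound of 15 orthoplexes
    print(number_of_cosets(group_2I, group_2T))  # 5: the compound of five 24-cells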

There are some nontrivial counting problems associated with each of these three compounds. David Roberson has already solved most of these.

1) How many ways are there of inscribing an orthoplex in a 24-cell?

2) How many ways are there of inscribing a compound of three orthoplexes in a 24-cell?

3) How many ways are there of inscribing an orthoplex in a 600-cell? David used a computer to show there are 75. Is there a nice human-understandable argument?

4) How many ways are there of inscribing a compound of 15 orthoplexes in a 600-cell? David used a computer to show there are 280. Is there a nice human-understandable argument?

5) How many ways are there of inscribing a 24-cell in a 600-cell? David used a computer to show there are 25. Is there a nice human-understandable argument?

6) How many ways are there of inscribing a compound of five 24-cells in a 600-cell? David used a computer to show there are 10. Is there a nice human-understandable argument? (It’s pretty easy to prove that 10 is a lower bound.)

For those who prefer visual delights to math puzzles, here is a model of the compound of 15 orthoplexes, cleverly projected from 4 dimensions down to 3, made by David Richter and some friends:

It took four people 6 hours to make this! Click on the image to learn more about this amazing shape, and explore David Richter’s pages to see more compounds.

So far my tale has not encompassed the 120-cell, which is the dual of the 600-cell. This has 600 vertices and 120 dodecahedral faces:

Unfortunately, like the hypercube, the vertices of the 120-cell cannot be made into a subgroup of the unit quaternions. I’ll need some other idea to think about them in a way that I enjoy. But the 120-cell is amazing because every regular polytope in 4 dimensions can be inscribed in the 120-cell.

For example, we can inscribe the orthoplex in the 120-cell. Since the orthoplex has 8 vertices while the 120-cell has 600, and

600/8 = 75

we might hope for a compound of 75 orthoplexes whose vertices, taken together, are those of the 120-cell. And indeed it exists… and David Richter and his friends have built a model!

Image credits

You can click on any image to see its source. The photographs of models of the compound of five 24-cells and the compound of 15 orthoplexes are due to David Richter and friends. The shiny ball-and-strut pictures of the tetrahedron in the cube and the 120-cells were made by Tom Ruen using Robert Webb’s Stella software and placed on Wikicommons. The 2d projections of the hypercube, orthoplex and 24-cell were made by Tom Ruen and placed into the public domain on Wikicommons.

Gordon WattsChristmas Project

Every Christmas I try to do some sort of project. Something new. Sometimes it turns into something real, and last for years. Sometimes it goes no where. Normally, I have an idea of what I’m going to attempt – usually it has been bugging me for months and I can’t wait till break to get it started. This year, I had none.

But, I arrived home at my parents’ house in New Jersey and there it was waiting for me. The house is old – more than 200 years old – and the steam furnace had just been replaced. For those of you unfamiliar with this method of heating a house: it is noisy! The furnace boils water, and the steam is forced up through the pipes to cast iron radiators. The radiators hiss through valves as the air is forced up – an iconic sound from my childhood. Eventually, after traveling sometimes four floors, the super hot steam reaches the end of a radiator and the valve shuts off. The valves are cool – heat sensitive! The radiator, full of hot steam, then warms the room – and rather effectively.

The bane of this system, however, is that it can leak. And you have no idea where the leak is in the whole house! The only way you know: the furnace reservoir needs refilling too often. So… the problem: how to detect when the reservoir needs refilling? Especially with this new modern furnace, which can automatically refill its reservoir.

Me: Oh, look, there is a little LED that comes on when the automatic refilling system comes on! I can watch that! Dad: Oh, look, there is a little light that comes on when the water level is low. We can watch that.

Dad’s choice of tools: a wifi cam that is triggered by noise. Me: A Raspberry Pi 3, a photo-resistor, and a capacitor. Hahahaha. Game on!

What’s funny? Neither of us has detected a water refill since we started this project. In the first picture at the right you can see both of our devices – in the foreground taped to the gas input line is the CAM watching the water refill light through a mirror, and in the background (look for the yellow tape) is the Pi taped to the refill controller (and the capacitor and sensor hanging down looking at the LED on the bottom of the box).

I chose the Pi because I’ve used it once before – for a Spotify end-point. But never for anything that it is designed for. An Arduino is almost certainly better suited to this – but I wasn’t confident that I could get it up and running in the 3 days I had to make this (including time for ordering and shipping of all parts from Amazon). It was a lot of fun! And consumed a bunch of time. “Hey, where is Gordon? He needs to come for Christmas dinner!” “Wait, are you working on Christmas day?” – for once I could answer that last one with an honest no! Hahaha.

I learned a bunch:

  • I had to solder! It has been a loooong time since I’ve done that. My first graduate student, whom I made learn how to solder before I let him graduate, would have laughed at how rusty my skills were!
  • I was surprised to learn, at the start, that the Pi has no analog to digital converter. I stole a quick and dirty trick that lots of people have used to get around this problem: time how long it takes to charge a capacitor up through the photoresistor (a rough sketch of the idea appears after this list). This is probably the biggest source of noise in my system, but it does for crude measurements.
  • I got to write all my code in Python. Even interrupt handling (ok, no call backs, but still!)
  • The Pi, by default, runs a full build of Linux. Also, python 3! I made full use of this – all my code is in python, and a bit in bash to help it get going. I used things like cron and pip – they were either there, or trivial to install. Really, for this project, I was never conscious of the Pi being anything less than a full computer.
  • At first I tried to write auto detection code – that would see any changes in the light levels and write them to a file… which was then served on a simple nginx webserver (seriously – that was about 2 lines of code to install). But between the noise in the system and the fact that we’ve not had a fill yet, I don’t know what my signal looks like… So, that code will have to be revised.
  • In the end, I have to write a file with the raw data in it, and analyze that – at least, until I know what an actual signal looks like. So… how to get that data off the Pi – especially given that I can’t access it anymore now that I’ve left New Jersey? In the end I used some Python code to push the files to OneDrive. Other than figuring out how to deal with OAuth2, it was really easy (and I’m still not done fighting the authentication battle). What will happen if/when it fails? Well… I’ve recorded the commands my Dad will have to execute to get the new authentication files down there. Hopefully there isn’t going to be an expiration!
  • To analyze the raw data I’ve used new tools I’ve recently learned at work: numpy and Jupyter notebooks. They allow me to produce a plot like this one. The dip near the left hand side of the plot is my Dad shining the flashlight at my sensors to see if I could actually see anything. The joker.
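
For the curious, here is a rough sketch (not Gordon’s actual code – the pin number, wiring and names are all my own assumptions) of the standard capacitor-charging trick for reading a photoresistor on a Pi without an ADC, using the RPi.GPIO library:

    import time
    import RPi.GPIO as GPIO

    PIN = 18   # hypothetical BCM pin at the photoresistor/capacitor junction
               # (assumed wiring: 3.3V -> photoresistor -> PIN, PIN -> capacitor -> ground)

    def rc_time(pin):
        # Drain the capacitor by driving the pin low for a moment...
        GPIO.setup(pin, GPIO.OUT)
        GPIO.output(pin, GPIO.LOW)
        time.sleep(0.1)
        # ...then switch to input and count how long the capacitor takes,
        # charging through the photoresistor, to reach the pin's logic-high threshold.
        GPIO.setup(pin, GPIO.IN)
        count = 0
        while GPIO.input(pin) == GPIO.LOW:
            count += 1
        return count          # bigger count = darker; a noisy but usable light level

    GPIO.setmode(GPIO.BCM)
    try:
        while True:
            print(rc_time(PIN))
            time.sleep(1)
    finally:
        GPIO.cleanup()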

Pretty much the only thing I’d used before was Linux, and some very simple things with an older Raspberry Pi 2. If anyone is on the fence about this – I’d definitely recommend trying it out. It is very easy and there are 1000’s of web pages with step by step instructions for most things you’ll want to do!


December 27, 2017

Doug NatelsonThe Quantum Labyrinth - a review

Because of real life constraints I'm a bit slow off the mark compared to others, but I've just finished reading The Quantum Labyrinth by Paul Halpern, and wanted to get some thoughts down about it.  The book is a bit of a superposition between a dual biography of Feynman and Wheeler, and a general history of the long-term impact of what started out as their absorber theory.  

The biographical aspects of Feynman have been well trod before by many, including Feynman himself and rather more objectively by James Gleick.   Feynman helped create his own legend (safecracking, being a mathematically prodigious, bongo-playing smart-ass).  The bits called back in the present work that resonate with me now (perhaps because of my age) are how lost he was after his first wife's death, his insecurity about whether he was really getting anything done after QED, his embracing of family life with his third wife, and his love of teaching - both as theater and as a way to feel accomplishment when research may be slow going.  

From other books I'd known a bit about Wheeler, who was still occasionally supervising physics senior theses at Princeton when I was an undergrad.  The backstory about his brother's death in WWII as motivation for Wheeler's continued defense work after the war was new to me.   Halpern does a very good job conveying Wheeler's style - coining pithy epigrams ("Spacetime tells matter how to move; matter tells spacetime how to curve.", "The boundary of a boundary is zero.") and jumping from topic to topic with way outside the box thinking.  We also see him editing his students' theses and papers to avoid antagonizing people.  Interesting.

From the Feynman side, the absorber theory morphed into path integrals, his eponymous diagrams, and his treatment of quantum electrodynamics.   The book does a good job discussing this, though like nearly every popularization, occasionally the analogies, similes, and metaphors end up sacrificing accuracy for the sake of trying to convey physical intuition.    From the Wheeler angle, we get to learn about attempts at quantizing gravity, geons, wormholes, and the many worlds interpretation of quantum mechanics.

It's a fun read that gives you a sense of the personalities and the times for a big chunk of twentieth century theoretical physics, and I'm impressed with Halpern's ability to convey these things without being a professional historian.  

Mark Chu-CarrollGarbage Collection with Semispaces

The roots of most garbage collection ideas come from the Lisp community. Lisp was really the first major garbage collection language that was used to write complicated things. So it’s natural that the first big innovation in the world of GC that we’re going to look at comes from the Lisp community.

In early Lisp systems with garbage collection, the pause that occurred when the GC did a mark/sweep to reclaim memory was very long, so it was important to find ways to make the cycle faster. Lisp code had the property that it tended to allocate a lot of small, short-lived objects: Lisp, particularly early Lisp, tended to represent everything using tiny structures called cons cells, and Lisp programs generate bazillions of them. Lots of short-lived cons cells needed to get released in every GC cycle, and the bulk of the GC pause was caused by the amount of time that the GC spent going through all of the dead cons cells, and releasing them.

Beyond just that speed issue, there’s another problem with naive mark-sweep collection when you’re dealing with large numbers of short lived objects, called heap fragmentation. The GC does a pass marking all of the memory in use, and then goes through each unused block of memory, and releases it. What can happen is that you can end up with lots of memory free, but scattered around in lots of small chunks. For an extreme example, imagine that you’re building two lists made up of 8-byte cells. You allocate a cell for list A, and then you do something using A, and generate a new value which you add as a new cell in list B. So you’re alternating: allocate a cell for A, then a cell for B. When you get done, you discard A, and just keep B. After the GC runs, what does your memory look like? If A and B each have 10,000 cells, then what you have is 8 bytes of free memory that used to be part of A, and then 8 bytes of used memory for a cell of B, then 8 bytes of free, then 8 used, etc. You’ve ended up with 80,000 bytes of free memory, none of which can be used to store anything larger than 8 bytes. Eventually, you can wind up with your entire available heap broken into small enough pieces that you can’t actually use it for anything.

What the lisp folks came up with is a way of getting rid of fragmentation, and dramatically reducing the cost of the sweep phase, by using something called semispaces. With semispaces, you do some cleverness that can be summed up as moving from mark-sweep to copy-swap.

The idea is that instead of keeping all of your available heap in one chunk, you divide it into two equal regions, called semispaces. You call one of the semispaces the primary, and the other the secondary. When you allocate memory, you only allocate from the primary space. When the primary space gets to be almost full, you start a collection cycle.

When you’re doing your mark phase, instead of just marking each live value, you copy it to the secondary space. When all of the live values have been copied to the secondary space, you update all of the pointers within the live values to their new addresses in the secondary space.

Then, instead of releasing each of the unused values, you just swap the primary and secondary space. Everything in the old primary space gets released, all at once. The copy phase also compacts everything as it moves it into the secondary space, consolidating all of the free memory in one contiguous chunk. If you implement it well, you can also get the beneficial side effect of moving related values close together in ways that improve the cache performance of your program.
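
To make the copy-swap idea concrete, here’s a toy semispace collector sketched in Python (a Cheney-style breadth-first copy; the Ref and Heap names are invented for this illustration, and a real collector would also have to find roots in registers and on the stack rather than taking an explicit list):

    class Ref:
        """A 'pointer': an index into the current primary semispace."""
        def __init__(self, addr):
            self.addr = addr

    class Heap:
        def __init__(self, semispace_size):
            self.size = semispace_size
            self.from_space = [None] * semispace_size   # primary: all allocation happens here
            self.to_space = [None] * semispace_size     # secondary: live data is copied here on GC
            self.free = 0                               # bump-pointer allocation index

        def allocate(self, fields, roots):
            """Allocate a cell whose fields are plain data or Refs."""
            if self.free >= self.size:
                self.collect(roots)
                if self.free >= self.size:
                    raise MemoryError("heap full even after collection")
            addr = self.free
            self.from_space[addr] = list(fields)
            self.free += 1
            return Ref(addr)

        def collect(self, roots):
            """Copy phase: move live cells into to_space; swap phase: discard from_space wholesale."""
            forwarding = {}                             # old address -> new address
            self.free = 0

            def copy(ref):
                if ref.addr not in forwarding:
                    forwarding[ref.addr] = self.free
                    self.to_space[self.free] = list(self.from_space[ref.addr])
                    self.free += 1
                return Ref(forwarding[ref.addr])

            for i, r in enumerate(roots):               # copy everything reachable from the roots
                roots[i] = copy(r)
            scan = 0
            while scan < self.free:                     # breadth-first: fix up pointers in copied cells
                cell = self.to_space[scan]
                for j, field in enumerate(cell):
                    if isinstance(field, Ref):
                        cell[j] = copy(field)
                scan += 1

            # The swap: the old primary space, garbage and all, is released in one go.
            self.from_space, self.to_space = self.to_space, [None] * self.size

    # Build two lists A and B of cons-like cells, keep only B, and collect:
    heap, roots, b = Heap(64), [], None
    for k in range(10):
        heap.allocate([k], roots)          # cell for list A; never referenced again
        b = heap.allocate([k, b], roots)   # cell for list B, linked to the previous one
    roots.append(b)
    heap.collect(roots)
    print("cells in use after collection:", heap.free)  # 10: A's cells vanished, B was compacted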

For Lisp programs, semispaces are a huge win: they reduce the cost of the sweep phase to a constant time, at the expense of a nearly linear increase in mark time, which works out really well. And it eliminates the problem of fragmentation. All in all, it’s a great tradeoff!

Of course, it’s got some major downsides as well, which can make it work very poorly in some cases:

  1. The copy phase is significantly more expensive than a traditional mark-phase. The time it takes to copy is linear in the total amount of live data, versus linear in the number of live objects in a conventional mark. Whether semispaces will work well for a given application depend on the properties of the data that you’re working with. If you’ve got lots of large objects, then the increase in time caused by the copy instead of mark can significantly outweigh the savings of the almost free swap, making your GC pauses much longer; but if you’ve got lots of short-lived, small objects, then the increase in time for the copy can be much smaller than time savings from the swap, resulting in dramatically shortened GC pauses.
  2. Your application needs to have access to twice as much memory as you actually expect it to use, because you need two full semispaces. There’s really no good way around this: you really need to have a chunk of unused memory large enough to store all of your live objects – and it’s always possible that nearly everything is alive for a while, meaning that you really do need two equally sized semispaces.
  3. You don’t individually release values, which means that you can’t have any code that runs when an value gets collected. In a conventional mark-sweep, you can have objects provide functions called finalizers to help them clean up when they’re released – so objects like files can close themselves. In a semispace, you can’t do that.

The basic idea of semispaces is beautiful, and it’s adaptable to some other environments where a pure semispace doesn’t make sense, but some form of copying and bulk release can work out well.

For example, years ago, at a previous job, one of my coworkers was working on a custom Java runtime system for a large highly scalable transaction processing system. The idea of this was that you get a request from a client system to perform some task. You perform some computation using the data from the client request, update some data structures on the server, and then return a result to the client. Then you go on to the next request.

The requests are mostly standalone: they do a bunch of computation using the data that they received in the request. Once they’re done with a given request, almost nothing that they used processing it will ever be looked at again.

So what they did was integrate a copying GC into the transaction system. Each time they started a new transaction, they’d give it a new memory space to live in. When the transaction finished, they’d do a quick copy cycle to copy out anything that was referenced in the master server data outside the transaction, and then they’d just take that chunk of memory, and make it available for use by the next transaction.

The result? Garbage collection became close to free. The number of pointers into the transaction space from the master server data was usually zero, one, or two. That meant that the copy phase was super-short. The release phase was constant time, just dropping the pointer to the transaction space back into the available queue.

So they were able to go from an older system which had issues with GC pauses to a new system with no pauses at all. It wouldn’t work outside of that specific environment, but for that kind of application, it screamed.

Terence TaoCorrelations of the von Mangoldt and higher divisor functions II. Divisor correlations in short ranges

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions II. Divisor correlations in short ranges“. This is a sequel of sorts to our previous paper on divisor correlations, though the proof techniques in this paper are rather different. As with the previous paper, our interest is in correlations such as

\displaystyle  \sum_{n \leq X} d_k(n) d_l(n+h) \ \ \ \ \ (1)

for medium-sized {h} and large {X}, where {k \geq l \geq 1} are natural numbers and {d_k(n) = \sum_{n = m_1 \dots m_k} 1} is the {k^{th}} divisor function (actually our methods can also treat a generalisation in which {k} is non-integer, but for simplicity let us stick with the integer case for this discussion). Our methods also allow for one of the divisor function factors to be replaced with a von Mangoldt function, but (in contrast to the previous paper) we cannot treat the case when both factors are von Mangoldt.
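
Just to make the object of study concrete, here is a brute-force numerical evaluation of the correlation sum (1) for small parameters (my own illustrative code, nothing to do with the techniques of the paper):

    # d_k(n) for 1 <= n <= X, built by repeated Dirichlet convolution of d_1 = 1 with itself.
    def divisor_function(k, X):
        d = [0] + [1] * X
        for _ in range(k - 1):
            new = [0] * (X + 1)
            for m in range(1, X + 1):
                for n in range(m, X + 1, m):
                    new[n] += d[m]
            d = new
        return d

    def correlation(k, l, h, X):
        dk = divisor_function(k, X + h)
        dl = divisor_function(l, X + h)
        return sum(dk[n] * dl[n + h] for n in range(1, X + 1))

    print(correlation(3, 2, 1, 10 ** 5))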

As discussed in this previous post, one heuristically expects an asymptotic of the form

\displaystyle  \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O( X^{1/2+\varepsilon})

for any fixed {\varepsilon>0}, where {P_{k,l,h}} is a certain explicit (but rather complicated) polynomial of degree {k+l-1}. Such asymptotics are known when {l \leq 2}, but remain open for {k \geq l \geq 3}. In the previous paper, we were able to obtain a weaker bound of the form

\displaystyle  \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O_A( X \log^{-A} X)

for {1-O_A(\log^{-A} X)} of the shifts {-H \leq h \leq H}, whenever the shift range {H} lies between {X^{8/33+\varepsilon}} and {X^{1-\varepsilon}}. But the methods become increasingly hard to use as {H} gets smaller. In this paper, we use a rather different method to obtain the even weaker bound

\displaystyle  \sum_{n \leq X} d_k(n) d_l(n+h) = (1+o(1)) P_{k,l,h}( \log X ) X

for {1-o(1)} of the shifts {-H \leq h \leq H}, where {H} can now be as short as {H = \log^{10^4 k \log k} X}. The constant {10^4} can be improved, but there are serious obstacles to using our method to go below {\log^{k \log k} X} (as the exceptionally large values of {d_k} then begin to dominate). This can be viewed as an analogue to our previous paper on correlations of bounded multiplicative functions on average, in which the functions {d_k,d_l} are now unbounded, and indeed our proof strategy is based in large part on that paper (but with many significant new technical complications).

We now discuss some of the ingredients of the proof. Unsurprisingly, the first step is the circle method, expressing (1) in terms of exponential sums such as

\displaystyle  S(\alpha) := \sum_{n \leq X} d_k(n) e(\alpha n).

Actually, it is convenient to first prune {d_k} slightly by zeroing out this function on “atypical” numbers {n} that have an unusually small or large number of factors in a certain sense, but let us ignore this technicality for this discussion. The contribution of {S(\alpha)} for “major arc” {\alpha} can be treated by standard techniques (and is the source of the main term {P_{k,l,h}(\log X) X}); the main difficulty comes from treating the contribution of “minor arc” {\alpha}.

In our previous paper on bounded multiplicative functions, we used Plancherel’s theorem to estimate the global {L^2} norm {\int_{{\bf R}/{\bf Z}} |S(\alpha)|^2\ d\alpha}, and then also used the Katai-Bourgain-Sarnak-Ziegler orthogonality criterion to control local {L^2} norms {\int_I |S(\alpha)|^2\ d\alpha}, where {I} was a minor arc interval of length about {1/H}, and these two estimates together were sufficient to get a good bound on correlations by an application of Hölder’s inequality. For {d_k}, it is more convenient to use Dirichlet series methods (and Ramaré-type factorisations of such Dirichlet series) to control local {L^2} norms on minor arcs, in the spirit of the proof of the Matomaki-Radziwill theorem; a key point is to develop “log-free” mean value theorems for Dirichlet series associated to functions such as {d_k}, so as not to wipe out the (rather small) savings one will get over the trivial bound from this method. On the other hand, the global {L^2} bound will definitely be unusable, because the {\ell^2} sum {\sum_{n \leq X} d_k(n)^2} has too many unwanted factors of {\log X}. Fortunately, we can substitute this global {L^2} bound with a “large values” bound that controls expressions such as

\displaystyle  \sum_{i=1}^J \int_{I_i} |S(\alpha)|^2\ d\alpha

for a moderate number of disjoint intervals {I_1,\dots,I_J}, with a bound that is slightly better (for {J} a medium-sized power of {\log X}) than what one would have obtained by bounding each integral {\int_{I_i} |S(\alpha)|^2\ d\alpha} separately. (One needs to save more than {J^{1/2}} for the argument to work; we end up saving a factor of about {J^{3/4}}.) This large values estimate is probably the most novel contribution of the paper. After taking the Fourier transform, matters basically reduce to getting a good estimate for

\displaystyle  \sum_{i=1}^J (\int_X^{2X} |\sum_{x \leq n \leq x+H} d_k(n) e(\alpha_i n)|^2\ dx)^{1/2},

where {\alpha_i} is the midpoint of {I_i}; thus we need some upper bound on the large local Fourier coefficients of {d_k}. These coefficients are difficult to calculate directly, but, in the spirit of a paper of Ben Green and myself, we can try to replace {d_k} by a more tractable and “pseudorandom” majorant {\tilde d_k} for which the local Fourier coefficients are computable (on average). After a standard duality argument, one ends up having to control expressions such as

\displaystyle  |\sum_{x \leq n \leq x+H} \tilde d_k(n) e((\alpha_i -\alpha_{i'}) n)|

after various averaging in the {x, i,i'} parameters. These local Fourier coefficients of {\tilde d_k} turn out to be small on average unless {\alpha_i -\alpha_{i'}} is “major arc”. One then is left with a mostly combinatorial problem of trying to bound how often this major arc scenario occurs. This is very close to a computation in the previously mentioned paper of Ben and myself; there is a technical wrinkle in that the {\alpha_i} are not as well separated as they were in my paper with Ben, but it turns out that one can modify the arguments in that paper to still obtain a satisfactory estimate in this case (after first grouping nearby frequencies {\alpha_i} together, and modifying the duality argument accordingly).

John PreskillPanza’s paradox

I finished reading a translation of Don Quixote this past spring. Miguel de Cervantes wrote the novel during the 1600s. The hero, a Spanish gentleman, believes the tales of chivalry’s golden days. He determines to outdo his heroes as a knight. Don Quixote enlists a peasant, Sancho Panza, to serve as his squire. Bony Don Quixote quotes classical texts; tubby Sancho Panza can’t sign his name. The pair roams the countryside, seeking adventures.

Don Quixote might have sold more copies than any other novel in history. Historians have dubbed Don Quixote “the first modern novel”; “quixotic” appears in English dictionaries; and artists and writers still spin off the story. Don Quixote reverberates throughout the four centuries that have followed it.

Cervantes, I discovered, had grasped a paradox that mathematicians had exposed last century.


Artists continue to spin off Don Quixote.

Don Quixote will vanquish so many villains, the pair expects, rulers will shower gifts on him. Someone will bequeath him a kingdom or an empire. Don Quixote promises to transfer part of his land to Sancho. Sancho expects to govern an ínsula, or island.

Sancho’s expectation amuses a duke and duchess. They pretend to grant Sancho an ínsula as a joke. How would such a simpleton rule? they wonder. They send servants and villagers to Sancho with fabricated problems. Sancho arbitrates the actors’ cases. Grossman translates one case as follows:

the first [case] was an enigma presented to him by a stranger, who said:
“Señor, a very large river divided a lord’s lands into two parts [ . . . ] a bridge crossed this river, and at the end of it was a gallows and a kind of tribunal hall in which there were ordinarily four judges who applied the law set down by the owner of the river, the bridge, and the lands, which was as follows: ‘If anyone crosses this bridge from one side to the other, he must first take an oath as to where he is going and why; and if he swears the truth, let him pass; and if he tells a lie, let him die by hanging on the gallows displayed there, with no chance of pardon.’ Knowing this law and its rigorous conditions, many people crossed the bridge, and then, when it was clear that what they swore was true, the judges let them pass freely. It so happened, then, that a man once took the oath, and he swore and said that because of the oath he was going to die on the gallows, and he swore to nothing else. The judges studied the oath and said: ‘If we allow this man to pass freely, he lied in his oath, and according to the law he must die; and if we hang him, he swore that he was going to die on this gallows, and having sworn the truth, according to the same law he must go free.’ Señor Governor, the question for your grace is what should the judges do with the man.”

Cervantes formulated a paradox that looks, to me, equivalent to Russell’s barber paradox. Bertrand Russell contributed to philosophy during the early 1900s. He concocted an argument called the Russell-Zermelo paradox, which I’ll describe later. An acquaintance tried to encapsulate the paradox as follows: Consider an adult male barber who shaves all men who do not shave themselves. Does the barber shave himself?

Suppose that the barber doesn’t. (Analogously, suppose that the smart aleck in Panza’s paradox doesn’t lie.) The barber is then a man who does not shave himself, and the barber shaves all such men. (The smart aleck swore the truth, so by the law he must go free and survive.) Hence the barber must shave himself. (Hence the smart aleck does not hang, so his oath was a lie.) But we assumed that the barber doesn’t shave himself. (But we assumed that the smart aleck doesn’t lie.) Stalemate.


A barber plays a role in Don Quixote as in the Russell-Zermelo-like paradox. But the former barber has a wash basin that Don Quixote mistakes for a helmet.

Philosophers and mathematicians have debated to what extent the barber paradox illustrates the Russell-Zermelo paradox. Russell formulated the paradox in 1902. The mathematician Ernst Zermelo formulated the paradox around the same time. Mathematicians had just developed the field of set theory. A set is a collection of objects. Examples include the set of positive numbers, the set of polygons, and the set of readers who’ve looked at a Quantum Frontiers post.

Russell and Zermelo described a certain set \mathcal{S} of sets, a certain collection of sets. Let’s label these second-tier sets S_1, S_2, S_3, etc., and write S_j for a generic one.


Each second-tier set S_j can contain elements. The elements can include third-tier sets s^{(j)}_1, s^{(j)}_2, s^{(j)}_3, etc., a generic one being s^{(j)}_k.


But no third-tier set s^{(j)}_k equals the second-tier set S_j. That is, no second-tier set S_j is an element of itself.


Let \mathcal{S} contain every set that does not contain itself. Does the first-tier set \mathcal{S} contain itself?


Suppose that it does: \mathcal{S} = S_j for some j. \mathcal{S} is an element of itself. But, we said, “no second-tier set S_j is an element of itself.” So \mathcal{S} must not be an element of itself. But \mathcal{S} “contain[s] every set that does not contain itself.” So \mathcal{S} must contain itself. But we just concluded that \mathcal{S} doesn’t. Stalemate.

The Stanford Encyclopedia of Philosophy synopsizes the Russell-Zermelo paradox: “the paradox arises within naïve set theory by considering the set of all sets that are not members of themselves. Such a set appears to be a member of itself if and only if it is not a member of itself.”
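One way to feel the force of that synopsis is to let a computer try to answer the question. In the toy encoding below (my own illustration, not standard set theory), a “set” is just a membership test, and the Russell set is the test that asks whether a given set fails its own test; asking whether it contains itself never settles on an answer.

```python
import sys
sys.setrecursionlimit(100)    # fail fast instead of grinding through deep recursion

def russell(s):
    """The 'set of all sets that do not contain themselves', encoded as a membership test."""
    return not s(s)

try:
    russell(russell)          # does the Russell set contain itself?
except RecursionError:
    print("No consistent answer: the question keeps asking itself forever.")
```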

One might resolve the Russell-Zermelo paradox by concluding that no set \mathcal{S} exists. One might resolve the barber paradox by concluding that no such barber exists. How does Sancho resolve what I’ll dub Panza’s paradox?

He initially decrees, “‘let the part of the man that swore the truth pass freely, and hang the part that told a lie.’”1 The petitioner protests: Dividing the smart aleck will kill him. But the law suggests that the smart aleck should live.

Sancho revises his decree:

“since the reasons for condemning him or sparing him are balanced perfectly, they should let him pass freely, for doing good is always more praiseworthy than doing evil, and I’d sign this with my own name if I knew how to write, and in this case I haven’t said my own idea but a precept that came to mind, one of many that was given to me by my master, Don Quixote [ . . . ] when the law is in doubt, I should favor and embrace mercy.”

One can resolve the barber’s paradox by concluding that such a barber cannot exist. Sancho resolves Panza’s paradox by concluding that the landowner’s law cannot govern all bridge-crossings. The law fails to cover what computer scientists would call an “edge case.” An edge case falls outside the jurisdiction of the most-often-used part of a rule. One must specify explicitly how to process edge cases when writing computer programs. Sancho codes the edge case, supplementing the law.
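In code, Sancho’s amendment looks like an explicit branch for the one input the general rule cannot handle. Here is a playful illustrative sketch; the function name and strings are mine, not Cervantes’s.

```python
def bridge_verdict(oath, oath_turned_out_true):
    """The landowner's law, plus Sancho's explicit edge case (a playful sketch)."""
    if oath == "I am going to die on that gallows":
        # No verdict is consistent with the general rule here,
        # so Sancho adds a clause of his own: when the law is in doubt, favor mercy.
        return "let him pass freely"
    return "let him pass" if oath_turned_out_true else "hang him"

print(bridge_verdict("I am going to sell my wheat at the market", True))   # -> let him pass
print(bridge_verdict("I am going to die on that gallows", None))           # -> let him pass freely
```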


Upon starting to read about Sancho’s trial, I sat bolt upright. I ran to my laptop upon finishing. Miguel de Cervantes had intuited, during the 1600s, a paradox not articulated by mathematicians until the 1900s. Surely, the literati had pounced on Cervantes’s foresight?

Mathematics writer Martin Gardner had. I also found two relevant slides in a PowerPoint and three relevant pages in an article. But more critics have classified Panza’s paradox as an incarnation of the liar’s paradox than have invoked Russell.

Scholars have credited Cervantes with anticipating, and initiating, literary waves that have propagated for four centuries. Perhaps we should credit him with anticipating mathematics not formalized for three.

 

1This decree evokes the story of King Solomon and the baby. Two women bring a baby to King Solomon. Each woman claims the baby as hers. “Cut the baby in two,” Solomon rules, “and give half to each woman.” One woman assents. The other cries, “No! Let her have the child; just don’t kill the baby.” The baby, Solomon discerns, belongs to the second woman. A mother would rather see her child in someone else’s home than see her child killed. Sancho, like Solomon, rules that someone be divided in two. But Sancho, lacking Solomon’s wisdom, misapplies Solomon’s tactic.


December 26, 2017

John BaezThe 600-Cell (Part 1)

I can’t stop thinking about the 600-cell:

It’s a ‘Platonic solid in 4 dimensions’ with 600 tetrahedral faces and 120 vertices. One reason I like it is that you can think of these vertices as forming a group: a double cover of the rotational symmetry group of the icosahedron. Another reason is that it’s a halfway house between the icosahedron and the \mathrm{E}_8 lattice. I explained all this in my last post here:

From the icosahedron to E8.

I wrote that post as a spinoff of an article I was writing for the Newsletter of the London Mathematical Society, which had a deadline attached to it. Now I should be writing something else, for another deadline. But somehow deadlines strongly demotivate me—they make me want to do anything else. So I’ve been continuing to think about the 600-cell. I posed some puzzles about it in the comments to my last post, and they led me to some interesting thoughts, which I feel like explaining. But they’re not quite solidified, so right now I just want to give a fairly concrete picture of the 600-cell, or at least its vertices.

This will be a much less demanding post than the last one—and correspondingly less rewarding. Remember the basic idea:

Points in the 3-sphere can be seen as quaternions of norm 1, and these form a group \mathrm{SU}(2) that double covers \mathrm{SO}(3). The vertices of the 600-cell are the points of a subgroup \Gamma \subset \mathrm{SU}(2) that double covers the rotational symmetry group of the icosahedron. This group \Gamma is the famous binary icosahedral group.

Thus, we can name the vertices of the 600-cell by rotations of the icosahedron—as long as we remember to distinguish between a rotation by \theta and a rotation by \theta + 2\pi. Let’s do it!

• 0° (1 of these). We can take the identity rotation as our chosen ‘favorite’ vertex of the 600-cell.

• 72° (12 of these). The nearest neighbors of our chosen vertex correspond to the rotations by the smallest angles that are symmetries of the icosahedron; these correspond to taking any of its 12 vertices and giving it a 1/5 turn clockwise.

• 120° (20 of these). The next nearest neighbors correspond to taking one of the 20 faces of the icosahedron and giving it a 1/3 turn clockwise.

• 144° (12 of these). These correspond to taking one of the vertices of the icosahedron and giving it a 2/5 turn clockwise.

• 180° (30 of these). These correspond to taking one of the edges and giving it a 1/2 turn clockwise. (Note that since we’re working in the double cover \mathrm{SU}(2) rather than \mathrm{SO}(3), giving one edge a half turn clockwise counts as different than giving the opposite edge a half turn clockwise.)

• 216° (12 of these). These correspond to taking one of the vertices of the icosahedron and giving it a 3/5 turn clockwise. (Again, this counts as different than rotating the opposite vertex by a 2/5 turn clockwise.)

• 240° (20 of these). These correspond to taking one of the faces of the icosahedron and giving it a 2/3 turn clockwise. (Again, this counts as different than rotating the opposite face by a 1/3 turn clockwise.)

• 288° (12 of these). These correspond to taking any of the vertices and giving it a 4/5 turn clockwise.

• 360° (1 of these). This corresponds to a full turn in any direction.

Let’s check:

1 + 12 + 20 + 12 + 30 + 12 + 20 + 12 + 1 = 120

Good! We need a total of 120 vertices.
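For readers who want to check this bookkeeping by machine, here is a short Python sketch (mine, not from the post). It builds the 120 unit quaternions of the binary icosahedral group from their standard coordinates, namely (±1,0,0,0) and its coordinate permutations, (±1/2,±1/2,±1/2,±1/2), and the even coordinate permutations of (±Φ,±1,±1/Φ,0)/2, and then counts them by rotation angle, using the fact that a unit quaternion with real part \cos(\theta/2) corresponds to a rotation by \theta.

```python
from itertools import permutations, product
from math import sqrt, acos, degrees
from collections import Counter

PHI = (1 + sqrt(5)) / 2

def even_index_perms(n=4):
    """Index permutations of {0,...,n-1} with an even number of inversions."""
    for p in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        if inv % 2 == 0:
            yield p

def binary_icosahedral_group():
    pts = set()
    rnd = lambda v: tuple(round(x, 9) for x in v)
    # (+-1, 0, 0, 0) with the nonzero coordinate in any position: 8 points
    for i, s in product(range(4), (1.0, -1.0)):
        v = [0.0, 0.0, 0.0, 0.0]
        v[i] = s
        pts.add(rnd(v))
    # (+-1/2, +-1/2, +-1/2, +-1/2): 16 points
    for signs in product((0.5, -0.5), repeat=4):
        pts.add(rnd(signs))
    # even coordinate permutations of (1/2)(+-Phi, +-1, +-1/Phi, 0): 96 points
    for s1, s2, s3 in product((1, -1), repeat=3):
        base = (s1 * PHI / 2, s2 * 0.5, s3 / (2 * PHI), 0.0)
        for p in even_index_perms():
            pts.add(rnd(tuple(base[k] for k in p)))
    return pts

G = binary_icosahedral_group()
print(len(G))                      # expect 120
angles = Counter(round(degrees(2 * acos(max(-1.0, min(1.0, q[0]))))) for q in G)
for angle in sorted(angles):
    print(f"{angle:3d} degrees: {angles[angle]} vertices")
```

The histogram reproduces exactly the nine counts listed above, one for each of the nine slices described next.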

This calculation also shows that if we move a hyperplane through the 3-sphere, which hits our favorite vertex the moment it touches the 3-sphere, it will give the following slices of the 600-cell:

• Slice 1: a point (our favorite vertex),

• Slice 2: an icosahedron (whose vertices are the 12 nearest neighbors of our favorite vertex),

• Slice 3: a dodecahedron (whose vertices are the 20 next-nearest neighbors),

• Slice 4: an icosahedron (the 12 third-nearest neighbors),

• Slice 5: an icosidodecahedron (the 30 fourth-nearest neighbors),

• Slice 6: an icosahedron (the 12 fifth-nearest neighbors),

• Slice 7: a dodecahedron (the 20 sixth-nearest neighbors),

• Slice 8: an icosahedron (the 12 seventh-nearest neighbors),

• Slice 9: a point (the vertex opposite our favorite).

Here’s a picture drawn by J. Gregory Moxness, illustrating this:

Note that there are 9 slices. Each corresponds to a different conjugacy class in the group \Gamma. These in turn correspond to the dots in the extended Dynkin diagram of \mathrm{E}_8, which has the usual 8 dots and one more.

The usual \mathrm{E}_8 Dynkin diagram has ‘legs’ of lengths 5, 2 and 3:

The three legs correspond to conjugacy classes in \Gamma that map to rotational symmetries of an icosahedron that preserve a vertex (5 conjugacy classes), an edge (2 conjugacy classes), and a face (3 conjugacy classes)… not counting the element -1 \in \Gamma. That last element gives the extra dot in the extended Dynkin diagram.

Image credits

You can click on an image to see its source. The shiny ball-and-strut picture of the 600-cell was made by Tom Ruen using Robert Webb’s Stella software and placed on Wikicommons. The picture of slices of the 600-cell was drawn by J. Gregory Moxness and placed on Wikicommons under a Creative Commons Attribution-Share Alike 4.0 International license.

Tommaso DorigoChristmas Homework

Scared by the void of Christmas vacations? Unable to put just a few more feet between your mouth and the candy tray? Suffocating in the trivialities of the chit-chat with relatives? I have a solution for you. How about trying to solve a few simple high-energy physics quizzes? 

I offer three questions below, and you are welcome to think any or all of them over today and tomorrow. In two days I will give my answers, explain the underlying physics a bit, and comment on your own answers, if you have been capable of typing them despite your skyrocketing glycemic index.


December 24, 2017

n-Category Café An M5-Brane Model

When you try to quantize 10-dimensional supergravity theories, you are led to some theories involving strings. These are fairly well understood, because the worldsheet of a string is 2-dimensional, so string theories can be studied using 2-dimensional conformal quantum field theories, which are mathematically tractable.

When you try to quantize 11-dimensional supergravity, you are led to a theory involving 2-branes and 5-branes. People call it M-theory, because while it seems to have magical properties, our understanding of it is still murky — because it involves these higher-dimensional membranes. They have 3- and 6-dimensional worldsheets, respectively. So, precisely formulating M-theory seems to require understanding certain quantum field theories in 3 and 6 dimensions. These are bound to be tougher than 2d quantum field theories… tougher to make mathematically rigorous, for example… but even worse, until recently people didn’t know what either of these theories were!

In 2008, Aharony, Bergman, Jafferis and Maldacena figured out the 3-dimensional theory: it’s a supersymmetric Chern–Simons theory coupled to matter in a way that makes it no longer a topological quantum field theory, but still conformally invariant. It’s now called the ABJM theory. This discovery led to the ‘M2-brane mini-revolution’, as various puzzles about M-theory got solved.

The 6-dimensional theory has been much more elusive. It’s called the (0,2) theory (also written as the (2,0) theory, as in the interview quoted below). It should be a 6-dimensional conformal quantum field theory. But its curious properties got people thinking that it couldn’t arise from any Lagrangian — a serious roadblock, given how physicists normally like to study quantum field theories. But people have continued avidly seeking it, and not just for its role in a potential ‘theory of everything’. Witten and others have shown that if it existed, it would shed new light on Khovanov homology and the geometric Langlands correspondence! The best introduction is here:

In a recent interview with Quanta magazine, Witten called this elusive 6-dimensional theory “the pinnacle”:

Q: I’ve heard about the mysterious (2,0) theory, a quantum field theory describing particles in six dimensions, which is dual to M-theory describing strings and gravity in seven-dimensional AdS space. Does this (2,0) theory play an important role in the web of dualities?

A: Yes, that’s the pinnacle. In terms of conventional quantum field theory without gravity, there is nothing quite like it above six dimensions. From the (2,0) theory’s existence and main properties, you can deduce an incredible amount about what happens in lower dimensions. An awful lot of important dualities in four and fewer dimensions follow from this six-dimensional theory and its properties. However, whereas what we know about quantum field theory is normally from quantizing a classical field theory, there’s no reasonable classical starting point of the (2,0) theory. The (2,0) theory has properties [such as combinations of symmetries] that sound impossible when you first hear about them. So you can ask why dualities exist, but you can also ask why is there a 6-D theory with such and such properties? This seems to me a more fundamental restatement.

Indeed, it sits atop a terrifying network of field theories in various lower dimensions:

Now, maybe, maybe this theory has been found:

As Urs cautiously and wisely wrote:

If this holds water, it will be big.

Here’s the abstract:

Abstract. We present an action for a six-dimensional superconformal field theory containing a non-abelian tensor multiplet. All of the ingredients of this action have been available in the literature. We bring these pieces together by choosing the string Lie 2-algebra as a gauge structure, which we motivated in previous work. The kinematical data contains a connection on a categorified principal bundle, which is the appropriate mathematical description of the parallel transport of self-dual strings. Our action can be written down for each of the simply laced Dynkin diagrams, and each case reduces to a four-dimensional supersymmetric Yang-Mills theory with corresponding gauge Lie algebra. Our action also reduces nicely to an M2-brane model which is a deformation of the ABJM model.

My own interest in this is purely self-centered. I hope this theory holds water — I hope it continues to pass various tests it needs to pass to be the elusive (0,2) theory — because it uses ideas from higher gauge theory, and in particular the string Lie 2-algebra!

This is a ‘categorified Lie algebra’ that one can construct starting from any Lie algebra with an invariant inner product. It was first found (though not under this name) by Alissa Crans, who was then working on her thesis with me:

The idea was published here:

In 2005, together with Danny Stevenson and Urs Schreiber, we connected the string Lie 2-algebra to central extensions of loop groups and the ‘string group’:

  • John C. Baez, Alissa S. Crans, Danny Stevenson, Urs Schreiber, From loop groups to 2-groups, Homology, Homotopy and Applications 9 (2007), 101–135.

though our paper was published only after a struggle. Subsequently Urs worked out a much better understanding of how Lie n-algebras appear in higher gauge theory and string theory. In 2012, together with Domenico Fiorenza and Hisham Sati, he began working out how string Lie 2-algebras are related to the 5-branes in M-theory:

They focused on the 7-dimensional Chern–Simons theory which should be connected to the elusive 6-dimensional (0,2) theory via the AdS/CFT correspondence. The new work by Saemann and Schmidt goes further by making an explicit proposal for the Lagrangian of the 6-dimensional theory.

For more details, read Urs’ blog article on G+.

All this leaves me feeling excited but also bemused. When I started thinking about 2-groups and Lie 2-algebras, I knew right away that they should be related to the parallel transport of 1-dimensional objects — that is, strings. But not being especially interested in string theory, I had no inkling of how, exactly, the Lie 2-algebras that Alissa Crans and I discovered might play a role in that subject. That began becoming clear in our paper with Danny and Urs. But then I left the subject, moving on to questions that better fit my real interests.

From tiny seeds grow great oaks. Should I have stuck with the string Lie 2-algebra and helped it grow? Probably not. But sometimes I wish I had.

December 22, 2017

Mark Chu-CarrollA Beginner’s Guide to Garbage Collection

I was just reading an interesting paper about garbage collection (GC), and realized that I’d never written about it here, so I thought maybe I’d write a couple of articles about it. I’m going to start by talking about the two most basic techniques: mark and sweep collection, and reference counting. In future posts, I’ll move on to talk about various neat things in the world of GC.

So let’s start at the beginning. What is garbage collection?

When you’re writing a program, you need to store values in memory. Most of the time, if you want to do something interesting, you need to be able to work with lots of different values. You read data from your user, and you need to be able to create a place to store it. So (simplifying a bit) you ask your operating system to give you some memory to work with. That’s called memory allocation.

The thing about memory allocation is that the amount of memory that a computer has is finite. If you keep on grabbing more of your computer’s memory, you’re eventually going to run out. So you need to be able to both grab new memory when you need it, and then give it back when you’re done.

In many languages (for example, C or C++), that’s all done manually. You need to write code that says when you want to grab a chunk of memory, and you also need to say when you’re done with it. Your program needs to keep track of how long it needs to use a chunk of memory, and give it back when it’s done. Doing it that way is called manual memory management.

For some programs, manual memory management is the right way to go. On the other hand, it can be very difficult to get right. When you’re building a complicated system with a lot of interacting pieces, keeping track of who’s using a given piece of memory can become extremely complicated. It’s hard to get right – and when you don’t get it right, your program allocates memory and never gives it back – which means that over time, it will be using more and more memory, until there’s none left. That’s called a memory leak. It’s very hard to write a complicated system using manual memory management without memory leaks.

You might reasonably ask, what makes it so hard? You’re taking resources from the system, and using them. Why can’t you just give them back when you’re done with them?

It’s easiest to explain using an example. I’m going to walk through a real-life example from one of my past jobs.

I was working on a piece of software that managed the configuration of services for a cluster management platform. In the system, there were many subsystems that needed to be configured, but we wanted to have one configuration. So we had a piece of configuration that was used to figure out what resources were needed to run a service. There was another piece that was used to figure out where log messages would get stored. There was another piece that specified what was an error that was serious enough to page an engineer. There was another piece that told the system how to figure out which engineer to page. And so on.

We’d process the configuration, and then send pieces of it to the different subsystems that needed them. Often, one subsystem would then need to grab information from the piece of configuration that was the primary responsibility of a different subsystem. For example, when there’s an major error, and you need to page an engineer, we wanted to include a link to the appropriate log in the page. So the pager needed to be able to get access to the logging configuration.

The set of components that worked as part of this configuration system wasn’t fixed. People could add new components as new things got added to the system. Each component would register which section of the configuration it was interested in – but then, when it received its configuration fragment, it could also ask for other pieces of the configuration that it needed.

So, here’s the problem. Any given piece of the configuration could be used by 1, or 2, or 3, or 4, or 20 different components. Which piece of the system should be responsible for knowing when all of the other components are done using it? How can it keep track of that?

That’s the basic problem with manual memory management. It’s easy in easy cases, but in complex systems with overlapping realms of responsibility, where multiple systems are sharing data structures in memory, it’s difficult to build a system where there’s one responsible agent that knows when everyone is done with a piece of memory.

It’s not impossible, but it’s difficult. In a system like the one above, the way that we made it work was by doing a lot of copying of data. Instead of having one copy of a chunk of evaluated configuration which was shared among multiple readers, we’d have many copies of the same thing – one for each component. That worked, but it wasn’t free. We ended up needing to use well over ten times as much memory as we could have using shared data structures. When you’ve got a system that could work with a gigabyte of data, multiplying it by ten is a pretty big deal! Even if you’ve got a massive amount of memory available, making copies of gigabytes of data takes a lot of time!

The most important point here is to understand just how hard it is to get this stuff right. I’ve been a software engineer for a long time, and I’ve worked on a lot of systems. Until the advent of the Rust programming language, I’d never seen a single long-running system built with manual memory management that didn’t have a memory leak somewhere. (I’ll talk more about Rust and how it manages to accomplish this in a later post.)

So manual memory management is very hard to get right, and it can potentially be pretty expensive. On the other hand, it’s predictable: you know, when you write your code, what the costs of managing memory will be, because you wrote the code that does it. And, if you get it right, you can control exactly how much memory your program is using at any time.

The alternative to manual memory management is to somehow make the program automatically keep track of memory, and get rid of it when it’s no longer used. But how do you know when something is no longer used?

It’s pretty easy. You call a chunk of memory live if it can be reached by any variable in the program. If it can’t, it’s garbage, and you can get rid of it. Garbage collection is any mechanism in a programming language or execution environment that automatically figures out when something is garbage, and releases it.

There are two basic methods that we can use to figure out which chunks of memory contain live values, and which are garbage. They’re called reference counting and mark-sweep. (There’s a pool of people who are going to get angry at this definition because, they argue, reference counting isn’t garbage collection. They insist that reference counting is something fundamentally different, and that only mark-sweep is really garbage collection. Obviously I disagree. The definition that I’m using is that anything which automatically releases unused memory is garbage collection.)

In reference counting, each block of memory that gets allocated includes a counter called its reference count. Every time you create a new way of referring to something – by assigning it to a variable, or passing it as a parameter, assigning it to a field of another data structure – you add one to the reference count of the block of memory that contains it. Every time you remove a way of referencing something – by changing a variable, or returning from a function call, or garbage collecting a data structure that referenced it, you decrement its reference count by one. When the reference count for a block of memory hits zero, you can release it.

It’s simple, and it’s predictable. You know that the moment you stop using something, it’s going to get released. That’s great! But there are some problems with reference counting. First, you need to make sure that every single time you change anything, you correctly update the reference counts. Miss any updates, and either things will get released before you’re done with them, or things won’t get released and you’ll leak memory. The other, potentially bigger problem, is that there are a bunch of data structures where simple reference counting doesn’t work. For example, think of a doubly-linked list. That’s a list of values, stored so that each value in the list contains pointers to both the element ahead of it in the list, and to the element behind it in the list. Every element in that list always has at least one thing pointing to it. So none of their reference counts will ever be zero, and no element of the list will ever get collected! (There are ways around that, which we’ll talk about in a later post.)
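To make that concrete, here is a tiny illustrative sketch of a reference-counted heap (a toy of my own, not any real runtime), ending with the doubly-linked-list problem: once the two nodes point at each other, dropping the program’s own references leaves both counts stuck at one.

```python
class Obj:
    """A heap object under toy reference counting."""
    def __init__(self, heap, name):
        self.heap, self.name, self.refcount, self.fields = heap, name, 0, {}
        heap.append(self)

def inc(obj):
    if obj is not None:
        obj.refcount += 1

def dec(obj):
    """Drop one reference; free the object (and its outgoing references) at zero."""
    if obj is None:
        return
    obj.refcount -= 1
    if obj.refcount == 0:
        print(f"freeing {obj.name}")
        obj.heap.remove(obj)
        for child in obj.fields.values():
            dec(child)
        obj.fields.clear()

def set_field(obj, field, value):
    """Assigning a field updates counts on both the old and the new target."""
    old = obj.fields.get(field)
    inc(value)
    obj.fields[field] = value
    dec(old)

heap = []
tmp = Obj(heap, "temp"); inc(tmp); dec(tmp)   # prints "freeing temp": the counting works
a, b = Obj(heap, "a"), Obj(heap, "b")
inc(a); inc(b)                  # two local variables hold a and b
set_field(a, "next", b)         # build a doubly-linked pair:
set_field(b, "prev", a)         # each node now also points back at the other
dec(a); dec(b)                  # the local variables go away...
print([(o.name, o.refcount) for o in heap])   # ...yet both counts are still 1: a leak
```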

The other main garbage collection technique is called mark-sweep. In mark-sweep, you have a two-phase process: in the mark phase, you walk over all of the data structures figuring out what’s still being used, and in the sweep phase, you free up anything that isn’t getting used.

In the marking phase, you start with a set of pointers called the root set. The root set consists of the things that you know are being used: the values of all of the variables in the parts of your program that are running, and anything that’s being referenced by the execution environment.

You create a marking queue, consisting initially of the root set. Then you start to process the queue. For each element in the queue, if it hasn’t been marked yet, you mark it as live, and then you add to the queue everything that it contains a reference to. If it was already marked as live, you just skip over it: it’s done.

Once the mark phase is done, everything that can possibly be referenced by your running program has been marked as live. So now you can sweep: go through the memory space of your program, and release anything that wasn’t marked as live. Boom: you’ve just gotten rid of everything that’s no longer needed.
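And here is an equally toy-sized sketch of the two-phase algorithm just described (again only an illustration, with made-up object names): objects reference each other by name, the mark phase does a breadth-first walk from the root set, and the sweep phase drops whatever was never marked. Note that, unlike the reference-counting toy above, it happily collects the unreachable cycle.

```python
from collections import deque

# A toy heap: each object is just a name mapped to the names it references.
heap = {
    "stack_frame": ["config", "logger"],
    "config":      ["logging_section"],
    "logger":      ["logging_section"],
    "logging_section": [],
    "old_cache":   ["old_entry"],      # nothing reachable points at these two...
    "old_entry":   ["old_cache"],      # ...even though they point at each other
}
root_set = ["stack_frame"]

def mark(heap, root_set):
    """Mark phase: breadth-first walk from the roots, marking everything reachable."""
    marked, queue = set(), deque(root_set)
    while queue:
        name = queue.popleft()
        if name in marked:
            continue                   # already processed
        marked.add(name)
        queue.extend(heap[name])
    return marked

def sweep(heap, marked):
    """Sweep phase: release every object that was never marked."""
    for name in list(heap):
        if name not in marked:
            print("collecting", name)
            del heap[name]

sweep(heap, mark(heap, root_set))
print(sorted(heap))   # the old_cache/old_entry cycle is gone; reachable objects remain
```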

Naive mark-sweep has one really big problem: your program can’t change anything during the mark phase! That means that any time you want to release some unused memory, you need to stop the execution of your program while the garbage collector is going through memory, figuring out what’s still alive.

Personally, I really love working in garbage collected languages. In modern GC systems, the pauses are relatively non-intrusive, and the execution time cost of them is often significantly less than the additional copy-costs of manual memory management. But it’s far from a panacea: it doesn’t even completely prevent memory leaks! (One of the things that surprised me quite a bit earlier in my career was discovering a huge memory leak in a Java program.)

Anyway, that’s the intro to the general ideas. In subsequent posts, I’ll talk about a lot of different things in the area of memory management and garbage collection. I’m mostly going to focus on mark-sweep: reference counting is a very simple idea, and most of the applications of it focus on maintaining that simplicity. But in the world of mark-sweep, there’s a ton of interesting stuff: semispaces (which make the sweep phase of GC faster and more effective), generational garbage collection (which makes the GC system faster, and reduces the number of pauses), incremental collection (which allows the mark phase to be done without stopping the whole program), and various techniques used to implement all of this, like read-barriers, write barriers, and colored pointers.

n-Category Café From the Icosahedron to E8

Here’s a draft of a little thing I’m writing for the Newsletter of the London Mathematical Society. The regular icosahedron is connected to many ‘exceptional objects’ in mathematics, and here I describe two ways of using it to construct \mathrm{E}_8. One uses a subring of the quaternions called the ‘icosians’, while the other uses Du Val’s work on the resolution of Kleinian singularities. I leave it as a challenge to find the connection between these two constructions!

(Dedicated readers of this blog may recall that I was struggling with the second construction in July. David Speyer helped me a lot, but I got distracted by other work and the discussion fizzled. Now I’ve made more progress… but I’ve realized that the details would never fit in the Newsletter, so I’m afraid anyone interested will have to wait a bit longer.)

You can get a PDF version here:

From the icosahedron to E8.

But blogs are more fun.

From the Icosahedron to E8

In mathematics, every sufficiently beautiful object is connected to all others. Many exciting adventures, of various levels of difficulty, can be had by following these connections. Take, for example, the icosahedron — that is, the regular icosahedron, one of the five Platonic solids. Starting from this it is just a hop, skip and a jump to the \mathrm{E}_8 lattice, a wonderful pattern of points in 8 dimensions! As we explore this connection we shall see that it also ties together many other remarkable entities: the golden ratio, the quaternions, the quintic equation, a highly symmetrical 4-dimensional shape called the 600-cell, and a manifold called the Poincaré homology 3-sphere.

Indeed, the main problem with these adventures is knowing where to stop. The story we shall tell is just a snippet of a longer one involving the McKay correspondence and quiver representations. It would be easy to bring in the octonions, exceptional Lie groups, and more. But it can be enjoyed without these digressions, so let us introduce the protagonists without further ado.

The icosahedron has a long history. According to a comment in Euclid’s Elements it was discovered by Plato’s friend Theaetetus, a geometer who lived from roughly 415 to 369 BC. Since Theaetetus is believed to have classified the Platonic solids, he may have found the icosahedron as part of this project. If so, it is one of the earliest mathematical objects discovered as part of a classification theorem. In any event, it was known to Plato: in his Timaeus, he argued that water comes in atoms of this shape.

The icosahedron has 20 triangular faces, 30 edges, and 12 vertices. We can take the vertices to be the four points

(0 , \pm 1 , \pm \Phi)

and all those obtained from these by cyclic permutations of the coordinates, where

\displaystyle{ \Phi = \frac{\sqrt{5} + 1}{2} }

is the golden ratio. Thus, we can group the vertices into three orthogonal golden rectangles: rectangles whose proportions are \Phi to 1.

In fact, there are five ways to do this. The rotational symmetries of the icosahedron permute these five ways, and any nontrivial rotation gives a nontrivial permutation. The rotational symmetry group of the icosahedron is thus a subgroup of \mathrm{S}_5. Moreover, this subgroup has 60 elements. After all, any rotation is determined by what it does to a chosen face of the icosahedron: it can map this face to any of the 20 faces, and it can do so in 3 ways. The rotational symmetry group of the icosahedron is therefore a 60-element subgroup of \mathrm{S}_5. Group theory therefore tells us that it must be the alternating group \mathrm{A}_5.

The \mathrm{E}_8 lattice is harder to visualize than the icosahedron, but still easy to characterize. Take a bunch of equal-sized spheres in 8 dimensions. Get as many of these spheres to touch a single sphere as you possibly can. Then, get as many to touch those spheres as you possibly can, and so on. Unlike in 3 dimensions, where there is “wiggle room”, you have no choice about how to proceed, except for an overall rotation and translation. The balls will inevitably be centered at points of the \mathrm{E}_8 lattice!

We can also characterize the \mathrm{E}_8 lattice as the one giving the densest packing of spheres among all lattices in 8 dimensions. This packing was long suspected to be optimal even among those that do not arise from lattices — but this fact was proved only in 2016, by the young mathematician Maryna Viazovska [V].

We can also describe the \mathrm{E}_8 lattice more explicitly. In suitable coordinates, it consists of vectors for which:

• the components are either all integers or all integers plus \textstyle{\frac{1}{2}}, and

• the components sum to an even number.

This lattice consists of all integral linear combinations of the 8 rows of this matrix:

\left( \begin{array}{rrrrrrrr} 1&-1&0&0&0&0&0&0 \\ 0&1&-1&0&0&0&0&0 \\ 0&0&1&-1&0&0&0&0 \\ 0&0&0&1&-1&0&0&0 \\ 0&0&0&0&1&-1&0&0 \\ 0&0&0&0&0&1&-1&0 \\ 0&0&0&0&0&1&1&0 \\ -\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2} \end{array} \right)

The inner product of any row vector with itself is 2, while the inner product of distinct row vectors is either 0 or -1. Thus, any two of these vectors lie at an angle of either 90° or 120° from each other. If we draw a dot for each vector, and connect two dots by an edge when the angle between their vectors is 120° we get this pattern:

This is called the \mathrm{E}_8 Dynkin diagram. In the first part of our story we shall find the \mathrm{E}_8 lattice hiding in the icosahedron; in the second part, we shall find this diagram. The two parts of this story must be related — but the relation remains mysterious, at least to me.
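As a quick check of the last two claims (my own illustration, not part of the article), one can feed the 8 rows of the matrix above to a few lines of Python, verify that each has inner product 2 with itself and 0 or -1 with the others, and list the pairs at 120°; those pairs trace out exactly the edges of the diagram, with its legs of lengths 5, 2 and 3.

```python
from itertools import combinations

# The 8 generators of the E8 lattice, copied from the matrix above.
rows = [
    [1, -1, 0, 0, 0, 0, 0, 0],
    [0, 1, -1, 0, 0, 0, 0, 0],
    [0, 0, 1, -1, 0, 0, 0, 0],
    [0, 0, 0, 1, -1, 0, 0, 0],
    [0, 0, 0, 0, 1, -1, 0, 0],
    [0, 0, 0, 0, 0, 1, -1, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [-0.5] * 8,
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

assert all(dot(r, r) == 2 for r in rows)     # every generator has inner product 2 with itself
assert all(dot(rows[i], rows[j]) in (0, -1) for i, j in combinations(range(8), 2))
edges = [(i + 1, j + 1) for i, j in combinations(range(8), 2) if dot(rows[i], rows[j]) == -1]
print(edges)   # the pairs at 120 degrees: the edges of the E8 Dynkin diagram
```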

The Icosians

The quickest route from the icosahedron to \mathrm{E}_8 goes through the fourth dimension. The symmetries of the icosahedron can be described using certain quaternions; the integer linear combinations of these form a subring of the quaternions called the ‘icosians’, but the icosians can be reinterpreted as a lattice in 8 dimensions, and this is the \mathrm{E}_8 lattice [CS]. Let us see how this works. The quaternions, discovered by Hamilton, are a 4-dimensional algebra

\displaystyle{ \mathbb{H} = \{a + b i + c j + d k \colon \; a,b,c,d\in \mathbb{R}\} }

with multiplication given as follows:

\displaystyle{ i^2 = j^2 = k^2 = -1, \qquad i j = k = - j i \quad \textrm{and cyclic permutations} }

It is a normed division algebra, meaning that the norm

\displaystyle{ |a + b i + c j + d k| = \sqrt{a^2 + b^2 + c^2 + d^2} }

obeys

|q q'| = |q| |q'|

for all q,q' \in \mathbb{H}. The unit sphere in \mathbb{H} is thus a group, often called \mathrm{SU}(2) because its elements can be identified with 2 \times 2 unitary matrices with determinant 1. This group acts as rotations of 3-dimensional Euclidean space, since we can see any point in \mathbb{R}^3 as a purely imaginary quaternion x = b i + c j + d k, and the quaternion q x q^{-1} is then purely imaginary for any q \in \mathrm{SU}(2). Indeed, this action gives a double cover

\displaystyle{ \alpha \colon \mathrm{SU}(2) \to \mathrm{SO}(3) }

where \mathrm{SO}(3) is the group of rotations of \mathbb{R}^3.

We can thus take any Platonic solid, look at its group of rotational symmetries, get a subgroup of \mathrm{SO}(3), and take its double cover in \mathrm{SU}(2). If we do this starting with the icosahedron, we see that the 60-element group \mathrm{A}_5 \subset \mathrm{SO}(3) is covered by a 120-element group \Gamma \subset \mathrm{SU}(2), called the binary icosahedral group.

The elements of \Gamma are quaternions of norm one, and it turns out that they are the vertices of a 4-dimensional regular polytope: a 4-dimensional cousin of the Platonic solids. It deserves to be called the ‘hypericosahedron’, but it is usually called the 600-cell, since it has 600 tetrahedral faces. Here is the 600-cell projected down to 3 dimensions, drawn using Robert Webb’s Stella software:

Explicitly, if we identify \mathbb{H} with \mathbb{R}^4, the elements of \Gamma are the points

\displaystyle{ (\pm \textstyle{\frac{1}{2}}, \pm \textstyle{\frac{1}{2}}, \pm \textstyle{\frac{1}{2}}, \pm \textstyle{\frac{1}{2}}) }

\displaystyle{ (\pm 1, 0, 0, 0) }

\displaystyle{ \textstyle{\frac{1}{2}} (\pm \Phi, \pm 1 , \pm 1/\Phi, 0 ), }

and those obtained from these by even permutations of the coordinates. Since these points are closed under multiplication, if we take integral linear combinations of them we get a subring of the quaternions:

\displaystyle{ \mathbb{I} = \{ \sum_{q \in \Gamma} a_q q : \; a_q \in \mathbb{Z} \} \subset \mathbb{H} . }

Conway and Sloane [CS] call this the ring of icosians. The icosians are not a lattice in the quaternions: they are dense. However, any icosian is of the form a + bi + cj + dk where a, b, c, and d live in the golden field

\displaystyle{ \mathbb{Q}(\sqrt{5}) = \{ x + \sqrt{5} y : \; x,y \in \mathbb{Q}\} }

Thus we can think of an icosian as an 8-tuple of rational numbers. Such 8-tuples form a lattice in 8 dimensions.

In fact we can put a norm on the icosians as follows. For q \in \mathbb{I} the usual quaternionic norm has

\displaystyle{ |q|^2 = x + \sqrt{5} y }

for some rational numbers x and y, but we can define a new norm on \mathbb{I} by setting

\displaystyle{ \|q\|^2 = x + y }

With respect to this new norm, the icosians form a lattice that fits isometrically in 8-dimensional Euclidean space. And this is none other than \mathrm{E}_8!
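Here is a small illustrative sketch of that last step (my own, not from the article): it does exact arithmetic in the golden field and computes both norms for a few icosians, namely 1, the element \textstyle{\frac{1}{2}}(\Phi + i + j/\Phi) of \Gamma, and 1 + i.

```python
from fractions import Fraction as Fr

class Golden:
    """Exact arithmetic with numbers x + y*sqrt(5), where x and y are rational."""
    def __init__(self, x, y=0):
        self.x, self.y = Fr(x), Fr(y)
    def __add__(self, other):
        return Golden(self.x + other.x, self.y + other.y)
    def __mul__(self, other):
        return Golden(self.x * other.x + 5 * self.y * other.y,
                      self.x * other.y + self.y * other.x)
    def __repr__(self):
        return f"{self.x} + {self.y}*sqrt(5)"

PHI     = Golden(Fr(1, 2), Fr(1, 2))    # the golden ratio (1 + sqrt(5))/2
INV_PHI = Golden(Fr(-1, 2), Fr(1, 2))   # 1/PHI = PHI - 1
HALF, ZERO, ONE = Golden(Fr(1, 2)), Golden(0), Golden(1)

def quaternion_norm_sq(q):
    """|q|^2 = a^2 + b^2 + c^2 + d^2, as an element x + y*sqrt(5) of the golden field."""
    total = ZERO
    for c in q:
        total = total + c * c
    return total

def new_norm_sq(q):
    """The norm ||q||^2 = x + y used to turn the icosians into an 8-dimensional lattice."""
    n = quaternion_norm_sq(q)
    return n.x + n.y

examples = {
    "1":                   (ONE, ZERO, ZERO, ZERO),
    "(Phi + i + j/Phi)/2": (HALF * PHI, HALF, HALF * INV_PHI, ZERO),   # an element of Gamma
    "1 + i":               (ONE, ONE, ZERO, ZERO),
}
for name, q in examples.items():
    print(f"{name:>20}:  |q|^2 = {quaternion_norm_sq(q)},  ||q||^2 = {new_norm_sq(q)}")
```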

Klein’s Icosahedral Function

Not only is the \mathrm{E}_8 lattice hiding in the icosahedron; so is the \mathrm{E}_8 Dynkin diagram. The space of all regular icosahedra of arbitrary size centered at the origin has a singularity, which corresponds to a degenerate special case: the icosahedron of zero size. If we resolve this singularity in a minimal way we get eight Riemann spheres, intersecting in a pattern described by the \mathrm{E}_8 Dynkin diagram!

This remarkable story starts around 1884 with Felix Klein’s Lectures on the Icosahedron [Kl]. In this work he inscribed an icosahedron in the Riemann sphere, \mathbb{C}\mathrm{P}^1. He thus got the icosahedron’s symmetry group, \mathrm{A}_5, to act as conformal transformations of \mathbb{C}\mathrm{P}^1 — indeed, rotations. He then found a rational function of one complex variable that is invariant under all these transformations. This function equals 0 at the centers of the icosahedron’s faces, 1 at the midpoints of its edges, and \infty at its vertices.

Here is Klein’s icosahedral function as drawn by Abdelaziz Nait Merzouk. The color shows its phase, while the contour lines show its magnitude:

We can think of Klein’s icosahedral function as a branched cover of the Riemann sphere by itself with 60 sheets:

\displaystyle{ \mathcal{I} \colon \mathbb{C}\mathrm{P}^1 \to \mathbb{C}\mathrm{P}^1 . }

Indeed, \mathrm{A}_5 acts on \mathbb{C}\mathrm{P}^1, and the quotient space \mathbb{C}\mathrm{P}^1/\mathrm{A}_5 is isomorphic to \mathbb{C}\mathrm{P}^1 again. The function \mathcal{I} gives an explicit formula for the quotient map \mathbb{C}\mathrm{P}^1 \to \mathbb{C}\mathrm{P}^1/\mathrm{A}_5 \cong \mathbb{C}\mathrm{P}^1.

Klein managed to reduce solving the quintic to the problem of solving the equation \mathcal{I}(z) = w for z. A modern exposition of this result is Shurman’s Geometry of the Quintic [Sh]. For a more high-powered approach, see the paper by Nash [N]. Unfortunately, neither of these treatments avoids complicated calculations. But our interest in Klein’s icosahedral function here does not come from its connection to the quintic: instead, we want to see its connection to \mathrm{E}_8.

For this we should actually construct Klein’s icosahedral function. To do this, recall that the Riemann sphere \mathbb{C}\mathrm{P}^1 is the space of 1-dimensional linear subspaces of \mathbb{C}^2. Let us work directly with \mathbb{C}^2. While \mathrm{SO}(3) acts on \mathbb{C}\mathrm{P}^1, this comes from an action of this group’s double cover \mathrm{SU}(2) on \mathbb{C}^2. As we have seen, the rotational symmetry group of the icosahedron, \mathrm{A}_5 \subset \mathrm{SO}(3), is double covered by the binary icosahedral group \Gamma \subset \mathrm{SU}(2). To build an \mathrm{A}_5-invariant rational function on \mathbb{C}\mathrm{P}^1, we should thus look for \Gamma-invariant homogeneous polynomials on \mathbb{C}^2.

It is easy to construct three such polynomials:

V, of degree 12, vanishing on the 1d subspaces corresponding to icosahedron vertices.

E, of degree 30, vanishing on the 1d subspaces corresponding to icosahedron edge midpoints.

F, of degree 20, vanishing on the 1d subspaces corresponding to icosahedron face centers.

Remember, we have embedded the icosahedron in \mathbb{C}\mathrm{P}^1, and each point in \mathbb{C}\mathrm{P}^1 is a 1-dimensional subspace of \mathbb{C}^2, so each icosahedron vertex determines such a subspace, and there is a linear function on \mathbb{C}^2, unique up to a constant factor, that vanishes on this subspace. The icosahedron has 12 vertices, so we get 12 linear functions this way. Multiplying them gives V, a homogeneous polynomial of degree 12 on \mathbb{C}^2 that vanishes on all the subspaces corresponding to icosahedron vertices! The same trick gives E, which has degree 30 because the icosahedron has 30 edges, and F, which has degree 20 because the icosahedron has 20 faces.

A bit of work is required to check that V, E and F are invariant under \Gamma, instead of changing by constant factors under group transformations. Indeed, if we had copied this construction using a tetrahedron or octahedron, this would not be the case. For details, see Shurman’s book [Sh], which is free online, or van Hoboken’s nice thesis [VH].

Since both F^3 and V^5 have degree 60, F^3/V^5 is homogeneous of degree zero, so it defines a rational function \mathcal{I} \colon \mathbb{C}\mathrm{P}^1 \to \mathbb{C}\mathrm{P}^1. This function is invariant under \mathrm{A}_5 because F and V are invariant under \Gamma. Since F vanishes at face centers of the icosahedron while V vanishes at vertices, \mathcal{I} = F^3/V^5 equals 0 at face centers and \infty at vertices. Finally, thanks to its invariance property, \mathcal{I} takes the same value at every edge center, so we can normalize V or F to make this value 1. Thus, \mathcal{I} has precisely the properties required of Klein’s icosahedral function!

The Appearance of E8

Now comes the really interesting part. Three polynomials on a 2-dimensional space must obey a relation, and V, E, and F obey a very pretty one, at least after we normalize them correctly:

\displaystyle{ V^5 + E^2 + F^3 = 0. }

We could guess this relation simply by noting that each term must have the same degree. Every \Gamma-invariant polynomial on \mathbb{C}^2 is a polynomial in V, E and F, and indeed

\displaystyle{ \mathbb{C}^2 / \Gamma \cong \{ (V,E,F) \in \mathbb{C}^3 \colon \; V^5 + E^2 + F^3 = 0 \} . }

This complex surface is smooth except at V = E = F = 0, where it has a singularity. And hiding in this singularity is \mathrm{E}_8!

To see this, we need to ‘resolve’ the singularity. Roughly, this means that we find a smooth complex surface S and an onto map

\displaystyle{ \pi \colon S \to \mathbb{C}^2/\Gamma }

that is one-to-one away from the singularity. (More precisely, if X is an algebraic variety with singular points X_{\mathrm{sing}} \subset X, then \pi \colon S \to X is a resolution of X if S is smooth, \pi is proper, \pi^{-1}(X - X_{\mathrm{sing}}) is dense in S, and \pi is an isomorphism between \pi^{-1}(X - X_{\mathrm{sing}}) and X - X_{\mathrm{sing}}. For more details see Lamotke’s book [L].)

There are many such resolutions, but there is a minimal one, meaning that all others factor uniquely through it.

What sits above the singularity in this minimal resolution? Eight copies of the Riemann sphere \mathbb{C}\mathrm{P}^1, one for each dot here:

Two of these \mathbb{C}\mathrm{P}^1s intersect in a point if their dots are connected by an edge: otherwise they are disjoint.

This amazing fact was discovered by Patrick Du Val in 1934 [DV]. Why is it true? Alas, there is not enough room in the margin, or even in the entire blog article, to explain this. The books by Kirillov [Ki] and Lamotke [L] fill in the details. But here is a clue. The \mathrm{E}_8 Dynkin diagram has ‘legs’ of lengths 5, 2 and 3:

On the other hand,

\displaystyle{ \mathrm{A}_5 \cong \langle v, e, f | v^5 = e^2 = f^3 = v e f = 1 \rangle }

where in terms of the rotational symmetries of the icosahedron:

v is a 1/5 turn around some vertex of the icosahedron,

e is a 1/2 turn around the center of an edge touching that vertex,

f is a 1/3 turn around the center of a face touching that vertex,

and we must choose the sense of these rotations correctly to obtain v e f = 1. To get a presentation of the binary icosahedral group we drop one relation:

\displaystyle{ \Gamma \cong \langle v, e, f | v^5 = e^2 = f^3 = v e f \rangle }

The dots in the \mathrm{E}_8 Dynkin diagram correspond naturally to conjugacy classes in \Gamma, not counting the conjugacy class of the central element -1. Each of these conjugacy classes, in turn, gives a copy of \mathbb{C}\mathrm{P}^1 in the minimal resolution of \mathbb{C}^2/\Gamma.

Not only the \mathrm{E}_8 Dynkin diagram, but also the \mathrm{E}_8 lattice, can be found in the minimal resolution of \mathbb{C}^2/\Gamma. Topologically, this space is a 4-dimensional manifold. Its real second homology group is an 8-dimensional vector space with an inner product given by the intersection pairing. The integral second homology is a lattice in this vector space spanned by the 8 copies of \mathbb{C}\mathrm{P}^1 we have just seen—and it is a copy of the \mathrm{E}_8 lattice [KS].

But let us turn to a more basic question: what is \mathbb{C}^2/\Gamma like as a topological space? To tackle this, first note that we can identify a pair of complex numbers with a single quaternion, and this gives a homeomorphism

\mathbb{C}^2/\Gamma \cong \mathbb{H}/\Gamma

where we let \Gamma act by right multiplication on \mathbb{H}. So, it suffices to understand \mathbb{H}/\Gamma.

Next, note that sitting inside \mathbb{H}/\Gamma are the points coming from the unit sphere in \mathbb{H}. These points form the 3-dimensional manifold \mathrm{SU}(2)/\Gamma, which is called the Poincaré homology 3-sphere [KS]. This is a wonderful thing in its own right: Poincaré discovered it as a counterexample to his guess that any compact 3-manifold with the same homology as a 3-sphere is actually diffeomorphic to the 3-sphere, and it is deeply connected to \mathrm{E}_8. But for our purposes, what matters is that we can think of this manifold in another way, since we have a diffeomorphism

\mathrm{SU}(2)/\Gamma \cong \mathrm{SO}(3)/\mathrm{A}_5.

The latter is just the space of all icosahedra inscribed in the unit sphere in 3d space, where we count two as the same if they differ by a rotational symmetry.

This is a nice description of the points of \mathbb{H}/\Gamma coming from points in the unit sphere of \mathbb{H}. But every quaternion lies in some sphere centered at the origin of \mathbb{H}, of possibly zero radius. It follows that \mathbb{C}^2/\Gamma \cong \mathbb{H}/\Gamma is the space of all icosahedra centered at the origin of 3d space — of arbitrary size, including a degenerate icosahedron of zero size. This degenerate icosahedron is the singular point in \mathbb{C}^2/\Gamma. This is where \mathrm{E}_8 is hiding.

Clearly much has been left unexplained in this brief account. Most of the missing details can be found in the references. But it remains unknown — at least to me — how the two constructions of \mathrm{E}_8 from the icosahedron fit together in a unified picture.

Recall what we did. First we took the binary icosahedral group \Gamma \subset \mathbb{H}, took integer linear combinations of its elements, thought of these as forming a lattice in an 8-dimensional rational vector space with a natural norm, and discovered that this lattice is a copy of the \mathrm{E}_8 lattice. Then we took \mathbb{C}^2/\Gamma \cong \mathbb{H}/\Gamma, took its minimal resolution, and found that the integral 2nd homology of this space, equipped with its natural inner product, is a copy of the \mathrm{E}_8 lattice. From the same ingredients we built the same lattice in two very different ways! How are these constructions connected? This puzzle deserves a nice solution.

Acknowledgements

I thank Tong Yang for inviting me to speak on this topic at the Annual General Meeting of the Hong Kong Mathematical Society on May 20, 2017, and Guowu Meng for hosting me at the HKUST while I prepared that talk. I also thank the many people, too numerous to accurately list, who have helped me understand these topics over the years.

Bibliography

[CS] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, Springer, Berlin, 2013.

[DV] P. du Val, On isolated singularities of surfaces which do not affect the conditions of adjunction, I, II and III, Proc. Camb. Phil. Soc. 30 (1934), 453–459, 460–465, 483–491.

[KS] R. Kirby and M. Scharlemann, Eight faces of the Poincaré homology 3-sphere, Usp. Mat. Nauk. 37 (1982), 139–159. Available at https://tinyurl.com/ybrn4pjq.

[Ki] A. Kirillov, Quiver Representations and Quiver Varieties, AMS, Providence, Rhode Island, 2016.

[Kl] F. Klein, Lectures on the Ikosahedron and the Solution of Equations of the Fifth Degree, Trübner & Co., London, 1888. Available at https://archive.org/details/cu31924059413439.

[L] K. Lamotke, Regular Solids and Isolated Singularities, Vieweg & Sohn, Braunschweig, 1986.

[N] O. Nash, On Klein’s icosahedral solution of the quintic. Available as arXiv:1308.0955.

[Sh] J. Shurman, Geometry of the Quintic, Wiley, New York, 1997. Available at http://people.reed.edu/~jerry/Quintic/quintic.html.

[Sl] P. Slodowy, Platonic solids, Kleinian singularities, and Lie groups, in Algebraic Geometry, Lecture Notes in Mathematics 1008, Springer, Berlin, 1983, pp. 102–138.

[VH] J. van Hoboken, Platonic Solids, Binary Polyhedral Groups, Kleinian Singularities and Lie Algebras of Type A, D, E, Master’s Thesis, University of Amsterdam, 2002. Available at http://math.ucr.edu/home/baez/joris_van_hoboken_platonic.pdf.

[V] M. Viazovska, The sphere packing problem in dimension 8, Ann. Math. 185 (2017), 991–1015. Available at https://arxiv.org/abs/1603.04246.

n-Category Café Arithmetic Gauge Theory

Around 2008-9 we had several exchanges with Minhyong Kim here at the Café on his views of approaching number theory from a homotopic perspective, in particular in the post Kim on Fundamental Groups in Number Theory. (See also the threads Afternoon Fishing and The Elusive Proteus.)

I even recall proposing a polymath project based on his ideas in Galois Theory in Two Variables. Something physics-like was in the air, and this seemed a good location with two mathematical physicists as hosts, John having extensively written on number theory in This Week’s Finds.

Nothing came of that, but it’s interesting to see Minhyong is very much in the news these days, including in a popular article in Quanta magazine, Secret Link Uncovered Between Pure Math and Physics.

The Quanta article has Minhyong saying:

“I was hiding it because for many years I was somewhat embarrassed by the physics connection,” he said. “Number theorists are a pretty tough-minded group of people, and influences from physics sometimes make them more skeptical of the mathematics.”

Café readers had an earlier alert from an interview I conducted with Minhyong, reported in Minhyong Kim in The Reasoner. There he was prepared to announce

The work that occupies me most right now, arithmetic homotopy theory, concerns itself very much with arithmetic moduli spaces that are similar in nature and construction to moduli spaces of solutions to the Yang-Mills equation.

Now his articles are appearing bearing explicit names such as ‘Arithmetic Chern-Simons theory’ (I and II), and today, we have Arithmetic Gauge Theory: A Brief Introduction.

What’s moved on in the intervening years from our side (‘our’ largely in the sense of the nLab) is an approach (very much due to Urs) to gauge field theory which looks to extract its abstract essence, and even to express this in the language of cohesive homotopy type theory, see nLab: geometry of physics. What I would love to know is how best to think of the deepest level of commonality between constructions deserving of the name ‘gauge theory’.

On the apparently non-physics side, who knows what depths might be reached if topological Langlands is ever worked out in stable homotopy theory, there being a gauge theoretic connection to geometric Langlands and even to the arithmetic version, as Minhyong remarks in his latest article:

We note also that the Langlands reciprocity conjecture … has as its goal the rewriting of arithmetic L-functions quite generally in terms of automorphic L-functions… it seems reasonable to expect the geometry of arithmetic gauge fields to play a key role in importing quantum field theoretic dualities to arithmetic geometry.

Perhaps the deepest idea would have to reflect the lack of uniformity in arithmetic. Minhyong writes in his latest paper about the action of G_K = Gal(\bar K, K)

The G_K-action is usually highly non-trivial, and this is a main difference from geometric gauge theory, where the gauge group tends to be constant over spacetime.

Even if orbifolds and singularities appear in the latter, maybe there’s still a difference. From a dilettantish wish to make sense of Buium and Borger’s nLab: arithmetic jet spaces, I had hoped that the geometric jet space constructions as beautifully captured by the nlab: jet comonad, might help. But arithmetic always seems a little obstructive, and one can’t quite reach the adjoint quadruples of the cohesive world: nlab: Borger’s absolute geometry. James Borger explained this to me as follows:

the usual, differential jet space of X can be constructed by gluing together jet spaces on open subsets, essentially because an infinitesimal arc can never leave an open subset. However, the analogous thing is not true for arithmetic jet spaces, because a Frobenius lift can jump outside an open set. So you can’t construct them by gluing local jet spaces together!

So plenty to explore. Where Minhyong speaks of arithmetic Euler-Lagrange equations, how does this compare with the jet comonadic version of Urs, outlined in Higher Prequantum Geometry II: The Principle of Extremal Action - Comonadically?

December 20, 2017

Tommaso DorigoDiboson Resonances - Part 2

In the previous post I discussed the generalities of "diboson production" at the LHC. Dibosons are pairs of elementary bosons - the photon (carrier of electromagnetic interactions), the W and Z bosons (carriers of the weak interaction, respectively charged and neutral), the gluon (carrier of the strong interaction, and coming in 8 indistinguishable varieties), and the Higgs particle.

read more

December 19, 2017

Doug NatelsonThe state of science - hyperbole doesn't help.

It seems like every few weeks these days there is a breathless essay or editorial saying science is broken, or that science as a whole is in the midst of a terrible crisis, or that science is both broken and in the midst of a terrible crisis.  These articles do have a point, and I'm not trying to trivialize anything they say, but come on - get a grip.  Science, and its cousin engineering, have literally reshaped society in the last couple of hundred years.  We live in an age of miracles so ubiquitous we don't notice how miraculous they are.  More people (in absolute numbers and as a percentage of the population) are involved in some flavor of science or engineering than ever before.

That does mean that yes, there will be more problems in absolute numbers than before, too, because the practice of science and engineering is a human endeavor.  Like anything else done by humans, that means there will be a broad spectrum of personalities involved, that not everyone will agree with interpretations or ideas, that some people will make mistakes, and that occasionally some objectionable people will behave unethically.   Decisions will be made and incentives set up that may have unintended consequences (e.g., trying to boost Chinese science by rewarding high impact papers leads to a perverse incentive to cheat.).   This does not imply that the entire practice of science is hopelessly flawed and riddled with rot, any more than a nonzero malpractice rate implies that all of medicine is a disaster.

Why is there such a sense of unease right now about the state of science and the research enterprise?  I'm not a sociologist, but here's my take.

Spreading information, good and bad, can happen more readily than ever before.  People look at sites like pubpeer and come away with the impression that the sky is falling, when in fact we should be happy that there now, for the first time ever, exists a venue for pointing out potential problems.  We are now able to learn about flawed studies and misconduct far more effectively than even twenty years ago, and that changes perceptions.  This seems to be similar to the disconnect between perception of crime rates and actual crime rates.

Science is, in fact, often difficult.  People can be working with complex systems, perhaps more complicated than their models assume.   This means that sometimes there can be good (that is, legitimate) reasons why reproducing someone's results can be difficult.  Correlation doesn't equal causation; biological and social phenomena can be incredibly complex, with many underlying degrees of freedom and often only a few quantifiable parameters.  In the physical sciences we often look askance at those fields and think that we are much better, but laboratory science in physics and chemistry can be genuinely challenging.  (An example from my own career:  We were working with a collaborator whose postdoc was making some very interesting nanoparticles, and we saw exciting results with them, including features that coincided with a known property of the target material.  The postdoc went on to a faculty position and the synthesis got taken over by a senior grad student.  Even following very clear directions, it took over 6 months before the grad student's particles had the target composition and we reproduced the original results, because of some incredibly subtle issue with the synthesis procedure that had changed unintentionally and "shouldn't" have mattered.)

Hyperbolic self-promotion and reporting are bad.   Not everything is a breakthrough of cosmic significance, not every advance is transformative, and that's ok.  Acting otherwise sets scientists and engineers up for a public backlash from years of overpromising and underdelivering.   The public ends up with the perception that scientists and engineers are hucksters.  Just as bad, the public ends up with the idea that "science" is just as valid a way of looking at the world as astrology, despite the fact that science and engineering have actually resulted in technological society.  Even worse, in the US it is becoming very difficult to disentangle science from politics, again despite the fact that one is (at least in principle) a way of looking at the world and trying to determine what the rules are, while the other can be driven entirely by ideology.  This discussion of permissible vocabulary is indicative of a far graver threat to science as a means of learning about the universe than actual structural problems with science itself.  Philosophical definitions aside and practical ones to the fore, facts are real, and have meaning, and science is a way of constraining what those facts are.

We can and should do better.  Better at being rigorous, better at making sure our conclusions are justified and knowing their limits of validity, better at explaining ourselves to each other and the public, better at policing ourselves when people transgress in their scientific ethics or code of conduct.

None of these issues, however, imply that science itself as a whole is hopelessly flawed or broken, and I am concerned that by repeatedly stating that science is broken, we are giving aid and comfort to those who don't understand it and feel threatened by it.


December 18, 2017

Secret Blogging SeminarA grad student fellowship

My CAREER grant includes funding for a thesis-writing fellowship for graduate students who have done extraordinary teaching and outreach during their time as a grad student.  If you know any such grad students who are planning on graduating in the 2018-2019 school year please encourage them to apply.  Deadline is Jan 31 and details here.

While I’m shamelessly plugging stuff for early career people, Dave Penneys, Julia Plavnik, and I are running an MRC this summer in Quantum Symmetry.  It’s aimed at people at -2 to +5 years from Ph.D. working in tensor categories, subfactors, topological phases of matter, and related fields, and the deadline is Feb 15.

 


December 15, 2017

Andrew JaffeWMAP Breaks Through

It was announced this morning that the WMAP team has won the $3 million Breakthrough Prize. Unlike the Nobel Prize, which infamously is only awarded to three people each year, the Breakthrough Prize was awarded to the whole 27-member WMAP team, led by Chuck Bennett, Gary Hinshaw, Norm Jarosik, Lyman Page, and David Spergel, but including everyone through postdocs and grad students who worked on the project. This is great, and I am happy to send my hearty congratulations to all of them (many of whom I know well and am lucky to count as friends).

I actually knew about the prize last week as I was interviewed by Nature for an article about it. Luckily I didn’t have to keep the secret for long. Although I admit to a little envy, it’s hard to argue that the prize wasn’t deserved. WMAP was ideally placed to solidify the current standard model of cosmology, a Universe dominated by dark matter and dark energy, with strong indications that there was a period of cosmological inflation at very early times, which had several important observational consequences. First, it made the geometry of the Universe — as described by Einstein’s theory of general relativity, which links the contents of the Universe with its shape — flat. Second, it generated the tiny initial seeds which eventually grew into the galaxies that we observe in the Universe today (and the stars and planets within them, of course).

By the time WMAP released its first results in 2003, a series of earlier experiments (including MAXIMA and BOOMERanG, which I had the privilege of being part of) had gone much of the way toward this standard model. Indeed, about ten years ago one of my Imperial colleagues, Carlo Contaldi, and I wanted to make that comparison explicit, so we used what were then considered fancy Bayesian sampling techniques to combine the data from balloons and ground-based telescopes (which are collectively known as “sub-orbital” experiments) and compare the results to WMAP. We got a plot like the following (which we never published), showing the main quantity that these CMB experiments measure, called the power spectrum (which I’ve discussed in a little more detail here). The horizontal axis corresponds to the size of structures in the map (actually, its inverse, so smaller is to the right) and the vertical axis to how large the signal is on those scales.

Grand unified spectrum

As you can see, the suborbital experiments, en masse, had data at least as good as WMAP on most scales except the very largest (leftmost; this is because you really do need a satellite to see the entire sky) and indeed were able to probe smaller scales than WMAP (to the right). Since then, I’ve had the further privilege of being part of the Planck Satellite team, whose work has superseded all of these, giving much more precise measurements over all of these scales.

Am I jealous? Ok, a little bit.

But it’s also true, perhaps for entirely sociological reasons, that the community is more apt to trust results from a single, monolithic, very expensive satellite than an ensemble of results from a heterogeneous set of balloons and telescopes, run on (comparative!) shoestrings. On the other hand, the overall agreement amongst those experiments, and between them and WMAP, is remarkable.

And that agreement remains remarkable, even if much of the effort of the cosmology community is devoted to understanding the small but significant differences that remain, especially between one monolithic and expensive satellite (WMAP) and another (Planck). Indeed, those “real and serious” (to quote myself) differences would be hard to see even if I plotted them on the same graph. But since both are ostensibly measuring exactly the same thing (the CMB sky), any differences — even those much smaller than the error bars — must be accounted for, and almost certainly boil down to differences in the analyses or misunderstanding of each team’s own data. Somewhat more interesting are differences between CMB results and measurements of cosmology from other, very different, methods, but that’s a story for another day.

Tommaso DorigoDiboson Resonances: A Whole New World - Part 1

After one quite frantic November, I emerged victorious two weeks ago from the delivery of a 78-page, 49-thousand-word review titled "Hadron Collider Searches for Diboson Resonances". The article, which will be published in the prestigious "Progress in Particle and Nuclear Physics", an Elsevier journal with an impact factor above 11 (compare with Physics Letters B, IF=4.8, or Physical Review Letters, IF=8.5, to see why it's relevant), is currently in peer review, but that does not mean that I cannot make a short summary of its contents here.

read more

December 14, 2017

Robert HellingWhat are the odds?

It's the time of year, you give out special problems in your classes. So this is mine for the blog. It is motivated by this picture of the home secretaries of the German federal states after their annual meeting as well as some recent discussions on Facebook:
I would like to call it Summers' problem:

Let's have two real random variables $M$ and $F$ that are drawn according to two probability distributions $\rho_{M/F}(x)$ (for starters you may assume both to be Gaussians, but possibly with different means and variances). Take $N$ draws from each and order the $2N$ results. What is the probability that the $k$ largest ones are all from $M$ rather than $F$? Express your results in terms of the $\rho_{M/F}(x)$. We are also interested in asymptotic results for $N$ large and $k$ fixed, as well as $N$ and $k$ large but $k/N$ fixed.

Last bonus question: How many of the people who say that they hire only based on merit and end up with an all-male board realise that by this they are saying that women are not as good, by quite a margin?
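For the impatient, here is a minimal Monte Carlo sketch of the finite-$N$ problem under the Gaussian assumption (my own, not part of the assignment). It uses the observation that the $k$ largest of the $2N$ combined draws are all from $M$ exactly when the $k$-th largest $M$-draw exceeds the largest $F$-draw; for identical continuous distributions the answer is the purely combinatorial $\binom{N}{k}/\binom{2N}{k}$, which makes a handy check.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_top_k_all_from_M(N, k, mu_M=0.0, sd_M=1.0, mu_F=0.0, sd_F=1.0, trials=200_000):
    """Estimate P(the k largest of the 2N combined draws all come from M)."""
    M = rng.normal(mu_M, sd_M, size=(trials, N))
    F = rng.normal(mu_F, sd_F, size=(trials, N))
    kth_largest_M = np.sort(M, axis=1)[:, -k]      # k-th largest of the M draws
    return np.mean(kth_largest_M > F.max(axis=1))  # ...must beat every F draw

# Identical distributions: the exact answer is C(N,k)/C(2N,k).
print(p_top_k_all_from_M(N=16, k=5))               # ~ 4368/201376 ~ 0.022
# Shift M up by half a standard deviation and the odds change dramatically.
print(p_top_k_all_from_M(N=16, k=5, mu_M=0.5))
```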

Secret Blogging SeminarFighting the grad student tax

I’m throwing this post up quickly, because time is of the essence. I had hoped someone else would do the work. If they did, please link them in the comments.

As many of you know, the US House and Senate have passed revisions to the tax code. According to the House, but not the Senate draft, graduate tuition remissions are taxed as income. Thus, here at U Michigan, our graduate stipend is 19K and our tuition is 12K. If the House version takes effect, our students would be billed as if receiving 31K without getting a penny more to pay it with.

It is thus crucial which version of the bill goes forward. The first meeting of the reconciliation committee is TONIGHT, at 6 PM Eastern Time. Please contact your congress people. You can look up their contact information here. Even if they are clearly on the right side of this issue, they need to be able to report how many calls they have gotten about it when arguing with their colleagues. Remember — be polite, make it clear what you are asking for, and make it clear that you live and vote in their district. If you work at a large public university in their district, you may want to point out the effect this will have on that university.

I’ll try to look up information about congress people who are specifically on the committee or otherwise particularly vulnerable. Jordan Ellenberg wrote a “friends only” facebook post relevant to this, which I encourage him to repost publicly, on his blog or in the comments here.

UPDATE According to the Wall Street Journal, the grad tax is out. Ordinarily, I thank my congress people when they’ve been on the right side of an issue and won. (Congress people are human, and appreciate thanks too!) In this case, I believe the negotiations happened largely in secret, so I’m not sure who deserves thanks. If anyone knows, feel free to post below.


December 12, 2017

Jonathan ShockReflections

It's been three years or so since my last post. It would be cliché to talk of all that has changed, but perhaps the more crucial aspects surround what has remained the same. Ironically, the changes have often been associated with the realisation of themes and patterns which recur again and again...perhaps it is simply the spotting of these patterns which leads to potential growth, though growth often feels like a multi-directional random walk, where over time you get further from where you started, but the directions only proliferate as you find more corners to turn down. The perpendicular nature of Euclidean dimensions is itself probably illusory as here, directions may curve back on themselves and become a lost corridor which you had previously explored and long since left behind, with dust hiding your former footsteps...but the particles still floating remind you that you've been here before and that there are mistakes still to learn from what may be a narrow and uncomfortable passageway. So perhaps it is all some giant torus, and the trick is counting the closed cycles and trying to find ones which are enjoyable to go around, rather than trying to escape from them all, which is probably futile.

A cycle which recurs is hidden truth, or perhaps, more accurately the hiding of truth...it may be deeper than the hiding of truth from the map as the map-maker himself may be unaware of what is resistance/compromise/expectation/honesty/pain/frustration or joy in a given journey. Google Maps has to go back to the matrix to find any reality hidden within the neurons, and maybe the reality has been erased long ago...'safe passage', or 'dead end' being all that is left...where in fact it may not be a dead end at all, and the paralysis that ensues would be eased with only the simplest of secret code words being divulged...I don't like that...there is a better way...can you do this for me?

This is not to be read as deep or meaningful, but really as a little gush of words, to ease the fingers into writing some more, which I hope to do. This is a deep breath and a stretch after a long long sleep...


December 05, 2017

John PreskillThe light show


A strontium magneto-optical trap.

How did a quantum physics experiment end up looking like a night club? Due to a fortunate coincidence of nature, my lab mates and I at Endres Lab get to use three primary colors of laser light – red, blue, and green – to trap strontium atoms.  Let’s take a closer look at the physics behind this visually entrancing combination.

The spectrum

Sr level structure

The electronic spectrum of strontium near the ground state.

The trick to research is finding a problem that is challenging enough to be interesting, but accessible enough to not be impossible.  Strontium embodies this maxim in its electronic spectrum.  While at first glance it may seem daunting, it’s not too bad once you get to know each other.  Two valence electrons divide the spectrum into a spin-singlet sector and a spin-triplet sector – a designation that roughly defines whether the electron spins point in the opposite or in the same direction.  Certain transitions between these sectors are extremely precisely defined, and currently offer the best clock standards in the world.  Although navigating this spectrum requires more lasers, it offers opportunities for quantum physics that singly-valent spectra do not.  In the end, the experimental complexity is still very much manageable, and produces some great visuals to boot.  Here are some of the lasers we use in our lab:

The blue

At the center of the .gif above is a pulsating cloud of strontium atoms, shining brightly blue.  This is a magneto-optical trap, produced chiefly by strontium’s blue transition at 461nm.


461nm blue laser light being routed through various paths.

The blue transition is exceptionally strong, scattering about 100 million photons per atom per second.  It is the transition we use to slow strontium atoms from a hot thermal beam traveling at hundreds of meters per second down to a cold cloud at about 1 milliKelvin.  In less than a second, this procedure gives us a couple hundred million atoms to work with.  As the experiment repeats, we get to watch this cloud pulse in and out of existence.

The red(s)


689nm red light.  Bonus: Fabry-Perot interference fringes on my camera!

While the blue transition is a strong workhorse, the red transition at 689nm trades off strength for precision.  It couples strontium’s spin-singlet ground state to an excited spin-triplet state, a much weaker but more precisely defined transition.  While it does not scatter as fast as the blue (only about 23,000 photons per atom per second), it allows us to cool our atoms to much colder temperatures, on the order of 1 microKelvin.

In addition to our red laser at 689nm, we have two other reds at 679nm and 707nm.  These are necessary to essentially plug “holes” in the blue transition, which eventually cause an atom to fall into long-lived states other than the ground state.  It is generally true that the more complicated an atomic spectrum gets, the more “holes” there are to plug, and this is many times the reason why certain atoms and molecules are harder to trap than others.
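To connect the numbers quoted above, here is a minimal back-of-the-envelope sketch (my own, not from the post). The linewidths are approximate literature values and should be treated as assumptions; the maximum scattering rate at saturation is half the natural linewidth, and the Doppler cooling limit is hbar*Gamma/(2*kB).

```python
import numpy as np

hbar = 1.054_571_817e-34   # J s
kB   = 1.380_649e-23       # J / K

# Approximate natural linewidths (assumed literature values): the broad 461 nm
# line is roughly 2*pi*30 MHz, the narrow 689 nm intercombination line ~2*pi*7.4 kHz.
linewidths = {"blue 461 nm": 2 * np.pi * 30.5e6,
              "red  689 nm": 2 * np.pi * 7.4e3}

for name, Gamma in linewidths.items():
    max_scatter = Gamma / 2               # saturated photon scattering rate (photons/s)
    T_doppler = hbar * Gamma / (2 * kB)   # Doppler cooling limit
    print(f"{name}: ~{max_scatter:.1e} photons/s, Doppler limit ~{T_doppler:.1e} K")
```

The outputs land near the numbers quoted above: roughly 10^8 photons per second and a millikelvin-scale limit for the blue line, versus about 2x10^4 photons per second and a sub-microkelvin limit for the red one.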

The green

After we have established a cold magneto-optical trap, it is time to pick out individual atoms from this cloud and load them into very tightly focused optical traps that we call tweezers.  Here, our green laser comes into play.  This laser’s wavelength is far away from any particular transition, as we do not want it to scatter any photons at all.  However, its large intensity creates a conservative trapping potential for the atom, allowing us to hold onto it and even move it around.  Furthermore, its wavelength is what we call “magic”, which means it is chosen such that the ground and excited state experience the same trapping potential.


The quite powerful green laser.  So powerful that you can see the beam in the air, like in the movies.

The invisible

Yet to be implemented are two more lasers slightly off the visible spectrum at both the ultraviolet and infrared sides.  Our ultraviolet laser will be crucial to elevating our experiment from single-body to many-body quantum physics, as it will allow us to drive our atoms to very highly excited Rydberg states which interact at long range.  Our infrared laser will allow us to trap atoms in the extremely precise clock state under “magic” conditions.

 

The combination of strontium’s various optical pathways allows for a lot of new tricks beyond just cooling and trapping.  Having Rydberg states alongside narrow-line transitions, for example, has yet unexplored potential for quantum simulation.  It is a playground that is very exciting without being utterly overwhelming.  Stay tuned as we continue our exploration – maybe we’ll have a yellow laser next time too.

 


November 29, 2017

John PreskillMachine learning the arXiv

Over the last year or so, the machine learning wave has really been sweeping through the field of condensed matter physics. Machine learning techniques have been applied to condensed matter physics before, but very sparsely and with little recognition. These days, I guess (partially) due to the general machine learning and AI hype, the number of such studies has skyrocketed (I admit to contributing to that..). I’ve been keeping track of this using the arXiv and Twitter (@Evert_v_N), but you should know about this website for getting an overview of the physics & machine learning papers: https://physicsml.github.io/pages/papers.html.

This effort of applying machine learning to physics is a serious attempt at trying to understand how such tools could be useful in a variety of ways. It isn’t very hard to get a neural network to learn ‘something’ from physics data, but it is really hard to find out what – and especially how – the network does that. That’s why toy cases such as the Ising model or the Kosterlitz-Thouless transition have been so important!

When you’re keeping track of machine learning and AI developments, you soon realize that there are examples out there of amazing feats. Being able to generate photo-realistic pictures given just a sentence, e.g. “a brown bird with golden speckles and red wings is sitting on a yellow flower with pointy petals”, is (I think..) pretty cool. I can’t help but wonder if we’ll get to a point where we can ask it to generate “the groundstate of the Heisenberg model on a Kagome lattice of 100×100”…

Another feat I want to mention, and the main motivation for this post, is that of being able to encode words as vectors. That doesn’t immediately seem like a big achievement, but it is once you want to have ‘similar’ words have ‘similar’ vectors. That is, you intuitively understand that Queen and King are very similar, but differ basically only in gender. Can we teach that to a computer (read: neural network) by just having it read some text? Turns out we can. The general encoding of words to vectors is aptly named ‘Word2Vec’, and some of the top algorithms that do that were introduced here (https://arxiv.org/abs/1301.3781) and here (https://arxiv.org/abs/1310.4546). The neat thing is that we can actually do arithmetics with these words encoded as vectors, so that the network learns (with no other input than text!):

  • King – Man + Woman = Queen
  • Paris – France + Italy = Rome

In that spirit, I wondered if we can achieve the same thing with physics jargon. Everyone knows, namely, that “electrons + two dimensions + magnetic field = Landau levels”. But is that clear from condensed matter titles?

Try it yourself

If you decide at this point that the rest of the blog is too long, at least have a look here: everthemore.pythonanywhere.com or skip to the last section. That website demonstrates the main point of this post. If that sparks your curiosity, read on!

This post is mainly for entertainment, and so a small disclaimer is in order: in all of the results below, I am sure things can be improved upon. Consider this a ‘proof of principle’. However, I would be thrilled to see what kind of trained models you can come up with yourself! So for that purpose, all of the code (plus some bonus content!) can be found on this github repository: https://github.com/everthemore/physics2vec.

Harvesting the arXiv

The perfect dataset for our endeavor can be found in the form of the arXiv. I’ve written a small script (see github repository) that harvests the titles of a given section from the arXiv. It also has options for getting the abstracts, but I’ll leave that for a separate investigation. Note that in principle we could also get the source-files of all of these papers, but doing that in bulk requires a payment; and getting them one by one will 1) take forever and 2) probably get us banned.

Collecting all this data of the physics:cond-mat subsection took right about 1.5 hours and resulted in 240737 titles and abstracts (I last ran this script on November 20th, 2017). I’ve filtered them by year and month, and you can see the result in Fig.1 below. Seems like we have some catching up to do in 2017 still (although as the inset shows, we have nothing to fear. November is almost over, but we still have the ‘getting things done before x-mas’ rush coming up!).


Figure 1: The number of papers in the cond-mat arXiv section over the years. We’re behind, but the year isn’t over yet! (Data up to Nov 20th 2017)

Analyzing n-grams

After tidying up the titles (removing LaTeX, converting everything to lowercase, etc.), the next thing to do is to train a language model on finding n-grams. N-grams are basically fixed n-word expressions such as ‘cooper pair’ (bigram) or ‘metal insulator transition’ (trigram). This makes it easier to train a Word2Vec encoding, since these phrases are fixed and can be considered a single word. The python module we’ll use for Word2Vec is gensim (https://radimrehurek.com/gensim/), and it conveniently has phrase-detection built-in. The language model it builds reports back to us the n-grams it finds, and assigns them a score indicating how certain it is about them. Notice that this is not the same as how frequently it appears in the dataset. Hence an n-gram can appear fewer times than another, but have a higher certainty because it always appears in the same combination. For example, ‘de-haas-van-alphen’ appears less than, but is more certain than, ‘cooper-pair’, because ‘pair’ does not always come paired (pun intended) with ‘cooper’. I’ve analyzed up to 4-grams in the analysis below.
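For orientation, here is a minimal sketch of this phrase-detection step (my own simplified version, not the exact script from the repository). The variable titles is a hypothetical list of cleaned, lowercased title strings, and the min_count/threshold values are illustrative rather than the ones used for the results below.

```python
# Minimal phrase-detection sketch with gensim; `titles` is a hypothetical list of
# cleaned, lowercased title strings. Parameter values are illustrative.
from gensim.models.phrases import Phrases

sentences = [title.split() for title in titles]            # tokenize on whitespace

bigrams  = Phrases(sentences, min_count=5, threshold=10.0)
trigrams = Phrases(bigrams[sentences], min_count=5, threshold=10.0)

# Phrases joins detected n-grams with underscores, e.g. 'cooper_pair' or
# 'metal_insulator_transition', so later code can treat them as single words.
phrased = [trigrams[bigrams[s]] for s in sentences]
```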

I can tell you’re curious by now to find out what some of the most certain n-grams in cond-mat are (again, these are not necessarily the most frequent), so here are some interesting findings:

  • The most certain n-grams are all surname combos, Affleck-Kennedy-Lieb-Tasaki being the number 1. Kugel-Khomskii is the most certain 2-name combo and Einstein-Podolsky-Rosen the most certain 3-name combo.
  • The first certain non-name n-gram is a ‘quartz tuning fork’, followed by a ‘superconducting coplanar waveguide resonator’. Who knew.
  • The bigram ‘phys. rev.’ and trigram ‘phys. rev. lett.’ are relatively high up in the confidence lists. These seem to come from the “Comment on […]”-titles on the arXiv.
  • I learned that there is such a thing as a Lefschetz thimble. I also learned that those things are called thimbles in English (we (in Holland) call them ‘finger-hats’!).

In terms of frequency however, which is probably more of interest to us, the most dominant n-grams are Two-dimensional, Quantum dot, Phase transition, Magnetic field, One dimensional and Bose-Einstein (in descending order). It seems 2D is still more popular than 1D, and all in all the top n-grams do a good job at ‘defining’ condensed matter physics. I’ll refer you to the github repository code if you want to see a full list! You’ll find there a piece of code that produces wordclouds from the dominant words and n-grams too, such as this one:


For fun though, before we finally get to the Word2Vec encoding, I’ve also kept track of all of these as a function of year, so that we can now turn to finding out which bigrams have been gaining the most popularity. The table below shows the top 5 n-grams for the period 2010 – 2016 (not including 2017) and for the period 2015 – Nov 20th 2017.

2010 – 2016 | 2015 – November 20th 2017
--- | ---
Spin liquids | Topological phases & transitions
Weyl semimetals | Spin chains
Topological phases & transitions | Machine learning
Surface states | Transition metal dichalcogenides
Transition metal dichalcogenides | Thermal transport
Many-body localization | Open quantum systems

Actually, the real number 5 in the left column was ‘Topological insulators’, but given number 3 I skipped it. Also, this top 5 includes a number 6 (!), which I just could not leave off given that everyone seems to have been working on MBL. If we really want to be early adopters though, taking only the last 1.8 years (2015 – now, Nov 20th 2017)  in the right column of the table shows some interesting newcomers. Surprisingly, many-body localization is not even in the top 20 anymore. Suffice it to say, if you have been working on anything topology-related, you have nothing to worry about. Machine learning is indeed gaining lots of attention, but we’ve yet to see if it doesn’t go the MBL-route (I certainly don’t hope so!). Quantum computing does not seem to be on the cond-mat radar, but I’m certain we would find that high up in the quant-ph arXiv section.

CondMat2Vec

Alright, finally time to use some actual neural networks for machine learning. As I said at the start of this post, what we’re about to do is try to train a network to encode/decode words into vectors, while simultaneously making sure that similar words (by meaning!) have similar vectors. Now that we have the n-grams, we want the Word2Vec algorithm to treat these as words by themselves (they are, after all, fixed combinations).

In the Word2Vec algorithm, we get to decide the length of the vectors that encode words ourselves. Larger vectors mean more freedom in encoding words, but also make it harder to learn similarity. In addition, we get to choose a window-size, indicating how many words the algorithm will look ahead to analyze relations between words. Both of these parameters are free for you to play with if you have a look at the source code repository. For the website everthemore.pythonanywhere.com, I’ve uploaded a size-100 model with window-size 10, which I found to produce sensible results. Sensible here means “based on my expectations”, such as the previous example of “2D + electrons + magnetic field = Landau levels”. Let’s ask our network some questions.
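For concreteness, here is a rough sketch of the training and querying step (again my own simplification, not the exact script). It assumes the phrased token lists from the phrase-detection sketch above, that the queried tokens made it into the learned vocabulary, and gensim 4.x naming (the vector-length parameter is vector_size there; older versions call it size).

```python
# Train a Word2Vec model on the phrase-joined titles and query it.
# `phrased` is the hypothetical list of token lists from the previous sketch.
from gensim.models import Word2Vec

model = Word2Vec(phrased, vector_size=100, window=10, min_count=5, workers=4)

# Similarity query: which tokens live near 'superconductor' in vector space?
print(model.wv.most_similar('superconductor', topn=5))

# Vector arithmetic: positive words are added, negative words subtracted.
print(model.wv.most_similar(positive=['majorana', 'braiding'], topn=2))
print(model.wv.most_similar(positive=['electron', 'hole'], topn=2))
```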

First, as a simple check, let’s see what our encoding thinks some jargon is similar to:

  • Superconductor ~ Superconducting, Cuprate superconductor, Superconductivity, Layered superconductor, Unconventional superconductor, Superconducting gap, Cuprate, Weyl semimetal, …
  • Majorana ~ Majorana fermion, Majorana mode, Non-abelian, Zero-energy, braiding, topologically protected, …

It seems we could start to cluster words based on this. But the real test comes now, in the form of arithmetics. According to our network (I am listing the top two choices in some cases; the encoder outputs a list of similar vectors, ordered by similarity):

  • Majorana + Braiding = Non-Abelian
  • Electron + Hole = Exciton, Carrier
  • Spin + Magnetic field = Magnetization, Antiferromagnetic
  • Particle + Charge = Electron, Charged particle

And, sure enough:

  • 2D + electrons + magnetic field = Landau level, Magnetoresistance oscillation

The above is just a small sample of the things I’ve tried. See the link in the try it yourself section above if you want to have a go. Not all of the examples work nicely. For example, neither lattice + wave nor lattice + excitation nor lattice + force seem to result in anything related to the word ‘phonon’. I would guess that increasing the window size will help remedy this problem. Even better probably would be to include abstracts!

Outlook

I could play with this for hours, and I’m sure that by including the abstracts and tweaking the vector size (plus some more parameters I haven’t even mentioned) one could optimize this more. Once we have an optimized model, we could start to cluster the vectors to define research fields, visualizing the relations between n-grams (both suggestions thanks to Thomas Vidick and John Preskill!), and many other things. This post has become rather long already however, and I will leave further investigation to a possible future post. I’d be very happy to incorporate anything cool you find yourselves though, so please let me know!


November 27, 2017

John PreskillGently yoking yin to yang

The architecture at the University of California, Berkeley mystified me. California Hall evokes a Spanish mission. The main library consists of white stone pillared by ionic columns. A sea-green building scintillates in the sunlight like a scarab. The buildings straddle the map of styles.


So do Berkeley’s quantum scientists, information-theory users, and statistical mechanics.

The chemists rove from abstract quantum information (QI) theory to experiments. Physicists experiment with superconducting qubits, trapped ions, and numerical simulations. Computer scientists invent algorithms for quantum computers to perform.

Few activities light me up more than bouncing from quantum group to info-theory group to stat-mech group, hunting commonalities. I was honored to bounce from group to group at Berkeley this September.

What a trampoline Berkeley has.

The groups fan out across campus and science, but I found compatibility. Including a collaboration that illuminated quantum incompatibility.

Quantum incompatibility originated in studies by Werner Heisenberg. He and colleagues cofounded quantum mechanics during the early 20th century. Measuring one property of a quantum system, Heisenberg intuited, can affect another property.

The most famous example involves position and momentum. Say that I hand you an electron. The electron occupies some quantum state represented by | \Psi \rangle. Suppose that you measure the electron’s position. The measurement outputs one of many possible values x (unless | \Psi \rangle has an unusual form, the form of a Dirac delta function).

But we can’t say that the electron occupies any particular point x = x_0 in space. Measurement devices have limited precision. You can measure the position only to within some error \varepsilon: x = x_0 \pm \varepsilon.

Suppose that, immediately afterward, you measure the electron’s momentum. This measurement, too, outputs one of many possible values. What probability q(p) dp does the measurement have of outputting some value p? We can calculate q(p) dp, knowing the mathematical form of | \Psi \rangle and knowing the values of x_0 and \varepsilon.

q(p) is a probability density, which you can think of as a set of probabilities. The density can vary with p. Suppose that q(p) varies little: The probabilities spread evenly across the possible p values. You have no idea which value your momentum measurement will output. Suppose, instead, that q(p) peaks sharply at some value p = p_0. You can likely predict the momentum measurement’s outcome.

The certainty about the momentum measurement trades off with the precision \varepsilon of the position measurement. The smaller the \varepsilon (the more precisely you measured the position), the greater the momentum’s unpredictability. We call position and momentum complementary, or incompatible.

You can’t measure incompatible properties, with high precision, simultaneously. Imagine trying to do so. Upon measuring the momentum, you ascribe a tiny range of momentum values p to the electron. If you measured the momentum again, an instant later, you could likely predict that measurement’s outcome: The second measurement’s q(p) would peak sharply (encode high predictability). But, in the first instant, you measure also the position. Hence, by the discussion above, q(p) would spread out widely. But we just concluded that q(p) would peak sharply. This contradiction illustrates that you can’t measure position and momentum, precisely, at the same time.

But you can simultaneously measure incompatible properties weakly. A weak measurement has an enormous \varepsilon. A weak position measurement barely spreads out q(p). If you want more details, ask a Quantum Frontiers regular; I’ve been harping on weak measurements for months.

Blame Berkeley for my harping this month. Irfan Siddiqi’s and Birgitta Whaley’s groups collaborated on weak measurements of incompatible observables. They tracked how the measured quantum state | \Psi (t) \rangle evolved in time (represented by t).

Irfan’s group manipulates superconducting qubits.1 The qubits sit in the physics building, a white-stone specimen stamped with an egg-and-dart motif. Across the street sit chemists, including members of Birgitta’s group. The experimental physicists and theoretical chemists teamed up to study a quantum lack of teaming up.

Phys. & chem. bldgs

The experiment involved one superconducting qubit. The qubit has properties analogous to position and momentum: A ball, called the Bloch ball, represents the set of states that the qubit can occupy. Imagine an arrow pointing from the sphere’s center to any point in the ball. This Bloch vector represents the qubit’s state. Consider an arrow that points upward from the center to the surface. This arrow represents the qubit state | 0 \rangle. | 0 \rangle is the quantum analog of the possible value 0 of a bit, or unit of information. The analogous downward-pointing arrow represents the qubit state | 1 \rangle, analogous to 1.

Infinitely many axes intersect the sphere. Different axes represent different observables that Irfan’s group can measure. Nonparallel axes represent incompatible observables. For example, the x-axis represents an observable \sigma_x analogous to position. The y-axis represents an observable \sigma_y analogous to momentum.
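A quick numerical aside (my own, not from the paper): the incompatibility of observables along nonparallel axes shows up algebraically as a nonvanishing commutator, [\sigma_x, \sigma_y] = 2i\sigma_z. A few lines of code check this.

```python
import numpy as np

# Pauli matrices for the qubit observables along the x-, y-, and z-axes.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

commutator = sx @ sy - sy @ sx
print(np.allclose(commutator, 2j * sz))   # True: the two observables are incompatible
```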

Tug-of-war

Siddiqi lab, decorated with the trademark for the paper’s tug-of-war between incompatible observables. Photo credit: Leigh Martin, one of the paper’s leading authors.

Irfan’s group stuck their superconducting qubit in a cavity, or box. The cavity contained light that interacted with the qubit. The interactions transferred information from the qubit to the light: The light measured the qubit’s state. The experimentalists controlled the interactions, controlling the axes “along which” the light was measured. The experimentalists weakly measured along two axes simultaneously.

Suppose that the axes coincided—say, at the x-axis \hat{x}. The qubit would collapse to the state | \Psi \rangle = \frac{1}{ \sqrt{2} } ( | 0 \rangle + | 1 \rangle ), represented by the arrow that points along \hat{x} to the sphere’s surface, or to the state | \Psi \rangle = \frac{1}{ \sqrt{2} } ( | 0 \rangle - | 1 \rangle ), represented by the opposite arrow.

0 deg

(Projection of) the Bloch Ball after the measurement. The system can access the colored points. The lighter a point, the greater the late-time state’s weight on the point.

Let \hat{x}' denote an axis near \hat{x}—say, 18° away. Suppose that the group weakly measured along \hat{x} and \hat{x}'. The state would partially collapse. The system would access points in the region straddled by \hat{x} and \hat{x}', as well as points straddled by - \hat{x} and - \hat{x}'.

18 deg

Finally, suppose that the group weakly measured along \hat{x} and \hat{y}. These axes stand in for position and momentum. The state would, loosely speaking, swirl around the Bloch ball.

90 deg

The Berkeley experiment illuminates foundations of quantum theory. Incompatible observables, physics students learn, can’t be measured simultaneously. This experiment blasts our expectations, using weak measurements. But the experiment doesn’t just destroy. It rebuilds the blast zone, by showing how | \Psi (t) \rangle evolves.

“Position” and “momentum” can hang together. So can experimentalists and theorists, physicists and chemists. So, somehow, can the California mission and the ionic columns. Maybe I’ll understand the scarab building when we understand quantum theory.2

With thanks to Birgitta’s group, Irfan’s group, and the rest of Berkeley’s quantum/stat-mech/info-theory community for its hospitality. The Bloch-sphere figures come from http://www.nature.com/articles/nature19762.

1The qubit is the quantum analog of a bit. The bit is the basic unit of information. A bit can be in one of two possible states, which we can label as 0 and 1. Qubits can manifest in many physical systems, including superconducting circuits. Such circuits are tiny quantum circuits through which current can flow, without resistance, forever.

2Soda Hall dazzled but startled me.


November 25, 2017

Sean CarrollThanksgiving

This year we give thanks for a simple but profound principle of statistical mechanics that extends the famous Second Law of Thermodynamics: the Jarzynski Equality. (We’ve previously given thanks for the Standard Model Lagrangian, Hubble’s Law, the Spin-Statistics Theorem, conservation of momentum, effective field theory, the error bar, gauge symmetry, Landauer’s Principle, the Fourier Transform, Riemannian Geometry, and the speed of light.)

The Second Law says that entropy increases in closed systems. But really it says that entropy usually increases; thermodynamics is the limit of statistical mechanics, and in the real world there can be rare but inevitable fluctuations around the typical behavior. The Jarzynski Equality is a way of quantifying such fluctuations, which is increasingly important in the modern world of nanoscale science and biophysics.

Our story begins, as so many thermodynamic tales tend to do, with manipulating a piston containing a certain amount of gas. The gas is of course made of a number of jiggling particles (atoms and molecules). All of those jiggling particles contain energy, and we call the total amount of that energy the internal energy U of the gas. Let’s imagine the whole thing is embedded in an environment (a “heat bath”) at temperature T. That means that the gas inside the piston starts at temperature T, and after we manipulate it a bit and let it settle down, it will relax back to T by exchanging heat with the environment as necessary.

Finally, let’s divide the internal energy into “useful energy” and “useless energy.” The useful energy, known to the cognoscenti as the (Helmholtz) free energy and denoted by F, is the amount of energy potentially available to do useful work. For example, the pressure in our piston may be quite high, and we could release it to push a lever or something. But there is also useless energy, which is just the entropy S of the system times the temperature T. That expresses the fact that once energy is in a highly-entropic form, there’s nothing useful we can do with it any more. So the total internal energy is the free energy plus the useless energy,

U = F + TS. \qquad \qquad (1)

Our piston starts in a boring equilibrium configuration a, but we’re not going to let it just sit there. Instead, we’re going to push in the piston, decreasing the volume inside, ending up in configuration b. This squeezes the gas together, and we expect that the total amount of energy will go up. It will typically cost us energy to do this, of course, and we refer to that energy as the work Wab we do when we push the piston from a to b.

Remember that when we’re done pushing, the system might have heated up a bit, but we let it exchange heat Q with the environment to return to the temperature T. So three things happen when we do our work on the piston: (1) the free energy of the system changes; (2) the entropy changes, and therefore the useless energy; and (3) heat is exchanged with the environment. In total we have

W_{ab} = \Delta F_{ab} + T\Delta S_{ab} - Q_{ab}.\qquad \qquad (2)

(There is no ΔT, because T is the temperature of the environment, which stays fixed.) The Second Law of Thermodynamics says that entropy increases (or stays constant) in closed systems. Our system isn’t closed, since it might leak heat to the environment. But really the Second Law says that the total of the last two terms on the right-hand side of this equation add up to a positive number; in other words, the increase in entropy will more than compensate for the loss of heat. (Alternatively, you can lower the entropy of a bottle of champagne by putting it in a refrigerator and letting it cool down; no laws of physics are violated.) One way of stating the Second Law for situations such as this is therefore

W_{ab} \geq \Delta F_{ab}. \qquad \qquad (3)

The work we do on the system is greater than or equal to the change in free energy from beginning to end. We can make this inequality into an equality if we act as efficiently as possible, minimizing the entropy/heat production: that’s an adiabatic process, and in practical terms amounts to moving the piston as gradually as possible, rather than giving it a sudden jolt. That’s the limit in which the process is reversible: we can get the same energy out as we put in, just by going backwards.

Awesome. But the language we’re speaking here is that of classical thermodynamics, which we all know is the limit of statistical mechanics when we have many particles. Let’s be a little more modern and open-minded, and take seriously the fact that our gas is actually a collection of particles in random motion. Because of that randomness, there will be fluctuations over and above the “typical” behavior we’ve been describing. Maybe, just by chance, all of the gas molecules happen to be moving away from our piston just as we move it, so we don’t have to do any work at all; alternatively, maybe there are more than the usual number of molecules hitting the piston, so we have to do more work than usual. The Jarzynski Equality, derived 20 years ago by Christopher Jarzynski, is a way of saying something about those fluctuations.

One simple way of taking our thermodynamic version of the Second Law (3) and making it still hold true in a world of fluctuations is simply to say that it holds true on average. To denote an average over all possible things that could be happening in our system, we write angle brackets \langle \cdots \rangle around the quantity in question. So a more precise statement would be that the average work we do is greater than or equal to the change in free energy:

\displaystyle \left\langle W_{ab}\right\rangle \geq \Delta F_{ab}. \qquad \qquad (4)

(We don’t need angle brackets around ΔF, because F is determined completely by the equilibrium properties of the initial and final states a and b; it doesn’t fluctuate.) Let me multiply both sides by -1, which means we  need to flip the inequality sign to go the other way around:

\displaystyle -\left\langle W_{ab}\right\rangle \leq -\Delta F_{ab}. \qquad \qquad (5)

Next I will exponentiate both sides of the inequality. Note that this keeps the inequality sign going the same way, because the exponential is a monotonically increasing function; if x is less than y, we know that ex is less than ey.

\displaystyle e^{-\left\langle W_{ab}\right\rangle} \leq e^{-\Delta F_{ab}}. \qquad\qquad (6)

(More typically we will see the exponents divided by kT, where k is Boltzmann’s constant, but for simplicity I’m using units where kT = 1.)

Jarzynski’s equality is the following remarkable statement: in equation (6), if we exchange  the exponential of the average work e^{-\langle W\rangle} for the average of the exponential of the work \langle e^{-W}\rangle, we get a precise equality, not merely an inequality:

\displaystyle \left\langle e^{-W_{ab}}\right\rangle = e^{-\Delta F_{ab}}. \qquad\qquad (7)

That’s the Jarzynski Equality: the average, over many trials, of the exponential of minus the work done, is equal to the exponential of minus the free energies between the initial and final states. It’s a stronger statement than the Second Law, just because it’s an equality rather than an inequality.

In fact, we can derive the Second Law from the Jarzynski equality, using a math trick known as Jensen’s inequality. For our purposes, this says that the exponential of an average is less than the average of an exponential, e^{\langle x\rangle} \leq \langle e^x \rangle. Thus we immediately get

\displaystyle e^{-\left\langle W_{ab}\right\rangle} \leq \left\langle e^{-W_{ab}}\right\rangle = e^{-\Delta F_{ab}}, \qquad\qquad (8)

as we had before. Then just take the log of both sides to get \langle W_{ab}\rangle \geq \Delta F_{ab}, which is one way of writing the Second Law.

So what does it mean? As we said, because of fluctuations, the work we needed to do on the piston will sometimes be a bit less than or a bit greater than the average, and the Second Law says that the average will be greater than the difference in free energies from beginning to end. Jarzynski’s Equality says there is a quantity, the exponential of minus the work, that averages out to be exactly the exponential of minus the free-energy difference. The function e^{-W} is convex and decreasing as a function of W. A fluctuation where W is lower than average, therefore, contributes a greater shift to the average of e^{-W} than a corresponding fluctuation where W is higher than average. To satisfy the Jarzynski Equality, we must have more fluctuations upward in W than downward in W, by a precise amount. So on average, we’ll need to do more work than the difference in free energies, as the Second Law implies.
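For the numerically inclined, here is a minimal toy check (my own sketch, not from the post): if the work values are Gaussian with mean \mu and standard deviation \sigma (in units where kT = 1), then \langle e^{-W} \rangle = e^{-\mu + \sigma^2/2}, so the Jarzynski estimate of \Delta F is \mu - \sigma^2/2, which sits below the average work just as the Second Law requires.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model in units where kT = 1: Gaussian work distribution.
mu, sigma, trials = 2.0, 1.0, 1_000_000
W = rng.normal(mu, sigma, size=trials)

deltaF_jarzynski = -np.log(np.mean(np.exp(-W)))   # from <exp(-W)> = exp(-Delta F)
deltaF_exact = mu - sigma**2 / 2                  # analytic value for a Gaussian

print(deltaF_jarzynski, deltaF_exact)             # both close to 1.5
print(np.mean(W) >= deltaF_jarzynski)             # Second Law: <W> >= Delta F
```

Both estimates agree to within Monte Carlo error, while the average work stays above them, as expected.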

It’s a remarkable thing, really. Much of conventional thermodynamics deals with inequalities, with equality being achieved only in adiabatic processes happening close to equilibrium. The Jarzynski Equality is fully non-equilibrium, achieving equality no matter how dramatically we push around our piston. It tells us not only about the average behavior of statistical systems, but about the full ensemble of possibilities for individual trajectories around that average.

The Jarzynski Equality has launched a mini-revolution in nonequilibrium statistical mechanics, the news of which hasn’t quite trickled to the outside world as yet. It’s one of a number of relations, collectively known as “fluctuation theorems,” which also include the Crooks Fluctuation Theorem, not to mention our own Bayesian Second Law of Thermodynamics. As our technological and experimental capabilities reach down to scales where the fluctuations become important, our theoretical toolbox has to keep pace. And that’s happening: the Jarzynski equality isn’t just imagination, it’s been experimentally tested and verified. (Of course, I remain just a poor theorist myself, so if you want to understand this image from the experimental paper, you’ll have to talk to someone who knows more about Raman spectroscopy than I do.)

November 23, 2017

John PreskillMajorana update

If you are, by any chance, following progress in the field of Majorana bound states, then you are for sure super excited about ample Majorana results arriving this Fall. On the other hand, if you just heard about these elusive states recently, it is time for an update. For physicists working in the field, this Fall was perhaps the most exciting time since the first experimental reports from 2012. In the last few weeks there was not only one, but at least three interesting manuscripts reporting new insightful data which may finally provide a definitive experimental verification of the existence of these states in condensed matter systems.

But before I dive into these new results, let me give a brief history of Majorana states and their experimental observation. The story starts with the young, talented physicist Ettore Majorana, who hypothesized back in 1937 the existence of fermionic particles that are their own antiparticles. These hypothetical particles, now called Majorana fermions, were proposed in the context of elementary particle physics but never observed. Some 60 years later, in the early 2000s, theoretical work emerged showing that Majorana fermionic states can exist as quasiparticle excitations in certain low-dimensional superconducting systems (not as a real particle as originally proposed, but otherwise with exactly the same properties). Since then, theorists have proposed half a dozen possible ways to realize Majorana modes using readily available materials such as superconductors, semiconductors, magnets, and topological insulators (for curious readers, I recommend manuscripts [1, 2, 3] for an overview of the different proposed methods to realize Majorana states in the lab).

The most fascinating thing about Majorana states is that they belong to the class of anyons, which means that they behave neither as bosons nor as fermions under exchange. For example, if you have two identical fermionic (or bosonic) states and you exchange their positions, the quantum-mechanical wavefunction describing the two states acquires a phase factor of -1 (or +1). Anyons, on the other hand, can acquire an arbitrary phase factor e^{iφ} upon exchange. For this reason, they are considered a starting point for topological quantum computation. If you want to learn more about anyons, check out the video below featuring IQIM’s Gil Refael and Jason Alicea.
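As a toy illustration of these exchange phases (my own sketch, not from the video): represent a single exchange as multiplying the two-particle state by a phase. Bosons pick up +1, fermions pick up -1, and an abelian anyon picks up a generic e^{iφ}, so even two successive exchanges need not return the state to itself.

```python
import numpy as np

def exchange(psi, phi):
    """One exchange of two identical particles: multiply the state by e^{i*phi}."""
    return np.exp(1j * phi) * psi

psi = 1.0 + 0.0j   # schematic two-particle amplitude
for name, phi in [("bosons", 0.0), ("fermions", np.pi), ("anyons (phi = pi/4)", np.pi / 4)]:
    once = exchange(psi, phi)
    twice = exchange(once, phi)
    print(name, "| one exchange:", np.round(once, 3), "| two exchanges:", np.round(twice, 3))
```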

 

Back in 2012, a group in Delft (led by Prof. Leo Kouwenhoven) announced the observation of zero-energy states in a nanoscale device consisting of a semiconductor nanowire coupled to a superconductor. These states behaved very similarly to the Majoranas previously predicted to occur in this system. The key word here is ‘similarly’, since the behavior of these modes was not fully consistent with the theoretical predictions. Namely, the electrical conductance carried through the observed zero-energy states was only about 5% of the perfect transmission value expected for Majoranas. This part of the data was very puzzling and immediately cast doubt throughout the community. Physicists were quickly divided into what I will call enthusiasts (who believed that these initial results indeed originated from Majorana states) and skeptics (who pointed out that effects other than Majoranas can produce similar-looking zero-energy peaks). And thus a great debate started.

In the years that followed, experimentalists tried to observe zero-energy features in improved devices, to track how these features evolve with external parameters such as gate voltages and wire length, and to explore completely different platforms for hosting Majorana states, such as magnetic flux vortices in topological superconductors and magnetic atomic chains placed on a superconducting surface. However, these results were not enough to convince the skeptics that the observed states indeed originated from Majoranas rather than some other yet-to-be-discovered phenomenon. And so the debate continued. With each generation of experiments some of the proposed alternative scenarios were ruled out, but the final verification was still missing.

Fast forward to the events of this Fall and the exciting recent results. The manuscript I would like to invite you to read was posted on the arXiv just a couple of weeks ago. The main result is the observation of a perfectly quantized 2e²/h conductance at zero energy, the long-sought signature of Majorana states. This quantization implies that in this latest generation of semiconductor-superconductor devices the zero-energy states exhibit perfect electron-hole symmetry and thus allow for perfect Andreev reflection. These remarkable results may finally end the debate and convince most of the skeptics out there.


Figure 1. (a,b) Comparison between devices and measurements from 2012 and 2017. (a) In 2012, a device combining a superconductor (a niobium titanium nitride alloy) with an indium antimonide nanowire produced the first signature of zero-energy states, but the conductance peak was only about 0.1 × e²/h. Adapted from Mourik et al., Science 2012. (b) A similar device from 2017, made by carefully depositing superconducting aluminum on indium arsenide, showed a fully developed 2e²/h conductance peak. Adapted from Zhang et al., arXiv 2017. (c) Schematic of Andreev reflection at the Normal (N)/Superconductor (S) interface. (d,e) Alternative view of the Andreev reflection process as tunneling through a double barrier, without and with Majorana modes (shown in yellow).

To fully appreciate these results, it is useful to quickly review the physics of Andreev reflection (Fig. 1c-e), which occurs at the interface between a normal region and a superconductor [4]. As an electron (blue) from the normal region enters the superconductor, it pulls an additional electron with it to form a Cooper pair, leaving an extra hole (red) behind (Fig. 1c). You can also think about this process as transmission through two leads, one connecting the superconductor to the electrons and the other to the holes (Fig. 1d). This lets us view the problem as transmission through a double barrier, which is generally low. In the presence of a Majorana state, however, there is a resonant level at zero energy that couples with equal amplitude to both electrons and holes. This results in resonant Andreev reflection with a perfect quantization of 2e²/h (Fig. 1e). Note that even without Majorana modes perfect quantization is possible, but it is highly unlikely, as it requires very careful tuning of the barrier potential (the authors showed that their quantization is robust against tuning the voltages on the gates, ruling out this possibility).
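To see why the Majorana case is special, here is a back-of-the-envelope sketch in Python (my own Breit-Wigner-style toy model of the double-barrier picture, not the analysis in the manuscript). If the zero-energy level couples to the electron and hole channels with rates Γ_e and Γ_h, the on-resonance Andreev conductance in this picture is G = (2e²/h) · 4Γ_eΓ_h/(Γ_e + Γ_h)²: it reaches the full 2e²/h only when the two couplings are equal, which a Majorana mode guarantees automatically and which an accidental level achieves only by fine-tuning.

```python
def andreev_peak_conductance(gamma_e, gamma_h):
    """On-resonance conductance, in units of 2e^2/h, of a zero-energy level
    coupled to electron and hole channels with rates gamma_e and gamma_h
    (simple symmetric-vs-asymmetric double-barrier picture)."""
    return 4 * gamma_e * gamma_h / (gamma_e + gamma_h) ** 2

# A Majorana mode couples to electrons and holes with equal amplitude:
print("Majorana-like (equal couplings):", andreev_peak_conductance(1.0, 1.0), "x 2e^2/h")

# An ordinary ("accidental") zero-energy level generally does not:
for ratio in (0.5, 0.1, 0.05):
    g = andreev_peak_conductance(1.0, ratio)
    print(f"gamma_h / gamma_e = {ratio:.2f}: peak G = {g:.2f} x 2e^2/h")
```

In this toy picture, an ordinary level hits 2e²/h only for carefully balanced barriers, so a peak that stays pinned at 2e²/h as the gate voltages are varied is hard to explain without a Majorana.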

Going back to the experiments, you may wonder what made this breakthrough possible. It seems to be a combination of factors, including epitaxially grown superconductors and more sophisticated fabrication methods. As often happens in experimental physics, this milestone did not come from one ingenious idea, but rather from numerous technical improvements made by several generations of hard-working grad students and postdocs.

If you are up for more Majorana reading, you can find two more recent eye-catching manuscripts here and here. Note that this list of interesting recent Majorana papers is a mere selection by the author and is not complete by any means. A few months ago, my IQIM colleagues wrote a nice blog entry about topological qubits arriving in 2018. Although this may sound overly optimistic, the recent results suggest that the field is definitely taking off. While there are certainly many challenges to be solved, we may soon see the next generation of experiments designed to probe control over the Majorana states. Stay tuned for more!


November 19, 2017

Chad OrzelMeet Charlie

It’s been a couple of years since we lost the Queen of Niskayuna, and we’ve held off getting a dog until now because we were planning a big home renovation– adding on to the mud room, creating a new bedroom on the second floor, and gutting and replacing the kitchen. This was quite the undertaking, and we would not have wanted to put a dog through that. It was bad enough putting us through that…

With the renovation complete, we started looking for a dog a month or so back, and eventually ended up working with a local rescue group with the brilliantly unsubtle name Help Orphan Puppies. This weekend, we officially adopted this cutie:

Charlie, the new pupper at Chateau Steelypips, showing off his one pointy ear.

He was listed on the website as “Prince,” but his foster family had been calling him “Charlie,” and the kids liked that name a lot, so we’re keeping it. He’s a Plott Hound mix (the “mix” being evident in the one ear that sticks up while the other flops down), one of six puppies found with his mother back in May in a ravine in, I think they said, South Carolina. He’s the last of the litter to find a permanent home. The name change is appropriate, as Emmy was listed as “Princess” before we adopted her and changed her name.

Charlie’s a sweet and energetic boy, who’s basically housebroken, and sorta-kinda crate trained, which is about the same as Emmy when we got her. He knows how to sit, and is learning other commands. He’s very sweet with people, and we haven’t really met any other dogs yet, but he was fostered in a home with two other dogs, so we hope he’ll do well. And he’s super good at jumping– he cleared a 28″ child safety gate we were attempting to use to keep him in the mud room– and does a zoom with the best of them:

Charlie does a zoom.

The kids are absolutely over the moon about having a dog again, as you can see from their paparazzi turn:

Charlie poses for the paparazzi.

He’s a very good boy, all in all, and we’re very pleased to have him. I can’t really describe how good it felt on Saturday afternoon to once again settle down on the couch with a football game on tv, and drop my hand down to pet a dog lying on the floor next to me. I still miss some things about Emmy, but Charlie’s already filling a huge void.

Chad OrzelGo On Till You Come to the End; Then Stop

ScienceBlogs is coming to an end. I don’t know that there was ever a really official announcement of this, but the bloggers got email a while back letting us know that the site will be closing down. I’ve been absolutely getting crushed between work and the book-in-progress and getting Charlie the pupper, but I did manage to export and re-import the content to an archive site back on steelypips.org. (The theme there is an awful default WordPress one, but I’m too slammed with work to make it look better; the point is just to have an online archive for the temporary redirects to work with.)

I’m one of a handful who were there from the very beginning to the bitter end– I got asked to join up in late 2005, and the first new post here was on January 11, 2006 (I copied over some older content before it went live, so it wasn’t just a blank page with a “Welcome to my new blog!” post). It seems fitting to have the last post be on the site’s last day of operation.

The history of ScienceBlogs and my place in it was… complicated. There were some early efforts to build a real community among the bloggers, but we turned out to be an irascible lot, and after a while that kind of fell apart. The site was originally associated with Seed magazine, which folded; then it was a stand-alone thing for a bit, then partnered with National Geographic, and for the last few years it’s been an independent entity again. I’ve been mostly blogging at Forbes since mid-2015, so I’ve been pretty removed from the network– I’m honestly not even sure what blogs have been active in the past few years. I’ll continue to blog at Forbes, and may or may not re-launch more personal blogging at the archive site. A lot of that content is now posted to

What led to the slow demise of ScienceBlogs? Like most people who’ve been associated with it over the years, I have Thoughts on the subject, but I don’t really feel like airing them at this point. (If somebody else wants to write an epic oral history of SB, email me, and we can talk…) I don’t think it was ever going to be a high-margin business, and there were a number of mis-steps over the years that undercut the money-making potential even more. I probably burned or at least charred some bridges by staying with the site as long as I did, but whatever. And it’s not like anybody else is getting fabulously wealthy from running blog networks that pay reasonable rates.

ScienceBlogs unquestionably gave an enormous boost to my career. I’ve gotten any number of cool opportunities as a direct result of blogging here, most importantly my career as a writer of pop-physics books. There were some things along the way that didn’t pan out as I’d hoped, but this site launched me to what fame I have, and I’ll always be grateful for that.

So, ave atque vale, ScienceBlogs. It was a noble experiment, and the good days were very good indeed.

November 16, 2017