Planet Musings

March 27, 2015

Georg von Hippel: Workshop "Fundamental Parameters from Lattice QCD" at MITP (upcoming deadline)

Recent years have seen a significant increase in the overall accuracy of lattice QCD calculations of various hadronic observables. Results for quark and hadron masses, decay constants, form factors, the strong coupling constant and many other quantities are becoming increasingly important for testing the validity of the Standard Model. Prominent examples include calculations of Standard Model parameters, such as quark masses and the strong coupling constant, as well as the determination of CKM matrix elements, which is based on a variety of input quantities from experiment and theory. In order to make lattice QCD calculations more accessible to the entire particle physics community, several initiatives and working groups have sprung up, which collect the available lattice results and produce global averages.

The scientific programme "Fundamental Parameters from Lattice QCD" at the Mainz Institute of Theoretical Physics (MITP) is designed to bring together lattice practitioners with members of the phenomenological and experimental communities who are using lattice estimates as input for phenomenological studies. In addition to sharing the expertise among several communities, the aim of the programme is to identify key quantities which allow for tests of the CKM paradigm with greater accuracy and to discuss the procedures in order to arrive at more reliable global estimates.

The deadline for registration is Tuesday, 31 March 2015.

Dave Bacon: What If Papers Had APIs?

API is an abbreviation that stands for “Application Program Interface.” Roughly speaking, an API is a specification of a software component in terms of the operations one can perform with that component. For example, a common kind of API is the set of methods supported by an encapsulated bit of code, a.k.a. a library (for example, a library could have the purpose of “drawing pretty stuff on the screen”; the API is then the set of commands like “draw a rectangle”, and it specifies how you pass parameters to these commands, how rectangles overlay on each other, etc.). Importantly, the API is supposed to specify how the library functions, but does this in a way that is independent of the inner workings of the library (though this wall is often broken in practice). Another common API is found when a service exposes remote calls that can be made to manipulate and perform operations on that service. For example, Twitter supports an API for reading and writing Twitter data. This latter example, of a service exposing a set of calls that can manipulate the data stored on a remote server, is particularly powerful, because it allows one to gain access to data through simple access to a communication network. (As an interesting aside, see this rant for why APIs are likely key to some of Amazon’s success.)

As you might guess (see for example my latest flop Should Papers Have Unit Tests?), I like smooshing together disparate concepts and seeing what comes out the other side. Thinking about APIs then led me to consider the question “What if Papers had APIs?”

In normal settings, academic papers are considered to be relatively static objects. Sure, papers on the arXiv, for example, have versions (some more than others!). And there are efforts like Living Reviews in Relativity, where review articles are updated by the authors. But in general papers exist as fixed, “complete” works. In programming terms we would say that they are “immutable”. So if we consider the question of exposing an API for papers, one might think that this would just be a read-only API. And indeed this form of API exists for many journals, and also for the arXiv. These forms of “paper APIs” allow one to read information, mostly metadata, about a paper.

But what about a paper API that allows mutation? At first glance this heresy is rather disturbing: allowing calls from outside of a paper to change the content of the paper seems dangerous. It also isn’t clear what benefit could come from this. With, I think, one exception. Citations are the currency of academia (last I checked they were still, however, not fungible with bitcoins). But citations really only go in one direction (with exceptions for simultaneous works): you cite a paper whose work you build upon (or whose work you demonstrate is wrong, etc.). What if a paper exposed a reverse citation index? That is, I put my paper on the arXiv, and then, when you write your paper showing how my paper is horribly wrong, you can make a call to my paper’s API that mutates my paper and adds to it links to your paper. Of course, this seems crazy: what is to stop rampant back-spamming of citations, especially by *ahem* cranks? Here it seems that one could implement a simple approval system for the receiving paper. If this were done on some common system, then you could expose the mutated paper either A) with approved mutations or B) with unapproved mutations (or one could go ‘social’ on this problem and allow voting on the changes).

What benefit would such a system confer? In some ways it would make more accessible something that we all use: the “cited by” index of services like Google Scholar. One difference is that it could be possible to be more precise in the reverse citation: for example while Scholar provides a list of relevant papers, if the API could expose the ability to add links to specific locations in a paper, one could arguably get better reverse citations (because, frankly, the weakness of the cited by indices is their lack of specificity).
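To make the idea a little more concrete, here is a toy sketch of what a mutating “reverse citation” call with an approval step might look like. Everything in it (the `Paper` class, `add_reverse_citation`, `approve`, `cited_by`, and the optional `location` field for pointing at a specific place in the paper) is hypothetical and invented for illustration; it is not an existing arXiv or journal API, and a real service would sit behind a network interface rather than a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Toy model (hypothetical) of a paper whose only mutable part is its reverse citations."""
    paper_id: str
    pending: list = field(default_factory=list)   # submitted links awaiting approval
    approved: list = field(default_factory=list)  # links accepted by the receiving author

    def add_reverse_citation(self, citing_id, location=None):
        """Anyone may call this; the link only lands in the pending queue.

        `location` optionally names a specific place in this paper (e.g. a
        section), so the reverse citation can be more precise than "cited by".
        """
        self.pending.append((citing_id, location))

    def approve(self, citing_id):
        """The receiving paper's author moves matching links to the approved list."""
        still_pending = []
        for link in self.pending:
            if link[0] == citing_id:
                self.approved.append(link)
            else:
                still_pending.append(link)
        self.pending = still_pending

    def cited_by(self, include_unapproved=False):
        """Option A: approved links only. Option B: approved plus unapproved links."""
        links = list(self.approved)
        if include_unapproved:
            links.extend(self.pending)
        return links
```

For example, after `add_reverse_citation("your-rebuttal", location="Section 3")`, `add_reverse_citation("crank-paper")` and `approve("your-rebuttal")`, the default view shows only the rebuttal, while `cited_by(include_unapproved=True)` also shows the crank's link. The approval gate is the whole point of the design: mutation is allowed, but only the receiving author decides what becomes visible by default.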

What else might a paper API expose? I’m not convinced this isn’t an interesting question to ponder. Thanks for reading another wacko mashup episode of the Quantum Pontiff!

Dave Bacon: QIP 2015 Talks Available

Talks from QIP 2015 are now available on this YouTube channel. Great to see! I’m still amazed by the wondrous technology that allows me to watch talks given on the other side of the world, at my own leisure, on such wonderful quantum esoterica.

March 26, 2015

John Baez: Stationary Stability in Finite Populations

guest post by Marc Harper

A while back, in the article Relative entropy minimization in evolutionary dynamics, we looked at extensions of the information geometry / evolutionary game theory story to more general time-scales, incentives, and geometries. Today we’ll see how to make this all work in finite populations!

Let’s recall the basic idea from last time, which John also described in his information geometry series. The main theorem is this: when there’s an evolutionarily stable state for a given fitness landscape, the relative entropy between the stable state and the population distribution decreases along the population trajectories as they converge to the stable state. In short, relative entropy is a Lyapunov function. This is a nice way to look at the action of a population under natural selection, and it has interesting analogies to Bayesian inference.

The replicator equation is a nice model from an intuitive viewpoint, and it’s mathematically elegant. But it has some drawbacks when it comes to modeling real populations. One major issue is that the replicator equation implicitly assumes that the population proportions of each type are differentiable functions of time, obeying a differential equation. This only makes sense in the limit of large populations. Other closely related models, such as the Lotka-Volterra model, focus on the number of individuals of each type (e.g. predators and prey) instead of the proportion. But they often assume that the number of individuals is a differentiable function of time, and a population of 3.5 isn’t very realistic either.

Real populations of replicating entities are not infinitely large; in fact they are often relatively small and of course have whole numbers of each type, at least for large biological replicators (like animals). They take up space and only so many can interact meaningfully. There are quite a few models of evolution that handle finite populations and some predate the replicator equation. Models with more realistic assumptions typically have to leave the realm of derivatives and differential equations behind, which means that the analysis of such models is more difficult, but the behaviors of the models are often much more interesting. Hopefully by the end of this post, you’ll see how all of these diagrams fit together:

One of the best-known finite population models is the Moran process, which is a Markov chain on a finite population. This is the quintessential birth-death process. For the moment, consider a population of just two types A and B. The state of the population is given by a pair of nonnegative integers (a,b) with a+b=N, the total number of replicators in the population, and a and b the number of individuals of type A and B respectively. Though it may seem artificial to fix the population size N, this often turns out not to be that big of a deal, and you can assume the population is at its carrying capacity to make the assumption realistic. (Lots of people study populations that can change size and that have replicators spatially distributed, say on a graph, but for now we’ll assume they can all interact with each other whenever they want.)

A Markov model works by transitioning from state to state in each round of the process, so we need to define the transition probabilities to complete the model. Let’s put a fitness landscape on the population, given by two functions f_A and f_B of the population state (a,b). Now we choose an individual to reproduce proportionally to fitness, e.g. we choose an A individual to reproduce with probability

\displaystyle{ \frac{a f_A}{a f_A + b f_B} }

since there are a individuals of type A and they each have fitness f_A. This is analogous to the ratio of fitness to mean fitness from the discrete replicator equation, since

\displaystyle{ \frac{a f_A}{a f_A + b f_B} =  \frac{\frac{a}{N} f_A}{\frac{a}{N} f_A + \frac{b}{N} f_B} \to \frac{x_i f_i(x)}{\overline{f(x)}} }

and the discrete replicator equation is typically similar to the continuous replicator equation (this can be made precise), so the Moran process captures the idea of natural selection in a similar way. Actually there is a way to recover the replicator equation from the Moran process in large populations—details at the end!

We’ll assume that the fitnesses are nonnegative and that the total fitness (the denominator) is never zero; if that seems artificial, some people prefer to transform the fitness landscape by e^{\beta f(x)}, which gives a ratio reminiscent of the Boltzmann or Fermi distribution from statistical physics, with the parameter \beta playing the role of intensity of selection rather than inverse temperature. This is sometimes called Fermi selection.

That takes care of the birth part. The death part is easier: we just choose an individual at random (uniformly) to be replaced. Now we can form the transition probabilities of moving between population states. For instance the probability of moving from state (a,b) to (a+1, b-1) is given by the product of the birth and death probabilities, since they are independent:

\displaystyle{ T_a^{a+1} = \frac{a f_A}{a f_A + b f_B} \frac{b}{N} }

since we have to choose a replicator of type A to reproduce and one of type B to be replaced. Similarly for (a,b) to (a-1, b+1) (switch all the a’s and b’s), and we can write the probability of staying in the state (a, N-a) as

T_a^{a} = 1 - T_{a}^{a+1} - T_{a}^{a-1}

Since we only replace one individual at a time, this covers all the possible transitions, and keeps the population constant.
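A minimal Python sketch of the two-type transition probabilities above, assuming the fitness functions are passed in as Python callables of the state (a, b) and that the total fitness a f_A + b f_B is nonzero. The function name `moran_transitions` is ours, not the authors'.

```python
def moran_transitions(a, N, f_A, f_B):
    """Transition probabilities out of state (a, N - a) in the two-type Moran process.

    Returns (T_up, T_down, T_stay) with T_up = T_a^{a+1} and T_down = T_a^{a-1}.
    """
    b = N - a
    fa, fb = f_A(a, b), f_B(a, b)
    total = a * fa + b * fb  # assumed nonzero
    # Fitness-proportionate birth of an A times uniform death of a B.
    T_up = (a * fa / total) * (b / N)
    # Fitness-proportionate birth of a B times uniform death of an A.
    T_down = (b * fb / total) * (a / N)
    # Everything else leaves the state unchanged.
    return T_up, T_down, 1.0 - T_up - T_down
```

For constant fitnesses f_A = 1 and f_B = 2 with N = 10 and a = 5, this gives T_a^{a+1} = 1/6, T_a^{a-1} = 1/3 and T_a^a = 1/2. At the boundary states a = 0 and a = N both moves have probability zero, which is why those states are absorbing.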

We’d like to analyze this model and many people have come up with clever ways to do so, computing quantities like fixation probabilities (also known as absorption probabilities), indicating the chance that the population will end up with one type completely dominating, i.e. in state (0, N) or (N,0). If we assume that the fitness of type A is constant and simply equal to 1, and the fitness of type B is r \neq 1, we can calculate the probability that a single mutant of type B will take over a population of type A using standard Markov chain methods:

\displaystyle{\rho = \frac{1 - r^{-1}}{1 - r^{-N}} }

For neutral relative fitness (r=1), \rho = 1/N, which is the probability a neutral mutant invades by drift alone since selection is neutral. Since the two boundary states (0, N) or (N,0) are absorbing (no transitions out), in the long run every population ends up in one of these two states, i.e. the population is homogeneous. (This is the formulation referred to by Matteo Smerlak in The mathematical origins of irreversibility.)
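The fixation formula can be checked numerically. For a birth-death chain on {0, ..., N} with absorbing ends, the probability of being absorbed at N when starting from state 1 is 1 / (1 + \sum_{k=1}^{N-1} \prod_{j=1}^{k} T_j^-/T_j^+); we use this standard Markov chain result here as one way to get the formula. The sketch tracks j, the number of B individuals, with A at fitness 1 and B at fitness r, and compares against the closed form.

```python
def fixation_probability(N, r):
    """Chance that one B mutant (fitness r) takes over N - 1 residents of type A (fitness 1)."""
    total, prod = 1.0, 1.0
    for j in range(1, N):  # j = current number of B individuals
        denom = j * r + (N - j)                  # a f_A + b f_B with a = N - j, b = j
        t_plus = (j * r / denom) * (N - j) / N   # a B is born and an A dies
        t_minus = ((N - j) / denom) * j / N      # an A is born and a B dies
        prod *= t_minus / t_plus
        total += prod
    return 1.0 / total


def closed_form(N, r):
    """rho = (1 - r^{-1}) / (1 - r^{-N}), valid for r != 1."""
    return (1 - 1 / r) / (1 - r ** (-N))
```

In this model every ratio T_j^-/T_j^+ equals 1/r, so the sum is a geometric series, which is where the closed form comes from; with r = 1 the sum is simply N, giving \rho = 1/N.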

That’s a bit different flavor of result than what we discussed previously, since we had stable states where both types were present, and now that’s impossible, and a bit disappointing. We need to make the population model a bit more complex to have more interesting behaviors, and we can do this in a very nice way by adding the effects of mutation. At the time of reproduction, we’ll allow either type to mutate into the other with probability \mu. This changes the transition probabilities to something like

\displaystyle{ T_a^{a+1} = \frac{a (1-\mu) f_A + b \mu f_B}{a f_A + b f_B} \frac{b}{N} }

Now the process never stops wiggling around, but it does have something known as a stationary distribution, which gives the probability that the population is in any given state in the long run.

For populations with more than two types the basic ideas are the same, but there are more neighboring states that the population could move to, and many more states in the Markov process. One can also use more complicated mutation matrices, but this setup is good enough to typically guarantee that no one species completely takes over. For interesting behaviors, typically \mu = 1/N is a good choice (there’s some biological evidence that mutation rates are typically inversely proportional to genome size).

Without mutation, once the population reached (0,N) or (N,0), it stayed there. Now the population bounces between states, either because of drift, selection, or mutation. Based on our stability theorems for evolutionarily stable states, it’s reasonable to hope that for small mutation rates and larger populations (less drift), the population should spend most of its time near the evolutionarily stable state. This can be measured by the stationary distribution which gives the long run probabilities of a process being in a given state.

Previous work by Claussen and Traulsen:

• Jens Christian Claussen and Arne Traulsen, Non-Gaussian fluctuations arising from finite populations: exact results for the evolutionary Moran process, Physical Review E 71 (2005), 025101.

suggested that the stationary distribution is at least sometimes maximal around evolutionarily stable states. Specifically, they showed that for a very similar model with fitness landscape given by

\left(\begin{array}{c} f_A \\ f_B \end{array}\right)  = \left(\begin{array}{cc} 1 & 2\\ 2&1 \end{array}\right)  \left(\begin{array}{c} a\\ b \end{array}\right)

the stationary state is essentially a binomial distribution centered at (N/2, N/2).

Unfortunately, the stationary distribution can be very difficult to compute for an arbitrary Markov chain. While it can be computed for the Markov process described above without mutation, and in the case studied by Claussen and Traulsen, there’s no general analytic formula for the process with mutation, nor for more than two types, because the processes are not reversible. Since we can’t compute the stationary distribution analytically, we’ll have to find another way to show that the local maxima of the stationary distribution are “evolutionarily stable”. We can approximate the stationary distribution fairly easily with a computer, so it’s easy to plot the results for just about any landscape and reasonable population size (e.g. N \approx 100).
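As a sketch of the “approximate it with a computer” approach, the code below builds the two-type Moran chain with mutation (using the transition probability above and its mirror image for T_a^{a-1}) and finds the stationary distribution by power iteration, i.e. by repeatedly applying the transition probabilities to a starting distribution until it stops changing. The landscape f_A = a + 2b, f_B = 2a + b is the Claussen–Traulsen example above; the choices N = 20 and \mu = 1/N, the tolerance, and the iteration cap are our own illustrative assumptions.

```python
def transition_probs(a, N, mu, f_A, f_B):
    """T_a^{a+1} and T_a^{a-1} for the two-type Moran process with mutation."""
    b = N - a
    fa, fb = f_A(a, b), f_B(a, b)
    total = a * fa + b * fb  # assumed nonzero
    # A new A comes from an A birth without mutation or a B birth that mutates.
    birth_A = (a * (1 - mu) * fa + b * mu * fb) / total
    # Mirror image: swap the roles of A and B.
    birth_B = (b * (1 - mu) * fb + a * mu * fa) / total
    up = birth_A * b / N    # the new A replaces a uniformly chosen B
    down = birth_B * a / N  # the new B replaces a uniformly chosen A
    return up, down


def stationary_distribution(N, mu, f_A, f_B, tol=1e-13, max_iter=200_000):
    """Approximate the stationary distribution over a = 0..N by power iteration."""
    T = [transition_probs(a, N, mu, f_A, f_B) for a in range(N + 1)]
    s = [1.0 / (N + 1)] * (N + 1)  # start from the uniform distribution
    for _ in range(max_iter):
        new = [0.0] * (N + 1)
        for a, (up, down) in enumerate(T):
            new[a] += s[a] * (1.0 - up - down)  # stay in state a
            if a < N:
                new[a + 1] += s[a] * up
            if a > 0:
                new[a - 1] += s[a] * down
        if max(abs(u - v) for u, v in zip(new, s)) < tol:
            return new
        s = new
    return s  # may not have reached the tolerance


# Claussen-Traulsen landscape: (f_A, f_B) = [[1, 2], [2, 1]] (a, b)
def f_A(a, b):
    return a + 2 * b


def f_B(a, b):
    return 2 * a + b
```

With `stationary_distribution(20, 1 / 20, f_A, f_B)` the result is symmetric about a = N/2 and peaks there, consistent with the interior stable state of this landscape. Power iteration does not rely on any special structure of the chain, which is why the same idea carries over to more than two types, at the cost of many more states.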

It turns out that we can use a relative entropy minimization approach, just like for the continuous replicator equation! But how? We lack some essential ingredients such as deterministic and differentiable trajectories. Here’s what we do:

• We show that the local maxima and minima of the stationary distribution satisfy a complex balance criterion.

• We then show that these states minimize an expected relative entropy.

• This will mean that the current state and the expected next state are ‘close’.

• Lastly, we show that these states satisfy an analogous definition of evolutionary stability (now incorporating mutation).

The relative entropy allows us to measure how close the current state is to the expected next state, which captures the idea of stability in another way. This ports the relative entropy minimization Lyapunov result to some more realistic Markov chain models. The only downside is that we’ll assume the populations are “sufficiently large”, but in practice for populations of three types, N=20 is typically enough for common fitness landscapes (there are lots of examples here for N=80, which are prettier than the smaller populations). The reason for this is that the population state (a,b) needs enough “resolution” (a/N, b/N) to get sufficiently close to the stable state, which is not necessarily a ratio of integers. If you allow some wiggle room, smaller populations are still typically pretty close.

Evolutionarily stable states are closely related to Nash equilibria, which have a nice intuitive description in traditional game theory as “states that no player has an incentive to deviate from”. But in evolutionary game theory, we don’t use a game matrix to compute e.g. maximum payoff strategies; rather, the game matrix defines a fitness landscape which then determines how natural selection unfolds.

We’re going to see this idea again in a moment, and to help get there let’s introduce a function called an incentive that encodes how a fitness landscape is used for selection. One way is to simply replace the quantities a f_A(a,b) and b f_B(a,b) in the fitness-proportionate selection ratio above, which now becomes (for two population types):

\displaystyle{ \frac{\varphi_A(a,b)}{\varphi_A(a,b) + \varphi_B(a,b)} }

Here \varphi_A(a,b) and \varphi_B(a,b) are the incentive function components that determine how the fitness landscape is used for natural selection (if at all). We have seen two examples above:

\varphi_A(a,b) = a f_A(a, b)

for the Moran process and fitness-proportionate selection, and

\varphi_A(a,b) = a e^{\beta f_A(a, b)}

for an alternative that incorporates a strength of selection term \beta, preventing division by zero for fitness landscapes defined by zero-sum game matrices, such as a rock-paper-scissors game. Using an incentive function also simplifies the transition probabilities and results as we move to populations of more than two types. Introducing mutation, we can describe the ratio for incentive-proportionate selection with mutation for the ith population type when the population is in state x=(a,b,\ldots) / N as

\displaystyle{ p_i(x) = \frac{\sum_{k=1}^{n}{\varphi_k(x) M_{i k} }}{\sum_{k=1}^{n}{\varphi_k(x)}} }

for some matrix of mutation probabilities M. This is just the probability that we get a new individual of the ith type (by birth and/or mutation). A common choice for the mutation matrix is to use a single mutation probability \mu and spread it out over all the types, such as letting

M_{ij} = \mu / (n-1) for i \neq j, and

M_{ii} = 1 - \mu
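A sketch of incentive-proportionate selection with mutation for n types. Here the state is the normalized vector x and the fitness landscape is assumed to be a Python callable of x; because p_i(x) is a ratio, using proportions instead of counts inside \varphi does not change the result. The rock-paper-scissors matrix is a standard zero-sum example chosen by us for illustration, and all function names are ours.

```python
import math


def moran_incentive(x, fitness):
    """phi_i(x) = x_i f_i(x): fitness-proportionate selection."""
    return [xi * fi for xi, fi in zip(x, fitness(x))]


def fermi_incentive(x, fitness, beta=1.0):
    """phi_i(x) = x_i exp(beta f_i(x)); beta is the intensity of selection."""
    return [xi * math.exp(beta * fi) for xi, fi in zip(x, fitness(x))]


def uniform_mutation_matrix(n, mu):
    """M_ii = 1 - mu and M_ij = mu / (n - 1) for i != j."""
    return [[1 - mu if i == j else mu / (n - 1) for j in range(n)] for i in range(n)]


def birth_probabilities(x, incentive, M):
    """p_i(x) = sum_k phi_k(x) M_ik / sum_k phi_k(x)."""
    phi = incentive(x)
    total = sum(phi)  # must be nonzero, otherwise ZeroDivisionError
    n = len(x)
    return [sum(phi[k] * M[i][k] for k in range(n)) / total for i in range(n)]


# Example landscape (illustrative): rock-paper-scissors, fitness f(x) = A x.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def rps_fitness(x):
    return [sum(row[k] * x[k] for k in range(len(x))) for row in RPS]
```

At the center of the simplex every rock-paper-scissors fitness is zero, so the Moran incentive sums to zero and the ratio is undefined (Python raises `ZeroDivisionError`), while the Fermi incentive gives p(x) = (1/3, 1/3, 1/3) for any \beta and \mu. Since the columns of M sum to one, p(x) is always a probability vector.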

Now we are ready to define the expected next state for the population and see how it captures a notion of stability. For a given population state x in a multitype population, using x to indicate the normalized population state (a,b,\ldots) / N, consider all the neighboring states y that the population could move to in one step of the process (one birth-death cycle). These neighboring states are the result of increasing a population type by one (birth) and decreasing another by one (death, possibly the same type), of course excluding cases on the boundary where the number of individuals of any type drops below zero or rises above N. Now we can define the expected next state as the sum of neighboring states weighted by the transition probabilities

E(x) = \sum_{y}{y T_x^{y}}

with transition probabilities given by

T_{x}^{y} = p_{i}(x) x_{j}

for states y that differ in 1/N at the ith coordinate and -1/N at jth coordinate from x. Here x_j is just the probability of the random death of an individual of the jth type, so the transition probabilities are still just birth (with mutation) and death as for the Moran process we started with.

Skipping some straightforward algebraic manipulations, we can show that

\displaystyle{ E(x) = \sum_{y}{y T_x^{y}} = \frac{N-1}{N}x + \frac{1}{N}p(x)}

Then it’s easy to see that E(x) = x if and only if x = p(x), and that x = p(x) if and only if x_i = \varphi_i(x). So we have a nice description of ‘stability’ in terms of fixed points of the expected next state function and the incentive function

x = E(x) = p(x) = \varphi(x),

and we’ve gotten back to “no one has an incentive to deviate”. More precisely, for the Moran process

\varphi_i(x) = x_i f_i(x)

and we get back f_i(x) = f_j(x) for every type. So we take x = \varphi(x) as our analogous condition to an evolutionarily stable state, though it’s just the ‘no motion’ part and not also the ‘stable’ part. That’s what we need the stationary distribution for!
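The “straightforward algebraic manipulations” behind the formula for E(x) can be checked by brute force: sum y T_x^y over every birth type i and death type j, including i = j (no change), and compare with (N-1)x/N + p(x)/N. In this sketch p is treated as any given probability vector p(x), for example the output of an incentive with mutation; the function name is ours.

```python
def expected_next_state(x, p, N):
    """E(x) = sum_y y T_x^y with T_x^y = p_i(x) x_j, by explicit enumeration."""
    n = len(x)
    E = [0.0] * n
    for i in range(n):          # type that is born (probability p_i(x))
        for j in range(n):      # type that dies (probability x_j)
            T = p[i] * x[j]
            if T == 0.0:
                continue        # impossible move, e.g. no individual of type j to remove
            for k in range(n):
                # y = x + (e_i - e_j) / N; when i == j this is just x
                y_k = x[k] + (int(k == i) - int(k == j)) / N
                E[k] += y_k * T
    return E
```

Because \sum_{i,j} p_i(x) x_j = 1 and each move shifts x by (e_i - e_j)/N, the average shift is (p(x) - x)/N, which is exactly the closed form; in particular E(x) = x precisely when p(x) = x.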

To turn this into a useful number that measures stability, we use the relative entropy of the expected next state and the current state, in analogy with the Lyapunov theorem for the replicator equation. The relative entropy

\displaystyle{ D(x, y) = \sum_i x_i \ln(x_i) - x_i \ln(y_i) }

has the really nice property that D(x,y) = 0 if and only if x = y, so we can use the relative entropy D(E(x), x) as a measure of how close to stable any particular state is! Here the expected next state takes the place of the ‘evolutionarily stable state’ in the result described last time for the replicator equation.
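A sketch of the stability measure D(E(x), x), using the convention 0 ln 0 = 0 and the closed form for E(x) derived above. It assumes x has no zero entry where E(x) is positive, i.e. an interior state: on the boundary, a type with x_i = 0 can have E_i(x) = p_i(x)/N > 0 once mutation is present, the divergence is then infinite, and this code raises `ZeroDivisionError`. Function names are ours.

```python
import math


def relative_entropy(x, y):
    """D(x, y) = sum_i x_i ln(x_i / y_i), with the convention 0 ln 0 = 0.

    Assumes y_i > 0 wherever x_i > 0.
    """
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y) if xi > 0)


def expected_next_state(x, p, N):
    # Closed form: E(x) = (N - 1)/N x + p(x)/N.
    return [(N - 1) / N * xi + pi / N for xi, pi in zip(x, p)]


def stability_score(x, p, N):
    """D(E(x), x): zero exactly when the expected next state equals x."""
    return relative_entropy(expected_next_state(x, p, N), x)
```

For instance, with x = (0.5, 0.5), N = 10 and p(x) = (0.7, 0.3), the expected next state is (0.52, 0.48) and the score is small but strictly positive, whereas p(x) = x gives a score of zero.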

Finally, we need to show that the maxima (and minima) of the stationary distribution are these fixed points by showing that these states minimize the expected relative entropy.

Seeing that local maxima and minima of the stationary distribution minimize the expected relative entropy is more involved, so let’s just sketch the details. In general, these Markov processes are not reversible, so they don’t satisfy the detailed-balance condition, but the stationary probabilities do satisfy something called the global balance condition, which says that for the stationary distribution s we have that

s_x \sum_{y}{T_x^{y}} = \sum_{y}{s_y T_y^{x}}

When the stationary distribution is at a local maximum (or minimum), we can show essentially that this implies (up to an \epsilon, for a large enough population) that

\displaystyle{\sum_{y}{T_x^{y}} = \sum_{y}{T_y^{x}} }

a sort of probability inflow-outflow equation, which is very similar to the condition of complex balanced equilibrium described by Manoj Gopalkrishnan in this Azimuth post. With some algebraic manipulation, we can show that these states have E(x)=x.

Now let’s look again at the figures from the start. The first shows the vector field of the replicator equation:

You can see rest points at the center, on the center of each boundary edge, and on the corner points. The center point is evolutionarily stable, the center points of the boundary are semi-stable (but stable when the population is restricted to a boundary simplex), and the corner points are unstable.

This one shows the stationary distribution for a finite population model with a Fermi incentive on the same landscape, for a population of size 80:

A fixed population size gives a partitioning of the simplex, and each triangle of the partition is colored by the value of the stationary distribution. So you can see that there are local maxima in the center and on the centers of the triangle boundary edges. In this case, the size of the mutation probability determines how much of the stationary distribution is concentrated on the center of the simplex.

This shows one-half of the Euclidean distance squared between the current state and the expected next state:

And finally, this shows the same thing but with the relative entropy as the ‘distance function’:

As you can see, the Euclidean distance is locally minimal at each of the local maxima and minima of the stationary distribution (including the corners); the relative entropy is only guaranteed to be so on the interior states (because the relative entropy doesn’t play nicely with the boundary, and unlike the replicator equation, the Markov process can jump on and off the boundary). It turns out that the relative Rényi entropies for q between 0 and 1 also work just fine, but for the large population limit (the replicator dynamic), the relative entropy is somehow the right choice for the replicator equation (its derivative easily gives Lyapunov stability), which is due to the connections between relative entropy and Fisher information in the information geometry of the simplex. The Euclidean distance is the q=0 case and the ordinary relative entropy is q=1.

As it turns out, something very similar holds for another popular finite population model, the Wright–Fisher process! This model is more complicated, so if you are interested in the details, check out our paper, which has many nice examples and figures. We also define a process that bridges the gap between the atomic nature of the Moran process and the generational nature of the Wright–Fisher process, and prove the general result for that model.

Finally, let’s see how the Moran process relates back to the replicator equation (see also the appendix in this paper), and how we recover the stability theory of the replicator equation. We can use the transition probabilities of the Moran process to define a stochastic differential equation (called a Langevin equation) with drift and diffusion terms that are essentially (for populations with two types):

\mathrm{Drift}(x) = T^{+}(x) - T^{-}(x)

\displaystyle{ \mathrm{Diffusion}(x) = \sqrt{\frac{T^{+}(x) + T^{-}(x)}{N}} }

As the population size gets larger, the diffusion term drops out, and the stochastic differential equation becomes essentially the replicator equation. For the stationary distribution, the variance (e.g. for the binomial example above) also has an inverse dependence on N, so the distribution limits to a delta-function that is zero except for at the evolutionarily stable state!
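A rough Euler–Maruyama sketch of the two-type Langevin equation, with x the proportion of type A and T^{\pm}(x) the Moran transition probabilities written in terms of x. We assume time is rescaled so that one unit corresponds to N birth-death events, which is what makes the per-unit drift T^+ - T^- and the diffusion \sqrt{(T^+ + T^-)/N}; the step size, the clipping to [0, 1], the random seed, and the landscape (the Claussen–Traulsen fitnesses written in proportions) are our own illustrative choices.

```python
import math
import random


def moran_rates(x, f_A, f_B):
    """T^+(x) and T^-(x) for a two-type Moran process, x = proportion of A."""
    fa, fb = f_A(x), f_B(x)
    mean = x * fa + (1 - x) * fb  # mean fitness, assumed nonzero
    t_plus = x * fa / mean * (1 - x)   # A born, B dies
    t_minus = (1 - x) * fb / mean * x  # B born, A dies
    return t_plus, t_minus


def simulate_langevin(x0, N, f_A, f_B, dt=0.01, steps=2000, seed=0):
    """Euler-Maruyama integration of dx = Drift dt + Diffusion dW."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        tp, tm = moran_rates(x, f_A, f_B)
        drift = tp - tm  # equals x(1 - x)(f_A - f_B) / mean fitness
        diffusion = math.sqrt((tp + tm) / N)
        x += drift * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = min(1.0, max(0.0, x))  # keep the proportion in [0, 1]
    return x


# Claussen-Traulsen fitnesses in terms of proportions (stable state at x = 1/2).
def f_A(x):
    return x + 2 * (1 - x)


def f_B(x):
    return 2 * x + (1 - x)
```

For a very large population such as N = 10^8, starting from x = 0.1 the trajectory ends up essentially deterministically at x = 1/2, the stable state of this landscape, because the diffusion term is tiny; with much smaller N the noise term is no longer negligible. The drift comment makes the link explicit: T^+ - T^- is the replicator-style term x(1-x)(f_A - f_B) divided by the mean fitness.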

What about the relative entropy? Loosely speaking, as the population size gets larger, the iteration of the expected next state also becomes deterministic. Then the evolutionarily stable state is a fixed point of the expected next state function, and the expected relative entropy is essentially the same as the ordinary relative entropy, at least in a neighborhood of the evolutionarily stable state. This is good enough to establish local stability.

Earlier I said both the local maxima and minima minimize the expected relative entropy. Dash and I haven’t proven that the local maxima always correspond to evolutionarily stable states (and the minima to unstable states). That’s because the generalization of evolutionarily stable state we use is really just a ‘no motion’ condition, and isn’t strong enough to imply stability in a neighborhood for the deterministic replicator equation. So for now we are calling the local maxima stationary stable states.

We’ve also tried a similar approach to populations evolving on networks, which is a popular topic in evolutionary graph theory, and the results are encouraging! But there are many more ‘states’ in such a process, since the configuration of the network has to be taken into account, and whether the population is clustered together or not. See the end of our paper for an interesting example of a population on a cycle.

Clifford Johnson: Framed Graphite

It took a while, but I got this task done. Things take a lot longer these days, because...newborn. You'll recall that I did a little drawing of the youngster very soon after his arrival in December. Well, it was decided a while back that it should be on display on a wall in the house rather than hide in my notebooks like my other sketches tend to do. This was a great honour, but presented me with difficulty. I have a rule to not take any pages out of my notebooks. You'll think it is nuts, but you'll find that this madness is shared by many people who keep notebooks/sketchbooks. Somehow the whole thing is a Thing, if you know what I mean. To tear a page out would be a distortion of the record.... it would spoil the archival aspect of the book. (Who am I kidding? I don't think it likely that future historians will be poring over my notebooks... but I know that future Clifford will be, and it will be annoying to find a gap.) (It is sort of like deleting comments from a discussion on a blog post. I try not to do that without good reason, and I leave a trail to show that it was done if I must.) Anyway, where was I? Ah. Pages. Well, I had to find a way of making a framed version of the drawing that kept the spirit and feel of the drawing intact while [...]

March 25, 2015

John Preskill: Putting back the pieces of a broken hologram

It is Monday afternoon and the day seems to be a productive one, if not yet quite memorable. As I revise some notes on my desk, Beni Yoshida walks into my office to remind me that the high-energy physics seminar is about to start. I hesitate, somewhat apprehensive of the near-certain frustration of being lost during the first few minutes of a talk in an unfamiliar field. I normally avoid such a situation, but in my email I find John’s forecast for an accessible talk by Daniel Harlow and a title with three words I can cling onto. “Quantum error correction” has driven my curiosity for the last seven years. The remaining acronyms in the title will become much more familiar in the four months to come.

Most of you are probably familiar with holograms, these shiny flat films representing a 3D object from essentially any desired angle. I find it quite remarkable how all the information of a 3D object can be printed on an essentially 2D film. True, the colors are not represented as faithfully as in a traditional photograph, but it looks as though we have taken a photograph from every possible angle! The speaker’s main message that day seemed even more provocative than the idea of holography itself. Even if the hologram is broken into pieces, and some of these are lost, we may still use the remaining pieces to recover parts of the 3D image or even the full thing given a sufficiently large portion of the hologram. The 3D object is not only recorded in 2D, it is recorded redundantly!

Left to right: Beni Yoshida, Aleksander Kubica, Aidan Chatwin-Davies and Fernando Pastawski discussing holographic codes.

Halfway through Daniel’s exposition, Beni and I exchange a knowing glance. We recognize a familiar pattern from our latest project, one that has gained the moniker of “cleaning lemma” within the quantum information community and can be thought of as a quantitative analog of reconstructing the 3D image from pieces of the hologram. Daniel makes connections using a language that we are familiar with. Beni and I discuss what we have understood and how to make it more concrete as we stride back through campus. We scribble diagrams on the whiteboard and string words such as tensor, encoder, MERA and negative curvature into our discussion. An image from the web gives us some intuition on the latter. We are onto something. We have a model. It is simple. It is new. It is exciting.

Poincaré projection of a regular pentagon tiling of negatively curved space.

Food has not come our way, so we head to my apartment as we enthusiastically continue our discussion. I can only provide two avocados and some leftover pasta, but that is not important; we are sharing the joy of insight. We arrange a meeting with Daniel to present our progress. By Wednesday, Beni and I introduce the holographic pentagon code at the group meeting. A core for a new project is already there, but we need some help to navigate the high-energy waters. Who better to guide us in such an endeavor than our mentor, John Preskill, who recognized the importance of quantum information in holography as early as 1999 and has repeatedly proven himself a master of both trades.

“I feel that the idea of holography has a strong whiff of entanglement—for we have seen that in a profoundly entangled state the amount of information stored locally in the microscopic degrees of freedom can be far less than we would naively expect. For example, in the case of the quantum error-correcting codes, the encoded information may occupy a small ‘global’ subspace of a much larger Hilbert space. Similarly, the distinct topological phases of a fractional quantum Hall system look alike locally in the bulk, but have distinguishable edge states at the boundary.”
-J. Preskill, 1999

As Beni puts it, the time for using modern quantum information tools in high-energy physics has come. By this he means quantum error correction and maybe tensor networks. First privately, then more openly, we continue to sharpen and shape our project. Through conferences, Skype calls and emails, we further our discussion and progressively refine our ideas. Many speculations mature into conjectures and fall victim to counterexamples. Some stand the test of simulations or are even promoted to theorems by virtue of mathematical proofs.

Beni Yoshida presenting our work at a quantum entanglement conference in Puerto Rico.

I publicly present the project for the first time at a select quantum information conference in Australia. Two months later, after a particularly intense writing, revising and editing process, the article is almost complete. As we finalize the text and relabel the figures, Daniel and Beni unveil our work to quantum entanglement experts in Puerto Rico. The talks are a hit and it is time to let all our peers read about it.

You are invited to do so and Beni will even be serving a reader’s guide in an upcoming post.

Chad Orzel“Talking Dogs and Galileian Blogs” at Vanderbilt, Thursday 3/26/15

I mentioned last week that I’m giving a talk at Vanderbilt tomorrow, but as they went to the trouble of writing a press release, the least I can do is share it:

It’s clear that this year’s Forman lecturer at Vanderbilt University, Chad Orzel, will talk about physics to almost anyone.

After all, two of his popular science books are How to Teach Physics to Your Dog and How to Teach Relativity to Your Dog. Orzel, an associate professor of physics at Union College in New York and author of the ScienceBlog “Uncertain Principles,” is scheduled to speak on campus at 3 p.m. Thursday, March 26.

As is the custom among my people, I sent them a title and abstract:

Title: Talking Dogs and Galileian Blogs: Social Media for Communicating Science

Abstract: Modern social media technologies provide an unprecedented opportunity to engage and inform a broad audience about the practice and products of science. Such outreach efforts are critically important in an era of funding cuts and global crises that demand scientific solutions. In this talk I’ll offer examples and advice on the use of social media for science communication, drawn from more than a dozen years of communicating science online.

This shares some DNA with the evangelical blogging-as-outreach talk I’ve been giving off and on for several years, but that was getting a little outdated. So I decided to blow it up and make a new version, which I nearly have finished… with less than 24 hours before my flight to Tennessee. Whee!

Anyway, if you’re in the Nashville area or could be on really short notice, stop by. Otherwise, stay tuned for Exciting! Blogging! News! early next week (give or take).

BackreactionNo, the LHC will not make contact with parallel universes

Evidence for rainbow gravity by butterfly production at the LHC.

The most recent news about quantum gravity phenomenology going through the press is that the LHC upon restart at higher energies will make contact with parallel universes, excuse me, with PARALLEL UNIVERSES. The Telegraph even wants you to believe that this would disprove the Big Bang, and tomorrow maybe it will cause global warming, cure Alzheimer's and lead to the production of butterflies at the LHC, who knows. This story is so obviously nonsense that I thought it would be unnecessary to comment on this, but I have underestimated the willingness of news outlets to promote shallow science, and also the willingness of authors to feed that fire.

This story is based on the paper:
    Absence of Black Holes at LHC due to Gravity's Rainbow
    Ahmed Farag Ali, Mir Faizal, Mohammed M. Khalil
    arXiv:1410.4765 [hep-th]
    Phys.Lett. B743 (2015) 295
which just got published in PLB. Let me tell you right away that this paper would not have passed my desk. I'd have returned it as major revisions necessary.

Here is a summary of what they have done. In models with large additional dimensions, the Planck scale, where effects of quantum gravity become important, can be lowered to energies accessible at colliders. This is an old story that was big 15 years ago or so, and I wrote my PhD thesis on this. In the new paper they use a modification of general relativity that is called "rainbow gravity" and revisit the story in this framework.

In rainbow gravity the metric is energy-dependent, which it normally is not. This energy-dependence is a non-standard modification that is not confirmed by any evidence. It is neither a theory nor a model; it is just an idea that, despite more than a decade of work, never developed into a proper model. Rainbow gravity has not been shown to be compatible with the standard model. There is no known quantization of this approach and one cannot describe interactions in this framework at all. Moreover, it is known to lead to non-localities which are ruled out already. As far as I am concerned, no papers should get published on the topic until these issues have been resolved.

Rainbow gravity enjoys some popularity because it leads to Planck scale effects that can affect the propagation of particles, which could potentially be observable. Alas, no such effects have been found. No such effects have been found if the Planck scale is the normal one! The absolutely last thing you want to do at this point is argue that rainbow gravity should be combined with large extra dimensions, because then its effects would get stronger and probably be ruled out already. At the very least you would have to revisit all existing constraints on modified dispersion relations and reaction thresholds and so on. This isn't even mentioned in the paper.

That isn't all there is to say though. In their paper, the authors also unashamedly claim that such a modification has been predicted by Loop Quantum Gravity, and that it is a natural incorporation of effects found in string theory. Both of these statements are manifestly wrong. Modifications like this have been motivated by, but never been derived from, Loop Quantum Gravity. And String Theory gives rise to some kind of minimal length, yes, but certainly not to rainbow gravity; in fact, the expression of the minimal length relation in string theory is known to be incompatible with the one the authors use. The claims that the model they use has some kind of derivation or even a semi-plausible motivation from other theories are just marketing. If I had been a referee of this paper, I would have requested that all these wrong claims be scrapped.

In the rest of the paper, the authors then reconsider the emission rate of black holes in extra dimensions with the energy-dependent metric.

They erroneously state that the temperature diverges when the mass goes to zero and that this leads to a "catastrophic evaporation". This has been known to be wrong for 20 years. This supposed catastrophic evaporation is due to an incorrect thermodynamical treatment, see for example section 3.1 of this paper. You do not need quantum gravitational effects to avoid this, you just have to get thermodynamics right. Another reason to not publish the paper. To be fair though, this point is pretty irrelevant for the rest of the authors' calculation.

They then argue that rainbow gravity leads to black hole remnants because the temperature of the black hole decreases towards the Planck scale. This isn't so surprising and is something that happens generically in models with modifications at the Planck scale, because they can bring down the final emission rate so that it converges and eventually stops.

The authors then further claim that the modification from rainbow gravity affects the cross-section for black hole production, which is probably correct, or at least not wrong. They then take constraints on the lowered Planck scale from existing searches for gravitons (i.e., missing energy) that should also be produced in this case. They use the constraints obtained from the graviton limits to say that with these limits, black hole production should not yet have been seen, but might appear in the upcoming LHC runs. Of course, they should not have used constraints from a paper that were obtained in a scenario without the rainbow gravity modification, because the production of gravitons would likewise be modified.

Having said all that, the conclusion that they come to that rainbow gravity may lead to black hole remnants and make it more difficult to produce black holes is probably right, but it is nothing new. The reason is that these types of models lead to a generalized uncertainty principle, and all these calculations have been done before in this context. As the authors nicely point out, I wrote a paper already in 2004 saying that black hole production at the LHC should be suppressed if one takes into account that the Planck length acts as a minimal length.
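Minimal-length arguments of this kind are often phrased through a generalized uncertainty principle. As an illustration only, here is one commonly used parametrization, with β a positive, model-dependent constant; it is not necessarily the form used in either the 2004 paper or the rainbow-gravity paper. Minimizing the bound on Δx over Δp shows where a minimal length comes from:

```latex
\begin{aligned}
\Delta x\,\Delta p \ge \frac{\hbar}{2}\left(1 + \beta\,(\Delta p)^2\right)
&\;\;\Longrightarrow\;\;
\Delta x \ge \frac{\hbar}{2\,\Delta p} + \frac{\hbar\beta}{2}\,\Delta p, \\
\frac{d}{d(\Delta p)}\left[\frac{\hbar}{2\,\Delta p} + \frac{\hbar\beta}{2}\,\Delta p\right] = 0
&\;\;\Longrightarrow\;\;
\Delta p = \beta^{-1/2},
\qquad
\Delta x_{\min} = \hbar\sqrt{\beta}.
\end{aligned}
```

If β is of order (ℓ_P/ħ)², with ℓ_P the Planck length, then Δx_min is of order ℓ_P: past that point, pumping in more momentum no longer localizes particles more tightly, which is the intuition behind the harder-to-form black holes discussed here.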

Yes, in my youth I worked on black hole production at the LHC. I gracefully got out of this when it became obvious there wouldn't be black holes at the LHC, some time in 2005. And my paper, I should add, doesn't work with rainbow gravity but with a Lorentz-invariant high-energy deformation that only becomes relevant in the collision region and thus does not affect the propagation of free particles. In other words, in contrast to the model that the authors use, my model is not already ruled out by astrophysical constraints. The relevant aspects of the argument however are quite similar, thus the similar conclusions: If you take into account Planck length effects, it becomes more difficult to squeeze matter together to form a black hole because the additional space-time distortion acts against your efforts. This means you need to invest more energy than you thought to get particles close enough to collapse and form a horizon.

What does any of this have to do with parallel universes? Nothing, really, except that one of the authors, Mir Faizal, told some journalist there is a connection. In the piece one can read:
""Normally, when people think of the multiverse, they think of the many-worlds interpretation of quantum mechanics, where every possibility is actualized," Faizal told "This cannot be tested and so it is philosophy and not science. This is not what we mean by parallel universes. What we mean is real universes in extra dimensions. As gravity can flow out of our universe into the extra dimensions, such a model can be tested by the detection of mini black holes at the LHC. We have calculated the energy at which we expect to detect these mini black holes in gravity's rainbow [a new theory]. If we do detect mini black holes at this energy, then we will know that both gravity's rainbow and extra dimensions are correct."
To begin with, rainbow gravity is neither new nor a theory, but that addition seems to be the journalist's fault. As far as the parallel universes are concerned, to get these in extra dimensions you would need additional branes next to our own one, and there is nothing like this in the paper. What this has to do with the multiverse I don't know; that's an entirely different story. Maybe this quote was taken out of context.

Why does the media hype this nonsense? Three reasons I can think of. First, the next LHC startup is near and they're looking for a hook to get the story across. Black holes and parallel universes sound good, regardless of whether this has anything to do with reality. Second, the paper shamelessly overstates the relevance of the investigation, makes claims that are manifestly wrong, and fails to point out the miserable state that the framework they use is in. Third, the authors willingly feed the hype in the press.

Did the topic of rainbow gravity and the author's name, Mir Faizal, sound familiar? That's because I wrote about both only a month ago, when the press was hyping another nonsense story about black holes in rainbow gravity with the same author. In that previous paper they claimed that black holes in rainbow gravity don't have a horizon, and nothing was mentioned about them forming remnants. I don't see how these two supposed consequences of rainbow gravity are even compatible with each other. If anything, this just reinforces my impression that this isn't physics; it's just fanciful interpretation of algebraic manipulations that have no relation to reality whatsoever.

In summary: The authors work in a framework that combines rainbow gravity with a lowered Planck scale, which is already ruled out. They derive bounds on black hole production using existing data analysis that does not apply in the framework they use. The main conclusion that Planck length effects should suppress black hole production at the LHC is correct, but this has been known for at least 10 years. None of this has anything to do with parallel universes.

March 24, 2015

Doug NatelsonBrief items, public science outreach edition

Here are a couple of interesting things I've come across in terms of public science outreach lately:

  • I generally f-ing love "I f-ing love science" - they reach a truly impressive number of people, and they usually do a good job of conveying why science itself (beyond just particular results) is fun.  That being said, I've started to notice lately that in the physics and astro stories they run they sometimes either use inaccurate/hype-y headlines or report what is basically a press release completely uncritically.  For instance, while it fires the mind of science fiction fans everywhere, I don't think it's actually good that IFLS decided to highlight a paper from the relatively obscure journal Phys. Lett. B and claim in a headline that the LHC could detect extra spatial dimensions by making mini black holes.  Sure.  And SETI might detect a signal next week.  What are the odds that this will actually take place?  Similarly, the headline "Spacetime foam discovery proves Einstein right" implies that someone has actually observed signatures of spacetime foam.  In fact, the story is the exact opposite:  Observations of photons from gamma ray bursts have shown no evidence of "foaminess" of spacetime, meaning that general relativity (without any exotic quantumness) can explain the results.   A little improved quality control on the selection and headlines particularly on the high energy/astro stories would be great, thanks.
  • There was an article in the most recent APS News that got me interested in Alan Alda's efforts at Stony Brook on communicating science to the public.  Alda, who hosted Scientific American Frontiers and played Feynman on Broadway, has dedicated a large part of his time in recent years to the cause of trying to spread the word to the general public about what science is, how it works, how it often involves compelling narratives, and how it is in many ways a pinnacle of human achievement.  He is a fan of "challenge" contests, where participants are invited to submit a 300-word non-jargony explanation of some concept or phenomenon (e.g., "What is a flame?", "What is sleep?").  This is really hard to do well!  
  • Vox has an article that isn't surprising at all:  Uncritical, hype-filled reporting of medical studies leads to news articles that give conflicting information to the public, and contributes to a growing sense among laypeople that science is untrustworthy or a matter of opinion.  Sigh.
  • Occasionally deficit-hawk politicians realize that science research can benefit them by, e.g., curing cancer.  If only they thought that basic research itself was valuable.

David Hogghealth

I took a physical-health day today, which means I stayed at home and worked on my students' projects, including commenting on drafts, manuscripts, or plots from Malz, Vakili, and Wang.

David HoggSimons Center for Data Analysis

Bernhard Schölkopf arrived for a couple of days of work. We spent the morning discussing radio interferometry, Kepler light-curve modeling, and various things philosophical. We headed up to the Simons Foundation to the Simons Center for Data Analysis for lunch. We had lunch with Marina Spivak (Simons) and Jim Simons (Simons). With the latter I discussed the issues of finding exoplanet rings, moons, and Trojans.

After lunch we ran into Leslie Greengard (Simons) and Alex Barnett (Dartmouth), with whom we had a long conversation about the linear algebra of non-compact kernel matrices on the sphere. This all relates to tractable non-approximate likelihood functions for the cosmic microwave background. The conversation ranged from cautiously optimistic (that we could do this for Planck-like data sets) to totally pessimistic, ending on an optimistic note. The day ended with a talk by Laura Haas (IBM) about infrastructure (and social science) she has been building (at IBM and in academic projects) around data-driven science and discovery. She showed a great example of drug discovery (for cancer) by automated "reading" of the literature.

Chad OrzelHow Does Angular Momentum Emerge?

Yesterday’s post about VPython simulation of the famous bicycle wheel demo showed that you can get the precession and nutation from a simulation that only includes forces. But this is still kind of mysterious, from the standpoint of basic physics intuition. Specifically, it’s sort of hard to see how any of this produces a force up and to the left, as required for the precession to happen.

I spent a bunch of time last night drawing pictures and writing equations, and I think I have the start of an explanation. It all comes down to the picture of rigid objects as really stiff springs– the grey lines in the “featured image” above. If we imagine a start condition where all the springs are at their relaxed length, then look at the system a short instant of time later, I think I can see where the force is coming from.

The two instants I want to imagine are shown schematically here– think of this as an end-on view of the rotating “wheel”:

Cartoon of the spinning "wheel." On the left, the original orientation with the gravitational force (green arrows) and initial linear momentum (red arrows) marked. On the right, the "wheel" a short time later, where all the balls have fallen and moved slightly.

All five of the balls (the four on the “rim” and the one at the hub) are subject to a downward gravitational force, and also have some linear momentum. If they start out exactly horizontal and vertical with the springs making up the spokes at their relaxed length, gravity is the only force that matters, and it’s indicated by short downward-pointing green arrows. The linear momentum for each ball is indicated by the reddish arrow (which deliberately doesn’t quite touch the ball, the usual convention I employ to try to avoid mixing up force and momentum arrows). The top ball is moving to the right, the bottom to the left, and so on.

A short time later, you end up with the situation on the right in the figure: each ball has moved a bit in the direction of its initial momentum, and also fallen down slightly due to gravity. You might be saying “Wait, the left ball actually shifted up from its initial position,” which is true, but that’s because the initial momentum was exactly vertical. It’s moved up by less than it would’ve without gravity, though.

Each of the “spokes” here has now stretched a tiny bit, and will thus be exerting a force pulling the ball toward the center. That’s a good thing, because it’s exactly the centripetal force you need to keep the balls spinning around in a more-or-less circular path (since I started the springs with no stretch, they’ll actually wobble in and out a bit as they go around, but it’s just cleaner to see what’s going on if we start with no forces). All four spokes have stretched by exactly the same amount, though, because the hub has also fallen by the same amount (you can convince yourself of this with half a page or so of equations; I’m not going to bother). All four spokes will exert exactly the same force, so they won’t have any net effect on the motion of the hub. So, this won’t get you the precession effect; but we knew that, because just having the spokes wasn’t enough in the simulation, either.

However, there are four more “springs” here, namely the braces stretching back toward the pivot point, which in this diagram would be straight back into the screen some distance from the initial position of the hub. If we imagine those four springs also started at their relaxed length, the motion of the balls makes them stretch, as well.

And while another half-page of equations can show it conclusively, I think you can see from the picture that these four springs do not all stretch by the same amount. On the right half of the figure, the distance from the center of the left ball to the center of the dotted outline showing the original position of the hub is smaller than the distance from the right ball to the original position of the hub, showing that the right spring will be more stretched than the left. And likewise, the top ball is slightly closer to where the hub was than the bottom is, so the bottom spring is stretched more than the top.
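The geometric claims above (all four spokes stretch equally; the right and bottom braces stretch more than the left and top ones) can be checked with a few lines of Python. This is a minimal sketch of just the first time step, not the VPython simulation from the earlier post. The wheel radius, brace distance, spin speed and time step are made-up values, and the coordinates (x to the right, y up, pivot a distance L straight back into the screen behind the hub's starting point) are assumptions chosen to match the cartoon.

```python
import math

# Hypothetical parameters (not from the post): wheel radius R, distance L from
# the wheel plane back to the pivot, rim speed v, gravity g, and a short time step dt.
R, L, v, g, dt = 1.0, 2.0, 5.0, 9.8, 0.01

d = v * dt            # how far each rim ball coasts along its initial momentum
h = 0.5 * g * dt**2   # how far everything falls under gravity

# End-on view, clockwise spin: top moves right, right moves down,
# bottom moves left, left moves up.
start = {"top": (0.0, R), "right": (R, 0.0), "bottom": (0.0, -R), "left": (-R, 0.0)}
velocity_dir = {"top": (1, 0), "right": (0, -1), "bottom": (-1, 0), "left": (0, 1)}

hub = (0.0, -h)  # the hub has no initial momentum, so it only falls

def after_step(name):
    """Position of a rim ball after one step: coast by d, then fall by h."""
    x0, y0 = start[name]
    ux, uy = velocity_dir[name]
    return (x0 + d * ux, y0 + d * uy - h)

rest_spoke = R                  # spokes start relaxed
rest_brace = math.hypot(R, L)   # braces start relaxed, running back to the pivot

spoke = {}
brace = {}
for name in start:
    x, y = after_step(name)
    # spoke: from the (fallen) hub to the ball
    spoke[name] = math.hypot(x - hub[0], y - hub[1]) - rest_spoke
    # brace: from the fixed pivot at (0, 0, -L) to the ball in the wheel plane
    brace[name] = math.sqrt(x**2 + y**2 + L**2) - rest_brace

for name in start:
    print(f"{name:>6}: spoke stretch {spoke[name]:.3e}, brace stretch {brace[name]:.3e}")
```

With these numbers the four spoke stretches agree to rounding error, while the right brace comes out more stretched than the left and the bottom more than the top, as in the picture. The sketch deliberately stops at the stretches and does not try to sum up the resulting forces.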

That means that the bottom and right braces will exert larger forces on the bottom and right balls. What are the directions of these forces? Well, mostly into the screen, but the bottom brace will also pull upward, and a little bit to the right. And the right brace will pull to the left, and a little bit up. The leftward pull of the right brace will be considerably bigger than the rightward pull of the bottom brace, so these don’t cancel each other out. There’s a down-and-left pull from the top brace on the top ball, and a right-and-down pull from the left brace on the left ball, as well, but these will be smaller than the pulls from the bottom and right braces.

If we carried this forward another step into the future, those forces would get communicated to the hub– the right ball would shift left as well as down, and the bottom ball would fall less quickly, compressing those spokes a bit beyond what they otherwise would be, which will then push the hub up and to the left. And the hub will push on the other two balls, so the whole thing moves up and to the left. But moving up and to the left is exactly what you need to get the precession effect seen in the simulation, and in the cool bicycle wheel demo.

Now, working all this out in detail, and carrying it forward for many more time steps is just a miserable mathematical grind. Which is why I simulated it on a computer in the first place, and why you’d be a fool to try to calculate this on paper without using angular momentum. But having confirmed from the simulation that this really does work with only forces, the two-time-step business above convinces me that I understand the origin of the precession and nutation at least in a qualitative way, which is more than good enough for a blog.

Jordan EllenbergDavid English

Got the Sunday Times, which I don’t usually do, and in the Book Review letters section I saw a familiar name:  David English, of Somerville, MA.  I started noticing this guy when I was in grad school.  He writes letters to the editor.  A lot of letters to the editor.  Google finds about 10 pages of hits for his letters to the Times, starting in 1993 and continuing at a steady clip through the present.  He wrote to the New Yorker and New York Magazine, too. And I thought I remembered him showing up in the Globe letter column, too, but Google can’t find that.

Who is David English of Somerville, MA?  And has he actually had more letters to the New York Times published than anyone else alive?

March 23, 2015

John BaezA Networked World (Part 1)

guest post by David Spivak

The problem

The idea that’s haunted me, and motivated me, for the past seven years or so came to me while reading a book called The Moment of Complexity: our Emerging Network Culture, by Mark C. Taylor. It was a fascinating book about how our world is becoming increasingly networked—wired up and connected—and that this is leading to a dramatic increase in complexity. I’m not sure if it was stated explicitly there, but I got the idea that with the advent of the World Wide Web in 1991, a new neural network had been born. The lights had been turned on, and planet earth now had a brain.

I wondered how far this idea could be pushed. Is the world alive, is it a single living thing? If it is, in the sense I meant, then its primary job is to survive, and to survive it’ll have to make decisions. So there I was in my living room thinking, “oh my god, we’ve got to steer this thing!”

Taylor pointed out that as complexity increases, it’ll become harder to make sense of what’s going on in the world. That seemed to me like a big problem on the horizon, because in order to make good decisions, we need to have a good grasp on what’s occurring. I became obsessed with the idea of helping my species through this time of unprecedented complexity. I wanted to understand what was needed in order to help humanity make good decisions.

What seemed important as a first step is that we humans need to unify our understanding, to come to agreement, on matters of fact. For example, humanity still doesn’t know whether global warming is happening. Sure, almost all credible scientists have agreed that it is happening, but does that steer money into programs that will slow it or mitigate its effects? This isn’t an issue of what course to take to solve a given problem; it’s about whether the problem even exists! It’s like when people were talking about Obama being a Muslim, born in Kenya, etc., and some people were denying it, saying he was born in Hawaii. If that’s true, why did he repeatedly refuse to show his birth certificate?

It is important, as a first step, to improve the extent to which we agree on the most obvious facts. This kind of “sanity check” is a necessary foundation for discussions about what course we should take. If we want to steer the ship, we have to make committed choices, like “we’re turning left now,” and we need to do so as a group. That is, there needs to be some amount of agreement about the way we should steer, so we’re not fighting ourselves.

Luckily, there are many cases of a group that needs to, and is able to, steer itself as a whole. For example, as a human, my neural brain works with my cells to steer my body. Similarly, corporations steer themselves based on boards of directors, and based on flows of information, which run bureaucratically and/or informally between different parts of the company. Note that in neither case is there any suggestion that each part (cell, employee, or corporate entity) is “rational”; they’re all just doing their thing. What we do see in these cases is that the group members work together in a context where information and internal agreement are valued and often attained.

It seemed to me that intelligent, group-directed steering is possible. It does occur. But what’s the mechanism by which it happens, and how can we think about it? I figured that the way we steer, i.e., make decisions, is by using information.

I should be clear: whenever I say information, I never mean it “in the sense of Claude Shannon”. As beautiful as Shannon’s notion of information is, he’s not talking about the kind of information I mean. He explicitly said in his seminal paper that information in his sense is not concerned with meaning:

Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages.

In contrast, I’m interested in the semantic stuff, which flows between humans, and which makes possible decisions about things like climate change. Shannon invented a very useful quantitative measure of meaningless probability distributions.
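A minimal Python sketch of Shannon's measure (entropy in bits, H = -Σ p log₂ p) makes the point of the quoted passage concrete: the number depends only on the probabilities of the possible messages, not on what they mean. The two message sets below are made-up examples.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits, H = -sum(p * log2 p), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two hypothetical sources whose messages mean entirely different things
# but which have the same probability distribution.
weather = {"rain": 0.5, "sun": 0.25, "snow": 0.25}
verdict = {"guilty": 0.25, "innocent": 0.5, "mistrial": 0.25}

print(shannon_entropy(weather.values()))
print(shannon_entropy(verdict.values()))
```

Both calls give 1.5 bits: as far as Shannon's measure is concerned, a weather report and a court verdict with the same odds carry the same amount of information.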

That’s not the kind of information I’m talking about. When I say “I want to know what information is”, I’m saying I want to formulate the notion of human-usable semantic meaning, in as mathematical a way as possible.

Back to my problem: we need to steer the ship, and to do so we need to use information properly. Unfortunately, I had no idea what information is, nor how it’s used to make decisions (let alone to make good ones), nor how it’s obtained from our interaction with the world. Moreover, I didn’t have a clue how the minute information-handling at the micro-level, e.g., done by cells inside a body or employees inside a corporation, would yield information-handling at the macro (body or corporate) level.

I set out to try to understand what information is and how it can be communicated. What kind of stuff is information? It seems to follow rules: facts can be put together to form new facts, but only in certain ways. I was once explaining this idea to Dan Kan, and he agreed saying, “Yes, information is inherently a combinatorial affair.” What is the combinatorics of information?

Communication is similarly difficult to understand once you dig into it. For example, my brain somehow enables me to use information, and so does yours. But when you look closely, our brains are wired up in personal and ad hoc ways, a bit like a fingerprint or retinal scan. I found it fascinating that two highly personalized semantic networks could interface well enough to collaborate effectively.

There are two issues that I wanted to understand, and by to understand I mean to make mathematical to my own satisfaction. The first is what information is, as structured stuff, and what communication is, as a transfer of structured stuff. The second is how communication at micro-levels can create, or be, understanding at macro-levels, i.e., how a group can steer as a singleton.

Looking back on this endeavor now, I remain concerned. Things are getting increasingly complex, in the sorts of ways predicted by Mark C. Taylor in his book, and we seem to be losing some control: of the NSA, of privacy, of people 3D printing guns or germs, of drones, of big financial institutions, etc.

Can we expect or hope that our species as a whole will make decisions that are healthy, like keeping the temperature down, given the information we have available? Are we in the driver’s seat, or is our ship currently in the process of spiraling out of our control?

Let’s assume that we don’t want to panic but that we do want to participate in helping the human community to make appropriate decisions. A possible first step could be to formalize the notion of “using information well”. If we could do this rigorously, it would go a long way toward helping humanity get onto a healthy course. Further, mathematics is one of humanity’s best inventions. Using this tool to improve our ability to use information properly is a non-partisan approach to addressing the issue. It’s not about fighting, it’s about figuring out what’s happening, and weighing all our options in an informed way.

So, I ask: What kind of mathematics might serve as a formal ground for the notion of meaningful information, including both its successful communication and its role in decision-making?

Tommaso DorigoSpring Flukes: New 3-Sigma Signals From LHCb And ATLAS

Spring is finally in, and with it the great expectations for a new run of the Large Hadron Collider, which will restart in a month or so with a 62.5% increase in center of mass energy of the proton-proton collisions it produces: 13 TeV. At 13 TeV, the production of a 2-TeV Z' boson, say, would not be so terribly rare, making a signal soon visible in the data that ATLAS and CMS are eager to collect.

BackreactionNo, you cannot test quantum gravity with X-ray superradiance

I am always looking for new ways to repeat myself, so I cannot possibly leave out this opportunity to point out yet another possibility to not test quantum gravity. Chris Lee from Arstechnica informed the world last week that “Deflecting X-rays due to gravity may provide view on quantum gravity”, which is a summary of the paper

The idea is to shine light on a crystal at frequencies high enough that it excites nuclear resonances. This excitation is delocalized, and the energy is basically absorbed and reemitted systematically, which leads to a propagation of the light-induced excitation through the crystal. How this propagation proceeds depends on the oscillations of the nuclei, which in turn depend on the local proper time. If you place the crystal in a gravitational field, the proper time will depend on the strength of the field. As a consequence, the propagation of the excitation through the crystal depends on the gradient of the gravitational field. The authors argue that in principle this influence of gravity on the passage of time in the crystal should be measurable.

They then look at a related but slightly different effect in which the crystal rotates and the time-dilatation resulting from the (non-inertial!) motion gives rise to a similar effect, though much larger in magnitude.

The authors do not claim that this experiment would be more sensitive than already existing ones. I assume that if it were so, they’d have pointed this out. Instead, they write that the main advantage is that this new method allows one to test both special and general relativistic effects in tabletop experiments.

It’s a neat paper. What does it have to do with quantum gravity? Well, nothing. Indeed the whole paper doesn’t say anything about quantum gravity. Quantum gravity, I remind you, is the quantization of the gravitational interaction, which plays no role for this whatsoever. Chris Lee in his Arstechnica piece explains
“Experiments like these may even be sensitive enough to see the influence of quantum mechanics on space and time.”
Which is just plainly wrong. The influence of quantum mechanics on space-time is far too weak to be measurable in this experiment, or in any other known laboratory experiment. If you figure out how to do this on a tabletop, book your trip to Stockholm right away. Though I recommend you show me the paper before you waste your money.

Here is what Chris Lee had to say about the question of what he thinks it’s got to do with quantum gravity:
Deviations from general relativity aren’t the same as quantum gravity. And besides this, for all I can tell the authors haven’t claimed that they can test a new parameter regime that hasn’t been tested before. The reference to quantum gravity is an obvious attempt to sex up the piece and has no scientific credibility whatsoever.

Summary: Just because it’s something with quantum and something with gravity doesn’t mean it’s quantum gravity.

Chad OrzelThe Emergence of Angular Momentum

The third of the great physics principles introduced in our introductory mechanics courses is the conservation of angular momentum, or the Angular Momentum Principle in the language of the Matter and Interactions curriculum we use. This tends to be one of the hardest topics to introduce, in no small part because it’s the last thing introduced and we’re usually really short on time, but also because it’s really weird. Angular momentum is very different than linear momentum, and involves all sorts of vector products and things going off at right angles.

This leads to some of the coolest demos in the whole course, such as this classic involving a spinning wheel suspended at one end of the axle:

The spinning wheel remains nearly horizontal and precesses in a circle because of the Angular Momentum Principle. The spin gives the wheel a substantial angular momentum, which is directed along the axle of the wheel. Angular momentum can only be changed by a torque, which requires a force exerted at some distance from a pivot point. Gravity provides such a force, pulling on the center of the wheel at some distance from the suspension point, but the weird math of angular momentum means that the torque is at right angles to both the force of gravity and the axle of the wheel, so the gravitational torque makes the angular momentum rotate in the horizontal plane, which means the axle must rotate.
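To put rough numbers on this, the standard fast-spin approximation (not something the post derives) gives a precession rate Ω = τ/L = m g d/(I ω), where d is the distance from the suspension point to the wheel’s centre, I its moment of inertia and ω its spin rate. The sketch below assumes the axle points along +x, gravity acts along −z, and the wheel is a thin ring with I = mR²; all the numbers are made up. It computes the torque as a cross product, which comes out horizontal and at right angles to both the axle and gravity.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))

def precession_rate(m, g, d, I, omega):
    """Fast-spin approximation: Omega = torque / L = m g d / (I omega)."""
    return m * g * d / (I * omega)

# Illustrative (assumed) numbers: a 2 kg wheel of radius 0.3 m whose centre
# hangs 0.2 m from the suspension point, spinning at 30 rad/s.
m, g, R, d, omega = 2.0, 9.8, 0.3, 0.2, 30.0
I = m * R ** 2                  # thin-ring moment of inertia
r = (d, 0.0, 0.0)               # suspension point -> wheel centre, along the axle
F = (0.0, 0.0, -m * g)          # gravity, acting at the wheel's centre
L = (I * omega, 0.0, 0.0)       # spin angular momentum, along the axle

tau = cross(r, F)               # torque about the suspension point
print("torque vector:", tau)    # points along +y: horizontal, perpendicular to r and F
print("Omega from |tau|/|L|:", norm(tau) / norm(L))
print("Omega from formula:  ", precession_rate(m, g, d, I, omega))
```

Because the torque is horizontal and perpendicular to L, it swings L (and with it the axle) around the horizontal plane instead of tipping it down, and the faster the spin, the slower the precession.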

This is a cool trick, but it’s also presented in a way that makes it seem a little magical– angular momentum just is along the axis, and torque just is at right angles to the force, and deal with it. Which is one of the things making it so difficult to learn and to teach.

But, of course, this is a classical system, so all of its dynamics need to be contained within Newton’s Laws. Which means it ought to be possible to look at how angular momentum comes out of the ordinary linear momentum and forces of the components making up the wheel. Of course, it’s kind of hard to see how this works, but that’s what we have computers for.

So, I wrote a VPython program to look at this, which I also put on the web with GlowScript, so you can play with it. Or point at my source code and laugh. The “featured image” up top is some of the output from the VPython on my desktop.

So, what have we got here? Well, there are six balls: a pivot point at the origin, and five balls some distance out, four of them connected to a central “hub.” This is my simulated “wheel.” I used four because I could put them on the x and y axes, and then not have to do complicated math to determine the velocity needed to make them go in a circle.

To make these spin, of course, I need some sort of force to bend them onto a curved path, otherwise they would just fly off in a straight line. That’s accomplished by the four spokes connecting them to the central hub. These are simulated as “springs” with a very high spring constant– you can see a bit of wobble in and out as the wheel spins, in this .gif of the output in a world with no gravity:

Spinning "wheel" with no gravity, showing the basic rotational motion.
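The actual source is on GlowScript and isn’t reproduced here. As a rough plain-Python illustration of the same ingredients (my own simplification, not the post’s code), here is a single ball on a stiff spring “spoke” attached to a hub held fixed at the origin, with no gravity; the parameter values m, k, R, omega and dt are arbitrary assumptions. Starting the ball on the +x axis with velocity ωR along +y avoids any trigonometry, and because the spoke has to stretch a little to supply the centripetal force, the radius wobbles in and out.

```python
import math

def simulate_spoke(m=1.0, k=1.0e4, R=1.0, omega=5.0, dt=1.0e-4, steps=20000):
    """Ball on a stiff spring 'spoke' attached to a hub fixed at the origin.

    Only Hooke's-law forces and linear momentum are used. The ball starts on
    the +x axis with velocity omega*R along +y. Returns the minimum and
    maximum distance from the hub over the run.
    """
    x, y = R, 0.0                 # position
    px, py = 0.0, m * omega * R   # linear momentum
    r_min = r_max = R
    for _ in range(steps):
        r = math.hypot(x, y)
        stretch = r - R
        # Hooke's law along the spoke, pulling the ball back toward the hub
        fx = -k * stretch * x / r
        fy = -k * stretch * y / r
        # Euler-Cromer: update momentum first, then position with the new momentum
        px += fx * dt
        py += fy * dt
        x += px / m * dt
        y += py / m * dt
        r = math.hypot(x, y)
        r_min, r_max = min(r_min, r), max(r_max, r)
    return r_min, r_max

print(simulate_spoke())
```

With these values the spoke must stretch by roughly m ω² R / k ≈ 0.0025 to hold the ball on its circle, so the radius stays within about half a percent of R; lowering k makes the wobble bigger.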

So, is this enough to produce angular momentum and the levitating wheel precession effect? Well, we can add a gravitational force and see what happens:

Spinning wheel with gravity, but no cross-bracing.

This actually has a little bit of the angular momentum thing going on, in that the “wheel” remains vertically oriented throughout. It doesn’t keep the axis in the horizontal plane, though, but instead swings down and back like a pendulum. It also gets a little wobbly, and eventually the balls start to catch up to each other, which is probably some numerical weirdness.

We can fix that problem by adding cross-braces– making the lines connecting the balls on the “wheel” into rigid springs. That gets us this:

Spinning "wheel" with cross-bracing but no brace to the pivot point.

This does the same pendulum-type motion, with the “wheel” remaining vertically oriented, but avoids the part where the balls catch up to each other, so it’s generally cleaner. But again, this doesn’t keep the axle in the horizontal plane. To do that, we need one last piece, namely a way to keep the wheel perpendicular to the axle. This is accomplished by making the braces connecting the balls to the pivot point at the center into very stiff springs, at which point you finally get something that resembles the classic bicycle wheel:

Simulated spinning "wheel" with all braces made into stiff springs, making it effectively a rigid object.

Here, you see the precession effect of the bicycle wheel, with a small up-and-down wobble. The technical term for this is “nutation,” and it’s a part of the phenomenology of angular momentum. There’s a nice discussion in this arxiv paper, which also fulfills my contractual obligation to indirectly cite Feynman at least once a year. The nutation gets smaller as you make the initial angular momentum bigger, and tends to damp out pretty quickly– you can see a little of it in that bicycle-wheel video above, but lots of lecture-hall versions of this start out with so little nutation that you really don’t notice it at all.

(You can make the effect bigger and more complicated by playing with the parameters of the problem. If I reduce the spring constant of the braces holding the wheel perpendicular to the axle, for example, I can make it draw little loops:

Screen shot of a simulation with the braces to the pivot point reduced to 1% of the initial spring constant.

This is more like what you see in a spinning top, which is the other system where you can find people talking at length about nutation.)

The cool thing about all of this is that there’s nothing about angular momentum or torque in the code I wrote– you can look at the source on GlowScript and confirm that. It’s all plain vanilla forces and linear momentum. The angular momentum stuff just happens naturally. I think that’s pretty awesome. This kind of point tends to be mostly lost on students in the intro class, alas–I’ve done something similar with energy, which utterly failed to draw a reaction–but then that’s why I have this blog…
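The same point can be checked in a few lines of plain Python. This is again a simplified 2D sketch with assumed parameters, not the post’s GlowScript source: four balls on stiff spring spokes around a hub held fixed at the origin, with no gravity. The update loop only ever touches forces, positions and linear momenta; the total angular momentum L_z = Σ(x p_y − y p_x) is computed from those quantities before and after the run.

```python
import math

def angular_momentum_z(balls):
    """L_z = sum of (x p_y - y p_x), computed from positions and linear momenta."""
    return sum(x * py - y * px for (x, y, px, py) in balls)

def spin_wheel(m=1.0, k=1.0e4, R=1.0, omega=5.0, dt=1.0e-4, steps=20000):
    """Four balls on stiff spring spokes around a hub fixed at the origin.

    The update loop uses only Hooke's-law forces and linear momentum;
    angular momentum is never referenced inside it.
    Returns L_z at the start and at the end of the run.
    """
    v = omega * R
    # Balls on the +x, +y, -x, -y axes, all moving counterclockwise.
    balls = [(R, 0.0, 0.0, m * v), (0.0, R, -m * v, 0.0),
             (-R, 0.0, 0.0, -m * v), (0.0, -R, m * v, 0.0)]
    L_start = angular_momentum_z(balls)
    for _ in range(steps):
        new = []
        for x, y, px, py in balls:
            r = math.hypot(x, y)
            f = -k * (r - R) / r          # spring force per unit of position vector
            px += f * x * dt              # kick: momentum changes along r
            py += f * y * dt
            x += px / m * dt              # drift: position changes along p
            y += py / m * dt
            new.append((x, y, px, py))
        balls = new
    return L_start, angular_momentum_z(balls)

L_start, L_end = spin_wheel()
print(L_start, L_end)
```

With this kick-then-drift (Euler-Cromer) update the angular momentum is conserved to round-off, not just approximately: each kick changes p along r, because the spring force is central, and each drift moves r along p, so neither step changes r × p.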

And there’s the latest fun-with-VPython activity. Which also serves as a remember-how-to-do-VPython activity, because next Monday I’ll start teaching two sections of intro mechanics again, for the first time in a while, and I was in need of a fun refresher.

March 22, 2015

Mark Chu-CarrollLogical Statements as Tasks

In the next step of our exploration of type theory, we’re going to take a step away from the set-based stuff. There are set-based interpretations of every statement in type theory. But what we really want to focus on is the interpretation of statements in computational terms.

What that means is that we’re going to take logical statements, and view them as computation tasks – that is, as formal logical specifications of a computation that we’d like to do. Under that interpretation, a proof of a statement S is an implementation of the task specified by S.

This interpretation is much, much easier for computer science types like me than the set-based interpretations. We can walk through the interpretations of all of the statements of our intuitionistic logic in just a few minutes.

We’ll start with the simple statements.

A \land B is a specification for a program that produces a pair (a, b) where a is a solution for A, and b is a solution for B.
A \lor B is a specification for a program that produces either a solution to A or a solution to B, along with a way of identifying which of A and B it solved. We do that using a version of the classical injection functions: a solution of A \lor B is either \text{inl}(a) (the left injection), where a solves A, or \text{inr}(b) (the right injection), where b solves B.
A \supset B is a specification for a program that produces a solution to B given a solution to A; in lambda calculus terms, it’s a form like \lambda x.b(x).
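Python is a poor fit for this (its type hints aren’t checked at run time and can’t express dependent types), but as a hedged sketch the three readings above can still be written down with ordinary type hints: a solution of A ∧ B is a tuple, a solution of A ∨ B is wrapped in hypothetical `Inl`/`Inr` classes mirroring \text{inl} and \text{inr}, and a solution of A ⊃ B is a function. The three small programs at the end are proofs whose annotated types are the statements they prove; the annotations are documentation only.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Tuple, TypeVar, Union

A = TypeVar("A")
B = TypeVar("B")

# A ∧ B: a solution is a pair (a, b).
And = Tuple[A, B]

# A ∨ B: a solution is tagged with which side it solves.
@dataclass(frozen=True)
class Inl(Generic[A]):
    value: A    # a solution of the left disjunct

@dataclass(frozen=True)
class Inr(Generic[B]):
    value: B    # a solution of the right disjunct

Or = Union[Inl[A], Inr[B]]

# A ⊃ B: a solution is a function from solutions of A to solutions of B.
Implies = Callable[[A], B]

def and_swap(p: And[A, B]) -> And[B, A]:
    """A proof of (A ∧ B) ⊃ (B ∧ A): take the pair apart and rebuild it."""
    a, b = p
    return (b, a)

def or_swap(p: Or[A, B]) -> Or[B, A]:
    """A proof of (A ∨ B) ⊃ (B ∨ A): inspect the tag and flip it."""
    if isinstance(p, Inl):
        return Inr(p.value)
    return Inl(p.value)

def modus_ponens(f: Implies[A, B], a: A) -> B:
    """From a solution of A ⊃ B and a solution of A, produce a solution of B."""
    return f(a)

print(and_swap((1, "x")), or_swap(Inl(3)), modus_ponens(len, "abc"))
```

The tag is what makes `or_swap` possible: without knowing which side was solved, the program could not decide which side of B ∨ A to produce.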

Now, we can move on to quantified statements. They get a bit more complicated, but if you read the quantifier right, it’s not bad.

(\forall x \in A) B(x) is a program which, when executed, yields a program of the form \lambda x.b(x), where b(x) is an implementation of B, and x is an implementation of A. In other words, a universal statement is a program factory, which produces a program that turns one program into another program.

To me, the easiest way to understand this is to expand the quantifier. A quantified statement \forall x \in A: B(x) can be read as \forall x: x \in A \Rightarrow B(x). If you read it that way, and just follow the computational interpretation of implication, you get precisely the definition above.

Existential quantification is easy. An existential statement \exists x \in A: B(x) is a two-part problem: it needs a value a for x (that is, a value for which there is a proof that a \in A), and a proof that B(a) holds for that specific value. A solution, then, has two parts: it’s a pair (a, b), where a is a value in A, and b is a program that solves the problem B(a).
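As a concrete (and hypothetical) instance, take the statement “there is an a in A = {2, …, 90} such that a divides 91”, i.e. “91 is composite”. A solution is a pair (a, b): the witness a, plus evidence b that B(a) holds, which here is the cofactor, so checking the evidence takes a single multiplication. Python can’t make the type of b depend on the value of a, so this sketch checks that dependency at run time instead.

```python
from typing import Tuple

# A solution of "exists a in A such that B(a)" is a pair (witness, evidence).
Solution = Tuple[int, int]

def in_A(a: int) -> bool:
    """Membership in A: a natural number strictly between 1 and 91."""
    return 1 < a < 91

def check_B(a: int, b: int) -> bool:
    """The evidence b shows B(a), 'a divides 91', because a * b == 91."""
    return a * b == 91

def check_solution(sol: Solution) -> bool:
    """Verify both parts of the pair: a is in A, and b proves B(a)."""
    a, b = sol
    return in_A(a) and check_B(a, b)

def find_solution() -> Solution:
    """A program that solves the task: search for a witness, then build the evidence."""
    for a in range(2, 91):
        if 91 % a == 0:
            return (a, 91 // a)
    raise ValueError("no witness found")

sol = find_solution()
print(sol, check_solution(sol))
```

`find_solution` plays the role of the implementation of the task; it returns (7, 13), and anyone can verify that pair without repeating the search.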

This is the perspective from which most of Martin-Löf’s type theory pursues things. There’s a reason why ML’s type theory is so well-loved by computer scientists: what we’re really doing here is taking a foundational theory of mathematics and turning it into a specification language for describing computing systems.

That’s the fundamental beauty of what Martin-Löf did: he found a way of formulating all of constructive mathematics so that it’s one and the same thing as the theory of computation.

And that’s why this kind of type theory is so useful as a component of programming languages: because it’s allowing you to describe the semantics of your program in terms that are completely natural to the program. The type system is a description of the problem; and the program is the proof.

With a full-blown Martin-Löf type system, the types really are a full specification of the computation described by the program. We don’t actually use the full expressiveness of type theory in real languages; among other things, it’s not checkable! But we do use a constrained, limited logic with Martin-Löf’s semantics. That’s what types really are in programming languages: they’re logical statements! As we get deeper into type theory, we’ll see exactly how that works.

John BaezThermodynamics with Continuous Information Flow

guest post by Blake S. Pollard

Over a century ago James Clerk Maxwell created a thought experiment that has helped shape our understanding of the Second Law of Thermodynamics: the law that says entropy can never decrease.

Maxwell’s proposed experiment was simple. Suppose you had a box filled with an ideal gas at equilibrium at some temperature. You stick in an insulating partition, splitting the box into two halves. These two halves are isolated from one another except for one important caveat: somewhere along the partition resides a being capable of opening and closing a door, allowing gas particles to flow between the two halves. This being is also capable of observing the velocities of individual gas particles. Every time a particularly fast molecule is headed towards the door the being opens it, letting it fly into the other half of the box. When a slow particle heads towards the door the being keeps it closed. After some time, fast molecules would build up on one side of the box, meaning half the box would heat up! To an observer it would seem like the box, originally at a uniform temperature, would for some reason start splitting up into a hot half and a cold half. This seems to violate the Second Law (as well as all our experience with boxes of gas).

Of course, this apparent violation probably has something to do with positing the existence of intelligent microscopic doormen. This being, and the thought experiment itself, are typically referred to as Maxwell’s demon.

Photo credit: Peter MacDonald, Edmonds, UK

When people cook up situations that seem to violate the Second Law there is typically a simple resolution: you have to consider the whole system! In the case of Maxwell’s demon, while the entropy of the box decreases, the entropy of the system as a whole, demon included, goes up. Precisely quantifying how Maxwell’s demon doesn’t violate the Second Law has led people to a better understanding of the role of information in thermodynamics.

At the American Physical Society March Meeting in San Antonio, Texas, I had the pleasure of hearing some great talks on entropy, information, and the Second Law. Jordan Horowitz, a postdoc at Boston University, gave a talk on his work with Massimiliano Esposito, a researcher at the University of Luxembourg, on how one can understand situations like Maxwell’s demon (and a whole lot more) by analyzing the flow of information between subsystems.

Consider a system made up of two parts, X and Y. Each subsystem has a discrete set of states and makes transitions among them, and these dynamics can be modeled as Markov processes. Horowitz and Esposito are interested in modeling the thermodynamics of information flow between subsystems. To this end they consider a bipartite system, meaning that either X transitions or Y transitions, never both at the same time. The probability distribution p(x,y) of the whole system evolves according to the master equation:

\displaystyle{ \frac{dp(x,y)}{dt} = \sum_{x', y'} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y}p(x,y) \right] }

where H_{x,x'}^{y,y'} is the rate at which the system transitions from (x',y') \to (x,y). The ‘bipartite’ condition means that H has the form

H_{x,x'}^{y,y'} = \left\{ \begin{array}{cc} H_{x,x'}^y & x \neq x'; y=y' \\   H_x^{y,y'} & x=x'; y \neq y' \\  0 & \text{otherwise.} \end{array} \right.

The joint system is an open system that satisfies the second law of thermodynamics:

\displaystyle{ \frac{dS_i}{dt} = \frac{dS_{XY}}{dt} + \frac{dS_e}{dt} \geq 0 }

where

\displaystyle{ S_{XY} = - \sum_{x,y} p(x,y) \ln ( p(x,y) ) }

is the Shannon entropy of the system, satisfying

\displaystyle{ \frac{dS_{XY} }{dt} = \sum_{x,y} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y}   p(x,y) \right] \ln \left( \frac{p(x',y')}{p(x,y)} \right) }

and

\displaystyle{ \frac{dS_e}{dt}  = \sum_{x,y} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y} p(x,y) \right] \ln \left( \frac{ H_{x,x'}^{y,y'} } {H_{x',x}^{y',y} } \right) }

is the entropy change of the environment.

We want to investigate how the entropy production of the whole system relates to entropy production in the bipartite pieces X and Y. To this end, Horowitz and Esposito define a new flow, the information flow, as the time rate of change of the mutual information

\displaystyle{ I = \sum_{x,y} p(x,y) \ln \left( \frac{p(x,y)}{p(x)p(y)} \right) }

Its time derivative can be split up as

\displaystyle{ \frac{dI}{dt} = \frac{dI^X}{dt} + \frac{dI^Y}{dt}}

where

\displaystyle{ \frac{dI^X}{dt} = \sum_{x,y} \left[ H_{x,x'}^{y} p(x',y) - H_{x',x}^{y}p(x,y) \right] \ln \left( \frac{ p(y|x) }{p(y|x')} \right) }

and

\displaystyle{ \frac{dI^Y}{dt} = \sum_{x,y} \left[ H_{x}^{y,y'}p(x,y') - H_{x}^{y',y}p(x,y) \right] \ln \left( \frac{p(x|y)}{p(x|y')} \right) }

are the information flows associated with the subsystems X and Y respectively.

When

\displaystyle{ \frac{dI^X}{dt} > 0}

a transition in X increases the mutual information I, meaning that X ‘knows’ more about Y and vice versa.
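To make the mutual information concrete, here is a small Python sketch that evaluates I = Σ p(x,y) ln[p(x,y)/(p(x)p(y))] in nats for two made-up distributions. When X and Y are correlated, I > 0 and each variable carries information about the other; when they are independent, it vanishes.

```python
import math

def mutual_information(p):
    """I = sum_{x,y} p(x,y) ln[p(x,y) / (p(x) p(y))] in nats; p is a nested list p[x][y]."""
    px = [sum(row) for row in p]           # marginal of X
    py = [sum(col) for col in zip(*p)]     # marginal of Y
    return sum(pxy * math.log(pxy / (px[x] * py[y]))
               for x, row in enumerate(p)
               for y, pxy in enumerate(row)
               if pxy > 0)                 # terms with p(x,y) = 0 contribute nothing

correlated = [[0.4, 0.1], [0.1, 0.4]]      # X and Y tend to agree
independent = [[0.25, 0.25], [0.25, 0.25]]
print(mutual_information(correlated))      # about 0.193 nats
print(mutual_information(independent))     # 0.0
```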

We can rewrite the entropy production entering into the second law in terms of these information flows as

\displaystyle{ \frac{dS_i}{dt} = \frac{dS_i^X}{dt} + \frac{dS_i^Y}{dt} }

where

\displaystyle{ \frac{dS_i^X}{dt} = \sum_{x,y} \left[ H_{x,x'}^y p(x',y) - H_{x',x}^y p(x,y) \right] \ln \left( \frac{H_{x,x'}^y p(x',y) } {H_{x',x}^y p(x,y) } \right) \geq 0 }

and similarly for \frac{dS_i^Y}{dt} . This gives the following decomposition of entropy production in each subsystem:

\displaystyle{ \frac{dS_i^X}{dt} = \frac{dS^X}{dt} + \frac{dS^X_e}{dt} - \frac{dI^X}{dt} \geq 0 }

\displaystyle{ \frac{dS_i^Y}{dt} = \frac{dS^Y}{dt} + \frac{dS^Y_e}{dt} - \frac{dI^Y}{dt} \geq 0},

where the inequalities hold for each subsystem separately. To see this, write out the left-hand side of each inequality: both are sums of the form

\displaystyle{ \sum \left[ a-b \right] \ln \left( \frac{a}{b} \right) }

where each a and b is a transition rate times a probability, so a, b > 0. Every term is non-negative because a-b and \ln(a/b) always have the same sign.

The interaction between the subsystems is contained entirely in the information flow terms. Neglecting these terms gives rise to situations like Maxwell’s demon where a subsystem seems to violate the second law.
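These identities can be checked numerically. Below is a plain-Python sketch (my own, not code from the paper) for a small bipartite system with random positive rates and a random positive distribution p(x,y); the sizes and seed are arbitrary. It splits dp/dt into the parts due to X-jumps and Y-jumps, computes the information flows as Σ ṗ^X ln p(y|x) and Σ ṗ^Y ln p(x|y), and reads S^X and S^Y as the marginal Shannon entropies of X and Y. Each jump between a pair of states is counted once (half the sum over ordered pairs); that is my assumption about the convention behind the compact sums above.

```python
import math
import random

def random_bipartite_system(nx, ny, seed=0):
    """Random positive rates and a random positive p(x, y), for illustration only.

    Wx[y][x][xp]: rate of an X-jump x' -> x while Y sits at y.
    Wy[x][y][yp]: rate of a Y-jump y' -> y while X sits at x.
    Diagonal entries are generated but never used.
    """
    rng = random.Random(seed)
    Wx = [[[rng.uniform(0.1, 2.0) for _ in range(nx)] for _ in range(nx)] for _ in range(ny)]
    Wy = [[[rng.uniform(0.1, 2.0) for _ in range(ny)] for _ in range(ny)] for _ in range(nx)]
    raw = [[rng.uniform(0.1, 1.0) for _ in range(ny)] for _ in range(nx)]
    total = sum(map(sum, raw))
    return Wx, Wy, [[v / total for v in row] for row in raw]

def bipartite_flows(Wx, Wy, p):
    nx, ny = len(p), len(p[0])
    px = [sum(p[x]) for x in range(nx)]
    py = [sum(p[x][y] for x in range(nx)) for y in range(ny)]
    dpX = [[0.0] * ny for _ in range(nx)]   # part of dp/dt due to X-jumps
    dpY = [[0.0] * ny for _ in range(nx)]   # part of dp/dt due to Y-jumps
    SiX = SiY = SeX = SeY = 0.0
    for y in range(ny):
        for x in range(nx):
            for xp in range(nx):
                if xp == x:
                    continue
                fwd = Wx[y][x][xp] * p[xp][y]   # flux (x', y) -> (x, y)
                bwd = Wx[y][xp][x] * p[x][y]    # flux (x, y) -> (x', y)
                dpX[x][y] += fwd - bwd
                # Ordered pairs visit each jump twice, hence the factor 1/2.
                SiX += 0.5 * (fwd - bwd) * math.log(fwd / bwd)
                SeX += 0.5 * (fwd - bwd) * math.log(Wx[y][x][xp] / Wx[y][xp][x])
    for x in range(nx):
        for y in range(ny):
            for yp in range(ny):
                if yp == y:
                    continue
                fwd = Wy[x][y][yp] * p[x][yp]   # flux (x, y') -> (x, y)
                bwd = Wy[x][yp][y] * p[x][y]    # flux (x, y) -> (x, y')
                dpY[x][y] += fwd - bwd
                SiY += 0.5 * (fwd - bwd) * math.log(fwd / bwd)
                SeY += 0.5 * (fwd - bwd) * math.log(Wy[x][y][yp] / Wy[x][yp][y])
    cells = [(x, y) for x in range(nx) for y in range(ny)]
    return {
        "SiX": SiX, "SiY": SiY, "SeX": SeX, "SeY": SeY,
        # information flows: sums of dp^X ln p(y|x) and of dp^Y ln p(x|y)
        "IX": sum(dpX[x][y] * math.log(p[x][y] / px[x]) for x, y in cells),
        "IY": sum(dpY[x][y] * math.log(p[x][y] / py[y]) for x, y in cells),
        # rates of change of the marginal Shannon entropies of X and Y
        "dSX": -sum(dpX[x][y] * math.log(px[x]) for x, y in cells),
        "dSY": -sum(dpY[x][y] * math.log(py[y]) for x, y in cells),
        # rate of change of the mutual information, from the full dp/dt
        "dI": sum((dpX[x][y] + dpY[x][y]) * math.log(p[x][y] / (px[x] * py[y]))
                  for x, y in cells),
    }

r = bipartite_flows(*random_bipartite_system(3, 2, seed=42))
tol = 1e-10
print(r["SiX"] >= 0 and r["SiY"] >= 0)                        # per-subsystem second law
print(abs(r["IX"] + r["IY"] - r["dI"]) < tol)                 # dI/dt = dI^X/dt + dI^Y/dt
print(abs(r["SiX"] - (r["dSX"] + r["SeX"] - r["IX"])) < tol)  # decomposition for X
print(abs(r["SiY"] - (r["dSY"] + r["SeY"] - r["IY"])) < tol)  # decomposition for Y
```

For any positive rates and distribution, each line should print True: both subsystem entropy productions are non-negative, the rate of change of the mutual information splits into the two flows, and each subsystem’s entropy production equals the change of its own marginal entropy plus its own environmental entropy flow minus its information flow.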

Lots of Markov processes have boring equilibria \frac{dp}{dt} = 0 where there is no net flow among the states. Markov processes also admit non-equilibrium steady states, where there may be some constant flow of information. In this steady state all explicit time derivatives are zero, including the net information flow:

\displaystyle{ \frac{dI}{dt} = 0 }

which implies that \frac{dI^X}{dt} = - \frac{dI^Y}{dt}. In this situation the above inequalities become

\displaystyle{ \frac{dS^X_i}{dt} = \frac{dS_e^X}{dt} - \frac{dI^X}{dt} \geq 0 }

and

\displaystyle{ \frac{dS^Y_i}{dt} = \frac{dS_e^Y}{dt} + \frac{dI^X}{dt} \geq 0 }.

If

\displaystyle{ \frac{dI^X}{dt} > 0 }

then X is learning something about Y or acting as a sensor. The first inequality

\frac{dS_e^X}{dt} \geq \frac{dI^X}{dt} quantifies the minimum amount of energy X must supply to do this sensing. Similarly, -\frac{dS_e^Y}{dt} \leq \frac{dI^X}{dt} bounds the amount of useful energy available to Y as a result of this information transfer.

In their paper Horowitz and Esposito explore a few other examples and show the utility of this simple breakup of a system into two interacting subsystems in explaining various interesting situations in which the flow of information has thermodynamic significance.

For the whole story, read their paper!

• Jordan Horowitz and Massimiliano Esposito, Thermodynamics with continuous information flow, Phys. Rev. X 4 (2014), 031015.

March 21, 2015

David Hoggrobust fitting, intelligence, and stellar systems

In the morning I talked to Ben Weaver (NYU) about performing robust (as in "robust statistics") fitting of binary-star radial-velocity functions to the radial velocity measurements of the individual exposures from the APOGEE spectroscopy. The goal is to identify radial-velocity outliers and improve APOGEE data analysis, but we might make a few discoveries along the way, à la what's implied by this paper.

At lunch-time I met up with Bruce Knuteson (Kn-X) who is starting a company (see here) that uses a clever but simple economic model to obtain true information from untrusted and anonymous sources. He asked me about possible uses in astrophysics. He also asked me if I know anyone in US intelligence. I don't!

In the afternoon, Tim Morton (Princeton) came up to discuss things related to multiple-star and exoplanet systems. One of the things we discussed is how to parameterize or build pdfs over planetary systems, which can have very different numbers of elements and parameters. One option is to classify systems into classes, and build a model of each (implicitly qualitatively different) class and then model the full distribution as a mixture of classes. Another is to model the "biggest" or "most important" planet first; in this case we build a model of the pdf over the "most important planet" and then deal with the rest of the planets later. Another is to say that every single star has a huge number of planets (like thousands or infinity) and just most of them are unobservable. Then the model is over an (effectively) infinite-dimensional vector for every system (most elements of which describe planets that are unobservable or will not be observed any time soon).

This infinite-planet descriptor sounds insane, but there are lots of tractable models like this in the world of non-parametrics. And the Solar System certainly suggests that most stars probably do have many thousands of planets (at least). You can guess from this discussion where we are leaning. Everything we figure out about planet systems applies to stellar systems too.

Doug Natelson"Flip chip" approach to nanoelectronics

Most people who aren't experts in the field don't really appreciate how amazing our electronic device capabilities are in integrated circuits.  Every time some lithographic patterning, materials deposition, or etching step is performed on an electrically interesting substrate (e.g., a Si chip), there is some amount of chemical damage or modification to the underlying material.  In the Si industry, we have gotten extremely good over the last five decades at either minimizing that collateral damage, or making sure that we can reverse its effects.  However, other systems have proven more problematic.  Any surface processing on GaAs-based structures tends to reduce the mobility of charge in underlying devices and to increase the apparent disorder in the material.  For more complex oxides like the cuprate or pnictide superconductors, even air exposure under ambient conditions (let alone much lithographic processing) can alter the surface oxygen content, affecting the properties of the underlying material.

However, for both basic science and technological motivations, we sometimes want to apply electrodes on small scales onto materials where damage from traditional patterning methods is unavoidable and can have severe consequences for the resulting measurements.  For example, this work used electrodes patterned onto PDMS, a soft silicone rubber.  The elastomer-supported electrodes were then laminated (reversibly!) onto the surface of a single crystal of rubrene, a small molecule organic semiconductor.  Conventional lithography onto such a fragile van der Waals crystal is basically impossible, but with this approach the investigators were able to make nice transistor devices to study intrinsic charge transport in the material.  

One issue with PDMS as a substrate is that it is very squishy with a large thermal expansion coefficient.  Sometimes that can be useful (read this - it's very clever), but it means that it's very difficult to put truly nanoscale electrodes onto PDMS and have them survive without distortion, wrinkling, cracking of metal layers, etc.  PDMS also really can't be used at temperatures much below ambient.  A more rigid substrate that is really flat would be great, with the idea that one could do sophisticated fab of electrode patterns, and then "flip" the electrode substrate into contact with the material of interest, which could remain untouched or unblemished by lithographic processes.

In this recent preprint, a collaboration between the Gervais group at McGill and the CINT at Sandia, the investigators used a rigid sapphire (Al2O3) substrate to support patterned Au electrodes separated by a sub-micron gap. They then flipped this onto completely unpatterned (except for large Ohmic contacts far away) GaAs/AlGaAs heterostructures.  With this arrangement, cleverly designed to remain in intimate contact even when the device is cooled to sub-Kelvin temperatures, they are able to make a quantum point contact while in principle maintaining the highest possible charge mobility of the underlying semiconductor.  It's very cool, though making truly intimate contact between two rigid substrates over mm-scale areas is very challenging - the surfaces have to be very clean, and very flat!  This configuration, while not implementable for too many device designs, is nonetheless of great potential use for expanding the kinds of materials we can probe with nanoscale electrode arrangements.

Geraint F. LewisMoving Charges and Magnetic Fields

Still struggling with grant writing season, so here's another post that grew out of my random musings about the Universe (which actually happens quite a lot).

In second semester, I am teaching electricity and magnetism to our First Year Advanced Class. I really enjoy teaching this class as the kids are on the ball and can ask some deep and meaningful questions.

But the course is not ideal. Why? Because we teach from a textbook and the problem is that virtually all modern textbooks are almost the same. Science is trotted out in an almost historical progression. But it does not have to be taught that way.

In fact, it would be great if we could start with Hamiltonian and Lagrangian approaches, and derive physics from a top down approach. We're told that it's mathematically too challenging, but it really isn't. In fact, I would start with a book like The Theoretical Minimum, not some multicoloured compendium of physics.

We have to work with what we have!

One of the key concepts that we have to get across is that electricity and magnetism are not really two separate things, but are actually two sides of the same coin. And, in the world of classical physics, it was the outstanding work of James Clerk Maxwell that provided the mathematical framework that brought them together. Maxwell gave us his famous equations that underpin electromagnetism.

Again, being the advanced class, we can go beyond this and look at the work that came after Maxwell, namely the work of Albert Einstein, especially his Special Theory of Relativity.

The wonderful thing about special relativity is that the mix of electric and magnetic fields depends upon the motion of an observer. One person sees a particular configuration of electric and magnetic fields, and another observer, moving relative to the first, will see a different mix of electric and magnetic fields.

This is nice to say, but what does it actually mean? Can we do anything with it to help understand electricity and magnetism a little more? I think so.

In this course (and EM courses in general) we spend a lot of time calculating the electric field of a static charge distribution. For this, we use the rather marvellous Gauss's law, that relates the electric field distribution to the underlying charges.
I've written about this wonderful law before, and showed how you can use symmetries (i.e. nice simple shapes like spheres, boxes and cylinders) to calculate the electric field.

Then we come to the sources of magnetic field. And things, well, get messy. There are some rules we can use, but it's, well, as I said, messy.

We know that magnetic fields are due to moving charges, but what's the magnetic field of a lonely little charge moving on its own? Looks something like this
Where does this come from? And how do you calculate it? Is there an easier way?

And the answer is yes! The kids have done a touch of special relativity at high school and (without really knowing it in detail) have seen the Lorentz transformations. Now, introductory lessons on special relativity often harp on about swimming back and forth across rivers, or something like that, and have a merry dance before getting to the point. And the transforms are presented as a way to map coordinates from one observer to another, but they are much more powerful than that.

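You can use them to transform vectors from one observer's viewpoint to another, including electric and magnetic fields. And the transformations are simple algebra. For an observer moving with velocity v along the x-direction, they read

\displaystyle{ E'_x = E_x, \quad E'_y = \gamma \left( E_y - v B_z \right), \quad E'_z = \gamma \left( E_z + v B_y \right) }

\displaystyle{ B'_x = B_x, \quad B'_y = \gamma \left( B_y + \frac{v}{c^2} E_z \right), \quad B'_z = \gamma \left( B_z - \frac{v}{c^2} E_y \right) }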

where we also have the famous Lorentz factor. So, what does this set of equations tell us? Well, if we have an observer who sees a particular electric field (Ex,Ey,Ez), and magnetic field (Bx,By,Bz), then an observer moving with a velocity v (in the x-direction) will see the electric and magnetic fields with the primed components.
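The transformation equations themselves appear to have been an image that did not survive here. For reference, the standard textbook form (SI units) for an observer moving with velocity v along the x-axis relative to the unprimed frame is reproduced below; the notation is the common one and may differ slightly from the author's.

```latex
\begin{aligned}
E'_x &= E_x, & B'_x &= B_x, \\
E'_y &= \gamma\,(E_y - v B_z), & B'_y &= \gamma\left(B_y + \frac{v}{c^2} E_z\right), \\
E'_z &= \gamma\,(E_z + v B_y), & B'_z &= \gamma\left(B_z - \frac{v}{c^2} E_y\right),
\end{aligned}
\qquad
\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}
```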

Now, we know what the electric field of an isolated charge at rest is. We can use Gauss's law, and it tells us that the field is spherically symmetric and looks like this
The field drops off in strength with the square of the distance. What would be the electric and magnetic fields if this charge was trundling past us at a velocity v? Easy, we just use the Lorentz transforms to tell us. We know exactly what the electric field of the charge at rest looks like, and we know that, at rest, there is no magnetic field.
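The recipe in the last paragraph can be sketched in a few lines of Python (the author used MATLAB; the function names, the choice of units with c = 1, and the unit-charge normalization are assumptions made for this illustration). It takes the rest-frame Coulomb field, sets B = 0, and applies the field transformations to the vector at each sample point. Note that it only transforms the field vectors; it does not relabel the sample points into the moving observer's length-contracted coordinates.

```python
import numpy as np

C = 1.0  # assumption: units with c = 1


def boost_fields(E, B, v, c=C):
    """Transform E and B (arrays of shape (..., 3)) into the frame of an
    observer moving with speed v along +x."""
    gamma = 1.0 / np.sqrt(1.0 - (v / c) ** 2)
    Ex, Ey, Ez = E[..., 0], E[..., 1], E[..., 2]
    Bx, By, Bz = B[..., 0], B[..., 1], B[..., 2]
    E_new = np.stack([Ex,
                      gamma * (Ey - v * Bz),
                      gamma * (Ez + v * By)], axis=-1)
    B_new = np.stack([Bx,
                      gamma * (By + v * Ez / c**2),
                      gamma * (Bz - v * Ey / c**2)], axis=-1)
    return E_new, B_new


def point_charge_E(r):
    """Coulomb field of a unit charge at the origin, in units of q/(4 pi eps0)."""
    dist = np.linalg.norm(r, axis=-1, keepdims=True)
    return r / dist**3


# A few sample points (hypothetical); at rest there is no magnetic field.
pts = np.array([[1.0, 2.0, -0.5], [0.3, -1.0, 1.5], [-2.0, 0.5, 0.5]])
E_rest = point_charge_E(pts)
B_rest = np.zeros_like(E_rest)
E_move, B_move = boost_fields(E_rest, B_rest, v=0.1)
```

B = 0 goes in and a nonzero B' comes out, with no x-component, so it wraps around the direction of motion. As a check, the result satisfies B' = -(v x E')/c^2 with v along +x, the familiar field of a charge moving at -v relative to the observer, and the invariant |E|^2 - c^2|B|^2 is unchanged.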

Being as lazy as I am, I didn't want to calculate anything by hand, so I chucked it into MATLAB, a mathematical environment that many students have access to. I'm not going to be an apologist for MATLAB's default graphics style (which I think sucks - but there are, with a bit of work, solutions).

Anyway, here's a charge at rest. The blue arrows are the electric field. No magnetic field, remember!
So, top left is a view along the x-axis, then y, then z, then a 3-D view. Cool!

Now, what does this charge look like if it is moving relative to me? Throw it into the Lorentz transforms, and voila!

MAGNETIC FIELDS!!! The charge is moving along the x-axis with respect to me, and when we look along x we can see that the magnetic fields wrap around the direction of motion (remember your right hand grip rule kids!).

That was for a velocity of 10% the speed of light. Let's whack it up to 99.999%.
The electric field gets distorted also!

Students also use Gauss's law to calculate the electric field of an infinitely long line of charge. Now the strength of the field drops off as the inverse of the distance from the line of charge.

Now, let's consider an observer moving at a velocity relative to the line of charge.
Excellent! Similar to what we saw before, and what we would expect. The magnetic field curls around the moving line of charge (which, of course, is simply an electric current).

Didn't we know that, you say? Yes, but I think this is more powerful, not only to reveal the relativistic relationship between the electric and magnetic fields, but also once you have written the few lines of algebraic code in MATLAB (or python or whatever the kids are using these days) you can ask about more complicated situations. You can play with physics (which, IMHO, is how you really understand it).

So, to round off, what's the magnetic field of a perpendicular infinite line of charge moving with respect to you? I am sure you could, with a bit of work, calculate it with the usual mathematical approaches, but let's just take a look.

Here it is at rest
A bit like further up, but now pointing along a different axis.

Before we add velocity, you physicists and budding physicists, make a prediction! Here goes! A tenth of the velocity of light and we get
I dunno if we were expecting that! Remember, top left is looking along the x-axis, along the direction of motion. So we have created some magnetic structure. Just not the simple structure we normally see!

And now at 99.99% we get
And, of course, I could play with lots of other geometries, like what happens if you move a ring of charge etc. But let's not get too excited, and come back to that another day.
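Both line-charge cases can be checked with the same field-transformation recipe. This is a minimal Python sketch, assuming units with c = 1, a unit line charge through the origin, and (for the perpendicular case) that the line lies along the z-axis; none of these choices come from the original post. Because the rest-frame field is purely electric, the transformations reduce to E' = (Ex, γEy, γEz) and B' = γv(0, Ez, -Ey)/c^2.

```python
import numpy as np


def boost_static_E(E, v, c=1.0):
    """Fields seen by an observer moving at speed v along +x, for a purely
    electric rest-frame field E (shape (..., 3)) with zero rest-frame B."""
    gamma = 1.0 / np.sqrt(1.0 - (v / c) ** 2)
    Ex, Ey, Ez = E[..., 0], E[..., 1], E[..., 2]
    E_new = np.stack([Ex, gamma * Ey, gamma * Ez], axis=-1)
    B_new = np.stack([np.zeros_like(Ex),
                      gamma * v * Ez / c**2,
                      -gamma * v * Ey / c**2], axis=-1)
    return E_new, B_new


def line_charge_E(r, axis):
    """Field of a unit infinite line charge through the origin along `axis`
    (0, 1, 2 for x, y, z), in units of lambda/(2 pi eps0): rho_vec / |rho|^2."""
    rho = r.copy()
    rho[..., axis] = 0.0  # keep only the part of r perpendicular to the line
    return rho / np.sum(rho**2, axis=-1, keepdims=True)


pts = np.array([[0.5, 1.0, 2.0], [-1.0, -2.0, 0.5], [2.0, 0.3, -1.0]])

# Line along the direction of motion (x): effectively a current.
_, B_par = boost_static_E(line_charge_E(pts, axis=0), v=0.1)

# Line perpendicular to the motion (assumed here to lie along z).
_, B_perp = boost_static_E(line_charge_E(pts, axis=2), v=0.1)
```

In the parallel case B' has no x-component and is perpendicular to the radial direction, so it circles the line just like the field of a current. In the perpendicular case B' points along the line itself and flips sign across the y = 0 plane, which is one way to see why that picture looks less familiar.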

March 20, 2015

David HoggBlanton-Hogg group meeting

Today was the first-ever instance of the new Blanton–Hogg combined group meeting. Chang-Hoon Hahn (NYU) presented work on the environmental dependence of galaxy populations in the PRIMUS data set and a referee report he is responding to. We discussed how the redshift incompleteness of the survey might depend on galaxy type. Vakili showed some preliminary results he has on machine-learning-based photometric redshifts. We encouraged him to go down the "feature selection" path to start; it would be great to know what SDSS catalog entries are most useful for predicting redshift! Sanderson presented issues she is having with building a hierarchical probabilistic model of the Milky Way satellite galaxies. She had issues with the completeness (omg, how many times have we had such issues at Camp Hogg!) but I hijacked the conversation onto the differences between binomial and Poisson likelihood functions. Her problem is very, very similar to that solved by Foreman-Mackey for exoplanets, but just with different functional forms for everything.

Sean CarrollGuest Post: Don Page on God and Cosmology

Don Page is one of the world’s leading experts on theoretical gravitational physics and cosmology, as well as a previous guest-blogger around these parts. (There are more world experts in theoretical physics than there are people who have guest-blogged for me, so the latter category is arguably a greater honor.) He is also, somewhat unusually among cosmologists, an Evangelical Christian, and interested in the relationship between cosmology and religious belief.

Longtime readers may have noticed that I’m not very religious myself. But I’m always willing to engage with people with whom I disagree, if the conversation is substantive and proceeds in good faith. I may disagree with Don, but I’m always interested in what he has to say.

Recently Don watched the debate I had with William Lane Craig on “God and Cosmology.” I think these remarks from a devoted Christian who understands the cosmology very well will be of interest to people on either side of the debate.

Open letter to Sean Carroll and William Lane Craig:

I just ran across your debate at the 2014 Greer-Heard Forum, and greatly enjoyed listening to it. Since my own views are often a combination of one or the other of yours (though they also often differ from both of yours), I thought I would give some comments.

I tend to be skeptical of philosophical arguments for the existence of God, since I do not believe there are any that start with assumptions universally accepted. My own attempt at what I call the Optimal Argument for God (one, two, three, four), certainly makes assumptions that only a small fraction of people, and perhaps even only a small fraction of theists, believe in, such as my assumption that the world is the best possible. You know that well, Sean, from my provocative seminar at Caltech in November on “Cosmological Ontology and Epistemology” that included this argument at the end.

I mainly think philosophical arguments might be useful for motivating someone to think about theism in a new way and perhaps raise the prior probability someone might assign to theism. I do think that if one assigns theism not too low a prior probability, the historical evidence for the life, teachings, death, and resurrection of Jesus can lead to a posterior probability for theism (and for Jesus being the Son of God) being quite high. But if one thinks a priori that theism is extremely improbable, then the historical evidence for the Resurrection would be discounted and not lead to a high posterior probability for theism.

I tend to favor a Bayesian approach in which one assigns prior probabilities based on simplicity and then weights these by the likelihoods (the probabilities that different theories assign to our observations) to get, when the product is normalized by dividing by the sum of the products for all theories, the posterior probabilities for the theories. Of course, this is an idealized approach, since we don’t yet have _any_ plausible complete theory for the universe to calculate the conditional probability, given the theory, of any realistic observation.
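Written out, the normalization described in this paragraph is Bayes' theorem over a set of candidate theories T_i given observations O (the symbols are introduced here for illustration, not taken from the letter):

```latex
P(T_i \mid O) = \frac{P(T_i)\, P(O \mid T_i)}{\sum_j P(T_j)\, P(O \mid T_j)}
```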

For me, when I consider evidence from cosmology and physics, I find it remarkable that it seems consistent with all we know that the ultimate theory might be extremely simple and yet lead to sentient experiences such as ours. A Bayesian analysis with Occam’s razor to assign simpler theories higher prior probabilities would favor simpler theories, but the observations we do make preclude the simplest possible theories (such as the theory that nothing concrete exists, or the theory that all logically possible sentient experiences occur with equal probability, which would presumably make ours have zero probability in this theory if there are indeed an infinite number of logically possible sentient experiences). So it seems mysterious why the best theory of the universe (which we don’t have yet) may be extremely simple but yet not maximally simple. I don’t see that naturalism would explain this, though it could well accept it as a brute fact.

One might think that adding the hypothesis that the world (all that exists) includes God would make the theory for the entire world more complex, but it is not obvious that is the case, since it might be that God is even simpler than the universe, so that one would get a simpler explanation starting with God than starting with just the universe. But I agree with your point, Sean, that theism is not very well defined, since for a complete theory of a world that includes God, one would need to specify the nature of God.

For example, I have postulated that God loves mathematical elegance, as well as loving to create sentient beings, so something like this might explain both why the laws of physics, and the quantum state of the universe, and the rules for getting from those to the probabilities of observations, seem much simpler than they might have been, and why there are sentient experiences with a rather high degree of order. However, I admit there is a lot of logically possible variation on what God’s nature could be, so that it seems to me that at least we humans have to take that nature as a brute fact, analogous to the way naturalists would have to take the laws of physics and other aspects of the natural universe as brute facts. I don’t think either theism or naturalism solves this problem, so it seems to me rather a matter of faith which makes more progress toward solving it. That is, theism per se cannot deduce from purely a priori reasoning the full nature of God (e.g., when would He prefer to maintain elegant laws of physics, and when would He prefer to cure someone from cancer in a truly miraculous way that changes the laws of physics), and naturalism per se cannot deduce from purely a priori reasoning the full nature of the universe (e.g., what are the dynamical laws of physics, what are the boundary conditions, what are the rules for getting probabilities, etc.).

In view of these beliefs of mine, I am not convinced that most philosophical arguments for the existence of God are very persuasive. In particular, I am highly skeptical of the Kalam Cosmological Argument, which I shall quote here from one of your slides, Bill:

  1. If the universe began to exist, then there is a transcendent cause
    which brought the universe into existence.
  2. The universe began to exist.
  3. Therefore, there is a transcendent cause which brought the
    universe into existence.

I do not believe that the first premise is metaphysically necessary, and I am also not at all sure that our universe had a beginning. (I do believe that the first premise is true in the actual world, since I do believe that God exists as a transcendent cause which brought the universe into existence, but I do not see that this premise is true in all logically possible worlds.)

I agree with you, Sean, that we learn our ideas of causation from the lawfulness of nature and from the directionality of the second law of thermodynamics that lead to the commonsense view that causes precede their effects (or occur at the same time, if Bill insists). But then we have learned that the laws of physics are CPT invariant (essentially the same in each direction of time), so in a fundamental sense the future determines the past just as much as the past determines the future. I agree that just from our experience of the one-way causation we observe within the universe, which is just a merely effective description and not fundamental, we cannot logically derive the conclusion that the entire universe has a cause, since the effective unidirectional causation we commonly experience is something just within the universe and need not be extrapolated to a putative cause for the universe as a whole.

However, since to me the totality of data, including the historical evidence for the Resurrection of Jesus, is most simply explained by postulating that there is a God who is the Creator of the universe, I do believe by faith that God is indeed the cause of the universe (and indeed the ultimate Cause and Determiner of everything concrete, that is, everything not logically necessary, other than Himself—and I do believe, like Richard Swinburne, that God is concrete and not logically necessary, the ultimate brute fact). I have a hunch that God created a universe with apparent unidirectional causation in order to give His creatures some dim picture of the true causation that He has in relation to the universe He has created. But I do not see any metaphysical necessity in this.

(I have a similar hunch that God created us with the illusion of libertarian free will as a picture of the true freedom that He has, though it might be that if God does only what is best and if there is a unique best, one could object that even God does not have libertarian free will, but in any case I would believe that it would be better for God to do what is best than to have any putative libertarian free will, for which I see little value. Yet another hunch I have is that it is actually sentient experiences rather than created individual ‘persons’ that are fundamental, but God created our experiences to include beliefs that we are individual persons to give us a dim image of Him as the one true Person, or Persons in the Trinitarian perspective. However, this would take us too far afield from my points here.)

On the issue of whether our universe had a beginning, besides not believing that this is at all relevant to the issue of whether or not God exists, I agreed almost entirely with Sean’s points rather than yours, Bill, on this issue. We simply do not know whether or not our universe had a beginning, but there are certainly models, such as Sean’s with Jennifer Chen (hep-th/0410270 and gr-qc/0505037), that do not have a beginning. I myself have also favored a bounce model in which there is something like a quantum superposition of semiclassical spacetimes (though I don’t really think quantum theory gives probabilities for histories, just for sentient experiences), in most of which the universe contracts from past infinite time and then has a bounce to expand forever. In as much as these spacetimes are approximately classical throughout, there is a time in each that goes from minus infinity to plus infinity.

In this model, as in Sean’s, the coarse-grained entropy has a minimum at or near the time when the spatial volume is minimized (at the bounce), so that entropy increases in both directions away from the bounce. At times well away from the bounce, there is a strong arrow of time, so that in those regions if one defines the direction of time as the direction in which entropy increases, it is rather as if there are two expanding universes both coming out from the bounce. But it is erroneous to say that the bounce is a true beginning of time, since the structure of spacetime there (at least if there is an approximately classical spacetime there) has timelike curves going from a proper time of minus infinity through the bounce (say at proper time zero) and then to proper time of plus infinity. That is, there are worldlines that go through the bounce and have no beginning there, so it seems rather artificial to say the universe began at the bounce that is in the middle just because it happens to be when the entropy is minimized. I think Sean made this point very well in the debate.

In other words, in this model there is a time coordinate t on the spacetime (say the proper time t of a suitable collection of worldlines, such as timelike geodesics that are orthogonal to the extremal hypersurface of minimal spatial volume at the bounce, where one sets t = 0) that goes from minus infinity to plus infinity with no beginning (and no end). Well away from the bounce, there is a different thermodynamic time t' (increasing with increasing entropy) that for t >> 0 increases with t but for t << 0 decreases with t (so there t' becomes more positive as t becomes more negative). For example, if one said that t' is only defined for |t| > 1, say, one might have something like

t' = (t^2 - 1)^{1/2},

the positive square root of one less than the square of t. This thermodynamic time t' only has real values when the absolute value of the coordinate time t, that is, |t|, is no smaller than 1, and then t' increases with |t|.

One might say that t' begins (at t' = 0) at t = -1 (for one universe that has t' growing as t decreases from -1 to minus infinity) and at t = +1 (for another universe that has t' growing as t increases from +1 to plus infinity). But since the spacetime exists for all real t, with respect to that time arising from general relativity there is no beginning and no end of this universe.
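The example thermodynamic time can be tabulated directly to see the two branches. A minimal Python sketch; returning None where t' is undefined is a choice made here, not part of the letter:

```python
import math


def thermodynamic_time(t):
    """Example thermodynamic time t' = sqrt(t^2 - 1) from the letter.

    Real only for |t| >= 1; returns None where t' is undefined (|t| < 1).
    """
    if abs(t) < 1:
        return None
    return math.sqrt(t * t - 1)


# t' takes the same values on both sides of the bounce and grows with |t|:
# one "universe" for t <= -1, another for t >= +1.
table = [(t, thermodynamic_time(t)) for t in (-3, -2, -1, -0.5, 0, 0.5, 1, 2, 3)]
```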

Bill, I think you also objected to a model like this by saying that it violates the second law (presumably in the sense that the coarse-grained entropy does not increase monotonically with t for all real t). But if we exist for t >> 1 (or for t << -1; there would be no change to the overall behavior if t were replaced with -t, since the laws are CPT invariant), then we would be in a region where the second law is observed to hold, with coarse-grained entropy increasing with t' \sim t (or with t' \sim -t if t << -1). A viable bounce model would have it so that it would be very difficult or impossible for us directly to observe the bounce region where the second law does not apply, so our observations would be in accord with the second law even though it does not apply for the entire universe.

I think I objected to both of your probability estimates for various things regarding fine tuning. Probabilities depend on the theory or model, so without a definite model, one cannot claim that the probability for some feature like fine tuning is small. It was correct to list me among the people believing in fine tuning in the sense that I do believe that there are parameters that naively are far different from what one might expect (such as the cosmological constant), but I agreed with the sentiment of the woman questioner that there are not really probabilities in the absence of a model.

Bill, you referred to using some “non-standard” probabilities, as if there is just one standard. But there isn’t. As Sean noted, there are models giving high probabilities for Boltzmann brain observations (which I think count strongly against such models) and other models giving low probabilities for them (which in this regard fit our ordered observations statistically). We don’t yet know the best model for avoiding Boltzmann brain domination (and, Sean, you know that I am skeptical of your recent ingenious model), though just because I am skeptical of this particular model does not imply that I believe that the problem is insoluble or gives evidence against a multiverse; in any case it seems also to be a problem that needs to be dealt with even in just single-universe models.

Sean, at one point you referred to some naive estimate of the very low probability of the flatness of the universe, but then you said that we now know the probability of flatness is very near unity. This is indeed true, as Stephen Hawking and I showed long ago (“How Probable Is Inflation?” Nuclear Physics B298, 789-809, 1988) when we used the canonical measure for classical universes, but one could get other probabilities by using other measures from other models.

In summary, I think the evidence from fine tuning is ambiguous, since the probabilities depend on the models. Whether or not the universe had a beginning also is ambiguous, and furthermore I don’t see that it has any relevance to the question of whether or not God exists, since the first premise of the Kalam cosmological argument is highly dubious metaphysically, depending on contingent intuitions we have developed from living in a universe with relatively simple laws of physics and with a strong thermodynamic arrow of time.

Nevertheless, in view of all the evidence, including both the elegance of the laws of physics, the existence of orderly sentient experiences, and the historical evidence, I do believe that God exists and think the world is actually simpler if it contains God than it would have been without God. So I do not agree with you, Sean, that naturalism is simpler than theism, though I can appreciate how you might view it that way.

Best wishes,


Clifford JohnsonFestival of Books!

So the Festival of Books is 18-19th April this year. If you're in or near LA, I hope you're going! It's free, it's huge (the largest book festival in the USA) and also huge fun! They've announced the schedule of events and the dates on which you can snag (free) tickets for various indoor panels and appearances, since they are very popular, as usual. So check out the panels, appearances, and performances here. (Check out several of my past posts on the Festival here. Note also that the festival is on the USC campus which is easy to get to using great public transport links if you don't want to deal with traffic and parking.) Note also that the shortlist for the 2014 LA Times Book Prizes was announced (a while back - I forgot to post about it) and it is here. I always find it interesting... for a start, it is a great list of reading suggestions! By the way, apparently I'm officially an author - not just a guy who writes from time to time - an author. Why? Well, I'm listed as one on the schedule site. I'll be on one of the author panels! It is moderated by KC Cole, and I'll be joining [...]

David Hogg#astrohackny, CMB likelihood

I spent most of #astrohackny arguing with Jeff Andrews (Columbia) about white-dwarf cooling age differences and how to do inference given measurements of white dwarf masses and cooling times (for white dwarfs in coeval binaries). The problem is non-trivial and is giving Andrews biased results. In the end we decided to obey the advice I usually give, which is to beat up the likelihood function before doing the full inference. Meaning: Try to figure out if the inference issues are in the likelihood function, the prior, or the MCMC sampler. Since all these things combine in a full inference, it makes sense to "unit test" (as it were) the likelihood function first.
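The "unit test the likelihood first" advice can be sketched on a toy problem. Nothing here is the white-dwarf model: the Gaussian model, the parameter values, and the function name are hypothetical. The idea is to simulate data from known parameters and check that the likelihood alone, with no prior and no sampler, peaks where it should.

```python
import numpy as np


def log_likelihood(mu, data, sigma):
    """Gaussian log-likelihood of i.i.d. data with mean mu and known noise sigma."""
    resid = (data - mu) / sigma
    return -0.5 * np.sum(resid**2) - data.size * np.log(np.sqrt(2.0 * np.pi) * sigma)


# Simulate data from known "true" parameters.
rng = np.random.default_rng(42)
true_mu, sigma = 3.0, 0.5
data = rng.normal(true_mu, sigma, size=1000)

# Test 1: on a fine grid, the likelihood should peak at the analytic
# maximum, which for this model is the sample mean.
grid = np.linspace(2.0, 4.0, 2001)
ll = np.array([log_likelihood(m, data, sigma) for m in grid])
mu_hat = grid[np.argmax(ll)]
assert abs(mu_hat - data.mean()) <= grid[1] - grid[0]

# Test 2: the peak should sit within a few standard errors of the truth.
assert abs(mu_hat - true_mu) < 5 * sigma / np.sqrt(data.size)
```

If tests like these pass but the full inference is still biased, the problem is more likely in the prior or the sampler than in the likelihood.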

Late in the day I discussed the CMB likelihood function with Evan Biederstedt. Our goal is to show that we can perform a non-approximate likelihood function evaluation in real space for a non-uniformly observed CMB sky (heteroskedastic and cut sky). This involves solving—and taking the determinant of—a large matrix (50 million squared in the case of Planck). I, for one, think we can do this, using our brand-new linear algebra foo.
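The post does not spell out the method. As a hedged illustration of what a non-approximate Gaussian likelihood involves, here is the standard exact expression, ln L = -1/2 [d^T C^-1 d + ln det C + N ln 2π], evaluated with a dense Cholesky factorization on a small toy "sky". The covariance model, pixel layout, and mask are invented for this sketch, and a dense factorization like this would be hopeless at Planck's 50-million-pixel scale; the "brand-new linear algebra" mentioned above is not shown here.

```python
import numpy as np


def gaussian_loglike(d, C):
    """Exact ln L = -1/2 [d^T C^-1 d + ln det C + N ln(2 pi)].

    One Cholesky factorization C = L L^T supplies both the solve and the
    log-determinant.
    """
    L = np.linalg.cholesky(C)                  # raises if C is not positive definite
    z = np.linalg.solve(L, d)                  # L z = d, so d^T C^-1 d = z . z
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # ln det C = 2 sum_i ln L_ii
    return -0.5 * (z @ z + logdet + d.size * np.log(2.0 * np.pi))


# Toy 1-D "sky": smooth signal covariance, pixel-dependent (heteroskedastic)
# noise, and a masked region that is simply dropped (cut sky).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 400)
x = x[~((x > 4.0) & (x < 6.0))]
S = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
noise_var = 0.05 + 0.2 * rng.random(x.size)
C = S + np.diag(noise_var)
d = np.linalg.cholesky(C) @ rng.standard_normal(x.size)  # one simulated data vector

ll = gaussian_loglike(d, C)
```

Working in real (pixel) space means the mask and the uneven noise enter only through which rows of C survive and its diagonal, with no approximation, at the price of an O(N^3) factorization.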

ResonaancesLHCb: B-meson anomaly persists

Today LHCb released a new analysis of the angular distribution in the B0 → K*0(892) (→K+π-) μ+ μ- decays. In this 4-body decay process, the angles between the direction of flight of all the different particles can be measured as a function of the invariant mass q^2 of the di-muon pair. The results are summarized in terms of several form factors with imaginative names like P5', FL, etc. The interest in this particular decay comes from the fact that 2 years ago LHCb reported a large deviation from the standard model prediction in one q^2 region of 1 form factor called P5'. That measurement was based on 1 inverse femtobarn of data; today it was updated to the full 3 fb-1 of run-1 data. The news is that the anomaly persists in the q^2 region 4-8 GeV^2, see the plot. The measurement moved a bit toward the standard model, but the statistical errors have shrunk as well. All in all, the significance of the anomaly is quoted as 3.7 sigma, the same as in the previous LHCb analysis. New physics that effectively induces new contributions to the 4-fermion operator (\bar b_L \gamma_\rho s_L) (\bar \mu \gamma_\rho \mu) can significantly improve agreement with the data, see the blue line in the plot. The preference for new physics remains high, at the 4 sigma level, when this measurement is combined with other B-meson observables.

So how excited should we be? One thing we learned today is that the anomaly is unlikely to be a statistical fluctuation. However, the observable is not of the clean kind, as the measured angular distributions are susceptible to poorly known QCD effects. The significance depends a lot on what is assumed about these uncertainties, and experts wage ferocious battles about the numbers. See for example this paper where larger uncertainties are advocated, in which case the significance becomes negligible. Therefore, the deviation from the standard model is not yet convincing. Other observables may tip the scale. Only if a consistent pattern of deviations in several B-physics observables emerges can we trumpet victory.

Plots borrowed from David Straub's talk in Moriond; see also the talk of Joaquim Matias with similar conclusions. David has a post with more details about the process and uncertainties. For a more popular write-up, see this article in Quanta Magazine.

Terence TaoAn averaged form of Chowla’s conjecture

Kaisa Matomaki, Maksym Radziwill, and I have just uploaded to the arXiv our paper “An averaged form of Chowla’s conjecture“. This paper concerns a weaker variant of the famous conjecture of Chowla (discussed for instance in this previous post) that

\displaystyle  \sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k) = o(X)

as {X \rightarrow \infty} for any distinct natural numbers {h_1,\dots,h_k}, where {\lambda} denotes the Liouville function. (One could also replace the Liouville function here by the Möbius function {\mu} and obtain a morally equivalent conjecture.) This conjecture remains open for any {k \geq 2}; for instance the assertion

\displaystyle  \sum_{n \leq X} \lambda(n) \lambda(n+2) = o(X)

is a variant of the twin prime conjecture (though possibly a tiny bit easier to prove), and is subject to the notorious parity barrier (as discussed in this previous post).

Our main result asserts, roughly speaking, that Chowla’s conjecture can be established unconditionally provided one has non-trivial averaging in the {h_1,\dots,h_k} parameters. More precisely, one has

Theorem 1 (Chowla on the average) Suppose {H = H(X) \leq X} is a quantity that goes to infinity as {X \rightarrow \infty} (but it can go to infinity arbitrarily slowly). Then for any fixed {k \geq 1}, we have

\displaystyle  \sum_{h_1,\dots,h_k \leq H} |\sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k)| = o( H^k X ).

In fact, we can remove one of the averaging parameters and obtain

\displaystyle  \sum_{h_2,\dots,h_k \leq H} |\sum_{n \leq X} \lambda(n) \lambda(n+h_2) \dots \lambda(n+h_k)| = o( H^{k-1} X ).

Actually we can make the decay rate a bit more quantitative, gaining about {\frac{\log\log H}{\log H}} over the trivial bound. The key case is {k=2}; while the unaveraged Chowla conjecture becomes more difficult as {k} increases, the averaged Chowla conjecture does not increase in difficulty due to the increasing amount of averaging for larger {k}, and we end up deducing the higher {k} case of the conjecture from the {k=2} case by an elementary argument.
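To get a feel for the k = 2 quantity in the second display, one can compute it numerically for modest X and H. The sketch below is our own illustration, not code from the paper: it sieves the Liouville function λ(n) = (-1)^Ω(n), where Ω(n) counts prime factors with multiplicity, and returns Σ_{h≤H} |Σ_{n≤X} λ(n)λ(n+h)| / (HX). A small value at one finite (X, H) is consistent with, but says nothing rigorous about, the o(H X) bound.

```python
import numpy as np


def liouville(N):
    """Return lam[0..N] with lam[n] = (-1)^Omega(n) for n >= 1 (lam[0] = 0, unused)."""
    omega = np.zeros(N + 1, dtype=np.int64)
    is_prime = np.ones(N + 1, dtype=bool)
    is_prime[:2] = False
    for p in range(2, N + 1):
        if not is_prime[p]:
            continue
        is_prime[p * p :: p] = False
        q = p
        while q <= N:
            omega[q::q] += 1  # every multiple of p^k picks up one more factor of p
            q *= p
    lam = np.where(omega % 2 == 0, 1, -1)
    lam[0] = 0
    return lam


def averaged_pair_correlation(X, H):
    """sum_{1<=h<=H} |sum_{1<=n<=X} lam(n) lam(n+h)|, normalized by H*X."""
    lam = liouville(X + H)
    base = lam[1 : X + 1]                   # lam(n), n = 1..X
    total = 0
    for h in range(1, H + 1):
        shifted = lam[1 + h : X + 1 + h]    # lam(n + h)
        total += abs(int(np.dot(base, shifted)))
    return total / (H * X)


ratio = averaged_pair_correlation(10000, 10)  # trivially bounded by 1
```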

The proof of the theorem proceeds as follows. By exploiting the Fourier-analytic identity

\displaystyle  \int_{{\mathbf T}} (\int_{\mathbf R} |\sum_{x \leq n \leq x+H} f(n) e(\alpha n)|^2 dx)^2\ d\alpha

\displaystyle = \sum_{|h| \leq H} (H-|h|)^2 |\sum_n f(n) \overline{f}(n+h)|^2

(related to a standard Fourier-analytic identity for the Gowers {U^2} norm) it turns out that the {k=2} case of the above theorem can basically be derived from an estimate of the form

\displaystyle  \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o( H X )

uniformly for all {\alpha \in {\mathbf T}}. For “major arc” {\alpha}, close to a rational {a/q} for small {q}, we can establish this bound from a generalisation of a recent result of Matomaki and Radziwill (discussed in this previous post) on averages of multiplicative functions in short intervals. For “minor arc” {\alpha}, we can proceed instead from an argument of Katai and Bourgain-Sarnak-Ziegler (discussed in this previous post).
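For a finitely supported f and integer H, the Fourier-analytic identity above can be checked numerically, with e(θ) = exp(2πiθ). This is our own sanity check, not part of the paper's argument. Two facts make it exact up to rounding. First, for integer H, the window sum over x ≤ n ≤ x+H is constant on each open unit interval (k-1, k), where it runs over n = k, ..., k+H-1. Second, averaging a trigonometric polynomial of degree less than M over α = 0, 1/M, ..., (M-1)/M returns exactly its integral over T. The inner integral has frequencies of absolute value at most H-1, so its square has degree at most 2H-2, and any M > 2H-2 works.

```python
import numpy as np


def lhs(f, H, M):
    """Integral over alpha in T of (integral over x of |window sum|^2)^2, for f
    supported on n = 0..len(f)-1 and integer H."""
    N = len(f)
    n = np.arange(N)
    total = 0.0
    for a in np.arange(M) / M:              # M equally spaced alphas in [0, 1)
        g = f * np.exp(2j * np.pi * a * n)  # f(n) e(alpha n)
        inner = 0.0
        # On x in (k-1, k) the window x <= n <= x+H is n = k, ..., k+H-1.
        for k in range(1 - H, N):
            lo, hi = max(k, 0), min(k + H, N)
            inner += abs(g[lo:hi].sum()) ** 2
        total += inner**2
    return total / M


def rhs(f, H):
    """sum over |h| <= H of (H - |h|)^2 |sum_n f(n) conj(f(n+h))|^2."""
    N = len(f)
    total = 0.0
    for h in range(-H, H + 1):
        lo, hi = max(0, -h), min(N, N - h)  # n with 0 <= n < N and 0 <= n+h < N
        corr = np.sum(f[lo:hi] * np.conj(f[lo + h : hi + h]))
        total += (H - abs(h)) ** 2 * abs(corr) ** 2
    return total


rng = np.random.default_rng(1)
f = rng.standard_normal(40) + 1j * rng.standard_normal(40)
H = 6
agree = np.isclose(lhs(f, H, M=4 * H), rhs(f, H))
```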

The argument also extends to other bounded multiplicative functions than the Liouville function. Chowla’s conjecture was generalised by Elliott, who roughly speaking conjectured that the {k} copies of {\lambda} in Chowla’s conjecture could be replaced by arbitrary bounded multiplicative functions {g_1,\dots,g_k} as long as these functions were far from a twisted Dirichlet character {n \mapsto \chi(n) n^{it}} in the sense that

\displaystyle  \sum_p \frac{1 - \hbox{Re} g(p) \overline{\chi(p) p^{it}}}{p} = +\infty. \ \ \ \ \ (1)

(This type of distance is incidentally now a fundamental notion in the Granville-Soundararajan “pretentious” approach to multiplicative number theory.) During our work on this project, we found that Elliott’s conjecture is not quite true as stated due to a technicality: one can cook up a bounded multiplicative function {g} which behaves like {n^{it_j}} on scales {n \sim N_j} for some {N_j} going to infinity and some slowly varying {t_j}, and such a function will be far from any fixed Dirichlet character whilst still having many large correlations (e.g. the pair correlations {\sum_{n \leq N_j} g(n+1) \overline{g(n)}} will be large). In our paper we propose a technical “fix” to Elliott’s conjecture (replacing (1) by a truncated variant), and show that this repaired version of Elliott’s conjecture is true on the average in much the same way that Chowla’s conjecture is. (If one restricts attention to real-valued multiplicative functions, then this technical issue does not show up, basically because one can assume without loss of generality that {t=0} in this case; we discuss this fact in an appendix to the paper.)


Sean CarrollAuction: Multiply-Signed Copy of Why Evolution Is True

Here is a belated but very welcome spinoff of our Moving Naturalism Forward workshop from 2012: Jerry Coyne was clever enough to bring along a copy of his book, Why Evolution Is True, and have all the participants sign it. He subsequently gathered a few more distinguished autographs, and to make it just a bit more beautiful, artist Kelly Houle added some original illustrations. Jerry is now auctioning off the book to benefit Doctors Without Borders. Check it out:



Here is the list of signatories:

  • Dan Barker
  • Sean Carroll
  • Jerry Coyne
  • Richard Dawkins
  • Terrence Deacon
  • Simon DeDeo
  • Daniel Dennett
  • Owen Flanagan
  • Annie Laurie Gaylor
  • Rebecca Goldstein
  • Ben Goren
  • Kelly Houle
  • Lawrence Krauss
  • Janna Levin
  • Jennifer Ouellette
  • Massimo Pigliucci
  • Steven Pinker
  • Carolyn Porco
  • Nicholas Pritzker
  • Alex Rosenberg
  • Don Ross
  • Steven Weinberg

Jerry is hoping it will fetch a good price to benefit the charity, so we’re spreading the word. I notice that a baseball signed by Mickey Mantle goes for about $2000. In my opinion a book signed by Steven Weinberg alone should go for even more, so just imagine what this is worth. You have ten days to get your bids in — and if it’s a bit pricey for you personally, I’m sure there’s someone who loves you enough to buy it for you.

March 19, 2015

Clifford JohnsonLAIH Luncheon with Jack Miles

On Friday 6th March the Los Angeles Institute for the Humanities (LAIH) was delighted to have our luncheon talk given by LAIH Fellow Jack Miles. He told us some of the story behind (and the making of) the Norton Anthology of World Religions - he is the main editor of this massive work - and lots of the ins and outs of how you go about undertaking such an enterprise. It was fascinating to hear how the various religions were chosen, for example, and how he selected and recruited specialist editors for each of the religions. It was an excellent talk, made all the more enjoyable by having Jack's quiet and [...]

Chad OrzelFavorite Quantum Physics in Fiction?

We’ll be accepting applications for The Schrödinger Sessions workshop at JQI through tomorrow. We already have 80-plus applicants for fewer than 20 planned spots, including a couple of authors I really, really like and some folks who have won awards, etc., so we’re going to have our work cut out for us picking the attendees…

We’re also discussing the program for the workshop– more details when we have something more final– which has me thinking about good examples to use of storytelling involving quantum physics. I’d like to be able to give a few shout-outs to already-existing fiction involving the ideas we’ll be discussing. And while I know several already, I’m always happy to hear more…

Things on my mental list already:

Robert Charles Wilson’s “Divided by Infinity”, probably the best fictional exploration of Many-Worlds that I’ve seen. Yes, I’m aware of Larry Niven’s “All the Myriad Ways.” This is better.

— Ted Chiang’s “Story of Your Life,” though that’s less explicitly quantum. It turns on the idea of the principle of least action, which is essential for Feynman’s formulation of quantum physics, but originates in classical physics. It’s an amazing story, though.

— Hannu Rajaniemi’s The Quantum Thief and sequels make heavy use of ideas from quantum technology. He even specifically cites ion traps when talking about quantum computing infrastructure, which is a great fit with the labs at JQI.

— Charlie Stross mentions quantum communications a lot in his SF. He’s really hit or miss for me, and sadly most of what misses for me hits with a lot of other people, so he’s evolving toward a less appealing state, but one of his early books– either Singularity Sky or Iron Sunrise, I forget which– had what’s probably my favorite hand-wave involving this use of quantum entanglement for FTL communications, with the idea that FTL travel via hyperspace breaks entanglement, and thus the entangled qubits used for instantaneous communications are a precious resource shipped through normal space at great expense.

It’s hard to think of on-screen examples of quantum technology, though, largely because it’s been years since I’ve had the free time to watch many movies. Interstellar name-checks the need for “quantum data,” but it’s really about astrophysics, not quantum mechanics. I know the Coen brothers did a movie a few years back where quantum physics plays a metaphorical sort of role, but I haven’t seen it. Quantum computing as a way of cracking encryption may have been a MacGuffin in a thriller movie or two, but I don’t recall specific examples.

Anyway, I would love to have a longer list of stuff to suggest/ cite/ name-drop. Please leave ideas in the comments…

John BaezQuantum Superposition

guest post by Piotr Migdał

In this blog post I will introduce some basics of quantum mechanics, with an emphasis on why a particle being in a few places at once behaves measurably differently from a particle whose position we just don’t know. It’s a kind of continuation of the “Quantum Network Theory” series (Part 1, Part 2) by Tomi Johnson about our work in Jake Biamonte’s group at the ISI Foundation in Turin. My goal is to explain quantum community detection. Before that, I need to introduce the relevant basics of quantum mechanics, and of classical community detection.

But before I start, let me introduce myself, as it’s my first post to Azimuth.

I just finished my quantum optics theory Ph.D. in Maciej Lewenstein’s group at The Institute of Photonic Sciences in Castelldefels, a beach near Barcelona. My scientific interests range from quantum physics, through complex networks, to data-driven approaches… to pretty much anything, and now I work as a data science freelancer. I enjoy doing data visualizations (for example of relations between topics in mathematics), I am a big fan of Rényi entropy (largely thanks to Azimuth), and I’m a believer in open science. If you think that there are too many off-topic side projects here, you are absolutely right!

In my opinion, quantum mechanics is easy. Based on my experience teaching gifted students, it takes roughly 9 intense hours to introduce entanglement to students with only a very basic linear algebra background. Moreover, I believe that it is possible to get familiar with quantum mechanics just by playing with it, so I am developing a Quantum Game!

Quantum weirdness

In quantum mechanics a particle can be in a few places at once. It sounds strange. So strange, that some pioneers of quantum mechanics (including, famously, Albert Einstein) didn’t want to believe in it: not because of any disagreement with experiment, not because of any lack of mathematical beauty, just because it didn’t fit their philosophical view of physics.

It went further: in the Soviet Union the idea that an electron can be in many places (resonance bonds) was considered to oppose materialism. Later, in California, hippies investigated quantum mechanics as a basis for parapsychology, which, arguably, gave birth to the field of quantum information.

As Griffiths put it in his Introduction to Quantum Mechanics (Chapter 4.4.1):

To the layman, the philosopher, or the classical physicist, a statement of the form “this particle doesn’t have a well-defined position” [...] sounds vague, incompetent, or (worst of all) profound. It is none of these.

In this guest blog post I will try to show that not only can a particle be in many places at once, but also that if it were not in many places at once, it would cause problems. That is, phenomena as fundamental as atoms forming chemical bonds, or a particle moving in a vacuum, require it.

As in many other cases, the simplest non-trivial case is perfect for explaining an idea, as it covers the most important phenomena while being easy to analyze, visualize and comprehend. Quantum mechanics is no exception, so let us start with a system of two states.

A two state system

Let us study a simplified model of the hydrogen molecular ion \mathrm{H}_2^+, that is, a system of two protons and one electron (see Feynman Lectures on Physics, Vol. III, Chapter 10.1). Since the protons are heavy and slow, we treat them as fixed. We focus on the electron moving in the electric field created by the protons.

In quantum mechanics we describe the state of a system using a complex vector. In simple terms, this is a list of complex numbers called ‘probability amplitudes’. For an electron that can be near one proton or another, we use a list of two numbers:

|\psi\rangle =      \begin{bmatrix}          \alpha \\ \beta      \end{bmatrix}

In this state the electron is near the first proton with probability |\alpha|^2, and near the second one with probability |\beta|^2.

Note that

|\psi \rangle = \alpha \begin{bmatrix}          1 \\ 0      \end{bmatrix} + \beta \begin{bmatrix}          0 \\ 1      \end{bmatrix}

So, we say the electron is in a ‘linear combination’ or ‘superposition’ of the two states

|1\rangle =      \begin{bmatrix}          1 \\ 0      \end{bmatrix}

(where it’s near the first proton) and the state

|2\rangle =      \begin{bmatrix}          0 \\ 1      \end{bmatrix}

(where it’s near the second proton).

Why do we denote unit vectors in strange brackets looking like

| \mathrm{something} \rangle ?

Well, this is called Dirac notation (or bra-ket notation) and it is immensely useful in quantum mechanics. We won’t go into it in detail here; merely note that | \cdot \rangle stands for a column vector and \langle \cdot | stands for a row vector, while \psi is a traditional symbol for a quantum state.

Amplitudes can be thought of as ‘square roots’ of probabilities. We can force an electron to localize by performing a classical measurement, for example by moving the protons apart and measuring which of them is neutral (because the electron is bound to it). Then, we get probability | \alpha|^2 of finding it near the first proton and |\beta|^2 of finding it near the second. So, we require that

|\alpha|^2 + |\beta|^2 = 1

Note that as amplitudes are complex, for a given probability there are many possible amplitudes. For example

1 = |1|^2 = |-1|^2 = |i|^2 = \left| \tfrac{1+i}{\sqrt{2}} \right|^2 = \cdots

where i is the imaginary unit, with i^2 = -1.

We will now show that the electron ‘wants’ to be spread out. Electrons don’t really have desires, so this is physics slang for saying that the electron will have less energy if its probability of being near the first proton is equal to its probability of being near the second proton: namely, 50%.

In quantum mechanics, a Hamiltonian is a matrix that describes the relation between the energy and evolution (i.e. how the state changes in time). The expected value of the energy of any state | \psi \rangle is

E = \langle \psi | H | \psi \rangle

Here the row vector \langle \psi | is the column vector | \psi\rangle after transposition and complex conjugation (i.e. changing i to -i), and

\langle \psi | H | \psi \rangle

means we are doing matrix multiplication on \langle \psi |, H and | \psi \rangle to get a number.

For the electron in the \mathrm{H}_2^+ molecule the Hamiltonian can be written as the following 2 \times 2 matrix with real, positive entries:

H =      \begin{bmatrix}          E_0 & \Delta \\          \Delta & E_0      \end{bmatrix},

where E_0 is the energy of the electron being either in state |1\rangle or state |2\rangle, and \Delta is the ‘tunneling amplitude’, which describes how easy it is for the electron to move from neighborhood of one proton to that of the other.

The expected value—physicists call it the ‘expectation value’—of the energy of a given state |\psi\rangle is:

E = \langle \psi | H | \psi \rangle \equiv      \begin{bmatrix}          \alpha^* & \beta^*      \end{bmatrix}      \begin{bmatrix}          E_0 & \Delta \\          \Delta & E_0      \end{bmatrix}      \begin{bmatrix}          \alpha \\ \beta      \end{bmatrix}.

The star symbol denotes the complex conjugation. If you are unfamiliar with complex numbers, just work with real numbers on which this operation does nothing.

Exercise 1. Find \alpha and \beta with

|\alpha|^2 + |\beta|^2 = 1

that minimize or maximize the expectation value of energy \langle \psi | H | \psi \rangle for

|\psi\rangle =      \begin{bmatrix}          \alpha \\ \beta      \end{bmatrix}

Exercise 2. What’s the expectation value of the energy for the states | 1 \rangle and | 2 \rangle?

Or if you are lazy, just read the answer! It is straightforward to check that

E = (\alpha^* \alpha + \beta^* \beta) E_0 + (\alpha^* \beta + \beta^* \alpha) \Delta

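The coefficient of E_0 is |\alpha|^2 + |\beta|^2 = 1, while the coefficient of \Delta is \alpha^* \beta + \beta^* \alpha = 2\,\mathrm{Re}(\alpha^* \beta), which lies between -1 and 1 because 2|\alpha||\beta| \leq |\alpha|^2 + |\beta|^2 = 1. So the minimal energy is E_0 - \Delta and the maximal energy is E_0 + \Delta. The states achieving these energies are spread out: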

| \psi_- \rangle =      \begin{bmatrix}          1/\sqrt{2} \\ -1/\sqrt{2}      \end{bmatrix},      \quad \text{with} \quad      \quad E = E_0 - \Delta


| \psi_+ \rangle =      \begin{bmatrix}          1/\sqrt{2} \\ 1/\sqrt{2}      \end{bmatrix},      \quad \text{with} \quad      \quad E = E_0 + \Delta

The energies of these states are below and above the energy E_0, and \Delta says how much.
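To make Exercises 1 and 2 concrete, here is a minimal NumPy sketch (NumPy is our choice, not something the post uses). The values E_0 = 1.0 and \Delta = 0.25 are arbitrary illustrative numbers; any real E_0 and positive \Delta behave the same way. It evaluates \langle \psi | H | \psi \rangle for the four states above, scans real amplitudes, and cross-checks against the eigenvalues of H.

```python
import numpy as np

# Illustrative parameters (arbitrary values, not taken from the text)
E0, Delta = 1.0, 0.25
H = np.array([[E0, Delta],
              [Delta, E0]])

def energy(psi, H):
    """Expectation value <psi|H|psi> for a normalized state vector psi."""
    # np.vdot complex-conjugates its first argument, giving the row vector <psi|
    return np.vdot(psi, H @ psi).real

ket1 = np.array([1.0, 0.0])
ket2 = np.array([0.0, 1.0])
psi_minus = np.array([1.0, -1.0]) / np.sqrt(2)
psi_plus = np.array([1.0, 1.0]) / np.sqrt(2)

# Exercise 2: the localized states both have energy E0
print(energy(ket1, H), energy(ket2, H))

# The spread-out states sit Delta below and above E0
print(energy(psi_minus, H), energy(psi_plus, H))

# Scan real amplitudes alpha = cos(t), beta = sin(t); here E = E0 + Delta * sin(2t)
t = np.linspace(0, 2 * np.pi, 721)
scan = [energy(np.array([np.cos(a), np.sin(a)]), H) for a in t]
print(min(scan), max(scan))   # close to E0 - Delta and E0 + Delta

# Diagonalizing H gives the same extremes; eigh sorts eigenvalues in ascending order
evals, evecs = np.linalg.eigh(H)
print(evals)
```

The lowest eigenvector returned by `eigh` agrees with |\psi_-\rangle up to an overall sign, which is physically irrelevant.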

So, the electron is ‘happier’ (electrons don’t have moods either) to be in the state |\psi_-\rangle than to be localized near only one of the protons. In other words (and this is Chemistry 101), atoms like to share electrons, and this sharing bonds them. Also, they like to share electrons in a particular and symmetric way.

For reference, |\psi_+ \rangle is called the ‘antibonding state’. If the electron is in this state, the atoms will repel each other, and so much for the molecule!

How to classically add quantum things

How can we tell the difference between an electron being in a superposition of two states, and just not knowing its ‘real’ position? Well, first we need to devise a way to describe probabilistic mixtures.

It looks simple—if we have an electron in the state |1\rangle or |2\rangle with probabilities 1/2, we may be tempted to write

|\psi\rangle = \tfrac{1}{\sqrt{2}} |1\rangle + \tfrac{1}{\sqrt{2}} |2\rangle

We’re getting the right probabilities, so it looks legit. But there is something strange about the energy. We have obtained the state |\psi_+\rangle with energy E_0+\Delta by mixing two states with the energy E_0!

Moreover, we could have used different amplitudes such that |\alpha|^2=|\beta|^2=1/2 and gotten different energies. So, we need to devise a way to avoid guessing amplitudes. All in all, we used quotation marks for ‘square roots’ for a reason!

It turns out that to describe statistical mixtures we can use density matrices.

The states we’d been looking at before are described by vectors like this:

| \psi \rangle =      \begin{bmatrix}          \alpha \\ \beta      \end{bmatrix}

These are called ‘pure states’. For a pure state, here is how we create a density matrix:

\rho = | \psi \rangle \langle \psi |      \equiv      \begin{bmatrix}          \alpha \alpha^* & \alpha \beta^*\\          \beta \alpha^* & \beta \beta^*       \end{bmatrix}

On the diagonal we get probabilities (|\alpha|^2 and |\beta|^2), whereas the off-diagonal terms (\alpha \beta^* and its complex conjugate) are related to the presence of quantum effects. For example, for |\psi_-\rangle we get

\rho =      \begin{bmatrix}          1/2 & -1/2\\          -1/2 & 1/2       \end{bmatrix}

For an electron in the state |1\rangle we get

\rho =      \begin{bmatrix}          1 & 0\\          0 & 0       \end{bmatrix}.
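The construction \rho = | \psi \rangle \langle \psi | is a one-liner in NumPy: an outer product of the state with its complex conjugate. This is a minimal sketch; the helper name `density_matrix` is our own, and the complex state at the end is an extra example (not from the post) showing why the conjugate matters.

```python
import numpy as np

def density_matrix(psi):
    """Pure-state density matrix rho = |psi><psi|: outer product of psi with its conjugate."""
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj())

psi_minus = np.array([1, -1]) / np.sqrt(2)
ket1 = np.array([1, 0])

rho_minus = density_matrix(psi_minus)   # +1/2 on the diagonal, -1/2 off it
rho_1 = density_matrix(ket1)            # a single 1 in the top-left corner

# With complex amplitudes the off-diagonal entries come in conjugate pairs
psi_complex = np.array([1, 1j]) / np.sqrt(2)
rho_complex = density_matrix(psi_complex)

print(rho_minus.real)
print(rho_1.real)
print(rho_complex)               # off-diagonal entries close to -0.5j (top right) and +0.5j
print(np.diag(rho_minus).real)   # the probabilities |alpha|^2 and |beta|^2
```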

To calculate the energy, the recipe is the following:

E = \mathrm{tr}[H \rho]

where \mathrm{tr} is the ‘trace’: the sum of the diagonal entries. For an n \times n square matrix with entries A_{ij} its trace is

\mathrm{tr}(A) = A_{11} + A_{22} + \ldots + A_{nn}

Exercise 3. Show that this formula for energy, and the previous one, give the same result on pure states.

I advertised that density matrices allow us to mix quantum states. How do they do that? Very simply: by adding density matrices, multiplied by the respective probabilities:

\rho = p_1 \rho_1 + p_2 \rho_2 + \cdots + p_n \rho_n

It is exactly how we would mix probability vectors. Indeed, the diagonals are probability vectors!

So, let’s say that our co-worker was drunk and we are not sure if (s)he said that the state is |\psi_-\rangle or |1\rangle, but we estimate the probabilities to be 1/3 and 2/3, respectively. We get the density matrix:

\rho =      \begin{bmatrix}          5/6 & -1/6\\          -1/6 & 1/6       \end{bmatrix}

So, how about its energy?

Exercise 4. Show that calculating energy using density matrix gives the same result as averaging energy over component pure states.
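Here is a numerical version of the co-worker example: a check of Exercise 4 for one case, not a proof. It is a NumPy sketch with the arbitrary values E_0 = 1.0 and \Delta = 0.25; the helper `density_matrix` is our own name, not a library function.

```python
import numpy as np

E0, Delta = 1.0, 0.25                      # arbitrary illustrative values
H = np.array([[E0, Delta], [Delta, E0]])

def density_matrix(psi):
    """Pure-state density matrix |psi><psi|."""
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj())

psi_minus = np.array([1, -1]) / np.sqrt(2)
ket1 = np.array([1, 0])

# The drunk co-worker's state: |psi_-> with probability 1/3, |1> with probability 2/3
probs = [1 / 3, 2 / 3]
states = [psi_minus, ket1]
rho = sum(p * density_matrix(s) for p, s in zip(probs, states))
print(rho.real)   # should match [[5/6, -1/6], [-1/6, 1/6]]

# Energy from the density matrix vs. probability-weighted pure-state energies
E_density = np.trace(H @ rho).real
E_average = sum(p * np.vdot(s, H @ s).real for p, s in zip(probs, states))
print(E_density, E_average)   # both should equal E0 - Delta/3
```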

I may have given the impression that the density matrix is an artificial thing, at best a practical trick, and that what we ‘really’ have are pure states (vectors), each with a given probability. If so, the next exercise is for you:

Exercise 5. Show that a 50%-50% mixture of |1\rangle and |2\rangle is the same as a 50%-50% mixture of |\psi_+\rangle and |\psi_-\rangle.
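If you want to confirm Exercise 5 numerically before (or after) doing it by hand, here is a minimal NumPy sketch with a small hypothetical helper `density_matrix`.

```python
import numpy as np

def density_matrix(psi):
    """Pure-state density matrix |psi><psi|."""
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj())

ket1, ket2 = np.array([1, 0]), np.array([0, 1])
psi_plus = np.array([1, 1]) / np.sqrt(2)
psi_minus = np.array([1, -1]) / np.sqrt(2)

mix_localized = 0.5 * density_matrix(ket1) + 0.5 * density_matrix(ket2)
mix_spread = 0.5 * density_matrix(psi_plus) + 0.5 * density_matrix(psi_minus)

# Both mixtures reduce to the same matrix: half the identity
print(np.allclose(mix_localized, mix_spread))       # True
print(np.allclose(mix_localized, np.eye(2) / 2))    # True
```

No measurement can distinguish the two preparations, since every prediction depends only on \rho.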

This is different from classical statistical mechanics, or statistics, where we can always think of probability distributions as uniquely defined statistical mixtures of possible states. Here, as we see, it can be a bit more tricky.

As we said, for the diagonals things work as for classical probabilities. But there is more: at the same time as adding probabilities, we also add the off-diagonal terms, which can cancel out, depending on their signs. This is why mixing quantum states may make them lose their quantum properties.

The value of the off-diagonal term is related to so-called ‘coherence’ between the states |1\rangle and |2\rangle. Its magnitude is bounded by the respective probabilities:

\left| \rho_{12} \right| \leq \sqrt{\rho_{11}\rho_{22}} = \sqrt{p_1 p_2}

where for pure states we get equality.

If the value is zero, there are no quantum effects between two positions: this means that the electron is sure to be at one place or the other, though we might be uncertain at which place. This is fundamentally different from a superposition (non-zero \rho_{12}), where we are uncertain at which site a particle is, but it can no longer be thought to be at one site or the other: it must be in some way associated with both simultaneously.
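The coherence bound can be checked on states from this post. This is a minimal NumPy sketch with hypothetical helpers `density_matrix` and `coherence_ratio`. The ratio |\rho_{12}| / \sqrt{\rho_{11}\rho_{22}} should equal 1 for a pure state (here an arbitrary example with amplitudes 0.6 and 0.8i), 1/\sqrt{5} \approx 0.447 for the drunk co-worker's mixture, and 0 for a 50%-50% mixture of |1\rangle and |2\rangle.

```python
import numpy as np

def density_matrix(psi):
    """Pure-state density matrix |psi><psi|."""
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj())

def coherence_ratio(rho):
    """|rho_12| / sqrt(rho_11 * rho_22); needs both diagonal entries to be nonzero."""
    p1, p2 = rho[0, 0].real, rho[1, 1].real
    return abs(rho[0, 1]) / np.sqrt(p1 * p2)

ket1, ket2 = np.array([1, 0]), np.array([0, 1])
psi_minus = np.array([1, -1]) / np.sqrt(2)

pure = density_matrix(np.array([0.6, 0.8j]))                # any pure state: ratio 1
drunk = density_matrix(psi_minus) / 3 + 2 * density_matrix(ket1) / 3
classical = 0.5 * density_matrix(ket1) + 0.5 * density_matrix(ket2)

for name, rho in [("pure", pure), ("co-worker mixture", drunk), ("classical", classical)]:
    print(f"{name:>18}: {coherence_ratio(rho):.3f}")
```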

Exercise 6. For each c \in [-1,1] propose how to obtain a mixed state described by density matrix

\rho =       \begin{bmatrix}          1/2 & c/2\\          c/2 & 1/2       \end{bmatrix}

by mixing pure states of your choice.

A spatial wavefunction

A similar thing works for position. Instead of a two-level system let’s take a particle in one dimension. The analogue of a state vector is a wavefunction, a complex-valued function on a line:


In this continuous variant, p(x) = |\psi(x)|^2 is the probability density of finding the particle at position x.

We construct the density matrix (or rather, ‘density operator’) in a way that is analogous to what we did for the two-level system:

\rho(x, x') = \psi(x) \psi^*(x')

Instead of a 2×2 matrix, it is a complex function of two real variables. The probability density is given by its diagonal values, i.e.

p(x) = \rho(x,x)

Again, we may wonder if the particle energetically favors being in many places at once. Well, it does.

Density matrices for a classical and a quantum state. They yield the same probability distributions (for positions). However, their off-diagonal values (i.e. x \neq x') are different. The classical state is just a probabilistic mixture of a particle being in a particular place.

What would happen if we had a mixture of perfectly localized particles? Due to Heisenberg’s uncertainty principle we have

\Delta x \Delta p \geq \frac{\hbar}{2},

that is, the product of the standard deviations of position and momentum is at least \hbar/2.

If we know the position exactly, then the uncertainty of momentum goes to infinity. (The same holds if we don’t know the position but it could be known, even in principle. Quantum mechanics couldn’t care less whether the particle’s position is known by us, by our friend, by our detector or by a dust particle.)

The Hamiltonian represents energy; the energy of a free particle in a continuous system is

H = \frac{p^2}{2m}

where m is its mass, and p is its momentum: that is, mass times velocity. So, if the particle is completely localized:

• its energy is infinite,
• its velocity is infinite, so in no time its wavefunction will spread everywhere.

Infinite energies sometimes happen in physics. But if we get infinite velocities, we see that there is something wrong. So a particle needs to be spread out, or ‘delocalized’, to some degree, to have finite energy.
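To see the divergence quantitatively, here is a minimal sketch (our own construction, not from the post) for a Gaussian wavefunction of width \sigma, in units where \hbar = m = 1. For this state \Delta x = \sigma and \Delta p = \hbar/(2\sigma), so it saturates the uncertainty bound, and its kinetic energy is \hbar^2/(8 m \sigma^2), which blows up as \sigma \to 0. The code checks that formula with finite differences.

```python
import numpy as np

HBAR, M = 1.0, 1.0   # units with hbar = m = 1 (a choice made for illustration)

def gaussian(x, sigma):
    """Normalized Gaussian wavefunction; |psi|^2 has standard deviation sigma."""
    return (2 * np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (4 * sigma**2))

def kinetic_energy(sigma, n_points=200_001):
    """<p^2>/(2m) = (hbar^2/2m) * integral of |dpsi/dx|^2, via finite differences."""
    x = np.linspace(-20 * sigma, 20 * sigma, n_points)
    dx = x[1] - x[0]
    dpsi = np.gradient(gaussian(x, sigma), dx)
    return HBAR**2 / (2 * M) * np.sum(np.abs(dpsi) ** 2) * dx

for sigma in [1.0, 0.1, 0.01]:
    exact = HBAR**2 / (8 * M * sigma**2)   # from Delta p = hbar / (2 sigma)
    print(f"sigma = {sigma:<5} numeric = {kinetic_energy(sigma):10.4f}  exact = {exact:10.4f}")
```

Each tenfold squeeze of the wavefunction costs a hundredfold increase in kinetic energy.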

As a side note, to consider high energies we would need to employ special relativity. In fact, one cannot localize a massive particle arbitrarily well: once the energy associated with its momentum uncertainty becomes comparable to its rest-mass energy, the attempt creates a soup of particles and antiparticles; see the Darwin term in the fine structure.

Moreover, depending on the degree of its delocalization, its behavior is different. For example, a statistical mixture of highly localized particles would spread a lot faster than a state with the same p(x) derived from a single wavefunction. The density matrix of the former would lie in between that of the pure state (a ‘circular’ Gaussian function) and that of the classical state (a ‘linear’ Gaussian). That is, it would be an ‘oval’ Gaussian, with off-diagonal values smaller than for the pure state.

Let us look at two Gaussian wavefunctions, with varying levels of coherent superposition between them. That is, each Gaussian is already a superposition, but when we combine the two we may use a superposition, a mixture, or something in between. For a perfect superposition of the two Gaussians, we would have the density matrix

\rho(x,x') = \frac{1}{2} \left( \phi(x+\tfrac{d}{2}) + \phi(x-\tfrac{d}{2}) \right) \left( \phi(x'+\tfrac{d}{2}) + \phi(x'-\tfrac{d}{2}) \right)

where \phi(x) is a normalized Gaussian function. For a statistical mixture between these Gaussians split by a distance of d, we would have:

\rho(x,x') = \frac{1}{2} \phi(x+\tfrac{d}{2}) \phi(x'+\tfrac{d}{2})  +  \frac{1}{2} \phi(x-\tfrac{d}{2}) \phi(x'-\tfrac{d}{2})

And in general,

\begin{array}{ccl}  \rho(x,x') &=& \frac{1}{2} \left( \phi(x+\tfrac{d}{2}) \phi(x'+\tfrac{d}{2})  +  \phi(x-\tfrac{d}{2}) \phi(x'-\tfrac{d}{2})\right) + \\ \\  && \frac{c}{2} \left( \phi(x+\tfrac{d}{2}) \phi(x'-\tfrac{d}{2})  + \phi(x-\tfrac{d}{2}) \phi(x'+\tfrac{d}{2}) \right)  \end{array}

for some |c| \leq 1.
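This family of density matrices is easy to tabulate on a grid. In this minimal NumPy sketch the Gaussian width \sigma = 0.5 and the grid are our own illustrative choices, with d = 4 so that the centers sit at -2 and +2 as in the picture. The factor 1/2 normalizes \rho only when the two Gaussians barely overlap (d much larger than \sigma); with these numbers the overlap correction is below 10^{-3}. The diagonal, and hence p(x), hardly changes with c, while the coherence \rho(-2, +2) scales linearly with c.

```python
import numpy as np

def phi(x, sigma=0.5):
    """Normalized real Gaussian wavefunction centered at 0 (sigma is an illustrative choice)."""
    return (2 * np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (4 * sigma**2))

def two_gaussian_rho(x, d, c):
    """rho(x, x') on a grid for Gaussians at -d/2 and +d/2 with coherence c, |c| <= 1."""
    left, right = phi(x + d / 2), phi(x - d / 2)
    # 'classical' part: an equal mixture of the two localized states
    rho = 0.5 * (np.outer(left, left) + np.outer(right, right))
    # coherence between the two locations, weighted by c
    rho += 0.5 * c * (np.outer(left, right) + np.outer(right, left))
    return rho

x = np.linspace(-6, 6, 601)
dx = x[1] - x[0]
d = 4.0                                   # centers at -2 and +2
i_left, i_right = np.argmin(np.abs(x + 2)), np.argmin(np.abs(x - 2))

for c in [1.0, 0.5, 0.0, -1.0]:
    rho = two_gaussian_rho(x, d, c)
    total = np.diag(rho).sum() * dx       # integral of p(x) = rho(x, x)
    print(f"c = {c:+.1f}  total probability = {total:.4f}  "
          f"rho(-2, +2) = {rho[i_left, i_right]:+.4f}")
```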

PICTURE: Two Gaussian wavefunctions (centered at -2 and +2) in a coherent superposition with each other (the first and the last plot) and in a statistical mixture (the middle plot); the 2nd and 4th plots show intermediate states. Superpositions can have different phases, much like in the hydrogen example. Color intensity represents absolute value and hue represents phase; here red is for positive numbers and teal is for negative.



We have learnt the difference between quantum superposition and a statistical mixture of states. In particular, while both of these descriptions may give the same probabilities, their predictions on the physical properties of states differ. For example, we need an electron to be delocalized in a specific way to describe chemical bonds, and we need delocalization of any particle to predict its movement.

We used density matrices to express both quantum superposition and (classical) lack of knowledge on the same footing, and we identified their off-diagonal terms as the ones related to quantum coherence.

But what if there were not only two states, but many? So, instead of \mathrm{H}_2^+ (we were not even considering the full hydrogen molecule, but only its ionized version), how about an electronic excitation in something bigger? Not even \mathrm{C}_2\mathrm{H}_5\mathrm{OH} or some sugar, but a protein complex!

So, this will be your homework (cf. this homework on topology). Just kidding: there will be another blog post.

March 18, 2015

Jordan EllenbergMath Bracket 2015

March Math Madness is here!  Presenting the 2015 math bracket, as usual prepared by our crack team of handicappers here at the UW math department.  As always, remember that the math bracket is for entertainment purposes only and you should not take offense if the group rated your department lower than the plainly inferior department that knocked you out.  Under no circumstances should you use the math bracket to decide where to go to grad school.

Lots of tough choices this year!


Chad OrzelJust How Idiotic Are GPAs?

Yesterday’s quick rant had the slightly clickbait-y title “GPAs are Idiotic,” because, well, I’m trying to get people to read the blog, y’know. It’s a little hyperbolic, though, and wasn’t founded in anything but a vague intuition that the crude digitization step involved in going from numerical course averages to letter grades then back to multi-digit GPA on a four-point scale is a silly addition to the grading process.

But, you know, that’s not really scientific, and I have access to sophisticated computing technology, so we can simulate the problematic process, and see just how much trouble that digitization step would cause.

So, I simulated a “B” student: I generated a list of 36 random numbers with a mean of 0.85 (which is generally a “B” when I assign letter grades) and a standard deviation of 0.07 (not quite a full letter grade the way things usually shake out), then wrote a bunch of “If” statements to convert those to “letter” grades on the scale we normally use (anything between 0.87 and 0.90 is a “B+” and gets converted to 3.3, anything between 0.90 and 0.93 is an “A-” and gets converted to 3.7). The result looks like this:

Conversion from decimal “class average” to “letter grade.”

I averaged these “letter” grades together to get a simulated “GPA” for this imaginary set of 36 classes (the minimum number required to graduate from Union). To get something directly comparable, I converted this “GPA” back to a decimal score using a linear fit to the step function data shown above.

Then I repeated this a whole bunch of times, ending up with 930 “GPA” scores. And got a plot like this (also seen as the “featured image” above) comparing the “GPA” calculated from the “letter” grades to the average of the original “class average” without the intermediate letter step:

Comparison of “GPA” for simulated B students with and without the intermediate step of passing through letter grades.

So, what does this say? Well, this tells us that the conversion to letter grades and then back adds noise– if the two “average” grades were in perfect agreement, all those points would fall on a single line, like this sanity-check plot of the output of the conversion function:

A plot of the back-converted decimal score versus the “GPA” it was calculated from.

The thicker band of points you see in the plot of the real simulation indicates that, as expected, the crude digitization step of converting to letter grades and then averaging adds some noise to the system– the resulting “GPA” is sometimes higher than the “real” average without digitization, and sometimes lower. Usually lower, actually, because the fit function I was using to do the conversion skews that way slightly, but it doesn’t really matter; what matters is the “width” of that band. Depending on the exact details of the digitization, a particular “real” average falls somewhere within a range of about 0.01– a student with an 0.84 “real” average will end up somewhere between 0.835 and 0.845 after passing through the “letter grade” step. Or, if you want this on a 4-point GPA scale, it’s about 0.05 GPA points, so between 3.00 and 3.05.

(The standard deviations of the two distributions of scores are basically identical, but both a bit higher than the input to the simulation– 0.110 and 0.111 for the original and after-letter averages, respectively. This is probably an artifact of some shortcuts I took when generating the 930 points you see in that plot, but I’m not too worried about the difference.)
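For readers who want to play with this, here is a minimal NumPy sketch of the procedure described above. Only the B+ (0.87 to 0.90, 3.3 points) and A- (0.90 to 0.93, 3.7 points) bins come from the post; the remaining cutoffs, the clipping of simulated scores to [0, 1], the fit range and the random seed are our own assumptions, so its numbers will not match the plots exactly.

```python
import numpy as np

# Hypothetical letter-grade scale. Only the B+ (0.87-0.90 -> 3.3) and A- (0.90-0.93 -> 3.7)
# bins are taken from the post; the other cutoffs and point values are assumptions.
CUTOFFS = np.array([0.60, 0.67, 0.70, 0.73, 0.77, 0.80, 0.83, 0.87, 0.90, 0.93])
POINTS = np.array([0.0, 1.0, 1.3, 1.7, 2.0, 2.3, 2.7, 3.0, 3.3, 3.7, 4.0])

def to_grade_points(averages):
    """Step function: decimal class average -> grade points (replaces the 'If' chain)."""
    return POINTS[np.searchsorted(CUTOFFS, averages, side="right")]

N_CLASSES, N_STUDENTS = 36, 930      # classes per student, simulated students
MEAN, SD = 0.85, 0.07                # a simulated "B" student

# Linear fit to the step function, used to map a GPA back to a decimal score
grid = np.linspace(0.70, 1.00, 3001)
slope, intercept = np.polyfit(grid, to_grade_points(grid), 1)

rng = np.random.default_rng(seed=1)
scores = np.clip(rng.normal(MEAN, SD, size=(N_STUDENTS, N_CLASSES)), 0.0, 1.0)

true_avg = scores.mean(axis=1)                    # no letter-grade step
gpa = to_grade_points(scores).mean(axis=1)        # through letter grades
back_converted = (gpa - intercept) / slope        # GPA mapped back to a decimal

residual = back_converted - true_avg
print(f"mean shift {residual.mean():+.4f}, spread (sd) {residual.std():.4f}")
print(f"sd of averages: {true_avg.std():.4f} (direct), {back_converted.std():.4f} (via letters)")
```

Using `np.searchsorted` with `side="right"` puts a score exactly on a cutoff into the higher bin, matching the “anything between 0.87 and 0.90 is a B+” convention.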

How bad is that? Enh, it’s kind of hard to say. The class rank that we report on transcripts certainly turns on smaller GPA differences than that– we report GPA to three decimal places for a reason. So to the extent that those rankings actually matter, it’s probably not good to have that step in there.

But, on the other hand, the right thing to talk about is probably comparing the noise introduced by the letter-grade step to the inherent uncertainty in the grading process, and in that light, it doesn’t look so bad. That is, I’m not sure I would trust the class average grades I calculate for a typical intro course to be much better than plus or minus one percent, which is what you see in the back-converted average. The noise from passing through letter grades is probably comparable to the noise inherent in assigning numerical grades in the first place.

So, what about yesterday’s headline? Well, from a standpoint of the extra effort required on the part of faculty who have to assign letter grades that then get converted back to numbers, it’s still a silly and pointless step. But in the grand scheme of things, it’s probably not doing all that much damage.


(Important caveat: These results depend on the exact numbers I picked for the simulation– mean of 0.85, standard deviation of 0.07– and those were more or less pulled out of thin air. I could probably do something a bit more rigorous by looking at student GPA data and my old grade sheets and so on, but I don’t care that much. This quick-and-dirty analysis is enough to satisfy my curiosity on this question.)

Tommaso DorigoWatch The Solar Eclipse On Friday!

On the morning of March 20th Europeans will be treated to the amazing show of a total solar eclipse. The path of totality is unfortunately confined to the northern Atlantic Ocean and will miss Iceland and England, passing only over the Faroe Islands; no wonder there has been no hotel room available there since last September! Curiously, the totality will end at the North Pole, where on March 20th the sun sits exactly on the horizon. Hence the conditions for a great shot like the one below are perfect; I only hope somebody will be at the North Pole with a camera...

(Image credit: Fred Bruenjes;


BackreactionNo foam: New constraint on space-time fluctuations rules out stochastic Planck scale dispersion

The most abused word in science writing is “space-time foam.” You’d think it is a technical term, but it isn’t – space-time foam is just a catch-all phrase for some sort of quantum gravitational effect that alters space-time at the Planck scale. The technical term, if there is any, would be “Planck scale effect”. And please note that I didn’t say quantum gravitational effects “at short distances” because that is an observer-dependent statement and wouldn’t be compatible with Special Relativity. It is generally believed that space-time is affected at high curvature, which is an observer-independent statement and doesn’t a priori have anything to do with distances whatsoever.

Having said that, you can of course hypothesize Planck scale effects that do not respect Special Relativity and then go out to find constraints, because maybe quantum gravity does indeed violate Special Relativity? There is a whole paper industry behind this since violations of Special Relativity tend to result in large and measurable consequences, in contrast to other quantum gravity effects, which are tiny. A lot of experiments have been conducted already looking for deviations from Special Relativity. And one after the other they have come back confirming Special Relativity, and General Relativity by extension. Or, as the press has it: “Einstein was right.”

Since there are so many tests already, it has become increasingly hard to still believe in Planck scale effects that violate Special Relativity. But players gonna play and haters gonna hate, and so some clever physicists have come up with models that supposedly lead to Einstein-defeating Planck scale effects which could be potentially observable, be indicators for quantum gravity, and are still compatible with existing observations. A hallmark of these deviations from Special Relativity is that the propagation of light through space-time becomes dependent on the wavelength of the light, an effect which is called “vacuum dispersion”.

There are two different ways this vacuum dispersion of light can work. One is that light of shorter wavelength travels faster than that of longer wavelength, or the other way round. This is a systematic dispersion. The other way is that the dispersion is stochastic, so that the light sometimes travels faster, sometimes slower, but on the average it moves still with the good, old, boring speed of light.

The first of these cases, the systematic one, has been constrained to high precision already, and no Planck scale effects have been seen. This has been discussed for a decade or so, and I think (hope!) that by now it’s pretty much off the table. You can of course always come up with some fancy reason for why you didn’t see anything, but this is arguably unsexy. The second case, stochastic dispersion, is harder to constrain because on average you do get back Special Relativity.

I already mentioned in September last year that Jonathan Granot gave a talk at the 2014 conference on “Experimental Search for Quantum Gravity” where he told us he and collaborators had been working on constraining the stochastic case. I tentatively inquired if they saw any deviations from no effect and got a head shake, but was told to keep my mouth shut until the paper is out. To make a long story short, the paper has appeared now, and they don’t see any evidence for Planck scale effects whatsoever:
A Planck-scale limit on spacetime fuzziness and stochastic Lorentz invariance violation
Vlasios Vasileiou, Jonathan Granot, Tsvi Piran, Giovanni Amelino-Camelia
What they did for this analysis is to take a particularly pretty gamma ray burst, GRB090510. The photons from gamma ray bursts like this travel over very long distances (some Gpc), during which the deviations from the expected travel time add up. The gamma ray spectrum can also extend to quite high energies (about 30 GeV for this one), which is helpful because the dispersion effect is supposed to become stronger at high energy.

What the authors do then is basically to compare the lower energy part of the spectrum with the higher energy part and see if they have a noticeable difference in the dispersion, which would tend to wash out structures. The answer is, no, there’s no difference. This in turn can be used to constrain the scale at which effects can set in, and they get a constraint a little higher than the Planck scale (1.6 times) at high confidence (99%).

It’s a neat paper, well done, and I hope this will put the case to rest.

Am I surprised by the finding? No. Not only because I knew the result since September, but also because the underlying models that give rise to such effects are theoretically unsatisfactory, at least as far as I am concerned. This is particularly apparent for the systematic case. In the systematic case the models are either long ruled out already because they break Lorentz-invariance, or they result in non-local effects which are also ruled out already. Or, if you want to avoid both they are simply theoretically inconsistent. I showed this in a paper some years ago. I also mentioned in that paper that the argument I presented does not apply to the stochastic case. However, I added this because I simply wasn’t in the mood to spend more time on this than I already had. I am pretty sure you could use the argument I made also to kill the stochastic case on similar reasoning. So that’s why I’m not surprised. It is of course always good to have experimental confirmation.

While I am at it, let me clear out a common confusion with these types of tests. The models that are being constrained here do not rely on space-time discreteness, or “graininess” as the media likes to put it. It might be that some discrete models give rise to the effects considered here, but I don’t know of any. There are discrete models of space-time of course (Causal Sets, Causal Dynamical Triangulation, LQG, and some other emergent things), but there is no indication that any of these leads to an effect like the stochastic energy-dependent dispersion. If you want to constrain space-time discreteness, you should look for defects in the discrete structure instead.

And because my writer friends always complain the fault isn’t theirs but the fault is that of the physicists who express themselves sloppily, I agree, at least in this case. If you look at the paper, it’s full of foam, and it totally makes the reader believe that the foam is a technically well-defined thing. It’s not. Every time you read the word “space-time foam” make that “Planck scale effect” and suddenly you’ll realize that all it means is a particular parameterization of deviations from Special Relativity that, depending on taste, is more or less well-motivated. Or, as my prof liked to say, a parameterization of ignorance.

In summary: No foam. I’m not surprised. I hope we can now forget deformations of Special Relativity.

Jordan EllenbergStrawberries and Cream

I discovered yesterday, three nested directories down in my math department account, that I still had a bunch of files from my last desktop Mac, which retired in about 2003. And among those files were backups from my college Mac Plus, and among those files were backups from 3 1/4″ discs I used on the family IBM PC in the late 1980s. Which is to say I have readable text files of almost every piece of writing I produced from age 15 through about 25.

Very weird to encounter my prior self so directly. And surprising that so much of it is familiar to me, line by line. I can see, now, who I liked to rip off: Raymond Carver, a lot. Donald Barthelme. There’s one poem where I’m pretty sure I was going for “mid-80s Laurie Anderson lyrics.” Like everyone else back then I was really into worrying about nuclear war. I produced two issues of a very mild-mannered underground newspaper called “Ground Zero” with a big mushroom cloud on the front, for the purpose of which my pseudonym was “Bogus Librarian.” (I really liked Bill and Ted’s. Still do, actually.) Anyway, there’s a nuclear war story in this batch, which ends like this: “And the white fire came, and he wept no more.” Who is “he”? The President, natch.

But actually what I came here to include is the first thing I really remember writing, which is a play, called “Strawberries and Cream.”  I wrote it for Harold White’s 9th grade English class.  The first time I met Mr. White he said “Who’s your favorite author?” and I said “I don’t know, I don’t think I had one,” and he said, “Well, that’s terrible, everyone should have a favorite author,” and I probably should have felt bullied but instead felt rather adult and taken seriously.

A central element of his English class was writing imitations of writers, one in each genre.  So I wrote an imitation John Cheever story, and I think an imitation Edna St. Vincent Millay poem (I can’t find this one, tragically.) But the thing Mr. White asked me to read that really sang to me was The Bald Soprano.  Was it that obvious, from the outside, that it was mid-century Continental absurdism I was lacking?  Or was it just a lucky guess?

Anyway:  below the fold, please enjoy “Strawberries and Cream,” the imitation Eugene Ionesco play I wrote when I was 15.


Strawberries and Cream

[Curtain opens. The set is a blank, featureless room, lit brightly.
In the back of the stage is a bare wooden door. Upstage right there is
a piano, but it will remain unlighted until it is used. In the front
left corner ANTHONY sits at a desk typing quietly. SANE AUTHOR and MAD
AUTHOR are playing poker in front stage.]

Mad Author : I have three of a kind. You?
Sane Author: I have many paper rectangular objects, marked with numbers
and symbols. Two of my numbers seem to be equal, that is
to say the same. Of course, they may not be at all. For
example, one is marked in red and one in black. The
asymmetry in colour gives a Nietzschean twist to the
Mad Author : One pair. [takes the pot and deals another hand]
I have two pair. You?
Sane Author: Five of a kind.
Mad Author : [examines SA’s cards] You have no five of a kind. All of
your cards are different.
Sane Author: Ah, but they are all cards.
Mad Author : Very true.
[SANE AUTHOR takes the pot]
[Door opens. Enter PRIEST. MAD AUTHOR hides cards.]
Priest : Bless you.
Mad Author : I haven’t sneezed.
Priest : But you will, quite soon, and then I shall not have to
bless you again.
Sane Author: Very good planning, Father. My friend here has always been
exceptional in the nasal impulse, as it were. Your
blessing is a marvelous Hegelian tribute to the power of
determinism in the human scene.
Priest : Thank you. [MA sneezes] There, see?
[SA pulls out a top and begins spinning it]
Mad Author : What are you doing?
Sane Author: I am exploring the philosophical realms of existence.
Mad Author : You are spinning a top.
Sane Author: But a very philosophical top it is.
[Door opens. Enter TOM, DICK, HARRY, and COLONEL. TOM, DICK and HARRY
are all dressed identically. COLONEL is in full uniform and carrying
two colored flags.]
Mad Author : Ah, hello, Tom.
Sane Author: Ah, hello, Dick.
Tom : Hello, Mr. Jones. And how are you?
Priest : Ah, hello, Harry.
Dick : Hello, Mr. Smith. And how are you?
Mad Author : Fine, thank you, Tom.
Harry : Hello, Father, and how are you?
Sane Author: Fine, thank you, Dick.
Tom : Nice weather we’re having.
Priest : Fine, thank you, Harry.
Dick : Awful weather we’re having.
Mad Author : Oh yes, quite.
Harry : They can’t seem to decide whether we’re having-
Sane Author: Oh yes, quite.
Priest : Nice or awful?
Harry : Oh yes, quite.
MA, SA, and Priest : I say, who is your quiet friend?
[COLONEL signals with flags]
Tom : Oh, that’s the Colonel.
Dick : He’s quite blind, you know.
Harry : So he has to speak in Singapore.
Tom, Dick, and Harry: I say, who is your quiet friend?
[ANTHONY types louder for a moment]
Mad Author : Oh, that’s Anthony.
Sane Author: He’s writing a play, you know.
Priest : So he has to type all day.
Tom : What is the name of this play?
Sane Author: It’s called “Strawberries and Cream”.
Tom : Strawberries and cream? Why?
Sane Author: Apparently, a character uses that phrase as a refrain
throughout the play.
Tom : I see.
Priest : Strawberries and cream!
Tom : And may I ask what is this play about?
Sane Author: You may.
Priest : Strawberries and cream! Strawberries and cream!
Tom : What is it about, then?
Priest : Strawberries and cream! Strawberries and cream!
Strawberries and cream!
Harry : Do stop; it’s unbecoming to a man of the cloth.
Sane Author: I’m glad you asked. This play is a masterpiece, a marvel
of nihilistic anti-existentialism. It exposes the
underlying Joycean motivations in the fabric of society.
You DO, I trust, prefer the teachings of Kant in
neo-classicism to the Freudian gedankensystem of Sartre?
Tom : Quite.
Dick : Mr. Smith, are you a philosopher?
Sane Author: No, a carpenter. By hobby, that is. By profession, I’m an
Dick : Oh? What sort?
Sane Author: A sane author. My friend Jones [points at MA] is a mad
author, though.
Harry : Are you sure he’s mad?
Sane Author: Quite sure.
Mad Author : I heard that!
Sane Author: Of course you did, you’re not deaf like the Colonel here.
Priest : I thought they said he was paralyzed.
Sane Author: Never mind.
Tom : How do you know he’s mad?
Sane Author: Ask him and see.
Tom : Mr. Jones, are you mad?
Mad Author : No, not at all.
Tom(to SA) : He says he’s not mad.
Sane Author: Naturally. The mad always insist that they are sane.
Mad Author : But you said YOU were sane!
Sane Author: I did.
Mad Author : So you are mad as well!
Sane Author: No, for the sane also insist that they are sane.
Mad Author : Then how do you know I’m mad?
Sane Author: Just look at you!
[All stare at MAD AUTHOR]
Mad Author : Alright, believe what you will. I still say I’m sane.
Sane Author: You would.
[Door opens. Enter MUSICIAN]
All else : Ah, hello, Musician. Fine, thank you, Musician. Oh yes,
Musician : Hello, all. And how are you? Awfully nice weather we’re
Priest : Strawberries and cream! Strawberries and cream!
Musician : Oh yes, that reminds me..
Mad Author : What?
Musician : Now I’ve forgotten. Deja vu, as the French say.
Tom : Memory is such a fickle thing.
Musician : Exactly so. In fact, I’ve written a song about it.
Sane Author: Let us hear, let us hear!
Musician : Certainly. Bring me my imaginary piano.
[MAD AUTHOR wheels out imaginary piano, wiping brow as if with great
Tom : (whispers to SA) But that is a real piano!
Sane Author: He thinks it is imaginary. Humor him.
[MUSICIAN begins to “play”, hitting the note G-sharp over and over.]
Musician : I call this song “Bananas and Cream”.
Dick : Might “Peaches” not substitute well for “Bananas”?
Harry : Or “Blueberries”?
Musician : (ignoring them, sings tunelessly in a different key)
I remember apples,
And oranges,
And grapefruit,
And melons,
And mangoes,
But O the cream!
[All applaud loudly]
Sane Author: That perfectly captured the quintessential essential
essence of the matter.
Mad Author : How true!
Priest : How true!
Tom : How true!
Dick : How true!
Harry : How true!
[COLONEL waves flags]
Dick : He says.. “How true!”
Tom : A shame.. He was such a great speaker before he went mad.
Mad Author : How true!
Musician : Ah, now I remember. I know you’ll all find this quite
Mad Author : Do tell!
Musician : It was.. No, I’ve forgotten again.
Priest : Strawberries and cream!
Tom : Why do you keep repeating that cryptic phrase?
Priest : What else should I be doing?
Tom : Well, what are priests supposed to do?
Priest : Er…
Sane Author: He is waiting for the Messiah to come!
Priest : (relieved) Yes, that’s just it.
Tom : How do you know that the Messiah is coming?
Priest : Eh.. I’m sure Mr. Smith can explain it best.
Sane Author: Certainly. You admit that our Saviour has gone?
Tom : Oh yes, quite.
Sane Author: Then he must come. It is as simple as what goes down must
come up.
Harry : Ah… the discipline of logic! All men are mortal,
Socrates is a man..
Dick : Therefore, Socrates is a philosopher.
Sane Author: A carpenter, by hobby.
Musician : Cogito ergo sum, as the French say.
Sane Author: (pulls out large sheet of paper) Besides, he’s on the
guest list.
[Door opens. Enter MESSIAH.]
Tom : Ah, there he is now.
Sane Author: He’s upset my top.
Messiah : So pleased to meet you all.
All else : Likewise, I’m sure.
Sane Author: Meow. Meow.
Tom : Why do you keep repeating that cryptic phrase?
Sane Author: I’ve decided that I am a cat.
Tom : How is that?
Priest : We thought you were an author.
Sane Author: Yes, yes, but a cat nonetheless. Examine the evidence. A
cat has four limbs, two eyes, two ears, a nose, and hair
on its head. And so have I! And a cat, of course, may look
upon a king.
Messiah : Or a Messiah.
Tom : Or even a Colonel.
Mad Author : You’ve got no tail.
Sane Author: I’m a Manx cat, then.
Harry : Just so.
Sane Author: Meow.
Musician : Oh, yes, now I remember.
[During this exchange, the COLONEL walks over to the real piano.]
Messiah : Oh, good. I was hoping I hadn’t missed it.
Musician : No, that wasn’t it.
Dick : Oh, dear.
Musician : Memory is such a fickle thing. I’ve written a song about
it, you know.
Mad Author : We know.
Musician : I call it “Peaches and Cream”.
Dick : Might “Bananas” not substitute well for “Peaches”?
Musician : Perhaps.
[COLONEL begins to play beautiful, difficult piece in a minor key.]
Tom : There goes the Colonel on his imaginary piano.
Musician : It looks quite real to me.
Harry : He was a very good pianist before he died.
Musician : It’s right on the tip of my tongue.
Mad Author : Is it?
Musician : It is. It’s quite important, I think. I can’t seem to
Sane Author: I know how it is.
Musician : That’s life, as the French say.
Priest : Strawberries and cream!
Messiah : Strawberries and cream!
Sane Author: Strawberries and cream!
All chant : Strawberries and cream! Strawberries and cream!
Strawberries and cream! Strawberries and cream!
[The chant continues, growing softer and softer. No one moves.
ANTHONY’s typing grows louder. The lighting gains a red tint.]
Musician : I remember now! I have it!
[Anthony, with a flourish, stops typing and tears out the last sheet
with a flourish. He hands it to the Musician.]
Musician : (reads) Curtain closes.
[Curtain closes.]

John BaezPlanets in the Fourth Dimension

You probably know that planets go around the sun in elliptical orbits. But do you know why?

In fact, they’re moving in circles in 4 dimensions. But when these circles are projected down to 3-dimensional space, they become ellipses!

This animation by Greg Egan shows the idea:

The plane here represents 2 of the 3 space dimensions we live in. The vertical direction is the mysterious fourth dimension. The planet goes around in a circle in 4-dimensional space. But down here in 3 dimensions, its ‘shadow’ moves in an ellipse!

What’s this fourth dimension I’m talking about here? It’s a lot like time. But it’s not exactly time. It’s the difference between ordinary time and another sort of time, which flows at a rate inversely proportional to the distance between the planet and the sun.

The movie uses this other sort of time. Relative to this other time, the planet is moving at constant speed around a circle in 4 dimensions. But in ordinary time, its shadow in 3 dimensions moves faster when it’s closer to the sun.

All this sounds crazy, but it’s not some new physics theory. It’s just a different way of thinking about Newtonian physics!

Physicists have known about this viewpoint at least since 1980, thanks to a paper by the mathematical physicist Jürgen Moser. Some parts of the story are much older. A lot of papers have been written about it.

But I only realized how simple it is when I got this paper in my email, from someone I’d never heard of before:

• Jesper Göransson, Symmetries of the Kepler problem, 8 March 2015.

I get a lot of papers by crackpots in my email, but the occasional gem from someone I don’t know makes up for all those.

The best thing about Göransson’s 4-dimensional description of planetary motion is that it gives a clean explanation of an amazing fact. You can take any elliptical orbit, apply a rotation of 4-dimensional space, and get another valid orbit!

Of course we can rotate an elliptical orbit about the sun in the usual 3-dimensional way and get another elliptical orbit. The interesting part is that we can also do 4-dimensional rotations. This can make a round ellipse look skinny: when we tilt a circle into the fourth dimension, its ‘shadow’ in 3-dimensional space becomes thinner!

In fact, you can turn any elliptical orbit into any other elliptical orbit with the same energy by a 4-dimensional rotation of this sort. All elliptical orbits with the same energy are really just circular orbits on the same sphere in 4 dimensions!

Jesper Göransson explains how this works in a terse and elegant way. But I can’t resist summarizing the key results.

The Kepler problem

Suppose we have a particle moving in an inverse square force law. Its equation of motion is

\displaystyle{ m \ddot{\mathbf{r}} = - \frac{k \mathbf{r}}{r^3} }

where \mathbf{r} is its position as a function of time, r is its distance from the origin, m is its mass, and k says how strong the force is. From this we can derive the law of conservation of energy, which says

\displaystyle{ \frac{m \dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} - \frac{k}{r} = E }

for some constant E that depends on the particle’s orbit, but doesn’t change with time.

Let’s consider an attractive force, so k > 0, and elliptical orbits, so E < 0. Let’s call the particle a ‘planet’. It’s a planet moving around the sun, where we treat the sun as so heavy that it remains perfectly fixed at the origin.

I only want to study orbits of a single fixed energy E. This frees us to choose units of mass, length and time in which

m = 1, \;\; k = 1, \;\; E = -\frac{1}{2}

This will reduce the clutter of letters and let us focus on the key ideas. If you prefer an approach that keeps in the units, see Göransson’s paper.

Now the equation of motion is

\displaystyle{\ddot{\mathbf{r}} = - \frac{\mathbf{r}}{r^3} }

and conservation of energy says

\displaystyle{ \frac{\dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} - \frac{1}{r} = -\frac{1}{2} }

The big idea, apparently due to Moser, is to switch from our ordinary notion of time to a new notion of time! We’ll call this new time s, and demand that

\displaystyle{ \frac{d s}{d t} = \frac{1}{r} }

This new kind of time ticks more slowly as you get farther from the sun. So, using this new time speeds up the planet’s motion when it’s far from the sun. If that seems backwards, just think about it. For a planet very far from the sun, one day of this new time could equal a week of ordinary time. So, measured using this new time, a planet far from the sun might travel in one day what would normally take a week.

This compensates for the planet’s ordinary tendency to move slower when it’s far from the sun. In fact, with this new kind of time, a planet moves just as fast when it’s farthest from the sun as when it’s closest.

Amazing stuff happens with this new notion of time!

To see this, first rewrite conservation of energy using this new notion of time. I’ve been using a dot for the ordinary time derivative, following Newton. Let’s use a prime for the derivative with respect to s. So, for example, we have

\displaystyle{ t' = \frac{dt}{ds} = r }

and

\displaystyle{ \mathbf{r}' = \frac{d\mathbf{r}}{ds} = \frac{dt}{ds}\frac{d\mathbf{r}}{dt} = r \dot{\mathbf{r}} }

Using this new kind of time derivative, Göransson shows that conservation of energy can be written as

\displaystyle{ (t' - 1)^2 + \mathbf{r}' \cdot \mathbf{r}' = 1 }

This is the equation of a sphere in 4-dimensional space!
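As a quick numerical sanity check (a sketch of our own, not taken from Göransson’s paper), we can draw random positions and velocities with energy -1/2 in the units above, form t' = r and \mathbf{r}' = r \dot{\mathbf{r}}, and confirm that (t' - 1)^2 + \mathbf{r}' \cdot \mathbf{r}' = 1 holds up to rounding error. The helper functions and the sampling range are hypothetical, illustrative choices.

```python
import math
import random

# Sketch: check (t' - 1)^2 + r'.r' = 1 for random states with energy -1/2,
# in units m = k = 1. Helper names here are our own, purely illustrative.

def random_unit_vector(rng):
    """Uniformly random direction in 3D (a normalised Gaussian vector)."""
    while True:
        u = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in u))
        if norm > 1e-12:
            return [c / norm for c in u]

def random_bound_state(rng):
    """Random position and velocity with |v|^2/2 - 1/r = -1/2 (requires r < 2)."""
    r = rng.uniform(0.05, 1.95)
    speed = math.sqrt(2.0 / r - 1.0)  # solve the energy equation for |v|
    pos = [r * c for c in random_unit_vector(rng)]
    vel = [speed * c for c in random_unit_vector(rng)]
    return pos, vel

def sphere_residual(pos, vel):
    """(t' - 1)^2 + r'.r' - 1, using t' = r and r' = r * v; zero on the sphere."""
    r = math.sqrt(sum(c * c for c in pos))
    t_prime = r
    r_prime = [r * c for c in vel]
    return (t_prime - 1.0) ** 2 + sum(c * c for c in r_prime) - 1.0

rng = random.Random(0)
worst = max(abs(sphere_residual(*random_bound_state(rng))) for _ in range(1000))
print("largest deviation from the sphere:", worst)  # at the level of rounding error
```

Algebraically the residual is (r - 1)^2 + r^2(2/r - 1) - 1 = 0 for every bound state, so any nonzero output is pure floating-point noise.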

I’ll prove this later. First let’s talk about what it means. To understand it, we should treat the ordinary time coordinate t and the space coordinates (x,y,z) on an equal footing. The point

(t,x,y,z)

moves around in 4-dimensional space as the parameter s changes. What we’re seeing is that the velocity of this point, namely

\mathbf{v} = (t',x',y',z')

moves around on a sphere in 4-dimensional space! It’s a sphere of radius one centered at the point

(1,0,0,0)

With some further calculation we can show some other wonderful facts:

\mathbf{r}''' = -\mathbf{r}'

and

t''' = -(t' - 1)

These are the usual equations for a harmonic oscillator, but with an extra derivative!

I’ll prove these wonderful facts later. For now let’s just think about what they mean. We can state both of them in words as follows: the 4-dimensional velocity \mathbf{v} carries out simple harmonic motion about the point (1,0,0,0).

That’s nice. But since \mathbf{v} also stays on the unit sphere centered at this point, we can conclude something even better: \mathbf{v} must move along a great circle on this sphere, at constant speed!

This implies that the spatial components of the 4-dimensional velocity have mean 0, while the t component has mean 1.

The first part here makes a lot of sense: our planet doesn’t drift ever farther from the Sun, so its mean velocity must be zero. The second part is a bit subtler, but it also makes sense: the ordinary time t moves forward at speed 1 on average with respect to the new time parameter s, but its rate of change oscillates in a sinusoidal way.
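Here is a minimal numerical sketch of these claims. The integrator (classical fourth-order Runge–Kutta) and the starting orbit (perihelion of an ellipse with eccentricity e = 0.6) are our own choices. Writing \mathbf{u} = \dot{\mathbf{r}}, the chain rule turns the equation of motion into the first-order system t' = r, \mathbf{r}' = r\mathbf{u}, \mathbf{u}' = -\mathbf{r}/r^2 in the new time s. Since t' = r, the fact t''' = -(t' - 1) says r'' = 1 - r, so with these initial conditions r(s) = 1 - e\cos s; over one period 2\pi of s, ordinary time should advance by exactly 2\pi and the planet should return to its starting point.

```python
import math

# Sketch (units m = k = 1, E = -1/2): integrate directly in the new time s,
# using  t' = r,  r' = r * u,  u' = -r_vec / r^2,  where u = dr/dt.
# The last equation is our chain-rule step: u' = (du/dt) * (dt/ds).

def deriv(state):
    """s-derivatives of the state (t, x, y, z, ux, uy, uz)."""
    _t, x, y, z, ux, uy, uz = state
    r = math.sqrt(x * x + y * y + z * z)
    r2 = r * r
    return [r, r * ux, r * uy, r * uz, -x / r2, -y / r2, -z / r2]

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h in s."""
    k1 = deriv(state)
    k2 = deriv([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = deriv([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = deriv([s + h * k for s, k in zip(state, k3)])
    return [s + h * (a + 2 * b + 2 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def one_s_period(e=0.6, steps=4000):
    """Start at perihelion and integrate s over [0, 2*pi].

    Returns the final state and the largest deviation of r(s) from
    1 - e*cos(s), the solution of r'' = 1 - r with r(0) = 1 - e, r'(0) = 0.
    """
    r_peri = 1.0 - e
    v_peri = math.sqrt(2.0 / r_peri - 1.0)  # energy |u|^2/2 - 1/r = -1/2
    state = [0.0, r_peri, 0.0, 0.0, 0.0, v_peri, 0.0]
    h = 2.0 * math.pi / steps
    max_dev = 0.0
    for i in range(1, steps + 1):
        state = rk4_step(state, h)
        r = math.sqrt(sum(c * c for c in state[1:4]))
        max_dev = max(max_dev, abs(r - (1.0 - e * math.cos(i * h))))
    return state, max_dev

final, max_dev = one_s_period()
print("ordinary time elapsed:", final[0])      # close to 2*pi
print("final position:", final[1:4])           # close to the start (0.4, 0, 0)
print("max |r(s) - (1 - e cos s)|:", max_dev)  # tiny: r is harmonic in s
```

Because the motion is a plain harmonic oscillation in s, even this fixed-step integrator tracks it closely, including through perihelion.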

If we integrate both sides of

\mathbf{r}''' = -\mathbf{r}'

we get

\mathbf{r}'' = -\mathbf{r} + \mathbf{a}

for some constant vector \mathbf{a}. This says that the position \mathbf{r} oscillates harmonically about a point \mathbf{a}. Since \mathbf{a} doesn’t change with time, it’s a conserved quantity: it’s called the Runge–Lenz vector.
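We can watch \mathbf{a} stay fixed numerically. Combining \mathbf{r}' = r\dot{\mathbf{r}}, the scalar r' = \mathbf{r}\cdot\dot{\mathbf{r}}, and the formula \mathbf{r}'' = (r'\mathbf{r}' - \mathbf{r})/r derived in the details below, the vector \mathbf{a} = \mathbf{r}'' + \mathbf{r} can be written entirely in ordinary-time variables as \mathbf{a} = (\mathbf{r}\cdot\dot{\mathbf{r}})\dot{\mathbf{r}} + \mathbf{r} - \mathbf{r}/r. The sketch below integrates the ordinary equation of motion with Runge–Kutta (our choice of integrator, step count and initial conditions) and tracks this expression over one orbit.

```python
import math

# Sketch (units m = k = 1, E = -1/2): integrate r_ddot = -r / |r|^3 in ordinary
# time and evaluate a = r'' + r = (r . r_dot) r_dot + r - r / |r| along the way.

def deriv(state):
    """t-derivatives of (x, y, z, vx, vy, vz)."""
    x, y, z, vx, vy, vz = state
    r3 = (x * x + y * y + z * z) ** 1.5
    return [vx, vy, vz, -x / r3, -y / r3, -z / r3]

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h in ordinary time."""
    k1 = deriv(state)
    k2 = deriv([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = deriv([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = deriv([s + h * k for s, k in zip(state, k3)])
    return [s + h * (a + 2 * b + 2 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def runge_lenz(state):
    """The vector a = r'' + r, written in ordinary-time variables."""
    pos, vel = state[:3], state[3:]
    r = math.sqrt(sum(c * c for c in pos))
    r_dot_v = sum(p * v for p, v in zip(pos, vel))
    return [r_dot_v * v + p - p / r for p, v in zip(pos, vel)]

state = [0.4, 0.0, 0.0, 0.0, 2.0, 0.0]  # perihelion of an orbit with e = 0.6
a_start = runge_lenz(state)
steps = 20000
h = 2.0 * math.pi / steps               # one orbital period is 2*pi in these units
max_drift = 0.0
for _ in range(steps):
    state = rk4_step(state, h)
    a_now = runge_lenz(state)
    max_drift = max(max_drift, max(abs(p - q) for p, q in zip(a_now, a_start)))

print("a at perihelion:", a_start)
print("largest change in a over one orbit:", max_drift)
```

With this starting point \mathbf{a} comes out as (-0.6, 0, 0), which is the center of the ellipse (semi-major axis 1, sun at a focus), consistent with \mathbf{r} oscillating about \mathbf{a}.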

Often people start with the inverse square force law, show that angular momentum and the Runge–Lenz vector are conserved, and use these 6 conserved quantities and Noether’s theorem to show there’s a 6-dimensional group of symmetries. For solutions with negative energy, this turns out to be the group of rotations in 4 dimensions, \mathrm{SO}(4). With more work, we can see how the Kepler problem is related to a harmonic oscillator in 4 dimensions. Doing this involves reparametrizing time.

I like Göransson’s approach better in many ways, because it starts by biting the bullet and reparametrizing time. This lets him rather efficiently show that the planet’s elliptical orbit is a projection to 3-dimensional space of a circular orbit in 4d space. The 4d rotational symmetry is then evident!

Göransson actually carries out his argument for an inverse square law in n-dimensional space; it’s no harder. The elliptical orbits in n dimensions are projections of circular orbits in n+1 dimensions. Angular momentum is a bivector in n dimensions; together with the Runge–Lenz vector it forms a bivector in n+1 dimensions. This is the conserved quantity associated to the (n+1) dimensional rotational symmetry of the problem.

He also carries out the analogous argument for positive-energy orbits, which are hyperbolas, and zero-energy orbits, which are parabolas. The hyperbolic case has the Lorentz group symmetry and the zero-energy case has Euclidean group symmetry! This was already known, but it’s nice to see how easily Göransson’s calculations handle all three cases.

Mathematical details

Checking all this is a straightforward exercise in vector calculus, but it takes a bit of work, so let me do some here. There will still be details left to fill in, and I urge that you give it a try, because this is the sort of thing that’s more interesting to do than to watch.

There are a lot of equations coming up, so I’ll put boxes around the important ones. The basic ones are the force law, conservation of energy, and the change of variables that gives

\boxed{  t' = r , \qquad  \mathbf{r}' = r \dot{\mathbf{r}} }

We start with conservation of energy:

\boxed{ \displaystyle{ \frac{\dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} -  \frac{1}{r}  = -\frac{1}{2} } }

and then use

\displaystyle{ \dot{\mathbf{r}} = \frac{d\mathbf{r}/ds}{dt/ds} = \frac{\mathbf{r}'}{t'} }

to obtain

\displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{2 t'^2}  - \frac{1}{t'} = -\frac{1}{2} }

With a little algebra this gives

\boxed{ \displaystyle{ \mathbf{r}' \cdot \mathbf{r}' + (t' - 1)^2 = 1} }
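Spelled out, the “little algebra” is: multiply both sides by 2t'^2, then add 1 to both sides and complete the square in t'.

\displaystyle{ \mathbf{r}' \cdot \mathbf{r}' - 2 t' = - t'^2 }

\displaystyle{ \mathbf{r}' \cdot \mathbf{r}' + \left( t'^2 - 2 t' + 1 \right) = 1 }

The bracket is (t' - 1)^2, which gives the boxed equation.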

This shows that the ‘4-velocity’

\mathbf{v} = (t',x',y',z')

stays on the unit sphere centered at (1,0,0,0).

The next step is to take the equation of motion

\boxed{ \displaystyle{\ddot{\mathbf{r}} = - \frac{\mathbf{r}}{r^3} } }

and rewrite it using primes (s derivatives) instead of dots (t derivatives). We start with

\displaystyle{ \dot{\mathbf{r}} = \frac{\mathbf{r}'}{r} }

and differentiate again to get

\ddot{\mathbf{r}} = \displaystyle{ \frac{1}{r} \left(\frac{\mathbf{r}'}{r}\right)' }  = \displaystyle{ \frac{1}{r} \left( \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^2} \right) } = \displaystyle{ \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^3} }

Now we use our other equation for \ddot{\mathbf{r}} and get

\displaystyle{ \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^3} = - \frac{\mathbf{r}}{r^3} }

so

r \mathbf{r}'' - r' \mathbf{r}' = -\mathbf{r}

or

\boxed{ \displaystyle{ \mathbf{r}'' =  \frac{r' \mathbf{r}' - \mathbf{r}}{r} } }

To go further, it’s good to get a formula for r'' as well. First we compute

r' = \displaystyle{ \frac{d}{ds} (\mathbf{r} \cdot \mathbf{r})^{\frac{1}{2}} } = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}}{r} }

and then differentiating again,

r'' = \displaystyle{\frac{d}{ds} \frac{\mathbf{r}' \cdot \mathbf{r}}{r} } = \displaystyle{ \frac{r \mathbf{r}'' \cdot \mathbf{r} + r \mathbf{r}' \cdot \mathbf{r}' - r' \mathbf{r}' \cdot \mathbf{r}}{r^2} }

Plugging in our formula for \mathbf{r}'', some wonderful cancellations occur and we get

r'' = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{r} - 1 }
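For the record, here is where those cancellations happen. Dotting the boxed formula for \mathbf{r}'' with \mathbf{r} and multiplying by r gives

\displaystyle{ r \, \mathbf{r}'' \cdot \mathbf{r} = r' \, \mathbf{r}' \cdot \mathbf{r} - r^2 }

so the numerator in the formula for r'' collapses:

\displaystyle{ r'' = \frac{ \left( r' \, \mathbf{r}' \cdot \mathbf{r} - r^2 \right) + r \, \mathbf{r}' \cdot \mathbf{r}' - r' \, \mathbf{r}' \cdot \mathbf{r} }{r^2} = \frac{ r \, \mathbf{r}' \cdot \mathbf{r}' - r^2 }{r^2} }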

But we can do better! Remember, conservation of energy says

\displaystyle{ \mathbf{r}' \cdot \mathbf{r}' + (t' - 1)^2 = 1}

and we know t' = r. So,

\mathbf{r}' \cdot \mathbf{r}' = 1 - (r - 1)^2 = 2r - r^2

and thus

r'' = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{r} - 1 } = 1 - r

So, we see

\boxed{ r'' = 1 - r }

Can you get here more elegantly?

Since t' = r this instantly gives

\boxed{ t''' = 1 - t' }

as desired.

Next let’s get a similar formula for \mathbf{r}'''. We start with

\displaystyle{ \mathbf{r}'' =  \frac{r' \mathbf{r}' - \mathbf{r}}{r} }

and differentiate both sides to get

\displaystyle{ \mathbf{r}''' = \frac{r r'' \mathbf{r}' + r r' \mathbf{r}'' - r \mathbf{r}' - r' (r' \mathbf{r}' - \mathbf{r})}{r^2} }

Then plug in our formulas for r'' and \mathbf{r}''. Some truly miraculous cancellations occur and we get

\boxed{  \mathbf{r}''' = -\mathbf{r}' }

I could show you how it works—but to really believe it you have to do it yourself. It’s just algebra. Again, I’d like a better way to see why this happens!
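For anyone who wants to check their work, here is one way the bookkeeping goes. Differentiating the boxed formula for \mathbf{r}'' by the quotient rule gives

\displaystyle{ \mathbf{r}''' = \frac{ r \left( r'' \mathbf{r}' + r' \mathbf{r}'' - \mathbf{r}' \right) - r' \left( r' \mathbf{r}' - \mathbf{r} \right) }{r^2} }

Substituting r'' = 1 - r in the first term and r \mathbf{r}'' = r' \mathbf{r}' - \mathbf{r} in the second, the four terms of the numerator become

\displaystyle{ (r - r^2)\, \mathbf{r}' \; + \; \left( r'^2 \mathbf{r}' - r' \mathbf{r} \right) \; - \; r \mathbf{r}' \; - \; \left( r'^2 \mathbf{r}' - r' \mathbf{r} \right) \; = \; -r^2 \mathbf{r}' }

and dividing by r^2 leaves \mathbf{r}''' = -\mathbf{r}'.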

Integrating both sides—which is a bit weird, since we got this equation by differentiating both sides of another one—we get

\boxed{ \mathbf{r}'' = -\mathbf{r} + \mathbf{a} }

for some fixed vector \mathbf{a}, the Runge–Lenz vector. This says \mathbf{r} undergoes harmonic motion about \mathbf{a}. It’s quite remarkable that both \mathbf{r} and its norm r undergo harmonic motion! At first I thought this was impossible, but it’s just a very special circumstance.

The quantum version of a planetary orbit is a hydrogen atom. Everything we just did has a quantum version! For more on that, see

• Greg Egan, The ellipse and the atom.

For more of the history of this problem, see:

• John Baez, Mysteries of the gravitational 2-body problem.

This also treats quantum aspects, connections to supersymmetry and Jordan algebras, and more! Someday I’ll update it to include the material in this blog post.

Jordan EllenbergA boy’s first casserole

CJ had a vision for dinner. I don’t know where he came up with this. But he said he wanted mashed potatoes with green beans and chopped up hardboiled eggs. OK I said but you know what it needs, some Penzey’s toasted onions and we can put some chunks of gruyere in there and it’ll melt. In the end I was suspicious of the hardboiled eggs so we had them on the side. The final product was something I think could easily be sold in the grocery store hot case at $8.99 a pound. I know this looks kind of like barf, but it works. (See also: the Israeli electoral system.)

March 17, 2015

Terence Tao254A, Supplement 6: A cheap version of the theorems of Halasz and Matomaki-Radziwiłł (optional)

In analytic number theory, it is a well-known phenomenon that for many arithmetic functions {f: {\bf N} \rightarrow {\bf C}} of interest in number theory, it is significantly easier to estimate logarithmic sums such as

\displaystyle \sum_{n \leq x} \frac{f(n)}{n}

than it is to estimate summatory functions such as

\displaystyle \sum_{n \leq x} f(n).

(Here we are normalising {f} to be roughly constant in size, e.g. {f(n) = O( n^{o(1)} )} as {n \rightarrow \infty}.) For instance, when {f} is the von Mangoldt function {\Lambda}, the logarithmic sums {\sum_{n \leq x} \frac{\Lambda(n)}{n}} can be adequately estimated by Mertens’ theorem, which can be easily proven by elementary means (see Notes 1); but a satisfactory estimate on the summatory function {\sum_{n \leq x} \Lambda(n)} requires the prime number theorem, which is substantially harder to prove (see Notes 2). (From a complex-analytic or Fourier-analytic viewpoint, the problem is that the logarithmic sums {\sum_{n \leq x} \frac{f(n)}{n}} can usually be controlled just from knowledge of the Dirichlet series {\sum_n \frac{f(n)}{n^s}} for {s} near {1}; but the summatory functions require control of the Dirichlet series {\sum_n \frac{f(n)}{n^s}} for {s} on or near a large portion of the line {\{ 1+it: t \in {\bf R} \}}. See Notes 2 for further discussion.)
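To see the contrast numerically for {f = \Lambda}, here is a small sketch (our own illustration; the cutoff {x = 10^5} is chosen only for speed). The logarithmic sum minus {\log x} stays bounded, as Mertens’ theorem predicts, while the summatory function divided by {x} is close to {1}, as the prime number theorem predicts. A finite computation can only illustrate these asymptotics, not prove anything.

```python
import math

def von_mangoldt_table(n_max):
    """Lambda(n) for 0 <= n <= n_max: log p if n is a power of a prime p, else 0."""
    is_prime = [True] * (n_max + 1)
    is_prime[0:2] = [False, False]
    lam = [0.0] * (n_max + 1)
    for p in range(2, n_max + 1):
        if is_prime[p]:
            for multiple in range(p * p, n_max + 1, p):
                is_prime[multiple] = False
            log_p = math.log(p)
            power = p
            while power <= n_max:  # mark p, p^2, p^3, ... with log p
                lam[power] = log_p
                power *= p
    return lam

x = 10**5
lam = von_mangoldt_table(x)
log_sum = sum(lam[n] / n for n in range(1, x + 1))  # sum_{n <= x} Lambda(n)/n
chebyshev_psi = sum(lam)                             # sum_{n <= x} Lambda(n)

print("sum Lambda(n)/n - log x:", log_sum - math.log(x))  # bounded (Mertens)
print("sum Lambda(n) / x      :", chebyshev_psi / x)      # near 1 (PNT)
```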

Viewed conversely, whenever one has a difficult estimate on a summatory function such as {\sum_{n \leq x} f(n)}, one can look to see if there is a “cheaper” version of that estimate that only controls the logarithmic sums {\sum_{n \leq x} \frac{f(n)}{n}}, which is easier to prove than the original, more “expensive” estimate. In this post, we shall do this for two theorems, a classical theorem of Halasz on mean values of multiplicative functions on long intervals, and a much more recent result of Matomaki and Radziwiłł on mean values of multiplicative functions in short intervals. The two are related; the former theorem is an ingredient in the latter (though in the special case of the Matomaki-Radziwiłł theorem considered here, we will not need Halasz’s theorem directly, instead using a key tool in the proof of that theorem).

We begin with Halasz’s theorem. Here is a version of this theorem, due to Montgomery and to Tenenbaum:

Theorem 1 (Halasz-Montgomery-Tenenbaum) Let {f: {\bf N} \rightarrow {\bf C}} be a multiplicative function with {|f(n)| \leq 1} for all {n}. Let {x \geq 3} and {T \geq 1}, and set

\displaystyle M := \min_{|t| \leq T} \sum_{p \leq x} \frac{1 - \hbox{Re}( f(p) p^{-it} )}{p}.

Then one has

\displaystyle \frac{1}{x} \sum_{n \leq x} f(n) \ll (1+M) e^{-M} + \frac{1}{\sqrt{T}}.

Informally, this theorem asserts that {\sum_{n \leq x} f(n)} is small compared with {x}, unless {f} “pretends” to be like the character {p \mapsto p^{it}} on primes for some small {t}. (This is the starting point of the “pretentious” approach of Granville and Soundararajan to analytic number theory, as developed for instance here.) We now give a “cheap” version of this theorem which is significantly weaker (it settles for controlling logarithmic sums rather than summatory functions, it requires {f} to be completely multiplicative instead of merely multiplicative, it requires a strong bound on the analogue of the quantity {M}, and it only gives qualitative decay rather than quantitative estimates), but which is easier to prove:

Theorem 2 (Cheap Halasz) Let {x} be an asymptotic parameter going to infinity. Let {f: {\bf N} \rightarrow {\bf C}} be a completely multiplicative function (possibly depending on {x}) such that {|f(n)| \leq 1} for all {n}, and such that

\displaystyle \sum_{p \leq x} \frac{1 - \hbox{Re}( f(p) )}{p} \gg \log\log x. \ \ \ \ \ (1)

Then

\displaystyle \frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n} = o(1). \ \ \ \ \ (2)


Note that now that we are content with estimating logarithmic sums, we no longer need to preclude the possibility that {f(p)} pretends to be like {p^{it}}; see Exercise 11 of Notes 1 for a related observation.
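To see the remark concretely, take {f(n) = n^{it}} with {t = 1}. This is an illustrative choice, not one made in the post.

- This {f} is completely multiplicative and pretends exactly to be {p^{it}}.
- It still satisfies (1), since {\sum_{p \leq x} \frac{1-\cos(t \log p)}{p}} grows like {\log\log x}.

A minimal sketch with a hypothetical helper `mean_values` compares the two normalised means. The summatory mean stays near {1/|1+it|}, while the logarithmic mean is {O(1/\log x)}.

```python
import cmath
import math

def mean_values(t, x):
    """Return (|sum_{n<=x} n^{it}| / x, |sum_{n<=x} n^{it}/n| / log x)."""
    s_plain = 0j
    s_log = 0j
    for n in range(1, x + 1):
        z = cmath.exp(1j * t * math.log(n))  # n^{it}
        s_plain += z
        s_log += z / n
    return abs(s_plain) / x, abs(s_log) / math.log(x)

plain, logarithmic = mean_values(t=1.0, x=100_000)
print(plain, logarithmic)  # plain stays near 1/sqrt(2); logarithmic is small
```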

To prove this theorem, we first need a special case of the Turan-Kubilius inequality.

Lemma 3 (Turan-Kubilius) Let {x} be a parameter going to infinity, and let {1 < P < x} be a quantity depending on {x} such that {P = x^{o(1)}} and {P \rightarrow \infty} as {x \rightarrow \infty}. Then

\displaystyle \sum_{n \leq x} \frac{ | \frac{1}{\log \log P} \sum_{p \leq P: p|n} 1 - 1 |}{n} = o( \log x ).

Informally, this lemma is asserting that

\displaystyle \sum_{p \leq P: p|n} 1 \approx \log \log P

for most large numbers {n}. Another way of writing this heuristically is in terms of Dirichlet convolutions:

\displaystyle 1 \approx 1 * \frac{1}{\log\log P} 1_{{\mathcal P} \cap [1,P]}.

This type of estimate was previously discussed as a tool to establish a criterion of Katai and Bourgain-Sarnak-Ziegler for Möbius orthogonality estimates in this previous blog post. See also Section 5 of Notes 1 for some similar computations.

Proof: By Cauchy-Schwarz it suffices to show that

\displaystyle \sum_{n \leq x} \frac{ | \frac{1}{\log \log P} \sum_{p \leq P: p|n} 1 - 1 |^2}{n} = o( \log x ).

Expanding out the square, it suffices to show that

\displaystyle \sum_{n \leq x} \frac{ (\frac{1}{\log \log P} \sum_{p \leq P: p|n} 1)^j}{n} = \log x + o( \log x )

for {j=0,1,2}.

We just show the {j=2} case, as the {j=0,1} cases are similar (and easier). We rearrange the left-hand side as

\displaystyle \frac{1}{(\log\log P)^2} \sum_{p_1, p_2 \leq P} \sum_{n \leq x: p_1,p_2|n} \frac{1}{n}.

We can estimate the inner sum as {(1+o(1)) \frac{1}{[p_1,p_2]} \log x}. But a routine application of Mertens’ theorem (handling the diagonal case when {p_1=p_2} separately) shows that

\displaystyle \sum_{p_1, p_2 \leq P} \frac{1}{[p_1,p_2]} = (1+o(1)) (\log\log P)^2

and the claim follows. \Box
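The diagonal/off-diagonal split in the last step can be checked exactly with rational arithmetic. The sum over ordered pairs decomposes as {(\sum_{p \leq P} \frac{1}{p})^2 - \sum_{p \leq P} \frac{1}{p^2} + \sum_{p \leq P} \frac{1}{p}}. The last two terms are {O(\log\log P)}, so Mertens' theorem gives the main term {(\log\log P)^2}. The cutoff {P = 50} and the helper `primes_up_to` are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import lcm

def primes_up_to(n):
    """Primes <= n by trial division (adequate for small n)."""
    return [m for m in range(2, n + 1) if all(m % d for d in range(2, int(m ** 0.5) + 1))]

P = 50
ps = primes_up_to(P)

# Sum over ordered pairs (p1, p2) of 1/[p1, p2].
direct = sum((Fraction(1, lcm(p1, p2)) for p1 in ps for p2 in ps), Fraction(0))

S1 = sum((Fraction(1, p) for p in ps), Fraction(0))      # ~ log log P by Mertens
S2 = sum((Fraction(1, p * p) for p in ps), Fraction(0))  # convergent, so O(1)

# Off-diagonal pairs contribute S1^2 - S2; the diagonal p1 = p2 contributes S1.
split = S1 * S1 - S2 + S1
print(direct == split, float(direct), float(S1) ** 2)
```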

Remark 4 As an alternative to the Turan-Kubilius inequality, one can use the Ramaré identity

\displaystyle \sum_{p \leq P: p|n} \frac{1}{\# \{ p' \leq P: p'|n\}} - 1 = - 1_{(p,n)=1 \hbox{ for all } p \leq P}

(see e.g. Section 17.3 of Friedlander-Iwaniec). This identity turns out to give better quantitative results than the Turan-Kubilius inequality in applications; see the paper of Matomaki and Radziwiłł for an instance of this.
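A brute-force check of the Ramaré identity, in the form where each prime {p \leq P} dividing {n} receives weight {1/\#\{ p' \leq P: p'|n\}}:

- If {n} has {k \geq 1} prime factors below {P}, the sum is {k \cdot \frac{1}{k} = 1}.
- Otherwise the sum is empty.

Either way, subtracting {1} leaves {-1_{(p,n)=1 \hbox{ for all } p \leq P}}. The helper names and the ranges checked are hypothetical choices for illustration.

```python
from fractions import Fraction

def small_prime_divisors(n, P):
    """Distinct primes p <= P dividing n, by trial division (fine for small n)."""
    divisors = []
    m, d = n, 2
    while d <= P and m > 1:
        if m % d == 0:          # d is prime here: its smaller factors were already removed
            divisors.append(d)
            while m % d == 0:
                m //= d
        d += 1
    return divisors

def ramare_holds(n, P):
    """Check sum_{p <= P, p | n} 1/omega_P(n) - 1 == -1_{n has no prime factor <= P}."""
    divisors = small_prime_divisors(n, P)
    k = len(divisors)
    lhs = sum((Fraction(1, k) for _ in divisors), Fraction(0)) - 1
    rhs = -1 if k == 0 else 0
    return lhs == rhs

P = 30
print(all(ramare_holds(n, P) for n in range(1, 5001)))
```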

We now prove Theorem 2. Let {Q} denote the left-hand side of (2); by the triangle inequality we have {Q=O(1)}. By Lemma 3 (for some {P = x^{o(1)}} to be chosen later) and the triangle inequality we have

\displaystyle \sum_{n \leq x} \frac{\frac{1}{\log \log P} \sum_{p \leq P: p|n} f(n)}{n} = Q \log x + o( \log x ).

We rearrange the left-hand side as

\displaystyle \frac{1}{\log\log P} \sum_{p \leq P} \frac{f(p)}{p} \sum_{m \leq x/p} \frac{f(m)}{m}.

We now replace the constraint {m \leq x/p} by {m \leq x}. The error incurred in doing so is

\displaystyle O( \frac{1}{\log\log P} \sum_{p \leq P} \frac{1}{p} \sum_{x/P \leq m \leq x} \frac{1}{m} )

which by Mertens’ theorem is {O(\log P) = o( \log x )}. Thus we have

\displaystyle \frac{1}{\log\log P} \sum_{p \leq P} \frac{f(p)}{p} \sum_{m \leq x} \frac{f(m)}{m} = Q \log x + o( \log x ).

But by definition of {Q}, we have {\sum_{m \leq x} \frac{f(m)}{m} = Q \log x}, thus

\displaystyle [1 - \frac{1}{\log\log P} \sum_{p \leq P} \frac{f(p)}{p}] Q = o(1). \ \ \ \ \ (3)


From Mertens’ theorem, the expression in brackets can be rewritten as

\displaystyle \frac{1}{\log\log P} \sum_{p \leq P} \frac{1 - f(p)}{p} + o(1)

and so the real part of this expression is

\displaystyle \frac{1}{\log\log P} \sum_{p \leq P} \frac{1 - \hbox{Re} f(p)}{p} + o(1).

By (1), Mertens’ theorem and the hypothesis on {f} we have

\displaystyle \sum_{p \leq x^\varepsilon} \frac{1 - \hbox{Re} f(p)}{p} \gg \log\log x^\varepsilon - O_\varepsilon(1)

for any {\varepsilon > 0}. This implies that we can find {P = x^{o(1)}} going to infinity such that

\displaystyle \sum_{p \leq P} \frac{1 - \hbox{Re} f(p)}{p} \gg (1-o(1))\log\log P

and thus the expression in brackets has real part {\gg 1-o(1)}; in particular it is bounded away from zero. From (3) we conclude that {Q = o(1)}, which is (2).
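For the Liouville function {f = \lambda} one has {1 - \hbox{Re} f(p) = 2} for every prime, so (1) holds by Mertens' theorem. Theorem 2 then predicts that {Q = \frac{1}{\log x}\sum_{n \leq x} \frac{\lambda(n)}{n}} is {o(1)}. The sketch below computes {Q} at one value of {x}. It builds {\lambda} from a smallest-prime-factor sieve, and the helper name `liouville_table` is hypothetical. A single numerical value is only illustrative.

```python
import math

def liouville_table(x):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= x (lam[0] unused)."""
    spf = list(range(x + 1))  # smallest prime factor sieve
    for p in range(2, math.isqrt(x) + 1):
        if spf[p] == p:
            for k in range(p * p, x + 1, p):
                if spf[k] == k:
                    spf[k] = p
    lam = [0] * (x + 1)
    lam[1] = 1
    for n in range(2, x + 1):
        lam[n] = -lam[n // spf[n]]  # removing one prime factor flips the sign
    return lam

x = 100_000
lam = liouville_table(x)
Q = sum(lam[n] / n for n in range(1, x + 1)) / math.log(x)
print(Q)  # small, in line with (2)
```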

The Turan-Kubilius argument is certainly not the most efficient way to estimate sums such as {\sum_{n \leq x} \frac{f(n)}{n}}. In the exercise below we give a significantly more accurate estimate that works when {f} is non-negative.

Exercise 5 (Granville-Koukoulopoulos-Matomaki)

  • (i) If {g} is a completely multiplicative function with {g(p) \in \{0,1\}} for all primes {p}, show that

    \displaystyle (e^{-\gamma}-o(1)) \prod_{p \leq x} (1 - \frac{g(p)}{p})^{-1} \leq \sum_{n \leq x} \frac{g(n)}{n} \leq \prod_{p \leq x} (1 - \frac{g(p)}{p})^{-1}

    as {x \rightarrow \infty}. (Hint: for the upper bound, expand out the Euler product. For the lower bound, show that {\sum_{n \leq x} \frac{g(n)}{n} \times \sum_{n \leq x} \frac{h(n)}{n} \ge \sum_{n \leq x} \frac{1}{n}}, where {h} is the completely multiplicative function with {h(p) = 1-g(p)} for all primes {p}.)

  • (ii) If {g} is multiplicative and takes values in {[0,1]}, show that

    \displaystyle \sum_{n \leq x} \frac{g(n)}{n} \asymp \prod_{p \leq x} (1 - \frac{g(p)}{p})^{-1}

    \displaystyle \asymp \exp( \sum_{p \leq x} \frac{g(p)}{p} )

    for all {x \geq 1}.
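A numerical look at part (i). The choice of {g} is hypothetical: {g(p) = 1} exactly when {p \equiv 1 \pmod 4}, and {g(p) = 0} otherwise. The upper bound holds exactly, because every {n \leq x} counted on the left appears in the expanded Euler product. The sketch compares the ratio of the sum to the product with {e^{-\gamma} \approx 0.5615}. Heuristically, the ratio for this density-{\frac{1}{2}} set of primes should sit somewhat above that worst-case constant.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(n + 1) if sieve[p]]

x = 100_000
primes = primes_up_to(x)
allowed = [p for p in primes if p % 4 == 1]  # illustrative choice: g(p) = 1 exactly here
good = [True] * (x + 1)                      # good[n] <=> g(n) = 1
for p in primes:
    if p % 4 != 1:                           # g(p) = 0 kills every multiple of p
        for k in range(p, x + 1, p):
            good[k] = False

log_sum = sum(1 / n for n in range(1, x + 1) if good[n])
euler_product = 1.0
for p in allowed:
    euler_product /= 1 - 1 / p

ratio = log_sum / euler_product
print(ratio, math.exp(-0.5772156649015329))
```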

Now we turn to a very recent result of Matomaki and Radziwiłł on mean values of multiplicative functions in short intervals. For sake of illustration we specialise their results to the simpler case of the Liouville function {\lambda}, although their arguments actually work (with some additional effort) for arbitrary multiplicative functions of magnitude at most {1} that are real-valued (or more generally, stay far from complex characters {p \mapsto p^{it}}). Furthermore, we give a qualitative form of their estimates rather than a quantitative one:

Theorem 6 (Matomaki-Radziwiłł, special case) Let {X} be a parameter going to infinity, and let {2 \leq h \leq X} be a quantity going to infinity as {X \rightarrow \infty}. Then for all but {o(X)} of the integers {x \in [X,2X]}, one has

\displaystyle \sum_{x \leq n \leq x+h} \lambda(n) = o( h ).

Equivalently, one has

\displaystyle \sum_{X \leq x \leq 2X} |\sum_{x \leq n \leq x+h} \lambda(n)|^2 = o( h^2 X ). \ \ \ \ \ (4)


A simple sieving argument (see Exercise 18 of Supplement 4) shows that one can replace {\lambda} by the Möbius function {\mu} and obtain the same conclusion. See this recent note of Matomaki and Radziwiłł for a simple proof of their (quantitative) main theorem in this special case.
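The quantity in (4) can be evaluated numerically via prefix sums of {\lambda}. The parameters {X = 5 \times 10^4} and {h = 100} are arbitrary, and the helper `liouville_table` is hypothetical. The sketch prints the mean square divided by {h^2}, where the trivial bound corresponds to a ratio of order {1}. A finite computation like this says nothing rigorous about the asymptotic statement.

```python
import math

def liouville_table(x):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= x (lam[0] unused)."""
    spf = list(range(x + 1))
    for p in range(2, math.isqrt(x) + 1):
        if spf[p] == p:
            for k in range(p * p, x + 1, p):
                if spf[k] == k:
                    spf[k] = p
    lam = [0] * (x + 1)
    lam[1] = 1
    for n in range(2, x + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

X, h = 50_000, 100
lam = liouville_table(2 * X + h)
prefix = [0] * len(lam)  # prefix[n] = sum_{m <= n} lam(m)
for n in range(1, len(lam)):
    prefix[n] = prefix[n - 1] + lam[n]

# sum_{x <= n <= x+h} lam(n) = prefix[x+h] - prefix[x-1], averaged over x in [X, 2X]
mean_square = sum((prefix[x + h] - prefix[x - 1]) ** 2 for x in range(X, 2 * X + 1)) / (X + 1)
ratio = mean_square / h ** 2
print(ratio)  # far below the trivial O(1)
```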

Of course, (4) improves upon the trivial bound of {O( h^2 X )}. Prior to this paper, such estimates were only known (using arguments similar to those in Section 3 of Notes 6) for {h \geq X^{1/6+\varepsilon}} unconditionally, or for {h \geq \log^A X} for some sufficiently large {A} if one assumed the Riemann hypothesis. This theorem also represents some progress towards Chowla’s conjecture (discussed in Supplement 4) that

\displaystyle \sum_{n \leq x} \lambda(n+h_1) \dots \lambda(n+h_k) = o( x )

as {x \rightarrow \infty} for any fixed distinct {h_1,\dots,h_k}; indeed, it implies that this conjecture holds if one performs a small amount of averaging in the {h_1,\dots,h_k}.
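For comparison, the simplest unaveraged instance of Chowla's conjecture ({k = 2}, {h_1 = 0}, {h_2 = 1}) can be tested numerically. This instance is not covered by the averaged statement. The sketch normalises the correlation by {x}, using the same hypothetical `liouville_table` helper as above, and {x = 10^5} is an arbitrary cutoff.

```python
import math

def liouville_table(x):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= x (lam[0] unused)."""
    spf = list(range(x + 1))
    for p in range(2, math.isqrt(x) + 1):
        if spf[p] == p:
            for k in range(p * p, x + 1, p):
                if spf[k] == k:
                    spf[k] = p
    lam = [0] * (x + 1)
    lam[1] = 1
    for n in range(2, x + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

x = 100_000
lam = liouville_table(x + 1)
corr = sum(lam[n] * lam[n + 1] for n in range(1, x + 1)) / x
print(corr)  # small compared with the trivial bound 1
```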

Below the fold, we give a “cheap” version of the Matomaki-Radziwiłł argument. More precisely, we establish

Theorem 7 (Cheap Matomaki-Radziwiłł) Let {X} be a parameter going to infinity, and let {1 \leq T \leq X}. Then

\displaystyle \int_X^{X^A} \left|\sum_{x \leq n \leq e^{1/T} x} \frac{\lambda(n)}{n}\right|^2\frac{dx}{x} = o\left( \frac{\log X}{T^2} \right), \ \ \ \ \ (5)


for any fixed {A>1}.

Note that (5) improves upon the trivial bound of {O( \frac{\log X}{T^2} )}. Again, one can replace {\lambda} with {\mu} if desired. Due to the cheapness of Theorem 7, the proof will require few ingredients; the deepest input is the improved zero-free region for the Riemann zeta function due to Vinogradov and Korobov. Other than that, the main tools are the Turan-Kubilius result established above, and some Fourier (or complex) analysis.

— 1. Proof of theorem —

We now prove Theorem 7. We first observe that it will suffice to show that

\displaystyle \int_0^\infty \varphi( \frac{\log x}{\log X} ) \left|\sum_n \eta( T( \log x - \log n ) ) \frac{\lambda(n)}{n}\right|^2\frac{dx}{x} = o\left( \frac{\log X}{T^2} \right)

for any smooth {\eta, \varphi: {\bf R} \rightarrow {\bf R}} supported on (say) {[-2,2]} and {(1,+\infty)} respectively, as the claim follows by taking {\eta} and {\varphi} to be approximations to {1_{[-1,0]}} and {1_{[1,A]}} respectively and using the triangle inequality to control the error.

We will need a quantity {P = X^{o(1)}} that goes to infinity reasonably fast; for instance, {P := \exp( \log^{0.99} X )} will suffice. By Lemma 3 and the triangle inequality, we can replace {\lambda(n)} in (5) by {\frac{1}{\log \log P} \sum_{p \leq P: p|n} \lambda(n)} while only incurring an acceptable error. Thus our task is now to show that

\displaystyle \int_0^\infty \varphi( \frac{\log x}{\log X} ) \left|\sum_n \eta( T(\log x - \log n)) \frac{\sum_{p \leq P: p|n} \lambda(n)}{n}\right|^2\frac{dx}{x}

\displaystyle = o\left( \frac{\log X}{T^2} (\log\log P)^2 \right).

I will (perhaps idiosyncratically) adopt a Fourier-analytic point of view here, rather than a more traditional complex-analytic point of view (for instance, we will use Fourier transforms as a substitute for Dirichlet series). To bring the Fourier perspective to the forefront, we make the change of variables {x = e^u} and {n = pm}, and note that {\varphi( \frac{\log x}{\log X} ) = \varphi( \frac{\log m}{\log X} ) + o(1)}, to rearrange the previous claim as

\displaystyle \int_{\bf R} |\sum_{p \leq P} \frac{1}{p} F( u - \log p )|^2\ du = o( \log X (\log\log P)^2 ).

where
\displaystyle F(y) := T \sum_m \varphi( \frac{\log m}{\log X} ) \eta( T( y - \log m) ) \frac{\lambda(m)}{m}. \ \ \ \ \ (6)


Introducing the normalised discrete measure

\displaystyle \mu := \frac{1}{\log\log P} \sum_{p \leq P} \frac{1}{p} \delta_{\log p},

it thus suffices to show that

\displaystyle \| F * \mu \|_{L^2({\bf R})}^2 = o( \log X )

where {*} now denotes ordinary (Fourier) convolution rather than Dirichlet convolution.

From Mertens’ theorem we see that {\mu} has total mass {O(1)}; also, from the triangle inequality (and the hypothesis {T \leq X}) we see that {F} is supported on {[(1-o(1)) \log X, (2+o(1))\log X]} and obeys the pointwise bound of {O(1)}; also, the derivative of {F} is bounded by {O(T)}. Thus we see that the trivial bound on {\| F * \mu \|_{L^2({\bf R})}^2} is {O( \log X)} by Young’s inequality. To improve upon this, we use Fourier analysis. By Plancherel’s theorem, we have

\displaystyle \| F * \mu \|_{L^2({\bf R})}^2 = \int_{\bf R} |\hat F(\xi)|^2 |\hat \mu(\xi)|^2\ d\xi

where {\hat F, \hat \mu} are the Fourier transforms

\displaystyle \hat F(\xi) := \int_{\bf R} F(x) e^{-2\pi i x \xi}\ dx

and
\displaystyle \hat \mu(\xi) := \int_{\bf R} e^{-2\pi i x \xi}\ d\mu(x).
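The Plancherel step combines two facts: the Fourier transform turns convolution into multiplication, and it preserves {L^2} norms. A discrete analogue on {{\bf Z}/N{\bf Z}} can be checked directly. There {\|F * \mu\|_{\ell^2}^2 = \frac{1}{N}\sum_k |\hat F(k)|^2 |\hat\mu(k)|^2} for the unnormalised DFT. The random {F}, the grid size {N = 64} and the placement of the point masses {1/p} are hypothetical stand-ins for the objects in the text.

```python
import cmath
import math
import random

def dft(a):
    """Naive discrete Fourier transform: A[k] = sum_n a[n] e^{-2 pi i k n / N}."""
    N = len(a)
    return [sum(a[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]

def circular_convolution(f, g):
    """(f * g)[n] = sum_m f[m] g[n - m], indices taken mod N."""
    N = len(f)
    return [sum(f[m] * g[(n - m) % N] for m in range(N)) for n in range(N)]

random.seed(0)
N = 64
F = [random.uniform(-1, 1) for _ in range(N)]  # stand-in for a bounded F
mu = [0.0] * N                                 # point masses 1/p near grid point 8*log p
for p in (2, 3, 5, 7, 11, 13, 17, 19):
    mu[round(8 * math.log(p)) % N] += 1 / p

lhs = sum(abs(v) ** 2 for v in circular_convolution(F, mu))
F_hat, mu_hat = dft(F), dft(mu)
rhs = sum(abs(a) ** 2 * abs(b) ** 2 for a, b in zip(F_hat, mu_hat)) / N
print(lhs, rhs)  # equal up to rounding
```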

From Plancherel’s theorem and the derivative bound on {F} we have

\displaystyle \int_{\bf R} |\hat F(\xi)|^2\ d\xi \ll \log X

and

\displaystyle \int_{\bf R} |\xi|^2 |\hat F(\xi)|^2\ d\xi \ll T^2 \log X \leq X^2 \log X

so the contribution of those {\xi} with {\hat \mu(\xi)=o(1)} or {X/|\xi| = o(1)} is acceptable. Also, from the definition of {F} we have

\displaystyle \hat F(\xi) = \hat \eta( \xi / T ) \sum_m \varphi( \frac{\log m}{\log X} ) \frac{\lambda(m)}{m^{1 + 2\pi i \xi}}

and so from the prime number theorem we have {\hat F(\xi) = o(1)} when {\xi = O(1)}; since {\hat \mu(\xi) = O(1)}, we see that the contribution of the region {|\xi| = O(1)} is also acceptable. It thus suffices to show that

\displaystyle \hat \mu(\xi) = o(1)

whenever {\xi = O(X)} and {1/|\xi|=o(1)}. But by definition of {\mu}, we may expand {\hat \mu(\xi)} as

\displaystyle \frac{1}{\log\log P} \sum_{p \leq P} \frac{1}{p^{1 + 2\pi i \xi}}

so by smoothed dyadic decomposition (and by choosing {P = X^{o(1)}} with {o(1)} decaying sufficiently slowly) it suffices to show that

\displaystyle \sum_p \varphi( \frac{\log p}{\log Q} ) \frac{\log p}{p^{1 + \frac{1}{\log Q} + 2\pi i \xi}} = o( \log Q )

whenever {Q = X^{o(1)}} for some sufficiently slowly decaying {o(1)}. We replace the summation over primes with a von Mangoldt function weight to rewrite this as

\displaystyle \sum_n \varphi( \frac{\log n}{\log Q} ) \frac{\Lambda(n)}{n^{1 + \frac{1}{\log Q} + 2\pi i \xi}} = o( \log Q ).

Performing a Fourier expansion of the smooth function {\varphi}, it thus suffices to show the Dirichlet series bound

\displaystyle -\frac{\zeta'}{\zeta}(\sigma+it) = \sum_n \frac{\Lambda(n)}{n^{\sigma+it}} = o( \log |t| )

as {|t| \rightarrow \infty} and {\sigma > 1} (we use the crude bound {-\frac{\zeta'}{\zeta}(\sigma+it) \ll \frac{1}{\sigma-1}} to deal with the {t=O(1)} contribution). But this follows from the Vinogradov-Korobov bounds, which in fact give a bound of {O( \log^{2/3}(|t|) \log\log^{1/3}(|t|) )} as {|t| \rightarrow \infty}; see Exercise 43 of Notes 2 combined with Exercise 4(i) of Notes 5.
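The decay of {\hat \mu(\xi)} away from {\xi = 0} can be observed numerically. The sketch evaluates {\hat\mu(\xi) = \frac{1}{\log\log P}\sum_{p \leq P} p^{-1-2\pi i \xi}} with {P = 10^5} (an arbitrary cutoff) at a few frequencies.

- At {\xi = 0}, Mertens' theorem makes the value slightly larger than {1}.
- At larger {|\xi|}, the oscillation of {p^{-2\pi i \xi}} produces cancellation.

At such a small {P} the decay is only logarithmic, so this is purely illustrative.

```python
import cmath
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(n + 1) if sieve[p]]

P = 100_000
PRIMES = primes_up_to(P)
NORM = math.log(math.log(P))

def mu_hat(xi):
    """(1/log log P) * sum_{p <= P} p^{-1 - 2 pi i xi}."""
    return sum(cmath.exp(-(1 + 2j * math.pi * xi) * math.log(p)) for p in PRIMES) / NORM

for xi in (0.0, 1.0, 5.0, 20.0):
    print(xi, abs(mu_hat(xi)))
```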

Remark 8 If one were working with a more general completely multiplicative function {f} than the Liouville function {\lambda}, then one would have to use a duality argument to control the large values of {\hat \mu} (which could occur at a couple more locations than {\xi = O(1)}), and use some version of Halasz’s theorem to also obtain some non-trivial bounds on {F} at those large values (this would require some hypothesis that {f} does not pretend to be like any of the characters {p \mapsto p^{it}} with {t = O(X)}). These new ingredients are in a similar spirit to the “log-free density theorem” from Theorem 6 of Notes 7. See the Matomaki-Radziwiłł paper for details (in the non-cheap case).

Filed under: 254A - analytic prime number theory, math.NT Tagged: Halasz's theorem, Kaisa Matomaki, Maksym Radziwill, multiplicative number theory, Turan-Kubilius inequality

March 16, 2015

ResonaancesWeekend Plot: Fermi and more dwarfs

This weekend's plot comes from the recent paper of the Fermi collaboration:

It shows the limits on the cross section of dark matter annihilation into tau lepton pairs. The limits are obtained from gamma-ray observations of 15 dwarf galaxies over 6 years. Dwarf galaxies are satellites of the Milky Way made mostly of dark matter, with few stars, which makes them a clean environment to search for dark matter signals. This study is particularly interesting because it is sensitive to dark matter models that could explain the gamma-ray excess detected from the center of the Milky Way. Similar limits for annihilation into b-quarks have already been shown at conferences. In that case, the region favored by the Galactic center excess seems entirely excluded. Annihilation of 10 GeV dark matter into tau leptons could also explain the excess. As can be seen in the plot, in this case there is also a large tension with the dwarf limits, although astrophysical uncertainties help to keep hopes alive.

Gamma-ray observations by Fermi will continue for another few years, and the limits will get stronger. But a faster way to increase the statistics may be to find more observation targets. Numerical simulations with vanilla WIMP dark matter predict a few hundred dwarfs around the Milky Way. Interestingly, the discovery of several new dwarf candidates was reported last week. This is an important development, as the total number of known dwarf galaxies now exceeds the number of dwarf characters in Peter Jackson movies. One of the candidates, known provisionally as DES J0335.6-5403 or Reticulum-2, has a large J-factor (the larger the better, much like the h-index). In fact, some gamma-ray excess around 1-10 GeV is observed from this source, and one paper last week even quantified its significance as ~4 astrosigma (or ~3 astrosigma in an alternative, more conservative analysis). However, in the Fermi analysis using the more recent Pass-8 photon reconstruction, the significance quoted is only 1.5 sigma. Moreover, the dark matter annihilation cross section required to fit the excess is excluded by an order of magnitude by the combined dwarf limits. Therefore, for the moment, the excess should not be taken seriously.

John PreskillQuantum Frontiers salutes Terry Pratchett.

I blame British novels for my love of physics. Philip Pullman introduced me to elementary particles; Jasper Fforde, to the possibility that multiple worlds exist; Diana Wynne Jones, to questions about space and time.

So began the personal statement in my application to Caltech’s PhD program. I didn’t mention Sir Terry Pratchett, but he belongs in the list. Pratchett wrote over 70 books, blending science fiction with fantasy, humor, and truths about humankind. Pratchett passed away last week, having completed several novels after doctors diagnosed him with early-onset Alzheimer’s. According to the San Francisco Chronicle, Pratchett “parodie[d] everything in sight.” Everything in sight included physics.

Terry Pratchett continues to influence my trajectory through physics: This cover has a cameo in a seminar I’m presenting in Maryland this March.

Pratchett set many novels on the Discworld, a pancake of a land perched atop four elephants, which balance on the shell of a turtle that swims through space. Discworld wizards quantify magic in units called thaums. Units impressed their importance upon me in week one of my first high-school physics class. We define one meter as “the length of the path travelled by light in vacuum during a time interval of 1/299 792 458 of a second.” Wizards define one thaum as “the amount of magic needed to create one small white pigeon or three normal-sized billiard balls.”

Wizards study the thaum in a High-Energy Magic Building reminiscent of Caltech’s Lauritsen-Downs Building. To split the thaum, the wizards built a Thaumatic Resonator. Particle physicists in our world have split atoms into constituent particles called mesons and baryons. Discworld wizards discovered that the thaum consists of resons. Mesons and baryons consist of quarks, seemingly elementary particles that we believe cannot be split. Quarks fall into six types, called flavors: up, down, charmed, strange, top (or truth), and bottom (or beauty). Resons, too, consist of quarks. The Discworld’s quarks have the flavors up, down, sideways, sex appeal, and peppermint.

Reading about the Discworld since high school, I’ve wanted to grasp Pratchett’s allusions. I’ve wanted to do more than laugh at them. In Pyramids, Pratchett describes “ideas that would make even a quantum mechanic give in and hand back his toolbox.” Pratchett’s ideas have given me a hankering for that toolbox. Pratchett nudged me toward training as a quantum mechanic.

Pratchett hasn’t only piqued my curiosity about his allusions. He’s piqued my desire to create as he did, to do physics as he wrote. While reading or writing, we build worlds in our imaginations. We visualize settings; we grow acquainted with characters; we sense a plot’s consistency or the consistency of a system of magic. We build worlds in our imaginations also when doing and studying physics and math. The Standard Model is a system that encapsulates the consistency of our knowledge about particles. We tell stories about electrons’ behaviors in magnetic fields. Theorems’ proofs have logical structures like plots’. Pratchett and other authors trained me to build worlds in my imagination. Little wonder I’m training to build worlds as a physicist.

Around the time I graduated from college, Diana Wynne Jones passed away. So did Brian Jacques (another British novelist) and Madeleine L’Engle. L’Engle wasn’t British, but I forgave her because her Time Quartet introduced me to dimensions beyond three. As I completed one stage of intellectual growth, creators who’d led me there left.

Terry Pratchett has joined Jones, Jacques, and L’Engle. I will probably create nothing as valuable as his Discworld, let alone a character in the Standard Model toward which the Discworld steered me.

But, because of Terry Pratchett, I have to try.

March 15, 2015

Tim GowersUSS changes — don’t be fooled

This post is meant for anybody who will be affected by proposed changes to the Universities Superannuation Scheme, the body to which I and many other UK academics have paid our pension contributions and that now proposes to change the rules to deal with the fact that it has a large deficit as a result of the financial crisis. (Or rather, it says it has a large deficit, but there are arguments that the amount by which it is in deficit or surplus is highly volatile, so major changes are not necessarily justified.)

Of course, any change will have to be in the direction of making the deal less generous for those with pensions. Indeed, changes have already been made. Until a few years ago, the amount you got at the end was based on your final salary. More precisely, you got one 80th of your final salary per year after retirement for each year that you contributed to the scheme, up to a maximum of 40 years of contributions (and thus a maximum of half your final salary when you retire). But a few years ago they closed this final-salary scheme to new entrants, because (they said) it had become too expensive. This was partly because now a much larger proportion of academics end up as professors, so their final salaries are higher, and also of course because people live for longer.

They now propose to close the final-salary scheme even for existing participants. That of course raises the question of what happens to the contributions we have already made to the scheme. If the USS really can’t afford to keep going with the present arrangements, it is perhaps reasonable to say that we cannot continue to make contributions under those arrangements, but our past contributions were made under the very clear understanding that each year of contributions would add one 80th of our final salary to our eventual annual pension payments. Will that still be the case?

I received a letter from the USS yesterday that included the following reassuring paragraph.

As an active member of the Final Salary section of the scheme, you would be affected by the proposed changes. Under the proposals, the pension benefits provided to you in the future would be different to those that are currently provided through the scheme. It is important to note that the pension rights you have already earned are protected by law and in the scheme rules; the proposed changes will only affect the pension benefits that you will be able to build up in the future if the changes are implemented as proposed.

Does this mean, then, that the pension I have already built up is safe? No, it decidedly doesn’t. If you received a similar letter and were reassured by the above paragraph, then please unreassure yourself, since it is hiding the fact that you stand to lose a lot of money (the precise amount depending on your circumstances — I will discuss this later in the post).

The key to how this can be lies in a paragraph from a leaflet that I received with the letter. It says the following.

If you are a member of the current final salary section, the benefits you have built up — your accrued benefits — will be calculated using your pensionable salary and pensionable service immediately prior to the implementation date. Going forward, those accrued benefits will be revalued in line with increases in official pensions (currently the Consumer Prices Index — CPI) each April, up to the point of retirement or leaving the scheme.

In plain language, they are saying that for each year of contributions that you have made to the scheme, you will now earn one 80th of your salary at the time that the changes to the scheme are implemented and not at the time that you retire. So if, say, you are in mid career and your final salary ends up 25% higher than your current salary, then what you will get for your contributions so far will be reduced by 20%. (The difference between those two percentages is because if you increase a number by 25%, then to get back to the original number you have to decrease the new number by 20%.)
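The percentage arithmetic in the parenthetical can be written as a one-line formula: if the final salary is {(1+g)} times the current salary, basing accrued benefits on the current salary loses a fraction {1 - 1/(1+g)} of them. The helper name `accrued_reduction` is hypothetical.

```python
def accrued_reduction(salary_growth):
    """Fraction of accrued pension lost if it is based on today's salary
    rather than on a final salary that is (1 + salary_growth) times larger."""
    return 1 - 1 / (1 + salary_growth)

print(round(accrued_reduction(0.25), 10))  # 0.2, i.e. a 25% rise means a 20% cut
```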

Let’s illustrate this USS-style with a few hypothetical examples. I will ignore inflation, but it is straightforward to adjust for it.

1. Alice is a historian. She was appointed 19 years ago, when she was in her late 20s. Since then, she has had two children, which caused a temporary drop in her academic productivity, but she has made up for it since, and her career is going well. She has just become a reader, and is told that she is very likely to become a professor in the next two or three years. Her current salary is £56,482 per year and will be £58,172 next year.

Looking into the future, she does indeed become a professor, in 2018, and starts two notches up from the bottom of the professorial salary scale, at £71,506. Looking further into the future, she ends up at the top of Band 1 of the professorial scale, with a salary of £85,354 (plus inflationary increases).

Unfortunately for her, the changes to the scheme are implemented before she is promoted, so the 20 years of contributions that she has by then amassed earn her 20/80, or a quarter, of her reader’s salary of £58,172, per year. That is, it earns her £14,543 per year. (This is not her total pension, just the part of her pension that results from the contributions she has made so far.) Had the scheme not been changed, those contributions would have instead earned her a quarter of her final salary of £85,354, which would work out as £21,338.50 per year. So she has lost nearly £6,800 per year from her pension as a result of the changes. She is destined to live for 25 years after she retires, so her loss works out at roughly £170,000.

2. Bob is also a historian and a good friend of Alice. He was appointed at the same time, is the same age, and has had a very similar career, but he has progressed slightly earlier because he did not have a period of low academic productivity. He became a reader three years ago and will become a professor later this year, starting two notches above the bottom salary level, at £71,506. He too is destined to end his career at the top of professorial Band 1 with a salary of £85,354.

Under the new scheme, his pension contributions up to the time of the change will earn him a quarter of £71,506 per year, or £17,876.50. Under the current scheme, they would have earned him £21,338.50 per year, just as Alice’s would, since their final salaries are destined to be the same. So Bob too has lost out.

However, Bob was luckier than Alice because he was promoted just before the change to the system, as a result of which his salary at the time of the change will be substantially higher than that of Alice. Even though Alice will be promoted soon afterwards, she will end up much worse off than Bob, to the tune of £3,333.50 per year.

3. Carl is a mathematician. He proved some very good results in his early 30s and was promoted to professor at the age of 38. He too has put in 20 years of contributions by the time of the changes, by which time he is at the top of Band 1 with a salary of £85,354. Unfortunately, soon after he became a professor, he burnt out somewhat, never quite matching the achievements of his youth, so his salary is not going to increase any further. So for him the changes to the system make no difference: his current salary is his final salary. As with both Alice and Bob, under the current system his contributions would earn him £21,338.50. But for Carl they will earn him £21,338.50 under the new system as well.
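The three examples reduce to one formula: 20 years of contributions at one 80th per year, applied either to the salary at the time of the change or to the final salary. The sketch below recomputes the figures, ignoring inflation as the post does. The salaries are taken from the examples, and the helper name `accrued_pension` is hypothetical.

```python
def accrued_pension(salary, years=20, accrual=80):
    """Annual pension from `years` of contributions at 1/accrual of `salary` per year."""
    return salary * years / accrual

# (salary used under the proposed rules, destined final salary), from the examples
people = {
    "Alice": (58_172, 85_354),
    "Bob": (71_506, 85_354),
    "Carl": (85_354, 85_354),
}

for name, (salary_at_change, final_salary) in people.items():
    new = accrued_pension(salary_at_change)
    old = accrued_pension(final_salary)
    print(f"{name}: new £{new:,.2f}, old £{old:,.2f}, loss £{old - new:,.2f} per year")
```

Alice's annual loss is £6,795.50, and the gap between Bob and Alice under the new rules is £3,333.50.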

There are two general points I want to make with these examples. The first is that the changes amount to the breaking of an agreement. We were not obliged to take out a pension with USS, but were told that it was crazy not to do so because the payout was based on our final salary. I started my pension late (out of sheer stupidity, but that’s another story) and decided that at considerable expense (because there was not an accompanying employers’ contribution) I would make additional voluntary contributions. When I was deciding to do this, it was explained to me that each year I bought would add one 80th of my final salary to my pension. I am on a salary scale and have not reached the top of it, so if the USS make the proposed changes then they will be reneging on that agreement.

Is this legal? Here again is what they said.

It is important to note that the pension rights you have already earned are protected by law and in the scheme rules; the proposed changes will only affect the pension benefits that you will be able to build up in the future if the changes are implemented as proposed.

A lot depends on what is meant by “the pension rights you have already earned”. I would understand that to mean my final salary multiplied by the number of years I have contributed to the scheme divided by 80, since that is what I was told I would be getting for the money I have paid in so far. However, I think it may be that in law what I have already earned is what I could take away if I left the scheme now, which would be based on my current salary, and that part of “building up in the future” is sticking around in Cambridge while my salary increases. If anybody knows the answer to this legal question, I would be very interested. I have tried to find out by looking at the Pension Schemes Act 1993, and in particular Chapter 4, but it is pretty impenetrable. (Lawyers often claim that this impenetrability is necessary in order to avoid ambiguity, but in this instance it seems to have the opposite effect.)

But even if it turns out that it is not illegal for USS to interpret “the pension rights you have already earned” in this way, it is quite clearly immoral: it is a straightforward breaking of the terms of the agreement I had with them when I decided to take out a USS pension and make additional voluntary contributions. And of course I am far from alone in this respect. I personally don’t expect my final salary to be all that much higher than my current salary, so I probably won’t lose too much, but people whose final salaries are likely to be a lot higher than their current salaries will lose hugely.

The second point is that the way the USS has decided to share out the pain hugely exacerbates unfairnesses that are already present in the system. It is not fair that scientists are typically promoted much earlier than those in the humanities. In many cases it is not fair when men are promoted earlier than women. But at least those who were promoted more slowly could console themselves with the thought that they would probably catch up eventually, and that their pensions would therefore be comparable. If the changes come into effect, then as the examples above illustrate, if two people are in mid career at the time of the changes and are destined to reach the same final salary, but one has been promoted further than the other at the time of the changes, then the one promoted further will end up not just with all that extra salary as at present but also with a substantially higher pension.

There is a mathematical point to make here that applies to many different policies. It is very wrong if the effect of the policy does not depend roughly continuously on somebody’s circumstances. But if you belong to the final-salary section and are up for promotion soon, you had better hope that you get promoted just before the change rather than just after it, since the accumulated difference it will make to your pension will be very large, even though the difference to your career progression will be small.
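To make the discontinuity concrete, here is a minimal Python sketch. It uses the 1/80-per-year accrual described above and assumes, as a hypothetical reading of the proposal, that service already completed is valued at the salary held on the date of the change rather than at the final salary. The salaries, years of service and function names are made-up illustrations, not figures from the post.

```python
ACCRUAL_DENOMINATOR = 80  # 1/80 of salary per year of service, as described in the post


def past_service_pension(salary_used: float, years_served: float) -> float:
    """Annual pension for years already served, valued at `salary_used`."""
    return salary_used * years_served / ACCRUAL_DENOMINATOR


def pension_if_promoted(promotion_day: int, change_day: int,
                        old_salary: float, new_salary: float,
                        years_served: float) -> float:
    """Hypothetical model: past service is frozen at the salary held on the
    change date, so only a promotion on or before that date counts."""
    salary_on_change_date = new_salary if promotion_day <= change_day else old_salary
    return past_service_pension(salary_on_change_date, years_served)


# Illustrative numbers only: 20 years served, promotion from £50,000 to £60,000,
# happening one day before or one day after the change.
before = pension_if_promoted(promotion_day=-1, change_day=0,
                             old_salary=50_000, new_salary=60_000, years_served=20)
after = pension_if_promoted(promotion_day=1, change_day=0,
                            old_salary=50_000, new_salary=60_000, years_served=20)
print(before, after, before - after)
```

In this toy model a two-day shift in the promotion date changes the annual pension for past service by £2,500, even though the salary actually earned in those two days is essentially the same.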

If all this bothers you, please do two things. First, alert your colleagues to what is going on and to what is wrong with it. Secondly, consider signing a petition that has been set up to oppose the changes.

Update. There are two further points that have come to my attention that mean that the situation is worse than I described it. The first is that I forgot to mention the lump sum that one receives on retirement. This is worth three times one’s annual pension, so for each of Alice, Bob and Carl, what they stand to lose from the lump sum under the new system is three quarters of the difference between their current salary and their final salary. Thus, Alice loses around £21,000 from her lump sum, while Carl loses nothing from his.
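The "three quarters" factor follows from the accrual arithmetic: the lump sum is three times the annual pension, and the pension is (years served)/80 of salary, so the loss is 3 × (years/80) × (final salary minus current salary). A factor of 3/4 therefore corresponds to 20 years of accrued service; that figure is inferred from the arithmetic, not stated in this part of the post. A short Python check:

```python
from fractions import Fraction

LUMP_SUM_MULTIPLE = 3     # lump sum = 3 x annual pension (from the update)
ACCRUAL_DENOMINATOR = 80  # pension = years served / 80 of salary


def lump_sum_factor(years_served: int) -> Fraction:
    """Fraction of the (final - current) salary gap that is lost from the lump sum."""
    return Fraction(LUMP_SUM_MULTIPLE * years_served, ACCRUAL_DENOMINATOR)


def lump_sum_loss(current_salary: float, final_salary: float, years_served: int) -> float:
    return float(lump_sum_factor(years_served)) * (final_salary - current_salary)


print(lump_sum_factor(20))               # 3/4
# Inverting: a loss of about £21,000 at a 3/4 factor means a salary gap of about £28,000.
print(21_000 / lump_sum_factor(20))      # 28000
```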

However, it turns out that Carl is not quite as fortunate as I claimed above, owing to a further consideration that I did not know about, which is that academic salaries tend to rise faster than inflation. I don’t mean that the salary of any one individual rises faster as a result of salary increments. I mean that if you take the salary at a fixed place in the salary scale, then that tends to rise faster than inflation. So although Carl will remain on the same point at the top of Band 1 for the rest of his career, his salary is likely to be significantly higher in real terms when he retires than it is now. I am told that it is quite usual for salaries to go up by at least 1% more than inflation, so in 20 years’ time this could make a big difference. This second consideration makes the situation worse for Alice and Bob by the same amount that it does for Carl.
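To see how much "at least 1% more than inflation" matters over 20 years, here is a quick compounding check in Python. It assumes a constant 1% real increase each year, compounded annually; actual pay settlements will of course vary from year to year.

```python
def real_salary_multiplier(excess_over_inflation: float, years: int) -> float:
    """Real-terms growth of a fixed salary point that beats inflation by a
    constant annual margin, compounded once a year."""
    return (1 + excess_over_inflation) ** years


m = real_salary_multiplier(0.01, 20)
print(f"{m:.3f}")  # 1.220, i.e. roughly 22% higher in real terms
```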

ResonaancesAfter-weekend plot: new Planck limits on dark matter

The Planck collaboration just released updated results that include an input from their  CMB polarization measurements. The most interesting are the new constraints on the annihilation cross section of dark matter:

Dark matter annihilation in the early universe injects energy into the primordial plasma and increases the ionization fraction. Planck is looking for imprints of that in the CMB temperature and polarization spectrum. The relevant parameters are the dark matter mass and  <σv>*feff, where <σv> is the thermally averaged annihilation cross section during the recombination epoch, and feff ~0.2 accounts for the absorption efficiency. The new limits are a factor of 5 better than the latest ones from the WMAP satellite, and a factor of 2.5 better than the previous combined constraints.

What does it mean for us? In vanilla models of thermal WIMP dark matter <σv> = 3*10^-26 cm^3/sec, in which case dark matter particles with masses below ~10 GeV are excluded by Planck. Actually, in this mass range the Planck limits are far less stringent than the ones obtained by the Fermi collaboration from gamma-ray observations of dwarf galaxies. However, the two are complementary to some extent. For example, Planck probes the annihilation cross section in the early universe, which can be different from today's. Furthermore, the CMB constraints obviously do not depend on the distribution of dark matter in galaxies, which is a serious source of uncertainty for cosmic-ray experiments. Finally, the CMB limits extend to higher dark matter masses where gamma-ray satellites lose sensitivity. The last point implies that Planck can weigh in on the PAMELA/AMS cosmic-ray positron excess. In models where the dark matter annihilation cross section during the recombination epoch is the same as today, the mass and cross section range that can explain the excess is excluded by Planck. Thus, the new results make it even more difficult to interpret the positron anomaly as a signal of dark matter.
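Because the energy injected per unit time scales as the dark matter density squared times <σv>/m, the CMB effectively constrains the combination feff*<σv>/m, so the excluded mass range moves linearly with the cross section. The Python sketch below backs out a rough bound on that combination from the numbers quoted above (<σv> = 3*10^-26 cm^3/sec, feff ~ 0.2, exclusion below ~10 GeV). It is a back-of-the-envelope illustration, not Planck's published limit, and it ignores the mass dependence of feff.

```python
THERMAL_SIGMA_V = 3e-26   # cm^3/s, thermal WIMP value quoted in the post
F_EFF = 0.2               # absorption efficiency, as quoted in the post
M_THRESHOLD_GEV = 10.0    # masses below ~10 GeV excluded for thermal WIMPs (post)

# Rough bound on f_eff * <sigma v> / m inferred from the numbers above
# (illustrative only, not the official Planck value).
P_ANN_BOUND = F_EFF * THERMAL_SIGMA_V / M_THRESHOLD_GEV  # cm^3 s^-1 GeV^-1


def min_allowed_mass_gev(sigma_v: float, f_eff: float = F_EFF) -> float:
    """Smallest dark matter mass compatible with the rough bound for a given <sigma v>."""
    return f_eff * sigma_v / P_ANN_BOUND


for boost in (1, 10, 100):
    print(boost, round(min_allowed_mass_gev(boost * THERMAL_SIGMA_V), 1))
```

In this crude picture, boosting <σv> by a factor B above the thermal value moves the excluded region to masses below roughly B × 10 GeV.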

ResonaancesB-modes: what's next

The signal of gravitational waves from inflation is the holy grail of cosmology. As is well known, at the end of a quest for the holy grail there is always the Taunting Frenchman... This is also the fate of the BICEP quest for primordial B-mode polarization imprinted in the Cosmic Microwave Background by the gravitational waves. We have known for many months that the high intensity of the galactic dust foreground does not allow BICEP2 to unequivocally detect the primordial B-mode signal. The only open question was how strong a limit can be set on the parameter r, the tensor-to-scalar ratio of primordial fluctuations. This is the main result of the recent paper that combines data from the BICEP2, Keck Array, and Planck instruments. BICEP2 and Keck are orders of magnitude more sensitive than Planck to CMB polarization fluctuations. However, they made measurements only at one frequency, 150 GHz, where the CMB signal is large. Planck, on the other hand, can contribute measurements at higher frequencies where the galactic dust dominates, which allows them to map out the foregrounds in the window observed by BICEP. Cross-correlating the Planck and BICEP maps allows one to subtract the dust component and extract the constraints on the parameter r. The limit quoted by BICEP and Planck, r < 0.12, is however worse than r < 0.11 from Planck's analysis of temperature fluctuations. This still leaves a lot of room for the primordial B-mode signal hiding in the CMB.

So the BICEP2 saga is definitely over, but the search for the primordial B-modes is not. The lesson we learned is that single-frequency instruments like BICEP2 are not adequate in view of the large galactic foregrounds. The road ahead is then clear: build more precise multi-frequency instruments, such that foregrounds can be subtracted. While we will not send a new CMB satellite observatory anytime soon, there are literally dozens of ground-based and balloon CMB experiments already running or coming online in the near future. In particular, the BICEP program continues, with Keck Array running at other frequencies, and the more precise BICEP3 telescope to be completed this year. Furthermore, the SPIDER balloon experiment just completed its first Antarctic flight early this year, with a two-frequency instrument on board. Hence, better limits on r are expected already this year. See the snapshots below, borrowed from these slides, for a compilation of upcoming experiments.

Impressive, isn't it? These experiments should soon be sensitive to r~0.01, and in the long run to r~0.001. Of course, there is no guarantee of a detection. If the energy scale of inflation is just a little below 10^16 GeV, then we will never observe the signal of gravitational waves. Thus, the success of this enterprise crucially depends on Nature being kind. However, the high stakes make these searches worthwhile. A discovery would surely count among the greatest scientific breakthroughs of the 21st century. Better limits, on the other hand, will exclude some simple models of inflation. For example, single-field inflation with a quadratic potential is already under pressure. Other interesting models, such as natural inflation, may go under the knife soon.
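The link between r and the energy scale of inflation can be made quantitative with the standard single-field slow-roll relation V = (3π²/2) A_s r M_pl^4, where M_pl is the reduced Planck mass (about 2.435×10^18 GeV) and A_s is the amplitude of scalar fluctuations. The sketch below assumes that relation and an approximate value A_s ≈ 2.2×10^-9; neither input appears in the post.

```python
import math

M_PL_REDUCED_GEV = 2.435e18  # reduced Planck mass in GeV
A_S = 2.2e-9                 # scalar amplitude, approximate value (assumption)


def inflation_scale_gev(r: float) -> float:
    """V^(1/4) from the slow-roll relation V = (3 pi^2 / 2) A_s r M_pl^4."""
    v = 1.5 * math.pi**2 * A_S * r * M_PL_REDUCED_GEV**4
    return v ** 0.25


def r_for_scale(v14_gev: float) -> float:
    """Invert the relation: tensor-to-scalar ratio for a given V^(1/4) in GeV."""
    return v14_gev**4 / (1.5 * math.pi**2 * A_S * M_PL_REDUCED_GEV**4)


for r in (0.1, 0.01, 0.001):
    print(f"r = {r:g}: V^(1/4) ~ {inflation_scale_gev(r):.2e} GeV")
```

With these inputs, r ~ 0.01 corresponds to V^(1/4) of roughly 10^16 GeV and r ~ 0.001 to roughly 6×10^15 GeV, which illustrates why a scale only modestly below 10^16 GeV would sit at or beyond the long-run sensitivity quoted above.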

For quantitative estimates of future experiments' sensitivity to r, see this paper.

Gordon WattsPi Day–We should do it more!


Today was Pi day. To join in the festivities, I took my kid to the Pi-day exhibit at MuCEM, the fancy new museum built here in Marseille in 2013. The exhibit was in a room on the top floor, and it was packed with people (sorry for the poor quality of the photo, my cell phone doesn’t handle the sun pouring in the windows well!). It was full of tables with various activities, all having to do with mathematics: puzzles and games that ranged from logic to group theory. It was very well done, and the students were enthusiastic and very helpful. They really wanted nothing more than to be there on a Saturday with this huge crowd of people. For the 45 minutes we were exploring, everyone seemed to be having a good time.

And when I say packed, I really do mean packed. When we left, the fire marshals had arrived and were carefully counting people. The folks running it (all students from nearby universities) were making sure that only one person went in for every person who came out.

Each time I attend or take part in one of these events, I’m reminded how much the public likes them. The movie Particle Fever is an obvious, recent, and really big example. It was shown here in Marseille in a theater for the first time about 6 months ago, and the theater sold out! This was not uncommon back in the USA (though sometimes the audiences were smaller). The staging was genius: the creator of the movie is a fellow physicist, and each time a town held a showing, he would get in contact with some of his friends to do a Q&A after the movie.

Another big one I helped put together was the Higgs announcement on July 3, 2012, in Seattle. There were some 6 of us. It started at midnight and went on till 2 am (closing time). At midnight, on a Tuesday night, there were close to 200 people there! We’d basically packed the bar. The bar had to kick us out as people were peppering us with questions as we were trying to leave before closing. It was a lot of fun for us, and it looked like a lot of fun for everyone else that attended.

I remember the planning stages for that clearly. We had contingency plans in case no one showed up, and for how to alter our presentation if there were only 5 people. I think we were hoping for about 40 or so, and almost 200 showed up. I think most of us did not think the public was interested. This attitude is pretty common: “why would they care about the work we do?” is a recurring theme in conversations about outreach. And it is demonstrably wrong.

The lesson for people in these fields: people want to know about this stuff! And we should figure out how to do these public outreach events more often. Some cost a lot and are years in the making (e.g. the movie Particle Fever), but others are easy, for example the Science Cafés around the USA.

And we should do them in more varied ways. For example, some friends of mine have come up with a neat way of looking for cosmic rays using your cell phone (the most interesting conversation on this project can be found on Twitter). What a great way to get everyone involved!

And there are selfish reasons for us to do these things! A lot of funding for science comes from government agencies in the USA and around the world (whether local or federal), and the more the public knows about what is being done with their tax dollars, and what interesting results are being produced, the better. Sure, there are people who will never be convinced, but there are also a lot who will become even more enthusiastic.

So… what are your next plans for an outreach project?

March 14, 2015

Clifford JohnsonLA Marathon Route Panorama!

Sunday is the 30th LA Marathon. In celebration of this, giant spotlights were set up at various points along the route (from Dodger stadium all the way out to Santa Monica... roughly a station each mile, I read somewhere) and turned on last night for about an hour between around 9 and 10. I stood on a conveniently placed rooftop and had a go at capturing this. See the picture. It involved pushing the exposure by about two stops, [...]

Tommaso DorigoThe Graph Of The Week: Hyper-Boosted Top Quarks

The top quark is the heaviest known elementary particle. It was discovered in 1995 by the CDF and DZERO experiments at the Fermilab Tevatron collider after a long hunt that had started almost two decades earlier. It took long because the top weighs about as much as a whole gold atom, and producing this much matter in single particle-particle collisions is difficult: it requires collision energies that started to be available only in 1985, and the rarity of the production processes demanded collision rates that were delivered only in the early nineties.


Doug NatelsonTunneling two-level systems in solids: Direct measurements

Back in the ancient mists of time, I did my doctoral work studying tunneling two-level systems (TLS) in disordered solids.  What do these words mean?  First, read this post from 2009.   TLS are little, localized excitations that were conjectured to exist in disordered materials.  Imagine a little double-welled potential, like this image from W. A. Phillips, Rep. Prog. Phys. 50 (1987) 1657-1708.
The low temperature thermal, acoustic, and dielectric properties of glasses, for example, appear to be dominated by these little suckers, and because of the disordered nature of those materials, they come in all sorts of flavors - some with high barriers in the middle, some with low barriers; some with nearly symmetric wells, some with very asymmetric wells.   These TLS also "couple to strain" (that's how they talk to lattice vibrations and influence thermal and acoustic properties), meaning that if you stretch or squish the material, you raise one well and lower the other by an amount proportional to the stretching or squishing.

When I was a grad student, there were a tiny number of experiments that attempted to examine individual TLS, but in most disordered materials they could only be probed indirectly. Fast forward 20 years. It turns out that superconducting structures developed for quantum computing can be extremely sensitive to the presence of TLS, which typically exist in the glassy metal oxide layers used as tunnel barriers or at the surfaces of the superconductors. A very cool new paper on the arXiv shows this extremely clearly. As Figure 2d shows, they are able to track the energy splittings of the TLS while straining the material (!), and they can actually see direct evidence of TLS talking coherently to each other. There are "avoided crossings" between TLS levels, meaning that occasionally you end up with TLS pairs that are close enough to each other that energy can slosh coherently back and forth between them. I find this level of information very impressive, and the TLS case continues to be a striking example of theorists concocting a model based on comparatively scant information, and then experimentalists validating it well beyond the original expectations. From the quantum computing perspective, though, these little entities are not a good thing, and demonstrate a maxim I formulated as a grad student: "TLSs are everywhere, and they're evil."

(On the quantitative side:  If the energy difference between the bottoms of the two wells is \(\Delta\), and the tunneling matrix element that would allow transitions between the two wells is \(\Delta_{0}\), then a very simple calculation says that the energy difference between the ground state of this system and the first excited state is given by \(\sqrt{\Delta^{2} + \Delta_{0}^{2}}\).  If coupling to strain linearly tunes \(\Delta\), then that energy splitting should trace out a shape just like the curves seen in Fig. 2d of the paper.)
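The "very simple calculation" is the diagonalization of a 2×2 Hamiltonian. The Python sketch below assumes the standard two-well form H = ½[[Δ, Δ0], [Δ0, −Δ]] and a hypothetical linear strain coupling Δ(ε) = Δ(0) + γε, with made-up parameter values in arbitrary units. It takes the eigenvalues from the characteristic polynomial and then sweeps the strain, tracing out a hyperbola whose minimum gap is Δ0 where the wells are symmetric.

```python
import math


def eigenvalues_2x2_symmetric(a: float, b: float, d: float) -> tuple[float, float]:
    """Eigenvalues of [[a, b], [b, d]], the roots of the characteristic polynomial."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius


def tls_splitting(asymmetry: float, tunneling: float) -> float:
    """Ground-to-first-excited gap of H = (1/2)[[Delta, Delta_0], [Delta_0, -Delta]]."""
    lo, hi = eigenvalues_2x2_symmetric(asymmetry / 2, tunneling / 2, -asymmetry / 2)
    return hi - lo  # equals sqrt(Delta**2 + Delta_0**2)


# Hypothetical strain coupling: Delta(strain) = delta_unstrained + gamma * strain.
delta_unstrained, gamma, delta_0 = -1.0, 2.0, 0.3  # arbitrary units
for strain in [x / 10 for x in range(11)]:
    delta = delta_unstrained + gamma * strain
    print(f"{strain:.1f}  {tls_splitting(delta, delta_0):.3f}")
```

The printed gap falls to 0.300 (that is, Δ0) at strain 0.5, where Δ = 0, and grows roughly linearly with |Δ| on either side.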

n-Category Café Split Octonions and the Rolling Ball

You may enjoy these webpages:

because they explain a nice example of the Erlangen Program more tersely (and, I hope, more simply) than before, with the help of some animations made by Geoffrey Dixon using WebGL. You can actually get a ball to roll in a way that illustrates the incidence geometry associated to the exceptional Lie group \(\mathrm{G}_2\)!

Abstract. Understanding exceptional Lie groups as the symmetry groups of more familiar objects is a fascinating challenge. The compact form of the smallest exceptional Lie group, \(\mathrm{G}_2\), is the symmetry group of an 8-dimensional nonassociative algebra called the octonions. However, another form of this group arises as symmetries of a simple problem in classical mechanics! The space of configurations of a ball rolling on another ball without slipping or twisting defines a manifold where the tangent space of each point is equipped with a 2-dimensional subspace describing the allowed infinitesimal motions. Under certain special conditions, the split real form of \(\mathrm{G}_2\) acts as symmetries. We can understand this using the quaternions together with an 8-dimensional algebra called the ‘split octonions’. The rolling ball picture makes the geometry associated to \(\mathrm{G}_2\) quite vivid. This is joint work with James Dolan and John Huerta, with animations created by Geoffrey Dixon.

I’m going to take this show on the road and give talks about it at Penn State, the University of York (virtually), and elsewhere. And there’s no shortage of material to read for more details. John Huerta has blogged about this work here:

* John Huerta, G2 and the rolling ball.

and I have a 5-part series where I gradually lead up to the main idea, starting with easier examples:

* John Baez, Rolling circles and balls.

There are also plenty of actual papers:

So, enjoy!

March 12, 2015

Doug NatelsonTable-top particle physics

We had a great colloquium here today by Dave DeMille from Yale University.   He spoke about his group's collaborative measurements (working with John Doyle and Gerry Gabrielse at Harvard) trying to measure the electric dipole moment of the electron.  When we teach students, we explain that as far as we have been able to determine, an electron is a truly pointlike particle (infinitesimal in size) with charge -e and spin 1/2.  That is, it has no internal structure (though somehow it contains intrinsic angular momentum, but that is a story for another day), and that means that attempts to probe the charge distribution of the electron (e.g., scattering measurements) indicate that its charge is distributed in a spherically symmetric way.

We know, though, that from the standpoint of quantum field theory (quantum electrodynamics, for example) we should actually think of the electron as being surrounded by a cloud of "virtual" particles of various sorts. In Feynman-like language, when an electron goes from here to there, we need to consider not just the direct path, but also the quantum amplitudes for paths with intermediate states (that could be classically forbidden), like spitting out and reabsorbing a photon between here and there. Those paths give rise to important, measurable consequences, like the Lamb shift, so we know that they're real. Where things get very interesting is when you wonder about more complicated corrections involving particles that break time reversal symmetry (like B and K mesons). If you throw in what we know from the Standard Model of particle physics, those corrections lead to the conclusion that there actually should be a non-zero electric dipole moment of the electron. That is, along its axis of "spin", there should be a slight deficit of negative charge at the north pole and excess of negative charge at the south pole, corresponding to a shift of the charge of the electron by about \(10^{-40}\) cm. That is far too small to measure.

However, suppose that there are more funky particles out there (e.g., dark matter candidates like the supersymmetric particles that many people predict should be seen at the LHC or larger colliders). If those particles have masses on the TeV scale (that'd be convenient), there is then an expectation that there should be a detectable electric dipole moment. DeMille and collaborators have used extremely clever atomic physics techniques involving optical measurements on beams of ThO molecules in magnetic and electric fields to look, and they've pushed the bound on any such moment to levels that already eliminate many candidate theories.

Two comments. First, this talk confirmed for me once again that you really have to have a special kind of personality to do truly precision measurements. The laundry list of systematic error sources that they considered is amazing, as are the control experiments. Second, I love this kind of thing, using "table-top" experiments (for certain definitions of "table") to get at particle physics questions. Note that the entire cost of the whole experiment over several years so far has been around $2M. That's not even a rounding error on the LHC budget. Sustained investing at a decent level in this kind of work may have enormous bang-for-the-buck compared with building ever-larger colliders.

Georg von HippelQNP 2015, Day Five

Apologies for the delay in posting this. Travel and jetlag kept me from attending to it earlier.

The first talk today was by Guy de Teramond, who described applications of light-front superconformal quantum mechanics to hadronic physics. I have to admit that I couldn't fully take in all the details, but as far as I understood, an isomorphism between AdS\(_2\) and the conformal group in one dimension can be used to derive a form of the light-front Hamiltonian for mesons from an AdS/QCD correspondence, in which the dilaton field is fixed to be \(\varphi(z)=\frac{1}{2}z^{2}\) by the requirement of conformal invariance, and a similar construction in the superconformal case leads to a light-front Hamiltonian for baryons. A relationship between the Regge trajectories for mesons and baryons can then be interpreted as a form of supersymmetry in this framework.

Next was Beatriz Gay Ducati with a review of the phenomenology of heavy quarks in nuclear matter, a topic where there are still many open issues. The photoproduction of quarkonia on nucleons and nuclei allows one to probe the gluon distribution, since the dominant production process is photon-gluon fusion, but to be able to interpret the data, many nuclear matter effects need to be understood.

After the coffee break, this was followed by a talk by Hrayr Matevosyan on transverse momentum distributions (TMDs), which are complementary to GPDs in the sense of being obtained by integrating out other variables starting from the full Wigner distributions. Here, again, there are many open issues, such as the Sivers, Collins or Boer-Mulders effects.

The next speaker was Raju Venugopalan, who spoke about two outstanding problems in QCD at high parton densities, namely the question of how the systems created in heavy-ion collisions thermalise, and the phenomenon of "the ridge" in proton-nucleus collisions, which would seem to suggest hydrodynamic behaviour in a system that is too small to be understood as a liquid. Both problems may have to do with the structure of the dense initial state, which is theorised to be a colour-glass condensate or "glasma", and the way in which it evolves into a more dilute system.

After the lunch break, Sonny Mantry reviewed some recent advances made in applying Soft-Collinear Effective Theory (SCET) to a range of questions in strong-interaction physics. SCET is the effective field theory obtained when QCD fluctuations around a hard particle momentum are considered to be small and a corresponding expansion (analogous to the 1/m expansion in HQET) is made. SCET has been successfully applied to many different problems; an interesting and important one is the problem of relating the "Monte Carlo mass" usually quoted for the top quark to the top quark mass in a more well-defined scheme such as MSbar.

The last talk in the plenary programme was a review of the Electron-Ion Collider (EIC) project by Zein-Eddine Meziani. By combining the precision obtainable using an electron beam with the access to the gluon-dominated regime provided by a heavy ion beam, as well as the ability to study the nucleon spin using a polarised nucleon beam, the EIC will enable a much more in-depth study of many of the still unresolved questions in QCD, such as the nucleon spin structure and colour distributions. There are currently two competing designs, the eRHIC at Brookhaven, and the MEIC at Jefferson Lab.

Before the conference closed, Michel Garçon announced that the next conference of the series (QNP 2018) will be held in Japan (either in Tsukuba or in Mito, Ibaraki prefecture). The local organising committee and conference office staff received some well-deserved applause for a very smoothly-run conference, and the scientific part of the conference programme was adjourned.

As it was still in the afternoon, I went with some colleagues to visit La Sebastiana, the house of Pablo Neruda in Valparaíso, taking one of the city's famous ascensores down (although up might have been more convenient, as the streets get very steep) before walking back to Viña del Mar along the sea coast.

The next day, there was an organised excursion to a vineyard in the Casablanca valley, where we got to taste some very good Chilean wines (some of them matured in traditional clay vats) and liqueurs with a very pleasant lunch.

I got to spend another day in Valparaíso before travelling back (a happily uneventful, if again rather long trip).

n-Category Café A Scale-Dependent Notion of Dimension for Metric Spaces (Part 1)

Consider the following shape that we are zooming into and out of. What dimension would you say it was?

[Animation: zooming in and out of a grid of dots]

At small scales (or when it is very far away) it appears as a point, so seems zero-dimensional. At bigger scales it appears to be a line, so seems one-dimensional. At even bigger scales it appears to have breadth as well as length, so seems two-dimensional. Then, finally, at very large scales it appears to be made from many widely separated points, so seems zero-dimensional again.

We arrive at an important observation:

The perceived dimension varies with the scale at which the shape is viewed.

Here is a graph of my attempt to capture this perceived notion of dimension mathematically.

[Graph: the perceived dimension plotted against scale]

Hopefully you can see that, moving from the left, the function starts off at zero, then moves into a region where it takes values around one, then briefly moves up to two, and then drops down to zero again.

The ‘shape’ that we are zooming into above is actually a grid of \(3000\times 16\) points, as that’s all my computer could handle easily in making the above picture. If I’d had a much bigger computer and used, say, \(10^{7}\times 10^{2}\) points, then I would have got something that more obviously had both a one-dimensional and a two-dimensional regime.

In this post I want to explain how this idea of dimension comes about. It relies on a notion of ‘size’ of metric spaces: for instance, you could use Tom’s notion of magnitude, but the above picture uses my notion of 2-spread.

I have mentioned this idea before at the Café in the post on my spread paper, but I wanted to expand on it somewhat.

What do we mean by dimension?

First we should try to understand how we think of dimension. We are used to thinking of it in terms of ‘degrees of freedom’, or the number of different directions you can go in, so a line is one dimensional and a square is two dimensional. This will always give us an integer dimension.

It was realised that there were spaces that could be given meaningful non-integer dimensions by using a different approach to what dimension is. Let’s follow one train of thought. If we think about the square and the line again, we can see that they have different scaling properties. If we scale up a line by a factor of two, then we can fit exactly two copies of the original line in. If we scale up a square by a factor of two then we can fit four copies of the original square in.

[Figure: a square and a line]

More generally, if we scale up the line by a factor of \(n\) then we can fit \(n=n^{1}\) copies of the original line in, and if we similarly scale up the square by a factor of \(n\) we can fit \(n^{2}\) copies of the original square. The indices \(1\) and \(2\) coincide with the dimensions of the spaces.

Let’s take that as the basis of a different notion of dimension and see what happens when we scale up some fractally shapes.

A first attempt at defining dimension

Let’s take a not-well-defined notion of scale dimension to be the following. Say that the scale dimension is \(D\) if when we scale up by a factor of \(n\) (for suitable \(n\)) we get \(n^{D}\) copies of our original space back.

We see that if the Koch curve is scaled by a factor of \(3\) we can fit four copies of the original in, or more generally, when it is scaled by a factor of \(n=3^{k}\) we can fit \(4^{k}=(3^{\log_{3}(4)})^{k}=n^{\log_{3}(4)}\) copies in. So this says that the scale dimension is \(\log_{3}(4)\).


Similarly for the Cantor set. If you scale it by a factor of three you can fit two copies of the original in; more generally, if we scale it by a factor of \(n=3^{k}\) we can fit \(2^{k}=(3^{\log_{3}(2)})^{k}=n^{\log_{3}(2)}\) copies in. So this says that the scale dimension is \(\log_{3}(2)\).
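Taking logarithms of "scaling by \(n\) gives \(n^{D}\) copies" gives \(D = \ln(\text{copies})/\ln(n)\). A short Python check of the four examples so far (line, square, Koch curve, Cantor set); the function name is just a label for this sketch:

```python
import math


def scale_dimension(scale_factor: float, copies: float) -> float:
    """D such that scaling by `scale_factor` yields scale_factor**D copies."""
    return math.log(copies) / math.log(scale_factor)


examples = {
    "line": (2, 2),
    "square": (2, 4),
    "Koch curve": (3, 4),
    "Cantor set": (3, 2),
}
for name, (n, copies) in examples.items():
    print(f"{name}: {scale_dimension(n, copies):.4f}")
```

The Koch curve comes out at about 1.2619 and the Cantor set at about 0.6309, and using \(n=3^{k}\) instead of \(3\) gives the same values, since \(k\) cancels in the ratio of logarithms.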


The above idea, although appealing, has several flaws. One flaw is that it is not clear what is meant by having \(n\) copies of our original shape: for instance, when we scale up the square and decompose into smaller squares the boundaries overlap, so we get a bit less than four copies. Another flaw is that the above idea only works for certain special shapes: for instance, if we scale up a disk by a factor of two we don’t get a shape we can decompose into four of our original shapes. Indeed, there is no \(n>1\) such that when we scale up a disk by a factor of \(n\) we get a shape that can be decomposed into \(n^{2}\) copies of our original disk.

A second attempt

We could address these two flaws by observing that when we scale up a plane shape such as a square or a disk by a factor of two we get a shape which is four times the area of the original shape. Similarly, when we scale all lengths by a factor of \(n\) we get a shape which has \(n^{2}\) times the area. More generally, for a shape in \(D\)-dimensional Euclidean space, when we scale all lengths by a factor of \(n\) we get a shape which has \(n^{D}\) times the volume.

So we could try to say that a shape has volume dimension \(D\) if when you scale it up by a factor of \(n\) the volume changes by a factor of \(n^{D}\). Unfortunately, that doesn’t work for various reasons. Firstly, you have to specify the kind of volume you need for each shape, so you need length for lines and area for squares, so you’re presupposing the dimension when you pick the volume to use! Secondly, this is not going to work for fractals as none of these volumes make sense for a fractal. The Koch curve does not have a meaningfully non-trivial length or area. (To define fractal dimensions such as the Hausdorff or Minkowski dimension you’d take a different turn at this point, but we’ll carry on the way we’re going.)

Defining dimension using a notion of size

An alternative would be if we had a single notion of size which we could use for all shapes simultaneously. For a given shape we could then see how this size changes under scaling and use that to define a notion of dimension. This is the approach I used as I know of several appropriate notions of size.

  • Tom Leinster introduced ‘magnitude’ \(|\cdot|\), which is a notion of size defined for every compact subset of a Euclidean space, and more generally for every compact, positive definite metric space. I’ll not give the definition here.

  • Tom also introduced ‘maximum diversity’ which is based on magnitude, is a bit more complicated to define but is somewhat better behaved on all metric spaces. He gives the definition in this post if you are interested.

  • I introduced a one-parameter family \(\{E_{q}\}_{q\ge 0}\) of notions of size which are defined for every finite metric space (and, I suspect, for every compact subset of Euclidean space as well). These are easier to define and calculate, but I’ll just give the definitions for \(E_{0}\) and \(E_{2}\), which are particularly nice to write down. If \((X,d)\) is a finite metric space with \(N\) points then the 0-spread and 2-spread are defined as follows: \[ E_{0}(X)\coloneqq \sum_{x\in X}\frac{1}{\sum_{y\in X}e^{-d(x,y)}}; \qquad E_{2}(X)\coloneqq \frac{N^{2}}{\sum_{x,y\in X}e^{-d(x,y)}}. \]
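Because \(E_{0}\) and \(E_{2}\) need only the pairwise distances, they are easy to compute directly. Here is a plain-Python sketch for finite sets of points in Euclidean space; using Euclidean distance, and the `scale` argument (which computes the spread of \(tX\) for \(t\) = `scale`), are conveniences of this sketch rather than part of the definitions.

```python
import math
from typing import Sequence

Point = Sequence[float]


def spread_0(points: Sequence[Point], scale: float = 1.0) -> float:
    """0-spread E_0 of the finite metric space obtained by scaling `points` by `scale`."""
    total = 0.0
    for x in points:
        # 1 / (sum over y of exp(-d(x, y))), with Euclidean distances scaled by `scale`.
        total += 1.0 / sum(math.exp(-scale * math.dist(x, y)) for y in points)
    return total


def spread_2(points: Sequence[Point], scale: float = 1.0) -> float:
    """2-spread E_2: N^2 divided by the double sum of exp(-d(x, y))."""
    n = len(points)
    denominator = sum(math.exp(-scale * math.dist(x, y)) for x in points for y in points)
    return n * n / denominator


two_points = [(0.0,), (math.log(2),)]  # distance ln 2, so exp(-d) = 1/2
print(spread_0(two_points), spread_2(two_points))  # both equal 4/3 up to rounding
```

As sanity checks, a single point has spread 1, and \(N\) very widely separated points have spread close to \(N\), for both \(E_{0}\) and \(E_{2}\).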

Once we have a notion of size $S$, we want to use this to define a notion of dimension which tells us how the size alters under scaling. Typically we can’t do anything as naive as finding a $D$ such that if we scale by a factor of $n$ then the size scales by a factor of $n^{D}$; the sizes described above are generally more interesting than that!

For a shape $X$ and $t>0$ we define $tX$ to be the shape $X$ scaled up by a factor of $t$. The size $S(tX)$ is then a function of $t$, and we define the dimension of $X$ to be the growth rate of this function at $t=1$. By growth rate I mean some number associated to a function such that the growth rate of $t^{D}$ is $D$. Recalling from school that the way to calculate the growth rate of a function is to plot it on log-log paper and measure the gradient, we take the instantaneous growth rate of a function $f$ to be the logarithmic derivative, $\frac{d\ln f(t)}{d\ln t}$. A couple of applications of the chain rule tell us that this is just $\frac{df(t)}{dt}\frac{t}{f(t)}$.
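Spelled out, the chain-rule step uses $\frac{d\ln t}{dt}=\frac{1}{t}$, and checking it on $f(t)=t^{D}$ confirms that this growth rate returns $D$:

$$\frac{d\ln f(t)}{d\ln t}=\frac{d\ln f(t)}{dt}\,\frac{dt}{d\ln t}=\frac{1}{f(t)}\frac{df(t)}{dt}\cdot t, \qquad f(t)=t^{D}\;\Rightarrow\;\frac{D\,t^{D-1}\cdot t}{t^{D}}=D.$$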

For a notion of size $S$, this leads us to define the $S$-dimension of a shape $X$ by $$\dim_{S}(X)\coloneqq \frac{d\ln S(tX)}{d\ln t}\bigg|_{t=1}=\frac{dS(tX)}{dt}\frac{t}{S(tX)}\bigg|_{t=1}.$$
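For a size function without a convenient closed form, the definition can be approximated numerically by a central difference in $\ln t$. This is a sketch under our own assumptions: shapes are finite point sets in Euclidean space, $tX$ is obtained by multiplying every coordinate by $t$, and the helper names (`scaled`, `s_dimension`) and the step size `h` are illustrative choices, not taken from the post.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
SizeFn = Callable[[Sequence[Point]], float]


def spread_2(points: Sequence[Point]) -> float:
    """2-spread E_2: N^2 over the sum of exp(-d(x, y)) over all ordered pairs."""
    n = len(points)
    total = sum(math.exp(-math.dist(x, y)) for x in points for y in points)
    return n * n / total


def scaled(points: Sequence[Point], t: float) -> List[Point]:
    """The shape tX: every coordinate multiplied by t."""
    return [tuple(t * c for c in p) for p in points]


def s_dimension(size: SizeFn, points: Sequence[Point], t: float = 1.0,
                h: float = 1e-4) -> float:
    """Approximate dim_S(tX) = d ln S(sX) / d ln s evaluated at s = t.

    Central difference in u = ln s: S is evaluated at t*e^h and t*e^-h.
    """
    up = size(scaled(points, t * math.exp(h)))
    down = size(scaled(points, t * math.exp(-h)))
    return (math.log(up) - math.log(down)) / (2 * h)


# Two points at distance 1: the exact E_2-dimension at t = 1 is e^-1 / (1 + e^-1).
pair = [(0.0,), (1.0,)]
print(s_dimension(spread_2, pair), math.exp(-1) / (1 + math.exp(-1)))
```

Passing `spread_0` (or any other size function on point lists) instead of `spread_2` gives the corresponding numerical dimension; the central difference keeps the truncation error of order `h**2`.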

The graph in the introduction is the $E_2$-dimension of the $3000\times 16$ lattice of points in the animation, plotted as we scale the set of points; that is, it’s the graph of $\dim_{E_2}(tX)$ as we vary $t$.
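For $E_2$ in particular, differentiating the definition gives a closed form (our own short derivation, not quoted from the post): since $\ln E_2(tX)=2\ln N-\ln\sum_{x,y}e^{-t\,d(x,y)}$, we get $\dim_{E_2}(tX)=t\,\frac{\sum_{x,y}d(x,y)\,e^{-t\,d(x,y)}}{\sum_{x,y}e^{-t\,d(x,y)}}$. The sketch below uses this to trace the kind of curve described above, but for a hypothetical $20\times 4$ lattice with unit spacing rather than the $3000\times 16$ one, which would need over $2\times 10^{9}$ pairwise distances in pure Python. The names `lattice` and `e2_dimension` are illustrative.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lattice(rows: int, cols: int, spacing: float = 1.0) -> List[Point]:
    """A rows x cols grid of points in the plane."""
    return [(spacing * i, spacing * j) for i, j in product(range(rows), range(cols))]


def e2_dimension(points: Sequence[Point], t: float) -> float:
    """Closed-form E_2-dimension of tX: t times the exp(-t d)-weighted mean distance."""
    dists = [math.dist(x, y) for x in points for y in points]
    weights = [math.exp(-t * d) for d in dists]
    return t * sum(d * w for d, w in zip(dists, weights)) / sum(weights)


# Sweep the scale factor over several orders of magnitude (log-spaced).
grid = lattice(20, 4)
for k in range(-2, 4):
    t = 10.0 ** k
    print(f"t = {t:>8g}   dim_E2(tX) = {e2_dimension(grid, t):.3f}")
```

The dimension tends to $0$ in both limits: for tiny $t$ it is roughly $t$ times the mean pairwise distance, and for large $t$ the weight concentrates on the diagonal pairs $x=y$, where $d=0$; the interesting behaviour is in between.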

In practice, it seems that the magnitude, $E_{0}$ and $E_{2}$ dimensions of a given shape look very similar. They have different advantages over each other: for instance, we know, or have conjectured, closed forms for the magnitudes of several shapes; but for shapes where we don’t have a closed form, it is quicker to calculate the $0$- or $2$-spread dimension.

Next time…

Next time I’ll give more examples. I’ll also say something about connections with Minkowski dimension, although I don’t understand the full story there.

Sean CarrollNew Course: The Higgs Boson and Beyond

Happy to announce that I have a new course out with The Great Courses (produced by The Teaching Company). This one is called The Higgs Boson and Beyond, and consists of twelve half-hour lectures. I have previously done two other courses for them: Dark Matter and Dark Energy, and Mysteries of Modern Physics: Time. Both of those were 24 lectures each, so this time we’re getting to the good stuff more quickly.

The inspiration for the course was, naturally, the 2012 discovery of the Higgs, and you’ll be unsurprised to learn that there is some overlap with my book The Particle at the End of the Universe. It’s certainly not just me reading the book, though; the lecture format is very different than the written word, and I’ve adjusted the topics and order appropriately. Here’s the lineup:


  1. The Importance of the Higgs Boson
  2. Quantum Field Theory
  3. Atoms to Particles
  4. The Power of Symmetry
  5. The Higgs Field
  6. Mass and Energy
  7. Colliding Particles
  8. Particle Accelerators and Detectors
  9. The Large Hadron Collider
  10. Capturing the Higgs Boson
  11. Beyond the Standard Model
  12. Frontiers: Higgs in Space

Because it is a course, the presentation here is in a more strictly logical order than it is in the book, starting from quantum field theory and working our way up. It’s still aimed at a completely non-expert audience, though a bit of enthusiasm for physics will be helpful for grappling with the more challenging material. And it’s available in either audio-only or video format, but I have to say they did a really nice job with the graphics this time around, so the video is worth having.

And it’s on sale! Don’t know how long that will last, but there’s a big difference between regular prices at The Great Courses and the sale prices. A bargain either way!

Sean CarrollThe Big Questions

The other day I mused on Twitter about three big origin questions: the origin of the universe, the origin of life, and the origin of consciousness. Which isn’t to say they are related, just that they’re all interesting and important (and currently nowhere near solved). Physicists have taken stabs at the life question, but (with a few dramatic exceptions) they’ve mostly stayed away from consciousness. Probably for the best.

Here’s Ed Witten giving his own personal — and characteristically sensible — opinion, which is that consciousness is a really knotty problem, although not so difficult that we should start contemplating changing the laws of physics in order to solve it. Though I am more optimistic than he is that we’ll understand it on a reasonable timescale. (Hat tip to Ash Jogalekar.)

Anyone seriously interested in tackling these big questions would be well-served by acknowledging that much (most? almost all?) progress in science is incremental, sneaking up on major discoveries by a series of small steps rather than leaping right to a dramatic new paradigm. Even if you want to understand the origin of the universe, it might behoove you to think about some more specific and tractable problems, like the nature of quantum fluctuations in inflation, or the emergence of spacetime in string theory. If you want to understand the origin of consciousness, it’s a good strategy to think about something like our perception of color, with the idea of working your way up to the more challenging issues.

Conversely, it’s these big questions that attract crackpots like honey attracts flies. I get a lot of emails (and physical letters) from cranks, but they never have a new theory of the branching ratio of the Higgs boson into four leptons; it’s always about the nature of space and time and everything. It’s too easy for anyone to have an opinion about these big questions, whether or not those opinions are worth paying attention to.

All of which leads up to saying: it’s still worth tackling the big questions! Start small, but think big. Because they are so hard, it’s too easy to make fun of attempts to solve the biggest questions, or to imagine that they are irreducibly mysterious and will never be solved. I wouldn’t be at all surprised if we had quite compelling pictures of the origin of the universe, life, and consciousness within the next hundred years. But only if we’re willing to tackle the big problems seriously.

Sean CarrollFrom Child Preacher to Adult Humanist

One of the best experiences I had at last year’s Freedom From Religion Foundation convention was listening to this wonderful talk by Anthony Pinn. (Talk begins at 5:40.)

Pinn, growing up outside Buffalo NY, became a preacher in his local church at the ripe young age of 12. Now, there’s nothing an audience of atheists likes better than a story of someone who was devoutly religious and later does an about-face to embrace atheism. (Not an uncommon path, with many possible twists, as you can read in Daniel Dennett and Linda LaScola’s Caught in the Pulpit.) And Pinn gives it to us, hitting all the best notes: being in love with Jesus but also with thinking critically, being surprised to meet theologians who read the Bible as literature rather than as The Word, and ultimately losing his faith entirely while studying at Harvard Divinity School.

But there’s a lot more to his message than a congratulatory triumph of rationality over superstition. Through his life, Pinn has been concerned with the effect that ideas and actions have on real people, especially the African-American community. His mother always reminded him to “move through the world knowing your footsteps matter,” valuable advice no matter what your ontological orientation might be.

This comes out in the Q&A period — often not worth listening to, but in this case it’s the highlight of the presentation. The audience of atheists are looking for yet more self-affirmation, demanding to know why more Blacks haven’t accepted the truth of a secular worldview. Pinn is very frank: naturalism hasn’t yet offered African-Americans a “soft landing.” Too many atheists, he points out, spend a lot of time critiquing religious traditions, and a lot of time patting themselves on the back for being rational and fair-minded, and not nearly enough time constructing something positive, a system of networks and support structures free of the spiritual trappings. It’s a good message for us to hear.

It would have been fantastic to have Anthony at Moving Naturalism Forward. Next time! (Not that there are currently any plans for a next time.)

March 11, 2015

Scott AaronsonThe ultimate physical limits of privacy

Somewhat along the lines of my last post, the other day a reader sent me an amusing list of questions about privacy and fundamental physics.  The questions, and my answers, are below.

1. Does the universe provide us with a minimum level of information security?

I’m not sure what the question means. Yes, there are various types of information security that are rooted in the known laws of physics—some of them (like quantum key distribution) even relying on specific aspects of quantum physics—whose security one can argue for by appealing to the known properties of the physical world. Crucially, however, any information security protocol is only as good as the assumptions it rests on: for example, that the attacker can’t violate the attack model by, say, breaking into your house with an ax!

2. For example, is my information safe from entities outside the light-cone I project?

Yes, I think it’s safe to assume that your information is safe from any entities outside your future light-cone. Indeed, if information is not in your future light-cone, then almost by definition, you had no role in creating it, so in what sense should it be called “yours”?

3. Assume that there are distant alien cultures with infinite life spans – would they always be able to wait long enough for my light cone to spread to them, and then have a chance of detecting my “private” information?

First of all, the aliens would need to be in your future light-cone (see my answer to 2). In 1998, it was discovered that there’s a ‘dark energy’ pushing the galaxies apart at an exponentially-increasing rate. Assuming the dark energy remains there at its current density, galaxies that are far enough away from us (more than a few tens of billions of light-years) will always recede from us faster than the speed of light, meaning that they’ll remain outside our future light-cone, and signals from us can never reach them. So, at least you’re safe from those aliens!

For the aliens in your future light-cone, the question is subtler. Suppose you took the only piece of paper on which your secrets were written, and burned it to ash—nothing high-tech, just burned it. Then there’s no technology that we know today, or could even seriously envision, that would piece the secrets together. It would be like unscrambling an egg, or bringing back the dead from decomposing corpses, or undoing a quantum measurement. It would mean, effectively, reversing the Arrow of Time in the relevant part of the universe. This is formally allowed by the Second Law of Thermodynamics, since the decrease in entropy within that region could be balanced by an increase in entropy elsewhere, but it would require a staggering level of control over the region’s degrees of freedom.

On the other hand, it’s also true that the microscopic laws of physics are reversible: they never destroy information. And for that reason, as a matter of principle, we can’t rule out the possibility that some civilization of the very far future, whether human or alien, could piece together what was written on your paper even after you’d burned it to a crisp. Indeed, with such godlike knowledge and control, maybe they could even reconstruct the past states of your brain, and thereby piece together private thoughts that you’d never written anywhere!

4. Does living in a black hole provide privacy? Couldn’t they follow you into the hole?

No, I would not recommend jumping into a black hole as a way to ensure your privacy. For one thing, you won’t get to enjoy the privacy for long (a couple hours, maybe, for a supermassive black hole at the center of a galaxy?) before getting spaghettified on your way to the singularity. For another, as you correctly pointed out, other people could still snoop on you by jumping into the black hole themselves—although they’d have to want badly enough to learn your secrets that they wouldn’t mind dying themselves along with you, and also not being able to share whatever they learned with anyone outside the hole.

But a third problem is that even inside a black hole, your secrets might not be safe forever! Since the 1970s, it’s been thought that all information dropped into a black hole eventually comes out, in extremely-scrambled form, in the Hawking radiation that black holes produce as they slowly shrink and evaporate. What do I mean by “slowly”? Well, the evaporation would take about 10^70 years for a black hole the mass of the sun, or about 10^100 years for the black holes at the centers of galaxies. Furthermore, even after the black hole had evaporated, piecing together the infalling secrets from the Hawking radiation would probably make reconstructing what was on the burned paper from the smoke and ash seem trivial by comparison! But just like in the case of the burned paper, the information is still formally present (if current ideas about quantum gravity are correct), so one can’t rule out that it could be reconstructed by some civilization of the extremely remote future.

BackreactionWhat physics says about the vacuum: A visit to the seashore.

Imagine you are at the seashore, watching the waves. Somewhere in the distance you see a sailboat — wait, don’t fall asleep yet. The waves and I want to tell you a story about nothing.

Before quantum mechanics, “vacuum” meant the absence of particles, and that was it. But with the advent of quantum mechanics, the vacuum became much more interesting. The sea we’re watching is much like this quantum vacuum. The boats on the sea’s surface are what physicists call “real” particles; they are the things you put in colliders and shoot at each other. But there are also waves on the surface of the sea. The waves are like “virtual” particles; they are fluctuations around sea level that come out of the sea and fade back into it.

Virtual particles have to obey more rules than sea waves though. Because electric charge must be conserved, virtual particles can only be created together with their anti-particles that carry the opposite charge. Energy too must be conserved, but due to Heisenberg’s uncertainty principle, we are allowed to temporarily borrow some energy from the vacuum, as long as we give it back quickly enough. This means that the virtual particle pairs can only exist for a short time, and the more energy they carry, the shorter the duration of their existence.
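As a rough, order-of-magnitude illustration of the trade-off between borrowed energy and duration, one can use the heuristic $\Delta t \sim \hbar/\Delta E$ (the energy-time relation is only a heuristic here, and factors of 2 vary between textbooks). The sketch below applies it to a virtual electron-positron pair, whose minimum borrowed energy is twice the electron rest energy; the constants are CODATA 2018 values, and the function name `borrowed_lifetime_s` is our own.

```python
HBAR_MEV_S = 6.582119569e-22      # reduced Planck constant, MeV * s (CODATA 2018)
ELECTRON_REST_MEV = 0.51099895    # electron rest energy m_e c^2, MeV (CODATA 2018)


def borrowed_lifetime_s(delta_e_mev: float) -> float:
    """Heuristic lifetime (seconds) of a fluctuation carrying energy delta_e_mev."""
    return HBAR_MEV_S / delta_e_mev


pair_energy = 2 * ELECTRON_REST_MEV          # an electron plus its anti-particle
print(f"{borrowed_lifetime_s(pair_energy):.2e} s")
# Ten times the energy gives a fluctuation that lasts a tenth as long.
print(f"{borrowed_lifetime_s(10 * pair_energy):.2e} s")
```

The inverse proportionality is the point: the more energy a virtual pair carries, the shorter it can exist.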

You cannot directly measure virtual particles in a detector, but their presence has indirect observable consequences that have been tested to great accuracy. Atomic nuclei, for example, carry around them a cloud of virtual particles, and this cloud shifts the energy levels of electrons orbiting around the nucleus.

So we know, not just theoretically but experimentally, that the vacuum is not empty. It’s full of virtual particles that constantly bubble in and out of existence.

Visualization of a quantum field theory calculation showing virtual particles in the quantum vacuum.
Image Credits: Derek Leinweber

Let us go back to the seashore; I quite liked it there. We measure elevation relative to the average sea level, which we call elevation zero. But this number is just a convention. All we really ever measure are differences between heights, so the absolute number does not matter. For the quantum vacuum, physicists similarly normalize the total energy and momentum to zero because all we ever measure are energies relative to it. Do not attempt to think of the vacuum’s energy and momentum as if it was that of a particle; it is not. In contrast to the energy-momentum of particles, that of the vacuum is invariant under a change of reference frame, as Einstein’s theory of Special Relativity requires. The vacuum looks the same for the guy in the train and for the one on the station.

But what if we take into account gravity, you ask? Well, there is the rub. According to General Relativity, all forms of energy have a gravitational pull. More energy, more pull. With gravity, we are no longer free to just define the sea level as zero. It’s like we had suddenly discovered that the Earth is round and there is an absolute zero of elevation, which is at the center of the Earth.

In the best manner of a physicist, I have left out a small detail, which is that the calculated energy of the quantum vacuum is actually infinite. Yeah, I know, doesn’t sound good. If you don’t care what the total vacuum energy is anyway, this doesn’t matter. But if you take gravity into account, the vacuum energy becomes measurable, and therefore it does matter.

The vacuum energy one obtains from quantum field theory is of the same form as Einstein’s Cosmological Constant because this is the only form which (in an uncurved space-time) does not depend on the observer. We have measured the Cosmological Constant to have a small, positive, nonzero value which is responsible for the accelerated expansion of the universe. But why it has just this value, and why not infinity (or at least something huge), nobody knows. This “Cosmological Constant Problem” is one of the big open problems in theoretical physics today, and its origin lies in our incomplete understanding of the quantum vacuum.

But this isn’t the only mystery surrounding the sea of virtual particles. Quantum theory tells you how particles belong together with fields. The quantum vacuum by definition doesn’t have real particles in it, and normally this means that the field it belongs to also vanishes. For these fields, the average sea level is at zero, regardless of whether or not there are boats on the water. But for some fields the real particles are more like stones. They won’t stay on the surface; they will sink and make the sea level rise. We say the field “has a non-zero vacuum expectation value.”

On the seashore, you now have to wade through the water, which will slow you down. This is what the Higgs field does: it drags down particles and thereby effectively gives them mass. If you dive and kick the stones that sank to the bottom hard enough, you can sometimes make one jump out of the surface. This is essentially what the LHC does; just call the stones “Higgs bosons.” I’m really getting into this seashore thing ;)

Next, let us imagine we could shove the Earth closer to the Sun. Oceans would evaporate and you could walk again without having to drag through the water. You’d also be dead, sorry about this, but what about the vacuum? Amazingly, you can do the same. Physicists say the “vacuum melts” rather than evaporates, but it’s very similar: If you pump enough energy into the vacuum, the level sinks to zero and all particles are massless again.

You may complain now that if you pump energy into the vacuum, it’s no longer vacuum. True. But the point is that you change the previously non-zero vacuum expectation value. To the best of our knowledge, it was zero in the very early universe, and theoretical physicists would love to glimpse this state of matter. For this, however, they’d have to achieve a temperature of 10^15 Kelvin! Even the core of the sun “only” makes it to 10^7 Kelvin.
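To put these temperatures on the energy scale particle physicists usually quote, one can multiply by Boltzmann's constant, $E \approx k_B T$. A small sketch (the function name `thermal_energy_ev` is ours; $k_B$ in eV/K follows from the exact SI definitions):

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def thermal_energy_ev(temperature_k: float) -> float:
    """Characteristic thermal energy k_B * T, in electronvolts."""
    return K_B_EV_PER_K * temperature_k


for label, temp in [("early-universe target", 1e15), ("solar core", 1e7)]:
    energy = thermal_energy_ev(temp)
    print(f"{label:>22}: T = {temp:.0e} K  ->  k_B T = {energy:.3g} eV")
```

This puts 10^15 K at roughly 86 GeV and the solar core at under 1 keV, a factor of 10^8 lower.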

One way to get to such high temperature, if only in a very small region of space, is with strong electromagnetic fields.

In a recent paper, Hegelich, Mourou, and Rafelski estimated that, with the most advanced technology available today, high-intensity lasers could get close to the necessary temperature. This is still far from reality, but it will probably one day become possible!

Back to the sea: Fluids can exist in a “superheated” state. In such a state, the medium is liquid even though its temperature is above the boiling point. Superheated liquids are “metastable,” which means that if you give them any opportunity they will very suddenly evaporate into the preferred stable gaseous state. This can happen if you boil water in the microwave, so always be very careful taking it out.

The vacuum that we live in might be a metastable state: a “false vacuum.” In this case it will evaporate at some point, and in the process release an enormous amount of energy. Nobody really knows whether this will indeed happen. But even if it does, the best present estimates place this event in the distant future, when life is no longer possible anyway because stars have run out of power. Particle physicist Joseph Lykken estimated something like a googol years; that’s about 10^90 times the present age of the universe.
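The last comparison is quick arithmetic. The sketch below assumes a present age of about 13.8 billion years (a standard figure, not stated in the post):

```python
import math

GOOGOL_YEARS = 10.0 ** 100          # Lykken's rough estimate, in years
AGE_OF_UNIVERSE_YEARS = 13.8e9      # assumed present age of the universe, in years

ratio = GOOGOL_YEARS / AGE_OF_UNIVERSE_YEARS
print(f"ratio = {ratio:.1e}, i.e. about 10^{round(math.log10(ratio))}")
```

Since $\log_{10}(1.38\times 10^{10})\approx 10.1$, the ratio is about $10^{100-10.1}\approx 10^{90}$.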

According to some theories, our universe came into existence from another metastable vacuum state, and the energy that was released in this process eventually gave rise to all we see around us now. Some physicists, notably Lawrence Krauss, refer to this as creating a universe from “nothing.”

If you take away all particles, you get the quantum vacuum, but you still have space-time. If we had a quantum theory for space-time as well, you could take away space-time too, at least operationally. This might be the best description of a physical “nothing” that we can ever reach, but it still would not be an absolute nothing because even this state is still a mathematical “something”.

Now what exactly it means for mathematics to “exist” I better leave to philosophers. All I have to say about this is, well, nothing.

If you want to know more about the philosophy behind nothing, you might like Jim Holt’s book “Why does the world exist”, which I reviewed here

This post previously appeared at Starts With a Bang under the title “Everything you ever wanted to know about nothing”.

March 10, 2015

Clifford JohnsonLayout Design

Rough layout design. Text suppressed because... spoilers. Feel free to supply your own dialogue... share it here if you like! In case you're wondering, I'm trying here and there to find a bit of time to do a bit of rough (but less rough than last pass) layout design for the book. Sample above. This helps me check that all the flow, layout, pace, and transitions are [...]

Scott AaronsonThe flow of emails within the block inbox

As a diversion from the important topics of shaming, anti-shaming, and anti-anti-shaming, I thought I’d share a little email exchange (with my interlocutor’s kind permission), which gives a good example of what I find myself doing all day when I’m not blogging, changing diapers, or thinking about possibly doing some real work (but where did all the time go?).

Dear Professor Aaronson,

I would be very pleased to know your opinion about time.  In a letter of condolence to the Besso family, Albert Einstein wrote: “Now he has departed from this strange world a little ahead of me. That means nothing. People like us, who believe in physics, know that the distinction between past, present and future is only a stubbornly persistent illusion.” I’m a medical doctor and everyday I see time’s effect over human bodies. Is Einstein saying time is an illusion?  For who ‘believe in physics’ is death an illusion?  Don’t we lose our dears and will they continue to live in an ‘eternal world’?

Is time only human perceptive illusion (as some scientists say physics has proved)?

Dear [redacted],

I don’t read Einstein in that famous quote as saying that time itself is an illusion, but rather, that the sense of time flowing from past to present to future is an illusion. He meant, for example, that the differential equations of physics can just as easily be run backward (from future to past) as forward (from past to future), and that studying physics can strongly encourage a perspective—which philosophers call the “block universe” perspective—where you treat the entire history of spacetime as just a fixed, 4-dimensional manifold, with time simply another dimension in addition to the three spatial ones (admittedly, a dimension that the laws of physics treat somewhat differently than the other three). And yes, relativity encourages this perspective, by showing that different observers, moving at different speeds relative to each other, will divide up the 4-dimensional manifold into time slices in different ways, with two events judged to be simultaneous by one observer judged to be happening at different times by another.

But even after Einstein is read this way, I’d personally respond: well, that’s just one perspective you can take. A perfectly understandable one, if you’re Einstein, and especially if you’re Einstein trying to comfort the bereaved. But still: would you want to say, for example, that because physics treats the table in front of you as just a collection of elementary particles held together by forces, therefore the table, as such, doesn’t “exist”? That seems overwrought. Physics deepens your understanding of the table, of course—showing you what its microscopic constituents are and why they hold themselves together—but the table still “exists.”  In much the same way, physics enormously deepened our understanding of what we mean by the “flow of time”—showing how the “flow” emerges from the time-symmetric equations of physics, combined with the time-asymmetric phenomena of thermodynamics, which increase the universe’s entropy as we move away from the Big Bang, and thereby allow for the creation of memories, records, and other irreversible effects (a part of the story that I didn’t even get into here). But it feels overwrought to say that, because physics gives us a perspective from which we can see the “flow of time” as emerging from something deeper, therefore the “flow” doesn’t exist, or is just an illusion.

Hope that helps!


(followup question)

Dear Professor,

I’ve been thinking about the “block universe” and it seems to me that in it past, present and future all coexist.  So on the basis of Einstein’s theory, do all exist eternally, and why do we perceive only the present?


But you don’t perceive only the present!  In the past, you perceived what’s now the past (and which you now remember), and in the future, you’ll perceive what’s now the future (and which you now look forward to), right?  And as for why the present is the present, and not some other point in time?  Well, that strikes me as one of those questions like why you’re you, out of all the possible people who you could have been instead, or why, assuming there are billions of habitable planets, you find yourself on earth and not on any of the other planets.  Maybe the best answer is that you had to be someone, living somewhere, at some particular point in time when you asked this question—and you could’ve wondered the same thing regardless of what the answer had turned out to be.