# Planet Musings

## December 12, 2013

### Backreaction — The Comeback of Massive Gravity

Massive gravity, a modification of general relativity in which gravitons have mass, has an interesting history. It was long believed to be internally inconsistent, but physicists at Stockholm University now claim to have constructed a consistent theory of massive gravity, one that is a viable alternative to general relativity and can address some of its problems.

In Einstein’s theory of general relativity gravitational waves propagate at the speed of light, and the quanta of the gravitational field, the gravitons, are expected to do the same*. More precisely, gravitons are assumed to be massless and therefore to move at the speed of massless particles. But whether or not a particle is indeed massless is, in the end, a question for experiment.

Neutrinos were long believed to be massless, but we know today that at least two of them have tiny non-zero masses (whose absolute values have not yet been determined). The mass of the photon is known, on experimental grounds, to be zero to extremely high precision. But what about gravity? This is a timely question because a small graviton mass would lead to a long-distance modification of general relativity, and present observational evidence leaves physicists with some puzzles at exactly these long distances, notably dark energy and dark matter.

However, to be able to even properly ask whether gravitons have masses, we need a consistent theory for massive gravity. But making gravitons massive is a challenge for the theoretical physicist. In fact, it was long believed to be impossible.

The problems start when you want to introduce a mass term into general relativity. For a vector field, the mass term is built from the contraction of the field with itself, $A_\nu A^\nu$. In general relativity the field is the metric tensor, and the only full contractions of the metric that you can create without using derivatives are constants: they produce a cosmological constant, not a graviton mass. If you want a mass term in general relativity you need a second two-tensor, that is, a field which looks like a metric but isn’t the metric. Theories of this type are also known as ‘bi-metric’. Massive gravity is thus intimately related to bi-metric gravity.
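
To see schematically why contractions of the metric alone cannot do the job (an illustrative aside, not part of the original argument): in four dimensions the metric contracted with itself is just a number,

$\displaystyle g^{\mu\nu} g_{\mu\nu} = 4,$

so the only derivative-free term one can add to the action is a cosmological constant, $\sqrt{-g}\,\Lambda$. With a second tensor $f_{\mu\nu}$ at hand, genuinely field-dependent scalars such as $g^{\mu\nu} f_{\mu\nu}$ become available, and it is out of invariants of this kind that the interaction terms of bi-metric gravity are built.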

But that’s only the beginning of the problems, a beginning that dates back more than 70 years.

In 1939, Fierz and Pauli wrote down a theory of massive gravity in the perturbative limit. They found that for the theory to be consistent – meaning free of ‘ghosts’ that lead to unphysical instabilities – the parameters in the mass-terms must have specific values. With these values, the theory is viable.
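
For reference, and in conventions that vary between authors, the quadratic Fierz-Pauli mass term for a perturbation $h_{\mu\nu}$ around flat space reads

$\displaystyle {\cal L}_{\rm FP} \propto -m^2 \left( h_{\mu\nu} h^{\mu\nu} - h^2 \right), \qquad h = \eta^{\mu\nu} h_{\mu\nu},$

and the relative coefficient of $-1$ between the two contractions is precisely the tuning referred to above: any other value reintroduces a ghost-like scalar degree of freedom.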

In 1970 however, van Dam and Veltman and, independently, Zakharov, showed that in the Fierz-Pauli approach the limit in which the graviton mass is taken to zero is not continuous and, contrary to naïve expectations, does not reproduce general relativity. Any graviton mass, regardless of how small, leads to deviations that can contribute factors of order one to observables, which is in conflict with observation. The Fierz-Pauli theory now seemed theoretically fine, but experimentally ruled out.

Two years later, in 1972, Vainshtein argued that this discontinuity is due to the treatment of the gravitational degrees of freedom in the linearization procedure and can be cured in a full, non-linear, version of massive gravity. Unfortunately, in the same year, Deser and Boulware claimed that any non-linear completion of the Fierz-Pauli approach reintroduces the ghost. So now massive gravity was experimentally fine but theoretically sick.

Nothing much happened in this area for more than 30 years. Then, in the early 2000s, the wormy can was opened again by Arkani-Hamed et al and Creminelli et al, but they essentially confirmed the Deser-Boulware problem.

The situation began to look brighter in 2010, when de Rham and Gabadadze proposed a theory of massive gravity that did not suffer from the ghost problem, at least in a certain limit. Needless to say, after massive gravity had been thought dead and buried for 40 years, nobody really believed this would work. The de Rham-Gabadadze approach did not make many friends because the second metric was treated as a fixed background field, and the theory was shown to allow for superluminal propagation (and, more recently, acausality).

However, starting in 2011, Fawad Hassan and Rachel Rosen from Stockholm University (i.e. next door) succeeded in formulating a theory of massive gravity that does not suffer from the ghost instability. The key to success was a generalization of the de Rham-Gabadadze approach in which the second metric is also fully dynamical, and the interaction terms between the two metrics take on a specific form, chosen so that they generate a constraint which removes the ghost field. The resulting theory is, to the best of present knowledge, fully consistent and symmetric between the two metrics.

(Which, incidentally, explains my involvement with the subject, as I published a paper with a fully dynamical, symmetric, bi-metric theory in 2008, though I wasn’t interested in the massive case and didn’t include interaction terms. The main result of my paper is that I ended up on the defense committees of Fawad’s students.)

In the last few years, the Stockholm group has produced a series of very interesting papers that not only formalize their approach and establish its consistency, but also derive specific solutions. This is no small feat: it is already difficult to find solutions in general relativity with only one metric, and having two doesn’t make the situation easier. Indeed, not many solutions are presently known, and the known ones rely on quite strong symmetry assumptions. (More students in the pipeline...)

Meanwhile, others have studied how well this modification of general relativity fares as an alternative to ΛCDM. It has been found that massive gravity can fit all cosmological data without the need to introduce an additional cosmological constant. But before you get too excited about this, note that massive gravity has more free parameters than ΛCDM, namely the coupling constants in the interaction terms.

What is missing right now, though, is a smoking-gun signal: some observation that would allow one to distinguish massive gravity from standard general relativity. This is presently a very active area of research and one that I’m sure we’ll hear more about.

* To be precise, in the classical theory we should be speaking of gravitational waves instead. The frequent confusion between gravitational waves and gravitons, the latter of which only exist in quantized gravity, is a bad habit but forgivable. Far worse are people who say ‘gravity wave’ when they refer to a gravitational wave. A gravity wave is a wave in a fluid for which gravity provides the restoring force; it shows up, for instance, in certain cloud formations, and it has nothing to do with linearized gravity.

### Steinn Sigurðsson — Yule lads: it begins

Tonight it begins:

Be good.

Believe!

### David Hogg — finding good priors

In a secret "overwhelming force" project, Foreman-Mackey is resampling all the Kepler Objects of Interest, to produce a full probabilistic catalog. In many low signal-to-noise systems, there are fitting degeneracies (really near-degeneracies). In many of these, the posterior pdf does not accord with our intuitive views about what ought to be going on. We realized that this is because our "flat priors" didn't accord with our real prior beliefs. That is, there was a disparity between our actual prior beliefs and our coded-up prior function. We knew this would be the case—we are using wrong but simple interim priors that we plan to replace with a hierarchical inference—but it was amusing to be reminded of the simple point that ugly priors lead to ugly inferences. We made some small changes to the priors, resampled, and our inferences look much better.
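
A toy illustration of that last point (a minimal sketch, not the actual Kepler pipeline; the model and all numbers are invented): at low signal-to-noise, a prior that is "flat" in one parameterization is strongly informative in another, and the posterior inherits whichever choice was coded up.

```python
# Toy demo: the same low signal-to-noise data, analysed with two different
# "flat" priors, gives noticeably different posterior summaries.
import numpy as np

rng = np.random.default_rng(1)

# Fake data: a small positive "depth" measured 20 times with unit noise.
true_depth, sigma = 0.5, 1.0
data = true_depth + sigma * rng.standard_normal(20)

# Grid over the (positive) depth parameter.
depth = np.linspace(1e-3, 5.0, 4000)
loglike = -0.5 * np.sum((data[None, :] - depth[:, None]) ** 2, axis=1) / sigma**2

priors = {
    "flat in depth": np.ones_like(depth),
    "flat in log(depth)": 1.0 / depth,  # a rather different prior belief
}

for name, prior in priors.items():
    post = np.exp(loglike - loglike.max()) * prior  # unnormalized posterior on the grid
    post /= post.sum()
    mean = np.sum(depth * post)
    print(f"posterior mean depth, prior {name}: {mean:.3f}")
```

The point is not which prior is right, only that the coded-up prior does real work when the likelihood is weak.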

### Scott Aaronson — 23, Me, and the Right to Misinterpret Probabilities

If you’re the sort of person who reads this blog, you may have heard that 23andMe—the company that (until recently) let anyone spit into a capsule, send it away to a DNA lab, and then learn basic information about their ancestry, disease risks, etc.—has suspended much of its service, on orders from the US Food and Drug Administration.  As I understand it, on Nov. 25, the FDA ordered 23andMe to stop marketing to new customers (though it can still serve existing customers), and on Dec. 5, the company stopped offering new health-related information to any customers (though you can still access the health information you had before, and ancestry and other non-health information is unaffected).

Of course, the impact of these developments is broader: within a couple weeks, “do-it-yourself genomics” has gone from an industry whose explosive growth lots of commentators took as a given, to one whose future looks severely in doubt (at least in the US).

The FDA gave the reasons for its order in a letter to Anne Wojcicki, 23andMe’s CEO.  Excerpts:

For instance, if the BRCA-related risk assessment for breast or ovarian cancer reports a false positive, it could lead a patient to undergo prophylactic surgery, chemoprevention, intensive screening, or other morbidity-inducing actions, while a false negative could result in a failure to recognize an actual risk that may exist.  Assessments for drug responses carry the risks that patients relying on such tests may begin to self-manage their treatments through dose changes or even abandon certain therapies depending on the outcome of the assessment.  For example, false genotype results for your warfarin drug response test could have significant unreasonable risk of illness, injury, or death to the patient due to thrombosis or bleeding events that occur from treatment with a drug at a dose that does not provide the appropriately calibrated anticoagulant effect …  The risk of serious injury or death is known to be high when patients are either non-compliant or not properly dosed; combined with the risk that a direct-to-consumer test result may be used by a patient to self-manage, serious concerns are raised if test results are not adequately understood by patients or if incorrect test results are reported.

To clarify, the DNA labs that 23andMe uses are already government-regulated.  Thus, the question at issue here is not whether, if 23andMe claims (say) that you have CG instead of CC at some particular locus, the information is reliable.  Rather, the question is whether 23andMe should be allowed to tell you that fact, while also telling you that a recent research paper found that people with CG have a 10.4% probability of developing Alzheimer’s disease, as compared to a 7.2% base rate.  More bluntly, the question is whether ordinary schmoes ought to be trusted to learn such facts about themselves, without a doctor as an intermediary to interpret the results for them, or perhaps to decide that there’s no good reason for the patient to know at all.
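
To spell out the arithmetic in that example: going from a 7.2% base rate to a 10.4% risk is a relative increase of 10.4/7.2 ≈ 1.44, i.e. about 44%, but an absolute increase of only 3.2 percentage points, and it still leaves an 89.6% chance of never developing the disease. That is the kind of parsing the dispute is about.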

Among medical experts, a common attitude seems to be something like this: sure, getting access to your own genetic data is harmless fun, as long as you’re an overeducated nerd who just wants to satisfy his or her intellectual curiosity (or perhaps narcissism).  But 23andMe crossed a crucial line when it started marketing its service to the hoi polloi, as something that could genuinely tell them about health risks.  Most people don’t understand probability, and are incapable of parsing “based on certain gene variants we found, your chances of developing diabetes are about 6 times higher than the baseline” as anything other than “you will develop diabetes.”  Nor, just as worryingly, are they able to parse “your chances are lower than the baseline” as anything other than “you won’t develop diabetes.”

I understand this argument.  Nevertheless, I find it completely inconsistent with a free society.  Moreover, I predict that in the future, the FDA’s current stance will be looked back upon as an outrage, with the subtleties in the FDA’s position mattering about as much as the subtleties in the Church’s position toward Galileo (“look, Mr. G., it’s fine to discuss heliocentrism among your fellow astronomers, as a hypothesis or a calculational tool—just don’t write books telling the general public that heliocentrism is literally true, and that they should change their worldviews as a result!”).  That’s why I signed this petition asking the FDA to reconsider its decision, and I encourage you to sign it too.

Here are some comments that might help clarify my views:

(1) I signed up for 23andMe a few years ago, as did the rest of my family.  The information I gained from it wasn’t exactly earth-shattering: I learned, for example, that my eyes are probably blue, that my ancestry is mostly Ashkenazi, that there’s a risk my eyesight will further deteriorate as I age (the same thing a succession of ophthalmologists told me), that I can’t taste the bitter flavor in brussels sprouts, and that I’m an “unlikely sprinter.”  On the other hand, seeing exactly which gene variants correlate with these things, and how they compare to the variants my parents and brother have, was … cool.  It felt like I imagine it must have felt to buy a personal computer in 1975.  In addition, I found nothing the slightest bit dishonest about the way the results were reported.  Each result was stated explicitly in terms of probabilities—giving both the baseline rate for each condition, and the rate conditioned on having such-and-such gene variant—and there were even links to the original research papers if I wanted to read them myself.  I only wish that I got half as much context and detail from conventional doctor visits—or for that matter, from most materials I’ve read from the FDA itself.  (When Dana was pregnant, I was pleasantly surprised when some of the tests she underwent came back with explicit probabilities and base rates.  I remember wishing doctors would give me that kind of information more often.)

(2) From my limited reading and experience, I think it’s entirely possible that do-it-yourself genetic testing is overhyped; that it won’t live up to its most fervent advocates’ promises; that for most interesting traits there are just too many genes involved, via too many labyrinthine pathways, to make terribly useful predictions about individuals, etc.  So it’s important to me that, in deciding whether what 23andMe does should be legal, we’re not being asked to decide any of these complicated questions!  We’re only being asked whether the FDA should get to decide the answers in advance.

(3) As regular readers will know, I’m far from a doctrinaire libertarian.  Thus, my opposition to shutting down 23andMe is not at all a corollary of reflexive opposition to any government regulation of anything.  In fact, I’d be fine if the FDA wanted to insert a warning message on 23andMe (in addition to the warnings 23andMe already provides), emphasizing that genetic tests only provide crude statistical information, that they need to be interpreted with care, consult your doctor before doing anything based on these results, etc.  But when it comes to banning access to the results, I have trouble with some of the obvious slippery slopes.  E.g., what happens when some Chinese or Russian company launches a competing service?  Do we ban Americans from mailing their saliva overseas?  What happens when individuals become able just to sequence their entire genomes, and store and analyze them on their laptops?  Do we ban the sequencing technology?  Or do we just ban software that makes it easy enough to analyze the results?  If the software is hard enough to use, so only professional biologists use it, does that make it OK again?  Also, if the FDA will be in the business of banning genomic data analysis tools, then what about medical books?  For that matter, what about any books or websites, of any kind, that might cause someone to make a poor medical decision?  What would such a policy, if applied consistently, do to the multibillion-dollar alternative medicine industry?

(4) I don’t understand the history of 23andMe’s interactions with the FDA.  From what I’ve read, though, they have been communicating for five years, with everything 23andMe has said in public sounding conciliatory rather than defiant (though the FDA has accused 23andMe of being tardy with its responses).  Apparently, the key problem is simply that the FDA hasn’t yet developed a regulatory policy specifically for direct-to-consumer genetic tests.  It’s been considering such a policy for years—but in the meantime, it believes no one should be marketing such tests for health purposes before a policy exists.  Alas, there are very few cases where I’d feel inclined to support a government in saying: “X is a new technology that lots of people are excited about.  However, our regulatory policies haven’t yet caught up to X.  Therefore, our decision is that X is banned, until and unless we figure out how to regulate it.”  Maybe I could support such a policy, if X had the potential to level cities and kill millions.  But when it comes to consumer DNA tests, this sort of preemptive banning seems purposefully designed to give wet dreams to Ayn Rand fans.

(5) I confess that, despite everything I’ve said, my moral intuitions might be different if dead bodies were piling up because of terrible 23andMe-inspired medical decisions.  But as far as I know, there’s no evidence so far that even a single person was harmed.  Which isn’t so surprising: after all, people might run to their doctor terrified about something they learned on 23andMe, but no sane doctor would ever make a decision solely on that basis, without ordering further tests.

## December 11, 2013

### Dave Bacon — Are articles in high-impact journals more like designer handbags, or monarch butterflies?

Monarch butterfly handbag

US biologist Randy Schekman, who shared this year’s physiology and medicine Nobel prize, has made prompt use of his new bully pulpit. In

How journals like Nature, Cell and Science are damaging science: The incentives offered by top journals distort science, just as big bonuses distort banking

he singled out these “luxury” journals as a particularly harmful part of the current milieu in which “the biggest rewards follow the flashiest work, not the best,” and he vowed no longer to publish in them. An accompanying Guardian article includes defensive quotes from representatives of Science and Nature, especially in response to Schekman’s assertions that the journals favor controversial articles over boring but scientifically more important ones like replication studies, and that they deliberately seek to boost their impact factors by restricting the number of articles published, “like fashion designers who create limited-edition handbags or suits.”  Focusing on journals, his main concrete suggestion is to increase the role of open-access online journals like his own eLife, supported by philanthropic foundations rather than subscriptions. But Schekman acknowledges that blame extends to funding organizations and universities, which use publication in high-impact-factor journals as a flawed proxy for quality, and to scientists who succumb to the perverse incentives to put career advancement ahead of good science.  Similar points were made last year in Serge Haroche’s thoughtful piece on why it’s harder to do good science now than in his youth.   This, and Nature’s recent story on Brazilian journals’ manipulation of impact factor statistics, illustrate how prestige journals are part of the solution as well as the problem.

Weary of people and institutions competing for the moral high ground in a complex terrain, I sought a less value-laden approach,  in which scientists, universities, and journals would be viewed merely as interacting IGUSes (information gathering and utilizing systems), operating with incomplete information about one another. In such an environment, reliance on proxies is inevitable, and the evolution of false advertising is a phenomenon to be studied rather than disparaged.  A review article on biological mimicry introduced me to some of the refreshingly blunt standard terminology of that field.  Mimicry,  it said,  involves three roles:  a model,  i.e.,  a living or material agent emitting perceptible signals, a mimic that plagiarizes the model, and a dupe whose senses are receptive to the model’s signal and which is thus deceived by the mimic’s similar signals.  As in human affairs, it is not uncommon for a single player to perform several of these roles simultaneously.

### Terence Tao — Ultraproducts as a Bridge Between Discrete and Continuous Analysis

(This is an extended blog post version of my talk “Ultraproducts as a Bridge Between Discrete and Continuous Analysis” that I gave at the Simons Institute for the Theory of Computing at the workshop “Neo-classical methods in discrete analysis”. Some of the material here is drawn from previous blog posts, notably “Ultraproducts as a bridge between hard analysis and soft analysis” and “Ultralimit analysis and quantitative algebraic geometry”. The text here has substantially more details than the talk; one may wish to skip all of the proofs given here to obtain a closer approximation to the original talk.)

Discrete analysis, of course, is primarily interested in the study of discrete (or “finitary”) mathematical objects: integers, rational numbers (which can be viewed as ratios of integers), finite sets, finite graphs, finite or discrete metric spaces, and so forth. However, many powerful tools in mathematics (e.g. ergodic theory, measure theory, topological group theory, algebraic geometry, spectral theory, etc.) work best when applied to continuous (or “infinitary”) mathematical objects: real or complex numbers, manifolds, algebraic varieties, continuous topological or metric spaces, etc. In order to apply results and ideas from continuous mathematics to discrete settings, there are basically two approaches. One is to directly discretise the arguments used in continuous mathematics, which often requires one to keep careful track of all the bounds on various quantities of interest, particularly with regard to various error terms arising from discretisation which would otherwise have been negligible in the continuous setting. The other is to construct continuous objects as limits of sequences of discrete objects of interest, so that results from continuous mathematics may be applied (often as a “black box”) to the continuous limit, which then can be used to deduce consequences for the original discrete objects which are quantitative (though often ineffectively so). The latter approach is the focus of this current talk.

The following table gives some examples of a discrete theory and its continuous counterpart, together with a limiting procedure that might be used to pass from the former to the latter:

| Discrete | Continuous | Limit method |
| --- | --- | --- |
| Ramsey theory | Topological dynamics | Compactness |
| Density Ramsey theory | Ergodic theory | Furstenberg correspondence principle |
| Graph/hypergraph regularity | Measure theory | Graph limits |
| Polynomial regularity | Linear algebra | Ultralimits |
| Structural decompositions | Hilbert space geometry | Ultralimits |
| Fourier analysis | Spectral theory | Direct and inverse limits |
| Quantitative algebraic geometry | Algebraic geometry | Schemes |
| Discrete metric spaces | Continuous metric spaces | Gromov-Hausdorff limits |
| Approximate group theory | Topological group theory | Model theory |

As the above table illustrates, there are a variety of different ways to form a limiting continuous object. Roughly speaking, one can divide limits into three categories:

• Topological and metric limits. These notions of limits are commonly used by analysts. Here, one starts with a sequence (or perhaps a net) of objects ${x_n}$ in a common space ${X}$, which one then endows with the structure of a topological space or a metric space, by defining a notion of distance between two points of the space, or a notion of open neighbourhoods or open sets in the space. Provided that the sequence or net is convergent, this produces a limit object ${\lim_{n \rightarrow \infty} x_n}$, which remains in the same space, and is “close” to many of the original objects ${x_n}$ with respect to the given metric or topology.
• Categorical limits. These notions of limits are commonly used by algebraists. Here, one starts with a sequence (or more generally, a diagram) of objects ${x_n}$ in a category ${X}$, which are connected to each other by various morphisms. If the ambient category is well-behaved, one can then form the direct limit ${\varinjlim x_n}$ or the inverse limit ${\varprojlim x_n}$ of these objects, which is another object in the same category ${X}$, and is connected to the original objects ${x_n}$ by various morphisms.
• Logical limits. These notions of limits are commonly used by model theorists. Here, one starts with a sequence of objects ${x_{\bf n}}$ or of spaces ${X_{\bf n}}$, each of which is (a component of) a model for a given (first-order) mathematical language (e.g. if one is working in the language of groups, ${X_{\bf n}}$ might be groups and ${x_{\bf n}}$ might be elements of these groups). By using devices such as the ultraproduct construction, or the compactness theorem in logic, one can then create a new object ${\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$ or a new space ${\prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}$, which is still a model of the same language (e.g. if the spaces ${X_{\bf n}}$ were all groups, then the limiting space ${\prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}$ will also be a group), and is “close” to the original objects or spaces in the sense that any assertion (in the given language) that is true for the limiting object or space, will also be true for many of the original objects or spaces, and conversely. (For instance, if ${\prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}$ is an abelian group, then the ${X_{\bf n}}$ will also be abelian groups for many ${{\bf n}}$.)

The purpose of this talk is to highlight the third type of limit, and specifically the ultraproduct construction, as being a “universal” limiting procedure that can be used to replace most of the limits previously mentioned. Unlike the topological or metric limits, one does not need the original objects ${x_{\bf n}}$ to all lie in a common space ${X}$ in order to form an ultralimit ${\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$; they are permitted to lie in different spaces ${X_{\bf n}}$; this is more natural in many discrete contexts, e.g. when considering graphs on ${{\bf n}}$ vertices in the limit when ${{\bf n}}$ goes to infinity. Also, no convergence properties on the ${x_{\bf n}}$ are required in order for the ultralimit to exist. Similarly, ultraproduct limits differ from categorical limits in that no morphisms between the various spaces ${X_{\bf n}}$ involved are required in order to construct the ultraproduct.

With so few requirements on the objects ${x_{\bf n}}$ or spaces ${X_{\bf n}}$, the ultraproduct construction is necessarily a very “soft” one. Nevertheless, the construction has two very useful properties which make it particularly well suited for the purpose of extracting good continuous limit objects out of a sequence of discrete objects. First of all, there is Łoś’s theorem, which roughly speaking asserts that any first-order sentence which is asymptotically obeyed by the ${x_{\bf n}}$, will be exactly obeyed by the limit object ${\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$; in particular, one can often take a discrete sequence of “partial counterexamples” to some assertion, and produce a continuous “complete counterexample” to that same assertion via an ultraproduct construction; taking the contrapositives, one can often then establish a rigorous equivalence between a quantitative discrete statement and its qualitative continuous counterpart. Secondly, there is the countable saturation property that ultraproducts automatically enjoy, which is a property closely analogous to that of compactness in topological spaces, and can often be used to ensure that the continuous objects produced by ultraproduct methods are “complete” or “compact” in various senses, which is particularly useful in being able to upgrade qualitative (or “pointwise”) bounds to quantitative (or “uniform”) bounds, more or less “for free”, thus reducing significantly the burden of “epsilon management” (although the price one pays for this is that one needs to pay attention to which mathematical objects of study are “standard” and which are “nonstandard”). To achieve this compactness or completeness, one sometimes has to restrict to the “bounded” portion of the ultraproduct, and it is often also convenient to quotient out the “infinitesimal” portion in order to complement these compactness properties with a matching “Hausdorff” property, thus creating familiar examples of continuous spaces, such as locally compact Hausdorff spaces.

Ultraproducts are not the only logical limit in the model theorist’s toolbox, but they are one of the simplest to set up and use, and already suffice for many of the applications of logical limits outside of model theory. In this post, I will set out the basic theory of these ultraproducts, and illustrate how they can be used to pass between discrete and continuous theories in each of the examples listed in the above table.

Apart from the initial “one-time cost” of setting up the ultraproduct machinery, the main loss one incurs when using ultraproduct methods is that it becomes very difficult to extract explicit quantitative bounds from results that are proven by transferring qualitative continuous results to the discrete setting via ultraproducts. However, in many cases (particularly those involving regularity-type lemmas) the bounds are already of tower-exponential type or worse, and there is arguably not much to be lost by abandoning the explicit quantitative bounds altogether.

— 1. Basic theory —

To set up the machinery of ultraproducts (and nonstandard analysis) we will need to pick a non-principal ultrafilter ${\alpha \in \beta {\bf N} \backslash {\bf N}}$ on the natural numbers ${{\bf N} = \{1,2,\ldots\}}$. By definition, this is a collection ${\alpha}$ of subsets of the natural numbers ${{\bf N}}$ that obeys the following axioms:

• (i) ${\alpha}$ contains ${{\bf N}}$, but does not contain ${\emptyset}$.
• (ii) If ${A \subset {\bf N}}$ is in ${\alpha}$, then any subset of ${{\bf N}}$ containing ${A}$ is in ${\alpha}$.
• (iii) If ${A, B}$ lie in ${\alpha}$, then ${A \cap B}$ also lies in ${\alpha}$.
• (iv) If ${A \subset {\bf N}}$, then exactly one of ${A}$ and ${{\bf N} \backslash A}$ lies in ${\alpha}$.
• (v) No finite set lies in ${\alpha}$.

A collection that obeys axioms (i)-(iii) is a filter, a collection that obeys (i)-(iv) is an ultrafilter, and a collection that obeys (i)-(v) is a non-principal ultrafilter. (One can certainly build ultrafilters on other sets than the natural numbers ${{\bf N}}$, but for the type of applications we have in mind, ultrafilters on ${{\bf N}}$ are generally sufficient, much as one can largely rely on sequences ${(x_n)_{n \in {\bf N}}}$ instead of nets ${(x_a)_{a \in A}}$ when considering topological or metric notions of a limit in concrete applications.)
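
A quick example to separate these notions: the collection of all cofinite subsets of ${{\bf N}}$ (the Fréchet filter) obeys axioms (i)-(iii) and (v), but not (iv), since for instance neither the even numbers nor the odd numbers are cofinite, so neither lies in the collection. Passing to a non-principal ultrafilter amounts to deciding, for every such set, whether it or its complement is to count as large, consistently with (i)-(iii); this is what Zorn’s lemma accomplishes in Lemma 1 below.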

If ${\alpha}$ is an ultrafilter, a subset ${A}$ of ${{\bf N}}$ is said to be ${\alpha}$-large if it lies in ${\alpha}$, and ${\alpha}$-small otherwise. To emphasise the limit interpretation of ultrafilters, we will also use “for ${n}$ sufficiently close to ${\alpha}$” as a synonym for “for an ${\alpha}$-large set of ${n}$“. Note from the above axioms that if ${{\bf N}}$ is partitioned into finitely many components, then exactly one of these components is ${\alpha}$-large. One can thus think of an ultrafilter as a voting protocol, which converts the preferences of a countably infinite number of voters (indexed by the natural numbers) amongst a finite number of candidates into a “winner”, namely the candidate who amasses an ${\alpha}$-large set of votes; this voting protocol obeys all the hypotheses of Arrow’s theorem (rationality, independence of irrelevant alternatives, etc.) but not its conclusion (there are no dictators), due to the infinite number of voters. One can also view an ultrafilter as a finitely additive, ${\{0,1\}}$-valued probability measure on the natural numbers, in which every subset of ${{\bf N}}$ is measurable.
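
To spell out the partition fact just used: if ${{\bf N} = A_1 \cup \ldots \cup A_k}$ with the ${A_i}$ disjoint, then no two of the ${A_i}$ can be ${\alpha}$-large, since by (iii) their intersection, which is empty, would then also be ${\alpha}$-large, contradicting (i); and if none of the ${A_i}$ were ${\alpha}$-large, then by (iv) each complement ${{\bf N} \backslash A_i}$ would be ${\alpha}$-large, so that by (iii)

$\displaystyle ({\bf N} \backslash A_1) \cap \ldots \cap ({\bf N} \backslash A_k) = \emptyset$

would be ${\alpha}$-large, again contradicting (i). Thus exactly one cell of the partition is ${\alpha}$-large.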

Every natural number ${n}$ defines a principal ultrafilter ${\bar{n}}$ (that is, an ultrafilter that does not obey axiom (v)), defined by setting ${A \in \bar{n}}$ if and only if ${n \in A}$. By abuse of notation, we may identify ${\bar{n}}$ with ${n}$, so that the set of principal ultrafilters is identified with ${{\bf N}}$. The set of all ultrafilters (principal and non-principal) is denoted ${\beta {\bf N}}$, and may be identified with the Stone-Cech compactification of the natural numbers; this space has a number of interesting structures on it (for instance, it is a left-continuous topological semigroup, which is a useful fact in Ramsey theory and topological dynamics, as discussed in this previous post), but we will not discuss these structures further here. In particular, the use of ultrafilters in establishing correspondence principles between discrete and continuous structures is unrelated to the use of ultrafilters (particularly minimal or idempotent ultrafilters) in Ramsey theory and topological dynamics that is discussed in that previous post. (However, it is certainly possible to use both types of ultrafilters in separate places for a single application; see this paper of Bergelson and myself for an example.)

The existence of a non-principal ultrafilter ${\alpha}$ can be guaranteed by an application of Zorn’s lemma (basically by running the transfinite version of a greedy algorithm to build the collection of ${\alpha}$-large and ${\alpha}$-small sets):

Lemma 1 There exists a non-principal ultrafilter ${\alpha \in \beta {\bf N} \backslash {\bf N}}$.

Proof: Define the Fréchet filter ${F}$ to be the collection of all the cofinite subsets of ${{\bf N}}$, that is to say those subsets of ${{\bf N}}$ whose complement is finite. This is a filter. A routine application of Zorn’s lemma shows that this filter must be contained in at least one maximal filter ${\alpha}$. If ${\alpha}$ is not an ultrafilter, then there is a set ${E}$ which is neither ${\alpha}$-large nor ${\alpha}$-small; if we then let ${\alpha'}$ be the filter generated by ${\alpha}$ and ${E}$ (that is, ${\alpha'}$ consists of those subsets of ${{\bf N}}$ that either lie in ${\alpha}$, or contain the intersection of ${E}$ with an ${\alpha}$-large set), then one checks that ${\alpha'}$ is a filter that is strictly larger than ${\alpha}$, contradicting the maximality of ${\alpha}$. Thus ${\alpha}$ is an ultrafilter; since it contains ${F}$, it is non-principal as required. $\Box$

Remark 1 The above argument relied upon the axiom of choice. One can construct ultrafilters using slightly weaker set-theoretic axioms than the axiom of choice (e.g. the Boolean prime ideal theorem), but if one has no choice-like axioms whatsoever (or more precisely, one uses just the axioms ZF of Zermelo-Fraenkel set theory) then the existence of ultrafilters is undecidable. However, there is a result of Gödel that asserts that if a result is finitary in the sense that it can be phrased as a first-order statement in Peano Arithmetic, and it can be proven using the axiom of choice (or more precisely in ZFC set theory), then it can also be proven without the axiom of choice (i.e. in ZF set theory). In practical terms, this means that one can ignore the reliance on the axiom of choice for any finitary application of ultraproduct methods. Alternatively, instead of using Zorn’s lemma to construct a genuine ultrafilter, it is often possible (though messy) to build some sort of “partial” or “incomplete” ultrafilter which is still close enough to an ultrafilter to prove a desired result, but which does not need the full strength of the axiom of choice to construct. We will not discuss such methods here, but see e.g. this paper of Palmgren. (See also this previous blog post for a “cheap” version of nonstandard analysis which uses the Frechet filter rather than an ultrafilter.) In any event, we will freely rely on the axiom of choice in the rest of this post.

Henceforth, we select a single non-principal ultrafilter ${\alpha \in \beta {\bf N} \backslash {\bf N}}$ to use in the rest of the post. The choice of this ultrafilter is highly non-unique (${\beta {\bf N} \backslash {\bf N}}$ is uncountable), but for the purposes of establishing a correspondence between discrete and continuous mathematics, the precise choice of ultrafilter ${\alpha}$ will not be relevant, as long as it is kept fixed throughout the discussion.

We can use the ultrafilter ${\alpha}$ to form limits of sequences ${(x_{\bf n})_{{\bf n} \in {\bf N}}}$, even if these sequences have no limit in the topological or metric senses. For instance, if ${x_{\bf n} \in X}$ takes values in a single finite space ${X}$ for all natural numbers ${{\bf n}}$, then we can define the ultralimit to be the unique element ${x_\alpha = \lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$ of ${X}$ such that ${x_{\bf n} = x_\alpha}$ for ${{\bf n}}$ sufficiently close to ${\alpha}$; this element exists and is unique by the ultrafilter axioms (i)-(iv). For instance, the ultralimit ${(-1)^\alpha := \lim_{{\bf n} \rightarrow \alpha} (-1)^{\bf n}}$ of the sequence ${1,-1,1,-1,\ldots}$ is either ${1}$ or ${-1}$, with the former occurring if the even numbers are ${\alpha}$-large, and the latter occurring if the odd numbers are ${\alpha}$-large. So the ultralimit here depends on ${\alpha}$, but once ${\alpha}$ is fixed, the ultralimit is well-defined. One can also view ${x_\alpha = \lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$ as the outcome of an “election” among candidates in ${X}$, in which each “voter” ${{\bf n}}$ has ${x_{\bf n}}$ as a preferred candidate, and the ultrafilter ${\alpha}$ is used as the election protocol. (See this previous post for further discussion of this viewpoint.)

Now we consider the more general situation in which the ${x_{\bf n}}$ do not lie in a single finite space ${X}$, but instead each ${x_{\bf n}}$ lies in a space ${X_{\bf n}}$ which may depend on ${{\bf n}}$, and may also be infinite. Here, there does not necessarily exist a single object ${x_\alpha}$ that is equal to the ${x_{\bf n}}$ for ${{\bf n}}$ sufficiently close to ${\alpha}$. The situation here is similar to that in metric spaces, in that a Cauchy sequence in a metric space ${X = (X,d)}$ need not be convergent if the space ${X}$ is infinite. In the latter case, the problem can be resolved by constructing the metric completion ${\overline{X}}$ of ${X}$, which can be defined (slightly non-rigorously) as the space of formal limits ${\lim_{n \rightarrow \infty} x_n}$ of Cauchy sequences ${x_n}$, with two formal limits ${\lim_{n \rightarrow \infty} x_n}$, ${\lim_{n \rightarrow \infty} y_n}$ declared to be equal if ${d(x_n,y_n)}$ converges to zero (in the classical sense) as ${n \rightarrow \infty}$. To construct this Cauchy completion rigorously within the framework of set theory, one can define ${\overline{X}}$ to be the set of equivalence classes of tuples ${(x_n)_{n \in {\bf N}}}$ of Cauchy sequences in ${X}$, with the equivalence relation ${(x_n)_{n \in {\bf N}} \sim (y_n)_{n \in {\bf N}}}$ holding if and only if ${d(x_n,y_n)}$ converges to zero as ${n \rightarrow \infty}$. In order to view ${X}$ as a subset of its completion ${\overline{X}}$, one usually identifies each point ${x \in X}$ with the equivalence class of the corresponding constant sequence ${(x)_{n \in {\bf N}}}$. With this construction, one can for instance build the real numbers ${{\bf R}}$ as the metric completion of the rationals ${{\bf Q}}$ (and this is indeed one of the most popular ways to construct the reals, which is often covered in undergraduate analysis classes). Note though that while one can use this set-theoretic machinery of equivalence classes of Cauchy sequences of rationals to construct the reals, for the purposes of using the real numbers ${{\bf R}}$ for mathematical applications, this set-theoretic construction is not relevant; the only thing one needs to retain is that Cauchy sequences of rationals (or of reals) will converge to a real, and conversely every real is the limit of a Cauchy sequence of rationals.

We can take a similar tack to construct ultralimits ${\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$ of sequences ${x_{\bf n} \in X_{\bf n}}$ in different spaces ${X_{\bf n}}$. If one is willing to be slightly non-rigorous, the definition is almost the same: we can view these ultralimits as formal objects, with two ultralimits ${\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$ and ${\lim_{{\bf n} \rightarrow \alpha} y_{\bf n}}$ declared to be equal if and only if ${x_{\bf n}=y_{\bf n}}$ for ${{\bf n}}$ sufficiently close to ${\alpha}$. To do things rigorously within the framework of set theory, one has to take a little extra care to avoid the set-theoretic issues associated to Russell’s paradox (or more precisely, avoiding the uses of proper classes that are too large to be sets). One way to deal with this is as follows. One can postulate the existence of a standard universe ${{\mathfrak U}}$ that contains all the objects of interest for one’s particular mathematical problem: for instance, if one is interested in a sequence of groups ${G_{\bf n}}$, then ${{\mathfrak U}}$ might contain all the elements of all of the ${G_{\bf n}}$. The only requirements we make of this standard universe are that (a) it is a set (rather than a proper class), and (b) it is fixed throughout the mathematical discussion. The former requirement means that ${{\mathfrak U}}$, in general, cannot contain all spaces of a particular type; for instance, one cannot make ${{\mathfrak U}}$ contain all groups, all vector spaces, all graphs, etc., as one would soon run into Russell-type paradoxes if this occurred. However, in practice, for any given mathematical problem (other than those arising within mathematical logic), one usually does not need to work simultaneously with all spaces; usually a much smaller number of spaces, e.g. a countable or uncountable family of such spaces, will suffice. Henceforth we will take the existence of a standard universe ${{\mathfrak U}}$ for granted. Elements of this universe will be referred to as standard objects, and subsets of the universe as standard spaces; for instance, we will usually place the classical number systems ${{\bf N}, {\bf Z}, {\bf R}}$, etc. inside this standard universe, so elements of ${{\bf N}}$ become standard natural numbers, elements of ${{\bf R}}$ become standard reals, and so forth. We then have:

Definition 2 (Formal definition of ultralimits and ultraproducts) Let ${(x_{\bf n})_{{\bf n} \in E}}$ be a sequence of standard objects, defined for an ${\alpha}$-large set ${E}$ (thus ${x_{\bf n} \in {\mathfrak U}}$ for all ${n \in E}$). The ultralimit ${x_\alpha = \lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$ is then defined to be the equivalence class of tuples ${(y_{\bf n})_{{\bf n} \in E'}}$ of standard objects defined on an ${\alpha}$-large set ${E'}$, such that ${x_{\bf n} = y_{\bf n}}$ for ${{\bf n}}$ sufficiently close to ${\alpha}$ in the common domain ${E \cap E'}$ of definition. Such an ultralimit is called a nonstandard object, and the set of all ultralimits is called the nonstandard universe ${{}^* {\mathfrak U}}$. We identify every standard object ${x}$ with the nonstandard object ${\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$, thus embedding ${{\mathfrak U}}$ in ${{}^* {\mathfrak U}}$.

Given a sequence ${X_{\bf n}}$ of standard spaces (i.e. subsets of ${{\mathfrak U}}$) defined for ${{\bf n}}$ sufficiently close to ${\alpha}$, the ultraproduct ${X_\alpha = \prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}$ is defined as the set of all ultralimits ${\lim_{{\bf n} \rightarrow\alpha} x_{\bf n}}$, where ${x_{\bf n}}$ lies in ${X_{\bf n}}$ for an ${\alpha}$-large set of ${n}$. Such an ultraproduct (which is a subset of ${{}^* {\mathfrak U}}$) will be called a nonstandard space (also known as an internal space). Similarly, the ultraproduct of a sequence of standard groups will be called a nonstandard group (or internal group), the ultraproduct of a sequence of standard fields will be called a nonstandard field (or internal field), and so forth.

If ${X}$ is a standard space, the nonstandard space ${\prod_{{\bf n} \rightarrow\alpha} X}$ (that is, the space of ultralimits of sequences in ${X}$) will be called the ultrapower of ${X}$ and is denoted ${{}^* X}$; note that ${X}$ embeds as a subset of ${{}^* X}$. The ultrapower ${{}^* {\bf N}}$ of the standard natural numbers will be referred to as the nonstandard natural numbers, the ultrapower ${{}^* {\bf R}}$ of the standard reals will be referred to as the nonstandard real numbers (also known as the hyperreal numbers), and so forth.

One can verify that when ${X}$ is finite, the ultrapower ${{}^* X}$ is equal to ${X}$ itself, and that the notion of an ultralimit ${\lim_{{\bf n} \rightarrow\alpha} x_{\bf n}}$ in the case when all the ${x_{\bf n}}$ lie in ${X}$ corresponds to the notion of an ultralimit defined previously.
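
To verify the first of these claims: if ${X = \{a_1,\ldots,a_k\}}$ is finite, then for any sequence ${x_{\bf n} \in X}$ the sets ${\{ {\bf n} : x_{\bf n} = a_i \}}$ for ${i=1,\ldots,k}$ partition ${{\bf N}}$, so exactly one of them is ${\alpha}$-large; if it is the ${i^{th}}$, then ${\lim_{{\bf n} \rightarrow \alpha} x_{\bf n} = a_i}$, which already lies in ${X}$. Thus ${{}^* X = X}$ in the finite case.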

The basic ontological Venn diagram is this: the standard universe ${{\mathfrak U}}$ embeds into the larger nonstandard universe ${{}^* {\mathfrak U}}$, and both exist inside the even larger “external universe” of ZFC set theory, which we will not give a name to (it is not a set, but is instead a proper class). This allows us to study various mathematical objects on three different levels: standard, nonstandard, and external. For instance, we have standard groups (groups in ${{\mathfrak U}}$) and nonstandard groups (ultraproducts of groups in ${{\mathfrak U}}$); both are examples of external groups (groups that are somewhere in the external universe), but there are certainly examples of external groups that are neither standard groups nor nonstandard groups. It is also worth cautioning that while we identify standard objects ${x}$ with their nonstandard counterparts ${\lim_{{\bf n} \rightarrow\alpha} x}$, we do not identify standard spaces ${X}$ with their nonstandard counterparts ${{}^* X = \prod_{{\bf n} \rightarrow\alpha} X}$; in particular, we do not view standard groups as examples of nonstandard groups (except in the finite case), nor do we view standard fields as examples of nonstandard fields, etc. Instead, one can view the nonstandard space ${{}^* X}$ as a “completion” of the standard space ${X}$, analogous to metric completion; see this previous blog post for further discussion of this perspective. A related perspective is to view ${{}^* X}$ as a double-precision version of ${X}$, so for instance ${{}^* {\bf R}}$ can be viewed as the space of “double-precision real numbers” as opposed to the “single-precision” real numbers in ${{\bf R}}$, or similarly ${{}^* {\bf Z}}$ may be viewed as a space of “long integers“, with ${{\bf Z}}$ being a space of “short integers”.

The analogues of standard groups, nonstandard groups, and external groups in finitary mathematics would be “groups ${G}$ that do not depend on an asymptotic parameter ${n}$“, “groups ${G_{\bf n}}$ that may depend on an asymptotic parameter ${{\bf n}}$“, and “informal group-like objects (such as the “group” of all “bounded” integers ${m = O(1)}$) that may require asymptotic notation to define” respectively. The latter concept is hard to pin down rigorously in finitary mathematics, even though the intuition it captures (e.g. that the “set” of “bounded” integers should be closed under addition) is fairly clear; but once one uses ultraproducts to separate the three levels of standard, nonstandard, and external objects, it is possible to make rigorous sense of these sorts of intuitions, as we shall see later.

Given a sequence ${f_{\bf n}: X_{\bf n} \rightarrow Y_{\bf n}}$ of standard functions between standard sets ${X_{\bf n},Y_{\bf n}}$ for an ${\alpha}$-large set of ${n}$, we define the ultralimit ${f_\alpha = \lim_{{\bf n} \rightarrow\alpha} f_{\bf n}}$ of this sequence to be the function ${f_\alpha: X_\alpha \rightarrow Y_\alpha}$ from the nonstandard space ${X_\alpha := \prod_{{\bf n} \rightarrow\alpha} X_{\bf n}}$ to the nonstandard space ${Y_\alpha := \prod_{{\bf n} \rightarrow\alpha} Y_{\bf n}}$ defined by the formula

$\displaystyle f_\alpha( \lim_{{\bf n} \rightarrow\alpha} x_{\bf n} ) := \lim_{{\bf n} \rightarrow\alpha} f_{\bf n}( x_{\bf n} )$

for all sequences ${x_{\bf n}}$ that lie in ${X_{\bf n}}$ for an ${\alpha}$-large set of ${n}$. It is easy to see that this ultralimit is well-defined; we refer to such functions as nonstandard functions.
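
To spell out the well-definedness: if ${\lim_{{\bf n} \rightarrow\alpha} x_{\bf n} = \lim_{{\bf n} \rightarrow\alpha} y_{\bf n}}$, then ${x_{\bf n} = y_{\bf n}}$ for an ${\alpha}$-large set of ${n}$, hence ${f_{\bf n}(x_{\bf n}) = f_{\bf n}(y_{\bf n})}$ on that same ${\alpha}$-large set, and therefore ${\lim_{{\bf n} \rightarrow\alpha} f_{\bf n}(x_{\bf n}) = \lim_{{\bf n} \rightarrow\alpha} f_{\bf n}(y_{\bf n})}$; the value of ${f_\alpha}$ does not depend on the choice of representatives.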

From the above construction, we may now transfer operations and relations on standard spaces to their nonstandard counterparts. For instance, if ${G_{\bf n}}$ is a sequence of standard groups, then one can take the ultralimit of the multiplication operations ${\cdot_{\bf n}: G_{\bf n} \times G_{\bf n} \rightarrow G_{\bf n}}$ to obtain a multiplication operation ${\cdot_\alpha: G_\alpha \times G_\alpha \rightarrow G_\alpha}$ on the ultraproduct ${G_\alpha := \prod_{{\bf n} \rightarrow\alpha} G_{\bf n}}$; similarly, the inversion operation on ${G_{\bf n}}$ gives rise to an inversion operation on ${G_\alpha}$. Similarly, the arithmetic operations on the standard natural numbers ${{\bf N}}$ can be transferred to arithmetic operations on the nonstandard natural numbers ${{}^* {\bf N}}$, and properties and relations on the natural numbers can similarly be transferred. For instance, a nonstandard number ${N = \lim_{{\bf n} \rightarrow\alpha} N_{\bf n}}$ can be said to be prime if the ${N_{\bf n}}$ are prime for an ${\alpha}$-large set of ${n}$ (i.e. the ultralimit of the truth value of “${N_{\bf n}}$ is prime” is true); one nonstandard number ${N = \lim_{{\bf n} \rightarrow\alpha} N_{\bf n}}$ can be said to be larger than another ${M = \lim_{{\bf n} \rightarrow\alpha} M_{\bf n}}$ if one has ${N_{\bf n} > M_{\bf n}}$ for an ${\alpha}$-large set of ${n}$; and so forth.
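
As a simple worked example of the transferred order relation: the nonstandard natural number ${\omega := \lim_{{\bf n} \rightarrow\alpha} {\bf n}}$ is larger than every standard natural number ${m}$, since ${{\bf n} > m}$ for all but finitely many ${{\bf n}}$, and cofinite sets are ${\alpha}$-large by axioms (iv) and (v). In particular ${{}^* {\bf N}}$ is strictly larger than ${{\bf N}}$.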

One can then check by hand that statements that are true for standard objects are also true for their nonstandard counterparts. For instance, because the group operations of the standard groups ${G_{\bf n}}$ are associative, the group operation on the nonstandard group ${G_\alpha}$ is also associative, and in fact ${G_\alpha}$ will also be a group (as viewed externally). Similarly, addition and multiplication in the nonstandard natural numbers ${{}^* {\bf N}}$ obey the usual commutative, associative, and distributive laws, because these transfer from the corresponding laws for the standard natural numbers ${{\bf N}}$. For a more topical example, it is now a theorem that given any standard natural number ${N}$, there exist standard primes ${p,q}$ such that ${N < p < q \leq p+600}$; it is an instructive exercise to check that the same result then automatically holds in the nonstandard setting, thus for every nonstandard natural number ${N}$ there exist nonstandard primes ${p,q}$ such that ${N < p < q \leq p+600}$.

These facts are special cases of the transfer principle, and can be formalised by Łoś’s theorem. Informally, this theorem asserts that if an “elementary” sentence holds for (an ${\alpha}$-large subsequence of) a sequence of standard objects, then it also holds in the ultralimit, and conversely. To make this assertion rigorous, we need some of the notation from first-order logic; this is somewhat formal and the reader may wish to skim briefly over the definitions and proofs that follow. To simplify the discussion, we restrict attention to one-sorted languages, although in practice one often works with many-sorted languages instead.

Definition 3 (Languages and structures, formal definition) A signature is a collection of formal operation symbols ${O}$ and relation symbols ${R}$, each of which is assigned an arity ${k}$ (a non-negative integer). (For instance, the signature of the language of groups would consist of the ${2}$-ary operation ${\cdot}$, the ${1}$-ary operation ${()^{-1}}$, and the ${0}$-ary operation ${1}$.) ${0}$-ary operations are also known as constant symbols. Given a signature and an infinite supply of variable symbols ${x}$, define a term to be a formal string generated by the following rules:

• Every variable symbol ${x}$ is a term.
• If ${O}$ is a ${k}$-ary operation and ${z_1,\ldots,z_k}$ are terms, then ${O( z_1,\ldots,z_k )}$ is a term.

By abuse of notation, we often move operation symbols to more traditional locations, for instance writing ${+(x,y)}$ as ${(x+y)}$, ${\cdot(x,y)}$ as ${(x \cdot y)}$, etc.. By further abuse of notation, parentheses are usually omitted when there is no loss of ambiguity, e.g. ${(x+y)}$ is often abbreviated as ${x+y}$.

A formula ${\phi(x_1,\ldots,x_m)}$ involving some finite number of distinct variable symbols ${x_1,\ldots,x_m}$ (known as the free variables of the formula) for some ${m \geq 0}$ is then a formal string generated by the following rules:

• If ${R}$ is a ${k}$-ary relation and ${z_1,\ldots,z_k}$ are terms which only involve variables from ${x_1,\ldots,x_m}$, then ${R(z_1,\ldots,z_k)}$ is a formula with free variables ${x_1,\ldots,x_m}$.
• If ${z, z'}$ are terms only involving variables from ${x_1,\ldots,x_m}$, then ${(z=z')}$ is a formula with free variables ${x_1,\ldots,x_m}$.
• If ${\varphi = \varphi(x_1,\ldots,x_m), \psi = \psi(x_1,\ldots,x_m)}$ are formulae with free variables ${x_1,\ldots,x_m}$, then ${(\neg \varphi), (\varphi \wedge \psi), (\varphi \vee \psi), (\varphi \implies \psi)}$, and ${(\varphi \iff \psi)}$ are formulae with free variables ${x_1,\ldots,x_m}$.
• If ${\varphi = \varphi(x_1,\ldots,x_m,x)}$ is a formula with free variables ${x_1,\ldots,x_m,x}$ (or some permutation thereof), then ${(\forall x: \varphi)}$ and ${(\exists x: \varphi)}$ are formulae with free variables ${x_1,\ldots,x_m}$.

A formula with no free variables is known as a sentence. As before, we abuse notation by moving relation symbols to more traditional locations, e.g. writing ${<(x,y)}$ as ${(x < y)}$, and by removing parentheses whenever there is no loss of ambiguity; we also permit concatenation of quantifiers, for instance abbreviating ${(\forall x: (\forall y: \varphi(x,y)))}$ as ${\forall x,y: \varphi(x,y)}$. Thus, for instance, in the language of the natural numbers,

$\displaystyle \exists p,q: n = p+q \wedge \neg (\exists a,b > 1: p=ab) \wedge \neg (\exists c,d > 1: q=cd)$

is a formula of one variable ${n}$, and

$\displaystyle \forall n > 2: (\exists m: n=2m) \implies$

$\displaystyle (\exists p,q: n = p+q \wedge \neg (\exists a,b > 1: p=ab) \wedge \neg (\exists c,d > 1: q=cd))$

is a sentence (which, in this case, encodes the even Goldbach conjecture).

We refer to the collection ${{\mathcal L}}$ of such formulae as the first-order language (or language, for short) associated with the given signature (and to the supply of variable symbols).

A structure ${M}$ for a first-order language ${{\mathcal L}}$ is a set ${M}$, together with a map ${O_M: M^k \rightarrow M}$ for every ${k}$-ary operation ${O}$, and a boolean relation ${R_M: M^k \rightarrow \{\hbox{true}, \hbox{false}\}}$ for every ${k}$-ary relation ${R}$. (For instance, a group is a structure for the first-order language of groups.) Any term ${z}$ of ${m}$ variables ${x_1,\ldots,x_m}$ may be interpreted as a function ${z_M: M^m \rightarrow M}$ by viewing the variable symbols ${x_1,\ldots,x_m}$ as ranging in ${M}$, and replacing each operation ${O}$ or relation ${R}$ with their respective interpretations ${O_M, R_M}$. Similarly, any formula ${\phi(x_1,\ldots,x_m)}$ may be interpreted as a function ${\phi_M: M^m \rightarrow\{\hbox{true}, \hbox{false}\}}$, and in particular any sentence ${\phi}$ is interpreted to be either true or false (and we write ${M \models \phi}$ if the former occurs), by interpreting quantifiers such as ${\forall x}$ and ${\exists x}$ as quantification over ${M}$. (For instance, ${\exists x: \phi(x,y)}$ is interpreted as true for ${y \in M}$ if there exists ${x \in M}$ such that ${\phi_M(x,y)}$ is true.)

Given a sequence ${M_{\bf n}}$ of standard structures for a given first-order language ${{\mathcal L}}$, we can form the ultraproduct ${M_\alpha := \prod_{{\bf n} \rightarrow\alpha} M_{\bf n}}$ in the obvious manner, with the nonstandard interpretations ${O_{M_\alpha}, R_{M_\alpha}}$ of the operator and relation symbols ${O,R}$ being the ultralimits of their standard counterparts. We refer to this ultraproduct as a nonstandard structure for the first-order language ${{\mathcal L}}$; both standard structures and nonstandard structures are examples of external structures, but as remarked before we do not consider standard structures as examples of nonstandard structures (except when the structure is finite).

Theorem 4 (Łos’s theorem) Let ${M_{\bf n}}$ be a sequence of standard structures for a first-order language ${{\mathcal L}}$, and let ${M_\alpha = \prod_{{\bf n} \rightarrow\alpha} M_{\bf n}}$ be the associated nonstandard structure.

• (i) If ${z}$ is a term involving the variables ${x_1,\ldots,x_m}$, then ${z_{M_\alpha}: M_\alpha^m\rightarrow M_\alpha}$ is the ultralimit of the ${z_{M_{\bf n}}: M_{\bf n}^m \rightarrow M_{\bf n}}$.
• (ii) If ${\phi(x_1,\ldots,x_m)}$ is a formula with free variables ${x_1,\ldots,x_m}$, then ${\phi_{M_\alpha}: M_\alpha^m \rightarrow \{ \hbox{true}, \hbox{false} \}}$ is the ultralimit of ${\phi_{M_{\bf n}}: M_{\bf n}^m \rightarrow \{ \hbox{true}, \hbox{false} \}}$.
• (iii) If ${\phi}$ is a sentence, then ${M_\alpha \models \phi}$ is true if and only if ${M_{\bf n} \models \phi}$ is true for ${{\bf n}}$ sufficiently close to ${\alpha}$.

Informally, this theorem asserts that first-order logic is continuous with respect to ultralimits.

Proof: The claim (i) is obtained by a routine induction on the length of the term ${z}$, together with the ultrafilter axioms, while the claim (iii) is a consequence of (ii). The claim (ii) is similarly obtained by a routine induction on the length of the formula ${\phi}$, together with the ultrafilter axioms; the only non-trivial claim to verify is that if the formula ${(x_1,\ldots,x_m,x) \mapsto \phi_{M_\alpha}(x_1,\ldots,x_m,x)}$ is the ultralimit of the formulae ${(x_1,\ldots,x_m,x) \mapsto \phi_{M_{\bf n}}(x_1,\ldots,x_m,x)}$, then the formula ${(x_1,\ldots,x_m) \mapsto \exists x \in M_\alpha: \phi_{M_\alpha}(x_1,\ldots,x_m,x)}$ is the ultralimit of the formulae ${(x_1,\ldots,x_m) \mapsto \exists x \in M_{\bf n}: \phi_{M_{\bf n}}(x_1,\ldots,x_m,x)}$, and similarly for the universal quantifier ${\forall}$.

We just prove the claim for the existential quantifier ${\exists}$, as the claim for ${\forall}$ is similar (and can be deduced from the existential case by rewriting ${\forall x:\phi}$ in the equivalent form ${\neg( \exists x: \neg \phi)}$). Suppose first that ${x_{i,\alpha} = \lim_{{\bf n} \rightarrow \alpha} x_{i,{\bf n}} \in M_\alpha}$ for ${i=1,\ldots,m}$ are such that ${\exists x \in M_{\bf n}: \phi_{M_{\bf n}}(x_{1,{\bf n}},\ldots,x_{m,{\bf n}},x)}$ holds for ${{\bf n}}$ sufficiently close to ${\alpha}$; by the axiom of choice, we may thus find ${x_{\bf n} \in M_{\bf n}}$ such that ${\phi_{M_{\bf n}}(x_{1,{\bf n}},\ldots,x_{m,{\bf n}},x_{\bf n})}$ holds for ${{\bf n}}$ sufficiently close to ${\alpha}$. Taking ultralimits and using the induction hypothesis, we conclude that ${\phi_{M_{\alpha}}(x_{1,{\alpha}},\ldots,x_{m,{\alpha}},x_{\alpha})}$ holds, where ${x_\alpha := \lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$, so that ${\exists x \in M_\alpha: \phi_{M_\alpha}(x_{1,\alpha},\ldots,x_{m,\alpha},x)}$ is true. Conversely, suppose that ${\exists x \in M_{\bf n}: \phi_{M_{\bf n}}(x_{1,{\bf n}},\ldots,x_{m, {\bf n}},x)}$ fails for ${{\bf n}}$ sufficiently close to ${\alpha}$; then for any ${x_\alpha = \lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$ in ${M_\alpha}$, ${\phi_{M_{\bf n}}(x_{1,{\bf n}},\ldots,x_{m, {\bf n}},x_{\bf n})}$ fails for ${{\bf n}}$ sufficiently close to ${\alpha}$, and hence on taking ultralimits ${\phi_{M_{\alpha}}(x_{1,{\alpha}},\ldots,x_{m, {\alpha}},x_{\alpha})}$ fails also, and the claim follows. $\Box$

We now give a classic application of Łos’s theorem to logic, namely the countable case of the compactness theorem:

Corollary 5 (Compactness theorem, countable case) Let ${{\mathcal L}}$ be a first-order language, and let ${\phi_1, \phi_2, \ldots}$ be a sequence of sentences. Suppose that for each natural number ${{\bf n}}$, there exists a structure ${M_{\bf n}}$ for ${{\mathcal L}}$ such that ${\phi_1,\ldots,\phi_{{\bf n}}}$ are satisfied by this structure. Then there exists a structure ${M}$ such that ${\phi_1,\phi_2,\ldots}$ are simultaneously satisfied by this structure.

The general case of the compactness theorem may be established by using ultraproducts on more general sets than the natural numbers; we leave the verification of this fact to the reader.

Proof: We select the standard universe ${{\mathfrak U}}$ to contain all of the ${M_{\bf n}}$, so that the ${M_{\bf n}}$ are all standard structures. If we then set ${M}$ to be the nonstandard structure ${\prod_{{\bf n} \rightarrow \alpha} M_{\bf n}}$, the claim follows from Łos’s theorem. $\Box$

— 2. Rank decompositions —

We can now give a simple example of how ultralimits may be used to connect quantitative finitary statements to qualitative infinitary statements, by giving an easy case of the polynomial regularity lemma, introduced by Ben Green and myself. Given a finite-dimensional vector space ${V}$ over a field ${{\bf F}}$ and a natural number ${d}$, we define a polynomial ${P: V \rightarrow {\bf F}}$ of degree at most ${d}$ on ${V}$ to be a function of the form

$\displaystyle P( x_1 e_1 + \ldots + x_n e_n ) = \sum_{i_1,\ldots,i_n \geq 0: i_1+\ldots+i_n \leq d} c_{i_1,\ldots,i_n} x_1^{i_1} \ldots x_n^{i_n}$

for any basis ${e_1,\ldots,e_n}$ of ${V}$ and some coefficients ${c_{i_1,\ldots,i_n} \in {\bf F}}$; it is easy to see that the choice of basis is irrelevant for this definition. If ${V}$ is infinite dimensional instead of finite dimensional, we say that ${P: V \rightarrow {\bf F}}$ is a polynomial of degree at most ${d}$ if its restriction to any finite-dimensional subspace is still a polynomial of degree at most ${d}$. The space of such polynomials will be denoted ${\hbox{Poly}_d(V)}$; this is itself a vector space over ${{\bf F}}$.

If ${P \in \hbox{Poly}_d(V)}$ for some ${d \geq 2}$, we define the rank ${\hbox{rank}_{d-1}(P)}$ to be the least integer ${m}$ such that we can find ${Q_1,\ldots,Q_m \in \hbox{Poly}_{d-1}(V)}$ such that ${P}$ is a function of the ${Q_1,\ldots,Q_m}$, that is to say there exists a function ${f: {\bf F}^m \rightarrow {\bf F}}$ such that ${P(x) = f(Q_1(x),\ldots,Q_m(x))}$ for all ${x \in V}$. When ${V}$ is finite-dimensional, the rank is necessarily finite (just take ${Q_1,\ldots,Q_m}$ to be the coordinate functions), but the rank may be infinite when ${V}$ is infinite dimensional. This notion of rank is analogous to the notion of the rank of a quadratic form (which is essentially the ${d=2}$ case of the above notion).
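
In the quadratic case ${d=2}$ over the reals, this notion of rank reduces to ordinary matrix rank: ${P(x) = x^T A x}$ with ${A}$ symmetric can be written as a function of ${\hbox{rank}(A)}$ linear forms (by diagonalisation) and of no fewer. Here is a tiny numerical sketch of that observation (my own aside, using numpy; the helper name is hypothetical):

```python
import numpy as np

def quadratic_form_rank(A):
    """Rank of the real quadratic form P(x) = x^T A x (A symmetric), i.e. the least
    number of linear forms that P can be expressed as a function of."""
    return np.linalg.matrix_rank(np.asarray(A, dtype=float))

# P(x1, x2, x3) = x1*x2 = ((x1 + x2)**2 - (x1 - x2)**2) / 4: a function of two linear forms
A = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
print(quadratic_form_rank(A))   # 2
```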

We then have the following simple fact (Lemma 6, a finitary polynomial regularity lemma): given ${d \geq 2}$, a natural number ${a \geq 0}$, a function ${F: {\bf N} \rightarrow {\bf N}}$, and polynomials ${P_1,\ldots,P_a \in \hbox{Poly}_d(V)}$ on a vector space ${V}$ over a field ${{\bf F}}$, one can find a natural number ${M}$ bounded in terms of ${a}$ and ${F}$ only, an integer ${0 \leq b \leq a}$, and polynomials ${Q_1,\ldots,Q_b \in \hbox{Poly}_d(V)}$ obeying the following two properties:

• (i) (Representation modulo low rank) For each ${1 \leq i \leq a}$, there exists a representation ${P_i = \sum_{j=1}^b c_{ij} Q_j + E_i}$, where ${c_{ij} \in {\bf F}}$ and ${\hbox{rank}_{d-1}(E_i) \leq M}$.
• (ii) (Independence modulo low rank) For every ${c_1,\ldots,c_b \in {\bf F}}$, not all zero, we have ${\hbox{rank}_{d-1}(\sum_{j=1}^b c_j Q_j) \geq F(M)}$.

(In applications, one actually needs an iterated version of this lemma, in which the degree ${d-1}$ polynomials involved in ${E_i}$ are themselves regularised in terms of high rank or lower degree polynomials, and so on and so forth down to the linear polynomials; for simplicity we will not discuss this iterated version here, but see the above-mentioned paper of Ben and myself for more discussion.)

The standard proof of this lemma is not difficult, and can be described by the following algorithm:

• (Step 0) Initialise ${b=a}$, ${Q_j = P_j}$ for ${j=1,\ldots,b}$, and ${M = 1}$.
• (Step 1) If property (ii) holds then STOP. Otherwise, move on to Step 2.
• (Step 2) By the failure of (ii), and by permuting the ${Q_j}$ if necessary, we may write ${Q_b}$ as a linear combination of ${Q_1,\ldots,Q_{b-1}}$, plus a polynomial of rank at most ${F(M)}$. If we then delete ${Q_b}$, we see that every ${P_i}$ is still a linear combination of ${Q_1,\ldots,Q_{b-1}}$, plus an error of rank at most ${M+F(M)}$.
• (Step 3) Replace ${b}$ by ${b-1}$ (thus deleting ${Q_b}$), replace ${M}$ by ${M+F(M)}$, and return to Step 1.

Since ${b}$ decreases by one at each loop of the algorithm, it loops at most ${a}$ times, and the claim follows.

To pass to a qualitative, infinite-dimensional version of this lemma, the key tool is the following trivial fact from linear algebra (Lemma 7): given a finite collection ${v_1,\ldots,v_a}$ of vectors in a vector space, one can find ${0 \leq b \leq a}$ linearly independent vectors ${w_1,\ldots,w_b}$ such that each ${v_i}$ is a linear combination of the ${w_1,\ldots,w_b}$. Indeed, one just sets ${w_1,\ldots,w_b}$ to be a basis of the linear span of the ${v_1,\ldots,v_a}$. The ${w_1,\ldots,w_b}$ can also be constructed from the ${v_1,\ldots,v_a}$ by an algorithm extremely similar to the one given above; we leave this to the reader as an exercise.
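
For what it is worth, here is a sketch of that exercise (my own, not from the post), carried out for vectors over the reals with numpy's matrix rank as the linear-dependence test; in the polynomial setting, the same loop would run with this exact-dependence test replaced by the 'low rank modulo lower-degree polynomials' test of Step 2 above.

```python
import numpy as np

def extract_basis(vectors):
    """Greedy analogue of Steps 1-3 above: while the current list is linearly
    dependent, delete one vector that lies in the span of the others."""
    w = [np.asarray(v, dtype=float) for v in vectors]
    while w and np.linalg.matrix_rank(np.stack(w)) < len(w):
        for j in range(len(w)):
            rest = w[:j] + w[j + 1:]
            # w[j] is redundant if removing it does not drop the rank
            if not rest or np.linalg.matrix_rank(np.stack(rest)) == np.linalg.matrix_rank(np.stack(w)):
                del w[j]
                break
    return w

v = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]), np.array([1.0, 1.0, 0.0])]
print(extract_basis(v))   # two linearly independent vectors spanning the same plane
```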

From Lemma 7 we immediately obtain a qualitative version of Lemma 6 (Corollary 8): given ${d \geq 2}$ and polynomials ${P_1,\ldots,P_a \in \hbox{Poly}_d(V)}$ on a vector space ${V}$ over a field ${{\bf F}}$ (now allowed to be infinite dimensional), one can find ${0 \leq b \leq a}$ and ${Q_1,\ldots,Q_b \in \hbox{Poly}_d(V)}$ such that:

• (i) (Representation modulo low rank) For each ${1 \leq i \leq a}$, there exists a representation ${P_i = \sum_{j=1}^b c_{ij} Q_j + E_i}$, where ${c_{ij} \in {\bf F}}$ and ${\hbox{rank}_{d-1}(E_i) < \infty}$.
• (ii) (Independence modulo low rank) For every ${c_1,\ldots,c_b \in {\bf F}}$, not all zero, we have ${\hbox{rank}_{d-1}(\sum_{j=1}^b c_j Q_j) = +\infty}$.

Proof: Let ${W}$ be the set of all polynomials in ${\hbox{Poly}_d(V)}$ of finite rank; this is clearly a subspace of ${\hbox{Poly}_d(V)}$, so we may form the quotient space ${\hbox{Poly}_d(V)/W}$, which is still a vector space over ${{\bf F}}$. If we then apply Lemma 7 to the projections ${v_1,\ldots,v_a}$ of the ${P_1,\ldots,P_a}$ to this quotient space ${\hbox{Poly}_d(V)/W}$, we obtain the claim. $\Box$

While Corollary 8 certainly looks very similar to Lemma 6, it appears at first glance that it cannot be used to prove Lemma 6, because the latter lemma has meaningful content when ${V}$ is finite-dimensional, whereas the former Corollary is trivial in that setting. However, this can be rectified by passing from the finitary setting to the infinitary setting. This can be done in a number of ways, but we will do this using the ultrafilter machinery just developed.

Proof: (Proof of Lemma 6 using Corollary 8.) We use the “compactness and contradiction” method. Suppose for contradiction that Lemma 6 failed. Carefully negating the quantifiers, and using the axiom of choice, this means that there exists ${a \geq 0}$ and ${F: {\bf N} \rightarrow {\bf N}}$, with the property that for each ${{\bf n}}$, we can find a vector space ${V_{\bf n}}$ over a field ${{\bf F}_{\bf n}}$ and polynomials ${P_{1,{\bf n}},\ldots,P_{a,{\bf n}} \in \hbox{Poly}_d(V_{\bf n})}$, such that for any ${1 \leq M \leq {\bf n}}$, there does NOT exist ${0 \leq b \leq a}$, ${Q_{1,{\bf n}},\ldots,Q_{b,{\bf n}} \in \hbox{Poly}_d(V_{\bf n})}$, obeying the properties (i), (ii) of Lemma 6 with the given data.

We may place all of the objects just constructed in a standard universe ${{\mathfrak U}}$. Taking ultralimits (and using Łos’s theorem repeatedly), we obtain a nonstandard field ${{\bf F}_\alpha = \prod_{{\bf n} \rightarrow \alpha} {\bf F}_{\bf n}}$, a nonstandard vector space ${V_\alpha = \prod_{{\bf n} \rightarrow \alpha} V_{\bf n}}$ over that field, polynomials ${P_{1,{\bf \alpha}},\ldots,P_{a,{\bf \alpha}} \in \hbox{Poly}_d(V_\alpha)}$, with the property that for any standard integer ${M \geq 1}$, there does NOT exist ${0 \leq b \leq a}$, ${Q_{1,\alpha},\ldots,Q_{b,\alpha} \in \hbox{Poly}_d(V_\alpha)}$, obeying the properties (i), (ii) of Lemma 6 with the given data. However, if one applies Corollary 8, we can find ${Q_{1,\alpha},\ldots,Q_{b,\alpha} \in \hbox{Poly}_d(V_\alpha)}$ obeying the conclusions (i), (ii) of Corollary 8. This implies the conclusions (i), (ii) of Lemma 6 if ${M}$ is chosen large enough, giving the required contradiction. $\Box$

— 3. Saturation and colouring —

Ultraproducts enjoy a very useful compactness-type property, known as countable saturation (or more precisely, ${\omega_1}$-saturation). This property may be phrased in many ways, but we will use the following version which emphasises the analogy with topological compactness.

Let ${X}$ be a nonstandard set, thus ${X = \prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}$ for some standard sets ${X_{\bf n}}$. We define an internal subset (or nonstandard subset) of ${X}$ to be any subset ${Y}$ of ${X}$ of the form ${Y = \prod_{{\bf n} \rightarrow \alpha} Y_{\bf n}}$; by Łos’s theorem we see that ${Y_{\bf n} \subset X_{\bf n}}$ for ${{\bf n}}$ sufficiently close to ${\alpha}$. For instance, in the nonstandard natural numbers ${{}^* {\bf N}}$, the nonstandard intervals ${[N] := \{ n \in {}^* {\bf N}: n \leq N \}}$ are internal sets for any nonstandard natural number ${N \in {}^* {\bf N}}$; indeed, if ${N = \lim_{{\bf n} \rightarrow \alpha} N_{\bf n}}$, then ${[N]}$ is the ultraproduct of the standard intervals ${[N_{\bf n}]}$. More generally, from Łos’s theorem we see that any subset of ${X}$ that is defined by a first-order predicate is an internal set. On the other hand, as we shall see later, the standard natural numbers ${{\bf N}}$ are not an internal subset of ${{}^* {\bf N}}$ (and are thus merely an external subset of ${{}^* {\bf N}}$).

From Łos’s theorem we see that the internal subsets of a nonstandard set ${X}$ form a boolean algebra. We then have the following compactness property (Proposition 9, countable saturation): if a nonstandard set ${X}$ is covered by a countable family ${E_1, E_2, \ldots}$ of internal subsets, then it is already covered by finitely many of them; equivalently, if ${F_1, F_2, \ldots}$ are internal subsets of ${X}$ such that every finite intersection ${\bigcap_{m=1}^M F_m}$ is non-empty, then the full intersection ${\bigcap_{m=1}^\infty F_m}$ is also non-empty.

We remark that if we had used ultraproducts on larger index sets than the natural numbers, then we would be able to obtain stronger saturation properties than countable saturation, although it is never possible to get unlimited saturation (since every singleton in a nonstandard set ${X}$ is an internal set). In model theoretic applications it can be technically convenient to work with an extremely saturated model (e.g. a structure which is saturated up to an inaccessible cardinal), but for most applications outside of logic and set theory, such “monster models” are not necessary, and the more modest device of ultraproducts over countable index sets is often sufficient.

Proof: It suffices to prove the second claim, as the former follows by taking complements.

Write ${F_m = \prod_{{\bf n} \rightarrow \alpha} F_{m,{\bf n}}}$; then for each ${M}$, ${\bigcap_{m=1}^M F_m}$ is the ultraproduct of the ${\bigcap_{m=1}^M F_{m,{\bf n}}}$. In particular, ${\bigcap_{m=1}^M F_{m,{\bf n}}}$ is non-empty for all ${{\bf n}}$ in an ${\alpha}$-large subset ${E_M}$ of the natural numbers. By shrinking the ${E_M}$ as necessary we may assume that they are monotone, thus ${E_1 \supset E_2 \supset \ldots}$, and such that ${E_M}$ does not contain any natural number less than ${M}$; in particular, ${\bigcap_{M=1}^\infty E_M = \emptyset}$. For any ${{\bf n} \in E_1}$, let ${M_{\bf n}}$ denote the largest natural number ${M}$ such that ${{\bf n} \in E_M}$, and then let ${x_{\bf n}}$ denote an element of ${\bigcap_{m=1}^{M_{\bf n}} F_{m,{\bf n}}}$. Then by construction, for each ${m \geq 1}$, we have ${x_{\bf n} \in F_{m,{\bf n}}}$ for all ${{\bf n} \in E_m}$, and thus the nonstandard object ${x := \lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$ lies in ${F_m}$ for all ${m}$, and thus ${\bigcap_{m=1}^\infty F_m}$ is non-empty as required. $\Box$

This shows for instance that (as claimed previously) ${{\bf N}}$ is not an internal subset of ${{}^* {\bf N}}$, since it can be countably covered by the singleton sets ${\{n\}}$ for ${n \in {\bf N}}$ which are internal, but for which no finite cover exists. Among other things, this establishes the overspill principle: if a nonstandard formula ${P: {}^* {\bf N} \rightarrow \{\hbox{true},\hbox{false}\}}$ holds for all standard natural numbers, then it must hold for at least one unbounded natural number (defined as a nonstandard natural number that is not a standard natural number).

Another immediate corollary of the above proposition is that if a nonstandard set ${X}$ is covered by a countable sequence ${E_1,E_2,\ldots}$ of internal subsets, then the topology ${{\mathcal F}}$ on ${X}$ generated by ${E_1,E_2,\ldots}$ is compact (because it has a countable base of internal subsets, and Proposition 9 then gives countable compactness).

As an application of this compactness observation, we use ultraproducts to illustrate the topological dynamics version of the Furstenberg correspondence principle, which connects Ramsey theory to topological dynamics. More precisely, we consider the following two results: the van der Waerden theorem (Theorem 10), which asserts that for every ${k, r \geq 1}$ there exists an ${{\bf n}}$ such that whenever ${[{\bf n}]}$ is partitioned into ${r}$ colour classes, at least one of the classes contains a ${k}$-term arithmetic progression; and the topological multiple recurrence theorem of Furstenberg and Weiss (Theorem 11), which asserts that if ${T: X \rightarrow X}$ is a homeomorphism of a non-empty compact topological space ${X}$ which is covered by finitely many open sets ${U_1,\ldots,U_r}$, and ${k \geq 1}$, then there exist ${x \in X}$, ${1 \leq i \leq r}$, and a positive integer ${n}$ such that ${x, T^n x, \ldots, T^{(k-1)n} x}$ all lie in ${U_i}$.

It is not difficult to use Theorem 10 to establish Theorem 11. Conversely, we will use ultraproducts to show that Theorem 11 implies Theorem 10. (We will not prove either of these theorems here, but see e.g. this previous blog post for a proof.)
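
As a small finitary sanity check of the colouring statement (Theorem 10), not part of the argument below, one can verify the first nontrivial case ${k=3}$, ${r=2}$ by brute force: every ${2}$-colouring of ${[9]}$ contains a monochromatic ${3}$-term progression, while ${[8]}$ admits a colouring with none. (The helper names below are my own.)

```python
from itertools import product

def monochromatic_3ap(colouring, n):
    """Does some colour class of this colouring of {1,...,n} contain a 3-term AP?"""
    return any(colouring[x] == colouring[x + d] == colouring[x + 2 * d]
               for x in range(1, n + 1) for d in range(1, (n - x) // 2 + 1))

def every_colouring_has_3ap(n, r=2):
    """Brute-force check of the k = 3 case of van der Waerden's theorem on [n]."""
    return all(monochromatic_3ap(dict(zip(range(1, n + 1), c)), n)
               for c in product(range(r), repeat=n))

print(every_colouring_has_3ap(8))   # False: [8] has a 3-AP-free 2-colouring
print(every_colouring_has_3ap(9))   # True: the van der Waerden number W(3,2) is 9
```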

Again, we use compactness and contradiction. Suppose for contradiction that Theorem 10 failed. Then there exist ${k,r \geq 1}$ such that for any standard ${{\bf n}}$, we may partition ${[{\bf n}]}$ into colour classes ${[{\bf n}] = E_{1,{\bf n}} \cup \ldots \cup E_{r,{\bf n}}}$, none of which contain any ${k}$-term arithmetic progressions. Taking ultraproducts, we obtain a partition ${[N] = E_1 \cup \ldots \cup E_r}$ of the nonstandard interval ${[N]}$ associated to the unbounded natural number ${N := \lim_{{\bf n} \rightarrow \alpha} {\bf n}}$ into internal colour classes ${E_1,\ldots,E_r}$, none of which contains a nonstandard ${k}$-term arithmetic progression.

Now consider the shifted colour classes ${(E_i + n) \cap [N]}$ for ${n \in {\bf Z}}$ and ${i=1,\ldots,r}$. These form a countable family of internal sets that cover ${[N]}$, and so by countable saturation, this gives ${[N]}$ the structure of a compact topological space.

We would like to use the shift map ${T: x \mapsto x+1}$ on ${[N]}$. Unfortunately, this map is not quite a bijection on ${[N]}$. To get around this, we work in the slightly smaller space

$\displaystyle X := \bigcap_{n \in {\bf N}} [n, N-n].$

A further application of countable saturation reveals that ${X}$ is still compact (with the induced topology from ${[N]}$). Furthermore, the shift map ${T: x \mapsto x+1}$ can be verified to be a homeomorphism on ${X}$, and as ${N}$ is unbounded, we see from the overspill principle that ${X}$ is non-empty. It is covered by the open sets ${E_i \cap X}$, so by Theorem 11, we can find ${x \in X}$ and a (standard) positive integer ${n}$ such that ${x, x+n, \ldots, x+(k-1)n}$ lie in a single ${E_i \cap X}$, thus ${E_i}$ contains a non-standard ${k}$-term arithmetic progression, giving the required contradiction.

— 4. Algebraic geometry and ultraproducts —

Another use of ultraproducts is to provide a bridge between quantitative algebraic geometry and qualitative algebraic geometry. This topic is discussed in detail in this previous post; we will just give some key results here without proof.

Theorem 12 (Algebraic geometry is continuous with respect to ultralimits) Let ${k_{\bf n}}$ be a sequence of algebraically closed fields, let ${d \geq 0}$ be a natural number, and let ${V_{\bf n} \subset k_{\bf n}^d}$ be a (standard) affine algebraic set, that is to say the solution set of some finite number of polynomials ${P_{1,{\bf n}}, \ldots, P_{M,{\bf n}}: k_{\bf n}^d \rightarrow k_{\bf n}}$. We define the complexity of an algebraic set ${V}$ to be the least ${M}$ such that ${V}$ is cut out by at most ${M}$ polynomials, each of degree at most ${M}$.

Then the nonstandard set ${V_\alpha := \prod_{{\bf n} \rightarrow \alpha} V_{\bf n}}$ is an algebraic set if and only if the complexities of the ${V_{\bf n}}$ are uniformly bounded for ${{\bf n}}$ sufficiently close to ${\alpha}$. Furthermore, if ${V_\alpha}$ is indeed an algebraic set, then:

• (Continuity of dimension) The dimension of ${V_\alpha}$ is equal to the dimension of ${V_{\bf n}}$ for ${{\bf n}}$ sufficiently close to ${\alpha}$.
• (Continuity of irreducibility) ${V_\alpha}$ is irreducible if and only if ${V_{\bf n}}$ is irreducible for ${{\bf n}}$ sufficiently close to ${\alpha}$.
• (Continuity of degree) The degree of ${V_\alpha}$ is equal to the degree of ${V_{\bf n}}$ for ${{\bf n}}$ sufficiently close to ${\alpha}$.

A similar statement holds for constructible sets: an ultraproduct ${E_\alpha = \prod_{{\bf n} \rightarrow \alpha} E_{\bf n}}$ of constructible sets ${E_{\bf n}}$ is constructible if and only if the original sets have uniformly bounded complexity for ${{\bf n}}$ sufficiently close to ${\alpha}$. These claims are proven in the previous blog post; the basic idea is to express basic concepts such as dimension, irreducibility, and degree in first-order terms, so that Łos’s theorem may be applied.

Using this theorem and the compactness and contradiction method, one can easily convert a number of qualitative algebraic results into quantitative ones. For instance, it is a basic fact in qualitative algebraic geometry that every algebraic set decomposes into a finite number of irreducible sets. Using the above theorem (and a relationship between degree and complexity), one can then show that the number of such sets is bounded by a quantity depending only on the degree of the original set (and in particular, uniform in the choice of field); again, see the previous blog post for details. Such quantitative bounds can be obtained by other means (for instance, by using the machinery of schemes), but the ultrafilter approach, being so general in nature, requires almost no actual algebraic geometry to set up and use.

— 5. Metric completion and Gromov-Hausdorff limits —

We previously gave an analogy between ultraproducts and metric completion. It turns out that this analogy can be made substantially more precise. We first need some basic nonstandard analysis notation. Call a nonstandard real number ${x \in {}^* {\bf R}}$ bounded if one has ${|x| \leq C}$ for some standard ${C>0}$, and infinitesimal if one has ${|x| \leq c}$ for all standard ${c>0}$. For instance, if ${N}$ is an unbounded natural number, then ${1+\frac{1}{N}}$ is bounded (but not standard), and ${\frac{1}{N}}$ is infinitesimal. We write ${x=O(1)}$ and ${x=o(1)}$ to denote the assertions that ${x}$ is bounded and infinitesimal respectively. We then have the following basic fact (Proposition 13, existence of standard part): every bounded nonstandard real ${x}$ can be written uniquely as ${x = \hbox{st}(x) + o(1)}$ for some standard real ${\hbox{st}(x)}$, known as the standard part of ${x}$.

Proof: Uniqueness is clear (as no non-zero standard real can be infinitesimal). For existence, we consider the “Dedekind cut” ${\{ y \in {\bf R}: y < x \}}$, which is a non-empty half-line in ${{\bf R}}$ that is bounded above. Taking ${\hbox{st}(x)}$ to be the supremum of this cut, we easily verify that ${\hbox{st}(x)-\epsilon \leq x \leq \hbox{st}(x)+\epsilon}$ for any standard ${\epsilon > 0}$, and the claim follows. $\Box$

One can also prove this theorem by the usual Bolzano-Weierstrass argument, trapping ${x}$ in a nested sequence of dyadic standard intervals; we leave this argument to the interested reader.

One can view the standard part operation ${\hbox{st}: O({\bf R}) \rightarrow {\bf R}}$ algebraically, as a homomorphism from the bounded nonstandard reals ${O({\bf R}) := \{ x \in {}^* {\bf R}: x = O(1) \}}$ to the standard reals with kernel ${o({\bf R}) := \{ x \in {}^* {\bf R}: x = o(1)\}}$; note that the sets ${O({\bf R}), o({\bf R})}$ are perfectly well-defined as (external) sets in the nonstandard analysis formalism, even though such notation does not quite make rigorous set-theoretic sense in finitary mathematics (one cannot use finitary asymptotic notation such as ${O()}$ inside set-builder notation). Thus for instance we can say that ${o({\bf R})}$ is an ideal in the ring ${O({\bf R})}$, an assertion which is intuitively obvious in the finitary setting, but a bit hard to formalise properly. Note that ${{\bf R}}$ can then be interpreted as the image under the standard part map of the bounded nonstandard rationals ${O({\bf Q})}$. This can in fact be used to give an alternative definition of the real numbers, which is the nonstandard version of the Cauchy completion construction; we leave this construction to the reader.

We can perform similar constructions with other metric spaces. Actually, in order to have a reasonable notion of boundedness, it is convenient to work in the category of pointed metric spaces ${(X,x_0) = (X,d,x_0)}$, that is to say a metric space ${(X,d)}$ with a preferred “origin” ${x_0 \in X}$. If one has a sequence ${(X_{\bf n}, d_{\bf n}, x_{0,{\bf n}})}$ of standard pointed metric spaces, one can take ultralimits and create a nonstandard pointed metric space ${(X_\alpha, d_\alpha, x_{0,\alpha})}$. Here, ${X_\alpha := \prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}$ is the ultraproduct of the ${X_{\bf n}}$, ${x_{0,\alpha} := \lim_{{\bf n} \rightarrow \alpha} x_{0,{\bf n}}}$ is a point in ${X_\alpha}$, and ${d_\alpha: X_\alpha \times X_\alpha \rightarrow {}^* [0,+\infty)}$ is the ultralimit of the ${d_{\bf n}}$.

At present, ${(X_\alpha,d_\alpha)}$ is a nonstandard metric space, but not an external metric space, because the nonstandard distance ${d_\alpha(x,y)}$ between two points in ${X_\alpha}$ is a nonstandard real, and not necessarily a bounded one. To create an external metric space, we cut down the size of ${X_\alpha}$ in two different ways. Firstly, we restrict attention to the bounded portion ${O(X_\alpha) := \{ x \in X_\alpha: d_\alpha(x,x_{0,\alpha}) = O(1) \}}$ of ${X_\alpha}$. Next, we place an equivalence relation ${\sim}$ on ${O(X_\alpha)}$ by declaring ${x \sim y}$ if ${d_\alpha(x,y) = o(1)}$. The quotient space ${O(X_\alpha)/\sim}$, which we will denote suggestively as ${O(X_\alpha)/o(X_\alpha)}$ can then be checked to be a metric space with metric given by the formula

$\displaystyle d( [x], [y] ) := \hbox{st}( d_\alpha(x,y) )$

whenever ${x,y \in O(X_\alpha)}$, with corresponding equivalence classes ${[x],[y] \in O(X_\alpha)/o(X_\alpha)}$. The quotient map ${\hbox{st}: O(X_\alpha) \rightarrow O(X_\alpha)/o(X_\alpha)}$ will be called the standard part map, as it generalises the map in Proposition 13.

We refer to the pointed external metric spaces ${(O(X_\alpha)/o(X_\alpha), x_{0,\alpha})}$ as the nonstandard hull of the standard pointed metric spaces ${(X_{\bf n}, x_{0,{\bf n}})}$; it is also known as the ultralimit of these spaces, although we avoid this terminology here as we are using the term “ultralimit” for a slightly different (although certainly related) operation. These spaces are automatically complete, as can easily be verified using countable saturation. If the spaces ${X_{\bf n}}$ are asymptotically uniformly locally totally bounded in the sense that for every ${R,r>0}$ there exists ${M>0}$ such that for ${{\bf n}}$ sufficiently close to ${\alpha}$, one can cover the balls ${B(x_{0,{\bf n}}, R)}$ by at most ${M}$ balls of radius ${r}$, then the nonstandard hull ${O(X_\alpha)/o(X_\alpha)}$ is also locally totally bounded by Łos’s theorem, and thus locally compact by the Heine-Borel theorem. Furthermore, it is not difficult to show in this case that the ${(X_{\bf n}, x_{0,{\bf n}})}$ converge along ${\alpha}$ to ${(X_\alpha, x_{0,\alpha})}$ in the pointed Gromov-Hausdorff sense, thus for every ${R, \epsilon > 0}$, the Gromov-Hausdorff distance between ${B(x_{0,{\bf n}}, R)}$ and ${B(x_{0,{\alpha}}, R)}$ is less than ${\epsilon}$ for ${{\bf n}}$ sufficiently close to ${\alpha}$. In particular, this demonstrates the basic fact that any sequence of asymptotically uniformly locally totally bounded pointed metric spaces has a convergent subsequence in the pointed Gromov-Hausdorff sense.

There are several interesting special cases of the nonstandard hull construction. For instance, if the spaces ${X_{\bf n} = X}$ are equal to a single locally totally bounded space, then the nonstandard hull becomes a metric completion ${\overline{X}}$ of ${X}$, with every bounded sequence ${x_{\bf n}}$ in ${X}$ giving rise to a limit point ${\hbox{st}( \lim_{{\bf n} \rightarrow \alpha} x_{\bf n})}$ in ${\overline{X}}$. When the metric spaces are normed vector spaces, nonstandard analysis can be used as a framework to describe the theory of concentration compactness; see this previous blog post for further discussion. Finally, if one starts with a finitely generated group ${G}$ with a word metric ${d}$, then by taking the nonstandard hull of ${G}$ with a rescaled word metric ${\frac{1}{R_{\bf n}} d}$ for a well-chosen set of radii ${R_{\bf n}}$, one can obtain a useful metric space known as the asymptotic cone of ${G}$, which for instance can be used to establish Gromov’s theorem on groups of polynomial growth; see this paper of van den Dries and Wilkie for details. Related to this, one can take the non-standard hull of a sequence of approximate groups (and the groups generated by such approximate groups) to obtain open neighbourhoods of locally compact groups, allowing one to use the theory of Hilbert’s fifth problem (which classifies the latter) to study approximate groups; see this text of mine for more discussion.

— 6. Loeb measure and regularity —

Measure theory, with its emphasis on countable additivity, is not obviously a first-order theory, and so it is not a priori obvious that it interacts well with ultraproducts. Nevertheless, thanks primarily to the countable saturation property of ultraproducts, one can often obtain a good notion of measure on nonstandard spaces, by carefully taking the ultralimit of measures on standard spaces. We give a fundamental such construction here, namely the Loeb measure construction, although for simplicity we restrict attention to the setting of nonstandard finite sets (also known as hyperfinite sets) to avoid some technicalities. Loeb measures are a special case of the more general concept of a Keisler measure in model theory, but we will not discuss this more general concept here.

Let ${X_{\bf n}}$ be a sequence of standard non-empty finite sets, so that the ultraproduct ${X_\alpha = \prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}$ is a nonstandard non-empty finite set. On each of the ${X_{\bf n}}$, we can define the uniform probability measure ${\mu_{\bf n}}$, which assigns the measure ${\mu_{\bf n}(E_{\bf n}) := \frac{|E_{\bf n}|}{|X_{\bf n}|}}$ to each subset ${E_{\bf n}}$ of ${X_{\bf n}}$. Of course, ${\mu_{\bf n}}$ is a finitely additive probability measure. Taking ultraproducts and then taking standard parts, we can then define a finitely additive probability measure ${\mu_\alpha^0}$ on the boolean algebra ${{\cal B}_{X_\alpha}}$ of internal subsets ${E_\alpha = \prod_{{\bf n} \rightarrow \alpha} E_{\bf n}}$ of ${X_\alpha}$, by the formula

$\displaystyle \mu_\alpha^0( \prod_{{\bf n} \rightarrow \alpha} E_{\bf n} ) := \hbox{st}( \lim_{{\bf n} \rightarrow \alpha} \mu_{\bf n}( E_{\bf n} ) ).$

One easily verifies that this is a finitely additive probability measure. Furthermore, thanks to the countable saturation property, we see that ${\mu_\alpha^0}$ is a premeasure, that is to say that one has the countable additivity property

$\displaystyle \mu_\alpha^0( \bigcup_{m=1}^\infty E_m ) = \sum_{m=1}^\infty \mu_\alpha^0( E_m )$

whenever ${E_1,E_2,\ldots}$ are a family of disjoint internal subsets of ${X_\alpha}$ whose union is also an internal subset. Applying the Caratheodory extension theorem (or more precisely, the Hahn-Kolmogorov theorem), we can then extend ${\mu_\alpha^0}$ to a countably additive probability measure ${\mu_\alpha}$ on the ${\sigma}$-algebra ${{\cal L}_{X_\alpha}}$ generated by ${{\cal B}_{X_\alpha}}$; we refer to ${(X_\alpha, {\cal L}_{X_\alpha}, \mu_\alpha)}$ as a Loeb space, with ${\mu_\alpha}$ being the Loeb probability measure and ${{\cal L}_{X_\alpha}}$ being the Loeb ${\sigma}$-algebra.

The relationship between ${\mu_\alpha^0}$ and ${\mu_\alpha}$ is closely analogous to that between the elementary measure (or Jordan measure) of elementary subsets of (say) the unit cube, and that of Lebesgue measure of Borel subsets of that cube; see e.g. this text of mine for the basic theory here. For instance, it is not difficult to show that every Loeb measurable set ${F \subset X_\alpha}$ is “almost internal” in the sense that for any standard ${\epsilon > 0}$, one can find an internal set ${E \subset X_\alpha}$ which is ${\epsilon}$-close to ${F}$ in the sense that the outer measure

$\displaystyle \mu_\alpha^*(E \Delta F) := \inf \{ \sum_m \mu_\alpha^0(E_m): E_m \hbox{ internal and cover } E \Delta F \}$

of the symmetric difference ${E \Delta F}$ is at most ${\epsilon}$. Related to this, the Loeb measure and the outer measure agree on any Loeb measurable set, and the simple internal functions on ${X_\alpha}$ (i.e. finite linear combinations of indicator functions of internal sets) will be dense in ${L^p( X_\alpha, {\cal L}_{X_\alpha}, \mu_{X_\alpha} )}$ for any (standard) ${1 \leq p < \infty}$.

For instance, if ${N}$ is an unbounded natural number, then the set ${[o(N)] := \{ n \in [N]: n = o(N) \}}$ is not an internal subset of ${[N]}$, but it is the intersection of the countable family ${[N/k] = \{ n \in [N]: n \leq N/k\}}$ of internal sets for ${k \in {\bf N}}$. Each of the internal sets ${[N/k]}$ has measure ${\mu_{[N]}^0([N/k]) = \frac{1}{k}}$, and so ${[o(N)]}$ is a Loeb measurable set with Loeb measure zero. As another example, for any standard ${q \geq 1}$ and ${a \in {\bf Z}/q{\bf Z}}$, the residue class ${\{ n \in [N]: n = a\hbox{ mod } q \}}$ is an internal subset of ${[N]}$ of Loeb measure exactly ${1/q}$ (regardless of what the remainder of ${N}$ is modulo ${q}$).

As a quick application of Loeb measure, we give an instance of the Furstenberg correspondence principle, closely analogous to the topological dynamics example from a few sections earlier. Specifically, we connect the following two theorems: Szemerédi’s theorem (Theorem 14), which asserts that for every ${k \geq 1}$ and ${\delta > 0}$, and for every sufficiently large ${N}$, every subset of ${[N]}$ of density at least ${\delta}$ contains a ${k}$-term arithmetic progression; and the Furstenberg multiple recurrence theorem (Theorem 15), which asserts that if ${(X, {\cal X}, \mu, T)}$ is a measure-preserving system, ${E \in {\cal X}}$ has positive measure, and ${k \geq 1}$, then there exists a positive integer ${n}$ such that ${\mu( E \cap T^{-n} E \cap \ldots \cap T^{-(k-1)n} E ) > 0}$.

Again, it is not difficult to show that Theorem 14 implies Theorem 15. To show the converse, we again use the compactness and contradiction method. If Theorem 14 failed, then there exist ${k \geq 1}$ and ${\delta>0}$, and a sequence ${N_{\bf n}}$ going to infinity, together with subsets ${E_{\bf n} \subset [N_{\bf n}]}$ of density at least ${\delta}$ which contain no ${k}$-term arithmetic progressions. Taking ultralimits, we obtain an unbounded natural number ${N}$ and an internal subset ${E \subset [N]}$ of Loeb measure at least ${\delta}$ which contains no non-standard ${k}$-term arithmetic progressions. The shift map ${T: x \mapsto x+1}$ is defined and bijective almost everywhere on ${[N]}$, and is measure-preserving, and Theorem 15 then provides the required contradiction.

Remark 2 The ultralimit construction actually gives a lot more relevant structure on ${[N]}$ than just that of a measure-preserving system. For instance, one can view the space ${\{ (x,x+h,x+k,x+h+k): x,x+h,x+k,x+h+k \in [N] \}}$ of parallelograms as a special subset of ${[N]^4}$, which one can also endow with a Loeb measure, and these spaces (as well as higher-dimensional analogues) can be used to analyse the further additive structure of ${[N]}$ (in a manner that closely parallels the structural theory of Host and Kra on measure-preserving systems); see this paper of Szegedy for some details of this approach.

The above example already shows the power of basic measure-theoretic tools (such as the Hahn-Kolmogorov theorem) in connecting discrete and continuous results together. Now we turn to some applications of other basic measure theory tools, such as product measure and conditional expectation. Specifically, we will use ultraproducts and measure theory to prove the following “strong” form of the Szemerédi regularity lemma (Lemma 16): for every ${\epsilon > 0}$ and ${F: {\bf N} \rightarrow {\bf N}}$, and for every finite non-empty set ${V}$ and function ${f: V \times V \rightarrow [0,1]}$, there exist a natural number ${M}$ bounded in terms of ${\epsilon}$ and ${F}$ only, and a decomposition

$\displaystyle f = f_{str} + f_{err} + f_{psd}$

where

• (Structure) ${f_{str}}$ takes values in ${[0,1]}$, and there exists a partition ${V = V_1 \cup \ldots \cup V_M}$ such that ${f_{str}}$ is constant on ${V_i \times V_j}$ for all ${1 \leq i,j \leq M}$.
• (Smallness) ${f_{err}}$ takes values in ${[-1,1]}$, and ${{\bf E}_{v,w \in V} |f_{err}(v,w)|^2 \leq \epsilon}$.
• (Pseudorandomness) For any ${A,B \subset V}$, one has ${|{\bf E}_{v,w \in V} 1_A(v) 1_B(w) f_{psd}(v,w)| \leq \frac{1}{F(M)}}$.

The connection between the regularity lemma and nonstandard measure theory was first observed by Elek and Szegedy, whose work also extends the arguments here to hypergraphs.

We will deduce Lemma 16 from the following infinitary version (Lemma 17): if ${V}$ is a nonstandard finite non-empty set, ${\epsilon > 0}$ is standard, and ${f: V \times V \rightarrow [0,1]}$ is measurable with respect to the Loeb ${\sigma}$-algebra ${{\cal L}_{V \times V}}$, then there is a decomposition

$\displaystyle f = f_{str} + f_{err} + f_{psd}$

where

• (Structure) ${f_{str}}$ takes values in ${[0,1]}$, and there exists a partition ${V = V_1 \cup \ldots \cup V_M}$ into finitely many internal sets such that ${f_{str}}$ is constant on ${V_i \times V_j}$ for all ${1 \leq i,j \leq M}$.
• (Smallness) ${f_{err}}$ takes values in ${[-1,1]}$, and ${\int_{V \times V} |f_{err}(v,w)|^2\ d\mu_{V \times V}(v,w) < \epsilon}$.
• (Pseudorandomness) For any measurable ${A,B \subset V}$, one has ${\int_{V \times V} 1_A(v) 1_B(w) f_{psd}(v,w)\ d\mu_{V \times V}(v,w) = 0}$.

We will prove Lemma 17 shortly, but let us first see how it implies Lemma 16. We once again use the compactness and contradiction method. If Lemma 16 fails, then there exists ${\epsilon>0}$ and ${F: {\bf N} \rightarrow {\bf N}}$, and for each ${{\bf n}}$ there exists a finite non-empty set ${V_{\bf n}}$ and a function ${f_{\bf n}: V_{\bf n} \times V_{\bf n} \rightarrow [0,1]}$, such that for no ${1 \leq M \leq {\bf n}}$ does there exist a decomposition

$\displaystyle f_{\bf n} = f_{str,{\bf n}} + f_{err,{\bf n}} + f_{psd,{\bf n}}$

obeying the conclusions of Lemma 16. We then take ultralimits to obtain a nonstandard finite non-empty set ${V_\alpha}$ (which we will abbreviate as ${V}$ in what follows) and an internal function ${f_\alpha: V_\alpha \times V_\alpha \rightarrow {}^* [0,1]}$. The standard part ${f := \hbox{st} f_\alpha}$ is then a Loeb measurable function on ${V \times V}$ taking values in ${[0,1]}$. We apply Lemma 17 to obtain a decomposition

$\displaystyle f = f_{str} + f_{err} + f_{psd}$

with the stated properties for some finite ${M}$. Note that ${f_{str}}$ is already a simple internal function. The function ${f_{psd}}$ need not be simple internal, but may be approximated to arbitrary accuracy in ${L^2(V \times V, {\cal L}_{V \times V}, \mu_{V \times V})}$ by a simple internal function. Thus we may find a decomposition

$\displaystyle f_\alpha = f_{str} + f'_{err} + f'_{psd}$

where ${f'_{psd}}$ is simple internal and is such that

$\displaystyle |\int_{V \times V} 1_A(v) 1_B(w) f'_{psd}(v,w)\ d\mu_{V \times V}(v,w)| < \frac{1}{F(M)}$

for all measurable ${A,B \subset V}$, and ${f'_{err}}$ is simple internal, takes values in ${[-1,1]}$, and is such that

$\displaystyle \int_{V \times V} |f'_{err}(v,w)|^2\ d\mu_{V \times V}(v,w) < \epsilon.$

By Łos’s theorem, an analogous decomposition then holds for ${f_{\bf n}}$ for ${{\bf n}}$ sufficiently close to ${\alpha}$, but this contradicts the construction of the ${f_{\bf n}}$.

Now we prove Lemma 17. The proof hinges on a comparison between two spaces, the Loeb space

$\displaystyle (V \times V, {\cal L}_{V \times V}, \mu_{V \times V})$

on the product space ${V \times V}$, and the subtly different probability space

$\displaystyle (V \times V, {\cal L}_V \times {\cal L}_V, \mu_V \times \mu_V)$

that is the product of the Loeb space ${(V, {\cal L}_V, \mu_V)}$ with itself, thus ${{\cal L}_V \times {\cal L}_V}$ is the ${\sigma}$-algebra on ${V \times V}$ generated by rectangles ${A \times B}$ with ${A,B}$ Loeb measurable in ${V}$, and ${\mu_V \times \mu_V}$ is the product measure, which is the unique probability measure on ${{\cal L}_V \times {\cal L}_V}$ such that

$\displaystyle \mu_V \times \mu_V(A \times B) = \mu_V(A) \mu_V(B)$

for all Loeb measurable ${A,B \subset V}$.

It is not difficult to show that the former probability space is a refinement of the latter, thus ${{\cal L}_{V \times V}}$ contains ${{\cal L}_V \times {\cal L}_V}$, and ${\mu_{V \times V}}$ and ${\mu_V \times \mu_V}$ agree on their common domain ${{\cal L}_V \times {\cal L}_V}$ of definition. However, the former space is larger in general; the basic reason for this is that for large finite spaces ${V_{\bf n}}$, it is possible to find subsets of ${V_{\bf n} \times V_{\bf n}}$ that are not well approximable by bounded boolean combinations of rectangles ${A_{\bf n} \times B_{\bf n}}$ (e.g. random subsets of ${V_{\bf n} \times V_{\bf n}}$ will have this property with high probability in the asymptotic limit when ${|V_{\bf n}|}$ goes to infinity). Despite this disparity, we may still form the conditional expectation ${\mathop{\bf E}( f | {\cal L}_V \times {\cal L}_V )}$ of ${f}$, which one can define as the orthogonal projection of ${f}$ in the Hilbert space ${L^2( V \times V, {\cal L}_{V \times V}, \mu_{V \times V})}$ to the closed subspace ${L^2( V \times V, {\cal L}_V \times {\cal L}_V, \mu_V \times \mu_V )}$. We then have a decomposition

$\displaystyle f = \mathop{\bf E}( f | {\cal L}_V \times {\cal L}_V ) + f_{psd}$

where ${f_{psd}}$ is orthogonal to all ${{\cal L}_V \times {\cal L}_V}$-measurable functions; in particular, it is orthogonal to ${1_{A \times B}}$ for any Loeb measurable ${A,B \subset V}$, so that

$\displaystyle \int_{V \times V} 1_A(v) 1_B(w) f_{psd}(v,w)\ d\mu_{V \times V}(v,w) = 0$

for all such ${A,B}$.

Next, we know that ${{\cal L}_V \times {\cal L}_V}$ is generated (up to null sets) by products ${A \times B}$ of internal sets, so we may approximate ${\mathop{\bf E}( f | {\cal L}_V \times {\cal L}_V )}$ to arbitrary accuracy in ${L^2}$ by a simple function ${f_{str}}$ that is a finite linear combination of indicators ${1_{A \times B}}$ of such rectangles; as ${f}$ takes values in ${[0,1]}$, we can ensure that ${\mathop{\bf E}( f | {\cal L}_V \times {\cal L}_V )}$ and ${f_{str}}$ take values in ${[0,1]}$ as well. This gives a decomposition

$\displaystyle \mathop{\bf E}( f | {\cal L}_V \times {\cal L}_V ) = f_{str} + f_{err}$

and one easily verifies that all the required properties of Lemma 17 are obeyed.

Remark 3 It is instructive to deconstruct this proof and compare it with the standard “energy increment” proof of the regularity lemma. The energy increment process is now concealed within the standard proof of the existence of orthogonal projections in Hilbert spaces (see e.g. Proposition 1 of this previous blog post), so on some level the two proofs of the regularity lemma are essentially equivalent. However, the ultrafilter proof allows one to hide many of the ingredients of the argument into existing Hilbert space theory.

Remark 4 Although it is not needed here, one convenient fact about Loeb spaces on products is that they obey the following Fubini-Tonelli type theorem: if ${V, W}$ are nonstandard finite spaces and ${f: V \times W \rightarrow [0,+\infty]}$ is Loeb measurable on ${V \times W}$, then for all ${v \in V}$, the function ${w \mapsto f(v,w)}$ is Loeb measurable on ${W}$, the integral ${v \mapsto \int_W f(v,w)\ d\mu_W(w)}$ is Loeb measurable on ${V}$, and we have the identity

$\displaystyle \int_{V \times W} f(v,w)\ d\mu_{V \times W}(v,w) = \int_V ( \int_W f(v,w)\ d\mu_W(w) )\ d\mu_V(v).$

Similarly if the roles of ${V}$ and ${W}$ are swapped; in particular we have the familiar identity

$\displaystyle \int_V ( \int_W f(v,w)\ d\mu_W(w) )\ d\mu_V(v) = \int_W ( \int_V f(v,w)\ d\mu_V(v) )\ d\mu_W(w).$

These identities are obvious when ${f}$ is an internal simple function (as it follows from the corresponding claim in the finite case), and the general case follows from a standard approximation argument (using the monotone convergence theorem repeatedly). One can also obtain an analogous claim in the absolutely integrable category, in the spirit of Fubini’s theorem; we leave this to the interested reader.

### Steinn Sigurðsson — Hel

This is awesome!

I am reliably informed that this was one of the all time great concerts.

Skálmöld and the Icelandic Symphony in concert at Harpa.

### Terence Tao — Mertens’ theorems

Mertens’ theorems are a set of classical estimates concerning the asymptotic distribution of the prime numbers:

Theorem 1 (Mertens’ theorems) In the asymptotic limit ${x \rightarrow \infty}$, we have

$\displaystyle \sum_{p\leq x} \frac{\log p}{p} = \log x + O(1), \ \ \ \ \ (1)$

$\displaystyle \sum_{p\leq x} \frac{1}{p} = \log \log x + O(1), \ \ \ \ \ (2)$

$\displaystyle \sum_{p\leq x} \log(1-\frac{1}{p}) = -\log \log x - \gamma + o(1) \ \ \ \ \ (3)$

where ${\gamma}$ is the Euler-Mascheroni constant, defined by requiring that

$\displaystyle 1 + \frac{1}{2} + \ldots + \frac{1}{n} = \log n + \gamma + o(1) \ \ \ \ \ (4)$

in the limit ${n \rightarrow \infty}$.

The third theorem (3) is usually stated in exponentiated form

$\displaystyle \prod_{p \leq x} (1-\frac{1}{p}) = \frac{e^{-\gamma}+o(1)}{\log x},$

but in the logarithmic form (3) we see that it is strictly stronger than (2), in view of the asymptotic ${\log(1-\frac{1}{p}) = -\frac{1}{p} + O(\frac{1}{p^2})}$.
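
As a quick numerical sanity check of these statements (not part of Mertens' arguments; the sieve helper and the decimal value of ${\gamma}$ below are mine), one can sieve the primes up to a moderate ${x}$ and compare both sides:

```python
import math

def primes_up_to(x):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, x + 1, p)))
    return [p for p in range(2, x + 1) if sieve[p]]

gamma = 0.5772156649015329
x = 10 ** 6
ps = primes_up_to(x)

# (1): sum_{p<=x} log p / p - log x stays bounded
print(sum(math.log(p) / p for p in ps) - math.log(x))

# (2): sum_{p<=x} 1/p - log log x stays bounded (it tends to the Mertens constant, ~0.2615)
print(sum(1.0 / p for p in ps) - math.log(math.log(x)))

# (3), exponentiated form: log x * prod_{p<=x} (1 - 1/p) should be close to exp(-gamma)
log_prod = sum(math.log1p(-1.0 / p) for p in ps)
print(math.log(x) * math.exp(log_prod), math.exp(-gamma))
```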

Remarkably, these theorems can be proven without the assistance of the prime number theorem

$\displaystyle \sum_{p \leq x} 1 = \frac{x}{\log x} + o( \frac{x}{\log x} ),$

which was proven about two decades after Mertens’ work. (But one can certainly use versions of the prime number theorem with good error term, together with summation by parts, to obtain good estimates on the various errors in Mertens’ theorems.) Roughly speaking, the reason for this is that Mertens’ theorems only require control on the Riemann zeta function ${\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}}$ in the neighbourhood of the pole at ${s=1}$, whereas (as discussed in this previous post) the prime number theorem requires control on the zeta function on (a neighbourhood of) the line ${\{ 1+it: t \in {\bf R} \}}$. Specifically, Mertens’ theorem is ultimately deduced from the Euler product formula

$\displaystyle \zeta(s) = \prod_p (1-\frac{1}{p^s})^{-1}, \ \ \ \ \ (5)$

valid in the region ${\hbox{Re}(s) > 1}$ (which is ultimately a Fourier-Dirichlet transform of the fundamental theorem of arithmetic), and the following crude asymptotics, valid (say) in the region ${\{ s: \hbox{Re}(s) > 1, |s-1| \leq 1 \}}$ (the two estimates below together constitute Proposition 2):

$\displaystyle \zeta(s) = \frac{1}{s-1} + O(1) \ \ \ \ \ (6)$

and

$\displaystyle \zeta'(s) = \frac{-1}{(s-1)^2} + O(1).$

Proof: For ${s}$ as in the proposition, we have ${\frac{1}{n^s} = \frac{1}{t^s} + O(\frac{1}{n^2})}$ for any natural number ${n}$ and ${n \leq t \leq n+1}$, and hence

$\displaystyle \frac{1}{n^s} = \int_n^{n+1} \frac{1}{t^s}\ dt + O( \frac{1}{n^2} ).$

Summing in ${n}$ and using the identity ${\int_1^\infty \frac{1}{t^s}\ dt = \frac{1}{s-1}}$, we obtain the first claim. Similarly, we have

$\displaystyle \frac{-\log n}{n^s} = \int_n^{n+1} \frac{-\log t}{t^s}\ dt + O( \frac{\log n}{n^2} ),$

and by summing in ${n}$ and using the identity ${\int_1^\infty \frac{-\log t}{t^s}\ dt = \frac{-1}{(s-1)^2}}$ (the derivative of the previous identity) we obtain the claim. $\Box$
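
As a quick numerical illustration of the first of these asymptotics (my own addition; the helper name is ad hoc), one can approximate ${\zeta(s)}$ for real ${s}$ slightly larger than ${1}$ by a partial sum plus the integral tail used in the comparison argument above, and observe that ${\zeta(s) - \frac{1}{s-1}}$ indeed stays bounded (in fact it approaches ${\gamma}$, in accordance with (7) below):

```python
def zeta_minus_pole(s, N=10**6):
    """Approximate zeta(s) - 1/(s-1) for real s > 1 via a partial sum plus the
    integral tail int_N^infty t^{-s} dt = N^{1-s}/(s-1), as in the proof above."""
    partial = sum(n ** (-s) for n in range(1, N + 1))
    return partial + N ** (1 - s) / (s - 1) - 1.0 / (s - 1)

for s in (1.5, 1.1, 1.01, 1.001):
    print(s, zeta_minus_pole(s))   # stays bounded, approaching 0.5772... as s -> 1+
```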

The first two of Mertens’ theorems (1), (2) are relatively easy to prove, and imply the third theorem (3) except with ${\gamma}$ replaced by an unspecified absolute constant. To get the specific constant ${\gamma}$ requires a little bit of additional effort. From (4), one might expect that the appearance of ${\gamma}$ arises from the refinement

$\displaystyle \zeta(s) = \frac{1}{s-1} + \gamma + O(|s-1|) \ \ \ \ \ (7)$

that one can obtain to (6). However, it turns out that the connection is not so much with the zeta function, but with the Gamma function, and specifically with the identity ${\Gamma'(1) = - \gamma}$ (which is of course related to (7) through the functional equation for zeta, but can be proven without any reference to zeta functions). More specifically, we have the following asymptotic for the exponential integral:

Proposition 3 (Exponential integral asymptotics) In the limit ${\epsilon \rightarrow 0^+}$, we have

$\displaystyle \int_\epsilon^\infty \frac{e^{-t}}{t}\ dt = \log \frac{1}{\epsilon} - \gamma + o(1).$

A routine integration by parts shows that this asymptotic is equivalent to the identity

$\displaystyle \int_0^\infty e^{-t} \log t\ dt = -\gamma$

which is the identity ${\Gamma'(1)=-\gamma}$ mentioned previously.

Proof: We start by using the identity ${\frac{1}{i} = \int_0^1 x^{i-1}\ dx}$ to express the harmonic series ${H_n := 1+\frac{1}{2}+\ldots+\frac{1}{n}}$ as

$\displaystyle H_n = \int_0^1 1 + x + \ldots + x^{n-1}\ dx$

or on summing the geometric series

$\displaystyle H_n = \int_0^1 \frac{1-x^n}{1-x}\ dx.$

Since ${\int_0^{1-1/n} \frac{dx}{1-x} = \log n}$, we thus have

$\displaystyle H_n - \log n = \int_0^1 \frac{1_{[1-1/n,1]}(x) - x^n}{1-x}\ dx;$

making the change of variables ${x = 1-\frac{t}{n}}$, this becomes

$\displaystyle H_n - \log n = \int_0^n \frac{1_{[0,1]}(t) - (1-\frac{t}{n})^n}{t}\ dt.$

As ${n \rightarrow \infty}$, ${\frac{1_{[0,1]}(t) - (1-\frac{t}{n})^n}{t}}$ converges pointwise to ${\frac{1_{[0,1]}(t) - e^{-t}}{t}}$ and is pointwise dominated by ${O( e^{-t} )}$. Taking limits as ${n \rightarrow \infty}$ using dominated convergence, we conclude that

$\displaystyle \gamma = \int_0^\infty \frac{1_{[0,1]}(t) - e^{-t}}{t}\ dt.$

or equivalently

$\displaystyle \int_0^\infty \frac{e^{-t} - 1_{[0,\epsilon]}(t)}{t}\ dt = \log \frac{1}{\epsilon} - \gamma.$

The claim then follows by bounding the ${\int_0^\epsilon}$ portion of the integral on the left-hand side. $\Box$
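
As a crude numerical check of Proposition 3 (again my own aside, with a hand-rolled Riemann sum rather than any special-function library), one can compare the integral with ${\log \frac{1}{\epsilon} - \gamma}$ for small ${\epsilon}$; the two agree up to an error of size ${O(\epsilon)}$, consistent with the ${o(1)}$ in the proposition.

```python
import math

gamma = 0.5772156649015329

def exp_integral(eps, upper=50.0, steps=2_000_000):
    """Midpoint-rule approximation of int_eps^upper e^{-t}/t dt (the tail beyond
    t = 50 is negligible for this illustration)."""
    h = (upper - eps) / steps
    return h * sum(math.exp(-(eps + (k + 0.5) * h)) / (eps + (k + 0.5) * h)
                   for k in range(steps))

for eps in (0.1, 0.01, 0.001):
    print(eps, exp_integral(eps), math.log(1 / eps) - gamma)
```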

Below the fold I would like to record how Proposition 2 and Proposition 3 imply Theorem 1; the computations are utterly standard, and can be found in most analytic number theory texts, but I wanted to write them down for my own benefit (I always keep forgetting, in particular, how the third of Mertens’ theorems is proven).

— 1. Proof of Mertens’ theorems —

Let ${s}$ be as in Proposition 2. Taking logarithms of ${\zeta(s)}$ using (5), we obtain the asymptotic

$\displaystyle \sum_p \log( 1 - \frac{1}{p^s} ) = \log(s-1) + O( |s-1| ) \ \ \ \ \ (8)$

using the standard branch of the logarithm; if we instead compute the logarithmic derivative ${\frac{\zeta'(s)}{\zeta(s)}}$, we obtain the closely related asymptotic

$\displaystyle \sum_p \frac{\log p}{p^s - 1} = \frac{1}{s-1} + O( 1 )$

and hence (since ${\frac{\log p}{p^s-1} - \frac{\log p}{p^s} = \frac{\log p}{p^s(p^s-1)} = O( \frac{\log p}{p^2} )}$ is summable in ${p}$)

$\displaystyle \sum_p \frac{\log p}{p^s} = \frac{1}{s-1} + O( 1 ). \ \ \ \ \ (9)$

These are already very close to Mertens’ theorems, except that the sharp cutoff ${1_{[0,x]}(p)}$ has been replaced by a smoother cutoff ${\frac{1}{p^{s-1}}}$. To pass from the smooth cutoff to the rough, we use Fourier analysis. The key estimate (Proposition 4) is that for any compactly supported, Riemann-integrable function ${F: [0,+\infty) \rightarrow {\bf R}}$, one has, in the limit ${x \rightarrow \infty}$,

$\displaystyle \sum_p \frac{\log p}{p} F( \frac{\log p}{\log x} ) = \log x ( \int_0^\infty F(t)\ dt + o(1) ). \ \ \ \ \ (10)$

To prove this, a routine approximation argument allows one to restrict attention to functions ${F}$ of the form

$\displaystyle F(t) = \int_{\bf R} e^{-(1+iu) t} f(u)\ du \ \ \ \ \ (11)$

for some smooth, rapidly decreasing ${f}$. For such ${F}$, the left-hand side of (10) may be rewritten, after interchanging the sum and the integral, as

$\displaystyle \int_{\bf R} f(u) (\sum_p \frac{\log p}{p^{1 + \frac{1+iu}{\log x}}})\ du.$

The summation here may be crudely bounded in magnitude by

$\displaystyle |\sum_p \frac{\log p}{p^{1 + \frac{1+iu}{\log x}}}| \leq \sum_p \frac{\log p}{p^{1+1/\log x}} \ll \log x$

thanks to (9). From this and the rapid decrease of ${f}$, we see that the contribution of the integrand for which ${|u| \geq \sqrt{\log x}}$ (say) is ${o(\log x)}$. In the remaining region, we may apply (9) to estimate the left-hand side of (10) as

$\displaystyle \int_{|u| \leq \sqrt{\log x}} f(u) (\frac{\log x}{1+iu} + O(1) )\ du + o(\log x).$

The contribution of the ${O(1)}$ error is again ${o(\log x)}$, so from the rapid decrease of ${f}$ again we may rewrite this expression as

$\displaystyle \log x (\int_{\bf R} \frac{f(u)}{1+iu}\ du + o(1) )$

and the claim then follows by integrating (11) on ${[0,+\infty)}$. $\Box$

Setting ${F:= 1_{[0,1]}}$ in the above proposition immediately gives the first Mertens’ theorem (1). We now skip to the third Mertens’ theorem (3), since as observed previously the second Mertens’ theorem (2) follows as a corollary. Let ${0 < \epsilon < 1}$ be a fixed quantity (independent of ${x}$). From (8) with ${s := 1+\frac{\epsilon}{\log x}}$ we have

$\displaystyle \sum_p \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = - \log \frac{1}{\epsilon} - \log \log x + o(1)$

where the asymptotic notation refers to the limit ${x \rightarrow \infty}$. Let us first consider the contribution to this sum from those ${p}$ with ${p>x}$:

$\displaystyle \sum_{p>x} \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ).$

By Taylor expansion we have ${\log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = -\frac{1}{p^{1+\epsilon/\log x}} + O( \frac{1}{p^2} )}$, and hence the above sum is equal to

$\displaystyle - \sum_{p>x} \frac{1}{p^{1+\epsilon/\log x}} + o(1)$

or equivalently

$\displaystyle - \frac{1}{\log x} \sum_{p>x} \frac{\log p}{p} F( \frac{\log p}{\log x} ) + o(1)$

where ${F(t) := \frac{e^{-\epsilon t}}{t}}$. From Proposition 4 we have

$\displaystyle \frac{1}{\log x} \sum_{x \leq p \leq x^N} \frac{\log p}{p} F( \frac{\log p}{\log x} ) = \int_1^N \frac{e^{-\epsilon t}}{t}\ dt + o(1)$

for any fixed ${N>1}$. One can bound the tail error from ${p > x^N}$ using the first Mertens’ theorem (1) and dyadic decomposition, and on taking limits as ${N \rightarrow \infty}$ we conclude that

$\displaystyle \sum_{p>x} \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = - \int_1^\infty \frac{e^{-\epsilon t}}{t} \ dt + o(1);$

scaling ${t}$ by ${\epsilon}$, we rewrite this as

$\displaystyle \sum_{p>x} \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = - \int_\epsilon^\infty \frac{e^{-t}}{t} \ dt + o(1).$

Now we consider the contribution from those ${p}$ with ${p \leq x}$:

$\displaystyle \sum_{p \leq x} \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ).$

From the mean value theorem we have

$\displaystyle \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = \log(1-\frac{1}{p}) + O( \frac{\epsilon \log p}{p \log x} )$

so from the first Mertens’ theorem (1), we can write this sum as

$\displaystyle \sum_{p \leq x} \log( 1 - \frac{1}{p} ) + O( \epsilon ).$

Putting all this together, we see that

$\displaystyle \sum_{p \leq x} \log( 1 - \frac{1}{p} ) = -\log\log x + (\int_\epsilon^\infty \frac{e^{-t}}{t} \ dt - \log \frac{1}{\epsilon}) + O(\epsilon) + o(1).$

Sending ${\epsilon}$ slowly to zero using Proposition 3, we obtain the third Mertens’ theorem (3) as required.


### Matt Strassler — What’s the Status of the LHC Search for Supersymmetry?

It’s been quite a while (for good reason, as you’ll see) since I gave you a status update on the search for supersymmetry, one of several speculative ideas for what might lie beyond the known particles and forces.  Specifically, supersymmetry is one option (the most popular and most reviled, perhaps, but hardly the only one) for what might resolve the so-called “naturalness” puzzle, closely related to the “hierarchy problem” — Why is gravity so vastly weaker than the other forces? Why is the Higgs particle’s mass so small compared to the mass of the lightest possible black hole?


### n-Category Café — Severing Ties with the NSA

Updated on 11 Dec 2013: see end of post.

A letter from Chicago mathematician Sasha Beilinson in this month’s Notices of the American Mathematical Society calls for the AMS to sever all ties with the US National Security Agency, citing

the vast secret spying programs of the NSA that wildly exceed anything conspiracy theorists could imagine.

He lists some of the ways in which the AMS and NSA support each other, and issues a call for action:

What should be done is a question not only for US citizens but also for people all over the world: the NSA destroyed the security of the Internet and privacy of communications for the whole planet. But if any healing is possible, it would probably start with making the NSA and its ilk socially unacceptable — just as, in the days of my youth, working for the KGB was socially unacceptable for many in the Soviet Union.

I’m now wondering about the relationship between the LMS (London Mathematical Society, the British counterpart of the AMS) and GCHQ (Government Communications Headquarters, the British counterpart of the NSA). While GCHQ may employ fewer people, it has the inestimable advantage of not being constrained by that bothersome US constitution:

We have a light oversight regime compared with the US,

according to GCHQ lawyers, which is really saying something. Moreover, it is considered by some to be more extreme in its surveillance of the general population than even the NSA.

So, I’ve written to the president and vice-presidents of the LMS asking about its — or rather, “our”, as I’m a member — relationship with GCHQ. I’d like to know the facts. It may be that there’s no significant relationship, and that’s the answer I’d like to hear; but at present I simply don’t know.

What we do as mathematicians seldom has any contact with politics or human affairs. But this is one of those occasions. The NSA and GCHQ must be two of the largest employers of mathematicians in the world. Whatever you think of the ongoing mass surveillance, it can’t be denied that this is an issue that involves, and will continue to involve, our community.

Since we don’t usually have this kind of discussion here, let me make explicit what kind of thing I’m going to allow:

1. Discussion of the NSA and GCHQ is fine. That’s what this is about. Both places employ large numbers of mathematicians, and mathematics is involved in the mass surveillance programs — especially in the breaking and circumvention of online encryption. This is the relevance to the mathematical community.

2. The closer the discussion sticks to those issues that concern mathematicians, the better. If it strays too far away, I may steer it back (possibly using the “delete” button).

3. Please try to keep the temperature down. Good ways of doing this are to provide linked references and not to appeal to emotions.

4. The obvious stuff: no insults etc. (but I hope I don’t need to say that here). I’ll simply delete objectionable comments.

In short, please write thoughtfully, and please focus on the central issue: cooperation between the NSA/GCHQ and mathematicians.

I have now heard back informally from someone who was at the latest LMS Council meeting. (The Council is made up of academics and “is responsible for determining the strategy and policy of the Society”.)

Apparently, there was unanimous agreement that the LMS should at least be transparent about this, and should state publicly what connections there are between the LMS and GCHQ. (For example, GCHQ part-funds LMS instructional courses for graduate students.) I don’t know whether there was agreement on anything else, as I haven’t had an official response yet.

### Sean Carroll — Nobel Day

Today was the Nobel Prize ceremony, including of course the Physics Prize to François Englert and Peter Higgs. Congratulations once again to them!

(Parenthetically, it’s sad that the Nobel is used to puff up national pride. In Belgium, Englert gets into the headline but not Higgs; in the UK, it’s the other way around.)

Meanwhile here at Caltech, we welcomed back favorite son Murray Gell-Mann (who now spends his days at the Santa Fe Institute) for the 50th anniversary of quarks. One of the speakers, Geoffrey West, pointed out that no Nobel was awarded for the idea of quarks. Gell-Mann did of course win the Nobel in 1969, but that was “for his contributions and discoveries concerning the classification of elementary particles and their interactions”. In other words, strangeness, SU(3) flavor symmetry, the Eightfold Way, and the prediction of the Omega-minus particle. (Other things Gell-Mann helped invent: kaon mixing, the renormalization group, the sigma model for pions, color and quantum chromodynamics, the seesaw mechanism for neutrino masses, and the decoherent histories approach to quantum mechanics. He is kind of a big deal.)

But, while we now understand SU(3) flavor symmetry in terms of the quark model (the up/down/strange quarks are all light compared to the QCD scale, giving rise to an approximate symmetry), the idea of quarks itself wasn’t honored by the 1969 prize. If it had been, the prize certainly would have been shared by George Zweig, who proposed the idea independently. So there’s still time to give out the Nobel for the quark model! Perhaps Gell-Mann and Zweig could share it with Harald Fritzsch, who collaborated with Gell-Mann on the invention of color and QCD. (The fact that QCD is asymptotically free won a prize for Gross, Politzer and Wilczek in 2004, but there hasn’t been a prize for the invention of the theory itself.) Modern particle physics has such a rich and fascinating history, we should honor it as accurately as possible.

## December 10, 2013

### Doug Natelson — Another workshop + some links

It's been a very hectic end-of-semester, between classes, research, and writing.  Thanks to all of you for your replies concerning power outages.  Perhaps recurring power problems are actually some secret motivator to get all of us to work on alternative energy research - I'll invent an arc reactor just to guarantee reliable power.

Right now, my colleagues and I here at Rice are hosting another workshop, this one on heavy fermions and quantum criticality.  For those who don't know, "heavy fermions" are found in materials where there are particular magnetic interactions between mobile charge carriers and localized (usually 4f) electrons.  Think of it this way:  the comparatively localized electrons have a very flat, narrow band dispersion (energy as a function of momentum).  (In the limit of completely localized states and isolated atoms, the "band" is completely flat, since the 4f states are identical for all the same atoms.)  If you hybridize those comparatively localized electrons with the more delocalized carriers that live in s and p bands, you end up with one band of electron states that is quite flat.  Since $$E \sim (\hbar^2/2m*)k^2$$, a very flat $$E(k)$$ would be equivalent to a really large effective mass $$m*$$.  Heavy fermion materials can have electron-like (or hole-like) charge carriers with effective masses several hundred times that of the free electron mass.   These systems are interesting for several reasons - they often have very rich phase diagrams (in terms of magnetic ordered states), can exhibit superconductivity (!), and indeed can share a lot of phenomenology with the high temperature superconductors, including having a "bad metal" normal state.  In the heavy fermion superconductors, it sure looks like spin fluctuations (related to the magnetism) are responsible for the pairing in the superconducting state.
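
To make the flat-band/heavy-mass connection concrete, here is a minimal sketch of my own (not from the post): it extracts an effective mass from the curvature of a hypothetical one-dimensional tight-binding band $$E(k) = -2t\cos(ka)$$. The hopping energy `t_hop_eV` and lattice constant `a` are made-up numbers, chosen only to show that a narrower band (smaller hopping) means a larger effective mass.

```python
import numpy as np

hbar = 1.0545718e-34      # J*s
m_e  = 9.1093837e-31      # kg, free electron mass
a    = 4e-10              # m, hypothetical lattice constant

def effective_mass(t_hop_eV):
    """Effective mass at the bottom of the band E(k) = -2 t cos(ka).

    Near k = 0, E ~ const + t a^2 k^2; comparing with E ~ hbar^2 k^2 / (2 m*)
    gives m* = hbar^2 / (2 t a^2)."""
    t_hop = t_hop_eV * 1.602176634e-19   # convert eV -> J
    return hbar**2 / (2.0 * t_hop * a**2)

# A broad s/p-like band (~1 eV hopping) vs a narrow 4f-like band (~1 meV hopping):
for t_hop_eV in (1.0, 0.001):
    m_star = effective_mass(t_hop_eV)
    print(f"t = {t_hop_eV:7.3f} eV  ->  m*/m_e = {m_star / m_e:8.1f}")
```

With these particular (made-up) numbers the meV-scale band comes out roughly two hundred times heavier than the free electron, which is the regime described above.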

Quantum criticality has to do with quantum phase transitions.  A quantum phase transition happens when you take a system and tune some parameter (like pressure, temperature, doping, etc.), and find that there is a sharp change (like onset of magnetic order or superconductivity) in the properties (defined by some order parameter) of the ground state (at zero temperature) at some value of that tuning (the quantum critical point).  Like an ordinary higher temperature phase transition, one class of quantum phase transitions is "second order", meaning (among other things) that fluctuations of the order parameter are important.  In this case, they are quantum fluctuations rather than thermal fluctuations, and there are particular predictions for how these things survive above $$T = 0$$ - for example, plotting properties as a function of $$\omega/T$$, where $$\omega$$ is frequency, should collapse big data sets onto universal curves.  There appears to be an intriguing correlation between quantum critical points, competition between phases, and the onset of superconductivity.  Fun stuff.
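
As a toy illustration of the $$\omega/T$$ scaling collapse mentioned above (my own sketch, not from the post), the snippet below generates a synthetic response $$\chi(\omega, T) = T^{-\alpha} f(\omega/T)$$ at several temperatures and checks numerically that the rescaled curves collapse onto a single function of $$\omega/T$$. The exponent and scaling function are arbitrary choices.

```python
import numpy as np

alpha = 0.75                       # arbitrary scaling exponent
f = lambda x: x / (1.0 + x**2)     # arbitrary universal scaling function

def chi(omega, T):
    """Synthetic response obeying omega/T scaling: chi = T^-alpha f(omega/T)."""
    return T**(-alpha) * f(omega / T)

omegas = np.linspace(0.01, 10.0, 200)
temps = [0.5, 1.0, 2.0, 4.0]

# Rescaled data: if the scaling form holds, these are all the same curve
# when viewed as functions of omega/T.
rescaled = {T: (omegas / T, T**alpha * chi(omegas, T)) for T in temps}

# Evaluate each rescaled curve at common omega/T points and measure the spread.
x_common = np.linspace(0.1, 2.0, 50)
curves = [np.interp(x_common, x, y) for (x, y) in rescaled.values()]
spread = np.max(np.ptp(np.array(curves), axis=0))
print(f"maximum spread between rescaled curves: {spread:.2e}")  # ~0 => collapse
```

In real experiments it is this kind of collapse, applied to measured data rather than a synthetic scaling form, that is taken as evidence for quantum critical behavior.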

• Higgs doesn't think he would have gotten tenure in today's climate.
• Elsevier is still evil, or at least, not nice.
• A higher capacity cathode for Li-ion batteries looks interesting.
• NIST has decided to award their big materials center to a consortium led by Northwestern.

### John Baez — Global Climate Change Negotiations

There were many interesting talks at the Interdisciplinary Climate Change Workshop last week—too many for me to describe them all in detail. But I really must describe the talks by Radoslav Dimitrov. They were full of important things I didn’t know. Some are quite promising.

Radoslav S. Dimitrov is a professor at the Department of Political Science at Western University. What’s interesting is that he’s also been a delegate for the European Union at the UN climate change negotiations since 1990! His work documents the history of climate negotiations from behind closed doors.

Here are some things he said:

• In international diplomacy, there is no questioning the reality and importance of human-caused climate change. The question is just what to do about it.

• Governments go through every line of the IPCC reports twice. They cannot add anything to what the scientists have written, but they can delete things. All governments have veto power. This makes the IPCC reports more conservative than they otherwise would be: “considerably diluted”.

• The climate change negotiations have surprised political scientists in many ways:

1) There is substantial cooperation even without the USA taking the lead.

2) Developing countries are accepting obligations, with many overcomplying.

3) There has been action by many countries and subnational entities without any treaty obligations.

4) There have been repeated failures of negotiation despite policy readiness.

• In 2011, China and Saudi Arabia rejected the final agreement at Durban as inadequate. Only Canada, the United States and Australia had been resisting stronger action on climate change. Canada abandoned the Kyoto Protocol the day after the collapse of negotiations at Durban. They publicly blamed China, India and Brazil, even though Brazil had accepted dramatic emissions cuts and China had, for the first time, accepted limits on emissions. Only India had taken a “hardline” attitude. Publicly blaming some other country for the collapse of negotiations is a no-no in diplomacy, so the Chinese took this move by Canada as a slap in the face. In return, they blamed Canada and “the West” for the collapse of Durban.

• Dimitrov is studying the role of persuasion in diplomacy, recording and analyzing hundreds of hours of discussions. Countries try to change each other’s minds, not just behavior.

• The global elite do not see climate change negotiations as an environmental issue. Instead, they feel they are “negotiating the future economy”. They focus on the negative economic consequences of inaction, and the economic benefits of climate action.

• In particular, the EU has managed to persuade many countries that climate change is worth tackling now. They do this with economic, not environmental arguments. For example, they argue that countries who take the initiative will have an advantage in future employment, getting most of the “green jobs”. Results include China’s latest 5-year plan, which some have called “the most progressive legislation in history”, and also Japan’s plan for a 60-80% reduction of carbon emissions. The EU itself also expects big returns on investment in climate change.

I apologize for any oversimplifications or downright errors in my notes here.

### References

You can see some slides for Dimitrov’s talks here:

• Radoslav S. Dimitrov, A climate of change.

• Radoslav S. Dimitrov, Inside Copenhagen: the state of climate governance, Global Environmental Politics 10 (2010), 18–24.

and these more recent book chapters, which are apparently not as easy to get:

• Radoslav S. Dimitrov, Environmental diplomacy, in Handbook of Global Environmental Politics, edited by Paul Harris, Routledge, forthcoming as of 2013.

• Radoslav S. Dimitrov, International negotiations, in Handbook of Global Climate and Environmental Policy, edited by Robert Falkner, Wiley-Blackwell, forthcoming as of 2013.

• Radoslav S. Dimitrov, Persuasion in world politics: The UN climate change negotiations, in Handbook of Global Environmental Politics, edited by Peter Dauvergne, Edward Elgar Publishing, Cheltenham, UK, 2012.

• Radoslav S. Dimitrov, American prosperity and the high politics of climate change, in Prospects for a Post-American World, edited by Sabrina Hoque and Sean Clark, University of Toronto Press, Toronto, 2012.

### Tommaso Dorigo — Top Partners Wanted

No, this is not an article about top models. Rather, the subject of discussion is models that predict the existence of heavy partners of the top quark.

### David Hogg — posterior probability and cross validation

In MCMC meeting today, Goodman brought up the difference between the fully marginalized posterior model probability and what you learn by cross-validation, for model selection. As my loyal reader knows, I have many thoughts about this and also a nascent paper with Vanderplas. However, Goodman has a different take from me: He sees cross-validation as producing the most predictive model (preventing over-fitting), but posterior probability as delivering the most probable model, given the universe of models. I think this is deeply (and obviously) correct. However, we haven't settled on words for Hou's paper yet, because he is still willing to use the T-word ("truth"), and I am not! (I also think, in the end, this is all related to the point that we want to deliver the most useful result for our purposes, and this necessarily involves utility.)
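
For readers who want to see the two criteria side by side, here is a small sketch of my own (not Hogg's or Hou's code): it scores polynomial models of increasing degree on noisy synthetic data both by leave-one-out cross-validation and by the fully marginalized likelihood (evidence) of a Bayesian linear model with a Gaussian prior on the coefficients. The noise level and prior width are assumed values.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
sigma, tau = 0.3, 2.0                      # assumed noise level and prior width
x = np.linspace(-1, 1, 30)
y = 0.5 - 1.2 * x + 0.8 * x**3 + sigma * rng.normal(size=x.size)  # synthetic data

def design(x, degree):
    return np.vander(x, degree + 1, increasing=True)   # columns [1, x, x^2, ...]

def log_evidence(x, y, degree):
    """Fully marginalized likelihood of a Gaussian linear model:
    y ~ N(0, sigma^2 I + tau^2 Phi Phi^T) after integrating out the weights."""
    Phi = design(x, degree)
    cov = sigma**2 * np.eye(len(y)) + tau**2 * Phi @ Phi.T
    return multivariate_normal(mean=np.zeros(len(y)), cov=cov).logpdf(y)

def loo_cv_error(x, y, degree):
    """Mean squared leave-one-out prediction error of the least-squares fit."""
    errs = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coef, *_ = np.linalg.lstsq(design(x[mask], degree), y[mask], rcond=None)
        errs.append((design(x[i:i+1], degree) @ coef - y[i])**2)
    return float(np.mean(errs))

for degree in range(6):
    print(f"degree {degree}:  log evidence = {log_evidence(x, y, degree):8.2f}"
          f"   LOO MSE = {loo_cv_error(x, y, degree):.4f}")
```

On data like this the evidence tends to favor a model near the true complexity while cross-validation tracks out-of-sample prediction error; the two criteria often, but not always, pick the same model, which is the distinction Goodman was drawing.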

### n-Category Café — A Technical Innovation

Here’s a new feature of the Café, thanks to our benevolent host Jacques Distler. If you ever want to see how someone has created some mathematical expression on this blog, there’s an easy way to do it.

With Firefox, you simply double-click on the expression. Try it: $A \times B^A \to B$ or $x_{m n}$ or

$\Biggl( \begin{matrix} 1 & 2 \\ 3 & 4 \end{matrix} \Biggr).$

A window should pop up showing the TeX source.

With other browsers, I’m not so sure. Try double-clicking. If that doesn’t work, then, according to Jacques’s instructions, you “bring up the MathJax context-menu for the formula, and choose Show Math As $\to$ Annotation $\to$ TeX”. I don’t know how one brings up this menu. Does anyone else know?

Once you’ve made the TeX source appear, you can cut and paste to your heart’s content. Of course, most users here are fluent in LaTeX. But like most math-oriented websites, we use a variant of TeX that’s a little different from standard LaTeX, so this should turn out to be a helpful feature.

## December 09, 2013

### John Baez — Lebesgue’s Universal Covering Problem

I try to focus on serious problems in this blog, mostly environmental issues and the attempt to develop ‘green mathematics’. But I seem unable to resist talking about random fun stuff now and then. For example, the Lebesgue universal covering problem. It’s not important, it’s just strange… but for some reason I feel a desire to talk about it.

For starters, let’s say the diameter of a region in the plane is the maximum distance between two points in this region. You all know what a circle of diameter 1 looks like. But an equilateral triangle with edges of length 1 also has diameter 1:

After all, two points in this triangle are farthest apart when they’re at two corners. And you’ll notice, if you think about it, that this triangle doesn’t fit inside a circle of diameter 1:

In 1914, the famous mathematician Henri Lebesgue sent a letter to a pal named Pál. And in this letter he asked: what’s the smallest possible region that contains every set in the plane with diameter 1?

He was actually more precise. The phrase “smallest possible” is vague, but Lebesgue wanted the set with the least possible area. The phrase “contains” is also vague, at least as I used it. I didn’t mean that our region should literally contain every set with diameter 1. Only the whole plane does that! I meant that we can rotate or translate any set with diameter 1 until it fits in our region. So a more precise statement is:

What is the smallest possible area of a region $S$ in the plane such that every set of diameter 1 in the plane can be rotated and translated to fit inside $S$?

You see why math gets a reputation of being dry: sometimes when you make a simple question precise it gets longer.

Even this second version of the question is a bit wrong, since it’s presuming there exists a region with smallest possible area that does the job. Maybe you can do it with regions whose area gets smaller and smaller, closer and closer to some number, but you can’t do it with a region whose area equals that number! Also, the word ‘region’ is a bit vague. So if I were talking to a nitpicky mathematician, I might pose the puzzle this way:

What is the greatest lower bound of the measures of closed sets $S \subseteq \mathbb{R}^2$ such that every set $T \subseteq \mathbb{R}^2$ of diameter 1 can be rotated and translated to fit inside $S$?

Anyway, the reason this puzzle is famous is not that it gets dry when you state it precisely. It’s that it’s hard to solve!

It’s called Lebesgue’s universal covering problem. A region in the plane is called a universal covering if it does the job: any set in the plane of diameter 1 can fit inside it, in the sense I made precise.

Pál set out to find universal coverings, and in 1920 he wrote a paper on his results. He found a very nice one: a regular hexagon circumscribed around a circle of diameter 1. Do you see why you can fit the equilateral triangle with side length 1 inside this?

This hexagon has area

$\sqrt{3}/2 \approx 0.86602540$

But in the same paper, Pál showed you could safely cut off two corners of this hexagon and get a smaller universal covering! This one has area

$2 - 2/\sqrt{3} \approx 0.84529946$

He guessed this solution was optimal.

However, in 1936, a guy named Sprague sliced two tiny pieces off Pál’s proposed solution and found a universal covering with area

$\sim 0.84413770$

Here’s the idea:

The big hexagon is Pál's original solution. He then inscribed a regular 12-sided polygon inside this, and showed that you can remove two of the corners, say $B_1B_2B$ and $F_1F_2F,$ and get a smaller universal covering. But Sprague noticed that near $D$ you can also remove the part outside the circle with radius 1 centered at $B_1$, as well as the part outside the circle with radius 1 centered at $F_2.$

Sprague conjectured that this is the best you can do.

But in 1975, Hansen showed you could slice very tiny corners off Sprague’s solution, each of which reduces the area by just $6 \cdot 10^{-18}.$

I think this microscopic advance, more than anything else, convinced people that Lebesgue’s universal covering problem was fiendishly difficult. Viktor Klee, in a parody of the usual optimistic prophecies people like to make, wrote that:

… progress on this question, which has been painfully slow in the past, may be even more painfully slow in the future.

Indeed, my whole interest in this problem is rather morbid. I don’t know any reason that it’s important. I don’t see it as connected to lots of other beautiful math. It just seems astoundingly hard compared to what you might initially think. I admire people who work on it in the same way I admire people who decide to ski across the Antarctic. It’s fun to read about—from the comfort of my living room, sitting by the fire—but I can’t imagine actually doing it!

In 1980, a guy named Duff did a lot better. All the universal coverings so far were convex subsets of the plane. But he considered nonconvex universal coverings and found one with area:

$\sim 0.84413570$

This changed the game a lot. If we only consider convex universal coverings, it’s possible to prove there’s one with the least possible area—though we don’t know what it is. But if we allow nonconvex ones, we don’t even know this. There could be solutions whose area gets smaller and smaller, approaching some nonzero value, but never reaching it.

So at this point, Lebesgue’s puzzle split in two: one for universal coverings, and one for convex universal coverings. I’ll only talk about the second one, since I don’t know of further progress on the first.

Remember how Hansen improved Sprague’s universal covering by chopping off two tiny pieces? In 1992 he went further. He again sliced two corners off Sprague’s solution. One, the same shape as before, reduced the area by just $6 \cdot 10^{-18}.$ But the other was vastly larger: it reduced the area by a whopping $4 \cdot 10^{-11}.$

I believe this is the smallest convex universal covering known so far.

But there’s more to say. In 2005, Peter Brass and Mehrbod Sharifi came up with a nice lower bound. They showed any convex universal covering must have an area of at least

$0.832$

They got this result by a rigorous computer-aided search for the convex set with the smallest area that contains a circle, equilateral triangle and pentagon of diameter 1.

If you only want your convex set to contain a circle and equilateral triangle of diameter 1, you can get away with this:

This gives a lower bound of

$0.825$

But the pentagon doesn’t fit in this set! Here is the least-area convex shape that also contains a pentagon of diameter 1, as found by Brass and Sharifi:

You’ll notice the triangle no longer has the same center as the circle!

To find this result, it was enough to keep the circle fixed, translate the triangle, and translate and rotate the pentagon. So, they searched through a 5-dimensional space, repeatedly computing the area of the convex hull of these three shapes to see how small they could make it. They considered 53,118,162 cases. Of these, 655,602 required further care—roughly speaking, they had to move the triangle and pentagon around slightly to see if they could make the area even smaller.
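
Here is a small sketch of the basic step in such a search (my own illustration, not the Brass–Sharifi code): place a circle of diameter 1, an equilateral triangle of diameter 1 translated by some offset, and a regular pentagon of diameter 1 translated and rotated, then compute the area of their convex hull with `scipy.spatial.ConvexHull`. The circle is approximated by many boundary points, and the offsets below are arbitrary trial values, not the optimal ones.

```python
import numpy as np
from scipy.spatial import ConvexHull

def circle(n=720, r=0.5):
    """Points on a circle of diameter 1 centred at the origin."""
    th = np.linspace(0, 2 * np.pi, n, endpoint=False)
    return np.column_stack([r * np.cos(th), r * np.sin(th)])

def equilateral_triangle(shift=(0.0, 0.0)):
    """Vertices of an equilateral triangle with side (hence diameter) 1."""
    verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
    verts -= verts.mean(axis=0)          # centre it before shifting
    return verts + np.asarray(shift)

def regular_pentagon(shift=(0.0, 0.0), angle=0.0):
    """Vertices of a regular pentagon of diameter 1 (diameter = longest diagonal)."""
    R = 1.0 / (2 * np.sin(2 * np.pi / 5))   # circumradius so the diagonal equals 1
    th = angle + 2 * np.pi * np.arange(5) / 5
    return np.column_stack([R * np.cos(th), R * np.sin(th)]) + np.asarray(shift)

def hull_area(tri_shift, pent_shift, pent_angle):
    pts = np.vstack([circle(),
                     equilateral_triangle(tri_shift),
                     regular_pentagon(pent_shift, pent_angle)])
    return ConvexHull(pts).volume        # in 2D, .volume is the area

# One arbitrary trial point in the 5-dimensional search space:
print(hull_area(tri_shift=(0.05, 0.0), pent_shift=(0.0, 0.03), pent_angle=0.1))
```

Wrapping `hull_area` in a minimization over the five parameters (two triangle offsets, two pentagon offsets, one pentagon angle) is the shape of the search described above, though the authors' computation was exhaustive and rigorous rather than a quick numerical minimization.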

They say they could have done better if they’d also included a fourth shape, but this was computationally infeasible, since it would take them from a 5-dimensional search space to an 8-dimensional one. It’s possible that one could tackle this using a distributed computing project where a lot of people contribute computer time, like they do in the search for huge prime numbers.

If you hear of more progress on this issue, please let me know! I hope that sometime—perhaps tomorrow, perhaps decades hence—someone will report good news.

### References

Hansen’s record-holding convex universal cover is here:

• H. Hansen, Small universal covers for sets of unit diameter, Geometriae Dedicata 42 (1992), 205–213.

It’s quite readable. This is where I got the picture of Pál’s solution and Sprague’s improvement.

The paper on the current best lower bound for convex universal coverings is also quite nice:

• Peter Brass and Mehrbod Sharifi, A lower bound for Lebesgue’s universal cover problem, International Journal of Computational Geometry & Applications 15 (2005), 537–544.

The picture of the equilateral triangle in the circle comes from an earlier paper, which got a lower bound of

$0.8271$

by considering the circle and regular $3^n$-gons of diameter 1 for all $n$:

• Gy. Elekes, Generalized breadths, circular Cantor sets, and the least area UCC, Discrete & Computational Geometry 12 (1994), 439–449.

I have not yet managed to get ahold of Duff’s paper on the record-holding nonconvex universal covering:

• G. F. D. Duff, A smaller universal cover for sets of unit diameter, C. R. Math. Acad. Sci. 2 (1980), 37–42.

One interesting thing is that in 1963, Eggleston found a universal covering that’s minimal in the following sense: no subset of it is a universal covering. However, it doesn’t have the least possible area!

• H. G. Eggleston, Minimal universal covers in $\mathrm{E}^n,$ Israel J. Math. 1 (1963), 149–155.

Later, Kovalev showed that any ‘minimal’ universal covering in this sense is a star domain:

• M. D. Kovalev, A minimal Lebesgue covering exists (Russian), Mat. Zametki 40 (1986), 401–406, 430.

This means there’s a point $x_0$ inside the set such that for any other point $x$ in the set, the line segment connecting $x_0$ and $x$ is in the set. Any convex set is a star domain, but not conversely:

### John Baez — Entropy and Information in Biological Systems

John Harte is an ecologist who uses maximum entropy methods to predict the distribution, abundance and energy usage of species. Marc Harper uses information theory in bioinformatics and evolutionary game theory. Harper, Harte and I are organizing a workshop on entropy and information in biological systems, and I’m really excited about it!

It’ll take place at the National Institute for Mathematical and Biological Synthesis in Knoxville, Tennessee. We are scheduling it for Wednesday-Friday, April 8-10, 2015. When the date gets confirmed, I’ll post an advertisement so you can apply to attend.

Writing the proposal was fun, because we got to pull together lots of interesting people who are applying information theory and entropy to biology in quite different ways. So, here it is!

### Proposal

Ever since Shannon initiated research on information theory in 1948, there have been hopes that the concept of information could serve as a tool to help systematize and unify work in biology. The link between information and entropy was noted very early on, and it suggested that a full thermodynamic understanding of biology would naturally involve the information processing and storage that are characteristic of living organisms. However, the subject is full of conceptual pitfalls for the unwary, and progress has been slower than initially expected. Premature attempts at ‘grand syntheses’ have often misfired. But applications of information theory and entropy to specific highly focused topics in biology have been increasingly successful, such as:

• the maximum entropy principle in ecology,
• Shannon and Rényi entropies as measures of biodiversity,
• information theory in evolutionary game theory,
• information and the thermodynamics of individual cells.

Because they work in diverse fields, researchers in these specific topics have had little opportunity to trade insights and take stock of the progress so far. The aim of the workshop is to do just this.

In what follows, participants’ names are in boldface, while the main goals of the workshop are in italics.

Roderick Dewar is a key advocate of the principle of Maximum Entropy Production, which says that biological systems—and indeed all open, non-equilibrium systems—act to produce entropy at the maximum rate. Along with others, he has applied this principle to make testable predictions in a wide range of biological systems, from ATP synthesis [DJZ2006] to respiration and photosynthesis of individual plants [D2010] and plant communities. He has also sought to derive this principle from ideas in statistical mechanics [D2004, D2009], but it remains controversial.

The first goal of this workshop is to study the validity of this principle.

While they may be related, the principle of Maximum Entropy Production should not be confused with the MaxEnt inference procedure, which says that we should choose the probabilistic hypothesis with the highest entropy subject to the constraints provided by our data. MaxEnt was first explicitly advocated by Jaynes. He noted that it is already implicit in the procedures of statistical mechanics, but convincingly argued that it can also be applied to situations where entropy is more ‘informational’ than ‘thermodynamic’ in character.
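
As a concrete and entirely standard illustration of the MaxEnt inference procedure (my own example, independent of the proposal): given only the constraint that a six-sided die has mean 4.5 rather than 3.5, the maximum-entropy distribution over the faces is of exponential form $p_i \propto e^{\lambda i}$, and the multiplier $\lambda$ is fixed by the mean constraint. A minimal sketch:

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)
target_mean = 4.5          # Jaynes' classic "Brandeis dice" constraint

def mean_given_lambda(lam):
    """Mean face value under p_i proportional to exp(lam * i)."""
    w = np.exp(lam * faces)
    p = w / w.sum()
    return p @ faces

# Solve <i> = target_mean for the Lagrange multiplier lambda.
lam = brentq(lambda l: mean_given_lambda(l) - target_mean, -5.0, 5.0)
p = np.exp(lam * faces)
p /= p.sum()

print("lambda =", round(lam, 4))
print("MaxEnt probabilities:", np.round(p, 4))
print("entropy (nats):", round(float(-(p * np.log(p)).sum()), 4))
```

The ecological applications discussed next use the same machinery, entropy maximization subject to expectation constraints solved through Lagrange multipliers, just with constraints appropriate to abundance and energy data.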

Recently John Harte has applied MaxEnt in this way to ecology, using it to make specific testable predictions for the distribution, abundance and energy usage of species across spatial scales and across habitats and taxonomic groups [Harte2008, Harte2009, Harte2011]. Annette Ostling is an expert on other theories that attempt to explain the same data, such as the ‘neutral model’ [AOE2008, ODLSG2009, O2005, O2012]. Dewar has also used MaxEnt in ecology [D2008], and he has argued that it underlies the principle of Maximum Entropy Production.

Thus, a second goal of this workshop is to familiarize all the participants with applications of the MaxEnt method to ecology, compare it with competing approaches, and study whether MaxEnt provides a sufficient justification for the principle of Maximum Entropy Production.

Entropy is not merely a predictive tool in ecology: it is also widely used as a measure of biodiversity. Here Shannon’s original concept of entropy naturally generalizes to ‘Rényi entropy’, which depends on a parameter $\alpha \ge 0$. This equals

$\displaystyle{ H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_i p_i^\alpha }$

where $p_i$ is the fraction of organisms of the $i$th type (which could mean species, some other taxon, etc.). In the limit $\alpha \to 1$ this reduces to the Shannon entropy:

$\displaystyle{ H(p) = - \sum_i p_i \log p_i }$

As $\alpha$ increases, we give less weight to rare types of organisms. Christina Cobbold and Tom Leinster have described a systematic and highly flexible system of biodiversity measurement, with Rényi entropy at its heart [CL2012]. They consider both the case where all we have are the numbers $p_i$, and the more subtle case where we take the distance between different types of organisms into account.
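
A small numerical sketch of the formulas above (my own illustration): it computes the Rényi entropy of a toy community for several values of $\alpha$ and checks that it approaches the Shannon entropy as $\alpha \to 1$. The abundance vector is made up.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):                 # limit alpha -> 1: Shannon entropy
        return float(-(p * np.log(p)).sum())
    return float(np.log((p ** alpha).sum()) / (1.0 - alpha))

# A made-up community: one dominant species and several rare ones.
p = np.array([0.70, 0.10, 0.08, 0.05, 0.04, 0.02, 0.01])

for alpha in [0.0, 0.5, 0.999, 1.0, 2.0, 10.0]:
    print(f"alpha = {alpha:6.3f}   H_alpha = {renyi_entropy(p, alpha):.4f}")
# alpha = 0 just counts species (log of richness); large alpha is dominated by
# the most common species, so rare types get less weight as alpha grows.
```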

John Baez has explained the role of Rényi entropy in thermodynamics [B2011], and together with Tom Leinster and Tobias Fritz he has proved other theorems characterizing entropy which explain its importance for information processing [BFL2011]. However, these ideas have not yet been connected to the widespread use of entropy in biodiversity studies. More importantly, the use of entropy as a measure of biodiversity has not been clearly connected to MaxEnt methods in ecology. Does the success of MaxEnt methods imply a tendency for ecosystems to maximize biodiversity subject to the constraints of resource availability? This seems surprising, but a more nuanced statement along these general lines might be correct.

So, a third goal of this workshop is to clarify relations between known characterizations of entropy, the use of entropy as a measure of biodiversity, and the use of MaxEnt methods in ecology.

As the amount of data in genomics continues to surpass the ability of humans to analyze it, we can expect automated experiment design to become ever more important. In Chris Lee and Marc Harper’s RoboMendel program [LH2013], a mathematically precise concept of ‘potential information’—how much information is left to learn—plays a crucial role in deciding what experiment to do next, given the data obtained so far. It will be useful for them to interact with William Bialek, who has expertise in estimating entropy from empirical data and using it to constrain properties of models [BBS, BNS2001, BNS2002], and Susanne Still, who applies information theory to automated theory building and biology [CES2010, PS2012].

However, there is another link between biology and potential information. Harper has noted that in an ecosystem where the population of each type of organism grows at a rate proportional to its fitness (which may depend on the fraction of organisms of each type), the quantity

$\displaystyle{ I(q||p) = \sum_i q_i \ln(q_i/p_i) }$

always decreases if there is an evolutionarily stable state [Harper2009]. Here $p_i$ is the fraction of organisms of the $i$th genotype at a given time, while $q_i$ is this fraction in the evolutionarily stable state. This quantity is often called the Shannon information of $q$ ‘relative to’ $p$. But in fact, it is precisely the same as Lee and Harper’s potential information! Indeed, there is a precise mathematical analogy between evolutionary games and processes where a probabilistic hypothesis is refined by repeated experiments.
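
A quick sketch of the fact just described (mine, not Harper's code): for a hawk–dove game with an interior evolutionarily stable state $q$, integrating the replicator equation and evaluating $I(q\|p(t))$ at each step shows the relative entropy decreasing toward zero. The payoff matrix, initial state and step size are arbitrary choices.

```python
import numpy as np

# Hawk-dove payoffs with V = 2, C = 4: the mixed ESS is q = (V/C, 1 - V/C) = (0.5, 0.5).
A = np.array([[-1.0, 2.0],
              [ 0.0, 1.0]])
q = np.array([0.5, 0.5])                 # evolutionarily stable state

def kl(q, p):
    """Relative entropy I(q||p) = sum_i q_i log(q_i / p_i)."""
    return float((q * np.log(q / p)).sum())

p = np.array([0.9, 0.1])                 # initial population state
dt = 0.01
divergences = []
for _ in range(2000):
    fitness = A @ p                      # fitness of each strategy against the population
    mean_fit = p @ fitness
    p = p + dt * p * (fitness - mean_fit)   # Euler step of the replicator equation
    divergences.append(kl(q, p))

print("I(q||p) at a few times:", [round(divergences[i], 4) for i in (0, 100, 500, 1999)])
print("non-increasing:", all(a >= b for a, b in zip(divergences, divergences[1:])))
```

Here $I(q\|p)$ plays exactly the role of a Lyapunov function, which is the sense in which the evolving population can be read as ‘learning’ the stable state.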

Thus, a fourth goal of this workshop is to develop the concept of evolutionary games as ‘learning’ processes in which information is gained over time.

We shall try to synthesize this with Carl Bergstrom and Matina Donaldson-Matasci’s work on the ‘fitness value of information’: a measure of how much increase in fitness a population can obtain per bit of extra information [BL2004, DBL2010, DM2013]. Following Harper, we shall consider not only relative Shannon entropy, but also relative Rényi entropy, as a measure of information gain [Harper2011].

A fifth and final goal of this workshop is to study the interplay between information theory and the thermodynamics of individual cells and organelles.

Susanne Still has studied the thermodynamics of prediction in biological systems [BCSS2012]. And in a celebrated related piece of work, Jeremy England used thermodynamic arguments to derive a lower bound for the amount of entropy generated during a process of self-replication of a bacterial cell [England2013]. Interestingly, he showed that E. coli comes within a factor of 3 of this lower bound.

In short, information theory and entropy methods are becoming powerful tools in biology, from the level of individual cells, to whole ecosystems, to experimental design, model-building, and the measurement of biodiversity. The time is ripe for an investigative workshop that brings together experts from different fields and lets them share insights and methods and begin to tackle some of the big remaining questions.

### Bibliography

[AOE2008] D. Alonso, A. Ostling and R. Etienne, The assumption of symmetry and species abundance distributions, Ecology Letters 11 (2008), 93–105.

[TMMABB2012] D. Amodei, W. Bialek, M. J. Berry II, O. Marre, T. Mora, and G. Tkacik, The simplest maximum entropy model for collective behavior in a neural network, arXiv:1207.6319 (2012).

[B2011] J. Baez, Rényi entropy and free energy, arXiv:1102.2098 (2011).

[BFL2011] J. Baez, T. Fritz and T. Leinster, A characterization of entropy in terms of information loss, Entropy 13 (2011), 1945–1957.

[B2011] J. Baez and M. Stay, Algorithmic thermodynamics, Math. Struct. Comp. Sci. 22 (2012), 771–787.

[BCSS2012] A. J. Bell, G. E. Crooks, S. Still and D. A. Sivak, The thermodynamics of prediction, Phys. Rev. Lett. 109 (2012), 120604.

[BL2004] C. T. Bergstrom and M. Lachmann, Shannon information and biological fitness, in IEEE Information Theory Workshop 2004, IEEE, 2004, pp. 50-54.

[BBS] M. J. Berry II, W. Bialek and E. Schneidman, An information theoretic approach to the functional classification of neurons, in Advances in Neural Information Processing Systems 15, MIT Press, 2005.

[BNS2001] W. Bialek, I. Nemenman and N. Tishby, Predictability, complexity and learning, Neural Computation 13 (2001), 2409–2463.

[BNS2002] W. Bialek, I. Nemenman and F. Shafee, Entropy and inference, revisited, in Advances in Neural Information Processing Systems 14, MIT Press, 2002.

[CL2012] C. Cobbold and T. Leinster, Measuring diversity: the importance of species similarity, Ecology 93 (2012), 477–489.

[CES2010] J. P. Crutchfield, S. Still and C. Ellison, Optimal causal inference: estimating stored information and approximating causal architecture, Chaos 20 (2010), 037111.

[D2004] R. C. Dewar, Maximum entropy production and non-equilibrium statistical mechanics, in Non-Equilibrium Thermodynamics and Entropy Production: Life, Earth and Beyond, eds. A. Kleidon and R. Lorenz, Springer, New York, 2004, 41–55.

[DJZ2006] R. C. Dewar, D. Juretíc, P. Zupanovíc, The functional design of the rotary enzyme ATP synthase is consistent with maximum entropy production, Chem. Phys. Lett. 430 (2006), 177–182.

[D2008] R. C. Dewar, A. Porté, Statistical mechanics unifies different ecological patterns, J. Theor. Bio. 251 (2008), 389–403.

[D2009] R. C. Dewar, Maximum entropy production as an inference algorithm that translates physical assumptions into macroscopic predictions: don’t shoot the messenger, Entropy 11 (2009), 931–944.

[D2010] R. C. Dewar, Maximum entropy production and plant optimization theories, Phil. Trans. Roy. Soc. B 365 (2010) 1429–1435.

[DBL2010] M. C. Donaldson-Matasci, C. T. Bergstrom, and M. Lachmann, The fitness value of information, Oikos 119 (2010), 219–230.

[DM2013] M. C. Donaldson-Matasci, G. DeGrandi-Hoffman, and A. Dornhaus, Bigger is better: honey bee colonies as distributed information-gathering systems, Animal Behaviour 85 (2013), 585–592.

[England2013] J. L. England, Statistical physics of self-replication, J. Chem. Phys. 139 (2013), 121923.

[ODLSG2009] J. L. Green, J. K. Lake, J. P. O’Dwyer, A. Ostling and V. M. Savage, An integrative framework for stochastic, size-structured community assembly, PNAS 106 (2009), 6170–6175.

[Harper2009] M. Harper, Information geometry and evolutionary game theory, arXiv:0911.1383 (2009).

[Harper2011] M. Harper, Escort evolutionary game theory, Physica D 240 (2011), 1411–1415.

[Harte2008] J. Harte, T. Zillio, E. Conlisk and A. Smith, Maximum entropy and the state-variable approach to macroecology, Ecology 89 (2008), 2700–2711.

[Harte2009] J. Harte, A. Smith and D. Storch, Biodiversity scales from plots to biomes with a universal species-area curve, Ecology Letters 12 (2009), 789–797.

[Harte2011] J. Harte, Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics, Oxford U. Press, Oxford, 2011.

[LH2013] M. Harper and C. Lee, Basic experiment planning via information metrics: the RoboMendel problem, arXiv:1210.4808 (2012).

[O2005] A. Ostling, Neutral theory tested by birds, Nature 436 (2005), 635.

[O2012] A. Ostling, Do fitness-equalizing tradeoffs lead to neutral communities?, Theoretical Ecology 5 (2012), 181–194.

[PS2012] D. Precup and S. Still, An information-theoretic approach to curiosity-driven reinforcement learning, Theory in Biosciences 131 (2012), 139–148.

### Terence Tao — Polymath8b, III: Numerical optimisation of the variational problem, and a search for new sieves

This is the third thread for the Polymath8b project to obtain new bounds for the quantity

$\displaystyle H_m := \liminf_{n \rightarrow\infty} (p_{n+m} - p_n),$

either for small values of ${m}$ (in particular ${m=1,2}$) or asymptotically as ${m \rightarrow \infty}$. The previous thread may be found here. The currently best known bounds on ${H_m}$ are:

• (Maynard) Assuming the Elliott-Halberstam conjecture, ${H_1 \leq 12}$.
• (Polymath8b, tentative) ${H_1 \leq 330}$. Assuming Elliott-Halberstam, ${H_2 \leq 330}$.
• (Polymath8b, tentative) ${H_2 \leq 484{,}126}$. Assuming Elliott-Halberstam, ${H_4 \leq 493{,}408}$.
• (Polymath8b) ${H_m \leq \exp( 3.817 m )}$ for sufficiently large ${m}$. Assuming Elliott-Halberstam, ${H_m \ll e^{2m} \log m}$ for sufficiently large ${m}$.

Much of the current focus of the Polymath8b project is on the quantity

$\displaystyle M_k = M_k({\cal R}_k) := \sup_F \frac{\sum_{m=1}^k J_k^{(m)}(F)}{I_k(F)}$

where ${F}$ ranges over square-integrable functions on the simplex

$\displaystyle {\cal R}_k := \{ (t_1,\ldots,t_k) \in [0,+\infty)^k: t_1+\ldots+t_k \leq 1 \}$

with ${I_k, J_k^{(m)}}$ being the quadratic forms

$\displaystyle I_k(F) := \int_{{\cal R}_k} F(t_1,\ldots,t_k)^2\ dt_1 \ldots dt_k$

and

$\displaystyle J_k^{(m)}(F) := \int_{{\cal R}_{k-1}} (\int_0^{1-\sum_{i \neq m} t_i} F(t_1,\ldots,t_k)\ dt_m)^2$

$\displaystyle dt_1 \ldots dt_{m-1} dt_{m+1} \ldots dt_k.$

It was shown by Maynard that one has ${H_m \leq H(k)}$ whenever ${M_k > 4m}$, where ${H(k)}$ is the narrowest diameter of an admissible ${k}$-tuple. As discussed in the previous post, we have slight improvements to this implication, but they are currently difficult to implement, due to the need to perform high-dimensional integration. The quantity ${M_k}$ does seem however to be close to the theoretical limit of what the Selberg sieve method can achieve for implications of this type (at the Bombieri-Vinogradov level of distribution, at least); it seems of interest to explore more general sieves, although we have not yet made much progress in this direction.

The best asymptotic bounds for ${M_k}$ we have are

$\displaystyle \log k - \log\log\log k + O(1) \leq M_k \leq \frac{k}{k-1} \log k \ \ \ \ \ (1)$

which we prove below the fold. The upper bound holds for all ${k > 1}$; the lower bound is only valid for sufficiently large ${k}$, and gives the upper bound ${H_m \ll e^{2m} \log m}$ on Elliott-Halberstam.

For small ${k}$, the upper bound is quite competitive, for instance it provides the upper bound in the best values

$\displaystyle 1.845 \leq M_4 \leq 1.848$

and

$\displaystyle 2.001162 \leq M_5 \leq 2.011797$

we have for ${M_4}$ and ${M_5}$. The situation is a little less clear for medium values of ${k}$, for instance we have

$\displaystyle 3.95608 \leq M_{59} \leq 4.148$

and so it is not yet clear whether ${M_{59} > 4}$ (which would imply ${H_1 \leq 300}$). See this wiki page for some further upper and lower bounds on ${M_k}$.

The best lower bounds are not obtained through the asymptotic analysis, but rather through quadratic programming (extending the original method of Maynard). This has given significant numerical improvements to our best bounds (in particular lowering the ${H_1}$ bound from ${600}$ to ${330}$), but we have not yet been able to combine this method with the other potential improvements (enlarging the simplex, using MPZ distributional estimates, and exploiting upper bounds on two-point correlations) due to the computational difficulty involved.

— 1. Upper bound —

We now prove the upper bound in (1). The key estimate is

$\displaystyle (\int_0^{1-t_2-\ldots-t_k} F(t_1,\ldots,t_k)\ dt_1)^2 \ \ \ \ \ (2)$

$\displaystyle \leq \frac{\log k}{k-1} \int_0^{1-t_2-\ldots-t_k} F(t_1,\ldots,t_k)^2 (1 - t_1-\ldots-t_k+ kt_1)\ dt_1.$

Assuming this estimate, we may integrate in ${t_2,\ldots,t_k}$ to conclude that

$\displaystyle J_k^{(1)}(F) \leq \frac{\log k}{k-1} \int F^2 (1-t_1-\ldots-t_k+kt_1)\ dt_1 \ldots dt_k$

which symmetrises to

$\displaystyle \sum_{m=1}^k J_k^{(m)}(F) \leq k \frac{\log k}{k-1} \int F^2\ dt_1 \ldots dt_k$

It remains to prove (2). By Cauchy-Schwarz, it suffices to show that

$\displaystyle \int_0^{1-t_2-\ldots-t_k} \frac{dt_1}{1 - t_1-\ldots-t_k+ kt_1} \leq \frac{\log k}{k-1}.$

But writing ${s = t_2+\ldots+t_k}$, the left-hand side evaluates to

$\displaystyle \frac{1}{k-1} (\log (k(1-s)) - \log (1-s) ) = \frac{\log k}{k-1}$

as required.
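
As a quick numerical sanity check of this last step (not part of the original post), one can confirm that the integral ${\int_0^{1-s} \frac{dt_1}{1-s+(k-1)t_1}}$ equals ${\frac{\log k}{k-1}}$ independently of ${s}$:

```python
import numpy as np
from scipy.integrate import quad

def lhs(k, s):
    """Integral of dt / (1 - t - s + k t) over t in [0, 1 - s]."""
    val, _ = quad(lambda t: 1.0 / (1.0 - t - s + k * t), 0.0, 1.0 - s)
    return val

for k in (2, 5, 50):
    for s in (0.0, 0.3, 0.9):
        print(f"k={k:3d}, s={s:.1f}:  integral = {lhs(k, s):.6f},"
              f"  log(k)/(k-1) = {np.log(k) / (k - 1):.6f}")
```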

This also suggests that extremal ${F(t_1,\ldots,t_k)}$ behave like ${\tilde F(t_2,\ldots,t_k) / (1-t_1-\ldots-t_k + kt_1)}$ for some function ${\tilde F}$, and similarly for permutations. However, it is not possible to have exact equality in all variables simultaneously, which indicates that the upper bound (1) is not optimal, although in practice it does remarkably well for small ${k}$.

— 2. Lower bound —

Using the notation of this previous post, we have the lower bound

$\displaystyle M_k \geq \frac{m_1^2}{m_2} {\bf P}( X_1 + \ldots + X_{k-1} \geq k - T )$

whenever ${g}$ is supported on ${[0,T]}$, ${m_i := \int_0^T g(t)^i\ dt}$, and ${X_1,\ldots,X_k}$ are independent random variables on ${[0,T]}$ with density ${\frac{1}{m_2} g(t)^2\ dt}$. We select the function

$\displaystyle g(t) := \frac{1}{1 + A t} 1_{[0,T]}(t)$

with ${A := \log k}$, and ${T := \varepsilon \frac{k}{A}}$ for some ${0 < \varepsilon < 1}$ to be chosen later. We have

$\displaystyle m_1 = \frac{1}{A} \log(1+AT) = \frac{\log( 1 + \varepsilon k)}{\log k}$

and

$\displaystyle m_2 = \int_0^T \frac{1}{(1+At)^2}\ dt$

$\displaystyle = \frac{1}{A} - \frac{1}{A (1+AT)}$

$\displaystyle \leq \frac{1}{A} = \frac{1}{\log k}$

and so

$\displaystyle M_k \geq \frac{\log^2(1+\varepsilon k)}{\log k} {\bf P}( X_1 + \ldots + X_{k-1} \geq k - T ).$

Observe that the random variables ${X_i}$ have mean

$\displaystyle \mu = \frac{1}{m_2} \frac{1}{A^2} (\log(1+AT)-1+\frac{1}{1+AT})$

$\displaystyle = (\frac{1}{A} + \frac{1}{A \varepsilon k}) (\log(1+\varepsilon k)-1+\frac{1}{1+\varepsilon k})$

$\displaystyle \leq 1 - \frac{1}{\log k} + O( \frac{\log k}{\varepsilon k} ).$

The variance ${ \sigma^2}$ may be bounded crudely by

$\displaystyle \sigma^2 \leq \frac{1}{m_2} \int_0^T \frac{t^2}{(1+At)^2}\ dt$

$\displaystyle = O( A \frac{T}{A^2} ) = O( \frac{\varepsilon k}{\log^2 k} ).$

Thus the random variable ${X_1 + \ldots + X_{k-1}}$ has mean at most ${k - \frac{k}{\log k} + O( \frac{\log k}{\varepsilon} )}$ and variance ${O( \frac{\varepsilon k^2}{\log^2 k} )}$, with each variable bounded in magnitude by ${T = \frac{\varepsilon k}{\log k}}$. By Hoeffding’s inequality, this implies that ${X_1 + \ldots + X_{k-1}}$ is at most ${ k - T}$ with probability at most ${O( \exp(- c / \varepsilon^2 ) )}$ for some absolute constant ${c}$. If we set ${ \varepsilon := C / (\log \log k)^{1/2}}$ for a sufficiently large absolute constant ${C}$, we thus have

$\displaystyle {\bf P}( X_1 + \ldots + X_{k-1} \geq k - T ) = 1 - O( 1 / \log k)$

and thus

$\displaystyle M_k \geq \log k - \log\log\log k + O(1)$

giving the lower bound in (1).

Hoeffding’s bound is proven by the exponential moment method, which more generally gives the bound

$\displaystyle M_k \geq \frac{m_1^2}{m_2} (1 - e^{-s(k-T)} (\frac{1}{m_2} \int_0^T g(t)^2 e^{st}\ dt)^{k-1} )$

for any ${s > 0}$. However, this bound is inferior to the linear algebra method for small ${k}$; for instance, we can only obtain ${DHL[k,2]}$ for ${k=339}$ by this method (compared to ${k=64}$ from quadratic programming), even if one uses the best available MPZ estimate.

Using ${MPZ[\varpi,\delta]}$, one can modify this lower bound, obtaining ${DHL[k_0,m+1]}$ whenever we can find ${\varpi, \delta, A, T, s, s'}$ obeying

$\displaystyle \frac{m_1^2}{m_2} ( 1 - \kappa_1 - \kappa_2 ) > \frac{m}{1/4+\varpi}$

where

$\displaystyle \kappa_1 := e^{-s(k-T)} (\frac{1}{m_2}\int_0^T e^{st} g(t)^2\ dt)^{k-1}$

and

$\displaystyle \kappa_2 := e^{s' \theta k} (\frac{1}{m_2}\int_0^{\tilde \delta k} e^{-s't} g(t)^2\ dt)^{k-1}$

with

$\displaystyle \tilde \delta' := \frac{T}{k}$

$\displaystyle \delta' = \tilde \delta' (\frac{1}{4}+\varpi)$

$\displaystyle \tilde \delta := \frac{\delta}{1/4+\varpi}.$

$\displaystyle \theta := \frac{i(\delta'-\delta)/2 + \varpi}{1/4+\varpi}$

$\displaystyle = i (\tilde \delta' - \tilde \delta)/2 + \frac{\varpi}{1/4+\varpi}.$

Details can be found at this comment.

The expressions ${\sum_{m=1}^k J_k^{(m)}(F)}$, ${I_k(F)}$ are quadratic forms in ${F}$, so one can (in principle, at least) obtain lower bounds for ${M_k}$ by restricting to a finite-dimensional space of ${F}$ and performing quadratic programming on the resulting finite-dimensional quadratic forms.

One can take ${F}$ to be symmetric without loss of generality. One can then look at functions ${F}$ that are linear combinations of symmetric polynomials

$\displaystyle m(\alpha) = \sum_{\sigma \in S_k} x_{\sigma(1)}^{\alpha_1} \ldots x_{\sigma(k)}^{\alpha_k}$

for various signatures ${\alpha_1,\alpha_2,\ldots}$; actually it turns out numerically that one can get more efficient results by working with combinations of ${m(\alpha)}$ for ${\alpha}$ up to a fixed degree, and ${(1-P_1)^a m(\alpha)}$ for ${\alpha}$ at that degree afterwards, where ${P_1(x_1,\ldots,x_k) = x_1+\ldots+x_k}$ is the first symmetric polynomial. (Note that the GPY type sieves come from the case when ${F}$ is purely a function of ${P_1}$.)

It may be that other choices of bases here are more efficient still (perhaps choosing bases that reflect the belief that ${F}$ should behave something like ${\prod_{i=1}^k \frac{1}{1+A k^{1/2} t_i}}$ for some smallish ${A}$), but one numerical obstacle is that we are only able to accurately compute the required coefficients for ${J_k^{(m)}(F), I_k(F)}$ in the case of polynomials.
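
To give a feel for the variational quantity, here is a rough Monte Carlo sketch of my own (not Polymath code, and far from the rigorous quadratic-programming computations) that estimates the ratio ${\sum_{m=1}^k J_k^{(m)}(F)/I_k(F)}$ for a symmetric trial function of the product form mentioned above; any such ratio is a lower bound for ${M_k}$, up to sampling error. The constant ${A}$ in the trial function is an arbitrary choice.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
k = 5
A = 1.0   # arbitrary constant in the trial function F(t) = prod_i 1/(1 + A sqrt(k) t_i)

def F(t):
    """Symmetric trial function; t has shape (..., k)."""
    return np.prod(1.0 / (1.0 + A * np.sqrt(k) * t), axis=-1)

def uniform_simplex(n, dim):
    """n points uniform on {t >= 0, t_1 + ... + t_dim <= 1} via Dirichlet(1,...,1)."""
    return rng.dirichlet(np.ones(dim + 1), size=n)[:, :dim]

# I_k(F): integral of F^2 over the k-simplex, which has volume 1/k!.
t = uniform_simplex(200_000, k)
I_k = np.mean(F(t) ** 2) / factorial(k)

# J_k^(1)(F): integrate F over t_1 in [0, 1 - (t_2 + ... + t_k)], square the result,
# then integrate over the (k-1)-simplex (volume 1/(k-1)!).
tp = uniform_simplex(20_000, k - 1)                      # samples of (t_2, ..., t_k)
upper = 1.0 - tp.sum(axis=1)                             # upper limit for t_1
grid = np.linspace(0.0, 1.0, 51)
t1 = grid[None, :] * upper[:, None]                      # t_1 quadrature nodes per sample
pts = np.concatenate([t1[:, :, None],
                      np.repeat(tp[:, None, :], grid.size, axis=1)], axis=2)
vals = F(pts)
inner = np.sum(0.5 * (vals[:, 1:] + vals[:, :-1]) * np.diff(t1, axis=1), axis=1)
J_k1 = np.mean(inner ** 2) / factorial(k - 1)

# By symmetry of F, sum_m J_k^(m) = k * J_k^(1); the ratio lower-bounds M_k up to
# Monte Carlo error.  (The post quotes M_5 between about 2.0012 and 2.0118.)
print("estimated ratio  k * J / I =", k * J_k1 / I_k)
```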


### Andrew Jaffe — Breaking the silence (updated)

My apologies for being far too busy to post. I’ll be much louder in a couple of weeks once we release the Planck data — on March 21. Until then, I have to shut up and follow the Planck rules.

OK, back to editing. (I’ll try to update this post with any advance information as it becomes available.)

Update (on timing, not content): the main Planck press conference will be held on the morning of 21 March at 10am CET at ESA HQ in Paris. There will be a simultaneous UK event (9am GMT) held at the Royal Astronomical Society in London, where the Paris event will be streamed, followed by a local Q&A session. (There will also be a more technical afternoon session in Paris.)

Probably more important for my astrophysics colleagues: the Planck papers will be posted on the ESA website at noon on the 21st, after the press event, and will appear on the ArXiV the following day, 22 March. Be sure to set aside some time next weekend!

### Andrew Jaffe — Planck 2013: the science

If you’re the kind of person who reads this blog, then you won’t have missed yesterday’s announcement of the first Planck cosmology results.

The most important is our picture of the cosmic microwave background itself:

But it takes a lot of work to go from the data coming off the Planck satellite to this picture. First, we have to make nine different maps, one at each of the frequencies in which Planck observes, from 30 GHz (with a wavelength of 1 cm) up to 850 GHz (0.350 mm) — note that the colour scales here are the same:

At low and high frequencies, these are dominated by the emission of our own galaxy, and there is at least some contamination over the whole range, so it takes hard work to separate the primordial CMB signal from the dirty (but interesting) astrophysics along the way. In fact, it’s sufficiently challenging that the team uses four different methods, each with different assumptions, to do so, and the results agree remarkably well.

In fact, we don’t use the above CMB image directly to do the main cosmological science. Instead, we build a Bayesian model of the data, combining our understanding of the foreground astrophysics and the cosmology, and marginalise over the astrophysical parameters in order to extract as much cosmological information as we can. (The formalism is described in the Planck likelihood paper, and the main results of the analysis are in the Planck cosmological parameters paper.)

The main tool for this is the power spectrum, a plot which shows us how the different hot and cold spots on our CMB map are distributed. In this plot, the left-hand side (low ℓ) corresponds to large angles on the sky and high ℓ to small angles. Planck’s results are remarkable for covering this whole range from ℓ=2 to ℓ=2500: the previous CMB satellite, WMAP, had a high-quality spectrum out to ℓ=750 or so; ground- and balloon-based experiments like SPT and ACT filled in some of the high-ℓ regime.

It’s worth marvelling at this for a moment, a triumph of modern cosmological theory and observation: our theoretical models fit our data from scales of 180° down to 0.1°, each of those bumps and wiggles a further sign of how well we understand the contents, history and evolution of the Universe. Our high-quality data has refined our knowledge of the cosmological parameters that describe the universe, decreasing the error bars by a factor of several on the six parameters that describe the simplest ΛCDM universe. Moreover, and maybe remarkably, the data don’t seem to require any additional parameters beyond those six: for example, despite previous evidence to the contrary, the Universe doesn’t need any additional neutrinos.

The quantity most well-measured by Planck is related to the typical size of spots in the CMB map; it’s about a degree, with an error of less than one part in 1,000. This quantity has changed a bit (by about the width of the error bar) since the previous WMAP results. This, in turn, causes us to revise our estimates of quantities like the expansion rate of the Universe (the Hubble constant), which has gone down, in fact by enough that it’s interestingly different from its best measurements using local (non-CMB) data, from more or less direct observations of galaxies moving away from us. Both methods have disadvantages: for the CMB, it’s a very indirect measurement, requiring imposing a model upon the directly measured spot size (known more technically as the “acoustic scale” since it comes from sound waves in the early Universe). For observations of local galaxies, it requires building up the famous cosmic distance ladder, calibrating our understanding of the distances to further and further objects, few of which we truly understand from first principles. So perhaps this discrepancy is due to messy and difficult astrophysics, or perhaps to interesting cosmological evolution.

This change in the expansion rate is also indirectly responsible for the results that have made the most headlines: it changes our best estimate of the age of the Universe (slower expansion means an older Universe) and of the relative amounts of its constituents (since the expansion rate is related to the geometry of the Universe, which, because of Einstein’s General Relativity, tells us the amount of matter).

But the cosmological parameters measured in this way are just Planck’s headlines: there is plenty more science. We’ve gone beyond the power spectrum above to put limits upon so-called non-Gaussianities which are signatures of the detailed way in which the seeds of large-scale structure in the Universe were initially laid down. We’ve observed clusters of galaxies which give us yet more insight into cosmology (and which seem to show an intriguing tension with some of the cosmological parameters). We’ve measured the deflection of light by gravitational lensing. And in work that I helped lead, we’ve used the CMB maps to put limits on some of the ways in which our simplest models of the Universe could be wrong, possibly having an interesting topology or rotation on the largest scales.

But because we’ve scrutinised our data so carefully, we have found some peculiarities which don’t quite fit the models. From the days of COBE and WMAP, there has been evidence that the largest angular scales in the map, a few degrees and larger, have some “anomalies” — some of the patterns show strange alignments, some show unexpected variation between two different hemispheres of the sky, and there are some areas of the sky that are larger and colder than is expected to occur in our theories. Individually, any of these might be a statistical fluke (and collectively they may still be) but perhaps they are giving us evidence of something exciting going on in the early Universe. Or perhaps, to use a bad analogy, the CMB map is like the Zapruder film: if you scrutinise anything carefully enough, you’ll find things that look like a conspiracy, but turn out to have an innocent explanation.

I’ve mentioned eight different Planck papers so far, but in fact we’ve released 28 (and there will be a few more to come over the coming months, and many in the future). There’s an overall introduction to the Planck Mission, and papers on the data processing, observations of relatively nearby galaxies, and plenty more cosmology. The papers have been submitted to the journal A&A, they’re available on the ArXiV, and you can find a list of them at the ESA site.

Even more important for my cosmology colleagues, we’ve released the Planck data, as well, along with the necessary code and other information necessary to understand it: you can get it from the Planck Legacy Archive. I’m sure we’ve only just begun to get exciting and fun science out of the data from Planck. And this is only the beginning of Planck’s data: just the first 15 months of observations, and just the intensity of the CMB: in the coming years we’ll be analysing (and releasing) more than one more year of data, and starting to dig into Planck’s observations of the polarized sky.

### Tim Gowers — A little paradox

This post is intended as a footnote to one that I wrote a couple of years ago about the meaning of “implies” in mathematics, which was part of a series of posts designed as an introduction to certain aspects of university mathematics.

If you are reasonably comfortable with the kind of basic logic needed in an undergraduate course, then you may enjoy trying to find the flaw in the following argument, which must have a flaw, since I’m going to prove a general statement and then give a counterexample to it. If you find the exercise extremely easy, then you may prefer to hold back so that others who find it harder will have a chance to think about it. Or perhaps I should just say that if you don’t find it easy, then I think it would be a good exercise to think about it for a while before looking at other people’s suggested solutions.

First up is the general statement. In fact, it’s a very general statement. Suppose you are trying to prove a statement $Q$ and you have a hypothesis $\forall x\in X\ P(x)$ to work with. In other words, you are trying to prove the statement

$(\forall x\in X\ P(x))\implies Q\ .$

Now if $R$ and $S$ are two statements, then $R\implies S$ is true if and only if either $R$ is false or $S$ is true. Hence what we are trying to prove can be rewritten as follows.

$(\neg(\forall x\in X\ P(x)))\vee Q\ .$

Now we can bring the $\neg$ inside the $\forall$ as long as we convert the $\forall$ into $\exists$, so let’s do that. What we want to prove becomes this.

$(\exists x\in X\ \neg(P(x)))\vee Q\ .$

I’ll assume here that we haven’t done something foolish and given the name $x$ to one of the variables involved in the statement $Q$. So now I’m going to use the general rule that $(\exists x\in X\ R(x))\ \vee\ Q$ is equivalent to $\exists x\in X(R(x)\ \vee\ Q)$ to rewrite what we want to prove as the following.

$\exists x\in X\ (\neg(P(x))\ \vee\ Q)\ .$

Finally, let’s rewrite what’s inside the brackets using the $\implies$ sign.

$\exists x\in X\ (P(x)\implies Q)\ .$

Every single step I took there was a logical equivalence, so the conclusion is that if you want to show that $\forall x\in X\ P(x)$ implies $Q$, your task is the same as that of finding a single $x\in X$ such that $P(x)\implies Q$.

Now let me give a counterexample to that useful logical principle. Let $X$ be a set of real numbers. Define the diameter of $X$ to be $\sup\{|u-v|:u,v\in X\}$. I’ll write it $\mathrm{diam}(X)$.

Consider the following implication.

$(\forall x\in X\ |x|\leq 1)\implies\mathrm{diam}(X)\leq 2\ .$

That is clearly correct: if every element of $X$ has modulus at most 1, then $X$ is contained in the interval $[-1,1]$, so clearly can’t have diameter greater than 2.

But then, by the logical principle just derived, there must be a single element of $X$ such that if that element has modulus at most 1, then the diameter of $X$ is at most 2. In other words,

$\exists x\in X\ (|x|\leq 1\implies\mathrm{diam}(X)\leq 2)\ .$

But that is clearly nonsense. If all we know is that one particular element of $X$ has modulus at most 1, it can’t possibly imply that $X$ has diameter at most 2.

What has gone wrong here? If you can give a satisfactory answer, then you will have a good grasp of what mathematicians mean by “implies”.

### Matt Strassler — The Guardian’s Level-Headed Article on Fukushima

[Note: If you missed Wednesday evening's discussion of particle physics involving me, Sean Carroll and Alan Boyle, you can listen to it here.]

I still have a lot of work to do before I can myself write intelligently about the Fukushima Daiichi nuclear plant, and the nuclear accident and cleanup that occurred there. (See here and here for a couple of previous posts about it.) But I did want to draw your attention to one of the better newspaper articles that I’ve seen written about it, by Ian Sample at the Guardian. I can’t vouch for everything that Sample says, but given what I’ve read and investigated myself, I think he finds the right balance. He’s neither scaring people unnecessarily, nor reassuring them that everything will surely be just fine and that there’s no reason to be worried about anything. From what I know and understand, the situation is more or less just as serious and worthy of concern as Sample says it is; but conversely, I don’t have any reason to think it is much worse than what he describes.

Meanwhile, just as I don’t particularly trust anything said by TEPCO, the apparently incompetent and corrupt Japanese power company that runs and is trying to clean up the Fukushima plant, I’m also continuing to see lots of scary articles — totally irresponsible — written by people who should know better but seem bent upon frightening the public. The more wild the misstatements and misleading statements, the better, it seems.

One example of this kind of fear-mongering is to be found here: http://truth-out.org/news/item/19547-fukushima-a-global-threat-that-requires-a-global-response, by Kevin Zeese and Margaret Flowers. It’s one piece of junk after the next: the strategy is to take a fact, take another unrelated fact, quote a non-expert (or quote an expert out of context), stick them all together, and wow! frightening!! But here’s the thing: An experienced and attentive reader will know, after a few paragraphs, to ignore this article. Why?

Because it never puts anything in context. “When contact with radioactive cesium occurs, which is highly unlikely, a person can experience cell damage due to radiation of the cesium particles. Due to this, effects such as nausea, vomiting, diarrhea and bleeding may occur. When the exposure lasts a long time, people may even lose consciousness. Coma or even death may then follow. How serious the effects are depends upon the resistance of individual persons and the duration of exposure and the concentration a person is exposed to, experts say.” Well, how much cesium are we talking about here? Lots or a little? Ah, they don’t tell you that. [The answer: enormous amounts. There's no chance of you getting anywhere near that amount of exposure unless you yourself go wandering around on the Fukushima grounds, and go some place you're really not supposed to go. This didn't even happen to the workers who were at the Fukushima plant when everything was at its worst in March 2011. Even if you ate a fish every week from just off Japan that had a small amount of cesium in it, this would not happen to you.]

Because it makes illogical statements. “Since the accident at Fukushima on March 11, 2011, three reactor cores have gone missing.” Really? Gone missing? Does that make sense? Well then, why is so much radioactive cooling water — which is mentioned later in the article — being stored up at the Fukushima site? Isn’t that water being used to keep those cores cool? And how could that happen if the cores were missing? [The cores melted; it's not known precisely what shape they are in or precisely how much of each is inside or outside the original containment vessel, but they're being successfully cooled by water, so it's clear roughly where they are. They're not "missing"; that's a wild over-statement.]

Because the authors quote people without being careful to explain clearly who they are. “Harvey Wasserman, who has been working on nuclear energy issues for over 40 years,…” Is Harvey Wasserman a scientist or engineer? No. But he gets lots of press in this article (and elsewhere). [Wikipedia says: "Harvey Franklin Wasserman (born December 31, 1945) is an American journalist, author, democracy activist, and advocate for renewable energy. He has been a strategist and organizer in the anti-nuclear movement in the United States for over 30 years." I have nothing against Mr. Wasserman and I personally support both renewable energy and the elimination of nuclear power. But as far as I know, Wasserman has no scientific training, and is not an expert on cleaning up a nuclear plant and the risks thereof... and he's an anti-nuclear activist, so you do have to worry he's going to make things sound worse than they are. Always look up the people being quoted!]

Because the article never once provides balance or nuance: absolutely everything is awful, awful, awful. I’m sorry, but things are never that black and white, or rather, black and black. There are shades of gray in the real world, and it’s important to tease them out a little bit. There are eventualities that would be really terrible, others that would be unfortunate, still others that would merely be a little disruptive in the local area — and they’re not equally bad, nor are they equally likely. [I don't get any sense that the authors are trying to help their readers understand; they're just bashing the reader over the head with one terrifying-sounding thing after another. This kind of article just isn't credible.]

The lesson: one has to be a critical, careful reader, and read between the lines! In contrast to Sample’s article in the Guardian, the document by Zeese and Flowers is not intended to inform; it is intended to frighten, period. I urge you to avoid getting your information from sources like that one. Find reliable, sensible people — Ian Sample is in that category, I think — and stick with them. And I would ignore anything Zeese and Flowers have to say in the future; people who’d write an article like theirs have no credibility.

### Jacques Distler — G2 and Spin(8) Triality

Oscar Chacaltana, Yuji Tachikawa and I are deep in the weeds of nilpotent orbits. One of the things we had to study were the nilpotent orbits of $\mathfrak{g}_2$, and how they sit in $\mathfrak{so}(8)$. Understanding the answer involves an explicit description of $Spin(8)$ triality, which I thought was kinda cute. Few people will care about the nilpotent orbits, but the bit about triality and $G_2$ might be of some independent interest. So here it is.

$Spin(8)$ has a triality symmetry (an outer automorphism of the Lie algebra), which permutes the three 8-dimensional irreducible representations: $8_v$, $8_s$, and $8_c$. $\mathfrak{g}_2\subset \mathfrak{so}(8)$ is the invariant subalgebra. (I’ll conveniently pass back and forth between the complex form of the Lie algebra and the compact real form of the group, as both are of interest to us.) What I want to do is describe that triality symmetry very explicitly and, thereby, the realization of $\mathfrak{g}_2$. Note that $Spin(8)$ contains an $({SU(2)}^4)/\mathbb{Z}_2$ subgroup, under which the adjoint decomposes as $28 = (3,1,1,1)+(1,3,1,1)+(1,1,3,1)+(1,1,1,3)+(2,2,2,2)$

Under this decomposition, the action of triality is easy to describe: pick one of the $\mathfrak{sl}(2)$ subalgebras to hold fixed, and consider all permutations of the other three (supplemented by the obvious action on the $(2,2,2,2)$).

That’s triality. Looked at this way, it seems absurdly simple. The above description gives a perfectly concrete action of triality, as permutations of the generators. And we can push a little harder, and really understand $\mathfrak{g}_2$, this way.

The subalgebra, invariant under the $S_3$ permutations, is $\mathfrak{g}_2\subset \mathfrak{so}(8)$, under which

$28 = 14 + 7 \otimes V$

where $V$ is the 2-dimensional irreducible representation of $S_3$. In terms of our previous decomposition,

$G_2 \supset (SU(2)\times {SU(2)}_D)/\mathbb{Z}_2$

where the first $SU(2)$ is the one you kept fixed, and ${SU(2)}_D$ is the diagonal $SU(2)$ of the three which are permuted by triality. Under this embedding,

$\begin{split} 14 &= (3,1)+(1,3)+(2,4)\\ 7 &= (1,3) + (2,2) \end{split}$

An explicit basis of antisymmetric $8\times 8$ matrices which give this $\mathfrak{g}_2$ subalgebra is as follows. First, we embed ${\mathfrak{sl}(2)}^4$ by taking the $8\times 8$ matrices to be block-diagonal, with each $4\times 4$ block containing an ${\mathfrak{sl}(2)}^2$, as

$\begin{matrix} \begin{split} H_L &= \sigma_2 \otimes \mathbb{1}\\ X_L &= \tfrac{1}{2} (\sigma_3+i\sigma_1) \otimes \sigma_2\\ Y_L &= \tfrac{1}{2} (\sigma_3-i\sigma_1) \otimes \sigma_2 = X_L^\dagger \end{split}&,\qquad\qquad \begin{split} H_R &= \mathbb{1} \otimes \sigma_2\\ X_R &= \tfrac{1}{2} \sigma_2 \otimes (\sigma_3+i\sigma_1)\\ Y_R &= \tfrac{1}{2} \sigma_2 \otimes (\sigma_3-i\sigma_1) = X_R^\dagger \end{split} \end{matrix}$

where we’ve chosen the normalization conventions

$\begin{split} [X,Y]&=H\\ [H,X]&=2X\\ [H,Y]&=-2Y \end{split}$
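As a quick sanity check (my own sketch, not part of the original post), one can build the two $4\times 4$ blocks above out of Pauli matrices in numpy and verify the normalization conventions directly:

```python
# Numerical check (illustrative sketch): the (H, X, Y) triples defined above,
# built from Pauli matrices, satisfy [X,Y] = H, [H,X] = 2X, [H,Y] = -2Y,
# and the two sl(2)'s commute with each other.
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
one = np.eye(2)

def comm(a, b):
    return a @ b - b @ a

HL = np.kron(s2, one)
XL = 0.5 * np.kron(s3 + 1j * s1, s2)
YL = XL.conj().T
HR = np.kron(one, s2)
XR = 0.5 * np.kron(s2, s3 + 1j * s1)
YR = XR.conj().T

for H, X, Y in [(HL, XL, YL), (HR, XR, YR)]:
    assert np.allclose(comm(X, Y), H)
    assert np.allclose(comm(H, X), 2 * X)
    assert np.allclose(comm(H, Y), -2 * Y)

assert np.allclose(comm(HL, HR), 0) and np.allclose(comm(XL, YR), 0)
print("sl(2) x sl(2) commutation relations verified")
```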

We pick one of these (the ${\mathfrak{sl}(2)}_L$ in the upper left-hand block) to hold fixed, and embed our second $\mathfrak{sl}(2)$ diagonally in the other three:

$\begin{split} H_1 &= \tfrac{1}{2} (\mathbb{1}+\sigma_3)\otimes \sigma_2 \otimes 1\\ X_1 &= \tfrac{1}{4} (\mathbb{1}+\sigma_3)\otimes (\sigma_3+i\sigma_1)\otimes\sigma_2\\ Y_1 &= X_1^\dagger\\ H_2 &= \tfrac{1}{2} (\mathbb{1}-\sigma_3)\otimes \sigma_2 \otimes \mathbb{1} + \mathbb{1}\otimes \mathbb{1} \otimes \sigma_2\\ X_2 &= \tfrac{1}{4} (\mathbb{1}-\sigma_3)\otimes (\sigma_3+i\sigma_1)\otimes \sigma_2 + \tfrac{1}{2} 1\otimes \sigma_2\otimes (\sigma_3+i\sigma_1)\\ Y_2 &= X_2^\dagger \end{split}$

The highest weight of the $(2,4)$ is

$S_{1,3} = \tfrac{1}{4} \sigma_2\otimes (\sigma_3+i\sigma_1)\otimes (\sigma_3+i\sigma_1)$

The remaining ones, e.g., $S_{-1,3} = [Y_1, S_{1,3}]$, are obtained by acting with the lowering operators, $Y_{1,2}$. With this choice of Cartan, the simple roots of $\mathfrak{g}_2$ correspond to $X_2$ (short root) and $S_{1,-3}$ (long root).


This 8-dimensional representation of $G_2$, as it’s reducible, is not the most convenient one for studying the representation theory of $G_2$. But it’s tailor-made for our purpose, which is understanding the embedding in $Spin(8)$. With an explicit embedding in hand, we can manufacture a distinguished triple, $(H,X,Y)$ for each nilpotent orbit of $\mathfrak{g}_2$, and see how it sits in $\mathfrak{so}(8)$. But that, probably, holds little interest for the general reader, so I’ll end here.

### Jacques Distler — Normal Coordinate Expansion

I’ve been spending several weeks at the Simons Center for Geometry and Physics. Towards the end of my stay, I got into a discussion with Tim Nguyen, about Ricci flow and nonlinear $\sigma$-models. He’d been reading Friedan’s PhD thesis, alongside [Kevin Costello’s book](http://www.ams.org/bookstore-getitem/item=SURV-170). So I pointed him to some old notes of mine on the normal coordinate expansion, a key ingredient in renormalizing nonlinear $\sigma$-models, using the background-field method.

Then it occurred to me that the internet would be a much more useful place for those notes. So, since I have some time to kill, in JFK, here they are.

Let $\phi:\Sigma\to M$ be a map from the worldsheet, $\Sigma$, into a Riemannian manifold, $(M,g)$. The NL$\sigma$M action is
(1)$S= \int_\Sigma \tfrac{1}{2}(\phi^*g)(\partial_\mu,\partial_\mu) d^2x$
The Normal Coordinate Expansion is a particularly nice parametrization of fluctuations about the classical $\sigma$-model field $\phi$. I’ll use index-free notation, wherever possible. The Levi-Civita connection, $\nabla$, on $M$ is torsion-free and metric-compatible,
(2)$\begin{split} \nabla_X Y-\nabla_Y X-[X,Y]&\equiv T(X,Y)=0\\ X(g(Y,Z))&=g(\nabla_X Y,Z)+g(Y,\nabla_X Z) \end{split}$
The Riemann curvature tensor is
(3)$R(X,Y)=\nabla_X\nabla_Y-\nabla_Y \nabla_X-\nabla_{[X,Y]}$
Consider a 1-parameter family of such $\sigma$-model maps, $\phi_t:\Sigma\to M$, $t\in[0,1]$ with $\phi_0=\phi$, our original $\sigma$-model map. We can equally-well think about this family as a map $\hat\phi:\Sigma\times[0,1]\to M$, with
(4)$\hat\phi(x,t)=\phi_t(x)$
Given $\hat\phi$, we can pull back the connection, $\nabla$, on $M$ to a connection $\hat \nabla$ on $\Sigma\times [0,1]$. We don’t want to choose any old 1-parameter family, though. Let $\xi(x)$ be a section of the pullback tangent bundle, $\phi^*TM$. We wish to choose $\hat\phi$ so that we can extend $\xi(x)$ to $\xi(x,t)$ such that
(5)$\begin{split} \xi(x,t)&=\hat\phi_*\tfrac{\partial}{\partial t}\\ \hat\nabla_{\partial_t}\xi&=0 \end{split}$
How do we achieve this? The idea is that, given a point $x\in\Sigma$, $\xi(x)$ gives a tangent vector to $M$ at the point $\phi(x)$. For this “initial condition”, we solve the geodesic equation
(6)$\begin{gathered} \gamma:\, [0,1]\to M\\ \ddot{\gamma}^k + \Gamma_{ij}^k \dot{\gamma}^i\dot{\gamma}^j=0\\ \gamma(0)=\phi(x),\qquad \dot{\gamma}(0)=\xi(x) \end{gathered}$
and define the point $\phi_t(x)\in M$ to be $\gamma(t)$. This is guaranteed to be well-defined for small enough $t$. We can extend it to $t=1$ by considering $\xi$ to be sufficiently “small”. The extension of $\xi(x)$ to $\xi(x,t)$ is simply the one given by parallel-transporting $\xi$ along the curve $\gamma$, $\hat\nabla_{\partial_t}\xi=0$ or, with slight abuse of notation,
(7)$\nabla_\xi\xi=0$
(This is an abuse of notation because $\xi$ is not really a tensor on $M$. We typically have points $x,x'\in\Sigma$ with $\phi(x)=\phi(x')$ but $\xi(x)\neq\xi(x')$. This “mistake” will correct itself when we pull back to $\Sigma$.) Since $\bigl[\tfrac{\partial}{\partial t},\partial_\mu\bigr]=0$ and the connection $\nabla$ is torsion-free, we can always exchange
(8)$\hat\nabla_{\partial_t}v =\hat\nabla_{\partial_\mu}\xi$
where
(9)$v=\hat\phi_* \partial_\mu$
or, with the same abuse of notation,
(10)$\nabla_\xi v =\nabla_v \xi$
Taking another covariant derivative with respect to $t$, we get the equation of geodesic deviation [1] $\hat\nabla_{\partial_t}^2v =\hat\nabla_{\partial_t}\hat\nabla_{\partial_\mu}\xi =\hat R(\partial_t,\partial_\mu)\xi$ or, in our abusive notation,
(11)$\nabla_\xi\nabla_v \xi= R(\xi,v)\xi$
Now say we wish to evaluate the $t$-dependence of the pull-back of some tensor $T$ on $M$, $\phi_{t'}^* T = e^{t'\hat\nabla_{\partial_t}}(\hat\phi^*T)\vert_{t=0}$ which we can, again, write as
(12)$\phi^*(e^{t'\nabla_\xi} T) = \phi^*(T+t'\nabla_\xi T +\tfrac{{t'}^2}{2}\nabla^2_\xi T +\dots)$
We are all set to apply this to the $\sigma$-model Lagrangian,
(13)$\tfrac{1}{2}(\phi_t^*g)(\partial_\mu,\partial_\mu) = \phi_t^*(\tfrac{1}{2} g(v,v))$
We expand this using (12) and use (7), (10) and (11) to simplify the terms that result. Note that the $n^{th}$ order term in (12) is given by $\tfrac{1}{n}\nabla_\xi$ of the $(n-1)^{st}$ order term. The $0^{th}$ order term is
$\tfrac{1}{2}g(v,v)$
At first order, we get
$g(v,\nabla_\xi v)= g(v,\nabla_v \xi)$
Next comes
$\tfrac{1}{2}g(\nabla_\xi v,\nabla_v \xi)+\tfrac{1}{2}g(v,\nabla_\xi \nabla_v \xi)= \tfrac{1}{2}g(\nabla_v \xi,\nabla_v \xi) + \tfrac{1}{2}g(v,R(\xi,v) \xi)$
At $3^{rd}$ order, we get
$\begin{split} \tfrac{1}{6}[2g(\nabla_v \xi,R(\xi,v )\xi)+g(\nabla_v \xi,R(\xi,v)\xi)& +g(v,(\nabla_\xi R)(\xi,v)\xi) +g(v,R(\xi,\nabla_v\xi)\xi)]\\ &=\tfrac{1}{6} g(v,(\nabla R)(\xi,\xi,v)\xi) +\tfrac{2}{3}g(\nabla_v\xi,R(\xi,v)\xi) \end{split}$
where we used the symmetry of the Riemann tensor
(14)$g(U,R(V,W)Z)=g(W,R(Z,U)V)$
Finally, at $4^{th}$ order, we get
$\begin{split} \tfrac{1}{24}[g(\nabla_v \xi,(\nabla R)(\xi,\xi,v )\xi) &+g(v,(\nabla \nabla R)(\xi,\xi,\xi,v )\xi) +g(v,(\nabla R)(\xi,\xi,\nabla_v\xi)\xi) ]\\ &+\tfrac{1}{6}[g(R(\xi,v)\xi,R(\xi,v)\xi) +g(\nabla_v \xi,(\nabla R)(\xi,\xi,v)\xi) +g(\nabla_v \xi,R(\xi,\nabla_v \xi)\xi)]\\ =&\tfrac{1}{4} g(\nabla_v \xi,(\nabla R)(\xi,\xi,v)\xi) +\tfrac{1}{6}g(\nabla_v \xi,R(\xi,\nabla_v \xi)\xi)\\ &+\tfrac{1}{6}g(R(\xi,v)\xi,R(\xi,v)\xi) +\tfrac{1}{24}g(v,(\nabla \nabla R)(\xi,\xi,\xi,v )\xi) \end{split}$
and so forth. Assembling all of these, we obtain
(15)$\begin{split} \phi_t^*\tfrac{1}{2}g(v,v)=\phi^*[\tfrac{1}{2}g(v,v)& +g(v,\nabla_v \xi) +\tfrac{1}{2}g(\nabla_v \xi,\nabla_v \xi) + \tfrac{1}{2}g(v,R(\xi,v) \xi)\\ &+\tfrac{1}{6} g(v,(\nabla R)(\xi,\xi,v)\xi) +\tfrac{2}{3}g(\nabla_v\xi,R(\xi,v)\xi)\\ &+\tfrac{1}{4} g(\nabla_v \xi,(\nabla R)(\xi,\xi,v)\xi) +\tfrac{1}{6}g(\nabla_v \xi,R(\xi,\nabla_v \xi)\xi)\\ & +\tfrac{1}{6}g(R(\xi,v)\xi,R(\xi,v)\xi) +\tfrac{1}{24}g(v,(\nabla \nabla R)(\xi,\xi,\xi,v )\xi)] \end{split}$
Pulling back to $\Sigma$, we obtain the desired expansion of the $\sigma$-model lagrangian. In conventional notation, replace $v^i=\partial_\mu\phi^i$, $(\nabla_v\xi)^i=D_\mu \xi^i$ and $g(U,R(W,Z)V)=R_{ijkl}U^i V^j W^k Z^l$ to obtain the expressions found in Friedan [2] or Freedman et al [3].

Exercise 1: Compute the next term in the expansion, $\begin{split} \tfrac{1}{12}g(\nabla_v\xi,(\nabla R)(\xi,\xi,\nabla_v\xi)\xi) +\tfrac{2}{15}g(R(\xi,v)\xi,R(\xi,\nabla_v\xi)\xi) +\tfrac{1}{15}g(\nabla_v\xi,(\nabla\nabla R)(\xi,\xi,\xi,v)\xi)\\ +\tfrac{7}{60}g(R(\xi,v)\xi,(\nabla R)(\xi,\xi,v)\xi) +\tfrac{1}{120}g(v,(\nabla\nabla\nabla R)(\xi,\xi,\xi,\xi,v)\xi) \end{split}$

Exercise 2: Let $M$ be the $n$-sphere, $S^n$, with the round metric, $ds^2= \frac{4 r^2(\mu) dx^i dx^i}{(1+|\mathbf{x}|^2)^2}$ Show that the solution to the one-loop $\beta$-function equation is $r^2(\mu) = r^2(\mu_0) + \frac{n-1}{4\pi}\log(\mu/\mu_0)$
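As a warm-up for Exercise 2 (my own sketch, and only for the $n=2$ case), one can check with sympy that the metric above really is the round sphere of radius $r$, i.e. that its Ricci tensor obeys $R_{ij}=\tfrac{n-1}{r^2}g_{ij}$, which is the curvature that feeds into the one-loop $\beta$-function.

```python
# sympy sketch (illustrative, n = 2 only): the conformal metric of Exercise 2
# has Ricci tensor R_ij = (n-1)/r^2 g_ij, i.e. it is the round sphere of radius r.
import sympy as sp

x, y = sp.symbols('x y', real=True)
r = sp.symbols('r', positive=True)
coords = [x, y]
dim = 2

conf = 4 * r**2 / (1 + x**2 + y**2)**2          # ds^2 = conf * (dx^2 + dy^2)
g = sp.diag(conf, conf)
ginv = g.inv()

# Christoffel symbols Gamma^k_{ij}
Gamma = [[[sum(ginv[k, l] * (sp.diff(g[l, i], coords[j])
                             + sp.diff(g[l, j], coords[i])
                             - sp.diff(g[i, j], coords[l])) / 2
               for l in range(dim))
           for j in range(dim)]
          for i in range(dim)]
         for k in range(dim)]

# Ricci tensor R_ij = d_k Gamma^k_ij - d_j Gamma^k_ik + Gamma^k_kl Gamma^l_ij - Gamma^k_jl Gamma^l_ik
def ricci(i, j):
    expr = 0
    for k in range(dim):
        expr += sp.diff(Gamma[k][i][j], coords[k]) - sp.diff(Gamma[k][i][k], coords[j])
        for l in range(dim):
            expr += Gamma[k][k][l] * Gamma[l][i][j] - Gamma[k][j][l] * Gamma[l][i][k]
    return sp.simplify(expr)

for i in range(dim):
    for j in range(dim):
        assert sp.simplify(ricci(i, j) - (dim - 1) / r**2 * g[i, j]) == 0
print("R_ij = (n-1)/r^2 g_ij verified for the round S^2")
```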
[1] S. Weinberg, Gravitation and Cosmology, (Wiley, 1972) p. 148.
[2] D. H. Friedan, “Nonlinear Models in $2+\epsilon$ Dimensions,” Ann. Phys. 163 (1985) 318.
[3] L. Alvarez-Gaume, D. Z. Freedman and S. Mukhi, “The Background Field Method and the Ultraviolet Structure of the Supersymmetric Nonlinear Sigma Model,” Ann. Phys. 134 (1981) 85.

### Jacques Distler — Uncertainty

#### Update (10/18/2012) — Mea Culpa:

Sonia pointed out to me that my (mis)interpretation of Ozawa was too charitable. We ended up (largely due to Steve Weinberg’s encouragement) writing a paper. So… where does one publish simple-minded (but, apparently, hitherto unappreciated) remarks about elementary Quantum Mechanics?

Sonia was chatting with me about this PRL (arXiv version), which seems to have made a splash in the news media and in the blogosphere. She couldn’t make heads or tails of it and (as you will see), I didn’t do much better. But I thought that I would take the opportunity to lay out a few relevant remarks.

Since we’re going to be talking about the Uncertainty Principle, and measurements, it behoves us to formulate our discussion in terms of density matrices.

A quantum system is described in terms of a density matrix, $\rho$, which is a self-adjoint, positive-semidefinite trace-class operator, satisfying

$Tr(\rho)=1,\qquad Tr(\rho^2) \leq 1$

In the Schrödinger picture (which we will use), it evolves unitarily in time

(1)$\rho(t_2) = U(t_2,t_1) \rho(t_1) {U(t_2,t_1)}^{-1}$

except when a measurement is made.

Consider a self-adjoint operator $A$ (an “observable”). We will assume that $A$ has a pure point spectrum, and let $P^{(A)}_i$ be the projection onto the $i^{\text{th}}$ eigenspace of $A$.

When we measure $A$, quantum mechanics computes for us

1. A classical probability distribution for the values on the readout panel of the measuring apparatus. The moments of this probability distribution are computed by taking traces. The $n^{\text{th}}$ moment is ${\langle A^n\rangle} = Tr(A^n\rho)$ In particular, the variance is ${(\Delta A)}^2 = Tr(A^2\rho) - {\left(Tr(A\rho)\right)}^2$
2. A change (which, under the assumptions stated, can be approximated as occurring instantaneously) in the density matrix,
(2)$\rho_{\text{after}}\equiv \hat{\rho}(\rho,A) = \sum_i P^{(A)}_i \rho P^{(A)}_i$

Thereafter, the system, described by the new density matrix, $\hat{\rho}$, again evolves unitarily, according to (1).

The new density matrix, $\hat{\rho}$, after the measurement1, can be completely characterized by two properties

1. All of the moments of $A$ are the same as before $\langle A^n\rangle = Tr(A^n \hat{\rho}(\rho,A))= Tr(A^n \rho)$ (In particular, $\Delta A$ is unchanged.) Moreover, for any observable, $C$, which commutes with $A$ ( $[C,A]=0$ ), $\langle C^n\rangle = Tr(C^n \hat{\rho}(\rho,A))= Tr(C^n \rho)$
2. However, the measurement has destroyed all interference between the different eigenspaces of $A$ $Tr([A,B] \hat{\rho}(\rho,A)) = 0, \qquad \forall\, B$

Note that it is really important that I have assumed a pure point spectrum. If $A$ has a continuous spectrum, then you have to deal with complications both physical and mathematical. Mathematically, you need to deal with the complications of the Spectral Theorem; physically, you have to put in finite detector resolutions, in order to make proper sense of what a “measurement” does. I’ll explain, later, how to deal with those complications.
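Before moving on, here is a minimal numerical sketch (my own illustration, not from the original post) of the two properties above, for a qubit observable, where the spectrum is certainly pure point:

```python
# Illustrative check: projective measurement update rho_hat = sum_i P_i rho P_i
# for A = sigma_z on a qubit. Property 1: all moments of A are unchanged.
# Property 2: Tr([A,B] rho_hat) = 0 for every B.
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

a = np.array([0.3, -0.5, 0.4])                      # a generic mixed state in the Bloch ball
rho = 0.5 * (I2 + a[0] * sx + a[1] * sy + a[2] * sz)

A = sz
projectors = [np.diag([1, 0]).astype(complex),       # eigenspaces of sigma_z
              np.diag([0, 1]).astype(complex)]
rho_hat = sum(P @ rho @ P for P in projectors)

# Property 1: moments of A are the same before and after the measurement
for n in range(1, 4):
    An = np.linalg.matrix_power(A, n)
    assert np.isclose(np.trace(An @ rho), np.trace(An @ rho_hat))

# Property 2: interference between the eigenspaces of A has been destroyed
for B in (sx, sy, sz, I2):
    assert np.isclose(np.trace((A @ B - B @ A) @ rho_hat), 0.0)
print("both defining properties of rho_hat verified")
```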

Now consider two such observables, $A$ and $B$. The Uncertainty Principle gives a lower bound on the product

(3)${(\Delta A)}_\rho {(\Delta B)}_\rho \geq \tfrac{1}{2}\left|Tr(-i[A,B]\rho)\right|$

in any state, $\rho$. (Exercise: Generalize the usual proof, presented for “pure states”, to the case of density matrices.)

As stated, (3) is not a statement about the uncertainties in any actual sequence of measurements. After all, once you measure $A$, in state $\rho$, the density matrix changes, according to (2), to

(4)$\hat{\rho}(\rho,A) = \sum_i P^{(A)}_i \rho P^{(A)}_i$

so a subsequent measurement of $B$ is made in a different state from the initial one.

The obvious next thing to try is to note that, since the uncertainty of $A$ in the state $\hat{\rho}(\rho,A)$ is the *same* as in the state $\rho$, and since we are measuring $B$ in the state $\hat{\rho}(\rho,A)$, we can apply the Uncertainty Relation, (3), in the state $\hat{\rho}$, instead of in the state, $\rho$. Unfortunately, $Tr([A,B]\hat{\rho}(\rho,A))=0$, so this leads to an uninteresting lower bound on the product of the uncertainties

(5)$(\Delta A) (\Delta B) = {(\Delta A)}_{\hat{\rho}(\rho,A)} {(\Delta B)}_{\hat{\rho}(\rho,A)} \geq 0$

for a measurement of $A$ immediately followed by a measurement of $B$.

It is, apparently, possible to derive a better lower bound on the product of the uncertainties of successive measurements (which is still, of course, weaker than the “naïve” $\tfrac{1}{2}\left|Tr(-i[A,B]\rho)\right|$, which is what you might have guessed for the lower bound, had you not thought about what (3) means). But I don’t know how to even state that result at the level of generality of the above discussion.

Instead, I’d like to discuss how one treats measurements, when $A$ doesn’t have a pure point spectrum. When it’s discussed at all, it’s treated very poorly in the textbooks.

#### Measuring Unbounded Operators

Let’s go straight to the worst case: an unbounded operator with $Spec(A)=\mathbb{R}$. Such an operator has no eigenvectors at all. What happens when we measure such an observable? Clearly, the two conditions which characterized the change in the density matrix, in the case of a pure point spectrum,

1. $Tr(A^n \hat{\rho}(\rho,A))= Tr(A^n \rho)$
2. $Tr([A,B] \hat{\rho}(\rho,A)) = 0, \qquad \forall\, B$

are going to have to be modified. The second condition clearly can’t hold for all choices of $B$, in the unbounded case (think $A=\mathbf{x}$ and $B=\mathbf{p}$). As to the first condition, we might *hope* that the moments of the classical probability distribution for the observed measurements of $A$ would be calculated by taking traces with the density matrix, $\hat{\rho}$. But that probability distribution *depends* on the resolution of the detector, something which the density matrix, $\rho$, knows nothing about.

To keep things simple, let’s specialize to $\mathcal{H}=\mathcal{L}^2(\mathbb{R})$ and $A=\mathbf{x}$. Let’s imagine a detector which can measure the particle’s position with a resolution, $\sigma$. Let’s define a projection operator

$P_{x_0}:\quad \psi(x) \mapsto \int\frac{d y}{\sigma\sqrt{\pi}} e^{-\left( (x-x_0)^2+(y-x_0)^2\right)/2\sigma^2}\psi(y)$

which reflects the notion that our detector has measured the position to be $x_0$, to within an accuracy $\sigma$. Here, I’ve chosen a Gaussian; but really any acceptance function peaked at $x=x_0$ and dying away sufficiently fast away from $x_0$ will do (and may more accurately reflect the properties of your actual detector).

But I only know how to do Gaussian integrals, so this one is a convenient choice.

This is, indeed, a projection operator: $P_{x_0}P_{x_0} = P_{x_0}$. But integrating over $x_0$ doesn’t quite give the completeness relation one would want

$\int \frac{d x_0}{2\sigma\sqrt{\pi}} P_{x_0} \neq \mathbb{1}$

$\int \frac{d x_0}{2\sigma\sqrt{\pi}} P_{x_0}:\quad \psi(x) \mapsto \int \frac{d u}{2\sigma\sqrt{\pi}} e^{-u^2/4\sigma^2} \psi(x+u)$

Rather than getting $\psi(x)$ back, we get $\psi(x)$ smeared against a Gaussian.

To fix this, we need to consider a more general class of projection operators (here, again, the Gaussian acceptance function proves very convenient):

(6)$P_{x_0,k_0}:\quad \psi(x)\mapsto \int \frac{d y}{\sigma \sqrt{\pi}}\exp\left[-\left((x-x_0)^2 + (y-x_0)^2\right)/2\sigma^2 +i k_0(x-y)\right] \psi(y)$

These are still projection operators,

$P_{x_0,k_0} P_{x_0,k_0} = P_{x_0,k_0}$

But now they obey the completeness relation

(7)$\int \frac{d x_0 d k_0}{2\pi} P_{x_0,k_0} =\mathbb{1}$

so we can now assert that the density matrix after measuring $\mathbf{x}$ is

(8)$\hat{\rho} = \int \frac{d x_0 d k_0}{2\pi} P_{x_0,k_0} \rho P_{x_0,k_0}$

If $\rho$ is represented by the integral kernel, $K(x,y)$:

$\rho:\quad \psi(x) \mapsto \int d y\, K(x,y) \psi(y)$

then the new density matrix, $\hat{\rho}$ is represented by the integral kernel

$\hat{K}(x,y) = e^{-(x-y)^2/2\sigma^2}\int \frac{d u}{\sigma \sqrt{2\pi}} e^{-u^2/2\sigma^2} K(x+u,y+u)$

Here we see clearly that it has the desired properties:

1. The off-diagonal terms are suppressed; $\hat{K}(x,y)\to 0$ for $|x-y|\gg \sigma$.
2. The near-diagonal terms are smeared by a Gaussian, representing the finite resolution of the detector.

Moreover, the moments of the probability distribution for the measured value of $x$ are given by taking traces with $\hat{\rho}$:

$\langle x^n\rangle = Tr(\mathbf{x}^n \hat{\rho})$

One easily computes

$\begin{split} Tr(\mathbf{x} \hat{\rho})&=Tr(\mathbf{x} \rho)\\ Tr(\mathbf{x}^2 \hat{\rho})&=\sigma^2+ Tr(\mathbf{x}^2 \rho)\\ \end{split}$

So the intrinsic quantum-mechanical uncertainty of the position, $x$, in the state, $\rho$, adds in quadrature with the systematic uncertainty of the measuring apparatus to produce the measured uncertainty

$(\Delta x)^2_{\text{measured}} = (\Delta x)^2_{\hat{\rho}} = (\Delta x)^2_{\rho} + \sigma^2$

exactly as we expect.
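Here is a small symbolic check of that statement (my own sketch, assuming the kernel $\hat{K}$ above): start from a pure Gaussian wave packet of width $w$ and verify that the second moment of $\mathbf{x}$ computed with $\hat{\rho}$ is larger than the one computed with $\rho$ by exactly $\sigma^2$.

```python
# Illustrative sympy check: for rho a pure Gaussian wave packet of width w,
# the post-measurement kernel K_hat gives Tr(x^2 rho_hat) = Tr(x^2 rho) + sigma^2.
import sympy as sp

x, y, u = sp.symbols('x y u', real=True)
w, sigma = sp.symbols('w sigma', positive=True)

psi = (sp.pi * w**2)**sp.Rational(-1, 4) * sp.exp(-x**2 / (2 * w**2))
K = psi * psi.subs(x, y)                       # kernel K(x,y) of the pure state rho

# K_hat(x,y) = exp(-(x-y)^2/2sigma^2) int du/(sigma sqrt(2 pi)) exp(-u^2/2sigma^2) K(x+u,y+u)
smeared = sp.integrate(K.subs([(x, x + u), (y, y + u)], simultaneous=True)
                       * sp.exp(-u**2 / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi)),
                       (u, -sp.oo, sp.oo))
K_hat = sp.exp(-(x - y)**2 / (2 * sigma**2)) * smeared

x2_before = sp.integrate(x**2 * K.subs(y, x), (x, -sp.oo, sp.oo))
x2_after = sp.integrate(x**2 * K_hat.subs(y, x), (x, -sp.oo, sp.oo))

assert sp.simplify(x2_after - (x2_before + sigma**2)) == 0
print("Tr(x^2 rho_hat) = Tr(x^2 rho) + sigma^2 verified")
```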

There’s one feature of this Gaussian measuring apparatus which is a little special. Of course, we expect that measuring $x$ should change the distribution for values of $p$. Here, the effect (at least on the first few moments) is quite simple

$\begin{split} Tr(\mathbf{p} \hat{\rho})&=Tr(\mathbf{p} \rho)\\ Tr(\mathbf{p}^2 \hat{\rho})&=\frac{1}{\sigma^2}+ Tr(\mathbf{p}^2 \rho)\\ \end{split}$

If we wanted to compute the effect of measuring $p$, using a Gaussian detector with systematic uncertainty $\sigma_p =1/\sigma$, we would use the same projectors (6) and obtain the same density matrix (8) after the measurement. This leads to very simple formulæ for the uncertainties resulting from successive measurements. Say we start with an initial state, $\rho$, measure $x$ with a Gaussian detector with systematic uncertainty $\sigma_x$, and then measure $p$ with another Gaussian detector with systematic uncertainty $\sigma_p$. The measured uncertainties are

(9)$\begin{split} {(\Delta x)}^2&= {(\Delta x)}^2_\rho +\sigma_x^2\\ {(\Delta p)}^2&= {(\Delta p)}^2_{\hat{\rho}} +\sigma_p^2={(\Delta p)}^2_\rho +\frac{1}{\sigma_x^2}+\sigma_p^2 \end{split}$

You can play around with other, non-Gaussian, acceptance functions to replace (6). You’re limited only by your ability to find a complete set of projectors, satisfying the analogue of (7) and, of course, by your ability to do the requisite integrals.

What you’ll discover is that the Gaussian acceptance function provides the best tradeoff (when, say, you measure $x$) between the systematic uncertainty in $x$ and the contribution to the quantum-mechanical uncertainty in $p$, resulting from the measurement.

#### Update (9/20/2012):

I looked some more at the Ozawa paper whose “Universally valid reformulation” of the uncertainty principle this PRL proposes to test. Unfortunately, it doesn’t seem nearly as interesting as it did at first glance.

• For observables, $A,B$, with pure point spectra, we can assume “ideal” measuring apparati (whose measured uncertainty equals the inherent quantum-mechanical uncertainty of the observable in the quantum state in which the measurement is made). In that case, his uncertainty relation (see (17) of his paper) reduces to the “uninteresting” (5). Of course that’s trivially satisfied. I believe that a stronger bound can be derived, in this case. But doing so requires more sophisticated techniques than Ozawa uses.
• For unbounded observables, like $x,p$, we can see from what I’ve said above that the actual lower bound is *stronger* than the one Ozawa derives. Consider a measurement of $x$, followed by a measurement of $p$. From (9), the product of the measured uncertainties2 satisfies
(10)$\begin{split} {(\Delta x)}^2{(\Delta p)}^2 &\geq \frac{5}{4} + {(\Delta x)}^2_\rho \left(\frac{1}{\sigma_x^2}+\sigma_p^2\right) + {(\Delta p)}^2_\rho \sigma_x^2 +\sigma_x^2\sigma_p^2\\ &\geq {\left(\frac{1}{2} +\sqrt{1+\sigma_x^2\sigma_p^2}\right)}^2 \end{split}$

where the last inequality is saturated by an initial state, $\rho$, which is a pure state consisting of a Gaussian wave packet with carefully-chosen width, ${(\Delta x)}^2_\rho = \frac{\sigma_x^2}{2\sqrt{1+\sigma_x^2\sigma_p^2}},\qquad {(\Delta p)}^2_\rho = \frac{\sqrt{1+\sigma_x^2\sigma_p^2}}{2\sigma_x^2}$
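A short symbolic check (my own sketch) confirms that this wave packet both saturates the ordinary uncertainty relation and saturates the bound (10):

```python
# Illustrative sympy check of the saturation claim above (units with hbar = 1,
# as in the rest of the post).
import sympy as sp

sx, sp_ = sp.symbols('sigma_x sigma_p', positive=True)
s = sp.sqrt(1 + sx**2 * sp_**2)

dx2 = sx**2 / (2 * s)          # (Delta x)^2_rho for the stated Gaussian packet
dp2 = s / (2 * sx**2)          # (Delta p)^2_rho

# the packet saturates the ordinary uncertainty relation ...
assert sp.simplify(dx2 * dp2 - sp.Rational(1, 4)) == 0

# ... and the product of measured uncertainties hits the claimed lower bound
product = (dx2 + sx**2) * (dp2 + 1 / sx**2 + sp_**2)
claimed = (sp.Rational(1, 2) + s)**2
assert sp.simplify(product - claimed) == 0
print("the Gaussian packet saturates (1/2 + sqrt(1 + sigma_x^2 sigma_p^2))^2")
```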

#### Update (9/28/2012):

Here is, at least, one lower bound (stronger than Ozawa’s stupid bound) for the product of measured uncertainties when $A,B$ have pure point spectra. Let

$\hat{B} = \sum_i P^{(A)}_i B P^{(A)}_i$

and

$M_i = P^{(A)}_i B \left(\mathbb{1} - P^{(A)}_i\right)$

We easily compute

${(\Delta B)}^2_{\hat{\rho}} = {(\Delta \hat{B})}^2_{\rho} + \sum_i Tr(M_i M_i^\dagger \rho)$

and hence

(11)${(\Delta A)}^2_\rho {(\Delta B)}^2_{\hat{\rho}} = {(\Delta A)}^2_\rho {(\Delta \hat{B})}^2_{\rho} + {(\Delta A)}^2_\rho \sum_i Tr(M_i M_i^\dagger \rho)$

Since $[\hat{B},A]=0$, the first term is bounded below3 by

(12)${(\Delta A)}^2_\rho {(\Delta \hat{B})}^2_{\rho} \geq {\left(\frac{1}{2} Tr(\{A,\hat{B}\}\rho)-Tr(A\rho)Tr(\hat{B}\rho)\right)}^2$

The second term is also positive-semidefinite.

For the classic case of a 2-state system, with $A=J_x$ and $B=J_y$ (the system considered by the aforementioned PRL), we see that $\hat{B}\equiv 0$, and the product of uncertainties is entirely given by the second term of (11).

The most general density matrix for the 2-state system is parametrized by the unit 3-ball

$\rho = \frac{1}{2}\left(\mathbb{1}+ \vec{a}\cdot\vec{\sigma}\right), \qquad \vec{a}\cdot\vec{a}\leq 1$

The points on the boundary $S^2=\{\vec{a}\cdot\vec{a}=1\}$ correspond to pure states.

Upon measuring $A=J_x\equiv\tfrac{1}{2}\sigma_x$, the density matrix after the measurement is

$\hat{\rho} = \frac{1}{2}\left(\mathbb{1}+ a_x\sigma_x\right)$

and, for a subsequent measurement of $J_y$,

${(\Delta J_x)}^2_\rho{(\Delta J_y)}^2_{\hat{\rho}} = \frac{1}{16}(1-a_x^2)$

as “predicted” by (11).
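A minimal numerical check of that formula (my own sketch), for a generic point of the Bloch ball:

```python
# Illustrative check: measure J_x on rho = (1 + a.sigma)/2, form rho_hat,
# and compare (Delta J_x)^2_rho (Delta J_y)^2_rho_hat with (1 - a_x^2)/16.
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

a = np.array([0.6, -0.3, 0.5])                  # any point with |a| <= 1
rho = 0.5 * (I2 + a[0] * sx + a[1] * sy + a[2] * sz)

Jx, Jy = 0.5 * sx, 0.5 * sy
evals, evecs = np.linalg.eigh(Jx)               # projectors onto the J_x eigenspaces
projectors = [np.outer(evecs[:, i], evecs[:, i].conj()) for i in range(2)]
rho_hat = sum(P @ rho @ P for P in projectors)

def variance(op, state):
    m1 = np.trace(op @ state).real
    m2 = np.trace(op @ op @ state).real
    return m2 - m1**2

assert np.isclose(variance(Jx, rho) * variance(Jy, rho_hat), (1 - a[0]**2) / 16)
print("(Delta J_x)^2_rho (Delta J_y)^2_rho_hat = (1 - a_x^2)/16 verified")
```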

1 Frequently, one wants to ask questions about conditional probabilities: “Given that a measurement of $A$ yields the value $\lambda_1$, what is the probability distribution for a subsequent measurement of …”. To answer such questions, one typically works with a new (“projected”) density matrix, $\rho' = \frac{1}{Z} P^{(A)}_1 \rho P^{(A)}_1$, where the normalization factor $Z=Tr(P^{(A)}_1 \rho)$ is required to make $Tr(\rho')=1$. The formalism in the main text of this post is geared, instead, to computing joint probability distributions.

2 Ozawa’s inequality isn’t for the product of the measured uncertainties, $(\Delta x)(\Delta p)$, but rather for the product, $(\Delta x){(\Delta p)}_{\hat{\rho}}$, of the measured uncertainty in $x$ with the quantum-mechanical uncertainty in $p$ in the state $\hat{\rho}$ which results from the $x$-measurement. To obtain this, just mentally set $\sigma_p=0$ in the above formulæ.

3 Let $S= \left(A-{\langle A\rangle}_\rho\mathbb{1}\right)+e^{i\theta}\alpha \left(B-{\langle B\rangle}_\rho\mathbb{1}\right)$ for $\alpha\in\mathbb{R}$. Consider $Q(\alpha) = Tr(S S^\dagger \rho)$. This is a quadratic expression in $\alpha$, which is positive-semidefinite, $Q(\alpha)\geq 0$. Thus, the discriminant must be negative-semidefinite, $D\leq 0$. For $\theta=\pi/2$, this yields the conventional uncertainty relation, ${(\Delta A)}^2_\rho{(\Delta B)}^2_\rho\geq\frac{1}{4}{\left(Tr(-i[A,B]\rho)\right)}^2$ For $\theta=0$, it yields ${(\Delta A)}^2_\rho{(\Delta B)}^2_\rho\geq{\left(\frac{1}{2}Tr(\{A,B\}\rho)-Tr(A\rho)Tr(B\rho)\right)}^2$ which is an expression you sometimes see, in the higher-quality textbooks.

### Clifford Johnson — Lunch Time in Colour

Finished the inks and threw some (digital) paints over the lunch group. I think it's time to move on to another part of The Project. I've spent far too long fiddling with the light in this. -cvj

## December 08, 2013

### Sean Carroll — The Branch We Were Sitting On

In the latest issue of the New York Review, Cathleen Schine reviews Levels of Life, a new book by Julian Barnes. It’s described as a three-part meditation on grief, following the death of Barnes’s wife Pat Kavanagh.

One of the things that is of no solace to Barnes (and there are many) is religion. He writes:

When we killed–or exiled–God, we also killed ourselves…. No God, no afterlife, no us. We were right to kill Him, of course, this long-standing imaginary friend of ours. And we weren’t going to get an afterlife anyway. But we sawed off the branch we were sitting on. And the view from there, from that height–even if it was only an illusion of a view–wasn’t so bad.

I can’t disagree. Atheists often proclaim the death of God in positively gleeful terms, but it’s important to recognize what was lost–a purpose in living, a natural place in the universe. The loss is not irretrievable; there is nothing that stops us from creating our own meaning even if there’s no supernatural overseer to hand one to us. But it’s a daunting task, one to which we haven’t really faced up.

### Andrew Jaffe — Academic Blogging Still Dangerous?

Nearly a decade ago, blogging was young, and its place in the academic world wasn’t clear. Back in 2005, I wrote about an anonymous article in the Chronicle of Higher Education, a so-called “advice” column admonishing academic job seekers to avoid blogging, mostly because it let the hiring committee find out things that had nothing whatever to do with their academic job, and reject them on those (inappropriate) grounds.

I thought things had changed. Many academics have blogs, and indeed many institutions encourage it (here at Imperial, there’s a College-wide list of blogs written by people at all levels, and I’ve helped teach a course on blogging for young academics). More generally, outreach has become an important component of academic life (that is, it’s at least necessary to pay it lip service when applying for funding or promotions) and blogging is usually seen as a useful way to reach a wide audience outside of one’s field.

So I was distressed to see the lament — from an academic blogger — “Want an academic job? Hold your tongue”. Things haven’t changed as much as I thought:

… [A senior academic said that] the blog, while it was to be commended for its forthright tone, was so informal and laced with profanity that the professor could not help but hold the blog against the potential faculty member…. It was the consensus that aspiring young scientists should steer clear of such activities.

Depending on the content of the blog in question, this seems somewhere between a disregard for academic freedom and a judgment of the candidate on completely irrelevant grounds. Of course, it is natural to want the personalities of our colleagues to mesh well with our own, and almost impossible to completely ignore supposedly extraneous information. But we are hiring for academic jobs, and what should matter are research and teaching ability.

Of course, I’ve been lucky: I already had a permanent job when I started blogging, and I work in the UK system which doesn’t have a tenure review process. And I admit this blog has steered clear of truly controversial topics (depending on what you think of Bayesian probability, at least).

### Andrew Jaffe — The next generation of large satellites: PRISM and/or eLISA?

Today was the deadline for submitting so-called “White Papers” proposing the next generation of the European Space Agency satellite missions. Because of the long lead times for these sorts of complicated technical achievements, this call is for launches in the faraway years of 2028 or 2034. (These dates would be harder to wrap my head around if I weren’t writing this on the same weekend that I’m attending the 25th reunion of my university graduation, an event about which it’s difficult to avoid the clichéd thought that May, 1988 feels like the day before yesterday.)

At least two of the ideas are particularly close to my scientific heart.

The Polarized Radiation Imaging and Spectroscopy Mission (PRISM) is a cosmic microwave background (CMB) telescope, following on from Planck and the current generation of sub-orbital telescopes like EBEX and PolarBear: whereas Planck has 72 detectors observing the sky in nine frequency bands, PRISM would have more than 7000 detectors working in a similar way to Planck over 32 frequencies, along with another set observing 300 narrow frequency bands, and another instrument dedicated to measuring the spectrum of the CMB in even more detail. Combined, these instruments allow a wide variety of cosmological and astrophysical goals, concentrating on more direct observations of early Universe physics than possible with current instruments, in particular the possible background of gravitational waves from inflation, and the small correlations induced by the physics of inflation and other physical processes in the history of the Universe.

The eLISA mission is the latest attempt to build a gravitational radiation observatory in space, observing astrophysical sources rather than the primordial background affecting the CMB, using giant lasers to measure the distance between three separate free-floating satellites a million kilometres apart from one another. As a gravitational wave passes through the triangle, it bends space and effectively changes the distance between them. The trio would thereby be sensitive to the gravitational waves produced by small, dense objects orbiting one another, objects like white dwarfs, neutron stars and, most excitingly, black holes. This would give us a probe of physics in locations we can’t see with ordinary light, and in regimes that we can’t reproduce on earth or anywhere nearby.

In the selection process, ESA is supposed to take into account the interests of the community. Hence both of these missions are soliciting support, of active and interested scientists and also the more general public: check out the sites for PRISM and eLISA. It’s a tough call. Both cases would be more convincing with a detection of gravitational radiation in their respective regimes, but the process requires putting down a marker early on. In the long term, a CMB mission like PRISM seems inevitable — there are unlikely to be any technical showstoppers — it’s just a big telescope in a slightly unusual range of frequencies. eLISA is more technically challenging: the LISA Pathfinder effort has shown just how hard it is to keep and monitor a free-floating mass in space, and the lack of a detection so far from the ground-based LIGO observatory, although completely consistent with expectations, has kept the community’s enthusiasm lower. (This will likely change with Advanced LIGO, expected to see many hundreds of sources as soon as it comes online in 2015 or thereabouts.)

Full disclosure: although I’ve signed up to support both, I’m directly involved in the PRISM white paper.

### Dave Bacon — QIP 2014 accepted talks

This Thanksgiving, even if we can’t all be fortunate enough to be presenting a talk at QIP, we can be thankful for being part of a vibrant research community with so many different lines of work going on. The QIP 2014 accepted talks are now posted with 36 out of 222 accepted. While many of the hot topics of yesteryear (hidden subgroup problem, capacity of the depolarizing channel) have fallen by the wayside, there is still good work happening in the old core topics (algorithms, information theory, complexity, coding, Bell inequalities) and in topics that have moved into the mainstream (untrusted devices, topological order, Hamiltonian complexity).

Here is a list of talks, loosely categorized by topic (inspired by Thomas’s list from last year). I’m pretty sad about missing my first QIP since I joined the field, because its unusually late timing overlaps the first week of the semester at MIT. But in advance of the talks, I’ll write a few words (in italics) about what I would be excited about hearing if I were there.

### Quantum Shannon Theory

There are a lot of new entropies! Some of these may be useful – at first for tackling strong converses, but eventually maybe for other applications as well. Others may, well, just contribute to the entropy of the universe. The bounds on entanglement rate of Hamiltonians are exciting, and looking at them, I wonder why it took so long for us to find them.

1a. A new quantum generalization of the Rényi divergence with applications to the strong converse in quantum channel coding
Frédéric Dupuis, Serge Fehr, Martin Müller-Lennert, Oleg Szehr, Marco Tomamichel, Mark Wilde, Andreas Winter and Dong Yang. 1306.3142 1306.1586
merged with
1b. Quantum hypothesis testing and the operational interpretation of the quantum Renyi divergences
Milan Mosonyi and Tomohiro Ogawa. 1309.3228

25. Zero-error source-channel coding with entanglement
Jop Briet, Harry Buhrman, Monique Laurent, Teresa Piovesan and Giannicola Scarpa. 1308.4283

28. Bound entangled states with secret key and their classical counterpart
Maris Ozols, Graeme Smith and John A. Smolin. 1305.0848
It’s funny how bound key is a topic for quant-ph, even though it is something that is in principle purely a classical question. I think this probably is because of Charlie’s influence. (Note that this particular paper is mostly quantum.)

3a. Entanglement rates and area laws
Michaël Mariën, Karel Van Acoleyen and Frank Verstraete. 1304.5931 (This one could also be put in the condensed-matter category.)
merged with
3b. Quantum skew divergence

22. Quantum subdivision capacities and continuous-time quantum coding
Alexander Müller-Hermes, David Reeb and Michael Wolf. 1310.2856

### Quantum Algorithms

This first paper is something I tried (unsuccessfully, needless to say) to disprove for a long time. I still think that this paper contains yet-undigested clues about the difficulties of non-FT simulations.

2a. Exponential improvement in precision for Hamiltonian-evolution simulation
Dominic Berry, Richard Cleve and Rolando Somma. 1308.5424
merged with
2b. Quantum simulation of sparse Hamiltonians and continuous queries with optimal error dependence
Andrew Childs and Robin Kothari.
update: The papers appear now to be merged. The joint (five-author) paper is 1312.1414.

35. Nested quantum walk
Andrew Childs, Stacey Jeffery, Robin Kothari and Frederic Magniez.
(not sure about arxiv # – maybe this is a generalization of 1302.7316?)

### Quantum games: from Bell inequalities to Tsirelson inequalities

It is interesting how the first generation of quantum information results is about showing the power of entanglement, and now we are all trying to limit the power of entanglement. These papers are, in a sense, about toy problems. But I think the math of Tsirelson-type inequalities is going to be important in the future. For example, the monogamy bounds that I’ve recently become obsessed with can be seen as upper bounds on the entangled value of symmetrized games.

4a. Binary constraint system games and locally commutative reductions
Zhengfeng Ji. 1310.3794
merged with
4b. Characterization of binary constraint system games
Richard Cleve and Rajat Mittal. 1209.2729

20. A parallel repetition theorem for entangled projection games
Irit Dinur, David Steurer and Thomas Vidick. 1310.4113

33. Parallel repetition of entangled games with exponential decay via the superposed information cost
André Chailloux and Giannicola Scarpa. 1310.7787

### Untrusted randomness generation

Somehow self-testing has exploded! There is a lot of information theory here, but the convex geometry of conditional probability distributions also is relevant, and it will be interesting to see more connections here in the future.

5a. Self-testing quantum dice certified by an uncertainty principle
Carl Miller and Yaoyun Shi.
merged with
5b. Robust device-independent randomness amplification from any min-entropy source
Kai-Min Chung, Yaoyun Shi and Xiaodi Wu.

19. Infinite randomness expansion and amplification with a constant number of devices
Matthew Coudron and Henry Yuen. 1310.6755

29. Robust device-independent randomness amplification with few devices
Fernando Brandao, Ravishankar Ramanathan, Andrzej Grudka, Karol Horodecki, Michal Horodecki and Pawel Horodecki. 1310.4544

### The fuzzy area between quantum complexity theory, quantum algorithms and classical simulation of quantum systems. (But not Hamiltonian complexity.)

I had a bit of trouble categorizing these, and also in deciding how surprising I should find each of the results. I am also somewhat embarrassed about still not really knowing exactly what a quantum double is.

6. Quantum interactive proofs and the complexity of entanglement detection
Kevin Milner, Gus Gutoski, Patrick Hayden and Mark Wilde. 1308.5788

7. Quantum Fourier transforms and the complexity of link invariants for quantum doubles of finite groups
Hari Krovi and Alexander Russell. 1210.1550

16. Purifications of multipartite states: limitations and constructive methods
Gemma De Las Cuevas, Norbert Schuch, David Pérez-García and J. Ignacio Cirac.

### Hamiltonian complexity started as a branch of quantum complexity theory but by now has mostly devoured its host

A lot of exciting results. The poly-time algorithm for 1D Hamiltonians appears not quite ready for practice yet, but I think it is close. The Cubitt-Montanaro classification theorem brings new focus to transverse-field Ising, and to the weird world of stoquastic Hamiltonians (along which lines I think the strengths of stoquastic adiabatic evolution deserve more attention). The other papers each do more or less what we expect, but introduce a lot of technical tools that will likely see more use in the coming years.

13. A polynomial-time algorithm for the ground state of 1D gapped local Hamiltonians
Zeph Landau, Umesh Vazirani and Thomas Vidick. 1307.5143

15. Classification of the complexity of local Hamiltonian problems
Toby Cubitt and Ashley Montanaro. 1311.3161

30. Undecidability of the spectral gap
Toby Cubitt, David Pérez-García and Michael Wolf.

23. The Bose-Hubbard model is QMA-complete
Andrew M. Childs, David Gosset and Zak Webb. 1311.3297

24. Quantum 3-SAT is QMA1-complete
David Gosset and Daniel Nagaj. 1302.0290

26. Quantum locally testable codes
Dorit Aharonov and Lior Eldar. (also QECC) 1310.5664

### Codes with spatially local generators aka topological order aka quantum Kitaev theory

If a theorist is going to make some awesome contribution to building a quantum computer, it will probably be via this category. Yoshida’s paper is very exciting, although I think the big breakthroughs here were in Haah’s still underappreciated masterpiece. Kim’s work gives operational meaning to the topological entanglement entropy, a quantity I had always viewed with perhaps undeserved skepticism. It too was partially anticipated by an earlier paper, by Osborne.

8. Classical and quantum fractal code
Beni Yoshida. I think this title is a QIP-friendly rebranding of 1302.6248

21. Long-range entanglement is necessary for a topological storage of information
Isaac Kim. 1304.3925

### Bit commitment is still impossible (sorry Horace Yuen) but information-theoretic two-party crypto is alive and well

The math in this area is getting nicer, and the protocols more realistic. The most unrealistic thing about two-party crypto is probably the idea that it would ever be used, when people either don’t care about security or don’t even trust NIST not to be a tool of the NSA.

10. Entanglement sampling and applications
Frédéric Dupuis, Omar Fawzi and Stephanie Wehner. 1305.1316

36. Single-shot security for one-time memories in the isolated qubits model
Yi-Kai Liu. 1304.5007

### Communication complexity

It is interesting how quantum and classical techniques are not so far apart for many of these problems, in part because classical TCS is approaching so many problems using norms, SDPs, Banach spaces, random matrices, etc.

12. Efficient quantum protocols for XOR functions
Shengyu Zhang. 1307.6738

9. Noisy Interactive quantum communication
Gilles Brassard, Ashwin Nayak, Alain Tapp, Dave Touchette and Falk Unger. 1309.2643 (also info th / coding)

### Foundations

I hate to be a Philistine, but I wonder what disaster would befall us if there WERE a psi-epistemic model that worked. Apart from being able to prove false statements. Maybe a commenter can help?

14. No psi-epistemic model can explain the indistinguishability of quantum states
Eric Cavalcanti, Jonathan Barrett, Raymond Lal and Owen Maroney. 1310.8302

### ??? but it’s from THE RAT

update: A source on the PC says that this is an intriguing mixture of foundations and Bell inequalities, again in the “Tsirelson regime” of exploring the boundary between quantum and non-signaling.
17. Almost quantum
Miguel Navascues, Yelena Guryanova, Matty Hoban and Antonio Acín.

### Fault tolerance

I love the part where Daniel promises not to cheat. Even though I am not actively researching in this area, I think the race between surface codes and concatenated codes is pretty exciting.

18. What is the overhead required for fault-tolerant quantum computation?
Daniel Gottesman. 1310.2984

27. Universal fault-tolerant quantum computation with only transversal gates and error correction
Adam Paetznick and Ben Reichardt. 1304.3709

### Once we equilibrate, we will still spend a fraction of our time discussing thermodynamics and quantum Markov chains

I love just how diverse and deep this category is. There are many specific questions that would be great to know about, and the fact that big general questions are still being solved is a sign of how far we still have to go. I enjoyed seeing the Brown-Fawzi paper solve problems that stumped me in my earlier work on the subject, and I was also impressed by the Cubitt et al paper being able to make a new and basic statement about classical Markov chains. The other two made me happy through their “the more entropies the better” approach to the world.

31. An improved Landauer Principle with finite-size corrections and applications to statistical physics
David Reeb and Michael M. Wolf. 1306.4352

32. The second laws of quantum thermodynamics
Fernando Brandao, Michal Horodecki, Jonathan Oppenheim, Nelly Ng and Stephanie Wehner. 1305.5278

34. Decoupling with random quantum circuits
Winton Brown and Omar Fawzi. 1307.0632

11. Stability of local quantum dissipative systems
Toby Cubitt, Angelo Lucia, Spyridon Michalakis and David Pérez-García. 1303.4744

## December 07, 2013

### Matt Strassler — No Comet, But Two Crescents

I’m sure you’ve all read in books that Venus is a planet that orbits the Sun and is closer to the Sun than is the Earth. But why learn from books what you can check for yourself?!?

[Note: If you missed Wednesday evening's discussion of particle physics involving me, Sean Carroll and Alan Boyle, you can listen to it here.]

As many feared, Comet ISON didn’t survive its close visit to the Sun, so there’s no reason to get up at 6 in the morning to go looking for it. [You might want to look for dim but pretty Comet Lovejoy, however, barely visible to the naked eye from dark skies.] At 6 in the evening, however, there’s good reason to be looking in the western skies — the Moon (for the next few days) and Venus (for the next few weeks) are shining brightly there.  Right now Venus is about as bright as it ever gets during its cycle.

The very best way to look at them is with binoculars, or a small telescope.  Easily with the telescope, and less easily with binoculars (you’ll need steady hands and sharp eyes, so be patient) you should be able to see that it’s not just the Moon that forms a crescent right now: Venus does too!

If you watch Venus in your binoculars or telescope over the next few weeks, you’ll see proof, with your own eyes, that Venus, like the Earth, orbits the Sun, and it does so at a distance smaller than the distance from the Sun to Earth.

The proof is simple enough, and Galileo himself provided it, by pointing his rudimentary telescope at Venus 400 years ago, and watching it carefully, week by week. What he saw was this: that when Venus was in the evening sky (every few months it moves from the evening sky to the morning sky, and then back again; it’s never in both),

• it was first rather dim, low in the sky at sunset, and nearly a disk, though a rather small one;
• then it would grow bright, larger, high in the sky at sunset, and develop a half-moon and then a crescent shape;
• and finally it would drop lower in the sky again at sunset, still rather bright, but now a thin crescent that was even larger from tip to tip than before.

The reason for this is illustrated in the figure below, taken from this post [which, although specific in some ways to the sky in February 2012, still has a number of general observations about the skies that apply at any time.]

A planet (such as Mercury or Venus) with an orbit that is smaller than Earth’s has phases like the Moon.  The portion of Venus that is lit is a half-sphere (shown in light color); the portion of Venus we can see is a different half-sphere (dashed red lines); the overlap is shaped like the wedge of an orange and looks like a crescent in the sky.  But unlike the Moon, which is at a nearly fixed distance from Earth, such a planet appears to grow and shrink during its orbit round the Sun, due to its changing distance from Earth. It is always largest when a crescent and smallest when full, and is brightest somewhere in between.

So go dig out those binoculars and telescopes, or use Venus as an excuse to buy new ones! Watch Venus, week by week, as it grows larger in the sky and becomes a thinner crescent, moving ever closer to the sunset horizon.  And a month from now the Moon, having made its orbit round the Earth, will return as a new crescent for you to admire.

Of course there’s another proof that Venus is closer to the Sun than Earth is: on very rare occasions Venus passes directly between the Earth and the Sun. No more of those “transits” for a long time I’m afraid, but you can see pictures of last June’s transit here, and read about the great scientific value of such transits here.


### Backreaction — Are irreproducible scientific results okay and just business as usual?

During the last years a lot of attention has been drawn to the prevalence of irreproducible results in science. That published research findings tend to weaken or vanish over time is a pressing problem in particular in some areas of the life sciences, psychology and neuroscience. On the face of it, the issue is that scientists work with too small samples and frequently cherry-pick their data. Next to involuntarily poor statistics, the blame has primarily been put on the publish-or-perish culture of modern academia.

While I blame that culture for many ills, I think here the finger is pointed at the wrong target.

Scientists aren’t interested in publishing findings that they suspect to be spurious. That they do it anyway is because a) funding agencies don’t hand out sufficient money for decent studies with large samples, b) funding agencies don’t like reproduction studies because, eh, it’s been done before, and c) journals don’t like to publish negative findings. The latter in particular leads scientists to actively search for effects, which creates a clear bias. It also skews meta-studies against null results.

I will not pretend that physics is immune to this problem, though in physics the issue is, forgive my language, significantly less severe.

A case in point, though, is the application of many different analysis methods to the same data set. Collaborations have their procedures sorted out to avoid this pitfall, but once the data is public it can be analyzed by everybody and their methods, and sooner or later somebody will find something just by chance. That’s why, every once in a while, we hear of a supposedly interesting peculiarity in the cosmic microwave background, you know, evidence for a bubble collision, parallel universes, a cyclic universe, a lopsided universe, an alien message, and so on. One cannot even blame them for not accounting for other researchers who are trying creative analysis methods on the same data, because those are unknown unknowns. And theoretical papers can be irreproducible in the sense of just being wrong, but the vast majority of these just get ignored (and if not, the error is often of interest in itself).

So even while the fish at my doorstep isn’t the most rotten one, I think irreproducible results are highly problematic, and I welcome measures that have been taken, eg by Nature magazine, to improve the situation.

And then there’s Jared Horvath, over at SciAm blogs, who thinks irreproducibility is okay, because it’s been done before. He lists some famous historical examples where scientists have cherry-picked their data because they had a hunch that their hypothesis is correct even if the data didn’t support it. Jared concludes:
“There is a larger lesson to be gleaned from this brief history. If replication were the gold standard of scientific progress, we would still be banging our heads against our benches trying to arrive at the precise values that Galileo reported.”
You might forgive Jared, who is a PhD candidate in cognitive neuroscience, for cherry-picking his historical data, because he’s been trained in today’s publish-or-perish culture. Unfortunately, he’s not the only one who believes that something is okay because a few people in the past succeeded with it. Michael Brooks has written a whole book about it. In “Free Radicals: The Secret Anarchy of Science”, you can read for example
“It is the intuitive understanding, the gut feeling about what the answer should be, that marks the greatest scientists. Whether they fudge their data or not is actually immaterial.”
Possibly the book gets better after this, but I haven’t progressed beyond this page because every time I see that paragraph I want to cry.

The “gut feeling about what the answer should be” does mark great scientists, yes. It also marks pseudoscientists and crackpots, it’s just that you don’t find these in the history books. The argument that fudging data is okay because great scientists did it and time proved them right is like browsing bibliographies and concluding that in the past everybody was famous.

I’m not a historian and I cannot set that record straight, but I can tell you that the conclusion that irreproducibility is a necessary ingredient to scientific progress is unwarranted.

But I have one piece of data to make my case, a transcript of a talk given by Irving Langmuir in the 1950s, published in Physics Today in 1989. It carries the brilliant title “Pathological Science” and describes Langmuir’s first-hand encounters with scientists who had a gut feeling about what the answer should be. I really recommend you read the whole thing (pdf here), but just for the flavor here’s an excerpt:
Mitogenic rays.
About 1923 there was a whole series of papers by Gurwitsch and others. There were hundreds of them published on mitogenic rays. There are still a few of them being published [in 1953]. I don’t know how many of you have ever heard of mitogenic rays. They are given off by growing plants, living things, and were proved, according to Gurwitsch, to be something that would go through glass but not through quartz. They seemed to be some sort of ultraviolet light… If you looked over these photographic plates that showed this ultraviolet light you found that the amount of light was not so much bigger than the natural particles of the photographic plate, so that people could have different opinions as to whether or not it showed this effect. The result was that less than half of the people who tried to repeat these experiments got any confirmation of it…”
Langmuir relates several stories of this type, all about scientists who discarded some of their data or read their output in their favor. None of these scientists has left a mark in the history books. They have however done one thing. They’ve wasted their and other scientists’ time by not properly accounting for their methods.

There were hundreds of papers published on a spurious result – in 1953. Since then the scientific community has grown considerably, technology has become much more sophisticated (not to mention expensive), and scientists have become increasingly specialized. For most research findings, there are very few scientists who are able to conduct a reproduction study, even leaving aside the problems with funding and publishing. In 2013, scientists have to rely on their colleagues much more than was the case 60 years ago, and certainly more than in the days of Millikan and Galileo. The harm caused by cherry-picked data and unreported ‘post-selection’ (a euphemism for cherry-picking), in terms of wasted time, has increased along with the community. Heck, there were dozens of researchers who wasted time (and thus their employers’ money...) on ‘superluminal neutrinos’ even though everybody knew these results to be irreproducible (in the sense that they hadn’t been found by any previous measurements).

Worse, this fallacious argument signals a basic misunderstanding about how science works.

The argument is based on the premise that if a scientific finding is correct, it doesn’t matter where it came from or how it was found. That is then taken to justify ignoring the scientific method altogether (and frequently attributed to Feyerabend). It is correct in that in the end it doesn’t matter exactly how a truth about nature was revealed. But we do not speak of a scientific method to say that there is only one way to make progress. The scientific method is used to increase the chances of progress. It’s the difference between letting the proverbial monkey hammer away and hiring a professional science writer for your magazine’s blog. Yes, the monkey can produce a decent blogpost, and if that is so then that is so. But chances are eternal inflation will end before you get to see a good result. That’s why scientists have quality control and publishing ethics, why we have peer review and letters of recommendation, why we speak about statistical significance and double-blind studies and reproducible results: Not because in the absence of methods nothing good can happen, but because these methods have proven useful to prevent us from fooling ourselves and thereby make success considerably more likely.

Having said that, expert intuition can be extremely useful and there is nothing wrong with voicing a “gut feeling” as long as it is marked as such. It is unfortunate indeed that the present academic system does not give much space for scientists to express their intuition, or maybe they are shying away from it. But that’s a different story and shall be told another time.

So the answer to the question posed in the title is a clear no. The question is not whether science has progressed despite the dishonest methods that have been employed in the past, but how much better it would have progressed if that had not been so.

----

I stole that awesome gif from over here. I don't know its original source.

### Andrew Jaffe — Teaching mistakes

The academic year has begun, and I’m teaching our second-year Quantum Mechanics course again. I was pretty happy with last year’s version, and the students didn’t completely disagree.

This year, there have been a few changes to the structure of the course — although not as much to the content as I might have liked (“if it ain’t broke, don’t fix it”, although I’d still love to use more of the elegant Dirac notation and perhaps discuss quantum information a bit more). We’ve moved some of the material to the first year, so the students should already come into the course with at least some exposure to the famous Schrödinger Equation which describes the evolution of the quantum wave function. But of course all lecturers treat this material slightly differently, so I’ve tried to revisit some of that material in my own language, although perhaps a bit too quickly.

Perhaps more importantly, we’ve also changed the tutorial system. We used to attempt an imperfect rendition of the Oxbridge small-group tutorial system, but we’ve moved to something with larger groups and (we hope) a more consistent presentation of the material. We’re only on the second term with this new system, so the jury is still out, both in terms of the students’ reactions, and our own. Perhaps surprisingly, they do like the fact that there is more assessed (i.e., explicitly graded, counting towards the final mark in the course) material — coming from the US system, I would like to see yet more of this, while those brought up on the UK system prefer the final exam to carry most (ideally all!) the weight.

So far I’ve given three lectures, including a last-minute swap yesterday. The first lecture — mostly content-free — went pretty well, but I’m not too happy with my performance on the last two: I’ve made a mistake in each of the last two lectures. I’ve heard people say that the students don’t mind a few (corrected) mistakes; it humanises the teachers. But I suspect that the students would, on the whole, prefer less-human, more perfect, lecturing…

Yesterday, we were talking about a particle trapped in a finite potential well — that is, a particle confined to be in a box, but (because of the weirdness of quantum mechanics) with some probability of being found outside. That probability depends upon the energy of the particle, and because of the details of the way I defined that energy (starting at a negative number, instead of the more natural value of zero), I got confused about the signs of some of the quantities I was dealing with. I explained the concepts (I think) completely correctly, but with mistakes in the math behind them, the students (and I) got confused about the details. But many, many thanks to the students who kept pressing me on the issue and helped us puzzle out the problems.

Today’s mistake was less conceptual, but no less annoying — I wrote (and said) “cotangent” when I meant “tangent” (and vice versa). In my notes, this was all completely correct, but when you’re standing up in front of 200 or so students, sometimes you miss the detail on the page in front of you. Again, this was in some sense just a mathematical detail, but (as we always stress) without the right math, you can’t really understand the concepts. So, thanks to the students who saw that I was making a mistake, and my apologies to the whole class.
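For the curious, here is a minimal sketch of the standard textbook version of the finite-well calculation the first mistake was about (my own illustration with a made-up well strength, using the usual convention in which energies are measured from the top of the well, which need not match the convention used in the lecture). The even-parity bound states satisfy z tan z = sqrt(z0^2 - z^2) and the odd-parity ones -z cot z = sqrt(z0^2 - z^2), which is exactly the spot where a tangent/cotangent slip hurts.

```python
import numpy as np
from scipy.optimize import brentq

# Finite square well of half-width a and depth V0 (hbar = 1, mass m).
# With z = k*a and z0 = a*sqrt(2*m*V0), the bound states satisfy
#   even parity:  z*tan(z)  = sqrt(z0**2 - z**2)
#   odd parity:  -z/tan(z)  = sqrt(z0**2 - z**2)
z0 = 6.0  # dimensionless well strength (hypothetical example value)

def even(z):
    return z * np.tan(z) - np.sqrt(z0**2 - z**2)

def odd(z):
    return -z / np.tan(z) - np.sqrt(z0**2 - z**2)

def roots(f):
    """Scan a fine grid for sign changes, skipping the poles of tan/cot."""
    zs = np.linspace(1e-6, z0 - 1e-6, 4000)
    vals = f(zs)
    found = []
    for z1, z2, f1, f2 in zip(zs[:-1], zs[1:], vals[:-1], vals[1:]):
        if f1 * f2 < 0 and abs(f1) + abs(f2) < 50:   # a genuine crossing, not a pole
            found.append(brentq(f, z1, z2))
    return found

print("even-parity solutions z =", [round(z, 3) for z in roots(even)])
print("odd-parity  solutions z =", [round(z, 3) for z in roots(odd)])
# The binding energies then follow from E = -V0 + z**2 / (2*m*a**2).
```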

### Geraint F. Lewis — The large-scale structure of the halo of the Andromeda Galaxy Part I: global stellar density, morphology and metallicity properties

And now for a major paper from the Pan-Andromeda Archaeological Survey (PAndAS), led by astronomer extraordinaire Rodrigo Ibata. I've written a lot about PAndAS over the years (or at least over the year and a bit I've been blogging here) and we've discovered an awful lot, but one of the key things we wanted to do is measure the size and shape of the stellar halo of the Andromeda Galaxy.

The stellar halo is an interesting place. It's basically made up of the first generation of stars that formed in the dark matter halo in which the spiral galaxy of Andromeda was born, and the properties of the halo are a measure of the  formation history of the galaxy, something we can directly compare to our theoretical models.

But there is always a spanner in the works, and in this case it is the fact that Andromeda, like the Milky Way, is a cannibal and has been eating smaller galaxies. These little galaxies get ripped apart by the gravitational pull of Andromeda, and their stars litter the halo in streams and clumps. As we've seen before, Andromeda has a lot of this debris scattered all over the place.

So, we are left with a problem, namely how do we see the stellar halo, which is quite diffuse and faint, underneath the prominent substructure? This is where this paper comes in.

Well, the first thing is to collect the data, and that's where PAndAS comes in. The below picture confirms just how big the PAndAS survey is, and just how long it took us to get data.
It always amazes me how small the spiral disk of Andromeda is compared to the area we surveyed, but that's what we need to see the stellar halo which should be a couple of hundred kiloparsecs in extent.

Taking the data is only the first step. The next step, the calibration of the data, was, well, painful. I won't go into the detail here, but if you are going to look for faint things, you really need to understand your data at the limit, to understand what's a star, what's a galaxy, what's just noise. There are lots of things you need to consider to make sure the data is nice, uniform and calibrated. But that's what we did :)

Once that's done, we can ask where the stars are. And here they are:
As you can see, chunks and bumps everywhere, all the dinner leftovers of the cannibal Andromeda. And all of that stuff is in the way of finding the halo!

What do we do? We have to mask out the substructure and search for the underlying halo. We are in luck, however, as we don't have just one map of substructure, we have a few of them. Why? Well, I've written about this before, but the stars in the substructure come from different sized objects, and so their chemical history will be different; in little systems, the heavy elements produced in supernova explosions are not held on to by the system's gravitational pull, and so their stars can be relatively "metal-poor", but in larger systems the gas can't escape and gets mixed into the next generation of stars, making them more "metal-rich".

So, here are our masks as a function of the iron abundance relative to hydrogen.
We see that the giant stream is more metal-rich, but as we move to the metal-poor end we see the more extensive substructure, including the South West Cloud.

What do we find? Well, we see the halo (hurrah!) and it does what it should - it is brightest near the main body of Andromeda, but gets fainter and fainter towards the edge. Here's a picture of the profile:
It's hard to explain just how faint the halo is, but it is big, basically stretching out to the edge of our PAndAS data, and then beyond, and looks like it accounts for roughly 10% of the stellar mass in Andromeda. It is not inconsequential!
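To give a rough feel for what fitting "the profile" means, here is a toy sketch with made-up numbers (the real analysis in the paper fits full three-dimensional density models and carefully masks the substructure, so treat this purely as an illustration of the idea): a power law is a straight line in log-log space, so a linear fit of log(density) against log(radius) recovers the slope.

```python
import numpy as np

# Hypothetical toy data: projected radius (kpc) and a stellar density estimate
# (arbitrary units), invented to fall roughly on a power law of slope -3.
R       = np.array([30.,   50.,   80.,   120.,   180.,    250.,    300.])
density = np.array([0.90, 0.21, 0.053, 0.016, 0.0046, 0.0018, 0.0010])

# density = A * R**gamma is a straight line in log-log space, so an ordinary
# least-squares fit of log(density) against log(R) gives the power-law slope.
gamma, logA = np.polyfit(np.log10(R), np.log10(density), 1)
print(f"fitted power-law slope: {gamma:.2f}")   # close to -3 for these toy numbers
```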

But as we noted at the start, its properties provide important clues to the very process of galaxy formation. And it looks like what we would expect from our models of structure formation, with large galaxies being built up over time through the accretion of smaller systems.

We're working on a few new tests of the halo, and should hopefully have some more results soon. But for now,  well done Rod!

# The large-scale structure of the halo of the Andromeda Galaxy Part I: global stellar density, morphology and metallicity properties

We present an analysis of the large-scale structure of the halo of the Andromeda galaxy, based on the Pan-Andromeda Archeological Survey (PAndAS), currently the most complete map of resolved stellar populations in any galactic halo. Despite copious substructure, the global halo populations follow closely power law profiles that become steeper with increasing metallicity. We divide the sample into stream-like populations and a smooth halo component. Fitting a three-dimensional halo model reveals that the most metal-poor populations ([Fe/H]<-1.7) are distributed approximately spherically (slightly prolate with ellipticity c/a=1.09+/-0.03), with only a relatively small fraction (42%) residing in discernible stream-like structures. The sphericity of the ancient smooth component strongly hints that the dark matter halo is also approximately spherical. More metal-rich populations contain higher fractions of stars in streams (86% for [Fe/H]>-0.6). The space density of the smooth metal-poor component has a global power-law slope of -3.08+/-0.07, and a non-parametric fit shows that the slope remains nearly constant from 30kpc to 300kpc. The total stellar mass in the halo at distances beyond 2 degrees is 1.1x10^10 Solar masses, while that of the smooth component is 3x10^9 Solar masses. Extrapolating into the inner galaxy, the total stellar mass of the smooth halo is plausibly 8x10^9 Solar masses. We detect a substantial metallicity gradient, which declines from [Fe/H]=-0.7 at R=30kpc to [Fe/H]=-1.5 at R=150kpc for the full sample, with the smooth halo being 0.2dex more metal poor than the full sample at each radius. While qualitatively in-line with expectations from cosmological simulations, these observations are of great importance as they provide a prototype template that such simulations must now be able to reproduce in quantitative detail.

## December 06, 2013

### Quantum Diaries — Has there ever been a paradigm shift?

Yes, once!

Paradigm and paradigm shift are so overused and misused that the world would benefit if they were simply banned.  Originally Thomas Kuhn (1922–1996) in his 1962 book, The Structure of Scientific Revolutions, used the word paradigm to refer to the set of practices that define a scientific discipline at any particular period of time. A paradigm shift is when the entire structure of a field changes, not when someone simply uses a different mathematical formulation. Perhaps it is just grandiosity, everyone thinking their latest idea is earth-shaking (or paradigm-shifting), but the idea has been so debased that almost any change is called a paradigm shift, down to the level of changing the color of one’s socks.

The archetypal example, and I would suggest the only real example in the natural and physical sciences, is the paradigm shift from Aristotelian to Newtonian physics. This was not just a change in physics, from “perfect motion is circular” to “an object either is at rest or moves at a constant velocity, unless acted upon by an external force”, but a change in how knowledge is defined and acquired. There is more here than a different description of motion; the very concept of what is important has changed. In Newtonian physics there is no place for perfect motion but only rules to describe how objects actually behave. Newtonian physics was driven by observation. Newton himself went further and claimed his results were derived from observation. While Aristotelian physics is broadly consistent with observation it is driven more by abstract concepts like perfection.  Aristotle (384 BCE – 322 BCE) would most likely have considered Galileo Galilei’s (1564 – 1642) careful experiments beneath him.  Socrates (c. 469 BC – 399 BC) certainly would have. Their epistemology was not based on careful observation.

While there have been major changes in the physical sciences since Newton, they do not reach the threshold needed to call them paradigm shifts, since they are all within the paradigm defined by the scientific method. I would suggest Kuhn was misled by the Aristotle-Newton example where, indeed, the two approaches are incommensurate: What constitutes a reasonable explanation is simply different for the two men. But would the same be true of Michael Faraday (1791 – 1867) and Niels Bohr (1885–1962), who were chronologically on opposite sides of the quantum mechanics cataclysm?  One could easily imagine Faraday, transported in time, having a fruitful discussion with Bohr. While the quantum revolution was indeed cataclysmic, changing mankind’s basic understanding of how the universe worked, it was based on the same concept of knowledge as Newtonian physics. You make models based on observations and validate them through testable predictions.  The pre-cataclysmic scientists understood the need for change due to failed predictions, even if, like Albert Einstein (1879 – 1955) or Erwin Schrödinger (1887 – 1961), they found quantum mechanics repugnant. The phenomenology was too powerful to ignore.

Sir Karl Popper (1902 – 1994) provided another ingredient missed by Kuhn, the idea that science advances by the bold new hypothesis, not by deducing models from observation. The Bohr model of the atom was a bold hypothesis, not a paradigm shift: a bold hypothesis refined by other scientists and tested in the crucible of careful observation. I would also suggest that Kuhn did not understand the role of simplicity in making scientific models unique. It is true that one can always make an old model agree with past observations by making it more complex[1]. This process frequently has the side effect of reducing the old model’s ability to make predictions. It is to remedy these problems that a bold new hypothesis is needed. But to be successful, the bold new hypothesis should be simpler than the modified version of the original model and, more crucially, must make testable predictions that are confirmed by observation. But even then, it is not a paradigm shift; just a verified bold new hypothesis.

Despite the nay-saying, Kuhn’s ideas did advance the understanding of the scientific method. In particular, it was a good antidote to the logical positivists who wanted to eliminate the role of the model, or what Kuhn called the paradigm, altogether. Kuhn made the point that it is the framework that gives meaning to observations. Combined with Popper’s insights, Kuhn’s ideas paved the way for a fairly comprehensive understanding of the scientific method.

But back to the overused word paradigm: it would be nice if we could turn back the clock and restrict the term paradigm shift to those changes where the before and after are truly incommensurate; where there is no common ground to decide which is better. Or if you like, the demarcation criterion for a paradigm shift is that the before and after are incommensurate[2]. That would rule out the change of sock color from being a paradigm shift. However, we cannot turn back the clock, so I will go back to my first suggestion that the word be banned.

[1] This is known as the Duhem-Quine thesis.

[2] There are probably paradigm shifts, even in the restricted meaning of the word, if we go outside science. The French revolution could be considered a paradigm shift in the relation between the populace and the state.

### David Hogg — tidal disruption and exoplanets around nearby stars

Ramirez-Ruiz (Santa Cruz) gave a morning talk on tidal disruption flares. He finds that for every fully disrupted star (by a central black hole in a galaxy) there should be many partially disrupted stars, and we should be able to find the remnants. The remnants should look odd in various ways, and be on odd orbits. Worth looking for!

In the afternoon, Johnson (Harvard) talked about exoplanets around cool stars, with some diversions into interesting false positives (a white dwarf eclipsing an M star, providing a gravitational-lensing measurement of the white dwarf mass) and new hardware (the MINERVA project is building an exoplanet search out of cheap hardware). Johnson gave various motivations for his work, but not the least was the idea that someday we might go to one of these planets! A great pair of talks. Late in the afternoon, Johnson and I collected some suggestions for combining our research groups and projects.

### David Hogg — exoplanet projects

John Asher Johnson (Harvard) and Ben Montet (Caltech) are secretly visiting NYU today and tomorrow. We spent a long session today talking about overlapping interests. Montet showed results on the population of long-period planets, where they have radial velocity trends (not periodic signals, just trends) and adaptive-optics imaging upper limits (that is, no detections). What can you conclude when you have no period and no direct detection? A lot, it turns out, because the trend sets a relationship between mass, inclination, and period, and the adaptive optics rules out a large class of binary-star models.

In related news, Johnson was excited about the methods Fadely and I are using to infer the HST pixel-convolved point-spread function. It is very related to methods he wants to use to infer the line-spread function in the HiRes data he has on exoplanets. He was particularly impressed by our smoothness priors that regularize very flexible model fits without breaking convexity.

### Chad Orzel — Cute Kid Composite

I’m giving my talk about blogs as a tool for science outreach again on Monday, and need an updated version of the cute-kid picture screenshot I use in that. So here’s a composite of the two kid pictures I posted a week or two ago, because it should all fit on screen, even on my laptop.

I probably should do something more directly productive, but I drove down to NYC yesterday to do an interview and meet up with some folks from the TED@NYC event back in October, and got stuck there because the attendants at the lot where I parked my car didn’t warn me that they would lock the place up before I got back. I drove back up this morning, which was unduly irritating, and has thrown my whole day off. So low-impact stuff like slide updating and CV formatting are the order of the afternoon.

### Sean Carroll — The Spark in the Park

A few years ago, not long after we moved to LA, Jennifer and I got a call from some of the writers on the TV series BONES. There’s already a science component to the show, which features brainy forensic anthropologist Brennan (Emily Deschanel) and her team of lab mates working with fiery FBI agent Booth (David Boreanaz) to solve crimes, most of which involve skeletons and physical evidence in some crucial way. This time they needed some physics input, as they wanted the murderer to be a researcher who used their physics expertise to carry out the crime, and were looking for unusual but realistic ideas. We were able to provide some crucial sociological advice (no, professional research scientists probably wouldn’t meet at a Mensa conference) and consulted with experimentalist friends who would know how to use radioactive substances in potentially lethal ways. I won’t say who, exactly, but when the episode aired they ended up calling the research institute the Collar Lab.

Apparently physicists are a suspiciously violent bunch, because tonight’s episode features another scientist suspect, this time played by Richard Schiff of West Wing fame. I got a chance to consult once again, and this time contributed something a bit more tangible to the set: a collection of blackboards in the physicist’s office. (Which, as in all Hollywood conceptions, is a lot more spacious and ornate than any real physicist’s office I’ve ever seen.) You can see the actual work tonight (8pm ET/PT on Fox), but here’s one that I made up that they didn’t end up using.

It does look like our professor is a theoretical cosmologist of some sort, doesn’t it? The equations here will be familiar to anyone who has carefully read “Dynamical Compactification from de Sitter Space.” The boards that actually will appear on the show are taken mostly from “Attractor Solutions in Scalar-Field Cosmology” and “A Consistent Effective Theory of Long-Wavelength Cosmological Perturbations.” Hey, if I’m going to write down a bunch of equations, they might as well be my equations, right?

But I actually got to be a little more than just a technical scribe. (Although that’s not an unimportant role — not only are the equations themselves gibberish to non-experts, it’s difficult for someone who isn’t familiar with the notation to even accurately transcribe the individual symbols.) No spoilers, but the equation-laden blackboards actually play a prominent role in a scene that appears late in the episode, so I was able to provide an infinitesimally tiny amount of creative input. And the scene itself (the overall conception of which belongs to writers Emily Silver and Stephen Nathan) packs quite an emotional wallop, something not typically associated with a series of equations. I haven’t seen the finished episode yet, but it was a great experience to actually be present on set during filming and watch the sausage being made.

### Quantum Diaries — One giant leap for the Higgs boson

Both the ATLAS and CMS collaborations at CERN have now shown solid evidence that the new particle discovered in July 2012 behaves even more like the Higgs boson, by establishing that it also decays into particles known as tau leptons, a very heavy version of electrons.

Why is this so important? CMS and ATLAS had already established that the new boson was indeed one type of a Higgs boson. In that case, theory predicted it should decay into several types of particles. So far, decays into W and Z bosons as well as photons were well established. Now, for the first time, both experiments have evidence that it also decays into tau leptons.

The decay of a particle is very much like making change for a coin. If the Higgs boson were a one euro coin, there would be several ways to break it up into smaller coins, but, so far, the change machine seemed to only make change in some particular ways. Now, additional evidence for one more way has been shown.

There are two classes of fundamental particles, called fermions and bosons depending on their spin, their value of angular momentum. Particles of matter (like taus, electrons and quarks) belong to the fermion family. On the other hand, the particles associated with the various forces acting upon these fermions are bosons (like the photons and the W and Z bosons).

The CMS experiment had already shown evidence for Higgs boson decays into fermions last summer with a signal of 3.4 sigma when combining the tau and b quark channels. A sigma corresponds to one standard deviation, the size of potential statistical fluctuations.  Three sigma is needed to claim evidence while five sigma is usually required for a discovery.

For the first time, there is now solid evidence from a single channel – and two experiments have independently produced it. The ATLAS collaboration showed evidence for the tau channel alone with a signal of 4.1 sigma, while CMS obtained 3.4 sigma, both providing strong evidence that this particular type of decay occurs.

Combining their most recent results for taus and b quarks, CMS now has evidence for decays into fermions at the 4.0 sigma level.

The data collected by the ATLAS experiment (black dots) are consistent with coming from the sum of all backgrounds (colour histograms) plus contributions from a Higgs boson going into a pair of tau leptons (red curve). Below, the background is subtracted from the data to reveal the most likely mass of the Higgs boson, namely 125 GeV (red curve).

CMS is also starting to see decays into pairs of b quarks at the 2.0 sigma-level. While this is still not very significant, it is the first indication for this decay so far at the LHC. The Tevatron experiments have reported seeing it at the 2.8 sigma-level. Although the Higgs boson decays into b quarks about 60% of the time, it comes with so much background that it makes it extremely difficult to measure this particular decay at the LHC.

Not only did the experiments report evidence that the Higgs boson decays into tau leptons, but they also measured how often this occurs. The Standard Model, the theory that describes just about everything observed so far in particle physics, states that a Higgs boson should decay into a pair of tau leptons about 8% of the time. CMS measured a value corresponding to 0.87 ± 0.29 times this rate, i.e. a value compatible with 1.0 as expected for the Standard Model Higgs boson. ATLAS obtained 1.4 +0.5 -0.4, which is also consistent within errors with the predicted value of 1.0.
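As a quick back-of-the-envelope check of what "consistent within errors" means here, the little sketch below (my own illustration, treating the uncertainties as symmetric Gaussians and using the quoted 0.4 lower error for ATLAS, since its central value lies above 1) converts each measurement into a deviation from the Standard Model expectation in units of sigma.

```python
from scipy.stats import norm

# Measured signal strengths mu, quoted relative to the Standard Model value of 1.0
measurements = {
    "CMS  (tau tau)": (0.87, 0.29),   # central value, uncertainty
    "ATLAS (tau tau)": (1.4, 0.4),    # using the -0.4 side of the asymmetric error
}

for name, (mu, err) in measurements.items():
    pull = (mu - 1.0) / err            # deviation from mu = 1 in units of sigma
    p = 2 * norm.sf(abs(pull))         # two-sided probability of a larger deviation
    print(f"{name}: {pull:+.1f} sigma from the SM value (p = {p:.2f})")
```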

One of the events collected by the CMS collaboration having the characteristics expected from the decay of the Standard Model Higgs boson to a pair of tau leptons. One of the taus decays to a muon (red line) and neutrinos (not visible in the detector), while the other tau decays into a charged hadron (blue towers) and a neutrino. There are also two forward-going particle jets (green towers).

With these new results, the experiments established one more property that was expected for the Standard Model Higgs boson. What remains to be clarified is the exact type of Higgs boson we are dealing with. Is this indeed the simplest one associated with the Standard Model? Or have we uncovered another type of Higgs boson, the lightest one of the five types of Higgs bosons predicted by another theory called supersymmetry?

It is still too early to dismiss the second hypothesis. While the Higgs boson is behaving so far exactly like what is expected for the Standard Model Higgs boson, the measurements lack the precision needed to rule out that it is a supersymmetric type of Higgs boson. Getting a definite answer on this will require more data. This will happen once the Large Hadron Collider (LHC) resumes operation at nearly twice the current energy in 2015, after the current shutdown needed for maintenance and upgrade.

Meanwhile, these new results will be refined and finalised. But already they represent one small step for the experiments, a giant leap for the Higgs boson.

For all the details, see:

Presentation given by the ATLAS Collaboration on 28 November 2013

Presentation given by the CMS Collaboration on 3 December 2013

Pauline Gagnon

### Quantum Diaries — Un pas de géant pour le boson de Higgs

Les collaborations ATLAS et CMS du CERN ont maintenant l’évidence que la nouvelle particule découverte en juillet 2012 se comporte de plus en plus comme le boson de Higgs. Les deux expériences viennent en fait de démontrer que le boson de Higgs se désintègre aussi en particules tau, des particules semblables aux électrons mais beaucoup plus lourdes.

Pourquoi est-ce si important? CMS et l’ATLAS avaient déjà établi que ce nouveau boson était bien un type de boson de Higgs. Si tel est le cas, la théorie prévoit qu’il doit se désintégrer en plusieurs types de particules. Jusqu’ici, seules les désintégrations en bosons W et Z de même qu’en photons étaient confirmées. Pour la première fois, les deux expériences ont maintenant la preuve qu’il se désintègre aussi en particules tau.

La désintégration d’une particule s’apparente beaucoup à faire de la monnaie pour une pièce. Si le boson de Higgs était une pièce d’un euro, il pourrait se briser en différentes pièces de monnaie plus petites. Jusqu’à présent, le distributeur de monnaie semblait seulement donner la monnaie en quelques façons particulières. On a maintenant démontré qu‘il existe une façon supplémentaire.

Il y a deux classes de particules fondamentales, appelées fermions et bosons selon la valeur de quantité de mouvement angulaire. Les particules de matière comme les taus, les électrons et les quarks appartiennent tous à la famille des fermions. Par contre, les particules associées aux diverses forces qui agissent sur ces fermions sont des bosons, comme les photons et les bosons W et Z.

L’été dernier, l’expérience CMS avait déjà apporté la preuve avec un signal de 3.4 sigma que le boson de Higgs se désintégrait en fermions en combinant leurs résultats pour deux types de fermions, les taus et les quarks b. Un sigma correspond à un écart-type, la taille des fluctuations statistiques potentielles. Trois sigma sont nécessaires pour revendiquer une évidence tandis que cinq sigma sont nécessaires pour clamer une découverte.

Pour la première fois, il y a maintenant évidence pour un nouveau canal de désintégration (les taus) – et deux expériences l’ont produit indépendamment. La collaboration ATLAS a montré la preuve pour le canal des taus avec un signal de 4.1 sigma, tandis que CMS a obtenu 3.4 sigma, deux résultats forts prouvant que ce type de désintégrations se produit effectivement.

En combinant leurs résultats les plus récents pour les taus et les quarks b, CMS a maintenant une évidence pour des désintégrations en fermions avec 4.0 sigma.

Les données rassemblées par l’expérience ATLAS (les points noirs) sont en accord avec la somme de tous les évènements venant du bruit de fond (histogrammes en couleur) en plus des contributions venant d’un boson de Higgs se désintégrant en une paire de taus (la ligne rouge). En dessous, le bruit de fond est soustrait des données pour révéler la masse la plus probable du boson de Higgs, à savoir 125 GeV (la courbe rouge).

CMS commence aussi à voir des désintégrations en paires de quarks b avec un signal de 2.0 sigma. Bien que ceci ne soit toujours pas très significatif, c’est la première indication pour cette désintégration jusqu’ici au Grand collisionneur de hadrons (LHC). Les expériences du Tevatron avaient rapporté l’observation de telles désintégrations à 2.8 sigma. Bien que le boson de Higgs se désintègre en quarks b environ 60 % du temps, il y a tant de bruit de fond qu’il est extrêmement difficile de mesurer ces désintégrations au LHC.

Non seulement les expériences ont la preuve que le boson de Higgs se désintègre en paires de taus, mais elles mesurent aussi combien de fois ceci arrive. Le Modèle Standard, la théorie qui décrit à peu près tout ce qui a été observé jusqu’à maintenant en physique des particules, stipule qu’un boson de Higgs devrait se désintégrer en une paire de taus environ 8 % du temps. CMS a mesuré une valeur correspondant à 0.87 ± 0.29 fois ce taux, c’est-à-dire une valeur compatible avec 1.0 comme prévu pour le boson de Higgs du Modèle Standard. ATLAS obtient 1.4 +0.5-0.4, ce qui est aussi consistent avec la valeur de 1.0 à l‘intérieur des marges d’erreur.

Un des événements captés par la collaboration CMS ayant les caractéristiques attendues pour les désintégrations du boson de Higgs du Modèle Standard en une paire de taus. Un des taus se désintègre en un muon (ligne rouge) et en neutrinos (non visibles dans le détecteur), tandis que l’autre tau se désintègre en  hadrons (particules composées de quarks) (tours bleues) et un neutrino. Il y a aussi deux jets de particules vers l’avant (tours vertes).

Avec ces nouveaux résultats, les expériences ont établi une propriété de plus prédite pour le boson de Higgs du Modèle Standard. Reste encore à clarifier le type exact de boson de Higgs que nous avons. Est-ce bien le plus simple des bosons, celui associé au Modèle Standard? Ou avons nous découvert un autre type de boson de Higgs, le plus léger des cinq bosons de Higgs prévus par une autre théorie appelée la supersymétrie.

Il est encore trop tôt pour écarter cette deuxième hypothèse. Tandis que le boson de Higgs se comporte jusqu’ici exactement comme ce à quoi on s’attend pour le boson de Higgs du Modèle Standard, les mesures manquent encore de précision pour exclure qu’il soit de type supersymétrique. Une réponse définitive exige plus de données. Ceci arrivera une fois que le LHC reprendra du service à presque deux fois l’énergie actuelle en 2015 après l’arrêt actuel pour maintenance et consolidation.

En attendant, ces nouveaux résultats seront affinés et finalisés. Déjà ils représentent un petit pas pour les expériences et un bond de géant pour le boson de Higgs.

Pour tous les détails (en anglais seulement)

Présentation donnée par la collaboration ATLAS le 28 novembre 2013

Présentation donnée par la collaboration CMS le 3 décembre 2013

Pauline Gagnon

Pour être averti-e lors de la parution de nouveaux blogs, suivez-moi sur Twitter: @GagnonPauline ou par e-mail en ajoutant votre nom à cette liste de distribution.

### John Baez — Rolling Hypocycloids

The hypocycloid with n cusps is the curve traced out by a point on a circle rolling inside a circle whose radius is n times larger.

The hypocycloid with 2 cusps is sort of strange:

It’s just a line segment! It’s called the Tusi couple.

The hypocycloid with 3 cusps is called the deltoid, because it’s shaped like the Greek letter Δ:

The hypocycloid with 4 cusps is called the astroid, because it reminds people of a star:

I got interested in hypocycloids while writing some articles here:

My goal was to explain a paper John Huerta and I wrote about the truly amazing things that happen when you roll a ball on a ball 3 times as big. But I got into some interesting digressions.

While pondering hypocycloids in the ensuing discussion, the mathematician, physicist, programmer and science fiction author Greg Egan created this intriguing movie:

It’s a deltoid rolling inside an astroid. It fits in a perfectly snug way, with all three corners touching the astroid at all times!

Why does it work? It’s related to some things that physicists like: SU(3) and SU(4).

SU(n) is the group of n × n unitary matrices with determinant 1. Physicists like Pauli and Heisenberg got interested in SU(2) when they realized it describes the rotational symmetries of an electron. You need it to understand the spin of an electron. Later, Gell-Mann got interested in SU(3) because he thought it described the symmetries of quarks. He won a Nobel prize for predicting a new particle based on this theory, which was then discovered.

We now know that SU(3) does describe symmetries of quarks, but not in the way Gell-Mann thought. It turns out that quarks come in 3 ‘colors’—not real colors, but jokingly called red, green and blue. Similarly, electrons come in 2 different spin states, called up and down. Matrices in SU(3) can change the color of a quark, just as matrices in SU(2) can switch an electron’s spin from up to down, or some mix of up and down.

SU(4) would be important in physics if quarks came in 4 colors. In fact there’s a theory saying they do, with electrons and neutrinos being examples of quarks in their 4th color state! It’s called the Pati–Salam theory, and you can see an explanation here:

• John Baez and John Huerta, The algebra of grand unified theories.

It’s a lot of fun, because it unifies leptons (particles like electrons and neutrinos) and quarks. There’s even a chance that it’s true. But it’s not very popular these days, because it has some problems. It predicts that protons decay, which we haven’t seen happen yet.

Anyway, the math of SU(3) and SU(4) is perfectly well understood regardless of their applications to physics. And here’s the cool part:

If you take a matrix in SU(3) and add up its diagonal entries, you can get any number in the complex plane that lies inside a deltoid. If you take a matrix in SU(4) and add up its diagonal entries, you can get any number in the complex plane that lies inside an astroid. And using how SU(3) fits inside SU(4), you can show that a deltoid moves snugly inside an astroid!
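If you want to see the trace statement for yourself, here is a small sketch (my own illustration, not from the post): it generates random SU(n) matrices, plots their traces in the complex plane, and overlays the hypocycloid with n cusps, parametrized as (n-1)exp(it) + exp(-(n-1)it). With n = 3 the dots fill the deltoid; change n to 4 and they fill the astroid.

```python
import numpy as np
import matplotlib.pyplot as plt

def random_su(n, rng):
    """A random SU(n) matrix: QR-decompose a complex Gaussian matrix, fix the
    phases of R's diagonal (giving a Haar-distributed U(n) matrix), then divide
    out an n-th root of the determinant to push the determinant to 1."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    q = q @ np.diag(np.diag(r) / np.abs(np.diag(r)))
    return q / np.linalg.det(q) ** (1.0 / n)

n, rng = 3, np.random.default_rng(0)
traces = np.array([np.trace(random_su(n, rng)) for _ in range(20000)])

t = np.linspace(0, 2 * np.pi, 500)
boundary = (n - 1) * np.exp(1j * t) + np.exp(-1j * (n - 1) * t)  # n-cusped hypocycloid

plt.scatter(traces.real, traces.imag, s=1, alpha=0.3, label=f"traces of SU({n}) matrices")
plt.plot(boundary.real, boundary.imag, "r", label=f"hypocycloid with {n} cusps")
plt.gca().set_aspect("equal")
plt.legend()
plt.show()
```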

The deltoid looks like it’s rolling, hence the title of this article. But Egan pointed out that it’s not truly ‘rolling’ in the sense of mechanics—it slides a bit as it rolls.

But the really cool part—the new thing I want to show you today—is that this pattern continues!

For example, an astroid moves snugly inside a 5-pointed shape, thanks to how SU(4) sits inside SU(5). Here’s a movie of that, again made by Greg Egan:

In general, a hypocycloid with n cusps moves snugly inside a hypocycloid with n + 1 cusps. But this implies that you can have a hypocycloid with 2 cusps moving inside one with 3 cusps moving inside one with 4 cusps, etcetera! I’ve been wanting to see this for a while, but yesterday Egan created some movies showing this:

Depending on the details, you can get different patterns:

Egan explains:

In the first movie, every n-cusped hypocycloid moves inside the enclosing (n+1)-cusped hypocycloid at the same angular velocity and in the same direction, relative to the enclosing one … which is itself rotating, except for the very largest one. So the cumulative rotational velocity is highest for the line, lower for the deltoid, lower still for the astroid, in equal steps until you hit zero for the outermost hypocycloid.

In the second movie, the magnitude of the relative angular velocity is always the same between each hypocycloid and the one that encloses it, but the direction alternates as you go up each level.

Here’s one where he went up to a hypocycloid with 10 cusps:

Note how sometimes all the cusps meet!

I wonder, can a deltoid roll in a 5-hypocycloid? I haven’t worked through the math of just how the SU(n) → SU(n+1) embedding guarantees a fit here.

(And also, are there corresponding pictures we could draw to illustrate the embeddings of other Lie groups? This could be a lovely way to illustrate a wide range of relationships if it could be done more generally.)

Egan figured out that yes, a deltoid can move nicely in a hypocycloid with 5 cusps:

He writes:

To make this, I took the border of a deltoid in “standard position” (as per the trace theorem) to be:

$\mathrm{deltoid}(t) = 2 \exp(i t) + \exp(-2 i t)$

Suppose I apply the linear function:

$f(s, z) = \exp(-2 i s / 5) z + 2 \exp(3 i s / 5)$

to the whole deltoid. Then at the point s=t on the deltoid, we have:

$f(s, \mathrm{deltoid}(s)) = 4 \exp(3 i s / 5) + \exp(-4 (3 i s / 5))$

which is a point on a 5-cusped hypocycloid in “standard position”. Of course with this choice of parameters you need to take $s$ from $0$ through to $10 \pi/3,$ not $2 \pi,$ to complete a cycle.
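Egan's formula is easy to check numerically. This little sketch (mine, just plugging his parametrization into NumPy) confirms that f(s, deltoid(s)) lands exactly on the 5-cusped hypocycloid in standard position:

```python
import numpy as np

s = np.linspace(0, 10 * np.pi / 3, 2000)   # one full cycle, as noted above

deltoid = 2 * np.exp(1j * s) + np.exp(-2j * s)                     # deltoid(s)
f_image = np.exp(-2j * s / 5) * deltoid + 2 * np.exp(3j * s / 5)   # f(s, deltoid(s))
target  = 4 * np.exp(3j * s / 5) + np.exp(-4j * (3 * s / 5))       # 5-cusped hypocycloid

print("max |f - target| =", np.max(np.abs(f_image - target)))      # ~1e-15: they agree
```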

Using the same trick, you can get a hypocycloid with n cusps to move inside one with m cusps whenever n ≤ m. As for the more general question of how different Lie groups give different ‘generalized hypocycloids’, and how fitting one Lie group in another lets you roll one generalized hypocycloid in another, the current state of the art is here:

But there is still more left to do!

On a somewhat related note, check out this fractal obtained by rolling a circle inside a circle 4 times as big that’s rolling in a circle 4 times as big… and so on: astroidae ad infinitum!

## December 05, 2013

### Clifford Johnson — Goodbye, Nelson Mandela

Rest in peace… but let your legacy and the lessons of your actions and words forever stay alive and working in our societies worldwide. -cvj

### John Baez — Talk at the SETI Institute

SETI means ‘Search for Extraterrestrial Intelligence’. I’m giving a talk at the SETI Institute on Tuesday December 17th, from noon to 1 pm. You can watch it live, watch it later on their YouTube channel, or actually go there and see it. It’s free, and you can just walk in at 189 San Bernardo Avenue in Mountain View, California, but please register if you can.

#### Life’s Struggle to Survive

When pondering the number of extraterrestrial civilizations, it is worth noting that even after it got started, the success of life on Earth was not a foregone conclusion. We recount some thrilling episodes from the history of our planet, some well-documented but others merely theorized: our collision with the planet Theia, the oxygen catastrophe, the snowball Earth events, the Permian-Triassic mass extinction event, the asteroid that hit Chicxulub, and more, including the global warming episode we are causing now. All of these hold lessons for what may happen on other planets.

If you know interesting things about these or other ‘close calls’, please tell me! I’m still preparing my talk, and there’s room for more fun facts. I’ll make my slides available when they’re ready.

The SETI Institute looks like an interesting place, and my host, Adrian Brown, is an expert on the poles of Mars. I’ve been fascinated about the water there, and I’ll definitely ask him about this paper:

• Adrian J. Brown, Shane Byrne, Livio L. Tornabene and Ted Roush, Louth crater: Evolution of a layered water ice mound, Icarus 196 (2008), 433–445.

Louth Crater is a fascinating place. Here’s a photo:

By the way, I’ll be in Berkeley from December 14th to 21st, except for a day trip down to Mountain View for this talk. I’ll be at the Machine Intelligence Research Institute talking to Eliezer Yudkowsky, Paul Christiano and others at a Workshop on Probability, Logic and Reflection. This invitation arose from my blog post here:

If you’re in Berkeley and you want to talk, drop me a line. I may be too busy, but I may not.

### Mark Chu-Carroll — Basics: What is an OS?

A reader of this blog apparently likes the way I explain things, and wrote to me to ask a question: what is an operating system? And how does a computer know how to load it?

I'm going to answer that, but I'm going to do it in a roundabout way. The usual answer is something like: "An operating system or OS is a software program that enables the computer hardware to communicate and operate with the computer software." In my opinion, that's a cop-out: it doesn't really answer anything. I'm going to take a somewhat roundabout approach, but hopefully give you an answer that actually explains things in more detail, which should help you understand it better.

When someone like me sets out to write a program, how can we do it? That sounds like an odd question, until you actually think about it. The core of the computer, the CPU, is a device which really can't do very much. It's a self-contained unit which can do lots of interesting mathematical and logical operations, but they all happen completely inside the CPU (how they happen inside the CPU is way beyond this post!). To get stuff in and out of the CPU, the only thing that the computer can do is read and write values from the computer's memory. That's really it.

So how do I get a program in to the computer? The computer can only read the program if it's in the computer's memory. And every way that I can get it into the memory involves the CPU!

Computers are built so that there are certain memory locations and operations that are used to interact with the outside world. They also have signal wires called interrupt pins where other devices (like disk drives) can apply a current to say "Hey, I've got something for you". The exact mechanics are, of course, complicated, and vary from CPU to CPU. But to give you an idea of what it's like, to read some data from disk, you'd do something like the following.

1. Set aside a chunk of memory where the data should be stored after it's read. This is called a buffer.
2. Figure out where the data you want to read is stored on the disk. You can identify disk locations as a number. (It's usually a bit more complicated than that, but we're trying to keep this simple.)
3. Write that number into a special memory location that's monitored by the disk drive controller.
4. Wait until the disk controller signals you via an interrupt that the data is ready. The data will be stored in a special memory location, that can be altered by the disk. (Simplifying again, but this is sometimes called a DMA buffer.)
5. Copy the data from the controller's DMA buffer into the application's memory buffer that you allocated.

When you get down to that level, programming is an intricate dance! No one wants to do that - it's too complicated, too error prone, and just generally too painful. But there's a deeper problem: at this level, it's every program for itself. How do you decide where on the disk to put your data? How can you make sure that no one else is going to use that part of the disk? How can you tell another program where to find the data that you stored?

You want to have something that creates the illusion of a much simpler computational world. Of course, under the covers, it's all going to be that incredibly messy stuff, but you want to cover it up. That's the job of an operating system: it's a layer between the hardware and the programs that you run that creates a new computational world that's much easier to work in.

Instead of having to do the dance of mucking with the hard disk drive controller yourself, the operating system gives you a way of saying "Open a file named 'foo'", and then it takes that request, figures out where 'foo' is on the disk, talks to the disk drive, gets the data, and then hands you a buffer containing it. You don't need to know what kind of disk drive the data is coming from, or how the name 'foo' maps to sectors of the disk. You don't need to know where the control memory locations for the drive are. You just let the operating system do that for you.
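To make the contrast concrete, here is a small sketch in Python (Python chosen only for brevity; the same point could be made in any language, and the file name is of course just an example). Both versions ultimately ask the operating system to do the whole messy dance described above; the only difference is how much of the machinery you get to see.

```python
import os

with open("foo.txt", "wb") as f:          # create a small file to play with
    f.write(b"hello, operating system\n")

# High-level view: "open a file named 'foo.txt' and give me its contents".
# The OS figures out which device and which sectors hold the data, talks to
# the controller, waits for the interrupt, and hands back a buffer.
with open("foo.txt", "rb") as f:
    data = f.read()

# Slightly lower-level view: the same request through the system-call style
# interface, with explicit file descriptors and buffer sizes.
fd = os.open("foo.txt", os.O_RDONLY)
chunks = []
while True:
    chunk = os.read(fd, 4096)             # ask the OS for up to 4096 bytes at a time
    if not chunk:                         # an empty result means end of file
        break
    chunks.append(chunk)
os.close(fd)

assert b"".join(chunks) == data           # both paths return the same bytes
```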

So, ultimately, this is the answer: The operating system is a program that runs on the computer, and creates the environment in which other programs can run. It does a lot of things to create a pleasant environment in which to write and run other programs. Among the multitude of services provided by most modern operating systems are:

1. Device input and output. This is what we talked about above: direct interaction with input and output devices is complicated and error prone; the operating system implements the input and output processes once, (hopefully) without errors, and then makes it easy for every other program to just use its correct implementation.
2. Multitasking: your computer has enough power to do many things at once. Most modern computers have more than one CPU. (My current laptop has 4!) And most programs end up spending a lot of their time doing nothing: waiting for you to press a key, or waiting for the disk drive to send it data. The operating system creates sandboxes, called processes, and allows one program to run in each sandbox. It takes care of ensuring that each process gets to run on a CPU for a fair share of the time.
3. Memory management. With more than one program running at the same time on your computer, you need to make sure that you're using memory that isn't also being used by some other program, and to make sure that no other program can alter the memory that you're using without your permission. The operating system decides what parts of memory can be used by which program.
4. Filesystems. Your disk drive is really just a huge collection of small sections, each of which can store a fixed number of bits, encoded in some strange format dictated by the mechanics of the drive. The OS provides an abstraction that's a lot easier to deal with.

I think that's enough for one day. Tomorrow: how the computer knows how to run the OS when it gets switched on!

### Tommaso Dorigo — The Quote Of The Week - Higgs On Anderson's Role In The Higgs Mechanism

"During the years 1962 to 1964 a debate developed about whether the Goldstone theorem could be evaded. Anderson pointed out that in a superconductor the Goldstone mode becomes a massive plasmon mode due to its electromagnetic interaction, and that this mode is just the longitudinal partner of transversely polarized electromagnetic modes, which also are massive (the Meissner effect!). Ths was the first description of what has become known as the Higgs mechanism.

### Backreaction — The three phases of space-time

If life grows over your head, your closest pop psy magazine recommends dividing it up into small, manageable chunks. Physicists too apply this method in difficult situations. Discrete approximations – taking a system apart into chunks – are enormously useful to understand emergent properties and to control misbehavior, such as divergences. Discretization is the basis of numerical simulations, but can also be used in an analytic approach, when the size of chunks is eventually taken towards zero.

Understanding space and time in the early universe is such a difficult situation, where gravity misbehaves and quantum effects of gravity should become important, yet we don’t know how to deal with them. Discretizing the system and treating it similarly to other quantum systems is maybe the most conservative approach one can think of, yet it is challenging. Normally, discretization is used for a system within space and time. Now it is space and time themselves that are being discretized. There is no underlying geometry as reference on which to discretize.

Causal Dynamical Triangulations (CDT), pioneered by Loll, Ambjørn and Jurkiewicz, realizes this most conservative approach towards quantum gravity. Geometry is decomposed into triangular chunks (or their higher-dimensional versions respectively) and all possible geometries are summed over in a path integral (after Wick rotation) with the weight given by the discretized curvature. The curvature is encoded in the way the chunks are connected to each other. The term ‘causal’ refers to a selection principle for geometries that are being summed over. In the end, the continuum limit can be taken, so this approach in and by itself doesn’t mean that spacetime fundamentally is discrete, just that it can be approximated by a discretization procedure.

The path integral that plays the central role here is Feynman’s famous brain child in which a quantum system takes all possible paths, and observables are computed by suitably summing up all possible contributions. It is the mathematical formulation of the statement that the electron goes through both slits. In CDT it’s space-time that goes through all allowed chunk configurations.
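
Schematically (my own notation, a generic sketch rather than anything taken from the post), the CDT version of this path integral is a weighted sum over the allowed triangulations:

$Z_{\rm CDT} = \sum_{T} \frac{1}{C_T}\, e^{-S_{\rm Regge}[T]}$

where the sum runs over the causally well-behaved triangulations T, $C_T$ is a symmetry factor, and $S_{\rm Regge}[T]$ is the discretized action determined by how the chunks are glued together.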

Evaluating the path integral over the triangulations is computationally highly intensive, but simple universes can now be simulated numerically. The results found over the last few years are promising: the approach produces a smooth, extended geometry that appears well-behaved. This doesn’t sound like much, but keep in mind that they didn’t start with anything resembling geometry! It’s discrete chunks glued together, yet it reproduces a universe with a well-behaved geometry like the one we see around us.

Or does it?

The path integral of CDT contains free parameters, and most recently the simulations found that the properties of the universe it describes depend on the value of the parameters. I find this very intriguing because it means that, if space-time's quantum properties are captured by CDT, then space-time has various different phases, much like water has different phases.

In the image below you see the phase-diagram of space-time in CDT.

[Figure: CDT phase diagram of space-time. After Fig. 5 in arXiv:1302.2173.]

The parameter κ is proportional to the inverse of Newton’s constant, and the parameter Δ quantifies the (difference in the) abundance of two different types of chunks that space-time is built up of. The phase marked C in the upper left, with the Hubble image, is where one finds a geometry resembling our universe. In the phase marked A to the right space-time falls apart into causally disconnected pieces. In the phase marked B at the bottom, space-time clumps together into a highly connected graph with a small diameter that doesn’t resemble any geometry. The numerical simulations indicate that the transition between the phases C and A is first order, and between C and B it’s second order.

In summary, in phase A everything is disconnected. In phase B everything is connected. In phase C you can share images of your lunch with people you don’t know on facebook.

Now you might say, well, but the parameters are what they are and facebook is what it is. But in quantum theory, parameters tend to depend on the scale, that is, on the distance or energy at which a system is probed. Physicists say “constants run”, which just rephrases the somewhat embarrassing statement that a constant is not constant. Since our universe is not in thermal equilibrium and has cooled down from a state of high temperature, constants have been running, and our universe can thus have passed through various phases in parameter space.

Of course it might be that CDT in the end is not the right way to describe the quantum properties of gravity. But I find this a very interesting development, because such a geometric phase transition might have left observable traces, bringing us one step closer to experimental evidence for quantum gravity.

-----------
You can find a very good brief summary of CDT here, and the details, e.g., in this paper.

Images used in the background of the phase diagram are from here, here and here.

## December 04, 2013

### Tommaso Dorigo — I Am Writing A Book

Inspired by my friend Peter Woit's openness in discussing his work in progress (a thick textbook on the foundations of quantum mechanics), I feel compelled to come out here about my own project.

### Chad Orzel — Keeping BEC Cold

I know I said there weren’t going to be physics posts for a while, but yesterday our Communications office passed along a media request about this paper on feedback cooling of BEC, from some sort of communications-person mailing list. I’d seen it talked up elsewhere– here, for example, so I banged out an email to the reporter in question. Who didn’t use any of my stuff in the story that ran late last night.

Having put in the work, though, I may as well get something out of it, so here’s the email I sent. Questions in bold are from the original request. The paper is in the New Journal of Physics, which is an open access journal, if you want to read it. I’d add figures and links to old posts and that sort of thing, but I’ve already spent longer on this than I should’ve, so you just get the email straight.

————

Questions for the sources include: 1) What is the importance of this finding regarding the “coldest” material in the universe?

One of the main applications people talk about for ultracold atoms is in the area of precision measurement, using the quantum nature of the atoms to make extremely sensitive detectors of acceleration, rotation, gravitational forces, and magnetic fields. These can be useful both for fundamental physics studies–looking at tiny interactions in fine detail– and for more practical projects. When I was a post-doc, the lab I worked in got funding from the Navy to develop atom-based detectors of rotation and gravity gradients because they might be useful in submarine navigation. A sensitive rotation detector can tell you how you’ve changed the orientation of the submarine, and a gravity sensor could help detect underwater mountains and other obstacles, in both cases without needing to send out sonar pulses that other people could detect.

These sorts of detectors mostly work by using the wave nature of matter: taking a bunch of atoms, splitting them apart onto two different paths, and then bringing them back together and recombining them. Differences in the interactions experienced by the part of the atoms that went along one path versus the other show up as changes in the final state when you recombine them. People do the same thing with light waves all the time– the best current sensors for a lot of these things involve light– but atoms offer a couple of advantages having to do with the fact that they’re massive particles, and thus interact much more strongly with gravity.
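
For a standard two-path interferometer (a textbook relation, not something specific to this paper), a phase difference $\Delta\phi$ accumulated between the two paths shows up as an oscillation in the probability of finding an atom at a given output port,

$P \approx \tfrac{1}{2}\,(1 + \cos\Delta\phi),$

so tiny path-dependent interactions translate directly into measurable shifts in the detected populations.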

For this to work really well, you need the best possible source of atom waves– a collection of atoms that behave basically like a laser, with all of them having the same wavelength and “phase,” meaning that they’re all in step, with the peaks and valleys in the same places. Ultra-cold atoms from a Bose-Einstein Condensate are, in principle, a great way to do this– a BEC is, loosely speaking, a large collection of atoms all occupying a single quantum state, so all of the atom waves are perfectly in step. The practical problem with this is that a BEC is a fragile thing, so cold that a single atom absorbing a single photon can significantly heat the sample, which disturbs the waves, and thus reduces the sensitivity. But you need to shine light on the atoms to know what they’re doing, which means you always have a bit of heating going on. This limits the length of time you can keep a BEC around as a useful source of cold atom waves.

The new study is a theoretical investigation of a way to fix this problem. Having light shining on the atoms to monitor them will give you some heating, but if you’re clever about it, you can use the information you get about what they’re doing to correct for that heating– making small tweaks to the light and magnetic fields used to confine the atoms that take that heat out. This kind of feedback system allows you to adjust the parameters in a way that can extend the useful lifetime of a BEC dramatically.
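
To illustrate the flavor of a feedback loop in the crudest possible terms, here is a toy sketch (entirely my own construction, not the scheme analyzed in the paper): each monitoring cycle deposits a bit of heat, and a proportional correction removes a fixed fraction of whatever excess energy was measured, so the excess settles at a small steady value instead of growing without bound.

```python
# Toy proportional-feedback sketch (hypothetical, not the paper's model):
# each monitoring cycle deposits some heat, and the controller removes a
# fixed fraction of whatever excess energy it measured.
measurement_heating = 1.0   # arbitrary energy units added per cycle by the probe
gain = 0.8                  # fraction of the measured excess removed per cycle

excess_energy = 0.0
for cycle in range(20):
    excess_energy += measurement_heating      # heating from the monitoring light
    excess_energy -= gain * excess_energy     # feedback correction
    print(f"cycle {cycle:2d}: excess energy = {excess_energy:.3f}")

# Without feedback (gain = 0) the excess grows linearly with time; with
# feedback it approaches the steady value
# measurement_heating * (1 - gain) / gain = 0.25 in these units.
```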

Some previous simulations had suggested that this might be possible, but this is a very tricky problem to solve, and the methods the previous studies used relied on some approximations to simplify the problem to something they could actually work with. That always leaves open the possibility that you’ve approximated away the killer problem that will prevent the system from working the way you’d like. This new paper uses a different approximation that captures more of the details of the interaction, and while it shows the limits of the previous work, it also suggests that the feedback scheme could work in a practical system.

2) How cold is cold in this instance?

That’s a harder question to answer than you might think. There are two different ways to talk about temperature in physics: one is in terms of the speed at which the atoms in your sample are moving, and the other is in terms of the number of quantum states that they occupy.

In speed terms, you can never totally get rid of the motion of the atoms. If you let the sample go, turning off the trap that confines them, you’ll see them fly away at some speed that probably corresponds to a billionth of a degree (Celsius) above absolute zero.

In occupied-state terms, the sample is essentially all in a single one of the possible energy states an atom can have inside the trap that holds them. There’s a tiny residue of atoms in higher-energy states, which corresponds to a really tiny temperature. Again, this is probably something in the nanokelvin (billionth of a degree) sort of range.

Those values are purely practical limits– you can never remove all of the energy from the system, because it would require an infinitely long time to get all the energy out. You can get as close to zero energy as you have the patience to wait for, though. The numbers they use in their simulation (and that people reach in experiment) are a sort of compromise–they could get colder, but it’s not worth the extra time.

Probably the best thing to say about the temperature is that it’s as close to absolute zero as you’d like it to be.

(Though, again, this is a theory paper, so their atoms are simulated in a computer. The numbers they put in are based on real experiments, though.)

3) What is the next step in the research?

This is a theoretical study, using a computer to simulate the behavior of atoms in a BEC exposed to their particular feedback scheme. The next step is to actually do it with real atoms in a real trapped BEC.

This shouldn’t be too hard to arrange, as they propose two different ways of realizing the feedback, closely based on two different systems people use for studying BEC. There are several different experimental research groups that could potentially implement this sort of scheme, scattered all around the world, including in Australia where some of these people are based. It’s a good bet that they’ve been talking to experimenters about how to do this, and there are almost certainly groups tooling up to implement this right now (if they’re not already well into the experiments).

Once they show it can work to keep the BEC around and coherent for a long time, then it’s a matter of integrating this with the various precision-measurement systems. Again, there are a lot of people who do this sort of thing, and I’m sure they’re watching this very closely.

### Matt Strassler — Wednesday: Sean Carroll & I Interviewed Again by Alan Boyle

Today, Wednesday December 4th, at 8 pm Eastern/5 pm Pacific time, Sean Carroll and I will be interviewed again by Alan Boyle on “Virtually Speaking Science”.   The link where you can listen in (in real time or at your leisure) is

What is “Virtually Speaking Science“?  It is an online radio program that presents, according to its website:

• Informal conversations hosted by science writers Alan Boyle, Tom Levenson and Jennifer Ouellette, who explore the often-volatile landscape of science, politics and policy, the history and economics of science, science denial and its relationship to democracy, and the role of women in the sciences.

Sean Carroll is a Caltech physicist, astrophysicist, writer and speaker, blogger at Preposterous Universe, who recently completed an excellent and now prize-winning popular book (which I highly recommend) on the Higgs particle, entitled “The Particle at the End of the Universe“.  Our interviewer Alan Boyle is a noted science writer, author of the book “The Case for Pluto“, winner of many awards, and currently NBC News Digital’s science editor [at the blog  "Cosmic Log"].

Sean and I were interviewed in February by Alan on this program; here’s the link.  I was interviewed on Virtually Speaking Science once before, by Tom Levenson, about the Large Hadron Collider (here’s the link).  Also, my public talk “The Quest for the Higgs Particle” is posted in their website (here’s the link to the audio and to the slides).

Filed under: Astronomy, Higgs, History of Science, LHC Background Info, Particle Physics, Public Outreach, Quantum Gravity, Science News, The Scientific Process Tagged: astronomy, DarkMatter, DoingScience, gravity, Higgs, LHC, particle physics, PublicPerception, PublicTalks

### Jordan Ellenberg — An incidence conjecture of Bourgain over fields of positive characteristic (with Hablicsek)

Marci Hablicsek (a finishing Ph.D. student at UW) and I recently posted a new preprint, “An incidence conjecture of Bourgain over fields of finite characteristic.”

The theme of the paper is a beautiful theorem of Larry Guth and Nets Katz, one of the early successes of Dvir’s “polynomial method.”  They proved a conjecture of Bourgain:

Given a set S of points in R^3, and a set of N^2 lines such that

• No more than N lines are contained in any plane;
• Each line contains at least N points of S;

then S has at least cN^3 points.

In other words, the only way for a big family of lines to have lots of multiple intersections is for all those lines to be contained in a plane.  (In the worst case where all the lines are in a plane, the incidences between points and lines are governed by the Szemeredi-Trotter theorem.)

I saw Nets speak about this in Wisconsin, and I was puzzled by the fact that the theorem only applied to fields of characteristic 0, when the proof was entirely algebraic.  But you know the proof must fail somehow in characteristic p, because the statement isn’t true in characteristic p.  For example, over the field k with p^2 elements, one can check that the Heisenberg surface

$X: x - x^p + yz^p - zy^p = 0$

has a set of p^4 lines, no more than p lying on any plane, and such that each line contains at least p^2 elements of X(k).  If the Guth-Katz theorem were true over k, we could take N = p^2 and conclude that |X(k)| is at least p^6.  But in fact, it’s around p^5.
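
Out of curiosity, one can brute-force that last claim for the smallest odd prime (a quick sanity check of my own, not a computation from the paper): building $k = F_9$ as $F_3[t]/(t^2+1)$ and counting points directly, the surface turns out to have $3^5 = 243$ points, well short of the $3^6 = 729$ that the Guth-Katz bound would demand.

```python
# Brute-force count of points on the Heisenberg surface
#   x - x^p + y z^p - z y^p = 0
# over k = F_{p^2} for p = 3, with F_9 represented as a + b*t, t^2 = -1.
# (Illustration only; the paper works with general p.)
p = 3
elements = [(a, b) for a in range(p) for b in range(p)]  # a + b*t

def add(u, v):
    return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)

def neg(u):
    return ((-u[0]) % p, (-u[1]) % p)

def mul(u, v):
    # (a + bt)(c + dt) = (ac - bd) + (ad + bc) t, since t^2 = -1
    return ((u[0] * v[0] - u[1] * v[1]) % p, (u[0] * v[1] + u[1] * v[0]) % p)

def power(u, n):
    out = (1, 0)
    for _ in range(n):
        out = mul(out, u)
    return out

count = 0
for x in elements:
    for y in elements:
        for z in elements:
            lhs = add(add(x, neg(power(x, p))),
                      add(mul(y, power(z, p)), neg(mul(z, power(y, p)))))
            if lhs == (0, 0):
                count += 1

print(count, p**5, p**6)   # prints 243 243 729
```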

It turns out that there is one little nugget in the proof of Guth-Katz which is not purely algebraic.  Namely:  they show that a lot of the lines are contained in some surface S with the following property: at every smooth point s of S, the tangent plane to S at s intersects S with multiplicity greater than 2.  They express this in the form of an assertion that a certain curvature form vanishes everywhere.  In characteristic 0, this implies that S is a plane.  But not so in characteristic p!  (As always, the fundamental issue is that a function in characteristic p can have zero derivative without being constant — viz., x^p.)  All of us who did the problems in Hartshorne know about the smooth plane curve over F_3 with every point an inflection point.  Well, there are surfaces like that too (the Heisenberg surface is one such), and the point of the new paper is to deal with them.  In fact, we show that the Guth-Katz theorem is true word for word as long as you prevent lines not only from piling up in planes but also from piling up in these “flexy” surfaces.

It turns out that any such surface must have degree at least p, and this enables us to show that the Guth-Katz theorem is actually true, word for word, over the prime field F_p.

If you like, you can think of this as a strengthening of Dvir’s theorem for the case of F_p^3.  Dvir proves that a set of p^2 lines with no two lines in the same direction fills up a positive-density subset of the whole space.  What we prove is that the p^2 lines don’t have to point in distinct directions; it is enough to impose the weaker condition that no more than p of them lie in any plane; this already implies that the union of the lines has positive density.  Again, this strengthening doesn’t hold for larger finite fields, thanks to the Heisenberg surface and its variants.

This is rather satisfying, in that there are other situations in this area (e.g. sum-product problems) where there are qualitatively different bounds depending on whether the field k in question has nontrivial subfields or not.  But it is hard to see how a purely algebraic argument can “see the difference” between F_p and F_{p^2}.  The argument in this paper shows there’s at least one way this can happen.

Satisfying, also, because it represents an unexpected application for some funky characteristic-p algebraic geometry!  I have certainly never needed to remember that particular Hartshorne problem in my life up to now.

### Clifford Johnson — New Tool

Actually, this new tool is pretty old school, and I love it! There are times when I want to have a change of venue while doing rather detailed work for The Project… perhaps go sit in a cafe for a change, instead of at the drawing desk. But when not at the drawing desk, I could not use the lovely large illuminated magnifying glass […] Click to continue reading this post

### Jordan Ellenberg — Stoner

Stoner, a 1965 novel by John Williams, has been named the 2013 Waterstones Book of the Year.

Pretty cool to see an old book recognized!  I read this a while back; it’s one of those books often mentioned as a “forgotten classic” and I read such books out of a sense of obligation.  But sometimes, like this time, it pays off.  (See also:  Independent People, The Bridge on the Drina.)  Stoner represents a certain strain in the mid-century American novel that I really like, and which I don’t think exists in contemporary fiction.  Anguish, verbal restraint, weirdness.  Among famous authors, maybe some of Salinger, maybe some of O’Connor (but not glowing like O’Connor, more subdued, and not funny like Salinger, more deadpan.)  Besides Stoner I am thinking of James Purdy and Richard Yates — not even so much Revolutionary Road but The Easter Parade, which is grinding and merciless but at the same time strangely mild-mannered, in the same way Stoner is.

What else belongs here?

### Clifford Johnson — Fail Lab Episode 12 – Finale

….In which Crystal and many of the crew chill out on the sofa after a long hard season of shows and express some of their gut feelings about the whole business. Well done on an excellent series of shows, Crystal, Patrick, James, and all the other behind the scenes people! (Warning: - This episode may not be for the squeamish!) Embed below: […] Click to continue reading this post

## December 03, 2013

### Matt Strassler — The Fast and Glamorous Life of a Theoretical Physicist

Ah, the fast-paced life of a theoretical physicist!  I just got done giving a one-hour talk in Rome, at a workshop for experts on the ATLAS experiment, one of the two general purpose experiments at the Large Hadron Collider [LHC]. Tomorrow morning I’ll be talking with a colleague at the Rutherford Appleton Lab in the U.K., an expert from CMS (the other general purpose experiment at the LHC). Then it’s off to San Francisco, where tomorrow (Wednesday, 5 p.m. Pacific Time, 8 p.m. Eastern), at the Exploratorium, I’ll be joined by Caltech’s Sean Carroll, who is an expert on cosmology and particle physics and whose book on the Higgs boson discovery just won a nice prize, and we’ll be discussing science with science writer Alan Boyle, as we did back in February. [You can click here to listen in to Wednesday's event.]  Next, on Thursday I’ll be at a meeting hosted at Stony Brook, on Long Island in New York State, discussing a Higgs-particle-related scientific project with theoretical physics colleagues as far flung as Hong Kong.  On Friday I shall rest.

“How does he do it?”, you ask. Hey, a private jet is a wonderful thing! Simple, convenient, no waiting at the gate; I highly recommend it! However — I don’t own one. All I have is Skype, and other Skype-like software.  My words will cross the globe, but my body won’t be going anywhere this week.

We should not take this kind of communication for granted! If the speed of light were 186,000 miles (300,000 kilometers) per hour, instead of 186,000 miles (300,000 kilometers) per second, ordinary life wouldn’t obviously change that much, but we simply couldn’t communicate internationally the way we do. It’s 4100 miles (6500 kilometers) across the earth’s surface to Rome; light takes about 0.02 seconds to travel that distance, so that’s the fastest anything can travel to make the trip. But if light traveled 186,000 miles per hour, then it would take over a minute for my words to reach Rome, making conversation completely impossible. A back-and-forth conversation would be difficult even between New York and Boston — for any signal to travel the 200 miles (300 kilometers) would require four seconds, so you’d be waiting for 8 seconds to hear the other person answer your questions. We’d have similar problems — slightly less severe — if the earth were as large as the sun.  And someday, as we populate the solar system, we’ll actually have this problem.
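
Just to make the arithmetic explicit, here is a back-of-the-envelope check of the delays quoted above (my own illustration, not part of the original post):

```python
# Back-of-the-envelope check of the one-way signal delays quoted above.
c_real = 186_000.0          # miles per second (roughly the actual speed of light)
c_slow = 186_000.0 / 3600   # miles per second if light moved 186,000 miles per HOUR

for label, miles in [("to Rome", 4100), ("New York to Boston", 200)]:
    print(f"{label}: {miles / c_real:.3f} s now, "
          f"{miles / c_slow:.1f} s with the slow speed of light")

# Output: ~0.022 s vs ~79 s for Rome, and ~0.001 s vs ~3.9 s for NY-Boston,
# i.e. roughly an 8-second round trip for a question and its answer.
```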

So think about that next time you call or Skype or otherwise contact a distant friend or colleague, and you have a conversation just as though you were next door, despite your being separated half-way round the planet. It’s a small world (and a fast one) after all.

Filed under: The Scientific Process Tagged: DoingScience, relativity

### Scott Aaronson — Twenty Reasons to Believe Oswald Acted Alone

As the world marked the 50th anniversary of the JFK assassination, I have to confess … no, no, not that I was in on the plot.  I wasn’t even born then, silly.  I have to confess that, in between struggling to make a paper deadline, attending a workshop in Princeton, celebrating Thanksgivukkah, teaching Lily how to pat her head and clap her hands, and not blogging, I also started dipping, for the first time in my life, into a tiny fraction of the vast literature about the JFK assassination.  The trigger (so to speak) for me was this article by David Talbot, the founder of Salon.com.  I figured, if the founder of Salon is a JFK conspiracy buff—if, for crying out loud, my skeptical heroes Bertrand Russell and Carl Sagan were both JFK conspiracy buffs—then maybe it’s at least worth familiarizing myself with the basic facts and arguments.

So, what happened when I did?  Were the scales peeled from my eyes?

In a sense, yes, they were.  Given how much has been written about this subject, and how many intelligent people take seriously the possibility of a conspiracy, I was shocked by how compelling I found the evidence to be that there were exactly three shots, all fired by Lee Harvey Oswald with a Carcano rifle from the sixth floor of the Texas School Book Depository, just as the Warren Commission said in 1964.  And as for Oswald’s motives, I think I understand them as well and as poorly as I understand the motives of the people who send me ramblings every week about P vs. NP and the secrets of the universe.

Before I started reading, if someone forced me to guess, maybe I would’ve assigned a ~10% probability to some sort of conspiracy.  Now, though, I’d place the JFK conspiracy hypothesis firmly in Moon-landings-were-faked, Twin-Towers-collapsed-from-the-inside territory.  Or to put it differently, “Oswald as lone, crazed assassin” has been added to my large class of “sanity-complete” propositions: propositions defined by the property that if I doubt any one of them, then there’s scarcely any part of the historical record that I shouldn’t doubt.  (And while one can’t exclude the possibility that Oswald confided in someone else before the act—his wife or a friend, for example—and that other person kept it a secret for 50 years, what’s known about Oswald strongly suggests that he didn’t.)

So, what convinced me?  In this post, I’ll give twenty reasons for believing that Oswald acted alone.  Notably, my reasons will have less to do with the minutiae of bullet angles and autopsy reports, than with general principles for deciding what’s true and what isn’t.  Of course, part of the reason for this focus is that the minutiae are debated in unbelievable detail elsewhere, and I have nothing further to contribute to those debates.  But another reason is that I’m skeptical that anyone actually comes to believe the JFK conspiracy hypothesis because they don’t see how the second bullet came in at the appropriate angle to pass through JFK’s neck and shoulder and then hit Governor Connally.  Clear up some technical point (or ten or fifty of them)—as has been done over and over—and the believers will simply claim that the data you used was altered by the CIA, or they’ll switch to other “anomalies” without batting an eye.  Instead, people start with certain general beliefs about how the world works, “who’s really in charge,” what sorts of explanations to look for, etc., and then use their general beliefs to decide which claims to accept about JFK’s head wounds or the foliage in Dealey Plaza—not vice versa.  That being so, one might as well just discuss the general beliefs from the outset.  So without further ado, here are my twenty reasons:

1. Conspiracy theorizing represents a known bug in the human nervous system.  Given that, I think our prior should be overwhelmingly against anything that even looks like a conspiracy theory.  (This is not to say conspiracies never happen.  Of course they do: Watergate, the Tobacco Institute, and the Nazi Final Solution were three well-known examples.  But the difference between conspiracy theorists’ fantasies and actual known conspiracies is this: in a conspiracy theory, some powerful organization’s public face hides a dark and terrible secret; its true mission is the opposite of its stated one.  By contrast, in every real conspiracy I can think of, the facade was already 90% as terrible as the reality!  And the “dark secret” was that the organization was doing precisely what you’d expect it to do, if its members genuinely held the beliefs that they claimed to hold.)

2. The shooting of Oswald by Jack Ruby created the perfect conditions for conspiracy theorizing to fester.  Conditioned on that happening, it would be astonishing if a conspiracy industry hadn’t arisen, with its hundreds of books and labyrinthine arguments, even under the assumption that Oswald and Ruby both really acted alone.

3. Other high-profile assassinations to which we might compare this one—for example, those of Lincoln, Garfield, McKinley, RFK, Martin Luther King Jr., Gandhi, Yitzchak Rabin…—appear to have been the work of “lone nuts,” or at most “conspiracies” of small numbers of lowlifes.  So why not this one?

4. Oswald seems to have perfectly fit the profile of a psychopathic killer (see, for example, Case Closed by Gerald Posner).  From very early in his life, Oswald exhibited grandiosity, resentment, lack of remorse, doctrinaire ideological fixations, and obsession with how he’d be remembered by history.

5. A half-century of investigation has failed to link any individual besides Oswald to the crime.  Conspiracy theorists love to throw around large, complicated entities like the CIA or the Mafia as potential “conspirators”—but in the rare cases when they’ve tried to go further, and implicate an actual human being other than Oswald or Ruby (or distant power figures like LBJ), the results have been pathetic and tragic.

6. Oswald had previously tried to assassinate General Walker—a fact that was confirmed by his widow Marina Oswald, but that, incredibly, is barely even discussed in the reams of conspiracy literature.

7. There’s clear evidence that Oswald murdered Officer Tippit an hour after shooting JFK—a fact that seems perfectly consistent with the state of mind of someone who’d just murdered the President, but that, again, seems to get remarkably little discussion in the conspiracy literature.

8. Besides being a violent nut, Oswald was also a known pathological liar.  He lied on his employment applications, he lied about having established a thriving New Orleans branch of Fair Play for Cuba, he lied and lied and lied.  Because of this tendency—as well as his persecution complex—Oswald’s loud protestations after his arrest that he was just a “patsy” count for almost nothing.

9. According to police accounts, Oswald acted snide and proud of himself after being taken into custody: for example, when asked whether he had killed the President, he replied “you find out for yourself.”  He certainly didn’t act like an innocent “patsy” arrested on such a grave charge would plausibly act.

10. Almost all JFK conspiracy theories must be false, simply because they’re mutually inconsistent.  Once you realize that, and start judging the competing conspiracy theories by the standards you’d have to judge them by if at most one could be true, enlightenment may dawn as you find there’s nothing in the way of just rejecting all of them.  (Of course, some people have gone through an analogous process with religions.)

11. The case for Oswald as lone assassin seems to become stronger, the more you focus on the physical evidence and stuff that happened right around the time and place of the event.  To an astonishing degree, the case for a conspiracy seems to rely on verbal testimony years or decades afterward—often by people who are known confabulators, who were nowhere near Dealey Plaza at the time, who have financial or revenge reasons to invent stories, and who “remembered” seeing Oswald and Ruby with CIA agents, etc. only under drugs or hypnosis.  This is precisely the pattern we would expect if conspiracy theorizing reflected the reality of the human nervous system rather than the reality of the assassination.

12. If the conspiracy is so powerful, why didn’t it do something more impressive than just assassinate JFK? Why didn’t it rig the election to prevent JFK from becoming President in the first place?  (In math, very often the way you discover a bug in your argument is by realizing that the argument gives you more than you originally intended—vastly, implausibly more.  Yet every pro-conspiracy argument I’ve read seems to suffer from the same problem.  For example, after successfully killing JFK, did the conspiracy simply disband?  Or did it go on to mastermind other assassinations?  If it didn’t, why not?  Isn’t pulling the puppet-strings of the world sort of an ongoing proposition?  What, if any, are the limits to this conspiracy’s power?)

13. Pretty much all the conspiracy writers I encountered exude total, 100% confidence, not only in the existence of additional shooters, but in the guilt of their favored villains (they might profess ignorance, but then in the very next sentence they’d talk about how JFK’s murder was “a triumph for the national security establishment”).  For me, their confidence had the effect of weakening my own confidence in their intellectual honesty, and in any aspects of their arguments that I had to take on faith.  The conspiracy camp would of course reply that the “Oswald acted alone” camp also exudes too much confidence in its position.  But the two cases are not symmetric: for one thing, because there are so many different conspiracy theories, but only one Oswald.  If I were a conspiracy believer I’d be racked with doubts, if nothing else then about whether my conspiracy was the right one.

14. Every conspiracy theory I’ve encountered seems to require “uncontrolled growth” in size and complexity: that is, the numbers of additional shooters, alterations of medical records, murders of inconvenient witnesses, coverups, coverups of the coverups, etc. that need to be postulated all seem to multiply without bound.  To some conspiracy believers, this uncontrolled growth might actually be a feature: the more nefarious and far-reaching the conspiracy’s tentacles, the better.  It should go without saying that I regard it as a bug.

15. JFK was not a liberal Messiah.  He moved slowly on civil rights for fear of a conservative backlash, invested heavily in building nukes, signed off on the botched plans to kill Fidel Castro, and helped lay the groundwork for the US’s later involvement in Vietnam.  Yes, it’s possible that he would’ve made wiser decisions about Vietnam than LBJ ended up making; that’s part of what makes his assassination (like RFK’s later assassination) a tragedy.  But many conspiracy theorists’ view of JFK as an implacable enemy of the military-industrial complex is preposterous.

16. By the same token, LBJ was not exactly a right-wing conspirator’s dream candidate.  He was, if anything, more aggressive on poverty and civil rights than JFK was.  And even if he did end up being better for certain military contractors, that’s not something that would’ve been easy to predict in 1963, when the US’s involvement in Vietnam had barely started.

17. Lots of politically-powerful figures have gone on the record as believers in a conspiracy, including John Kerry, numerous members of Congress, and even frequently-accused conspirator LBJ himself.  Some people would say that this lends credibility to the conspiracy cause.  To me, however, it indicates just the opposite: that there’s no secret cabal running the world, and that those in power are just as prone to bugs in the human nervous system as anyone else is.

18. As far as I can tell, the conspiracy theorists are absolutely correct that JFK’s security in Dallas was unbelievably poor; that the Warren Commission was as interested in reassuring the nation and preventing a war with the USSR or Cuba as it was in reaching the truth (the fact that it did reach the truth is almost incidental); and that agencies like the CIA and FBI kept records related to the assassination classified for way longer than there was any legitimate reason to (though note that most records finally were declassified in the 1990s, and they provided zero evidence for any conspiracy).  As you might guess, I ascribe all of these things to bureaucratic incompetence rather than to conspiratorial ultra-competence.  But once again, these government screwups help us understand how so many intelligent people could come to believe in a conspiracy even in the total absence of one.

19. In the context of the time, the belief that JFK was killed by a conspiracy filled a particular need: namely, the need to believe that the confusing, turbulent events of the 1960s had an understandable guiding motive behind them, and that a great man like JFK could only be brought down by an equally-great evil, rather than by a chronically-unemployed loser who happened to see on a map that JFK’s motorcade would be passing by his workplace.  Ironically, I think that Roger Ebert got it exactly right when he praised Oliver Stone’s JFK movie for its “emotional truth.”  In much the same way, one could say that Birth of a Nation was “emotionally true” for Southern racists, or that Ben Stein’s Expelled was “emotionally true” for creationists.  Again, I’d say that the “emotional truth” of the conspiracy hypothesis is further evidence for its factual falsehood: for it explains how so many people could come to believe in a conspiracy even if the evidence for one were dirt-poor.

20. At its core, every conspiracy argument seems to be built out of “holes”: “the details that don’t add up in the official account,” “the questions that haven’t been answered,” etc.  What I’ve never found is a truly coherent alternative scenario: just one “hole” after another.  This pattern is the single most important red flag for me, because it suggests that the JFK conspiracy theorists view themselves as basically defense attorneys: people who only need to sow enough doubts, rather than establish the reality of what happened.  Crucially, creationism, 9/11 trutherism, and every other elaborate-yet-totally-wrong intellectual edifice I’ve ever encountered has operated on precisely the same “defense attorney principle”: “if we can just raise enough doubts about the other side’s case, we win!”  But that’s a terrible approach to knowledge, once you’ve seen firsthand how a skilled arguer can raise unlimited doubts even about the nonexistence of a monster under your bed.  Such arguers are hoping, of course, that you’ll find their monster hypothesis so much more fun, exciting, and ironically comforting than the “random sounds in the night hypothesis,” that it won’t even occur to you to demand they show you their monster.

### Jordan Ellenberg — Names and words

When you get the copy-edited manuscript of a book back, it comes with a document called “Names and Words”: a list of proper names or unusual words in the book which might admit variant spelling or typography. The list is there to keep everybody on the production team uniform.

Here’s the A-B section of my list.  I think it gives a pretty good sense of what the book is about.

Niels Henrik Abel

Aish HaTorah

Alcmaeon of Croton

Alhazen (Abu ‘Ali al-Hasan ibn al-Haytham)

Spike Albrecht

Ray Allen

Scott Allen

Akhil and Vikram Amar

Apollonius of Perga

Yasser Arafat

John Arbuthnot

Dan Ariely

Kenneth Arrow

John Ashbery

Daryl Renard Atkins

Yigal Attali

David Bakan

Stefan Banach

Dror Bar-Natan

Joseph-Émile Barbier

Leroy E. Burney

Andrew Beal

Nicholas Beaudrot

Bernd Beber

Gary Becker

Armando Benitez

Craig Bennett

Jim Bennett

George Berkeley

Joseph Berkson

Daniel Bernoulli

Jakob Bernoulli

Nicholas Bernoulli

Alphonse Bertillon

Bertillonage

Joseph Bertrand

best seller

best-selling

R. H. Bing

Otto Blumenthal

Usain Bolt

Farkas Bolyai

János Bolyai

Jean-Charles de Borda

Bose-Chaudhuri-Hocquenghem code

Nick Bostrom

David Brooks

Derren Brown

Filippo Brunelleschi

Pat Buchanan

Georges-Louis LeClerc, Comte de Buffon

Dylan Byers

Daniel Byman

David Byrne

### Scott Aaronson — Scattershot BosonSampling: A new approach to scalable BosonSampling experiments

Update (12/2): Jeremy Hsu has written a fantastic piece for IEEE Spectrum, entitled “D-Wave’s Year of Computing Dangerously.”

Update (11/13): See here for video of a fantastic talk that Matthias Troyer gave at Stanford, entitled “Quantum annealing and the D-Wave devices.” The talk includes the results of experiments on the 512-qubit machine. (Thanks to commenter jim for the pointer. I attended the talk when Matthias gave it last week at Harvard, but I don’t think that one was videotaped.)

Update (11/11): A commenter named RaulGPS has offered yet another great observation that, while forehead-slappingly obvious in retrospect, somehow hadn’t occurred to us.  Namely, Raul points out that the argument given in this post, for the hardness of Scattershot BosonSampling, can also be applied to answer open question #4 from my and Alex’s paper: namely, how hard is BosonSampling with Gaussian inputs and number-resolving detectors?  Raul points out that the latter, in general, is certainly at least as hard as Scattershot BS.  For we can embed Scattershot BS into “ordinary” BS with Gaussian inputs, by first generating a bunch of entangled 2-mode Gaussian states (which are highly attenuated, so that with high probability none of them have 2 or more photons per mode), and then applying a Haar-random unitary U to the “right halves” of these Gaussian states while doing nothing to the left halves.  Then we can measure the left halves to find out which of the input states contained a photon before we applied U.  This is precisely equivalent to Scattershot BS, except for the unimportant detail that our measurement of the “herald” photons has been deferred till the end of the experiment instead of happening at the beginning.  And therefore, since (as I explain in the post) a fast classical algorithm for approximate Scattershot BosonSampling would let us estimate the permanents of i.i.d. Gaussian matrices in BPP^NP, we deduce that a fast classical algorithm for approximate Gaussian BosonSampling would have the same consequence.  In short, approximate Gaussian BS can be argued to be hard under precisely the same complexity assumption as can approximate ordinary BS (and approximate Scattershot BS).  Thus, in the table in Section 1.4 of our paper, the entries “Gaussian states / Adaptive, demolition” and “Gaussian states / Adaptive, nondemolition” should be “upgraded” from “Exact sampling hard” to “Apx. sampling hard?”

One other announcement: following a suggestion by commenter Rahul, I hereby invite guest posts on Shtetl-Optimized by experimentalists working on BosonSampling, offering your personal views about the prospects and difficulties of scaling up.  Send me email if you’re interested.  (Or if you don’t feel like writing a full post, of course you can also just leave a comment on this one.)

[Those impatient for a cool, obvious-in-retrospect new idea about BosonSampling, which I learned from the quantum optics group at Oxford, should scroll to the end of this post.  Those who don't even know what BosonSampling is, let alone Scattershot BosonSampling, should start at the beginning.]

BosonSampling is a proposal by me and Alex Arkhipov for a rudimentary kind of quantum computer: one that would be based entirely on generating single photons, sending them through a network of beamsplitters and phaseshifters, and then measuring where they ended up.  BosonSampling devices are not thought to be capable of universal quantum computing, or even universal classical computing for that matter.  And while they might be a stepping-stone toward universal optical quantum computers, they themselves have a grand total of zero known practical applications.  However, even if the task performed by BosonSamplers is useless, the task is of some scientific interest, by virtue of apparently being hard!  In particular, Alex and I showed that, if a BosonSampler can be simulated exactly in polynomial time by a classical computer, then P^#P = BPP^NP, and hence the polynomial hierarchy collapses to the third level.  Even if a BosonSampler can only be approximately simulated in classical polynomial time, the polynomial hierarchy would still collapse, if a reasonable-looking conjecture in classical complexity theory is true.  For these reasons, BosonSampling might provide an experimental path to testing the Extended Church-Turing Thesis—i.e., the thesis that all natural processes can be simulated with polynomial overhead by a classical computer—that’s more “direct” than building a universal quantum computer.  (As an asymptotic claim, obviously the ECT can never be decisively proved or refuted by a finite number of experiments.  However, if one could build a BosonSampler with, let’s say, 30 photons, then while it would still be feasible to verify the results with a classical computer, it would be fair to say that the BosonSampler was working “faster” than any known algorithm running on existing digital computers.)

In arguing for the hardness of BosonSampling, the crucial fact Alex and I exploited is that the amplitudes for n-photon processes are given by the permanents of n×n matrices of complex numbers, and Leslie Valiant proved in 1979 that the permanent is #P-complete (i.e., as hard as any combinatorial counting problem, and probably even “harder” than NP-complete).  To clarify, this doesn’t mean that a BosonSampler lets you calculate the permanent of a given matrix—that would be too good to be true!  (See the tagline of this blog.)  What you could do with a BosonSampler is weirder: you could sample from a probability distribution over matrices, in which matrices with large permanents are more likely to show up than matrices with small permanents.  So, what Alex and I had to do was to argue that even that sampling task is still probably intractable classically—in the sense that, if it weren’t, then there would also be unlikely classical algorithms for more “conventional” problems.
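
For context, here is what computing such an amplitude exactly involves. The sketch below is a standard textbook algorithm (Ryser's inclusion-exclusion formula), not code from the paper; its running time grows like $2^n$, which is part of why permanents, and hence BosonSampling amplitudes, are believed to be so expensive to compute.

```python
# Exact permanent via Ryser's inclusion-exclusion formula (standard textbook
# algorithm, not from the paper): roughly O(2^n * n^2) time, illustrating why
# even a single amplitude gets expensive as the photon number n grows.
from itertools import combinations

def permanent(A):
    n = len(A)
    total = 0
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            prod = 1
            for row in A:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** size * prod
    return (-1) ** n * total

# Sanity check: the permanent of the all-ones 3x3 matrix is 3! = 6.
print(permanent([[1, 1, 1], [1, 1, 1], [1, 1, 1]]))
```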

Anyway, that’s my attempt at a 2-paragraph summary of something we’ve been thinking about on and off for four years.  See here for my and Alex’s original paper on BosonSampling, here for a recent followup paper, here for PowerPoint slides, here and here for MIT News articles by Larry Hardesty, and here for my blog post about the first (very small, 3- or 4-photon) demonstrations of BosonSampling by quantum optics groups last year, with links to the four experimental papers that came out then.

In general, we’ve been thrilled by the enthusiastic reaction to BosonSampling by quantum optics people—especially given that the idea started out as pure complexity theory, with the connection to optics coming as an “unexpected bonus.”  But not surprisingly, BosonSampling has also come in for its share of criticism: e.g., that it’s impractical, unscalable, trivial, useless, oversold, impossible to verify, and probably some other things.  A few people have even claimed that, in expressing support and cautious optimism about the recent BosonSampling experiments, I’m guilty of the same sort of quantum computing hype that I complain about in others.  (I’ll let you be the judge of that.  Reread the paragraphs above, or anything else I’ve ever written about this topic, and then compare to, let’s say, this video.)

By far the most important criticism of BosonSampling—one that Alex and I have openly acknowledged and worried a lot about almost from the beginning—concerns the proposal’s scalability.  The basic problem is this: in BosonSampling, your goal is to measure a pattern of quantum interference among n identical, non-interacting photons, where n is as large as possible.  (The special case n=2 is called the Hong-Ou-Mandel dip; conversely, BosonSampling can be seen as just “Hong-Ou-Mandel on steroids.”)  The bigger n gets, the harder the experiment ought to be to simulate using a classical computer (with the difficulty increasing at least like ~2^n).  The trouble is that, to detect interference among n photons, the various quantum-mechanical paths that your photons could take, from the sources, through the beamsplitter network, and finally to the detectors, have to get them there at exactly the same time—or at any rate, close enough to “the same time” that the wavepackets overlap.  Yet, while that ought to be possible in theory, the photon sources that actually exist today, and that will exist for the foreseeable future, just don’t seem good enough to make it happen, for anything more than a few photons.

The reason—well-known for decades as a bane to quantum information experiments—is that there’s no known process in nature that can serve as a deterministic single-photon source.  What you get from an attenuated laser is what’s called a coherent state: a particular kind of superposition of 0 photons, 1 photon, 2 photons, 3 photons, etc., rather than just 1 photon with certainty (the latter is called a Fock state).  Alas, coherent states behave essentially like classical light, which makes them pretty much useless for BosonSampling, and for many other quantum information tasks besides.  For that reason, a large fraction of modern quantum optics research relies on a process called Spontaneous Parametric Down-Conversion (SPDC).  In SPDC, a laser (called the “pump”) is used to stimulate a crystal to produce further photons.  The process is inefficient: most of the time, no photon comes out.  But crucially, any time a photon does come out, its arrival is “heralded” by a partner photon flying out in the opposite direction.  Once in a while, 2 photons come out simultaneously, in which case they’re heralded by 2 partner photons—and even more rarely, 3 photons come out, heralded by 3 partner photons, and so on.  Furthermore, there exists something called a number-resolving detector, which can tell you (today, sometimes, with as good as ~95% reliability) when one or more partner photons have arrived, and how many of them there are.  The result is that SPDC lets us build what’s called a nondeterministic single-photon source.  I.e., you can’t control exactly when a photon comes out—that’s random—but eventually one (and only one) photon will come out, and when that happens, you’ll know it happened, without even having to measure and destroy the precious photon.  The reason you’ll know is that the partner photon heralds its presence.

Alas, while SPDC sources have enabled demonstrations of a large number of cool quantum effects, there’s a fundamental problem with using them for BosonSampling.  The problem comes from the requirement that n—the number of single photons fired off simultaneously into your beamsplitter network—should be big (say, 20 or 30).  Suppose that, in a given instant, the probability that your SPDC source succeeds in generating a photon is p.  Then what’s the probability that two SPDC sources will both succeed in generating a photon at that instant?  p^2.  And the probability that three sources will succeed is p^3, etc.  In general, with n sources, the probability that they’ll succeed simultaneously falls off exponentially with n, and the amount of time you’ll need to sit in the lab waiting for the lucky event increases exponentially with n.  Sure, when it finally does happen, it will be “heralded.”  But if you need to wait exponential time for it to happen, then there would seem to be no advantage over classical computation.  This is the reason why so far, BosonSampling has only been demonstrated with 3-4 photons.
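
To put rough numbers on that (my own illustration, with a made-up per-window success probability), the exponential penalty is easy to see:

```python
# Rough illustration of why naive SPDC-based BosonSampling doesn't scale:
# with per-window success probability p, all n sources fire together with
# probability p**n, so the expected number of windows you wait is 1/p**n.
p = 0.1
for n in (2, 4, 10, 20, 30):
    print(f"n = {n:2d}: joint success probability = {p**n:.1e}, "
          f"expected wait ~ {1 / p**n:.1e} windows")
```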

At least three solutions to the scaling problem suggest themselves, but each one has problems of its own.  The first solution is simply to use general methods for quantum fault-tolerance: it’s not hard to see that, if you had a fault-tolerant universal quantum computer, then you could simulate BosonSampling with as many photons as you wanted.  The trouble is that this requires a fault-tolerant universal quantum computer!  And if you had that, then you’d probably just skip BosonSampling and use Shor’s algorithm to factor some 10,000-digit numbers.  The second solution is to invent some specialized fault-tolerance method that would apply directly to quantum optics.  Unfortunately, we don’t know how to do that.  The third solution—until recently, the one that interested me and Alex the most—would be to argue that, even if your sources are so cruddy that you have no idea which ones generated a photon and which didn’t in any particular run, the BosonSampling distribution is still intractable to simulate classically.  After all, the great advantage of BosonSampling is that, unlike with (say) factoring or quantum simulation, we don’t actually care which problem we’re solving!  All we care about is that we’re doing something that we can argue is hard for classical computers.  And we have enormous leeway to change what that “something” is, to match the capabilities of current technology.  Alas, yet again, we don’t know how to argue that BosonSampling is hard to simulate approximately in the presence of realistic amounts of noise—at best, we can argue that it’s hard to simulate approximately in the presence of tiny amounts of noise, and hard to simulate super-accurately in the presence of realistic noise.

When faced with these problems, until recently, all we could do was

1. shrug our shoulders,
2. point out that none of the difficulties added up to a principled argument that scalable BosonSampling was not possible,
3. stress, again, that all we were asking for was to scale to 20 or 30 photons, not 100 or 1000 photons, and
4. express hope that technologies for single-photon generation currently on the drawing board—most notably, something called “optical multiplexing”—could be used to get up to the 20 or 30 photons we wanted.

Well, I’m pleased to announce, with this post, that there’s now a better idea for how to scale BosonSampling to interesting numbers of photons.  The idea, which I’ve taken to calling Scattershot BosonSampling, is not mine or Alex’s.  I learned of it from Ian Walmsley’s group at Oxford, where it’s been championed in particular by Steve Kolthammer.  (Update: A commenter has pointed me to a preprint by Lund, Rahimi-Keshari, and Ralph from May of this year, which I hadn’t seen before, and which contains substantially the same idea, albeit with an unsatisfactory argument for computational hardness.  In any case, as you’ll see, it’s not surprising that this idea would’ve occurred to multiple groups of experimentalists independently; what’s surprising is that we didn’t think of it!)  The minute I heard about Scattershot BS, I kicked myself for failing to think of it, and for getting sidetracked by much more complicated ideas.  Steve and others are working on a paper about Scattershot BS, but in the meantime, Steve has generously given me permission to share the idea on this blog.  I suggested a blog post for two reasons: first, as you’ll see, this idea really is “blog-sized.”  Once you make the observation, there’s barely any theoretical analysis that needs to be done!  And second, I was impatient to get the word out to the “experimental BosonSampling community”—not to mention to the critics!—that there’s now a better way to BosonSample, and one that’s incredibly simple to boot.

OK, so what is the idea?  Well, recall from above what an SPDC source does: it produces a photon with only a small probability, but whenever it does, it “heralds” the event with a second photon.  So, let’s imagine that you have an array of 200 SPDC sources.  And imagine that, these sources being unpredictable, only (say) 10 of them, on average, produce a photon at any given time.  Then what can you do?  Simple: just define those 10 sources to be the inputs to your experiment!  Or to say it more carefully: instead of sampling only from a probability distribution over output configurations of your n photons, now you’ll sample from a joint distribution over inputs and outputs: one where the input is uniformly random, and the output depends on the input (and also, of course, on the beamsplitter network).  So, this idea could also be called “Double BosonSampling”: now, not only do you not control which output will be observed (but only the probability distribution over outputs), you don’t control which input either—yet this lack of control is not a problem!  There are two key reasons why it isn’t:

1. As I said before, SPDC sources have the crucial property that they herald a photon when they produce one.  So, even though you can’t control which 10 or so of your 200 SPDC sources will produce a photon in any given run, you know which 10 they were.  (There's a tiny numerical sketch of this right after the list.)
2. In my and Alex’s original paper, the “hardest” case of BosonSampling that we were able to find—the case we used for our hardness reductions—is simply the one where the m×n “scattering matrix,” which describes the map between the n input modes and the m >> n output modes, is a Haar-random matrix whose columns are orthonormal vectors.  But now suppose we have m input modes and m output modes, and the m×m unitary matrix U mapping inputs to outputs is Haar-random.  Then any m×n submatrix of U will simply be an instance of the “original” hard case that Alex and I studied!
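
Here's the tiny numerical sketch promised above (my own toy illustration, with made-up numbers): on a typical run with 200 sources each firing with probability 0.05, roughly 10 sources fire, and the heralds tell you exactly which ones they were, so you simply declare those to be the inputs.

```python
# Toy simulation of the Scattershot idea (hypothetical numbers, not from the
# post): 200 heralded sources, each firing independently with small probability,
# so on average ~10 fire per run -- and the heralds identify which ones did.
import random

num_sources = 200
p_fire = 0.05          # chosen so that num_sources * p_fire = 10 on average

random.seed(0)
fired = [i for i in range(num_sources) if random.random() < p_fire]
print(f"{len(fired)} sources fired this run; they define the inputs: {fired}")
```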

More formally, what can we say about the computational complexity of Scattershot BS?  Admittedly, I don’t know of a reduction from ordinary BS to Scattershot BS (though it’s easy to give a reduction in the other direction).  However, under exactly the same assumption that Alex and I used to argue that ordinary BosonSampling was hard—our so-called Permanent of Gaussians Conjecture (PGC)—one can show that Scattershot BS is hard also, and by essentially the same proof.  The only difference is that, instead of talking about the permanents of n×n submatrices of an m×n Haar-random, column-orthonormal matrix, now we talk about the permanents of n×n submatrices of an m×m Haar-random unitary matrix.  Or to put it differently: where before we fixed the columns that defined our n×n submatrix and only varied the rows, now we vary both the rows and the columns.  But the resulting n×n submatrix is still close in variation distance to a matrix of i.i.d. Gaussians, for exactly the same reasons it was before.  And we can still check whether submatrices with large permanents are more likely to be sampled than submatrices with small permanents, in the way predicted by quantum mechanics.

Now, everything above assumed that each SPDC source produces either 0 or 1 photon.  But what happens when the SPDC sources produce 2 or more photons, as they sometimes do?  It turns out that there are two good ways to deal with these “higher-order terms” in the context of Scattershot BS.  The first way is by using number-resolving detectors to count how many herald photons each SPDC source produces.  That way, at least you’ll know exactly which sources produced extra photons, and how many extra photons each one produced.  And, as is often the case in BosonSampling, a devil you know is a devil you can deal with.  In particular, a few known sources producing extra photons, just means that the amplitudes of the output configurations will now be permanents of matrices with a few repeated rows in them.  But the permanent of an otherwise-random matrix with a few repeated rows should still be hard to compute!  Granted, we don’t know how to derive that as a consequence of our original hardness assumption, but this seems like a case where one is perfectly justified to stick one’s neck out and make a new assumption.

But there’s also a more elegant way to deal with higher-order terms.  Namely, suppose m >> n^2 (i.e., the number of input modes is at least quadratically greater than the average number of photons).  That’s an assumption that Alex and I typically made anyway in our original BosonSampling paper, because of our desire to avoid what we called the “Bosonic Birthday Paradox” (i.e., the situation where two or more photons congregate in the same output mode).  What’s wonderful is that exactly the same assumption also implies that, in Scattershot BS, two or more photons will almost never be found in the same input mode!  That is, when you do the calculation, you find that, once you’ve attenuated your SPDC sources enough to avoid the Bosonic Birthday Paradox at the output modes, you’ve also attenuated them enough to avoid higher-order terms at the input modes.  Cool, huh?
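
As a rough, purely classical heuristic (my own back-of-the-envelope, not a calculation from the paper; real bosons bunch, so this underestimates collisions somewhat): if n photons were dropped into m modes uniformly at random, the chance that some mode receives two or more of them is roughly $1 - e^{-n(n-1)/2m}$, which is small exactly when $m \gg n^2$.

```python
# Rough classical "birthday paradox" estimate (heuristic, not from the paper):
# probability that some mode ends up with 2+ of the n photons, when n photons
# are dropped into m modes uniformly at random.
import math

def collision_probability(n, m):
    return 1 - math.exp(-n * (n - 1) / (2 * m))

for n, m in [(10, 50), (10, 200), (30, 900), (30, 10_000)]:
    print(f"n = {n}, m = {m}: P(collision) ~ {collision_probability(n, m):.3f}")
```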

Are there any drawbacks to Scattershot BS?  Well, Scattershot BS certainly requires more SPDC sources than ordinary BosonSampling does, for the same average number of photons.  A little less obviously, Scattershot BS also requires a larger-depth beamsplitter network.  In our original paper, Alex and I showed that for ordinary BosonSampling, it suffices to use a beamsplitter network of depth O(n log m), where n is the number of photons and m is the number of output modes (or equivalently detectors).  However, our construction took advantage of the fact that we knew exactly which n<<m sources the photons were going to come from, and could therefore optimize for those.  For Scattershot BS, the depth bound increases to O(m log m): since the n photons could come from any possible subset of the m input modes, we no longer get the savings based on knowing where they originate.  But this seems like a relatively minor issue.

I don’t want to give the impression that Scattershot BS is a silver bullet that will immediately let us BosonSample with 30 photons.  The most obvious limiting factor that remains is the efficiency of the photon detectors—both those used to detect the photons that have passed through the beamsplitter network, and those used to detect the herald photons.  Because of detector inefficiencies, I’m told that, without further technological improvements (or theoretical ideas), it will still be quite hard to push Scattershot BS beyond about 10 photons.  Still, as you might have noticed, 10 is greater than 4 (the current record)!  And certainly, Scattershot BS itself—a simple, obvious-in-retrospect idea that was under our noses for years, and that immediately pushes forward the number of photons a BosonSampler can handle—should make us exceedingly reluctant to declare there can’t be any more such ideas, and that our current ignorance amounts to a proof of impossibility.