Planet Musings

August 02, 2014

n-Category Café Wrestling with Tight Spans

I’ve been spending some time with Simon Willerton’s paper Tight spans, Isbell completions and semi-tropical modules. In particular, I’ve been trying to understand tight spans.

The tight span of a metric space $A$ is another metric space $T(A)$, in which $A$ naturally embeds. For instance, the tight span of a two-point space is a line segment containing the original two points as its endpoints. Similarly, the tight span of a three-point space is a space shaped like the letter Y, with the original three points at its tips. Because of examples like this, some people like to think of the tight span as a kind of abstract convex hull.

Simon’s paper puts the tight span construction into the context of a categorical construction, Isbell conjugacy. I now understand these things better than I did, but there’s still a lot I don’t get. Here goes.

Simon wrote a blog post summarizing the main points of his paper, but I want to draw attention to slightly different aspects of it than he does. So, much as I recommend that post of his, I’ll make this self-contained.

We begin with Isbell conjugacy. For any small category $A$, there's an adjunction

\check{\,\,} : [A^{op}, Set] \leftrightarrows [A, Set]^{op} : \hat{\,\,}

defined for $F: A^{op} \to Set$ and $b \in A$ by

\check{F}(b) = Hom(F, A(-, b))

and for $G: A \to Set$ and $a \in A$ by

\hat{G}(a) = Hom(G, A(a, -)).

We call $\check{F}$ the (Isbell) conjugate of $F$, and similarly $\hat{G}$ the (Isbell) conjugate of $G$.

Like any adjunction, it restricts to an equivalence of categories in a canonical way. Specifically, it’s an equivalence between

the full subcategory of $[A^{op}, Set]$ consisting of those objects $F$ such that the canonical map $F \to \hat{\check{F}}$ is an isomorphism

and

the full subcategory of $[A, Set]^{op}$ consisting of those objects $G$ such that the canonical map $G \to \check{\hat{G}}$ is an isomorphism.

I'll call either of these equivalent categories the reflexive completion $R(A)$ of $A$. (Simon called it the Isbell completion, possibly with my encouragement, but "reflexive completion" is more descriptive and I prefer it now.) So, the reflexive completion of a category consists of all the "reflexive" presheaves on it: those canonically isomorphic to their double conjugate.

All of this categorical stuff generalizes seamlessly to an enriched context, at least if we work over a complete symmetric monoidal closed category.

For example, suppose we take our base category to be the category $Ab$ of abelian groups. Let $k$ be a field, viewed as a one-object $Ab$-category. Both $[k^{op}, Ab]$ and $[k, Ab]$ are the category of $k$-vector spaces, and both $\check{\,\,}$ and $\hat{\,\,}$ are the dual vector space construction. The reflexive completion $R(k)$ of $k$ is the category of $k$-vector spaces $V$ for which the canonical map $V \to V^{\ast\ast}$ is an isomorphism; in other words, the finite-dimensional vector spaces.

But that’s not the example that will matter to us here.

We'll be thinking primarily about the case where the base category is the poset $([0, \infty], \geq)$ (the reverse of the usual order!) with monoidal structure given by addition. As Lawvere observed long ago, a $[0, \infty]$-category is then a "generalized metric space": a set $A$ of points together with a distance function

d: A \times A \to [0, \infty]

satisfying the triangle inequality $d(a, b) + d(b, c) \geq d(a, c)$ and the equation $d(a, a) = 0$. These are looser structures than classical metric spaces, mainly because of the absence of the symmetry axiom $d(a, b) = d(b, a)$.

The enriched functors are distance-decreasing maps between metric spaces: those functions $f: A \to B$ satisfying $d_B(f(a_1), f(a_2)) \leq d_A(a_1, a_2)$. I'll just call these "maps" of metric spaces.

If you work through the details, you find that Isbell conjugacy for metric spaces works as follows. Let $A$ be a generalized metric space. The conjugate of a map $f: A^{op} \to [0, \infty]$ is the map $\check{f}: A \to [0, \infty]$ defined by

\check{f}(b) = \sup_{a \in A} \max \{ d(a, b) - f(a), 0 \}

and the conjugate of a map $g: A \to [0, \infty]$ is the map $\hat{g}: A \to [0, \infty]$ defined by

\hat{g}(a) = \sup_{b \in A} \max \{ d(a, b) - g(b), 0 \}.

We always have $f \geq \hat{\check{f}}$, and the reflexive completion $R(A)$ of $A$ consists of all maps $f: A^{op} \to [0, \infty]$ such that $f = \hat{\check{f}}$. (You can write out an explicit formula for that, but I'm not convinced it's much help.) The metric on $R(A)$ is the sup metric.
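For a finite space, both conjugation formulas are only a couple of lines of code. Here is a minimal sketch (the function names `check_conj` and `hat_conj` and the dict-based encoding of distances are my own ad hoc choices, purely for illustration), checking for a two-point space that a representable $d(-, x)$ is reflexive, as the Yoneda embedding guarantees:

```python
def check_conj(d, points, f):
    """f-check: conjugate of f : A^op -> [0, inf]."""
    return {b: max(max(d[a, b] - f[a], 0.0) for a in points) for b in points}

def hat_conj(d, points, g):
    """g-hat: conjugate of g : A -> [0, inf]."""
    return {a: max(max(d[a, b] - g[b], 0.0) for b in points) for a in points}

# Two points at distance 3 (symmetric, so d is the same in both directions).
points = ["x", "y"]
d = {("x", "x"): 0.0, ("y", "y"): 0.0, ("x", "y"): 3.0, ("y", "x"): 3.0}

f = {"x": 0.0, "y": 3.0}   # the Yoneda embedding of x, i.e. f = d(-, x)
print(hat_conj(d, points, check_conj(d, points, f)))  # equals f again
```

The `sup` and `max` in the formulas become ordinary Python `max` calls because the space is finite.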

All that comes out of the general categorical machinery.

However, we can say something more that only makes sense because of the particular base category we’re using.

As we all know, symmetric metric spaces, the ones we're most used to, are particularly interesting. For a symmetric metric space $A$, the distinction between covariant and contravariant functors on $A$ vanishes. The two kinds of conjugate, $\hat{\,\,}$ and $\check{\,\,}$, are also the same. I'll write ${\,\,}^\ast$ for them both.

The reflexive completion $R(A)$ consists of the functions $A \to [0, \infty]$ that are equal to their double conjugate. But because there is no distinction between covariant and contravariant, we can also consider the functions $A \to [0, \infty]$ equal to their single conjugate.

The set of such functions is, by definition if you like, the tight span $T(A)$ of $A$. So

T(A) = \{ f: A \to [0, \infty] \mid f = f^\ast \},

and

R(A) = \{ f: A \to [0, \infty] \mid f = f^{\ast\ast} \}.

Both come equipped with the sup metric, and both contain $A$ as a subspace, via the Yoneda embedding $a \mapsto d(-, a)$. Since $f = f^\ast$ implies $f = f^{\ast\ast}$, we have $A \subseteq T(A) \subseteq R(A)$.

Example   Let $A$ be the symmetric metric space consisting of two points distance $D$ apart. Its reflexive completion $R(A)$ is the set $[0, D] \times [0, D]$ with metric

d((s_1, s_2), (t_1, t_2)) = \max \{ t_1 - s_1, t_2 - s_2, 0 \}.

The Yoneda embedding identifies the two points of $A$ with the points $(0, D)$ and $(D, 0)$ of $R(A)$. The tight span $T(A)$ is the straight line between these two points of $R(A)$ (a diagonal of the square), which is isometric to the ordinary Euclidean line segment $[0, D]$.
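You can see this example numerically by brute force. The sketch below (my own toy code, with an ad hoc grid of half-integer values) enumerates functions on the two-point space and keeps the self-conjugate ones; the survivors are exactly the pairs $(s, D - s)$, i.e. the diagonal of the square:

```python
import itertools

D = 3.0
points = ["x", "y"]
d = {("x", "x"): 0.0, ("y", "y"): 0.0, ("x", "y"): D, ("y", "x"): D}

def conj(f):
    # single conjugate f* (covariance doesn't matter: the space is symmetric)
    return {b: max(max(d[a, b] - f[a], 0.0) for a in points) for b in points}

grid = [0.5 * k for k in range(13)]          # values 0, 0.5, ..., 6
tight = [(vx, vy) for vx, vy in itertools.product(grid, grid)
         if conj({"x": vx, "y": vy}) == {"x": vx, "y": vy}]
print(tight)   # every surviving pair sums to D: the segment from (0, D) to (D, 0)
```

All the arithmetic here is exact on half-integers, so the equality test `conj(f) == f` is safe despite the floats.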

As that example shows, the reflexive completion of a space needn’t be symmetric, even if the original space was symmetric. On the other hand, it’s not too hard to show that the tight span of a space is always symmetric. Simon’s Theorem 4.1.1 slots everything into place:

Theorem (Willerton)   Let $A$ be a symmetric metric space. Then the tight span $T(A)$ is the largest symmetric subspace of $R(A)$ containing $A$.

Here "largest" means that if $B$ is another symmetric subspace of $R(A)$ containing $A$ then $B \subseteq T(A)$. It's not even obvious that there is a largest one. For instance, given any non-symmetric metric space $C$, what's the largest symmetric subspace of $C$? There isn't one, just because every singleton subset of $C$ is symmetric.

Simon's told the following story before, but I can't resist telling it again. Tight spans have been discovered independently many times over. The first time they were discovered, they were called by the less catchy name of "injective envelope" (because $T(A)$ is the smallest injective metric space containing $A$). And the person who made that first discovery? Isbell — who, as far as anyone knows, never noticed that this had anything to do with Isbell conjugacy.

Let me finish with something I don’t quite understand.

Simon's Theorem 3.1.4 says the following. Let $A$ be a symmetric metric space. (He doesn't assume symmetry, but I will.) Then for any $a \in A$, $p \in R(A)$ and $\varepsilon > 0$, there exists $b \in A$ such that

d(a, p) + d(p, b) \leq d(a, b) + \varepsilon.

In other words, $a$, $p$ and $b$ are almost collinear.

A loose paraphrase of this is that every point in the reflexive completion of $A$ is close to being on a geodesic between points of $A$. The theorem does imply this, but it says a lot more. Look at the quantification. We get to choose one end $a$ of the not-quite-geodesic, as well as the point $p$ in the reflexive completion, and we're guaranteed that if we continue the not-quite-geodesic from $a$ on through $p$, then we'll eventually meet another point of $A$ (or nearly).

Let's get rid of those "not quite"s, and at the same time focus attention on the tight span rather than the reflexive completion. Back in Isbell's original 1964 paper (cited by Simon, in case you want to look it up), it's shown that if $A$ is compact then so is its tight span $T(A)$. Of course, Simon's Theorem 3.1.4 applies in particular when $p \in T(A)$. But then compactness of $T(A)$ means that we can drop the $\varepsilon$.

In other words: let $A$ be a compact symmetric metric space. Then for any $a \in A$ and $p \in T(A)$, there exists $b \in A$ such that

d(a, p) + d(p, b) = d(a, b).
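For the two-point example this exact collinearity can be checked directly. In the sketch below (my own encoding: points of the tight span written as pairs $(s, D - s)$, with the sup metric), every point of $T(A)$ lies exactly on a geodesic between the two original points:

```python
D = 3.0

def dist(f, g):
    # sup metric on functions {x, y} -> [0, inf], written as pairs of values
    return max(abs(f[0] - g[0]), abs(f[1] - g[1]))

a, b = (0.0, D), (D, 0.0)        # Yoneda images of the two points of A
for s in [0.0, 0.7, 1.5, 2.2, D]:
    p = (s, D - s)               # a point of the tight span
    # d(a, p) + d(p, b) = s + (D - s) = D = d(a, b), up to rounding
    assert abs(dist(a, p) + dist(p, b) - dist(a, b)) < 1e-12
```

Here $d(a, p) = s$ and $d(p, b) = D - s$, so the two sides agree for every $s \in [0, D]$.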

So, if you place your pencil at a point of $A$, draw a straight line from it to a point of $T(A)$, and keep going, you'll eventually meet another point of $A$.

This leaves me wondering what tight spans of common geometric figures actually look like.

For example, take an arc $A$ of a circle (any size arc, just not the whole circle). Embed it in the plane and give it the Euclidean metric. I said originally that the tight span is sometimes thought of as a sort of abstract convex hull, and indeed, the introduction to Simon's paper says that some authors have actually used this name instead of "tight span". But the result I just stated makes this seem highly misleading. It implies that the tight span of $A$ is not its convex hull, and indeed can't be any subspace of the Euclidean plane (unless, perhaps, it's $A$ itself, which I suspect is not the case). But what is it?

n-Category Café Basic Category Theory

My new book is out!

Front cover of Basic Category Theory

Click the image for more information.

It’s an introductory category theory text, and I can prove it exists: there’s a copy right in front of me. (You too can purchase a proof.) Is it unique? Maybe. Here are three of its properties:

  • It doesn’t assume much.
  • It sticks to the basics.
  • It’s short.

I want to thank the $n$-Café patrons who gave me encouragement during my last week of work on this. As I remarked back then, some aspects of writing a book (even a short one) require a lot of persistence.

But I also want to take this opportunity to make a suggestion. There are now quite a lot of introductions to category theory available, of various lengths, at various levels, and in various styles. I don’t kid myself that mine is particularly special: it’s just what came out of my individual circumstances, as a result of the courses I’d taught. I think the world has plenty of introductions to category theory now.

What would be really good is for there to be a nice second book on category theory. Now, there are already some resources for advanced categorical topics: for instance, in my book, I cite both the nLab and Borceux's three-volume Handbook of Categorical Algebra for this. But useful as those are, what we're missing is a shortish book that picks up where Categories for the Working Mathematician leaves off.

Let me be more specific. One of the virtues of Categories for the Working Mathematician (apart from being astoundingly well-written) is that it’s selective. Mac Lane covers a lot in just 262 pages, and he does so by repeatedly making bold choices about what to exclude. For instance, he implicitly proves that for any finitary algebraic theory, the category of algebras has all colimits — but he does so simply by proving it for groups, rather than explicitly addressing the general case. (After all, anyone who knows what a finitary algebraic theory is could easily generalize the proof.) He also writes briskly: few words are wasted.

I’m imagining a second book on category theory of a similar length to Categories for the Working Mathematician, and written in the same brisk and selective manner. Over beers five years ago, Nicola Gambino and I discussed what this hypothetical book ought to contain. I’ve lost the piece of paper I wrote it down on (thus, Nicola is absolved of all blame), but I attempted to recreate it sometime later. Here’s a tentative list of chapters, in no particular order:

  • Enriched categories
  • 2-categories (and a bit on higher categories)
  • Topos theory (obviously only an introduction) and categorical set theory
  • Fibrations
  • Bimodules, Morita equivalence, Cauchy completeness and absolute colimits
  • Operads and Lawvere theories
  • Categorical logic (again, just a bit) and internal category theory
  • Derived categories
  • Flat functors and locally presentable categories
  • Ends and Kan extensions (already in Mac Lane’s book, but maybe worth another pass).

Someone else should definitely write such a book.

Scott Aaronson 3-sentence summary of what's happening in Israel and Gaza

Hamas is trying to kill as many civilians as it can.

Israel is trying to kill as few civilians as it can.

Neither is succeeding very well.

Update (July 28): Please check out a superb essay by Sam Harris on the Israeli/Palestinian conflict.  While, as Harris says, the essay contains “something to offend everyone”—even me—it also brilliantly articulates many of the points I’ve been trying to make in this comment thread.

See also a good HuffPost article by Ali A. Rizvi, a “Pakistani-Canadian writer, physician, and musician.”

August 01, 2014

Quantum Diaries Higgs versus Descartes: this round to Higgs.

René Descartes (1596 – 1650) was an outstanding physicist, mathematician and philosopher. In physics, he laid the groundwork for Isaac Newton's (1642 – 1727) laws of motion by pioneering work on the concept of inertia. In mathematics, he developed the foundations of analytic geometry, as illustrated by the term Cartesian[1] coordinates. However, it is in his role as a philosopher that he is best remembered. Rather ironic, as his breakthrough method was a failure.

Descartes's goal in philosophy was to develop a sound basis for all knowledge based on ideas that were so obvious they could not be doubted. His touchstone was that anything he perceived clearly and distinctly as being true was true. The archetypal example of this was the famous "I think, therefore I am." Unfortunately, little else is as obvious as that famous quote, and even it can be, and has been, doubted.

Euclidean geometry provides the illusory ideal to which Descartes and other philosophers have aspired: you start with a few self-evident truths and derive a superstructure built on them. Unfortunately, even Euclidean geometry fails that test. The infamous parallel postulate has been regarded since ancient times as a bit suspicious, and other Euclidean postulates have been questioned too; extending a straight line depends on the space being continuous, unbounded and infinite.

So how are we to take Euclid's postulates and axioms? Perhaps we should follow the idea of Sir Karl Popper (1902 – 1994) and consider them to be bold hypotheses. This casts a different light on Euclid and his work; perhaps he was the first outstanding scientist. If we take his basic assumptions as empirical[2] rather than sure and certain knowledge, all we lose is the illusion of certainty. Euclidean geometry then becomes an empirically testable model for the geometry of spacetime. The theorems, derived from the basic assumptions, are predictions that can be checked against observations, satisfying Popper's demarcation criterion for science. Do the angles in a triangle add up to two right angles or not? If not, then one of the assumptions is false, probably the parallel postulate.

Back to Descartes: he criticized Galileo Galilei (1564 – 1642) for having "built without having considered the first causes of nature; he has merely sought reasons for particular effects; and thus he has built without a foundation." In the end, that lack of a foundation turned out to be less of a hindrance than Descartes's faulty one. To a large extent, science's lack of a foundation, such as Descartes wished to provide, has not proved a significant obstacle to its advance.

Like Euclid, Sir Isaac Newton had his basic assumptions (the three laws of motion and the law of universal gravity), but he did not believe that they were self-evident; he believed that he had inferred them by the process of scientific induction. Unfortunately, scientific induction was as flawed a foundation as the self-evident nature of the Euclidean postulates. Connecting the dots between a falling apple and the motion of the moon was an act of creative genius, a bold hypothesis, and not some algorithmic derivation from observation.

It is worth noting that, at the time, Newton's explanation had a strong competitor in Descartes's theory that planetary motion was due to vortices, large circulating bands of particles that keep the planets in place. Descartes's theory had the advantage that it lacked the occult action at a distance that is fundamental to Newton's law of universal gravitation. In spite of that, today, Descartes's vortices are as unknown as is his claim that the pineal gland is the seat of the soul; so much for what he perceived clearly and distinctly as being true.

Galileo's approach of solving problems one at a time, rather than trying to solve all problems at once, has paid big dividends. It has allowed science to advance one step at a time, while Descartes's approach has faded away as failed attempt followed failed attempt. We still do not have a grand theory of everything built on an unshakable foundation and probably never will. Rather, we have models of widespread utility. Even if they are built on a shaky foundation, surely that is enough.

Peter Higgs (b. 1929) follows in the tradition of Galileo. He has not, despite his Nobel Prize, succeeded where Descartes failed in producing a foundation for all knowledge; but through creativity, he has proposed a bold hypothesis whose implications have been empirically confirmed. Descartes would probably claim that he has merely sought reasons for a particular effect: mass. The answer to the ultimate question about life, the universe and everything still remains unanswered, much to Descartes's chagrin, but as scientists we are satisfied to solve one problem at a time and then move on to the next one.

To receive a notice of future posts follow me on Twitter: @musquod.

[1] Cartesian from Descartes Latinized name Cartesius.

[2] As in the final analysis they are.

David Hogg Gaussian Processes for astronomers

I spent my research time today (much of it on planes and trains) working on my seminar (to be given tomorrow) about Gaussian Processes. I am relying heavily on the approach advocated by Foreman-Mackey, which is to start with weighted least squares (and the Bayesian generalization), then show how a kernel function in the covariance matrix changes the story, and then show how the Gaussian Process is latent in this solution. Not ready! But Foreman-Mackey made me a bunch of great demonstration figures.
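The progression described above ends with a kernel function inside the covariance matrix. Here is a minimal numpy sketch of that last step: GP prediction via the conditional-Gaussian formulas. The kernel choice, hyper-parameter values and toy data are all my own assumptions for illustration, not anything from the seminar or from Foreman-Mackey's figures.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(50)    # noisy observations

def kernel(x1, x2, amp=1.0, ell=1.0):
    # squared-exponential kernel, evaluated pairwise
    return amp**2 * np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / ell**2)

# Weighted least squares would use only the diagonal noise term; the GP
# story just adds the kernel to that covariance matrix.
C = kernel(x, x) + 0.1**2 * np.eye(len(x))

xs = np.linspace(0, 10, 200)                          # test points
Ks = kernel(xs, x)
mean = Ks @ np.linalg.solve(C, y)                     # posterior mean
cov = kernel(xs, xs) - Ks @ np.linalg.solve(C, Ks.T)  # posterior covariance
```

With the covariance in hand, the weighted-least-squares machinery carries over unchanged; only the matrix being inverted is different.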

Chad Orzel Eureka! It's a Book!

I took a short nap yesterday, and of course as soon as I lay down on the bed, Emmy erupted in the furious barking that signals the arrival of a package. When I went out to get it, I found shiny new bound galley proofs of Eureka: Discovering Your Inner Scientist:

The just-arrived galley proof for my forthcoming book. Laptop included for scale.


I knew these were coming, but didn’t expect them so quickly. I asked for several to take to the UK, on the off chance that we’re seated next to somebody really famous on the flight to London, who says “Gosh, I’ve always wanted to blurb a book about science…,” so I’ll have something to give them. I’ll also have them at my talk in Bristol on August 13, and my Kaffeeklatsch at Loncon on the 14th. I might consider raffling one off, or awarding it to someone who performs great feats of strength science, or some such. You’ll have to come to those events to find out.

Anyway, while I was aware that these existed (and I've been carrying around a big stack of page proofs and a red pen for a week or two), seeing them on paper made the whole thing much more real and exciting. It's a book! With cover art and everything!

Now I really need to finish off these page proofs. And my talk for Bristol. And the prep for SteelyKid’s sixth birthday party tomorrow (the social event of the season!). And…

But: I have a book!

Tommaso Dorigo Reassuring SUSY Seekers

This morning at the ICNFP 2014 conference in Kolympari (Crete) the floor was taken by Abdelhak Djouadi, who gave a very nice overview of the theoretical implications of the Higgs boson discovery, especially exploring the status of Supersymmetry models.

Djouadi explained how even if the average mass of sparticles is being pushed up in surviving models of Supersymmetry (both because of the negative results of direct searches and because of the effect of hardwiring into the theoretical models the knowledge of a "heavy" lightest scalar particle, which sits at 125 GeV), there is reason to be optimistic. He explained that for stop squarks it is the geometric mean of their masses that has to be high, but the lightest one may lie well below the TeV.


July 31, 2014

David Hogg DDD meeting, day 2

On the second day of the Moore Foundation meeting, I gave my talk (about flexible models for exoplanet populations, exoplanet transits, and exoplanet-discovering hardware calibration). After my talk, I had a great conversation with Emmanuel Candès (Stanford), who asked me very detailed questions about my prior beliefs. I realized in the conversation that I have been violating all my own rules: I have been setting my prior beliefs about hyper-parameters in the space of the hyper-parameters and not in the space of the data. That is, you can only assess the influence and consistency of the prior pdf (consistency with your actual beliefs) by flowing the prior through the probabilistic model and generating data from it. I bet if I did that for some of the problems I was showing, I would find that my priors are absurd. This is a great rule, which I often say to others but don't do myself: Always sample data from your prior (not just parameters). This is a rule for Bayes but also a rule for those of us who eschew realism! More generally, Candès expressed the view that priors should derive from data—prior data—a view with which I agree deeply. Unfortunately, when it comes to exoplanet populations, there really aren't any prior data to speak of.

There were many excellent talks again today; again this is an incomplete set of highlights for me: Titus Brown (MSU) explained his work on developing infrastructure for biology and bioinformatics. He made a number of comments about getting customer (or user) stories right and developing with the current customer in mind. These resonated for me in my experiences of software development. He also said that his teaching and workshops and outreach are self-interested: They feed back deep and valuable information about the customer. Jeffrey Heer (UW) said similar things about his development of DataWrangler, d3.js, and other data visualization tools. (d3.js is github's fourth most popular repository!) He showed some beautiful visualizations. Heer's demo of DataWrangler simply blew away the crowd, and there were questions about it for the rest of the day.

Carl Kingsford (CMU) caused me (and others) to gasp when he said that the Sequence Read Archive of biological sequences cannot be searched by sequence. It turns out that searching for strings in enormous corpuses of strings is actually a very hard problem (who knew?). He is using a new structure called a Bloom Filter Tree, in which k-mers (length-k subsections) are stored in the nodes and the leaves contain the data sets that contain those k-mers. It is very clever and filled with all the lovely engineering issues that data-structures courses were filled with, lo these many years ago. Kingsford focuses on writing careful code, so the combination of clever data structures and well written code gets him orders of magnitude speed-ups over the competition.
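For intuition about the membership primitive involved, here is a toy Python sketch of a k-mer Bloom filter. This is my own illustration of the general idea, not Kingsford's actual Bloom Filter Tree: it answers "definitely absent" or "probably present" for a k-mer without storing the sequences themselves.

```python
import hashlib

class KmerBloom:
    def __init__(self, nbits=1 << 20, nhashes=3, k=5):
        self.bits = bytearray(nbits // 8)
        self.nbits, self.nhashes, self.k = nbits, nhashes, k

    def _positions(self, kmer):
        # derive nhashes independent bit positions from a salted hash
        for i in range(self.nhashes):
            h = hashlib.blake2b(kmer.encode(), salt=bytes([i]) * 8).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add_sequence(self, seq):
        for j in range(len(seq) - self.k + 1):   # slide a length-k window
            for p in self._positions(seq[j:j + self.k]):
                self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, kmer):
        # False is definite; True may (rarely) be a false positive
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(kmer))

b = KmerBloom()
b.add_sequence("ACGTACGTAC")
print(b.might_contain("ACGTA"), b.might_contain("TTTTT"))
```

The tree structure in the real data structure then lets a query skip whole subtrees whose filters answer "definitely absent", which is where the speed-up comes from.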

Causal inference was an explicit or implicit component of many of the talks today. For example, Matthew Stephens (Chicago) is using natural genetic variations as a "randomized experiment" to infer gene expression and function. Laurel Larson (Berkeley) is looking for precursor events and predictors for abrupt ecological changes; since her work is being used to trigger interventions, she requires a causal model.

Blair Sullivan (NC State) spoke about performing inferences with provable properties on graphs. She noted that most interesting problems are NP hard on arbitrary graphs, but become easier on graphs that can be embedded (without crossing the edges) on a planar or low-genus space. This was surprising to me, but apparently the explanation is simple: Planar graphs are much more likely to have single nodes that split the graph into disconnected sub-graphs. Another surprising thing to me is that "motif counting" (which I think is searching for identical subgraphs within a graph) is very hard; it can only be done exactly and in general for very small subgraphs (six-ish nodes).

The day ended with Laura Waller (Berkeley) talking about innovative imaging systems for microscopy, including light-field cameras, and then a general set of cameras that do non-degenerate illumination sequences and infer many properties beyond single-plane intensity measurements. She showed some very impressive demonstrations of light-field inferences with her systems, which are sophisticated, but built with inexpensive hardware. Her work has a lot of conceptual overlap with astronomy, in the areas of adaptive optics and imaging with non-degenerate masks.

David Hogg DDD meeting, day 1

Today was the first day of a private finalist meeting for the new Moore Foundation Data Driven Discovery Individual Investigator grants. The format is a shootout of short talks and a few group activities. There were many extremely exciting talks at the meeting; here is just an incomplete smattering of highlights for me:

Ethan White (Utah State) showed amazing ecology data, with most data sources being people looking at things and counting things in the field, but then also some remote sensing data. He is using high-resolution time-series data on plants and animals to forecast changes in species and the ecosystem. He appeared to be checking his models "in the space of the data"—not in terms of reproduction of some external "truth"—which was nice.

Yaser Abu-Mostafa (Caltech) spoke about the practice and pitfalls of machine learning; in particular he is interested in new methods to thwart data "snooping" which is the name he gives to the problem "if you torture your data enough, it will confess".

Carey Priebe (JHU) opened his talk with the "Cortical Column Conjecture" which claims that the cortex is made up of many repeats of small network structures that are themselves, in some sense, computing primitives. This hypothesis is hard to test both because the graphs of neural connections in real brains are very noisy, and because inference on graphs (including finding repeated sub-graphs) is combinatorially hard.

Amit Singer (Princeton) is using micrographs of large molecules to infer three-dimensional structures. Each micrograph provides noisy data on a two-dimensional projection of each molecule; the collection of such projections provides enough information to both infer the Euler angles (three angles per molecule) and the three-dimensional structure (a very high-dimensional object). This project is very related to things LeCun, Barron, and I were talking about many years ago with galaxies.

Kim Reynolds (UT Dallas) is using genetic variation to determine which parts of a protein sequence are important for setting the structure and which are replaceable. She makes predictions about mutations that would not interrupt structure or function. She showed amazing structures of cellular flagella parts, and proposed that she might be able to create new kinds of flagella that would be structurally similar but different in detail.

Tommaso Dorigo "Extraordinary Claims, The 0.000029% Solution" And The 38 MeV Boson At ICNFP 2014

Yesterday I gave a lecture at the 3rd International Conference on New Frontiers in Physics, which is going on in Kolympari (Crete). I spoke critically about the five-sigma criterion that is nowadays the accepted standard in particle physics and astrophysics for discovery claims.

My slides, as usual, are quite heavily written, which is a nuisance if you are sitting at the conference trying to follow my speech, but it becomes an asset if you are reading them by yourself post-mortem. You can find them here (pdf) and here (ppt).


Clifford Johnson Café Talk

Here's a quick sketch I did while in Princeton last month, at a new café, Café Vienna. (See earlier posts here and here for sketches in an older Princeton café. I'm using a thicker marker for this one, by contrast, giving a different feel altogether, more akin to this one.) This new café promises to recreate the atmosphere of the cafés of Vienna and so I kind of had to have coffee there before I left. Why? Well, two reasons, one obvious and the other less so: [...] Click to continue reading this post

July 30, 2014

Clifford Johnson Triply Dyonic

I thought I'd mentioned this already, but I could not find anything after a search on the blog, so somehow I think I must have forgotten to. It is a cute thing about a certain favourite solution (or class of solutions) of Einstein's equations that I've talked about here before. I'm talking about the Taub-NUT solution (and its cousin, Taub-Bolt). Taub-NUT is sort of interesting for lots of reasons. Many, in fact. One of them concerns it having both mass [tex]M[/tex] and another parameter called "nut charge", [tex]N[/tex]. There are several ways to think about what nut charge is, but one curious way is that it is sort of a "magnetic" counterpart to the ordinary mass, which can be thought of as an "electric" quantity. The language is based on analogy with electromagnetism, where, in the usual [...] Click to continue reading this post

Quantum Diaries Accelerator physicist invents new way to clean up oil spills

This article appeared in Fermilab Today on July 30, 2014.

Fermilab physicist Arden Warner revolutionizes oil spill cleanup with magnetizable oil invention. Photo: Hanae Armitage


Four years ago, Fermilab accelerator physicist Arden Warner watched national news of the BP oil spill and found himself frustrated with the cleanup response.

“My wife asked ‘Can you separate oil from water?’ and I said ‘Maybe I could magnetize it!’” Warner recalled. “But that was just something I said. Later that night while I was falling asleep, I thought, you know what, that’s not a bad idea.”

Sleep forgone, Warner began experimenting in his garage. With shavings from his shovel, a splash of engine oil and a refrigerator magnet, Warner witnessed the preliminary success of a concept that could revolutionize the process of oil spill damage control.

Warner has received patent approval on the cleanup method.

The concept is simple: Take iron particles or magnetite dust and add them to oil. It turns out that these particles mix well with oil and form a loose colloidal suspension that floats in water. Mixed with the filings, the suspension is susceptible to magnetic forces. At a barely discernible 2 to 6 microns in size, the particles tend to clump together, and it only takes a sparse dusting for them to bond with the oil. When a magnetic field is applied to the oil and filings, they congeal into a viscous liquid known as a magnetorheological fluid. The fluid’s viscosity allows a magnetic field to pool both filings and oil to a single location, making them easy to remove. (View a 30-second video of the reaction.)

“It doesn’t take long — you add the filings, you pull them out. The entire process is even more efficient with hydrophobic filings. As soon as they hit the oil, they sink in,” said Warner, who works in the Accelerator Division. Hydrophobic filings are those that don’t like to interact with water — think of hydrophobic as water-fearing. “You could essentially have a device that disperses filings and a magnetic conveyor system behind it that picks it up. You don’t need a lot of material.”

Warner tested more than 100 oils, including sweet crude and heavy crude. As it turns out, the crude oils’ natural viscosity makes them fairly easy to magnetize and clear away. Currently, booms, floating devices that corral oil spills, are at best capable of containing the spill; oil removal is an entirely different process. But the iron filings can work in conjunction with an electromagnetic boom to allow tighter constriction and removal of the oil. Using solenoids, metal coils that carry an electrical current, the electromagnetic booms can steer the oil-filing mixture into collector tanks.

Unlike other oil cleanup methods, the magnetized oil technique is far more environmentally sound. There are no harmful chemicals introduced into the ocean — magnetite is a naturally occurring mineral. The filings are added and, briefly after, extracted. While there are some straggling iron particles, the vast majority is removed in one fell, magnetized swoop — the filings can even be dried and reused.

“This technique is more environmentally benign because it’s natural; we’re not adding soaps and chemicals to the ocean,” said Cherri Schmidt, head of Fermilab’s Office of Partnerships and Technology Transfer. “Other ‘cleanup’ techniques disperse the oil and make the droplets smaller or make the oil sink to the bottom. This doesn’t do that.”

Warner’s ideas for potential applications also include wildlife cleanup and the use of chemical sensors. Small devices that “smell” high and low concentrations of oil could be fastened to a motorized electromagnetic boom to direct it to the most oil-contaminated areas.

“I get crazy ideas all the time, but every so often one sticks,” Warner said. “This is one that I think could stick for the benefit of the environment and Fermilab.”

Hanae Armitage

Chad OrzelUncertain Dots 20

In which Rhett and I talk about awful academic computing systems, Worldcon, our Wikipedia pages, and AAPT meeting envy.

Some links:

Rhett’s Wikipedia entry

My Wikipedia entry

The 2014 AAPT Summer Meeting

LonCon 3, this year’s Worldcon

My puzzling Worldcon schedule

We have some ideas for what to do next time, when our little hangout is old enough to drink, but you need to watch all the way to the end to hear those.

July 29, 2014

Quantum DiariesCERN through the eyes of a young scientist

Inspired by the event at the UNESCO headquarters in Paris that celebrated the anniversary of the signature of the CERN convention, Sophie Redford wrote about her impressions on joining CERN as a young researcher. A CERN fellow designing detectors for the future CLIC accelerator, she did her PhD at the University of Oxford, observing rare B decays with the LHCb experiment.

The “60 years of CERN” celebrations give us all the chance to reflect on the history of our organization. As a young scientist, the early years of CERN might seem remote. However, the continuity of CERN and its values connects this distant past to the present day. At CERN, the past isn’t so far away.

Of course, no matter when you arrive at CERN for the first time, it doesn’t take long to realize that you are in a place with a special history. On the surface, CERN can appear scruffy. Haphazard buildings produce a maze of long corridors, labelled with seemingly random numbers to test the navigation of newcomers. Auditoriums retain original artefacts: ashtrays and blackboards unchanged since the beginning, alongside the modern-day gadgetry of projectors and video-conferencing systems.

The theme of re-use continues underground, where older machines form the injection chain for new ones. It is here, in the tunnels and caverns buried below the French and Swiss countryside, where CERN spends its money. Accelerators and detectors, their immense size juxtaposed with their minute detail, constitute an unparalleled scientific experiment gone global. As a young scientist this is the stuff of dreams, and you can’t help but feel lucky to be a part of it.

If the physical situation of CERN seems unique, so is the sociological. The row of flags flying outside the main entrance is a colourful red herring, for aside from our diverse allegiances during international sporting events, nationality is meaningless inside CERN. Despite its location straddling international borders, despite our wallets containing two currencies and our heads many languages, scientific excellence is the only thing that matters here. This is a community driven by curiosity, where coffee and cooperation result in particle beams. At CERN we question the laws of our universe. Many answers are as yet unknown but our shared goal of discovery bonds us irrespective of age or nationality.

As a young scientist at CERN I feel welcome and valued; this is an environment where reason and logic rule. I feel privileged to profit from the past endeavour of others, and great pride to contribute to the future of that which others have started. I have learnt that together we can achieve extraordinary things, and that seemingly insurmountable problems can be overcome.

In many ways, the second 60 years of CERN will be nothing like the first. But by continuing to build on our past we can carry the founding values of CERN into the future, allowing the next generation of young scientists to pursue knowledge without borders.

By Sophie Redford

Jordan EllenbergR.E.M. live at the Rockpalast, 2 Oct 1985

Complete show on YouTube.  In case you were wondering what the fuss was about.

BackreactionCan you touch your nose?

Yeah, but can you? Believe it or not, it’s a question philosophers have plagued themselves with for thousands of years, and it keeps reappearing in my feeds!

Best source I could find for this image: IFLS.

My first reaction was of course: It’s nonsense – a superficial play on the words “you” and “touch”. “You touch” whatever triggers the nerves in your skin. There, look, I’ve solved a thousand-year-old problem in a matter of 3 seconds.

Then it occurred to me that with this notion of “touch” my shoes never touch the ground. Maybe I’m not a genius after all. Let me get back to that cartoon then. Certainly deep thoughts went into it that I must unravel.

The average size of an atom is an Angstrom, 10⁻¹⁰ m. The typical interatomic distance in molecules is a nanometer, 10⁻⁹ m, or let that be a few nanometers if you wish. At room temperature and normal atmospheric pressure, electrostatic repulsion prevents you from pushing atoms any closer together. So the 10⁻⁸ meters in the cartoon seem about correct.

But it’s not so simple...

To begin with it isn’t just electrostatic repulsion that prevents atoms from getting close, it is more importantly the Pauli exclusion principle which forces the electrons and quarks that make up the atom to arrange in shells rather than to sit on top of each other.

If you could turn off the Pauli exclusion principle, all electrons from the higher shells would drop into the ground state, releasing energy. The same would happen with the quarks in the nucleus which arrange in similar levels. Since nuclear energy scales are higher than atomic scales by several orders of magnitude, the nuclear collapse causes the bulk of the emitted energy. How much is it?

The typical nuclear level splitting is some 100 keV, that is a few 10⁻¹⁴ Joule. Most of the Earth is made up of silicon, iron and oxygen, i.e. atomic numbers of the order of 15 or so on average. This gives about 10⁻¹² Joule per atom, that is 10¹¹ Joule per mol, or 1 kTon TNT per kg.
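The arithmetic above is easy to reproduce. Here is a minimal sketch in Python; the round numbers for the level splitting, average atomic number and molar mass are illustrative assumptions, as in the post:

```python
# Back-of-the-envelope: energy released per kg of matter if the Pauli
# exclusion principle were switched off and nucleons dropped to the
# ground state. Inputs are the rough values quoted in the post.

EV = 1.602e-19          # joules per electron volt
AVOGADRO = 6.022e23     # particles per mole

level_splitting = 100e3 * EV   # ~100 keV typical nuclear level splitting, in J
atomic_number = 15             # rough average Z for silicon/iron/oxygen
molar_mass = 0.028             # kg/mol, roughly silicon (an assumption)

energy_per_atom = atomic_number * level_splitting   # a few 10^-13 J
energy_per_mole = energy_per_atom * AVOGADRO        # ~10^11 J
energy_per_kg = energy_per_mole / molar_mass        # J per kg

KILOTON_TNT = 4.184e12  # joules in one kiloton of TNT
print(f"per atom: {energy_per_atom:.1e} J")
print(f"per mole: {energy_per_mole:.1e} J")
print(f"per kg:   {energy_per_kg / KILOTON_TNT:.1f} kTon TNT")  # order 1 kTon
```

The per-kg figure comes out at roughly one kiloton of TNT, in line with the estimate in the text.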

This back-of-the-envelope estimate gives pretty much exactly the maximal yield of a nuclear weapon. The difference, though, is that turning off the Pauli exclusion principle would convert every kg of Earthly matter into a nuclear bomb. Since our home planet has a relatively small gravitational pull, I guess it would just blast apart. I saw everybody die, again, see that’s how it happens. But I digress; let me get back to the question of touch.

So it’s not just electrostatics but also the Pauli exclusion principle that prevents you from falling through the cracks. Not only do the electrons in your shoes not want to touch the ground, they don’t want to touch the other electrons in your shoes either. Electrons, or fermions generally, just don’t like each other.

The 10⁻⁸ meters actually seem quite optimistic because surfaces are not perfectly even; they have a roughness to them, which means that the average distance between two solids is typically much larger than the interatomic spacing one has in crystals. Moreover, the human body is not a solid, and the skin is normally covered by a thin layer of fluids. So you never touch anything, if only because you’re separated from the world by a layer of grease.

To be fair, grease isn’t why the Greeks were scratching their heads back then, but a guy called Zeno. Zeno’s most famous paradox divides a distance into halves indefinitely, to conclude that because the journey consists of an infinite number of steps, the full distance can never be crossed. You cannot, thus, touch your nose, spoke Zeno, or ram an arrow into it respectively. The paradox was resolved once it was established that infinite series can converge to finite values; the nose was back in business, but Zeno would come back to haunt the thinkers of the day centuries later.
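Zeno's construction is the textbook example of a convergent geometric series, and a quick sketch of the partial sums makes the resolution concrete:

```python
# Zeno's halving steps: 1/2 + 1/4 + 1/8 + ... The partial sums converge
# to the full (unit) distance, so infinitely many "steps" cover a
# finite length. The gap remaining after n steps is exactly 2^-n.

def zeno_partial_sum(n_steps: int) -> float:
    """Sum the first n halving steps of a unit distance."""
    return sum(0.5 ** k for k in range(1, n_steps + 1))

for n in (1, 2, 10, 50):
    print(n, zeno_partial_sum(n))
```

After 50 steps the remaining gap is 2⁻⁵⁰, far below anything an arrow (or a nose) could resolve.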

The issue reappeared with the advance of the mathematical field of topology in the 19th century. Back then, math, physics, and philosophy had not yet split apart, and the bright minds of the times, Descartes, Euler, Bolzano and the like, wanted to know, using their new methods: what does it mean for any two objects to touch? And their objects were as abstract as it gets. Any object was supposed to occupy space and cover a topological set in that space. So far so good, but what kind of set?

In the space of the real numbers, sets can be open or closed or a combination thereof. Roughly speaking, if the boundary of the set is part of the set, the set is closed. If the boundary is missing the set is open. Zeno constructed an infinite series of steps that converges to a finite value and we meet these series again in topology. Iff the limiting value (of any such series) is part of the set, the set is closed. (It’s the same as the open and closed intervals you’ve been dealing with in school, just generalized to more dimensions.) The topologists then went on to reason that objects can either occupy open sets or closed sets, and at any point in space there can be only one object.

Sounds simple enough, but here’s the conundrum. If you have two open sets that do not overlap, they will always be separated by the boundary that isn’t part of either of them. And if you have two closed sets that touch, the boundary is part of both, meaning they also overlap. In neither case can the objects touch without overlapping. Now what? This puzzle was so important to them that Bolzano went on to suggest that objects may occupy sets that are partially open and partially closed. While technically possible, it’s hard to see why, in more than 1 spatial dimension, objects would always arrange so that one object’s closed patches of surface touch the other’s open patches.

More time went by, and on the stage of science appeared the notion of fields that mediate interactions between things. Now objects could interact without touching, awesome. But if they don’t repel, what happens when they get closer? Do or don’t they touch eventually? Or does interacting via a field mean they already touch? Before anybody started worrying about this, science moved on, and we learned that the field is quantized and the interaction really just mediated by the particles that make up the field. So how do we even phrase the question now, whether two objects touch?

We can approach this by specifying that by an "object" we mean a bound state of many atoms. The short-distance interaction of these objects will (at room temperature, normal atmospheric pressure, non-relativistically, etc) take place primarily by exchanging (virtual) photons. The photons do in no sensible way belong to any one of the objects, so it seems fair to say that the objects don’t touch. They don’t touch, in one sentence, because there is no four-fermion interaction in the standard model of particle physics.

Alas, tying touch to photon exchange doesn’t in general make much sense when we think about the way we normally use the word. It does not, for example, carry any qualifier about distance. A more sensible definition would make use of the probability of an interaction. Two objects touch (in some region) if their probability of interaction (in that region) is large, whether or not it is mediated by a messenger particle. This neatly solves the topologists’ problem because in quantum mechanics two objects can indeed overlap.

What one means by “large probability” of interaction is somewhat arbitrary of course, but quantum mechanics being as awkward as it is, there’s always the possibility that your finger tunnels through your brain when you try to hit your nose, so we need a quantifier because nothing is ever absolutely certain. And then, after all, you can touch your nose! You already knew that, right?

But if you think this settles it, let me add...

Yes, no, maybe, wtf.
There is a non-vanishing probability that when you touch (attempt to touch?) something you actually exchange electrons with it. This opens a new can of worms because now we have to ask what is “you”? Are “you” the collection of fermions that you are made up of and do “you” change if I remove one electron and replace it with an identical electron? Or should we in that case better say that you just touched something else? Or are “you” instead the information contained in a certain arrangement of elementary particles, irrespective of the particles themselves? But in this case, “you” can never touch anything just because you are not material to begin with. I will leave that to you to ponder.

And so, after having spent an hour staring at that cartoon in my facebook feed, I came to the conclusion that the question isn’t whether we can touch something, but what we mean with “some thing”. I think I had been looking for some thing else though…

Doug NatelsonA book, + NNIN

Sorry for the posting drought.  There is a good reason:  I'm in the final stages of a textbook based on courses I developed about nanostructures and nanotechnology.  It's been an embarrassingly long time in the making, but I'm finally to the index-plus-final-touches stage.  I'll say more when it's in to the publisher.

One other thing:  I'm going to a 1.5 day workshop at NSF in three weeks about the next steps regarding the NNIN.  I've been given copies of the feedback that NSF received in their request for comment period, but if you have additional opinions or information that you'd like aired there, please let me know, either in the comments or via email.

July 28, 2014

Jordan EllenbergRank 2 versus rank 3

One interesting feature of the heuristics of Garton, Park, Poonen, Wood, Voight, discussed here previously: they predict there are fewer elliptic curves of rank 3 than there are of rank 2.  Is this what we believe?  On one hand, you might believe that having three independent points should be “harder” than having only two.  But there’s the parity issue.  All right-thinking people believe that there are equally many rank 0 and rank 1 elliptic curves, because 100% of curves with even parity have rank 0, and 100% of curves with odd parity have rank 1.  If a curve has even parity, all that has to happen to force it to have rank 2 is to have a non-torsion point.  And if a curve has odd parity, all that has to happen to force it to have rank 3 is to have one more non-torsion point you don’t know about.  So in that sense, it seems “equally hard” to have rank 2 or rank 3, given that parity should be even half the time and odd half the time.

So my intuition about this question is very weak.  What’s yours?  Should rank 3 be less common than rank 2?  The same?  More common?

Sean CarrollQuantum Sleeping Beauty and the Multiverse

Hidden in my papers with Chip Sebens on Everettian quantum mechanics is a simple solution to a fun philosophical problem with potential implications for cosmology: the quantum version of the Sleeping Beauty Problem. It’s a classic example of self-locating uncertainty: knowing everything there is to know about the universe except where you are in it. (Skeptic’s Play beat me to the punch here, but here’s my own take.)

The setup for the traditional (non-quantum) problem is the following. Some experimental philosophers enlist the help of a subject, Sleeping Beauty. She will be put to sleep, and a coin is flipped. If it comes up heads, Beauty will be awoken on Monday and interviewed; then she will (voluntarily) have all her memories of being awakened wiped out, and be put to sleep again. Then she will be awakened again on Tuesday, and interviewed once again. If the coin came up tails, on the other hand, Beauty will only be awakened on Monday. Beauty herself is fully aware ahead of time of what the experimental protocol will be.

So in one possible world (heads) Beauty is awakened twice, in identical circumstances; in the other possible world (tails) she is only awakened once. Each time she is asked a question: “What is the probability you would assign that the coin came up tails?”

Modified from a figure by Stuart Armstrong.


(Some other discussions switch the roles of heads and tails from my example.)

The Sleeping Beauty puzzle is still quite controversial. There are two answers one could imagine reasonably defending.

  • “Halfer” — Before going to sleep, Beauty would have said that the probability of the coin coming up heads or tails would be one-half each. Beauty learns nothing upon waking up. She should assign a probability one-half to it having been tails.
  • “Thirder” — If Beauty were told upon waking that the coin had come up heads, she would assign equal credence to it being Monday or Tuesday. But if she were told it was Monday, she would assign equal credence to the coin being heads or tails. The only consistent apportionment of credences is to assign 1/3 to each possibility, treating each possible waking-up event on an equal footing.
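The thirder's consistency argument can be written out as simple bookkeeping. A sketch, assuming the setup above (heads means two awakenings, tails means one):

```python
# The "thirder" constraints over the three possible awakening events:
#   p_mon_h == p_tue_h   (told "heads": Monday and Tuesday equally likely)
#   p_mon_h == p_mon_t   (told "Monday": heads and tails equally likely)
#   p_mon_h + p_tue_h + p_mon_t == 1   (normalization)
# All three credences must therefore be equal.

from fractions import Fraction

p = Fraction(1, 3)
credences = {("Mon", "H"): p, ("Tue", "H"): p, ("Mon", "T"): p}

p_tails = credences[("Mon", "T")]
print("credence in tails:", p_tails)  # 1/3
```

The two conditional-credence constraints plus normalization pin the answer down uniquely, which is the core of the thirder position.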

The Sleeping Beauty puzzle has generated considerable interest. It’s exactly the kind of wacky thought experiment that philosophers just eat up. But it has also attracted attention from cosmologists of late, because of the measure problem in cosmology. In a multiverse, there are many classical spacetimes (analogous to the coin toss) and many observers in each spacetime (analogous to being awakened on multiple occasions). Really the SB puzzle is a test-bed for cases of “mixed” uncertainties from different sources.

Chip and I argue that if we adopt Everettian quantum mechanics (EQM) and our Epistemic Separability Principle (ESP), everything becomes crystal clear. A rare case where the quantum-mechanical version of a problem is actually easier than the classical version.

In the quantum version, we naturally replace the coin toss by the observation of a spin. If the spin is initially oriented along the x-axis, we have a 50/50 chance of observing it to be up or down along the z-axis. In EQM that’s because we split into two different branches of the wave function, with equal amplitudes.

Our derivation of the Born Rule is actually based on the idea of self-locating uncertainty, so adding a bit more to it is no problem at all. We show that, if you accept the ESP, you are immediately led to the “thirder” position, as originally advocated by Elga. Roughly speaking, in the quantum wave function Beauty is awakened three times, and all of them are on a completely equal footing, and should be assigned equal credences. The same logic that says that probabilities are proportional to the amplitudes squared also says you should be a thirder.

But! We can put a minor twist on the experiment. What if, instead of waking up Beauty twice when the spin is up, we instead observe another spin? If that second spin is also up, she is awakened on Monday, while if it is down, she is awakened on Tuesday. Again we ask what probability she would assign that the first spin was down.


This new version has three branches of the wave function instead of two, as illustrated in the figure. And now the three branches don’t have equal amplitudes; the bottom one is 1/√2, while the top two are each (1/√2)² = 1/2. In this case the ESP simply recovers the Born Rule: the bottom branch has probability 1/2, while each of the top two has probability 1/4. And Beauty wakes up precisely once on each branch, so she should assign probability 1/2 to the initial spin being down. This gives some justification for the “halfer” position, at least in this slightly modified setup.

All very cute, but it does have direct implications for the measure problem in cosmology. Consider a multiverse with many branches of the cosmological wave function, and potentially many identical observers on each branch. Given that you are one of those observers, how do you assign probabilities to the different alternatives?

Simple. Each observer O_i appears on a branch with amplitude ψ_i, and every appearance gets assigned a Born-rule weight w_i = |ψ_i|². The ESP instructs us to assign a probability to each observer given by

P(O_i) = w_i / Σ_j w_j.

It looks easy, but note that the formula is not trivial: the weights w_i will not in general add up to one, since they might describe multiple observers on a single branch and perhaps even at different times. This analysis, we claim, defuses the “Born Rule crisis” pointed out by Don Page in the context of these cosmological spacetimes.
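As a sanity check, the prescription above can be sketched in a few lines. The amplitudes below are the ones from the modified Sleeping Beauty example (used here purely for illustration):

```python
# Normalize Born-rule weights |psi_i|^2 into observer credences.
# The weights need not sum to one a priori, which is the whole point
# of the normalization in the denominator.

import math

def esp_credences(amplitudes):
    """Return P(O_i) = w_i / sum_j w_j with w_i = |psi_i|^2."""
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]

amps = [0.5, 0.5, 1 / math.sqrt(2)]  # top two branches, bottom branch
print(esp_credences(amps))           # approximately [0.25, 0.25, 0.5]
```

With these amplitudes the bottom branch gets credence 1/2 and the top two get 1/4 each, matching the "halfer"-flavored result in the modified setup.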

Sleeping Beauty, in other words, might turn out to be very useful in helping us understand the origin of the universe. Then again, plenty of people already think that the multiverse is just a fairy tale, so perhaps we shouldn’t be handing them ammunition.

Chad OrzelThe Fermi Alternative

Given the recent Feynman explosion (timeline of events), some people may be casting about looking for an alternative source of colorful-character anecdotes in physics. Fortunately, the search doesn’t need to go all that far– if you flip back a couple of pages in the imaginary alphabetical listing of physicists, you’ll find a guy who fits the bill very well: Enrico Fermi.

Fermi’s contributions to physics are arguably as significant as Feynman’s. He was the first to work out the statistical mechanics of particles obeying the Pauli exclusion principle, now called “fermions” in his honor (Paul Dirac did the same thing independently a short time later, so the mathematical function is the “Fermi-Dirac distribution”). He was also the first to develop a theory for the weak nuclear interaction, placing Wolfgang Pauli’s desperate suggestion of the existence of the neutrino on a sound theoretical footing. Fermi’s theory was remarkably successful, and anticipated or readily incorporated the next thirty-ish years of discoveries.
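For the curious, the Fermi-Dirac distribution mentioned above is short enough to write down directly; this is the standard textbook form, not anything specific to this post:

```python
# Fermi-Dirac distribution: mean occupation of a single-particle state
# at energy E, for chemical potential mu and temperature kT (all three
# in the same energy units, with Boltzmann's constant folded into kT).

import math

def fermi_dirac(E: float, mu: float, kT: float) -> float:
    """Occupation probability of a fermionic state: 1/(exp((E-mu)/kT)+1)."""
    return 1.0 / (math.exp((E - mu) / kT) + 1.0)

# At E == mu the occupation is exactly 1/2, at any temperature:
print(fermi_dirac(1.0, 1.0, 0.025))  # 0.5
```

The exclusion principle is visible in the formula: the occupation can never exceed 1, unlike the Bose-Einstein case.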

More than that, he was a successful experimental physicist. He did pioneering experiments with neutrons, including demonstrating the fission of heavy elements (though he initially misinterpreted this as the creation of transuranic elements) and was the first to successfully construct a nuclear reactor, as part of the Manhattan Project. The US’s great particle physics lab is named Fermilab in his honor.

One of the difficult things about replacing Feynman is that a lot of the genuinely admirable things about his approach to physics are sort of bound up with his personality. Meaning that it’s easy to slide from approach-to-physics stuff– spinning plates at Cornell, etc.– to relatively wholesome anecdotes– dazzling off-the-cuff calculations, cracking safes at Los Alamos– into the strip clubs and other things that make Feynman a polarizing figure.

Fermi brings a lot of the same positive features without the baggage. He had a similarly playful approach to a lot of physics-related things– the whole notion of “Fermi problems” and back-of-the-envelope calculations is pretty much essential to the physics mindset. Wikipedia has a great secondhand quote:

I can calculate anything in physics within a factor 2 on a few sheets: to get the numerical factor in front of the formula right may well take a physicist a year to calculate, but I am not interested in that.

He was also a charming and witty guy, with a quirky sense of humor (the photo above has a mistake in one of the equations, and people have spent years debating whether that was deliberate, because it’s the kind of thing he might’ve done as a joke on the PR people). He even has his own great Manhattan Project anecdotes– he famously estimated the strength of the blast by dropping pieces of paper and pacing off how far they blew when the shock wave hit, and prior to the Trinity test was reprimanded by Oppenheimer for running a betting pool on whether the test would ignite the atmosphere and obliterate life on Earth.

He also has the wide-ranging interests going for him. His name pops up all over physics, from statistical mechanics to the astrophysics of cosmic rays. And just like Feynman is more likely to be cited in popular writing for inspiring either nanotechnology or quantum computing than for his work on QED, Fermi’s true source of popular immortality is that damn “paradox” about aliens.

While the personal anecdotes may not quite stack up to those about Feynman, there isn’t the same dark edge. Fermi was happily married, and in fact moved to the US because of his wife, who was Jewish and subject to the racist policies put in place by Mussolini (they left directly from the Nobel Prize ceremony in 1938, where Fermi picked up a prize for work done in Rome). I’m not aware of any salacious Fermi stories, so he’s much safer in that regard.

Given all that, why is Fermi so much less well-known than Feynman? Partly because a lot of his contributions to physics were excessively practical– experimentalists tend to be less mythologized than theorists, and his greatest theoretical contributions came through cleaning up ideas proposed by Pauli. Mostly, though, it’s because he was a generation older than Feynman and died younger, in 1954. He never got the chance to be a grand old man, and didn’t live into the era where the sort of colorful anecdotage that so inflates Feynman’s status became popular. Had he lived another twenty years, things might’ve been different.

(It’s also interesting to speculate about what Schrödinger’s reputation would be had he been closer to Feynman’s age than Einstein’s. Schrödinger would’ve loved the Sixties, between his fascination with Vedic philosophy and the whole “free love” thing. But had he been alive through then, the skeevy aspects of his personal life would probably be better known, because most of his more sordid activities took place in an era when people didn’t really talk about that sort of thing in public, let alone write best-selling autobiographies about it.)

Anyway, that’s my plug for Fermi. If you find Feynman too problematic to promote– and that’s an entirely reasonable decision– Fermi gets you a lot of the same good stuff (great physicist, playful approach to science, charming and personable guy), without the darker side. He should get more press.


(That said, Fermi is one of the figures I regret not being able to feature more prominently in the forthcoming book. The problem is, the focus of the book is on process, and Fermi’s one of those guys whose process of discovery consisted mostly of “be super smart, and work really hard.” I couldn’t come up with a way to fit him into the framework of the book, other than the notion of back-of-the-envelope estimation. But you can’t do that without bringing math into it, and that would’ve pushed the book into a different category…)

Tommaso DorigoMore On The Alleged WW Excess From The LHC

This is just a short update on the saga of the anomalous excess of W-boson-pair production that the ATLAS and CMS collaborations have reported in their 7-TeV and 8-TeV proton-proton collision data. A small bit of information which I was unaware of, and which can be added to the picture.

read more

Clifford JohnsonHey, You…

Today (Sunday) I devoted my work time to finishing an intensely complicated page. It is the main "establishing shot" type page for a story set in a Natural History Museum. This is another "don't do" if you want to save yourself time, since such a location results in lots of drawings of bones and stuffed animals and people looking at bones and stuffed animals. (The other big location "don't do" from an earlier post was cityscapes with lots of flashy buildings with endless windows to draw. :) ) Perhaps annoyingly, I won't show you the huge panels filled with such things, and instead show you a small corner panel of the type that people might not look at much (because there are no speech bubbles and so forth). This is seconds before our characters meet. A fun science-filled conversation will follow...(Yes these are the same characters from another story I've shown you extracts from.) [Update: I suppose I ought to explain the cape? It is a joke. I thought I'd have a [...] Click to continue reading this post

July 27, 2014

Jordan EllenbergSubnostalgia

For some reason I was thinking about pieces of culture that have departed from the world but which somehow didn’t “stick” well enough to persist even in the sphere of nostalgia.  Like when people think about the early 1990s, the years when I was in college, they might well say “oh yeah, grunge” or “oh yeah, wearing used gas station T-shirts with a name stitched on” or “oh yeah, Twin Peaks” or “oh yeah, OK Soda” or whatever.

But no one says “oh yeah, Fido Dido.”  So here I am doing it.

It is inherently hard to try to list things you’ve forgotten about.  My list right now consists of

  • Fido Dido
  • Saying “bite me”
  • Smartfood
  • Devil sticks (from Jason Starr)

That’s it.  What have you got?

Quantum DiariesWhat are Sterile Neutrinos?

Sterile Neutrinos in Under 500 Words

Hi Folks,

In the Standard Model, we have three groups of particles: (i) force carriers, like photons and gluons; (ii) matter particles, like electrons, neutrinos and quarks; and (iii) the Higgs. Each force carrier is associated with a force. For example: photons are associated with electromagnetism, the W and Z bosons are associated with the weak nuclear force, and gluons are associated with the strong nuclear force. In principle, all particles (matter, force carriers, the Higgs) can carry a charge associated with some force. If this is ever the case, then the charged particle can absorb or radiate a force carrier.

Credit: Wikipedia

As a concrete example, consider electrons and top quarks. Electrons carry an electric charge of “-1” and top quarks carry an electric charge of “+2/3”. Both the electron and top quark can absorb/radiate photons, but since the top quark’s electric charge is smaller in magnitude than the electron’s, it will not absorb/emit a photon as often as an electron. In a similar vein, the electron carries no “color charge”, the charge associated with the strong nuclear force, whereas the top quark does carry color and interacts via the strong nuclear force. Thus, electrons have no idea gluons even exist, but top quarks can readily emit/absorb them.

Neutrinos possess a weak nuclear charge and hypercharge, but no electric or color charge. This means that neutrinos can absorb/emit W and Z bosons and nothing else. Neutrinos are invisible to photons (particles of light) as well as gluons (particles of the color force). This is why it is so difficult to observe neutrinos: the only way to detect a neutrino is through the weak nuclear interactions, which are much feebler than electromagnetism or the strong nuclear force.

Sterile neutrinos are like regular neutrinos: they are massive (spin-1/2) matter particles that do not possess electric or color charge. The difference, however, is that sterile neutrinos do not carry weak nuclear charge or hypercharge either. In fact, they do not carry any charge, for any force. This is why they are called “sterile”; they are free from the influences of Standard Model forces.



The properties of sterile neutrinos are simply astonishing. For example: since they have no charge of any kind, they can in principle be their own antiparticles (the infamous “sterile Majorana neutrino”). As they are not associated with either the strong nuclear scale or the electroweak symmetry breaking scale, sterile neutrinos can, in principle, have an arbitrarily large or small mass. In fact, very heavy sterile neutrinos might even be dark matter, though this is probably not the case. However, since sterile neutrinos do have mass, and at low energies they act just like regular Standard Model neutrinos, they can participate in neutrino flavor oscillations. It is through this subtle effect that we hope to find sterile neutrinos, if they exist.
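
Since flavor oscillations are the handle, it may help to see the quantity involved. Below is a minimal Python sketch of the standard two-flavor vacuum oscillation probability; the function name and sample parameters are my own illustrative choices (not from any specific experiment), and real sterile-neutrino searches fit data with additional mixing angles and mass splittings.

```python
import math

def osc_prob(theta, dm2_ev2, L_km, E_GeV):
    """Two-flavor vacuum oscillation probability P(nu_a -> nu_b).

    theta    -- mixing angle in radians
    dm2_ev2  -- mass-squared splitting in eV^2
    L_km     -- baseline (distance travelled) in km
    E_GeV    -- neutrino energy in GeV
    """
    # The 1.27 collects hbar, c, and the unit conversions for these units.
    return math.sin(2.0 * theta) ** 2 * math.sin(1.27 * dm2_ev2 * L_km / E_GeV) ** 2

# No mixing (theta = 0) means no oscillation at any baseline:
print(osc_prob(0.0, 2.4e-3, 500.0, 1.0))  # → 0.0
```

An extra sterile state would add terms of the same form with its own mass splitting and mixing angle; an oscillation deficit or excess that cannot be fit with the three known flavors is the kind of signature people look for.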

Credit: Kamioka Observatory/ICRR/University of Tokyo

Until next time!

Happy Colliding,

Richard (@bravelittlemuon)


Chad OrzelKids Those Days

Lance Mannion has a really nice contrast between childhood now and back in the 1970s that doesn’t go in the usual decline-of-society direction. He grew up not too far from where I now live, and after describing his free-ranging youth, points out some of the key factors distinguishing it from today that need to be accounted for before lamenting the lack of kids running around outside:

– A lot of the houses in “the old neighborhood” are still owned by the people who owned them back in the day, so the only kids around are visiting grandkids,

– Those homes that are occupied by families with kids are usually occupied by families with fewer kids than back in the day, so there are fewer older siblings to keep tabs on younger kids and that kind of thing,

– Most importantly, back in the day, there were fewer two-career families. Those kids running around out in the neighborhood were always within shouting distance of multiple parents.

I’m a bit younger than Lance, and grew up way out in the country, but this rings pretty true to my experience. And as I said, we live in a neighborhood not all that far from the one he talks about, and the changes he describes also ring true. Our neighborhood is great, but it’s split between families with kids and empty-nesters. The “with kids” fraction is increasing, but there are probably two childless houses for every one with kids, and most of the families are on the smaller side compared to the 1970s; I can’t think of anyone in the immediate neighborhood with more than three kids.

So a lot of things have changed to make it less likely that you’ll see kids running around outside. Which doesn’t mean there aren’t kids running around outside: my computer sits in front of a window that faces the street, and when I’m home during the day, I regularly see kids running and playing outside. But the overall numbers are reduced to a point where it’s fairly likely that people driving through the neighborhood could reasonably be clucking their tongues and talking about how sad it is that no kids play outside any more. You have to live here to know that there are kids around, because the density is lower than it used to be.

And the lack of kids is more apparent during the day, for the economic reasons Lance notes. Basically all of the families with kids in the area are two-career families, which means that during work hours, the number of kids around drops to nearly zero. They’re all in day care, even in the summer. Not because parents are overly controlling, or afraid to let their kids roam, but because they’re at work, because they have to be to live in this neighborhood. This also feeds some of the “never away from parents” thing that people talk about, because the time parents get to spend with their kids is more limited and thus more valuable.

But there are pockets that seem a lot like the old days– down the block from us, there’s a cluster of three families all with kids about SteelyKid’s age (including one of her kindergarten classmates). Those kids are in and out of each other’s houses and yards all day long, often with no visible adult supervision. SteelyKid had a friend over yesterday, and we wandered down there after lunch, where the kids ran around a lot like it was back in the day. It’s a little too far to just send a six-year-old off there on her own (and none of the houses between here and there are families with kids), but within a few years, I can easily imagine pointing SteelyKid in their direction after school and on weekends, and having the kids from down there show up in our yard.

Anyway, it’s worth reading Lance’s post, because it’s a cut above most of the hand-wringing you see about the way we raise kids these days. It’s really not as dire a situation as a lot of cultural critics make out, if you look carefully at what’s changed from the “good old days.”

Jordan EllenbergCool song, bro

I was in Barriques and “Bra,” by Cymande came on, and I was like, cool song, cool of Barriques to be playing this song that I’m cool for knowing about, maybe I should go say something to show everyone that I already know this cool song, and then I thought, why do I know about this song anyway? and I remembered that it was because sometime last year it was playing in Barriques and I was like, what is this song, it’s cool? and I Shazammed it.

So I guess what I’m saying is, I’m probably going to the right coffee shop.  Also, this song is cool.  I’m sort of fascinated by the long instrumental break that starts around 2:50.  It doesn’t seem like very much is happening; why is it so captivating?  I think my confusion on this point has something to do with my lack of understanding of drums.

July 26, 2014

Quantum DiariesA Physicist and Historian Walk Into a Coffee Shop

It’s Saturday, so I’m at the coffee shop working on my thesis again. It’s become a tradition over the last year that I meet a writer friend each week, we catch up, have something to drink, and sit down for a few hours of good-quality writing time.


The work desk at the coffee shop: laptop, steamed pork bun, and rosebud latte.

We’ve gotten to know the coffee shop really well over the course of this year. It’s pretty new in the neighborhood, but dark and hidden enough that business is slow, and we don’t feel bad keeping a table for several hours. We have our favorite menu items, but we’ve tried most everything by now. Some mornings, the owner’s family comes in, and the kids watch cartoons at another table.

I work on my thesis mostly, or sometimes I’ll work on analysis that spills over from the week, or I’ll check on some scheduled jobs running on the computing cluster.

My friend Jason writes short stories, works on revising his novel (magical realism in ancient Egypt in the reign of Rameses XI), or drafts posts for his blog about the puzzles of the British constitution. We trade tips on how to organize notes and citations, and how to stay motivated. So I’ve been hearing a lot about the cultural difference between academic work in the humanities and the sciences. One of the big differences is the level of citation that’s expected.

As a particle physicist, when I write a paper it’s very clear which experiment I’m writing about. I only write about one experiment at a time, and I typically focus on a very small topic. Because of that, I’ve learned that the standard for making new claims is that you usually make one new claim per paper, and it’s highlighted in the abstract, introduction, and conclusion with a clear phrase like “the new contribution of this work is…” It’s easy to separate which work you claim as your own and which work is from others, because anything outside “the new contribution of this work” belongs to others. A single citation for each external experiment should suffice.

For academic work in history, the standard is much different: the writing itself is much closer to the original research. As a start, you’ll need a citation for each quote, going to sources that are as primary as you can get your hands on. The stranger idea for me is that you also need a citation for every piece of analysis that someone else has come up with, and that a statement without a citation is automatically claimed as original work. This shows up in the difference between Jason’s posts about modern constitutional issues and historical ones: the historical ones have huge source lists, while the modern ones are content with a few hyperlinks.

In both cases, things that are “common knowledge” don’t need to be cited, like the fact that TeV cosmic rays exist (they do) or the year that Elizabeth I ascended the throne (1558).

There’s a difference in the number of citations between modern physics research and history research. Is that because of the timing (historical versus modern) or the subject matter? Do they have different amounts of common knowledge? For modern topics in physics and in history, the sources are available online, so a hyperlink is a perfect reference, even in a formal post. By that standard, all Quantum Diaries posts should be OK with the hyperlink citation model. But even in those cases, Jason puts footnoted citations to modern articles in the JSTOR database, and uses more citations overall.

Another cool aspect of our coffee shop is that the music is sometimes ridiculous, and it interrupts my thoughts if I get stuck in some esoteric bog. There’s an oddly large sample of German covers of 30s and 40s showtunes. You haven’t lived until you’ve heard “The Lady is a Tramp” in German while calculating oscillation probabilities. I’m kidding. Mostly.

Jason has shown me a different way of handling citations, and I’ve taught him some of the basics of HTML, so now his citations can appear as hyperlinks to the references list!

As habits go, I’m proud of this social coffee shop habit. I default to getting stuff done, even if I’m feeling slightly off or uninspired.  The social reward of hanging out makes up for the slight activation energy of getting off my couch, and once I’m out of the house, it’s always easier to focus.  I miss prime Farmers’ Market time, but I could go before we meet. The friendship has been a wonderful supportive certainty over the last year, plus I get some perspective on my field compared to others.

Scott AaronsonUS State Department: Let in cryptographers and other scientists

Predictably, my last post attracted plenty of outrage (some of it too vile to let through), along with the odd commenter who actually agreed with what I consider my fairly middle-of-the-road, liberal Zionist stance.  But since the outrage came from both sides of the issue, and the two sides were outraged about the opposite things, I guess I should feel OK about it.

Still, it’s hard not to smart from the burns of vituperation, so today I’d like to blog about a very different political issue: one where hopefully almost all Shtetl-Optimized readers will actually agree with me (!).

I’ve learned from colleagues that, over the past year, foreign-born scientists have been having enormously more trouble getting visas to enter the US than they used to.  The problem, I’m told, is particularly severe for cryptographers: embassy clerks are now instructed to ask specifically whether computer scientists seeking to enter the US work in cryptography.  If an applicant answers “yes,” it triggers a special process where the applicant hears nothing back for months, and very likely misses the workshop in the US that he or she had planned to attend.  The root of the problem, it seems, is something called the Technology Alert List (TAL), which has been around for a while—the State Department beefed it up in response to the 9/11 attacks—but which, for some unknown reason, is only now being rigorously enforced.  (Being marked as working in one of the sensitive fields on this list is apparently called “getting TAL’d.”)

The issue reached a comical extreme last October, when Adi Shamir, the “S” in RSA, Turing Award winner, and foreign member of the US National Academy of Sciences, was prevented from entering the US to speak at a “History of Cryptology” conference sponsored by the National Security Agency.  According to Shamir’s open letter detailing the incident, not even his friends at the NSA, or the president of the NAS, were able to grease the bureaucracy at the State Department for him.

It should be obvious to everyone that a crackdown on academic cryptographers serves no national security purpose whatsoever, and if anything harms American security and economic competitiveness, by diverting scientific talent to other countries.  (As Shamir delicately puts it, “the number of terrorists among the members of the US National Academy of Science is rather small.”)  So:

  1. Any readers who have more facts about what’s going on, or personal experiences, are strongly encouraged to share them in the comments section.
  2. Any readers who might have any levers of influence to pull on this issue—a Congressperson to write to, a phone call to make, an Executive Order to issue (I’m talking to you, Barack), etc.—are strongly encouraged to pull them.

Terence TaoVariants of the Selberg sieve, and bounded intervals containing many primes

I’ve just uploaded to the arXiv the D.H.J. Polymath paper “Variants of the Selberg sieve, and bounded intervals containing many primes“, which is the second paper to be produced from the Polymath8 project (the first one being discussed here). We’ll refer to this latter paper here as the Polymath8b paper, and the former as the Polymath8a paper. As with Polymath8a, the Polymath8b paper is concerned with the smallest asymptotic prime gap

\displaystyle  H_1 := \liminf_{n \rightarrow \infty}(p_{n+1}-p_n),

where {p_n} denotes the {n^{th}} prime, as well as the more general quantities

\displaystyle  H_m := \liminf_{n \rightarrow \infty}(p_{n+m}-p_n).
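
As a concrete illustration of these quantities (a sketch of my own, not anything from the paper), the following Python snippet computes the smallest gap p_{n+m} - p_n among primes below a cutoff. A finite minimum like this only illustrates the quantity inside the liminf; H_m is about gaps achieved infinitely often, so small gaps among the first few primes say nothing about H_m itself.

```python
def primes_up_to(limit):
    """Simple sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_p in enumerate(sieve) if is_p]

def smallest_gap(m, limit):
    """Smallest p_{n+m} - p_n among primes up to `limit`.

    Only a finite stand-in for the liminf defining H_m: the liminf
    concerns gaps that recur infinitely often, which no finite
    computation can certify.
    """
    p = primes_up_to(limit)
    return min(p[n + m] - p[n] for n in range(len(p) - m))

print(smallest_gap(1, 10 ** 4))  # → 1, the one-off gap from (2, 3); twin primes give gap 2
```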

In the breakthrough paper of Goldston, Pintz, and Yildirim, the bound {H_1 \leq 16} was obtained under the strong hypothesis of the Elliott-Halberstam conjecture. An unconditional bound on {H_1}, however, remained elusive until the celebrated work of Zhang last year, who showed that

\displaystyle  H_1 \leq 70{,}000{,}000.

The Polymath8a paper then improved this to {H_1 \leq 4{,}680}. After that, Maynard introduced a new multidimensional Selberg sieve argument that gave the substantial improvement

\displaystyle  H_1 \leq 600

unconditionally, and {H_1 \leq 12} on the Elliott-Halberstam conjecture; furthermore, bounds on {H_m} for higher {m} were obtained for the first time, and specifically that {H_m \ll m^3 e^{4m}} for all {m \geq 1}, with the improvements {H_2 \leq 600} and {H_m \ll m^3 e^{2m}} on the Elliott-Halberstam conjecture. (I had independently discovered the multidimensional sieve idea, although I did not obtain Maynard’s specific numerical results, and my asymptotic bounds were a bit weaker.)

In Polymath8b, we obtain some further improvements. Unconditionally, we have {H_1 \leq 246} and {H_m \ll m e^{(4 - \frac{28}{157}) m}}, together with some explicit bounds on {H_2,H_3,H_4,H_5}; on the Elliott-Halberstam conjecture we have {H_m \ll m e^{2m}} and some numerical improvements to the {H_2,H_3,H_4,H_5} bounds; and assuming the generalised Elliott-Halberstam conjecture we have the bound {H_1 \leq 6}, which is best possible from sieve-theoretic methods thanks to the parity problem obstruction.

There were a variety of methods used to establish these results. Maynard’s paper obtained a criterion for bounding {H_m} which reduced to finding a good solution to a certain multidimensional variational problem. When the dimension parameter {k} was relatively small (e.g. {k \leq 100}), we were able to obtain good numerical solutions both by continuing the method of Maynard (using a basis of symmetric polynomials), or by using a Krylov iteration scheme. For large {k}, we refined the asymptotics and obtained near-optimal solutions of the variational problem. For the {H_1} bounds, we extended the reach of the multidimensional Selberg sieve (particularly under the assumption of the generalised Elliott-Halberstam conjecture) by allowing the function {F} in the multidimensional variational problem to extend to a larger region of space than was previously admissible, albeit with some tricky new constraints on {F} (and penalties in the variational problem). This required some unusual sieve-theoretic manipulations, notably an “epsilon trick”, ultimately relying on the elementary inequality {(a+b)^2 \geq a^2 + 2ab}, that allowed one to get non-trivial lower bounds for sums such as {\sum_n (a(n)+b(n))^2} even if the sum {\sum_n b(n)^2} had no non-trivial estimates available; and a way to estimate divisor sums such as {\sum_{n\leq x} \sum_{d|n} \lambda_d} even if {d} was permitted to be comparable to or even exceed {x}, by using the fundamental theorem of arithmetic to factorise {n} (after restricting to the case when {n} is almost prime). I hope that these sieve-theoretic tricks will be useful in future work in the subject.
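
The elementary inequality behind the “epsilon trick” is worth displaying; the following is my paraphrase of the mechanism described above, not a formula quoted from the paper:

```latex
% Since b^2 \ge 0, dropping it from the expansion gives
% (a+b)^2 = a^2 + 2ab + b^2 \ge a^2 + 2ab.
% Applied termwise to a sum, this yields a lower bound that never
% requires any estimate on \sum_n b(n)^2:
\[
  \sum_n \bigl(a(n)+b(n)\bigr)^2 \;\ge\; \sum_n a(n)^2 + 2\sum_n a(n)\,b(n).
\]
```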

With this paper, the Polymath8 project is almost complete; there is still a little bit of scope to push our methods further and get some modest improvement for instance to the {H_1 \leq 246} bound, but this would require a substantial amount of effort, and it is probably best to instead wait for some new breakthrough in the subject to come along. One final task we are performing is to write up a retrospective article on both the 8a and 8b experiences, an incomplete writeup of which can be found here. If anyone wishes to contribute some commentary on these projects (whether you were an active contributor, an occasional contributor, or a silent “lurker” in the online discussion), please feel free to do so in the comments to this post.

Filed under: math.NT, paper, polymath Tagged: polymath8

Tommaso DorigoA Useful Approximation For The Tail Of A Gaussian

This is just a short post to report on a useful paper I found while preparing for a talk I will be giving next week at the 3rd International Conference on New Frontiers in Physics, in the pleasant setting of the Orthodox Academy of Crete, near Kolympari.

My talk will be titled "Extraordinary Claims: the 0.000029% Solution", making reference to the 5-sigma "discovery threshold" that has become a well-known standard for reporting the observation of new effects or particles in high-energy physics and astrophysics.

read more
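
For context on the number in the title: the one-sided tail of a standard Gaussian beyond 5 sigma can be computed exactly with the complementary error function, as in this small Python sketch (this is the exact value that approximations of the tail are compared against; the sketch does not reproduce the approximation from the paper itself):

```python
import math

def gaussian_tail(nsigma):
    """One-sided tail probability P(Z > nsigma) for a standard normal Z."""
    return 0.5 * math.erfc(nsigma / math.sqrt(2.0))

# Five sigma: about 2.87e-7, i.e. 0.0000287%, the "0.000029%" in the talk title.
print(gaussian_tail(5.0))
```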

Clifford JohnsonPot Luck

Here in Aspen there was a pleasant party over at the apartment of one of the visiting physicists this evening. I know it seems odd, but it has been a while since I've been at a party with a lot of physicists (I'm not counting the official dinners at the Strings conference a few weeks back), and I enjoyed it. I heard a little about what some old friends were up to, and met some spouses and learned what they do, and so forth. For the first time, I think, I spoke at length to some curious physicists about the graphic book project, and the associated frustrating adventures in the publishing world (short version: most people love it, but they just don't want to take a risk on an unusual project...), and they were excited about it, which was nice of them. It was a pot luck, and so although I was thinking I'd be tired and just take along a six-pack of beer, by lunchtime I decided that I'd make a little something and take it along. Then, as I tend to do, it became two little somethings... and I went and bought the ingredients at the supermarket nearby and worked down at the centre until later. Well, first I made a simple syrup from sugar and water and muddled and worried a lot of tarragon into it. Then in the evening, there was a lot of peeling and chopping. This is usually one of my favourite things, but the knives in the apartment I am staying in are as blunt as sticks of warm butter, and so chopping was long and fretful. (And dangerous... don't people realise that blunt knives are actually more dangerous than sharp ones?) [...] Click to continue reading this post

BackreactionCan black holes bounce to white holes?

Fast track to wisdom: Sure, but who cares if they can? We want to know if they do.

Black holes are defined by the presence of an event horizon, which is the boundary of a region from which nothing can escape, ever. The word black hole is also often used for something that for a long time looks very similar to a black hole and that traps light, not eternally but only temporarily. Such space-times are said to have an “apparent horizon.” That they are not strictly speaking black holes was the origin of the recent Stephen Hawking quote according to which black holes may not exist, by which he meant they might have only an apparent horizon instead of an eternal event horizon.

A white hole is an upside-down version of a black hole; it has an event horizon that is a boundary to a region in which nothing can ever enter. Static black hole solutions, describing unrealistic black holes that have existed forever and continue to exist forever, are actually a combination of a black hole and a white hole.

The horizon itself is a global construct, it is locally entirely unremarkable and regular. You would not note crossing the horizon, but the classical black hole solution contains a singularity in the center. This singularity is usually interpreted as the breakdown of classical general relativity and is expected to be removed by the yet-to-be-found theory of quantum gravity. 

You do not, however, need quantum gravity to construct singularity-free black hole space-times. Hawking and Ellis’ singularity theorems prove that singularities must form from certain matter configurations, provided the matter is normal matter and cannot develop negative pressure and/or density. All you have to do to get rid of the singularity is invent some funny type of matter that refuses to be squeezed arbitrarily. This is not possible with any type of matter we know, so it just pushes the bump around under the carpet: now rather than having to explain quantum effects of gravity you have to explain where the funny matter comes from. It is normally interpreted not as matter but as a quantum gravitational contribution to the stress-energy tensor, but either way it’s basically the physicist’s way of using a kitten photo to cover the hole in the wall.

Singularity-free black hole solutions have been constructed for almost as long as the black hole solution has been known – people have always been disturbed by the singularity. Using matter other than normal matter has allowed constructing both wormhole solutions as well as black holes that turn into white holes and allow an exit into a second space-time region. Now if a black hole is really a black hole with an event horizon, then the second space-time region is causally disconnected from the first. If the black hole has only an apparent horizon, then this does not have to be so, and then the white hole is also not really a white hole, it just looks like one.

The latter solution is quite popular in quantum gravity. It basically describes matter collapsing, forming an apparent horizon and a strong quantum gravity region inside but no singularity, then evaporating and returning to an almost flat space-time. There are various ways to construct these space-times. The details differ, but the corresponding causal diagrams all look basically the same.

This recent paper for example used a collapsing shell turning into an expanding shell. The title “Singularity free gravitational collapse in an effective dynamical quantum spacetime” basically says it all. Note how the resulting causal diagram (left in figure below) looks pretty much the same as the one Lee and I constructed based on general considerations in our 2009 paper (middle in figure below), which again looks pretty much the same as the one that Ashtekar and Bojowald discussed in 2005 (right in figure below), and I could go on and add a dozen more papers discussing similar causal diagrams. (Note that the shaded regions do not mean the same in each figure.)

One needs a concrete ansatz for the matter of course to be able to calculate anything. The general structure of the causal diagram is good for classification purposes, but not useful for quantitative reasoning, for example about the evaporation.

Haggard and Rovelli have recently added to this discussion a new paper about black holes bouncing to white holes.

    Black hole fireworks: quantum-gravity effects outside the horizon spark black to white hole tunneling
    Hal M. Haggard, Carlo Rovelli
    arXiv: 1407.0989

Ron Cowen at Nature News announced this as a new idea, and while the paper does contain new ideas, that black holes may turn into white holes is in and of itself not new. So some clarification follows.

Haggard and Rovelli’s paper contains two ideas that are connected by an argument, but not by a calculation, so I want to discuss them separately. Before we start it is important to note that their argument does not take into account Hawking radiation. The whole process is supposed to happen already without outgoing radiation. For this reason the situation is completely time-reversal invariant, which makes it significantly easier to construct a metric. It is also easier to arrive at a result that has nothing to do with reality.

So, the one thing that is new in the Haggard and Rovelli paper is that they construct a space-time diagram, describing a black hole turning into a white hole, both with apparent horizons, and do so by a cutting procedure rather than by altering the equation of state of the matter. As source they use a collapsing shell that is supposed to bounce. This cutting procedure is fine in principle, even though it is not often used. The problem is that you end up with a metric that exists as a solution to some source, but you then have to calculate what the source has to do in order to give you the metric. This however is not done in the paper. I want to offer you a guess though as to what source would be necessary to create their metric.

The cutting that is done in the paper takes a part of the black hole metric (describing the outside of the shell) with an arm extending into the horizon region, then squeezes this arm together so that it shrinks in radial extent and no longer extends into the regime below the Schwarzschild radius, which is normally behind the horizon. This squeezed part of the black hole metric is then matched to empty space, describing the inside of the shell. See image below

Figure 4 from arXiv: 1407.0989

They do not specify what happens to the shell after it has reached the end of the region that was cut, explaining one would need quantum gravity for this. The result is glued together with the time-reversed case, and so they get a metric that forms an apparent horizon and bounces at a radius where one normally would not expect quantum gravitational effects. (Working towards making more concrete the so far quite vague idea of Planck stars that we discussed here.)

The cutting and squeezing basically means that the high curvature region from inside the horizon was moved to a larger radius, and the only way this makes sense is if it happens together with the shell. So I think effectively they take the shell from a small radius and match the small radius to a large radius while keeping the density fixed (they keep the curvature). This looks to me like they blow up the total mass of the shell, but keep in mind this is my interpretation, not theirs. If that were so, however, then it makes sense that the horizon forms at a larger radius if the shell collapses while its mass increases. This raises the question though why the heck the mass of the shell should increase and where that energy is supposed to come from.

This brings me to the second argument in the paper, which is supposed to explain why it is plausible to expect this kind of behavior. Let me first point out that it is a bold claim that quantum gravity effects kick in outside the horizon of a (large) black hole. Standard lore has it that quantum gravity only leads to large corrections to the classical metric if the curvature is large (in the Planckian regime). This always happens after horizon crossing (as long as the mass of the black hole is larger than the Planck mass). But once the horizon is formed, the only way to make matter bounce so that it can come out of the horizon necessitates violations of causality and/or locality (keep in mind their black hole is not evaporating!) that extend into small curvature regions. This is inherently troublesome because now one has to explain why we don’t see quantum gravity effects all over the place.

The way they argue this could happen is that small, Planck-size, higher-order corrections to the metric can build up over time. In this case it is not solely the curvature that is relevant for an estimate of the effect, but also the duration of the buildup. So far, so good. My first problem is that I can’t see what their estimate of the long-term effects of such a small correction has to do with quantum gravity. I could read the whole estimate as being one for black hole solutions in higher-order gravity, quantum not required. If it was a quantum fluctuation I would expect the average solution to remain the classical one and the cases in which the fluctuations build up to be possible but highly improbable. In fact they seem to have something like this in mind, just that they for some reason come to the conclusion that the transition to the solution in which the initially small fluctuation builds up becomes more likely over time rather than less likely.

What one would need to do to estimate the transition probability is to work out some product of wave-functions describing the background metric close by and far away from the classical average, but nothing like this is contained in the paper. (Carlo told me though, it’s in the making.) It remains to be shown that the process of all the matter of the shell suddenly tunneling outside the horizon and expanding again is more likely to happen than the slow evaporation due to Hawking radiation which is essentially also a tunnel process (though not one of the metric, just of the matter moving in the metric background). And all this leaves aside that the state should decohere and not just happily build up quantum fluctuations for the lifetime of the universe or so.

By now I’ve probably lost most readers so let me just sum up. The space-time that Haggard and Rovelli have constructed exists as a mathematical possibility, and I do not actually doubt that the tunnel process is possible in principle, provided that they get rid of the additional energy that has appeared from somewhere (this is taken care of automatically by the time-reversal). But this alone does not tell us whether this space-time can exist as a real possibility in the sense that we do not know if this process can happen with large probability (close to one) in the time before the shell reaches the Schwarzschild radius (of the classical solution).

I have remained skeptical, despite Carlo’s infinite patience in explaining their argument to me. But if they are right, then this would indeed solve both the black hole information loss problem and the firewall conundrum. So stay tuned...

July 25, 2014

David Hogg: dust priors and likelihoods

Richard Hanson and Coryn Bailer-Jones (both MPIA) and I met today to talk about spatial priors and extinction modeling for Gaia. I showed them what I have on spatial priors, and we talked about the differences between using extinction measurements to predict new extinctions, using extinction measurements to predict dust densities, and so on. A key difference between the way I am thinking about it and the way Hanson and Bailer-Jones are thinking about it is that I don't want to instantiate the dust density (latent parameters) unless I have to. I would rather use the magic of the Gaussian Process to marginalize it out. We developed a set of issues for the document that I am writing on the subject. At Galaxy Coffee, Girish Kulkarni (MPIA) gave a great talk about the physics of the intergalactic medium and observational constraints from the absorption lines in quasar spectra.

David Hogg: empirical models for APOGEE spectra

I spent a chunk of the day with Melissa Ness (MPIA), fitting empirical models to APOGEE infrared spectra of stars. The idea is to do a simple linear supervised classification or regression, in which we figure out the dependence of the spectra on key stellar parameters, using a "training set" of stars with good stellar parameters. We worked in the pair-coding mode. By the end of the day we could show that we are able to identify regions of the spectrum that might serve as good metallicity indicators, relatively insensitive to temperature and log-g. The hopes for this project range from empirical metallicity index identification to label de-noising to building a full data-driven (supervised) stellar parameter pipeline. We ended our coding day pretty optimistic.

Jordan Ellenberg: How do you share your New York Times?

My op/ed about math teaching and Little League coaching is the most emailed article in the New York Times today.  Very cool!

But here’s something interesting; it’s only the 14th most viewed article, the 6th most tweeted, and the 6th most shared on Facebook.  On the other hand, this article about child refugees from Honduras is

#14 most emailed

#1 most viewed

#1 most shared on Facebook

#1 most tweeted

while Paul Krugman’s column about California is

#4 most emailed

#3 most viewed

#4 most shared on Facebook

#7 most tweeted.

Why are some articles, like mine, much more emailed than tweeted, while others, like the one about refugees, much more tweeted than emailed, and others still, like Krugman’s, come out about even?  Is it always the case that views track tweets, not emails?  Not necessarily; an article about the commercial success and legal woes of conservative poo-stirrer Dinesh D’Souza is #3 most viewed, but only #13 in tweets (and #9 in emails.)  Today’s Gaza story has lots of tweets and views but not so many emails, like the Honduras piece, so maybe this is a pattern for international news?  Presumably people inside newspapers actually study stuff like this; is any of that research public?  Now I’m curious.



Chad Orzel: Ten Inessential Papers in Quantum Physics

I should really know better than to click any tweeted link with a shortened URL, but for some reason, I actually followed one to an article with the limited-reach clickbait title Curious About Quantum Physics? Read These 10 Articles!. Which is only part one, because Huffington Post, so it’s actually five articles.

Three of the five articles are Einstein papers from 1905, which is sort of the equivalent of making a Ten Essential Rock Albums list that includes Revolver, Abbey Road, and the White Album. One of the goals of a well-done list of “essential” whatever is to give a sense of the breadth of a subject, not just focus on a single example, so this is a big failure right off the bat.

But it’s even worse than that, because none of the three 1905 articles is the photoelectric effect paper, which is the only one of the lot that has any quantum physics in it. There’s a fourth Einstein paper on the list, as well, the theory of general relativity, which is famous for not being compatible with quantum mechanics. So this is really like a list of Ten Essential Rock Albums that includes three country songs and a Bach concerto.

I thought about using this as an opportunity to generate a better Ten Essential Quantum Papers list, including stuff like Bell’s Theorem (the physics equivalent of the first Velvet Underground record) and the No-Cloning Theorem (the physics equivalent of punk rock) (brief pause to let those who know Bill Wootters try to reconcile that mental image). And if you would like to make suggestions of things that ought to be on such a list in the comments, feel free.

(I’m also open to suggestions of better musical analogies– maybe the EPR paper is the real Velvet Underground record? With Bell’s paper being punk rock, making Wootters and Zurek… Nirvana, maybe? Or maybe Shor’s algorithm is the “Smells Like Teen Spirit” of quantum physics (Again, a brief pause while those who know Peter Shor try to picture him as Kurt Cobain)…)

But, really, on reflection, the whole exercise is kind of silly even by the standards of clickbait blog topics, because that’s not how science works. In science, and particularly a highly mathematical science like physics, there’s not that much real benefit to reading the original source material. The best explanation of a central concept is rarely if ever found in the first paper to present it. This goes right back to the start of the discipline, with Newton’s Principia Mathematica, which nobody reads because it’s written in really opaque Latin, a move he claimed in a letter was deliberate so as to avoid “being baited by little smatterers in mathematics” (Newton was kind of a dick). Newton’s mathematical notation is also pretty awful, and I’ve heard it claimed that the reason physics advanced faster in mainland Europe than in England during the 1700s was that on the continent, they adopted Leibniz’s system, which was way more user-friendly and is the basis for modern calculus notation. Similarly, Maxwell’s original presentation of his eponymous equations is really difficult to follow, and it’s only after the work of folks like Heaviside that they become the clear, elegant, and bumper-sticker-friendly version we know today.

That’s not to say that there’s no value in reading old papers– I’ve had a lot of fun writing up old MS theses from our department, and older work can be fascinating to read. But unlike primary works of (pop) culture, they’re much better if you come to them already knowing what they’re about. The fascination comes from seeing how people fumbled their way toward ideas that we now know to be correct. It’s rare for a “classic” paper to get all the way to the modern understanding of things, or even most of the way there– most of the great original works contain what we now know to be errors of interpretation. Others are revered today for discoveries that were somewhat tangential to what the original author thought was the main point– the Cavendish experiment is thought of today as a measurement of “big G,” but he presents it as a determination of the density of the Earth, because that was of more pressing practical interest at the time.

If you want to learn science, you’re much better off looking up the best modern treatment than going back to the original papers. A good recent textbook will have the bugs worked out, and present it in something close to the language used by working scientists today. A good popular-audience treatment (ahem) will cover the basic concepts starting from a more complete understanding of the field as it has developed, and with an eye toward making those concepts accessible to a modern reader. It’s not foolproof, of course– the steady progress of science over a stretch of decades often means that newer books need to cover a huge amount of material to get to the sexy cutting-edge stuff, and sometimes scant the basics a bit. But by and large, if you’re curious about quantum physics, you’d be much better off hitting the physics section of your local bookstore or library than digging through archived journals for the original papers.

So, a list of “Ten Essential Papers on Quantum Physics” is a deeply flawed concept right from the start, at least if the goal is to learn something about quantum physics that you didn’t already know. The same is true of almost every science, with a few exceptions– Darwin’s On the Origin of Species is still a really good read, but it’s the exception, not the rule. Such a list can be useful as a sort of historical map, or for providing some insight into the thought processes of the great scientists of yesteryear, and those can be very rewarding. But if you’re curious and want to learn, I don’t think any original papers can really be considered “essential.”

n-Category Café: The Ten-Fold Way (Part 2)

How can we discuss all the kinds of matter described by the ten-fold way in a single setup?

It’s a bit tough, because 8 of them are fundamentally ‘real’ while the other 2 are fundamentally ‘complex’. Yet they should fit into a single framework, because there are 10 super division algebras over the real numbers, and each kind of matter is described using a super vector space — or really a super Hilbert space — with one of these super division algebras as its ‘ground field’.

Combining physical systems is done by tensoring their Hilbert spaces… and there does seem to be a way to do this even with super Hilbert spaces over different super division algebras. But what sort of mathematical structure can formalize this?

Here’s my current attempt to solve this problem. I’ll start with a warmup case, the threefold way. In fact I’ll spend most of my time on that! Then I’ll sketch how the ideas should extend to the tenfold way.

Fans of lax monoidal functors, Deligne’s tensor product of abelian categories, and the collage of a profunctor will be rewarded for their patience if they read the whole article. But the basic idea is supposed to be simple: it’s about a multiplication table.

The \mathbb{3}-fold way

First of all, notice that the set

\mathbb{3} = \{1,0,-1\}

is a commutative monoid under ordinary multiplication:

\begin{array}{rrrr} \mathbf{\times} & \mathbf{1} & \mathbf{0} & \mathbf{-1} \\ \mathbf{1} & 1 & 0 & -1 \\ \mathbf{0} & 0 & 0 & 0 \\ \mathbf{-1} & -1 & 0 & 1 \end{array}

Next, note that there are three (associative) division algebras over the reals: \mathbb{R}, \mathbb{C} or \mathbb{H}. We can equip a real vector space with the structure of a module over any of these algebras. We’ll then call it a real, complex or quaternionic vector space.

For the real case, this is entirely dull. For the complex case, this amounts to giving our real vector space V a complex structure: a linear operator i: V \to V with i^2 = -1. For the quaternionic case, it amounts to giving V a quaternionic structure: a pair of linear operators i, j: V \to V with

i^2 = j^2 = -1, \qquad i j = -j i

We can then define k = i j.

The terminology ‘quaternionic vector space’ is a bit quirky, since the quaternions aren’t a field, but indulge me. \mathbb{H}^n is a quaternionic vector space in an obvious way. n \times n quaternionic matrices act by multiplication on the right as ‘quaternionic linear transformations’ — that is, left module homomorphisms — of \mathbb{H}^n. Moreover, every finite-dimensional quaternionic vector space is isomorphic to \mathbb{H}^n. So it’s really not so bad! You just need to pay some attention to left versus right.

Now: I claim that given two vector spaces of any of these kinds, we can tensor them over the real numbers and get a vector space of another kind. It goes like this:

\begin{array}{cccc} \mathbf{\otimes} & \mathbf{real} & \mathbf{complex} & \mathbf{quaternionic} \\ \mathbf{real} & real & complex & quaternionic \\ \mathbf{complex} & complex & complex & complex \\ \mathbf{quaternionic} & quaternionic & complex & real \end{array}

You’ll notice this has the same pattern as the multiplication table we saw before:

\begin{array}{rrrr} \mathbf{\times} & \mathbf{1} & \mathbf{0} & \mathbf{-1} \\ \mathbf{1} & 1 & 0 & -1 \\ \mathbf{0} & 0 & 0 & 0 \\ \mathbf{-1} & -1 & 0 & 1 \end{array}


  • \mathbb{R} acts like 1.
  • \mathbb{C} acts like 0.
  • \mathbb{H} acts like -1.
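To make the correspondence concrete, here is a tiny Python sketch (mine, not from the post) that checks the tensor-product table against ordinary multiplication in \{1, 0, -1\} under the dictionary \mathbb{R} \mapsto 1, \mathbb{C} \mapsto 0, \mathbb{H} \mapsto -1, and verifies that \{1, 0, -1\} really is a commutative monoid:

```python
# Check (not from the post): the tensor table for real/complex/quaternionic
# vector spaces matches ordinary multiplication in {1, 0, -1} under the
# dictionary R -> 1, C -> 0, H -> -1.

label = {'R': 1, 'C': 0, 'H': -1}
kind = {v: k for k, v in label.items()}

tensor = {  # the table from the post
    ('R', 'R'): 'R', ('R', 'C'): 'C', ('R', 'H'): 'H',
    ('C', 'R'): 'C', ('C', 'C'): 'C', ('C', 'H'): 'C',
    ('H', 'R'): 'H', ('H', 'C'): 'C', ('H', 'H'): 'R',
}

for (a, b), c in tensor.items():
    assert kind[label[a] * label[b]] == c

# And {1, 0, -1} really is a commutative monoid under multiplication:
M = [1, 0, -1]
assert all(x * y in M for x in M for y in M)                     # closure
assert all((x*y)*z == x*(y*z) for x in M for y in M for z in M)  # associative
assert all(x * y == y * x for x in M for y in M)                 # commutative
assert all(1 * x == x for x in M)                                # identity
print("tensor table = multiplication in {1, 0, -1}")
```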

There are different ways to understand this, but a nice one is to notice that if we have algebras A and B over some field, and we tensor an A-module and a B-module (over that field), we get an A \otimes B-module. So, we should look at this ‘multiplication table’ of real division algebras:

\begin{array}{lrrr} \mathbf{\otimes} & \mathbf{\mathbb{R}} & \mathbf{\mathbb{C}} & \mathbf{\mathbb{H}} \\ \mathbf{\mathbb{R}} & \mathbb{R} & \mathbb{C} & \mathbb{H} \\ \mathbf{\mathbb{C}} & \mathbb{C} & \mathbb{C} \oplus \mathbb{C} & \mathbb{C}[2] \\ \mathbf{\mathbb{H}} & \mathbb{H} & \mathbb{C}[2] & \mathbb{R}[4] \end{array}

Here \mathbb{C}[2] means the 2 × 2 complex matrices viewed as an algebra over \mathbb{R}, and \mathbb{R}[4] means the 4 × 4 real matrices.
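One quick consistency check on this table (my own sketch): tensoring over \mathbb{R} multiplies real dimensions, so every entry must have dimension equal to the product of the dimensions of its row and column labels:

```python
# Dimension check (my own): tensoring over R multiplies real dimensions,
# so each table entry must have dim(A) * dim(B) real dimensions.
dim = {
    'R': 1, 'C': 2, 'H': 4,
    'C+C': 2 + 2,    # C (+) C: two copies of C
    'C[2]': 4 * 2,   # 2x2 complex matrices: 4 entries, 2 real dims each
    'R[4]': 4 * 4,   # 4x4 real matrices
}

table = {('R', 'R'): 'R', ('R', 'C'): 'C', ('R', 'H'): 'H',
         ('C', 'C'): 'C+C', ('C', 'H'): 'C[2]', ('H', 'H'): 'R[4]'}

for (a, b), c in table.items():
    assert dim[a] * dim[b] == dim[c]
print("every entry has the expected dimension")
```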

What’s going on here? Naively you might have hoped for a simpler table, which would have instantly explained my earlier claim:

\begin{array}{lrrr} \mathbf{\otimes} & \mathbf{\mathbb{R}} & \mathbf{\mathbb{C}} & \mathbf{\mathbb{H}} \\ \mathbf{\mathbb{R}} & \mathbb{R} & \mathbb{C} &\mathbb{H} \\ \mathbf{\mathbb{C}} & \mathbb{C} & \mathbb{C} & \mathbb{C} \\ \mathbf{\mathbb{H}} & \mathbb{H} & \mathbb{C} & \mathbb{R} \end{array}

This isn’t true, but it’s ‘close enough to true’. Why? Because we always have a god-given algebra homomorphism from the naive answer to the real answer! The interesting cases are these:

\mathbb{C} \to \mathbb{C} \oplus \mathbb{C}, \qquad \mathbb{C} \to \mathbb{C}[2], \qquad \mathbb{R} \to \mathbb{R}[4]

where the first is the diagonal map a \mapsto (a,a), and the other two send numbers to the corresponding scalar multiples of the identity matrix.

So, for example, if V and W are \mathbb{C}-modules, then their tensor product (over the reals! — all tensor products here are over \mathbb{R}) is a module over \mathbb{C} \otimes \mathbb{C} \cong \mathbb{C} \oplus \mathbb{C}, and we can then pull that back along the diagonal map to get a right \mathbb{C}-module.

What’s really going on here?

There’s a monoidal category Alg_{\mathbb{R}} of algebras over the real numbers, where the tensor product is the usual tensor product of algebras. The monoid \mathbb{3} can be seen as a monoidal category with 3 objects and only identity morphisms. And I claim this:

Claim. There is an oplax monoidal functor F : \mathbb{3} \to Alg_{\mathbb{R}} with

\begin{array}{ccl} F(1) &=& \mathbb{R} \\ F(0) &=& \mathbb{C} \\ F(-1) &=& \mathbb{H} \end{array}

What does ‘oplax’ mean? Some readers of the n-Category Café eat oplax monoidal functors for breakfast and are chortling with joy at how I finally summarized everything I’d said so far in a single terse sentence! But others of you see ‘oplax’ and get a queasy feeling.

The key idea is that when we have two monoidal categories C and D, a functor F : C \to D is ‘oplax’ if it preserves the tensor product, not up to isomorphism, but up to a specified morphism. More precisely, given objects x, y \in C we have a natural transformation

F_{x,y} : F(x \otimes y) \to F(x) \otimes F(y)

If you had a ‘lax’ functor this would point the other way, and they’re a bit more popular… so when it points the opposite way it’s called ‘oplax’.

(In the lax case, F_{x,y} should probably be called the laxative, but we’re not doing that case, so I don’t get to make that joke.)

This morphism F_{x,y} needs to obey some rules, but the most important one is that using it twice gives two ways to get from F(x \otimes y \otimes z) to F(x) \otimes F(y) \otimes F(z), and these must agree.

Let’s see how this works in our example… at least in one case. I’ll take the trickiest case. Consider

F_{0,0} : F(0 \cdot 0) \to F(0) \otimes F(0),

that is:

F_{0,0} : \mathbb{C} \to \mathbb{C} \otimes \mathbb{C}

There are, in principle, two ways to use this to get a homomorphism

F(0 \cdot 0 \cdot 0) \to F(0) \otimes F(0) \otimes F(0)

or in other words, a homomorphism

\mathbb{C} \to \mathbb{C} \otimes \mathbb{C} \otimes \mathbb{C}

where remember, all tensor products are taken over the reals. One is

\mathbb{C} \stackrel{F_{0,0}}{\longrightarrow} \mathbb{C} \otimes \mathbb{C} \stackrel{1 \otimes F_{0,0}}{\longrightarrow} \mathbb{C} \otimes (\mathbb{C} \otimes \mathbb{C})

and the other is

\mathbb{C} \stackrel{F_{0,0}}{\longrightarrow} \mathbb{C} \otimes \mathbb{C} \stackrel{F_{0,0} \otimes 1}{\longrightarrow} (\mathbb{C} \otimes \mathbb{C})\otimes \mathbb{C}

I want to show they agree (after we rebracket the threefold tensor product using the associator).

Unfortunately, so far I have described F_{0,0} in terms of an isomorphism

\mathbb{C} \otimes \mathbb{C} \cong \mathbb{C} \oplus \mathbb{C}

Using this isomorphism, F_{0,0} becomes the diagonal map a \mapsto (a,a). But now we need to really understand F_{0,0} a bit better, so I’d better say what isomorphism I have in mind! I’ll use the one that goes like this:

\begin{array}{ccl} \mathbb{C} \otimes \mathbb{C} &\to& \mathbb{C} \oplus \mathbb{C} \\ 1 \otimes 1 &\mapsto& (1,1) \\ i \otimes 1 &\mapsto& (i,i) \\ 1 \otimes i &\mapsto& (i,-i) \\ i \otimes i &\mapsto& (-1,1) \end{array}

This may make you nervous, but it truly is an isomorphism of real algebras, and it sends a \otimes 1 to (a,a). So, unraveling the web of confusion, we have
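Here is a numerical spot-check (mine, not from the post) that the recipe a \otimes b \mapsto (a b, a \bar{b}) — which reproduces the values on 1 \otimes 1, i \otimes 1 and 1 \otimes i above — really is multiplicative on pure tensors and sends a \otimes 1 to the diagonal (a, a):

```python
import itertools

# The map C (x)_R C -> C (+) C determined on pure tensors by
#   a (x) b  |->  (a*b, a*conj(b)),
# which in particular sends a (x) 1 to the diagonal (a, a).
def phi(a, b):
    return (a * b, a * b.conjugate())

samples = [1, 1j, 1 + 2j, -3 + 0.5j]

# Multiplicativity on pure tensors: (a (x) b)(c (x) d) = ac (x) bd.
for a, b, c, d in itertools.product(samples, repeat=4):
    lhs = phi(a * c, b * d)
    rhs = tuple(u * v for u, v in zip(phi(a, b), phi(c, d)))
    assert all(abs(l - r) < 1e-9 for l, r in zip(lhs, rhs))

# Diagonal on a (x) 1:
for a in samples:
    assert phi(a, 1) == (a, a)

print("phi is multiplicative and sends a (x) 1 to (a, a)")
```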

\begin{array}{rccc} F_{0,0} : & \mathbb{C} &\to& \mathbb{C}\otimes \mathbb{C} \\ & a &\mapsto & a \otimes 1 \end{array}

Why didn’t I just say that in the first place? Well, I suffered over this a bit, so you should too! You see, there’s an unavoidable arbitrary choice here: I could just as well have used a \mapsto 1 \otimes a. F_{0,0} looked perfectly god-given when we thought of it as a homomorphism from \mathbb{C} to \mathbb{C} \oplus \mathbb{C}, but that was deceptive, because there’s a choice of isomorphism \mathbb{C} \otimes \mathbb{C} \to \mathbb{C} \oplus \mathbb{C} lurking in this description.

This makes me nervous, since category theory disdains arbitrary choices! But it seems to work. On the one hand we have

\begin{array}{ccccc} \mathbb{C} &\stackrel{F_{0,0}}{\longrightarrow} &\mathbb{C} \otimes \mathbb{C} &\stackrel{1 \otimes F_{0,0}}{\longrightarrow}& \mathbb{C} \otimes \mathbb{C} \otimes \mathbb{C} \\ a &\mapsto & a \otimes 1 & \mapsto & a \otimes (1 \otimes 1) \end{array}

On the other hand, we have

\begin{array}{ccccc} \mathbb{C} &\stackrel{F_{0,0}}{\longrightarrow} & \mathbb{C} \otimes \mathbb{C} &\stackrel{F_{0,0} \otimes 1}{\longrightarrow} & \mathbb{C} \otimes \mathbb{C} \otimes \mathbb{C} \\ a &\mapsto & a \otimes 1 & \mapsto & (a \otimes 1) \otimes 1 \end{array}

So they agree!

I need to carefully check all the other cases before I dare call my claim a theorem. Indeed, writing up this case has increased my nervousness… before, I’d thought it was obvious.

But let me march on, optimistically!


In quantum physics, what matters is not so much the algebras \mathbb{R}, \mathbb{C} and \mathbb{H} themselves as the categories of vector spaces — or indeed, Hilbert spaces — over these algebras. So, we should think about the map sending an algebra to its category of modules.

For any field k, there should be a contravariant pseudofunctor

Rep: Alg_k \to Rex_k

where Rex_k is the 2-category of

  • k-linear finitely cocomplete categories,

  • k-linear functors preserving finite colimits,

  • and natural transformations.

The idea is that Rep sends any algebra A over k to its category of modules, and any homomorphism f : A \to B to the pullback functor f^* : Rep(B) \to Rep(A).

(Functors preserving finite colimits are also called right exact; this is the reason for the funny notation Rex. It has nothing to do with the dinosaur of that name.)

Moreover, Rep gets along with tensor products. It’s definitely true that given real algebras A and B, we have

Rep(A \otimes B) \simeq Rep(A) \boxtimes Rep(B)

where \boxtimes is the tensor product of finitely cocomplete k-linear categories. But we should be able to go further and prove Rep is monoidal. I don’t know if anyone has bothered yet.

(In case you’re wondering, this \boxtimes thing reduces to Deligne’s tensor product of abelian categories given some ‘niceness assumptions’, but it’s a bit more general. Read the talk by Ignacio López Franco if you care… but I could have used Deligne’s setup if I restricted myself to finite-dimensional algebras, which is probably just fine for what I’m about to do.)

So, if my earlier claim is true, we can take the oplax monoidal functor

F : \mathbb{3} \to Alg_{\mathbb{R}}

and compose it with the contravariant monoidal pseudofunctor

Rep : Alg_{\mathbb{R}} \to Rex_{\mathbb{R}}

giving a guy which I’ll call

Vect : \mathbb{3} \to Rex_{\mathbb{R}}

I guess this guy is a contravariant oplax monoidal pseudofunctor! That doesn’t make it sound very lovable… but I love it. The idea is that:

  • Vect(1) is the category of real vector spaces

  • Vect(0) is the category of complex vector spaces

  • Vect(-1) is the category of quaternionic vector spaces

and the operation of multiplication in \mathbb{3} = \{1,0,-1\} gets sent to the operation of tensoring any one of these three kinds of vector space with any other kind and getting another kind!

So, if this works, we’ll have combined linear algebra over the real numbers, complex numbers and quaternions into a unified thing, Vect. This thing deserves to be called a \mathbb{3}-graded category. This would be a nice way to understand Dyson’s threefold way.

What’s really going on?

What’s really going on with this monoid \mathbb{3}? It’s a kind of combination or ‘collage’ of two groups:

  • The Brauer group of \mathbb{R}, namely \mathbb{Z}_2 \cong \{-1,1\}. This consists of Morita equivalence classes of central simple algebras over \mathbb{R}. One class contains \mathbb{R} and the other contains \mathbb{H}. The tensor product of algebras corresponds to multiplication in \{-1,1\}.

  • The Brauer group of \mathbb{C}, namely the trivial group \{0\}. This consists of Morita equivalence classes of central simple algebras over \mathbb{C}. But \mathbb{C} is algebraically closed, so there’s just one class, containing \mathbb{C} itself!

See, the problem is that while \mathbb{C} is a division algebra over \mathbb{R}, it’s not ‘central simple’ over \mathbb{R}: its center is not just \mathbb{R}, it’s bigger. This turns out to be why \mathbb{C} \otimes \mathbb{C} is so funny compared to the rest of the entries in our division algebra multiplication table.

So, we’ve really got two Brauer groups in play. But we also have a homomorphism from the first to the second, given by ‘tensoring with \mathbb{C}’: complexifying any real central simple algebra, we get a complex one.

And whenever we have a group homomorphism \alpha: G \to H, we can make their disjoint union G \sqcup H into a monoid, which I’ll call G \sqcup_\alpha H.

It works like this. Given g, g' \in G, we multiply them the usual way. Given h, h' \in H, we multiply them the usual way. But given g \in G and h \in H, we define

g h := \alpha(g) h

and

h g := h \alpha(g)

The multiplication on G \sqcup_\alpha H is associative! For example:

(g g')h = \alpha(g g') h = \alpha(g) \alpha(g') h = \alpha(g) (g'h) = g(g'h)

Moreover, the element 1_G \in G acts as the identity of G \sqcup_\alpha H. For example:

1_G h = \alpha(1_G) h = 1_H h = h

But of course G \sqcup_\alpha H isn’t a group, since “once you get inside H you never get out”.

This construction could be called the collage of G and H via \alpha, since it’s reminiscent of a similar construction of that name in category theory.
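The collage construction is easy to implement. Here is a short Python sketch (my own) that builds G \sqcup_\alpha H from a homomorphism \alpha and checks the monoid axioms on the example relevant here: G = \{1,-1\}, the Brauer group of \mathbb{R}, and H = \{0\}, the trivial Brauer group of \mathbb{C}:

```python
def collage(G, g_op, g_id, H, h_op, alpha):
    """Monoid on the disjoint union of groups G and H along alpha: G -> H."""
    def op(x, y):
        (tx, vx), (ty, vy) = x, y
        if tx == 'G' and ty == 'G':
            return ('G', g_op(vx, vy))
        if tx == 'H' and ty == 'H':
            return ('H', h_op(vx, vy))
        if tx == 'G':                      # g . h := alpha(g) h
            return ('H', h_op(alpha(vx), vy))
        return ('H', h_op(vx, alpha(vy)))  # h . g := h alpha(g)
    elements = [('G', g) for g in G] + [('H', h) for h in H]
    return elements, op, ('G', g_id)

# Example: Brauer group of R is {1, -1} (classes of R and H); Brauer group
# of C is trivial; alpha (complexification) sends everything to 0.
E, op, e = collage([1, -1], lambda a, b: a * b, 1,
                   [0], lambda a, b: 0, lambda g: 0)

# Associativity and two-sided identity, checked by brute force:
assert all(op(op(x, y), z) == op(x, op(y, z)) for x in E for y in E for z in E)
assert all(op(e, x) == x == op(x, e) for x in E)
assert len(E) == 3  # this is the three-element monoid {1, 0, -1} in disguise
print("collage monoid axioms hold")
```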

Question. What do monoid theorists call this construction?

Question. Can we do a similar trick for any field? Can we always take the Brauer groups of all its finite-dimensional extensions and fit them together into a monoid by taking some sort of collage? If so, I’d call this the Brauer monoid of that field.

The \mathbb{10}-fold way

If you carefully read Part 1, maybe you can guess how I want to proceed. I want to make everything ‘super’.

I’ll replace division algebras over \mathbb{R} by super division algebras over \mathbb{R}. Now instead of 3 = 2 + 1 there are 10 = 8 + 2:

  • 8 of them are central simple over \mathbb{R}, so they give elements of the super Brauer group of \mathbb{R}, which is \mathbb{Z}_8.

  • 2 of them are central simple over \mathbb{C}, so they give elements of the super Brauer group of \mathbb{C}, which is \mathbb{Z}_2.

Complexification gives a homomorphism

\alpha: \mathbb{Z}_8 \to \mathbb{Z}_2

namely the obvious nontrivial one. So, we can form the collage

\mathbb{10} = \mathbb{Z}_8 \sqcup_\alpha \mathbb{Z}_2

It’s a commutative monoid with 10 elements! Each of these is the equivalence class of one of the 10 real super division algebras.
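Following the same recipe as before, here is a sketch (mine) of the 10-element collage \mathbb{Z}_8 \sqcup_\alpha \mathbb{Z}_2, writing both groups additively and taking \alpha to be reduction mod 2, with the monoid axioms checked by brute force:

```python
# The 10-element collage Z_8 u_alpha Z_2, both groups written additively,
# with alpha : Z_8 -> Z_2 given by reduction mod 2.

Z8 = [('G', n) for n in range(8)]
Z2 = [('H', n) for n in range(2)]
TEN = Z8 + Z2

def op(x, y):
    (tx, vx), (ty, vy) = x, y
    if tx == 'G' and ty == 'G':
        return ('G', (vx + vy) % 8)
    a = vx % 2   # alpha on a Z_8 element is reduction mod 2,
    b = vy % 2   # and it leaves a Z_2 element (0 or 1) unchanged
    return ('H', (a + b) % 2)

e = ('G', 0)
assert len(TEN) == 10
assert all(op(e, x) == x == op(x, e) for x in TEN)
assert all(op(x, y) == op(y, x) for x in TEN for y in TEN)
assert all(op(op(x, y), z) == op(x, op(y, z))
           for x in TEN for y in TEN for z in TEN)
print("a commutative monoid with 10 elements")
```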

I’ll then need to check that there’s an oplax monoidal functor

G : \mathbb{10} \to SuperAlg_{\mathbb{R}}

sending each element of \mathbb{10} to the corresponding super division algebra.

If G really exists, I can compose it with a thing

SuperRep : SuperAlg_{\mathbb{R}} \to Rex_{\mathbb{R}}

sending each super algebra to its category of ‘super representations’ on super vector spaces. This should again be a contravariant monoidal pseudofunctor.

We can call the composite of G with SuperRep

SuperVect : \mathbb{10} \to Rex_{\mathbb{R}}

If it all works, this thing SuperVect will deserve to be called a \mathbb{10}-graded category. It contains super vector spaces over the 10 kinds of super division algebras in a single framework, and says how to tensor them. And when we look at super Hilbert spaces, this setup will be able to talk about all ten kinds of matter I mentioned last time… and how to combine them.

So that’s the plan. If you see problems, or ways to simplify things, please let me know!

n-Category Café: The Ten-Fold Way (Part 1)

There are 10 of each of these things:

  • Associative real super-division algebras.

  • Classical families of compact symmetric spaces.

  • Ways that Hamiltonians can get along with time reversal (T) and charge conjugation (C) symmetry.

  • Dimensions of spacetime in string theory.

It’s too bad nobody took up writing This Week’s Finds in Mathematical Physics when I quit. Someone should have explained this stuff in a nice simple way, so I could read their summary instead of fighting my way through the original papers. I don’t have much time for this sort of stuff anymore!

Luckily there are some good places to read about this stuff:

Let me start by explaining the basic idea, and then move on to more fancy aspects.

Ten kinds of matter

The idea of the ten-fold way goes back at least to 1996, when Altland and Zirnbauer discovered that substances can be divided into 10 kinds.

The basic idea is pretty simple. Some substances have time-reversal symmetry: they would look the same, even on the atomic level, if you made a movie of them and ran it backwards. Some don’t — these are more rare, like certain superconductors made of yttrium barium copper oxide! Time reversal symmetry is described by an antiunitary operator T that squares to 1 or to -1: please take my word for this, it’s a quantum thing. So, we get 3 choices, which are listed in the chart under T as 1, -1, or 0 (no time reversal symmetry).

Similarly, some substances have charge conjugation symmetry, meaning a symmetry where we switch particles and holes: places where a particle is missing. The ‘particles’ here can be rather abstract things, like phonons - little vibrations of sound in a substance, which act like particles — or spinons — little vibrations in the lined-up spins of electrons. Basically any way that something can wave can, thanks to quantum mechanics, act like a particle. And sometimes we can switch particles and holes, and a substance will act the same way!

Like time reversal symmetry, charge conjugation symmetry is described by an antiunitary operator C that can square to 1 or to -1. So again we get 3 choices, listed in the chart under C as 1, -1, or 0 (no charge conjugation symmetry).

So far we have 3 × 3 = 9 kinds of matter. What is the tenth kind?

Some kinds of matter don’t have time reversal or charge conjugation symmetry, but they’re symmetrical under the combination of time reversal and charge conjugation! You switch particles and holes and run the movie backwards, and things look the same!

In the chart they write 1 under the S when your matter has this combined symmetry, and 0 when it doesn’t. So, “0 0 1” is the tenth kind of matter (the second row in the chart).
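The counting can be spelled out in a few lines of Python (my own sketch of the reasoning above): T and C each square to 1 or -1 or are absent; when at least one of them is present, the combined symmetry S is determined, and only in the fully absent case does S = 0 versus S = 1 give an extra choice:

```python
# Enumerate the ten symmetry classes: T, C in {1, -1, 0} (the square of the
# antiunitary operator, or 0 if absent); S = TC is forced unless both T and
# C are absent, in which case S = 0 or S = 1 is a further choice.

classes = []
for T in (1, -1, 0):
    for C in (1, -1, 0):
        if T == 0 and C == 0:
            classes.append((0, 0, 0))  # no symmetry at all
            classes.append((0, 0, 1))  # only the combined symmetry S
        else:
            # S present exactly when both T and C are present
            classes.append((T, C, 1 if (T != 0 and C != 0) else 0))

assert len(classes) == 10
assert (0, 0, 1) in classes  # the tenth kind of matter
print(len(classes), "kinds of matter")
```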

This is just the beginning of an amazing story. Since then people have found substances called topological insulators that act like insulators in their interior but conduct electricity on their surface. We can make 3-dimensional topological insulators, but also 2-dimensional ones (that is, thin films) and even 1-dimensional ones (wires). And we can theorize about higher-dimensional ones, though this is mainly a mathematical game.

So we can ask which of the 10 kinds of substance can arise as topological insulators in various dimensions. And the answer is: in any particular dimension, only 5 kinds can show up. But it’s a different 5 in different dimensions! This chart shows how it works for dimensions 1 through 8. The kinds that can’t show up are labelled 0.

If you look at the chart, you’ll see it has some nice patterns. And it repeats after dimension 8. In other words, dimension 9 works just like dimension 1, and so on.

If you read some of the papers I listed, you’ll see that the \mathbb{Z}’s and \mathbb{Z}_2’s in the chart are the homotopy groups of the ten classical series of compact symmetric spaces. The fact that dimension n+8 works like dimension n is called Bott periodicity.

Furthermore, the stuff about operators T, C and S that square to 1, -1 or don’t exist at all is closely connected to the classification of associative real super division algebras. It all fits together.

Super division algebras

In 2005, Todd Trimble wrote a short paper called The super Brauer group and super division algebras.

In it, he gave a quick way to classify the associative real super division algebras: that is, finite-dimensional associative real 2\mathbb{Z}_2-graded algebras having the property that every nonzero homogeneous element is invertible. The result was known, but I really enjoyed Todd’s effortless proof.

However, I didn’t notice that there are exactly 10 of these guys. Now this turns out to be a big deal. For each of these 10 algebras, the representations of that algebra describe ‘types of matter’ of a particular kind — where the 10 kinds are the ones I explained above!

So what are these 10 associative super division algebras?

3 of them are purely even, with no odd part: the usual associative division algebras ,\mathbb{R}, \mathbb{C} and \mathbb{H}.

7 of them are not purely even. Of these, 6 are Morita equivalent to the real Clifford algebras Cl 1,Cl 2,Cl 3,Cl 5,Cl 6Cl_1, Cl_2, Cl_3, Cl_5, Cl_6 and Cl 7Cl_7. These are the superalgebras generated by 1, 2, 3, 5, 6, or 7 odd square roots of -1.

Now you should have at least two questions:

  • What’s ‘Morita equivalence’? — and even if you know, why should it matter here? Two algebras are Morita equivalent if they have equivalent categories of representations. The same definition works for superalgebras, though now we look at their representations on super vector spaces ( 2\mathbb{Z}_2-graded vector spaces). For physics what we really care about is the representations of an algebra or superalgebra: as I mentioned, those are ‘types of matter’. So, it makes sense to count two superalgebras as ‘the same’ if they’re Morita equivalent.

  • 1, 2, 3, 5, 6, and 7? That’s weird — why not 4? Well, Todd showed that Cl 4Cl_4 is Morita equivalent to the purely even super division algebra \mathbb{H}. So we already had that one on our list. Similarly, why not 0? Cl 0Cl_0 is just \mathbb{R}. So we had that one too.

Representations of Clifford algebras are used to describe spin-1/2 particles, so it’s exciting that 8 of the 10 associative real super division algebras are Morita equivalent to real Clifford algebras.

But I’ve already mentioned one that’s not: the complex numbers, \mathbb{C}, regarded as a purely even algebra. And there’s one more! It’s the complex Clifford algebra l 1\mathbb{C}\mathrm{l}_1. This is the superalgebra you get by taking the purely even algebra \mathbb{C} and throwing in one odd square root of -1.

As soon as you hear that, you notice that the purely even algebra \mathbb{C} is the complex Clifford algebra l 0\mathbb{C}\mathrm{l}_0. In other words, it’s the superalgebra you get by taking the purely even algebra \mathbb{C} and throwing in no odd square roots of -1.
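These presentations are concrete enough to check by machine. Here is a minimal sketch, entirely my own illustration, that multiplies basis monomials of the Clifford algebra whose generators square to -1 and verifies a few of the relations mentioned above:

```python
# Toy multiplication of basis monomials in the Clifford algebra with
# generators e_1, ..., e_n satisfying e_i^2 = -1 and e_i e_j = -e_j e_i.
# A monomial is a sorted tuple of generator indices; () is the identity.
def clifford_mul(a, b):
    sign, result = 1, list(a)
    for g in b:
        # moving e_g left past each larger-index generator flips the sign
        sign *= (-1) ** sum(1 for h in result if h > g)
        if g in result:
            sign *= -1          # e_g * e_g = -1
            result.remove(g)
        else:
            result.append(g)
            result.sort()
    return sign, tuple(result)

# e_1^2 = -1, and e_1, e_2 anticommute:
assert clifford_mul((1,), (1,)) == (-1, ())
assert clifford_mul((1,), (2,)) == (1, (1, 2))
assert clifford_mul((2,), (1,)) == (-1, (1, 2))
# In Cl_3 the odd element e = e_1 e_2 e_3 squares to +1, matching the
# description of Cl_3 as the super division algebra with e^2 = 1:
assert clifford_mul((1, 2, 3), (1, 2, 3)) == (1, ())
```

The ℤ₂-grading is by parity of a monomial's length: odd-length monomials form the odd part. Flipping the rule e_g·e_g = -1 to +1 gives the other sign convention.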

More connections

At this point things start fitting together:

  • You can multiply Morita equivalence classes of algebras using the tensor product of algebras: [A][B]=[AB][A] \otimes [B] = [A \otimes B]. Some equivalence classes have multiplicative inverses, and these form the Brauer group. We can do the same thing for superalgebras, and get the super Brauer group. The super division algebras Morita equivalent to Cl 0,,Cl 7Cl_0, \dots , Cl_7 serve as representatives of the super Brauer group of the real numbers, which is 8\mathbb{Z}_8. I explained this in week211 and further in week212. It’s a nice purely algebraic way to think about real Bott periodicity!

  • As we’ve seen, the super division algebras Morita equivalent to Cl 0Cl_0 and Cl 4Cl_4 are a bit funny. They’re purely even. So they serve as representatives of the plain old Brauer group of the real numbers, which is 2\mathbb{Z}_2.

  • On the other hand, the complex Clifford algebras l 0=\mathbb{C}\mathrm{l}_0 = \mathbb{C} and l 1\mathbb{C}\mathrm{l}_1 serve as representatives of the super Brauer group of the complex numbers, which is also 2\mathbb{Z}_2. This is a purely algebraic way to think about complex Bott periodicity, which has period 2 instead of period 8.

Meanwhile, the purely even ,\mathbb{R}, \mathbb{C} and \mathbb{H} underlie Dyson’s ‘three-fold way’, which I explained in detail here:

Briefly, if you have an irreducible unitary representation of a group on a complex Hilbert space HH, there are three possibilities:

  • The representation is isomorphic to its dual via an invariant symmetric bilinear pairing g:H×Hg : H \times H \to \mathbb{C}. In this case it has an invariant antiunitary operator J:HHJ : H \to H with J 2=1J^2 = 1. This lets us write our representation as the complexification of a real one.

  • The representation is isomorphic to its dual via an invariant antisymmetric bilinear pairing ω:H×H\omega : H \times H \to \mathbb{C}. In this case it has an invariant antiunitary operator J:HHJ : H \to H with J 2=1J^2 = -1. This lets us promote our representation to a quaternionic one.

  • The representation is not isomorphic to its dual. In this case we say it’s truly complex.
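For a concrete check (my own sketch, not from the text): in a basis, any antiunitary operator acts as J v = U v̄ for some unitary matrix U, so J² is multiplication by U Ū, and the two cases J² = ±1 can be seen numerically:

```python
import numpy as np

# An antiunitary operator acts as J v = U conj(v) for some unitary U,
# so J^2 v = U conj(U) v: the sign of J^2 is read off from U @ conj(U).
def J_squared(U):
    return U @ np.conj(U)

# Real case: J is plain complex conjugation (U = identity), J^2 = +1.
assert np.allclose(J_squared(np.eye(2)), np.eye(2))

# Quaternionic case: U is the symplectic form, and J^2 = -1.
Omega = np.array([[0.0, -1.0], [1.0, 0.0]])
assert np.allclose(J_squared(Omega), -np.eye(2))
```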

In physics applications, we can take JJ to be either time reversal symmetry, TT, or charge conjugation symmetry, CC. Studying either symmetry separately leads us to Dyson’s three-fold way. Studying them both together leads to the ten-fold way!

So the ten-fold way seems to combine in one nice package:

  • real Bott periodicity,
  • complex Bott periodicity,
  • the real Brauer group,
  • the real super Brauer group,
  • the complex super Brauer group, and
  • the three-fold way.

I could throw ‘the complex Brauer group’ into this list, because that’s lurking here too, but it’s the trivial group, with \mathbb{C} as its representative.

There really should be a better way to understand this. Here’s my best attempt right now.

The set of Morita equivalence classes of finite-dimensional real superalgebras gets a commutative monoid structure thanks to direct sum. This commutative monoid then gets a commutative rig structure thanks to tensor product. This commutative rig — let’s call it \mathfrak{R} — is apparently too complicated to understand in detail, though I’d love to be corrected about that. But we can peek at pieces:

  • We can look at the group of invertible elements in \mathfrak{R} — more precisely, elements with multiplicative inverses. This is the real super Brauer group 8\mathbb{Z}_8.

  • We can look at the sub-rig of \mathfrak{R} coming from semisimple purely even algebras. As a commutative monoid under addition, this is 3\mathbb{N}^3, since it’s generated by ,\mathbb{R}, \mathbb{C} and \mathbb{H}. This commutative monoid becomes a rig with a funny multiplication table, e.g. =\mathbb{C} \otimes \mathbb{C} = \mathbb{C} \oplus \mathbb{C}. This captures some aspects of the three-fold way.
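That funny multiplication table can be verified numerically. Here is a sketch (my own illustration) using the standard model of ℂ as real 2×2 matrices: the central idempotent p = (1⊗1 − i⊗i)/2 splits the real tensor product ℂ⊗ℂ into two complementary ideals, which is exactly the statement ℂ⊗ℂ = ℂ⊕ℂ:

```python
import numpy as np

# Represent C as real 2x2 matrices: a + bi  ->  a*I + b*J, with J^2 = -I.
I2 = np.eye(2)
J = np.array([[0.0, -1.0], [1.0, 0.0]])

# The real tensor product C (x) C acts on R^4 via Kronecker products.
II = np.kron(I2, I2)
JJ = np.kron(J, J)

# p = (1 (x) 1 - i (x) i)/2 and q = (1 (x) 1 + i (x) i)/2 are complementary
# central idempotents, splitting C (x) C into two ideals, each a copy of C.
p = (II - JJ) / 2
q = (II + JJ) / 2
assert np.allclose(p @ p, p)            # idempotent
assert np.allclose(q @ q, q)
assert np.allclose(p @ q, 0 * II)       # orthogonal
assert np.allclose(p + q, II)           # sum to the identity
assert np.allclose(p @ np.kron(J, I2), np.kron(J, I2) @ p)   # central
```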

We should really look at a larger chunk of the rig \mathfrak{R}, that includes both of these chunks. How about the sub-rig coming from all semisimple superalgebras? What’s that?

And here’s another question: what’s the relation to the 10 classical families of compact symmetric spaces? The short answer is that each family describes a family of possible Hamiltonians for one of our 10 kinds of matter. For a more detailed answer, I suggest reading Gregory Moore’s Quantum symmetries and compatible Hamiltonians. But if you look at this chart by Ryu et al, you’ll see these families involve a nice interplay between ,\mathbb{R}, \mathbb{C} and \mathbb{H}, which is what this story is all about:

The families of symmetric spaces are listed in the column “Hamiltonian”.

All this stuff is fitting together more and more nicely! And if you look at the paper by Freed and Moore, you’ll see there’s a lot more involved when you take the symmetries of crystals into account. People are beginning to understand the algebraic and topological aspects of condensed matter much more deeply these days.

The list

Just for the record, here are all 10 associative real super division algebras. 8 are Morita equivalent to real Clifford algebras:

  • Cl 0Cl_0 is the purely even division algebra \mathbb{R}.

  • Cl 1Cl_1 is the super division algebra e\mathbb{R} \oplus \mathbb{R}e, where ee is an odd element with e 2=1e^2 = -1.

  • Cl 2Cl_2 is the super division algebra e\mathbb{C} \oplus \mathbb{C}e, where ee is an odd element with e 2=1e^2 = -1 and ei=iee i = -i e.

  • Cl 3Cl_3 is the super division algebra e\mathbb{H} \oplus \mathbb{H}e, where ee is an odd element with e 2=1e^2 = 1 and ei=ie,ej=je,ek=kee i = i e, e j = j e, e k = k e.

  • Cl 4Cl_4 is [2]\mathbb{H}[2], the algebra of 2×22 \times 2 quaternionic matrices, given a certain 2\mathbb{Z}_2-grading. This is Morita equivalent to the purely even division algebra \mathbb{H}.

  • Cl 5Cl_5 is [2][2]\mathbb{H}[2] \oplus \mathbb{H}[2] given a certain 2\mathbb{Z}_2-grading. This is Morita equivalent to the super division algebra e\mathbb{H} \oplus \mathbb{H}e where ee is an odd element with e 2=1e^2 = -1 and ei=ie,ej=je,ek=kee i = i e, e j = j e, e k = k e.

  • Cl 6Cl_6 is [4][4]\mathbb{C}[4] \oplus \mathbb{C}[4] given a certain 2\mathbb{Z}_2-grading. This is Morita equivalent to the super division algebra e\mathbb{C} \oplus \mathbb{C}e where ee is an odd element with e 2=1e^2 = 1 and ei=iee i = -i e.

  • Cl 7Cl_7 is [8][8]\mathbb{R}[8] \oplus \mathbb{R}[8] given a certain 2\mathbb{Z}_2-grading. This is Morita equivalent to the super division algebra e\mathbb{R} \oplus \mathbb{R}e where ee is an odd element with e 2=1e^2 = 1.

Cl n+8Cl_{n+8} is Morita equivalent to Cl nCl_n, so we can stop here if we’re just looking for Morita equivalence classes; and there happen to be no more super division algebras down this road anyway. It is also worth comparing Cl nCl_n and Cl 8nCl_{8-n}: there’s a nice pattern here.

The remaining 2 real super division algebras are complex Clifford algebras:

  • l 0\mathbb{C}\mathrm{l}_0 is the purely even division algebra \mathbb{C}.

  • l 1\mathbb{C}\mathrm{l}_1 is the super division algebra e\mathbb{C} \oplus \mathbb{C} e, where ee is an odd element with e 2=1e^2 = -1 and ei=iee i = i e.

In the last one we could also say “with e 2=1e^2 = 1” — we’d get something isomorphic, not a new possibility.

Ten dimensions of string theory

Oh yeah — what about the 10 dimensions in string theory? Are they really related to the ten-fold way?

It seems weird, but I think the answer is “yes, at least slightly”.

Remember, 2 of the dimensions in 10d string theory are those of the string worldsheet, which is a complex manifold. The other 8 are connected to the octonions, which in turn are connected to the 8-fold periodicity of real Clifford algebras. So the 8+2 split in string theory is at least slightly connected to the 8+2 split in the list of associative real super division algebras.

This may be more of a joke than a deep observation. After all, the 8 dimensions of the octonions are not individual things with distinct identities, as the 8 super division algebras coming from real Clifford algebras are. So there’s no one-to-one correspondence going on here, just an equation between numbers.

Still, there are certain observations that would be silly to resist mentioning.

Geraint F. LewisA cosmic two-step: the universal dance of the dwarf galaxies

We had a paper in Nature this week, and I think it's exciting and important. I've written an article for The Conversation, which you can read here.


July 24, 2014

Sean CarrollWhy Probability in Quantum Mechanics is Given by the Wave Function Squared

One of the most profound and mysterious principles in all of physics is the Born Rule, named after Max Born. In quantum mechanics, particles don’t have classical properties like “position” or “momentum”; rather, there is a wave function that assigns a (complex) number, called the “amplitude,” to each possible measurement outcome. The Born Rule is then very simple: it says that the probability of obtaining any possible measurement outcome is equal to the square of the corresponding amplitude. (The wave function is just the set of all the amplitudes.)

Born Rule:     \mathrm{Probability}(x) = |\mathrm{amplitude}(x)|^2.

The Born Rule is certainly correct, as far as all of our experimental efforts have been able to discern. But why? Born himself kind of stumbled onto his Rule. Here is an excerpt from his 1926 paper:

Born Rule

That’s right. Born’s paper was rejected at first, and when it was later accepted by another journal, he didn’t even get the Born Rule right. At first he said the probability was equal to the amplitude, and only in an added footnote did he correct it to being the amplitude squared. And a good thing, too, since amplitudes can be negative or even imaginary!

The status of the Born Rule depends greatly on one’s preferred formulation of quantum mechanics. When we teach quantum mechanics to undergraduate physics majors, we generally give them a list of postulates that goes something like this:

  1. Quantum states are represented by wave functions, which are vectors in a mathematical space called Hilbert space.
  2. Wave functions evolve in time according to the Schrödinger equation.
  3. The act of measuring a quantum system returns a number, known as the eigenvalue of the quantity being measured.
  4. The probability of getting any particular eigenvalue is equal to the square of the amplitude for that eigenvalue.
  5. After the measurement is performed, the wave function “collapses” to a new state in which the wave function is localized precisely on the observed eigenvalue (as opposed to being in a superposition of many different possibilities).

It’s an ungainly mess, we all agree. You see that the Born Rule is simply postulated right there, as #4. Perhaps we can do better.

Of course we can do better, since “textbook quantum mechanics” is an embarrassment. There are other formulations, and you know that my own favorite is Everettian (“Many-Worlds”) quantum mechanics. (I’m sorry I was too busy to contribute to the active comment thread on that post. On the other hand, a vanishingly small percentage of the 200+ comments actually addressed the point of the article, which was that the potential for many worlds is automatically there in the wave function no matter what formulation you favor. Everett simply takes them seriously, while alternatives need to go to extra efforts to erase them. As Ted Bunn argues, Everett is just “quantum mechanics,” while collapse formulations should be called “disappearing-worlds interpretations.”)

Like the textbook formulation, Everettian quantum mechanics also comes with a list of postulates. Here it is:

  1. Quantum states are represented by wave functions, which are vectors in a mathematical space called Hilbert space.
  2. Wave functions evolve in time according to the Schrödinger equation.

That’s it! Quite a bit simpler — and the two postulates are exactly the same as the first two of the textbook approach. Everett, in other words, is claiming that all the weird stuff about “measurement” and “wave function collapse” in the conventional way of thinking about quantum mechanics isn’t something we need to add on; it comes out automatically from the formalism.

The trickiest thing to extract from the formalism is the Born Rule. That’s what Charles (“Chip”) Sebens and I tackled in our recent paper:

Self-Locating Uncertainty and the Origin of Probability in Everettian Quantum Mechanics
Charles T. Sebens, Sean M. Carroll

A longstanding issue in attempts to understand the Everett (Many-Worlds) approach to quantum mechanics is the origin of the Born rule: why is the probability given by the square of the amplitude? Following Vaidman, we note that observers are in a position of self-locating uncertainty during the period between the branches of the wave function splitting via decoherence and the observer registering the outcome of the measurement. In this period it is tempting to regard each branch as equiprobable, but we give new reasons why that would be inadvisable. Applying lessons from this analysis, we demonstrate (using arguments similar to those in Zurek’s envariance-based derivation) that the Born rule is the uniquely rational way of apportioning credence in Everettian quantum mechanics. In particular, we rely on a single key principle: changes purely to the environment do not affect the probabilities one ought to assign to measurement outcomes in a local subsystem. We arrive at a method for assigning probabilities in cases that involve both classical and quantum self-locating uncertainty. This method provides unique answers to quantum Sleeping Beauty problems, as well as a well-defined procedure for calculating probabilities in quantum cosmological multiverses with multiple similar observers.

Chip is a graduate student in the philosophy department at Michigan, which is great because this work lies squarely at the boundary of physics and philosophy. (I guess it is possible.) The paper itself leans more toward the philosophical side of things; if you are a physicist who just wants the equations, we have a shorter conference proceeding.

Before explaining what we did, let me first say a bit about why there’s a puzzle at all. Let’s think about the wave function for a spin, a spin-measuring apparatus, and an environment (the rest of the world). It might initially take the form

(α[up] + β[down] ; apparatus says “ready” ; environment₀).             (1)

This might look a little cryptic if you’re not used to it, but it’s not too hard to grasp the gist. The first slot refers to the spin. It is in a superposition of “up” and “down.” The Greek letters α and β are the amplitudes that specify the wave function for those two possibilities. The second slot refers to the apparatus just sitting there in its ready state, and the third slot likewise refers to the environment. By the Born Rule, when we make a measurement the probability of seeing spin-up is |α|², while the probability for seeing spin-down is |β|².
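Concretely, with some hypothetical amplitudes (my numbers, not the post's), the Born Rule computation is one line:

```python
import numpy as np

# A toy spin state with hypothetical amplitudes alpha and beta.
alpha = 1 / np.sqrt(3)           # amplitude for "up"
beta = np.sqrt(2 / 3) * 1j       # amplitude for "down" (can be imaginary)

# Born Rule: probability = |amplitude|^2.
p_up, p_down = abs(alpha) ** 2, abs(beta) ** 2
assert np.isclose(p_up, 1 / 3)
assert np.isclose(p_down, 2 / 3)
assert np.isclose(p_up + p_down, 1.0)   # a normalized state
```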

In Everettian quantum mechanics (EQM), wave functions never collapse. The one we’ve written will smoothly evolve into something that looks like this:

α([up] ; apparatus says “up” ; environment₁)
     + β([down] ; apparatus says “down” ; environment₂).             (2)

This is an extremely simplified situation, of course, but it is meant to convey the basic appearance of two separate “worlds.” The wave function has split into branches that don’t ever talk to each other, because the two environment states are different and will stay that way. A state like this simply arises from normal Schrödinger evolution from the state we started with.
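The branching from (1) to (2) can be mimicked with two qubits, lumping the apparatus and environment into a single register; the sketch below (hypothetical amplitudes, a CNOT standing in for the measurement interaction) shows the branched state arising from purely unitary evolution:

```python
import numpy as np

# Toy model of (1) -> (2): a "spin" qubit and an "apparatus/environment"
# qubit.  A CNOT-style unitary correlates the second register with the
# spin, branching the wave function with no collapse anywhere.
alpha, beta = 0.6, 0.8                       # hypothetical real amplitudes
up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])

psi0 = np.kron(alpha * up + beta * down, up)   # spin superposed, apparatus "ready"

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)   # unitary "pre-measurement"

psi1 = CNOT @ psi0
# Result: alpha |up, up> + beta |down, down> -- the two branches of eq. (2).
expected = alpha * np.kron(up, up) + beta * np.kron(down, down)
assert np.allclose(psi1, expected)
assert np.isclose(np.linalg.norm(psi1), 1.0)   # the evolution is unitary
```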

So here is the problem. After the splitting from (1) to (2), the wave function coefficients α and β just kind of go along for the ride. If you find yourself in the branch where the spin is up, your coefficient is α, but so what? How do you know what kind of coefficient is sitting outside the branch you are living on? All you know is that there was one branch and now there are two. If anything, shouldn’t we declare them to be equally likely (so-called “branch-counting”)? For that matter, in what sense are there probabilities at all? There was nothing stochastic or random about any of this process, the entire evolution was perfectly deterministic. It’s not right to say “Before the measurement, I didn’t know which branch I was going to end up on.” You know precisely that one copy of your future self will appear on each branch. Why in the world should we be talking about probabilities?

Note that the pressing question is not so much “Why is the probability given by the wave function squared, rather than the absolute value of the wave function, or the wave function to the fourth, or whatever?” as it is “Why is there a particular probability rule at all, since the theory is deterministic?” Indeed, once you accept that there should be some specific probability rule, it’s practically guaranteed to be the Born Rule. There is a result called Gleason’s Theorem, which says roughly that the Born Rule is the only consistent probability rule you can conceivably have that depends on the wave function alone. So the real question is not “Why squared?”, it’s “Whence probability?”

Of course, there are promising answers. Perhaps the most well-known is the approach developed by Deutsch and Wallace based on decision theory. There, the approach to probability is essentially operational: given the setup of Everettian quantum mechanics, how should a rational person behave, in terms of making bets and predicting experimental outcomes, etc.? They show that there is one unique answer, which is given by the Born Rule. In other words, the question “Whence probability?” is sidestepped by arguing that reasonable people in an Everettian universe will act as if there are probabilities that obey the Born Rule. Which may be good enough.

But it might not convince everyone, so there are alternatives. One of my favorites is Wojciech Zurek’s approach based on “envariance.” Rather than using words like “decision theory” and “rationality” that make physicists nervous, Zurek claims that the underlying symmetries of quantum mechanics pick out the Born Rule uniquely. It’s very pretty, and I encourage anyone who knows a little QM to have a look at Zurek’s paper. But it is subject to the criticism that it doesn’t really teach us anything that we didn’t already know from Gleason’s theorem. That is, Zurek gives us more reason to think that the Born Rule is uniquely preferred by quantum mechanics, but it doesn’t really help with the deeper question of why we should think of EQM as a theory of probabilities at all.

Here is where Chip and I try to contribute something. We use the idea of “self-locating uncertainty,” which has been much discussed in the philosophical literature, and has been applied to quantum mechanics by Lev Vaidman. Self-locating uncertainty occurs when you know that there are multiple observers in the universe who find themselves in exactly the same conditions that you are in right now — but you don’t know which one of these observers you are. That can happen in “big universe” cosmology, where it leads to the measure problem. But it automatically happens in EQM, whether you like it or not.

Think of observing the spin of a particle, as in our example above. The steps are:

  1. Everything is in its starting state, before the measurement.
  2. The apparatus interacts with the system to be observed and becomes entangled. (“Pre-measurement.”)
  3. The apparatus becomes entangled with the environment, branching the wave function. (“Decoherence.”)
  4. The observer reads off the result of the measurement from the apparatus.

The point is that in between steps 3. and 4., the wave function of the universe has branched into two, but the observer doesn’t yet know which branch they are on. There are two copies of the observer that are in identical states, even though they’re part of different “worlds.” That’s the moment of self-locating uncertainty. Here it is in equations, although I don’t think it’s much help.


You might say “What if I am the apparatus myself?” That is, what if I observe the outcome directly, without any intermediating macroscopic equipment? Nice try, but no dice. That’s because decoherence happens incredibly quickly. Even if you take the extreme case where you look at the spin directly with your eyeball, the time it takes the state of your eye to decohere is about 10⁻²¹ seconds, whereas the timescales associated with the signal reaching your brain are measured in tens of milliseconds. Self-locating uncertainty is inevitable in Everettian quantum mechanics. In that sense, probability is inevitable, even though the theory is deterministic — in the phase of uncertainty, we need to assign probabilities to finding ourselves on different branches.

So what do we do about it? As I mentioned, there’s been a lot of work on how to deal with self-locating uncertainty, i.e. how to apportion credences (degrees of belief) to different possible locations for yourself in a big universe. One influential paper is by Adam Elga, and comes with the charming title of “Defeating Dr. Evil With Self-Locating Belief.” (Philosophers have more fun with their titles than physicists do.) Elga argues for a principle of Indifference: if there are truly multiple copies of you in the world, you should assume equal likelihood for being any one of them. Crucially, Elga doesn’t simply assert Indifference; he actually derives it, under a simple set of assumptions that would seem to be the kind of minimal principles of reasoning any rational person should be ready to use.

But there is a problem! Naïvely, applying Indifference to quantum mechanics just leads to branch-counting — if you assign equal probability to every possible appearance of equivalent observers, and there are two branches, each branch should get equal probability. But that’s a disaster; it says we should simply ignore the amplitudes entirely, rather than using the Born Rule. This bit of tension has led to some worry among philosophers who worry about such things.

Resolving this tension is perhaps the most useful thing Chip and I do in our paper. Rather than naïvely applying Indifference to quantum mechanics, we go back to the “simple assumptions” and try to derive it from scratch. We were able to pinpoint one hidden assumption that seems quite innocent, but actually does all the heavy lifting when it comes to quantum mechanics. We call it the “Epistemic Separability Principle,” or ESP for short. Here is the informal version (see paper for pedantic careful formulations):

ESP: The credence one should assign to being any one of several observers having identical experiences is independent of features of the environment that aren’t affecting the observers.

That is, the probabilities you assign to things happening in your lab, whatever they may be, should be exactly the same if we tweak the universe just a bit by moving around some rocks on a planet orbiting a star in the Andromeda galaxy. ESP simply asserts that our knowledge is separable: how we talk about what happens here is independent of what is happening far away. (Our system here can still be entangled with some system far away; under unitary evolution, changing that far-away system doesn’t change the entanglement.)

The ESP is quite a mild assumption, and to me it seems like a necessary part of being able to think of the universe as consisting of separate pieces. If you can’t assign credences locally without knowing about the state of the whole universe, there’s no real sense in which the rest of the world is really separate from you. It is certainly implicitly used by Elga (he assumes that credences are unchanged by some hidden person tossing a coin).

With this assumption in hand, we are able to demonstrate that Indifference does not apply to branching quantum worlds in a straightforward way. Indeed, we show that you should assign equal credences to two different branches if and only if the amplitudes for each branch are precisely equal! That’s because the proof of Indifference relies on shifting around different parts of the state of the universe and demanding that the answers to local questions not be altered; it turns out that this only works in quantum mechanics if the amplitudes are equal, which is certainly consistent with the Born Rule.

See the papers for the actual argument — it’s straightforward but a little tedious. The basic idea is that you set up a situation in which more than one quantum object is measured at the same time, and you ask what happens when you consider different objects to be “the system you will look at” versus “part of the environment.” If you want there to be a consistent way of assigning credences in all cases, you are led inevitably to equal probabilities when (and only when) the amplitudes are equal.

What if the amplitudes for the two branches are not equal? Here we can borrow some math from Zurek. (Indeed, our argument can be thought of as a love child of Vaidman and Zurek, with Elga as midwife.) In his envariance paper, Zurek shows how to start with a case of unequal amplitudes and reduce it to the case of many more branches with equal amplitudes. The number of these pseudo-branches you need is proportional to — wait for it — the square of the amplitude. Thus, you get out the full Born Rule, simply by demanding that we assign credences in situations of self-locating uncertainty in a way that is consistent with ESP.
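A toy version of that fine-graining step, under the simplifying assumption that the squared amplitudes are rational (my own sketch of the counting, not the paper's actual argument):

```python
from fractions import Fraction
from math import lcm

def credences_by_branch_counting(squared_amplitudes):
    """Zurek-style fine-graining, toy version: split each branch whose
    |amplitude|^2 is the rational p/q into equal-amplitude sub-branches,
    then apply Indifference (equal credence) over the sub-branches."""
    probs = [Fraction(p) for p in squared_amplitudes]
    assert sum(probs) == 1
    n = lcm(*(p.denominator for p in probs))   # common fine-graining scale
    counts = [int(p * n) for p in probs]       # sub-branches per branch
    total = sum(counts)                        # equals n
    return [Fraction(c, total) for c in counts]

# |alpha|^2 = 1/3, |beta|^2 = 2/3: three equal sub-branches, split 1 vs 2,
# so branch counting over the sub-branches reproduces the Born Rule.
assert credences_by_branch_counting(["1/3", "2/3"]) == [Fraction(1, 3), Fraction(2, 3)]
```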

We like this derivation in part because it treats probabilities as epistemic (statements about our knowledge of the world), not merely operational. Quantum probabilities are really credences — statements about the best degree of belief we can assign in conditions of uncertainty — rather than statements about truly stochastic dynamics or frequencies in the limit of an infinite number of outcomes. But these degrees of belief aren’t completely subjective in the conventional sense, either; there is a uniquely rational choice for how to assign them.

Working on this project has increased my own personal credence in the correctness of the Everett approach to quantum mechanics from “pretty high” to “extremely high indeed.” There are still puzzles to be worked out, no doubt, especially around the issues of exactly how and when branching happens, and how branching structures are best defined. (I’m off to a workshop next month to think about precisely these questions.) But these seem like relatively tractable technical challenges to me, rather than looming deal-breakers. EQM is an incredibly simple theory that (I can now argue in good faith) makes sense and fits the data. Now it’s just a matter of convincing the rest of the world!

ResonaancesHiggs Recap

On the occasion of summer conferences the LHC experiments dumped a large number of new Higgs results. Most of them have already been advertised on blogs, see e.g. here or here or here. In case you missed anything, here I summarize the most interesting updates of the last few weeks.

1. Mass measurements.
Both ATLAS and CMS recently presented improved measurements of the Higgs boson mass in the diphoton and 4-lepton final states. The errors shrink to 400 MeV in ATLAS and 300 MeV in CMS. The news is that Higgs has lost some weight (the boson, not Peter). A naive combination of the ATLAS and CMS results yields the central value 125.15 GeV. The profound consequence is that, for another year at least,  we will call it the 125 GeV particle, rather than the 125.5 GeV particle as before ;)

While the central values of the Higgs mass combinations quoted by ATLAS and CMS are very close, 125.36 vs 125.03 GeV, the individual inputs are still a bit apart from each other. Although the consistency of the ATLAS measurements in the diphoton and 4-lepton channels has improved, these two independent mass determinations differ by 1.5 GeV, which corresponds to a 2 sigma tension. Furthermore, the central values of the Higgs mass quoted by ATLAS and CMS differ by 1.3 GeV in the diphoton channel and by 1.1 GeV in the 4-lepton channel, which also amount to 2-sigmish discrepancies. This could be just bad luck, or maybe the systematic errors are slightly larger than the experimentalists think.
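For what it's worth, the naive combination mentioned above is just an inverse-variance weighted average of the two measurements, using the errors quoted earlier:

```python
# Naive inverse-variance-weighted combination of the two Higgs mass
# measurements quoted in the text (ATLAS: 125.36 with a 0.40 GeV error,
# CMS: 125.03 with a 0.30 GeV error); ignores correlations entirely.
def combine(measurements):
    weights = [(m, 1.0 / err ** 2) for m, err in measurements]
    total = sum(w for _, w in weights)
    mean = sum(m * w for m, w in weights) / total
    return mean, (1.0 / total) ** 0.5

mean, err = combine([(125.36, 0.40), (125.03, 0.30)])
assert abs(mean - 125.15) < 0.01   # reproduces the quoted naive central value
```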

2. Diphoton rate update.
CMS finally released a new value of the Higgs signal strength in the diphoton channel. This CMS measurement was a bit of a roller-coaster: initially they measured an excess, then with the full dataset they reported a small deficit. After more work and more calibration they settled on the value 1.14 +0.26/−0.23 relative to the standard model prediction, in perfect agreement with the standard model. Meanwhile ATLAS is also revising the signal strength in this channel towards the standard model value. The number 1.29±0.30 quoted on the occasion of the mass measurement is not yet the final one; there will soon be a dedicated signal strength measurement with, most likely, a slightly smaller error. Nevertheless, we can safely announce that the celebrated Higgs diphoton excess is no more.

3. Off-shell Higgs.
Most of the LHC searches are concerned with an on-shell Higgs, that is, when its 4-momentum squared is very close to its mass squared. This is where Higgs is most easily recognizable, since it can show up as a bump in invariant mass distributions. However Higgs, like any quantum particle, can also appear as a virtual particle off-mass-shell and influence, in a subtler way, the cross section or differential distributions of various processes. One place where an off-shell Higgs may visibly contribute is the pair production of on-shell Z bosons. In this case, the interference between the gluon-gluon → Higgs → Z Z process and the non-Higgs one-loop Standard Model contribution to the gluon-gluon → Z Z process can influence the cross section in a non-negligible way.  At the beginning, these off-shell measurements were advertised as a model-independent Higgs width measurement, although it is now recognized that the "model-independent" claim does not stand. Nevertheless, measuring the ratio of the off-shell and on-shell Higgs production provides qualitatively new information about the Higgs couplings and, under some specific assumptions, can be interpreted as an indirect constraint on the Higgs width. Now both ATLAS and CMS quote constraints on the Higgs width at the level of 5 times the Standard Model value.  Currently, these results are not very useful in practice. Indeed, it would require a tremendous conspiracy to reconcile the current data with a Higgs width larger than 1.3 times the standard model one. But a new front has been opened, and one hopes for much more interesting results in the future.

4. Tensor structure of Higgs couplings.
Another front that is being opened as we speak is constraining higher order Higgs couplings with a different tensor structure. So far, we have been given the so-called spin/parity measurements. That is to say, the LHC experiments imagine a 125 GeV particle with a different spin and/or parity than the Higgs, and couplings to matter consistent with that hypothesis. Then they test whether this new particle or the standard model Higgs better describes the observed differential distributions of Higgs decay products. This has some appeal to the general public and Nobel committees but little practical meaning. That's because the current data, especially the Higgs signal strength measured in multiple channels, clearly show that the Higgs is, in the first approximation, the standard model one. New physics, if it exists, may only be a small perturbation on top of the standard model couplings. The relevant question is how well we can constrain these perturbations. For example, possible couplings of the Higgs to the Z boson are

In the standard model only the first type of coupling is present in the Lagrangian, and all the a coefficients are zero. New heavy particles coupled to the Higgs and Z bosons could be indirectly detected by measuring non-zero a's. In particular, a3 violates the parity symmetry and could arise from mixing of the standard model Higgs with a pseudoscalar particle. The presence of non-zero a's would show up, for example, as a modification of the lepton momentum distributions in the Higgs decay to 4 leptons. This was studied by CMS in this note. What they do is not perfect yet, and the results are presented in an unnecessarily complicated fashion. In any case it's a step in the right direction: as the analysis improves and more statistics are accumulated in the next runs, these measurements will become an important probe of new physics.

5. Flavor violating decays.
In the standard model, the Higgs couplings conserve flavor, in both the quark and the lepton sectors. This is a consequence of the assumption that the theory is renormalizable and that only one Higgs field is present.  If either of these assumptions is violated, the Higgs boson may mediate transitions between different generations of matter. Earlier, ATLAS and CMS searched for top quark decay to charm and Higgs. More recently, CMS turned to lepton flavor violation, searching for Higgs decays to τμ pairs. This decay cannot occur in the standard model, so the search is a clean null test. At the same time, the final state is relatively simple from the experimental point of view, thus this decay may be a sensitive probe of new physics. Amusingly, CMS sees a 2.5 sigma excess corresponding to an h→τμ branching fraction of order 1%. So we can entertain the possibility that Higgs holds the key to new physics and flavor hierarchies, at least until ATLAS comes out with its own measurement.

July 22, 2014

Tommaso Dorigo: True And False Discoveries: How To Tell Them Apart

Many new particles and other new physics signals claimed in the last twenty years were later proven to be spurious effects, due to background fluctuations or unknown sources of systematic error. The list is long, unfortunately - and longer than the list of particles and effects that were confirmed to be true by subsequent more detailed or more statistically-rich analysis.


Clifford Johnson: 74 Questions

Hello from the Aspen Center for Physics. One of the things I wanted to point out to you last month was the 74 questions that Andy Strominger put on the slides of his talk in the last session of the Strings 2014 conference (which, you may recall from earlier posts, I attended). This was one of the "Vision Talks" that ended the sessions, where a number of speakers gave some overview thoughts about work in the field at large. Andy focused mostly on progress in quantum gravity matters in string theory, and was quite upbeat. He declined (wisely) to make predictions about where the field might be going, instead pointing out (not for the first time) that if you look at the things we've made progress on in the last N years, most (if not all) of those things would not have been on anyone's list of predictions N years ago. (He gave a specific value for N, I just can't recall what it is, but it does not matter.) He sent an email to everyone who was either speaking, organising, moderating a session or similarly involved in the conference, asking them to send, off the [...]

Scott Aaronson: “How Might Quantum Information Transform Our Future?”

So, the Templeton Foundation invited me to write a 1500-word essay on the above question.  It’s like a blog post, except they pay me to do it!  My essay is now live, here.  I hope you enjoy my attempt at techno-futurist prose.  You can comment on the essay either here or over at Templeton’s site.  Thanks very much to Ansley Roan for commissioning the piece.

July 21, 2014

n-Category Café Pullbacks That Preserve Weak Equivalences

The following concept seems to have been reinvented a bunch of times by a bunch of people, and every time they give it a different name.

Definition: Let CC be a category with pullbacks and a class of weak equivalences. A morphism f:ABf:A\to B is a [insert name here] if the pullback functor f *:C/BC/Af^\ast:C/B \to C/A preserves weak equivalences.

In a right proper model category, every fibration is one of these. But even in that case, there are usually more of these than just the fibrations. There is of course also a dual notion in which pullbacks are replaced by pushouts, and every cofibration in a left proper model category is one of those.

What should we call them?

The names that I’m aware of that have so far been given to these things are:

  1. sharp map, by Charles Rezk. This is a dualization of the terminology flat map used for the dual notion by Mike Hopkins (I don’t know a reference, does anyone?). I presume that Hopkins’ motivation was that a ring homomorphism is flat if tensoring with it (which is the pushout in the category of commutative rings) is exact, hence preserves weak equivalences of chain complexes.

    However, “flat” has the problem of being a rather overused word. For instance, we may want to talk about these objects in the canonical model structure on CatCat (where in fact it turns out that every such functor is a cofibration), but flat functor has a very different meaning. David White has pointed out that “flat” would also make sense to use for the monoid axiom in monoidal model categories.

  2. right proper, by Andrei Radulescu-Banu. This is presumably motivated by the above-mentioned fact that fibrations in right proper model categories are such. Unfortunately, proper map also has another meaning.

  3. hh-fibration, by Berger and Batanin. This is presumably motivated by the fact that “hh-cofibration” has been used by May and Sigurdsson for an intrinsic notion of cofibration in topologically enriched categories, that specializes in compactly generated spaces to closed Hurewicz cofibrations, and pushouts along the latter preserve weak homotopy equivalences. However, it makes more sense to me to keep “hh-cofibration” with May and Sigurdsson’s original meaning.

  4. Grothendieck WW-fibration (where WW is the class of weak equivalences on CC), by Ara and Maltsiniotis. Apparently this comes from unpublished work of Grothendieck. Here I guess the motivation is that these maps are “like fibrations” and are determined by the class WW of weak equivalences.

Does anyone know of other references for this notion, perhaps with other names? And any opinions on what the best name is? I’m currently inclined towards “WW-fibration” mainly because it doesn’t clash with anything else, but I could be convinced otherwise.

John Preskill: Reading the sub(linear) text

Physicists are not known for finesse. “Even if it cost us our funding,” I’ve heard a physicist declare, “we’d tell you what we think.” Little wonder I irked the porter who directed me toward central Cambridge.

The University of Cambridge consists of colleges as the US consists of states. Each college has a porter’s lodge, where visitors check in and students beg for help after locking their keys in their rooms. And where physicists ask for directions.

Last March, I ducked inside a porter’s lodge that bustled with deliveries. The woman behind the high wooden desk volunteered to help me, but I asked too many questions. By my fifth, her pointing at a map had devolved to jabbing.

Read the subtext, I told myself. Leave.

Or so I would have told myself, if not for that afternoon.

That afternoon, I’d visited Cambridge’s CMS, which merits every letter in “Centre for Mathematical Sciences.” Home to Isaac Newton’s intellectual offspring, the CMS consists of eight soaring, glass-walled, blue-topped pavilions. Their majesty walloped me as I turned off the road toward the gatehouse. So did the congratulatory letter from Queen Elizabeth II that decorated the route to the restroom.


I visited Nilanjana Datta, an affiliated lecturer of Cambridge’s Faculty of Mathematics, and her student, Felix Leditzky. Nilanjana and Felix specialize in entropies and one-shot information theory. Entropies quantify uncertainties and efficiencies. Imagine compressing many copies of a message into the smallest possible number of bits (units of memory). How few bits can you use per copy? That number, we call the optimal compression rate. It shrinks as the number of copies compressed grows. As the number of copies approaches infinity, that compression rate drops toward a number called the message’s Shannon entropy. If the message is quantum, the compression rate approaches the von Neumann entropy.

Good luck squeezing infinitely many copies of a message onto a hard drive. How efficiently can we compress fewer copies? According to one-shot information theory, the answer involves entropies other than Shannon’s and von Neumann’s. In addition to describing data compression, entropies describe the charging of batteries, the concentration of entanglement, the encrypting of messages, and other information-processing tasks.

Speaking of compressing messages: Suppose one-shot information theory posted status updates on Facebook. Suppose that that panel on your Facebook page’s right-hand side showed news weightier than celebrity marriages. The news feed might read, “TRENDING: One-shot information theory: Second-order asymptotics.”

Second-order asymptotics, I learned at the CMS, concerns how the optimal compression rate decays as the number of copies compressed grows. Imagine compressing a billion copies of a quantum message ρ. The number of bits needed about equals a billion times the von Neumann entropy HvN(ρ). Since a billion is less than infinity, 1,000,000,000 HvN(ρ) bits won’t suffice. Can we estimate the compression rate more precisely?

The question reminds me of gas stations’ hidden pennies. The last time I passed a station’s billboard, some number like $3.65 caught my eye. Each gallon cost about $3.65, just as each copy of ρ costs about HvN(ρ) bits. But a 9/10, writ small, followed the $3.65. If I’d budgeted $3.65 per gallon, I couldn’t have filled my tank. If you budget HvN(ρ) bits per copy of ρ, you can’t compress all your copies.

Suppose some station’s owner hatches a plan to promote business. If you buy one gallon, you pay $3.654. The more you purchase, the more the final digit drops from four. By cataloguing receipts, you calculate how a tank’s cost varies with the number of gallons, n. The cost equals $3.65 × n to a first approximation. To a second approximation, the cost might equal $3.65 × n + a√n, wherein a represents some number of cents. Compute a, and you’ll have computed the gas’s second-order asymptotics.

Nilanjana and Felix computed a’s associated with data compression and other quantum tasks. Second-order asymptotics met information theory when Strassen combined them in nonquantum problems. These problems developed under attention from Hayashi, Han, Polyanskiy, Poor, Verdú, and others. Tomamichel and Hayashi, as well as Li, introduced quantumness.

In the total-cost expression, $3.65 × n depends on n directly, or “linearly.” The second term depends on √n. As the number of gallons grows, so does √n, but √n grows more slowly than n. The second term is called “sublinear.”
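As a toy numerical illustration of the two terms (my own sketch; the coefficient a below is purely hypothetical, and the true second-order coefficient involves a quantity called the varentropy of the source):

```python
import math

def shannon_entropy(p):
    """Shannon entropy, in bits, of a probability distribution p."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical source: a biased coin.
p = [0.9, 0.1]
H = shannon_entropy(p)  # about 0.469 bits per copy

def total_bits(n, a=0.5):
    """First-order cost n*H plus a sublinear correction a*sqrt(n).
    The coefficient a = 0.5 is illustrative only."""
    return n * H + a * math.sqrt(n)

# The per-copy rate exceeds H but decays toward it as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, total_bits(n) / n)
```

The per-copy cost H + a/√n is exactly the gas-station situation: the price per gallon drops toward $3.65 as the tank grows, but never quite reaches it.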

Which is the word that rose to mind in the porter’s lodge. I told myself, Read the sublinear text.

Little wonder I irked the porter. At least—thanks to quantum information, my mistake, and facial expressions’ contagiousness—she smiled.



With thanks to Nilanjana Datta and Felix Leditzky for explanations and references; to Nilanjana, Felix, and Cambridge’s Centre for Mathematical Sciences for their hospitality; and to porters everywhere for providing directions.

Sesh Nadathur: Short news items

Over the past two months I have been on a two-week seminar tour of the UK, taken a short holiday, attended a conference in Estonia and spent a week visiting collaborators in Spain. Posting on the blog has unfortunately suffered as a result: my apologies. Here are some items of interest that have appeared in the meantime:
  • The BICEP and Planck teams are to share their data — here's the BBC report of this news. The information I have from Planck sources is that Planck will put out a paper with new data very soon (about a week ago I heard it would be "maybe in two weeks", so let's say two or three weeks from today). This new data will then be shared with the BICEP team, and the two teams will work together to analyse its implications for the BICEP result. From the timescales involved my guess is that what Planck will be making available is a measurement of the polarised dust foreground in the BICEP sky region, and the joint publication will involve cross-correlating this map with the B-mode map measured by BICEP. A significant cross-correlation would indicate that most (or all) of the signal BICEP detected was due to dust.
  • What Planck will not be releasing in the next couple of weeks is their own measurement of the polarization of the CMB, in particular their own estimate of the value of $r$. The timetable for this release is still October: this is a deadline imposed by the fact that ESA requires Planck to release the data by December, but another major ESA mission (I forget which) is due to be launched in November and ESA don't like scheduling "competing" press conferences in the same month because there's only so much science news Joe Public can absorb at a time. From what I've heard, getting the full polarization data ready for October is a bit of a rush as it is, so it's fairly certain that's not what they're releasing soon.
  • By the way, I think I've recently understood a little better how a collaboration as enormous as Planck manage to remain so disciplined and avoid leaking rumours: it's because most of the people in the collaboration don't know the full details of the results either! That is to say, the collaboration is split into small sub-groups with specified responsibilities, and these sub-groups don't share results with each other. So if you ask a randomly chosen Planck member what the preliminary polarization results are looking like, chances are they don't know any better than you. (Though this may not stop them from saying "Well, I've seen some very interesting plots ..." and smiling enigmatically!)
  • The conference I attended in Estonia was the IAU symposium in honour of the 100th birth anniversary of the great Ya. B. Zel'dovich, on the general topic of large-scale structure and the cosmic web. I'll try to write a little about my general impressions of the conference next time. In the meantime all the talks are available for download from the website here.
  • A science news story you may have seen recently is "Biggest void in universe may explain cosmic cold spot": this is a claim that a recently detected region with a relative deficit of galaxies (the "supervoid") explains the existence of the unusual Cold Spot that has been seen in the CMB, without the need to invoke any unusual new physics. The claim of the explanation is based on this paper. Unfortunately this claim is wrong, and the paper itself has several problems. My collaborators and I are in the process of writing a paper of our own discussing why, and when we are done I will try to explain the issues on here as well. In the meantime, you heard it here first: a supervoid does not explain the Cold Spot!
Update: It has been pointed out to me that last week Julien Lesgourgues gave a talk about Planck and particle physics at the Strong and Electroweak Matter (SEWM14) symposium, in which he also discussed the timeline of forthcoming Planck and BICEP papers. You can see this on page 12 of his talk (pdf) and it is roughly the same as what I wrote above (except that there's a typo in the year — it should be 2014 not 2015!).

July 20, 2014

Backreaction: I saw the future [Video] Making of.

You wanted me to smile. I did my best :p

With all the cropping and overlays my computer worked on the video mixdown for a full 12 hours and that in a miserable resolution. Amazingly, the video looks better after uploading it to YouTube. Whatever compression YouTube is using, it has nicely smoothed out some ugly pixelations that I couldn't get rid of.

The worst part of the video making is that my software screws up the audio timing upon export. Try as I might, the lip movements never quite seem to be in sync, even if they look perfectly fine before export. I am not sure exactly what causes the problem. One issue is that the timing of my camera seems to be slightly inaccurate. If I record a video with the audio running in the background and later add the same audio on a second track, the video runs too fast by about 100 ms over 3 minutes. That's already enough to note the delay and makes the editing really cumbersome. Another contributing factor seems to be simple errors in the data processing. The audio sometimes runs behind and then, with an ugly click, jumps back into place.

Another issue with the video is that, well, I don't have a video camera. I have a DSLR photo camera with a video option, but that has its limits. It does not for example automatically refocus during recording and it doesn't have a movable display either. That's a major problem since it means I can't focus the camera on myself. So I use a mop that I place in front of the blue screen, focus the camera on that, hit record, and then try to put myself in place of the mop. Needless to say, that doesn't always work, especially if I move around. This means my videos are crappy to begin with. They don't exactly get better with several imports and exports and rescalings and background removals and so on.

Oh yeah, and then the blue screen. After I noticed last time that pink is really a bad color for a background removal because skin tones are pink, not to mention lipstick, I asked Google. The internet in its eternal wisdom recommended a saturated blue rather than turquoise, which I had thought of, and so I got myself a few meters of the cheapest royal blue fabric I could find online. When I replaced the background I turned into a zombie, and thus I was reminded I have blue eyes. For this reason I have replaced the background with something similar to the original color. And my eyes look bluer than they actually are.

This brings me to the audio. After I had to admit that my so-called songs sound plainly crappy, I bought and read a very recommendable book called "Mixing Audio" by Roey Izhaki. Since then I know words like multiband compressor and reverb tail. The audio mix still isn't particularly good, but at least it's better and since nobody else will do it, I'll go and congratulate myself on this awesomely punchy bass-kick loop which you'll only really appreciate if you download the mp3 and turn the volume up to max. Also note how the high frequency plings come out crystal clear after I figured out what an equalizer is good for.

My vocal recording and processing has reached its limits. There's only so much one can do without a studio environment. My microphone picks up all kinds of noise, from the cars passing by over the computer fan and the neighbor's washing machine to the church bells. I basically can't do recordings in one stretch, I have to repeat everything a few times and pick the best pieces. I've tried noise-removal tools, but the results sound terrible to me and, worse, they are not reproducible, which is a problem since I have to patch pieces together. So instead I push the vocals through several high-pass filters to get rid of the background noise. This leaves my voice sounding thinner than it is, so then I add some low-frequency reverb and a little chorus and it comes out sounding mostly fine.

I have given up on de-essing presets, they always leave me with a lisp on top of my German accent. Since I don't actually have a lot of vocals to deal with, I just treat all the 's' by hand in the final clip, and that sounds okay, at least to my ears.

Oh yeah, and I promise I'll not attempt again to hit an F#3, that was not a good idea. My voicebox clearly wasn't meant to support anything below B3. Which is strange as I evidently speak mostly in a frequency range so low that it is plainly unstable on my vocal cords. I do fairly well with everything between the middle and high C and have developed the rather strange habit of singing second and 3rd voices to myself when I get stuck on some calculation. I had the decency to remove the whole choir in the final version though ;)

Hope you enjoy this little excursion into the future. Altogether it was fun to make. And see, I even managed a smile, especially for you :o)

Backreaction: What is a theory, What is a model?

During my first semester I coincidentally found out that the guy who often sat next to me, one of the better students, believed the Earth was only 15,000 years old. Once on the topic, he produced stacks of colorful leaflets which featured lots of names, decorated by academic titles, claiming that scientific evidence supports the scripture. I laughed at him, initially thinking he was joking, but he turned out to be dead serious and I was clearly going to roast in hell until future eternity.

If it hadn’t been for that strange encounter, I would summarily dismiss the US debates about creationism as a bizarre cultural reaction to lack of intellectual stimulation. But seeing that indoctrination can survive a physics and math education, and knowing the amount of time one can waste using reason against belief, I have a lot of sympathy for the fight of my US colleagues.

One of the main educational efforts I have seen is to explain what the word “theory” means to scientists. We are told that a “theory” isn’t just any odd story that somebody made up and told to his 12 friends, but that scientists use the word “theory” to mean an empirically well-established framework to describe observations.

That’s nice, but unfortunately not true. Maybe that is how scientist should use the word “theory”, but language doesn’t follow definitions: Cashews aren’t nuts, avocados aren’t vegetables, black isn’t a color. And a theory sometimes isn’t a theory.

The word “theory” has a common root with “theater” and originally seems to have meant “contemplation” or generally a “way to look at something,” which is quite close to the use of the word in today’s common language. Scientists adopted the word, but not in any regular way. It’s not like we vote on what gets called a theory and what doesn’t. So I’ll not attempt to give you a definition that nobody uses in practice, but just try an explanation that I think comes close to practice.

Physicists use the word theory for a well worked-out framework to describe the real world. The theory is basically a map between a model, that is a simplified stand-in for a real-world system, and reality. In physics, models are mathematical, and the theory is the dictionary to translate mathematical structures into observable quantities.

Exactly what counts as “well worked-out” is somewhat subjective, but as I said one doesn’t start with the definition. Instead, a framework that gets adopted by a big part of the community slowly comes to deserve the title of a “theory”. Most importantly that means that the theory has to fulfil the scientific standards of the field. If something is called a theory it basically means scientists trust its quality.

One should not confuse the theory with the model. The model is what actually describes whatever part of the world you want to study by help of your theory.

General Relativity for example is a theory. It does not in and by itself describe anything we observe. For this, we have to first make several assumptions for symmetries and matter content to then arrive at a model, the metric that describes space-time, from which observables can be calculated. Quantum field theory, to use another example, is a general calculation tool. To use it to describe the real world, you first have to specify what type of particles you have and what symmetries, and what process you want to look at; this gives you for example the standard model of particle physics. Quantum mechanics is a theory that doesn’t carry the name theory. A concrete model would for example be that of the Hydrogen atom, and so on. String theory has been such a convincing framework for so many that it has risen to the status of a “theory” without there being any empirical evidence.

A model doesn't necessarily have to be about describing the real world. To get a better understanding of a theory, it is often helpful to examine very simplified models even though one knows these do not describe reality. Such models are called “toy-models”. Examples are neutrino oscillations with only two flavors (even though we know there are at least three), gravity in 2 spatial dimensions (even though we know there are at least three), and the φ4 theory - where we reach the limits of my language theory, because according to what I said previously it should be a φ4 model (it falls into the domain of quantum field theory).

Phenomenological models (the things I work with) are models explicitly constructed to describe a certain property or observation (the “phenomenon”). They often use a theory that is known not to be fundamental. One never talks about phenomenological theories because the whole point of doing phenomenology is the model that makes contact to the real world. A phenomenological model serves usually one of two purposes: It is either a preliminary description of existing data or a preliminary prediction for not-yet existing data, both with the purpose to lead the way to a fully-fledged theory.

One does not necessarily need a model together with the theory to make predictions. Some theories have consequences that are true for all models and are said to be “model-independent”. Though if one wants to test them experimentally, one has to use a concrete model again. Tests of violations of Bell’s inequality may be an example. Entanglement is a general property of quantum mechanics, straight from the axioms of the theory, yet to test it in a certain setting one has to specify a model again. The existence of extra-dimensions in string theory may serve as another example of a model-independent prediction.

One doesn’t have to tell this to physicists, but the value of having a model defined in the language of mathematics is that one can use calculation, that is, logical deduction, to arrive at numerical values for observables (typically dependent on some parameters) from the basic assumptions of the model. I.e., it’s a way to limit the risk of fooling oneself and getting lost in verbal acrobatics. I recently read an interesting and occasionally amusing essay from a mathematician-turned-biologist who tries to explain to his colleagues what the point of constructing models is:
“Any mathematical model, no matter how complicated, consists of a set of assumptions, from which are deduced a set of conclusions. The technical machinery specific to each flavor of model is concerned with deducing the latter from the former. This deduction comes with a guarantee, which, unlike other guarantees, can never be invalidated. Provided the model is correct, if you accept its assumptions, you must as a matter of logic also accept its conclusions.”
Well said.

After I realized the guy next to me in physics class wasn’t joking about his creationist beliefs, he went to great lengths explaining that carbon-dating is a conspiracy. I went to great lengths making sure to henceforth place my butt safely far away from him. It is beyond me how one can study a natural science and still interpret the Bible literally. Though I have a theory about this…

July 19, 2014

Tim Gowers: Mini-monomath

The title of this post is a nod to Terry Tao’s four mini-polymath discussions, in which IMO questions were solved collaboratively online. As the beginning of what I hope will be a long exercise in gathering data about how humans solve these kinds of problems, I decided to have a go at one of this year’s IMO problems, with the idea of writing down my thoughts as I went along. Because I was doing that (and doing it directly into a LaTeX file rather than using paper and pen), I took quite a long time to solve the problem: it was the first question, and therefore intended to be one of the easier ones, so in a competition one would hope to solve it quickly and move on to the more challenging questions 2 and 3 (particularly 3). You get an average of an hour and a half per question, and I think I took at least that, though I didn’t actually time myself.

What I wrote gives some kind of illustration of the twists and turns, many of them fruitless, that people typically take when solving a problem. If I were to draw a moral from it, it would be this: when trying to solve a problem, it is a mistake to expect to take a direct route to the solution. Instead, one formulates subquestions and gradually builds up a useful bank of observations until the direct route becomes clear. Given that we’ve just had the football world cup, I’ll draw an analogy that I find not too bad (though not perfect either): a team plays better if it patiently builds up to an attack on goal than if it hoofs the ball up the pitch or takes shots from a distance. Germany gave an extraordinary illustration of this in their 7-1 defeat of Brazil.

I imagine that the rest of this post will be much more interesting if you yourself solve the problem before reading what I did. I in turn would be interested in hearing about other people’s experiences with the problem: were they similar to mine, or quite different? I would very much like to get a feel for how varied people’s experiences are. If you’re a competitor who solved the problem, feel free to join the discussion!

If I find myself with some spare time, I might have a go at doing the same with some of the other questions.

What follows is exactly what I wrote (or rather typed), with no editing at all, apart from changing the LaTeX so that it compiles in WordPress and adding two comments that are clearly marked in red.

Problem Let a_0<a_1<a_2<\dots be an infinite sequence of positive integers. Prove that there exists a unique integer n\geq 1 such that

a_n <\frac{a_0+a_1+\dots+a_n}n\leq a_{n+1}\ .

First thought.

Slight bafflement.


The expression in the middle is not an average. If we were to replace it by an average we would have the second inequality automatically.

Proof discovery technique.

Try looking at simple cases. Here we could consider what happens when n=1, for example. Then the inequality says

a_1<a_0+a_1\leq a_2\ .

Here we automatically have the first inequality, but there is no reason for the second inequality to be true.


Putting those observations together, we see that the first inequality is true when n=1, and the second inequality is “close to being true” as n gets large, since it is true if we replace n by n+1 in the denominator.

If the inequality holds for a unique n, then a plausible guess is that the first inequality fails at some m and if m_0 is minimal such that it fails, then both inequalities are true for m_0-1. I shall investigate that in due course, but I have another idea.


It is clear that WLOG a_0=1. Can we now choose a_1,a_2,\dots in such a way that we always get equality for the second inequality? We can certainly solve the equations, so the question is whether the resulting sequence will be increasing.

We get a_1=a_0/0, so I’d better set a_1=a and then continue constructing a sequence.

So a_2=a_1+1=a+1, a_3=(1+a+(a+1))/2=a+1, a_4=(1+a+(a+1)+(a+1))/3=(a+1), and so on. Thus all the a_n with n\geq 2 are equal, which they are not supposed to be. This feels significant.


Out of interest, what happens to the inequalities when we (illegally) take the above sequence? We get (a_0+a_1+\dots+a_n)/n=a+1, so we get equality on both sides except when n=1 when we get a_1=a<1+a=a_2.

Proof discovery technique.

Try to disprove the result.

Proof discovery subtechnique.

Try to find the simplest counterexample you can.

An obvious thing to do is to try to make the inequality true when n=1 and when n=2. So let’s go. Without loss of generality a_0=1, a_1=a. We now need a_2\geq a+1.

For n=2 we need a_2<(a_0+a_1+a_2)/2=(1+a+a_2)/2. That can be rearranged to a_2<1+a, exactly contradicting what we had before.


That doesn’t solve the problem but it looks interesting. In particular, it suggests rearranging the first inequality in the general case, to

a_n(1-1/n) < \frac{a_0+a_1+\dots+a_{n-1}}n\ .

That’s quite nice because the right hand side is a genuine average this time.

Actually, if getting an average is what we care about, we could also rearrange the first inequality by simply multiplying through by n/(n+1), which gives

na_n/(n+1) < \frac{a_0+a_1+\dots+a_n}{n+1}\ .


I think it is time to revisit that guess, in order to try to prove at least that there exists a solution. So we know that the first inequality holds when n=1, since all it says then is that a_1<a_0+a_1. Can it always hold? If so, then again WLOG a_0=1 and a_1=a, and after that we get a_2<1+a, a_3<(1+a+a_2)/2, a_4<(1+a+a_2+a_3)/3 etc.

Let’s write b=1+a and a_i=b-c_i for i\geq 2. Then we have c_2>0, c_3>c_2/2, c_4>(c_2+c_3)/3, etc. We also require c_2>c_3>c_4>\dots.

Let’s set c_1=0. Now the first condition becomes c_{n+1}>(c_1+\dots+c_n)/n but c_2>c_3>\dots. Is that possible?

Is it possible with equality? WLOG c_2=1. Then we have c_3=1/2, c_4=1/2, c_5=1/2, etc.


I’m starting to wonder whether the integer must be something like 1 or 2. Let’s think about it. We know that a_1<a_0+a_1. If a_2\geq a_0+a_1 then we have our n. Suppose instead that a_2<a_0+a_1. Then 2a_2<a_0+a_1+a_2, so a_2<(a_0+a_1+a_2)/2. Now if a_3\geq (a_0+a_1+a_2)/2 then we are again done, so suppose that a_3<(a_0+a_1+a_2)/2.

But since a_2<(a_0+a_1+a_2)/2, we can simply insert a_3 in between the two. Why can’t we continue doing that kind of thing? Let me try.

If a_3<(a_0+a_1+a_2)/2, then a_3<(a_0+a_1+a_2+a_3)/3, so we can insert a_4 in between the two.


I seem to have disproved the result, so I’d better see where I’m going wrong. I’ll try to construct a sequence explicitly. I’ll take a_0=1, a_1=2. I need a_1<a_2<a_0+a_1, so I’ll take a_2=5/2. Now I need a_2<a_3<(a_0+a_1+a_2)/2=11/4, so I’ll take a_3=21/8. Now I need 63/24=a_3<a_4<(a_0+a_1+a_2+a_3)/3=(8+16+20+21)/24=65/24, so I’ll take a_4=64/24=8/3.

I don’t seem to be getting stuck, so let me try to prove that I can always continue. Suppose I’ve already chosen a_0,\dots,a_n. Then the condition I need is that

a_n<a_{n+1}<\frac{a_0+a_1+\dots+a_n}n\ .

By induction we already have that a_n<(a_0+\dots+a_{n-1})/(n-1), from which it follows that a_n(1+1/(n-1))<(a_0+\dots+a_n)/(n-1) and therefore that a_n<(a_0+a_1+\dots+a_n)/n. We may therefore find a_{n+1} between these two numbers, as desired.
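This continuation argument is easy to check mechanically. The sketch below is my own illustration (not part of the transcript), using exact rational arithmetic and taking the midpoint of the allowed interval as one arbitrary choice of a_{n+1}:

```python
# Illustration of the real-valued construction: starting from a_0=1, a_1=2,
# repeatedly pick a_{n+1} strictly between a_n and (a_0+...+a_n)/n.
# The midpoint is an arbitrary choice; any point of the open interval works.
from fractions import Fraction

a = [Fraction(1), Fraction(2)]
for _ in range(30):
    n = len(a) - 1
    avg = sum(a) / n              # (a_0 + ... + a_n)/n
    assert a[n] < avg             # the first inequality still holds, so we can continue
    a.append((a[n] + avg) / 2)    # insert a_{n+1} strictly between the two

# the sequence is strictly increasing, but the terms are not integers
assert all(x < y for x, y in zip(a, a[1:]))
```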


You idiot Gowers, read the question: the a_n have to be positive integers.

Fortunately, the work I’ve done so far is not a complete waste of time. [The half-conscious thought in the back of my mind here, which is clearer in retrospect, was that the successive differences in the example I had just constructed were getting smaller and smaller. So it seemed highly likely that using the same general circle of thoughts I would be able to prove that I couldn't take the a_i to be integers.]


Here’s a trivial observation: if the second inequality fails, then a_{n+1}<(n+1)a_n/n. So if a_n=Cn, then a_{n+1}<C(n+1). How long can we keep that going with positive integers? Answer: for ever, since we can take a_n=n+2.


Never mind about that. I want to go back to an earlier idea. [It isn't obvious what I mean by "earlier idea" here. Actually, I had earlier had the idea of defining the d_i as below, but got distracted by something else and ended up not writing it down. So a small part of the record of my journey to the proof is missing.] It is simply to define d_1=a_0+a_1 and d_n=a_n for n\geq 2. Then for n\geq 2 if the first inequality holds we have

d_n<\frac{d_1+d_2+\dots+d_n}n\ .


So each new d_n is less than the average of the d_i up to that n, and hence less than the average of the d_i before that n. But that means that the average of the d_i forms a decreasing sequence. That also means that the d_n are bounded above by d_1, something I could have observed ages ago. So they can’t be an increasing sequence of integers.


I’ve now shown that the first inequality must fail at some point. Suppose n+1 is the first point at which it fails. Then we have

a_n<\frac{a_0+a_1+\dots+a_n}n

and

a_{n+1}\geq\frac{a_0+a_1+\dots+a_{n+1}}{n+1}\ .

The second inequality tells us that d_{n+1} is at least as big as the average of d_1,\dots,d_{n+1}, which implies that it is at least as big as the average of d_1,\dots,d_n. That gives us the inequality

\frac{a_0+a_1+\dots+a_n}n\leq a_{n+1}\ .


So now I’ve proved that there exists an integer n\geq 1 such that the inequalities both hold. It remains to prove uniqueness. This formulation with the d_i ought to help. We’ve picked the first point at which d_{n+1} is at least as big as the average of d_1,\dots,d_n. Does that imply that d_{n+2} is at least as big as the average of d_1,\dots,d_{n+1}? Yes, because d_{n+1} is at least as big as that average, and d_{n+2} is bigger than d_{n+1}. In other words, we can prove easily that if the first inequality fails for n+1 then it fails for n+2, and hence by induction for all m>n.
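The full statement (existence and uniqueness) is also easy to confirm numerically. Here is a quick brute-force check, my own illustration rather than part of the argument, over randomly generated increasing sequences of positive integers:

```python
# Brute-force check of the problem statement: for each sequence, exactly one
# n >= 1 should satisfy a_n < (a_0 + ... + a_n)/n <= a_{n+1}.
import random

def valid_indices(a):
    """All n >= 1 (with a_{n+1} available) satisfying the double inequality."""
    hits = []
    for n in range(1, len(a) - 1):
        s = sum(a[:n + 1])                 # a_0 + ... + a_n
        if a[n] < s / n <= a[n + 1]:
            hits.append(n)
    return hits

random.seed(0)
for _ in range(1000):
    # a strictly increasing sequence of positive integers; by the boundedness
    # argument above the unique n appears early, so 40 terms is plenty
    a = [random.randint(1, 5)]
    while len(a) < 40:
        a.append(a[-1] + random.randint(1, 3))
    assert len(valid_indices(a)) == 1
```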

Tim Gowers ECM2016 — your chance to influence the programme


Just before I start this post, let me say that I do still intend to write a couple of follow-up posts to my previous one about journal prices. But I’ve been busy with a number of other things, so it may still take a little while.

This post is about the next European Congress of Mathematics, which takes place in Berlin in just over two years’ time. I have agreed to chair the scientific committee, which is responsible for choosing approximately 10 plenary speakers and approximately 30 invited lecturers, the latter to speak in four or five parallel sessions.

The ECM is less secretive than the ICM when it comes to drawing up its scientific programme. In particular, the names of the committee members were made public some time ago, and you can read them here.

I am all in favour of as much openness as possible, so I am very pleased that this is the way that the European Mathematical Society operates. But what is the maximum reasonable level of openness in this case? Clearly, public discussion of the merits of different candidates is completely out of order, but I think anything else goes. In particular, and this is the main point of the post, I would very much welcome suggestions for potential speakers. If you know of a mathematician who is European (and for these purposes Europe includes certain not obviously European countries such as Russia and Israel), has done exciting work (ideally recently), and will not already be speaking about that work at the International Congress of Mathematicians in Seoul, then we would like to hear about it. Our main aim is that the congress should be rewarding for its participants, so we will take some account of people’s ability to give a good talk. This applies in particular to plenary speakers.

I shall moderate all comments on this post. If you suggest a possible speaker, I will not publish your comment, but will note the suggestion. More general comments are also welcome and will be published, assuming that they are the kinds of comments I would normally allow.

[In parentheses, let me say what my comment policy now is. The volume of spam I get on this blog has reached a level where I have decided to implement a feature that WordPress allows, where if you have never had a comment accepted, then your comment will automatically be moderated. I try to check the moderation queue quite frequently. If you have had a comment accepted in the past, then your comments will appear as normal.

I am very reluctant to delete comments, but I do delete obvious spam, and I also delete any comment that tries to use this blog as a form of self-promotion (such as using a comment to draw attention to the author's proof of the Riemann hypothesis, or to the author's fascinating blog, etc. etc.). I sometimes delete pingbacks as well -- it depends whether I think readers of my blog might conceivably be interested in the post from which the pingback originates.]

Going back to the European Congress, if you would prefer to make your suggestion by getting in contact directly with a committee member, then that is obviously fine too. The list of committee members includes email addresses.

However you make your suggestions, it would be very helpful if you could give not just a name but a brief reason for the suggestion: what the work is that you think should be recognised, and why it is important.

The main other thing I am happy to be open about is the stage that the committee has reached in its deliberations, and the plans for how it will carry out its work. Right now, we are at the stage of trying to put together a longlist of possible speakers. I have asked the other committee members to suggest to me at least six potential speakers each, of whom at least six should be broadly in their area. I hope that will give us enough candidates to make it possible to achieve a reasonable subject balance. We will of course also strive for other forms of balance, such as gender and geographical balance, to the extent that we can. Once we have a decent-sized longlist, we will cut it down to the right sort of size.

We are aiming to produce a near-complete list of speakers by around November. This is rather a long time in advance of the Congress itself, which worried me a bit, but I have permission from the EMS to leave open a few slots so that if somebody does something spectacular after November, then we will have the option of inviting them to speak.

Resonaances Weekend Plot: all of dark matter

To put my recent posts into a bigger perspective, here's a graph summarizing all the dark matter particles discovered so far via direct or indirect detection:

The graph shows the number of years the signal has survived vs. the inferred mass of the dark matter particle. The particle names follow the usual Particle Data Group conventions. The label's size is related to the statistical significance of the signal. The colors correspond to the Bayesian likelihood that the signal originates from dark matter, from uncertain (red) to very unlikely (blue). The masses of the discovered particles span an impressive 11 orders of magnitude, although the largest concentration is near the weak scale (this is called the WIMP miracle). If I forgot any particle for which compelling evidence exists, let me know, and I will add it to the graph.

Here are the original references for the Bulbulon, Boehmot, Collaron, CDMeson, Daemon, Cresston, Hooperon, Wenigon, Pamelon, and the mother of Bert and Ernie.

Resonaances Follow up on BICEP

The BICEP2 collaboration claims the discovery of the primordial B-mode in the CMB at a very high confidence level. Résonaances recently reported on the Chinese whispers casting doubt on the statistical significance of that result. They were based in part on the work of Raphael Flauger and Colin Hill, rumors of which were spreading through email and coffee-time discussions. Today Raphael gave a public seminar describing this analysis; see the slides and the video.

The familiar number r=0.2 for the CMB tensor-to-scalar ratio is based on the assumption of zero foreground contribution in the region of the sky observed by BICEP. To argue that foregrounds should not be a big effect, the BICEP paper studied several models to estimate the galactic dust emission. Of those, only the data-driven models DDM1 and DDM2 were based on actual polarization data inadvertently shared by Planck. However, even these models suggest that foregrounds are not completely negligible. For example, subtracting the foregrounds estimated via DDM2 brings the central value of r down to 0.16 or 0.12, depending on how the model is used (cross-correlation vs. auto-correlation). If, instead, the cross-correlated BICEP2 and Keck Array data are used as an input, the tensor-to-scalar ratio can easily be below 0.1, in agreement with the existing bounds from Planck and WMAP.

Raphael's message is that, according to his analysis, the foreground emissions are larger than estimated by BICEP, and that systematic uncertainties on that estimate (due to incomplete information, modeling uncertainties, and scraping numbers from pdf slides) are also large. If that is true, the statistical significance of the primordial B-mode  detection is much weaker than what is being claimed by BICEP.

In his talk, Raphael described an independent attempt, the most complete to date, to extract the foregrounds from existing data. Apart from using the same Planck polarization fraction map as BICEP, he also included the Q and U all-sky maps (the letters refer to how polarization is parameterized), and models of polarized dust emission based on HI maps (21cm hydrogen line emission is supposed to track the galactic dust). One reason for the discrepancy with the BICEP estimates could be that the effect of the Cosmic Infrared Background - mostly unpolarized emission from faraway galaxies - is non-negligible. The green band in the plot shows the polarized dust emission obtained from the CIB-corrected DDM2 model, and compares it to the original BICEP estimate (blue dashed line).

The analysis then goes on to extract the foregrounds starting from several different premises. All available datasets (polarization reconstructed via HI maps, the information scraped from existing Planck polarization maps) seem to tell a similar story: galactic foregrounds can be large in the region of interest, and the uncertainties are large. The money plot is this one:

Recall that the primordial B-mode signal should show up at moderate angular scales, l∼100 (the high-l end is dominated by non-primordial B-modes from gravitational lensing). Given the current uncertainties, the foreground emission may easily account for the entire BICEP2 signal in that region. Again, this does not prove that a tensor mode cannot be there. The story may still reach a happy ending, much like that of the discovery of accelerated expansion (where serious doubts about systematic uncertainties were also raised after the initial announcement). But the ball is now in BICEP's court to convincingly demonstrate that foregrounds are under control.

Until that happens, I think their result does not stand.

Resonaances Another one bites the dust...

...though it's not BICEP2 this time :) This is a long overdue update on the forward-backward asymmetry of the top quark production.
Recall that, in a collision of a quark and an anti-quark producing a top quark together with its antiparticle, the top quark is more often ejected in the direction of the incoming quark (as opposed to the anti-quark). This effect could be most easily studied at the Tevatron, which collided protons with antiprotons, so that the directions of the quark and of the anti-quark could be easily inferred. Indeed, the Tevatron experiments observed the asymmetry at a high confidence level. In the leading-order approximation the Standard Model predicts zero asymmetry, which boils down to the fact that the gluons mediating the production process couple with the same strength to left- and right-handed quark polarizations. Taking into account quantum corrections at 1 loop leads to a small but non-zero asymmetry.
Intriguingly, the asymmetry measured at the Tevatron appeared to be large, of order 20%, significantly more than the value predicted by the Standard Model loop effects. On top of this, the distribution of the asymmetry as a function of the top-pair invariant mass, and the angular distribution of leptons from top quark decays, deviated strongly from the Standard Model expectation. All in all, the ttbar forward-backward anomaly was considered, for many years, one of our best hints for physics beyond the Standard Model. The asymmetry could be interpreted, for example, as being due to new heavy resonances with the quantum numbers of the gluon, which are predicted by models where quarks are composite objects. However, the story has been getting less and less exciting lately. First of all, no other top quark observables (like e.g. the total production cross section) showed any deviations, neither at the Tevatron nor at the LHC. Another worry was that the related top asymmetry was not observed at the LHC. At the same time, the Tevatron numbers have been evolving in a worrisome direction: as the Standard Model computation was refined, the prediction went up; on the other hand, the experimental value steadily went down as more data were added. Today we are close to the point where the Standard Model and experiment finally meet...

The final straw is two recent updates from Tevatron's D0 experiment. Earlier this year, D0 published the measurement  of  the forward-backward asymmetry of the direction of the leptons
from top quark decays. The top quark sometimes decays leptonically, to a b-quark, a neutrino, and a charged lepton (e+, μ+). In this case, the momentum of the lepton is to some extent correlated with that of the parent top, thus the top quark asymmetry may come together with a lepton asymmetry (although some new physics models affect the top and lepton asymmetries in completely different ways). The previous D0 measurement showed a large, more than 3 sigma, excess in that observable. The new refined analysis using the full dataset reaches a different conclusion: the asymmetry is Al=(4.2 ± 2.4)%, in good agreement with the Standard Model. As can be seen in the picture, none of the CDF and D0 measurements of the lepton asymmetry in the several final states shows any anomaly at this point. Then came the D0 update of the regular ttbar forward-backward asymmetry in the semi-leptonic channel. Same story here: the number went down from 20% to Att=(10.6 ± 3.0)%, compared to the Standard Model prediction of 9%. CDF got a slightly larger number, Att=(16.4 ± 4.5)%, but taken together the results are not significantly above the Standard Model prediction of Att=9%.
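As a rough illustration of that last statement, one can naively combine the two quoted Att numbers with inverse-variance weights (my own back-of-envelope, ignoring correlations between the experiments and any asymmetric uncertainties):

```python
# Naive inverse-variance (weighted-average) combination of the quoted
# semi-leptonic Att measurements, in percent. Correlations are ignored,
# so the result is indicative only.
def combine(measurements):
    """measurements: iterable of (value, sigma); returns (mean, sigma)."""
    weights = [1.0 / s**2 for _, s in measurements]
    mean = sum(w * v for (v, _), w in zip(measurements, weights)) / sum(weights)
    return mean, sum(weights) ** -0.5

att, sigma = combine([(10.6, 3.0), (16.4, 4.5)])   # D0, CDF
pull = (att - 9.0) / sigma                          # distance from the SM's 9%
print(f"combined Att = {att:.1f} +- {sigma:.1f}%, {pull:.1f} sigma above the SM")
```

The combined value sits well under two sigma from the Standard Model prediction, which is the sense in which the results are "not significantly above" it.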

So, all the current data on the top quark, both from the LHC and from the Tevatron,  are perfectly consistent with the Standard Model predictions. There may be new physics somewhere at the weak scale, but we're not gonna pin it down by measuring the top asymmetry. This one is a dead parrot:

Graphics borrowed from this talk

Resonaances Weekend Plot: dream on

To force myself into a more regular blogging lifestyle, I thought it would be good to have a semi-regular column. So I'm kicking off with the Weekend Plot series (any resemblance to Tommaso's Plot of the Week is purely coincidental). You understand the idea: it's the weekend, people relax, drink, enjoy... and for all the nerds there's at least a plot.

For a starter, a plot from the LHC Higgs Cross Section Working Group:

It shows the Higgs boson production cross section in proton-proton collisions as a function of the center-of-mass energy. Notably, the plot extends as far as our imagination can stretch, that is, up to a 100 TeV collider. At 100 TeV the cross section is 40 times larger than at the 8 TeV LHC. So far we have produced about 1 million Higgs bosons at the LHC, and we'll probably make 20 times more in this decade. With a 100 TeV collider, 3 inverse attobarns of luminosity, and 4 detectors (dream on) we could produce 10 billion Higgs bosons and really squeeze the shit out of it. For Higgs production in association with a top-antitop quark pair the increase is even more dramatic: between 8 and 100 TeV the rate increases by a factor of 300, and ttH is upgraded to the 3rd largest production mode. Double Higgs production increases by a similar factor and becomes fairly common. So these theoretically interesting production processes will be a piece of cake in the asymptotic future.
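The 10 billion figure follows from simple arithmetic. In the sketch below, the 8 TeV cross section of roughly 22 pb is my assumed round number (the post does not state it); the factors of 40, 3 ab^-1, and 4 detectors are from the text:

```python
# Back-of-envelope Higgs count at a 100 TeV collider, using the scalings
# quoted in the text. sigma_8tev ~ 22 pb is an assumed round number.
PB_PER_INVERSE_AB = 1e6             # 1 ab^-1 corresponds to 10^6 pb^-1

sigma_8tev = 22.0                   # pb, total Higgs production at 8 TeV (assumed)
sigma_100tev = 40 * sigma_8tev      # "40 times larger" at 100 TeV

luminosity = 3 * PB_PER_INVERSE_AB  # 3 inverse attobarns, in pb^-1
n_detectors = 4
n_higgs = sigma_100tev * luminosity * n_detectors
print(f"roughly {n_higgs:.0e} Higgs bosons")   # of order 10 billion
```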

Wouldn't it be good?

July 18, 2014

Geraint F. Lewis Resolving the mass--anisotropy degeneracy of the spherically symmetric Jeans equation

I am exhausted after a month of travel, but am now back in a sunny, but cool, Sydney. It feels especially chilly as part of my trip included Death Valley, where the temperatures were pushing 50 degrees C.

I face a couple of weeks of catch-up, especially with regards to some blog posts on my recent papers. Here, I am going to cheat and present two papers at once. Both papers are by soon-to-be-newly-minted Doctor, Foivos Diakogiannis. I hope you won't mind, as these papers are Part I and II of the same piece of work.

The fact that this work is spread over two papers tells you that it's a long and winding saga, but it's cool stuff as it does something that can really advance science - take an idea from one area and use it somewhere else.

The question the paper looks at sounds, on the face of it, rather simple. Imagine you have a ball of stars, something like this, a globular cluster:
You can see where the stars are. Imagine that you can also measure the speeds of the stars. So, the question is - what is the distribution of mass in this ball of stars? It might sound obvious, because isn't the mass the stars? Well, you have to be careful, as we are seeing the brightest stars, and the fainter stars are harder to see. Also, there may be dark matter in there.

So, we are faced with a dynamics problem, which means we want to find the forces; the force acting here is, of course, gravity, and so mapping the forces gives you the mass. And forces produce accelerations, so all we need is to measure these and... oh... hang on. The Doppler shift gives us the velocity, not the acceleration, and so we would have to wait (a long time) to measure accelerations (i.e. see the change of velocity over time). As they say in the old country, "Bottom".

And this has dogged astronomy for more than one hundred years. But there are some equations (which I think are lovely, but if you are not a maths fan, they may give you a minor nightmare) called the Jeans Equations. I won't pop them here, as there are lots of bits to them and it would take a blog post to explain them in detail.

But there are problems (aren't there always) and that's the assumptions that are made, and the key problem is degeneracies.

Degeneracies are a serious pain in science. Imagine you have measured a value in an experiment, let's say it's the speed of a planet (there will be an error associated with that measurement). Now, you have your mathematical laws that makes a prediction for the speed of the planet, but you find that your maths do not give you a single answer, but multiple answers that equally well explain the measurements. What's the right answer? You need some new (or better) observations to "break the degeneracies".

And degeneracies dog dynamical work. There is a traditional approach to modelling the mass distribution through the Jeans equations, where certain assumptions are made, but you are often worried about how justified your assumptions are. While we cannot remove all the degeneracies, we can try and reduce their impact. How? By letting the data point the way.

By this point, you may look a little like this

OK. So, there are parts to the Jeans equations where people traditionally put in functions to describe what something is doing. As an example, we might choose a density that has a mathematical form like

\rho(r) = \frac{\rho_0}{(r/r_s)(1+r/r_s)^2}

which tells us how the density changes with radius (those in the know will recognise this as the well-known Navarro-Frenk-White profile). Now, what if your density doesn't look like this? Then you are going to get the wrong answers because you assumed it.

So, what you want to do is let the data choose the function for you. But how is this possible? How do you get "data" to pick the mathematical form for something like density? This is where Foivos had incredible insight and called on a completely different topic all together, namely Computer-Aided Design.

For designing things on a computer, you need curves, curves that you can bend and stretch into a range of arbitrary shapes, and it would be painful to work out the mathematical form of all of the potential curves you need. So, you don't bother. You use extremely flexible curves known as splines. I've always loved splines. They are so simple, but so versatile. You specify some points, and you get a nice smooth curve. I urge you to have a look at them.
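To make the spline idea concrete, here is a minimal stand-alone sketch (my own, not the authors' code) of B-spline basis functions via the Cox-de Boor recursion; a curve is just a weighted sum of these local bumps:

```python
# Minimal B-spline evaluation via the Cox-de Boor recursion. A curve is a
# coefficient-weighted sum of local basis "bumps"; moving one coefficient
# bends the curve only locally, which is what makes splines so flexible.
def bspline_basis(i, k, t, knots):
    """Value of the i-th degree-k B-spline basis function at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] != knots[i]:
        left = ((t - knots[i]) / (knots[i + k] - knots[i])
                * bspline_basis(i, k - 1, t, knots))
    if knots[i + k + 1] != knots[i + 1]:
        right = ((knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right

def spline(t, coeffs, knots, k=3):
    """Evaluate the spline curve at t."""
    return sum(c * bspline_basis(i, k, t, knots) for i, c in enumerate(coeffs))

# clamped cubic spline on [0, 4): repeat the end knots, pick any coefficients
knots = [0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4]
coeffs = [1.0, 2.5, 0.5, 3.0, 2.0, 1.5, 1.0]
profile = [spline(x / 10, coeffs, knots) for x in range(40)]  # a smooth profile
```

Changing the coefficient list reshapes the curve without ever writing down a new functional form, which is exactly the flexibility the dynamical modelling needs.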

For this work, we use b-splines and construct the profiles we want from some basic curves. Here's an example from the paper:
We then plug this flexible curve into the mathematics of dynamics. For this work, we test the approach by creating fake data from a model, and then try and recover the model from the data. And it works!
Although it is not that simple. A lot of care and thought has to be taken over just how you construct the spline (this is the focus of the second paper), but that's now been done. We now have the mathematics we need to really crack the dynamics of globular clusters, dwarf galaxies and even our Milky Way.

There's a lot more to write on this, but we'll wait for the results to start flowing. Watch this space!

Well done Foivos! - not only on the paper, but for finishing his PhD, getting a postdoctoral position at ICRAR, and also getting married :)

Resolving the mass--anisotropy degeneracy of the spherically symmetric Jeans equation I: theoretical foundation

A widely employed method for estimating the mass of stellar systems with apparent spherical symmetry is dynamical modelling using the spherically symmetric Jeans equation. Unfortunately this approach suffers from a degeneracy between the assumed mass density and the second order velocity moments. This degeneracy can lead to significantly different predictions for the mass content of the system under investigation, and thus poses a barrier for accurate estimates of the dark matter content of astrophysical systems. In a series of papers we describe an algorithm that removes this degeneracy and allows for unbiased mass estimates of systems of constant or variable mass-to-light ratio. The present contribution sets the theoretical foundation of the method that reconstructs a unique kinematic profile for some assumed free functional form of the mass density. The essence of our method lies in using flexible B-spline functions for the representation of the radial velocity dispersion in the spherically symmetric Jeans equation. We demonstrate our algorithm through an application to synthetic data for the case of an isotropic King model with fixed mass-to-light ratio, recovering excellent fits of theoretical functions to observables and a unique solution. The mass-anisotropy degeneracy is removed to the extent that, for an assumed functional form of the potential and mass density pair (\Phi,\rho), and a given set of line-of-sight velocity dispersion \sigma_{los}^2 observables, we recover a unique profile for \sigma_{rr}^2 and \sigma_{tt}^2. Our algorithm is simple, easy to apply and provides an efficient means to reconstruct the kinematic profile.


Resolving the mass--anisotropy degeneracy of the spherically symmetric Jeans equation II: optimum smoothing and model validation

The spherical Jeans equation is widely used to estimate the mass content of stellar systems with apparent spherical symmetry. However, this method suffers from a degeneracy between the assumed mass density and the kinematic anisotropy profile, \beta(r). In a previous work, we laid the theoretical foundations for an algorithm that combines smoothing B-splines with equations from dynamics to remove this degeneracy. Specifically, our method reconstructs a unique kinematic profile of \sigma_{rr}^2 and \sigma_{tt}^2 for an assumed free functional form of the potential and mass density (\Phi,\rho) and given a set of observed line-of-sight velocity dispersion measurements, \sigma_{los}^2. In Paper I (submitted to MNRAS: MN-14-0101-MJ) we demonstrated the efficiency of our algorithm with a very simple example and we commented on the need for optimum smoothing of the B-spline representation; this is in order to avoid unphysical variational behaviour when we have large uncertainty in our data. In the current contribution we present a process for finding the optimum smoothing for a given data set by using information on the behaviour of known ideal theoretical models. Markov Chain Monte Carlo methods are used to explore the degeneracy in the dynamical modelling process. We validate our model through applications to synthetic data for systems with constant or variable mass-to-light ratio \Upsilon. In all cases we recover excellent fits of theoretical functions to observables and unique solutions. Our algorithm is a robust method for the removal of the mass-anisotropy degeneracy of the spherically symmetric Jeans equation for an assumed functional form of the mass density.

Sean Carroll Galaxies That Are Too Big To Fail, But Fail Anyway

Dark matter exists, but there is still a lot we don’t know about it. Presumably it’s some kind of particle, but we don’t know how massive it is, what forces it interacts with, or how it was produced. On the other hand, there’s actually a lot we do know about the dark matter. We know how much of it there is; we know roughly where it is; we know that it’s “cold,” meaning that the average particle’s velocity is much less than the speed of light; and we know that dark matter particles don’t interact very strongly with each other. Which is quite a bit of knowledge, when you think about it.

Fortunately, astronomers are pushing forward to study how dark matter behaves as it’s scattered through the universe, and the results are interesting. We start with a very basic idea: that dark matter is cold and completely non-interacting, or at least has interactions (the strength with which dark matter particles scatter off of each other) that are too small to make any noticeable difference. This is a well-defined and predictive model: ΛCDM, which includes the cosmological constant (Λ) as well as the cold dark matter (CDM). We can compare astronomical observations to ΛCDM predictions to see if we’re on the right track.

At first blush, we are very much on the right track. Over and over again, new observations come in that match the predictions of ΛCDM. But there are still a few anomalies that bug us, especially on relatively small (galaxy-sized) scales.

One such anomaly is the “too big to fail” problem. The idea here is that we can use ΛCDM to make quantitative predictions concerning how many galaxies there should be with different masses. For example, the Milky Way is quite a big galaxy, and it has smaller satellites like the Magellanic Clouds. In ΛCDM we can predict how many such satellites there should be, and how massive they should be. For a long time we’ve known that the actual number of satellites we observe is quite a bit smaller than the number predicted — that’s the “missing satellites” problem. But this has a possible solution: we only observe satellite galaxies by seeing stars and gas in them, and maybe the halos of dark matter that would ordinarily support such galaxies get stripped of their stars and gas by interacting with the host galaxy. The too big to fail problem tries to sharpen the issue, by pointing out that some of the predicted galaxies are just so massive that there’s no way they could not have visible stars. Or, put another way: the Milky Way does have some satellites, as do other galaxies; but when we examine these smaller galaxies, they seem to have a lot less dark matter than the simulations would predict.

Still, any time you are concentrating on galaxies that are satellites of other galaxies, you rightly worry that complicated interactions between messy atoms and photons are getting in the way of the pristine elegance of the non-interacting dark matter. So we’d like to check that this purported problem exists even out “in the field,” with lonely galaxies far away from big monsters like the Milky Way.

A new paper claims that yes, there is a too-big-to-fail problem even for galaxies in the field.

Is there a “too big to fail” problem in the field?
Emmanouil Papastergis, Riccardo Giovanelli, Martha P. Haynes, Francesco Shankar

We use the Arecibo Legacy Fast ALFA (ALFALFA) 21cm survey to measure the number density of galaxies as a function of their rotational velocity, V_rot,HI (as inferred from the width of their 21cm emission line). Based on the measured velocity function we statistically connect galaxies with their host halos, via abundance matching. In a LCDM cosmology, low-velocity galaxies are expected to be hosted by halos that are significantly more massive than indicated by the measured galactic velocity; allowing lower mass halos to host ALFALFA galaxies would result in a vast overestimate of their number counts. We then seek observational verification of this predicted trend, by analyzing the kinematics of a literature sample of field dwarf galaxies. We find that galaxies with V_rot,HI < 25 km/s are kinematically incompatible with their predicted LCDM host halos, in the sense that hosts are too massive to be accommodated within the measured galactic rotation curves. This issue is analogous to the "too big to fail" problem faced by the bright satellites of the Milky Way, but here it concerns extreme dwarf galaxies in the field. Consequently, solutions based on satellite-specific processes are not applicable in this context. Our result confirms the findings of previous studies based on optical survey data, and addresses a number of observational systematics present in these works. Furthermore, we point out the assumptions and uncertainties that could strongly affect our conclusions. We show that the two most important among them, namely baryonic effects on the abundances and rotation curves of halos, do not seem capable of resolving the reported discrepancy.

Here is the money plot from the paper:


The horizontal axis is the maximum circular velocity, basically telling us the mass of the halo; the vertical axis is the observed velocity of hydrogen in the galaxy. The blue line is the prediction from ΛCDM, while the dots are observed galaxies. Now, you might think that the blue line is just a very crappy fit to the data overall. But that’s okay; the points represent upper limits in the horizontal direction, so points that lie below/to the right of the curve are fine. It’s a statistical prediction: ΛCDM is predicting how many galaxies we have at each mass, even if we don’t think we can confidently measure the mass of each individual galaxy. What we see, however, is that there are a bunch of points in the bottom left corner that are above the line. ΛCDM predicts that even the smallest galaxies in this sample should still be relatively massive (have a lot of dark matter), but that’s not what we see.

If it holds up, this result is really intriguing. ΛCDM is a nice, simple starting point for a theory of dark matter, but it’s also kind of boring. From a physicist’s point of view, it would be much more fun if dark matter particles interacted noticeably with each other. We have plenty of ideas, including some of my favorites like dark photons and dark atoms. It is very tempting to think that observed deviations from the predictions of ΛCDM are due to some interesting new physics in the dark sector.

Which is why, of course, we should be especially skeptical. Always train your doubt most strongly on those ideas that you really want to be true. Fortunately there is plenty more to be done in terms of understanding the distribution of galaxies and dark matter, so this is a very solvable problem — and a great opportunity for learning something profound about most of the matter in the universe.

July 17, 2014

Terence TaoReal analysis relative to a finite measure space

In the traditional foundations of probability theory, one selects a probability space {(\Omega, {\mathcal B}, {\mathbf P})}, and makes a distinction between deterministic mathematical objects, which do not depend on the sampled state {\omega \in \Omega}, and stochastic (or random) mathematical objects, which do depend (but in a measurable fashion) on the sampled state {\omega \in \Omega}. For instance, a deterministic real number would just be an element {x \in {\bf R}}, whereas a stochastic real number (or real random variable) would be a measurable function {x: \Omega \rightarrow {\bf R}}, where in this post {{\bf R}} will always be endowed with the Borel {\sigma}-algebra. (For readers familiar with nonstandard analysis, the adjectives “deterministic” and “stochastic” will be used here in a manner analogous to the uses of the adjectives “standard” and “nonstandard” in nonstandard analysis. The analogy is particularly close when comparing with the “cheap nonstandard analysis” discussed in this previous blog post. We will also use “relative to {\Omega}” as a synonym for “stochastic”.)

Actually, for our purposes we will adopt the philosophy of identifying stochastic objects that agree almost surely, so if one was to be completely precise, we should define a stochastic real number to be an equivalence class {[x]} of measurable functions {x: \Omega \rightarrow {\bf R}}, up to almost sure equivalence. However, we shall often abuse notation and write {[x]} simply as {x}.

More generally, given any measurable space {X = (X, {\mathcal X})}, we can talk either about deterministic elements {x \in X}, or about stochastic elements of {X}, that is to say equivalence classes {[x]} of measurable maps {x: \Omega \rightarrow X} up to almost sure equivalence. We will use {\Gamma(X|\Omega)} to denote the set of all stochastic elements of {X}. (For readers familiar with sheaves, it may helpful for the purposes of this post to think of {\Gamma(X|\Omega)} as the space of measurable global sections of the trivial {X}-bundle over {\Omega}.) Of course every deterministic element {x} of {X} can also be viewed as a stochastic element {x|\Omega \in \Gamma(X|\Omega)} given by (the equivalence class of) the constant function {\omega \mapsto x}, thus giving an embedding of {X} into {\Gamma(X|\Omega)}. We do not attempt here to give an interpretation of {\Gamma(X|\Omega)} for sets {X} that are not equipped with a {\sigma}-algebra {{\mathcal X}}.
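To make the identification-up-to-null-events concrete, here is a minimal sketch (all names hypothetical) over a finite discrete base space, where a stochastic element of a set is just a map {\omega \mapsto x(\omega)} and two such maps are identified when they agree on every state of positive weight:

```python
from fractions import Fraction

# Toy finite sample space: state -> weight.  The state "c" is a null event.
Omega = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}

def as_equal(x, y, omega=Omega):
    """Almost-sure equality: x and y agree on every state of positive weight."""
    return all(x[w] == y[w] for w, p in omega.items() if p > 0)

x = {"a": 1, "b": 2, "c": 99}   # a stochastic integer
y = {"a": 1, "b": 2, "c": -5}   # differs from x only on the null state "c"

print(as_equal(x, y))  # True: [x] and [y] are the same equivalence class
```

In the notation of the post, `x` and `y` above are the same element of {\Gamma({\bf Z}|\Omega)}, even though they are distinct functions on {\Omega}.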

Remark 1 In my previous post on the foundations of probability theory, I emphasised the freedom to extend the sample space {(\Omega, {\mathcal B}, {\mathbf P})} to a larger sample space whenever one wished to inject additional sources of randomness. This is of course an important freedom to possess (and in the current formalism, is the analogue of the important operation of base change in algebraic geometry), but in this post we will focus on a single fixed sample space {(\Omega, {\mathcal B}, {\mathbf P})}, and not consider extensions of this space, so that one only has to consider two types of mathematical objects (deterministic and stochastic), as opposed to having many more such types, one for each potential choice of sample space (with the deterministic objects corresponding to the case when the sample space collapses to a point).

Any (measurable) {k}-ary operation on deterministic mathematical objects then extends to their stochastic counterparts by applying the operation pointwise. For instance, the addition operation {+: {\bf R} \times {\bf R} \rightarrow {\bf R}} on deterministic real numbers extends to an addition operation {+: \Gamma({\bf R}|\Omega) \times \Gamma({\bf R}|\Omega) \rightarrow \Gamma({\bf R}|\Omega)}, by defining the class {[x]+[y]} for {x,y: \Omega \rightarrow {\bf R}} to be the equivalence class of the function {\omega \mapsto x(\omega) + y(\omega)}; this operation is easily seen to be well-defined. More generally, any measurable {k}-ary deterministic operation {O: X_1 \times \dots \times X_k \rightarrow Y} between measurable spaces {X_1,\dots,X_k,Y} extends to a stochastic operation {O: \Gamma(X_1|\Omega) \times \dots \times \Gamma(X_k|\Omega) \rightarrow \Gamma(Y|\Omega)} in the obvious manner.
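The pointwise lifting of operations can be sketched in a few lines over a discrete base (a toy illustration with hypothetical names; measurability is vacuous here):

```python
# Lift any deterministic k-ary operation to stochastic elements by applying
# it state-by-state across the sample space.
def lift(op):
    def stochastic_op(*args):
        states = args[0].keys()  # all arguments share the same sample space
        return {w: op(*(a[w] for a in args)) for w in states}
    return stochastic_op

add = lift(lambda s, t: s + t)  # the lifted addition on stochastic reals

x = {"heads": 1, "tails": 2}
y = {"heads": 10, "tails": 20}
print(add(x, y))  # {'heads': 11, 'tails': 22}
```

Because the lift is pointwise, universal identities such as commutativity and associativity transfer automatically, exactly as the next paragraphs discuss.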

There is a similar story for {k}-ary relations {R: X_1 \times \dots \times X_k \rightarrow \{\hbox{true},\hbox{false}\}}, although here one has to make a distinction between a deterministic reading of the relation and a stochastic one. Namely, if we are given stochastic objects {x_i \in \Gamma(X_i|\Omega)} for {i=1,\dots,k}, the relation {R(x_1,\dots,x_k)} does not necessarily take values in the deterministic Boolean algebra {\{ \hbox{true}, \hbox{false}\}}, but only in the stochastic Boolean algebra {\Gamma(\{ \hbox{true}, \hbox{false}\}|\Omega)} – thus {R(x_1,\dots,x_k)} may be true with some positive probability and also false with some positive probability (with the event that {R(x_1,\dots,x_k)} is stochastically true being determined up to null events). Of course, the deterministic Boolean algebra embeds in the stochastic one, so we can talk about a relation {R(x_1,\dots,x_k)} being deterministically true or deterministically false, which (due to our identification of stochastic objects that agree almost surely) means that {R(x_1(\omega),\dots,x_k(\omega))} is almost surely true or almost surely false respectively. For instance given two stochastic objects {x,y}, one can view their equality relation {x=y} as having a stochastic truth value. This is distinct from the way the equality symbol {=} is used in mathematical logic, which we will now call “equality in the deterministic sense” to reduce confusion. Thus, {x=y} in the deterministic sense if and only if the stochastic truth value of {x=y} is equal to {\hbox{true}}, that is to say that {x(\omega)=y(\omega)} for almost all {\omega}.

Any universal identity for deterministic operations (or universal implication between identities) extends to their stochastic counterparts: for instance, addition is commutative, associative, and cancellative on the space of deterministic reals {{\bf R}}, and is therefore commutative, associative, and cancellative on stochastic reals {\Gamma({\bf R}|\Omega)} as well. However, one has to be more careful when working with mathematical laws that are not expressible as universal identities, or implications between identities. For instance, {{\bf R}} is an integral domain: if {x_1,x_2 \in {\bf R}} are deterministic reals such that {x_1 x_2=0}, then one must have {x_1=0} or {x_2=0}. However, if {x_1, x_2 \in \Gamma({\bf R}|\Omega)} are stochastic reals such that {x_1 x_2 = 0} (in the deterministic sense), then it is no longer necessarily the case that {x_1=0} (in the deterministic sense) or that {x_2=0} (in the deterministic sense); however, it is still true that “{x_1=0} or {x_2=0}” is true in the deterministic sense if one interprets the boolean operator “or” stochastically, thus “{x_1(\omega)=0} or {x_2(\omega)=0}” is true for almost all {\omega}. Another way to properly obtain a stochastic interpretation of the integral domain property of {{\bf R}} is to rewrite it as

\displaystyle  x_1,x_2 \in {\bf R}, x_1 x_2 = 0 \implies x_i=0 \hbox{ for some } i \in \{1,2\}

and then make all sets stochastic to obtain the true statement

\displaystyle  x_1,x_2 \in \Gamma({\bf R}|\Omega), x_1 x_2 = 0 \implies x_i=0 \hbox{ for some } i \in \Gamma(\{1,2\}|\Omega),

thus we have to allow the index {i} for which vanishing {x_i=0} occurs to also be stochastic, rather than deterministic. (A technical note: when one proves this statement, one has to select {i} in a measurable fashion; for instance, one can choose {i(\omega)} to equal {1} when {x_1(\omega)=0}, and {2} otherwise (so that in the “tie-breaking” case when {x_1(\omega)} and {x_2(\omega)} both vanish, one always selects {i(\omega)} to equal {1}).)
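The measurable tie-breaking selection described in the parenthetical can be written out directly in a discrete toy model (hypothetical names; here "measurable" is automatic):

```python
# Given stochastic reals x1, x2 with x1*x2 = 0 at every state, choose a
# stochastic index i(omega) with x_{i(omega)}(omega) = 0, preferring i = 1
# in the tie-breaking case where both vanish.
def choose_index(x1, x2):
    return {w: 1 if x1[w] == 0 else 2 for w in x1}

x1 = {"u": 0, "v": 3, "w": 0}
x2 = {"u": 5, "v": 0, "w": 0}   # note x1*x2 vanishes at every state
i = choose_index(x1, x2)
print(i)  # {'u': 1, 'v': 2, 'w': 1}  -- a stochastic element of {1,2}

# The integral domain property holds with this stochastic index:
assert all((x1 if i[s] == 1 else x2)[s] == 0 for s in i)
```

Note that neither `x1` nor `x2` vanishes deterministically, which is exactly why the index {i} must itself be stochastic.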

Similarly, the law of the excluded middle fails when interpreted deterministically, but remains true when interpreted stochastically: if {S} is a stochastic statement, then it is not necessarily the case that {S} is either deterministically true or deterministically false; however the sentence “{S} or not-{S}” is still deterministically true if the boolean operator “or” is interpreted stochastically rather than deterministically.

To avoid having to keep pointing out which operations are interpreted stochastically and which ones are interpreted deterministically, we will use the following convention: if we assert that a mathematical sentence {S} involving stochastic objects is true, then (unless otherwise specified) we mean that {S} is deterministically true, assuming that all relations used inside {S} are interpreted stochastically. For instance, if {x,y} are stochastic reals, when we assert that “Exactly one of {x < y}, {x=y}, or {x>y} is true”, then by default it is understood that the relations {<}, {=}, {>} and the boolean operator “exactly one of” are interpreted stochastically, and the assertion is that the sentence is deterministically true.

In the above discussion, the stochastic objects {x} being considered were elements of a deterministic space {X}, such as the reals {{\bf R}}. However, it can often be convenient to generalise this situation by allowing the ambient space {X} to also be stochastic. For instance, one might wish to consider a stochastic vector {v(\omega)} inside a stochastic vector space {V(\omega)}, or a stochastic edge {e} of a stochastic graph {G(\omega)}. In order to formally describe this situation within the classical framework of measure theory, one needs to place all the ambient spaces {X(\omega)} inside a measurable space. This can certainly be done in many contexts (e.g. when considering random graphs on a deterministic set of vertices, or if one is willing to work up to equivalence and place the ambient spaces inside a suitable moduli space), but is not completely natural in other contexts. For instance, if one wishes to consider stochastic vector spaces of potentially unbounded dimension (in particular, potentially larger than any given cardinal that one might specify in advance), then the class of all possible vector spaces is so large that it becomes a proper class rather than a set (even if one works up to equivalence), making it problematic to give this class the structure of a measurable space; furthermore, even once one does so, one needs to take additional care to pin down what it would mean for a random vector {\omega \mapsto v_\omega} lying in a random vector space {\omega \mapsto V_\omega} to depend “measurably” on {\omega}.

Of course, in any reasonable application one can avoid the set theoretic issues at least by various ad hoc means, for instance by restricting the dimension of all spaces involved to some fixed cardinal such as {2^{\aleph_0}}. However, the measure-theoretic issues can require some additional effort to resolve properly.

In this post I would like to describe a different way to formalise stochastic spaces, and stochastic elements of these spaces, by viewing the spaces as a measure-theoretic analogue of a sheaf, but being over the probability space {\Omega} rather than over a topological space; stochastic objects are then sections of such sheaves. Actually, for minor technical reasons it is convenient to work in the slightly more general setting in which the base space {\Omega} is a finite measure space {(\Omega, {\mathcal B}, \mu)} rather than a probability space, thus {\mu(\Omega)} can take any value in {[0,+\infty)} rather than being normalised to equal {1}. This will allow us to easily localise to subevents {\Omega'} of {\Omega} without the need for normalisation, even when {\Omega'} is a null event (though we caution that the map {x \mapsto x|\Omega'} from deterministic objects {x} ceases to be injective in this latter case). We will however still continue to use probabilistic terminology despite the lack of normalisation; thus for instance, sets {E} in {{\mathcal B}} will be referred to as events, the measure {\mu(E)} of such a set will be referred to as the probability (which is now permitted to exceed {1} in some cases), and an event whose complement is a null event shall be said to hold almost surely. It is in fact likely that almost all of the theory below extends to base spaces which are {\sigma}-finite rather than finite (for instance, by damping the measure to become finite, without introducing any further null events), although we will not pursue this further generalisation here.

The approach taken in this post is “topos-theoretic” in nature (although we will not use the language of topoi explicitly here), and is well suited to a “pointless” or “point-free” approach to probability theory, in which the role of the stochastic state {\omega \in \Omega} is suppressed as much as possible; instead, one strives to always adopt a “relative point of view”, with all objects under consideration being viewed as stochastic objects relative to the underlying base space {\Omega}. In this perspective, the stochastic version of a set is as follows.

Definition 1 (Stochastic set) Unless otherwise specified, we assume that we are given a fixed finite measure space {\Omega = (\Omega, {\mathcal B}, \mu)} (which we refer to as the base space). A stochastic set (relative to {\Omega}) is a tuple {X|\Omega = (\Gamma(X|E)_{E \in {\mathcal B}}, ((|E))_{E \subset F, E,F \in {\mathcal B}})} consisting of the following objects:

  • A set {\Gamma(X|E)} assigned to each event {E \in {\mathcal B}}; and
  • A restriction map {x \mapsto x|E} from {\Gamma(X|F)} to {\Gamma(X|E)} to each pair {E \subset F} of nested events {E,F \in {\mathcal B}}. (Strictly speaking, one should indicate the dependence on {F} in the notation for the restriction map, e.g. using {x \mapsto x|(E \leftarrow F)} instead of {x \mapsto x|E}, but we will abuse notation by omitting the {F} dependence.)

We refer to elements of {\Gamma(X|E)} as local stochastic elements of the stochastic set {X|\Omega}, localised to the event {E}, and elements of {\Gamma(X|\Omega)} as global stochastic elements (or simply elements) of the stochastic set. (In the language of sheaves, one would use “sections” instead of “elements” here, but I prefer to use the latter terminology here, for compatibility with conventional probabilistic notation, where for instance measurable maps from {\Omega} to {{\bf R}} are referred to as real random variables, rather than sections of the reals.)

Furthermore, we impose the following axioms:

  • (Category) The map {x \mapsto x|E} from {\Gamma(X|E)} to {\Gamma(X|E)} is the identity map, and if {E \subset F \subset G} are events in {{\mathcal B}}, then {((x|F)|E) = (x|E)} for all {x \in \Gamma(X|G)}.
  • (Null events trivial) If {E \in {\mathcal B}} is a null event, then the set {\Gamma(X|E)} is a singleton set. (In particular, {\Gamma(X|\emptyset)} is always a singleton set; this is analogous to the convention that {x^0=1} for any number {x}.)
  • (Countable gluing) Suppose that for each natural number {n}, one has an event {E_n \in {\mathcal B}} and an element {x_n \in \Gamma(X|E_n)} such that {x_n|(E_n \cap E_m) = x_m|(E_n \cap E_m)} for all {n,m}. Then there exists a unique {x\in \Gamma(X|\bigcup_{n=1}^\infty E_n)} such that {x_n = x|E_n} for all {n}.

If {\Omega'} is an event in {\Omega}, we define the localisation {X|\Omega'} of the stochastic set {X|\Omega} to {\Omega'} to be the stochastic set

\displaystyle X|\Omega' := (\Gamma(X|E)_{E \in {\mathcal B}; E \subset \Omega'}, ((|E))_{E \subset F \subset \Omega', E,F \in {\mathcal B}})

relative to {\Omega'}. (Note that there is no need to renormalise the measure on {\Omega'}, as we are not demanding that our base space have total measure {1}.)

The following fact is useful for actually verifying that a given object indeed has the structure of a stochastic set:

Exercise 1 Show that to verify the countable gluing axiom of a stochastic set, it suffices to do so under the additional hypothesis that the events {E_n} are disjoint. (Note that this is quite different from the situation with sheaves over a topological space, in which the analogous gluing axiom is often trivial in the disjoint case but has non-trivial content in the overlapping case. This is ultimately because a {\sigma}-algebra is closed under all Boolean operations, whereas a topology is only closed under union and intersection.)

Let us illustrate the concept of a stochastic set with some examples.

Example 1 (Discrete case) A simple case arises when {\Omega} is a discrete space which is at most countable. Assign a set {X_\omega} to each {\omega \in \Omega}, with {X_\omega} a singleton if {\mu(\{\omega\})=0}. One then sets {\Gamma(X|E) := \prod_{\omega \in E} X_\omega}, with the obvious restriction maps, giving rise to a stochastic set {X|\Omega}. (Thus, a local element {x} of {\Gamma(X|E)} can be viewed as a map {\omega \mapsto x(\omega)} on {E} that takes values in {X_\omega} for each {\omega \in E}.) Conversely, it is not difficult to see that any stochastic set over an at most countable discrete probability space {\Omega} is of this form up to isomorphism. In this case, one can think of {X|\Omega} as a bundle of sets {X_\omega} over each point {\omega} (of positive probability) in the base space {\Omega}. One can extend this bundle interpretation of stochastic sets to reasonably nice sample spaces {\Omega} (such as standard Borel spaces) and similarly reasonable {X}; however, I would like to avoid this interpretation in the formalism below in order to be able to easily work in settings in which {\Omega} and {X} are very “large” (e.g. not separable in any reasonable sense). Note that we permit some of the {X_\omega} to be empty, thus it can be possible for {\Gamma(X|\Omega)} to be empty whilst {\Gamma(X|E)} for some strict subevents {E} of {\Omega} to be non-empty. (This is analogous to how it is possible for a sheaf to have local sections but no global sections.) As such, the space {\Gamma(X|\Omega)} of global elements does not completely determine the stochastic set {X|\Omega}; one sometimes needs to localise to an event {E} in order to see the full structure of such a set. Thus it is important to distinguish between a stochastic set {X|\Omega} and its space {\Gamma(X|\Omega)} of global elements. 
(As such, it is a slight abuse of the axiom of extensionality to refer to global elements of {X|\Omega} simply as “elements”, but hopefully this should not cause too much confusion.)
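A toy realisation of Example 1 (all names hypothetical) makes the bundle picture, the restriction maps, and the disjoint case of the countable gluing axiom concrete:

```python
# Bundle of fibres over a finite discrete base; the fibre over "c" is empty,
# so no local element exists over any event containing "c".
fibres = {"a": {0, 1}, "b": {"p", "q"}, "c": set()}

def is_local_element(x, E):
    """x is an element of Gamma(X|E): a choice x(w) in the fibre X_w for w in E."""
    return set(x) == set(E) and all(x[w] in fibres[w] for w in E)

def restrict(x, E):
    """The restriction map Gamma(X|F) -> Gamma(X|E) for E a subset of F."""
    return {w: x[w] for w in E}

def glue(pieces):
    """Glue local elements over pairwise disjoint events (gluing axiom)."""
    out = {}
    for x in pieces:
        out.update(x)
    return out

x = {"a": 0, "b": "q"}
assert is_local_element(x, {"a", "b"})
assert restrict(x, {"a"}) == {"a": 0}
assert glue([{"a": 1}, {"b": "p"}]) == {"a": 1, "b": "p"}
# Local elements over {"a","b"} exist, but none over any event containing "c":
assert not any(is_local_element({"c": v}, {"c"}) for v in fibres["c"])
```

The last assertion illustrates the point made above: the stochastic set has local elements over some events but no global elements, so {\Gamma(X|\Omega)} alone does not determine {X|\Omega}.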

Example 2 (Measurable spaces as stochastic sets) Returning now to a general base space {\Omega}, any (deterministic) measurable space {X} gives rise to a stochastic set {X|\Omega}, with {\Gamma(X|E)} being defined as in the previous discussion as the measurable functions from {E} to {X} modulo almost everywhere equivalence (in particular, {\Gamma(X|E)} is a singleton set when {E} is null), with the usual restriction maps. The constraint of measurability on the maps {x: E \rightarrow X}, together with the quotienting by almost sure equivalence, means that {\Gamma(X|E)} is now more complicated than a plain Cartesian product {\prod_{\omega \in E} X_\omega} of fibres, but this still serves as a useful first approximation to what {\Gamma(X|E)} is for the purposes of developing intuition. Indeed, the measurability constraint is so weak (as compared for instance to topological or smooth constraints in other contexts, such as sheaves of continuous or smooth sections of bundles) that the intuition of essentially independent fibres is quite an accurate one, at least if one avoids consideration of an uncountable number of objects simultaneously.

Example 3 (Hilbert modules) This example is the one that motivated this post for me. Suppose that one has an extension {(\tilde \Omega, \tilde {\mathcal B}, \tilde \mu)} of the base space {(\Omega, {\mathcal B},\mu)}, thus we have a measurable factor map {\pi: \tilde \Omega \rightarrow \Omega} such that the pushforward of the measure {\tilde \mu} by {\pi} is equal to {\mu}. Then we have a conditional expectation operator {\pi_*: L^2(\tilde \Omega,\tilde {\mathcal B},\tilde \mu) \rightarrow L^2(\Omega,{\mathcal B},\mu)}, defined as the adjoint of the pullback map {\pi^*: L^2(\Omega,{\mathcal B},\mu) \rightarrow L^2(\tilde \Omega,\tilde {\mathcal B},\tilde \mu)}. As is well known, the conditional expectation operator also extends to a contraction {\pi_*: L^1(\tilde \Omega,\tilde {\mathcal B},\tilde \mu) \rightarrow L^1(\Omega,{\mathcal B}, \mu)}. We then define the Hilbert module {L^2(\tilde \Omega|\Omega)} to be the space of functions {f \in L^2(\tilde \Omega,\tilde {\mathcal B},\tilde \mu)} with {\pi_*(|f|^2) \in L^\infty( \Omega, {\mathcal B}, \mu )}; this is a Hilbert module over {L^\infty(\Omega, {\mathcal B}, \mu)} which is of particular importance in the Furstenberg-Zimmer structure theory of measure-preserving systems. We can then define the stochastic set {L^2_\pi(\tilde \Omega)|\Omega} by setting

\displaystyle  \Gamma(L^2_\pi(\tilde \Omega)|E) := L^2( \pi^{-1}(E) | E )

with the obvious restriction maps. In the case that {\Omega,\tilde \Omega} are standard Borel spaces, one can disintegrate {\tilde \mu} as an integral {\tilde \mu = \int_\Omega \nu_\omega\ d\mu(\omega)} of probability measures {\nu_\omega} (supported in the fibre {\pi^{-1}(\{\omega\})}), in which case this stochastic set can be viewed as having fibres {L^2( \tilde \Omega, \tilde {\mathcal B}, \nu_\omega )} (though if {\Omega} is not discrete, there are still some measurability conditions in {\omega} on the local and global elements that need to be imposed). However, I am interested in the case when {\Omega,\tilde \Omega} are not standard Borel spaces (in fact, I will take them to be algebraic probability spaces, as defined in this previous post), in which case disintegrations are not available. However, it appears that the stochastic analysis developed in this blog post can serve as a substitute for the tool of disintegration in this context.

We make the remark that if {X|\Omega} is a stochastic set and {E, F} are events that are equivalent up to null events, then one can identify {\Gamma(X|E)} with {\Gamma(X|F)} (through their common restriction to {\Gamma(X|(E \cap F))}, with the restriction maps now being bijections). As such, the notion of a stochastic set does not require the full structure of a concrete probability space {(\Omega, {\mathcal B}, {\mathbf P})}; one could also have defined the notion using only the abstract {\sigma}-algebra consisting of {{\mathcal B}} modulo null events as the base space, or equivalently one could define stochastic sets over the algebraic probability spaces defined in this previous post. However, we will stick with the classical formalism of concrete probability spaces here so as to keep the notation reasonably familiar.

As a corollary of the above observation, we see that if the base space {\Omega} has total measure {0}, then all stochastic sets are trivial (they are just points).

Exercise 2 If {X|\Omega} is a stochastic set, show that there exists an event {\Omega'} with the property that for any event {E}, {\Gamma(X|E)} is non-empty if and only if {E} is contained in {\Omega'} modulo null events. (In particular, {\Omega'} is unique up to null events.) Hint: consider the numbers {\mu( E )} for {E} ranging over all events with {\Gamma(X|E)} non-empty, and form a maximising sequence for these numbers. Then use all three axioms of a stochastic set.

One can now start to take many of the fundamental objects, operations, and results in set theory (and, hence, in most other categories of mathematics) and establish analogues relative to a finite measure space. Implicitly, what we will be doing in the next few paragraphs is endowing the category of stochastic sets with the structure of an elementary topos. However, to keep things reasonably concrete, we will not explicitly emphasise the topos-theoretic formalism here, although it is certainly lurking in the background.

Firstly, we define a stochastic function {f: X|\Omega \rightarrow Y|\Omega} between two stochastic sets {X|\Omega, Y|\Omega} to be a collection of maps {f: \Gamma(X|E) \rightarrow \Gamma(Y|E)} for each {E \in {\mathcal B}} which form a natural transformation in the sense that {f(x|E) = f(x)|E} for all {x \in \Gamma(X|F)} and nested events {E \subset F}. In the case when {\Omega} is discrete and at most countable (and after deleting all null points), a stochastic function is nothing more than a collection of functions {f_\omega: X_\omega \rightarrow Y_\omega} for each {\omega \in \Omega}, with the function {f: \Gamma(X|E) \rightarrow \Gamma(Y|E)} then being a direct sum of the factor functions {f_\omega}:

\displaystyle  f( (x_\omega)_{\omega \in E} ) = ( f_\omega(x_\omega) )_{\omega \in E}.

Thus (in the discrete, at most countable setting, at least) stochastic functions do not mix together information from different states {\omega} in a sample space; the value of {f(x)} at {\omega} depends only on the value of {x} at {\omega}. The situation is a bit more subtle for continuous probability spaces, due to the identification of stochastic objects that agree almost surely; nevertheless it is still good intuition to think of stochastic functions as essentially being “pointwise” or “local” in nature.
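The direct-sum description of a stochastic function in the discrete case can be sketched as follows (hypothetical names; each state carries its own fibre map):

```python
# A stochastic function over a discrete base is a family of fibrewise maps
# f_omega: X_omega -> Y_omega, applied to a local element state-by-state.
fibre_maps = {"a": lambda n: n + 1, "b": str.upper}

def apply_stochastic(f, x):
    """Evaluate the direct sum of the fibre maps f on a local element x."""
    return {w: f[w](x[w]) for w in x}

x = {"a": 41, "b": "ok"}
print(apply_stochastic(fibre_maps, x))  # {'a': 42, 'b': 'OK'}
```

The locality is visible in the code: the value at state `"a"` is computed without ever looking at state `"b"`, matching the observation that stochastic functions do not mix information between states.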

One can now form the stochastic set {\hbox{Hom}(X \rightarrow Y)|\Omega} of functions from {X|\Omega} to {Y|\Omega}, by setting {\Gamma(\hbox{Hom}(X \rightarrow Y)|E)} for any event {E} to be the set of local stochastic functions {f: X|E \rightarrow Y|E} of the localisations of {X|\Omega, Y|\Omega} to {E}; this is a stochastic set if we use the obvious restriction maps. In the case when {\Omega} is discrete and at most countable, the fibre {\hbox{Hom}(X \rightarrow Y)_\omega} at a point {\omega} of positive measure is simply the set {Y_\omega^{X_\omega}} of functions from {X_\omega} to {Y_\omega}.

In a similar spirit, we say that one stochastic set {Y|\Omega} is a (stochastic) subset of another {X|\Omega}, and write {Y|\Omega \subset X|\Omega}, if we have a stochastic inclusion map, thus {\Gamma(Y|E) \subset \Gamma(X|E)} for all events {E}, with the restriction maps being compatible. We can then define the power set {2^X|\Omega} of a stochastic set {X|\Omega} by setting {\Gamma(2^X|E)} for any event {E} to be the set of all stochastic subsets {Y|E} of {X|E} relative to {E}; it is easy to see that {2^X|\Omega} is a stochastic set with the obvious restriction maps (one can also identify {2^X|\Omega} with {\hbox{Hom}(X \rightarrow \{\hbox{true},\hbox{false}\})|\Omega} in the obvious fashion). Again, when {\Omega} is discrete and at most countable, the fibre of {2^X|\Omega} at a point {\omega} of positive measure is simply the deterministic power set {2^{X_\omega}}.
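The power-set fibre computation can likewise be checked by brute force in the discrete model (an illustrative sketch, not part of the formal development):

```python
from itertools import combinations

def powerset(s):
    """All subsets of a finite set, as a list of frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Fibres of a discrete stochastic set X|Omega (states of positive measure only).
X_fibres = {0: {'a', 'b'}, 1: {'c'}}

# The fibre of 2^X at omega is the deterministic power set 2^{X_omega}.
powerset_fibres = {w: powerset(F) for w, F in X_fibres.items()}
assert len(powerset_fibres[0]) == 2 ** 2
assert set(powerset_fibres[1]) == {frozenset(), frozenset({'c'})}
```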

Note that if {f: X|\Omega \rightarrow Y|\Omega} is a stochastic function and {Y'|\Omega} is a stochastic subset of {Y|\Omega}, then the inverse image {f^{-1}(Y')|\Omega}, defined by setting {\Gamma(f^{-1}(Y')|E)} for any event {E} to be the set of those {x \in \Gamma(X|E)} with {f(x) \in \Gamma(Y'|E)}, is a stochastic subset of {X|\Omega}. In particular, given a {k}-ary relation {R: X_1 \times \dots \times X_k|\Omega \rightarrow \{\hbox{true}, \hbox{false}\}|\Omega}, the inverse image {R^{-1}( \{ \hbox{true} \}|\Omega )} is a stochastic subset of {X_1 \times \dots \times X_k|\Omega}, which by abuse of notation we denote as

\displaystyle  \{ (x_1,\dots,x_k) \in X_1 \times \dots \times X_k: R(x_1,\dots,x_k) \hbox{ is true} \}|\Omega.

In a similar spirit, if {X'|\Omega} is a stochastic subset of {X|\Omega} and {f: X|\Omega \rightarrow Y|\Omega} is a stochastic function, we can define the image {f(X')|\Omega} by setting {\Gamma(f(X')|E)} to be the set of those {f(x)} with {x \in \Gamma(X'|E)}; one easily verifies that this is a stochastic subset of {Y|\Omega}.

Remark 2 One should caution that in the definition of the subset relation {Y|\Omega \subset X|\Omega}, it is important that {\Gamma(Y|E) \subset \Gamma(X|E)} for all events {E}, not just the global event {\Omega}; in particular, the fact that a stochastic set {X|\Omega} has no global sections does not mean that it is contained in the stochastic empty set {\emptyset|\Omega}.

Now we discuss Boolean operations on stochastic subsets of a given stochastic set {X|\Omega}. Given two stochastic subsets {X_1|\Omega, X_2|\Omega} of {X|\Omega}, the stochastic intersection {(X_1 \cap X_2)|\Omega} is defined by setting {\Gamma((X_1 \cap X_2)|E)} to be the set of {x \in \Gamma(X|E)} that lie in both {\Gamma(X_1|E)} and {\Gamma(X_2|E)}:

\displaystyle  \Gamma((X_1 \cap X_2)|E) := \Gamma(X_1|E) \cap \Gamma(X_2|E).

This is easily verified to again be a stochastic subset of {X|\Omega}. More generally one may define stochastic countable intersections {(\bigcap_{n=1}^\infty X_n)|\Omega} for any sequence {X_n|\Omega} of stochastic subsets of {X|\Omega}. One could extend this definition to uncountable families if one wished, but I would advise against it, because some of the usual laws of Boolean algebra (e.g. the de Morgan laws) may break down in this setting.

Stochastic unions are a bit more subtle. The set {\Gamma((X_1 \cup X_2)|E)} should not be defined to simply be the union of {\Gamma(X_1|E)} and {\Gamma(X_2|E)}, as this would not respect the gluing axiom. Instead, we define {\Gamma((X_1 \cup X_2)|E)} to be the set of all {x \in \Gamma(X|E)} such that one can cover {E} by measurable subevents {E_1,E_2} such that {x|E_i \in \Gamma(X_i|E_i)} for {i=1,2}; then {(X_1 \cup X_2)|\Omega} may be verified to be a stochastic subset of {X|\Omega}. Thus for instance {\{0,1\}|\Omega} is the stochastic union of {\{0\}|\Omega} and {\{1\}|\Omega}. Similarly for countable unions {(\bigcup_{n=1}^\infty X_n)|\Omega} of stochastic subsets {X_n|\Omega} of {X|\Omega}, although uncountable unions are extremely problematic (they are disliked by both the measure theory and the countable gluing axiom) and will not be defined here. Finally, the stochastic difference set {\Gamma((X_1 \backslash X_2)|E)} is defined as the set of all {x|E} in {\Gamma(X_1|E)} such that {x|F \not \in \Gamma(X_2|F)} for any subevent {F} of {E} of positive probability. One may verify that in the case when {\Omega} is discrete and at most countable, these Boolean operations correspond to the classical Boolean operations applied separately to each fibre {X_{i,\omega}} of the relevant sets {X_i}. We also leave as an exercise to the reader to verify the usual laws of Boolean arithmetic, e.g. the de Morgan laws, provided that one works with at most countable unions and intersections.
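In the discrete model the Boolean operations just described act fibre by fibre, and one can verify a de Morgan law mechanically. A toy sketch (illustrative names only, with stochastic subsets modelled as dicts {\omega \mapsto} subset of {X_\omega}):

```python
# Fibrewise Boolean operations on stochastic subsets of a discrete sample space.
def st_union(A, B):        return {w: A[w] | B[w] for w in A}
def st_intersection(A, B): return {w: A[w] & B[w] for w in A}
def st_difference(A, B):   return {w: A[w] - B[w] for w in A}

# Two stochastic subsets of an ambient stochastic set X|Omega, Omega = {0, 1}.
X = {0: {1, 2, 5, 9}, 1: {3, 4}}
A = {0: {1, 2},       1: {3}}
B = {0: {2, 5},       1: set()}

assert st_union(A, B) == {0: {1, 2, 5}, 1: {3}}
assert st_intersection(A, B) == {0: {2}, 1: set()}

# A de Morgan law, fibre by fibre: X \ (A u B) = (X \ A) n (X \ B).
lhs = st_difference(X, st_union(A, B))
rhs = st_intersection(st_difference(X, A), st_difference(X, B))
assert lhs == rhs
```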

One can also consider a stochastic finite union {(\bigcup_{n=1}^N X_n)|\Omega} in which the number {N} of sets in the union is itself stochastic. More precisely, let {X|\Omega} be a stochastic set, let {N \in {\bf N}|\Omega} be a stochastic natural number, and let {n \mapsto X_n|\Omega} be a stochastic function from the stochastic set {\{ n \in {\bf N}_+: n \leq N\}|\Omega} (defined by setting {\Gamma(\{n \in {\bf N}_+: n\leq N\}|E) := \{ n \in {\bf N}_+|E: n \leq N|E\}}) to the stochastic power set {2^X|\Omega}. Here we are considering {0} to be a natural number, to allow for unions that are possibly empty, with {{\bf N}_+ := {\bf N} \backslash \{0\}} used for the positive natural numbers. We also write {(X_n)_{n=1}^N|\Omega} for the stochastic function {n \mapsto X_n|\Omega}. Then we can define the stochastic union {\bigcup_{n=1}^N X_n|\Omega} by setting {\Gamma(\bigcup_{n=1}^N X_n|E)} for an event {E} to be the set of local elements {x \in \Gamma(X|E)} with the property that there exists a covering of {E} by measurable subevents {E_{n_0}} for {n_0 \in {\bf N}_+}, such that one has {n_0 \leq N|E_{n_0}} and {x|E_{n_0} \in \Gamma(X_{n_0}|E_{n_0})}. One can verify that {\bigcup_{n=1}^N X_n|\Omega} is a stochastic set (with the obvious restriction maps). Again, in the model case when {\Omega} is discrete and at most countable, the fibre {(\bigcup_{n=1}^N X_n)_\omega} is what one would expect it to be, namely {\bigcup_{n=1}^{N(\omega)} (X_n)_\omega}.
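The stochastically-varying length {N(\omega)} can be seen in a discrete sketch (illustrative only; for simplicity each {X_n} below is defined on every fibre, even where {n > N(\omega)}):

```python
# A stochastic finite union with stochastic length N, computed fibrewise over
# Omega = {0, 1}.  N may vanish on some states, giving an empty union there.
N = {0: 2, 1: 0}
X_n = {1: {0: {1},    1: set()},
       2: {0: {2, 3}, 1: set()}}

union = {w: set().union(*(X_n[n][w] for n in range(1, N[w] + 1))) for w in N}
assert union == {0: {1, 2, 3}, 1: set()}
```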

The Cartesian product {(X \times Y)|\Omega} of two stochastic sets may be defined by setting {\Gamma((X \times Y)|E) := \Gamma(X|E) \times \Gamma(Y|E)} for all events {E}, with the obvious restriction maps; this is easily seen to be another stochastic set. This lets one define the concept of a {k}-ary operation {f: (X_1 \times \dots \times X_k)|\Omega \rightarrow Y|\Omega} from {k} stochastic sets {X_1,\dots,X_k} to another stochastic set {Y}, or a {k}-ary relation {R: (X_1 \times \dots \times X_k)|\Omega \rightarrow \{\hbox{true}, \hbox{false}\}|\Omega}. In particular, given {x_i \in X_i|\Omega} for {i=1,\dots,k}, the relation {R(x_1,\dots,x_k)} may be deterministically true, deterministically false, or have some other stochastic truth value.

Remark 3 In the degenerate case when {\Omega} is null, stochastic logic becomes a bit weird: all stochastic statements are deterministically true, as are their stochastic negations, since every event in {\Omega} (even the empty set) now holds with full probability. Among other pathologies, the empty set now has a global element over {\Omega} (this is analogous to the notorious convention {0^0=1}), and any two deterministic objects {x,y} become equal over {\Omega}: {x|\Omega=y|\Omega}.

The following simple observation is crucial to subsequent discussion. If {(x_n)_{n \in {\bf N}_+}} is a sequence taking values in the global elements {\Gamma(X|\Omega)} of a stochastic space {X|\Omega}, then we may also define global elements {x_n \in \Gamma(X|\Omega)} for stochastic indices {n \in {\bf N}_+|\Omega} as well, by appealing to the countable gluing axiom to glue together {x_{n_0}} restricted to the set {\{ \omega \in \Omega: n(\omega) = n_0\}} for each deterministic natural number {n_0} to form {x_n}. With this definition, the map {n \mapsto x_n} is a stochastic function from {{\bf N}_+|\Omega} to {X|\Omega}; indeed, this creates a one-to-one correspondence between external sequences (maps {n \mapsto x_n} from {{\bf N}_+} to {\Gamma(X|\Omega)}) and stochastic sequences (stochastic functions {n \mapsto x_n} from {{\bf N}_+|\Omega} to {X|\Omega}). Similarly with {{\bf N}_+} replaced by any other at most countable set. This observation will be important in allowing many deterministic arguments involving sequences to be carried over to the stochastic setting.
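In the discrete model, the gluing in this observation is just evaluation of the {n(\omega)}-th term on each fibre; a toy sketch (illustrative names only):

```python
# Glue x_{n_0} over {omega: n(omega) = n_0} to evaluate a sequence of global
# elements at a *stochastic* index n.  Here a global element over Omega = {0, 1}
# is a dict omega -> value.

def glue(sequence, stochastic_index):
    """sequence: n -> global element; stochastic_index: dict omega -> n."""
    return {w: sequence(n)[w] for w, n in stochastic_index.items()}

# Global elements of R|Omega: x_n(omega) = n + omega / 10.
x = lambda n: {0: n + 0.0, 1: n + 0.1}
n = {0: 3, 1: 7}            # a stochastic positive natural number

x_n = glue(x, n)
assert x_n == {0: 3.0, 1: 7.1}
```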

We now specialise from the extremely broad discipline of set theory to the more focused discipline of real analysis. There are two fundamental axioms that underlie real analysis (and in particular distinguish it from real algebra). The first is the Archimedean property, which we phrase in the “no infinitesimal” formulation as follows:

Proposition 2 (Archimedean property) Let {x \in {\bf R}} be such that {x \leq 1/n} for all positive natural numbers {n}. Then {x \leq 0}.

The other is the least upper bound axiom:

Proposition 3 (Least upper bound axiom) Let {S} be a non-empty subset of {{\bf R}} which has an upper bound {M \in {\bf R}}, thus {x \leq M} for all {x \in S}. Then there exists a unique real number {\sup S \in {\bf R}} with the following properties:

  • {x \leq \sup S} for all {x \in S}.
  • For any real {L < \sup S}, there exists {x \in S} such that {L < x \leq \sup S}.
  • {\sup S \leq M}.

Furthermore, {\sup S} does not depend on the choice of {M}.

The Archimedean property extends easily to the stochastic setting:

Proposition 4 (Stochastic Archimedean property) Let {x \in \Gamma({\bf R}|\Omega)} be such that {x \leq 1/n} for all deterministic natural numbers {n}. Then {x \leq 0}.

Remark 4 Here, incidentally, is one place in which this stochastic formalism deviates from the nonstandard analysis formalism, as the latter certainly permits the existence of infinitesimal elements. On the other hand, we caution that stochastic real numbers are permitted to be unbounded, so the alternative formulation of the Archimedean property (that every real number is bounded above by some natural number) is not valid in the stochastic setting.

The proof is easy and is left to the reader. The least upper bound axiom also extends nicely to the stochastic setting, but the proof requires more work (in particular, our argument uses the monotone convergence theorem):

Theorem 5 (Stochastic least upper bound axiom) Let {S|\Omega} be a stochastic subset of {{\bf R}|\Omega} which has a global upper bound {M \in {\bf R}|\Omega}, thus {x \leq M} for all {x \in \Gamma(S|\Omega)}, and is globally non-empty in the sense that there is at least one global element {x \in \Gamma(S|\Omega)}. Then there exists a unique stochastic real number {\sup S \in \Gamma({\bf R}|\Omega)} with the following properties:

  • {x \leq \sup S} for all {x \in \Gamma(S|\Omega)}.
  • For any stochastic real {L < \sup S}, there exists {x \in \Gamma(S|\Omega)} such that {L < x \leq \sup S}.
  • {\sup S \leq M}.

Furthermore, {\sup S} does not depend on the choice of {M}.

For future reference, we note that the same result holds with {{\bf R}} replaced by {{\bf N} \cup \{+\infty\}} throughout, since the latter may be embedded in the former, for instance by mapping {n} to {1 - \frac{1}{n+1}} and {+\infty} to {1}. In applications, the above theorem serves as a reasonable substitute for the countable axiom of choice, which does not appear to hold in unrestricted generality relative to a measure space; in particular, it can be used to generate various extremising sequences for stochastic functionals on various stochastic function spaces.

Proof: Uniqueness is clear (using the Archimedean property), as well as the independence on {M}, so we turn to existence. By using an order-preserving map from {{\bf R}} to {(-1,1)} (e.g. {x \mapsto \frac{2}{\pi} \hbox{arctan}(x)}) we may assume that {S|\Omega} is a subset of {(-1,1)|\Omega}, and that {M < 1}.

We observe that {\Gamma(S|\Omega)} is a lattice: if {x, y \in \Gamma(S|\Omega)}, then {\max(x,y)} and {\min(x,y)} also lie in {\Gamma(S|\Omega)}. Indeed, {\max(x,y)} may be formed by appealing to the countable gluing axiom to glue {y} (restricted to the set {\{ \omega \in \Omega: x(\omega) < y(\omega) \}}) with {x} (restricted to the set {\{ \omega \in \Omega: x(\omega) \geq y(\omega) \}}), and similarly for {\min(x,y)}. (Here we use the fact that relations such as {<} are Borel measurable on {{\bf R}}.)

Let {A \in {\bf R}} denote the deterministic quantity

\displaystyle  A := \sup \{ \int_\Omega x(\omega)\ d\mu(\omega): x \in \Gamma(S|\Omega) \}

then (by Proposition 3!) {A} is well-defined; here we use the hypothesis that {\mu(\Omega)} is finite. Thus we may find a sequence {(x_n)_{n \in {\bf N}}} of elements {x_n} of {\Gamma(S|\Omega)} such that

\displaystyle  \int_\Omega x_n(\omega)\ d\mu(\omega) \rightarrow A \hbox{ as } n \rightarrow \infty. \ \ \ \ \ (1)

Using the lattice property, we may assume that the {x_n} are non-decreasing: {x_n \leq x_m} whenever {n \leq m}. If we then define {\sup S(\omega) := \sup_n x_n(\omega)} (after choosing measurable representatives of each equivalence class {x_n}), then {\sup S} is a stochastic real with {\sup S \leq M}.

If {x \in \Gamma(S|\Omega)}, then {\max(x,x_n) \in \Gamma(S|\Omega)}, and so

\displaystyle  \int_\Omega \max(x,x_n)\ d\mu(\omega) \leq A.

From this and (1) we conclude that

\displaystyle  \int_\Omega \max(x-x_n,0)\ d\mu(\omega) \rightarrow 0 \hbox{ as } n \rightarrow \infty.

From monotone convergence, we conclude that

\displaystyle  \int_\Omega \max(x-\sup S,0)\ d\mu(\omega) = 0

and so {x \leq \sup S}, as required.

Now let {L < \sup S} be a stochastic real. After choosing measurable representatives of each relevant equivalence class, we see that for almost every {\omega \in \Omega}, we can find a natural number {n(\omega)} with {x_{n(\omega)}(\omega) > L(\omega)}. If we choose {n(\omega)} to be the first such positive natural number when it exists, and (say) {1} otherwise, then {n} is a stochastic positive natural number and {L < x_n}. The claim follows. \Box
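The mechanism of the proof can be seen in miniature in the discrete model: with finitely many global sections, the lattice property lets one take pointwise maxima, and the resulting supremum dominates every section while maximising the integral. A sketch, with all data illustrative:

```python
# Discrete sketch of Theorem 5: Omega = {0, 1} with a finite measure mu, and
# two global sections of a stochastic subset S of R|Omega.
mu = {0: 0.5, 1: 0.5}
sections = [{0: 0.2, 1: 0.9}, {0: 0.7, 1: 0.1}]

# Lattice property: the pointwise max of sections is the candidate supremum.
sup_S = {w: max(s[w] for s in sections) for w in mu}
assert sup_S == {0: 0.7, 1: 0.9}

# sup_S dominates every section and maximises the integral, as in the proof.
integral = lambda f: sum(f[w] * mu[w] for w in mu)
assert all(s[w] <= sup_S[w] for s in sections for w in mu)
assert all(integral(s) <= integral(sup_S) for s in sections)
```

In the actual proof the maximising element is reached as a monotone limit rather than a finite maximum, which is where the monotone convergence theorem enters.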

Using Proposition 4 and Theorem 5, one can then revisit many of the other foundational results of deterministic real analysis, and develop stochastic analogues; we give some examples of this below the fold (focusing on the Heine-Borel theorem and a case of the spectral theorem). As an application of this formalism, we revisit some of the Furstenberg-Zimmer structural theory of measure-preserving systems, particularly that of relatively compact and relatively weakly mixing systems, and interpret them in this framework, basically as stochastic versions of compact and weakly mixing systems (though with the caveat that the shift map is allowed to act non-trivially on the underlying probability space). As this formalism is “point-free”, in that it avoids explicit use of fibres and disintegrations, it will be well suited for generalising this structure theory to settings in which the underlying probability spaces are not standard Borel, and the underlying groups are uncountable; I hope to discuss such generalisations in future blog posts.

Remark 5 Roughly speaking, stochastic real analysis can be viewed as a restricted subset of classical real analysis in which all operations have to be “measurable” with respect to the base space. In particular, indiscriminate application of the axiom of choice is not permitted, and one should largely restrict oneself to performing countable unions and intersections rather than arbitrary unions or intersections. Presumably one can formalise this intuition with a suitable “countable transfer principle”, but I was not able to formulate a clean and general principle of this sort, instead verifying various assertions about stochastic objects by hand rather than by direct transfer from the deterministic setting. However, it would be desirable to have such a principle, since otherwise one is faced with the tedious task of redoing all the foundations of real analysis (or whatever other base theory of mathematics one is going to be working in) in the stochastic setting by carefully repeating all the arguments.

More generally, topos theory is a good formalism for capturing precisely the informal idea of performing mathematics with certain operations, such as the axiom of choice, the law of the excluded middle, or arbitrary unions and intersections, being somehow “prohibited” or otherwise “restricted”.

— 1. Metric spaces relative to a finite measure space —

The definition of a metric space carries over in the obvious fashion to the stochastic setting:

Definition 6 (Stochastic metric spaces) A stochastic metric space {X|\Omega = (X|\Omega,d)} is defined to be a stochastic set {X|\Omega}, together with a stochastic function {d: X|\Omega \times X|\Omega \rightarrow [0,+\infty)|\Omega} that obeys the following axioms for each event {\Omega'}:

  • (Non-degeneracy) If {x,y \in X|\Omega'}, then {d(x,y)=0} if and only if {x=y}.
  • (Symmetry) If {x,y \in X|\Omega'}, then {d(x,y) = d(y,x)}.
  • (Triangle inequality) If {x,y,z \in X|\Omega'}, then {d(x,z) \leq d(x,y)+d(y,z)}.

Remark 6 One could potentially interpret the non-degeneracy axiom in two ways; either deterministically ({d(x,y)=0} is deterministically true if and only if {x=y} is deterministically true) or stochastically (“{d(x,y)=0} if and only if {x=y}” is deterministically true). However, it is easy to see by a gluing argument that the two interpretations are logically equivalent. Also, if {X|\Omega} is globally non-empty, then one only needs to verify the metric space axioms for {\Omega'=\Omega}, as one can then obtain the {\Omega' \subsetneq \Omega} cases by gluing with a global section on {\Omega \backslash \Omega'}. However, when {X|\Omega} has no global elements, it becomes necessary to work locally.

Note that if {(X,d)} is a deterministic measurable metric space (thus {X} is a measurable space equipped with a measurable metric {d}), then its stochastic counterpart {(X|\Omega, d)} is a stochastic metric space. (As usual, we do not attempt to interpret {(X|\Omega, d)} when there is no measurable structure present for {X}.) In the case of a discrete at most countable {\Omega} (and after deleting any points of measure zero), a stochastic metric space {(X|\Omega,d)} is essentially just a bundle {(X_\omega,d_\omega)_{\omega \in \Omega}} of metric spaces, with no relations constraining these metric spaces with each other (for instance, the cardinality of {X_\omega} may vary arbitrarily with {\omega}).

We extend the notion of convergence to stochastic metric spaces:

Definition 7 (Stochastic convergence) Let {X|\Omega = (X|\Omega,d)} be a stochastic metric space, and let {(x_n)_{n \in {\bf N}_+}} be a sequence in {\Gamma(X|\Omega)} (which, as discussed earlier, may be viewed as a stochastic function {n \mapsto x_n} from {{\bf N}_+|\Omega} to {X|\Omega}). Let {x} be an element of {\Gamma(X|\Omega)}.

  • We say that {x_n} stochastically converges to {x} if, for every stochastic real {\epsilon>0}, there exists a stochastic positive natural number {N \in {\bf N}_+|\Omega} such that {d(x_n,x) < \epsilon} for all stochastic positive natural numbers {n \in {\bf N}_+|\Omega} with {n \geq N}.
  • We say that {x_n} is stochastically Cauchy if, for every stochastic real {\epsilon>0}, there exists a stochastic positive natural number {N \in {\bf N}_+|\Omega} such that {d(x_n,x_m) < \epsilon} for all stochastic positive natural numbers {n,m \in {\bf N}_+|\Omega} with {n,m \geq N}.
  • We say that {X|\Omega} is stochastically complete if every stochastically Cauchy sequence is stochastically convergent, and furthermore for any event {\Omega'} in {\Omega}, any stochastically Cauchy sequence relative to {\Omega'} is stochastically convergent relative to {\Omega'}.

As usual, the additional localisation in the definition of stochastic completeness to an event {\Omega'} is needed to avoid a stochastic set being stochastically complete for the trivial reason that one of its fibres happens to be empty, so that there are no global elements of the stochastic set, only local elements. (This localisation is not needed for the notions of stochastic convergence or the stochastic Cauchy property, as these automatically are preserved by localisation.)

Exercise 3 Show that to verify stochastic convergence, it suffices to restrict attention to errors {\epsilon} of the form {\epsilon = 1/m} for deterministic positive natural numbers {m}. Similarly for the stochastically Cauchy property.

Exercise 4 Let {X} be a measurable metric space. Show that {X|\Omega} is stochastically complete if and only if {X} is complete. Thus for instance {{\bf R}|\Omega} is stochastically complete.

In the case when {\Omega} is discrete and at most countable, or when {X|\Omega} is the stochastic version of a deterministic measurable space {X}, stochastic convergence is just the familiar notion of almost sure convergence: {x_n \in \Gamma(X|\Omega)} converges stochastically to {x \in \Gamma(X|\Omega)} if and only if, for almost every {\omega \in \Omega}, {x_n(\omega)} converges to {x(\omega)} in {X_\omega}. There is no uniformity of convergence in the {\omega} parameter; such a uniformity could be imposed by requiring the quantity {N} in the above definition to be a deterministic natural number rather than a stochastic one, but we will not need this notion here. Similarly for the stochastic Cauchy property. Stochastic completeness in this context is then equivalent to completeness of {X_\omega} for each {\omega} that occurs with positive probability. (As noted previously, it is important here that we define stochastic completeness with localisation, in case some of the fibres {X_\omega} are empty.)
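The lack of uniformity in {\omega} can be made concrete in a two-state sketch, where the rate {N} is a genuinely stochastic natural number (all names illustrative):

```python
import math

# Discrete sketch of stochastic convergence: x_n -> 0 on each fibre of
# Omega = {0, 1}, but at state-dependent speeds, so the threshold N must be
# a stochastic natural number.  Per Exercise 3, tolerances eps = 1/k suffice.
Omega = (0, 1)
x_n = lambda n: {0: 1.0 / n, 1: 1.0 / math.sqrt(n)}
x = {0: 0.0, 1: 0.0}

def rate(k):
    """A stochastic N(omega) with d(x_n, x) < 1/k on each fibre for n >= N(omega)."""
    return {0: k + 1, 1: k * k + 1}

k = 10
N = rate(k)
assert N[1] > N[0]                       # no uniformity in omega
for w in Omega:
    assert abs(x_n(N[w])[w] - x[w]) < 1.0 / k
```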

In a stochastic metric space {X|\Omega}, we can form the balls {B(x,r)|\Omega} for any {x \in \Gamma(X|\Omega)} and stochastic real {r>0}, by setting {\Gamma(B(x,r)|E)} to be the set of all {y \in \Gamma(X|E)} such that {d(x,y) < r} locally on {E}; these are stochastic subsets of {X|\Omega} (indeed, {B(x,r)} is the inverse image of {\{ r' \in [0,+\infty): r' < r \}|\Omega} under the pinned distance map {y \mapsto d(x,y)}).

By chasing the definitions, we see that if {(x_n)_{n \in {\bf N}_+}} is a sequence of elements {x_n \in \Gamma(X|\Omega)} of a stochastic metric space {X|\Omega}, and {x} is an element of {\Gamma(X|\Omega)}, then {x_n} stochastically converges to {x} if and only if, for every stochastic {\epsilon>0}, there exists a stochastic positive natural number {N \in \Gamma({\bf N}_+|\Omega)} such that {x_n \in \Gamma(B(x,\epsilon)|\Omega)} for all stochastic positive natural numbers {n \geq N}.

Given a sequence {(x_n)_{n \in {\bf N}_+}} of elements {x_n \in \Gamma(X|\Omega)} of a stochastic metric space {X|\Omega}, we define a stochastic subsequence {(x_{n_j})_{j \in {\bf N}_+}} to be a sequence of the form {j \mapsto x_{n_j}}, where {(n_j)_{j \in {\bf N}_+}} is a sequence of stochastic natural numbers {n_j \in {\bf N}_+|\Omega}, which stochastically go to infinity in the following sense: for every stochastic positive natural number {N \in \Gamma({\bf N}_+|\Omega)}, there exists a stochastic positive natural number {J} such that {n_j \geq N} for all stochastic positive natural numbers {j \geq J}. Note that when {\Omega} is discrete and at most countable, this operation corresponds to selecting a subsequence {(x_{n_j(\omega),\omega})_{j \in {\bf N}_+}} of {(x_{n,\omega})_{n \in {\bf N}_+}} for each {\omega} occurring with positive probability, with the indices {n_j(\omega)} of the subsequence permitted to vary in {\omega}.
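In the discrete case this {\omega}-dependent index selection is easy to exhibit; here is an illustrative sketch in which one state thins the sequence and the other keeps it:

```python
# Omega = {0, 1}.  A sequence of global elements: oscillating on fibre 0,
# decaying on fibre 1.
x = lambda n: {0: (-1) ** n, 1: 1.0 / n}

# Stochastic subsequence indices n_j(omega): state 0 keeps only even indices
# (making the subsequence constant there); state 1 keeps every index.
n_j = lambda j: {0: 2 * j, 1: j}

subseq = lambda j: {w: x(n_j(j)[w])[w] for w in (0, 1)}
assert subseq(1) == {0: 1, 1: 1.0}
assert subseq(3) == {0: 1, 1: 1.0 / 3}

# The indices stochastically go to infinity: n_j(omega) >= j for every omega.
assert all(n_j(j)[w] >= j for j in range(1, 10) for w in (0, 1))
```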

Exercise 5 Let {(x_n)_{n \in{\bf N}_+}} be a sequence of elements {\Gamma(X|\Omega)} of a stochastic metric space {X|\Omega}, and let {x \in \Gamma(X|\Omega)}.

  • (i) Show that {x_n} can converge stochastically to at most one element of {\Gamma(X|\Omega)}.
  • (ii) Show that if {x_n} converges stochastically to {x}, then every stochastic subsequence {(x_{n_j})_{j \in {\bf N}_+}} also converges stochastically to {x}.
  • (iii) Show that if {x_n} is stochastically Cauchy, and some stochastic subsequence of {x_n} converges stochastically to {x}, then the entire sequence {x_n} converges stochastically to {x}.
  • (iv) Show that there exists an event {E}, unique up to null events, with the property that {x_n|E} converges stochastically to {x|E}, and that there exists a stochastic {\epsilon>0} over the complement {E^c} of {E}, with the property that for any stochastic natural number {N|E^c} on {E^c}, there exists {n|E^c \geq N|E^c} such that {d(x,x_n)|E^c \geq \epsilon}. (Informally, {E} is the set of states for which {x_n} converges to {x}; the key point is that this set is automatically measurable.)
  • (v) (Urysohn subsequence principle) Show that {x_n} converges stochastically to {x} if and only if every stochastic subsequence {(x_{n_j})_{j \in {\bf N}_+}} of {(x_n)_{n \in {\bf N}_+}} has a further stochastic subsequence {(x_{n_{j_k}})_{k \in {\bf N}_+}} that converges stochastically to {x}.

(Hint: All of these exercises can be established through consideration of the events {E_{n,k}} for {n,k \in {\bf N}_+}, defined up to null events as the event that {d( x_n, x ) < \frac{1}{k}} holds stochastically.)

Next, we define the stochastic counterpart of total boundedness.

Definition 8 (Total boundedness) A stochastic metric space {X|\Omega} is said to be stochastically totally bounded if, for every stochastic real {\epsilon>0}, there exists a stochastic natural number {N \in {\bf N}|\Omega} and a stochastic function {n \mapsto x_n} from the stochastic set {\{ n \in {\bf N}_+: n \leq N \}|\Omega} to {X|\Omega}, such that

\displaystyle  X|\Omega = \bigcup_{n=1}^N B(x_n,\epsilon)|\Omega.

(Note that we allow {N} to be zero locally or globally; thus for instance the empty set {\emptyset|\Omega} is considered to be totally bounded.) We will denote this stochastic function {n \mapsto x_n} as {(x_n)_{n=1}^N|\Omega}.

Exercise 6 If {\Omega} is discrete and at most countable, show that {X|\Omega} is stochastically totally bounded if and only if for each {\omega} of positive probability, the fibre {X_\omega} is totally bounded in the deterministic sense.
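The fibrewise picture of Exercise 6 can be sketched numerically: each fibre admits its own finite {\epsilon}-net, of stochastically varying size {N(\omega)}. All data below is illustrative.

```python
# Check whether a finite list of centers is an eps-net for a fibre of reals.
def has_eps_net(points, centers, eps):
    return all(any(abs(p - c) < eps for c in centers) for p in points)

# Two fibres of different "size"; the net size N(omega) varies with the state.
X_fibres = {0: [0.0, 0.05, 0.1], 1: [0.0, 0.3, 0.6, 0.9]}
eps = 0.2
nets = {0: [0.05], 1: [0.15, 0.45, 0.75]}   # N(0) = 1, N(1) = 3

for w, fibre in X_fibres.items():
    assert has_eps_net(fibre, nets[w], eps)
```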

Exercise 7 Show that to verify the stochastic total boundedness of a stochastic metric space, it suffices to do so for parameters {\epsilon} of the form {\epsilon=1/k} for some deterministic positive natural number {k}.

We have a stochastic version of (a fragment of) the Heine-Borel theorem:

Theorem 9 (Stochastic Heine-Borel theorem) Let {X|\Omega} be a stochastic metric space. Then the following are equivalent:

  • (i) {X|\Omega} is stochastically complete and stochastically totally bounded.
  • (ii) Every sequence in {\Gamma(X|\Omega)} has a stochastic subsequence that is stochastically convergent. Furthermore, for any event {\Omega'}, every sequence in {\Gamma(X|\Omega')} has a stochastic subsequence that is stochastically convergent relative to {\Omega'}.

As with the definition of stochastic completeness, the second part of (ii) is necessary: if for instance {\Omega} is discrete and countable, and one of the fibres {X_\omega} happens to be empty, then there are no global elements of {X|\Omega} and the first part of (ii) becomes trivially true, even if other fibres {X_\omega} fail to be complete or totally bounded.

Inspired by the above theorem, we will call a stochastic metric space {X|\Omega} stochastically compact if (i) or (ii) holds. Note that this only recovers a fragment of the deterministic Heine-Borel theorem, as the characterisation of compactness in terms of open covers is missing. I was not able to set up a characterisation of this form, since one was only allowed to use countable unions; but perhaps some version of this characterisation can be salvaged.

Proof: The basic idea here is to mimic the classical proof of this fragment of the Heine-Borel theorem, taking care to avoid any internal appeal to the countable axiom of choice in order to keep everything measurable. (However, we can and will use the axiom of countable choice externally in the ambient set theory.)

Suppose first that {X|\Omega} fails to be stochastically complete. Then we can find an event {\Omega'} and a stochastically Cauchy sequence {x_n \in \Gamma(X|\Omega')} for {n \in {\bf N}} that fails to be stochastically convergent in {\Gamma(X|\Omega')}. By Exercise 5, no stochastic subsequence of {x_n} can be stochastically convergent in {X|\Omega'} either, and so (ii) fails.

Now suppose that {X|\Omega} fails to be stochastically totally bounded. Then one can find a stochastic real number {\epsilon>0}, such that it is not possible to find any stochastic natural number {N \in \Gamma({\bf N}|\Omega)} and a stochastic sequence {(x_n)_{n=1}^N|\Omega} (that is, a stochastic function {n \mapsto x_n} from {\{ n \in {\bf N}_+: n \leq N \}|\Omega} to {X|\Omega}), such that

\displaystyle  X|\Omega = \bigcup_{n=1}^N B(x_n,\epsilon)|\Omega.

(By Exercise 7 one could take {\epsilon=1/k} for a deterministic positive natural {k}, but we will not need to do so here.)

Let {S} be the set of those {N \in \Gamma({\bf N}|\Omega)} for which one can find a stochastic sequence {(x_n)_{n=1}^N|\Omega} which is {\epsilon}-separated in the sense that {d(x_n,x_m) \geq \epsilon} for all distinct {n,m \in {\bf N}_+|\Omega} with {n,m \leq N}, and more generally for any event {\Omega'}, that {d(x_n,x_m) \geq \epsilon} relative to {\Omega'} for all distinct {n,m \in{\bf N}_+|\Omega'} with {n,m \leq N|\Omega'}. (We need to relativise to {\Omega'} here to properly manage the case that {N} sometimes vanishes.) It is easy to see that {S} can be given the structure of a stochastic subset of {{\bf N}|\Omega}, and contains {0|\Omega}. By Theorem 5, there is thus a well-defined supremum {\sup S \in ({\bf N} \cup \{+\infty\})|\Omega}. We claim that {\sup S} is stochastically infinite with positive probability. Suppose for contradiction that this were not the case, then {\sup S \in {\bf N}|\Omega}. By definition of supremum (taking {L = \max(\sup S - 1, 0)} in Theorem 5), we conclude that {\sup S \in S}, thus there exists a stochastic sequence {(x_n)_{n=1}^{\sup S}|\Omega} which is {\epsilon}-separated. We now claim that

\displaystyle  X|\Omega = \bigcup_{n=1}^{\sup S} B(x_n,\epsilon)|\Omega, \ \ \ \ \ (2)

which contradicts the hypothesis that {X|\Omega} is not stochastically totally bounded. Indeed, if (2) failed, then there must exist some local element {x \in \Gamma(X|\Omega')} of {X|\Omega} which does not lie in {\Gamma(\bigcup_{n=1}^{\sup S} B(x_n,\epsilon)|\Omega')}. In particular, there must exist an event {\Omega'' \subset \Omega'} of positive probability such that {d(x,x_n) \geq \epsilon} on {\Omega''} for all {n \in {\bf N}_+|\Omega''} with {n \leq \sup S|\Omega''}. If we then define {x_{\sup S + 1}} on {\Omega''} by {x_{\sup S + 1} := x|\Omega''}, then we see that {n \mapsto x_n} on {\{ n \in {\bf N}_+: n \leq \sup S+1\}|\Omega''} is {\epsilon}-separated on {\Omega''}, and on gluing with the original {n \mapsto x_n} on the complement of {\Omega''}, we see that {\sup S + 1_{\Omega''}} lies in {S}, contradicting the maximal nature of {\sup S}. Thus {\sup S} is stochastically infinite with positive probability.

We may now pass to an event {\Omega'} of positive probability on which {\sup S = +\infty}. By definition of the supremum, we conclude that for every deterministic natural number {N}, we may find a sequence {x_{N,1},\dots,x_{N,N} \in \Gamma(X|\Omega')} which are {\epsilon}-separated. Observe that if {n < N} is a deterministic natural number and we have elements {y_1,\dots,y_n \in \Gamma(X|\Omega')}, then we can find a stochastic {m \leq N} on {\Omega'} such that {d(x_{N,m}, y_i) \geq \epsilon/2} for all {i=1,\dots,n}, since each {x_{N,m}} can stochastically lie within {\epsilon/2} of at most one {y_i}. (To see this rigorously, one can consider the Boolean geometry of the events on which {d(x_{N,n},y_i) \geq \epsilon/2} stochastically hold for various {n,i}.) By iterating this construction (and applying the axiom of countable choice externally), we may find an infinite sequence {y_1,y_2,\dots} in {\Gamma(X|\Omega')} which is {\epsilon/2}-separated, but then this sequence cannot have a convergent subsequence, and so (ii) fails.
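The greedy step just used (with {n} points {y_i} and {N > n} separated points, some {x_{N,m}} stays {\epsilon/2}-far from all the {y_i}) can be sketched deterministically on the real line; this toy version ignores the stochastic bookkeeping and uses illustrative data only:

```python
# Among N eps-separated points, at most one can lie within eps/2 of each y_i,
# so with n < N points y_i, some separated point is eps/2-far from all of them.
def far_point(separated, ys, eps):
    for p in separated:
        if all(abs(p - y) >= eps / 2 for y in ys):
            return p
    return None

eps = 1.0
separated = [0.0, 1.0, 2.0, 3.0]   # pairwise eps-separated, N = 4
ys = [0.1, 2.2]                    # n = 2 < N
p = far_point(separated, ys, eps)
assert p is not None
assert all(abs(p - y) >= eps / 2 for y in ys)
```

When {n = N} the pigeonhole argument gives nothing, and indeed the search can fail, which is why the hypothesis {n < N} matters.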

Now suppose that {X|\Omega} is stochastically complete and stochastically totally bounded, and let {(x_n)_{n \in {\bf N}_+}} be a sequence of local elements {x_n \in \Gamma(X|\Omega')} of {X|\Omega} for some event {\Omega'}, which we may assume without loss of generality to have positive probability. By stochastic completeness, it suffices to find a stochastic subsequence {(x_{n_j})_{j \in {\bf N}_+}} which is stochastically Cauchy on {\Omega'}.

By stochastic total boundedness, one can find a stochastic natural number {M_1 \in {\bf N}|\Omega'} and a stochastic map {m \mapsto y_{1,m}} from {\{ m \in {\bf N}_+: m \leq M_1\}|\Omega'} to {X|\Omega'}, such that

\displaystyle  X|\Omega' = \bigcup_{m=1}^{M_1} B(y_{1,m},1)|\Omega'. \ \ \ \ \ (3)

For each pair of deterministic positive natural numbers {m, n \in {\bf N}_+}, we define {E_{n,m}} to be the event in {\Omega'} that the assertions {m \leq M_1} and {d(x_n,y_{1,m}) < 1} both hold stochastically; this event is determined up to null events. From (3), we see that

\displaystyle  \bigcup_{m=1}^{M_1} E_{n,m} = \Omega'

holds up to null events for all {n \in {\bf N}_+}. In particular, we have

\displaystyle  \sum_{m=1}^{M_1} 1_{E_{n,m}} \geq 1

almost surely on {\Omega'} for all {n \in {\bf N}_+}, and so on summing in {n}

\displaystyle  \sum_{m=1}^{M_1} \sum_{n \in {\bf N}_+} 1_{E_{n,m}} = +\infty

almost surely on {\Omega'}. By selecting {m \in {\bf N}_+|\Omega'} stochastically to be the least {m} for which {\sum_{n \in {\bf N}_+} 1_{E_{n,m}}} is infinite, we have {m \leq M_1} and

\displaystyle  \sum_{n \in {\bf N}_+} 1_{E_{n,m}} = +\infty

almost surely on {\Omega'}. We can then stochastically choose a sequence {j_{1,1} < j_{1,2} < \dots} in {{\bf N}_+|\Omega'} such that {E_{n_{j_{1,i}},m}} holds almost surely on {\Omega'} for each {i \in {\bf N}_+}, or equivalently that the stochastic subsequence {(x_{n_{j_{1,i}}})_{i \in {\bf N}_+}} lies in {\Gamma( B(y_{1,m},1) | \Omega' )}. Writing {z_1 := y_{1,m}}, we have thus localised this stochastic subsequence to a stochastic unit ball {B(z_1,1)}.

By repeating this argument, we may find a further stochastic subsequence {(x_{n_{j_{2,i}}})_{i \in {\bf N}_+}} of {(x_{n_{j_{1,i}}})_{i \in {\bf N}_+}} that lies in {\Gamma( B(z_2,1/2 )|\Omega')} for some {z_2 \in \Gamma( X|\Omega')}, a yet further subsequence {(x_{n_{j_{3,i}}})_{i \in {\bf N}_+}} that lies in {\Gamma( B( z_3, 1/3)|\Omega' )} for some {z_3 \in \Gamma(X|\Omega')}, and so forth. It is then easy to see that the diagonal sequence {(x_{n_{j_{i,i}}})_{i \in {\bf N}_+}} is stochastically Cauchy, and the claim follows. \Box
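In the deterministic setting, the nested-ball and diagonalisation argument above can be illustrated numerically. The following sketch (not from the post; all names are hypothetical, with {[0,1]} standing in for a totally bounded space) covers the space at stage {k} by a finite {2^{-k}}-net, keeps a ball containing the most surviving terms of the sequence, and records one surviving index per stage; successive recorded terms then cluster at a geometric rate, mirroring the Cauchy property of the diagonal subsequence:

```python
import numpy as np

def refining_subsequence(xs, depth=6):
    """Mimic the nested-ball argument: at stage k, cover [0, 1] by a
    finite 2**-k-net, keep the ball capturing the most surviving terms,
    and record one surviving index per stage (a "diagonal" subsequence)."""
    idx = np.arange(len(xs))
    chosen = []
    for k in range(1, depth + 1):
        r = 2.0 ** -k
        centres = np.arange(0.0, 1.0 + r, r)          # finite r-net of [0, 1]
        counts = [np.sum(np.abs(xs[idx] - c) <= r) for c in centres]
        best = centres[int(np.argmax(counts))]
        idx = idx[np.abs(xs[idx] - best) <= r]        # survivors in the best ball
        chosen.append(int(idx[0]))
    return chosen

rng = np.random.default_rng(0)
xs = rng.uniform(size=1000)       # a sequence in the totally bounded space [0, 1]
sub = refining_subsequence(xs)
gaps = np.abs(np.diff(xs[sub]))   # successive terms cluster geometrically
```

Since the stage-{(k+1)} survivors lie inside the stage-{k} ball of radius {2^{-k}}, the {i}-th gap is at most {2^{-i}}, which is the quantitative form of the Cauchy property used above.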

For future reference, we remark that the above arguments also show that if {Y|\Omega} is a stochastically totally bounded subset of a stochastically complete metric space {X|\Omega}, then every sequence in {\Gamma(Y|\Omega)} has a stochastic subsequence which converges in {\Gamma(X|\Omega)}.

— 2. Hilbert-Schmidt operators relative to a finite measure space —

One could continue developing stochastic versions of other fundamental results in real analysis (for instance, working out the basic theory of stochastic continuous functions between metric spaces); roughly speaking, it appears that most of these results will go through as long as one does not require the concept of an uncountable union or intersection or the axiom of choice (in particular, I do not see how to develop a stochastic theory of arbitrary topological spaces, although the first countable case may be doable; also, any result reliant on the Hahn-Banach theorem or the non-sequential version of Tychonoff’s theorem will likely not have a good stochastic analogue). I will however focus on the results leading up to the stochastic version of the spectral theorem for Hilbert-Schmidt operators, as this is the application that motivated my post.

Let us first define the concept of a stochastic (real) Hilbert space, in more or less complete analogy with the deterministic counterpart:

Definition 10 (Stochastic Hilbert spaces) A stochastic vector space is a stochastic set {V|\Omega} equipped with an element {0 \in \Gamma(V|\Omega)}, an addition map {+: V|\Omega \times V|\Omega \rightarrow V|\Omega}, and a scalar multiplication map {\cdot: {\bf R}|\Omega \times V|\Omega \rightarrow V|\Omega} which obey the usual vector space axioms. In other words, when localising to any event {E}, the addition map {+: \Gamma(V|E) \times \Gamma(V|E) \rightarrow \Gamma(V|E)} is commutative and associative with identity {0|E}, and the scalar multiplication map {\cdot: \Gamma({\bf R}|E) \times \Gamma(V|E) \rightarrow \Gamma(V|E)} is bilinear over {\Gamma({\bf R}|E)}. That is to say, {\Gamma(V|E)} is a module over the commutative ring {\Gamma({\bf R}|E)}. As is usual, we define the subtraction map {-: V|\Omega \times V|\Omega \rightarrow V|\Omega} by the formula {v-w := v + (-1) \cdot w}.

A stochastic inner product space is a stochastic vector space {V|\Omega} equipped with an inner product map {\langle,\rangle: V|\Omega \times V|\Omega \rightarrow {\bf R}|\Omega} which obeys the following axioms for any event {E}:

  • (Symmetry) The map {\langle,\rangle: \Gamma(V|E) \times \Gamma(V|E) \rightarrow \Gamma({\bf R}|E)} is symmetric.
  • (Bilinearity) The map {\langle,\rangle: \Gamma(V|E) \times \Gamma(V|E) \rightarrow \Gamma({\bf R}|E)} is bilinear over {\Gamma({\bf R}|E)}.
  • (Positive definiteness) For any {v \in\Gamma(V|E)}, we have {\langle v,v \rangle \geq 0}, with equality if and only if {v=0}.

By repeating the usual deterministic arguments, it is easy to see that any stochastic inner product space becomes a stochastic metric space with {d(v,w) := \|v-w\|}, where {\|v\| := \langle v,v \rangle^{1/2}}.

A stochastic Hilbert space is a stochastic inner product space {H} which is also stochastically complete. We denote the inner product on such spaces by {\langle,\rangle_H} and the norm by {\| \|_H}.

As usual, in the model case when {\Omega} is discrete and at most countable, a stochastic Hilbert space {H|\Omega} is just a bundle of deterministic Hilbert spaces {H_\omega} for each {\omega \in \Omega} occurring with positive probability, with no relationships between the different fibres {H_\omega} (in particular, their dimensions may vary arbitrarily in {\omega}). In the continuous case, the notion of a stochastic Hilbert space {H|\Omega} is very closely related to that of a Hilbert module over the commutative Banach algebra {L^\infty(\Omega)}; indeed, it is easy to see that the space of global elements {v \in \Gamma(H|\Omega)} of a stochastic Hilbert space {H|\Omega} which are bounded in the sense that {\|v\| \leq M} for some deterministic real {M} forms a Hilbert module over {L^\infty(\Omega)}. (Without the boundedness restriction, one obtains instead a Hilbert module over {\Gamma({\bf R}|\Omega)}.)

Note that we do not impose any separability hypothesis on our Hilbert spaces. Despite this, much of the theory of Hilbert spaces turns out to still be of “countable complexity” in some sense, so that it can be extended to the stochastic setting without too much difficulty.

We now extend the familiar notion of an orthonormal system in a Hilbert space to the stochastic setting. A key point is that we allow the number of elements in this system to also be stochastic.

Definition 11 (Orthonormal system) Let {H|\Omega} be a stochastic Hilbert space. A stochastic orthonormal system {(e_n)_{n=1}^N|\Omega} in {H|\Omega} consists of a stochastic extended natural number {N \in {\bf N} \cup \{+\infty\}|\Omega}, together with a stochastic map {n \mapsto e_n} from {\{ n \in {\bf N}_+: n \leq N \}|\Omega} to {H|\Omega}, such that one has {\langle e_n, e_m \rangle = 1_{n=m}} on {E} for any event {E} and any {n, m \in {\bf N}_+|E} with {n,m \leq N|E}. (Note that we allow {N} to vanish with positive probability, so that the orthonormal system can be stochastically empty.)

Now we can define the notion of a stochastic Hilbert-Schmidt operator.

Definition 12 (Stochastic Hilbert-Schmidt operator) Let {H|\Omega} and {H'|\Omega} be stochastic Hilbert spaces. A stochastic linear operator {T: H|\Omega \rightarrow H'|\Omega} is a stochastic function such that for each event {E}, the localised maps {T: \Gamma(H|E) \rightarrow \Gamma(H'|E)} are linear over {\Gamma({\bf R}|E)}. Such an operator is said to be stochastically bounded if there exists a non-negative stochastic real {A \in \Gamma([0,+\infty)|\Omega)} such that one has

\displaystyle  \| Tv \|_{H'} \leq (A \|v\|_H)|E

for all events {E} and local elements {v \in \Gamma(H|E)}. By Theorem 5 (applied to negations, so that suprema become infima), there is a least such {A}, which we denote as {\|T\|_{B(H \rightarrow H')} \in \Gamma([0,+\infty)|\Omega)}.

Similarly, we say that a stochastic linear operator {T: H|\Omega \rightarrow H'|\Omega} is stochastically Hilbert-Schmidt if there exists a non-negative stochastic real {A \in \Gamma([0,+\infty)|\Omega)} such that one has

\displaystyle  \sum_{n=1}^N \sum_{m=1}^M |\langle T e_n, f_m \rangle|^2 \leq A^2|E

for all events {E} and all stochastic orthonormal systems {(e_n)_{n=1}^N|E, (f_m)_{m=1}^M|E} on {H|E} and {H'|E} respectively. Again, there is a least such {A}, which we denote as {\|T\|_{HS(H \rightarrow H')} \in \Gamma([0,+\infty)|\Omega)}.

A stochastic linear operator {T: H|\Omega \rightarrow H'|\Omega} is said to be stochastically compact if the image {T( \{ v \in H: \|v\| \leq 1 \} )|\Omega} of the unit ball {\{v \in H: \|v\| \leq 1\}|\Omega} is stochastically totally bounded in {H'|\Omega}.

Exercise 8 Show that any stochastic Hilbert-Schmidt operator {T: H|\Omega \rightarrow H'|\Omega} obeys the bound

\displaystyle  \sum_{n=1}^N \| T e_n \|_{H'}^2 \leq \|T\|_{HS(H \rightarrow H')}^2|E

for all events {E} and all stochastic orthonormal systems {(e_n)_{n=1}^N|E} on {H|E}. Conclude in particular that {T} is stochastically bounded with {\|T\|_{B(H \rightarrow H')} \leq \|T\|_{HS(H \rightarrow H')}}.
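In the deterministic, finite-dimensional model case, the Hilbert-Schmidt norm is the Frobenius norm of a matrix, and the conclusion of Exercise 8 is the familiar bound of the operator norm by the Hilbert-Schmidt norm. A quick numerical sketch (not part of the stochastic formalism; names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((5, 7))   # a linear map from R^7 to R^5

hs = np.linalg.norm(T, 'fro')     # Hilbert-Schmidt (Frobenius) norm
op = np.linalg.norm(T, 2)         # operator (spectral) norm

# the HS norm is independent of the orthonormal system used to compute it:
Q, _ = np.linalg.qr(rng.standard_normal((7, 7)))   # a random orthogonal matrix
hs_rotated = np.linalg.norm(T @ Q, 'fro')
```

One then has `op <= hs` and `hs_rotated == hs` up to rounding, matching the basis-independence implicit in Definition 12.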

We have the following basic fact:

Proposition 13 Any stochastic Hilbert-Schmidt operator {T: H|\Omega \rightarrow H'|\Omega} is stochastically compact.

Proof: Suppose for contradiction that we could find a stochastic Hilbert-Schmidt operator {T: H|\Omega \rightarrow H'|\Omega} which is not stochastically compact, thus {T( \{ v \in H: \|v\| \leq 1 \} )|\Omega} is not stochastically totally bounded. By repeating the arguments used in the proof of Theorem 9, this means that there exists an event {\Omega'} of positive probability, a stochastic real {\epsilon > 0} on {\Omega'}, and an infinite sequence {x_n \in \Gamma(\{ v \in H: \|v\| \leq 1 \}|\Omega')} for {n\in {\bf N}_+} such that the {Tx_n} are {\epsilon}-separated on {\Omega'}.

We will need a stronger separation property. Let us say that a sequence {y_m \in \Gamma(H'|\Omega')} is linearly {\epsilon/4}-separated if one has

\displaystyle  \| y_m - c_1 y_1 - \dots - c_{m-1} y_{m-1} \|_{H'} \geq \epsilon/4 \ \ \ \ \ (4)

on {\Omega'} for any deterministic {m \in{\bf N}_+} and any stochastic reals {c_1,\dots,c_{m-1} \in {\bf R}|\Omega'}. We claim that {\Gamma(T(\{ v \in H: \|v\| \leq 1 \})|\Omega')} contains an infinite sequence {y_m} that is linearly {\epsilon/4}-separated. Indeed, suppose that we have already found a finite sequence {y_1,\dots,y_{m-1}} in {\Gamma(T(\{ v \in H: \|v\| \leq 1 \})|\Omega')} that is linearly {\epsilon/4}-separated for some {m \geq 1}, and wish to add on a further element {y_m} while preserving the linear {\epsilon/4}-separation property, that is to say we wish to have (4) for all {c_1,\dots,c_{m-1} \in {\bf R}|\Omega'}. By Exercise 8, such a {y_m} would already lie in the closed ball {B( 0, \|T\|_{HS(H \rightarrow H')})}. Now, by elementary geometry (applying a Gram-Schmidt process to the {y_1,\dots,y_{m-1}}) one can cover the stochastic set

\displaystyle  \{ y \in H': \|y\| \leq \|T\|_{HS(H \rightarrow H')}; \| y - c_1 y_1 - \dots - c_{m-1} y_{m-1} \|_{H'} < \epsilon/4 \hbox{ for some } c_1,\dots,c_{m-1} \in {\bf R} \}|\Omega'

by a finite union of balls

\displaystyle  \bigcup_{i=1}^N B( z_i, \epsilon/2 )

for some stochastic {N \in {\bf N}|\Omega'} and some stochastic finite sequence {(z_i)_{i=1}^N|\Omega'} of points in {H'|\Omega'}. Stochastically, each of these balls {B(z_i,\epsilon/2)} may contain at most one of the {\epsilon}-separated {Tx_n}; if we then define the stochastic positive natural number {n \in {\bf N}_+|\Omega'} to be the least {n} for which {Tx_n} stochastically lies outside all of the {B(z_i,\epsilon/2)}, then {n} is well-defined, and if we set {y_m := Tx_n}, we obtain the desired property (4).

As each {y_m} lies in {\Gamma(T(\{ v \in H: \|v\| \leq 1 \})|\Omega')}, we have {y_m = T v_m} for some {v_m \in \Gamma(H|\Omega')} with {\|v_m\|_H \leq 1}. From the Gram-Schmidt process, one may find an orthonormal system {(e_n)_{n=1}^N|\Omega'} in {\Gamma(H|\Omega')} for some {N \in {\bf N} \cup\{+\infty\}|\Omega'} such that each {v_m} is a linear combination (over {\Gamma({\bf R}|\Omega')}) of those {e_n} with {n \leq N, m}. We may similarly find an orthonormal system {(f_m)_{m=1}^M|\Omega'} in {\Gamma(H'|\Omega')} for some {M \in {\bf N} \cup\{+\infty\}|\Omega'} such that each {y_m} is a linear combination of those {f_i} with {i \leq M,m}. From (4) we conclude that {M=+\infty} and that for each deterministic {m \in {\bf N}_+}, the {f_m} coefficient of {y_m} has magnitude at least {\epsilon/4}, thus

\displaystyle  |\langle T v_m, f_m \rangle| \geq \epsilon/4

and thus, on expanding {v_m} in terms of the {e_n} and applying the Cauchy-Schwarz inequality,

\displaystyle  \sum_{n=1}^N |\langle T e_n, f_m \rangle|^2 \geq \epsilon^2/16

on {\Omega'}; summing in {m}, we contradict the Hilbert-Schmidt nature of {T}. \Box
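The Gram-Schmidt step used twice in the proof above (each {v_m} a linear combination of {e_1,\dots,e_m} only) is, in the deterministic finite-dimensional case, exactly a QR factorisation. A brief sketch (names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
V = rng.standard_normal((5, 4))   # columns v_1, ..., v_4 in R^5

# Gram-Schmidt is a QR factorisation: the columns of Q form an orthonormal
# system, and R being upper-triangular says each v_m is a linear
# combination of e_1, ..., e_m only.
Q, R = np.linalg.qr(V)

assert np.allclose(Q.T @ Q, np.eye(4))   # (e_n) is an orthonormal system
assert np.allclose(Q @ R, V)             # reconstructs the v_m
assert np.allclose(R, np.triu(R))        # triangular coefficient structure
```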

Next, we establish the existence of adjoints:

Theorem 14 (Adjoint operator) Let {T: H|\Omega \rightarrow H'|\Omega} be a stochastically bounded linear operator. Then there exists a unique stochastically bounded linear operator {T^*: H'|\Omega \rightarrow H|\Omega} such that

\displaystyle  \langle Tv, w \rangle_{H'} = \langle v, T^* w \rangle_H

on any event {E} and any {v \in \Gamma(H|E)}, {w \in \Gamma(H'|E)}. In particular we have {\|T\|_{B(H \rightarrow H')} = \|T^*\|_{B(H' \rightarrow H)}}.

Proof: Uniqueness is an easy exercise that we leave to the reader, so we focus on existence. The point here is that the Riesz representation theorem for Hilbert spaces is sufficiently “constructive” that it can be translated to the stochastic setting.

Let {E} be an event and {w \in \Gamma(H'|E)} be a local element of {H'}. We let {S} be the stochastic set on {E} defined by setting {\Gamma(S|F)} for {F \subset E} to be the set of all stochastic real numbers of the form {\langle T v, w \rangle_{H'}}, where {v \in \Gamma(H|F)} with {\|v\|_H \leq 1}. By Theorem 5, we may then find a sequence {v_n \in \Gamma(H|E)} for {n \in {\bf N}_+} such that {\langle T v_n, w \rangle_{H'}} converges stochastically to {\sup S} on {E}. For any two {n,m \in{\bf N}_+}, we have from the parallelogram law that

\displaystyle  \| \frac{v_n+v_m}{2} \|_H \leq (1 - \frac{1}{4} \|v_n-v_m\|_H^2)^{1/2}

and hence by homogeneity

\displaystyle  \frac{1}{2} ( \langle T v_n, w \rangle_{H'} + \langle T v_m, w \rangle_{H'} ) \leq (1 - \frac{1}{4} \|v_n-v_m\|_H^2)^{1/2} \sup S

on {E}. Combining this with the stochastic convergence of {\langle T v_n, w \rangle_{H'}}, we conclude that {v_n} is stochastically Cauchy on the event in {E} that {\sup S} is non-zero. Setting {v_\infty \in \Gamma(H|E)} to be the stochastic limit of the {v_n} on this event (and set to {0} on the complementary event), we see that {\|v_\infty\|_H \leq 1} and

\displaystyle  \langle Tv_\infty, w \rangle_{H'} = \sup S.

On the event that {\sup S} is non-zero, {v_\infty} is thus non-zero, and consideration of the vectors {v_\infty + tu} for stochastic real {t} and stochastic vectors {u} soon reveals that

\displaystyle  \langle Tu, w \rangle_{H'} = 0

whenever {\langle u, v_\infty \rangle_H = 0}, which by elementary linear algebra gives a representation of the form

\displaystyle  \langle Tv, w \rangle_{H'} = \langle v, T^* w \rangle_{H}

for some {T^* w} (a scalar multiple of {v_\infty}); when {\sup S} vanishes, we simply take {T^* w=0}. It is then a routine matter to verify that {T^*} is a stochastically bounded linear operator, and the claim follows. \Box
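In the deterministic model case {H = {\bf R}^6}, {H' = {\bf R}^4} with the standard inner products, the adjoint produced by Theorem 14 is simply the transpose, and both the defining identity and the equality of operator norms can be checked directly (a sketch, with hypothetical names):

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((4, 6))   # T : R^6 -> R^4
Tstar = T.T                       # the adjoint w.r.t. the standard inner products

v = rng.standard_normal(6)
w = rng.standard_normal(4)

# defining identity of the adjoint: <Tv, w> = <v, T* w>
lhs = (T @ v) @ w
rhs = v @ (Tstar @ w)

# equality of operator norms, as in Theorem 14
opT = np.linalg.norm(T, 2)
opTstar = np.linalg.norm(Tstar, 2)
```

Here `lhs == rhs` and `opT == opTstar` up to rounding.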

We use this to relativise the spectral theorem (or more precisely, the singular value decomposition) for compact operators:

Theorem 15 (Spectral theorem for stochastically compact operators) Let {T: H|\Omega \rightarrow H'|\Omega} be a stochastically compact linear operator, with {T^*} also stochastically compact. Then there exists {N \in \Gamma({\bf N} \cup \{+\infty\}|\Omega)}, orthonormal systems {(e_n)_{n=1}^N|\Omega} and {(f_n)_{n=1}^N|\Omega} of {H|\Omega} and {H'|\Omega} respectively, and a stochastic sequence {(\sigma_n)_{n=1}^N|\Omega} in {(0,+\infty)|\Omega} such that {\sigma_m \geq \sigma_n} on an event {E} whenever {1 \leq m \leq n \leq N} are stochastic natural numbers on {E}, and such that the {\sigma_n} go to zero in the sense that for any stochastic real {\epsilon>0} on an event {E}, there exists a stochastic natural number {N_\epsilon} on {E} such that {\sigma_n \leq \epsilon} on {E} whenever {N_\epsilon \leq n \leq N} is a stochastic natural number on {E}. Furthermore, for any event {E} and {v \in \Gamma(H|E)}, one has

\displaystyle  Tv = \sum_{n=1}^N \sigma_n \langle v, e_n \rangle_H f_n

on {E}. (Note that a Bessel inequality argument shows that the series is convergent; indeed it is even unconditionally convergent.)

It is likely that the hypothesis that {T^*} be stochastically compact is redundant, in that it is implied by the stochastically compact nature of {T}, but I did not attempt to prove this rigorously as it was not needed for my application (which is focused on the Hilbert-Schmidt case).

To prove this theorem, we first establish a fragment of it for the top singular value {n=1}:

Theorem 16 (Largest singular value) Let {T: H|\Omega \rightarrow H'|\Omega} be a stochastically compact linear operator, and let {\Omega'} be the event where {\|T\|_{B(H \rightarrow H')} > 0} (this event is well-defined up to a null event). Then there exist {e \in \Gamma(H|\Omega')}, {f \in \Gamma(H'|\Omega')} with {\|e\|_H = \|f\|_{H'} = 1} on {\Omega'}, such that

\displaystyle  Te = \|T\|_{B(H \rightarrow H')} f

and dually that

\displaystyle  T^* f = \|T\|_{B(H \rightarrow H')} e

on {\Omega'}.

Proof: Note that {H,H'} are necessarily non-trivial on {\Omega'}. Let {S} denote the set of all expressions of the form {\langle Te, f\rangle} on {\Omega'}, where {e \in \Gamma(H|\Omega')}, {f \in \Gamma(H'|\Omega')} with {\|e\|_H = \|f\|_{H'} = 1} on {\Omega'}; then {S} is a globally non-empty stochastic subset of {[0,+\infty)|\Omega'} which has {\|T\|_{B(H \rightarrow H')}} as an upper bound. Indeed, from Theorem 5 and the definition of {\|T\|_{B(H \rightarrow H')}}, it is not hard to see that {\|T\|_{B(H \rightarrow H')} = \sup S}. From this, we may construct a sequence in {S} that converges stochastically to {\|T\|_{B(H \rightarrow H')}} on {\Omega'}, and hence we may find sequences {e_n \in \Gamma(H|\Omega')}, {f_n \in \Gamma(H'|\Omega')} for {n \in{\bf N}_+} with {\|e_n\|_H = \|f_n\|_{H'} = 1} on {\Omega'} with {\langle Te_n, f_n\rangle} stochastically convergent to {\|T\|_{B(H \rightarrow H')}}. By the Cauchy-Schwarz inequality, this implies that {\|Te_n\|_{H'}} is stochastically convergent to {\|T\|_{B(H \rightarrow H')}}; from the parallelogram law applied to {f_n} and {Te_n / \|T\|_{B(H \rightarrow H')}}, we conclude that {f_n - Te_n / \|T\|_{B(H \rightarrow H')}} converges stochastically to zero. On the other hand, as {T} is compact, we can pass to a stochastic subsequence and ensure that {Te_n} is stochastically convergent, thus {f_n} is also stochastically convergent to some limit {f \in \Gamma(H'|\Omega')}. Similar considerations using the adjoint operator {T^*} allow us to assume that {e_n} is stochastically convergent to some limit {e \in \Gamma(H|\Omega')}. It is then routine to verify that {\|e\|_H = \|f\|_{H'} = 1}, {f - Te / \|T\|_{B(H \rightarrow H')} = 0}, and {e - T^* f / \|T\|_{B(H \rightarrow H')} = 0}, giving the claim. \Box
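Deterministically, Theorem 16 is the variational characterisation of the largest singular value: the supremum of {\langle Te, f\rangle} over unit vectors {e, f} is attained at the top singular pair, which also satisfies the two fixed-point identities of the theorem. A numerical sketch via numpy's SVD (names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((5, 5))

# top singular triple from numpy's SVD
U, s, Vt = np.linalg.svd(T)
sigma1, e, f = s[0], Vt[0], U[:, 0]

# (e, f) attains sup { <Te, f> : ||e|| = ||f|| = 1 }, and satisfies
# Te = sigma1 f and T* f = sigma1 e, as in Theorem 16
attained = (T @ e) @ f
```

One then has `attained == sigma1`, `T @ e == sigma1 * f`, and `T.T @ f == sigma1 * e` up to rounding, and `sigma1` coincides with the operator norm of `T`.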

Now we prove Theorem 15.

Proof: (Proof of Theorem 15.) Define a partial singular value decomposition to consist of the following data:

  • A stochastic extended natural number {N \in \Gamma({\bf N} \cup \{+\infty\}|\Omega)};
  • A stochastic orthonormal system {(e_n)_{n=1}^N|\Omega} of {H|\Omega};
  • A stochastic orthonormal system {(f_n)_{n=1}^N|\Omega} of {H'|\Omega}; and
  • A non-increasing stochastic sequence {(\sigma_n)_{n=1}^N|\Omega} in {(0,+\infty)|\Omega}

such that, for any event {E}, and any {n \in {\bf N}_+|E} with {n \leq N|E},

  • {Te_n = \sigma_n f_n} on {E}.
  • {T^* f_n = \sigma_n e_n} on {E}.
  • Whenever {v \in H|E} is orthogonal to {e_m} for all {m \leq n}, one has {\|T v \|_{H'} \leq \sigma_n \|v\|_H} on {E}.
  • Whenever {w \in H'|E} is orthogonal to {f_m} for all {m \leq n}, one has {\|T^* w \|_{H} \leq \sigma_n \|w\|_{H'}} on {E}.

We let {S} be the set of all {N} that can arise from a partial singular value decomposition; then {S} is a stochastic subset of {{\bf N} \cup \{+\infty\}|\Omega} that contains {0|\Omega}, and by Theorem 5 we can form the supremum {\sup S \in {\bf N} \cup \{+\infty\}|\Omega}.

Let us first localise to the event {\Omega'} that {\sup S} is stochastically finite. As in the proof of the Heine-Borel theorem, we can use the discrete nature of the natural numbers to conclude that {\sup S \in S} on {\Omega'}. Thus there exists a partial singular value decomposition on {\Omega'} with {N=\sup S}.

Define {\tilde H} on {\Omega'} by setting {\Gamma(\tilde H|E)} for {E \subset \Omega'} to be the set of all {v \in \Gamma(H|E)} that are orthogonal to all the {e_n} with {n \leq N}; this can easily be seen to be a stochastic Hilbert space (a stochastically finite codimension subspace of {H}). Similarly define {\tilde H'} on {\Omega'} by setting {\Gamma(\tilde H'|E)} to be the set of all {w \in \Gamma(H'|E)} that are orthogonal to all the {f_n} with {n \leq N}. As {Te_n = \sigma_n f_n} and {T^* f_n = \sigma_n e_n}, we see that {T} maps {\tilde H} to {\tilde H'}, and {T^*} maps {\tilde H'} to {\tilde H}; from the remaining axioms of a partial singular value decomposition we see that {\|T\|_{B(\tilde H \rightarrow \tilde H')} \leq \sigma_N} on the portion of {\Omega'} where {N \geq 1}. If {\|T\|_{B(\tilde H \rightarrow \tilde H')}} vanishes, then {T} vanishes on {\tilde H}, and one easily obtains the required singular value decomposition on {\Omega'}. Now suppose for contradiction that {\|T\|_{B(\tilde H \rightarrow \tilde H')}} does not (deterministically) vanish, so there is a subevent {E} of {\Omega'} of positive probability on which {\|T\|_{B(\tilde H \rightarrow \tilde H')}} is positive. By Theorem 16, we can then find unit vectors {e_{N+1} \in \Gamma(\tilde H|E)} and {f_{N+1} \in \Gamma(\tilde H'|E)} such that

\displaystyle  Te_{N+1} = \|T\|_{B(\tilde H \rightarrow \tilde H')} f_{N+1}


\displaystyle  T^* f_{N+1} = \|T\|_{B(\tilde H \rightarrow \tilde H')} e_{N+1}.

If we then set {\sigma_{N+1} := \|T\|_{B(\tilde H \rightarrow \tilde H')}}, we can obtain a partial singular value decomposition on {E} with {N} now set to {\sup S+1}; gluing this with the original partial singular value decomposition on the complement of {E}, we contradict the maximal nature of {\sup S}. This concludes the proof of the spectral theorem on the event that {\sup S < \infty}.

Now we localise to the complementary event {\Omega''} that {\sup S} is infinite. Here we can no longer ensure that {\sup S} actually lies in {S}, and must instead run a compactness argument. Namely, for any deterministic natural number {M}, we can find a partial singular value decomposition with data {N_M, (e_{M,n})_{n=1}^{N_M}, (f_{M,n})_{n=1}^{N_M}, (\sigma_{M,n})_{n=1}^{N_M}} on {\Omega''} such that {N_M \geq M}. We now claim that the {\sigma_{M,n}} decay in {n} uniformly in {M} in the following sense: for any deterministic real {\epsilon > 0}, there exists a stochastic {L_\epsilon \in \Gamma({\bf N}|\Omega'')} such that {\sigma_{M,n} \leq \epsilon} whenever {M \in {\bf N}} and {n \in \Gamma({\bf N}|\Omega'')} are such that {L_\epsilon < n \leq N_M}. Indeed, from the total boundedness of {T \{ v \in H: \|v\|_H \leq 1\}|\Omega''}, one can cover this space by a union {\bigcup_{i=1}^{L_\epsilon} B(w_i, \epsilon/2)} of balls of radius {\epsilon/2} for some {L_\epsilon \in \Gamma({\bf N}|\Omega'')}. If {L_\epsilon < n \leq N_M} is such that {\sigma_{M,n} > \epsilon} on some event {E \subset \Omega''} of positive probability, then the vectors {T e_{M,m} = \sigma_{M,m} f_{M,m}} for {m \leq n} are {\epsilon}-separated on {E} by Pythagoras's theorem, leading to a contradiction since each ball {B(w_i,\epsilon/2)} can capture at most one of these vectors. This gives the claim.

By repeatedly passing to stochastic subsequences and diagonalising, we may assume that {\sigma_{M,n}} converges stochastically to a limit {\sigma_n \in \Gamma([0,+\infty)|\Omega'')} as {M \rightarrow \infty} for each {n \in {\bf N}_+}, with the limiting sequence {(\sigma_n)} non-increasing. By compactness of {T}, we may also assume that the {T e_{M,m} = \sigma_{M,m} f_{M,m}} are stochastically convergent in {M} for each fixed {m}, which implies that the {f_{M,m}} converge stochastically to a limit {f_m} whenever {\sigma_m > 0}. A similar argument using the compactness of {T^*} allows us to assume that {e_{M,m}} converges stochastically to a limit {e_m} whenever {\sigma_m > 0}. One then easily verifies that the {e_m} and {f_m} are orthonormal systems when restricted to those {m} for which {\sigma_m> 0}. Furthermore, a limiting argument shows that whenever {\epsilon>0} is a deterministic real, {E \subset \Omega''} is an event, and {v \in \Gamma(H|E)} is a unit vector orthogonal to {e_m} for those {m \leq L_\epsilon} with {\sigma_m > 0}, then {\|Tv\|_{H'} \leq \epsilon}. From this and a decomposition into orthonormal bases and a limiting argument we see that

\displaystyle  Tv = \sum_{1 \leq n \leq N; \sigma_n > 0} \sigma_n \langle v, e_n \rangle_H f_n

on {E} for any {v \in \Gamma(H|E)}, and the claim follows. \Box
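In the deterministic finite-dimensional case, Theorem 15 is the classical singular value decomposition, with the displayed expansion of {Tv} recovered directly from numpy's SVD. A sketch (names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((4, 6))   # a (trivially compact) map R^6 -> R^4
U, s, Vt = np.linalg.svd(T)
N = len(s)                        # number of singular values, here min(4, 6)

v = rng.standard_normal(6)

# Tv = sum_n sigma_n <v, e_n> f_n  with e_n = Vt[n] and f_n = U[:, n]
Tv = sum(s[n] * (v @ Vt[n]) * U[:, n] for n in range(N))
```

The reconstruction `Tv` agrees with `T @ v` up to rounding, and numpy returns the {\sigma_n} already in non-increasing order, matching the monotonicity required by the theorem.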

Finally, we specialise to the example of Hilbert modules from Example 3.

Corollary 17 (Singular value decomposition on Hilbert modules) Let {\Omega_1, \Omega_2} be extensions of a finite measure space {\Omega}, with factor maps {\pi_1: \Omega_1 \rightarrow \Omega} and {\pi_2: \Omega_2 \rightarrow \Omega}. Let {T: L^2(\Omega_1) \rightarrow L^2(\Omega_2)} be a bounded linear map (in the {L^2} sense) which is also linear over {L^\infty(\Omega)} (which embeds via pullback into {L^\infty(\Omega_1)} and {L^\infty(\Omega_2)}). Assume the following Hilbert-Schmidt property: there exists a measurable {A: \Omega \rightarrow [0,+\infty)} such that

\displaystyle  \sum_{m=1}^M \sum_{n=1}^N |(\pi_2)_*( (T e_n) f_m )|^2 \leq A^2

for all measurable {M,N: \Omega \rightarrow {\bf N} \cup \{\infty\}} and all {e_n \in L^2(\Omega_1), f_m \in L^2(\Omega_2)} that are orthonormal systems over {L^\infty(\Omega)} in the sense that

\displaystyle  (\pi_1)_*(e_n e_{n'}) = 1_{n=n'}

whenever {n,n' \leq N}, and similarly

\displaystyle  (\pi_2)_*(f_m f_{m'}) = 1_{m=m'}

whenever {m,m' \leq M}. Then one can find a measurable {N: \Omega \rightarrow {\bf N} \cup \{\infty\}}, orthonormal systems {e_n \in L^2(\Omega_1)} and {f_n \in L^2(\Omega_2)}, and {\sigma_n: \Omega \rightarrow (0,+\infty)} defined for {n \leq N} that are non-increasing and decay to zero as {n \rightarrow \infty} (in the case {N=+\infty}), such that

\displaystyle  T g = \sum_{n=1}^N \sigma_n (\pi_1)_*(g e_n) f_n

for all {g \in L^2(\Omega_1)}.
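In the model case where {\Omega} is a finite discrete probability space, Corollary 17 amounts to applying the classical SVD fibre by fibre; in particular the number {N} of singular values may vary with {\omega}, matching the stochastic nature of {N} in the corollary. A sketch (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)

# Model case Omega = {0, 1} discrete: a "stochastic" operator is a bundle of
# ordinary matrices T_omega, one per fibre, with no relation between fibres
# (even the dimensions may differ from fibre to fibre).
T_bundle = {0: rng.standard_normal((3, 5)), 1: rng.standard_normal((2, 2))}

# the singular value decomposition is taken fibre by fibre; the "stochastic
# natural number" N is the omega-dependent count of singular values
svd_bundle = {w: np.linalg.svd(Tw) for w, Tw in T_bundle.items()}
N = {w: len(s) for w, (U, s, Vt) in svd_bundle.items()}

# fibrewise reconstruction: T_omega v = sum_n sigma_n <v, e_n> f_n
recon_ok = True
for w, Tw in T_bundle.items():
    U, s, Vt = svd_bundle[w]
    v = rng.standard_normal(Tw.shape[1])
    Tv = sum(s[n] * (v @ Vt[n]) * U[:, n] for n in range(N[w]))
    recon_ok = recon_ok and np.allclose(Tv, Tw @ v)
```

Here `N` takes the value 3 on one fibre and 2 on the other, illustrating an {\omega}-dependent {N}.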

In the case that {\Omega_1} and {\Omega_2} are standard Borel, one can obtain this result from the classical spectral theorem via a disintegration argument. However, without such a standard Borel assumption, the most natural way to proceed appears to be through the above topos-theoretic machinery. This result can be used to establish some of the basic facts of the Furstenberg-Zimmer structure theory of measure-preserving systems, and specifically that weakly mixing functions relative to a given factor form the orthogonal complement to compact extensions of that factor, and that such compact extensions are the inverse limit of finite rank extensions. With the above formalism, this can be done even for measure spaces that are not standard Borel, and actions of groups that are not countable; I hope to discuss this in a subsequent post.

Filed under: expository, math.CA, math.CT, math.PR, math.SP Tagged: elementary topos, Heine-Borel theorem, Hilbert modules, Hilbert-Schmidt operators, singular value decomposition, spectral theorem

July 16, 2014

Steinn SigurðssonAstrophotography

Astrophotography is the title of a gorgeous and very useful new book by Thierry Legault

I had to taper off doing book reviews, much to the annoyance of all those lovely people who persist in sending me just the sort of books that I actually really love to read – it just got too time-consuming – but when Rocky Nook told me Thierry Legault had a book on Astrophotography coming out, I agreed to review it immediately.

This is why:

shades of Atlantis (photo from NASA HQ Photo on Flickr)

Thierry Legault is an expert and he takes beautiful photos. He is particularly known for his patient setup of uniquely timed and angled shots such as the one above.

The book discusses how he does this, and the general art of astronomical photography, starting with serious amateur-level photography with digital single-lens reflex cameras and going on to semi-professional level photography with tracking telescopes and more advanced techniques.
In a couple of hundred pages M. Legault covers the equipment, the setup and composition, and then the primary targets: solar system, Sun, and deep sky, followed by processing.

I will probably never take photographs even close to being in the same league as Thierry's, being a theorist and whatnot, and I may never end up reading through the whole book, but I have already enjoyed having it, browsing it, dipping into the various sections, and looking up random snippets I had always wondered about.
In a few hours the book has already given me more pleasure than most that have sat on my shelf for years.

Most highly recommended.

ISBN-13: 978-1937538439
Publisher: Rocky Nook; 1st edition (July 6, 2014)