Planet Musings

May 21, 2015

Tommaso DorigoBang !! 13 TeV - The Highest Energy Ever Achieved By Mankind ?!

The LHC has finally started to produce 13-TeV proton-proton collisions!

The picture below shows one such collision, as recorded by the CMS experiment today. The blue boxes show the energy recorded in the calorimeter, which measures particle energy by "destroying" them as they interact with the dense layers of matter that this device is made up of; the yellow curves show tracks reconstructed by the ionization deposits of charged particles left in the silicon detector layers of the inner tracker. 

read more

David Hogg#ArloFest, day 1

Today was the first day of Landolt Standards & 21st Century Photometry in Baton Rouge, organized by Pagnotta (AMNH) and Clayton (LSU). I came to speak about self-calibration. The day started with a historical overview by Bessel (MSSSO), who gave a lovely talk filled with profiles of the many people who contributed to the development of photometric calibration and magnitude systems. Many of the people he talked about (including himself) have filter systems or magnitude systems named after them! Among the many interesting things he touched on was this paper by Johnson, which I have yet to carefully read, but apparently contains some of the philosophy behind standard-star systems. He also discussed the filter choices for the Skymapper project, which seem very considered.

Suntzeff (TAMU) gave an excellent talk about the limitations of the supernova cosmology projects; his main point is that systematic issues with the photometric calibration system are the dominant term in the uncertainty budget. This is important in thinking about where to apportion new resources. He made a great case for understanding physically every part of the photometric measurement system (and that includes the stars, the atmosphere, the telescope, and the detector pixels, among other things). I couldn't agree more!

Grindlay (CfA) blew us away with the scale and content of the DASCH plate-scanning project at Harvard. It is just awesome, in time span, cadence, and sky coverage. Anyone not searching these data is making a mistake! And, as we were recovering from that, Kafka (AAVSO) blew us away again with the scale and scope of the APASS survey, which was designed, built, operated, reduced, and delivered to the public almost entirely by citizen scientists. It is dramatic; we are not worthy!

There were many other great contributions—too many to mention them all—but the day ended with a crawfish boil and then Josh Peek (STScI) and I at the bar discussing recent explosive conversations in the astronomical community around TMT and development in Hawaii.

One last thing I should say: Arlo Landolt (LSU) has had a huge impact on astronomy; his work has enabled countless projects and scientific measurements and discoveries. The development and stewardship of photometric standards and systems, and all the attention to detail it requires, is unglamorous and time-consuming work, ill-suited to most of the community, and yet absolutely essential to everything we do. I can't thank Landolt—and his collaborators and the whole community of photometrists—enough.

John BaezInformation and Entropy in Biological Systems (Part 4)

I kicked off the workshop on Information and Entropy in Biological Systems with a broad overview of the many ways information theory and entropy get used in biology:

• John Baez, Information and entropy in biological systems.

Abstract. Information and entropy are being used in biology in many different ways: for example, to study biological communication systems, the ‘action-perception loop’, the thermodynamic foundations of biology, the structure of ecosystems, measures of biodiversity, and evolution. Can we unify these? To do this, we must learn to talk to each other. This will be easier if we share some basic concepts which I’ll sketch here.

The talk is full of links, in blue. If you click on these you can get more details. You can also watch a video of my talk:

ResonaancesHow long until it's interesting?

Last night, for the first time, the LHC  collided particles at the center-of-mass energy of 13 TeV. Routine collisions should follow early in June. The plan is to collect 5-10 inverse femtobarn (fb-1) of data before winter comes, adding to the 25 fb-1 from Run-1. It's high time dust off your Madgraph and tool up for what may be the most exciting time in particle physics in this century. But when exactly should we start getting excited? When should we start friending LHC experimentalists on facebook? When is the time to look over their shoulders for a glimpse of of gluinos popping out of the detectors. One simple way to estimate the answer is to calculate what is the luminosity when the number of particles produced  at 13 TeV will exceed that produced during the whole Run-1. This depends on the ratio of the production cross sections at 13 and 8 TeV which is of course strongly dependent on the particle's mass and production mechanism. Moreover, the LHC discovery potential will also depend on how the background processes change, and on a host of other experimental issues.  Nevertheless, let us forget for a moment about  the fine-print, and  calculate the ratio of 13 and 8 TeV cross sections for a few particles popular among the general public. This will give us a rough estimate of the threshold luminosity when things should get interesting.

  • Higgs boson: Ratio≈2.3; Luminosity≈10 fb-1.
    Higgs physics will not be terribly exciting this year, with only a modest improvement of the couplings measurements expected. 
  • tth: Ratio≈4; Luminosity≈6 fb-1.
    Nevertheless, for certain processes involving the Higgs boson the improvement may be a bit  faster. In particular, the theoretically very important process of Higgs production in association with top quarks (tth) was on the verge of being detected in Run-1. If we're lucky, this year's data may tip the scale and provide an evidence for a non-zero top Yukawa couplings. 
  • 300 GeV Higgs partner:  Ratio≈2.7 Luminosity≈9 fb-1.
    Not much hope for new scalars in the Higgs family this year.  
  • 800 GeV stops: Ratio≈10; Luminosity≈2 fb-1.
    800 GeV is close to the current lower limit on the mass of a scalar top partner decaying to a top quark and a massless neutralino. In this case, one should remember that backgrounds also increase at 13 TeV, so the progress will be a bit slower than what the above number suggests. Nevertheless,  this year we will certainly explore new parameter space and make the naturalness problem even more severe. Similar conclusions hold for a fermionic top partner. 
  • 3 TeV Z' boson: Ratio≈18; Luminosity≈1.2 fb-1.
    Getting interesting! Limits on Z' bosons decaying to leptons will be improved very soon; moreover, in this case background is not an issue.  
  • 1.4 TeV gluino: Ratio≈30; Luminosity≈0.7 fb-1.
    If all goes well, better limits on gluinos can be delivered by the end of the summer! 

In summary, the progress will be very fast for new heavy particles. In particular, for gluon-initiated production of TeV-scale particles  already the first inverse femtobarn may bring us into a new territory. For lighter particles the progress will be slower, especially when backgrounds are difficult.  On the other hand, precision physics, such as the Higgs couplings measurements, is unlikely to be in the spotlight this year.

John BaezInformation and Entropy in Biological Systems (Part 3)

We had a great workshop on information and entropy in biological systems, and now you can see what it was like. I think I’ll post these talks one a time, or maybe a few at a time, because they’d be overwhelming taken all at once.

So, let’s dive into Chris Lee’s exciting ideas about organisms as ‘information evolving machines’ that may provide ‘disinformation’ to their competitors. Near the end of his talk, he discusses some new results on an ever-popular topic: the Prisoner’s Dilemma. You may know about this classic book:

• Robert Axelrod, The Evolution of Cooperation, Basic Books, New York, 1984. Some passages available free online.

If you don’t, read it now! He showed that the simple ‘tit for tat’ strategy did very well in some experiments where the game was played repeatedly and strategies who did well got to ‘reproduce’ themselves. This result was very exciting, so a lot of people have done research on it. More recently a paper on this subject by William Press and Freeman Dyson received a lot of hype. I think this is a good place to learn about that:

• Mike Shulman, Zero determinant strategies in the iterated Prisoner’s Dilemma, The n-Category Café, 19 July 2012.

Chris Lee’s new work on the Prisoner’s Dilemma is here, cowritten with two other people who attended the workshop:

The art of war: beyond memory-one strategies in population games, PLOS One, 24 March 2015.

Abstract. We show that the history of play in a population game contains exploitable information that can be successfully used by sophisticated strategies to defeat memory-one opponents, including zero determinant strategies. The history allows a player to label opponents by their strategies, enabling a player to determine the population distribution and to act differentially based on the opponent’s strategy in each pairwise interaction. For the Prisoner’s Dilemma, these advantages lead to the natural formation of cooperative coalitions among similarly behaving players and eventually to unilateral defection against opposing player types. We show analytically and empirically that optimal play in population games depends strongly on the population distribution. For example, the optimal strategy for a minority player type against a resident tit-for-tat (TFT) population is ‘always cooperate’ (ALLC), while for a majority player type the optimal strategy versus TFT players is ‘always defect’ (ALLD). Such behaviors are not accessible to memory-one strategies. Drawing inspiration from Sun Tzu’s the Art of War, we implemented a non-memory-one strategy for population games based on techniques from machine learning and statistical inference that can exploit the history of play in this manner. Via simulation we find that this strategy is essentially uninvadable and can successfully invade (significantly more likely than a neutral mutant) essentially all known memory-one strategies for the Prisoner’s Dilemma, including ALLC (always cooperate), ALLD (always defect), tit-for-tat (TFT), win-stay-lose-shift (WSLS), and zero determinant (ZD) strategies, including extortionate and generous strategies.

And now for the talk! Click on the talk title here for Chris Lee’s slides, or go down and watch the video:

• Chris Lee, Empirical information, potential information and disinformation as signatures of distinct classes of information evolving machines.

Abstract. Information theory is an intuitively attractive way of thinking about biological evolution, because it seems to capture a core aspect of biology—life as a solution to “information problems”—in a fundamental way. However, there are non-trivial questions about how to apply that idea, and whether it has actual predictive value. For example, should we think of biological systems as being actually driven by an information metric? One idea that can draw useful links between information theory, evolution and statistical inference is the definition of an information evolving machine (IEM) as a system whose elements represent distinct predictions, and whose weights represent an information (prediction power) metric, typically as a function of sampling some iterative observation process. I first show how this idea provides useful results for describing a statistical inference process, including its maximum entropy bound for optimal inference, and how its sampling-based metrics (“empirical information”, Ie, for prediction power; and “potential information”, Ip, for latent prediction power) relate to classical definitions such as mutual information and relative entropy. These results suggest classification of IEMs into several distinct types:

1. Ie machine: e.g. a population of competing genotypes evolving under selection and mutation is an IEM that computes an Ie equivalent to fitness, and whose gradient (Ip) acts strictly locally, on mutations that it actually samples. Its transition rates between steady states will decrease exponentially as a function of evolutionary distance.

2. “Ip tunneling” machine: a statistical inference process summing over a population of models to compute both Ie, Ip can directly detect “latent” information in the observations (not captured by its model), which it can follow to “tunnel” rapidly to a new steady state.

3. disinformation machine (multiscale IEM): an ecosystem of species is an IEM whose elements (species) are themselves IEMs that can interact. When an attacker IEM can reduce a target IEM’s prediction power (Ie) by sending it a misleading signal, this “disinformation dynamic” can alter the evolutionary landscape in interesting ways, by opening up paths for rapid co-evolution to distant steady-states. This is especially true when the disinformation attack targets a feature of high fitness value, yielding a combination of strong negative selection for retention of the target feature, plus strong positive selection for escaping the disinformation attack. I will illustrate with examples from statistical inference and evolutionary game theory. These concepts, though basic, may provide useful connections between diverse themes in the workshop.

Clifford JohnsonSo the equations are not…

Working on rough layouts of one of the stories for the book. One rough panel ended up not looking so rough, and after Monday's ink dalliances I was itching to fiddle with brushes again, and then I thought I'd share. So... slightly less rough, shall we say? A more careful version would point the eyes a bit better, for example...(Much of the conversation, filling a bit more of the while space, has been suppressed - spoilers.) light_relief_II_sample -cvj Click to continue reading this post

Mark Chu-CarrollExpressions and Arity (part 1)

In the last post, we started looking at expressions. In this post, we’re going to continue doing that, and start looking at a simple form of expression typing called arity.

Before we can do that, we need to introduce a couple of new formalisms to complete our understanding of the elements that form expressions. The reason we need another formalism is because so far, we’ve defined the meaning of expressions in terms of function calls. But we’ve still got some ambiguity about our function calls – specifically, how do parameters work?

Suppose I’ve got a function, f. Can I call it as f(x)? Or f(x,y)? Both? Neither? It depends on exactly what function calls mean, and what it means to be a valid parameter to a function.

There are several approaches that you can take:

  1. You can say that a function application expression takes an arbitrary number of arguments. This is, roughly, what we do in dynamically typed programming languages like Python. In Python, you can write f(x) + f(x,y) + f(x, y, z, a, b, c), and the language parser will accept it. (It’ll complain when you try to execute it, but at the level of expressions, it’s a perfectly valid expression.)
  2. You can say that every function takes exactly one argument, and that a multi-argument function like f(x, y, z) is really a shorthand for a curried call sequence f(x)(y)(z) – and thus, the “application” operation really takes exactly 2 parameters – a function, and a single value to apply that function to. This is the approach that we usually take in lambda calculus.
  3. You can say that application takes two arguments, but the second one can be a combination of multiple objects – effectively a tuple. That’s what we do in a lot of versions of recursive function theory, and in programming languages like SML.

In the theory of expressions, Martin-Löf chose the third approach. A function takes a single parameter, which is a combination of a collection of other objects. If a, b, c, and d are expressions, then a, b, c, d is an expression called the combination of its four elements. This is very closely related to the idea of cartesian products in set theory, but it’s important to realize that it’s not the same thing. We’re not defining elements of a set here: we’re defining a syntactic construct of a pseudo-programming language, where one possible interpretation of it is cartesian products.

In addition to just multi-parameter functions, we’ll use combinations for other things. In type theory, we want to be able to talk about certain mathematical constructs as first-class objects. For example, we’d like to be able to talk about orderings, where an ordering is a collection of objects A combined with an operation \le. Using combinations, we can write that very naturally as A,\le.

For combinations to be useful, we need to be able to extract the component values from them. In set theory, we’d do that by having projection functions associated with the cross-product. In our theory of expressions, we have selection expressions. If x=(a, b, c, d) is a combination with four elements, then x.1 is a selection expression which extracts the first element from x.

In programming language terms, combinations give us a way of writing tuple values. Guess what’s next? Record types! Or rather, combinations with named elements. We can write a combination with names: x = (a: 1, b:2), and then write selection expressions using the names, like x.a.

Now we can start getting to the meat of things.

In combinatory logic, we’d just start with a collection of primitive constant values, and then build whatever we wanted with them using abstractions, applications, combinations, and selections. Combinatory logic is the parent of computation theory: why can’t we just stick with that foundation?

There are two answers to that. The first is familiar to programming language people: if you don’t put any restrictions on things, then you lose type safety. You’ll be able to write “valid” expressions that don’t mean anything – things like 1.x, even though “1” isn’t a combination, and so calling a selector on it makes no sense. Or you’d be able to call a function like factorial(27, 13), even though the function only takes one parameter.

The other answer is equality. One thing that we’d really like to be able to do in our theory of expressions is determine when two expressions are the same. For example, we’d really like to be able to do things like say that ((x)e)(x) == e. But without some form of control, we can’t really do that: the problem of determining whether or not two expressions are equal can become non-decidable. (The canonical example of this problem involves y combinator: ((x)x(x))((x)x(x)). If we wanted to try to work out whether an expression involving this was equivilant to another expression, we could wind up in an infinite loop of applications.)

The way that Martin-Löf worked around this is to associate an arity with an expression. Each arity of expressions is distinct, and there are rules about what operations can be applied to an expression depending on its arity. You can’t call .2 on “1+3″, because “1+3″ is a single expression, and selectors can only be applied to combined expressions.

To most of us, arity sounds like it should be a numeric value. When we we talk about the arity of a function in a program, what we mean is how many parameters it takes. In Martin-Löf expressions, arity is more expressive than that: it’s almost the same thing as types in the simply typed lambda calculus.

There are two dimensions to arity: single/combined and saturated/unsaturated.

Single expressions are atomic values, where you can’t extract other values from them by selection; multiple expressions are combinations of multiple other expressions.

Saturated expressions are expressions that have no holes in them that can be filled by other expressions – that is, expressions with no free variables. Unsaturated expressions have open holes – which means that they can be applied to other expressions.

Saturated single expressions have arity 0. An expression of arity 0 can’t be applied, and you can’t be the target of selection expressions.

An unsaturated expression has an arity (\alpha \twoheadrightarrow \beta), where both \alpha and \beta are arities. For example, the integer addition function has arity (0 \otimes 0 \twoheadrightarrow 0). (Erik Eidt pointed out that I made an error here. I originally wrote addition as 0 \twoheadrightarrow 0, where it should have been 0 \otimes 0 \twoheadrightarrow 0.)

A combined expression (e_0, e_1, ..., e_n) has arity (\alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n), where each of the \alpha_is are the arities of the expression e_i.

Sadly, I’m out of time to work on this post, so we’ll have to stop here. Next time, we’ll look at the formal rules for arities, and how to use them to define equality of expressions.

Tommaso DorigoEU Grants Submitted And Won: Some Statistics

The European Union has released some data on the latest call for applications for ITN grants. These are "training networks" where academic and non-academic institutions pool up to provide innovative training to doctoral students, in the meanting producing excellent research outputs.

read more

n-Category Café The Origin of the Word "Quandle"

A quandle is a set equipped with a binary operation with number of properties, the most important being that it distributes over itself:

a(bc)=(ab)(ac) a \triangleright (b \triangleright c) = (a \triangleright b)\triangleright (a \triangleright c)

They show up in knot theory, where they capture the essence of how the strands of a knot cross over each other… yet they manage to give an invariant of a knot, independent of the way you draw it. Even better, the quandle is a complete invariant of knots: if two knots have isomorphic quandles, there’s a diffeomorphism of 3\mathbb{R}^3 mapping one knot to the other.

I’ve always wondered where the name ‘quandle’ came from. So I decided to ask their inventor, David Joyce—who also proved the theorem I just mentioned.

He replied:

I needed a usable word. “Distributive algebra” had too many syllables. Piffle was already taken. I tried trindle and quagle, but they didn’t seem right, so I went with quandle.

So there you go! Another mystery unraveled.

David Hoggprobabilistic cosmology

Boris Leistedt (UCL) showed up for two days of chatting in preparation for his arrival at NYU this coming Fall. We discussed a set of projects in probabilistic cosmology. In one (which I have discussed previously with Fadely), we imagine what it would look like to infer galaxy redshifts from imaging without either a training set or models of galaxy spectral energy distributions. I feel like it might be possible, with the thought experiment being: What if you were given the photometry of the SDSS Luminous Red Galaxies? Couldn't you figure out their redshifts without being told their underlying spectral energy distributions? At least up to some geometric factor? Leistedt predicts that any method that doesn't use either models or training must do worse than any method that has good training data or models. However, it might make a framework for then including both model (theory) and training (data) information in in a principled way.

In another class of subjects, we talked about inference of the density field or the initial conditions. In the first place, if we could infer the density field in patches, we could use that to inform photometric redshifts or reconstruction. In the second, we could perhaps infer the initial conditions (and I mean the phases, not just the power spectrum); this is interesting because the Baryon Acoustic Feature ought to be much stronger and sharper in the initial conditions than it is today. We discussed some conceivably tractable approaches, some based on likelihood-free inference, some based on Gaussian Processes, and some based on machine learning (using simulations as training data!).

May 20, 2015

David Hoggself-calibration for Arlo

I got no research time in today until I headed to the airport, where I worked on my presentation to the Landolt Standards & 21st Century Photometry Symposium in honor of Arlo Landolt (LSU). I will be talking about self-calibration, the set of methods that have replaced Landoldt's (amazingly productive and important) standard-star catalogs for the calibration of huge surveys, which can (in principle) now be self-calibrated. I will talk about both the theory and the practice of the self-calibration arts.

Chad OrzelBreaking Boards

One of the highlights of teaching introductory mechanics is always the “karate board” lab, which I start off by punching through a wooden board. That gets the class’s attention, and then we have them hang weights on boards and measure the deflection in response to a known force. This confirms that the board behaves like a spring, and you can analyze the breaking in terms of energy, estimating the energy stored in the board, and the speed a fist must have to punch through the board. As a sort of empirical test, we can drop a half-kilogram mass from the appropriate height to match the calculated speed, and see if it goes through.

As mentioned earlier this week, I have a camera that shoots high-speed video now, so of course I decided to get a shot of the breaking boards:

It’s interesting to see how quickly the break happens– this is shot at 1000 fps, and between one frame and the next, the board goes from solid-but-bent to broken more than halfway through. It’s also interesting that in the second break shown in the video, the break occurs at a point a good deal away from the spot where the mass hits.

And, of course, there’s the obligatory gag reel at the end, from one of the several shots where the board didn’t break, and the mass bounced up to knock over the big demo caliper set that I propped up to provide a reference scale behind the board. Good times, good times…

John BaezPROPs for Linear Systems

Eric Drexler likes to say: engineering is dual to science, because science tries to understand what the world does, while engineering is about getting the world to do what you want. I think we need a slightly less ‘coercive’, more ‘cooperative’ approach to the world in order to develop ‘ecotechnology’, but it’s still a useful distinction.

For example, classical mechanics is the study of what things do when they follow Newton’s laws. Control theory is the study of what you can get them to do.

Say you have an upside-down pendulum on a cart. Classical mechanics says what it will do. But control theory says: if you watch the pendulum and use what you see to move the cart back and forth correctly, you can make sure the pendulum doesn’t fall over!

Control theorists do their work with the help of ‘signal-flow diagrams’. For example, here is the signal-flow diagram for an inverted pendulum on a cart:

When I take a look at a diagram like this, I say to myself: that’s a string diagram for a morphism in a monoidal category! And it’s true. Jason Erbele wrote a paper explaining this. Independently, Bonchi, Sobociński and Zanasi did some closely related work:

• John Baez and Jason Erbele, Categories in control.

• Filippo Bonchi, Paweł Sobociński and Fabio Zanasi, Interacting Hopf algebras.

• Filippo Bonchi, Paweł Sobociński and Fabio Zanasi, A categorical semantics of signal flow graphs.

I’ll explain some of the ideas at the Turin meeting on the categorical foundations of network theory. But I also want to talk about this new paper that Simon Wadsley of Cambridge University wrote with my student Nick Woods:

• Simon Wadsley and Nick Woods, PROPs for linear systems.

This makes the picture neater and more general!

You see, Jason and I used signal flow diagrams to give a new description of the category of finite-dimensional vector spaces and linear maps. This category plays a big role in the control theory of linear systems. Bonchi, Sobociński and Zanasi gave a closely related description of an equivalent category, \mathrm{Mat}(k), where:

• objects are natural numbers, and

• a morphism f : m \to n is an n \times m matrix with entries in the field k,

and composition is given by matrix multiplication.

But Wadsley and Woods generalized all this work to cover \mathrm{Mat}(R) whenever R is a commutative rig. A rig is a ‘ring without negatives’—like the natural numbers. We can multiply matrices valued in any rig, and this includes some very useful examples… as I’ll explain later.

Wadsley and Woods proved:

Theorem. Whenever R is a commutative rig, \mathrm{Mat}(R) is the PROP for bicommutative bimonoids over R.

This result is quick to state, but it takes a bit of explaining! So, let me start by bringing in some definitions.

Bicommutative bimonoids

We will work in any symmetric monoidal category, and draw morphisms as string diagrams.

A commutative monoid is an object equipped with a multiplication:

and a unit:

obeying these laws:

For example, suppose \mathrm{FinVect}_k is the symmetric monoidal category of finite-dimensional vector spaces over a field k, with direct sum as its tensor product. Then any object V \in \mathrm{FinVect}_k is a commutative monoid where the multiplication is addition:

(x,y) \mapsto x + y

and the unit is zero: that is, the unique map from the zero-dimensional vector space to V.

Turning all this upside down, cocommutative comonoid has a comultiplication:

and a counit:

obeying these laws:

For example, consider our vector space V \in \mathrm{FinVect}_k again. It’s a commutative comonoid where the comultiplication is duplication:

x \mapsto (x,x)

and the counit is deletion: that is, the unique map from V to the zero-dimensional vector space.

Given an object that’s both a commutative monoid and a cocommutative comonoid, we say it’s a bicommutative bimonoid if these extra axioms hold:

You can check that these are true for our running example of a finite-dimensional vector space V. The most exciting one is the top one, which says that adding two vectors and then duplicating the result is the same as duplicating each one, then adding them appropriately.

Our example has some other properties, too! Each element c \in k defines a morphism from V to itself, namely scalar multiplication by c:

x \mapsto c x

We draw this as follows:

These morphisms are compatible with the ones so far:

Moreover, all the ‘rig operations’ in k—that is, addition, multiplication, 0 and 1, but not subtraction or division—can be recovered from what we have so far:

We summarize this by saying our vector space V is a bicommutative bimonoid ‘over k‘.

More generally, suppose we have a bicommutative bimonoid A in a symmetric monoidal category. Let \mathrm{End}(A) be the set of bicommutative bimonoid homomorphisms from A to itself. This is actually a rig: there’s a way to add these homomorphisms, and also a way to ‘multiply’ them (namely, compose them).

Suppose R is any commutative rig. Then we say A is a bicommutative bimonoid over R if it’s equipped with a rig homomorphism

\Phi : R \to \mathrm{End}(A)

This is a way of summarizing the diagrams I just showed you! You see, each c \in R gives a morphism from A to itself, which we write as

The fact that this is a bicommutative bimonoid endomorphism says precisely this:

And the fact that \Phi is a rig homomorphism says precisely this:

So sometimes the right word is worth a dozen pictures!

What Jason and I showed is that for any field k, the \mathrm{FinVect}_k is the free symmetric monoidal category on a bicommutative bimonoid over k. This means that the above rules, which are rules for manipulating signal flow diagrams, completely characterize the world of linear algebra!

Bonchi, Sobociński and Zanasi used ‘PROPs’ to prove a similar result where the field is replaced by a sufficiently nice commutative ring. And Wadlsey and Woods used PROPS to generalize even further to the case of an arbitrary commutative rig!

But what are PROPs?


A PROP is a particularly tractable sort of symmetric monoidal category: a strict symmetric monoidal category where the objects are natural numbers and the tensor product of objects is given by ordinary addition. The symmetric monoidal category \mathrm{FinVect}_k is equivalent to the PROP \mathrm{Mat}(k), where a morphism f : m \to n is an n \times m matrix with entries in k, composition of morphisms is given by matrix multiplication, and the tensor product of morphisms is the direct sum of matrices.

We can define a similar PROP \mathrm{Mat}(R) whenever R is a commutative rig, and Wadsley and Woods gave an elegant description of the ‘algebras’ of \mathrm{Mat}(R). Suppose C is a PROP and D is a strict symmetric monoidal category. Then the category of algebras of C in D is the category of strict symmetric monoidal functors F : C \to D and natural transformations between these.

If for every choice of D the category of algebras of C in D is equivalent to the category of algebraic structures of some kind in D, we say C is the PROP for structures of that kind. This explains the theorem Wadsley and Woods proved:

Theorem. Whenever R is a commutative rig, \mathrm{Mat}(R) is the PROP for bicommutative bimonoids over R.

The fact that an algebra of \mathrm{Mat}(R) is a bicommutative bimonoid is equivalent to all this stuff:

The fact that \Phi(c) is a bimonoid homomorphism for all c \in R is equivalent to this stuff:

And the fact that \Phi is a rig homomorphism is equivalent to this stuff:

This is a great result because it includes some nice new examples.

First, the commutative rig of natural numbers gives a PROP \mathrm{Mat}. This is equivalent to the symmetric monoidal category \mathrm{FinSpan}, where morphisms are isomorphism classes of spans of finite sets, with disjoint union as the tensor product. Steve Lack had already shown that \mathrm{FinSpan} is the PROP for bicommutative bimonoids. But this also follows from the result of Wadsley and Woods, since every bicommutative bimonoid V is automatically equipped with a unique rig homomorphism

\Phi : \mathbb{N} \to \mathrm{End}(V)

Second, the commutative rig of booleans

\mathbb{B} = \{F,T\}

with ‘or’ as addition and ‘and’ as multiplication gives a PROP \mathrm{Mat}(\mathbb{B}). This is equivalent to the symmetric monoidal category \mathrm{FinRel} where morphisms are relations between finite sets, with disjoint union as the tensor product. Samuel Mimram had already shown that this is the PROP for special bicommutative bimonoids, meaning those where comultiplication followed by multiplication is the identity:

But again, this follows from the general result of Wadsley and Woods!

Finally, taking the commutative ring of integers \mathbb{Z}, Wadsley and Woods showed that \mathrm{Mat}(\mathbb{Z}) is the PROP for bicommutative Hopf monoids. The key here is that scalar multiplication by -1 obeys the axioms for an antipode—the extra morphism that makes a bimonoid into a Hopf monoid. Here are those axioms:

More generally, whenever R is a commutative ring, the presence of -1 \in R guarantees that a bimonoid over R is automatically a Hopf monoid over R. So, when R is a commutative ring, Wadsley and Woods’ result implies that \mathrm{Mat}(R) is the PROP for Hopf monoids over R.

Earlier, in their paper on ‘interacting Hopf algebras’, Bonchi, Sobociński and Zanasi had given an elegant and very different proof that \mathrm{Mat}(R) is the PROP for Hopf monoids over R whenever R is a principal ideal domain. The advantage of their argument is that they build up the PROP for Hopf monoids over R from smaller pieces, using some ideas developed by Steve Lack. But the new argument by Wadsley and Woods has its own charm.

In short, we’re getting the diagrammatics of linear algebra worked out very nicely, providing a solid mathematical foundation for signal flow diagrams in control theory!

Chad OrzelOn Toys in Science

The big social media blow-up of the weekend was, at least on the science-y side of things, the whole “boys with toys” thing, stemming from this NPR interview, which prompted the #GirlsWithToys hashtag in response. I’m not sorry to have missed most of the original arguments while doing stuff with the kids, but the hashtag has some good stuff.

The really unfortunate thing about this is that the point the guy was trying to make in the interview was a good one: there’s an essentially playful component to science, even at the professional level. I took a stab at making this same point over at Forbes, only without the needlessly gendered language to make people angry.

(Which, of course, will cut into its readership, but I’m not so far gone that I’ll resort to deliberately offensive clickbait…)

ResonaancesAntiprotons from AMS

This week the AMS collaboration released the long expected measurement of the cosmic ray antiproton spectrum.  Antiprotons are produced in our galaxy in collisions of high-energy cosmic rays with interstellar matter, the so-called secondary production.  Annihilation of dark matter could add more antiprotons on top of that background, which would modify the shape of the spectrum with respect to the prediction from the secondary production. Unlike for cosmic ray positrons, in this case there should be no significant primary production in astrophysical sources such as pulsars or supernovae. Thanks to this, antiprotons could in principle be a smoking gun of dark matter annihilation, or at least a powerful tool to constrain models of WIMP dark matter.

The new data from the AMS-02 detector extend the previous measurements from PAMELA up to 450 GeV and significantly reduce experimental errors at high energies. Now, if you look at the  promotional material, you may get an impression that a clear signal of dark matter has been observed.  However,  experts unanimously agree that the brown smudge in the plot above is just shit, rather than a range of predictions from the secondary production. At this point, there is certainly no serious hints for dark matter contribution to the antiproton flux. A quantitative analysis of this issue appeared in a paper today.  Predicting  the antiproton spectrum is subject to large experimental uncertainties about the flux of cosmic ray proton and about the nuclear cross sections, as well as theoretical uncertainties inherent in models of cosmic ray propagation. The  data and the predictions are compared in this Jamaican band plot. Apparently, the new AMS-02 data are situated near the upper end of the predicted range.

Thus, there is no currently no hint of dark matter detection. However, the new data are extremely useful to constrain models of dark matter. New constraints on the annihilation cross section of dark matter  are shown in the plot to the right. The most stringent limits apply to annihilation into b-quarks or into W bosons, which yield many antiprotons after decay and hadronization. The thermal production cross section - theoretically preferred in a large class of WIMP dark matter models - is in the  case of b-quarks excluded for the mass of the dark matter particle below 150 GeV. These results provide further constraints on models addressing the hooperon excess in the gamma ray emission from the galactic center.

More experimental input will allow us to tune the models of cosmic ray propagation to better predict the background. That, in turn, should lead to  more stringent limits on dark matter. Who knows... maybe a hint for dark matter annihilation will emerge one day from this data; although, given the uncertainties,  it's unlikely to ever be a smoking gun.

Thanks to Marco for comments and plots. 

ResonaancesWhat If, Part 1

This is the do-or-die year, so Résonaances will be dead serious. This year, no stupid jokes on April Fools' day: no Higgs in jail, no loose cables, no discovery of supersymmetry, or such. Instead, I'm starting with a new series "What If" inspired  by XKCD.  In this series I will answer questions that everyone is dying to know the answer to. The first of these questions is

If HEP bloggers were Muppets,
which Muppet would they be? 

Here is  the answer.

  • Gonzo the Great: Lubos@Reference Frame (on odd-numbered days)
    The one true uncompromising artist. Not treated seriously by other Muppets, but adored by chicken.
  • Animal: Lubos@Reference Frame (on even-numbered days)
    My favorite Muppet. Pure mayhem and destruction. Has only two modes: beat it, or eat it.
  • Swedish Chef: Tommaso@Quantum Diaries Survivor
    The Muppet with a penchant for experiment. No one understands what he says but it's always amusing nonetheless.
  • Kermit the Frog: Matt@Of Particular Significance
    Born Muppet leader, though not clear if he really wants the job.
  • Miss Piggy: Sabine@Backreaction
    Not the only female Muppet, but certainly the best known. Admired for her stage talents but most of all for her punch.
  • Rowlf: Sean@Preposterous Universe
    The real star and one-Muppet orchestra. Impressive as an artist or and as a comedian, though some complain he's gone to the dogs.

  • Statler and Waldorf: Peter@Not Even Wrong
    Constantly heckling other Muppets from the balcony, yet every week back for more.
  • Fozzie Bear:  Jester@Résonaances
    Failed stand-up comedian. Always stressed that he may not be funny after all.
If you have a match for  Bunsen, Beaker, or Dr Strangepork, let me know in the comments.

In preparation:
-If theoretical physicists were smurfs... 

-If LHC experimentalists were Game of Thrones characters...
-If particle physicists lived in Middle-earth... 

-If physicists were cast for Hobbit's dwarves... 
and more. 

Terence TaoFailure of the L^1 pointwise and maximal ergodic theorems for the free group

I’ve just uploaded to the arXiv my paper “Failure of the {L^1} pointwise and maximal ergodic theorems for the free group“, submitted to Forum of Mathematics, Sigma. This paper concerns a variant of the pointwise ergodic theorem of Birkhoff, which asserts that if one has a measure-preserving shift map {T: X \rightarrow X} on a probability space {X = (X,\mu)}, then for any {f \in L^1(X)}, the averages {\frac{1}{N} \sum_{n=1}^N f \circ T^{-n}} converge pointwise almost everywhere. (In the important case when the shift map {T} is ergodic, the pointwise limit is simply the mean {\int_X f\ d\mu} of the original function {f}.)

The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Folner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group {F_2} on two generators.

Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for {F_2}-actions {(T_g)_{g \in F_2}} on probability spaces {(X,\mu)}. For instance, for the spherical averaging operators

\displaystyle  {\mathcal A}_n f := \frac{1}{4 \times 3^{n-1}} \sum_{g \in F_2: |g| = n} f \circ T_g^{-1}

(where {|g|} denotes the length of the reduced word that forms {g}), they showed that {{\mathcal A}_{2n} f} converged pointwise almost everywhere provided that {f} was in {L^p(X)} for some {p>1}. (The need to restrict to spheres of even radius can be seen by considering the action of {F_2} on the two-element set {\{0,1\}} in which both generators of {F_2} act by interchanging the elements, in which case {{\mathcal A}_n} is determined by the parity of {n}.) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition {f \in L^p(X)} to the weaker condition {f \in L \log L(X)}.

The question remained open as to whether the pointwise ergodic theorem for {F_2}-actions held if one only assumed that {f} was in {L^1(X)}. Nevo and Stein were able to establish this for the Cesáro averages {\frac{1}{N} \sum_{n=1}^N {\mathcal A}_n}, but not for {{\mathcal A}_n} itself. About six years ago, Assaf Naor and I tried our hand at this problem, and was able to show an associated maximal inequality on {\ell^1(F_2)}, but due to the non-amenability of {F_2}, this inequality did not transfer to {L^1(X)} and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.

Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the {L^1} pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our {\ell^1(F_2)} maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an {L^1} ergodic theorem for iterates {P^n} of a self-adjoint Markov operator – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the {F_2} setting, thus settling the problem in the negative:

Theorem 1 (Failure of {L^1} pointwise ergodic theorem) There exists a measure-preserving {F_2}-action on a probability space {X} and a non-negative function {f \in L^1(X)} such that {\sup_n {\mathcal A}_{2n} f(x) = +\infty} for almost every {x}.

To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator {P} on a probability space {X} and a non-negative {f \in L^1(X)} such that {\sup_n P^n f(x) = +\infty} for almost every {x}. By some standard manipulations, it suffices to show that for any given {\alpha > 0} and {\varepsilon>0}, there exists a self-adjoint Markov operator {P} on a probability space {X} and a non-negative {f \in L^1(X)} with {\|f\|_{L^1(X)} \leq \alpha}, such that {\sup_n P^n f \geq 1-\varepsilon} on a set of measure at least {1-\varepsilon}. Actually, it will be convenient to replace the Markov chain {(P^n f)_{n \geq 0}} with an ancient Markov chain {(f_n)_{n \in {\bf Z}}} – that is to say, a sequence of non-negative functions {f_n} for both positive and negative {f}, such that {f_{n+1} = P f_n} for all {n \in {\bf Z}}. The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for sake of discussion, as it turns out that the {F_2} version of the argument can be run using infinitely ancient chains.)

For any {\alpha>0}, let {P(\alpha)} denote the claim that for any {\varepsilon>0}, there exists an ancient Markov chain {(f_n)_{n \in {\bf Z}}} with {\|f_n\|_{L^1(X)} = \alpha} such that {\sup_{n \in {\bf Z}} f_n \geq 1-\varepsilon} on a set of measure at least {1-\varepsilon}. Clearly {P(1)} holds since we can just take {f_n=1} for all {n}. Our objective is to show that {P(\alpha)} holds for arbitrarily small {\alpha}. The heart of Ornstein’s argument is then the implication

\displaystyle  P(\alpha) \implies P( \alpha (1 - \frac{\alpha}{4}) ) \ \ \ \ \ (1)

for any {0 < \alpha \leq 1}, which upon iteration quickly gives the desired claim.

Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain {(f_n)_{n \in {\bf Z}}} on some probability space {X} of total mass {\|f_n\|_{L^1(X)} = \alpha}, such that {\sup_n f_n} attains the value of {1} or greater almost everywhere. Assuming that the Markov process is irreducible, the {f_n} will eventually converge as {n \rightarrow \infty} to the constant value of {\|f_n\|_{L^1(X)}}, in particular its final state will essentially stay above {\alpha} (up to small errors).

Now suppose we duplicate the Markov process by replacing {X} with a double copy {X \times \{1,2\}} (giving {\{1,2\}} the uniform probability measure), and using the disjoint sum of the Markov operators on {X \times \{1\}} and {X \times \{2\}} as the propagator, so that there is no interaction between the two components of this new system. Then the functions {f'_n(x,i) := f_n(x) 1_{i=1}} form an ancient Markov chain of mass at most {\alpha/2} that lives solely in the first half {X \times \{1\}} of this copy, and {\sup_n f'_n} attains the value of {1} or greater on almost all of the first half {X \times \{1\}}, but is zero on the second half. The final state of {f'_n} will be to stay above {\alpha} in the first half {X \times \{1\}}, but be zero on the second half.

Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves {X \times \{1\}}, {X \times \{2\}} of the system (I mentally think of {X \times \{1\}} and {X \times \{2\}} as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain {(f'_n)_{n \in {\bf Z}}} in the previous example gets replaced by a slightly different ancient Markov chain {(f''_n)_{n \in {\bf Z}}} which is more or less identical with {f'_n} for negative times {n}, or for bounded positive times {n}, but for very large values of {n} the final state is now constant across the entire state space {X \times \{1,2\}}, and will stay above {\alpha/2} on this space.

Finally, we consider an ancient Markov chain {F_n} which is basically of the form

\displaystyle  F_n(x,i) \approx f''_n(x,i) + (1 - \frac{\alpha}{2}) f_{n-M}(x) 1_{i=2}

for some large parameter {M} and for all {n \leq M} (the approximation becomes increasingly inaccurate for {n} much larger than {M}, but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces {X \times \{1\}, X \times \{2\}}, but with the second copy delayed by a large time delay {M}, and also attenuated in amplitude by a factor of {1-\frac{\alpha}{2}}. The total mass of this process is now {\frac{\alpha}{2} + \frac{\alpha}{2} (1 -\frac{\alpha}{2}) = \alpha (1 - \alpha/4)}. Because of the {f''_n} component of {F_n}, we see that {\sup_n F_n} basically attains the value of {1} or greater on the first half {X \times \{1\}}. On the second half {X \times \{2\}}, we work with times {n} close to {M}. If {M} is large enough, {f''_n} would have averaged out to about {\alpha/2} at such times, but the {(1 - \frac{\alpha}{2}) f_{n-M}(x)} component can get as large as {1-\alpha/2} here. Summing (and continuing to ignore various epsilon losses), we see that {\sup_n F_n} can get as large as {1} on almost all of the second half of {X \times \{2\}}. This concludes the rough sketch of how one establishes the implication (1).

It was observed by Bufetov that the spherical averages {{\mathcal A}_n} for a free group action can be lifted up to become powers {P^n} of a Markov operator, basically by randomly assigning a “velocity vector” {s \in \{a,b,a^{-1},b^{-1}\}} to one’s base point {x} and then applying the Markov process that moves {x} along that velocity vector (and then randomly changing the velocity vector at each time step to the “reduced word” condition that the velocity never flips from {s} to {s^{-1}}). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of {F_2} systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with {F_2}-measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case {P(1)}) comes from basically considering the action of {F_2} on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time {n=-\infty}, and only reaches the boundary of this ball at the time {n=0}.

Filed under: math.CA, math.DS, paper Tagged: free group, maximal ergodic theorem, pointwise ergodic theorem

May 19, 2015

Mark Chu-CarrollBad Comparisons with Statistics

When a friend asks me to write about something, I try do it. Yesterday, a friend of mine from my Google days, Daniel Martin, sent me a link, and asked to write about it. Daniel isn’t just a former coworker of mine, but he’s a math geek with the same sort of warped sense of humor as me. He knew my blog before we worked at Google, and on my first Halloween at Google, he came to introduce himself to me. He was wearing a purple shirt with his train ticket on a cord around his neck. For those who know any abstract algebra, get ready to groan: he was purple, and he commuted. He was dressed as an Abelian grape.

Anyway, Daniel sent me a link to this article, and asked me to write about the error in it.

The real subject of the article involves a recent twitter-storm around a professor at Boston University. This professor tweeted some about racism and history, and she did it in very blunt, not-entirely-professional terms. The details of what she did isn’t something I want to discuss here. (Briefly, I think it wasn’t a smart thing to tweet like that, but plenty of white people get away with worse every day; the only reason that she’s getting as much grief as she is is because she dared to be a black woman saying bad things about white people, and the assholes at Breitbart used that to fuel the insatiable anger and hatred of their followers.)

But I don’t want to go into the details of that here. Lots of people have written interesting things about it, from all sides. Just by posting about this, I’m probably opening myself up to yet another wave of abuse, but I’d prefer to avoid and much of that as I can. Instead, I’m just going to rip out the introduction to this article, because it makes a kind of incredibly stupid mathematical argument that requires correction. Here are the first and second paragraphs:

There aren’t too many African Americans in higher education.

In fact, black folks only make up about 4 percent of all full time tenured college faculty in America. To put that in context, only 14 out of the 321—that’s about 4 percent—of U.S. astronauts have been African American. So in America, if you’re black, you’ve got about as good a chance of being shot into space as you do getting a job as a college professor.

Statistics and probability can be a difficult field of study. But… a lot of its everyday uses are really quite easy. If you’re going to open your mouth and make public statements involving probabilities, you probably should make sure that you at least understand the first chapter of “probability for dummies”.

This author doesn’t appear to have done that.

The most basic fact of understanding how to compare pretty much anything numeric in the real world is that you can only compare quantities that have the same units. You can’t compare 4 kilograms to 5 pounds, and conclude that 5 pounds is bigger than 4 kilograms because 5 is bigger than four.

That principle applies to probabilities and statistics: you need to make sure that you’re comparing apples to apples. If you compare an apple to a grapefruit, you’re not going to get a meaningful result.

The proportion of astronauts who are black is 14/321, or a bit over 4%. That means that out of every 100 astronauts, you’d expect to find four black ones.

The proportion of college professors who are black is also a bit over 4%. That means that out of every 100 randomly selected college professors, you’d expect 4 to be black.

So far, so good.

But from there, our intrepid author takes a leap, and says “if you’re black, you’ve got about as good a chance of being shot into space as you do getting a job as a college professor”.

Nothing in the quoted statistic in any way tells us anything about anyone’s chances to become an astronaut. Nothing at all.

This is a classic statistical error which is very easy to avoid. It’s a unit error: he’s comparing two things with different units. The short version of the problem is: he’s comparing black/astronaut with astronaut/black.

You can’t derive anything about the probability of a black person becoming an astronaut from the ratio of black astronauts to astronauts.

Let’s pull out some numbers to demonstrate the problem. These are completely made up, to make the calculations easy – I’m not using real data here.

Suppose that:

  • the US population is 300,000,000;
  • black people are 40% of the population, which means that there are are 120,000,000 black people.
  • there are 1000 universities in America, and there are 50 faculty per university, so there are 50,000 university professors.
  • there are 50 astronauts in the US.
  • If 4% of astronauts and 4% of college professors are black, that means that there are 2,000 black college professors, and 2 black astronauts.

In this scenario, as in reality, the percentage of black college professors and the percentage of black astronauts are equal. What about the probability of a given black person being a professor or an astronaut?

The probability of a black person being a professor is 2,000/120,000,000 – or 1 in 60,000. The probability of a black person becoming an astronaut is just 2/120,000,000 – or 1 in 60 million. Even though the probability of a random astronaut being black is the same as a the probability of a random college professor being black, the probability of a given black person becoming a college professor is 10,000 times higher that the probability of a given black person becoming an astronaut.

This kind of thing isn’t rocket science. My 11 year old son has done enough statistics in school to understand this problem! It’s simple: you need to compare like to like. If you can’t understand that, if you can’t understand your statistics enough to understand their units, you should probably try to avoid making public statements about statistics. Otherwise, you’ll wind up doing something stupid, and make yourself look like an idiot.

(In the interests of disclosure: an earlier version of this post used the comparison of apples to watermelons. But given the racial issues discussed in the post, that had unfortunate unintended connotations. When someone pointed that out to me, I changed it. To anyone who was offended: I am sorry. I did not intend to say anything associated with the racist slurs; I simply never thought of it. I should have, and I shouldn’t have needed someone to point it out to me. I’ll try to be more careful in the future.)

Clifford JohnsonReady for the day…

I have prepared the Tools of the Office of Dad*: ready_for_the_day -cvj *At least until lunchtime. Then, another set to prep... Click to continue reading this post

Clifford Johnson‘t Hooft on Scale Invariance…

Worth a read: This is 't Hooft's summary (link is a pdf) of a very interesting idea/suggestion about scale invariance and its possible role in finding an answer to a number of puzzles in physics. (It is quite short, but think I'll need to read it several times and mull over it a lot.) It won the top Gravity Research foundation essay prize this year, and there were several other interesting essays in the final list too. See here. -cvj Click to continue reading this post

Clifford JohnsonQuick Experiment…

On my way back from commencement day on campus last Friday I got to spend a bit of time on the subway, and for the first time in a while I got to do a quick sketch. (I have missed the subway so much!) Yesterday, at home, I found myself with a couple of new brushes that I wanted to try out, and so I did a brushed ink sketch from the sketch... quick_ink_experimentIt felt good to flow the ink around - haven't done that in a while either. Then I experimented with splashing a bit of digital colour underneath it. (This is all with the graphic book project in mind, where I think at least one story might [...] Click to continue reading this post

Scott Aaronson“Is There Something Mysterious About Math?”

When it rains, it pours: after not blogging for a month, I now have a second thing to blog about in as many days.  Aeon, an online magazine, asked me to write a short essay responding to the question above, so I did.  My essay is here.  Spoiler alert: my thesis is that yes, there’s something “mysterious” about math, but the main mystery is why there isn’t even more mystery than there is.  Also—shameless attempt to get you to click—the essay discusses the “discrete math is just a disorganized mess of random statements” view of Luboš Motl, who’s useful for putting flesh on what might otherwise be a strawman position.  Comments welcome (when aren’t they?).  You should also read other interesting responses to the same question by Penelope Maddy, James Franklin, and Neil Levy.  Thanks very much to Ed Lake at Aeon for commissioning these pieces.

Update (4/22): On rereading my piece, I felt bad that it didn’t make a clear enough distinction between two separate questions:

  1. Are there humanly-comprehensible explanations for why the mathematical statements that we care about are true or false—thereby rendering their truth or falsity “non-mysterious” to us?
  2. Are there formal proofs or disproofs of the statements?

Interestingly, neither of the above implies the other.  Thus, to take an example from the essay, no one has any idea how to prove that the digits 0 through 9 occur with equal frequency in the decimal expansion of π, and yet it’s utterly non-mysterious (at a “physics level of rigor”) why that particular statement should be true.  Conversely, there are many examples of statements for which we do have proofs, but which experts in the relevant fields still see as “mysterious,” because the proofs aren’t illuminating or explanatory enough.  Any proofs that require gigantic manipulations of formulas, “magically” terminating in the desired outcome, probably fall into that class, as do proofs that require computer enumeration of cases (like that of the Four-Color Theorem).

But it’s not just that proof and explanation are incomparable; sometimes they might even be at odds.  In this MathOverflow post, Timothy Gowers relates an interesting speculation of Don Zagier, that statements like the equidistribution of the digits of π might be unprovable from the usual axioms of set theory, precisely because they’re so “obviously” true—and for that very reason, there need not be anything deeper underlying their truth.  As Gowers points out, we shouldn’t go overboard with this speculation, because there are plenty of other examples of mathematical statements (the Green-Tao theorem, Vinogradov’s theorem, etc.) that also seem like they might be true “just because”—true only because their falsehood would require a statistical miracle—but for which mathematicians nevertheless managed to give fully rigorous proofs, in effect formalizing the intuition that it would take a miracle to make them false.

Zagier’s speculation is related to another objection one could raise against my essay: while I said that the “Gödelian gremlin” has remained surprisingly dormant in the 85 years since its discovery (and that this is a fascinating fact crying out for explanation), who’s to say that it’s not lurking in some of the very open problems that I mentioned, like π’s equidistribution, the Riemann Hypothesis, the Goldbach Conjecture, or P≠NP?  Conceivably, not only are all those conjectures unprovable from the usual axioms of set theory, but their unprovability is itself unprovable, and so on, so that we could never even have the satisfaction of knowing why we’ll never know.

My response to these objections is basically just to appeal yet again to the empirical record.  First, while proof and explanation need not go together and sometimes don’t, by and large they do go together: over thousands over years, mathematicians learned to seek formal proofs largely because they discovered that without them, their understanding constantly went awry.  Also, while no one can rule out that P vs. NP, the Riemann Hypothesis, etc., might be independent of set theory, there’s very little in the history of math—including in the recent history, which saw spectacular proofs of (e.g.) Fermat’s Last Theorem and the Poincaré Conjecture—that lends concrete support to such fatalism.

So in summary, I’d say that history does present us with “two mysteries of the mathematical supercontinent”—namely, why do so many of the mathematical statements that humans care about turn out to be tightly linked in webs of explanation, and also in webs of proof, rather than occupying separate islands?—and that these two mysteries are very closely related, if not quite the same.

Chad OrzelToy Roller Coasters and the Energy Principle

One of the points I make repeatedly in teaching introductory mechanics (as I’m doing this term) is that absolutely every problem students encounter can, in principle, be solved using just Newton’s Laws or, in the terminology used by Matter and Interactions, the Momentum Principle. You don’t strictly need any of the other stuff we talk about, like energy or angular momentum.

Of course, just because you can solve any problem using the Momentum Principle doesn’t mean that you want to solve those problems that way. As an example of a problem that’s really annoying to solve with just the Momentum Principle, I generally break out a toy looping roller coaster that we have in the department’s collection of demo gear. And, sinc eI have a slow-motion camera now, I shot some video of it:

That’s three clips spliced together, partly because the camera’s format doesn’t play nice with Tracker Video (sigh…), but mostly because it makes a useful point, namely that if you don’t start the car high enough up the track, the car won’t make it all the way around the loop.

It’s easy enough to calculate the speed the car needs to have at the top to make it around using just the Momentum Principle, but finding the position on the hill where it will reach the necessary speed at the top of the loop is a harder matter. You can get a good estimate of the start position very quickly if you think about the problem using energy methods, though– at the top of the loop, the cart has to have a minimum speed, and it’s above the ground by some amount, so you can work out the total kinetic plus potential energy it has as it makes the loop. all of that had to come from the gravitational potential it has at the top of the hill, which lets you get the start height. And, indeed, that’s how I figured out where to have my student hold the cart for the second and third clips in the video…

Since I have this, of course, I can crank the clip into Tracker Video Analysis (once the idiot format problem is fixed, sigh…) and measure the position of the car as a function of time. Which gets me a reconstruction of the track that looks like this:

Reconstruction of the roller coaster track from positions measured in Tracker Video.

Reconstruction of the roller coaster track from positions measured in Tracker Video.

(Yeah, it’s a little oblong, but I was selecting these positions using a trackpoint on a laptop, so I wouldn’t read too much into that…) Those positions let you work out the velocity of the car as it goes, which can be combined to find the speed:

Speed of the car going around the loop, as a function of time. Smoothed by taking a running average of five points.

Speed of the car going around the loop, as a function of time. Smoothed by taking a running average of five points.

This shows basically what you expect: the speed increases to a maximum at the bottom of the hill, drops as the cart climbs the loop, reaching a minimum at the top, then goes back up on the way down. The reconstruction of the loop shows a radius of about 15cm, which implies a minimum speed to complete the loop of around 1.2 m/s. The minimum seen in the speed graph is a bit over 2 m/s, a nice safety margin.

Since Rhett is already done for the semester (I have two more weeks of class, grumble mutter grump), I guess it falls on me to assign you some physics homework, so:

— The graphs above are for the first of the video clips. Do your own video analysis of the second and third clips, and find the minimum height for the release.

— Use the difference between the peaks in the speed graph to estimate the force of friction and air resistance acting on the car as it rolls along the track.

— Use the reconstruction of the roller coaster loop to write a VPython simulation of the car moving on the track using only the Momentum Principle.

Send your homework to Rhett for grading, since he has lots of free time these days.

Doug NatelsonBook recommendations: Stuff Matters and The Disappearing Spoon

I've lamented the lack of good popularizations of condensed matter/solid state physics.  I do, however, have recommendations for two relatively recent books about materials and chemistry, which is pretty close.

The Disappearing Spoon, by Sam Kean, is a fun, engaging stroll across the periodic table, exploring the properties of the various chemical elements through the usually fascinating, sometimes funny, occasionally macabre histories of their discoveries and uses.  The title references joke spoons made from gallium that would melt (and fall to the bottom of the cup) when used to stir tea.  The tone is light and anecdotal, and the history is obscure enough that you haven't heard all the stories before.  Very fun.

Stuff Matters, by Mark Miodownik, is similar in spirit, though not quite so historical and containing more physics and materials science.  The author is a materials scientist who happens to be a gifted author and popularizer as well.  He's done a BBC three-episode series about materials (available here), another BBC series about modern technologies, and a TED lesson about why glass is transparent.

Scott AaronsonFive announcements

1. Sanjeev Arora sent me a heads-up that there’s a discussion about the future of the STOC conference  at the Windows on Theory blog—in particular, about the idea of turning STOC into a longer “CS theory festival.”  If you have opinions about this, don’t miss the chance to make your voice heard.

2. Back in January, I blogged about a new quantum optimization algorithm by Farhi, Goldstone, and Gutmann, which was notable for being, as far as anyone could tell, the first quantum algorithm to achieve a provably better approximation ratio than the best-known classical algorithm for an NP-hard optimization problem.  Today, I report that a fearsome list of authors—Boaz Barak, Ankur Moitra, Ryan O’Donnell, Prasad Raghavendra, Oded Regev, David Steurer, Luca Trevisan, Aravindan Vijayaraghavan, David Witmer, and John Wright—has put out an eagerly-awaited paper that gives a classical algorithm for the same problem, with better performance than the quantum algorithm’s.  (They write that this “improves both qualitatively and quantitatively” on Farhi et al.’s work; I assume “qualitatively” refers to the fact that the new algorithm is classical.)  What happened, apparently, is that after I blogged (with enthusiasm) about the Farhi et al. result, a bunch of classical complexity theorists read my post and decided independently that they could match or beat the quantum algorithm’s performance classically; then they found out about each other and decided to merge their efforts.  I’m proud to say that this isn’t the first example of this blog catalyzing actual research progress, though it’s probably the best example so far.  [Update: Luca Trevisan now has a great post explaining what happened in much more detail, entitled “How Many Theoreticians Does It Take to Approximate Max 3Lin?”]

3. Jennifer Ouellette has a wonderful article in Quanta magazine about recent progress in AdS/MERA (i.e., “the emergence of spacetime from entanglement”), centered around the ideas of Brian Swingle.  This is one of the main things that I’d love to understand better right now—if I succeed even partially, you’ll know because I’ll write a blog post trying to explain it to others.  See also this blog post by Sean Carroll (about this paper by Ning Bao et al.), and this paper by Pastawski, Yoshida, Harlow, and Preskill, which explicitly mines the AdS/CFT correspondence for examples of quantum error-correcting codes.

4. Celebrity rationalist Julia Galef, who I had the great honor of meeting recently, has a podcast interview with Sean Carroll about why Carroll accepts the many-worlds interpretation.  (Or if, like me, you prefer the written word to the spoken one, click here for a full transcript.)  Unfortunately, Sean is given the opportunity at the end of the interview to recommend one science book to his listeners—just one!—but he squanders it by plugging some weird, self-indulgent thing called Quantum Computing Since Democritus.  Julia also has a YouTube video about what she learned from the interview, but I haven’t yet watched it (is there a transcript?).

5. I came across an insightful if meandering essay about nerd culture by Meredith L. Patterson.  In particular, noticing how the term “nerd” has been co-opted by normal, socially-skilled people, who’ve quickly set about remaking nerd social norms to make them identical to the rest of the world’s norms, Patterson coins the term “weird-nerd” to describe people like herself, who are still nerds in the original sense and who don’t see nerd culture as something horribly, irreparably broken.  As she writes: “We’ll start to feel less defensive when we get some indication — any indication — that our critics understand what parts of our culture we don’t want to lose and why we don’t want to lose them.”  (But is this the start of a linguistic treadmill?  Will we eventually need to talk about weird-weird-nerds, etc.?)

Clifford JohnsonBecause…

May 18, 2015

n-Category Café The Revolution Will Not Be Formalized

After a discussion with Michael Harris over at the blog about his book Mathematics without apologies, I realized that there is a lot of confusion surrounding the relationship between homotopy type theory and computer formalization — and that moreover, this confusion may be causing people to react negatively to one or the other due to incorrect associations. There are good reasons to be confused, because the relationship is complicated, and various statements by prominent members of both communities about a “revolution” haven’t helped matters. This post and its sequel(s) are my attempt to clear things up.

In this post I will talk mainly about computer formalization of mathematics, independently of homotopy type theory. There are multiple applications of computers to mathematics, and people sometimes confuse computer verification of proofs with computerized automated proof-finding. Homotopy type theory has very little to do with the latter, so henceforth by “formalization” I will mean only the verification of proofs.

This often means the verification of pre-existing proofs, but it is also possible to use a computer to help construct a proof and verify it at the same time. The tools that we use to verify proofs incorporate a small amount of automation, so that this process can sometimes save a small amount of effort over writing a proof by hand first; but at least in the realm of pure mathematics, the automation mainly serves to save us from worrying about details that the computer cares about but that we wouldn’t have worried about in a paper proof anyway.

Computer-verified proof has been going on for decades. Recently it’s garnered a bit more attention, due partly to complete verifications of some “big-name” theorems such as the four-color theorem, the odd-order theorem, and the Kepler conjecture, whose proofs were so long or complicated or automated that reasonable mathematicians might wonder whether there was an error somewhere.

What is the future of computer-verified proof? Is it the future of mathematics? Should we be happy or worried about that prospect? Does it mean that computers will take over mathematics and leave no room for the humans? My personal opinion is that (1) computer-verified proof is only going to get more common and important, but (2) it will be a long time before all mathematics is computer-verified, if indeed that ever happens, and (3) if and when it does happen, it won’t be anything to worry about.

The reason I believe (2) is that my personal experience with computer proof assistants leads me to the conclusion that they are still very far from usable by the average mathematician on a daily basis. Despite all the fancy tools that exist now, verifying a proof with a computer is usually still a lot more work than writing that proof on paper. And that’s after you spend the necessary time and effort learning to use the proof assistant tool, which generally comes with quite a passel of idiosyncracies.

Moreover, in most cases the benefits to verifying a proof with a computer are doubtful. For big theorems that are very long or complicated or automated, so that their authors have a hard time convincing other mathematicians of their correctness by hand, there’s a clear win. (That’s one of the reasons I believe (1), because I believe that proofs of this sort are also going to get more common.) Moreover, a certain kind of mathematician finds proof verification fun and rewarding for its own sake. But for the everyday proof by your average mathematician, which can be read and understood by any other average mathematician, the benefit from sweating long hours to convince a computer of its truth is just not there (yet). That’s why, despite periodic messianic claims from various quarters, you don’t see mathematicians jumping on any bandwagon of proof verification.

Now there would be certain benefits to the mathematical community if all proofs were computer verified. Notably, of course, we could be sure that they were correct. If all submissions to journals were verified first by their authors, then referees could be mostly absolved of checking the correctness of the results and could focus on exposition and interest. (There are still certain things to check, however, such as that the computer formalization does in fact prove what the paper claims that it does.)

I can imagine that once there is a “critical mass” of mathematicians choosing to verify their own results with a computer, journals might start requiring such verification, thereby tipping the balance completely towards all mathematics being verified, in the same way that many journals now require submissions in (La)TeX. However, we’re a long way from that critical mass, because proof assistants have a long way to go before they are as easy to use as TeX. My guess is that it will take about 100 years, if it happens at all. Moreover, it won’t be a “revolution” in the sense of a sudden overthrow of the status quo. Historians might look back on it afterwards and call it a revolution, but at the time when it happens it will feel like a natural and gradual process.

As for (3), for the most part computer verification is just a supplement to ordinary mathematical reasoning. It’s a tool used by human mathematicians. We have to learn to use it correctly, and establish certain conventions, such as the fact that you’ve verified a proof with a computer doesn’t absolve you from explaining it clearly to humans. But I have faith that the mathematical community will be up to that task, and I don’t see any point in worrying right now about things that are so far in the future we can barely imagine them. (And no, I don’t believe in the AI singularity either.)

(Next time: What about HoTT?)

Tommaso DorigoThe Challenges Of Scientific Publishing: A Conference By Elsevier

I spent the last weekend in Berlin, attending a conference for editors organized by Elsevier. And I learnt quite a bit during two very busy days. As a newbie - I am handling editor for the journal "Reviews in Physics" since January this year - I did expect to learn a lot from the event; but I will admit that I decided to accept the invitation to attend the event more out of curiosity for a world that is at least in part new to me, rather than out of professional sense of duty.

read more

BackreactionBook Review: “String Theory and the Scientific Method” by Richard Dawid

String Theory and the Scientific Method
By Richard Dawid
Cambridge University Press (2013)

“String Theory and the Scientific Method” is a very interesting and timely book by a philosopher trying to make sense out of trends in contemporary theoretical physics. Dawid has collected arguments that physicists have raised to demonstrate the promise of their theories, arguments that however are not supported by the scientific method as it is currently understood. He focuses on string theory, but some of his observations are more general than this.

There is for example that physicists rely on mathematical consistency as a guide, even though this is clearly not an experimental assessment. A theory that isn’t mathematically consistent in some regime where we do not have observations yet isn’t considered fundamentally valid. I have to admit it wouldn’t even have occurred to me to call this a “non-empirical assessment,” because our use of mathematics is clearly based on the observation that it works very well to describe nature.

The three arguments that Dawid has collected which are commonly raised by string theorists to support their belief that string theory is a promising theory of everything are:
  1. Meta-inductive inference: The trust in a theory is higher if its development is based on extending existing successful research programs.
  2. No-alternatives argument: The more time passes in which we fail to find a theory as successful as string theory in combining quantum field theory with general relativity the more likely it is that the one theory we have found is unique and correct.
  3. Argument of unexpected explanatory coherence: A finding is perceived more important if it wasn’t expected.
Dawid then argues basically that since a lot of physicists are de facto not relying on the scientific method any more maybe philosophers should face reality and come up with a better explanation that would alter the scientific method so that according to the new method the above arguments were scientific.

In the introduction Dawid writes explicitly that he only studies the philosophical aspects of the development and not the sociological ones. My main problem with the book is that I don’t think one can separate these two aspects clearly. Look at the arguments that he raises: The No Alternatives Argument and the Unexpected Explanatory Coherence are explicitly sociological. They are 1.) based on the observation that there exists a large research area which attracts much funding and many young people and 2.) that physicists trust their colleagues’ conclusions better if it wasn’t the conclusion they were looking for. How can you analyze the relevance of these arguments without taking into account sociological (and economic) considerations?

The other problem with Dawid’s argument is that he confuses the Scientific Method with the rest of the scientific process that happens in the communities. Science basically operates as a self-organized adaptive system, that is in the same class of systems as natural selection. For such systems to be able to self-optimize something – in the case of science the use of theories for the descriptions of nature – they must have a mechanism of variation and a mechanism for assessment of the variation followed by a feedback. In the case of natural selection the variation is genetic mixing and mutation, the assessment is whether the result survives, the feedback is another reproduction. In science the variation is a new theory and the assessment is whether it agrees with experimental test. The feedback is the revision or trashcanning of the theory. This assessment whether a theory describes observation is the defining part of science – you can’t change this assessment without changing what science does because it determines what we optimize for.

The assessments that Dawid, correctly, observes are a pre-selection that is meant to assure we spend time only on those theories (gene combinations) that are promising. To make a crude analogy, we clearly do some pre-selection in our choice of partners that determines which genetic combinations are ever put to test. These might be good choices or they might be bad choices and as long as their success hasn’t also been put to test, we have to be very careful whether we rely on them. It’s the same with the assessments that Dawid observes. Absent experimental test, we don’t know if using these arguments does us any good. In fact I would argue that if one takes into account sociological dynamics one presently has a lot of reasons to not trust researchers to be objective and unbiased which sheds much doubt on the use of these arguments.

Be that as it may, Dawid’s book has been very useful for me to clarify my thoughts about exactly what is going on in the community. I think his observations are largely correct, just that he draws the wrong conclusion. We clearly don’t need to update the scientific method, we need to apply it better, and we need to apply it in particular to better understand the process of knowledge discovery.

I might never again agree with David Gross on anything, but I do agree on his “pre-publication praise” on the cover. The book is very recommendable reading both for physicists and philosophers.

I wasn’t able to summarize the arguments in the book without drawing a lot of sketches, so I made a 15 mins slideshow with my summary and comments on the book. If you have the patience, enjoy :)

Mark Chu-CarrollExpressions and Arity (Part 2): Equivalence

Continuing where I left off: we were talking about arity in Martin-Löf’s theory of expressions. There are two basic problems that arity solves: it makes certain kinds of impossible-to-evaluate expressions be invalid in the theory; and it helps enable some way of comparing expressions for equality. Arity solves both of those problems by imposing a simple type system over expressions.

At the end of the last post, I started giving a sketch of what arities look like. Now we’re going to dive in, and take a look at how to determine the arity of an expression. It’s a fairly simple system of rules.

Before diving in, I want to stress the most important thing about the way that these rules work is that the expressions are totally syntactic and static. This is something that confused me the first time I tried to read about expression theory. When you see an expression, you think about how it’s evaluated. But expression theory is a purely syntactic theory: it’s about analyzing expressions as syntactic entities. There are, deliberately, no evaluation rules. It’s about understanding what the notations mean, and how to determine when two expressions are equivalent.

If, under the rules of Martin-Löf’s expression theory, two expressions are equivalent, then if you were to chose a valid set of evaluation rules, the two expressions will evaluate to the same value. But expression equivalence is stronger: expressions are equivalent only if you can prove their equivalence from their syntax.

That clarified, let’s start by looking at the rules of arity in expressions.

Variables and Constants
Every variable and every primitive constant has a pre-defined arity; if x is a variable or primitive constant with arity \alpha, then the expression x has arity \alpha.
In a definition D := e, the arity of the defined name D is the same as the arity of the expression e.
If a is an expression of arity \alpha \twoheadrightarrow \beta, and b is a expression of arity \alpha, then a(b) is an expression of arity \beta.
If e is an expression of arity \beta and x is a variable of arity \alpha, then (x)e is an expression of arity \alpha \twoheadrightarrow \beta.
If e_1 is an expression af arity \alpha_1, e_2 is an expression of arity \alpha_2, …, and e_n is an expression of arity \alpha_n, then a combination expression e_1, e_2, ..., e_n is an expression of arity \alpha_1 \otimes \alpha_2 \otimes \ldots \otimes \alpha_n.
If e is an expression of arity \alpha_1 \otimes \alpha_2 \otimes \ldots \otimes \alpha_n where n \ge 2, then (e).i is an expression af arity e_i.

Let’s try working through an example: x^2 + 3x + 7.

  1. As we saw in this post, this is equivalent to the simple AST-form: (x)+(+(*(x,x), *(3, x)),7).
  2. x” is a variable of arity 0; “3” and “7” are constants of arity 0; “+” and “*” are constants of arity (0 \otimes 0) \twoheadrightarrow 0.
  3. From the combination rule, since x and 3 both have arity 0, (x, x) and (3, x) each have arity 0 \otimes 0.
  4. Since (x, x) has arity 0 \otimes 0, and * has arity (0 \otimes 0) \twoheadrightarrow 0, *(x, x) has arity 0. The same thing works for *(3, x).
  5. Since the arities of the *(x, x) and *(3, x) are both 0, the combination of the pair (the arguments to the inner “+”) are 0 \otimes 0, and the arity of the inner sum expression is thus 0.
  6. Since 7 has arity 0, the combination of it with the inner sum is 0 \otimes 0, and the arity of the outer sum is 0.
  7. Since x is a variable of arity 0, and outer sum expression has arity 0, the abstract has arity 0 \twoheadrightarrow 0.

If you’re familiar with type inference in the simply typed lambda calculus, this should look pretty familiar; the only real difference is that the only thing that arity really tracks is applicability and parameter counting.

Just from this much, we can see how this prevents problems. If you try to compute the arity of 3.1 (that is, the selection of the first element from 3), you find that you can’t: there is no arity rule that would allow you to do that. The selection rule only works on a product-arity, and 3 has arity 0.

The other reason we wanted arity was to allow us to compare expressions. Intuitively, it should be obvious that the expression e and the expression (x)e(x) are in some sense equal. But we need some way of being able to actually precisely define that equality.

The kind of equality that we’re trying to get at here is called definitional equality. We’re not trying to define equality where expressions a and b evaluate to equal values – that would be easy. Instead, we’re trying to get at something more subtle: we want to capture the idea that the expressions are different ways of writing the same thing.

We need arity for this, for a simple reason. Let’s go back to that first example expression: (x)+(+(*(x,x), *(3, x)),7). Is that equivalent to (y)+(+(*(y,y), *(3, y)),7)? Or to 8x+1? If we apply them to the value 3, and then evaluate them using standard arithmetic, then all three expressions evaluate to 25. So are they all equivalent? We want to be able to say that the first two are equivalent expressions, but the last one isn’t. And we’d really like to be able to say that structurally – that is, instead of saying something evaluation-based like “forall values of x, eval(f(x)) == eval(g(x)), therefore f == g”, we want to be able to do something that says f \equiv g because they have the same structure.

Using arity, we can work out a structural definition of equivalence for expressions.

In everything below, we’l write a: \alpha to mean that a has arity \alpha, and a \equiv b : \alpha to mean that a and b are equivalent expressions of arity \alpha. We’ll define equivalence in a classic inductive form by structure:

Variables and Constants
If x is a variable or constant of arity \alpha, then x \equiv x : alpha. This is the simplest identity rule: variables and constants are equivalent to themselves.
If a := b is a definition, and b: \alpha, then a \equiv b : \alpha. This is a slightly more complex form of an identity rule: if there’s a definition of b as the value of a, then a and b are equivalent.
Application Rules
  1. If and , then . If an applyable expression a is equivalent to another applyable expression , then applying a to an expression b is equivalent to applying to an expression if the argument b is equivalent to the argument . That’s a mouthful, but it’s simple: if you have two function application expressions, they’re equivalent if both the function expressions and the argument expressions are equivalent.
  2. If x is a variable of arity \alpha, and a is an expression of arity \alpha and b is an expression of arity \beta, then ((x)b)(a)  b[x := a]: \beta. This is arity’s version of the classic beta rule of lambda calculus: applying an abstraction to an argument means substituting the argument for all references to the abstracted parameter in the body of the abstraction.
Abstraction Rules
  1. If x is a variable of arity \alpha, and b \equiv b: \beta, then (x)b \equiv (x)b: \alpha \twoheadrightarrow \beta. If two expressions are equivalent, then two abstractions using the same variable over the same expression is equivalent.
  2. If x and y are both variables of arity \alpha, and b is an expression of arity \beta, then (x)b \equiv (y)(b[x := y]): \alpha \twoheadrightarrow \beta, provided y is not free in b.
    Basically, the renaming variables in abstractions don’t matter, as long as you aren’t using the variable in the body of the abstraction. So (x)(3+4y) is equivalent to (z)(3+4y), but it’s not equivalent to (y)(3+4y), because y is a free variable in 3+4y, and the abstraction would create a binding for y.

  3. This is arities version of the eta-rule from lambda calculus: If x is a variable of arity \alpha, and b is an expression of arity \alpha \twoheadrightarrow \beta, then (x)(b(x)) \equiv b: \alpha \twoheadrightarrow \beta. This is a fancy version of an identity rule: abstraction and application cancel.

Combination Rules
  1. If , , …, , then . This one is simple: if you have two combination expressions with the same arity, then they’re equivalent if their elements are equivalent.
  2. If e: \alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n, then e.1, e.2, ..., e.n \equiv: e : \alpha_1 \otimes \alpha_2 \otimes ... \otimes \alpha_n. Another easy one: if you take a combination expression, and you decompose it using selections, and then recombine those selection expressions into a combination, it’s equivalent to the original expression.
Selection Rules
  1. If , then . This is the reverse of combinations rule one: if you have two equivalent tuples, then their elements are equivalent.
  2. If a_1: \alpha_1, a_2: \alpha_2, ..., a_n: \alpha_n, then (a_1, a_2, ..., a_n).i \equiv a_i. An element of a combination is equivalent to itself outside of the combination.
If a: \alpha, then a \equiv a: \alpha.
If a \equiv b: \alpha, then b \equiv a: \alpha.
If a \equiv b: \alpha, and b \equiv c: \alpha, then a \equiv c: \alpha.

Jumping back to our example: Is x^2 + 3x + 7 equivalent to x^2 + 3x + 7? If we convert them both into their canonical AST forms, then yes. They’re identical, except for one thing: the variable name in their abstraction. By abstraction rule 2, then, they’re equivalent.

n-Category Café Categorifying the Magnitude of a Graph

Tom Leinster introduced the idea of the magnitude of graphs (first at the Café and then in a paper). I’ve been working with my mathematical brother Richard Hepworth on categorifying this and our paper has just appeared on the arXiv.

Categorifying the magnitude of a graph, Richard Hepworth and Simon Willerton.

The magnitude of a graph can be thought of as an integer power series. For example, consider the Petersen graph.

Petersen graph

Its magnitude starts in the following way. #P =1030q+30q 2+90q 3450q 4 +810q 5+270q 65670q 7+. \begin{aligned} \#P&=10-30q+30q^{2}+90q ^{3}-450q^{4}\\ &\quad\quad+810q^{5} + 270 q^{6} - 5670 q^{7} +\dots. \end{aligned}

Richard observed that associated to each graph GG there is a bigraded group MH *,*(G)\mathrm{MH}_{\ast ,\ast }(G), the graph magnitude homology of GG, that has the graph magnitude #G# G as its graded Euler characteristic. #G = k,l0(1) krank(MH k,l(G))q l = l0χ(MH *,l(G))q l. \begin{aligned} #G &= \sum _{k,l\geqslant 0} (-1)^{k}\cdot \mathrm{rank}\bigl (\mathrm{MH}_{k,l}(G)\bigr )\cdot q^{l}\\ &= \sum _{l\geqslant 0} \chi \bigl (\mathrm{MH}_{\ast ,l}(G)\bigr )\cdot q^{l}. \end{aligned} So graph magnitude homology categorifies graph magnitude in the same sense that Khovanov homology categorifies the Jones polynomial.

For instance, for the Petersen graph, the ranks of MH k,l(P)\mathrm{MH}_{k,l}(P) are given in the following table. You can check that the alternating sum of each row gives a coefficient in the above power series.

k 0 1 2 3 4 5 6 7 0 10 1 30 2 30 3 120 30 l 4 480 30 5 840 30 6 1440 1200 30 7 7200 1560 30 \begin{array}{rrrrrrrrrr} &&&&&&k\\ &&0&1&2&3&4&5&6&7 \\ &0 & 10\\ & 1 & & 30 \\ &2 & && 30 \\ &3 &&& 120 & 30 \\ l &4 &&&& 480 & 30 \\ &5 &&&&& 840 & 30 \\ &6 &&&&& 1440 & 1200 & 30 \\ &7 &&&&&& 7200 & 1560 & 30 \\ \\ \end{array}

Many of the properties that Tom proved for the magnitude are shadows of properties of magnitude homology and I’ll describe them here.

The definition

Let’s have a quick look at the definition first: this is a typical piece of homological algebra. For each graph GG we define a kk-chain on GG to be a k+1k+1-tuple of vertices of GG such that adjacent vertices are distinct: (a 0,,a k),a iVertices(G),a ia i+1. (a_{0},\dots ,a_{k}),\quad a_{i}\in \mathrm{Vertices}(G),\quad a_{i}\ne a_{i+1}. We equip the graph with the path length metric, so each edge has length 11, and we define the length of a chain to be the length of the path obtained by traversing its vertices: ((a 0,,a k)) i=0 k1d(a i,a i+1). \ell ((a_{0},\dots ,a_{k}))\coloneqq \sum _{i=0}^{k-1}d(a_{i},a_{i+1}). The chain group MC k,l(G)\mathrm{MC}_{k,l}(G) is defined to be the free abelian group on the set of kk-chains of length ll. We define the differential :MC k,lMC k1,l\partial \colon \mathrm{MC}_{k,l}\to \mathrm{MC}_{k-1,l} by (1) i i\partial \coloneqq \sum (-1)^{i}\partial _{i} where i(a 0,,a k){(a 0,,a i^,,a k) if(a 0,,a i^,,a k)=l, 0 otherwise. \partial _{i}(a_{0},\ldots ,a_{k}) \coloneqq \begin{cases} (a_{0},\ldots ,\widehat{a_{i}},\ldots ,a_{k}) & \mathrm{if } \ell (a_{0},\ldots ,\widehat{a_{i}},\ldots ,a_{k})=l, \\ 0 & \mathrm{otherwise}. \end{cases} Here, as usual, x^\widehat{x} means “omit xx”. The graph magnitude is then defined to be the homology of this differential: MH k,l(G)H k(MC *,l(G),). \mathrm{MH}_{k,l}(G)\coloneqq \mathrm{H}_{k}(\mathrm{MC}_{\ast ,l}(G),\partial ). Unfortunately, I don’t know of any intuitive interpretation of the chain groups.


One standard advantage of this sort of categorification is that it has functoriality where the original invariant did not. We don’t usually have morphisms between numbers or polynomials, but we do have morphisms between groups. In the case here we have the category of graphs with the morphisms sending vertices to vertices with edges either preserved or contracted, and graph magnitude homology gives a functor from that to the category of bigraded groups and homomorphisms.

Categorifying properties of magnitude

Here are some of the properties of magnitude that we can categorify. With the exception of disjoint unions, all of the other categorification results require some decidedly non-trivial homological algebra to prove.

Disjoint unions

Tom showed that magnitude is additive with respect to the disjoint union of graphs: #(GH)=#G+#H. #(G\sqcup H) = #G + #H. Our categorification of this is the additivity of the magnitude homology: MH *,*(GH)MH *,*(G)MH *,*(H). \mathrm{MH}_{\ast ,\ast }(G\sqcup H) \cong \mathrm{MH}_{\ast ,\ast }(G) \oplus \mathrm{MH}_{\ast ,\ast }(H).


Tom showed that magnitude is multiplicative with respect to the cartesian product \square of graphs #(GH)=#G#H. #(G\Box H) = #G\cdot #H. The categorification of this is a Künneth Theorem which says that there is a non-naturally split, short exact sequence: 0MH *,*(G)MH *,*(H)MH *,*(GH) Tor(MH *+1,*(G),MH *,*(H))0. \begin{aligned} 0\to \mathrm{MH}_{\ast ,\ast }(G)\otimes \mathrm{MH}_{\ast ,\ast }(H) \to \mathrm{MH}_{\ast ,\ast }(G\square H)\\ \to \mathrm{Tor}\bigl (\mathrm{MH}_{\ast +1,\ast }(G), \mathrm{MH}_{\ast ,\ast }(H)\bigr ) \to 0. \end{aligned} If either GG or HH has torsion-free magnitude homology, then this sequence reduces to an isomorphism MH *,*(GH)MH *,*(G)MH *,*(H). \mathrm{MH}_{\ast ,\ast }(G{\square }H)\cong \mathrm{MH}_{\ast ,\ast }(G)\otimes \mathrm{MH}_{\ast ,\ast }(H). Despite quite a bit of computation, we don’t know whether any graphs have torsion in their magnitude homology.


Tom showed that a form of the inclusion-exclusion formula holds for certain graphs. We need a pile of definitions first.

Definition. A subgraph UXU\subset X is called convex if d U(u,v)=d X(u,v)d_{U}(u,v)=d_{X}(u,v) for all u,vUu,v\in U.

One way of reading this is that for uu and vv in UU there is a geodesic joining them that lies in UU.

Definition. Let UXU\subset X be a convex subgraph. We say that XX projects to UU if for every xXx\in X that can be connected by an edge-path to some vertex of UU, there is π(x)U\pi (x)\in U such that for all uUu\in U we have d(x,u)=d(x,π(x))+d(π(x),u). d(x,u) = d(x,\pi (x)) + d(\pi (x),u).

For instance, a four-cycle graph projects to any edge, whereas a five cycle does not project to an edge.

Definition. A projecting decomposition is a triple (X;G,H)(X;G,H) consisting of a graph XX and subgraphs GG and HH such that

  • X=GHX=G\cup H,

  • GHG\cap H is convex in XX,

  • HH projects to GHG\cap H.

Tom showed that if (X;G,H)(X;G,H) is a projecting decomposition then #X+#(GH)=#G+#H. #X +# (G\cap H)= #G +#H . Our categorification of this result is that if (X;G,H)(X;G,H) is a projecting decomposition, then there is a naturally split short exact sequence 0MH *,*(GH)MH *,*(G)MH *,*(H)MH *,*(X)0 0\to \mathrm{MH}_{\ast ,\ast }(G\cap H)\to \mathrm{MH}_{\ast ,\ast }(G)\oplus \mathrm{MH}_{\ast ,\ast }(H)\to \mathrm{MH}_{\ast ,\ast }(X)\to 0 (which is a form of Mayer-Vietoris sequence) and consequently there is a natural isomorphism MH *,*(X)MH *,*(GH)MH *,*(G)MH *,*(H). \mathrm{MH}_{\ast ,\ast }(X)\oplus \mathrm{MH}_{\ast ,\ast }(G\cap H) \cong \mathrm{MH}_{\ast ,\ast }(G)\oplus \mathrm{MH}_{\ast ,\ast }(H).


Tom noted many examples of graphs which had magnitude with coefficients which alternate in sign; these examples included complete graphs, complete bipartite graphs, forests and graphs with up to four vertices.

Our categorification of this is that in all of these cases the magnitude homology is supported on the diagonal: we say a graph GG is diagonal if MH k,l(G)=0\mathrm{MH}_{k,l}(G)=0 if klk\neq l. In this case the magnitude is given by #G= l0(1) lrankMH l,l(G)q l, #G=\sum _{l\geq 0}(-1)^{l}\cdot \mathrm{rank}\mathrm{MH}_{l,l}(G)\cdot q^{l}, and it means in particular that the coefficients of the magnitude alternate in sign.

The join GHG\star H of graphs GG and HH is obtained by adding an edge between every vertex of GG and every vertex of HH. This is a very drastic operation, for instance the diameter of the resulting join is at most 22.

Theorem. If GG and HH are non-empty graphs then the join GHG\star H is diagonal.

This tells us immediately that complete graphs and complete bipartite graphs are diagonal. Together with the other properties of magnitude homology mentioned above, we recover the alternating magnitude property of all the graphs noted by Leinster, as well as many more.

However, there is one graph that appears to be diagonal, at least up to degree 77 according to our computer calculations but which we can’t prove is diagonal, i.e. it doesn’t follow from the results above. This graph is the icosahedron graph which is the one-skeleton of the icosahedron.


Where next?

There are plenty of questions that you can ask about magnitude homology, a fundamental one is whether there are graphs with the same magnitude but different magnitude homology and a related one is whether you can categorify Tom’s result (conjectured by me and David Speyer) on the magnitude of graphs which differ by certain Whitney twists.

One interesting thing demonstrated here is that magnitude has its tendrils in homological algebra, yet another area of mathematics to add to the list which includes biodiversity, enriched category theory, curvature and Minkowski dimension.

May 16, 2015

Jordan EllenbergWilliam Deresiewicz gets David Foster Wallace now

Just looking at William Deresiewicz’s piece on Mark Greif in Harper’s, where he writes:

Like David Foster Wallace, albeit in a very different key, Greif is willing to be vulnerable, to forgo the protections of irony and nihilism.

True!  (At least of DFW; I don’t know enough about Greif.)  And satisfiying, because I complained before about Deresiewicz mischaracterizing Wallace:

As for the slackers of the late ’80s and early ’90s (Generation X, grunge music, the fiction of David Foster Wallace), their affect ran to apathy and angst, a sense of aimlessness and pointlessness. Whatever. That they had no social vision was precisely what their social vision was: a defensive withdrawal from all commitment as inherently phony.

Maybe he reads my blog!


David Hoggexoplanet dynamics, sucking

In group meeting, Dun Wang talked about astrometric calibration of the GALEX Satellite, and Kat Deck (Caltech) talked about the dynamical evolution of exoplanetary systems. She pointed out that we naively expect lots of planets close in period to be locked in resonances, but in fact such resonances are rare, empirically in the Kepler sample. She has explanations for this involving the evolving proto-planetary disk.

After lunch, Deck gave the astro seminar, on planetary system stability and the Kepler planets. She discussed chaos, stability, and heuristic stability criteria. One interesting thing is that there really is no non-heuristic stability criterion: We think of a planetary system as "stable" if there are no catastrophic, order-unity changes to any of the orbital osculating elements. That's not really an equation! And at the talk there was some discussion of the point (counter-intuitive and important) that a system can be stable (by our astronomer definition) for far, far longer than the Lyapunov time. Awesome and important.

At the end of the day, Foreman-Mackey and I made the (astonishing) decision to abort and fail on our NASA proposal: We just ran out of time. I am disappointed; we have an eminently fundable pitch. That said, we just didn't start early enough to make that pitch at the level we wanted. Not sure how to feel about it, but I sure need to catch up on sleep!

David Hoggfinding planets with TTVs

In the tiny bit of research time available today, I spoke with Kat Deck (Caltech) and Foreman-Mackey about finding planets with large transit-timing residuals. These signals aren't precisely periodic, so some searches could miss them entirely. Deck has simple models (based on perturbation theory) for the variations, so we can in principle add only a few parameters and capture a large range in possible variations. This might make us much more sensitive to extremely important and interesting systems hiding in existing data.

The rest of the day was spent writing our NASA proposal, with the exception of lunch with Yann LeCun (facebook) and Léon Bottou (facebook) at the NYC office of facebook Research.

May 15, 2015

John BaezCarbon Emissions Stopped Growing?

In 2014, global carbon dioxide emissions from energy production stopped growing!

At least, that’s what preliminary data from the International Energy Agency say. It seems the big difference is China. The Chinese made more electricity from renewable sources, such as hydropower, solar and wind, and burned less coal.

In fact, a report by Greenpeace says that from April 2014 to April 2015, China’s carbon emissions dropped by an amount equal to the entire carbon emissions of the United Kingdom!

I want to check this, because it would be wonderful if true: a 5% drop. They say that if this trend continues, China will close out 2015 with the biggest reduction in CO2 emissions every recorded by a single country.

The International Energy Agency also credits Europe’s improved attempts to cut carbon emissions for the turnaround. In the US, carbon emissions has basically been dropping since 2006—with a big drop in 2009 due to the economic collapse, a partial bounce-back in 2010, but a general downward trend.

In the last 40 years, there have only been 3 times in which emissions stood still or fell compared to the previous year, all during global economic crises: the early 1980’s, 1992, and 2009. In 2014, however, the global economy expanded by 3%.

So, the tide may be turning! But please remember: while carbon emissions may start dropping, they’re still huge. The amount of the CO2 in the air shot above 400 parts per million in March this year. As Erika Podest of NASA put it:

CO2 concentrations haven’t been this high in millions of years. Even more alarming is the rate of increase in the last five decades and the fact that CO2 stays in the atmosphere for hundreds or thousands of years. This milestone is a wake up call that our actions in response to climate change need to match the persistent rise in CO2. Climate change is a threat to life on Earth and we can no longer afford to be spectators.

Here is the announcement by the International Energy Agency:

Global energy-related emissions of carbon dioxide stalled in 2014, IEA, 13 March 2015.

Their full report on this subject will come out on 15 June 2015. Here is the report by Greenpeace EnergyDesk:

China coal use falls: CO2 reduction this year could equal UK total emissions over same period, Greenpeace EnergyDesk.

I trust them less than the IEA when it comes to using statistics correctly, but someone should be able to verify their claims if true.

Jordan EllenbergI will never find all the bad sentences

Even now, a year after the book came out, two weeks before the paperback arrives, I’m still finding bad sentences in it.  The one I just noticed:

It was scary when a statistical model deployed by the Guest Marketing Analytics team at Target correctly inferred based on purchasing data that one of its customers—sorry, guests—a teenaged girl in Minnesota, was pregnant, based on an arcane formula involving elevated rates of buying unscented lotion, mineral supplements, and cotton balls.

I must have written “based on purchasing data” and then tried it again in a higher pitch with “based on an arcane formula … cotton balls” but forgotten to take out the original, leaving a sentence with a weird, redundant double “based on.”  Who knows how many mistakes like this are left in the final text?  How many will I never catch?

Mark Chu-CarrollFree-riding Insurance

Pardon me, while I go off on a bit of a rant. There is a bit of
math content to this, but there’s more politics.

There’s a news story that’s been going around this week about a guy who’s bitterly angry. His name is Luis Lang. Mr. Lang is going blind because of complications of diabetes. The only way to save his eyesight is to get very expensive eye surgery, but Mr. Lang can’t afford it. He doesn’t have insurance, and now that he needs it, he can’t buy it. According to him, this is all the fault of President Obama and the unjust Obamacare insurance system which is denying him access to insurance when he really needs it.

In the US, we’re in the early days of some big changes to our health-care system. Up until a couple of years ago, most people got insurance via their employers. If they didn’t get it through work, then they needed to buy policies on their own. Privately purchased policies were typically extremely expensive, and they came with a “pre-existing condition” exclusion (PECE). The pre-existing exclusion meant that if you had a medical condition that required care before you purchased the policy, the policy wouldn’t pay for the care.

In the new system, most people still get insurance through work. But the people who don’t get to go to government-run insurance exchanges to buy policies. The policies in the exchanges are much cheaper than the old private coverage used to be, and rules like the pre-existing condition exclusion are prohibited in policies on the exchange. In addition, if you make less than a certain income, the government will subsidize your coverage to make it affordable. Under this system, you’re required to buy insurance if it’s possible for you to buy it with the government subsidies; if you choose to go without, you have to pay a penalty. If you’re too poor to buy even with the subsidies, then you’re supposed to be able to get medicare under an expanded medicare program. But the medicare expansions needed to be done by the states, and many states refused to do it, even though the federal government would cover nearly all of the costs (100% for the first few years; 95% indefinitely thereafter.)

I’m not a fan of the new system. Personally, I believe that for-profit insurance is fundamentally immoral. But that’s not the point of this post. We’ve got a system now that should make it possible for people to get coverage. So why is this poor guy going blind, and unable to get insurance?

The answer is simple: because he very deliberately put himself into a terrible situation, and now he’s paying the price for that. And there are very good reasons why people who put themselves into his situation can’t get covered when they really need it.

First, I’ll run through how he got into this mess. Then, I’ll explain why, as sad as it is for this dumbass, he’s stuck.

Our alleged victim of unjust government policies is around 50 years old. He owns a nice home, and runs his own business. He had the opportunity to buy insurance, but he chose not to, because he has a deeply held philosophical/political belief that he should pay his own bills, and so he always paid for his medical care out of his own pocket. When Obamacare came down the pike, he was strongly opposed to it because of that philosophy, and so he paid the penalty rather than buy insurance, and stayed uninsured. It’s important to note here that he made a deliberate choice to remain uninsured.

Mr Lang isn’t a paragon of good health. He’s a smoker, and he’s had diabetes for a couple of years. He admits that he hasn’t been very good about managing his diabetes. (I’m very sympathetic to his trouble managing diabetes: there’s a lot of diabetes in my family – my mother, her brother, my grandfather, and every one of his siblings, and his father all had diabetes. I’ve seen members of my family struggle with it. Diabetes is awful. It’s hard to manage. Most people struggle with it, and many don’t ultimately do it well enough before they wind up with complications. That’s what happened to Mr. Lang: he developed complications.

Specifically, he had a series of small strokes, ruptured blood vessels in his cornea, and a detached retina. Combined, these conditions will cause him to go blind without surgery. (This is exactly what happened to my uncle – he lost his vision due to diabetes.) Mr. Lang’s condition has gotten bad enough that he’s unable to work because of these problems, so he can’t afford to pay for the surgery. So now, he wants to buy insurance. And he can’t.

Why not?

To really see why, we need to take a step back, and look at just what insurance really is.

Reduced to its basics, the idea of insurance is that there’s a chance of an unlikely event happening that you can’t afford to pay for. For example, say that there’s a 1 in 1000 chance of you needing major surgery that will cost $100,000. You can’t afford to pay $100,000 if it happens. So, you get together with 999 other people. Each of you put $100 into the pot. Then if you end up being unlucky, and you need the surgery, you can draw on the $100,000 in the pot to pay for your surgery.

The overwhelming majority of people who put money into the pot are getting nothing concrete for their money. But the people who needed medical care that they couldn’t afford on their own were able to get it. You and the other people who all bought in to the insurance pot were buying insurance against a risk, so that in case something happened, you’d be covered. You know that you’re probably going to lose money on the deal, but you do it to cover the unlikely case that you’ll need it. You’re sharing your risks with a pool of other people. In exchange for taking on a share of the risk (putting your money into the pool without knowing whether you’ll get it back), you take a share of the resource (the right to draw money out of the pool if you need it).

In the modern insurance system, it’s gotten a lot more complicated. But the basic idea is still the same. You’ve got a huge number of people all putting money into the pot, in the form of insurance premiums. When you go to the doctor, the insurance company pays for your care out of the money in that pot. The way that the insurance company sets premiums is complicated, but it comes down to collecting more from each buyer than it expects to need to pay for their medical care. It does that by mathematically analyzing risks.

This system is very easy to game if you can buy insurance whenever you want. You simply don’t buy insurance until something happens, and you need insurance to pay for it. Then you buy coverage. So you weren’t part of the shared risk pool until you knew that you needed more than you were going to pay in to the pool. You’re basically taking a share of the community resources in the insurance pool, without taking a share of the community risk. In philosophical circles, that’s called the free-rider problem.

Insurance can’t work without doing something to prevent free-riders from exploiting the system.

Before Obamacare, the way that the US private insurance system worked was that you could buy insurance any time you want, but when you did, you were only covered for things that developed after you bought it. Any medical condition that required care that developed before you bought insurance wasn’t covered. PECEs prevented the free-rider problem by blocking people from joining the benefits pool without also joining the risk pool: any conditions that developed while you were outside the risk pool weren’t covered. So before Obamacare, Mr. Lang could have gone out and bought insurance when he discovered his medical problems – but that insurance wouldn’t cover the surgery that he needs, because it developed while he was uninsured.

Without PECEs, it’s very easy to exploit the insurance system by free-riding. If you allowed some people to stay out of the insurance system until they needed the coverage, then you’d need to get more money from everyone else who bought insurance. Each year, you’d still need to have a pool of money big enough to cover all of the expected medical care costs for that year. But that pool wouldn’t just need to be big enough to cover the people who bought in at the beginning of the year – it would need to be large enough to cover everyone who bought insurance at the beginning of the year, and everyone who jumped in only when they needed it.

Let’s go back to our example. There’s only one problem that can happen, and it happens to 1 person in 1000 per year, and it costs $100,000 to treat. We’ve got a population of 2000 people. 1000 of them bought into the insurance system.

In an average year, 2 people will become ill: one with insurance, and one without. The one with insurance coverage becomes ill, and they get to take the $100,000 they need to cover their care. The person without insurance is stuck, and they need to pay $100,000 for their own care, or go without. In order to cover the expenses, each of the insured people would need to have paid $100.

If people can buy in to the insurance system at any time, without PECEs, then the un-insured person can wait until he gets sick, and buy insurance then. Now the insurance pool needs to cover $200,000 worth of expenses; but they’ve only got one additional member. In order to cover, they need to double the cost per insured person per year to $200. Everyone in the pool needs to pay double premiums in order to accomodate the free-riders!

This leads to a situation that some economists call a death spiral: You need to raise insurance premiums on healthy people in order to have enough money to cover the people who only sign up when they’re unhealthy. But raising your premiums mean that more people can’t afford to buy coverage, and so you have more people not buying insurance until they need it. And that causes you to need to raise your premiums even more, and so it goes, circling around and around.

The only alternative to PECEs that really works to prevent free-riders is to, essentially, forbid people from being free-riders. You can do that by requiring everyone to be covered, or by limiting when they can join the pool.

In the age of PECEs, there was one way of getting insurance without a PECE, and it’s exactly what I suggested in the previous paragraph. Large companies provided their employees with insurance coverage without PECEs. The reason that they could do it was because they were coming to an insurance company once a year with a large pool of people. The costs of the employer-provided insurance were determined by the average expected cost of coverage for that pool of people, divided by the size of the pool. But in the employer-based non-PECE coverage, you still couldn’t wait to get coverage until you needed it: each year, at the beginning of the year, you needed to either opt in or out of coverage; if, in January, you decided to decline coverage, and then in July, you discovered that you needed surgery, you couldn’t change your mind and opt in to insurance in order to get that covered. You had to wait until the following year. So again, you were avoiding free-riders by a combination of two mechanisms. First, you made it so that you had to go out of your way to refuse coverage – so nearly everyone was part of the company’s insurance plan. And second, you prevent free-riding by making it much harder to delay getting insurance until you needed it.

The Obamacare system bans PECEs. In order to avoid the free-rider problem, it does two things. It requires everyone to either buy insurance, or pay a fine; and it requires that you buy insurance for the whole year starting at the beginning of the year. You might think that’s great, or you might think it’s terrible, but either way, it’s one way of making insurance affordable without PECEs.

Mr. Lang wants to be a free-rider. He’s refused to be part of the insurance system, even though he knew that he had a serious medical condition that was likely to require care. Even though he was a regular smoker, and knew of the likelihood of developing serious medical problems as a result. He didn’t want to join the risk pool, and he deliberately opted out, refusing to get coverage when he had the chance.

That was his choice, and under US law, he had every right to make it for himself.

What Mr. Lang does not have the right to do is to be a free-rider.

He made a choice. Now he’s stuck with the results of that choice. As people like Mr. Lang like to say when they’re talking about other people, it’s a matter of personal responsibility. You can’t wait until you need coverage to join the insurance system.

Mr. Lang can buy insurance next year. And he’ll be able to get an affordable policy with government subsidies. And when he gets it, it will cover all of his medical problems. Before the Obamacare system that he loathes and blames, that wouldn’t have been true.

It’s not the fault of President Obama that he can’t buy insurance now. It’s not the fault of congress, or the democratic party, or the republican party. There’s only one person who’s responsible for the fact that he can’t get the coverage that he needs in order to get the surgery that would save his eyesight. And that’s the same person who he can’t see in the mirror anymore.

Chad OrzelMiscellaneous Academic Job Market Notes

A few things about the academic job market have caught my eye recently, but don’t really add up to a big coherent argument. I’ll note them here, though, to marginally increase the chance that I’ll be able to find them later.

— First, this piece at the Guardian got a lot of play, thanks in part to the dramatic headline Science careers: doomed at the outset but even more thanks to the subhead “Has it become harder for graduate students to thrive, and are our best potential scientists giving up on academia?” Most of the people I saw re-sharing it used basically just that last clause, often stripping out the punctuation to make it a statement: Our best potential scientists are leaving academia.

Only, it doesn’t really show that. That’s not to say it’s not a major indictment of the bad state of graduate education, just that it doesn’t really connect the dots to show that “our best potential scientists” are leaving. There’s an argument to be made there, to be sure– you might reasonably believe that the best potential scientists are also the students most likely to have other options, and thus the most likely to get out while the getting is good. That argument remains largely implicit, though. And it bugs me to see it asserted as fact.

You could, with only slightly less plausibility, make a counter-argument that “our best potential scientists” are those with such focus and drive that they will pursue science no matter what, even in the face of all sorts of unreasonable hardship. In fact, that’s the standard argument in favor of the hazing-like structure of graduate education in a lot of places– we need to weed out the unfit, to select the best.

If you want to make a definitive that “our best potential scientists” are leaving, I’d like to see some data showing that the “best” students are disproportionately discouraged. I’m not sure how you’d really generate that, though, given that there isn’t a generally agreed-upon measure of the quality of scientists, let alone the potential quality of students.

My own guess, if you put a gun to my head and made me stake out a position, would be that the discouragement process is basically random. At least in terms of “quality” and “potential”– it’s probably selecting for something, but I suspect that something is pretty much orthogonal to actual scientific ability. that’s just a guess, though, and I might add that if you’re going to go around pointing guns at people over stupid shit like this, you need to re-think your life choices.

— Somebody on Twitter– I’m pretty sure it was Ben Lillie, but can’t confirm because Twitter– linked to this PDF report on underrepresented minorities in STEM fields a week or so ago, and I’ve had it open in a tab for a while. This is a little old– I think this version is from 2010– which you can tell by the fact that it doesn’t use “STEM” over and over and over. Ah, those idyllic days…

Like all responsible investigations in this area, this is kind of a Rorschach blot of social science, in that you can find more or less whatever you want. If you’re a glass-half-empty sort, you can point to the summary table showing that, in physics, 5.6% of Ph.D.’s granted in the ten years leading up to 2005 were earned by students from underrepresented groups, while only 2.5% of faculty in 2007 were from those groups. There’s also a small decrease in diversity from 2002 to 2007, with the total fraction of faculty from underrepresented groups falling from 2.6% to 2.5%, and the assistant professor fraction dropping from 5.2% to 4.4%.

On the other hand, if you’re a glass-half-full sort, you can reasonably point out that the discrepancy is less bleak when you look at more direct comparisons. After all, looking at percentages over all faculty ranks sort of mushes together the stratigraphy inherent in academic hiring– assistant professors are mostly drawn from recent Ph.D.’s, while full professors can date back to the disco era, when Ph.D. cohorts were much less diverse than they are now.

If you compare the numbers for recent Ph.D.’s and assistant professors (the larval stage of tenured faculty), the gap is much, much smaller– 5.6% of Ph.D. recipients are from underrepresented groups, and 4.4% of assistant professors. Broken out by subgroup, you see a similar picture– 2% of Physics Ph.D. recipients are black, compared to 1.2% of assistant professors; 2.9% of Ph.D. recipients are Hispanic, compared to 3.3% of assistant professors.

They also have gender statistics, which are much the same– women are 14.3% of Ph.D. recipients up to 2005, and just 9.1% of faculty. But women are 16.8% of assistant professors, slightly better than their representation in the graduating cohort.

So, you know, half-full, half-empty. Whatever picture you want to present, you can pull numbers to bolster your argument from this report.

— That by itself wouldn’t really justify a post, but the interesting wrinkle in this report, to me, was that it actually tries to look at the question of prestige. The argument about comparing Ph.D. demographics to assistant professors is not new, and the usual response is that yes, women are hired into new faculty positions at a slightly higher rate than their representation in the Ph.D. cohort, but they mostly get lower-prestige jobs.

This report gets at that a tiny bit by looking at the top 100 departments in each field (by some ranking), and breaking out the statistics for the top 50 and numbers 51-100. Interestingly, this shows the opposite effect usually claimed– the overall comparison is 14.3% of women in the Ph.D. class to 16.8% of assistant professors, but for the top 50 it’s 14.3% to 17.5%, and the bottom 50 it’s 14.3% to 15.6%. The fraction of women in new faculty ranks is actually slightly higher at the more prestigious institutions.

The difference is more striking for black Ph.D.’s and faculty. Overall, the split from Ph.D. to associate professor is 2.0% to 1.2%, but for the top 50 it’s 2.0% to 1.6% and for numbers 51-100 it’s 2.0% to 0.5%. Again, the numbers are better at the more prestigious institutions.

Again, though, there’s something for everyone, because if you move up a rank, the fractions reverse– women are 12.6% of associate professors at Top 50 institutions, and 14.3% at numbers 51-100 (this probably ought to be compared to an earlier Ph.D. cohort; they give a value for the ten-year period up to 1995 of 10.8%). For black Ph.D.’s and faculty, it’s 0.4% associate profs at the top 50, and 0.8% at numbers 51-100 (compared to 1% of Ph.D.’s up to ’95). So, there’s an argument to be made that the top institutions are paying attention to diversity in hiring, but not tenuring those folks. I suspect this is muddled, though, by the tendency of people who don’t get tenure in the Ivy League (and equivalent) to end up with tenured positions at schools a little farther down the prestige scale.

You can also argue that by looking only at the top 100 institutions, this is working in very rarified territory and ignoring the community colleges and lower-level state universities that serve the majority of students, etc. Which, yes, it is. But then, the usual argument is that things are much worse for women and underrepresented minorities in elite academia than at the lower levels, and these numbers suggest that the “much worse” of elite academia isn’t all that terrible in an absolute sense.

So, you know, social science continues to be messy, and analysis of the academic job market is complicated and confusing. Also, the Sun rose in the east this morning, and my kids are super cute.

May 14, 2015

Terence TaoA remark on the lonely runner conjecture

The lonely runner conjecture is the following open problem:

Conjecture 1 Suppose one has {n \geq 1} runners on the unit circle {{\bf R}/{\bf Z}}, all starting at the origin and moving at different speeds. Then for each runner, there is at least one time {t} for which that runner is “lonely” in the sense that it is separated by a distance at least {1/n} from all other runners.

One can normalise the speed of the lonely runner to be zero, at which point the conjecture can be reformulated (after replacing {n} by {n+1}) as follows:

Conjecture 2 Let {v_1,\dots,v_n} be non-zero real numbers for some {n \geq 1}. Then there exists a real number {t} such that the numbers {tv_1,\dots,tv_n} are all a distance at least {\frac{1}{n+1}} from the integers, thus {\|tv_1\|_{{\bf R}/{\bf Z}},\dots,\|tv_n\|_{{\bf R}/{\bf Z}} \geq \frac{1}{n+1}} where {\|x\|_{{\bf R}/{\bf Z}}} denotes the distance of {x} to the nearest integer.

This conjecture has been proven for {n \leq 7}, but remains open for larger {n}. The bound {\frac{1}{n+1}} is optimal, as can be seen by looking at the case {v_i=i} and applying the Dirichlet approximation theorem. Note that for each non-zero {v}, the set {\{ t \in {\bf R}: \|vt\|_{{\bf R}/{\bf Z}} \leq r \}} has (Banach) density {2r} for any {0 < r < 1/2}, and from this and the union bound we can easily find {t \in {\bf R}} for which

\displaystyle \|tv_1\|_{{\bf R}/{\bf Z}},\dots,\|tv_n\|_{{\bf R}/{\bf Z}} \geq \frac{1}{2n}-\varepsilon

for any {\varepsilon>0}, but it has proven to be quite challenging to remove the factor of {2} to increase {\frac{1}{2n}} to {\frac{1}{n+1}}. (As far as I know, even improving {\frac{1}{2n}} to {\frac{1+c}{2n}} for some absolute constant {c>0} and sufficiently large {n} remains open.)

The speeds {v_1,\dots,v_n} in the above conjecture are arbitrary non-zero reals, but it has been known for some time that one can reduce without loss of generality to the case when the {v_1,\dots,v_n} are rationals, or equivalently (by scaling) to the case where they are integers; see e.g. Section 4 of this paper of Bohman, Holzman, and Kleitman.

In this post I would like to remark on a slight refinement of this reduction, in which the speeds {v_1,\dots,v_n} are integers of bounded size, where the bound depends on {n}. More precisely:

Proposition 3 In order to prove the lonely runner conjecture, it suffices to do so under the additional assumption that the {v_1,\dots,v_n} are integers of size at most {n^{Cn^2}}, where {C} is an (explicitly computable) absolute constant. (More precisely: if this restricted version of the lonely runner conjecture is true for all {n \leq n_0}, then the original version of the conjecture is also true for all {n \leq n_0}.)

In principle, this proposition allows one to verify the lonely runner conjecture for a given {n} in finite time; however the number of cases to check with this proposition grows faster than exponentially in {n}, and so this is unfortunately not a feasible approach to verifying the lonely runner conjecture for more values of {n} than currently known.

One of the key tools needed to prove this proposition is the following additive combinatorics result. Recall that a generalised arithmetic progression (or {GAP}) in the reals {{\bf R}} is a set of the form

\displaystyle  P = \{ n_1 v_1 + \dots + n_d v_d: n_1,\dots,n_d \in {\bf Z}; |n_1| \leq N_1, \dots, |n_d| \leq N_d \}

for some {v_1,\dots,v_d \in {\bf R}} and {N_1,\dots,N_d > 0}; the quantity {d} is called the rank of the progression. If {t>0}, the progression {P} is said to be {t}-proper if the sums {n_1 v_1 + \dots + n_d v_d} with {|n_i| \leq t N_i} for {i=1,\dots,d} are all distinct. We have

Lemma 4 (Progressions lie inside proper progressions) Let {P} be a GAP of rank {d} in the reals, and let {t \geq 1}. Then {P} is contained in a {t}-proper GAP {Q} of rank at most {d}, with

\displaystyle |Q| \leq (2t)^d d^{6d^2} \prod_{i=1}^d (2N_i+1).

Proof: See Theorem 2.1 of this paper of Bilu. (Very similar results can also be found in Theorem 3.40 of my book with Van Vu, or Theorem 1.10 of this paper of mine with Van Vu.) \Box

Now let {n \geq 1}, and assume inductively that the lonely runner conjecture has been proven for all smaller values of {n}, as well as for the current value of {n} in the case that {v_1,\dots,v_n} are integers of size at most {n^{Cn^2}} for some sufficiently large {C}. We will show that the lonely runner conjecture holds in general for this choice of {n}.

let {v_1,\dots,v_n} be non-zero real numbers. Let {C_0} be a large absolute constant to be chosen later. From the above lemma applied to the GAP {\{ n_1 v_1 + \dots + n_d v_d: n_1,\dots,n_d \in \{-1,0,1\}\}}, one can find a {n^{C_0n}}-proper GAP {Q} of rank at most {n} containing {\{v_1,\dots,v_n\}} such that

\displaystyle  |Q| \leq (6n^{C_0 n})^n n^{6n^2};

in particular {|Q| \leq n^{Cn^2}} if {C} is large enough depending on {C_0}.

We write

\displaystyle  Q = \{ n_1 w_1 + \dots + n_d w_d: n_1,\dots,n_d \in {\bf Z}; |n_1| \leq N_1,\dots,|n_d| \leq N_d \}

for some {d \leq n}, {w_1,\dots,w_d}, and {N_1,\dots,N_d \geq 0}. We thus have {v_i = \phi(a_i)} for {i=1,\dots,n}, where {\phi: {\bf R}^d \rightarrow {\bf R}} is the linear map {\phi(n_1,\dots,n_d) := n_1 w_1 + \dots + n_d w_d} and {a_1,\dots,a_n \in {\bf Z}^d} are non-zero and lie in the box {\{ (n_1,\dots,n_d) \in {\bf R}^d: |n_1| \leq N_1,\dots,|n_d| \leq N_d \}}.

We now need an elementary lemma that allows us to create a “collision” between two of the {a_1,\dots,a_n} via a linear projection, without making any of the {a_i} collide with the origin:

Lemma 5 Let {a_1,\dots,a_n \in {\bf R}^d} be non-zero vectors that are not all collinear with the origin. Then, after replacing one or more of the {a_i} with their negatives {-a_i} if necessary, there exists a pair {a_i,a_j} such that {a_i-a_j \neq 0}, and such that none of the {a_1,\dots,a_n} is a scalar multiple of {a_i-a_j}.

Proof: We may assume that {d \geq 2}, since the {d \leq 1} case is vacuous. Applying a generic linear projection to {{\bf R}^2} (which does not affect collinearity, or the property that a given {a_k} is a scalar multiple of {a_i-a_j}), we may then reduce to the case {d=2}.

By a rotation and relabeling, we may assume that {a_1} lies on the negative {x}-axis; by flipping signs as necessary we may then assume that all of the {a_2,\dots,a_n} lie in the closed right half-plane. As the {a_i} are not all collinear with the origin, one of the {a_i} lies off of the {x}-axis, by relabeling, we may assume that {a_2} lies off of the {x} axis and makes a minimal angle with the {x}-axis. Then the angle of {a_2-a_1} with the {x}-axis is non-zero but smaller than any non-zero angle that any of the {a_i} make with this axis, and so none of the {a_i} are a scalar multiple of {a_2-a_1}, and the claim follows. \Box

We now return to the proof of the proposition. If the {a_1,\dots,a_n} are all collinear with the origin, then {\phi(a_1),\dots,\phi(a_n)} lie in a one-dimensional arithmetic progression {\{ mv: |m| \leq |Q| \}}, and then by rescaling we may take the {v_1,\dots,v_n} to be integers of magnitude at most {|Q| \leq n^{Cn}}, at which point we are done by hypothesis. Thus, we may assume that the {a_1,\dots,a_n} are not all collinear with the origin, and so by the above lemma and relabeling we may assume that {a_n-a_1} is non-zero, and that none of the {a_i} are scalar multiples of {a_n-a_1}.

We write

\displaystyle  a_n-a_1 = (c_1,\dots,c_d) \ \ \ \ \ (1)

with {|c_i| \leq 2 N_i} for {i=1,\dots,d}; by relabeling we may assume without loss of generality that {c_d} is non-zero, and furthermore that

\displaystyle  \frac{|c_i|}{N_i} \leq \frac{|c_d|}{N_d}

for {i=1,\dots,d}. We can also factor

\displaystyle  (c_1,\dots,c_d) = q (c'_1,\dots,c'_d) \ \ \ \ \ (2)

where {q} is a natural number and {c'_1,\dots,c'_d} have no common factor.

We now define a variant {\tilde \phi: {\bf R}^d \rightarrow {\bf R}} of {\phi: {\bf R}^d \rightarrow {\bf R}} by the map

\displaystyle  \tilde \phi(n_1,\dots,n_d) := n_1 \tilde w_1 + \dots + n_{d-1} \tilde w_{d-1} - \frac{n_d}{c_d} (c_1 \tilde w_1 + \dots + c_{d-1} \tilde w_{d-1}),

where the {\tilde w_1,\dots,\tilde w_{d-1}} are real numbers that are linearly independent over {{\bf Q}}, whose precise value will not be of importance in our argument. This is a linear map with the property that {\tilde \phi(a_n-a_1)=0}, so that {\tilde \phi(a_1),\dots,\tilde \phi(a_n)} consists of at most {n-1} distinct real numbers, which are non-zero since none of the {a_i} are scalar multiples of {a_n-a_1}, and the {\tilde w_i} are linearly independent over {{\bf Q}}. As we are assuming inductively that the lonely runner conjecture holds for {n-1}, we conclude (after deleting duplicates) that there exists at least one real number {\tilde t} such that

\displaystyle  \| \tilde t \tilde \phi(a_1) \|_{{\bf R}/{\bf Z}}, \dots, \| \tilde t \tilde \phi(a_n) \|_{{\bf R}/{\bf Z}} \geq \frac{1}{n}.

We would like to “approximate” {\phi} by {\tilde \phi} to then conclude that there is at least one real number {t} such that

\displaystyle  \| t \phi(a_1) \|_{{\bf R}/{\bf Z}}, \dots, \| t \phi(a_n) \|_{{\bf R}/{\bf Z}} \geq \frac{1}{n+1}.

It turns out that we can do this by a Fourier-analytic argument taking advantage of the {n^{C_0 n}}-proper nature of {Q}. Firstly, we see from the Dirichlet approximation theorem that one has

\displaystyle  \| \tilde t \tilde \phi(a_1) \|_{{\bf R}/{\bf Z}}, \dots, \| \tilde t \tilde \phi(a_n) \|_{{\bf R}/{\bf Z}} \leq \frac{1}{10 n^2}

for a set {\tilde t} of reals of (Banach) density {\gg n^{-O(n)}}. Thus, by the triangle inequality, we have

\displaystyle  \| \tilde t \tilde \phi(a_1) \|_{{\bf R}/{\bf Z}}, \dots, \| \tilde t \tilde \phi(a_n) \|_{{\bf R}/{\bf Z}} \geq \frac{1}{n} - \frac{1}{10n^2}

for a set {\tilde t} of reals of density {\gg n^{-O(n)}}.

Applying a smooth Fourier multiplier of Littlewood-Paley type, one can find a trigonometric polynomial

\displaystyle  \eta(x) = \sum_{m: |m| \leq n^{C_0 n/10}} b_m e^{2\pi i mx}

which takes values in {[0,1]}, is {\gg 1} for {\|x\|_{{\bf R}/{\bf Z}} \geq \frac{1}{n} - \frac{1}{10n^2}}, and is no larger than {O( n^{-100 C_0n} )} for {\|x\|_{{\bf R}/{\bf Z}} \leq \frac{1}{n+1}}. We then have

\displaystyle  \mathop{\bf E}_t \prod_{j=1}^n \eta( t \tilde \phi(a_j) ) \gg n^{-O(n)}

where {\mathop{\bf E}_t f(t)} denotes the mean value of a quasiperiodic function {f} on the reals {{\bf R}}. We expand the left-hand side out as

\displaystyle  \sum_{m_1,\dots,m_n: m_1 \tilde \phi(a_1) + \dots + m_n \tilde \phi(a_n) = 0} b_{m_1} \dots b_{m_n}.

From the genericity of {\tilde w_1,\dots,\tilde w_n}, we see that the constraint

\displaystyle  m_1 \tilde \phi(a_1) + \dots + m_n \tilde \phi(a_n) = 0

occurs if and only if {m_1 a_1 + \dots + m_n a_n} is a scalar multiple of {a_n-a_1}, or equivalently (by (1), (2)) an integer multiple of {(c'_1,\dots,c'_d)}. Thus

\displaystyle  \sum_{m_1,\dots,m_n: m_1 a_1 + \dots + m_n a_n \in {\bf Z} (c'_1,\dots,c'_d)} b_{m_1} \dots b_{m_n} \gg n^{-O(n)}. \ \ \ \ \ (3)

Next, we consider the average

\displaystyle  \mathop{\bf E}_t \varphi( t \xi ) \prod_{j=1}^n \eta( t v_j ) \ \ \ \ \ (4)


\displaystyle  \xi := c'_1 w_1 + \dots + c'_d w_d. \ \ \ \ \ (5)

and {\varphi} is the Dirichlet series

\displaystyle  \varphi(x) := \sum_{m: |m| \leq n^{C_0 n/2}} e^{2\pi i mx}.

By Fourier expansion and writing {v_j = \phi(a_j)}, we may write (4) as

\displaystyle  \sum_{m,m_1,\dots,m_n: |m| \leq n^{C_0n/2}; m_1 \phi(a_1) + \dots + m_n \phi(a_n) = m \xi} b_{m_1} \dots b_{m_n}.

The support of the {b_{m_i}} implies that {|m_i| \leq n^{C_0n/10}}. Because of the {n^{C_0 n}}-properness of {Q}, we see (for {n} large enough) that the equation

\displaystyle  m_1 \phi(a_1) + \dots + m_n \phi(a_n) = m \xi \ \ \ \ \ (6)

implies that

\displaystyle  m_1 a_1 + \dots + m_n a_n \in {\bf Z} (c'_1,\dots,c'_d) \ \ \ \ \ (7)

and conversely that (7) implies that (6) holds for some {m} with {|m| \leq n^{C_0 n/2}}. From (3) we thus have

\displaystyle  \mathop{\bf E}_t \varphi( t \xi ) \prod_{j=1}^n \eta( t v_j ) \gg n^{-O(1)}.

In particular, there exists a {t} such that

\displaystyle  \varphi( t \xi ) \prod_{j=1}^n \eta( t v_j ) \gg n^{-O(1)}.

Since {\varphi} is bounded in magnitude by {n^{C_0n/2}}, and {\eta} is bounded by {1}, we thus have

\displaystyle  \eta(t v_j) \gg n^{-C_0 n/2 - O(1)}

for each {1 \leq j \leq n}, which by the size properties of {\eta} implies that {\|tv_j\|_{{\bf R}/{\bf Z}} \geq \frac{1}{n+1}} for all {1 \leq j \leq n}, giving the lonely runner conjecture for {n}.

Filed under: expository, math.CA, question Tagged: arithmetic progressions, lonely runner conjecture

Tommaso DorigoBurton Richter Advocates Electron-Positron Colliders, For A Change

Burton Richter, 1975 Nobel prize in Physics for the discovery of the J/ψ meson, speaks about the need of a new linear collider for the measurement of Higgs boson branching fractions in a video on Facebook (as soon as I understand how to paste here I will!)

Richter has been a fervent advocate of electron-positron machines over hadronic accelerators throughout his life. So you really could not expect anything different from him - but he still does it with all his might. At one point he says, talking of the hadron collider scientists who discovered the Higgs boson:

read more

May 13, 2015

Chad OrzelPhilosophy and Pictures of Physics

I’ve been sort of falling down on my obligation to promote myself– I’ve written two blog posts for Forbes this week, and forgotten to post about them here. The first is a thing about philosophy in physics, and how Einstein illustrates both the good and bad aspects of a philosophical approach.

The second is a bit on the listicle side, looking at some types of diagrams that physicists draw when talking about physics. It’s prompted by a ZapperZ post noting that scientists talking about science always draw pictures, but other subjects get by with just talking.

These are both quickly-dashed-off sorts of things, as I’ve had a wretched cold for the last week and a bit, and didn’t have the brain for any really in-depth physics. And, of course, the philosophy thing has gotten an order of magnitude more views than last week’s in-depth dog physics post. Ah, the Internet…

Anyway, if you hadn’t seen those before, well, there they are. Go on over and read them. Or don’t. It’s all good.

Doug NatelsonA matter of gravity

Gravity remains an enduring challenge in physics.  Newton had the insight that he could understand many phenomena (e.g., the falling of an apple, the orbit of Halley's comet) if the gravitational interaction between two objects is an attractive force proportional to the product of the objects' masses, and inversely proportional to the square of the distance between them ( \(F = - G M_{1}M_{2}/r^{2}\) ), and acts along the line between the objects.  The constant of proportionality, \(G\), is Newton's gravitational constant.   About 225 years later, Einstein had the insight that in more generality one should think of gravity as actually distorting space-time; what looks like a force is really a case of freely falling objects moving in the (locally) straightest trajectories that they can.  (Obligatory rubber sheet analogy here.)  In that theory, general relativity (GR), Newton's constant \(G\) again appears as a constant of proportionality that basically sets the scale for the amount of space-time distortion produced by a certain amount of stress-energy (rather than just good old-fashioned mass).  GR has been very successful so far, though we have reasons to believe that it is the classical limit of some still unknown quantum theory of gravity.  Whatever that quantum theory is, \(G\) must still show up to set the scale for the gravitational interaction.

It makes sense that we would like to know the numerical value of \(G\) as accurately and precisely as possible - seems like the first thing you'd like to understand, right?  The challenge is, as I've explained before, gravity is actually an incredibly weak force.  To measure it well in absolute numbers, you need an apparatus that can measure small forces while not being influenced by other, faaaaaar stronger forces like electromagnetism, and you need to do something like measure the force (or the counter-force that you need to apply to null out the gravitational force) as a function of different configurations of test masses (such as tungsten spheres). 

I'm revisiting this because of a couple (1, 2) of interesting papers that came out recently.  As I'd said in that 2010 post, the challenge in measuring \(G\) is so difficult that different groups have obtained nominally high precision measurements (precise out to the fourth decimal place, such as \(G = 6.6730 \pm 0.00029 \times 10^{-11}\) Nm2/kg2) that are mutually inconsistent with each other.  See this plot (Fig. 1 from arxiv:1505.01774).  The various symbols correspond to different published measurements of \(G\) over the last 35 years (!).  The distressing thing is that there does not seem to be much sign of convergence.  The recent papers are looking to see whether there is actually some periodicity to the results (as hinted by the sinusoid on the plot).  To be clear:  The authors are not suggesting that \(G\) really varies with a several year period - rather, they're exploring the possibility that there might be some unknown systematic effect that is skewing the results of some or all of the various measurement approaches.  As both teams of authors say, the best solution would be to come up with a very clean experimental scheme and run it, undisturbed, continuously for years at a time.  That's not easy or cheap.  It's important to note that this is what real, careful measurement science looks like, not some of the stuff that has made web headlines lately.

BackreactionInformation transfer without energy exchange

While I was writing up my recent paper on classical information exchange, a very interesting new paper appeared on quantum information exchange
    Information transmission without energy exchange
    Robert H. Jonsson, Eduardo Martin-Martinez, Achim Kempf
    Phys. Rev. Lett. 114, 110505 (2015)
    arXiv:1405.3988 [quant-ph]
I was visiting Achim’s group two weeks ago and we talked about this for a bit.

In their paper the authors study the communication channels in lower dimensional spaces by use of thought experiments. If you do thought experiments, you need thought detectors. Named “Unruh De-Witt detectors” after their inventors such detectors are the simplest systems you can think of that detect something. It’s a state with two energy levels and it couples linearly to the field you want to detect. A positive measurement results in an excitation of the detector’s state, and that’s pretty much it. No loose cables, no helium leaks, no microwave ovens.

Equipped with such a thought detector, you can then introduce Bob and Alice and teach them to exchange information by means of a quantum field, in the simplest case a massless scalar field. What they can do depends on the way the field is correlated with itself at distant points. In a flat space-time with three spatial dimensions, the field only correlates with itself on the lightcone. But in lower dimensions this isn’t so.

The authors then demonstrate just exactly how Alice can use the correlations to send information to Bob in two spatial dimensions, or 2+1 dimensional space-time as the physicists like to say. They further show that Alice can submit a signal without it drowning in quantum noise. Alice submits information not by sending a quantum of the field, but by coupling and decoupling her detector to the field’s vacuum state. The correlations in the field then imply that whether her detector is coupled or not affects how the field excites Bob’s detector.

Now this information exchange between Bob and Alice is always slower than the speed of light so you might wonder why that is interesting. It is interesting because Alice doesn’t send any energy! While the switching of the detectors requires some work, this is a local energy requirement which doesn’t travel with the information.

Okay you might say then, fine, but we don’t live in 2+1 dimensional space-time. That’s right, but we don’t live in three plus one dimensional flat space-time either: We live in a curved space-time. This isn’t further discussed in the paper but the correlations allowing for this information exchange without energy can also exist in some curved backgrounds. The interesting question is then of course, in which backgrounds and what does this mean for our sending of information into black holes? Do we really need to use quanta of energy for this or is there a way to avoid this? And if it can be avoided, what does it mean for the information being stored in black holes?

I am sure we will hear more about this in the future...

Tommaso DorigoAMVA4NewPhysics: Statistical Learning And New Discoveries

I am very happy today because I have been notified by the European Community that a project I submitted for funding as coordinator last January has been evaluated very positively by the EU reviewers. The project is a training network of universities and research centres in Europe, with participation of two additional academic partners and four industrial partners from the US, Russia, Italy and Belgium. The network name is "AMVA4NewPhysics", and it aims at developing and applying cutting-edge statistical learning tools to new physics and Englert-Higgs boson studies to the LHC data collected by ATLAS and CMS.

read more

Jacques Distler Action-Angle Variables

This semester, I taught the Graduate Mechanics course. As is often the case, teaching a subject leads you to rethink that you thought you understood, sometimes with surprising results.

The subject for today’s homily is Action-Angle variables.

Let (,ω)(\mathcal{M},\omega) be a 2n2n-dimensional symplectic manifold. Let us posit that \mathcal{M} had a foliation by nn-dimensional Lagrangian tori (a torus, TMT\subset M, is Lagrangian if ω| T=0\omega|_T =0). Removing a subset, SS\subset \mathcal{M}, of codimension codim(S)2codim(S)\geq 2, where the leaves are singular, we can assume that all of the leaves on =\S\mathcal{M}'=\mathcal{M}\backslash S are smooth tori of dimension nn.

The objective is to construct coordinates φ i,K i\varphi^i, K_i with the following properties.

  1. The φ i\varphi^i restrict to angular coordinates on the tori. In particular φ i\varphi^i shifts by 2π2\pi when you go around the corresponding cycle on TT.
  2. The K iK_i are globally-defined functions on \mathcal{M} which are constant on each torus.
  3. The symplectic form ω=dK idφ i\omega= d K_i\wedge d \varphi^i.

From 1, it’s clear that it’s more convenient to work with the 1-forms dφ id\varphi^i, which are single-valued (and closed, but not necessarily exact), rather than with the φ i\varphi^i themselves. In 2, it’s rather important that the K iK_i are really globally-defined. In particular, an integrable Hamiltonian is a function H(K)H(K). The K iK_i are the nn conserved quantities which make the Hamiltonian integrable.

Obviously, a given foliation is compatible with infinitely many “integrable Hamiltonians,” so the existence of a foliation is the more fundamental concept.

All of this is totally standard.

What never really occurred to me is that the standard construction of action-angle variables turns out to be very closely wedded to the particular case of a cotangent bundle, =T *M\mathcal{M}=T^*M.

As far as I can tell, action-angle variables don’t even exist for foliations of more general symplectic manifolds, \mathcal{M}.

Any contagent bundle, T *MT^*M, has a canonical 1-form, θ\theta, on it. The standard symplectic structure is ω=dθ\omega = d\theta. The construction of the action-variables requires that we choose a homology basis, γ i\gamma_i, for each torus, in a fashion that is locally-constant1, as we move between tori of the foliation. The K iK_i are then defined as
(1)K i=12π γ iθK_i = \frac{1}{2\pi}\int_{\gamma_i} \theta
Note that, because the torus is Lagrangian, the values of K iK_i are independent of the particular choices of path chosen to represent γ i\gamma_i. Having constructed the K iK_i, the closed 1-forms
(2)dφ i=i /K iωd\varphi^i = i_{\partial/\partial K_i}\omega
Great! Except that, for a general symplectic manifold, there’s no analogue of θ\theta. In particular, it’s trivial to construct examples of symplectic manifolds, foliated by Lagrangian tori, for which no choice of action variables, K iK_i, exist. As a simple example, take (,ω)=(T 4,dθ 1dθ 3+dθ 2dθ 4) (\mathcal{M},\omega)= (T^4, d\theta_1\wedge d\theta_3 +d\theta_2\wedge d\theta_4) Obviously, we can foliate this by Lagrangian tori (taking TT to be the subsets {θ 1,θ 2=const}\{\theta_1, \theta_2=\text{const}\}). But the corresponding action variables don’t exist. We’d happily choose K i=θ iK_i=\theta_i, for i=1,2i=1,2, but those aren’t single-valued functions on \mathcal{M}. You could try to use functions that are actually single-valued (e.g., K i=sin(θ i)K_i=\sin(\theta_i)), but then the corresponding 1-forms, η i\eta^i, in ω=dK iη i\omega = dK_i\wedge\eta^i, don’t have 2π×2\pi\timesintegral periods (heck, they’re not even closed!).

Surely, there’s some sort of cohomological characterization of when Action-Angle variables exist. The situation feels a lot like the characterization of when symplectomorphisms (vector fields that preserve the symplectic form) are actually Hamiltonian vector fields2.

And, even when the obstruction vanishes, how do we generalize the construction (1), (2) to more general symplectic manifolds?


Just to be clear, there are plenty of examples where you can construct action-angle variables for foliations of symplectic manifolds which are not cotangent bundles. An easy example is (,ω)=(S 2,rdrdθ(1+r 2) 2) (\mathcal{M},\omega) = \left(S^2, \frac{r dr\wedge d\theta}{{(1+r^2)}^2}\right) where (K,φ)=(12(1+r 2),θ)(K,\varphi)= \left(-\frac{1}{2(1+r^2)},\theta\right) are action-angle variables for the obvious foliation by circles. This example “works” because once you remove the singular leaves (at r=0,r=0,\infty), ω\omega becomes cohomologically trivial on \mathcal{M}' and we can then use the standard construction. [ω]=0H 2(\S,)[\omega]=0\in H^2(\mathcal{M}\backslash S,\mathbb{R}) sounds like a sufficient condition for constructing action-angle variables. But is it necessary?

1I’m pretty sure we need them to be globally-constant over \mathcal{M}'. I’ll assume there’s no obstruction to doing that.

2If you’re not familiar with that story, note that Xω=0 \mathcal{L}_X \omega = 0 is tantamount to the condition that i Xωi_X\omega is a closed 1-form. If it happens that it is an exact 1-form, i Xω=df i_X\omega = d f then X={f,}X = \{f,\cdot\} is a Hamiltonian vector field. The obstruction to writing XX as a Hamiltonian vector field is, thus, the de Rham cohomology class, [i Xω]H 1(,)[i_X\omega]\in H^1(\mathcal{M},\mathbb{R}).

In the example at hand, that’s exactly what is going on. Any single-valued function, H(θ 1,θ 2)H(\theta_1,\theta_2), is an “integrable” Hamiltonian for the above foliation. But the symmetries, X 1=θ 3X_1=\frac{\partial}{\partial\theta_3} and X 2=θ 4X_2=\frac{\partial}{\partial\theta_4} are not Hamiltonian vector fields. Hence, there are no corresponding conserved action variables.

May 12, 2015

Terence Tao254A, Notes 7: Linnik’s theorem on primes in arithmetic progressions

In the previous set of notes, we saw how zero-density theorems for the Riemann zeta function, when combined with the zero-free region of Vinogradov and Korobov, could be used to obtain prime number theorems in short intervals. It turns out that a more sophisticated version of this type of argument also works to obtain prime number theorems in arithmetic progressions, in particular establishing the celebrated theorem of Linnik:

Theorem 1 (Linnik’s theorem) Let {a\ (q)} be a primitive residue class. Then {a\ (q)} contains a prime {p} with {p \ll q^{O(1)}}.

In fact it is known that one can find a prime {p} with {p \ll q^{5}}, a result of Xylouris. For sake of comparison, recall from Exercise 65 of Notes 2 that the Siegel-Walfisz theorem gives this theorem with a bound of {p \ll \exp( q^{o(1)} )}, and from Exercise 48 of Notes 2 one can obtain a bound of the form {p \ll \phi(q)^2 \log^2 q} if one assumes the generalised Riemann hypothesis. The probabilistic random models from Supplement 4 suggest that one should in fact be able to take {p \ll q^{1+o(1)}}.

We will not aim to obtain the optimal exponents for Linnik’s theorem here, and follow the treatment in Chapter 18 of Iwaniec and Kowalski. We will in fact establish the following more quantitative result (a special case of a more powerful theorem of Gallagher), which splits into two cases, depending on whether there is an exceptional zero or not:

Theorem 2 (Quantitative Linnik theorem) Let {a\ (q)} be a primitive residue class for some {q \geq 2}. For any {x > 1}, let {\psi(x;q,a)} denote the quantity

\displaystyle  \psi(x;q,a) := \sum_{n \leq x: n=a\ (q)} \Lambda(n).

Assume that {x \geq q^C} for some sufficiently large {C}.

The implied constants here are effective.

Note from the Landau-Page theorem (Exercise 54 from Notes 2) that at most one exceptional zero exists (if {\varepsilon} is small enough). A key point here is that the error term {O( \exp( - c \frac{\log x}{\log q} \log \frac{1}{\varepsilon} ) )} in the exceptional zero case is an improvement over the error term when no exceptional zero is present; this compensates for the potential reduction in the main term coming from the {\chi_1(a) \frac{x^{\beta-1}}{\beta}} term. The splitting into cases depending on whether an exceptional zero exists or not turns out to be an essential technique in many advanced results in analytic number theory (though presumably such a splitting will one day become unnecessary, once the possibility of exceptional zeroes are finally eliminated for good).

Exercise 3 Assuming Theorem 2, and assuming {x \geq q^C} for some sufficiently large absolute constant {C}, establish the lower bound

\displaystyle  \psi(x;a,q) \gg \frac{x}{\phi(q)}

when there is no exceptional zero, and

\displaystyle  \psi(x;a,q) \gg \varepsilon \frac{x}{\phi(q)}

when there is an exceptional zero {\beta = 1 - \frac{\varepsilon}{\log q}}. Conclude that Theorem 2 implies Theorem 1, regardless of whether an exceptional zero exists or not.

Remark 4 The Brun-Titchmarsh theorem (Exercise 33 from Notes 4), in the sharp form of Montgomery and Vaughan, gives that

\displaystyle  \pi(x; q, a) \leq 2 \frac{x}{\phi(q) \log (x/q)}

for any primitive residue class {a\ (q)} and any {x \geq q}. This is (barely) consistent with the estimate (1). Any lowering of the coefficient {2} in the Brun-Titchmarsh inequality (with reasonable error terms), in the regime when {x} is a large power of {q}, would then lead to at least some elimination of the exceptional zero case. However, this has not led to any progress on the Landau-Siegel zero problem (and may well be just a reformulation of that problem). (When {x} is a relatively small power of {q}, some improvements to Brun-Titchmarsh are possible that are not in contradiction with the presence of an exceptional zero; see this paper of Maynard for more discussion.)

Theorem 2 is deduced in turn from facts about the distribution of zeroes of {L}-functions. Recall from the truncated explicit formula (Exercise 45(iv) of Notes 2) with (say) {T := q^2} that

\displaystyle  \sum_{n \leq x} \Lambda(n) \chi(n) = - \sum_{\hbox{Re}(\rho) > 3/4; |\hbox{Im}(\rho)| \leq q^2; L(\rho,\chi)=0} \frac{x^\rho}{\rho} + O( \frac{x}{q^2} \log^2 q)

for any non-principal character {\chi} of modulus {q}, where we assume {x \geq q^C} for some large {C}; for the principal character one has the same formula with an additional term of {x} on the right-hand side (as is easily deduced from Theorem 21 of Notes 2). Using the Fourier inversion formula

\displaystyle  1_{n = a\ (q)} = \frac{1}{\phi(q)} \sum_{\chi\ (q)} \overline{\chi(a)} \chi(n)

(see Theorem 69 of Notes 1), we thus have

\displaystyle  \psi(x;a,q) = \frac{x}{\phi(q)} ( 1 - \sum_{\chi\ (q)} \overline{\chi(a)} \sum_{\hbox{Re}(\rho) > 3/4; |\hbox{Im}(\rho)| \leq q^2; L(\rho,\chi)=0} \frac{x^{\rho-1}}{\rho}

\displaystyle  + O( \frac{\log^2 q}{q} ) )

and so it suffices by the triangle inequality (bounding {1/\rho} very crudely by {O(1)}, as the contribution of the low-lying zeroes already turns out to be quite dominant) to show that

\displaystyle  \sum_{\chi\ (q)} \sum_{\sigma > 3/4; |t| \leq q^2; L(\sigma+it,\chi)=0} x^{\sigma-1} \ll \exp( - c \frac{\log x}{\log q} ) \ \ \ \ \ (2)

when no exceptional zero is present, and

\displaystyle  \sum_{\chi\ (q)} \sum_{\sigma > 3/4; |t| \leq q^2; L(\sigma+it,\chi)=0; \sigma+it \neq \beta} x^{\sigma-1} \ll \exp( - c \frac{\log x}{\log q} \log \frac{1}{\varepsilon} ) \ \ \ \ \ (3)

when an exceptional zero is present.

To handle the former case (2), one uses two facts about zeroes. The first is the classical zero-free region (Proposition 51 from Notes 2), which we reproduce in our context here:

Proposition 5 (Classical zero-free region) Let {q, T \geq 2}. Apart from a potential exceptional zero {\beta}, all zeroes {\sigma+it} of {L}-functions {L(\cdot,\chi)} with {\chi} of modulus {q} and {|t| \leq T} are such that

\displaystyle  \sigma \leq 1 - \frac{c}{\log qT}

for some absolute constant {c>0}.

Using this zero-free region, we have

\displaystyle  x^{\sigma-1} \ll \log x \int_{1/2}^{1-c/\log q} 1_{\alpha < \sigma} x^{\alpha-1}\ d\alpha

whenever {\sigma} contributes to the sum in (2), and so the left-hand side of (2) is bounded by

\displaystyle  \ll \log x \int_{1/2}^{1 - c/\log q} N( \alpha, q, q^2 ) x^{\alpha-1}\ d\alpha

where we recall that {N(\alpha,q,T)} is the number of zeroes {\sigma+it} of any {L}-function of a character {\chi} of modulus {q} with {\sigma \geq \alpha} and {0 \leq t \leq T} (here we use conjugation symmetry to make {t} non-negative, accepting a multiplicative factor of two).

In Exercise 25 of Notes 6, the grand density estimate

\displaystyle  N(\alpha,q,T) \ll (qT)^{4(1-\alpha)} \log^{O(1)}(qT) \ \ \ \ \ (4)

is proven. If one inserts this bound into the above expression, one obtains a bound for (2) which is of the form

\displaystyle  \ll (\log^{O(1)} q) \exp( - c \frac{\log x}{\log q} ).

Unfortunately this is off from what we need by a factor of {\log^{O(1)} q} (and would lead to a weak form of Linnik’s theorem in which {p} was bounded by {O( \exp( \log^{O(1)} q ) )} rather than by {q^{O(1)}}). In the analogous problem for prime number theorems in short intervals, we could use the Vinogradov-Korobov zero-free region to compensate for this loss, but that region does not help here for the contribution of the low-lying zeroes with {t = O(1)}, which as mentioned before give the dominant contribution. Fortunately, it is possible to remove this logarithmic loss from the zero-density side of things:

Theorem 6 (Log-free grand density estimate) For any {q, T > 1} and {1/2 \leq \alpha \leq 1}, one has

\displaystyle  N(\alpha,q,T) \ll (qT)^{O(1-\alpha)}.

The implied constants are effective.

We prove this estimate below the fold. The proof follows the methods of the previous section, but one inserts various sieve weights to restrict sums over natural numbers to essentially become sums over “almost primes”, as this turns out to remove the logarithmic losses. (More generally, the trick of restricting to almost primes by inserting suitable sieve weights is quite useful for avoiding any unnecessary losses of logarithmic factors in analytic number theory estimates.)

Exercise 7 Use Theorem 6 to complete the proof of (2).

Now we turn to the case when there is an exceptional zero (3). The argument used to prove (2) applies here also, but does not gain the factor of {\log \frac{1}{\varepsilon}} in the exponent. To achieve this, we need an additional tool, a version of the Deuring-Heilbronn repulsion phenomenon due to Linnik:

Theorem 8 (Deuring-Heilbronn repulsion phenomenon) Suppose {q \geq 2} is such that there is an exceptional zero {\beta = 1 - \frac{\varepsilon}{\log q}} with {\varepsilon} small. Then all other zeroes {\sigma+it} of {L}-functions of modulus {q} are such that

\displaystyle  \sigma \leq 1 - c \frac{\log \frac{1}{\varepsilon}}{\log(q(2+|t|))}.

In other words, the exceptional zero enlarges the classical zero-free region by a factor of {\log \frac{1}{\varepsilon}}. The implied constants are effective.

Exercise 9 Use Theorem 6 and Theorem 8 to complete the proof of (3), and thus Linnik’s theorem.

Exercise 10 Use Theorem 8 to give an alternate proof of (Tatuzawa’s version of) Siegel’s theorem (Theorem 62 of Notes 2). (Hint: if two characters have different moduli, then they can be made to have the same modulus by multiplying by suitable principal characters.)

Theorem 8 is proven by similar methods to that of Theorem 6, the basic idea being to insert a further weight of {1 * \chi_1} (in addition to the sieve weights), the point being that the exceptional zero causes this weight to be quite small on the average. There is a strengthening of Theorem 8 due to Bombieri that is along the lines of Theorem 6, obtaining the improvement

\displaystyle  N'(\alpha,q,T) \ll \varepsilon (1 + \frac{\log T}{\log q}) (qT)^{O(1-\alpha)} \ \ \ \ \ (5)

with effective implied constants for any {1/2 \leq \alpha \leq 1} and {T \geq 1} in the presence of an exceptional zero, where the prime in {N'(\alpha,q,T)} means that the exceptional zero {\beta} is omitted (thus {N'(\alpha,q,T) = N(\alpha,q,T)-1} if {\alpha \leq \beta}). Note that the upper bound on {N'(\alpha,q,T)} falls below one when {\alpha > 1 - c \frac{\log \frac{1}{\varepsilon}}{\log(qT)}} for a sufficiently small {c>0}, thus recovering Theorem 8. Bombieri’s theorem can be established by the methods in this set of notes, and will be given as an exercise to the reader.

Remark 11 There are a number of alternate ways to derive the results in this set of notes, for instance using the Turan power sums method which is based on studying derivatives such as

\displaystyle \frac{L'}{L}(s,\chi)^{(k)} = (-1)^k \sum_n \frac{\Lambda(n) \chi(n) \log^k n}{n^s}

\displaystyle  \approx (-1)^{k+1} k! \sum_\rho \frac{1}{(s-\rho)^{k+1}}

for {\hbox{Re}(s)>1} and large {k}, and performing various sorts of averaging in {k} to attenuate the contribution of many of the zeroes {\rho}. We will not develop this method here, but see for instance Chapter 9 of Montgomery’s book. See the text of Friedlander and Iwaniec for yet another approach based primarily on sieve-theoretic ideas.

Remark 12 When one optimises all the exponents, it turns out that the exponent in Linnik’s theorem is extremely good in the presence of an exceptional zero – indeed Friedlander and Iwaniec showed can even get a bound of the form {p \ll q^{2-c}} for some {c>0}, which is even stronger than one can obtain from GRH! There are other places in which exceptional zeroes can be used to obtain results stronger than what one can obtain even on the Riemann hypothesis; for instance, Heath-Brown used the hypothesis of an infinite sequence of Siegel zeroes to obtain the twin prime conejcture.

— 1. Log-free density estimate —

We now prove Theorem 6. We will make no attempt here to optimise the exponents in this theorem, and so will be quite wasteful in the choices of numerical exponents in the argument that follows in order to simplify the presentation.

By increasing {T} if necessary we may assume that

\displaystyle  T \geq q^{10} \ \ \ \ \ (6)

(say); we may also assume that {T} is larger than any specified absolute constant. We may then replace {qT} by {T} in the estimate, thus we wish to show that

\displaystyle  N(\alpha,q,T) \ll T^{O(1-\alpha)}.

Observe that in the regime

\displaystyle  1-\alpha \gg \frac{\log\log T}{\log T}

the claim already follows from the non-log-free density estimate (4). Thus we may assume that

\displaystyle  \alpha = 1 - \frac{A}{\log T}

for some {A = O( \log \log T )}, and the claim is now to show that there are at most {O( \exp( O(A) ) )} zeroes of {L}-functions {L(\sigma+it,\chi)} with {|t| \leq T}, {\sigma \geq 1 - \frac{A}{\log T}}, and {\chi} a character of modulus {q}. We may assume that {A \geq 1}, since the {A < 1} case follows from the {A \geq 1} case (and also essentially follows from the classical zero-free region, in any event).

For minor technical reasons it is convenient to first dispose of the contribution of the principal character. In this case, the zeroes are the same as those of the Riemann zeta function. From the Vinogradov-Korobov zero-free region we conclude there are no zeroes {\sigma} with {\sigma \geq 1 - \frac{A}{\log T}} and {|t| \leq T}. Thus we may restrict attention to non-principal characters {\chi}.

Suppose we have a zero {L(\sigma+it,\chi)=0} of a non-principal character {\chi} of modulus {q} with {\sigma \geq 1 - \frac{A}{\log T}} and {|t| \leq T}. From equation (48) of Notes 2 we then have

\displaystyle  \sum_{n \leq x} \frac{\chi(n)}{n^{\sigma+it}} \ll q T x^{-\sigma}

for any {x \geq 1}; in particular

\displaystyle  \sum_{n \leq x} \frac{\chi(n)}{n^{\sigma+it}} \ll T^{-400} \ \ \ \ \ (7)

(say) for all {T^{500} \leq x \leq T^{1000}}. One can of course obtain more efficient truncations than this, but as mentioned previously we are not trying to optimise the exponents. If one subtracts the {n=1} term from the left-hand side, this already gives a zero-detecting polynomial, but it is not tractable to work with because it contains too many terms with small {n} (and is also not concentrated on those {n} that are almost prime). To fix this, we weight the previous Dirichlet polynomial by {\rho*1}, where {\rho} is an arithmetic function supported on {[1,T^{500}]} to be chosen later obeying the bound {\rho(m) \ll \tau(m)^{O(1)}}. We expand

\displaystyle  \sum_{n \leq T^{1000}} \rho*1(n) \frac{\chi(n)}{n^{\sigma+it}} = \sum_{m \leq T^{500}} \frac{\rho(m) \chi(m)}{m^{\sigma+it}} \sum_{n \leq T^{1000}/m} \frac{\chi(n)}{n^{\sigma+it}}

and hence by (7) and the upper bound on {\rho(m)}

\displaystyle  \sum_{n \leq T^{1000}} \rho*1(n) \frac{\chi(n)}{n^{\sigma+it}} \ll T^{-400} \sum_{m \leq T^{500}} \frac{\tau(m)^{O(1)}}{m^\sigma}.

Since {\sigma \geq 1 - O( \frac{\log\log T}{\log T} )}, one sees from the divisor bound and the hypothesis that {T} is large that

\displaystyle  \left|\sum_{n \leq T^{1000}} \rho*1(n) \frac{\chi(n)}{n^{\sigma+it}}\right| \leq \frac{1}{2}

If we have {\rho(1)=1}, then we can extract the {n=1} term and obtain a zero-detecting polynomial:

\displaystyle  \left|\sum_{1 < n \leq T^{1000}} \rho*1(n) \frac{\chi(n)}{n^{\sigma+it}}\right| \geq \frac{1}{2}.

We now select the weights {\rho}. There are a number of options here; we will use a variant of the “continuous Selberg sieve” from Section 2 of Notes 4. Fix a smooth function {f: {\bf R} \rightarrow {\bf R}} that equals {1} on {[-1,1]} and is supported on {[-2,2]}; we allow implied constants to depend on {f}. For any {R > 1}, define

\displaystyle  \beta_R(n) := \sum_{d|n} \mu(d) f( \frac{\log d}{\log R} ).

Observe from Möbius inversion that {\beta_R(n) = 1_{n=1}} for all {n \leq R}. The weight {\beta_R^2} was used as an upper bound Selberg sieve in Notes 4.

We observe that

\displaystyle  \beta_{R_1} \beta_{R_2} = 1 * \rho_{R_1,R_2} \ \ \ \ \ (8)

for any {R_1,R_2 > 1}, where

\displaystyle  \rho_{R_1,R_2}(d) := \sum_{d_1,d_2: [d_1,d_2] = d} \mu(d_1) f( \frac{\log d_1}{\log R_1} ) \mu(d_2) f( \frac{\log d_2}{\log R_2} ). \ \ \ \ \ (9)

We will need the following general bound:

Lemma 13 (Sieve upper bound) Let {1 \leq R_1 \leq R_2 \ll R_1^{O(1)}}, and let {g} be a completely multiplicative function such that {g(p) = O(1)} for all primes {p}. Then

\displaystyle  \sum_d \frac{g(d)}{d} \rho_{R_1,R_2}(d) \ll \exp( - \sum_{p \leq R_1} \frac{g(p)}{p} ). \ \ \ \ \ (10)

Proof: Clearly, we can restrict {d} to those numbers whose prime factors do not exceed {CR_1^C}, for some large absolute constant {C}.

By a Fourier expansion we can write

\displaystyle  f(u) = \int_{\bf R} e^{-itu} F(t)\ dt \ \ \ \ \ (11)

for some rapidly decreasing function {F: {\bf R} \rightarrow {\bf C}}, and thus the left-hand side of (10) may be written as

\displaystyle  \int_{\bf R} \int_{\bf R} \sum_{d_1,d_2} \mu(d_1) \mu(d_2) \frac{g([d_1,d_2])}{[d_1,d_2] d_1^{it_1/\log R_1} d_2^{it_2/\log R_2}}\ F(t_1) F(t_2) dt_1 dt_2

where we implicitly restrict {d_1,d_2} to numbers whose prime factors do not exceed {CT^C} (note that this makes the integrand absolutely summable and integrable, so that Fubini’s theorem applies). We may factor this as

\displaystyle  \int_{\bf R} \int_{\bf R} \prod_{p \leq CR_1^C} (1 - \frac{g(p)}{p} (p^{-it_1/\log R_1} + p^{-it_2/\log R_2} - p^{-it_1/\log R_1 - it_2/\log R_2}))

\displaystyle  F(t_1) F(t_2) dt_1 dt_2.

By the rapid decrease of {F}, it thus suffices to show that

\displaystyle  \prod_{p \leq CR_1^C} (1 - \frac{g(p)}{p} (p^{-it_1/\log R_1} + p^{-it_2/\log R_2} - p^{-it_1/\log R_1 - it_2/\log R_2}))

\displaystyle  \ll (2+|t_1|+|t_2|)^{O(1)} \exp( - \sum_{p \leq R_1} \frac{g(p)}{p} ).

By Taylor expansion we can bound the left-hand side by

\displaystyle  \ll |\exp( - \sum_{p \leq CR_1^C} \frac{g(p)}{p} (p^{-it_1/\log R_1} + p^{-it_2/\log R_2} + p^{-it_1/\log R_1 - it_2/\log R_2}) )|.

By Mertens theorem we can replace the constraint {p \leq CR_1^C} with {p \leq R_1}. Since {g(p)=O(1)}, it thus suffices to show that

\displaystyle  \sum_{p \leq R_1} \frac{1}{p} |p^{-it_1/\log R_1} + p^{-it_2/\log R_2} - p^{-it_1/\log R_1 - it_2/\log R_2}) - 1|

\displaystyle  \ll \log( 2 + |t_1|+|t_2| ).

But we can factor

\displaystyle  |p^{-it_1/\log R_1} + p^{-it_2/\log R_2} - p^{-it_1/\log R_1 - it_2/\log R_2}) - 1|

\displaystyle  = |p^{-it_1/\log R_1}-1| |p^{-it_2/\log R_1}-1|

\displaystyle  \ll |p^{-it_1/\log R_1}-1|

\displaystyle  \ll \min( |t_1| \frac{\log p}{\log R_1}, 1 )

and the claim follows from Mertens’ theorem. \Box

We record a basic corollary of this estimate:

Corollary 14 For any {R \geq 1}, we have

\displaystyle  \sum_{n \leq x} \beta_R(n)^2 \ll \frac{x}{\log R} \ \ \ \ \ (12)

for any {x \geq R^5}, and

\displaystyle  \sum_{n \leq x} \frac{\beta_R(n)^2}{n} \ll 1 \ \ \ \ \ (13)

for any {x \ll R^{O(1)}}.

Proof: Writing {\beta_R^2 = \rho_{R,R}*1}, we can write the left-hand side of (12) as

\displaystyle  \sum_d \rho_{R,R}(d) (\frac{x}{d} + O(1)).

Since {\rho_{R,R}(d)} is supported on {[1,R^4]} and is bounded above by {\tau(d)^{O(1)}}, the contribution of the {O(1)} error is {O( R^4 \log^{O(1)} R )} which is acceptable. By Lemma 13 with {g=1}, the contribution of the main term is {O( x \exp( - \sum_{p \leq R} \frac{1}{p} )}, and the claim then follows from Mertens’ theorem.

Now we prove (13). Using Rankin’s trick, it suffices to show that

\displaystyle  \sum_n \frac{\beta_R(n)^2}{n^{1+1/\log R}} \ll 1.

The left-hand side factorises as

\displaystyle  \zeta(1 + \frac{1}{\log R}) \sum_n \frac{\rho_{R,R}(n)}{n^{1+1/\log R}}.

From Lemma 13 with {g(n) := n^{-1/\log R}}, we see that

\displaystyle  \sum_n \frac{\rho_{R,R}(n)}{n^{1+1/\log R}} \ll \exp( - \sum_{p \leq R} \frac{1}{p^{1+1/\log R}} )

\displaystyle  \ll \exp( - \log \zeta(1 + \frac{1}{\log R}) )

(using Mertens theorem to control the error between {\log \zeta(1 + \frac{1}{\log R}) = \sum_p \frac{1}{p^{1+1/\log R}} + O(\frac{1}{p^2})} and {\sum_{p \leq R} \frac{1}{p^{1+1/\log R}}}) and the claim follows. \Box

We will work primarily with the cutoff

\displaystyle  \beta_{T^{10}} \beta_{T^{100}} = \rho_{T^{10}, T^{100}} * 1;

the reason for the separate scales {T^{10}} and {T^{100}} will become clearer later. The function {\rho} is supported on {[1,T^{500}]}, equals {1} at {1}, and is bounded by {O( \tau^{O(1)})}, so from the previous discussion we thus have the zero-detector inequality

\displaystyle  |\sum_{T^{100} \leq n \leq T^{1000}} \beta_{T^{10}} \beta_{T^{100}}(n) \frac{\chi(n)}{n^{\sigma+it}}| \gg 1 \ \ \ \ \ (14)

whenever {L(\sigma+it,\chi)=0} with {\chi} of modulus {q}, {|t| \leq T}, and {\sigma \geq 1 - O( \frac{A}{\log T} )}. Our objective is to show that the number of such zeroes os {O( \exp(O(A)))}.

We first control the number of zeroes that are very close together. From equation (48) of Notes 2 with {x = T^2} (say), we see that

\displaystyle  L(s,\chi) \ll \exp( O( A) ) \log( q T )

whenever {\hbox{Re}(s) \geq 1 - O( \frac{A}{\log T} )}, {\hbox{Im}(s) \ll T}, and {\chi} is non-principal of modulus {q}; also from equation (45) of Notes 2 we have

\displaystyle  1 / L( 1 + \frac{1}{\log T}, \chi ) \ll \zeta( 1 + \frac{1}{\log T} ) \ll \log T.

From Jensen’s theorem (Theorem 16 of Supplement 2), we conclude that for any given non-principal {\chi} and any {t_0 \in [-T,T]}, there are at most {O(A)} zeroes of {L(\sigma+it,\chi)} (counting multiplicity, of course) with {\sigma \geq 1 - \frac{A}{\log T}} and {|t-t_0| \leq \frac{A}{\log T}}. To prove Theorem 6, it thus suffices by the usual covering argument to establish the bound

\displaystyle  J \ll \exp( O( A) ) \ \ \ \ \ (15)

whenever one has a sequence {((\chi_j, \sigma_j+ i t_j))_{j=1}^J} of zeroes {L( \sigma_j + it_j, \chi_j )=0} with {\chi_j} a non-principal character of conductor {q}, {|t_j| \leq T}, and {\sigma_j \geq 1 - \frac{A}{\log T}}, obeying the separation condition

\displaystyle  |t_j - t_{j'}| \geq \frac{A}{\log T} \hbox{ whenever } \chi_j = \chi_{j'}, j \neq j'. \ \ \ \ \ (16)

Note from the existing grand zero-density estimate in (4) that

\displaystyle  J \ll \log^{O(1)} T. \ \ \ \ \ (17)

We write (14) for the zeroes {L(\sigma_j + it_j, \chi_j)=0} as

\displaystyle  |\sum_n f(n) \overline{g_j}(n)| \gg 1 \ \ \ \ \ (18)

for all {j=1,\dots,J}, where

\displaystyle  f(n) := \frac{1}{n} \beta_{T^{10}} \beta_{T^{100}}(n) 1_{T^{100} \leq n \leq T^{1000}}


\displaystyle  g_j(n) := n^{1-\sigma_j + it_j} \overline{\chi}_j(n) \psi( \frac{\log n}{\log T} )

and {\psi: {\bf R} \rightarrow {\bf R}} is a smooth function supported on {[50, 2000]} which equals {1} on {[100, 1000]}. Note that the {n^{1-\sigma_j}} term in {g_j(n)} is {O( \exp( O(A) ) )}.

We use the generalised Bessel inequality (Proposition 2 from Notes 3) with {\nu(n) := \frac{1}{n} \beta_{R_1}(n)^2} to conclude that

\displaystyle  \sum_{j=1}^J |\sum_n f(n) \overline{g_j(n)}|^2 \leq (\sum_{T^{100} \leq n \leq T^{1000}} \frac{\beta_{T^{100}}(n)^2}{n} )

\displaystyle  \times (\sum_{1 \leq j,j' \leq J} c_j \overline{c_{j'}} \sum_n \frac{\beta_{T^{10}}(n)^2}{n} g_j(n) \overline{g_{j'}}(n) )

where {c_1,\dots,c_J} are complex numbers with {\sum_{j=1}^J |c_j|^2 = 1}. (Strictly speaking, one needs to deal with the issue that the {g_j, g_{j'}, \nu} are not finitely supported, but there is enough absolute convergence here that this is a routine matter.) From Corollary 14 we have

\displaystyle  \sum_{T^{100} \leq n \leq T^{1000}} \frac{\beta_{T^{100}}(n)^2}{n} \ll 1

(note how the logarithmic factors cancel, which is crucial to obtaining our “log-free” estimate) and so from (18), the inequality {c_j \overline{c_{j'}} \ll |c_j|^2 + |c_{j'}|^2} and symmetry it suffices to show that

\displaystyle  \sum_{j'=1}^J |\sum_n \frac{\beta_{T^{10}}(n)^2}{n} g_j(n) \overline{g_{j'}}(n)| \ll \exp(O(A)) \ \ \ \ \ (19)

for all {1 \leq j \leq J}.

We now estimate the expression

\displaystyle  \sum_n \frac{\beta_{T^{10}}(n)^2}{n} g_j(n) \overline{g_{j'}}(n). \ \ \ \ \ (20)

We expand (20) using (8) as

\displaystyle  \sum_d \frac{\rho_{T^{10}, T^{10}}(d)}{d} \sum_m \frac{1}{m} g_j(dm) \overline{g_{j'}(dm)}. \ \ \ \ \ (21)

From (9), the factor {\rho_{T^{10},T^{10}}(d)} vanishes unless {d \leq T^{40}}, and from the support of {g_j} we see that the inner sum vanishes unless {T^{50}/d \leq m \ll T^{O(1)}}. From Exercise 44 of Notes 2, we then have

\displaystyle  \sum_m \frac{1}{m} g_j(dm) \overline{g_{j'}(dm)} = 1_{\chi_j=\chi_{j'}} \frac{\phi(q)}{q} \int x^{1-\sigma_j+1-\sigma_{j'}} x^{i(t_j-t_{j'})} \psi( \frac{\log x}{\log T} )^2\ \frac{dx}{x} \ \ \ \ \ (22)

\displaystyle  + O( \exp( O(A) ) q \frac{T}{T^{50}/d} ).

From the upper bound {\rho_{T^{10}, T^{10}}(d) \ll \tau^{O(1)} d 1_{d \leq T^{40}}} we have

\displaystyle  \sum_d \frac{|\rho_{T^{10}, T^{10}}(d)|}{d} \ll \log^{O(1)} T, \ \ \ \ \ (23)

so we see from (17) and (6) that the contribution of the error term {O( \exp( O(A) ) q \frac{T}{T^{50}/d} )} to (19) is acceptable. For the main term in (22), we see from Corollary 14 that

\displaystyle  \sum_d \frac{\rho_{T^{10},T^{10}}(d)}{d} \ll \frac{1}{\log T}

so (as the main term in (22) is independent of {d}) the remaining contribution to (19) is bounded by

\displaystyle  \ll \frac{1}{\log T} \sum_{j': \chi_j = \chi_{j'}} |\int x^{1-\sigma_j+1-\sigma_{j'}} x^{i(t_j-t_{j'})} \psi( \frac{\log x}{\log T} )^2\ \frac{dx}{x}|.

Making the change of variables {x=\exp(y \log T)}, this becomes

\displaystyle  \sum_{j': \chi_j = \chi_{j'}} |\int e^{y(1-\sigma_j+1-\sigma_{j'}) \log T} e^{iy (t_j-t_{j'}) \log T} \psi( y )^2\ \ dy|.

The integral is bounded by {\exp(O(A))}, and from two integration by parts it is also bounded by

\displaystyle \exp(O(A)) \frac{1}{(|t_j-t_{j'}| \log T)^2}.

On the other hand, for {\chi_j = \chi_{j'}}, the {t_{j'}} are {\frac{A}{\log T}}-separated by hypothesis, and so

\displaystyle  \sum_{j': \chi_j = \chi_{j'}} \min( 1, \frac{1}{(|t_j-t_{j'}| \log T)^2} ) \ll 1,

and the claim follows.

— 2. Consequences of an exceptional zero —

In preparation for proving Theorem 8, we investigate in this section the consequences of a Landau-Siegel zero, that is to say a real character {\chi_1} of some modulus {q \geq 2} with a zero

\displaystyle  L(\beta,\chi_1) = 0

for some {\beta = 1 - \frac{\varepsilon}{\log q}} with {\varepsilon>0} small. For minor technical reasons we will assume that {q} is a multiple of {6}, so that {\chi_1(2) = \chi_1(3) = 0}; this condition can be easily established by multiplying {\chi} by a principal character of modulus dividing {6}. (We will not need to assume here that {\chi_1} is primitive.)

In Notes 2, we already observed that the presence of an exceptional zero was associated with a small (but positive) value of {L(1,\chi_1)}; indeed, from Lemmas 57 and 59 of Notes 2 we see that

\displaystyle  \frac{\varepsilon}{\log q} \ll L(1,\chi_1) \ll \varepsilon \log q. \ \ \ \ \ (24)

Also, from the class number formula (equation (56) from Notes 2) we have

\displaystyle  L(1,\chi_1) \gg q^{-1/2} \log^{-O(1)} q \ \ \ \ \ (25)

and thus

\displaystyle  \varepsilon \gg q^{-1/2} \log^{-O(1)} q.

For the arguments below, one could also use the slightly weaker estimates in Exercise 67 of Notes 2 or Exercise 57 of Notes 3 and still obtain comparable results. We will however not rely on Siegel’s theorem (Theorem 62 of Notes 2) in order to keep all bounds effective.

We now refine this analysis. We begin with a complexified version of Exercise 58 from Notes 2:

Exercise 15 Let {\chi_1} be a non-principal character of modulus {q \geq 2}. Let {s = \sigma+it} with {0 \leq \sigma \leq 2} and {|t| \leq T} for some {T \geq 2}. If {\sigma \neq 1}, show that

\displaystyle  \sum_{n \leq x} \frac{1*\chi_1(n)}{n^s} = \zeta(s) L(s,\chi_1) + \frac{x^{1-s}}{1-s} L(1,\chi_1) \ \ \ \ \ (26)

\displaystyle  + O( (qT)^2 x^{-1/2} \frac{x^{1-\sigma}}{1-\sigma} )

for any {x \geq 1}. (Hint: use the Dirichlet hyperbola method and Exercise 44 from Notes 2.)

If {\chi} is a non-principal character of modulus {q} with {\chi \neq \chi_1}, show that

\displaystyle  \sum_{n \leq x} \frac{1*\chi_1(n) \chi(n)}{n^s} = L(s,\chi) L(s,\chi \chi_1) + O( (qT)^2 x^{-1/2} \frac{x^{1-\sigma}}{1-\sigma} ).

For technical reasons it will be convenient to work with a completely multiplicative variant of the function {1*\chi_1}. Define the arithmetic function {g: {\bf N} \rightarrow {\bf R}} to be the completely multiplicative function such that {g(p) = 1 + \chi_1(p)} for all {p}; this is equal to {1*\chi} at square-free numbers, but is a bit larger at other numbers. Observe that {g} is non-negative, and has the factorisation

\displaystyle  g = 1 * \chi_1 * h

where {h = g * \mu * \chi_1 \mu} is a multiplicative function that vanishes on primes and obeys the bounds

\displaystyle 0 \leq h(p^j) = (1+\chi_1(p))^{j-2} \chi_1(p) \leq 2^{j-2}

for all {j \geq 2} and primes {p}. In particular {h} is non-negative and {h(p^j)=0} for {p=2,3}, since we assumed {\chi_1(2)=\chi_1(3)=0}. From Euler products we see that

\displaystyle  \sum_n \frac{h(n)}{n^{1/2}} = \prod_p \sum_{j=0}^\infty \frac{h(p^j)}{p^{j/2}}

\displaystyle  \leq \prod_{p \geq 5} 1 + \sum_{j=2}^\infty \frac{2^{j-2}}{p^{j/2}}

\displaystyle  \ll 1

so in particular the Dirichlet series {{\mathcal D} h(s) := \sum_n \frac{h(n)}{n^s}} is analytic and uniformly bounded on {\hbox{Re}(s)>1/2}. This implies that

\displaystyle  1 \leq {\mathcal D} h(1) \ll 1 \ \ \ \ \ (27)

and also that

\displaystyle  \sum_{n \leq x} \frac{h(n)}{n^s} = {\mathcal D} h(s) + O( x^{1/2-\sigma} )

whenever {s = \sigma+it} with {\sigma \geq 1/2}.

Taking Dirichlet series, we see that

\displaystyle  \sum_n \frac{g(n)}{n^s} = \zeta(s) L(s,\chi_1) {\mathcal D} h(s)

whenever {\hbox{Re}(s) > 1}; more generally, we have

\displaystyle  \sum_n \frac{g(n)\chi(n)}{n^s} = L(s,\chi) L(s,\chi\chi_1) {\mathcal D}(\chi h)(s)

for any character {\chi}. Now we look at what happens inside the critical strip:

Exercise 16 Let {\chi_1} be a real character of modulus {q} a multiple of {6}, and let {g} be as above. Let {s = \sigma+it} with {3/4 \leq \sigma \leq 2} and {|t| \leq T} for some {T \geq 2}. If {\sigma \neq 1}, show that

\displaystyle  \sum_{n \leq x} \frac{g(n)}{n^s} = \zeta(s) L(s,\chi_1) {\mathcal D} h(s) + \frac{x^{1-s}}{1-s} L(1,\chi_1) {\mathcal D} h(1) \ \ \ \ \ (28)

\displaystyle  + O( (qT)^2 x^{-1/4} \frac{x^{1-\sigma}}{1-\sigma} )

for any {x \geq 1}.

If {\chi} is a non-principal character of modulus {q} with {\chi \neq \chi_1}, show that

\displaystyle  \sum_{n \leq x} \frac{g(n) \chi(n)}{n^s} = L(s,\chi) L(s,\chi \chi_1) {\mathcal D}(\chi h)(s) \ \ \ \ \ (29)

\displaystyle  + O( (qT)^2 x^{-1/4} \frac{x^{1-\sigma}}{1-\sigma} ).

We record a nice corollary of these estimates due to Bombieri, which asserts that the exceptional zero forces {g} to vanish (or equivalently, {\chi_1(p)} to become {-1}) on most large primes:

Lemma 17 (Bombieri’s lemma) Let {\chi_1} be a real character of modulus {q} with an exceptional zero at {\beta = 1 - \frac{\varepsilon}{\log q}} for some sufficiently small {\varepsilon>0}. Then for any {x \gg q^{20}}, we have

\displaystyle  \sum_{x < p \leq x^2} \frac{1+\chi_1(p)}{p} \ll \varepsilon \frac{\log x}{\log q}.

Informally, Bombieri’s lemma asserts that {\chi_1(p)=-1} for most primes between {q^{20}} and {q^{1/\varepsilon}}. The exponent of {20} here can be lowered substantially with a more careful analysis, but we will not do so here. For primes much larger than {q^{1/\varepsilon}}, {\chi_1} becomes equidistributed; see Exercise 23 below.

Proof: Without loss of generality we may take {q} to be a multiple of {6}. We may assume that {\varepsilon \frac{\log x}{\log q} \leq 1}, as the claim follows from Mertens’ theorem otherwise; in particular {x^{1-\beta} \ll 1}.

By (28) for {s=\beta} we have

\displaystyle  \sum_{n \leq x} \frac{g(n)}{n^\beta} = \frac{x^{1-\beta}}{1-\beta} ( L(1,\chi_1) {\mathcal D} h(1) + O( q^2 x^{-1/4} ) ) \ \ \ \ \ (30)

for any {x \geq 1}. Since {{\mathcal D} h(1) \geq 1} and {x \geq q^{20}}, we see from (32), (27) that the error term is dominated by the main term, thus

\displaystyle  \sum_{n \leq x} \frac{g(n)}{n^\beta} \gg \frac{x^{1-\beta}}{1-\beta} L(1,\chi_1) {\mathcal D} h(1).

Next, applying (30) with {x} replaced by {x^3} and subtracting, we have

\displaystyle  \sum_{x < n \leq x^3} \frac{g(n)}{n^\beta} = \frac{x^{1-\beta}}{1-\beta} ( (x^{2(1-\beta)}-1) L(1,\chi_1) {\mathcal D} h(1) + O( q^2 x^{-1/4} ) ).

As {x^{1-\beta} \ll 1}, we have {x^{2(1-\beta)}-1 \ll \varepsilon \frac{\log x}{\log q}} by Taylor expansion. As before, the error term can be bounded by the main term and so

\displaystyle  \sum_{x < n \leq x^3} \frac{g(n)}{n^\beta} \ll \varepsilon \frac{\log x}{\log T} \frac{x^{1-\beta}}{1-\beta} L(1,\chi_1) {\mathcal D} h(1).

Since {g} is non-negative and completely multiplicative, one has

\displaystyle  (\sum_{n \leq x} \frac{g(n)}{n^\beta}) \times (\sum_{x < p \leq x^2} \frac{g(p)}{p^\beta}) \leq \sum_{x < n \leq x^3} \frac{g(n)}{n^\beta}

and thus (since {g(p)=1+\chi_1(p)})

\displaystyle  \sum_{x < p \leq x^2} \frac{1+\chi_1(p)}{p^\beta} \ll \varepsilon \frac{\log x}{\log T}.

Since {x^{1-\beta} \ll 1}, we have {\frac{1}{p} \ll \frac{1}{p^\beta}}, and the claim follows. \Box

Exercise 18 With the hypotheses of Bombieri’s lemma, show that

\displaystyle  \sum_{q^{20/k} < p \leq q^{20}} \frac{1+\chi_1(p)}{p} \ll_k \varepsilon^{1/k}

for any natural number {k}.

Now we can give a more precise version of (24):

Proposition 19 Let {\chi_1} be a real character of modulus {q} with an exceptional zero at {\beta = 1 - \frac{\varepsilon}{\log q}} for some sufficiently small {\varepsilon>0}. Then

\displaystyle  L(1,\chi_1) \asymp \frac{\varepsilon}{\log q} \exp( \sum_{p \leq T} \frac{1+\chi_1(p)}{p} )

for any {T \geq 1} with {\log q \ll \log T \ll \frac{\log q}{\varepsilon}}.

Observe that (24) is a corollary of the {T=q} case of this proposition thanks to Mertens’ theorem and the trivial bounds {0 \leq 1+\chi_1(p) \leq 2}. We thus see from this proposition and Bombieri’s lemma that the exceptional zero {\beta} controls {\chi_1} at primes larger than {q^{20}}, but that {L(1,\chi)} is additionally sensitive to the values of {\chi_1} at primes below this range. For an even more precise formula for {L(1,\chi)}, see this paper of Goldfeld and Schinzel, or Exercise 24 below.

Proof: By Bombieri’s lemma and Mertens’ theorem, it suffices to prove the asymptotic for {T=q}.

We begin with the upper bound

\displaystyle  L(1,\chi_1) \ll \frac{\varepsilon}{\log q} \exp( \sum_{p \leq q} \frac{1+\chi_1(p)}{p} ).

Applying (26) with {s = 1 + \frac{\varepsilon}{\log q}} and {x=q^{10}} we have

\displaystyle  \sum_{n \leq q^{10}} \frac{1*\chi_1(n)}{n^s} = \zeta(s) L(s,\chi_1) - \frac{\exp(-10\varepsilon)}{s-1} ( L(1,\chi_1) + O( q^{-3} ) ). \ \ \ \ \ (31)

The left-hand side is non-negative and {\zeta(s) \gg \frac{1}{s-1}}, so we conclude (using (32)) that

\displaystyle  L(1,\chi_1) \ll L(s,\chi_1).

From Euler products and Mertens’ theorem we have

\displaystyle  L(s,\chi_1) \asymp \exp( \sum_p \frac{\chi_1(p)}{p^{1+\varepsilon/\log q}} )

\displaystyle  \asymp \frac{\varepsilon}{\log T} \exp( \sum_p \frac{1+\chi_1(p)}{p^{1+\varepsilon/\log q}} ).

But from Lemma 17 and Mertens’ theorem we see that

\displaystyle  \sum_p \frac{1+\chi_1(p)}{p^{1+\varepsilon/\log q}} = \sum_{p \leq q} \frac{g(p)}{p} + O(1)

and the claim follows.

Now we establish the matching lower bound

\displaystyle  L(1,\chi_1) \gg \frac{\varepsilon}{\log q} \exp( \sum_{p \leq q} \frac{1+\chi_1(p)}{p} ).

Applying (26) with {\beta = 1 - \frac{\varepsilon}{\log q}} and {x=q^{10}} we have

\displaystyle  \sum_{n \leq q^{10}} \frac{1*\chi_1(n)}{n^\beta} = \frac{\exp(10\varepsilon)}{1-\beta} ( L(1,\chi_1) + O( q^{-3} ) ).

For {n \leq q^{10}}, we have {\frac{1}{n^\beta} \ll \frac{1}{n^s}}, and thus by (32)

\displaystyle  \sum_{n \leq q^{10}} \frac{1*\chi_1(n)}{n^s} \ll \frac{1}{s-1} L(1,\chi).

Inserting this into (31) and using (32) and {\zeta(s) \ll \frac{1}{s-1}} we conclude that

\displaystyle  L(s,\chi) \ll L(1,\chi)

and the claim then follows from the preceding calculations. \Box

Remark 20 One particularly striking consequence of an exceptional zero {L(\beta,\chi_1)} is that the spacing of zeroes of other {L}-functions become extremely regular; roughly speaking, for most other characters {\chi} whose conductor {q} is somewhat (but not too much) larger than the conductor {q_1} of {\chi_1}, the zeroes of {L(s,\chi) L(s,\chi\chi_1)} (at moderate height) mostly lie on the critical line and are spaced in approximate arithmetic progression; this “alternative hypothesis” is in contradiction to to the pair correlation conjecture discussed in Section 4 of Supplement 4. This phenomenon was first discovered by Montgomery and Weinberger and can roughly be explained as follows. By an approximate functional equation similar to Exercise 54 of Supplement 3, one can approximately write {L(s,\chi) L(s,\chi_1)} as the sum of {\sum_{n \lessapprox \sqrt{qq_1}} \frac{\chi (1*\chi_1)(n)}{n^s}} plus {\sum_{n \lessapprox \sqrt{qq_1}} \frac{\bar{\chi} (1*\chi_1)(n)}{n^{1-s}}} times a gamma factor which oscillates like {(q|t|)^{it}} when {s = 1/2+it}. The smallness of {1*\chi_1(n)} on average for medium-sized {n} (as suggested for instance by Bombieri’s lemma) suggests that these sums should be well approximated by much shorter sums, which oscillate quite slowly in {t}. This gives an approximation to {L(1/2+it,\chi) L(1/2+it,\chi_1)} that is of the form {F(t) + (q|t|)^{it} G(t)} for slowly varying {F,G}, which can then be used to place the zeroes of this function in approximate arithmetic progression on the real line.

— 3. The Deuring-Heilbronn repulsion phenomenon —

We now prove Theorem 8. Let {q \geq 2} be such that there is an exceptional zero {\beta = 1 - \frac{\varepsilon}{\log qT}} with {\varepsilon} small, associated to some quadratic character {\chi_1} of modulus {q}:

\displaystyle  L( \beta, \chi_1 ) = 0.

From the class number bound (equation (56) of Notes 2; one could also use Exercise 67 of Notes 2 for a similar bound) we have

\displaystyle  L(1,\chi_1) \gg q^{-1/2} \log^{-O(1)} q \ \ \ \ \ (32)


\displaystyle  \frac{1}{\varepsilon} \ll q^{1/2} \log^{O(1)} q. \ \ \ \ \ (33)

Let {T \geq 2}, let {\chi} be a character of modulus {q} (possibly equal to {\chi_1} or the principal character), and suppose we have

\displaystyle  L( \sigma+it, \chi ) = 0 \ \ \ \ \ (34)

for some {|t| \leq T}, with {\sigma+it \neq \beta}. Our task is to show that

\displaystyle  1 - \sigma \gg \log\frac{1}{\varepsilon} \frac{1}{\log(qT)}. \ \ \ \ \ (35)

We may assume that

\displaystyle  \sigma > 0.99 \ \ \ \ \ (36)

(say), since the claim is trivial otherwise. By multiplying by the principal character of modulus {6} if necessary, we may assume as before that {q} is a multiple of {6}, so that we can utilise the multiplicative function {g} from the previous section. By enlarging {T}, we may assume as in Section 1 that

\displaystyle  T \geq q^{10}; \ \ \ \ \ (37)

we may also assume that {T} is larger than any specified absolute constant. From the classical zero-free region and the Landau-Page theorem we have

\displaystyle  \sigma - 1 \gg \frac{1}{\log T}. \ \ \ \ \ (38)

The task (35) is then equivalent to showing that

\displaystyle  T^{1-\sigma} \gg \varepsilon^{-c} \ \ \ \ \ (39)

for some absolute constant {c>0}.

We recall the sieve cutoffs

\displaystyle  \beta_R(n) = \sum_{d|n} \mu(d) f( \frac{\log d}{\log R} )

from Section 1, which were used in the zero detector. The main difference is that we will “twist” the polynomial by the completely multiplicative function {g}:

Proposition 21 (Zero-detecting polynomial) Let the notation and assumptions be as above.

Proof: First suppose that {\chi} is not equal to {\chi_1} or the principal character. Since {L(s,\chi)=0}, we see from (29), (37), (36), (38) that

\displaystyle  \sum_{n \leq x} \frac{g(n) \chi(n)}{n^s} \ll T^{-10}

(say) for any {T^{500} \leq x \leq T^{1000}}. In particular, as {\rho} is supported on {[1,T^{500}]} and one has {\rho(n)=n^{o(1)}} from the divisor bound, one has

\displaystyle  \sum_{n \leq T^{1000}} (\rho*1)(n) \frac{g(n) \chi(n)}{n^s} \ll \sum_d \frac{|\rho(d)|}{d^\sigma} T^{-10}

\displaystyle  \ll T^{-1}

(say), thanks to (36). Since {\rho*1(n) = \beta_{T^{10}} \beta_{T^{100}}(n)} equals {1_{n=1}} for {n < T^{100}}, we thus conclude (40) since {T} is assumed to be large.

Now suppose that {\chi} is {\chi_1} or the principal character, so that {\zeta(s) L(s,\chi)=0}. From (28), (37), (36), (38) we then have

\displaystyle  \sum_{n \leq x} \frac{g(n)}{n^s} = \frac{x^{1-s}}{1-s} L(1,\chi) {\mathcal D} h(1) + O( T^{-10} )

for {T^{500} \leq x \leq T^{1000}}. By a similar calculation to before, we have

\displaystyle  \sum_{n \leq T^{1000}} (\rho*1)(n) \frac{g(n)}{n^s}

\displaystyle  = \sum_d \frac{\rho(d) g(d)}{d^s} \frac{(T^{1000}/d)^{1-s}}{1-s} L(1,\chi) {\mathcal D} h(1) + O( T^{-1} )

\displaystyle  = \frac{T^{1000(1-s)}}{1-s} {\mathcal D} h(1) L(1,\chi) \sum_d \frac{\rho(d) g(d)}{d} + O( T^{-1} )

\displaystyle  \ll T^{O(1-s)} \varepsilon \exp\left( \sum_{p \leq T} \frac{g(p)}{p} \right) \left|\sum_d \frac{\rho(d) g(d)}{d}\right| + T^{-1}

where we have used (38) and Proposition 19 in the last line. The claim (41) then follows from Corollary 14. \Box

Using the estimates from the previous section, we can establish the following bound:

Proposition 22 We have

\displaystyle  \sum_{T^{100} \leq n \leq T^{1000}} |\beta_{T^{10}}(n)| |\beta_{T^{100}}(n)| \frac{g(n)}{n} \ll \varepsilon^{1/2}.

The point here is that the sieve weights {\beta_{T^{10}}} and {\beta_{T^{100}}} are morally restricting to almost primes, and that {g} should be small on such numbers by Bombieri’s lemma. Assuming this proposition, we conclude that the left-hand sides of (40) or (41) are {O( \varepsilon^{1/2} T^{O(1-\sigma)})}, and (39) follows.

Proof: By the Cauchy-Schwarz inequality it suffices to show that

\displaystyle  \sum_{T^{100} \leq n \leq T^{1000}} \beta_{T^{10}}(n)^2 \frac{g(n)}{n} \ll \varepsilon \ \ \ \ \ (42)


\displaystyle  \sum_{T^{100} \leq n \leq T^{1000}} \beta_{T^{100}}(n)^2 \frac{g(n)}{n} \ll 1. \ \ \ \ \ (43)

We begin with the second bound (42), which we establish by quite crude estimates. By a Fourier expansion we can write

\displaystyle  f(u) = \int_{\bf R} e^{-itu} F(t)\ dt \ \ \ \ \ (44)

for some rapidly decreasing function {F: {\bf R} \rightarrow {\bf C}}, and thus

\displaystyle  \beta_R(n) = \int_{\bf R} \sum_{d|n} \mu(d) d^{-it/\log R} F(t)\ dt \ \ \ \ \ (45)

which factorises as

\displaystyle  \beta_R(n) = \int_{\bf R} \prod_{p|n} (1 - p^{-it/\log R}) F(t)\ dt.

Bounding {1 - p^{-it/\log R} \ll \min( t \frac{\log p}{\log R}, 1 )}, we thus have

\displaystyle  \beta_R(n) \ll_A \int_{\bf R} \prod_{p|n} O( \min( t \frac{\log p}{\log R}, 1 ) ) (1+|t|)^{-A}\ dt

for any {A}. Squaring and using Cauchy-Schwarz, we conclude that

\displaystyle  \beta_R(n) \ll_A \int_{\bf R} \prod_{p|n} O( \min( t \frac{\log p}{\log R}, 1 )^2 ) (1+|t|)^{-A}\ dt

for any {A}. In particular, for {n \leq T^{1000}}, we have

\displaystyle  \beta_{T^{100}}(n)^2 \frac{g(n)}{n} \ll_A \int_{\bf R} \prod_{p \leq T^{1000}: p|n} \frac{2^{\hbox{ord}_p(n)}}{p^{\hbox{ord}_p(n)}} O( \min( t \frac{\log p}{\log T}, 1 )^2 ) (1+|t|)^{-A}\ dt,

and so we can bound the left-hand side of (43) by

\displaystyle  \ll_A \int_{\bf R} \prod_{p \leq T^{1000}} (1 + O( \frac{\min( t \frac{\log p}{\log T}, 1 )}{p} )) (1+|t|)^{-A}\ dt,

which we bound by

\displaystyle  \ll_A \int_{\bf R} \exp( \sum_{p \leq T^{1000}} O( \frac{\min( t \frac{\log p}{\log T}, 1 )}{p} ) ) (1+|t|)^{-A}\ dt.

By Mertens’ theorem we have

\displaystyle  \sum_{p \leq T^{1000}} \frac{\min( t \frac{\log p}{\log T}, 1 )}{p} \ll \log(2+|t|),

and the claim follows by taking {A} large enough.

Now we establish (42). By dyadic decomposition it suffices to show that

\displaystyle  \sum_{n \leq x} \beta_{T^{10}}(n)^2 g(n) \ll \frac{\varepsilon}{\log T} x \ \ \ \ \ (46)

for all {T^{100} \leq x \leq T^{1000}}. The left-hand side may be written as as

\displaystyle  \sum_d \rho_{T^{10},T^{10}}(d) \sum_{n \leq x/d} g(n).

From the hyperbola method we see that

\displaystyle  \sum_{n \leq y} 1*\chi(n) = L(1,\chi) y + O( q y^{1/2} )

for any {y \geq 1}, and thus

\displaystyle  \sum_{n \leq x/d} g(n) = L(1,\chi) \frac{x}{d} {\mathcal D} h(1) + O( q x^{1/2} d^{-1/2} ).

Since {\rho_{T^{10},T^{10}}(d)} is supported on {[0,T^{40}]} and is bounded by {\tau(d)^{O(1)}}, the contribution of the {O( q x^{1/2} d^{-1/2} )} is easily seen to be acceptable (using (33), (37)). The contribution of the main term is

\displaystyle  L(1,\chi) {\mathcal D} h(1) \sum_d \rho_{T^{10},T^{10}}(d) \frac{g(d)}{d}

but this is acceptable by Lemma 13, (27), and Proposition 19. \Box

The proof of Theorem 8 is now complete.

Exercise 23 Let {\chi_1} be a real quadratic character of modulus {q} with a zero at {\beta = 1 - \frac{\varepsilon}{\log q}} for some small {\varepsilon > 0}. Show that

\displaystyle  \sum_{p \leq x} \chi(p) \log p = - \frac{x^\beta}{\beta} + O( \exp( - c \log \frac{1}{\varepsilon} \frac{\log x}{\log q} ) x )

and hence

\displaystyle  \sum_{p \leq x} \chi(p) \log p \ll \exp( - \varepsilon \frac{\log x}{\log q} ) x

for all {x \geq 1} and some absolute constant {c>0}. (Hint: use (3) and the explicit formula.) Roughly speaking, this exercise asserts that {\chi(p)} is equidistributed for primes {p} with {\log p} much larger than {\frac{1}{\varepsilon} \log q}.

Exercise 24 Let {\chi_1} be a real quadratic character of modulus {q} with a zero at {1 - \frac{\varepsilon}{\log q}} for some small {\varepsilon > 0}. Show that

\displaystyle  L(1,\chi) = (1 + O(\varepsilon)) \frac{\varepsilon}{\log q} \prod_{p \leq q^C} (1-\frac{1}{p})^{-1}(1-\frac{\chi(p)}{p})^{-1}

if {C} is a sufficiently large absolute constant. (Hint: use Exercise 81, Lemma 40, and Theorem 41 of Notes 1, as well as Exercise 23.

Exercise 25 (Bombieri’s zero density estimate) Under the hypotheses of Theorem 8, establish the estimate (5). (Hint: repeat the arguments in Section 1, but now “twisted” by {g}.)

Filed under: 254A - analytic prime number theory, math.NT Tagged: arithmetic progressions, Deuring-Heilbronn repulsion, prime number theorem

John BaezResource Theories

by Brendan Fong

Hugo Nava-Kopp and I have a new paper on resource theories:

• Brendan Fong and Hugo Nava-Kopp, Additive monotones for resource theories of parallel-combinable processes with discarding.

A mathematical theory of resources is Tobias Fritz’s current big project. He’s already explained how ordered monoids can be viewed as theories of resource convertibility in a three part series on this blog.

Ordered monoids are great, and quite familiar in network theory: for example, a Petri net can be viewed as a presentation for an ordered commutative monoid. But this work started in symmetric monoidal categories, together with my (Oxford) supervisor Bob Coecke and Rob Spekkens.

The main idea is this: think of the objects of your symmetric monoidal category as resources, and your morphisms as ways to convert one resource into another. The monoidal product or ‘tensor product’ in your category allows you to talk about collections of your resources. So, for example, in the resource theory of chemical reactions, our objects are molecules like oxygen O2, hydrogen H2, and water H2O, and morphisms things like the electrolysis of water:

2H2O → O2 + 2H2

This is a categorification of the ordered commutative monoid of resource convertibility: we now keep track of how we convert resources into one another, instead of just whether we can convert them.

Categorically, I find the other direction easier to state: being a category, the resource theory is enriched over \mathrm{Set}, while a poset is enriched over the poset of truth values or ‘booleans’ \mathbb{B} = \{0,1\}. If we ‘partially decategorify’ by changing the base of enrichment along the functor \mathrm{Set} \to \mathbb{B} that maps the empty set to 0 and any nonempty set to 1, we obtain the ordered monoid corresponding to the resource theory.

But the research programme didn’t start at resource theories either. The starting point was ‘partitioned process theories’.

Here’s an example that guided the definitions. Suppose we have a bunch of labs with interacting quantum systems, separated in space. With enough cooperation and funding, they can do big joint operations on their systems, like create entangled pairs between two locations. For ‘free’, however, they’re limited to classical communication between the locations, although they can do the full range of quantum operations on their local system. So you’ve got a symmetric monoidal category with objects quantum systems and morphisms quantum operations, together with a wide (all-object-including) symmetric monoidal subcategory that contains the morphisms you can do with local quantum operations and classical communication (known as LOCC operations).

This general structure: a symmetric monoidal category (or SMC for short) with a wide symmetric monoidal subcategory, is called a partitioned process theory. We call the morphisms in the SMC processes, and those in the subSMC free processes.

There are a number of methods for building a resource theory (i.e. an SMC) from a partitioned process theory. The unifying idea though, is that your new SMC has the processes f,g as objects, and morphisms f \to g ways of using the free processes to build g from f.

But we don’t have to go to fancy sounding quantum situations to find examples of partitioned process theories. Instead, just look at any SMC in which each object is equipped with an algebraic structure. Then the morphisms defining this structure can be taken as our ‘free’ processes.

For example, in a multigraph category every object has the structure of a ‘special commutative Frobenius algebra’. That’s a bit of a mouthful, but John defined it a while back, and examples include categories where morphisms are electrical circuits, and categories where morphisms are signal flow diagrams.

So these categories give partitioned process theories! This idea of partitioning the morphisms into ‘free’ ones and ‘costly’ ones is reminiscent of what I was saying earlier about the operad of wiring diagrams about it being useful to separate behavioural structure from interconnection structure.

This suggests that we can also view the free processes as generating some sort of operad, that describes the ways we allow ourselves to use free processes to turn processes into other processes. If we really want to roll a big machine out to play with this stuff, framed bicategories may also be interesting; Spivak is already using them to get at questions about operads. But that’s all conjecture, and a bit of a digression.

To get back to the point, this was all just to say that if you find yourself with a bunch of resistors, and you ask ‘what can I build?’, then you’re after the resource theory apparatus.

You can read more about this stuff here:

• Bob Coecke, Tobias Fritz and Rob W. Spekkens, A mathematical theory of resources.

• Tobias Fritz, The mathematical structure of theories of resource convertibility I.

May 11, 2015

Gordon WattsReally? Is it that different?

An article from the New York Times is making its rounds on various social media circles I’m a member of, “What is the Point of a Professor?” It has lots of sucker-punch quotes, like

But as this unique chapter of life closes and they reflect on campus events, one primary part of higher education will fall low on the ladder of meaningful contacts: the professors.

Or this one:

In one national survey, 61 percent of students said that professors frequently treated them “like a colleague/peer,” while only 8 percent heard frequent “negative feedback about their academic work.” More than half leave the graduation ceremony believing that they are “well prepared” in speaking, writing, critical thinking and decision-making.

Obviously implicit is that they aren’t well prepared! This is from an op-ed bit written by Mark Bauerlein, a professor at Emory. He also authored a book titled (which I have not read):

“The Dumbest Generation: How the Digital Age Stupefies Young Americans and Jeopardizes Our Future (or, Don’t Trust Anyone Under 30).”

You can probably already tell this has pissed me off. Smile

This sort of hatchet job of a critique of university students gets it part-right, but, I think, really misses the point. Sorting through the article and trying to pull out a central idea that he wants all professors to adopt, I came away with this quote:

Since the early 2000s, I have made students visit my office every other week with a rough draft of an essay. We appraise and revise the prose, sentence by sentence. I ask for a clearer idea or a better verb; I circle a misplaced modifier and wait as they make the fix.

This one-on-one interaction he stresses as the cure for all the ills he has outlined. Let me just say that if I were devote this much time to each of my students I’d still be single. In the modern day and age of universities and professor’s lives (and jobs), there just isn’t time! Too many people want a university education, and there just isn’t enough money in the education system to fun this sort of interaction (and it is getting worse in many of the national largest publics).


But, frankly, if I look at my life and my work, it doesn’t seem that bad. I’m constantly mentoring undergraduates and graduate students. He claims that professors who do research don’t want interaction with their students because it detracts from their research… I doubt it is any different in English than it is in Physics – but that interaction is pretty much the only way I can get good undergraduates to start working with me! And I’m far from alone at the University of Washington.

The two views (I’m doing plenty of mentoring and his that there isn’t enough contact) are compatible: student/professor ratios are an easy explanation. But that isn’t everything – my students are not the same sort of student I was. This quote really irked me as being rather arrogant:

Naturally, students looked to professors for moral and worldly understanding.

Wow. I don’t think he has met most of my students! By the time they get to me they have a pretty good understanding of how the world works. I can help guide them though quantum mechanics and the philosophical questions that raises, but the internet and their friend groups are much stronger influences than I am for everything else!

His book title also makes me think he has missed everything that the new digital age has to offer. It feels like the constant discussion I have when organizing a conference: should we turn off wifi in the conference room and force everyone to listen to the talks, or leave it on? I see benefits and detriments to both – but you can’t hold back progress. Especially as the younger generations grow up and start attending conferences this will not be an option. And they and forward conference organizers will find ways to use it to the attendee’s benefit – the same way I hope it will happen in classrooms. I should say as a caveat, I don’t know anyone has universally cracked that nut yet!

In short:

  • He is right, in large classes can undermine the interaction between students and professors. Blame lies not just with the professors as his article implies here.
  • There is a lot of interaction going on none-the-less. Taking advantage of electronic communication, not just in-person.
  • Undergraduates learn at a university from many sources (e.g. the internet, social groups/media, etc.) in a way they didn’t a generation ago. This is good, not bad.
  • The kids are better than he seems to be giving them credit for. Smile

Edit: I originally saw this post in my fb feed, and my friend Salvatore Rappoccio had a fantastic response. It was private at the time, but now that he has made his reply to the article public”":

What? I can’t hear you over the four undergrad students I’m sending to Fermilab for the summer or the two undergrads per semester I’ve mentored for three years. If you want to chat you’ll have to take a number behind the 20-ish students per semester I sit down with for philosophical discussions or career advice outside of my office hours. I have, in the last semester, discussed physics, career choices, fatherhood, kerbal space program, and drywalling with a 3-tour vet, a guy working full time as a contractor to put himself through school, an electrician going back to school for engineering, and a student practically in tears that I bothered to tell her that she improved a lot over the semester, just to name the most memorable ones.

So What’s the point of a professor, you ask?

To educate, obviously. And not just in the classroom. Maybe it’s just you who falls into the “useless” category.

Secret Blogging SeminarThe dishonest stopping paradox

A number of blogs I read are arguing about a paradox, posed by tumblr blogger perversesheaf. Here is my attempt to explain what the paradox says.

Suppose that a drug company wishes to create evidence that a drug is beneficial, when in fact its effect is completely random. To be concrete, we’ll say that the drug has either positive or negative effect for each patient, each with probability 1/2. The drug company commits in advance that they will state exactly what their procedure will be, including their procedure for when to stop tasks, and that they will release all of their data. Nonetheless, they can guarantee that a Bayesian analyst with a somewhat reasonable prior will come to hold a strong belief that the drug does some good. Below the fold, I’ll explain how they do this, and think about whether I care.

To be concrete, let’s suppose that the drug company knows that the analyst begins with a uniform prior on the drug’s efficacy: she thinks it is equally likely to be any real number between 0 and 1. And the drug company’s goal is to get her to hold a greater than 95 percent belief that the drug’s benefit is greater than 1/2.

The drug company chooses (and announces!) the following procedure: They will continue to run patients, one at a time, until a point where they have run N patients and at least N/2+\sqrt{N} have benefited. This will eventually happen with probability 1. At this point, they stop the study and release all the data. If the analyst updates on this, she will believe that the drug has effectiveness x with a probability that is roughly a bell curve around x = 1/2+1/\sqrt{N} and standard deviation 1/(2 \sqrt{N}). (I didn’t check the constants here, but this is definitely the right form for the answer and, if the constants are wrong then just change N/2+\sqrt{N} to N/2+10 \sqrt{N}.) In particular, the analyst would be willing to bet at 19 to 1 odds that the drug does some good.

If we think that the key to this error is that the length of the experiment is allowed to be infinite, perversesheaf gives some practical numbers based on simulation, which I have also checked in my own simulations. If the experiment is cut off after 10,000 patients, or when N/2+\sqrt{N} are helped, which ever comes first, then it is the latter situation about 30% of the time.

I mostly want to open this up for discussion, but here are some quick points I noticed:

\bullet The uniform prior isn’t important here. As long as the analyst starts out with some positive probability assigned to the whole interval (1/2, 1/2+\epsilon) for some \epsilon>0, you get similar results.

\bullet As Reginald Reagan points out, the analyst rarely thinks the drug is very good.

\bullet To state the last point in a different manner, if the drug was even mildly harmful (say it helped 45% of patients and harmed 55%), this problem doesn’t occur. With those numbers, I ran a simulation and found that only 6 out of 100 analysts were fooled. Moreover, in the limit as the simulation goes to \infty, the fraction of analysts who are fooled will stay finite: If a random walk is biased towards - \infty, the odds that it will be greater than 0, let alone greater than 0 + \sqrt{N}, drop off exponentially.

Normally, I’d like to think a bit more about the question before saying something, but I am getting tired and I want to put up this post for one very key reason: Tumblr is an absurd awful interface for conversations. So, I am hoping that if I get a conversation started here, maybe we will be able to actually talk about it usefully.

May 10, 2015

Mark Chu-CarrollUnderstanding Expressions


I’m going to be trying something a bit different with this blog.

What I’ve tried to do here on GM/BM is make each post as self-contained as possible. Obviously, many things take more than one post to explain, but I’ve done my best to take things, and divide them into pieces where there’s a basic concept or process that’s the focus of each post.

I’m finding that for this type theory stuff, I’m having a hard time doing that. Or rather, given my work schedule right now when I’m trying to write about type theory, I’m finding it hard to find enough time to do that, and still be posting regularly. (That sounds like a complaint, but it’s not meant to be. I started a new job at Dropbox about three months ago, and I absolutely love it. I’m spending a lot of time working because I’m having so much fun, not because some big mean boss guy is leaning over me and threatening.)

Anyway, the point of this whole diversion is that I really want to get this blog back onto a regular posting schedule. But to do that, I’m going to have to change my posting style a bit, and make the individual posts shorter, and less self-contained. I’m definitely interested in what you, my readers, think of this – so please, as I roll into this, let me know if you think it’s working or not. Thanks!

In this post, we’re going to start looking at expressions. This might seem like it’s a diversion from the stuff I’ve been writing about type theory, but it really isn’t! Per Martin-Löf developed a theory of expressions which is used by type theorists and many others, and we’re going to be looking at that.

We’ve all seen arithmetic expressions written out since we were in first grade. We think we understand what they mean. But actually, most of us have never really stopped and thought precisely about what an expression actually means. Most of the time, that’s OK: we’ve got an intuitive sense of it that’s good enough. But for type theory, it’s not sufficient. In fact, even if we did have a really good, formal notion of expressions, it wouldn’t be right for type theory: in type theory, we’re rebuilding mathematics from a foundation of computability, and that’s not the basis of any theory of expressions that’s used in other mathematical fields.

Why is this a problem?

Let’s start by looking at a nice, simple expression:

x^2 + 3x + 7

What does that mean? Roughly speaking, it’s a function with one parameter: f(x) = x^2 + 3x + 7. But that doesn’t really tell us much: all we’ve really done is add a bit of notation. We still don’t know what it means.

Let’s take it a step further. It’s actually describing a computation that adds three elements: +(x^2, 3x, 7). But that’s not quite right either, because we know addition is binary. That means that we need to decide how to divide that addition into two parts. From the commutative property, we know that it doesn’t matter which way we divide it – but from a computational perspective, it might: doing it one way versus the other might take much longer!

We’ll pick left-associative, and say that it’s really +(+(x^2, 3x), 7). We also need to expand other parts of this into this functional idea. If we follow it all out, we wind up with: +(+(*(x,x), *(3, x)),7).

We’ve converted the expression into a collection of applications of primitive functions. Or in terms that are more familiar to geeks like me, we’ve taken the expression, and turned it into an abstract syntax tree (AST) that describes it as a computation.

We’re still being pretty vague here. We haven’t really defined our notion of function or computation. But even with that vagueness, we’ve already started making the notion of what an expression is much more concrete. We’ve taken the abstract notion of expressions, and translated it to a much less abstract notion of a computation expressed as a sequence of computable functions.

This is a really useful thing to do. It’s useful enough that we don’t want to limit it to just “pure” expressions. In the type theoretic view of computation, everything is an expression. That’s important for multiple reasons – but to make it concrete, we’re going to eventually get around to showing how types work in expressions, what it means for an expression to be well-typed, how to infer types for expressions, and so on. We want all of that to work for any program – not just for something that looks like a formula.

Fortunately, this works. We can also take an “expression” like for i in 1 .. 10 do f(i), and analyze it as a function: for(i, 1, 10, f(i)).

So, we’ve got a way of understanding expressions as functions. But even if we want to keep the computational side of things abstract and hand-wavy, that’s still not really enough. We’re closer to understanding expressions, but we’ve still got some huge, gaping holes.

Let’s jump back to that example expression: x^2 + 3x + 7. What does it mean? What we’ve seen so far is that we can both understand it, as a series of function calls: +(+(*(x, x), *(3, x)), 7). But we’d like to be able to evaluate it – to execute the computation that it describes. But we can’t do that: it’s got a gaping hole named x. What do we do with that?

We’re missing a really important notion: funcional abstraction. Our expression isn’t just an expression: what it really is is a function. We alluded to that before, but now we’re going to deal with it head-on. That expression doesn’t really define a computation: it defines a computational object that computes the function. When an expression has free variables – that is, variables that aren’t assigned a meaning within the expression – our expression represents something that’s been abstracted a level: instead of being a computation that produces a value, it’s an object that takes a parameter, and performs a computation on its parameter(s) in order to produce a value.

In our expression x^2 + 3x + 7, we’re looking at an expression in one free variable – which makes it a single-parameter function. In the notation of type theory, we’d write it as (x)(+(+(*(x, x), *(3, x)), 7) – that is,
the parameter variable in parens ahead of the expression that it parameterizes. (Yes, this notation stinks; but I’m sticking with the notations from the texts I’m using, and this is it.)

This notion, of parameters and function abstraction turns out to be more complex than you might expect. I’m going to stop this post here – but around wednesday, I’ll post the next part, which will look at how to understand the arity of an expression.

Michael NielsenWhere will the key ideas shaping the future of scientific publishing come from?

Stefan Janusz from the Royal Society asked me to comment briefly on where I’d look for new ideas about the future of scientific publishing. Here’s my response, crossposted to the Royal Society’s blog about scientific publishing.

It’s tempting to assume the key ideas will come from leading scientists, journal publishers, librarians, policy makers, and so on.

While these are all important groups, I don’t think they’re going to invent the key ideas behind the future of scientific publishing. That will be done primarily by two groups of outsiders: exceptionally creative user interface designers, and people who design group experiences.

Let me unpack both those statements.

The first important group is user interface designers. Ultimately, scientific journals are a user interface to humanity’s scientific knowledge, and people such as Henry Oldenburg, Johannes Gutenberg, and Aldus Manutius were all interface designers.

Now, many people working in science don’t understand the importance or difficulty of user interface design. It’s tempting to think it’s either about “making things pretty” or about “making things easy to use”. And, in fact, much work on interface design doesn’t go much deeper than those tasks. But the designers I’m talking about are doing something much deeper. They’re attempting to invent powerful new representations for knowledge, representations that will let us manipulate and comprehend knowledge in new ways.

Think, for example, of how the invention of user interface ideas such as the hyperlink and the search box have transformed how we relate to knowledge. Or take a look at some of Bret Victor’s beautiful designs for changing how we think about systems and mathematics. In a more playful vein, look at Marco ten Bosch’s gorgeous game Miegakure, which challenges people to learn to think in four spatial dimensions. Or consider the way programming languages such as Coq and Logo change the way people interface to mathematical knowledge.

The second group I named is people who design group experiences. In addition to being user interfaces to scientific knowledge, journals are also a medium for collective intelligence. The design of media for collective intelligence isn’t yet a widely recognized field. But there are many people doing amazing things in this area. Just as a random sample, not necessarily related to science, take a look at Ned Gulley’s work on the Mathworks programming competition. Or economist Robin Hanson on idea futures. Or even people such as the musician Bobby McFerrin, who understands crowd behaviour as well as anyone. Or Jane McGonigal and Elan Lee’s work on creating games based on “puzzles and challenges that no single person could solve on their own”. This broad vein of work is a key direction from which important new fundamental ideas will ultimately come.

Let me finish by identifying a questionable assumption implicit in the question “Where will the future of scientific publishing come from?” The assumption is that there will be a single future for scientific publishing, a kind of jazzed-up version of the scientific article, and it’s simply up to enterprising publishers to figure out what it is.

I believe that, if things go well, there will instead be a proliferation of media types. Some will be informal, cognitive media for people to think and carry out experiments with. Look, for example, at some of Peter Norvig’s ipython notebooks. Others will be collaborative environments for building up knowledge – look at Tim Gowers’s and Terry Tao’s use of blogs and wikis to solve mathematical problems collaboratively. And some will be recognizable descendants of the “paper of record” model common in journals today. So what I hope we’ll see is a much richer and more varied ecosystem, and one that continues to change and improve rapidly over many decades

May 09, 2015

John PreskillMingling stat mech with quantum info in Maryland

I felt like a yoyo.

I was standing in a hallway at the University of Maryland. On one side stood quantum-information theorists. On the other side stood statistical-mechanics scientists.* The groups eyed each other, like Jets and Sharks in West Side Story, except without fighting or dancing.

This March, the groups were generous enough to host me for a visit. I parked first at QuICS, the Joint Center for Quantum Information and Computer Science. Established in October 2014, QuICS had moved into renovated offices the previous month. QuICSland boasts bright colors, sprawling armchairs, and the scent of novelty. So recently had QuICS arrived that the restroom had not acquired toilet paper (as I learned later than I’d have preferred).

Interaction space

Photo credit: QuICS

From QuICS, I yoyo-ed to the chemistry building, where Chris Jarzynski’s group studies fluctuation relations. Fluctuation relations, introduced elsewhere on this blog, describe out-of-equilibrium systems. A system is out of equilibrium if large-scale properties of it change. Many systems operate out of equilibrium—boiling soup, combustion engines, hurricanes, and living creatures, for instance. Physicists want to describe nonequilibrium processes but have trouble: Living creatures are complicated. Hence the buzz about fluctuation relations.

My first Friday in Maryland, I presented a seminar about quantum voting for QuICS. The next Tuesday, I was to present about one-shot information theory for stat-mech enthusiasts. Each week, the stat-mech crowd invites its speaker to lunch. Chris Jarzynski recommended I invite QuICS. Hence the Jets-and-Sharks tableau.

“Have you interacted before?” I asked the hallway.

“No,” said a voice. QuICS hadn’t existed till last fall, and some QuICSers hadn’t had offices till the previous month.**


“We’re QuICS,” volunteered Stephen Jordan, a quantum-computation theorist, “the Joint Center for Quantum Information and Computer Science.”

So began the mingling. It continued at lunch, which we shared at three circular tables we’d dragged into a chain. The mingling continued during the seminar, as QuICSers sat with chemists, materials scientists, and control theorists. The mingling continued the next day, when QuICSer Alexey Gorshkov joined my discussion with the Jarzynski group. Back and forth we yoyo-ed, between buildings and topics.

“Mingled,” said Yigit Subasi. Yigit, a postdoc of Chris’s, specialized in quantum physics as a PhD student. I’d asked how he thinks about quantum fluctuation relations. Since Chris and colleagues ignited fluctuation-relation research, theorems have proliferated like vines in a jungle. Everyone and his aunty seems to have invented a fluctuation theorem. I canvassed Marylanders for bushwhacking tips.

Imagine, said Yigit, a system whose state you know. Imagine a gas, whose temperature you’ve measured, at equilibrium in a box. Or imagine a trapped ion. Begin with a state about which you have information.

Imagine performing work on the system “violently.” Compress the gas quickly, so the particles roil. Shine light on the ion. The system will leave equilibrium. “The information,” said Yigit, “gets mingled.”

Imagine halting the compression. Imagine switching off the light. Combine your information about the initial state with assumptions and physical laws.*** Manipulate equations in the right way, and the information might “unmingle.” You might capture properties of the violence in a fluctuation relation.

2 photos - cut

With Zhiyue Lu and Andrew Maven Smith of Chris Jarzynski’s group (left) and with QuICSers (right)

I’m grateful to have exchanged information in Maryland, to have yoyo-ed between groups. We have work to perform together. I have transformations to undergo.**** Let the unmingling begin.

With gratitude to Alexey Gorshkov and QuICS, and to Chris Jarzynski and the University of Maryland Department of Chemistry, for their hospitality, conversation, and camaraderie.

*Statistical mechanics is the study of systems that contain vast numbers of particles, like the air we breathe and white dwarf stars. I harp on about statistical mechanics often.

**Before QuICS’s birth, a future QuICSer had collaborated with a postdoc of Chris’s on combining quantum information with fluctuation relations.

***Yes, physical laws are assumptions. But they’re glorified assumptions.

****Hopefully nonviolent transformations.

May 08, 2015

ResonaancesWeekend plot: minimum BS conjecture

This weekend plot completes my last week's post:

It shows the phase diagram for models of natural electroweak symmetry breaking. These models can be characterized by 2 quantum numbers:

  • B [Baroqueness], describing how complicated is the model relative to the standard model;   
  • S [Strangeness], describing the fine-tuning needed to achieve electroweak symmetry breaking with the observed Higgs boson mass. 

To allow for a fair comparison, in all models the cut-off scale is fixed to Λ=10 TeV. The standard model (SM) has, by definition,  B=1, while S≈(Λ/mZ)^2≈10^4.  The principle of naturalness postulates that S should be much smaller, S ≲ 10.  This requires introducing new hypothetical particles and interactions, therefore inevitably increasing B.

The most popular approach to reducing S is by introducing supersymmetry.  The minimal supersymmetric standard model (MSSM) does not make fine-tuning better than 10^3 in the bulk of its parameter space. To improve on that, one needs to introduce large A-terms (aMSSM), or  R-parity breaking interactions (RPV), or an additional scalar (NMSSM).  Another way to decrease S is achieved in models the Higgs arises as a composite Goldstone boson of new strong interactions. Unfortunately, in all of those models,  S cannot be smaller than 10^2 due to phenomenological constraints from colliders. To suppress S even further, one has to resort to the so-called neutral naturalness, where new particles beyond the standard model are not charged under the SU(3) color group. The twin Higgs - the simplest  model of neutral naturalness - can achieve S10 at the cost of introducing a whole parallel mirror world.

The parametrization proposed here leads to a striking observation. While one can increase B indefinitely (many examples have been proposed  the literature),  for a given S there seems to be a minimum value of B below which no models exist.  In fact, the conjecture is that the product B*S is bounded from below:
BS ≳ 10^4. 
One robust prediction of the minimum BS conjecture is the existence of a very complicated (B=10^4) yet to be discovered model with no fine-tuning at all.  The take-home message is that one should always try to minimize BS, even if for fundamental reasons it cannot be avoided completely ;)

May 07, 2015

Doug NatelsonPeople you should've heard about: John Bardeen

If you ask the average person to name a physicist, chances are they'll mention Einstein, Hawking, and possibly Sheldon Cooper.  Maybe Richard Feynman, Brian Greene or (*sigh*) Michio Kaku.  I'd like to have an occasional series of posts pointing out people that should be well-known, but for some reason are not.  High up on that list:  John Bardeen, who is the only person one of only two people to win two Nobel prizes in the same field.

Bardeen, like many of his contemporaries, followed what would now be considered a meandering, unconventional trajectory into physics, starting out as an undergrad engineer at Wisconsin, working as a geophysicist, enrolling as a math grad student at Princeton, and eventually doing a doctoral thesis with Wigner worrying about electron-electron interactions in metals (resulting in these two papers about how much energy it takes to remove an electron from a metal, and how that can be strongly affected by the very last layer of atoms at the surface - in the 1980s this would be called "surface science" and now it would be called "nanoscience").

Bardeen was a quiet, brilliant person.  After WWII (during which he worked for the Navy), he went to Bell Labs, where he worked with Walter Brattain to invent the point contact transistor (and much more disagreeably with William Shockley), explaining the critical importance of "surface states" (special levels for the electrons in a semiconductor that exist at the surface, where the periodic potential of the lattice is terminated).  Shockley is viewed in hindsight as famously unpleasant as a co-worker/boss - Bardeen left Bell Labs in large part because of this and ended up at Illinois, where seven years later he worked with Bob Schrieffer and Leon Cooper to produce the brilliant BCS theory of superconductivity, earning his second Nobel.  (Shockley's borderline abusive management style is also responsible for the creation of modern Silicon Valley, but that's another story.)

During and after this period, Bardeen helped build the physics department of UIUC into a condensed matter physics powerhouse, a position it continues to hold.  He was very interested in the theory of charge density waves (special states where the electrons in a solid spontaneously take on a spatially periodic density), though according to Lillian Hoddeson's excellent book (see here, too) he had lost the intellectual flexibility of his youth by this time.  

Bardeen contributed greatly to our understanding and advancement of two whole classes of technologies that have reshaped the world (transistors and superconductors).  He was not a flamboyant personality like Feynman (after all, he was from the Midwest :-) ), and he was not a self-promoter (like Feynman), but he absolutely deserves greater notoriety and appreciation from the general public.

n-Category Café Categories in Control

To understand ecosystems, ultimately will be to understand networks. - B. C. Patten and M. Witkamp

A while back I decided one way to apply my math skills to help save the planet was to start pushing toward green mathematics: a kind of mathematics that can interact with biology and ecology just as fruitfully as traditional mathematics interacts with physics. As usual with math, the payoffs will come slowly, but they may be large. It’s not a substitute for doing other, more urgent things—but if mathematicians don’t do this, who will?

As a first step in this direction, I decided to study networks.

This May, a small group of mathematicians is meeting in Turin for a workshop on the categorical foundations of network theory, organized by Jacob Biamonte. I’m trying to get us mentally prepared for this. We all have different ideas, yet they should fit together somehow.

Tobias Fritz, Eugene Lerman and David Spivak have all written articles here about their work, though I suspect Eugene will have a lot of completely new things to say, too. Now I want to say a bit about what I’ve been doing with Jason Erbele.

Despite my ultimate aim of studying biological and ecological networks, I decided to start by clarifying the math of networks that appear in chemistry and engineering, since these are simpler, better understood, useful in their own right, and probably a good warmup for the grander goal. I’ve been working with Brendan Fong on electrical ciruits, and with Jason Erbele on control theory. Let me talk about this paper:

• John Baez and Jason Erbele, Categories in control.

Control theory is the branch of engineering that focuses on manipulating open systems—systems with inputs and outputs—to achieve desired goals. In control theory, signal-flow diagrams are used to describe linear ways of manipulating signals, for example smooth real-valued functions of time. Here’s a real-world example; click the picture for more details:

For a category theorist, at least, it is natural to treat signal-flow diagrams as string diagrams in a symmetric monoidal category. This forces some small changes of perspective, which I’ll explain, but more important is the question: which symmetric monoidal category?

We argue that the answer is: the category FinRel k\mathrm{FinRel}_k of finite-dimensional vector spaces over a certain field k,k, but with linear relations rather than linear maps as morphisms, and direct sum rather than tensor product providing the symmetric monoidal structure. We use the field k=(s)k = \mathbb{R}(s) consisting of rational functions in one real variable s.s. This variable has the meaning of differentation. A linear relation from k mk^m to k nk^n is thus a system of linear constant-coefficient ordinary differential equations relating mm ‘input’ signals and nn ‘output’ signals.

Our main goal in this paper is to provide a complete ‘generators and relations’ picture of this symmetric monoidal category, with the generators being familiar components of signal-flow diagrams. It turns out that the answer has an intriguing but mysterious connection to ideas that are familiar in the diagrammatic approach to quantum theory! Quantum theory also involves linear algebra, but it uses linear maps between Hilbert spaces as morphisms, and the tensor product of Hilbert spaces provides the symmetric monoidal structure.

We hope that the category-theoretic viewpoint on signal-flow diagrams will shed new light on control theory. However, in this paper we only lay the groundwork.

Signal flow diagrams

There are several basic operations that one wants to perform when manipulating signals. The simplest is multiplying a signal by a scalar. A signal can be amplified by a constant factor:

fcf f \mapsto cf

where c.c \in \mathbb{R}. We can write this as a string diagram:

Here the labels ff and cfc f on top and bottom are just for explanatory purposes and not really part of the diagram. Control theorists often draw arrows on the wires, but this is unnecessary from the string diagram perspective. Arrows on wires are useful to distinguish objects from their duals, but ultimately we will obtain a compact closed category where each object is its own dual, so the arrows can be dropped. What we really need is for the box denoting scalar multiplication to have a clearly defined input and output. This is why we draw it as a triangle. Control theorists often use a rectangle or circle, using arrows on wires to indicate which carries the input ff and which the output cf.c f.

A signal can also be integrated with respect to the time variable:

ff f \mapsto \int f

Mathematicians typically take differentiation as fundamental, but engineers sometimes prefer integration, because it is more robust against small perturbations. In the end it will not matter much here. We can again draw integration as a string diagram:

Since this looks like the diagram for scalar multiplication, it is natural to extend \mathbb{R} to (s),\mathbb{R}(s), the field of rational functions of a variable ss which stands for differentiation. Then differentiation becomes a special case of scalar multiplication, namely multiplication by s,s, and integration becomes multiplication by 1/s.1/s. Engineers accomplish the same effect with Laplace transforms, since differentiating a signal ff is equivalent to multiplying its Laplace transform

(f)(s)= 0 f(t)e stdt\displaystyle{ (\mathcal{L}f)(s) = \int_0^\infty f(t) e^{-st} \,dt }

by the variable s.s. Another option is to use the Fourier transform: differentiating ff is equivalent to multiplying its Fourier transform

(f)(ω)= f(t)e iωtdt\displaystyle{ (\mathcal{F}f)(\omega) = \int_{-\infty}^\infty f(t) e^{-i\omega t}\, dt }

by iω.-i\omega. Of course, the function ff needs to be sufficiently well-behaved to justify calculations involving its Laplace or Fourier transform. At a more basic level, it also requires some work to treat integration as the two-sided inverse of differentiation. Engineers do this by considering signals that vanish for t<0,t &lt; 0, and choosing the antiderivative that vanishes under the same condition. Luckily all these issues can be side-stepped in a formal treatment of signal-flow diagrams: we can simply treat signals as living in an unspecified vector space over the field (s).\mathbb{R}(s). The field (s)\mathbb{C}(s) would work just as well, and control theory relies heavily on complex analysis. In our paper we work over an arbitrary field k.k.

The simplest possible signal processor is a rock, which takes the 'input' given by the force FF on the rock and produces as 'output' the rock's position q.q. Thanks to Newton's second law F=ma,F=ma, we can describe this using a signal-flow diagram:

Here composition of morphisms is drawn in the usual way, by attaching the output wire of one morphism to the input wire of the next.

To build more interesting machines we need more building blocks, such as addition:

+:(f,g)f+g + : (f,g) \mapsto f + g

and duplication:

Δ:f(f,f) \Delta : f \mapsto (f,f)

When these linear maps are written as matrices, their matrices are transposes of each other. This is reflected in the string diagrams for addition and duplication:

The second is essentially an upside-down version of the first. However, we draw addition as a dark triangle and duplication as a light one because we will later want another way to ‘turn addition upside-down’ that does not give duplication. As an added bonus, a light upside-down triangle resembles the Greek letter Δ,\Delta, the usual symbol for duplication.

While they are typically not considered worthy of mention in control theory, for completeness we must include two other building blocks. One is the zero map from the zero-dimensional vector space {0}\{0\} to our field k,k, which we denote as 00 and draw as follows:

The other is the zero map from kk to {0},\{0\}, sometimes called ‘deletion’, which we denote as !! and draw thus:

Just as the matrices for addition and duplication are transposes of each other, so are the matrices for zero and deletion, though they are rather degenerate, being 1×01 \times 0 and 0×10 \times 1 matrices, respectively. Addition and zero make kk into a commutative monoid, meaning that the following relations hold:

The equation at right is the commutative law, and the crossing of strands is the braiding:

B:(f,g)(g,f) B : (f,g) \mapsto (g,f)

by which we switch two signals. In fact this braiding is a symmetry, so it does not matter which strand goes over which:

Dually, duplication and deletion make kk into a cocommutative comonoid. This means that if we reflect the equations obeyed by addition and zero across the horizontal axis and turn dark operations into light ones, we obtain another set of valid equations:

There are also relations between the monoid and comonoid operations. For example, adding two signals and then duplicating the result gives the same output as duplicating each signal and then adding the results:

This diagram is familiar in the theory of Hopf algebras, or more generally bialgebras. Here it is an example of the fact that the monoid operations on kk are comonoid homomorphisms—or equivalently, the comonoid operations are monoid homomorphisms.

We summarize this situation by saying that kk is a bimonoid. These are all the bimonoid laws, drawn as diagrams:

The last equation means we can actually make the diagram at left disappear, since it equals the identity morphism on the 0-dimensional vector space, which is drawn as nothing.

So far all our string diagrams denote linear maps. We can treat these as morphisms in the category FinVect k,\mathrm{FinVect}_k, where objects are finite-dimensional vector spaces over a field kk and morphisms are linear maps. This category is equivalent to the category where the only objects are vector spaces k nk^n for n0,n \ge 0, and then morphisms can be seen as n×mn \times m matrices. The space of signals is a vector space VV over kk which may not be finite-dimensional, but this does not cause a problem: an n×mn \times m matrix with entries in kk still defines a linear map from V nV^n to V mV^m in a functorial way.

In applications of string diagrams to quantum theory, we make FinVect k\mathrm{FinVect}_k into a symmetric monoidal category using the tensor product of vector spaces. In control theory, we instead make FinVect k\mathrm{FinVect}_k into a symmetric monoidal category using the direct sum of vector spaces. In Lemma 1 of our paper we prove that for any field k,k, FinVect k\mathrm{FinVect}_k with direct sum is generated as a symmetric monoidal category by the one object kk together with these morphisms:

where ckc \in k is arbitrary.

However, these generating morphisms obey some unexpected relations! For example, we have:

Thus, it is important to find a complete set of relations obeyed by these generating morphisms, thus obtaining a presentation of FinVect k\mathrm{FinVect}_k as a symmetric monoidal category. We do this in Theorem 2. In brief, these relations say:

(1) (k,+,0,Δ,!) (k, +, 0, \Delta, !) is a bicommutative bimonoid;

(2) the rig operations of kk can be recovered from the generating morphisms;

(3) all the generating morphisms commute with scalar multiplication.

Here item (2) means that +,,0+, \cdot, 0 and 11 in the field kk can be expressed in terms of signal-flow diagrams as follows:

Multiplicative inverses cannot be so expressed, so our signal-flow diagrams so far do not know that kk is a field. Additive inverses also cannot be expressed in this way. So, we expect that a version of Theorem 2 will hold whenever kk is a mere rig: that is, a ‘ring without negatives’, like the natural numbers. The one change is that instead of working with vector spaces, we should work with finitely presented free kk-modules.

Item (3), the fact that all our generating morphisms commute with scalar multiplication, amounts to these diagrammatic equations:

While Theorem 2 is a step towards understanding the category-theoretic underpinnings of control theory, it does not treat signal-flow diagrams that include ‘feedback’. Feedback is one of the most fundamental concepts in control theory because a control system without feedback may be highly sensitive to disturbances or unmodeled behavior. Feedback allows these uncontrolled behaviors to be mollified. As a string diagram, a basic feedback system might look schematically like this:

The user inputs a ‘reference’ signal, which is fed into a controller, whose output is fed into a system, which control theorists call a ‘plant’, which in turn produces its own output. But then the system’s output is duplicated, and one copy is fed into a sensor, whose output is added (or if we prefer, subtracted) from the reference signal.

In string diagrams—unlike in the usual thinking on control theory—it is essential to be able to read any diagram from top to bottom as a composite of tensor products of generating morphisms. Thus, to incorporate the idea of feedback, we need two more generating morphisms. These are the ‘cup’:

and ‘cap’:

These are not maps: they are relations. The cup imposes the relation that its two inputs be equal, while the cap does the same for its two outputs. This is a way of describing how a signal flows around a bend in a wire.

To make this precise, we use a category called FinRel k.\mathrm{FinRel}_k. An object of this category is a finite-dimensional vector space over k,k, while a morphism from UU to V,V, denoted L:UV,L : U \rightharpoonup V, is a linear relation, meaning a linear subspace

LUV L \subseteq U \oplus V

In particular, when k=(s),k = \mathbb{R}(s), a linear relation L:k mk nL : k^m \to k^n is just an arbitrary system of constant-coefficient linear ordinary differential equations relating mm input variables and nn output variables.

Since the direct sum UVU \oplus V is also the cartesian product of UU and V,V, a linear relation is indeed a relation in the usual sense, but with the property that if uUu \in U is related to vVv \in V and uUu' \in U is related to vVv' \in V then cu+cuc u + c'u' is related to cv+cvc v + c'v' whenever c,ck.c,c' \in k.

We compose linear relations L:UVL : U \rightharpoonup V and L:VWL' : V \rightharpoonup W as follows:

LL={(u,w):vV(u,v)Land(v,w)L} L'L = \{(u,w) \colon \; \exists\; v \in V \;\; (u,v) \in L \; and \; (v,w) \in L'\}

Any linear map f:UVf : U \to V gives a linear relation F:UV,F : U \rightharpoonup V, namely the graph of that map:

F={(u,f(u)):uU} F = \{ (u,f(u)) : u \in U \}

Composing linear maps thus becomes a special case of composing linear relations, so FinVect k\mathrm{FinVect}_k becomes a subcategory of FinRel k.\mathrm{FinRel}_k. Furthermore, we can make FinRel k\mathrm{FinRel}_k into a monoidal category using direct sums, and it becomes symmetric monoidal using the braiding already present in FinVect k.\mathrm{FinVect}_k.

In these terms, the cup is the linear relation

:k 2{0} \cup : k^2 \rightharpoonup \{0\}

given by

={(x,x,0):xk}k 2{0} \cup \; = \; \{ (x,x,0) : x \in k \} \; \subseteq \; k^2 \oplus \{0\}

while the cap is the linear relation

:{0}k 2 \cap : \{0\} \rightharpoonup k^2

given by

={(0,x,x):xk}{0}k 2 \cap \; = \; \{ (0,x,x) : x \in k \} \; \subseteq \; \{0\} \oplus k^2

These obey the zigzag relations:

Thus, they make FinRel k\mathrm{FinRel}_k into a compact closed category where k,k, and thus every object, is its own dual.

Besides feedback, one of the things that make the cap and cup useful is that they allow any morphism L:UVL : U \rightharpoonup V to be ‘plugged in backwards’ and thus ‘turned around’. For instance, turning around integration:

we obtain differentiation. In general, using caps and cups we can turn around any linear relation L:UVL : U \rightharpoonup V and obtain a linear relation L :VU,L^\dagger : V \rightharpoonup U, called the adjoint of L,L, which turns out to given by

L ={(v,u):(u,v)L} L^\dagger = \{(v,u) : (u,v) \in L \}

For example, if ckc \in k is nonzero, the adjoint of scalar multiplication by cc is multiplication by c 1c^{-1}:

Thus, caps and cups allow us to express multiplicative inverses in terms of signal-flow diagrams! One might think that a problem arises when when c=0,c = 0, but no: the adjoint of scalar multiplication by 00 is

{(0,x):xk}kk \{(0,x) : x \in k \} \subseteq k \oplus k

In Lemma 3 we show that FinRel k\mathrm{FinRel}_k is generated, as a symmetric monoidal category, by these morphisms:

where ckc \in k is arbitrary.

In Theorem 4 we find a complete set of relations obeyed by these generating morphisms,thus giving a presentation of FinRel k\mathrm{FinRel}_k as a symmetric monoidal category. To describe these relations, it is useful to work with adjoints of the generating morphisms. We have already seen that the adjoint of scalar multiplication by cc is scalar multiplication by c 1,c^{-1}, except when c=0.c = 0. Taking adjoints of the other four generating morphisms of FinVect k,\mathrm{FinVect}_k, we obtain four important but perhaps unfamiliar linear relations. We draw these as ‘turned around’ versions of the original generating morphisms:

Coaddition is a linear relation from kk to k 2k^2 that holds when the two outputs sum to the input:

+ :kk 2 +^\dagger : k \rightharpoonup k^2

+ ={(x,y,z):x=y+z}kk 2+^\dagger = \{(x,y,z) : \; x = y + z \} \subseteq k \oplus k^2

Cozero is a linear relation from kk to {0}\{0\} that holds when the input is zero:

0 :k{0} 0^\dagger : k \rightharpoonup \{0\}

0 ={(0,0)}k{0} 0^\dagger = \{ (0,0)\} \subseteq k \oplus \{0\}

Coduplication is a linear relation from k 2k^2 to kk that holds when the two inputs both equal the output:

Δ :k 2k \Delta^\dagger : k^2 \rightharpoonup k

Δ ={(x,y,z):x=y=z}k 2k \Delta^\dagger = \{(x,y,z) : \; x = y = z \} \subseteq k^2 \oplus k

Codeletion is a linear relation from {0}\{0\} to kk that holds always:

! :{0}k !^\dagger : \{0\} \rightharpoonup k

! ={(0,x)}{0}k !^\dagger = \{(0,x) \} \subseteq \{0\} \oplus k

Since + ,0 ,Δ +^\dagger,0^\dagger,\Delta^\dagger and ! !^\dagger automatically obey turned-around versions of the relations obeyed by +,0,Δ+,0,\Delta and !,!, we see that kk acquires a second bicommutative bimonoid structure when considered as an object in FinRel k.\mathrm{FinRel}_k.

Moreover, the four dark operations make kk into a Frobenius monoid. This means that (k,+,0)(k,+,0) is a monoid, (k,+ ,0 )(k,+^\dagger, 0^\dagger) is a comonoid, and the Frobenius relation holds:

All three expressions in this equation are linear relations saying that the sum of the two inputs equal the sum of the two outputs.

The operation sending each linear relation to its adjoint extends to a contravariant functor

:FinRel kFinRel k \dagger : \mathrm{FinRel}_k \to \mathrm{FinRel}_k

which obeys a list of properties that are summarized by saying that FinRel k\mathrm{FinRel}_k is a †-compact category. Because two of the operations in the Frobenius monoid (k,+,0,+ ,0 )(k, +,0,+^\dagger,0^\dagger) are adjoints of the other two, it is a †-Frobenius monoid.

This Frobenius monoid is also special, meaning that comultiplication (in this case + +^\dagger) followed by multiplication (in this case ++) equals the identity:

This Frobenius monoid is also commutative—and cocommutative, but for Frobenius monoids this follows from commutativity.

Starting around 2008, commutative special †-Frobenius monoids have become important in the categorical foundations of quantum theory, where they can be understood as ‘classical structures’ for quantum systems. The category FinHilb\mathrm{FinHilb} of finite-dimensional Hilbert spaces and linear maps is a †-compact category, where any linear map f:HKf : H \to K has an adjoint f :KHf^\dagger : K \to H given by

f ϕ,ψ=ϕ,fψ \langle f^\dagger \phi, \psi \rangle = \langle \phi, f \psi \rangle

for all ψH,ϕK.\psi \in H, \phi \in K . A commutative special †-Frobenius monoid in FinHilb\mathrm{FinHilb} is then the same as a Hilbert space with a chosen orthonormal basis. The reason is that given an orthonormal basis ψ i \psi_i for a finite-dimensional Hilbert space H,H, we can make HH into a commutative special †-Frobenius monoid with multiplication m:HHHm : H \otimes H \to H given by

m(ψ iψ j)={ψ i i=j 0 ij m (\psi_i \otimes \psi_j ) = \left\{ \begin{array}{cl} \psi_i & i = j \\ 0 & i \ne j \end{array}\right.

and unit i:Hi : \mathbb{C} \to H given by

i(1)= iψ i i(1) = \sum_i \psi_i

The comultiplication m m^\dagger duplicates basis states:

m (ψ i)=ψ iψ i m^\dagger(\psi_i) = \psi_i \otimes \psi_i

Conversely, any commutative special †-Frobenius monoid in FinHilb\mathrm{FinHilb} arises this way.

Considerably earlier, around 1995, commutative Frobenius monoids were recognized as important in topological quantum field theory. The reason, ultimately, is that the free symmetric monoidal category on a commutative Frobenius monoid is 2Cob,2\mathrm{Cob}, the category with 2-dimensional oriented cobordisms as morphisms. But the free symmetric monoidal category on a commutative special Frobenius monoid was worked out even earlier: it is the category with finite sets as objects, where a morphism f:XYf : X \to Y is an isomorphism class of cospans

XSY X \longrightarrow S \longleftarrow Y

This category can be made into a †-compact category in an obvious way, and then the 1-element set becomes a commutative special †-Frobenius monoid.

For all these reasons, it is interesting to find a commutative special †-Frobenius monoid lurking at the heart of control theory! However, the Frobenius monoid here has yet another property, which is more unusual. Namely, the unit 0:{0}k0 : \{0\} \rightharpoonup k followed by the counit 0 :k{0}0^\dagger : k \rightharpoonup \{0\} is the identity:

We call a special Frobenius monoid that also obeys this extra law extra-special. One can check that the free symmetric monoidal category on a commutative extra-special Frobenius monoid is the category with finite sets as objects, where a morphism f:XYf : X \to Y is an equivalence relation on the disjoint union XY,X \sqcup Y, and we compose f:XYf : X \to Y and g:YZg : Y \to Z by letting ff and gg generate an equivalence relation on XYZX \sqcup Y \sqcup Z and then restricting this to XZ.X \sqcup Z.

As if this were not enough, the light operations share many properties with the dark ones. In particular, these operations make kk into a commutative extra-special †-Frobenius monoid in a second way. In summary:

(k,+,0,Δ,!)(k, +, 0, \Delta, !) is a bicommutative bimonoid;

(k,Δ ,! ,+ ,0 )(k, \Delta^\dagger, !^\dagger, +^\dagger, 0^\dagger) is a bicommutative bimonoid;

(k,+,0,+ ,0 )(k, +, 0, +^\dagger, 0^\dagger) is a commutative extra-special †-Frobenius monoid;

(k,Δ ,! ,Δ,!)(k, \Delta^\dagger, !^\dagger, \Delta, !) is a commutative extra-special †-Frobenius monoid.

It should be no surprise that with all these structures built in, signal-flow diagrams are a powerful method of designing processes.

However, it is surprising that most of these structures are present in a seemingly very different context: the so-called ZX calculus, a diagrammatic formalism for working with complementary observables in quantum theory. This arises naturally when one has an nn-dimensional Hilbert space HH with two orthonormal bases ψ i,ϕ i\psi_i, \phi_i that are mutually unbiased, meaning that

|ψ i,ϕ j| 2=1n |\langle \psi_i, \phi_j \rangle|^2 = \displaystyle{\frac{1}{n}}

for all 1i,jn.1 \le i, j \le n. Each orthonormal basis makes HH into commutative special †-Frobenius monoid in FinHilb.\mathrm{FinHilb}. Moreover, the multiplication and unit of either one of these Frobenius monoids fits together with the comultiplication and counit of the other to form a bicommutative bimonoid. So, we have all the structure present in the list above—except that these Frobenius monoids are only extra-special if HH is 1-dimensional.

The field kk is also a 1-dimensional vector space, but this is a red herring: in FinRel k\mathrm{FinRel}_k every finite-dimensional vector space naturally acquires all four structures listed above, since addition, zero, duplication and deletion are well-defined and obey all the relations we have discussed. Jason and I focus on kk in our paper simply because it generates all the objects FinRel k\mathrm{FinRel}_k via direct sum.

Finally, in FinRel k\mathrm{FinRel}_k the cap and cup are related to the light and dark operations as follows:

Note the curious factor of 1-1 in the second equation, which breaks some of the symmetry we have seen so far. This equation says that two elements x,ykx, y \in k sum to zero if and only if x=y.-x = y. Using the zigzag relations, the two equations above give

We thus see that in FinRel k,\mathrm{FinRel}_k, both additive and multiplicative inverses can be expressed in terms of the generating morphisms used in signal-flow diagrams.

Theorem 4 of our paper gives a presentation of FinRel k\mathrm{FinRel}_k based on the ideas just discussed. Briefly, it says that FinRel k\mathrm{FinRel}_k is equivalent to the symmetric monoidal category generated by an object kk and these morphisms:

• addition +:k 2k+: k^2 \rightharpoonup k • zero 0:{0}k0 : \{0\} \rightharpoonup k • duplication Δ:kk 2\Delta: k\rightharpoonup k^2 • deletion !:k0! : k \rightharpoonup 0 • scalar multiplication c:kkc: k\rightharpoonup k for any ckc\in k • cup :k 2{0}\cup : k^2 \rightharpoonup \{0\} • cap :{0}k 2\cap : \{0\} \rightharpoonup k^2

obeying these relations:

(1) (k,+,0,Δ,!)(k, +, 0, \Delta, !) is a bicommutative bimonoid;

(2) \cap and \cup obey the zigzag equations;

(3) (k,+,0,+ ,0 )(k, +, 0, +^\dagger, 0^\dagger) is a commutative extra-special †-Frobenius monoid;

(4) (k,Δ ,! ,Δ,!)(k, \Delta^\dagger, !^\dagger, \Delta, !) is a commutative extra-special †-Frobenius monoid;

(5) the field operations of kk can be recovered from the generating morphisms;

(6) the generating morphisms (1)-(4) commute with scalar multiplication.

Note that item (2) makes FinRel k\mathrm{FinRel}_k into a †-compact category, allowing us to mention the adjoints of generating morphisms in the subsequent relations. Item (5) means that +,,0,1+, \cdot, 0, 1 and also additive and multiplicative inverses in the field kk can be expressed in terms of signal-flow diagrams in the manner we have explained.

So, we have a good categorical understanding of the linear algebra used in signal flow diagrams!

Now Jason is moving ahead to apply this to some interesting problems… but that’s another story, for later.

May 06, 2015

BackreactionTesting modified gravity with black hole shadows

Black hole shadow in the movie “Interstellar.” Image credit: Double Negative artists/DNGR/TM © Warner Bros. Entertainment Inc./Creative Commons (CC BY-NC-ND 3.0) license.

On my visit to Perimeter Institute last week, I talked to John Moffat, whose recent book “Cracking the Particle Code of the Universe” I much enjoyed reading. Talking to John is always insightful. He knows the ins and outs of both particle physics and cosmology, has an opinion on everything, and gives you a complete historical account with this. I have learned a lot from John, especially to put today’s community squabbles into a larger perspective.

John has dedicated much of his research to alternatives to the Standard Model and the cosmological Concordance Model. You might mistake him for being radical or having a chronical want of being controversial, but I assure you neither is the case. The interesting thing about his models is that they are, on the very contrary, deeply conservative. He’s fighting the standard with the standard weapons. Much of his work goes largely ignored by the community for no particular reason other than that the question what counts as an elegant model is arguably subjective. John is presently maybe best known for being one of the few defenders for modified gravity as an alternative to dark matter made of particles.

His modified gravity (MOG) that he has been working on since 2005 is a covariant version of the more widely known MOdified Newtonian Dynamics (or MOND for short). It differs from Bekenstein’s Tensor-Vector-Scalar (TeVeS) model in the field composition; it also adds a vector field to general relativity but then there are additional scalar fields and potentials for the fields. John and his collaborators claim they can fit all the evidence for dark matter with that model, including rotation curves, the acoustic peaks in the cosmic microwave background and the bullet cluster.

I can understand that nobody really liked MOND which didn’t really fit together with general relativity and was based on little more than the peculiar observation that galaxy rotation curves seem to deviate from the Newtonian prediction at a certain acceleration rather than at a certain radius. And TeVeS eventually necessitated the introduction of other types of dark matter, which made it somewhat pointless. I like dark matter because it’s a simple solution and also because I don’t really see any good reason why all matter should couple to photons. But I do have some sympathy for modifying general relativity, though, having tried and failed to do it consistently has made me vary of the many pitfalls. For what MOG is concerned, I don’t see a priori why it’s worse adding a vector field and some scalar fields than adding a bunch of other fields for which we have no direct evidence and then giving them names like WIMPS or axions.

Quite possibly the main reason MOG isn’t getting all that much attention is that it’s arguably unexciting because, if correct, it just means that none of the currently running dark matter experiments will detect anything. What you really want is a prediction for something that can be seen rather than a prediction that nothing can be seen.

That’s why I find John’s recent paper about MOG very interesting, because he points out an observable consequence of his model that could soon be tested:
Modified Gravity Black Holes and their Observable Shadows
J. W. Moffat
European Physics Journal C (2015) 75:130
arXiv:1502.01677 [gr-qc]
In this paper, he has studied how black holes in this modification of gravity differ from ordinary general relativity, and in particular calculated the size of the black hole shadow. As you might have learned from the movie “Interstellar,” black holes appear like dark disks surrounded by rings that are basically extreme lensing effects. The size of the disk in MOG depends on a parameter in the model that can be determined from fitting the galaxy rotation curves. Using this parameter, it turns out the black hole shadow should appear larger by a factor of about ten in MOG as compared to general relativity.

So far nobody has seen a black hole shadow other than in the movies, but the Event Horizon Telescope will soon be looking for exactly that. It isn’t so much a telescope but a collaboration of many telescopes all over the globe, which allows for a very long baseline interferometry with unprecedented precision. In principle they should be able to see the shadow.

What I don’t know though is whether the precision of both radius of the shadow and the mass will be sufficient to make a distinction between normal and modified general relativity in such an experiment. I am also not really sure that the black hole solution in the paper is really the most general solution one can obtain in this type of model, or if not there is some way to backpedal to another solution if the data doesn’t fulfill hopes. And then the paper contains the somewhat ominous remark that the used value for the deviation parameter might not be applicable for the black holes the Event Horizon Telescope has set its eyes on. So there are some good reasons to be skeptic of this and as the scientists always say “more work is needed.” Be that as it may, if the event horizon telescope does see a shadow larger than expected, then this would clearly be a very strong case for modified gravity.

Matt StrasslerLHC Starts Collisions; and a Radio Interview Tonight

In the long and careful process of restarting the Large Hadron Collider [LHC] after its two-year nap for upgrades and repairs, another milestone has been reached: protons have once again collided inside the LHC’s experimental detectors (named ATLAS, CMS, LHCb and ALICE). This is good news, but don’t get excited yet. It’s just one small step. These are collisions at the lowest energy at which the LHC operates (450 GeV per proton, to be compared with the 4000 GeV per proton in 2012 and the 6500 GeV per proton they’ve already achieved in the last month, though in non-colliding beams.) Also the number of protons in the beams, and the number of collisions per second, is still very, very small compared to what will be needed. So discoveries are not imminent!  Yesterday’s milestone was just one of the many little tests that are made to assure that the LHC is properly set up and ready for the first full-energy collisions, which should start in about a month.

But since full-energy collisions are on the horizon, why not listen to a radio show about what the LHC will be doing after its restart is complete? Today (Wednesday May 6th), Virtually Speaking Science, on which I have appeared a couple of times before, will run a program at 5 pm Pacific time (8 pm Eastern). Science writer Alan Boyle will be interviewing me about the LHC’s plans for the next few months and the coming years. You can listen live, or listen later once they post it.  Here’s the link for the program.

Filed under: Dark Matter, Higgs, LHC Background Info, LHC News, Public Outreach Tagged: atlas, cms, DarkMatter, ExoticDecays, Higgs, LHC, PublicTalks, supersymmetry

May 05, 2015

ResonaancesNaturalness' last bunker

Last week Symmetry Breaking ran the article entitled "Natural SUSY's last stand". That title is a bit misleading as it makes you think of General Custer at the eve of Battle of the Little Bighorn, whereas natural supersymmetry has long been dead bodies torn by vultures. Nevertheless, it is  interesting to ask a more general question: are there any natural theories that survived? And if yes, what can we learn about them from the LHC run-2?

For over 30 years naturalness has been the guiding principle in theoretical particle physics.  The standard model by itself has no naturalness problem: it contains 19 free parameters  that are simply not calculable and have to be taken from experiment. The problem arises because we believe the standard model is eventually embedded in a more fundamental  theory where all these parameters, including the Higgs boson mass, are calculable. Once that is done, the calculated Higgs mass will typically be proportional to the heaviest state in that theory as a result of quantum corrections. The exception to this rule is when the fundamental theory possesses a symmetry forbidding the Higgs mass, in which case the mass will be proportional to the scale where the symmetry becomes manifest. Given the Higgs mass is 125  GeV, the concept of naturalness leads to the following prediction: 1) new particles beyond the standard model should appear around the mass scale of 100-300 GeV, and  2) the new theory with the new particles should have a  protection mechanism for the Higgs mass built in.  

There are two main realizations of this idea. In supersymmetry, the protection is provided by opposite-spin partners of the known particles. In particular, the top quark is accompanied by stop quarks who are spin-0 scalars but otherwise they have the same color and electric charge as the top quark. Another protection mechanism can be provided by a spontaneously broken global symmetry, usually realized in the context of new strong interactions from which the Higgs arises as a composite particle. In that case, the protection is provided by the same spin partners, for example the top quark has a fermionic partner with the same quantum numbers but a different mass.

Both of these ideas are theoretically very attractive but are difficult to realize in practice. First of all, it is hard to understand how these 100 new partner particles could be hiding around the corner without leaving any trace in numerous precision experiments. But even if we were willing to believe in the Universal conspiracy, the LHC run-1 was the final nail in the coffin. The point is that both of these scenarios make a very specific  prediction: the existence of new particles with color charges around the weak scale. As the LHC is basically a quark and gluon collider, it can produce colored particles in large quantities. For example, for a 1 TeV gluino (supersymmetric partner of the gluon) some 1000 pairs would have been already produced at the LHC. Thanks to  the large production rate, the limits on colored partners are already quite stringent. For example, the LHC limits on masses of gluinos and massive spin-1 gluon resonances extend well above 1 TeV, while for scalar and fermionic top partners the limits are not far below 1 TeV. This means that a conspiracy theory is not enough: in supersymmetry and composite Higgs one also has to accept a certain degree of fine-tuning, which means we don't even solve the problem that is the very motivation for these theories.

The reasoning above suggests a possible way out.  What if naturalness could be realized without colored partners: without gluinos, stops, or heavy tops. The conspiracy problem would not go away, but at least we could avoid stringent limits from the LHC. It turns out that theories with such a property do exist. They linger away from the mainstream,  but recently they have been gaining popularity under the name of the neutral naturalness.  The reason for that is obvious: such theories may offer a nuclear bunker that will allow naturalness to survive beyond the LHC run-2.  

The best known realization of neutral naturalness is the twin Higgs model. It assumes the existence of a mirror world, with mirror gluons, mirror top quarks, a mirror Higgs boson, etc., which is related to the standard model by an approximate parity symmetry.  The parity gives rise to an accidental global symmetry that could protect the Higgs boson mass. At the technical level, the protection mechanism is similar as in composite Higgs models where standard model particles have partners with the same spins.  The crucial difference, however, is that the mirror top quarks and mirror gluons are charged under the mirror color group, not the standard model color.  As we don't have a mirror proton collider yet, the mirror partners are not produced in large quantities at the LHC. Therefore, they could well be as light as our top quark without violating any experimental bounds,  and in agreement with the requirements of naturalness.

A robust prediction of twin-Higgs-like models is that the Higgs boson couplings to matter deviate from the standard model predictions, as a consequence of mixing with the mirror Higgs. The size of this deviation is of the same order as  the fine-tuning in the theory, for example order 10% deviations  are expected when the fine-tuning is 1 in 10. This is perhaps the best motivation for precision Higgs studies:  measuring the Higgs couplings with an accuracy better than 10% may invalidate or boost the idea.  However,  the neutral naturalness points us to experimental signals that are often very different than in the popular models. For example, the mirror color interactions are  expected to behave at low energies similarly to our QCD:  there should be mirror mesons, baryons, glueballs.  By construction, the Higgs boson  must couple to the mirror world, and therefore it offers a portal via which the mirror hadronic junk can be produced and decay, which  may lead to truly exotic signatures such as displaced jets. This underlines the importance to search for exotic Higgs boson decays - very few such studies have been carried out by the LHC experiments so far. Finally, as it has been speculated for long time, dark matter may have something to do the with the mirror world. Neutral naturalness provides a reason for the existence of the mirror world and an approximate parity symmetry relating it to the real world. It may be our best shot at understanding why the amounts of ordinary and dark matter in the Universe are equal  up to a factor of  5 - something that arises as a complete accident in the usual WIMP dark matter scenario.

There's no doubt that the neutral naturalness is a  desperate attempt to save natural electroweak symmetry breaking from the reality check, or at least postpone the inevitable. Nevertheless, the existence of a mirror world is certainly a logical possibility. The recent resurgence of this scenario has led to identifying new interesting models, and new ways to search for them in  experiment. The persistence of the naturalness principle may thus be turned into a positive force, as it may motivate better searches for hidden particles.  It is possible that the LHC data hold the answer to the naturalness puzzle, but we will have to look deeper to extract it.

Sean CarrollDoes Spacetime Emerge From Quantum Information?

Quantizing gravity is an important goal of contemporary physics, but after decades of effort it’s proven to be an extremely tough nut to crack. So it’s worth considering a very slight shift of emphasis. What if the right strategy is not “finding the right theory of gravity and quantizing it,” but “finding a quantum theory out of which gravity emerges”?

That’s one way of thinking about a new and exciting approach to the problem known as “tensor networks” or the “AdS/MERA correspondence.” If you want to have the background and basic ideas presented in a digestible way, the talented Jennifer Ouellette has just published an article at Quanta that lays it all out. If you want to dive right into some of the nitty-gritty, my young and energetic collaborators and I have a new paper out:

Consistency Conditions for an AdS/MERA Correspondence
Ning Bao, ChunJun Cao, Sean M. Carroll, Aidan Chatwin-Davies, Nicholas Hunter-Jones, Jason Pollack, Grant N. Remmen

The Multi-scale Entanglement Renormalization Ansatz (MERA) is a tensor network that provides an efficient way of variationally estimating the ground state of a critical quantum system. The network geometry resembles a discretization of spatial slices of an AdS spacetime and “geodesics” in the MERA reproduce the Ryu-Takayanagi formula for the entanglement entropy of a boundary region in terms of bulk properties. It has therefore been suggested that there could be an AdS/MERA correspondence, relating states in the Hilbert space of the boundary quantum system to ones defined on the bulk lattice. Here we investigate this proposal and derive necessary conditions for it to apply, using geometric features and entropy inequalities that we expect to hold in the bulk. We show that, perhaps unsurprisingly, the MERA lattice can only describe physics on length scales larger than the AdS radius. Further, using the covariant entropy bound in the bulk, we show that there are no conventional MERA parameters that completely reproduce bulk physics even on super-AdS scales. We suggest modifications or generalizations of this kind of tensor network that may be able to provide a more robust correspondence.

(And we’re not the only Caltech-flavored group to be thinking about this stuff.)

Between the Quanta article and our paper you should basically be covered, but let me give the basic idea. It started when quantum-information theorists interested in condensed-matter physics, in particular Giufre Vidal and Glen Evenbly, were looking for ways to find the quantum ground state (the wave function with lowest possible energy) of toy-model systems of spins (qubits) arranged on a line. A simple problem, but one that is very hard to solve, even on a computer — Hilbert space is just too big to efficiently search through it. So they turned to the idea of a “tensor network.”

A tensor network is a way of building up a complicated, highly-entangled state of many particles, by starting with a simple initial state. The particular kind of network that Vidal and Evenbly became interested in is called the MERA, for Multiscale Entanglement Renormalization Ansatz (see for example). Details can be found in the links above; what matters here is that the MERA takes the form of a lattice that looks a bit like this.

tensor banner - circle_0

Our initial simple starting point is actually at the center of this diagram. The various links represent tensors acting on that initial state to make something increasingly more complicated, culminating in the many-body state at the circular boundary of the picture.

Here’s the thing: none of this had anything to do with gravity. It was a just a cute calculational trick to find quantum states of interacting electron spins. But this kind of picture can’t help but remind certain theoretical physicists of a very famous kind of spacetime: Anti-de Sitter space (AdS), the maximally symmetric solution to Einstein’s equation in the presence of a negative cosmological constant. (Or at least the “spatial” part thereof, which is simply a hyperbolic plane.)


Of course, someone has to be the first to actually do the noticing, and in this case it was a young physicist named Brian Swingle. Brian is a condensed-matter physicist himself, but he was intellectually curious enough to take courses on string theory as a grad student. There he learned that string theorists love AdS — it’s the natural home of Maldacena’s celebrated gauge/gravity duality, with a gauge theory living on the flat-space “boundary” and gravity lurking in the AdS “bulk.” Swingle wondered whether the superficial similarity between the MERA tensor network and AdS geometry wasn’t actually a sign of something deeper — an AdS/MERA correspondence?

And the answer is — maybe! Some of the features of AdS gravity are certainly captured by the MERA, so the whole thing kind of smells right. But, as we say in the paper above with the expansive list of authors, it doesn’t all just fall together right away. Some things you would like to be true in AdS don’t happen automatically in the MERA interpretation. Which isn’t a deal-killer — it’s just a sign that we have to, at the very least, work a bit harder. Perhaps there’s a generalization of the simple MERA that must be considered, or a slightly more subtle version of the purported correspondence.

The possibility is well worth pursuing. As amazing (and thoroughly checked) as the traditional AdS/CFT correspondence is, there are still questions about it that we haven’t satisfactorily answered. The tensor networks, on the other hand, are extremely concrete, well-defined objects, for which you should in principle be able to answer any question you might have. Perhaps more intriguingly, the idea of “string theory” never really enters the game. The “bulk” where gravity lives emerges directly from a set of interacting spins, in a context where the original investigators weren’t thinking about gravity at all. The starting point doesn’t even necessarily have anything to do with “spacetime,” and certainly not with the dynamics of spacetime geometry. So I certainly hope that people remain excited and keep thinking in this direction — it would be revolutionary if you could build a complete theory of quantum gravity directly from some interacting qubits.

Terence TaoThe standard branch of the matrix logarithm

Because of Euler’s identity {e^{\pi i} + 1 = 0}, the complex exponential is not injective: {e^{z + 2\pi i k} = e^z} for any complex {z} and integer {k}. As such, the complex logarithm {z \mapsto \log z} is not well-defined as a single-valued function from {{\bf C} \backslash \{0\}} to {{\bf C}}. However, after making a branch cut, one can create a branch of the logarithm which is single-valued. For instance, after removing the negative real axis {(-\infty,0]}, one has the standard branch {\hbox{Log}: {\bf C} \backslash (-\infty,0] \rightarrow \{ z \in {\bf C}: |\hbox{Im} z| < \pi \}} of the logarithm, with {\hbox{Log}(z)} defined as the unique choice of the complex logarithm of {z} whose imaginary part has magnitude strictly less than {\pi}. This particular branch has a number of useful additional properties:

  • The standard branch {\hbox{Log}} is holomorphic on its domain {{\bf C} \backslash (-\infty,0]}.
  • One has {\hbox{Log}( \overline{z} ) = \overline{ \hbox{Log}(z) }} for all {z} in the domain {{\bf C} \backslash (-\infty,0]}. In particular, if {z \in {\bf C} \backslash (-\infty,0]} is real, then {\hbox{Log} z} is real.
  • One has {\hbox{Log}( z^{-1} ) = - \hbox{Log}(z)} for all {z} in the domain {{\bf C} \backslash (-\infty,0]}.

One can then also use the standard branch of the logarithm to create standard branches of other multi-valued functions, for instance creating a standard branch {z \mapsto \exp( \frac{1}{2} \hbox{Log} z )} of the square root function. We caution however that the identity {\hbox{Log}(zw) = \hbox{Log}(z) + \hbox{Log}(w)} can fail for the standard branch (or indeed for any branch of the logarithm).

One can extend this standard branch of the logarithm to {n \times n} complex matrices, or (equivalently) to linear transformations {T: V \rightarrow V} on an {n}-dimensional complex vector space {V}, provided that the spectrum of that matrix or transformation avoids the branch cut {(-\infty,0]}. Indeed, from the spectral theorem one can decompose any such {T: V \rightarrow V} as the direct sum of operators {T_\lambda: V_\lambda \rightarrow V_\lambda} on the non-trivial generalised eigenspaces {V_\lambda} of {T}, where {\lambda \in {\bf C} \backslash (-\infty,0]} ranges in the spectrum of {T}. For each component {T_\lambda} of {T}, we define

\displaystyle  \hbox{Log}( T_\lambda ) = P_\lambda( T_\lambda )

where {P_\lambda} is the Taylor expansion of {\hbox{Log}} at {\lambda}; as {T_\lambda-\lambda} is nilpotent, only finitely many terms in this Taylor expansion are required. The logarithm {\hbox{Log} T} is then defined as the direct sum of the {\hbox{Log} T_\lambda}.

The matrix standard branch of the logarithm has many pleasant and easily verified properties (often inherited from their scalar counterparts), whenever {T: V \rightarrow V} has no spectrum in {(-\infty,0]}:

  • (i) We have {\exp( \hbox{Log} T ) = T}.
  • (ii) If {T_1: V_1 \rightarrow V_1} and {T_2: V_2 \rightarrow V_2} have no spectrum in {(-\infty,0]}, then {\hbox{Log}( T_1 \oplus T_2 ) = \hbox{Log}(T_1) \oplus \hbox{Log}(T_2)}.
  • (iii) If {T} has spectrum in a closed disk {B(z,r)} in {{\bf C} \backslash (-\infty,0]}, then {\hbox{Log}(T) = P_z(T)}, where {P_z} is the Taylor series of {\hbox{Log}} around {z} (which is absolutely convergent in {B(z,r)}).
  • (iv) {\hbox{Log}(T)} depends holomorphically on {T}. (Easily established from (ii), (iii), after covering the spectrum of {T} by disjoint disks; alternatively, one can use the Cauchy integral representation {\hbox{Log}(T) = \frac{1}{2\pi i} \int_\gamma \hbox{Log}(z)(z-T)^{-1}\ dz} for a contour {\gamma} in the domain enclosing the spectrum of {T}.) In particular, the standard branch of the matrix logarithm is smooth.
  • (v) If {S: V \rightarrow W} is any invertible linear or antilinear map, then {\hbox{Log}( STS^{-1} ) = S \hbox{Log}(T) S^{-1}}. In particular, the standard branch of the logarithm commutes with matrix conjugations; and if {T} is real with respect to a complex conjugation operation on {V} (that is to say, an antilinear involution), then {\hbox{Log}(T)} is real also.
  • (vi) If {T^*: V^* \rightarrow V^*} denotes the transpose of {T} (with {V^*} the complex dual of {V}), then {\hbox{Log}(T^*) = \hbox{Log}(T)^*}. Similarly, if {T^\dagger: V^\dagger \rightarrow V^\dagger} denotes the adjoint of {T} (with {V^\dagger} the complex conjugate of {V^*}, i.e. {V^*} with the conjugated multiplication map {(c,z) \mapsto \overline{c} z}), then {\hbox{Log}(T^\dagger) = \hbox{Log}(T)^\dagger}.
  • (vii) One has {\hbox{Log}(T^{-1}) = - \hbox{Log}( T )}.
  • (viii) If {\sigma(T)} denotes the spectrum of {T}, then {\sigma(\hbox{Log} T) = \hbox{Log} \sigma(T)}.

As a quick application of the standard branch of the matrix logarithm, we have

Proposition 1 Let {G} be one of the following matrix groups: {GL_n({\bf C})}, {GL_n({\bf R})}, {U_n({\bf C})}, {O(Q)}, {Sp_{2n}({\bf C})}, or {Sp_{2n}({\bf R})}, where {Q: {\bf R}^n \rightarrow {\bf R}} is a non-degenerate real quadratic form (so {O(Q)} is isomorphic to a (possibly indefinite) orthogonal group {O(k,n-k)} for some {0 \leq k \leq n}. Then any element {T} of {G} whose spectrum avoids {(-\infty,0]} is exponential, that is to say {T = \exp(X)} for some {X} in the Lie algebra {{\mathfrak g}} of {G}.

Proof: We just prove this for {G=O(Q)}, as the other cases are similar (or a bit simpler). If {T \in O(Q)}, then (viewing {T} as a complex-linear map on {{\bf C}^n}, and using the complex bilinear form associated to {Q} to identify {{\bf C}^n} with its complex dual {({\bf C}^n)^*}, then {T} is real and {T^{*-1} = T}. By the properties (v), (vi), (vii) of the standard branch of the matrix logarithm, we conclude that {\hbox{Log} T} is real and {- \hbox{Log}(T)^* = \hbox{Log}(T)}, and so {\hbox{Log}(T)} lies in the Lie algebra {{\mathfrak g} = {\mathfrak o}(Q)}, and the claim now follows from (i). \Box

Exercise 2 Show that {\hbox{diag}(-\lambda, -1/\lambda)} is not exponential in {GL_2({\bf R})} if {-\lambda \in (-\infty,0) \backslash \{-1\}}. Thus we see that the branch cut in the above proposition is largely necessary. See this paper of Djokovic for a more complete description of the image of the exponential map in classical groups, as well as this previous blog post for some more discussion of the surjectivity (or lack thereof) of the exponential map in Lie groups.

For a slightly less quick application of the standard branch, we have the following result (recently worked out in the answers to this MathOverflow question):

Proposition 3 Let {T} be an element of the split orthogonal group {O(n,n)} which lies in the connected component of the identity. Then {\hbox{det}(1+T) \geq 0}.

The requirement that {T} lie in the identity component is necessary, as the counterexample {T = \hbox{diag}(-\lambda, -1/\lambda )} for {\lambda \in (-\infty,-1) \cup (-1,0)} shows.

Proof: We think of {T} as a (real) linear transformation on {{\bf C}^{2n}}, and write {Q} for the quadratic form associated to {O(n,n)}, so that {O(n,n) \equiv O(Q)}. We can split {{\bf C}^{2n} = V_1 \oplus V_2}, where {V_1} is the sum of all the generalised eigenspaces corresponding to eigenvalues in {(-\infty,0]}, and {V_2} is the sum of all the remaining eigenspaces. Since {T} and {(-\infty,0]} are real, {V_1,V_2} are real (i.e. complex-conjugation invariant) also. For {i=1,2}, the restriction {T_i: V_i \rightarrow V_i} of {T} to {V_i} then lies in {O(Q_i)}, where {Q_i} is the restriction of {Q} to {V_i}, and

\displaystyle  \hbox{det}(1+T) = \hbox{det}(1+T_1) \hbox{det}(1+T_2).

The spectrum of {T_2} consists of positive reals, as well as complex pairs {\lambda, \overline{\lambda}} (with equal multiplicity), so {\hbox{det}(1+T_2) > 0}. From the preceding proposition we have {T_2 = \exp( X_2 )} for some {X_2 \in {\mathfrak o}(Q_2)}; this will be important later.

It remains to show that {\hbox{det}(1+T_1) \geq 0}. If {T_1} has spectrum at {-1} then we are done, so we may assume that {T_1} has spectrum only at {(-\infty,-1) \cup (-1,0)} (being invertible, {T} has no spectrum at {0}). We split {V_1 = V_3 \oplus V_4}, where {V_3,V_4} correspond to the portions of the spectrum in {(-\infty,-1)}, {(-1,0)}; these are real, {T}-invariant spaces. We observe that if {V_\lambda, V_\mu} are generalised eigenspaces of {T} with {\lambda \mu \neq 1}, then {V_\lambda, V_\mu} are orthogonal with respect to the (complex-bilinear) inner product {\cdot} associated with {Q}; this is easiest to see first for the actual eigenspaces (since { \lambda \mu u \cdot v = Tu \cdot Tv = u \cdot v} for all {u \in V_\lambda, v \in V_\mu}), and the extension to generalised eigenvectors then follows from a routine induction. From this we see that {V_1} is orthogonal to {V_2}, and {V_3} and {V_4} are null spaces, which by the non-degeneracy of {Q} (and hence of the restriction {Q_1} of {Q} to {V_1}) forces {V_3} to have the same dimension as {V_4}, indeed {Q} now gives an identification of {V_3^*} with {V_4}. If we let {T_3, T_4} be the restrictions of {T} to {V_3,V_4}, we thus identify {T_4} with {T_3^{*-1}}, since {T} lies in {O(Q)}; in particular {T_3} is invertible. Thus

\displaystyle  \hbox{det}(1+T_1) = \hbox{det}(1 + T_3) \hbox{det}( 1 + T_3^{*-1} ) = \hbox{det}(T_3)^{-1} \hbox{det}(1+T_3)^2

and so it suffices to show that {\hbox{det}(T_3) > 0}.

At this point we need to use the hypothesis that {T} lies in the identity component of {O(n,n)}. This implies (by a continuity argument) that the restriction of {T} to any maximal-dimensional positive subspace has positive determinant (since such a restriction cannot be singular, as this would mean that {T} positive norm vector would map to a non-positive norm vector). Now, as {V_3,V_4} have equal dimension, {Q_1} has a balanced signature, so {Q_2} does also. Since {T_2 = \exp(X_2)}, {T_2} already lies in the identity component of {O(Q_2)}, and so has positive determinant on any maximal-dimensional positive subspace of {V_2}. We conclude that {T_1} has positive determinant on any maximal-dimensional positive subspace of {V_1}.

We choose a complex basis of {V_3}, to identify {V_3} with {V_3^*}, which has already been identified with {V_4}. (In coordinates, {V_3,V_4} are now both of the form {{\bf C}^m}, and {Q( v \oplus w ) = v \cdot w} for {v,w \in {\bf C}^m}.) Then {\{ v \oplus v: v \in V_3 \}} becomes a maximal positive subspace of {V_1}, and the restriction of {T_1} to this subspace is conjugate to {T_3 + T_3^{*-1}}, so that

\displaystyle  \hbox{det}( T_3 + T_3^{*-1} ) > 0.

But since {\hbox{det}( T_3 + T_3^{*-1} ) = \hbox{det}(T_3) \hbox{det}( 1 + T_3^{-1} T_3^{*-1} )} and { 1 + T_3^{-1} T_3^{*-1}} is positive definite, so {\hbox{det}(T_3)>0} as required. \Box

Filed under: expository, math.GR, math.RA, math.SP Tagged: branch cuts, Lie groups, matrix logarithm, split orthogonal group