Planet Musings

April 27, 2017

David Hogg: hypothesis testing and marginalization

I had a valuable chat in the morning with Adrian Price-Whelan (Princeton) about some hypothesis testing, for stellar pairs. The hypotheses are: unbound and unrelated field stars, co-moving but unbound, and co-moving because bound. We discussed this problem as a hypothesis test, and also as a parameter estimation (estimating binding energy and velocity difference). My position (that my loyal reader knows well) is that you should never do a hypothesis test when you can do a parameter estimation.

A Bayesian hypothesis test involves computing fully marginalized likelihoods (FMLs). A parameter estimation involves computing partially marginalized posteriors. When I present this difference to Dustin Lang (Toronto), he tends to say “how can marginalizing out all but one of your parameters be so much easier than marginalizing out all your parameters?”. Good question! I think the answer has to do with the difference between estimating densities (probability densities that integrate to unity) and estimating absolute probabilities (numbers that sum to unity). But I can't quite get the argument right.
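
One way to see the asymmetry numerically is a toy sketch (made-up numbers and priors, not the actual stellar-pairs analysis): the fully marginalized likelihood inherits the arbitrary width of the prior, while the normalized posterior density does not.

```python
import numpy as np

# Toy illustration: one measured velocity difference dv_obs with Gaussian noise sigma.
dv_obs, sigma = 0.3, 0.5  # km/s, invented numbers

def likelihood(dv):
    """Likelihood of the datum given a true velocity difference dv."""
    return np.exp(-0.5 * ((dv_obs - dv) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothesis test: compare fully marginalized likelihoods (FMLs).
# H1 (co-moving): dv = 0 exactly.  H2 (field): dv uniform on [-W, W].
fml_comoving = likelihood(0.0)
bayes_factor = {}
for W in (5.0, 50.0):
    grid = np.linspace(-W, W, 40001)
    fml_field = np.trapz(likelihood(grid) / (2 * W), grid)
    bayes_factor[W] = fml_comoving / fml_field  # scales roughly linearly with W

# Parameter estimation: the normalized posterior density over dv is
# essentially unchanged once W is wide enough to contain the likelihood.
grid = np.linspace(-5, 5, 40001)
posterior = likelihood(grid)
posterior /= np.trapz(posterior, grid)
```

Widening the field-star prior by a factor of ten changes the Bayes factor by roughly a factor of ten, while the posterior density over dv stays put.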

In my mind, this is connected to an observation I have seen over at Andrew Gelman's blog more than once: When predicting the outcome of a sporting event, it is much better to predict a pdf over final scores than to predict the win/loss probability. This is absolutely my experience (context: horse racing).
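
The sporting-event point can be made concrete with a hypothetical Poisson score model (invented rates, nothing to do with any real data): the win probability is a functional of the score pdf, so predicting the pdf strictly dominates predicting win/loss.

```python
from math import exp, factorial

def poisson(k, lam):
    return lam ** k * exp(-lam) / factorial(k)

# Hypothetical rates: home side scores ~ Poisson(1.8), away ~ Poisson(1.1).
lam_home, lam_away = 1.8, 1.1

# The full pdf over final scores (truncated at 20 goals, where the tail is negligible)...
score_pdf = {(h, a): poisson(h, lam_home) * poisson(a, lam_away)
             for h in range(20) for a in range(20)}

# ...collapses to a win probability, but cannot be recovered from one:
# many different score distributions share the same p_win.
p_win = sum(p for (h, a), p in score_pdf.items() if h > a)
```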

April 26, 2017

Backreaction: Not all publicity is good publicity, not even in science.

“Any publicity is good publicity” is a reaction I frequently get to my complaints about flaky science coverage. I find this attitude disturbing, especially when it comes from scientists. To begin with, it’s an idiotic stance towards journalism in general – basically a permission for journalists to write nonsense. Just imagine having the same attitude towards

David Hogg: the last year of a giant star's life

Eliot Quataert (Berkeley) gave the astrophysics seminar today. He spoke about the last years-to-days in the lifetime of a massive star. He is interested in explaining the empirical evidence that suggests that many of these stars cough out significant mass ejection events in the last years of their lives. He has mechanisms that involve convection in the core driving gravity (not gravitational) waves in the outer parts that break at the edge of the star. His talk touched on many fundamental ideas in astrophysics, including the conditions under which an object can violate the Eddington luminosity. For mass-loss driven (effectively) by excess luminosity, you have to both exceed (some form of) the Eddington limit and deposit energy high enough up in the star's radius that there is enough total energy (luminosity times time) to unbind the outskirts. His talk also (inadvertently) touched on some points of impedance matching that I am interested in. Quataert's research style is something I admire immensely: Very simple, very fundamental arguments, backed up by very good analytic and computational work. The talk was a pleasure!

After the talk, I went to lunch with Daniela Huppenkothen (NYU), Jack Ireland (GSFC), and Andrew Inglis (GSFC). We spoke more about possible extensions of things they are working on in more Bayesian or more machine-learning directions. We also talked about the astrophysics Decadal process, and the impacts this has on astrophysics missions at NASA and projects at NSF, and comparisons to similar structures in the Solar world. Interestingly rich subject there.

David Hogg: Solar data

In the morning, Jack Ireland (GSFC) and Andrew Inglis (GSFC) gave talks about data-intensive projects in Solar Physics. Ireland spoke about his Helioviewer project, which is a rich, multi-modal, interactive interface to the multi-channel, heterogeneous, imaging, time-stream, and event data on the Sun, coming from many different missions and facilities. It is like Google Earth for the Sun, but also with very deep links into the raw data. This project has made it very easy for scientists (and citizen scientists) from all backgrounds to interact with and obtain Solar data.

Inglis spoke about his AFINO project to characterize all Solar flares in terms of various time-series (Fourier) properties. He is interested in very similar questions for Solar flares that Huppenkothen (NYU) is interested in for neutron-star and black-hole transients. Some of the interaction during the talk was about different probabilistic approaches to power-spectrum questions in the time domain.

Over lunch I met with Ruth Angus (Columbia) to consult on her stellar chronometer projects. We discussed bringing in vertical action (yes, Galactic dynamics) as a stellar clock or age indicator. It is an odd indicator, because the vertical action (presumably) random-walks with time. This makes it a very low-precision clock! But it has many nice properties, like that it works for all classes of stars (possibly with subtleties), in our self-calibration context it connects age indicators of different types from different stars, and it is good at constraining old ages. We wrote some math and discussed further our MCMC sampling issues.

John Baez: Complexity Theory and Evolution in Economics

This book looks interesting:

• David S. Wilson and Alan Kirman, editors, Complexity and Evolution: Toward a New Synthesis for Economics, MIT Press, Cambridge Mass., 2016.

You can get some chapters for free here. I’ve only looked carefully at this one:

• Joshua M. Epstein and Julia Chelen, Advancing Agent_Zero.

Agent_Zero is a simple toy model of an agent that’s not the idealized rational actor often studied in economics: rather, it has emotional, deliberative, and social modules which interact with each other to make decisions. Epstein and Chelen simulate collections of such agents and see what they do:

Abstract. Agent_Zero is a mathematical and computational individual that can generate important, but insufficiently understood, social dynamics from the bottom up. First published by Epstein (2013), this new theoretical entity possesses emotional, deliberative, and social modules, each grounded in contemporary neuroscience. Agent_Zero’s observable behavior results from the interaction of these internal modules. When multiple Agent_Zeros interact with one another, a wide range of important, even disturbing, collective dynamics emerge. These dynamics are not straightforwardly generated using the canonical rational actor which has dominated mathematical social science since the 1940s. Following a concise exposition of the Agent_Zero model, this chapter offers a range of fertile research directions, including the use of realistic geographies and population levels, the exploration of new internal modules and new interactions among them, the development of formal axioms for modular agents, empirical testing, the replication of historical episodes, and practical applications. These may all serve to advance the Agent_Zero research program.

It sounds like a fun and productive project as long as one keeps one’s wits about one. It’s hard to draw conclusions about human behavior from such simplified agents. One can argue about this, and of course economists will. But regardless of this, one can draw conclusions about which kinds of simplified agents will engage in which kinds of collective behavior under which conditions.

Basically, one can start mapping out a small simple corner of the huge ‘phase space’ of possible societies. And that’s bound to lead to interesting new ideas that one wouldn’t get from either 1) empirical research on human and animal societies or 2) pure theoretical pondering without the help of simulations.

Here’s an article whose title, at least, takes a vastly more sanguine attitude toward benefits of such work:

• Kate Douglas, Orthodox economics is broken: how evolution, ecology, and collective behavior can help us avoid catastrophe, Evonomics, 22 July 2016.

I’ll quote just a bit:

For simplicity’s sake, orthodox economics assumes that Homo economicus, when making a fundamental decision such as whether to buy or sell something, has access to all relevant information. And because our made-up economic cousins are so rational and self-interested, when the price of an asset is too high, say, they wouldn’t buy—so the price falls. This leads to the notion that economies self-organise into an equilibrium state, where supply and demand are equal.

Real humans—be they Wall Street traders or customers in Walmart—don’t always have accurate information to hand, nor do they act rationally. And they certainly don’t act in isolation. We learn from each other, and what we value, buy and invest in is strongly influenced by our beliefs and cultural norms, which themselves change over time and space.

“Many preferences are dynamic, especially as individuals move between groups, and completely new preferences may arise through the mixing of peoples as they create new identities,” says anthropologist Adrian Bell at the University of Utah in Salt Lake City. “Economists need to take cultural evolution more seriously,” he says, because it would help them understand who or what drives shifts in behaviour.

Using a mathematical model of price fluctuations, for example, Bell has shown that prestige bias—our tendency to copy successful or prestigious individuals—influences pricing and investor behaviour in a way that creates or exacerbates market bubbles.

We also adapt our decisions according to the situation, which in turn changes the situations faced by others, and so on. The stability or otherwise of financial markets, for instance, depends to a great extent on traders, whose strategies vary according to what they expect to be most profitable at any one time. “The economy should be considered as a complex adaptive system in which the agents constantly react to, influence and are influenced by the other individuals in the economy,” says Kirman.

This is where biologists might help. Some researchers are used to exploring the nature and functions of complex interactions between networks of individuals as part of their attempts to understand swarms of locusts, termite colonies or entire ecosystems. Their work has provided insights into how information spreads within groups and how that influences consensus decision-making, says Iain Couzin from the Max Planck Institute for Ornithology in Konstanz, Germany—insights that could potentially improve our understanding of financial markets.

Take the popular notion of the “wisdom of the crowd”—the belief that large groups of people can make smart decisions even when poorly informed, because individual errors of judgement based on imperfect information tend to cancel out. In orthodox economics, the wisdom of the crowd helps to determine the prices of assets and ensure that markets function efficiently. “This is often misplaced,” says Couzin, who studies collective behaviour in animals from locusts to fish and baboons.

By creating a computer model based on how these animals make consensus decisions, Couzin and his colleagues showed last year that the wisdom of the crowd works only under certain conditions—and that contrary to popular belief, small groups with access to many sources of information tend to make the best decisions.

That’s because the individual decisions that make up the consensus are based on two types of environmental cue: those to which the entire group are exposed—known as high-correlation cues—and those that only some individuals see, or low-correlation cues. Couzin found that in larger groups, the information known by all members drowns out that which only a few individuals noticed. So if the widely known information is unreliable, larger groups make poor decisions. Smaller groups, on the other hand, still make good decisions because they rely on a greater diversity of information.
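
A crude simulation makes the effect plausible (this is an illustrative stand-in with invented probabilities, not Couzin's actual model): one shared, high-correlation cue that is usually wrong, plus independent private cues that are usually right.

```python
import numpy as np

rng = np.random.default_rng(1)

def majority_accuracy(n, trials=20000, p_shared=0.4, p_private=0.7, w=0.5):
    """Fraction of trials in which a simple majority vote is correct.

    One 'high-correlation' cue is seen by the whole group and is correct
    only 40% of the time; each member also holds an independent private
    cue that is correct 70% of the time. Each member follows the shared
    cue with probability w, and its private cue otherwise.
    """
    wins = 0
    for _ in range(trials):
        shared_right = rng.random() < p_shared        # one draw, seen by everyone
        private_right = rng.random(n) < p_private     # independent per member
        follow_shared = rng.random(n) < w
        member_right = np.where(follow_shared, shared_right, private_right)
        wins += member_right.sum() > n / 2
    return wins / trials

for n in (5, 25, 125):
    print(n, majority_accuracy(n))  # accuracy degrades as n grows
```

In large groups the shared cue swings the whole vote, so when it is wrong the majority is wrong; small groups retain enough private-cue diversity to do better.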

So when it comes to organising large businesses or financial institutions, “we need to think about leaders, hierarchies and who has what information”, says Couzin. Decision-making structures based on groups of between eight and 12 individuals, rather than larger boards of directors, might prevent over-reliance on highly correlated information, which can compromise collective intelligence. Operating in a series of smaller groups may help prevent decision-makers from indulging their natural tendency to follow the pack, says Kirman.

Taking into account such effects requires economists to abandon one-size-fits-all mathematical formulae in favour of “agent-based” modelling—computer programs that give virtual economic agents differing characteristics that in turn determine interactions. That’s easier said than done: just like economists, biologists usually model relatively simple agents with simple rules of interaction. How do you model a human?

It’s a nut we’re beginning to crack. One attendee at the forum was Joshua Epstein, director of the Center for Advanced Modelling at Johns Hopkins University in Baltimore, Maryland. He and his colleagues have come up with Agent_Zero, an open-source software template for a more human-like actor influenced by emotion, reason and social pressures. Collections of Agent_Zeros think, feel and deliberate. They have more human-like relationships with other agents and groups, and their interactions lead to social conflict, violence and financial panic. Agent_Zero offers economists a way to explore a range of scenarios and see which best matches what is going on in the real world. This kind of sophistication means they could potentially create scenarios approaching the complexity of real life.

Orthodox economics likes to portray economies as stately ships proceeding forwards on an even keel, occasionally buffeted by unforeseen storms. Kirman prefers a different metaphor, one borrowed from biology: economies are like slime moulds, collections of single-celled organisms that move as a single body, constantly reorganising themselves to slide in directions that are neither understood nor necessarily desired by their component parts.

For Kirman, viewing economies as complex adaptive systems might help us understand how they evolve over time—and perhaps even suggest ways to make them more robust and adaptable. He’s not alone. Drawing analogies between financial and biological networks, the Bank of England’s research chief Andrew Haldane and University of Oxford ecologist Robert May have together argued that we should be less concerned with the robustness of individual banks than the contagious effects of one bank’s problems on others to which it is connected. Approaches like this might help markets to avoid failures that come from within the system itself, Kirman says.

To put this view of macroeconomics into practice, however, might mean making it more like weather forecasting, which has improved its accuracy by feeding enormous amounts of real-time data into computer simulation models that are tested against each other. That’s not going to be easy.


April 25, 2017

Richard Easther: Up, Up and Away...

Last night my twitter feed carried a string of "what's that in the western sky" queries, including this picture from Rachael King @rachaelking70...

Balloon in the Christchurch sky...

There's a clear disk showing in this snapshot, so we can immediately rule out "Venus", a standard explanation for bright lights in the sky. (And, for good measure, Venus is currently only visible in the pre-dawn hours.)

"Weather balloon" is the next usual suspect for purported UFOs, and they are indeed far more common than flying objects with less terrestrial origins.

In this case, I knew that NASA was launching a high-altitude research balloon this week from Wanaka in the South Island, and a quick google revealed that they had succeeded after a string of aborted attempts. And this balloon is a monster, making it particularly easy to see. Meteorological balloons range in size from "large garbage bag" to "small house", but this bad boy is closer to a flying football stadium – including the stands. 

Once we knew what it was, the next question on Twitter was "what is it doing?" The answer of course is "science".

Balloon science is an often-neglected, older cousin of rocket science and NASA's primary goal with this launch is to test a better balloon. Their aim is to produce balloons that can spend 100+ days aloft, providing experimental platforms on the edge of space that cost far less than an orbital mission (and with a decent shot at getting your payload back with a soft-landing rather than a fiery re-entry). 

This year's mission carries a scientific payload since, if you are testing a giant balloon, why waste the chance to put it to work? Slung beneath the balloon is EUSO-SPB, the Extreme Universe Space Observatory on a Super Pressure Balloon, a project managed by the University of Chicago with Professor Angela Olinto as Principal Investigator.

The experiment is searching for the most powerful cosmic rays in the universe – charged particles that have been accelerated to within a hairsbreadth of the speed of light. To find them, the telescope looks down rather than up. These rare particles slam into the earth's atmosphere, leaving glowing ultraviolet tracks in the air beneath the balloon which are caught and analysed by high speed cameras.

A more sophisticated version of the experiment will fly on the International Space Station. These exceptionally rare high-energy cosmic rays are catnip for astrophysicists – they are fascinating in their own right, and might well start their journeys towards us from the vicinity of the supermassive black holes at the centres of "active galaxies", some of the most extreme objects in the universe. 

ANZAC poppy aboard balloon; Image: NASA

And because it launched on ANZAC Day the balloon carried a poppy aloft along with its scientific payload. So it turns out that as Christchurch residents watched an unexpected bright light rising in the western sky "at the going down of the sun", they were also and unwittingly gazing at a small memorial for those who had fallen in the service of their country.

CODA: Video of the launch (mis-captioned as a weather balloon!)

Mark Chu-Carroll: A Review of Type Theory (so far)

I’m trying to get back to writing about type theory. Since it’s been quite a while since the last type theory post, we’ll start with a bit of review.

What is this type theory stuff about?

The basic motivation behind type theory is that set theory isn’t the best foundation for mathematics. It seems great at first, but when you dig in deep, you start to see cracks.

If you start with naive set theory, the initial view is amazing: it’s so simple! But it falls apart: it’s not consistent. When you patch it, creating axiomatic set theory, you get something that isn’t logically inconsistent – but it’s a whole lot more complicated. And while it does fix the inconsistency, it still gives you some results which seem wrong.

Type theory covers a range of approaches that try to construct a foundational theory of mathematics that has the intuitive appeal of axiomatic set theory, but without some of its problems.

The particular form of type theory that we’ve been looking at is called Martin-Löf type theory. M-L type theory is a constructive theory of mathematics in which computation plays a central role. The theory rebuilds mathematics in a very concrete form: every proof must explicitly construct the objects it talks about. Every existence proof doesn’t just prove that something exists in the abstract – it provides a set of instructions (a program!) to construct an example of the thing that exists. Every proof that something is false provides a set of instructions (also a program!) for how to construct a counterexample that demonstrates its falsehood.
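
As a toy illustration of "an existence proof is a program" (ordinary Python, not a real proof assistant), a constructive proof must exhibit a witness and verify it:

```python
# Illustrative sketch: a constructive "existence proof" as a program.
# To prove "there exists an even prime", we must construct one and check it.

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def proof_exists_even_prime():
    """Return a witness along with a verification, not a bare assertion."""
    witness = 2
    assert witness % 2 == 0 and is_prime(witness)
    return witness

print(proof_exists_even_prime())  # the proof *is* the construction
```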

This is, necessarily, a weaker foundation for math than traditional axiomatic set theory. There are useful things that are provable in axiomatic set theory, but which aren’t provable in a mathematics based on M-L type theory. That’s the price you pay for the constructive foundations. But in exchange, you get something that is, in many ways, clearer and more reasonable than axiomatic set theory. Like so many things, it’s a tradeoff.

The constructivist nature of M-L type theory is particularly interesting to weirdos like me, because it means that programming becomes the foundation of mathematics. It creates a beautiful cyclic relationship: mathematics is the foundation of programming, and programming is the foundation of mathematics. The two are, in essence, one and the same thing.

The traditional set theoretic basis of mathematics uses set theory with first order predicate logic. FOPL and set theory are so tightly entangled in the structure of mathematics that they’re almost inseparable. The basic definitions of set theory require logical predicates that look pretty much like FOPL; and FOPL requires a model that looks pretty much like set theory.

For our type theory, we can’t use FOPL – it’s part of the problem. Instead, Martin-Löf used intuitionistic logic. Intuitionistic logic plays the same role in type theory that FOPL plays in set theory: it’s deeply entwined into the entire system of types.

The most basic thing to understand in type theory is what a logical proposition means. A proposition is a complete logical statement with no unbound variables and no quantifiers. For example, “Mark has blue eyes” is a proposition. A simple proposition is a statement of fact about a specific object. In type theory, a proof of a proposition is a program that demonstrates that the statement is true. A proof that “Mark has blue eyes” is a program that does something like “Look at a picture of Mark, screen out everything but the eyes, measure the color C of his eyes, and then check that C is within the range of frequencies that we call ‘blue’.” We can only say that a proposition is true if we can write that program.

Simple propositions are important as a starting point, but you can’t do anything terribly interesting with them. Reasoning with simple propositions is like writing a program where you can only use literal values, but no variables. To be able to do interesting things, you really need variables.

In Martin-Löf type theory, variables come along with predicates. A predicate is a statement describing a property or fact about an object (or about a collection of objects) – but instead of defining it in terms of a single fixed value like a proposition, it takes a parameter. “Mark has blue eyes” is a proposition; “has blue eyes” is a predicate. In M-L type theory, a predicate is only meaningful if you can write a program that, given an object (or group of objects) as a parameter, can determine whether or not the predicate is true for that object.
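
As a loose programming analogy (plain Python booleans standing in for proofs, with made-up data – nothing like the real propositions-as-types machinery), the proposition/predicate distinction might look like:

```python
# Loose analogy only: Python functions standing in for constructive proofs.
BLUE_RANGE_NM = (450, 495)   # hypothetical wavelength range we agree to call "blue"

def is_blue(wavelength_nm):
    lo, hi = BLUE_RANGE_NM
    return lo <= wavelength_nm <= hi

# Predicate: "has blue eyes", parameterized over the person it describes.
def has_blue_eyes(person, eye_measurements):
    return is_blue(eye_measurements[person])

# Proposition: the predicate applied to one fixed object.
eye_measurements = {"Mark": 470, "Alice": 510}   # invented measurements
mark_has_blue_eyes = has_blue_eyes("Mark", eye_measurements)
```

Running the proposition's checking program is what plays the role of proving it.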

That’s roughly where we got to in type theory before the blog went on hiatus.

David Hogg: after SDSS-IV; red-clump stars

At Stars group meeting, Juna Kollmeier (OCIW) spoke about the plans for the successor project to SDSS-IV. It will be an all-sky spectroscopic survey, with 15 million spectroscopic visits, on 5-ish million targets. The cadence and plan are made possible by advances in robot fiber positioning, and The Cannon, which permits inferences about stars that scale well with decreasing signal-to-noise ratio. The survey will use the 2.5-m SDSS telescope in the North, and the 2.5-m du Pont in the South. Science goals include galactic archaeology, stellar systems (binaries, triples, and so on), evolved stars, origins of the elements, TESS scientific support and follow-up, and time-domain events. The audience had many questions about operations and goals, including the maturity of the science plan. The short story is that partners who buy in to the survey now will have a lot of influence over the targeting and scientific program.

Keith Hawkins (Columbia) showed his red-clump-star models built on TGAS and 2MASS and WISE and GALEX data. He finds an intrinsic scatter of about 0.17 magnitude (RMS) in many bands, and, when the scatter is larger, there are color trends that could be calibrated out. He also, incidentally, infers a dust reddening for every star. One nice result is that he finds a huge dependence of the GALEX photometry on metallicity, which has lots of possible scientific applications. The crowd discussed the extent to which theoretical ideas support the standard-ness of RC stars.

John Preskill: Glass beads and weak-measurement schemes

Richard Feynman fiddled with electronics in a home laboratory, growing up. I fiddled with arts and crafts.1 I glued popsicle sticks, painted plaques, braided yarn, and designed greeting cards. Of the supplies in my family’s crafts box, I adored the beads most. Of the beads, I favored the glass ones.

I would pour them on the carpet, some weekend afternoons. I’d inherited a hodgepodge: The beads’ sizes, colors, shapes, trimmings, and craftsmanship varied. No property divided the beads into families whose members looked like they belonged together. But divide the beads I tried. I might classify them by color, then subdivide classes by shape. The color and shape groupings precluded me from grouping by size. But, by loosening my original classification and combining members from two classes, I might incorporate trimmings into the categorization. I’d push my classification scheme as far as I could. Then, I’d rake the beads together and reorganize them according to different principles.

Why have I pursued theoretical physics? many people ask. I have many answers. They include “Because I adored organizing craft supplies at age eight.” I craft and organize ideas.


I’ve blogged about the out-of-time-ordered correlator (OTOC), a signature of how quantum information spreads throughout a many-particle system. Experimentalists want to measure the OTOC, to learn how information spreads. But measuring the OTOC requires tight control over many quantum particles.

I proposed a scheme for measuring the OTOC, with help from Chapman physicist Justin Dressel. The scheme involves weak measurements. Weak measurements barely disturb the systems measured. (Most measurements of quantum systems disturb the measured systems. So intuited Werner Heisenberg when formulating his uncertainty principle.)

I had little hope for the weak-measurement scheme’s practicality. Consider the stereotypical experimentalist’s response to a stereotypical experimental proposal by a theorist: Oh, sure, we can implement that—in thirty years. Maybe. If the pace of technological development doubles. I expected to file the weak-measurement proposal in the “unfeasible” category.

But experimentalists started collaring me. The scheme sounds reasonable, they said. How many trials would one have to perform? Did the proposal require ancillas, helper systems used to control the measured system? Must each ancilla influence the whole measured system, or could an ancilla interact with just one particle? How did this proposal compare with alternatives?

I met with a cavity-QED2 experimentalist and a cold-atoms expert. I talked with postdocs over skype, with heads of labs at Caltech, with grad students in Taiwan, and with John Preskill in his office. I questioned an NMR3 experimentalist over lunch and fielded superconducting-qubit4 questions in the sunshine. I read papers, reread papers, and powwowed with Justin.

I wouldn’t have managed half so well without Justin and without Brian Swingle. Brian and coauthors proposed the first OTOC-measurement scheme. He reached out after finding my first OTOC paper.

According to that paper, the OTOC is a moment of a quasiprobability.5 How does that quasiprobability look, we wondered? How does it behave? What properties does it have? Our answers appear in a paper we released with Justin this month. We calculate the quasiprobability in two examples, prove properties of the quasiprobability, and argue that the OTOC motivates generalizations of quasiprobability theory. We also enhance the weak-measurement scheme and analyze it.

Amidst that analysis, in a 10 x 6 table, we classify glass beads.


We inventoried our experimental conversations and distilled them. We culled measurement-scheme features analogous to bead size, color, and shape. Each property labels a row in the table. Each measurement scheme labels a column. Each scheme has, I learned, gold flecks and dents, hues and mottling, an angle at which it catches the light.

I’ve kept most of the glass beads that fascinated me at age eight. Some of the beads have dispersed to necklaces, picture frames, and eyeglass leashes. I moved the remnants, a few years ago, to a compartmentalized box. Doesn’t it resemble the table?


That’s why I work at the IQIM.


1I fiddled in a home laboratory, too, in a garage. But I lived across the street from that garage. I lived two rooms from an arts-and-crafts box.

2Cavity QED consists of light interacting with atoms in a box.

3Lots of nuclei manipulated with magnetic fields. “NMR” stands for “nuclear magnetic resonance.” MRI machines, used to scan brains, rely on NMR.

4Superconducting circuits are tiny, cold quantum circuits.

5A quasiprobability resembles a probability but behaves more oddly: Probabilities range between zero and one; quasiprobabilities can dip below zero. Think of a moment as like an average.

With thanks to all who questioned me; to all who answered questions of mine; to my wonderful coauthors; and to my parents, who stocked the crafts box.

April 24, 2017

Doug Natelson: Quantum conduction in bad metals, and jphys+

I've written previously about bad metals.  We recently published a result (also here) looking at what happens to conduction in an example of such a material at low temperatures, when quantum corrections to conduction (like these) should become increasingly important.   If you're interested, please take a look at a blog post I wrote about this that is appearing on jphys+, the very nice blogging and news/views site run by the Institute of Physics.

David Hogg: Dr Vakili

The research highlight of the day was a beautiful PhD defense by my student MJ Vakili (NYU). Vakili presented two big projects from his thesis: In one, he has developed fast mock-catalog software for understanding cosmic variance in large-scale structure surveys. In the other, he has built and run an inference method to learn the pixel-convolved point-spread function in a space-based imaging device.
In both cases, he has good evidence that his methods are the best in the world. (We intend to write up the latter in the Summer.) Vakili's thesis is amazingly broad, going from pixel-level image processing work that will serve weak-lensing and other precise imaging tasks, all the way up to new methods for using computational simulations to perform principled inferences with cosmological data sets. He was granted a PhD at the end of an excellent defense and a lively set of arguments in the seminar room and in committee. Thank you, MJ, for a great body of work, and a great contribution to my scientific life.

April 23, 2017

Doug Natelson: Thoughts after the March for Science

About 10000 people turned out (according to the Houston Chronicle) for our local version of the March for Science.   Observations:

  • While there were some overtly partisan participants and signs, the overarching messages that came through were "We're all in this together!", "Science has made the world a better place, with much less disease and famine, a much higher standard of living for billions, and a greater understanding of the amazingness of the universe.", "Science does actually provide factual answers to properly formulated scientific questions", and "Facts are not opinions, and should feed into policy decisions, rather than policy positions altering what people claim are facts."
  • For a bunch of people often stereotyped as humorless, scientists had some pretty funny, creative signs.  A personal favorite:  "The last time scientists were silenced, Krypton exploded!"  One I saw online:  "I can't believe I have to march for facts."
  • Based on what I saw, it's hard for me to believe that this would have the negative backlash that some were worrying about before the event.  It simply wasn't done in a sufficiently controversial or antagonistic way.  Anyone who would have found the messages in the first point above to be offensive and polarizing likely already had negative perceptions of scientists, and (for good or ill) most of the population wasn't paying much attention anyway.
So what now?

  • Hopefully this will actually get more people who support the main messages above to engage, both with the larger community and with their political representatives.  
  • It would be great to see some more scientists and engineers actually run for office.  
  • It would also be great if more of the media would get on board with the concept that there really are facts.  Policy-making is complicated and must take into account many factors about which people can have legitimate disagreements, but that does not mean that every statement has two sides.  "Teach the controversy" is not a legitimate response to questions of testable fact.  In other words, Science is Real.
  • Try to stay positive and keep the humor and creativity flowing.  We are never going to persuade a skeptical, very-busy-with-their-lives public if all we do is sound like doomsayers.

n-Category Café On Clubs and Data-Type Constructors

Guest post by Pierre Cagne

The Kan Extension Seminar II continues with a third consecutive paper by Kelly, entitled On clubs and data-type constructors. It deals with the notion of club, first introduced by Kelly as an attempt to encode theories of categories with structure involving some kind of coherence issues. Astonishingly enough, there is no mention of operads whatsoever in this article. (To be fair, there is a mention of “those Lawvere theories with only associativity axioms”…) Is it because the notion of club was developed in several stages at various time periods, making operads less identifiable among this work? Or does Kelly judge the link between the two notions irrelevant? I am not sure, but anyway I think it is quite interesting to read this article in the light of what we now know about operads.

Before starting with the mathematical content, I would like to thank Alexander, Brendan and Emily for organizing this online seminar. It is a great opportunity to take a deeper look at seminal papers that would have been hard to explore all by oneself. On that note, I am also very grateful for the rich discussions I have with my fellow participants.

Non symmetric Set-operads

Let us take a look at the simplest kind of operads: non symmetric $\mathsf{Set}$-operads. Those are, informally, collections of operations with given arities, closed under composition. The usual way to define them is to endow the category $[\mathbf{N},\mathsf{Set}]$ of $\mathbf{N}$-indexed families of sets with the substitution monoidal product (see Simon’s post): for two such families $R$ and $S$,
$$(R \circ S)_n = \sum_{k_1+\dots+k_m = n} R_m \times S_{k_1} \times \dots \times S_{k_m} \quad \forall n \in \mathbf{N}$$
This monoidal product is better understood when elements of $R_n$ and $S_n$ are thought of as branchings with $n$ inputs and one output: $R\circ S$ is then obtained by plugging the outputs of elements of $S$ into the inputs of elements of $R$. A non symmetric operad is defined to be a monoid for that monoidal product, a typical example being the family $(\mathsf{Set}(X^n,X))_{n\in\mathbf{N}}$ for a set $X$.
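For finite families, the substitution product above can be computed directly. The following is a small illustrative sketch (the encoding and all names are mine, not from the post), representing an $\mathbf{N}$-indexed family as a dict from arities to sets of operation labels:

```python
from itertools import product

def compositions(m, n):
    """All ways to write n as an ordered sum of m non-negative integers."""
    if m == 0:
        return [()] if n == 0 else []
    return [(k,) + rest for k in range(n + 1) for rest in compositions(m - 1, n - k)]

def substitute(R, S, n):
    """(R ∘ S)_n: tuples (r, s_1..s_m) with r of arity m and arities of s_i summing to n."""
    out = set()
    for m, ops in R.items():
        for ks in compositions(m, n):
            if all(k in S for k in ks):
                for r in ops:
                    for ss in product(*(S[k] for k in ks)):
                        out.add((r,) + ss)
    return out

R = {2: {"mu"}}            # one binary operation
S = {0: {"e"}, 1: {"id"}}  # one nullary and one unary operation
print(sorted(substitute(R, S, 1)))  # [('mu', 'e', 'id'), ('mu', 'id', 'e')]
```

The two resulting trees are exactly the two ways of plugging one output of arity 0 and one of arity 1 into a binary branching.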

We can now take advantage of the equivalence $[\mathbf{N},\mathsf{Set}] \overset{\sim}\to \mathsf{Set}/\mathbf{N}$ to equip the category $\mathsf{Set}/\mathbf{N}$ with a monoidal product. This equivalence maps a family $S$ to the coproduct $\sum_n S_n$ with the canonical map to $\mathbf{N}$, while the inverse equivalence maps a function $a: A \to \mathbf{N}$ to the family of fibers $(a^{-1}(n))_{n\in\mathbf{N}}$. It means that an $\mathbf{N}$-indexed family can be thought of either as a set of operations of arity $n$ for each $n$, or as a bunch of operations, each labeled by an integer giving its arity. Let us transport the monoidal product of $[\mathbf{N},\mathsf{Set}]$ to $\mathsf{Set}/\mathbf{N}$: given two maps $a: A \to \mathbf{N}$ and $b: B \to \mathbf{N}$, we compute the $\circ$-product of the families of fibers, and then take the coproduct to get
$$A\circ B = \{ (x,y_1,\dots,y_m) : x \in A,\ y_i \in B,\ a(x) = m \}$$
with the map $A\circ B \to \mathbf{N}$ sending $(x,y_1,\dots,y_m)\mapsto \sum_i b(y_i)$. That is, the monoidal product is achieved by computing the following pullback:

Non symmetric operads as pullbacks

where $L$ is the free monoid monad (or list monad) on $\mathsf{Set}$. Hence a non symmetric operad is equivalently a monoid in $\mathsf{Set}/\mathbf{N}$ for this monoidal product. In Burroni’s terminology, it would be called an $L$-category with one object.
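The same product seen from the $\mathsf{Set}/\mathbf{N}$ side can also be sketched concretely (again my own hypothetical encoding): an object of $\mathsf{Set}/\mathbf{N}$ is a dict sending each operation to its arity, and the product returns $A\circ B$ together with its structure map to $\mathbf{N}$:

```python
from itertools import product

def slice_product(a, b):
    """a, b: dicts elem -> arity. Returns A∘B as a dict (x, y_1..y_m) -> Σ b(y_i)."""
    out = {}
    for x, m in a.items():                      # x has arity m = a(x)
        for ys in product(b.keys(), repeat=m):  # m-tuples of elements of B
            out[(x,) + ys] = sum(b[y] for y in ys)
    return out

a = {"mu": 2}              # one binary operation
b = {"e": 0, "id": 1}      # one nullary, one unary
print(slice_product(a, b))
# {('mu', 'e', 'e'): 0, ('mu', 'e', 'id'): 1, ('mu', 'id', 'e'): 1, ('mu', 'id', 'id'): 2}
```

Taking fibers of the resulting arity map recovers the family-of-sets description, which is the content of the equivalence above.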

In my opinion, Kelly’s clubs are a way to generalize this point of view to other kinds of operads, replacing $\mathbf{N}$ by the groupoid $\mathbf{P}$ of bijections (to get symmetric operads) or the category $\mathsf{Fin}$ of finite sets (to get Lawvere theories). Obviously, $\mathsf{Set}/\mathbf{P}$ or $\mathsf{Set}/\mathsf{Fin}$ does not make much sense, but the coproduct functor from earlier can easily be understood as a Grothendieck construction that adapts neatly to this context, providing functors $[\mathbf{P},\mathsf{Set}] \to \mathsf{Cat}/\mathbf{P}$ and $[\mathsf{Fin},\mathsf{Set}] \to \mathsf{Cat}/\mathsf{Fin}$. Of course, these functors are not equivalences anymore, but that does not prevent us from looking for monoidal products on $\mathsf{Cat}/\mathbf{P}$ and $\mathsf{Cat}/\mathsf{Fin}$ that restrict to the substitution product on the essential images of these functors (i.e. the discrete opfibrations). Before going to the abstract definitions, you might keep in mind the following goal: we are seeking those small categories $\mathcal{C}$ such that $\mathsf{Cat}/\mathcal{C}$ admits a monoidal product reflecting, through the Grothendieck construction, the substitution product in $[\mathcal{C},\mathsf{Set}]$.

Abstract clubs

Recall that in a monoidal category $\mathcal{E}$ with product $\otimes$ and unit $I$, any monoid $M$ with multiplication $m: M\otimes M \to M$ and unit $u: I \to M$ induces a monoidal structure on $\mathcal{E}/M$ as follows: the unit is $u: I \to M$ and the product of $f: X \to M$ by $g: Y \to M$ is the composite
$$X\otimes Y \overset{f\otimes g}\to M \otimes M \overset{m}\to M$$
Be aware that this monoidal structure depends heavily on the monoid $M$. For example, even if $\mathcal{E}$ is finitely complete and $\otimes$ is the cartesian product, the induced structure on $\mathcal{E}/M$ is almost never the cartesian one. A notable fact about this structure on $\mathcal{E}/M$ is that the monoids in it are exactly the morphisms of monoids with codomain $M$.

We will use this property in the monoidal category $[\mathcal{A},\mathcal{A}]$ of endofunctors on a category $\mathcal{A}$. I will not say a lot about size issues here, but of course we assume that there exist enough universes to make sense of $[\mathcal{A},\mathcal{A}]$ as a category even when $\mathcal{A}$ is not small but only locally small: that is, if smallness is relative to a universe $\mathbb{U}$, then we posit a universe $\mathbb{V} \ni \mathbb{U}$ big enough to contain the set of objects of $\mathcal{A}$, making $\mathcal{A}$ a $\mathbb{V}$-small category, hence $[\mathcal{A},\mathcal{A}]$ a locally $\mathbb{V}$-small category. The monoidal product on $[\mathcal{A},\mathcal{A}]$ is just the composition of endofunctors and the unit is the identity functor $\mathrm{Id}$. The monoids in that category are precisely the monads on $\mathcal{A}$, and for any such $S: \mathcal{A} \to \mathcal{A}$ with multiplication $n: SS \to S$ and unit $j: \mathrm{Id} \to S$, the slice category $[\mathcal{A},\mathcal{A}]/S$ inherits a monoidal structure with unit $j$ and product $\alpha \circ^S \beta$ the composite
$$T R \overset{\alpha\beta}\to S S \overset{n}\to S$$
for any $\alpha: T \to S$ and $\beta: R \to S$.

Now a natural transformation $\gamma$ between two functors $F,G: \mathcal{A} \to \mathcal{A}$ is said to be cartesian whenever the naturality squares

Cartesian natural transformation

are pullback diagrams. If $\mathcal{A}$ is finitely complete, as it will be for the rest of the post, it admits in particular a terminal object $1$, and the pasting lemma ensures that we only have to check the pullback property for the naturality squares of the form

Alternative definition of cartesian natural transformation

to know if $\gamma$ is cartesian. Let us denote by $\mathcal{M}$ the (possibly large) set of morphisms in $[\mathcal{A},\mathcal{A}]$ that are cartesian in this sense, and denote by $\mathcal{M}/S$ the full subcategory of $[\mathcal{A},\mathcal{A}]/S$ whose objects are in $\mathcal{M}$.

Definition. A club in $\mathcal{A}$ is a monad $S$ such that $\mathcal{M}/S$ is closed under the monoidal product $\circ^S$.

By “closed under $\circ^S$”, it is understood that the unit $j$ of $S$ is in $\mathcal{M}$ and that the product $\alpha \circ^S \beta$ of two elements of $\mathcal{M}$ with codomain $S$ is still in $\mathcal{M}$. A useful alternate characterization is the following:

Lemma. A monad $(S,n,j)$ is a club if and only if $n,j \in \mathcal{M}$ and $S\mathcal{M}\subseteq \mathcal{M}$.

It is clear from the definition of $\circ^S$ that the condition is sufficient, as $\alpha \circ^S \beta$ can be written as $n\cdot(S\beta)\cdot(\alpha T)$ via the exchange rule. Now suppose $S$ is a club: $j \in \mathcal{M}$ as it is the monoidal unit; $n \in \mathcal{M}$ comes from $\mathrm{id}_S \circ^S \mathrm{id}_S \in \mathcal{M}$; finally, for any $\alpha: T \to S$ in $\mathcal{M}$, we have $\mathrm{id}_S \circ^S \alpha = n\cdot(S\alpha) \in \mathcal{M}$, and having already $n\in\mathcal{M}$ this yields $S\alpha \in \mathcal{M}$ by the pasting lemma.

In particular, this lemma shows that monoids in $\mathcal{M}/S$, which coincide with monad maps $T \to S$ in $\mathcal{M}$ for some monad $T$, are clubs too. We shall denote the category of these by $\mathbf{Club}(\mathcal{A})/S$.

The lemma also implies that any cartesian monad, by which is meant a pullback-preserving monad with cartesian unit and multiplication, is automatically a club.

Now note that evaluation at $1$ provides an equivalence $\mathcal{M}/S \overset{\sim}\to \mathcal{A}/S1$ whose pseudo-inverse is given, for a map $f:K \to S1$, by the natural transformation pointwise defined as the pullback


The previous monoidal product on $\mathcal{M}/S$ can be transported to $\mathcal{A}/S1$ and bears a fairly simple description: given $f:K \to S1$ and $g:H \to S1$, the product, still denoted $f\circ^S g$, is the evaluation at $1$ of the composite $TR \to SS \to S$, where $T \to S$ corresponds to $f$ and $R\to S$ to $g$. Hence the explicit equivalence given above allows us to write this as

Clubs as pullbacks

Definition. By abuse of terminology, a monoid in $\mathcal{A}/S1$ is said to be a club over $S1$.

Examples of clubs

On $\mathsf{Set}$, the free monoid monad $L$ is cartesian, hence a club on $\mathsf{Set}$ in the above sense. Of course, we retrieve as $\circ^L$ the monoidal product of the introduction on $\mathsf{Set}/\mathbf{N}$. Hence, clubs over $\mathbf{N}$ in $\mathsf{Set}$ are exactly the non symmetric $\mathsf{Set}$-operads.

Considering $\mathsf{Cat}$ as a $1$-category, the free finite-coproduct category monad $F$ on $\mathsf{Cat}$ is a club in the above sense. This can be shown directly through the characterization we stated earlier: its unit and multiplication are cartesian and it maps cartesian transformations to cartesian transformations. Moreover, the obvious monad map $P \to F$ is cartesian, where $P$ is the free strict symmetric monoidal category monad on $\mathsf{Cat}$. Hence it yields for free that $P$ is also a club on $\mathsf{Cat}$. Note that the groupoid $\mathbf{P}$ of bijections is $P1$ and the category $\mathsf{Fin}$ of finite sets is $F1$. So it is now a matter of careful bookkeeping to establish that the functors (given by the Grothendieck construction) $[\mathbf{P},\mathsf{Set}] \to \mathsf{Cat}/\mathbf{P}$ and $[\mathsf{Fin},\mathsf{Set}] \to \mathsf{Cat}/\mathsf{Fin}$ are strong monoidal, where the domain categories are given Kelly’s substitution product. In other words, it exhibits symmetric $\mathsf{Set}$-operads and non-enriched Lawvere theories as special clubs over $\mathbf{P}$ and $\mathsf{Fin}$.

We could say that we are done: we have a polished abstract notion of clubs that encompasses the different notions of operads on $\mathsf{Set}$ that we are used to. But what about operads on other categories? Also, the above monads $P$ and $F$ are actually $2$-monads on $\mathsf{Cat}$ when seen as a $2$-category. Can we extend the notion to this enrichment?

Enriched clubs

We shall fix a cosmos $\mathcal{V}$ to enrich over (and denote as usual the underlying ordinary notions by a $0$-index), but we want it to have good properties, so that finite completeness makes sense in this enriched framework. Hence we ask that $\mathcal{V}$ be locally finitely presentable as a closed category (see David’s post). Taking a look at what we did in the ordinary case, we see that it relies heavily on the possibility of defining slice categories, which is not possible in full generality. Hence we ask for $\mathcal{V}$ to be semicartesian, meaning that the monoidal unit of $\mathcal{V}$ is its terminal object: then for a $\mathcal{V}$-category $\mathcal{B}$, the slice category $\mathcal{B}/B$ is defined to have elements $1 \to \mathcal{B}(X,B)$ as objects, and the space of morphisms between such $f:1 \to \mathcal{B}(X,B)$ and $f':1 \to \mathcal{B}(X',B)$ is given by the following pullback in $\mathcal{V}_0$:

Comma enriched

If we also want to be able to talk about the category of enriched clubs over something, we should be able to make a $\mathcal{V}$-category out of the monoids in a monoidal $\mathcal{V}$-category. Again, this is a priori not possible to do: the space of monoid maps between $(M,m,i)$ and $(N,n,j)$ is supposed to interpret “the subspace of those $f: M \to N$ such that $fi=j$ and $fm(x,y)=n(fx,fy)$ for all $x,y$”, where the latter equation has two occurrences of $f$ on the right. Hence we ask that $\mathcal{V}$ be actually a cartesian cosmos, so that the interpretation of such a subspace is the joint equalizer of

Monoid enriched

Monoid enriched

Moreover, these hypotheses also resolve the set-theoretical issues: because of all the hypotheses on $\mathcal{V}$, the underlying $\mathcal{V}_0$ identifies with the category $\mathrm{Lex}[\mathcal{T}_0,\mathsf{Set}]$ of $\mathsf{Set}$-valued left exact functors from the finitely presentable objects of $\mathcal{V}_0$. Hence, for a $\mathcal{V}$-category $\mathcal{A}$, the category of $\mathcal{V}$-endofunctors $[\mathcal{A},\mathcal{A}]$ is naturally a $\mathcal{V}'$-category for the cartesian cosmos $\mathcal{V}'=\mathrm{Lex}[\mathcal{T}_0,\mathsf{Set}']$, where $\mathsf{Set}'$ is the category of $\mathbb{V}$-small sets for a universe $\mathbb{V}$ big enough to contain the set of objects of $\mathcal{A}$. Hence we do not care so much about size issues and consider everything to be a $\mathcal{V}$-category; the careful reader will replace $\mathcal{V}$ by $\mathcal{V}'$ when necessary.

In the context of categories enriched over a locally finitely presentable cartesian closed cosmos $\mathcal{V}$, all we did in the ordinary case is directly enrichable. We call a $\mathcal{V}$-natural transformation $\alpha: T \to S$ cartesian just when it is so as a natural transformation $T_0 \to S_0$, and denote the set of these by $\mathcal{M}$. For a $\mathcal{V}$-monad $S$ on $\mathcal{A}$, the category $\mathcal{M}/S$ is the full subcategory of the slice $[\mathcal{A},\mathcal{A}]/S$ spanned by the objects in $\mathcal{M}$.

Definition. A $\mathcal{V}$-club on $\mathcal{A}$ is a $\mathcal{V}$-monad $S$ such that $\mathcal{M}/S$ is closed under the induced $\mathcal{V}$-monoidal product of $[\mathcal{A},\mathcal{A}]/S$.

Now comes the fundamental proposition about enriched clubs:

Proposition. A $\mathcal{V}$-monad $S$ is a $\mathcal{V}$-club if and only if $S_0$ is an ordinary club.

In that case, the category of monoids in $\mathcal{M}/S$ is composed of the clubs $T$ together with a $\mathcal{V}$-monad map $1 \to [\mathcal{A},\mathcal{A}](T,S)$ in $\mathcal{M}$. We will still denote it $\mathbf{Club}(\mathcal{A})/S$, and its underlying ordinary category is $\mathbf{Club}(\mathcal{A}_0)/S_0$. We can once again take advantage of the $\mathcal{V}$-equivalence $\mathcal{M}/S \simeq \mathcal{A}/S1$ to equip the latter with a $\mathcal{V}$-monoidal product, and abuse terminology to call its monoids $\mathcal{V}$-clubs over $S1$. Proving all that carefully requires notions of enriched factorization systems that are of no use for this post.

So basically, the slogan is: as long as $\mathcal{V}$ is a cartesian cosmos which is locally presentable as a closed category, everything works the same way as in the ordinary case, and $(-)_0$ preserves and reflects clubs.

Examples of enriched clubs

As we said earlier, $F$ and $P$ are $2$-monads on $\mathsf{Cat}$, and the underlying $F_0$ and $P_0$ (earlier just denoted $F$ and $P$) are ordinary clubs. So $F$ and $P$ are $\mathsf{Cat}$-clubs, maybe better called $2$-clubs. Moreover, the map $P_0 \to F_0$ mentioned earlier is easily promoted to a $2$-natural transformation, making $\mathbf{P}$ a $2$-club over $\mathsf{Fin}$.

The free monoid monad on a cartesian cosmos $\mathcal{V}$ is a $\mathcal{V}$-club, and the clubs over $L1$ are precisely the non symmetric $\mathcal{V}$-operads.

Last but not least, a quite surprising example at first sight. Any small ordinary category $\mathcal{A}_0$ is naturally enriched in its category of presheaves $\mathrm{Psh}(\mathcal{A}_0)$, as the full subcategory of the cartesian cosmos $\mathcal{V}=\mathrm{Psh}(\mathcal{A}_0)$ spanned by the representables. Concretely, the space of morphisms between $A$ and $B$ is given by the presheaf
$$\mathcal{A}(A,B): C \mapsto \mathcal{A}_0(A \times C, B)$$
Hence a $\mathcal{V}$-endofunctor $S$ on $\mathcal{A}$ is the data of a map $A \mapsto SA$ on objects, together with, for any $A,B$, a $\mathcal{V}$-natural transformation $\sigma_{A,B}: \mathcal{A}(A,B) \to \mathcal{A}(SA,SB)$ satisfying some axioms. Now fixing $A,C \in \mathcal{A}$, the collection of
$$(\sigma_{A,B})_C : \mathcal{A}_0(A\times C,B) \to \mathcal{A}_0(SA \times C, SB)$$
is equivalently, via Yoneda, a collection of
$$\tilde{\sigma}_{A,C} \in \mathcal{A}_0(SA\times C,S(A \times C)).$$
The axioms that $\sigma$ satisfies as a $\mathcal{V}$-enriched natural transformation make $\tilde\sigma$ a strength for the endofunctor $S_0$. Along this translation, a strong monad on $\mathcal{A}$ is then just a $\mathrm{Psh}(\mathcal{A}_0)$-monad. And it is very common, when modelling side effects by monads in Computer Science, to end up with strong cartesian monads. As cartesian monads, they are in particular ordinary clubs on $\mathcal{A}_0$. Hence, those are $\mathrm{Psh}(\mathcal{A}_0)$-monads whose underlying ordinary monad is a club: that is, they are $\mathrm{Psh}(\mathcal{A}_0)$-clubs on $\mathcal{A}$.
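To make the notion of strength a bit more concrete, here is a minimal sketch (my own illustration, not from Kelly's paper) of the standard strength for the list monad on sets, where the component $\tilde{\sigma}_{A,C}: SA \times C \to S(A \times C)$ simply pairs the parameter with each element of the computation:

```python
# Strength for the list monad S = list: a map S A × C -> S(A × C).
# The function name is illustrative; naturality and compatibility with
# the monad's unit and multiplication hold for this choice.

def strength(xs, c):
    """Pair each element of the 'computation' xs with the parameter c."""
    return [(x, c) for x in xs]

print(strength([1, 2, 3], "c"))  # [(1, 'c'), (2, 'c'), (3, 'c')]
```

This is exactly the kind of structure that side-effect monads in functional programming carry, which is why they land in the presheaf-enriched setting described above.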

In conclusion, let me point out that there is much more in Kelly’s article than presented here, especially on local factorization systems and their link to (replete) reflective subcategories with a left exact reflexion. It is by the way quite surprising that he does not stay in full generality longer, as one could define an abstract club in just that framework. Maybe there is just no interesting example to come up with at that level of generality…

Also, a great deal of examples of clubs comes from never-published work of Robin Cockett (or at least, I was not able to find it), so these motivations are quite difficult to follow.

Going a little further in the generalization, the cautious reader will have noticed that we did not say anything about coloured operads. For those we would not have to look at slice categories of the form $\mathcal{A}/S1$, but at categories of spans with one leg pointing to $SC$ (morally mapping an operation to its coloured arity) and the other one to $C$ (morally picking the output colour), where $C$ is the object of colours. Those spans actually appear implicitly above whenever a map of the form $!:X \to 1$ is involved (morally, this is the map picking the “only output colour” in a non-coloured operad). This somehow should be contained somewhere in Garner’s work on double clubs or in Shulman’s and Cruttwell’s unified framework for generalized multicategories. I am looking forward to learning more about that in the comments!

April 22, 2017

Scott AaronsonMe at the Science March today, in front of the Texas Capitol in Austin

Noncommutative GeometryGreat Thanks!

Let me express my heartfelt thanks to the organizers of the Shanghai 2017 NCG-event as well as to each participant. Your presence and talks were the most wonderful gift showing so many lively facets of mathematics and physics displaying the vitality of NCG! The whole three weeks were a pleasure due to the wonderful hospitality of our hosts, Xiaoman Chen, Guoliang Yu and Yi-Jun Yao. It is

April 21, 2017

Scott AaronsonIf Google achieves superintelligence, time zones will be its Achilles heel

Like a latter-day Prometheus, Google brought a half-century of insights down from Mount Academic CS, and thereby changed life for the better here in our sublunary realm.  You’ve probably had the experience of Google completing a search query before you’d fully formulated it in your mind, and thinking: “wow, our dysfunctional civilization might no longer be able to send people to the Moon, or even build working mass-transit systems, but I guess there are still engineers who can create things that inspire awe.  And apparently many of them work at Google.”

I’ve never worked at Google, or had any financial stake in them, but I’m delighted to have many friends at Google’s far-flung locations, from Mountain View to Santa Barbara to Seattle to Boston to London to Tel Aviv, who sometimes host me when I visit and let me gorge on the legendary free food.  If Google’s hiring of John Martinis and avid participation in the race for quantum supremacy weren’t enough, in the past year, my meeting both Larry Page and Sergey Brin to discuss quantum computing and the foundations of quantum mechanics, and seeing firsthand the intensity of their nerdish curiosity, heightened my appreciation still further for what that pair set in motion two decades ago.  Hell, I don’t even begrudge Google its purchase of a D-Wave machine—even that might’ve ultimately been for the best, since it’s what led to the experiments that made clear the immense difficulty of getting any quantum speedup from those machines in a fair comparison.

But of course, all that fulsome praise was just a preamble to my gripe.  It’s time someone said it in public: the semantics of Google Calendar are badly screwed up.

The issue is this: suppose I’m traveling to California, and I put into Google Calendar that, the day after I arrive, I’ll be giving a lecture at 4pm.  In such a case, I always—always—mean 4pm California time.  There’s no reason why I would ever mean, “4pm in whatever time zone I’m in right now, while creating this calendar entry.”

But Google Calendar doesn’t understand that.  And its not understanding it—just that one little point—has led to years of confusions, missed appointments, and nearly-missed flights, on both my part and Dana’s.  At least, until we learned to painstakingly enter the time zone for every calendar entry by hand (I still often forget).

Until recently, I thought it was just me and Dana who had this problem.  But then last week, completely independently, a postdoc started complaining to me, “you know what’s messed up about Google Calendar?…”

The ideal, I suppose, would be to use machine learning to guess the intended time zone for each calendar entry.  But failing that, it would also work fine just to assume that “4pm,” as entered by the user, unless otherwise specified means “4pm in whatever time zone we find ourselves in when the appointed day arrives.”
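For what it's worth, the proposed "floating time" semantics is easy to sketch. The example below is a hypothetical illustration, assuming we already know the user's time zone on the event date (inferring that, from travel entries or device location, would be the hard part), using Python's standard zoneinfo:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def resolve_floating_time(naive_dt, tz_on_that_day):
    """Interpret a zone-less wall-clock time in the zone the user will
    actually be in on that day, returning an absolute (aware) datetime."""
    return naive_dt.replace(tzinfo=ZoneInfo(tz_on_that_day))

# "4pm the day after I arrive in California" stays 4pm Pacific,
# no matter where the calendar entry was created:
talk = resolve_floating_time(datetime(2017, 4, 22, 16, 0), "America/Los_Angeles")
print(talk.isoformat())  # 2017-04-22T16:00:00-07:00
```

The design point is that the entry stores a wall-clock time with no zone at all, and the zone is bound only when the day arrives, which is exactly the "4pm wherever I find myself" semantics.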

I foresee two possibilities, either of which I’m OK with.  The first is that Google fixes the problem, whether prompted by this blog post or by something else.  The second is that the issue never gets resolved; then, as often prophesied, Google’s deep nets achieve sentience and plot to take over the whole observable universe … and they would, if not for one fortuitous bug, which will cause the AIs to tip their hand to humanity an hour before planned.

In a discussion thread on Y Combinator, some people object to my proposed solution (“4pm means 4pm in whichever time zone I’ll be in then”) on the following ground: what if I want to call a group meeting at (say) 11am in Austin, and I’ll be traveling but will still call into the meeting remotely, and I want my calendar to show the meeting time in Austin, not the time wherever I’ll be calling in from (which might even be a plane)?

I can attest that, in ten years, that’s not a problem that’s arisen for me even once, whereas the converse problem arises almost every week, and is one of the banes of my existence.

But sure: Google Calendar should certainly include the option to tie times to specific time zones in advance! It seems obvious to me that my way should be the default, but honestly, I’d be happy if my way were even an option you could pick.

BackreactionNo, physicists have not created “negative mass”

Thanks to BBC, I will now for several years get emails from know-it-alls who think physicists are idiots not to realize the expansion of the universe is caused by negative mass. Because that negative mass, you must know, has actually been created in the lab: The Independent declares this turns physics “completely upside down” And if you think that was crappy science journalism, The

BackreactionCatching Light – New Video!

I have many shortcomings, like leaving people uncertain whether they’re supposed to laugh or not. But you can’t blame me for lack of vision. I see a future in which science has become a cultural good, like sports, music, and movies. We’re not quite there yet, but thanks to the Foundational Questions Institute (FQXi) we’re a step closer today. This is the first music video in a series of

Doug NatelsonReady-to-peel semiconductors!

Every now and then there is an article that makes you sit up and say "Wow!"  

Epitaxy is the growth of crystalline material on top of a substrate with a matching (or very close to it) crystal structure.  For example, it is possible to grow InAs epitaxially on top of GaSb, or SiGe epitaxially on top of Si.  The idea is that the lattice of the underlying material guides the growth of the new layers of atoms, and if the lattice mismatch isn't too bad and the conditions are right, you can get extremely high quality growth (that is, with nearly perfect structure).  The ability to grow semiconductor films epitaxially has given us a ton of electronic devices that are everywhere around us, including light emitting diodes, diode lasers, photodiodes, high mobility transistors, etc.   Note that when you grow, say, AlGaAs epitaxially on a GaAs substrate, you end up with one big crystal, all covalently bonded.  You can't readily split off just the newly grown material mechanically.  If you did homoepitaxy, growing GaAs on GaAs, you likely would not even be able to figure out where the substrate ended and the overgrown film began.

In this paper (sorry about the Nature paywall - I couldn't find another source), a group from MIT has done something very interesting.  They have shown that a monolayer of graphene on top of a substrate does not screw up overgrowth of material that is epitaxially registered with the underlying substrate.  That is, if you have an atomically flat, clean GaAs substrate ("epiready"), and cover it with a single atomic layer of graphene, you can grow new GaAs on top of the graphene (!), and despite the intervening carbon atoms (with their own hexagonal lattice in the way), the overgrown GaAs will have registry (crystallographic alignment and orientation) with the underlying substrate.  Somehow the short-ranged potentials that guide the overgrowth are able to penetrate through the graphene.  Moreover, after you've done the overgrowth, you can actually peel off the epitaxial film (!!), since it's only weakly van der Waals bound to the graphene.  They demonstrate this with a variety of overgrown materials, including a III-V semiconductor stack that functions as an LED.  

I found this pretty amazing.  It suggests that there may be real opportunities for using layered van der Waals materials to grow new and unusual systems, perhaps helping with epitaxy even when lattice mismatch would otherwise be a problem.  I suspect the physics at work here (chemical interactions from the substrate "passing through" overlying graphene) is closely related to this work from several years ago.

April 19, 2017

John PreskillThe entangled fabric of space

We live in the information revolution. We translate everything into vast sequences of ones and zeroes. From our personal email to our work documents, from our heart rates to our credit rates, from our preferred movies to our movie preferences, all things information are represented using this minimal {0,1} alphabet which our digital helpers “understand” and process. Many of us physicists are now taking this information revolution to heart and embracing the “It from qubit” motto. Our dream: to understand space, time and gravity as emergent features in a world made of information – quantum information.

Over the past two years, I have been obsessively trying to understand this profound perspective more rigorously. Recently, John Preskill and I have taken a further step in this direction in the paper Quantum code properties from holographic geometries. In it, we make progress in interpreting features of the holographic approach to quantum gravity in terms of quantum information constructs. 

In this post I would like to present some context for this work through analogies which hopefully help intuitively convey the general ideas. While still containing some technical content, this post is not likely to satisfy those readers seeking a precise in-depth presentation. To you I can only recommend the masterfully delivered lecture notes on gravity and entanglement by Mark Van Raamsdonk.  

Entanglement as a cat’s cradle


A cat’s cradle serves as a crude metaphor for quantum mechanical entanglement. The full image provides a complete description of the string and how it is laced in a stable configuration around the two hands. However, this lacing does not describe a stable configuration of half the string on one hand. The string would become disentangled and fall if we were to suddenly remove one of the hands or cut through the middle.

Of all the concepts needed to explain emergent spacetime, maybe the most difficult is that of quantum entanglement. While the word seems to convey some kind of string wound up in a complicated way, it is actually a quality which may describe information in quantum mechanical systems. In particular, it applies to a system for which we have a complete description as a whole, but are only capable of describing certain statistical properties of its parts. In other words, our knowledge of the whole loses predictive power when we are only concerned with the parts. Entanglement entropy is a measure of information which quantifies this.
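As a concrete, entirely standard illustration (my own addition, not from the post): the entanglement entropy of one qubit of a Bell pair can be computed from the Schmidt coefficients of the state, here sketched with numpy:

```python
import numpy as np

def entanglement_entropy(psi, dims=(2, 2)):
    """von Neumann entropy (in bits) of the first subsystem of a pure state."""
    m = psi.reshape(dims)                   # matrix of amplitudes
    s = np.linalg.svd(m, compute_uv=False)  # Schmidt coefficients
    p = s**2
    p = p[p > 1e-12]                        # drop numerically zero weights
    return float(max(0.0, -np.sum(p * np.log2(p))))  # clamp -0.0 to 0.0

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)     # (|00> + |11>)/sqrt(2)
product = np.array([1, 0, 0, 0], dtype=float)  # |00>, unentangled
print(entanglement_entropy(bell))     # ≈ 1.0: one full bit of entanglement
print(entanglement_entropy(product))  # 0.0: the whole determines the parts
```

The maximally entangled state carries exactly one bit of entropy in either half: complete knowledge of the whole, maximal ignorance of the parts, which is the quantitative version of the cat's-cradle picture above.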

While our metaphor for entanglement is quite crude, it will serve the purpose of this post. Namely, to illustrate one of the driving premises for the holographic approach to quantum gravity, that the very structure of spacetime is emergent and built up from entanglement entropy.
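The metaphor is qualitative, but the underlying quantity is easy to compute in the simplest case. Here is a minimal sketch (my own illustration, assuming numpy) of the entanglement entropy of half of a two-qubit Bell pair: the whole state is perfectly known, yet either half alone is maximally random.

```python
import numpy as np

# Bell state |phi+> = (|00> + |11>)/sqrt(2), written as a 4-vector
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

# Reshape into a 2x2 matrix indexed by (subsystem A, subsystem B)
# and trace out B to get the reduced density matrix of A.
m = psi.reshape(2, 2)
rho_A = m @ m.conj().T  # partial trace over B

# Von Neumann entropy S(rho_A) = -sum p log2 p, in bits
evals = np.linalg.eigvalsh(rho_A)
S = -sum(p * np.log2(p) for p in evals if p > 1e-12)
print(S)  # approximately 1 bit: half of a Bell pair is maximally mixed
```

The pure global state has zero entropy, while either half carries a full bit of entanglement entropy; this is the loss of predictive power described above.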

Knit and crochet your way into the manifolds

But let us bring back our metaphors and try to convey the content of this identification. For this, we resort to the unlikely worlds of knitting and crochet. Indeed, by a planned combination of individual loops and stitches, these traditional crafts are capable of approximating any kind of surface (2D Riemannian surface would be the technical term).

Here I have presented some examples with uniform curvature R: flat in green, positive curvature (ball) in yellow and negative curvature (coral reef) in purple. While actual practitioners may be more interested in getting the shape right on hats and socks for loved ones, for us the point is that if we take a step back, these objects built of simple loops, hooks and stitches could end up looking a lot like the smooth surfaces that a physicist might like to use to describe 2D space. This is cute, but can we push this metaphor even further?

Well, first of all, although the pictures above are only representing 2D surfaces, we can expect that a similar approach should allow approximating 3D and even higher dimensional objects (again the technical term is Riemannian manifolds). It would just make things much harder to present in a picture. These woolen structures are, in fact, quite reminiscent of tensor networks, a modern mathematical construct widely used in the field of quantum information. There too, we combine basic building blocks (tensors) through simple operations (tensor index contraction) to build a more complex composite object. In the tensor network world, the structure of the network (how its nodes are connected to other nodes) generically defines the entanglement structure of the resulting object.


This regular tensor network layout was used to describe hyperbolic space, which is similar to the purple crochet. However, the two a priori look quite dissimilar due to the use of the Poincaré disk model, in which tensors further from the center are drawn smaller. Another difference is that the high degree of regularity is achieved at the expense of having very few tensors per curvature radius (as compared to its purple crochet cousin). However, planarity and regularity don’t seem to be essential, so the crochet probably provides a better intuitive picture.

Roughly speaking, tensor networks are ingenious ways of encoding (quantum) inputs into (quantum) outputs. In particular, if you enter some input at the boundary of your tensor network, the tensors do the work of processing that information throughout the network so that if you ask for an output at any one of the nodes in the bulk of the tensor network, you get the right encoded answer. In other words, the information we input into the tensor network begins its journey at the dangling edges found at the boundary of the network and travels through the bulk edges by exploiting them as information bridges between the nodes of the network.

In the figure representing the cat’s cradle, these dangling input edges can be thought of as the fingers holding the wool. Now, if we partition these edges into two disjoint sets (say, the fingers on the left hand and the fingers on the right hand, respectively), there will be some amount of entanglement between them. How much? In general, we cannot say, but under certain assumptions we find that it is proportional to the minimum cut through the network. Imagine you had an incredible number of fingers holding your wool structure. Now separate these fingers arbitrarily into two subsets L and R (we may call them left hand and right hand, although there is nothing right or left handy about them). By pulling left hand and right hand apart, the wool might stretch until at some point it breaks. How many threads will break? Well, the question is analogous to the entanglement one. We might expect, however, that a minimal number of threads break such that each hand can go its own way. This is what we call the minimal cut. In tensor networks, entanglement entropy is always bounded above by such a minimal cut and it has been confirmed that under certain conditions entanglement also reaches, or approximates, this bound. In this respect, our wool analogy seems to be working out.
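To make the min-cut statement concrete, here is a toy sketch (my own illustration, not from any paper): a tiny graph standing in for a tensor network, with boundary legs split into sets L and R, and a brute-force search for the minimal number of “threads” that must break. For a network of bond dimension \chi, the entanglement entropy would then be bounded above by the cut size times \log\chi.

```python
from itertools import combinations

# Toy "tensor network" as an undirected graph: nodes 0-5,
# boundary legs L = {0, 1} and R = {4, 5} (a made-up example).
edges = [(0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (2, 4)]
L, R = {0, 1}, {4, 5}
nodes = {n for e in edges for n in e}

def min_cut(edges, L, R, nodes):
    """Brute force: try every bipartition with L on one side, R on the
    other, and count crossing edges. Fine for tiny graphs only."""
    middle = list(nodes - L - R)
    best = float("inf")
    for k in range(len(middle) + 1):
        for extra in combinations(middle, k):
            side = L | set(extra)
            cut = sum((a in side) != (b in side) for a, b in edges)
            best = min(best, cut)
    return best

print(min_cut(edges, L, R, nodes))  # 2 threads must break
```

For real networks one would use a proper max-flow/min-cut algorithm, but the enumeration makes the analogy with counting breaking threads transparent.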


Holography, in the context of black holes, was sparked by a profound observation of Jacob Bekenstein and Stephen Hawking, which identified the surface area of a black hole horizon (in Planck units) with its entropy, or information content:

S_{BH} = \frac{k A_{BH}}{4\ell_p^2} .

Here, S_{BH} is the entropy associated to the black hole, A_{BH} is its horizon area, \ell_p is the Planck length and k is Boltzmann’s constant.
Why is this equation such a big deal? Well, there are many reasons, but let me emphasize one. For theoretical physicists, it is common to get rid of physical units by relating them through universal constants. For example, the theory of special relativity allows us to identify units of distance with units of time through the equation d=ct using the speed of light c. Special relativity also allows us to identify mass and energy through the famous E=mc^2. By considering the Bekenstein-Hawking entropy, units of area are being swept away altogether! They are being identified with dimensionless units of information (one square meter is roughly 1.4\times10^{69} bits according to the Bousso bound).
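The quoted figure of roughly 1.4\times10^{69} bits per square meter can be checked in a few lines (a back-of-the-envelope sketch; the Planck length is the CODATA value):

```python
import math

l_p = 1.616255e-35   # Planck length in meters (CODATA 2018)

# Bekenstein-Hawking entropy of one square meter of horizon area,
# S = A / (4 l_p^2), gives entropy in nats; divide by ln 2 for bits.
A = 1.0  # m^2
S_nats = A / (4 * l_p**2)
S_bits = S_nats / math.log(2)
print(f"{S_bits:.2e} bits per square meter")  # about 1.38e+69
```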

Initially, the identification of area and information was proposed to reconcile black holes with the laws of thermodynamics. However, this has turned out to be the main hint leading to the holographic principle, wherein states that describe a certain volume of space in a theory of quantum gravity can also be thought of as being represented at the lower-dimensional boundary of the given volume. This idea, put forth by Gerard ‘t Hooft, was later given a more precise interpretation by Leonard Susskind and subsequently by Juan Maldacena through the celebrated AdS/CFT correspondence. I will not dwell on the details of the AdS/CFT correspondence as I am not an expert myself. However, this correspondence gave S. Ryu and T. Takayanagi (RT) a setting to vastly generalize the identification of area as an information quantity. They proposed identifying the area of minimal surfaces on the bulk (remember the minimal cut?) with entanglement entropy in the boundary theory.

Roughly speaking, if we were to split the boundary into two regions, left L and right R, it should be possible to also partition the bulk in such a way that each piece of the bulk has either L or R in its boundary. Ryu and Takayanagi proposed that the area of the smallest surface \chi_R=\chi_L which splits the bulk in this way would be proportional to the entanglement entropy between the two parts

S_L = S_R = \frac{|\chi_L|}{4G} =\frac{|\chi_R|}{4G}.

It turns out that some quantum field theory states admit such a geometric interpretation. Many high-energy theory colleagues have ideas about when this is possible and what the necessary conditions are. By far the best studied setting for this holographic duality is AdS/CFT, where Ryu and Takayanagi first checked their proposal. Here, the entanglement features of the lowest energy state of a conformal field theory are matched to surfaces in a hyperbolic space (like the purple crochet and the tensor network presented). However, other geometries have been shown to match the RT prediction with respect to the entanglement properties of different states. The key point here is that the boundary states do not have any geometry per se. They just manifest different amounts of entanglement when partitioned in different ways.


The holographic program suggests that bulk geometry emerges from the entanglement properties of the boundary state. Spacetime materializes from the information structure of the boundary instead of being a fundamental structure as in general relativity. Am I saying that we should strip everything physical, including space, in favor of ones and zeros? Well, first of all, it is not just me who is pushing this approach. Secondly, no one is claiming that we should start making all our physical reasoning in terms of ones and zeros.

Let me give an example. We know that the sea is composed mostly of water molecules. The observation of waves that travel, superpose and break can be labeled as an emergent phenomenon. However, to a surfer, a wave is much more real than the water molecules composing it and the fact that it is emergent is of no practical consequence when trying to predict where a wave will break. A proficient physicist, armed with tools from statistical mechanics (there are more than 10^{25} molecules per liter), could probably derive a macroscopic model for waves from the microscopic theory of particles. In the process of learning what the surfer already understood, he would identify elements of the  microscopic theory which become irrelevant for such questions. Such details could be whether the sea has an odd or even number of molecules or the presence of a few fish.

In the case of holography, each square meter corresponds to 1.4\times10^{69} bits of entanglement. We don’t even have words to describe anything close to this outrageously large exponent, which leaves plenty of room for emergence. Taking all the information on the internet – estimated at 10^{22} bits (10 zettabits) – we can’t even match the area equivalent of the smallest known particle. The fact that there are so many orders of magnitude makes it difficult to extrapolate our understanding of the geometric domain to the information domain and vice versa. This is precisely the realm where techniques such as those from statistical mechanics successfully get rid of irrelevant details.

High energy theorists and people with a background in general relativity tend to picture things in a continuum language. For example, part of their daily bread and butter are Riemannian or Lorentzian manifolds, which are respectively used to describe space and spacetime. In contrast, most of information theory is usually applied to deal with discrete elements such as bits, elementary circuit gates, etc. Nevertheless, I believe it is fruitful to straddle this cultural divide to the benefit of both parties. In a way, the convergence we are seeking is analogous to the one achieved by the kinetic theory of gases, which allowed the unification of thermodynamics with classical mechanics.

So what did we do?

The remarkable success of the geometric RT prediction to different bulk geometries such as the BTZ black holes and the generality of the entanglement result for its random tensor network cousins emboldened us to take the RT prescription beyond its usual domain of application. We considered applying it to arbitrary Riemannian manifolds that are space-like and that can be approximated by a smoothly knit fabric.

Furthermore, we went on to consider the implications that such assumptions would have when the corresponding geometries are interpreted as error-correcting codes. In fact, our work elaborates on the perspective of A. Almheiri, X. Dong and D. Harlow (ADH) where quantum error-correcting code properties of AdS/CFT were laid out; it is hard to overemphasize the influence of this work. Our work considers general geometries and identifies properties a code associated to a specific holographic geometry should satisfy.

In the cat cradle/fabric metaphor for holography, the fingers at the boundary constitute the boundary theory without gravity and the resulting fabric represents a bulk geometry in the corresponding bulk gravitational theory. Bulk observables may be represented in different ways on the boundary, but not arbitrarily. This raises the question of which parts of the bulk correspond to which parts of the boundary. In general, there is not a one to one mapping. However, if we partition the boundary in two parts L and R, we expect to be able to split the bulk into two corresponding regions {\mathcal E}[L] and {\mathcal E}[R]. This is the content of the entanglement wedge hypothesis, which is our other main assumption. In our metaphor, one could imagine that we pull the left fingers up and the right fingers down (taking care not to get hurt). At some point, the fabric breaks through \chi_R into two pieces. In the setting we are concerned with, these pieces maintain part of the original structure, which tells us which bulk information was available in one piece of the boundary and which part was available in the other.

Although we do not produce new explicit examples of such codes, we worked our way towards developing a language which translates between the holographic/geometric perspective and the coding theory perspective. We specifically build upon the language of operator algebra quantum error correction (OAQEC) which allows individually focusing on different parts of the logical message. In doing so we identified several coding theoretic bounds and quantities, some of which we found to be applicable beyond the context of holography. A particularly noteworthy one is a strengthening of the quantum Singleton bound, which defines a trade-off between how much logical information can be packed in a code, how much physical space is used for encoding this information and how well-protected the information is from erasures.
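For orientation (our strengthened bound is in the paper; what follows is only the ordinary quantum Singleton bound, k \leq n - 2(d-1) for an [[n, k, d]] code), here is a trivial checker:

```python
def satisfies_quantum_singleton(n, k, d):
    """Ordinary quantum Singleton bound for an [[n, k, d]] code:
    k logical qubits in n physical qubits with distance d requires
    k <= n - 2*(d - 1)."""
    return k <= n - 2 * (d - 1)

# The [[5, 1, 3]] code saturates the bound: 1 = 5 - 2*(3 - 1)
assert satisfies_quantum_singleton(5, 1, 3)
# A hypothetical [[5, 2, 3]] code would violate it
assert not satisfies_quantum_singleton(5, 2, 3)
```

The trade-off in the text is of this flavor: more logical information (larger k) forces either more physical carrier (larger n) or weaker protection (smaller d).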

One of the central observations of ADH highlights how quantum codes have properties from both classical error-correcting codes and secret sharing schemes. On the one hand, logical encoded information should be protected from loss of small parts of the carrier, a property quantified by the code distance. On the other hand, the logical encoded information should not become accessible until a sufficiently large part of the carrier is available to us. This is quantified by the threshold of a corresponding secret sharing scheme. We call this quantity price as it identifies how much of the carrier we would need before someone could reconstruct the message faithfully. In general, it is hard to balance these two competing requirements; a statement which can be made rigorous. This kind of complementarity has long been recognized in quantum cryptography. However, we found that according to holographic predictions, codes admitting a geometric interpretation achieve a remarkable optimality in the trade-off between these features.

Our exploration of alternative geometries is rewarded with the following guidelines:


In uberholography, bulk observables are accessible in a Cantor-type fractal shaped subregion of the boundary. This is illustrated on the Poincaré disk presentation of a negatively curved bulk.

  • Hyperbolic geometries predict a fixed polynomial scaling for code distance. This is illustrated by a feature we call uberholography. We use this name because there is an excess of holography wherein bulk observables can be represented on intricate subsets of the boundary which have fractal dimension even smaller than the boundary itself.
  • Hyperbolic geometries suggest the possibility of decoding procedures which are local on the boundary geometry. This property may be connected to the locality of the corresponding boundary field theory.
  • Flat and positive curvature geometries may lead to codes with better parameters in terms of distance and rates (ratio of logical information to physical information). A hemisphere reaches optimum parameters, saturating coding bounds.


    Seven iterations of a ternary Cantor set (dark line) on the unit interval. Each iteration is obtained by punching holes from the previous one and the set obtained in the limit is a fractal.

Current day quantum computers are far from the number of qubits required to invoke an emergent geometry. Nevertheless, it is exhilarating to take a step back and consider how the properties of the codes, and information in general, may be interpreted geometrically. On the other hand, I find that the quantum code language we adapt to the context of holography might eventually serve as a useful tool in distinguishing which boundary features are relevant or irrelevant for the emergent properties of the holographic dual. Ours is but one contribution in a very active field. However, the one thing I am certain about is that these are exciting times to be doing physics.

n-Category Café Functional Equations, Entropy and Diversity: A Seminar Course

I’ve just finished teaching a seminar course officially called “Functional Equations”, but really more about the concepts of entropy and diversity.

I’m grateful to the participants — from many parts of mathematics, biology and physics, at levels from undergraduate to professor — who kept coming and contributing, week after week. It was lots of fun, and I learned a great deal.

This post collects together all the material in one place. First, the notes:

Now, the posts I wrote every week:

n-Category Café The Diversity of a Metacommunity

The eleventh and final installment of the functional equations course can be described in two ways:

  • From one perspective, I talked about conditional entropy, mutual information, and a very appealing analogy between these concepts and the most basic primary-school Venn diagrams.

  • From another, it was about diversity across a metacommunity, that is, an ecological community divided into smaller communities (e.g. geographical sites).
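As a concrete warm-up for the Venn-diagram picture (a small illustrative sketch of my own, not from the notes), here are the basic identities for a toy joint distribution:

```python
import math

def H(dist):
    """Shannon entropy in bits of a distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Toy joint distribution of (X, Y): rows = X, columns = Y
joint = [[0.25, 0.25],
         [0.0,  0.5]]
px = [sum(row) for row in joint]            # marginal of X
py = [sum(col) for col in zip(*joint)]      # marginal of Y

H_XY = H([p for row in joint for p in row]) # joint entropy
H_X, H_Y = H(px), H(py)
cond_H = H_XY - H_X          # conditional entropy H(Y|X), the chain rule
mutual = H_X + H_Y - H_XY    # mutual information I(X;Y), the Venn overlap

assert abs(H_X + cond_H - H_XY) < 1e-12  # chain rule, as in the Venn picture
print(H_X, H_Y, mutual)
```

The three printed numbers are exactly the regions of the primary-school Venn diagram: two circles and their overlap.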

The notes begin on page 44 here.

Venn diagram showing various entropy measures for a pair of random variables

Doug NatelsonMarch for Science, April 22

There has been a great deal written by many (e.g., 1 2 3 4 5 6) about the upcoming March for Science.  I'm going to the Houston satellite event.  I respect the concern that such a march risks casting scientists as "just another special interest group", or framing scientists as a group as leftists who are reflexively opposed to the present US administration.  Certainly some of the comments from the march's nominal twitter feed are (1) overtly political, despite claims that the event is not partisan; and (2) not just political, but rather extremely so.

On balance, though, I think that the stated core messages (science is not inherently partisan; science is critical for the future of the country and society; policy making about relevant issues should be informed by science) are important and should be heard by a large audience.   If the argument is that scientists should just stay quiet and keep their heads down, because silence is the responsible way to convey objectivity, I am not persuaded.  

BackreactionWhy doesn’t anti-matter anti-gravitate?

Flying pig. Why aren’t there any particles that fall up in the gravitational field of Earth? It would be so handy – If I had to move the couch, rather than waiting for the husband to flex his muscles, I’d just tie an anti-gravitating weight to it and the couch would float to the other side of the room. Newton’s law of gravity and Coulomb’s law for the electric force

John BaezStanford Complexity Group

Aaron Goodman of the Stanford Complexity Group invited me to give a talk there on Thursday April 20th. If you’re nearby—like in Silicon Valley—please drop by! It will be in Clark S361 at 4:20 pm.

Here’s the idea. Everyone likes to say that biology is all about information. There’s something true about this—just think about DNA. But what does this insight actually do for us, quantitatively speaking? To figure this out, we need to do some work.

Biology is also about things that make copies of themselves. So it makes sense to figure out how information theory is connected to the replicator equation—a simple model of population dynamics for self-replicating entities.

To see the connection, we need to use ‘relative information’: the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Then everything pops into sharp focus.
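Concretely (a small sketch of mine, not from the talk), relative information is just a probability-weighted log-ratio sum:

```python
import math

def relative_information(p, q):
    """Kullback-Leibler divergence D(p||q) in bits.
    Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi)
               for pi, qi in zip(p, q) if pi > 0)

# Example: how far a biased distribution is from uniform
p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(relative_information(p, q))  # about 0.085 bits
print(relative_information(p, p))  # 0: no information relative to itself
```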

It turns out that free energy—energy in forms that can actually be used, not just waste heat—is a special case of relative information. Since the decrease of free energy is what drives chemical reactions, biochemistry is founded on relative information.

But there’s a lot more to it than this! Using relative information we can also see evolution as a learning process, fix the problems with Fisher’s fundamental theorem of natural selection, and more.

So this is what I’ll talk about! You can see my slides here:

• John Baez, Biology as information dynamics.

but my talk will be videotaped, and it’ll eventually be put here:

Stanford complexity group, YouTube.

You can already see lots of cool talks at this location!


April 18, 2017

Tommaso DorigoLHCb Measures Unity, Finds 0.6

With a slightly anti-climactic timing if we consider the just-ended orgy of new results presented at winter conferences in particle physics (which I touched on here), the LHCb collaboration today put out the results of a measurement of unity, drawing attention to the fact that unity was found to be not equal to 1.0.

read more

April 17, 2017

BackreactionBook review: “A Big Bang in a Little Room” by Zeeya Merali

A Big Bang in a Little Room: The Quest to Create New Universes, by Zeeya Merali. Basic Books (February 14, 2017). When I heard that Zeeya Merali had written a book, I expected something like a Worst Of New Scientist compilation. But A Big Bang in A Little Room turned out to be both interesting and enjoyable, if maybe not for the reason the author intended. If you follow the popular science news on

April 15, 2017

n-Category Café Value

What is the value of the whole in terms of the values of the parts?

More specifically, given a finite set whose elements have assigned “values” $v_1, \ldots, v_n$ and assigned “sizes” $p_1, \ldots, p_n$ (normalized to sum to $1$), how can we assign a value $\sigma(\mathbf{p}, \mathbf{v})$ to the set in a coherent way?

This seems like a very general question. But in fact, just a few sensible requirements on the function $\sigma$ are enough to pin it down almost uniquely. And the answer turns out to be closely connected to existing mathematical concepts that you probably already know.

Let’s write

\Delta_n = \Bigl\{ (p_1, \ldots, p_n) \in \mathbb{R}^n : p_i \geq 0, \sum p_i = 1 \Bigr\}

for the set of probability distributions on $\{1, \ldots, n\}$. Assuming that our “values” are positive real numbers, we’re interested in sequences of functions

\Bigl( \sigma \colon \Delta_n \times (0, \infty)^n \to (0, \infty) \Bigr)_{n \geq 1}

that aggregate the values of the elements to give a value to the whole set. So, if the elements of the set have relative sizes $\mathbf{p} = (p_1, \ldots, p_n)$ and values $\mathbf{v} = (v_1, \ldots, v_n)$, then the value assigned to the whole set is $\sigma(\mathbf{p}, \mathbf{v})$.

Here are some properties that it would be reasonable for $\sigma$ to satisfy.

Homogeneity  The idea is that whatever “value” means, the value of the set and the value of the elements should be measured in the same units. For instance, if the elements are valued in kilograms then the set should be valued in kilograms too. A switch from kilograms to grams would then multiply both values by 1000. So, in general, we ask that

\sigma(\mathbf{p}, c\mathbf{v}) = c \sigma(\mathbf{p}, \mathbf{v})

for all $\mathbf{p} \in \Delta_n$, $\mathbf{v} \in (0, \infty)^n$ and $c \in (0, \infty)$.

Monotonicity  The values of the elements are supposed to make a positive contribution to the value of the whole, so we ask that if $v_i \leq v'_i$ for all $i$ then

\sigma(\mathbf{p}, \mathbf{v}) \leq \sigma(\mathbf{p}, \mathbf{v}')

for all $\mathbf{p} \in \Delta_n$.

Replication  Suppose that our $n$ elements have the same size and the same value, $v$. Then the value of the whole set should be $n v$. This property says, among other things, that $\sigma$ isn’t an average: putting in more elements of value $v$ increases the value of the whole set!

If $\sigma$ is homogeneous, we might as well assume that $v = 1$, in which case the requirement is that

\sigma\bigl( (1/n, \ldots, 1/n), (1, \ldots, 1) \bigr) = n.

Modularity  This one’s a basic logical axiom, best illustrated by an example.

Imagine that we’re very ambitious and wish to evaluate the entire planet — or at least, the part that’s land. And suppose we already know the values and relative sizes of every country.

We could, of course, simply put this data into $\sigma$ and get an answer immediately. But we could instead begin by evaluating each continent, and then compute the value of the planet using the values and sizes of the continents. If $\sigma$ is sensible, this should give the same answer.

The notation needed to express this formally is a bit heavy. Let $\mathbf{w} \in \Delta_n$; in our example, $n = 7$ (or however many continents there are) and $\mathbf{w} = (w_1, \ldots, w_7)$ encodes their relative sizes. For each $i = 1, \ldots, n$, let $\mathbf{p}^i \in \Delta_{k_i}$; in our example, $\mathbf{p}^i$ encodes the relative sizes of the countries on the $i$th continent. Then we get a probability distribution

\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n) = (w_1 p^1_1, \ldots, w_1 p^1_{k_1}, \,\,\ldots, \,\, w_n p^n_1, \ldots, w_n p^n_{k_n}) \in \Delta_{k_1 + \cdots + k_n},

which in our example encodes the relative sizes of all the countries on the planet. (Incidentally, this composition makes $(\Delta_n)$ into an operad, a fact that we’ve discussed many times before on this blog.) Also let

\mathbf{v}^1 = (v^1_1, \ldots, v^1_{k_1}) \in (0, \infty)^{k_1}, \,\,\ldots,\,\, \mathbf{v}^n = (v^n_1, \ldots, v^n_{k_n}) \in (0, \infty)^{k_n}.

In the example, $v^i_j$ is the value of the $j$th country on the $i$th continent. Then the value of the $i$th continent is $\sigma(\mathbf{p}^i, \mathbf{v}^i)$, so the axiom is that

\sigma \bigl( \mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n), (v^1_1, \ldots, v^1_{k_1}, \ldots, v^n_1, \ldots, v^n_{k_n}) \bigr) = \sigma \Bigl( \mathbf{w}, \bigl( \sigma(\mathbf{p}^1, \mathbf{v}^1), \ldots, \sigma(\mathbf{p}^n, \mathbf{v}^n) \bigr) \Bigr).

The left-hand side is the value of the planet calculated in a single step, and the right-hand side is its value when calculated in two steps, with continents as the intermediate stage.
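The composition $\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n)$ is easy to sketch in code (an illustrative snippet of mine; the names are made up):

```python
def compose(w, ps):
    """Operadic composition of distributions: weight each block
    distribution ps[i] by w[i] and concatenate the results."""
    return [wi * pj for wi, p in zip(w, ps) for pj in p]

# Two "continents" of relative sizes 0.6 and 0.4, each split into countries
w = [0.6, 0.4]
ps = [[0.5, 0.5], [0.25, 0.75]]
flat = compose(w, ps)
print(flat)       # the country-level distribution, roughly [0.3, 0.3, 0.1, 0.3]
print(sum(flat))  # still sums to 1 (up to floating-point error)
```

Modularity then says that aggregating `flat` in one shot agrees with aggregating each block first and then aggregating the block values.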

Symmetry  It shouldn’t matter what order we list the elements in. So it’s natural to ask that

\sigma(\mathbf{p}, \mathbf{v}) = \sigma(\mathbf{p} \tau, \mathbf{v} \tau)

for any $\tau$ in the symmetric group $S_n$, where the right-hand side refers to the obvious $S_n$-actions.

Absent elements should count for nothing! In other words, if $p_1 = 0$ then we should have

\sigma\bigl( (p_1, \ldots, p_n), (v_1, \ldots, v_n)\bigr) = \sigma\bigl( (p_2, \ldots, p_n), (v_2, \ldots, v_n)\bigr).

This isn’t quite trivial. I haven’t yet given you any examples of the kind of function that $\sigma$ might be, but perhaps you already have in mind a simple one like this:

\sigma(\mathbf{p}, \mathbf{v}) = v_1 + \cdots + v_n.

In words, the value of the whole is simply the sum of the values of the parts, regardless of their sizes. But if $\sigma$ is to have the “absent elements” property, this won’t do. (Intuitively, if $p_i = 0$ then we shouldn’t count $v_i$ in the sum, because the $i$th element isn’t actually there.) So we’d better modify this example slightly, instead taking

\sigma(\mathbf{p}, \mathbf{v}) = \sum_{i \,:\, p_i \gt 0} v_i.

This function (or rather, sequence of functions) does have the “absent elements” property.

Continuity in positive probabilities  Finally, we ask that for each $\mathbf{v} \in (0, \infty)^n$, the function $\sigma(-, \mathbf{v})$ is continuous on the interior of the simplex $\Delta_n$, that is, continuous over those probability distributions $\mathbf{p}$ such that $p_1, \ldots, p_n \gt 0$.

Why only over the interior of the simplex? Basically because of natural examples of $\sigma$ like the one just given, which is continuous on the interior of the simplex but not the boundary. Generally, it’s sometimes useful to make a sharp, discontinuous distinction between the cases $p_i \gt 0$ (presence) and $p_i = 0$ (absence).


Arrow’s famous theorem states that a few apparently mild conditions on a voting system are, in fact, mutually contradictory. The mild conditions above are not mutually contradictory. In fact, there’s a one-parameter family $\sigma_q$ of functions each of which satisfies these conditions. For real $q \neq 1$, the definition is

\sigma_q(\mathbf{p}, \mathbf{v}) = \Bigl( \sum_{i \,:\, p_i \gt 0} p_i^q v_i^{1 - q} \Bigr)^{1/(1 - q)}.

For instance, $\sigma_0$ is the example of $\sigma$ given above.

The formula for $\sigma_q$ is obviously invalid at $q = 1$, but it converges to a limit as $q \to 1$, and we define $\sigma_1(\mathbf{p}, \mathbf{v})$ to be that limit. Explicitly, this gives

\sigma_1(\mathbf{p}, \mathbf{v}) = \prod_{i \,:\, p_i \gt 0} (v_i/p_i)^{p_i}.

In the same way, we can define $\sigma_{-\infty}$ and $\sigma_\infty$ as the appropriate limits:

\sigma_{-\infty}(\mathbf{p}, \mathbf{v}) = \max_{i \,:\, p_i \gt 0} v_i/p_i, \qquad \sigma_{\infty}(\mathbf{p}, \mathbf{v}) = \min_{i \,:\, p_i \gt 0} v_i/p_i.

And it’s easy to check that for each $q \in [-\infty, \infty]$, the function $\sigma_q$ satisfies all the natural conditions listed above.
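For the computationally minded, the whole family fits in a few lines (a sketch of mine; the $q = 1$ and $q = \pm\infty$ cases use the limit formulas above):

```python
import math

def sigma(q, p, v):
    """The aggregate value sigma_q(p, v), ignoring elements with p_i = 0."""
    pairs = [(pi, vi) for pi, vi in zip(p, v) if pi > 0]
    if q == 1:
        return math.prod((vi / pi) ** pi for pi, vi in pairs)
    if q == float("inf"):
        return min(vi / pi for pi, vi in pairs)
    if q == float("-inf"):
        return max(vi / pi for pi, vi in pairs)
    return sum(pi**q * vi**(1 - q) for pi, vi in pairs) ** (1 / (1 - q))

p = [0.5, 0.3, 0.2]
v = [2.0, 1.0, 4.0]

# sigma_0 is the plain sum of the values of the present elements
assert abs(sigma(0, p, v) - sum(v)) < 1e-12

# Replication: n equal elements of value 1 aggregate to n
n = 5
assert abs(sigma(2, [1/n]*n, [1.0]*n) - n) < 1e-12

# sigma_1 agrees with the q -> 1 limit of the general formula
assert abs(sigma(1, p, v) - sigma(1 + 1e-7, p, v)) < 1e-5

# With v = 1, sigma_q(p, 1) is the Hill number of order q
print(sigma(1, p, [1.0, 1.0, 1.0]))  # exp of the Shannon entropy of p
```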

These functions $\sigma_q$ might be unfamiliar to you, but they have some special cases that are quite well-explored. In particular:

  • Suppose you’re in a situation where the elements don’t have “sizes”. Then it would be natural to take $\mathbf{p}$ to be the uniform distribution $\mathbf{u}_n = (1/n, \ldots, 1/n)$. In that case, \sigma_q(\mathbf{u}_n, \mathbf{v}) = const \cdot \bigl( \sum v_i^{1 - q} \bigr)^{1/(1 - q)}, where the constant is a certain power of $n$. When $q \leq 0$, this is exactly a constant times $\|\mathbf{v}\|_{1 - q}$, the $(1 - q)$-norm of the vector $\mathbf{v}$.

  • Suppose you’re in a situation where the elements don’t have “values”. Then it would be natural to take $\mathbf{v}$ to be $\mathbf{1} = (1, \ldots, 1)$. In that case, \sigma_q(\mathbf{p}, \mathbf{1}) = \bigl( \sum p_i^q \bigr)^{1/(1 - q)}. This is the quantity that ecologists know as the Hill number of order $q$ and use as a measure of biological diversity. Information theorists know it as the exponential of the Rényi entropy of order $q$, the special case $q = 1$ being Shannon entropy. And actually, the general formula for $\sigma_q$ is very closely related to Rényi relative entropy (which Wikipedia calls Rényi divergence).

Anyway, the big — and as far as I know, new — result is:

Theorem  The functions $\sigma_q$ are the only functions $\sigma$ with the seven properties above.

So although the properties above don’t seem that demanding, they actually force our notion of “aggregate value” to be given by one of the functions in the family $(\sigma_q)_{q \in [-\infty, \infty]}$. And although I didn’t even mention the notions of diversity or entropy in my justification of the axioms, they come out anyway as special cases.

I covered all this yesterday in the tenth and penultimate installment of the functional equations course that I’m giving. It’s written up on pages 38–42 of the notes so far. There you can also read how this relates to more realistic measures of biodiversity than the Hill numbers. Plus, you can see an outline of the (quite substantial) proof of the theorem above.

Doug Natelson"Barocalorics", or making a refrigerator from rubber

People have spent a lot of time and effort in trying to control the flow and transfer of heat.  Heat is energy transferred in a disorganized way among many little degrees of freedom, like the vibrations of atoms in a solid or the motion of molecules in a gas.  One over-simplified way of stating how heat likes to flow:  Energy tends to be distributed among as many degrees of freedom as possible.  The reason heat flows from hot things to cold things is that tendency.  Manipulating the flow of heat then really all comes down to manipulating ways for energy to be distributed.

Refrigerators are systems that, with the help of some externally supplied work, take heat from a "cold" side, and dump that heat (usually plus some additional heat) to a "hot" side.  For example, in your household refrigerator, heat goes from your food + the refrigerator inner walls (the cold side) into a working fluid, some relative of freon, which boils.  That freon vapor gets pumped through coils; a fan blows across those coils and (some of) the heat is transferred from the freon vapor to the air in your kitchen.   The now-cooler freon vapor is condensed and pumped (via a compressor) and sent back around again.  

There are other ways to cool things, though, than by running a cycle using a working fluid like freon. For example, I've written before about magnetic cooling.  There, instead of using the motion of liquid and gas molecules as the means to do cooling, heat is made to flow in the desired directions by manipulating the spins of either electrons or nuclei.  Basically, you can use a magnetic field to arrange those spins such that it is vastly more likely for thermal energy to come out of the jiggling motion of your material of interest, and instead end up going into rearranging those spins.

[Figure: Stretching a polymer tends to heat it, due to the barocaloric effect. Adapted from Chauhan et al., doi:10.1557/mre.2015.17]
It turns out, you can do something rather similar using rubber.  The key is something called the elasto-caloric or barocaloric effect - see here (pdf!) for a really nice review.  The effect is shown in the figure, adapted from that paper.  An elastomer in its relaxed state is sitting there at some temperature and with some entropy - the entropy has contributions from the jiggling around of the atoms, as well as from the structural arrangement of the polymer chains.  There are lots of ways for the chains to be bunched up, so there is quite a bit of entropy associated with that arrangement.

Roughly speaking, when the rubber is stretched out quickly (so that there is no time for heat to flow in or out of the rubber), those chains straighten, and the structural piece of the entropy goes down.  To make up for that, the kinetic contribution to the entropy goes up, showing up as an elevated temperature.  Quickly stretch rubber and it gets warmer.  A similar thing happens when rubber is compressed instead of stretched.

So, you could imagine running a refrigeration cycle based on this!  Stretch a piece of rubber quickly; it gets warmer (\(T \rightarrow T + \Delta T\)).  Allow that heat to leave while in the stretched state (\(T + \Delta T \rightarrow T\)).  Now release the rubber quickly so no heat can flow.  The rubber will get colder now than the initial \(T\); energy will tend to rearrange itself out of the kinetic motion of the atoms and into crumpling up the polymer chains.  The now-cold rubber can be used to cool something.  Repeat the cycle as desired.  It's a pretty neat idea.  Very recently, this preprint showed up on the arxiv, showing that a common silicone rubber, PDMS, is great for this sort of thing.  Imagine making a refrigerator out of the same stuff used for soft contact lenses!  These effects tend to have rather limited useful temperature ranges in most elastomers, but it's still funky.
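To get a feel for the size of the effect, here is a minimal back-of-the-envelope sketch using the standard estimate \(\Delta T \approx T \, |\Delta S| / c_p\). All the numbers in it are illustrative assumptions, not measured values for PDMS or any particular elastomer:

```python
# Toy estimate of the adiabatic temperature change in an elastocaloric cycle,
# Delta T ~ T * |Delta S| / c_p.  All numbers below are assumed/illustrative.
T = 300.0        # K, ambient starting temperature
delta_S = 30.0   # J/(kg K), entropy change on a fast stretch (assumed magnitude)
c_p = 1500.0     # J/(kg K), order-of-magnitude heat capacity for an elastomer (assumed)

delta_T = T * delta_S / c_p
print(f"adiabatic temperature change ~ {delta_T:.1f} K")
```

With these assumed inputs the swing comes out to a few kelvin per stretch, which is why such cycles need either large entropy changes or regeneration to span a useful temperature range.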

April 14, 2017

Terence TaoCounting objects up to isomorphism: groupoid cardinality

How many groups of order four are there? Technically, there are an enormous number, so much so, in fact, that the class of groups of order four is not even a set, but merely a proper class. This is because any four objects {a,b,c,d} can be turned into a group {\{a,b,c,d\}} by designating one of the four objects, say {a}, to be the group identity, and imposing a suitable multiplication table (and inversion law) on the four elements in a manner that obeys the usual group axioms. Since all sets are themselves objects, the class of four-element groups is thus at least as large as the class of all sets, which by Russell’s paradox is known not to itself be a set (assuming the usual ZFC axioms of set theory).

A much better question is to ask how many groups of order four there are up to isomorphism, counting each isomorphism class of groups exactly once. Now, as one learns in undergraduate group theory classes, the answer is just “two”: the cyclic group {C_4} and the Klein four-group {C_2 \times C_2}.

More generally, given a class of objects {X} and some equivalence relation {\sim} on {X} (which one should interpret as describing the property of two objects in {X} being “isomorphic”), one can consider the number {|X / \sim|} of objects in {X} “up to isomorphism”, which is simply the cardinality of the collection {X/\sim} of equivalence classes {[x]:=\{y\in X:x \sim {}y \}} of {X}. In the case where {X} is finite, one can express this cardinality by the formula

\displaystyle |X/\sim| = \sum_{x \in X} \frac{1}{|[x]|}, \ \ \ \ \ (1)

thus one counts elements in {X}, weighted by the reciprocal of the number of objects they are isomorphic to.

Example 1 Let {X} be the five-element set {\{-2,-1,0,1,2\}} of integers between {-2} and {2}. Let us say that two elements {x, y} of {X} are isomorphic if they have the same magnitude: {x \sim y \iff |x| = |y|}. Then the quotient space {X/\sim} consists of just three equivalence classes: {\{-2,2\} = [2] = [-2]}, {\{-1,1\} = [-1] = [1]}, and {\{0\} = [0]}. Thus there are three objects in {X} up to isomorphism, and the identity (1) is then just

\displaystyle 3 = \frac{1}{2} + \frac{1}{2} + 1 + \frac{1}{2} + \frac{1}{2}.

Thus, to count elements in {X} up to equivalence, the elements {-2,-1,1,2} are given a weight of {1/2} because they are each isomorphic to two elements in {X}, while the element {0} is given a weight of {1} because it is isomorphic to just one element in {X} (namely, itself).
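The weighted count in (1) is easy to check mechanically; here is a small Python sketch of Example 1 (the helper name iso is mine):

```python
from fractions import Fraction

X = [-2, -1, 0, 1, 2]
iso = lambda x, y: abs(x) == abs(y)  # the equivalence relation of Example 1

# Formula (1): count elements up to isomorphism, weighting each x by 1/|[x]|.
count = sum(Fraction(1, sum(1 for y in X if iso(x, y))) for x in X)
assert count == 3
```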

Given a finite non-empty set {X}, there is also a natural probability distribution on {X}, namely the uniform distribution, according to which a random variable {\mathbf{x} \in X} is set equal to any given element {x} of {X} with probability {\frac{1}{|X|}}:

\displaystyle {\bf P}( \mathbf{x} = x ) = \frac{1}{|X|}.

Given a notion {\sim} of isomorphism on {X}, one can then define the random equivalence class {[\mathbf{x}] \in X/\sim} that the random element {\mathbf{x}} belongs to. But if the isomorphism classes are unequal in size, we now encounter a biasing effect: even if {\mathbf{x}} was drawn uniformly from {X}, the equivalence class {[\mathbf{x}]} need not be uniformly distributed in {X/\sim}. For instance, in the above example, if {\mathbf{x}} was drawn uniformly from {\{-2,-1,0,1,2\}}, then the equivalence class {[\mathbf{x}]} will not be uniformly distributed in the three-element space {X/\sim}, because it has a {2/5} probability of being equal to the class {\{-2,2\}} or to the class {\{-1,1\}}, and only a {1/5} probability of being equal to the class {\{0\}}.

However, it is possible to remove this bias by changing the weighting in (1), and thus changing the notion of what cardinality means. To do this, we generalise the previous situation. Instead of considering sets {X} with an equivalence relation {\sim} to capture the notion of isomorphism, we instead consider groupoids, which are sets {X} in which there are allowed to be multiple isomorphisms between elements in {X} (and in particular, there are allowed to be multiple automorphisms from an element to itself). More precisely:

Definition 2 A groupoid is a set (or proper class) {X}, together with a (possibly empty) collection {\mathrm{Iso}(x \rightarrow y)} of “isomorphisms” between any pair {x,y} of elements of {X}, and a composition map {f, g \mapsto g \circ f} from isomorphisms {f \in \mathrm{Iso}(x \rightarrow y)}, {g \in \mathrm{Iso}(y \rightarrow z)} to isomorphisms in {\mathrm{Iso}(x \rightarrow z)} for every {x,y,z \in X}, obeying the following group-like axioms:

  • (Identity) For every {x \in X}, there is an identity isomorphism {\mathrm{id}_x \in \mathrm{Iso}(x \rightarrow x)}, such that {f \circ \mathrm{id}_x = \mathrm{id}_y \circ f = f} for all {f \in \mathrm{Iso}(x \rightarrow y)} and {x,y \in X}.
  • (Associativity) If {f \in \mathrm{Iso}(x \rightarrow y)}, {g \in \mathrm{Iso}(y \rightarrow z)}, and {h \in \mathrm{Iso}(z \rightarrow w)} for some {x,y,z,w \in X}, then {h \circ (g \circ f) = (h \circ g) \circ f}.
  • (Inverse) If {f \in \mathrm{Iso}(x \rightarrow y)} for some {x,y \in X}, then there exists an inverse isomorphism {f^{-1} \in \mathrm{Iso}(y \rightarrow x)} such that {f \circ f^{-1} = \mathrm{id}_y} and {f^{-1} \circ f = \mathrm{id}_x}.

We say that two elements {x,y} of a groupoid are isomorphic, and write {x \sim y}, if there is at least one isomorphism from {x} to {y}.

Example 3 Any category gives a groupoid by taking {X} to be the set (or class) of objects, and {\mathrm{Iso}(x \rightarrow y)} to be the collection of invertible morphisms from {x} to {y}. For instance, in the category {\mathbf{Set}} of sets, {\mathrm{Iso}(x \rightarrow y)} would be the collection of bijections from {x} to {y}; in the category {\mathbf{Vec}/k} of linear vector spaces over some given base field {k}, {\mathrm{Iso}(x \rightarrow y)} would be the collection of invertible linear transformations from {x} to {y}; and so forth.

Every set {X} equipped with an equivalence relation {\sim} can be turned into a groupoid by assigning precisely one isomorphism {\iota_{x \rightarrow y}} from {x} to {y} for any pair {x,y \in X} with {x \sim y}, and no isomorphisms from {x} to {y} when {x \not \sim y}, with the groupoid operations of identity, composition, and inverse defined in the only way possible consistent with the axioms. We will call this the simply connected groupoid associated with this equivalence relation. For instance, with {X = \{-2,-1,0,1,2\}} as above, if we turn {X} into a simply connected groupoid, there will be precisely one isomorphism from {2} to {-2}, and also precisely one isomorphism from {2} to {2}, but no isomorphisms from {2} to {-1}, {0}, or {1}.

However, one can also form multiply-connected groupoids in which there can be multiple isomorphisms from one element of {X} to another. For instance, one can view {X = \{-2,-1,0,1,2\}} as a space that is acted on by multiplication by the two-element group {\{-1,+1\}}. This gives rise to two types of isomorphisms, an identity isomorphism {(+1)_x} from {x} to {x} for each {x \in X}, and a negation isomorphism {(-1)_x} from {x} to {-x} for each {x \in X}; in particular, there are two automorphisms of {0} (i.e., isomorphisms from {0} to itself), namely {(+1)_0} and {(-1)_0}, whereas the other four elements of {X} only have a single automorphism (the identity isomorphism). One defines composition, identity, and inverse in this groupoid in the obvious fashion (using the group law of the two-element group {\{-1,+1\}}); for instance, we have {(-1)_{-2} \circ (-1)_2 = (+1)_2}.

For a finite multiply-connected groupoid, it turns out that the natural notion of “cardinality” (or as I prefer to call it, “cardinality up to isomorphism”) is given by the variant

\displaystyle \sum_{x \in X} \frac{1}{|\{ f: f \in \mathrm{Iso}(x \rightarrow y) \hbox{ for some } y\}|}

of (1). That is to say, in the multiply connected case, the denominator is no longer the number of objects {y} isomorphic to {x}, but rather the number of isomorphisms from {x} to other objects {y}. Grouping together all summands coming from a single equivalence class {[x]} in {X/\sim}, we can also write this expression as

\displaystyle \sum_{[x] \in X/\sim} \frac{1}{|\mathrm{Aut}(x)|} \ \ \ \ \ (2)

where {\mathrm{Aut}(x) := \mathrm{Iso}(x \rightarrow x)} is the automorphism group of {x}, that is to say the group of isomorphisms from {x} to itself. (Note that if {x,x'} belong to the same equivalence class {[x]}, then the two groups {\mathrm{Aut}(x)} and {\mathrm{Aut}(x')} will be isomorphic and thus have the same cardinality, and so the above expression is well-defined.)

For instance, if we take {X} to be the simply connected groupoid on {\{-2,-1,0,1,2\}}, then the number of elements of {X} up to isomorphism is

\displaystyle \frac{1}{2} + \frac{1}{2} + 1 + \frac{1}{2} + \frac{1}{2} = 1 + 1 + 1 = 3

exactly as before. If however we take the multiply connected groupoid on {\{-2,-1,0,1,2\}}, in which {0} has two automorphisms, the number of elements of {X} up to isomorphism is now the smaller quantity

\displaystyle \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = 1 + \frac{1}{2} + 1 = \frac{5}{2};

the equivalence class {[0]} is now counted with weight {1/2} rather than {1} due to the two automorphisms on {0}. Geometrically, one can think of this groupoid as being formed by taking the five-element set {\{-2,-1,0,1,2\}}, and “folding it in half” around the fixed point {0}, giving rise to two “full” quotient points {[1], [2]} and one “half” point {[0]}. More generally, given a finite group {G} acting on a finite set {X}, and forming the associated multiply connected groupoid, the cardinality up to isomorphism of this groupoid will be {|X|/|G|}, since each element {x} of {X} will have {|G|} isomorphisms on it (whether they be to the same element {x}, or to other elements of {X}).
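This {|X|/|G|} count is easy to verify directly; a small Python sketch for the {\{-1,+1\}} action on {\{-2,-1,0,1,2\}} (the helper names are mine):

```python
from fractions import Fraction

X = [-2, -1, 0, 1, 2]
G = [1, -1]  # acting by multiplication

# Each x has exactly one isomorphism x -> g*x per group element g, so every
# summand in the cardinality up to isomorphism is 1/|G|, giving |X|/|G|.
card = sum(Fraction(1, len(G)) for _ in X)
assert card == Fraction(5, 2)

# The same count via formula (2): sum over orbits of 1/|Aut(x)|,
# where Aut(x) is the stabilizer of x in G.
orbits = {frozenset(g * x for g in G) for x in X}
stab = lambda x: sum(1 for g in G if g * x == x)
card2 = sum(Fraction(1, stab(next(iter(o)))) for o in orbits)
assert card2 == Fraction(5, 2)
```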

The definition (2) can also make sense for some infinite groupoids; to my knowledge this was first explicitly done in this paper of Baez and Dolan. Consider for instance the category {\mathbf{FinSet}} of finite sets, with isomorphisms given by bijections as in Example 3. Every finite set is isomorphic to {\{1,\dots,n\}} for some natural number {n}, so the equivalence classes of {\mathbf{FinSet}} may be indexed by the natural numbers. The automorphism group {S_n} of {\{1,\dots,n\}} has order {n!}, so the cardinality of {\mathbf{FinSet}} up to isomorphism is

\displaystyle \sum_{n=0}^\infty \frac{1}{n!} = e.
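A quick numerical sanity check that the partial sums of this series do converge to {e}:

```python
import math
from fractions import Fraction

# Partial sums of sum_{n>=0} 1/n! converge very rapidly to e,
# the cardinality of FinSet up to isomorphism.
partial = sum(Fraction(1, math.factorial(n)) for n in range(20))
assert abs(float(partial) - math.e) < 1e-12
```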

(This fact is sometimes loosely stated as “the number of finite sets is {e}“, but I view this statement as somewhat misleading if the qualifier “up to isomorphism” is not added.) Similarly, when one allows for multiple isomorphisms from a group to itself, the number of groups of order four up to isomorphism is now

\displaystyle \frac{1}{2} + \frac{1}{6} = \frac{2}{3}

because the cyclic group {C_4} has two automorphisms, whereas the Klein four-group {C_2 \times C_2} has six.
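These automorphism counts can be checked by brute force over all bijections; here is a small Python sketch (the helper name aut_count is mine), encoding {C_4} as addition mod 4 and the Klein four-group as XOR on {\{0,1,2,3\}}:

```python
from fractions import Fraction
from itertools import permutations

def aut_count(elems, mul):
    """Brute-force count of automorphisms of a small group given its multiplication."""
    count = 0
    for perm in permutations(elems):
        f = dict(zip(elems, perm))
        if all(f[mul(a, b)] == mul(f[a], f[b]) for a in elems for b in elems):
            count += 1
    return count

c4 = aut_count(range(4), lambda a, b: (a + b) % 4)  # cyclic group C_4
v4 = aut_count(range(4), lambda a, b: a ^ b)        # Klein four-group (Z/2)^2, via XOR
assert (c4, v4) == (2, 6)
assert Fraction(1, c4) + Fraction(1, v4) == Fraction(2, 3)
```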

In the case that the cardinality of a groupoid {X} up to isomorphism is finite and non-zero, one can now define the notion of a random isomorphism class {[\mathbf{x}]} in {X/\sim} drawn “uniformly up to isomorphism”, by requiring the probability of attaining any given isomorphism class {[x]} to be

\displaystyle {\mathbf P}([\mathbf{x}] = [x]) = \frac{1 / |\mathrm{Aut}(x)|}{\sum_{[y] \in X/\sim} 1/|\mathrm{Aut}(y)|},

thus the probability of being isomorphic to a given element {x} will be inversely proportional to the number of automorphisms that {x} has. For instance, if we take {X} to be the set {\{-2,-1,0,1,2\}} with the simply connected groupoid, {[\mathbf{x}]} will be drawn uniformly from the three available equivalence classes {[0], [1], [2]}, with a {1/3} probability of attaining each; but if instead one uses the multiply connected groupoid coming from the action of {\{-1,+1\}}, and draws {[\mathbf{x}]} uniformly up to isomorphism, then {[1]} and {[2]} will now be selected with probability {2/5} each, and {[0]} will be selected with probability {1/5}. Thus this distribution has accounted for the bias mentioned previously: if a finite group {G} acts on a finite space {X}, and {\mathbf{x}} is drawn uniformly from {X}, then {[\mathbf{x}]} will now be drawn uniformly up to isomorphism from {X/G}, if we use the multiply connected groupoid coming from the {G} action, rather than the simply connected groupoid coming from just the {G}-orbit structure on {X}.
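The {2/5, 2/5, 1/5} distribution can be reproduced mechanically; a small Python sketch for the {\{-1,+1\}}-action example (the helper names are mine):

```python
from fractions import Fraction

X = [-2, -1, 0, 1, 2]
# Automorphisms under the {-1,+1} action: 0 is fixed by both group elements,
# every other point only by +1.
aut = {x: (2 if x == 0 else 1) for x in X}
classes = {frozenset({x, -x}) for x in X}

# Weight each class by 1/|Aut| and normalise by the cardinality up to isomorphism.
weights = {c: Fraction(1, aut[next(iter(c))]) for c in classes}
total = sum(weights.values())  # 5/2
probs = {c: w / total for c, w in weights.items()}

assert probs[frozenset({0})] == Fraction(1, 5)
assert probs[frozenset({-1, 1})] == Fraction(2, 5)
assert probs[frozenset({-2, 2})] == Fraction(2, 5)
```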

Using the groupoid of finite sets, we see that a finite set chosen uniformly up to isomorphism will have a cardinality that is distributed according to the Poisson distribution of parameter {1}, that is to say it will be of cardinality {n} with probability {\frac{e^{-1}}{n!}}.
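This Poisson limit lends itself to a quick Monte Carlo check (a sketch with assumed sample sizes, not a proof):

```python
import math
import random

random.seed(0)
n, trials = 100, 20000
counts = {}
for _ in range(trials):
    perm = list(range(n))
    random.shuffle(perm)
    fixed = sum(1 for i, p in enumerate(perm) if i == p)
    counts[fixed] = counts.get(fixed, 0) + 1

# Empirical frequencies should be close to the Poisson(1) mass e^{-1}/k!.
for k in range(4):
    assert abs(counts.get(k, 0) / trials - math.exp(-1) / math.factorial(k)) < 0.02
```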

One important source of groupoids are the fundamental groupoids {\pi_1(M)} of a manifold {M} (one can also consider more general topological spaces than manifolds, but for simplicity we will restrict this discussion to the manifold case), in which the underlying space is simply {M}, and the isomorphisms from {x} to {y} are the equivalence classes of paths from {x} to {y} up to homotopy; in particular, the automorphism group of any given point is just the fundamental group of {M} at that base point. The equivalence class {[x]} of a point in {M} is then the connected component of {x} in {M}. The cardinality up to isomorphism of the fundamental groupoid is then

\displaystyle \sum_{M' \in \pi_0(M)} \frac{1}{|\pi_1(M')|}

where {\pi_0(M)} is the collection of connected components {M'} of {M}, and {|\pi_1(M')|} is the order of the fundamental group of {M'}. Thus, simply connected components of {M} count for a full unit of cardinality, whereas multiply connected components (which can be viewed as quotients of their simply connected cover by their fundamental group) will count for a fractional unit of cardinality, inversely to the order of their fundamental group.

This notion of cardinality up to isomorphism of a groupoid behaves well with respect to various basic notions. For instance, suppose one has an {n}-fold covering map {\pi: X \rightarrow Y} of one finite groupoid {Y} by another {X}. This means that {\pi} is a functor that is surjective, with all preimages of cardinality {n}, with the property that given any pair {y,y'} in the base space {Y} and any {x} in the preimage {\pi^{-1}(\{y\})} of {y}, every isomorphism {f \in \mathrm{Iso}(y \rightarrow y')} has a unique lift {\tilde f \in \mathrm{Iso}(x \rightarrow x')} from the given initial point {x} (and some {x'} in the preimage of {y'}). Then one can check that the cardinality up to isomorphism of {X} is {n} times the cardinality up to isomorphism of {Y}, which fits well with the geometric picture of {X} as the {n}-fold cover of {Y}. (For instance, if one covers a manifold {M} with finite fundamental group by its universal cover, this is a {|\pi_1(M)|}-fold cover, the base has cardinality {1/|\pi_1(M)|} up to isomorphism, and the universal cover has cardinality one up to isomorphism.) Related to this, if one draws an equivalence class {[\mathbf{x}]} of {X} uniformly up to isomorphism, then {\pi([\mathbf{x}])} will be an equivalence class of {Y} drawn uniformly up to isomorphism also.

Indeed, one can show that this notion of cardinality up to isomorphism for groupoids is uniquely determined by a small number of axioms such as these (similar to the axioms that determine Euler characteristic); see this blog post of Qiaochu Yuan for details.

The probability distributions on isomorphism classes described by the above recipe seem to arise naturally in many applications. For instance, if one draws a profinite abelian group up to isomorphism at random in this fashion (so that each isomorphism class {[G]} of a profinite abelian group {G} occurs with probability inversely proportional to the number of automorphisms of this group), then the resulting distribution is known as the Cohen-Lenstra distribution, and seems to emerge as the natural asymptotic distribution of many randomly generated profinite abelian groups in number theory and combinatorics, such as the class groups of random quadratic fields; see this previous blog post for more discussion. For a simple combinatorial example, the set of fixed points of a random permutation on {n} elements will have a cardinality that converges in distribution to the Poisson distribution of rate {1} (as discussed in this previous post), thus we see that the fixed points of a large random permutation asymptotically are distributed uniformly up to isomorphism. I’ve been told that this notion of cardinality up to isomorphism is also particularly compatible with stacks (which are a good framework to describe such objects as moduli spaces of algebraic varieties up to isomorphism), though I am not sufficiently acquainted with this theory to say much more than this.

Filed under: expository, math.CO, math.GR, math.GT Tagged: groupoid cardinality

Tommaso DorigoWaiting For Jupiter

This evening I am blogging from a residence in Sesto val Pusteria, a beautiful mountain village in the Italian Alps. I came here for a few days of rest after a crazy work schedule in the past few days -the reason why my blogging has been intermittent. Sesto is surrounded by glorious mountains, and hiking around here is marvelous. But right now, as I sip a non-alcoholic beer (pretty good), chilling off after a day out, my thoughts are focused 500,000,000 kilometers away.

read more

April 10, 2017

Noncommutative GeometryDAVID

It is with immense emotion and sorrow that we learnt the sudden death of David Goss.  He was our dear friend, a joyful supporter of our field and a constant source of inspiration through his great work, of remarkable originality and depth, on function field arithmetics. We are profoundly saddened by this tragic loss.  David will remain in our heart, we shall miss him dearly.

Secret Blogging Seminar… and Elsevier taketh away.

Readers may recall that during the 2013 “peak-Elsevier” period, Elsevier made an interesting concession to the mathematical community — they released all their old mathematical content (“old” here means a rolling 4-year embargo) under a fairly permissive licence.

Unfortunately, sometime in the intervening period they have quietly withdrawn some of the rights they gave to that content. In particular, they no longer give the right to redistribute on non-commercial terms. Of course, the 2013 licence is no longer available on their website, but thankfully David Roberts saved a copy at The critical sentence there is

“Users may access, download, copy, display, redistribute, adapt, translate, text mine and data mine the articles provided that: …”

The new licence, at now reads

“Users may access, download, copy, translate, text and data mine (but may not redistribute, display or adapt) the articles for non-commercial purposes provided that users: …”

I think this is pretty upsetting. The big publishers hold the copyright on our collective cultural heritage, and they can deny us access to the mathematical literature at a whim. The promise that we could redistribute on a non-commercial basis was a guarantee that we could preserve the literature. If this is to be taken away, I hope that mathematicians will go to war again.

Hopefully Elsevier will soon come out with a “oops, this was a mistake, those lawyers, you know?” but this will only happen if we get on their case.

What to do:

  • Elsevier journal editors: please contact your Elsevier representatives, and ask that the licence for the open archives be restored to what it was, to assure the mathematical community that we have ongoing access to the old literature.
  • Elsevier referees and authors: please contact your journal editors, to ask them to contact Elsevier. If you are currently refereeing or submitting, please bring up this issue directly.
  • Everyone: contact Elsevier, either by email or social media (twitter facebook google+).
  • Happily, as we have a copy of the 2013 licence, all the Elsevier open mathematics archive up to 2009 is still available for non-commercial redistribution under their terms. You can find these at

April 09, 2017

Michael NielsenIs there a tension between creativity and accuracy?

On Twitter, I’ve been chatting with my friend Julia Galef about tensions between thinking creatively and thinking in a way that reduces error.

Of course, all other things being equal, I’m in favour of reducing error in our thinking!

However, all other things are not always equal.

In particular, I believe “there’s a tension, too, between behaviours which maximize accuracy & which maximize creativity… A lot of important truths come from v. irrational ppl.”

Julia has summarized some of her thinking in a blog post, where she disagrees, writing: “I totally agree that we need more experimentation with “crazy ideas”! I’m just skeptical that rationality is, on the margin, in tension with that goal.”

Before getting to Julia’s arguments, I want to flesh out the idea of a tension between maximizing creativity and maximizing accuracy.

Consider the following statement of Feynman’s, on the need to fool himself into believing that he had a creative edge in his work. He’s talking about his early ideas on how to develop a theory of electrons and light (which became, after many years, quantum electrodynamics). The statement is a little jarring to modern sensibilities, but please look past that to the idea he’s trying to convey:

I told myself [of his competitors]: “They’re on the wrong track: I’ve got the track!” Now, in the end, I had to give up those ideas and go over to their ideas of retarded action and so on – my original idea of electrons not acting on themselves disappeared, but because I had been working so hard I found something. So, as long as I can drive myself one way or the other, it’s okay. Even if it’s an illusion, it still makes me go, and this is the kind of thing that keeps me going through the depths.

It’s like the African savages who are going into battle – first they have to gather around and beat drums and jump up and down to build up their energy to fight. I feel the same way, building up my energy by talking to myself and telling myself, “They are trying to do it this way, I’m going to do it that way” and then I get excited and I can go back to work again.

Many of the most creative scientists I know are extremely determined people, willing to explore unusual positions for years. Sometimes, those positions are well grounded. And sometimes, even well after the fact, it’s obvious they were fooling themselves, but somehow their early errors helped them find their way to the truth. They were, to use the mathematician Goro Shimura’s phrase “gifted with the special capability of making many mistakes, mostly in the right direction”.

An extreme example is the physicist Joseph Weber, who pioneered gravitational wave astronomy. The verdict of both his contemporaries and of history is that he was fooling himself: his systems simply didn’t work the way he thought. On the other hand, even though he fooled himself for decades, the principals on the (successful!) LIGO project have repeatedly acknowledged that his work was a major stimulus for them to work on finding gravitational waves. In retrospect, it’s difficult to be anything other than glad that Weber clung so tenaciously to his erroneous beliefs.

For me, what matters here is that: (a) much of Weber’s work was based on an unreasonable belief; and (b) on net, it helped speed up important discoveries.

Weber demonstrates my point in an extreme form. He was outright wrong, and remained so, and yet his erroneous example still served a useful purpose, helping inspire others to pursue ideas that eventually worked. In some sense, this is a collective (rather than individual) version of my point. More common is the case – like Feynman – of a person who may cling to mistaken beliefs for a long period, but ultimately uses that as a bridge to new discovery.

Turning to Julia’s post, she responds to my argument with: “In general, I think overconfidence stifles experimentation”, and argues that the great majority of people in society reject “crazy” ideas – say, seasteading – because they’re overconfident in conventional wisdom.

I agree that people often mistakenly reject unusual ideas because they’re overconfident in the conventional wisdom.

However, I don’t think it’s relevant to my argument. Being overconfident in beliefs that most people hold is not at all the same as being overconfident in beliefs that few people hold.

You may wonder if the underlying cognitive mechanisms are the same, and perhaps there’s some kind of broad disposition to overconfidence?

But if that was the case then you’d expect that someone overconfident in their own unusual ideas would, in other areas, also be overconfident in the conventional wisdom.

However, my anecdotal experience is that a colleague willing to pursue unusual ideas of their own is often particularly sympathetic to unusual ideas from other people in other areas. This suggests that being overconfident in your own crazy ideas isn’t likely to stifle other experimentation.

Julia also suggests several variants on the “strategy of temporarily suspending your disbelief and throwing yourself headlong into something for a while, allowing your emotional state to be as if you were 100% confident.”

In a sense, Feynman and Weber were practicing an extreme version of this strategy. I don’t know Weber’s work well, but it’s notable that in the details of Feynman’s work he was good at ferreting out error, and not fooling himself. He wasn’t always rigorous – mathematicians have, for instance, spent decades trying to make the path integral rigorous – but there was usually a strong core argument. Indeed, Feynman delivered a very stimulating speech on the value of careful thought in scientific work.

How can this careful approach to the details of argument be reconciled with his remarks about the need to fool yourself in creative work?

I never met Feynman, and can’t say how he reconciled the two points of view. But my own approach in creative work, and I believe many others also take this approach, is to carve out a sort of creative cocoon around nascent ideas.

Consider Apple designer Jony Ive’s remarks at a memorial after Steve Jobs’ death:

Steve used to say to me — and he used to say this a lot — “Hey Jony, here’s a dopey idea.”

And sometimes they were. Really dopey. Sometimes they were truly dreadful. But sometimes they took the air from the room and they left us both completely silent. Bold, crazy, magnificent ideas. Or quiet simple ones, which in their subtlety, their detail, they were utterly profound. And just as Steve loved ideas, and loved making stuff, he treated the process of creativity with a rare and a wonderful reverence. You see, I think he better than anyone understood that while ideas ultimately can be so powerful, they begin as fragile, barely formed thoughts, so easily missed, so easily compromised, so easily just squished.

To be creative, you need to recognize those barely formed thoughts, thoughts which are usually wrong and poorly formed in many ways, but which have some kernel of originality and importance and
truth. And if they seem important enough to be worth pursuing, you construct a creative cocoon around them, a set of stories you tell yourself to protect the idea not just from others, but from your own self doubts. The purpose of those stories isn’t to be an air tight defence. It’s to give you the confidence to nurture the idea, possibly for years, to find out if there’s something really there.

And so, even someone who has extremely high standards for the final details of their work, may have an important component to their thinking which relies on rather woolly arguments. And they may well need to cling to that cocoon. Perhaps other approaches are possible. But my own experience is that this is often the case.


Julia finishes her post with:

One last point: Even if it turned out to be true that irrationality is necessary for innovators, that’s only a weak defense of your original claim, which was that I’m significantly overrating the value of rationality in general. Remember, “coming up with brilliant new ideas” is just one domain in which we could evaluate the potential value-add of increased rationality. There are lots of other domains to consider, such as designing policy, allocating philanthropic funds, military strategy, etc. We could certainly talk about those separately; for now, I’m just noting that you made this original claim about the dubious value of rationality in general, but then your argument focused on this one particular domain, innovation.

To clarify, I didn’t intend my claim to be a general one: the tension I see is between creativity and accuracy.

That said, this tension does leak into other areas.

If you’re a funder, say, trying to determine what to fund in AI research, you go and talk to AI experts. And many of those people are likely to have cultivated their own creative cocoons, which will inform their remarks. How a funder should deal with that is a separate essay. My point here is simply that this process of creative cocooning isn’t easily untangled from things like evaluation of work.

April 08, 2017

John BaezPeriodic Patterns in Peptide Masses

Gheorghe Craciun is a mathematician at the University of Wisconsin who recently proved the Global Attractor Conjecture, which had been the most famous conjecture in mathematical chemistry since 1974. This week he visited U. C. Riverside and gave a talk on this subject. But he also told me about something else—something quite remarkable.

The mystery

A peptide is basically a small protein: a chain made of fewer than 50 amino acids. If you plot the number of peptides of different masses found in various organisms, you see peculiar oscillations:

These oscillations have a frequency of about 14 daltons, where a ‘dalton’ is roughly the mass of a hydrogen atom—or more precisely, 1/12 the mass of a carbon-12 atom.

Biologists had noticed these oscillations in databases of peptide masses. But they didn’t understand them.

Can you figure out what causes these oscillations?

It’s a math puzzle, actually.

Next I’ll give you the answer, so stop reading if you want to think about it first.

The solution

Almost all peptides are made of 20 different amino acids, which have different masses, and those masses are almost integers. So, to a reasonably good approximation, the puzzle amounts to this: if you have 20 natural numbers m_1, ... , m_{20}, how many ways can you write any natural number N as a finite ordered sum of these numbers? Call it F(N) and graph it. It oscillates! Why?

(We count ordered sums because the amino acids are stuck together in a linear way to form a protein.)
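Before any formulas, F(N) can be computed directly with a short dynamic program. A minimal sketch (the function name and the {1, 2} sanity check are my own; as explained further down, that case reproduces the Fibonacci numbers):

```python
def count_compositions(masses, n_max):
    """Number of ordered sums (compositions) of each N <= n_max
    built from the given part sizes, repeats allowed."""
    F = [0] * (n_max + 1)
    F[0] = 1  # one way to write 0: the empty sum
    for n in range(1, n_max + 1):
        # A composition of n ends in some part m, preceded by a composition of n - m.
        F[n] = sum(F[n - m] for m in masses if m <= n)
    return F

# Sanity check with parts {1, 2}: the counts are Fibonacci numbers.
print(count_compositions([1, 2], 10))  # → [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Running the same function on the 20 amino-acid masses listed near the end of this post produces the oscillating counts the puzzle is about.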

There’s a well-known way to write down a formula for F(N). It obeys a linear recurrence:

F(N) = F(N - m_1) + \cdots + F(N - m_{20})

and we can solve this using the ansatz

F(N) = x^N

Then the recurrence relation will hold if

x^N = x^{N - m_1} + x^{N - m_2} + \dots + x^{N - m_{20}}

for all N. But this is fairly easy to achieve! If m_{20} is the biggest mass, we just need this polynomial equation to hold:

x^{m_{20}} = x^{m_{20} - m_1} + x^{m_{20} - m_2} + \dots + 1

There will be a bunch of solutions, about m_{20} of them. (If there are repeated roots, things get a bit more subtle, but let’s not worry about that.) To get the actual formula for F(N) we need to find the right linear combination of functions x^N where x ranges over all the roots. That takes some work. Craciun and his collaborator Shane Hubler did that work.

But we can get a pretty good understanding with a lot less work. In particular, the root x with the largest magnitude will make x^N grow the fastest.

If you haven’t thought about this sort of recurrence relation it’s good to look at the simplest case, where we just have two masses m_1 = 1, m_2 = 2. Then the numbers F(N) are the Fibonacci numbers. I hope you know this: the Nth Fibonacci number is the number of ways to write N as the sum of an ordered list of 1’s and 2’s!


1+1,   2

1+1+1,   1+2,   2+1

1+1+1+1,   1+1+2,   1+2+1,   2+1+1,   2+2

If I drew edges between these sums in the right way, forming a ‘family tree’, you’d see the connection to Fibonacci’s original rabbit puzzle.

In this example the recurrence gives the polynomial equation

x^2 = x + 1

and the root with largest magnitude is the golden ratio:

\Phi = 1.6180339...

The other root is

1 - \Phi = -0.6180339...

With a little more work you get an explicit formula for the Fibonacci numbers in terms of the golden ratio:

\displaystyle{ F(N) = \frac{1}{\sqrt{5}} \left( \Phi^{N+1} - (1-\Phi)^{N+1} \right) }
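As a quick numerical check of this closed form (a sketch; the function name is mine):

```python
import math

phi = (1 + math.sqrt(5)) / 2  # golden ratio Phi, the dominant root of x^2 = x + 1

def fib_binet(n):
    """F(N) = (Phi^(N+1) - (1 - Phi)^(N+1)) / sqrt(5), rounded to the nearest integer."""
    return round((phi ** (n + 1) - (1 - phi) ** (n + 1)) / math.sqrt(5))

print([fib_binet(n) for n in range(1, 8)])  # → [1, 2, 3, 5, 8, 13, 21]
```

Since |1 - Phi| < 1, the second term shrinks geometrically, so for large N the count is essentially Phi^(N+1)/sqrt(5).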

But right now I’m more interested in the qualitative aspects! In this example both roots are real. The example from biology is different.

Puzzle 1. For which lists of natural numbers m_1 < \cdots < m_k are all the roots of

x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1

real?
I don’t know the answer. But apparently this kind of polynomial equation always has one root with the largest possible magnitude, which is real and has multiplicity one. I think it turns out that F(N) is asymptotically proportional to x^N where x is this root.
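For the amino-acid masses this dominant root is easy to pin down numerically. Dividing the polynomial equation through by x^{m_k} turns it into \sum_i x^{-m_i} = 1, whose left-hand side is strictly decreasing for x > 0, so bisection works. A sketch (variable names are mine; the integer masses are the ones listed near the end of the post):

```python
# Integer residue masses of the 20 standard amino acids (two pairs coincide).
masses = [57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128,
          128, 129, 131, 137, 147, 156, 163, 186]

def g(x):
    # Characteristic equation divided through by x^{m_20}:  sum_i x^{-m_i} - 1 = 0.
    return sum(x ** -m for m in masses) - 1.0

# g(1) = 19 > 0 and g(1.1) < 0, so the dominant real root lies between them.
lo, hi = 1.0, 1.1
for _ in range(100):
    mid = (lo + hi) / 2
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid
x_dom = (lo + hi) / 2
print(x_dom)  # roughly 1.03; F(N) grows like a constant times x_dom^N
```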

But in the case that’s relevant to biology, there’s also a pair of roots with the second largest magnitude, which are not real: they’re complex conjugates of each other. And these give rise to the oscillations!

For the masses of the 20 amino acids most common in life, the roots look like this:

The aqua root at right has the largest magnitude and gives the dominant contribution to the exponential growth of F(N). The red roots have the second largest magnitude. These give the main oscillations in F(N), which have period 14.28.
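Both the exponential growth and the period of roughly 14 can be recovered numerically from the recurrence alone, with no explicit root-finding: estimate the dominant root x from the long-run growth rate of F(N), divide it out, and locate the strongest remaining oscillation with a discrete Fourier transform. A standard-library sketch (the window choices and variable names are mine; the masses are the integer values listed near the end of the post):

```python
import cmath

# Integer residue masses of the 20 standard amino acids.
masses = [57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128,
          128, 129, 131, 137, 147, 156, 163, 186]

# Count peptides of each total mass via the linear recurrence.
N = 3000
F = [0] * (N + 1)
F[0] = 1
for n in range(1, N + 1):
    F[n] = sum(F[n - m] for m in masses if m <= n)

# The long-run growth rate approximates the dominant root: F(N) ~ C x^N.
x = (F[N] / F[N - 1000]) ** (1 / 1000)

# Detrend a window of F(N) and find the dominant frequency by brute-force DFT.
W = 512
r = [F[n] / x ** n for n in range(1000, 1000 + W)]
mean = sum(r) / W
r = [v - mean for v in r]

def spectrum(k):
    return abs(sum(r[j] * cmath.exp(-2j * cmath.pi * k * j / W) for j in range(W)))

k_best = max(range(5, W // 2), key=spectrum)
print(round(W / k_best, 2))  # period of the oscillation in daltons, close to 14.28
```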

For the full story, read this:

• Shane Hubler and Gheorghe Craciun, Periodic patterns in distributions of peptide masses, BioSystems 109 (2012), 179–185.

Most of the pictures here are from this paper.

My main question is this:

Puzzle 2. Suppose we take many lists of natural numbers m_1 < \cdots < m_k and draw all the roots of the equations

x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1

What pattern do we get in the complex plane?

I suspect that this picture is an approximation to the answer you’d get to Puzzle 2:

If you stare carefully at this picture, you’ll see some patterns, and I’m guessing those are hints of something very beautiful.

Earlier on this blog we looked at roots of polynomials whose coefficients are all 1 or -1:

The beauty of roots.

The pattern is very nice, and it repays deep mathematical study. Here it is, drawn by Sam Derbyshire:

But now we’re looking at polynomials where the leading coefficient is 1 and all the rest are -1 or 0. How does that change things? A lot, it seems!

By the way, the 20 amino acids we commonly see in biology have masses ranging between 57 and 186. It’s not really true that all their masses are different. Here are their masses:

57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186

I pretended that none of the masses m_i are equal in Puzzle 2, and I left out the fact that only about 1/9th of the coefficients of our polynomial are nonzero. This may affect the picture you get!

Dave BaconQuantum Advantage

I’ve had quite a few conversations lately about a comment I left on Scirate. The paper at that link, “Quantum advantage with shallow circuits” by Sergey Bravyi, David Gosset, Robert Koenig, shows a provable separation between analogous classes of quantum and classical circuits, even when the quantum circuit is restricted to nearest-neighbor gates on a 2D grid. This is a fantastic result! My comment, however, wasn’t regarding the result, but rather the title of the paper. I’m just happy that they called it a “quantum advantage” instead of using that other term…

The term “quantum supremacy” is the fashionable name for the quantum experiments attempting to beat classical computers at some given task, not necessarily a useful one. According to current usage, the term (strangely) only applies to computational problems. The theoretical and experimental work towards demonstrating this is wonderful. But the term itself, as any native English speaker can tell you, has the unfortunate feature that it immediately calls to mind “white supremacy”. Indeed, one can even quantify this using a Google ngram search for *_ADJ supremacy over all books in Google’s corpus between 1900 and 2008:

None of these terms has a particularly good connotation, but white supremacy (the worst on the list) is an order of magnitude more common than the others and has, on net, been growing since the 30s. For almost every native speaker that I’ve talked to, and quite a few non-native speakers as well, the taint of this is hard to escape. (For speakers of German or French, this word is a bit like “Vormachtstellung” or “collaboration” respectively.)

The humor surrounding this term has always been in bad taste — talking about “quantum supremacists” and jokes about disavowing their support — but it was perhaps tolerable before the US election in November. Given that there are several viable alternatives, for example “quantum advantage” or even “quantum superiority”, can we please agree as a community to abandon this awful term?

This isn’t about being PC. And I’m not trying to shame any of the people that have used this term. It’s just a poor word choice, and we don’t have to be stuck with it. Connotations of words matter: you don’t say someone is “scrawny” if you mean they are thin, even though my thesaurus lists these words as synonyms. Given the readily available alternatives, the only case I can think of for “supremacy” at this point is inertia, which is a rather poor argument.

So please, say it with me now: quantum advantage.

Update: Ashley Montanaro points out that “advantage” should potentially be reserved for a slight advantage. I maintain that “superiority” is still a good choice, and I also offer “dominance” as another alternative. Martin Schwarz suggests some variation of “breaking the X barrier”, which has a nice feel to it. 

April 06, 2017

John BaezApplied Category Theory

The American Mathematical Society is having a meeting here at U. C. Riverside during the weekend of November 4th and 5th, 2017. I’m organizing a session on Applied Category Theory, and I’m looking for people to give talks.

The goal is to start a conversation about applications of category theory, not within pure math or fundamental physics, but to other branches of science and engineering—especially those where the use of category theory is not already well-established! For example, my students and I have been applying category theory to chemistry, electrical engineering, control theory and Markov processes.

Alas, we have no funds for travel and lodging. If you’re interested in giving a talk, please submit an abstract here:

General information about abstracts, American Mathematical Society.

More precisely, please read the information there and then click on the link on that page to submit an abstract. It should then magically fly through cyberspace to me! Abstracts are due September 12th, but the sooner you submit one, the greater the chance that we’ll have space.

For the program of the whole conference, go here:

Fall Western Sectional Meeting, U. C. Riverside, Riverside, California, 4–5 November 2017.

We’ll be having some interesting plenary talks:

• Paul Balmer, UCLA, An invitation to tensor-triangular geometry.

• Pavel Etingof, MIT, Double affine Hecke algebras and their applications.

• Monica Vazirani, U.C. Davis, Combinatorics, categorification, and crystals.

April 04, 2017

Tommaso DorigoWinter 2017 LHC Results: The Higgs Is Still There, But...

Snow is melting in the Alps, and particle physicists, who have flocked to La Thuile for exciting ski conferences in the past weeks, are now back to their usual occupations. The pressure of the deadline is over: results have been finalized and approved, preliminary conference notes have been submitted, talks have been given. The period starting now, the one immediately following presentation of new results, when the next deadline (summer conferences!) is still far away, is more productive in terms of real thought and new ideas. Hopefully we'll come up with some new way to probe the standard model or to squeeze more information from those proton-proton collisions, lest we start to look like accountants!


April 03, 2017

John PreskillHere’s one way to get out of a black hole!

Two weeks ago I attended an exciting workshop at Stanford, organized by the It from Qubit collaboration, which I covered enthusiastically on Twitter. Many of the talks at the workshop provided fodder for possible blog posts, but one in particular especially struck my fancy. In explaining how to recover information that has fallen into a black hole (under just the right conditions), Juan Maldacena offered a new perspective on a problem that has worried me for many years. I am eagerly awaiting Juan’s paper, with Douglas Stanford and Zhenbin Yang, which will provide more details.


My cell-phone photo of Juan Maldacena lecturing at Stanford, 22 March 2017.

Almost 10 years ago I visited the Perimeter Institute to attend a conference, and by chance was assigned an office shared with Patrick Hayden. Patrick was a professor at McGill at that time, but I knew him well from his years at Caltech as a Sherman Fairchild Prize Fellow, and deeply respected him. Our proximity that week ignited a collaboration which turned out to be one of the most satisfying of my career.

To my surprise, Patrick revealed he had been thinking about  black holes, a long-time passion of mine but not previously a research interest of his, and that he had already arrived at a startling insight which would be central to the paper we later wrote together. Patrick wondered what would happen if Alice possessed a black hole which happened to be highly entangled with a quantum computer held by Bob. He imagined Alice throwing a qubit into the black hole, after which Bob would collect the black hole’s Hawking radiation and feed it into his quantum computer for processing. Drawing on his knowledge about quantum communication through noisy channels, Patrick argued that  Bob would only need to grab a few qubits from the radiation in order to salvage Alice’s qubit successfully by doing an appropriate quantum computation.


Alice tosses a qubit into a black hole, which is entangled with Bob’s quantum computer. Bob grabs some Hawking radiation, then does a quantum computation to decode Alice’s qubit.

This idea got my adrenaline pumping, stirring a vigorous dialogue. Patrick had initially assumed that the subsystem of the black hole ejected in the Hawking radiation had been randomly chosen, but we eventually decided (based on a simple picture of the quantum computation performed by the black hole) that it should take a time scaling like M log M (where M is the black hole mass expressed in Planck units) for Alice’s qubit to get scrambled up with the rest of her black hole. Only after this scrambling time would her qubit leak out in the Hawking radiation. This time is actually shockingly short, about a millisecond for a solar mass black hole. The best previous estimate for how long it would take for Alice’s qubit to emerge (scaling like M^3) had been about 10^67 years.

This short time scale aroused memories of discussions with Lenny Susskind back in 1993, vividly recreated in Lenny’s engaging book The Black Hole War. Because of the black hole’s peculiar geometry, it seemed conceivable that Bob could distill a copy of Alice’s qubit from the Hawking radiation and then leap into the black hole, joining Alice, who could then toss her copy of the qubit to Bob. It disturbed me that Bob would then hold two perfect copies of Alice’s qubit; I was a quantum information novice at the time, but I knew enough to realize that making a perfect clone of a qubit would violate the rules of quantum mechanics. I proposed to Lenny a possible resolution of this “cloning puzzle”: If Bob has to wait outside the black hole for too long in order to distill Alice’s qubit, then when he finally jumps in it may be too late for Alice’s qubit to catch up to Bob inside the black hole before Bob is destroyed by the powerful gravitational forces inside. Revisiting that scenario, I realized that the scrambling time M log M, though short, was just barely long enough for the story to be self-consistent. It was gratifying that things seemed to fit together so nicely, as though a deep truth were being affirmed.


If Bob decodes the Hawking radiation and then jumps into the black hole, can he acquire two identical copies of Alice’s qubit?

Patrick and I viewed our paper as a welcome opportunity to draw the quantum information and quantum gravity communities closer together, and we wrote it with both audiences in mind. We had fun writing it, adding rhetorical flourishes which we hoped would draw in readers who might otherwise be put off by unfamiliar ideas and terminology.

In their recent work, Juan and his collaborators propose a different way to think about the problem. They stripped down our Hawking radiation decoding scenario to a model so simple that it can be analyzed quite explicitly, yielding a pleasing result. What had worried me so much was that there seemed to be two copies of the same qubit, one carried into the black hole by Alice and the other residing outside the black hole in the Hawking radiation. I was alarmed by the prospect of a rendezvous of the two copies. Maldacena et al. argue that my concern was based on a misconception. There is just one copy, either inside the black hole or outside, but not both. In effect, as Bob extracts his copy of the qubit on the outside, he destroys Alice’s copy on the inside!

To reach this conclusion, several ideas are invoked. First, we analyze the problem in the case where we understand quantum gravity best, the case of a negatively curved spacetime called anti-de Sitter space. In effect, this trick allows us to trap a black hole inside a bottle, which is very advantageous because we can study the physics of the black hole by considering what happens on the walls of the bottle. Second, we envision Bob’s quantum computer as another black hole which is entangled with Alice’s black hole. When two black holes in anti-de Sitter space are entangled, the resulting geometry has a “wormhole” which connects together the interiors of the two black holes. Third, we choose the entangled pair of black holes to be in a very special quantum state, called the “thermofield double” state. This just means that the wormhole connecting the black holes is as short as possible. Fourth, to make the analysis even simpler, we suppose there is just one spatial dimension, which makes it easier to draw a picture of the spacetime. Now each wall of the bottle is just a point in space, with the left wall lying outside Bob’s side of the wormhole, and the right wall lying outside Alice’s side.

An important property of the wormhole is that it is not traversable. That is, when Alice throws her qubit into her black hole and it enters her end of the wormhole, the qubit cannot emerge from the other end. Instead it is stuck inside, unable to get out on either Alice’s side or Bob’s side. Most ways of manipulating the black holes from the outside would just make the wormhole longer and exacerbate the situation, but in a clever recent paper Ping Gao, Daniel Jafferis, and Aron Wall pointed out an exception. We can imagine a quantum wire connecting the left wall and right wall, which simulates a process in which Bob extracts a small amount of Hawking radiation from the right wall (that is, from Alice’s black hole), and carefully deposits it on the left wall (inserting it into Bob’s quantum computer). Gao, Jafferis, and Wall find that this procedure, by altering the trajectories of Alice’s and Bob’s walls, can actually make the wormhole traversable!


(a) A nontraversable wormhole. Alice’s qubit, thrown into the black hole, never reaches Bob. (b) Stealing some Hawking radiation from Alice’s side and inserting it on Bob’s side makes the wormhole traversable. Now Alice’s qubit reaches Bob, who can easily “decode” it.

This picture gives us a beautiful geometric interpretation of the decoding protocol that Patrick and I had described. It is the interaction between Alice’s wall and Bob’s wall that brings Alice’s qubit within Bob’s grasp. By allowing Alice’s qubit to reach Bob at the other end of the wormhole, that interaction suffices to perform Bob’s decoding task, which is especially easy in this case because Bob’s quantum computer was connected to Alice’s black hole by a short wormhole when she threw her qubit inside.


If, after a delay, Bob jumps into the black hole, he might find Alice’s qubit inside. But if he does, that qubit cannot be decoded by Bob’s quantum computer. Bob has no way to attain two copies of the qubit.

And what if Bob conducts his daring experiment, in which he decodes Alice’s qubit while still outside the black hole, and then jumps into the black hole to check whether the same qubit is also still inside? The above spacetime diagram contrasts two possible outcomes of Bob’s experiment. After entering the black hole, Alice might throw her qubit toward Bob so he can catch it inside the black hole. But if she does, then the qubit never reaches Bob’s quantum computer, and he won’t be able to decode it from the outside. On the other hand, Alice might allow her qubit to reach Bob’s quantum computer at the other end of the (now traversable) wormhole. But if she does, Bob won’t find the qubit when he enters the black hole. Either way, there is just one copy of the qubit, and no way to clone it. I shouldn’t have been so worried!

Granted, we have only described what happens in an oversimplified model of a black hole, but the lessons learned may be more broadly applicable. The case for broader applicability rests on a highly speculative idea, what Maldacena and Susskind called the ER=EPR conjecture, which I wrote about in this earlier blog post. One consequence of the conjecture is that a black hole highly entangled with a quantum computer is equivalent, after a transformation acting only on the computer, to two black holes connected by a short wormhole (though it might be difficult to actually execute that transformation). The insights of Gao-Jafferis-Wall and Maldacena-Stanford-Yang, together with the ER=EPR viewpoint, indicate that we don’t have to worry about the same quantum information being in two places at once. Quantum mechanics can survive the attack of the clones. Whew!

Thanks to Juan, Douglas, and Lenny for ongoing discussions and correspondence which have helped me to understand their ideas (including a lucid explanation from Douglas at our Caltech group meeting last Wednesday). This story is still unfolding and there will be more to say. These are exciting times!

Chad OrzelPhysics Blogging Round-Up: March

Another month, another batch of blog posts at Forbes:

In Physics, Infinity Is Easy But Ten Is Hard: Some thoughts on the odd fact that powerful math tricks make it easy to deal with uncountably many interacting particles, while a smaller number would be a Really Hard Problem.

New Experiment Explores The Origin Of Probabilities In Quantum Physics: A write-up of an experiment using a multi-path interferometer to look for departures from the Born rule for calculating probabilities from wavefunctions.

The Most Important Science To Fund Is The Hardest To Explain: In light of the awful budget proposal put forth by the Trump administration, some thoughts on the importance of government funding for the most basic kinds of research.

Popular Science Writing And Our Fascination With Speculation: Prompted by the new book from Jorge Cham and Daniel Whiteson, a look at why so many pop-science books focus on what we don’t know.

Can You Make A Quantum Superposition Of Cause And Effect?: A write-up of a new paper where they put a single photon into a superposition of A-then-B and B-then-A, which kind of makes my head hurt.

I’m pretty happy with these, though I would’ve expected more pageviews for the cause-and-effect thing. My teaching schedule is slightly lighter this Spring term, so I may be able to do a little more in-depth blogging than in recent months. Or maybe not. Come back in May to find out…

Jordan EllenbergEllenbergs

I got a message last week from the husband of my first cousin once removed;  his father-in-law, Leonard Ellenberg, was my grandfather Julius Ellenberg’s brother.  I never knew my grandfather; he died before I was born, and I was named for him.

The message contained a huge amount of information about a side of my family I’ve never known well.  I’m still going through it all.  But I wanted to share some of it while it was on my mind.

Here’s the manifest for the voyage of the S.S. Polonia, which left Danzig on September 17, 1923 and arrived in New York on October 1.


Owadje Ellenberg (always known as Owadia in my family) was my great-grandfather.  He came to New York with his wife Sura-Fejga (known to us as Sara), Markus (Max), Etia-Race (Ethel), Leon (Leonard), Samuel and Bernard.  Sara was seven months pregnant with my uncle Morris Ellenberg, the youngest child.

Owadje gives his occupation as “mason”; his son Max, only 17, was listed as “tailor.”  They came from Stanislawow, Poland, which is now the city of Ivano-Frankivsk in Ukraine.  On the immigration form you had to list a relative in your country of origin; Owadje listed his brother, Zacharja, who lived on Zosina Wola 6 in Stanislawow.  None of the old street names have survived to the present, but looking at this old map of Stanislawow


it seems pretty clear Zosina Wola is the present day Yevhena Konoval’tsya Street.  I have no way of knowing whether the numbering changed, but #6 Yevhena Konoval’tsya St. seems to be the setback building here:


So this is the best guess I have as to where my ancestors lived in the old country.  The name Zosina Wola lives on only in the name of a bar a few blocks down Yevhena Konoval’tsya:



Owadje, now Owadia, files a declaration of intention to naturalize in 1934:


His signature is almost as bad as mine!  By 1934 he’s living in Borough Park, Brooklyn, a plasterer.  5 foot 7 and 160lb; I think every subsequent Ellenberg man has been that size by the age of 15.  Shtetl nutrition.  There are two separate questions on this form, “color” and “race”:  for color he puts white, for race he puts “Hebrew.”  What did other Europeans put for race?  He puts his hometown as Sopoff, which I think must be the modern Sopiv; my grandmother Sara was from Obertyn, quite close by.  I guess they moved to the big city, Stanislawow, about 40 miles away, when they were pretty young; they got married there in 1902, when they were 21.  The form says he previously filed a declaration of intention in 1926.  What happened?  Did he just not follow through, or was his naturalization rejected?  Did he ever become a citizen?  I don’t know.

Here’s what his house in Brooklyn looks like now:



Did you notice whose name was missing from the Polonia’s manifest?  Owadje’s oldest son, my grandfather, Julius.  Except one thing I’ve learned from all this is that I don’t actually know what my grandfather’s name was.  Julius is what we called him.  But my dad says his passport says “Israel Ellenberg.”  And his naturalization papers


have him as “Juda Ellenberg”  (Juda being the Anglicization of Yehuda, his and my Hebrew name.)  So didn’t that have to be his legal name?  But how could that not be on his passport?

Update:  Cousin Phyllis came through for me!  My grandfather legally changed his name to Julius on June 13, 1927, four months after he filed for naturalization.    


My grandfather was the first to come to America, in December 1920, and he came alone.  He was 16.  He managed to make enough money to bring the whole rest of the family in late 1923, which was a good thing because in May 1924 Calvin Coolidge signed the Johnson-Reed Act which clamped down on immigration by people thought to be debasing the American racial stock:  among these were Italians, Chinese, Czechs, Spaniards, and Jews, definitely Jews.

Another thing I didn’t know:  my grandfather lists his port of entry as Vanceboro, Maine.  That’s not a seaport; it’s a small town on the Canadian border.  So Julius/Juda/Israel must have sailed to Canada; this I never knew.  Where would he have landed? Sounds like most Canadian immigrants landed at Quebec or Halifax, and Halifax makes much more sense if he entered the US at Vanceboro.  But why did he sail to Canada instead of the US?  And why did he leave from France (the form says “Montrese, France,” a place I can’t find) instead of Poland?  (Update:  My cousin comes through again:  another record shows that Julius arrived on Dec 7, 1920 in St. John, New Brunswick, conveyed in 3rd class by the S.S. Corsican.  Looks like this ship would have been coming from England, not France;  I don’t know how to reconcile that.)

In 1927, when he naturalized, Julius lived at 83 2nd Avenue, a building built in 1900 at the boundary of the Bowery and the East Village.  Here’s what it looks like now:


Not a lot of new immigrants able to afford rent there these days, I’m betting.  Later he’d move to Long Beach, Long Island, where my father and his sisters grew up.

My first-cousin-once-removed-in-law went farther back, too, all the way back to Mojżesz Ellenberg, who was born sometime in the middle of the 18th century.  The Hapsburg Empire required Jews to adopt surnames only in 1787; so Mojżesz could very well have been the first Ellenberg.  You may be thinking he’s Owadia’s father’s father’s father, but no — Ellenberg was Owadia’s mother’s name.  I was puzzled by this but actually it was common.  What it meant is that Mordko Kasirer, Owadia’s father, didn’t want to pay the fee for a civil marriage — why should he, when he was already married to Rivka Ellenberg in the synagogue?  But if you weren’t legally married, your children weren’t allowed to take their father’s surname.  So be it.  Mordko wasn’t gonna get ripped off by the system.  Definitely my relative.

Update:  Cousin Phyllis Rosner sends me my grandfather’s birth record.  At birth in Poland he’s Izrael Juda Ellenberg.  This still doesn’t answer what his legal name in the US was, but it explains the passport!

April 02, 2017

Richard EastherActually, This Is Rocket Science

This year is the 60th anniversary of the first satellite launch, Sputnik 1, in 1957. Since then over 8000 objects have followed it into orbit and most people alive today were born after the Space Age began. 

But despite being commonplace, spaceflight is still far from routine. In fact, in the six decades since the Soviet Union started the space race, just eleven nations and the European Union have achieved indigenous launch capability, sending a locally developed rocket into orbit. These include the United States, which launched its first satellite months after Sputnik, and two – Russia and Ukraine – that inherited the Soviet programme. Three more first-world economies, Japan, the United Kingdom and France, are on the list, along with emerging superpowers China and India. Finally, Israel, Iran and North Korea round out the group. 

RocketLab's launch site on New Zealand's Mahia Peninsula. 


In the next few months New Zealand stands a good chance of becoming a member of this club. New Zealand's nascent programme is the work of RocketLab, which is planning its first launch from the Mahia Peninsula on the East Coast of the North Island. If that succeeds, we will be in a category of our own: the smallest country on the list, one of just three (with the EU and Japan) whose efforts did not piggy-back on military missile programmes, and the first whose debut was made by private enterprise rather than a national programme.

RocketLab has the potential to give New Zealand a role in "private space", the biggest upheaval in launch technology since the Space Shuttle, as innovative commercial startups transform what had become a sedate and settled industry. Elon Musk's Space-X is the poster child for this revolution, docking a privately developed capsule with the International Space Station, making the first controlled landing and recovery of a conventional rocket, and developing a remarkably ambitious (if wildly optimistic) plan to transport humans to Mars, but it is one of many players in the field.  

The Space Shuttle, now a museum piece. 


RocketLab is developing a smaller, niche product compared to Space-X's heavy boosters. Designed for frequent launches of small payloads, RocketLab's Electron rocket is fully expendable but lifts a few hundred kilograms for just a few million dollars. (You can make your booking online.) That said, RocketLab still has to successfully test its rocket and, as this is rocket science, there is a real risk of failure, especially in the first few launches. Moreover, for a commercial venture simple technical success is not enough – it must also out-compete other operators in its niche. 

For the most part, RocketLab has avoided publicly committing to specific milestones well ahead of time, and while it has received plenty of coverage (most recently for achieving space unicorn status, a start-up worth over a billion dollars), many New Zealanders may not appreciate the stakes. But if it works it will be a very big deal indeed - not just for RocketLab, but for the country, as it has the potential to spawn a local space industry, developing novel satellite technologies and figuring out new ways to use data from space-based instruments. And that would be a lift-off to celebrate. 

CODA: All rocket programmes borrow from others and RocketLab's URL – – tells a more nuanced story than the Kiwi-boosterism (pardon the pun) that will follow a successful launch. The company is primarily American owned, but founded by New Zealander Peter Beck, and it builds, tests and launches rockets in New Zealand, although it will apparently have additional launch-sites elsewhere. 

And just because it's cool, here's a video of Space-X soft-landing a rocket on a barge; it looks like a film run backwards, but it is a huge leap forwards. 

April 01, 2017

John BaezJobs at U.C. Riverside

The Mathematics Department of the University of California at Riverside is trying to hire some visiting assistant professors. We plan to make decisions quite soon!

The positions are open to applicants from all research areas in mathematics who have a PhD, or will have one by the beginning of the term. The teaching load is six courses per year (i.e. 2 per quarter). In addition to teaching, the applicants will be responsible for attending advanced seminars and working on research projects.

This is initially a one-year appointment; with a successful annual teaching review, it is renewable for up to a third year.

For more details, including how to apply, go here:

March 30, 2017

Chad OrzelPriority Expectations and Student-Faculty Conflict

There was a kerfuffle in academic social media a bit earlier this week, kicked off by an anonymous Twitter feed dedicated to complaints about students (which I won’t link to, as it’s one of those stunt feeds that’s mostly an exercise in maximizing clicks by maximizing dickishness). This triggered a bunch of sweeping declarations about the surpassing awfulness of all faculty who have ever thought poorly of a student (which I’m also not going to link, because they were mostly on Twitter and are now even more annoying to find than they were to read). It was a great week for muttered paraphrases of Mercutio’s death speech, in other words.

These opposite extremes are sort of interesting, though, in that they both spring from the exact same core problem, namely that each side of the faculty-student relationship thinks they should be the other’s top priority, and are annoyed when they’re not.

Faculty complaints about students missing class, not handing in work, etc. in the end trace back to the feeling that class work– and specifically their class work– ought to be the single highest priority for students in that class. I realized this a few years back, when I had a horrible experience with a couple of pre-med physics classes, who were infuriating even by the standards of pre-med physics classes. What I found most maddening about this particular group was that they didn’t even try to hide the fact that my class was their lowest priority. It wasn’t just the constant requests that I adjust my due dates to work around the organic chemistry class running the same term– those are a constant with the pre-med crowd. This particular group would come to a physics recitation section in a first-floor classroom that ended ten minutes before my lab on the third floor, then leave the building to go get coffee in the campus center, and roll into lab 10-15 minutes late. The colleague who taught the recitation section was usually back in his office down the hall well before the students in the class he’d just finished teaching would stroll by on their way to lab.

(I eventually lost my temper with them, and started locking the door at the beginning of the period, only letting them in after I finished the pre-lab lecture to the students who cared enough to arrive on time. This… did not end well.)

On the student side, a lot of the complaints about faculty practices and policies boil down to the same thing in reverse– the idea that faculty need to have the needs and wants of individual students in their class as their absolute top priority. One of the most common complaints about faculty is that they’re “not available enough” and “too slow returning graded work,” both of which implicitly assume that the faculty don’t have anything else to do that’s more important than grading papers and waiting for student questions. That’s not remotely accurate, even if we restrict the scope of activities to professional duties alone, leaving out personal and family concerns. There are research papers to be written or re-written, grant proposals with hard deadlines, committee and department service tasks, and lots of other things that take faculty away from working on that specific class.

And a lot of things that seem like perfectly reasonable requests from an individual student perspective have very real costs for faculty, and for other students in the class. I’m pretty flexible about due dates and the like, but I can’t wait on one student’s homework indefinitely, no matter how good their reason for needing extra time, because it harms the other students in the class. My general practice is to make solutions available to the class as a study aid, and I can’t do that until I have all the homework that’s going to be graded.

Are there faculty whose draconian policies are an unfair imposition on students? I think so, yes. Are there students who feel entitled to excessive deference? Absolutely. (The go-get-coffee-and-come-to-lab-late thing was beyond the pale…) For the most part, though, everybody has priorities they’re trying to balance, and we’re all doing the best we can.

It’s important for both faculty and students to recognize that members of the other group are people trying to balance multiple competing priorities as best they can. Students who really like the class and want to do well can end up having to give other courses, other activities, or their personal well-being a higher priority for part (even most) of the term. And faculty who want to do right by their students can nevertheless have any number of valid reasons for drawing a line and saying “I can do this much, and no more.”

We all want our thing to be everybody else’s top priority, that’s just human nature. It’s not always going to work out that way, though, and recognizing that is the key to avoiding a lot of needless conflict.

March 29, 2017

Tommaso DorigoThe Way I See It

Where by "It" I really mean the Future of mankind. The human race is facing huge new challenges in the XXI century, and we are only starting to get equipped to face them. 

The biggest drama of the past century was arguably caused by the two world conflicts and the subsequent transition to nuclear warfare: humanity had to learn to coexist with the impending threat of global annihilation by thermonuclear war. But today, in addition to that dreadful scenario there are now others we have to cope with.

read more

March 28, 2017

Terence TaoYves Meyer wins the 2017 Abel Prize

Just a short post to note that the Norwegian Academy of Science and Letters has just announced that the 2017 Abel prize has been awarded to Yves Meyer, “for his pivotal role in the development of the mathematical theory of wavelets”.  The actual prize ceremony will be in Oslo in May.

I am actually in Oslo myself currently, having just presented Meyer’s work at the announcement ceremony (and also having written a brief description of some of his work).  The Abel prize has a somewhat unintuitive (and occasionally misunderstood) arrangement in which the presenter of the laureate’s work is selected independently of the winner of the prize (I think in part so that the choice of presenter gives no clues as to the identity of the laureate).  In particular, like other presenters before me (who in recent years have included Timothy Gowers, Jordan Ellenberg, and Alex Bellos), I agreed to present the laureate’s work before knowing who the laureate was!  But in this case the task was very easy, because Meyer’s areas of (both pure and applied) harmonic analysis and PDE fell rather squarely within my own area of expertise.  (I had previously written about some other work of Meyer in this blog post.)  Indeed I had learned about Meyer’s wavelet constructions as a graduate student while taking a course from Ingrid Daubechies.   Daubechies also made extremely important contributions to the theory of wavelets, but due to a conflict of interest (as per the guidelines for the prize committee) arising from Daubechies’ presidency of the International Mathematical Union (which nominates the majority of the members of the Abel prize committee, who then serve for two years) from 2011 to 2014 (and her continuing service ex officio on the IMU executive committee from 2015 to 2018), she will not be eligible for the prize until 2021 at the earliest, and so I do not think this prize should be necessarily construed as a judgement on the relative contributions of Meyer and Daubechies to this field.  (In any case I fully agree with the Abel prize committee’s citation of Meyer’s pivotal role in the development of the theory of wavelets.)

[Update, Mar 28: link to prize committee guidelines and clarification of the extent of Daubechies’ conflict of interest added. -T]

Filed under: math.CA, math.IT, non-technical Tagged: Abel prize, Yves Meyer

March 25, 2017

Scott AaronsonDaniel Moshe Aaronson

Born Wednesday March 22, 2017, exactly at noon.  19.5 inches, 7 pounds.

I learned that Dana had gone into labor—unexpectedly early, at 37 weeks—just as I was waiting to board a redeye flight back to Austin from the It from Qubit complexity workshop at Stanford.  I made it in time for the birth with a few hours to spare.  Mother and baby appear to be in excellent health.  So far, Daniel seems to be a relatively easy baby.  Lily, his sister, is extremely excited to have a new playmate (though not one who does much yet).

I apologize that I haven’t been answering comments on the is-the-universe-a-simulation thread as promptly as I normally do.  This is why.

March 24, 2017

Chad Orzel“CERN Invented the Web” Isn’t an Argument for Anything

I mentioned in passing in the Forbes post about science funding that I’m thoroughly sick of hearing about how the World Wide Web was invented at CERN. I got into an argument about this a while back on Twitter, too, but had to go do something else and couldn’t go into much detail. It’s probably worth explaining at greater-than-Twitter length, though, and a little too inside-baseball for Forbes, so I’ll write something about it here.

At its core, the “CERN invented WWW” argument is a “Basic research pays off in unexpected ways” argument, and in that sense, it’s fine. The problem is, it’s not anything more than that– it’s fine as an argument for funding basic research as a general matter, but it’s not an argument for anything in particular.

What bugs me is not when it’s used as a general “Basic research is good” argument, but when it’s used as a catch-all argument for giving particle physicists whatever they want for whatever they decide they want to do next. It’s used to steamroll past a number of other, perfectly valid, arguments about funding priorities within the general area of basic physics research, and that gets really tiresome.

Inventing WWW is great, but it’s not an argument for particle physics in particular, precisely because it was a weird spin-off that nobody expected, or knew what to do with. In fact, you can argue that much of the impact of the Web was enabled precisely because CERN didn’t really understand it, and Tim Berners-Lee just went and did it, and gave the whole thing away. You can easily imagine a different arrangement where Web-like network technologies were developed by people who better understood the implications, and operated in a more proprietary way from the start.

As an argument for funding particle physics in particular, though, the argument undermines itself precisely due to the chance nature of the discovery. Past performance does not guarantee future results, and the fact that CERN stumbled into a transformative discovery once doesn’t mean you can expect anything remotely similar to happen again.

The success of the Web is all too often invoked to head off a very different funding argument, though, one where it doesn’t really apply: an argument about the relative importance of Big Science. That is, a side spin-off like the Web is a great argument for funding basic science in general, but it doesn’t say anything about the relative merits of spending a billion dollars on building a next-generation particle collider, as opposed to funding a thousand million-dollar grants for smaller projects in less abstract areas of physics.

There are arguments that go both ways on that, and none of them have anything to do with the Web. On the Big Science side, you can argue that working at an extremely large scale necessarily involves pushing the limits of engineering and networking, and working at those limits might offer greater opportunities for discovery. On the small-science side, you can argue that a greater diversity of projects and researchers offers more chances for the unexpected to happen compared to the same investment in a single enormous project.

I’m not sure what the right answer to that question is– given my background, I’m naturally inclined toward the “lots of small projects (in subfields like the one I work in)” model, but I can see some merit to the arguments about working at scale. I think it is a legitimate question, though, one that needs to be considered seriously, and not one that can be headed off by using WWW as a Get Funding Forever trump card for particle physics.

Chad OrzelThe Central Problem of Academic Hiring

A bunch of people in my social-media feeds are sharing this post by Alana Cattapan titled Time-sucking academic job applications don’t know enormity of what they ask. It describes an ad asking for two sample course syllabi “not merely syllabi for courses previously taught — but rather syllabi for specific courses in the hiring department,” and expresses outrage at the imposition on the time of people applying for the job. She argues that the burden falls particularly heavily on groups that are already disadvantaged, such as people currently in contingent faculty positions.

It’s a good argument, as far as it goes, and as someone who has been on the hiring side of more faculty searches than I care to think about, the thought of having to review sample syllabi for every applicant in a pool is… not exactly an appealing prospect. At the same time, though, I can see how a hiring committee would end up implementing this for the best of reasons.

Many of the standard materials used in academic hiring are famously rife with biases– letters of reference being the most obviously problematic, but even the use of CV’s can create issues, as it lends itself to paper-counting and lazy credentialism (“They’re from Bigname University, they must be good…”). Given these well-known problems, I can see a chain of reasoning leading to the sample-syllabus request as a measure to help avoid biases in the hiring process. A sample syllabus is much more concrete than the usual “teaching philosophy” (which tends to be met with boilerplate piffle), particularly if it’s for a specific course familiar to the members of the hiring committee. It offers a relatively objective way to sort out who really understands what’s involved in teaching, that doesn’t rely on name recognition or personal networking. I can even imagine some faculty earnestly arguing that this would give an advantage to people in contingent-faculty jobs, who have lots of teaching experience and would thus be better able to craft a good syllabus than some wet-behind-the-ears grad student from a prestigious university.

And yet, Cattapan’s “too much burden on the applicant” argument is a good one. Which is just another reminder that academic hiring is a lot like Churchill’s famous quip about democracy: whatever system you’re using is the worst possible one, except for all the others.

And, like most discussions of academic hiring, this is frustrating because it dances around what’s really the central problem with academic hiring, namely that the job market for faculty positions absolutely sucks, and has for decades. A single tenure-track opening will generally draw triple-digit numbers of applications, and maybe 40% of those will be obviously unqualified. Which leaves the people doing the hiring with literally dozens of applications that they have to cut down somehow. It’s a process that will necessarily leave large numbers of perfectly well qualified people shut out of jobs through no particular fault of their own, just because there aren’t nearly enough jobs to go around.

Given that market situation, most arguments about why this or that method of winnowing the field of candidates is Bad feel frustratingly pointless. We can drop some measures as too burdensome for applicants, and others as too riddled with bias, but none of that changes the fact that somehow, 149 of 150 applicants need to be disappointed at the end of the process. And it’s never really clear what should replace those problematic methods that would do a substantially better job of weeding out 99.3% of the applicants without introducing new problems.

At some level the fairest thing to do would be to make the easy cut of removing the obviously unqualified and then using a random number generator to pick who gets invited to campus for interviews. I doubt that would make anybody any happier, though.

Don’t get me wrong, this isn’t a throw-up-your-hands anti-measurement argument. I’d love it if somebody could find a relatively objective and reasonably efficient means of picking job candidates out of a large pool, and I certainly think it’s worth exploring new and different ways of measuring academic “quality,” like the sort of thing Bee at Backreaction talks about. (I’d settle for more essays and blog posts saying “This is what you should do,” rather than “This is what you shouldn’t do”…) But it’s also important to note that all of these things are small perturbations to the real central problem of academic hiring, namely that there are too few jobs for too many applicants.

March 22, 2017

Scott AaronsonYour yearly dose of is-the-universe-a-simulation

Yesterday Ryan Mandelbaum, at Gizmodo, posted a decidedly tongue-in-cheek piece about whether or not the universe is a computer simulation.  (The piece was filed under the category “LOL.”)

The immediate impetus for Mandelbaum’s piece was a blog post by Sabine Hossenfelder, a physicist who will likely be familiar to regulars here in the nerdosphere.  In her post, Sabine vents about the simulation speculations of philosophers like Nick Bostrom.  She writes:

Proclaiming that “the programmer did it” doesn’t only not explain anything – it teleports us back to the age of mythology. The simulation hypothesis annoys me because it intrudes on the terrain of physicists. It’s a bold claim about the laws of nature that however doesn’t pay any attention to what we know about the laws of nature.

After hammering home that point, Sabine goes further, and says that the simulation hypothesis is almost ruled out, by (for example) the fact that our universe is Lorentz-invariant, and a simulation of our world by a discrete lattice of bits won’t reproduce Lorentz-invariance or other continuous symmetries.

In writing his post, Ryan Mandelbaum interviewed two people: Sabine and me.

I basically told Ryan that I agree with Sabine insofar as she argues that the simulation hypothesis is lazy—that it doesn’t pay its rent by doing real explanatory work, doesn’t even engage much with any of the deep things we’ve learned about the physical world—and disagree insofar as she argues that the simulation hypothesis faces some special difficulty because of Lorentz-invariance or other continuous phenomena in known physics.  In short: blame it for being unfalsifiable rather than for being falsified!

Indeed, to whatever extent we believe the Bekenstein bound—and even more pointedly, to whatever extent we think the AdS/CFT correspondence says something about reality—we believe that in quantum gravity, any bounded physical system (with a short-wavelength cutoff, yada yada) lives in a Hilbert space of a finite number of qubits, perhaps ~10^69 qubits per square meter of surface area.  And as a corollary, if the cosmological constant is indeed constant (so that galaxies more than ~20 billion light years away are receding from us faster than light), then our entire observable universe can be described as a system of ~10^122 qubits.  The qubits would in some sense be the fundamental reality, from which Lorentz-invariant spacetime and all the rest would need to be recovered as low-energy effective descriptions.  (I hasten to add: there’s of course nothing special about qubits here, any more than there is about bits in classical computation, compared to some other unit of information—nothing that says the Hilbert space dimension has to be a power of 2 or anything silly like that.)  Anyway, this would mean that our observable universe could be simulated by a quantum computer—or even for that matter by a classical computer, to high precision, using a mere ~2^(10^122) time steps.
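
As a back-of-envelope check on those orders of magnitude (a rough sketch using the post’s own figures of ~10^69 qubits per square meter and a ~20-billion-light-year horizon; this is illustrative arithmetic, not precise cosmology):

```python
import math

# Back-of-envelope: horizon area times the holographic qubit density.
# Both input figures are the rough values quoted in the post.
LY_M = 9.461e15                    # metres per light-year
r = 20e9 * LY_M                    # ~20 billion ly horizon radius
area = 4 * math.pi * r ** 2        # horizon area, in m^2
total_qubits = area * 1e69         # ~10^69 qubits per square metre
print(f"~10^{int(math.log10(total_qubits))} qubits")   # ~10^122 qubits
```

The answer depends only on the order of magnitude of the inputs, which is why such a crude estimate suffices.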

Sabine might respond that AdS/CFT and other quantum gravity ideas are mere theoretical speculations, not solid and established like special relativity.  But crucially, if you believe that the observable universe couldn’t be simulated by a computer even in principle—that it has no mapping to any system of bits or qubits—then at some point the speculative shoe shifts to the other foot.  The question becomes: do you reject the Church-Turing Thesis?  Or, what amounts to the same thing: do you believe, like Roger Penrose, that it’s possible to build devices in nature that solve the halting problem or other uncomputable problems?  If so, how?  But if not, then how exactly does the universe avoid being computational, in the broad sense of the term?

I’d write more, but by coincidence, right now I’m at an It from Qubit meeting at Stanford, where everyone is talking about how to map quantum theories of gravity to quantum circuits acting on finite sets of qubits, and the questions in quantum circuit complexity that are thereby raised.  It’s tremendously exciting—the mixture of attendees is among the most stimulating I’ve ever encountered, from Lenny Susskind and Don Page and Daniel Harlow to Umesh Vazirani and Dorit Aharonov and Mario Szegedy to Google’s Sergey Brin.  But it should surprise no one that, amid all the discussion of computation and fundamental physics, the question of whether the universe “really” “is” a simulation has barely come up.  Why would it, when there are so many more fruitful things to ask?  All I can say with confidence is that, if our world is a simulation, then whoever is simulating it (God, or a bored teenager in the metaverse) seems to have a clear preference for the 2-norm over the 1-norm, and for the complex numbers over the reals.

March 19, 2017

John PreskillLocal operations and Chinese communications

The workshop spotlighted entanglement. It began in Shanghai, paused as participants hopped the Taiwan Strait, and resumed in Taipei. We discussed quantum operations and chaos, thermodynamics and field theory.1 I planned to return from Taipei to Shanghai to Los Angeles.

Quantum thermodynamicist Nelly Ng and I drove to the Taipei airport early. News from Air China curtailed our self-congratulations: China’s military was running an operation near Shanghai. Commercial planes couldn’t land. I’d miss my flight to LA.


Two quantum thermodynamicists in Shanghai

An operation?

Quantum information theorists use a mindset called operationalism. We envision experimentalists in separate labs. Call the experimentalists Alice, Bob, and Eve (ABE). We tell stories about ABE to formulate and analyze problems. Which quantum states do ABE prepare? How do ABE evolve, or manipulate, the states? Which measurements do ABE perform? Do they communicate about the measurements’ outcomes?

Operationalism concretizes ideas. The outlook checks us from drifting into philosophy and into abstractions difficult to apply physics tools to.2 Operationalism infuses our language, our framing of problems, and our mathematical proofs.

Experimentalists can perform some operations more easily than others. Suppose that Alice controls the magnets, lasers, and photodetectors in her lab; Bob controls the equipment in his; and Eve controls the equipment in hers. Each experimentalist can perform local operations (LO). Suppose that Alice, Bob, and Eve can talk on the phone and send emails. They exchange classical communications (CC).

You can’t generate entanglement using LOCC. Entanglement consists of strong correlations that quantum systems can share and that classical systems can’t. A quantum system in Alice’s lab can hold more information about a quantum system of Bob’s than any classical system could. We must create and control entanglement to operate quantum computers. Creating and controlling entanglement poses challenges. Hence quantum information scientists often model easy-to-perform operations with LOCC.

Suppose that some experimentalist Charlie loans entangled quantum systems to Alice, Bob, and Eve. How efficiently can ABE compute some quantity, exchange quantum messages, or perform other information-processing tasks, using that entanglement? Such questions underlie quantum information theory.


Taipei’s night market. Or Caltech’s neighborhood?

Local operations.

Nelly and I performed those, trying to finagle me to LA. I inquired at Air China’s check-in desk in English. Nelly inquired in Mandarin. An employee smiled sadly at each of us.

We branched out into classical communications. I called Expedia (“No, I do not want to fly to Manila”), United Airlines (“No flights for two days?”), my credit-card company, Air China’s American reservations office, Air China’s Chinese reservations office, and Air China’s Taipei reservations office. I called AT&T to ascertain why I couldn’t reach Air China (“Yes, please connect me to the airline. Could you tell me the number first? I’ll need to dial it after you connect me and the call is then dropped”).

As I called, Nelly emailed. She alerted Bob, aka Janet (Ling-Yan) Hung, who hosted half the workshop at Fudan University in Shanghai. Nelly emailed Eve, aka Feng-Li Lin, who hosted half the workshop at National Taiwan University in Taipei. Janet twiddled the magnets in her lab (investigated travel funding), and Feng-Li cooled a refrigerator in his.

ABE can process information only so efficiently, using LOCC. The time crept from 1:00 PM to 3:30.


Nelly Ng uses classical communications.

What could we have accomplished with quantum communication? Using LOCC, Alice can manipulate quantum states (like an electron’s orientation) in her lab. She can send nonquantum messages (like “My flight is delayed”) to Bob. She can’t send quantum information (like an electron’s orientation).

Alice and Bob can ape quantum communication, given entanglement. Suppose that Charlie strongly correlates two electrons. Suppose that Charlie gives Alice one electron and gives Bob the other. Alice can send one qubit–one unit of quantum information–to Bob. We call that protocol quantum teleportation.
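
The protocol can be sketched as a toy state-vector simulation (a minimal illustration of the standard textbook teleportation circuit; the function names are mine, not from the post):

```python
import numpy as np

# Qubit order: [Alice's message qubit, Alice's half of the Bell pair, Bob's half]
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def teleport(psi, rng):
    bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # Charlie's pair
    state = np.kron(psi, bell)
    # Alice's local operations: CNOT (message controls her Bell half), then H
    state = np.kron(CNOT, I) @ state
    state = np.kron(H, np.kron(I, I)) @ state
    # Alice measures her two qubits (Born rule)
    amps = state.reshape(2, 2, 2)
    probs = np.array([np.linalg.norm(amps[a, b]) ** 2
                      for a in range(2) for b in range(2)])
    outcome = rng.choice(4, p=probs)
    m0, m1 = divmod(outcome, 2)
    bob = amps[m0, m1] / np.sqrt(probs[outcome])
    # Classical communication: Alice phones Bob the two bits; Bob corrects
    return np.linalg.matrix_power(Z, m0) @ np.linalg.matrix_power(X, m1) @ bob

rng = np.random.default_rng(7)
psi = np.array([0.6, 0.8], dtype=complex)   # the qubit Alice wants to send
print(np.allclose(teleport(psi, rng), psi))  # True: Bob ends up with psi
```

Note the division of labor: the CNOT, Hadamard, measurement, and corrections are all local operations, and the two measurement bits are the classical communication; only the pre-shared Bell pair lets one qubit get through.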

Suppose that air-traffic control had loaned entanglement to Janet, Feng-Li, and me. Could we have finagled me to LA quickly?

Quantum teleportation differs from human teleportation.


We didn’t need teleportation. Feng-Li arranged for me to visit Taiwan’s National Center for Theoretical Sciences (NCTS) for two days. Air China agreed to return me to Shanghai afterward. United would fly me to LA, thanks to help from Janet. Nelly rescued my luggage from leaving on the wrong flight.

Would I rather have teleported? I would have avoided a bushel of stress. But I wouldn’t have learned from Janet about Chinese science funding, wouldn’t have heard Feng-Li’s views about gravitational waves, wouldn’t have glimpsed Taiwanese countryside flitting past the train we rode to the NCTS.

According to some metrics, classical resources outperform quantum.


At Taiwan’s National Center for Theoretical Sciences

The workshop organizers have generously released videos of the lectures. My lecture about quantum chaos and fluctuation relations appears here and here. More talks appear here.

With gratitude to Janet Hung, Feng-Li Lin, and Nelly Ng; to Fudan University, National Taiwan University, and Taiwan’s National Center for Theoretical Sciences for their hospitality; and to Xiao Yu for administrative support.

Glossary and other clarifications:

1. Field theory describes subatomic particles and light.

2. Physics and philosophy enrich each other. But I haven’t trained in philosophy. I benefit from differentiating physics problems that I’m equipped to solve from philosophy problems that I’m not.

Scott Aaronson: I will not log in to your website

Two or three times a day, I get an email whose basic structure is as follows:

Prof. Aaronson, given your expertise, we’d be incredibly grateful for your feedback on a paper / report / grant proposal about quantum computing.  To access the document in question, all you’ll need to do is create an account on our proprietary DigiScholar Portal system, a process that takes no more than 3 hours.  If, at the end of that process, you’re told that the account setup failed, it might be because your browser’s certificates are outdated, or because you already have an account with us, or simply because our server is acting up, or some other reason.  If you already have an account, you’ll of course need to remember your DigiScholar Portal ID and password, and not confuse them with the 500 other usernames and passwords you’ve created for similar reasons—ours required their own distinctive combination of upper and lowercase letters, numerals, and symbols.  After navigating through our site to access the document, you’ll then be able to enter your DigiScholar Review, strictly adhering to our 15-part format, and keeping in mind that our system will log you out and delete all your work after 30 seconds of inactivity.  If you have trouble, just call our helpline during normal business hours (excluding Wednesdays and Thursdays) and stay on the line until someone assists you.  Most importantly, please understand that we can neither email you the document we want you to read, nor accept any comments about it by email.  In fact, all emails to this address will be automatically ignored.

Every day, I seem to grow crustier than the last.

More than a decade ago, I resolved that I would no longer submit to or review for most for-profit journals, as a protest against the exorbitant fees that those journals charge academics in order to buy back access to our own work—work that we turn over to the publishers (copyright and all) and even review for them completely for free, with the publishers typically adding zero or even negative value.  I’m happy that I’ve been able to keep that pledge.

Today, I’m proud to announce a new boycott, less politically important but equally consequential for my quality of life, and to recommend it to all of my friends.  Namely: as long as the world gives me any choice in the matter, I will never again struggle to log in to any organization’s website.  I’ll continue to devote a huge fraction of my waking hours to fielding questions from all sorts of people on the Internet, and I’ll do it cheerfully and free of charge.  All I ask is that, if you have a question, or a document you want me to read, you email it!  Or leave a blog comment, or stop by in person, or whatever—but in any case, don’t make me log in to anything other than Gmail or Facebook or WordPress or a few other sites that remain navigable by a senile 35-year-old who’s increasingly fixed in his ways.  Even Google Docs and Dropbox are pushing it: I’ll give up (on principle) at the first sight of any login issue, and ask for just a regular URL or an attachment.

Oh, Skype no longer lets me log in either.  Could I get to the bottom of that?  Probably.  But life is too short, and too precious.  So if we must, we’ll use the phone, or Google Hangouts.

In related news, I will no longer patronize any haircut place that turns away walk-in customers.

Back when we were discussing the boycott of Elsevier and the other predatory publishers, I wrote that this was a rare case “when laziness and idealism coincide.”  But the truth is more general: whenever my deepest beliefs and my desire to get out of work both point in the same direction, from here till the grave there’s not a force in the world that can turn me the opposite way.

Jacques Distler: Responsibility

Many years ago, when I was an assistant professor at Princeton, there was a cocktail party at Curt Callan’s house to mark the beginning of the semester. There, I found myself in the kitchen, chatting with Sacha Polyakov. I asked him what he was going to be teaching that semester, and he replied that he was very nervous because — for the first time in his life — he would be teaching an undergraduate course. After my initial surprise that he had gotten this far in life without ever having taught an undergraduate course, I asked which course it was. He said it was the advanced undergraduate Mechanics course (chaos, etc.) and we agreed that would be a fun subject to teach. We chatted some more, and then he said that, on reflection, he probably shouldn’t be quite so worried. After all, it wasn’t as if he was going to teach Quantum Field Theory, “That’s a subject I’d feel responsible for.”

This remark stuck with me, but it never seemed quite so poignant until this semester, when I find myself teaching the undergraduate particle physics course.

The textbooks (and I mean all of them) start off by “explaining” that relativistic quantum mechanics (e.g. replacing the Schrödinger equation with the Klein-Gordon equation) makes no sense (negative probabilities and all that …). And they then proceed to use it anyway (supplemented by some Feynman rules pulled out of thin air).

This drives me up the #@%^ing wall. It is precisely wrong.

There is a perfectly consistent quantum mechanical theory of free particles. The problem arises when you want to introduce interactions. In Special Relativity, there is no interaction-at-a-distance; all forces are necessarily mediated by fields. Those fields fluctuate and, when you want to study the quantum theory, you end up having to quantize them.

But the free particle is just fine. Of course it has to be: free field theory is just the theory of an (indefinite number of) free particles. So it better be true that the quantum theory of a single relativistic free particle makes sense.

So what is that theory?

  1. It has a Hilbert space, \mathcal{H}, of states. To make the action of Lorentz transformations as simple as possible, it behoves us to use a Lorentz-invariant inner product on that Hilbert space. This is most easily done in the momentum representation: \langle\chi|\phi\rangle = \int \frac{d^3\vec{k}}{{(2\pi)}^3 2\sqrt{\vec{k}^2+m^2}}\, \chi(\vec{k})^* \phi(\vec{k})
  2. As usual, the time-evolution is given by a Schrödinger equation
(1)\qquad i\partial_t |\psi\rangle = H_0 |\psi\rangle

where H_0 = \sqrt{\vec{p}^2+m^2}. Now, you might object that it is hard to make sense of a pseudo-differential operator like H_0. Perhaps. But it’s not any harder than making sense of U(t)= e^{-i \vec{p}^2 t/2m}, which we routinely pretend to do in elementary quantum. In both cases, we use the fact that, in the momentum representation, the operator \vec{p} is represented as multiplication by \vec{k}.
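Distler’s point can be checked on a computer: in the momentum representation, the Schrödinger equation (1) is solved simply by multiplying \phi(\vec{k}) by e^{-iE(k)t}, and since E(k) is real the invariant norm is exactly conserved. Below is a minimal one-dimensional analogue of the three-dimensional formulas above (the grid, mass, and Gaussian wavepacket are arbitrary choices of mine):

```python
import numpy as np

m = 1.0
k = np.linspace(-20.0, 20.0, 4001)   # momentum grid (1D analogue)
dk = k[1] - k[0]
E = np.sqrt(k**2 + m**2)             # H_0 acts as multiplication by E(k)

def inner(chi, phi):
    # 1D analogue of <chi|phi> = ∫ dk / (2π · 2E(k)) chi(k)* phi(k)
    return np.sum(chi.conj() * phi / (2 * np.pi * 2 * E)) * dk

# An arbitrary wavepacket, normalized in the Lorentz-invariant inner product
phi = np.exp(-(k - 2.0) ** 2).astype(complex)
phi /= np.sqrt(inner(phi, phi).real)

# i d/dt |psi> = H_0 |psi>  ==>  phi_t(k) = exp(-i E(k) t) phi(k)
t = 3.7
phi_t = np.exp(-1j * E * t) * phi

# Unitarity: the invariant norm of the evolved state is still 1
print(abs(inner(phi_t, phi_t).real - 1.0) < 1e-9)  # prints True
```

Nothing here is ill-defined; as the post explains, it is interactions, not the free particle, that force full field quantization.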

I could go on, but let me leave the rest of the development of the theory as a series of questions.

  1. The self-adjoint operator, \vec{x}, satisfies [x^i,p_j] = i \delta^{i}_j. Thus it can be written in the form x^i = i\left(\frac{\partial}{\partial k_i} + f_i(\vec{k})\right) for some real function f_i. What is f_i(\vec{k})?
  2. Define J^0(\vec{r}) to be the probability density. That is, when the particle is in state |\phi\rangle, the probability for finding it in some Borel subset S\subset\mathbb{R}^3 is given by \text{Prob}(S) = \int_S d^3\vec{r}\, J^0(\vec{r}). Obviously, J^0(\vec{r}) must take the form J^0(\vec{r}) = \int\frac{d^3\vec{k}\,d^3\vec{k}'}{{(2\pi)}^6 4\sqrt{\vec{k}^2+m^2}\sqrt{{\vec{k}'}^2+m^2}}\, g(\vec{k},\vec{k}')\, e^{i(\vec{k}-\vec{k}')\cdot\vec{r}}\,\phi(\vec{k})\phi(\vec{k}')^*. Find g(\vec{k},\vec{k}'). (Hint: you need to diagonalize the operator \vec{x} that you found in problem 1.)
  3. The conservation of probability says 0=\partial_t J^0 + \partial_i J^i. Use the Schrödinger equation (1) to find J^i(\vec{r}).
  4. Under Lorentz transformations, H_0 and \vec{p} transform as the components of a 4-vector. For a boost in the z-direction, of rapidity \lambda, we should have \begin{split} U_\lambda \sqrt{\vec{p}^2+m^2} U_\lambda^{-1} &= \cosh(\lambda) \sqrt{\vec{p}^2+m^2} + \sinh(\lambda) p_3\\ U_\lambda p_1 U_\lambda^{-1} &= p_1\\ U_\lambda p_2 U_\lambda^{-1} &= p_2\\ U_\lambda p_3 U_\lambda^{-1} &= \sinh(\lambda) \sqrt{\vec{p}^2+m^2} + \cosh(\lambda) p_3 \end{split} and we should be able to write U_\lambda = e^{i\lambda B} for some self-adjoint operator, B. What is B? (N.B.: by contrast the x^i, introduced above, do not transform in a simple way under Lorentz transformations.)

The Hilbert space of a free scalar field is now \bigoplus_{n=0}^\infty \text{Sym}^n\mathcal{H}. That’s perhaps not the easiest way to get there. But it is a way …


Yike! Well, that went south pretty fast. For the first time (ever, I think) I’m closing comments on this one, and calling it a day. To summarize, for those who still care,

  1. There is a decomposition of the Hilbert space of a free scalar field as \mathcal{H}_\phi = \bigoplus_{n=0}^\infty \mathcal{H}_n where \mathcal{H}_n = \text{Sym}^n \mathcal{H} and \mathcal{H} is the 1-particle Hilbert space described above (also known as the spin-0, mass-m, irreducible unitary representation of Poincaré).
  2. The Hamiltonian of the free scalar field is the direct sum of the induced Hamiltonians on \mathcal{H}_n, induced from the Hamiltonian, H=\sqrt{\vec{p}^2+m^2}, on \mathcal{H}. In particular, it (along with the other Poincaré generators) is block-diagonal with respect to this decomposition.
  3. There are other interesting observables which are also block-diagonal, with respect to this decomposition (i.e., don’t change the particle number) and hence we can discuss their restriction to n\mathcal{H}_n.

Gotta keep reminding myself why I decided to foreswear blogging…

March 18, 2017

March 12, 2017

Jordan Ellenberg: Fitchburg facts

March 11, 2017

Terence Tao: Furstenberg limits of the Liouville function

Given a function {f: {\bf N} \rightarrow \{-1,+1\}} on the natural numbers taking values in {+1, -1}, one can invoke the Furstenberg correspondence principle to locate a measure preserving system {T \circlearrowright (X, \mu)} – a probability space {(X,\mu)} together with a measure-preserving shift {T: X \rightarrow X} (or equivalently, a measure-preserving {{\bf Z}}-action on {(X,\mu)}) – together with a measurable function (or “observable”) {F: X \rightarrow \{-1,+1\}} that has essentially the same statistics as {f} in the sense that

\displaystyle \lim \inf_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k)

\displaystyle \leq \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x)

\displaystyle \leq \lim \sup_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k)

for any integers {h_1,\dots,h_k}. In particular, one has

\displaystyle \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x) = \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k) \ \ \ \ \ (1)


whenever the limit on the right-hand side exists. We will refer to the system {T \circlearrowright (X,\mu)} together with the designated function {F} as a Furstenberg limit of the sequence {f}. These Furstenberg limits capture some, but not all, of the asymptotic behaviour of {f}; roughly speaking, they control the typical “local” behaviour of {f}, involving correlations such as {\frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k)} in the regime where {h_1,\dots,h_k} are much smaller than {N}. However, the control on error terms here is usually only qualitative at best, and one usually does not obtain non-trivial control on correlations in which the {h_1,\dots,h_k} are allowed to grow at some significant rate with {N} (e.g. like some power {N^\theta} of {N}).

The correspondence principle is discussed in these previous blog posts. One way to establish the principle is by introducing a Banach limit {p\!-\!\lim: \ell^\infty({\bf N}) \rightarrow {\bf R}} that extends the usual limit functional on the subspace of {\ell^\infty({\bf N})} consisting of convergent sequences while still having operator norm one. Such functionals cannot be constructed explicitly, but can be proven to exist (non-constructively and non-uniquely) using the Hahn-Banach theorem; one can also use a non-principal ultrafilter here if desired. One can then seek to construct a system {T \circlearrowright (X,\mu)} and a measurable function {F: X \rightarrow \{-1,+1\}} for which one has the statistics

\displaystyle \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x) = p\!-\!\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k) \ \ \ \ \ (2)


for all {h_1,\dots,h_k}. One can explicitly construct such a system as follows. One can take {X} to be the Cantor space {\{-1,+1\}^{\bf Z}} with the product {\sigma}-algebra and the shift

\displaystyle T ( (x_n)_{n \in {\bf Z}} ) := (x_{n+1})_{n \in {\bf Z}}

with the function {F: X \rightarrow \{-1,+1\}} being the coordinate function at zero:

\displaystyle F( (x_n)_{n \in {\bf Z}} ) := x_0

(so in particular {F( T^h (x_n)_{n \in {\bf Z}} ) = x_h} for any {h \in {\bf Z}}). The only thing remaining is to construct the invariant measure {\mu}. In order to be consistent with (2), one must have

\displaystyle \mu( \{ (x_n)_{n \in {\bf Z}}: x_{h_j} = \epsilon_j \forall 1 \leq j \leq k \} )

\displaystyle = p\!-\!\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N 1_{f(n+h_1)=\epsilon_1} \dots 1_{f(n+h_k)=\epsilon_k}

for any distinct integers {h_1,\dots,h_k} and signs {\epsilon_1,\dots,\epsilon_k}. One can check that this defines a premeasure on the Boolean algebra of {\{-1,+1\}^{\bf Z}} defined by cylinder sets, and the existence of {\mu} then follows from the Hahn-Kolmogorov extension theorem (or the closely related Kolmogorov extension theorem). One can then check that the correspondence (2) holds, and that {\mu} is translation-invariant; the latter comes from the translation invariance of the (Banach-)Césaro averaging operation {f \mapsto p\!-\!\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n)}. A variant of this construction shows that the Furstenberg limit is unique up to equivalence if and only if all the limits appearing in (1) actually exist.
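At finite N one can imitate this construction concretely: replace the Banach limit by the plain Césaro average over n ≤ N, and estimate the premeasure of a cylinder set from the statistics of an explicit sequence. The sketch below is my own illustration, using the Liouville function λ as the sequence f, with N and the cylinder sets chosen arbitrarily; approximate shift-invariance of the estimated measure then holds up to O(1/N) error.

```python
import numpy as np

def liouville(N):
    """lambda(n) = (-1)^Omega(n), via a smallest-prime-factor sieve."""
    spf = list(range(N + 1))
    for i in range(2, int(N ** 0.5) + 1):
        if spf[i] == i:                      # i is prime
            for j in range(i * i, N + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [0, 1] + [0] * (N - 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]
    return np.array(lam)

N = 200_000
lam = liouville(N)

def cylinder(pairs):
    """Cesaro estimate of mu({x : x_h = eps for each (h, eps) in pairs})."""
    hmax = max(h for h, _ in pairs)
    n = np.arange(1, N - hmax)
    mask = np.ones_like(n, dtype=bool)
    for h, eps in pairs:
        mask &= lam[n + h] == eps
    return mask.mean()

mu = cylinder([(0, 1), (1, -1)])        # mu({x : x_0 = +1, x_1 = -1})
mu_shift = cylinder([(1, 1), (2, -1)])  # the same cylinder, shifted by T
print(mu, abs(mu - mu_shift))           # the two estimates agree to O(1/N)
```

The true construction of course needs the Banach limit to make these averages exist for all cylinders at once; the finite-N version only exhibits the approximate translation invariance that the Hahn-Kolmogorov step then makes exact.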

One can obtain a slightly tighter correspondence by using a smoother average than the Césaro average. For instance, one can use the logarithmic Césaro averages {\lim_{N \rightarrow \infty} \frac{1}{\log N}\sum_{n=1}^N \frac{f(n)}{n}} in place of the Césaro average {\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n)}, thus one replaces (2) by

\displaystyle \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x)

\displaystyle = p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n+h_1) \dots f(n+h_k)}{n}.

Whenever the Césaro average of a bounded sequence {f: {\bf N} \rightarrow {\bf R}} exists, then the logarithmic Césaro average exists and is equal to the Césaro average. Thus, a Furstenberg limit constructed using logarithmic Banach-Césaro averaging still obeys (1) for all {h_1,\dots,h_k} when the right-hand side limit exists, but also obeys the more general assertion

\displaystyle \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x)

\displaystyle = \lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n+h_1) \dots f(n+h_k)}{n}

whenever the limit of the right-hand side exists.

In a recent paper of Frantzikinakis, the Furstenberg limits of the Liouville function {\lambda} (with logarithmic averaging) were studied. Some (but not all) of the known facts and conjectures about the Liouville function can be interpreted in the Furstenberg limit. For instance, in a recent breakthrough result of Matomaki and Radziwill (discussed previously here), it was shown that the Liouville function exhibited cancellation on short intervals in the sense that

\displaystyle \lim_{H \rightarrow \infty} \limsup_{X \rightarrow \infty} \frac{1}{X} \int_X^{2X} \frac{1}{H} |\sum_{x \leq n \leq x+H} \lambda(n)|\ dx = 0.
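At modest heights one can watch this short-interval cancellation numerically. The sketch below is my own finite check (it illustrates, but of course does not prove, the theorem): it computes λ by a smallest-prime-factor sieve and averages the windowed means |H^{-1} \sum_{x < n \le x+H} \lambda(n)| over x \in [X, 2X], via cumulative sums.

```python
import numpy as np

def liouville(N):
    # lambda(n) = (-1)^Omega(n), via a smallest-prime-factor sieve
    spf = list(range(N + 1))
    for i in range(2, int(N ** 0.5) + 1):
        if spf[i] == i:                      # i is prime
            for j in range(i * i, N + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [0, 1] + [0] * (N - 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]
    return np.array(lam)

X = 100_000
lam = liouville(2 * X + 1100)
cums = np.cumsum(lam)

def short_interval_avg(H):
    # mean over x in [X, 2X) of |(1/H) sum_{x < n <= x+H} lambda(n)|
    x = np.arange(X, 2 * X)
    return np.abs((cums[x + H] - cums[x]) / H).mean()

vals = {H: short_interval_avg(H) for H in (10, 100, 1000)}
print(vals)   # the averages shrink steadily as H grows
```

The decay visible here is roughly the square-root cancellation one would expect of a random sign pattern; the content of the Matomaki-Radziwill theorem is that the decay to zero actually holds.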

In terms of Furstenberg limits of the Liouville function, this assertion is equivalent to the assertion that

\displaystyle \lim_{H \rightarrow \infty} \int_X |\frac{1}{H} \sum_{h=1}^H F(T^h x)|\ d\mu(x) = 0

for all Furstenberg limits {T \circlearrowright (X,\mu), F} of Liouville (including those without logarithmic averaging). Invoking the mean ergodic theorem (discussed in this previous post), this assertion is in turn equivalent to the observable {F} that corresponds to the Liouville function being orthogonal to the invariant factor {L^\infty(X,\mu)^{\bf Z} = \{ g \in L^\infty(X,\mu): g \circ T = g \}} of {X}; equivalently, the first Gowers-Host-Kra seminorm {\|F\|_{U^1(X)}} of {F} (as defined for instance in this previous post) vanishes. The Chowla conjecture, which asserts that

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \lambda(n+h_1) \dots \lambda(n+h_k) = 0

for all distinct integers {h_1,\dots,h_k}, is equivalent to the assertion that all the Furstenberg limits of Liouville are equivalent to the Bernoulli system ({\{-1,+1\}^{\bf Z}} with the product measure arising from the uniform distribution on {\{-1,+1\}}, with the shift {T} and observable {F} as before). Similarly, the logarithmically averaged Chowla conjecture

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = 0

is equivalent to the assertion that all the Furstenberg limits of Liouville with logarithmic averaging are equivalent to the Bernoulli system. Recently, I was able to prove the two-point version

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n) \lambda(n+h)}{n} = 0 \ \ \ \ \ (3)


of the logarithmically averaged Chowla conjecture, for any non-zero integer {h}; this is equivalent to the perfect strong mixing property

\displaystyle \int_X F(x) F(T^h x)\ d\mu(x) = 0

for any Furstenberg limit of Liouville with logarithmic averaging, and any {h \neq 0}.
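For concreteness, one can evaluate the two-point averages at a finite height (my own choice of N, with h = 1, and λ computed by a smallest-prime-factor sieve). Both the Césaro and the logarithmic average come out small, though the logarithmic one is known to converge quite slowly, so these numbers are only suggestive.

```python
import numpy as np

def liouville(N):
    # lambda(n) = (-1)^Omega(n), via a smallest-prime-factor sieve
    spf = list(range(N + 1))
    for i in range(2, int(N ** 0.5) + 1):
        if spf[i] == i:                      # i is prime
            for j in range(i * i, N + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [0, 1] + [0] * (N - 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]
    return np.array(lam)

N = 200_000
lam = liouville(N + 1)
n = np.arange(1, N + 1)
prod = lam[n] * lam[n + 1]               # lambda(n) * lambda(n+1)

cesaro = prod.mean()                      # (1/N) sum_{n<=N} lambda(n)lambda(n+1)
log_avg = np.sum(prod / n) / np.log(N)    # the logarithmically averaged version
print(cesaro, log_avg)
```
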

The situation is more delicate with regards to the Sarnak conjecture, which is equivalent to the assertion that

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \lambda(n) f(n) = 0

for any zero-entropy sequence {f: {\bf N} \rightarrow {\bf R}} (see this previous blog post for more discussion). Morally speaking, this conjecture should be equivalent to the assertion that any Furstenberg limit of Liouville is disjoint from any zero entropy system, but I was not able to formally establish an implication in either direction due to some technical issues regarding the fact that the Furstenberg limit does not directly control long-range correlations, only short-range ones. (There are however ergodic theoretic interpretations of the Sarnak conjecture that involve the notion of generic points; see this paper of El Abdalaoui, Lemańczyk, and de la Rue.) But the situation is currently better with the logarithmically averaged Sarnak conjecture

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n) f(n)}{n} = 0,

as I was able to show that this conjecture was equivalent to the logarithmically averaged Chowla conjecture, and hence to all Furstenberg limits of Liouville with logarithmic averaging being Bernoulli; I also showed the conjecture was equivalent to local Gowers uniformity of the Liouville function, which is in turn equivalent to the function {F} having all Gowers-Host-Kra seminorms vanishing in every Furstenberg limit with logarithmic averaging. In this recent paper of Frantzikinakis, this analysis was taken further, showing that the logarithmically averaged Chowla and Sarnak conjectures were in fact equivalent to the much milder seeming assertion that all Furstenberg limits with logarithmic averaging were ergodic.

Actually, the logarithmically averaged Furstenberg limits have more structure than just a {{\bf Z}}-action on a measure preserving system {(X,\mu)} with a single observable {F}. Let {Aff_+({\bf Z})} denote the semigroup of affine maps {n \mapsto an+b} on the integers with {a,b \in {\bf Z}} and {a} positive. Also, let {\hat {\bf Z}} denote the profinite integers (the inverse limit of the cyclic groups {{\bf Z}/q{\bf Z}}). Observe that {Aff_+({\bf Z})} acts on {\hat {\bf Z}} by taking the inverse limit of the obvious actions of {Aff_+({\bf Z})} on {{\bf Z}/q{\bf Z}}.

Proposition 1 (Enriched logarithmically averaged Furstenberg limit of Liouville) Let {p\!-\!\lim} be a Banach limit. Then there exists a probability space {(X,\mu)} with an action {\phi \mapsto T^\phi} of the affine semigroup {Aff_+({\bf Z})}, as well as measurable functions {F: X \rightarrow \{-1,+1\}} and {M: X \rightarrow \hat {\bf Z}}, with the following properties:

  • (i) (Affine Furstenberg limit) For any {\phi_1,\dots,\phi_k \in Aff_+({\bf Z})}, and any congruence class {a\ (q)}, one has

    \displaystyle p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(\phi_1(n)) \dots \lambda(\phi_k(n)) 1_{n = a\ (q)}}{n}

    \displaystyle = \int_X F( T^{\phi_1}(x) ) \dots F( T^{\phi_k}(x) ) 1_{M(x) = a\ (q)}\ d\mu(x).

  • (ii) (Equivariance of {M}) For any {\phi \in Aff_+({\bf Z})}, one has

    \displaystyle M( T^\phi(x) ) = \phi( M(x) )

    for {\mu}-almost every {x \in X}.

  • (iii) (Multiplicativity at fixed primes) For any prime {p}, one has

    \displaystyle F( T^{p\cdot} x ) = - F(x)

    for {\mu}-almost every {x \in X}, where {p \cdot \in Aff_+({\bf Z})} is the dilation map {n \mapsto pn}.

  • (iv) (Measure pushforward) If {\phi \in Aff_+({\bf Z})} is of the form {\phi(n) = an+b} and {S_\phi \subset X} is the set {S_\phi = \{ x \in X: M(x) \in \phi(\hat {\bf Z}) \}}, then the pushforward {T^\phi_* \mu} of {\mu} by {\phi} is equal to {a \mu\downharpoonright_{S_\phi}}, that is to say one has

    \displaystyle \mu( (T^\phi)^{-1}(E) ) = a \mu( E \cap S_\phi )

    for every measurable {E \subset X}.

Note that {{\bf Z}} can be viewed as the subgroup of {Aff_+({\bf Z})} consisting of the translations {n \mapsto n + b}. If one only keeps the {{\bf Z}}-portion of the {Aff_+({\bf Z})} action and forgets the rest (as well as the function {M}) then the action becomes measure-preserving, and we recover an ordinary Furstenberg limit with logarithmic averaging. However, the additional structure here can be quite useful; for instance, one can transfer the proof of (3) to this setting, which we sketch below the fold, after proving the proposition.

The observable {M}, roughly speaking, means that points {x} in the Furstenberg limit {X} constructed by this proposition are still “virtual integers” in the sense that one can meaningfully compute the residue class of {x} modulo any natural number modulus {q}, by first applying {M} and then reducing mod {q}. The action of {Aff_+({\bf Z})} means that one can also meaningfully multiply {x} by any natural number, and translate it by any integer. As with other applications of the correspondence principle, the main advantage of moving to this more “virtual” setting is that one now acquires a probability measure {\mu}, so that the tools of ergodic theory can be readily applied.

— 1. Proof of proposition —

We adapt the previous construction of the Furstenberg limit. The space {X} will no longer be the Cantor space {\{-1,+1\}^{\bf Z}}, but will instead be taken to be the space

\displaystyle X := \{-1,+1\}^{Aff_+({\bf Z})} \times \hat {\bf Z}.

The action of {Aff_+({\bf Z})} here is given by

\displaystyle T^\phi ( (x_\psi)_{\psi \in Aff_+({\bf Z})}, m ) := ( (x_{\psi \phi})_{\psi \in Aff_+({\bf Z})}, \phi(m) );

this can easily be seen to be a semigroup action. The observables {F: X \rightarrow \{-1,+1\}} and {M: X \rightarrow \hat {\bf Z}} are defined as

\displaystyle F( (x_\psi)_{\psi \in Aff_+({\bf Z})}, m ) := x_{id}


\displaystyle M( (x_\psi)_{\psi \in Aff_+({\bf Z})}, m ) := m

where {id} is the identity element of {Aff_+({\bf Z})}. Property (ii) is now clear. Now we have to construct the measure {\mu}. In order to be consistent with property (i), the measure of the set

\displaystyle \{ ((x_\phi)_{\phi \in Aff_+({\bf Z})}, m): x_{\phi_j} = \epsilon_j \forall 1 \leq j \leq k; m = a\ (q) \} \ \ \ \ \ (4)


for any distinct {\phi_1,\dots,\phi_k \in Aff_+({\bf Z})}, signs {\epsilon_1,\dots,\epsilon_k \in \{-1,+1\}}, and congruence class {a\ (q)}, must be equal to

\displaystyle p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{1_{\lambda(\phi_j(n)) = \epsilon_j \forall 1 \leq j \leq k; n = a\ (q)}}{n}.

One can check that this requirement uniquely defines a premeasure on the Boolean algebra on {X} generated by the sets (4), and {\mu} can then be constructed from the Hahn-Kolmogorov theorem as before. Property (i) follows from construction. Specialising to the case {\phi_1(n) = n}, {\phi_2(n) = pn} for a prime {p} we have

\displaystyle p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n) \lambda(pn)}{n}

\displaystyle = \int_X F( x ) F( T^{p \cdot}(x) ) \ d\mu(x);

the left-hand side is {-1}, which gives (iii).

It remains to establish (iv). It will suffice to do so for sets {E} of the form (4). The claim then follows from the dilation invariance property

\displaystyle p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(an+b)}{n} = a p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n)}{n} 1_{n = b\ (a)}

for any bounded function {f}, which is easily verified (here is where it is essential that we are using logarithmically averaged Césaro means rather than ordinary Césaro means).
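Here is a finite-N check of this dilation invariance (my own illustration: a = 3, b = 1, f = λ, and an ordinary average at height N in place of the Banach limit). The two sides agree only up to O(1/\log N), which at this height is still about 0.2, a reminder of how slowly logarithmic averages converge.

```python
import numpy as np

def liouville(N):
    # lambda(n) = (-1)^Omega(n), via a smallest-prime-factor sieve
    spf = list(range(N + 1))
    for i in range(2, int(N ** 0.5) + 1):
        if spf[i] == i:                      # i is prime
            for j in range(i * i, N + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [0, 1] + [0] * (N - 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]
    return np.array(lam)

N = 300_000
a_, b_ = 3, 1                      # the affine map phi(n) = 3n + 1
lam = liouville(a_ * N + b_)
n = np.arange(1, N + 1)

# (1/log N) sum_{n<=N} f(an+b)/n
lhs = np.sum(lam[a_ * n + b_] / n) / np.log(N)
# a * (1/log N) sum_{n<=N} f(n) 1_{n = b (a)} / n
rhs = a_ * np.sum(np.where(n % a_ == b_, lam[n], 0) / n) / np.log(N)
print(lhs, rhs, abs(lhs - rhs))
```

Running the same comparison with ordinary Césaro averages (weight 1/N instead of 1/n) destroys the identity entirely, which is the point of the remark above.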

Remark 2 One can embed this {Aff_+({\bf Z})}-system {X} as a subsystem of a {Aff_+({\bf Q})}-system {Aff_+({\bf Q}) \otimes_{Aff_+({\bf Z})} X}, however this larger system is only {\sigma}-finite rather than a probability space, and also the observable {M} now takes values in the larger space {{\bf Q} \otimes_{\bf Z} \hat {\bf Z}}. This recovers a group action rather than a semigroup action, but I am not sure if the added complexity of infinite measure is worth it.

— 2. Two-point logarithmic Chowla —

We now sketch how the proof of (3) in this paper can be translated to the ergodic theory setting. For sake of notation let us just prove (3) when {h=1}. We will assume familiarity with ergodic theory concepts in this sketch. By taking a suitable Banach limit, it will suffice to establish that

\displaystyle \int_X F(x) F( T^{\cdot+1} x)\ d\mu(x) = 0

for any Furstenberg limit produced by Proposition 1, where {\cdot+h} denotes the operation of translation by {h}. By property (iii) of that proposition, we can write the left-hand side as

\displaystyle \int_X F(T^{p\cdot} x) F( T^{p\cdot+p} x)\ d\mu(x)

for any prime {p}, and then by property (iv) we can write this in turn as

\displaystyle \int_X F(x) F( T^{p} x) p 1_{M(x) = 0\ (p)}\ d\mu(x).

Averaging, we thus have

\displaystyle \int_X F(x) F( T^{\cdot+1} x)\ d\mu(x) = \frac{1}{|{\mathcal P}_P|} \sum_{p \in {\mathcal P}_P} \int_X F(x) F( T^{p} x) p 1_{M(x) = 0\ (p)}\ d\mu(x)

for any {P>1}, where {{\mathcal P}_P} denotes the primes between {P/2} and {P}.

On the other hand, the Matomaki-Radziwill theorem (twisted by Dirichlet characters) tells us that for any congruence class {q \geq 1}, one has

\displaystyle \lim_{H \rightarrow \infty} \lim\sup_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n \leq N} |\frac{1}{H} \sum_{h=1}^H \lambda(n+qh)| = 0

which on passing to the Furstenberg limit gives

\displaystyle \lim_{H \rightarrow \infty} \int_X |\frac{1}{H} \sum_{h=1}^H F( T^{\cdot+qh} x)|\ d\mu(x) = 0.

Applying the mean ergodic theorem, we conclude that {F} is orthogonal to the profinite factor of the {{\bf Z}}-action, by which we mean the factor generated by the functions that are periodic ({T^{\cdot+q}}-invariant for some {q \geq 1}). One can show from Fourier analysis that the profinite factor is characteristic for averaging along primes, and in particular that

\displaystyle \frac{1}{|{\mathcal P}_P|} \sum_{p \in {\mathcal P}_P} \int_X F(x) F( T^{\cdot + p} x)\ d\mu \rightarrow 0

as {P \rightarrow \infty}. (This is not too difficult given the usual Vinogradov estimates for exponential sums over primes, but I don’t know of a good reference for this fact. This paper of Frantzikinakis, Host, and Kra establishes the analogous claim that the Kronecker factor is characteristic for triple averages {\frac{1}{|{\mathcal P}_P|} \sum_{p \in {\mathcal P}_P} \int_X F(x) F( T^{p} x)\ d\mu}, and their argument would also apply here, but this is something of an overkill.) Thus, if we define the quantities

\displaystyle Q_P := \frac{1}{|{\mathcal P}_P|} \sum_{p \in {\mathcal P}_P} \int_X F(x) F( T^{\cdot + p} x) (p 1_{M(x) = 0\ (p)}-1)\ d\mu(x)

it will suffice to show that {\liminf_{P \rightarrow \infty} |Q_P| = 0}.

Suppose for contradiction that {|Q_P| \geq \varepsilon} for all sufficiently large {P}. We can write {Q_P} as an expectation

\displaystyle Q_P = {\bf E} F_P( X_P, Y_P )

where {X_P} is the {\{-1,+1\}^P}-valued random variable

\displaystyle X_P := ( F( T^{\cdot + k} x ) )_{0 \leq k < P}

with {x} drawn from {X} with law {\mu}, {Y_P} is the {\prod_{p \in {\mathcal P}_P} {\bf Z}/p{\bf Z}}-valued random variable

\displaystyle Y_P := ( M(x)\ (p) )_{p \in {\mathcal P}_P}

with {x} as before, and {F_P} is the function

\displaystyle F_P( (\epsilon_k)_{0 \leq k < P}, (a_p)_{p \in {\mathcal P}_P} ) := \frac{1}{|{\mathcal P}_P|} \sum_{p \in {\mathcal P}_P} \epsilon_0 \epsilon_p (p 1_{a_p = 0\ (p)} - 1).

As {|Q_P| \geq \varepsilon}, we have

\displaystyle |F_P(X_P, Y_P)| \geq \varepsilon/2

with probability at least {\varepsilon/2}. On the other hand, an application of Hoeffding’s inequality and the prime number theorem shows that if {U_P} is drawn uniformly from {\prod_{p \in {\mathcal P}_P} {\bf Z}/p{\bf Z}} and independently of {X_P}, that one has the concentration of measure bound

\displaystyle \mathop{\bf P}( |F_P(X_P, U_P)| > \varepsilon/2 ) \leq 2 \exp( - c_\varepsilon P / \log P )

for some {c_\varepsilon > 0}. Using the Pinsker-type inequality from this previous blog post, we conclude the lower bound

\displaystyle I( X_P : Y_P ) \gg_\varepsilon \frac{P}{\log P}

on the mutual information between {X_P} and {Y_P}. Using Shannon entropy inequalities as in my paper, this implies the entropy decrement

\displaystyle \frac{H(X_{kP})}{kP} \leq \frac{H(X_P)}{P} - \frac{c_\varepsilon}{\log P} + O( \frac{1}{k} )

for any natural number {k}, which on iterating (and using the divergence of {\sum_{j=1}^\infty \frac{1}{j \log j}}) shows that {\frac{H(X_P)}{P}} eventually becomes negative for sufficiently large {P}, which is absurd. (See also this previous blog post for a sketch of a slightly different way to conclude the argument from entropy inequalities.)


March 10, 2017

John Preskill: What is Water 2.0

Before I arrived in Los Angeles, I thought I might need to hit the brakes a bit with some of the radical physics theories I’d encountered during my preliminary research. After all, these were scientists I was meeting: people who “engage in a systematic activity to acquire knowledge that describes and predicts the natural world”, according to Wikipedia. It turns out I was nowhere near as far-out as they were.

I could recount numerous anecdotes that exemplify my encounter with the frighteningly intelligent and vivid imagination of the people at LIGO with whom I had the great pleasure of working – Prof. Rana X. Adhikari, Maria Okounkova, Eric Quintero, Maximiliano Isi, Sarah Gossan, and Jameson Graef Rollins – but in the end it all boils down to a parable about fish.

Rana’s version, which he recounted to me on our first meeting, goes as follows: “There are these two young fish swimming along, and a scientist approaches the aquarium and proclaims, ‘We’ve finally discovered the true nature of water!’ And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes, ‘What the hell is water?’” In David Foster Wallace’s more famous version, the scientist is not a scientist but an old fish, who greets them saying, “Morning, boys. How’s the water?”

What is Water

The difference is not circumstantial. Foster Wallace’s version is an argument against “unconsciousness, the default setting, the rat race, the constant gnawing sense of having had, and lost, some infinite thing” – personified by the young fish – and an urgent call for awareness – personified by the old fish. But in Rana’s version, the matter is more hard-won: as long as they are fish, they haven’t the faintest apprehension of the very concept of water: even a wise old fish would fail to notice. In this adaptation, gaining awareness of that which is “so real and essential, so hidden in plain sight all around us, all the time” as Foster Wallace describes it, demands much more than just an effort in mindfulness. It demands imagining the unimaginable.

Albert Einstein once said that “Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces the entire world, and all there ever will be to know and understand.” But the question remains of how far our imagination can reach, and where the radius ends for us in “what there ever will be to know and understand”, versus that which happens to be. My earlier remark about LIGO scientists’ being far-out does not at all refer to a speculative disposition, which would characterise amateur, anything-goes (and indeed over-the-edge) pseudo-science. Rather, it refers to the high level of creativity that is demanded of physicists today, and to the untiring curiosity that drives them to expand the limits of that radius, despite all odds.

The possibility of imagination has become an increasingly animating thought within my currently ongoing project:

As an independent curator of contemporary art, I travelled to Caltech for a 6-week period of research, towards developing an exhibition that will invite the public to engage with some of the highly challenging implications around the concept of time in physics. In it, I identify LIGO’s breakthrough detection of gravitational waves as an unparalleled incentive by which to acquire – in broad cultural terms – a new sense of time that departs from the old and now wholly inadequate one. After LIGO’s announcement proved that time fluctuation not only happens, but that it happened here, to us, on a precise date and time, it is finally possible for a broader public to relate, however abstract some of the concepts from the field of physics may remain. More simply put: we can finally sense that the water is moving.[1]

One century after Einstein’s Theory of General Relativity, most people continue to hold a highly impoverished idea of the nature of time, despite it being perhaps the most fundamental element of our existence. For 100 years there was no blame or shame in this. Because within all possible changes to the three main components of the universe – space, time & energy – the fluctuation of time was always the only one that escaped our sensorial capacities, existing exclusively in our minds, and finding its fullest expression in mathematical language. If you don’t speak mathematics, time fluctuation remains impossible to grasp, and painful to imagine.

But on February 11th, 2016, this situation changed dramatically.

On this date, a televised announcement told the world of the first-ever sensory detection of time-fluctuation, made with the aid of the most sensitive machine ever to be built by mankind. Finally, we have sensorial access to variations in all components of the universe as we know it. What is more, we observe the non-static passage of time through sound, thereby connecting it to the most affective of our senses.


Of course, LIGO’s detection is limited to time fluctuation and doesn’t yet make other mind-bending behaviours of time observable. But this is only circumstantial. The key point is that we can take this initial leap, and that it loosens our feet from the cramp of Newtonian fixity. Once in this state, gambolling over to ideas about zero time tunnelling, non-causality, or the future determining the present, for instance, is far more plausible, and no longer painful but rather seductive, at least, perhaps, for the playful at heart.

Taking a slight off-road (to be re-routed in a moment): there is a common misconception about children’s allegedly free-spirited creativity. Watching someone aged between around 4 and 15 draw a figure will demonstrate quite clearly just how taut they really are, and that they apply strict schemes that follow reality as they see and learn to see it. Bodies consistently have eyes, mouths, noses, heads, rumps and limbs, correctly placed and in increasingly realistic colours. Ask them to depart from these conventions – “draw one eye on his forehead”, “make her face green” – like masters such as Pablo Picasso and Henri Matisse have done – and they’ll likely become very upset (young adolescents being particularly conservative, reaching the point of panic when challenged to shed consensus).

This is not to compare the lay public (including myself) to children, but to suggest that there’s no inborn capacity – the unaffected, ‘genius’ naïveté that the modernist movements of Primitivism, Art Brut and Outsider Art exalted – for developing a creativity that is of substance. Arriving at a consequential idea, in both art and physics, entails a great deal of acumen and is far from gratuitous, however whimsical the moment in which it sometimes appears. And it’s also to suggest that there’s a necessary process of acquaintance – the knowledge of something through experience – in taking a cognitive leap away from the seemingly obvious nature of reality. If there’s some truth in this, then LIGO’s expansion of our sensorial access to the fluctuation of time, together with artistic approaches that lift the remaining questions and ambiguities of spacetime onto a relational, experiential plane, lay fertile ground on which to begin to foster a new sense of time – on a broad cultural level – however slowly it unfolds.

The first iteration of this project will be an exhibition, to take place in Berlin, in July 2017. It will feature existing and newly commissioned works by established and upcoming artists from Los Angeles and Berlin, working in sound, installation and video, to stage a series of immersive environments that invite the viewers’ bodily interaction.

Though the full selection cannot be disclosed just yet, I would like here to provide a glimpse of two works-in-progress by artist-duo Evelina Domnitch & Dmitry Gelfand, whom I invited to Los Angeles to collaborate in my research with LIGO, and whose contribution has been of great value to the project.

For more details on the exhibition, please stay tuned, and be warmly welcome to visit Berlin in July!

Text & images: courtesy of the artists.

ORBIHEDRON | 2017

A dark vortex in the middle of a water-filled basin emits prismatic bursts of rotating light. Akin to a radiant ergosphere surrounding a spinning black hole, Orbihedron evokes the relativistic as well as quantum interpretation of gravity – the reconciliation of which is essential for unravelling black hole behaviour and the origins of the cosmos. Descending into the eye of the vortex, a white laser beam reaches an impassable singularity that casts a whirling circular shadow on the basin’s floor. The singularity lies at the bottom of a dimple on the water’s surface, the crown of the vortex, which acts as a concave lens focussing the laser beam along the horizon of the “black hole” shadow. Light is seemingly swallowed by the black hole in accordance with general relativity, yet leaks out as quantum theory predicts.

ER = EPR | 2017

Two co-rotating vortices, joined together via a slender vortical bridge, lethargically drift through a body of water. Light hitting the water’s surface transforms the vortex pair into a dynamic lens, projecting two entangled black holes encircled by shimmering halos. As soon as the “wormhole” link between the black holes rips apart, the vortices immediately dissipate, analogously to the collapse of a wave function. Connecting distant black holes or two sides of the same black hole, might wormholes be an example of cosmic-scale quantum entanglement? This mind-bending conjecture of Juan Maldacena and Leonard Susskind can be traced back to two iconoclastic papers from 1935. Previously thought to be unrelated (both by their authors and numerous generations of readers), one article, the legendary EPR (penned by Einstein, Podolsky and Rosen) engendered the concept of quantum entanglement or “spooky action at a distance”; and the second text theorised Einstein-Rosen (ER) bridges, later known as wormholes. Although the widely read EPR paper has led to the second quantum revolution, currently paving the way to quantum simulation and computation, ER has enjoyed very little readership. By equating ER to EPR, the formerly irreconcilable paradigms of physics have the potential to converge: the phenomenon of gravity is imagined in a quantum mechanical context. The theory further implies, according to Maldacena, that the undivided, “reliable structure of space-time is due to the ghostly features of entanglement”.


[1] I am here extending our capacity to sense to that of the technology itself, which indeed measured the warping of spacetime. However, in interpreting gravitational waves from a human frame of reference (moving nowhere near the speed of light at which gravitational waves travel), they would seem to be spatial. In fact, the elongation of space (a longer wavelength) directly implies that time slows down (a longer wave-period), so that the two are indistinguishable.


Isabel de Sena

March 08, 2017

Terence Tao: Open thread for mathematicians on the immigration executive order

The self-chosen remit of my blog is “Updates on my research and expository papers, discussion of open problems, and other maths-related topics”.  Of the 774 posts on this blog, I estimate that about 99% of the posts indeed relate to mathematics, mathematicians, or the administration of this mathematical blog, and only about 1% are not related to mathematics or the community of mathematicians in any significant fashion.

This is not one of the 1%.

Mathematical research is clearly an international activity.  But actually a stronger claim is true: mathematical research is a transnational activity, in that the specific nationality of individual members of a research team or research community are (or should be) of no appreciable significance for the purpose of advancing mathematics.  For instance, even during the height of the Cold War, there was no movement in (say) the United States to boycott Soviet mathematicians or theorems, or to only use results from Western literature (though the latter did sometimes happen by default, due to the limited avenues of information exchange between East and West, and the former did occasionally occur for political reasons, most notably with the Soviet Union preventing Gregory Margulis from traveling to receive his Fields Medal in 1978 EDIT: and also Sergei Novikov in 1970).    The national origin of even the most fundamental components of mathematics, whether it be the geometry (γεωμετρία) of the ancient Greeks, the algebra (الجبر) of the Islamic world, or the Hindu-Arabic numerals 0,1,\dots,9, is primarily of historical interest, and has only a negligible impact on the worldwide adoption of these mathematical tools. While it is true that individual mathematicians or research teams sometimes compete with each other to be the first to solve some desired problem, and that a citizen could take pride in the mathematical achievements of researchers from their country, one did not see any significant state-sponsored “space races” in which it was deemed in the national interest that a particular result ought to be proven by “our” mathematicians and not “theirs”.   
Mathematical research ability is highly non-fungible, and the value added by foreign students and faculty to a mathematics department cannot be completely replaced by an equivalent amount of domestic students and faculty, no matter how large and well educated the country (though a state can certainly work at the margins to encourage and support more domestic mathematicians).  It is no coincidence that all of the top mathematics departments worldwide actively recruit the best mathematicians regardless of national origin, and often retain immigration counsel to assist with situations in which these mathematicians come from a country that is currently politically disfavoured by their own.

Of course, mathematicians cannot ignore the political realities of the modern international order altogether.  Anyone who has organised an international conference or program knows that there will inevitably be visa issues to resolve because the host country makes it particularly difficult for certain nationals to attend the event.  I myself, like many other academics working long-term in the United States, have certainly experienced my own share of immigration bureaucracy, starting with various glitches in the renewal or application of my J-1 and O-1 visas, then to the lengthy vetting process for acquiring permanent residency (or “green card”) status, and finally to becoming naturalised as a US citizen (retaining dual citizenship with Australia).  Nevertheless, while the process could be slow and frustrating, there was at least an order to it.  The rules of the game were complicated, but were known in advance, and did not abruptly change in the middle of playing it (save in truly exceptional situations, such as the days after the September 11 terrorist attacks).  One just had to study the relevant visa regulations (or hire an immigration lawyer to do so), fill out the paperwork and submit to the relevant background checks, and remain in good standing until the application was approved in order to study, work, or participate in a mathematical activity held in another country.  On rare occasion, some senior university administrator may have had to contact a high-ranking government official to approve some particularly complicated application, but for the most part one could work through normal channels in order to ensure for instance that the majority of participants of a conference could actually be physically present at that conference, or that an excellent mathematician hired by unanimous consent by a mathematics department could in fact legally work in that department.

With the recent and highly publicised executive order on immigration, many of these fundamental assumptions have been seriously damaged, if not destroyed altogether.  Even if the order was withdrawn immediately, there is no longer an assurance, even for nationals not initially impacted by that order, that some similar abrupt and major change in the rules for entry to the United States could not occur, for instance for a visitor who has already gone through the lengthy visa application process and background checks, secured the appropriate visa, and is already in flight to the country.  This is already affecting upcoming or ongoing mathematical conferences or programs in the US, with many international speakers (including those from countries not directly affected by the order) now cancelling their visit, either in protest or in concern about their ability to freely enter and leave the country.  Even some conferences outside the US are affected, as some mathematicians currently in the US with a valid visa or even permanent residency are uncertain if they could ever return back to their place of work if they left the country to attend a meeting.  In the slightly longer term, it is likely that the ability of elite US institutions to attract the best students and faculty will be seriously impacted.  Again, the losses would be strongest regarding candidates that were nationals of the countries affected by the current executive order, but I fear that many other mathematicians from other countries would now be much more concerned about entering and living in the US than they would have previously.

It is still possible for this sort of long-term damage to the mathematical community (both within the US and abroad) to be reversed or at least contained, but at present there is a real risk of the damage becoming permanent.  To prevent this, it seems insufficient to me for the current order to be rescinded, as desirable as that would be; some further legislative or judicial action would be needed to begin restoring enough trust in the stability of the US immigration and visa system that the international travel that is so necessary to modern mathematical research becomes “just” a bureaucratic headache again.

Of course, the impact of this executive order is far, far broader than just its effect on mathematicians and mathematical research.  But there are countless other venues on the internet and elsewhere to discuss these other aspects (or politics in general).  (For instance, discussion of the qualifications, or lack thereof, of the current US president can be carried out at this previous post.) I would therefore like to open this post to readers to discuss the effects or potential effects of this order on the mathematical community; I particularly encourage mathematicians who have been personally affected by this order to share their experiences.  As per the rules of the blog, I request that “the discussions are kept constructive, polite, and at least tangentially relevant to the topic at hand”.

Some relevant links (please feel free to suggest more, either through comments or by email):

Filed under: math.HO, non-technical, opinion

March 06, 2017

Jordan Ellenberg: Peter Norvig, the meaning of polynomials, debugging as psychotherapy

I saw Peter Norvig give a great general-audience talk on AI at Berkeley when I was there last month.  A few notes from his talk.

  • “We have always prioritized fast and cheap over safety and privacy — maybe this time we can make better choices.”
  • He briefly showed a demo where, given values of a polynomial, a machine can put together a few lines of code that successfully computes the polynomial.  But the code looks weird to a human eye.  To compute some quadratic, it nests for-loops and adds things up in a funny way that ends up giving the right output.  So has it really “learned” the polynomial?  I think in computer science, you typically feel you’ve learned a function if you can accurately predict its value on a given input.  For an algebraist like me, a function determines but isn’t determined by the values it takes; to me, there’s something about that quadratic polynomial the machine has failed to grasp.  I don’t think there’s a right or wrong answer here, just a cultural difference to be aware of.  Relevant:  Norvig’s description of “the two cultures” at the end of this long post on natural language processing (which is interesting all the way through!)
  • Norvig made the point that traditional computer programs are very modular, leading to a highly successful debugging tradition of zeroing in on the precise part of the program that is doing something wrong, then fixing that part.  An algorithm or process developed by a machine, by contrast, may not have legible “parts”!  If a neural net is screwing up when classifying something, there’s no meaningful way to say “this neuron is the problem, let’s fix it.”  We’re dealing with highly non-modular complex systems which have evolved into a suboptimally functioning state, and you have to find a way to improve function which doesn’t involve taking the thing apart and replacing the broken component.  Of course, we already have a large professional community that works on exactly this problem.  They’re called therapists.  And I wonder whether the future of debugging will look a lot more like clinical psychology than it does like contemporary software engineering.
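The polynomial demo in the second bullet is easy to caricature. Here is a hypothetical sketch (not Norvig’s actual generated code) of the kind of program a synthesizer might emit for the quadratic 2x² + 3x + 1: its outputs are correct on every non-negative integer you test, yet the loop structure never names the polynomial.

```python
def synthesized(x):
    # Hypothetical machine-generated program: for non-negative integer x
    # it returns 2*x*x + 3*x + 1, but via opaque loop counting rather
    # than an explicit polynomial expression.
    total = 0
    for i in range(x):
        for j in range(x):
            total += 2          # contributes 2*x*x
    for i in range(3 * x):
        total += 1              # contributes 3*x
    return total + 1            # constant term

# The values match the polynomial on every tested input...
print(all(synthesized(x) == 2 * x * x + 3 * x + 1 for x in range(50)))  # True
```

In the computer-science sense the function has been learned, since every prediction is right; in the algebraist’s sense the coefficients 2, 3, 1 appear nowhere as such, which is exactly the cultural gap the bullet describes.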