Planet Musings

March 25, 2017

Scott Aaronson: Daniel Moshe Aaronson

Born Wednesday March 22, 2017, exactly at noon.  19.5 inches, 7 pounds.

I learned that Dana had gone into labor—unexpectedly early, at 37 weeks—just as I was waiting to board a redeye flight back to Austin from the It from Qubit complexity workshop at Stanford.  I made it in time for the birth with a few hours to spare.  Mother and baby appear to be in excellent health.  So far, Daniel seems to be a relatively easy baby.  Lily, his sister, is extremely excited to have a new playmate (though not one who does much yet).

I apologize that I haven’t been answering comments on the is-the-universe-a-simulation thread as promptly as I normally do.  This is why.

David Hogg: statistics questions

I spent time today writing in the method section of the Anderson et al paper. I realized in writing it that we have been thinking about our model of the color–magnitude diagram as being a prior on the distance or parallax. But it isn't really, it is a prior on the color and magnitude, which for a given noisy, observed star, becomes a prior on the parallax. We will compute these implicit priors explicitly (it is a different prior for every star) for our paper output. We have to describe this all patiently and well!

At some point during the day, Jo Bovy (Toronto) asked a very simple question about statistics: Why does re-sampling the data (given presumed-known Gaussian noise variances in the data space) and re-fitting deliver samples of the fit parameters that span the same uncertainty distribution as the likelihood function would imply? This is only true for linear fitting, of course, but why is it true (and no, I don't mean what is the mathematical formula!)? My view is that this is (sort-of) a coincidence rather than a result, especially since it (to my mind) confuses the likelihood and the posterior. But it is an oddly deep question.
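Here is a minimal numerical sketch of the coincidence in question (my illustration, not anything from the conversation; the straight-line model, noise level, and number of resamplings are all made up): for a linear model with presumed-known Gaussian noise, re-sampling and re-fitting yields parameter samples whose scatter matches the covariance implied by the likelihood function.

```python
# Minimal sketch (illustrative only): for a straight-line fit with known
# Gaussian noise, re-sampling the data and re-fitting gives parameter samples
# whose covariance matches the likelihood-implied covariance (A^T C^-1 A)^-1.
import numpy as np

rng = np.random.default_rng(42)

x = np.linspace(0.0, 10.0, 50)
A = np.vstack([x, np.ones_like(x)]).T        # design matrix for y = m*x + b
sigma = 0.5 * np.ones_like(x)                # presumed-known noise rms per point
Cinv = np.diag(1.0 / sigma**2)

theta_true = np.array([1.3, -0.7])           # (slope, intercept), made up
y = A @ theta_true + sigma * rng.standard_normal(x.size)

cov_like = np.linalg.inv(A.T @ Cinv @ A)     # likelihood-implied covariance
theta_hat = cov_like @ (A.T @ Cinv @ y)      # weighted least-squares fit

samples = []
for _ in range(20000):                       # re-sample around the best fit, re-fit
    y_resamp = A @ theta_hat + sigma * rng.standard_normal(x.size)
    samples.append(cov_like @ (A.T @ Cinv @ y_resamp))
cov_resamp = np.cov(np.array(samples).T)

print("likelihood covariance:\n", cov_like)
print("re-sampling covariance:\n", cov_resamp)   # agrees up to Monte Carlo noise
```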

David Hogg: toy problem

Lauren Anderson (Flatiron) and I met early to discuss a toy model that would elucidate our color–magnitude diagram model project. Context is: We want to write a section called “Why the heck does this work?” in our paper. We came up with a model so simple, I was able to implement it during the drinking of one coffee. It is, of course, a straight-line fit (with intrinsic width, then used to de-noise the data we started with).
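For concreteness, here is roughly what the de-noising step of such a toy might look like: a minimal sketch under the simplifying assumption that the slope, intercept, and intrinsic width are already known rather than fit (all numbers are made up).

```python
# Sketch of "de-noise with a straight line plus intrinsic width" (illustrative;
# assumes the line parameters and intrinsic scatter are known rather than fit).
import numpy as np

rng = np.random.default_rng(0)
slope, intercept, V_intr = 2.0, 1.0, 0.3**2   # intrinsic variance about the line

x = rng.uniform(0.0, 5.0, 200)
y_true = slope * x + intercept + np.sqrt(V_intr) * rng.standard_normal(x.size)
s = 0.8                                        # observational noise rms
y_obs = y_true + s * rng.standard_normal(x.size)

# product of Gaussians: noisy measurement times the line-plus-scatter "prior"
mu_line = slope * x + intercept
w_obs, w_line = 1.0 / s**2, 1.0 / V_intr
y_denoised = (w_obs * y_obs + w_line * mu_line) / (w_obs + w_line)

print("rms error before de-noising:", np.sqrt(np.mean((y_obs - y_true) ** 2)))
print("rms error after de-noising: ", np.sqrt(np.mean((y_denoised - y_true) ** 2)))
```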

March 24, 2017

Chad Orzel: “CERN Invented the Web” Isn’t an Argument for Anything

I mentioned in passing in the Forbes post about science funding that I’m thoroughly sick of hearing about how the World Wide Web was invented at CERN. I got into an argument about this a while back on Twitter, too, but had to go do something else and couldn’t go into much detail. It’s probably worth explaining at greater-than-Twitter length, though, and a little too inside-baseball for Forbes, so I’ll write something about it here.

At its core, the “CERN invented WWW” argument is a “Basic research pays off in unexpected ways” argument, and in that sense, it’s fine. The problem is, it’s not anything more than that: it’s fine as an argument for funding basic research as a general matter, but it’s not an argument for anything in particular.

What bugs me is not when it’s used as a general “Basic research is good” argument, but when it’s used as a catch-all argument for giving particle physicists whatever they want for whatever they decide they want to do next. It’s used to steamroll past a number of other, perfectly valid, arguments about funding priorities within the general area of basic physics research, and that gets really tiresome.

Inventing WWW is great, but it’s not an argument for particle physics in particular, precisely because it was a weird spin-off that nobody expected, or knew what to do with. In fact, you can argue that much of the impact of the Web was enabled precisely because CERN didn’t really understand it, and Tim Berners-Lee just went and did it, and gave the whole thing away. You can easily imagine a different arrangement where Web-like network technologies were developed by people who better understood the implications, and operated in a more proprietary way from the start.

As an argument for funding particle physics in particular, though, the argument undermines itself precisely due to the chance nature of the discovery. Past performance does not guarantee future results, and the fact that CERN stumbled into a transformative discovery once doesn’t mean you can expect anything remotely similar to happen again.

The success of the Web is all too often invoked as a way around a very different funding argument, though, where it doesn’t really apply, which is an argument about the relative importance of Big Science. That is, a side spin-off like the Web is a great argument for funding basic science in general, but it doesn’t say anything about the relative merits of spending a billion dollars on building a next-generation particle collider, as opposed to funding a thousand million-dollar grants for smaller projects in less abstract areas of physics.

There are arguments that go both ways on that, and none of them have anything to do with the Web. On the Big Science side, you can argue that working at an extremely large scale necessarily involves pushing the limits of engineering and networking, and that working at those limits might offer greater opportunities for discovery. On the small-science side, you can argue that a greater diversity of projects and researchers offers more chances for the unexpected to happen compared to the same investment in a single enormous project.

I’m not sure what the right answer to that question is– given my background, I’m naturally inclined toward the “lots of small projects (in subfields like the one I work in)” model, but I can see some merit to the arguments about working at scale. I think it is a legitimate question, though, one that needs to be considered seriously, and not one that can be headed off by using WWW as a Get Funding Forever trump card for particle physics.

Chad Orzel: The Central Problem of Academic Hiring

A bunch of people in my social-media feeds are sharing this post by Alana Cattapan titled Time-sucking academic job applications don’t know enormity of what they ask. It describes an ad asking for two sample course syllabi “not merely syllabi for courses previously taught — but rather syllabi for specific courses in the hiring department,” and expresses outrage at the imposition on the time of people applying for the job. She argues that the burden falls particularly heavily on groups that are already disadvantaged, such as people currently in contingent faculty positions.

It’s a good argument, as far as it goes, and as someone who has been on the hiring side of more faculty searches than I care to think about, the thought of having to review sample syllabi for every applicant in a pool is… not exactly an appealing prospect. At the same time, though, I can see how a hiring committee would end up implementing this for the best of reasons.

Many of the standard materials used in academic hiring are famously rife with biases– letters of reference being the most obviously problematic, but even the use of CVs can create issues, as it lends itself to paper-counting and lazy credentialism (“They’re from Bigname University, they must be good…”). Given these well-known problems, I can see a chain of reasoning leading to the sample-syllabus request as a measure to help avoid biases in the hiring process. A sample syllabus is much more concrete than the usual “teaching philosophy” (which tends to be met with boilerplate piffle), particularly if it’s for a specific course familiar to the members of the hiring committee. It offers a relatively objective way to sort out who really understands what’s involved in teaching, one that doesn’t rely on name recognition or personal networking. I can even imagine some faculty earnestly arguing that this would give an advantage to people in contingent-faculty jobs, who have lots of teaching experience and would thus be better able to craft a good syllabus than some wet-behind-the-ears grad student from a prestigious university.

And yet, Cattapan’s “too much burden on the applicant” argument is a good one. Which is just another reminder that academic hiring is a lot like Churchill’s famous quip about democracy: whatever system you’re using is the worst possible one, except for all the others.

And, like most discussions of academic hiring, this is frustrating because it dances around what’s really the central problem with academic hiring, namely that the job market for faculty positions absolutely sucks, and has for decades. A single tenure-track opening will generally draw triple-digit numbers of applications, and maybe 40% of those will be obviously unqualified. Which leaves the people doing the hiring with literally dozens of applications that they have to cut down somehow. It’s a process that will necessarily leave large numbers of perfectly well qualified people shut out of jobs through no particular fault of their own, just because there aren’t nearly enough jobs to go around.

Given that market situation, most arguments about why this or that method of winnowing the field of candidates is Bad feel frustratingly pointless. We can drop some measures as too burdensome for applicants, and others as too riddled with bias, but none of that changes the fact that somehow, 149 of 150 applicants need to be disappointed at the end of the process. And it’s never really clear what should replace those problematic methods that would do a substantially better job of weeding out 99.3% of the applicants without introducing new problems.

At some level the fairest thing to do would be to make the easy cut of removing the obviously unqualified and then using a random number generator to pick who gets invited to campus for interviews. I doubt that would make anybody any happier, though.

Don’t get me wrong, this isn’t a throw-up-your-hands anti-measurement argument. I’d love it if somebody could find a relatively objective and reasonably efficient means of picking job candidates out of a large pool, and I certainly think it’s worth exploring new and different ways of measuring academic “quality,” like the sort of thing Bee at Backreaction talks about. (I’d settle for more essays and blog posts saying “This is what you should do,” rather than “This is what you shouldn’t do”…) But it’s also important to note that all of these things are small perturbations to the real central problem of academic hiring, namely that there are too few jobs for too many applicants.

March 22, 2017

Backreaction: Academia is fucked-up. So why isn’t anyone doing something about it?

A week or so ago, a list of perverse incentives in academia made the rounds. It offers examples like “rewarding an increased number of citations” that – instead of encouraging work of high quality and impact – results in inflated citation lists, an academic tit-for-tat which has become standard practice. Likewise, rewarding a high number of publications doesn’t produce more good science, but merely

Scott Aaronson: Your yearly dose of is-the-universe-a-simulation

Yesterday Ryan Mandelbaum, at Gizmodo, posted a decidedly tongue-in-cheek piece about whether or not the universe is a computer simulation.  (The piece was filed under the category “LOL.”)

The immediate impetus for Mandelbaum’s piece was a blog post by Sabine Hossenfelder, a physicist who will likely be familiar to regulars here in the nerdosphere.  In her post, Sabine vents about the simulation speculations of philosophers like Nick Bostrom.  She writes:

Proclaiming that “the programmer did it” doesn’t only not explain anything – it teleports us back to the age of mythology. The simulation hypothesis annoys me because it intrudes on the terrain of physicists. It’s a bold claim about the laws of nature that however doesn’t pay any attention to what we know about the laws of nature.

After hammering home that point, Sabine goes further, and says that the simulation hypothesis is almost ruled out, by (for example) the fact that our universe is Lorentz-invariant, and a simulation of our world by a discrete lattice of bits won’t reproduce Lorentz-invariance or other continuous symmetries.

In writing his post, Ryan Mandelbaum interviewed two people: Sabine and me.

I basically told Ryan that I agree with Sabine insofar as she argues that the simulation hypothesis is lazy—that it doesn’t pay its rent by doing real explanatory work, doesn’t even engage much with any of the deep things we’ve learned about the physical world—and disagree insofar as she argues that the simulation hypothesis faces some special difficulty because of Lorentz-invariance or other continuous phenomena in known physics.  In short: blame it for being unfalsifiable rather than for being falsified!

Indeed, to whatever extent we believe the Bekenstein bound—and even more pointedly, to whatever extent we think the AdS/CFT correspondence says something about reality—we believe that in quantum gravity, any bounded physical system (with a short-wavelength cutoff, yada yada) lives in a Hilbert space of a finite number of qubits, perhaps ~10^69 qubits per square meter of surface area.  And as a corollary, if the cosmological constant is indeed constant (so that galaxies more than ~20 billion light years away are receding from us faster than light), then our entire observable universe can be described as a system of ~10^122 qubits.  The qubits would in some sense be the fundamental reality, from which Lorentz-invariant spacetime and all the rest would need to be recovered as low-energy effective descriptions.  (I hasten to add: there’s of course nothing special about qubits here, any more than there is about bits in classical computation, compared to some other unit of information—nothing that says the Hilbert space dimension has to be a power of 2 or anything silly like that.)  Anyway, this would mean that our observable universe could be simulated by a quantum computer—or even for that matter by a classical computer, to high precision, using a mere ~2^(10^122) time steps.
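(For readers who want the arithmetic spelled out, here is a rough back-of-the-envelope version, assuming about one qubit per $4\ell_P^2 \ln 2$ of horizon area and a cosmological event horizon of radius of order $1.5\times 10^{26}$ meters:
$$\frac{1}{4\,\ell_P^2\,\ln 2} \approx \frac{1}{4\,(1.6\times 10^{-35}\,\mathrm{m})^2\,\ln 2} \approx 10^{69}\ \frac{\text{qubits}}{\mathrm{m}^2}, \qquad N \approx 10^{69}\,\mathrm{m}^{-2} \times 4\pi\,(1.5\times 10^{26}\,\mathrm{m})^2 \approx 4\times 10^{122}\ \text{qubits}.$$
These are order-of-magnitude estimates only.)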

Sabine might respond that AdS/CFT and other quantum gravity ideas are mere theoretical speculations, not solid and established like special relativity.  But crucially, if you believe that the observable universe couldn’t be simulated by a computer even in principle—that it has no mapping to any system of bits or qubits—then at some point the speculative shoe shifts to the other foot.  The question becomes: do you reject the Church-Turing Thesis?  Or, what amounts to the same thing: do you believe, like Roger Penrose, that it’s possible to build devices in nature that solve the halting problem or other uncomputable problems?  If so, how?  But if not, then how exactly does the universe avoid being computational, in the broad sense of the term?

I’d write more, but by coincidence, right now I’m at an It from Qubit meeting at Stanford, where everyone is talking about how to map quantum theories of gravity to quantum circuits acting on finite sets of qubits, and the questions in quantum circuit complexity that are thereby raised.  It’s tremendously exciting—the mixture of attendees is among the most stimulating I’ve ever encountered, from Lenny Susskind and Don Page and Daniel Harlow to Umesh Vazirani and Dorit Aharonov and Mario Szegedy to Google’s Sergey Brin.  But it should surprise no one that, amid all the discussion of computation and fundamental physics, the question of whether the universe “really” “is” a simulation has barely come up.  Why would it, when there are so many more fruitful things to ask?  All I can say with confidence is that, if our world is a simulation, then whoever is simulating it (God, or a bored teenager in the metaverse) seems to have a clear preference for the 2-norm over the 1-norm, and for the complex numbers over the reals.

Doug Natelson: Hysteresis in science and engineering policy

I have tried hard to avoid political tracts on this blog, because I don't think that's what people necessarily want to read here.  Political flamewars in the comments or loss of readers over differences of opinion are not outcomes I want.  The recent proposed budget from the White House, however, inspires some observations.  (I know the President's suggested budget is only the very beginning of the budgetary process, but it does tell you something about the administration's priorities.)

The second law of thermodynamics tells us that some macroscopic processes tend to run only one direction.  It's easier to disperse a drop of ink in a glass of water than to somehow reconstitute the drop of ink once the glass has been stirred.

In general, the response of a system to some input (say the response of a ferromagnet to an applied magnetic field, or the deformation of a blob of silly putty in response to an applied stress) can depend on the history of the material.  Taking the input from A to B and back to A doesn't necessarily return the system to its original state.  Cycling the input and ending up with a looping trajectory of the system in response because of that history dependence is called hysteresis.  This happens because there is some inherent time scale for the system to respond to inputs, and if it can't keep up, there is lag.
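A minimal numerical illustration of that lag (my own toy model, nothing to do with the specific policy discussion): a single relaxational degree of freedom chasing a cyclically driven equilibrium traces out a loop rather than a single curve.

```python
# Toy hysteresis sketch (illustrative parameters): an overdamped mean-field
# magnet m relaxes toward tanh(beta*(h + J*m)) but lags the cycled field h(t),
# so sweeping h up and back down leaves m on different branches at the same h.
import numpy as np

beta, J, tau = 1.5, 1.0, 1.0          # inverse temperature, coupling, relaxation time
h0, period = 1.0, 20.0                # drive amplitude and period
dt, n_steps = 0.01, 20000

m, loop = -1.0, []
for i in range(n_steps):
    h = h0 * np.sin(2.0 * np.pi * i * dt / period)
    m_eq = np.tanh(beta * (h + J * m))        # instantaneous equilibrium response
    m += dt * (m_eq - m) / tau                # relaxation with a finite time scale
    loop.append((h, m))

loop = np.array(loop)
m_at_zero_field = loop[np.abs(loop[:, 0]) < 0.01, 1]
print("spread of m at h ~ 0 (nonzero means hysteresis):",
      m_at_zero_field.max() - m_at_zero_field.min())
```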

The proposed budget would make sweeping changes to programs and efforts that, in some cases, took decades to put in place.   Drastically reducing the size and scope of federal agencies is not something that can simply be undone by the next Congress or the next President.  Cutting 20% of NIH or 17% of DOE Office of Science would have ripple effects for many years, and anyone who has worked in a large institution knows that big cuts are almost never restored.   Expertise at EPA and NOAA can't just be rebuilt once eliminated.  

People can have legitimate discussions and differences of opinion about the role of the government and what it should be funding.  However, everyone should recognize that these are serious decisions, many of which are irreversible in practical terms.   Acting otherwise is irresponsible and foolish.

Terence Tao: Yves Meyer wins the 2017 Abel Prize

Just a short post to note that the Norwegian Academy of Science and Letters has just announced that the 2017 Abel prize has been awarded to Yves Meyer, “for his pivotal role in the development of the mathematical theory of wavelets”.  The actual prize ceremony will be in Oslo in May.

I am actually in Oslo myself currently, having just presented Meyer’s work at the announcement ceremony (and also having written a brief description of some of his work).  The Abel prize has a somewhat unintuitive (and occasionally misunderstood) arrangement in which the presenter of the laureate’s work is selected independently of the laureate (I think in part so that the choice of presenter gives no clues as to the identity of the laureate).  In particular, like other presenters before me (who in recent years have included Timothy Gowers, Jordan Ellenberg, and Alex Bellos), I agreed to present the laureate’s work before knowing who the laureate was!  But in this case the task was very easy, because Meyer’s areas of (both pure and applied) harmonic analysis and PDE fell rather squarely within my own area of expertise.  (I had previously written about some other work of Meyer in this blog post.)  Indeed I had learned about Meyer’s wavelet constructions as a graduate student while taking a course from Ingrid Daubechies.  Daubechies also made extremely important contributions to the theory of wavelets, but my understanding is that due to a conflict of interest arising from Daubechies’ presidency of the International Mathematical Union (which nominates members of the Abel prize committee) from 2011 to 2014, she was not eligible for the prize this year, and so I do not think this prize should necessarily be construed as a judgement on the relative contributions of Meyer and Daubechies to this field.  (In any case I fully agree with the Abel prize committee’s citation of Meyer’s pivotal role in the development of the theory of wavelets.)



n-Category Café Functional Equations VII: The p-Norms

The $p$-norms have a nice multiplicativity property:

$$\|(A x, A y, A z, B x, B y, B z)\|_p = \|(A, B)\|_p \, \|(x, y, z)\|_p$$

for all $A, B, x, y, z \in \mathbb{R}$ — and similarly, of course, for any numbers of arguments.

Guillaume Aubrun and Ion Nechita showed that this condition completely characterizes the $p$-norms. In other words, any system of norms that’s multiplicative in this sense must be equal to $\|\cdot\|_p$ for some $p \in [1, \infty]$. And the amazing thing is, to prove this, they used some nontrivial probability theory.
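A quick numerical sanity check of the multiplicativity identity (my own snippet, with arbitrary test values):

```python
# Numerical check (illustrative) that ||(Ax, Ay, Az, Bx, By, Bz)||_p equals
# ||(A, B)||_p * ||(x, y, z)||_p for several values of p.
import numpy as np

A, B = 1.7, -0.4
x, y, z = 2.0, -3.0, 0.5

for p in (1.0, 1.5, 2.0, 3.0, np.inf):
    lhs = np.linalg.norm([A * x, A * y, A * z, B * x, B * y, B * z], ord=p)
    rhs = np.linalg.norm([A, B], ord=p) * np.linalg.norm([x, y, z], ord=p)
    print(p, lhs, rhs)   # the last two columns agree for every p
```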

All this is explained in this week’s functional equations notes, which start on page 26 here.

March 21, 2017

n-Category Café On the Operads of J. P. May

Guest post by Simon Cho

We continue the Kan Extension Seminar II with Max Kelly’s On the operads of J. P. May. As we will see, the main message of the paper is that (symmetric) operads enriched in a suitably nice category $\mathcal{V}$ arise naturally as monoids for a “substitution product” in the monoidal category $[\mathbf{P}, \mathcal{V}]$ (where $\mathbf{P}$ is a category that keeps track of the symmetry). Before we begin, I want to thank the organizers and participants of the Kan Extension Seminar (II) for the opportunity to read and discuss these nice papers with them.

Some time ago, in her excellent post about Hyland and Power’s paper, Evangelia described what Lawvere theories are about. We might think of Lawvere theories as a way to frame algebraic structure by stratifying the different components of an algebraic structure into roughly three ascending levels of specificity: the product structure, the specific algebraic operations (meaning, other than projections, etc.), and the models of that algebraic structure. These structures are manifested categorically through (respectively) the category $\aleph_0^{\text{op}}$ of finite sets and (the duals of) maps between them, a category $\mathcal{L}$ with finite products that has the same objects as $\aleph_0$, and some other category $\mathcal{C}$ with finite products. Then a Lawvere theory is just a strict product preserving functor $I: \aleph_0^{\text{op}} \rightarrow \mathcal{L}$, and a model or interpretation of a Lawvere theory is a (non-strict) product preserving functor $M: \mathcal{L} \rightarrow \mathcal{C}$.

Thus $\aleph_0^{\text{op}}$ specifies the bare product structure (with the attendant projections, etc.) which gives us a notion of what it means to be “$n$-ary” for some given $n$; $I$ then transfers this notion of arity to the category $\mathcal{L}$, whose shape describes the specific algebraic structure in question (think of the diagrams one uses to categorically define the group axioms, for example); $M$ then gives a particular manifestation of the algebraic structure $\mathcal{L}$ on an object $M \circ I(1) \in \mathcal{C}$.

The reason I bring this up is that I like to think of operads as what results when we make the following change of perspective on Lawvere theories: whereas models of Lawvere theories are essentially given by specifying a “ground set of elements” $A \in \mathcal{C}$ and taking as the $n$-ary operations morphisms $A^n \rightarrow A$, we now consider a hypothetical category whose ($n$-indexed) objects themselves are the homsets $\mathcal{C}(A^n, A)$, along with some machinery that keeps track of what happens when we permute the argument slots.
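To make the “homsets $\mathcal{C}(A^n, A)$ plus permutation machinery” picture concrete before the abstract development, here is a small Python sketch (my own illustration, not anything from Kelly’s paper): $n$-ary operations on a set are just functions of $n$ arguments, the symmetric group acts by permuting argument slots, and substitution plugs operations into the slots of another operation.

```python
# Illustrative sketch (not from the paper): the endomorphism operad of a set,
# with n-ary operations as Python functions, a slot-permuting symmetric-group
# action, and substitution (operadic composition).
from itertools import accumulate

def permute_slots(f, perm):
    """Precompose f with the permutation perm of its argument slots."""
    return lambda *args: f(*(args[i] for i in perm))

def substitute(f, gs):
    """Plug operations gs = [g_1, ..., g_k] of arities n_1, ..., n_k into the
    k slots of f, producing an operation of arity n_1 + ... + n_k."""
    arities = [g.__code__.co_argcount for g in gs]
    splits = list(accumulate(arities))
    def composed(*args):
        chunks, start = [], 0
        for end in splits:
            chunks.append(args[start:end])
            start = end
        return f(*(g(*chunk) for g, chunk in zip(gs, chunks)))
    return composed

# example on A = integers: f(a, b) = a - b, g1 = binary sum, g2 = identity
f  = lambda a, b: a - b
g1 = lambda a, b: a + b
g2 = lambda a: a
h  = substitute(f, [g1, g2])              # ternary operation (a + b) - c
print(h(5, 2, 4))                         # -> 3
print(permute_slots(f, (1, 0))(5, 2))     # slots swapped: 2 - 5 = -3
```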

Cosmos structure on $[\mathbf{P}, \mathcal{V}]$

More precisely, consider the category $\mathbf{P}$ with objects the natural numbers, and morphisms $\mathbf{P}(m,n)$ given by $\mathbf{P}(n,n) = \Sigma_n$ (the symmetric group on $n$ letters) and $\mathbf{P}(m,n) = \emptyset$ for $m \neq n$.

Let $\mathcal{V}$ be a cosmos, that is, a complete and cocomplete symmetric monoidal closed category with identity $I$ and internal hom $[-,-]$.

Fix $A \in \mathcal{V}$. The assignment $n \mapsto [A^{\otimes n}, A]$ defines a functor $\mathbf{P} \rightarrow \mathcal{V}$ (where functoriality in $\mathbf{P}$ comes from the symmetry of the tensor product in $\mathcal{V}$). This turns out to be a typical example of a $\mathcal{V}$-operad, which we call the “endomorphism operad” on $A$. In order to actually define what an operad is, we need to lay some groundwork.

(A point of notation: we will henceforth denote $A^{\otimes n}$ by $A^n$.)

We’ll need the fact that the functor $\mathcal{V}(I, -): \mathcal{V} \rightarrow \textbf{Sets}$ has a left adjoint $F$ given by $F X = \coprod_X I$. $F$ takes the product to the tensor product (since it’s a left adjoint and the tensor product in $\mathcal{V}$ distributes over coproducts), and in fact we can assume that it does so strictly. Henceforth for $X \in \textbf{Sets}$ and $A \in \mathcal{V}$ we write $X \otimes A$ to actually mean $F X \otimes A$.

We then get a cosmos structure on $\mathcal{F} = [\mathbf{P}, \mathcal{V}]$, given by Day convolution: for $T, S \in \mathcal{F}$ we have $$T \otimes S = \int^{m,n} \mathbf{P}(m+n, -) \otimes Tm \otimes Sn.$$ Since we are thinking of a given $T \in \mathcal{F}$ as a collection of operations (indexed by arity) on which we can act by permuting the argument slots, we can think of $(T \otimes S)k$ as a collection of the $k$-ary operations that we obtain by freely permuting $m$ argument slots of type $T$ and $n$ argument slots of type $S$ (where $m, n$ range over all pairs such that $m+n = k$), modulo respecting the previously given actions of $\Sigma_m$ (resp. $\Sigma_n$) on $Tm$ (resp. $Sn$).

The identity is then given by $\mathbf{P}(0,-) \otimes I$.

Associativity and symmetry of the cosmos structure. Now let $T, S, R \in \mathcal{F}$. If we unpack the definition, draw out some diagrams, and apply some abstract nonsense, we find that $$T \otimes (S \otimes R) \simeq (T \otimes S) \otimes R \simeq \int^{m+n+k} \mathbf{P}(m+n+k, -) \otimes Tm \otimes Sn \otimes Rk,$$ which we can again assume are actually equalities.

Before we address the symmetry of this monoidal structure, we make a technical point. $\mathbf{P}$ itself has a symmetric monoidal structure, given by addition. Thus for $n_1, \dots, n_m \in \mathbf{P}$ we have $n_1 + \cdots + n_m \in \mathbf{P}$. There is evidently an action of $\Sigma_m$ on this term, which we require to be in the “wrong” direction, so that $\xi \in \Sigma_m$ induces $\langle \xi \rangle: n_{\xi 1} + \cdots + n_{\xi m} \rightarrow n_1 + \cdots + n_m$ rather than the other way around.

(However, for the symmetry of the monoidal structure on $\mathcal{V}$, given a product $A_1 \otimes \cdots \otimes A_m$ we require that the action of $\Sigma_m$ on this term is in the “correct” direction, i.e. $\xi \in \Sigma_m$ induces $\langle \xi \rangle: A_1 \otimes \cdots \otimes A_m \rightarrow A_{\xi 1} \otimes \cdots \otimes A_{\xi m}$.)

We thus have:

$$\begin{matrix} T_1 \otimes \cdots \otimes T_m &=& \int^{n_1, \dots, n_m} \mathbf{P}(n_1 + \cdots + n_m, -) \otimes T_1 n_1 \otimes \cdots \otimes T_m n_m\\ &&\\ {\langle \xi \rangle} \Big\downarrow && \Big\downarrow {\mathbf{P}(\langle \xi \rangle, -) \otimes \langle \xi \rangle}\\ &&\\ T_{\xi 1} \otimes \cdots \otimes T_{\xi m} &=& \int^{n_1, \dots, n_m} \mathbf{P}(n_{\xi 1} + \cdots + n_{\xi m}, -) \otimes T_{\xi 1} n_{\xi 1} \otimes \cdots \otimes T_{\xi m} n_{\xi m} \end{matrix}$$

Now $\langle \xi \rangle: n_{\xi 1} + \cdots + n_{\xi m} \rightarrow n_1 + \cdots + n_m$ extends to an action $\langle \xi \rangle: T_1 \otimes \cdots \otimes T_m \rightarrow T_{\xi 1} \otimes \cdots \otimes T_{\xi m}$ as we saw previously. Therefore we now have a functor $\mathbf{P}^{\text{op}} \times \mathcal{F} \rightarrow \mathcal{F}$ given by $(m, T) \mapsto T^m$, a fact which we will later use.

$\mathcal{F}$ as a $\mathcal{V}$-category. There is a way in which we can regard $\mathcal{V}$ as a full coreflective subcategory of $\mathcal{F}$: consider the functor $\phi: \mathcal{F} \rightarrow \mathcal{V}$ given by $\phi T = T0$. This has a right adjoint $\psi: \mathcal{V} \rightarrow \mathcal{F}$ given by $\psi A = \mathbf{P}(0, -) \otimes A$.

The inclusion $\psi$ preserves all of the relevant monoidal structure, so we are justified in considering $A \in \mathcal{V}$ as either an object of $\mathcal{V}$ or of $\mathcal{F}$ (via the inclusion $\psi$). With this notation we can write, for $A \in \mathcal{V}$ and $T, S \in \mathcal{F}$: $$\mathcal{F}(A \otimes T, S) \simeq \mathcal{V}(A, [T,S])$$ If $T, S \in \mathcal{F}$ then their $\mathcal{F}$-valued hom is given by $[[T,S]]$, where for $k \in \mathbf{P}$ we have $$[[T,S]]k = \int_n [Tn, S(n+k)]$$ and their $\mathcal{V}$-valued hom, which makes $\mathcal{F}$ into a $\mathcal{V}$-category, is given by $$[T,S] = \phi [[T,S]] = \int_n [Tn, Sn]$$

The substitution product

Let us return to our motivating example of the endomorphism operad (which we denote by $\{A,A\}$) on $A$, for a fixed $A \in \mathcal{V}$. For now it’s just an object $\{A, A\} \in \mathcal{F}$; but it contains more structure than we’re currently using. Namely, for each $m, n_1, \dots, n_m \in \mathbf{P}$ we can give a morphism $$[A^m, A] \otimes \left( [A^{n_1}, A] \otimes \cdots \otimes [A^{n_m}, A] \right) \rightarrow [A^{n_1 + \cdots + n_m}, A]$$ coming from evaluation (see the section below about the little $n$-disks operad for details). We would like a general framework for expressing such a notion of composing operations.

Definition of an operad. Recall from the previous section that, for given $T \in \mathcal{F}$, we can consider $n \mapsto T^n$ as a functor $\mathbf{P}^{\text{op}} \rightarrow \mathcal{F}$. We can thus define a (non-symmetric!) product $T \circ S = \int^n Tn \otimes S^n$. It is easy to check that if $S \in \mathcal{V}$ then in fact $T \circ S \in \mathcal{V}$, so that $\circ$ can be considered as a functor either of type $\mathcal{F} \times \mathcal{F} \rightarrow \mathcal{F}$ or of type $\mathcal{F} \times \mathcal{V} \rightarrow \mathcal{V}$.

The clarity with which Kelly’s paper demonstrates the various important properties of this substitution product would be difficult for me to improve upon, so I simply list here the punchlines, and refer the reader to the original paper for their proofs:

  • For $T, S \in \mathcal{F}$ and $n \in \mathbf{P}$, we have $(T \circ S)^n \simeq T^n \circ S$, which is natural in $T, S, n$. Using this and a Fubini style argument we get associativity of $\circ$.

  • $J = \mathbf{P}(1, -) \otimes I$ is the identity for $\circ$.

  • For $S \in \mathcal{F}$, $- \circ S: \mathcal{F} \rightarrow \mathcal{F}$ has the right adjoint $\{S, -\}$ given by $\{S, R\}m = [S^m, R]$. Moreover if $A \in \mathcal{V}$ then we in fact have $\mathcal{V}(T \circ A, B) \simeq \mathcal{F}(T, \{A, B\})$.

We can now define an operad as a monoid for $\circ$, i.e. some $T \in \mathcal{F}$ equipped with $\mu: T \circ T \rightarrow T$ and $\eta: J \rightarrow T$ satisfying the monoid axioms. Operad morphisms are morphisms $T \rightarrow T^\prime$ that respect $\mu$ and $\eta$.

$\{A, A\}$ as an operad. Once again we turn back to the example of $\{A, A\} \in \mathcal{F}$. Note that our choice to denote the endomorphism operad $(n \mapsto [A^n, A])$ by $\{A, A\}$ agrees with the construction of $\{A, -\}$ as the right adjoint to $- \circ A$.

There is an evident evaluation map $\{A, A\} \circ A \xrightarrow{e} A$, so that we have the composition $$\{A, A\} \circ \{A, A\} \circ A \xrightarrow{1 \circ e} \{A,A\} \circ A \xrightarrow{e} A$$ which by adjunction gives us $\mu: \{A,A\} \circ \{A,A\} \rightarrow \{A,A\}$, which we take as our monoid multiplication. Similarly $J \circ A \simeq A$ corresponds by adjunction to $\eta: J \rightarrow \{A, A\}$. We thus have that $\{A,A\}$ is an operad. In fact it is the “universal” operad, in the following sense:

Every operad $T \in \mathcal{F}$ gives a monad $T \circ -$ on $\mathcal{F}$, or on $\mathcal{V}$ via restriction. Given $A \in \mathcal{F}$, algebra structures $h^{\prime}: T \circ A \rightarrow A$ for the monad $T \circ -$ on $A$ correspond precisely to operad morphisms $h: T \rightarrow \{A,A\}$. In this case we say that $h$ gives an algebra structure on $A$ for the operad $T$.

The little nn-disks operad

There are some other aspects of operads that the paper looks at, but for this post I will abuse artistic license to talk about something else that isn’t exactly in the paper (although it is indirectly referenced): May’s little $n$-disks operad. For a great introduction to the following material I recommend Emily Riehl’s notes on Kathryn Hess’s two-part (I, II) talk on operads in algebraic topology.

Let $\mathcal{V} = (\mathbf{Top}_{\text{nice}}, \times, \{*\})$ where $\mathbf{Top}_{\text{nice}}$ is one’s favorite cartesian closed category of topological spaces, with $\times$ the appropriate product in this category.

Fix some $n \in \mathbb{N}$. For $k \in \mathbf{P}$, we let $d_n(k) = \text{sEmb}(\coprod_{k} D^n, D^n)$, the space of standard embeddings of $k$ copies of the closed unit $n$-disk in $\mathbb{R}^n$ into the closed unit $n$-disk in $\mathbb{R}^n$. By the space of standard embeddings we mean the subspace of the mapping space consisting of the maps which restrict on each summand to affine maps $x \mapsto \lambda x + c$ with $0 \leq \lambda \leq 1$.

Given $\xi \in \mathbf{P}(k, k)$ we have the evident action $\langle \xi \rangle: \text{sEmb}(\coprod_{k} D^n, D^n) \rightarrow \text{sEmb}(\coprod_{\xi k} D^n, D^n)$, which gives us a functor $d_n: \mathbf{P} \rightarrow \mathbf{Top}_{\text{nice}}$, so $d_n \in \mathcal{F}$.

Fix some $k, l \in \mathbf{P}$; then $$d_n^k(l) = \int^{m_1, \dots, m_k} \mathbf{P}(m_1 + \cdots + m_k, l) \otimes d_n(m_1) \otimes \cdots \otimes d_n(m_k),$$ which we can roughly think of as all the different ways we can partition a total of $l$ disks into $k$ blocks, with the $i^{\text{th}}$ block having $m_i$ disks, and then map each block of $m_i$ disks into a single disk, all the while being able to permute the $l$ disks amongst themselves (without necessarily having to respect the partitions).

We then get $\mu: d_n \circ d_n \rightarrow d_n$ by composing the disk embeddings. More precisely, for each $l$ we get a morphism $$\mu_l: (d_n(k) \otimes d_n^k)l \simeq d_n(k) \otimes (d_n^k(l)) \rightarrow d_n(l)$$ from the following considerations:

First we note that $$\begin{aligned} d_n(k) \otimes d_n(m_1) \otimes \cdots \otimes d_n(m_k) &= \text{sEmb}(\coprod_k D^n, D^n) \times \Big(\prod_{1 \leq i \leq k} \text{sEmb}(\coprod_{m_i} D^n, D^n)\Big)\\ &\simeq \text{sEmb}(D^n, D^n)^k \times \Big(\prod_{1 \leq i \leq k} \text{sEmb}(\coprod_{m_i} D^n, D^n)\Big)\\ &\simeq \prod_{1 \leq i \leq k} \Big(\text{sEmb}(\coprod_{m_i} D^n, D^n) \times \text{sEmb}(D^n, D^n)\Big). \end{aligned}$$ Now for each $i$ there is a map $\text{sEmb}(\coprod_{m_i} D^n, D^n) \times \text{sEmb}(D^n, D^n) \rightarrow \text{sEmb}(\coprod_{m_i} D^n, D^n)$ induced from iterated evaluation by adjunction. Then by the above, this gives a morphism $$\begin{aligned} d_n(k) \otimes d_n(m_1) \otimes \cdots \otimes d_n(m_k) &\rightarrow \prod_{1 \leq i \leq k} \text{sEmb}(\coprod_{m_i} D^n, D^n)\\ &\simeq \text{sEmb}(\coprod_{m_1 + \cdots + m_k} D^n, D^n)\\ &= d_n(m_1 + \cdots + m_k). \end{aligned}$$
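As a concrete (and entirely optional) illustration of “composing the disk embeddings”, here is a tiny Python sketch for $n = 1$ (mine, not from the paper): a standard embedding of a disjoint union of intervals into $D^1 = [-1, 1]$ is just a list of affine maps $x \mapsto \lambda x + c$ with disjoint images, and the operadic composition is composition of affine maps.

```python
# Sketch for n = 1 (illustrative): little intervals in D^1 = [-1, 1] as affine
# maps x -> lam * x + c, with operadic composition given by composing the maps.
def compose_affine(outer, inner):
    """(lam1, c1) composed with (lam2, c2): x -> lam1 * (lam2 * x + c2) + c1."""
    lam1, c1 = outer
    lam2, c2 = inner
    return (lam1 * lam2, lam1 * c2 + c1)

def operad_compose(f, gs):
    """f: k little intervals; gs[i]: a configuration of m_i little intervals
    to be mapped into the i-th interval of f. Returns m_1 + ... + m_k intervals."""
    return [compose_affine(f[i], g) for i, block in enumerate(gs) for g in block]

# f embeds 2 intervals into [-1, 1]; g1 embeds 2 intervals, g2 embeds 1.
f  = [(0.4, -0.5), (0.4, 0.5)]
g1 = [(0.4, -0.5), (0.4, 0.5)]
g2 = [(1.0, 0.0)]
print(operad_compose(f, [g1, g2]))   # three disjoint little intervals in [-1, 1]
```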

A big reason that the little $n$-disks operad is relevant to algebraic topology is that there is a big theorem stating that a space is weakly equivalent to an $n$-fold loop space if and only if it’s an algebra for $d_n$.

One direction is straightforward: consider a space $A$ and its $n$-fold loop space $\Omega^n A$. Given an element of $d_n(k)$ and $k$ choices of “little maps” $(D^n, \partial D^n) \rightarrow (A, \ast)$, we can stitch together these little maps into one large map $(D^n, \partial D^n) \rightarrow (A, \ast)$ according to the instructions specified by the chosen element of $d_n(k)$ (where we map everything in the complement of the $k$ little disks to the basepoint in $A$). Doing this for each $k$, we get an operad morphism $d_n \rightarrow \{\Omega^n A, \Omega^n A\}$.

The other direction is much harder, and Maru gave an absolutely fantastic sketch of the basic story in our group discussions, which I hope she will post in the comments; I refrain from including it in the body of this post, partially for reasons of length and partially because I would just end up repeating verbatim what she said in the discussion.

March 20, 2017

David Hogg: a prior on the CMD isn't a prior on distance, exactly

Today my research time was spent writing in the paper by Lauren Anderson (Flatiron) about the TGAS color–magnitude diagram. I think of it as being a probabilistic inference in which we put a prior on stellar distances and then infer the distance. But that isn't correct! It is an inference in which we put a prior on the color–magnitude diagram, and then, given noisy color and (apparent) magnitude information, this turns into an (effective, implicit) prior on distance. This Duh! moment led to some changes to the method section!

David Hogg: what's in an astronomical catalog?

The stars group meeting today wandered into dangerous territory, because it got me on my soap box! The points of discussion were: Are there biases in the Gaia TGAS parallaxes? and How could we use proper motions responsibly to constrain stellar parallaxes? Keith Hawkins (Columbia) is working a bit on the former, and I am thinking of writing something short with Boris Leistedt (NYU) on the latter.

The reason it got me on my soap-box is a huge set of issues about whether catalogs should deliver likelihood or posterior information. My view—and (I think) the view of the Gaia DPAC—is that the TGAS measurements and uncertainties are parameters of a parameterized model of the likelihood function. They are not parameters of a posterior, nor the output of any Bayesian inference. If they were outputs of a Bayesian inference, they could not be used in hierarchical models or other kinds of subsequent inferences without a factoring out of the Gaia-team prior.

This view (and this issue) has implications for what we are doing with our (Leistedt, Hawkins, Anderson) models of the color–magnitude diagram. If we output posterior information, we have to also output prior information for our stuff to be used by normals, down-stream. Even with such output, the results are hard to use correctly. We have various papers, but they are hard to read!

One comment is that, if the Gaia TGAS contains likelihood information, then the right way to consider its possible biases or systematic errors is to build a better model of the likelihood function, given their outputs. That is, the systematics should be created to be adjustments to the likelihood function, not posterior outputs, if at all possible.

Another comment is that negative parallaxes make sense for a likelihood function, but not (really) for a posterior pdf. Usually a sensible prior will rule out negative parallaxes! But a sensible likelihood function will permit them. The fact that the Gaia catalogs will have negative parallaxes is related to the fact that it is better to give likelihood information. This all has huge implications for people (like me, like Portillo at Harvard, like Lang at Toronto) who are thinking about making probabilistic catalogs. It's a big, subtle, and complex deal.
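A small numerical sketch of that last point (mine, with made-up numbers): a Gaussian likelihood for the parallax can peak at a negative value, while a sensible distance prior confines the posterior to positive parallax, so reporting only posterior summaries would discard information that down-stream hierarchical inferences need.

```python
# Sketch (made-up numbers): a negative measured parallax is fine for a Gaussian
# likelihood, while a distance prior pushes the posterior to positive parallax.
import numpy as np

parallax_obs, sigma = -0.3, 0.5              # mas; measurement and known noise
L_max = 2.0                                   # kpc; toy maximum distance
varpi = np.linspace(1.0 / L_max, 5.0, 4000)   # true-parallax grid, positive only
dv = varpi[1] - varpi[0]

loglike = -0.5 * ((parallax_obs - varpi) / sigma) ** 2   # Gaussian likelihood
logprior = -2.0 * np.log(varpi)               # toy uniform-in-distance prior
logpost = loglike + logprior
post = np.exp(logpost - logpost.max())
post /= post.sum() * dv

print("likelihood peaks at parallax =", parallax_obs, "mas (negative is allowed)")
print("posterior mean parallax      =", (varpi * post).sum() * dv, "mas")
```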

March 19, 2017

John Preskill: Local operations and Chinese communications

The workshop spotlighted entanglement. It began in Shanghai, paused as participants hopped the Taiwan Strait, and resumed in Taipei. We discussed quantum operations and chaos, thermodynamics and field theory.1 I planned to return from Taipei to Shanghai to Los Angeles.

Quantum thermodynamicist Nelly Ng and I drove to the Taipei airport early. News from Air China curtailed our self-congratulations: China’s military was running an operation near Shanghai. Commercial planes couldn’t land. I’d miss my flight to LA.


Two quantum thermodynamicists in Shanghai

An operation?

Quantum information theorists use a mindset called operationalism. We envision experimentalists in separate labs. Call the experimentalists Alice, Bob, and Eve (ABE). We tell stories about ABE to formulate and analyze problems. Which quantum states do ABE prepare? How do ABE evolve, or manipulate, the states? Which measurements do ABE perform? Do they communicate about the measurements’ outcomes?

Operationalism concretizes ideas. The outlook checks us from drifting into philosophy and into abstractions difficult to apply physics tools to.2 Operationalism infuses our language, our framing of problems, and our mathematical proofs.

Experimentalists can perform some operations more easily than others. Suppose that Alice controls the magnets, lasers, and photodetectors in her lab; Bob controls the equipment in his; and Eve controls the equipment in hers. Each experimentalist can perform local operations (LO). Suppose that Alice, Bob, and Eve can talk on the phone and send emails. They exchange classical communications (CC).

You can’t generate entanglement using LOCC. Entanglement consists of strong correlations that quantum systems can share and that classical systems can’t. A quantum system in Alice’s lab can hold more information about a quantum system of Bob’s than any classical system could. We must create and control entanglement to operate quantum computers. Creating and controlling entanglement poses challenges. Hence quantum information scientists often model easy-to-perform operations with LOCC.

Suppose that some experimentalist Charlie loans entangled quantum systems to Alice, Bob, and Eve. How efficiently can ABE compute some quantity, exchange quantum messages, or perform other information-processing tasks, using that entanglement? Such questions underlie quantum information theory.


Taipei’s night market. Or Caltech’s neighborhood?

Local operations.

Nelly and I performed those, trying to finagle me to LA. I inquired at Air China’s check-in desk in English. Nelly inquired in Mandarin. An employee smiled sadly at each of us.

We branched out into classical communications. I called Expedia (“No, I do not want to fly to Manila”), United Airlines (“No flights for two days?”), my credit-card company, Air China’s American reservations office, Air China’s Chinese reservations office, and Air China’s Taipei reservations office. I called AT&T to ascertain why I couldn’t reach Air China (“Yes, please connect me to the airline. Could you tell me the number first? I’ll need to dial it after you connect me and the call is then dropped”).

As I called, Nelly emailed. She alerted Bob, aka Janet (Ling-Yan) Hung, who hosted half the workshop at Fudan University in Shanghai. Nelly emailed Eve, aka Feng-Li Lin, who hosted half the workshop at National Taiwan University in Taipei. Janet twiddled the magnets in her lab (investigated travel funding), and Feng-Li cooled a refrigerator in his.

ABE can process information only so efficiently, using LOCC. The time crept from 1:00 PM to 3:30.


Nelly Ng uses classical communications.

What could we have accomplished with quantum communication? Using LOCC, Alice can manipulate quantum states (like an electron’s orientation) in her lab. She can send nonquantum messages (like “My flight is delayed”) to Bob. She can’t send quantum information (like an electron’s orientation).

Alice and Bob can ape quantum communication, given entanglement. Suppose that Charlie strongly correlates two electrons. Suppose that Charlie gives Alice one electron and gives Bob the other. Alice can send one qubit–one unit of quantum information–to Bob. We call that procedure quantum teleportation.
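For the quantum-information-curious, here is a bare-bones numerical sketch of the standard teleportation protocol (mine, not from the workshop): one shared Bell pair plus two classical bits moves an arbitrary qubit state from Alice to Bob.

```python
# Minimal sketch of quantum teleportation with NumPy (illustrative): Alice
# teleports an unknown qubit to Bob using one Bell pair and 2 classical bits.
import numpy as np

rng = np.random.default_rng(1)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def kron_all(*ops):
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

# qubit 0: Alice's unknown state; qubits 1 (Alice) and 2 (Bob) share a Bell pair
psi_in = rng.standard_normal(2) + 1j * rng.standard_normal(2)
psi_in /= np.linalg.norm(psi_in)
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
state = np.kron(psi_in, bell)

# Alice's Bell measurement: CNOT(0 -> 1), Hadamard on 0, measure qubits 0 and 1
state = kron_all(CNOT, I2) @ state
state = kron_all(H, I2, I2) @ state
probs = (np.abs(state.reshape(2, 2, 2)) ** 2).sum(axis=2).ravel()
outcome = rng.choice(4, p=probs)
m0, m1 = outcome >> 1, outcome & 1

# collapse onto the measured branch; Bob applies X^m1 then Z^m0
bob = state.reshape(2, 2, 2)[m0, m1, :]
bob /= np.linalg.norm(bob)
bob = np.linalg.matrix_power(Z, m0) @ np.linalg.matrix_power(X, m1) @ bob

print("fidelity |<psi_in|bob>| =", abs(np.vdot(psi_in, bob)))   # -> 1.0
```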

Suppose that air-traffic control had loaned entanglement to Janet, Feng-Li, and me. Could we have finagled me to LA quickly?

Quantum teleportation differs from human teleportation.


xkcd.com/465

We didn’t need teleportation. Feng-Li arranged for me to visit Taiwan’s National Center for Theoretical Sciences (NCTS) for two days. Air China agreed to return me to Shanghai afterward. United would fly me to LA, thanks to help from Janet. Nelly rescued my luggage from leaving on the wrong flight.

Would I rather have teleported? I would have avoided a bushel of stress. But I wouldn’t have learned from Janet about Chinese science funding, wouldn’t have heard Feng-Li’s views about gravitational waves, wouldn’t have glimpsed Taiwanese countryside flitting past the train we rode to the NCTS.

According to some metrics, classical resources outperform quantum.


At Taiwan’s National Center for Theoretical Sciences

The workshop organizers have generously released videos of the lectures. My lecture about quantum chaos and fluctuation relations appears here and here. More talks appear here.

With gratitude to Janet Hung, Feng-Li Lin, and Nelly Ng; to Fudan University, National Taiwan University, and Taiwan’s National Center for Theoretical Sciences for their hospitality; and to Xiao Yu for administrative support.

Glossary and other clarifications:

1Field theory describes subatomic particles and light.

2Physics and philosophy enrich each other. But I haven’t trained in philosophy. I benefit from differentiating physics problems that I’m equipped to solve from philosophy problems that I’m not.


Scott Aaronson: I will not log in to your website

Two or three times a day, I get an email whose basic structure is as follows:

Prof. Aaronson, given your expertise, we’d be incredibly grateful for your feedback on a paper / report / grant proposal about quantum computing.  To access the document in question, all you’ll need to do is create an account on our proprietary DigiScholar Portal system, a process that takes no more than 3 hours.  If, at the end of that process, you’re told that the account setup failed, it might be because your browser’s certificates are outdated, or because you already have an account with us, or simply because our server is acting up, or some other reason.  If you already have an account, you’ll of course need to remember your DigiScholar Portal ID and password, and not confuse them with the 500 other usernames and passwords you’ve created for similar reasons—ours required their own distinctive combination of upper and lowercase letters, numerals, and symbols.  After navigating through our site to access the document, you’ll then be able to enter your DigiScholar Review, strictly adhering to our 15-part format, and keeping in mind that our system will log you out and delete all your work after 30 seconds of inactivity.  If you have trouble, just call our helpline during normal business hours (excluding Wednesdays and Thursdays) and stay on the line until someone assists you.  Most importantly, please understand that we can neither email you the document we want you to read, nor accept any comments about it by email.  In fact, all emails to this address will be automatically ignored.

Every day, I seem to grow crustier than the last.

More than a decade ago, I resolved that I would no longer submit to or review for most for-profit journals, as a protest against the exorbitant fees that those journals charge academics in order to buy back access to our own work—work that we turn over to the publishers (copyright and all) and even review for them completely for free, with the publishers typically adding zero or even negative value.  I’m happy that I’ve been able to keep that pledge.

Today, I’m proud to announce a new boycott, less politically important but equally consequential for my quality of life, and to recommend it to all of my friends.  Namely: as long as the world gives me any choice in the matter, I will never again struggle to log in to any organization’s website.  I’ll continue to devote a huge fraction of my waking hours to fielding questions from all sorts of people on the Internet, and I’ll do it cheerfully and free of charge.  All I ask is that, if you have a question, or a document you want me to read, you email it!  Or leave a blog comment, or stop by in person, or whatever—but in any case, don’t make me log in to anything other than Gmail or Facebook or WordPress or a few other sites that remain navigable by a senile 35-year-old who’s increasingly fixed in his ways.  Even Google Docs and Dropbox are pushing it: I’ll give up (on principle) at the first sight of any login issue, and ask for just a regular URL or an attachment.

Oh, Skype no longer lets me log in either.  Could I get to the bottom of that?  Probably.  But life is too short, and too precious.  So if we must, we’ll use the phone, or Google Hangouts.

In related news, I will no longer patronize any haircut place that turns away walk-in customers.

Back when we were discussing the boycott of Elsevier and the other predatory publishers, I wrote that this was a rare case “when laziness and idealism coincide.”  But the truth is more general: whenever my deepest beliefs and my desire to get out of work both point in the same direction, from here till the grave there’s not a force in the world that can turn me the opposite way.

Jacques Distler: Responsibility

Many years ago, when I was an assistant professor at Princeton, there was a cocktail party at Curt Callan’s house to mark the beginning of the semester. There, I found myself in the kitchen, chatting with Sacha Polyakov. I asked him what he was going to be teaching that semester, and he replied that he was very nervous because — for the first time in his life — he would be teaching an undergraduate course. After my initial surprise that he had gotten this far in life without ever having taught an undergraduate course, I asked which course it was. He said it was the advanced undergraduate Mechanics course (chaos, etc.) and we agreed that would be a fun subject to teach. We chatted some more, and then he said that, on reflection, he probably shouldn’t be quite so worried. After all, it wasn’t as if he was going to teach Quantum Field Theory, “That’s a subject I’d feel responsible for.”

This remark stuck with me, but it never seemed quite so poignant until this semester, when I find myself teaching the undergraduate particle physics course.

The textbooks (and I mean all of them) start off by “explaining” that relativistic quantum mechanics (e.g. replacing the Schrödinger equation with Klein-Gordon) makes no sense (negative probabilities and all that …). And they then proceed to use it anyway (supplemented by some Feynman rules pulled out of thin air).

This drives me up the #@%^ing wall. It is precisely wrong.

There is a perfectly consistent quantum mechanical theory of free particles. The problem arises when you want to introduce interactions. In Special Relativity, there is no interaction-at-a-distance; all forces are necessarily mediated by fields. Those fields fluctuate and, when you want to study the quantum theory, you end up having to quantize them.

But the free particle is just fine. Of course it has to be: free field theory is just the theory of an (indefinite number of) free particles. So it better be true that the quantum theory of a single relativistic free particle makes sense.

So what is that theory?

  1. It has a Hilbert space, $\mathcal{H}$, of states. To make the action of Lorentz transformations as simple as possible, it behoves us to use a Lorentz-invariant inner product on that Hilbert space. This is most easily done in the momentum representation $$\langle\chi|\phi\rangle = \int \frac{d^3\vec{k}}{{(2\pi)}^3\, 2\sqrt{\vec{k}^2+m^2}}\, \chi(\vec{k})^* \phi(\vec{k})$$
  2. As usual, the time-evolution is given by a Schrödinger equation
(1) $$i\partial_t |\psi\rangle = H_0 |\psi\rangle$$

where $H_0 = \sqrt{\vec{p}^2+m^2}$. Now, you might object that it is hard to make sense of a pseudo-differential operator like $H_0$. Perhaps. But it’s not any harder than making sense of $U(t) = e^{-i \vec{p}^2 t/2m}$, which we routinely pretend to do in elementary quantum. In both cases, we use the fact that, in the momentum representation, the operator $\vec{p}$ is represented as multiplication by $\vec{k}$.
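(A quick one-dimensional numerical illustration of that point, mine and with made-up parameters, taking $\hbar = c = 1$: evolving a wavepacket under $H_0 = \sqrt{p^2 + m^2}$ is no harder than evolving under $p^2/2m$, because both are multiplication operators in the momentum representation.)

```python
# Sketch (1-d analogue, illustrative parameters, hbar = c = 1): evolve a
# wavepacket under H0 = sqrt(p^2 + m^2) by multiplying a phase in momentum
# space, exactly as one does for the nonrelativistic exp(-i p^2 t / 2m).
import numpy as np

m, L, N, t = 1.0, 200.0, 4096, 30.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]
k = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)

k0, width = 2.0, 2.0                            # mean momentum and packet width
psi = np.exp(-x**2 / (2.0 * width**2) + 1j * k0 * x)
psi /= np.sqrt((np.abs(psi) ** 2).sum() * dx)

phase = np.exp(-1j * np.sqrt(k**2 + m**2) * t)  # exp(-i H0 t) in momentum space
psi_t = np.fft.ifft(phase * np.fft.fft(psi))

v_group = k0 / np.sqrt(k0**2 + m**2)            # relativistic group velocity < 1
print("expected displacement:", v_group * t)
print("measured displacement:", (x * np.abs(psi_t) ** 2).sum() * dx)
# the two agree up to small corrections from the packet's momentum spread
```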

I could go on, but let me leave the rest of the development of the theory as a series of questions.

  1. The self-adjoint operator, $\vec{x}$, satisfies $$[x^i, p_j] = i \delta^{i}_j$$ Thus it can be written in the form $$x^i = i\left(\frac{\partial}{\partial k_i} + f_i(\vec{k})\right)$$ for some real function $f_i$. What is $f_i(\vec{k})$?
  2. Define $J^0(\vec{r})$ to be the probability density. That is, when the particle is in state $|\phi\rangle$, the probability for finding it in some Borel subset $S \subset \mathbb{R}^3$ is given by $$\text{Prob}(S) = \int_S d^3\vec{r}\, J^0(\vec{r})$$ Obviously, $J^0(\vec{r})$ must take the form $$J^0(\vec{r}) = \int\frac{d^3\vec{k}\, d^3\vec{k}'}{{(2\pi)}^6\, 4\sqrt{\vec{k}^2+m^2}\sqrt{{\vec{k}'}^2+m^2}}\, g(\vec{k},\vec{k}')\, e^{i(\vec{k}-\vec{k'})\cdot\vec{r}}\,\phi(\vec{k})\,\phi(\vec{k}')^*$$ Find $g(\vec{k},\vec{k}')$. (Hint: you need to diagonalize the operator $\vec{x}$ that you found in problem 1.)
  3. The conservation of probability says $$0 = \partial_t J^0 + \partial_i J^i$$ Use the Schrödinger equation (1) to find $J^i(\vec{r})$.
  4. Under Lorentz transformations, $H_0$ and $\vec{p}$ transform as the components of a 4-vector. For a boost in the $z$-direction, of rapidity $\lambda$, we should have $$\begin{split} U_\lambda \sqrt{\vec{p}^2+m^2}\, U_\lambda^{-1} &= \cosh(\lambda) \sqrt{\vec{p}^2+m^2} + \sinh(\lambda)\, p_3\\ U_\lambda p_1 U_\lambda^{-1} &= p_1\\ U_\lambda p_2 U_\lambda^{-1} &= p_2\\ U_\lambda p_3 U_\lambda^{-1} &= \sinh(\lambda) \sqrt{\vec{p}^2+m^2} + \cosh(\lambda)\, p_3 \end{split}$$ and we should be able to write $U_\lambda = e^{i\lambda B}$ for some self-adjoint operator, $B$. What is $B$? (N.B.: by contrast the $x^i$, introduced above, do not transform in a simple way under Lorentz transformations.)

The Hilbert space of a free scalar field is now $\bigoplus_{n=0}^\infty \text{Sym}^n \mathcal{H}$. That’s perhaps not the easiest way to get there. But it is a way …

Update:

Yike! Well, that went south pretty fast. For the first time (ever, I think) I’m closing comments on this one, and calling it a day. To summarize, for those who still care,

  1. There is a decomposition of the Hilbert space of a Free Scalar field as $$\mathcal{H}_\phi = \bigoplus_{n=0}^\infty \mathcal{H}_n$$ where $\mathcal{H}_n = \text{Sym}^n \mathcal{H}$ and $\mathcal{H}$ is the 1-particle Hilbert space described above (also known as the spin-$0$, mass-$m$, irreducible unitary representation of Poincaré).
  2. The Hamiltonian of the Free Scalar field is the direct sum of the induced Hamiltonians on $\mathcal{H}_n$, induced from the Hamiltonian, $H = \sqrt{\vec{p}^2+m^2}$, on $\mathcal{H}$. In particular, it (along with the other Poincaré generators) is block-diagonal with respect to this decomposition.
  3. There are other interesting observables which are also block-diagonal with respect to this decomposition (i.e., don’t change the particle number), and hence we can discuss their restriction to $\mathcal{H}_n$.

Gotta keep reminding myself why I decided to foreswear blogging…

March 18, 2017

Backreaction: No, we probably don’t live in a computer simulation

According to Nick Bostrom of the Future of Humanity Institute, it is likely that we live in a computer simulation. And one of our biggest existential risks is that the superintelligence running our simulation shuts it down. The simulation hypothesis, as it’s called, enjoys a certain popularity among people who like to think of themselves as intellectual, believing it speaks for their mental

David Hoggsnow day

[Today was a NYC snow day, with schools and NYU closed, and Flatiron on a short day.] I made use of my incarceration at home writing in the nascent paper about the TGAS color–magnitude diagram with Lauren Anderson (Flatiron). And doing lots of other non-research things.

Tommaso DorigoNeutrinos: The Status, Circa 2017

Doug NatelsonAPS March Meeting 2017 Day 3 - updated w/ guest post!

Hello readers - I have travel plans such that I have to leave the APS meeting after lunch today.  That means I will miss the big Kavli Symposium session.  If someone out there would like to offer to write up a bit about those talks, please email me or comment below, and I'd be happy to give someone a guest post on this.

Update:  One of my readers was able to attend the first two talks of the Kavli Symposium, by Duncan Haldane and Michael Kosterlitz, two of this year's Nobel laureates.  Here are his comments.  If anyone has observations about the remaining talks in the symposium, please feel free to email me or post in the comments below.
I basically ran from the Buckley Prize talk by Alexei Kitaev down the big hall where Duncan Haldane was preparing to talk.  When I got there it was packed full but I managed to squeeze into a seat in the middle section.  I sighted my postdoc near the back of the first section; he later told me he’d arrived 35 minutes early to get that seat.

I felt Haldane’s talk was remarkably clear and simple given the rarified nature of the physics behind it.  He pointed out that condensed matter physics really changed starting in the 1980’s, and conceptually now is much different than the conventional picture  presented in books like Ashcroft and Mermin’s Solid State Physics that many of us learned from as students.  One prevailing idea leading up to that time was that changes in the ground state must always be related to changes in symmetry.  Haldane’s paper on antiferromagnetic Heisenberg spin chains showed that the ground state properties of the chains were drastically different depending on whether  the spin at each site is integer (S=1,2,3,…) or half-integer (S=1/2, 3/2, 5/2 …) , despite the fact that the Hamiltonian has the same spherical symmetry for any value of S.  This we now understand on the basis of the topological classifications of the systems.  Many of these topological classifications were later systematically worked out by Xiao-Gang Wen who shared this year’s Buckley prize with Alexei Kitaev. Haldane flashed a link to his original manuscript on spin chains which he has posted on arXiv.org as https://arxiv.org/abs/1612.00076 , and which he noted was “rejected by many journals”.  He was also amused or bemused or maybe both by the fact that people referred to his ideas as “Haldane’s conjecture” rather than recognizing that he’d solved the problem.  He noted that once one understands that the topological classification determines many of the important properties it is obvious that simplified “toy models” can give deep insight into the underlying physics of all systems in the same class.  In this regard he singled out the AKLT model, which revealed how finite chains of spin S=1 have effective S=1/2 degrees of freedom associated with each end.  These are entangled with each other no matter how long the finite chain – a remarkable demonstration of quantum entanglement over a long distance.  This also is a simple example of the special nature of surface states or excitations in topological systems. 

Kosterlitz began by pointing out that the Nobel prize was effectively awarded for work on two distinct aspects of topology in condensed matter, and both of these involved David Thouless which led to his being awarded one-half of the prize, with the other half shared by Kosterlitz and Haldane.  He then relayed a bit about his own story: he started as a high energy physicist, and apparently did not get offered the position he wanted at CERN so he ended up at Birmingham, which turned out to be remarkably fortuitous.  There he teamed with Thouless and gradually switched his interests to condensed matter physics.  They wanted to understand data suggesting that quasi-two-dimensional films of liquid helium seemed to show a phase transition despite the expectation that this should not be possible.  He then gave a very professorial exposition of the Kosterlitz-Thouless (K-T) transition, starting with the physics of vortices, and how their mutual interactions involve a potential that depends on the logarithm of the distance.  The results point to a non-zero temperature above which the free energy favors free vortices and below which vortex-anti vortex pairs are bound. He then pointed out how this is relevant to a wide variety of two dimensional systems, including xy magnets, and also the melting of two-dimensional crystals in which two K-T transitions occur corresponding respectively to the unbinding of dislocations and disclinations.  
I greatly enjoyed both of these talks, especially since I have experimentally researched both spin chains and two-dimensional melting at different times in my career. 

March 17, 2017

Tommaso DorigoFive New Charmed Baryons Discovered By LHCb!

While I was busy reporting the talks at the "Neutrino Telescopes" conference in Venice, LHCb released a startling new result, which I don't have much time to describe in detail this evening (it's Friday evening here in Italy and I'm going to call the week off), and yet wish to share with you as soon as possible.
The spectroscopy of low- and intermediate-mass hadrons (whatever this means) is a complex topic which either enthuses particle physicists or bores them to death. There are two reasons for this dichotomous behaviour.

read more

March 16, 2017

Scott AaronsonInsert D-Wave Post Here

In the two months since I last blogged, the US has continued its descent into madness.  Yet even while so many certainties have proven ephemeral as the morning dew—the US’s autonomy from Russia, the sanity of our nuclear chain of command, the outcome of our Civil War, the constraints on rulers that supposedly set us apart from the world’s dictator-run hellholes—I’ve learned that certain facts of life remain constant.

The moon still waxes and wanes.  Electrons remain bound to their nuclei.  P≠NP proofs still fill my inbox.  Squirrels still gather acorns.  And—of course!—people continue to claim big quantum speedups using D-Wave devices, and those claims still require careful scrutiny.

With that preamble, I hereby offer you eight quantum computing news items.


Cathy McGeoch Episode II: The Selby Comparison

On January 17, a group from D-Wave—including Cathy McGeoch, who now works directly for D-Wave—put out a preprint claiming a factor-of-2500 speedup for the D-Wave machine (the new, 2000-qubit one) compared to the best classical algorithms.  Notably, they wrote that the speedup persisted when they compared against simulated annealing, quantum Monte Carlo, and even the so-called Hamze-de Freitas-Selby (HFS) algorithm, which was often the classical victor in previous performance comparisons against the D-Wave machine.

Reading this, I was happy to see how far the discussion has advanced since 2013, when McGeoch and Cong Wang reported a factor-of-3600 speedup for the D-Wave machine, but then it turned out that they’d compared only against classical exact solvers rather than heuristics—a choice for which they were heavily criticized on this blog and elsewhere.  (And indeed, that particular speedup disappeared once the classical computer’s shackles were removed.)

So, when people asked me this January about the new speedup claim—the one even against the HFS algorithm—I replied that, even though we’ve by now been around this carousel several times, I felt like the ball was now firmly in the D-Wave skeptics’ court, to reproduce the observed performance classically.  And if, after a year or so, no one could, that would be a good time to start taking seriously that a D-Wave speedup might finally be here to stay—and to move on to the next question, of whether this speedup had anything to do with quantum computation, or only with the building of a piece of special-purpose optimization hardware.


A&M: Annealing and Matching

As it happened, it only took one month.  On March 2, Salvatore Mandrà, Helmut Katzgraber, and Creighton Thomas put up a response preprint, pointing out that the instances studied by the D-Wave group in their most recent comparison are actually reducible to the minimum-weight perfect matching problem—and for that reason, are solvable in polynomial time on a classical computer.   Much of Mandrà et al.’s paper just consists of graphs, wherein they plot the running times of the D-Wave machine and of a classical heuristic on the relevant instances—clearly all different flavors of exponential—and then Edmonds’ matching algorithm from the 1960s, which breaks away from the pack into polynomiality.
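To make the consequence of that reduction concrete, here is a toy sketch (my own, not the Mandrà et al. construction; the instance below is made up) of the classical primitive in question: minimum-weight perfect matching, solvable in polynomial time by Edmonds' blossom algorithm, here invoked through networkx by negating the edge weights.

```python
import networkx as nx

# Made-up pairwise costs on four nodes; a perfect matching pairs them all up.
costs = {
    (0, 1): 3.0, (0, 2): 1.0, (0, 3): 4.0,
    (1, 2): 2.0, (1, 3): 1.5, (2, 3): 3.5,
}

G = nx.Graph()
for (u, v), c in costs.items():
    G.add_edge(u, v, weight=-c)   # negate: max-weight matching <-> min-cost matching

# Edmonds' blossom algorithm; maxcardinality=True forces a perfect matching.
matching = nx.max_weight_matching(G, maxcardinality=True)
total_cost = sum(costs[tuple(sorted(e))] for e in matching)
print(matching, total_cost)      # here: {(0, 2), (1, 3)} (in some order) with cost 2.5
```

The point is only that once such a reduction exists, the classical runtime is polynomial no matter how the annealer's runtime scales.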

But let me bend over backwards to tell you the full story.  Last week, I had the privilege of visiting Texas A&M to give a talk.  While there, I got to meet Helmut Katzgraber, a condensed-matter physicist who’s one of the world experts on quantum annealing experiments, to talk to him about their new response paper.  Helmut was clear in his prediction that, with only small modifications to the instances considered, one could see similar performance by the D-Wave machine while avoiding the reduction to perfect matching.  With those future modifications, it’s possible that one really might see a D-Wave speedup that survived serious attempts by skeptics to make it go away.

But Helmut was equally clear in saying that, even in such a case, he sees no evidence at present that the speedup would be asymptotic or quantum-computational in nature.  In other words, he thinks the existing data is well explained by the observation that we’re comparing D-Wave against classical algorithms for Ising spin minimization problems on Chimera graphs, and D-Wave has heroically engineered an expensive piece of hardware specifically for Ising spin minimization problems on Chimera graphs and basically nothing else.  If so, then the prediction would be that such speedups as can be found are unlikely to extend either to more “practical” optimization problems—which need to be embedded into the Chimera graph with considerable losses—or to better scaling behavior on large instances.  (As usual, as long as the comparison is against the best classical algorithms, and as long as we grant the classical algorithm the same non-quantum advantages that the D-Wave machine enjoys, such as classical parallelism—as Rønnow et al advocated.)

Incidentally, my visit to Texas A&M was partly an “apology tour.”  When I announced on this blog that I was moving from MIT to UT Austin, I talked about the challenge and excitement of setting up a quantum computing research center in a place that currently had little quantum computing for hundreds of miles around.  This thoughtless remark inexcusably left out not only my friends at Louisiana State (like Jon Dowling and Mark Wilde), but even closer to home, Katzgraber and the others at Texas A&M.  I felt terrible about this for months.  So it gives me special satisfaction to have the opportunity to call out Katzgraber’s new work in this post.  In football, UT and A&M were longtime arch-rivals, but when it comes to the appropriate level of skepticism to apply to quantum supremacy claims, the Texas Republic seems remarkably unified.


When 15 MilliKelvin is Toasty

In other D-Wave-related scientific news, on Monday night Tameem Albash, Victor Martin-Mayor, and Itay Hen put out a preprint arguing that, in order for quantum annealing to have any real chance of yielding a speedup over classical optimization methods, the temperature of the annealer should decrease at least like 1/log(n), where n is the instance size, and more likely like 1/n^β (i.e., as an inverse power law).

If this is correct, then cold as the D-Wave machine is, at 0.015 degrees or whatever above absolute zero, it still wouldn’t be cold enough to see a scalable speedup, at least not without quantum fault-tolerance, something that D-Wave has so far eschewed.  With no error-correction, any constant temperature that’s above zero would cause dangerous level-crossings up to excited states when the instances get large enough.  Only a temperature that actually converged to zero as the problems got larger would suffice.
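As a caricature of that argument (my own toy illustration, not the Albash-Martin-Mayor-Hen calculation; the assumed gap scaling Delta(n) ~ 1/n is purely for illustration), suppose the relevant spectral gap closes polynomially with the instance size. Then the Boltzmann factor at any fixed temperature eventually populates the excited state appreciably, while a temperature that shrinks with n keeps the excitation probability decaying:

```python
import numpy as np

n = np.array([10.0, 100.0, 1000.0, 10000.0])   # instance sizes
Delta = 1.0 / n                                # toy gap closing like 1/n

for label, T in [("fixed T = 0.015", np.full_like(n, 0.015)),
                 ("shrinking T = Delta/log(n)", Delta / np.log(n))]:
    p_exc = 1.0 / (1.0 + np.exp(Delta / T))    # two-level thermal occupation of the excited state
    print(label, np.round(p_exc, 4))
# fixed T: the excitation probability creeps up toward 1/2 as the gap closes;
# shrinking T: it keeps falling off roughly like 1/n.
```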

Over the last few years, I’ve heard many experts make this exact same point in conversation, but this is the first time I’ve seen the argument spelled out in a paper, with explicit calculations (modulo assumptions) of the rate at which the temperature would need to go to zero for uncorrected quantum annealing to be a viable path to a speedup.  I lack the expertise to evaluate the calculations myself, but any experts who’d like to share their insight in the comments section are “warmly” (har har) invited.


“Their Current Numbers Are Still To Be Checked”

As some of you will have seen, The Economist now has a sprawling 10-page cover story about quantum computing and other quantum technologies.  I had some contact with the author while the story was in the works.

The piece covers a lot of ground and contains many true statements.  It could be much worse.

But I take issue with two things.

First, The Economist claims: “What is notable about the effort [to build scalable QCs] now is that the challenges are no longer scientific but have become matters of engineering.”  As John Preskill and others pointed out, this is pretty far from true, at least if we interpret the claim in the way most engineers and businesspeople would.

Yes, we know the rules of quantum mechanics, and the theory of quantum fault-tolerance, and a few promising applications; and the basic building blocks of QC have already been demonstrated in several platforms.  But if (let’s say) someone were to pony up $100 billion, asking only for a universal quantum computer as soon as possible, I think the rational thing to do would be to spend initially on a frenzy of basic research: should we bet on superconducting qubits, trapped ions, nonabelian anyons, photonics, a combination thereof, or something else?  (Even that is far from settled.)  Can we invent better error-correcting codes and magic state distillation schemes, in order to push the resource requirements for universal QC down by three or four orders of magnitude?  Which decoherence mechanisms will be relevant when we try to do this stuff at scale?  And of course, which new quantum algorithms can we discover, and which new cryptographic codes resistant to quantum attack?

The second statement I take issue with is this:

“For years experts questioned whether the [D-Wave] devices were actually exploiting quantum mechanics and whether they worked better than traditional computers.  Those questions have since been conclusively answered—yes, and sometimes”

I would instead say that the answers are:

  1. depends on what you mean by “exploit” (yes, there are quantum tunneling effects, but do they help you solve problems faster?), and
  2. no, the evidence remains weak to nonexistent that the D-Wave machine solves anything faster than a traditional computer—certainly if, by “traditional computer,” we mean a device that gets all the advantages of the D-Wave machine (e.g., classical parallelism, hardware heroically specialized to the one type of problem we’re testing on), but no quantum effects.

Shortly afterward, when discussing the race to achieve “quantum supremacy” (i.e., a clear quantum computing speedup for some task, not necessarily a useful one), the Economist piece hedges: “D-Wave has hinted it has already [achieved quantum supremacy], but has made similar claims in the past; their current numbers are still to be checked.”

To me, “their current numbers are still to be checked” deserves its place alongside “mistakes were made” among the great understatements of the English language—perhaps a fitting honor for The Economist.


Defeat Device

Some of you might also have seen that D-Wave announced a deal with Volkswagen, to use D-Wave machines for traffic flow.  I had some advance warning of this deal, when reporters called asking me to comment on it.  At least in the materials I saw, no evidence is discussed that the D-Wave machine actually solves whatever problem VW is interested in faster than it could be solved with a classical computer.  Indeed, in a pattern we’ve seen repeatedly for the past decade, the question of such evidence is never even directly confronted or acknowledged.

So I guess I’ll say the same thing here that I said to the journalists.  Namely, until there’s a paper or some other technical information, obviously there’s not much I can say about this D-Wave/Volkswagen collaboration.  But it would be astonishing if quantum supremacy were to be achieved on an application problem of interest to a carmaker, even as scientists struggle to achieve that milestone on contrived and artificial benchmarks, even as the milestone seems repeatedly to elude D-Wave itself on contrived and artificial benchmarks.  In the previous such partnerships—such as that with Lockheed Martin—we can reasonably guess that no convincing evidence for quantum supremacy was found, because if it had been, it would’ve been trumpeted from the rooftops.

Anyway, I confess that I couldn’t resist adding a tiny snark—something about how, if these claims of amazing performance were found not to withstand an examination of the details, it would not be the first time in Volkswagen’s recent history.


Farewell to a Visionary Leader—One Who Was Trash-Talking Critics on Social Media A Decade Before President Trump

This isn’t really news, but since it happened since my last D-Wave post, I figured I should share.  Apparently D-Wave’s outspoken and inimitable founder, Geordie Rose, left D-Wave to form a machine-learning startup (see D-Wave’s leadership page, where Rose is absent).  I wish Geordie the best with his new venture.


Martinis Visits UT Austin

On Feb. 22, we were privileged to have John Martinis of Google visit UT Austin for a day and give the physics colloquium.  Martinis concentrated on the quest to achieve quantum supremacy, in the near future, using sampling problems inspired by theoretical proposals such as BosonSampling and IQP, but tailored to Google’s architecture.  He elaborated on Google’s plan to build a 49-qubit device within the next few years: basically, a 7×7 square array of superconducting qubits with controllable nearest-neighbor couplings.  To a layperson, 49 qubits might sound unimpressive compared to D-Wave’s 2000—but the point is that these qubits will hopefully maintain coherence times thousands of times longer than the D-Wave qubits, and will also support arbitrary quantum computations (rather than only annealing).  Obviously I don’t know whether Google will succeed in its announced plan, but if it does, I’m very optimistic about a convincing quantum supremacy demonstration being possible with this sort of device.

Perhaps most memorably, Martinis unveiled some spectacular data, which showed near-perfect agreement between Google’s earlier 9-qubit quantum computer and the theoretical predictions for a simulation of the Hofstadter butterfly (incidentally invented by Douglas Hofstadter, of Gödel, Escher, Bach fame, when he was still a physics graduate student).  My colleague Andrew Potter explained to me that the Hofstadter butterfly can’t be used to show quantum supremacy, because it’s mathematically equivalent to a system of non-interacting fermions, and can therefore be simulated in classical polynomial time.  But it’s certainly an impressive calibration test for Google’s device.
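For the curious, here is a minimal sketch (mine, not Google's protocol) of what "simulated in classical polynomial time" means here: the single-particle Hofstadter spectrum at rational flux p/q comes from diagonalizing an ordinary hopping matrix with Peierls phases, whose size grows only polynomially with the lattice.

```python
import numpy as np

def hofstadter_spectrum(p, q, cells=3):
    """Spectrum of the square-lattice Hofstadter hopping model at flux p/q per
    plaquette, on an L x L torus with L = q * cells so the flux wraps consistently."""
    alpha, L = p / q, q * cells
    H = np.zeros((L * L, L * L), dtype=complex)
    idx = lambda x, y: (x % L) * L + (y % L)
    for x in range(L):
        for y in range(L):
            i = idx(x, y)
            H[i, idx(x + 1, y)] += -1.0                              # hop in x
            H[i, idx(x, y + 1)] += -np.exp(2j * np.pi * alpha * x)   # hop in y, Peierls phase
    H = H + H.conj().T                                               # add the reverse hops
    return np.linalg.eigvalsh(H)

for p, q in [(1, 3), (1, 4), (2, 5)]:
    E = hofstadter_spectrum(p, q)
    print(f"flux {p}/{q}: {E.size} levels in [{E.min():.3f}, {E.max():.3f}]")
```

(This is only meant to illustrate the classical tractability of the non-interacting problem, not to reproduce Google's actual measurement.)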


2000 Qubits Are Easy, 50 Qubits Are Hard

Just like the Google group, IBM has also publicly set itself the ambitious goal of building a 50-qubit superconducting quantum computer in the near future (i.e., the next few years).  Here in Austin, IBM held a quantum computing session at South by Southwest, so I went—my first exposure of any kind to SXSW.  There were 10 or 15 people in the audience; the purpose of the presentation was to walk through the use of the IBM Quantum Experience in designing 5-qubit quantum circuits and submitting them first to a simulator and then to IBM’s actual superconducting device.  (To the end user, of course, the real machine differs from the simulation only in that with the former, you can see the exact effects of decoherence.)  Afterward, I chatted with the presenters, who were extremely friendly and knowledgeable, and relieved (they said) that I found nothing substantial to criticize in their summary of quantum computing.
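For readers who want to try the same exercise, a circuit of the sort demonstrated there can be sketched in Qiskit (the Quantum Experience itself was a drag-and-drop web composer; this is just an illustrative 5-qubit GHZ-style circuit, and I omit the backend-submission step since those APIs have changed across Qiskit versions):

```python
from qiskit import QuantumCircuit

qc = QuantumCircuit(5, 5)        # 5 qubits, 5 classical bits
qc.h(0)                          # put qubit 0 into superposition
for target in range(1, 5):
    qc.cx(0, target)             # entangle the remaining qubits with qubit 0
qc.measure(range(5), range(5))   # measure everything
print(qc.draw())                 # ASCII drawing of the circuit
```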

Hope everyone had a great Pi Day and Ides of March.

March 15, 2017

Doug NatelsonAPS March Meeting 2017 Day 2

Some highlights from day 2 (though I spent quite a bit of time talking with colleagues and collaborators):

Harold Hwang of Stanford gave a very nice talk about oxide materials, with two main parts.  First, he spoke about making a hot electron (metal base) transistor (Nature Materials 10, 198 (2011)) - this is a transistor device made from STO/LSMO/Nb:STO, where the LSMO layer is a metal, and the idea is to get "hot" electrons to shoot over the Schottky barrier at the STO/LSMO interface, ballistically across the metallic LSMO base, and into the STO drain.  Progress has been interesting since that paper, especially with very thin bases.  In principle such devices can be very fast.

The second part of his talk was about trying to make free-standing ultrathin oxide layers, reminiscent of what you can see with the van der Waals materials like graphene or MoS2.  To do this, they use a layer of Sr3Al2O6 - that stuff can be grown epitaxially with pulsed laser deposition on nice oxide substrates like STO, and other oxide materials (even YBCO or superlattices) can be grown epitaxially on top of it. Sr3Al2O6 is related to the compound in Portland cement that is hygroscopic, and turns out to be water soluble (!), so that you can dissolve it and lift off the layers above it.  Very impressive.

Bharat Jalan of Minnesota spoke about growing BaSnO3 via molecular beam epitaxy.  This stuff is a semiconductor dominated by the Ba 5s band, with a low effective mass so that it tends to have pretty high mobilities.  This is an increasingly trendy new wide gap oxide semiconductor that could potentially be useful for transparent electronics.  

Ivan Bozovic of Brookhaven (and Yale) gave a very compelling talk about high temperature superconductors, specifically LSCO, based on having grown thousands of extremely high quality (as assessed by the width of the transition in penetration depth measurements) epitaxial films of varying doping concentrations.   Often people assert that the cuprates, when "overdoped", basically become more conventional BCS superconductors with a Fermi liquid normal state.  Bozovic presents very convincing evidence (from pretty much the data alone, without complex models for interpretation) that shows this is not right - that instead these materials are weird even in the overdoped regime, with systematic property variations that don't look much like conventional superconductors at all.  In the second part of his talk, he showed clear transport evidence for electronic anisotropy in the normal state of LSCO over the phase diagram, with preferred axes in the plane that vary with temperature and don't necessarily align with crystallographic axes of the material.  Neat stuff.   

Shang-Jie Yu at Maryland spoke about work on coherent optical manipulation of phonons.  In particular, previous work from this group looked at ensembles of spherical core-shell nanoparticles in solution, and found that they could excite a radial breathing vibrational mode with an optical pulse, and then measure that breathing in a time-resolved way with probe pulses.  Now they can do more complex pulse sequences to control which vibrations get excited - very cute, and it's impressive to me that this works even when working with an ensemble of particles with presumably some variation in geometry.


n-Category Café Functional Equations VI: Using Probability Theory to Solve Functional Equations

A functional equation is an entirely deterministic thing, such as f(x + y) = f(x) + f(y), or f(f(f(x))) = x, or f\Bigl(\cos\bigl(e^{f(x)}\bigr)\Bigr) + 2x = \sin\bigl(f(x+1)\bigr). So it’s a genuine revelation that one can solve some functional equations using probability theory — more specifically, the theory of large deviations.

This week and next week, I’m explaining how. Today (pages 22-25 of these notes) was mainly background:

  • an introduction to the theory of large deviations;

  • an introduction to convex duality, which Simon has written about here before;

  • how the two can be combined to get a nontrivial formula for sums of powers of real numbers.

Next time, I’ll explain how this technique produces a startlingly simple characterization of the p-norms.
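For readers who don't follow the links, the way large deviations and convex duality combine is essentially the mechanism of Cramér's theorem, where the duality already appears (my paraphrase, not taken from the linked notes): for i.i.d. real random variables X_1, X_2, \ldots with finite exponential moments, the upper tail of the sample mean satisfies, for x above the mean,

\mathbb{P}\Bigl(\tfrac{1}{n}\sum_{i=1}^n X_i \ge x\Bigr) \approx e^{-n I(x)}, \qquad I(x) = \sup_{\lambda \in \mathbb{R}}\bigl(\lambda x - \log \mathbb{E}\, e^{\lambda X_1}\bigr),

so the rate function I is the convex (Legendre) dual of the cumulant generating function.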

March 14, 2017

Doug NatelsonAPS March Meeting 2017 Day 1

Some talks I saw today at the APS March Meeting in New Orleans:

John Martinis spoke about "quantum supremacy".  Quantum supremacy means achieving performance truly superior to what is classically possible - in Martinis' usage, the idea is to look at cross-correlations between different qubits, and compare with expectations for fully entangled/coherent systems, to assess how well you are able to set, entangle, and preserve the coherence of your quantum bits.

An optical analog:  Coherent light (laser pointer) incident on frosted glass results in a diffuse spot that is, when examined in detail, an incredibly complicated speckle pattern.  The statistics of that speckled light (correlations over different spatial regions) are very different than if you just had a defocused spot.  In his system, he is taking nine (superconducting, tunable transmon) qubits, where they can control both the coupling between neighboring bits and the energy of each bit.  They set the system in an initial state (injecting a known number of microwave photons into particular qubits); set the energies in a known but randomly selected way, turn on and off the neighbor couplings (25 ns timescale) for some number of cycles, and then look where the microwave photons end up, and take the statistics.  They find that they get good agreement with an error rate of 0.3%/qubit/cycle.  That's enough that they could conceivably do something useful.

As a demo, they use their qubits to model the Hofstadter butterfly problem - finding the energy levels of a 2d electronic system (on a hexagonal lattice, which maps to a 1d problem that they can implement w/ their array of nine qubits).  They can get nice agreement between theory and experiment.  Very impressive.  He concluded w/ a warning not to believe all hype from qc investigators, including himself.  In general, the approach is basically brute force up to ~45 qubits or somewhat more (a couple of hundred), and then to think about optimal control and feedback schemes before worrying about truly huge scaling.  The only downside to the talk was that it was in a room that was far too small for the audience.

Alex MacLeod gave a nice talk about using scanning near-field optical microscopy to study the metal-insulator transition in V2O3, as in this paper.  By performing cryogenic near-field scanning optical microscopy in ultrahigh vacuum (!), they measured scattered light from a nanoscale scanning tip, giving local dielectric information (hence the distinction between metal and insulator surroundings) with an effective spatial resolution that is basically the radius of curvature of the tip.  There is pattern formation at the metal-insulator transition because the two phases have different crystal structures (metal = corundum; insulator = monoclinic), and therefore the transition is a problem of constrained free energy minimization.  This generically leads to pattern formation in the mixed-phase regime.  They see a clear percolation transition in optical measurements, coinciding w/ long distance transport measurements - they really are seeing metallic domains.  Strangely, they find a temperature offset between the structural transition (as seen through x-ray) and the MIT.  The structural transition temperature is higher, and coincides with max anisotropy in the imaged patterns.  They also see pieces of persistent metallic state at low T, suggesting that some other frustration is going on to stabilize this.

Anatole von Lilienfeld of Basel gave an interesting talk about using machine learning techniques to get quantum chemistry information about small molecules faster and allegedly with better accuracy than full density functional theory calculations.  Basically you train the software on molecules that have been solved to some high degree of accuracy, parametrizing the molecules by their structure (a "Coulomb matrix" that takes into account the relative coordinates and effective charges of the ions) and/or bonding (a "bag of bonds" that takes into account two-body bonds).  Then the software can do a really good job interpolating quantum properties (HOMO-LUMO gaps, ionization potentials) of related molecules faster than you could calculate them in detail.  Impressive, but it seems like a powerful look-up table rather than providing much physical insight.
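In the spirit of that approach (a minimal sketch of my own, not the speaker's code; the padding size, kernel, and hyperparameters are illustrative assumptions), the pipeline is roughly: encode each molecule as a sorted Coulomb matrix, flatten it, and hand the vectors to kernel ridge regression.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def coulomb_matrix(Z, R, size):
    """Coulomb-matrix descriptor: 0.5*Z_i^2.4 on the diagonal, Z_i*Z_j/|R_i - R_j|
    off-diagonal, zero-padded to `size` atoms and sorted by row norm."""
    Z, R = np.asarray(Z, float), np.asarray(R, float)
    n = len(Z)
    M = np.zeros((size, size))
    for i in range(n):
        for j in range(n):
            M[i, j] = 0.5 * Z[i] ** 2.4 if i == j else Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    order = np.argsort(-np.linalg.norm(M, axis=1))   # permutation-stable ordering
    M = M[order][:, order]
    return M[np.triu_indices(size)]                  # flattened upper triangle as feature vector

# Hypothetical usage, with `molecules` a list of (Z, R) pairs and `energies`
# the corresponding reference values from a quantum-chemistry data set:
# X = np.array([coulomb_matrix(Z, R, size=23) for Z, R in molecules])
# model = KernelRidge(kernel="laplacian", alpha=1e-8, gamma=1e-4).fit(X, energies)
# predicted = model.predict(X)
```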

Melissa Eblen-Zayas gave a fun talk about trying to upgrade the typical advanced junior lab to include real elements of experimental design.  Best line:  "At times student frustration was palpable."

Dan Ralph gave a very compelling talk about the origins of spin-orbit torques in thin-film heterostructures.  I've written in the past about related work.   This was a particularly clear exposition, and went to new territory.  Traditionally, if you have a thin film of a heavy metal (tantalum, say), and you pass current through that film, at the upper (and lower) film surface you will accumulate spin density oriented in the plane and perpendicular to the charge current.  He made a clear argument that this is required because of the mirror symmetry properties of typical polycrystalline metal films.  However, if instead you work with a thin material with much lower symmetry (WTe2, for example) instead of the heavy metal, you can exert spin torques on adjacent magnetic overlayers as if the accumulated spin was out of the plane (which could be useful for certain device approaches).

March 13, 2017

Tommaso DorigoPosts On Neutrino Experiments, Day 1

The first day of the Neutrino Telescopes XVII conference in Venice is over, and I would like to point you to some short summaries that I published for the conference blog, at http://neutel11.wordpress.com. 
Specifically:


- a summary of the talk on Super-Kamiokande
- a summary of the talk on SNO
- a summary of the talk on KamLAND
- a summary of the talk on K2K and T2K
- a summary of the talk on Daya Bay

You might have noticed that the above experiments were recipients of the 2016 Breakthrough prize in physics. In fact, the session was specifically focusing on these experiments for that reason.

read more

Tommaso DorigoThe Formidable Neutrino

Elementary particles are mysterious and unfathomable, and it takes giant accelerators and incredibly complex devices to study them. In the last 100 years we have made great strides in the investigation of the properties of quarks, leptons, and vector bosons, but I would be lying if I said we know half of what we would like to. In science, the opening of a door reveals others, closed by more complicated locks - and there is no clearer example of this than the investigation of subatomic matter.

read more

BackreactionIs Verlinde’s Emergent Gravity compatible with General Relativity?

[Dark matter filaments, Millennium Simulation. Image: Volker Springel] A few months ago, Erik Verlinde published an update of his 2010 idea that gravity might originate in the entropy of so-far undetected microscopic constituents of space-time. Gravity, then, would not be fundamental but emergent. With the new formalism, he derived an equation for a modified gravitational law that, on galactic

John BaezRestoring the North Cascades Ecosystem

In 49 hours, the National Park Service will stop taking comments on an important issue: whether to reintroduce grizzly bears into the North Cascades near Seattle. If you leave a comment on their website before then, you can help make this happen! Follow the easy directions here:

http://theoatmeal.com/blog/grizzlies_north_cascades

Please go ahead! Then tell your friends to join in, and give them this link. This can be your good deed for the day.

But if you want more details:

Grizzly bears are traditionally the apex predator in the North Cascades. Without the apex predator, the whole ecosystem is thrown out of balance. I know this from my childhood in northern Virginia, where deer are stripping the forest of all low-hanging greenery with no wolves to control them. With the top predator, the whole ecosystem springs to life and starts humming like a well-tuned engine! For example, when wolves were reintroduced in Yellowstone National Park, it seems that even riverbeds were affected:

There are several plans to restore grizzlies to the North Cascades. On the link I recommended, Matthew Inman supports Alternative C — Incremental Restoration. I’m not an expert on this issue, so I went ahead and supported that. There are actually 4 alternatives on the table:

Alternative A — No Action. They’ll keep doing what they’re already doing. The few grizzlies already there would be protected from poaching, the local population would be advised on how to deal with grizzlies, and the bears would be monitored. All other alternatives will do these things and more.

Alternative B — Ecosystem Evaluation Restoration. Up to 10 grizzly bears will be captured from source populations in northwestern Montana and/or south-central British Columbia and released at a single remote site on Forest Service lands in the North Cascades. This will take 2 years, and then they’ll be monitored for 2 years before deciding what to do next.

Alternative C — Incremental Restoration. 5 to 7 grizzly bears will be captured and released into the North Cascades each year over roughly 5 to 10 years, with a goal of establishing an initial population of 25 grizzly bears. Bears would be released at multiple remote sites. They can be relocated or removed if they cause trouble. Alternative C is expected to reach the restoration goal of approximately 200 grizzly bears within 60 to 100 years.

Alternative D — Expedited Restoration. 5 to 7 grizzly bears will be captured and released into the North Cascades each year until the population reaches about 200, which is what the area can easily support.

So, pick your own alternative if you like!

By the way, the remaining grizzly bears in the western United States live within six recovery zones:

• the Greater Yellowstone Ecosystem (GYE) in Wyoming and southwest Montana,

• the Northern Continental Divide Ecosystem (NCDE) in northwest Montana,

• the Cabinet-Yaak Ecosystem (CYE) in extreme northwestern Montana and the northern Idaho panhandle,

• the Selkirk Ecosystem (SE) in northern Idaho and northeastern Washington,

• the Bitterroot Ecosystem (BE) in central Idaho and western Montana,

• and the North Cascades Ecosystem (NCE) in northwestern and north-central Washington.

The North Cascades Ecosystem consists of 24,800 square kilometers in Washington, with an additional 10,350 square kilometers in British Columbia. In the US, 90% of this ecosystem is managed by the US Forest Service, the US National Park Service, and the State of Washington, and approximately 41% falls within Forest Service wilderness or the North Cascades National Park Service Complex.

For more, read this:

• National Park Service, Draft Grizzly Bear Restoration Plan / Environmental Impact Statement: North Cascades Ecosystem.

The picture of grizzlies is from this article:

• Ron Judd, Why returning grizzlies to the North Cascades is the right thing to do, Pacific NW Magazine, 23 November 2015.

If you’re worried about reintroducing grizzly bears, read it!

The map is from here:

• Krista Langlois, Grizzlies gain ground, High Country News, 27 August 2014.

Here you’ll see the huge obstacles this project has overcome so far.


March 12, 2017

Jordan EllenbergFitchburg facts


March 11, 2017

Doug NatelsonAPS March Meeting 2017

Once again, it's that time of year when somewhat absurd numbers of condensed matter (and other) physicists gather together.  This time the festivities are in New Orleans.  I'll be at the meeting tomorrow (this will be my first time attending the business meetings as a member-at-large of the Division of Condensed Matter Physics, so that should be new and different)  through Wednesday afternoon.  As in previous years, I will do my best to write up some of the interesting things I learn about.  (If you're at the meeting and you don't already have a copy, now is the perfect time to swing by the Cambridge University Press exhibit at the trade show and pick up my book :-) )

Terence TaoFurstenberg limits of the Liouville function

Given a function {f: {\bf N} \rightarrow \{-1,+1\}} on the natural numbers taking values in {+1, -1}, one can invoke the Furstenberg correspondence principle to locate a measure preserving system {T \circlearrowright (X, \mu)} – a probability space {(X,\mu)} together with a measure-preserving shift {T: X \rightarrow X} (or equivalently, a measure-preserving {{\bf Z}}-action on {(X,\mu)}) – together with a measurable function (or “observable”) {F: X \rightarrow \{-1,+1\}} that has essentially the same statistics as {f} in the sense that

\displaystyle \lim \inf_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k)

\displaystyle \leq \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x)

\displaystyle \leq \lim \sup_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k)

for any integers {h_1,\dots,h_k}. In particular, one has

\displaystyle \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x) = \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k) \ \ \ \ \ (1)

 

whenever the limit on the right-hand side exists. We will refer to the system {T \circlearrowright (X,\mu)} together with the designated function {F} as a Furstenberg limit of the sequence {f}. These Furstenberg limits capture some, but not all, of the asymptotic behaviour of {f}; roughly speaking, they control the typical “local” behaviour of {f}, involving correlations such as {\frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k)} in the regime where {h_1,\dots,h_k} are much smaller than {N}. However, the control on error terms here is usually only qualitative at best, and one usually does not obtain non-trivial control on correlations in which the {h_1,\dots,h_k} are allowed to grow at some significant rate with {N} (e.g. like some power {N^\theta} of {N}).

The correspondence principle is discussed in these previous blog posts. One way to establish the principle is by introducing a Banach limit {p\!-\!\lim: \ell^\infty({\bf N}) \rightarrow {\bf R}} that extends the usual limit functional on the subspace of {\ell^\infty({\bf N})} consisting of convergent sequences while still having operator norm one. Such functionals cannot be constructed explicitly, but can be proven to exist (non-constructively and non-uniquely) using the Hahn-Banach theorem; one can also use a non-principal ultrafilter here if desired. One can then seek to construct a system {T \circlearrowright (X,\mu)} and a measurable function {F: X \rightarrow \{-1,+1\}} for which one has the statistics

\displaystyle \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x) = p\!-\!\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k) \ \ \ \ \ (2)

 

for all {h_1,\dots,h_k}. One can explicitly construct such a system as follows. One can take {X} to be the Cantor space {\{-1,+1\}^{\bf Z}} with the product {\sigma}-algebra and the shift

\displaystyle T ( (x_n)_{n \in {\bf Z}} ) := (x_{n+1})_{n \in {\bf Z}}

with the function {F: X \rightarrow \{-1,+1\}} being the coordinate function at zero:

\displaystyle F( (x_n)_{n \in {\bf Z}} ) := x_0

(so in particular {F( T^h (x_n)_{n \in {\bf Z}} ) = x_h} for any {h \in {\bf Z}}). The only thing remaining is to construct the invariant measure {\mu}. In order to be consistent with (2), one must have

\displaystyle \mu( \{ (x_n)_{n \in {\bf Z}}: x_{h_j} = \epsilon_j \forall 1 \leq j \leq k \} )

\displaystyle = p\!-\!\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N 1_{f(n+h_1)=\epsilon_1} \dots 1_{f(n+h_k)=\epsilon_k}

for any distinct integers {h_1,\dots,h_k} and signs {\epsilon_1,\dots,\epsilon_k}. One can check that this defines a premeasure on the Boolean algebra of {\{-1,+1\}^{\bf Z}} defined by cylinder sets, and the existence of {\mu} then follows from the Hahn-Kolmogorov extension theorem (or the closely related Kolmogorov extension theorem). One can then check that the correspondence (2) holds, and that {\mu} is translation-invariant; the latter comes from the translation invariance of the (Banach-)Césaro averaging operation {f \mapsto p\!-\!\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n)}. A variant of this construction shows that the Furstenberg limit is unique up to equivalence if and only if all the limits appearing in (1) actually exist.

One can obtain a slightly tighter correspondence by using a smoother average than the Césaro average. For instance, one can use the logarithmic Césaro averages {\lim_{N \rightarrow \infty} \frac{1}{\log N}\sum_{n=1}^N \frac{f(n)}{n}} in place of the Césaro average {\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N f(n)}, thus one replaces (2) by

\displaystyle \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x)

\displaystyle = p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n+h_1) \dots f(n+h_k)}{n}.

Whenever the Césaro average of a bounded sequence {f: {\bf N} \rightarrow {\bf R}} exists, then the logarithmic Césaro average exists and is equal to the Césaro average. Thus, a Furstenberg limit constructed using logarithmic Banach-Césaro averaging still obeys (1) for all {h_1,\dots,h_k} when the right-hand side limit exists, but also obeys the more general assertion

\displaystyle \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x)

\displaystyle = \lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n+h_1) \dots f(n+h_k)}{n}

whenever the limit of the right-hand side exists.

In a recent paper of Frantzikinakis, the Furstenberg limits of the Liouville function {\lambda} (with logarithmic averaging) were studied. Some (but not all) of the known facts and conjectures about the Liouville function can be interpreted in the Furstenberg limit. For instance, in a recent breakthrough result of Matomaki and Radziwill (discussed previously here), it was shown that the Liouville function exhibited cancellation on short intervals in the sense that

\displaystyle \lim_{H \rightarrow \infty} \limsup_{X \rightarrow \infty} \frac{1}{X} \int_X^{2X} \frac{1}{H} |\sum_{x \leq n \leq x+H} \lambda(n)|\ dx = 0.

In terms of Furstenberg limits of the Liouville function, this assertion is equivalent to the assertion that

\displaystyle \lim_{H \rightarrow \infty} \int_X |\frac{1}{H} \sum_{h=1}^H F(T^h x)|\ d\mu(x) = 0

for all Furstenberg limits {T \circlearrowright (X,\mu), F} of Liouville (including those without logarithmic averaging). Invoking the mean ergodic theorem (discussed in this previous post), this assertion is in turn equivalent to the observable {F} that corresponds to the Liouville function being orthogonal to the invariant factor {L^\infty(X,\mu)^{\bf Z} = \{ g \in L^\infty(X,\mu): g \circ T = g \}} of {X}; equivalently, the first Gowers-Host-Kra seminorm {\|F\|_{U^1(X)}} of {F} (as defined for instance in this previous post) vanishes. The Chowla conjecture, which asserts that

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \lambda(n+h_1) \dots \lambda(n+h_k) = 0

for all distinct integers {h_1,\dots,h_k}, is equivalent to the assertion that all the Furstenberg limits of Liouville are equivalent to the Bernoulli system ({\{-1,+1\}^{\bf Z}} with the product measure arising from the uniform distribution on {\{-1,+1\}}, with the shift {T} and observable {F} as before). Similarly, the logarithmically averaged Chowla conjecture

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = 0

is equivalent to the assertion that all the Furstenberg limits of Liouville with logarithmic averaging are equivalent to the Bernoulli system. Recently, I was able to prove the two-point version

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n) \lambda(n+h)}{n} = 0 \ \ \ \ \ (3)

 

of the logarithmically averaged Chowla conjecture, for any non-zero integer {h}; this is equivalent to the perfect strong mixing property

\displaystyle \int_X F(x) F(T^h x)\ d\mu(x) = 0

for any Furstenberg limit of Liouville with logarithmic averaging, and any {h \neq 0}.
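As a small numerical illustration of (3) (mine, not from the paper), one can tabulate the Liouville function with a smallest-prime-factor sieve and evaluate the logarithmically averaged two-point sum directly; for modest {N} the value is already small, though the convergence to zero is slow.

```python
from math import log

def liouville_sieve(N):
    """lam[n] = (-1)^Omega(n), the Liouville function, via a smallest-prime-factor sieve."""
    spf = list(range(N + 1))
    for i in range(2, int(N ** 0.5) + 1):
        if spf[i] == i:                          # i is prime
            for j in range(i * i, N + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [1] * (N + 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]               # Omega(n) = Omega(n / spf(n)) + 1
    return lam

def log_avg_chowla(N, h=1):
    """(1/log N) * sum_{n <= N} lam(n) * lam(n+h) / n"""
    lam = liouville_sieve(N + h)
    return sum(lam[n] * lam[n + h] / n for n in range(1, N + 1)) / log(N)

print(log_avg_chowla(10**6))   # close to 0, but the decay is slow
```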

The situation is more delicate with regards to the Sarnak conjecture, which is equivalent to the assertion that

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \lambda(n) f(n) = 0

for any zero-entropy sequence {f: {\bf N} \rightarrow {\bf R}} (see this previous blog post for more discussion). Morally speaking, this conjecture should be equivalent to the assertion that any Furstenberg limit of Liouville is disjoint from any zero entropy system, but I was not able to formally establish an implication in either direction due to some technical issues regarding the fact that the Furstenberg limit does not directly control long-range correlations, only short-range ones. (There are however ergodic theoretic interpretations of the Sarnak conjecture that involve the notion of generic points; see this paper of El Abdalaoui, Lemanczyk, and de la Rue.) But the situation is currently better with the logarithmically averaged Sarnak conjecture

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n) f(n)}{n} = 0,

as I was able to show that this conjecture was equivalent to the logarithmically averaged Chowla conjecture, and hence to all Furstenberg limits of Liouville with logarithmic averaging being Bernoulli; I also showed the conjecture was equivalent to local Gowers uniformity of the Liouville function, which is in turn equivalent to the function {F} having all Gowers-Host-Kra seminorms vanishing in every Furstenberg limit with logarithmic averaging. In this recent paper of Frantzikinakis, this analysis was taken further, showing that the logarithmically averaged Chowla and Sarnak conjectures were in fact equivalent to the much milder seeming assertion that all Furstenberg limits with logarithmic averaging were ergodic.

Actually, the logarithmically averaged Furstenberg limits have more structure than just a {{\bf Z}}-action on a measure preserving system {(X,\mu)} with a single observable {F}. Let {Aff_+({\bf Z})} denote the semigroup of affine maps {n \mapsto an+b} on the integers with {a,b \in {\bf Z}} and {a} positive. Also, let {\hat {\bf Z}} denote the profinite integers (the inverse limit of the cyclic groups {{\bf Z}/q{\bf Z}}). Observe that {Aff_+({\bf Z})} acts on {\hat {\bf Z}} by taking the inverse limit of the obvious actions of {Aff_+({\bf Z})} on {{\bf Z}/q{\bf Z}}.

Proposition 1 (Enriched logarithmically averaged Furstenberg limit of Liouville) Let {p\!-\!\lim} be a Banach limit. Then there exists a probability space {(X,\mu)} with an action {\phi \mapsto T^\phi} of the affine semigroup {Aff_+({\bf Z})}, as well as measurable functions {F: X \rightarrow \{-1,+1\}} and {M: X \rightarrow \hat {\bf Z}}, with the following properties:

  • (i) (Affine Furstenberg limit) For any {\phi_1,\dots,\phi_k \in Aff_+({\bf Z})}, and any congruence class {a\ (q)}, one has

    \displaystyle p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(\phi_1(n)) \dots \lambda(\phi_k(n)) 1_{n = a\ (q)}}{n}

    \displaystyle = \int_X F( T^{\phi_1}(x) ) \dots F( T^{\phi_k}(x) ) 1_{M(x) = a\ (q)}\ d\mu(x).

  • (ii) (Equivariance of {M}) For any {\phi \in Aff_+({\bf Z})}, one has

    \displaystyle M( T^\phi(x) ) = \phi( M(x) )

    for {\mu}-almost every {x \in X}.

  • (iii) (Multiplicativity at fixed primes) For any prime {p}, one has

    \displaystyle F( T^{p\cdot} x ) = - F(x)

    for {\mu}-almost every {x \in X}, where {p \cdot \in Aff_+({\bf Z})} is the dilation map {n \mapsto pn}.

  • (iv) (Measure pushforward) If {\phi \in Aff_+({\bf Z})} is of the form {\phi(n) = an+b} and {S_\phi \subset X} is the set {S_\phi = \{ x \in X: M(x) \in \phi(\hat {\bf Z}) \}}, then the pushforward {T^\phi_* \mu} of {\mu} by {\phi} is equal to {a \mu\downharpoonright_{S_\phi}}, that is to say one has

    \displaystyle \mu( (T^\phi)^{-1}(E) ) = a \mu( E \cap S_\phi )

    for every measurable {E \subset X}.

Note that {{\bf Z}} can be viewed as the subgroup of {Aff_+({\bf Z})} consisting of the translations {n \mapsto n + b}. If one only keeps the {{\bf Z}}-portion of the {Aff_+({\bf Z})} action and forgets the rest (as well as the function {M}) then the action becomes measure-preserving, and we recover an ordinary Furstenberg limit with logarithmic averaging. However, the additional structure here can be quite useful; for instance, one can transfer the proof of (3) to this setting, which we sketch below the fold, after proving the proposition.

The observable {M}, roughly speaking, means that points {x} in the Furstenberg limit {X} constructed by this proposition are still “virtual integers” in the sense that one can meaningfully compute the residue class of {x} modulo any natural number modulus {q}, by first applying {M} and then reducing mod {q}. The action of {Aff_+({\bf Z})} means that one can also meaningfully multiply {x} by any natural number, and translate it by any integer. As with other applications of the correspondence principle, the main advantage of moving to this more “virtual” setting is that one now acquires a probability measure {\mu}, so that the tools of ergodic theory can be readily applied.

— 1. Proof of proposition —

We adapt the previous construction of the Furstenberg limit. The space {X} will no longer be the Cantor space {\{-1,+1\}^{\bf Z}}, but will instead be taken to be the space

\displaystyle X := \{-1,+1\}^{Aff_+({\bf Z})} \times \hat {\bf Z}.

The action of {Aff_+({\bf Z})} here is given by

\displaystyle T^\phi ( (x_\psi)_{\psi \in Aff_+({\bf Z})}, m ) := ( (x_{\psi \phi})_{\psi \in Aff_+({\bf Z})}, \phi(m) );

this can easily be seen to be a semigroup action. The observables {F: X \rightarrow \{-1,+1\}} and {M: X \rightarrow \hat {\bf Z}} are defined as

\displaystyle F( (x_\psi)_{\psi \in Aff_+({\bf Z})}, m ) := x_{id}

and

\displaystyle M( (x_\psi)_{\psi \in Aff_+({\bf Z})}, m ) := m

where {id} is the identity element of {Aff_+({\bf Z})}. Property (ii) is now clear. Now we have to construct the measure {\mu}. In order to be consistent with property (i), the measure of the set

\displaystyle \{ ((x_\phi)_{\phi \in Aff_+({\bf Z})}, m): x_{\phi_j} = \epsilon_j \forall 1 \leq j \leq k; m = a\ (q) \} \ \ \ \ \ (4)

 

for any distinct {\phi_1,\dots,\phi_k \in Aff_+({\bf Z})}, signs {\epsilon_1,\dots,\epsilon_k \in \{-1,+1\}}, and congruence class {a\ (q)}, must be equal to

\displaystyle p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{1_{\lambda(\phi_j(n)) = \epsilon_j \forall 1 \leq j \leq k; m = a\ (q)}}{n}.

One can check that this requirement uniquely defines a premeasure on the Boolean algebra on {X} generated by the sets (4), and {\mu} can then be constructed from the Hahn-Kolmogorov theorem as before. Property (i) follows from construction. Specialising to the case {\phi_1(n) = n}, {\phi_2(n) = pn} for a prime {p} we have

\displaystyle p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n) \lambda(pn)}{n}

\displaystyle = \int_X F( x ) F( T^{p \cdot}(x) ) \ d\mu(x);

the left-hand side is {-1}, which gives (iii).

It remains to establish (iv). It will suffice to do so for sets {E} of the form (4). The claim then follows from the dilation invariance property

\displaystyle p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(an+b)}{n} = a p\!-\!\lim_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n)}{n} 1_{n = b\ (a)}

for any bounded function {f}, which is easily verified (here is where it is essential that we are using logarithmically averaged Césaro means rather than ordinary Césaro means).

Remark 2 One can embed this {Aff_+({\bf Z})}-system {X} as a subsystem of a {Aff_+({\bf Q})}-system {Aff_+({\bf Q}) \otimes_{Aff_+({\bf Z})} X}, however this larger system is only {\sigma}-finite rather than a probability space, and also the observable {M} now takes values in the larger space {{\bf Q} \otimes_{\bf Z} \hat {\bf Z}}. This recovers a group action rather than a semigroup action, but I am not sure if the added complexity of infinite measure is worth it.

— 2. Two-point logarithmic Chowla —

We now sketch how the proof of (3) in this paper can be translated to the ergodic theory setting. For sake of notation let us just prove (3) when {h=1}. We will assume familiarity with ergodic theory concepts in this sketch. By taking a suitable Banach limit, it will suffice to establish that

\displaystyle \int_X F(x) F( T^{\cdot+1} x)\ d\mu(x) = 0

for any Furstenberg limit produced by Proposition 1, where {\cdot+h} denotes the operation of translation by {h}. By property (iii) of that proposition, we can write the left-hand side as

\displaystyle \int_X F(T^{p\cdot} x) F( T^{p\cdot+p} x)\ d\mu(x)

for any prime {p}, and then by property (iv) we can write this in turn as

\displaystyle \int_X F(x) F( T^{p} x) p 1_{M(x) = 0\ (p)}\ d\mu(x).

Averaging, we thus have

\displaystyle \int_X F(x) F( T^{\cdot+1} x)\ d\mu(x) = \frac{1}{|\mathcal P|} \sum_{p \in {\mathcal P}_P} \int_X F(x) F( T^{p} x) p 1_{M(x) = 0\ (p)}\ d\mu(x)

for any {P>1}, where {{\mathcal P}_P} denotes the primes between {P/2} and {P}.

On the other hand, the Matomaki-Radziwill theorem (twisted by Dirichlet characters) tells us that for any congruence class {q \geq 1}, one has

\displaystyle \lim_{H \rightarrow \infty} \lim\sup_{N \rightarrow \infty} \frac{1}{\log N} \sum_{n \leq N} |\frac{1}{H} \sum_{h=1}^H \lambda(n+qh)| = 0

which on passing to the Furstenberg limit gives

\displaystyle \lim_{H \rightarrow \infty} \int_X |\frac{1}{H} \sum_{h=1}^H F( T^{\cdot+qh} x)|\ d\mu(x) = 0.

Applying the mean ergodic theorem, we conclude that {F} is orthogonal to the profinite factor of the {{\bf Z}}-action, by which we mean the factor generated by the functions that are periodic ({T^{\cdot+q}}-invariant for some {q \geq 1}). One can show from Fourier analysis that the profinite factor is characteristic for averaging along primes, and in particular that

\displaystyle \frac{1}{|{\mathcal P}_P|} \sum_{p \in {\mathcal P}_P} \int_X F(x) F( T^{\cdot + p} x)\ d\mu \rightarrow 0

as {P \rightarrow \infty}. (This is not too difficult given the usual Vinogradov estimates for exponential sums over primes, but I don’t know of a good reference for this fact. This paper of Frantzikinakis, Host, and Kra establishes the analogous claim that the Kronecker factor is characteristic for triple averages {\frac{1}{|\mathcal P|} \sum_{p \in {\mathcal P}_P} \int_X F(x) F( T^{p} x)\ d\mu}, and their argument would also apply here, but this is something of an overkill.) Thus, if we define the quantities

\displaystyle Q_P := \frac{1}{|{\mathcal P}_P|} \sum_{p \in {\mathcal P}_P} \int_X F(x) F( T^{\cdot + p} x) (p 1_{M(x) = 0\ (p)}-1)\ d\mu(x)

it will suffice to show that {\liminf_{P \rightarrow \infty} |Q_P| = 0}.

Suppose for contradiction that {|Q_P| \geq \varepsilon} for all sufficiently large {P}. We can write {Q_P} as an expectation

\displaystyle Q_P = {\bf E} F_P( X_P, Y_P )

where {X_P} is the {\{-1,+1\}^P}-valued random variable

\displaystyle X_P := ( F( T^{\cdot + k} x ) )_{0 \leq k < P}

with {x} drawn from {X} with law {\mu}, {Y_P} is the {\prod_{p \in {\mathcal P}_P} {\bf Z}/p{\bf Z}}-valued random variable

\displaystyle Y_P := ( M(x)\ (p) )_{p \in {\mathcal P}_P}

with {x} as before, and {F_P} is the function

\displaystyle F_P( (\epsilon_k)_{0 \leq k < P}, (a_p)_{p \in {\mathcal P}_P} ) := \frac{1}{|{\mathcal P}_P|} \sum_{p \in {\mathcal P}_P} \epsilon_0 \epsilon_p (p 1_{a_p = 0\ (p)} - 1).

As {|Q_P| \geq \varepsilon}, we have

\displaystyle |F_P(X_P, Y_P)| \geq \varepsilon/2

with probability at least {\varepsilon/2}. On the other hand, an application of Hoeffding’s inequality and the prime number theorem shows that if {U_P} is drawn uniformly from {\prod_{p \in {\mathcal P}_P} {\bf Z}/p{\bf Z}} and independently of {X_P}, then one has the concentration of measure bound

\displaystyle \mathop{\bf P}( |F_P(X_P, U_P)| > \varepsilon/2 ) \leq 2 \exp( - c_\varepsilon P / \log P )

for some {c_\varepsilon > 0}. Using the Pinsker-type inequality from this previous blog post, we conclude the lower bound

\displaystyle I( X_P : Y_P ) \gg_\varepsilon \frac{P}{\log P}

on the mutual information between {X_P} and {Y_P}. Using Shannon entropy inequalities as in my paper, this implies the entropy decrement

\displaystyle \frac{H(X_{kP})}{kP} \leq \frac{H(X_P)}{P} - \frac{c_\varepsilon}{\log P} + O( \frac{1}{k} )

for any natural number {k}, which on iterating (and using the divergence of {\sum_{j=1}^\infty \frac{1}{j \log j}}) shows that {\frac{H(X_P)}{P}} eventually becomes negative for sufficiently large {P}, which is absurd. (See also this previous blog post for a sketch of a slightly different way to conclude the argument from entropy inequalities.)


Filed under: expository, math.NT Tagged: Chowla conjecture, correspondence principle, Liouville function, randomness

March 10, 2017

John BaezPi and the Golden Ratio

Two of my favorite numbers are pi:

\pi = 3.14159...

and the golden ratio:

\displaystyle{ \Phi = \frac{\sqrt{5} + 1}{2} } = 1.6180339...

They’re related:

\pi = \frac{5}{\Phi} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}  \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}} \cdots

Greg Egan and I came up with this formula last weekend. It’s probably not new, and it certainly wouldn’t surprise experts, but it’s still fun coming up with a formula like this. Let me explain how we did it.

History has a fractal texture. It’s not exactly self-similar, but the closer you look at any incident, the more fine-grained detail you see. The simplified stories we learn about the history of math and physics in school are like blurry pictures of the Mandelbrot set. You can see the overall shape, but the really exciting stuff is hidden.

François Viète is a French mathematician who doesn’t show up in those simplified stories. He studied law at Poitiers, graduating in 1559. He began his career as an attorney at a quite high level, with cases involving the widow of King Francis I of France and also Mary, Queen of Scots. But his true interest was always mathematics. A friend said he could think about a single question for up to three days, his elbow on the desk, feeding himself without changing position.

Nonetheless, he was highly successful in law. By 1590 he was working for King Henry IV. The king admired his mathematical talents, and Viète soon confirmed his worth by cracking a Spanish cipher, thus allowing the French to read all the Spanish communications they were able to obtain.

In 1591, François Viète came out with an important book, introducing what is called the new algebra: a symbolic method for dealing with polynomial equations. This deserves to be much better known; it was very familiar to Descartes and others, and it was an important precursor to our modern notation and methods. For example, he emphasized care with the use of variables, and advocated denoting known quantities by consonants and unknown quantities by vowels. (Later people switched to using letters near the beginning of the alphabet for known quantities and letters near the end like x,y,z for unknowns.)

In 1593 he came out with another book, Variorum De Rebus Mathematicis Responsorum, Liber VIII. Among other things, it includes a formula for pi. In modernized notation, it looks like this:

\displaystyle{ \frac2\pi = \frac{\sqrt 2}2 \cdot \frac{\sqrt{2+\sqrt 2}}2 \cdot \frac{\sqrt{2+\sqrt{2+\sqrt 2}}}{2} \cdots}

This is remarkable! First of all, it looks cool. Second, it’s the earliest known example of an infinite product in mathematics. Third, it’s the earliest known formula for the exact value of pi. In fact, it seems to be the earliest formula representing a number as the result of an infinite process rather than of a finite calculation! So, Viète’s formula has been called the beginning of analysis. In his article “The life of pi”, Jonathan Borwein went even further and called Viète’s formula “the dawn of modern mathematics”.

How did Viète come up with his formula? I haven’t read his book, but the idea seems fairly clear. The area of the unit circle is pi. So, you can approximate pi better and better by computing the area of a square inscribed in this circle, and then an octagon, and then a 16-gon, and so on:

If you compute these areas in a clever way, you get this series of numbers:

\begin{array}{ccl} A_4 &=& 2 \\  \\ A_8 &=& 2 \cdot \frac{2}{\sqrt{2}} \\  \\ A_{16} &=& 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}}  \\  \\ A_{32} &=& 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}}  \end{array}

and so on, where A_n is the area of a regular n-gon inscribed in the unit circle. So, it was only a small step for Viète (though an infinite leap for mankind) to conclude that

\displaystyle{ \pi = 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}} \cdots }

or, if square roots in a denominator make you uncomfortable:

\displaystyle{ \frac2\pi = \frac{\sqrt 2}2 \cdot \frac{\sqrt{2+\sqrt 2}}2 \cdot \frac{\sqrt{2+\sqrt{2+\sqrt 2}}}{2} \cdots}
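
As a quick numerical sanity check (a Python sketch of my own, not part of the original derivation), one can multiply out the partial products of Viète’s formula and watch them march towards pi:

import math

# Partial products of  pi = 2 * (2/sqrt(2)) * (2/sqrt(2+sqrt(2))) * ...
product = 2.0      # the leading factor A_4 = 2
radical = 0.0      # the nested radical: 0, then sqrt(2), then sqrt(2+sqrt(2)), ...
for _ in range(10):
    radical = math.sqrt(2.0 + radical)
    product *= 2.0 / radical
    print(product)
print(math.pi)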

The basic idea here would not have surprised Archimedes, who rigorously proved that

223/71 < \pi < 22/7

by approximating the circumference of a circle using a regular 96-gon. Since 96 = 2^5 \times 3, you can draw a regular 96-gon with ruler and compass by taking an equilateral triangle and bisecting its edges to get a hexagon, bisecting the edges of that to get a 12-gon, and so on up to 96. In a more modern way of thinking, you can figure out everything you need to know by starting with the angle \pi/3 and using half-angle formulas 4 times to work out the sine or cosine of \pi/96. And indeed, before Viète came along, Ludolph van Ceulen had computed pi to 35 digits using a regular polygon with 2^{62} sides! So Viète’s daring new idea was to give an exact formula for pi that involved an infinite process.

Now let’s see in detail how Viète’s formula works. Since there’s no need to start with a square, we might as well start with a regular n-gon inscribed in the circle and repeatedly bisect its sides, getting better and better approximations to pi. If we start with a pentagon, we’ll get a formula for pi that involves the golden ratio!

We have

\displaystyle{ \pi = \lim_{k \to \infty} A_k }

so we can also compute pi by starting with a regular n-gon and repeatedly doubling the number of vertices:

\displaystyle{ \pi = \lim_{k \to \infty} A_{2^k n} }

The key trick is to write A_{2^k n} as a ‘telescoping product’:

A_{2^k n} = A_n \cdot \frac{A_{2n}}{A_n} \cdot  \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots \frac{A_{2^k n}}{A_{2^{k-1} n}}

Thus, taking the limit as k \to \infty we get

\displaystyle{ \pi = A_n \cdot \frac{A_{2n}}{A_n} \cdot \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots }

where we start with the area of the n-gon and keep ‘correcting’ it to get the area of the 2n-gon, the 4n-gon, the 8n-gon and so on.

There’s a simple formula for the area of a regular n-gon inscribed in a circle. You can chop it into 2 n right triangles, each of which has base \sin(\pi/n) and height \cos(\pi/n), so the polygon has total area n \sin(\pi/n) \cos(\pi/n):

Thus,

A_n = n \sin(\pi/n) \cos(\pi/n) = \displaystyle{\frac{n}{2} \sin(2 \pi / n)}

This lets us understand how the area changes when we double the number of vertices:

\displaystyle{ \frac{A_{n}}{A_{2n}} = \frac{\frac{n}{2} \sin(2 \pi / n)}{n \sin(\pi / n)} = \frac{n \sin( \pi / n) \cos(\pi/n)}{n \sin(\pi / n)} = \cos(\pi/n) }

This is nice and simple, but we really need a recursive formula for this quantity. Let’s define

\displaystyle{ R_n = 2\frac{A_{n}}{A_{2n}} = 2 \cos(\pi/n) }

Why the factor of 2? It simplifies our calculations slightly. We can express R_{2n} in terms of R_n using the half-angle formula for the cosine:

\displaystyle{ R_{2n} = 2 \cos(\pi/2n) = 2\sqrt{\frac{1 + \cos(\pi/n)}{2}} = \sqrt{2 + R_n} }

Now we’re ready for some fun! We have

\begin{array}{ccl} \pi &=& \displaystyle{ A_n \cdot \frac{A_{2n}}{A_n} \cdot \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots }  \\ \\ & = &\displaystyle{ A_n \cdot \frac{2}{R_n} \cdot \frac{2}{R_{2n}} \cdot \frac{2}{R_{4n}} \cdots } \end{array}

so using our recursive formula R_{2n} = \sqrt{2 + R_n}, which holds for any n, we get

\pi =  \displaystyle{ A_n \cdot \frac{2}{R_n} \cdot \frac{2}{\sqrt{2 + R_n}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + R_n}}} \cdots }

I think this deserves to be called the generalized Viète formula. And indeed, if we start with a square, we get

A_4 = \displaystyle{\frac{4}{2} \sin(2 \pi / 4)} = 2

and

R_4 = 2 \cos(\pi/4) = \sqrt{2}

giving Viète’s formula:

\pi = \displaystyle{ 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}} \cdots }

as desired!

But what if we start with a pentagon? For this it helps to remember a beautiful but slightly obscure trig fact:

\cos(\pi / 5) = \Phi/2

and a slightly less beautiful one:

\displaystyle{ \sin(2\pi / 5) = \frac{1}{2} \sqrt{2 + \Phi} }

It’s easy to prove these, and I’ll show you how later. For now, note that they imply

A_5 = \displaystyle{\frac{5}{2} \sin(2 \pi / 5)} = \frac{5}{4} \sqrt{2 + \Phi}

and

R_5 = 2 \cos(\pi/5) = \Phi

Thus, the formula

\pi =  \displaystyle{ A_5 \cdot \frac{2}{R_5} \cdot \frac{2}{\sqrt{2 + R_5}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + R_5}}} \cdots }

gives us

\pi =  \displaystyle{ \frac{5}{4} \sqrt{2 + \Phi} \cdot \frac{2}{\Phi} \cdot \frac{2}{\sqrt{2 + \Phi}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdots }

or, cleaning it up a bit, the formula we want:

\pi = \frac{5}{\Phi} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}  \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}} \cdots

Voilà!
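
To see it converge, here is a short Python check of my own of the generalized Viète formula, run once for the square and once for the pentagon; forty factors is an arbitrary but more than sufficient cutoff.

import math

phi = (math.sqrt(5) + 1) / 2

def generalized_viete(A_n, R_n, factors=40):
    # Partial product A_n * (2/R_n) * (2/R_{2n}) * ..., using R_{2n} = sqrt(2 + R_n).
    product, R = A_n, R_n
    for _ in range(factors):
        product *= 2.0 / R
        R = math.sqrt(2.0 + R)
    return product

print(generalized_viete(2.0, math.sqrt(2.0)))             # square: Viete's original formula
print(generalized_viete(1.25 * math.sqrt(2 + phi), phi))  # pentagon: the golden ratio formula
print(math.pi)                                            # all three agree to many digits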

There’s a lot more to say, but let me just explain the slightly obscure trigonometry facts we needed. To derive these, I find it nice to remember that a regular pentagon, and the pentagram inside it, contain lots of similar triangles:



Using the fact that all these triangles are similar, it’s easy to show that for any one, the ratio of the long side to the short side is \Phi to 1, since

\displaystyle{\Phi = 1 + \frac{1}{\Phi} }

Another important fact is that the pentagram trisects the interior angle of the regular pentagon, breaking the interior angle of 108^\circ = 3\pi/5 into 3 angles of 36^\circ = \pi/5:



Again this is easy and fun to show.

Combining these facts, we can prove that

\displaystyle{ \cos(2\pi/5) = \frac{1}{2\Phi}  }

and

\displaystyle{ \cos(\pi/5) = \frac{\Phi}{2} }

To prove the first equation, chop one of those golden triangles into two right triangles and do things you learned in high school. To prove the second, do the same things to one of the short squat isosceles triangles:

Starting from these equations and using \cos^2 \theta + \sin^2 \theta = 1, we can show

\displaystyle{ \sin(2\pi/5) = \frac{1}{2}\sqrt{2 + \Phi}}

and, just for completeness (we don’t need it here):

\displaystyle{ \sin(\pi/5) = \frac{1}{2}\sqrt{3 - \Phi}}

These require some mildly annoying calculations, where it helps to use the identity

\displaystyle{\frac{1}{\Phi^2} = 2 - \Phi }

Okay, that’s all for now! But if you want more fun, try a couple of puzzles:

Puzzle 1. We’ve gotten formulas for pi starting from a square or a regular pentagon. What formula do you get starting from an equilateral triangle?

Puzzle 2. Using the generalized Viète formula, prove Euler’s formula

\displaystyle{  \frac{\sin x}{x} = \cos\frac{x}{2} \cdot \cos\frac{x}{4} \cdot \cos\frac{x}{8} \cdots }

Conversely, use Euler’s formula to prove the generalized Viète formula.
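
Not a proof of either direction, but here is a quick numerical check of Euler’s formula (a Python sketch of my own, with an arbitrary cutoff of thirty factors):

import math

def euler_product(x, factors=30):
    # cos(x/2) * cos(x/4) * cos(x/8) * ...
    product = 1.0
    for k in range(1, factors + 1):
        product *= math.cos(x / 2.0**k)
    return product

for x in (0.5, 1.0, 2.0, math.pi / 3):
    print(euler_product(x), math.sin(x) / x)   # the two columns agree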

So, one might say that the real point of Viète’s formula, and its generalized version, is not any special property of pi, but Euler’s formula.


n-Category Café The Logic of Space

Mathieu Anel and Gabriel Catren are editing a book called New Spaces for Mathematics and Physics, about all different kinds of notions of “space” and their applications. Among other things, there are chapters about smooth spaces, \infty-groupoids, topos theory, stacks, and various other things of interest to n-Café patrons, all of which I am looking forward to reading. There are chapters by our own John Baez about the continuum and Urs Schreiber about higher prequantum geometry. Here is my own contribution:

I intend this to be my last effort at popularization of HoTT for some time, and accordingly it ended up being rather comprehensive. It begins with a 20-page introduction to type theory, from the perspective of a mathematician wanting to use it as an internal language for categories. There are many introductions to type theory, but probably not enough from this point of view, and moreover most popularizations of type theory are rather vague about its categorical semantics; thus I chose (with some additional prompting from the editors) to spend quite some time on this, and be fairly (though not completely) precise about exactly how the categorical semantics of type theory works.

I also decided to emphasize the point of view that type theory (and “syntax” more generally) is a presentation of the initial object in some category of structured categories. Some category theorists respond to this by saying essentially “what good is it to describe that initial object in some complicated way, rather than just studying it categorically?” It’s taken me a while to be able to express the answer in a really satisfying way (at least, one that satisfies me), and I tried to do so here. The short version is that by explicitly constructing an object that has some universal property, we may learn more about it than we can conclude from the mere statement of its universal property. This is one of the reasons that topologists study classifying spaces, category theorists study classifying toposes, and algebraists study free groups. For a longer answer, read the chapter.

After this introduction to ordinary type theory, but before moving on to homotopy type theory, I spent a while on synthetic topology: type theory treated as an internal language for a category of spaces (actual space-spaces, not \infty-groupoids). This seemed appropriate since the book is about all different kinds of space. It also provides a good justification of type theory’s constructive logic for a classical mathematician, since classical principles like the law of excluded middle and the axiom of choice are simply false in categories of spaces (e.g. a continuous surjection generally fails to have a continuous section).

I also introduced some specific toposes of spaces, such as Johnstone’s topological topos and the toposes of continuous sets and smooth sets. I also mentioned their “local” or “cohesive” nature, and how it can be regarded as explaining why so many objects in mathematics come “naturally” with topological structure. Namely, because mathematics can be done in type theory, and thereby interpreted in any topos, any mathematical construction can be interpreted in a topos of spaces; and since the forgetful functor from a local/cohesive topos preserves most categorical operations, in most cases the “underlying set” of such an interpretation is what we would get by performing the same construction directly with sets. This also tells us in what circumstances we should expect a construction that takes account of topology to disagree with its analogue for discrete sets, and in what circumstances we should expect a set-theoretic construction to inherit a nontrivial topology even when there is no topological input; read the chapter for details.

The subsequent introduction to homotopy type theory and synthetic homotopy theory has nothing particularly special about it, although I decided to downplay the role of “fibration categories” in favor of (\infty,1)-categories when talking about higher-categorical semantics. Current technology for constructing higher-categorical interpretations of type theory uses fibration categories, but I don’t regard that as necessarily essential, and future technology may move away from it. In particular, in addition to the intuition of identity types as path objects in a model category, I think it’s valuable to have a similar intuition for identity types as diagonal morphisms in an (\infty,1)-category.

The last section brings everything together by discussing cohesive homotopy type theory, which is of course one of my current personal interests, modeling the local/cohesive structure of an (\infty,1)-topos with modalities inside homotopy type theory. As I’ve said before, I feel that this perspective greatly clarifies the distinction and relationship between space-spaces and \infty-groupoid “spaces”, with the connecting “fundamental \infty-groupoid” functor characterized by a simple universal property.

Finally, in the conclusion I at last allowed myself some philosophical rein to speculate about synthetic theories as foundations for mathematics, as opposed to simply internal languages for categories constructed in an ambient classical mathematics. Once we see that mathematics can be formulated in type theory to apply equally well in a category of spaces as in the category of sets, there is no particular reason to regard the category of sets as the “true” foundation and the category of spaces as “less foundational”. Just as we can construct a category of spaces from a category of sets by equipping sets with topological structure, we can construct a “category of sets” (i.e. a Boolean topos) from a “category of spaces” by restricting to the subcategory of objects with uninteresting topology (the discrete or codiscrete ones). Either category, therefore, can serve as an equally valid “reference frame” from which to describe mathematics.

John PreskillWhat is Water 2.0

Before I arrived in Los Angeles, I thought I might need to hit the brakes a bit with some of the radical physics theories I’d encountered during my preliminary research. After all, these were scientists I was meeting: people who “engage in a systematic activity to acquire knowledge that describes and predicts the natural world”, according to Wikipedia. It turns out I wasn’t nearly as far-out as they were.

I could recount numerous anecdotes that exemplify my encounter with the frighteningly intelligent and vivid imagination of the people at LIGO with whom I had the great pleasure of working – Prof. Rana X. Adhikari, Maria Okounkova, Eric Quintero, Maximiliano Isi, Sarah Gossan, and Jameson Graef Rollins – but in the end it all boils down to a parable about fish.

Rana’s version, which he recounted to me on our first meeting, goes as follows: “There are these two young fish swimming along, and a scientist approaches the aquarium and proclaims, “We’ve finally discovered the true nature of water!” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes, “What the hell is water?”” In David Foster Wallace’s more famous version, the scientist is not a scientist but an old fish, who greets them saying, “Morning, boys. How’s the water?”

What is Water

The difference is not circumstantial. Foster Wallace’s version is an argument against “unconsciousness, the default setting, the rat race, the constant gnawing sense of having had, and lost, some infinite thing” – personified by the young fish – and an urgent call for awareness – personified by the old fish. But in Rana’s version, the matter is more hard-won: as long as they are fish, they haven’t the faintest apprehension of the very concept of water: even a wise old fish would fail to notice. In this adaptation, gaining awareness of that which is “so real and essential, so hidden in plain sight all around us, all the time” as Foster Wallace describes it, demands much more than just an effort in mindfulness. It demands imagining the unimaginable.

Albert Einstein once said that “Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces the entire world, and all there ever will be to know and understand.” But the question remains of how far our imagination can reach, and where the radius ends for us in “what there ever will be to know and understand”, versus that which happens to be. My earlier remark about LIGO scientists’ being far-out does not at all refer to a speculative disposition, which would characterise amateur anything-goes, and does go over-the-edge pseudo-science. Rather, it refers to the high level of creativity that is demanded of physicists today, and to the untiring curiosity that drives them to expand the limits of that radius, despite all odds.

The possibility of imagination has become an increasingly animating thought within my currently ongoing project:

As an independent curator of contemporary art, I travelled to Caltech for a 6-week period of research, towards developing an exhibition that will invite the public to engage with some of the highly challenging implications around the concept of time in physics. In it, I identify LIGO’s breakthrough detection of gravitational waves as an unparalleled incentive by which to acquire – in broad cultural terms – a new sense of time that departs from the old and now wholly inadequate one. After LIGO’s announcement proved that time fluctuation not only happens, but that it happened here, to us, on a precise date and time, it is finally possible for a broader public to relate, however abstract some of the concepts from the field of physics may remain. More simply put: we can finally sense that the water is moving.[1]

One century after Einstein’s Theory of General Relativity, most people continue to hold a highly impoverished idea of the nature of time, despite it being perhaps the most fundamental element of our existence. For 100 years there was no blame or shame in this. Because within all possible changes to the three main components of the universe – space, time & energy – the fluctuation of time was always the only one that escaped our sensorial capacities, existing exclusively in our minds, and finding its fullest expression in mathematical language. If you don’t speak mathematics, time fluctuation remains impossible to grasp, and painful to imagine.

But on February 11th, 2016, this situation changed dramatically.

On this date, a televised announcement told the world of the first-ever sensory detection of time-fluctuation, made with the aid of the most sensitive machine ever to be built by mankind. Finally, we have sensorial access to variations in all components of the universe as we know it. What is more, we observe the non-static passage of time through sound, thereby connecting it to the most affective of our senses.

Strain-waveforms_v2

Of course, LIGO’s detection is limited to time fluctuation and doesn’t yet make other mind-bending behaviours of time observable. But this is only circumstantial. The key point is that we can take this initial leap, and that it loosens our feet from the cramp of Newtonian fixity. Once in this state, gambolling over to ideas about zero time tunnelling, non-causality, or the future determining the present, for instance, is far more plausible, and no longer painful but rather seductive, at least, perhaps, for the playful at heart.

Taking a slight off-road (to be re-routed in a moment): there is a common misconception about children’s allegedly free-spirited creativity. Watching someone aged between around 4 and 15 draw a figure will demonstrate quite clearly just how taut they really are, and that they apply strict schemes that follow reality as they see and learn to see it. Bodies consistently have eyes, mouths, noses, heads, rumps and limbs, correctly placed and in increasingly realistic colours. Ask them to depart from these conventions – “draw one eye on his forehead”, “make her face green” – like masters such as Pablo Picasso and Henri Matisse have done – and they’ll likely become very upset (young adolescents being particularly conservative, reaching the point of panic when challenged to shed consensus).

This is not to compare the lay public (including myself) to children, but to suggest that there’s no inborn capacity – the unaffected, ‘genius’ naïveté that the modernist movements of Primitivism, Art Brut and Outsider Art exalted – for developing a creativity that is of substance. Arriving at a consequential idea, in both art and physics, entails a great deal of acumen and is far from gratuitous, however whimsical the moment in which it sometimes appears. And it’s also to suggest that there’s a necessary process of acquaintance – the knowledge of something through experience – in taking a cognitive leap away from the seemingly obvious nature of reality. If there’s some truth in this, then LIGO’s expansion of our sensorial access to the fluctuation of time, together with artistic approaches that lift the remaining questions and ambiguities of spacetime onto a relational, experiential plane, lay fertile ground on which to begin to foster a new sense of time – on a broad cultural level – however slowly it unfolds.

The first iteration of this project will be an exhibition, to take place in Berlin, in July 2017. It will feature existing and newly commissioned works by established and upcoming artists from Los Angeles and Berlin, working in sound, installation and video, to stage a series of immersive environments that invite the viewers’ bodily interaction.

Though the full selection cannot be disclosed just yet, I would like here to provide a glimpse of two works-in-progress by artist-duo Evelina Domnitch & Dmitry Gelfand, whom I invited to Los Angeles to collaborate in my research with LIGO, and whose contribution has been of great value to the project.

For more details on the exhibition, please stay tuned, and be warmly welcome to visit Berlin in July!

Text & images: courtesy of the artists.

ORBIHEDRON | 2017

A dark vortex in the middle of a water-filled basin emits prismatic bursts of rotating light. Akin to a radiant ergosphere surrounding a spinning black hole, Orbihedron evokes the relativistic as well as quantum interpretation of gravity – the reconciliation of which is essential for unravelling black hole behaviour and the origins of the cosmos. Descending into the eye of the vortex, a white laser beam reaches an impassable singularity that casts a whirling circular shadow on the basin’s floor. The singularity lies at the bottom of a dimple on the water’s surface, the crown of the vortex, which acts as a concave lens focussing the laser beam along the horizon of the “black hole” shadow. Light is seemingly swallowed by the black hole in accordance with general relativity, yet leaks out as quantum theory predicts.

ER = EPR | 2017

Two co-rotating vortices, joined together via a slender vortical bridge, lethargically drift through a body of water. Light hitting the water’s surface transforms the vortex pair into a dynamic lens, projecting two entangled black holes encircled by shimmering halos. As soon as the “wormhole” link between the black holes rips apart, the vortices immediately dissipate, analogously to the collapse of a wave function. Connecting distant black holes or two sides of the same black hole, might wormholes be an example of cosmic-scale quantum entanglement? This mind-bending conjecture of Juan Maldacena and Leonard Susskind can be traced back to two iconoclastic papers from 1935. Previously thought to be unrelated (both by their authors and numerous generations of readers), one article, the legendary EPR (penned by Einstein, Podolsky and Rosen) engendered the concept of quantum entanglement or “spooky action at a distance”; and the second text theorised Einstein-Rosen (ER) bridges, later known as wormholes. Although the widely read EPR paper has led to the second quantum revolution, currently paving the way to quantum simulation and computation, ER has enjoyed very little readership. By equating ER to EPR, the formerly irreconcilable paradigms of physics have the potential to converge: the phenomenon of gravity is imagined in a quantum mechanical context. The theory further implies, according to Maldacena, that the undivided, “reliable structure of space-time is due to the ghostly features of entanglement”.

 

[1] I am here extending our capacity to sense to that of the technology itself, which indeed measured the warping of spacetime. However, in interpreting gravitational waves from a human frame of reference (moving nowhere near the speed of light at which gravitational waves travel), they would seem to be spatial. In fact, the elongation of space (a longer wavelength) directly implies that time slows down (a longer wave-period), so that the two are indistinguishable.

 

Isabel de Sena


n-Category Café Postdocs in Sydney

Richard Garner writes:

The category theory group at Macquarie is currently advertising a two-year Postdoctoral Research Fellowship to work on a project entitled “Enriched categories: new applications in geometry and logic”.

Applications close 31st March. The position is expected to start in the second half of this year.

More information can be found at the following link:

http://jobs.mq.edu.au/cw/en/job/500525/postdoctoral-research-fellow

Feel free to contact me with further queries.

Richard Garner

March 08, 2017

Terence TaoOpen thread for mathematicians on the immigration executive order

The self-chosen remit of my blog is “Updates on my research and expository papers, discussion of open problems, and other maths-related topics”.  Of the 774 posts on this blog, I estimate that about 99% of the posts indeed relate to mathematics, mathematicians, or the administration of this mathematical blog, and only about 1% are not related to mathematics or the community of mathematicians in any significant fashion.

This is not one of the 1%.

Mathematical research is clearly an international activity.  But actually a stronger claim is true: mathematical research is a transnational activity, in that the specific nationality of individual members of a research team or research community is (or should be) of no appreciable significance for the purpose of advancing mathematics.  For instance, even during the height of the Cold War, there was no movement in (say) the United States to boycott Soviet mathematicians or theorems, or to only use results from Western literature (though the latter did sometimes happen by default, due to the limited avenues of information exchange between East and West, and the former did occasionally occur for political reasons, most notably with the Soviet Union preventing Gregory Margulis from traveling to receive his Fields Medal in 1978 EDIT: and also Sergei Novikov in 1970).    The national origin of even the most fundamental components of mathematics, whether it be the geometry (γεωμετρία) of the ancient Greeks, the algebra (الجبر) of the Islamic world, or the Hindu-Arabic numerals 0,1,\dots,9, is primarily of historical interest, and has only a negligible impact on the worldwide adoption of these mathematical tools. While it is true that individual mathematicians or research teams sometimes compete with each other to be the first to solve some desired problem, and that a citizen could take pride in the mathematical achievements of researchers from their country, one did not see any significant state-sponsored “space races” in which it was deemed in the national interest that a particular result ought to be proven by “our” mathematicians and not “theirs”.   Mathematical research ability is highly non-fungible, and the value added by foreign students and faculty to a mathematics department cannot be completely replaced by an equivalent amount of domestic students and faculty, no matter how large and well educated the country (though a state can certainly work at the margins to encourage and support more domestic mathematicians).  It is no coincidence that all of the top mathematics departments worldwide actively recruit the best mathematicians regardless of national origin, and often retain immigration counsel to assist with situations in which these mathematicians come from a country that is currently politically disfavoured by their own.

Of course, mathematicians cannot ignore the political realities of the modern international order altogether.  Anyone who has organised an international conference or program knows that there will inevitably be visa issues to resolve because the host country makes it particularly difficult for certain nationals to attend the event.  I myself, like many other academics working long-term in the United States, have certainly experienced my own share of immigration bureaucracy, starting with various glitches in the renewal or application of my J-1 and O-1 visas, then to the lengthy vetting process for acquiring permanent residency (or “green card”) status, and finally to becoming naturalised as a US citizen (retaining dual citizenship with Australia).  Nevertheless, while the process could be slow and frustrating, there was at least an order to it.  The rules of the game were complicated, but were known in advance, and did not abruptly change in the middle of playing it (save in truly exceptional situations, such as the days after the September 11 terrorist attacks).  One just had to study the relevant visa regulations (or hire an immigration lawyer to do so), fill out the paperwork and submit to the relevant background checks, and remain in good standing until the application was approved in order to study, work, or participate in a mathematical activity held in another country.  On rare occasion, some senior university administrator may have had to contact a high-ranking government official to approve some particularly complicated application, but for the most part one could work through normal channels in order to ensure for instance that the majority of participants of a conference could actually be physically present at that conference, or that an excellent mathematician hired by unanimous consent by a mathematics department could in fact legally work in that department.

With the recent and highly publicised executive order on immigration, many of these fundamental assumptions have been seriously damaged, if not destroyed altogether.  Even if the order was withdrawn immediately, there is no longer an assurance, even for nationals not initially impacted by that order, that some similar abrupt and major change in the rules for entry to the United States could not occur, for instance for a visitor who has already gone through the lengthy visa application process and background checks, secured the appropriate visa, and is already in flight to the country.  This is already affecting upcoming or ongoing mathematical conferences or programs in the US, with many international speakers (including those from countries not directly affected by the order) now cancelling their visit, either in protest or in concern about their ability to freely enter and leave the country.  Even some conferences outside the US are affected, as some mathematicians currently in the US with a valid visa or even permanent residency are uncertain if they could ever return back to their place of work if they left the country to attend a meeting.  In the slightly longer term, it is likely that the ability of elite US institutions to attract the best students and faculty will be seriously impacted.  Again, the losses would be strongest regarding candidates that were nationals of the countries affected by the current executive order, but I fear that many other mathematicians from other countries would now be much more concerned about entering and living in the US than they would have previously.

It is still possible for this sort of long-term damage to the mathematical community (both within the US and abroad) to be reversed or at least contained, but at present there is a real risk of the damage becoming permanent.  To prevent this, it seems insufficient for me for the current order to be rescinded, as desirable as that would be; some further legislative or judicial action would be needed to begin restoring enough trust in the stability of the US immigration and visa system that the international travel that is so necessary to modern mathematical research becomes “just” a bureaucratic headache again.

Of course, the impact of this executive order is far, far broader than just its effect on mathematicians and mathematical research.  But there are countless other venues on the internet and elsewhere to discuss these other aspects (or politics in general).  (For instance, discussion of the qualifications, or lack thereof, of the current US president can be carried out at this previous post.) I would therefore like to open this post to readers to discuss the effects or potential effects of this order on the mathematical community; I particularly encourage mathematicians who have been personally affected by this order to share their experiences.  As per the rules of the blog, I request that “the discussions are kept constructive, polite, and at least tangentially relevant to the topic at hand”.

Some relevant links (please feel free to suggest more, either through comments or by email):


Filed under: math.HO, non-technical, opinion

March 06, 2017

Jordan EllenbergPeter Norvig, the meaning of polynomials, debugging as psychotherapy

I saw Peter Norvig give a great general-audience talk on AI at Berkeley when I was there last month.  A few notes from his talk.

  • “We have always prioritized fast and cheap over safety and privacy — maybe this time we can make better choices.”
  • He briefly showed a demo where, given values of a polynomial, a machine can put together a few lines of code that successfully computes the polynomial.  But the code looks weird to a human eye.  To compute some quadratic, it nests for-loops and adds things up in a funny way that ends up giving the right output (see the toy sketch after this list).  So has it really “learned” the polynomial?  I think in computer science, you typically feel you’ve learned a function if you can accurately predict its value on a given input.  For an algebraist like me, a function determines but isn’t determined by the values it takes; to me, there’s something about that quadratic polynomial the machine has failed to grasp.  I don’t think there’s a right or wrong answer here, just a cultural difference to be aware of.  Relevant:  Norvig’s description of “the two cultures” at the end of this long post on natural language processing (which is interesting all the way through!)
  • Norvig made the point that traditional computer programs are very modular, leading to a highly successful debugging tradition of zeroing in on the precise part of the program that is doing something wrong, then fixing that part.  An algorithm or process developed by a machine, by contrast, may not have legible “parts”!  If a neural net is screwing up when classifying something, there’s no meaningful way to say “this neuron is the problem, let’s fix it.”  We’re dealing with highly non-modular complex systems which have evolved into a suboptimally functioning state, and you have to find a way to improve function which doesn’t involve taking the thing apart and replacing the broken component.  Of course, we already have a large professional community that works on exactly this problem.  They’re called therapists.  And I wonder whether the future of debugging will look a lot more like clinical psychology than it does like contemporary software engineering.
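
Here is a hypothetical toy version (my own Python, not Norvig’s actual demo) of the kind of strange-but-correct code such a system might emit for the quadratic 3x^2 + 2x + 1:

def quad_direct(x):
    return 3 * x * x + 2 * x + 1

def quad_synthesized(x):
    # nested for-loops that add things up in a funny way, yet give the right output
    total = 1
    for _ in range(x):
        total += 2
        for _ in range(x):
            total += 3
    return total

# the two agree on every nonnegative integer input
print(all(quad_direct(x) == quad_synthesized(x) for x in range(50)))   # True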

Chad OrzelPhysics Blogging Round-Up: February

Another month, another collection of physics posts from Forbes:

Quantum Loopholes And The Problem Of Free Will: In one of those odd bits of synchronicity, a previous post about whether dark matter and energy might affect atoms in a way that allowed for “free will” was followed shortly by a news release about an experiment looking at quantum entanglement with astronomical sources acting as “random number generators.” This pushes the point when local interactions might’ve generated any correlation between measurements back in time a thousand-plus years, which in turn ties into the question of “free will.”

Scientific Knowledge Is Made To Be Used: Some thoughts on a division in attitudes between science and other academic disciplines, where the way we do science naturally leads to more discussion of applications.

Why Writing About Math Is The Best Part Of Common Core: In which I say nice things about the way my kids are being taught about math.

Why Do We Spend So Much Time Teaching Historical Physics?: I’m teaching the badly misnamed “modern physics” course this term, and finding it frustrating because the book I’m using isn’t historical enough.

How Do You Create Quantum Entanglement?: Prompted by a conversation with a colleague from history, a sketch of the main ways experimental physicists establish correlations between the quantum states of particles.

A good month traffic-wise, though I was surprised by the detailed dynamics of some of these– in particular, I expected an immediate negative response to the Common Core thing, but in fact that took a while to take off, and most of the response was positive. The only one that didn’t do well by my half-joking metric of “Get more views than there are students at Union” was the science-knowledge one, and that was probably justified as it was just kind of noodling around.

So, there was February. March brings with it the end of my current crushingly heavy teaching load, which should give a little more opportunity for substantive blogging. Maybe.

Jordan EllenbergEllenbergs

I got a message last week from the husband of my first cousin once removed;  his father-in-law, Leonard Ellenberg, was my grandfather Julius Ellenberg’s brother.  I never knew my grandfather; he died before I was born, and I was named for him.

The message contained a huge amount of information about a side of my family I’ve never known well.  I’m still going through it all.  But I wanted to share some of it while it was on my mind.

Here’s the manifest for the voyage of the S.S. Polonia, which left Danzig on September 17, 1923 and arrived in New York on October 1.

owadias-ellenbergs-family-immgration-doc

Owadje Ellenberg (always known as Owadia in my family) was my great-grandfather.  He came to New York with his wife Sura-Fejga (known to us as Sara), Markus (Max), Etia-Race (Ethel), Leon (Leonard), Samuel and Bernard.  Sara was seven months pregnant with my uncle Morris Ellenberg, the youngest child.

Owadje gives his occupation as “mason”; his son Max, only 17, was listed as “tailor.”  They came from Stanislawow, Poland, which is now the city of Ivano-Frankivsk in Ukraine.  On the immigration form you had to list a relative in your country of origin; Owadje listed his brother, Zacharja, who lived on Zosina Wola 6 in Stanislawow.  None of the old street names have survived to the present, but looking at this old map of Stanislawow

stanislawow

it seems pretty clear Zosina Wola is the present day Yevhena Konoval’tsya Street.  I have no way of knowing whether the numbering changed, but #6 Yevhena Konoval’tsya St. seems to be the setback building here:

screen-shot-2017-03-03-at-3-mar-10-53-pm

So this is the best guess I have as to where my ancestors lived in the old country.  The name Zosina Wola lives on only in the name of a bar a few blocks down Yevhena Konoval’tsya:

screen-shot-2017-03-03-at-3-mar-11-02-pm

 

Owadje, now Owadia, files a declaration of intention to naturalize in 1934:

owadia-ellenbergs-naturalization-doc

His signature is almost as bad as mine!  By 1934 he’s living in Borough Park, Brooklyn, a plasterer.  5 foot 7 and 160lb; I think every subsequent Ellenberg man has been that size by the age of 15.  Shtetl nutrition.  There are two separate questions on this form, “color” and “race”:  for color he puts white, for race he puts “Hebrew.”  What did other Europeans put for race?  He puts his hometown as Sopoff, which I think must be the modern Sopiv; my grandmother Sara was from Obertyn, quite close by.  I guess they moved to the big city, Stanislowow, about 40 miles away, when they were pretty young; they got married there in 1902, when they were 21.  The form says he previously filed a declaration of intention in 1926.  What happened?  Did he just not follow through, or was his naturalization rejected?  Did he ever become a citizen?  I don’t know.

Here’s what his house in Brooklyn looks like now:

screen-shot-2017-03-04-at-4-mar-11-05-am

 

Did you notice whose name was missing from the Polonia’s manifest?  Owadje’s oldest son, my grandfather, Julius.  Except one thing I’ve learned from all this is that I don’t actually know what my grandfather’s name was.  Julius is what we called him.  But my dad says his passport says “Israel Ellenberg.”  And his naturalization papers

julius-ellenberg-naturalization-doc

have him as “Juda Ellenberg”  (Juda being the Anglicization of Yehuda, his and my Hebrew name.)  So didn’t that have to be his legal name?  But how could that not be on his passport?

Update:  Cousin Phyllis came through for me!  My grandfather legally changed his name to Julius on June 13, 1927, four months after he filed for naturalization.    

1927-juda-ellenberg

My grandfather was the first to come to America, in December 1920, and he came alone.  He was 16.  He managed to make enough money to bring the whole rest of the family in late 1923, which was a good thing because in May 1924 Calvin Coolidge signed the Johnson-Reed Act which clamped down on immigration by people thought to be debasing the American racial stock:  among these were Italians, Chinese, Czechs, Spaniards, and Jews, definitely Jews.

Another thing I didn’t know:  my grandfather lists his port of entry as Vanceboro, Maine.  That’s not a seaport; it’s a small town on the Canadian border.  So Julius/Juda/Israel must have sailed to Canada; this I never knew.  Where would he have landed? Sounds like most Canadian immigrants landed at Quebec or Halifax, and Halifax makes much more sense if he entered the US at Vanceboro.  But why did he sail to Canada instead of the US?  And why did he leave from France (the form says “Montrese, France,” a place I can’t find) instead of Poland?

In 1927, when he naturalized, Julius lived at 83 2nd Avenue, a building built in 1900 at the boundary of the Bowery and the East Village.  Here’s what it looks like now:

screen-shot-2017-03-04-at-4-mar-10-51-am

Not a lot of new immigrants able to afford rent there these days, I’m betting.  Later he’d move to Long Beach, Long Island, where my father and his sisters grew up.

My first-cousin-once-removed-in-law went farther back, too, all the way back to Mojżesz Ellenberg, who was born sometime in the middle of the 18th century.  The Hapsburg Empire required Jews to adopt surnames only in 1787; so Mojżesz could very well have been the first Ellenberg.  You may be thinking he’s Owadia’s father’s father’s father, but no — Ellenberg was Owadia’s mother’s name.  I was puzzled by this but actually it was common.  What it meant is that Mordko Kasirer, Owadia’s father, didn’t want to pay the fee for a civil marriage — why should he, when he was already married to Rivka Ellenberg in the synagogue?  But if you weren’t legally married, your children weren’t allowed to take their father’s surname.  So be it.  Mordko wasn’t gonna get ripped off by the system.  Definitely my relative.

Update:  Cousin Phyllis Rosner sends me my grandfather’s birth record.  At birth in Poland he’s Izrael Juda Ellenberg.  This still doesn’t answer what his legal name in the US was, but it explains the passport!

1904-izrael-juda-ellenberg-birth

March 04, 2017

Terence TaoSpecial cases of Shannon entropy

Given a random variable {X} that takes on only finitely many values, we can define its Shannon entropy by the formula

\displaystyle  H(X) := \sum_x \mathbf{P}(X=x) \log \frac{1}{\mathbf{P}(X=x)}

with the convention that {0 \log \frac{1}{0} = 0}. (In some texts, one uses the logarithm to base {2} rather than the natural logarithm, but the choice of base will not be relevant for this discussion.) This is clearly a nonnegative quantity. Given two random variables {X,Y} taking on finitely many values, the joint variable {(X,Y)} is also a random variable taking on finitely many values, and also has an entropy {H(X,Y)}. It obeys the Shannon inequalities

\displaystyle  H(X), H(Y) \leq H(X,Y) \leq H(X) + H(Y)

so we can define some further nonnegative quantities, the mutual information

\displaystyle  I(X:Y) := H(X) + H(Y) - H(X,Y)

and the conditional entropies

\displaystyle  H(X|Y) := H(X,Y) - H(Y); \quad H(Y|X) := H(X,Y) - H(X).

More generally, given three random variables {X,Y,Z}, one can define the conditional mutual information

\displaystyle  I(X:Y|Z) := H(X|Z) + H(Y|Z) - H(X,Y|Z)

and the last of the Shannon entropy inequalities asserts that this quantity is also non-negative.

The mutual information {I(X:Y)} is a measure of the extent to which {X} and {Y} fail to be independent; indeed, it is not difficult to show that {I(X:Y)} vanishes if and only if {X} and {Y} are independent. Similarly, {I(X:Y|Z)} vanishes if and only if {X} and {Y} are conditionally independent relative to {Z}. At the other extreme, {H(X|Y)} is a measure of the extent to which {X} fails to depend on {Y}; indeed, it is not difficult to show that {H(X|Y)=0} if and only if {X} is determined by {Y} in the sense that there is a deterministic function {f} such that {X = f(Y)}. In a related vein, if {X} and {X'} are equivalent in the sense that there are deterministic functional relationships {X = f(X')}, {X' = g(X)} between the two variables, then {X} is interchangeable with {X'} for the purposes of computing the above quantities, thus for instance {H(X) = H(X')}, {H(X,Y) = H(X',Y)}, {I(X:Y) = I(X':Y)}, {I(X:Y|Z) = I(X':Y|Z)}, etc..
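
As a concrete companion to these definitions, here is a minimal Python sketch of my own that computes entropies and mutual information directly from a finite joint distribution; the toy distribution at the end (a uniform bit {X} and a noisy copy {Y}) is an arbitrary choice for illustration.

import math
from collections import defaultdict
from itertools import product

def H(pmf):
    # Shannon entropy (natural log) of a pmf given as {outcome: probability}
    return sum(p * math.log(1.0 / p) for p in pmf.values() if p > 0)

def marginal(joint, keep):
    # marginal pmf of the coordinates listed in `keep`, from a joint pmf over tuples
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out

def mutual_information(joint):
    # I(X:Y) = H(X) + H(Y) - H(X,Y) for a joint pmf over pairs (x, y)
    return H(marginal(joint, [0])) + H(marginal(joint, [1])) - H(joint)

# toy example: X uniform on {0,1}, and Y agrees with X three quarters of the time
joint = {(x, y): 0.5 * (0.75 if x == y else 0.25) for x, y in product([0, 1], repeat=2)}
print(H(marginal(joint, [0])), H(marginal(joint, [1])))   # both equal log 2
print(mutual_information(joint))                          # positive: X and Y are correlated
print(H(joint) - H(marginal(joint, [1])))                 # the conditional entropy H(X|Y)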

One can get some initial intuition for these information-theoretic quantities by specialising to a simple situation in which all the random variables {X} being considered come from restricting a single random (and uniformly distributed) boolean function {F: \Omega \rightarrow \{0,1\}} on a given finite domain {\Omega} to some subset {A} of {\Omega}:

\displaystyle  X = F \downharpoonright_A.

In this case, {X} has the law of a random uniformly distributed boolean function from {A} to {\{0,1\}}, and the entropy here can be easily computed to be {|A| \log 2}, where {|A|} denotes the cardinality of {A}. If {X} is the restriction of {F} to {A}, and {Y} is the restriction of {F} to {B}, then the joint variable {(X,Y)} is equivalent to the restriction of {F} to {A \cup B}. If one discards the normalisation factor {\log 2}, one then obtains the following dictionary between entropy and the combinatorics of finite sets:

Random variables {X,Y,Z}   <->   Finite sets {A,B,C}
Entropy {H(X)}   <->   Cardinality {|A|}
Joint variable {(X,Y)}   <->   Union {A \cup B}
Mutual information {I(X:Y)}   <->   Intersection cardinality {|A \cap B|}
Conditional entropy {H(X|Y)}   <->   Set difference cardinality {|A \backslash B|}
Conditional mutual information {I(X:Y|Z)}   <->   {|(A \cap B) \backslash C|}
{X, Y} independent   <->   {A, B} disjoint
{X} determined by {Y}   <->   {A} a subset of {B}
{X,Y} conditionally independent relative to {Z}   <->   {A \cap B \subset C}
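
A couple of rows of this dictionary can be verified by brute force. The following Python sketch (mine, with {\Omega}, {A}, {B} chosen arbitrarily) enumerates all boolean functions on a four-element set and checks that {H(X) = |A| \log 2} and {I(X:Y) = |A \cap B| \log 2}.

import math
from collections import Counter
from itertools import product

Omega = range(4)
A, B = {0, 1, 2}, {1, 2, 3}

def H(counts):
    # entropy of the (here exactly uniform) distribution recorded in `counts`
    total = sum(counts.values())
    return sum((c / total) * math.log(total / c) for c in counts.values())

# X = F restricted to A, Y = F restricted to B, with F uniform over all boolean functions on Omega
X, Y, XY = Counter(), Counter(), Counter()
for F in product([0, 1], repeat=len(Omega)):
    x = tuple(F[i] for i in sorted(A))
    y = tuple(F[i] for i in sorted(B))
    X[x] += 1; Y[y] += 1; XY[(x, y)] += 1

log2 = math.log(2)
print(H(X) / log2, len(A))                        # 3.0 and 3:  H(X) = |A| log 2
print((H(X) + H(Y) - H(XY)) / log2, len(A & B))   # 2.0 and 2:  I(X:Y) = |A ∩ B| log 2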

Every (linear) inequality or identity about entropy (and related quantities, such as mutual information) then specialises to a combinatorial inequality or identity about finite sets that is easily verified. For instance, the Shannon inequality {H(X,Y) \leq H(X)+H(Y)} becomes the union bound {|A \cup B| \leq |A| + |B|}, and the definition of mutual information becomes the inclusion-exclusion formula

\displaystyle  |A \cap B| = |A| + |B| - |A \cup B|.

For a more advanced example, consider the data processing inequality that asserts that if {X, Z} are conditionally independent relative to {Y}, then {I(X:Z) \leq I(X:Y)}. Specialising to sets, this now says that if {A, C} are disjoint outside of {B}, then {|A \cap C| \leq |A \cap B|}; this can be made apparent by considering the corresponding Venn diagram. This dictionary also suggests how to prove the data processing inequality using the existing Shannon inequalities. Firstly, if {A} and {C} are not necessarily disjoint outside of {B}, then a consideration of Venn diagrams gives the more general inequality

\displaystyle  |A \cap C| \leq |A \cap B| + |(A \cap C) \backslash B|

and a further inspection of the diagram then reveals the more precise identity

\displaystyle  |A \cap C| + |(A \cap B) \backslash C| = |A \cap B| + |(A \cap C) \backslash B|.

Using the dictionary in the reverse direction, one is then led to conjecture the identity

\displaystyle  I( X : Z ) + I( X : Y | Z ) = I( X : Y ) + I( X : Z | Y )

which (together with non-negativity of conditional mutual information) implies the data processing inequality, and this identity is in turn easily established from the definition of mutual information.
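
The identity can also be tested numerically on a randomly generated joint distribution; here is a Python sketch of my own (the alphabet sizes, seed, and tolerance are arbitrary choices):

import math
import random
from collections import defaultdict

random.seed(0)

# a random joint pmf for (X, Y, Z), each variable taking three values
weights = {(x, y, z): random.random() for x in range(3) for y in range(3) for z in range(3)}
total = sum(weights.values())
joint = {k: w / total for k, w in weights.items()}

def H(pmf):
    return sum(p * math.log(1.0 / p) for p in pmf.values() if p > 0)

def marg(keep):
    out = defaultdict(float)
    for k, p in joint.items():
        out[tuple(k[i] for i in keep)] += p
    return out

def I(i, j, given=()):
    # I(X_i : X_j | X_given) = H(i, given) + H(j, given) - H(i, j, given) - H(given)
    g = list(given)
    return H(marg([i] + g)) + H(marg([j] + g)) - H(marg([i, j] + g)) - H(marg(g))

lhs = I(0, 2) + I(0, 1, given=[2])   # I(X:Z) + I(X:Y|Z)
rhs = I(0, 1) + I(0, 2, given=[1])   # I(X:Y) + I(X:Z|Y)
print(abs(lhs - rhs) < 1e-12)        # True: both sides equal I(X : Y,Z)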

On the other hand, not every assertion about cardinalities of sets generalises to entropies of random variables that are not arising from restricting random boolean functions to sets. For instance, a basic property of sets is that disjointness from a given set {C} is preserved by unions:

\displaystyle  A \cap C = B \cap C = \emptyset \implies (A \cup B) \cap C = \emptyset.

Indeed, one has the union bound

\displaystyle  |(A \cup B) \cap C| \leq |A \cap C| + |B \cap C|. \ \ \ \ \ (1)

Applying the dictionary in the reverse direction, one might now conjecture that if {X} was independent of {Z} and {Y} was independent of {Z}, then {(X,Y)} should also be independent of {Z}, and furthermore that

\displaystyle  I(X,Y:Z) \leq I(X:Z) + I(Y:Z)

but these statements are well known to be false (for reasons related to pairwise independence of random variables being strictly weaker than joint independence). For a concrete counterexample, one can take {X, Y \in {\bf F}_2} to be independent, uniformly distributed random elements of the finite field {{\bf F}_2} of two elements, and take {Z := X+Y} to be the sum of these two field elements. One can easily check that each of {X} and {Y} is separately independent of {Z}, but the joint variable {(X,Y)} determines {Z} and thus is not independent of {Z}.
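
Here is a short Python check of my own of this counterexample, computing the relevant mutual informations straight from the joint distribution of {(X,Y,Z)}:

import math
from itertools import product

# X, Y independent uniform bits, and Z = X + Y over F_2
joint = {(x, y, (x + y) % 2): 0.25 for x, y in product([0, 1], repeat=2)}

def H(pmf):
    return sum(p * math.log(1.0 / p) for p in pmf.values() if p > 0)

def marg(keep):
    out = {}
    for k, p in joint.items():
        key = tuple(k[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_info(left, right):
    # I(left : right) = H(left) + H(right) - H(left, right)
    return H(marg(left)) + H(marg(right)) - H(marg(left + right))

print(mutual_info([0], [2]), mutual_info([1], [2]))   # both 0: X, Y each independent of Z
print(mutual_info([0, 1], [2]), math.log(2))          # log 2: the pair (X,Y) determines Z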

From the inclusion-exclusion identities

\displaystyle  |A \cap C| = |A| + |C| - |A \cup C|

\displaystyle  |B \cap C| = |B| + |C| - |B \cup C|

\displaystyle  |(A \cup B) \cap C| = |A \cup B| + |C| - |A \cup B \cup C|

\displaystyle  |A \cap B \cap C| = |A| + |B| + |C| - |A \cup B| - |B \cup C| - |A \cup C| + |A \cup B \cup C|

one can check that (1) is equivalent to the trivial lower bound {|A \cap B \cap C| \geq 0}. The basic issue here is that in the dictionary between entropy and combinatorics, there is no satisfactory entropy analogue of the notion of a triple intersection {A \cap B \cap C}. (Even the double intersection {A \cap B} only exists information theoretically in a “virtual” sense; the mutual information {I(X:Y)} allows one to “compute the entropy” of this “intersection”, but does not actually describe this intersection itself as a random variable.)

However, this issue only arises with three or more variables; it is not too difficult to show that the only linear equalities and inequalities that are necessarily obeyed by the information-theoretic quantities {H(X), H(Y), H(X,Y), I(X:Y), H(X|Y), H(Y|X)} associated to just two variables {X,Y} are those that are also necessarily obeyed by their combinatorial analogues {|A|, |B|, |A \cup B|, |A \cap B|, |A \backslash B|, |B \backslash A|}. (See for instance the Venn diagram at the Wikipedia page for mutual information for a pictorial summation of this statement.)

One can work with a larger class of special cases of Shannon entropy by working with random linear functions rather than random boolean functions. Namely, let {S} be some finite-dimensional vector space over a finite field {{\mathbf F}}, and let {f: S \rightarrow {\mathbf F}} be a random linear functional on {S}, selected uniformly among all such functions. Every subspace {U} of {S} then gives rise to a random variable {X = X_U: U \rightarrow {\mathbf F}} formed by restricting {f} to {U}. This random variable is also distributed uniformly amongst all linear functions on {U}, and its entropy can be easily computed to be {\mathrm{dim}(U) \log |\mathbf{F}|}. Given two random variables {X, Y} formed by restricting {f} to {U, V} respectively, the joint random variable {(X,Y)} determines the random linear function {f} on the union {U \cup V} of the two spaces, and thus by linearity on the Minkowski sum {U+V} as well; thus {(X,Y)} is equivalent to the restriction of {f} to {U+V}. In particular, {H(X,Y) = \mathrm{dim}(U+V) \log |\mathbf{F}|}. This implies that {I(X:Y) = \mathrm{dim}(U \cap V) \log |\mathbf{F}|} and also {H(X|Y) = \mathrm{dim}(\pi_V(U)) \log |\mathbf{F}|}, where {\pi_V: S \rightarrow S/V} is the quotient map. After discarding the normalising constant {\log |\mathbf{F}|}, this leads to the following dictionary between information theoretic quantities and linear algebra quantities, analogous to the previous dictionary:

Random variables {X,Y,Z} | Subspaces {U,V,W}
Entropy {H(X)} | Dimension {\mathrm{dim}(U)}
Joint variable {(X,Y)} | Sum {U+V}
Mutual information {I(X:Y)} | Dimension of intersection {\mathrm{dim}(U \cap V)}
Conditional entropy {H(X|Y)} | Dimension of projection {\mathrm{dim}(\pi_V(U))}
Conditional mutual information {I(X:Y|Z)} | {\mathrm{dim}(\pi_W(U) \cap \pi_W(V))}
{X, Y} independent | {U, V} transverse ({U \cap V = \{0\}})
{X} determined by {Y} | {U} a subspace of {V}
{X,Y} conditionally independent relative to {Z} | {\pi_W(U)}, {\pi_W(V)} transverse.

The combinatorial dictionary can be regarded as a specialisation of the linear algebra dictionary, by taking {S} to be the vector space {\mathbf{F}_2^\Omega} over the finite field {\mathbf{F}_2} of two elements, and only considering those subspaces {U} that are coordinate subspaces {U = {\bf F}_2^A} associated to various subsets {A} of {\Omega}.
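Both dictionaries can be checked mechanically on small examples. The following rough sketch (an illustration, not from the post) works in the linear algebra model with {S = {\bf F}_2^4}: a random linear functional is represented by its coefficient vector {a}, its restriction to a subspace by its values on that subspace's generators, and the particular subspaces {U, V} are just an arbitrary pair with a one-dimensional intersection:

    from itertools import product
    from collections import Counter
    from math import log2

    def rank_F2(vectors):
        # rank over F_2 of a list of 0/1 vectors, by Gaussian elimination
        rows = [list(v) for v in vectors]
        rank = 0
        for col in range(len(rows[0])):
            pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
            if pivot is None:
                continue
            rows[rank], rows[pivot] = rows[pivot], rows[rank]
            for r in range(len(rows)):
                if r != rank and rows[r][col]:
                    rows[r] = [(a + b) % 2 for a, b in zip(rows[r], rows[rank])]
            rank += 1
        return rank

    def H(samples):
        # Shannon entropy (in bits) of the empirical distribution of a list of outcomes
        n = len(samples)
        return -sum(c/n * log2(c/n) for c in Counter(samples).values())

    U = [(1, 0, 0, 0), (0, 1, 0, 0)]   # generators of U
    V = [(0, 1, 0, 0), (0, 0, 1, 0)]   # generators of V

    def restrict(a, gens):
        # values of the functional x -> a.x on the given generators
        return tuple(sum(ai * gi for ai, gi in zip(a, g)) % 2 for g in gens)

    functionals = list(product([0, 1], repeat=4))   # all linear functionals on F_2^4
    X = [restrict(a, U) for a in functionals]
    Y = [restrict(a, V) for a in functionals]

    print(H(X), rank_F2(U))                           # H(X) = dim(U)  (here log|F| = 1 bit)
    print(H(list(zip(X, Y))), rank_F2(U + V))         # H(X,Y) = dim(U + V)
    print(H(X) + H(Y) - H(list(zip(X, Y))),
          rank_F2(U) + rank_F2(V) - rank_F2(U + V))   # I(X:Y) = dim of the intersection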

As before, every linear inequality or equality that is valid for the information-theoretic quantities discussed above, is automatically valid for the linear algebra counterparts for subspaces of a vector space over a finite field by applying the above specialisation (and dividing out by the normalising factor of {\log |\mathbf{F}|}). In fact, the requirement that the field be finite can be removed by applying the compactness theorem from logic (or one of its relatives, such as Los’s theorem on ultraproducts, as done in this previous blog post).

The linear algebra model captures more of the features of Shannon entropy than the combinatorial model. For instance, in contrast to the combinatorial case, it is possible in the linear algebra setting to have subspaces {U,V,W} such that {U} and {V} are separately transverse to {W}, but their sum {U+V} is not; for instance, in a two-dimensional vector space {{\bf F}^2}, one can take {U,V,W} to be the one-dimensional subspaces spanned by {(0,1)}, {(1,0)}, and {(1,1)} respectively. Note that this is essentially the same counterexample from before (which took {{\bf F}} to be the field of two elements). Indeed, one can show that any necessarily true linear inequality or equality involving the dimensions of three subspaces {U,V,W} (as well as the various other quantities on the above table) will also be necessarily true when applied to the entropies of three discrete random variables {X,Y,Z} (as well as the corresponding quantities on the above table).

However, the linear algebra model does not completely capture the subtleties of Shannon entropy once one works with four or more variables (or subspaces). This was first observed by Ingleton, who established the dimensional inequality

\displaystyle  \mathrm{dim}(U \cap V) \leq \mathrm{dim}(\pi_W(U) \cap \pi_W(V)) + \mathrm{dim}(\pi_X(U) \cap \pi_X(V)) + \mathrm{dim}(W \cap X) \ \ \ \ \ (2)

for any subspaces {U,V,W,X}. This is easiest to see when the three terms on the right-hand side vanish; then {\pi_W(U), \pi_W(V)} are transverse, which implies that {U\cap V \subset W}; similarly {U \cap V \subset X}. But {W} and {X} are transverse, and this clearly implies that {U} and {V} are themselves transverse. To prove the general case of Ingleton’s inequality, one can define {Y := U \cap V} and use {\mathrm{dim}(\pi_W(Y)) \leq \mathrm{dim}(\pi_W(U) \cap \pi_W(V))} (and similarly for {X} instead of {W}) to reduce to establishing the inequality

\displaystyle  \mathrm{dim}(Y) \leq \mathrm{dim}(\pi_W(Y)) + \mathrm{dim}(\pi_X(Y)) + \mathrm{dim}(W \cap X) \ \ \ \ \ (3)

which can be rearranged using {\mathrm{dim}(\pi_W(Y)) = \mathrm{dim}(Y) - \mathrm{dim}(W) + \mathrm{dim}(\pi_Y(W))} (and similarly for {X} instead of {W}) and {\mathrm{dim}(W \cap X) = \mathrm{dim}(W) + \mathrm{dim}(X) - \mathrm{dim}(W + X)} as

\displaystyle  \mathrm{dim}(W + X ) \leq \mathrm{dim}(\pi_Y(W)) + \mathrm{dim}(\pi_Y(X)) + \mathrm{dim}(Y)

but this is clear since {\mathrm{dim}(W + X ) \leq \mathrm{dim}(\pi_Y(W) + \pi_Y(X)) + \mathrm{dim}(Y)}.

Returning to the entropy setting, the analogue

\displaystyle  H( V ) \leq H( V | Z ) + H(V | W ) + I(Z:W)

of (3) is true (exercise!), but the analogue

\displaystyle  I(X:Y) \leq I(X:Y|Z) + I(X:Y|W) + I(Z:W) \ \ \ \ \ (4)

of Ingleton’s inequality is false in general. Again, this is easiest to see when all the terms on the right-hand side vanish; then {X,Y} are conditionally independent relative to {Z}, and relative to {W}, and {Z} and {W} are independent, and the claim (4) would then be asserting that {X} and {Y} are independent. While there is no linear counterexample to this statement, there are simple non-linear ones: for instance, one can take {Z,W} to be independent uniform variables from {\mathbf{F}_2}, and take {X} and {Y} to be (say) {ZW} and {(1-Z)(1-W)} respectively (thus {X, Y} are the indicators of the events {Z=W=1} and {Z=W=0} respectively). Once one conditions on either {Z} or {W}, one of {X,Y} has positive conditional entropy and the other has zero entropy, and so {X, Y} are conditionally independent relative to either {Z} or {W}; also, {Z} and {W} are independent of each other. But {X} and {Y} are not independent of each other (they cannot be simultaneously equal to {1}). Somehow, the feature of the linear algebra model that is not present in general is that in the linear algebra setting, every pair of subspaces {U, V} has a well-defined intersection {U \cap V} that is also a subspace, whereas for arbitrary random variables {X, Y}, there does not necessarily exist the analogue of an intersection, namely a “common information” random variable {V} that has the entropy of {I(X:Y)} and is determined by each of {X} and {Y} separately.
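The counterexample above is easy to confirm numerically. In the following sketch (entropies in bits, uniform distribution on the four values of {(Z,W)}), the left-hand side of (4) comes out to about {0.12} bits while every term on the right-hand side vanishes:

    from collections import Counter
    from itertools import product
    from math import log2

    def H(samples):
        # Shannon entropy (in bits) of the empirical distribution of a list of outcomes
        n = len(samples)
        return -sum(c/n * log2(c/n) for c in Counter(samples).values())

    def I(a, b):
        # mutual information I(A:B)
        return H(a) + H(b) - H(list(zip(a, b)))

    def I_cond(a, b, c):
        # conditional mutual information I(A:B|C) = H(A,C) + H(B,C) - H(C) - H(A,B,C)
        return (H(list(zip(a, c))) + H(list(zip(b, c)))
                - H(c) - H(list(zip(a, b, c))))

    # Z, W independent and uniform on {0,1}; X, Y indicate the events Z=W=1 and Z=W=0
    rows = [(z * w, (1 - z) * (1 - w), z, w) for z, w in product([0, 1], repeat=2)]
    X, Y, Z, W = (list(col) for col in zip(*rows))

    lhs = I(X, Y)
    rhs = I_cond(X, Y, Z) + I_cond(X, Y, W) + I(Z, W)
    print(lhs, rhs)   # roughly 0.1226 versus 0.0, so the entropy analogue (4) of Ingleton fails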

I do not know if there is any simpler model of Shannon entropy that captures all the inequalities available for four variables. One significant complication is that there exist some information inequalities in this setting that are not of Shannon type, such as the Zhang-Yeung inequality

\displaystyle  I(X:Y) \leq 2 I(X:Y|Z) + I(X:Z|Y) + I(Y:Z|X)

\displaystyle + I(X:Y|W) + I(Z:W).

One can however still use these simpler models of Shannon entropy to be able to guess arguments that would work for general random variables. An example of this comes from my paper on the logarithmically averaged Chowla conjecture, in which I showed among other things that

\displaystyle  |\sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n}| \leq \varepsilon \log x \ \ \ \ \ (5)

whenever {x} was sufficiently large depending on {\varepsilon>0}, where {\lambda} is the Liouville function. The information-theoretic part of the proof was as follows. Given some intermediate scale {H} between {1} and {x}, one can form certain random variables {X_H, Y_H}. The random variable {X_H} is a sign pattern of the form {(\lambda(n+1),\dots,\lambda(n+H))} where {n} is a random number chosen from {1} to {x} (with logarithmic weighting). The random variable {Y_H} is the tuple {(n \hbox{ mod } p)_{p \sim \varepsilon^2 H}} of reductions of {n} to primes {p} comparable to {\varepsilon^2 H}. Roughly speaking, what was implicitly shown in the paper (after using the multiplicativity of {\lambda}, the circle method, and the Matomaki-Radziwill theorem on short averages of multiplicative functions) is that if the inequality (5) fails, then there is a lower bound

\displaystyle  I( X_H : Y_H ) \gg \varepsilon^7 \frac{H}{\log H}

on the mutual information between {X_H} and {Y_H}. From translation invariance, this also gives the more general lower bound

\displaystyle  I( X_{H_0,H} : Y_H ) \gg \varepsilon^7 \frac{H}{\log H} \ \ \ \ \ (6)

for any {H_0}, where {X_{H_0,H}} denotes the shifted sign pattern {(\lambda(n+H_0+1),\dots,\lambda(n+H_0+H))}. On the other hand, one had the entropy bounds

\displaystyle  H( X_{H_0,H} ), H(Y_H) \ll H

and from concatenating sign patterns one could see that {X_{H_0,H+H'}} is equivalent to the joint random variable {(X_{H_0,H}, X_{H_0+H,H'})} for any {H_0,H,H'}. Applying these facts and using an “entropy decrement” argument, I was able to obtain a contradiction once {H} was allowed to become sufficiently large compared to {\varepsilon}, but the bound was quite weak (coming ultimately from the unboundedness of {\sum_{\log H_- \leq j \leq \log H_+} \frac{1}{j \log j}} as the interval {[H_-,H_+]} of values of {H} under consideration becomes large), something of the order of {H \sim \exp\exp\exp(\varepsilon^{-7})}; the quantity {H} needs at various junctures to be less than a small power of {\log x}, so the relationship between {x} and {\varepsilon} becomes essentially quadruple exponential in nature, {x \sim \exp\exp\exp\exp(\varepsilon^{-7})}. The basic strategy was to observe that the lower bound (6) causes some slowdown in the growth rate {H(X_{kH})/kH} of the mean entropy, in that this quantity decreased by {\gg \frac{\varepsilon^7}{\log H}} as {k} increased from {1} to {\log H}, basically by dividing {X_{kH}} into {k} components {X_{jH, H}}, {j=0,\dots,k-1} and observing from (6) each of these shares a bit of common information with the same variable {Y_H}. This is relatively clear when one works in a set model, in which {Y_H} is modeled by a set {B_H} of size {O(H)}, and {X_{H_0,H}} is modeled by a set of the form

\displaystyle  X_{H_0,H} = \bigcup_{H_0 < h \leq H_0+H} A_h

for various sets {A_h} of size {O(1)} (also there is some translation symmetry that maps {A_h} to a shift {A_{h+1}} while preserving all of the {B_H}).

However, on considering the set model recently, I realised that one can be a little more efficient by exploiting the fact (basically the Chinese remainder theorem) that the random variables {Y_H} are basically jointly independent as {H} ranges over dyadic values that are much smaller than {\log x}, which in the set model corresponds to the {B_H} all being disjoint. One can then establish a variant

\displaystyle  I( X_{H_0,H} : Y_H | (Y_{H'})_{H' < H}) \gg \varepsilon^7 \frac{H}{\log H} \ \ \ \ \ (7)

of (6), which in the set model roughly speaking asserts that each {B_H} claims a portion of the {\bigcup_{H_0 < h \leq H_0+H} A_h} of cardinality {\gg \varepsilon^7 \frac{H}{\log H}} that is not claimed by previous choices of {B_H}. This leads to a more efficient contradiction (relying on the unboundedness of {\sum_{\log H_- \leq j \leq \log H_+} \frac{1}{j}} rather than {\sum_{\log H_- \leq j \leq \log H_+} \frac{1}{j \log j}}) that looks like it removes one order of exponential growth, thus the relationship between {x} and {\varepsilon} is now {x \sim \exp\exp\exp(\varepsilon^{-7})}. Returning to the entropy model, one can use (7) and Shannon inequalities to establish an inequality of the form

\displaystyle  \frac{1}{2H} H(X_{2H} | (Y_{H'})_{H' \leq 2H}) \leq \frac{1}{H} H(X_{H} | (Y_{H'})_{H' \leq H}) - \frac{c \varepsilon^7}{\log H}

for a small constant {c>0}, which on iterating and using the boundedness of {\frac{1}{H} H(X_{H} | (Y_{H'})_{H' \leq H})} gives the claim. (A modification of this analysis, at least on the level of the back of the envelope calculation, suggests that the Matomaki-Radziwill theorem is needed only for ranges {H} greater than {\exp( (\log\log x)^{\varepsilon^{7}} )} or so, although at this range the theorem is not significantly simpler than the general case).
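To see numerically why trading {\sum 1/(j \log j)} for {\sum 1/j} buys one exponential, note that with {j} running up to a cutoff {J} (think of {J} as {\log H_+}), the partial sums of {1/j} grow like {\log J} while those of {1/(j \log j)} grow only like {\log\log J}; so the latter needs a doubly exponentially large cutoff to reach a given size where the former needs only a singly exponential one. A quick sketch (an illustration, not from the paper; the cutoffs are arbitrary):

    from math import log

    def partial_sums(J):
        s1 = sum(1.0 / j for j in range(2, J + 1))              # grows like log J
        s2 = sum(1.0 / (j * log(j)) for j in range(2, J + 1))   # grows like log log J
        return s1, s2

    for J in (10**2, 10**4, 10**6):   # think of J as log H_+
        s1, s2 = partial_sums(J)
        print(J, round(s1, 2), round(s2, 2))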


Filed under: expository, math.IT, math.NT Tagged: Liouville function, randomness, Shannon entropy

March 03, 2017

Peter RohdeARC Future Fellowship

I’m pleased and honoured to announce that I have just been awarded a prestigious ARC Future Fellowship to conduct a 4-year project on quantum networking and encrypted quantum computation. I will be based at the University of Technology Sydney, where I have received tenure as a Senior Lecturer. Ad astra.

Tommaso DorigoDecision Trees, Explained To Kids

Decision trees are one of the many players in the booming field of supervised machine learning. They can be used to classify elements into two or more classes, depending on their characteristics. They are of considerable interest in particle physics applications, as we always need to decide, on a statistical basis, what kind of physics process produced the particle collision we see in our detector.

read more

BackreactionYes, a violation of energy conservation can explain the cosmological constant

Chad Orzel recently pointed me towards an article in Physics World according to which “Dark energy emerges when energy conservation is violated.” Quoted in the Physics World article are George Ellis, who enthusiastically notes that the idea is “no more fanciful than many other ideas being explored in theoretical physics at present,” and Lee Smolin, according to whom it’s “speculative, but in the

February 28, 2017

Richard EastherYour Mileage May Vary

Auckland's 2017 Bike Challenge winds up today; February may be the shortest month, but it is prime cycling season in Auckland, with decent weather and long evenings. I don't usually log my saddle-time, but I tracked my activity while taking part in the challenge. In the course of the month I made 36 trips covering 345km on my bike. If I'd traveled the same distance by car, I would have emitted 69 kg of carbon dioxide.

I can add a few more stats to that: the same amount of travel by car would have cost me a couple of hundred dollars (that's just for petrol, plus parking in central Auckland), or a bit over a hundred by bus.

On the other side of the ledger, while the running costs of a bike are close to zero, it occasionally needs a little love from a mechanic – and I find that the rider appreciates a bonus mango lassi with his lunch.

Over the last year, I've moved from a fair-weather cyclist to something close to a year-round rider. My trousers fit a little more loosely than they did 12 months ago, as I am around 5kg lighter. Not a huge change, but a result that goes against the run of play for a (let's be honest) middle aged guy who spends a lot of time at a desk. I don't puff so much going up the hills any more, and a long walk seems a lot shorter than it used to. An e-bike may lie over the horizon, but for now I'm fully pedal-powered. 

It's not easy to acquire a new habit that's stuck as well as this one has, so what made it possible? The first answer is infrastructure;  my commute is mainly along Auckland's Northwestern Cycleway, which runs along the side of the highway to town – and it is just much (much!) nicer to be riding down a tree-lined car-free path than it is to be sitting in traffic on the adjacent motorway. 



February 26, 2017

Jordan EllenbergTweet, repeat

Messing around a bit with lexical analysis of my tweets (of which there are about 10,000).  It’s interesting to see which tweets I’ve essentially duplicated (I mean, after @-mentions, etc. are removed.)  Some of the top duplicates:

  • Thanks (8 times)
  • thanks (6 times)
  • Yep (6 times)
  • Yes (5 times)
  • yep (5 times)
  • Thanks so much (5 times)
  • RT (5 times)
  • I know right (4 times)

More detailed tweet analysis later.
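One way to do this kind of duplicate count (a minimal sketch, not necessarily how it was actually done here) is to strip the @-mentions and feed the normalised texts to a counter:

    import re
    from collections import Counter

    def normalize(tweet):
        # drop @-mentions and surrounding whitespace before comparing
        return re.sub(r"@\w+", "", tweet).strip()

    tweets = ["Thanks!", "@someone Thanks!", "Yep", "yep"]   # stand-in for a real tweet archive
    counts = Counter(normalize(t) for t in tweets)
    for text, n in counts.most_common():
        if n > 1:
            print(n, repr(text))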

 

 


Jordan EllenbergMathematicians becoming data scientists: Should you? How to?

I was talking the other day with a former student at UW, Sarah Rich, who’s done degrees in both math and CS and then went off to Twitter.  I asked her:  so what would you say to a math Ph.D. student who was wondering whether they would like being a data scientist in the tech industry?  How would you know whether you might find that kind of work enjoyable?  And if you did decide to pursue it, what’s the strategy for making yourself a good job candidate?

Sarah exceeded my expectations by miles and wrote the following extremely informative and thorough tip sheet, which she’s given me permission to share.  Take it away, Sarah!

 

 

As I went back and edited my initial bullet points to Jordan, I realized I had a good deal to say on this topic. My bullet points started overflowing. So, the TL;DR (which I acknowledge is still pretty long) is in the bullet points. However, if you’re looking for a more Foster-Wallace-esque adventure, and also perhaps the most unique insights I have to offer on these questions, the footnotes are where it’s at.

 

Some bullet points to determine if you (a Math PhD student) would like this kind of work:

  • Do you like modeling complex problems mathematically? Do you enjoy taking a real-world problem and determining its salient features in the context of real-world constraints?
  • Do you like communicating with diverse audiences, many of whom are not mathematical experts? Are you good at translating mathematical insights into non- or less mathematical language?
  • Can you walk away from a problem when the solution is “good enough,” and are you able to switch between tasks or problems with relative ease? Are you OK with simple solutions to problems that could have more complicated solutions, but only with rapidly diminishing returns?
  • Are you OK with abandoning the work of your dissertation and potentially not publishing in academic journals anymore? (Unless you are at Google or Microsoft Research the likelihood is that you will not have this opportunity. You may, however, be able to publish in Industry journals or the Industry tracks of interdisciplinary academic conferences). Are you OK with becoming more of a practitioner than a theorist?
  • Do you like programming in an object oriented, statically typed language and are you at least decent at it? This one really depends on the company and the role; at Twitter to have any impact at all you had to build it yourself, and that meant writing code in Scala; if you wanted to do anything REALLY interesting you had to be REALLY good at Scala, which is a serious investment. At other companies things can be different, but IMO the really interesting work requires knowing how to program at a level where you could pass as a Software Engineer. However there is this whole category of Data Scientist for which you only need to know R or Pandas (the R-like Python package) and be well-versed in Statistics. For that job you do mostly offline analysis of various business metrics; if you start building models of users in R you run into size constraints pretty quickly and then also “productionizing” (i.e., building a working system to update and implement your model) your approach needs to be taken up by someone else, which then takes you out of the driver’s seat in the chaotic environment of most tech companies. In conclusion, Math PhDs can run the risk of becoming irrelevant in tech if they cannot build things[1].

How to position oneself for these types of jobs (and by “these types of jobs” I mean tech jobs as a Data Scientist or ML engineer NOT at Google Research or Microsoft Research):

  • Have a coding project or two in java or C++ (or Scala, or Go) that has some real-world machine learning application. Pick a dataset, solve a problem, put it on your blog.
  • Have an R or Pandas data analysis project where you take some publicly available data and glean some insights. Clarity of communication and a pretty presentation are what matter here, more than being super impressive technically, although that’s also good. If you make your project in an iPython notebook and then put it on github, all the better.
  • Be ready to have a coding interview. Google-able resources to support you in your preparations abound. You can come up to speed 5-10 times faster than a typical CS undergrad (e.g. you won’t be struggling for weeks to grok summation notation) but this still does require some preparation. Ideally you’d start practicing regularly (like, 1-2 hours per day if you have minimal CS or algorithms background) 2-6 months before you’re on the job market. Look at it like a qualifying exam. Most places will let you interview in Python but it might also be a good idea to know how you’d answer questions in java or C++. Resenting or dreading this part of the process will not help you be successful at it, and it is arguably worth investing the time to be successful. It will at the very least give you many more employment options to choose from. Also this time investment can somewhat overlap with the time you put in to the bullet points above. A final note: if you are not already comfortable with java, I do not recommend the book “Cracking the Coding Interview.” Many people swear by it but I found it to be way too overwhelming as a first resource. What I did to get started was I coded the algorithms that are introduced in Kleinberg and Tardos’ Algorithm Design book (or at least the first 2/3 to 3/4 of it); the book is small and well-written enough to not be completely impossible to get through in the time allotted, covers most of the data structures you might be asked to implement in an interview, and actually coding the algorithms presented was very good practice for coding interviews. Also, it will help you get good at analyzing run times of algorithms, which you’ll typically be asked to do as well. Combining this (or working through a similar book) with a good website which provides practice interview questions should be adequate preparation. The only hole here is that this doesn’t really help you understand object-oriented programming at all; finding a good internet resource that helps you understand the implementation of data structures as classes is probably advisable. It’s worth investing the time in a practical introduction to object oriented programming, but this is probably the least important of the things already mentioned (but may be very important when you actually start working, depending on how much production code you are actually expected to write).
  • Employers may opt, instead of the coding interview, to give you a take home coding project. You similarly need to prepare for these in much the same way you need to prepare for a coding interview; since these projects will typically have an expected turn around time, you should not wait until you have received one to prepare. The first two bullet points are also very good preparation for this type of task.
  • Be able to talk about what you do and what you’re interested in at various levels, to various audiences. A big thing about the transition to tech is that you possibly start communicating with people who don’t really know what a vector space is[2]. Be ready to have those conversations. Honestly ask yourself if you’re OK with having those conversations.
  • Research the industry and figure out what real-world problems you’d be interested in solving. Have a targeted job search that reflects that you already have an understanding of what’s out there and a path for your own career advancement in mind. Don’t see going to Industry as failing out of the academic career path; sometimes math grad students are viewed as seeing tech jobs as a sort of fallback that they’re entitled to if the academic path doesn’t work out. Try to pursue this chapter with the enthusiasm that you brought to your academic endeavors, and envision a path for yourself in this new context.

Obviously it is important to note that all opinions and advice expressed herein are just my opinion based on a rather limited array of experiences. I think the value added here comes from positioning (that is, I came from a pure math background originally), and my tendency to exhaustively and compulsively analyze and dissect any organization or situation I’m a part of.

[1] However, with regards to this bullet point, integrating the output of math PhDs into the workflow of a tech company is a known stumbling block and there’s a general consensus that excellent programming skills in addition to a rigorous mathematical background may be too much to ask of employees; there just simply aren’t enough people who have these qualifications. There’s a whole cottage industry that’s sprung up around translating the insight and modeling of data scientists into deployable code which is equal parts fascinating and wearying. These black-box analytics packages are part of a broader space of implemented solutions to abstractly defined problems called Enterprise Software where consumer companies begin to translate their problems into a form representable to a software solution and then receive the corresponding answers that are programmed in to that solution. As a mathematician, you may find you prefer the idea of working for an Enterprise or “B2B” company; you deal with abstracted problems and implement solutions perhaps according to some mathematically advanced modeling. You may have the opportunity to be rigorous and exacting in the development of this software and to make critical design choices in how information is aggregated, processed, and disseminated in its lifecycle through your system. You may be more directly on the critical path of the creation of the company’s final product than you would be at a company whose primary goal is not to manufacture analytics solutions. On the other hand, as a Data Scientist working for a company which is a consumer of an enterprise solution, you may bear witness to a phenomenon whereby employees translate their output into a format consumable by an enterprise solution and then are correspondingly shaped by that translation and its ripple effects on the implicit characterization of their work’s meaning and value*. This is happening everywhere all the time and can lead to the emergent phenomenon of a company which is really now just nothing more than the interaction of many dogs each being wagged by many tiny tails. Even more meta: as a certain brand of Data Scientist you may be tasked with the role of discerning a meaningful signal from the many outputs generated by various enterprise systems for tracking and aggregating data. It is actually this type of murky business problem that you are best suited to address and a good employer will seek your help in selecting metrics* which will correctly inform leadership about company progress. However, the opportunity of creating a role for a Data Scientist at this level is seldom recognized or appreciated. It is more likely your skills will be restricted to a more technical pursuit which may be ill-informed from a business standpoint in the first place; you will invest your time in generating a solution to a slightly or greatly misinterpreted and so misstated problem. But as long as you get to build ML models or deep learning systems or whatever thing it is that interests you, and get paid a lot to do it, that might not matter to you.

*a “metric” in business land is just a measurement, typically a thing that can be counted, or a percentage, often viewed over time. e.g. an IT department might have enterprise software for managing their tasks that reveals the number of open requests it has at any given time, the current number of resolved requests, or the resolved requests each day or week. It’s not entirely trivial to translate these metrics into a standard by which to evaluate the IT department’s performance; the company could be growing; an acquisition of another company may have led to a flurry of IT requests to get new employees on-boarded quickly while maintaining a desktop work environment similar to the one they had before, or something like that. What metric is flexible enough to reflect employee success in the diverse situations that arise in real life? This is an example of the not-entirely-trivial question around metrics to measure success that arises at all levels of a company all the time.

 

[2] Up until now, you have had the privilege of working for mathematicians all the way down (mostly). At the very least, your direct supervisor has been a mathematician whose mathematical insight and intuition you (hopefully) greatly respect; you have probably grown accustomed to working for someone who can not only understand but also analyze and evaluate your solution to a problem (indeed, in some cases they may have had to actively avoid thinking about your problem to avoid solving it before you). One of the biggest transitions when you move into Industry is that you will possibly move into a position where you are the local expert. This doesn’t just mean that you’re the expert on C*-algebras but someone else is the expert on etale cohomology (btw I have more or less forgotten what either of those things are; does this fact make you feel sad or empty? Your reaction is a good thing to note.); you will possibly be working for someone and with other people who never developed a real understanding of mathematical logic, who basically see you as some kind of computer-math sorcerer whose work they can’t begin to understand (and I’m talking here not about C*-algebras but about stochastic gradient descent or matrix multiplication or just matrices). The extent to which this is true obviously varies, but it’s a good idea to know ahead of time how much having mathematical colleagues is important to you, and also how much having a mathematically literate boss is important to you. Both are totally possible, but are by no means guaranteed. You might find that this is critical to your happiness on the job, so when comparing offers, don’t overlook it. The less your immediate supervisor can directly see the merits of your work, the more time you’ll have to spend explaining it and, more importantly, selling it. However, less ability on your supervisor’s part to oversee your work can also mean more creative freedom; when presented with a problem you may have total flexibility in determining the correct method of solution; if no one else in the room can understand what you’re doing you have a lot of control. But you also have a great deal more responsibility to consistently demonstrate your work’s value. There are tradeoffs, and different personalities thrive in each situation.

 

 

 

 


February 24, 2017

John BaezAzimuth Backup Project (Part 4)

The Azimuth Climate Data Backup Project is going well! Our Kickstarter campaign ended on January 31st and the money has recently reached us. Our original goal was $5000. We got $20,427 of donations, and after Kickstarter took its cut we received $18,590.96.

Next time I’ll tell you what our project has actually been doing. This time I just want to give a huge “thank you!” to all 627 people who contributed money on Kickstarter!

I sent out thank you notes to everyone, updating them on our progress and asking if they wanted their names listed. The blanks in the following list represent people who either didn’t reply, didn’t want their names listed, or backed out and decided not to give money. I’ll list people in chronological order: first contributors first.

Only 12 people backed out; the vast majority of blanks on this list are people who haven’t replied to my email. I noticed some interesting but obvious patterns. For example, people who contributed later are less likely to have answered my email yet—I’ll update this list later. People who contributed more money were more likely to answer my email.

The magnitude of contributions ranged from $2000 to $1. A few people offered to help in other ways. The response was international—this was really heartwarming! People from the US were more likely than others to ask not to be listed.

But instead of continuing to list statistical patterns, let me just thank everyone who contributed.


Daniel Estrada
Ahmed Amer
Saeed Masroor
Jodi Kaplan
John Wehrle
Bob Calder
Andrea Borgia
L Gardner

Uche Eke
Keith Warner
Dean Kalahan
James Benson
Dianne Hackborn

Walter Hahn
Thomas Savarino
Noah Friedman
Eric Willisson
Jeffrey Gilmore
John Bennett
Glenn McDavid

Brian Turner

Peter Bagaric

Martin Dahl Nielsen
Broc Stenman

Gabriel Scherer
Roice Nelson
Felipe Pait
Kenneth Hertz

Luis Bruno


Andrew Lottmann
Alex Morse

Mads Bach Villadsen
Noam Zeilberger

Buffy Lyon

Josh Wilcox

Danny Borg

Krishna Bhogaonker
Harald Tveit Alvestrand


Tarek A. Hijaz, MD
Jouni Pohjola
Chavdar Petkov
Markus Jöbstl
Bjørn Borud


Sarah G

William Straub

Frank Harper
Carsten Führmann
Rick Angel
Drew Armstrong

Jesimpson

Valeria de Paiva
Ron Prater
David Tanzer

Rafael Laguna
Miguel Esteves dos Santos 
Sophie Dennison-Gibby




Randy Drexler
Peter Haggstrom


Jerzy Michał Pawlak
Santini Basra
Jenny Meyer


John Iskra

Bruce Jones
Māris Ozols
Everett Rubel



Mike D
Manik Uppal
Todd Trimble

Federer Fanatic

Forrest Samuel, Harmos Consulting








Annie Wynn
Norman and Marcia Dresner



Daniel Mattingly
James W. Crosby








Jennifer Booth
Greg Randolph





Dave and Karen Deeter

Sarah Truebe









Tieg Zaharia
Jeffrey Salfen
Birian Abelson

Logan McDonald

Brian Truebe
Jon Leland


Nicole



Sarah Lim







James Turnbull




John Huerta
Katie Mandel Bruce
Bethany Summer




Heather Tilert

Anna C. Gladstone



Naom Hart
Aaron Riley

Giampiero Campa

Julie A. Sylvia


Pace Willisson









Bangskij










Peter Herschberg

Alaistair Farrugia


Conor Hennessy




Stephanie Mohr




Torinthiel


Lincoln Muri 
Anet Ferwerda 


Hanna





Michelle Lee Guiney

Ben Doherty
Trace Hagemann







Ryan Mannion


Penni and Terry O'Hearn



Brian Bassham
Caitlin Murphy
John Verran






Susan


Alexander Hawson
Fabrizio Mafessoni
Anita Phagan
Nicolas Acuña
Niklas Brunberg

Adam Luptak
V. Lazaro Zamora






Branford Werner
Niklas Starck Westerberg
Luca Zenti and Marta Veneziano 


Ilja Preuß
Christopher Flint

George Read 
Courtney Leigh

Katharina Spoerri


Daniel Risse



Hanna
Charles-Etienne Jamme
rhackman41



Jeff Leggett

RKBookman


Aaron Paul
Mike Metzler


Patrick Leiser

Melinda

Ryan Vaughn
Kent Crispin

Michael Teague

Ben



Fabian Bach
Steven Canning


Betsy McCall

John Rees

Mary Peters

Shane Claridge
Thomas Negovan
Tom Grace
Justin Jones


Jason Mitchell




Josh Weber
Rebecca Lynne Hanginger
Kirby


Dawn Conniff


Michael T. Astolfi



Kristeva

Erik
Keith Uber

Elaine Mazerolle
Matthieu Walraet

Linda Penfold




Lujia Liu



Keith



Samar Tareem


Henrik Almén
Michael Deakin 
Rutger Ockhorst

Erin Bassett
James Crook



Junior Eluhu
Dan Laufer
Carl
Robert Solovay






Silica Magazine







Leonard Saers
Alfredo Arroyo García



Larry Yu













John Behemonth


Eric Humphrey


Svein Halvor Halvorsen



Karim Issa

Øystein Risan Borgersen
David Anderson Bell III











Ole-Morten Duesend







Adam North and Gabrielle Falquero

Robert Biegler 


Qu Wenhao






Steffen Dittmar




Shanna Germain






Adam Blinkinsop







John WS Marvin (Dread Unicorn Games)


Bill Carter
Darth Chronis 



Lawrence Stewart

Gareth Hodges

Colin Backhurst
Christopher Metzger

Rachel Gumper


Mariah Thompson

Falk Alexander Glade
Johnathan Salter




Maggie Unkefer
Shawna Maryanovich






Wilhelm Fitzpatrick
Dylan “ExoByte” Mayo
Lynda Lee




Scott Carpenter



Charles D, Payet
Vince Rostkowski


Tim Brown
Raven Daegmorgan
Zak Brueckner


Christian Page

Adi Shavit


Steven Greenberg
Chuck Lunney



Adriel Bustamente

Natasha Anicich



Bram De Bie
Edward L






Gray Detrick
Robert


Sarah Russell

Sam Leavin

Abilash Pulicken

Isabel Olondriz
James Pierce
James Morrison


April Daniels



José Tremblay Champagne


Chris Edmonds

Hans & Maria Cummings
Bart Gasiewiski


Andy Chamard



Andrew Jackson

Christopher Wright

Crystal Collins

ichimonji10


Alan Stern
Alison W


Dag Henrik Bråtane





Martin Nilsson


William Schrade


Tim GowersTimothy Chow starts Polymath12

This is a quick post to draw attention to the fact that a new and very interesting looking polymath project has just started, led by Timothy Chow. He is running it over at the Polymath blog.

The problem it will tackle is Rota’s basis conjecture, which is the following statement.

Conjecture. For each i=1,\dots,n, let B_i=\{e_{i1},\dots,e_{in}\} be a basis of an n-dimensional vector space V. Then there are n disjoint bases of V, each containing one element from each B_i.

Equivalently, if you have an n\times n matrix where each row is a basis, then you can permute the entries of the rows so that each column is also a basis.
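For small n one can check instances by brute force. The following rough sketch (an illustration, not part of the post) takes three arbitrary bases of \mathbb{R}^3, keeps the first row fixed, permutes the other two, and looks for an arrangement in which every column is again a basis:

    from itertools import permutations

    def det3(u, v, w):
        # 3x3 determinant, used to test whether three vectors form a basis of R^3
        return (u[0] * (v[1] * w[2] - v[2] * w[1])
                - u[1] * (v[0] * w[2] - v[2] * w[0])
                + u[2] * (v[0] * w[1] - v[1] * w[0]))

    # three bases of R^3, one per row
    B = [
        [(1, 0, 0), (0, 1, 0), (0, 0, 1)],
        [(1, 1, 0), (0, 1, 1), (1, 0, 1)],
        [(1, 2, 3), (0, 1, 4), (0, 0, 1)],
    ]

    found = []
    for p1 in permutations(B[1]):
        for p2 in permutations(B[2]):
            rows = [B[0], list(p1), list(p2)]
            if all(det3(*(rows[i][j] for i in range(3))) != 0 for j in range(3)):
                found.append(rows)
    print(len(found) > 0)   # True: a valid rearrangement exists for this particular instance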

This is one of those annoying problems that comes into the how-can-that-not-be-known category. Timothy Chow has a lot of interesting thoughts to get the project going, as well as explanations of why he thinks the time might be ripe for a solution.


February 23, 2017

John BaezSaving Climate Data (Part 6)

Scott Pruitt, who filed legal challenges against Environmental Protection Agency rules fourteen times, working hand in hand with oil and gas companies, is now head of that agency. What does that mean about the safety of climate data on the EPA’s websites? Here is an inside report:

• Dawn Reeves, EPA preserves Obama-Era website but climate change data doubts remain, InsideEPA.com, 21 February 2017.

For those of us who are backing up climate data, the really important stuff is in red near the bottom.

The EPA has posted a link to an archived version of its website from Jan. 19, the day before President Donald Trump was inaugurated and the agency began removing climate change-related information from its official site, saying the move comes in response to concerns that it would permanently scrub such data.

However, the archived version notes that links to climate and other environmental databases will go to current versions of them—continuing the fears that the Trump EPA will remove or destroy crucial greenhouse gas and other data.

The archived version was put in place and linked to the main page in response to “numerous [Freedom of Information Act (FOIA)] requests regarding historic versions of the EPA website,” says an email to agency staff shared by the press office. “The Agency is making its best reasonable effort to 1) preserve agency records that are the subject of a request; 2) produce requested agency records in the format requested; and 3) post frequently requested agency records in electronic format for public inspection. To meet these goals, EPA has re-posted a snapshot of the EPA website as it existed on January 19, 2017.”

The email adds that the action is similar to the snapshot taken of the Obama White House website.

The archived version of EPA’s website includes a “more information” link that offers more explanation.

For example, it says the page is “not the current EPA website” and that the archive includes “static content, such as webpages and reports in Portable Document Format (PDF), as that content appeared on EPA’s website as of January 19, 2017.”

It cites technical limits for the database exclusions. “For example, many of the links contained on EPA’s website are to databases that are updated with the new information on a regular basis. These databases are not part of the static content that comprises the Web Snapshot.” Searches of the databases from the archive “will take you to the current version of the database,” the agency says.

“In addition, links may have been broken in the website as it appeared” on Jan. 19 and those will remain broken on the snapshot. Links that are no longer active will also appear as broken in the snapshot.

“Finally, certain extremely large collections of content… were not included in the Snapshot due to their size” such as AirNow images, radiation network graphs, historic air technology transfer network information, and EPA’s searchable news releases.

‘Smart’ Move

One source urging the preservation of the data says the snapshot appears to be a “smart” move on EPA’s behalf, given the FOIA requests it has received, and notes that even though other groups like NextGen Climate and scientists have been working to capture EPA’s online information, having it on EPA’s site makes it official.

But it could also be a signal that big changes are coming to the official Trump EPA site, and it is unclear how long the agency will maintain the archived version.

The source says while it is disappointing that the archive may signal the imminent removal of EPA’s climate site, “at least they are trying to accommodate public concerns” to preserve the information.

A second source adds that while it is good that EPA is seeking “to address the widespread concern” that the information will be removed by an administration that does not believe in human-caused climate change, “on the other hand, it doesn’t address the primary concern of the data. It is snapshots of the web text.” Also, information “not included,” such as climate databases, is what is difficult to capture by outside groups and is what really must be preserved.

“If they take [information] down” that groups have been trying to preserve, then the underlying concern about access to data remains. “Web crawlers and programs can do things that are easy,” such as taking snapshots of text, “but getting the data inside the database is much more challenging,” the source says.

The first source notes that EPA’s searchable databases, such as those maintained by its Clean Air Markets Division, are used by the public “all the time.”

The agency’s Office of General Counsel (OGC) Jan. 25 began a review of the implications of taking down the climate page—a planned wholesale removal that was temporarily suspended to allow for the OGC review.

But EPA did remove some specific climate information, including links to the Clean Power Plan and references to President Barack Obama’s Climate Action Plan. Inside EPA captured this screenshot of the “What EPA Is Doing” page regarding climate change. Those links are missing on the Trump EPA site. The archive includes the same version of the page as captured by our screenshot.

Inside EPA first reported the plans to take down the climate information on Jan. 17.

After the OGC investigation began, a source close to the Trump administration said Jan. 31 that climate “propaganda” would be taken down from the EPA site, but that the agency is not expected to remove databases on GHG emissions or climate science. “Eventually… the propaganda will get removed…. Most of what is there is not data. Most of what is there is interpretation.”

The Sierra Club and Environmental Defense Fund both filed FOIA requests asking the agency to preserve its climate data, while attorneys representing youth plaintiffs in a federal climate change lawsuit against the government have also asked the Department of Justice to ensure the data related to its claims is preserved.

The Azimuth Climate Data Backup Project and other groups are making copies of actual databases, not just the visible portions of websites.


BackreactionBook Review: “The Particle Zoo” by Gavin Hesketh

The Particle Zoo: The Search for the Fundamental Nature of Reality By Gavin Hesketh Quercus (1 Sept. 2016) The first word in Gavin Hesketh’s book The Particle Zoo is “Beauty.” I read the word, closed the book, and didn’t reopen it for several months. Having just myself finished writing a book about the role of beauty in theoretical physics, it was the absolutely last thing I wanted to hear about

February 20, 2017

John PreskillIt’s CHAOS!

My brother and I played the video game Sonic the Hedgehog on a Sega Dreamcast. The hero has spiky electric-blue fur and can run at the speed of sound.1 One of us, then the other, would battle monsters. Monster number one oozes onto a dark city street as an aquamarine puddle. The puddle spreads, then surges upward to form limbs and claws.2 The limbs splatter when Sonic attacks: Aqua globs rain onto the street.


The monster’s master, Dr. Eggman, has ginger mustachios and a body redolent of his name. He scoffs as the heroes congratulate themselves.

“Fools!” he cries, the pauses in his speech heightening the drama. “[That monster is] CHAOS…the GOD…of DE-STRUC-TION!” His cackle could put a Disney villain to shame.

Dr. Eggman’s outburst comes to mind when anyone asks what topic I’m working on.

“Chaos! And the flow of time, quantum theory, and the loss of information.”


Alexei Kitaev, a Caltech physicist, hooked me on chaos. I TAed his spring-2016 course. The registrar calls the course Ph 219c: Quantum Computation. I call the course Topics that Interest Alexei Kitaev.

“What do you plan to cover?” I asked at the end of winter term.

Topological quantum computation, Alexei replied. How you simulate Hamiltonians with quantum circuits. Or maybe…well, he was thinking of discussing black holes, information, and chaos.

If I’d had a tail, it would have wagged.

“What would you say about black holes?” I asked.


Sonic’s best friend, Tails the fox.

I fwumped down on the couch in Alexei’s office, and Alexei walked to his whiteboard. Scientists first noticed chaos in classical systems. Consider a double pendulum—a pendulum that hangs from the bottom of a pendulum that hangs from, say, a clock face. Imagine pulling the bottom pendulum far to one side, then releasing. The double pendulum will swing, bend, and loop-the-loop like a trapeze artist. Imagine freezing the trapeze artist after an amount t of time.

What if you pulled another double pendulum a hair’s breadth less far? You could let the pendulum swing, wait for a time t, and freeze this pendulum. This pendulum would probably lie far from its brother. This pendulum would probably have been moving with a different speed than its brother, in a different direction, just before the freeze. The double pendulum’s motion changes loads if the initial conditions change slightly. This sensitivity to initial conditions characterizes classical chaos.
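If you want to see this sensitivity for yourself, here is a rough numerical sketch (my own illustration, not from the post), using the standard equal-mass, equal-length point-mass double-pendulum equations of motion and a fixed-step Runge-Kutta integrator; the starting angles and step size are arbitrary choices. Two copies released a billionth of a radian apart typically end up at completely different angles within a matter of seconds:

    import math

    G, M1, M2, L1, L2 = 9.81, 1.0, 1.0, 1.0, 1.0

    def deriv(state):
        # state = (theta1, omega1, theta2, omega2); standard double-pendulum equations of motion
        t1, w1, t2, w2 = state
        d = t1 - t2
        den = 2*M1 + M2 - M2*math.cos(2*d)
        a1 = (-G*(2*M1 + M2)*math.sin(t1) - M2*G*math.sin(t1 - 2*t2)
              - 2*math.sin(d)*M2*(w2*w2*L2 + w1*w1*L1*math.cos(d))) / (L1*den)
        a2 = (2*math.sin(d)*(w1*w1*L1*(M1 + M2) + G*(M1 + M2)*math.cos(t1)
              + w2*w2*L2*M2*math.cos(d))) / (L2*den)
        return (w1, a1, w2, a2)

    def rk4(state, dt):
        # one fixed-step fourth-order Runge-Kutta step
        k1 = deriv(state)
        k2 = deriv(tuple(s + 0.5*dt*k for s, k in zip(state, k1)))
        k3 = deriv(tuple(s + 0.5*dt*k for s, k in zip(state, k2)))
        k4 = deriv(tuple(s + dt*k for s, k in zip(state, k3)))
        return tuple(s + dt/6.0*(a + 2*b + 2*c + e)
                     for s, a, b, c, e in zip(state, k1, k2, k3, k4))

    def run(theta2_start, t_total=10.0, dt=0.0005):
        # top arm horizontal, bottom arm pulled out by theta2_start, both released at rest
        state = (math.pi/2, 0.0, theta2_start, 0.0)
        for _ in range(int(t_total / dt)):
            state = rk4(state, dt)
        return state

    a = run(math.pi/2)
    b = run(math.pi/2 + 1e-9)    # a hair's breadth different starting angle
    print(abs(a[2] - b[2]))      # typically of order a radian or more after 10 seconds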

A mathematical object F(t) reflects quantum systems’ sensitivities to initial conditions. [Experts: F(t) can evolve as an exponential governed by a Lyapunov-type exponent: \sim 1 - ({\rm const.})e^{\lambda_{\rm L} t}.] F(t) encodes a hypothetical process that snakes back and forth through time. This snaking earned F(t) the name “the out-of-time-ordered correlator” (OTOC). The snaking prevents experimentalists from measuring quantum systems’ OTOCs easily. But experimentalists are trying, because F(t) reveals how quantum information spreads via entanglement. Such entanglement distinguishes black holes, cold atoms, and specially prepared light from everyday, classical systems.

Alexei illustrated, on his whiteboard, the sensitivity to initial conditions.

“In case you’re taking votes about what to cover this spring,” I said, “I vote for chaos.”

We covered chaos. A guest attended one lecture: Beni Yoshida, a former IQIM postdoc. Beni and colleagues had devised quantum error-correcting codes for black holes.3 Beni’s foray into black-hole physics had led him to F(t). He’d written an OTOC paper that Alexei presented about. Beni presented about a follow-up paper. If I’d had another tail, it would have wagged.


Sonic’s friend has two tails.

Alexei’s course ended. My research shifted to many-body localization (MBL), a quantum phenomenon that stymies the spread of information. OTOC talk burbled beyond my office door.

At the end of the summer, IQIM postdoc Yichen Huang posted on Facebook, “In the past week, five papers (one of which is ours) appeared . . . studying out-of-time-ordered correlators in many-body localized systems.”

I looked down at the MBL calculation I was performing. I looked at my computer screen. I set down my pencil.

“Fine.”

I marched to John Preskill’s office.


The bosses. Of different sorts, of course.

The OTOC kept flaring on my radar, I reported. Maybe the time had come for me to try contributing to the discussion. What might I contribute? What would be interesting?

We kicked around ideas.

“Well,” John ventured, “you’re interested in fluctuation relations, right?”

Something clicked like the “power” button on a video-game console.

Fluctuation relations are equations derived in nonequilibrium statistical mechanics. They describe systems driven far from equilibrium, like a DNA strand whose ends you’ve yanked apart. Experimentalists use fluctuation theorems to infer a difficult-to-measure quantity, a difference \Delta F between free energies. Fluctuation relations imply the Second Law of Thermodynamics. The Second Law relates to the flow of time and the loss of information.
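One of the best-known fluctuation relations is the Jarzynski equality, \langle e^{-W/k_B T} \rangle = e^{-\Delta F/k_B T}, which turns repeated measurements of the work W into an estimate of \Delta F. Here is a toy numerical sketch (my own illustration, not from the post), assuming for simplicity that the work values are Gaussian, in which case the exact answer is \Delta F = \mu - \sigma^2/(2 k_B T):

    import math
    import random

    kT = 1.0
    mu, sigma = 2.0, 1.0     # mean and spread of the (assumed Gaussian) work distribution
    work = [random.gauss(mu, sigma) for _ in range(200000)]

    # Jarzynski estimator: Delta F = -kT * log < exp(-W/kT) >
    estimate = -kT * math.log(sum(math.exp(-w / kT) for w in work) / len(work))
    exact = mu - sigma**2 / (2 * kT)   # closed form for Gaussian work
    print(estimate, exact)             # the two agree to within a percent or so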

Time…loss of information…Fluctuation relations smelled like the OTOC. The two had to join together.


I spent the next four days sitting, writing, obsessed. I’d read a paper, three years earlier, that casts a fluctuation relation in terms of a correlator. I unearthed the paper and redid the proof. Could I deform the proof until the paper’s correlator became the out-of-time-ordered correlator?

Apparently. I presented my argument to my research group. John encouraged me to clarify a point: I’d defined a mathematical object A, a probability amplitude. Did A have physical significance? Could anyone measure it? I consulted measurement experts. One identified A as a quasiprobability, a quantum generalization of a probability, used to model light in quantum optics. With the experts’ assistance, I devised two schemes for measuring the quasiprobability.

The result is a fluctuation-like relation that contains the OTOC. The OTOC, the theorem reveals, is a combination of quasiprobabilities. Experimentalists can measure quasiprobabilities with weak measurements, gentle probings that barely disturb the probed system. The theorem suggests two experimental protocols for inferring the difficult-to-measure OTOC, just as fluctuation relations suggest protocols for inferring the difficult-to-measure \Delta F. Just as fluctuation relations cast \Delta F in terms of a characteristic function of a probability distribution, this relation casts F(t) in terms of a characteristic function of a (summed) quasiprobability distribution. Quasiprobabilities reflect entanglement, as the OTOC does.


Collaborators and I are extending this work theoretically and experimentally. How does the quasiprobability look? How does it behave? What mathematical properties does it have? The OTOC is motivating questions not only about our quasiprobability, but also about quasiprobability and weak measurements. We’re pushing toward measuring the OTOC quasiprobability with superconducting qubits or cold atoms.

Chaos has evolved from an enemy to a curiosity, from a god of destruction to an inspiration. I no longer play the electric-blue hedgehog. But I remain electrified.

 

1I hadn’t started studying physics, ok?

2Don’t ask me how the liquid’s surface tension rises enough to maintain the limbs’ shapes.

3Black holes obey quantum mechanics. Quantum systems can solve certain problems more quickly than ordinary (classical) computers. Computers make mistakes. We fix mistakes using error-correcting codes. The codes required by quantum computers differ from the codes required by ordinary computers. Systems that contain black holes, we can regard as performing quantum computations. Black-hole systems’ mistakes admit of correction via the code constructed by Beni & co. 


February 15, 2017

John PreskillTen finalists selected for film festival “Quantum Shorts”

“Crazy enough”, “visually very exciting”, “compelling from the start”, “beautiful cinematography”: this is what members of the Quantum Shorts festival shortlisting panel had to say about films selected for screening. As a member of the panel, and as someone who has experienced the power of visual storytelling firsthand (Anyone Can Quantum, Quantum Is Calling), I was excited to see filmmakers and students from around the world try their hand at interpreting the weirdness of the quantum realm in fresh ways.

The ten shortlisted films were chosen from a total of 203 submissions received during the festival’s 2016 call for entries. Some of the finalists are dramatic, some funny, some abstract. Some are live-action film, some animation. Each is under five minutes long. Find the titles and synopses of the shortlisted films below.

Screenings of the films start February 23 with confirmed events in Waterloo (23 February) and Vancouver (23 February), Canada; Singapore (25-28 February); Glasgow, UK (17 March); and Brisbane, Australia (24 March).

More details can be found at shorts.quantumlah.org, where viewers can also watch the films online and vote for their favorite to help decide a ‘People’s Choice’ prize. The website also hosts interviews with the filmmakers.

The Quantum Shorts festival is run by the Centre for Quantum Technologies at the National University of Singapore with a constellation of prestigious partners including Scientific American magazine and the journal Nature. The festival’s media partners, scientific partners and screening partners span five countries. The Institute for Quantum Information and Matter at Caltech is a proud sponsor.

For making the shortlist, the filmmakers receive a $250 award, a one-year digital subscription to Scientific American and certificates.

The festival’s top prize of US $1500 and runner-up prize of US $1000 will now be decided by a panel of eminent judges. The additional People’s Choice prize of $500 will be decided by public vote on the shortlist, with voting open on the festival website until March 26th. Prizes will be announced by the end of March.

Quantum Shorts 2016: FINALISTS

Ampersand

What unites everything on Earth? That we are all ultimately composed of something that is both matter & wave

Submitted by Erin Shea, United States

Approaching Reality

Dancing cats, a watchful observer and a strange co-existence. It’s all you need to understand the essence of quantum mechanics

Submitted by Simone De Liberato, United Kingdom

Bolero

The coin is held fast, but is it heads or tails? As long as the fist remains closed, you are a winner – and a loser

Submitted by Ivan D’Antonio, Italy

Novae

What happens when a massive star reaches the end of its life? Something that goes way beyond the spectacular, according to this cosmic poem about the infinite beauty of a black hole’s birth

Submitted by Thomas Vanz, France

The Guardian

A quantum love triangle, where uncertainty is the only winner

Submitted by Chetan Kotabage, India

The Real Thing

Picking up a beverage shouldn’t be this hard. And it definitely shouldn’t take you through the multiverse…

Submitted by Adam Welch, United States

Together – Parallel Universe

It’s a tale as old as time: boy meets girl, girl is not as interested as boy hoped. So boy builds spaceship and travels through multi-dimensional reality to find the one universe where they can be together

Submitted by Michael Robertson, South Africa

Tom’s Breakfast

This is one of those days when Tom’s morning routine doesn’t go to plan – far from it, in fact. The only question is, can he be philosophical about it?

Submitted by Ben Garfield, United Kingdom

Triangulation

Only imagination can show us the hidden world inside of fundamental particles

Submitted by Vladimir Vlasenko, Ukraine

Whitecap

Dr. David Long has discovered how to turn matter into waveforms. So why shouldn’t he experiment with his own existence?

Submitted by Bernard Ong, United States



February 14, 2017

Mark Chu-CarrollDoes well-ordering contradict Cantor?

The other day, I received an email that actually excited me! It’s a question related to Cantor’s diagonalization, but there’s absolutely nothing cranky about it! It’s something interesting and subtle. So without further ado:

Cantor’s diagonalization says that you can’t put the reals into 1 to 1 correspondence with the integers. The well-ordering theorem seems to suggest that you can pick a least number from every set including the reals, so why can’t you just keep picking least elements to put them into 1 to 1 correspondence with the reals. I understand why Cantor says you can’t. I just don’t see what is wrong with the other arguments (other than it must be wrong somehow). Apologies for not being able to state the argument in formal maths, I’m around 20 years out of practice for formal maths.

As we’ve seen in too many discussions of Cantor’s diagonalization, it’s a proof that shows that it is impossible to create a one-to-one correspondence between the natural numbers and the real numbers.
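As a quick reminder of how the diagonal argument goes, here is an illustrative sketch (not from the post): given any claimed enumeration of infinite binary sequences, flipping the k-th digit of the k-th sequence produces a sequence that disagrees with every sequence on the list.

    def diagonal(enumeration, n_digits=10):
        # a prefix of the binary sequence that disagrees with the k-th enumerated sequence in digit k
        return [1 - enumeration(k)[k] for k in range(n_digits)]

    def toy(k):
        # a toy "enumeration": the k-th sequence is k written in binary (low bit first), padded with zeros
        bits = [int(b) for b in bin(k)[2:]][::-1]
        return bits + [0] * 1000

    d = diagonal(toy, 10)
    print(d)
    print(all(d[k] != toy(k)[k] for k in range(10)))   # True: the diagonal sequence is missed by the list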

The well-ordering theorem says something that seems innocuous at first, but which, looked at in depth, really does appear to contradict Cantor’s diagonalization.

A set S is well-ordered if there exists a total ordering <= on the set, with the additional property that every non-empty subset T \subseteq S has a smallest element.

The well-ordering theorem says that every non-empty set can be well-ordered. Since the set of real numbers is a set, that means that there exists a well-ordering relation over the real numbers.

The problem is that this appears to give you a way of producing an enumeration of the reals! It says that the set of all real numbers has a least element: Bingo, there’s the first element of the enumeration! Now you take the set of real numbers excluding that one, and it has a least element under the well-ordering relation: there’s the second element. And so on. Under the well-ordering theorem, then, every non-empty set of reals has a least element, and every element has a unique successor! Isn’t that defining an enumeration of the reals?

The solution to this isn’t particularly satisfying on an intuitive level.

The well-ordering theorem is, mathematically, equivalent to the axiom of choice. And like the axiom of choice, it produces some very ugly results. It can be used to create “existence” proofs of things that, in a practical sense, don’t exist in a usable form. It proves that something exists, but it doesn’t prove that you can ever produce it or even identify it if it’s handed to you.

So, under the well-ordering theorem, there is a well-ordering of the real numbers. But the ordering relation that does the job is not the standard real-number less-than. (It obviously can’t be, because under a well-ordering every non-empty subset has a least element, while under the standard less-than even the whole set of reals has no least element.) In fact, for any ordering relation \le_x that you can explicitly define, describe, or compute, you will never be able to show that \le_x well-orders the reals.

Under the well-ordering theorem, the real numbers have a well-ordering relation – only you can’t ever know what it is. You can’t write it down or compute it; even if someone handed it to you, you couldn’t tell that you had it.

It’s very much like the Banach-Tarski paradox: we can say that there’s a way of doing it, only we can’t actually do it in practice. In the B-T paradox, we can say that there is a way of cutting a sphere into these strange pieces – but we can’t describe anything about the cut, other than saying that it exists. The well-ordering of the reals is the same kind of construct.

How does this get around Cantor? It weasels its way out of Cantor by the fact that while the well-ordering exists, it doesn’t exist in a form that can be used to produce an enumeration. You can’t get any kind of handle on the well-ordering relation. You can’t produce an enumeration from something that you can’t create or identify – just like you can’t ever produce any of the pieces of the Banach-Tarski cut of a sphere. It exists, but you can’t use it to actually produce an enumeration. So the set of real numbers remains non-enumerable even though it’s well-ordered.

If that feels like a cheat, well… That’s why a lot of people don’t like the axiom of choice. It produces cheatish existence proofs. Connecting back to something I’ve been trying to write about, that’s a big part of the reason why intuitionistic type theory exists: it’s a way of constructing math without stuff like this. In an intuitionistic type theory (like the Martin-Löf theory that I’ve been writing about), it doesn’t exist if you can’t construct it.

February 10, 2017

Terence TaoA bound on partitioning clusters

Daniel Kane and I have just uploaded to the arXiv our paper “A bound on partitioning clusters“, submitted to the Electronic Journal of Combinatorics. In this short and elementary paper, we consider a question that arose from biomathematical applications: given a finite family {X} of sets (or “clusters”), how many ways can there be of partitioning a set {A \in X} in this family as the disjoint union {A = A_1 \uplus A_2} of two other sets {A_1, A_2} in this family? That is to say, what is the best upper bound one can place on the quantity

\displaystyle | \{ (A,A_1,A_2) \in X^3: A = A_1 \uplus A_2 \}|

in terms of the cardinality {|X|} of {X}? A trivial upper bound would be {|X|^2}, since this is the number of possible pairs {(A_1,A_2)}, and {A_1,A_2} clearly determine {A}. In our paper, we establish the improved bound

\displaystyle | \{ (A,A_1,A_2) \in X^3: A = A_1 \uplus A_2 \}| \leq |X|^{3/p}

where {p} is the somewhat strange exponent

\displaystyle p := \log_3 \frac{27}{4} = 1.73814\dots, \ \ \ \ \ (1)

 

so that {3/p = 1.72598\dots}. Furthermore, this exponent is best possible!

Actually, the latter claim is quite easy to show: one takes {X} to be all the subsets of {\{1,\dots,n\}} of cardinality either {n/3} or {2n/3}, for {n} a multiple of {3}, and the claim follows readily from Stirling’s formula. So it is perhaps the former claim that is more interesting (since many combinatorial proof techniques, such as those based on inequalities such as the Cauchy-Schwarz inequality, tend to produce exponents that are rational or at least algebraic). We follow the common, though unintuitive, trick of generalising a problem to make it simpler. Firstly, one generalises the bound to the “trilinear” bound

\displaystyle | \{ (A_1,A_2,A_3) \in X_1 \times X_2 \times X_3: A_3 = A_1 \uplus A_2 \}|

\displaystyle \leq |X_1|^{1/p} |X_2|^{1/p} |X_3|^{1/p}

for arbitrary finite collections {X_1,X_2,X_3} of sets. One can place all the sets in {X_1,X_2,X_3} inside a single finite set such as {\{1,\dots,n\}}, and then by replacing every set {A_3} in {X_3} by its complement in {\{1,\dots,n\}}, one can phrase the inequality in the equivalent form

\displaystyle | \{ (A_1,A_2,A_3) \in X_1 \times X_2 \times X_3: \{1,\dots,n\} =A_1 \uplus A_2 \uplus A_3 \}|

\displaystyle \leq |X_1|^{1/p} |X_2|^{1/p} |X_3|^{1/p}

for arbitrary collections {X_1,X_2,X_3} of subsets of {\{1,\dots,n\}}. We generalise further by turning sets into functions, replacing the estimate with the slightly stronger convolution estimate

\displaystyle f_1 * f_2 * f_3 (1,\dots,1) \leq \|f_1\|_{\ell^p(\{0,1\}^n)} \|f_2\|_{\ell^p(\{0,1\}^n)} \|f_3\|_{\ell^p(\{0,1\}^n)}

for arbitrary functions {f_1,f_2,f_3} on the Hamming cube {\{0,1\}^n}, where the convolution is on the integer lattice {\bf Z}^n rather than on the finite field vector space {\bf F}_2^n. The advantage of working in this general setting is that it becomes very easy to apply induction on the dimension {n}; indeed, to prove this estimate for arbitrary {n} it suffices to do so for {n=1}. This reduces matters to establishing the elementary inequality

\displaystyle (ab(1-c))^{1/p} + (bc(1-a))^{1/p} + (ca(1-b))^{1/p} \leq 1

for all {0 \leq a,b,c \leq 1}, which can be done by a combination of undergraduate multivariable calculus and a little bit of numerical computation. (The left-hand side turns out to have local maxima at {(1,1,0), (1,0,1), (0,1,1), (2/3,2/3,2/3)}, with the latter being the cause of the numerology (1).)
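For readers who want to poke at that last inequality, here is a quick numerical sanity check in Python (my own sketch, not part of the paper): it evaluates the left-hand side on a coarse grid over the unit cube and at the four claimed local maxima.

import math
from itertools import product

p = math.log(27 / 4, 3)      # p = log_3(27/4) = 1.73814...

def lhs(a, b, c):
    return (a*b*(1-c))**(1/p) + (b*c*(1-a))**(1/p) + (c*a*(1-b))**(1/p)

# Grid search over [0,1]^3: the maximum should not exceed 1 (up to rounding).
grid = [i / 50 for i in range(51)]
print(max(lhs(a, b, c) for a, b, c in product(grid, repeat=3)))   # 1.0

# The claimed local maxima all attain the value 1:
for pt in [(1, 1, 0), (1, 0, 1), (0, 1, 1), (2/3, 2/3, 2/3)]:
    print(pt, lhs(*pt))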

The same sort of argument also gives an energy bound

\displaystyle E(A,A) \leq |A|^{\log_2 6}

for any subset {A \subset \{0,1\}^n} of the Hamming cube, where

\displaystyle E(A,A) := |\{(a_1,a_2,a_3,a_4) \in A^4: a_1+a_2 = a_3 + a_4 \}|

is the additive energy of {A}. The example {A = \{0,1\}^n} shows that the exponent {\log_2 6} cannot be improved.
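The sharpness example for the energy bound is small enough to check by brute force; here is a short Python sketch (mine, purely illustrative) that counts the quadruples directly for small {n}:

import math
from itertools import product

def additive_energy(A):
    # Count quadruples (a1,a2,a3,a4) with a1 + a2 = a3 + a4, coordinatewise over the integers.
    def add(x, y):
        return tuple(u + v for u, v in zip(x, y))
    return sum(1 for a1, a2, a3, a4 in product(A, repeat=4) if add(a1, a2) == add(a3, a4))

for n in range(1, 4):
    A = list(product((0, 1), repeat=n))                    # the full Hamming cube {0,1}^n
    print(n, additive_energy(A), len(A) ** math.log2(6))   # 6^n versus |A|^(log_2 6)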


Filed under: math.CO, paper

February 07, 2017

Noncommutative GeometryConnes 70

I am happy to report that to celebrate Alain Connes’ 70th birthday, 3 conferences on noncommutative geometry and its interactions with different fields are planned to take place in Shanghai, China. Students, postdocs, young faculty and all those interested in the subject are encouraged to participate. Please check the Conference webpage for more details.

John PreskillHow do you hear electronic oscillations with light?

For decades, understanding the origin of high temperature superconductivity has been regarded as the Holy Grail by physicists in the condensed matter community. The importance of high temperature superconductivity resides not only in its technological promise, but also in the dazzling number of exotic phases and elementary excitations it puts on display for physicists. These myriad phases and excitations give physicists new dimensions and building blocks for understanding and exploiting the world of collective phenomena. The pseudogap, charge-density-wave, nematic and spin liquid phases, for example, are a few of the exotica found in cuprate high temperature superconductors. Understanding these phases is important for understanding the mechanism behind high temperature superconductivity, but they are also interesting in and of themselves.

The charge-density-wave (CDW) phase in the cuprates – the spontaneous emergence of a periodic modulation of charge density in real space – has garnered particular attention. It emerges upon the destruction of the parent antiferromagnetic Mott insulating phase with doping, and it appears to compete directly with superconductivity. Whether or not these features are generic, or maybe even necessary, for high temperature superconductivity is an important question. Unfortunately, there is currently no other comparable family of high temperature superconducting materials in which such questions can be answered.


Recently, the iridates have emerged as a possible analog to the cuprates. The single layer variant Sr2IrO4, for example, exhibits signatures of both a pseudogap phase and a high temperature superconducting phase. However, with an increasing parallel being drawn between the iridates and the cuprates in terms of their electronic phases, CDW has so far eluded detection in any iridate, calling into question the validity of this comparison. Rather than studying the single layer variant, we decided to look at the bilayer iridate Sr3Ir2O7 in which a clear Mott insulator to metal transition has been reported with doping.

While CDW order has been observed in many materials, what made it elusive in the cuprates for many years is its spatially short-ranged (it extends over only a few lattice spacings) and often temporally short-ranged (it blinks in and out of existence quickly) nature. To get a good view of this order, experimentalists had to literally pin it down, using external influences like magnetic fields or chemical dopants to suppress the temporal fluctuations, and then use very sensitive diffraction or scanning tunneling based probes to observe it.

But rather than looking in real space for signatures of the CDW order, an alternative approach is to look for them in the time domain. Works by the Gedik group at MIT and the Orenstein group at U.C. Berkeley have shown that one can use ultrafast time-resolved optical reflectivity to “listen” for the tone of a CDW to infer its presence in the cuprates. In these experiments, one impulsively excites a coherent mode of the CDW using a femtosecond laser pulse, much like one would excite the vibrational mode of a tuning fork by impulsively banging it. One then stroboscopically looks for these CDW oscillations via temporally periodic modulations in its optical reflectivity, much like one would listen for the tone produced by the tuning fork. If you manage to hear the tone of the CDW, then you have established its existence!
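As a cartoon of what that “listening” looks like in practice, here is a toy Python sketch (entirely illustrative – the mode frequency, decay times, and amplitudes below are made-up numbers, not the measured ones): simulate a reflectivity transient as a slow electronic decay plus a small damped oscillation, strip the slow background, and read the oscillation frequency off a Fourier transform.

import numpy as np

# Toy pump-probe transient: a slow electronic decay plus a small damped oscillation.
dt = 50e-15                                    # 50 fs time steps
t = np.arange(400) * dt                        # a 20 ps scan
f_mode, tau_el, tau_osc = 2.0e12, 5e-12, 8e-12 # made-up mode frequency and decay times
dR = np.exp(-t / tau_el) + 0.2 * np.exp(-t / tau_osc) * np.cos(2 * np.pi * f_mode * t)

# "Listen" for the tone: remove the slow background with a 2 ps running average
# (which washes out the oscillation but keeps the decay), then Fourier transform
# the residual and locate the peak away from the quasi-DC region.
background = np.convolve(dR, np.ones(40) / 40, mode="same")
spectrum = np.abs(np.fft.rfft(dR - background))
freqs = np.fft.rfftfreq(len(t), d=dt)
keep = freqs > 0.5e12
print(f"mode detected at {freqs[keep][np.argmax(spectrum[keep])] / 1e12:.2f} THz")  # ~2 THz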

We applied a similar approach to Sr3Ir2O7 and its doped versions [hear our experiment]. To our delight, the ringing of a CDW mode sounded immediately upon doping across its Mott insulator to metal transition, implying that the electronic liquid born from the doped Mott insulator is unstable to CDW formation, very similar to the case in cuprates. Also like the case of cuprates, this charge-density-wave is of a special nature: it is either very short-ranged, or temporally fluctuating. Whether or not there is a superconducting phase that competes with the CDW in Sr3Ir2O7 remains to be seen. If so, the phenomenology of the cuprates may really be quite generic. If not, the interesting question of why not is worth pursuing. And who knows, maybe the fact that we have a system that can be controllably tuned between the antiferromagnetic order and the CDW order may find use in technology some day.


February 06, 2017

John BaezSaving Climate Data (Part 5)


There’s a lot going on! Here’s a news roundup. I will separately talk about what the Azimuth Climate Data Backup Project is doing.

I’ll start with the bad news, and then go on to some good news.

Tweaking the EPA website

Scientists are keeping track of how the Trump administration is changing the Environmental Protection Agency website, with before-and-after photos and analysis:

• Brian Kahn, Behold the “tweaks” Trump has made to the EPA website (so far), Natural Resources Defense Council blog, 3 February 2017.

There’s more about “adaptation” to climate change, and less about how it’s caused by carbon emissions.

All of this would be nothing compared to the new bill to eliminate the EPA, or Myron Ebell’s plan to fire most of the people working there:

• Joe Davidson, Trump transition leader’s goal is two-thirds cut in EPA employees, Washington Post, 30 January 2017.

If you want to keep track of this battle, I recommend getting a 30-day free subscription to this online magazine:

InsideEPA.com.

Taking animal welfare data offline

The Trump team is taking animal-welfare data offline. The US Department of Agriculture will no longer make lab inspection results and violations publicly available, citing privacy concerns:

• Sara Reardon, US government takes animal-welfare data offline, Nature Breaking News, 3 February 2017.

Restricting access to geospatial data

A new bill would prevent the US government from providing access to geospatial data if it helps people understand housing discrimination. It goes like this:

Notwithstanding any other provision of law, no Federal funds may be used to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.

For more on this bill, and the important ways in which such data has been used, see:

• Abraham Gutman, Scott Burris, and the Temple University Center for Public Health Law Research, Where will data take the Trump administration on housing?, Philly.com, 1 February 2017.

The EDGI fights back

The Environmental Data and Governance Initiative or EDGI is working to archive public environmental data. They’re helping coordinate data rescue events. You can attend one and have fun eating pizza with cool people while saving data:

• 3 February 2017, Portland
• 4 February 2017, New York City
• 10-11 February 2017, Austin Texas
• 11 February 2017, U. C. Berkeley, California
• 18 February 2017, MIT, Cambridge Massachusetts
• 18 February 2017, Haverford Connecticut
• 18-19 February 2017, Washington DC
• 26 February 2017, Twin Cities, Minnesota

Or, work with EDGI to organize your own data rescue event! They provide some online tools to help download data.

I know there will also be another event at UCLA, so the above list is not complete, and it will probably change and grow over time. Keep up-to-date at their site:

Environmental Data and Governance Initiative.

Scientists fight back

The pushback is so big it’s hard to list it all! For now I’ll just quote some of this article:

• Tabitha Powledge, The gag reflex: Trump info shutdowns at US science agencies, especially EPA, 27 January 2017.

THE PUSHBACK FROM SCIENCE HAS BEGUN

Predictably, counter-tweets claiming to come from rebellious employees at the EPA, the Forest Service, the USDA, and NASA sprang up immediately. At The Verge, Rich McCormick says there’s reason to believe these claims may be genuine, although none has yet been verified. A lovely head on this post: “On the internet, nobody knows if you’re a National Park.”

At Hit&Run, Ronald Bailey provides handles for several of these alt tweet streams, which he calls “the revolt of the permanent government.” (That’s a compliment.)

Bailey argues, “with exception perhaps of some minor amount of national security intelligence, there is no good reason that any information, data, studies, and reports that federal agencies produce should be kept from the public and press. In any case, I will be following the Alt_Bureaucracy feeds for a while.”

At NeuroDojo, Zen Faulkes posted on how to demand that scientific societies show some backbone. “Ask yourself: Have my professional societies done anything more political than say, ‘Please don’t cut funding’? Will they fight?” he asked.

Scientists associated with the group 500 Women Scientists donned lab coats and marched in DC as part of the Women’s March on Washington the day after Trump’s Inauguration, Robinson Meyer reported at the Atlantic. A wildlife ecologist from North Carolina told Meyer, “I just can’t believe we’re having to yell, ‘Science is real.’”

Taking a cue from how the Women’s March did its social media organizing, other scientists who want to set up a Washington march of their own have put together a closed Facebook group that claims more than 600,000 members, Kate Sheridan writes at STAT.

The #ScienceMarch Twitter feed says a date for the march will be posted in a few days. [The march will be on 22 April 2017.] The group also plans to release tools to help people interested in local marches coordinate their efforts and avoid duplication.

At The Atlantic, Ed Yong describes the political action committee 314Action. (314=the first three digits of pi.)

Among other political activities, it is holding a webinar on Pi Day—March 14—to explain to scientists how to run for office. Yong calls 314Action the science version of Emily’s List, which helps pro-choice candidates run for office. 314Action says it is ready to connect potential candidate scientists with mentors—and donors.

Other groups may be willing to step in when government agencies wimp out. A few days before the Inauguration, the Centers for Disease Control and Prevention abruptly and with no explanation cancelled a 3-day meeting on the health effects of climate change scheduled for February. Scientists told Ars Technica’s Beth Mole that CDC has a history of running away from politicized issues.

One of the conference organizers from the American Public Health Association was quoted as saying nobody told the organizers to cancel.

I believe it. Just one more example of the chilling effect on global warming. In politics, once the Dear Leader’s wishes are known, some hirelings will rush to gratify them without being asked.

The APHA guy said they simply wanted to head off a potential last-minute cancellation. Yeah, I guess an anticipatory pre-cancellation would do that.

But then—Al Gore to the rescue! He is joining with a number of health groups—including the American Public Health Association—to hold a one-day meeting on the topic Feb 16 at the Carter Center in Atlanta, CDC’s home base. Vox’s Julia Belluz reports that it is not clear whether CDC officials will be part of the Gore rescue event.

The Sierra Club fights back

The Sierra Club, of which I’m a proud member, is using the Freedom of Information Act or FOIA to battle or at least slow the deletion of government databases. They wisely started even before Trump took power:

• Jennifer A Dlouhy, Fearing Trump data purge, environmentalists push to get records, BloombergMarkets, 13 January 2017.

Here’s how the strategy works:

U.S. government scientists frantically copying climate data they fear will disappear under the Trump administration may get extra time to safeguard the information, courtesy of a novel legal bid by the Sierra Club.

The environmental group is turning to open records requests to protect the resources and keep them from being deleted or made inaccessible, beginning with information housed at the Environmental Protection Agency and the Department of Energy. On Thursday [January 9th], the organization filed Freedom of Information Act requests asking those agencies to turn over a slew of records, including data on greenhouse gas emissions, traditional air pollution and power plants.

The rationale is simple: Federal laws and regulations generally block government agencies from destroying files that are being considered for release. Even if the Sierra Club’s FOIA requests are later rejected, the record-seeking alone could prevent files from being zapped quickly. And if the records are released, they could be stored independently on non-government computer servers, accessible even if other versions go offline.


Mark Chu-CarrollUnderstanding Global Warming Scale Issues

Aside from the endless stream of Cantor cranks, the next biggest category of emails I get is from climate “skeptics”. They all ask pretty much the same question. For example, here’s one I received today:

My personal analysis, and natural sceptisism tells me, that there are something fundamentally wrong with the entire warming theory when it comes to the CO2.

If a gas in the atmosphere increase from 0.03 to 0.04… that just cant be a significant parameter, can it?

I generally ignore it, because… let’s face it, the majority of people who ask this question aren’t looking for a real answer. But this one was much more polite and reasonable than most, so I decided to answer it. And once I went to the trouble of writing a response, I figured that I might as well turn it into a post as well.

The current figures – you can find them in a variety of places from Wikipedia to the US NOAA – are that atmospheric CO2 has increased from around 280 parts per million in 1850 to 400 parts per million today.

Why can’t that be a significant parameter?

There are a couple of things you need to understand to grasp global warming: how much energy carbon dioxide can trap in the atmosphere, and how much carbon dioxide there actually is in the atmosphere. Put those two facts together, and you realize that we’re talking about a massive quantity of carbon dioxide trapping a massive amount of energy.

The problem is scale. Humans notoriously have a really hard time wrapping our heads around scale. When numbers get big enough, we aren’t able to really grasp them intuitively and understand what they mean. The difference between two numbers like 280 and 400ppm seems tiny; we can’t really grasp how it could be significant, because we aren’t good at taking that small difference and realizing just how ridiculously large it actually is.

If you actually look at the math behind the greenhouse effect, you find that some gasses are very effective at trapping heat. The earth is only habitable because of the carbon dioxide in the atmosphere – without it, earth would be too cold for life. Small amounts of it provide enough heat-trapping effect to move us from a frozen rock to the world we have. Increasing the quantity of it increases the amount of heat it can trap.

Let’s think about what the difference between 280 and 400 parts per million actually means at the scale of earth’s atmosphere. You hear a number like 400ppm – that’s 4 one-hundredths of one percent – and it seems like nothing, right? How could that have such a massive effect?!

But like so many other mathematical things, you need to put that number into the appropriate scale. The earth’s atmosphere has a mass of roughly 5 times 10^21 grams. 400ppm of that works out to about 2 times 10^18 grams of carbon dioxide – that’s two thousand trillion kilograms of CO2. Compared to the 280ppm of 1850, that’s roughly 6 times 10^17 grams – about six hundred trillion kilograms – of carbon dioxide added to the atmosphere. That’s a really, really massive quantity of carbon dioxide! Scaled to the number of particles, that’s something around 10^40 (give or take a power of ten – at this scale, who cares?) additional molecules of carbon dioxide in the atmosphere. It’s a very small percentage, but it’s a huge quantity.
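If you want to check that scale argument yourself, the arithmetic fits in a few lines of Python (a rough sketch using the same rounded figures as above, and treating ppm as a mass fraction, which is what the rough estimate above does):

# Back-of-the-envelope scale check (rounded figures from the text).
atmosphere_mass_g = 5e21           # mass of earth's atmosphere, in grams
ppm_now, ppm_1850 = 400e-6, 280e-6

co2_now_g = ppm_now * atmosphere_mass_g                    # ~2e18 g of CO2 in the air today
co2_added_g = (ppm_now - ppm_1850) * atmosphere_mass_g     # ~6e17 g added since 1850

molar_mass_co2 = 44.0              # grams per mole of CO2
avogadro = 6.022e23
added_molecules = co2_added_g / molar_mass_co2 * avogadro  # ~1e40 extra molecules

print(f"{co2_now_g:.1e} g of CO2 now; {co2_added_g:.1e} g added; ~{added_molecules:.0e} extra molecules")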

When you talk about trapping heat, you also have to remember that there are scaling issues there, too. We’re not talking about adding 100 degrees to the earth’s temperature. It’s a massive increase in the quantity of energy in the atmosphere, but because the atmosphere is so large, it doesn’t look like much: just a couple of degrees. That can be very deceptive – 5 degrees Celsius doesn’t sound like a huge temperature difference, and 2 degrees sounds like even less. But if you think in terms of the quantity of additional energy that has to be absorbed by the atmosphere to produce even that small difference, it’s pretty damned huge.
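For the energy side of the argument, here is a similarly crude back-of-the-envelope sketch (my own rounded assumptions: the atmosphere’s mass from above and a textbook specific heat for air; this counts only the energy needed to warm the air itself):

# How much energy does it take just to warm the air by a couple of degrees?
atmosphere_mass_kg = 5e18         # ~5 times 10^21 grams, as above
specific_heat_air = 1005.0        # J/(kg*K), a textbook value for air
delta_T = 2.0                     # the illustrative 2 degrees Celsius from the text

energy_joules = atmosphere_mass_kg * specific_heat_air * delta_T
print(f"~{energy_joules:.0e} J")  # ~1e22 joules just to warm the air by 2 degrees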

Calculating just how much energy a molecule of CO2 can absorb is a lot trickier than calculating the change in the mass of CO2 in the atmosphere. It’s a complicated phenomenon that involves a lot of different factors – how much infrared is absorbed by a molecule, how quickly that energy gets distributed into the other molecules it interacts with… I’m not going to go into detail on that. There are plenty of places, like here, where you can look up a detailed explanation. But once you appreciate the scale issues, it should be clear that a small percentage-wise increase in the quantity of CO2 means a pretty damned massive increase in the atmosphere’s capacity to absorb energy.

February 04, 2017

Scott AaronsonFirst they came for the Iranians

Action Item: If you’re an American academic, please sign the petition against the Immigration Executive Order. (There are already more than eighteen thousand signatories, including Nobel Laureates, Fields Medalists, you name it, but it could use more!)

I don’t expect this petition to have the slightest effect on the regime, but at least we should demonstrate to the world and to history that American academia didn’t take this silently.


I’m sure there were weeks, in February or March 1933, when the educated, liberal Germans commiserated with each other over the latest outrages of their new Chancellor, but consoled themselves that at least none of it was going to affect them personally.

This time, it’s taken just five days, since the hostile takeover of the US by its worst elements, for edicts from above to have actually hurt my life and (much more directly) the lives of my students, friends, and colleagues.

Today, we learned that Trump is suspending the issuance of US visas to people from seven majority-Islamic countries, including Iran (but strangely not Saudi Arabia, the cradle of Wahhabist terrorism—not that that would be morally justified either).  This suspension might last just 30 days, but might also continue indefinitely—particularly if, as seems likely, the Iranian government thumbs its nose at whatever Trump demands that it do to get the suspension rescinded.

So the upshot is that, until further notice, science departments at American universities can no longer recruit PhD students from Iran—a country that, along with China, India, and a few others, has long been the source of some of our best talent.  This will directly affect this year’s recruiting season, which is just now getting underway.  (If Canada and Australia have any brains, they’ll snatch these students, and make the loss America’s.)

But what about the thousands of Iranian students who are already here?  So far, no one’s rounding them up and deporting them.  But their futures have suddenly been thrown into jeopardy.

Right now, I have an Iranian PhD student who came to MIT on a student visa in 2013.  He started working with me two years ago, on the power of a rudimentary quantum computing model inspired by (1+1)-dimensional integrable quantum field theory.  You can read our paper about it, with Adam Bouland and Greg Kuperberg, here.  It so happens that this week, my student is visiting us in Austin and staying at our home.  He’s spent the whole day pacing around, terrified about his future.  His original plan, to do a postdoc in the US after he finishes his PhD, now seems impossible (since it would require a visa renewal).

Look: in the 11-year history of this blog, there have been only a few occasions when I felt so strongly about something that I stood my ground, even in the face of widespread attacks from people who I otherwise respected.  One, of course, was when I spoke out for shy nerdy males, and for a vision of feminism broad enough to recognize their suffering as a problem.  A second was when I was more blunt about D-Wave, and about its and its supporters’ quantum speedup claims, than some of my colleagues were comfortable with.  But the remaining occasions almost all involved my defending the values of the United States, Israel, Zionism, or “the West,” or condemning Islamic fundamentalism, radical leftism, or the worldviews of such individuals as Noam Chomsky or my “good friend” Mahmoud Ahmadinejad.

Which is simply to say: I don’t think anyone on earth can accuse me of secret sympathies for the Iranian government.

But when it comes to student visas, I can’t see that my feelings about the mullahs have anything to do with the matter.  We’re talking about people who happen to have been born in Iran, who came to the US to do math and science.  Would we rather have these young scientists here, filled with gratitude for the opportunities we’ve given them, or back in Iran filled with justified anger over our having expelled them?

To the Trump regime, I make one request: if you ever decide that it’s the policy of the US government to deport my PhD students, then deport me first.  I’m practically begging you: come to my house, arrest me, revoke my citizenship, and tear up the awards I’ve accepted at the White House and the State Department.  I’d consider that to be the greatest honor of my career.

And to those who cheered Trump’s campaign in the comments of this blog: go ahead, let me hear you defend this.


Update (Jan. 27, 2017): To everyone who’s praised the “courage” that it took me to say this, thank you so much—but to be perfectly honest, it takes orders of magnitude less courage to say this, than to say something that any of your friends or colleagues might actually disagree with! The support has been totally overwhelming, and has reaffirmed my sense that the United States is now effectively two countries, an open and a closed one, locked in a cold Civil War.

Some people have expressed surprise that I’d come out so strongly for Iranian students and researchers, “given that they don’t always agree with my politics,” or given my unapologetic support for the founding principles (if not always the actions) of the United States and of Israel. For my part, I’m surprised that they’re surprised! So let me say something that might be clarifying.

I care about the happiness, freedom, and welfare of all the men and women who are actually working to understand the universe and build the technologies of the future, and of all the bright young people who want to join these quests, whatever their backgrounds and wherever they might be found—whether it’s in Iran or Israel, in India or China or right here in the US.  The system of science is far from perfect, and we often discuss ways to improve it on this blog.  But I have not the slightest interest in tearing down what we have now, or destroying the world’s current pool of scientific talent in some cleansing fire, in order to pursue someone’s mental model of what the scientific community used to look like in Periclean Athens—or for that matter, their fantasy of what it would look like in a post-gender post-racial communist utopia.  I’m interested in the actual human beings doing actual science who I actually meet, or hope to meet.

Understand that, and a large fraction of all the political views that I’ve ever expressed on this blog, even ones that might seem to be in tension with each other, fall out as immediate corollaries.

(Related to that, some readers might be interested in a further explanation of my views about Zionism. See also my thoughts about liberal democracy, in response to numerous comments here by Curtis Yarvin a.k.a. Mencius Moldbug a.k.a. “Boldmug.”)


Update (Jan. 29) Here’s a moving statement from my student Saeed himself, which he asked me to share here.

This is not of my best interest to talk about politics. Not because I am scared but because I know little politics. I am emotionally affected like many other fellow human beings on this planet. But I am still in the US and hopefully I can pursue my degree at MIT. But many other talented friends of mine can’t. Simply because they came back to their hometowns to visit their parents. On this matter, I must say that like many of my friends in Iran I did not have a chance to see my parents in four years, my basic human right, just because I am from a particular nationality; something that I didn’t have any decision on, and that I decided to study in my favorite school, something that I decided when I was 15. When, like many other talented friends of mine, I was teaching myself mathematics and physics hoping to make big impacts in positive ways in the future. And I must say I am proud of my nationality – home is home wherever it is. I came to America to do science in the first place. I still don’t have any other intention, I am a free man, I can do science even in desert, if I have to. If you read history you’ll see scientists even from old ages have always been traveling.

As I said I know little about many things, so I just phrase my own standpoint. You should also talk to the ones who are really affected. A good friend of mine, Ahmad, who studies Mechanical engineering in UC Berkeley, came back to visit his parents in August. He is one of the most talented students I have ever seen in my life. He has been waiting for his student visa since then and now he is ultimately depressed because he cannot finish his degree. The very least the academic society can do is to help students like Ahmad finish their degrees even if it is from abroad. I can’t emphasize enough I know little about many things. But, from a business standpoint, this is a terrible deal for America. Just think about it. All international students in this country have been getting free education untill 22, in the American point of reference, and now they are using their knowledge to build technology in the USA. Just do a simple calculation and see how much money this would amount to. In any case my fellow international students should rethink this deal, and don’t take it unless at the least they are treated with respect. Having said all of this I must say I love the people of America, I have had many great friends here, great advisors specially Scott Aaronson and Aram Harrow, with whom I have been talking about life, religion, freedom and my favorite topic the foundations of the universe. I am grateful for the education I received at MIT and I think I have something I didn’t have before. I don’t even hate Mr Trump. I think he would feel different if we have a cup of coffee sometime.


Update (Jan. 31): See also this post by Terry Tao.


Update (Feb. 2): If you haven’t been checking the comments on this post, come have a look if you’d like to watch me and others doing our best to defend the foundations of Enlightenment and liberal democracy against a regiment of monarchists and neoreactionaries, including the notorious Mencius Moldbug, as well as a guy named Jim who explicitly advocates abolishing democracy and appointing Trump as “God-Emperor” with his sons to succeed him. (Incidentally, which son? Is Ivanka out of contention?)

I find these people to be simply articulating, more clearly and logically than most, the worldview that put Trump into office and where it inevitably leads. And any of us who are horrified by it had better get over our incredulity, fast, and pick up the case for modernity and Enlightenment where Spinoza and Paine and Mill and all the others left it off—because that’s what’s actually at stake here, and if we don’t understand that then we’ll continue to be blindsided.