There is a lot going on. Today, some words about NSF.
Yesterday Sethuraman Panchanathan, the director of the National Science Foundation, resigned 16 months before the end of his six-year term. The relevant Science article raises the possibility that this is because, as an executive branch appointee, he would effectively have to endorse the upcoming presidential budget request, which is rumored to include a 55% cut to the agency's budget (from around $9B/yr to $4B/yr) and a 50% reduction in agency staffing. (Note: actual appropriations are set by Congress, which has ignored presidential budget requests in the past.) This comes at the end of a week in which all new awards were halted at the agency while non-agency personnel conducted "a second review" of all grants, and many active grants have been terminated. Bear in mind, awards this year from NSF are already down 50% relative to last year, even without official budget cuts.
The NSF has been absolutely critical to a long list of scientific and technological advances over the last 70 years (see here while it's still up). As mentioned previously, government support of basic research has a great return on investment for the national economy, and it's a tiny fraction of government spending. Less than three years ago, the CHIPS & Science Act was passed with supposed bipartisan support in Congress, authorizing the doubling of the NSF budget. Last summer I posted in frustration that this support seemed to be an illusion when it came to actual funding.
People can have disagreements about the "right" level of government support for science in times of fiscal challenges, but as far as I can tell, no one (including and especially Congress so far) voted for the dismantling of the NSF. If you think the present trajectory is wrong, contact your legislators and make your voices heard.
Around 250 BC Archimedes found a general algorithm for computing pi to arbitrary accuracy, and used it to prove that 223/71 < π < 22/7. This seems to be when people started using 22/7 as an approximation to pi.
By the Middle Ages, math had backslid so much in Western Europe that scholars believed pi was actually equal to 22/7.
Around 1020, a mathematician named Franco of Liège got interested in the ancient Greek problem of squaring the circle. But since he believed that pi is 22/7, he started studying the square root of 22/7.
There’s a big difference between being misinformed and being stupid. Franco was misinformed but not stupid. He went ahead and proved that the square root of 22/7 is irrational!
His proof resembles the old Greek proof that the square root of 2 is irrational. I don’t know if Franco was aware of that. I also don’t know if he noticed that if pi were 22/7, it would be possible to square the circle with straightedge and compass. I also don’t know if he wondered why pi was 22/7. He may have just taken it on authority.
But still: math was coming back.
Franco was a student of a student of the famous scholar Gerbert of Aurillac (~950–1003), who studied in the Islamic schools of Sevilla and Córdoba, and thus got some benefits of a culture whose mathematics was light years ahead of Western Europe’s. Gerbert wrote something interesting: he said that the benefit of mathematics lies in the “sharpening of the mind”.
I got most of this interesting tale from this book:
• Thomas Sonar, trans. Patricia Morton and Keith William Morton, 3000 Years of Analysis: Mathematics in History and Culture, Birkhäuser, 2020. Preface and table of contents free here.
It’s over 700 pages long, but it’s fun to read, and you can start anywhere! The translation is weak and occasionally funny, but tolerable. If its length is intimidating, you may enjoy the detailed review here:
• Anthony Weston, 3000 years of analysis, Notices of the American Mathematical Society 70 (1) (January 2023), 115–121.
A basic type of problem that occurs throughout mathematics is the lifting problem: given some space $X$ that “sits above” some other “base” space $Y$ due to a projection map $\pi: X \to Y$, and some map $f: A \to Y$ from a third space $A$ into the base space $Y$, find a “lift” $\tilde f$ of $f$ to $X$, that is to say a map $\tilde f: A \to X$ such that $\pi \circ \tilde f = f$. In many applications we would like to have $\tilde f$ preserve many of the properties of $f$ (e.g., continuity, differentiability, linearity, etc.).
Of course, if the projection map $\pi$ is not surjective, one would not expect the lifting problem to be solvable in general, as the map $f$ to be lifted could simply take values outside of the range of $\pi$. So it is natural to impose the requirement that $\pi$ be surjective, giving the following commutative diagram to complete:
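In symbols, the diagram to be completed is the triangle expressing $\pi \circ \tilde f = f$; a minimal LaTeX sketch of it (assuming the tikz-cd package) is:

```latex
\documentclass{standalone}
\usepackage{tikz-cd}  % assumed available; any commutative-diagram package would do
\begin{document}
% Lifting problem: given \pi and f, find a (dashed) lift \tilde f with \pi \circ \tilde f = f.
\begin{tikzcd}
  & X \arrow[d, two heads, "\pi"] \\
  A \arrow[ur, dashed, "\tilde f"] \arrow[r, "f"'] & Y
\end{tikzcd}
\end{document}
```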
If no further requirements are placed on the lift $\tilde f$, then the axiom of choice is precisely the assertion that the lifting problem is always solvable (once we require $\pi$ to be surjective). Indeed, the axiom of choice lets us select a preimage $\phi(y) \in \pi^{-1}(\{y\})$ in the fiber of each point $y \in Y$, and one can lift any $f: A \to Y$ by setting $\tilde f(a) := \phi(f(a))$. Conversely, to build a choice function for a surjective map $\pi: X \to Y$, it suffices to lift the identity map $\mathrm{id}_Y: Y \to Y$ up to $X$.
Of course, the maps provided by the axiom of choice are famously pathological, being almost certain to be discontinuous, non-measurable, etc. So now suppose that all spaces involved are topological spaces, and all maps involved are required to be continuous. Then the lifting problem is not always solvable. For instance, we have a continuous projection from $\mathbb{R}$ to $\mathbb{R}/\mathbb{Z}$, but the identity map from $\mathbb{R}/\mathbb{Z}$ to itself cannot be lifted continuously up to $\mathbb{R}$, because $\mathbb{R}$ is contractible and $\mathbb{R}/\mathbb{Z}$ is not.
However, if $A$ is a discrete space (every set is open), then the axiom of choice lets us solve the continuous lifting problem from $A$ for any continuous surjection $\pi: X \to Y$, simply because every map from $A$ to $X$ is automatically continuous. Conversely, the discrete spaces are the only ones with this property: if $A$ is a topological space which is not discrete, and $A_{\mathrm{disc}}$ denotes the same space equipped with the discrete topology, then the only way one can continuously lift the identity map $\mathrm{id}_A: A \to A$ through the “projection map” $\pi: A_{\mathrm{disc}} \to A$ (that maps each point to itself) is if $A$ is itself discrete.
These discrete spaces are the projective objects in the category of topological spaces, since in this category the concept of an epimorphism agrees with that of a surjective continuous map. Thus $A_{\mathrm{disc}}$ can be viewed as the unique (up to isomorphism) projective object in this category that has a bijective continuous map to $A$.
Now let us narrow the category of topological spaces to the category of compact Hausdorff (CH) spaces. Here things should be better behaved; for instance, it is a standard fact in this category that continuous bijections are homeomorphisms, and it is still the case that the epimorphisms are the continuous surjections. So we have a usable notion of a projective object in this category: CH spaces $X$ such that any continuous map $f: X \to Y$ into another CH space $Y$ can be lifted, via any surjective continuous map $\pi: Z \to Y$ from another CH space $Z$, to a continuous map $\tilde f: X \to Z$.
By the previous discussion, discrete CH spaces will be projective, but this is an extremely restrictive set of examples, since of course compact discrete spaces must be finite. Are there any others? The answer was worked out by Gleason:
Proposition 1 A compact Hausdorff space $X$ is projective if and only if it is extremally disconnected, i.e., the closure of every open set is again open.
Proof: We begin with the “only if” direction. Suppose that $X$ is projective, and let $U$ be an open subset of $X$. Then the closure $\overline{U}$ and complement $X \backslash U$ are both closed, hence compact, subsets of $X$, so the disjoint union $\overline{U} \uplus (X \backslash U)$ is another CH space, which has an obvious surjective continuous projection map to $X$ formed by gluing the two inclusion maps together. As $X$ is projective, the identity map $\mathrm{id}_X: X \to X$ must then lift to a continuous map $\tilde f: X \to \overline{U} \uplus (X \backslash U)$. One easily checks that $\tilde f$ has to map $U$ to the first component $\overline{U}$ of the disjoint union, and $X \backslash \overline{U}$ to the second component; hence $\tilde f^{-1}(\overline{U}) = \overline{U}$, and so $\overline{U}$ is open, giving extremal disconnectedness.
Conversely, suppose that $X$ is extremally disconnected, that $\pi: Z \to Y$ is a continuous surjection of CH spaces, and $f: X \to Y$ is continuous. We wish to lift $f$ to a continuous map $\tilde f: X \to Z$.
We first observe that it suffices to solve the lifting problem for the identity map $\mathrm{id}_X: X \to X$, that is to say we can assume without loss of generality that $Y = X$ and $f$ is the identity. Indeed, for general maps $f: X \to Y$, one can introduce the pullback space
$$ Z \times_Y X := \{ (z,x) \in Z \times X : \pi(z) = f(x) \},$$
which is clearly a CH space that has a continuous surjection onto $X$. Any continuous lift of the identity map $\mathrm{id}_X: X \to X$ to $Z \times_Y X$, when projected onto $Z$, will give a desired lift $\tilde f: X \to Z$.
So now we are trying to lift the identity map $\mathrm{id}_X: X \to X$ via a continuous surjection $\pi: Z \to X$. Let us call this surjection minimally surjective if no restriction of $\pi$ to a proper closed subset of $Z$ remains surjective. An easy application of Zorn’s lemma shows that every continuous surjection $\pi: Z \to X$ can be restricted to a minimally surjective continuous map $\pi|_K: K \to X$ on some closed subset $K$ of $Z$. Thus, without loss of generality, we may assume that $\pi$ is minimally surjective.
The key claim now is that every minimally surjective map $\pi: Z \to X$ into an extremally disconnected space $X$ is in fact a bijection. Indeed, suppose for contradiction that there were two distinct points $z_1, z_2$ in $Z$ that mapped to the same point $x$ under $\pi$, and (using the Hausdorff property) pick disjoint open neighborhoods $V_1, V_2$ of $z_1, z_2$ respectively. By taking contrapositives of the minimal surjectivity property, we see that every open neighborhood of $z_1$ must contain at least one fiber $\pi^{-1}(\{y\})$ of $\pi$, and by shrinking this neighborhood one can ensure the base point $y$ is arbitrarily close to $x$; similarly for $z_2$. Thus the open sets $U_i := \{ y \in X : \pi^{-1}(\{y\}) \subset V_i \} = X \backslash \pi(Z \backslash V_i)$ (open because $\pi$ is a closed map) are disjoint, yet both have $x$ as an adherent point. But by extremal disconnectedness the closure $\overline{U_1}$ is an open neighborhood of $x$ that is disjoint from $U_2$, contradicting the fact that $x$ adheres to $U_2$.
It is well known that continuous bijections between CH spaces must be homeomorphisms (they map compact sets to compact sets, hence closed sets to closed sets, hence are open maps). So $\pi$ is a homeomorphism, and one can lift the identity map $\mathrm{id}_X: X \to X$ to the inverse map $\pi^{-1}: X \to Z$.
Remark 2 The property of being “minimally surjective” sounds like it should have a purely category-theoretic definition, but I was unable to match this concept to a standard term in category theory (something along the lines of a “minimal epimorphism”, I would imagine).
In view of this proposition, it is now natural to look for extremally disconnected CH spaces (also known as Stonean spaces). The discrete CH spaces are one class of such spaces, but they are all finite. Unfortunately, these are the only “small” examples:
Lemma 3 Any first countable extremally disconnected CH space is discrete.
Proof: If such a space $X$ were not discrete, one could find a sequence $x_n$ in $X$ converging to a limit $x$ such that $x_n \neq x$ for all $n$. One can sparsify the elements $x_n$ to all be distinct, and from the Hausdorff property one can construct neighbourhoods $U_n$ of each $x_n$ that avoid $x$, and are disjoint from each other. Then $\bigcup_{n\ \text{odd}} U_n$ and $\bigcup_{n\ \text{even}} U_n$ are disjoint open sets that both have $x$ as an adherent point, which is inconsistent with extremal disconnectedness: the closure of $\bigcup_{n\ \text{odd}} U_n$ contains $x$ but is disjoint from $\bigcup_{n\ \text{even}} U_n$, so cannot be open.
Thus for instance there are no extremally disconnected compact metric spaces, other than the finite spaces; for instance, the Cantor space is not extremally disconnected, even though it is totally disconnected (which one can easily see to be a property implied by extremal disconnectedness). On the other hand, once we leave the first-countable world, we have plenty of such spaces:
Lemma 4 Let $B$ be a complete Boolean algebra. Then the Stone dual of $B$ (i.e., the space of Boolean homomorphisms $\phi: B \to \{0,1\}$) is an extremally disconnected CH space.
Proof: The CH properties are standard. The elements $b$ of $B$ give a basis of the topology, formed by the clopen sets $\{ \phi : \phi(b) = 1 \}$. Because the Boolean algebra is complete, we see that the closure of the open set $\bigcup_\alpha \{ \phi : \phi(b_\alpha) = 1 \}$, for any family $(b_\alpha)$ of elements of $B$, is simply the clopen set $\{ \phi : \phi(\bigvee_\alpha b_\alpha) = 1 \}$, which is obviously open, giving extremal disconnectedness.
Remark 5 In fact, every extremally disconnected CH space $X$ is homeomorphic to the Stone dual of a complete Boolean algebra (specifically, the clopen algebra of $X$); see Gleason’s paper.
Corollary 6 Every CH space $X$ is the surjective continuous image of an extremally disconnected CH space.
Proof: Take the Stone–Čech compactification $\beta X_{\mathrm{disc}}$ of $X_{\mathrm{disc}}$ (the set $X$ equipped with the discrete topology), or equivalently the Stone dual of the power set $2^X$ (i.e., the space of ultrafilters on $X$). By the previous lemma, this is an extremally disconnected CH space. Because every ultrafilter on a CH space has a unique limit, we have a canonical map from $\beta X_{\mathrm{disc}}$ to $X$, which one can easily check to be continuous and surjective.
Remark 7 In fact, to each CH space $X$ one can associate an extremally disconnected CH space $\tilde X$ with a minimally surjective continuous map $\pi: \tilde X \to X$. The construction is the same, but instead of working with the entire power set $2^X$, one works with the smaller (but still complete) Boolean algebra of domains – closed subsets of $X$ which are the closure of their interior, ordered by inclusion. This $\tilde X$ is unique up to homeomorphism, and is thus a canonical choice of extremally disconnected space to project onto $X$. See the paper of Gleason for details.
Several facts in analysis concerning CH spaces can be made easier to prove by utilizing Corollary 6 and working first in extremally disconnected spaces, where some things become simpler. My vague understanding is that this is highly compatible with the modern perspective of condensed mathematics, although I am not an expert in this area. Here, I will just give a classic example of this philosophy, due to Garling and presented in this paper of Hartig:
Theorem 8 (Riesz representation theorem) Let $X$ be a CH space, and let $\Lambda: C(X) \to \mathbb{R}$ be a bounded linear functional. Then there is a (unique) Radon measure $\mu$ on $X$ (defined on the Baire $\sigma$-algebra, generated by $C(X)$) such that $\Lambda(f) = \int_X f\, d\mu$ for all $f \in C(X)$.
Uniqueness of the measure is relatively straightforward; the difficult task is existence, and most known proofs are somewhat complicated. But one can observe that the theorem “pushes forward” under surjective maps:
Proposition 9 Suppose $\pi: Y \to X$ is a continuous surjection between CH spaces. If the Riesz representation theorem is true for $Y$, then it is also true for $X$.
Proof: As $\pi$ is surjective, the pullback map $\pi^*: C(X) \to C(Y)$, $\pi^* f := f \circ \pi$, is an isometry, hence every bounded linear functional on $C(X)$ can be viewed as a bounded linear functional on the subspace $\pi^*(C(X))$ of $C(Y)$, and hence by the Hahn–Banach theorem it extends to a bounded linear functional on $C(Y)$. By the Riesz representation theorem on $Y$, this latter functional can be represented as an integral against a Radon measure $\nu$ on $Y$. One can then check that the pushforward measure $\pi_* \nu$ is a Radon measure on $X$, and that it gives the desired representation of the bounded linear functional on $C(X)$.
In view of this proposition and Corollary 6, it suffices to prove the Riesz representation theorem for extremally disconnected CH spaces. But this is easy:
Proposition 10 The Riesz representation theorem is true for extremally disconnected CH spaces.
Proof: The Baire $\sigma$-algebra is generated by the Boolean algebra of clopen sets. A bounded linear functional $\Lambda: C(X) \to \mathbb{R}$ induces a finitely additive measure $\mu$ on this algebra by the formula $\mu(K) := \Lambda(1_K)$. This is in fact a premeasure, because by compactness the only way to partition a clopen set into countably many clopen sets is to have only finitely many of the latter sets non-empty. By the Carathéodory extension theorem, $\mu$ then extends to a Baire measure, which one can check to be a Radon measure that represents $\Lambda$ (the finite linear combinations of indicators of clopen sets are dense in $C(X)$).
Last week I visited Harvard and MIT, and as advertised in my last post, gave the Yip Lecture at Harvard on the subject “How Much Math Is Knowable?” The visit was hosted by Harvard’s wonderful Center of Mathematical Sciences and Applications (CMSA), directed by my former UT Austin colleague Dan Freed. Thanks so much to everyone at CMSA for the visit.
I’m told it was one of my better performances. As always, I strongly recommend watching at 2x speed.
I opened the lecture by saying that, while obviously it would always be an honor to give the Yip Lecture at Harvard, it’s especially an honor right now, as the rest of American academia looks to Harvard to defend the value of our entire enterprise. I urged Harvard to “fight fiercely,” in the words of the Tom Lehrer song.
I wasn’t just fishing for applause; I meant it. It’s crucial for people to understand that, in its total war against universities, MAGA has now lost, not merely the anti-Israel leftists, but also most conservatives, classical liberals, Zionists, etc. with any intellectual scruples whatsoever. To my mind, this opens up the possibility for a broad, nonpartisan response, highlighting everything universities (yes, even Harvard) do for our civilization that’s worth defending.
For three days in my old hometown of Cambridge, MA, I met back-to-back with friends and colleagues old and new. Almost to a person, they were terrified about whether they’ll be able to keep doing science as their funding gets decimated, but especially terrified for anyone who they cared about on visas and green cards. International scholars can now be handcuffed, deported, and even placed in indefinite confinement for pretty much any reason—including long-ago speeding tickets—or no reason at all. The resulting fear has paralyzed, in a matter of months, an American scientific juggernaut that took a century to build.
A few of my colleagues personally knew Rümeysa Öztürk, the Turkish student at Tufts who currently sits in prison for coauthoring an editorial for her student newspaper advocating the boycott of Israel. I of course disagree with what Öztürk wrote … and that is completely irrelevant to my moral demand that she go free. Even supposing the government had much more on her than this one editorial, still the proper response would seem to be a deportation notice—“either contest our evidence in court, or else get on the next flight back to Turkey”—rather than grabbing Öztürk off the street and sending her to indefinite detention in Louisiana. It’s impossible to imagine any university worth attending where the students live in constant fear of imprisonment for the civil expression of opinions.
To help calibrate where things stand right now, here’s the individual you might expect to be most on board with a crackdown on antisemitism at Harvard:
Jason Rubenstein, the executive director of Harvard Hillel, said that the school is in the midst of a long — and long-overdue — reckoning with antisemitism, and that [President] Garber has taken important steps to address the problem. Methodical federal civil rights oversight could play a constructive role in that reform, he said. “But the government’s current, fast-paced assault against Harvard – shuttering apolitical, life-saving research; targeting the university’s tax-exempt status; and threatening all student visas … is neither deliberate nor methodical, and its disregard for the necessities of negotiation and due process threatens the bulwarks of institutional independence and the rule of law that undergird our shared freedoms.”
Meanwhile, as the storm clouds over American academia continue to darken, I’ll just continue to write what I think about everything, because what else can I do?
Last night, alas, I lost yet another left-wing academic friend, the fourth or fifth I’ve lost since October 7. For while I was ready to take a ferocious public stand against the current US government, for the survival and independence of our universities, and for free speech and due process for foreign students, this friend regarded all that as insufficient. He demanded that I also clear the tentifada movement of any charge of antisemitism. For, as he patiently explained to me (while worrying that I wouldn’t grasp the point), while the protesters may have technically violated university rules, disrupted education, created a hostile environment in the sense of Title VI antidiscrimination law in ways that would be obvious were we discussing any other targeted minority, etc. etc., still, the only thing that matters morally is that the protesters represent “the powerless,” whereas Zionist Jews like me represent “the powerful.” So, I told this former friend to go fuck himself. Too harsh? Maybe if he hadn’t been Jewish himself, I could’ve forgiven him for letting the world’s oldest conspiracy theory colonize his brain.
For me, the deep significance of in-person visits, including my recent trip to Harvard, is that they reassure me of the preponderance of sanity within my little world—and thereby of my own sanity. Online, every single day I feel isolated and embattled: pressed in on one side by MAGA forces who claim to care about antisemitism, but then turn out to want the destruction of science, universities, free speech, international exchange, due process of law, and everything else that’s made the modern world less than fully horrible; and on the other side, by leftists who say they stand with me for science and academic freedom and civil rights and everything else that’s good, but then add that the struggle needs to continue until the downfall of the scheming, moneyed Zionists and the liberation of Palestine from river to sea.
When I travel to universities to give talks, though, I meet one sane, reasonable human being after another. Almost to a person, they acknowledge the reality of antisemitism, ideological monoculture, bureaucracy, spiraling costs, and many other problems at universities—and they care about universities enough to want to fix those problems, rather than gleefully nuking the universities from orbit as MAGA is doing. Mostly, though, people just want me to sign Quantum Computing Since Democritus, or tell me how much they like this blog, or ask questions about quantum algorithms or the Busy Beaver function. Which is fine too, and which you can do in the comments.
Quantum computing finds itself in a peculiar situation. On the technological side, after billions of dollars and decades of research, working quantum computers are nearing fruition. But still, the number one question asked about quantum computers is the same as it was two decades ago: What are they good for? The honest answer reveals an elephant in the room: We don’t fully know yet. For theorists like me, this is an opportunity, a call to action.
Technological momentum
Suppose we do not have quantum computers in a few decades' time. What will be the reason? It’s unlikely that we’ll encounter some insurmountable engineering obstacle. The theoretical basis of quantum error-correction is solid, and several platforms are now below the error-correction threshold (Harvard, Yale, Google). Experimentalists believe today’s technology can scale to 100 logical qubits and gates—the megaquop era. If mankind spends $100 billion over the next few decades, it’s likely we could build a quantum computer.
A more concerning reason that quantum computing might fail is that there is not enough incentive to justify such a large investment in R&D and infrastructure. Let’s make a comparison to nuclear fusion. Like quantum hardware, fusion has challenging science and engineering problems to solve. However, if a nuclear fusion lab were to succeed in its mission of building a working reactor, the application would be self-evident. This is not the case for quantum computing—it is a sledgehammer looking for nails to hit.
Nevertheless, industry investment in quantum computing is currently accelerating. To maintain the momentum, it is critical to match investment growth and hardware progress with algorithmic capabilities. The time to discover quantum algorithms is now.
Empowered theorists
Theory research is forward-looking and predictive. Theorists such as Geoffrey Hinton laid the foundations of the current AI revolution. But decades later, with an abundance of computing hardware, AI has become much more of an empirical field. I look forward to the day that quantum hardware reaches a state of abundance, but that day is not yet here.
Today, quantum computing is an area where theorists have extraordinary leverage. A few pages of mathematics by Peter Shor inspired thousands of researchers, engineers and investors to join the field. Perhaps another few pages by someone reading this blog will establish a future of world-altering impact for the industry. There are not many places where mathematics has such potential for influence. An entire community of experimentalists, engineers, and businesses are looking to the theorists for ideas.
The Challenge
Traditionally, it is thought that the ideal quantum algorithm would exhibit three features. First, it should be provably correct, giving a guarantee that executing the quantum circuit reliably will achieve the intended outcome. Second, the underlying problem should be classically hard—the output of the quantum algorithm should be computationally hard to replicate with a classical algorithm. Third, it should be useful, with the potential to solve a problem of interest in the real world. Shor’s algorithm comes close to meeting all of these criteria. However, demanding all three in an absolute fashion may be unnecessary and perhaps even counterproductive to progress.
Provable correctness is important, since today we cannot yet empirically test quantum algorithms on hardware at scale. But what degree of evidence should we require for classical hardness? Rigorous proof of classical hardness is currently unattainable without resolving major open problems like P vs NP, but there are softer forms of proof, such as reductions to well-studied classical hardness assumptions.
I argue that we should replace the ideal of provable hardness with a more pragmatic approach: The quantum algorithm should outperform the best known classical algorithm that produces the same output by a super-quadratic speedup.[1] Emphasizing provable classical hardness might inadvertently impede the discovery of new quantum algorithms, since a truly novel quantum algorithm could potentially introduce a new classical hardness assumption that differs fundamentally from established ones. The back-and-forth process of proposing and breaking new assumptions is a productive direction that helps us triangulate where quantum advantage lies.
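To get some intuition for why the bar is set at super-quadratic rather than merely quadratic (see the footnote at the end of this post), here is a toy crossover estimate in Python. Every constant in it is an illustrative assumption chosen only to show the shape of the argument, not a measurement of any real device.

```python
# Back-of-the-envelope crossover: when does a polynomial quantum speedup pay off?
# Every constant below is an illustrative assumption, not a measured overhead.
T_CLASSICAL_STEP = 1e-9   # assumed seconds per step on one classical core
CLASSICAL_CORES  = 1e4    # assumed cheap classical parallelism
T_LOGICAL_GATE   = 1e-6   # assumed seconds per error-corrected logical gate
GATES_PER_CALL   = 1e3    # assumed logical gates per quantum oracle call

t_c = T_CLASSICAL_STEP / CLASSICAL_CORES  # effective classical cost per step
t_q = T_LOGICAL_GATE * GATES_PER_CALL     # effective quantum cost per iteration

def crossover(exponent):
    """Problem size N where quantum cost N**exponent * t_q equals classical cost N * t_c."""
    n = (t_q / t_c) ** (1.0 / (1.0 - exponent))
    return n, (n ** exponent) * t_q  # (N at crossover, quantum runtime there in seconds)

for exponent, label in [(0.5, "quadratic"), (0.25, "quartic")]:
    n, seconds = crossover(exponent)
    print(f"{label:9s} speedup: crossover at N ~ {n:.1e}, "
          f"quantum runtime there ~ {seconds:.1e} s (~{seconds / 86400:.2g} days)")
```

Under these made-up numbers, a quadratic speedup starts paying off only after months of uninterrupted quantum runtime, while a quartic (super-quadratic) speedup crosses over within seconds.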
It may also be unproductive to aim directly at solving existing real-world problems with quantum algorithms. Fundamental computational tasks with quantum advantage are special and we have very few examples, yet they necessarily provide the basis for any eventual quantum application. We should search for more of these fundamental tasks and match them to applications later.
That said, it is important to distinguish between quantum algorithms that could one day provide the basis for a practically relevant computation, and those that will not. In the real world, computations are not useful unless they are verifiable or at least repeatable. For instance, consider a quantum simulation algorithm that computes a physical observable. If two different quantum computers run the simulation and get the same answer, one can be confident that this answer is correct and that it makes a robust prediction about the world. Some problems such as factoring are naturally easy to verify classically, but we can set the bar even lower: The output of a useful quantum algorithm should at least be repeatable by another quantum computer.
There is a subtle fourth requirement of paramount importance that is often overlooked, captured by the following litmus test: If given a quantum computer tomorrow, could you implement your quantum algorithm? In order to do so, you need not only a quantum algorithm but also a distribution over its inputs on which to run it. Classical hardness must then be judged in the average case over this distribution of inputs, rather than in the worst case.
I’ll end this section with a specific caution regarding quantum algorithms whose output is the expectation value of an observable. A common reason these proposals fail to be classically hard is that the expectation value exponentially concentrates over the distribution of inputs. When this happens, a trivial classical algorithm can replicate the quantum result by simply outputting the concentrated (typical) value for every input. To avoid this, we must seek ensembles of quantum circuits whose expectation values exhibit meaningful variation and sensitivity to different inputs.
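As a toy numerical illustration of this concentration phenomenon, the sketch below draws random states and watches a single-qubit expectation value concentrate as the number of qubits grows. The ensemble (Haar-random states) and the observable (Z on the first qubit) are arbitrary choices for demonstration, not any particular hardware proposal.

```python
# Toy illustration: expectation values of a fixed observable over a random
# ensemble of states concentrate as the number of qubits grows.
import numpy as np

rng = np.random.default_rng(0)

def random_state(n_qubits):
    """A Haar-random pure state on n_qubits, as a normalized complex vector."""
    dim = 2 ** n_qubits
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return psi / np.linalg.norm(psi)

def z_on_first_qubit(psi):
    """<Z_1>: probability of the first qubit being 0 minus probability of it being 1."""
    half = len(psi) // 2
    probs = np.abs(psi) ** 2
    return probs[:half].sum() - probs[half:].sum()

for n in [2, 4, 6, 8, 10, 12]:
    values = [z_on_first_qubit(random_state(n)) for _ in range(200)]
    print(f"n = {n:2d} qubits: std of <Z_1> over the ensemble ~ {np.std(values):.4f}")
```

For this (assumed) ensemble the spread shrinks roughly like $2^{-n/2}$, so at modest qubit counts the "typical value" already predicts essentially every instance, and a classical algorithm that just outputs that value wins.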
We can crystallize these priorities into the following challenge:
The Challenge. Find a quantum algorithm and a distribution over its inputs with the following features:
— (Provable correctness.) The quantum algorithm is provably correct.
— (Classical hardness.) The quantum algorithm outperforms the best known classical algorithm that performs the same task by a super-quadratic speedup, in the average case over the distribution of inputs.
— (Potential utility.) The output is verifiable, or at least repeatable.
We can categorize quantum algorithms by the form of their output. First, there are quantum algorithms for search problems, which produce a bitstring satisfying some constraints. This could be the prime factors of a number, a planted feature in some dataset, or the solution to an optimization problem. Next, there are quantum algorithms that compute a value to some precision, for example the expectation value of some physical observable. Then there are proofs of quantumness, which involve a verifier who generates a test using some hidden key, and the key can be used to verify the output. Finally, there are quantum algorithms which sample from some distribution.
Hamiltonian simulation is perhaps the most widely heralded source of quantum utility. Physics and chemistry contain many quantities that Nature computes effortlessly, yet remain beyond the reach of even our best classical simulations. Quantum computation is capable of simulating Nature directly, giving us strong reason to believe that quantum algorithms can compute classically-hard quantities.
There are already many examples where a quantum computer could help us answer an unsolved scientific question, like determining the phase diagram of the Hubbard model or the ground energy of FeMoCo. These undoubtedly have scientific value. However, they are isolated examples, whereas we would like evidence that the pool of quantum-solvable questions is inexhaustible. Can we take inspiration from strongly correlated physics to write down a concrete ensemble of Hamiltonian simulation instances where there is a classically-hard observable? This would gather evidence for the sustained, broad utility of quantum simulation, and would also help us understand where and how quantum advantage arises.
Over in the computer science community, there has been a lot of work on oracle separations such as welded trees and forrelation, which should give us confidence in the abilities of quantum computers. Can we instantiate these oracles in a way that pragmatically remains classically hard? This is necessary in order to pass our earlier litmus test of being ready to run the quantum algorithm tomorrow.
The issue with these broad frameworks is that they often do not specify a distribution over inputs. Can we find novel ensembles of inputs to these frameworks which exhibit super-quadratic speedups? BQP-completeness shows that one has translated the notion of quantum computation into a different language, which allows one to embed an existing quantum algorithm such as Shor’s algorithm into the framework. But in order to discover a new quantum algorithm, one must find an ensemble of BQP computations which does not arise from Shor’s algorithm.
Table I claims that sampling tasks alone are not useful since they are not even quantumly repeatable. One may wonder if sampling tasks could be useful in some way. After all, classical Monte Carlo sampling algorithms are widely used in practice. However, applications of sampling typically use samples to extract meaningful information or specific features of the underlying distribution. For example, Monte Carlo sampling can be used to evaluate integrals in Bayesian inference and statistical physics. In contrast, samples obtained from random quantum circuits lack any discernible features. If a collection of quantum algorithms generated samples containing meaningful signals from which one could extract classically hard-to-compute values, those algorithms would effectively transition into the “compute a value” category.
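For contrast, here is the standard pattern by which classical samples become useful: averaging a function of the samples estimates a quantity of interest. The integrand below is an arbitrary choice made only for illustration.

```python
# Minimal Monte Carlo integration: the samples matter because a function of them
# averages to a quantity of interest (here, a Gaussian integral).
import numpy as np

rng = np.random.default_rng(1)

# Estimate E[cos(x)^2] for x ~ N(0, 1).
samples = rng.normal(size=100_000)
values = np.cos(samples) ** 2
estimate = values.mean()
stderr = values.std() / np.sqrt(len(values))

print(f"Monte Carlo estimate: {estimate:.4f} +/- {stderr:.4f}")
print(f"Exact value:          {(1 + np.exp(-2)) / 2:.4f}")  # (1 + e^{-2})/2 ~ 0.5677
```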
Table I also claims that proofs of quantumness are not useful. This is not completely true—one potential application is generating certifiable randomness. However, such applications are generally cryptographic rather than computational in nature. Specifically, proofs of quantumness cannot help us solve problems or answer questions whose solutions we do not already know.
Finally, there are several exciting directions proposing applications of quantum technologies in sensing and metrology, communication, learning with quantum memory, and streaming. These are very interesting, and I hope that mankind’s second century of quantum mechanics brings forth all flavors of capabilities. However, the technological momentum is mostly focused on building quantum computers for the purpose of computational advantage, and so this is where breakthroughs will have the greatest immediate impact.
Don’t be too afraid
At the annual QIP conference, only a handful of papers out of hundreds each year attempt to advance new quantum algorithms. Given the stakes, why is this number so low? One common explanation is that quantum algorithm research is simply too difficult. Nevertheless, we have seen substantial progress in quantum algorithms in recent years. After an underwhelming lack of end-to-end proposals with the potential for utility between the years 2000 and 2020, Table I exhibits several breakthroughs from the past 5 years.
In between blind optimism and resigned pessimism, embracing a mission-driven mindset can propel our field forward. We should allow ourselves to adopt a more exploratory, scrappier approach: We can hunt for quantum advantages in yet-unstudied problems or subtle signals in the third decimal place. The bar for meaningful progress is lower than it might seem, and even incremental advances are valuable. Don’t be too afraid!
[1] Quadratic speedups are widespread but will not form the basis of practical quantum advantage, due to the overheads associated with quantum error-correction.
After World War II, under the influence (direct and indirect) of people like Vannevar Bush, a "grand bargain" was effectively struck between the US government and the nation's universities. The war had demonstrated how important science and engineering research could be, through the Manhattan Project and the development of radar, among other things. University researchers had effectively and sometimes literally been conscripted into the war effort. In the postwar period, with more citizens than ever going to college because of the GI Bill, universities went through a period of rapid growth, and the government began funding research at universities on the large scale. This was a way of accomplishing multiple goals. This funding got hundreds of scientists and engineers to work on projects that agencies and the academic community itself (through peer review) thought would be important but perhaps were of such long-term or indirect economic impact that industry would be unlikely to support them. It trained the next generation of researchers and of the technically skilled workforce. It accomplished this as a complement to national laboratories and direct federal agency work.
After Sputnik, there was an enormous ramp-up of investment. This figure (see here for an interactive version) shows different contributions to investment in research and development in the US from 1953 through 2021:
A couple of days ago, the New York Times published a related figure, showing the growth in dollars of total federal funds sent to US universities, but I think this is a more meaningful graph (hat tip to Prof. Elizabeth Popp Berman at Michigan for her discussion of this). In 2021, federal investment in research (the large majority of which is happening at universities) as a percentage of GDP was at its lowest level since 1953, and it was sinking further even before this year (for those worried about US competitiveness.... Also, industry does a lot more D than they do long-term R.). There are many studies by economists showing that federal investment in research has a large return (for example, here is one by the Federal Reserve Bank of Dallas saying that returns to the US economy on federal research expenditures are between 150% and 300%). Remember, these funds are not just given to universities - they are in the form of grants and contracts, for which specific work is done and reported. These investments also helped make US higher education the envy of much of the world and led to education of international students as a tremendous effective export business for the country.
Of course, like any system created organically by people, there are problems. Universities are complicated and full of (ugh) academics. Higher education is too expensive. Compliance bureaucracy can be onerous. Any deliberative process like peer review trades efficiency for collective expertise but also the hazards of group-think. At the same time, the relationship between federally sponsored research and universities has led to an enormous amount of economic, technological, and medical benefit over the last 70 years.
Right now it looks like this whole apparatus is being radically altered, if not dismantled in part or in whole. Moreover, this is not happening as a result of a debate or discussion about the proper role and scale of federal spending at universities, or an in-depth look at the flaws and benefits of the historically developed research ecosystem. It's happening because "elections have consequences", and I'd be willing to bet that very very few people in the electorate cast their votes even secondarily because of this topic. Sincere people can have differing opinions about these issues, but decisions of such consequence and magnitude should not be taken lightly or incidentally.
(I am turning off comments on this one b/c I don't have time right now to pay close attention. Take it as read that some people would comment that US spending must be cut back and that this is a consequence.)
There is reporting about the upcoming presidential budget requests about NASA and NOAA. The requested cuts are very deep. To quote Eric Berger's article linked above, for the science part of NASA, "Among the proposals were: A two-thirds cut to astrophysics, down to $487 million; a greater than two-thirds cut to heliophysics, down to $455 million; a greater than 50 percent cut to Earth science, down to $1.033 billion; and a 30 percent cut to Planetary science, down to $1.929 billion." The proposed cuts to NOAA are similarly deep, seeking to end climate study in the agency, as Science puts it. The full presidential budget request, including NSF, DOE, NIST, etc. is still to come. Remember, Congress in the past has often essentially ignored presidential budget requests. It is unclear if the will exists to do so now.
Speaking of NSF, the graduate research fellowship program award announcements for this year came out this past week. The agency awarded slightly under half as many of these prestigious 3-year fellowships as in each of the last 15 years. I can only presume that this is because the agency is deeply concerned about its budgets for the next couple of fiscal years.
Grants are being frozen at several top private universities - these include Columbia (new cancellations), the University of Pennsylvania (here), Harvard (here), Northwestern and Cornell (here), and Princeton (here). There are various lawsuits filed about all of these. Princeton and Harvard have been borrowing money (issuing bonds) to partly deal with the disruption as litigation continues. The president of Princeton has been more vocal than many about this.
There has been a surge in visa revocations and unannounced student status changes in SEVIS for international students in the US. To say that this is unsettling is an enormous understatement. See here for a limited discussion. There seems to be deep reluctance for universities to speak out about this, presumably from the worry that saying the wrong thing will end up placing their international students and scholars at greater exposure.
On Friday evening, the US Department of Energy put out a "policy flash", stating that indirect cost rates on its grants would be cut immediately to 15%. This sounds familiar. Legal challenges are undoubtedly beginning.
Added bonus: According to the Washington Post, DOGE (whatever they say they are this week) is now in control of grants.gov, the website that posts funding opportunities. As the article says, "Now the responsibility of posting these grant opportunities is poised to rest with DOGE — and if its employees delay those postings or stop them altogether, 'it could effectively shut down federal-grant making,' said one federal official who spoke on the condition of anonymity to describe internal operations."
None of this is good news for the future of science and engineering research in the US. If you are a US voter and you think that university-based research is important, I encourage you to contact your legislators and make your opinions heard.
(As I have put in my profile, what I write here are my personal opinions; I am not in any way speaking for my employer. That should be obvious, but it never hurts to state it explicitly.)
Update: NSF has "disestablished" the advisory committees associated with its directorates (except the recently created TIP directorate). Coverage here in Science. This is not good, and I worry that it bodes ill for large cutbacks.
ChatGPT and its kin work by using Large Language Models, or LLMs.
A climate model is a pile of mathematics and code, honed on data from the climate of the past. Tell it how the climate starts out, and it will give you a prediction for what happens next.
Similarly, a language model is a pile of mathematics and code, honed on data from the texts of the past. Tell it how a text starts, and it will give you a prediction for what happens next.
We have a rough idea of what a climate model can predict. The climate has to follow the laws of physics, for example. Similarly, a text should follow the laws of grammar, the order of verbs and nouns and so forth. The creators of the earliest, smallest language models figured out how to do that reasonably well.
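To make “honed on data from the texts of the past” concrete, here is a cartoon language model in Python: it counts which word follows which in a handful of past sentences, then continues a prompt by sampling from those counts. Real LLMs use neural networks over subword tokens rather than word counts, and the training sentences here are made up for the example; this is only a toy.

```python
# A cartoon language model: count which word follows which in past text,
# then predict a continuation of a new prompt from those counts.
import random
from collections import Counter, defaultdict

past_texts = [
    "the hero sets out and saves the world and returns home",
    "the hero returns home and the world is saved",
    "the queen falls from the tower and the hero saves the world",
]

# "Training": tally, for each word, how often each next word followed it.
follows = defaultdict(Counter)
for text in past_texts:
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def predict_next(word):
    """Sample the next word in proportion to how often it followed `word` before."""
    counts = follows[word]
    if not counts:
        return None
    choices, weights = zip(*counts.items())
    return random.choices(choices, weights=weights)[0]

# Tell it how a text starts, and it predicts what happens next.
random.seed(0)
story = ["the", "hero"]
for _ in range(8):
    nxt = predict_next(story[-1])
    if nxt is None:
        break
    story.append(nxt)
print(" ".join(story))
```

Feed it “the hero” and it continues with whatever typically followed those words in its tiny pile of past text.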
Texts do more than just follow grammar, though. They can describe the world. And LLMs are both surprisingly good and surprisingly bad at that. They can do a lot when used right, answering test questions most humans would struggle with. But they also “hallucinate”, confidently saying things that have nothing to do with reality.
If you want to understand why large language models make both good predictions and bad, you shouldn’t just think about abstract “texts”. Instead, think about a specific type of text: a story.
Stories follow grammar, most of the time. But they also follow their own logic. The hero sets out, saves the world, and returns home again. The evil queen falls from the tower at the climax of the final battle. There are three princesses, and only the third can break the spell.
We aren’t usually taught this logic, like we’re taught physics or grammar. We learn it from experience, from reading stories and getting used to patterns. It’s the logic, not of how a story must go, but of how a story typically goes. And that question, of what typically comes next, is exactly the question LLMs are designed to answer.
It’s also a question we sometimes answer.
I was a theatre kid, and I loved improv in particular. Some of it was improv comedy, the games and skits you might have seen on “Whose Line is it Anyway?” But some of it was more…hippy stuff.
I’d meet up with a group on Saturdays. One year we made up a creation myth, half-rehearsed and half-improvised, a collection of gods and primordial beings. The next year we moved the story forward. Civilization had risen…and fallen again. We played a group of survivors gathered around a campfire, wary groups wondering what came next.
We plotted out characters ahead of time. I was the “villain”, or the closest we had to one. An enforcer of the just-fallen empire, the oppressor embodied. While the others carried clubs, staves, and farm implements, I was the only one with a real weapon: a sword.
(Plastic in reality, but the audience knew what to do.)
In the arguments and recriminations of the story, that sword set me apart, a constant threat that turned my character from contemptible to dangerous, that gave me a seat at the table even as I antagonized and stirred the pot.
But the story had another direction. The arguments pushed and pulled, and gradually the survivors realized that they would not survive if they did not put their grievances to rest, if they did not seek peace. So, one man stepped forward, and tossed his staff into the fire.
The others followed. One by one, clubs and sticks and menacing tools were cast aside. And soon, I was the only one armed.
If I was behaving logically, if I followed my character’s interests, I would have “won” there. I had gotten what I wanted, now there was no check on my power.
But that wasn’t what the story wanted. Improv is a game of fast decisions and fluid invention. We follow our instincts, and our instincts are shaped by experience. The stories of the past guide our choices, and must often be the only guide: we don’t have time to edit, or to second-guess.
And I felt the story, and what it wanted. It was a command that transcended will, that felt like it left no room for an individual actor making an individual decision.
I cast my sword into the fire.
The instinct that brought me to do that is the same instinct that guides authors when they say that their characters write themselves, when their story goes in an unexpected direction. It’s an instinct that can be tempered and counteracted, with time and effort, because it can easily lead to nonsense. It’s why every good book needs an editor, why improv can be as repetitive as it is magical.
And it’s been the best way I’ve found to understand LLMs.
An LLM telling a story tells a typical story, based on the data used to create it. In the same way, an LLM giving advice gives typical advice, to some extent in content but more importantly in form, advice that is confident and mentions things advice often mentions. An LLM writing a biography will write a typical biography, which may not be your biography, even if your biography was one of those used to create it, because it tries to predict how a biography should go based on all the other biographies. And all of these predictions and hallucinations are very much the kind of snap judgement that disarmed me.
These days, people are trying to build on top of LLMs and make technology that does more, that can edit and check its decisions. For the most part, they’re building these checks out of LLMs. Instead of telling one story, of someone giving advice on the internet, they tell two stories: the advisor and the editor, one giving the advice and one correcting it. They have to tell these stories many times, broken up into many parts, to approximate something other than the improv actor’s first instincts, and that’s why software that does this is substantially more expensive than more basic software that doesn’t.
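As a sketch of that “two stories” pattern: the function below drafts an answer, asks for a critique, and then revises. The `generate` function is a hypothetical stand-in for whatever model interface one actually uses, not any real API.

```python
# Sketch of the advisor-plus-editor pattern: one call drafts, one critiques, one revises.
# `generate` is a hypothetical placeholder, not a real client library.

def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; swap in a real client here."""
    return f"[model output for: {prompt[:40]}...]"

def advise_with_editor(question: str) -> str:
    draft = generate(f"Give advice in response to:\n{question}")
    critique = generate(
        "You are a careful editor. Point out mistakes or unsupported claims "
        f"in this advice:\n{draft}"
    )
    revised = generate(
        f"Original question:\n{question}\n\nDraft advice:\n{draft}\n\n"
        f"Editor's critique:\n{critique}\n\nRewrite the advice, fixing the problems."
    )
    return revised

print(advise_with_editor("Should I rewrite my thesis introduction?"))
```

Each answer now takes three model calls instead of one, which is one way to see why software built this way costs substantially more to run.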
I can’t say how far they’ll get. Models need data to work well, decisions need reliability to be good, computers need infrastructure to compute. But if you want to understand what’s at an LLM’s beating heart, think about the first instincts you have in writing or in theatre, in stories or in play. Then think about a machine that just does that.
I have received multiple queries from colleagues who have been invited (from a non-academic email address) to speak at a strange-sounding conference that is allegedly supported by major mathematical institutions, allegedly hosted at a prestigious university, and allegedly having myself (and two other Fields Medalists) as plenary speakers. The invitees are asked to pay “registration fees” upfront, with the promise of future reimbursement. (There is a bare-bones web site, which seems to be partially copy-pasted from some previous “conferences” in chemistry and physics, but I will not link to it here.)
I have not agreed (or even been asked) to participate in this event, and I can confirm the same for at least one of the other supposed plenary speakers. There is also no confirmation of the support or location claimed.
As such, this does not appear to be a legitimate scientific conference, and I would advise anyone receiving such an email to discard it.
EDIT: in order to have this post be picked up by appropriate search engine queries, the name of the alleged conference is “Infinity ’25: Horizons in Mathematical Thought”.
SECOND EDIT: I am *not* referring to the 2026 International Congress of Mathematicians (ICM), which is also sending out speaker invitations currently, and is of course an extremely legitimate conference.
Buzz off, this is my blog, and if I feel like posting a chess game, that's what is going to happen. But if you like the game, stay here - this is a nice game.
Again played after hours today, and again on a 5' online blitz server (chess.com). What amazes me is that these days I seem to have a sort of touch for nice attacks and brilliant combinations. Let me show you why I am saying this.
The starting position arose after the following opening sequence:
tommasodorigo - UTOPII841, chess.com April 16 2025
Every week, I tell myself I won’t do yet another post about the asteroid striking American academia, and then every week events force my hand otherwise.
No one on earth—certainly no one who reads this blog—could call me blasé about the issue of antisemitism at US universities. I’ve blasted the takeover of entire departments and unrelated student clubs and campus common areas by the dogmatic belief that the State of Israel (and only Israel, among all nations on earth) should be eradicated, by the use of that belief as a litmus test for entry. Since October 7, I’ve dealt with comments and emails pretty much every day calling me a genocidal Judeofascist Zionist.
So I hope it means something when I say: today I salute Harvard for standing up to the Trump administration. And I’ll say so in person, when I visit Harvard’s math department later this week to give the Fifth Annual Yip Lecture, on “How Much Math Is Knowable?” The more depressing the news, I find, the more my thoughts turn to the same questions that bothered Euclid and Archimedes and Leibniz and Russell and Turing. Actually, what the hell, why don’t I share the abstract for this talk?
Theoretical computer science has over the years sought more and more refined answers to the question of which mathematical truths are knowable by finite beings like ourselves, bounded in time and space and subject to physical laws. I’ll tell a story that starts with Gödel’s Incompleteness Theorem and Turing’s discovery of uncomputability. I’ll then introduce the spectacular Busy Beaver function, which grows faster than any computable function. Work by me and Yedidia, along with recent improvements by O’Rear and Riebel, has shown that the value of BB(745) is independent of the axioms of set theory; on the other end, an international collaboration proved last year that BB(5) = 47,176,870. I’ll speculate on whether BB(6) will ever be known, by us or our AI successors. I’ll next discuss the P≠NP conjecture and what it does and doesn’t mean for the limits of machine intelligence. As my own specialty is quantum computing, I’ll summarize what we know about how scalable quantum computers, assuming we get them, will expand the boundary of what’s mathematically knowable. I’ll end by talking about hypothetical models even beyond quantum computers, which might expand the boundary of knowability still further, if one is able (for example) to jump into a black hole, create a closed timelike curve, or project oneself onto the holographic boundary of the universe.
Now back to the depressing news. What makes me take Harvard’s side is the experience of Columbia. Columbia had already been moving in the right direction on fighting antisemitism, and on enforcing its rules against disruption, before the government even got involved. Then, once the government did take away funding and present its ultimatum—completely outside the process specified in Title VI law—Columbia’s administration quickly agreed to everything asked, to howls of outrage from the left-leaning faculty. Yet despite its total capitulation, the government has continued to hold Columbia’s medical research and other science funding hostage, while inventing a never-ending list of additional demands, whose apparent endpoint is that Columbia submit to state ideological control like a university in Russia or Iran.
By taking this scorched-earth route, the government has effectively telegraphed to all the other universities, as clearly as possible: “actually, we don’t care what you do or don’t do on antisemitism. We just want to destroy you, and antisemitism was our best available pretext, the place where you’d most obviously fallen short of your ideals. But we’re not really trying to cure a sick patient, or force the patient to adopt better health habits: we’re trying to shoot, disembowel, and dismember the patient. That being the case, you might as well fight us and go down with dignity!”
No wonder that my distinguished Harvard friends (and past Shtetl-Optimized guest bloggers) Steven Pinker and Boaz Barak—not exactly known as anti-Zionist woke radicals—have come out in favor of Harvard fighting this in court. So has Harvard’s past president Larry Summers, who’s welcome to guest-blog here as well. They all understand that events have given us no choice but to fight Trump as if there were no antisemitism, even while we continue to fight antisemitism as if there were no Trump.
Update (April 16): Commenter Greg argues that, in the title of this post, I probably ought to revise “Harvard’s biggest crisis since 1636” to “its biggest crisis since 1640.” Why 1640? Because that’s when the new college was shut down, over allegations that its head teacher was beating the students and that the head teacher’s wife (who was also the cook) was serving the students food adulterated with dung. By 1642, Harvard was back on track and had graduated its first class.
The Mathematics Division at Stellenbosch University in South Africa is looking to make a new permanent appointment at Lecturer / Senior Lecturer level (other levels may be considered too under the appropriate circumstances).
Preference will be given to candidates working in number theory or a related area, but those working in other areas of mathematics will definitely also be considered.
The closing date for applications is 30 April 2025. For more details, kindly see the official advertisement.
Consider a wonderful career in the winelands area of South Africa!
After a very intense day at work, I sought some relaxation in online blitz chess today. And the game gave me the kick I was hoping I'd get. After a quick Alapin Sicilian opening, we reached the following position (diagram 1):
As you can see, black is threatening a checkmate with Qxg2++. However, the last move was a serious error, as it neglected the intrinsic power of my open files and diagonals against the black king. Can you find the sequence with which I quickly destroyed my opponent's position?
The Scientia Institute at Rice sponsors a series of public lectures annually, centered around a theme. The intent is to get a wide variety of perspectives spanning the humanities, social sciences, arts, sciences, and engineering, presented in an accessible way. The youtube channel with recordings of recent talks is here.
This past year, the theme was "democracy" in its broadest sense. I was honored to be invited last year to contribute a talk, which I gave this past Tuesday, following a presentation by my CS colleague Rodrigo Ferreira about whether AI has politics. Below I've embedded the video, with the start time set where I begin (27:00, so you can rewind to see Rodrigo).
Which (macroscopic) states of matter do we see? The ones that "win the popular vote" of the microscopic configurations.
Where Does Meaning Live in a Sentence? Math Might Tell Us.
The mathematician Tai-Danae Bradley is using category theory to try to understand both human and AI-generated language.
It’s a nicely set up Q&A, with questions like “What’s something category theory lets you see that you can’t otherwise?” and “How do you use category theory to understand language?”
We’ll get back to measurement, interference and the double-slit experiment just as soon as I can get my math program to produce pictures of the relevant wave functions reliably. I owe you some further discussion of why measurement (and even interactions without measurement) can partially or completely eliminate quantum interference.
But in the meantime, I’ve gotten some questions and some criticism for arguing that superposition is an OR, not an AND. It is time to look closely at this choice, and understand both its strengths and its limitations, and how we have to move beyond it to fully appreciate quantum physics. [I probably should have written this article earlier — and I suspect I’ll need to write it again someday, as it’s a tricky subject.]
The Question of Superposition
Just to remind you of my definitions (we’ll see examples in a moment): objects that interact with one another form a system, and a system is at any time in a certain quantum state, consisting of one or more possibilities combined in some way and described by what is often called a “wave function”. If the number of possibilities described by the wave function is more than one, then physicists say that the state of the quantum system is a superposition of two or more basic states. [Caution: as we’ll explore in later posts, the number of states in the superposition can depend on one’s choice of “basis”.]
As an example, suppose we have two boxes, L and R for left and right, and two atoms, one of hydrogen H and one of nitrogen N. Our physical system consists of the two atoms, and depending on which box each atom is in, the system can exist in four obvious possibilities, shown in Fig. 1:
HL NL (i.e. both the hydrogen atom and the nitrogen atom are in the left box)
HL NR
HR NL
HR NR
Figure 1: The four intuitively obvious options for how to store the two atoms in the boxes correspond to four basic states of the quantum system.
Before quantum physics, we would have thought those were the only options; each atom must be in one box or the other. But in quantum physics there are many more non-obvious possibilities.
In particular, we could put the system in a superposition of the form HL NL + HR NR, shown in Fig. 2. In the jargon of physics, “the system is in a superposition of HL NL and HR NR“. Note the use of the word “and” here. But don’t read too much into it; jargon often involves linguistic shorthand, and can be arbitrary and imprecise. The question I’m focused on here is not “what do physicists say?”, but “what does it actually mean?”
Figure 2: A quantum system can be in a superposition, such as this one represented by two basic states related by a “+” symbol. (This is not the most general case, as discussed below.)
In particular, does it mean that “HL NL AND HR NR” are true? Or does it mean “HL NL OR HR NR” is true? Or does it mean something else?
The Problems with “AND”
First, let’s see why the “AND” option has a serious problem.
In ordinary language, if I say that “A AND B are true”, then I mean that one can check that A is true and also, separately, that B is true — i.e., both A and B are true. With this meaning in mind, it’s clear that experiments do not encourage us to view superposition as an AND. (There are theory interpretations of quantum physics that do encourage the use of “AND”, a point I’ll return to.)
Experiment Is Skeptical
Specifically, if a system is in a quantum superposition of two states A and B, no experiment will ever show that
A is true AND
B is true.
Instead, in any experiment explicitly designed to check whether A is true and whether B is true, the result will only reveal, at best, that
A is true and B is not true OR
B is true and A is not true.
The result might also be ambiguous, neither confirming nor denying that either one is true. But no measurement will ever show that both A AND B are definitively true. The two possibilities A and B are mutually exclusive in any actual measurement that is sensitive to the question.
In our case, if we go looking for our two atoms in the state HL NL + HR NR — if we do position measurements on both of them — we will either find both of them in the left box OR both of them in the right box. 1920’s quantum physics may be weird, but it does not allow measurements of an atom to find it in two places at the same time: an atom has a position, even if it is inherently uncertain, and if I make a serious attempt to locate it, I will find only one answer (within the precision of the measurement). [Measurement itself requires a long discussion, which I won’t attempt here; but see this post and the following one.]
And so, in this case, a measurement will find that one box has two atoms and the other has zero. Yet if we use “AND” in describing the superposition, we end up saying “both atoms are in the left box AND both atoms are in the right box”, which seems to imply that both atoms are in both boxes, contrary to any experiment. Again, certain theoretical approaches might argue that they are in both boxes, but we should obviously be very cautious when experiment disagrees with theoretical reasoning.
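Here is a minimal numpy sketch of that point (the state labels and the equal 1/√2 amplitudes are illustrative choices, not anything from an actual experiment): it writes the superposition HL NL + HR NR as a vector of amplitudes over the four basic states of Fig. 1 and samples simulated position measurements using the Born rule. The mixed outcomes HL NR and HR NL simply never occur.

```python
import numpy as np

# The four basic states of the two-atom system (Fig. 1).
basis = ["HL NL", "HL NR", "HR NL", "HR NR"]

# The superposition (HL NL + HR NR)/sqrt(2), written as a vector of amplitudes.
state = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Born rule: the probability of each measurement outcome is |amplitude|^2.
probs = np.abs(state) ** 2

rng = np.random.default_rng(0)
outcomes = rng.choice(basis, size=10_000, p=probs)

for label in basis:
    print(label, np.count_nonzero(outcomes == label))
# Roughly 5000 each for "HL NL" and "HR NR"; exactly 0 for the mixed outcomes.
```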
The Fortunate and/or Unfortunate Cat
The example of Schrodinger’s cat is another context in which some writers use “and” in describing what is going on.
A reminder of the cat experiment: We have an atom which may decay now or later, according to a quantum process whose timing we cannot predict. If the atom decays, it initiates a chain reaction which kills the cat. If the atom and the cat are placed inside a sealed box, isolating them completely from the rest of the universe, then the initial state, with an intact atom (Ai) and a Live cat (CL), will evolve to a state in a superposition roughly of the form AiCL + AdCD, where Ad refers to a decayed atom and CD refers to a Dead cat. (More precisely, the state will take the form c1AiCL + c2AdCD, where c1 and c2 are complex numbers with |c1|² + |c2|² = 1; but we can ignore these numbers for now.)
Figure 3: As in Figure 2, a superposition can even, in principle, be applied to macroscopic objects. This includes the famous Schrodinger cat state.
Leaving aside that the experiment is both unethical and impossible in practice, it raises an important point about the word “AND”. It includes a place where we must say “AND“; there’s no choice.
As we close the box to start the experiment, the atom is intact AND the cat is alive; both are simultaneously true, as measurement can verify. The state that we use to describe this, AiCL, is a mathematical product: implicitly “AiCL” means AixCL, where x is the “times” symbol.
Figure 4: It is unambiguous that the initial state of the cat-atom system is that the atom is intact AND the cat is alive: AixCL.
Later, the state to which the system evolves is a sum of two products — a superposition (AixCL) + (AdxCD), which includes two “AND” relationships:
1) “the atom is intact AND the cat is alive” (AixCL)
2) “the atom has decayed AND the cat is dead” (AdxCD)
In each of these two possibilities, the state of the atom and the state of the cat are perfectly correlated; if you know one, you know the other. To use language consistent with English (and all other languages with which I am familiar), we must use “AND” to describe this correlation. (Note: in this particular example, correlation does in fact imply causation — but that’s not a requirement here. Correlation is enough.)
It is then often said that, theoretically, “before we open the box, the cat is both alive AND dead”. But again, if we open the box to find out, experimentally, we will find out either that “the cat is alive OR the cat is dead.” So we should think this through carefully.
We’ve established that “x” must mean “AND“, as in Fig. 4. So let’s try to understand the “+” that appears in the superposition (AixCL) + (AdxCD). It is certainly the case that such a state doesn’t tell us whether CL is true or CD is true, or even that it is meaningful to say that only one is true.
But suppose we decide that “+” means “AND“, also. Then we end up saying
“(the cat is alive AND the atom is intact) AND (the cat is dead AND the atom has decayed.)”
That’s very worrying. In ordinary English, if I’m referring to some possible facts A,B,C, and D, and I tell you that “(A AND B are true) AND (C AND D are true)”, the logic of the language implies that A AND B AND C AND D are all true. But that standard logic would lead to a falsehood. It is absolutely not the case, in the state (AixCL) + (AdxCD), that CL is true and Ad is true — we will never find, in any experiment, that the cat is alive and yet the atom has decayed. That could only happen if the system were in a superposition that includes the possibility AdxCL. Nor (unless we wait a few years and the cat dies of old age) can it be the case that CD is true and Ai is true.
And so, if “x” means “AND” and “+” means “AND“, it’s clear that these are two different meanings of “AND.”
“AND” and “AND”
Is that okay? Well, lots of words have multiple meanings. Still, we’re not used to the idea of “AND” being ambiguous in English. Nor are “x” and “+” usually described with the same word. So using “AND” is definitely problematic.
(That said, people who like to think in terms of parallel “universes” or “branches” in which all possibilities happen [the many-worlds interpretation] may actually prefer to have two meanings of “AND”, one for things that happen in two different branches, and one for things that happen in the same branch. But this has some additional problems too, as we’ll see later when we get to the subtleties of “OR”.)
These issues are why, in my personal view, “OR” is better when one first learns quantum physics. I think it makes it easier to explain how quantum physics is both related to standard probability and yet goes beyond it. For one thing, “or” is already ambiguous in English, so we’re used to the idea that it might have multiple meanings. For another, we definitely need “+” to be conceptually different from “x“, so it is confusing, pedagogically, to start right off by saying that both mathematical operators are “AND”.
But “OR” is not without its problems.
The Problems with “OR”
In normal English, saying “the atom is intact and the cat is alive” OR “the atom has decayed and the cat is dead” would tell us two possible facts about the current contents of the box, one of which is definitely true.
But in quantum physics, the use of “OR” in the Schrodinger cat superposition does not tell us what is currently happening inside the box. It does tell us the state of the system at the moment, but all that does is predict the possible outcomes that would be observed if the box were opened right now (and their probabilities). That’s less information than telling us the properties of what is in the closed box.
The advantage of “OR” is that it does tell us the two outcomes of opening the box, upon which we will find
“The atom is intact AND the cat is alive” OR
“The atom has decayed AND the cat is dead”
Similarly, for our box of atoms, it tells us that if we attempt to locate the atoms, we will find that
“the hydrogen atom is in the left box AND the nitrogen atom is in the left box” OR
“the hydrogen atom is in the right box AND the nitrogen atom is in the right box”
In other words, this use of AND and OR agrees with what experiments actually find. Better this than the alternative, it seems to me.
Nevertheless, just because it is better doesn’t mean it is unproblematic.
The Usual Or
The word “OR” is already ambiguous in usual English, in that it could mean
either A is true or B is true
A is true or B is true or both are true
Which of these two meanings is intended in an English sentence has to be determined by context, or explained by the speaker. Here I’m focused on the first meaning.
Returning to our first example of Figs. 1 and 2, suppose I hand the two atoms to you and ask you to put them in either box, whichever one you choose. You do so, but you don’t tell me what your choice was, and you head off on a long vacation.
While I wait for you to return, what can I say about the two atoms? Assuming you followed my instructions, I would say that
“both atoms are in the left box OR both atoms are in the right box”
In doing so, I’m using “or” in its “either…or…” sense in ordinary English. I don’t know which box you chose, but I still know (Fig. 5) that the system is either definitely in the HL NL state OR definitely in the HR NR state of Fig. 1. I know this without doing any measurement, and I’m only uncertain about which is which because I’m missing information that you could have provided me. The information is knowable; I just don’t have it.
Figure 5: The atoms were definitely put into one box or the other, but nobody told me which box was selected.
But this uncertainty about which box the atoms are in is completely different from the uncertainty that arises from putting the atoms in the superposition state HL NL + HR NR!
The Superposition OR
If the system is in the state HL NL + HR NR, i.e. what I’ve been calling (“HL NL OR HR NR“), it is in a state of inherent uncertainty of whether the two atoms are in the left box or in the right box. It is not that I happen not to know which box the atoms are in, but rather that this information is not knowable within the rules of quantum physics. Even if you yourself put the atoms into this superposition, you don’t know which box they’re in any more than I do.
The only thing we can try to do is perform an experiment and see what the answer is. The problem is that we cannot necessarily infer, if we find both atoms in the left box, that the two atoms were in that box prior to that measurement.
If we do try to make that assumption, we find ourselves in apparent contradiction with experiment. The issue is quantum interference. If we repeat the whole process, but instead of opening the boxes to see where the atoms are, we first bring the two boxes together and measure the atoms’ properties, we will observe quantum interference effects. As I have discussed in my recent series of five posts on interference (starting here), quantum interference can only occur when a system takes at least two paths to its current state; but if the two atoms were definitely in one box or definitely in the other, then there would be only one path in Fig. 6.
Figure 6: In the superposition state, the atoms cannot simply be in definite but unknown locations, as in Fig. 5. If the boxes are joined and then opened, quantum interference will occur, implying the system has evolved via two paths to a single state.
Prior to the measurement, the system had inherent uncertainty about the question, and while measurement removes the current uncertainty, it does not in general remove the past uncertainty. The act of measurement changes the state of the system — more precisely, it changes the state of the larger system that includes both atoms and the measurement device — and so establishing meaningfully that the two atoms are now in the left box is not sufficient to tell us meaningfully that the two atoms were previously and definitively in the left box.
So if this is “OR“, it is certainly not what it usually means in English!
This Superposition or That One?
And it gets worse, because we can take more complex examples. As I mentioned when discussing the poor cat, the superposition HL NL + HR NR is actually one in a large class of superpositions, of the form c1 HL NL + c2 HR NR, where c1 and c2 are complex numbers. A second simple example of such a superposition is HL NL – HR NR, with a minus sign instead of a plus sign.
So suppose I had asked you to put the two atoms in a superposition either of the form HL NL + HR NR or HL NL – HR NR, your choice; and suppose you did so without telling me which superposition you chose. What would I then know?
I would know that the system is either in the state (HL NL + HR NR) or in the state (HL NL – HR NR), depending on what you chose to do. In words, what I would know is that the system is represented by
(HL NL OR HR NR) OR (HL NL OR HR NR)
Uh oh. Now we’re as badly off as we were with “AND“.
First, the “OR” in the center is a standard English “OR” — it means that the system is definitely in one superposition or the other, but I don’t know which one — which isn’t the same thing as the “OR“s in the parentheses, which are “OR“s of superposition that only tell us what the results of measurements might be.
Second, the two “OR“s in the parentheses are different, since one means “+” and the other means “–“. In some other superposition state, the OR might mean 3/5 + i 4/5, where i is the standard imaginary number equal to the square root of -1. In English, there’s obviously no room for all this complexity. [Note that I’d have the same problem if I used “AND” for superpositions instead.]
So even if “OR” is better, it’s still not up to the task. Superposition forces us to choose whether to have multiple meanings of “AND” or multiple meanings of “OR”, including meanings that don’t hold in ordinary language. In a sense, the “+” (or “-” or whatever) in a superposition is a bit more “AND” than standard English “OR”, but it’s also a bit more “OR” than a standard English “AND”. It’s something truly new and unfamiliar.
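To see why the coefficients matter even though neither ordinary “AND” nor ordinary “OR” can capture them, here is a small numpy sketch (a toy calculation with illustrative states, not a description of a real apparatus): the states HL NL + HR NR and HL NL – HR NR give identical probabilities if we merely ask which box the atoms are in, but a measurement in a different basis (the analogue of bringing the boxes together and letting the two paths interfere) distinguishes them perfectly.

```python
import numpy as np

# Amplitudes over the basic states [HL NL, HL NR, HR NL, HR NR].
plus  = np.array([1, 0, 0,  1], dtype=complex) / np.sqrt(2)   # HL NL + HR NR
minus = np.array([1, 0, 0, -1], dtype=complex) / np.sqrt(2)   # HL NL - HR NR

def probabilities(state, basis_vectors):
    """Born-rule probabilities of finding the state in each given basis vector."""
    return np.array([abs(np.vdot(b, state)) ** 2 for b in basis_vectors])

# "Which box?" measurement: project onto the four basic states themselves.
box_basis = np.eye(4, dtype=complex)
print(probabilities(plus, box_basis))   # [0.5 0.  0.  0.5]
print(probabilities(minus, box_basis))  # [0.5 0.  0.  0.5]  -- identical!

# An interference-style measurement: project onto the two superpositions themselves.
interference_basis = [plus, minus]
print(probabilities(plus, interference_basis))   # [1. 0.]
print(probabilities(minus, interference_basis))  # [0. 1.]  -- now they differ.
```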
Experts in the foundational meaning of quantum physics argue over whether to use “OR” or “AND”. It’s not an argument I want to get into. My goal here is to help you understand how quantum physics works with the minimum of interpretation and the minimum of mathematics. This requires precise language, of course. But here we find we cannot avoid a small amount of math — that of simple numbers, sometimes even complex numbers — because ordinary language simply can’t capture the logic of what quantum physics can do.
I will continue, for consistency, to use “OR” for a superposition, but going forward we must admit and recognize its limitations, and become more sophisticated about what it does and doesn’t mean. One should understand my use of “OR“, and the “pre-quantum viewpoint” that I often employ, as pedagogical methodology, not a statement about nature. Specifically, I have been trying to clarify the crucial idea of the space of possibilities, and to show examples of how quantum physics goes beyond pre-quantum physics. I find the “pre-quantum viewpoint”, where it is absolutely required that we use “OR”, helps students get the basics straight. But it is true that the pre-quantum viewpoint obscures some of the full richness and complexity of quantum phenomena, much of which arises precisely because the quantum “OR” is not the standard “OR” [and similarly if you prefer “AND” instead.] So we have to start leaving it behind.
There are many more layers of subtlety yet to be uncovered [for instance, what if my system is in a state (A OR B), but I make a measurement that can’t directly tell me whether A is true or B is true?] but this is enough for today.
I’m grateful to Jacob Barandes for a discussion about some of these issues.
Conceptual Summary
When we use “A AND B” in ordinary language, we mean “A is true and B is true”.
When we use “A OR B” in ordinary language, we find “OR” is ambiguous even in English; it may mean
“either A is true or B is true”, or
“A is true or B is true or both are true.”
In my recent posts, when I say a superposition c1 A + c2 B can be expressed as “A OR B”, I mean something that I cannot mean in English, because such a meaning would never normally occur to us:
I mean that the result of an appropriate measurement carried out at this moment will give the result A or the result B (but not both).
I do so without generally implying that the state of the system, if I don’t carry out the measurement, is definitely A or definitely B (though unknown).
Instead the system could be viewed as being in an uncanny state of being that we’re not used to, for which neither ordinary “AND” nor ordinary “OR” applies.
Note also that using either “AND” or “OR” is unable to capture the difference between superpositions that involve the same states but differ in the numbers c1, c2.
The third bullet point is open to different choices about “AND” and “OR“, and open to different interpretations about what superposition states imply about the systems that are in them. There are different consistent ways to combine the language and concepts, and the particular choice I’ve made is pragmatic, not dogmatic. For a single set of blog posts that tell a coherent story, I have to pick a single consistent language; but it’s a choice. Once one’s understanding of quantum physics is strong, it’s both valuable and straightforward to consider other possible choices.
A friend and I were discussing whether there’s anything I could possibly say, on this blog, in 2025, that wouldn’t provoke an outraged reaction from my commenters. So I started jotting down ideas. Let’s see how I did.
Pancakes are a delicious breakfast, especially with blueberries and maple syrup.
Since it’s now Passover, and no pancakes for me this week, let me add: I think matzoh has been somewhat unfairly maligned. Of course it tastes like cardboard if you eat it plain, but it’s pretty tasty with butter, fruit preserves, tuna salad, egg salad, or chopped liver.
Central Texas is actually really nice in the springtime, with lush foliage and good weather for being outside.
Kittens are cute. So are puppies, although I’d go for kittens given the choice.
Hamilton is a great musical—so much so that it’s become hard to think about the American Founding except as Lin-Manuel Miranda reimagined it, with rap battles in Washington’s cabinet and so forth. I’m glad I got to take my kids to see it last week, when it was in Austin (I hadn’t seen it since its pre-Broadway previews a decade ago). Two-hundred fifty years on, I hope America remembers its founding promise, and that Hamilton doesn’t turn out to be America’s eulogy.
The Simpsons and Futurama are hilarious.
Young Sheldon and The Big Bang Theory are unjustly maligned. They were about as good as any sitcoms can possibly be.
For the most part, people should be free to live lives of their choosing, as long as they’re not harming others.
The rapid progress of AI might be the most important thing that’s happened in my lifetime. There’s a huge range of plausible outcomes, from “merely another technological transformation like computing or the Internet” to “biggest thing since the appearance of multicellular life,” but in any case, we ought to proceed with caution and with the wider interests of humanity foremost in our minds.
Research into curing cancer is great and should continue to be supported.
The discoveries of NP-completeness, public-key encryption, zero-knowledge and probabilistically checkable proofs, and quantum computational speedups were milestones in the history of theoretical computer science, worthy of celebration.
Katalin Karikó, who pioneered mRNA vaccines, is a heroine of humanity. We should figure out how to create more Katalin Karikós.
Scientists spend too much of their time writing grant proposals, and not enough doing actual science. We should experiment with new institutions to fix this.
I wish California could build high-speed rail from LA to San Francisco. If California’s Democrats could show they could do this, it would be an electoral boon to Democrats nationally.
I wish the US could build clean energy, including wind, solar, and nuclear. Actually, more generally, we should do everything recommended in Derek Thompson and Ezra Klein’s phenomenal new book Abundance, which I just finished.
The great questions of philosophy—why does the universe exist? how does consciousness relate to the physical world? what grounds morality?—are worthy of respect, as primary drivers of human curiosity for millennia. Scientists and engineers should never sneer at these questions. All the same, I personally couldn’t spend my life on such questions: I also need small problems, ones where I can make definite progress.
Quantum physics, which turns 100 this year, is arguably the most metaphysical of all empirical discoveries. It’s worthy of returning to again and again in life, asking: but how could the world be that way? Is there a different angle that we missed?
If I knew for sure that I could achieve Enlightenment, but only by meditating on a mountaintop for a decade, a further question would arise: is it worth it? Or would I rather spend that decade engaged with the world, with scientific problems and with other people?
I, too, vote for political parties, and have sectarian allegiances. But I’m most moved by human creative effort, in science or literature or anything else, that transcends time and place and circumstance and speaks to the eternal.
As I was writing this post, a bird died by flying straight into the window of my home office. As little sense as it might make from a utilitarian standpoint, I am sad for that bird.
I’ve just uploaded to the arXiv the paper “Decomposing a factorial into large factors“. This paper studies the quantity $t(N)$, defined as the largest quantity such that it is possible to factorize $N!$ into $N$ factors, each of which is at least $t(N)$. The first few values of this sequence are
$$1, 1, 1, 2, 2, 2, 2, 2, 3, \dots$$
(OEIS A034258). For instance, we have $t(9)=3$, because on the one hand we can factor
$$9! = 3 \times 3 \times 3 \times 3 \times 4 \times 4 \times 5 \times 7 \times 8$$
but on the other hand it is not possible to factorize $9!$ into nine factors, each of which is $4$ or higher.
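As a quick sanity check on the definition (a brute-force script, not the method of the paper), one can compute $t(N)$ for small $N$ by searching for a factorization of $N!$ into $N$ nondecreasing factors that are all at least a candidate threshold:

```python
from math import factorial

def can_split(n, k, lo):
    """Can n be written as a product of k factors, each >= lo, in nondecreasing order?"""
    if k == 1:
        return n >= lo
    f = lo
    while f ** k <= n:          # the smallest factor cannot exceed n**(1/k)
        if n % f == 0 and can_split(n // f, k - 1, f):
            return True
        f += 1
    return False

def t(N):
    """Largest t such that N! factors into N parts, each at least t."""
    best = 1
    while can_split(factorial(N), N, best + 1):
        best += 1
    return best

print([t(N) for N in range(1, 10)])  # [1, 1, 1, 2, 2, 2, 2, 2, 3]; in particular t(9) = 3
```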
This quantity was introduced by Erdős, who asked for upper and lower bounds on $t(N)$; informally, this asks how equitably one can split up $N!$ into $N$ factors. When factoring an arbitrary number, this is essentially a variant of the notorious knapsack problem (after taking logarithms), but one can hope that the specific structure of the factorial $N!$ can make this particular knapsack-type problem more tractable. Since
$$t_1 t_2 \cdots t_N = N!$$
for any putative factorization $N! = t_1 t_2 \cdots t_N$ (with each $t_i \geq t(N)$), we obtain an upper bound
$$t(N) \leq N!^{1/N} = \left(\frac{1}{e}+o(1)\right) N \qquad (1)$$
thanks to the Stirling approximation. At one point, Erdős, Selfridge, and Straus claimed that this upper bound was asymptotically sharp, in the sense that
$$t(N) = \left(\frac{1}{e}-o(1)\right) N \qquad (2)$$
as $N \to \infty$; informally, this means we can split $N!$ into $N$ factors that are (mostly) approximately the same size, when $N$ is large. However, as reported in this later paper, Erdős “believed that Straus had written up our proof… Unfortunately Straus suddenly died and no trace was ever found of his notes. Furthermore, we never could reconstruct our proof, so our assertion now can be called only a conjecture”.
Some further exploration of was conducted by Guy and Selfridge. There is a simple construction that gives the lower bound
that comes from starting with the standard factorization and transferring some powers of from the later part of the sequence to the earlier part to rebalance the terms somewhat. More precisely, if one removes one power of two from the even numbers between and , and one additional power of two from the multiples of four between to , this frees up powers of two that one can then distribute amongst the numbers up to to bring them all up to at least in size. A more complicated procedure involving transferring both powers of and then gives the improvement . At this point, however, things got more complicated, and the following conjectures were made by Guy and Selfridge:
(i) Is for all ?
(ii) Is for all ? (At , this conjecture barely fails: .)
(iii) Is for all ?
In this note we establish the bounds
as , where is the explicit constant
In particular this recovers the lost result (2). An upper bound of the shape
which is consistent with the above conjectures (i), (ii), (iii) of Guy and Selfridge, although numerically the convergence is somewhat slow.
The upper bound argument for (3) is simple enough that it could also be modified to establish the first conjecture (i) of Guy and Selfridge; in principle, (ii) and (iii) are now also reducible to a finite computation, but unfortunately the implied constants in the lower bound of (3) are too weak to make this directly feasible. However, it may be possible to now crowdsource the verification of (ii) and (iii) by supplying a suitable set of factorizations to cover medium sized , combined with some effective version of the lower bound argument that can establish for all past a certain threshold. The value singled out by Guy and Selfridge appears to be quite a suitable test case: the constructions I tried fell just a little short of the conjectured threshold of , but it seems barely within reach that a sufficiently efficient rearrangement of factors can work here.
We now describe the proof of the upper and lower bound in (3). To improve upon the trivial upper bound (1), one can use the large prime factors of . Indeed, every prime between and divides at least once (and the ones between and divide it twice), and any factor that contains such a factor therefore has to be significantly larger than the benchmark value of . This observation already readily leads to some upper bound of the shape (4) for some ; using also the primes that are slightly less than (noting that any multiple of that exceeds , must in fact exceed ) is what leads to the precise constant .
For previous lower bound constructions, one started with the initial factorization and then tried to “improve” this factorization by moving around some of the prime factors. For the lower bound in (3), we start instead with an approximate factorization roughly of the shape
where is the target lower bound (so, slightly smaller than ), and is a moderately sized natural number parameter (we will take , although there is significant flexibility here). If we denote the right-hand side here by , then is basically a product of numbers of size at least . It is not literally equal to ; however, an easy application of Legendre’s formula shows that for odd small primes , and have almost exactly the same number of factors of . On the other hand, as is odd, contains no factors of , while contains about such factors. The prime factorizations of and differ somewhat at large primes, but has slightly more such prime factors as (about such factors, in fact). By some careful applications of the prime number theorem, one can tweak some of the large primes appearing in to make the prime factorization of and agree almost exactly, except that is missing most of the powers of in , while having some additional large prime factors beyond those contained in to compensate. With a suitable choice of threshold , one can then replace these excess large prime factors with powers of two to obtain a factorization of into terms that are all at least , giving the lower bound.
The general approach of first locating some approximate factorization of (where the approximation is in the “adelic” sense of having not just approximately the right magnitude, but also approximately the right number of factors of for various primes ), and then moving factors around to get an exact factorization of , looks promising for also resolving the conjectures (ii), (iii) mentioned above. For instance, I was numerically able to verify that by the following procedure:
Start with the approximate factorization of , by . Thus is the product of odd numbers, each of which is at least .
Call an odd prime -heavy if it divides more often than , and -heavy if it divides more often than . It turns out that there are more -heavy primes than -heavy primes (counting multiplicity). On the other hand, contains powers of , while has none. This represents the (multi-)set of primes one has to redistribute in order to convert a factorization of to a factorization of .
Using a greedy algorithm, one can match a -heavy prime to each -heavy prime (counting multiplicity) in such a way that for a small (in most cases one can make , and often one also has ). If we then replace in the factorization of by for each -heavy prime , this increases (and does not decrease any of the factors of ), while eliminating all the -heavy primes. With a somewhat crude matching algorithm, I was able to do this using of the powers of dividing , leaving powers remaining at my disposal. (I don’t claim that this is the most efficient matching, in terms of powers of two required, but it sufficed.)
There are still -heavy primes left over in the factorization of (the modified version of) . Replacing each of these primes with , and then distributing the remaining powers of two arbitrarily, one obtains a factorization of into terms, each of which is at least .
However, I was not able to adjust parameters to reach in this manner. Perhaps some readers here who are adept with computers can come up with a more efficient construction to get closer to this bound? If one can find a way to reach this bound, most likely it can be adapted to then resolve conjectures (ii) and (iii) above after some additional numerical effort.
UPDATE: There is now an active Github project to track the latest progress, coming from multiple contributors.
(A post summarizing recent US science-related events will be coming later. For now, here is my promised post about multiferroics, inspired in part by a recent visit to Rice by Yoshi Tokura.)
Electrons carry spins and therefore magnetic moments (that is, they can act in some ways like little bar magnets), and as I was teaching undergrads this past week, under certain conditions some of the electrons in a material can spontaneously develop long-range magnetic order. That is, rather than being randomly oriented on average, below some critical temperature the spins take on a pattern that repeats throughout the material. In the ordered state, if you know the arrangement of spins in one (magnetic) unit cell of the material, that pattern is repeated over many (perhaps all, if the system is a single domain) of the unit cells. In picking out this pattern, the overall symmetry of the material is lowered compared to the non-ordered state. (There can be local moment magnets, when the electrons with the magnetic moments are localized to particular atoms; there can also be itinerant magnets, when the mobile electrons in a metal take on a net spin polarization.) The most famous kind of magnetic order is ferromagnetism, when the magnetic moments spontaneously align along a particular direction, often leading to magnetic fields projected out of the material. Magnetic materials can be metals, semiconductors, or insulators.
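As a cartoon of how such spontaneous order sets in below a critical temperature, here is a tiny mean-field sketch (a standard textbook toy model, not specific to any of the materials discussed here): solving the self-consistency condition m = tanh(Jzm/k_BT) by iteration gives zero magnetization above T_c = Jz/k_B and a nonzero spontaneous magnetization below it. The parameter values are arbitrary.

```python
import numpy as np

def mean_field_magnetization(T, Jz=1.0, tol=1e-10):
    """Solve m = tanh(Jz * m / T) by fixed-point iteration (units with k_B = 1)."""
    m = 1.0  # start from a fully polarized guess so we land on the ordered branch
    for _ in range(10_000):
        m_new = np.tanh(Jz * m / T)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

for T in [0.2, 0.5, 0.9, 0.99, 1.01, 1.5]:
    print(f"T = {T:4.2f}  m = {mean_field_magnetization(T):.4f}")
# m is close to 1 at low T, shrinks toward 0 as T -> Tc = Jz, and is ~0 above Tc.
```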
In insulators, an additional kind of order is possible, based on electric polarization, \(\mathbf{P}\). There is subtlety about defining polarization, but for the purposes of this discussion, the question is whether the atoms within each unit cell bond appropriately and are displaced below some critical temperature to create a net electric dipole moment, leading to ferroelectricity. (Antiferroelectricity is also possible.) Again, the ordered state has lower symmetry than the non-ordered state. Ferroelectric materials have some interesting applications.
BiFeO3, a multiferroic antiferromagnet, image from here.
Multiferroics are materials that have simultaneous magnetic order and electric polarization order. A good recent review is here. For applications, obviously it would be convenient if both the magnetic and polarization ordering happened well above room temperature. There can be deep connections between the magnetic order and the electric polarization - see this paper, and this commentary. Because of these connections, the low energy excitations of multiferroics can be really complicated, like electromagnons. Similarly, there can be combined "spin textures" and polarization textures in such materials - see here and here. Multiferroics raise the possibility of using applied voltages (and hence electric fields) to flip \(\mathbf{P}\), and thus toggle around \(\mathbf{M}\). This has been proposed as a key enabling capability for information processing devices, as in this approach. These materials are extremely rich, and it feels like their full potential has not yet been realized.
The finale of season 3 having aired, an old post of mine about The White Lotus season 1 is getting a lot of hits. So let me just say that I think the conclusion of season 3 very much backs up my 2021 read of Belinda’s character.
My sister-in-law was explaining to me a difference between British and US standard English — in Britain, “just” acts as a marker requiring the past perfect, but in America, that’s not the case. So the Beatles sing “I’ve Just Seen a Face,” not “I Just Saw a Face,” while Stevie Wonder sings “I Just Called To Say I Love You,” not “I’ve Just Called To Say I Fancy You.”
Is this a Beatles song you’re aware of? Not one of their biggest hits but wonderful. Paul McCartney was always just kind of finding perfect pop songs like this between his couch cushions. “I’ve just cleaned my couch,” he’d say, and there it was.
A few weeks ago I wrote an article about the 2024 Orioles breaking the all-time record for fewest double plays hit into. I wrote: “Regression to the mean spares no one, and the 2025 Orioles will likely hit into more double plays than they did in 2024.” Boy, was I right. Thirteen games into the season, the Orioles have hit into 16 double plays, the most in the majors and more than double their pace of last year. Sorry, Orioles! I think I cursed you. I mean you see a play like this one, where Bobby Witt Jr. makes a nutso catch on a Heston Kjerstad line drive to double off a flabbergasted Tyler O’Neill at second base, and you think — that doesn’t just happen.
They really do seem hex-struck. I was skeptical of the “sign eight or nine #4 starters approach” but figured it would at least allow them to absorb injuries without skipping a beat. But now? The starting rotation you could make from our injured list — Zach Eflin, Grayson Rodriguez, Kyle Bradish, Albert Suárez, and Tyler Wells — is a lot better than the five guys we have available. In fact, I’d say it’s a better than average MLB starting five. We have broken an entire rotation! And this isn’t even counting Trevor Rogers, Chayce McDermott, and Kyle Gibson, all also on the roster and too sore to take the mound.
If the Orioles need me to stand at home plate and enact some kind of cleansing ceremony, I am ready and able.
Particle physicists have an enormously successful model called the Standard Model, which describes the world in terms of seventeen quantum fields, giving rise to particles from the familiar electron to the challenging-to-measure Higgs boson. The model has nineteen parameters, numbers that aren’t predicted by the model itself but must be found by doing experiments and finding the best statistical fit. With those numbers as input, the model is extremely accurate, aside from the occasional weird discrepancy.
Cosmologists have their own very successful standard model that they use to model the universe as a whole. Called ΛCDM, it describes the universe in terms of three things: dark energy, denoted with a capital lambda (Λ), cold dark matter (CDM), and ordinary matter, all interacting with each other via gravity. The model has six parameters, which must be found by observing the universe and finding the best statistical fit. When those numbers are input, the model is extremely accurate, though there have recently been some high-profile discrepancies.
ΛCDM doesn’t just propose a list of fields and let them interact freely. Instead, it tries to model the universe as a whole, which means it carries assumptions about how matter and energy are distributed, and how space-time is shaped. Some of this is controlled by its parameters, and by tweaking them one can model a universe that varies in different ways. But other assumptions are baked in. If the universe had a very different shape, caused by a very different distribution of matter and energy, then we would need a very different model to represent it. We couldn’t use ΛCDM.
The Standard Model isn’t like that. If you collide two protons together, you need a model of how quarks are distributed inside protons. But that model isn’t the Standard Model, it’s a separate model used for that particular type of experiment. The Standard Model is supposed to be the big picture, the stuff that exists and affects every experiment you can do.
That means the Standard Model is supported in a way that ΛCDM isn’t. The Standard Model describes many different experiments, and is supported by almost all of them. When an experiment disagrees, it has specific implications for part of the model only. For example, neutrinos have mass, which was not predicted in the Standard Model, but it proved easy for people to modify the model to fit. We know the Standard Model is not the full picture, but we also know that any deviations from it must be very small. Large deviations would contradict other experiments, or more basic principles like probabilities needing to be smaller than one.
In contrast, ΛCDM is really just supported by one experiment. We have one universe to observe. We can gather a lot of data, measuring it from its early history to the recent past. But we can’t run it over and over again under different conditions, and our many measurements are all measuring different aspects of the same thing. That’s why unlike in the Standard Model, we can’t separate out assumptions about the shape of the universe from assumptions about what it contains. Dark energy and dark matter are on the same footing as distribution of fluctuations and homogeneity and all those shape-related words, part of one model that gets fit together as a whole.
And so while both the Standard Model and ΛCDM are successful, that success means something different. It’s hard to imagine that we find new evidence and discover that electrons don’t exist, or quarks don’t exist. But we may well find out that dark energy doesn’t exist, or that the universe has a radically different shape. The statistical success of ΛCDM is impressive, and it means any alternative has a high bar to clear. But it doesn’t have to mean rethinking everything the way an alternative to the Standard Model would.
In this terrifying time for the world, I’m delighted to announce a little glimmer of good news. I’m receiving a large grant from the wonderful Open Philanthropy, to build up a group of students and postdocs over the next few years, here at UT Austin, to do research in theoretical computer science that’s motivated by AI alignment. We’ll think about some of the same topics I thought about in my time at OpenAI—interpretability of neural nets, cryptographic backdoors, out-of-distribution generalization—but we also hope to be a sort of “consulting shop,” to whom anyone in the alignment community can come with theoretical computer science problems.
I already have two PhD students and several undergraduate students working in this direction. If you’re interested in doing a PhD in CS theory for AI alignment, feel free to apply to the CS PhD program at UT Austin this coming December and say so, listing me as a potential advisor.
Meanwhile, if you’re interested in a postdoc in CS theory for AI alignment, to start as early as this coming August, please email me your CV and links to representative publications, and arrange for two recommendation letters to be emailed to me.
The Open Philanthropy project will put me in regular contact with all sorts of people who are trying to develop complexity theory for AI interpretability and alignment. One great example of such a person is Eric Neyman—previously a PhD student of Tim Roughgarden at Columbia, now at the Alignment Research Center, the Berkeley organization founded by my former student Paul Christiano. Eric has asked me to share an exciting announcement, along similar lines to the above:
The Alignment Research Center (ARC) is looking for grad students and postdocs for its visiting researcher program. ARC is trying to develop algorithms for explaining neural network behavior, with the goal of advancing AI safety (see here for a more detailed summary). Our research approach is fairly theory-focused, and we are interested in applicants with backgrounds in CS theory or ML. Visiting researcher appointments are typically 10 weeks long, and are offered year-round.
If you are interested, you can apply here. (The link also provides more details about the role, including some samples of past work done by ARC.) If you have any questions, feel free to email hiring@alignment.org.
Some of my students and I are working closely with the ARC team. I like what I’ve seen of their research so far, and would encourage readers with the relevant background to apply.
Meantime, I of course continue to be interested in quantum computing! I’ve applied for multiple grants to continue doing quantum complexity theory, though whether or not I can get such grants will alas depend (among other factors) on whether the US National Science Foundation continues to exist as more than a shadow of what it was. The signs look ominous; Science magazine reports that the NSF just cut by half the number of awarded graduate fellowships, and this has almost certainly directly affected students who I know and care about.
Meantime we all do the best we can. My UTCS colleague, Chandrajit Bajaj, is currently seeking a postdoc in the general area of Statistical Machine Learning, Mathematics, and Statistical Physics, for up to three years. Topics include:
Learning various dynamical systems through their Stochastic Hamiltonians. This involves many subproblems in geometry, stochastic optimization and stabilized flows which would be interesting in their own right.
Optimizing task dynamics on different algebraic varieties of applied interest — Grassmannians, the Stiefel and Flag manifolds, Lie groups, etc.
Thanks so much to the folks at Open Philanthropy, and to everyone else doing their best to push basic research forward even while our civilization is on fire.
The highest-mass subnuclear particle ever observed used to be the top quark. Measured for the first time by the CDF experiment in 1994, and subsequently confirmed by CDF and D0 in 1995, the top quark is the heaviest elementary particle we know of, and it is a wonderful physical system per se, which has been studied intensively in the past thirty years at the Tevatron and at the LHC colliders. The top quark
With the stock market crash and the big protests across the US, I’m finally feeling a trace of optimism that Trump’s stranglehold on the nation will weaken. Just a trace.
I still need to self-medicate to keep from sinking into depression — where ‘self-medicate’, in my case, means studying fun math and physics I don’t need to know. I’ve been learning about the interactions between number theory and group theory. But I haven’t been doing enough physics! I’m better at that, and it’s more visceral: more of a bodily experience, imagining things wiggling around.
So, I’ve been belatedly trying to lessen my terrible ignorance of nuclear physics. Nuclear physics is a fascinating application of quantum theory, but it’s less practical than chemistry and less sexy than particle physics, so I somehow skipped over it.
I’m finding it worth looking at! Right away it’s getting me to think about quantum ellipsoids.
Nuclear physics forces you to imagine blobs of protons and neutrons wiggling around in a very quantum-mechanical way. Nuclei are too complicated to fully understand. We can simulate them on a computer, but simulation is not understanding, and it’s also very hard: one book I’m reading points out that one computation you might want to do requires diagonalizing a matrix. So I’d rather learn about the many simplified models of nuclei people have created, which offer partial understanding… and lots of beautiful math.
Protons minimize energy by forming pairs with opposite spin. Same for neutrons. Each pair acts like a particle in its own right. So nuclei act very differently depending on whether they have an even or odd number of protons, and an even or odd number of neutrons!
The ‘Interacting Boson Model’ is a simple approximate model of ‘even-even’ atomic nuclei: nuclei with an even number of protons and an even number of neutrons. It treats the nucleus as consisting of bosons, each boson being either a pair of nucleons — that is, either protons or neutrons — where the members of a pair have opposite spin but are the same in every other way. So, these bosons are a bit like the paired electrons responsible for superconductivity, called ‘Cooper pairs’.
However, in the Interacting Boson Model we assume our bosons all have either spin 0 (s-bosons) or spin 2 (d-bosons), and we ignore all properties of the bosons except their spin angular momentum. A spin-0 particle has 1 spin state, since the spin-0 representation of $\mathrm{SU}(2)$ is 1-dimensional. A spin-2 particle has 5, since the spin-2 representation is 5-dimensional.
If we assume the maximum amount of symmetry among all 6 states, both s-boson and d-boson states, we get a theory with $\mathrm{U}(6)$ symmetry! And part of why I got interested in this stuff was that it would be fun to see a rather large group like $\mathrm{U}(6)$ showing up as symmetries — or approximate symmetries — in real world physics.
More sophisticated models recognize that not all these states behave the same, so they assume a smaller group of symmetries.
But there are some simpler questions to start with.
How do we make a spin-0 or spin-2 particle out of two nucleons? That’s easy. Two nucleons with opposite spin have total spin 0. But if they’re orbiting each other, they have orbital angular momentum too, so the pair can act like a particle with spin 0, 1, 2, 3, etc.
Why are these bosons in the Interacting Boson Model assumed to have spin 0 or spin 2, but not spin 1 or any other spin? This is a lot harder. I assume that at some level the answer is “because this model works fairly well”. But why does it work fairly well?
By now I’ve found two answers for this, and I’ll tell you the more exciting answer, which I found in this book:
• Igal Talmi, Simple Models of Complex Nuclei: the Shell Model and Interacting Boson Model, Harwood Academic Publishers, Chur, Switzerland, 1993.
In the ‘liquid drop model’ of nuclei, you think of a nucleus as a little droplet of fluid. You can think of an even-even nucleus as a roughly ellipsoidal droplet, which however can vibrate. But we need to treat it using quantum mechanics. So we need to understand quantum ellipsoids!
The space of ellipsoids in $\mathbb{R}^3$ centered at the origin is 6-dimensional, because these ellipsoids are described by equations like
$$a_{11} x^2 + a_{22} y^2 + a_{33} z^2 + 2a_{12} xy + 2a_{13} xz + 2a_{23} yz = 1$$
and there are 6 coefficients here. Not all nuclei are close to spherical! But perhaps it’s easiest to start by thinking about ellipsoids that are close to spherical, so that
$$a_{ij} = \delta_{ij} + \epsilon_{ij}$$
where the $\epsilon_{ij}$ are small. If our nucleus were classical, we’d want equations that describe how these numbers change with time as our little droplet oscillates.
But the nucleus is deeply quantum mechanical. So in the Interacting Boson Model, invented by Iachello, it seems we replace the six numbers $\epsilon_{ij}$ with operators $q_1, \dots, q_6$ on a Hilbert space, say $L^2(\mathbb{R}^6)$, and introduce corresponding momentum operators $p_1, \dots, p_6$, obeying the usual ‘canonical commutation relations’:
$$[q_i, p_j] = i \hbar \, \delta_{ij}, \qquad [q_i, q_j] = [p_i, p_j] = 0.$$
As usual, we can take this Hilbert space to either be $L^2(\mathbb{R}^6)$ or ‘Fock space’ of $\mathbb{C}^6$: the Hilbert space completion of the symmetric algebra of $\mathbb{C}^6$. These are two descriptions of the same thing. The Fock space of $\mathbb{C}^6$ gets an obvious representation of the unitary group $\mathrm{U}(6)$, since that group acts on $\mathbb{C}^6$. And $L^2(\mathbb{R}^6)$ gets an obvious representation of $\mathrm{SO}(3)$, since rotations act on ellipsoids and thus on the tuples $(\epsilon_{11}, \epsilon_{22}, \epsilon_{33}, \epsilon_{12}, \epsilon_{13}, \epsilon_{23})$ that we’re using to describe ellipsoids.
The latter description lets us see where the s-bosons and d-bosons are coming from! Our representation of $\mathrm{SO}(3)$ on $\mathbb{R}^6$ splits into two summands:
• the (real) spin-0 representation, which is 1-dimensional because it takes just one number to describe the rotation-invariant aspects of the shape of an ellipsoid centered at the origin: for example, its volume. In physics jargon this number tells us the monopole moment of the mass distribution of our nucleus.
• the (real) spin-2 representation, which is 5-dimensional because it takes 5 numbers to describe all other aspects of the shape of an ellipsoid centered at the origin. You need 2 numbers to say in which direction its longest axis points, one number to say how long that axis is, 1 number to say which direction the second-longest axis points in (it’s at right angles to the longest axis), and 1 number to say how long it is. In physics jargon these 5 numbers tell us the quadrupole moment of our nucleus.
This shows us why we don’t get spin-1 bosons! We’d get them if the mass distribution of our nucleus could have a nonzero dipole moment. In other words, we’d get them if we added linear terms to our equation
But by conservation of momentum, we can assume the center of mass of our nucleus stays at the origin, and set these linear terms to zero.
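Here is a quick numerical check of that 1 ⊕ 5 splitting (a sketch that packages the six coefficients into a symmetric 3×3 matrix; the random seed and matrices are arbitrary): rotating a symmetric matrix as E ↦ R E Rᵀ leaves its trace (the monopole part) unchanged and preserves the size of its traceless (quadrupole) part, so the 6-dimensional space of coefficients really does split into a 1-dimensional piece and a 5-dimensional piece under rotations.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random rotation matrix: orthogonalize a random matrix and fix the determinant.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R = Q if np.linalg.det(Q) > 0 else -Q            # ensure det(R) = +1

# A random symmetric matrix: 6 independent entries, like our ellipsoid coefficients.
A = rng.standard_normal((3, 3))
E = (A + A.T) / 2

E_rot = R @ E @ R.T                               # how the coefficients transform

trace_part     = lambda M: np.trace(M)                        # spin-0 (1-dimensional) piece
traceless_part = lambda M: M - np.trace(M) / 3 * np.eye(3)    # spin-2 (5-dimensional) piece

print(np.isclose(trace_part(E), trace_part(E_rot)))                    # True: invariant
print(np.isclose(np.linalg.norm(traceless_part(E)),
                 np.linalg.norm(traceless_part(E_rot))))               # True: norm preserved
```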
As usual, we can take linear combinations of the operators $q_i$ and $p_i$ to get annihilation and creation operators for s-bosons and d-bosons. If we want, we can think of these bosons as nucleon pairs. But we don’t need that microscopic interpretation if we don’t want it: we can just say we’re studying the quantum behavior of an oscillating ellipsoid!
After we have our Hilbert space and these operators on it, we can write down a Hamiltonian for our nucleus, or various possible candidate Hamiltonians, in terms of these operators. Talmi’s book goes into a lot of detail on that. And then we can compare the oscillations these Hamiltonians predict to what we see in the lab. (Often we just see the frequencies of the standing waves, which are proportional to the eigenvalues of the Hamiltonian.)
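To make the operator language concrete, here is a minimal single-mode sketch (one boson mode with an arbitrary truncation, in units where ℏ = 1; it is not the actual six-mode Interacting Boson Model Hamiltonian): it builds a truncated annihilation operator, forms q and p from it, checks the canonical commutation relation on the low-lying states, and diagonalizes the simplest candidate Hamiltonian, the number operator, whose evenly spaced eigenvalues are the harmonic oscillator levels.

```python
import numpy as np

dim = 8  # keep only the lowest `dim` Fock states of one boson mode

# Truncated annihilation operator: a|n> = sqrt(n)|n-1>.
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)
adag = a.T.conj()

# Position- and momentum-like operators built from the ladder operators (hbar = 1).
q = (a + adag) / np.sqrt(2)
p = (a - adag) / (1j * np.sqrt(2))

# [q, p] = i on the low-lying states (truncation spoils it only in the top corner).
comm = q @ p - p @ q
print(np.round(comm[:4, :4], 10))

# The simplest Hamiltonian: the number operator a†a, with eigenvalues 0, 1, 2, ...
H = adag @ a
print(np.round(np.linalg.eigvalsh(H), 10))
```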
So, from a high-level mathematical viewpoint, what we’ve done is try to define a manifold $M$ of ellipsoid shapes, and then form its cotangent bundle $T^\ast M$, and then quantize that and start studying ‘quantum ellipsoids’.
Pretty cool! And there’s a lot more to say about it. But I’m wondering if there might be a better manifold of ellipsoid shapes than just $\mathbb{R}^6$. After all, when some of the coefficients become negative things go haywire: our ellipsoid can turn into a hyperboloid! The approach I’ve described is probably fine ‘perturbatively’, i.e. when the $\epsilon_{ij}$ are small. But it may not be the best when our ellipsoid oscillates so much it gets far from spherical.
I think we need a real algebraic geometer here. In both senses of the word ‘real’.
With the stock market crash and the big protests across the US, I’m finally feeling a trace of optimism that Trump’s stranglehold on the nation will weaken. Just a trace.
I still need to self-medicate to keep from sinking into depression — where ‘self-medicate’, in my case, means studying fun math and physics I don’t need to know. I’ve been learning about the interactions between number theory and group theory. But I haven’t been doing enough physics! I’m better at that, and it’s more visceral: more of a bodily experience, imagining things wiggling around.
So, I’ve been belatedly trying to lessen my terrible ignorance of nuclear physics. Nuclear physics is a fascinating application of quantum theory, but it’s less practical than chemistry and less sexy than particle physics, so I somehow skipped over it.
I’m finding it worth looking at! Right away it’s getting me to think about quantum ellipsoids.
Nuclear physics forces you to imagine blobs of protons and neutrons wiggling around in a very quantum-mechanical way. Nuclei are too complicated to fully understand. We can simulate them on a computer, but simulation is not understanding, and it’s also very hard: one book I’m reading points out that one computation you might want to do requires diagonalizing a matrix. So I’d rather learn about the many simplified models of nuclei people have created, which offer partial understanding… and lots of beautiful math.
Protons minimize energy by forming pairs with opposite spin. Same for neutrons. Each pair acts like a particle in its own right. So nuclei act very differently depending on whether they have an even or odd number of protons, and an even or odd number of neutrons!
The ‘Interacting Boson Model‘ is a simple approximate model of ‘even-even’ atomic nuclei: nuclei with an even number of protons and an even number of neutrons. It treats the nucleus as consisting of bosons, each boson being either a pair of nucleons—that is, either protons or neutrons—where the members of a pair have opposite spin but are the same in every other way. So, these bosons are a bit like the paired electrons responsible for superconductivity, called ‘Cooper pairs’.
However, in the Interacting Boson Model we assume our bosons all have either spin 0 (s-bosons) or spin 2 (d-bosons), and we ignore all properties of the bosons except their spin angular momentum. A spin-0 particle has 1 spin state, since the spin-0 representation of is 1-dimensional. A spin-2 particle has 5, since the spin-2 representation is 5-dimensional.
If we assume the maximum amount of symmetry among all 6 states, both s-boson and d-boson states, we get a theory with symmetry! And part of why I got interested in this stuff was that it would be fun to see a rather large group like showing up as symmetries—or approximate symmetries—in real world physics.
More sophisticated models recognize that not all these states behave the same, so they assume a smaller group of symmetries.
But there are some simpler questions to start with.
How do we make a spin-0 or spin-2 particle out of two nucleons? That’s easy. Two nucleons with opposite spin have total spin 0. But if they’re orbiting each other, they have orbital angular momentum too, so the pair can act like a particle with spin 0, 1, 2, 3, etc.
Why are these bosons in the Interacting Boson Model assumed to have spin 0 or spin 2, but not spin 1 or any other spin? This is a lot harder. I assume that at some level the answer is “because this model works fairly well”. But why does it work fairly well?
By now I’ve found two answers for this, and I’ll tell you the more exciting answer, which I found in this book:
• Igal Talmi, Simple Models of Complex Nuclei: the Shell Model and Interacting Boson Model, Harwood Academic Publishers, Chur, Switzerland, 1993.
In the ‘liquid drop model’ of nuclei, you think of a nucleus as a little droplet of fluid. You can think of an even-even nucleus as a roughly ellipsoidal droplet, which however can vibrate. But we need to treat it using quantum mechanics. So we need to understand quantum ellipsoids!
The space of ellipsoids in ℝ³ centered at the origin is 6-dimensional, because these ellipsoids are described by equations like

a11x² + a22y² + a33z² + 2a12xy + 2a13xz + 2a23yz = 1

and there are 6 coefficients here. Not all nuclei are close to spherical! But perhaps it’s easiest to start by thinking about ellipsoids that are close to spherical, so that

(1 + q11)x² + (1 + q22)y² + (1 + q33)z² + 2q12xy + 2q13xz + 2q23yz = 1

where the six numbers q11, q22, q33, q12, q13, q23 are small. If our nucleus were classical, we’d want equations that describe how these numbers change with time as our little droplet oscillates.
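Here is a quick way to make that bookkeeping concrete (a sketch of my own, with made-up small values for the q’s): the six coefficients pack into a symmetric 3×3 matrix, and the surface is a genuine ellipsoid exactly when that matrix is positive definite, which is also where the hyperboloid worry at the end of this post comes from.

```python
import numpy as np

# A small check of the bookkeeping (my own sketch, made-up small values for the q's):
# the six coefficients pack into a symmetric 3x3 matrix A, and the surface x.Ax = 1
# is an honest ellipsoid exactly when A is positive definite.
q11, q22, q33, q12, q13, q23 = 0.2, -0.1, 0.05, 0.1, 0.0, -0.05
A = np.array([[1 + q11, q12,     q13],
              [q12,     1 + q22, q23],
              [q13,     q23,     1 + q33]])
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)              # all positive here
print(np.all(eigenvalues > 0))  # True: still an ellipsoid, not a hyperboloid
```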
But the nucleus is deeply quantum mechanical. So in the Interacting Boson Model, invented by Iachello, it seems we replace the six numbers q11, …, q23 with corresponding operators Q11, …, Q23 on a Hilbert space, and introduce momentum operators P11, …, P23, obeying the usual ‘canonical commutation relations’: each Q commutes with every other Q, each P commutes with every other P, and [Qij, Pkl] equals iħ when the index pair ij equals kl, and zero otherwise.
As usual, we can take this Hilbert space to be either L²(ℝ⁶) or the ‘Fock space’ on ℂ⁶: the Hilbert space completion of the symmetric algebra of ℂ⁶. These are two descriptions of the same thing. The Fock space on ℂ⁶ gets an obvious representation of the unitary group U(6), since that group acts on ℂ⁶. And L²(ℝ⁶) gets an obvious representation of the rotation group SO(3), since rotations act on ellipsoids and thus on the 6-tuples (q11, …, q23) that we’re using to describe ellipsoids.
The latter description lets us see where the s-bosons and d-bosons are coming from! Our representation of SO(3) on ℝ⁶ splits into two summands:
• the (real) spin-0 representation, which is 1-dimensional because it takes just one number to describe the rotation-invariant aspects of the shape of an ellipsoid centered at the origin: for example, its volume. In physics jargon this number tells us the monopole moment of our nucleus.
• the (real) spin-2 representation, which is 5-dimensional because it takes 5 numbers to describe all other aspects of the shape of an ellipsoid centered at the origin. You need 2 numbers to say which direction its longest axis points in, 1 number to say how long that axis is, 1 number to say which direction the second-longest axis points in (it’s at right angles to the longest axis), and 1 number to say how long that axis is. In physics jargon these 5 numbers tell us the quadrupole moment of our nucleus. (There’s a small sketch of this 1 + 5 = 6 count right below.)
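For concreteness, here is a tiny linear-algebra sketch (mine, not from any of the books mentioned) of that 1 + 5 = 6 count: a symmetric 3×3 matrix splits into a multiple of the identity plus a traceless part.

```python
import numpy as np

# Sketch of the 1 + 5 = 6 splitting: a symmetric 3x3 matrix (6 independent entries)
# decomposes into a multiple of the identity (1 number: the spin-0 / monopole part)
# plus a traceless symmetric matrix (5 numbers: the spin-2 / quadrupole part).
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
S = (M + M.T) / 2                          # a random symmetric matrix

monopole = np.trace(S) / 3 * np.eye(3)     # the rotation-invariant piece
quadrupole = S - monopole                  # the traceless piece

print(np.trace(quadrupole))                # ~0: it really is traceless
print(np.allclose(S, monopole + quadrupole))
# Degrees of freedom: 1 (trace) + 5 (traceless symmetric) = 6, matching s- and d-bosons.
```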
This shows us why we don’t get spin-1 bosons! We’d get them if the mass distribution of our nucleus could have a nonzero dipole moment. In other words, we’d get them if we added linear terms in x, y and z to our equation for the ellipsoid.
But by conservation of momentum, we can assume the center of mass of our nucleus stays at the origin, and set these linear terms to zero.
As usual, we can take linear combinations of the operators Qij and Pij to get annihilation and creation operators for s-bosons and d-bosons. If we want, we can think of these bosons as nucleon pairs. But we don’t need that microscopic interpretation if we don’t want it: we can just say we’re studying the quantum behavior of an oscillating ellipsoid!
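Here is a minimal numerical sketch of that step (my own, with ħ set to 1 and the Fock space truncated), for a single mode: it builds one annihilation/creation pair, forms Q and P from it, and checks the canonical commutation relation stated above.

```python
import numpy as np

# Sanity check (mine; hbar = 1, Fock space truncated at N levels) for a single mode:
# build one annihilation/creation pair, form Q and P, and verify [Q, P] = i.
N = 20
a = np.diag(np.sqrt(np.arange(1, N)), k=1)    # annihilation operator
adag = a.conj().T                             # creation operator

Q = (a + adag) / np.sqrt(2)                   # "position" operator for this mode
P = -1j * (a - adag) / np.sqrt(2)             # "momentum" operator for this mode

commutator = Q @ P - P @ Q                    # should equal i times the identity
print(np.round(commutator[:4, :4], 6))        # i on the diagonal (truncation only
                                              # spoils the very last diagonal entry)
```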
After we have our Hilbert space and some operators on it, we can write down a Hamiltonian for our nucleus, or various possible candidate Hamiltonians, in terms of these operators. Talmi’s book goes into a lot of detail on that. And then we can compare the oscillations these Hamiltonians predict to what we see in the lab. (Often we just see the frequencies of the standing waves, which are proportional to the eigenvalues of the Hamiltonian.)
So, from a high-level mathematical viewpoint, what we’ve done is try to define a manifold of ellipsoid shapes (here just ℝ⁶), and then form its cotangent bundle, and then quantize that and start studying ‘quantum ellipsoids’.
Pretty cool! And there’s a lot more to say about it. But I’m wondering if there might be a better manifold of ellipsoid shapes than just ℝ⁶. After all, if the quadratic form above stops being positive definite (for example, if 1 + q11 or 1 + q22 becomes negative) things go haywire: our ellipsoid can turn into a hyperboloid! The approach I’ve described is probably fine ‘perturbatively’, i.e. when the qij are small. But it may not be the best when our ellipsoid oscillates so much it gets far from spherical.
I think we need a real algebraic geometer here. In both senses of the word ‘real’.
The direction the wind is coming from is the windward side, and the direction the wind is blowing toward is the leeward side, so if you’re sailing a boat, you have to make allowances for the fact that your course is going to shift a bit in the leeward direction as you go, and that allowance is called… leeway. My mind is blown! It’s so weird and good how words are made of things.
“I have a theory,” says the scientist in the book. But what does that mean? What does it mean to “have” a theory?
First, there’s the everyday sense. When you say “I have a theory”, you’re talking about an educated guess. You think you know why something happened, and you want to check your idea and get feedback. A pedant would tell you you don’t really have a theory, you have a hypothesis. It’s “your” hypothesis, “your theory”, because it’s what you think happened.
The pedant would insist that “theory” means something else. A theory isn’t a guess, even an educated guess. It’s an explanation with evidence, tested and refined in many different contexts in many different ways, a whole framework for understanding the world, the most solid knowledge science can provide. Despite the pedant’s insistence, that isn’t the only way scientists use the word “theory”. But it is a common one, and a central one. You don’t really “have” a theory like this, though, except in the sense that we all do. These are explanations with broad consensus, things you either know of or don’t; they don’t belong to one person or another.
Except, that is, if one person takes credit for them. We sometimes say “Darwin’s theory of evolution”, or “Einstein’s theory of relativity”. In that sense, we could say that Einstein had a theory, or that Darwin had a theory.
Sometimes, though, “theory” doesn’t mean this standard official definition, even when scientists say it. And that changes what it means to “have” a theory.
For some researchers, a theory is a lens with which to view the world. This happens sometimes in physics, where you’ll find experts who want to think about a situation in terms of thermodynamics, or in terms of a technique called Effective Field Theory. It happens in mathematics, where some choose to analyze an idea with category theory not to prove new things about it, but just to translate it into category theory lingo. It’s most common, though, in the humanities, where researchers often specialize in a particular “interpretive framework”.
For some, a theory is a hypothesis, but also a pet project. There are physicists who come up with an idea (maybe there’s a variant of gravity with mass! maybe dark energy is changing!) and then focus their work around that idea. That includes coming up with ways to test whether the idea is true, showing the idea is consistent, and understanding what variants of the idea could be proposed. These ideas are hypotheses, in that they’re something the scientist thinks could be true. But they’re also ideas with many moving parts that motivate work by themselves.
Taken to the extreme, this kind of “having” a theory can go from healthy science to political bickering. Instead of viewing an idea as a hypothesis you might or might not confirm, it can become a platform to fight for. Instead of investigating consistency and proposing tests, you focus on arguing against objections and disproving your rivals. This sometimes happens in science, especially in more embattled areas, but it happens much more often with crackpots, where people who have never really seen science done can decide it’s time for their idea, right or wrong.
Finally, sometimes someone “has” a theory that isn’t a hypothesis at all. In theoretical physics, a “theory” can refer to a complete framework, even if that framework isn’t actually supposed to describe the real world. Some people spend time focusing on a particular framework of this kind, understanding its properties in the hope of getting broader insights. By becoming an expert on one particular theory, they can be said to “have” that theory.
Bonus question: in what sense do string theorists “have” string theory?
You might imagine that string theory is an interpretive framework, like category theory, with string theorists coming up with the “string version” of things others understand in other ways. This, for the most part, doesn’t happen. Without knowing whether string theory is true, there isn’t much benefit in just translating other things to string theory terms, and people for the most part know this.
For some, string theory is a pet project hypothesis. There is a community of people who try to get predictions out of string theory, or who investigate whether string theory is consistent. It’s not a huge number of people, but it exists. A few of these people can get more combative, or make unwarranted assumptions based on dedication to string theory in particular: for example, you’ll see the occasional argument that because something is difficult in string theory it must be impossible in any theory of quantum gravity. You see a spectrum in the community, from people for whom string theory is a promising project to people for whom it is a position that needs to be defended and argued for.
For the rest, the question of whether string theory describes the real world takes a back seat. They’re people who “have” string theory in the sense that they’re experts, and they use the theory primarily as a mathematical laboratory to learn broader things about how physics works. If you ask them, they might still say that they hypothesize string theory is true. But for most of these people, that question isn’t central to their work.
The United States is now a country that disappears people.
Visa holders, green card holders, and even occasionally citizens mistaken for non-citizens: Trump’s goons can now seize them off the sidewalk at any time, handcuff them, detain them indefinitely in a cell in Louisiana with minimal access to lawyers, or even fly them to an overcrowded prison in El Salvador to be tortured.
It’s important to add: from what I know, some of the people being detained and deported are genuinely horrible. Some worked for organizations linked to Hamas, and cheered the murder of Jews. Some trafficked fentanyl. Some were violent gang members.
There are proper avenues to deport such people, in normal pre-Trumpian US law. For example, you can void someone’s visa by convincing a judge that they lied about not supporting terrorist organizations in their visa application.
But already other disappeared people seem to have been entirely innocent. Some apparently did nothing worse than write lefty op-eds or social media posts. Others had innocuous tattoos that were mistaken for gang insignia.
Millennia ago, civilization evolved mechanisms like courts and judges and laws and evidence and testimony, to help separate the guilty from the innocent. These are known problems with known solutions. No new ideas are needed.
One reader advised me not to blog about this issue unless I had something original to say: how could I possibly add to the New York Times’ and CNN’s daily coverage of every norm-shattering wrinkle? But other readers were livid at me for not blogging, even interpreting silence or delay as support for fascism.
For those readers, but more importantly for my kids and posterity, let me say: no one who follows this blog could ever accuse me of reflexive bleeding-heart wokery, much less of undue sympathy for “globalize the intifada” agitators. So with whatever credibility that grants me: Shtetl-Optimized unequivocally condemns the “grabbing random foreign students off the street” method of immigration enforcement. If there are resident aliens who merit deportation, prove it to a friggin’ judge (I’ll personally feel more confident that the law is being applied sanely if the judge wasn’t appointed by Trump). Prove that you got the right person, and that they did what you said, and that that violated the agreed-upon conditions of their residency according to some consistently-applied standard. And let the person contest the charges, with advice of counsel.
I don’t want to believe the most hyperbolic claims of my colleagues, that the US is now a full Soviet-style police state, or inevitably on its way to one. I beg any conservatives reading this post, particularly those with influence over events: help me not to believe this.
The quantum double-slit experiment, in which objects are sent toward a wall with two slits and then recorded on a screen behind the wall, creates an interference pattern that builds up gradually, object by object. And yet, it’s crucial that the path of each object on its way to the screen remain unknown. If one measures which of the slits each object passes through, the interference pattern never appears.
Strange things are said about this. There are vague, weird slogans: “measurement causes the wave function to collapse”; “the particle interferes with itself”; “electrons are both particles and waves”; etc. One reads that the objects are particles when they reach the screen, but they are waves when they go through the slits, causing the interference — unless their passage through the slits is measured, in which case they remain particles.
But in fact the equations of 1920s quantum physics say something different and not vague in the slightest — though perhaps equally weird. As we’ll see today, the elimination of interference by measurement is no mystery at all, once you understand both measurement and interference. Those of you who’ve followed my recent posts on these two topics will find this surprisingly straightforward; I guarantee you’ll say, “Oh, is that all?” Other readers will probably want to read those posts first.
When do we expect quantum interference? As I’ll review in a moment, there’s a simple criterion:
a system of objects (not the objects themselves!) will exhibit quantum interference if the system, initially in a superposition of possibilities, reaches a single possibility via two or more pathways.
To remind you what that means, let’s compare two contrasting cases (covered carefully in this post.) Figs. 1a and 1b show pre-quantum animations of different quantum systems, in which two balls (drawn blue and orange) are in a superposition of moving left OR moving right. I’ve chosen to stop each animation right at the moment when the blue ball in the top half of the superposition is at the same location as the blue ball in the bottom half, because if the orange ball weren’t there, this is when we’d expect it to see quantum interference.
But for interference to occur, the orange ball, too, must at that same moment be in the same place in both parts of the superposition. That does happen for the system in Fig. 1a — the top and bottom parts of the figure line up exactly, and so interference will occur. But the system in Fig. 1b, whose top and bottom parts never look the same, will not show quantum interference.
Fig. 1a: A system of two balls in a superposition, from a pre-quantum viewpoint. As the system evolves, a moment is reached when the two parts of the superposition are identical. As the system has then reached a single possibility via two routes, quantum interference may result.
Figure 1b: Similar to Fig. 1a, except that when the blue ball is at the same location in both parts of the superposition, the orange ball is at two different locations. At no moment are the two possibilities in the superposition the same, so quantum interference cannot occur.
In other words, quantum interference requires that the two possibilities in the superposition become identical at some moment in time. Partial resemblance is not enough.
The Measurement
A measurement always involves an interaction of some sort between the object we want to measure and the device doing the measurement. We will typically
first let the measuring device interact with the object, so that the device’s behavior comes to depend on the object’s, and
then amplify and record the outcome, making the measurement effectively permanent.
For today’s purposes, the details of the second step won’t matter, so I’ll focus on the first step.
Setting Up
We’ll call the object going through the slits a “particle”, and we’ll call the measurement device a “measuring ball” (or just “ball” for short.) The setup is depicted in Fig. 2, where the particle is approaching the slits and the measuring ball lies in wait.
Figure 2: A particle (blue) approaches a wall with two slits, behind which sits a screen where the particle’s arrival will be detected. Also present is a lightweight measuring ball (black), ready to fly in and measure the particle’s position by colliding with it as it passes through the wall.
If No Measurement is Made at the Slits
Suppose we allow the particle to proceed and we make no measurement of its location as it passes through the slits. Then we can leave the ball where it is, at the position I’ve marked M in Fig. 3. If the particle makes it through the wall, it must pass through one slit or the other, leaving the system in a superposition of the form
the particle is near the left slit [and the ball is at position M] OR
the particle is near the right slit [and the ball is at position M]
as shown at the top of Fig. 3. (Note: because the ball and particle are independent [unentangled] in this superposition, it can be written in factored form as in Fig. 12 of this post.)
From here, the particle (whose motion is now quite uncertain as a result of passing through a narrow slit) can proceed unencumbered to the screen. Let’s say it arrives at the point marked P, as at the bottom of Fig. 3.
Figure 3: (Top) As the particle passes through the slits, the system is set into a superposition of two possibilities in which the particle passes through the left slit OR the right slit. (The particle’s future motion is quite uncertain, as indicated by the green arrows.) In both possibilities, the measuring ball is at point M. (Bottom) If the particle arrives at point P on the screen, then the two possibilities in the superposition become identical, as in Fig. 1a, so quantum interference can result. This will be true no matter what point P we choose, and so an interference pattern will be seen across the whole screen.
Crucially, both halves of the superposition now describe the same situation: particle at P, ball at M. The system has arrived here via two paths:
The particle went through the left slit and arrived at the point P (with the ball always at M), OR
The particle went through the right slit and arrived at the point P (with the ball always at M).
Therefore, since the system has reached a single possibility via two different routes, quantum interference may be observed.
But now let’s make the measurement. We’ll do it by throwing the ball rapidly toward the particle, timed carefully so that, as shown in Fig. 4, either
the particle is at the left slit, in which case the ball passes behind it and travels onward, OR
the particle is at the right slit, in which case the ball hits it and bounces back.
(Recall that I assumed the measuring ball is lightweight, so the collision doesn’t much affect the particle; for instance, the particle might be a heavy atom, while the measuring ball is a light atom.)
Figure 4: As the particle moves through the wall, the ball is sent rapidly in motion. If the particle passes through the right slit, the ball will hit it and bounce back; if the particle passes through the left slit, the ball will miss it and will continue to the left.
The ball’s late-time behavior reveals — and thus measures — the particle’s behavior as it passed through the wall:
the ball moving to the left means the particle went through the left slit;
the ball moving to the right means the particle went through the right slit.
To make this measurement complete and permanent requires a longer story with more details; for instance, we might choose to amplify the result with a Geiger counter. But the details don’t matter, and besides, that takes place later. Let’s keep our focus on what happens next.
The Effect of the Measurement
What happens next is that the particle reaches the point P on the screen. It can do this whether it traveled via the left slit or via the right slit, just as before, and so you might think there should still be an interference pattern. However, remembering Figs. 1a and 1b and the criterion for interference, take a look at Fig. 5.
Figure 5: Following the measurement made in Fig. 4, the arrival of the particle at the point P on the screen finds the ball in two possible locations, depending on which slit the particle went through. In contrast to Fig. 3, the two parts of the superposition are not identical, and so (as in Fig. 1b) no quantum interference pattern will be observed.
Even though the particle by itself could have taken two paths to the point P, the system as a whole is still in a superposition of two different possibilities, not one — more like Fig. 1b than like Fig. 1a. Specifically,
the particle is at position P and the ball is at location ML (which happens if, in Fig. 4, the particle was near the left slit and the ball continued to the left); OR
the particle is at position P and the ball is at location MR (which happens if, in Fig. 4, the particle was near the right slit and the ball bounced back to the right).
The measurement process — by the very definition of “measurement” as a procedure that segregates left-slit cases from right-slit cases — has resulted in the two parts of the superposition being different even when they both have the particle reaching the same point P. Therefore, in contrast to Fig. 3, quantum interference between the two parts of the superposition cannot occur.
And that’s it. That’s all there is to it.
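To make the contrast concrete, here is a toy numerical sketch (my own, not from this post): two slit amplitudes with a common Gaussian envelope, and a measuring ball that either stays in a single state (branches identical, so amplitudes add) or ends up in one of two orthogonal states (branches differ, so probabilities add). The wavenumber, slit spacing and envelope width are made-up numbers.

```python
import numpy as np

# Toy model: far-field amplitudes for the left and right slits (made-up numbers).
x = np.linspace(-10, 10, 4001)                       # position along the screen
k, d = 2.0, 3.0                                      # wavenumber and slit separation
envelope = np.exp(-x**2 / 50)
psi_L = envelope * np.exp(+1j * k * d * x / 2)       # amplitude via the left slit
psi_R = envelope * np.exp(-1j * k * d * x / 2)       # amplitude via the right slit

# No measurement: the ball sits at M in both branches, so when the particle reaches x
# the two branches are identical and the AMPLITUDES add -> interference fringes.
P_unmeasured = np.abs(psi_L + psi_R) ** 2

# Measurement: the ball ends at ML or MR, and <ML|MR> = 0, so the cross term vanishes
# and the PROBABILITIES add -> no fringes.
P_measured = np.abs(psi_L) ** 2 + np.abs(psi_R) ** 2

center = np.abs(x) < 2                               # region where the envelope is ~flat
def visibility(P):
    return (P[center].max() - P[center].min()) / (P[center].max() + P[center].min())

print("fringe visibility, no measurement:  ", round(visibility(P_unmeasured), 2))  # ~1
print("fringe visibility, with measurement:", round(visibility(P_measured), 2))    # ~0.1
```

The only difference between the two output lines is whether the cross term between the left-slit and right-slit amplitudes survives; that is exactly the difference between Fig. 3 and Fig. 5.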
Looking Ahead.
The double-slit experiment is hard to understand if one relies on vague slogans. But if one relies on the math, one sees that many of the seemingly mysterious features of the experiment are in fact straightforward.
I’ll say more about this in future posts. In particular, to convince you today’s argument is really correct, I’ll look more closely at the quantum wave function corresponding to Figs. 3-5, and will reproduce the same phenomenon in simpler examples. Then we’ll apply the resulting insights to other cases, including
measurements that do not destroy interference,
measurements that only partly destroy interference,
destruction of interference without measurement, and
In a world where misinformation, voluntary or accidental, reigns supreme; in a world where lies become truth if they are broadcast for long enough; in a world where we have unlimited access to superintelligent machines, but we prefer to remain ignorant; in this world in which we are unfortunately living today, the approach scientists take to accumulate knowledge - peer review - is something we should hold dear and preserve with care. And yet...
Applied category theorists are flocking to AI, because that’s where the money is. I avoid working on it, both because I have an instinctive dislike of ‘hot topics’, and because at present AI is mainly being used to make rich and powerful people richer and more powerful.
However, I like to pay some attention to how category theorists are getting jobs connected to AI, and what they’re doing. Many of these people are my friends, so I wonder what they will do for AI, and the world at large—and what working on AI will do to them.
Let me list a bit of what’s going on. I’ll start with a cautionary tale, and then turn to the most important program linking AI and category theory today.
Symbolica
When Musk and his AI head Andrej Karpathy didn’t listen to their engineer George Morgan’s worry that current techniques in deep learning couldn’t “scale to infinity and solve all problems,” Morgan left Tesla and started a company called Symbolica to work on symbolic reasoning. The billionaire Vinod Khosla gave him $2 million in seed money. He began with an approach based on hypergraphs, but then the category theorists Gavranovic and Lessard wrote a position paper that pushed the company in a different direction.
Khosla liked the new direction and invested $30 million more in Symbolica. At Gavranovic and Lessard’s suggestion, Morgan hired category theorists including Dominic Verity and Neil Ghani.
But Morgan was never fully sold on category theory: he still wanted to pursue his hypergraph approach. After a while, continued disagreements between Morgan and the category theorists took their toll. He fired some, even having one summarily removed from his office. Another resigned voluntarily. Due to nondisclosure agreements, these people no longer talk publicly about what went down.
So one moral for category theorists, or indeed anyone with a good idea: after your idea helps someone get a lot of money, they may be able to fire you.
ARIA
David Dalrymple is running a £59 million program on Mathematics for Safeguarded AI at the UK agency ARIA (the Advanced Research + Invention Agency). He is very interested in using category theory for this purpose.
Here’s what the webpage for the Safeguarded AI program says:
Why this programme
As AI becomes more capable, it has the potential to power scientific breakthroughs, enhance global prosperity, and safeguard us from disasters. But only if it’s deployed wisely. Current techniques working to mitigate the risk of advanced AI systems have serious limitations, and can’t be relied upon empirically to ensure safety. To date, very little R&D effort has gone into approaches that provide quantitative safety guarantees for AI systems, because they’re considered impossible or impractical.
What we’re shooting for
By combining scientific world models and mathematical proofs we will aim to construct a ‘gatekeeper’: an AI system tasked with understanding and reducing the risks of other AI agents. In doing so we’ll develop quantitative safety guarantees for AI in the way we have come to expect for nuclear power and passenger aviation.
This project aims to provide complete axiomatic theories of string diagrams for significant categories of probabilistic processes. Fabio will then use these theories to develop compositional methods of analysis for different kinds of probabilistic graphical models.
Safety: Core Representation Underlying Safeguarded AI
This project looks to design a calculus that utilises the semantic structure of quasi-Borel spaces, introduced in ‘A convenient category for higher-order probability theory’. Ohad and his team will develop the internal language of quasi-Borel spaces as a “semantic universe” for stochastic processes, define syntax that is amenable to type-checking and versioning, and interface with other teams in the programme—either to embed other formalisms as sub-calculi in quasi-Borel spaces, or vice versa (e.g. for imprecise probability).
This project plans to overcome the limitations of traditional philosophical formalisms by integrating interdisciplinary knowledge through applied category theory. In collaboration with other TA1.1 Creators, David will explore graded modal logic, type theory, and causality, and look to develop the conceptual tools to support the broader Safeguarded AI programme.
Double Categorical Systems Theory for Safeguarded AI
This project aims to utilise Double Categorical Systems Theory (DCST) as a mathematical framework to facilitate collaboration between stakeholders, domain experts, and computer aides in co-designing an explainable and auditable model of an autonomous AI system’s deployment environment. David + team will expand this modelling framework to incorporate formal verification of the system’s safety and reliability, study the verification of model-surrogacy of hybrid discrete-continuous systems, and develop serialisable data formats for representing and operating on models, all with the goal of preparing the DCST framework for broader adoption across the Safeguarded AI Programme.
Filippo and team intend to establish a robust mathematical framework that extends beyond the metrics expressible in quantitative algebraic theories and coalgebras over metric spaces. By shifting from a Cartesian to a monoidal setting, this group will examine these metrics using algebraic contexts (to enhance syntax foundations) and coalgebraic contexts (to provide robust quantitative semantics and effective techniques for establishing quantitative bounds on black-box behaviours), ultimately advancing the scope of quantitative reasoning in these domains.
Doubly Categorical Systems Logic: A Theory of Specification Languages
This project aims to develop a logical framework to classify and interoperate various logical systems created to reason about complex systems and their behaviours, guided by Doubly Categorical Systems Theory (DCST). Matteo’s goal is to study the link between the compositional and morphological structure of systems and their behaviour, specifically in the way logic pertaining to these two components works, accounting for both dynamic and temporal features. Such a path will combine categorical approaches to both logic and systems theory.
True Categorical Programming for Composable Systems
This project intends to develop a type theory for categorical programming that enables encoding of key mathematical structures not currently supported by existing languages. These structures include functors, universal properties, Kan extensions, lax (co)limits, and Grothendieck constructions. Jade + team are aiming to create a type theory that accurately translates categorical concepts into code without compromise, and then deploy this framework to develop critical theorems related to the mathematical foundations of Safeguarded AI.
Sam + team will look to employ categorical probability toward key elements essential for world modelling in the Safeguarded AI Programme. They will investigate imprecise probability (which provides bounds on probabilities of unsafe behaviour), and stochastic dynamical systems for world modelling, and then look to create a robust foundation of semantic version control to support the above elements.
Amar + team will develop a combinatorial and diagrammatic syntax, along with categorical semantics, for multimodal Petri Nets. These Nets will model dynamical systems that undergo mode or phase transitions, altering their possible places, events, and interactions. Their goal is to create a category-theoretic framework for mathematical modelling and safe-by-design specification of dynamical systems and process theories which exhibit multiple modes of operation.
Topos UK
The Topos Institute is a math research institute in Berkeley with a focus on category theory. Three young category theorists there—Sophie Libkind, David Jaz Myers and Owen Lynch—wrote a proposal called “Double categorical systems theory for safeguarded AI”, which got funded by ARIA. So now they are moving to Oxford, where they will be working with Tim Hosgood, José Siqueira, Xiaoyan Li and maybe others at a second branch of the Topos Institute, called Topos UK.
SandboxAQ
Among other things, Tai-Danae Bradley is a research mathematician at SandboxAQ, a startup focused on AI and quantum technologies. She has applied category theory to natural language processing in a number of interesting papers.
VERSES
VERSES is a “cognitive computing company building next-generation intelligent software systems” with Karl Friston as chief scientist. They’ve hired the category theorists Toby St Clere Smith and Marco Perrin, who are working on compositional tools for approximate Bayesian inference.
Now finally, we come to the heart of the matter of quantum interference, as seen from the perspective of 1920s quantum physics. (We’ll deal with quantum field theory later this year.)
Last time I looked at some cases of two-particle states in which the particles’ behavior is independent — uncorrelated. In the jargon, the particles are said to be “unentangled”. In this situation, and only in this situation, the wave function of the two particles can be written as a product of two wave functions, one per particle. As a result, any quantum interference can be ascribed to one particle or the other, and is visible in measurements of either one particle or the other. (More precisely, it is observable in repeated experiments, in which we do the same measurement over and over.)
In this situation, because each particle’s position can be studied independently of the other’s, we can be led to think that any interference associated with particle 1 happens near where particle 1 is located, and similarly for interference involving the second particle.
But this line of reasoning only works when the two particles are uncorrelated. Once this isn’t true — once the particles are entangled — it can easily break down. We saw indications of this in an example that appeared at the ends of my last two posts (here and here), which I’m about to review. The question for today is: what happens to interference in such a case?
Correlation: When “Where” Breaks Down
Let me now review the example of my recent posts. The pre-quantum system looks like this
Figure 1: An example of a superposition, in a pre-quantum view, where the two particles are correlated and where interference will occur that involves both particles together.
Notice the particles are correlated; either both particles are moving to the left OR both particles are moving to the right. (The two particles are said to be “entangled”, because the behavior of one depends upon the behavior of the other.) As a result, the wave function cannot be factored (in contrast to most examples in my last post) and we cannot understand the behavior of particle 1 without simultaneously considering the behavior of particle 2. Compare this to Fig. 2, an example from my last post in which the particles are independent; the behavior of particle 2 is the same in both parts of the superposition, independent of what particle 1 is doing.
Figure 2: Unlike Fig. 1, here the two particles are uncorrelated; the behavior of particle 2 is the same whether particle 1 is moving left OR right. As a result, interference can occur for particle 1 separately from any behavior of particle 2, as shown in this post.
Let’s return now to Fig. 1. The wave function for the corresponding quantum system, shown as a graph of its absolute value squared on the space of possibilities, behaves as in Fig. 3.
Figure 3: The absolute-value-squared of the wave function for the system in Fig. 1, showing interference as the peaks cross. Note the interference fringes are diagonal relative to the x1 and x2 axes.
But as shown last time in Fig. 19, at the moment where the interference in Fig. 3 is at its largest, if we measure particle 1 we see no interference effect. More precisely, if we do the experiment many times and measure particle 1 each time, as depicted in Fig. 4, we see no interference pattern.
Figure 4: The result of repeated experiments in which we measure particle 1, at the moment of maximal interference, in the system of Fig. 3. Each new experiment is shown as an orange dot; results of past experiments are shown in blue. No interference effect is seen.
We see something analogous if we measure particle 2.
Yet the interference is plain as day in Fig. 3. It’s obvious when we look at the full two-dimensional space of possibilities, even though it is invisible in Fig. 4 for particle 1 and in the analogous experiment for particle 2. So what measurements, if any, can we make that can reveal it?
The clue comes from the fact that the interference fringes lie at a 45 degree angle, perpendicular neither to the x1 axis nor to the x2 axis but instead to the axis for the variable 1/2(x1 + x2), the average of the positions of particle 1 and 2. It’s that average position that we need to measure if we are to observe the interference.
But doing so requires that we measure both particles’ positions. We have to measure them both every time we repeat the experiment. Only then can we start making a plot of the average of their positions.
When we do this, we will find what is shown in Fig. 5.
The top row shows measurements of particle 1.
The bottom row shows measurements of particle 2.
And the middle row shows a quantity that we infer from these measurements: their average.
For each measurement, I’ve drawn a straight orange line between the measurement of x1 and the measurement of x2; the center of this line lies at the average position 1/2(x1+x2). The actual averages are then recorded in a different color, to remind you that we don’t measure them directly; we infer them from the actual measurements of the two particles’ positions.
Figure 5: As in Fig. 4, the result of repeated experiments in which we measure both particles’ positions at the moment of maximal interference in Fig. 3. Top and bottom rows show the position measurements of particles 1 and 2; the middle row shows their average. Each new experiment is shown as two orange dots connected by an orange line, at whose midpoint a new yellow dot is placed. Results of past experiments are shown in blue. No interference effect is seen in the individual particle positions, yet one appears in their average.
In short, the interference is not associated with either particle separately — none is seen in either the top or bottom rows. Instead, it is found within the correlation between the two particles’ positions. This is something that neither particle can tell us on its own.
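Here is a toy numerical version of exactly this point (my own construction, with made-up momentum and widths, not the wave function behind Figs. 3-5): two branches with identical Gaussian envelopes at the moment of maximal overlap, one carrying total momentum +k and the other -k. Sampling x1 alone gives a smooth distribution, while the inferred average (x1+x2)/2 shows deep fringes.

```python
import numpy as np

# Toy sketch: two branches with the same Gaussian envelope at maximal overlap,
# one with total momentum +k, the other -k (made-up numbers).
x = np.linspace(-8, 8, 401)
X1, X2 = np.meshgrid(x, x, indexing="ij")
k = 3.0
envelope = np.exp(-(X1**2 + X2**2) / 4)
psi = envelope * (np.exp(1j * k * (X1 + X2)) + np.exp(-1j * k * (X1 + X2)))
prob = np.abs(psi)**2
prob /= prob.sum()

# What Fig. 4 samples: the distribution of x1 alone (sum over x2).
p_x1 = prob.sum(axis=1)

# What Fig. 5 infers: the distribution of the average (x1 + x2)/2.
edges = np.linspace(-4, 4, 161)
p_avg, _ = np.histogram(((X1 + X2) / 2).ravel(), bins=edges, weights=prob.ravel())
centers = (edges[:-1] + edges[1:]) / 2

def fringe_depth(grid, p, half_width=0.5):
    """1 - min/max over the central region: ~0 means smooth, ~1 means deep fringes."""
    sel = np.abs(grid) < half_width
    return round(1 - p[sel].min() / p[sel].max(), 2)

print("fringe depth in x1:        ", fringe_depth(x, p_x1))          # ~0.1: no fringes
print("fringe depth in (x1+x2)/2: ", fringe_depth(centers, p_avg))   # ~1: deep fringes
```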
And where is the interference? It certainly lies near 1/2(x1+x2)=0. But this should worry you. Is that really a point in physical space?
You could imagine a more extreme example of this experiment in which Fig. 5 shows particle 1 located in Boston and particle 2 located in New York City. This would put their average position within appropriately-named Middletown, Connecticut. (I kid you not; check for yourself.) Would we really want to say that the interference itself is located in Middletown, even though it’s a quiet bystander, unaware of the existence of two correlated particles that lie in opposite directions 90 miles (150 km) away?
After all, the interference appears in the relationship between the particles’ positions in physical space, not in the positions themselves. Its location in the space of possibilities (Fig. 3) is clear. Its location in physical space (Fig. 5) is anything but.
Still, I can imagine you pondering whether it might somehow make sense to assign the interference to poor, unsuspecting Middletown. For that reason, I’m going to make things even worse, and take Middletown out of the middle.
A Second System with No Where
Here’s another system with interference, whose pre-quantum version is shown in Figs. 6a and 6b:
Figure 6a: Another system in a superposition with entangled particles, shown in its pre-quantum version in physical space. In part A of the superposition both particles are stationary, while in part B they move oppositely.
Figure 6b: The same system as in Fig. 6a, depicted in the space of possibilities with its two initial possibilities labeled as stars. Possibility A remains where it is, while possibility B moves toward and intersects with possibility A, leading us to expect interference in the quantum wave function.
The corresponding wave function is shown in Fig. 7. Now the interference fringes are oriented diagonally the other way compared to Fig. 3. How are we to measure them this time?
Figure 7: The absolute-value-squared of the wave function for the system shown in Fig. 6. The interference fringes lie on the opposite diagonal from those of Fig. 3.
The average position 1/2(x1+x2) won’t do; we’ll see nothing interesting there. Instead the fringes are near (x1-x2)=4 — that is, they occur when the particles, no matter where they are in physical space, are at a distance of four units. We therefore expect interference near 1/2(x1-x2)=2. Is it there?
In Fig. 8 I’ve shown the analogue of Figs. 4 and 5, depicting
the measurements of the two particle positions x1 and x2, along with
their average 1/2(x1+x2) plotted between them (in yellow), and
(half) their difference 1/2(x1-x2) plotted below them (in green).
That quantity 1/2(x1-x2) is half the horizontal length of the orange line. Hidden in its behavior over many measurements is an interference pattern, seen in the bottom row, where the 1/2(x1-x2) measurements are plotted. [Note also that there is no interference pattern in the measurements of 1/2(x1+x2), in contrast to Fig. 5.]
Figure 8: For the system of Figs. 6-7, repeated experiments in which the measurement of the position of particle 1 is plotted in the top row (upper blue points), that of particle 2 is plotted in the third row (lower blue points), their average is plotted between (yellow points), and half their difference is plotted below them (green points.) Each new set of measurements is shown as orange points connected by an orange line, as in Fig. 5. An interference pattern is seen only in the difference.
Now the question of the hour: where is the interference in this case? It is found near 1/2(x1-x2)=2 — but that certainly is not to be identified with a legitimate position in physical space, such as the point x=2.
First of all, making such an identification in Fig. 8 would be like saying that one particle is in New York and the other is in Boston, while the interference is 150 kilometers offshore in the ocean. But second and much worse, I could change Fig. 8 by moving both particles 10 units to the left and repeating the experiment. This would cause x1, x2, and 1/2(x1+x2) in Fig. 8 to all shift left by 10 units, moving them off your computer screen, while leaving 1/2(x1-x2) unchanged at 2. In short, all the orange and blue and yellow points would move out of your view, while the green points would remain exactly where they are. The difference of positions — a distance — is not a position.
If 10 units isn’t enough to convince you, let’s move the two particles to the other side of the Sun, or to the other side of the galaxy. The interference pattern stubbornly remains at 1/2(x1-x2)=2. The interference pattern is in a difference of positions, so it doesn’t care whether the two particles are in France, Antarctica, or Mars.
We can move the particles anywhere in the universe, as long as we take them together with their average distance remaining the same, and the interference pattern remains exactly the same. So there’s no way we can identify the interference as being located at a particular value of x, the coordinate of physical space. Trying to do so creates nonsense.
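Here is a companion sketch (again my own toy numbers, not the post’s actual wave function) for this second system: branch A is stationary and branch B carries only relative momentum, so the phase depends on x1-x2 alone. Shifting both particles together by 10 units, as in the text, leaves the fringe location in (x1-x2)/2 untouched.

```python
import numpy as np

# Toy sketch of the second system: branch A is stationary, branch B carries only
# *relative* momentum, so the phase depends on x1 - x2 alone (made-up numbers).
# 'shift' moves both particles together, as described in the text.
x = np.linspace(-30, 30, 601)
X1, X2 = np.meshgrid(x, x, indexing="ij")
k, sep = 3.0, 4.0                                   # made-up momentum and separation

def fringe_peak(shift):
    env = np.exp(-((X1 - shift - sep / 2)**2 + (X2 - shift + sep / 2)**2) / 4)
    psi = env * (1 + np.exp(1j * k * (X1 - X2)))    # branch A + branch B
    prob = np.abs(psi)**2
    prob /= prob.sum()
    diff = ((X1 - X2) / 2).ravel()                  # the variable (x1 - x2)/2
    hist, edges = np.histogram(diff, bins=np.linspace(0, 4, 81), weights=prob.ravel())
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[np.argmax(hist)]

for shift in (0.0, -10.0):                          # move both particles 10 units left
    print(f"shift = {shift:+5.1f}:  fringes peak near (x1-x2)/2 = {fringe_peak(shift):.2f}")
```

Both lines print the same fringe location: moving the pair together changes where the particles are, but not where the interference "is".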
This is totally unlike interference in water waves and sound waves. That kind of interference happens somewhere; we can say where the waves are, how big they are at a particular location, and where their peaks and valleys are in physical space. Quantum interference is not at all like this. It’s something more general, more subtle, and more troubling to our intuition.
[By the way, there’s nothing special about the two combinations 1/2(x1+x2) and 1/2(x1-x2), the average or the difference. It’s easy to find systems where the interference arises in the combination x1+2x2, or 3x1-x2, or any other one you like. In none of these is there a natural way to say “where” the interference is located.]
The Profound Lesson
From these examples, we can begin to learn a central lesson of modern physics, one that a century of experimental and theoretical physics has been teaching us repeatedly, with ever greater subtlety. Imagining reality, as many of us are inclined to do, as made of localized objects positioned in and moving through physical space — the one-dimensional x-axis in my simple examples, and the three-dimensional physical space that we take for granted when we specify our latitude, longitude and altitude — is simply not going to work in a quantum universe. The correlations among objects have observable consequences, and those correlations cannot simply be ascribed locations in physical space. To make sense of them, it seems we need to expand our conception of reality.
In the process of recognizing this challenge, we have had to confront the giant, unwieldy space of possibilities, which we can only visualize for a single particle moving in up to three dimensions, or for two or three particles moving in just one dimension. In realistic circumstances, especially those of quantum field theory, the space of possibilities has a huge number of dimensions, rendering it horrendously unimaginable. Whether this gargantuan space should be understood as real — perhaps even more real than physical space — continues to be debated.
Indeed, the lessons of quantum interference are ones that physicists and philosophers have been coping with for a hundred years, and their efforts to make sense of them continue to this day. I hope this series of posts has helped you understand these issues, and to appreciate their depth and difficulty.
Looking ahead, we’ll soon take these lessons, and other lessons from recent posts, back to the double-slit experiment. With fresher, better-informed eyes, we’ll examine its puzzles again.
In late 2020, I was sitting by a window in my home office (AKA living room) in Cambridge, Massachusetts. I’d drafted 15 chapters of my book Quantum Steampunk. The epilogue, I’d decided, would outline opportunities for the future of quantum thermodynamics. So I had to come up with opportunities for the future of quantum thermodynamics. The rest of the book had related foundational insights provided by quantum thermodynamics about the universe’s nature. For instance, quantum thermodynamics had sharpened the second law of thermodynamics, which helps explain time’s arrow, into more-precise statements. Conventional thermodynamics had not only provided foundational insights, but also accompanied the Industrial Revolution, a paragon of practicality. Could quantum thermodynamics, too, offer practical upshots?
Quantum thermodynamicists had designed quantum engines, refrigerators, batteries, and ratchets. Some of these devices could outperform their classical counterparts, according to certain metrics. Experimentalists had even realized some of these devices. But the devices weren’t useful. For instance, a simple quantum engine consisted of one atom. I expected such an atom to produce one electronvolt of energy per engine cycle. (A light bulb emits about 10²¹ electronvolts of light per second.) Cooling the atom down and manipulating it would cost loads more energy. The engine wouldn’t earn its keep.
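That light-bulb figure is easy to sanity-check (a back-of-envelope of my own, assuming a roughly 100-watt bulb):

```python
# Back-of-envelope check of the light-bulb figure, assuming a ~100 W bulb.
watts = 100                  # joules per second (assumed bulb power)
joules_per_eV = 1.602e-19    # one electronvolt in joules
print(f"{watts / joules_per_eV:.1e} eV per second")   # ~6.2e+20, i.e. of order 10^21
```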
Autonomous quantum machines offered greater hope for practicality. By autonomous, I mean, not requiring time-dependent external control: nobody need twiddle knobs or push buttons to guide the machine through its operation. Such control requires work—organized, coordinated energy. Rather than receiving work, an autonomous machine accesses a cold environment and a hot environment. Heat—random, disorganized energy cheaper than work—flows from the hot to the cold. The machine transforms some of that heat into work to power itself. That is, the machine sources its own work from cheap heat in its surroundings. Some air conditioners operate according to this principle. So can some quantum machines—autonomous quantum machines.
Thermodynamicists had designed autonomous quantum engines and refrigerators. Trapped-ion experimentalists had realized one of the refrigerators, in a groundbreaking result. Still, the autonomous quantum refrigerator wasn’t practical. Keeping the ion cold and maintaining its quantum behavior required substantial work.
My community needed, I wrote in my epilogue, an analogue of solar panels in southern California. (I probably drafted the epilogue during a Boston winter, thinking wistfully of Pasadena.) If you built a solar panel in SoCal, you could sit back and reap the benefits all year. The panel would fulfill its mission without further effort from you. If you built a solar panel in Rochester, you’d have to scrape snow off of it. Also, the panel would provide energy only a few months per year. The cost might not outweigh the benefit. Quantum thermal machines resembled solar panels in Rochester, I wrote. We needed an analogue of SoCal: an appropriate environment. Most of it would be cold (unlike SoCal), so that maintaining a machine’s quantum nature would cost a user almost no extra energy. The setting should also contain a slightly warmer environment, so that net heat would flow. If you deposited an autonomous quantum machine in such a quantum SoCal, the machine would operate on its own.
Where could we find a quantum SoCal? I had no idea.
A few months later, I received an email from quantum experimentalist Simone Gasparinetti. He was setting up a lab at Chalmers University in Sweden. What, he asked, did I see as opportunities for experimental quantum thermodynamics? We’d never met, but we agreed to Zoom. Quantum Steampunk on my mind, I described my desire for practicality. I described autonomous quantum machines. I described my yearning for a quantum SoCal.
I have it, Simone said.
Simone and his colleagues were building a quantum computer using superconducting qubits. The qubits fit on a chip about the size of my hand. To keep the chip cold, the experimentalists put it in a dilution refrigerator. You’ve probably seen photos of dilution refrigerators from Google, IBM, and the like. The fridges tend to be cylindrical, gold-colored monstrosities from which wires stick out. (That is, they look steampunk.) You can easily develop the impression that the cylinder is a quantum computer, but it’s only the fridge.
Not a quantum computer
The fridge, Simone said, resembles an onion: it has multiple layers. Outer layers are warmer, and inner layers are colder. The quantum computer sits in the innermost layer, so that it behaves as quantum mechanically as possible. But sometimes, even the fridge doesn’t keep the computer cold enough.
Imagine that you’ve finished one quantum computation and you’re preparing for the next. The computer has written quantum information to certain qubits, as you’ve probably written on scrap paper while calculating something in a math class. To prepare for your next math assignment, given limited scrap paper, you’d erase your scrap paper. The quantum computer’s qubits need erasing similarly. Erasing, in this context, means cooling down even more than the dilution refrigerator can manage.
Why not use an autonomous quantum refrigerator to cool the scrap-paper qubits?
I loved the idea, for three reasons. First, we could place the quantum refrigerator beside the quantum computer. The dilution refrigerator would already be cold, for the quantum computations’ sake. Therefore, we wouldn’t have to spend (almost any) extra work on keeping the quantum refrigerator cold. Second, Simone could connect the quantum refrigerator to an outer onion layer via a cable. Heat would flow from the warmer outer layer to the colder inner layer. From the heat, the quantum refrigerator could extract work. The quantum refrigerator would use that work to cool computational qubits—to erase quantum scrap paper. The quantum refrigerator would service the quantum computer. So, third, the quantum refrigerator would qualify as practical.
Over the next three years, we brought that vision to life. (By we, I mostly mean Simone’s group, as my group doesn’t have a lab.)
Artist’s conception of the autonomous-quantum-refrigerator chip. Credit: Chalmers University of Technology/Boid AB/NIST.
Postdoc Aamir Ali spearheaded the experiment. Then-master’s student Paul Jamet Suria and PhD student Claudia Castillo-Moreno assisted him. Maryland postdoc Jeffrey M. Epstein began simulating the superconducting qubits numerically, then passed the baton to PhD student José Antonio Marín Guzmán.
The experiment provided a proof of principle: it demonstrated that the quantum refrigerator could operate. The experimentalists didn’t apply the quantum refrigerator in a quantum computation. Also, they didn’t connect the quantum refrigerator to an outer onion layer. Instead, they pumped warm photons to the quantum refrigerator via a cable. But even in such a stripped-down experiment, the quantum refrigerator outperformed my expectations. I thought it would barely lower the “scrap-paper” qubit’s temperature. But that qubit reached a temperature of 22 millikelvin (mK). For comparison: if the qubit had merely sat in the dilution refrigerator, it would have reached a temperature of 45–70 mK. State-of-the-art protocols had lowered scrap-paper qubits’ temperatures to 40–49 mK. So our quantum refrigerator outperformed our competitors, through the lens of temperature. (Our quantum refrigerator cooled more slowly than they did, though.)
Simone, José Antonio, and I have followed up on our autonomous quantum refrigerator with a forward-looking review about useful autonomous quantum machines. Keep an eye out for a blog post about the review…and for what we hope grows into a subfield.
In summary, yes, publishing a popular-science book can benefit one’s research.
Grant Sanderson (who runs, and creates most of the content for, the website and Youtube channel 3blue1brown) has been collaborating with me and others (including my coauthor Tanya Klowden) on producing a two-part video giving an account of some of the history of the cosmic distance ladder, building upon a previous public lecture I gave on this topic, and also relating to a forthcoming popular book with Tanya on this topic. The first part of this video is available here; the second part is available here.
The videos were based on a somewhat unscripted interview that Grant conducted with me some months ago, and as such contained some minor inaccuracies and omissions (including some made for editing reasons to keep the overall narrative coherent and within a reasonable length). They also generated many good questions from the viewers of the Youtube video. I am therefore compiling here a “FAQ” of various clarifications and corrections to the videos; this was originally placed as a series of comments on the Youtube channel, but the blog post format here will be easier to maintain going forward. Some related content will also be posted on the Instagram page for the forthcoming book with Tanya.
Questions on the two main videos are marked with an appropriate timestamp to the video.
4:26 Did Eratosthenes really check a local well in Alexandria?
This was a narrative embellishment on my part. Eratosthenes’s original work is lost to us. The most detailed contemporaneous account, by Cleomedes, gives a simplified version of the method, and makes reference only to sundials (gnomons) rather than wells. However, a secondary account by Pliny states (using this English translation), “Similarly it is reported that at the town of Syene, 5000 stades South of Alexandria, at noon in midsummer no shadow is cast, and that in a well made for the sake of testing this the light reaches to the bottom, clearly showing that the sun is vertically above that place at the time”. However, no mention is made of any well in Alexandria in either account.
4:50 How did Eratosthenes know that the Sun was so far away that its light rays were close to parallel?
This was not made so clear in our discussions or in the video (other than a brief glimpse of the timeline at 18:27), but Eratosthenes’s work actually came after Aristarchus, so it is very likely that Eratosthenes was aware of Aristarchus’s conclusions about how distant the Sun was from the Earth. Even if Aristarchus’s heliocentric model was disputed by the other Greeks, at least some of his other conclusions appear to have attracted some support. Also, after Eratosthenes’s time, there was further work by Greek, Indian, and Islamic astronomers (such as Hipparchus, Ptolemy, Aryabhata, and Al-Battani) to measure the same distances that Aristarchus did, although these subsequent measurements for the Sun also were somewhat far from modern accepted values.
5:17 Is it completely accurate to say that on the summer solstice, the Earth’s axis of rotation is tilted “directly towards the Sun”?
Strictly speaking, “in the direction towards the Sun” is more accurate than “directly towards the Sun”; it tilts at about 23.5 degrees towards the Sun, but it is not a total 90-degree tilt towards the Sun.
5:39 Wait, aren’t there two tropics? The tropic of Cancer and the tropic of Capricorn?
Yes! This corresponds to the two summers Earth experiences, one in the Northern hemisphere and one in the Southern hemisphere. The tropic of Cancer, at a latitude of about 23 degrees north, is where the Sun is directly overhead at noon during the Northern summer solstice (around June 21); the tropic of Capricorn, at a latitude of about 23 degrees south, is where the Sun is directly overhead at noon during the Southern summer solstice (around December 21). But Alexandria and Syene were both in the Northern Hemisphere, so it is the tropic of Cancer that is relevant to Eratosthenes’ calculations.
5:41 Isn’t it kind of a massive coincidence that Syene was on the tropic of Cancer?
Actually, Syene (now known as Aswan) was about half a degree of latitude away from the tropic of Cancer, which was one of the sources of inaccuracy in Eratosthenes’ calculations. But one should take the “look-elsewhere effect” into account: because the Nile cuts across the tropic of Cancer, it was quite likely that the Nile would intersect the tropic near some inhabited town. It might not necessarily have been Syene, but in that case some other town would simply have taken Syene’s place in Eratosthenes’s account.
On the other hand, it was fortunate that the Nile ran from South to North, so that distances between towns were a good proxy for the differences in latitude. Apparently, Eratosthenes actually had a more complicated argument that would also work if the two towns in question were not necessarily oriented along the North-South direction, and if neither town was on the tropic of Cancer; but unfortunately the original writings of Eratosthenes are lost to us, and we do not know the details of this more general argument. (But some variants of the method can be found in later work of Posidonius, Aryabhata, and others.)
Nowadays, the “Eratosthenes experiment” is run every year on the March equinox, in which schools at the same longitude are paired up to measure the elevation of the Sun at the same point in time, in order to obtain a measurement of the circumference of the Earth. (The equinox is more convenient than the solstice when neither location is on a tropic, due to the simple motion of the Sun at that date.) With modern timekeeping, communications, surveying, and navigation, this is a far easier task to accomplish today than it was in Eratosthenes’ time.
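For concreteness, here is the arithmetic behind this modern classroom version, as a minimal sketch in Python with purely illustrative numbers (the 7.2 degree difference and 800 km separation below roughly mimic Eratosthenes’ own figures, and are not actual measurements from any school pairing):

```python
def circumference_from_shadows(elev_deg_site1, elev_deg_site2, distance_km):
    """Estimate Earth's circumference from two simultaneous Sun-elevation
    measurements taken at the same longitude, a known distance apart.
    The difference in solar elevation equals the arc of latitude between
    the two sites, and the full circle is 360 degrees' worth of that arc."""
    delta_deg = abs(elev_deg_site1 - elev_deg_site2)
    return 360.0 / delta_deg * distance_km

# Illustrative numbers only: a 7.2 degree difference over 800 km gives 40,000 km.
print(circumference_from_shadows(82.8, 90.0, 800.0))  # -> 40000.0
```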
6:30 I thought the Earth wasn’t a perfect sphere. Does this affect this calculation?
Yes, but only by a small amount. The centrifugal forces caused by the Earth’s rotation about its axis create an equatorial bulge and a polar flattening, so that the radius of the Earth varies by about 20 kilometers between the poles and the equator. This sounds like a lot, but it is only about 0.3% of the mean Earth radius of 6371 km, and is not the primary source of error in Eratosthenes’ calculations.
7:27 Are the riverboat merchants and the “grad student” the leading theories for how Eratosthenes measured the distance from Alexandria to Syene?
There is some recent research that suggests that Eratosthenes may have drawn on the work of professional bematists (step measurers – a precursor to the modern profession of surveyor) for this calculation. This somewhat ruins the “grad student” joke, but perhaps should be disclosed for the sake of completeness.
8:51 How long is a “lunar month” in this context? Is it really 28 days?
In this context the correct notion of a lunar month is a “synodic month” – the length of a lunar cycle relative to the Sun – which is actually about 29 days and 12 hours. It differs from the “sidereal month” – the length of a lunar cycle relative to the fixed stars – which is about 27 days and 8 hours – due to the motion of the Earth around the Sun (or the Sun around the Earth, in the geocentric model). [A similar correction needs to be made around 14:59, using the synodic month of 29 days and 12 hours rather than the “English lunar month” of 28 days (4 weeks).]
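As a quick sanity check (a sketch using modern mean values, which are of course far more precise than anything available to the ancients), the roughly two-day gap between the sidereal and synodic months follows from subtracting the angular rates:

```python
# Hedged check with modern mean values: relative angular rates subtract,
# so 1/T_synodic = 1/T_sidereal - 1/T_year.
sidereal_month = 27.321661   # days: the Moon's period relative to the fixed stars
year = 365.256363            # days: the Earth's period around the Sun
synodic_month = 1 / (1 / sidereal_month - 1 / year)
print(synodic_month)         # ~29.53 days, i.e. about 29 days and 12.7 hours
```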
10:47 Is the time taken for the Moon to complete an observed rotation around the Earth slightly less than 24 hours as claimed?
Actually, I made a sign error: the lunar day (also known as a tidal day) is actually 24 hours and 50 minutes, because the Moon revolves around the Earth in the same direction as the Earth spins about its axis. The animation therefore also moves in the wrong direction (and, relatedly, the line of sight covers up the Moon in the direction opposite to the Moon rising at around 10:38).
11:32 Is this really just a coincidence that the Moon and Sun have almost the same angular width?
I believe so. First of all, the agreement is not that good: due to the non-circular nature of the orbit of the Moon around the Earth, and of the Earth around the Sun, the angular width of the Moon actually fluctuates to be as much as 10% larger or smaller than that of the Sun at various times (cf. the “supermoon” phenomenon). Also, none of the other planets with known moons exhibits this sort of agreement, so there does not appear to be any universal law of nature that would enforce this coincidence. (This is in contrast with the empirical fact that the Moon always presents the same side to the Earth, which also occurs in all other known large moons (as well as Pluto), and is well explained by the physical phenomenon of tidal locking.)
On the other hand, as the video hopefully demonstrates, the existence of the Moon was extremely helpful in allowing the ancients to understand the basic nature of the solar system. Without the Moon, their task would have been significantly more difficult; but in this hypothetical alternate universe, it is likely that modern cosmology would have still become possible once advanced technology such as telescopes, spaceflight, and computers became available, especially when combined with the modern mathematics of data science. Without giving away too many spoilers, a scenario similar to this was explored in the classic short story and novel “Nightfall” by Isaac Asimov.
12:58 Isn’t the illuminated portion of the Moon, as well as the visible portion of the Moon, slightly smaller than half of the entire Moon, because the Earth and Sun are not an infinite distance away from the Moon?
Technically yes (and this is actually for a very similar reason to why half Moons don’t quite occur halfway between the new Moon and the full Moon); but this fact turns out to have only a very small effect on the calculations, and is not the major source of error. In reality, the Sun turns out to be about 86,000 Moon radii away from the Moon, so asserting that half of the Moon is illuminated by the Sun is a very good first approximation. (The Earth is “only” about 220 Moon radii away, so the visible portion of the Moon is more noticeably less than half; but this doesn’t actually affect Aristarchus’s arguments much.)
The angular diameter of the Sun also creates an additional thin band between the fully illuminated and fully non-illuminated portions of the Moon, in which the Sun is intersecting the lunar horizon and so only illuminates the Moon with a portion of its light, but this is also a relatively minor effect (and the midpoints of this band can still be used to define the terminator between illuminated and non-illuminated for the purposes of Aristarchus’s arguments).
13:27 What is the difference between a half Moon and a quarter Moon?
If one divides the lunar month, starting and ending at a new Moon, into quarters (weeks), then half moons occur both near the end of the first quarter (a week after the new Moon, and a week before the full Moon), and near the end of the third quarter (a week after the full Moon, and a week before the new Moon). So, somewhat confusingly, half Moons come in two types, known as “first quarter Moons” and “third quarter Moons”.
14:49 I thought the sine function was introduced well after the ancient Greeks.
It’s true that the modern sine function only dates back to the Indian and Islamic mathematical traditions in the first millennium CE, several centuries after Aristarchus. However, he still had Euclidean geometry at his disposal, which provided tools such as similar triangles that could be used to reach basically the same conclusions, albeit with significantly more effort than would be needed if one could use modern trigonometry.
On the other hand, Aristarchus was somewhat hampered by not knowing an accurate value for π (Archimedes’ constant): the fundamental work of Archimedes on this constant actually took place a few decades after that of Aristarchus!
15:17 I plugged in the modern values for the distances to the Sun and Moon and got 18 minutes for the discrepancy, instead of half an hour.
Yes; I quoted the wrong number here. In 1630, Godfried Wendelen replicated Aristarchus’s experiment. With improved timekeeping and the then-recent invention of the telescope, Wendelen obtained a measurement of half an hour for the discrepancy, which is significantly better than Aristarchus’s value of six hours, but still somewhat off from the true value of 18 minutes. (As such, Wendelen’s estimate for the distance to the Sun was 60% of the true value.)
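For readers who want to reproduce this figure, here is a back-of-envelope version of the calculation, a hedged sketch using modern mean Earth-Moon and Earth-Sun distances; it lands at roughly 17-18 minutes depending on the exact values used:

```python
import math

# At an exact half Moon the Earth-Moon-Sun angle is 90 degrees, so the Moon's
# elongation from the Sun falls short of 90 degrees by arcsin(d_moon / d_sun).
# Elongation advances 360 degrees per synodic month, so that small angle is the
# amount of time by which the half Moon precedes the quarter point of the cycle.
d_moon = 384_400            # km, mean Earth-Moon distance (modern value)
d_sun = 149_600_000         # km, mean Earth-Sun distance (modern value)
synodic_month_minutes = 29.53 * 24 * 60

deficit_deg = math.degrees(math.asin(d_moon / d_sun))
discrepancy_minutes = deficit_deg / 360 * synodic_month_minutes
print(round(discrepancy_minutes, 1))   # ~17-18 minutes
```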
15:27 Wouldn’t Aristarchus also have access to other timekeeping devices than sundials?
Yes; for instance, clepsydrae (water clocks) were available by that time, but they were of limited accuracy. It is also possible that Aristarchus could have used measurements of star elevations to estimate time; it is not clear whether the astrolabe or the armillary sphere was available to him, but he would have had some other, more primitive astronomical instruments such as the dioptra at his disposal. But again, the accuracy and calibration of these timekeeping tools would have been poor.
However, most likely the more important limiting factor was the ability to determine the precise moment at which a perfect half Moon (or new Moon, or full Moon) occurs; this is extremely difficult to do with the naked eye. (The telescope would not be invented for almost two more millennia.)
17:37 Could the parallax problem be solved by assuming that the stars are not distributed in a three-dimensional space, but instead on a celestial sphere?
Putting all the stars on a fixed sphere would make the parallax effects less visible, as the stars in a given portion of the sky would now all move together at the same apparent velocity – but there would still be visible large-scale distortions in the shape of the constellations because the Earth would be closer to some portions of the celestial sphere than others; there would also be variability in the brightness of the stars, and (if they were very close) the apparent angular diameter of the stars. (These problems would be solved if the celestial sphere was somehow centered around the moving Earth rather than the fixed Sun, but then this basically becomes the geocentric model with extra steps.)
18:29 Did nothing of note happen in astronomy between Eratosthenes and Copernicus?
Not at all! There were significant mathematical, technological, theoretical, and observational advances by astronomers from many cultures (Greek, Islamic, Indian, Chinese, European, and others) during this time: for instance, improvements to some of the previous measurements on the distance ladder, a better understanding of eclipses, axial tilt, and even axial precession, more sophisticated trigonometry, and the development of new astronomical tools such as the astrolabe. See for instance this “deleted scene” from the video, as well as the FAQ entry for 14:49 for this video and 24:54 for the second video, or this Instagram post. But in order to make the overall story of the cosmic distance ladder fit into a two-part video, we chose to focus primarily on the first time each rung of the ladder was climbed.
We have since learned that this portrait was most likely painted in the 19th century, and may have been based more on Kepler’s mentor, Michael Mästlin. A more commonly accepted portrait of Kepler may be found at his current Wikipedia page.
19:07 Isn’t it tautological to say that the Earth takes one year to perform a full orbit around the Sun?
Technically yes, but this is an illustration of the philosophical concept of “referential opacity”: the content of a sentence can change when substituting one term for another (e.g., “1 year” and “365 days”), even when both terms refer to the same object. Amusingly, the classic illustration of this, known as Frege’s puzzle, also comes from astronomy: it is an informative statement that Hesperus (the evening star) and Phosphorus (the morning star, also known as Lucifer) are the same object (which nowadays we call Venus), but it is a mere tautology that Hesperus and Hesperus are the same object: replacing “Phosphorus” with “Hesperus” changes the informational content of the sentence, even though both names refer to the same object.
19:10 How did Copernicus figure out the crucial fact that Mars takes 687 days to go around the Sun? Was it directly drawn from Babylonian data?
Technically, Copernicus drew from tables by European astronomers that were largely based on earlier tables from the Islamic golden age, which in turn drew from earlier tables by Indian and Greek astronomers, the latter of which also incorporated data from the ancient Babylonians; so it is more accurate to say that Copernicus relied on centuries of data, at least some of which went all the way back to the Babylonians. Among all of this data were the times when Mars was in opposition to the Sun. If one imagines the Earth and Mars as runners going around a race track circling the Sun, with Earth on an inner track and Mars on an outer track, oppositions are analogous to when the Earth runner “laps” the Mars runner. From the centuries of observational data, such “laps” were known to occur about once every 780 days (this is known as the synodic period of Mars). Because the Earth takes roughly 365 days to perform a “lap”, it is possible to do a little math and conclude that Mars must therefore complete its own “lap” in 687 days (this is known as the sidereal period of Mars); see the short calculation below. (See also this post on the cosmic distance ladder Instagram for some further elaboration.)
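The “little math” here is just a subtraction of angular rates; the following sketch (using the rounded figures of 365.25 and 780 days quoted above) reproduces the 687-day figure:

```python
# The "runners on a track" arithmetic: Earth laps Mars once per synodic period,
# so the angular rates satisfy 1/T_mars = 1/T_earth - 1/T_synodic.
t_earth = 365.25     # days for one Earth "lap"
t_synodic = 780.0    # days between successive oppositions of Mars (approximate)
t_mars = 1 / (1 / t_earth - 1 / t_synodic)
print(round(t_mars))  # ~687 days, the sidereal period of Mars
```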
The situation is complex. When Kepler served as Brahe’s assistant, Brahe only provided Kepler with a limited amount of data, primarily involving Mars, in order to confirm Brahe’s own geo-heliocentric model. After Brahe’s death, the data was inherited by Brahe’s son-in-law and other relatives, who intended to publish Brahe’s work separately; however, Kepler, who was appointed as Imperial Mathematician to succeed Brahe, had at least some partial access to the data, and many historians believe he secretly copied portions of this data to aid his own research before finally securing complete access to the data from Brahe’s heirs after several years of disputes. On the other hand, as intellectual property rights laws were not well developed at this time, Kepler’s actions were technically legal, if ethically questionable.
21:39 What is that funny loop in the orbit of Mars?
This is known as retrograde motion. It arises because the orbital velocity of Earth (about 30 km/sec) is a little larger than that of Mars (about 24 km/sec). So, near opposition (when Mars is in the opposite position in the sky from the Sun), Earth briefly overtakes Mars, causing Mars’s observed position to move westward rather than eastward. But at most other times, the motions of Earth and Mars are at a sufficient angle that Mars continues its apparent eastward motion despite the slightly faster speed of the Earth.
21:59 Couldn’t one also work out the direction to other celestial objects in addition to the Sun and Mars, such as the stars, the Moon, or the other planets? Would that have helped?
Actually, the directions to the fixed stars were implicitly used in all of these observations to determine how the celestial sphere was positioned, and all the other directions were taken relative to that celestial sphere. (Otherwise, all the calculations would be taken on a rotating frame of reference in which the unknown orbits of the planets were themselves rotating, which would have been an even more complex task.) But the stars are too far away to be useful as one of the two landmarks to triangulate from, as they generate almost no parallax and so cannot distinguish one location from another.
Measuring the direction to the Moon would tell you which portion of the lunar cycle one was in, and would determine the phase of the Moon, but this information would not help one triangulate, because the Moon’s position in the heliocentric model varies over time in a somewhat complicated fashion, and is too tied to the motion of the Earth to be a useful “landmark” for determining the Earth’s orbit around the Sun.
In principle, using the measurements to all the planets at once could allow for some multidimensional analysis that would be more accurate than analyzing each of the planets separately, but this would require some sophisticated statistical analysis and modeling, as well as non-trivial amounts of compute – neither of which were available in Kepler’s time.
22:57 Can you elaborate on how we know that the planets all move on a plane?
The Earth’s orbit lies in a plane known as the ecliptic (it is where lunar and solar eclipses occur). Different cultures have divided up the ecliptic in various ways; in Western astrology, for instance, the twelve main constellations that cross the ecliptic are known as the Zodiac. The planets can be observed to wander only along the Zodiac, and not through other constellations: for instance, Mars can be observed to be in Cancer or Libra, but never in Orion or Ursa Major. From this, one can conclude (as a first approximation, at least) that the planets all lie on the ecliptic.
However, this isn’t perfectly true, and the planets will deviate from the ecliptic by a small angle known as the ecliptic latitude. Tycho Brahe’s observations on these latitudes for Mars were an additional useful piece of data that helped Kepler complete his calculations (basically by suggesting how to join together the different “jigsaw pieces”), but the math here gets somewhat complicated, so the story here has been somewhat simplified to convey the main ideas.
23:04 What are the other universal problem solving tips?
Grant Sanderson has a list (in a somewhat different order) in this previous video.
23:28 Can one work out the position of Earth from fixed locations of the Sun and Mars when the Sun and Mars are in conjunction (the same location in the sky) or opposition (opposite locations in the sky)?
Technically, these are two times when the technique of triangulation fails to be accurate; and in the former case it is also extremely difficult to observe Mars due to its proximity to the Sun. But again, following the Universal Problem Solving Tip from 23:07, one should initially ignore these difficulties to locate a viable method, and correct for these issues later. This video series by Welch Labs goes into Kepler’s methods in more detail.
24:04 So Kepler used Copernicus’s calculation of 687 days for the period of Mars. But didn’t Kepler discard Copernicus’s theory of circular orbits?
Good question! It turns out that Copernicus’s calculations of orbital periods are quite robust (especially with centuries of data), and continue to work even when the orbits are not perfectly circular. But even if the calculations did depend on the circular orbit hypothesis, it would have been possible to use the Copernican model as a first approximation for the period, in order to get a better, but still approximate, description of the orbits of the planets. This in turn can be fed back into the Copernican calculations to give a second approximation to the period, which can then give a further refinement of the orbits. Thanks to the branch of mathematics known as perturbation theory, one can often make this type of iterative process converge to an exact answer, with the error in each successive approximation being smaller than the previous one; a toy illustration of this sort of iteration is sketched below. (But performing such an iteration would probably have been beyond the computational resources available in Kepler’s time; also, the foundations of perturbation theory require calculus, which was only developed several decades after Kepler.)
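As a toy illustration of this kind of iterative refinement (and emphatically not the historical computation), here is the standard fixed-point iteration for Kepler’s equation, which starts from the “circular” first approximation and repeatedly feeds each answer back in; the eccentricity below is the modern value for Mars, used purely for illustration:

```python
import math

# Solve Kepler's equation E - e*sin(E) = M by iteration: start with the
# "circular" guess E = M, then refine by feeding the answer back in.
def eccentric_anomaly(mean_anomaly, eccentricity, iterations=20):
    e_anom = mean_anomaly                      # first approximation: circular orbit
    for _ in range(iterations):
        e_anom = mean_anomaly + eccentricity * math.sin(e_anom)  # refinement step
    return e_anom

print(eccentric_anomaly(1.0, 0.0934))  # converges quickly for a Mars-like eccentricity
```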
24:21 Did Brahe have exactly 10 years of data on Mars’s positions?
Actually, it was more like 17 years, but with many gaps, due both to inclement weather and to Brahe turning his attention to astronomical objects other than Mars in some years; also, near conjunction, Mars might only be visible in the daytime sky instead of the night sky, further complicating measurements. So the “jigsaw puzzle pieces” in 25:26 are in fact more complicated than just five locations equally spaced in time; there are gaps and also observational errors to grapple with. But to understand the method one should ignore these complications; again, see “Universal Problem Solving Tip #1”. Even with his “idea of true genius”, it took many years of further painstaking calculation for Kepler to tease out his laws of planetary motion from Brahe’s messy and incomplete observational data.
26:44 Shouldn’t the Earth’s orbit be spread out at perihelion and clustered closer together at aphelion, to be consistent with Kepler’s laws?
Yes, you are right; there was a coding error here.
26:53 What is the reference for Einstein’s “idea of pure genius”?
Actually, the precise quote was “an idea of true genius”, and can be found in the introduction to Carola Baumgardt’s “Life of Kepler”.
Strictly speaking, no: his writings are all in Arabic, and he was nominally a subject of the Abbasid Caliphate, whose rulers were Arab; but he was born in Khwarazm (in modern-day Uzbekistan), and would have been a subject of either the Samanid Empire or the Khwarazmian Empire, both of which were largely self-governed and primarily Persian in culture and ethnic makeup, despite being technically vassals of the Caliphate. So he would have been part of what is sometimes called “Greater Persia” or “Greater Iran”.
Another minor correction: while Al-Biruni was born in the tenth century, his work on the measurement of the Earth was published in the early eleventh century.
Is the angle θ really called the angle of declination?
This was a misnomer on my part; this angle is more commonly called the dip angle.
But the height of the mountain would be so small compared to the radius of the Earth! How could this method work?
Using the Taylor approximation cos θ ≈ 1 − θ²/2, one can approximately write the relationship between the mountain height h, the Earth radius R, and the dip angle θ (in radians) as R ≈ 2h/θ². The key point here is the inverse quadratic dependence on θ, which allows even relatively small values of θ to still be realistically useful for computing R. Al-Biruni’s measurement of the dip angle θ was on the order of 0.01 radians, leading to an estimate of R that is about four orders of magnitude larger than h, which is within the ballpark at least of a typical height of a mountain (on the order of a kilometer) and the radius of the Earth (6400 kilometers).
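Here is the same estimate as a short numerical sketch, with purely illustrative values for the mountain height and dip angle (chosen only to have the rough sizes discussed above):

```python
# Numerical check with illustrative values: from a peak of height h the horizon
# dips by theta, where cos(theta) = R / (R + h); the Taylor approximation
# cos(theta) ~ 1 - theta**2 / 2 then gives R ~ 2 * h / theta**2.
h = 0.5        # km: an illustrative mountain height
theta = 0.01   # radians: an illustrative dip angle of the rough size in question
R_estimate = 2 * h / theta**2
print(R_estimate)   # 10000.0 km -- four orders of magnitude above h, ballpark of 6400 km
```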
Was the method really accurate to within a percentage point?
This is disputed, somewhat similarly to the previous calculations of Eratosthenes. Al-Biruni’s measurements were in cubits, but there were multiple incompatible types of cubit in use at the time. It has also been pointed out that atmospheric refraction effects would have created noticeable changes in the observed dip angle θ. It is thus likely that the true accuracy of Al-Biruni’s method was poorer than 1%, but that this was somehow compensated for by choosing a favorable conversion between cubits and modern units.
1:13 Did Captain Cook set out to discover Australia?
One of the objectives of Cook’s first voyage was to discover the hypothetical continent of Terra Australis. This was considered to be distinct from Australia, which at the time was known as New Holland. As this name might suggest, prior to Cook’s voyage, the northwest coastline of New Holland had been explored by the Dutch; Cook instead explored the eastern coastline, naming this portion New South Wales. The entire continent was later renamed to Australia by the British government, following a suggestion of Matthew Flinders; and the concept of Terra Australis was abandoned.
4:40 The relative positions of the Northern and Southern hemisphere observations are reversed from those earlier in the video.
Yes, this was a slight error in the animation; the labels here should be swapped for consistency of orientation.
7:06 So, when did they finally manage to measure the transit of Venus, and use this to compute the astronomical unit?
While Le Gentil had the misfortune to not be able to measure either the 1761 or 1769 transits, other expeditions of astronomers (led by Dixon-Mason, Chappe d’Auteroche, and Cook) did take measurements of one or both of these transits with varying degrees of success, with the measurements of Cook’s team of the 1769 transit in Tahiti being of particularly high quality. All of this data was assembled later by Lalande in 1771, leading to the most accurate measurement of the astronomical unit at the time (within 2.3% of modern values, which was about three times more accurate than any previous measurement).
8:53 What does it mean for the transit of Io to be “twenty minutes ahead of schedule” when Jupiter is in opposition (Jupiter is opposite to the Sun when viewed from the Earth)?
Actually, it should be halved to “ten minutes ahead of schedule”, with the transit being “ten minutes behind schedule” when Jupiter is in conjunction, and the net discrepancy being twenty minutes (or actually closer to 16 minutes when measured with modern technology). Both transits are being compared against an idealized periodic schedule in which the transits occur at a perfectly regular rate (about once every 42 hours), with the period chosen to be the best fit to the actual data. This discrepancy is only noticeable after carefully comparing transit times over a period of months; at any given position of Jupiter, the Doppler effect of the Earth moving towards or away from Jupiter would only shift each transit by a few seconds compared to the previous transit, with the delays or accelerations only becoming cumulatively noticeable after many such transits.
Also, the presentation here is oversimplified: at times of conjunction, Jupiter and Io are too close to the Sun for observation of the transit. Rømer actually observed the transits at other times than conjunction, and Huygens used more complicated trigonometry than what was presented here to infer a measurement for the speed of light in terms of the astronomical unit (which they had begun to measure a bit more accurately than in Aristarchus’s time; see the FAQ entry for 15:17 in the first video).
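To see where the “closer to 16 minutes” figure above comes from, here is a one-line check using modern values for the astronomical unit and the speed of light; it is simply the light-travel time across the diameter of the Earth’s orbit:

```python
au_km = 149_600_000          # modern value of the astronomical unit, in km
c_km_per_s = 299_792.458     # speed of light, in km/s
round_trip_minutes = 2 * au_km / c_km_per_s / 60
print(round(round_trip_minutes, 1))   # ~16.6 minutes across the orbit's diameter
```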
10:05 Are the astrological signs for Earth and Venus swapped here?
Yes, this was a small mistake in the animation.
10:34 Shouldn’t one have to account for the elliptical orbit of the Earth, as well as the proper motion of the star being observed, or the effects of general relativity?
Yes; the presentation given here is a simplified one to convey the idea of the method, but in the most advanced parallax measurements, such as the ones taken by the Hipparcos and Gaia spacecraft, these factors are taken into account, basically by taking as many measurements (not just two) as possible of a single star, and locating the best fit of that data to a multi-parameter model that incorporates the (known) orbit of the Earth with the (unknown) distance and motion of the star, as well as additional gravitational effects from other celestial bodies, such as the Sun and other planets.
14:53 The formula I was taught for apparent magnitude of stars looks a bit different from the one here.
This is because astronomers use a logarithmic scale to measure both apparent magnitude and absolute magnitude. If one takes the logarithm of the inverse square law in the video, and performs the normalizations used by astronomers to define magnitude, one arrives at the standard relation between absolute and apparent magnitude.
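For reference, here is that standard relation in a short sketch (using the usual convention that absolute magnitude is the apparent magnitude the object would have at a distance of 10 parsecs); the helper function below is my own illustration, not code from any astronomy library:

```python
import math

# The "distance modulus": taking logarithms of the inverse-square law and
# applying the astronomers' normalization gives m - M = 5 * log10(d / 10 pc).
def distance_modulus(distance_parsecs):
    return 5 * math.log10(distance_parsecs / 10)

print(distance_modulus(10))    # 0.0  -- by definition, m = M at 10 parsecs
print(distance_modulus(1000))  # 10.0 -- each factor of 10 in distance adds 5 magnitudes
```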
But this is an oversimplification, most notably due to the neglect of extinction effects caused by interstellar dust. This is not a major issue for the relatively short distances observable via parallax, but causes problems at larger scales of the ladder (see for instance the FAQ entry here for 18:08). To compensate for this, one can work in multiple frequencies of the spectrum (visible, x-ray, radio, etc.), as some frequencies are less susceptible to extinction than others. From the discrepancies between these frequencies one can infer the amount of extinction, leading to “dust maps” that can then be used to facilitate such corrections for subsequent measurements in the same area of the universe. (More generally, the trend in modern astronomy is towards “multi-messenger astronomy”, in which one combines together very different types of measurements of the same object to obtain a more accurate understanding of that object and its surroundings.)
18:08 Can we really measure the entire Milky Way with this method?
Strictly speaking, there is a “zone of avoidance” on the far side of the Milky Way that is very difficult to measure in the visible portion of the spectrum, due to the large number of intervening stars, the dust, and even a supermassive black hole at the galactic center. However, in recent years it has become possible to explore this zone to some extent using the radio, infrared, and x-ray portions of the spectrum, which are less affected by these factors.
18:19 How did astronomers know that the Milky Way was only a small portion of the entire universe?
This issue was the topic of the “Great Debate” in the early twentieth century. It was only with the work of Hubble, who used Leavitt’s law to measure distances to the Magellanic Clouds and to “spiral nebulae” (which we now know to be other galaxies), building on earlier work of Leavitt and Hertzsprung, that it was conclusively established that these clouds and nebulae were in fact at much greater distances than the diameter of the Milky Way.
18:45 How can one compensate for light blending effects when measuring the apparent magnitude of Cepheids?
This is a non-trivial task, especially if one demands a high level of accuracy. Using the highest resolution telescopes available (such as HST or JWST) is of course helpful, as is switching to other frequencies, such as near-infrared, where Cepheids are even brighter relative to nearby non-Cepheid stars. One can also apply sophisticated statistical methods to fit to models of the point spread of light from unwanted sources, and use nearby measurements of the same galaxy without the Cepheid as a reference to help calibrate those models. Improving the accuracy of the Cepheid portion of the distance ladder is an ongoing research activity in modern astronomy.
18:54 What is the mechanism that causes Cepheids to oscillate?
For most stars, there is an equilibrium size: if the star’s radius collapses, then the reduced potential energy is converted to heat, creating pressure that pushes the star outward again; and conversely, if the star expands, then it cools, causing a reduction in pressure that no longer counteracts gravitational forces. But for Cepheids, there is an additional mechanism called the kappa mechanism: the increased temperature caused by contraction increases the ionization of helium, which drains energy from the star and accelerates the contraction; conversely, the cooling caused by expansion causes the ionized helium to recombine, with the energy released accelerating the expansion. If the parameters of the Cepheid lie in a certain “instability strip”, then the interaction of the kappa mechanism with the other mechanisms of stellar dynamics creates a periodic oscillation in the Cepheid’s radius, whose period increases with the mass and brightness of the Cepheid.
For a recent re-analysis of Leavitt’s original Cepheid data, see this paper.
19:10 Did Leavitt mainly study the Cepheids in our own galaxy?
This was an inaccuracy in the presentation. Leavitt’s original breakthrough paper studied Cepheids in the Small Magellanic Cloud. At the time, the distance to this cloud was not known; indeed, it was a matter of debate whether this cloud was in the Milky Way, or some distance away from it. However, Leavitt (correctly) assumed that all the Cepheids in this cloud were roughly the same distance away from our solar system, so that the apparent brightness was proportional to the absolute brightness. This gave an uncalibrated form of Leavitt’s law between absolute brightness and period, subject to the (then unknown) distance to the Small Magellanic Cloud. After Leavitt’s work, there were several efforts (by Hertzsprung, Russell, and Shapley) to calibrate the law by using the few Cepheids for which other distance methods were available, such as parallax. (Main sequence fitting to the Hertzsprung-Russell diagram was not directly usable, as Cepheids did not lie on the main sequence; but in some cases one could indirectly use this method if the Cepheid was in the same stellar cluster as a main sequence star.) Once the law was calibrated, it could be used to measure distances to other Cepheids, and in particular to compute distances to extragalactic objects such as the Magellanic clouds.
19:15 Was Leavitt’s law really a linear law between period and luminosity?
Strictly speaking, the period-luminosity relation commonly known as Leavitt’s law was a linear relation between the absolute magnitude of the Cepheid and the logarithm of the period; undoing the logarithms, this becomes a power law between the luminosity and the period.
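To spell out the “undoing the logarithms” step, here is a small sketch with illustrative (not fitted) coefficients, showing how a straight line in (log-period, magnitude) becomes a power law in (period, luminosity):

```python
# Illustrative coefficients only: if Leavitt's law is written M = a*log10(P) + b,
# and absolute magnitude is M = -2.5*log10(L/L0), then L is proportional to
# P ** (-0.4 * a): a line in (log P, M) is a power law in (P, L).
a, b = -2.8, -1.4
exponent = -0.4 * a
print(exponent)   # ~1.1: luminosity grows roughly like this power of the period
```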
20:26 Was Hubble the one to discover the redshift of galaxies?
This was an error on my part; Hubble was using earlier work of Vesto Slipher on these redshifts, and combining it with his own measurements of distances using Leavitt’s law to arrive at the law that now bears his name; he was also assisted in his observations by Milton Humason. It should also be noted that Georges Lemaître had also independently arrived at essentially the same law a few years prior, but his work was published in a somewhat obscure journal and did not receive broad recognition until some time later.
20:37 Hubble’s original graph doesn’t look like a very good fit to a linear law.
Hubble’s original data was somewhat noisy and inaccurate by modern standards, and the redshifts were affected by the peculiar velocities of individual galaxies in addition to the expanding nature of the universe. However, as the data was extended to more galaxies, it became increasingly possible to compensate for these effects and obtain a much tighter fit, particularly at larger scales where the effects of peculiar velocity are less significant. See for instance this article from 2015, where Hubble’s original graph is compared with a more modern graph. This more recent graph also reveals a slight nonlinear correction to Hubble’s law at very large scales, which has led to the remarkable discovery that the expansion of the universe is in fact accelerating over time, a phenomenon attributed to a positive cosmological constant (or perhaps a more complex form of dark energy in the universe). On the other hand, even with this nonlinear correction, there continues to be a roughly 10% discrepancy between this law and predictions based primarily on the cosmic microwave background radiation; see the FAQ entry for 23:49.
20:46 Does general relativity alone predict a uniformly expanding universe?
This was an oversimplification. Einstein’s equations of general relativity contain a parameter Λ, known as the cosmological constant, which currently is only computable indirectly from fitting to experimental data. But even with this constant fixed, there are multiple solutions to these equations (basically because there are multiple possible initial conditions for the universe). For the purposes of cosmology, a particularly successful family of solutions is given by the Lambda-CDM model. This family of solutions contains additional parameters, such as the density of dark matter in the universe. Depending on the precise values of these parameters, the universe could be expanding or contracting, with the rate of expansion or contraction either increasing, decreasing, or staying roughly constant. But if one fits this model to all available data (including not just redshift measurements, but also measurements of the cosmic microwave background radiation and the spatial distribution of galaxies), one deduces a version of Hubble’s law which is nearly linear, but with an additional correction at very large scales; see the next item of this FAQ.
21:07 Is Hubble’s original law sufficiently accurate to allow for good measurements of distances at the scale of the observable universe?
Not really; as mentioned in the end of the video, there were additional efforts to cross-check and calibrate Hubble’s law at intermediate scales between the range of Cepheid methods (about 100 million light years) and observable universe scales (about 100 billion light years) by using further “standard candles” than Cepheids, most notably Type Ia supernovae (which are bright enough and predictable enough to be usable out to about 10 billion light years), the Tully-Fisher relation between the luminosity of a galaxy and its rotational speed, and gamma ray bursts. It turns out that due to the accelerating nature of the universe’s expansion, Hubble’s law is not completely linear at these large scales; this important correction cannot be discerned purely from Cepheid data, but also requires the other standard candles, as well as fitting that data (as well as other observational data, such as the cosmic microwave background radiation) to the cosmological models provided by general relativity (with the best fitting models to date being some version of the Lambda-CDM model).
On the other hand, a naive linear extrapolation of Hubble’s original law to all larger scales does provide a very rough picture of the observable universe which, while too inaccurate for cutting edge research in astronomy, does give some general idea of its large-scale structure.
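As a concrete (and deliberately naive) version of this linear extrapolation, here is the textbook small-redshift formula d = cz/H0; the value of H0 below is an illustrative round number sitting between the two “Hubble tension” camps discussed later in this FAQ:

```python
# Naive linear Hubble law, valid only for small redshifts and ignoring the
# nonlinear corrections discussed above: v = c*z = H0*d, so d = c*z / H0.
c = 299_792.458        # km/s
H0 = 70.0              # km/s per megaparsec (illustrative round value)
def naive_distance_mpc(z):
    return c * z / H0

print(round(naive_distance_mpc(0.01)))   # ~43 Mpc, roughly 140 million light years
```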
21:15 Where did this guess of the observable universe being about 20% of the full universe come from?
There are some ways to get a lower bound on the size of the entire universe that go beyond the edge of the observable universe. One is through analysis of the cosmic microwave background radiation (CMB), which has been carefully mapped out by several satellite observatories, most notably WMAP and Planck. Roughly speaking, a universe that was less than twice the size of the observable universe would create certain periodicities in the CMB data; such periodicities are not observed, so this provides a lower bound (see for instance this paper for an example of such a calculation). The 20% number was a guess based on my vague recollection of these works, but there is no consensus currently on what the ratio truly is; there are some proposals that the entire universe is in fact several orders of magnitude larger than the observable one.
The situation is somewhat analogous to Aristarchus’s measurement of the distance to the Sun, which was very sensitive to a small angle (the half-moon discrepancy). Here, the predicted size of the universe under the standard cosmological model is similarly dependent in a highly sensitive fashion on a measure of the flatness of the universe which, for reasons still not fully understood (but likely caused by some sort of inflation mechanism), happens to be extremely close to zero. As such, predictions for the size of the universe remain highly volatile at the current level of measurement accuracy.
23:44 Was it a black hole collision that allowed for an independent measurement of Hubble’s law?
This was a slight error in the presentation. While the first gravitational wave observation by LIGO in 2015 was of a black hole collision, it did not come with an electromagnetic counterpart that allowed for a redshift calculation that would yield a Hubble’s law measurement. However, a later collision of neutron stars, observed in 2017, did come with an associated kilonova in which a redshift was calculated, and led to a Hubble measurement which was independent of most of the rungs of the distance ladder.
23:49 Where can I learn more about this 10% discrepancy in Hubble’s law?
This is known as the Hubble tension (or, in more sensational media, the “crisis in cosmology”): roughly speaking, the various measurements of Hubble’s constant (either from climbing the cosmic distance ladder, or by fitting various observational data to standard cosmological models) tend to arrive at one of two values that are about 10% apart from each other. The values based on gravitational wave observations are currently consistent with both, due to the significant error bars in this extremely sensitive method; but other, more mature methods are now of sufficient accuracy that they are basically only consistent with one of the two values. Currently there is no consensus on the origin of this tension: possibilities include systematic biases in the observational data, subtle statistical issues with the methodology used to interpret the data, a correction to the standard cosmological model, the influence of some previously undiscovered law of physics, or some partial breakdown of the Copernican principle.
For an accessible recent summary of the situation, see this video by Becky Smethurst (“Dr. Becky”).
24:49 So, what is a Type Ia supernova and why is it so useful in the distance ladder?
A Type Ia supernova occurs when a white dwarf in a binary system draws more and more mass from its companion star, until it reaches the Chandrasekhar limit, at which point its gravitational forces are strong enough to cause a collapse that increases the pressure to the point where a supernova is triggered via a process known as carbon detonation. Because of the universal nature of the Chandrasekhar limit, all such supernovae have (as a first approximation) the same absolute brightness, and can thus be used as standard candles in a similar fashion to Cepheids (but without the need to first measure any auxiliary observable, such as a period). But these supernovae are also far brighter than Cepheids, so this method can be used at significantly larger distances than the Cepheid method (roughly speaking, it can handle distances of ~10 billion light years, whereas Cepheids are reliable out to ~100 million light years). Among other things, the supernovae measurements were the key to detecting an important nonlinear correction to Hubble’s law at these scales, leading to the remarkable conclusion that the expansion of the universe is in fact accelerating over time, which in the Lambda-CDM model corresponds to a positive cosmological constant, though there are more complex “dark energy” models that have also been proposed to explain this acceleration.
This is partly due to time constraints and the need for editing to tighten the narrative, but it was also a conscious decision on my part. Advanced classes on the distance ladder will naturally focus on the most modern, sophisticated, and precise ways to measure distances, backed up by the latest mathematics, physics, technology, observational data, and cosmological models. However, the focus in this video series was rather different; we sought to portray the cosmic distance ladder as evolving, in a fully synergistic way across many historical eras, with the evolution of mathematics, science, and technology, as opposed to being a mere byproduct of the current state of these other disciplines. As one specific consequence of this change of focus, we emphasized the first time any rung of the distance ladder was achieved, at the expense of more accurate and sophisticated later measurements at that rung. For instance, refinements in the measurement of the radius of the Earth since Eratosthenes, improvements in the measurement of the astronomical unit between Aristarchus and Cook, and the refinements of Hubble’s law and the cosmological model of the universe in the twentieth and twenty-first centuries were largely omitted (though some of the answers in this FAQ are intended to address these omissions).
Many of the topics not covered here (or only given a simplified treatment) are discussed in depth in other expositions, including other Youtube videos. I would welcome suggestions from readers for links to such resources in the comments to this post. Here is a partial list:
“Eratosthenes” – Cosmos (Carl Sagan), video posted Apr 24, 2009 (originally released Oct 1, 1980, as part of the episode “The Shores of the Cosmic Ocean”).
“How Far Away Is It” – David Butler, a multi-part series beginning Aug 16 2013.
Vaire works on reversible computing, an idea that tries to leverage thermodynamics to make a computer that wastes as little heat as possible. While I learned a lot of fun things that didn’t make it into the piece…I’m not going to tell you them this week! That’s because I’m working on another piece about reversible computing, focused on a different aspect of the field. When that piece is out I’ll have a big “bonus material post” talking about what I learned writing both pieces.
This week, instead, the bonus material is about FirstPrinciples.org itself, where you’ll be seeing me write more often in future. The First Principles Foundation was founded by Ildar Shar, a Canadian tech entrepreneur who thinks that physics is pretty cool. (Good taste that!) His foundation aims to support scientific progress, especially in addressing the big, fundamental questions. They give grants, analyze research trends, build scientific productivity tools…and most relevantly for me, publish science news on their website, in a section called the Hub.
The first time I glanced through the Hub, it was clear that FirstPrinciples and I have a lot in common. Like me, they’re interested both in scientific accomplishments and in the human infrastructure that makes them possible. They’ve interviewed figures in the open access movement, like the creators of arXiv and SciPost. On the science side, they mix coverage of the mainstream and reputable with outsiders challenging the status quo, and hot news topics with explainers of key concepts. They’re still new, and still figuring out what they want to be. But from what I’ve glimpsed so far, it looks like they’re going somewhere good.
The quantum double-slit experiment, in which objects are sent toward and through a pair of slits in a wall, and are recorded on a screen behind the slits, clearly shows an interference pattern. It’s natural to ask, “where does the interference occur?”
The problem is that there is a hidden assumption in this way of framing the question — a very natural assumption, based on our experience with waves in water or in sound. In those cases, we can explicitly see (Fig. 1) how interference builds up between the slits and the screen.
Figure 1: How water waves or sound waves interfere after passing through two slits.
But when we dig deep into quantum physics, this way of thinking runs into trouble. Asking “where” is not as straightforward as it seems. In the next post we’ll see why. Today we’ll lay the groundwork.
Figure 2: A pre-quantum view of a superposition in which particle 1 is moving left OR right, and particle 2 is stationary at x=3.
Here we have
particle 1 going from left to right, with particle 2 stationary at x=+3, OR
particle 1 going from right to left, with particle 2 again stationary at x=+3.
Fig. 3 shows what the wave function Ψ(x1,x2) [where x1 is the position of particle 1 and x2 is the position of particle 2] looks like when its absolute-value squared is graphed on the space of possibilities. Both peaks have x2=+3, representing the fact that particle 2 is stationary. They move in opposite directions and pass through each other horizontally as particle 1 moves to the right OR to the left.
Figure 3: The graph of the absolute-value-squared of the wave function for the quantum version of the system in Fig. 2.
This looks remarkably similar to what we would have if particle 2 weren’t there at all! The interference fringes run parallel to the x2 axis, meaning the locations of the interference peaks and valleys depend on x1 but not on x2. In fact, if we measure particle 1, ignoring particle 2, we’ll see the same interference pattern that we see when a single particle is in the superposition of Fig. 1 with particle 2 removed (Fig. 4):
Figure 4a: The square of the absolute value of the wave function for a particle in a superposition of the form shown in Fig. 2 but with the second particle removed.
Figure 4b: A closeup of the interference pattern that occurs at the moment when the two peaks in Fig. 4a perfectly overlap. The real and imaginary parts of the wave function are shown in red and blue, while its square is drawn in black.
We can confirm this in a simple way. If we measure the position of particle 1, ignoring particle 2, the probability of finding that particle at a specific position x1 is given by projecting the wave function, shown above as a function of x1 and x2, onto the x1 axis. [More mathematically, this is done by integrating over x2 to leave a function of x1 only.] Sometimes (not always!) this is essentially equivalent to viewing the graph of the wave function from one side, as in Figs. 5-6.
Figure 5: Projecting the wave function of Fig. 3, at the moment of maximum interference, onto the x1 axis. Compare with the black curve in Fig. 4b.
Because the interference ridges in Fig. 3 are parallel to the x2 axis and thus independent of particle 2’s exact position, we do indeed find, when we project onto the x1 axis as in Fig. 5, that the familiar interference pattern of Fig. 4b reappears.
Meanwhile, if at that same moment we measure particle 2’s position, we will find results centered around x2=+3, with no interference, as seen in Fig. 6 where we project the wave function of Fig. 3 onto the x2 axis.
Figure 6: Projecting the wave function of Fig. 3, at the moment of maximum interference, onto the x2 axis. The position of particle 2 is thus close to x2=3, with no interference pattern.
Why is this case so simple, with the one-particle case in Fig. 4 and the two-particle case in Figs. 3 and 5 so closely resembling each other?
The Cause
It has nothing specifically to do with the fact that particle 2 is stationary. Another example I gave had particle 2 stationary in both parts of the superposition, but located in two different places. In Figs. 7a and 7b, the pre-quantum version of that system is shown both in physical space and in the space of possibilities [where I have, for the first time, put stars for the two possibilities onto the same graph.]
Figure 7a: A similar system to that of Fig. 2, drawn in its pre-quantum version in physical space.
Figure 7b: Same as Fig. 7a, but drawn in the space of possibilities.
You can see that the two stars’ paths will not intersect, since one remains at x2=+3 and the other remains at x2=-3. Thus there should be no interference — and indeed, none is seen in Fig. 8, where the time evolution of the full quantum wave function is shown. The two peaks miss each other, and so no interference occurs.
Figure 8: The absolute-value-squared of the wave function corresponding to Figs. 7a-7b.
If we project the wave function of Fig. 8 onto the x1 axis at the moment when the two peaks are at x1=0, we see (Fig. 9) a single peak (because the two peaks, with different values of x2, are projected onto each other). No interference fringes are seen.
Figure 9: At the moment when the first particle is near x1=0, the probability of finding particle 1 as a function of x1 shows a featureless peak, with no interference effects.
Instead the resemblance between Figs. 3-5 has to do with the fact that particle 2 is doing exactly the same thing in each part of the superposition. For instance, as in Fig. 10, suppose particle 2 is moving to the left in both possibilities.
Figure 10: A system similar to that of Fig. 2, but with particle 2 (orange) moving to the left in both parts of the superposition.
(In the top possibility, particles 1 and 2 will encounter one another; but we have been assuming for simplicity that they don’t interact, so they can safely pass right through each other.)
The resulting wave function is shown in Fig. 11:
Figure 11: The absolute-value-squared of the wave function corresponding to Fig.10.
The two peaks cross paths when x1=0 and x2=2. The wave function again shows interference at that location, with fringes that are independent of x2. If we project the wave function onto the x1 axis at that moment, we’ll get exactly the same thing we saw in Fig. 5, even though the behavior of the wave function in x2 is different.
This makes the pattern clear: if, in each part of the superposition, particle 2 behaves identically, then particle 1 will be subject to the same pattern of interference as if particle 2 were absent. Said another way, if the behavior of particle 1 is independent of particle 2 (and vice versa), then any interference effects involving one particle will be as though the other particle wasn’t even there.
Said yet another way, the two particles in Figs. 2 and 10 are uncorrelated, meaning that we can understand what either particle is doing without having to know what the other is doing.
Importantly, the examples studied in the previous post did not have this feature. That’s crucial in understanding why the interference seen at the end of that post wasn’t so simple.
Independence and Factoring
What we are seeing in Figs. 2 and 10 has an analogy in algebra. If we have an algebraic expression such as
(a c + b c),
in which c is common to both terms, then we can factor it into
(a+b)c.
The same is true of the kinds of physical processes we’ve been looking at. In Fig. 10 the two particles’ behavior is uncorrelated, so we can “factor” the pre-quantum system as follows.
Figure 12: The “factored” form of the superposition in Fig. 10.
What we see here is that factoring involves an AND, while superposition is an OR: the figure above says that (particle 1 is moving from left to right OR from right to left) AND (particle 2 is moving from right to left, no matter what particle 1 is doing.)
And in the quantum context, if (and only if) two particles’ behaviors are completely uncorrelated, we can literally factor the wave function into a product of two functions, one for each particle:
Ψ(x1,x2)=Ψ1(x1)Ψ2(x2)
In this specific case of Fig. 12, where the first particle is in a superposition whose parts I’ve labeled A and B, we can write Ψ1(x1) as a sum of two terms:
Ψ1(x1)=ΨA(x1) + ΨB(x1)
Specifically, ΨA(x1) describes particle 1 moving left to right (giving one peak in Fig. 11), and ΨB(x1) describes particle 1 moving right to left, giving the other peak.
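For readers who like to experiment, here is a minimal numerical sketch (with my own toy grid and wave-packet parameters, not the ones used to make the figures above) of a factored two-particle wave function at the moment of maximal overlap; it checks that projecting |Ψ|² onto the x1 axis reproduces the same fringes as the corresponding one-particle superposition:

```python
import numpy as np

# Factored wave function Psi(x1,x2) = (Psi_A(x1) + Psi_B(x1)) * Psi_2(x2) at the
# moment of maximal overlap; the marginal over x2 should show the same fringes
# as the single-particle superposition.
x1 = np.linspace(-5, 5, 400)
x2 = np.linspace(-5, 5, 400)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")

def packet(x, x0, k):
    """Gaussian wave packet centered at x0 with momentum k."""
    return np.exp(-(x - x0) ** 2) * np.exp(1j * k * x)

psi1 = packet(X1, 0.0, +8.0) + packet(X1, 0.0, -8.0)   # particle 1: superposition A + B
psi2 = packet(X2, 3.0, 0.0)                            # particle 2: single packet near x2 = +3
psi = psi1 * psi2                                      # factored (uncorrelated) wave function

prob = np.abs(psi) ** 2
prob_x1 = prob.sum(axis=1)                             # "project onto the x1 axis"
prob_x1_single = np.abs(packet(x1, 0.0, +8.0) + packet(x1, 0.0, -8.0)) ** 2

# The two interference patterns agree up to overall normalization.
print(np.allclose(prob_x1 / prob_x1.max(), prob_x1_single / prob_x1_single.max()))
```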
But this kind of factoring is rare, and not possible in general. None of the examples in the previous post (or of this post, excepting that of its Fig. 5) can be factored. That’s because in these examples, the particles are correlated: the behavior of one depends on the behavior of the other.
Superposition AND Superposition
If the particles are truly uncorrelated, we should be able to put both particles into superpositions of two possibilities. As a pre-quantum system, that would give us (particle 1 in state A OR state B) AND (particle 2 in state C OR state D), as in Fig. 13.
Figure 13: The two particles are uncorrelated, and so their behavior can be factored. The first particle is in a superposition of states A and B, the second in a superposition of states C and D.
The corresponding factored wave function, in which (particle 1 moves left to right OR right to left) AND (particle 2 moves left to right OR right to left), can be written as a product of two superpositions,

Ψ(x1,x2) = [ΨA(x1) + ΨB(x1)] [ΨC(x2) + ΨD(x2)],

whose pre-quantum version gives us the four possibilities shown in Fig. 14.
Figure 14: The product in Fig. 13 is expanded into its four distinct possibilities.
The wave function therefore has four peaks, one for each term. The wave function behaves as shown in Fig. 15.
Figure 15: The wave function for the system in Fig. 14 shows interference of two pairs of possibilities, first for particle 1 and later for particle 2.
The four peaks interfere in pairs. The top two and the bottom two interfere when particle 1 reaches x1=0, creating fringes that run parallel to the x2 axis and thus are independent of x2. Notice that even though there are two separate sets of interference fringes when particle 1 reaches x1=0, we cannot tell them apart if we only measure particle 1: when we project the wave function onto the x1 axis, the two sets of interference fringes line up, and we see the same single-particle interference pattern that we’ve seen so many times (Figs. 3-5). That’s all because particles 1 and 2 are uncorrelated.
Figure 16: The first instance of interference, seen in two peaks in Fig. 15, is reduced, when projected onto the x1 axis, to the same interference pattern as seen in Figs. 3-5; the measurement of particle 1’s position will show the same interference pattern in each case, because particles 1 and 2 are uncorrelated.
If at the same moment we measure particle 2 ignoring particle 1, we find (Fig. 17) that particle 2 has equal probability of being near x=2.5 or x=-0.5, with no interference effects.
Figure 17: The first instance of interference, seen in two peaks in Fig. 15, shows two peaks with no interference when projected onto the x2 axis. Thus measurements of particle 2’s position show no interference at this moment.
Meanwhile, the left two and the right two peaks in Fig. 15 subsequently interfere when particle 2 reaches x2=1, creating fringes that run parallel to the x1 axis, and thus are independent of x1; these will show up near x=1 in measurements of particle 2’s position. This is shown (Fig. 18) by projecting the wave function at that moment onto the x2 axis.
Figure 18: During the second instance of interference in Fig. 15, the projection of the wave function onto the x2 axis.
Locating the Interference?
So far, in all these examples, it seems that we can say where the interference occurs in physical space. For instance, in this last example, it appears that particle 1 shows interference around x1=0, and slightly later particle 2 shows interference around x2=1.
But if we look back at the end of the last post, we can see that something is off. In the examples considered there, the particles are correlated and the wave function cannot be factored. And in the last example in Fig. 12 of that post, we saw interference patterns whose ridges are parallel neither to the x1 axis nor to the x2 axis... an effect that a factored wave function cannot produce. [Fun exercise: prove this last statement.]
As a result, projecting the wave function of that example onto the x1 axis hides the interference pattern, as shown in Fig. 19. The same is true when projecting onto the x2 axis.
Figure 19: Although Fig. 12 of the previous post shows an interference pattern, it is hidden when the wave function is projected onto the x1 axis, leaving only a boring bump. The observable consequences are shown in Fig. 13 of that same post.
Consequently, neither measurements of particle 1’s position nor measurements of particle 2’s position can reveal the interference effect. (This is shown, for particle 1, in the previous post’s Fig. 13.) This leaves it unclear where the interference is, or even how to measure it.
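For contrast, here is a companion sketch (again my own toy example, not the wave function from the previous post) of a correlated two-particle state whose fringes run diagonally: the fringes are unmistakable in the two-variable density, yet the projection onto the x1 axis is a featureless bump.

    import numpy as np

    x1 = np.linspace(-5, 5, 400)
    x2 = np.linspace(-5, 5, 400)
    X1, X2 = np.meshgrid(x1, x2, indexing='ij')

    g = lambda x: np.exp(-x**2)    # Gaussian envelope
    k = 6.0

    # two branches whose relative phase depends on x1 - x2: not factorable,
    # and the resulting fringes run parallel to the diagonal x1 = x2
    Psi = g(X1) * g(X2) * (np.exp(1j * k * (X1 - X2)) + np.exp(-1j * k * (X1 - X2)))
    prob = np.abs(Psi)**2

    # strong fringes in the 2D density: along x2 ~ 0 it swings between ~0 and its peak
    mid = prob.shape[1] // 2
    print(prob[160:240, mid].min(), prob[160:240, mid].max())

    # but the projection onto the x1 axis is indistinguishable from a smooth bump
    marg1 = prob.sum(axis=1)
    bump = 2 * g(x1)**2 * (g(x2)**2).sum()      # the marginal with the interference term removed
    print(np.abs(marg1 - bump).max() / bump.max())   # tiny: no visible fringes in the marginal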
But in fact it can be measured, and next time we’ll see how. We’ll also see that in a general superposition, where the two particles are correlated, interference effects often cannot be said to have a location in physical space. And that will lead us to a first glimpse of one of the most shocking lessons of quantum physics.
One More Uncorrelated Example, Just for Fun
To close, I’ll leave you with one more uncorrelated example, merely because it looks cool. In pre-quantum language, the setup is shown in Fig. 20.
Figure 20: Another uncorrelated superposition with four possibilities.
Now all four peaks interfere simultaneously, near (x1,x2)=(1,-1).
Figure 21: The four peaks simultaneously interfere, generating a grid pattern.
The grid pattern in the interference ensures that the usual interference effects can be seen for both particles at the same time, with the interference for particle 1 near x1=1 and that for particle 2 near x2=-1. Here are the projections onto the two axes at the moment of maximal interference.
Figure 22a: At the moment of maximum interference, the projection of the wave function onto the x1 axis shows interference near x1=1.
Figure 22b: At the moment of maximum interference, the projection of the wave function onto the x2 axis shows interference near x2=-1.
This is a bit of a shaggy dog story, but I think it’s fun. There’s also a moral about the nature of mathematical research.
Once I was interested in the McGee graph, nicely animated here by Mamouka Jibladze:
This is the unique (3,7)-cage, meaning a graph such that each vertex has 3 neighbors and the shortest cycle has length 7. Since it has a very symmetrical appearance, I hoped it would be connected to some interesting algebraic structures. But which?
I read on Wikipedia that the symmetry group of the McGee graph has order 32. Let’s call it the McGee group. Unfortunately there are many different 32-element groups — 51 of them, in fact! — and the article didn’t say which one this was. (It does now.)
Gordon Royle said the McGee group is “not a super-interesting group, it is SmallGroup(32,43) in either GAP or Magma”. Knowing this let me look up the McGee group on this website, which is wonderfully useful if you’re studying finite groups:
There I learned that the McGee group is the so-called holomorph of the cyclic group Z/8: that is, the semidirect product of Z/8 and its automorphism group:
Hol(Z/8) = Z/8 ⋊ Aut(Z/8)
I resisted getting sucked into the general study of holomorphs, or what happens when you iterate the holomorph construction. Instead, I wanted a more concrete description of the McGee group.
Z/8 is not just an abelian group: it’s a ring! Since multiplication in a ring distributes over addition, we can get automorphisms of the group Z/8 by multiplying by those elements that have multiplicative inverses. These invertible elements form a group
(Z/8)^× = {1, 3, 5, 7}
called the multiplicative group of Z/8. In fact these give all the automorphisms of the group Z/8.
In short, the McGee group is
Z/8 ⋊ (Z/8)^×
This is very nice, because this is the group of all transformations of Z/8 of the form
x ↦ ax + b,   a ∈ (Z/8)^×, b ∈ Z/8
If we think of Z/8 as a kind of line — called the ‘affine line over Z/8’ — these are precisely all the affine transformations of this line. Thus, the McGee group deserves to be called Aff(Z/8).
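As a quick concreteness check (my own sketch, not part of the original post), one can enumerate these affine transformations directly and confirm that there are 32 of them, matching the order of the McGee group:

    UNITS = [1, 3, 5, 7]                                # the invertible elements of Z/8
    AFF = [(a, b) for a in UNITS for b in range(8)]     # the map x -> a*x + b, stored as (a, b)
    print(len(AFF))                                     # 32 = 4 * 8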
This suggests that we can build the McGee graph in some systematic way starting from the affine line over Z/8. This turns out to be a bit complicated, because the vertices come in two kinds. That is, the McGee group doesn’t act transitively on the set of vertices. Instead, it has two orbits, shown as red and blue dots here:
The 8 red vertices correspond straightforwardly to the 8 points of the affine line, but the 16 blue vertices are more tricky. There are also the edges to consider: these come in three kinds! Greg Egan figured out how this works, and I wrote it up:
About two weeks ago, I gave a Zoom talk at the Illustrating Math Seminar about some topics on my blog Visual Insight. I mentioned that the McGee group is SmallGroup(32,43) and the holomorph of Z/8. And then someone — alas, I forget who — instantly typed in the chat that this is one of the two smallest groups with an amazing property! Namely, this group has an outer automorphism that maps each element to an element conjugate to it.
I didn’t doubt this for a second. To paraphrase what Hardy said when he received Ramanujan’s first letter, nobody would have the balls to make up this shit. So, I posed a challenge to find such an exotic outer automorphism:
An automorphism α: G → G is class-preserving if for each g ∈ G there exists some h ∈ G such that
α(g) = h g h⁻¹
If you can use the same h for every g, we call α an inner automorphism. But some groups have class-preserving automorphisms that are not inner! These are the class-preserving outer automorphisms.
I don’t know if class-preserving outer automorphisms are good for anything, or important in any way. They mainly just seem intriguingly spooky. An outer automorphism that looks inner if you examine its effect on any one group element is nothing I’d ever considered. So I wanted to see an example.
Rising to my challenge, Greg Egan found a nice explicit formula for some class-preserving outer automorphisms of the McGee group.
As we’ve seen, any element of the McGee group is a transformation
x ↦ ax + b,   a ∈ (Z/8)^×, b ∈ Z/8,
so let’s write it as a pair (a, b). Greg Egan looked for automorphisms of the McGee group that are of the form
α(a, b) = (a, b + f(a))
for some function
f: (Z/8)^× → Z/8.
It is easy to check that α is an automorphism if and only if
f(a1 a2) = f(a1) + a1 f(a2).
Moreover, α is an inner automorphism if and only if
f(a) = (a − 1)c
for some c ∈ Z/8.
Now comes something cool noticed by Joshua Grochow: these formulas are an instance of a general fact about group cohomology!
Suppose we have a group G acting as automorphisms of an abelian group A. Then we can define the cohomology H^n(G, A) to be the group of n-cocycles modulo n-coboundaries. We only need the case n = 1 here. A 1-cocycle is none other than a function f: G → A obeying
f(gh) = f(g) + g f(h),
while a 1-coboundary is one of the form
f(g) = g c − c
for some c ∈ A. You can check that every 1-coboundary is a 1-cocycle. H^1(G, A) is the group of 1-cocycles modulo 1-coboundaries.
In this situation we can define the semidirect product A ⋊ G, and for any f: G → A we can define a function
α_f: A ⋊ G → A ⋊ G
by
α_f(a, g) = (a + f(g), g).
Now suppose f: G → A is such a function, and suppose (as above) that A is abelian. Then by straightforward calculations we can check:
α_f is an automorphism iff f is a 1-cocycle
and
α_f is an inner automorphism iff f is a 1-coboundary!
Thus, A ⋊ G will have outer automorphisms if H^1(G, A) ≠ 0.
When A = Z/8 and G = (Z/8)^×, then A is abelian and A ⋊ G is the McGee group. This puts Egan’s idea into a nice context. But we still need to actually find maps f that give outer automorphisms of the McGee group, and then find class-preserving ones. I don’t know how to do that using general ideas from cohomology. Maybe someone smart could do the first part, but the ‘class-preserving’ condition doesn’t seem to emerge naturally from cohomology.
Anyway, Egan didn’t waste his time with such effete generalities: he actually found all choices of f for which
α(a, b) = (a, b + f(a))
is a class-preserving outer automorphism of the McGee group. Namely:
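Rather than reproduce the formulas, here is a small brute-force search (my own Python sketch, not Egan’s calculation) that recovers every such f, using the pair representation (a, b) from above and the composition rule (a1, b1)(a2, b2) = (a1 a2, a1 b2 + b1) mod 8 that comes from composing the affine maps:

    from itertools import product

    UNITS = [1, 3, 5, 7]                             # (Z/8)^x
    ZN = list(range(8))                              # Z/8
    G = [(a, b) for a in UNITS for b in ZN]          # the McGee group as pairs (a, b)

    def mul(p, q):
        (a1, b1), (a2, b2) = p, q
        return ((a1 * a2) % 8, (a1 * b2 + b1) % 8)

    def inv(p):
        a, b = p                                     # each unit mod 8 is its own inverse
        return (a, (-a * b) % 8)

    def is_cocycle(f):
        return all(f[(a1 * a2) % 8] == (f[a1] + a1 * f[a2]) % 8
                   for a1 in UNITS for a2 in UNITS)

    def is_coboundary(f):
        return any(all(f[a] == ((a - 1) * c) % 8 for a in UNITS) for c in ZN)

    def alpha(f, p):
        a, b = p
        return (a, (b + f[a]) % 8)

    def class_preserving(f):
        return all(any(alpha(f, g) == mul(h, mul(g, inv(h))) for h in G) for g in G)

    for values in product(ZN, repeat=4):             # all maps f: UNITS -> Z/8
        f = dict(zip(UNITS, values))
        if is_cocycle(f) and not is_coboundary(f) and class_preserving(f):
            print(f)   # class-preserving outer automorphisms of this special form

Only maps f satisfying the cocycle condition survive the first filter; the class-preserving check is then done by direct conjugation over all 32 group elements.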
Last Saturday after visiting my aunt in Santa Barbara I went to Berkeley to visit the applied category theorists at the Topos Institute. I took a train, to lessen my carbon footprint a bit. The trip took 9 hours — a long time, but a beautiful ride along the coast and then through forests and fields.
The day before taking the train, I discovered my laptop was no longer charging! So, I bought a pad of paper. And then, while riding the train, I checked by hand that Egan’s first choice of really is a cocycle, and really is not a coboundary, so that it defines an outer automorphism of the McGee group. Then — and this was fairly easy — I checked that it defines a class-preserving automorphism. It was quite enjoyable, since I hadn’t done any long calculations recently.
One moral here is that interesting ideas often arise from the interactions of many people. The results here are not profound, but they are certainly interesting, and they came from online conversations with Greg Egan, Gordon Royle, Joshua Grochow, the mysterious person who instantly knew that the McGee group was one of the two smallest groups with a class-preserving outer automorphism, and others.
But what does it all mean, mathematically? Is there something deeper going on here, or is it all just a pile of curiosities?
What did we actually do, in the end? Following the order of logic rather than history, maybe this. We started with a commutative ring R, took its group of affine transformations Aff(R), and saw this group must have outer automorphisms if H^1(R^×, R) ≠ 0.
We saw this cohomology group really is nonvanishing when R = Z/8, so that A = Z/8 and G = (Z/8)^×. Furthermore, we found a class-preserving outer automorphism of Aff(Z/8).
This raises a few questions:
What is the cohomology H^1(R^×, R) in general?
What are the outer automorphisms of Aff(R)?
When does Aff(R) have class-preserving outer automorphisms?
Did you know that particle colliders have to cool down their particle beams before they collide?
You might have learned in school that temperature is secretly energy. With a number called Boltzmann’s constant, you can convert a temperature of a gas in Kelvin to the average energy of a molecule in the gas. If that’s what you remember about temperature, it might seem weird that someone would cool down the particles in a particle collider. The whole point of a particle collider is to accelerate particles, giving them lots of energy, before colliding them together. Since those particles have a lot of energy, they must be very hot, right?
Well, no. Here’s the thing: temperature is not just the average energy. It’s the average random energy. It’s energy that might be used to make a particle move forward or backwards, up or down, a random different motion for each particle. It doesn’t include motion that’s the same for each particle, like the movement of a particle beam.
Cooling down a particle beam then, doesn’t mean slowing it down. Rather, it means making it more consistent, getting the different particles moving in the same direction rather than randomly spreading apart. You want the particles to go somewhere specific, speeding up and slamming into the other beam. You don’t want them to move randomly, running into the walls and destroying your collider. So you can have something with high energy that is comparatively cool.
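A toy one-dimensional illustration (my own, with made-up beam numbers): give every particle the same large drift velocity plus a small random spread, and compare the average kinetic energy with the energy in the random part alone, which is what the temperature measures.

    import numpy as np

    k_B = 1.380649e-23         # Boltzmann's constant, J/K
    m   = 1.67262192e-27       # proton mass, kg

    rng = np.random.default_rng(0)
    drift  = 1.0e7                               # common forward velocity, m/s (made-up)
    spread = 1.0e3                               # random velocity spread, m/s (made-up)
    v = drift + spread * rng.standard_normal(100000)

    mean_energy    = 0.5 * m * np.mean(v**2)     # huge: includes the bulk motion
    thermal_energy = 0.5 * m * np.var(v)         # tiny: only the random part
    temperature    = m * np.var(v) / k_B         # 1D equipartition: (1/2) k_B T per particle

    print(mean_energy, thermal_energy, temperature)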
In general, the best way I’ve found to think about temperature and heat is in terms of usefulness and uselessness. Cool things are useful, they do what you expect and not much more. Hot things are less useful, they use energy to do random things you don’t want. Sometimes, by chance, this random energy will still do something useful, and if you have a cold thing to pair with the hot thing, you can take advantage of this in a consistent way. But hot things by themselves are less useful, and that’s why particle colliders try to cool down their beams.
At this week’s American Physical Society Global Physics Summit in Anaheim, California, John Preskill spoke at an event celebrating 100 years of groundbreaking advances in quantum mechanics. Here are his remarks.
Welcome, everyone, to this celebration of 100 years of quantum mechanics hosted by the Physical Review Journals. I’m John Preskill and I’m honored by this opportunity to speak today. I was asked by our hosts to express some thoughts appropriate to this occasion and to feel free to share my own personal journey as a physicist. I’ll embrace that charge, including the second part of it, perhaps even more than they intended. But over the next 20 minutes I hope to distill from my own experience some lessons of broader interest.
I began graduate study in 1975, the midpoint of the first 100 years of quantum mechanics, 50 years ago and 50 years after the discovery of quantum mechanics in 1925 that we celebrate here. So I’ll seize this chance to look back at where quantum physics stood 50 years ago, how far we’ve come since then, and what we can anticipate in the years ahead.
As an undergraduate at Princeton, I had many memorable teachers; I’ll mention just one: John Wheeler, who taught a full-year course for sophomores that purported to cover all of physics. Wheeler, having worked with Niels Bohr on nuclear fission, seemed implausibly old, though he was actually 61. It was an idiosyncratic course, particularly because Wheeler did not refrain from sharing with the class his current research obsessions. Black holes were a topic he shared with particular relish, including the controversy at the time concerning whether evidence for black holes had been seen by astronomers. Especially notably, when covering the second law of thermodynamics, he challenged us to ponder what would happen to entropy lost behind a black hole horizon, something that had been addressed by Wheeler’s graduate student Jacob Bekenstein, who had finished his PhD that very year. Bekenstein’s remarkable conclusion that black holes have an intrinsic entropy proportional to the event horizon area delighted the class, and I’ve had many occasions to revisit that insight in the years since then. The lesson being that we should not underestimate the potential impact of sharing our research ideas with undergraduate students.
Stephen Hawking made that connection between entropy and area precise the very next year when he discovered that black holes radiate; his resulting formula for black hole entropy, a beautiful synthesis of relativity, quantum theory, and thermodynamics ranks as one of the shining achievements in the first 100 years of quantum mechanics. And it raised a deep puzzle pointed out by Hawking himself with which we have wrestled since then, still without complete success — what happens to information that disappears inside black holes?
Hawking’s puzzle ignited a titanic struggle between cherished principles. Quantum mechanics tells us that as quantum systems evolve, information encoded in a system can get scrambled into an unrecognizable form, but cannot be irreversibly destroyed. Relativistic causality tells us that information that falls into a black hole, which then evaporates, cannot possibly escape and therefore must be destroyed. Who wins – quantum theory or causality? A widely held view is that quantum mechanics is the victor, that causality should be discarded as a fundamental principle. This calls into question the whole notion of spacetime — is it fundamental, or an approximate property that emerges from a deeper description of how nature works? If emergent, how does it emerge and from what? Fully addressing that challenge we leave to the physicists of the next quantum century.
I made it to graduate school at Harvard and the second half century of quantum mechanics ensued. My generation came along just a little too late to take part in erecting the standard model of particle physics, but I was drawn to particle physics by that intoxicating experimental and theoretical success. And many new ideas were swirling around in the mid and late 70s of which I’ll mention only two. For one, appreciation was growing for the remarkable power of topology in quantum field theory and condensed matter, for example the theory of topological solitons. While theoretical physics and mathematics had diverged during the first 50 years of quantum mechanics, they have frequently crossed paths in the last 50 years, and topology continues to bring both insight and joy to physicists. The other compelling idea was to seek insight into fundamental physics at very short distances by searching for relics from the very early history of the universe. My first publication resulted from contemplating a question that connected topology and cosmology: Would magnetic monopoles be copiously produced in the early universe? To check whether my ideas held water, I consulted not a particle physicist or a cosmologist, but rather a condensed matter physicist (Bert Halperin) who provided helpful advice. The lesson being that scientific opportunities often emerge where different subfields intersect, a realization that has helped to guide my own research over the following decades.
Looking back at my 50 years as a working physicist, what discoveries can the quantumists point to with particular pride and delight?
I was an undergraduate when Phil Anderson proclaimed that More is Different, but as an arrogant would-be particle theorist at the time I did not appreciate how different more can be. In the past 50 years of quantum mechanics no example of emergence was more stunning than the fractional quantum Hall effect. We all know full well that electrons are indivisible particles. So how can it be that in a strongly interacting two-dimensional gas an electron can split into quasiparticles each carrying a fraction of its charge? The lesson being: in a strongly-correlated quantum world, miracles can happen. What other extraordinary quantum phases of matter await discovery in the next quantum century?
Another thing I did not adequately appreciate in my student days was atomic physics. Imagine how shocked those who elucidated atomic structure in the 1920s would be by the atomic physics of today. To them, a quantum measurement was an action performed on a large ensemble of similarly prepared systems. Now we routinely grab ahold of a single atom, move it, excite it, read it out, and induce pairs of atoms to interact in precisely controlled ways. When interest in quantum computing took off in the mid-90s, it was ion-trap clock technology that enabled the first quantum processors. Strong coupling between single photons and single atoms in optical and microwave cavities led to circuit quantum electrodynamics, the basis for today’s superconducting quantum computers. The lesson being that advancing our tools often leads to new capabilities we hadn’t anticipated. Now clocks are so accurate that we can detect the gravitational redshift when an atom moves up or down by a millimeter in the earth’s gravitational field. Where will the clocks of the second quantum century take us?
Surely one of the great scientific triumphs of recent decades has been the success of LIGO, the laser interferometer gravitational-wave observatory. If you are a gravitational wave scientist now, your phone buzzes so often to announce another black hole merger that it’s become annoying. LIGO would not be possible without advanced laser technology, but aside from that what’s quantum about LIGO? When I came to Caltech in the early 1980s, I learned about a remarkable idea (from Carl Caves) that the sensitivity of an interferometer can be enhanced by a quantum strategy that did not seem at all obvious — injecting squeezed vacuum into the interferometer’s dark port. Now, over 40 years later, LIGO improves its detection rate by using that strategy. The lesson being that theoretical insights can enhance and transform our scientific and technological tools. But sometimes that takes a while.
What else has changed since 50 years ago? Let’s give thanks for the arXiv. When I was a student few scientists would type their own technical papers. It took skill, training, and patience to operate the IBM typewriters of the era. And to communicate our results, we had no email or world wide web. Preprints arrived by snail mail in Manila envelopes, if you were lucky enough to be on the mailing list. The Internet and the arXiv made scientific communication far faster, more convenient, and more democratic, and LaTeX made producing our papers far easier as well. And the success of the arXiv raises vexing questions about the role of journal publication as the next quantum century unfolds.
I made a mid-career shift in research direction, and I’m often asked how that came about. Part of the answer is that, for my generation of particle physicists, the great challenge and opportunity was to clarify the physics beyond the standard model, which we expected to provide a deeper understanding of how nature works. We had great hopes for the new phenomenology that would be unveiled by the Superconducting Super Collider, which was under construction in Texas during the early 90s. The cancellation of that project in 1993 was a great disappointment. The lesson being that sometimes our scientific ambitions are thwarted because the required resources are beyond what society will support. In which case, we need to seek other ways to move forward.
And then the next year, Peter Shor discovered the algorithm for efficiently finding the factors of a large composite integer using a quantum computer. Though computational complexity had not been part of my scientific education, I was awestruck by this discovery. It meant that the difference between hard and easy problems — those we can never hope to solve, and those we can solve with advanced technologies — hinges on our world being quantum mechanical. That excited me because one could anticipate that observing nature through a computational lens would deepen our understanding of fundamental science. I needed to work hard to come up to speed in a field that was new to me — teaching a course helped me a lot.
Ironically, for 4 ½ years in the mid-1980s I sat on the same corridor as Richard Feynman, who had proposed the idea of simulating nature with quantum computers in 1981. And I never talked to Feynman about quantum computing because I had little interest in that topic at the time. But Feynman and I did talk about computation, and in particular we were both very interested in what one could learn about quantum chromodynamics from Euclidean Monte Carlo simulations on conventional computers, which were starting to ramp up in that era. Feynman correctly predicted that it would be a few decades before sufficient computational power would be available to make accurate quantitative predictions about nonperturbative QCD. But it did eventually happen — now lattice QCD is making crucial contributions to the particle physics and nuclear physics programs. The lesson being that as we contemplate quantum computers advancing our understanding of fundamental science, we should keep in mind a time scale of decades.
Where might the next quantum century take us? What will the quantum computers of the future look like, or the classical computers for that matter? Surely the qubits of 100 years from now will be much different and much better than what we have today, and the machine architecture will no doubt be radically different than what we can currently envision. And how will we be using those quantum computers? Will our quantum technology have transformed medicine and neuroscience and our understanding of living matter? Will we be building materials with astonishing properties by assembling matter atom by atom? Will our clocks be accurate enough to detect the stochastic gravitational wave background and so have reached the limit of accuracy beyond which no stable time standard can even be defined? Will quantum networks of telescopes be observing the universe with exquisite precision and what will that reveal? Will we be exploring the high energy frontier with advanced accelerators like muon colliders and what will they teach us? Will we have identified the dark matter and explained the dark energy? Will we have unambiguous evidence of the universe’s inflationary origin? Will we have computed the parameters of the standard model from first principles, or will we have convinced ourselves that’s a hopeless task? Will we have understood the fundamental constituents from which spacetime itself is composed?
There is an elephant in the room. Artificial intelligence is transforming how we do science at a blistering pace. What role will humans play in the advancement of science 100 years from now? Will artificial intelligence have melded with quantum intelligence? Will our instruments gather quantum data Nature provides, transduce it to quantum memories, and process it with quantum computers to discern features of the world that would otherwise have remained deeply hidden?
To a limited degree, in contemplating the future we are guided by the past. Were I asked to list the great ideas about physics to surface over the 50-year span of my career, there are three in particular I would nominate for inclusion on that list. (1) The holographic principle, our best clue about how gravity and quantum physics fit together. (2) Topological quantum order, providing ways to distinguish different phases of quantum matter when particles strongly interact with one another. (3) And quantum error correction, our basis for believing we can precisely control very complex quantum systems, including advanced quantum computers. It’s fascinating that these three ideas are actually quite closely related. The common thread connecting them is that all relate to the behavior of many-particle systems that are highly entangled.
Quantum error correction is the idea that we can protect quantum information from local noise by encoding the information in highly entangled states such that the protected information is inaccessible locally, when we look at just a few particles at a time. Topological quantum order is the idea that different quantum phases of matter can look the same when we observe them locally, but are distinguished by global properties hidden from local probes — in other words such states of matter are quantum memories protected by quantum error correction. The holographic principle is the idea that all the information in a gravitating three-dimensional region of space can be encoded by mapping it to a local quantum field theory on the two-dimensional boundary of the space. And that map is in fact the encoding map of a quantum error-correcting code. These ideas illustrate how as our knowledge advances, different fields of physics are converging on common principles. Will that convergence continue in the second century of quantum mechanics? We’ll see.
As we contemplate the long-term trajectory of quantum science and technology, we are hampered by our limited imaginations. But one way to loosely characterize the difference between the past and the future of quantum science is this: For the first hundred years of quantum mechanics, we achieved great success at understanding the behavior of weakly correlated many-particle systems relevant to, for example, electronic structure, atomic and molecular physics, and quantum optics. The insights gained regarding, for instance, how electrons are transported through semiconductors or how condensates of photons and atoms behave had invaluable scientific and technological impact. The grand challenge and opportunity we face in the second quantum century is acquiring comparable insight into the complex behavior of highly entangled states of many particles, which are well beyond the reach of current theory or computation. This entanglement frontier is vast, inviting, and still largely unexplored. The wonders we encounter in the second century of quantum mechanics, and their implications for human civilization, are bound to supersede by far those of the first century. So let us gratefully acknowledge the quantum heroes of the past and present, and wish good fortune to the quantum explorers of the future.
From August 2013 to January 2017 I ran a blog called Visual Insight, which was a place to share striking images that help explain topics in mathematics. Here’s the video of a talk I gave last week about some of those images:
It was fun showing people the great images created by Refurio Anachro, Greg Egan, Roice Nelson, Gerard Westendorp and many other folks. For more info on the images I talked about, read on….
Perhaps the most important thing to get right from the start, in most statistical problems, is to understand the probability distribution function (PDF) of your data. If you know it exactly (something that is theoretically possible but only rarely achieved in practice), you are in statistical heaven: you can use the maximum likelihood method for parameter estimation, and you can get to understand a lot about the whole problem.
In January 2016, Caltech’s Institute for Quantum Information and Matter unveiled a YouTube video featuring an extraordinary chess showdown between actor Paul Rudd (a.k.a. Ant-Man) and the legendary Dr. Stephen Hawking. But this was no ordinary match—Rudd had challenged Hawking to a game of Quantum Chess. At the time, Fast Company remarked, “Here we are, less than 10 days away from the biggest advertising football day of the year, and one of the best ads of the week is a 12-minute video of quantum chess from Caltech.” But a Super Bowl ad for what, exactly?
For the past nine years, Quantum Realm Games, with continued generous support from IQIM and other strategic partnerships, has been tirelessly refining the rudimentary Quantum Chess prototype showcased in that now-viral video, transforming it into a fully realized game—one you can play at home or even on a quantum computer. And now, at long last, we’ve reached a major milestone: the launch of Quantum Chess 1.0. You might be wondering—what took us so long?
The answer is simple: developing an AI capable of playing Quantum Chess.
Before we dive into the origin story of the first-ever AI designed to master a truly quantum game, it’s important to understand what enables modern chess AI in the first place.
Chess AI is a vast and complex field, far too deep to explore in full here. For those eager to delve into the details, the Chess Programming Wiki serves as an excellent resource. Instead, this post will focus on what sets Quantum Chess AI apart from its classical counterpart—and the unique challenges we encountered along the way.
With Chess AI, the name of the game is “depth”, at least for versions based on the Minimax strategy conceived by John von Neumann in 1928 (we’ll say a bit about Neural Network based AI later). The basic idea is that the AI will simulate the possible moves each player can make, down to some depth (number of moves) into the future, then decide which one is best based on a set of evaluation criteria (minimizing the maximum loss incurred by the opponent). The faster it can search, the deeper it can go. And the deeper it can go, the better its evaluation of each potential next move is.
Searching into the future can be modelled as a branching tree, where each branch represents a possible move from a given position (board configuration). The average branching factor for chess is about 35. That means that for a given board configuration, there are about 35 different moves to choose from. So if the AI looks 2 ply (moves) ahead, it sees 35×35 moves on average, and this blows up quickly. By 4 ply, the AI already has 1.5 million moves to evaluate.
Modern chess engines, like Stockfish and Leela, gain their strength by looking far into the future. Depth 10 is considered low in these cases; you really need 20+ if you want the engine to return an accurate evaluation of each move under consideration. To handle that many evaluations, these engines use strong heuristics to prune branches (the width of the tree), so that they don’t need to calculate the exponentially many leaves of the tree. For example, if one of the branches involves losing your Queen, the algorithm may decide to prune that branch and all the moves that come after. But as experienced players can see already, since a Queen sacrifice can sometimes lead to massive gains down the road, such a “naive” heuristic may need to be refined further before it is implemented. Even so, the tension between depth-first and breadth-first search is ever present.
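For readers who have not seen it, here is a minimal generic minimax search with alpha-beta pruning (my own sketch; legal_moves, apply, and evaluate are placeholder hooks, not the API of Stockfish or the Quantum Chess Engine):

    def alphabeta(position, depth, alpha, beta, maximizing,
                  legal_moves, apply, evaluate):
        """Generic minimax with alpha-beta pruning over placeholder game hooks."""
        moves = legal_moves(position)
        if depth == 0 or not moves:
            return evaluate(position)
        if maximizing:
            best = float('-inf')
            for m in moves:
                best = max(best, alphabeta(apply(position, m), depth - 1,
                                           alpha, beta, False,
                                           legal_moves, apply, evaluate))
                alpha = max(alpha, best)
                if alpha >= beta:        # prune: the opponent will never allow this line
                    break
            return best
        else:
            best = float('inf')
            for m in moves:
                best = min(best, alphabeta(apply(position, m), depth - 1,
                                           alpha, beta, True,
                                           legal_moves, apply, evaluate))
                beta = min(beta, best)
                if alpha >= beta:
                    break
            return best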
The addition of split and merge moves in Quantum Chess absolutely explodes the branching factor. Early simulations have shown that it may be in the range of 100-120, but more work is needed to get an accurate count. For all we know, branching could be much bigger. We can get a sense by looking at a single piece, the Queen.
On an otherwise empty chess board, a single Queen on d4 has 27 possible moves (we leave it to the reader to find them all). In Quantum Chess, we add the split move: every piece, besides pawns, can move to any two empty squares it can reach legally. This adds every possible paired combination of standard moves to the list.
But wait, there’s more!
Order matters in Quantum Chess. The Queen can split to d3 and c4, but it can also split to c4 and d3. These subtly different moves can yield different underlying phase structures (given their implementation via a square-root iSWAP gate between the source square and the first target, followed by an iSWAP gate between the source and the second target), potentially changing how interference works on, say, a future merge move. So you get 27*26 = 702 possible moves! And that doesn’t include possible merge moves, which might add another 15-20 branches to each node of our tree.
Do the math and we see that there are roughly 30 times as many moves in Quantum Chess for that queen. Even if we assume the branching factor is only 100, by ply 4 we have 100 million moves to search. We obviously need strong heuristics to do some very aggressive pruning.
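Spelling out that arithmetic, with the rough branching factors quoted above:

    print(35**2)        # ~1,225 positions two ply ahead in classical chess
    print(35**4)        # ~1.5 million positions four ply ahead
    print(27 + 27*26)   # 729: standard moves plus ordered split moves for the lone queen
    print(100**4)       # 100 million positions four ply ahead at branching factor 100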
But where do we get strong heuristics for a new game? We don’t have centuries of play to study and determine which sequences of moves are good and which aren’t. This brings us to our first attempt at a Quantum Chess AI. Enter StoQfish.
StoQfish
Quantum Chess is based on chess (in fact, you can play regular Chess all the way through if you and your opponent decide to make no quantum moves), which means that chess skill matters. Could we make a strong chess engine work as a quantum chess AI? Stockfish is open source, and incredibly strong, so we started there.
Given the nature of quantum states, the first thing you think about when you try to adapt a classical strategy into a quantum one is to split the quantum superposition underlying the state of the game into a series of classical states and then sample them according to their (squared) amplitude in the superposition. And that is exactly what we did. We used the Quantum Chess Engine to generate several chess boards by sampling the current state of the game, which can be thought of as a quantum superposition of classical chess configurations, according to the underlying probability distribution. We then passed these boards to Stockfish. Stockfish would, in theory, return its own weighted distribution of the best classical moves. We had some ideas on how to derive split moves from this distribution, but let’s not get ahead of ourselves.
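In code, the sampling step might look something like this minimal sketch (my own; the list of (board, amplitude) pairs is a hypothetical stand-in for whatever the Quantum Chess Engine actually returns):

    import random

    def sample_boards(superposition, n_samples):
        """superposition: list of (classical_board, complex_amplitude) pairs."""
        boards  = [b for b, amp in superposition]
        weights = [abs(amp)**2 for b, amp in superposition]   # Born-rule probabilities
        return random.choices(boards, weights=weights, k=n_samples)

    # each sampled classical board could then be handed to a classical engine
    # such as Stockfish for evaluation, and the returned moves aggregated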
This approach had limited success and significant failures. Stockfish is highly optimized for classical chess, which means that there are some positions that it cannot process. For example, consider the scenario where a King is in superposition of being captured and not captured; upon capture of one of these Kings, samples taken after such a move will produce boards without a King! Similarly, what if a King in superposition is in check, but you’re not worried because the other half of the King is well protected, so you don’t move to protect it? The concept of check is a problem all around, because Quantum Chess doesn’t recognize it. Things like moving “through check” are completely fine.
You can imagine, then, why Stockfish crashes whenever it encounters a board without a King. In classical Chess, there is always a King on the board. In Quantum Chess, the King is somewhere in the chess multiverse, but not necessarily in every board returned by the sampling procedure.
You might wonder if we couldn’t just throw away boards that weren’t valid. That’s one strategy, but we’re sampling probabilities, so if we throw out some of the data we introduce bias into the calculation, which leads to poor outcomes overall.
We tried to introduce a King onto boards where he was missing, but that became its own computational problem: how do you reintroduce the King in a way that doesn’t change the assessment of the position?
We even tried to hack Stockfish to abandon its obsession with the King, but that caused a cascade of other failures, and tracing through the Stockfish codebase became a problem that wasn’t likely to yield a good result.
This approach wasn’t working, but we weren’t done with Stockfish just yet. Instead of asking Stockfish for the next best move given a position, we tried asking Stockfish to evaluate a position. The idea was that we could use the board evaluations in our own Minimax algorithm. However, we ran into similar problems, including the illegal position problem.
So we decided to try writing our own minimax search, with our own evaluation heuristics. The basics are simple enough. A board’s value is related to the value of the pieces on the board and their location. And we could borrow from Stockfish’s heuristics as we saw fit.
This gave us Hal 9000. We were sure we’d finally mastered quantum AI. Right? Find out what happened, in the next post.
This guide is intended for the Chapman undergraduate students who are attending this year’s APS Global Summit. It may be useful for others as well.
The APS Global Summit is a ginormous event, featuring dozens of parallel sessions at any given time. It can be exciting for first-time attendees, but also overwhelming. Here, I compile some advice on how to navigate the meeting and some suggestions for sessions and events you might like to attend.
General Advice
Use the online schedule and the mobile app to help you navigate the meeting. If you create a login, the online schedule allows you to add things to your personalized schedule, which you can view on the app at the meeting. This is a very useful thing to do because making decisions of where to go on the fly is difficult.
Do not overschedule yourself. I know it is tempting to figure out how to go to as many things as you can, and run between sessions on opposite sides of the convention center. This will be harder to accomplish than you imagine. The meeting gets very crowded and it is exhausting to sit through a full three-hour session of talks. Schedule some break time and, where possible, schedule blocks of time in one location rather than running all over the place.
You will have noticed that most talks at the meeting are 12min long (10min for the talk + 2min for questions). These are called contributed talks. Since they are so short, they are more like adverts for the work than a detailed explanation. They are usually aimed at experts and, quite frankly, many speakers do not know how to give these talks well. It is not worth attending these talks unless one of the following applies:
You are already an expert in that research area.
You are strongly considering doing research in that area.
You are there to support your friends and colleagues who are speaking in that session.
You are so curious about the research area that you are prepared to sit through a lot of opaque talks to get some idea of what is going on in the area.
The session is on a topic that is unusually accessible or the session is aimed at undergraduate students.
Instead, you should prioritize attending the following kinds of talks, which you can search for using the filters on the schedule:
Plenary talks: These are aimed at a general physics audience and are usually by famous speakers (famous by physics standards anyway). Some of these might also be…
Invited Sessions: These sessions consist of 30min talks by invited speakers in a common research area. There is no guarantee that they will be accessible to novices, but it is much more likely than with the contributed talks. Go to any invited sessions on areas of physics you are curious about.
Focus Sessions: Focus sessions consist mainly of contributed talks, but they also have one or two 30min invited talks. It is not considered rude to switch sessions between talks, so do not be afraid to just attend the invited talks. They are not always scheduled at the beginning of the session. In fact, some groups deliberately stagger the times of the invited talks so that people can see the invited talks in more than one focus session.
There are sessions that list “Undergraduate Students” as part of their target audience. A lot of these are “Undergraduate Research” sessions. It can be interesting to go to one or two of these to see the variety of undergraduate research experiences that are on offer. However, I would not advise only going to sessions on this list. For one thing, undergraduate research projects are not banned from the other sessions, so many of the best undergraduate projects will not be in those sessions. Going to sessions by topic is a better bet most of the time.
It is helpful to filter the sessions on the schedule by the organizing Unit (Division, Topical Group, or Forum). You can find a list of APS units here. For example, if you are particularly interested in Quantum Information and Computation then you will want to look at the sessions organized by DQI (Division of Quantum Information). Sessions organized by Forums are often particularly accessible, as they tend to be about less technical issues (DEI, Education, History and Philosophy, etc.)
The next sections contain some more specific suggestions about events, talks and sessions that you might like to attend.
Orientation and Networking Events
I have never been to an orientation or networking event at the APS meeting, but then again I did not go to the APS meeting as a student. Networking is one of the best things you can do at the meeting, so do take any opportunities to meet and talk to people.
The student lunch with the Experts is especially worth it because you get a one-on-eight meeting with a physicist who works on a topic you are interested in. You also get a free lunch. Spaces are limited, so you need to sign up for it on the Sunday, and early if you want to get your choice of expert.
Generally speaking, food is very expensive in the convention center. Therefore, the more places you can get free food the better. There are networking events, some of which are aimed at students and some of which have free meals. Other good bets for free food include the receptions and business meetings. (With a business meeting you may have to first sit through a boring administrative meeting for an APS unit, but at least the DQI meeting will feature me talking about The Quantum Times.)
Sessions Chaired by Chapman Faculty
The next few sections highlight talks and sessions that involve people at Chapman. You may want to come to these not only to support local people, but also to find out more about areas of research that you might want to do undergraduate research projects in.
The following sessions are being chaired by Chapman faculty. The chair does not give a talk during the session, but acts as a host. But chairs usually work in the areas that the session is about, so it is a good way to get more of an overview of things they are interested in.
Talks involving Chapman Faculty, Postdocs and Students
The talks listed below all have someone who is currently affiliated with Chapman as one or more of the authors. The Chapman person is not necessarily the person giving the talk.
The people giving the talks, especially if they are students or postdocs, would appreciate your support. It is also a good way of finding out more about research that is going on at Chapman.
Posters involving Chapman Faculty, Postdocs and Students
Poster sessions last longer than talks, so you can view the posters at your leisure. The presenter is supposed to stand by their poster and talk to people who come to see it. The following posters are being presented by Chapman undergraduates. Please drop by and support them.
Thursday March 20, 10:00am – 1:00pm, Anaheim Convention Center, Exhibit Hall A
These are sessions that reflect my own interests. It is a good bet that you will find me at one of these, unless I am teaching, or someone I know is speaking somewhere else. There are multiple sessions at the same time, but what I will typically do is select the one that has the most interesting looking talk at the time and switch sessions from time to time or take a break from sessions entirely if I get bored.
It is worthwhile to spend some time in the exhibit hall. It features a Careers Fair and a Grad School Fair, which will be larger and more relevant to physics students than other such fairs you might attend in the area.
But, of course, the main purpose of going to the exhibition hall is to acquire SWAG. Some free items I have obtained from past APS exhibit halls include:
Rubik’s cubes
Balls that light up when you bounce them
Yo-Yos
Wooden model airplanes
Snacks
T-shirts
Tote bags
Enough stationery items to last for the rest of your degree
Free magazines and journals
Free or heavily discounted books
I recommend going when the hall first opens to get the highest quality SWAG.
Fun Stuff
Other fun stuff to do at this year’s meeting includes:
QuantumFest: This starts with the Quantum Jubilee event on Saturday, but there are events all week, some of which require you to be registered for the meeting. Definitely reserve a spot for the LabEscape escape room. I have done one of their rooms before and it is fun.
Physics Rock-n-Roll Singalong: A very nerdy APS meeting tradition. Worth attending once in your life. Probably only once though.
Vjeko Kovac and I have just uploaded to the arXiv our paper “On several irrationality problems for Ahmes series“. This paper resolves (or at least makes partial progress on) some open questions of Erdős and others on the irrationality of Ahmes series, which are infinite series of the form ∑_k 1/a_k for some increasing sequence a_k of natural numbers. Of course, since most real numbers are irrational, one expects such series to “generically” be irrational, and we make this intuition precise (in both a probabilistic sense and a Baire category sense) in our paper. However, it is often difficult to establish the irrationality of any specific series. For example, it is already a non-trivial result of Erdős that the series ∑_k 1/(2^k − 1) is irrational, while the irrationality of a second such series (equivalent to Erdős problem #69) remains open, although very recently Pratt established this conditionally on the Hardy–Littlewood prime tuples conjecture. Finally, the irrationality of a third series (Erdős problem #68) is completely open.
On the other hand, it has long been known that if the sequence a_k grows faster than C^{2^k} for every fixed C, then the Ahmes series ∑_k 1/a_k is necessarily irrational, basically because the fractional parts of a_1 a_2 ⋯ a_k ∑_n 1/a_n can be arbitrarily small positive quantities, which is inconsistent with ∑_n 1/a_n being rational. This growth rate is sharp, as can be seen by iterating the identity 1/n = 1/(n+1) + 1/(n(n+1)) to obtain a rational Ahmes series of growth rate C^{2^k} for any fixed C > 1.
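As a quick sanity check of that sharpness claim (my own sketch, assuming the splitting identity just quoted), iterating the identity starting from 1/2 produces a rational Ahmes series whose denominators roughly square at each step:

    from fractions import Fraction

    # iterate 1/n = 1/(n+1) + 1/(n*(n+1)) starting from 1/2: keep the term 1/(n+1)
    # and split the leftover 1/(n*(n+1)) at the next step
    n = 2
    terms = []
    for _ in range(6):
        terms.append(n + 1)
        n = n * (n + 1)
    print(terms)                                                  # 3, 7, 43, 1807, 3263443, ...
    print(sum(Fraction(1, t) for t in terms) + Fraction(1, n))    # exactly 1/2 at every stage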
In our paper we show that if a_k grows somewhat slower than the above doubly exponential sequences, in a suitable quantitative sense, then one can find a comparable sequence b_k for which ∑_k 1/b_k is rational. This partially addresses Erdős problem #263, which asked if one particular sequence had this property, and whether any sequence of exponential or slower growth (but with ∑_k 1/a_k convergent) had this property. Unfortunately we barely miss a full solution of both parts of the problem, since the condition we need just fails to cover that particular case, and also does not quite hold for all sequences going to infinity at an exponential or slower rate.
We also show the following variant: if a_k grows exponentially, in a suitable sense, with ∑_k 1/a_k convergent, then there exist nearby natural numbers b_k such that ∑_k 1/b_k is rational. This answers the first part of Erdős problem #264, although the second part is slightly out of reach of our methods. Indeed, we show that the exponential growth hypothesis is best possible, in the sense that a random sequence that grows faster than exponentially will not have this property; this result does not address any specific superexponential sequence, although it does apply to some sequences of a similar shape.
Our methods can also handle higher dimensional variants in which multiple series are simultaneously set to be rational. Perhaps the most striking result is this: we can find an increasing sequence $a_k$ of natural numbers with the property that $\sum_{k=1}^\infty \frac{1}{a_k + t}$ is rational for every rational $t$ (excluding the cases $t = -a_k$, to avoid division by zero)! This answers (in the negative) a question of Stolarsky (Erdős problem #266), and also reproves Erdős problem #265 (and in the latter case one can even make $a_k$ grow double exponentially fast).
Our methods are elementary and avoid any number-theoretic considerations, relying primarily on the countable dense nature of the rationals and an iterative approximation technique. The first observation is that the task of representing a given number $q$ as an Ahmes series $\sum_{k=1}^\infty \frac{1}{a_k}$ with each $a_k$ lying in some interval $I_k$ (with the $I_k$ disjoint, and going to infinity fast enough to ensure convergence of the series), is possible if and only if the infinite sumset

$\displaystyle \frac{1}{I_1} + \frac{1}{I_2} + \cdots$

contains $q$, where $\frac{1}{I_k} := \{ \frac{1}{a}: a \in I_k \}$. More generally, to represent a tuple of numbers $(q_t)_{t \in T}$ indexed by some set $T$ of numbers simultaneously as $\sum_{k=1}^\infty \frac{1}{a_k+t}$ with $a_k \in I_k$, this is the same as asking for the infinite sumset

$\displaystyle E_1 + E_2 + \cdots$

to contain $(q_t)_{t \in T}$, where now

$\displaystyle E_k := \left\{ \left(\frac{1}{a+t}\right)_{t \in T}: a \in I_k \right\}.$
So the main problem is to get control on such infinite sumsets. Here we use a very simple observation:
Proposition 1 (Iterative approximation) Let $V$ be a Banach space, and let $E_1, E_2, \dots$ be sets with each $E_k$ contained in the ball of radius $r_k > 0$ around the origin for some $r_k$ with $\sum_k r_k$ convergent, so that the infinite sumset $E_1 + E_2 + \cdots$ is well-defined. Suppose that one has some convergent series $\sum_{k=1}^\infty v_k$ in $V$, and sets $B_1, B_2, \dots$ converging in norm to zero, such that

$\displaystyle v_k + B_k \subset E_k + B_{k+1} \ \ \ \ \ (2)$

for all $k \geq 1$. Then the infinite sumset $E_1 + E_2 + \cdots$ contains $\sum_{k=1}^\infty v_k + B_1$.
Informally, the condition (2) asserts that $E_k$ occupies all of $v_k + B_k$ “at the scale $B_{k+1}$”.
Proof: Let $t_1 \in B_1$. Our task is to express $\sum_{k=1}^\infty v_k + t_1$ as a series $\sum_{k=1}^\infty e_k$ with $e_k \in E_k$. From (2) we may write

$\displaystyle v_1 + t_1 = e_1 + t_2$

for some $e_1 \in E_1$ and $t_2 \in B_2$. Iterating this, we may find $e_k \in E_k$ and $t_{k+1} \in B_{k+1}$ such that

$\displaystyle v_k + t_k = e_k + t_{k+1}$

for all $k$. Summing these identities and sending $k \rightarrow \infty$ (noting that $t_{k+1}$ goes to zero), we obtain

$\displaystyle \sum_{k=1}^\infty v_k + t_1 = \sum_{k=1}^\infty e_k$
as required.
In one dimension, sets of the form $\frac{1}{I_k}$ are dense enough that the condition (2) can be satisfied in a large number of situations, leading to most of our one-dimensional results. In higher dimensions, the sets $E_k$ lie on curves in a high-dimensional space, and so do not directly obey usable inclusions of the form (2); however, for suitable choices of intervals $I_k$, one can take finite sums $E_{k+1} + \cdots + E_{k+d}$ which become dense enough to obtain usable inclusions of the form (2) once $d$ reaches the dimension of the ambient space, basically thanks to the inverse function theorem (and the non-vanishing curvature of the curve in question). For the Stolarsky problem, which is an infinite-dimensional problem, it turns out that one can modify this approach by letting $d$ grow slowly to infinity with $k$.
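To get a feel for the one-dimensional iterative approximation, here is a short Python sketch (my own toy illustration under simplifying assumptions, not the construction from the paper). It writes a target rational $q$ as an Ahmes series with the $k$-th denominator constrained to the illustrative interval $I_k = [2^k, 2^{k+1}]$, by keeping the running remainder inside the window of values the remaining tail can still attain:

from fractions import Fraction
from math import ceil

q = Fraction(2, 3)     # target value; with the intervals below anything in [1/2, 1] works
r = q                  # remainder still to be written as a sum of reciprocals
terms = []
for k in range(1, 26):
    # The terms still to be chosen (a_{k+1}, a_{k+2}, ...) will contribute a total
    # lying between lo_tail = sum_{j>k} 1/2^(j+1) and hi_tail = sum_{j>k} 1/2^j.
    lo_tail = Fraction(1, 2 ** (k + 1))
    hi_tail = Fraction(1, 2 ** k)
    # Pick a_k in I_k = [2^k, 2^(k+1)] so that the new remainder r - 1/a_k lands
    # back inside [lo_tail, hi_tail]; this is possible because the reciprocals
    # 1/a with a in I_k are spaced about 4^(-k) apart, much finer than the window.
    a = ceil(1 / (r - lo_tail))
    a = min(max(a, 2 ** k), 2 ** (k + 1))
    terms.append(a)
    r -= Fraction(1, a)
    assert lo_tail <= r <= hi_tail, "remainder escaped the feasible window"

print(terms[:6])    # the first few chosen denominators, one from each I_k
print(float(r))     # remainder after 25 steps: at most 2**-25

This is, roughly, the kind of bookkeeping that Proposition 1 packages abstractly, with the shrinking windows playing the role of the sets $B_k$.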
I’ve worked on topological quantum computation, one of Alexei Kitaev’s brilliant innovations, for around 15 years now. It’s hard to find a more beautiful physics problem, combining spectacular quantum phenomena (non-Abelian anyons) with the promise of transformative technological advances (inherently fault-tolerant quantum computing hardware). Problems offering that sort of combination originally inspired me to explore quantum matter as a graduate student.
Non-Abelian anyons are emergent particles born within certain exotic phases of matter. Their utility for quantum information descends from three deeply related defining features:
Nucleating a collection of well-separated non-Abelian anyons within a host platform generates a set of quantum states with the same energy (at least to an excellent approximation). Local measurements give one essentially no information about which of those quantum states the system populates—i.e., any evidence of what the system is doing is hidden from the observer and, crucially, the environment. In turn, qubits encoded in that space enjoy intrinsic resilience against local environmental perturbations.
Swapping the positions of non-Abelian anyons manipulates the state of the qubits. Swaps can be enacted either by moving anyons around each other as in a shell game, or by performing a sequence of measurements that yields the same effect. Exquisitely precise qubit operations follow, depending only on which pairs the user swaps and in what order (a toy numerical sketch of this exchange-order dependence appears after this list). Properties (1) and (2) together imply that non-Abelian anyons offer a pathway both to fault-tolerant storage and manipulation of quantum information.
A pair of non-Abelian anyons brought together can “fuse” into multiple different kinds of particles, for instance a boson or a fermion. Detecting the outcome of such a fusion process provides a method for reading out the qubit states that are otherwise hidden when all the anyons are mutually well-separated. Alternatively, non-local measurements (e.g., interferometry) can effectively fuse even well-separated anyons, thus also enabling qubit readout.
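To make property (2) concrete, here is a minimal numerical sketch, my own illustration rather than anything from the post. It uses Ising anyons, one candidate anyon type, whose exchanges can be modeled with Majorana operators; the two-qubit Jordan–Wigner representation, the operator names, and the braid-phase convention below are choices of convenience:

import numpy as np

# Pauli matrices and a helper for tensor products
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron(*ops):
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Four Majorana operators (Jordan-Wigner representation on two qubits); four
# well-separated Majoranas are the simplest caricature of a one-qubit anyon device.
g1, g2, g3, g4 = kron(X, I2), kron(Y, I2), kron(Z, X), kron(Z, Y)

def braid(ga, gb):
    # Exchanging the anyons carrying ga and gb implements exp(pi/4 * gb ga);
    # since (gb ga)^2 = -1, this equals (1 + gb ga)/sqrt(2).
    return (np.eye(4) + gb @ ga) / np.sqrt(2)

U12, U23 = braid(g1, g2), braid(g2, g3)

# Non-Abelian statistics: the order of the two exchanges matters.
print(np.allclose(U12 @ U23, U23 @ U12))             # False

# The fermion parity i*g1*g2 plays the role of a logical Z axis (up to sign
# convention).  Exchanging anyons 1 and 2 preserves it; exchanging 2 and 3
# rotates the encoded qubit to a different axis.
Zlog = 1j * g1 @ g2
print(np.allclose(U12 @ Zlog @ U12.conj().T, Zlog))  # True
print(np.allclose(U23 @ Zlog @ U23.conj().T, Zlog))  # False

The point of the sketch is only that the resulting unitaries depend on which pair is exchanged and in what order, not on the microscopic details of how the exchange is carried out.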
I entered the field back in 2009 during the last year of my postdoc. Topological quantum computing—once confined largely to the quantum Hall realm—was then in the early stages of a renaissance driven by an explosion of new candidate platforms as well as measurement and manipulation schemes that promised to deliver long-sought control over non-Abelian anyons. The years that followed were phenomenally exciting, with broadly held palpable enthusiasm for near-term prospects not yet tempered by the practical challenges that would eventually rear their head.
A PhD comics cartoon on non-Abelian anyons from 2014.
In 2018, near the height of my optimism, I gave an informal blackboard talk in which I speculated on a new kind of forthcoming NISQ era defined by the birth of a Noisy Individual Semi-topological Qubit. To less blatantly rip off John Preskill’s famous acronym, I also—jokingly of course—proposed the alternative nomenclature POST-Q (Piece Of S*** Topological Qubit) era to describe the advent of such a device. The rationale behind those playfully sardonic labels is that the inaugural topological qubit would almost certainly be far from ideal, just as the original transistor appears shockingly crude when compared to modern electronics. You always have to start somewhere. But what does it mean to actually create a topological qubit, and how do you tell that you’ve succeeded—especially given likely POST-Q-era performance?
To my knowledge those questions admit no widely accepted answers, despite implications for both quantum science and society. I would like to propose defining an elementary topological qubit as follows:
A device that leverages non-Abelian anyons to demonstrably encode and manipulate a single qubit in a topologically protected fashion.
Some of the above words warrant elaboration. As alluded to above, non-Abelian anyons can passively encode quantum information—a capability that by itself furnishes a quantum memory. That’s the “encode” part. The “manipulate” criterion additionally entails exploiting another aspect of what makes non-Abelian anyons special—their behavior under swaps—to enact gate operations. Both the encoding and manipulation should benefit from intrinsic fault-tolerance, hence the “topologically protected fashion” qualifier. And very importantly, these features should be “demonstrably” verified. For instance, creating a device hosting the requisite number of anyons needed to define a qubit does not guarantee the all-important property of topological protection. Hurdles can still arise, among them: if the anyons are not sufficiently well-separated, then the qubit states will lack the coveted immunity from environmental perturbations; thermal and/or non-equilibrium effects might still induce significant errors (e.g., by exciting the system into other unwanted states); and measurements—for readout and possibly also manipulation—may lack the fidelity required to fruitfully exploit topological protection even if present in the qubit states themselves.
The preceding discussion raises a natural follow-up question: How do you verify topological protection in practice? One way forward involves probing qubit lifetimes, and fidelities of gates resulting from anyon swaps, upon varying some global control knob like magnetic field or gate voltage. As the system moves deeper into the phase of matter hosting non-Abelian anyons, both the lifetime and gate fidelities ought to improve dramatically—reflecting the onset of bona fide topological protection. First-generation “semi-topological” devices will probably fare modestly at best, though one can at least hope to recover general trends in line with this expectation.
By the above proposed definition, which I contend is stringent yet reasonable, realization of a topological qubit remains an ongoing effort. Fortunately the journey to that end offers many significant science and engineering milestones worth celebrating in their own right. Examples include:
Platform verification. This most indirect milestone evidences the formation of a non-Abelian phase of matter through (thermal or charge) Hall conductance measurements, detection of some anticipated quantum phase transition, etc.
Detection of non-Abelian anyons. This step could involve conductance, heat capacity, magnetization, or other types of measurements designed to support the emergence of either individual anyons or a collection of anyons. Notably, such techniques need not reveal the precise quantum state encoded by the anyons—which presents a subtler challenge.
Establishing readout capabilities. Here one would demonstrate experimental techniques, interferometry for example, that in principle can address that key challenge of quantum state readout, even if not directly applied yet to a system hosting non-Abelian anyons.
Fusion protocols. Readout capabilities open the door to more direct tests of the hallmark behavior predicted for a putative topological qubit. One fascinating experiment involves protocols that directly test non-Abelian anyon fusion properties. Successful implementation would solidify readout capabilities applied to an actual candidate topological qubit device.
Probing qubit lifetimes. Fusion protocols further pave the way to measuring the qubit coherence times, e.g., T1 and T2—addressing directly the extent of topological protection of the states generated by non-Abelian anyons. Behavior clearly conforming to the trends highlighted above could certify the device as a topological quantum memory. (Personally, I most anxiously await this milestone.)
Fault-tolerant gates from anyon swaps. Likely the most advanced milestone, successfully implementing anyon swaps, again with appropriate trends in gate fidelity, would establish the final component of an elementary topological qubit.
Most experiments to date focus on the first two items above, platform verification and anyon detection. Microsoft’s recent Nature paper, together with the simultaneous announcement of supplementary new results, combines efforts in those areas with experiments aiming to establish interferometric readout capabilities needed for a topological qubit. Fusion, (idle) qubit lifetime measurements, and anyon swaps have yet to be demonstrated in any candidate topological quantum computing platform, but at least partially feature in Microsoft’s future roadmap. It will be fascinating to see how that effort evolves, especially given the aggressive timescales predicted by Microsoft for useful topological quantum hardware. Public reactions so far range from cautious optimism to ardent skepticism; data will hopefully settle the situation one way or another in the near future. My own take is that while Microsoft’s progress towards qubit readout is a welcome advance that has value regardless of the nature of the system to which those techniques are currently applied, convincing evidence of topological protection may still be far off.
In the meantime, I maintain the steadfast conviction that topological qubits are most certainly worth pursuing—in a broad range of platforms. Non-Abelian quantum Hall states seem to be resurgent candidates, and should not be discounted. Moreover, the advent of ultra-pure, highly tunable 2D materials provides new settings in which one can envision engineering non-Abelian anyon devices with complementary advantages (and disadvantages) compared to previously explored settings. Other less obvious contenders may also rise at some point. The prospect of discovering new emergent phenomena mitigating the need for quantum error correction warrants continued effort with an open mind.
This week’s lectures on instantons in my gauge theory class (a very important kind of theory for understanding many phenomena in nature – light is an example of a phenomenon that is described by gauge theory) were a lot of fun to do, and mark the culmination of a month-long … Click to continue reading this post →
I had this neat calculation in my drawer, and on the occasion of quantum mechanics' 100th birthday in 2025, I decided to submit a talk about it to the March meeting of the DPG, the German Physical Society, in Göttingen. And to have something to show, I put it out on the arXiv today. The idea is as follows:
The GHZ experiment is a beautiful version of Bell's inequality that demonstrates that you come to wrong conclusions when you assume that a property of a quantum system has to have some (unknown) value even when you don't measure it. I would say it shows that quantum theory is not realistic, in the sense that unmeasured properties do not have secret values (different, for example, from classical statistical mechanics, where you could imagine actually measuring the exact position of molecule number 2342 in your container of gas). For details, see the paper or this beautiful explanation by Coleman. I should mention here that there is another way out, by assuming some non-local forces that conspire to make the result come out right nevertheless.
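For readers who want to see the arithmetic behind that statement, here is a small numerical check (my own, not from the paper): the GHZ state is a simultaneous eigenstate of the four operators XXX, XYY, YXY, YYX, and no assignment of pre-existing values ±1 to the six local observables reproduces all four eigenvalues.

import numpy as np
from itertools import product

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

# GHZ state (|000> + |111>)/sqrt(2)
ghz = np.zeros(8, dtype=complex)
ghz[0] = ghz[7] = 1 / np.sqrt(2)

# Quantum predictions: eigenvalue +1 for XXX and -1 for XYY, YXY, YYX
for ops, label in [((X, X, X), "XXX"), ((X, Y, Y), "XYY"),
                   ((Y, X, Y), "YXY"), ((Y, Y, X), "YYX")]:
    val = np.vdot(ghz, kron3(*ops) @ ghz).real
    print(label, round(val))

# A "realistic" description would assign definite values x_i, y_i = +/-1 to each
# particle's two observables.  Brute force shows no assignment matches all four
# quantum predictions (the product of the last three conditions forces x1*x2*x3 = -1).
ok = any(x1 * x2 * x3 == +1 and x1 * y2 * y3 == -1 and
         y1 * x2 * y3 == -1 and y1 * y2 * x3 == -1
         for x1, x2, x3, y1, y2, y3 in product([+1, -1], repeat=6))
print("realistic assignment exists:", ok)    # False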
On the other hand, there is Bohmian mechanics. This is well known to be a non-local theory (as the time evolution of its particles depends on the positions of all other particles in the system, or even the universe), but what I find more interesting is that it is also realistic: there, it is claimed that all that matters are particle positions (including the positions of pointers on your measurement devices, which you might interpret as showing something other than positions, for example velocities or field strengths or whatever), and those all have (possibly unknown) values at all times, even if you don't measure them.
So how can the two be brought together? There might be an obstacle in the fact that GHZ is usually presented as a correlation of spins, and in the Bohmian literature spins are not really positions; you always have to use some Stern-Gerlach experiment to translate them into actual positions. But we can circumvent this the other way around: we don't really need spins, we just need observables obeying the commutation relations of the Pauli matrices. You might think that those cannot be realised with position measurements, as positions always commute, but this is only true if you do the position measurements at equal times. If you wait between them, you can in fact obtain almost Pauli-type operators.
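A quick way to see why waiting helps (my own illustrative aside, using the simplest case of a free particle rather than the boxes considered below): in the Heisenberg picture x(t) = x(0) + p t/m, so [x(0), x(t)] = i ħ t/m is non-zero, and coarse-grained functions of position such as sign(x(0)) and sign(x(t)) therefore need not commute either; with suitable states and waiting times they can behave approximately like Pauli operators.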
So we can set up a GHZ experiment in terms of three particles in three boxes: for each particle you measure whether it is in the left or the right half of its box, but for each particle you decide whether to do this at time 0 or at a later moment. You can look at the correlation of the three measurements as a function of time (of course, since you measure different particles, the actual measurements you do still commute, independently of the timing), and what you find is the blue line in the figure below:
GHZ correlations vs. Bohmian correlations
You can also (numerically) solve the Bohmian equation of motion and compute the expectation of the correlation of positions of the three particles at different times which gives the orange line, clearly something else. No surprise, the realistic theory cannot predict the outcome of an experiment that demonstrates that quantum theory is not realistic. And the non-local character of the evolution equation does not help either.
To save the Bohmian theory, one can in fact argue that I have computed the wrong thing: after measuring the position of one particle at time 0, or letting it interact with a measuring device, the future time evolution of all particles is affected, and one should compute the correlation with the corrected (effectively collapsed) wave function. That, however, I cannot do, and I claim it is impossible, since it would depend on the details of how the first particle's position is actually measured (whereas the orthodox prediction above is independent of those details, as those interactions commute with the later observations). In any case, at least my interpretation is that if you don't want to predict the correlation wrongly, the best you can do is to say that you cannot do the calculation, as it depends on unknown details (even though the result of course shouldn't).
In any case, the standard argument why Bohmian mechanics is indistinguishable from more conventional treatments is that all that matters are position correlations and since those are given by psi-squared they are the same for all approaches. But I show this is not the case for these multi-time correlations.
Post script: What happens when you try to discuss physics with a philosopher:
After living in Japan for about four months, we left in mid-December. We miss it already.
One of the pleasures we discovered is the onsen, or hot spring. The word originally referred to the natural volcanic springs themselves, and the villages around them, but there are now onsens all over Japan. Many hotels have an onsen, and most towns will have several. Some people still use them as their primary bath and shower for keeping clean. (Outside of actual volcanic locations, these are technically sento rather than onsen.) You don’t actually wash yourself in the hot baths themselves; they are just for soaking, and there are often several, at different temperatures and mineral contents, in indoor and outdoor locations, with whirlpools and even “electric baths” with muscle-stimulating currents. For actual cleaning, there is a bank of hand showers, usually with soap and shampoo. Some can be very basic, some much more like a posh spa, with massages, saunas, and a restaurant.
Our favourite, about 25 minutes away by bicycle, was Kirari Onsen Tsukuba. When not traveling, we tried to go every weekend, spending a day soaking in the hot water, eating the good food, staring at the gardens, snacking on Hokkaido soft cream — possibly the best soft-serve ice cream in the world (sorry, Carvel!), and just enjoying the quiet and peace. Even our seven- and nine-year old girls have found the onsen spirit, calming and quieting themselves down for at least a few hours.
Living in Tsukuba, lovely but not a common tourist destination, although with plenty of foreigners due to the constellation of laboratories and universities, we were often one of only one or two western families in our local onsen. It sometimes takes Americans (and those from other buttoned-up cultures) some time to get used to the sex-segregated but fully-naked policies of the baths themselves. The communal areas, however, are mixed, and fully-clothed. In fact, many hotels and fancier onsen facilities supply a jinbei, a short-sleeve pyjama set in which you can softly pad around the premises during your stay. (I enjoyed wearing jinbei so much that I purchased a lightweight cotton set for home, and am also trying to get my hands on samue, a somewhat heavier style of traditional Japanese clothing.)
And my newfound love for the onsen is another reason not to get a tattoo beyond the sagging flesh and embarrassment of my future self: in Japan, tattoos are often a symbol of the yakuza, and are strictly forbidden in the onsen, even for foreigners.
Later in our sabbatical, we will be living in the Netherlands, which also has a good public bath culture, but it will be hard to match the calm of the Japanese onsen.
Thanks to everyone who made all those kind remarks in various places last month after my mother died. I've not responded individually (I did not have the strength) but I did read them all and they were deeply appreciated. Yesterday would’ve been mum’s 93rd birthday. A little side-note occurred to me the other day: Since she left us a month ago, she was just short of having seen two perfect square years. (This year and 1936.) Anyway, still on the theme of playing with numbers, my siblings and I agreed that as a tribute to her on the day, we would all do some kind of outdoor activity for 93 minutes. Over in London, my brother and sister did a joint (probably chilly) walk together in Regents Park and surrounds. I decided to take out a piece of the afternoon at low tide and run along the beach. It went pretty well, [...] Click to continue reading this post →
As a French citizen I should probably disavow the following post and remind myself that I have access to some of the best food in the world. Yet it's impossible to forget the tastes of your childhood. And indeed there are lots of British things that are difficult or very expensive to get hold of in France. Some of them (Marmite, Branston pickle ...) I can import via occasional trips across the channel, or in the luggage of visiting relatives. However, since Brexit this no longer works for fresh food like bacon and sausages. This is probably a good thing for my health, but every now and then I get a hankering for a fry-up or a bacon butty, and as a result of their rarity these are amongst the favourite breakfasts of my kids too. So I've learnt how to make bacon and sausages (it turns out that boudin noir is excellent with a fry-up and I even prefer it to black pudding).
Sausages are fairly labour-intensive, but after about an hour or so's work it's possible to make one or two kilos worth. Back bacon, on the other hand, takes three weeks to make one batch, and I thought I'd share the process here.
1. Cut of meat
The first thing is to get the right piece of pork, since animals are divided up differently in different countries. I've made bacon several times now and keep forgetting which instructions I previously gave to the butcher at my local Grand Frais ... Now I have settled on asking for a carré de porc, and when they (nearly always) tell me that they don't have that in, I ask for côtes de porc première in one whole piece, and try to get them to give me a couple of kilos. As you can find on Wikipedia, I need the same piece of meat used to make pork chops. I then ask them to remove the spine, but it should still have the ribs. So I start with this:
2. Cure
Next the meat has to be cured for 10 days (I essentially follow the River Cottage recipe). I mix up a 50-50 batch of PDV salt and brown sugar (1 kg in total here), and add some pepper, juniper berries and bay leaves:
Notice that this doesn't include any nitrites or nitrates. I have found that nitrates/nitrites are essential for the flavour in sausages, but in bacon the only thing that they will do (other than be a carcinogen) as far as I can tell is make the meat stay pink when you cook it. I can live without that. This cure makes delicious bacon as far as I'm concerned.
The curing process involves applying 1/10th of the mixture each day for ten days and draining off the liquid produced at each step. After the first coating it looks like this:
The salt and sugar remove water from the meat, and penetrate into it, preserving it. Each day I get liquid at the bottom, which I drain off and apply the next cure. After one day it looks like this:
This time I still had liquid after 10 days:
3. Drying
After ten days, I wash/wipe off the cure and pat it down with some vinegar. If you leave cure on the meat it will be much too salty (and, to be honest, this cure always gives quite salty bacon). So at this point it looks like this:
I then cover the container with a muslin that has been doused with a bit more vinegar, and leave it in the fridge (at first) and then in the garage (since it's nice and cold this time of year) for ten days or so. This part removes extra moisture. It's possible that small amounts of white mould will appear during this stage, but these are totally benign: you only have to worry if it starts to smell or you get blue/black mould, and this has never happened to me so far.
4. Smoking
After the curing/drying, the bacon is ready to eat and should in principle keep almost indefinitely. However, I prefer smoked bacon, so I cold smoke it. This involves sticking it in a smoker (essentially just a box where you can suspend the meat above some smouldering sawdust) for several hours:
The sawdust is beech wood and slowly burns round in the little spiral device you can see above. Of course, I close the smoker up and usually put it in the shed to protect against the elements:
5. All done!
And then that's it! Delicious back bacon that really doesn't take very long to eat:
As I mentioned above, it's usually still a bit salty, so when I slice it to cook I put the pieces in water for a few minutes before grilling/frying:
Here you see that the colour is just like frying pork chops ... but the flavour is exactly right!
I've been very quiet here over the last couple of weeks. My mother, Delia Maria Johnson, who had been in hospital since the 5th of November or so, took a turn for the worse and began a rapid decline. She died peacefully after some days, and to be honest I’ve really not been myself since then.
There's an extra element to the sense of loss when (as it approaches) you are powerless to do anything because of being thousands of miles away. On the plus side, because of the ease of using video calls, and with the help of my sister being there, I was able to be somewhat present during what turned out to be the last moments when she was aware of people around her, and therefore was able to tell her I loved her one last time.
Rather than charging across the world on planes, trains, and in automobiles, probably being out of reach during any significant changes in the situation (the doctors said I would likely not make it in time) I did a number of things locally that I am glad I got to do.
It began with visiting (and sending a photo from) the Santa Barbara mission, a place she dearly loved and was unable to visit again after 2019, along with the pier. These are both places we walked together so much back when I first lived here in what feels like another life.
Then, two nights before mum passed away, but well after she’d seemed already beyond reach of anyone, although perhaps (I’d like to think) still able to hear things, my sister contacted me from her bedside asking if I’d like to read mum a psalm, perhaps one of her favourites, 23 or 91. At first I thought she was already planning the funeral, and expressed my surprise at this since mum was still alive and right next to her. But I’d misunderstood, and she’d in fact had a rather great idea. This suggestion turned into several hours of, having sent on recordings of the two psalms, my digging into the poetry shelf in the study and discovering long neglected collections through which I searched (sometimes accompanied by my wife and son) for additional things to read. I recorded some and sent them along, as well as one from my son, I’m delighted to say. Later, the whole thing turned into me singing various songs while playing my guitar and sending recordings of those along too.
Incidentally, the guitar-playing was an interesting turn of events since not many months ago I decided after a long lapse to start playing guitar again, and try to move the standard of my playing (for vocal accompaniment) to a higher level than I’d previously done, by playing and practicing for a little bit on a regular basis. I distinctly recall thinking at one point during one practice that it would be nice to play for mum, although I did not imagine that playing to her while she was on her actual death-bed would be the circumstance under which I’d eventually play for her, having (to my memory) never directly done so back when I used to play guitar in my youth. (Her overhearing me picking out bits of Queen songs behind my room door when I was a teenager doesn’t count as direct playing for her.)