Planet Musings

April 19, 2024

Terence Tao Two announcements: AI for Math resources, and erdosproblems.com

This post contains two unrelated announcements. Firstly, I would like to promote a useful list of resources for AI in Mathematics, that was initiated by Talia Ringer (with the crowdsourced assistance of many others) during the National Academies workshop on “AI in mathematical reasoning” last year. This list is now accepting new contributions, updates, or corrections; please feel free to submit them directly to the list (which I am helping Talia to edit). Incidentally, next week there will be a followup webinar to the aforementioned workshop, building on the topics covered there.

Secondly, I would like to advertise the website erdosproblems.com, recently launched by Thomas Bloom. This is intended to be a living repository of the many mathematical problems proposed in various venues by Paul Erdős, who was particularly noted for his influential posing of such problems. For a tour of the site and an explanation of its purpose, I can recommend Thomas’s recent talk on this topic at a conference last week in honor of Timothy Gowers.

Thomas is currently issuing a call for help to develop the website in a number of ways (quoting directly from that page):

  • You know Github and could set a suitable project up to allow people to contribute new problems (and corrections to old ones) to the database, and could help me maintain the Github project;
  • You know things about web design and have suggestions for how this website could look or perform better;
  • You know things about Python/Flask/HTML/SQL/whatever and want to help me code cool new features on the website;
  • You know about accessibility and have an idea how I can make this website more accessible (to any group of people);
  • You are a mathematician who has thought about some of the problems here and wants to write an expanded commentary for one of them, with lots of references, comparisons to other problems, and other miscellaneous insights (mathematician here is interpreted broadly, in that if you have thought about the problems on this site and are willing to write such a commentary you qualify);
  • You knew Erdős and have any memories or personal correspondence concerning a particular problem;
  • You have solved an Erdős problem and I’ll update the website accordingly (and apologies if you solved this problem some time ago);
  • You have spotted a mistake, typo, or duplicate problem, or anything else that has confused you and I’ll correct things;
  • You are a human being with an internet connection and want to volunteer a particular Erdős paper or problem list to go through and add new problems from (please let me know before you start, to avoid duplicate efforts);
  • You have any other ideas or suggestions – there are probably lots of things I haven’t thought of, both in ways this site can be made better, and also what else could be done from this project. Please get in touch with any ideas!

I for instance contributed a problem to the site (#587) that Erdős himself gave to me personally (this was the topic of a somewhat well known photo of Paul and myself, and which he communicated again to me shortly afterwards on a postcard; links to both images can be found by following the above link). As it turns out, this particular problem was essentially solved in 2010 by Nguyen and Vu.

(Incidentally, I also spoke at the same conference that Thomas spoke at, on my recent work with Gowers, Green, and Manners; here is the video of my talk, and here are my slides.)

Scott Aaronson That IACR preprint

Update (April 19): Apparently a bug has been found, and the author has withdrawn the claim (see the comments).

For those who don’t yet know from their other social media: a week ago the cryptographer Yilei Chen posted a preprint, claiming to give a polynomial-time quantum algorithm to solve lattice problems. For example, it claims to solve the GapSVP problem, which asks to approximate the length of the shortest nonzero vector in a given n-dimensional lattice, to within an approximation ratio of ~n^4.5. The best approximation ratio previously known to be achievable in classical or quantum polynomial time was exponential in n.

If it’s correct, this is an extremely big deal. It doesn’t quite break the main lattice-based cryptosystems, but it would put those cryptosystems into a precarious position, vulnerable to a mere further polynomial improvement in the approximation factor. And, as we learned from the recent NIST competition, if the lattice-based and LWE-based systems were to fall, then we really don’t have many great candidates left for post-quantum public-key cryptography! On top of that, a full quantum break of LWE (which, again, Chen is not claiming) would lay waste (in a world with scalable QCs, of course) to a large fraction of the beautiful sandcastles that classical and quantum cryptographers have built up over the last couple decades—everything from Fully Homomorphic Encryption schemes, to Mahadev’s protocol for proving the output of any quantum computation to a classical skeptic.

So on the one hand, this would substantially enlarge the scope of exponential quantum speedups beyond what we knew a week ago: yet more reason to try to build scalable QCs! But on the other hand, it could also fuel an argument for coordinating to slow down the race to scalable fault-tolerant QCs, until the world can get its cryptographic house into better order. (Of course, as we’ve seen with the many proposals to slow down AI scaling, this might or might not be possible.)

So then, is the paper correct? I don’t know. It’s very obviously a serious effort by a serious researcher, a world away from the P=NP proofs that fill my inbox every day. But it might fail anyway. I’ve asked the world experts in quantum algorithms for lattice problems, and they’ve been looking at it, and none of them is ready yet to render a verdict. The central difficulty is that the algorithm is convoluted, and involves new tools that seem to come from left field, including complex Gaussian functions, the windowed quantum Fourier transform, and Karst waves (whatever those are). The algorithm has 9 phases by the author’s count. In my own perusal, I haven’t yet extracted even a high-level intuition—I can’t tell any little story like for Shor’s algorithm, e.g. “first you reduce factoring to period-finding, then you solve period-finding by applying a Fourier transform to a vector of amplitudes.”

So, the main purpose of this post is simply to throw things open to commenters! I’m happy to provide a public clearinghouse for questions and comments about the preprint, if those studying it would like that. You can even embed LaTeX in your comments, as will probably be needed to get anywhere.

Unrelated Update: Connor Tabarrok and his friends just put a podcast with me up on YouTube, in which they interview me in my office at UT Austin about watermarking of large language models and other AI safety measures.

Matt von Hippel No Unmoved Movers

Economists must find academics confusing.

When investors put money in a company, they have some control over what that company does. They vote to decide a board, and the board votes to hire a CEO. If the company isn’t doing what the investors want, the board can fire the CEO, or the investors can vote in a new board. Everybody is incentivized to do what the people who gave the money want to happen. And usually, those people want the company to increase its profits, since most of those people are themselves companies with their own investors.

Academics are paid by universities and research centers, funded in the aggregate by governments and student tuition and endowments from donors. But individually, they’re also often funded by grants.

What grant-givers want is more ambiguous. The money comes in big lumps from governments and private foundations, which generally want something vague like “scientific progress”. The actual decisions about who gets the money are made by committees of senior scientists. These people aren’t experts in every topic, so they have to extrapolate, much as investors have to guess whether a new company will be profitable based on past experience. At their best, they use their deep familiarity with scientific research to judge which projects are most likely to work, and which have the most interesting payoffs. At their weakest, though, they stick with ideas they’ve heard of, things they know work because they’ve seen them work before. That, in a nutshell, is why mainstream research prevails: not because the mainstream wants to suppress alternatives, but because sometimes the only way to guess if something will work is raw familiarity.

(What “works” means is another question. The cynical answers are “publishes papers” or “gets citations”, but that’s a bit unfair: in Europe and the US, most funders know that these numbers don’t tell the whole story. The trivial answer is “achieves what you said it would”, but that can’t be the whole story, because some goals are more pointless than others. You might want the answer to be “benefits humanity”, but that’s almost impossible to judge. So in the end the answer is “sounds like good science”, which is vulnerable to all the fads you can imagine…but is pretty much our only option, regardless.)

So are academics incentivized to do what the grant committees want? Sort of.

Science never goes according to plan. Grant committees are made up of scientists, so they know that. So while many grants have a review process afterwards to see whether you achieved what you planned, they aren’t all that picky about it. If you can tell a good story, you can explain why you moved away from your original proposal. You can say the original idea inspired a new direction, or that it became clear that a new approach was necessary. I’ve done this with an EU grant, and they were fine with it.

Looking at this, you might imagine that an academic who’s a half-capable storyteller could get away with anything they wanted. Propose a fashionable project, work on what you actually care about, and tell a good story afterwards to avoid getting in trouble. As long as you’re not literally embezzling the money (like the guy who was paying himself rent out of his visitor funding), what could go wrong? You get the money without the incentives: you move the scientific world, and nobody gets to move you.

It’s not quite that easy, though.

Sabine Hossenfelder told herself she could do something like this. She got grants for fashionable topics she thought were pointless, and told herself she’d spend time on the side on the things she felt were actually important. Eventually, she realized she wasn’t actually doing the important things: the faddish research ended up taking all her time. Not able to get grants doing what she actually cared about (and being in one of those weird temporary European positions that only last until you run out of grants), she now has to make a living from her science popularization work.

I can’t speak for Hossenfelder, but I’ve also put some thought into how to choose what to research, about whether I could actually be an unmoved mover. A few things get in the way:

First, applying for grants doesn’t just take storytelling skills, it takes scientific knowledge. Grant committees aren’t experts in everything, but they usually send grants to be reviewed by much more appropriate experts. These experts will check if your grant makes sense. In order to make the grant make sense, you have to know enough about the faddish topic to propose something reasonable. You have to keep up with the fad. You have to spend time reading papers, and talking to people in the faddish subfield. This takes work, but also changes your motivation. If you spend time around people excited by an idea, you’ll either get excited too, or be too drained by the dissonance to get any work done.

Second, you can’t change things that much. You still need a plausible story as to how you got from where you are to where you are going.

Third, you need to be a plausible person to do the work. If the committee looks at your CV and sees that you’ve never actually worked on the faddish topic, they’re more likely to give a grant to someone who’s actually worked on it.

Fourth, you have to choose what to do when you hire people. If you never hire any postdocs or students working on the faddish topic, then it will be very obvious that you aren’t trying to research it. If you do hire them, then you’ll be surrounded by people who actually care about the fad, and want your help to understand how to work with it.

Ultimately, to avoid the grant committee’s incentives, you need a golden tongue and a heart of stone, and even then you’ll need to spend some time working on something you think is pointless.

Even if you don’t apply for grants, even if you have a real permanent position or even tenure, you still feel some of these pressures. You’re still surrounded by people who care about particular things, by students and postdocs who need grants and jobs and fellow professors who are confident the mainstream is the right path forward. It takes a lot of strength, and sometimes cruelty, to avoid bowing to that.

So despite the ambiguous rules and lack of oversight, academics still respond to incentives: they can’t just do whatever they feel like. They aren’t bound by shareholders, and they aren’t expected to make a profit. But ultimately, the things that do constrain them (expertise and cognitive load, social pressure and compassion for those they mentor) can be even stronger.

I suspect that those pressures dominate the private sector as well. My guess is that for all that companies think of themselves as trying to maximize profits, the all-too-human motivations we share are more powerful than any corporate governance structure or org chart. But I don’t know yet. Likely, I’ll find out soon.

n-Category Café The Modularity Theorem as a Bijection of Sets

guest post by Bruce Bartlett

John has been making some great posts on counting points on elliptic curves (Part 1, Part 2, Part 3). So I thought I’d take the opportunity and float my understanding here of the Modularity Theorem for elliptic curves, which frames it as an explicit bijection between sets. To my knowledge, it is not stated exactly in this form in the literature. There are aspects of this that I don’t understand (the explicit isogeny); perhaps someone can assist.

Bijection statement

Here is the statement as I understand it to be, framed as a bijection of sets. My chief reference is the wonderful book Elliptic Curves, Modular Forms and their L-Functions by Álvaro Lozano-Robledo (and references therein), as well as the standard reference A First Course in Modular Forms by Diamond and Shurman.

I will first make the statement as succinctly as I can, then I will ask the question I want to ask, then I will briefly explain the terminology I’ve used.

Modularity Theorem (Bijection version). The following maps are well-defined and inverse to each other, and give rise to an explicit bijection of sets:

$$\left\{\begin{array}{c} \text{Elliptic curves defined over}\ \mathbb{Q} \\ \text{with conductor}\ N \end{array}\right\} \;/\; \text{isogeny} \quad \leftrightarrows \quad \left\{\begin{array}{c} \text{Integral normalized newforms} \\ \text{of weight 2 for}\ \Gamma_0(N) \end{array}\right\}$$

  • In the forward direction, given an elliptic curve $E$ defined over the rationals, we build the modular form $$f_E(z) = \sum_{n=1}^\infty a_n q^n, \quad q = e^{2\pi i z}$$ where the coefficients $a_n$ are obtained by expanding out the following product over all primes as a Dirichlet series, $$\prod_p \exp\left( \sum_{k=1}^\infty \frac{|E(\mathbb{F}_{p^k})|}{k} p^{-ks} \right) = \frac{a_1}{1^s} + \frac{a_2}{2^s} + \frac{a_3}{3^s} + \frac{a_4}{4^s} + \cdots,$$ where $|E(\mathbb{F}_{p^k})|$ counts the number of solutions to the equation for the elliptic curve over the finite field $\mathbb{F}_{p^k}$ (including the point at infinity). So for example, as John taught us in Part 3, for good primes $p$ (which is almost all of them), $$a_p = p + 1 - |E(\mathbb{F}_p)|.$$ But the above description tells you how to compute $a_n$ for any natural number $n$. (By the way, the nontrivial content of the theorem is proving that $f_E$ is indeed a modular form for any elliptic curve $E$.)

  • In the reverse direction, given an integral normalized newform $f$ of weight $2$ for $\Gamma_0(N)$, we interpret it as a differential form on the genus $g$ modular surface $X_0(N)$, and then compute its period lattice $\Lambda \subset \mathbb{C}$ by integrating it over all the 1-cycles in the first homology group of $X_0(N)$. Then the resulting elliptic curve is $E_f = \mathbb{C}/\Lambda$.
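The good-prime coefficient $a_p = p + 1 - |E(\mathbb{F}_p)|$ from the forward direction can be checked by brute force for small primes. Here is a minimal Python sketch; the curve $y^2 = x^3 - x$ is an illustrative choice of mine, not one singled out in the post:

```python
# Brute-force point counting over F_p for a curve in Weierstrass form
# y^2 = x^3 + A*x + B.  |E(F_p)| includes the point at infinity, and
# a_p = p + 1 - |E(F_p)| for good primes p.

def count_points(A, B, p):
    """Number of points of y^2 = x^3 + A*x + B over F_p, plus infinity."""
    sqrt_count = [0] * p          # sqrt_count[v] = number of y with y^2 = v (mod p)
    for y in range(p):
        sqrt_count[y * y % p] += 1
    total = 1                     # start with the point at infinity
    for x in range(p):
        total += sqrt_count[(x * x * x + A * x + B) % p]
    return total

def a_p(A, B, p):
    return p + 1 - count_points(A, B, p)

# Illustrative curve y^2 = x^3 - x (A = -1, B = 0):
assert count_points(-1, 0, 5) == 8
assert a_p(-1, 0, 5) == -2
assert a_p(-1, 0, 7) == 0     # supersingular at p = 7
assert a_p(-1, 0, 13) == 6
```

This is exponentially slow in the size of $p$, of course; it is only meant to make the definition of $a_p$ concrete.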

An explicit isogeny?

My question to the experts is the following. Suppose we start with an elliptic curve $E$ defined over $\mathbb{Q}$, then compute the modular form $f_E$, and then compute its period lattice $\Lambda$ to arrive at the elliptic curve $E' = \mathbb{C}/\Lambda$. The theorem says that $E$ and $E'$ are isogenous. What is the explicit isogeny?


  • An elliptic curve is a complex curve $E \subset \mathbb{CP}^2$ defined by a cubic polynomial $F(X,Y,Z) = 0$, such that $E$ is smooth, i.e. the gradient $(\frac{\partial F}{\partial X}, \frac{\partial F}{\partial Y}, \frac{\partial F}{\partial Z})$ does not vanish at any point $p \in E$. If the coefficients are all rational, then we say that $E$ is defined over $\mathbb{Q}$. We can always make a transformation of variables and write the equation for $E$ in an affine chart in Weierstrass form, $$y^2 = x^3 + Ax + B.$$ Importantly, every elliptic curve is isomorphic to one of the form $\mathbb{C}/\Lambda$ where $\Lambda$ is a rank 2 sublattice of $\mathbb{C}$. So, an elliptic curve is topologically a doughnut $S^1 \times S^1$, and it has an addition law making it into an abelian group.

  • An isogeny from $E$ to $E'$ is a surjective holomorphic homomorphism. Being isogenous is an equivalence relation on the class of elliptic curves.

  • The conductor of an elliptic curve $E$ defined over the rationals is $$N = \prod_p p^{f_p}$$ where $$f_p = \begin{cases} 0 & \text{if}\ E\ \text{remains smooth over}\ \mathbb{F}_p \\ 1 & \text{if}\ E\ \text{gets a node over}\ \mathbb{F}_p \\ 2 & \text{if}\ E\ \text{gets a cusp over}\ \mathbb{F}_p\ \text{and}\ p \neq 2, 3 \\ 2 + \delta_p & \text{if}\ E\ \text{gets a cusp over}\ \mathbb{F}_p\ \text{and}\ p = 2\ \text{or}\ 3 \end{cases}$$ where $\delta_p$ is a technical invariant that describes whether there is wild ramification in the action of the inertia group at $p$ of $\mathrm{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ on the Tate module $T_p(E)$.

  • The modular curve $X_0(N)$ is a certain compact Riemann surface which parametrizes isomorphism classes of pairs $(E, C)$ where $E$ is an elliptic curve and $C$ is a cyclic subgroup of $E$ of order $N$. The genus of $X_0(N)$ depends on $N$.

  • A modular form $f$ for $\Gamma_0(N)$ of weight $k$ is a certain kind of holomorphic function $f : \mathbb{H} \to \mathbb{C}$. The number $N$ is called the level of the modular form.

  • Every modular form $f(z)$ can be expanded as a Fourier series $$f(z) = \sum_{n=0}^\infty a_n q^n, \quad q = e^{2\pi i z}.$$ We say that $f$ is integral if all its Fourier coefficients $a_n$ are integers. We say $f$ is a cusp form if $a_0 = 0$. A cusp form is called normalized if $a_1 = 1$.

  • Geometrically, a cusp form of weight $k$ can be interpreted as a holomorphic section of a certain line bundle $L_k$ over $X_0(N)$. Since $X_0(N)$ is compact, this implies that the vector space of cusp modular forms is finite-dimensional. (In particular, this means that $f$ is determined by only finitely many of its Fourier coefficients.)

  • In particular, $L_2$ is the cotangent bundle of $X_0(N)$. This means that the cusp modular forms for $\Gamma_0(N)$ of weight 2 can be interpreted as differential forms on $X_0(N)$. That is to say, they are things that can be integrated along curves on $X_0(N)$.

  • If you have a modular form of level $M$, where $M$ divides $N$, then there is a way to build a new modular form of level $N$. We call level $N$ forms of this type old. They form a subspace of the vector space $S_2(\Gamma_0(N))$. If we’re at level $N$, then we are really interested in the new forms: these are the forms in $S_2(\Gamma_0(N))$ which are orthogonal to the old forms, with respect to a certain natural inner product.

  • If you have a weight 2 newform $f$, and you interpret it as a differential form on $X_0(N)$, then integrating $f$ along 1-cycles $\gamma$ in $X_0(N)$ will give a nonzero result only for cycles living in a rank 2 sublattice of $H_1(X_0(N))$. So, the period integrals of $f$ will form a rank-2 sublattice $\Lambda \subset \mathbb{C}$.

  • So, given a weight 2 newform $f$, we get a canonical integration map $$I : X_0(N) \to \mathbb{C}/\Lambda$$ obtained by fixing a basepoint $x_0 \in X_0(N)$ and then defining $$I(x) = \int_\gamma f$$ where $\gamma$ is any path from $x_0$ to $x$ in $X_0(N)$. The answer won’t depend on the choice of path, because different choices will differ by a 1-cycle, and we are modding out by the periods of 1-cycles!

  • The Jacobian of a Riemann surface $X$ is the quotient group $$\mathrm{Jac}(X) = \Omega^1_{\mathrm{hol}}(X)^\vee / H_1(X; \mathbb{Z}).$$ This is why one version of the Modularity Theorem says:

    Modularity Theorem (Diamond and Shurman’s Version $J_C$). There exists a surjective holomorphic homomorphism of the (higher-dimensional) complex torus $\mathrm{Jac}(X_0(N))$ onto $E$.

    I would like to ask the same question here as I asked before: is there an explicit description of this map?

April 18, 2024

Tommaso Dorigo On Rating Universities

In a world where we live as hostages of advertisement, where our email addresses and phone numbers are sold and bought by companies eager to intrude in our lives and command our actions, preferences, tastes; in a world where appearance trumps substance ten to zero, where your knowledge and education are less valued than your looks, a world where truth is worth dimes and myths earn you millions - in this XXI century world, that is, Universities look increasingly out of place.


n-Category Café The Quintic, the Icosahedron, and Elliptic Curves

Old-timers here will remember the days when Bruce Bartlett and Urs Schreiber were regularly talking about 2-vector spaces and the like. Later I enjoyed conversations with Bruce and Greg Egan on quintics and the icosahedron. And now Bruce has come out with a great article linking those topics to elliptic curves!

It’s expository and fun to read.

I can’t do better than quoting the start:

There is a remarkable relationship between the roots of a quintic polynomial, the icosahedron, and elliptic curves. This discovery is principally due to Felix Klein (1878), but Klein’s marvellous book misses a trick or two, and doesn’t tell the whole story. The purpose of this article is to present this relationship in a fresh, engaging, and concise way. We will see that there is a direct correspondence between:

  • “evenly ordered” roots $(x_1, \dots, x_5)$ of a Brioschi quintic $x^5 - 10bx^3 + 45b^2x - b^2 = 0$,
  • points on the icosahedron, and
  • elliptic curves equipped with a primitive basis for their 5-torsion, up to isomorphism.

Moreover, this correspondence gives us a very efficient direct method to actually calculate the roots of a general quintic! For this, we’ll need some tools both new and old, such as Cremona and Thongjunthug’s complex arithmetic geometric mean, and the Rogers–Ramanujan continued fraction. These tools are not found in Klein’s book, as they had not been invented yet!

If you are impatient, skip to the end to see the algorithm.

If not, join me on a mathematical carpet ride through the mathematics of the last four centuries. Along the way we will marvel at Kepler’s Platonic model of the solar system from 1597, witness Gauss’ excitement in his diary entry from 1799, and experience the atmosphere in Trinity College Hall during the wonderful moment Ramanujan burst onto the scene in 1913.

The prose sizzles with excitement, and the math lives up to this.

April 17, 2024

n-Category Café Pythagorean Triples and the Projective Line

Pythagorean triples like $3^2 + 4^2 = 5^2$ may seem merely cute, but they’re connected to some important ideas in algebra. To start seeing this, note that rescaling any Pythagorean triple $m^2 + n^2 = k^2$ gives a point with rational coordinates on the unit circle:

$$(m/k)^2 + (n/k)^2 = 1$$

Conversely any point with rational coordinates on the unit circle can be scaled up to get a Pythagorean triple.

Now, if you’re a topologist or differential geometer you’ll know the unit circle is isomorphic to the real projective line $\mathbb{RP}^1$ as a topological space, and as a smooth manifold. You may even know they’re isomorphic as real algebraic varieties. But you may never have wondered whether the points with rational coordinates on the unit circle form a variety isomorphic to the rational projective line $\mathbb{QP}^1$.

It’s true! And since $\mathbb{QP}^1$ is $\mathbb{Q}$ plus a point at infinity, this means there’s a way to turn rational numbers into Pythagorean triples. Working this out gives a nice explicit way to get our hands on all Pythagorean triples. And as a side benefit, we see that points with rational coordinates are dense in the unit circle.

The basic idea is simple, but there’s a bit of suspense involved, and thus a bit to be learned.

First we set up an explicit isomorphism between $\mathbb{RP}^1$ and the unit circle $S^1$. To do this, we think of $\mathbb{RP}^1$ as the $x$ axis in the plane together with a point at infinity. Then we map each point on the $x$ axis to a point on the unit circle by drawing a straight line through that point and the ‘north pole’ $(0,1)$ and seeing where it hits the circle:

This extends to a bijection $f \colon \mathbb{RP}^1 \to S^1$. Then we can restrict this to an isomorphism between $\mathbb{QP}^1$ and what I’ll call the rational unit circle

$$\{(x,y) \in \mathbb{Q}^2 \mid x^2 + y^2 = 1\}$$

But there’s something to check here! Why does $f$ map rational numbers $q \in \mathbb{Q}$ to points on $S^1$ with rational coordinates?

For this we need to think about $f$. The line through the north pole and a point $(q,0)$ on the $x$ axis is given by

$$x = q(1-y)$$

since this equation gives $x = 0$ when $y = 1$ and $x = q$ when $y = 0$. This line goes through some point $f(q)$ on the unit circle. To figure out this point $f(q) = (x,y)$, notice it obeys the equation for the line and also

$$x^2 + y^2 = 1$$

So plop the equation for the line into the equation for the circle and get

$$q^2(1-y)^2 + y^2 = 1$$

Now things get sort of interesting. It’s easy to solve this using the quadratic formula, but it’s not instantly obvious the solution will be rational when $q$ is rational! We could work it out and see, but a general principle saves us from this demeaning labor:

Lemma. If $ax^2 + bx + c$ is a quadratic with coefficients $a, b, c$ in some field $k$, and this quadratic has one root $r_1$ in $k$, then the quadratic factors as $a(x - r_1)(x - r_2)$ in this field $k$.

Our equation $q^2(1-y)^2 + y^2 = 1$ has rational coefficients when $q$ is rational, and we know it has one rational solution, namely $y = 1$, since the north pole lies both on the line and the circle. So the lemma says the other solution, the one we care about, must also be rational!

In short: whenever $q$ is rational so is $y$, and then $x = q(1-y)$ is as well, so $f(q) = (x,y)$ is a point on the rational unit circle.

Since lines from the north pole to rational points on the $x$ axis hit the unit circle in a dense set, it follows that the rational unit circle is dense in the unit circle. And these rational points correspond precisely to Pythagorean triples mod rescaling by integers. So we get all the Pythagorean triples this way, and a lot of them.

That makes it more interesting to see an explicit formula for $f(q) = (x,y)$. I joked that this would require “demeaning labor”. It’s just some algebra, but it’s a bit annoying. The main step is to solve the quadratic equation

$$q^2(1-y)^2 + y^2 = 1$$

or in other words

$$(q^2 + 1)y^2 - 2q^2 y + (q^2 - 1) = 0$$

We could solve this using the quadratic formula, but that makes it seem a bit miraculous that in addition to the root $y = 1$, the root we care about will also be rational. It’s actually more fun to solve this equation using this proof of the Lemma:

Lemma. If $ax^2 + bx + c$ is a quadratic with coefficients $a, b, c$ in some field $k$, and this quadratic has one root $r_1$ in $k$, then the quadratic factors as $a(x - r_1)(x - r_2)$ for some $r_2 \in k$.

Proof. There’s a general abstract nonsense way to prove this, but it’s even quicker to show it directly and get a formula for $r_2$. The quadratic must factor in the algebraic closure of $k$, so

$$ax^2 + bx + c = a(x - r_1)(x - r_2)$$

for $r_1, r_2 \in \overline{k}$. Thus

$$b = -a(r_1 + r_2)$$

which lets us solve for one root if we know the other:

$$r_2 = -\frac{b}{a} - r_1$$

If the coefficients $a, b$ are in $k$ and the root $r_1$ is in $k$, it follows that the root $r_2$ is also in $k$. $\blacksquare$
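The lemma’s formula $r_2 = -b/a - r_1$ is easy to sanity-check with exact rational arithmetic. A small Python sketch; the choice $q = 7/3$ is mine, just some arbitrary rational number:

```python
# Check the lemma over Q with exact rational arithmetic: if a quadratic
# a*x^2 + b*x + c has one rational root r1, Vieta (r1 + r2 = -b/a) gives
# the other root r2 = -b/a - r1, which is then automatically rational.
from fractions import Fraction

def other_root(a, b, r1):
    """Second root of a*x^2 + b*x + c, given one root r1."""
    return -Fraction(b) / Fraction(a) - Fraction(r1)

# The quadratic from the post: (q^2+1) y^2 - 2 q^2 y + (q^2-1) = 0,
# with known root y = 1; q is an arbitrary illustrative rational.
q = Fraction(7, 3)
a, b, c = q**2 + 1, -2 * q**2, q**2 - 1

r2 = other_root(a, b, 1)
assert a * r2**2 + b * r2 + c == 0        # it really is a root
assert r2 == (q**2 - 1) / (q**2 + 1)      # matches the closed form for y
```

Since everything stays inside `Fraction`, no floating-point square roots ever appear, mirroring the “no miracle needed” point of the proof.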

Since our quadratic

$$(q^2 + 1)y^2 - 2q^2 y + (q^2 - 1) = 0$$

has one rational root $r_1 = 1$, we see that the other root must be

$$r_2 = \frac{2q^2}{q^2 + 1} - 1$$

That’s the root we want, so

$$y = \frac{2q^2}{q^2 + 1} - 1$$

See, this was more fun than using the quadratic formula and discovering that “miraculously” the answer is rational despite the square root! Simplifying a bit we get

$$y = \frac{q^2 - 1}{q^2 + 1}$$

which is much more cute. Then we get

$$x = q(1-y) = q\left(1 - \frac{q^2 - 1}{q^2 + 1}\right)$$

Simplifying this we again get something much more cute:

$$x = \frac{2q}{q^2 + 1}$$

So, our map from the rational projective line to the rational circle is

$$f(q) = \left(\frac{2q}{q^2 + 1}, \frac{q^2 - 1}{q^2 + 1}\right)$$

You can see this formula on Wikipedia, but I find the journey much more interesting than the destination… and we’re not even done with the journey.

But before we go on, we should at least take a moment to childishly flex our muscles. Take any rational number, like $q = 100$, and pop it into our formula for $f$. We get

$$f(100) = \left(\frac{200}{10001}, \frac{9999}{10001}\right)$$

This means that

$$\left(\frac{200}{10001}\right)^2 + \left(\frac{9999}{10001}\right)^2 = 1$$

Rescaling, we get a Pythagorean triple:

$$200^2 + 9999^2 = 10001^2$$
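The recipe just demonstrated can be mechanized: writing $q = m/n$ in lowest terms and clearing denominators in $f(q) = (2q/(q^2+1), (q^2-1)/(q^2+1))$ gives the triple $(2mn, m^2 - n^2, m^2 + n^2)$. A short Python sketch of this (my own transcription of the formula; note the middle entry is negative when $|q| < 1$, which just flips a sign in the triple):

```python
# Turn a rational q = m/n into a Pythagorean triple by clearing the
# denominators in f(q) = (2q/(q^2+1), (q^2-1)/(q^2+1)).
from fractions import Fraction

def triple_from_q(q):
    """Pythagorean triple (2mn, m^2 - n^2, m^2 + n^2) from q = m/n in lowest terms."""
    m, n = q.numerator, q.denominator
    return (2 * m * n, m * m - n * n, m * m + n * n)

# The worked example q = 100 from the post:
a, b, c = triple_from_q(Fraction(100))
assert (a, b, c) == (200, 9999, 10001)
assert a * a + b * b == c * c

# Any rational q works, e.g. q = 2 recovers the classic 3-4-5 triangle:
assert triple_from_q(Fraction(2)) == (4, 3, 5)
assert triple_from_q(Fraction(3, 2)) == (12, 5, 13)
```

Because `Fraction` keeps $m$ and $n$ coprime automatically, distinct rationals give genuinely distinct rational points on the circle, matching the density claim above.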


Further adventures

The unit circle is an example of a ‘conic’. Other examples include ellipses, hyperbolas and parabolas. Conics can be studied over any field $k$, but it’s easiest to study them as projective varieties rather than affine varieties. To do that we take a quadratic form in 3 variables, see where it vanishes, and then ‘projectivize’: mod out by rescaling to get a subvariety of the projective plane. For example we get our friend the unit circle by projectivizing the real solutions of

x^2 + y^2 = z^2

and the same idea works for the rational unit circle if we use rational solutions.

Working projectively, we include ‘points at infinity’ which make our conics better behaved. The ordinary real hyperbola is obviously not isomorphic to \mathbb{R}\mathrm{P}^1, but when we work projectively it gets two points at infinity that fix this problem: you can sail out along one branch of the hyperbola to a point at infinity, sail back in along another branch, go out along that branch to another point at infinity, and then sail back to where you started. It’s really a circle viewed in a funny way!

I got interested in this stuff while writing about elliptic curves over finite fields. I suddenly realized I didn’t even understand conics over finite fields! This is like writing about cubic equations when you don’t even understand the quadratic equation. So I started learning about conics.

Among other things, I re-read Chapter 1 of Gille and Szamuely’s nice book Central Simple Algebras and Galois Cohomology, which explains the correspondence between projective conics over an arbitrary field and ‘quaternion algebras’ over that field, which are central simple algebras of dimension 4. I believe this should clarify the relation between Pythagorean triples and the algebra of 2 \times 2 integer matrices lurking behind Trautman’s work on Pythagorean spinors and the modular group. At least the algebra of 2 \times 2 rational matrices is a quaternion algebra over \mathbb{Q}. But there seems to be something more going on here: some extension of the correspondence between conics and quaternion algebras over fields to certain more general commutative rings, including \mathbb{Z} but also maybe all integral domains or something.

Why are Gille and Szamuely talking about conics and quaternion algebras? Because the correspondence between these led people to the more general correspondence between central simple algebras over a field k and Severi–Brauer varieties over that field, which are varieties that become isomorphic to projective space when we pass to the algebraic closure \overline{k}. Attempting to classify these things led folks like Brauer, Hasse and Noether to invent what we’d now call Galois cohomology, leading inexorably to modern homological algebra and descent theory. A great historical account is here:

I learned a lot of algebra from this paper; it’s a good supplement to Gille and Szamuely’s book.

But I digress. Gille and Szamuely write:

It is a well-known fact from algebraic geometry that a smooth projective conic defined over a field k is isomorphic to the projective line \mathrm{P}^1 over k if and only if it has a k-rational point. The isomorphism is given by taking the line joining a point P of the conic to some fixed k-rational point O and then taking the intersection of this line with \mathrm{P}^1 embedded as, say, some coordinate axis in \mathrm{P}^2.

As far as I can tell, a ‘k-rational point’ is just a point defined over the field k (rather than some algebraic extension). So they’re saying a smooth projective conic over a field k that has any points in this simple sense must be isomorphic to the projective line k\mathrm{P}^1. And their sketched proof of this generalizes the argument I presented above for k = \mathbb{Q}. Discussing this over on the Category Theory Community Server, I got some help from people like Morgan Rogers, who pointed out the lemma used above. This lemma should guarantee that if a line in k\mathrm{P}^2 intersects a conic transversally at one point it intersects it in some other point.

Matt Strassler Speaking Today in Seattle, Tomorrow near Portland

A quick reminder, to those in the northwest’s big cities, that I will be giving two talks about my book in the next 48 hours:

Hope to see some of you there! (You can keep track of my speaking events at my events page.)

John BaezAgent-Based Models (Part 8)

Last time I presented a class of agent-based models where agents hop around a graph in a stochastic way. Each vertex of the graph is some ‘state’ agents can be in, and each edge is called a ‘transition’. In these models, the probability per time of an agent making a transition and leaving some state can depend on when it arrived at that state. It can also depend on which agents are in other states that are ‘linked’ to that edge—and when those agents arrived.

I’ve been trying to generalize this framework to handle processes where agents are born or die—or perhaps more generally, processes where some number of agents turn into some other number of agents. There’s already a framework that does something sort of like this. It’s called ‘stochastic Petri nets’, and we explained this framework here:

• John Baez and Jacob Biamonte, Quantum Techniques for Stochastic Mechanics, World Scientific Press, Singapore, 2018. (See also blog articles here.)

However, in their simplest form, stochastic Petri nets are designed for agents whose only distinguishing information is which state they’re in. They don’t have ‘names’—that is, individual identities. Thus, even calling them ‘agents’ is a bit of a stretch: usually they’re called ‘tokens’, since they’re drawn as black dots.

We could try to enhance the Petri net framework to give tokens names and other identifying features. There are various imaginable ways to do this, such as ‘colored Petri nets’. But so far this approach seems rather ill-adapted for processes where agents have identities—perhaps because I’m not thinking about the problem the right way.

So, at some point I decided to try something less ambitious. It turns out that in applications to epidemiology, general processes where n agents come in and m go out are not often required. So I’ve been trying to minimally enhance the framework from last time to include ‘birth’ and ‘death’ processes as well as transitions from state to state.

As I thought about this, some questions kept plaguing me:

When an agent gets created, or ‘born’, which one actually gets born? In other words, what is its name? Its precise name may not matter, but if we want to keep track of it after it’s born, we need to give it a name. And this name had better be ‘fresh’: not already the name of some other agent.

There’s also the question of what happens when an agent gets destroyed, or ‘dies’. This feels less difficult: there just stops being an agent with the given name. But probably we want to prevent a new agent from having the same name as that dead agent.

Both these questions seem fairly simple, but so far they’re making it hard for me to invent a truly elegant framework. At first I tried to separately describe transitions between states, births, and deaths. But this seemed to triplicate the amount of work I needed to do.

Then I tried models that have

• a finite set S of states,

• a finite set T of transitions,

• maps u, d \colon T \to S + \{\textrm{undefined}\} mapping each transition to its upstream and downstream states.

Here S + \{\textrm{undefined}\} is the disjoint union of S and a singleton whose one element is called undefined. Maps from T to S + \{\textrm{undefined}\} are a standard way to talk about partially defined maps from T to S. We get four cases:

1) If the downstream of a transition is defined (i.e. in S) but its upstream is undefined we call this transition a birth transition.

2) If the upstream of a transition is defined but its downstream is undefined we call this transition a death transition.

3) If the upstream and downstream of a transition are both defined we call this transition a transformation. In practice most transitions will be of this sort.

4) We never need transitions whose upstream and downstream are undefined: these would describe agents that pop into existence and instantly disappear.
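In code, the partial maps u, d \colon T \to S + \{\textrm{undefined}\} are naturally modeled with an option type, with None playing the role of undefined. Here is a minimal Python sketch of the four-way classification above (all names are my own):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Transition:
    upstream: Optional[str]     # None plays the role of 'undefined'
    downstream: Optional[str]

def kind(t: Transition) -> str:
    """Classify a transition according to cases 1)-4) above."""
    if t.upstream is None and t.downstream is None:
        raise ValueError("disallowed: both upstream and downstream undefined")
    if t.upstream is None:
        return "birth"
    if t.downstream is None:
        return "death"
    return "transformation"
```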

This is sort of nice, except for the fourth case. Unfortunately when I go ahead and try to actually describe a model based on this paradigm, I seem still to wind up needing to handle births, deaths and transformations quite differently.

For example, last time my models had a fixed set A of agents. To handle births and deaths, I wanted to make this set time-dependent. But I need to separately say how this works for transformations, birth transitions and death transitions. For transformations we don’t change A. For birth transitions we add a new element to A. And for death transitions we remove an element from A, and maybe record its name on a ledger or drive a stake through its heart to make sure it can never be born again!

So far this is tolerable, but things get worse. Our model also needs ‘links’ from states to transitions, to say how agents present in those states affect the timing of those transitions. These are used in the ‘jump function’, a stochastic function that answers this question:

If at time t agent a arrives at the state upstream to some transition e, and the agents at states linked to the transition e form some set S_e, when will agent a make the transition e given that it doesn’t do anything else first?

This works fine for transformations, meaning transitions e that have both an upstream and downstream state. It works just a tiny bit differently for death transitions. But birth transitions are quite different: since newly born agents don’t have a previous upstream state u(e), they don’t have a time at which they arrived at that state.

Perhaps this is just how modeling works: perhaps the search for a staggeringly beautiful framework is a distraction. But another approach just occurred to me. Today I just want to briefly state it. I don’t want to write a full blog article on it yet, since I’ve already spent a lot of time writing two articles that I deleted when I became disgusted with them—and I might become disgusted with this approach too!

Briefly, this approach is exactly the approach I described last time. There are fundamentally no births and no deaths: all transitions have an upstream and a downstream state. There is a fixed set A of agents that does not change with time. We handle births and deaths using a dirty trick.

Namely, births are transitions out of an ‘unborn’ state. Agents hang around in this state until they are born.

Similarly, deaths are transitions to a ‘dead’ state.

There can be multiple ‘unborn’ states and ‘dead’ states. Having multiple unborn states makes it easy to have agents with different characteristics enter the model. Having multiple dead states makes it easy for us to keep tallies of different causes of death. We should make the unborn states distinct from the dead states to prevent ‘reincarnation’—that is, the birth of a new agent that happens to equal an agent that previously died.

I’m hoping that when we proceed this way, we can shoehorn birth and death processes into the framework described last time, without really needing to modify it at all! All we’re doing is exploiting it in a new way.

Here’s one possible problem: if we start with a finite number of agents in the ‘unborn’ states, the population of agents can’t grow indefinitely! But this doesn’t seem very dire. For most agent-based models we don’t feel a need to let the number of agents grow arbitrarily large. Or we can relax the requirement that the set of agents is finite, and put an infinite number of agents u_1, u_2, u_3, \dots in an unborn state. This can be done without using an infinite amount of memory: it’s a ‘potential infinity’ rather than an ‘actual infinity’.
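A lazily generated stream of fresh names realizes this ‘potential infinity’ of unborn agents without ever storing anything infinite. A minimal Python sketch (all names are my own):

```python
import itertools

# The 'unborn' state: an inexhaustible lazy stream of fresh agent names.
unborn = (f"u_{n}" for n in itertools.count(1))

alive = set()
dead = set()   # names stay here forever, so no 'reincarnation' is possible

def birth():
    a = next(unborn)   # fresh by construction: never used before
    alive.add(a)
    return a

def death(a):
    alive.remove(a)
    dead.add(a)
```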

There could be other problems. So I’ll post this now before I think of them.

April 16, 2024

Matt Strassler Why The Higgs Field is Nothing Like Molasses, Soup, or a Crowd

The idea that a field could be responsible for the masses of particles (specifically the masses of photon-like [“spin-one”] particles) was proposed in several papers in 1964. They included one by Peter Higgs, one by Robert Brout and Francois Englert, and one, slightly later but independent, by Gerald Guralnik, C. Richard Hagen, and Tom Kibble. This general idea was then incorporated into a specific theory of the real world’s particles; this was accomplished in 1967-1968 in two papers, one written by Steven Weinberg and one by Abdus Salam. The bare bones of this “Standard Model of Particle Physics” was finally confirmed experimentally in 2012.

How precisely can mass come from a field? There’s a short answer to this question, invented a couple of decades ago. It’s the kind of answer that serves if time is short and attention spans are limited; it is intended to sound plausible, even though the person delivering the “explanation” knows that it is wrong. In my recent book, I called this type of little lie, a compromise that physicists sometimes have to make between giving no answer and giving a correct but long answer, a “phib” — a physics fib. Phibs are usually harmless, as long as people don’t take them seriously. But the Higgs field’s phib is particularly problematic.

The Higgs Phib

The Higgs phib comes in various forms. Here’s a particularly short one:

There’s this substance, like a soup, that fills the universe; that’s the Higgs field. As objects move through it, the soup slows them down, and that’s how they get mass.

Some variants replace the soup with other thick substances, or even imagine the field as though it were a crowd of people.

How bad is this phib, really? Well, here’s the problem with it. This phib violates several basic laws of physics. These include foundational laws that have had a profound impact on human culture and are the first ones taught in any physics class. It also badly misrepresents what a field is and what it can do. As a result, taking the phib seriously makes it literally impossible to understand the universe, or even daily human experience, in a coherent way. It’s a pedagogical step backwards, not forwards.

What’s Wrong With The Higgs Phib

So here are my seven favorite reasons to put a flashing red warning sign next to any presentation of the Higgs phib.

1. Against The Principle of Relativity

The phib brazenly violates the principle of relativity — both Galileo’s original version and Einstein’s updates to it. That principle, the oldest law of physics that has never been revised, says that if your motion is steady and you are in a closed room, no experiment can tell you your speed, your direction of motion, or even whether you are in motion at all. The phib directly contradicts this principle. It claims that

  • if an object moves, the Higgs field affects it by slowing it down, while
  • if it doesn’t move, the Higgs field does nothing to it.

But if that were true, the action of the Higgs field could easily allow you to distinguish steady motion from being stationary, and the principle of relativity would be false.

2. Against Newton’s First Law of Motion

The phib violates Newton’s first law of motion — that an object in motion not acted on by any force will remain in steady motion. If the Higgs field slowed things down, it could only do so, according to this law, by exerting a force.

But Newton, in predicting the motions of the planets, assumed that the only force acting on the planets was that of gravity. If the Higgs field exerted an additional force on the planets simply because they have mass (or because it was giving them mass), Newton’s methods for predicting planetary motions would have failed.

Worse, the slowing from the Higgs field would have acted like friction over billions of years, and would by now have caused the Earth to slow down and spiral into the Sun.

3. Against Newton’s Second Law of Motion

The phib also violates Newton’s second law of motion, by completely misrepresenting what mass is. It makes it seem as though mass makes motion difficult, or at least has something to do with inhibiting motion. But this is wrong.

As Newton’s second law states, mass is something that inhibits changes in motion. It does not inhibit motion, or cause things to slow down, or arise from things being slowed down. Mass is the property that makes it hard both to speed something up and to slow it down. It makes it harder to throw a lead ball compared to a plastic one, and it also makes the lead ball harder to catch bare-handed than a plastic one. It also makes it difficult to change something’s direction.

To say this another way, Newton’s second law F=ma says that to make a change in an object’s motion (an acceleration a) requires a force (F); the larger the object’s mass (m), the larger the required force must be. Notice that it does not have anything to say about an object’s motion (its velocity v).

To suggest that mass has to do with motion, and not with change in motion, is to suggest that Newton’s law should be F=mv — which, in fact, many pre-Newtonian physicists once believed. Let’s not let a phib throw us back to the misguided science of the Middle Ages!

4. Not a Universal Mass-Giver

The phib implies that the Higgs field gives mass to all objects with mass, causing all of them to slow down. After all, if there were a universal “soup” found everywhere, then every object would encounter it. If it were true that the Higgs field acted on all objects in the same way — “universally”, similar to gravity, which pulls on all objects — then every object in our world would get its mass from the Higgs field.

But in fact, the Higgs field only generates the masses of the known elementary particles. More complex particles such as protons and neutrons — and therefore the atoms, molecules, humans and planets that contain them — get most of their mass in another way. The phib, therefore, can’t be right about how the Higgs field does its job.

5. Not Like a Substance

As is true of all fields, the Higgs field is not like a substance, in contrast to soup, molasses, or a crowd. It has no density or materiality, as soup would have. Instead, the Higgs field (like any field!) is more like a property of a substance.

As an analogue, consider air pressure (which is itself an example of an ordinary field). Air is a substance; it is made of molecules, and has density and weight. But air’s pressure is not a thing; it is a property of air, and is not itself a substance. Pressure has no density or weight, and is not made from anything. It just tells you what the molecules of air are doing.

The Higgs field is much more like air pressure than it is like air itself. It simply is not a substance, despite what the phib suggests.

6. Not Filling the Universe

The Higgs field does not “fill” the universe any more than pressure fills the atmosphere. Pressure is found throughout the atmosphere, yes, but it is not what makes the atmosphere full. Air is what constitutes the atmosphere, and is the only thing that can be said, in any sense, to fill it.

While a substance could indeed make the universe more full than it would otherwise be, a field of the universe is not a substance. Like the magnetic field or any other cosmic field, the Higgs field exists everywhere — but the universe would be just as empty (and just as full) if the Higgs field did not exist.

7. Not Merely By Its Presence

Finally, the phib doesn’t mention the thing that makes the Higgs field special, and that actually allows it to affect the masses of particles. This is not merely that it is present everywhere across the universe, but that it is, in a sense, “on.” To give you a sense of what this might mean, consider the wind.

On a day with a steady breeze, we can all feel the wind. But even when the wind is calm, physicists would say that the wind exists, though it is inactive. In the language I’m using here, I would say that the wind is something that can always be measured — it always exists — but

  • on a calm day it is “off” or “zero”, while
  • on a day with a steady breeze, it is “on” or “non-zero”.

In other words, the wind is always present, whether it is calm or steady; it can always be measured.

In rough analogy, the Higgs field, though switched on in our universe, might in principle have been off. A switched-off Higgs field would not give mass to anything. The Higgs field affects the masses of elementary particles in our universe only because, in addition to being present, it is on. (Physicists would say it has a “non-zero average value” or a “non-zero vacuum expectation value”.)

Why is it on? Great question. From the theoretical point of view, it could have been either on or off, and we don’t know why the universe arranged for the former.

Beyond the Higgs Phib

I don’t think we can really view a phib with so many issues as an acceptable pseudo-explanation. It causes more problems and confusions than it resolves.

But I wish it were as easy to replace the Higgs phib as it is to criticize it. No equally short story can do the job. If such a brief tale were easy to imagine, someone would have invented it by now.

Some years ago, I found a way to explain how the Higgs field works that is non-technical and yet correct — one that I would be happy to present to my professional physics colleagues without apology or embarrassment. (In fact, I did just that in my recent talks at the physics departments at Vanderbilt and Irvine.) Although I tried delivering it to non-experts in an hour-long talk, I found that it just doesn’t fit. But it did fit quite well in a course for non-experts, in which I had several hours to lay out the basics of particle physics before addressing the Higgs field’s role.

That experience motivated me to write a book that contains this explanation. It isn’t brief, and it’s not a light read — the universe is subtle, and I didn’t want to water the explanation down. But it does deliver what it promises. It first carefully explains what “elementary particles” and fields really are [here’s more about fields] and what it means for such a “particle” to have mass. Then it gives the explanation of the Higgs field’s effects — to the extent we understand them. (Readers of the book are welcome to ask me questions about its content; I am collecting Q&A and providing additional resources for readers on this part of the website.)

A somewhat more technical explanation of how the Higgs field works is given elsewhere on this website: check out this series of pages followed by this second series, with additional technical information available in this third series. These pages do not constitute a light read either! But if you are comfortable with first-year university math and physics, you should be able to follow them. Ask questions as need be.

Between the book, the above-mentioned series of webpages, and my answers to your questions, I hope that most readers who want to know more about the Higgs field can find the explanation that best fits their interests and background.

John BaezAgent-Based Models (Part 7)

Last time I presented a simple, limited class of agent-based models where each agent independently hops around a graph. I wrote:

Today the probability for an agent to hop from one vertex of the graph to another by going along some edge will be determined the moment the agent arrives at that vertex. It will depend only on the agent and the various edges leaving that vertex. Later I’ll want this probability to depend on other things too—like whether other agents are at some vertex or other. When we do that, we’ll need to keep updating this probability as the other agents move around.

Let me try to figure out that generalization now.

Last time I discovered something surprising to me. To describe it, let’s bring in some jargon. The conditional probability per time of an agent making a transition from its current state to a chosen other state (given that it doesn’t make some other transition) is called the hazard function of that transition. In a Markov process, the hazard function is actually a constant, independent of how long the agent has been in its current state. In a semi-Markov process, the hazard function is a function only of how long the agent has been in its current state.

For example, people like to describe radioactive decay using a Markov process, since experimentally it doesn’t seem that ‘old’ radioactive atoms decay at a higher or lower rate than ‘young’ ones. (Quantum theory says this can’t be exactly true, but nobody has seen deviations yet.) On the other hand, the death rate of people is highly non-Markovian, but we might try to describe it using a semi-Markov process. Shortly after birth it’s high—that’s called ‘infant mortality’. Then it goes down, and then it gradually increases.

We definitely want our agent-based models to have the ability to describe semi-Markov processes. What surprised me last time is that I could do it without explicitly keeping track of how long the agent has been in its current state, or when it entered its current state!

The reason is that we can decide which state an agent will transition to next, and when, as soon as it enters its current state. This decision is random, of course. But using random number generators we can make this decision the moment the agent enters the given state—because there is nothing more to be learned by waiting! I described an algorithm for doing this.
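In the special (Markov) case of constant hazard rates, deciding everything at entry amounts to sampling one exponential waiting time per outgoing transition and taking the minimum. Here is a minimal Python sketch of that special case only, not the general algorithm from last time; the names are my own:

```python
import random

def schedule_at_entry(hazards, arrival_time):
    """At the moment of entry, decide which outgoing transition fires and when.
    hazards: dict mapping each outgoing transition to its constant hazard rate.
    With constant hazards the waiting times are exponential, so the smallest
    sample determines both the winning transition and its firing time."""
    waits = {e: random.expovariate(rate) for e, rate in hazards.items()}
    e = min(waits, key=waits.get)
    return e, arrival_time + waits[e]
```

The key point survives in the general case: nothing about the agent’s future in its current state depends on information that arrives later, so the random decision can be drawn once, at entry.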

I’m sure this is well-known, but I had fun rediscovering it.

But today I want to allow the hazard function for a given agent to make a given transition to depend on the states of other agents. In this case, if some other agent randomly changes state, we will need to recompute our agent’s hazard function. There is probably no computationally feasible way to avoid this, in general. In some analytically solvable models there might be—but we’re simulating systems precisely because we don’t know how to solve them analytically.

So now we’ll want to keep track of the residence time of each agent—that is, how long it’s been in its current state. But William Waites pointed out a clever way to do this: it’s cheaper to keep track of the agent’s arrival time, i.e. when it entered its current state. This way you don’t need to keep updating the residence time. Whenever you need to know the residence time, you can just subtract the arrival time from the current clock time.

Even more importantly, our model should now have ‘informational links’ from states to transitions. If we want the presence or absence of agents in some state to affect the hazard function of some transition, we should draw a ‘link’ from that state to that transition! Of course you could say that anything is allowed to affect anything else. But this would create an undisciplined mess where you can’t keep track of the chains of causation. So we want to see explicit ‘links’.

So, here’s my new modeling approach, which generalizes the one we saw last time. For starters, a model should have:

• a finite set V of vertices or states,

• a finite set E of edges or transitions,

• maps u, d \colon E \to V mapping each edge to its source and target, also called its upstream and downstream,

• a finite set A of agents,

• a finite set L of links,

• maps s \colon L \to V and t \colon L \to E mapping each link to its source (a state) and its target (a transition).

All of this stuff, except for the set of agents, is exactly what we had in our earlier paper on stock-flow models, where we treated people en masse instead of as individual agents. You can see this in Section 2.1 here:

• John Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood, Evan Patterson, Compositional modeling with stock and flow models.

So, I’m trying to copy that paradigm, and eventually unify the two paradigms as much as possible.

But they’re different! In particular, our agent-based models will need a ‘jump function’. This says when each agent a \in A will undergo a transition e \in E if it arrives at the state upstream to that transition at a specific time t \in \mathbb{R}. This jump function will not be deterministic: it will be a stochastic function, just as it was in yesterday’s formalism. But today it will depend on more things! Yesterday it depended only on a, e and t. But now the links will come into play.

For each transition e \in E, there is a set of links whose target is that transition, namely

t^{-1}(e) = \{\ell \in L \; \vert \; t(\ell) = e \}

Each link \ell \in t^{-1}(e) will have one state v as its source. We say this state affects the transition e via the link \ell.

We want the jump function for the transition e to depend on the presence or absence of agents in each state that affects this transition.

Which agents are in a given state? Well, it depends! But those agents will always form some subset of A, and thus an element of 2^A. So, we want the jump function for the transition e to depend on an element of

\prod_{\ell \in t^{-1}(e)} 2^A = 2^{A \times t^{-1}(e)}

I’ll call this element S_e. And as mentioned earlier, the jump function will also depend on a choice of agent a \in A and on the arrival time of the agent a.
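Concretely, S_e can be stored as a dictionary from the links in t^{-1}(e) to the subset of agents currently sitting in each link’s source state. A minimal Python sketch (the function names are my own):

```python
def links_into(t_map, e):
    """The preimage t^{-1}(e): all links whose target is the transition e."""
    return [l for l, target in t_map.items() if target == e]

def compute_S_e(t_map, s_map, sigma, e):
    """S_e as a dict: link -> set of agents in that link's source state.
    t_map: link -> transition, s_map: link -> state, sigma: agent -> state."""
    return {l: frozenset(a for a, v in sigma.items() if v == s_map[l])
            for l in links_into(t_map, e)}
```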

So, we’ll say there’s a jump function j_e for each transition e, which is a stochastic function

j_e \colon A \times 2^{A \times t^{-1}(e)} \times \mathbb{R} \rightsquigarrow \mathbb{R}

The idea, then, is that j_e(a, S_e, t) is the answer to this question:

If at time t agent a arrived at the vertex u(e), and the agents at states linked to the edge e are described by the set S_e, when will agent a move along the edge e to the vertex d(e), given that it doesn’t do anything else first?

The answer to this question can keep changing as agents other than a move around, since the set S_e can keep changing. This is the big difference between today’s formalism and yesterday’s.

Here’s how we run our model. At every moment in time we keep track of some information about each agent a \in A, namely:

• Which vertex is it at now? We call this vertex the agent’s state, \sigma(a).

• When did it arrive at this vertex? We call this time the agent’s arrival time, \alpha(a).

• For each edge e whose upstream is \sigma(a), when will agent a move along this edge if it doesn’t do anything else first? Call this time T(a,e).

I need to explain how we keep updating these pieces of information (supposing we already have them). Let’s assume that at some moment in time t_i an agent makes a transition. More specifically, suppose agent \underline{a} \in A makes a transition \underline{e} from the state

\underline{v} = u(\underline{e}) \in V

to the state

\underline{v}' = d(\underline{e}) \in V.

At this moment we update the following information:

1) We set

\alpha(\underline{a}) := t_i

(So, we update the arrival time of that agent.)

2) We set

\sigma(\underline{a}) := \underline{v}'

(So, we update the state of that agent.)

3) We recompute the subset of agents in the state \underline{v} (by removing \underline{a} from this subset) and in the state \underline{v}' (by adding \underline{a} to this subset).

4) For every transition f that’s affected by the state \underline{v} or the state \underline{v}', and for every agent a in the upstream state of that transition, we set

T(a,f) := j_f(a, S_f, \alpha(a))

where S_f is the element of 2^{A \times t^{-1}(f)} saying which subset of agents is in each state affecting the transition f. (So, we update our table of times at which agent a will make the transition f, given that it doesn’t do anything else first.)

Now we need to compute the next time at which something happens, namely t_{i+1}. And we need to compute what actually happens then!

To do this, we look through our table of times T(a,e) for each agent a and all transitions out of the state that agent is in, and see which time is smallest. If there’s a tie, break it. Then we reset \underline{a} and \underline{e} to be the agent-edge pair that minimizes T(a,e).

5) We set

t_{i+1} := T(\underline{a},\underline{e})

Then we loop back around to step 1), but with i+1 replacing i.
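Steps 1)–5) can be collected into one small event loop. This is only a sketch of one possible implementation; all the function and variable names are my own, and step 3) is implicit in the sigma dictionary:

```python
def simulate(E, u, d, A, affected, jump, sigma, alpha, t_end):
    """Run the event loop described above.
    u, d: dicts mapping each transition to its upstream/downstream state.
    affected(v): the transitions whose hazard depends on state v.
    jump(a, e, sigma, t): stochastic jump function (may inspect sigma for S_e).
    sigma: agent -> current state; alpha: agent -> arrival time."""
    # Tentative firing times T(a, e) for each agent's outgoing transitions.
    T = {(a, e): jump(a, e, sigma, alpha[a])
         for a in A for e in E if u[e] == sigma[a]}
    while T:
        (a_, e_), t_next = min(T.items(), key=lambda kv: kv[1])  # step 5)
        if t_next > t_end:
            break
        alpha[a_] = t_next                   # step 1): update arrival time
        v_old = sigma[a_]
        sigma[a_] = d[e_]                    # step 2): update state
        # step 4): recompute times for transitions affected by either state
        for f in set(affected(v_old)) | set(affected(d[e_])):
            for a in A:
                if sigma[a] == u[f]:
                    T[(a, f)] = jump(a, f, sigma, alpha[a])
        # drop stale entries for a_ and schedule its new outgoing transitions
        T = {(a, e): s for (a, e), s in T.items() if u[e] == sigma[a]}
        for e in E:
            if u[e] == sigma[a_]:
                T[(a_, e)] = jump(a_, e, sigma, alpha[a_])
    return sigma
```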

Whew! I hope you followed that. If not, please ask questions.

Doug NatelsonThe future of the semiconductor industry, + The Mechanical Universe

 Three items of interest:

  • This article is a nice review of present semiconductor memory technology.  The electron micrographs in Fig. 1 and the scaling history in Fig. 3 are impressive.
  • This article in IEEE Spectrum is a very interesting look at how some people think we will get to chips for AI applications that contain a trillion (\(10^{12}\)) transistors.  For perspective, the processor in the laptop I'm using to write this has about 40 billion transistors.  (The article is nice, though the first figure commits the terrible sin of having no y-axis numbers or label; clearly it's supposed to represent exponential growth as a function of time in several different parameters.)
  • Caltech announced the passing of David Goodstein, renowned author of States of Matter and several books about the energy transition.  I'd written about my encounter with him, and I wanted to take this opportunity to pass along a working link to the youtube playlist for The Mechanical Universe.  While the animation can look a little dated, it's worth noting that when this was made in the 1980s, the CGI was cutting-edge work that was presented at SIGGRAPH.

April 15, 2024

John PreskillHow I didn’t become a philosopher (but wound up presenting a named philosophy lecture anyway)

Many people ask why I became a theoretical physicist. The answer runs through philosophy—which I thought, for years, I’d left behind in college.

My formal relationship with philosophy originated with Mr. Bohrer. My high school classified him as a religion teacher, but he co-opted our junior-year religion course into a philosophy course. He introduced us to Plato’s cave, metaphysics, and the pursuit of the essence beneath the skin of appearance. The essence of reality overlaps with quantum theory and relativity, which fascinated him. Not that he understood them, he’d hasten to clarify. But he passed along that fascination to me. I’d always loved dealing in abstract ideas, so the notion of studying the nature of the universe attracted me. A friend and I joked about growing up to be philosophers and—on account of not being able to find jobs—living in cardboard boxes next to each other.

After graduating from high school, I searched for more of the same in Dartmouth College’s philosophy department. I began with two prerequisites for the philosophy major: Moral Philosophy and Informal Logic. I adored those courses, but I adored all my courses.

As a sophomore, I embarked upon Dartmouth’s philosophy-of-science course. I was one of the course’s youngest students, but the professor assured me that I’d accumulated enough background information in science and philosophy classes. Yet he and the older students threw around technical terms, such as qualia, that I’d never heard of. Those terms resurfaced in the assigned reading, again without definitions. I struggled to follow the conversation.

Meanwhile, I’d been cycling through the sciences. I’d taken my high school’s highest-level physics course, senior year—AP Physics C: Mechanics and Electromagnetism. So, upon enrolling in college, I made the rounds of biology, chemistry, and computer science. I cycled back to physics at the beginning of sophomore year, taking Modern Physics I in parallel with Informal Logic. The physics professor, Miles Blencowe, told me, “I want to see physics in your major.” I did, too, I assured him. But I wanted to see most subjects in my major.

Miles, together with department chair Jay Lawrence, helped me incorporate multiple subjects into a physics-centric program. The major, called “Physics Modified,” stood halfway between the physics major and the create-your-own major offered at some American liberal-arts colleges. The program began with heaps of prerequisite courses across multiple departments. Then, I chose upper-level physics courses, a math course, two history courses, and a philosophy course. I could scarcely believe that I’d planted myself in a physics department; although I’d loved physics since my first course in it, I loved all subjects, and nobody in my family did anything close to physics. But my major would provide a well-rounded view of the subject.

From shortly after I declared my Physics Modified major. Photo from outside the National Academy of Sciences headquarters in Washington, DC.

The major’s philosophy course was an independent study on quantum theory. In one project, I dissected the “EPR paper” published by Einstein, Podolsky, and Rosen (EPR) in 1935. It introduced the paradox that now underlies our understanding of entanglement. But who reads the EPR paper in physics courses nowadays? I appreciated having the space to grapple with the original text. Still, I wanted to understand the paper more deeply; the philosophy course pushed me toward upper-level physics classes.

What I thought of as my last chance at philosophy evaporated during my senior spring. I wanted to apply to graduate programs soon, but I hadn’t decided which subject to pursue. The philosophy and history of physics remained on the table. A history-of-physics course, taught by cosmologist Marcelo Gleiser, settled the matter. I worked my rear off in that course, and I learned loads—but I already knew some of the material from physics courses. Moreover, I knew the material more deeply than the level at which the course covered it. I couldn’t stand the thought of understanding the rest of physics only at this surface level. So I resolved to burrow into physics in graduate school. 

Appropriately, Marcelo published a book with a philosopher (and an astrophysicist) this March.

Burrow I did: after a stint in condensed-matter research, I submerged up to my eyeballs in quantum field theory and differential geometry at the Perimeter Scholars International master’s program. My research there bridged quantum information theory and quantum foundations. I appreciated the balance of fundamental thinking and possible applications to quantum-information-processing technologies. The rigorous mathematical style (lemma-theorem-corollary-lemma-theorem-corollary) appealed to my penchant for abstract thinking. Eating lunch with the Perimeter Institute’s quantum-foundations group, I felt at home.

Craving more research at the intersection of quantum thermodynamics and information theory, I enrolled at Caltech for my PhD. As I’d scarcely believed that I’d committed myself to my college’s physics department, I could scarcely believe that I was enrolling in a tech school. I was such a child of the liberal arts! But the liberal arts include the sciences, and I ended up wrapping Caltech’s hardcore vibe around myself like a favorite denim jacket.

Caltech kindled interests in condensed matter; atomic, molecular, and optical physics; and even high-energy physics. Theorists at Caltech thought not only abstractly, but also about physical platforms; so I started to, as well. I began collaborating with experimentalists as a postdoc, and I’m now working with as many labs as I can interface with at once. I’ve collaborated on experiments performed with superconducting qubits, photons, trapped ions, and jammed grains. Developing an abstract idea, then nursing it from mathematics to reality, satisfies me. I’m even trying to redirect quantum thermodynamics from foundational insights to practical applications.

At the University of Toronto in 2022, with my experimental collaborator Batuhan Yılmaz—and a real optics table!

So I did a double-take upon receiving an invitation to present a named lecture at the University of Pittsburgh Center for Philosophy of Science. Even I, despite not being a philosopher, had heard of the cachet of Pitt’s philosophy-of-science program. Why on Earth had I received the invitation? I felt the same incredulity as when I’d handed my heart to Dartmouth’s physics department and then to a tech school. But now, instead of laughing at the image of myself as a physicist, I couldn’t see past it.

Why had I received that invitation? I did a triple-take. At Perimeter, I’d begun undertaking research on resource theories—simple, information-theoretic models for situations in which constraints restrict the operations one can perform. Hardly anyone worked on resource theories then, although they form a popular field now. Philosophers like them, and I’ve worked with multiple classes of resource theories by now.

More recently, I’ve worked with contextuality, a feature that distinguishes quantum theory from classical theories. And I’ve even coauthored papers about closed timelike curves (CTCs), hypothetical worldlines that travel backward in time. CTCs are consistent with general relativity, but we don’t know whether they exist in reality. Regardless, one can simulate CTCs, using entanglement. Collaborators and I applied CTC simulations to metrology—to protocols for measuring quantities precisely. So we kept a foot in practicality and a foot in foundations.

Perhaps the idea of presenting a named lecture on the philosophy of science wasn’t hopelessly bonkers. All right, then. I’d present it.

Presenting at the Center for Philosophy of Science

This March, I presented an ALS Lecture (an Annual Lecture Series Lecture, redundantly) entitled “Field notes on the second law of quantum thermodynamics from a quantum physicist.” Scientists formulated the second law in the early 1800s. It helps us understand why time appears to flow in only one direction. I described three enhancements of that understanding, which have grown from quantum thermodynamics and nonequilibrium statistical mechanics: resource-theory results, fluctuation theorems, and thermodynamic applications of entanglement. I also enjoyed talking with Center faculty and graduate students during the afternoon and evening. Then—being a child of the liberal arts—I stayed in Pittsburgh for half the following Saturday to visit the Carnegie Museum of Art.

With a copy of a statue of the goddess Sekhmet. She lives in the Carnegie Museum of Natural History, which shares a building with the art museum, from which I detoured to see the natural-history museum’s ancient-Egypt area (as Quantum Frontiers regulars won’t be surprised to hear).

Don’t get me wrong: I’m a physicist, not a philosopher. I don’t have the training to undertake philosophy, and I have enough work to do in pursuit of my physics goals. But my high-school self would approve—that self is still me.

Matt Strassler Update to the Higgs FAQ

Although I’ve been slowly revising the Higgs FAQ 2.0, this seemed an appropriate time to bring the Higgs FAQ on this website fully into the 2020’s. You will find the Higgs FAQ 3.0 here; it explains the basics of the Higgs boson and Higgs field, along with some of the wider context.

For deeper explanations of the Higgs field:

  • if you are comfortable with math, you can find this series of pages useful (but you will probably want to read this series first.)
  • if you would prefer to avoid the math, a full and accurate conceptual explanation of the Higgs field is given in my book.

Events: this week I am speaking Tuesday in Berkeley, CA; Wednesday in Seattle, WA (at Town Hall); and Thursday outside of Portland, OR (at the Powell’s bookstore in Cedar Hills). Click here for more details.

n-Category Café Semi-Simplicial Types, Part II: The Main Results

(Jointly written by Astra Kolomatskaia and Mike Shulman)

This is part two of a three part series of expository posts on our paper Displayed Type Theory and Semi-Simplicial Types. In this part, we cover the main results of the paper.

The Geometric Intuition

The central motivating definition of our paper is the following:

A semi-simplicial type $X$ consists of a type $X_0$ together with, for every $x : X_0$, a displayed semi-simplicial type over $X$.

The purpose of the 100+ pages is to formulate a type theory in which we can make sense of this as a kind of “coinductive definition”. The key is to figure out what “displayed semi-simplicial type over $X$” should mean. Intuitively, it should be an “indexed” reformulation of a morphism of semi-simplicial types $Y \to X$, but how do we do that?

The idea behind our new type theory, Displayed Type Theory (dTT), is that if $X : \mathsf{Type}$ is any notion of mathematical object (such as a group, category, or semi-simplicial type), then there should exist a notion of displayed elements of $X$ living over the elements of $X$. Thus, for example, if $\mathcal{C} : \mathsf{Cat}$ is a category, then there should be a type $\mathsf{Cat^d}\;\mathcal{C}$ of categories displayed over $\mathcal{C}$, or alternatively dependent categories over $\mathcal{C}$. The particular case of displayed categories was introduced by Ahrens and Lumsdaine in an eponymous 2019 paper, and the idea has since been generalized to bicategories by Ahrens, Frumin, Maggesi, Veltri, and van der Weide; dTT posits that such displayed structures exist for any kind of mathematical object, with a definition that can be derived algorithmically from the definition of the object.

We will return to this below; but first, as suggested above, we explain how such a notion enables a coinductive definition of semi-simplicial types. Indeed, we say that a semi-simplicial type $A$ consists of the following: First, $A$ defines a type $\mathsf{Z}\;A : \mathsf{Type}$, called the $0$-simplices of $A$. Second, each $0$-simplex $x : \mathsf{Z}\;A$ defines a semi-simplicial type $\mathsf{S}\;A\;x : \mathsf{SST^d}\;A$ displayed over $A$, called the slice of $A$ over $x$.

We can phrase this in Agda-esque syntax for dTT as follows:

codata SST : Type where
  Z : SST → Type
  S : (X : SST) → Z X → SSTᵈ X

To see why this definition is correct, we have to understand what, in general, a semi-simplicial type $B$ displayed over $A$ consists of. The first part of this answer is that $B$ defines a type of $0$-simplices of $B$ displayed over $0$-simplices of $A$, i.e. a family $\mathsf{Z^d}\;B : \mathsf{Z}\;A \to \mathsf{Type}$. Then, every displayed $0$-simplex $z : \mathsf{Z^d}\;B\;y$ over a $0$-simplex $y : \mathsf{Z}\;A$ should define $\mathsf{S^d}\;B\;y\;z$, a doubly displayed semi-simplicial type, for whatever this means.

Now, instead of working out the definition of a doubly dependent semi-simplicial type, let’s circle back and think geometrically. Semi-simplicial types should have families of $n$-simplex types. If $A : \mathsf{SST}$, then we write that

\begin{aligned} &A_0 : \mathsf{Type} \\ &A_0 \equiv \mathsf{Z}\;A \end{aligned}

is the type of $0$-simplices in $A$. Similarly, if $B : \mathsf{SST^d}\;A$ is a semi-simplicial type displayed over $A$, then for $y : A_0$, we write

\begin{aligned} &B_0 : A_0 \to \mathsf{Type} \\ &B_0\;y \equiv \mathsf{Z^d}\;B\;y \end{aligned}

for the type of $0$-simplices of $B$ displayed over the $0$-simplex $y$ of $A$. Putting this together, if we have two $0$-simplices $x_{01},\;x_{10} : A_0$ of $A$, then we may form

\begin{aligned} &A_1 : (x_{01} : A_0)\;(x_{10} : A_0) \to \mathsf{Type} \\ &A_1\;x_{01}\;x_{10} \equiv \mathsf{Z^d}\;(\mathsf{S}\;A\;x_{01})\;x_{10}, \end{aligned}

which is the type of $1$-simplices in $A$ joining $x_{01}$ to $x_{10}$.

It therefore stands to reason that $B$ should have a type of dependent $1$-simplices living over the $1$-simplices of $A$. Thus if $\beta_{11} : A_1\;y_{01}\;y_{10}$, then given dependent endpoints $z_{01} : B_0\;y_{01}$ and $z_{10} : B_0\;y_{10}$, we should get a type $B_1\;y_{01}\;z_{01}\;y_{10}\;z_{10}\;\beta_{11}$. The formula for this happens to take the following form:

\begin{aligned} &B_1 : (y_{01} : A_0)\;(z_{01} : B_0\;y_{01})\;(y_{10} : A_0)\;(z_{10} : B_0\;y_{10})\;(\beta_{11} : A_1\;y_{01}\;y_{10}) \to \mathsf{Type} \\ &B_1\;y_{01}\;z_{01}\;y_{10}\;z_{10}\;\beta_{11} \equiv \mathsf{Z^{dd}}\;(\mathsf{S^d}\;B\;y_{01}\;z_{01})\;y_{10}\;z_{10}\;\beta_{11}, \end{aligned}

which mirrors the formula for $A_1$.

Then, putting all of this together again, if we have a $0$-simplex $x_{001} : A_0$, then we take $B \equiv \mathsf{S}\;A\;x_{001}$. For $x_{010} : A_0$, we have that $B_0\;x_{010} \equiv \mathsf{Z^d}\;(\mathsf{S}\;A\;x_{001})\;x_{010} \equiv A_1\;x_{001}\;x_{010}$. We thus get that:

\begin{aligned} &A_2 : (x_{001} : A_0)\;(x_{010} : A_0)\;(\beta_{011} : A_1\;x_{001}\;x_{010})\;(x_{100} : A_0) \\ &\quad\quad(\beta_{101} : A_1\;x_{001}\;x_{100})\;(\beta_{110} : A_1\;x_{010}\;x_{100}) \to \mathsf{Type} \\ &A_2\;x_{001}\;x_{010}\;\beta_{011}\;x_{100}\;\beta_{101}\;\beta_{110} \equiv \mathsf{Z^{dd}}\;(\mathsf{S^d}\;(\mathsf{S}\;A\;x_{001})\;x_{010}\;\beta_{011})\;x_{100}\;\beta_{101}\;\beta_{110} \end{aligned}

In general, this pattern continues in higher dimensions, and the process described lets us extract $n$-simplex types.

We can visualise what’s going on in two different ways. The first visualisation shows how the $n$-simplices of the slice of $A$ over $x$ live dependently over simplices of $A$.


For example, if $z_1$ is a $0$-simplex of the slice of $A$ over $x$ displayed over the $0$-simplex $y_1$ of $A$, then $z_1$ is a $1$-simplex of $A$ joining $x$ to $y_1$. Similarly, suppose $z_{01}$ and $z_{10}$ are $0$-simplices of the slice of $A$ over $x$ displayed over $y_{01}$ and $y_{10}$, respectively, and $\beta_{11}$ is a $1$-simplex of $A$ joining $y_{01}$ to $y_{10}$. Then if $\gamma_{11}$ is a $1$-simplex of the slice of $A$ over $x$ displayed over $\beta_{11}$ and joining $z_{01}$ to $z_{10}$, then $\gamma_{11}$ is a $2$-simplex of $A$ with the specified boundary.

Geometrically, we are using the fact that the $n$-simplex is the cone of the $(n-1)$-simplex. Thus, with the above definition, assuming we know inductively that every semi-simplicial type has a type of $(n-1)$-simplices, then for every $0$-simplex $x$, the displayed semi-simplicial type $\mathsf{S}\;A\;x$ has a type of “displayed $(n-1)$-simplices” over this. Such a displayed $(n-1)$-simplex depends on $x$ (the cone vertex) as well as on an $(n-1)$-simplex of $A$ (the base face, opposite the cone vertex), and thus can be viewed geometrically as an $n$-simplex.

The second visualisation explains our formulas in terms of iterated slicing.


Note that, to form each successive slice, you have to provide $2^n$ simplex data points. The true dependent $n$-simplices of a slice may then be viewed as matching objects.
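As a concrete check on the binary-string indexing used above, here is a short Python sketch (the helper name is ours, not from the paper) that enumerates the boundary data of an $n$-simplex: all nonzero binary strings of length $n+1$ except the top string, recovering exactly the arguments of $A_1$ and $A_2$.

```python
from itertools import product

def boundary_labels(n):
    """Boundary data of an n-simplex in the binary-string indexing:
    all nonzero binary strings of length n+1 except the top string 1...1."""
    labels = ("".join(bits) for bits in product("01", repeat=n + 1))
    return [s for s in labels if "1" in s and s != "1" * (n + 1)]

# A 1-simplex has two endpoints; a 2-simplex has the six arguments of A_2,
# in the same order in which they appear in the formula above.
print(boundary_labels(1))       # ['01', '10']
print(boundary_labels(2))       # ['001', '010', '011', '100', '101', '110']
print(len(boundary_labels(3)))  # 2^4 - 2 = 14 boundary cells of a 3-simplex
```

Lexicographic order on the labels reproduces the argument order of the $A_2$ formula, which is one way to see why this indexing is convenient.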

The Homotopical Problem of Constructing a Semi-Simplicial Classifier

Now, to motivate the construction of “display”, let us return to classical homotopy theory. Suppose that we are working in $\mathcal{C}$, some setting for homotopy theory. This could be a model category or a fibration category, which we usually think of as representing a Grothendieck $\infty$-topos. In our paper (section 4.1) we refer to the objects of $\mathcal{C}$ as contexts, denoted $\Gamma\;\mathsf{ctx}$, the fibrations over a context $\Gamma$ as types, denoted $\gamma : \Gamma \vdash A\;\gamma\;\mathsf{type}$, and sections of a type $A$ as terms, denoted by $\gamma : \Gamma \vdash t\;\gamma : A\;\gamma$.

We assume that this setting has an object classifier. This means that in any context $\Gamma$, there is $\gamma : \Gamma \vdash \mathsf{Type}\;\gamma\;\mathsf{type}$, as well as $\gamma : \Gamma,\;A : \mathsf{Type}\;\gamma \vdash \mathsf{El}\;A\;\gamma\;\mathsf{type}$. This has the property that any “small” fibration $\gamma : \Gamma \vdash A\;\gamma\;\mathsf{type}$ gives rise to a code in the universe $\gamma : \Gamma \vdash \mathsf{Code}\;A\;\gamma : \mathsf{Type}\;\gamma$, such that the pullback of the $\mathsf{El}$ fibration along that section exactly yields the type $A$, that is: $\gamma : \Gamma \vdash \mathsf{El}\;(\mathsf{Code}\;A)\;\gamma \equiv A\;\gamma$.

We then consider the problem of constructing a classifier for semi-simplicial diagrams. Specifically, we are interested in Reedy fibrant semi-simplicial diagrams, which are the homotopical counterpart of the indexed formulation in syntax. Thus, such a classifier would consist of a generic fibration in the empty context $\cdot \vdash \mathsf{SST}\;\mathsf{type}$, along with a simplicial diagram tower of the form:

\begin{aligned}
& A : \mathsf{SST} \vdash \mathsf{El}_0\;A\;\mathsf{type} \\
& A : \mathsf{SST},\; a_{01} : \mathsf{El}_0\;A,\; a_{10} : \mathsf{El}_0\;A \vdash \mathsf{El}_1\;A\;a_{01}\;a_{10}\;\mathsf{type} \\
& A : \mathsf{SST},\; a_{001} : \mathsf{El}_0\;A,\; a_{010} : \mathsf{El}_0\;A,\; a_{011} : \mathsf{El}_1\;A\;a_{001}\;a_{010}, \\
&\quad\quad a_{100} : \mathsf{El}_0\;A,\; a_{101} : \mathsf{El}_1\;A\;a_{001}\;a_{100},\; a_{110} : \mathsf{El}_1\;A\;a_{010}\;a_{100} \\
&\quad\quad \vdash \mathsf{El}_2\;A\;a_{001}\;a_{010}\;a_{011}\;a_{100}\;a_{101}\;a_{110}\;\mathsf{type} \\
&\ldots
\end{aligned}

such that for any “small” simplicial diagram data over a context $\Gamma$, this data arises uniquely as the appropriate series of pullbacks constructed from some term $\gamma : \Gamma \vdash A\;\gamma : \mathsf{SST}$.

Note that, stated in this way, this is an infinitary or non-elementary universal property: it refers to infinite diagrams indexed by the external set of natural numbers (as opposed to any internal natural-numbers object that may exist in $\mathcal{C}$). The problem of defining semi-simplicial types can roughly be thought of as one of giving a finitary universal property for such an object, so that it could be characterized and even constructed in a finitary syntactic type theory.

A Finitary Universal Property for the Classifier

We have not, strictly speaking, solved this problem as originally stated. Indeed, we suspect that the classifier $\mathsf{SST}$ does not, on its own, have a finitary universal characterisation. However, we discovered that there is an enhancement of it that does have a finitary universal property, if we change the setting $\mathcal{C}$ in which we were working to the augmented semi-simplicial diagram model $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$. Then, there is a universally characterised diagram $\mathsf{SST}$ in $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$, of which the desired classifier in $\mathcal{C}$ is the discrete part, denoted $\lozenge\;\mathsf{SST}$.

If we playfully refer to the problem of categorically constructing a classifier for semi-simplicial objects as answering the question “what is a triangle?”, then we discovered that $\mathsf{SST}$ is naturally characterised in the model $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$, which is the setting for homotopy theory in which “everything is an augmented triangle”. In a sense, then, $\mathsf{SST}$ is the “augmented triangle of triangles”.

Intuitively, by working in the setting of infinitely coherent diagrams, the desired augmented diagram of diagrams can be assembled in a way that accounts for all the coherences. At first this may appear to be a convenient but inessential strengthening of the inductive hypothesis. However, passing to the world of augmented diagrams seems to play a much more essential role than this, because the finitary universal characterisation of $\mathsf{SST}$ uses properties of augmented diagrams in an essential way.

Specifically, the world of augmented diagrams is the place where we can make sense of the general notion of a “displayed element” of a type. This happens through the existence of an operation known as décalage, which is a kind of backwards shift operation. It’s very classical in homotopy theory (you can read much more about it starting on the nLab), but the basic idea is just that if you take a (fibred) semi-simplicial diagram $X$ and throw away the bottom object and the last face operators in each dimension, and relabel, you get another semi-simplicial diagram. In other words, on objects we have $(X^{\mathsf{D}})_n = X_{n+1}$. The face operators that we threw away now assemble into a map of semi-simplicial diagrams $\rho_X : X^{\mathsf{D}} \to X$.

If we convert this fibred formulation to an indexed one, this means that any object $X$ comes with an object $X^{\mathsf{d}}$ over itself, such that $X^{\mathsf{D}} \cong \sum_{x:X} X^{\mathsf{d}}\;x$. In practice, we keep the fibred formulation for contexts, but use the indexed version for types, and call it display. This gives the following rule:
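To make the shift concrete, here is a toy Python sketch of décalage on a semi-simplicial set presented combinatorially; the names and the representation are ours, and the augmented setting is elided.

```python
# A semi-simplicial set is modelled as a dict mapping dimension n to its list
# of n-simplices, each a tuple of vertices, with faces given by deleting vertices.

def decalage(X):
    """(X^D)_n = X_{n+1}: throw away the bottom level and reindex."""
    return {n: X[n + 1] for n in range(max(X))}

def rho(simplex):
    """The comparison map rho_X : X^D -> X, assembled from the discarded
    last face operators: here, drop the last vertex."""
    return simplex[:-1]

# The full 2-simplex on vertices 0, 1, 2:
X = {0: [(0,), (1,), (2,)],
     1: [(0, 1), (0, 2), (1, 2)],
     2: [(0, 1, 2)]}

XD = decalage(X)
print(XD[0])           # the edges of X become the 0-simplices of X^D
print(rho((0, 1, 2)))  # rho sends the top simplex to its last face (0, 1)
```

Summing the fibres of rho over a fixed base simplex recovers the indexed picture $X^{\mathsf{D}} \cong \sum_{x:X} X^{\mathsf{d}}\;x$ described above.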

\frac{\gamma : \Gamma \vdash A\;\gamma\;\mathsf{type}}{\gamma^+ : \Gamma^{\mathsf{D}},\; a : A\;(\rho_\Gamma\;\gamma^+) \vdash A^{\mathsf{d}}\;\gamma^+\;a\;\mathsf{type}}

The fact that display alters the context is the source of much technical difficulty, which we deal with in the paper using a modal type theory. We have two modes, the simplicial one corresponding to $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$, and the discrete one corresponding to $\mathcal{C}$ itself. Then we have three basic modalities. The first, denoted $\lozenge$, picks out the $(-1)$-simplices of an augmented semi-simplicial diagram, mapping the simplicial mode to the discrete mode. The second, denoted $\triangle$, builds a constant augmented semi-simplicial diagram, mapping the discrete mode to the simplicial mode. And the third, denoted $\Box$, takes the limit of a semi-simplicial diagram, mapping the simplicial mode to the discrete mode. Then décalage and display act only on types at the simplicial mode, but we can avoid décalaging the context if it comes from the discrete mode, giving a rule something like:

\frac{\gamma : \triangle\Gamma \vdash A\;\gamma\;\mathsf{type}}{\gamma : \triangle\Gamma,\; a : A\;\gamma \vdash A^{\mathsf{d}}\;\gamma\;a\;\mathsf{type}}

To be more precise, here $\triangle\Gamma$ refers to a “modal context lock”, and we actually allow part of the context to be simplicial and get décalaged. However, for the present we can ignore these modal issues and just work in the empty context $(\,)$, for which we have $(\,)^{\mathsf{D}} \equiv (\,)$. Ignoring this and other subtleties involving telescopes, we can now state more formally the universal property of $\mathsf{SST}$. Suppose that $Y$ is a type in the empty context (at the simplicial mode), a.k.a. a “closed type”. We define an endofunctor on closed types (at the simplicial mode) by:

\mathsf{F}(Y) \equiv \sum_{\upsilon : Y} \sum_{A : \mathsf{Type}} (A \to Y^{\mathsf{d}}\;\upsilon).

This endofunctor comes with an evident copointing $\epsilon_Y : \mathsf{F}(Y) \to Y$ by way of projection. Our proposed characterization of $\mathsf{SST}$ is that it is a terminal (copointed) coalgebra of the copointed endofunctor $(\mathsf{F},\,\epsilon)$. Thus, it is the universal object equipped with a map

\mathsf{SST} \to \sum_{X : \mathsf{SST}} \sum_{A : \mathsf{Type}} (A \to \mathsf{SST}^{\mathsf{d}}\;X)

whose first component is the identity. What remains, therefore, is two components $\mathsf{Z} : \mathsf{SST} \to \mathsf{Type}$ and $\mathsf{S} : (X : \mathsf{SST}) \to \mathsf{Z}\;X \to \mathsf{SST}^{\mathsf{d}}\;X$, exactly as proposed above. Once again, this corresponds to the following Agda-esque dTT code:

codata SST : Type where
  Z : SST → Type
  S : (X : SST) → Z X → SSTᵈ X


The model structure on $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$ is well known in the literature as the Reedy model structure. However, in the dTT paper, we present an explicit construction in the case of the augmented semi-simplex category. This presentation defines the relevant concepts mutually with décalage and display, and allows for definitionally strict commutation laws for display without the use of coherence theorems. These computation laws, for example, say that:

\mathsf{Type}^{\mathsf{d}}\;A \equiv A \to \mathsf{Type}

((a : A) \to B\;[\,a\,])^{\mathsf{d}}\;f \equiv (a : A^\rho)\,(a' : A^{\mathsf{d}}\;a) \to B^{\mathsf{d}}\;[\,a,\,a'\,]\;(f\;a)

These laws provide the promised algorithmic computation of displayed structures in terms of ordinary ones. If X denotes a kind of mathematical structure, the first law above says that for any type appearing in an X, a displayed X has a type dependent on that type, and the second says that similarly any function appearing in an X is tracked in a displayed X by a function lying above it. Thus, for instance, if $C$ is a category with object set $C_0$ and homs $\hom_C : C_0 \to C_0 \to \mathsf{Type}$, a displayed category has a dependent family of objects $D_0 : C_0 \to \mathsf{Type}$ and dependent hom-types lying over those of $C$, with composition and identity operations lying over those of $C$, and so on, exactly as in the original definition of displayed category.

However, these are also the classical observational computation laws for “unary parametricity”! These say that in a unary relational model, computability witnesses for a type are predicates on the type, and computability witnesses for a function $f$ are constructions that transform computability witnesses of an input $a$ into computability witnesses of the corresponding output $f\;a$. This basic structure is what underlies, for example, the normalisation proof for the STLC in Pierce’s Types and Programming Languages.

Consider the type of polymorphic identity functions. We have that:

((A : \mathsf{Type}) \to A \to A)^{\mathsf{d}}\;\mathsf{id} \equiv (A : \mathsf{Type})\,(P : A \to \mathsf{Type})\,(a : A) \to P\;a \to P\;(\mathsf{id}\;a)

Hence a computability witness of a polymorphic identity function $\mathsf{id}$ is a proof that $\mathsf{id}$ preserves arbitrary predicates. In displayed type theory, then, we have (with some abuse of notation for the modalities that we are ignoring):

id-thm : (f : △□ ((A : Type) → A → A)) (A : Type) (a : A) → f A a ≡ a
id-thm (△ (□ f)) A a = fᵈ A (λ x → x ≡ a) a refl

Thus any closed term of polymorphic identity function type is indeed an identity function at all types.

Parametricity means that, for example, the simplicial mode of dTT is incompatible with the existence of features such as universal decidable equality. Indeed, suppose that we had decidable equality on the universe. Then one could construct a bad polymorphic identity function which is everywhere the identity function, except at the type $\mathsf{Nat}$, where it is the constant function at $2$. Such a construction would violate the theorem above. Indeed, the spirit of most parametricity-violating results is either classical or non-constructive. This is because, morally, if you do everything constructively, then any definition that you write down should be natural and respect all relations.

Using the Universal Property

Note that our definition/construction of $\mathsf{SST}$ yields a type at the simplicial mode, i.e. an object of $\mathcal{C}^{\Delta^{\mathrm{op}}_+}$. But the classical homotopy theorist is interested in our original homotopy theory $\mathcal{C}$. Thus, recalling that $\lozenge$ picks out the $(-1)$-simplices of an augmented semi-simplicial diagram, from the classical perspective the question of interest in working with semi-simplicial types is:

Given a context $\Gamma$ in the model being studied (i.e. the discrete mode), construct a term $\gamma : \Gamma \vdash A\;\gamma : \lozenge\;\mathsf{SST}$ representing the desired semi-simplicial type.

But in order to solve this problem, we have to work in the simplicial mode, where the universal property is stated. This means that we have to find a context $\widetilde{\Gamma}$ at the simplicial mode such that $\lozenge\;\widetilde{\Gamma} \equiv \Gamma$, i.e. to extend $\Gamma$ to a diagram of which it is the discrete part. Doing so is always possible, as we may consider $\triangle\;\Gamma$, which extends $\Gamma$ coskeletally. However, the issue is that any categorical universal properties that held of $\Gamma$ in the discrete mode will not necessarily continue to hold of $\triangle\;\Gamma$ as a diagram; in particular, any properties of $\Gamma$ that are expressed in the syntax of type theory vanish. It is thus necessary to extend $\Gamma$ to a diagram in a bespoke way, such that the relevant properties that hold of $\Gamma$ discretely continue to hold of $\widetilde{\Gamma}$ simplicially.

Fortunately, every construction of constructive intensional type theory (e.g. \Pi-types, universes, and inductive types) makes sense for diagrams. Thus, metatheoretically, if \Gamma is a purely syntactic entity, one can construct \widetilde{\Gamma} by structural recursion on the definition of \Gamma, replacing all discrete syntax with its simplicial counterpart. However, from the perspective of the homotopy theorist, the model category can have many semantic entities not captured by the syntax of type theory. For example, the semantics of dTT are compatible with the model at the discrete mode having universal decidable equality. If \Gamma were to use such a parametricity-violating construction in an essential way in its definition, the obstacle would be that \Gamma could not be lifted to a \widetilde{\Gamma} reflecting all of the same categorical properties.

Parametricity thus serves as a kind of filter regarding what categorical universal properties of \Gamma can be invoked when applying the universal property of semi-simplicial types. The categorical meaning of parametricity is then the question of which universal constructions in an arbitrary model category \mathcal{C} exist in the diagram model \mathcal{C}^{\Delta^{\mathrm{op}}_+}.

Simplices, cubes, and symmetry

It is a curious fact that to give a universal property to the type \mathsf{SST} of semi-simplicial types, we find ourselves needing to work in the model \mathcal{C}^{\Delta^{\mathrm{op}}_+} whose objects are augmented semi-simplicial types in \mathcal{C} — curious that the two are closely related and yet not identical. It’s possible there is something deeper going on here, but one explanation is that this is a coincidence arising from another coincidence: the fact that the augmented semi-simplex category coincides with the unary semi-cube category.

Normally when we think of a cubical set, we think of cubes as powers of an interval object that has two endpoints. Thus, an n-dimensional cube has 2n faces of dimension (n-1), arising by choosing one of the n dimensions and then an endpoint in that dimension. However, essentially the same formal construction works for an “interval” having k endpoints for any natural number k. We can visualize the case k=3, for instance, as consisting of powers of an interval with its midpoint also distinguished along with its endpoints, so that a 2-cube looks like a window subdivided into four panes.

In the case k=1, it turns out that the faces of a “unary” n-cube have exactly the same combinatorial structure as those of an augmented (n-1)-simplex. We can even see this geometrically: the unary cubes can be thought of as powers of a half-open interval [0,1) or ray [0,\infty) with only one endpoint, and there is a face-respecting embedding of the (n-1)-simplex in the first n-orthant [0,\infty)^n as \{ (x_1,\dots,x_n) \mid x_1+\cdots+x_n = 1\}. Thus, we can equivalently think of dTT as having semantics in the model of unary semi-cubical diagrams.

This point of view is also suggested by the connection to parametricity, where one can also consider k-ary parametricity for any k. Indeed, binary parametricity seems to be more useful than unary parametricity: many of the “free theorems” arising from parametricity require the binary version. (Nullary parametricity is also possible, and is closely related to nominal sets.) Ordinary internal k-ary parametricity has semantics in k-ary cubical diagrams (having degeneracies as well as faces), and one might expect that there is an analogous sort of “k-ary dTT” with semantics in k-ary semi-cubical diagrams.

For those having experience with cubical type theories, we should emphasize that unlike the cubes often used there, these cubes do not have diagonals or connections. This is essential to get the correct behavior of \Pi-types, for instance. (However, cubes without diagonals and connections were used in the first BCH cubical model of homotopy type theory, and also in the more recent Higher Observational Type Theory.)

Another question this perspective emphasizes is the presence or absence of symmetries. Here the simplest sort of symmetry in dTT would be an isomorphism A^{\mathsf{dd}}\,x\,y_0\,y_1 \cong A^{\mathsf{dd}}\,x\,y_1\,y_0. Our current theory does not admit such operations, in contrast to internal parametricity and cubical type theories, where they have been found essential; the modal guards on the display operation allow us to omit them. This makes certain things easier, such as our explicit construction of the diagram model \mathcal{C}^{\Delta^{\mathrm{op}}_+}, but may ultimately be a limitation: in particular, it seems that we cannot give a useful universal property to \mathsf{SST}^{\mathsf{d}} without symmetries.

Conclusion and Vistas

In conclusion, what we achieve is to specify a new type theory, displayed type theory (dTT), which contains a unary parametricity operator (-)^\mathsf{d} that behaves “observationally” (computes on type-formers), and which is “modally guarded” so that it can only be applied to types in the empty context, or more generally in a modal context. Inside this type theory, we specify a general notion of “displayed coinductive type”, of which our proposed definition of \mathsf{SST} is an instance.

We then construct, from any model of type theory having limits of countable towers of fibrations, a model of dTT, and show that in this model displayed coinductive types are modeled by terminal coalgebras of copointed endofunctors, which can be constructed as countable inverse limits. Moreover, we identify the corresponding tower for \mathsf{SST} as consisting of the classifiers for n-truncated semi-simplicial types, so that the limit object \mathsf{SST} is, indeed, a classifier of semi-simplicial types.

Intriguingly, this leaves open the question of models of dTT and \mathsf{SST} without countable limits. We conjecture that there are models of dTT, perhaps obtained from realizability, in which an object satisfying our universal property for \mathsf{SST} exists, but is not an external classifier of semi-simplicial types. Instead, we expect it should be a classifier of “internal” or “uniform” semi-simplicial types, which is exactly what one would hope to be able to talk about in a realizability model. If this were true, then our universal property for semi-simplicial types would be more general than the classical external infinitary one, with possible implications for the notion of “elementary (\infty,1)-topos”.

n-Category Café Semi-Simplicial Types, Part I: Motivation and History

(Jointly written by Astra Kolomatskaia and Mike Shulman)

This is part one of a three-part series of expository posts on our paper Displayed Type Theory and Semi-Simplicial Types. In this part, we motivate the problem of constructing SSTs and recap its history.

A Prospectus

There are different ways to describe the relationship between type theory and set theory, but one analogy views set theory as like machine code on a specific CPU architecture, and type theory as like a high level programming language. From this perspective, set theory has its place as a foundation because almost any structure that one thinks about can be encoded through a series of representation choices. However, since the underlying reasoning of set theory is untyped, it can violate the principle of equivalence. Thus, for example, there is no guarantee that theorems proved in set theory about groups automatically translate to theorems about group objects internal to a category.

Within the programming language analogy, one can fully define a high level programming language and its operational semantics without specifying any particular compiler or any concept of a CPU architecture. Similarly, type theory allows one to reason with concepts defined in a purely operational, as opposed to representational, manner. The goal of type theory is to create expressive and semantically general languages for reasoning about mathematics.

Homotopy Type Theory (HoTT) is a perspective on intensional dependent type theory which regards types as homotopical spaces. In HoTT, one is only allowed to speak of concepts “up to homotopy”. This feature allows one to interpret HoTT into any \infty-topos. This is a fascinating state of affairs, because, in general, the constructions of higher category theory, among all those in mathematics, are the ones that sit least comfortably in a set-theoretic foundation. Thus, much of the excitement about HoTT has involved its promise to provide a language capable of reasoning about higher structures.

So far, however, the type theories used for HoTT have been limited in the generality of the higher structures they can discuss. With types as homotopical spaces, structures defined using a finite number of these and maps between them can be represented. For instance, the language of HoTT has been great for formulating 1-category theory, and there exist large formalised libraries, such as the 1lab, with such results; a lot of abstract homotopy theory turns out to be doable in this way as well, sometimes by using wild categories. But 1-categories and wild categories have only two layers of structure, objects and morphisms, while we would hope also to reason internally about structures that have infinite towers of layered structure, such as \infty-categories. However, such structures have thus far resisted all attempts at definition!

One simple case of such an infinitary structure is a semi-simplicial type (SST). This is particularly important because many notions of classical higher category theory are traditionally formulated using simplicial or semi-simplicial objects. Thus, if we had a tractable approach to SSTs in HoTT, we could expect that many, if not all, other infinitary structures could also be encoded. This is one reason that the problem of defining SSTs, which was originally proposed by Vladimir Voevodsky over a decade ago, has become one of the most important open problems in Homotopy Type Theory.

SSTs: The Fibred Perspective

To explain the problem of defining SSTs, we start with a classical perspective grounded first in set theory and later in homotopy theory. A semi-simplicial set is defined to consist of sets X_n for n \geq 0, along with face maps \partial_k : X_n \to X_{n-1}, for k \in \{ 0, \ldots, n \}, satisfying the relations: \partial_k \circ \partial_l = \partial_{l-1} \circ \partial_k \quad \text{for} \quad k \lt l. One thinks of X_n as the set of n-simplices, and of the face maps as giving the boundary components of a given n-simplex. For example, X_0 is the set of points, X_1 is the set of lines, and X_2 is the set of triangles. A triangle has three boundary lines which share three boundary points in common.
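The 2-truncated case of this definition can be transcribed directly as a record; here is a sketch in Lean 4 (the field names `d`, `e`, `rel` are ours):

```lean
-- A 2-truncated semi-simplicial set: points, lines, and triangles,
-- with face maps satisfying ∂ₖ ∘ ∂ₗ = ∂_{l-1} ∘ ∂ₖ for k < l.
structure SemiSimplicialSet₂ where
  X₀ : Type                       -- points
  X₁ : Type                       -- lines
  X₂ : Type                       -- triangles
  d₀ : X₁ → X₀                    -- ∂₀ on lines
  d₁ : X₁ → X₀                    -- ∂₁ on lines
  e₀ : X₂ → X₁                    -- ∂₀ on triangles
  e₁ : X₂ → X₁                    -- ∂₁ on triangles
  e₂ : X₂ → X₁                    -- ∂₂ on triangles
  rel₀₁ : ∀ f : X₂, d₀ (e₁ f) = d₀ (e₀ f)   -- ∂₀∂₁ = ∂₀∂₀
  rel₀₂ : ∀ f : X₂, d₀ (e₂ f) = d₁ (e₀ f)   -- ∂₀∂₂ = ∂₁∂₀
  rel₁₂ : ∀ f : X₂, d₁ (e₂ f) = d₁ (e₁ f)   -- ∂₁∂₂ = ∂₁∂₁
```

Note that in Lean the `rel` fields live in the proof-irrelevant `Prop`, which sidesteps exactly the coherence problem the post goes on to describe for homotopy types.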

The problem of constructing semi-simplicial types can intuitively be thought of as the problem of constructing semi-simplicial sets, but with homotopy types replacing sets. Here, the key is that the semi-simplicial identities from before would now read: \alpha_{k,\,l} : \partial_k \circ \partial_l \simeq \partial_{l-1} \circ \partial_k \quad \text{for} \quad k \lt l. Thus, we have replaced the strict set-theoretic notion of equality with a homotopical, proof-relevant form of equality, meaning that the choice of \alpha_{k,\,l} now carries data. In order for this notion to give something useful, we must impose coherences on these data. At the first level, for k \lt l \lt m, we can prove that \partial_k \circ \partial_l \circ \partial_m \simeq \partial_{m-2} \circ \partial_{l-1} \circ \partial_k in two different ways, and we would like those proofs to themselves be equal. This requires providing coherences \beta_{k,\,l,\,m} such that: \beta_{k,\,l,\,m} : \alpha_{k,\,l} \star \partial_m \cdot \partial_{l-1} \star \alpha_{k,\,m} \cdot \alpha_{l-1,\,m-1} \star \partial_k \simeq \partial_k \star \alpha_{l,\,m} \cdot \alpha_{k,\,m-1} \star \partial_l \cdot \partial_{m-2} \star \alpha_{k,\,l}, which we can visualise by the following diagram:


Of course, now the β k,l,m\beta_{k,\,l,\,m} themselves carry data, and we have to impose coherences on those. These identities come up in the context of quadruples of deletions. The first identity is given by a square diagram that says that the α k,l\alpha_{k,\,l} homotopies applied to non-interacting indices commute. The second identity is given one dimension up, and describes a filler for the following figure, called a permutohedron, whose faces are the previously mentioned hexagons and squares. It can be visualised via the illustration (by Tilman Piesk, from Wikimedia):


Writing down a formula for this is complicated, and things only become worse when you consider sequences of five or more deletions! We have thus run into the fundamental obstacle to defining infinitary structures. This goes by the technical name of Higher Coherence Issues.

When doing homotopy theory based on set theory, it is possible to overcome this problem, because we always have at our disposal the strict notion of set-theoretic equality for any objects, including homotopy types. Thus, for instance, we can talk about strict semi-simplicial spaces, where the X_n are now spaces and the maps \partial_m are continuous, but the equalities \partial_k \circ \partial_l = \partial_{l-1} \circ \partial_k hold strictly, on the nose, and therefore all the higher coherences also hold strictly. Alternatively, we can explicitly define all the higher permutohedra and say what a “homotopy coherent semi-simplicial space” is, using the strict equality to specify how the permutohedra fit together. But in ordinary homotopy type theory, these approaches are unavailable.

SSTs: The Indexed Perspective

The previous section demonstrates a general phenomenon related to infinitary structures: as soon as the symbol (=) gets used in the definition, one is plunged into also constraining the values of this data-carrying equality by way of an infinite tower of coherences, each depending on the definitions of all prior ones and growing in complexity as the dimensions increase.

One promising approach, then, would be to try to define higher structures without reference to equality. In the case of semi-simplicial types, one can think of an intuitive definition which promises to do so, by breaking up “the set of n-simplices” into a family of sets indexed by their boundaries. For example, we split up the total space of lines into many separate spaces of lines, each joining two definite endpoints (although the dependence of these indexed spaces on the endpoints is continuous). This is analogous to the two basic ways to define a category: with one collection of morphisms, or with a family of collections of morphisms.
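The category analogy can be made concrete. Restricting to just objects and morphisms, the two presentations look as follows in a Lean 4 sketch (names ours):

```lean
-- Fibred presentation: one total collection of morphisms,
-- with source and target maps picking out the boundary.
structure FibredGraph where
  Obj : Type
  Mor : Type
  src : Mor → Obj
  tgt : Mor → Obj

-- Indexed presentation: a family of morphism collections,
-- one for each pair of endpoints.
structure IndexedGraph where
  Obj : Type
  Hom : Obj → Obj → Type
```

In the indexed presentation the boundary of a morphism is part of its type, so no equations about sources and targets ever need to be stated.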

\begin{aligned}
&A_0 : \mathsf{Type} \\
&A_1 : A_0 \to A_0 \to \mathsf{Type} \\
&A_2 : (x\;y\;z : A_0) \to A_1\;x\;y \to A_1\;x\;z \to A_1\;y\;z \to \mathsf{Type} \\
&A_3 : (x\;y\;z\;w : A_0)\; (\alpha : A_1\;x\;y)\; (\beta : A_1\;x\;z)\; (\gamma : A_1\;y\;z)\; (\delta : A_1\;x\;w)\; (\epsilon : A_1\;y\;w)\; (\zeta : A_1\;z\;w) \\
&\quad \to A_2\;x\;y\;z\;\alpha\;\beta\;\gamma \to A_2\;x\;y\;w\;\alpha\;\delta\;\epsilon \to A_2\;x\;z\;w\;\beta\;\delta\;\zeta \to A_2\;y\;z\;w\;\gamma\;\epsilon\;\zeta \to \mathsf{Type} \\
&\ldots
\end{aligned}

Roughly, then, in this approach a semi-simplicial type is an “infinite record type” whose fields specify notions of points, lines, triangles, etc. When comparing this to the notion from the previous section, we call the previous one fibred and this one indexed. The face maps of the fibred formulation simply become index lookups in the indexed formulation, and this non-data is automatically infinitely coherent.

Of course, this is not yet precise either: there is the problem of what the ellipsis represents, and the lack of a notion of an infinite record type. But there is evidently some kind of pattern, so it seems intuitive that this direction would be more promising as an approach to defining SSTs in type theory.
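While the full infinite record is the open problem, any finite truncation of it is unproblematic. A Lean 4 transcription of the first three fields above (names ours):

```lean
-- A 2-truncated indexed semi-simplicial type: each simplex type is
-- indexed by its full boundary, so no face-map equations are needed.
structure SST₂ where
  A₀ : Type
  A₁ : A₀ → A₀ → Type
  A₂ : (x y z : A₀) → A₁ x y → A₁ x z → A₁ y z → Type
```

The difficulty lies entirely in internalizing the pattern that generates the n-th field for every n at once.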

Equivalence of the Indexed and Fibred Formulations

We can begin to argue that the indexed and fibred definitions are equivalent, when all coherences are included in the latter, by considering the truncated cases that only go up to n-simplices for some finite n. For example, suppose that we are given the data \partial_0,\,\partial_1 : X_1 \to X_0 and \partial_0,\,\partial_1,\,\partial_2 : X_2 \to X_1. We would like to use this data to define indexed types. At the first two stages, we define:

\begin{aligned}
&\widetilde{X}_0 \equiv X_0 \\
&\widetilde{X}_1\;(x,\,y) \equiv \sum_{\alpha \,:\, X_1}(\partial_1\;\alpha = x) \times (\partial_0\;\alpha = y)
\end{aligned}

For the next stage, we are defining \widetilde{X}_2\,\big(x,\,y,\,z,\,(\alpha,\,p_0,\,q_0),\,(\beta,\,p_1,\,q_1),\,(\gamma,\,p_2,\,q_2)\big).

One may be tempted to say that this consists of \mathfrak{f} : X_2 with boundary data \alpha,\,\beta,\,\gamma, by asserting, for example, that \partial_2\;\mathfrak{f} = \alpha. However, this equality in X_1 leaves the endpoints free. For example, in the case of the singular semi-simplicial type of a type X, so long as the lines \partial_2\;\mathfrak{f} and \alpha lived in the same connected component of X, they could be identified by this criterion. We see, then, that this comparison should be performed in the type \widetilde{X}_1\,(x,\,y).

In order to create an element of \widetilde{X}_1\,(x,\,z), we want to use \partial_1\;\mathfrak{f}. We then have that r_2 : \partial_0\;\partial_1\;\mathfrak{f} = z, giving us a proof that the right endpoint is z. However, for the left endpoint, we only have r_0 : \partial_1\;\partial_2\;\mathfrak{f} = x, and we need to concatenate this on the left with an equality \partial_1\;\partial_1\;\mathfrak{f} = \partial_1\;\partial_2\;\mathfrak{f} in order to show that \partial_1\;\partial_1\;\mathfrak{f} = x, as required. A similar analysis applies for \widetilde{X}_1\,(y,\,z). Thus we require the commutation identities: \alpha_{k,\,l} : \partial_k \circ \partial_l = \partial_{l-1} \circ \partial_k \quad \text{for} \quad k \lt l. Provided these identities as part of our starting data, we would then complete the definition as follows:

\begin{aligned}
\widetilde{X}_2\,&\big(x,\,y,\,z,\,(\alpha,\,p_0,\,q_0),\,(\beta,\,p_1,\,q_1),\,(\gamma,\,p_2,\,q_2)\big) \equiv \\
&\sum_{(\mathfrak{f}\,:\,X_2)}\,\sum_{(r_0\,:\,\partial_1\,\partial_2\,\mathfrak{f}\,=\,x)}\,\sum_{(r_1\,:\,\partial_0\,\partial_2\,\mathfrak{f}\,=\,y)}\,\sum_{(r_2\,:\,\partial_0\,\partial_1\,\mathfrak{f}\,=\,z)}\, \\
&\quad (\partial_2\;\mathfrak{f},\,r_0,\,r_1) =_{\widetilde{X}_1\,(x,\,y)} (\alpha,\,p_0,\,q_0) \\
&\quad \times\; (\partial_1\;\mathfrak{f},\,\mathsf{apeq}\;\alpha_{1,\,2}\;\mathfrak{f} \cdot r_0,\,r_2) =_{\widetilde{X}_1\,(x,\,z)} (\beta,\,p_1,\,q_1) \\
&\quad \times\; (\partial_0\;\mathfrak{f},\,\mathsf{apeq}\;\alpha_{0,\,2}\;\mathfrak{f} \cdot r_1,\,\mathsf{apeq}\;\alpha_{0,\,1}\;\mathfrak{f} \cdot r_2) =_{\widetilde{X}_1\,(y,\,z)} (\gamma,\,p_2,\,q_2)
\end{aligned}

Using contractible singletons and path algebra, one can show that forming the total spaces of the resulting indexed types leads to types equivalent to X_0,\,X_1,\,X_2. Similarly, starting off with indexed types \widetilde{X}_0,\,\widetilde{X}_1,\,\widetilde{X}_2, forming their total spaces, and then performing the above construction results in equivalent types. This demonstrates an equivalence between the indexed and fibred definitions up to the second stage.
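The first stage of this fibred-to-indexed translation can be transcribed as follows; a Lean 4 sketch (names ours). Note that Lean's propositional equality is proof-irrelevant, unlike the identity type of HoTT, so this only approximates the homotopical construction:

```lean
-- Fibred data: lines X₁ over points X₀, with the two face maps.
variable (X₀ X₁ : Type) (d₀ d₁ : X₁ → X₀)

-- X̃₁ x y: the lines whose faces are (propositionally) x and y.
-- This is the Σ-type from the text, written as a Lean subtype.
def indexedLines (x y : X₀) : Type :=
  { α : X₁ // d₁ α = x ∧ d₀ α = y }
```

The second stage, `indexedTriangles`, would take six arguments of these types and impose the three comparisons in the displayed formula above.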

The Essence of the Problem

One can continue the above analysis to the third stage, although writing out the details would be exceptionally painful. But we can at least extrapolate the way in which the higher coherences would play a role in the definition. This leads us to two conclusions. Firstly, that the higher coherences are necessary in the fibred formulation, if we would like to extract indexed simplex types from it (which we undoubtedly would). And secondly, that since the indexed perspective is equivalent to the fibred perspective, solving the problem of defining indexed SSTs in type theory would be tantamount to solving the coherence issues in the fibred perspective; thus we should expect this problem to be more difficult than it seems.

Indeed, every naive approach to defining SSTs through the indexed formulation seems to run into the same kinds of higher coherence issues. Almost without exception, whatever clever scheme one comes up with for formalizing the pattern, it eventually transpires that in order to complete the construction, one needs to simultaneously prove a lemma about the construction. And then in order to complete that lemma, one needs to prove a meta-lemma about the proof of the lemma. In order to prove the meta-lemma, one needs to prove a meta-meta-lemma about the proof of the meta-lemma. And so on…

It’s difficult to give any more details in general. It seems that to truly appreciate this, one almost has to come up with one’s own idea for defining SSTs and try to implement it in a proof assistant. From the outside, there appears to be an obvious pattern to the structure of the n-simplex types, so one doesn’t expect it to be so hard going in. And the infinite regress tends to pop up in surprising places, when proving lemmas that seem so obvious that one tends to leave them for last (or neglects to write them down at all on paper), assuming their proofs will be easy.


The problem of semi-simplicial types, and higher coherence more generally, is also closely connected to the problem of autophagy, or “HoTT eating itself”. In fact both of us ran into this connection independently, Mike in a blog post from almost exactly ten years ago, and Astra in a second attempt to understand why the problem was hard.

The idea is that since the pattern in the indexed n-simplex types can be defined syntactically, if we could define the syntax and typing rules of HoTT inside of HoTT, and write a self-interpreter that takes an internally defined well-typed type or term and returns an actual type or term, then we could define the n-simplex types syntactically and then apply the self-interpreter. However, in the course of trying to write a self-interpreter, one encounters essentially the same permutohedral identities described above. Not every approach to constructing SSTs has to go through syntax, of course, but this suggests that the problem of SSTs is closely related to the problem of self-interpreters and a notion of infinitely coherent syntax for type theories. Indeed, one may hope that perhaps solving SSTs would be sufficient to enable self-interpretation, as we hope it would be for other higher coherence problems.

We now discuss two alternative approaches to solving this related collection of problems.

The Two-Level Approach

As noted above, in classical homotopy theory, it is possible to define (fibred) semi-simplicial types without needing infinite coherences, by using the ambient strict set-theoretic notion of equality. Thus, one way to avoid the problem of infinite coherences in HoTT is to re-introduce a stricter notion of equality. Two-level type theory (2LTT), formulated by Annenkov, Capriotti, Kraus, and Sattler, following an idea of Voevodsky, achieves this by stratifying types into “inner” or “fibrant” types, which are homotopical, and “outer” or “non-fibrant” “exo-types”, which are not. The non-fibrant equality exo-type then plays the role of the strict set-level equality in classical homotopy theory, enabling a correct definition of semi-simplicial types without incorporating higher coherences… under an additional hypothesis.

Specifically, in two-level type theory there is both a fibrant natural numbers type (“nat”) and a non-fibrant natural numbers exo-type (“exo-nat”). Without additional assumptions on the relation between these two, all we can define (apparently) is the family of types of n-truncated semi-simplicial types, indexed by n in exo-nat. The “limit” of these types can be easily constructed, but without further assumptions it is only an exo-type, not a fibrant type.

A sufficient assumption for this is that the two kinds of natural numbers coincide, or equivalently that exo-nat is fibrant. This appears to be a fairly strong axiom, however; it holds in the “classical” simplicial model, but it is unknown whether all (\infty,1)-toposes can be presented by a model in which it holds. A better axiom, therefore, is that exo-nat is “cofibrant”, a technical term from 2LTT essentially saying that \Pi-types with it as their domain preserve fibrancy, and therefore in particular that the limit of a tower of fibrant types is fibrant. Elif Uskuplu has recently shown that any model of type theory whose types are closed under externally indexed countable products (including models for all (\infty,1)-toposes) can be enhanced to a model of 2LTT in which exo-nat is cofibrant.

Thus, this approach has reasonable semantic generality. However, it is unclear how practical it is for formalization in proof assistants. Paper proofs in 2LTT often assume that the exo-equality satisfies the “reflection rule” and hence coincides with definitional equality. But this is very difficult to achieve in a proof assistant, so implementations of 2LTT (such as Agda’s recent two-level flag) usually instead assume merely that the exo-equality satisfies Uniqueness of Identity Proofs. Unfortunately, this means we have to transport across exo-equalities explicitly in terms, which tends to lead to large combinatorial blowups in proofs.

Informally, one can argue that 2LTT is a “brute force” solution: we internalize the entire metatheory (the universe of exo-types), and then assume that whatever infinite constructions we want (e.g. exo-nat-indexed products) can be reflected into the original type theory. We would like a solution that is more closely tailored to our goal, allowing more external equalities to be represented definitionally in the syntax.

The Synthetic Approach

Another approach is to give up on the goal of defining (semi-)simplicial types and instead axiomatize their behavior. This is analogous to how ordinary homotopy type theory axiomatizes the behavior of \infty-groupoids rather than defining them in terms of sets. In type theory we call this a “synthetic” approach, in contrast to the “analytic” approach of defining them out of sets, making an analogy to the contrast between Euclid’s “synthetic geometry” of undefined points and lines and the “analytic geometry” of pairs of real numbers.

Mike and Emily Riehl formulated a “simplicial type theory” like this in A type theory for synthetic ∞-categories, where the types behave like simplicial objects. Specifically, there is a “directed interval” type that can be used to detect this simplicial structure, analogous to the undirected interval in cubical type theory that detects the homotopical structure. One can then define internally which types are “Segal” and “Rezk” and start to develop “synthetic higher category theory” with these types.

This sort of synthetic higher category theory is under active investigation, and shows a lot of promise. In particular, there is now a proof assistant called Rzk implementing it, and many of the basic results have been formalized by Nikolai Kudasov, Jonathan Weinberger, and Emily. Many of us regard this theory (and its relatives such as “bicubical” type theory) as the most practical approach to “directed type theory” currently available.

However, taking a synthetic approach has also undeniably changed the question. For various reasons, it would be interesting and valuable to have a type theory in which we can define (semi-)simplicial types rather than postulating them. This is the problem addressed in our paper, to which we will turn in the second post of this series.

April 14, 2024

John BaezProtonium

It looks like they’ve found protonium in the decay of a heavy particle!

Protonium is made of a proton and an antiproton orbiting each other. It lasts a very short time before they annihilate each other.

It’s a bit like a hydrogen atom where the electron has been replaced with an antiproton! But it’s much smaller than a hydrogen atom. And unlike a hydrogen atom, which is held together by the electric force, protonium is mainly held together by the strong nuclear force.

There are various ways to make protonium. One is to make a bunch of antiprotons and mix them with protons. This was done accidentally in 2002. They only realized this upon carefully analyzing the data 4 years later.

This time, people were studying the decay of the J/psi particle. The J/psi is made of a heavy quark and its antiparticle. It’s 3.3 times as heavy as a proton, so it’s theoretically able to decay into protonium. And careful study showed that yes, it does this sometimes!

The new paper on this has a rather dry title—not “We found protonium!” But it has over 550 authors, which hints that it’s a big deal. I won’t list them.

• BESIII Collaboration, Observation of the anomalous shape of X(1840) in J/ψ→γ3(π+π−), Phys. Rev. Lett. 132 (2024), 151901.

The idea here is that sometimes the J/ψ particle decays into a gamma ray and 3 pion-antipion pairs. When they examined this decay, they found evidence that an intermediate step involved a particle of mass 1880 MeV/c², a bit more than an already known intermediate of mass 1840 MeV/c².

This new particle’s mass is within a few MeV of twice the mass of a proton, 938 MeV/c². So, there’s a good chance that it’s protonium!
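As a quick sanity check on these numbers (the proton mass 938.272 MeV/c² is the CODATA value; 1880 MeV/c² is the figure quoted above), a minimal Python snippet:

```python
# Compare the candidate's mass with the proton-antiproton rest mass
# (all values in MeV/c^2).
proton_mass = 938.272      # CODATA proton mass
candidate_mass = 1880.0    # reported mass of the intermediate state

threshold = 2 * proton_mass          # rest mass of a proton + antiproton
print(threshold)                     # 1876.544
print(candidate_mass - threshold)    # about 3.5 MeV from the pair threshold
```

The candidate sits only a few MeV from the two-proton threshold, which is why a proton-antiproton bound-state interpretation is plausible.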

But how did physicists make protonium by accident in 2002? They were trying to make antihydrogen, which is a positron orbiting an antiproton. To do this, they used the Antiproton Decelerator at CERN. This is just one of the many cool gadgets they keep near the Swiss-French border.

You see, to create antiprotons you need to smash particles at each other at almost the speed of light—so the antiprotons usually shoot out really fast. It takes serious cleverness to slow them down and catch them without letting them bump into matter and annihilate.

That’s what the Antiproton Decelerator does. So they created a bunch of antiprotons and slowed them down. Once they managed to do this, they caught the antiprotons in a Penning trap. This holds charged particles using magnetic and electric fields. Then they cooled the antiprotons—slowed them even more—by letting them interact with a cold gas of electrons. Then they mixed in some positrons. And they got antihydrogen!

But apparently some protons got in there too, so they also made some protonium, by accident. They only realized this when they carefully analyzed the data 4 years later, in a paper with only a few authors:

• N. Zurlo, M. Amoretti, C. Amsler, G. Bonomi, C. Carraro, C. L. Cesar, M. Charlton, M. Doser, A. Fontana, R. Funakoshi, P. Genova, R. S. Hayano, L. V. Jorgensen, A. Kellerbauer, V. Lagomarsino, R. Landua, E. Lodi Rizzini, M. Macri, N. Madsen, G. Manuzio, D. Mitchard, P. Montagna, L. G. Posada, H. Pruys, C. Regenfus, A. Rotondi, G. Testera, D. P. Van der Werf, A. Variola, L. Venturelli and Y. Yamazaki, Production of slow protonium in vacuum, Hyperfine Interactions 172 (2006), 97–105.

Protonium is sometimes called an ‘exotic atom’—though personally I’d consider it an exotic nucleus. The child in me thinks it’s really cool that there’s an abbreviation for protonium, Pn, just like a normal element.

John Preskill“Once Upon a Time”…with a twist

The Noncommuting-Charges World Tour (Part 1 of 4)

This is the first in a four-part series covering the recent Perspectives article on noncommuting charges. I’ll be posting one part every 6 weeks leading up to my PhD thesis defence.

Thermodynamics problems have surprisingly many similarities with fairy tales. For example, most of them begin with a familiar opening. In thermodynamics, the phrase “Consider an isolated box of particles” serves a similar purpose to “Once upon a time” in fairy tales—both serve as a gateway to their respective worlds. Additionally, both have been around for a long time. Thermodynamics emerged in the Victorian era to help us understand steam engines, while Beauty and the Beast and Rumpelstiltskin, for example, originated about 4000 years ago. Moreover, each concludes with important lessons. In thermodynamics, we learn hard truths such as the futility of defying the second law, while fairy tales often impart morals like the risks of accepting apples from strangers. The parallels go on; both feature archetypal characters—such as wise old men and fairy godmothers versus ideal gases and perfect insulators—and simplified models of complex ideas, like portraying clear moral dichotomies in narratives versus assuming non-interacting particles in scientific models.1

Of all the ways thermodynamic problems are like fairy tales, one is most relevant to me: both have experienced modern reimaginings. Sometimes, all you need is a little twist to liven things up. In thermodynamics, noncommuting conserved quantities, or charges, have added a twist.


First, let me recap some of my favourite thermodynamic stories before I highlight the role that the noncommuting-charge twist plays. The first is the inevitability of the thermal state. Roughly, this means that, at most times, the state of almost any sufficiently small subsystem within the box will be close to a specific form (the thermal state).

The second is an apparent paradox that arises in quantum thermodynamics: How do the reversible processes inherent in quantum dynamics lead to irreversible phenomena such as thermalization? If you’ve been keeping up with Nicole Yunger Halpern‘s (my PhD co-advisor and fellow fan of fairy tales) recent posts on the eigenstate thermalization hypothesis (ETH) (part 1 and part 2) you already know the answer. The expectation value of a quantum observable is often a sum of terms with various phases. As time passes, these phases tend to interfere destructively, leading to a stable expectation value over long periods. This stable value tends to align with that of a thermal state. Thus, despite the apparent paradox, stationary dynamics in quantum systems are commonplace.
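This dephasing story is easy to see numerically. Here is a toy sketch (not from the post; the weights and frequencies are made up): an expectation value built from many oscillating terms starts at an atypical value and settles, by destructive interference, into small fluctuations around a steady value.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
weights = rng.random(N)
weights /= weights.sum()       # weights of the oscillating terms, summing to 1
omegas = rng.normal(size=N)    # generic (incommensurate) frequencies

def expectation(t):
    # All terms are in phase at t = 0; at late times the phases interfere destructively.
    return float(np.sum(weights * np.cos(omegas * t)))

print(expectation(0.0))        # atypical initial value (all terms aligned)
late = [abs(expectation(t)) for t in np.linspace(500, 1500, 200)]
print(np.mean(late))           # small residual fluctuations around the steady value
```

With 200 terms the late-time value hovers near zero with fluctuations of order 1/√N, which is the blog post's "stable expectation value over a longer period" in miniature.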

The third story is about how concentrations of one quantity can cause flows in another. Imagine a box of charged particles that’s initially out of equilibrium, such that there exist gradients in particle concentration and temperature across the box. The temperature gradient will cause a flow of heat (Fourier’s law) and charged particles (Seebeck effect) and the particle-concentration gradient will cause the same—a flow of particles (Fick’s law) and heat (Peltier effect). These movements are encompassed within Onsager’s theory of transport dynamics…if the gradients are very small. If you’re reading this post on your computer, the Peltier effect is likely at work for you right now by cooling your computer.

What do the various derivations of the thermal state’s form, the eigenstate thermalization hypothesis (ETH), and the Onsager coefficients have in common? Each concept is founded on the assumption that the system we’re studying contains charges that commute with each other (e.g. particle number, energy, and electric charge). It’s only recently that physicists have acknowledged that this assumption was even present.

This is important to note because not all charges commute. In fact, the noncommutation of charges underlies fundamental quantum phenomena, such as the Einstein–Podolsky–Rosen (EPR) paradox, uncertainty relations, and disturbances during measurement. This raises an intriguing question: how would the above-mentioned stories change if we introduce the following twist?

“Consider an isolated box with charges that do not commute with one another.” 

This question is at the core of a burgeoning subfield that intersects quantum information, thermodynamics, and many-body physics. I had the pleasure of co-authoring a recent perspective article in Nature Reviews Physics that centres on this topic. Collaborating with me in this endeavour were three members of Nicole’s group: the avid mountain climber, Billy Braasch; the powerlifter, Aleksander Lasek; and Twesh Upadhyaya, known for his prowess in street basketball. Completing our authorship team were Nicole herself and Amir Kalev.

To give you a touchstone, let me present a simple example of a system with noncommuting charges. Imagine a chain of qubits, where each qubit interacts with its nearest and next-nearest neighbours, such as in the image below.

The figure is courtesy of the talented team at Nature. Two qubits form the system S of interest, and the rest form the environment E. A qubit’s three spin components, σ_a for a = x, y, z, form the local noncommuting charges. The dynamics locally transport and globally conserve the charges.

In this interaction, the qubits exchange quanta of spin angular momentum, forming what is known as a Heisenberg spin chain. This chain is characterized by three charges: the total spin components in the x, y, and z directions, which I’ll refer to as Qx, Qy, and Qz, respectively. The Hamiltonian H conserves these charges, satisfying [H, Qa] = 0 for each a, and the three charges fail to commute with one another: [Qa, Qb] ≠ 0 for any pair a, b ∈ {x,y,z} with a ≠ b. It’s noteworthy that Hamiltonians can be constructed to transport various other kinds of noncommuting charges. I have discussed the procedure to do so in more detail here (to summarize that post: it essentially involves constructing a Koi pond).
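These commutation relations are easy to verify directly. Below is a minimal two-site check (my own sketch, not from the article; it uses bare Pauli matrices rather than spin-1/2 operators, which only changes normalizations, and a single Heisenberg bond rather than the full chain):

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
paulis = {"x": sx, "y": sy, "z": sz}

def comm(A, B):
    return A @ B - B @ A

# Two-site Heisenberg coupling: H = sum_a sigma_a (x) sigma_a
H = sum(np.kron(s, s) for s in paulis.values())

# Total spin components Q_a = sigma_a (x) I + I (x) sigma_a
Q = {a: np.kron(s, I2) + np.kron(I2, s) for a, s in paulis.items()}

for a in "xyz":
    assert np.allclose(comm(H, Q[a]), 0)          # H conserves each charge
assert not np.allclose(comm(Q["x"], Q["y"]), 0)   # but the charges don't commute
print("[H, Q_a] = 0 for all a, yet [Q_x, Q_y] != 0")
```

The same algebra ([Q_x, Q_y] = 2i Q_z for Pauli normalization) holds on chains of any length, since the commutators act site by site.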

This is the first in a series of blog posts where I will highlight key elements discussed in the perspective article. Motivated by requests from peers for a streamlined introduction to the subject, I’ve designed this series for a specific target audience: graduate students in physics. Additionally, I’m gearing up to defend my PhD thesis on noncommuting-charge physics next semester, and these blog posts will double as a fun way to prepare for that.

  1. This opening text was taken from the draft of my thesis. ↩

April 13, 2024

Doug NatelsonElectronic structure and a couple of fun links

Real life has been very busy recently.  Posting will hopefully pick up soon.  

One brief item.  Earlier this week, Rice hosted Gabi Kotliar for a distinguished lecture, and he gave a very nice, pedagogical talk about different approaches to electronic structure calculations.  When we teach undergraduate chemistry on the one hand and solid state physics on the other, we largely neglect electron-electron interactions (except for very particular issues, like Hund's Rules).  Trying to solve the many-electron problem fully is extremely difficult.  Often, approximating by solving the single-electron problem (e.g. finding the allowed single-electron states for a spatially periodic potential as in a crystal) and then "filling up"* those states gives decent results.   As we see in introductory courses, one can try different types of single-electron states.  We can start with atomic-like orbitals localized to each site, and end up doing tight binding / LCAO / Hückel (when applied to molecules).  Alternately, we can do the nearly-free electron approach and think about Bloch waves.  Density functional theory, discussed here, is more sophisticated but can struggle in situations where electron-electron interactions are strong.

One of Prof. Kotliar's big contributions is something called dynamical mean field theory, an approach to strongly interacting problems.  In a "mean field" theory, the idea is to reduce a many-particle interacting problem to an effective single-particle problem, where that single particle feels an interaction based on the averaged response of the other particles.  Arguably the most famous example is in models of magnetism.  We know how to write the energy of a spin \(\mathbf{s}_{i}\) in terms of its interactions \(J\) with other spins \(\mathbf{s}_{j}\) as \(\sum_{j} J \mathbf{s}_{i}\cdot \mathbf{s}_{j}\).  If there are \(z\) such neighbors that interact with spin \(i\), then we can try instead writing that energy as \(zJ \mathbf{s}_{i} \cdot \langle \mathbf{s}_{i}\rangle\), where the angle brackets signify the average.  From there, we can get a self-consistent equation for \(\langle \mathbf{s}_{i}\rangle\).  
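The self-consistent equation from that mean-field picture can be solved in a few lines. Here is a sketch for the simplest scalar-spin (Ising-like) version, where the self-consistency condition becomes ⟨s⟩ = tanh(zJ⟨s⟩/T) with k_B = 1; the values of z, J, and T below are my own illustrative choices:

```python
import math

def mean_field_magnetization(T, z=4, J=1.0, tol=1e-10):
    """Solve <s> = tanh(z*J*<s>/T) by fixed-point iteration (Ising mean field, k_B = 1)."""
    m = 1.0  # start from the fully polarized state
    for _ in range(100000):
        m_new = math.tanh(z * J * m / T)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

# Mean-field theory predicts ordering below T_c = z*J (here T_c = 4):
print(mean_field_magnetization(T=2.0))  # ordered phase: |m| > 0
print(mean_field_magnetization(T=5.0))  # disordered phase: m -> 0
```

The fixed-point structure is the whole content of the mean-field approximation: below T_c = zJ the trivial solution m = 0 becomes unstable and a self-consistent nonzero magnetization appears.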

Dynamical mean field theory is rather similar in spirit.  There are non-perturbative ways to solve some strong-interaction "quantum impurity" problems, and DMFT is a way of approximating a whole lattice of strongly interacting sites as a self-consistent quantum impurity problem for one site.  The solutions are not for wave functions but for the spectral function.  We still can't solve every strongly interacting problem, but Prof. Kotliar makes a good case that we have made real progress in how to think about many systems, and about when the atomic details matter.

*Here, "filling up" means writing the many-electron wave function as a totally antisymmetric linear combination of single-electron states, including the spin states.

PS - two fun links:

April 12, 2024

Matt von HippelThe Hidden Higgs

Peter Higgs, the theoretical physicist whose name graces the Higgs boson, died this week.

Peter Higgs, after the Higgs boson discovery was confirmed

This post isn’t an obituary: you can find plenty of those online, and I don’t have anything special to say that others haven’t. Reading the obituaries, you’ll notice they summarize Higgs’s contribution in different ways. Higgs was one of the people who proposed what today is known as the Higgs mechanism, the principle by which most (perhaps all) elementary particles gain their mass. He wasn’t the only one: Robert Brout and François Englert proposed essentially the same idea in a paper that was published two months earlier, in August 1964. Two other teams came up with the idea slightly later than that: Gerald Guralnik, Carl Richard Hagen, and Tom Kibble were published one month after Higgs, while Alexander Migdal and Alexander Polyakov found the idea independently in 1965 but couldn’t get it published till 1966.

Higgs did, however, do something that Brout and Englert didn’t. His paper doesn’t just propose a mechanism, involving a field which gives particles mass. It also proposes a particle one could discover as a result. Read the more detailed obituaries, and you’ll discover that this particle was not in the original paper: Higgs’s paper was rejected at first, and he added the discussion of the particle to make it more interesting.

At this point, I bet some of you are wondering what the big deal was. You’ve heard me say that particles are ripples in quantum fields. So shouldn’t we expect every field to have a particle?

Tell that to the other three Higgs bosons.

Electromagnetism has one type of charge, with two signs: plus, and minus. There are electrons, with negative charge, and their anti-particles, positrons, with positive charge.

Quarks have three types of charge, called colors: red, green, and blue. Each of these also has two “signs”: red and anti-red, green and anti-green, and blue and anti-blue. So for each type of quark (like an up quark), there are six different versions: red, green, and blue, and anti-quarks with anti-red, anti-green, and anti-blue.

Diagram of the colors of quarks

When we talk about quarks, we say that the force under which they are charged, the strong nuclear force, is an “SU(3)” force. The “S” and “U” there are shorthand for mathematical properties that are a bit too complicated to explain here, but the “(3)” is quite simple: it means there are three colors.

The Higgs boson’s primary role is to make the weak nuclear force weak, by making the particles that carry it from place to place massive. (That way, it takes too much energy for them to go anywhere, a feeling I think we can all relate to.) The weak nuclear force is an “SU(2)” force. So there should be two “colors” of particles that interact with the weak nuclear force…which includes Higgs bosons. For each, there should also be an anti-color, just like the quarks had anti-red, anti-green, and anti-blue. So we need two “colors” of Higgs bosons, and two “anti-colors”, for a total of four!

But the Higgs boson discovered at the LHC was a neutral particle. It didn’t have any electric charge, or any color. There was only one, not four. So what happened to the other three Higgs bosons?

The real answer is subtle, one of those physics things that’s tricky to concisely explain. But a partial answer is that they’re indistinguishable from the W and Z bosons.

Normally, the fundamental forces have transverse waves, with two polarizations. Light can wiggle along its path back and forth, or up and down, but it can’t wiggle forward and backward. A fundamental force with massive particles is different, because those particles can have longitudinal waves: they have an extra direction in which they can wiggle. There are two W bosons (plus and minus) and one Z boson, and they all get one more polarization when they become massive due to the Higgs.

That’s three new ways the W and Z bosons can wiggle. That’s the same number as the number of Higgs bosons that went away, and that’s no coincidence. We physicists like to say that the W and Z bosons “ate” the extra Higgs, which is evocative but may sound mysterious. Instead, you can think of it as the two wiggles being secretly the same, mixing together in a way that makes them impossible to tell apart.

The “count”, of how many wiggles exist, stays the same. You start with four Higgs wiggles, and two wiggles each for the precursors of the W+, W-, and Z bosons, giving ten. You end up with one Higgs wiggle, and three wiggles each for the W+, W-, and Z bosons, which still adds up to ten. But which fields match with which wiggles, and thus which particles we can detect, changes. It takes some thought to look at the whole system and figure out, for each field, what kind of particle you might find.
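The bookkeeping in that paragraph is simple enough to write out explicitly (just a tally of the counts stated above):

```python
# Physical polarizations ("wiggles") before and after the Higgs mechanism.
before = {
    "Higgs field (4 real components)": 4,
    "W+ precursor (massless)": 2,
    "W- precursor (massless)": 2,
    "Z precursor (massless)": 2,
}
after = {
    "Higgs boson": 1,
    "W+ (massive)": 3,
    "W- (massive)": 3,
    "Z (massive)": 3,
}

assert sum(before.values()) == sum(after.values())
print("total wiggles, before and after:", sum(before.values()))
```

Three of the four Higgs wiggles are relabeled as the longitudinal polarizations of the W+, W-, and Z, and only the one leftover wiggle shows up as a separate particle.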

Higgs did that work. And now, we call it the Higgs boson.

Matt Strassler Peter Higgs versus the “God Particle”

The particle physics community is mourning the passing of Peter Higgs, the influential theoretical physicist and 2013 Nobel Prize laureate. Higgs actually wrote very few papers in his career, but he made them count.

It’s widely known that Higgs deeply disapproved of the term “God Particle”. That’s the nickname that has been given to the type of particle (the “Higgs boson”) whose existence he proposed. But what’s not as widely appreciated is why he disliked it, as do most other scientists I know.

It’s true that Higgs himself was an atheist. Still, no matter what your views on such subjects, it might bother you that the notion of a “God Particle” emerged neither from science nor from religion, and could easily be viewed as disrespectful to both of them. Instead, it arose out of marketing and advertising in the publishing industry, and it survives due to another industry: the news media.

But there’s something else more profound — something quite sad, really. The nickname puts the emphasis entirely in the wrong place. It largely obscures what Higgs (and his colleagues/competitors) actually accomplished, and why they are famous among scientists.

Let me ask you this. Imagine a type of particle that

  • once created, vanishes in a billionth of a trillionth of a second,
  • is not found naturally on Earth, nor anywhere in the universe for billions of years,
  • has no influence on daily life — in fact it has never had any direct impact on the human species — and
  • only was discovered when humans started making examples artificially.

This doesn’t seem very God-like to me. What do you think?

Perhaps this does seem spiritual or divine to you, and in that case, by all means call the “Higgs boson” the “God Particle”. But otherwise, you might want to consider alternatives.

For most humans, and even for most professional physicists, the only importance of the Higgs boson is this: it gives us insight into the Higgs field. This field

  • exists everywhere, including within the Earth and within every human body,
  • has existed throughout the history of the known universe,
  • has been reliably constant and steady since the earliest moments of the Big Bang, and
  • is crucial for the existence of atoms, and therefore for the existence of Earth and all its life.

It may even be capable of bringing about the universe’s destruction, someday in the distant future. So if you’re going to assign some divinity to Higgs’ insights, this is really where it belongs.

In short, what’s truly consequential in Higgs’ work (and that of others who had the same basic idea: Robert Brout and Francois Englert, and Gerald Guralnik, C. Richard Hagen and Tom Kibble) is the Higgs field. Your life depends upon the existence and stability of this field. The discovery in 2012 of the Higgs boson was important because it proved that the Higgs field really exists in nature. Study of this type of particle continues at the Large Hadron Collider, not because we are fascinated by the particle per se, but because measuring its properties is the most effective way for us to learn more about the all-important Higgs field.

Professor Higgs helped reveal one of the universe’s great secrets, and we owe him a great deal. I personally feel that we would honor his legacy, in a way that would have pleased him, through better explanations of what he achieved — ones that clarify how he earned a place in scientists’ Hall of Fame for eternity.

April 11, 2024

Scott Aaronson Avi Wigderson wins Turing Award!

Back in 2006, in the midst of an unusually stupid debate in the comment section of Lance Fortnow and Bill Gasarch’s blog, someone chimed in:

Since the point of theoretical computer science is solely to recognize who is the most badass theoretical computer scientist, I can only say:



Avi Wigderson: central unifying figure of theoretical computer science for decades; consummate generalist who’s contributed to pretty much every corner of the field; advocate and cheerleader for the field; postdoc adviser to a large fraction of all theoretical computer scientists, including both me and my wife Dana; derandomizer of BPP (provided E requires exponential-size circuits). Now, Avi not only “owns you,” he also owns a well-deserved Turing Award (on top of his well-deserved Nevanlinna, Abel, Gödel, and Knuth prizes). As Avi’s health has been a matter of concern to those close to him ever since his cancer treatment, which he blogged about a few years ago, I’m sure today’s news will do much to lift his spirits.

I first met Avi a quarter-century ago, when I was 19, at a PCMI summer school on computational complexity at the Institute for Advanced Study in Princeton. Then I was lucky enough to visit Avi in Israel when he was still a professor at the Hebrew University (and I was a grad student at Berkeley)—first briefly, but then Avi invited me back to spend a whole semester in Jerusalem, which ended up being one of my most productive semesters ever. Then Avi, having by then moved to the IAS in Princeton, hosted me for a one-year postdoc there, and later he and I collaborated closely on the algebrization paper. He’s had a greater influence on my career than all but a tiny number of people, and I’m far from the only one who can say that.

Summarizing Avi’s scientific contributions could easily fill a book, but Quanta and New Scientist and Lance’s blog can all get you started if you’re interested. Eight years ago, I took a stab at explaining one tiny little slice of Avi’s impact—namely, his decades-long obsession with “why the permanent is so much harder than the determinant”—in my IAS lecture Avi Wigderson’s “Permanent” Impact On Me, to which I refer you now (I can’t produce a new such lecture on one day’s notice!).

Huge congratulations to Avi.

Jordan EllenbergRoad trip to totality 2024

The last time we did this it was so magnificent that I said, on the spot, “see you again in 2024,” and seven years didn’t dim my wish to see the sun wink out again. It was easier this time — the path went through Indiana, which is a lot closer to home than St. Louis. More importantly, CJ can drive now, and likes to, so the trip is fully chauffeured. We saw the totality in Zionsville, IN, in a little park at the end of a residential cul-de-sac.

It was a smaller crowd than the one at Festus, MO in 2017; and unlike last time there weren’t a lot of travelers. These were just people who happened to live in Zionsville, IN and who were home in the middle of the day to see the eclipse. There were clouds, and a lot of worries about the clouds, but in the end it was just thin cirrus strips that blocked the sun, and then the non-sun, not at all.

To me it was a little less dramatic this time — because the crowd was more casual, because the temperature drop was less stark in April than it was in August, and of course because it was never again going to be the first time. But CJ and AB thought this one was better. We had very good corona. You could see a tiny red dot on the edge of the sun which was in fact a plasma prominence much bigger than the Earth.

Some notes:

  • We learned our lesson last time when we got caught in a massive traffic jam in the middle of a cornfield. We chose Zionsville because it was in the northern half of the totality, right on the highway, so we could be in the car zipping north on I-65 before the massive wave of northbound traffic out of Indianapolis caught up with us. And we were! Very satisfying, to watch on Google Maps as the traffic jam got longer and longer behind us, but was never quite where we were, as if we were depositing it behind us.
  • We had lunch in downtown Indianapolis where there is a giant Kurt Vonnegut Jr. painted on a wall. CJ is reading Slaughterhouse Five for school — in fact, to my annoyance, it’s the only full novel they’ve read in their American Lit elective. But it’s a pretty good choice for high school assigned reading. In the car I tried to explain Vonnegut’s theory of the granfaloon as it applied to “Hoosier” but neither kid was really interested.
  • We’ve done a fair number of road trips in the Mach-E and this was the first time charging created any annoyance. The Electrify America station we wanted on the way down had two chargers in use and the other two broken, so we had to detour quite a ways into downtown Lafayette to charge at a Cadillac dealership. On the way back, the station we planned on was full with one person waiting in line, so we had to change course and charge at the Whole Foods parking lot, and even there we got lucky as one person was leaving just as we arrived. The charging process probably added an hour to our trip each way.
  • While we charged at the Whole Foods in Schaumburg we hung out at the Woodfield Mall. Nostalgic feelings, for this suburban kid, to be in a thriving, functioning mall, with groups of kids just hanging out and vaguely shopping, the way we used to. The malls in Madison don’t really work like this any more. Is it a Chicago thing?
  • CJ is off to college next year. Sad to think there may not be any more roadtrips, or at least any more roadtrips where all of us are starting from home.
  • I was wondering whether total eclipses in the long run are equidistributed on the Earth’s surface and the answer is no: Ernie Wright at NASA made an image of the last 5000 years of eclipse paths superimposed:

There are more in the northern hemisphere than the southern because there are more eclipses in the summer (sun’s up longer!) and the sun is a little farther (whence visually a little smaller and more eclipsible) during northern hemisphere summer than southern hemisphere summer.

See you again in 2045!

April 09, 2024

Tommaso DorigoGoodbye Peter Higgs, And Thanks For The Boson

Peter Higgs passed away yesterday, at the age of 94. The Scottish physicist, a winner of the 2013 Nobel Prize in Physics together with Francois Englert, hypothesized in 1964 the existence of the most mysterious elementary particle we know of, the Higgs boson, which was only discovered 48 years later by the ATLAS and CMS collaborations at the CERN Large Hadron Collider.

read more

Matt Strassler Star Power

A quick note today, as I am flying to Los Angeles in preparation for

and other events next week.

I hope many of you were able, as I was, to witness the total solar eclipse yesterday. This was the third I’ve seen, and each one is different; the corona, prominences, stars, planets, and sky color all vary greatly, as do the sounds of animals. (I have written about my adventures going to my first one back in 1999; yesterday was a lot easier.)

Finally, of course, the physics world is mourning the loss of Peter Higgs. Back in 1964, Higgs proposed the particle known as the Higgs boson, as a consequence of what we often call the Higgs field. (Note that the field was also proposed, at the same time, by Robert Brout and Francois Englert.) Much is being written about Higgs today, and I’ll leave that to the professional journalists. But if you want to know what Higgs actually did (rather than the pseudo-descriptions that you’ll find in the press) then you have come to the right place. More on that later in the week.

April 05, 2024

Terence TaoMarton’s conjecture in abelian groups with bounded torsion

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “Marton’s conjecture in abelian groups with bounded torsion“. This paper fully resolves the bounded torsion case of the polynomial Freiman–Ruzsa conjecture, which was first proposed by Katalin Marton:

Theorem 1 (Marton’s conjecture) Let {G = (G,+)} be an abelian {m}-torsion group (thus, {mx=0} for all {x \in G}), and let {A \subset G} be such that {|A+A| \leq K|A|}. Then {A} can be covered by at most {(2K)^{O(m^3)}} translates of a subgroup {H} of {G} of cardinality at most {|A|}. Moreover, {H} is contained in {\ell A - \ell A} for some {\ell \ll (2 + m \log K)^{O(m^3 \log m)}}.

We had previously established the {m=2} case of this result, with the number of translates bounded by {(2K)^{12}} (which was subsequently improved to {(2K)^{11}} by Jyun-Jie Liao), but without the additional containment {H \subset \ell A - \ell A}. It remains a challenge to replace {\ell} by a bounded constant (such as {2}); this is essentially the “polynomial Bogolyubov conjecture”, which is still open. The {m=2} result has been formalized in the proof assistant language Lean, as discussed in this previous blog post. As a consequence of this result, many of the applications of the previous theorem may now be extended from characteristic {2} to higher characteristic.
Our proof techniques are a modification of those in our previous paper, and in particular continue to be based on the theory of Shannon entropy. For inductive purposes, it turns out to be convenient to work with the following version of the conjecture (which, up to {m}-dependent constants, is actually equivalent to the above theorem):

Theorem 2 (Marton’s conjecture, entropy form) Let {G} be an abelian {m}-torsion group, and let {X_1,\dots,X_m} be independent finitely supported random variables on {G}, such that

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i] \leq \log K,

where {{\bf H}} denotes Shannon entropy. Then there is a uniform random variable {U_H} on a subgroup {H} of {G} such that

\displaystyle \frac{1}{m} \sum_{i=1}^m d[X_i; U_H] \ll m^3 \log K,

where {d} denotes the entropic Ruzsa distance (see previous blog post for a definition); furthermore, if all the {X_i} take values in some symmetric set {S}, then {H} lies in {\ell S} for some {\ell \ll (2 + \log K)^{O(m^3 \log m)}}.

As a first approximation, one should think of all the {X_i} as identically distributed, and having the uniform distribution on {A}, as this is the case that is actually relevant for implying Theorem 1; however, the recursive nature of the proof of Theorem 2 requires one to manipulate the {X_i} separately. It also is technically convenient to work with {m} independent variables, rather than just a pair of variables as we did in the {m=2} case; this is perhaps the biggest additional technical complication needed to handle higher characteristics.
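As a toy numerical illustration of the entropy functional in Theorem 2 (my own sketch, for the m = 2 case over the group (Z/2)², with elements encoded as 2-bit integers and group addition as bitwise XOR; this is not code from the paper): the entropic doubling H[X₁+X₂] − (H[X₁]+H[X₂])/2 vanishes exactly when the support is a subgroup, and is positive otherwise.

```python
import math
from collections import Counter
from itertools import product

def entropy(dist):
    """Shannon entropy (in bits) of a {outcome: probability} dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def xor_sum_dist(d1, d2):
    """Distribution of X1 + X2 for independent X1, X2 in (Z/2)^2, addition = XOR."""
    out = Counter()
    for (x, p), (y, q) in product(d1.items(), d2.items()):
        out[x ^ y] += p * q
    return dict(out)

def entropic_doubling(support):
    """H[X1 + X2] - (H[X1] + H[X2])/2 for X1, X2 iid uniform on the support."""
    X = {s: 1.0 / len(support) for s in support}
    return entropy(xor_sum_dist(X, X)) - entropy(X)

print(entropic_doubling([0b00, 0b01]))        # a subgroup: doubling is 0
print(entropic_doubling([0b00, 0b01, 0b10]))  # not a coset of a subgroup: doubling > 0
```

This is precisely the quantity that vanishes if and only if the variables are translates of a uniform random variable on a subgroup, which is why it serves as the progress metric in the argument.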
The strategy, as with the previous paper, is to attempt an entropy decrement argument: to try to locate modifications {X'_1,\dots,X'_m} of {X_1,\dots,X_m} that are reasonably close (in Ruzsa distance) to the original random variables, while decrementing the “multidistance”

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i]

which turns out to be a convenient metric for progress (for instance, this quantity is non-negative, and vanishes if and only if the {X_i} are all translates of a uniform random variable {U_H} on a subgroup {H}). In the previous paper we modified the corresponding functional to minimize by some additional terms in order to improve the exponent {12}, but as we are not attempting to completely optimize the constants, we did not do so in the current paper (and as such, our arguments here give a slightly different way of establishing the {m=2} case, albeit with somewhat worse exponents).
As before, we search for such improved random variables {X'_1,\dots,X'_m} by introducing more independent random variables – we end up taking an array of {m^2} random variables {Y_{i,j}} for {i,j=1,\dots,m}, with each {Y_{i,j}} a copy of {X_i}, and forming various sums of these variables and conditioning them against other sums. Thanks to the magic of Shannon entropy inequalities, it turns out that it is guaranteed that at least one of these modifications will decrease the multidistance, except in an “endgame” situation in which certain random variables are nearly (conditionally) independent of each other, in the sense that certain conditional mutual informations are small. In particular, in the endgame scenario, the row sums {\sum_j Y_{i,j}} of our array will end up being close to independent of the column sums {\sum_i Y_{i,j}}, subject to conditioning on the total sum {\sum_{i,j} Y_{i,j}}. Not coincidentally, this type of conditional independence phenomenon also shows up when considering row and column sums of iid Gaussian random variables, as a specific feature of the Gaussian distribution. It is related to the more familiar observation that if {X,Y} are two independent copies of a Gaussian random variable, then {X+Y} and {X-Y} are also independent of each other.
Up until now, the argument does not use the {m}-torsion hypothesis, nor the fact that we work with an {m \times m} array of random variables as opposed to some other shape of array. But now the torsion enters in a key role, via the obvious identity

\displaystyle \sum_{i,j} i Y_{i,j} + \sum_{i,j} j Y_{i,j} + \sum_{i,j} (-i-j) Y_{i,j} = 0.

In the endgame, any pair of these three random variables is close to independent (after conditioning on the total sum {\sum_{i,j} Y_{i,j}}). Applying some “entropic Ruzsa calculus” (and in particular an entropic version of the Balog–Szemerédi–Gowers inequality), one can then arrive at a new random variable {U} of small entropic doubling that is reasonably close to all of the {X_i} in Ruzsa distance, which provides the final way to reduce the multidistance.
Besides the polynomial Bogolyubov conjecture mentioned above (which we do not know how to address by entropy methods), the other natural question is to try to develop a characteristic zero version of this theory in order to establish the polynomial Freiman–Ruzsa conjecture over torsion-free groups, which in our language asserts (roughly speaking) that random variables of small entropic doubling are close (in Ruzsa distance) to a discrete Gaussian random variable, with good bounds. The above machinery is consistent with this conjecture, in that it produces lots of independent variables related to the original variable, various linear combinations of which obey the same sort of entropy estimates that Gaussian random variables would exhibit, but what we are missing is a way to get back from these entropy estimates to an assertion that the random variables really are close to Gaussian in some sense. In continuous settings, Gaussians are known to extremize the entropy for a given variance, and of course we have the central limit theorem that shows that averages of random variables typically converge to a Gaussian, but it is not clear how to adapt these phenomena to the discrete Gaussian setting (without the circular reasoning of assuming the polynomial Freiman–Ruzsa conjecture to begin with).

Matt von Hippel Making More Nails

They say when all you have is a hammer, everything looks like a nail.

Academics are a bit smarter than that. Confidently predict a world of nails, and you fall to the first paper that shows evidence of a screw. There are limits to how long you can delude yourself when your job is supposed to be all about finding the truth.

You can make your own nails, though.

Suppose there’s something you’re really good at. Maybe, like many of my past colleagues, you can do particle physics calculations faster than anyone else, even when the particles are super-complicated hypothetical gravitons. Maybe you know more than anyone else about how to make a quantum computer, or maybe you just know how to build a “quantum computer“. Maybe you’re an expert in esoteric mathematics, who can re-phrase anything in terms of the arcane language of category theory.

That’s your hammer. Get good enough with it, and anyone with a nail-based problem will come to you to solve it. If nails are trendy, then you’ll impress grant committees and hiring committees, and your students will too.

When nails aren’t trendy, though, you need to try something else. If your job is secure, and you don’t have students with their own insecure jobs banging down your door, then you could spend a while retraining. You could form a reading group, pick up a textbook or two about screwdrivers and wrenches, and learn how to use different tools. Eventually, you might find a screwdriving task you have an advantage with, something you can once again do better than everyone else, and you’ll start getting all those rewards again.

Or, maybe you won’t. You’ll get less funding to hire people, so you’ll do less research, so your work will get less impressive and you’ll get less funding, and so on and so forth.

Instead of risking that, most academics take another path. They take what they’re good at, and invent new problems in the new trendy area to use that expertise.

If everyone is excited about gravitational waves, you turn a black hole calculation into a graviton calculation. If companies are investing in computation in the here-and-now, then you find ways those companies can use insights from your quantum research. If everyone wants to know how AI works, you build a mathematical picture that sort of looks like one part of how AI works, and do category theory to it.

At first, you won’t be competitive. Your hammer isn’t going to work nearly as well as the screwdrivers people have been using forever for these problems, and there will be all sorts of new issues you have to solve just to get your hammer in position in the first place. But that doesn’t matter so much, as long as you’re honest. Academic research is expected to take time; applications aren’t supposed to be obvious. Grant committees care about what you’re trying to do, as long as you have a reasonably plausible story about how you’ll get there.

(Investors are also not immune to a nice story. Customers are also not immune to a nice story. You can take this farther than you might think.)

So, unlike the re-trainers, you survive. And some of the time, you make it work. Your hammer-based screwdriving ends up morphing into something that, some of the time, actually does something the screwdrivers can’t. Instead of delusionally imagining nails, you’ve added a real ersatz nail to the world, where previously there was just a screw.

Making nails is a better path for you. Is it a better path for the world? I’m not sure.

If all those grants you won, all those jobs you and your students got, all that money from investors or customers drawn in by a good story, if that all went to the people who had the screwdrivers in the first place, could they have done a better job?

Sometimes, no. Sometimes you happen upon some real irreproducible magic. Your hammer is Thor’s hammer, and when hefted by the worthy it can do great things.

Sometimes, though, your hammer was just the hammer that got the funding. Now every screwdriver kit has to have a space for a little hammer, when it could have had another specialized screwdriver that fit better in the box.

In the end, the world is built out of these kinds of ill-fitting toolkits. We all try to survive, both as human beings and by our sub-culture’s concept of the good life. We each have our hammers, and regardless of whether the world is full of screws, we have to convince people they want a hammer anyway. Everything we do is built on a vast rickety pile of consequences, the end-results of billions of people desperate to be wanted. For those of us who love clean solutions and ideal paths, this is maddening and frustrating and terrifying. But it’s life, and in a world where we never know the ideal path, screw-nails and nail-screws are the best way we’ve found to get things done.

Scott Aaronson And yet quantum computing continues to progress

Pissing away my life in a haze of doomscrolling, sporadic attempts to “parent” two rebellious kids, and now endless conversations about AI safety, I’m liable to forget for days that I’m still mostly known (such as I am) as a quantum computing theorist, and this blog is still mostly known as a quantum computing blog. Maybe it’s just that I spent a quarter-century on quantum computing theory. As an ADHD sufferer, anything could bore me after that much time, even one of the a-priori most exciting things in the world.

It’s like, some young whippersnappers proved another monster 80-page theorem that I’ll barely understand tying together the quantum PCP conjecture, area laws, and Gibbs states? Another company has a quantum software platform, or hardware platform, and they’ve issued a press release about it? Another hypester claimed that QC will revolutionize optimization and machine learning, based on the usual rogues’ gallery of quantum heuristic algorithms that don’t seem to outperform classical heuristics? Another skeptic claimed that scalable quantum computing is a pipe dream—mashing together the real reasons why it’s difficult with basic misunderstandings of the fault-tolerance theorem? In each case, I’ll agree with you that I probably should get up, sit at my laptop, and blog about it (it’s hard to blog with two thumbs), but as likely as not I won’t.

And yet quantum computing continues to progress. In December we saw Harvard and QuEra announce a small net gain from error-detection in neutral atoms, and accuracy that increased with the use of larger error-correcting codes. Today, a collaboration between Microsoft and Quantinuum has announced what might be the first demonstration of error-corrected two-qubit entangling gates with substantially lower error than the same gates applied to the bare physical qubits. (This is still at the stage where you need to be super-careful in how you phrase every such sentence—experts should chime in if I’ve already fallen short; I take responsibility for any failures to error-correct this post.)

You can read the research paper here, or I’ll tell you the details to the best of my understanding (I’m grateful to Microsoft’s Krysta Svore and others from the collaboration for briefing me by Zoom). The collaboration used a trapped-ion system with 32 fully-connected physical qubits (meaning, the qubits can be shuttled around a track so that any qubit can directly interact with any other). One can apply an entangling gate to any pair of qubits with ~99.8% fidelity.

What did they do with this system? They created up to 4 logical encoded qubits, using the Steane code and other CSS codes. Using logical CNOT gates, they then created logical Bell pairs — i.e., (|00⟩+|11⟩)/√2 — and verified that they did this.
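At the level of ideal, unencoded statevectors, the Bell-pair preparation they verify is a two-gate circuit: a Hadamard followed by a CNOT. A minimal numpy sketch (my own illustration; the experiment of course performs the logical, encoded version of these gates):

```python
import numpy as np

# Ideal statevector sketch: start in |00>, apply Hadamard to the first
# qubit, then CNOT, yielding the Bell state (|00> + |11>)/sqrt(2).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

state = np.zeros(4)
state[0] = 1.0                        # |00>
state = CNOT @ np.kron(H, I) @ state  # Bell pair
print(np.allclose(state, [1 / np.sqrt(2), 0, 0, 1 / np.sqrt(2)]))  # True
```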

That’s in the version of their experiment that uses “preselection but not postselection.” In other words, they have to try many times until they prepare the logical initial states correctly—as with magic state factories. But once they do successfully prepare the initial states, there’s no further cheating involving postselection (i.e., throwing away bad results): they just apply the logical CNOT gates, measure, and see what they got.

For me personally, that’s the headline result. But then they do various further experiments to “spike the football.” For one thing, they show that when they do allow postselected measurement outcomes, the decrease in the effective error rate can be much much larger, as large as 800x. That allows them (again, under postselection!) to demonstrate up to two rounds of error syndrome extraction and correction while still seeing a net gain, or three rounds albeit with unclear gain. The other thing they demonstrate is teleportation of fault-tolerant qubits—so, a little fancier than just preparing an encoded Bell pair and then measuring it.

They don’t try to do (e.g.) a quantum supremacy demonstration with their encoded qubits, like Harvard/QuEra did—they don’t have nearly enough qubits for that. But this is already extremely cool, and it sets a new bar in quantum error-correction experiments for others to meet or exceed (superconducting, neutral atom, and photonics people, that means you!). And I wasn’t expecting it! Indeed, I’m so far behind the times that I still imagined Microsoft as committed to a strategy of “topological qubits or bust.” While Microsoft is still pursuing the topological approach, their strategy has clearly pivoted over the last few years towards “whatever works.”

Anyway, huge congratulations to the teams at Microsoft and Quantinuum for their accomplishment!

Stepping back, what is the state of experimental quantum computing, 42 years after Feynman’s lecture, 30 years after Shor’s algorithm, 25 years after I entered the field, 5 years after Google’s supremacy experiment? There’s one narrative that quantum computing is already being used to solve practical problems that couldn’t be solved otherwise (look at all the hundreds of startups! they couldn’t possibly exist without providing real value, could they?). Then there’s another narrative that quantum computing has been exposed as a fraud, an impossibility, a pipe dream. Both narratives seem utterly disconnected from the reality on the ground.

If you want to track the experimental reality, my one-sentence piece of advice would be to focus relentlessly on the fidelity with which experimenters can apply a single physical 2-qubit gate. When I entered the field in the late 1990s, ~50% would’ve been an impressive fidelity. At some point it became ~90%. With Google’s supremacy experiment in 2019, we saw 1000 gates applied to 53 qubits, each gate with ~99.5% fidelity. Now, in superconducting, trapped ions, and neutral atoms alike, we’re routinely seeing ~99.8% fidelities, which is what made possible (for example) the new Microsoft/Quantinuum result. The best fidelities I’ve heard reported this year are more like ~99.9%.

Meanwhile, on paper, it looks like known methods for quantum fault-tolerance, for example using the surface code, should start to become practical once you have 2-qubit fidelities around ~99.99%—i.e., one more “9” from where we are now. And then there should “merely” be the practical difficulty of maintaining that 99.99% fidelity while you scale up to millions or hundreds of millions of physical qubits!

What I’m trying to say is: this looks like a pretty good trajectory! It looks like, if we plot the infidelity on a log scale, the experimentalists have already gone three-quarters of the distance. It now looks like it would be a surprise if we couldn’t have hundreds of fault-tolerant qubits and millions of gates on them within the next decade, if we really wanted that—like something unexpected would have to go wrong to prevent it.
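The “three-quarters” figure can be reproduced with back-of-the-envelope arithmetic on log-infidelities, using the numbers quoted above (my own calculation; the endpoints are rough):

```python
import math

# Progress on a log-infidelity scale, from the ~50% fidelities of the
# late 1990s toward a rough ~99.99% fault-tolerance threshold.
start  = 1 - 0.50    # late-1990s fidelity
now    = 1 - 0.999   # best reported this year
target = 1 - 0.9999  # rough threshold for practical fault tolerance

frac = (math.log10(start) - math.log10(now)) / \
       (math.log10(start) - math.log10(target))
print(round(frac, 2))  # 0.73, i.e. roughly three-quarters of the way
```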

Wouldn’t it be ironic if all that was true, but it will simply matter much less than we hoped in the 1990s? Either just because the set of problems for which a quantum computer is useful has remained stubbornly more specialized than the world wants it to be (for more on that, see the entire past 20 years of this blog) … or because advances in classical AI render what was always quantum computing’s most important killer app, the simulation of quantum chemistry and materials, increasingly superfluous (as AlphaFold may have already done for protein folding) … or simply because civilization descends further into barbarism, or the unaligned AGIs start taking over, and we all have bigger things to worry about than fault-tolerant quantum computing.

But, you know, maybe fault-tolerant quantum computing will not only work, but matter—and its use to design better batteries and drugs and photovoltaic cells and so on will pass from science-fiction fantasy to quotidian reality so quickly that much of the world (weary from the hypesters crying wolf too many times?) will barely even notice it when it finally happens, just like what we saw with Large Language Models a few years ago. That would be worth getting out of bed for.

April 04, 2024

Tommaso Dorigo Significance Of Counting Experiments With Background Uncertainty

In the course Statistics for Data Analysis, which I give every spring to PhD students in Physics, I spend some time discussing the apparently trivial problem of evaluating the significance of an excess of observed events N over expected background B.

This is a quite common setup in many searches in Physics and Astrophysics: you have some detection apparatus that records the number of phenomena of a specified kind, and you let it run for some time, whereafter you declare that you have observed N of them. If the occurrence of each phenomenon has equal probability and they do not influence one another, that number N is understood to be sampled from a Poisson distribution of mean B. 
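In the simplest version of the problem, where the background B is known exactly, the significance is just the Poisson tail probability of seeing N or more events, converted to a Gaussian-equivalent Z-score. A sketch (my own, not from the course; the numbers are illustrative):

```python
from scipy.stats import norm, poisson

# Significance of observing N events when B are expected, with B known
# exactly (i.e., ignoring background uncertainty for now).
N, B = 25, 10.0
p = poisson.sf(N - 1, B)  # one-sided tail: P(n >= N | mean B)
z = norm.isf(p)           # Gaussian-equivalent significance in sigma
print(f"p = {p:.2e}, Z = {z:.2f} sigma")
```

Note that the exact Poisson tail gives a noticeably smaller Z than the naive (N − B)/√B estimate; folding in an uncertainty on B reduces it further, which is the subject of the post.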

read more

April 03, 2024

Scott Aaronson Open Letter to Anti-Zionists on Twitter

Dear Twitter Anti-Zionists,

For five months, ever since Oct. 7, I’ve read you obsessively. While my current job is supposed to involve protecting humanity from the dangers of AI (with a side of quantum computing theory), I’m ashamed to say that half the days I don’t do any science; instead I just scroll and scroll, reading anti-Israel content and then pro-Israel content and then more anti-Israel content. I thought refusing to post on Twitter would save me from wasting my life there as so many others have, but apparently it doesn’t, not anymore. (No, I won’t call it “X.”)

At the high end of the spectrum, I religiously check the tweets of Paul Graham, a personal hero and inspiration to me ever since he wrote Why Nerds Are Unpopular twenty years ago, and a man with whom I seem to resonate deeply on every important topic except for two: Zionism and functional programming. At the low end, I’ve read hundreds of the seemingly infinite army of Tweeters who post images of hook-nosed rats with black hats and sidecurls and dollar signs in their eyes, sneering as they strangle the earth and stab Palestinian babies. I study their detailed theories about why the October 7 pogrom never happened, and also it was secretly masterminded by Israel just to create an excuse to mass-murder Palestinians, and also it was justified and thrilling (exactly the same melange long ago embraced for the Holocaust).

I’m aware, of course, that the bottom-feeders make life too easy for me, and that a single Paul Graham who endorses the anti-Zionist cause ought to bother me more than a billion sharers of hook-nosed rat memes. And he does. That’s why, in this letter, I’ll try to stay at the higher levels of Graham’s Disagreement Hierarchy.

More to the point, though, why have I spent so much time on such a depressing, unproductive reading project?

Damned if I know. But it’s less surprising when you recall that, outside theoretical computer science, I’m (alas) mostly known to the world for having once confessed, in a discussion deep in the comment section of this blog, that I spent much of my youth obsessively studying radical feminist literature. I explained that I did that because my wish, for a decade, was to confront progressivism’s highest moral authorities on sex and relationships, and make them tell me either that

(1) I, personally, deserved to die celibate and unloved, as a gross white male semi-autistic STEM nerd and stunted emotional and aesthetic cripple, or else
(2) no, I was a decent human being who didn’t deserve that.

One way or the other, I sought a truthful answer, one that emerged organically from the reigning morality of our time and that wasn’t just an unprincipled exception to it. And I felt ready to pursue progressive journalists and activists and bloggers and humanities professors to the ends of the earth before I’d let them leave this one question hanging menacingly over everything they’d ever written, with (I thought) my only shot at happiness in life hinging on their answer to it.

You might call this my central character flaw: this need for clarity from others about the moral foundations of my own existence. I’m self-aware enough to know that it is a severe flaw, but alas, that doesn’t mean that I ever figured out how to fix it.

It’s been exactly the same way with the anti-Zionists since October 7. Every day I read them, searching for one thing and one thing only: their own answer to the “Jewish Question.” How would they ensure that the significant fraction of the world that yearns to murder all Jews doesn’t get its wish in the 21st century, as to a staggering extent it did in the 20th? I confess to caring about that question, partly (of course) because of the accident of having been born a Jew, and having an Israeli wife and family in Israel and so forth, but also because, even if I’d happened to be a Gentile, the continued survival of the world’s Jews would still seem remarkably bound up with science, Enlightenment, minority rights, liberal democracy, meritocracy, and everything else I’ve ever cared about.

I understand the charges against me. Namely: that if I don’t call for Israel to lay down its arms right now in its war against Hamas (and ideally: to dissolve itself entirely), then I’m a genocidal monster on the wrong side of history. That I value Jewish lives more than Palestinian lives. That I’m a hasbara apologist for the IDF’s mass-murder and apartheid and stealing of land. That if images of children in Gaza with their limbs blown off, or dead in their parents’ arms, or clawing for bread, don’t cause me to admit that Israel is evil, then I’m just as evil as the Israelis are.

Unsurprisingly I contest the charges. As a father of two, I can no longer see any images of child suffering without thinking about my own kids. For all my supposed psychological abnormality, the part of me that’s horrified by such images seems to be in working order. If you want to change my mind, rather than showing me more such images, you’ll need to target the cognitive part of me: the part that asks why so many children are suffering, and what causal levers we’d need to push to reach a place where neither side’s children ever have to suffer like this ever again.

At risk of stating the obvious: my first-order model is that Hamas, with the diabolical brilliance of a Marvel villain, successfully contrived a situation where Israel could prevent the further massacring of its own population only by fighting a gruesome urban war, of a kind that always, anywhere in the world, kills tens of thousands of civilians. Hamas, of course, was helped in this plan by an ideology that considers martyrdom the highest possible calling for the innocents who it rules ruthlessly and hides underneath. But Hamas also understood that the images of civilian carnage would (rightly!) shock the consciences of Israel’s Western allies and many Israelis themselves, thereby forcing a ceasefire before the war was over, thereby giving Hamas the opportunity to regroup and, with God’s and of course Iran’s help, finally finish the job of killing all Jews another day.

And this is key: once you remember why Hamas launched this war and what its long-term goals are, every detail of Twitter’s case against Israel has to be reexamined in a new light. Take starvation, for example. Clearly the only explanation for why Israelis would let Gazan children starve is the malice in their hearts? Well, until you think through the logistical challenges of feeding 2.3 million starving people whose sole governing authority is interested only in painting the streets red with Jewish blood. Should we let that authority commandeer the flour and water for its fighters, while innocents continue to starve? No? Then how about UNRWA? Alas, we learned that UNRWA, packed with employees who cheered the Oct. 7 massacre in their Telegram channels and in some cases took part in the murders themselves, capitulates to Hamas so quickly that it effectively is Hamas. So then Israel should distribute the food itself! But as we’ve dramatically witnessed, Israel can’t distribute food without imposing order, which would seem to mean reoccupying Gaza and earning the world’s condemnation for it. Do you start to appreciate the difficulty of the problem—and why the Biden administration was pushed to absurd-sounding extremes like air-dropping food and then building a floating port?

It all seems so much easier, once you remove the constraint of not empowering Hamas in its openly-announced goal of completing the Holocaust. And hence, removing that constraint is precisely what the global left does.

For all that, by Israeli standards I’m firmly in the anti-Netanyahu, left-wing peace camp—exactly where I’ve been since the 1990s, as a teenager mourning the murder of Rabin. And I hope even the anti-Israel side might agree with me that, if all the suffering since Oct. 7 has created a tiny opening for peace, then walking through that opening depends on two things happening:

  1. the removal of Netanyahu, and
  2. the removal of Hamas.

The good news is that Netanyahu, the catastrophically failed “Protector of Israel,” not only can, but plausibly will (if enough government ministers show some backbone), soon be removed in a democratic election.

Hamas, by contrast, hasn’t allowed a single election since it took power in 2006, in a process notable for its opponents being thrown from the roofs of tall buildings. That’s why even my left-leaning Israeli colleagues—the ones who despise Netanyahu, who marched against him last year—support Israel’s current war. They support it because, even if the Israeli PM were Fred Rogers, how can you ever get to peace without removing Hamas, and how can you remove Hamas except by war, any more than you could cut a deal with Nazi Germany?

I want to see the IDF do more to protect Gazan civilians—despite my bitter awareness of survey data suggesting that many of those civilians would murder my children in front of me if they ever got a chance. Maybe I’d be the same way if I’d been marinated since birth in an ideology of Jew-killing, and blocked from other sources of information. I’m heartened by the fact that despite this, indeed despite the risk to their lives for speaking out, a full 15% of Gazans openly disapprove of the Oct. 7 massacre. I want a solution where that 15% becomes 95% with the passing of generations. My endgame is peaceful coexistence.

But to the anti-Zionists I say: I don’t even mind you calling me a baby-eating monster, provided you honestly field one question. Namely:

Suppose the Palestinian side got everything you wanted for it; then what would be your plan for the survival of Israel’s Jews?

Let’s assume that not only has Netanyahu lost the next election in a landslide, but is justly spending the rest of his life in Israeli prison. Waving my wand, I’ve made you Prime Minister in his stead, with an overwhelming majority in the Knesset. You now get to go down in history as the liberator of Palestine. But you’re now also in charge of protecting Israel’s 7 million Jews (and 2 million other residents) from near-immediate slaughter at the hands of those who you’ve liberated.

Granted, it seems pretty paranoid to expect such a slaughter! Or rather: it would seem paranoid, if the Palestinians’ Grand Mufti (progenitor of the Muslim Brotherhood and hence Hamas) hadn’t allied himself with Hitler in WWII, enthusiastically supported the Nazi Final Solution, and tried to export it to Palestine; if in 1947 the Palestinians hadn’t rejected the UN’s two-state solution (the one Israel agreed to) and instead launched another war to exterminate the Jews (a war they lost); if they hadn’t joined the quest to exterminate the Jews a third time in 1967; etc., or if all this hadn’t happened back before there were any settlements or occupation, when the only question on the table was Israel’s existence. It would seem paranoid if Arafat had chosen a two-state solution when Israel offered it to him at Camp David, rather than suicide bombings. It would seem paranoid if not for the candies passed out in the streets in celebration on October 7.

But if someone has a whole ideology, which they teach their children and from which they’ve never really wavered for a century, about how murdering you is a religious honor, and also they’ve actually tried to murder you at every opportunity—what more do you want them to do, before you’ll believe them?

So, you tell me your plan for how to protect Israel’s 7 million Jews from extermination at the hands of neighbors who have their extermination—my family’s extermination—as their central political goal, and who had that as their goal long before there was any occupation of the West Bank or Gaza. Tell me how to do it while protecting Palestinian innocents. And tell me your fallback plan if your first plan turns out not to work.

We can go through the main options.


Maybe your plan is that Israel should unilaterally dismantle West Bank settlements, recognize a Palestinian state, and retreat to the 1967 borders.

This is an honorable plan. It was my preferred plan—until the horror of October 7, and then the even greater horror of the worldwide left reacting to that horror by sharing celebratory images of paragliders, and by tearing down posters of kidnapped Jewish children.

Today, you might say October 7 has sort of put a giant flaming-red exclamation point on what’s always been the central risk of unilateral withdrawal. Namely: what happens if, afterward, rather than building a peaceful state on their side of the border, the Palestinian leadership chooses instead to launch a new Iran-backed war on Israel—one that, given the West Bank’s proximity to Israel’s main population centers, makes October 7 look like a pillow fight?

If that happens, will you admit that the hated Zionists were right and you were wrong all along, that this was never about settlements but always, only about Israel’s existence? Will you then agree that Israel has a moral prerogative to invade the West Bank, to occupy and pacify it as the Allies did Germany and Japan after World War II? Can I get this in writing from you, right now? Or, following the future (October 7)² launched from a Judenfrei West Bank, will your creativity once again set to work constructing a reason to blame Israel for its own invasion—because you never actually wanted a two-state solution at all, but only Israel’s dismantlement?


So, what about a two-state solution negotiated between the parties? Israel would uproot all West Bank settlements that prevent a Palestinian state, and resettle half a million Jews in pre-1967 Israel—in exchange for the Palestinians renouncing their goal of ending Israel’s existence, via a “right of return” or any other euphemism.

If so: congratulations, your “anti-Zionism” now seems barely distinguishable from my “Zionism”! If they made me the Prime Minister of Israel, and put you in charge of the Palestinians, I feel optimistic that you and I could reach a deal in an hour and then go out for hummus and babaganoush.


In my experience, in the rare cases they deign to address the question directly, most anti-Zionists advocate a “secular, binational state” between the Jordan and Mediterranean, with equal rights for all inhabitants. Certainly, that would make sense if you believe that Israel is an apartheid state just like South Africa.

To me, though, this analogy falls apart on a single question: who’s the Palestinian Nelson Mandela? Who’s the Palestinian leader who’s ever said to the Jews, “end your Jewish state so that we can live together in peace,” rather than “end your Jewish state so that we can end your existence”? To impose a binational state would be to impose something, not only that Israelis regard as an existential horror, but that most Palestinians have never wanted either.

But, suppose we do it anyway. We place 7 million Jews, almost half the Jews who remain on Earth, into a binational state where perhaps a third of their fellow citizens hold the theological belief that all Jews should be exterminated, and that a heavenly reward follows martyrdom in blowing up Jews. The exterminationists don’t quite have a majority, but they’re the second-largest voting bloc. Do you predict that the exterminationists will give up their genocidal ambition because of new political circumstances that finally put their ambition within reach? If October-7 style pogroms against Jews turn out to be a regular occurrence in our secular binational state, how will its government respond—like the Palestinian Authority? like UNRWA? like the British Mandate? like Tsarist Russia?

In such a case, perhaps the Jews (along with those Arabs and Bedouins and Druze and others who cast their lot with the Jews) would need to form a country-within-a-country: their own little autonomous zone within the binational state, with its own defense force. But of course, such a country-within-a-country already formed, for pretty much this exact reason. It’s called Israel. A cycle has been detected in your arc of progress.


We come now to the anti-Zionists who are plainspoken enough to say: Israel’s creation was a grave mistake, and that mistake must now be reversed.

This is a natural option for anyone who sees Israel as an “illegitimate settler-colonial project,” like British India or French Algeria, but who isn’t quite ready to call for another Jewish genocide.

Again, the analogy runs into obvious problems: Israelis would seem to be the first “settler-colonialists” in the history of the world who not only were indigenous to the land they colonized, as much as anyone was, but who weren’t colonizing on behalf of any mother country, and who have no obvious such country to which they can return.

Some say spitefully: then let the Jews go back to Poland. These people might be unaware that, precisely because of how thorough the Holocaust was, more Israeli Jews trace their ancestry to Muslim countries than to Europe. Is there to be a “right of return” to Egypt, Iraq, Morocco, and Yemen, for all the Jews forcibly expelled from those places and for their children and grandchildren?

Others, however, talk about evacuating the Jews from Israel with goodness in their hearts. They say: we’d love the Israelis’ economic dynamism here in Austin or Sydney or Oxfordshire, joining their many coreligionists who already call these places home. What’s more, they’ll be safer here—who wants to live with missiles raining down on their neighborhood? Maybe we could even set aside some acres in Montana for a new Jewish homeland.

Again, if this is your survival plan, I’m a billion times happier to discuss it openly than to have it as unstated subtext!

Except, maybe you could say a little more about the logistics. Who will finance the move? How confident are you that the target country will accept millions of defeated, desperate Jews, as no country on earth was the last time this question arose?

I realize it’s no longer the 1930s, and Israel now has friends, most famously in America. But—what’s a good analogy here? I’ve met various Silicon Valley gazillionaires. I expect that I could raise millions from them, right now, if I got them excited about a new project in quantum computing or AI or whatever. But I doubt I could raise a penny from them if I came to them begging for their pity or their charity.

Likewise: for all the anti-Zionists’ loudness, a solid majority of Americans continue to support Israel (which, incidentally, provides a much simpler explanation than the hook-nosed perfidy of AIPAC for why Congress and the President mostly support it). But it seems to me that Americans support Israel in the “exciting project” sense, rather than in the “charity” sense. They like that Israelis are plucky underdogs who made the deserts bloom, and built a thriving tech industry, and now produce hit shows like Shtisel and Fauda, and take the fight against a common foe to the latter’s doorstep, and maintain one of the birthplaces of Western civilization for tourists and Christian pilgrims, and restarted the riveting drama of the Bible after a 2000-year hiatus, which some believe is a crucial prerequisite to the Second Coming.

What’s important, for present purposes, is not whether you agree with any of these rationales, but simply that none of them translate into a reason to accept millions of Jewish refugees.

But if you think dismantling Israel and relocating its seven million Jews is a workable plan—OK then, are you doing anything to make that more than a thought experiment, as the Zionists did a century ago with their survival plan? Have even I done more to implement your plan than you have, by causing one Israeli (my wife) to move to the US?

Suppose you say it’s not your job to give me a survival plan for Israel’s Jews. Suppose you say the request is offensive, an attempt to distract from the suffering of the Palestinians, so you change the subject.

In that case, fine, but you can now take off your cloak of righteousness, your pretense of standing above me and judging me from the end of history. Your refusal to answer the question amounts to a confession that, for you, the goal of “a free Palestine from the river to the sea” doesn’t actually require the physical survival of Israel’s Jews.

Which means, we’ve now established what you are. I won’t give you the satisfaction of calling you a Nazi or an antisemite. Thousands of years before those concepts existed, Jews already had terms for you. The terms tended toward a liturgical register, as in “those who rise up in every generation to destroy us.” The whole point of all the best-known Jewish holidays, like Purim yesterday, is to talk about those wicked would-be destroyers in the past tense, with the very presence of live Jews attesting to what the outcome was.

(Yesterday, I took my kids to a Purim carnival in Austin. Unlike in previous years, there were armed police everywhere. It felt almost like … visiting Israel.)

If you won’t answer the question, then it wasn’t Zionist Jews who told you that their choices are either to (1) oppose you or else (2) go up in black smoke like their grandparents did. You just told them that yourself.

Many will ask: why don’t I likewise have an obligation to give you my Palestinian survival plan?

I do. But the nice thing about my position is that I can tell you my Palestinian survival plan cheerfully, immediately, with zero equivocating or changing the subject. It’s broadly the same plan that David Ben-Gurion and Yitzchak Rabin and Ehud Barak and Bill Clinton and the UN put on the table over and over and over, only for the Palestinians’ leaders to sweep it off.

I want the Palestinians to have a state, comprising the West Bank and Gaza, with a capital in East Jerusalem. I want Israel to uproot all West Bank settlements that prevent such a state. I want this to happen the instant there arises a Palestinian leadership genuinely committed to peace—one that embraces liberal values and rejects martyr values, in everything from textbooks to street names.

And I want more. I want the new Palestinian state to be as prosperous and free and educated as modern Germany and Japan are. I want it to embrace women’s rights and LGBTQ+ rights and the rest of the modern package, so that “Queers for Palestine” would no longer be a sick joke. I want the new Palestine to be as intertwined with Israel, culturally and economically, as the US and Canada are.

Ironically, if this ever became a reality, then Israel-as-a-Jewish-state would no longer be needed—but it’s certainly needed in the meantime.

Anti-Zionists on Twitter: can you be equally explicit about what you want?

I come, finally, to what many anti-Zionists regard as their ultimate trump card. Look at all the anti-Zionist Jews and Israelis who agree with us, they say. Jewish Voice for Peace. IfNotNow. Noam Chomsky. Norman Finkelstein. The Neturei Karta.

Intellectually, of course, the fact of anti-Zionist Jews makes not the slightest difference to anything. My question for them remains exactly the same as for anti-Zionist Gentiles: what is your Jewish survival plan, for the day after we dismantle the racist supremacist apartheid state that’s currently the only thing standing between half the world’s remaining Jews and their slaughter by their neighbors? Feel free to choose from any of the four options above, or suggest a fifth.

But in the event that Jewish anti-Zionists evade that conversation, or change the subject from it, maybe some special words are in order. You know the famous Golda Meir line, “If we have to choose between being dead and pitied and being alive with a bad image, we’d rather be alive and have the bad image”?

It seems to me that many anti-Zionist Jews considered Golda Meir’s question carefully and honestly, and simply decided it the other way, in favor of Jews being dead and pitied.

Bear with me here: I won’t treat this as a reductio ad absurdum of their position. Not even if the anti-Zionist Jews themselves wish to remain safely ensconced in Berkeley or New Haven, while the Israelis fulfill the “dead and pitied” part for them.

In fact, I’ll go further. Again and again in life I’ve been seized by a dark thought: if half the world’s Jews can only be kept alive, today, via a militarized ethnostate that constantly needs to defend its existence with machine guns and missiles, racking up civilian deaths and destabilizing the world’s geopolitics—if, to put a fine point on it, there are 16 million Jews in the world, but at least a half billion antisemites who wake up every morning and go to sleep every night desperately wishing those Jews dead—then, from a crude utilitarian standpoint, might it not be better for the world if we Jews vanished after all?

Remember, I’m someone who spent a decade asking myself whether the rapacious, predatory nature of men’s sexual desire for women, which I experienced as a curse and an affliction, meant that the only moral course for me was to spend my life as a celibate mathematical monk. But I kept stumbling over one point: why should such a moral obligation fall on me alone? Why doesn’t it fall on other straight men, particularly the ones who presume to lecture me on my failings?

And also: supposing I did take the celibate monk route, would even that satisfy my haters? Would they come after me anyway for glancing at a woman too long or making an inappropriate joke? And also: would the haters soon say I shouldn’t have my scientific career either, since I’ve stolen my coveted academic position from the underprivileged? Where exactly does my self-sacrifice end?

When I did, finally, start approaching women and asking them out on dates, I worked up the courage partly by telling myself: I am now going to do the Zionist thing. I said: if other nerdy Jews can risk death in war, then this nerdy Jew can risk ridicule and contemptuous stares. You can accept that half the world will denounce you as a monster for living your life, so long as your own conscience (and, hopefully, the people you respect the most) continue to assure you that you’re nothing of the kind.

This took more than a decade of internal struggle, but it’s where I ended up. And today, if anyone tells me I had no business ever forming any romantic attachments, I have two beautiful children as my reply. I can say: forget about me, you’re asking for my children never to have existed—that’s why I’m confident you’re wrong.

Likewise with the anti-Zionists. When the Twitter-warriors share their memes of hook-nosed Jews strangling the planet, innocent Palestinian blood dripping from their knives, when the global protests shut down schools and universities and bridges and parliament buildings, there’s a part of me that feels eager to commit suicide if only it would appease the mob, if only it would expiate all the cosmic guilt they’ve loaded onto my shoulders.

But then I remember that this isn’t just about me. It’s about Einstein and Spinoza and Feynman and Erdős and von Neumann and Weinberg and Landau and Michelson and Rabi and Tarski and Asimov and Sagan and Salk and Noether and Meitner, and Irving Berlin and Stan Lee and Rodney Dangerfield and Steven Spielberg. Even if I didn’t happen to be born Jewish—if I had anything like my current values, I’d still think that so much of what’s worth preserving in human civilization, so much of math and science and Enlightenment and democracy and humor, would seem oddly bound up with the continued survival of this tiny people. And conversely, I’d think that so much of what’s hateful in civilization would seem oddly bound up with the quest to exterminate this tiny people, or to deny it any means to defend itself from extermination.

So that’s my answer, both to anti-Zionist Gentiles and to anti-Zionist Jews. The problem of Jewish survival, on a planet much of which yearns for the Jews’ annihilation and much of the rest of which is indifferent, is both hard and important, like P versus NP. And so a radical solution was called for. The solution arrived at a century ago, at once brand-new and older than Homer and Hesiod, was called the State of Israel. If you can’t stomach that solution—if, in particular, you can’t stomach the violence needed to preserve it, so long as Israel’s neighbors retain their annihilationist dream—then your response ought to be to propose a better solution. I promise to consider your solution in good faith—asking, just like with P vs. NP provers, how you overcome the problems that doomed all previous attempts. But if you throw my demand for a better solution back in my face, then you might as well be pushing my kids into a gas chamber yourself, for all the moral authority that I now recognize you to have over me.

Possibly the last thing Einstein wrote was a speech celebrating Israel’s seventh Independence Day; he died a week before he was to deliver it. So let’s turn the floor over to Mr. Albert, the leftist pacifist internationalist:

This is the seventh anniversary of the establishment of the State of Israel. The establishment of this State was internationally approved and recognised largely for the purpose of rescuing the remnant of the Jewish people from unspeakable horrors of persecution and oppression.

Thus, the establishment of Israel is an event which actively engages the conscience of this generation. It is, therefore, a bitter paradox to find that a State which was destined to be a shelter for a martyred people is itself threatened by grave dangers to its own security. The universal conscience cannot be indifferent to such peril.

It is anomalous that world opinion should only criticize Israel’s response to hostility and should not actively seek to bring an end to the Arab hostility which is the root cause of the tension.

I love Einstein’s use of “anomalous,” as if this were a physics problem. From the standpoint of history, what’s anomalous about the Israeli-Palestinian conflict is not, as the Twitterers claim, the brutality of the Israelis—if you think that’s anomalous, you really haven’t studied history—but something different. In other times and places, an entity like Palestine, which launches a war of total annihilation against a much stronger neighbor, and then another and another, would soon disappear from the annals of history. Israel, however, is held to a different standard. Again and again, bowing to international pressure and pressure from its own left flank, the Israelis have let their would-be exterminators off the hook, bruised but mostly still alive and completely unrepentant, to have another go at finishing the Holocaust in a few years. And after every bout, sadly but understandably, Israeli culture drifts more to the right, becomes 10% more like the other side always was.

I don’t want Israel to drift to the right. I find the values of Theodor Herzl and David Ben-Gurion to be almost as good as any human values have ever been, and I’d like Israel to keep them. Of course, Israel will need to continue defending itself from genocidal neighbors, until the day that a leader arises among the Palestinians with the moral courage of Egypt’s Anwar Sadat or Jordan’s King Hussein: a leader who not only talks peace but means it. Then there can be peace, and an end of settlements in the West Bank, and an independent Palestinian state. And however much like dark comedy that seems right now, I’m actually optimistic that it will someday happen, conceivably even soon depending on what happens in the current war. Unless nuclear war or climate change or AI apocalypse makes the whole question moot.

Anyway, thanks for reading—a lot built up these past months that I needed to get off my chest. When I told a friend that I was working on this post, he replied “I agree with you about Israel, of course, but I choose not to die on that hill in public.” I answered that I’ve already died on that hill and on several other hills, yet am somehow still alive!

Meanwhile, I was gratified that other friends, even ones who strongly disagree with me about Israel, told me that I should not disengage, but continue to tell it like I see it, trying civilly to change minds while being open to having my own mind changed.

And now, maybe, I can at last go back to happier topics, like how to prevent the destruction of the world by AI.


April 02, 2024

Terence Tao: AI Mathematical Olympiad – Progress Prize Competition now open

The first progress prize competition for the AI Mathematical Olympiad has now launched. (Disclosure: I am on the advisory committee for the prize.) This is a competition in which contestants submit an AI model which, after the submissions deadline on June 27, will be tested (on a fixed computational resource, without internet access) on a set of 50 “private” test math problems, each of which has an answer that is an integer between 0 and 999. Prior to the close of submissions, the models can be tested on 50 “public” test math problems (where the results of the model are public, but not the problems themselves), as well as on 10 training problems that are available to all contestants. As of this writing, the leaderboard shows that the best-performing model has solved 4 out of 50 of the questions (a standard benchmark, Gemma 7B, had previously solved 3 out of 50). A total of $2^{20} = $1,048,576 (about $1.048 million) has been allocated for the various prizes associated with this competition. More detailed rules can be found here.

Jordan Ellenberg: Orioles 13, Angels 4

I had the great privilege to be present at Camden Yards last weekend for what I believe to be the severest ass-whupping I have ever personally seen the Orioles administer. The Orioles went into the 6th winning 3-1 but the game felt like they were winning by more than that. Then suddenly they actually were — nine batters, nine runs, no outs (though in the middle of it all there was an easy double-play ball by Ramon Urias that the Angels’ shortstop Zach Neto just inexplicably dropped — it was that kind of day.) We had pitching (Grayson Rodriguez almost unhittable for six innings but for one mistake pitch), defense (Urias snagging a line drive at third almost before I saw it leave the bat) and of course a three-run homer, by Anthony Santander, to plate the 7th, 8th, and 9th of those nine runs.

Is being an Angels fan the saddest kind of fan to be right now? The Mets and the Padres, you have more of a “we spent all the money and built what should have been a superteam and didn’t win.” The A’s, you have the embarrassment of the on-field performance and the fact that your owner screwed your city and moved the team out of town. But the Angels? Somehow they just put together the two generational talents of this era of baseball and — didn’t do anything with them. There’s a certain heaviness to the sadness.

As good as the Orioles have been so far, taking three out of their first four and massively outscoring the opposition, I still think they weren’t really a 101-win team last year, and everything will have to go right again for them to be as good this year as they were last year. Our Felix Bautista replacement, Craig Kimbrel, has already blown his first and only save opportunity, which is to say he’s not really a Felix Bautista replacement. But it’s a hell of a team to watch.

The only downside — Gunnar Henderson, with a single, a triple and a home run already, is set to lead off the ninth but Hyde brings in Tony Kemp to pinch hit. Why? The fans want to see Gunnar on second for the cycle, let the fans see Gunnar on second for the cycle.

March 30, 2024

Andrew Jaffe: The Milky Way

Doug Natelson: Thoughts on undergrad solid-state content

Figuring out what to include in an undergraduate introduction to solid-state physics course is always a challenge.   Books like the present incarnation of Kittel are overstuffed with more content than can readily fit in a one-semester course, and because that book has grown organically from edition to edition, it's organizationally not the most pedagogical.  I'm a big fan of and have been teaching from my friend Steve Simon's Oxford Solid State Basics, which is great but a bit short for a (US) one-semester class.  Prof. Simon is interested in collecting opinions on what other topics would be good to include in a hypothetical second edition or second volume, and we thought that crowdsourcing it to this blog's readership could be fun.  As food for thought, some possibilities that occurred to me were:

  • A slightly longer discussion of field-effect transistors, since they're the basis for so much modern technology
  • A chapter or two on materials of reduced dimensionality (2D electron gas, 1D quantum wires, quantum point contacts, quantum dots; graphene and other 2D materials)
  • A discussion of fermiology (Shubnikov-de Haas, de Haas-van Alphen oscillations) - this is in Kittel, but it's difficult to explain in an accessible way
  • An introduction to the quantum Hall effect
  • Some mention of topology (anomalous velocity?  Berry connection?)
  • An intro to superconductivity (though without second quantization and the gap equation, this ends up being phenomenology)
  • Some discussion of Ginzburg-Landau treatment of phase transitions (though I tend to think of that as a topic for a statistical/thermal physics course)
  • An intro to Fermi liquid theory
  • Some additional discussion of electronic structure methods beyond the tight binding and nearly-free electron approaches in the present book (Wannier functions, an intro to density functional theory)
What do people think about this?

March 29, 2024

Matt von Hippel: Generalizing a Black Box Theory

In physics and in machine learning, we have different ways of thinking about models.

A model in physics, like the Standard Model, is a tool to make predictions. Using statistics and a whole lot of data (from particle physics experiments), we fix the model’s free parameters (like the mass of the Higgs boson). The model then lets us predict what we’ll see next: when we turn on the Large Hadron Collider, what will the data look like? In physics, when a model works well, we think that model is true, that it describes the real way the world works. The Standard Model isn’t the ultimate truth: we expect that a better model exists that makes better predictions. But it is still true, in an in-between kind of way. There really are Higgs bosons, even if they’re a result of some more mysterious process underneath, just like there really are atoms, even if they’re made out of protons, neutrons, and electrons.

A model in machine learning, like the Large Language Model that fuels ChatGPT, is also a tool to make predictions. Using statistics and a whole lot of data (from text on the internet, or images, or databases of proteins, or games of chess…) we fix the model’s free parameters (called weights, numbers for the strengths of connections between metaphorical neurons). The model then lets us predict what we’ll see next: when a text begins “Q: How do I report a stolen card? A:”, how does it end?
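In both cases, “fixing the free parameters” is ordinary statistical fitting. A toy sketch of that step, with made-up data and a one-parameter-per-coefficient linear model standing in for the Standard Model or an LLM:

```python
import numpy as np

# Toy "model": y = a*x + b, with free parameters a and b.
# The "data" here are fabricated: a known line plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.size)

# Fix the free parameters by least squares...
a, b = np.polyfit(x, y, deg=1)

# ...and then use the fitted model to predict what we'll see next.
prediction_at_12 = a * 12 + b
print(a, b, prediction_at_12)  # a near 2.0, b near 1.0
```

Whether we then call the fitted model “true” is exactly the question the rest of the post takes up; the fitting machinery itself is the same either way.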

So far, that sounds a lot like physics. But in machine learning, we don’t generally think these models are true, at least not in the same way. The thing producing language isn’t really a neural network like a Large Language Model. It’s the sum of many human brains, many internet users, spread over many different circumstances. Each brain might be sort of like a neural network, but they’re not like the neural networks sitting on OpenAI’s servers. A Large Language Model isn’t true in some in-between kind of way, like atoms or Higgs bosons. It just isn’t true. It’s a black box, a machine that makes predictions, and nothing more.

But here’s the rub: what do we mean by true?

I want to be a pragmatist here. I don’t want to get stuck in a philosophical rabbit-hole, arguing with metaphysicists about what “really exists”. A true theory should be one that makes good predictions, that lets each of us know, based on our actions, what we should expect to see. That’s why science leads to technology, why governments and companies pay people to do it: because the truth lets us know what will happen, and make better choices. So if Large Language Models and the Standard Model both make good predictions, why is only one of them true?

Recently, I saw Dan Elton of More is Different make the point that there is a practical reason to prefer the “true” explanations: they generalize. A Large Language Model might predict what words come next in a text. But it doesn’t predict what happens when you crack someone’s brain open and see how the neurons connect to each other, even if that person is the one who made the text. A good explanation, a true model, can be used elsewhere. The Standard Model tells you what data from the Large Hadron Collider will look like, but it also tells you what data from the muon g-2 experiment will look like. It also, in principle, tells you things far away from particle physics: what stars look like, what atoms look like, what the inside of a nuclear reactor looks like. A black box can’t do that, even if it makes great predictions.

It’s a good point. But thinking about it, I realized things are a little murkier.

You can’t generalize a Large Language Model to tell you how human neurons are connected. But you can generalize it in other ways, and people do. There’s a huge industry in trying to figure out what GPT and its relatives “know”. How much math can they do? How much do they know about geography? Can they predict the future?

These generalizations don’t work the way that they do in physics, or the rest of science, though. When we generalize the Standard Model, we aren’t taking a machine that makes particle physics predictions and trying to see what those particle physics predictions can tell us. We’re taking something “inside” the machine, the fields and particles, and generalizing that, seeing how the things around us could be made of those fields and those particles. In contrast, when people generalize GPT, they typically don’t look inside the “black box”. They use the Large Language Model to make predictions, and see what those predictions “know about”.

On the other hand, we do sometimes generalize scientific models that way too.

If you’re simulating the climate, or a baby star, or a colony of bacteria, you typically aren’t using your simulation like a prediction machine. You don’t plug in exactly what is going on in reality, then ask what happens next. Instead, you run many simulations with different conditions, and look for patterns. You see how a cloud of sulfur might cool down the Earth, or how baby stars often form in groups, leading them to grow up into systems of orbiting black holes. Your simulation is kind of like a black box, one that you try out in different ways until you uncover some explainable principle, something your simulation “knows” that you can generalize.

And isn’t nature that kind of black box, too? When we do an experiment, aren’t we just doing what the Large Language Models are doing, prompting the black box in different ways to get an idea of what it knows? Are scientists who do experiments that picky about finding out what’s “really going on”, or do they just want a model that works?

We want our models to be general, and to be usable. Building a black box can’t be the whole story, because a black box, by itself, isn’t general. But it can certainly be part of the story. Going from the black box of nature to the black box of a machine lets you run tests you couldn’t previously do, lets you investigate faster and ask stranger questions. With a simulation, you can blow up stars. With a Large Language Model, you can ask, for a million social media comments, whether the average internet user would call them positive or negative. And if you make sure to generalize, and try to make better decisions, then it won’t be just the machine learning. You’ll be learning too.

March 27, 2024

John Baez: T Corona Borealis


Sometime this year, the star T Corona Borealis will go nova and become much brighter! At least that’s what a lot of astronomers think. So examine the sky between Arcturus and Vega now—and look again if you hear this event has happened. Normally this star is magnitude 10, too dim to see. When it goes nova it should reach magnitude 2 for a week—as bright as the North Star. So you will see a new star, which is the original meaning of ‘nova’.
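To get a feel for how big a jump that is: the magnitude scale is logarithmic, with 5 magnitudes corresponding to a factor of 100 in brightness. A quick sketch of the arithmetic (just the standard magnitude formula, nothing specific to this star):

```python
# Brightness ratio between two apparent magnitudes.
# By definition, a difference of 5 magnitudes is a factor of 100,
# so ratio = 100 ** (delta_m / 5).
def brightness_ratio(m_faint, m_bright):
    return 100 ** ((m_faint - m_bright) / 5)

# T CrB: quiescence near magnitude 10, predicted nova peak near magnitude 2.
print(brightness_ratio(10, 2))  # about 1585, i.e. roughly 1600x brighter
```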

But why do they think T Corona Borealis will go nova this year? How could they possibly know that?

It’s done this before. It’s a binary star with a white dwarf orbiting a red giant. The red giant is spewing out gas. The much denser white dwarf collects some of this gas on its surface until there’s enough fuel to cause a runaway thermonuclear reaction—a nova!

We’ve seen it happen twice. T Corona Borealis went nova on May 12, 1866 and again on February 9, 1946. What’s happening now is a lot like what happened in 1946.

In February 2015, there was a sustained brightening of T Corona Borealis: it went from magnitude 10.5 to about 9.2. The same thing happened eight years before it went nova the last time.

In June 2018, the star dimmed slightly but still remained at an unusually high level of activity. Then in April 2023 it dimmed to magnitude 12.3. The same thing happened one year before it went nova the last time.

If this pattern continues, T Corona Borealis should erupt sometime between now and September 2024. I’m not completely confident that it will follow the same pattern! But we can just wait and see.

This is one of only 5 known repeating novas in the Milky Way, so we’re lucky to have this chance.

Here’s how it might work:

The description at NASA’s blog:

A red giant star and white dwarf orbit each other in this animation of a nova. The red giant is a large sphere in shades of red, orange, and white, with the side facing the white dwarf the lightest shades. The white dwarf is hidden in a bright glow of white and yellows, which represent an accretion disk around the star. A stream of material, shown as a diffuse cloud of red, flows from the red giant to the white dwarf. The animation opens with the red giant on the right side of the screen, co-orbiting the white dwarf. When the red giant moves behind the white dwarf, a nova explosion on the white dwarf ignites, filling the screen with white light. After the light fades, a ball of ejected nova material is shown in pale orange. A small white spot remains after the fog of material clears, indicating that the white dwarf has survived the explosion.

For more details, try this:

• B. E. Schaefer, B. Kloppenborg, E. O. Waagen and the AAVSO observers, Announcing T CrB pre-eruption dip, AAVSO News and Announcements.

March 25, 2024

John Preskill: My experimental adventures in quantum thermodynamics

Imagine a billiard ball bouncing around on a pool table. High-school level physics enables us to predict its motion until the end of time using simple equations for energy and momentum conservation, as long as you know the initial conditions – how fast the ball is moving at launch, and in which direction.

What if you add a second ball? This makes things more complicated, but predicting the future state of this system would still be possible based on the same principles. What if you had a thousand balls, or a million? Technically, you could still apply the same equations, but the problem would not be tractable in any practical sense.

Billiard balls bouncing around on a pool table are a good analogy for a many-body system like a gas of molecules. Image credit

Thermodynamics lets us make precise predictions about averaged (over all the particles) properties of complicated, many-body systems, like millions of billiard balls or atoms bouncing around, without needing to know the gory details. We can make these predictions by introducing the notion of probabilities. Even though the system is deterministic – we can in principle calculate the exact motion of every ball – there are so many balls in this system that the properties of the whole will be very close to the average properties of the balls. If you throw a six-sided die, the result is in principle deterministic and predictable, based on the way you throw it, but it’s in practice completely random to you – it could be 1 through 6, equally likely. But you know that if you cast a thousand dice, the average will be close to 3.5 – the average of all possibilities. Statistical physics enables us to calculate a probability distribution over the energies of the balls, which tells us everything about the average properties of the system. And because of entropy – the tendency for the system to go from ordered to disordered configurations – even if the probability distribution of the initial system is far from the one statistical physics predicts, after the system is allowed to bounce around and settle, the final distribution will be extremely close to a generic distribution that depends on average properties only. We call this the thermal distribution, and the process of the system mixing and settling to one of the most likely configurations – thermalization.
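The dice claim is easy to check numerically; here is a quick sketch (the sample size and seed are arbitrary choices):

```python
import random

random.seed(1)  # fixed seed so the "experiment" is repeatable

# Cast a thousand six-sided dice. Each individual throw is unpredictable
# to us, but the average settles near (1+2+3+4+5+6)/6 = 3.5.
throws = [random.randint(1, 6) for _ in range(1000)]
average = sum(throws) / len(throws)
print(average)  # close to 3.5
```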

For a practical example – instead of billiard balls, consider a gas of air molecules bouncing around. The average energy of this gas is proportional to its temperature, which we can calculate from the probability distribution of energies. Being able to predict the temperature of a gas is useful for practical things like weather forecasting, cooling your home efficiently, or building an engine. The important properties of the initial state we needed to know – energy and number of particles – are conserved during the evolution, and we call them “thermodynamic charges”. They don’t actually need to be electric charges, although electric charge is a good example of a conserved quantity.

Let’s cross from the classical world – balls bouncing around – to the quantum one, which deals with elementary particles that can be entangled, or in a superposition. What changes when we introduce this complexity? Do systems even thermalize in the quantum world? Because of the above differences, we cannot in principle be sure that the mixing and settling of the system will happen just like in the classical cases of balls or gas molecules colliding.

A visualization of a complex pattern called a quantum scar that can develop in quantum systems.

It turns out that we can predict the thermal state of a quantum system using principles and equations very similar to those that work in the classical case. Well, with one exception – what if we cannot simultaneously measure our critical quantities – the charges?

One of the quirks of quantum mechanics is that observing the state of the system can change it. Before the observation, the system might be in a quantum superposition of many states. After the observation, a definite classical value will be recorded on our instrument – we say that the system has collapsed to this state, and thus changed its state. There are certain observables that are mutually incompatible – we cannot know their values simultaneously, because observing one definite value collapses the system to a state in which the other observable is in a superposition. We call these observables noncommuting, because the order of observation matters – unlike in multiplication of numbers, which is a commuting operation you’re familiar with. 2 * 3 = 6, and also 3 * 2 = 6 – the order of multiplication doesn’t matter.

Electron spin is a common example that entails noncommutation. In a simplified picture, we can think of spin as an axis of rotation of our electron in 3D space. The electron doesn’t actually rotate in space, but it is a useful analogy – the property is called “spin” for a reason. We can measure the spin along the x-, y-, or z-axis of a 3D coordinate system and obtain a definite positive or negative value, but this observation results in a complete loss of information about the spin along the other two perpendicular directions.
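The noncommutation of spin components can be seen concretely in the standard Pauli-matrix representation of spin-1/2 (a textbook illustration, not specific to our experiment):

```python
import numpy as np

# Pauli matrices representing spin measurements along x, y, z (units of hbar/2).
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# The order of operations matters: sx and sz do not commute.
print(np.allclose(sx @ sz, sz @ sx))           # False

# In fact the commutator [sx, sy] = sx@sy - sy@sx equals 2i * sz.
print(np.allclose(sx @ sy - sy @ sx, 2j * sz))  # True
```

Because these matrices fail to commute, knowing the spin definitely along one axis forces a superposition along the others – exactly the situation described above.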

An illustration of electron spin. We can imagine it as an axis in 3D space that points in a particular direction. Image from Wikimedia Commons.

If we investigate a system that conserves the three spin components independently, we will be in a situation where the three conserved charges do not commute. We call them “non-Abelian” charges, because they enjoy a non-Abelian, that is, noncommuting, algebra. Will such a system thermalize, and if so, to what kind of final state?

This is precisely what we set out to investigate. Noncommutation of charges breaks the usual derivations of the thermal state, but researchers have managed to show that, with non-Abelian charges, a subtly different non-Abelian thermal state (NATS) should emerge. Nicole Yunger Halpern and I, at the Joint Center for Quantum Information and Computer Science (QuICS) at the University of Maryland, collaborated with Amir Kalev from the Information Sciences Institute (ISI) at the University of Southern California, and with experimentalists from the University of Innsbruck (Florian Kranzl, Manoj Joshi, Rainer Blatt and Christian Roos) to observe thermalization in a non-Abelian system – and we’ve recently published this work in PRX Quantum.

The experimentalists used a device that can trap ions with electric fields, as well as manipulate and read out their states using lasers. Only select energy levels of these ions are used, which effectively makes them behave like electrons. The laser field can couple the ions in a way that approximates the Heisenberg Hamiltonian – an interaction that conserves the three total spin components individually. We thus construct the quantum system we want to study – multiple particles coupled with interactions that conserve noncommuting charges.

We conceptually divide the ions into a system of interest and an environment. The system of interest, which consists of two particles, is what we want to measure and compare to theoretical predictions. Meanwhile, the other ions act as the effective environment for our pair of ions – the environment ions interact with the pair in a way that simulates a large bath exchanging heat and spin.

Photo of our University of Maryland group. From left to right: Twesh Upadhyaya, Billy Braasch, Shayan Majidy, Nicole Yunger Halpern, Aleks Lasek, Jose Antonio Guzman, Anthony Munson.

If we start this total system in some initial state, and let it evolve under our engineered interaction for a long enough time, we can then measure the final state of the system of interest. To make the NATS distinguishable from the usual thermal state, I designed an initial state that is easy to prepare, and has the ions pointing in directions that result in high charge averages and relatively low temperature. High charge averages make the noncommuting nature of the charges more pronounced, and low temperature makes the state easy to distinguish from the thermal background. However, we also show that our experiment works for a variety of more-arbitrary states.

We let the system evolve from this initial state for as long as possible given experimental limitations, which was 15 ms. The experimentalists then used quantum state tomography to reconstruct the state of the system of interest. Quantum state tomography makes multiple measurements over many experimental runs to approximate the average quantum state of the system measured. We then check how close the measured state is to the NATS. We have found that it’s about as close as one can expect in this experiment!
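For readers curious how “closeness” between quantum states can be quantified, here is a minimal sketch of one standard measure, the trace distance, applied to made-up single-qubit density matrices (the matrices below are purely illustrative and are not our experimental data; the actual analysis is in the paper):

```python
import numpy as np

def trace_distance(rho, sigma):
    """Trace distance 0.5 * ||rho - sigma||_1 between two density matrices."""
    eigenvalues = np.linalg.eigvalsh(rho - sigma)  # Hermitian difference
    return 0.5 * np.sum(np.abs(eigenvalues))

# Hypothetical states, for illustration only: a "measured" state and a
# nearby "thermal" state of a single qubit.
rho_measured = np.array([[0.60, 0.10], [0.10, 0.40]], dtype=complex)
rho_thermal = np.array([[0.55, 0.05], [0.05, 0.45]], dtype=complex)

d = trace_distance(rho_measured, rho_thermal)
print(round(d, 4))  # 0.0707 -- a small distance means the states are close
```

A trace distance of 0 means identical states, and 1 means perfectly distinguishable states; comparing such a distance against competitor thermal states is the spirit of the check described above.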

And we know this because we also implemented a different coupling scheme, one that doesn’t have non-Abelian charges. The expected thermal state in the latter case was reached to within a distance a little smaller than in our non-Abelian case. This tells us that the NATS is almost reached in our experiment, and that it is a good – indeed the best known – thermal state for the non-Abelian system; we have compared it to competitor thermal states.

Working with the experimentalists directly has been a new experience for me. While I was focused on the theory and analyzing the tomography results they obtained, they needed to figure out practical ways to realize what we asked of them. I feel like each group has learned a lot about the tasks of the other. I have become well acquainted with the trapped-ion experiment and its capabilities and limitations. Overall, it has been great collaborating with the Austrian group.

Our result is exciting, as it’s the first experimental observation within the field of non-Abelian thermodynamics! This result was observed in a realistic, non-fine-tuned system that experiences non-negligible errors due to noise. So the system does thermalize after all. We have also demonstrated that the trapped ion experiment of our Austrian friends can be used to simulate interesting many-body quantum systems. With different settings and programming, other types of couplings can be simulated in different types of experiments.

The experiment also opened avenues for future work. The distance to the NATS was greater than the analogous distance to the thermal state in the Abelian case. This suggests that thermalization is inhibited by the noncommutation of charges, but more evidence is needed to justify this claim. In fact, our other recent paper in Physical Review B suggests the opposite!

As noncommutation is one of the core features that distinguishes classical and quantum physics, it is of great interest to unravel the fine differences non-Abelian charges can cause. But we also hope that this research can have practical uses. If thermalization is disrupted by noncommutation of charges, engineered systems featuring them could possibly be used to build quantum memory that is more robust, or maybe even reduce noise in quantum computers. We continue to explore noncommutation, looking for interesting effects that we can pin on it. I am currently working on testing a hypothesis that explains when and why quantum systems thermalize internally.

Doug NatelsonItems of interest

The time since the APS meeting has been very busy, hence the lack of posting.  A few items of interest:

  • The present issue of Nature Physics has several articles about physics education that I really want to read. 
  • This past week we hosted N. Peter Armitage for a really fun colloquium "On Ising's Model of Magnetism" (a title that he acknowledged borrowing from Peierls).  In addition to some excellent science about spin chains, the talk included a lot of history of science about Ising that I hadn't known.  An interesting yet trivial tidbit: when he was in Germany and later Luxembourg, the pronunciation was "eeesing", while after emigrating to the US, he changed it to "eye-sing", so however you've been saying it to yourself, you're not wrong.  The fact that the Isings survived the war in Europe is amazing, given that he was a Jew in an occupied country.  Someone should write a biography....
  • When I participated in a DOD-related program 13 years ago, I had the privilege to meet General Al Gray, former commandant of the US Marine Corps.  He just passed away this week, and people had collected Grayisms (pdf), his takes on leadership and management.  I'm generally not a big fan of leadership guides and advice books, but this is good stuff, told concisely.
  • It took a while, but a Scientific American article that I wrote is now out in the April issue.
  • Integrating nitrogen-vacancy centers for magnetic field sensing directly into the diamond anvils seems like a great way to make progress on characterizing possible superconductivity in hydrides at high pressures.
  • Congratulations to Peter Woit on 20 (!!) years of blogging at Not Even Wrong.  

March 24, 2024

Tommaso DorigoThe Analogy: A Powerful Instrument For Physics Outreach

About a month ago I was contacted by a colleague who invited me to write a piece on the topic of science outreach for an electronic journal (Ithaca). I was happy to accept, but when I later pondered what I would like to write, I could not help thinking back to a piece on the power and limits of the use of analogies in the explanation of physics, which I wrote 12 years ago as a proceedings paper for a conference on physics outreach in Torino. It dawned on me that although 12 years had gone by, my understanding of what constitutes good techniques for engaging the public and for effectively communicating scientific concepts had not widened very significantly. 

read more

March 22, 2024

Matt von HippelHow Subfields Grow

A commenter recently asked me about the different “tribes” in my sub-field. I’ve been working in an area called “amplitudeology”, where we try to find more efficient ways to make predictions (calculate “scattering amplitudes”) for particle physics and gravitational waves. I plan to do a longer post on the “tribes” of amplitudeology…but not this week.

This week, I’ve got a simpler goal. I want to talk about where these kinds of “tribes” come from, in general. A sub-field is a group of researchers focused on a particular idea, or a particular goal. How do those groups change over time? How do new sub-groups form? For the amplitudes fans in the audience, I’ll use amplitudeology examples to illustrate.

The first way subfields gain new tribes is by differentiation. Do a PhD or a Postdoc with someone in a subfield, and you’ll learn that subfield’s techniques. That’s valuable, but probably not enough to get you hired: if you’re just a copy of your advisor, then the field just needs your advisor: research doesn’t need to be done twice. You need to differentiate yourself, finding a variant of what your advisor does where you can excel. The most distinct such variants go on to form distinct tribes of their own. This can also happen for researchers at the same level who collaborate as Postdocs. Each has to show something new, beyond what they did as a team. In my sub-field, it’s the source of some of the bigger tribes. Lance Dixon, Zvi Bern, and David Kosower made their names working together, but when they found long-term positions they made new tribes of their own. Zvi Bern focused on supergravity, and later on gravitational waves, while Lance Dixon was a central figure in the symbology bootstrap.

(Of course, if you differentiate too far you end up in a different sub-field, or a different field altogether. Jared Kaplan was an amplitudeologist, but I wouldn’t call Anthropic an amplitudeology project, although it would help my job prospects if it was!)

The second way subfields gain new tribes is by bridges. Sometimes, a researcher in a sub-field needs to collaborate with someone outside of that sub-field. These collaborations can just be one-and-done, but sometimes they strike up a spark, and people in each sub-field start realizing they have a lot more in common than they realized. They start showing up to each other’s conferences, and eventually identifying as two tribes in a single sub-field. An example from amplitudeology is the group founded by Dirk Kreimer, with a long track record of interesting work on the boundary between math and physics. They didn’t start out interacting with the “amplitudeology” community itself, but over time they collaborated with them more and more, and now I think it’s fair to say they’re a central part of the sub-field.

A third way subfields gain new tribes is through newcomers. Sometimes, someone outside of a subfield will decide they have something to contribute. They’ll read up on the latest papers, learn the subfield’s techniques, and do something new with them: applying them to a new problem of their own interest, or applying their own methods to a problem in the subfield. Because these people bring something new, either in what they work on or how they do it, they often spin off new tribes. Many new tribes in amplitudeology have come from this process, from Edward Witten’s work on the twistor string bringing in twistor approaches to Nima Arkani-Hamed’s idiosyncratic goals and methods.

There are probably other ways subfields gain new tribes, but these are the ones I came up with. If you think of more, let me know in the comments!

March 18, 2024

Terence TaoTalks at the JMM

Earlier this year, I gave a series of lectures at the Joint Mathematics Meetings at San Francisco. I am uploading here the slides for these talks:

I also have written a text version of the first talk, which has been submitted to the Notices of the American Mathematical Society.

Terence TaoBounding sums or integrals of non-negative quantities

A common task in analysis is to obtain bounds on sums

\displaystyle  \sum_{n \in A} f(n)

or integrals

\displaystyle  \int_A f(x)\ dx

where {A} is some simple region (such as an interval) in one or more dimensions, and {f} is an explicit (and elementary) non-negative expression involving one or more variables (such as {n} or {x}, and possibly also some additional parameters). Often, one would be content with an order of magnitude upper bound such as

\displaystyle  \sum_{n \in A} f(n) \ll X

or

\displaystyle  \int_A f(x)\ dx \ll X

where we use {X \ll Y} (or {Y \gg X} or {X = O(Y)}) to denote the bound {|X| \leq CY} for some constant {C}; sometimes one wishes to also obtain the matching lower bound, thus obtaining

\displaystyle  \sum_{n \in A} f(n) \asymp X

or

\displaystyle  \int_A f(x)\ dx \asymp X

where {X \asymp Y} is synonymous with {X \ll Y \ll X}. Finally, one may wish to obtain a more precise bound, such as

\displaystyle  \sum_{n \in A} f(n) = (1+o(1)) X

where {o(1)} is a quantity that goes to zero as the parameters of the problem go to infinity (or some other limit). (For a deeper dive into asymptotic notation in general, see this previous blog post.)

Here are some typical examples of such estimation problems, drawn from recent questions on MathOverflow:

  • (i) (From this question) If {d,p \geq 1} and {a>d/p}, is the expression

    \displaystyle  \sum_{j \in {\bf Z}} 2^{(\frac{d}{p}+1-a)j} \int_0^\infty e^{-2^j s} \frac{s^a}{1+s^{2a}}\ ds
    finite?

  • (ii) (From this question) If {h,m \geq 1}, how can one show that

    \displaystyle  \sum_{d=0}^\infty \frac{2d+1}{2h^2 (1 + \frac{d(d+1)}{h^2}) (1 + \frac{d(d+1)}{h^2m^2})^2} \ll 1 + \log(m^2)?

  • (iii) (From this question) Can one show that

    \displaystyle  \sum_{k=1}^{n-1} \frac{k^{2n-4k-3}(n^2-2nk+2k^2)}{(n-k)^{2n-4k-1}} = (c+o(1)) \sqrt{n}

    as {n \rightarrow \infty} for an explicit constant {c}, and what is this constant?

Compared to other estimation tasks, such as that of controlling oscillatory integrals, exponential sums, singular integrals, or expressions involving one or more unknown functions (that are only known to lie in some function spaces, such as an {L^p} space), high-dimensional geometry (or alternatively, large numbers of random variables), or number-theoretic structures (such as the primes), estimation of sums or integrals of non-negative elementary expressions is a relatively straightforward task, and can be accomplished by a variety of methods. The art of obtaining such estimates is typically not explicitly taught in textbooks, other than through some examples and exercises; it is typically picked up by analysts (or those working in adjacent areas, such as PDE, combinatorics, or theoretical computer science) as graduate students, while they work through their thesis or their first few papers in the subject.

Somewhat in the spirit of this previous post on analysis problem solving strategies, I am going to try here to collect some general principles and techniques that I have found useful for these sorts of problems. As with the previous post, I hope this will be something of a living document, and encourage others to add their own tips or suggestions in the comments.

— 1. Asymptotic arithmetic —

Asymptotic notation is designed so that many of the usual rules of algebra and inequality manipulation continue to hold, with the caveat that one has to be careful if subtraction or division is involved. For instance, if one knows that {A \ll X} and {B \ll Y}, then one can immediately conclude that {A + B \ll X+Y} and {AB \ll XY}, even if {A,B} are negative (note that the notation {A \ll X} or {B \ll Y} automatically forces {X,Y} to be non-negative). Equivalently, we have the rules

\displaystyle  O(X) + O(Y) = O(X+Y); \quad O(X) \cdot O(Y) = O(XY)

and more generally we have the triangle inequality

\displaystyle  \sum_\alpha O(X_\alpha) = O( \sum_\alpha X_\alpha ).

Again, we stress that this sort of rule implicitly requires the {X_\alpha} to be non-negative, and that claims such as {O(X) - O(Y) = O(X-Y)} and {O(X)/O(Y) = O(X/Y)} are simply false. As a rule of thumb, if your calculations have arrived at a situation where a signed or oscillating sum or integral appears inside the big-O notation, or on the right-hand side of an estimate, without being “protected” by absolute value signs, then you have probably made a serious error in your calculations.

Another rule of inequalities that is inherited by asymptotic notation is that if one has two bounds

\displaystyle  A \ll X; \quad A \ll Y \ \ \ \ \ (1)

for the same quantity {A}, then one can combine them into the unified asymptotic bound

\displaystyle  A \ll \min(X, Y). \ \ \ \ \ (2)

This is an example of a “free move”: a replacement of bounds that does not lose any of the strength of the original bounds, since of course (2) implies (1). In contrast, other ways to combine the two bounds (1), such as taking the geometric mean

\displaystyle  A \ll X^{1/2} Y^{1/2}, \ \ \ \ \ (3)

while often convenient, are not “free”: the bounds (1) imply the averaged bound (3), but the bound (3) does not imply (1). On the other hand, the inequality (2), while it does not concede any logical strength, can require more calculation to work with, often because one ends up splitting up cases such as {X \ll Y} and {X \gg Y} in order to simplify the minimum. So in practice, when trying to establish an estimate, one often starts with using conservative bounds such as (2) in order to maximize one’s chances of getting any proof (no matter how messy) of the desired estimate, and only after such a proof is found, one tries to look for more elegant approaches using less efficient bounds such as (3).

For instance, suppose one wanted to show that the sum

\displaystyle  \sum_{n=-\infty}^\infty \frac{2^n}{(1+n^2) (1+2^{2n})}

was convergent. Lower bounding the denominator term {1+2^{2n}} by {1} or by {2^{2n}}, one obtains the bounds

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{2^n}{1+n^2} \ \ \ \ \ (4)

and also

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{2^n}{(1+n^2) 2^{2n}} = \frac{2^{-n}}{1+n^2} \ \ \ \ \ (5)

so by applying (2) we obtain the unified bound

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{\min(2^n,2^{-n})}{1+n^2}.

To deal with this bound, we can split into the two contributions {n \geq 0}, where {2^{-n}} dominates, and {n < 0}, where {2^n} dominates. In the former case we see (from the ratio test, for instance) that the sum

\displaystyle  \sum_{n=0}^\infty \frac{2^{-n}}{1+n^2}

is absolutely convergent, and in the latter case we see that the sum

\displaystyle  \sum_{n=-\infty}^{-1} \frac{2^{n}}{1+n^2}

is also absolutely convergent, so the entire sum is absolutely convergent. But once one has this argument, one can try to streamline it, for instance by taking the geometric mean of (4), (5) rather than the minimum to obtain the weaker bound

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{1}{1+n^2} \ \ \ \ \ (6)

and now one can conclude without decomposition just by observing the absolute convergence of the doubly infinite sum {\sum_{n=-\infty}^\infty \frac{1}{1+n^2}}. This is a less “efficient” estimate, because one has conceded a lot of the decay in the summand by using (6) (the summand used to be exponentially decaying in {n}, but is now only polynomially decaying), but it is still sufficient for the purpose of establishing absolute convergence.
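As a quick numerical sanity check of these estimates (an illustration, not part of the argument), one can verify that every term of the original series is dominated by the bound (6) – here in fact with implied constant {1}, since {2^n \leq 1 + 2^{2n}} by AM-GM – and that the partial sums stabilize:

```python
def term(n):
    """Summand 2^n / ((1 + n^2) * (1 + 2^(2n)))."""
    return 2.0**n / ((1 + n**2) * (1 + 2.0**(2 * n)))

def bound(n):
    """The geometric-mean bound (6): 1 / (1 + n^2)."""
    return 1.0 / (1 + n**2)

# Every term is dominated by the bound (implied constant 1)...
for n in range(-50, 51):
    assert term(n) <= bound(n)

# ...and the partial sums over n = -N..N stabilize as N grows,
# consistent with absolute convergence.
partial = sum(term(n) for n in range(-50, 51))
print(round(partial, 6))
```

The terms decay exponentially in {|n|}, so the partial sum over {|n| \leq 50} is already an excellent approximation of the full doubly infinite sum.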

One of the key advantages of dealing with order of magnitude estimates, as opposed to sharp inequalities, is that the arithmetic becomes tropical. More explicitly, we have the important rule

\displaystyle  X + Y \asymp \max(X,Y)

whenever {X,Y} are non-negative, since we clearly have

\displaystyle  \max(X,Y) \leq X+Y \leq 2 \max(X,Y).

In particular, if {Y \leq X}, then {O(X) + O(Y) = O(X)}. That is to say, given two orders of magnitudes, any term {O(Y)} of equal or lower order to a “main term” {O(X)} can be discarded. This is a very useful rule to keep in mind when trying to estimate sums or integrals, as it allows one to discard many terms that are not contributing to the final answer. It also interacts well with monotone operations, such as raising to a power {p}; for instance, we have

\displaystyle  (X+Y)^p \asymp \max(X,Y)^p = \max(X^p,Y^p) \asymp X^p + Y^p

if {X,Y \geq 0} and {p} is a fixed positive constant, whilst

\displaystyle  \frac{1}{X+Y} \asymp \frac{1}{\max(X,Y)} = \min(\frac{1}{X}, \frac{1}{Y})

if {X,Y>0}. Finally, this relation also sets up the fundamental divide and conquer strategy for estimation: if one wants to prove a bound such as {A \ll X}, it will suffice to obtain a decomposition

\displaystyle  A = A_1 + \dots + A_k

or at least an upper bound

\displaystyle  A \ll A_1 + \dots + A_k

of {A} by some bounded number of components {A_1,\dots,A_k}, and establish the bounds {A_1 \ll X, \dots, A_k \ll X} separately. Typically the {A_1,\dots,A_k} will be (morally at least) smaller than the original quantity {A} – for instance, if {A} is a sum of non-negative quantities, each of the {A_i} might be a subsum of those same quantities – which means that such a decomposition is a “free move”, in the sense that it does not risk making the problem harder. (This is because, if the original bound {A \ll X} is to be true, each of the new objectives {A_1 \ll X, \dots, A_k \ll X} must also be true, and so the decomposition can only make the problem logically easier, not harder.) The only costs to such decomposition are that your proofs might be {k} times longer, as you may be repeating the same arguments {k} times, and that the implied constants in the {A_1 \ll X, \dots, A_k \ll X} bounds may be worse than the implied constant in the original {A \ll X} bound. However, in many cases these costs are well worth the benefits of being able to simplify the problem into smaller pieces. As mentioned above, once one successfully executes a divide and conquer strategy, one can go back and try to reduce the number of decompositions, for instance by unifying components that are treated by similar methods, or by replacing strong but unwieldy estimates with weaker, but more convenient estimates.
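The tropical rule {X + Y \asymp \max(X,Y)} underpinning this strategy is elementary enough to check mechanically (a trivial numerical illustration):

```python
import random

random.seed(1)  # reproducible sampling

# Tropical rule: for non-negative X, Y,
#   max(X, Y) <= X + Y <= 2 * max(X, Y),
# so X + Y and max(X, Y) agree up to a factor of 2.
for _ in range(1000):
    X = random.uniform(0.001, 100.0)
    Y = random.uniform(0.001, 100.0)
    assert max(X, Y) <= X + Y <= 2 * max(X, Y)

print("X + Y agrees with max(X, Y) up to a factor of 2")
```

The factor of 2 is exactly the kind of implied constant that order-of-magnitude notation is designed to absorb.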

The above divide and conquer strategy does not directly apply when one is decomposing into an unbounded number of pieces {A_j}, {j=1,2,\dots}. In such cases, one needs an additional gain in the index {j} that is summable in {j} in order to conclude. For instance, if one wants to establish a bound of the form {A \ll X}, and one has located a decomposition or upper bound

\displaystyle  A \ll \sum_{j=1}^\infty A_j

that looks promising for the problem, then it would suffice to obtain exponentially decaying bounds such as

\displaystyle  A_j \ll 2^{-cj} X

for all {j \geq 1} and some constant {c>0}, since this would imply

\displaystyle  A \ll \sum_{j=1}^\infty 2^{-cj} X \ll X \ \ \ \ \ (7)

thanks to the geometric series formula. (Here it is important that the implied constants in the asymptotic notation are uniform in {j}; a {j}-dependent bound such as {A_j \ll_j 2^{-cj} X} would be useless for this application, as then the growth of the implied constant in {j} could overwhelm the exponential decay in the {2^{-cj}} factor). Exponential decay is in fact overkill; polynomial decay such as

\displaystyle  A_j \ll \frac{X}{j^{1+c}}

would already be sufficient, although harmonic decay such as

\displaystyle  A_j \ll \frac{X}{j} \ \ \ \ \ (8)

is not quite enough (the sum {\sum_{j=1}^\infty \frac{1}{j}} diverges logarithmically), although in many such situations one could still try to salvage the bound by working a lot harder to squeeze some additional logarithmic factors out of one’s estimates. For instance, if one can improve (8) to

\displaystyle  A_j \ll \frac{X}{j \log^{1+c} j}

for all {j \geq 2} and some constant {c>0}, then one recovers the desired bound {A \ll X}, since (by the integral test) the sum {\sum_{j=2}^\infty \frac{1}{j\log^{1+c} j}} converges (and one can treat the {j=1} term separately if one already has (8)).

Sometimes, when trying to prove an estimate such as {A \ll X}, one has identified a promising decomposition with an unbounded number of terms

\displaystyle  A \ll \sum_{j=1}^J A_j

(where {J} is finite but unbounded) but is unsure of how to proceed next. Often the next thing to do is to study the extreme terms {A_1} and {A_J} of this decomposition, and first try to establish (the presumably simpler) tasks of showing that {A_1 \ll X} and {A_J \ll X}. Often once one does so, it becomes clear how to combine the treatments of the two extreme cases to also treat the intermediate cases, obtaining a bound {A_j \ll X} for each individual term, leading to the inferior bound {A \ll JX}; this can then be used as a starting point to hunt for additional gains, such as the exponential or polynomial gains mentioned previously, that could be used to remove this loss of {J}. (There are more advanced techniques, such as those based on controlling moments such as the square function {(\sum_{j=1}^J |A_j|^2)^{1/2}}, or trying to understand the precise circumstances in which a “large values” scenario {|A_j| \gg X} occurs, and how these scenarios interact with each other for different {j}, but these are beyond the scope of this post, as they are rarely needed when dealing with sums or integrals of elementary functions.)

If one is faced with the task of estimating a doubly infinite sum {\sum_{j=-\infty}^\infty A_j}, it can often be useful to first think about how one would proceed in estimating {A_j} when {j} is very large and positive, and how one would proceed when {j} is very large and negative. In many cases, one can simply decompose the sum into two pieces such as {\sum_{j=1}^\infty A_j} and {\sum_{j=-\infty}^{-1} A_j} and use whatever methods you came up with to handle the two extreme cases; in some cases one also needs a third argument to handle the case when {j} is of bounded (or somewhat bounded) size, in which case one may need to divide into three pieces such as {\sum_{j=J_+}^\infty A_j}, {\sum_{j=-\infty}^{J_-} A_j}, and {\sum_{j=J_-+1}^{J_+-1} A_j}. Sometimes there will be a natural candidate for the places {J_-, J_+} where one is cutting the sum, but in other situations it may be best to just leave these cut points as unspecified parameters initially, obtain bounds that depend on these parameters, and optimize at the end. (Typically, the optimization proceeds by trying to balance the magnitude of a term that is increasing with respect to a parameter, with one that is decreasing. For instance, if one ends up with a bound such as {A \lambda + B/\lambda} for some parameter {\lambda>0} and quantities {A,B>0}, it makes sense to select {\lambda = \sqrt{B/A}} to balance the two terms. Or, if faced with something like {A e^{-\lambda} + \lambda} for some {A > 2}, then something like {\lambda = \log A} would be close to the optimal choice of parameter. And so forth.)
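The balancing trick just mentioned can be sketched numerically; with illustrative values {A = 3}, {B = 12}, the choice {\lambda = \sqrt{B/A}} minimizes {A\lambda + B/\lambda}, attaining the value {2\sqrt{AB}}:

```python
import math

def bound(lam, A, B):
    """The two-term bound A*lambda + B/lambda to be balanced."""
    return A * lam + B / lam

A, B = 3.0, 12.0
lam_balanced = math.sqrt(B / A)            # sqrt(12/3) = 2.0
optimal_value = bound(lam_balanced, A, B)  # 2*sqrt(A*B) = 12.0

# The balanced choice beats a grid of other positive parameters
# (by AM-GM, A*lam + B/lam >= 2*sqrt(A*B) for all lam > 0).
for lam in [0.1 * k for k in range(1, 200)]:
    assert bound(lam, A, B) >= optimal_value - 1e-9

print(lam_balanced, optimal_value)  # 2.0 12.0
```

The same one-line optimization applies whenever one term grows and the other shrinks in the parameter.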

— 1.1. Psychological distinctions between exact and asymptotic arithmetic —

The adoption of the “divide and conquer” strategy requires a certain mental shift from the “simplify, simplify” strategy that one is taught in high school algebra. In the latter strategy, one tries to collect terms in an expression to make them as short as possible, for instance by working with a common denominator, with the idea that unified and elegant-looking expressions are “simpler” than sprawling expressions with many terms. In contrast, the divide and conquer strategy is intentionally extremely willing to greatly increase the total length of the expressions to be estimated, so long as each individual component of the expressions appears easier to estimate than the original one. Both strategies are still trying to reduce the original problem to a simpler problem (or collection of simpler sub-problems), but the metric by which one judges whether the problem has become simpler is rather different.

A related mental shift that one needs to adopt in analysis is to move away from the exact identities that are so prized in algebra (and in undergraduate calculus), as the precision they offer is often unnecessary and distracting for the task at hand, and often fail to generalize to more complicated contexts in which exact identities are no longer available. As a simple example, consider the task of estimating the expression

\displaystyle  \int_0^a \frac{dx}{1+x^2}

where {a > 0} is a parameter. With a trigonometric substitution, one can evaluate this expression exactly as {\mathrm{arctan}(a)}, however the presence of the arctangent can be inconvenient if one has to do further estimation tasks (for instance, if {a} depends in a complicated fashion on other parameters, which one then also wants to sum or integrate over). Instead, by observing the trivial bounds

\displaystyle  \int_0^a \frac{dx}{1+x^2} \leq \int_0^a\ dx = a

and

\displaystyle  \int_0^a \frac{dx}{1+x^2} \leq \int_0^\infty\ \frac{dx}{1+x^2} = \frac{\pi}{2}

one can combine them using (2) to obtain the upper bound

\displaystyle  \int_0^a \frac{dx}{1+x^2} \leq \min( a, \frac{\pi}{2} ) \asymp \min(a,1)

and similar arguments also give the matching lower bound, thus

\displaystyle  \int_0^a \frac{dx}{1+x^2} \asymp \min(a,1). \ \ \ \ \ (9)

This bound, while cruder than the exact answer of {\mathrm{arctan}(a)}, is often good enough for many applications (particularly in situations where one is willing to concede constants in the bounds), and can be more tractable to work with than the exact answer. Furthermore, these arguments can be adapted without difficulty to treat similar expressions such as

\displaystyle  \int_0^a \frac{dx}{(1+x^2)^\alpha}

for any fixed exponent {\alpha>0}, which need not have closed form exact expressions in terms of elementary functions such as the arctangent when {\alpha} is non-integer.
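A quick numeric sanity check (not a proof, and not part of the argument above) confirms the comparability {\mathrm{arctan}(a) \asymp \min(a,1)}: the ratio of the two quantities stays within the fixed window {[\pi/4, \pi/2]} across several orders of magnitude of {a}:

```python
import math

# arctan(a) = ∫_0^a dx/(1+x^2); check it is comparable to min(a,1),
# with implied constants lying in [pi/4, pi/2].
for a in [0.01, 0.1, 1.0, 10.0, 1000.0]:
    exact = math.atan(a)
    crude = min(a, 1.0)
    ratio = exact / crude
    assert math.pi / 4 - 1e-9 <= ratio <= math.pi / 2 + 1e-9
```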

As a general rule, instead of relying exclusively on exact formulae, one should seek approximations that are valid up to the degree of precision that one seeks in the final estimate. For instance, suppose one wishes to establish the bound

\displaystyle  \sec(x) - \cos(x) = x^2 + O(x^3)

for all sufficiently small {x}. If one was clinging to the exact identity mindset, one could try to look for some trigonometric identity to simplify the left-hand side exactly, but the quicker (and more robust) way to proceed is just to use Taylor expansion up to the specified accuracy {O(x^3)} to obtain

\displaystyle  \cos(x) = 1 - \frac{x^2}{2} + O(x^3)

which one can invert using the geometric series formula {(1-y)^{-1} = 1 + y + y^2 + \dots} to obtain

\displaystyle  \sec(x) = 1 + \frac{x^2}{2} + O(x^3)

from which the claim follows. (One could also have computed the Taylor expansion of {\sec(x)} by repeatedly differentiating the secant function, but as this is a series that is usually not memorized, this can take a little bit more time than just computing it directly to the required accuracy as indicated above.) Note that the notion of “specified accuracy” may have to be interpreted in a relative sense if one is planning to multiply or divide several estimates together. For instance, if one wishes to establish the bound

\displaystyle  \sin(x) \cos(x) = x + O(x^3)

for small {x}, one needs an approximation

\displaystyle  \sin(x) = x + O(x^3)

to the sine function that is accurate to order {O(x^3)}, but one only needs an approximation

\displaystyle  \cos(x) = 1 + O(x^2)

to the cosine function that is accurate to order {O(x^2)}, because the cosine is to be multiplied by {\sin(x)= O(x)}. Here the key is to obtain estimates that have a relative error of {O(x^2)}, compared to the main term (which is {1} for cosine, and {x} for sine).
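The expansion {\sec(x) - \cos(x) = x^2 + O(x^3)} is easy to check numerically: the error term, divided by {x^3}, should remain bounded as {x \rightarrow 0}. A minimal sketch (sample points chosen arbitrarily):

```python
import math

# Check sec(x) - cos(x) = x^2 + O(x^3) for small x: the error should be
# dominated by |x|^3 (here the implied constant 1 already suffices).
for x in [0.4, 0.2, 0.1, 0.05]:
    err = abs(1 / math.cos(x) - math.cos(x) - x**2)
    assert err <= abs(x)**3
```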

The following table lists some common approximations that can be used to simplify expressions when one is only interested in order of magnitude bounds (with {c>0} an arbitrary small constant):

The quantity… has magnitude comparable to … provided that…
{X+Y} {X} {0 \leq Y \ll X} or {|Y| \leq (1-c)X}
{X+Y} {\max(X,Y)} {X,Y \geq 0}
{\sin z}, {\tan z}, {e^{iz}-1} {|z|} {|z| \leq \frac{\pi}{2} - c}
{\cos z} {1} {|z| \leq \pi/2 - c}
{\sin x} {\mathrm{dist}(x, \pi {\bf Z})} {x} real
{e^{ix}-1} {\mathrm{dist}(x, 2\pi {\bf Z})} {x} real
{\mathrm{arcsin} x} {|x|} {|x| \leq 1-c}
{\log(1+z)} {|z|} {|z| \leq 1-c}
{e^z-1}, {\sinh z}, {\tanh z} {|z|} {|z| \leq \frac{\pi}{2}-c}
{\cosh z} {1} {|z| \leq \frac{\pi}{2}-c}
{\sinh x}, {\cosh x} {e^x} {|x| \gg 1}
{\tanh x} {\min(|x|, 1)} {x} real
{(1+x)^a-1} {a|x|} {a \gg 1}, {a |x| \ll 1}
{n!} {n^n e^{-n} \sqrt{n}} {n \geq 1}
{\Gamma(s)} {|s^s e^{-s}| / |s|^{1/2}} {|s| \gg 1}, {|\mathrm{arg} s| \leq \frac{\pi}{2} - c}
{\Gamma(\sigma+it)} {|t|^{\sigma-1/2} e^{-\pi |t|/2}} {\sigma = O(1)}, {|t| \gg 1}
{\binom{n}{m}} {e^{n (p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p})} / n^{1/2}} {m=pn}, {c < p < 1-c}
{\binom{n}{m}} {2^n e^{-2(m-n/2)^2/n} / n^{1/2}} {m = n/2 + O(n^{2/3})}
{\binom{n}{m}} {n^m/m!} {m \ll \sqrt{n}}

On the other hand, some exact formulae are still very useful, particularly if the end result of that formula is clean and tractable to work with (as opposed to involving somewhat exotic functions such as the arctangent). The geometric series formula, for instance, is an extremely handy exact formula, so much so that it is often desirable to control summands by a geometric series purely to use this formula (we already saw an example of this in (7)). Exact integral identities, such as

\displaystyle  \frac{1}{a} = \int_0^\infty e^{-at}\ dt

or more generally

\displaystyle  \frac{\Gamma(s)}{a^s} = \int_0^\infty e^{-at} t^{s-1}\ dt

for {a,s>0} (where {\Gamma} is the Gamma function) are also quite commonly used, and fundamental exact integration rules such as the change of variables formula, the Fubini-Tonelli theorem or integration by parts are all essential tools for an analyst trying to prove estimates. Because of this, it is often desirable to estimate a sum by an integral. The integral test is a classic example of this principle in action: a more quantitative version of this test is the bound

\displaystyle  \int_{a}^{b+1} f(t)\ dt \leq \sum_{n=a}^b f(n) \leq \int_{a-1}^b f(t)\ dt \ \ \ \ \ (10)

whenever {a \leq b} are integers and {f: [a-1,b+1] \rightarrow {\bf R}} is monotone decreasing, or the closely related bound

\displaystyle  \sum_{a \leq n \leq b} f(n) = \int_a^b f(t)\ dt + O( |f(a)| + |f(b)| ) \ \ \ \ \ (11)

whenever {a \leq b} are reals and {f: [a,b] \rightarrow {\bf R}} is monotone (either increasing or decreasing); see Lemma 2 of this previous post. Such bounds allow one to switch back and forth quite easily between sums and integrals as long as the summand or integrand behaves in a mostly monotone fashion (for instance, if it is monotone increasing on one portion of the domain and monotone decreasing on the other). For more precision, one could turn to more advanced relationships between sums and integrals, such as the Euler-Maclaurin formula or the Poisson summation formula, but these are beyond the scope of this post.
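The two-sided integral-test bound (10) can be checked directly for a concrete decreasing function; the sketch below does so for {f(t) = 1/t^2} (endpoints chosen arbitrarily), using a homemade midpoint-rule quadrature rather than any library integrator:

```python
import math

def integral(f, lo, hi, steps=100000):
    """Simple midpoint-rule quadrature, adequate for this sanity check."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

# Check (10) for the decreasing f(t) = 1/t^2 with a = 2, b = 50:
#   ∫_a^{b+1} f ≤ Σ_{n=a}^b f(n) ≤ ∫_{a-1}^b f.
f = lambda t: 1 / t**2
a, b = 2, 50
s = sum(f(n) for n in range(a, b + 1))
assert integral(f, a, b + 1) <= s <= integral(f, a - 1, b)
```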

Exercise 1 Suppose {f: {\bf R} \rightarrow {\bf R}^+} obeys the quasi-monotonicity property {f(x) \ll f(y)} whenever {y-1 \leq x \leq y}. Show that {\int_a^{b-1} f(t)\ dt \ll \sum_{n=a}^b f(n) \ll \int_a^{b+1} f(t)\ dt} for any integers {a < b}.

Exercise 2 Use (11) to obtain the “cheap Stirling approximation”

\displaystyle  n! = \exp( n \log n - n + O(\log n) )

for any natural number {n \geq 2}. (Hint: take logarithms to convert the product {n! = 1 \times 2 \times \dots \times n} into a sum.)
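Without spoiling the exercise, one can at least confirm the shape of the cheap Stirling approximation numerically: the quantity {\log n! - (n \log n - n)} should be {O(\log n)}. A quick check using the standard-library `lgamma` function:

```python
import math

# log(n!) - (n log n - n) should stay bounded after dividing by log n.
for n in [10, 100, 1000]:
    excess = math.lgamma(n + 1) - (n * math.log(n) - n)
    assert 0 < excess <= 2 * math.log(n)
```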

With practice, you will be able to identify any term in a computation which is already “negligible” or “acceptable” in the sense that its contribution is always going to lead to an error that is smaller than the desired accuracy of the final estimate. One can then work “modulo” these negligible terms and discard them as soon as they appear. This can help remove a lot of clutter in one’s arguments. For instance, if one wishes to establish an asymptotic of the form

\displaystyle  A = X + O(Y)

for some main term {X} and lower order error {O(Y)}, any component of {A} that one can already identify to be of size {O(Y)} is negligible and can be removed from {A} “for free”. Conversely, it can be useful to add negligible terms to an expression, if it makes the expression easier to work with. For instance, suppose one wants to estimate the expression

\displaystyle  \sum_{n=1}^N \frac{1}{n^2}. \ \ \ \ \ (12)

This is a partial sum for the zeta function

\displaystyle  \sum_{n=1}^\infty \frac{1}{n^2} = \zeta(2) = \frac{\pi^2}{6}

so it can make sense to add and subtract the tail {\sum_{n=N+1}^\infty \frac{1}{n^2}} to the expression (12) to rewrite it as

\displaystyle  \frac{\pi^2}{6} - \sum_{n=N+1}^\infty \frac{1}{n^2}.

To deal with the tail, we switch from a sum to the integral using (10) to bound

\displaystyle  \sum_{n=N+1}^\infty \frac{1}{n^2} \ll \int_N^\infty \frac{1}{t^2}\ dt = \frac{1}{N}

giving us the reasonably accurate bound

\displaystyle  \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} - O(\frac{1}{N}).

One can sharpen this approximation somewhat using (11) or the Euler–Maclaurin formula; we leave this to the interested reader.
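The asymptotic {\sum_{n=1}^N 1/n^2 = \pi^2/6 - O(1/N)} is easy to verify numerically; in fact the deficit lies strictly between {1/(N+1)} and {1/N}, consistent with the integral-test bounds:

```python
import math

# The deficit pi^2/6 - Σ_{n<=N} 1/n^2 should be positive and at most 1/N.
for N in [10, 100, 1000]:
    partial = sum(1 / n**2 for n in range(1, N + 1))
    deficit = math.pi**2 / 6 - partial
    assert 0 < deficit < 1 / N
```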

Another psychological shift when switching from algebraic simplification problems to estimation problems is that one has to be prepared to let go of constraints in an expression that complicate the analysis. Suppose for instance we now wish to estimate the variant

\displaystyle  \sum_{1 \leq n \leq N, \hbox{ square-free}} \frac{1}{n^2}

of (12), where we are now restricting {n} to be square-free. An identity from analytic number theory (the Euler product identity) lets us calculate the exact sum

\displaystyle  \sum_{n \geq 1, \hbox{ square-free}} \frac{1}{n^2} = \frac{\zeta(2)}{\zeta(4)} = \frac{15}{\pi^2}

so as before we can write the desired expression as

\displaystyle  \frac{15}{\pi^2} - \sum_{n > N, \hbox{ square-free}} \frac{1}{n^2}.

Previously, we applied the integral test (10), but this time we cannot do so, because the restriction to square-free integers destroys the monotonicity. But we can simply remove this restriction:

\displaystyle  \sum_{n > N, \hbox{ square-free}} \frac{1}{n^2} \leq \sum_{n > N} \frac{1}{n^2}.

Heuristically at least, this move only “costs us a constant”, since a positive fraction ({1/\zeta(2)= 6/\pi^2}, in fact) of all integers are square-free. Now that this constraint has been removed, we can use the integral test as before and obtain the reasonably accurate asymptotic

\displaystyle  \sum_{1 \leq n \leq N, \hbox{ square-free}} \frac{1}{n^2} = \frac{15}{\pi^2} + O(\frac{1}{N}).
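This square-free variant can also be sanity-checked numerically, using a naive trial-division test for square-freeness (adequate at this scale):

```python
import math

def squarefree(n):
    """True if no perfect square greater than 1 divides n."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

# Check Σ_{n<=N squarefree} 1/n^2 = 15/pi^2 + O(1/N): the discrepancy is
# at most the full tail Σ_{n>N} 1/n^2 < 1/N.
for N in [100, 1000]:
    partial = sum(1 / n**2 for n in range(1, N + 1) if squarefree(n))
    assert abs(partial - 15 / math.pi**2) < 1 / N
```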

— 2. More on decomposition —

The way in which one decomposes a sum or integral such as {\sum_{n \in A} f(n)} or {\int_A f(x)\ dx} is often guided by the “geometry” of {f}, and in particular where {f} is large or small (or whether various component terms in {f} are large or small relative to each other). For instance, if {f(x)} comes close to a maximum at some point {x=x_0}, then it may make sense to decompose based on the distance {|x-x_0|} to {x_0}, or perhaps to treat the cases {x \leq x_0} and {x>x_0} separately. (Note that {x_0} does not literally have to be the maximum in order for this to be a reasonable decomposition; if it is “within reasonable distance” of the maximum, this could still be a good move. As such, it is often not worthwhile to try to compute the maximum of {f} exactly, especially if this exact formula ends up being too complicated to be useful.)

If an expression involves a distance {|X-Y|} between two quantities {X,Y}, it is sometimes useful to split into the case {|X| \leq |Y|/2} where {X} is much smaller than {Y} (so that {|X-Y| \asymp |Y|}), the case {|Y| \leq |X|/2} where {Y} is much smaller than {X} (so that {|X-Y| \asymp |X|}), or the case when neither of the two previous cases apply (so that {|X| \asymp |Y|}). The factors of {2} here are not of critical importance; the point is that in each of these three cases, one has some hope of simplifying the expression into something more tractable. For instance, suppose one wants to estimate the expression

\displaystyle  \int_{-\infty}^\infty \frac{dx}{(1+(x-a)^2) (1+(x-b)^2)} \ \ \ \ \ (13)

in terms of the two real parameters {a, b}, which we will take to be distinct for sake of this discussion. This particular integral is simple enough that it can be evaluated exactly (for instance using contour integration techniques), but in the spirit of Principle 1, let us avoid doing so and instead try to decompose this expression into simpler pieces. A graph of the integrand reveals that it peaks when {x} is near {a} or near {b}. Inspired by this, one can decompose the region of integration into three pieces:

  • (i) The region where {|x-a| \leq \frac{|a-b|}{2}}.
  • (ii) The region where {|x-b| \leq \frac{|a-b|}{2}}.
  • (iii) The region where {|x-a|, |x-b| > \frac{|a-b|}{2}}.

(This is not the only way to cut up the integral, but it will suffice. Often there is no “canonical” or “elegant” way to perform the decomposition; one should just try to find a decomposition that is convenient for the problem at hand.)

The reason why we want to perform such a decomposition is that in each of the three cases, one can simplify how the integrand depends on {x}. For instance, in region (i), we see from the triangle inequality that {|x-b|} is now comparable to {|a-b|}, so that this contribution to (13) is comparable to

\displaystyle  \asymp \int_{|x-a| \leq |a-b|/2} \frac{dx}{(1+(x-a)^2) (1+(a-b)^2)}.

Using a variant of (9), this expression is comparable to

\displaystyle  \asymp \min( 1, |a-b|/2) \frac{1}{1+(a-b)^2} \asymp \frac{\min(1, |a-b|)}{1+(a-b)^2}. \ \ \ \ \ (14)

The contribution of region (ii) can be handled similarly, and is also comparable to (14). Finally, in region (iii), we see from the triangle inequality that {|x-a|, |x-b|} are now comparable to each other, and so the contribution of this region is comparable to

\displaystyle  \asymp \int_{|x-a|, |x-b| > |a-b|/2} \frac{dx}{(1+(x-a)^2)^2}.

Now that we have centered the integral around {x=a}, we will discard the {|x-b| > |a-b|/2} constraint, upper bounding this integral by

\displaystyle  \asymp \int_{|x-a| > |a-b|/2} \frac{dx}{(1+(x-a)^2)^2}.

On the one hand this integral is bounded by

\displaystyle  \int_{-\infty}^\infty \frac{dx}{(1+(x-a)^2)^2} = \int_{-\infty}^\infty \frac{dx}{(1+x^2)^2} \asymp 1

and on the other hand we can bound

\displaystyle  \int_{|x-a| > |a-b|/2} \frac{dx}{(1+(x-a)^2)^2} \leq \int_{|x-a| > |a-b|/2} \frac{dx}{(x-a)^4}

\displaystyle \asymp |a-b|^{-3}

and so we can bound the contribution of (iii) by {O( \min( 1, |a-b|^{-3} ))}. Putting all this together, and dividing into the cases {|a-b| \leq 1} and {|a-b| > 1}, one can soon obtain a total bound of {O(\min( 1, |a-b|^{-2}))} for the entire integral. One can also adapt this argument to show that this bound is sharp up to constants, thus

\displaystyle  \int_{-\infty}^\infty \frac{dx}{(1+(x-a)^2) (1+(x-b)^2)} \asymp \min( 1, |a-b|^{-2})

\displaystyle  \asymp \frac{1}{1+|a-b|^2}.
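The two-sided bound just obtained can be verified numerically for sample parameter values (chosen arbitrarily below). The sketch uses a homemade midpoint quadrature over a window wide enough that the {O(x^{-4})} tails are negligible, and checks that the integral times {1+(a-b)^2} stays within a fixed constant window:

```python
import math

def integrand(x, a, b):
    return 1 / ((1 + (x - a)**2) * (1 + (x - b)**2))

def integral(a, b, lo=-500.0, hi=500.0, steps=200000):
    """Midpoint-rule quadrature; the tails beyond |x|=500 are ~1e-8."""
    h = (hi - lo) / steps
    return sum(integrand(lo + (i + 0.5) * h, a, b) for i in range(steps)) * h

# Check ∫ dx/((1+(x-a)^2)(1+(x-b)^2)) ≍ 1/(1+|a-b|^2) for several gaps.
for a, b in [(0.0, 0.5), (0.0, 2.0), (0.0, 10.0), (3.0, 40.0)]:
    ratio = integral(a, b) * (1 + (a - b)**2)
    assert 0.5 <= ratio <= 10.0
```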

A powerful and common type of decomposition is dyadic decomposition. If the summand or integrand involves some quantity {Q} in a key way, it is often useful to break up into dyadic regions such as {2^{j-1} \leq Q < 2^{j}}, so that {Q \sim 2^j}, and then sum over {j}. (One can tweak the dyadic range {2^{j-1} \leq Q < 2^{j}} here with minor variants such as {2^{j} < Q \leq 2^{j+1}}, or replace the base {2} by some other base, but these modifications usually have at most a minor aesthetic impact on the arguments.) For instance, one could break up a sum

\displaystyle  \sum_{n=1}^{\infty} f(n) \ \ \ \ \ (15)

into dyadic pieces

\displaystyle  \sum_{j=1}^\infty \sum_{2^{j-1} \leq n < 2^{j}} f(n)

and then seek to estimate each dyadic block {\sum_{2^{j-1} \leq n < 2^{j}} f(n)} separately (hoping to get some exponential or polynomial decay in {j}). The classical technique of Cauchy condensation is a basic example of this strategy. But one can also dyadically decompose other quantities than {n}. For instance one can perform a “vertical” dyadic decomposition (in contrast to the “horizontal” one just performed) by rewriting (15) as

\displaystyle  \sum_{k \in {\bf Z}} \sum_{n \geq 1: 2^{k-1} \leq f(n) < 2^k} f(n);

since the summand {f(n)} is {\asymp 2^k}, we may simplify this to

\displaystyle  \asymp \sum_{k \in {\bf Z}} 2^k \# \{ n \geq 1: 2^{k-1} \leq f(n) < 2^k\}.
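As a concrete sanity check of this vertical decomposition (with {f(n) = 1/n^2} and a truncation chosen purely for the demo), the level-set expression {\sum_k 2^k \#\{n: 2^{k-1} \leq f(n) < 2^k\}} should agree with the direct sum up to a factor of {2}:

```python
# Compare Σ f(n) with its vertical dyadic decomposition for f(n) = 1/n^2.
f = lambda n: 1 / n**2
N = 10000
direct = sum(f(n) for n in range(1, N + 1))

dyadic = 0.0
for k in range(1, -29, -1):  # here f(n) ranges over (1e-8, 1], so these k suffice
    lo_t, hi_t = 2.0**(k - 1), 2.0**k
    count = sum(1 for n in range(1, N + 1) if lo_t <= f(n) < hi_t)
    dyadic += hi_t * count

# On each block, f(n) lies in [2^{k-1}, 2^k), so the two sums agree up to 2.
assert direct <= dyadic <= 2 * direct
```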

This now converts the problem of estimating the sum (15) to the more combinatorial problem of estimating the size of the dyadic level sets {\{ n \geq 1: 2^{k-1} \leq f(n) < 2^k\}} for various {k}. In a similar spirit, we have

\displaystyle  \int_A f(x)\ dx \asymp \sum_{k \in {\bf Z}} 2^k | \{ x \in A: 2^{k-1} \leq f(x) < 2^k \}|

where {|E|} denotes the Lebesgue measure of a set {E}, and now we are faced with a geometric problem of estimating the measure of some explicit set. This allows one to use geometric intuition to solve the problem, instead of multivariable calculus:

Exercise 3 Let {S} be a smooth compact submanifold of {{\bf R}^d}. Establish the bound

\displaystyle  \int_{B(0,C)} \frac{dx}{\varepsilon^2 + \mathrm{dist}(x,S)^2} \ll \varepsilon^{-1}

for all {0 < \varepsilon < C}, where the implied constants are allowed to depend on {C, d, S}. (This can be accomplished either by a vertical dyadic decomposition, or a dyadic decomposition of the quantity {\mathrm{dist}(x,S)}.)

Exercise 4 Solve problem (ii) from the introduction to this post by dyadically decomposing in the {d} variable.

Remark 5 By such tools as (10), (11), or Exercise 1, one could convert the dyadic sums one obtains from dyadic decomposition into integral variants. However, if one wished, one could “cut out the middle-man” and work with continuous dyadic decompositions rather than discrete ones. Indeed, from the integral identity

\displaystyle  \int_0^\infty 1_{\lambda < Q \leq 2\lambda} \frac{d\lambda}{\lambda} = \log 2

for any {Q>0}, together with the Fubini–Tonelli theorem, we obtain the continuous dyadic decomposition

\displaystyle  \sum_{n \in A} f(n) = \frac{1}{\log 2} \int_0^\infty \sum_{n \in A: \lambda < Q(n) \leq 2\lambda} f(n)\ \frac{d\lambda}{\lambda}

for any quantity {Q(n)} that is positive whenever {f(n)} is positive. Similarly if we work with integrals {\int_A f(x)\ dx} rather than sums. This version of dyadic decomposition is occasionally a little more convenient to work with, particularly if one then wants to perform various changes of variables in the {\lambda} parameter which would be tricky to execute if this were a discrete variable.
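The integral identity underlying this remark is elementary: the indicator {1_{\lambda < Q \leq 2\lambda}} is supported on {\lambda \in [Q/2, Q)}, and {\int_{Q/2}^{Q} d\lambda/\lambda = \log 2} regardless of {Q}. A quick numerical confirmation (sample values of {Q} chosen arbitrarily):

```python
import math

# ∫_0^∞ 1_{λ < Q ≤ 2λ} dλ/λ = ∫_{Q/2}^{Q} dλ/λ = log 2 for any Q > 0.
for Q in [0.3, 1.0, 7.5]:
    lo, hi, steps = Q / 2, Q, 100000
    h = (hi - lo) / steps
    val = sum(h / (lo + (i + 0.5) * h) for i in range(steps))
    assert abs(val - math.log(2)) < 1e-6
```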

— 3. Exponential weights —

Many sums involve expressions that are “exponentially large” or “exponentially small” in some parameter. A basic rule of thumb is that any quantity that is “exponentially small” will likely give a negligible contribution when compared against quantities that are not exponentially small. For instance, if an expression involves a term of the form {e^{-Q}} for some non-negative quantity {Q}, which can be bounded on at least one portion of the domain of summation or integration, then one expects the region where {Q} is bounded to provide the dominant contribution. For instance, if one wishes to estimate the integral

\displaystyle  \int_0^\infty e^{-\varepsilon x} \frac{dx}{1+x}

for some {0 < \varepsilon < 1/2}, this heuristic suggests that the dominant contribution should come from the region {x = O(1/\varepsilon)}, in which one can bound {e^{-\varepsilon x}} simply by {1} and obtain an upper bound of

\displaystyle  \ll \int_{x = O(1/\varepsilon)} \frac{dx}{1+x} \ll \log \frac{1}{\varepsilon}.

To make such a heuristic precise, one can perform a dyadic decomposition in the exponential weight {e^{-\varepsilon x}}, or equivalently perform an additive decomposition in the exponent {\varepsilon x}, for instance writing

\displaystyle  \int_0^\infty e^{-\varepsilon x} \frac{dx}{1+x} = \sum_{j=1}^\infty \int_{j-1 \leq \varepsilon x < j} e^{-\varepsilon x} \frac{dx}{1+x}.

Exercise 6 Use this decomposition to rigorously establish the bound

\displaystyle  \int_0^\infty e^{-\varepsilon x} \frac{dx}{1+x} \ll \log \frac{1}{\varepsilon}

for any {0 < \varepsilon < 1/2}.
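Without carrying out the exercise, one can at least confirm the logarithmic growth numerically. The sketch below uses a homemade midpoint quadrature truncated at {x = 50/\varepsilon} (beyond which the exponential weight makes the remainder negligible):

```python
import math

def integral(eps, steps=300000):
    """Midpoint rule for ∫_0^{50/eps} e^{-eps x}/(1+x) dx."""
    hi = 50.0 / eps
    h = hi / steps
    return sum(math.exp(-eps * (i + 0.5) * h) / (1 + (i + 0.5) * h)
               for i in range(steps)) * h

# The ratio of the integral to log(1/eps) should stay bounded as eps -> 0.
for eps in [0.1, 0.01, 0.001]:
    ratio = integral(eps) / math.log(1 / eps)
    assert 0.5 <= ratio <= 2.0
```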

Exercise 7 Solve problem (i) from the introduction to this post.

More generally, if one is working with a sum or integral such as

\displaystyle  \sum_{n \in A} e^{\phi(n)} \psi(n)

or

\displaystyle  \int_A e^{\phi(x)} \psi(x)\ dx

with some exponential weight {e^\phi} and a lower order amplitude {\psi}, then one typically expects the dominant contribution to come from the region where {\phi} comes close to attaining its maximal value. If this maximum is attained on the boundary, then the weight typically decays like a geometric series away from the boundary, and one can often get a good estimate by comparison with such a series. For instance, suppose one wants to estimate the error function

\displaystyle  \mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\ dt

for {z \geq 1}. In view of the complete integral

\displaystyle  \int_0^\infty e^{-t^2}\ dt = \frac{\sqrt{\pi}}{2}

we can rewrite this as

\displaystyle  \mathrm{erf}(z) = 1 - \frac{2}{\sqrt{\pi}} \int_z^\infty e^{-t^2}\ dt.

The exponential weight {e^{-t^2}} attains its maximum at the left endpoint {t=z} and decays quickly away from that endpoint. One could estimate this by dyadic decomposition of {e^{-t^2}} as discussed previously, but a slicker way to proceed here is to use the convexity of {t^2} to obtain a geometric series upper bound

\displaystyle  e^{-t^2} \leq e^{-z^2 - 2 z (t-z)}

for {t \geq z}, which on integration gives

\displaystyle  \int_z^\infty e^{-t^2}\ dt \leq \int_z^\infty e^{-z^2 - 2 z (t-z)}\ dt = \frac{e^{-z^2}}{2z}

giving the asymptotic

\displaystyle  \mathrm{erf}(z) = 1 - O( \frac{e^{-z^2}}{z})

for {z \geq 1}.
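This asymptotic is easy to test numerically with the standard-library `erf`: the deficit {1 - \mathrm{erf}(z)}, rescaled by {z e^{z^2}}, should stay bounded for {z \geq 1} (in fact it tends to {1/\sqrt{\pi} \approx 0.564}):

```python
import math

# Check erf(z) = 1 - O(e^{-z^2}/z) at a few sample points.
for z in [1.0, 2.0, 3.0]:
    deficit = 1 - math.erf(z)
    scaled = deficit * z * math.exp(z * z)
    assert 0.3 <= scaled <= 0.6
```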

Exercise 8 In the converse direction, establish the upper bound

\displaystyle  \mathrm{erf}(z) \leq 1 - c \frac{e^{-z^2}}{z}

for some absolute constant {c>0} and all {z \geq 1}.

Exercise 9 If {\theta n \leq m \leq n} for some {1/2 < \theta < 1}, show that

\displaystyle  \sum_{k=m}^n \binom{n}{k} \ll \frac{1}{2\theta-1} \binom{n}{m}.

(Hint: estimate the ratio between consecutive binomial coefficients {\binom{n}{k}} and then control the sum by a geometric series).

When the maximum of the exponent {\phi} occurs in the interior of the region of summation or integration, then one can get good results by some version of Laplace’s method. For simplicity we will discuss this method in the context of one-dimensional integrals

\displaystyle  \int_a^b e^{\phi(x)} \psi(x)\ dx

where {\phi} attains a non-degenerate global maximum at some interior point {x = x_0}. The rule of thumb here is that

\displaystyle \int_a^b e^{\phi(x)} \psi(x)\ dx \approx \sqrt{\frac{2\pi}{|\phi''(x_0)|}} e^{\phi(x_0)} \psi(x_0).

The heuristic justification is as follows. The main contribution should be when {x} is close to {x_0}. Here we can perform a Taylor expansion

\displaystyle  \phi(x) \approx \phi(x_0) - \frac{1}{2} |\phi''(x_0)| (x-x_0)^2

since at a non-degenerate maximum we have {\phi'(x_0)=0} and {\phi''(x_0) < 0}. Also, if {\psi} is continuous, then {\psi(x) \approx \psi(x_0)} when {x} is close to {x_0}. Thus we should be able to estimate the above integral by the gaussian integral

\displaystyle  \int_{\bf R} e^{\phi(x_0) - \frac{1}{2} |\phi''(x_0)| (x-x_0)^2} \psi(x_0)\ dx

which can be computed to equal {\sqrt{\frac{2\pi}{|\phi''(x_0)|}} e^{\phi(x_0)} \psi(x_0)} as desired.
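The rule of thumb can be tested on a simple example (chosen for the demo, not from the text): {\phi(x) = -50(x-1/2)^2} and {\psi(x) = 1+x} on {[0,2]}, where the maximum of {\phi} is at the interior point {x_0 = 1/2}. A homemade midpoint quadrature should then agree with {\sqrt{2\pi/|\phi''(x_0)|}\, e^{\phi(x_0)} \psi(x_0)} to within a few percent:

```python
import math

def laplace_rule(phi_x0, phi2_x0, psi_x0):
    """The heuristic ∫ e^{phi} psi ≈ sqrt(2π/|phi''(x0)|) e^{phi(x0)} psi(x0)."""
    return math.sqrt(2 * math.pi / abs(phi2_x0)) * math.exp(phi_x0) * psi_x0

phi = lambda x: -50 * (x - 0.5)**2   # phi(0.5) = 0, phi''(0.5) = -100
psi = lambda x: 1 + x                # psi(0.5) = 1.5

steps, lo, hi = 200000, 0.0, 2.0
h = (hi - lo) / steps
numeric = sum(math.exp(phi(lo + (i + 0.5) * h)) * psi(lo + (i + 0.5) * h)
              for i in range(steps)) * h
approx = laplace_rule(0.0, -100.0, 1.5)

assert abs(numeric / approx - 1) < 0.05
```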

Let us illustrate how this argument can be made rigorous by considering the task of estimating the factorial {n!} of a large number. In contrast to what we did in Exercise 2, we will proceed using a version of Laplace’s method, relying on the integral representation

\displaystyle  n! = \Gamma(n+1) = \int_0^\infty x^n e^{-x}\ dx.

As {n} is large, we will consider {x^n} to be part of the exponential weight rather than the amplitude, writing this expression as

\displaystyle  \int_0^\infty e^{-\phi(x)}\ dx

where

\displaystyle  \phi(x) = x - n \log x.

The function {\phi} attains a global minimum at {x_0 = n} (so that the weight {e^{-\phi}} attains its global maximum there), with {\phi'(n) = 0} and {\phi''(n) = 1/n}. We will therefore decompose this integral into three pieces

\displaystyle  \int_0^{n-R} e^{-\phi(x)}\ dx + \int_{n-R}^{n+R} e^{-\phi(x)}\ dx + \int_{n+R}^\infty e^{-\phi(x)}\ dx \ \ \ \ \ (16)

where {0 < R < n} is a radius parameter which we will choose later, as it is not immediately obvious what the optimal value of this parameter is (although the previous heuristics do suggest that {R \approx 1 / |\phi''(x_0)|^{1/2}} might be a reasonable choice).

The main term is expected to be the middle term, so we shall use crude methods to bound the other two terms. For the first part where {0 < x \leq n-R}, {\phi} is decreasing (so that {e^{-\phi}} is increasing), so we can crudely bound {e^{-\phi(x)} \leq e^{-\phi(n-R)}} and thus

\displaystyle  \int_0^{n-R} e^{-\phi(x)}\ dx \leq (n-R) e^{-\phi(n-R)} \leq n e^{-\phi(n-R)}.

(We expect {R} to be much smaller than {n}, so there is not much point to saving the tiny {-R} term in the {n-R} factor.) For the third part where {x \geq n+R}, {\phi} is increasing (so that {e^{-\phi}} is decreasing), but bounding {e^{-\phi(x)}} by {e^{-\phi(n+R)}} would not work because of the unbounded nature of {x}; some additional decay is needed. Fortunately, we have the derivative lower bound

\displaystyle  \phi'(x) = 1 - \frac{n}{x} \geq 1 - \frac{n}{n+R} = \frac{R}{n+R}

for {x \geq n+R}, so by the mean value theorem we have

\displaystyle  \phi(x) \geq \phi(n+R) + \frac{R}{n+R} (x-n-R)

and after a short calculation this gives

\displaystyle  \int_{n+R}^\infty e^{-\phi(x)}\ dx \leq \frac{n+R}{R} e^{-\phi(n+R)} \ll \frac{n}{R} e^{-\phi(n+R)}.

Now we turn to the important middle term. If we assume {R \leq n/2}, then we will have {\phi'''(x) = O( 1/n^2 )} in the region {n-R \leq x \leq n+R}, so by Taylor’s theorem with remainder

\displaystyle  \phi(x) = \phi(n) + \phi'(n) (x-n) + \frac{1}{2} \phi''(n) (x-n)^2 + O( \frac{|x-n|^3}{n^2} )

\displaystyle  = \phi(n) + \frac{(x-n)^2}{2n} + O( \frac{R^3}{n^2} ).

If we assume that {R = O(n^{2/3})}, then the error term is bounded and we can exponentiate to obtain

\displaystyle  e^{-\phi(x)} = (1 + O(\frac{R^3}{n^2})) e^{-\phi(n) - \frac{(x-n)^2}{2n}} \ \ \ \ \ (17)

for {n-R \leq x \leq n+R} and hence

\displaystyle \int_{n-R}^{n+R} e^{-\phi(x)}\ dx = (1 + O(\frac{R^3}{n^2})) e^{-\phi(n)} \int_{n-R}^{n+R} e^{-(x-n)^2/2n}\ dx.

If we also assume that {R \gg \sqrt{n}}, we can use the error function type estimates from before to estimate

\displaystyle  \int_{n-R}^{n+R} e^{-(x-n)^2/2n}\ dx = \sqrt{2\pi n} + O( \frac{n}{R} e^{-R^2/2n} ).

Putting all this together, and using (17) to estimate {e^{-\phi(n \pm R)} \ll e^{-\phi(n) - \frac{R^2}{2n}}}, we conclude that

\displaystyle  n! = e^{-\phi(n)} ( (1 + O(\frac{R^3}{n^2})) \sqrt{2\pi n} + O( \frac{n}{R} e^{-R^2/2n})

\displaystyle  + O( n e^{-R^2/2n} ) + O( \frac{n}{R} e^{-R^2/2n} ) )

\displaystyle  = e^{-n+n \log n} (\sqrt{2\pi n} + O( \frac{R^2}{n} + n e^{-R^2/2n} ))

so if we select {R=n^{2/3}} for instance, we obtain the Stirling approximation

\displaystyle  n! = \frac{n^n}{e^n} (\sqrt{2\pi n} + O( n^{1/3}) ).

One can improve the error term by a finer decomposition than (16); we leave this as an exercise to the interested reader.
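The final Stirling approximation can be checked numerically: the quantity {n!\, e^n / n^n} should differ from {\sqrt{2\pi n}} by at most {O(n^{1/3})} (in fact the true discrepancy is much smaller, of size {\asymp n^{-1/2}}):

```python
import math

# Check n! = (n^n/e^n)(sqrt(2πn) + O(n^{1/3})) at a few sample points.
for n in [5, 10, 30, 60]:
    lhs = math.factorial(n) * math.exp(n) / n**n
    err = abs(lhs - math.sqrt(2 * math.pi * n))
    assert err <= n ** (1 / 3)
```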

Remark 10 It can be convenient to do some initial rescalings to this analysis to achieve a nice normalization; see this previous blog post for details.

Exercise 11 Solve problem (iii) from the introduction. (Hint: extract out the term {\frac{k^{2n-4k}}{(n-k)^{2n-4k}}} to write as the exponential factor {e^{\phi(k)}}, placing all the other terms (which are of polynomial size) in the amplitude function {\psi(k)}. The function {\phi} will then attain a maximum at {k=n/2}; perform a Taylor expansion and mimic the arguments above.)

John PreskillThe quantum gold rush

Even if you don’t recognize the name, you probably recognize the saguaro cactus. It’s the archetype of the cactus, a column from which protrude arms bent at right angles like elbows. As my husband pointed out, the cactus emoji is a saguaro: 🌵. In Tucson, Arizona, even the airport has a saguaro crop sufficient for staging a Western short film. I didn’t have a film to shoot, but the garden set the stage for another adventure: the ITAMP winter school on quantum thermodynamics.

Tucson airport

ITAMP is the Institute for Theoretical Atomic, Molecular, and Optical Physics (the Optical is silent). Harvard University and the Smithsonian Institution share ITAMP, where I worked as a postdoc. ITAMP hosted the first quantum-thermodynamics conference to take place on US soil, in 2017. Also, ITAMP hosts a winter school in Arizona every February. (If you lived in the Boston area, you might want to escape to the southwest then, too.) The winter school’s topic varies from year to year. 

How about a winter school on quantum thermodynamics? ITAMP’s director, Hossein Sadeghpour, asked me when I visited Cambridge, Massachusetts last spring.

Let’s do it, I said. 

Lecturers came from near and far. Kanu Sinha, of the University of Arizona, spoke about how electric charges fluctuate in the quantum vacuum. Fluctuations feature also in extensions of the second law of thermodynamics, which helps explain why time flows in only one direction. Gabriel Landi, from the University of Rochester, lectured about these fluctuation relations. ITAMP Postdoctoral Fellow Ceren Dag explained why many-particle quantum systems register time’s arrow. Ferdinand Schmidt-Kaler described the many-particle quantum systems—the trapped ions—in his lab at the University of Mainz.

Ronnie Kosloff, of Hebrew University in Jerusalem, lectured about quantum engines. Nelly Ng, an Assistant Professor at Nanyang Technological University, has featured on Quantum Frontiers at least three times. She described resource theories—information-theoretic models—for thermodynamics. Information and energy both serve as resources in thermodynamics and computation, I explained in my lectures.

The 2024 ITAMP winter school

The winter school took place at the conference center adjacent to Biosphere 2. Biosphere 2 is an enclosure that contains several miniature climate zones, including a coastal fog desert, a rainforest, and an ocean. You might have heard of Biosphere 2 due to two experiments staged there during the 1990s: in each experiment, a group of people was sealed in the enclosure. The experimentalists harvested their own food and weren’t supposed to receive any matter from outside. The first experiment lasted for two years. The group, though, ran out of oxygen, which a support crew pumped in. Research at Biosphere 2 contributes to our understanding of ecosystems and space colonization.

Fascinating as the landscape inside Biosphere 2 is, so is the landscape outside. The winter school included an afternoon hike, and my husband and I explored the territory around the enclosure.

Did you see any snakes? my best friend asked after I returned home.

No, I said. But we were chased by a vicious beast. 

On our first afternoon, my husband and I followed an overgrown path away from the biosphere to an almost deserted-looking cluster of buildings. We eventually encountered what looked like a warehouse from which noises were emanating. Outside hung a sign with which I resonated.

Scientists, I thought. Indeed, a researcher emerged from the warehouse and described his work to us. His group was preparing to seal off a building where they were simulating a Martian environment. He also warned us about the territory we were about to enter, especially the creature that roosted there. We were too curious to retreat, though, so we set off into a ghost town.

At least, that’s what the other winter-school participants called the area, later in the week—a ghost town. My husband and I had already surveyed the administrative offices, conference center, and other buildings used by biosphere personnel today. Personnel in the 1980s used a different set of buildings. I don’t know why one site gave way to the other. But the old buildings survive—as what passes for ancient ruins to many Americans. 

Weeds have grown up in the cracks in an old parking lot’s tarmac. A sign outside one door says, “Classroom”; below it is a sign that must not have been correct in decades: “Class in progress.” Through the glass doors of the old visitors’ center, we glimpsed cushioned benches and what appeared to be a diorama exhibit; outside, feathers and bird droppings covered the ground. I searched for a tumbleweed emoji, to illustrate the atmosphere, but found only a tumbler one: 🥃.

After exploring, my husband and I rested in the shade of an empty building, drank some of the water we’d brought, and turned around. We began retracing our steps past the defunct visitors’ center. Suddenly, a monstrous Presence loomed on our right. 

I can’t tell you how large it was; I only glimpsed it before turning and firmly not running away. But the Presence loomed. And it confirmed what I’d guessed upon finding the feathers and droppings earlier: the old visitors’ center now served as the Lair of the Beast.

The Mars researcher had warned us about the aggressive male turkey who ruled the ghost town. The turkey, the researcher had said, hated men—especially men wearing blue. My husband, naturally, was wearing a blue shirt. You might be able to outrun him, the researcher added pensively.

My husband zipped up his black jacket over the blue shirt. I advised him to walk confidently and not too quickly. Hikes in bear country, as well as summers at Busch Gardens Zoo Camp, gave me the impression that we mustn’t run; the turkey would probably chase us, get riled up, and excite himself to violence. So we walked, and the monstrous turkey escorted us. For surprisingly and frighteningly many minutes. 

The turkey kept scolding us in monosyllabic squawks, which sounded increasingly close to the back of my head. I didn’t turn around to look, but he sounded inches away. I occasionally responded in the soothing voice I was taught to use on horses. But my husband and I marched increasingly quickly.

We left the old visitors’ center, curved around, and climbed most of a hill before ceasing to threaten the turkey—or before he ceased to threaten us. He squawked a final warning and fell back. My husband and I found ourselves amid the guest houses of workshops past, shaky but unmolested. Not that the turkey wreaks much violence, according to the Mars researcher: at most, he beats his wings against people and scratches up their cars (especially blue ones). But we were relieved to return to civilization.

Afternoon hike at Catalina State Park, a drive away from Biosphere 2. (Yes, that’s a KITP hat.)

The ITAMP winter school reminded me of Roughing It, a Mark Twain book I finished this year. Twain chronicled the adventures he’d experienced out West during the 1860s. The Gold Rush, he wrote, attracted the top young men of all nations. The quantum-technologies gold rush has been attracting the top young people of all nations, and the winter school evidenced their eagerness. Yet the winter school also evidenced how many women have risen to the top: 10 of the 24 registrants were women, as were four of the seven lecturers.1 

The winter-school participants in the shuttle I rode from the Tucson airport to Biosphere 2

We’ll see to what extent the quantum-technologies gold rush plays out like Mark Twain’s. Ours at least involves a ghost town and ferocious southwestern critters.

1For reference, when I applied to graduate programs, I was told that approximately 20% of physics PhD students nationwide were women. The percentage of women drops as one progresses up the academic chain to postdocs and then to faculty members. And primarily PhD students and postdocs registered for the winter school.

March 16, 2024

David Hogg submitted!

OMG I actually just submitted an actual paper, with me as first author. I submitted to the AAS Journals, with a preference for The Astronomical Journal. I don't write all that many first-author papers, so I am stoked about this. If you want to read it: It should come out on arXiv within days, or if you want to type pdflatex a few times, it is available at this GitHub repo. It is about how to combine many shifted images into one combined, mean image.
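
The shift-and-coadd idea can be sketched in a toy example (my own illustration of the generic technique, not the algorithm in the paper):

```python
# Toy sketch: combine integer-shifted copies of a 1D "image" into one
# mean image by undoing each known shift (not the paper's algorithm).
def roll(seq, s):
    # Circularly shift a list to the right by s (negative s shifts left).
    s %= len(seq)
    return seq[-s:] + seq[:-s]

truth = [0.0] * 16
truth[8] = 1.0  # a single bright pixel at index 8
shifts = [-2, 0, 1, 3]
frames = [roll(truth, s) for s in shifts]  # the "observed" shifted images

# Undo each shift to land on a common grid, then average pixel by pixel.
aligned = [roll(f, -s) for f, s in zip(frames, shifts)]
mean_image = [sum(px) / len(px) for px in zip(*aligned)]
print(mean_image.index(max(mean_image)))  # peak recovered at index 8
```

With noisy frames one would average with inverse-variance weights; the plain mean here is just the simplest case.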

John Preskill Noncommuting charges are much like Batman

The Noncommuting-Charges World Tour Part 2 of 4

This is the second part in a four-part series covering the recent Perspective on noncommuting charges. I’ll be posting one part every 6 weeks leading up to my PhD thesis defence. You can find part 1 here.

Understanding a character’s origins enriches their narrative and motivates their actions. Take Batman as an example: without knowing his backstory, he appears merely as a billionaire who might achieve more by donating his wealth rather than masquerading as a bat to combat crime. However, with the context of his tragic past, Batman transforms into a symbol designed to instill fear in the hearts of criminals. Another example involves noncommuting charges. Without understanding their origins, the question “What happens when charges don’t commute?” might appear contrived or simply devised to occupy quantum information theorists and thermodynamicists. However, understanding the context of their emergence, we find that numerous established results unravel, for various reasons, in the face of noncommuting charges. In this light, noncommuting charges are much like Batman; their backstory adds to their intrigue and clarifies their motivation. Admittedly, noncommuting charges come with fewer costumes, outside the occasional steampunk top hat my advisor Nicole Yunger Halpern might sport.

Growing up, television was my constant companion. Of all the shows I’d get lost in, ‘Batman: The Animated Series’ stands the test of time. I highly recommend giving it a watch.

In the early works I’m about to discuss, a common thread emerges: the initial breakdown of some well-understood derivations and the effort to establish a new derivation that accommodates noncommuting charges. These findings will illuminate, yet not fully capture, the multitude of results predicated on the assumption that charges commute. Removing this assumption is akin to pulling a piece from a Jenga tower, triggering a cascade of other results. Critics might argue, “If you’re merely rederiving known results, this field seems uninteresting.” However, the reality is far more compelling. As researchers diligently worked to reconstruct this theoretical framework, they have continually uncovered ways in which noncommuting charges might pave the way for new physics. That said, the exploration of these novel phenomena will be the subject of my next post, where we delve into the emerging physics. So, I invite you to stay tuned. Back to the history…

E.T. Jaynes’s 1957 formalization of the maximum entropy principle has a blink-and-you’ll-miss-it reference to noncommuting charges. Consider a quantum system, similar to the box discussed in Part 1, where our understanding of the system’s state is limited to the expectation values of certain observables. Our aim is to deduce a probability distribution for the system’s potential pure states that accurately reflects our knowledge without making unjustified assumptions. According to the maximum entropy principle, this objective is met by maximizing the entropy of the distribution, which serves as a measure of uncertainty. The resulting state is known as the generalized Gibbs ensemble. Jaynes noted that this information-theoretic reasoning behind the generalized Gibbs ensemble remains valid even when our knowledge is restricted to the expectation values of noncommuting charges. However, later scholars have highlighted that physically substantiating the generalized Gibbs ensemble becomes significantly more challenging when the charges do not commute. For this and other reasons, when the system’s charges do not commute, the generalized Gibbs ensemble is specifically referred to as the non-Abelian thermal state (NATS).
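
For concreteness, here is the maximum-entropy result being invoked (my own recap, in notation standard in this literature rather than taken from the Perspective). Maximizing the von Neumann entropy S(ρ) = -Tr(ρ ln ρ) subject to fixed expectation values Tr(ρ Q_a) = ⟨Q_a⟩ of charges Q_a gives

```latex
\rho = \frac{1}{Z} \exp\!\Big(-\sum_a \mu_a Q_a\Big),
\qquad
Z = \operatorname{Tr} \exp\!\Big(-\sum_a \mu_a Q_a\Big),
```

where the Lagrange multipliers μ_a are fixed by the constraints. The formula is the same whether or not the Q_a commute; when they do not, this state is the non-Abelian thermal state just mentioned.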

For approximately 60 years, discussions about noncommuting charges remained dormant, aside from a few mentions here and there. This changed when two studies highlighted how noncommuting charges break commonplace thermodynamics derivations. The first of these, conducted by Matteo Lostaglio as part of his 2014 thesis, challenged expectations about a system’s free energy—a measure of the system’s capacity for performing work. Interestingly, one can define a free energy for each charge within a system. Imagine a scenario where a system with commuting charges comes into contact with an environment that also has commuting charges. We then evolve the system such that the total charges in both the system and the environment are conserved. This evolution alters the system’s information content and its correlation with the environment. This change in information content depends on a sum of terms, each of which depends on the average change in one of the environment’s charges and the change in the system’s free energy for that same charge. However, this neat distinction of terms according to each charge breaks down when the system and environment exchange noncommuting charges. In such cases, the terms cannot be cleanly attributed to individual charges, and the conventional derivation falters.

The second work delved into resource theories, a topic discussed at length in Quantum Frontiers blog posts. In short, resource theories are frameworks used to quantify how effectively an agent can perform a task subject to some constraints. For example, consider all allowed evolutions (those conserving energy and other charges) one can perform on a closed system. Under these evolutions, from which systems can you extract no work? The answer is systems in thermal equilibrium. The method used to determine the thermal state’s structure also fails when the system includes noncommuting charges. Building on this result, three groups (one, two, and three) presented physically motivated derivations of the form of the thermal state for systems with noncommuting charges using resource-theory-related arguments. Ultimately, the form of the NATS was recovered in each work.

Just as re-examining Batman’s origin story unveils a deeper, more compelling reason behind his crusade against crime, diving into the history and implications of noncommuting charges reveals their untapped potential for new physics. Behind every mask—or theory—there can lie an untold story. Earlier, I hinted at how reevaluating results with noncommuting charges opens the door to new physics. A specific example, initially veiled in Part 1, involves the violation of the Onsager coefficients’ derivation by noncommuting charges. By recalculating these coefficients for systems with noncommuting charges, we discover that their noncommutation can decrease entropy production. In Part 3, we’ll delve into other new physics that stems from charges’ noncommutation, exploring how noncommuting charges, akin to Batman, can really pack a punch.

David Hogg IAIFI Symposium, day two

Today was day two of a meeting on generative AI in physics, hosted by MIT. My favorite talks today were by Song Han (MIT) and Thea Aarestad (ETH), both of whom are working on making ML systems run ultra-fast on extremely limited hardware. Themes were: Work at low precision. Even 4-bit number representations! Radical. And bandwidth is way more expensive than compute: Never move data, latents, or weights to new hardware; work as locally as you can. They both showed amazing performance on terrible, tiny hardware. In addition, Han makes really cute 3d-printed devices! A conversation at the end that didn't quite happen is about how Aarestad's work might benefit from equivariant methods: Her application area is triggers in the CMS device at the LHC; her symmetry group is the Lorentz group (plus permutations, etc.). The day started with me on a panel in which my co-panelists said absolutely unhinged things about the future of physics and artificial intelligence. I learned that many people think we are only years away from having independently operating, fully functional artificial physicists that are more capable than we are.

David Hogg IAIFI Symposium, day one

Today was the first day of a two-day symposium on the impact of Generative AI in physics. It is hosted by IAIFI and A3D3, two interdisciplinary and inter-institutional entities working on things related to machine learning. I really enjoyed the content today. One example was Anna Scaife (Manchester) telling us that all the different methods they have used for uncertainty quantification in astronomy-meets-ML contexts give different and inconsistent answers. It is very hard to know your uncertainty when you are doing ML. Another example was Simon Batzner (DeepMind) explaining that equivariant methods were absolutely required for the materials-design projects at DeepMind, and that introducing the equivariance absolutely did not bork optimization (as many believe it will). Those materials-design projects have been ridiculously successful. He said the amusing thing “Machine learning is IID, science is OOD”. I couldn't agree more. In a panel at the end of the day I learned that learned ML controllers now beat hand-built controllers in some robotics applications. That's interesting and surprising.

March 15, 2024

Scott Aaronson Never go to “Planet Word” in Washington DC

In fact, don’t try to take kids to Washington DC if you can possibly avoid it.

This is my public service announcement. This is the value I feel I can add to the world today.

Dana and I decided to take the kids to DC for spring break. The trip, alas, has been hell—a constant struggle against logistical failures. The first days were mostly spent sitting in traffic or searching for phantom parking spaces that didn’t exist. (So then we switched to the Metro, and promptly got lost, and had our metro cards rejected by the machines.) Or, at crowded cafes, I spent the time searching for a table so my starving kids could eat—and then when I finally found a table, a woman, smug and sure-faced, evicted us from the table because she was “going to” sit there, and my kids had to see that their dad could not provide for their basic needs, and that woman will never face any consequence for what she did.

Anyway, this afternoon, utterly frazzled and stressed and defeated, we entered “Planet Word,” a museum about language. Sounds pretty good, right? Except my soon-to-be 7-year-old son got bored by numerous exhibits that weren’t for him. So I told him he could lead the way and find any exhibit he liked.

Finally my son found an exhibit that fascinated him, one where he could weigh plastic fruits on a balancing scale. He was engrossed by it, he was learning, he was asking questions, I reflected that maybe the trip wasn’t a total loss … and that’s when a museum employee pointed at us, and screamed at us to leave the room, because “this exhibit was sold out.”

The room was actually almost empty (!). No one had stopped us from entering the room. No one else was waiting to use the balancing scale. There was no sign to warn us we were doing anything wrong. I would’ve paid them hundreds of dollars in that moment if only we could stay. My son didn’t understand why he was suddenly treated as a delinquent. He then wanted to leave the whole museum, and so did I. The day was ruined for us.

Mustering my courage to do something uncharacteristic for me, I complained at the front desk. They sneered and snickered at me, basically told me to go to hell. Looking deeply into their dumb, blank expressions, I realized that I had as much chance of any comprehension or sympathy as I’d have from a warthog. It’s true that, on the scale of all the injustices in the history of the world, this one surely didn’t crack the top quadrillion. But for me, in that moment, it came to stand for all the others. Which has always been my main weakness as a person, that injustice affects me in that way.

Speaking of which, there was one part of the DC trip that went exactly like it was supposed to. That was our visit to the United States Holocaust Memorial Museum. Why? Because I feel like that museum, unlike all the rest, tells me the truth about the nature of the world that I was born into—and seeing the truth is perversely comforting. I was born into a world that right now, every day, is filled with protesters screaming for my death, for my family’s death—and this is accepted as normal, and those protesters sleep soundly at night, congratulating themselves for their progressivism and enlightenment. And thinking about those protesters, and their predecessors 80 years ago who perpetrated the Holocaust or who stood by and let it happen, is the only thing that really puts blankfaced museum employees into perspective for me. Like, of course a world with the former is also going to have the latter—and I should count myself immeasurably lucky if the latter is all I have to deal with, if the empty-skulled and the soul-dead can only ruin my vacation and lack the power to murder my family.

And to anyone who reached the end of this post and who feels like it was an unwelcome imposition on their time: I’m sorry. But the truth is, posts like this are why I started this blog and why I continue it. If I’ve ever imparted any interesting information or ideas, that’s a byproduct that I’m thrilled about. But I’m cursed to be someone who wakes up every morning, walks around every day, and goes to sleep every night crushed by the weight of the world’s injustice, and outside of technical subjects, the only thing that’s ever motivated me to write is that words are the only justice available to me.

March 14, 2024

John Baez The Probability of the Law of Excluded Middle

The Law of Excluded Middle says that for any statement P, “P or not P” is true.

Is this law true? In classical logic it is. But in intuitionistic logic it’s not.

So, in intuitionistic logic we can ask what’s the probability that a randomly chosen statement obeys the Law of Excluded Middle. And the answer is “at most 2/3—or else your logic is classical”.

This is a very nice new result by Benjamin Bumpus and Zoltan Kocsis:

• Benjamin Bumpus, Degree of classicality, Merlin’s Notebook, 27 February 2024.

Of course they had to make this more precise before proving it. Just as classical logic is described by Boolean algebras, intuitionistic logic is described by something a bit more general: Heyting algebras. They proved that in a finite Heyting algebra, if more than 2/3 of the statements obey the Law of Excluded Middle, then it must be a Boolean algebra!

Interestingly, nothing like this is true for “not not P implies P”. They showed this can hold for an arbitrarily high fraction of statements in a Heyting algebra that is still not Boolean.
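
The 2/3 bound can be checked by hand on the smallest non-Boolean Heyting algebra, the three-element chain, where exactly 2/3 of the elements satisfy the Law of Excluded Middle (a toy computation of mine, not taken from the paper):

```python
from fractions import Fraction

# The three-element chain 0 < a < 1 is the smallest Heyting algebra that is
# not Boolean.  Encode bottom, a, top as the integers 0, 1, 2.
elements = [0, 1, 2]
TOP, BOTTOM = 2, 0

def join(x, y):
    # Least upper bound; in a chain this is just the maximum.
    return max(x, y)

def implies(x, y):
    # Heyting implication in a chain: x -> y is top when x <= y, else y.
    return TOP if x <= y else y

def neg(x):
    # Intuitionistic negation is the pseudo-complement: not-x = (x -> bottom).
    return implies(x, BOTTOM)

# Fraction of elements satisfying the Law of Excluded Middle, x or not-x = top.
lem = [x for x in elements if join(x, neg(x)) == TOP]
fraction = Fraction(len(lem), len(elements))
print(fraction)  # 2/3 -- the bound is attained exactly
```

The middle element a fails: not-a is bottom, so a or not-a is just a, not top.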

Here’s a piece of the free Heyting algebra on one generator, which some call the Rieger–Nishimura lattice:

Taking the principle of excluded middle from the mathematician would be the same, say, as proscribing the telescope to the astronomer or to the boxer the use of his fists. — David Hilbert

I disagree with this statement, but boy, Hilbert sure could write!

March 13, 2024

Tommaso Dorigo On The Utility Function Of Future Experiments

At a recent meeting of the board of editors of a journal I am an editor of, it was decided to produce a special issue (to commemorate an important anniversary). As I liked the idea I got carried away a bit, and proposed to write an article for it. 


March 12, 2024

David Hogg black holes as the dark matter

Today Cameron Norton (NYU) gave a great brown-bag talk on the possibility that the dark matter might be asteroid-mass-scale black holes. This is allowed by all constraints at present: If the masses were much smaller, the black holes would evaporate or emit observably. If the masses were much larger, they would create observable microlensing or dynamical signatures.

She and Kleban (NYU) are working on methods for creating such black holes primordially, by modifying the potential at inflation, creating opportunities for bubble nucleations in inflation that would subsequently collapse into small black holes after the Universe exits inflation. It's speculative obviously, but not ruled out at present!

An argument broke out during and after the talk about whether you would be injured if you were intersected by a 10^20 g black hole! My position is that you would be totally fine! Everyone else in the room disagreed with me, for many different reasons. Time to get calculating.
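
For scale, a quick back-of-envelope number of my own (not from the talk): the horizon of such a black hole is roughly atomic-sized.

```python
# Back-of-envelope: Schwarzschild radius r_s = 2 G M / c^2 of a
# 10^20 g (= 10^17 kg) black hole.  (My own scale-setting, not from the talk.)
G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8    # speed of light, m/s
M = 1e17       # mass in kg

r_s = 2 * G * M / c**2
print(r_s)  # ~1.5e-10 m, i.e. roughly the size of an atom
```

Whatever such an encounter would do to a person, the horizon itself is only about an angstrom across.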

Another great idea: Could we find stars that have captured low-mass black holes by looking for the radial-velocity signal? I got really interested in this one at the end.

David Hogg The Cannon and El Cañon

At the end of the day I got a bit of quality time in with Danny Horta (Flatiron) and Adrian Price-Whelan (Flatiron), who have just (actually just before I met with them) created a new implementation of The Cannon (the data-driven model of stellar photospheres originally created by Melissa Ness and me back in 2014/2015). Why!? Not because the world needs another implementation. We are building a new implementation because we plan to extend out to El Cañon, which will extend the probabilistic model into the label domain: It will properly generate or treat noisy and missing labels. That will permit us to learn latent labels, and de-noise noisy labels.

March 07, 2024

Doug Natelson APS March Meeting 2024, Day 4 and wrap-up

Because of the timing of my flight back to Houston, I really only went to one session today, in which my student spoke as did some collaborators.  It was a pretty interesting collection of contributed talks.  

  • The work that's been done on spin transport in multiferroic insulators is particularly interesting to me.  A relevant preprint is this one, in which electric fields are used to reorient \(\mathbf{P}\) in BiFeO\(_{3}\), which correspondingly switches the magnetization in this system (which is described by a complicated spin cycloid order) and therefore modulates the transmission of spin currents (as seen in ferromagnetic resonance).  
  • Similarly, adding a bit of La to BiFeO\(_{3}\) to favor single ferroelectric domain formation was a neat complement to this.
  • There were also multiple talks showing the utility of the spin Hall magnetoresistance as a way to characterize spin transport between magnetic insulators and strong spin-orbit coupled metals.
Some wrap-up thoughts:
  • This meeting venue and environment was superior in essentially every way relative to last year's mess in Las Vegas.  Nice facilities, broadly good rooms, room sizes, projectors, and climate control.  Lots of hotels.  Lots of restaurants that are not absurdly expensive.  I'd be very happy to have the meeting in Minneapolis again at some point.  There was even a puppy-visiting booth at the exhibit hall on Tuesday and Thursday.
  • Speaking of the exhibit hall, I think this is the first time I've been at a meeting where a vendor was actually running a dilution refrigerator on the premises.  
  • Only one room that I was in had what I would describe as a bad projector (poor color balance, loud fan, not really able to be focused crisply).  I also did not see any session chair this year blow it by allowing speakers to blow past their allotted times.
  • We really lucked out on the weather.  
  • Does anyone know what happens if someone ignores the "Warning: Do Not Drive Over Plate" label on the 30 cm by 40 cm yellow floor plate in the main lobby?  Like, does it trigger a self-destruct mechanism, or the apocalypse or something?
  • Next year's combined March/April meeting in Anaheim should be interesting - hopefully the venue is up to the task, and likewise I hope there are good, close housing and food options.

February 19, 2024

Mark Goodsell Rencontres de Physique des Particules 2024

Just over a week ago the annual meeting of theoretical particle physicists (RPP 2024) was held at Jussieu, the campus of Sorbonne University where I work. I wrote about the 2020 edition (held just outside Paris) here; in keeping with tradition, this year's version also contained similar political sessions with the heads of the CNRS' relevant physics institutes and members of CNRS committees, although they were perhaps less spicy (despite rumours of big changes in the air). 

One of the roles of these meetings is as a shop window for young researchers looking to be hired in France, and a great way to demonstrate that they are interested and have a connection to the system. Of course, this isn't and shouldn't be obligatory by any means; I wasn't really aware of this prior to entering the CNRS though I had many connections to the country. But that sort of thing seems especially important after the problems described by 4gravitons recently, and his post about getting a permanent job in France -- being able to settle in a country is non-trivial, it's a big worry for both future employers and often not enough for candidates fighting tooth and nail for the few jobs there are. There was another recent case of someone getting a (CNRS) job -- to come to my lab, even -- who much more quickly decided to leave the entire field for personal reasons. Both these stories saddened me. I can understand -- there is the well-known Paris syndrome for one thing -- and the current political anxiety about immigration and the government's response to the rise of the far right (across the world), coupled with Brexit, is clearly leading to things getting harder for many. These stories are especially worrying because we expect to be recruiting for university positions in my lab this year.

I was obviously very lucky and my experience was vastly different; I love both the job and the place, and I'm proud to be a naturalised citizen. Permanent jobs in the CNRS are amazing, especially in terms of the time and freedom you have, and there are all sorts of connections between the groups throughout the country such as via the IRN Terascale or GdR Intensity Frontier; or IRN Quantum Fields and Strings and French Strings meetings for more formal topics. I'd recommend anyone thinking about working here to check out these meetings and the communities built around them, as well as taking the opportunity to find out about life here. For those moving with family, France also offers a lot of support (healthcare, childcare, very generous holidays, etc) once you have got into the system.

The other thing to add that was emphasised in the political sessions at the RPP (reinforcing the message that we're hearing a lot) is that the CNRS is very keen to encourage people from under-represented groups to apply and be hired. One of the ways they see to help this is to put pressure on the committees to hire researchers (even) earlier after their PhD, in order to reduce the length of the leaky pipeline.

Back to physics

Coming back to the RPP, this year was particularly well attended and had an excellent program of reviews of hot topics, invited and contributed talks, put together very carefully by my colleagues. It was particularly poignant for me because two former students in my lab who I worked with a lot, one of whom recently got a permanent job, were talking; and in addition both a former student of mine and his current PhD student were giving talks: this made me feel old. (All these talks were fascinating, of course!)

One review that stood out as relevant for this blog was Bogdan Malaescu's review of progress in understanding the problem with muon g-2. As I discussed here, there is currently a lot of confusion about what the Standard Model prediction should be for that quantity. This is obviously very concerning for the experiments measuring muon g-2, who in a paper last year reduced their uncertainty by a factor of 2 to $$a_\mu (\mathrm{exp}) = 116\,592\,059(22)\times 10^{-11}. $$

The Lattice calculation (which has been confirmed now by several groups) disagrees with the prediction using the data-driven R-ratio method, however, and there is a race on to understand why. While new data from the CMD-3 experiment seems to agree with the lattice result, combining all global data on measurements of \(e^+ e^- \rightarrow \pi^+ \pi^- \) still gives a discrepancy of more than \(5\sigma\). There is clearly a significant disagreement within the data samples used (indeed, CMD-3 significantly disagrees with their own previous measurement, CMD-2). The confusion is summarised by this plot:

As can be seen, the finger of blame is often pointed at the KLOE data; excluding it but including the others in the plot gives agreement with the lattice result and a significance of non-zero \(\Delta a_\mu\) compared to experiment of \(2.8\sigma\) (or for just the dispersive method without the lattice data \( \Delta a_\mu \equiv a_\mu^{\rm SM} - a_\mu^{\rm exp} = -123 \pm 33 \pm 29 \pm 22 \times 10^{-11} \), a discrepancy of \(2.5\sigma\)). In Bogdan's talk (see also his recent paper) he discusses these tensions and also the tensions between the data and the evaluation of \(a_\mu^{\rm win}\), which is the contribution coming from a narrow "window" (when the total contribution to the Hadronic Vacuum Polarisation is split into short-, medium- and long-distance pieces, the medium-range part should be the one most reliable for lattice calculations -- at short distances the lattice spacing may not be fine enough, and at long ones the lattice may not be large enough). There he shows that, if we exclude the KLOE data and just include the BABAR, CMD-3 and Tau data, while the overall result agrees with the BMW lattice result, the window one disagrees by \(2.9 \sigma\) [thanks Bogdan for the correction to the original post]. It's clear that there is still a lot to be understood in the discrepancies of the data, and perhaps, with the added experimental precision on muon g-2, there is even still a hint of new physics ...
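
As a quick sanity check on the dispersive-method numbers quoted above, adding the three quoted uncertainties in quadrature (assuming they are independent) reproduces the 2.5σ figure:

```python
import math

# Quoted dispersive-method discrepancy, in units of 1e-11:
# Delta a_mu = -123 +/- 33 +/- 29 +/- 22.
delta = -123.0
errors = [33.0, 29.0, 22.0]

# Combine the three uncertainties in quadrature, assuming independence.
sigma = math.sqrt(sum(e * e for e in errors))
significance = abs(delta) / sigma
print(f"{sigma:.1f} -> {significance:.1f} sigma")  # 49.1 -> 2.5 sigma
```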

February 13, 2024

Jordan Ellenberg Alphabetical Diaries

Enough of this. Enough. Equivocal or vague principles, as a rule, will make your life an uninspired, undirected, and meaningless act.

This is taken from Alphabetical Diaries, a remarkable book I am reading by Sheila Heti, composed of many thousands of sentences drawn from her decades of diaries and presented in alphabetical order. It starts like this:

A book about how difficult it is to change, why we don’t want to, and what is going on in our brain. A book can be about more than one thing, like a kaleidoscope, it can have many things that coalesce into one thing, different strands of a story, the attempt to do several, many, more than one thing at a time, since a book is kept together by the binding. A book like a shopping mart, all the selections. A book that does only one thing, one thing at a time. A book that even the hardest of men would read. A book that is a game. A budget will help you know where to go.

How does a simple, one might even say cheap, technique, one might even say gimmick, work so well? I thrill to the aphorisms even when I don’t believe them, as with the aphorism above: principles must be equivocal or at least vague to work as principles; without the necessary vagueness they are axioms, which are not good for making one’s life a meaningful act, only good for arguing on the Internet. I was reading Alphabetical Diaries while I walked home along the southwest bike path. I stopped for a minute and went up a muddy slope into the cemetery where there was a gap in the fence, and it turned out this gap opened on the area of infant graves, graves about the size of a book, graves overlaying people who were born and then did what they did for a week and then died — enough of this.

January 24, 2024

Robert Helling How do magnets work?

I came across this excerpt from a Christian home-schooling book:

which is of course funny in so many ways, not least because the whole process of "seeing" is electromagnetic at its very core, and of course most people will have felt electricity at some point in their lives. Even historically, this is pretty much how it was discovered by Galvani (using frogs' legs) at a time when electricity was about cat skins and amber.

It also brings to mind this quite famous YouTube video that shows Feynman being interviewed by the BBC, first getting somewhat angry about the question of how magnets work, and then going into a quite deep explanation of what it means to explain something.

But how do magnets work? When I look at what my kids are taught in school, it basically boils down to "a magnet is made up of tiny magnets that all align" which if you think about it is actually a non-explanation. Can we do better (using more than layman's physics)? What is it exactly that makes magnets behave like magnets?

I would define magnetism as the force that moving charges feel in an electromagnetic field (the part proportional to the velocity) or, said the other way round: the magnetic field is the field that is caused by moving charges. Using this definition, my interpretation of the question about magnets is then why permanent magnets feel this force. For the permanent magnets, I want to use the "they are made of tiny magnets" line of thought, but remove the circularity of the argument by replacing it with "they are made of tiny spins".

This transforms the question to "Why do the elementary particles that make up matter feel the same force as moving charges even if they are not moving?".

And this question has an answer: Because they are Dirac particles! At small energies, the Dirac equation reduces to the Pauli equation which involves the term (thanks to minimal coupling)
$$(\vec\sigma\cdot(\vec p+q\vec A))^2$$
and when you expand the square, it contains the cross term (in Coulomb gauge)
$$(\vec\sigma\cdot \vec p)(\vec\sigma\cdot q\vec A)= q\vec A\cdot\vec p + i\,(\vec p\times q\vec A)\cdot\vec\sigma$$
Here, the first term is the one responsible for the interaction of the magnetic field and moving charges, while the second one couples $$\nabla\times\vec A$$ to the operator $$\vec\sigma$$, i.e. the spin. And since you need to have both terms, this links the force on moving charges to this property we call spin. If you like, the fact that the g-factor is not vanishing is the core of the explanation of how magnets work.
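The algebra behind that expansion is the Pauli-matrix identity $$(\vec\sigma\cdot\vec a)(\vec\sigma\cdot\vec b)=(\vec a\cdot\vec b)\,\mathbb{1}+i\,(\vec a\times\vec b)\cdot\vec\sigma$$, which is easy to check numerically for ordinary commuting vectors (a sketch only: in the Pauli equation $$\vec p$$ and $$\vec A$$ are operators, so ordering matters there):

```python
import numpy as np

# The three Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sigma = [sx, sy, sz]

def sigma_dot(v):
    """sigma . v for a 3-vector v of numbers (returns a 2x2 matrix)."""
    return sum(v[i] * sigma[i] for i in range(3))

rng = np.random.default_rng(0)
a, b = rng.standard_normal(3), rng.standard_normal(3)

lhs = sigma_dot(a) @ sigma_dot(b)
rhs = np.dot(a, b) * np.eye(2) + 1j * sigma_dot(np.cross(a, b))

assert np.allclose(lhs, rhs)
```

For operator-valued $$\vec p$$ and $$\vec A$$ the same identity holds, but the cross product no longer vanishes when the two arguments are equal, which is exactly where the spin coupling comes from.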

And if you want, you can add spin-statistics, which then implies the full "stability of matter" story that in the end is responsible for the fact that you can form macroscopic objects out of Dirac particles that can be magnets.

January 20, 2024

Jacques Distler Responsibility

Many years ago, when I was an assistant professor at Princeton, there was a cocktail party at Curt Callan’s house to mark the beginning of the semester. There, I found myself in the kitchen, chatting with Sacha Polyakov. I asked him what he was going to be teaching that semester, and he replied that he was very nervous because — for the first time in his life — he would be teaching an undergraduate course. After my initial surprise that he had gotten this far in life without ever having taught an undergraduate course, I asked which course it was. He said it was the advanced undergraduate Mechanics course (chaos, etc.) and we agreed that would be a fun subject to teach. We chatted some more, and then he said that, on reflection, he probably shouldn’t be quite so worried. After all, it wasn’t as if he was going to teach Quantum Field Theory, “That’s a subject I’d feel responsible for.”

This remark stuck with me, but it never seemed quite so poignant until this semester, when I find myself teaching the undergraduate particle physics course.

The textbooks (and I mean all of them) start off by “explaining” that relativistic quantum mechanics (e.g. replacing the Schrödinger equation with Klein-Gordon) makes no sense (negative probabilities and all that …). And they then proceed to use it anyway (supplemented by some Feynman rules pulled out of thin air).

This drives me up the #@%^ing wall. It is precisely wrong.

There is a perfectly consistent quantum mechanical theory of free particles. The problem arises when you want to introduce interactions. In Special Relativity, there is no interaction-at-a-distance; all forces are necessarily mediated by fields. Those fields fluctuate and, when you want to study the quantum theory, you end up having to quantize them.

But the free particle is just fine. Of course it has to be: free field theory is just the theory of an (indefinite number of) free particles. So it better be true that the quantum theory of a single relativistic free particle makes sense.

So what is that theory?

  1. It has a Hilbert space, $\mathcal{H}$, of states. To make the action of Lorentz transformations as simple as possible, it behoves us to use a Lorentz-invariant inner product on that Hilbert space. This is most easily done in the momentum representation $$\langle\chi|\phi\rangle = \int \frac{d^3\vec{k}}{(2\pi)^3\, 2\sqrt{\vec{k}^2+m^2}}\, \chi(\vec{k})^*\, \phi(\vec{k})$$
  2. As usual, the time-evolution is given by a Schrödinger equation
$$i\partial_t |\psi\rangle = H_0 |\psi\rangle \tag{1}$$

where $H_0 = \sqrt{\vec{p}^2+m^2}$. Now, you might object that it is hard to make sense of a pseudo-differential operator like $H_0$. Perhaps. But it’s not any harder than making sense of $U(t)= e^{-i \vec{p}^2 t/2m}$, which we routinely pretend to do in elementary quantum mechanics. In both cases, we use the fact that, in the momentum representation, the operator $\vec{p}$ is represented as multiplication by $\vec{k}$.
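A quick numerical sketch of that last point: in the momentum representation $H_0$ is just multiplication by $\sqrt{\vec{k}^2+m^2}$, which for $|\vec{k}|\ll m$ reduces to the familiar nonrelativistic $m + \vec{k}^2/2m$ (units with $\hbar = c = 1$):

```python
import numpy as np

m = 1.0
k = np.linspace(0, 0.3, 100)  # momenta small compared to the mass

# In the momentum representation, H_0 acts as multiplication by sqrt(k^2 + m^2)
H0 = np.sqrt(k**2 + m**2)

# Non-relativistic approximation: rest energy plus the usual kinetic term
H_nr = m + k**2 / (2 * m)

# The two agree up to the O(k^4 / m^3) term dropped from the expansion
assert np.max(np.abs(H0 - H_nr)) < np.max(k)**4 / m**3
```

The same multiplication-operator picture is what lets one define $e^{-i H_0 t}$ with no more difficulty than its nonrelativistic counterpart.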

I could go on, but let me leave the rest of the development of the theory as a series of questions.

  1. The self-adjoint operator, $\vec{x}$, satisfies $$[x^i,p_j] = i \delta^{i}_j$$ Thus it can be written in the form $$x^i = i\left(\frac{\partial}{\partial k_i} + f_i(\vec{k})\right)$$ for some real function $f_i$. What is $f_i(\vec{k})$?
  2. Define $J^0(\vec{r})$ to be the probability density. That is, when the particle is in state $|\phi\rangle$, the probability for finding it in some Borel subset $S\subset\mathbb{R}^3$ is given by $$\text{Prob}(S) = \int_S d^3\vec{r}\, J^0(\vec{r})$$ Obviously, $J^0(\vec{r})$ must take the form $$J^0(\vec{r}) = \int\frac{d^3\vec{k}\, d^3\vec{k}'}{(2\pi)^6\, 4\sqrt{\vec{k}^2+m^2}\sqrt{\vec{k}'^2+m^2}}\, g(\vec{k},\vec{k}')\, e^{i(\vec{k}-\vec{k}')\cdot\vec{r}}\,\phi(\vec{k})\,\phi(\vec{k}')^*$$ Find $g(\vec{k},\vec{k}')$. (Hint: you need to diagonalize the operator $\vec{x}$ that you found in problem 1.)
  3. The conservation of probability says $$0=\partial_t J^0 + \partial_i J^i$$ Use the Schrödinger equation (1) to find $J^i(\vec{r})$.
  4. Under Lorentz transformations, $H_0$ and $\vec{p}$ transform as the components of a 4-vector. For a boost in the $z$-direction, of rapidity $\lambda$, we should have $$\begin{split} U_\lambda \sqrt{\vec{p}^2+m^2}\, U_\lambda^{-1} &= \cosh(\lambda) \sqrt{\vec{p}^2+m^2} + \sinh(\lambda)\, p_3\\ U_\lambda p_1 U_\lambda^{-1} &= p_1\\ U_\lambda p_2 U_\lambda^{-1} &= p_2\\ U_\lambda p_3 U_\lambda^{-1} &= \sinh(\lambda) \sqrt{\vec{p}^2+m^2} + \cosh(\lambda)\, p_3 \end{split}$$ and we should be able to write $U_\lambda = e^{i\lambda B}$ for some self-adjoint operator, $B$. What is $B$? (N.B.: by contrast, the $x^i$ introduced above do not transform in a simple way under Lorentz transformations.)
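A cheap consistency check on the boost rules in problem 4: applied to ordinary numbers rather than operators, they preserve the mass shell $E^2 - \vec{p}^2 = m^2$ (a sketch only; the actual content is of course the operator statement):

```python
import numpy as np

m, lam = 1.0, 0.7           # mass and rapidity (arbitrary test values)
rng = np.random.default_rng(1)
p = rng.standard_normal(3)  # a sample momentum
E = np.sqrt(p @ p + m**2)

# Boost along z with rapidity lam, following the transformation rules above
E_boosted = np.cosh(lam) * E + np.sinh(lam) * p[2]
p_boosted = np.array([p[0], p[1], np.sinh(lam) * E + np.cosh(lam) * p[2]])

# cosh^2 - sinh^2 = 1 guarantees the mass shell is preserved
assert np.isclose(E_boosted**2 - p_boosted @ p_boosted, m**2)
```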

The Hilbert space of a free scalar field is now $\bigoplus_{n=0}^\infty \text{Sym}^n\mathcal{H}$. That’s perhaps not the easiest way to get there. But it is a way …


Yike! Well, that went south pretty fast. For the first time (ever, I think) I’m closing comments on this one, and calling it a day. To summarize, for those who still care,

  1. There is a decomposition of the Hilbert space of a free scalar field as $$\mathcal{H}_\phi = \bigoplus_{n=0}^\infty \mathcal{H}_n$$ where $\mathcal{H}_n = \text{Sym}^n \mathcal{H}$ and $\mathcal{H}$ is the 1-particle Hilbert space described above (also known as the spin-$0$, mass-$m$, irreducible unitary representation of the Poincaré group).
  2. The Hamiltonian of the free scalar field is the direct sum of the Hamiltonians induced on $\mathcal{H}_n$ by the Hamiltonian, $H=\sqrt{\vec{p}^2+m^2}$, on $\mathcal{H}$. In particular, it (along with the other Poincaré generators) is block-diagonal with respect to this decomposition.
  3. There are other interesting observables which are also block-diagonal with respect to this decomposition (i.e., they don’t change the particle number), and hence we can discuss their restriction to $\mathcal{H}_n$.
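For intuition about the sizes of these sectors, consider a toy model in which the one-particle space is finite-dimensional, of dimension $d$ (unlike the actual $\mathcal{H}$). Then $\dim \text{Sym}^n$ counts the $n$-element multisets of basis states, $\binom{d+n-1}{n}$:

```python
from math import comb

def sym_dim(d, n):
    """Dimension of Sym^n of a d-dimensional space:
    the number of n-element multisets drawn from d basis states."""
    return comb(d + n - 1, n)

# Toy check: with d = 2 "modes", the n-particle sector has n + 1 states
assert [sym_dim(2, n) for n in range(5)] == [1, 2, 3, 4, 5]
```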

Gotta keep reminding myself why I decided to forswear blogging…

December 20, 2023

Richard EastherA Bigger Sky

Amongst everything else that happened in 2023, a key anniversary of a huge leap in our understanding of the Universe passed largely unnoticed – the centenary of the realisation that not only was our Sun one of many stars in the Milky Way galaxy but that our galaxy was one of many galaxies in the Universe.

I had been watching the approaching anniversary for over a decade, thanks to teaching the cosmology section of the introductory astronomy course at the University of Auckland. My lectures come at the end of the semester and each October finds me showing this image – with its “October 1923” inscription – to a roomful of students.

The image was captured by the astronomer Edwin Hubble, using the world’s then-largest telescope, on top of Mt Wilson, outside Los Angeles. At first glance, it may not even look like a picture of the night sky: raw photographic images are reversed, so stars show up as dark spots against a light background. However, this odd-looking picture changed our sense of where we live in the Universe.

My usual approach when I share this image with my students is to ask for a show of hands by people with a living relative born before 1923. It’s a decent-sized class and this year a few of them had a centenarian in the family. However, I would get far more hands a decade ago when I asked about mere 90-year-olds. And sometime soon no hands will rise at this prompt and I will have to come up with a new shtick. But it is remarkable to me that there are people alive today who were born before we understood the overall arrangement of the Universe.

For tens of thousands of years, the Milky Way – the band of light that stretches across the night sky – would have been one of the most striking sights on a dark night, once you stepped away from the fire.

Milky Way — via Unsplash

Ironically, the same technological prowess that has allowed us to explore the farthest reaches of the Universe also gives us cities and electric lights. I always ask whether my students have seen the Milky Way for themselves with another show of hands and each year quite a few of them disclose that they have not. I encourage them (and everyone) to find chances to sit out under a cloudless, moonless sky and take in the full majesty of the heavens as it slowly reveals itself to you as your eyes adapt to the dark.

In the meantime, though, we make do with a projector and a darkened lecture theatre.

It was over 400 years ago that Galileo pointed the first, small telescope at the sky. In that moment the apparent clouds of the Milky Way revealed themselves to be composed of many individual stars. By the 1920s, we understood that our Sun is a star and that the Milky Way is a collection of billions of stars, with our Sun inside it. But the single biggest question in astronomy in 1923 — which, with hindsight, became known as the “Great Debate” — was whether the Milky Way was an isolated island of stars in an infinite and otherwise empty ocean of space, or if it was one of many such islands, sprinkled across the sky.

In other words, for Hubble and his contemporaries the question was whether our galaxy was the galaxy, or one of many.

More specifically, the argument was whether nebulae, which are visible as extended patches of light in the night sky, were themselves galaxies or contained within the Milky Way. These objects, almost all of which are only detectable in telescopes, had been catalogued by astronomers as they mapped the sky with increasingly capable instruments. There are many kinds of nebulae, but the white nebulae had the colour of starlight and looked like little clouds through the eyepiece. Since the 1750s these had been proposed as possible galaxies. But until 1923 nobody knew with certainty whether they were small objects on the outskirts of our galaxy – or much larger, far more distant objects on the same scale as the Milky Way itself.

To human observers, the largest and most impressive of the nebulae is Andromeda. This was the object at which Hubble had pointed his telescope in October 1923. Hubble was renowned for his ability to spot interesting details in complex images [1] and after the photographic plate was developed his eye alighted on a little spot that had not been present in an earlier observation [2].

Hubble’s original guess was that this was a nova, a kind of star that sporadically flares in brightness by a factor of 1,000 or more, so he marked it and a couple of other candidates with an “N”. However, after looking back at images that he had already taken and monitoring the star through the following months Hubble came to realise that he had found a Cepheid variable – a star whose brightness changes rhythmically over weeks or months.

Stars come in a huge range of sizes and big stars are millions of times brighter than little ones, so simply looking at a star in the sky tells us little about its distance from us. But Cepheids have a useful property [3]: brighter Cepheids take longer to pass through a single cycle than their smaller siblings.

Imagine a group of people holding torches (flashlights, if you are North American), each of which has a bulb with its own distinctive brightness. If this group fans out across a field at night and turns on their torches, we cannot tell how far away each person is simply by looking at the resulting pattern of lights. Is that torch faint because it is further from us than most, or because its bulb is dimmer than most? But if each person were to flash the wattage of their bulb in Morse code, we could estimate distances by comparing their apparent brightness (since distant objects appear fainter) to their actual intensity (which is encoded in the flashing light).

In the case of Cepheids, they are not flashing in Morse code; instead, nature provides us with the requisite information via the time it takes for their brightness to vary from maximum to minimum and back to maximum again.
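The resulting arithmetic is simple enough to sketch. The period-luminosity coefficients below are representative approximate modern V-band values, not the calibration Hubble had available, so the numbers are illustrative only:

```python
import math

def cepheid_distance_pc(period_days, apparent_mag):
    """Estimate the distance to a Cepheid (in parsecs) from its
    pulsation period and apparent magnitude.

    Uses an approximate V-band Leavitt law, M = -2.43*(log10 P - 1) - 4.05;
    real calibrations vary with waveband and metallicity."""
    M = -2.43 * (math.log10(period_days) - 1.0) - 4.05  # absolute magnitude
    # Distance modulus: m - M = 5 * log10(d / 10 pc)
    return 10.0 ** ((apparent_mag - M + 5.0) / 5.0)

# A 30-day Cepheid seen at apparent magnitude 18 comes out at several
# hundred kiloparsecs: far outside the Milky Way, as Hubble concluded
d = cepheid_distance_pc(30.0, 18.0)
assert d > 100_000
```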

Hubble used this knowledge to estimate the distance to Andromeda. While the number he found was lower than the best present-day estimates, it was still large enough to show that Andromeda lay far outside the Milky Way and was roughly the same size as our galaxy.

The immediate implication, given that Andromeda is the brightest of the many nebulae we see in big telescopes, was that our Milky Way was neither alone nor unique in the Universe. Thus we confirmed that our galaxy was just one of an almost uncountable number of islands in the ocean of space – and the full scale of the cosmos yielded to human measurement for the first time, through Hubble’s careful lens on a curious star.

A modern image (made by Richard Gentler) of the Andromeda galaxy with a closeup on what is now called “Hubble’s star” taken using the (appropriately enough) Hubble Space Telescope, in the white circle. A “positive” image from Hubble’s original plate is shown at the bottom right.

Illustration Credit: NASA, ESA and Z. Levay (STScI). Credit: NASA, ESA and the Hubble Heritage Team (STScI/AURA)

[1] Astronomers in Hubble’s day used a gizmo called a “Blink Comparator” that chops quickly between two images viewed through an eyepiece, so objects changing in brightness draw attention to themselves by flickering.

[2] In most reproductions of the original plate I am hard put to spot it at all, even more so when it is projected on a screen in a lecture theatre. A bit of mild image processing makes it a little clearer, but it hardly calls attention to itself.


[3] This “period-luminosity law” had been described just 15 years earlier by Henrietta Swan Leavitt and it is still key to setting the overall scale of the Universe.

December 18, 2023

Jordan EllenbergShow report: Bug Moment, Graham Hunt, Dusk, Disq at High Noon Saloon

I haven’t done a show report in a long time because I barely go to shows anymore! Actually, though, this fall I went to three. First, The Beths, opening for The National, but I didn’t stay for The National because I don’t know or care about them; I just wanted to see the latest geniuses of New Zealand play “Expert in a Dying Field”

Next was the Violent Femmes, playing their self-titled debut in order. They used to tour a lot and I used to see them a lot, four or five times in college and grad school I think. They never really grow old and Gordon Gano never stops sounding exactly like Gordon Gano. A lot of times I go to reunion shows and there are a lot of young people who must have come to the band through their back catalogue. Not Violent Femmes! 2000 people filling the Sylvee and I’d say 95% were between 50 and 55. One of the most demographically narrowcast shows I’ve ever been to. Maybe beaten out by the time I saw Black Francis at High Noon and not only was everybody exactly my age they were also all men. (Actually, it was interesting to me there were a lot of women at this show! I think of Violent Femmes as a band for the boys.)

But I came in to write about the show I saw this weekend, four Wisconsin acts playing the High Noon. I really came to see Disq, whose single “Daily Routine” I loved when it came out and I still haven’t gotten tired of. Those chords! Sevenths? They’re something:

Dusk was an Appleton band that played funky/stompy/indie, Bug Moment had an energetic frontwoman named Rosenblatt and were one of those bands where no two members looked like they were in the same band. But the real discovery of the night, for me, was Graham Hunt, who has apparently been a Wisconsin scene fixture forever. Never heard of the guy. But wow! Indie power-pop of the highest order. When Hunt’s voice cracks and scrapes the high notes he reminds me a lot of the other great Madison noisy-indie genius named Graham, Graham Smith, aka Kleenex Girl Wonder, who recorded the last great album of the 1990s in his UW-Madison dorm room. Graham Hunt’s new album, Try Not To Laugh, is out this week. ”Emergency Contact” is about as pretty and urgent as this kind of music gets. 

And from his last record, If You Knew Would You Believe it, “How Is That Different,” which rhymes blanket, eye slit, left it, and orbit. Love it! Reader, I bought a T-shirt.

November 27, 2023

Sean Carroll New Course: The Many Hidden Worlds of Quantum Mechanics

In past years I’ve done several courses for The Great Courses/Wondrium (formerly The Teaching Company): Dark Matter and Dark Energy, Mysteries of Modern Physics: Time, and The Higgs Boson and Beyond. Now I’m happy to announce a new one, The Many Hidden Worlds of Quantum Mechanics.

This is a series of 24 half-hour lectures, given by me with impressive video effects from the Wondrium folks.

The content will be somewhat familiar if you’ve read my book Something Deeply Hidden — the course follows a similar outline, with a few new additions and elaborations along the way. So it’s both a general introduction to quantum mechanics, and also an in-depth exploration of the Many Worlds approach in particular. It’s meant for absolutely everybody — essentially no equations this time! — but 24 lectures is plenty of time to go into depth.

Check out this trailer:

As I type this on Monday 27 November, I believe there is some kind of sale going on! So move quickly to get your quantum mechanics at unbelievably affordable prices.