Any action of a finite group $G$ on a finite set $X$ gives a linear representation of $G$ on the vector space with basis $X$. This is called a ‘permutation representation’. And this raises a natural question: how many representations of finite groups are permutation representations?
Most representations are not permutation representations, since every permutation representation of $G$ has a vector fixed by all elements of $G$, namely the sum of all the basis vectors coming from elements of $X$. In other words, every permutation representation has a 1-dimensional trivial rep sitting inside it.
But what if we could ‘subtract off’ this trivial representation?
There are different levels of subtlety with which we can do this. For example, we can decategorify, and let:
the Burnside ring $A(G)$ of $G$ be the ring of formal differences of isomorphism classes of actions of $G$ on finite sets;
the representation ring $R_k(G)$ of $G$ be the ring of formal differences of isomorphism classes of finite-dimensional representations of $G$ over a field $k$.
There’s an obvious map $A(G) \to R_k(G)$, since any action of $G$ on a finite set $X$ gives a permutation representation of $G$ on the vector space with basis $X$.
So now we can ask: is this map typically surjective, or typically not surjective?
In fact everything depends on what field $k$ we’re using for our vector spaces! For starters let’s take $k = \mathbb{Q}$, the rational numbers.
Here’s a list of finite groups where the map from the Burnside ring to the representation ring is known to be surjective, taken from the nLab article Permutation representations:
cyclic groups,
symmetric groups,
$p$-groups (that is, groups whose order is a power of a prime $p$),
binary dihedral groups, at least those of small order,
the binary tetrahedral group, binary octahedral group, and binary icosahedral group,
general linear groups over certain finite fields.
Now, these may seem like rather special classes of groups, but devoted readers of this blog know that most finite groups have order that’s a power of 2. I don’t think this has been proved yet, but it’s obviously true empirically, and we also have a good idea as to why.
So, the map from the Burnside ring to the representation ring is surjective for most finite groups!
That’s if we work over $\mathbb{Q}$. If we work over $\mathbb{C}$ the situation flip-flops, and I believe the map is usually not surjective. It’s already not surjective for cyclic groups bigger than $\mathbb{Z}/2$!
Why? Because $\mathbb{Z}/n$ has a 1-dimensional representation where the generator acts as multiplication by a primitive $n$th root of unity, and since this is not a rational number when $n > 2$, this representation is not definable over $\mathbb{Q}$. Thus (one can show) there’s no way to get this representation as a formal difference of permutation representations, since those are always definable over $\mathbb{Q}$.
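To make this concrete in the smallest case (a worked example added here for illustration): the only transitive $\mathbb{Z}/3$-sets are the one-point set and $\mathbb{Z}/3$ itself, so over $\mathbb{C}$ every permutation representation of $\mathbb{Z}/3$ is a direct sum of copies of the trivial representation $1$ and the regular representation $1 \oplus \chi \oplus \overline{\chi}$, where $\chi$ sends the generator to $e^{2\pi i/3}$. A formal difference of permutation representations therefore has class
$$a[1] + b\left([1] + [\chi] + [\overline{\chi}]\right), \qquad a, b \in \mathbb{Z},$$
and $[\chi]$ by itself is not of this form, so it lies outside the image.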
And this phenomenon of needing roots of unity to define representations is not special to cyclic groups: it happens for most finite groups as hinted at by Artin’s theorem on induced characters.
An example
Now, you’ll notice that I didn’t yet give an example of a finite group where the map from the Burnside ring to the representation ring fails to be surjective when we work over $\mathbb{Q}$. I recently had a lot of fun doing an exercise in Serre’s book Linear Representations of Finite Groups, where he asks us in Exercise 13.4 to prove that $Q_8 \times \mathbb{Z}/3$ is such a group. Here $Q_8$ is the quaternion group, an 8-element subgroup of the invertible quaternions:
$$Q_8 = \{\pm 1, \pm i, \pm j, \pm k\}.$$
I couldn’t resist trying to understand why this is the counterexample Serre gave. For one thing, $Q_8 \times \mathbb{Z}/3$ has 24 elements, and I love the number 24. For another, I love the quaternions.
Right now I believe $Q_8 \times \mathbb{Z}/3$ is the smallest group for which the map $A(G) \to R_{\mathbb{Q}}(G)$ is non-surjective. I’ve asked on MathOverflow, but so far nobody has answered that question. I got a lot of useful information about my other question, though: is the map surjective for most finite groups, or not?
To solve Serre’s Exercise 13.4, he asks you to use Exercise 13.3. Here is my reformulation and solution of that problem. I believe any field $k$ of characteristic zero would work for this theorem:
Theorem. Suppose $G$ is a finite group with a linear representation $\rho$ such that:
1. $\rho$ is irreducible and faithful,
2. every subgroup of $G$ is normal,
3. $\rho$ appears with some multiplicity $m > 1$ in the regular representation of $G$.
Then the map from the Burnside ring of $G$ to the representation ring of $G$ is not surjective.
Proof. It suffices to prove that the multiplicity of $\rho$ in any permutation representation of $G$ is a multiple of $m$, so that the class $[\rho]$ cannot be in the image of the map $A(G) \to R_k(G)$.
Since every finite $G$-set is a coproduct of transitive actions of $G$, which are isomorphic to actions on $G/H$ for subgroups $H$ of $G$, every permutation representation of $G$ is a direct sum of those on spaces of the form $k[G/H]$. (This is my notation for the vector space with basis $G/H$.) Thus, it suffices to show that the multiplicity of $\rho$ in the representation on $k[G/H]$ is $m$ if $H$ is the trivial group, and $0$ otherwise.
The former holds by assumption 3. For the latter, suppose $H$ is a nontrivial subgroup of $G$. Because $H$ is normal by assumption 2, every element $h \in H$ acts trivially on $k[G/H]$: we can see this by letting $h$ act on an arbitrary basis element $gH$:
$$h(gH) = g(g^{-1}hg)H = gH,$$
since $g^{-1}hg \in H$ by normality.
Since $H$ is nontrivial, it contains elements $h \neq 1$ that act trivially on $k[G/H]$. But no $h \neq 1$ can act trivially on $\rho$, because $\rho$ is faithful, by assumption 1. Thus $\rho$ cannot be a subrepresentation of $k[G/H]$. That is, $\rho$ appears with multiplicity $0$ in $k[G/H]$. ▮
Serre’s Exercise 13.4 is to show that the group $Q_8 \times \mathbb{Z}/3$ obeys the conditions of this proposition. As a hint, Serre suggests embedding $Q_8$ and $\mathbb{Z}/3$ in the multiplicative group of the algebra $\mathbb{H}_{\mathbb{Q}}$ (the quaternions defined over $\mathbb{Q}$). By letting $Q_8$ act by left multiplication and $\mathbb{Z}/3$ act by right multiplication, one obtains a 4-dimensional irreducible representation of $Q_8 \times \mathbb{Z}/3$ which appears with multiplicity $2$ in the regular representation. Furthermore this representation is faithful, and every subgroup of $Q_8 \times \mathbb{Z}/3$ is normal.
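Condition 2, by the way, can be checked by brute force. Here is a small Python script (my own illustration, not part of Serre’s exercise) that verifies that every subgroup of $Q_8 \times \mathbb{Z}/3$ is normal. It suffices to check the cyclic subgroups, since any subgroup is generated by the cyclic subgroups of its elements, and a subgroup generated by normal subgroups is normal.

```python
# Brute-force check that every subgroup of Q8 x Z/3 is normal.
# It is enough to check cyclic subgroups: every subgroup is generated by the
# cyclic subgroups of its elements, and a join of normal subgroups is normal.

# Quaternion units are pairs (sign, axis) with sign in {+1, -1} and axis in "1ijk".
BASIS = {
    ("1", "1"): (1, "1"), ("1", "i"): (1, "i"), ("1", "j"): (1, "j"), ("1", "k"): (1, "k"),
    ("i", "1"): (1, "i"), ("i", "i"): (-1, "1"), ("i", "j"): (1, "k"), ("i", "k"): (-1, "j"),
    ("j", "1"): (1, "j"), ("j", "i"): (-1, "k"), ("j", "j"): (-1, "1"), ("j", "k"): (1, "i"),
    ("k", "1"): (1, "k"), ("k", "i"): (1, "j"), ("k", "j"): (-1, "i"), ("k", "k"): (-1, "1"),
}

def qmul(x, y):
    (sx, ax), (sy, ay) = x, y
    s, a = BASIS[(ax, ay)]
    return (sx * sy * s, a)

Q8 = [(s, a) for s in (1, -1) for a in "1ijk"]
G = [(q, t) for q in Q8 for t in range(3)]            # Q8 x Z/3, 24 elements

def mul(g, h):
    return (qmul(g[0], h[0]), (g[1] + h[1]) % 3)

E = ((1, "1"), 0)                                      # identity element

def inv(g):
    return next(h for h in G if mul(g, h) == E)

def cyclic(g):                                         # the cyclic subgroup <g>
    H, x = [E], g
    while x != E:
        H.append(x)
        x = mul(x, g)
    return H

for x in G:
    H = cyclic(x)
    for g in G:
        assert mul(mul(g, x), inv(g)) in H             # g x g^{-1} stays inside <x>

print("every cyclic subgroup of Q8 x Z/3 is normal, hence every subgroup is")
```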
Grant Sanderson (who runs, and creates most of the content for, the website and Youtube channel 3blue1brown) has been collaborating with myself and others (including my coauthor Tanya Klowden) on producing a two-part video giving an account of some of the history of the cosmic distance ladder, building upon a previous public lecture I gave on this topic, and also relating to a forthcoming popular book with Tanya on this topic. The first part of this video is available here; the second part is available here.
The videos were based on a somewhat unscripted interview that Grant conducted with me some months ago, and as such contained some minor inaccuracies and omissions (including some made for editing reasons to keep the overall narrative coherent and within a reasonable length). They also generated many good questions from the viewers of the Youtube video. I am therefore compiling here a “FAQ” of various clarifications and corrections to the videos; this was originally placed as a series of comments on the Youtube channel, but the blog post format here will be easier to maintain going forward. Some related content will also be posted on the Instagram page for the forthcoming book with Tanya.
Questions on the two main videos are marked with an appropriate timestamp to the video.
4:26 Did Eratosthenes really check a local well in Alexandria?
This was a narrative embellishment on my part. Eratosthenes’s original work is lost to us. The most detailed contemporaneous account, by Cleomedes, gives a simplified version of the method, and makes reference only to sundials (gnomons) rather than wells. However, a secondary account by Pliny states (using this English translation), “Similarly it is reported that at the town of Syene, 5000 stades South of Alexandria, at noon in midsummer no shadow is cast, and that in a well made for the sake of testing this the light reaches to the bottom, clearly showing that the sun is vertically above that place at the time”. However, no mention is made of any well in Alexandria in either account.
4:50 How did Eratosthenes know that the Sun was so far away that its light rays were close to parallel?
This was not made so clear in our discussions or in the video (other than a brief glimpse of the timeline at 18:27), but Eratosthenes’s work actually came after Aristarchus, so it is very likely that Eratosthenes was aware of Aristarchus’s conclusions about how distant the Sun was from the Earth. Even if Aristarchus’s heliocentric model was disputed by the other Greeks, at least some of his other conclusions appear to have attracted some support. Also, after Eratosthenes’s time, there was further work by Greek, Indian, and Islamic astronomers (such as Hipparchus, Ptolemy, Aryabhata, and Al-Battani) to measure the same distances that Aristarchus did, although these subsequent measurements for the Sun also were somewhat far from modern accepted values.
5:17 Is it completely accurate to say that on the summer solstice, the Earth’s axis of rotation is tilted “directly towards the Sun”?
Strictly speaking, “in the direction towards the Sun” is more accurate than “directly towards the Sun”; it tilts at about 23.5 degrees towards the Sun, but it is not a total 90-degree tilt towards the Sun.
5:39 Wait, aren’t there two tropics? The tropic of Cancer and the tropic of Capricorn?
Yes! This corresponds to the two summers Earth experiences, one in the Northern hemisphere and one in the Southern hemisphere. The tropic of Cancer, at a latitude of about 23 degrees north, is where the Sun is directly overhead at noon during the Northern summer solstice (around June 21); the tropic of Capricorn, at a latitude of about 23 degrees south, is where the Sun is directly overhead at noon during the Southern summer solstice (around December 21). But Alexandria and Syene were both in the Northern Hemisphere, so it is the tropic of Cancer that is relevant to Eratosthenes’ calculations.
5:41 Isn’t it kind of a massive coincidence that Syene was on the tropic of Cancer?
Actually, Syene (now known as Aswan) was about half a degree of latitude away from the tropic of Cancer, which was one of the sources of inaccuracy in Eratosthenes’ calculations. But one should take the “look-elsewhere effect” into account: because the Nile cuts across the tropic of Cancer, it was quite likely to happen that the Nile would intersect the tropic near some inhabited town. It might not necessarily have been Syene, but that would just mean that Syene would have been substituted by this other town in Eratosthenes’s account.
On the other hand, it was fortunate that the Nile ran from South to North, so that distances between towns were a good proxy for the differences in latitude. Apparently, Eratosthenes actually had a more complicated argument that would also work if the two towns in question were not necessarily oriented along the North-South direction, and if neither town was on the tropic of Cancer; but unfortunately the original writings of Eratosthenes are lost to us, and we do not know the details of this more general argument. (But some variants of the method can be found in later work of Posidonius, Aryabhata, and others.)
Nowadays, the “Eratosthenes experiment” is run every year on the March equinox, in which schools at the same longitude are paired up to measure the elevation of the Sun at the same point in time, in order to obtain a measurement of the circumference of the Earth. (The equinox is more convenient than the solstice when neither location is on a tropic, due to the simple motion of the Sun at that date.) With modern timekeeping, communications, surveying, and navigation, this is a far easier task to accomplish today than it was in Eratosthenes’ time.
6:30 I thought the Earth wasn’t a perfect sphere. Does this affect this calculation?
Yes, but only by a small amount. The centrifugal forces caused by the Earth’s rotation along its axis cause an equatorial bulge and a polar flattening so that the radius of the Earth fluctuates by about 20 kilometers from pole to equator. This sounds like a lot, but it is only about 0.3% of the mean Earth radius of 6371 km and is not the primary source of error in Eratosthenes’ calculations.
7:27 Are the riverboat merchants and the “grad student” the leading theories for how Eratosthenes measured the distance from Alexandria to Syene?
There is some recent research that suggests that Eratosthenes may have drawn on the work of professional bematists (step measurers – a precursor to the modern profession of surveyor) for this calculation. This somewhat ruins the “grad student” joke, but perhaps should be disclosed for the sake of completeness.
8:51 How long is a “lunar month” in this context? Is it really 28 days?
In this context the correct notion of a lunar month is a “synodic month” – the length of a lunar cycle relative to the Sun – which is actually about 29 days and 12 hours. It differs from the “sidereal month” – the length of a lunar cycle relative to the fixed stars – which is about 27 days and 8 hours – due to the motion of the Earth around the Sun (or the Sun around the Earth, in the geocentric model). [A similar correction needs to be made around 14:59, using the synodic month of 29 days and 12 hours rather than the “English lunar month” of 28 days (4 weeks).]
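For those who want to see where the 29.5-day figure comes from: the two notions of month are related (to good approximation) by
$$\frac{1}{T_{\mathrm{synodic}}} = \frac{1}{T_{\mathrm{sidereal}}} - \frac{1}{T_{\mathrm{year}}} \approx \frac{1}{27.3\ \text{days}} - \frac{1}{365.25\ \text{days}} \approx \frac{1}{29.5\ \text{days}},$$
since in one sidereal month the Sun itself has advanced about $1/13$ of the way around the sky, and the Moon needs a couple of extra days to catch up to it. (This standard relation was not stated explicitly in the video.)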
10:47 Is the time taken for the Moon to complete an observed rotation around the Earth slightly less than 24 hours as claimed?
Actually, I made a sign error: the lunar day (also known as a tidal day) is actually 24 hours and 50 minutes, because the Moon orbits the Earth in the same direction as the Earth spins around its axis. The animation therefore is also moving in the wrong direction (related to this, the line of sight covers up the Moon in the wrong direction relative to the Moon rising at around 10:38).
11:32 Is this really just a coincidence that the Moon and Sun have almost the same angular width?
I believe so. First of all, the agreement is not that good: due to the non-circular nature of the orbit of the Moon around the Earth, and Earth around the Sun, the angular width of the Moon actually fluctuates to be as much as 10% larger or smaller than the Sun at various times (cf. the “supermoon” phenomenon). All other known planets with known moons do not exhibit this sort of agreement, so there does not appear to be any universal law of nature that would enforce this coincidence. (This is in contrast with the empirical fact that the Moon always presents the same side to the Earth, which occurs in all other known large moons (as well as Pluto), and is well explained by the physical phenomenon of tidal locking.)
On the other hand, as the video hopefully demonstrates, the existence of the Moon was extremely helpful in allowing the ancients to understand the basic nature of the solar system. Without the Moon, their task would have been significantly more difficult; but in this hypothetical alternate universe, it is likely that modern cosmology would have still become possible once advanced technology such as telescopes, spaceflight, and computers became available, especially when combined with the modern mathematics of data science. Without giving away too many spoilers, a scenario similar to this was explored in the classic short story and novel “Nightfall” by Isaac Asimov.
12:58 Isn’t the illuminated portion of the Moon, as well as the visible portion of the Moon, slightly smaller than half of the entire Moon, because the Earth and Sun are not an infinite distance away from the Moon?
Technically yes (and this is actually for a very similar reason to why half Moons don’t quite occur halfway between the new Moon and the full Moon); but this fact turns out to have only a very small effect on the calculations, and is not the major source of error. In reality, the Sun turns out to be about 86,000 Moon radii away from the Moon, so asserting that half of the Moon is illuminated by the Sun is actually a very good first approximation. (The Earth is “only” about 220 Moon radii away, so the visible portion of the Moon is a bit more noticeably less than half; but this doesn’t actually affect Aristarchus’s arguments much.)
13:27 What is the difference between a half Moon and a quarter Moon?
If one divides the lunar month, starting and ending at a new Moon, into quarters (weeks), then half moons occur both near the end of the first quarter (a week after the new Moon, and a week before the full Moon), and near the end of the third quarter (a week after the full Moon, and a week before the new Moon). So, somewhat confusingly, half Moons come in two types, known as “first quarter Moons” and “third quarter Moons”.
14:49 I thought the sine function was introduced well after the ancient Greeks.
It’s true that the modern sine function only dates back to the Indian and Islamic mathematical traditions in the first millennium CE, several centuries after Aristarchus. However, he still had Euclidean geometry at his disposal, which provided tools such as similar triangles that could be used to reach basically the same conclusions, albeit with significantly more effort than would be needed if one could use modern trigonometry.
On the other hand, Aristarchus was somewhat hampered by not knowing an accurate value for $\pi$, which is also known as Archimedes’ constant: the fundamental work of Archimedes on this constant actually took place a few decades after that of Aristarchus!
15:17 I plugged in the modern values for the distances to the Sun and Moon and got 18 minutes for the discrepancy, instead of half an hour.
Yes; I quoted the wrong number here. In 1630, Godfried Wendelen replicated Aristarchus’s experiment. With improved timekeeping and the then-recent invention of the telescope, Wendelen obtained a measurement of half an hour for the discrepancy, which is significantly better than Aristarchus’s calculation of six hours, but still a little bit off from the true value of 18 minutes. (As such, Wendelinus’s estimate for the distance to the Sun was 60% of the true value.)
15:27 Wouldn’t Aristarchus also have access to other timekeeping devices than sundials?
Yes, for instance clepsydrae (water clocks) were available by that time; but they were of limited accuracy. It is also possible that Aristarchus could have used measurements of star elevations to also estimate time; it is not clear whether the astrolabe or the armillary sphere was available to him, but he would have had some other more primitive astronomical instruments such as the dioptra at his disposal. But again, the accuracy and calibration of these timekeeping tools would have been poor.
However, most likely the more important limiting factor was the ability to determine the precise moment at which a perfect half Moon (or new Moon, or full Moon) occurs; this is extremely difficult to do with the naked eye. (The telescope would not be invented for almost two more millennia.)
17:37 Could the parallax problem be solved by assuming that the stars are not distributed in a three-dimensional space, but instead on a celestial sphere?
Putting all the stars on a fixed sphere would make the parallax effects less visible, as the stars in a given portion of the sky would now all move together at the same apparent velocity – but there would still be visible large-scale distortions in the shape of the constellations because the Earth would be closer to some portions of the celestial sphere than others; there would also be variability in the brightness of the stars, and (if they were very close) the apparent angular diameter of the stars. (These problems would be solved if the celestial sphere was somehow centered around the moving Earth rather than the fixed Sun, but then this basically becomes the geocentric model with extra steps.)
18:29 Did nothing of note happen in astronomy between Eratosthenes and Copernicus?
Not at all! There were significant mathematical, technological, theoretical, and observational advances by astronomers from many cultures (Greek, Islamic, Indian, Chinese, European, and others) during this time, for instance improving some of the previous measurements on the distance ladder, a better understanding of eclipses, axial tilt, and even axial precession, more sophisticated trigonometry, and the development of new astronomical tools such as the astrolabe. See for instance this “deleted scene” from the video, as well as the FAQ entry for 14:49 for this video and 24:54 for the second video. But in order to make the overall story of the cosmic distance ladder fit into a two-part video, we chose to focus primarily on the first time each rung of the ladder was climbed.
We have since learned that this portrait was most likely painted in the 19th century, and may have been based more on Kepler’s mentor, Michael Mästlin. A more commonly accepted portrait of Kepler may be found at his current Wikipedia page.
19:07 Isn’t it tautological to say that the Earth takes one year to perform a full orbit around the Sun?
Technically yes, but this is an illustration of the philosophical concept of “referential opacity”: the content of a sentence can change when substituting one term for another (e.g., “1 year” and “365 days”), even when both terms refer to the same object. Amusingly, the classic illustration of this, known as Frege’s puzzle, also comes from astronomy: it is an informative statement that Hesperus (the evening star) and Phosphorus (the morning star, also known as Lucifer) are the same object (which nowadays we call Venus), but it is a mere tautology that Hesperus and Hesperus are the same object: changing the reference from Phosphorus to Hesperus changes the meaning.
19:10 How did Copernicus figure out the crucial fact that Mars takes 687 days to go around the Sun? Was it directly drawn from Babylonian data?
Technically, Copernicus drew from tables by European astronomers that were largely based on earlier tables from the Islamic golden age, which in turn drew from earlier tables by Indian and Greek astronomers, the latter of which also incorporated data from the ancient Babylonians, so it is more accurate to say that Copernicus relied on centuries of data, at least some of which went all the way back to the Babylonians. Among all of this data was the times when Mars was in opposition to the Sun; if one imagines the Earth and Mars as being like runners going around a race track circling the Sun, with Earth on an inner track and Mars on an outer track, oppositions are analogous to when the Earth runner “laps” the Mars runner. From the centuries of observational data, such “laps” were known to occur about once every 780 days (this is known as the synodic period of Mars). Because the Earth takes roughly 365 days to perform a “lap”, it is possible to do a little math and conclude that Mars must therefore complete its own “lap” in 687 days (this is known as the sidereal period of Mars). (See also this post on the cosmic distance ladder Instagram for some further elaboration.)
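The “little math” alluded to here is the same sort of bookkeeping as for the synodic month: the rate at which the Earth laps Mars is the difference of the two orbital rates, so
$$\frac{1}{P_{\mathrm{Mars}}} = \frac{1}{P_{\mathrm{Earth}}} - \frac{1}{P_{\mathrm{synodic}}} \approx \frac{1}{365.25\ \text{days}} - \frac{1}{780\ \text{days}} \approx \frac{1}{687\ \text{days}}.$$
(This standard formula is added here for completeness; it is not spelled out in the video.)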
The situation is complex. When Kepler served as Brahe’s assistant, Brahe only provided Kepler with a limited amount of data, primarily involving Mars, in order to confirm Brahe’s own geo-heliocentric model. After Brahe’s death, the data was inherited by Brahe’s son-in-law and other relatives, who intended to publish Brahe’s work separately; however, Kepler, who was appointed as Imperial Mathematician to succeed Brahe, had at least some partial access to the data, and many historians believe he secretly copied portions of this data to aid his own research before finally securing complete access to the data from Brahe’s heirs after several years of disputes. On the other hand, as intellectual property rights laws were not well developed at this time, Kepler’s actions were technically legal, if ethically questionable.
21:39 What is that funny loop in the orbit of Mars?
This is known as retrograde motion. This arises because the orbital velocity of Earth (about 30 km/sec) is a little bit larger than that of Mars (about 24 km/sec). So, near opposition (when Mars is in the opposite position in the sky from the Sun), Earth will briefly overtake Mars, causing Mars’s observed position to move westward rather than eastward. But at most other times, the motion of Earth and Mars are at a sufficient angle that Mars will continue its apparent eastward motion despite the slightly faster speed of the Earth.
21:59 Couldn’t one also work out the direction to other celestial objects in addition to the Sun and Mars, such as the stars, the Moon, or the other planets? Would that have helped?
Actually, the directions to the fixed stars were implicitly used in all of these observations to determine how the celestial sphere was positioned, and all the other directions were taken relative to that celestial sphere. (Otherwise, all the calculations would be taken on a rotating frame of reference in which the unknown orbits of the planets were themselves rotating, which would have been an even more complex task.) But the stars are too far away to be useful as one of the two landmarks to triangulate from, as they generate almost no parallax and so cannot distinguish one location from another.
Measuring the direction to the Moon would tell you which portion of the lunar cycle one was in, and would determine the phase of the Moon, but this information would not help one triangulate, because the Moon’s position in the heliocentric model varies over time in a somewhat complicated fashion, and is too tied to the motion of the Earth to be a useful “landmark” for determining the Earth’s orbit around the Sun.
In principle, using the measurements to all the planets at once could allow for some multidimensional analysis that would be more accurate than analyzing each of the planets separately, but this would require some sophisticated statistical analysis and modeling, as well as non-trivial amounts of compute – neither of which were available in Kepler’s time.
22:57 Can you elaborate on how we know that the planets all move on a plane?
The Earth’s orbit lies in a plane known as the ecliptic (it is where the lunar and solar eclipses occur). Different cultures have divided up the ecliptic in various ways; in Western astrology, for instance, the twelve main constellations that cross the ecliptic are known as the Zodiac. The planets can be observed to only wander along the Zodiac, but not other constellations: for instance, Mars can be observed to be in Cancer or Libra, but never in Orion or Ursa Major. From this, one can conclude (as a first approximation, at least), that the planets all lie on the ecliptic.
However, this isn’t perfectly true, and the planets will deviate from the ecliptic by a small angle known as the ecliptic latitude. Tycho Brahe’s observations on these latitudes for Mars were an additional useful piece of data that helped Kepler complete his calculations (basically by suggesting how to join together the different “jigsaw pieces”), but the math here gets somewhat complicated, so the story here has been somewhat simplified to convey the main ideas.
23:04 What are the other universal problem solving tips?
Grant Sanderson has a list (in a somewhat different order) in this previous video.
23:28 Can one work out the position of Earth from fixed locations of the Sun and Mars when the Sun and Mars are in conjunction (the same location in the sky) or opposition (opposite locations in the sky)?
Technically, these are two times when the technique of triangulation fails to be accurate; and also in the former case it is extremely difficult to observe Mars due to the proximity to the Sun. But again, following the Universal Problem Solving Tip from 23:07, one should initially ignore these difficulties to locate a viable method, and correct for these issues later. This video series by Welch Labs goes into Kepler’s methods in more detail.
24:04 So Kepler used Copernicus’s calculation of 687 days for the period of Mars. But didn’t Kepler discard Copernicus’s theory of circular orbits?
Good question! It turns out that Copernicus’s calculations of orbital periods are quite robust (especially with centuries of data), and continue to work even when the orbits are not perfectly circular. But even if the calculations did depend on the circular orbit hypothesis, it would have been possible to use the Copernican model as a first approximation for the period, in order to get a better, but still approximate, description of the orbits of the planets. This in turn can be fed back into the Copernican calculations to give a second approximation to the period, which can then give a further refinement of the orbits. Thanks to the branch of mathematics known as perturbation theory, one can often make this type of iterative process converge to an exact answer, with the error in each successive approximation being smaller than the previous one. (But performing such an iteration would probably have been beyond the computational resources available in Kepler’s time; also, the foundations of perturbation theory require calculus, which only was developed several decades after Kepler.)
24:21 Did Brahe have exactly 10 years of data on Mars’s positions?
Actually, it was more like 17 years, but with many gaps, due both to inclement weather, as well as Brahe turning his attention to other astronomical objects than Mars in some years; also, in times of conjunction, Mars might only be visible in the daytime sky instead of the night sky, again complicating measurements. So the “jigsaw puzzle pieces” in 25:26 are in fact more complicated than always just five locations equally spaced in time; there are gaps and also observational errors to grapple with. But to understand the method one should ignore these complications; again, see “Universal Problem Solving Tip #1”. Even with his “idea of true genius”, it took many years of further painstaking calculation for Kepler to tease out his laws of planetary motion from Brahe’s messy and incomplete observational data.
26:44 Shouldn’t the Earth’s orbit be spread out at perihelion and clustered closer together at aphelion, to be consistent with Kepler’s laws?
Yes, you are right; there was a coding error here.
26:53 What is the reference for Einstein’s “idea of pure genius”?
Actually, the precise quote was “an idea of true genius”, and can be found in the introduction to Carola Baumgardt’s “Life of Kepler”.
Strictly speaking, no; his writings are all in Arabic, and he was nominally a subject of the Abbasid Caliphate whose rulers were Arab; but he was born in Khwarazm (in modern day Uzbekistan), and would have been a subject of either the Samanid empire or the Khwarazmian empire, both of which were largely self-governed and primarily Persian in culture and ethnic makeup, despite being technically vassals of the Caliphate. So he would have been part of what is sometimes called “Greater Persia” or “Greater Iran”.
Another minor correction: while Al-Biruni was born in the tenth century, his work on the measurement of the Earth was published in the early eleventh century.
Is $\theta$ really called the angle of declination?
This was a misnomer on my part; this angle is more commonly called the dip angle.
But the height of the mountain would be so small compared to the radius of the Earth! How could this method work?
Using the Taylor approximation $\cos \theta \approx 1 - \theta^2/2$, one can approximately write the relationship between the mountain height $h$, the Earth radius $R$, and the dip angle $\theta$ (in radians) as $h \approx R\theta^2/2$, or equivalently $R \approx 2h/\theta^2$. The key point here is the inverse quadratic dependence on $\theta$, which allows for even relatively small values of $\theta$ to still be realistically useful for computing $R$. Al-Biruni’s measurement of the dip angle $\theta$ was about $0.01$ radians, leading to an estimate of $R$ that is about four orders of magnitude larger than $h$, which is within ballpark at least of a typical height of a mountain (on the order of a kilometer) and the radius of the Earth (6400 kilometers).
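As a sanity check on this formula, with illustrative numbers of roughly the right size rather than Al-Biruni’s actual data, here are a few lines of Python comparing the exact relation $\cos\theta = R/(R+h)$ with the small-angle approximation:

```python
import math

# Illustrative numbers only (roughly the right order of magnitude, not
# Al-Biruni's actual measurements): a mountain height h and a horizon dip
# angle theta, from which the Earth's radius R is estimated.
h = 0.3                                # mountain height in km (assumed)
theta = math.radians(34.0 / 60.0)      # dip angle of 34 arc minutes ~ 0.0099 rad (assumed)

# Exact geometry: cos(theta) = R / (R + h), so R = h*cos(theta) / (1 - cos(theta)).
R_exact = h * math.cos(theta) / (1.0 - math.cos(theta))

# Small-angle approximation: R ~ 2*h / theta**2.
R_approx = 2.0 * h / theta**2

print(f"exact: {R_exact:.0f} km, approximate: {R_approx:.0f} km")
# Both come out near 6000 km, the right ballpark for the Earth's radius.
```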
Was the method really accurate to within a percentage point?
This is disputed, somewhat similarly to the previous calculations of Eratosthenes. Al-Biruni’s measurements were in cubits, but there were multiple incompatible types of cubit in use at the time. It has also been pointed out that atmospheric refraction effects would have created noticeable changes in the observed dip angle $\theta$. It is thus likely that the true accuracy of Al-Biruni’s method was poorer than 1%, but that this was somehow compensated for by choosing a favorable conversion between cubits and modern units.
1:13 Did Captain Cook set out to discover Australia?
One of the objectives of Cook’s first voyage was to discover the hypothetical continent of Terra Australis. This was considered to be distinct from Australia, which at the time was known as New Holland. As this name might suggest, prior to Cook’s voyage, the northwest coastline of New Holland had been explored by the Dutch; Cook instead explored the eastern coastline, naming this portion New South Wales. The entire continent was later renamed to Australia by the British government, following a suggestion of Matthew Flinders; and the concept of Terra Australis was abandoned.
4:40 The relative position of the Northern and Southern hemisphere observations is reversed from those earlier in the video.
Yes, this was a slight error in the animation; the labels here should be swapped for consistency of orientation.
7:06 So, when did they finally manage to measure the transit of Venus, and use this to compute the astronomical unit?
While Le Gentil had the misfortune to not be able to measure either the 1761 or 1769 transits, other expeditions of astronomers (led by Dixon-Mason, Chappe d’Auteroche, and Cook) did take measurements of one or both of these transits with varying degrees of success, with the measurements of Cook’s team of the 1769 transit in Tahiti being of particularly high quality. All of this data was assembled later by Lalande in 1771, leading to the most accurate measurement of the astronomical unit at the time (within 2.3% of modern values, which was about three times more accurate than any previous measurement).
8:53 What does it mean for the transit of Io to be “twenty minutes ahead of schedule” when Jupiter is in opposition (Jupiter is opposite to the Sun when viewed from the Earth)?
Actually, it should be halved to “ten minutes ahead of schedule”, with the transit being “ten minutes behind schedule” when Jupiter is in conjunction, with the net discrepancy being twenty minutes (or actually closer to 16 minutes when measured with modern technology). Both transits are being compared against an idealized periodic schedule in which the transits are occurring at a perfectly regular rate (about 42 hours), where the period is chosen to be the best fit to the actual data. This discrepancy is only noticeable after carefully comparing transit times over a period of months; at any given position of Jupiter, the Doppler effects of Earth moving towards or away from Jupiter would only shift each transit by just a few seconds compared to the previous transit, with the delays or accelerations only becoming cumulatively noticeable after many such transits.
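As a check on the sixteen-minute figure: the difference between opposition and conjunction is, to a first approximation, the diameter of the Earth’s orbit, and light takes
$$\frac{2\,\mathrm{AU}}{c} \approx \frac{2 \times 1.5 \times 10^{8}\ \mathrm{km}}{3.0 \times 10^{5}\ \mathrm{km/s}} \approx 1000\ \mathrm{s} \approx 16.7\ \text{minutes}$$
to cross that distance. (This arithmetic uses modern values for the astronomical unit and the speed of light, which of course were not available to Rømer and Huygens.)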
Also, the presentation here is oversimplified: at times of conjunction, Jupiter and Io are too close to the Sun for observation of the transit. Rømer actually observed the transits at other times than conjunction, and Huygens used more complicated trigonometry than what was presented here to infer a measurement for the speed of light in terms of the astronomical unit (which they had begun to measure a bit more accurately than in Aristarchus’s time; see the FAQ entry for 15:17 in the first video).
10:05 Are the astrological signs for Earth and Venus swapped here?
Yes, this was a small mistake in the animation.
10:34 Shouldn’t one have to account for the elliptical orbit of the Earth, as well as the proper motion of the star being observed, or the effects of general relativity?
Yes; the presentation given here is a simplified one to convey the idea of the method, but in the most advanced parallax measurements, such as the ones taken by the Hipparcos and Gaia spacecraft, these factors are taken into account, basically by taking as many measurements (not just two) as possible of a single star, and locating the best fit of that data to a multi-parameter model that incorporates the (known) orbit of the Earth with the (unknown) distance and motion of the star, as well as additional gravitational effects from other celestial bodies, such as the Sun and other planets.
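To illustrate the kind of fitting involved (a toy sketch of my own, in no way the actual Hipparcos or Gaia pipeline), here is a short Python example that recovers a star’s parallax and proper motion from many noisy one-dimensional position measurements by linear least squares:

```python
import numpy as np

# Toy model: the observed angular offset of a star along one axis, in
# milliarcseconds (mas), is taken to be
#   x(t) = x0 + mu * t + p * sin(2*pi*t)
# where t is time in years, mu is the proper motion, p is the parallax
# amplitude, and sin(2*pi*t) stands in for the known geometry of Earth's orbit.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 3.0, 60)                  # 60 observation epochs over 3 years
true_x0, true_mu, true_p = 5.0, 2.0, 10.0      # assumed "true" parameters for the simulation
obs = true_x0 + true_mu * t + true_p * np.sin(2 * np.pi * t)
obs += rng.normal(scale=0.5, size=t.size)      # measurement noise

# Fit all three parameters at once by linear least squares.
A = np.column_stack([np.ones_like(t), t, np.sin(2 * np.pi * t)])
x0_fit, mu_fit, p_fit = np.linalg.lstsq(A, obs, rcond=None)[0]

print(f"fitted proper motion ~ {mu_fit:.2f} mas/yr, fitted parallax ~ {p_fit:.2f} mas")
print(f"implied distance ~ {1000.0 / p_fit:.0f} parsecs")  # distance in pc ~ 1000 / (parallax in mas)
```

The real pipelines fit a much richer model (two-dimensional positions, the actual orbit of the Earth, perturbations from other bodies), but the principle of fitting all the epochs at once to a multi-parameter model is the same.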
14:53 The formula I was taught for apparent magnitude of stars looks a bit different from the one here.
This is because astronomers use a logarithmic scale to measure both apparent magnitude and absolute magnitude. If one takes the logarithm of the inverse square law in the video, and performs the normalizations used by astronomers to define magnitude, one arrives at the standard relation between absolute and apparent magnitude.
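For the record, with the usual conventions the relation comes out as
$$m - M = 5 \log_{10}\left(\frac{d}{10\ \mathrm{pc}}\right),$$
where $m$ is the apparent magnitude, $M$ the absolute magnitude, and $d$ the distance; the factor of $5$ (rather than $2.5$) comes from the inverse square law, since the flux falls off as $d^{-2}$. (This standard formula is added here for reference; it is not written out in the video.)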
But this is an oversimplification, most notably due to neglect of the extinction effects caused by interstellar dust. This is not a major issue for the relatively short distances observable via parallax, but causes problems at larger scales of the ladder (see for instance the FAQ entry here for 18:08). To compensate for this, one can work in multiple frequencies of the spectrum (visible, x-ray, radio, etc.), as some frequencies are less susceptible to extinction than others. From the discrepancies between these frequencies one can infer the amount of extinction, leading to “dust maps” that can then be used to facilitate such corrections for subsequent measurements in the same area of the universe. (More generally, the trend in modern astronomy is towards “multi-messenger astronomy” in which one combines together very different types of measurements of the same object to obtain a more accurate understanding of that object and its surroundings.)
18:08 Can we really measure the entire Milky Way with this method?
Strictly speaking, there is a “zone of avoidance” on the far side of the Milky Way that is very difficult to measure in the visible portion of the spectrum, due to the large amount of intervening stars, dust, and even a supermassive black hole in the galactic center. However, in recent years it has become possible to explore this zone to some extent using the radio, infrared, and x-ray portions of the spectrum, which are less affected by these factors.
18:19 How did astronomers know that the Milky Way was only a small portion of the entire universe?
This issue was the topic of the “Great Debate” in the early twentieth century. It was only with the work of Hubble using Leavitt’s law to measure distances to the Magellanic Clouds and “spiral nebulae” (that we now know to be other galaxies), building on earlier work of Leavitt and Hertzsprung, that it was conclusively established that these clouds and nebulae in fact were at much greater distances than the diameter of the Milky Way.
18:45 How can one compensate for light blending effects when measuring the apparent magnitude of Cepheids?
This is a non-trivial task, especially if one demands a high level of accuracy. Using the highest resolution telescopes available (such as HST or JWST) is of course helpful, as is switching to other frequencies, such as near-infrared, where Cepheids are even brighter relative to nearby non-Cepheid stars. One can also apply sophisticated statistical methods to fit to models of the point spread of light from unwanted sources, and use nearby measurements of the same galaxy without the Cepheid as a reference to help calibrate those models. Improving the accuracy of the Cepheid portion of the distance ladder is an ongoing research activity in modern astronomy.
18:54 What is the mechanism that causes Cepheids to oscillate?
For most stars, there is an equilibrium size: if the star’s radius collapses, then the reduced potential energy is converted to heat, creating pressure that pushes the star outward again; and conversely, if the star expands, then it cools, causing a reduction in pressure that no longer counteracts gravitational forces. But for Cepheids, there is an additional mechanism called the kappa mechanism: the increased temperature caused by contraction increases ionization of helium, which drains energy from the star and accelerates the contraction; conversely, the cooling caused by expansion causes the ionized helium to recombine, with the energy released accelerating the expansion. If the parameters of the Cepheid are in a certain “instability strip”, then the interaction of the kappa mechanism with the other mechanisms of stellar dynamics creates a periodic oscillation in the Cepheid’s radius, whose period increases with the mass and brightness of the Cepheid.
For a recent re-analysis of Leavitt’s original Cepheid data, see this paper.
19:15 Was Leavitt’s law really a linear law between period and luminosity?
Strictly speaking, the period-luminosity relation commonly known as Leavitt’s law was a linear relation between the absolute magnitude of the Cepheid and the logarithm of the period; undoing the logarithms, this becomes a power law between the luminosity and the period.
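In symbols, the relation is of the form
$$M = a \log_{10} P + b$$
for some constants $a$ and $b$, where $P$ is the period and $M$ the absolute magnitude; since $M$ is itself proportional to the logarithm of the luminosity $L$, this is equivalent to a power law $L \propto P^{k}$ for some exponent $k$. (The precise calibration of the constants is its own long story, and is not needed for the qualitative discussion here.)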
20:26 Was Hubble the one to discover the redshift of galaxies?
This was an error on my part; Hubble was using earlier work of Vesto Slipher on these redshifts, and combining it with his own measurements of distances using Leavitt’s law to arrive at the law that now bears his name; he was also assisted in his observations by Milton Humason. It should also be noted that Georges Lemaître had also independently arrived at essentially the same law a few years prior, but his work was published in a somewhat obscure journal and did not receive broad recognition until some time later.
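For reference, the law in question is the (approximately) linear relation
$$v \approx H_0 d$$
between the recession velocity $v$ of a distant galaxy and its distance $d$. Hubble’s original estimate of the constant $H_0$ was several times larger than the modern value of roughly $70$ km/s per megaparsec, largely because of calibration problems in the early distance measurements; this is a separate issue from the roughly 10% “Hubble tension” discussed later in this FAQ.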
20:37 Hubble’s original graph doesn’t look like a very good fit to a linear law.
Hubble’s original data was somewhat noisy and inaccurate by modern standards, and the redshifts were affected by the peculiar velocities of individual galaxies in addition to the expanding nature of the universe. However, as the data was extended to more galaxies, it became increasingly possible to compensate for these effects and obtain a much tighter fit, particularly at larger scales where the effects of peculiar velocity are less significant. See for instance this article from 2015 where Hubble’s original graph is compared with a more modern graph. This more recent graph also reveals a slight nonlinear correction to Hubble’s law at very large scales that has led to the remarkable discovery that the expansion of the universe is in fact accelerating over time, a phenomenon that is attributed to a positive cosmological constant (or perhaps a more complex form of dark energy in the universe). On the other hand, even with this nonlinear correction, there continues to be a roughly 10% discrepancy of this law with predictions based primarily on the cosmic microwave background radiation; see the FAQ entry for 23:49.
20:46 Does general relativity alone predict a uniformly expanding universe?
This was an oversimplification. Einstein’s equations of general relativity contain a parameter $\Lambda$, known as the cosmological constant, which currently is only computable indirectly from fitting to experimental data. But even with this constant fixed, there are multiple solutions to these equations (basically because there are multiple possible initial conditions for the universe). For the purposes of cosmology, a particularly successful family of solutions are the solutions given by the Lambda-CDM model. This family of solutions contains additional parameters, such as the density of dark matter in the universe. Depending on the precise values of these parameters, the universe could be expanding or contracting, with the rate of expansion or contraction either increasing, decreasing, or staying roughly constant. But if one fits this model to all available data (including not just red shift measurements, but also measurements on the cosmic microwave background radiation and the spatial distribution of galaxies), one deduces a version of Hubble’s law which is nearly linear, but with an additional correction at very large scales; see the next item of this FAQ.
21:07 Is Hubble’s original law sufficiently accurate to allow for good measurements of distances at the scale of the observable universe?
Not really; as mentioned in the end of the video, there were additional efforts to cross-check and calibrate Hubble’s law at intermediate scales between the range of Cepheid methods (about 100 million light years) and observable universe scales (about 100 billion light years) by using further “standard candles” than Cepheids, most notably Type Ia supernovae (which are bright enough and predictable enough to be usable out to about 10 billion light years), the Tully-Fisher relation between the luminosity of a galaxy and its rotational speed, and gamma ray bursts. It turns out that due to the accelerating nature of the universe’s expansion, Hubble’s law is not completely linear at these large scales; this important correction cannot be discerned purely from Cepheid data, but also requires the other standard candles, as well as fitting that data (as well as other observational data, such as the cosmic microwave background radiation) to the cosmological models provided by general relativity (with the best fitting models to date being some version of the Lambda-CDM model).
On the other hand, a naive linear extrapolation of Hubble’s original law to all larger scales does provide a very rough picture of the observable universe which, while too inaccurate for cutting edge research in astronomy, does give some general idea of its large-scale structure.
21:15 Where did this guess of the observable universe being about 20% of the full universe come from?
There are some ways to get a lower bound on the size of the entire universe that go beyond the edge of the observable universe. One is through analysis of the cosmic microwave background radiation (CMB), that has been carefully mapped out by several satellite observatories, most notably WMAP and Planck. Roughly speaking, a universe that was less than twice the size of the observable universe would create certain periodicities in the CMB data; such periodicities are not observed, so this provides a lower bound (see for instance this paper for an example of such a calculation). The 20% number was a guess based on my vague recollection of these works, but there is no consensus currently on what the ratio truly is; there are some proposals that the entire universe is in fact several orders of magnitude larger than the observable one.
The situation is somewhat analogous to Aristarchus’s measurement of the distance to the Sun, which was very sensitive to a small angle (the half-moon discrepancy). Here, the predicted size of the universe under the standard cosmological model is similarly dependent in a highly sensitive fashion on a measure of the flatness of the universe which, for reasons still not fully understood (but likely caused by some sort of inflation mechanism), happens to be extremely close to zero. As such, predictions for the size of the universe remain highly volatile at the current level of measurement accuracy.
23:44 Was it a black hole collision that allowed for an independent measurement of Hubble’s law?
This was a slight error in the presentation. While the first gravitational wave observation by LIGO in 2015 was of a black hole collision, it did not come with an electromagnetic counterpart that allowed for a redshift calculation that would yield a Hubble’s law measurement. However, a later collision of neutron stars, observed in 2017, did come with an associated kilonova in which a redshift was calculated, and led to a Hubble measurement which was independent of most of the rungs of the distance ladder.
23:49 Where can I learn more about this 10% discrepancy in Hubble’s law?
This is known as the Hubble tension (or, in more sensational media, the “crisis in cosmology”): roughly speaking, the various measurements of Hubble’s constant (either from climbing the cosmic distance ladder, or by fitting various observational data to standard cosmological models) tend to arrive at one of two values, that are about 10% apart from each other. The values based on gravitational wave observations are currently consistent with both values, due to significant error bars in this extremely sensitive method; but other more mature methods are now of sufficient accuracy that they are basically only consistent with one of the two values. Currently there is no consensus on the origin of this tension: possibilities include systemic biases in the observational data, subtle statistical issues with the methodology used to interpret the data, a correction to the standard cosmological model, the influence of some previously undiscovered law of physics, or some partial breakdown of the Copernican principle.
For an accessible recent summary of the situation, see this video by Becky Smethurst (“Dr. Becky”).
24:49 So, what is a Type Ia supernova and why is it so useful in the distance ladder?
A Type Ia supernova occurs when a white dwarf in a binary system draws more and more mass from its companion star, until it reaches the Chandrasekhar limit, at which point its gravitational forces are strong enough to cause a collapse that increases the pressure to the point where a supernova is triggered via a process known as carbon detonation. Because of the universal nature of the Chandrasekhar limit, all such supernovae have (as a first approximation) the same absolute brightness and can thus be used as standard candles in a similar fashion to Cepheids (but without the need to first measure any auxiliary observable, such as a period). But these supernovae are also far brighter than Cepheids, and so this method can be used at significantly larger distances than the Cepheid method (roughly speaking, it can handle distances of ~10 billion light years, whereas Cepheids are reliable out to ~100 million light years; see the FAQ entry for 21:07). Among other things, the supernovae measurements were the key to detecting an important nonlinear correction to Hubble’s law at these scales, leading to the remarkable conclusion that the expansion of the universe is in fact accelerating over time, which in the Lambda-CDM model corresponds to a positive cosmological constant, though there are more complex “dark energy” models that are also proposed to explain this acceleration.
This is partly due to time constraints, and the need for editing to tighten the narrative, but was also a conscious decision on my part. Advanced classes on the distance ladder will naturally focus on the most modern, sophisticated, and precise ways to measure distances, backed up by the latest mathematics, physics, technology, observational data, and cosmological models. However, the focus in this video series was rather different; we sought to portray the cosmic distance ladder as evolving in a fully synergistic way, across many historical eras, with the evolution of mathematics, science, and technology, as opposed to being a mere byproduct of the current state of these other disciplines. As one specific consequence of this change of focus, we emphasized the first time any rung of the distance ladder was achieved, at the expense of more accurate and sophisticated later measurements at that rung. For instance, refinements in the measurement of the radius of the Earth since Eratosthenes, improvements in the measurement of the astronomical unit between Aristarchus and Cook, or the refinements of Hubble’s law and the cosmological model of the universe in the twentieth and twenty-first centuries, were largely omitted (though some of the answers in this FAQ are intended to address these omissions).
Many of the topics not covered here (or only given a simplified treatment) are discussed in depth in other expositions, including other Youtube videos. I would welcome suggestions from readers for links to such resources in the comments to this post. Here is a partial list:
“Eratosthenes” – Cosmos (Carl Sagan), video posted Apr 24, 2009 (originally released Oct 1, 1980, as part of the episode “The Shores of the Cosmic Ocean”).
“How Far Away Is It” – David Butler, a multi-part series beginning Aug 16 2013.
Winter is not over yet, but I am already busy fixing the details of some conferences, schools, and lectures I will give around Europe this summer. Here I wish to summarize them, in the hope of arousing the interest of some of you in the relevant events I will attend.
Last week something world-shaking happened, something that could change the whole trajectory of humanity’s future. No, not that—we’ll get to that later.
For now I’m talking about the “Emergent Misalignment” paper. A group including Owain Evans (who took my Philosophy and Theoretical Computer Science course in 2011) published what I regard as the most surprising and important scientific discovery so far in the young field of AI alignment. (See also Zvi’s commentary.) Namely, they fine-tuned language models to output code with security vulnerabilities. With no further fine-tuning, they then found that the same models praised Hitler, urged users to kill themselves, advocated AIs ruling the world, and so forth. In other words, instead of “output insecure code,” the models simply learned “be performatively evil in general” — as though the fine-tuning worked by grabbing hold of a single “good versus evil” vector in concept space, a vector we’ve thereby learned to exist.
(“Of course AI models would do that,” people will inevitably say. Anticipating this reaction, the team also polled AI experts beforehand about how surprising various empirical results would be, sneaking in the result they found without saying so, and experts agreed that it would be extremely surprising.)
Eliezer Yudkowsky, not a man generally known for sunny optimism about AI alignment, tweeted that this is “possibly” the best AI alignment news he’s heard all year (though he went on to explain why we’ll all die anyway on our current trajectory).
Why is this such a big deal, and why did even Eliezer treat it as good news?
Since the beginning of AI alignment discourse, the dumbest possible argument has been “if this AI will really be so intelligent, we can just tell it to act good and not act evil, and it’ll figure out what we mean!” Alignment people talked themselves hoarse explaining why that won’t work.
Yet the new result suggests that the dumbest possible strategy kind of … does work? In the current epoch, at any rate, if not in the future? With no further instruction, without that even being the goal, the fine-tuned models generalized from acting good or evil in a single domain, to acting good or evil in every domain tested. Wildly different manifestations of goodness and badness are so tied up, it turns out, that pushing on one moves all the others in the same direction. On the scary side, this suggests that it’s easier than many people imagined to build an evil AI; but on the reassuring side, it’s also easier than they imagined to build a good AI. Either way, you just drag the internal Good vs. Evil slider to wherever you want it!
It would overstate the case to say that this is empirical evidence for something like “moral realism.” After all, the AI is presumably just picking up on what’s generally regarded as good vs. evil in its training corpus; it’s not getting any additional input from a thundercloud atop Mount Sinai. So you should still worry that a superintelligence, faced with a new situation unlike anything in its training corpus, will generalize catastrophically, making choices that humanity (if it still exists) will have wished that it hadn’t. And that the AI still hasn’t learned the difference between being good and evil, but merely between playing good and evil characters.
All the same, it’s reassuring that there’s one way that currently works to build AIs that can converse, and write code, and solve competition problems—namely, to train them on a large fraction of the collective output of humanity—and that the same method, as a byproduct, gives the AIs an understanding of what humans presently regard as good or evil across a huge range of circumstances, so much so that a research team bumped up against that understanding even when they didn’t set out to look for it.
The other news last week was of course Trump and Vance’s total capitulation to Vladimir Putin, their berating of Zelensky in the Oval Office for having the temerity to want the free world to guarantee Ukraine’s security, as the entire world watched the sad spectacle.
Here’s the thing. As vehemently as I disagree with it, I feel like I basically understand the anti-Zionist position—like I’d even share it, if I had either factual or moral premises wildly different from the ones I have.
Likewise for the anti-abortion position. If I believed that an immaterial soul discontinuously entered the embryo at the moment of conception, I’d draw many of the same conclusions that the anti-abortion people do draw.
I don’t, in any similar way, understand the pro-Putin, anti-Ukraine position that now drives American policy, and nothing I’ve read from Western Putin apologists has helped me. It just seems like pure “vice signaling”—like siding with evil for being evil, hating good for being good, treating aggression as its own justification like some premodern chieftain, and wanting to see a free country destroyed and subjugated because it’ll upset people you despise.
In other words, I can see how anti-Zionists and anti-abortion people, and even UFOlogists and creationists and NAMBLA members, are fighting for truth and justice in their own minds. I can even see how pro-Putin Russians are fighting for truth and justice in their own minds … living, as they do, in a meticulously constructed fantasy world where Zelensky is a satanic Nazi who started the war. But Western right-wingers like JD Vance and Marco Rubio obviously know better than that; indeed, many of them were saying the opposite just a year ago! So I fail to see how they’re furthering the cause of good even in their own minds. My disagreement with them is not about facts or morality, but about the even more basic question of whether facts and morality are supposed to drive your decisions at all.
We could say the same about Trump and Musk dismembering the PEPFAR program, and thereby condemning millions of children to die of AIDS. Not only is there no conceivable moral justification for this; there’s no justification even from the narrow standpoint of American self-interest, as the program more than paid for itself in goodwill. Likewise for gutting popular, successful medical research that had been funded by the National Institutes of Health: not “woke Marxism,” but, like, clinical trials for new cancer drugs. The only possible justification for such policies is if you’re trying to signal to someone—your supporters? your enemies? yourself?—just how callous and evil you can be. As they say, “the cruelty is the point.”
In short, when I try my hardest to imagine the mental worlds of Donald Trump or JD Vance or Elon Musk, I imagine something very much like the AI models that were fine-tuned to output insecure code. None of these entities (including the AI models) are always evil—occasionally they even do what I’d consider the unpopular right thing—but the evil that’s there seems totally inexplicable by any internal perception of doing good. It’s as though, by pushing extremely hard on a single issue (birtherism? gender transition for minors?), someone inadvertently flipped the signs of these men’s good vs. evil vectors. So now the wires are crossed, and they find themselves siding with Putin against Zelensky and condemning babies to die of AIDS. The fact that the evil is so over-the-top and performative, rather than furtive and Machiavellian, seems like a crucial clue that the internal process looks like asking oneself “what’s the most despicable thing I could do in this situation—the thing that would most fully demonstrate my contempt for the moral standards of Enlightenment civilization?,” and then doing that thing.
Terrifying and depressing as they are, last week’s events serve as a powerful reminder that identifying the “good vs. evil” direction in concept space is only a first step. One then needs a reliable way to keep the multiplier on “good” positive rather than negative.
As part of my post last week about measurement and measurement devices, I provided a very simple example of a measuring device. It consists of a ball sitting in a dip on a hill (Fig. 1a), or, as a microscopic version of the same, a microscopic ball, made out of only a small number of atoms, in a magnetic trap (Fig. 1b). Either object, if struck hard by an incoming projectile, can escape and never return, and so the absence of the ball from the dip (or trap) serves to confirm that a projectile has come by. The measurement is crude — it only tells us whether there was a projectile or not — but it is reasonably definitive.
Fig. 1a: A ball in a dimple on the side of the hill will be easily and permanently removed from its perch if struck by a passing object.
Fig. 1b: Similarly to Fig. 1a, a microscopic ball in a trap made from electric and/or magnetic fields may easily escape the trap if struck.
In fact, we could learn more about the projectile with a bit more work. If we measured the ball’s position and speed (approximately, to the degree allowed by the quantum uncertainty principle), we would get an estimate of the energy carried by the projectile and the time when the collision occurred. But how definitive would these measurements be?
With a macroscopic ball, we’d be pretty safe in drawing conclusions. However, if the objects being measured and the measurement device are ultra-microscopic — something approaching atomic size or even smaller — then the measurement evidence is fragile. Our efforts to learn something from the microscopic ball will be in vain if the ball suffers additional collisions before we get to study it. Indeed, if a tiny ball interacts with any other object, microscopic or macroscopic, there is a risk that the detailed information about its collision with the projectile will be lost, long before we are able to obtain it.
Amplify Quickly
The best way to keep this from happening is to quickly translate the information from the collision, as captured in the microscopic ball’s behavior, into some kind of macroscopic effect. Once the information is stored macroscopically, it is far harder to erase.
For instance, while a large meteor striking the Earth might leave a pond-sized crater, a subatomic particle striking a metal table might leave a hole only an atom wide. It doesn’t take much to fill in an atom-sized hole in the blink of an eye, but a crater that you could swim in isn’t going to disappear overnight. So if we want to know about the subatomic particle’s arrival, it would be good if we could quickly cause the hole to grow much larger.
This is why almost all microscopic measurements include a step of amplification — the conversion of a microscopic effect into a macroscopic one. Finding new, clever and precise ways of doing this is part of the creativity and artistry of experimental physicists who study atoms, atomic nuclei, or elementary particles.
There are various methods of amplification, but most methods can be thought of, in a sort of cartoon view, as a chain of ever more stable measurements, such as this:
a first measurement using a microscopic device, such as our tiny ball in a trap;
a second measurement that measures the device itself, using a more stable device;
a third measurement that measures the second device, using an even more stable one;
and so on in a chain until the last device is so stable that its information cannot easily or quickly be erased.
Amplification in Experiments
The Geiger-Müller Counter
A classic and simple device that uses amplification is a Geiger counter (or Geiger-Müller counter). (Hans Geiger, while a postdoctoral researcher for Ernest Rutherford, performed a key set of experiments that Rutherford eventually interpreted as evidence that atoms have tiny nuclei.) This counter, like our microscopic ball in Fig. 1b, simply records the arrival of high-energy subatomic projectiles. It does so by turning the passage of a single ultra-microscopic object into a measurable electric current. (Often it is designed to make a concurrent audible electronic “click” for ease of use.)
How does this device turn a single particle, with a lot of energy relative to a typical atomic energy level but very little relative to human activity, into something powerful enough to create a substantial, measurable electric current? The trick is to use the electric field to create a chain reaction.
The Electric Field
The electric field is present throughout the universe (like all cosmic fields). But usually, between the molecules of air or out in deep space, it is zero or quite small. However, when it is strong, as when you have just taken off a wool hat in winter, or just before a lightning strike, it can make your hair stand on end.
More generally, a strong electric field exerts a powerful pull on electrically charged objects, such as electrons or atomic nuclei. Positively charged objects will accelerate in one direction, while negatively charged objects will accelerate in the other. That means that a strong electric field will
separate positively charged objects from negatively charged objects
cause both types of objects to speed up, albeit in opposite directions.
Meanwhile electrically neutral objects are largely left alone.
The Strategy
So here’s the strategy behind the Geiger-Müller counter. Start with a gas of atoms, sitting inside of a closed tube in a region with a strong electric field. Atoms are electrically neutral, so they aren’t much affected by the electric field.
But the atoms will serve as our initial measurement devices. If a high-energy subatomic particle comes flying through the gas, it will strike some of the gas atoms and “ionize” them — that is, it will strip an electron off the atom. In doing so it breaks the electrically neutral atom into a negatively charged electron and a positively charged leftover, called an “ion.”
If it weren’t for the strong electric field, the story would remain microscopic; the relatively few ions and electrons would quickly find their way back together, and all evidence of the atomic-scale measurements would be lost. But instead, the powerful electric field causes the ions to move in one direction and the electrons to move in the opposite direction, so that they cannot simply rejoin each other. Not only that, the field causes these subatomic objects to speed up as they separate.
This is especially significant for the electrons, which pick up so much speed that they are able to ionize even more atoms — our secondary measurement devices. Now the number of electrons freed from their atoms has become much larger.
The effect is a chain reaction, with more and more electrons stripped off their atoms, accelerated by the electric field to high speed, allowing them in their turn to ionize yet more atoms. The resulting cascade, or "avalanche," is called a Townsend discharge; it was discovered in the late 1890s. In a tiny fraction of a second, the small number of electrons liberated by the passage of a single subatomic particle has been multiplied exceedingly, and a crowd of electrons now moves through the gas.
The chain reaction continues until this electron mob arrives at a wire in the center of the counter — the final measurement device in the long chain from microscopic to macroscopic. The inflow of a huge number of electrons onto the wire, combined with the flow of the ions onto the wall of the device, causes an electrical current to flow. Thanks to the amplification, this current is large enough to be easily detected, and in response a separate signal is sent to the device's sound speaker, causing it to make a "click!"
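Here's a toy numerical illustration of the arithmetic behind this chain reaction. The gain per generation and the number of generations below are made-up, purely illustrative numbers, not the parameters of any real Geiger-Müller tube; the point is just how fast geometric multiplication turns a handful of electrons into a macroscopic amount of charge.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_avalanche(seed_electrons=3, mean_new_per_electron=2.0, generations=12):
    """Cartoon of a Townsend-style avalanche: in each generation, every free
    electron (accelerated by the strong field) knocks loose a random number of
    additional electrons before the crowd drifts toward the central wire."""
    counts = [seed_electrons]
    electrons = seed_electrons
    for _ in range(generations):
        electrons += rng.poisson(mean_new_per_electron * electrons)
        counts.append(electrons)
    return counts

print(toy_avalanche())
# A handful of electrons freed by one passing particle grows to roughly a million
# after a dozen generations: enough charge at the wire to drive a measurable current.
```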
Broader Lessons
It’s worth noting that the strategy behind the Geiger-Müller counter requires an input of energy from outside the device, supplied by a battery or the electrical grid. When you think about it, this is not surprising. After the initial step there are rather few moving electrons, and their total motion-energy is still rather low; but by the end of the avalanche, the motion-energy of the tremendous number of moving electrons is far greater. Since energy is conserved, that energy has to have come from somewhere.
Said another way, to keep the electric field strong amid all these charged particles, which would tend to cancel the field out, requires the maintenance of high voltage between the outer wall and inner wire of the counter. Doing so requires a powerful source of energy.
Without this added energy and the resulting amplification, the current from the few initially ionized atoms would be extremely small, and the information about the passing high-energy particle could easily be lost due to ordinary microscopic processes. But the chain reaction’s amplification of the number of electrons and their total amount of energy dramatically increases the current and reduces the risk of losing the information.
Many devices, such as the photomultiplier tube for the detection of photons [particles of light], are like the Geiger-Müller counter in using an external source of energy to boost a microscopic effect. Other devices (like the cloud chamber) use natural forms of amplification that can occur in unstable systems. (The basic principle is similar to what happens with unstable snow on a steep slope: as any off-piste skier will warn you, under the correct circumstances a minor disturbance can cause a mountain-wide snow avalanche.) If these issues interest you, I suggest you read more about the various detectors and subdetectors at ongoing particle experiments, such as those at the Large Hadron Collider.
Amplification in a Simplified Setting
I’ve described the Geiger-Müller counter without any explicit reference to quantum physics. Is there any hope that we could understand how this process really takes place using quantum language, complete with a wave function?
Not in practice: the chain reaction is far, far too complicated. A quantum system’s wave function does not exist in the physical space we live in; it exists in the space of possibilities. Amplification involving hordes of electrons and ions forces us to consider a gigantic space of possibilities; for instance, a million particles moving in our familiar three spatial dimensions would correspond to a space of possibilities that has three million dimensions. Neither you nor I nor the world’s most expert mathematical physicist can visualize that.
Nevertheless, we can gain intuition about the basic idea behind this device by simplifying the chain reaction into a minimal form, one that involves just three objects moving in one dimension, and three stages:
an initial measurement involving something microscopic
addition of energy to the microscopic measurement device
transfer of the information by a second measurement to something less microscopic and more stable.
You can think of these as the first steps of a chain reaction.
So let’s explore this simplified idea. As I often do, I’ll start with a pre-quantum viewpoint, and use that to understand what is happening in a corresponding quantum wave function.
The Pre-Quantum View
The pre-quantum viewpoint differs from that in my last post (which you should read if you haven’t already) in that we have two steps in the measurement rather than just one:
a projectile is measured by a microscopic ball (the “microball”),
the microball is similarly measured by a larger device, which I’ll refer to as the “macroball”.
The projectile, microball and macroball will be colored purple, blue and orange, and their positions along the x-axis of physical space will be referred to as x1, x2 and x3. Our space of possibilities then is a three-dimensional space consisting of all possible values of x1, x2 and x3.
The two-step measurement process really involves four stages:
The projectile approaches the stationary balls from the left.
The projectile collides with the microball and (in a small change from the last post, for convenience) bounces off to the left, leaving the microball moving to the right.
The microball is then subject to a force that greatly accelerates it, so that it soon carries a great deal of motion-energy.
The highly energetic microball now bounces off the macroball, sending the latter into motion.
The view of this process in physical space is shown on the left side of Fig. 2. Notice the acceleration of the microball between the two collisions.
Figure 2: (Left) In physical space, the projectile travels to the right and strikes the stationary microball, causing the latter to move; the microball is then accelerated to high speed and strikes the macroball, which recoils in response. The information from the initial collision has been transferred to the more stable macroball. (Right) The same process seen in the space of possibilities; note the labels on the axes. The system is marked by a red dot, with a gray trail showing its history. Note the two collisions and the acceleration between them. At the end, the system's x3 is increasing, reflecting the macroball's motion.
On the right side of Fig. 2, the motion of the three-object system within the space of possibilities is shown by the moving red dot. To make it easier to see how the red dot moves across the space of possibilities, I've plotted its trail across that space as a gray line. Notice there are two collisions, the first one when the projectile and microball collide (x1=x2) and the second where the two balls collide (x2=x3), resulting in two sudden changes in the motion of the dot. Notice also the rapid acceleration between the first collision and the second, as the microball gains sufficient energy to give the macroball appreciable speed.
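For readers who like to tinker, here is a rough sketch of the kind of trajectory the red dot traces out. The collision rules and the strength of the accelerating force are hand-chosen to mimic the cartoon (they are not real dynamics, and the numbers are arbitrary); the point is only that the triple (x1, x2, x3) traces a path through the three-dimensional space of possibilities with two kinks and an accelerating stretch between them.

```python
import numpy as np

dt, steps = 0.005, 4000
x = np.array([-5.0, 0.0, 5.0])   # x1 (projectile), x2 (microball), x3 (macroball)
v = np.array([ 2.0, 0.0, 0.0])   # only the projectile moves at first
boost_on = False                 # external force that amplifies the microball's energy
trail = [x.copy()]

for _ in range(steps):
    if not boost_on and v[1] == 0.0 and x[0] >= x[1]:
        v[0], v[1] = -1.0, 1.0          # first collision: projectile recoils, microball departs
        boost_on = True
    if boost_on and x[1] < x[2]:
        v[1] += 4.0 * dt                # acceleration stage: energy pumped into the microball
    if boost_on and x[1] >= x[2]:
        v[1], v[2] = 0.0, 1.5           # second collision: the macroball is set into motion
        boost_on = False
    x = x + v * dt
    trail.append(x.copy())

trail = np.array(trail)   # rows are points (x1, x2, x3): the gray path on the right of Fig. 2
```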
The Quantum View
In quantum physics, the idea is the same, except that the dot representing the system's value of (x1, x2, x3) is replaced by the peak of a spread-out wave function. It's difficult to plot a wave function in three dimensions, but I can at least mark out the region where its absolute value is large — where the probability to find the system is highest. I've sketched this in Fig. 3. Not surprisingly, it follows the same path as the system in Fig. 2.
Figure 3: Sketch of the wave function for this system (compare to Fig. 2a), showing only the location of the highest peak of the wave function (the region where we are most likely to find the system.)
In the pre-quantum case of Fig. 2, the red dot asserts certainty; if we were to measure x1, x2 and/or x3, we would find exactly the values of these quantities corresponding to the location of the dot. In quantum physics of Fig. 3, the peak of the wave function asserts high probability but not certainty. The wave function is spread out; we don’t know exactly what we would find if we directly measured x1, x2 and x3 at any particular moment.
Still, the path of the wave function’s peak is very similar to the path of the red dot, as was also true in the previous post. Generally, in the examples we’ve looked at so far, we haven’t shown much difference between the pre-quantum viewpoint and the quantum viewpoint. You might even be wondering if they’re more similar than people say. But there can be big differences, as we will see very soon.
The Wider View
If I could draw something with more than three dimensions, we could add another stage to our microball and macroball; we could accelerate the macroball and cause it to collide with something even larger, perhaps visible to the naked eye. Or instead of one macroball, we could amplify and transfer the microball’s energy to ten microballs, which in turn could have their energy amplified and transferred to a hundred microballs… and then we would have something akin to a Townsend discharge avalanche and a Geiger-Müller counter. Both in pre-quantum and in quantum physics, this would be impossible to draw; the space of possibilities is far too large. Nevertheless, the simple example in Figs. 2 and 3 provides some intuition for how a longer chain of amplification would work. It shows the basic steps needed to turn a fragile microscopic measurement into a robust macroscopic one, suitable for human scientific research or for our sense perceptions in daily living.
In the articles that will follow, I will generally assume (unless specified otherwise) that each microscopic measurement that I describe is followed by this kind of amplification and conversion to something macroscopic. I won’t be able to draw it, but as we can see in this example, the fundamental underlying idea isn’t that hard to understand.
This week's lectures on instantons in my gauge theory class (a very important kind of theory for understanding many phenomena in nature – light is an example of a phenomenon that is described by gauge theory) were a lot of fun to do, and mark the culmination of a month-long …
“It leverages the world’s first topoconductor, a breakthrough type of material which can observe and control Majorana particles to produce more reliable and scalable qubits, which are the building blocks for quantum computers.”
Q: That sounds wild! Are they really using particles in a computer?
A: All computers use particles. Electrons are particles!
Q: You know what I mean!
A: You’re asking if these are “particle physics” particles, like the weird types they try to observe at the LHC?
No, they’re not.
Particle physicists use a mathematical framework called quantum field theory, where particles are ripples in things called quantum fields that describe properties of the universe. But they aren’t the only people to use that framework. Instead of studying properties of the universe you can study properties of materials, weird alloys and layers of metal and crystal that do weird and useful things. The properties of these materials can be approximately described with the same math, with quantum fields. Just as the properties of the universe ripple to produce particles, these properties of materials ripple to produce what are called quasiparticles. Ultimately, these quasiparticles come down to movements of ordinary matter, usually electrons in the original material. They’re just described with a kind of math that makes them look like their own particles.
Q: So, what are these Majorana particles supposed to be?
A: In quantum field theory, most particles come with an antimatter partner. Electrons, for example, have partners called positrons, with a positive electric charge instead of a negative one. These antimatter partners have to exist due to the math of quantum field theory, but there is a way out: some particles are their own antimatter partner, letting one particle cover both roles. This happens for some “particle physics particles”, but all the examples we’ve found are a type of particle called a “boson”, particles related to forces. In 1937, the physicist Ettore Majorana figured out the math you would need to make a particle like this that was a fermion instead, the other main type of particle that includes electrons and protons. So far, we haven’t found one of these Majorana fermions in nature, though some people think the elusive neutrino particles could be an example. Others, though, have tried instead to find a material described by Majorana’s theory. This should in principle be easier, you can build a lot of different materials after all. But it’s proven quite hard for people to do. Back in 2018, Microsoft claimed they’d managed this, but had to retract the claim. This time, they seem more confident, though the scientific community is still not convinced.
Q: And what’s this topoconductor they’re talking about?
A: Topoconductor is short for topological superconductor. Superconductors are materials that conduct electricity with no resistance at all, unlike ordinary metals.
Q: And, topological means? Something about donuts, right?
A: If you’ve heard anything about topology, you’ve heard that it’s a type of mathematics where donuts are equivalent to coffee cups. You might have seen an animation of a coffee cup being squished and mushed around until the ring of the handle becomes the ring of a donut.
This isn’t actually the important part of topology. The important part is that, in topology, a ball is not equivalent to a donut.
Topology is the study of which things can change smoothly into one another. If you want to change a donut into a ball, you have to slice through the donut’s ring or break the surface inside. You can’t smoothly change one to another. Topologists study shapes of different kinds of things, figuring out which ones can be changed into each other smoothly and which can’t.
Q: What does any of that have to do with quantum computers?
A: The shapes topologists study aren’t always as simple as donuts and coffee cups. They can also study the shape of quantum fields, figuring out which types of quantum fields can change smoothly into each other and which can’t.
The idea of topological quantum computation is to use those rules about what can change into each other to encode information. You can imagine a ball encoding zero, and a donut encoding one. A coffee cup would then also encode one, because it can change smoothly into a donut, while a box would encode zero because you can squash the corners to make it a ball. This helps, because it means that you don’t screw up your information by making smooth changes. If you accidentally drop your box that encodes zero and squish a corner, it will still encode zero.
This matters in quantum computing because it is very easy to screw up quantum information. Quantum computers are very delicate, and making them work reliably has been immensely challenging, requiring people to build much bigger quantum computers so they can do each calculation with many redundant backups. The hope is that topological superconductors would make this easier, by encoding information in a way that is hard to accidentally change.
Q: Cool. So does that mean Microsoft has the best quantum computer now?
A: The machine Microsoft just announced has only a single qubit, the quantum equivalent of just a single bit of computer memory. At this point, it can't do any calculations. It can just be read, giving one or zero. The hope is that the power of the new method will let Microsoft catch up with companies that have computers with hundreds of qubits, and help them arrive faster at the millions of qubits that will be needed to do anything useful.
Q: Ah, ok. But it sounds like they accomplished some crazy Majorana stuff at least, right?
A: Umm…
Read the Shtetl-Optimized FAQ if you want more details. The short answer is that this is still controversial. So far, the evidence they’ve made public isn’t enough to show that they found these Majorana quasiparticles, or that they made a topological superconductor. They say they have more recent evidence that they haven’t published yet. We’ll see.
Nature could be said to be constructed out of an immense number of physical processes… indeed, that's almost the definition of "physics". But what makes a physical process a measurement? And once we understand that, what makes a measurement in quantum physics, a fraught topic, different from measurements that we typically perform as teenagers in a grade school science class?
We could have a long debate about this. But for now I prefer to just give examples that illustrate some key features of measurements, and to focus attention on perhaps the simplest intuitive measurement device… one that we’ll explore further and put to use in many interesting examples of quantum physics.
Measurements and Devices
We typically think of measurements as something that humans do. But not all measurements are human artifice. A small fraction of physical processes are natural measurements, occurring without human intervention. What distinguishes a measurement from some other garden variety process?
A central element of a measurement is a device, natural or artificial, simple or complicated, that records some aspect of a physical process semi-permanently, so that this record can be read out after the process is over, at least for a little while.
For example, the Earth itself can serve as a measurement device. Meteor Crater in Arizona, USA is the record of a crude measurement of the size, energy and speed of a large rock, as well as of how long ago it impacted Earth's surface. No human set out to make the measurement, but the crater's details are just as revealing as any human experiment. It's true that to appreciate and understand this measurement fully requires work by humans: theoretical calculations and additional measurements. But still, it's the Earth that recorded the event and stored the data, as any measurement device should.
Figure 1: A rock’s energy, measured by the Earth. Meteor Crater, Arizona, USA; National Map Seamless Server – NASA Earth Observatory
The Earth has served as a measurement device in many other ways: its fossils have recorded its life forms, its sedimentary rocks have recorded the presence of its ancient seas, and a layer of iridium and shocked quartz has provided a record of the giant meteor that killed off the dinosaurs (excepting birds) along with many other species. The data from those measurements sat for many millions of years, unknown until human scientists began reading it out.
I'm being superficial here, skipping over all sorts of subtle issues. For instance, when does a measurement start, and when is it over? Did the measurement of the rock that formed Meteor Crater start when the Earth and the future meteor were first created in the early days of the solar system, or only when the rock was within striking distance of our planet? Was it over when Meteor Crater had solidified, or was it complete when the first human measured its size and shape, or was it finished when humans first inferred the size of the rock that made the crater? I don't want to dwell on these definitional questions today. The point I'm making here is that measurement has nothing intrinsically to do with human beings per se. It has to do with the ability to record a process in such a way that facts about that process can be extracted, long after the process is over.
The measurement device for any particular process has to satisfy some basic requirements.
Pre-measurement metastability: The device must be fairly stable before the process occurs, so that it doesn’t react or change prematurely, but not so stable that it can’t change when the process takes place.
Sensitivity: During the interaction between the device and whatever is being measured, the device needs to react or change in some substantial way that is predictable (at least in part).
Post-measurement stability: The change to the device during the measurement has to be semi-permanent, long-lasting enough that there’s time to detect and interpret it.
Interpretability: The change to the device has to be substantial and unambiguous enough that it can be used to extract information about what was measured.
Examples of Devices
A simple example: consider a small paper cup as a device for measuring the possible passage of a rubber ball. If the paper cup is sitting on a flat, horizontal table, it is reasonably stable and won’t go anywhere, barring a strong gust of wind. But if a rubber ball goes flying by and hits the cup, the cup will be knocked off the table… and thus the cup is very sensitive to the collision with the ball. The change is also stable and semi-permanent; once the cup is on the floor, it won’t spontaneously jump back up onto the table. And so, after setting a cup on a table in a windowless room near a squash court and returning days later, we can figure out from the position of the cup whether a rubber ball (or something similar) has passed close to the cup while we were away. Of course, this is a very crude measurement, but it captures the main idea.
Incidentally, such a measurement is sometimes referred to as "non-destructive": the cup is so flimsy that its effect on the ball is very limited, and so the ball continues onward almost unaffected. This is in contrast to the measurement of the rock that made Meteor Crater, which most certainly was "destructive" to the rock.
Yet even in this destructive event, all the criteria for a measurement are met. The Earth and its dry surface in Arizona are (and were) pretty stable over millennia, despite erosion. The Earth’s surface is very sensitive to a projectile fifty meters across and moving at ten or more kilometers per second; and the resulting deep, slowly-eroding crater represents a substantial, semi-permanent change that we can interpret roughly 50,000 years later.
In Figure 2 is a very simple and crude device designed to measure disturbances ranging from earthquakes to collisions. It consists of a ball sitting stationary within a dimple (a low spot) on a hill. It will remain there as long as it isn't jostled — it is reasonably stable. But it is sensitive: if an earthquake occurs, or if something comes flying through the air and strikes the ball, it will pop out of the dimple. Then it will roll down the hill, never to return to its original perch — thus leaving a long-lasting record of the process that disturbed it. We can later read the ball's absence from the dimple, or its presence far off to the right, as evidence of some kind of violent disturbance, whereas if it remains in the dimple we may conclude that no such violent disturbance has occurred.
Figure 2: If the ball in the dip is subjected to a disturbance, it will end up rolling off to the right, thus recording the existence of the event that disturbed it.
What about measurement devices in quantum physics? The needs are often the same; a measurement still requires a stable yet sensitive device that can respond to an interaction in a substantial, semi-permanent, interpretable way.
Today we’ll keep things very simple, and limit ourselves to a quantum version of Fig. 2, employed in the simplest of circumstances. But soon we’ll see that when measurements involve quantum physics, surprising and unfamiliar issues quickly arise.
A Simple Device for Quantum Measurement
Here’s an example of a suitable device, a sort of microscopic version of Fig. 2. Imagine a small ball of material, perhaps a few atoms wide, that is gently trapped in place by forces that are strong but not too strong. (These might be of the form of an ion trap or an atom trap; or we might even be speaking of a single atom incorporated into a much larger molecule. The details do not matter here.) This being quantum physics, the trap might not hold the ball in place forever, thanks to the process known as “tunneling“; but it can be arranged to stay in place long enough for our purposes.
Figure 3: A nearly-atomic-sized object in an idealized trap; if jostled sharply, it may move past the dark ring and permanently escape.
If the ball is bumped by an atom or subatomic particle flying by at high speed, it may be knocked out of its trap, following which it will keep moving. So if we look in the trap and discover it empty, or if we find the ball far outside the trap, we will know that some energetic object must have passed through the trap. The ball’s location and motion record the existence of that passing object. (They also potentially record additional information, depending on how we set up the experiment, about the object’s motion-energy and its time of arrival.)
To appreciate a measurement involving quantum physics, it's often best to first think through what happens in a pre-quantum version of the same scenario. Doing so gives us an opportunity to use two complementary views of the measurement: an intuitive one in physical space and a more abstract one in the space of possibilities. This will help us interpret the quantum case, where an understanding of a measurement can only be achieved in the space of possibilities.
A Measurement in Pre-Quantum Physics
We’re going to imagine that an incoming projectile (which I’ll draw in purple) is moving along a straight line (which we’ll call the x-axis) and strikes the measuring device — the ball (which I’ll draw in blue) sitting inside its trap. To keep things simple enough to draw, I’ll assume that any collision that occurs will leave the ball and projectile still moving along the x-axis.
With these two objects restricted to a one-dimensional line, our space of possibilities will be two-dimensional, one dimension representing the possible positions x1 of the projectile, and the other representing the possible positions x2 of the ball. (If you are not at all familiar with the space of possibilities and how to think about it, I recommend you first read this article, which addresses the key ideas, and this article, which gives an example very much relevant to this post.)
Below in Fig. 4 is an animation showing what happens, from two viewpoints, as the projectile strikes the ball, allowing the ball’s motion to measure the passage of the projectile.
The first (at left) is the familiar viewpoint: what would happen before our eyes, in physical space, if these objects were big enough to see. The projectile moves to the right, with the ball stationary; a collision occurs, following which the projectile continues on to the right, albeit a bit more slowly, and the ball, having popped out of its trap, moves off to the right.
The second viewpoint (at right) is not something we could see; it happens in the space of possibilities (or “configuration space,”) which we can see only in our minds. In this two-dimensional space, with axes that are the projectile’s and ball’s possible positions x1 and x2, the system — the combination of the projectile and ball — is at any moment sitting at one point. That point is indicated by a star; its location has as its x1 coordinate the projectile’s position at a moment in time, while its x2 coordinate is the ball’s position at that same moment in time.
Figure 4: (Left) In physical space, the projectile travels to the right and strikes the stationary ball, causing the latter to move. (Right) The same process seen in the space of possibilities; note the labels on the axes. On the diagonal line, the two objects would be coincident in physical space, with x1 = x2.
The two animations are synchronized in time. I suggest you spend some time with the animation until it is clear to you what is happening.
Initially, the star moves horizontally. This indicates that the value of x2 isn’t changing; the ball is stationary. Both x1 and x2 are initially negative, so the star is in the lower-left quadrant.
Notice the diagonal line, at x1 = x2 ; if the system is on that line, a collision between the two objects is occurring, since they are at the same point. It is when the star reaches this line that the ball begins to move, and the star’s motion is correspondingly no longer horizontal.
After the collision, both the projectile and ball move to the right, which means the values of x1 and x2 are both increasing. This in turn means that the star moves up and to the right following the collision, eventually reaching the upper-right quadrant where both x1 and x2 are positive.
By contrast, if the measurement device were switched off, so that the projectile and the ball could no longer interact, the projectile would just continue its constant motion to the right, unchanged, and the ball would remain at its initial location, as in Fig. 5. In the space of possibilities, the star would move to the right as the projectile’s position x1 steadily increases, while it would remain at the same vertical level because the ball’s position x2 is never changing.
Figure 5: Same as Fig. 4 except that no collision occurs; the ball remains stationary and the projectile continues on steadily.
The Same Measurement in Quantum Physics
Now, how do we describe the measurement in quantum physics? In general we cannot portray what happens in a quantum system using only physical space. Instead, our system of two objects is described by a single wave function, which is a function of the space of possibilities. That is, it is a function of x1 and x2, and also time, since it changes from moment to moment. [Important: the system is not described by two wave functions (i.e., one per object), and the single wave function of the system is not a function of physical space, with its coordinate x. There is one wave function, and it is a function of all possibilities.]
At each moment in time, and for each possible arrangement of the system — for each of the possible locations of the two objects, with the projectile having position x1 and the ball having position x2 — this function gives us a complex number Ψ(x1, x2; t). The absolute value squared of this number gives us the probability of the corresponding possibility — the probability that if we choose to measure the positions of the projectile and ball, we will find the projectile has position x1 and that the ball has position x2.
What I’m going to do now is plot for you this wave function, using a 3d plot, where two of the axes are x1 and x2 and the third axis is the absolute value of Ψ(x1, x2; t). [Not its square, though the difference doesn’t matter much here.] The colors give the argument (or “phase”) of the complex number Ψ(x1, x2; t). As suggested by recent plots where we looked at wave functions for a single particle, the flow of the color bands often conveys the motion of the system across the space of possibilities; you’ll see this in the patterns below.
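As a concrete illustration of the kind of object being plotted, here is a toy construction (not the numerical setup behind the actual figures) of the initial two-object wave function on the space of possibilities: a Gaussian packet for the projectile, moving to the right, multiplied by a stationary Gaussian for the ball. The positions, widths and momentum are arbitrary made-up values.

```python
import numpy as np

x1 = np.linspace(-12.0, 12.0, 240)        # possible projectile positions
x2 = np.linspace(-12.0, 12.0, 240)        # possible ball positions
X1, X2 = np.meshgrid(x1, x2, indexing="ij")

k1, sigma = 3.0, 1.0                       # projectile momentum and packet width (made-up)
projectile = np.exp(-(X1 + 7.0)**2 / (2 * sigma**2)) * np.exp(1j * k1 * X1)
ball       = np.exp(-X2**2 / (2 * sigma**2))           # at rest near the trap at x2 = 0

psi = projectile * ball                    # one wave function of both positions, Psi(x1, x2)
dA = (x1[1] - x1[0]) * (x2[1] - x2[0])
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dA)            # normalize so probabilities sum to 1

amplitude = np.abs(psi)                    # the height plotted in figures like Fig. 6
phase     = np.angle(psi)                  # the quantity the color bands represent
```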
Going in the reverse order from above, let's first look at the quantum wave function corresponding to Fig. 5, when no measurement takes place and the projectile passes by the ball unimpeded. You can see that the peak in the wave function, telling us the most probable values for the results of measurements of x1 and x2, if carried out at a specific time t, moves along roughly the same path as the star in Fig. 5: the most probable values of x1 increase steadily with time, while those of x2 remain fixed.
Figure 6: The wave function corresponding to a quantum version of Fig. 5, with no measurement carried out; the system is most likely to be found where the wave function is largest. The projectile's most likely position x1 steadily increases while the most likely position x2 of the ball remains constant. Compare to the right-hand panel of Fig. 5.
In this situation, the ball’s behavior has nothing to do with the projectile. We cannot learn anything one way or the other about the projectile from the position or motion of the ball.
What about when a measurement takes place, as in Fig. 4? Again, as seen in Fig. 7, the majority of the wave function follows the path of the star, with the most probable values of x2 beginning to increase around the most likely time of the collision. This change in the most likely value of x2 is an indication of the presence of the projectile and its interaction with the ball. [Note: Fig. 7, unlike other quantum wave functions shown in this series, is a sketch, not a precise solution to the full quantum equations; I simply haven't yet found a set-up where the equations can be solved numerically with enough precision and speed to get a nice-looking result. I expect I'll eventually find an example, but it might take some time.]
Figure 7: As in Fig. 6, but including the measurement illustrated in Fig. 4. [Note this is only a sketch, not a full calculation.] The most likely position x2 of the ball is initially constant but begins to increase following the collision, thus recording the observation of the projectile. Compare to the right-hand panel of Fig. 4.
More precisely, because of the collision, the motion of the ball is now correlated with that of the projectile — their motions are logically and physically related. That by itself is not unusual; all interactions between objects lead to some level of correlation between them. But this correlation is stable; as a result of the collision, the ball is highly unlikely to be found back in its initial position. And so, when we later look at the trap and find it empty, this does indeed give us reliable information about the projectile, namely that at some point it passed through the trap. (This type of correlation, both within and beyond the measurement context, will be a major topic in the future.)
So far, this all looks quite straightforward. The motion of the star in Fig. 4 is seen in the motion of the peak of the wave function in Fig. 7. Similar behavior is seen in Figs. 5 and 6. But these are simple cases, in which the projectile's motion is well known, its location is not too uncertain, and the measurement device is almost perfect. We will soon explore far more complex and interesting quantum examples, using this simple one as our conceptual foundation, and things won't be so straightforward anymore.
I'll stop here for today. Please let me know in the comments if there are aspects of this story that you find confusing; we all need to be on the same page before we advance into the more subtle elements of our quantum world.
Update (Feb. 27): While we’re on the subject of theoretical computer science, friends-of-the-blog Adam Klivans and Raghu Meka have asked me to publicize that STOC’2025 TheoryFest, to be held June 23-27 in Prague, is eagerly seeking proposals for workshops. The deadline is March 9th.
Because of a recent breakthrough by Cook and Mertz on Tree Evaluation, Ryan now shows that every problem solvable in t time on a multitape Turing machine is also solvable in close to √t space
As a consequence, he shows that there are problems solvable in O(n) space that require nearly quadratic time on multitape Turing machines
If this could be applied recursively to boost the polynomial degree, then P≠PSPACE
On Facebook, someone summarized this result as “there exists an elephant that can’t fit through a mouse hole.” I pointed out that for decades, we only knew how to show there was a blue whale that didn’t fit through the mouse hole
I’ll be off the Internet for much of today (hopefully only today?) because of jury duty! Good thing you’ll have Ryan’s amazing new paper to keep y’all busy…
Update (Feb. 25): It occurs to me that the new result is yet another vindication for Ryan’s style of doing complexity theory—a style that I’ve variously described with the phrases “ironic complexity theory” and “caffeinated alien reductions,” and that’s all about using surprising upper bounds for one thing to derive unsurprising lower bounds for a different thing, sometimes with a vertigo-inducing chain of implications in between. This style has a decidedly retro feel to it: it’s been clear since the 1960s both that there are surprising algorithms (for example for matrix multiplication), and that the time and space hierarchy theorems let us prove at least some separations. The dream for decades was to go fundamentally beyond that, separating complexity classes by “cracking their codes” and understanding the space of all possible things they can express. Alas, except for low-level circuit classes, that program has largely failed, for reasons partly explained by the Natural Proofs barrier. So Ryan achieves his successes by simply doubling down on two things that have worked since the beginning: (1) finding even more surprising algorithms (or borrowing surprising algorithms from other people), and then (2) combining those algorithms with time and space hierarchy theorems in clever ways to achieve new separations.
In the beginning, there were hardly any spaces whose magnitude we knew.
Line segments were about the best we could do. Then Mark
Meckes introduced the technique of potential
functions for calculating magnitude, which was shown to be very
powerful. For instance, Juan Antonio
Barceló
and Tony
Carbery
used it to compute the magnitude of
odd-dimensional Euclidean balls, which turn out to be rational functions of
the radius. Using potential functions allows you to tap into the vast
repository of knowledge of PDEs.
In this post and the next, I’ll explain this technique from a
categorical viewpoint, saying almost nothing about the analytic details. This is
category theory as an organizational tool, used to help us understand how
the various ideas fit together. Specifically, I’ll explain potential
functions in terms of the magnitude of functors, which I wrote about
here a few weeks
ago.
Before I can describe this categorical viewpoint on potential functions, I have to explain what potential
functions are in the magnitude context, and why they’re very
useful. That’s what I’ll do today.
This part of the story is about metric spaces. For now I’ll assume they
satisfy all the classical axioms, including symmetry of the metric, meaning that $d(x, y) = d(y, x)$ for all points $x$ and $y$. When
metric spaces are viewed as enriched categories, symmetry isn’t automatic
— but we’ll come to that next time.
A weighting on a finite metric space $X$ is a function $w \colon X \to \mathbb{R}$ such that for all $x \in X$,
$$\sum_{y \in X} e^{-d(x, y)}\, w(y) = 1.$$
Everyone who sees this formula for the first time asks where the
exponential comes from. Ultimately it’s because of the enriched category
viewpoint (which again we'll come to next time), but the short story is that the exponential is essentially the only reasonable function that converts addition into multiplication.
For simplicity, I'll assume here that every finite metric space $X$ has a unique weighting, which I'll call $w_X$. Since the definition of weighting involves the same number of equations as unknowns, this is generically true (and it's always true for subspaces of $\mathbb{R}^n$), even though there are exceptions.
The magnitude of $X$ is
$$|X| = \sum_{x \in X} w_X(x).$$
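Concretely, this definition is just a few lines of linear algebra: solve the defining equations as a linear system and sum the solution. Here is a rough Python sketch of that recipe, with an arbitrarily chosen two-point example; the printed value can be checked by hand, since for two points at distance $t$ the system gives weighting $1/(1 + e^{-t})$ at each point.

```python
import numpy as np

def magnitude(points):
    """Magnitude of a finite metric space given as an array of points in R^n:
    solve Z w = 1 for the weighting w, where Z[i, j] = exp(-d(x_i, x_j)),
    then return the sum of the entries of w."""
    diffs = points[:, None, :] - points[None, :, :]
    Z = np.exp(-np.linalg.norm(diffs, axis=-1))
    w = np.linalg.solve(Z, np.ones(len(points)))
    return w.sum()

t = 1.7                                           # separation of a two-point space (arbitrary)
print(magnitude(np.array([[0.0], [t]])))          # ~1.691
print(2 / (1 + np.exp(-t)))                       # hand-solved 2x2 system gives the same value
```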
That’s for finite metric spaces. To extend the definition to compact
metric spaces, there are various ideas you might try. You could define the
magnitude of a compact space as the supremum of all the magnitudes of its
finite subspaces. Or, you could take an ever-denser sequence of finite
subsets of your compact space, then define its magnitude to be the limit
of the magnitudes of the approximating subsets. Or, you could try replacing
the sums in the formulas above by integrals, somehow.
Mark Meckes showed that all these
approaches are equivalent. They all give the same definition of the magnitude
of a compact space. (At least, this is true subject to a condition called “positive definiteness”
which I won't discuss and which always holds for subspaces of $\mathbb{R}^n$.)
How do we actually calculate magnitude, say for compact subspaces of $\mathbb{R}^n$?
In principle, you already know how to do it. You run through all the finite subsets of your compact space $A$, calculate the magnitude of each using the definition above, and
take the sup. The trouble is, this procedure is incredibly hard to work
with. It’s completely impractical.
A slightly more practical approach is to look for a "weight measure", that is, a signed measure $\mu$ on $A$ such that for all $x \in A$,
$$\int_A e^{-d(x, y)}\, d\mu(y) = 1.$$
As Mark showed, if $\mu$ is a weight measure then the magnitude of $A$ is given by $|A| = \mu(A)$, the total mass of $\mu$. This is the analogue of the formula $|X| = \sum_x w_X(x)$ for finite spaces.
Example Take an interval $[a, b]$ in $\mathbb{R}$. It's an elementary exercise to check that
$$\mu = \tfrac{1}{2}\bigl(\delta_a + \delta_b + \lambda\bigr)$$
is a weight measure on $[a, b]$, where $\delta_a$ and $\delta_b$ denote the Dirac deltas at $a$ and $b$, and $\lambda$ is Lebesgue measure on $[a, b]$. It follows that $\bigl|[a, b]\bigr|$ is the total mass of this measure, which is
$$1 + \tfrac{1}{2}(b - a).$$
In other words, it's $1$ plus half the length of the interval.
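Here's a rough numerical cross-check of that answer (a sketch of my own, with an arbitrarily chosen length): approximate the interval by evenly spaced finite subsets, compute their magnitudes exactly as in the sketch above, and watch the values approach $1$ plus half the length.

```python
import numpy as np

L = 3.0                                            # length of the interval (arbitrary choice)
for n in (10, 100, 1000):
    pts = np.linspace(0.0, L, n)
    Z = np.exp(-np.abs(pts[:, None] - pts[None, :]))   # Z[i, j] = exp(-|t_i - t_j|)
    w = np.linalg.solve(Z, np.ones(n))
    print(n, w.sum())                              # approaches the limiting value from below
print("weight-measure prediction:", 1 + L / 2)     # 2.5 for this choice of L
```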
The trouble with weight measures is that very few spaces have them. Even
Euclidean balls don’t, beyond dimension one. It turns out that we need
something more general than a measure, something more like a distribution.
In another paper, Mark worked out
exactly what kind of measure-like or distribution-like thing a weighting
should be. The answer is very nice, but I won’t explain it here, because I
want to highlight the formal aspects above all.
So I'll simply write the weighting on a metric space $A$ as $w_A$, without worrying too much about what kind of thing $w_A$ actually is. All that matters is that it behaves something like a measure: we can pair it with "nice enough" functions $f$ to get a real number, which I'll write as
$$\int_A f(y)\, d w_A(y)$$
or simply $\int f\, d w_A$. And I'll assume that all the spaces that we discuss do have a unique weighting $w_A$.
That’s the background. But back to the question: how do we actually calculate magnitude?
First main idea Don't look at metric spaces $A$ in isolation. Instead, consider spaces $A$ embedded in some big space $X$ that we understand well.
Think of the big space $X$ as fixed. The typical choice is $X = \mathbb{R}^n$. One sense in which we "understand" $\mathbb{R}^n$ well is that we have a weight measure on it: it's just Lebesgue measure divided by a constant factor $c_n$. This follows easily from the fact that $\mathbb{R}^n$ is homogeneous. (It doesn't matter for anything I'm going to say, but the constant is $c_n = n!\,\omega_n$, where $\omega_n$ is the volume of the unit $n$-ball, for which there's a standard formula involving $\pi$s and factorials.)
The potential function of a compact subspace $A \subseteq X$ is the function $h_A \colon X \to \mathbb{R}$ defined by
$$h_A(x) = \int_A e^{-d(x, y)}\, d w_A(y).$$
By definition of weighting, $h_A$ has constant value $1$ on $A$. But outside $A$, it could be anything, and it turns out that its behaviour outside $A$ is very informative.
Although $w_A$ is a measure-like thing on the subspace $A$ of $X$, we typically extend it by $0$ to all of $X$, and then the definition of the potential function becomes
$$h_A(x) = \int_X e^{-d(x, y)}\, d w_A(y).$$
Examples In all these examples, I'll take the big space to be $X = \mathbb{R}$.
Let $A = \{0\}$, a single point. Then the weighting on $A$ is the Dirac delta at $0$, and the potential function is given by $h_A(x) = e^{-|x|}$ ($x \in \mathbb{R}$), whose graph looks like this:
Let $A = [a, b]$. As we've already seen, $w_A = \tfrac{1}{2}(\delta_a + \delta_b + \lambda)$, and a little elementary work then reveals that the potential function is given by
$$h_A(x) = e^{-d(x, [a, b])},$$
which looks like this:
In the two examples we just did, the potential function of $A$ is just the negative exponential of the distance between $A$ and the argument $x$: that is, $h_A(x) = e^{-d(x, A)}$. That's not exactly coincidence: as we'll see in the next part, the function $e^{-d(\,\cdot\,, A)}$ just described corresponds to a strict colimit, whereas the potential function $h_A$ corresponds to a lax colimit. So it's not surprising that they coincide in simple cases.
But this only happens in the very simplest cases. It doesn't happen for Euclidean balls above dimension $1$, or even for two-point spaces. For example, taking a two-point subset $A = \{0, \ell\}$ of $\mathbb{R}$ (with $\ell > 0$), we have
$$h_A(x) = \frac{e^{-|x|} + e^{-|x - \ell|}}{1 + e^{-\ell}},$$
which looks like this:
whereas $e^{-d(x, A)}$ looks like this:
Similar, but different!
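To see the "similar, but different" shapes concretely, here is a small sketch evaluating both curves; the separation $\ell$ is a placeholder value of my own choosing. With the formula above, the two functions agree exactly outside the two points and differ only in between, where the potential function sits a little higher.

```python
import numpy as np

ell = 1.0                                    # separation of the two points (placeholder value)
x = np.linspace(-3.0, ell + 3.0, 601)

h_A   = (np.exp(-np.abs(x)) + np.exp(-np.abs(x - ell))) / (1 + np.exp(-ell))   # potential function
naive = np.exp(-np.minimum(np.abs(x), np.abs(x - ell)))                         # e^{-d(x, A)}

print(float(np.max(h_A - naive)))                                # > 0: they differ between the points
print(float(np.max(np.abs(h_A - naive)[(x < 0) | (x > ell)])))   # ~0: outside, they coincide
```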
And now we come to the:
Second main idea Work with potential functions instead of weightings.
To explain what this means, I need to tell you three good things about
potential functions.
The potential function determines the magnitude:
$$|A| = \int_X h_A\, d w_X.$$
At the formal level, proving this is a one-line calculation: substitute the definition of $h_A$ into the right-hand side and follow your nose.
For example, we saw that $\mathbb{R}^n$ has weighting $\lambda / c_n$, where $\lambda$ is Lebesgue measure and $c_n$ is a known constant. So
$$|A| = \frac{1}{c_n} \int_{\mathbb{R}^n} h_A(x)\, dx$$
for compact $A \subseteq \mathbb{R}^n$. Here $\int$ refers to ordinary Lebesgue integration.
(For $A \subseteq \mathbb{R}^n$, and in fact a bit more generally, this result appears as Theorem 4.16 here.)
You can recover the weighting from the potential function. So, you
don’t lose any information by working with one rather than the other.
How do you recover it? Maybe it's easiest to explain in the case when the spaces are finite. If we write $Z$ for the matrix $\bigl(e^{-d(x, y)}\bigr)_{x, y \in X}$, then the definition of the potential function can be expressed very succinctly:
$$h_A = Z w_A.$$
Here $h_A$ and $w_A$ are viewed as column vectors with entries indexed by the points of $X$. (For $w_A$, those entries are $0$ for points not in $A$.) Assuming $Z$ is invertible, this means we recover $w_A$ from $h_A$ as $w_A = Z^{-1} h_A$. And something similar is true in the non-finite setting.
However, what really makes the technique of potential functions sing is that when $X = \mathbb{R}^n$, there's a much more explicit way to recover the weighting from the potential function:
$$w_A = (I - \Delta)^{(n+1)/2}\, h_A$$
(up to a constant factor that I'll ignore). Here $I$ is the identity and $\Delta$ is the Laplace operator, $\sum_{i=1}^n \partial^2/\partial x_i^2$. This is Proposition 5.9 of
Mark’s paper.
How much more of this bullet point you’ll want to read depends on how
interested you are in the analysis. The fundamental point is simply
that $(I - \Delta)^{(n+1)/2}$ is some kind of differential operator. But for those who want a bit more:
To make sense of everything, you need to interpret it all in a
distributional sense. In particular, this allows one to make
sense of the power $(n+1)/2$, which is not an integer if $n$ is even.
Maybe you wondered why someone might have proved a result on the
magnitude of odd-dimensional Euclidean balls only, as I
mentioned at the start of the post. What could cause the odd- and
even-dimensional cases to become separated? It's because whether or not $(n+1)/2$ is an integer depends on whether $n$ is odd or even. When $n$ is odd, it's an integer, which makes $(I - \Delta)^{(n+1)/2}$ a differential rather than pseudodifferential operator. Heiko Gimperlein, Magnus Goffeng and
Nikoletta Louca later worked out lots about the even-dimensional case,
but I won’t talk about that here.
Finally, where does the operator $(I - \Delta)^{(n+1)/2}$ come from? Sadly, I don't have an intuitive explanation. Ultimately it comes down to the fact that the Fourier transform of $e^{-|x|}$ is $\bigl(1 + |\xi|^2\bigr)^{-(n+1)/2}$ (up to a constant). But that itself is a calculation that's really
quite tricky (for me), and it’s hard to see anything beyond “it
is what it is”.
The third good thing about potential functions is that they satisfy a
differential equation, in the situation where our big space is $\mathbb{R}^n$. Specifically:
$$h_A = 1 \ \text{on}\ A, \qquad (I - \Delta)^{(n+1)/2}\, h_A = 0 \ \text{on}\ \mathbb{R}^n \setminus A.$$
Indeed, the definition of weighting implies that $h_A = 1$ on $A$, and the "second good thing" together with the fact that $w_A$ is supported on $A$ give the second clause.
Not only do we have a differential equation for $h_A$, we also have boundary conditions. There are boundary conditions at the boundary of $A$, because of something I've been entirely vague about: the functions we're dealing with are meant to be suitably smooth. There are also boundary conditions at $\infty$, because our functions are also meant to decay suitably fast.
Maybe Mark will read this and correct me if I’m wrong, but I believe
there are exactly the right number of boundary conditions to guarantee that there’s
(typically? always?) a unique solution. In any case, the following example —
also taken from Mark’s paper —
illustrates the situation.
Example Let's calculate the magnitude of a real interval $[a, b]$ using the potential function method.
Its potential function $h$ is a function such that $h = 1$ on $[a, b]$ and $h'' = h$ on the rest of the real line. That differential equation comes from taking $n = 1$ in the equation $(I - \Delta)^{(n+1)/2}\, h_A = 0$: when $n = 1$, the operator is simply $1 - d^2/dx^2$.
The functions satisfying $h'' = h$ are those of the form $c\, e^{\pm x}$ for some constant $c$, and in our case we're free to choose the constant and the sign differently on the two connected components of $\mathbb{R} \setminus [a, b]$. So there are lots of solutions. But $h$ is required to be continuous and to converge to $0$ at $\pm\infty$, and that pins it down uniquely: the one and only solution is
$$h(x) = e^{-d(x, [a, b])} = \begin{cases} e^{x - a} & \text{if } x < a, \\ 1 & \text{if } a \le x \le b, \\ e^{b - x} & \text{if } x > b. \end{cases}$$
I showed you the graph of this potential function above, for a particular choice of the endpoints.
So the magnitude of is
where is the constant . This gives the
answer:
The crucial point is that in this example, we didn’t have to come up with
a weighting on . The procedure was quite mechanical. And that’s the
attraction of the method of potential functions.
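If you'd like to see the same answer emerge with no calculus at all, you can approximate the interval by finite subsets and compute the magnitude of those: the values converge to the known answer 1 + L/2 for an interval of length L. A small sketch of my own (not from Mark's paper):

```python
import numpy as np

# Approximate the interval [0, L] by n evenly spaced points and compute the
# magnitude of the finite approximation; the values approach 1 + L/2
# (here 2.5) as n grows.
def magnitude(points):
    Z = np.exp(-np.abs(points[:, None] - points[None, :]))
    return np.linalg.solve(Z, np.ones(len(points))).sum()

L = 3.0
for n in [10, 100, 1000]:
    print(n, magnitude(np.linspace(0.0, L, n)))
```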
Next time, I’ll put all this into a categorical context using the notion
of the magnitude of a functor, which I introduced
here.
Despite the “2” in the title, you can follow this post without having read
part
1. The
whole point is to sneak up on the metricky, analysisy stuff about potential
functions from a categorical angle, by considering constructions that are categorically reasonable in their own right. Let’s go!
Let be a monoidal category, which I’ll assume is symmetric, closed,
complete or cocomplete as necessary. Take a pair of -categories and
, and a -functor
Then we get a new functor
defined by
This is sometimes called the nerve functor induced by ,

because of the first example in the following list.
Examples In the first few examples, I’ll take the enriching category
to be .
Let be the inclusion , which takes a
finite nonempty totally ordered set and realizes
it as a category in the usual way. Then
sends a small category to its nerve, which is the simplicial set
whose -simplices are paths in of length .
Let be the functor that sends to the
topological -simplex . Then sends a
topological space to its “singular simplicial set”, whose
-simplices are the continuous maps .
Let be an inclusion of partially ordered sets,
viewed as categories in the usual way. Then is given by
For example, let be the set of integers,
ordered by divisibility. Let be the set of
(positive) primes. Then is the functor that sends to
If has a right adjoint , then . In particular, is representable for each .
Take the inclusion , where is the
category of commutative rings. Now take opposites throughout to get
. The resulting nerve functor
sends a ring to (by which I mean the restriction of to the category of
fields). More explicitly,
Here is the set of prime ideals of , and
is the field of fractions of the integral domain . In particular,
is a coproduct of representables for each . (In the terminology of Diers, one therefore says
that is a “right multi-adjoint”. A closely related statement is that it’s a
parametric right adjoint. But that’s all just jargon that won’t matter for this post.)
Now let’s consider with its additive monoidal
structure, so that -categories are generalized metric spaces and
-functors are the functions between metric spaces that are variously called
short, distance-decreasing, contractive, or 1-Lipschitz: for all . I’ll just call them “maps” of metric
spaces.
Take a map of metric spaces. The induced map is given by
Unlike in the first post, I’m not going to assume here that our
metric spaces are symmetric, so the “op” on the
matters, and in general.
We’ll be particularly interested in the case where the -functor
is full and faithful. Most of the examples above have this property. In
the metric case, , being full and faithful means
being distance-preserving, or in other words, an inclusion of a metric
subspace. In that case, it’s normal to drop the . So we’d usually then write
Next I’ll define the potential function of a -functor. For this we
need some of the apparatus of magnitude. Roughly speaking, this means we
have a notion of the “size” of each object of our base monoidal category .
More exactly, we take a
field and a function , to be thought of as
assigning to each object of its size . And to make
everything work, we assume our size function has reasonable
properties:
where is the monoidal product on and is its unit.
The basic definitions are these. A weighting on a finite -category
is a function such that
for all , and a coweighting on is a weighting on
. Although it’s not true that every finite -category has a unique
weighting, it’s a harmless enough assumption that I’ll make it here. The
magnitude of a finite -category is
It’s a tiny lemma that the total weight is equal to
the total coweight , so you can equivalently define
the magnitude to be one or the other.
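Here is a tiny computational illustration of these definitions in the case where the enriching category is finite sets and the field is the rationals; the two-object category below (the walking arrow) is my own toy example:

```python
import numpy as np

# Z[a, b] = |Hom(a, b)|; a weighting solves Z w = 1, a coweighting is a
# weighting for the transpose, and the magnitude is the total (co)weight.
# Example: the "walking arrow" category a -> b.
Z = np.array([[1., 1.],        # Hom(a,a), Hom(a,b)
              [0., 1.]])       # Hom(b,a), Hom(b,b)

ones = np.ones(2)
w = np.linalg.solve(Z, ones)       # weighting:   [0., 1.]
v = np.linalg.solve(Z.T, ones)     # coweighting: [1., 0.]
print(w, v, w.sum(), v.sum())      # total weight = total coweight = 1
```

The magnitude comes out as 1, which is what you'd expect given the first bullet point below: the classifying space of the walking arrow is contractible.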
There’s a whole lot to say about why this is a worthwhile
definition, and I’ve said it in detail many times here before. But
here I’ll just say two short things:
Taking , , and to be cardinality,
the magnitude of a finite category is equal to the Euler characteristic of
its classifying space, under suitable hypotheses guaranteeing that the
latter is defined.
Take , , and to be (because that’s just about the only function converting the
tensor product of into addition, as our size functions are
contractually obliged to do). Then we get a notion of the magnitude of a
metric space, a real-valued invariant that turns out to be extremely
interesting.
That’s the magnitude of enriched categories. But we can also talk about
the magnitude of enriched presheaves. Take a finite -category and
a -presheaf . Its magnitude is
When I wrote about the magnitude of functors
before,
I concentrated on covariant functors , which meant that the
weights in the weighted sum that defines were, well, the weights
. But since we’ve now changed to , the weights have
become coweights.
Let me briefly recall why the magnitude of presheaves is
interesting, at least in the case :
If our presheaf is a coproduct of representables then is
, the cardinality of the colimit of .
The magnitude of presheaves generalizes the magnitude of categories. The
magnitude of a category is equal to the magnitude of the presheaf
with constant value .
In the other direction, the magnitude of categories generalizes the
magnitude of presheaves: for all presheaves
. Here means the category of elements of , also
called the (domain of the) Grothendieck construction.
It’s enlightening to think of this result as follows. If we extend the
codomain of from to then is the colax
colimit of . So . You can
compare this to the more limited result that , which only holds when
has the special property that it’s a coproduct of
representables.
There’s also another closely related result, due to Ponto and
Shulman: under mild
further hypotheses, . Mike and I had a little
discussion about the relationship
here.
Now here’s something funny. Which do you think is a more refined
invariant of a presheaf : the magnitude ,
which is equal to and , or the cardinality
of the strict colimit?
From one perspective, it’s the magnitude. After all, we usually think
of lax or pseudo 2-dimensional things as more subtle and revealing than
strict 1-dimensional things. But from another, it’s the cardinality of
the strict colimit. For instance, if is the one-object category
corresponding to a group then is a right -set, the
cardinality of the colimit is the number of orbits (an interesting
quantity), but the magnitude is always just the cardinality of the set
divided by the order of (relatively boring and crude).
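Here is that comparison in miniature, with a toy example of my own:

```python
# G = C_3 acting on a 5-element set X by cycling {0,1,2} and fixing 3 and 4.
# Each group element is listed as the tuple of images of 0,...,4.
G = [(0, 1, 2, 3, 4), (1, 2, 0, 3, 4), (2, 0, 1, 3, 4)]
X = range(5)

orbits = {frozenset(sigma[x] for sigma in G) for x in X}
print(len(orbits))       # 3 orbits: the cardinality of the (strict) colimit
print(len(X) / len(G))   # 5/3: the magnitude of the corresponding presheaf
```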
In this post I’ll keep comparing these two quantities. Whichever seems
“more refined”, the comparison is interesting in itself.
(I stuck to the case here, because for an arbitrary
, there isn’t a notion of “colimit” as such: we’d need to think about
weighted colimits. It’s also harder to say what the category of elements
or homotopy colimit of an enriched presheaf should be.)
Now here’s something new. The potential function of a -functor is the function
defined by
for all . Expanding out the definitions, this means that
where is the notation I introduced for the coweighting on . If is full and
faithful then on the set of objects in the image of .
Examples The definition I just gave implicitly assumes that
is finite. But I’ll relax that assumption in some of these examples,
at the cost of some things getting less than rigorous.
Again, I’ll begin with some examples where .
What’s the potential function of the inclusion ? To make sense of this, we need a coweighting on , and I’m
pretty sure there isn’t one. So let’s abandon and instead use
its subcategory , consisting of all the finite nonempty
totally ordered sets but only the injective
order-preserving maps between them. This amounts to considering only face
maps, not degeneracies.
The coweighting on is . So, the
potential function of is given, on a category
, by
Here is the set of paths of length in , which we
often write as . So
Even if the category is finite, this is a divergent series. But
morally, is the alternating sum of the number of -cells
of .
In other words, the potential function of is morally Euler characteristic.
As I hinted above, it’s interesting to compare the potential function
with what you get if you take the cardinality of the colimit
of instead of its magnitude. I’ll call this the
“strict potential function”. In this case, it’s
— the cardinality of the set of connected-components of . So
while the potential function gives Euler characteristic, the strict
potential function gives the number of connected components.
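For a finite poset there is a rigorous finite shadow of this: counting only chains of strictly increasing elements, the alternating sum is the Euler characteristic of the order complex, and it agrees with the magnitude of the poset. A quick check on a toy poset of my own (two elements each sitting below two others, whose order complex is a circle):

```python
import numpy as np
from itertools import combinations

elements = ['a', 'b', 'c', 'd']
leq = {('a','a'),('b','b'),('c','c'),('d','d'),
       ('a','c'),('a','d'),('b','c'),('b','d')}    # a, b both below c, d

# Magnitude of the poset as a category: Z[x,y] = 1 if x <= y, then sum Z^{-1}.
Z = np.array([[1.0 if (x, y) in leq else 0.0 for y in elements] for x in elements])
print(np.linalg.inv(Z).sum())                      # 0.0

# Euler characteristic of the order complex: a chain with k elements
# contributes (-1)^(k-1).  (The element list is a linear extension, so
# checking consecutive pairs suffices.)
chi = 0
for k in range(1, len(elements) + 1):
    for chain in combinations(elements, k):
        if all((chain[i], chain[i + 1]) in leq for i in range(k - 1)):
            chi += (-1) ** (k - 1)
print(chi)                                         # 0
```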
Similarly, the potential function of the usual functor , sending to , is given by
for topological spaces . Again, this formally looks like Euler
characteristic: the alternating sum of the number of cells in each
dimension. And much as for categories, the strict potential function
gives the number of path-components.
For the inclusion that we
saw earlier, the potential function is
what’s often denoted by , sending an integer to the
number of distinct prime factors of . The strict potential function is
the same.
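In code, this potential function is just the classical arithmetic function usually called omega:

```python
def omega(n: int) -> int:
    """Number of distinct prime factors of n."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

print([omega(n) for n in [1, 2, 12, 30, 64]])   # [0, 1, 2, 3, 1]
```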
An easy example that works for any : if has a right
adjoint then since is representable, its magnitude is equal to
the cardinality of its colimit, which is . So has
constant value , as does the strict potential function.
More generally, if is a coproduct of representables then
, and the same is true for the strict potential function.
For the inclusion , we saw that
for every ring . This is a coproduct of representables, so by the
previous example, . That is, the potential
function is given by
(as is the strict potential function). Usually is infinite,
so this calculation is dodgy, but if we restrict ourselves to finite
fields and rings then everything is above board.
For an opfibration , one can show that the potential function
is given by
I should explain the notation on the right-hand side. The category is the fibre over ,
consisting of the objects of that sends to and the maps in
that sends to . We can take its
magnitude (at least if it’s finite and lightning doesn’t strike), which is
what I’ve denoted by .
I won’t include the proof of this result, but I want to emphasize that
it doesn’t involve finding a coweighting on . You might think it
would, because the definition of involves the coweighting on
. But it turns out that it’s enough to just assume it exists.
Now take , so that -categories are generalized
metric spaces. The potential function of a map of metric
spaces is the function given by
In particular, if is the inclusion of a subspace , then
it’s natural to write instead of , and we have
This is equal to on but can take other values on . In suitable infinite contexts, sums become integrals and
becomes something like a measure or distribution, in which case
the formula becomes
This is exactly the formula for the potential function in Part
1,
with one difference: there, I used weightings on , and here, I’m using
coweightings. It’s coweightings that one should use. In the previous
post, I assumed that all metrics were symmetric, which means that
weightings and coweightings are the same. So there’s no inconsistency.
(Of course, one could take duals throughout and use weightings
on instead. But we’ll see that whichever choice you make, you end up
having to consider weightings on one of and and coweightings on
the other.)
What about the strict potential function, sending a point to the “cardinality of the colimit of ”? Well, I put that phrase in inverted commas because
we’re now in an enriched context, so it needs a bit of
interpretation. “Cardinality” is okay: it becomes the size function
. “Colimit”
wouldn’t usually make sense in an enriched world, but we’re saved by
the fact that the monoidal unit of (namely, ) is
terminal. The colimit of a -functor into is
just its infimum. So the strict potential function of a map of metric spaces is just
I showed you an example of the difference between the potential
function and the strict potential function in Part
1,
although not with those words. If we take to be the inclusion
then the potential function is
whereas the strict potential function is
These two functions are identical on , but different
on the interval . If you’ve ever wondered what the difference
is between strict and lax colimits, here’s an example!
By definition, the potential function of an enriched functor is given by
but a slightly different viewpoint is sometimes helpful. We can take the
function and push it forward
along to obtain a new function
Really it’s best to think of as not a function but a measure taking
values in (although these are the same thing on a finite set). Then
this is just the usual pushforward measure construction. In any case, the
formula for the potential function now becomes
which has the advantage that everything is taking place in rather than
. In the case where is an embedding of a subspace of a
metric space, is just the extension of the measure on
to all of , which we’d usually just write as (with a wink to the
audience). In integral notation, the last formula becomes
I’ve now explained what potential functions are. But what are
they good for?
Last
time,
I explained that they’re very good indeed for helping us to calculate the
magnitude of metric spaces. The key was that
for a subspace of a metric space . And as I recounted, that key
unlocks the door to the world of PDE methods.
So you might hope something similar is true for enriched categories in
general: that there’s a formula for the magnitude in terms of the potential
function. And there is! For any -functor , it’s a theorem that
This is an easy calculation from the definitions.
(If you’re really paying attention, you’ll notice that we used the
coweighting on to define the potential function, and now we’re using
the weighting on . That’s just how it turns out. One is a weighting, and the other is a coweighting.)
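Here is a numerical check of that formula in the metric setting, with a finite ambient space and a subspace of it. The points below are random toy data; since these metrics are symmetric, weightings and coweightings coincide, so the distinction just mentioned is invisible in this small check:

```python
import numpy as np

# Check that |A| = sum over b of w_B(b) h(b), for a finite metric space B,
# a subspace A, and Z = exp(-d).
rng = np.random.default_rng(0)
B_pts = rng.uniform(0, 3, size=(6, 2))             # 6 random points in the plane
A_idx = [0, 2, 5]                                  # a 3-point subspace A

def dist(P, Q):
    return np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)

Z_B = np.exp(-dist(B_pts, B_pts))
Z_A = Z_B[np.ix_(A_idx, A_idx)]

w_B = np.linalg.solve(Z_B, np.ones(len(B_pts)))    # weighting on B
v_A = np.linalg.solve(Z_A.T, np.ones(len(A_idx)))  # coweighting on A

h = np.exp(-dist(B_pts, B_pts[A_idx])) @ v_A       # potential function on B
print(v_A.sum())                                   # the magnitude of A
print(w_B @ h)                                     # the same number
```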
Examples
For a -functor that has a right adjoint, we saw that the
potential function has constant value , so this formula tells us
that
But the right-hand side is by definition the magnitude of , so what
this formula is saying is that
In other words, if there’s an adjunction between two categories, their
magnitudes are equal!
This has been known forever
(Proposition 2.4), and is also intuitive from a homotopical
viewpoint. But it’s nice that it just pops out.
Less trivially, we saw above that for an opfibration , the
potential function is . So
In other words, the magnitude of the “total” category is the
weighted sum over the base of the magnitudes of the fibres. (Special
case: the magnitude of a product is the product of the magnitudes.)
Now this has been known forever too (Proposition
2.8), but I want to emphasize
that the proof is fundamentally different from the one I just
linked. That proof constructs the weighting on from the
weightings on the base and the fibres. Now that’s an easy and
informative proof, but what we’ve just done is different, because it
didn’t involve figuring out the weighting or coweighting on . So
although the result isn’t new or difficult, it’s perhaps grounds for
optimism that the method of potential functions will let us prove new things about enriched categories other than metric spaces.
What new things? I don’t know! This is where I’ve got to now. Maybe there
are applications in the metric world in which is nothing like an
inclusion. Maybe there are applications to graphs, replacing the PDE
methods used for subspaces of by discrete
analogues. Maybe
the potential function method can be used to shed light on the tricky result that
the magnitude of graphs is invariant under certain Whitney twists (Theorem
5.2), and more generally under the sycamore
twists introduced by Emily Roff (Theorem
6.5). Let’s find out!
There has been some spectacular progress in geometric measure theory: Hong Wang and Joshua Zahl have just released a preprint that resolves the three-dimensional case of the infamous Kakeya set conjecture! This conjecture asserts that a Kakeya set – a subset of that contains a unit line segment in every direction – must have Minkowski and Hausdorff dimension equal to three. (There is also a stronger “maximal function” version of this conjecture that remains open at present, although the methods of this paper will give some non-trivial bounds on this maximal function.) It is common to discretize this conjecture in terms of small scale . Roughly speaking, the conjecture then asserts that if one has a family of tubes of cardinality , and pointing in a -separated set of directions, then the union of these tubes should have volume . Here we shall be a little vague as to what means here, but roughly one should think of this as “up to factors of the form for any “; in particular this notation can absorb any logarithmic losses that might arise for instance from a dyadic pigeonholing argument. For technical reasons (including the need to invoke the aforementioned dyadic pigeonholing), one actually works with slightly smaller sets , where is a “shading” of the tubes in that assigns a large subset of to each tube in the collection; but for this discussion we shall ignore this subtlety and pretend that we can always work with the full tubes.
Previous results in this area tended to center around lower bounds of the form
for various intermediate dimensions , that one would like to make as large as possible. For instance, just from considering a single tube in this collection, one can easily establish (1) with . By just using the fact that two lines in intersect in a point (or more precisely, a more quantitative estimate on the volume of the intersection of two tubes, based on the angle of intersection), combined with a now classical -based argument of Córdoba, one can obtain (1) with (and this type of argument also resolves the Kakeya conjecture in two dimensions). In 1995, building on earlier work by Bourgain, Wolff famously obtained (1) with using what is now known as the “Wolff hairbrush argument”, based on considering the size of a “hairbrush” – the union of all the tubes that pass through a single tube (the hairbrush “stem”) in the collection.
In their new paper, Wang and Zahl established (1) for . The proof is lengthy (127 pages!), and relies crucially on their previous paper establishing a key “sticky” case of the conjecture. Here, I thought I would try to summarize the high level strategy of proof, omitting many details and also oversimplifying the argument at various places for sake of exposition. The argument does use many ideas from previous literature, including some from my own papers with co-authors; but the case analysis and iterative schemes required are remarkably sophisticated and delicate, with multiple new ideas needed to close the full argument.
A natural strategy to prove (1) would be to try to induct on : if we let represent the assertion that (1) holds for all configurations of tubes of dimensions , with -separated directions, we could try to prove some implication of the form for all , where is some small positive quantity depending on . Iterating this, one could hope to get arbitrarily close to .
A general principle with these sorts of continuous induction arguments is to first obtain the trivial implication in a non-trivial fashion, with the hope that this non-trivial argument can somehow be perturbed or optimized to get the crucial improvement . The standard strategy for doing this, since the work of Bourgain and then Wolff in the 1990s (with precursors in older work of Córdoba), is to perform some sort of “induction on scales”. Here is the basic idea. Let us call the tubes in “thin tubes”. We can try to group these thin tubes into “fat tubes” of dimension for some intermediate scale ; it is not terribly important for this sketch precisely what intermediate value is chosen here, but one could for instance set if desired. Because of the -separated nature of the directions in , there can only be at most thin tubes in a given fat tube, and so we need at least fat tubes to cover the thin tubes. Let us suppose for now that we are in the “sticky” case where the thin tubes stick together inside fat tubes as much as possible, so that there are in fact a collection of fat tubes , with each fat tube containing about of the thin tubes. Let us also assume that the fat tubes are -separated in direction, which is an assumption which is highly consistent with the other assumptions made here.
If we already have the hypothesis , then by applying it at scale instead of we conclude a lower bound on the volume occupied by fat tubes:
Since , this morally tells us that the typical multiplicity of the fat tubes is ; a typical point in should belong to about fat tubes.
Now, inside each fat tube , we are assuming that we have about thin tubes that are -separated in direction. If we perform a linear rescaling around the axis of the fat tube by a factor of to turn it into a tube, this would inflate the thin tubes to be rescaled tubes of dimensions , which would now be -separated in direction. This rescaling does not affect the multiplicity of the tubes. Applying again, we see morally that the multiplicity of the rescaled tubes, and hence the thin tubes inside , should be .
We now observe that the multiplicity of the full collection of thin tubes should morally obey the inequality
since if a given point lies in at most fat tubes, and within each fat tube a given point lies in at most thin tubes in that fat tube, then it should only be able to lie in at most tubes overall. This heuristically gives , which then recovers (1) in the sticky case.
In their previous paper, Wang and Zahl were roughly able to squeeze a little bit more out of this argument to get something resembling in the sticky case, loosely following a strategy of Nets Katz and myself that I discussed in this previous blog post from over a decade ago. I will not discuss this portion of the argument further here, referring the reader to the introduction to that paper; instead, I will focus on the arguments in the current paper, which handle the non-sticky case.
Let’s try to repeat the above analysis in a non-sticky situation. We assume (or some suitable variant thereof), and consider some thickened Kakeya set
where is something resembling what we might call a “Kakeya configuration” at scale : a collection of thin tubes of dimension that are -separated in direction. (Actually, to make the induction work, one has to consider a more general family of tubes than these, satisfying some standard “Wolff axioms” instead of the direction separation hypothesis; but we will gloss over this issue for now.) Our goal is to prove something like for some , which amounts to obtaining some improved volume bound
that improves upon the bound coming from . From the previous paper we know we can do this in the “sticky” case, so we will assume that is “non-sticky” (whatever that means).
A typical non-sticky setup is when there are now fat tubes for some multiplicity (e.g., for some small constant ), with each fat tube containing only thin tubes. Now we have an unfortunate imbalance: the fat tubes form a “super-Kakeya configuration”, with too many tubes at the coarse scale for them to be all -separated in direction, while the thin tubes inside a fat tube form a “sub-Kakeya configuration” in which there are not enough tubes to cover all relevant directions. So one cannot apply the hypothesis efficiently at either scale.
This looks like a serious obstacle, so let’s change tack for a bit and think of a different way to try to close the argument. Let’s look at how intersects a given -ball . The hypothesis suggests that might behave like a -dimensional fractal (thickened at scale ), in which case one might be led to a predicted size of of the form . Suppose for sake of argument that the set was denser than this at this scale, for instance we have
for all and some . Observe that the -neighborhood is basically , and thus has volume by the hypothesis (indeed we would even expect some gain in , but we do not attempt to capture such a gain for now). Since -balls have volume , this should imply that needs about balls to cover it. Applying (3), we then heuristically have
which would give the desired gain . So we win if we can exhibit the condition (3) for some intermediate scale . I think of this as a “Frostman measure violation”, in that the Frostman type bound
is being violated.
The set , being the union of tubes of thickness , is essentially the union of cubes. But it has been observed in several previous works (starting with a paper of Nets Katz, Izabella Laba, and myself) that these Kakeya type sets tend to organize themselves into larger “grains” than these cubes – in particular, they can organize into disjoint prisms (or “grains”) in various orientations for some intermediate scales . The original “graininess” argument of Nets, Izabella and myself required a stickiness hypothesis which we are explicitly not assuming (and also an “x-ray estimate”, though Wang and Zahl were able to find a suitable substitute for this), so is not directly available for this argument; however, there is an alternate approach to graininess developed by Guth, based on the polynomial method, that can be adapted to this setting. (I am told that Guth has a way to obtain this graininess reduction for this paper without invoking the polynomial method, but I have not studied the details.) With rescaling, we can ensure that the thin tubes inside a single fat tube will organize into grains of a rescaled dimension . The grains associated to a single fat tube will be essentially disjoint; but there can be overlap between grains from different fat tubes.
The exact dimensions of the grains are not specified in advance; the argument of Guth will show that is significantly larger than , but other than that there are no bounds. But in principle we should be able to assume without loss of generality that the grains are as “large” as possible. This means that there are no longer grains of dimensions with much larger than ; and for fixed , there are no wider grains of dimensions with much larger than .
One somewhat degenerate possibility is that there are enormous grains of dimensions approximately (i.e., ), so that the Kakeya set becomes more like a union of planar slabs. Here, it turns out that the classical arguments of Córdoba give good estimates, so this turns out to be a relatively easy case. So we can assume that at least one of or is small (or both).
We now revisit the multiplicity inequality (2). There is something slightly wasteful about this inequality, because the fat tubes used to define occupy a lot of space that is not in . An improved inequality here is
where is the multiplicity, not of the fat tubes , but rather of the smaller set . The point here is that by the graininess hypotheses, each is the union of essentially disjoint grains of some intermediate dimensions . So the quantity is basically measuring the multiplicity of the grains.
It turns out that after a suitable rescaling, the arrangement of grains looks locally like an arrangement of tubes. If one is lucky, these tubes will look like a Kakeya (or sub-Kakeya) configuration, for instance with not too many tubes in a given direction. (More precisely, one should assume here some form of the Wolff axioms, which the authors refer to as the “Katz-Tao Convex Wolff axioms”). A suitable version of the hypothesis will then give the bound
Meanwhile, the thin tubes inside a fat tube are going to be a sub-Kakeya configuration, having about times fewer tubes than a Kakeya configuration. It turns out to be possible to use to then get a gain in here,
for some small constant . Inserting these bounds into (4), one obtains a good bound which leads to the desired gain .
So the remaining case is when the grains do not behave like a rescaled Kakeya or sub-Kakeya configuration. Wang and Zahl introduce a “structure theorem” to analyze this case, concluding that the grains will organize into some larger convex prisms , with the grains in each prism behaving like a “super-Kakeya configuration” (with significantly more grains than one would have for a Kakeya configuration). However, the precise dimensions of these prisms are not specified in advance, and one has to split into further cases.
One case is when the prisms are “thick”, in that all dimensions are significantly greater than . Informally, this means that at small scales, looks like a super-Kakeya configuration after rescaling. With a somewhat lengthy induction on scales argument, Wang and Zahl are able to show that (a suitable version of) implies an “x-ray” version of itself, in which the lower bound of super-Kakeya configurations is noticeably better than the lower bound for Kakeya configurations. The upshot of this is that one is able to obtain a Frostman violation bound of the form (3) in this case, which as discussed previously is already enough to win in this case.
It remains to handle the case when the prisms are “thin”, in that they have thickness . In this case, it turns out that the arguments of Córdoba, combined with the super-Kakeya nature of the grains inside each of these thin prisms, implies that each prism is almost completely occupied by the set . In effect, this means that these prisms themselves can be taken to be grains of the Kakeya set. But this turns out to contradict the maximality of the dimensions of the grains (if everything is set up properly). This treats the last remaining case needed to close the induction on scales, and obtain the Kakeya conjecture!
Applied category theorists are flocking to AI, because that’s where the money is. I avoid working on it, both because I have an instinctive dislike of ‘hot topics’, and because at present AI is mainly being used to make rich and powerful people richer and more powerful.
However, I like to pay some attention to how category theorists are getting jobs connected to AI, and what they’re doing. Many of these people are my friends, so I wonder what they will do for AI, and the world at large — and what working on AI will do to them.
Let me list a bit of what’s going on. I’ll start with a cautionary tale, and then turn to the most important program linking AI and category theory today.
Symbolica
When Musk and his AI head Andrej Karpathy didn’t listen to their engineer George Morgan’s worry that current techniques in deep learning couldn’t “scale to infinity and solve all problems,” Morgan left Tesla and started a company called Symbolica to work on symbolic reasoning. The billionaire Vinod Khosla gave him $2 million in seed money. He began with an approach based on hypergraphs, but then these researchers wrote a position paper that pushed the company in a different direction:
Khosla liked the new direction and invested $30 million more in Symbolica. At Gavranovic and Lessard’s suggestion Morgan hired category theorists including Dominic Verity and Neil Ghani.
But Morgan was never fully sold on category theory: he still wanted to pursue his hypergraph approach. After a while, continued disagreements between Morgan and the category theorists took their toll. He fired some, even having one summarily removed from his office. Another resigned voluntarily. Due to nondisclosure agreements, these people no longer talk publicly about what went down.
So one moral for category theorists, or indeed anyone with a good idea: after your idea helps someone get a lot of money, they may be able to fire you.
ARIA
David Dalrymple is running a £59 million program on Mathematics for Safeguarded AI at the UK agency ARIA (the Advanced Research + Invention Agency). He is very interested in using category theory for this purpose.
Here’s what the webpage for the Safeguarded AI program says:
Why this programme
As AI becomes more capable, it has the potential to power scientific breakthroughs, enhance global prosperity, and safeguard us from disasters. But only if it’s deployed wisely. Current techniques working to mitigate the risk of advanced AI systems have serious limitations, and can’t be relied upon empirically to ensure safety. To date, very little R&D effort has gone into approaches that provide quantitative safety guarantees for AI systems, because they’re considered impossible or impractical.
What we’re shooting for
By combining scientific world models and mathematical proofs we will aim to construct a ‘gatekeeper’: an AI system tasked with understanding and reducing the risks of other AI agents. In doing so we’ll develop quantitative safety guarantees for AI in the way we have come to expect for nuclear power and passenger aviation.
This project aims to provide complete axiomatic theories of string diagrams for significant categories of probabilistic processes. Fabio will then use these theories to develop compositional methods of analysis for different kinds of probabilistic graphical models.
Safety: Core Representation Underlying Safeguarded AI
This project looks to design a calculus that utilises the semantic structure of quasi-Borel spaces, introduced in ‘A convenient category for higher-order probability theory’. Ohad + team will develop the internal language of quasi-Borel spaces as a ‘semantic universe’ for stochastic processes, define syntax that is amenable to type-checking and versioning, and interface with other teams in the programme—either to embed other formalisms as sub-calculi in quasi-Borel spaces, or vice versa (e.g. for imprecise probability).
This project plans to overcome the limitations of traditional philosophical formalisms by integrating interdisciplinary knowledge through applied category theory. In collaboration with other TA1.1 Creators, David will explore graded modal logic, type theory, and causality, and look to develop the conceptual tools to support the broader Safeguarded AI programme.
Double Categorical Systems Theory for Safeguarded AI
This project aims to utilise Double Categorical Systems Theory (DCST) as a mathematical framework to facilitate collaboration between stakeholders, domain experts, and computer aides in co-designing an explainable and auditable model of an autonomous AI system’s deployment environment. David + team will expand this modelling framework to incorporate formal verification of the system’s safety and reliability, study the verification of model-surrogacy of hybrid discrete-continuous systems, and develop serialisable data formats for representing and operating on models, all with the goal of preparing the DCST framework for broader adoption across the Safeguarded AI Programme.
Filippo + team intend to establish a robust mathematical framework that extends beyond the metrics expressible in quantitative algebraic theories and coalgebras over metric spaces. By shifting from Cartesian to a monoidal setting, this group will examine these metrics using algebraic contexts (to enhance syntax foundations) and coalgebraic contexts (to provide robust quantitative semantics and effective techniques for establishing quantitative bounds on black-box behaviours), ultimately advancing the scope of quantitative reasoning in these domains.
Doubly Categorical Systems Logic: A Theory of Specification Languages
This project aims to develop a logical framework to classify and interoperate various logical systems created to reason about complex systems and their behaviours, guided by Doubly Categorical Systems Theory (DCST). Matteo’s goal is to study the link between the compositional and morphological structure of systems and their behaviour, specifically in the way logic pertaining to these two components works, accounting for both dynamic and temporal features. Such a path will combine categorical approaches to both logic and systems theory.
True Categorical Programming for Composable Systems
This project intends to develop a type theory for categorical programming that enables encoding of key mathematical structures not currently supported by existing languages. These structures include functors, universal properties, Kan extensions, lax (co)limits, and Grothendieck constructions. Jade + team are aiming to create a type theory that accurately translates categorical concepts into code without compromise, and then deploy this framework to develop critical theorems related to the mathematical foundations of Safeguarded AI.
Sam + team will look to employ categorical probability toward key elements essential for world modelling in the Safeguarded AI Programme. They will investigate imprecise probability (which provides bounds on probabilities of unsafe behaviour), and stochastic dynamical systems for world modelling, and then look to create a robust foundation of semantic version control to support the above elements.
Amar + team will develop a combinatorial and diagrammatic syntax, along with categorical semantics, for multimodal Petri Nets. These Nets will model dynamical systems that undergo mode or phase transitions, altering their possible places, events, and interactions. Their goal is to create a category-theoretic framework for mathematical modelling and safe-by-design specification of dynamical systems and process theories which exhibit multiple modes of operation.
Topos UK
The Topos Institute is a math research institute in Berkeley with a focus on category theory. Three young category theorists there—Sophie Libkind, David Jaz Myers and Owen Lynch—wrote a proposal called “Double categorical systems theory for safeguarded AI”, which got funded by ARIA. So now they are moving to the UK, where they will be working with Tim Hosgood, José Siquiera, Xiaoyan Li and maybe others at a second branch of the Topos Institute, called Topos UK.
Among other things, Tai-Danae Bradley is a research mathematician at SandboxAQ, a startup focused on AI and quantum technologies. She has applied category theory to natural language processing in a number of interesting papers.
VERSES
VERSES is a “cognitive computing company building next-generation intelligent software systems” with Karl Friston as chief scientist. They’ve hired the category theorists Toby St Clere Smith and Marco Perrin, who are working on compositional tools for approximate Bayesian inference.
In my last post, I looked at how 1920’s quantum physics (“Quantum Mechanics”, or QM) conceives of a particle with definite momentum and completely uncertain position. I also began the process of exploring how Quantum Field Theory (QFT) views the same object. I’m going to assume you’ve read that post, though I’ll quickly review some of its main points.
In that post, I invented a simple type of particle called a Bohron that moves around in a physical space in the shape of a one-dimensional line, the x-axis.
I discussed the wave function in QM corresponding to a Bohron of definite momentum P1, and depicted that function Ψ(x1) (where x1 is the Bohron’s position) in last post’s Fig. 3.
In QFT, on the other hand, the Bohron is a ripple in the Bohron field, which is a function B(x) that gives a real number for each point x in physical space. That function has the form shown in last post’s Fig. 4.
We then looked at the broad implications of these differences between QM and QFT. But one thing is glaringly missing: we haven’t yet discussed the wave function in QFT for a Bohron of definite momentum P1. That’s what we’ll do today.
The QFT Wave Function
Wave functions tell us the probabilities for various possibilities — specifically, for all the possible ways in which a physical system can be arranged. (That set of all possibilities is called “the space of possibilities“.)
This is a tricky enough idea even when we just have a system of a few particles; for example, if we have N particles moving on a line, then the space of possibilities is an N-dimensional space. In QFT, wave functions can be extremely complicated, because the space of possibilities for a field is infinite dimensional, even when physical space is just a one-dimensional line. Specifically, for any particular shape s(x) that we choose, the wave function for the field is Ψ[s(x)] — a complex number for every function s(x). Its absolute-value-squared is proportional to the probability that the field B(x) takes on that particular shape s(x).
Since there are an infinite number of classes of possible shapes, Ψ in QFT is a function of an infinite number of variables. Said another way, the space of possibilities has an infinite number of dimensions. Ugh! That’s both impossible to draw and impossible to visualize. What are we to do?
Simplifying the Question
By restricting our attention dramatically, we can make some progress. Instead of trying to find the wave function for all possible shapes, let’s try to understand a simplified wave function that ignores most possible shapes but gives us the probabilities for shapes that look like those in Fig. 5 (a variant of Fig. 4 of the last post). This is the simple wavy shape that corresponds to the fixed momentum P1:
where A, the amplitude for this simple wave, can be anything we like. Here’s what that shape looks like for A=1:
Figure 5: The shape A cos(P1 x) for A=1.
If we do this, the wave function for this set of possible shapes is just a function of A; it tells us the probability that A=1 vs. A=-2 vs. A=3.2 vs. A=-4.57, etc. In other words, we’re going to write a restricted wave function Ψ[A] that doesn’t give us all the information we could possibly want about the field, but does tell us the probability for the Bohron field B(x) to take on the shape A cos(P1 x).
This restriction to Ψ[A] is surprisingly useful. That’s because, in comparing the state containing one Bohron with momentum P1 to a state with no Bohrons anywhere — the “vacuum state”, as it is called — the only thing that changes in the wave function is the part of the wave function that is proportional to Ψ[A].
In other words, if we tried to keep all the other information in the wave function, involving all the other possible shapes, we’d be wasting time, because all of that stuff is going to be the same whether there’s a Bohron with momentum P1 present or not.
To properly understand and appreciate Ψ[A] in the presence of a Bohron with momentum P1, we should first explore Ψ[A] in the vacuum state. Once we know the probabilities for A in the absence of a Bohron, we’ll be able to recognize what has changed in the presence of a Bohron.
The Zero Bohron (“Vacuum”) State
In the last post, we examined what the QM wave function looks like that describes a single Bohron with definite momentum (see Fig. 3 of that post). But what is the QM wave function for the vacuum state, the state that has no Bohrons in it?
The answer: it’s a meaningless question. QM is a theory of objects that have positions in space (or other simple properties.) If there are no objects in the theory, then there’s… well… no QM, no wave function, and nothing to discuss.
[You might complain that the Bohron field itself should be thought of as an “object” — but aside from the fact that this is questionable (is air pressure an object?), the QM of a field is QFT, so taking this route would just prove my point.]
In QFT, by contrast, the “vacuum state” is perfectly meaningful and has a wave function. The full vacuum state wave function Ψ[s(x)] is too complicated for us to talk about today. But again, if we keep our focus on the special shapes that look like cos[P1 x], we can easily write the vacuum state’s wave function for that shape’s amplitude, Ψ[A].
Understanding the Vacuum State’s Wave Function
You might have thought, naively, that if a field contains no “particles”, then the field would just be zero; that is, it would have 100% probability to take the form B(x)=0, and 0% probability to have any other shape. This would mean that Ψ[A] would be non-zero only for A=0, forming a spike as shown in Fig. 6. Here, employing a visualization method I use often, I’m showing the wave function’s real part in red and its imaginary part in blue; its absolute-value squared, in black, is mostly hidden behind the red curve.
Figure 6: A naive guess for the vacuum state of the Bohron field would have B(x) = 0 and therefore A=0. But this state would have enormously high energy and would rapidly spread to large values of A.
We’ve seen a similar-looking wave function before in the context of QM. A particle with a definite position also has a wave function in the form of a spike. But as we saw, it doesn’t stay that way: thanks to Heisenberg’s uncertainty principle, the spike instantly spreads out with a speed that reflects the state’s very high energy.
The same issue would afflict the vacuum state of a QFT if its wave function looked like Fig. 6. Just as there’s an uncertainty principle in QM that relates position and motion (changes in position), there’s an uncertainty principle in QFT that relates A and changes in A (and more generally relates B(x) and changes in B(x).) A state with a definite value of position immediately spreads out with a huge amount of energy, and the same is true for a state with a definite value of A; the shape of Ψ[A] in Fig. 6 will immediately spread out dramatically.
In short, a state that momentarily has B(x) = 0, and in particular A=0, won’t remain in this form. Not only will it change rapidly, it will do so with enormous energy. That does not sound healthy for a supposed vacuum state — the state with no Bohrons in it — which ought to be stable and have low energy.
The field’s actual vacuum state therefore has a spread of values for A — and in fact it is a Gaussian wave packet centered around A=0. In QM we have encountered Gaussian wave packets that give a spread-out position; here, in QFT, we need a packet for a spread-out amplitude, shown in Fig. 7 using the representation in which we show the real part, imaginary part, and absolute-value squared of the wave function. In Fig. 7a I’ve made the A-axis horizontal; I’ve then replotted exactly the same thing in Fig. 7b with the A axis vertical, which turns out to be useful as we’ll see in just a moment.
Figure 7a: The real part (red), imaginary part (blue, and zero) and absolute-value-squared of Ψ[A] (the wave function for the amplitude of the shape in Fig. 5) for the vacuum state.
Figure 7b: Same as Fig. 7a, turned sideways for better intuition.
Another way to represent this same wave function involves plotting points at a grid of values for A, with each point drawn in gray-scale that reflects the square of the wave function |Ψ(A)|², as in Fig. 8. Note that the most probable value for A is zero, but it’s also quite likely to be somewhat away from zero.
Figure 8: The value of |Ψ(A)|² for the vacuum state, expressed in gray-scale, for a grid of choices of A. Note the most probable value of A is zero.
But now we’re going to go a step further, because what we’re really interested in is not the wave function for A but the wave function for the Bohron field. We want to know how that field B(x) is behaving in the vacuum state. To gain intuition for the vacuum state wave function in terms of the Bohron field (remembering that we’ve restricted ourselves to the shape cos[P1 x] shown in Fig. 5), we’ll generalize Fig. 8: instead of one dot for each value of A, we’ll plot the whole shape A cos[P1 x] for a grid of choices of A, using gray-scale that’s proportional to |Ψ(A)|². This is shown in Fig. 9; in a sense, it is a combination of Fig. 8 with Fig. 5.
Figure 9: For a grid of values of A, the shape A cos[P1 x] is drawn in gray-scale that reflects the magnitude of |Ψ(A)|², and thus the probability for that value of A. This picture gives us intuition for the probabilities for the shape of the field B(x) in the vacuum state. The Bohron field is generally not zero in this state, even though the possible shapes of B(x) are centered around B(x) = 0.
Remember, this is not showing the probability for the position of a particle, or even that of a “particle”. It is showing the probability in the vacuum state for the field B(x) to take on a certain shape, albeit restricted to shapes proportional to cos[P1 x]. We can see that the most likely value of A is zero, but there is a substantial spread around zero that causes the field’s value to be uncertain.
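If you’d like to play with pictures like these yourself, here’s a minimal Python sketch in the spirit of Figs. 8 and 9 (not the code used to make the actual figures). It assumes the vacuum probability density is a Gaussian in A, and the width σ and momentum P1 are arbitrary illustrative choices.

```python
# A minimal sketch (not the author's plotting code) in the spirit of Figs. 8 and 9.
# Assumptions: the vacuum probability density is Gaussian in A, |Psi(A)|^2 ~ exp(-A^2/sigma^2),
# and sigma = 1, P1 = 1 are arbitrary illustrative choices.
import numpy as np
import matplotlib.pyplot as plt

sigma, P1 = 1.0, 1.0
A_grid = np.linspace(-3, 3, 41)               # grid of possible amplitudes A
prob = np.exp(-A_grid**2 / sigma**2)          # |Psi(A)|^2 for the Gaussian vacuum
prob /= prob.max()                            # scale to [0, 1] for gray-scale

x = np.linspace(0, 4 * np.pi, 400)            # physical space (the x-axis)
fig, (ax8, ax9) = plt.subplots(1, 2, figsize=(10, 4))

# Fig. 8 analogue: one gray dot per value of A, darker = more probable
ax8.scatter(np.zeros_like(A_grid), A_grid, c=1 - prob, cmap="gray", vmin=0, vmax=1)
ax8.set_ylabel("A")
ax8.set_title("|Psi(A)|^2 as gray-scale")

# Fig. 9 analogue: the whole shape A cos[P1 x] for each A, weighted by probability
for A, p in zip(A_grid, prob):
    ax9.plot(x, A * np.cos(P1 * x), color=str(1 - p))   # "0" = black, "1" = white
ax9.set_xlabel("x")
ax9.set_ylabel("B(x)")
ax9.set_title("Shapes A cos[P1 x], gray-scale by |Psi(A)|^2")
plt.show()
```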
In the vacuum state, what’s true for a shape with momentum P1 would be true also for any and all shapes of the form cos[P x] for any possible momentum P. In principle, we could combine all of those shapes, for all of the different momenta, together in a much more complicated version of Fig. 9. However, that would make the picture completely unreadable, so I won’t try to do that — although I’ll do something intermediate, with multiple values of P, in later posts.
Oh, and I mustn’t forget to flash a warning: everything I’ve just told you and will tell you for the rest of this post is limited to a child’s version of QFT. I’m only describing what the vacuum state looks like for a “free” (i.e. non-interacting) Bohron field. This field doesn’t do anything except send individual “particles” around that never change or interact with each other. If you want to know more about truly interesting QFTs, such as the ones in the real world — well, expect some things to be recognizable from today’s post, but much of this will, yet again, have to be revisited.
The One-Bohron State
Now that we know the nature of the wave function for the vacuum state, at least when restricted to shapes proportional to cos[P1 x], how does this change in the presence of a single Bohron of momentum P1?
The answer is quite simple: the wave function Ψ(A) changes from a Gaussian centered on A=0 with some width σ, Ψ(A) ∝ exp[−A²/(2σ²)], to A times that same Gaussian, Ψ(A) ∝ A exp[−A²/(2σ²)] (up to an overall constant of no interest to us here.) Depicting this state in analogy to what we did for the vacuum state in Figs. 7b, 8 and 9, we find Figs. 10, 11 and 12.
Figure 10: As in Fig. 7, but for the one-Bohron state. Note the probability for A=0 is now zero, and the probability (black curve) peaks at non-zero positive and negative values of A.
Figure 11: As in Fig. 8, but for the one-Bohron state.
Figure 12: As in Fig. 9, but for the one-Bohron state. Note the probability for B(x)=0 is zero in the one-Bohron state with momentum P1, in contrast to the vacuum state.
Notice that the one-Bohron state is clearly distinguishable from the vacuum state; most notably the probability for A=0 is now zero, and its spread is larger, with the most likely values for A now non-zero.
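Here’s a similar minimal sketch comparing the two probability densities over A, as in Figs. 7b and 10; it assumes the Gaussian and A-times-Gaussian forms just described, with an arbitrary width σ.

```python
# A minimal sketch comparing the vacuum and one-Bohron probability densities over A
# (in the spirit of Figs. 7b and 10).  Assumed functional forms: a Gaussian and
# A times that Gaussian, the standard ground and first excited states of a single
# field mode; the width sigma is an arbitrary choice.
import numpy as np
import matplotlib.pyplot as plt

sigma = 1.0
A = np.linspace(-4, 4, 401)
dA = A[1] - A[0]

psi_vac = np.exp(-A**2 / (2 * sigma**2))        # vacuum (unnormalized)
psi_one = A * np.exp(-A**2 / (2 * sigma**2))    # one Bohron (unnormalized)

for psi, label in [(psi_vac, "vacuum"), (psi_one, "one Bohron")]:
    p = psi**2
    plt.plot(A, p / (p.sum() * dA), label=label)  # normalize to unit area
plt.xlabel("A")
plt.ylabel("|Psi(A)|^2")
plt.legend()
plt.show()
```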
There’s one more difference between these states, which I won’t attempt to prove to you at the moment. The vacuum state doesn’t show any motion; that’s not surprising, because there are no Bohrons there to do any moving. But the one-Bohron state, with its Bohron of definite momentum, will display signs of a definite speed and direction. You should imagine all the wiggles in Fig. 12 moving steadily to the right as time goes by, whereas Fig. 9 is static.
Well, that’s it. That’s what the QFT wave function for a one-Bohron state of definite momentum P1 looks like — when we ignore the additional complexity that comes from the shapes for other possible momenta P, on the grounds that their behavior is the same in this state as it is in the vacuum state.
A Summary of Today’s Steps
That’s more than enough for today, so let me emphasize some key points here. Compare and contrast:
In QM:
The Bohron with definite momentum is a particle with a position, though that position is unknown.
The wave function for the Bohron, spread out across the space of the Bohron’s possible positions x1, has a wavelength with respect to x1.
In QFT:
The Bohron “particle” (i.e. wavicle) is intrinsically spread out across physical space [the horizontal x-axis in Figs. 9 and 12] and the Bohron itself has a wavelength with respect to x.
Meanwhile the wave function, spread out across the space of possible amplitudes A (the vertical axis in Figs. 7a, 8, 10 and 11) does not contain simply packaged information about how the activity in the Bohron field is spread out across physical space x; both the vacuum state and one-Bohron states are spread out, but you can’t just read off that fact from Figs. 8 and 11.
And note that the wave function has nothing simple to say about the position of the Bohron; after all the spread-out “particle” doesn’t even have a clearly defined position!
Just to make sure this is clear, let me say this again slightly differently. While in QM, the Bohron particle with definite momentum has an unknown position, in QFT, the Bohron “particle” with definite momentum does not even have a position, because it is intrinsically spread out. The QFT wave function says nothing about our uncertainty about the Bohron’s location; that uncertainty is already captured in the fact that the real (not complex!) function B(x) is proportional to a cosine function. Indeed physical space, and its coordinate x, don’t even appear directly in Ψ(A). Instead the QFT wave function, in the restricted form we’ve considered, only tells us the probability that B(x) = A cos[P1 x] for a particular value of A — and that those probabilities are different when there is a single Bohron present (Fig. 12) compared to when there is none (Fig. 9).
I hope you can now start to see why I don’t find the word particle helpful in describing a QFT Bohron. The Bohron does have some limited particle-like qualities, most notably its indivisibility, and we’ll explore those soon. But you might already understand why I prefer wavicle.
We are far from done with QFT; this is just the beginning of our explorations. There are many follow-up questions to address, such as
Can we put our QFT Bohron into a wave packet state similar to last post’s Fig. 2? What would that look like?
Do these differences between QM and QFT have implications for how we think about experiments, such as the double-slit experiment or Bell’s particle-pair experiment?
What do QFT wave functions look like if there are two “particles” rather than just one? There are several cases, all of them interesting.
How do measurements work, and how are they different, in QM versus QFT?
What about fields more complicated than the Bohron field, such as the electron field or the electromagnetic field?
We’ll deal with these one by one over the coming days and weeks; stay tuned.
Why do I find the word particle so problematic that I keep harping on it, to the point that some may reasonably view me as obsessed with the issue? It has to do with the profound difference between the way an electron is viewed in 1920s quantum physics (“Quantum Mechanics”, or QM for short) as opposed to 1950s relativistic Quantum Field Theory (abbreviated as QFT). [The word “relativistic” means “incorporating Einstein’s special theory of relativity of 1905”.] My goal this week is to explain carefully this difference.
In QFT, an electron is at best a “particle”, with the word in quotation marks (and I personally prefer the term wavicle.)
I’ve discussed this to some degree already in my article about how the view of an electron has changed over time, but here I’m going to give you a fuller picture. To complete the story will take two or three posts, but today’s post will already convey one of the most important points.
There are two short readings that you may want to do first.
The second is a short section of a blog post that explains how QM views an isolated object in a state of definite momentum — i.e. something whose motion (both speed and direction) is precisely known, and whose position is therefore completely unknown, thanks to Heisenberg’s uncertainty principle.
I’ll review the main point of the second item, and then I’ll start explaining what an isolated object of definite momentum looks like in QFT.
Removing Everything Extraneous
First, though, let’s make things as simple as possible. Though electrons are familiar, they are more complicated than some of their cousins, thanks to their electric charge and “spin”, and the fact that they are fermions. By contrast, bosons with neither charge nor spin are much simpler. In nature, these include Higgs bosons and electrically-neutral pions, but each of these has some unnecessary baggage. For this reason I’ll frame my discussion in terms of imaginary objects even simpler than a Higgs boson. I’ll call these spinless, chargeless objects “Bohrons” in honor of Niels Bohr (and I’ll leave the many puns to my readers.)
For today we’ll just need one, lonely Bohron, not interacting with anything else, and moving along a line. Using 1920s QM in the style of Schrödinger, we’ll take the following viewpoints.
A Bohron is a particle and exists in physical space, which we’ll take to be just a line — the set of points arranged along what we’ll call the x-axis.
The Bohron has a property we call position in physical space. We’ll refer to its position as x1.
For just one Bohron, the space of possibilities is simply all of its possible positions — all possible values of x1. [See Fig. 1]
The system of one isolated Bohron has a wave function Ψ(x1), a complex number at each point in the space of possibilities. [Note it is not a function of x, the points in physical space; it is a function of x1, the possible positions of the Bohron.]
The wave function predicts the probability of finding the Bohron at any selected position x1: it is proportional to |Ψ(x1)|2, the square of the absolute value of the complex number Ψ(x1).
In a previous post, I described states of definite momentum. But I also described states whose momentum is slightly less definite — a broad Gaussian wave packet state, which is a bit more intuitive. The wave function for a Bohron in this state is shown in Fig. 2, using three different representations. You can see intuitively that the Bohron’s motion is quite steady, reflecting near-definite momentum, while the wave function’s peak is very broad, reflecting great uncertainty in the Bohron’s position.
Fig. 2a shows the real and imaginary parts of Ψ(x1) in red and blue, along with its absolute-value squared |Ψ(x1)|2 in black.
Fig. 2b shows the absolute value |Ψ(x1)| in a color that reflects the argument [i.e. the phase] of Ψ(x1).
Fig. 2c indicates |Ψ(x1)|2, using grayscale, at a grid of x1 values; the Bohron is more likely to be found at or near dark points than at or near lighter ones.
For more details and examples using these representations, see this post.
Figure 2a: The wave function for a wave packet state with near-definite momentum, showing its real (red) and imaginary (blue) parts and its absolute value squared (black.)
Figure 2b: The same wave function, with the curve showing its absolute value and colored by its argument.
Figure 2c: The same wave function, showing its absolute value squared using gray-scale values on a grid of x1 points. The Bohron is more likely to be found near dark-shaded points.
To get a Bohron of definite momentum P1, we simply take what is plotted in Fig. 2 and make the broad peak wider and wider, so that the uncertainty in the Bohron’s position becomes infinite. Then (as discussed in this post) the wave function for that state, referred to as |P1>, can be drawn as in Fig. 3:
Figure 3a: As in Fig. 2a, but now for a state |P1> of precisely known momentum to the left.
Figure 3b: As in Fig. 2b, but now for a state |P1> of precisely known momentum to the left.
Figure 3c: As in Fig. 2c, but now for a state |P1> of precisely known momentum; note the probability of finding the Bohron is equal at every point at all times.
In math, the wave function for the state at some fixed moment in time takes a simple form, such as Ψ(x1) = exp[2πi P1 x1 / h],
where i is the square root of -1. This is a special state, because the absolute-value-squared of this function is just 1 for every value of x1, and so the probability of measuring the Bohron to be at any particular x1 is the same everywhere and at all times. This is seen in Fig. 3c, and reflects the fact that in a state with exactly known momentum, the uncertainty on the Bohron’s position is infinite.
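If you want to check this numerically, here’s a tiny sketch (in arbitrary units with h = 1 and an arbitrarily chosen P1) confirming that the probability density is flat and that the wavelength read off from the phase equals h/P1.

```python
# A quick numerical check of the formula above, in arbitrary units with h = 1 and an
# arbitrary choice of P1: the probability density is flat, and the wavelength read
# off from the phase equals h/P1.
import numpy as np

h, P1 = 1.0, 2.0
x1 = np.linspace(0, 5, 1001)
psi = np.exp(2j * np.pi * P1 * x1 / h)        # the state |P1> at one moment in time

print(np.allclose(np.abs(psi)**2, 1.0))       # True: uniform probability everywhere
phase = np.unwrap(np.angle(psi))              # total phase accumulated along x1
wavelength = 2 * np.pi / np.mean(np.gradient(phase, x1))
print(wavelength, h / P1)                     # both ~0.5
```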
Let’s compare the Bohron (the particle itself) in the state |P1> to the wave function that describes it.
In the state |P1>, the Bohron’s location is completely unknown. Still, its position is a meaningful concept, in the sense that we could measure it. We can’t predict the outcome of that measurement, but the measurement will give us a definite answer, not a vague indefinite one. That’s because the Bohron is a particle; it is not spread out across physical space, even though we don’t know where it is.
By contrast, the wave function Ψ(x1) is spread out, as is clear in Fig. 3. But caution: it is not spread out across physical space, the points of the x axis. It is spread out across the space of possibilities — across the range of possible positions x1. See Fig. 1 [and read my article on the space of possibilities if this makes no sense to you.]
Thus neither the Bohron nor its wave function is spread out in physical space!
We do have waves here, and they have a wavelength; that’s the distance between one crest and the next in Fig. 3a, and the distance between one red band and the next in Fig. 3b. That wavelength is a property of the wave function, not a property of the Bohron. To have a wavelength, an object has to be wave-like, which our QM Bohron is not.
Conversely, the Bohron has a momentum (which is definite in this state, and is something we can measure). This has real effects; if the Bohron hits another particle, some or all of its momentum will be transferred, and the second particle will recoil from the blow. By contrast, the wave function does not have momentum. It cannot hit anything and make it recoil, because, like any wave function, it sits outside the physical system. It merely describes an object with momentum, and tells us the probable outcomes of measurements of that object.
Keep these details of wavelength (the wave function’s purview) and the momentum (the Bohron’s purview) in mind. This is how 1920’s QM organizes things. But in QFT, things are different.
First Step Toward a QFT State of Definite Momentum
Now let’s move to quantum field theory, and start the process of making a Bohron of definite momentum. We’ll take some initial steps today, and finish up in the next post.
Our Bohron is now a “particle”, in quotation marks. Why? Because our Bohron is no longer a dot, with a measurable (even if unknown) position. It is now a ripple in a field, which we’ll call the Bohron field. That said, there’s still something particle-like about the Bohron, because you can only have an integer number (1, 2, 3, 4, 5, …) of Bohrons, and you can never have a fractional number (1/2, 7/10, 2.46, etc.) of Bohrons. This feature is something we’ll discuss in later posts, but we’ll just accept it for now.
As fields go, the Bohron field is a very simple example. At any given moment, the field takes on a value — a real number — at each point in space. Said another way, it is a function of physical space, of the form B(x).
Very, very important: Do not confuse the Bohron field B(x) with a wave function!!
This field is a function in physical space (not the space of possibilities). B(x) is a function of the physical space points x that make up the x-axis; it is not a function of a particle’s position x1, nor is it a function of any other coordinate that might arise in the space of possibilities.
I’ve chosen the simplest type of QFT field: B(x) is a real number at each location in physical space. This is in contrast to a QM wave function, which is a complex number for each possibility in the space of possibilities.
The field itself can carry energy and momentum and transport it from place to place. This is unlike a wave function, which can only describe the energy and momentum that may be carried by physical objects.
Now here’s the key distinction. Whereas the Bohron of QM has a position, the Bohron of QFT does not generally have a position. Instead, it has a shape.
If our Bohron is to have a definite momentum P1, the field must ripple in a simple way, taking on a shape proportional to a sine or cosine function from pre-university math. An example would be:
B(x) = A cos[P1 x]
where A is a real number, called the “amplitude” of the wave, and x is a location in physical space.
At some point soon we’ll consider all possible values of A — a part of the space of possibilities for the field B(x) — so remember that A can vary. To remind you, I’ve plotted this shape for A=1 in Fig. 4a and again for A=-3/2 in Fig 4b.
Figure 4a: The function A cos[P1 x], for the momentum P1 set equal to 1 and the amplitude A set equal to 1.
Figure 4b: Same as Fig. 4a, but with A = -3/2.
Initial Comparison of QM and QFT
At first, the plots in Fig. 4 of the QFT Bohron’s shape look very similar to the QM wave function of the Bohron particle, especially as drawn in Fig. 3a. The math formulas for the two look similar, too; compare the formula after Fig. 3 to the one above Fig. 4.
However, appearances are deceiving. In fact, when we look carefully, EVERYTHING IS COMPLETELY DIFFERENT.
Our QM Bohron with definite momentum has a wave function Ψ(x1), while in QFT it has a shape B(x); they are functions of variables which, though related, are different.
On top of that, there’s a wave function in QFT too, which we haven’t drawn yet. When we do, we’ll see that the QFT Bohron’s wave function looks nothing like the QM Bohron’s wave function. That’s because
the space of possibilities for the QM wave function is the space of possible positions that the Bohron particle can have, but
the space of possibilities for the QFT wave function is the space of all possible shapes that the Bohron field can have.
The plot in Fig. 4 shows a curve that is both positive and negative but is drawn colorless, in contrast to Fig. 3b, where the curve is positive but colored. That’s because
the Bohron field B(x) is a real number with no argument [phase], whereas
the QM wave function Ψ(x1) for the state of definite momentum has an always-positive absolute value and a rapidly varying argument [phase].
The axes in Fig. 4 are labeled differently from the axis in Fig. 3. That’s because (see Fig. 1)
the QFT Bohron field B(x) is found in physical space, while
the QM wave function Ψ(x1) for the Bohron particle is found in the particle’s space of possibilities.
The absolute-value-squared of a wave function |Ψ(x1)|2 is interpreted as a probability (specifically, the probability for the particular possibility that the particle is at position x1). There is no such interpretation for the square of the Bohron field |B(x)|2. We will later find a probability interpretation for the QFT wave function, but we are not there yet.
Both Fig. 4 and Figs. 3a, 3b show curves with a wavelength, albeit along different axes. But they are very different in every sense:
In QM, the Bohron has no wavelength; only its wave function has a wavelength — and that involves lengths not in physical space but in the space of possibilities.
In QFT,
the field ripple corresponding to the QFT Bohron with definite momentum has a physical wavelength;
meanwhile the QFT Bohron’s wave function does not have anything resembling a wavelength! The field’s space of possibilities, where the wave function lives, doesn’t even have a recognizable notion of lengths in general, much less wavelengths in particular.
I’ll explain that last statement next time, when we look at the nature of the QFT wave function that corresponds to having a single QFT Bohron.
A Profound Change of Perspective
But before we conclude for the day, let’s take a moment to contemplate the remarkable change of perspective that is coming into our view, as we migrate our thinking from QM of the 1920s to modern QFT. In both cases, our Bohron of definite momentum is certainly associated with a definite wavelength; we can see that both in Fig. 3 and in Fig. 4. The formula for the relation is well-known to scientists; the wavelength λ for a Bohron of momentum P1 is simply
λ = h / P1
where h is Planck’s famous constant, the mascot of quantum physics. Larger momentum means smaller wavelength, and vice versa. On this, QM and QFT agree.
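For a feel for the numbers, here’s a tiny worked example of this formula. Since the Bohron is imaginary, an electron accelerated through 100 volts stands in, with standard values for the constants; the 100 volts is chosen purely for illustration.

```python
# A worked example of lambda = h/P1.  The Bohron is imaginary, so an electron
# accelerated through 100 volts stands in; the constants are standard values and
# the 100 V is just an illustrative choice.
import math

h = 6.626e-34      # Planck's constant, in joule-seconds
m_e = 9.109e-31    # electron mass, in kilograms
e = 1.602e-19      # elementary charge, in coulombs

P = math.sqrt(2 * m_e * e * 100)   # momentum after acceleration through 100 V
print(h / P)                       # ~1.2e-10 m, i.e. about 0.12 nanometers
```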
But compare:
in QM, this wavelength sits in the wave function, and has nothing to do with waves in physical space;
in QFT, the wavelength is not found in the field’s wave function; instead it is found in the field itself, and specifically in its ripples, which are waves in physical space.
I’ve summarized this in Table 1.
Table 1: The Bohron with definite momentum has an associated wavelength. In QM, this wavelength appears in the wave function. In QFT it does not; both the wavelength and the momentum are found in the field itself. This has caused no end of confusion.
Let me say that another way. In QM, our Bohron is a particle; it has a position, cannot spread out in physical space, and has no wavelength. In QFT, our Bohron is a “particle”, a wavy object that can spread out in physical space, and can indeed have a wavelength. (This is why I’d rather call it a wavicle.)
[Aside for experts: if anyone thinks I’m spouting nonsense, I encourage the skeptic to simply work out the wave function for phonons (or their counterparts with rest mass) in a QM system of coupled balls and springs, and watch as free QFT and its wave function emerge. Every statement made here is backed up with a long but standard calculation, which I’m happy to show you and discuss.]
I think this little table is deeply revealing both about quantum physics and about its history. It goes a long way toward explaining one of the many reasons why the brilliant founding parents of quantum physics were so utterly confused for a couple of decades. [I’m going to go out on a limb here, because I’m certainly not a historian of physics; if I have parts of the history wrong, please set me straight.]
Based on experiments on photons and electrons and on the theoretical insight of Louis de Broglie, it was intuitively clear to the great physicists of the 1920s that electrons and photons, which they were calling particles, do have a wavelength related to their momentum. And yet, in the late 1920s, when they were just inventing the math of QM and didn’t understand QFT yet, the wavelength was always sitting in the wave function. So that made it seem as though maybe the wave function was the particle, or somehow was an aspect of the particle, or that in any case the wave function must carry momentum and be a real physical thing, or… well, clearly it was very confusing. It still confuses many students and science writers today, and perhaps even some professional scientists and philosophers.
In this context, is it surprising that Bohr was led in the late 1920s to suggest that electrons are both particles and waves, depending on experimental context? And is it any wonder that many physicists today, with the benefit of both hindsight and a deep understanding of QFT, don’t share this perspective?
In addition, physicists already knew, from 19th century research, that electromagnetic waves — ripples in the electromagnetic field, which include radio waves and visible light — have both wavelength and momentum. Learning that wave functions for QM have wavelength and describe particles with momentum, as in Fig. 3, some physicists naturally assumed that fields and wave functions are closely related. This led to the suggestion that to build the math of QFT, you must go through the following steps:
first you take particles and describe them with a wave function, and then
second, you make this wave function into a field, and describe it using an even bigger wave function.
(This is where the archaic terms “first quantization” and “second quantization” come from.) But this idea was misguided, arising from early conceptual confusions about wave functions. The error becomes more understandable when you imagine what it must have been like to try to make sense of Table 1 for the very first time.
In the next post, we’ll move on to something novel: images depicting the QFT wave function for a single Bohron. I haven’t seen these images anywhere else, so I suspect they’ll be new to most readers.
In 2023 the director of the Fields Institute asked me to lead a program on climate change. For quite some time I’ve been meaning to say what happened to that.
Briefly, I quit when I realized that I couldn’t get myself motivated to do the job, no matter how hard I tried.
It turned out the job was mainly to apply for grants. There were various options for what these grants could be. At the small end, a grant could be for a meeting or series of meetings at the Fields Institute, focused on a specific topic like agent-based models applied to climate change. At the large end, it could be to set up a ‘network’ of research teams in Canada, who would work together and occasionally meet. Regardless of the details, I was supposed to get grants that would help fund the Fields Institute and bring it into the realm of doing something about climate change.
Unfortunately I’m not really good at applying for grants, I don’t like doing it, and throughout my life I’ve generally managed to avoid it. I also avoid ‘networking’ except for contacting individuals here and there when I have a specific question. So I don’t have a lot of connections among people working to fight climate change, and I don’t have the patience to build up these connections just to get people to help me apply for grants for the Fields Institute.
The irony of people flying to meetings on climate change is not lost on me, either.
Mind you, the project is probably a worthwhile endeavor. But I’m not the right one to lead it. So after some dithering, I quit.
This is the latest chapter in the more general failure of the Azimuth Project. I started this project here in 2010. I had hoped that the urgency of the climate crisis was so great that if I lit a match the fire would spread: i.e., people would jump on board, figure out what scientists could do, and start doing it. That actually did happen to some extent—just enough to make me think it was working. But there are lots of projects to do something about climate change, and the ones that really get anywhere are much better organized than the Azimuth Project.
The big problem, I now realize, is that my talents don’t include leading an organization. I’m not good at working with large groups of people, giving them a clear goal, raising money to pay them, delegating authority, and so on. I did okay as the thesis advisor of up to six grad students, but that’s about my limit.
What succeeded much more than the Azimuth Project per se was the idea of network theory, which I espoused in this manifesto in 2011. I put a lot of work into network theory, and I was lucky to have several grad students who jumped in and pushed it a lot further. You can see some of the results here:
though this leaves out a lot of exciting new work.
My original dream for network theory is that it would let us understand the biosphere better and learn to work with it instead of against it. Here’s what I actually said:
I wish there were a branch of mathematics—in my dreams I call it green mathematics—that would interact with biology and ecology just as fruitfully as traditional mathematics interacts with physics. If the 20th century was the century of physics, while the 21st is the century of biology, shouldn’t mathematics change too? As we struggle to understand and improve humanity’s interaction with the biosphere, shouldn’t mathematicians have some role to play?
I’m not sure we are much closer to that, but the mathematics we’ve developed has had concrete applications, like better tools for epidemiological modeling. I still want to apply them to climate change, but clearly I’m moving too slowly for this to be a practical way of addressing the climate crisis.
A broader project including network theory is ‘applied category theory’. This has been taken up whole-heartedly by my former student Brendan Fong, who is good at organizing people and running an institution. So, if you want to see a project that’s actually succeeding, don’t look here: look at the Topos Institute. This is approximately what the Azimuth Project was trying to be—but run with a competence beyond my wildest dreams.
Besides my own personal lack of organizational skill, the other big problem is that Azimuth Project assumed a vaguely technocratic approach to governance. It’s not that I thought nerds in white lab coats were running the show! But I did think that the climate crisis would eventually be recognized as such—a crisis—and governments would deploy scientists to deal with it. I did not expect that by 2025 the US would be ruled by a corrupt demagogue who would ban the use of the term ‘climate change’ in government documents. Nor did I expect that authoritarian politicians and billionaires would form a world-wide alliance to disrupt democracy and replace it with a system of national oligarchies.
But that’s what is happening now. So I now feel that ‘solving the climate crisis’, already such a herculean task that I put it in wryly sarcastic quotes, can’t be done without doing something even bigger: fighting for freedom and justice, while rethinking and reforming politics and the economy in a way that takes the biosphere and the patterns of human behavior into account.
For a couple of years I’ve been calling this bigger project the ‘New Enlightenment’. It would aim at a reboot of civilization, based on sounder principles, which picks up the job where the so-called Age of Enlightenment left off. (I don’t especially like the word ‘enlightenment’, but I haven’t thought of a better one.)
Of course the scope, viability and very meaning of such a project will be argued ad nauseam as it unfolds. But what else can we do, actually? The ‘same old same old’ just isn’t working.
The decrease is almost entirely due to gains in lighting efficiency in households, and particularly the transition from incandescent (and compact fluorescent) light bulbs to LED light bulbs:
Annual energy savings from this switch to consumers in the US were already estimated to be $14.7 billion in 2020 – or several hundred dollars per household – and are projected to increase, even in the current inflationary era, with the cumulative savings across the US estimated to reach $890 billion by 2035.
What I also did not realize before this meeting is the role that recent advances in pure mathematics – and specifically, the development of the “landscape function” that was a primary focus of this collaboration – played in accelerating this transition. This is not to say that this piece of mathematics was solely responsible for these developments; but, as I hope to explain here, it was certainly part of the research and development ecosystem in both academia and industry, spanning multiple STEM disciplines and supported by both private and public funding. This application of the landscape function was already reported upon by Quanta magazine at the very start of this collaboration back in 2017; but it is only in the last few years that the mathematical theory has been incorporated into the latest LED designs and led to actual savings at the consumer end.
LED lights are made from layers of semiconductor material (e.g., gallium nitride or indium gallium nitride) arranged in a particular fashion. When enough of a voltage difference is applied to this material, electrons are injected into the “n-type” side of the LED, while (electron) holes are injected into the “p-type” side, creating a current. In the active layer of the LED, these electrons and holes recombine in the quantum wells of the layer, generating radiation (light) via the mechanism of electroluminescence. The brightness of the LED is determined by the current, while the power consumption is the product of the current and the voltage. Thus, to improve energy efficiency, one seeks to design LEDs that require as little voltage as possible to generate a target amount of current.
As it turns out, the efficiency of an LED, as well as the spectral frequencies of the light it generates, depends in many subtle ways on the precise chemical composition of the semiconductors, the thickness of the layers, the geometry of how the layers are placed atop one another, the temperature of the materials, and the amount of disorder (impurities) introduced into each layer. In particular, in order to create quantum wells that can efficiently trap the electrons and holes together to recombine to create light of a desired frequency, it is useful to introduce a certain amount of disorder into the layers in order to take advantage of the phenomenon of Anderson localization. However, one cannot add too much disorder, lest the electron states become fully bound and the material behave too much like an insulator to generate appreciable current.
One can of course make empirical experiments to measure the performance of various proposed LED designs by fabricating them and then testing them in a laboratory. But this is an expensive and painstaking process that does not scale well; one cannot test thousands of candidate designs this way to isolate the best performing ones. So, it becomes desirable to perform numerical simulations of these designs instead, which – if they are sufficiently accurate and computationally efficient – can lead to a much shorter and cheaper design cycle. (In the near future one may also hope to accelerate the design cycle further by incorporating machine learning and AI methods; but these techniques, while promising, are still not fully developed at the present time.)
So, how can one perform numerical simulation of an LED? By the semiclassical approximation, the wave function of an individual electron should solve the time-independent Schrödinger equation
$-\frac{\hbar^2}{2m} \Delta \psi_E + E_c \psi_E = E \psi_E,$
where $\psi_E$ is the wave function of the electron at this energy level $E$, and $E_c$ is the conduction band energy. The behavior of hole wavefunctions follows a similar equation, governed by the valence band energy $E_v$ instead of $E_c$. However, there is a complication: these band energies are not solely coming from the semiconductor, but also contain a contribution that comes from electrostatic effects of the electrons and holes, obtained more specifically by solving the Poisson equation
$-\nabla \cdot (\varepsilon \varepsilon_0 \nabla \phi) = q\,(p - n + N_D^+ - N_A^-),$
where $\varepsilon$ is the dielectric constant of the semiconductor, $n$, $p$ are the carrier densities of electrons and holes respectively, $N_A^-$, $N_D^+$ are further densities of ionized acceptor and donor atoms, and $q$, $\varepsilon_0$ are physical constants. This equation looks somewhat complicated, but is mostly determined by the carrier densities $n$, $p$, which in turn ultimately arise from the probability densities $|\psi_E|^2$ associated to the eigenfunctions via the Born rule, combined with the Fermi–Dirac distribution from statistical mechanics; for instance, the electron carrier density is given by the formula
$n(x) = \sum_E |\psi_E(x)|^2 \, \frac{1}{1 + e^{(E - E_F)/k_B T}},$
with a similar formula for $p$ (here $E_F$ is the Fermi level and $k_B T$ is the thermal energy). In particular, the net potential depends on the wave functions $\psi_E$, turning the Schrödinger equation into a nonlinear self-consistent Hartree-type equation. From the wave functions one can also compute the current, determine the amount of recombination between electrons and holes, and therefore also calculate the light intensity and absorption rates. But the main difficulty is to solve for the wave functions $\psi_E$ for the different energy levels of the electron (as well as the counterpart for holes).
One could attempt to solve this nonlinear system iteratively, by first proposing an initial candidate for the wave functions $\psi_E$, using this to obtain a first approximation for the conduction band energy $E_c$ and valence band energy $E_v$, and then solving the Schrödinger equations to obtain a new approximation for the $\psi_E$, and repeating this process until it converges. However, the regularity of the potentials plays an important role in being able to solve the Schrödinger equation. (The Poisson equation, being elliptic, is relatively easy to solve to high accuracy by standard methods, such as finite element methods.) If the potential is quite smooth and slowly varying, then one expects the wave functions to be quite delocalized, and for traditional approximations such as the WKB approximation to be accurate.
However, in the presence of disorder, such approximations are no longer valid. As a consequence, traditional methods for numerically solving these equations had proven to be too inaccurate to be of practical use in simulating the performance of an LED design, so until recently one had to rely primarily on slower and more expensive empirical testing methods. One real-world consequence of this was the “green gap”: while reasonably efficient LED designs were available in the blue and red portions of the spectrum, there was not a suitable design that gave efficient output in the green spectrum. Given that many applications of LED lighting required white light that was balanced across all visible colors of the spectrum, this was a significant impediment to realizing the energy-saving potential of LEDs.
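To make the structure of that self-consistent loop concrete, here is a deliberately crude 1-D toy sketch in Python, with all physical constants set to 1, a made-up disordered potential, and a cartoon electrostatic feedback standing in for the true Poisson equation; it illustrates the iteration and the carrier-density formula described above, not any production LED simulator.

```python
# A deliberately crude 1-D toy (all physical constants set to 1, made-up disordered
# potential, cartoon electrostatic feedback) showing the structure of the
# self-consistent iteration described above; not a realistic LED simulator.
import numpy as np
from scipy.linalg import eigh_tridiagonal

rng = np.random.default_rng(0)
N, dx = 400, 0.1
V0 = rng.uniform(0.0, 2.0, N)                 # disordered "structural" band energy
E_F, kT, coupling, mix = 0.5, 0.05, 0.2, 0.3  # toy parameters

# Dense 1-D Laplacian (Dirichlet ends), reused for the toy Poisson solve
lap = (np.diag(np.full(N, 2.0)) - np.diag(np.ones(N - 1), 1)
       - np.diag(np.ones(N - 1), -1)) / dx**2

def solve_modes(V):
    """Lowest 30 eigenpairs of -d^2/dx^2 + V on the grid."""
    E, psi = eigh_tridiagonal(2.0 / dx**2 + V, -np.ones(N - 1) / dx**2,
                              select="i", select_range=(0, 29))
    psi /= np.sqrt(np.sum(psi**2, axis=0) * dx)   # normalize each column
    return E, psi

def carrier_density(E, psi):
    """n(x) = sum over levels of |psi_E(x)|^2 times the Fermi-Dirac occupation."""
    occ = 1.0 / (1.0 + np.exp((E - E_F) / kT))
    return (psi**2) @ occ

V = V0.copy()
for it in range(50):
    E, psi = solve_modes(V)
    n = carrier_density(E, psi)
    phi = np.linalg.solve(lap, n)          # toy Poisson step: -phi'' = n
    V_new = V0 + coupling * phi            # electrostatic shift of the band energy
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = (1 - mix) * V + mix * V_new        # damped update for stability
print("stopped after", it + 1, "iterations")
```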
Here is where the landscape function comes in. This function started as a purely mathematical discovery: when solving a Schrödinger equation such as
$-\Delta \psi + V \psi = E \psi$
(where we have now suppressed all physical constants for simplicity), it turns out that the behavior of the eigenfunctions $\psi$ at the various energy levels $E$ is controlled to a remarkable extent by the landscape function $u$, defined to be the solution to the equation
$-\Delta u + V u = 1.$
As discussed in this previous blog post (discussing a paper on this topic I wrote with some of the members of this collaboration), one reason for this is that the Schrödinger equation can be transformed after some routine calculations to
$-\frac{1}{u^2} \nabla \cdot \Big( u^2 \nabla \frac{\psi}{u} \Big) + \frac{1}{u}\,\frac{\psi}{u} = E\,\frac{\psi}{u},$
thus making $1/u$ an effective potential for the Schrödinger equation (with $u^2$ also supplying the coefficients of an effective geometry for the equation). In practice, when $V$ is a disordered potential, the effective potential $1/u$ tends to behave like a somewhat “smoothed out” or “homogenized” version of $V$ that exhibits superior numerical performance. For instance, the classical Weyl law predicts (assuming a smooth confining potential $V$) that the density of states up to energy $E$ – that is to say, the number of bound states of energy at most $E$ – should asymptotically behave like $c_d \int (E - V(x))_+^{d/2}\, dx$. This is accurate at very high energies $E$, but when $V$ is disordered, it tends to break down at low and medium energies. However, the landscape function makes a prediction for this density of states (roughly speaking, with the effective potential $1/u$ replacing $V$) that is significantly more accurate in practice in these regimes, with a mathematical justification (up to multiplicative constants) of this accuracy obtained in this paper of David, Filoche, and Mayboroda. More refined predictions (again with some degree of theoretical support from mathematical analysis) can be made on the local integrated density of states, and with more work one can then also obtain approximations for the carrier density functions $n$, $p$ mentioned previously in terms of the energy band level functions $E_c$, $E_v$. As the landscape function is relatively easy to compute (coming from solving a single elliptic equation), this gives a very practical numerical way to carry out the iterative procedure described previously to model LEDs in a way that has proven to be both numerically accurate, and significantly faster than empirical testing, leading to a significantly more rapid design cycle.
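For readers who want to experiment, here is a small 1-D illustration of the landscape function (again with made-up numbers and all constants set to 1): it solves −u″ + Vu = 1 for a disordered V and plots the effective potential 1/u alongside the lowest eigenfunctions, which localize in its wells.

```python
# A small 1-D illustration of the landscape function (made-up disordered potential,
# all constants set to 1): u solves -u'' + V u = 1, and 1/u acts as a smoothed-out
# effective potential whose wells locate the low-energy eigenfunctions.
import numpy as np
from scipy.linalg import eigh_tridiagonal
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
N, dx = 600, 0.05
V = rng.uniform(0.0, 4.0, N)                  # disordered potential

diag = 2.0 / dx**2 + V                        # discretized H = -d^2/dx^2 + V
off = -np.ones(N - 1) / dx**2                 # (Dirichlet boundary conditions)

# Landscape function: solve H u = 1
H = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
u = np.linalg.solve(H, np.ones(N))

# Lowest few eigenfunctions, for comparison with the wells of 1/u
E, psi = eigh_tridiagonal(diag, off, select="i", select_range=(0, 4))

x = np.arange(N) * dx
plt.plot(x, V, alpha=0.3, label="disordered V")
plt.plot(x, 1.0 / u, label="effective potential 1/u")
for k in range(3):
    plt.plot(x, E[k] + 3 * psi[:, k], label=f"eigenfunction {k} (shifted by E)")
plt.xlabel("x")
plt.legend()
plt.show()
```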
In particular, recent advances in LED technology have largely closed the “green gap” by introducing designs that incorporate “V-defects”: V-shaped dents in the semiconductor layers of the LED that create lateral carrier injection pathways and modify the internal electric field, enhancing hole transport into the active layer. The ability to accurately simulate the effects of these defects has allowed researchers to largely close this gap:
My understanding is that the major companies involved in developing LED lighting are now incorporating landscape-based methods into their own proprietary simulation models to achieve similar effects in commercially produced LEDs, which should lead to further energy savings in the near future.
Thanks to Svitlana Mayboroda and Marcel Filoche for detailed discussions, comments, and corrections of the material here.
An early physics demonstration that many of us see in elementary school is that of static electricity: an electrical insulator like a wool cloth or animal fur is rubbed on a glass or plastic rod, and suddenly the rod can pick up pieces of styrofoam or little bits of paper. Alternately, a rubber balloon is rubbed against a kid's hair, and afterward the balloon is able to stick to a wall with sufficient force that static friction keeps the balloon from sliding down the surface. The physics here is that when materials are rubbed together, there can be a net transfer of electrical charge from one to the other, a phenomenon called triboelectricity. The electrostatic attraction between net charge on the balloon and the polarizable surface of the wall is enough to hold up the balloon.
Balloons electrostatically clinging to a wall, from here.
The big mysteries are, how and why do charges transfer between materials when they are rubbed together? As I wrote about once before, this is still not understood, despite more than 2500 years of observations. The electrostatic potentials that can be built up through triboelectricity are not small. They can be tens of kV, enough to cause electrons accelerating across those potentials to emit x-rays when they smack into the positively charged surface. Whatever is going on, it's a way to effectively concentrate the energy from mechanical work into displacing charges. This is how Wimshurst machines and Van de Graaff generators work, even though we don't understand the microscopic physics of the charge generation and separation.
There are disagreements to this day about the mechanisms at work in triboelectricity, including the role of adsorbates, surface chemistry, whether the charges transferred are electrons or ions, etc. From how electronic charge transfer works between metals, or between metals and semiconductors, it's not crazy to imagine that somehow this should all come down to work functions or the equivalent. Depending on the composition and structure of materials, the electrons in there can be bound more tightly (energetically deeper compared to the energy of an electron far away, also called "the vacuum" level) or more loosely (energetically shallower, closer to the energy of a free electron). It's credible that bringing two such materials in contact could lead to electrons "falling down hill" from the more loosely-binding material into the more tightly binding one. That clearly is not the whole story, though, or this would've been figured out long ago.
This week, a new paper revealed an interesting wrinkle. The net preference for picking up or losing charge seems to depend very clearly on the history of repeated contacts. The authors used PDMS silicone rubber, and they find that repeated contacting can deterministically bake in a tendency for charge to flow one direction. Using various surface spectroscopy methods, they find no obvious differences at the PDMS surface before/after the contacting procedures, but charge transfer is affected.
My sneaking suspicion is that adsorbates will turn out to play a huge role in all of this. This may be one of those issues like friction (see here too), where there is a general emergent phenomenon (net charge transfer) that can take place via multiple different underlying pathways. Experiments in ultrahigh vacuum with ultraclean surfaces will undoubtedly show quantitatively different results than experiments in ambient conditions, but they may both show triboelectricity.
I had this neat calculation in my drawer, and on the occasion of quantum mechanics’ 100th birthday in 2025, I decided to submit a talk about it to the March meeting of the DPG, the German Physical Society, in Göttingen. And to have something to show, I put it out on the arXiv today. The idea is as follows:
The GHZ experiment is a beautiful version of Bell's inequality that demonstrates that you reach wrong conclusions when you assume that a property of a quantum system must have some (unknown) value even when you don't measure it. I would say it shows that quantum theory is not realistic, in the sense that unmeasured properties do not have secret values (unlike, for example, classical statistical mechanics, where you could imagine actually measuring the exact position of molecule number 2342 in your container of gas). For details, see the paper or this beautiful explanation by Coleman. I should mention here that there is another way out: assuming some non-local forces that conspire to make the result come out right nevertheless.
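For readers who want to see the algebra verified, here is the textbook spin version of the GHZ argument checked with numpy; the post recasts the same structure in terms of position measurements at different times, but the contradiction is easiest to display with Pauli matrices.

```python
# The textbook spin version of the GHZ argument, checked with numpy.  On the GHZ
# state, X*X*X gives +1 with certainty, while X*Y*Y, Y*X*Y and Y*Y*X each give -1;
# no assignment of pre-existing plus/minus-1 values to X and Y on each particle can
# reproduce all four, since multiplying the last three would force X*X*X = -1.
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

ghz = np.zeros(8, dtype=complex)
ghz[0] = ghz[7] = 1 / np.sqrt(2)              # (|000> + |111>)/sqrt(2)

for ops, name in [((X, X, X), "XXX"), ((X, Y, Y), "XYY"),
                  ((Y, X, Y), "YXY"), ((Y, Y, X), "YYX")]:
    val = np.real(ghz.conj() @ kron3(*ops) @ ghz)
    print(name, round(val))                   # XXX -> 1, the others -> -1
```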
On the other hand, there is Bohmian mechanics. This is well known to be a non-local theory (as the time evolution of its particles depends on the positions of all other particles in the system, or even the universe), but what I find more interesting is that it is also realistic: there, it is claimed that all that matters are particle positions (including the positions of pointers on your measurement devices, which you might interpret as showing something other than positions, for example velocities or field strengths or whatever), and those positions all have (possibly unknown) values at all times, even if you don't measure them.
So how can the two be brought together? There might be an obstacle in the fact that GHZ is usually presented as a correlation of spins, and in the Bohmian literature spins are not really positions; you always have to make use of some Stern-Gerlach experiments to translate them into actual positions. But we can circumvent this the other way around: we don't really need spins, we just need observables with the commutation relations of the Pauli matrices. You might think that those cannot be realised with position measurements, since position measurements always commute, but this is only true if you do the position measurements at equal times. If you wait between them, you can in fact obtain almost Pauli-type operators.
So we can set up a GHZ experiment in terms of three particles in three boxes: for each particle you measure whether it is in the left or the right half of its box, but for each particle you decide whether to do this at time 0 or at a later moment. You can look at the correlation of the three measurements as a function of time (of course, since you measure different particles, the actual measurements you do still commute, independent of time), and what you find is the blue line in
GHZ correlations vs. Bohmian correlations
You can also (numerically) solve the Bohmian equation of motion and compute the expectation of the correlation of the positions of the three particles at different times, which gives the orange line: clearly something else. No surprise: the realistic theory cannot predict the outcome of an experiment that demonstrates that quantum theory is not realistic. And the non-local character of the evolution equation does not help either.
To save the Bohmian theory, one can in fact argue that I have computed the wrong thing: after measuring the position of one particle at time 0, by letting it interact with a measuring device, the future time evolution of all particles is affected, and one should compute the correlation with the corrected (effectively collapsed) wave function. That, however, I cannot do, and I claim it is impossible, since it would depend on the details of how the first particle's position is actually measured (whereas the orthodox prediction above is independent of those details, as those interactions commute with the later observations). In any case, my interpretation is that if you don't want to predict the correlation wrongly, the best you can do is to say that you cannot do the calculation, as it depends on unknown details (even though the result of course shouldn't).
In any case, the standard argument for why Bohmian mechanics is indistinguishable from more conventional treatments is that all that matters are position correlations, and since those are given by psi-squared, they are the same in all approaches. But I show that this is not the case for these multi-time correlations.
Post script: What happens when you try to discuss physics with a philosopher:
Q1. Did you see Microsoft’s announcement? A. Yes, thanks, you can stop emailing to ask! Microsoft’s Chetan Nayak was even kind enough to give me a personal briefing a few weeks ago. Yesterday I did a brief interview on this for the BBC’s World Business Report, and I also commented for MIT Technology Review.
Q2. What is a topological qubit? A. It’s a special kind of qubit built using nonabelian anyons, which are excitations that can exist in a two-dimensional medium, behaving neither as fermions nor as bosons. The idea grew out of seminal work by Alexei Kitaev, Michael Freedman, and others starting in the late 1990s. Topological qubits have proved harder to create and control than ordinary qubits.
Q3. Then why do people care about topological qubits? A. The dream is that they could eventually be more resilient to decoherence than regular qubits, since an error, in order to matter, needs to change the topology of how the nonabelian anyons are braided around each other. So you’d have some robustness built in to the physics of your system, rather than having to engineer it laboriously at the software level (via quantum fault-tolerance).
Q4. Did Microsoft create the first topological qubit? A. Well, they say they did! [Update: Commenters point out to me that buried in Nature‘s review materials is the following striking passage: “The editorial team wishes to point out that the results in this manuscript do not represent evidence for the presence of Majorana zero modes in the reported devices. The work is published for introducing a device architecture that might enable fusion experiments using future Majorana zero modes.” So, the situation is that Microsoft is unambiguously claiming to have created a topological qubit, and they just published a relevant paper in Nature, but their claim to have created a topological qubit has not yet been accepted by peer review.]
Q5. Didn’t Microsoft claim the experimental creation of Majorana zero modes—a building block of topological qubits—back in 2018, and didn’t they then need to retract their claim? A. Yep. Certainly that history is making some experts cautious about the new claim. When I asked Chetan Nayak how confident I should be, his response was basically “look, we now have a topological qubit that’s behaving fully as a qubit; how much more do people want?”
Q6. Is this a big deal? A. If the claim stands, I’d say it would be a scientific milestone for the field of topological quantum computing and physics beyond. The number of topological qubits manipulated in a single experiment would then have finally increased from 0 to 1, and depending on how you define things, arguably a “new state of matter” would even have been created, one that doesn’t appear in nature (but only in Nature).
Q7. Is this useful? A. Not yet! If anyone claims that a single qubit, or even 30 qubits, are already useful for speeding up computation, you can ignore anything else that person says. (Certainly Microsoft makes no such claim.) On the question of what we believe quantum computers will or won’t eventually be useful for, see like half the archives of this blog over the past twenty years.
Q8. Does this announcement vindicate topological qubits as the way forward for quantum computing? A. Think of it this way. If Microsoft’s claim stands, then topological qubits have finally reached some sort of parity with where more traditional qubits were 20-30 years ago. I.e., the non-topological approaches like superconducting, trapped-ion, and neutral-atom have an absolutely massive head start: there, Google, IBM, Quantinuum, QuEra, and other companies now routinely do experiments with dozens or even hundreds of entangled qubits, and thousands of two-qubit gates. Topological qubits can win if, and only if, they turn out to be so much more reliable that they leapfrog the earlier approaches—sort of like the transistor did to the vacuum tube and electromechanical relay. Whether that will happen is still an open question, to put it extremely mildly.
Q9. Are there other major commercial efforts to build topological qubits? A. No, it’s pretty much just Microsoft [update: apparently Nokia Bell Labs also has a smaller, quieter effort, and Delft University in the Netherlands also continues work in the area, having ended an earlier collaboration with Microsoft]. Purely as a scientist who likes to see things tried, I’m grateful that at least one player stuck with the topological approach even when it ended up being a long, painful slog.
Q10. Is Microsoft now on track to scale to a million topological qubits in the next few years? A. In the world of corporate PR and pop-science headlines, sure, why not? As Bender from Futurama says, “I can guarantee anything you want!” In the world of reality, a “few years” certainly feels overly aggressive to me, but good luck to Microsoft and good luck to its competitors! I foresee exciting times ahead, provided we still have a functioning civilization in which to enjoy them.
Update (Feb 20): Chetan Nayak himself comments here, to respond to criticisms about Microsoft’s Nature paper lacking direct evidence for Majorana zero modes or topological qubits. He says that the paper, though published this week, was submitted a year ago, before the evidence existed. Of course we all look forward to the followup paper.
I had an article last week in Quanta Magazine. It’s a piece about something called the Bethe ansatz, a method in mathematical physics that was discovered by Hans Bethe in the 1930’s, but which only really started being understood and appreciated around the 1960’s. Since then it’s become a key tool, used in theoretical investigations in areas from condensed matter to quantum gravity. In this post, I thought I’d say a bit about the story behind the piece and give some bonus material that didn’t fit.
When I first decided to do the piece I reached out to Jules Lamers. We were briefly office-mates when I worked in France, where he was giving a short course on the Bethe ansatz and the methods that sprung from it. It turned out he had also been thinking about writing a piece on the subject, and we considered co-writing for a bit, but that didn’t work for Quanta. He helped me a huge amount with understanding the history of the subject and tracking down the right sources. If you’re a physicist who wants to learn about these things, I recommend his lecture notes. And if you’re a non-physicist who wants to know more, I hope he gets a chance to write a longer popular-audience piece on the topic!
If you clicked through to Jules’s lecture notes, you’d see the word “Bethe ansatz” doesn’t appear in the title. Instead, you’d see the phrase “quantum integrability”. In classical physics, an “integrable” system is one where you can calculate what will happen by doing an integral, essentially letting you “solve” any problem completely. Systems you can describe with the Bethe ansatz are solvable in a more complicated quantum sense, so they get called “quantum integrable”. There’s a whole research field that studies these quantum integrable systems.
My piece ended up rushing through the history of the field. After talking about Bethe’s original discovery, I jumped ahead to ice. The Bethe ansatz was first used to think about ice in the 1960’s, but the developments I mentioned leading up to it, where experimenters noticed extra variability and theorists explained it with the positions of hydrogen atoms, happened earlier, in the 1930’s. (Thanks to the commenter who pointed out that this was confusing!) Baxter gets a starring role in this section and had an important role in tying things together, but other people (Lieb and Sutherland) were involved earlier, showing that the Bethe ansatz indeed could be used with thin sheets of ice. This era had a bunch of other big names that I didn’t have space to talk about: C. N. Yang makes an appearance, and while Faddeev comes up later, I didn’t mention that he had a starring role in the 1970’s in understanding the connection to classical integrability and proposing a mathematical structure to understand what links all these different integrable theories together.
I vaguely gestured at black holes and quantum gravity, but didn’t have space for more than that. The connection there is to a topic you might have heard of before if you’ve read about string theory, called AdS/CFT, a connection between two kinds of world that are secretly the same: a toy model of gravity called Anti-de Sitter space (AdS) and a theory without gravity that looks the same at any scale (called a Conformal Field Theory, or CFT). It turns out that in the most prominent example of this, the theory without gravity is integrable! In fact, it’s a theory I spent a lot of time working with back in my research days, called N=4 super Yang-Mills. This theory is kind of like QCD, and in some sense it has integrability for similar reasons to those that Feynman hoped for and Korchemsky and Faddeev found. But it actually goes much farther, outside of the high-energy approximation where Korchemsky and Faddeev’s result works, and in principle seems to include everything you might want to know about the theory. Nowadays, people are using it to investigate the toy model of quantum gravity, hoping to get insights about quantum gravity in general.
One thing I didn’t get a chance to mention at all is the connection to quantum computing. People are trying to build quantum computers out of carefully cooled atoms, and it’s important to test whether such a machine functions well enough, or whether its quantum states aren’t as pristine as they need to be. One way people have been testing this is with the Bethe ansatz: because it lets you calculate the behavior of certain special systems exactly, you can set up your quantum computer to simulate one of those systems, and then check how close your results come to the prediction. You know the theoretical result is exact, so any discrepancy has to be due to an imperfection in your experiment.
I gave a quick teaser to a very active field, one that has fascinated a lot of prominent physicists and been applied in a wide variety of areas. I hope I’ve inspired you to learn more!
Yesterday’s wave function, showing an interesting interference phenomenon.
Admittedly, it’s a classic trap — one I use as a teaching tool in every quantum physics class. The wave function definitely looks, intuitively, as though two particles are colliding. But no. . . the wave function describes only one particle.
And what is this particle doing? It’s actually in the midst of a disguised version of the famous double slit experiment! This version is much simpler than the usual one, and will be super-useful to us going forward. It will make it significantly easier to see how all the puzzles of the double-slit experiment play out, both from the old, outdated but better known perspective of 1920’s quantum physics and from the modern perspective of quantum field theory.
You can read the details about this wave function — why it can’t possibly describe two particles, why it shows interference despite there being only one particle, and why it gives us a simpler version of the double-slit experiment — in an addendum to yesterday’s post.
The NSF funds university research as well as some national facilities. Organizationally, the NSF is an independent agency, meaning that it doesn’t reside under a particular cabinet secretary, though its Director is a presidential appointee who is confirmed by the US Senate. The NSF comprises a number of directorates (most relevant for readers of this blog are probably Mathematical and Physical Sciences; Engineering; and STEM Education, though there are several others). Within the directorates are divisions (for example, MPS → Division of Materials Research; Division of Chemistry; Division of Physics; Division of Mathematics etc.). Within each division are a variety of programs, ranging from individual investigator grants, to medium and large center proposals, to group training grants, to individual graduate and postdoctoral fellowships. Each program is administered by one or more program officers who are either scientists who have become civil servants, or "rotators", academics who take a leave of absence from their university positions to serve at the NSF for some number of years. The NSF is the only agency whose mission historically has explicitly included science education. The NSF's budget has been about $9B/yr (though until very recently there was supposedly bipartisan support for large increases), and 94% of its funds are spent on research, education, and related activities. NSF funds more than 1/4 of all basic research done at universities in the US, and it also funds tech development, like small business innovation grants.
The NSF, more than any other agency that funds physical science and engineering research, relies on peer review. Grants are reviewed by individual reviewers and/or panels. Compared to other agencies, the influence of program officers in the review process is minimal. If a grant doesn't excite the reviewers, it won't get funded. This has its pluses and minuses, but it's less of a personal networking process than other agencies. The success rate for many NSF programs is low, averaging around 25% in DMR, and 15% or so for graduate fellowships. Every NSF program officer with whom I've ever interacted has been dedicated and professional.
Well, yesterday the NSF laid off 11% of its workforce. I had an exchange last night with a long-time NSF program director, who gave permission for me to share the gist, suitably anonymized. (I also corrected typos.) This person says that they want people to be aware of what's going on. They say that NSF leadership is apparently helping with layoffs, and that "permanent Program Directors (feds such as myself) will be undergoing RIF or Reduction In Force process within the next month or so. So far, through buyout and firing today we lost about 16% of the workforce, and RIF is expected to bring it up to 50%." When I asked further, this person said this was "fairly certain". They went on: "Another danger is budget. We do not know what happens after the current CR [continuing resolution] ends March 14. A long shutdown or another CR are possible. For FY26 we are told about plans to reduce the NSF budget by 50%-75% - such reduction will mean no new awards for at least a year, elimination of divisions, merging of programs. Individual researchers and professional societies can help by raising the voice of objection. But realistically, we need to win the midterms to start real change. For now we are losing this battle. I can only promise you that NSF PDs are united as never before in our dedication to serve our communities of researchers and educators. We will continue to do so as long as we are here." On a related note, here is a thread by a just-laid-off NSF program officer. Note that congress has historically ignored presidential budget requests to cut NSF, but it's not at all clear that this can be relied upon now.
Voluntarily hobbling the NSF is, in my view, a terrible mistake that will take decades to fix. The argument that this is a fiscally responsible thing to do is weak. Total federal budget expenditures in FY24 were $6.75T. The NSF budget was $9B, or 0.13% of the total. The secretary of defense today said that their plan is to cut 8% of the DOD budget every year for the next several years. That's a reduction of 9 NSF budgets per year.
I fully recognize that many other things are going on in the world right now, and many agencies are under similar pressures, but I wanted to highlight the NSF in particular. Acting like this is business as usual, the kind of thing that happens whenever there is a change of administration, is disingenuous.
Assa Auerbach’s course was the most maddening course I’ve ever taken.
I was a master’s student in the Perimeter Scholars International program at the Perimeter Institute for Theoretical Physics. Perimeter trotted in world experts to lecture about modern physics. Many of the lecturers dazzled us with their pedagogy and research. We grew to know them not only in class and office hours, but also over meals at Perimeter’s Black-Hole Bistro.
Assa hailed from the Technion in Haifa, Israel. He’d written the book—at least, a book—about condensed matter, the physics of materials. He taught us condensed matter, according to some definition of “taught.”
Assa zipped through course material. He refrained from defining terminology. He used loose, imprecise language that conveys intuition to experts and only to experts. He threw at us the Hubbard model, the Heisenberg model, the Meissner effect, and magnons. If you don’t know what those terms mean, then I empathize. Really.
So I fought Assa like a groom hauling on a horse’s reins. I raised my hand again and again, insisting on clarifications. I shot off questions as quickly as I could invent them, because they were the only barriers slowing him down. He told me they were.
One day, we were studying magnetism. It arises because each atom in a magnet has a magnetic moment, a tiny compass that can angle in any direction. Under certain conditions, atoms’ magnetic moments tend to angle in opposite directions. Sometimes, not all atoms can indulge this tendency, as in the example below.
Physicists call this clash frustration, which I wanted to understand comprehensively and abstractly. But Assa wouldn’t define frustration; he’d only sketch an example.
But what is frustration? I insisted.
It’s when the atoms aren’t happy, he said, like you are now.
After class, I’d escape to the bathroom and focus on breathing. My body felt as though it had been battling an assailant physically.
Earlier this month, I learned that Assa had passed away suddenly. A former Perimeter classmate reposted the Technion’s news blurb on Facebook. A photo of Assa showed a familiar smile flashing beneath curly salt-and-pepper hair.
Am I defaming the deceased? No. The news of Assa’s passing walloped me as hard as any lecture of his did. I liked Assa and respected him; he was a researcher’s researcher. And I liked Assa for liking me for fighting to learn.
Photo courtesy of the Technion
One day, at the Bistro, Assa explained why the class had leaped away from the foundations of condensed matter into advanced topics so quickly: earlier discoveries felt “stale” to him. Everyone, he believed, could smell their moldiness. I disagreed, although I didn’t say so: decades-old discoveries qualify as new to anyone learning about them for the first time. Besides, 17th-century mechanics and 19th-century thermodynamics soothe my soul. But I respected Assa’s enthusiasm for the cutting-edge. And I did chat with him at the Bistro, where his friendliness shone like that smile.
Five years later, I was sojourning at the Kavli Institute for Theoretical Physics (KITP) in Santa Barbara, near the end of my PhD. The KITP, like Perimeter, draws theorists from across the globe. I spotted Assa among them and reached out about catching up. We discussed thermodynamics and experiments and travel.
Assa confessed that, at Perimeter, he’d been lecturing to himself—presenting lectures that he’d have enjoyed hearing, rather than lectures designed for master’s students. He’d appreciated my slowing him down. Once, he explained, he’d guest-lectured at Harvard. Nobody asked questions, so he assumed that the students must have known the material already, that he must have been boring them. So he sped up. Nobody said anything, so he sped up further. At the end, he discovered that nobody had understood any of his material. So he liked having an objector keeping him in check.
And where had this objector ended up? In a PhD program and at a mecca for theoretical physicists. Pursuing the cutting edge, a budding researcher’s researcher. I’d angled in the same direction as my former teacher. And one Perimeter classmate, a faculty member specializing in condensed matter today, waxed even more eloquent about Assa’s inspiration when we were students.
Physics needs more scientists like Assa: nose to the wind, energetic, low on arrogance. Someone who’d respond to this story of frustration with that broad smile.
When we teach students about the properties of quantum objects (and about thermodynamics), we often talk about the "statistics" obeyed by indistinguishable particles. I've written about aspects of this before. "Statistics" in this sense means, what happens mathematically to the multiparticle quantum state \(|\Psi\rangle\) when two particles are swapped. If we use the label \(\mathbf{1}\) to mean the set of quantum numbers associated with particle 1, etc., then the question is, how are \(|\Psi(\mathbf{1},\mathbf{2})\rangle\) and \(|\Psi(\mathbf{2},\mathbf{1})\rangle\) related to each other. We know that probabilities have to be conserved, so \(\langle \Psi(\mathbf{1},\mathbf{2}) | \Psi(\mathbf{1},\mathbf{2})\rangle = \langle \Psi(\mathbf{2},\mathbf{1}) | \Psi(\mathbf{2},\mathbf{1})\rangle\).
The usual situation is to assume \(|\Psi(\mathbf{2},\mathbf{1})\rangle = c |\Psi(\mathbf{1},\mathbf{2})\rangle\), where \(c\) is a complex number of magnitude 1. If \(c = 1\), which is sort of the "common sense" expectation from classical physics, the particles are bosons, obeying Bose-Einstein statistics. If \(c = -1\), the particles are fermions and obey Fermi-Dirac statistics. In principle, one could have \(c = \exp(i\alpha)\), where \(\alpha\) is some phase angle. Particles in that general case are called anyons, and I wrote about them here. Low energy excitations of electrons (fermions) confined in 2D in the presence of a magnetic field can act like anyons, but it seems there can't be anyons in higher dimensions.
Being imprecise, when particles are "dilute" -- "far" from each other in terms of position and momentum -- we typically don't really need to worry much about what kind of quantum statistics govern the particles. The distribution function - the average occupancy of a typical single-particle quantum state (labeled by a coordinate \(\mathbf{r}\), a wavevector \(\mathbf{k}\), and a spin \(\mathbf{\sigma}\) as one possibility) - is much less than 1. When particles are much more dense, though, the quantum statistics matter enormously. At low temperatures, bosons can all pile into the (single-particle, in the absence of interactions) ground state - that's Bose-Einstein condensation. In contrast, fermions have to stack up into higher energy states, since FD statistics imply that no two indistinguishable fermions can be in the same state - this is the Pauli Exclusion Principle, and it's basically why solids are solid. If a gas of particles is at a temperature \(T\) and a chemical potential \(\mu\), then the distribution function as a function of energy \(\epsilon\) for bosons or fermions is given by \(f(\epsilon,\mu,T) = 1/ (\exp((\epsilon-\mu)/k_{\mathrm{B}}T) \pm 1 )\), where the \(+\) sign is the fermion case and the \(-\) sign is the boson case.
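To make the \(\pm\) sign concrete, here is a minimal numerical sketch (in Python, with \(k_{\mathrm{B}}\) set to 1; the values of \(\mu\) and \(T\) are arbitrary illustrative choices, not tied to any particular material):

```python
import numpy as np

def occupancy(eps, mu, T, sign):
    """Average occupancy f(eps, mu, T); sign=+1 gives Fermi-Dirac, sign=-1 gives Bose-Einstein.
    Energies and temperature are in the same arbitrary units, with k_B = 1."""
    return 1.0 / (np.exp((eps - mu) / T) + sign)

eps = np.linspace(0.1, 3.0, 6)
print(occupancy(eps, mu=0.0, T=0.5, sign=+1))  # fermions: always between 0 and 1
print(occupancy(eps, mu=0.0, T=0.5, sign=-1))  # bosons: grows without bound as eps -> mu
```

In the dilute limit \((\epsilon-\mu) \gg k_{\mathrm{B}}T\), both expressions reduce to the same Maxwell-Boltzmann factor \(\exp(-(\epsilon-\mu)/k_{\mathrm{B}}T)\), which is the regime where the statistics stop mattering much.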
In the paper at hand, the authors take on parastatistics, the question of what happens if, besides spin, there are other "internal degrees of freedom" that are attached to particles described by additional indices that obey different algebras. As they point out, this is not a new idea, but what they have done here is show that it is possible to have mathematically consistent versions of this that do not trivially reduce to fermions and bosons and can survive in, say, 3 spatial dimensions. They argue that low energy excitations (quasiparticles) of some quantum spin systems can have these properties. That's cool but not necessarily surprising - there are quasiparticles in condensed matter systems that are argued to obey a variety of exotic relations originally proposed in the world of high energy theory (Weyl fermions, Majorana fermions, massless Dirac fermions). They also put forward the possibility that elementary particles could obey these statistics as well. (Ideas transferring over from condensed matter or AMO physics to high energy physics are also not a new thing; see the Anderson-Higgs mechanism, and the concept of unparticles, which has connections to condensed matter systems where electronic quasiparticles may not be well defined.)
Fig. 1 from this paper, showing distribution functions for fermions, bosons, and more exotic systems studied in the paper.
Interestingly, the authors work out what the distribution function can look like for these exotic particles, as shown here (fig 1 from the paper). The left panel shows how many particles can be in a single-particle spatial state for fermions (zero or one), bosons (up to \(\infty\)), and funky parastatistics-obeying particles of different types. The right panel shows the distribution functions for these cases. I think this is very cool. When I've taught statistical physics to undergrads, I've told the students that no one has written down a general distribution function for systems like this. Guess I'll have to revise my statements on this!
The physicist was called before the big wide world and asked, Why?
This commitment
This drive
This dream
(and as Nature is a woman, so let her be)
How does she defend? How does she serve your interests, home and abroad (which may be one and the same)?
The physicist stood before the big wide world alone but not alone
and answered
She makes me worth defending.
A realist defends to defend
Lives to live
Survives to survive
And devours to devour
It’s dour
Mere existence
The law of “better mine than yours”
Instead, the physicist spoke of the painters, the sculptors, …and the poets
He spoke of dignity and honor and love and worth
Of seeing a twinkling many-faceted thing past the curve of the road and a future to be shared.
It's been another exciting week where I feel compelled to write about the practice of university-based research in the US. I've written about "indirect costs" before, but it's been a while. I will try to get readers caught up on the basics of the university research ecosystem in the US, what indirect costs are, the latest (ahh, the classic Friday evening news dump) from NIH, and what might happen. (A note up front: there are federal laws regulating indirect costs, so the move by NIH will very likely face immediate legal challenges. Update: And here come the lawsuits. Update 2: Here is a useful explanatory video.) Update 3: This post is now closed (7:53pm CST 13 Feb). When we get to the "bulldozer going to pile you lot onto the trash" level of discourse, there is no more useful discussion happening.
How does university-based sponsored technical research work in the US? Since WWII, but particularly since the 1960s, many US universities conduct a lot of science and engineering research sponsored by US government agencies, foundations, and industry. By "sponsored", I mean there is a grant or contract between a sponsor and the university that sends funds to the university in exchange for research to be conducted by one or more faculty principal investigators, doctoral students, postdocs, undergrads, staff scientists, etc. When a PI writes a proposal to a sponsor, a budget is almost always required that spells out how much funding is being requested and how it will be spent. For example, a proposal could say, we are going to study superconductivity in 2D materials, and the budget (which comes with a budget justification) says, to do this, I need $37000 per year to pay a graduate research assistant for 12 months, plus $12000 per year for graduate student tuition, plus $8000 in the first year for a special amplifier, plus $10000 to cover materials, supplies, and equipment usage fees. Those are called direct costs.
In addition, the budget asks for funds to cover indirect costs. Indirect costs are meant to cover the facilities and administrative costs that the university will incur doing the research - that includes things like, maintaining the lab building, electricity, air conditioning, IT infrastructure, research accountants to keep track of the expenses and generate financial reports, etc. Indirect costs are computed as some percentage of some subset of the direct costs (e.g., there are no indirect costs charged on grad tuition or pieces of equipment more expensive than $5K). Indirect cost rates have varied over the years but historically have been negotiated between universities and the federal government. As I wrote eight years ago, "the magic (ahem) is all hidden away in OMB Circular A21 (wiki about it, pdf of the actual doc). Universities periodically go through an elaborate negotiation process with the federal government (see here for a description of this regarding MIT), and determine an indirect cost rate for that university." Rice's indirect cost rate is 56.5% for on-campus fed or industrial sponsored projects. Off-campus rates are lower (if you're really doing the research at CERN, then logically your university doesn't need as much indirect). Foundations historically try to negotiate lower indirect cost rates, often arguing that their resources are limited and paying for administration is not what their charters endorse. The true effective indirect rate for universities is always lower than the stated number because of such negotiations.
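As a rough sketch of that arithmetic, using the example budget above, Rice's quoted 56.5% on-campus rate, and the assumption (as noted) that tuition and the amplifier are excluded from the base (real negotiated rate agreements spell out the exclusions in more detail):

```python
# Rough sketch of indirect-cost arithmetic on the example budget above.
# Assumes grad tuition and equipment over $5K are excluded from the base;
# actual modified-total-direct-cost rules have more exclusions and caps.
rate = 0.565  # on-campus federal/industrial rate quoted in the post

budget = {
    "grad_salary": 37000,
    "grad_tuition": 12000,       # excluded from the indirect-cost base
    "amplifier": 8000,           # equipment > $5K, excluded
    "materials_and_fees": 10000,
}
excluded = {"grad_tuition", "amplifier"}

direct = sum(budget.values())
base = sum(v for k, v in budget.items() if k not in excluded)
indirect = rate * base
print(f"direct = ${direct:,}, indirect ~ ${indirect:,.0f}, total ~ ${direct + indirect:,.0f}")
```

On those assumptions, a year with $67,000 of direct costs would carry roughly $27,000 of indirect costs, for a total request of about $94,000.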
PIs are required to submit technical progress reports, and universities are required to submit detailed financial reports, to track these grants.
This basic framework has been in place for decades, and it has resulted in the growth of research universities, with enormous economic and societal benefit. Especially as industrial long term research has waned in the US (another screed I have written before), the university research ecosystem has been hugely important in contributing to modern technological society. We would not have the internet now, for example, if not for federally sponsored research.
Is it ideal? No. Are there inefficiencies? Sure. Should the whole thing be burned down? Not in my opinion, no.
"All universities lose money doing research." This is a quote from my colleague who was provost when I arrived at Rice, and was said to me tongue-in-cheek, but also with more than a grain of truth. If you look at how much it really takes to run the research apparatus, the funds brought in via indirect costs do not cover those costs. I have always said that this is a bit like Hollywood accounting - if research was a true financial disaster, universities wouldn't do it. The fact is that research universities have been willing to subsidize the additional real indirect costs because having thriving research programs brings benefits that are not simple to quantify financially - reputation, star faculty, opportunities for their undergrads that would not exist in the absence of research, potential patent income and startup companies, etc.
Reasonable people can disagree on what is the optimal percentage number for indirect costs. It's worth noting that the indirect cost rate at Bell Labs back when I was there was something close to 100%. Think about that. In a globally elite industrial research environment, with business-level financial pressure to be frugal, the indirect rate was 100%.
The fact is, if indirect cost rates are set too low, universities really will be faced with existential choices about whether to continue to support sponsored research. The overall benefits of having research programs will not outweigh the large financial costs of supporting this business.
Congress has made these true indirect costs steadily higher. Over the last decades, both because it is responsible stewardship and because it's good politics, Congress has passed laws requiring more and more oversight of research expenditures and security. Compliance with these rules has meant that universities have had to hire more administrators - on financial accounting and reporting, research security, tech transfer and intellectual property, supervisory folks for animal- and human-based research, etc. Agencies can impose their own requirements as well. Some large center-type grants from NIH/HHS and DOD require preparation and submission of monthly financial reports.
What did NIH do yesterday? NIH put out new guidance (linked above) setting their indirect cost rate to 15% effective this coming Monday. This applies not just to new grants, but also to awards already underway. There is also a not very veiled threat in there that says, we have chosen for now not to retroactively go back to the start of current awards and ask for funds (already spent) to be returned to us, but we think we would be justified in doing so. The NIH twitter feed proudly says that this change will produce an immediate savings to US taxpayers of $4B.
What does this mean? What are the intended and possible unintended consequences? It seems very likely that other agencies will come under immense pressure to make similar changes. If all agencies do so, and nothing else changes, this will mean tens of millions fewer dollars flowing to typical research universities every year. If a university has $300M annually in federally sponsored research, then under the old rules (assume a 55% indirect rate) that total would comprise about $194M of direct and $106M of indirect costs. If the rate is dropped to 15% and the direct costs stay the same at $194M, the university would instead recover only $29M of indirect costs, a net cut to the university of $77M per year.
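The arithmetic behind that estimate, as a quick sketch (treating all direct costs as indirect-bearing for simplicity; the real modified-total-direct-cost rules would shift the numbers somewhat):

```python
# The $300M example: same direct costs, but the indirect rate drops from 55% to 15%.
total_old = 300e6
old_rate, new_rate = 0.55, 0.15

direct = total_old / (1 + old_rate)            # ~ $194M of direct costs
indirect_old = total_old - direct              # ~ $106M recovered under the old rate
indirect_new = new_rate * direct               # ~ $29M recovered under the new rate
print(f"lost indirect recovery: ${(indirect_old - indirect_new) / 1e6:.0f}M per year")  # ~ $77M
```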
There will be legal challenges to all of this, I suspect.
The intended consequences are supposedly to save taxpayer dollars and force universities to streamline their administrative processes. However, given that Congress and the agencies are unlikely to lessen their reporting and oversight requirements, it's very hard for me to see how there can be some radical reduction in accounting and compliance staffs. There seems to be a sentiment that this will really teach those wealthy elite universities a lesson, that with their big endowments they should pick up more of the costs.
One unintended consequence: If this broadly goes through and sticks, universities will want to start charging as direct costs expenses that indirect costs used to cover. For a grant like the one I described above, you could imagine asking for $1200 per year for electricity, $1000/yr for IT support, $3000/yr for lab space maintenance, etc. This will create a ton of work for lawyers, as there will be a fight over what is or is not an allowable direct cost. This will also create the need for even more accounting types to track all of this. This is the exact opposite of "streamlined" administrative processes.
A second unintended consequence: Universities for whom doing research is financially a lot more of a marginal proposition would likely get out of those activities, if they truly can't recover the costs of operating their offices of research. This is the opposite of improving the situation and student opportunities at the less elite universities.
From a purely realpolitik perspective that often appeals to legislators: Everything that harms the US research enterprise effectively helps adversaries. The US benefitted enormously after WWII by building the world's premier research environment. Risking that should not be done lightly.
Don't panic. There is nothing gained by freaking out. Whatever happens, it will likely be a drawn out process. It's best to be aware of what's happening, educated about what it means, and deliberate in formulating strategies that will preserve research excellence and capabilities.
(So help me, I really want my next post to be about condensed matter physics or nanoscale science!)
It now seems the switch of Cancel Culture has only two settings:
everything is cancellable—including giving intellectual arguments against specific DEI policies, or teaching students about a Chinese filler word (“ne-ge”) that sounds a little like the N-word, or else
nothing is cancellable—not even tweeting “normalize Indian hate” and “I was racist before it was cool,” shortly before getting empowered to remake the US federal government.
How could we possibly draw any line between these two extremes? Wouldn’t that require … judgment? Common sense? Consideration of the facts of individual cases?
I, of course, survived attempted cancellation by a large online mob a decade ago, led by well-known figures such as Amanda Marcotte and Arthur Chu. Though it was terrifying at the time—it felt like my career and even my life were over—I daresay that, here in 2025, not many people would still condemn me for trying to have the heartfelt conversation I did about nerds, feminism, and dating, deep in the comments section of this blog. My side has now conclusively “won” that battle. The once-terrifying commissars of the People’s Republic of Woke, who delighted in trying to ruin me, are now bound and chained, as whooping soldiers of the MAGA Empire drag them by their hair to the torture dungeons.
And this is … not at all the outcome I wanted? It’s a possible outcome that I foresaw in 2014, and was desperately trying to help prevent, through fostering open dialogue between shy male nerds and feminists? I’m now, if anything, more terrified for my little tribe of pro-Enlightenment, science-loving nerds than I was under the woke regime? Speaking of switches with only two settings.
Anyway, with whatever moral authority this experience vests in me, I’d like to suggest that, in future cancellation controversies, the central questions ought to include the following:
1. What did the accused person actually say or do? Disregarding all confident online discourse about what that “type” of person normally does, or wants to do.
2. Is there a wider context that often gets cut from social media posts, but that, as soon as you know it, makes the incident seem either better or worse?
3. How long ago was the offense: more like thirty years or like last week?
4. Was the person in a radically different condition than they are now—e.g., were they very young, or undergoing a mental health episode, or reacting to a fresh traumatic incident, or drunk or high?
5. Were the relevant cultural norms different when the offense happened? Did countless others say or do the same thing, and if so, are they also at risk of cancellation?
6. What’s reasonable to infer about what the person actually believes? What do they want to have happen to whichever group they offended? What would they do to the group given unlimited power? Have they explicitly stated answers to these questions, either before or after the incident? Have they taken real-world actions by which we could judge their answers as either sincere or insincere?
7. If we don’t cancel this person, what are we being asked to tolerate? Just that they get to keep teaching and publishing views that many people find objectionable? Or that they get to impose their objectionable views on an entire academic department, university, company, organization, or government?
8. If we agree that the person said something genuinely bad, did they apologize or express regret? Or, if what they said got confused with something bad, did they rush to clarify and disclaim the bad interpretation?
9. Did they not only refuse to clarify or apologize, but do the opposite? That is, did they express glee about what they were able to get away with, or make light of the suffering or “tears” of their target group?
People can debate how to weigh these considerations, though I personally put enormous weight on 8 and 9, what you could call the “clarification vs. glee axis.” I have nearly unlimited charity for people willing to have a good-faith moral conversation with the world, and nearly unlimited contempt for people who mock the request for such a conversation.
The sad part is that, in practice, the criteria for cancellation have tended instead to be things like:
Is the target giving off signals of shame, distress, and embarrassment—thereby putting blood in the water and encouraging us to take bigger bites?
Do we, the mob, have the power to cancel this person? Does the person’s reputation and livelihood depend on organizations that care what we think, that would respond to pressure from us?
The trouble with these questions is that, not only are their answers not positively correlated with which people deserve to be cancelled, they’re negatively correlated. This is precisely how you get the phenomenon of the left-wing circular firing squad, which destroys the poor schmucks capable of shame even while the shameless, the proud racists and pussy-grabbers, go completely unpunished. Surely we can do better than that.
Pattern recognition is an altisonant name for a rather common, if complex, activity we perform countless times in our daily lives. Our brain is capable of interpreting successions of sounds, written symbols, or images almost infallibly - so much so that people like me, who sometimes have trouble recognizing a face that should be familiar, get their own term for that dysfunction - in this case, prosopagnosia.
People in measure theory find it best to work with, not arbitrary measurable spaces, but certain nice ones called standard Borel spaces. I’ve used them myself.
The usual definition of these looks kind of clunky: a standard Borel space is a set $X$ equipped with a $\sigma$-algebra $\Sigma$ for which there exists a complete separable metric on $X$ such that $\Sigma$ is the $\sigma$-algebra of Borel sets. But the results are good. For example, every standard Borel space is isomorphic to one of these:
a finite or countably infinite set with its $\sigma$-algebra of all subsets,
the real line $\mathbb{R}$ with its $\sigma$-algebra of Borel subsets.
So standard Borel spaces are a good candidate for Tom Leinster’s program of revealing the mathematical inevitability of certain traditionally popular concepts. I forget exactly how he put it, but it’s a great program and I remember some examples: he found nice category-theoretic characterizations of Lebesgue integration, entropy, and the nerve of a category.
Now someone has done this for standard Borel spaces!
Theorem. The category of standard Borel spaces and Borel maps is the (bi)initial object in the 2-category of countably complete countably extensive Boolean categories.
Here a category is countably complete if it has countable limits. It’s countably extensive if it has countable coproducts and a map
$$f \colon X \to \coprod_{n \in \mathbb{N}} Y_n$$
into a countable coproduct is the same thing as a decomposition $X = \coprod_{n \in \mathbb{N}} X_n$ together with maps $f_n \colon X_n \to Y_n$ for each $n$. It’s Boolean if the poset of subobjects of any object is a Boolean algebra.
So, believe it or not, these axioms are sufficient to develop everything we do with standard Borel spaces!
Part of me feels bad not to have written for weeks about quantum error-correction or BQP or QMA or even the new Austin-based startup that launched a “quantum computing dating app” (which, before anyone asks, is 100% as gimmicky and pointless as it sounds).
But the truth is that, even if you cared narrowly about quantum computing, there would be no bigger story right now than the fate of American science as a whole, which for the past couple weeks has had a knife to its throat.
Last week, after I blogged about the freeze in all American federal science funding (which has since been lifted by a judge’s order), a Trump-supporting commenter named Kyle had this to say:
No, these funding cuts are not permanent. He is only cutting funds until his staff can identify which money is going to the communists and the wokes. If you aren’t a woke or a communist, you have nothing to fear.
Read that one more time: “If you aren’t woke or a communist, you have nothing to fear.”
Can you predict what happened barely a week later? Science magazine now reports that the Trump/Musk/DOGE administration is planning to cut the National Science Foundation’s annual budget from $9 billion to only $3 billion (Biden, by contrast, had proposed an increase to $10 billion). Other brilliant ideas under discussion, according to the article, are to use AI to evaluate the grant proposals (!), and to shift the little NSF funding that remains from universities to private companies.
To be clear: in the United States, NSF is the only government agency whose central mission is curiosity-driven basic research—not that other agencies like DOE or NIH or NOAA, which also fund basic research, are safe from the chopping block either.
Maybe Congress, where support for basic science has long been bipartisan, will at some point grow some balls and push back on this. If not, though: does anyone seriously believe that you can cut the NSF’s budget by two-thirds while targeting only “woke communism”? That this won’t decimate the global preeminence of American universities in math, physics, computer science, astronomy, genetics, neuroscience, and more—preeminence that took a century to build?
Or does anyone think that I, for example, am a “woke communist”? I, the old-fashioned Enlightenment liberal who repeatedly risked his reputation to criticize “woke communism,” who the “woke communists” denounced when they noticed him at all, and who narrowly survived a major woke cancellation attempt a decade ago? Alas, I doubt any of that will save me: I presumably won’t be able to get NSF grants either under this new regime. Nor will my hundreds of brilliant academic colleagues, who’ve done what they can to make sure the center of quantum computing research remains in America rather than China or anywhere else.
I of course have no hope that the “Kyles” of the world will ever apologize to me for their prediction, their promise, being so dramatically wrong. But here’s my plea to Elon Musk, J. D. Vance, Joe Lonsdale, Curtis Yarvin, the DOGE boys, and all the readers of this blog who are connected to their circle: please prove me wrong, and prove Kyle right.
Please preserve and increase the NSF’s budget, after you’ve cleansed it of “woke communism” as you see fit. For all I care, add a line item to the budget for studying how to build rockets that are even bigger, louder, and more phallic.
But if you won’t save the NSF and the other basic research agencies—well hey, you’re the ones who now control the world’s nuclear-armed superpower, not me. But don’t you dare bullshit me about how you did all this so that merit-based science could once again flourish, like in the days of Newton and Gauss, finally free from meddling bureaucrats and woke diversity hires. You’d then just be another in history’s endless litany of conquering bullies, destroying what they can’t understand, no more interesting than all the previous bullies.
I posted what may be my last academic paper today, about a project I’ve been working on with Matthias Wilhelm for most of the last year. The paper is now online here. For me, the project has been a chance to broaden my horizons, learn new skills, and start to step out of my academic comfort zone. For Matthias, I hope it was grant money well spent.
I wanted to work on something related to machine learning, for the usual trendy employability reasons. Matthias was already working with machine learning, but was interested in pursuing a different question.
When is machine learning worthwhile? Machine learning methods are heuristics, unreliable methods that sometimes work well. You don’t use a heuristic if you have a reliable method that runs fast enough. But if all you have are heuristics to begin with, then machine learning can give you a better heuristic.
Matthias noticed a heuristic embedded deep in how we do particle physics, and guessed that we could do better. In particle physics, we use pictures called Feynman diagrams to predict the probabilities for different outcomes of collisions, comparing those predictions to observation to look for evidence of new physics. Each Feynman diagram corresponds to an integral, and for each calculation there are hundreds, thousands, or even millions of those integrals to do.
Luckily, physicists don’t actually have to do all those integrals. It turns out that most of them are related, by a slightly more advanced version of that calculus class mainstay, integration by parts. Integration by parts gives you a long list of equations relating the integrals; solving that list tells you how to write your integrals in terms of a much smaller list.
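The real objects here are multi-loop Feynman integrals, but a one-dimensional toy family already shows the idea: integration by parts turns an infinite family of integrals into relations that reduce everything to a single "master integral". Here is a sketch of that toy case (my own illustration, not the setup from the paper), using sympy:

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# Toy family I(n) = integral_0^oo x^n e^{-x} dx.  Integrating by parts gives
# the relation I(n) = n * I(n-1), so every member of the family reduces to the
# single "master integral" I(0) = 1 -- a cartoon of how IBP identities let
# physicists rewrite long lists of integrals in terms of a much smaller list.
def I(n):
    return sp.integrate(x**n * sp.exp(-x), (x, 0, sp.oo))

for n in range(1, 5):
    print(n, sp.simplify(I(n) - n * I(n - 1)))  # prints 0 each time: the IBP relation holds
```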
Laporta’s rule, the standard prescription for deciding which of these equations to generate, is a heuristic, with no proof that it is the best option, or even that it will always work. So we probably shouldn’t have been surprised when someone came up with a better heuristic. Watching talks at a December 2023 conference, Matthias saw a presentation by Johann Usovitsch on a curious new rule. The rule was surprisingly simple, just one extra condition on top of Laporta’s. But it was enough to reduce the number of equations by a factor of twenty.
That’s great progress, but it’s also a bit frustrating. Over almost twenty-five years, no-one had guessed this one simple change?
Maybe, thought Matthias and I, we need to get better at guessing.
We started out thinking we’d try reinforcement learning, a technique where a machine is trained by playing a game again and again, changing its strategy when that strategy brings it a reward. We thought we could have the machine learn to cut away extra equations, getting rewarded if it could cut more while still getting the right answer. We didn’t end up pursuing this very far before realizing another strategy would be a better fit.
What is a rule, but a program? Laporta’s golden rule and Johann’s new rule could both be expressed as simple programs. So we decided to use a method that could guess programs.
One method stood out for sheer trendiness and audacity: FunSearch. FunSearch is a type of algorithm called a genetic algorithm, which tries to mimic evolution. It makes a population of different programs, “breeds” them with each other to create new programs, and periodically selects out the ones that perform best. That’s not the trendy or audacious part, though; people have been doing that sort of genetic programming for a long time.
The trendy, audacious part is that FunSearch generates these programs with a Large Language Model, or LLM (the type of technology behind ChatGPT). Using an LLM trained to complete code, FunSearch presents the model with two programs labeled v0 and v1 and asks it to complete v2. In general, program v2 will have some traits from v0 and v1, but also a lot of variation due to the unpredictable output of LLMs. The inventors of FunSearch used this to contribute the variation needed for evolution, using it to evolve programs to find better solutions to math problems.
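In outline, the loop looks something like the sketch below (heavily simplified, and not the authors' actual code; `llm_complete` and `score` are hypothetical stand-ins for the LLM call and the problem-specific evaluation, which in our case would reward shorter equation lists that still solve the system):

```python
import random

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for a code-completion LLM call."""
    raise NotImplementedError

def score(program_src: str) -> float:
    """Hypothetical problem-specific evaluation of a candidate program."""
    raise NotImplementedError

def funsearch_step(population, keep=50):
    # Sample two parent programs and ask the LLM to "breed" them into a child.
    v0, v1 = random.sample(population, 2)
    prompt = f"# v0\n{v0}\n\n# v1\n{v1}\n\n# v2 (improve on the programs above)\n"
    population.append(llm_complete(prompt))
    # Periodically keep only the best-scoring programs, mimicking selection.
    population.sort(key=score, reverse=True)
    return population[:keep]
```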
We decided to try FunSearch on our problem, modifying it a bit to fit the case. We asked it to find a shorter list of equations, giving a better score for a shorter list but a penalty if the list wasn’t able to solve the problem fully.
Some tinkering and headaches later, it worked! After a few days and thousands of program guesses, FunSearch was able to find a program that reproduced the new rule Johann had presented. A few hours more, and it even found a rule that was slightly better!
But then we started wondering: do we actually need days of GPU time to do this?
An expert on heuristics we knew had insisted, at the beginning, that we try something simpler. The approach we tried then didn’t work. But after running into some people using genetic programming at a conference last year, we decided to try again, using a Python package they used in their work. This time, it worked like a charm, taking hours rather than days to find good rules.
This was all pretty cool, a great opportunity for me to cut my teeth on Python programming and its various attendant skills. And it’s been inspiring, with Matthias drawing together more people interested in seeing just how much these kinds of heuristic methods can do there. I should be clear though, that so far I don’t think our result is useful. We did better than the state of the art on an example, but only slightly, and in a way that I’d guess doesn’t generalize. And we needed quite a bit of overhead to do it. Ultimately, while I suspect there’s something useful to find in this direction, it’s going to require more collaboration, both with people using the existing methods who know better what the bottlenecks are, and with experts in these, and other, kinds of heuristics.
So I’m curious to see what the future holds. And for the moment, happy that I got to try this out!
According to this article at politico, there was an all-hands meeting at NSF today (at least for the engineering directorate) where they were told that there will be staff layoffs of 25-50% over the next two months.
This is an absolute catastrophe if it is accurately reported and comes to pass. NSF is already understaffed. This goes far beyond anything involving DEI, and is essentially a declaration that the US is planning to abrogate the federal role in supporting science and engineering research.
Moreover, I strongly suspect that if this conversation is being had at NSF, it is likely being had at DOE and NIH.
I don't even know how to react to this, beyond encouraging my fellow US citizens to call their representatives and senators and make it clear that this would be an unmitigated disaster.
Hamilton’s quaternion number system is a non-commutative extension of the complex numbers, consisting of numbers of the form $a + bi + cj + dk$ where $a, b, c, d$ are real numbers, and $i, j, k$ are anti-commuting square roots of $-1$ with $ij = k$, $jk = i$, $ki = j$. While they are non-commutative, they do keep many other properties of the complex numbers:
Being non-commutative, the quaternions do not form a field. However, they are still a skew field (or division ring): multiplication is associative, and every non-zero quaternion has a unique multiplicative inverse.
Like the complex numbers, the quaternions have a conjugation
$$\overline{a + bi + cj + dk} := a - bi - cj - dk,$$
although this is now an antihomomorphism rather than a homomorphism: $\overline{qr} = \overline{r}\,\overline{q}$. One can then split up a quaternion $q$ into its real part and imaginary part by the familiar formulae
$$\mathrm{Re}(q) := \frac{q + \overline{q}}{2}, \qquad \mathrm{Im}(q) := \frac{q - \overline{q}}{2}$$
(though we now leave the imaginary part purely imaginary, as opposed to dividing by $i$ in the complex case).
The inner product
$$\langle q, r \rangle := \mathrm{Re}(q \overline{r})$$
is symmetric and positive definite (with $1, i, j, k$ forming an orthonormal basis). Also, for any quaternion $q$, the product $q \overline{q}$ is real, hence equal to $\langle q, q \rangle$. Thus we have a norm
$$|q| := (q \overline{q})^{1/2} = \langle q, q \rangle^{1/2}.$$
Since the real numbers commute with all quaternions, we have the multiplicative property $|qr| = |q|\,|r|$. In particular, the unit quaternions (also known as $SU(2)$, $Sp(1)$, or $Spin(3)$) form a compact group. We also have the cyclic trace property $\mathrm{Re}(qr) = \mathrm{Re}(rq)$,
which allows one to take adjoints of left and right multiplication:
$$\langle qr, s \rangle = \langle r, \overline{q} s \rangle, \qquad \langle qr, s \rangle = \langle q, s \overline{r} \rangle.$$
As $i, j, k$ are square roots of $-1$, we have the usual Euler formulae
$$e^{\theta i} = \cos \theta + (\sin \theta)\, i, \qquad e^{\theta j} = \cos \theta + (\sin \theta)\, j, \qquad e^{\theta k} = \cos \theta + (\sin \theta)\, k$$
for real $\theta$, together with other familiar formulae such as $e^{\theta i} e^{\varphi i} = e^{(\theta + \varphi) i}$, $\overline{e^{\theta i}} = e^{-\theta i}$, $|e^{\theta i}| = 1$, etc.
We will use these sorts of algebraic manipulations in the sequel without further comment.
The unit quaternions act on the imaginary quaternions (which we identify with $\mathbb{R}^3$) by conjugation:
$$v \mapsto q v \overline{q} = q v q^{-1}.$$
This action is by orientation-preserving isometries, hence by rotations. It is not quite faithful, since conjugation by the unit quaternion $-1$ is the identity, but one can show that this is the only loss of faithfulness, reflecting the well known fact that $SU(2)$ is a double cover of $SO(3)$.
For instance, for any real $\theta$, conjugation by $e^{\theta i/2}$ is a rotation by $\theta$ around $i$:
$$e^{\theta i/2}\, j\, e^{-\theta i/2} = (\cos \theta)\, j + (\sin \theta)\, k, \qquad e^{\theta i/2}\, k\, e^{-\theta i/2} = (\cos \theta)\, k - (\sin \theta)\, j.$$
Similarly for cyclic permutations of $i, j, k$. The doubling of the angle here can be explained from the Lie algebra fact that $ij - ji$ is $2k$ rather than $k$; it is also closely related to the aforementioned double cover. We also of course have the unit quaternions acting on $\mathbb{H}$ by left multiplication; this is known as the spinor representation, but will not be utilized much in this post. (Giving $\mathbb{H}$ the right action of $\mathbb{C}$ makes it a copy of $\mathbb{C}^2$, and the spinor representation then also becomes the standard representation of $SU(2)$ on $\mathbb{C}^2$.)
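These rules are easy to check numerically. Here is a small self-contained sketch (plain Python, with quaternions stored as $(w, x, y, z)$ tuples; an illustration of the algebra above, not code from the post) verifying that conjugation by $e^{\theta i/2}$ rotates $j$ by $\theta$ towards $k$:

```python
import math

def qmul(p, q):
    # Hamilton product of quaternions stored as (w, x, y, z)
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw)

def qconj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

theta = 0.7
q = (math.cos(theta/2), math.sin(theta/2), 0.0, 0.0)   # exp(theta*i/2)
j = (0.0, 0.0, 1.0, 0.0)
rotated = qmul(qmul(q, j), qconj(q))                   # q j q^{-1}
print(rotated)  # approximately (0, 0, cos(theta), sin(theta)): j rotated by theta about i
```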
Given how quaternions relate to three-dimensional rotations, it is not surprising that they can also be used to recover the basic laws of spherical trigonometry – the study of spherical triangles on the unit sphere. This is fairly well known, but it took a little effort for me to locate the required arguments, so I am recording the calculations here.
The first observation is that every unit quaternion induces a unit tangent vector on the unit sphere , located at ; the third unit vector is then another tangent vector orthogonal to the first two (and oriented to the left of the original tangent vector), and can be viewed as the cross product of and . Right multiplication of this quaternion then corresponds to various natural operations on this unit tangent vector:
Right multiplying by does not affect the location of the tangent vector, but rotates the tangent vector anticlockwise by in the direction of the orthogonal tangent vector , as it replaces by .
Right multiplying by advances the tangent vector by geodesic flow by angle , as it replaces by , and replaces by .
Now suppose one has a spherical triangle with vertices $A, B, C$, with the spherical arcs $BC, CA, AB$ subtending angles $a, b, c$ respectively, and the vertices $A, B, C$ subtending angles $\alpha, \beta, \gamma$ respectively; suppose also that $ABC$ is oriented in an anti-clockwise direction for sake of discussion. Observe that if one starts at $A$ with a tangent vector oriented towards $B$, advances that vector by $c$, and then rotates by $\pi - \beta$, the tangent vector is now at $B$ and pointing towards $C$. If one advances by $a$ and rotates by $\pi - \gamma$, one is now at $C$ pointing towards $A$; and if one then advances by $b$ and rotates by $\pi - \alpha$, one is back at $A$ pointing towards $B$. This gives the fundamental relation
relating the three sides and three angles of this triangle. (A priori, due to the lack of faithfulness of the action, the right-hand side could conceivably have been rather than ; but for extremely small triangles the right-hand side is clearly , and so by continuity it must be for all triangles.) Indeed, a moment's thought will reveal that the condition (4) is necessary and sufficient for the data $(a, b, c, \alpha, \beta, \gamma)$ to be associated with a spherical triangle. Thus one can view (4) as a “master equation” for spherical trigonometry: in principle, it can be used to derive all the other laws of this subject.
Remark 1 The law (4) has an evident symmetry , which corresponds to the operation of replacing a spherical triangle with its dual triangle. Also, there is nothing particularly special about the choice of imaginaries in (4); one can conjugate (4) by various quaternions and replace here by any other orthogonal pair of unit quaternions.
Remark 2 If we work in the small scale regime, replacing the sides $a, b, c$ by $\varepsilon a, \varepsilon b, \varepsilon c$ for some small $\varepsilon > 0$, then we expect spherical triangles to behave like Euclidean triangles. Indeed, at zeroth order in $\varepsilon$, (4) reflects the classical fact that the sum of the angles of a Euclidean triangle is equal to $\pi$; at first order, it reflects the evident fact that the sides of a Euclidean triangle, viewed as vectors, sum to zero.
which reflects the evident fact that the vector sum of the sides of a Euclidean triangle sum to zero. (Geometrically, this correspondence reflects the fact that the action of the (projective) quaternion group on the unit sphere converges to the action of the special Euclidean group on the plane, in a suitable asymptotic limit.)
The identity (4) is an identity of two unit quaternions; as the unit quaternion group is three-dimensional, this thus imposes three independent constraints on the six real parameters of the spherical triangle. One can manipulate this constraint in various ways to obtain various trigonometric identities involving some subsets of these six parameters. For instance, one can rearrange (4) to get
Conjugating by to reverse the sign of , we also have
Taking the inner product of both sides of these identities, we conclude that
is equal to
Using the various properties of inner product, the former expression simplifies to , while the latter simplifies to
We can write and
so on substituting and simplifying we obtain
$$\cos a = \cos b \cos c + \sin b \sin c \cos \alpha,$$
which is the spherical cosine rule. Note in the infinitesimal limit (replacing $a, b, c$ by $\varepsilon a, \varepsilon b, \varepsilon c$ and letting $\varepsilon \to 0$) this rule becomes the familiar Euclidean cosine rule
$$a^2 = b^2 + c^2 - 2bc \cos \alpha.$$
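One can also sanity-check the spherical cosine rule numerically: pick three random points on the sphere, read off the sides from dot products and the vertex angle from the tangent directions at $A$. A quick sketch in Python/NumPy (using the convention that side $a$ is opposite vertex $A$):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unit():
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)

A, B, C = random_unit(), random_unit(), random_unit()
a, b, c = np.arccos(B @ C), np.arccos(A @ C), np.arccos(A @ B)   # sides, as angles

def tangent(P, Q):
    """Unit tangent vector at P pointing along the great circle towards Q."""
    t = Q - (P @ Q) * P
    return t / np.linalg.norm(t)

alpha = np.arccos(tangent(A, B) @ tangent(A, C))   # vertex angle at A
print(np.cos(a), np.cos(b) * np.cos(c) + np.sin(b) * np.sin(c) * np.cos(alpha))  # the two agree
```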
In a similar fashion, from (5) we see that the quantity
is equal to
The first expression simplifies by (1) and properties of the inner product to
which by (2), (3) simplifies further to . Similarly, the second expression simplifies to
which by (2), (3) simplifies to . Equating the two and rearranging, we obtain
$$\frac{\sin a}{\sin \alpha} = \frac{\sin b}{\sin \beta},$$
which is the spherical sine rule. Again, in the infinitesimal limit we obtain the familiar Euclidean sine rule
$$\frac{a}{\sin \alpha} = \frac{b}{\sin \beta}.$$
As a variant of the above analysis, we have from (5) again that
is equal to
As before, the first expression simplifies to
which equals . Meanwhile, the second expression can be rearranged as
and so the inner product is , leading to the “five part rule”
In the case of a right-angled triangle with $\gamma = \frac{\pi}{2}$ (so that $c$ is the hypotenuse), this simplifies to one of Napier’s rules,
$$\cos \alpha = \tan b \cot c, \qquad (6)$$
which in the infinitesimal limit is the familiar $\cos \alpha = b/c$. The other rules of Napier can be derived in a similar fashion.
Example 3 One application of Napier’s rule (6) is to determine the sunrise equation for when the sun rises and sets at a given location on the Earth, and a given time of year. For sake of argument let us work in summer, in which the declination $\delta$ of the Sun is positive (due to axial tilt, it reaches a maximum of about $23.5^\circ$ at the summer solstice). Then the Sun subtends an angle of $\frac{\pi}{2} - \delta$ from the pole star (Polaris in the northern hemisphere, Sigma Octantis in the southern hemisphere), and appears to rotate around that pole star once every $24$ hours. On the other hand, if one is at a latitude $\phi$, then the pole star has an elevation of $\phi$ above the horizon. At extremely high latitudes $\phi > \frac{\pi}{2} - \delta$, the sun will never set (a phenomenon known as “midnight sun“); but in all other cases, at sunrise or sunset, the sun, pole star, and horizon point below the pole star will form a right-angled spherical triangle, with hypotenuse subtending an angle $\frac{\pi}{2} - \delta$ and vertical side subtending an angle $\phi$. The angle subtended by the pole star in this triangle is $\pi - \omega$, where $\omega$ is the solar hour angle – the angle that the sun deviates from its noon position. Equation (6) then gives the sunrise equation
$$\cos(\pi - \omega) = \tan \phi \cot\left(\tfrac{\pi}{2} - \delta\right),$$
or equivalently
$$\cos \omega = -\tan \delta \tan \phi.$$
A similar rule determines the time of sunset. In particular, the number of daylight hours in summer (assuming one is not in the midnight sun scenario $\phi > \frac{\pi}{2} - \delta$) is given by
$$\frac{24}{\pi} \arccos(-\tan \delta \tan \phi)$$
hours. The situation in winter is similar, except that $\delta$ is now negative, and polar night (no sunrise) occurs when $\phi > \frac{\pi}{2} + \delta$.
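For example, the sunrise equation turns into a few lines of Python (latitude and declination in degrees; atmospheric refraction and the finite size of the solar disk are ignored, so the numbers are only approximate):

```python
import math

def daylight_hours(latitude_deg, declination_deg):
    """Daylight length from the sunrise equation cos(omega) = -tan(delta)*tan(phi).
    Returns 0 or 24 in the polar-night / midnight-sun regimes."""
    phi = math.radians(latitude_deg)
    delta = math.radians(declination_deg)
    x = -math.tan(delta) * math.tan(phi)
    if x >= 1:
        return 0.0    # polar night
    if x <= -1:
        return 24.0   # midnight sun
    omega = math.acos(x)              # hour angle at sunrise/sunset, in radians
    return 24.0 * omega / math.pi

print(daylight_hours(51.5, 23.4))  # about 16.4 hours: roughly London at the summer solstice
```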
As artificial intelligence tools continue to evolve and improve their performance on more and more general tasks, scientists struggle to make the best use of them. The problem is not incompetence - in fact, at least in my field of study (high-energy physics) most of us have grown rather well educated in the use and development of tailored machine learning algorithms. The problem is rather that our problems are enormously complex. Long gone are the years when we first applied deep neural networks, with success, to the classification and regression problems of data analysis: those were easy tasks. The bar is now set much higher - optimizing the design of the instruments we use for our scientific research.
Some of the things taken down mention diversity, equity and inclusion, but they also include research papers on optics, chemistry, medicine and much more. They may reappear, but at this time nobody knows.
If you want to help save US federal web pages and databases, here are some things to do:
• First check to see if they’re already backed up. You can go to the Wayback Machine and type a website’s URL into the search bar. Also check out the Safeguarding Research Discourse Group, which has a list of what’s been backed up.
• If they’re not already on the Wayback Machine, you can save web pages there. The easiest way to do this is by installing the Wayback Machine extension for your browser. The add-ons and extensions are listed on the left-hand panel of the website’s homepage.
• If you’re concerned that certain websites or web pages may be removed, you can suggest federal websites and content that end in .gov, .mil and .com to the End of Term Web Archive.
I’ve taken these from Naseem Miller and added a bit. As you can see, there are overlapping efforts that are not yet coordinated with each other. This has some advantages (for example the Safeguarding Research Discourse Group is based outside the US) and some disadvantages (it’s hard to tell definitively what hasn’t been backed up yet).
Finally, Ben Ramsey notes: if you work in government and are asked to remove content from websites as a result of Trump’s order, please use the http status code 451 instead of 404. 451 is the correct status code to use for these cases. Also include a Link header with the link relation “blocked-by” that “identifies the entity that blocks access to a resource following receipt of a legal demand.”
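For instance, here is a minimal Python sketch of such a response, using only the standard library (the URL in the Link header is a placeholder for whatever entity is actually blocking access):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class BlockedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 451 Unavailable For Legal Reasons, with the "blocked-by" link relation
        # identifying the entity that blocks access following the legal demand.
        self.send_response(451)
        self.send_header("Link", '<https://example.gov/>; rel="blocked-by"')
        self.end_headers()
        self.wfile.write(b"Removed following a legal demand.\n")

if __name__ == "__main__":
    HTTPServer(("", 8451), BlockedHandler).serve_forever()
```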
On December 11, I gave a keynote address at the Q2B 2024 Conference in Silicon Valley. This is a transcript of my remarks. The slides I presented are here. The video of the talk is here.
NISQ and beyond
I’m honored to be back at Q2B for the 8th year in a row.
The Q2B conference theme is “The Roadmap to Quantum Value,” so I’ll begin by showing a slide from last year’s talk. As best we currently understand, the path to economic impact is the road through fault-tolerant quantum computing. And that poses a daunting challenge for our field and for the quantum industry.
We are in the NISQ era. And NISQ technology already has noteworthy scientific value. But as of now there is no proposed application of NISQ computing with commercial value for which quantum advantage has been demonstrated when compared to the best classical hardware running the best algorithms for solving the same problems. Furthermore, currently there are no persuasive theoretical arguments indicating that commercially viable applications will be found that do not use quantum error-correcting codes and fault-tolerant quantum computing.
NISQ, meaning Noisy Intermediate-Scale Quantum, is a deliberately vague term. By design, it has no precise quantitative meaning, but it is intended to convey an idea: We now have quantum machines such that brute force simulation of what the quantum machine does is well beyond the reach of our most powerful existing conventional computers. But these machines are not error-corrected, and noise severely limits their computational power.
In the future we can envision FASQ* machines, Fault-Tolerant Application-Scale Quantum computers that can run a wide variety of useful applications, but that is still a rather distant goal. What term captures the path along the road from NISQ to FASQ? Various terms retaining the ISQ format of NISQ have been proposed [here, here, here], but I would prefer to leave ISQ behind as we move forward, so I’ll speak instead of a megaquop or gigaquop machine and so on meaning one capable of executing a million or a billion quantum operations, but with the understanding that mega means not precisely a million but somewhere in the vicinity of a million.
Naively, a megaquop machine would have an error rate per logical gate of order 10^{-6}, which we don’t expect to achieve anytime soon without using error correction and fault-tolerant operation. Or maybe the logical error rate could be somewhat larger, as we expect to be able to boost the simulable circuit volume using various error mitigation techniques in the megaquop era just as we do in the NISQ era. Importantly, the megaquop machine would be capable of achieving some tasks beyond the reach of classical, NISQ, or analog quantum devices, for example by executing circuits with of order 100 logical qubits and circuit depth of order 10,000.
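To spell out the arithmetic behind those numbers, here is a back-of-the-envelope sketch (my own illustration; the circuit dimensions are the ones mentioned above):

```python
# A circuit with ~100 logical qubits and depth ~10,000 contains about a million
# logical operations -- one "megaquop" -- so an error rate of order 10^-6 per
# logical operation leaves roughly one expected error in the whole circuit.
logical_qubits = 100
depth = 10_000
ops = logical_qubits * depth      # ~1e6 logical operations
error_rate = 1e-6                 # per logical operation
expected_errors = ops * error_rate
print(f"{ops:.0e} ops at {error_rate:.0e} per-op error -> {expected_errors:.1f} expected error(s)")
```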
What resources are needed to operate it? That depends on many things, but a rough guess is that tens of thousands of high-quality physical qubits could suffice. When will we have it? I don’t know, but if it happens in just a few years a likely modality is Rydberg atoms in optical tweezers, assuming they continue to advance in both scale and performance.
What will we do with it? I don’t know, but as a scientist I expect we can learn valuable lessons by simulating the dynamics of many-qubit systems on megaquop machines. Will there be applications that are commercially viable as well as scientifically instructive? That I can’t promise you.
The road to fault tolerance
To proceed along the road to fault tolerance, what must we achieve? We would like to see many successive rounds of accurate error syndrome measurement such that when the syndromes are decoded the error rate per measurement cycle drops sharply as the code increases in size. Furthermore, we want to decode rapidly, as will be needed to execute universal gates on protected quantum information. Indeed, we will want the logical gates to have much higher fidelity than physical gates, and for the logical gate fidelities to improve sharply as codes increase in size. We want to do all this at an acceptable overhead cost in both the number of physical qubits and the number of physical gates. And speed matters — the time on the wall clock for executing a logical gate should be as short as possible.
A snapshot of the state of the art comes from the Google Quantum AI team. Their recently introduced Willow superconducting processor has improved transmon lifetimes, measurement errors, and leakage correction compared to its predecessor Sycamore. With it they can perform millions of rounds of surface-code error syndrome measurement with good stability, each round lasting about a microsecond. Most notably, they find that the logical error rate per measurement round improves by a factor of 2 (a factor they call Lambda) when the code distance increases from 3 to 5 and again from 5 to 7, indicating that further improvements should be achievable by scaling the device further. They performed accurate real-time decoding for the distance 3 and 5 codes. To further explore the performance of the device they also studied the repetition code, which corrects only bit flips, out to a much larger code distance. As the hardware continues to advance we hope to see larger values of Lambda for the surface code, larger codes achieving much lower error rates, and eventually not just quantum memory but also logical two-qubit gates with much improved fidelity compared to the fidelity of physical gates.
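As a rough illustration of why the Lambda factor matters so much, here is a sketch of the usual suppression heuristic, in which each increase of the code distance by two divides the logical error rate by another factor of Lambda. The distance-3 error rate and Lambda = 2 below are illustrative placeholders in the spirit of the paragraph above, not the measured Willow numbers.

```python
# Heuristic: p_logical(d) ~ p_logical(3) / Lambda**((d - 3) / 2),
# i.e. every two extra units of code distance buy another factor of Lambda.
def logical_error_per_round(d, p3=3e-3, Lambda=2.0):
    """Illustrative placeholder numbers, not measured device values."""
    return p3 / Lambda ** ((d - 3) / 2)

for d in (3, 5, 7, 11, 15, 21):
    print(f"distance {d:2d}: ~{logical_error_per_round(d):.1e} logical error per round")
```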
Last year I expressed concern about the potential vulnerability of superconducting quantum processors to ionizing radiation such as cosmic ray muons. In these events, errors occur in many qubits at once, too many errors for the error-correcting code to fend off. I speculated that we might want to operate a superconducting processor deep underground to suppress the muon flux, or to use less efficient codes that protect against such error bursts.
The good news is that the Google team has demonstrated that so-called gap engineering of the qubits can reduce the frequency of such error bursts by orders of magnitude. In their studies of the repetition code they found that, in the gap-engineered Willow processor, error bursts occurred about once per hour, as opposed to once every ten seconds in their earlier hardware. Whether suppression of error bursts via gap engineering will suffice for running deep quantum circuits in the future is not certain, but this progress is encouraging. And by the way, the origin of the error bursts seen every hour or so is not yet clearly understood, which reminds us that not only in superconducting processors but in other modalities as well we are likely to encounter mysterious and highly deleterious rare events that will need to be understood and mitigated.
Real-time decoding
Fast real-time decoding of error syndromes is important because when performing universal error-corrected computation we must frequently measure encoded blocks and then perform subsequent operations conditioned on the measurement outcomes. If it takes too long to decode the measurement outcomes, that will slow down the logical clock speed. That may be a more serious problem for superconducting circuits than for other hardware modalities where gates can be orders of magnitude slower.
For distance 5, Google achieves a latency, meaning the time from when data from the final round of syndrome measurement is received by the decoder until the decoder returns its result, of about 63 microseconds on average. In addition, it takes about another 10 microseconds for the data to be transmitted via Ethernet from the measurement device to the decoding workstation. That’s not bad, but considering that each round of syndrome measurement takes only a microsecond, faster would be preferable, and the decoding task becomes harder as the code grows in size.
Riverlane and Rigetti have demonstrated in small experiments that the decoding latency can be reduced by running the decoding algorithm on FPGAs rather than CPUs, and by integrating the decoder into the control stack to reduce communication time. Adopting such methods may become increasingly important as we scale further. Google DeepMind has shown that a decoder trained by reinforcement learning can achieve a lower logical error rate than a decoder constructed by humans, but it’s unclear whether that will work at scale because the cost of training rises steeply with code distance. Also, the Harvard / QuEra team has emphasized that performing correlated decoding across multiple code blocks can reduce the depth of fault-tolerant constructions, but this also increases the complexity of decoding, raising concern about whether such a scheme will be scalable.
Trading simplicity for performance
The Google processors use transmon qubits, as do superconducting processors from IBM and various other companies and research groups. Transmons are the simplest superconducting qubits and their quality has improved steadily; we can expect further improvement with advances in materials and fabrication. But a logical qubit with very low error rate surely will be a complicated object due to the hefty overhead cost of quantum error correction. Perhaps it is worthwhile to fashion a more complicated physical qubit if the resulting gain in performance might actually simplify the operation of a fault-tolerant quantum computer in the megaquop regime or well beyond. Several versions of this strategy are being pursued.
One approach uses cat qubits, in which the encoded 0 and 1 are coherent states of a microwave resonator, well separated in phase space, such that the noise afflicting the qubit is highly biased. Bit flips are exponentially suppressed as the mean photon number of the resonator increases, while the error rate for phase flips induced by loss from the resonator increases only linearly with the photon number. This year the AWS team built a repetition code to correct phase errors for cat qubits that are passively protected against bit flips, and showed that increasing the distance of the repetition code from 3 to 5 slightly improves the logical error rate. (See also here.)
Another helpful insight is that error correction can be more effective if we know when and where the errors occur in a quantum circuit. We can apply this idea using a dual rail encoding of the qubits. With two microwave resonators, for example, we can encode a qubit by placing a single photon in either the first resonator (the 10 state) or the second resonator (the 01 state). The dominant error is loss of a photon, causing either the 01 or 10 state to decay to 00. One can check whether the state is 00, detecting whether the error occurred without disturbing a coherent superposition of 01 and 10. In a device built by the Yale / QCI team, loss errors are detected over 99% of the time and all undetected errors are relatively rare. Similar results were reported by the AWS team, encoding a dual-rail qubit in a pair of transmons instead of resonators.
Another idea is encoding a finite-dimensional quantum system in a state of a resonator that is highly squeezed in two complementary quadratures, a so-called GKP encoding. This year the Yale group used this scheme to encode 3-dimensional and 4-dimensional systems with decay rate better by a factor of 1.8 than the rate of photon loss from the resonator. (See also here.)
A fluxonium qubit is more complicated than a transmon in that it requires a large inductance which is achieved with an array of Josephson junctions, but it has the advantage of larger anharmonicity, which has enabled two-qubit gates with better than three 9s of fidelity, as the MIT team has shown.
Whether this trading of simplicity for performance in superconducting qubits will ultimately be advantageous for scaling to large systems is still unclear. But it’s appropriate to explore such alternatives which might pay off in the long run.
Error correction with atomic qubits
We have also seen progress on error correction this year with atomic qubits, both in ion traps and optical tweezer arrays. In these platforms qubits are movable, making it possible to apply two-qubit gates to any pair of qubits in the device. This opens the opportunity to use more efficient coding schemes, and in fact logical circuits are now being executed on these platforms. The Harvard / MIT / QuEra team sampled circuits with 48 logical qubits on a 280-qubit device; that big news broke during last year’s Q2B conference. Atom Computing and Microsoft ran an algorithm with 28 logical qubits on a 256-qubit device. Quantinuum and Microsoft prepared entangled states of 12 logical qubits on a 56-qubit device.
However, so far in these devices it has not been possible to perform more than a few rounds of error syndrome measurement, and the results rely on error detection and postselection. That is, circuit runs are discarded when errors are detected, a scheme that won’t scale to large circuits. Efforts to address these drawbacks are in progress. Another concern is that the atomic movement slows the logical cycle time. If all-to-all coupling enabled by atomic movement is to be used in much deeper circuits, it will be important to speed up the movement quite a lot.
Toward the megaquop machine
How can we reach the megaquop regime? More efficient quantum codes like those recently discovered by the IBM team might help. These require geometrically nonlocal connectivity and are therefore better suited for Rydberg optical tweezer arrays than superconducting processors, at least for now. Error mitigation strategies tailored for logical circuits, like those pursued by Qedma, might help by boosting the circuit volume that can be simulated beyond what one would naively expect based on the logical error rate. Recent advances from the Google team, which reduce the overhead cost of logical gates, might also be helpful.
What about applications? Impactful applications to chemistry typically require rather deep circuits so are likely to be out of reach for a while yet, but applications to materials science provide a more tempting target in the near term. Taking advantage of symmetries and various circuit optimizations like the ones Phasecraft has achieved, we might start seeing informative results in the megaquop regime or only slightly beyond.
As a scientist, I’m intrigued by what we might conceivably learn about quantum dynamics far from equilibrium by doing simulations on megaquop machines, particularly in two dimensions. But when seeking quantum advantage in that arena we should bear in mind that classical methods for such simulations are also advancing impressively, including in the past year (for example, here and here).
To summarize, advances in hardware, control, algorithms, error correction, error mitigation, etc. are bringing us closer to megaquop machines, raising a compelling question for our community: What are the potential uses for these machines? Progress will require innovation at all levels of the stack. The capabilities of early fault-tolerant quantum processors will guide application development, and our vision of potential applications will guide technological progress. Advances in both basic science and systems engineering are needed. These are still the early days of quantum computing technology, but our experience with megaquop machines will guide the way to gigaquops, teraquops, and beyond and hence to widely impactful quantum value that benefits the world.
I thank Dorit Aharonov, Sergio Boixo, Earl Campbell, Roland Farrell, Ashley Montanaro, Mike Newman, Will Oliver, Chris Pattison, Rob Schoelkopf, and Qian Xu for helpful comments.
*The acronym FASQ was suggested to me by Andrew Landahl.
The megaquop machine (image generated by ChatGPT).
I’m in the final stages of writing up a combinatorial argument, no part of which is really deep but which is giving me absolute fits, so this song is in my head. Penn and Teller! I forgot or perhaps never knew they were in this video.
Also: my 14-year-old daughter knows this song and reports “Everybody knows this song.” But she doesn’t know any other songs by Run-DMC. I’m not sure she knows any other ’80s hip hop at all. “It’s Tricky” wasn’t even Run-DMC’s biggest hit. It’s very mysterious, which pieces of the musical past survive into contemporary culture and which are forgotten.
As part of the celebrations for 20 years of blogging, I am re-posting articles that in some way were notable for the history of the blog. This time I want to (re)-submit to you four pieces I wrote to explain the unexplainable: the very complicated analysis performed by a group of physicists within the CDF experiment, which led them to claim that there was a subtle new physics process hidden in the data collected in Run 2. There would be a lot to tell about that whole story, but suffice it to say here that the signal never got confirmed by independent analyses and by DZERO, the competing experiment at the Tevatron. As mesmerizing and striking as the CDF results were, they were finally written off as due to some intrinsic inability of the experiment to make perfect sense of its muon detector signals.
Some people have stories about an inspiring teacher who introduced them to their life’s passion. My story is different: I became a physicist due to a famously bad teacher.
My high school was, in general, a good place to learn science, but physics was the exception. The teacher at the time had a bad reputation, and while I don’t remember exactly why, I do remember his students didn’t end up learning much physics. My parents were aware of the problem, and aware that physics was something I might have a real talent for. I was already going to take math at the university, having passed calculus at the high school the year before, taking advantage of a program that let advanced high school students take free university classes. Why not take physics at the university too?
This ended up giving me a huge head-start, letting me skip ahead to the fun stuff when I started my Bachelor’s degree two years later. But in retrospect, I’m realizing it helped me even more. Skipping high-school physics didn’t just let me move ahead: it also let me avoid a class that is in many ways more difficult than university physics.
High school physics is a mess of mind-numbing formulas. How is velocity related to time, or acceleration to displacement? What’s the current generated by a changing magnetic field, or the magnetic field generated by a current? Students learn a pile of apparently different procedures to calculate things that they usually don’t particularly care about.
Once you know some math, though, you learn that most of these formulas are related. Integration and differentiation turn the mess of formulas about acceleration and velocity into a few simple definitions. Understand vectors, and instead of a stack of different rules about magnets and circuits you can learn Maxwell’s equations, which show how all of those seemingly arbitrary rules fit together in one reasonable package.
This doesn’t just happen when you go from high school physics to first-year university physics. The pattern keeps going.
In a textbook, you might see four equations to represent what Maxwell found. But once you’ve learned special relativity and some special notation, they combine into something much simpler. Instead of having to keep track of forces in diagrams, you can write down a Lagrangian and get the laws of motion with a reliable procedure. Instead of a mess of creation and annihilation operators, you can use a path integral. The more physics you learn, the more seemingly different ideas get unified, the less you have to memorize and the more just makes sense. The more physics you study, the easier it gets.
Until, that is, it doesn’t anymore. A physics education is meant to catch you up to the state of the art, and it does. But while the physics along the way has been cleaned up, the state of the art has not. We don’t yet have a unified set of physical laws, or even a unified way to do physics. Doing real research means once again learning the details: quantum computing algorithms or Monte Carlo simulation strategies, statistical tools or integrable models, atomic lattices or topological field theories.
Most of the confusions along the way were research problems in their own day. Electricity and magnetism were understood and unified piece by piece, one phenomenon after another before Maxwell linked them all together, before Lorentz and Poincaré and Einstein linked them further still. Once a student might have had to learn a mess of particles with names like J/Psi, now they need just six types of quarks.
So if you’re a student now, don’t despair. Physics will get easier, things will make more sense. And if you keep pursuing it, eventually, it will stop making sense once again.
Timothy Trudgian, Andrew Yang and I have just uploaded to the arXiv the paper “New exponent pairs, zero density estimates, and zero additive energy estimates: a systematic approach“. This paper launches a project envisioned in this previous blog post, in which the (widely dispersed) literature on various exponents in classical analytic number theory, as well as the relationships between these exponents, are collected in a living database, together with computer code to optimize the relations between them, with one eventual goal being to automate as much as possible the “routine” components of many analytic number theory papers, in which progress on one type of exponent is converted via standard arguments to progress on other exponents.
The database we are launching concurrently with this paper is called the Analytic Number Theory Exponent Database (ANTEDB). This Github repository aims to collect the latest results and relations on exponents such as the following:
The growth exponent μ(σ) of the Riemann zeta function at real part σ (i.e., the best exponent μ(σ) for which ζ(σ+it) ≪ |t|^(μ(σ)+o(1)) as |t| → ∞);
Exponent pairs (k, l) (used to bound exponential sums ∑_n e(T F(n/N)) over intervals, for various phase functions F and parameters T, N);
Zero density exponents A(σ) (used to bound the number of zeros of ζ of real part larger than σ);
etc. These sorts of exponents are related to many topics in analytic number theory; for instance, the Lindelöf hypothesis is equivalent to the assertion μ(1/2) = 0.
Information on these exponents is collected both in a LaTeX “blueprint” that is available as a human-readable set of web pages, and as part of our Python codebase. In the future one could also imagine the data being collected in a Lean formalization, but at present the database only contains a placeholder Lean folder.
As a consequence of collecting all the known bounds in the literature on these sorts of exponents, as well as abstracting out various relations between these exponents that were implicit in many papers in this subject, we were then able to run computer-assisted searches to improve some of the state of the art on these exponents in a largely automated fashion (without introducing any substantial new inputs from analytic number theory). In particular, we obtained:
four new exponent pairs;
several new zero density estimates; and
new estimates on the additive energy of zeroes of the Riemann zeta function.
We are hoping that the ANTEDB will receive more contributions in the future, for instance expanding to other types of exponents, or to update the database as new results are obtained (or old ones added). In the longer term one could also imagine integrating the ANTEDB with other tools, such as Lean or AI systems, but for now we have focused primarily on collecting the data and optimizing the relations between the exponents.
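To give a flavor of the kind of “routine” deduction such a database can help automate, here is a toy Python sketch of the classical van der Corput A- and B-processes, which produce new exponent pairs from old ones. It is only an illustration of the underlying transformations, not the actual ANTEDB code or schema.

```python
# Toy illustration: deriving classical exponent pairs from the trivial pair (0, 1)
# via the van der Corput A-process and B-process.
from fractions import Fraction as F

def A(pair):
    """A-process: (k, l) -> (k/(2k+2), (k+l+1)/(2k+2))."""
    k, l = pair
    return (k / (2 * k + 2), (k + l + 1) / (2 * k + 2))

def B(pair):
    """B-process: (k, l) -> (l - 1/2, k + 1/2)."""
    k, l = pair
    return (l - F(1, 2), k + F(1, 2))

pair = (F(0), F(1))            # the trivial exponent pair (0, 1)
for step in ("B", "A", "A"):   # B gives (1/2, 1/2); A then gives (1/6, 2/3) and (1/14, 11/14)
    pair = B(pair) if step == "B" else A(pair)
    print(step, f"({pair[0]}, {pair[1]})")
```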
The Whole Foods in Madison used to be on University Avenue where Shorewood Boulevard empties onto it. That’s about a 7-minute drive from my house. I stopped in there all the time, just because I was driving by and I needed something that was easy to find there. Last year, Whole Foods moved over to the new Hilldale Yards development, still on University but just across Midvale. It’s now an 8-minute drive from my house. And I haven’t been there since. Somehow, everything east of Midvale is “my zone” — a place I might pass by, and go without planning to go. And anything west of Midvale is only accessible by a dedicated trip.
By the way, the old Whole Foods location is now Fresh Mart, an international (but mostly Middle Eastern and Eastern European) grocery store. It’s awesome to have an international store the size of a Whole Foods but it is really hard to imagine how they’re making rent on a space that big. So please go there, buy some actually sharp feta and freekeh and ajvar and marinated sprats in a can! They have an extremely multicultural hot bar with great grape leaves and a coffeeshop with multiple kinds of baklava! I’m glad it’s in my zone.
The word “abstemious” is one that usually comes to mind only because (apparently sharing this feature only with “facetious”) it has all five English vowels, once each, in alphabetical order. I had always assumed it derived from the word “abstain.” But no! “Abstain” is the Latin negative prefix ab- followed by tenere, “to hold” — you don’t hold on to what you abstain from. But to be abstemious is specifically to abstain from alcohol. What comes after ab is actually temetum, a Latin word whose ultimate origin is the proto-Indo-European temH, or “darkness” (i.e. what you experience when you drink too much temetum.)
Anyway, no better time to listen to EBN OZN’s wondrous one hit, “AEIOU Sometimes Y,” which is apparently the first pop song ever recorded entirely on a computer. “Do I want to go out?”
The organizers invited three Preskillites to present talks in John’s honor: Hoi-Kwong Lo, who’s helped steer quantum cryptography and communications; Daniel Gottesman, who’s helped lay the foundations of quantum error correction; and me. I believe that one of the most fitting ways to honor John is by sharing the most exciting physics you know of. I shared about quantum thermodynamics for (simple models of) nuclear physics, along with ten lessons I learned from John. You can watch the talk here and check out the paper, recently published in Physical Review Letters, for technicalities.
John has illustrated this lesson by wrestling with the black-hole-information paradox, including alongside Stephen Hawking. Quantum information theory has informed quantum thermodynamics, as Quantum Frontiers regulars know. Quantum thermodynamics is the study of work (coordinated energy that we can harness directly) and heat (the energy of random motion). Systems exchange heat with heat reservoirs—large, fixed-temperature systems. As I draft this blog post, for instance, I’m radiating heat into the frigid air in Montreal Trudeau Airport.
So much for quantum information. How about high-energy physics? I’ll include nuclear physics in the category, as many of my European colleagues do. Much of nuclear physics and condensed matter involves gauge theories. A gauge theory is a model that contains more degrees of freedom than the physics it describes. Similarly, a friend’s description of the CN Tower could last twice as long as necessary, due to redundancies. Electrodynamics—the theory behind light bulbs—is a gauge theory. So is quantum chromodynamics, the theory of the strong force that holds together a nucleus’s constituents.
Every gauge theory obeys Gauss’s law. Gauss’s law interrelates the matter at a site to the gauge field around the site. For example, imagine a positive electric charge in empty space. An electric field—a gauge field—points away from the charge at every spot in space. Imagine a sphere that encloses the charge. How much of the electric field is exiting the sphere? The answer depends on the amount of charge inside, according to Gauss’s law.
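In symbols, the integral form of Gauss’s law for the electric field (the standard statement, included here just for reference) reads

∮_S E · dA = Q_enc / ε₀

where S is the closed surface (the sphere above), Q_enc is the total charge enclosed by S, and ε₀ is the permittivity of free space.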
Gauss’s law interrelates the matter at a site with the gauge field nearby…which is related to the matter at the next site…which is related to the gauge field farther away. So everything depends on everything else. So we can’t easily claim that over here are independent degrees of freedom that form a system of interest, while over there are independent degrees of freedom that form a heat reservoir. So how can we define the heat and work exchanged within a lattice gauge theory? If we can’t, we should start biting our nails: thermodynamics is the queen of the physical theories, a metatheory expected to govern all other theories. But how can we define the quantum thermodynamics of lattice gauge theories? My colleague Zohreh Davoudi and her group asked me this question.
I had the pleasure of addressing the question with five present and recent Marylanders…
…the mention of whom in my CQIQC talk invited…
I’m a millennial; social media took off with my generation. But I enjoy saying that my PhD advisor enjoys far more popularity on social media than I do.
How did we begin establishing a quantum thermodynamics for lattice gauge theories?
Someone who had a better idea than I, when I embarked upon this project, was my colleague Chris Jarzynski. So did Dvira Segal, a University of Toronto chemist and CQIQC’s director. So did everyone else who’d helped develop the toolkit of strong-coupling thermodynamics. I’d only heard of the toolkit, but I thought it sounded useful for lattice gauge theories, so I invited Chris to my conversations with Zohreh’s group.
I didn’t create this image for my talk, believe it or not. The picture already existed on the Internet, courtesy of this blog.
Strong-coupling thermodynamics concerns systems that interact strongly with reservoirs. System–reservoir interactions are weak, or encode little energy, throughout much of thermodynamics. For example, I exchange little energy with Montreal Trudeau’s air, relative to the amount of energy inside me. The reason is, I exchange energy only through my skin. My skin forms a small fraction of me because it forms my surface. My surface is much smaller than my volume, which is proportional to the energy inside me. So I couple to Montreal Trudeau’s air weakly.
My surface would be comparable to my volume if I were extremely small—say, a quantum particle. My interaction with the air would encode loads of energy—an amount comparable to the amount inside me. Should we count that interaction energy as part of my energy or as part of the air’s energy? Could we even say that I existed, and had a well-defined form, independently of that interaction energy? Strong-coupling thermodynamics provides a framework for answering these questions.
Kevin Kuns, a former Quantum Frontiers blogger, described how John explains physics through simple concepts, like a ball attached to a spring. John’s gentle, soothing voice resembles a snake charmer’s, Kevin wrote. John charms his listeners into returning to their textbooks and brushing up on basic physics.
Little is more basic than the first law of thermodynamics, synopsized as energy conservation. The first law governs how much a system’s internal energy changes during any process. The energy change equals the heat absorbed, plus the work absorbed, by the system. Every formulation of thermodynamics should obey the first law—including strong-coupling thermodynamics.
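In symbols, writing ΔU for the change in the system’s internal energy, Q for the heat absorbed, and W for the work absorbed (the sign convention used in the sentence above):

ΔU = Q + W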
Which lattice-gauge-theory processes should we study, armed with the toolkit of strong-coupling thermodynamics? My collaborators and I implicitly followed
and
We don’t want to irritate experimentalists by asking them to run difficult protocols. Tom Rosenbaum, on the left of the previous photograph, is a quantum experimentalist. He’s also the president of Caltech, so John has multiple reasons to want not to irritate him.
Quantum experimentalists have run quench protocols on many quantum simulators, or special-purpose quantum computers. During a quench protocol, one changes a feature of the system quickly. For example, many quantum systems consist of particles hopping across a landscape of hills and valleys. One might flatten a hill during a quench.
We focused on a three-step quench protocol: (1) Set the system up in its initial landscape. (2) Quickly change the landscape within a small region. (3) Let the system evolve under its natural dynamics for a long time. Step 2 should cost work. How can we define the amount of work performed? By following
John wrote a blog post about how the typical physicist is a one-trick pony: they know one narrow subject deeply. John prefers to know two subjects. He can apply insights from one field to the other. A two-trick pony can show that Gauss’s law behaves like a strong interaction—that lattice gauge theories are strongly coupled thermodynamic systems. Using strong-coupling thermodynamics, the two-trick pony can define the work (and heat) exchanged within a lattice gauge theory.
An experimentalist can easily measure the amount of work performed,1 we expect, for two reasons. First, the experimentalist need measure only the small region where the landscape changed. Measuring the whole system would be tricky, because it’s so large and it can contain many particles. But an experimentalist can control the small region. Second, we proved an equation that should facilitate experimental measurements. The equation interrelates the work performed1 with a quantity that seems experimentally accessible.
My team applied our work definition to a lattice gauge theory in one spatial dimension—a theory restricted to living on a line, like a caterpillar on a thin rope. You can think of the matter as qubits2 and the gauge field as more qubits. The system looks identical if you flip it upside-down; that is, the theory has a Z₂ symmetry. The system has two phases, analogous to the liquid and ice phases of H₂O. Which phase the system occupies depends on the chemical potential—the average amount of energy needed to add a particle to the system (while the system’s entropy, its volume, and more remain constant).
My coauthor Connor simulated the system numerically, calculating its behavior on a classical computer. During the simulated quench process, the system began in one phase (like H₂O beginning as water). The quench steered the system around within the phase (as though changing the water’s temperature) or across the phase transition (as though freezing the water). Connor computed the work performed during the quench.1 The amount of work changed dramatically when the quench started steering the system across the phase transition.
Not only could we define the work exchanged within a lattice gauge theory, using strong-coupling quantum thermodynamics. Also, that work signaled a phase transition—a large-scale, qualitative behavior.
What future do my collaborators and I dream of for our work? First, we want an experimentalist to measure the work1 spent on a lattice-gauge-theory system in a quantum simulation. Second, we should expand our definitions of quantum work and heat beyond sudden-quench processes. How much work and heat do particles exchange while scattering in particle accelerators, for instance? Third, we hope to identify other phase transitions and macroscopic phenomena using our work and heat definitions. Fourth—most broadly—we want to establish a quantum thermodynamics for lattice gauge theories.
Five years ago, I didn’t expect to be collaborating on lattice gauge theories inspired by nuclear physics. But this work is some of the most exciting I can think of to do. I hope you think it exciting, too. And, more importantly, I hope John thought it exciting in Toronto.
I was a student at Caltech during “One Entangled Evening,” the campus-wide celebration of Richard Feynman’s 100th birthday. So I watched John sing and dance onstage, exhibiting no fear of embarrassing himself. That observation seemed like an appropriate note on which to finish with my slides…and invite questions from the audience.
I recently wrote about how the Parker Solar Probe crossed the Sun’s ‘Alfvén surface’: the surface outside which the outflowing solar wind becomes supersonic.
This is already pretty cool—but even better, the ‘sound’ here is not ordinary sound: it consists of vibrations in both the hot electrically conductive plasma of the Sun’s atmosphere and its magnetic field! These vibrations are called ‘Alfvén waves’.
To understand these waves, we need to describe how an electrically conductive fluid interacts with the electric and magnetic fields. We can do this—to a reasonable approximation—using the equations of ‘magnetohydrodynamics’:
These equations also describe other phenomena we see in stars, outer space, fusion reactors, the Earth’s liquid outer core, and the Earth’s upper atmosphere: the ionosphere, and above that the magnetosphere. These phenomena can be very difficult to understand, combining all the complexities of turbulence with new features arising from electromagnetism. Here’s an example, called the Orszag–Tang vortex, simulated by Philip Mocz:
I’ve never studied magnetohydrodynamics—I was afraid that if I learned a little, I’d never stop, because it’s endlessly complex. Now I’m getting curious. But all I want to do today is explain to you, and myself, what the equations of magnetohydrodynamics are—and where they come from.
To get these equations, we assume that our system is described by some time-dependent fields on 3-dimensional space, or whatever region of space our fluid occupies. I’ll write vector fields in boldface and scalar fields in non-bold:
• the velocity v of our fluid,
• the density field ρ of our fluid,
• the pressure field P of our fluid,
• the electric current J,
• the electric field E,
• the magnetic field B.
You may have noticed one missing: the charge density! We assume that this is zero, because in a highly conductive medium the positive and negative charges quickly even out unless the electric field is changing very rapidly. (When this assumption breaks down we use more complicated equations.)
So, we start with Maxwell’s equations, but with the charge density set to zero:

∇ · E = 0,   ∇ · B = 0,   ∇ × E = −∂B/∂t,   ∇ × B = μ₀ J + μ₀ε₀ ∂E/∂t

I’m writing them in SI units, where they include two constants: the permittivity of free space ε₀ and the permeability of free space μ₀.

The product of these two constants is the reciprocal of the square of the speed of light:

ε₀ μ₀ = 1/c²

This makes the last term Maxwell added to his equations very small unless the electric field is changing very rapidly:

μ₀ε₀ ∂E/∂t = (1/c²) ∂E/∂t

In magnetohydrodynamics we assume the electric field is changing slowly, so we drop this last term, getting a simpler equation:

∇ × B = μ₀ J

We also assume a sophisticated version of Ohm’s law: the electric current is proportional to the force on the charges at that location. But here the force involves not only the electric field but the magnetic field! So, it’s given by the Lorentz force law, namely

E + v × B

where notice we’re assuming the velocity of the charges is the fluid velocity v. Thus we get

J = σ (E + v × B)

where the proportionality constant σ is the conductivity of the fluid.
Next we assume local conservation of mass: the increase (or decrease) of the fluid’s density at some point can only be caused by fluid flowing toward (or away from) that point. So, the time derivative of the density is minus the divergence of the momentum density ρv:

∂ρ/∂t = −∇ · (ρ v)
This is analogous to the equation describing local conservation of charge in electromagnetism: the so-called continuity equation.
We also assume that the pressure is some function of the density:

P = f(ρ)

This is called the equation of state of our fluid. The function f depends on the fluid: for example, for an ideal gas P is simply proportional to ρ. We can use the equation of state to eliminate P and work with ρ.

Last—but not least—we need an equation of motion saying how the fluid’s velocity changes with time! This equation follows from Newton’s law F = ma, or more precisely his actual law

F = dp/dt

where p is momentum. But we need to replace p by the momentum density ρv and replace F by the force density: the force per unit volume.

The force density comes in three parts:

• The force of the magnetic field on the current, as described by the Lorentz force law:

J × B

• The force caused by the gradient of pressure, pointing toward regions of lower pressure:

−∇P

• The force caused by viscosity, where faster bits of fluid try to speed up their slower neighbors, and vice versa:

μ ∇²v

Putting these together, the equation of motion says

ρ Dv/Dt = J × B − ∇P + μ ∇²v

Here we aren’t using just the ordinary time derivative of v: we want to keep track of how v is changing for a bit of fluid that’s moving along with the flow of the fluid, so we need to add in the derivative of v in the v direction. For this we use the material derivative:

Dv/Dt = ∂v/∂t + (v · ∇) v
which also has many other names like ‘convective derivative’ or ‘substantial derivative’.
So, those are the equations of magnetohydrodynamics! Let’s see them in one place. I’ll use the equation of state to eliminate the pressure by writing it as a function of density:
MAGNETOHYDRODYNAMICS
Simplified Maxwell equations:

∇ · E = 0,   ∇ · B = 0,   ∇ × E = −∂B/∂t,   ∇ × B = μ₀ J

Ohm’s law:

J = σ (E + v × B)

Local conservation of mass:

∂ρ/∂t = −∇ · (ρ v)

Equation of motion:

ρ ( ∂v/∂t + (v · ∇) v ) = J × B − ∇f(ρ) + μ ∇²v
Notice that in our simplified Maxwell equations, two terms involving the electric field are gone. That’s why these are called the equations of magnetohydrodynamics. You can even eliminate the current J from these equations, replacing it with (1/μ₀) ∇ × B. The magnetic field reigns supreme!
Magnetic diffusion
It feels unsatisfying to quit right after I show you the equations of magnetohydrodynamics. Having gotten this far, I can’t resist showing you a couple of cool things we can do with these equations!
First, we can use Ohm’s law to see how the magnetic field tends to ‘diffuse’, like heat spreads out through a medium.
We start with Ohm’s law:

J = σ (E + v × B)

We take the curl of both sides:

∇ × J = σ ( ∇ × E + ∇ × (v × B) )

We can get every term here to involve B if we use two of our simplified Maxwell equations:

∇ × E = −∂B/∂t   and   ∇ × B = μ₀ J

We get this:

(1/μ₀) ∇ × (∇ × B) = σ ( −∂B/∂t + ∇ × (v × B) )

Then we can use this vector identity:

∇ × (∇ × B) = ∇(∇ · B) − ∇²B

Since another of the Maxwell equations says ∇ · B = 0, we get

∇ × (∇ × B) = −∇²B

and thus

−(1/μ₀) ∇²B = σ ( −∂B/∂t + ∇ × (v × B) )

and finally we get
THE MAGNETIC DIFFUSION EQUATION
∂B/∂t = η ∇²B + ∇ × (v × B),   where   η = 1/(μ₀ σ)

Except for the last term, this is the heat equation—but not for temperature, which is a scalar field, but for the magnetic field, which is a vector field! The constant η says how fast the magnetic field spreads out, so it’s called the magnetic diffusivity.
The last term makes things more complicated and interesting.
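To see the diffusive behavior concretely, here is a toy numerical sketch of my own (not from the derivation above): it evolves a one-dimensional field profile B(x, t) under the pure diffusion part of the equation, with the fluid at rest so the ∇ × (v × B) term vanishes, using a simple explicit finite-difference scheme.

```python
# Toy 1D magnetic diffusion: dB/dt = eta * d^2B/dx^2, fluid at rest (no v x B term).
# Explicit finite differences with periodic boundaries; eta*dt/dx^2 <= 1/2 for stability.
import numpy as np

nx, L = 200, 1.0
dx = L / nx
eta = 1e-3                        # magnetic diffusivity, arbitrary units
dt = 0.4 * dx ** 2 / eta          # respects the stability bound
x = np.linspace(0.0, L, nx, endpoint=False)

B = np.exp(-((x - 0.5) / 0.05) ** 2)   # initial field: a narrow bump

for _ in range(2000):
    laplacian = (np.roll(B, -1) - 2 * B + np.roll(B, 1)) / dx ** 2
    B = B + dt * eta * laplacian        # the bump spreads out and flattens

print(f"peak field after diffusion: {B.max():.3f} (started at 1.000)")
```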
Magnetic pressure and tension
Second, and finally, I want to give you more intuition for how the magnetic field exerts a force on the conductive fluid in magnetohydrodynamics. We’ll see that the magnetic field has both pressure and tension.
Remember, the magnetic field exerts a force

J × B

More precisely this is the force per volume: the magnetic force density. We can express this solely in terms of the magnetic field using one of our simplified Maxwell equations, ∇ × B = μ₀ J, which turns it into

(1/μ₀) (∇ × B) × B

and a standard vector calculus identity then rewrites this as

(1/μ₀) (B · ∇) B − ∇( B²/(2μ₀) )
The point of all these manipulations was not merely to revive our flagging memories of vector calculus—though that was good too. The real point is that they reveal that the magnetic force density consists of two parts, each with a nice physical interpretation.
Of course the force acts on the fluid, not on the magnetic field lines themselves. But as you dig deeper into magnetohydrodynamics, you’ll see that sometimes the magnetic field lines get ‘frozen in’ to the fluid—that is, they get carried along by the fluid flow, like bits of rope in a raging river. Then the first term above tends to straighten out these field lines, while the second term tends to push them apart!
The second term is minus the gradient of

B²/(2μ₀)

Since minus the gradient of ordinary pressure creates a force density on a fluid, we call the quantity above the magnetic pressure. Yes, the square of the magnitude of the magnetic field creates a kind of pressure! It pushes the fluid like this:

The first term, (1/μ₀)(B · ∇)B, is the magnetic tension force. Its magnitude is B²/(μ₀ R), where R is the radius of curvature of the circle that best fits the magnetic field lines at the given point. And it points toward the center of that circle!
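For a sense of scale, here is a quick numerical illustration of my own, with round field strengths chosen only for illustration: the magnetic pressure B²/(2μ₀) in pascals.

```python
# Magnetic pressure P_mag = B^2 / (2 mu_0), in pascals, for a few field strengths.
from math import pi

mu0 = 4 * pi * 1e-7               # vacuum permeability, in T·m/A
for B in (1e-4, 1e-2, 1.0):       # field strengths in tesla (illustrative round numbers)
    P_mag = B ** 2 / (2 * mu0)
    print(f"B = {B:.0e} T  ->  magnetic pressure ~ {P_mag:.2e} Pa")
```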
I’m getting tired of proving formulas, so I’ll leave the proofs of these facts as puzzles. If you get stuck and want a hint, go here.
There’s a lot more to say about this, and you can find a lot of it here:
I might or might not say more—but if I don’t, these notes will satisfy your curiosity about exactly when magnetic field lines get ‘frozen in’ to the motion of an electrically conductive fluid, and how magnetic pressure then tends to push these field lines apart, while magnetic tension tends to straighten them out!
All of this is very cool, and I think this subject is a place where all the subtler formulas of vector calculus really get put to good use.
Wow! Biologists seem to have discovered an entirely new kind of life form. They’re called ‘obelisks’, and you probably have some in you.
They were discovered in 2024—not by somebody actually seeing one, but by analyzing huge amounts of genetic data from the human gut. This search found 29,959 new RNA sequences, similar to each other, but very different from any previously known. Thus, we don’t know where these things fit into the tree of life!
Biologists found them when they were trying to solve a puzzle. Even smaller than viruses, there exist ‘viroids’ that are just loops of RNA that cleverly manage to reproduce using the machinery of the cell they infect. Viruses have a protein coat. Viroids are just bare RNA—it doesn’t even code for any proteins!
But all known viroids only infect plants. The first one found causes a disease in potatoes; another causes a disease in avocados, and so on. This raised the puzzle: why aren’t there viroids that infect bacteria, or animals?
Now perhaps we’ve found them! But not quite: while obelisks may work in a similar way, they seem genetically unrelated. Also, their RNA seems to code for two proteins.
Given how little we know about this stuff, I think some caution is in order. Still, this is really cool. Do any of you biologists out there know any research going on now to learn more?
The original paper is free to read on the bioRxiv:
• Ivan N. Zheludev, Robert C. Edgar, Maria Jose Lopez-Galiano, Marcos de la Peña, Artem Babaian, Ami S. Bhatt, Andrew Z. Fire, Viroid-like colonists of human microbiomes, Cell 187 (2024), 6521–6536.e18.
I see just one other paper, about an automated system for detecting obelisks:
We recently got back from a family trip to Mexico, a country I’d almost never been to (a couple of childhood day trips to Nogales, a walk across the border into Juárez in 1999, a day and a half in Cabo San Lucas giving a lecture.) I’m a fan! Some surprises:
1. Drinking pulque under a highway bridge with our family (part of the food tour!), I heard this incredible banger playing on the radio:
“Reaktorn läck i Barsebäck” doesn’t sound like the name of a Mexican song, and indeed, this is a 1980 album track by Swedish raggare rocker Eddie Meduza, which, for reasons that seem to be completely unknown, is extremely popular in Mexico, where it is (for reasons that, etc etc) known as “Himno a la Banda.”
2. Mexico had secession movements. Yucatán was an independent state in the 1840s. (One of our guides, who moved there from Mexico City, told us it still feels like a different country.) And a Maya revolt in the peninsula created a de facto independent state for decades in the second half of the 19th century. Apparently the perceived weakness of the national government was one cause of the Pastry War.
3. Partly as a result of the above, sisal plantation owners in Yucatán were short of indentured workers, so they imported 1,000 desperate Korean workers in 1905. By the time their contracts ended, independent Korea had been overrun by Japan. So they stayed in Mexico and their descendants still live there today.
4. It is customary in Mexico, or at least in Mexico City, to put a coat rack right next to the table in a restaurant. I guess that makes sense! Why wouldn’t you want your coat right there, and isn’t it nicer for it to be on a rack than hung over the back of your chair?
5. The torta we usually see in America, on a big round roll, is a Central Mexico torta. In Yucatán, a torta is served on French bread, because the filling is usually cochinita pibil, which is so juicy it would make a soft roll soggy. A crusty French roll can soak up the juices without losing its structural integrity. The principle is essentially the same as that of an Italian beef.
After living in Japan for about four months, we left in mid-December. We miss it already.
One of the pleasures we discovered is the onsen, or hot spring. The word originally referred to the natural volcanic springs themselves, and the villages around them, but there are now onsens all over Japan. Many hotels have an onsen, and most towns will have several. Some people still use them as their primary bath and shower for keeping clean. (Outside of actual volcanic locations, these are technically sento rather than onsen.) You don’t actually wash yourself in the hot baths themselves; they are just for soaking, and there are often several, at different temperatures, mineral content, indoor and outdoor locations, whirlpools and even “electric baths” with muscle-stimulating currents. For actual cleaning, there is a bank of hand showers, usually with soap and shampoo. Some can be very basic, some much more like a posh spa, with massages, saunas, and a restaurant.
Our favourite, about 25 minutes away by bicycle, was Kirari Onsen Tsukuba. When not traveling, we tried to go every weekend, spending a day soaking in the hot water, eating the good food, staring at the gardens, snacking on Hokkaido soft cream — possibly the best soft-serve ice cream in the world (sorry, Carvel!), and just enjoying the quiet and peace. Even our seven- and nine-year old girls have found the onsen spirit, calming and quieting themselves down for at least a few hours.
Living in Tsukuba, lovely but not a common tourist destination, although with plenty of foreigners due to the constellation of laboratories and universities, we were often one of only one or two western families in our local onsen. It sometimes takes Americans (and those from other buttoned-up cultures) some time to get used to the sex-segregated but fully naked policy of the baths themselves. The communal areas, however, are mixed, and fully-clothed. In fact, many hotels and fancier onsen facilities supply a jinbei, a short-sleeve pyjama set in which you can softly pad around the premises during your stay. (I enjoyed wearing jinbei so much that I purchased a lightweight cotton set for home, and am also trying to get my hands on samue, a somewhat heavier style of traditional Japanese clothing.)
And my newfound love for the onsen is another reason not to get a tattoo beyond the sagging flesh and embarrassment of my future self: in Japan, tattoos are often a symbol of the yakuza, and are strictly forbidden in the onsen, even for foreigners.
Later in our sabbatical, we will be living in the Netherlands, which also has a good public bath culture, but it will be hard to match the calm of the Japanese onsen.
Thanks to everyone who made all those kind remarks in various places last month after my mother died. I've not responded individually (I did not have the strength) but I did read them all and they were deeply appreciated. Yesterday would’ve been mum’s 93rd birthday. A little side-note occurred to me the other day: Since she left us a month ago, she was just short of having seen two perfect square years. (This year and 1936.) Anyway, still on the theme of playing with numbers, my siblings and I agreed that as a tribute to her on the day, we would all do some kind of outdoor activity for 93 minutes. Over in London, my brother and sister did a joint (probably chilly) walk together in Regents Park and surrounds. I decided to take out a piece of the afternoon at low tide and run along the beach. It went pretty well, [...]
As a French citizen I should probably disavow the following post and remind myself that I have access to some of the best food in the world. Yet it's impossible to forget the tastes of your childhood. And indeed there are lots of British things that are difficult or very expensive to get hold of in France. Some of them (Marmite, Branston pickle ...) I can import via occasional trips across the channel, or in the luggage of visiting relatives. However, since Brexit this no longer works for fresh food like bacon and sausages. This is probably a good thing for my health, but every now and then I get a hankering for a fry-up or a bacon butty, and as a result of their rarity these are amongst the favourite breakfasts of my kids too. So I've learnt how to make bacon and sausages (it turns out that boudin noir is excellent with a fry-up and I even prefer it to black pudding).
Sausages are fairly labour-intensive, but after about an hour or so's work it's possible to make one or two kilos worth. Back bacon, on the other hand, takes three weeks to make one batch, and I thought I'd share the process here.
1. Cut of meat
The first thing is to get the right piece of pork, since animals are divided up differently in different countries. I've made bacon several times now and keep forgetting which instructions I previously gave to the butcher at my local Grand Frais ... Now I have settled on asking for a carré de porc, and when they (nearly always) tell me that they don't have that in I ask for côtes de porc première in one whole piece, and try to get them to give me a couple of kilos. As you can find on wikipedia, I need the same piece of meat used to make pork chops. I then ask them to remove the spine, but it should still have the ribs. So I start with this:
2. Cure
Next the meat has to be cured for 10 days (I essentially follow the River Cottage recipe). I mix up a 50-50 batch of PDV salt and brown sugar (1 kg in total here), and add some pepper, juniper berries and bay leaves:
Notice that this doesn't include any nitrites or nitrates. I have found that nitrates/nitrites are essential for the flavour in sausages, but in bacon the only thing that they will do (other than be a carcinogen) as far as I can tell is make the meat stay pink when you cook it. I can live without that. This cure makes delicious bacon as far as I'm concerned.
The curing process involves applying 1/10th of the mixture each day for ten days and draining off the liquid produced at each step. After the first coating it looks like this:
The salt and sugar remove water from the meat, and penetrate into it, preserving it. Each day I get liquid at the bottom, which I drain off and apply the next cure. After one day it looks like this:
This time I still had liquid after 10 days:
3. Drying
After ten days, I wash/wipe off the cure and pat it down with some vinegar. If you leave cure on the meat it will be much too salty (and, to be honest, this cure always gives quite salty bacon). So at this point it looks like this:
I then cover the container with a muslin that has been doused with a bit more vinegar, and leave in the fridge (at first) and then in the garage (since it's nice and cold this time of year) for ten days or so. This part removes extra moisture. It's possible that there will be small amounts of white mould that appear during this stage, but these are totally benign: you only have to worry if it starts to smell or you get blue/black mould, but this never happened to me so far.
4. Smoking
After the curing/drying, the bacon is ready to eat and should in principle keep almost indefinitely. However, I prefer smoked bacon, so I cold smoke it. This involves sticking it in a smoker (essentially just a box where you can suspend the meat above some smouldering sawdust) for several hours:
The sawdust is beech wood and slowly burns round in the little spiral device you can see above. Of course, I close the smoker up and usually put it in the shed to protect against the elements:
5. All done!
And then that's it! Delicious back bacon that really doesn't take very long to eat:
As I mentioned above, it's usually still a bit salty, so when I slice it to cook I put the pieces in water for a few minutes before grilling/frying:
Here you see that the colour is just like frying pork chops ... but the flavour is exactly right!
During Christmas holidays I tend to indulge in online chess playing a bit too much, wasting several hours a day that could be used to get back on track with the gazillion research projects I am currently trying to keep pushing. But at times it gives me pleasure, when I conceive some good tactical sequence. Take the position below, from a 5' game on chess.com today. White has obtained a winning position, but can you win it with the clock ticking? (I have less than two minutes left for the rest of the game...)
In Kenneth Grahame’s 1908 novel The Wind in the Willows, a Mole meets a Water Rat who lives on a River. The Rat explains how the River permeates his life: “It’s brother and sister to me, and aunts, and company, and food and drink, and (naturally) washing.” As the River plays many roles in the Rat’s life, so does Carnot’s theorem play many roles in a thermodynamicist’s.
Nicolas Léonard Sadi Carnot lived in France during the turn of the 19th century. His father named him Sadi after the 13th-century Persian poet Saadi Shirazi. Said father led a colorful life himself,1 working as a mathematician, engineer, and military commander for and before the Napoleonic Empire. Sadi Carnot studied in Paris at the École Polytechnique, whose members populate a “Who’s Who” list of science and engineering.
As Carnot grew up, the Industrial Revolution was humming. Steam engines were producing reliable energy on vast scales; factories were booming; and economies were transforming. France’s old enemy Britain enjoyed two advantages. One consisted of inventors: Englishmen Thomas Savery and Thomas Newcomen invented the steam engine. Scotsman James Watt then improved upon Newcomen’s design until rendering it practical. Second, northern Britain contained loads of coal that industrialists could mine to power her engines. France had less coal. So if you were a French engineer during Carnot’s lifetime, you should have cared about engines’ efficiencies—how effectively engines used fuel.2
Carnot proved a fundamental limitation on engines’ efficiencies. His theorem governs engines that draw energy from heat—rather than from, say, the motional energy of water cascading down a waterfall. In Carnot’s argument, a heat engine interacts with a cold environment and a hot environment. (Many car engines fall into this category: the hot environment is burning gasoline. The cold environment is the surrounding air into which the car dumps exhaust.) Heat flows from the hot environment to the cold. The engine siphons off some heat and converts it into work. Work is coordinated, well-organized energy that one can directly harness to perform a useful task, such as turning a turbine. In contrast, heat is the disordered energy of particles shuffling about randomly. Heat engines transform random heat into coordinated work.
In The Wind in the Willows, Toad drives motorcars likely powered by internal combustion, rather than by a steam engine of the sort that powered the Industrial Revolution.
An engine’s efficiency is the bang we get for our buck—the upshot we gain, compared to the cost we spend. Running an engine costs the heat that flows between the environments: the more heat flows, the more the hot environment cools, so the less effectively it can serve as a hot environment in the future. An analogous statement concerns the cold environment. So a heat engine’s efficiency is the work produced, divided by the heat spent.
Carnot upper-bounded the efficiency achievable by every heat engine of the sort described above. Let $T_{\rm C}$ denote the cold environment’s temperature; and $T_{\rm H}$, the hot environment’s. The efficiency can’t exceed $1 - T_{\rm C}/T_{\rm H}$. What a simple formula for such an extensive class of objects! Carnot’s theorem governs not only many car engines (Otto engines), but also the Stirling engine that competed with the steam engine, its cousin the Ericsson engine, and more.
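To spell the bound out, writing $W$ for the work output and $Q_{\rm H}$ for the heat drawn from the hot environment (notation of my choosing), the efficiency and Carnot’s cap read

$$\eta \;=\; \frac{W}{Q_{\rm H}} \;\le\; 1 - \frac{T_{\rm C}}{T_{\rm H}}.$$

For illustration, taking $T_{\rm C} = 300$ K (roughly room temperature) and $T_{\rm H} = 500$ K gives a cap of $1 - 300/500 = 0.4$: at most 40% of the heat drawn from the hot environment can become work.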
In addition to generality and simplicity, Carnot’s bound boasts practical and fundamental significance. Capping engine efficiencies caps the output one can expect of a machine, factory, or economy. The cap also keeps engineers from wasting their time daydreaming about more-efficient engines.
More fundamentally than these applications, Carnot’s theorem encapsulates the second law of thermodynamics. The second law helps us understand why time flows in only one direction. And what’s deeper or more foundational than time’s arrow? People often cast the second law in terms of entropy, but many equivalent formulations express the law’s contents. The formulations share a flavor often synopsized with “You can’t win.” Just as we can’t grow younger, we can’t beat Carnot’s bound on engines.
One might expect no engine to achieve the greatest efficiency imaginable: $1 - T_{\rm C}/T_{\rm H}$, called the Carnot efficiency. This expectation is incorrect in one way and correct in another. Carnot did design an engine that could operate at his eponymous efficiency: an eponymous engine. A Carnot engine can manifest as the thermodynamicist’s favorite physical system: a gas in a box topped by a movable piston. The gas undergoes four strokes, or steps, to perform work. The strokes form a closed cycle, returning the gas to its initial conditions.3
Steampunk artist Todd Cahill beautifully illustrated the Carnot cycle for my book. The gas performs useful work because a weight sits atop the piston. Pushing the piston upward, the gas lifts the weight.
The gas expands during stroke 1, pushing the piston and so outputting work. Maintaining contact with the hot environment, the gas remains at the temperature $T_{\rm H}$. The gas then disconnects from the hot environment. Yet the gas continues to expand throughout stroke 2, lifting the weight further. Forfeiting energy, the gas cools. It ends stroke 2 at the temperature $T_{\rm C}$.
The gas contacts the cold environment throughout stroke 3. The piston pushes on the gas, compressing it. At the end of the stroke, the gas disconnects from the cold environment. The piston continues compressing the gas throughout stroke 4, performing more work on the gas. This work warms the gas back up to $T_{\rm H}$.
In summary, Carnot’s engine begins hot, performs work, cools down, has work performed on it, and warms back up. The gas performs more work on the piston than the piston performs on it.
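Here is a minimal numerical sketch of that bookkeeping, mine rather than Carnot’s or the book’s, for an ideal gas run through the four strokes; the gas properties, temperatures, and volumes are made-up illustrative values.

```python
import numpy as np

R = 8.314          # gas constant, J/(mol K)
n = 1.0            # moles of ideal gas (illustrative)
gamma = 5.0 / 3.0  # heat-capacity ratio of a monatomic ideal gas
T_H, T_C = 500.0, 300.0    # hot and cold temperatures, K (made-up values)
V1, V2 = 1.0e-3, 2.0e-3    # volumes bounding stroke 1, m^3 (made-up values)

# Stroke 1: isothermal expansion at T_H; the heat absorbed equals the work output.
Q_H = n * R * T_H * np.log(V2 / V1)

# Strokes 2 and 4 are adiabatic, so T * V**(gamma - 1) stays constant along them;
# that fixes the remaining two volumes of the cycle.
V3 = V2 * (T_H / T_C) ** (1.0 / (gamma - 1.0))
V4 = V1 * (T_H / T_C) ** (1.0 / (gamma - 1.0))

# Stroke 3: isothermal compression at T_C; the heat rejected equals the work done on the gas.
Q_C = n * R * T_C * np.log(V3 / V4)

W_net = Q_H - Q_C                     # net work output over one cycle
print("efficiency  :", W_net / Q_H)   # ~0.4
print("Carnot bound:", 1.0 - T_C / T_H)
```

The two printed numbers agree: run quasistatically, this cycle saturates the bound, but only because each stroke is taken infinitely slowly, which is exactly the power sacrifice described next.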
At what cost, if the engine operates at the Carnot efficiency? The engine mustn’t waste heat. One wastes heat by roiling up the gas unnecessarily—by expanding or compressing it too quickly. The gas must stay in equilibrium, a calm, quiescent state. One can keep the gas quiescent only by running the cycle infinitely slowly. The cycle will take an infinitely long time, outputting zero power (work per unit time). So one can achieve the perfect efficiency only in principle, not in practice, and only by sacrificing power. Again, you can’t win.
Efficiency trades off with power.
Carnot’s theorem may sound like the Eeyore of physics, all negativity and depression. But I view it as a companion and backdrop as rich, for thermodynamicists, as the River is for the Water Rat. Carnot’s theorem curbs diverse technologies in practical settings. It captures the second law, a foundational principle. The Carnot cycle provides intuition, serving as a simple example on which thermodynamicists try out new ideas, such as quantum engines. Carnot’s theorem also provides what physicists call a sanity check: whenever a researcher devises a new (for example, quantum) heat engine, they can confirm that the engine obeys Carnot’s theorem, to help confirm their proposal’s accuracy. Carnot’s theorem also serves as a school exercise and a historical tipping point: the theorem initiated the development of thermodynamics, which continues to this day.
So Carnot’s theorem is practical and fundamental, pedagogical and cutting-edge—brother and sister, and aunts, and company, and food and drink. I just wouldn’t recommend trying to wash your socks in Carnot’s theorem.
1To a theoretical physicist, working as a mathematician and an engineer amounts to leading a colorful life.
2People other than Industrial Revolution–era French engineers should care, too.
3A cycle doesn’t return the hot and cold environments to their initial conditions, as explained above.
You might have heard of the conundrum “What do you give the man who has everything?” I discovered a variation on it last October: how do you celebrate the man who studied (nearly) everything? Physicist Edwin Thompson Jaynes impacted disciplines from quantum information theory to biomedical imaging. I almost wrote “theoretical physicist,” instead of “physicist,” but a colleague insisted that Jaynes had a knack for electronics and helped design experiments, too. Jaynes worked at Washington University in St. Louis (WashU) from 1960 to 1992. I’d last visited the university in 2018, as a newly minted postdoc collaborating with WashU experimentalist Kater Murch. I’d scoured the campus for traces of Jaynes like a pilgrim seeking a saint’s forelock or humerus. The blog post “Chasing Ed Jaynes’s ghost” documents that hunt.
I found his ghost this October.
Kater and colleagues hosted the Jaynes Centennial Symposium on a brilliant autumn day when the campus’s trees were still contemplating shedding their leaves. The agenda featured researchers from across the sciences and engineering. We described how Jaynes’s legacy has informed 21st-century developments in quantum information theory, thermodynamics, biophysics, sensing, and computation. I spoke about quantum thermodynamics and information theory—specifically, incompatible conserved quantities, about which my research-group members and I have blogged many times.
Irfan Siddiqi spoke about quantum technologies. An experimentalist at the University of California, Berkeley, Irfan featured on Quantum Frontiers seven years ago. His lab specializes in superconducting qubits, tiny circuits in which current can flow forever, without dissipating. How can we measure a superconducting qubit? We stick the qubit in a box. Light bounces back and forth across the box. The light interacts with the qubit while traversing the box, in accordance with the Jaynes–Cummings model. We can’t seal any box perfectly, so some light will leak out. That light carries off information about the qubit. We can capture the light using a photodetector and infer the qubit’s state.
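For readers who want the model written down: in its standard form (my notation, not anything specific to Irfan’s talk), the Jaynes–Cummings Hamiltonian couples one light mode in the box to the two-level qubit,

$$H_{\rm JC} \;=\; \hbar\omega_{\rm c}\, a^\dagger a \;+\; \frac{\hbar\omega_{\rm q}}{2}\,\sigma_z \;+\; \hbar g\left(a^\dagger\sigma_- + a\,\sigma_+\right),$$

where $a$ annihilates a photon in the box, the $\sigma$ operators act on the qubit, $\omega_{\rm c}$ and $\omega_{\rm q}$ are the mode and qubit frequencies, and $g$ is the coupling strength. Roughly speaking, the interaction imprints the qubit’s state on the light leaking out of the box, which is what the photodetection exploits.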
Bill Bialek, too, spoke about inference. But Bill is a Princeton biophysicist, so fruit flies preoccupy him more than qubits do. A fruit fly metamorphoses from a maggot that hatches from an egg. As the maggot develops, its cells differentiate: some form a head, some form a tail, and so on. Yet all the cells contain the same genetic information. How can a head ever emerge, to differ from a tail?
A fruit-fly mother, Bill revealed, injects molecules into an egg at certain locations. These molecules diffuse across the egg, triggering the synthesis of more molecules. The knock-on molecules’ concentrations can vary strongly across the egg: a maggot’s head cells contain molecules at certain concentrations, and the tail cells contain the same molecules at other concentrations.
At this point in Bill’s story, I was ready to take my hat off to biophysicists for answering the question above, which I’ll rephrase here: if we find that a certain cell belongs to a maggot’s tail, why does the cell belong to the tail? But I enjoyed even more how Bill turned the question on its head (pun perhaps intended): imagine that you’re a maggot cell. How can you tell where in the maggot you are, to ascertain how to differentiate? Nature asks this question (loosely speaking), whereas human observers ask Bill’s first question.
To answer the second question, Bill recalled which information a cell accesses. Suppose you know four molecules’ concentrations: $c_1$, $c_2$, $c_3$, and $c_4$. How accurately can you predict the cell’s location? That is, what probability does the cell have of sitting at some particular site, conditioned on the $c_i$’s? That probability is large only at one site, biophysicists have found empirically. So a cell can accurately infer its position from its molecules’ concentrations.
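Schematically (my notation, not necessarily Bill’s), the inference is just Bayes’ rule: the probability that the cell sits at position $x$, given its measured concentrations, is

$$P(x \mid c_1, c_2, c_3, c_4) \;=\; \frac{P(c_1, c_2, c_3, c_4 \mid x)\,P(x)}{P(c_1, c_2, c_3, c_4)}.$$

The empirical statement above is that this posterior is sharply peaked at a single position $x$.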
I’m no biophysicist (despite minor evidence to the contrary), but I enjoyed Bill’s story as I enjoyed Irfan’s. Probabilities, information, and inference are abstract notions; yet they impact physical reality, from insects to quantum science. This tension between abstraction and concreteness arrested me when I first encountered entropy, in a ninth-grade biology lecture. The tension drew me into information theory and thermodynamics. These toolkits permeate biophysics as they permeate my disciplines. So, throughout the symposium, I spoke with engineers, medical-school researchers, biophysicists, thermodynamicists, and quantum scientists. They all struck me as my kind of people, despite our distribution across the intellectual landscape. Jaynes reasoned about distributions—probability distributions—and I expect he’d have approved of this one. The man who studied nearly everything deserves a celebration that illuminates nearly everything.
I've been very quiet here over the last couple of weeks. My mother, Delia Maria Johnson, already in hospital since 5th November or so, took a turn for the worse and began a rapid decline. She died peacefully after some days, and to be honest I’ve really not been myself since then.
There's an extra element to the sense of loss when (as it approaches) you are powerless to do anything because of being thousands of miles away. On the plus side, because of the ease of using video calls, and with the help of my sister being there, I was able to be somewhat present during what turned out to be the last moments when she was aware of people around her, and therefore was able to tell her I loved her one last time.
Rather than charging across the world on planes, trains, and in automobiles, probably being out of reach during any significant changes in the situation (the doctors said I would likely not make it in time) I did a number of things locally that I am glad I got to do.
It began with visiting (and sending a photo from) the Santa Barbara mission, a place she dearly loved and was unable to visit again after 2019, along with the pier. These are both places we walked together so much back when I first lived here in what feels like another life.
Then, two nights before mum passed away, but well after she’d seemed already beyond the reach of anyone, although perhaps (I’d like to think) still able to hear things, my sister contacted me from her bedside asking if I’d like to read mum a psalm, perhaps one of her favourites, 23 or 91. At first I thought she was already planning the funeral, and expressed my surprise at this since mum was still alive and right next to her. But I’d misunderstood, and she’d in fact had a rather great idea. After I sent on recordings of the two psalms, her suggestion turned into several hours of digging into the poetry shelf in the study and discovering long-neglected collections, through which I searched (sometimes accompanied by my wife and son) for additional things to read. I recorded some and sent them along, as well as one from my son, I’m delighted to say. Later, the whole thing turned into me singing various songs while playing my guitar and sending recordings of those along too.
Incidentally, the guitar-playing was an interesting turn of events since not many months ago I decided after a long lapse to start playing guitar again, and try to move the standard of my playing (for vocal accompaniment) to a higher level than I’d previously done, by playing and practicing for a little bit on a regular basis. I distinctly recall thinking at one point during one practice that it would be nice to play for mum, although I did not imagine that playing to her while she was on her actual death-bed would be the circumstance under which I’d eventually play for her, having (to my memory) never directly done so back when I used to play guitar in my youth. (Her overhearing me picking out bits of Queen songs behind my room door when I was a teenager doesn’t count as direct playing for her.)
My old friend Marc Weidenbaum, curator and writer of disquiet.com, reminded me, in his latest post, of the value of blogging. So, here I am (again).
Since September, I have been on sabbatical in Japan, working mostly at QUP (International Center for Quantum-field Measurement Systems for Studies of the Universe and Particles) at the KEK accelerator lab in Tsukuba, Japan, and spending time as well at the Kavli IPMU, about halfway into Tokyo from here. Tsukuba is a “science city” about 30 miles northeast of Tokyo, home to multiple Japanese scientific establishments (such as a University and a major lab for JAXA, the Japanese space agency).
Scientifically, I’ve spent a lot of time thinking and talking about the topology of the Universe, future experiments to measure the cosmic microwave background, and statistical tools for cosmology experiments. And I was honoured to be asked to deliver a set of lectures on probability and statistics in cosmology, a topic which unites most of my research interests nowadays.
Japan, and Tsukuba in particular, is a very nice place to live. It’s close enough to Tokyo for regular visits (by the rapid Tsukuba Express rail line), but quiet enough for our local transport to be dominated by cycling around town. We love the food, the Japanese schools that have welcomed our children, the onsens, and our many views of Mount Fuji.
And after almost four months in Japan, it’s beginning to feel like home.
Unfortunately, we’re leaving our short-term home in Japan this week. After a few weeks of travel in Southeast Asia, we’ll be decamped to the New York area for the rest of the Winter and early Spring. But (as further encouragement to myself to continue blogging) I’ll have much more to say about Japan — science and life — in upcoming posts.
In group meeting last week, Stefan Rankovic (NYU undergrad) presented results on a very low-amplitude possible transit in the lightcurve of a candidate long-period eclipsing binary system found in the NASA Kepler data. The weird thing is that (even though the period is very long) the transit of the possible planet looks just like the transit of the secondary star in the eclipsing binary. Like just like it, only lower in amplitude (smaller in radius).
If the transit looks identical, only lower in amplitude, it suggests that it is taking an extremely similar chord across the primary star, at the same speed, with no difference in inclination. How could that be? Well if they are moving at the same speed on the same path, maybe we have a 1:1 resonance, like a Trojan? If so, there are so many cool things about this system. It was an exciting group meeting, to be sure.
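As a back-of-the-envelope version of the “lower in amplitude, so smaller in radius” step, here is a sketch of mine with made-up transit depths (not the actual Kepler numbers): for a central-enough transit, the fractional dip scales roughly as the square of the radius ratio.

```python
import numpy as np

# Transit depth ~ (radius of transiting body / radius of primary star)**2,
# so identical transit shapes with different depths imply a radius ratio
# equal to the square root of the depth ratio. Depths below are hypothetical.
depth_secondary = 0.010    # fractional dip from the secondary star's transit
depth_candidate = 0.0004   # fractional dip from the possible planet's transit

radius_ratio = np.sqrt(depth_candidate / depth_secondary)
print(f"candidate radius ~ {radius_ratio:.2f} x secondary-star radius")
```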
I’m a baker, as you probably know. I’ve regularly made bread, cakes, pies, and all sorts of things for friends and family. About a year ago, someone in the family was diagnosed with a severe allergy to gluten, and within days we removed all gluten products from the kitchen, began … Click to continue reading this post →
The good news (following from last post) is that it worked out! I was almost short of the amount I needed to cover the pie, and so that left nothing for my usual decoration... but it was a hit at dinner and for left-overs today, so that's good!