We at the mathematics department at the University of Edinburgh are doing more and more things in conjunction with our sisters and brothers at Heriot–Watt University, also in Edinburgh. For instance, our graduate students take classes together, and about a dozen of them are members of both departments simultaneously. We’re planning to strengthen those links in the years to come.

The news is that Heriot–Watt are hiring.

They’re looking for one or more “pure” mathematicians. These are permanent jobs at any level, from most junior to most senior. There’s significant interest in category theory there, in contexts such as mathematical physics and semigroup theory — e.g. when I taught an introductory category theory course last year, there was a good bunch of participants from Heriot–Watt.

In case you were wondering, Heriot was goldsmith to the royal courts of Scotland and Denmark in the 16th century. Gold $\mapsto$ money $\mapsto$ university, apparently. Watt is the Scottish engineer James Watt, as in “60-watt lightbulb”.

Fifteen years ago, I wrote a paper entitled Global regularity of wave maps. II. Small energy in two dimensions, in which I established global regularity of wave maps from two spatial dimensions to the unit sphere, assuming that the initial data had small energy. Recently, Hao Jia (personal communication) discovered a small gap in the argument that requires a slightly non-trivial fix. The issue does not really affect the subsequent literature, because the main result has since been reproven and extended by methods that avoid the gap (see in particular this subsequent paper of Tataru), but I have decided to describe the gap and its fix on this blog.

I will assume familiarity with the notation of my paper. In Section 10, some complicated spaces are constructed for each frequency scale , and then a further space is constructed for a given frequency envelope by the formula

where the infimum is taken over all extensions of to the Minkowski spacetime ; similarly one defines

The gap in the paper is as follows: it was implicitly assumed that one could restrict (1) to the slab to obtain the equality

(This equality is implicitly used to establish the bound (36) in the paper.) Unfortunately, (1) only gives the lower bound, not the upper bound, and it is the upper bound which is needed here. The problem is that the extensions of that are optimal for computing are not necessarily the Littlewood-Paley projections of the extensions of that are optimal for computing .

To remedy the problem, one has to prove an upper bound of the form

for all Schwartz (actually we need affinely Schwartz , but one can easily normalise to the Schwartz case). Without loss of generality we may normalise the RHS to be . Thus

for each , and one has to find a single extension of such that

for each . Achieving a that obeys (4) is trivial (just extend by zero), but such extensions do not necessarily obey (5). On the other hand, from (3) we can find extensions of such that

the extension will then obey (5) (here we use Lemma 9 from my paper), but unfortunately is not guaranteed to obey (4) (the norm does control the norm, but a key point about frequency envelopes for the small energy regularity problem is that the coefficients , while bounded, are not necessarily summable).

This can be fixed as follows. For each we introduce a time cutoff supported on that equals on and obeys the usual derivative estimates in between (the time derivative of size for each ). Later we will prove the truncation estimate

Assuming this estimate, then if we set , then using Lemma 9 in my paper and (6), (7) (and the local stability of frequency envelopes) we have the required property (5). (There is a technical issue arising from the fact that is not necessarily Schwartz due to slow decay at temporal infinity, but by considering partial sums in the summation and taking limits we can check that is the strong limit of Schwartz functions, which suffices here; we omit the details for sake of exposition.) So the only issue is to establish (4), that is to say that

for all .

For this is immediate from (2). Now suppose that for some integer (the case when is treated similarly). Then we can split

where

The contribution of the term is acceptable by (6) and estimate (82) from my paper. The term sums to which is acceptable by (2). So it remains to control the norm of . By the triangle inequality and the fundamental theorem of calculus, we can bound

By hypothesis, . Using the first term in (79) of my paper and Bernstein’s inequality followed by (6) we have

and then we are done by summing the geometric series in .

It remains to prove the truncation estimate (7). This estimate is similar in spirit to the algebra estimates already in my paper, but unfortunately does not seem to follow immediately from these estimates as written, and so one has to repeat the somewhat lengthy decompositions and case checkings used to prove these estimates. We do this below the fold.

**— 1. Proof of truncation estimate —**

Firstly, by rescaling (and changing as necessary) we may assume that . By the triangle inequality and time translation invariance, it suffices to show an estimate of the form

where is a smooth time cutoff that equals on and is supported in , and all norms are understood to be on . We may normalise the right-hand side to be , thus is supported in frequencies , and by equation (79) of my paper one has the estimates

for all , and our objective is to show that

The bound (11) easily follows from (8), the Leibniz rule, and using the frequency localisation of to ignore spatial derivatives. Now we turn to (12). From the definition of the norms, we have

for all integers , and we need to show that

Fix . We can use Littlewood-Paley operators to split , where is supported on time frequencies and is supported on time frequencies . For the contribution of one can replace in (15) by (say) and the claim then follows from (14), the Leibniz rule, and Hölder’s inequality (again ignoring spatial derivatives). For the contribution of , we discard and observe that has an norm of (and its time derivative has a norm of ), so this contribution is then acceptable from (8) and Hölder’s inequality.

Finally we need to show (13). Similarly to before, we split . We also split , leaving us with the task of proving the four estimates

We begin with (16). The multiplier is disposable in the sense of the paper, and similarly if one replaces by a slightly larger multiplier; this lets us bound the left-hand side of (16) by

The time cutoff commutes with the spatial Fourier projection and can then be discarded by equation (66) of my paper. This term is thus acceptable thanks to (10).

Now we turn to (17). We can freely insert a factor of in front of . Applying estimate (75) from my paper, it then suffices to show that

From the Fourier support of the expression inside the norm, the left-hand side is bounded by

discarding the cutoff and using (9) we see that this contribution is acceptable.

Next, we show (18). Here we use the energy estimate from equation (27) (and (25)) of the paper. By repeating the proof of (11) (and using Lemma 4 from my paper) we see that

so it suffices to show that

Expanding out the d’Alembertian using the Leibniz rule, we are reduced to showing the estimates

For (20) we note that has an norm of , while from (9) has an norm of , so the claim follows from Hölder’s inequality. For (21) we can similarly observe that has an norm of while from (8) we see that has an norm of , so the claim again follows from Hölder’s inequality. A similar argument gives (22) (with an additional gain of coming from the second derivative on ).

Finally, for (19), we observe from the Fourier separation between and that we may replace by (in fact one could do a much more drastic replacement if desired). The claim now follows from repeating the proof of (18).

Why was 7.333… disgusted by 7.666…? Because 7.666… ate twenty-three turds.

— Gulyás et al., “A pair spectrometer for measuring multipolarities of energetic nuclear transitions” (description of detector; 1504.00489; NIM)

— Krasznahorkay et al., “Observation of Anomalous Internal Pair Creation in 8Be: A Possible Indication of a Light, Neutral Boson” (experimental result; 1504.01527; PRL version; note PRL version differs from arXiv)

— Feng et al., “Protophobic Fifth-Force Interpretation of the Observed Anomaly in 8Be Nuclear Transitions” (theory paper; 1604.07411; PRL)

*Editor’s note: the author is a co-author of the paper being highlighted.*

Recently there has been some press (see links below) regarding early hints of a new particle observed in a nuclear physics experiment. In this bite, we’ll summarize the result that has raised the eyebrows of some physicists, and the hackles of others.

Nuclei are bound states of protons and neutrons. They can have excited states, analogous to the excited states of atoms, which are bound states of nuclei and electrons. The particular nucleus of interest is beryllium-8, which has four protons and four neutrons and which you may know from the triple-alpha process. There are three nuclear states to be aware of: the ground state, the 18.15 MeV excited state, and the 17.64 MeV excited state.

Most of the time the excited states fall apart into a lithium-7 nucleus and a proton. But sometimes, these excited states decay into the beryllium-8 ground state by emitting a photon (γ-ray). Even more rarely, these states can decay to the ground state by emitting an electron–positron pair from a virtual photon: this is called **internal pair creation**, and it is these events that exhibit an anomaly.

Physicists at the Atomki nuclear physics institute in Hungary were studying the nuclear decays of excited beryllium-8 nuclei. The team, led by Attila J. Krasznahorkay, produced beryllium excited states by bombarding a lithium-7 nucleus with protons.

The proton beam is tuned to very specific energies so that one can ‘tickle’ specific beryllium excited states. When the protons have around 1.03 MeV of kinetic energy, they excite lithium into the 18.15 MeV beryllium state. This has two important features:

- Picking the proton energy allows one to only produce a specific excited state so one doesn’t have to worry about contamination from decays of other excited states.
- Because the 18.15 MeV beryllium nucleus is produced at *resonance*, one has a very high yield of these excited states. This is very good when looking for very rare decay processes like internal pair creation.
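The resonance condition can be checked with a quick back-of-envelope calculation (my own illustration, not from the article; the 7Li + p Q-value and the 441 keV resonance energy are standard tabulated numbers, not quoted in the text): the excitation energy of the beryllium-8 compound nucleus is the Q-value plus the fraction of the proton’s lab kinetic energy available in the center of mass.

```python
# Back-of-envelope sketch: 8Be excitation energy from proton capture on 7Li.

Q_VALUE_MEV = 17.255       # 7Li + p -> 8Be Q-value (tabulated value)
CM_FRACTION = 7.0 / 8.0    # crude mass ratio m(7Li) / (m(7Li) + m(p))

def excitation_energy(proton_ke_mev: float) -> float:
    """8Be excitation energy reached for a given proton kinetic energy (MeV)."""
    return Q_VALUE_MEV + CM_FRACTION * proton_ke_mev

print(round(excitation_energy(1.03), 2))   # ~18.16 MeV: the anomalous state
print(round(excitation_energy(0.441), 2))  # ~17.64 MeV: the other state discussed
```

The 1.03 MeV beam energy quoted above indeed lands on the 18.15 MeV state to within the accuracy of this crude estimate.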

What one *expects* is that most of the electron–positron pairs have a small opening angle, with the number of events smoothly decreasing at larger opening angles.

Instead, the Atomki team found an excess of events with large electron–positron opening angle. In fact, even more intriguing: the excess occurs around a particular opening angle (140 degrees) and forms a bump.

Here’s why a bump is particularly interesting:

- The distribution of ordinary internal pair creation events is smoothly decreasing and so this is very unlikely to produce a bump.
- Bumps can be signs of new particles: if there is a new, light particle that can facilitate the decay, one would expect a bump at an opening angle that depends on the new particle mass.

Schematically, the new particle interpretation looks like this:

As an exercise for those with a background in special relativity, one can use the relation to prove the result:

This relates the mass of the proposed new particle, *X*, to the opening angle θ and the energies *E* of the electron and positron. The opening angle bump would then be interpreted as a new particle with **mass of roughly 17 MeV**. To match the observed number of anomalous events, the rate at which the excited beryllium decays via the *X* boson must be 6×10^{-6} times the rate at which it goes into a γ-ray.
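For readers who want to carry out the exercise, here is a sketch under the simplifying assumptions that the electron mass is negligible and the pair shares the energy release roughly equally (the 9 MeV per particle below is my own illustrative input, not a number from the text):

```latex
m_X^2 = (p_{e^+} + p_{e^-})^2 \approx 2 E_+ E_- \,(1 - \cos\theta),
\qquad
m_X \approx \sqrt{2\,(9~\mathrm{MeV})(9~\mathrm{MeV})\,(1 - \cos 140^\circ)} \approx 17~\mathrm{MeV}.
```

A bump at a fixed opening angle thus translates directly into a fixed invariant mass for the hypothetical parent particle.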

The anomaly has a significance of 6.8σ. This means that it’s highly unlikely to be a statistical fluctuation, as the 750 GeV diphoton bump appears to have been. Indeed, the conservative bet would be some not-understood systematic effect, akin to the 130 GeV Fermi γ-ray line.
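As a rough illustration of what 6.8σ means (my own sketch, using only the Python standard library; this is the idealized one-sided Gaussian tail, not the experiment’s actual statistical treatment):

```python
import math

def one_sided_p(sigma: float) -> float:
    """One-sided tail probability of a standard normal at `sigma` deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

p = one_sided_p(6.8)
print(f"{p:.1e}")  # on the order of 1e-12
```

This is why a 6.8σ excess is essentially never a pure statistical fluctuation; the worry is instead an unmodelled systematic effect.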

Some physicists are concerned that beryllium may be the ‘boy that cried wolf,’ and point to papers by the late Fokke de Boer spanning 1996 to 2001. De Boer made strong claims about evidence for a new 10 MeV particle in the internal pair creation decays of the 17.64 MeV beryllium-8 excited state. These claims didn’t pan out, and in fact the instrumentation paper by the Atomki experiment rules out that original anomaly.

The proposed evidence for “de Boeron” is shown below:

When the Atomki group studied the same 17.64 MeV transition, they found that including a key background component—subdominant E1 decays from nearby excited states—dramatically improved the fit; this background was not included in the original de Boer analysis. This is the last nail in the coffin for the proposed 10 MeV “de Boeron.”

However, the Atomki group also highlights how their new anomaly in the 18.15 MeV state behaves differently. Unlike the broad excess in the de Boer result, the new excess is concentrated in a bump. There is no known way for additional internal pair creation backgrounds to produce a bump in the opening angle distribution; as noted above, all of these distributions are smoothly falling.

The Atomki group goes on to suggest that the new particle appears to fit the bill for a dark photon, a reasonably well-motivated copy of the ordinary photon that differs in its interaction strength and in having a non-zero (17 MeV?) mass.

Once the Atomki result was published and peer reviewed in Physical Review Letters, the game was afoot for theorists to understand how it would fit into a theoretical framework like the dark photon. A group from UC Irvine, University of Kentucky, and UC Riverside found that actually, dark photons have a hard time fitting the anomaly simultaneously with other experimental constraints. In the visual language of this recent ParticleBite, the situation was this:

The main reason for this is that a dark photon with the mass and interaction strength needed to fit the beryllium anomaly would necessarily have been seen by the NA48/2 experiment. This experiment looks for dark photons in the decay of neutral pions (π^{0}). These pions typically decay into two photons, but if there’s a 17 MeV dark photon around, some fraction of those decays would go into dark photon–ordinary photon pairs. The non-observation of these unique decays rules out the dark photon interpretation.

The theorists then decided to “break” the dark photon theory in order to try to make it fit. They generalized the types of interactions that a new photon-like particle, *X*, could have, allowing protons, for example, to have completely different charges than electrons rather than having exactly opposite charges. Doing this does gross violence to the theoretical consistency of a theory—but the goal was just to see what a new particle interpretation would have to look like. They found that if a new photon-like particle talked to neutrons but not protons—that is, if the new force were *protophobic*—then a theory might hold together.

*Editor’s note: what follows is for readers with some physics background interested in a technical detail; others may skip this section.*

How does a new particle that is allergic to protons avoid the neutral pion decay bounds from NA48/2? Pions decay into pairs of photons through the well-known triangle diagrams of the axial anomaly. The decay into photon–dark-photon pairs proceeds through similar diagrams. The goal is then to make sure that these diagrams cancel.

A cute way to look at this is to assume that at low energies, the relevant particles running in the loop aren’t quarks, but rather nucleons (protons and neutrons). In fact, since only the proton can talk to the photon, one only needs to consider proton loops. Thus if the new photon-like particle, *X*, doesn’t talk to protons, then there’s no diagram for the pion to decay into *γX*. This would be great if the story weren’t completely wrong.

The correct way of seeing this is to treat the pion as a quantum superposition of an up–anti-up and down–anti-down bound state, and then make sure that the *X* charges are such that the contributions of the two states cancel. The resulting charges turn out to be protophobic.
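Concretely (a sketch in my own notation, which is not used in the text: $\varepsilon_u$ and $\varepsilon_d$ are the $X$ charges of the up and down quarks, $q_u = 2/3$ and $q_d = -1/3$ their electric charges), the anomaly triangles give

```latex
\mathcal{A}(\pi^0 \to \gamma X) \;\propto\; N_c\,(q_u \varepsilon_u - q_d \varepsilon_d)
= N_c\,\frac{2\varepsilon_u + \varepsilon_d}{3} \;\propto\; \varepsilon_p,
```

since the proton’s $X$ charge is $\varepsilon_p = 2\varepsilon_u + \varepsilon_d$. The amplitude vanishes exactly in the protophobic limit $\varepsilon_p = 0$, which is why the NA48/2 bound evaporates.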

The fact that the “proton-in-the-loop” picture gives the correct charges, however, is no coincidence. Indeed, this was precisely how Jack Steinberger calculated the correct pion decay rate. The key here is whether one treats the quarks/nucleons linearly or non-linearly in chiral perturbation theory. The relation to the Wess-Zumino-Witten term—which is what really encodes the low-energy interaction—is carefully explained in chapter 6a.2 of Georgi’s revised *Weak Interactions*.

The above considerations focus on a new particle with the same spin and parity as a photon (spin-1, parity odd). Another result of the UCI study was a systematic exploration of other possibilities. They found that the beryllium anomaly could not be consistent with spin-0 particles. For a parity-even, spin-0 particle (a scalar), one cannot simultaneously conserve angular momentum and parity in the decay of the excited beryllium-8 state. (Parity violating effects are negligible at these energies.)

For a parity-odd pseudoscalar, the bounds on axion-like particles at 20 MeV suffocate any reasonable coupling. Measured in terms of the pseudoscalar–photon–photon coupling (which has dimensions of inverse GeV), this interaction is ruled out down to the inverse Planck scale.

Additional possibilities include:

- Dark *Z* bosons, cousins of the dark photon with spin-1 but indeterminate parity. This is very constrained by atomic parity violation.
- Axial vectors, spin-1 bosons with positive parity. These remain a theoretical possibility, though their unknown nuclear matrix elements make it difficult to write a predictive model. (See section II.D of 1608.03591.)

The plot thickens when one also includes results from nuclear theory. Recent results from Saori Pastore, Bob Wiringa, and collaborators point out a very important fact: the 18.15 MeV beryllium-8 state that exhibits the anomaly and the 17.64 MeV state which does not are actually closely related.

Recall (e.g. from the first figure at the top) that the 18.15 MeV and 17.64 MeV states are both spin-1 and parity-even. They differ in mass and in one other key aspect: the 17.64 MeV state carries isospin charge, while the 18.15 MeV state and ground state do not.

Isospin is the nuclear symmetry that relates protons to neutrons and is tied to electroweak symmetry in the full Standard Model. At nuclear energies, isospin charge is approximately conserved. This brings us to the following puzzle:

*If the new particle has mass around 17 MeV, why do we see its effects in the 18.15 MeV state but not the 17.64 MeV state?*

Naively, if the emitted new particle, *X*, carries no isospin charge, then isospin conservation prohibits the decay of the 17.64 MeV state through emission of an *X* boson. However, the Pastore et al. result tells us that actually, the isospin-neutral and isospin-charged states mix quantum mechanically, so that the observed 18.15 and 17.64 MeV states are mixtures of iso-neutral and iso-charged states. In fact, this mixing is actually rather large, with a mixing angle of around 10 degrees!
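Schematically (my own notation, with $\theta$ the isospin mixing angle), the physical states are two-state mixtures of the isosinglet and isotriplet levels:

```latex
|18.15\rangle = \cos\theta\,|T=0\rangle + \sin\theta\,|T=1\rangle,
\qquad
|17.64\rangle = -\sin\theta\,|T=0\rangle + \cos\theta\,|T=1\rangle,
\qquad \theta \approx 10^\circ,
```

so an $X$ boson that couples only to the isosinglet component can in principle be emitted by *both* physical states.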

The result of this is that one cannot invoke isospin conservation to explain the non-observation of an anomaly in the 17.64 MeV state. In fact, the only way to avoid this is to assume that the mass of the *X* particle is on the heavier side of the experimentally allowed range. The rate for *X* emission goes like the 3-momentum cubed (see section II.E of 1608.03591), so a small increase in the mass can suppress the rate of *X* emission by the lighter state by a lot.
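The momentum-cubed suppression is easy to see numerically (my own sketch; the level energies are the published values, the *X* masses below are hypothetical choices within the allowed range, and I crudely neglect nuclear recoil):

```python
import math

def x_momentum(level_mev: float, m_x_mev: float) -> float:
    """Crude X 3-momentum, treating the X energy as the full level energy."""
    return math.sqrt(max(level_mev**2 - m_x_mev**2, 0.0))

# Ratio of X-emission rates (17.64 MeV state vs 18.15 MeV state), rate ~ p^3.
for m_x in (16.7, 17.0, 17.4):  # hypothetical X masses in MeV
    ratio = (x_momentum(17.64, m_x) / x_momentum(18.15, m_x)) ** 3
    print(m_x, round(ratio, 3))
```

As the assumed mass creeps up toward the 17.64 MeV threshold, emission from the lighter state is suppressed much more strongly than from the heavier one, which is the loophole described above.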

The UCI collaboration of theorists went further and extended the Pastore et al. analysis to include a phenomenological parameterization of explicit isospin violation. Independent of the Atomki anomaly, they found that including isospin violation improved the fit for the 18.15 MeV and 17.64 MeV electromagnetic decay widths within the Pastore et al. formalism. The results of including all of the isospin effects end up changing the particle physics story of the Atomki anomaly significantly:

The results of the nuclear analysis are thus that:

- An interpretation of the Atomki anomaly in terms of a new particle tends to push for a slightly heavier *X* mass than the reported best fit. (*Remark: the Atomki paper does not do a combined fit for the mass and coupling, nor does it report the difficult-to-quantify systematic errors associated with the fit. This information is important for understanding the extent to which the X mass can be pushed to be heavier.*)
- The effects of isospin mixing and violation are important to include, especially as one drifts away from the purely protophobic limit.

The theoretical structure presented above gives a framework to do phenomenology: fitting the observed anomaly to a particle physics model and then comparing that model to other experiments. This, however, doesn’t guarantee that a nice—or even self-consistent—theory exists that can stretch over the scaffolding.

Indeed, a few challenges appear:

- The isospin mixing discussed above means the *X* mass must be pushed to the heavier values allowed by the Atomki observation.
- The “protophobic” limit is not obviously anomaly-free: simply asserting that known particles have arbitrary charges does not generically produce a mathematically self-consistent theory.
- Atomic parity violation constraints require that the *X* couple in the same way to left-handed and right-handed matter. The left-handed coupling implies that *X* must also talk to neutrinos: these open up new experimental constraints.

The Irvine/Kentucky/Riverside collaboration first note the need for a careful experimental analysis of the actual mass ranges allowed by the Atomki observation, treating the new particle mass and coupling as simultaneously free parameters in the fit.

Next, they observe that protophobic couplings can be relatively natural. Indeed: the Standard Model *Z* boson is approximately protophobic at low energies—a fact well known to those hunting for dark matter with direct detection experiments. For exotic new physics, one can engineer protophobia through a phenomenon called kinetic mixing where two force particles mix into one another. A tuned admixture of electric charge and baryon number, *(Q-B)*, is protophobic.

Baryon number, however, is an anomalous global symmetry—this means that one has to work hard to make a baryon-boson that mixes with the photon (see 1304.0576 and 1409.8165 for examples). Another alternative is for the photon to kinetically mix not with baryon number, but with the anomaly-free combination of “baryon-minus-lepton number,” *Q-(B-L)*. This then forces one to apply additional model-building modules to deal with the neutrino interactions that come along with this scenario.
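For reference, the kinetic mixing mechanism invoked here is the renormalizable operator (standard notation, up to sign conventions; nothing in this formula is specific to the Atomki anomaly):

```latex
\mathcal{L} \;\supset\; -\frac{\epsilon}{2}\, F_{\mu\nu} X^{\mu\nu},
```

where $F_{\mu\nu}$ and $X_{\mu\nu}$ are the field strengths of the photon and the new boson. Diagonalizing the kinetic terms shifts each fermion’s $X$ coupling by an amount proportional to $\epsilon$ times its electric charge, which is how tuned combinations like $Q - B$ can arise when $X$ also gauges baryon number.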

In the language of the ‘model building blocks’ above, the result of this process looks schematically like this:

The theory collaboration presented examples of the two cases and pointed out how the additional ‘bells and whistles’ required may tie to additional experimental handles to test these hypotheses. These are simple existence proofs for how complete models may be constructed.

We have delved rather deeply into the theoretical considerations of the Atomki anomaly. The analysis revealed some unexpected features with the types of new particles that could explain the anomaly (dark photon-like, but not exactly a dark photon), the role of nuclear effects (isospin mixing and breaking), and the kinds of features a complete theory needs to have to fit everything (be careful with anomalies and neutrinos). The *single most important next step*, however, is and has always been * experimental verification of the result*.

While the Atomki experiment continues to run with an upgraded detector, what’s really exciting is that a swath of experiments that are either ongoing or in construction will be able to probe the exact interactions required by the new particle interpretation of the anomaly. This means that the result can be independently verified or excluded within a few years. A selection of upcoming experiments is highlighted in section IX of 1608.03591:

We highlight one particularly interesting search: recently a joint team of theorists and experimentalists at MIT proposed a way for the LHCb experiment to search for dark photon-like particles with masses and interaction strengths that were previously unexplored. The proposal makes use of LHCb’s ability to pinpoint the production position of charged particle pairs and the copious number of *D* mesons produced in Run 3 of the LHC. As seen in the figure above, the LHCb reach with this search thoroughly covers the Atomki anomaly region.

So where we stand is this:

- There is an unexpected result in a nuclear experiment that may be interpreted as a sign for new physics.
- The next steps in this story are independent experimental cross-checks; the threshold for a ‘discovery’ is if another experiment can verify these results.
- Meanwhile, a theoretical framework for understanding the results in terms of a new particle has been built and is ready-and-waiting. Some of the results of this analysis are important for faithful interpretation of the experimental results.

*What if it’s nothing?*

This is the conservative take—and indeed, we may well find that in a few years, the possibility that Atomki was observing a new particle will be completely dead. Or perhaps a source of systematic error will be identified and the bump will go away. That’s part of doing science.

Meanwhile, there are some important take-aways in this scenario. First is the reminder that the search for light, weakly coupled particles is an important frontier in particle physics. Second, for this particular anomaly, there are some neat take-aways, such as a demonstration of how effective field theory can be applied to nuclear physics (see e.g. chapter 3.1.2 of the new book by Petrov and Blechman) and how tweaking our models of new particles can avoid troublesome experimental bounds. Finally, it’s a nice example of how particle physics and nuclear physics are not-too-distant cousins and how progress can be made in particle–nuclear collaborations—one of the Irvine group authors (Susan Gardner) is a bona fide nuclear theorist who was on sabbatical from the University of Kentucky.

*What if it’s real?*

This is a big “what if.” On the other hand, a 6.8σ effect is very unlikely to be a statistical fluctuation, and there is no known nuclear physics that produces a new-particle-like bump given the analysis presented by the Atomki experimentalists.

The threshold for “real” is independent verification. If other experiments can confirm the anomaly, then this could be a huge step in our quest to go beyond the Standard Model. While this type of particle is unlikely to help with the Hierarchy problem of the Higgs mass, it could be a sign for other kinds of new physics. One example is the grand unification of the electroweak and strong forces; some of the ways in which these forces unify imply the existence of an additional force particle that may be light and may even have the types of couplings suggested by the anomaly.

**Could it be related to other anomalies?**

The Atomki anomaly isn’t the first particle physics curiosity to show up at the MeV scale. While none of these other anomalies are necessarily related to the type of particle required for the Atomki result (they may not even be compatible!), it is helpful to remember that the MeV scale may still have surprises in store for us.

**The KTeV anomaly**: The rate at which neutral pions decay into electron–positron pairs appears to be off from the expectations based on chiral perturbation theory. In 0712.0007, a group of theorists found that this discrepancy could be fit to a new particle with *axial* couplings. If one fixes the mass of the proposed particle to be 20 MeV, the resulting couplings happen to be in the same ballpark as those required for the Atomki anomaly. The important caveat here is that parameters for an axial vector to fit the Atomki anomaly are unknown, and mixed vector–axial states are severely constrained by atomic parity violation.

**The anomalous magnetic moment of the muon** and the **cosmic lithium problem**: much of the progress in the field of light, weakly coupled forces comes from Maxim Pospelov. The anomalous magnetic moment of the muon, *(g-2)_{μ}*, has a long-standing discrepancy from the Standard Model (see e.g. this blog post). While this may come from an error in the very, very intricate calculation and the subtle ways in which experimental data feed into it, Pospelov (and also Fayet) noted that the shift may come from a light (in the 10s of MeV range!), weakly coupled new particle like a dark photon. Similarly, Pospelov and collaborators showed that a new light particle in the 1–20 MeV range may help explain another longstanding mystery: the surprising lack of lithium in the universe (APS *Physics* synopsis).

**The Proton Radius Problem**: the charge radius of the proton appears to be smaller when measured using the Lamb shift of muonic hydrogen than in electron scattering experiments. See this ParticleBite summary, and this recent review. Some attempts to explain this discrepancy have involved MeV-scale new particles, though the endeavor is difficult. There’s been some renewed popular interest after a new result using deuterium confirmed the discrepancy. However, there was a report that a result at the proton radius problem conference in Trento suggests that the 2S-4P determination of the Rydberg constant may solve the puzzle (though discrepant with other Rydberg measurements). *[Those slides do not appear to be public.]*

**Could it be related to dark matter?**

A lot of recent progress in dark matter has revolved around the possibility that in addition to dark matter, there may be additional light particles that mediate interactions between dark matter and the Standard Model. If these particles are light enough, they can change how we expect to find dark matter, sometimes in surprising ways. One interesting avenue is called self-interacting dark matter and is based on the observation that these light force carriers can deform the dark matter distribution in galaxies in ways that seem to fit astronomical observations. A 20 MeV dark photon-like particle even fits the profile of what’s required by the self-interacting dark matter paradigm, though it is very difficult to make such a particle consistent with both the Atomki anomaly and the constraints from direct detection.

**Should I be excited?**

Given all of the caveats listed above, some feel that it is too early to be in “drop everything, this is new physics” mode. Others may take this as a hint that’s worth exploring further—as has been done for many anomalies in the recent past. For researchers, it is prudent to be cautious, and it is paramount to be careful; but so long as one does both, being excited about a new possibility is part of what makes our job fun.

For the general public, the tentative hopes of new physics that pop up—whether it’s the Atomki anomaly, or the 750 GeV diphoton bump, a GeV bump from the galactic center, γ-ray lines at 3.5 keV and 130 GeV, or penguins at LHCb—these are the signs that we’re making use of all of the data available to search for new physics. Sometimes these hopes fizzle away; often they leave behind useful lessons about physics and directions forward. Maybe one of these days an anomaly will stick and show us the way forward.

Here is some of the popular-level press coverage of the Atomki result. See the top of this ParticleBite for references to the primary literature.

UC Riverside Press Release

UC Irvine Press Release

Nature News

Quanta Magazine

Quanta Magazine: Abstractions

Symmetry Magazine

Los Angeles Times

In a low-research day, I got my first view of the new location of the NYU Center for Data Science, in the newly renovated building at 60 Fifth Ave. The space is a mix of permanent, hoteling, and studio space for faculty, researchers, staff, and students, designed to meet very diverse needs and wants. It is cool! I also discussed briefly with Daniela Huppenkothen (NYU) the scope of her first paper on the states of GRS 1915, the black-hole source with extremely complex x-ray timing characteristics.

Monoidal categories are often introduced as an abstraction of categories with products. Instead of having the categorical product $\times$, we have some other product $\otimes$, and it’s required to behave in a somewhat product-like way.

But you could try to abstract *more* of the structure of a category with
products than monoidal categories do. After all, when a category has
products, it also comes with special maps $X \times Y \to X$ and $X \times Y \to Y$
for every $X$ and $Y$ (the projections). Abstracting this leads to
the notion of “monoidal category with projections”.

I’m writing this because over at this thread on magnitude homology, we’re making heavy use of semicartesian monoidal categories. These are simply monoidal categories whose unit object is terminal. But the word “semicartesian” is repellently technical, and you’d be forgiven for believing that any mathematics using “semicartesian” anythings is bound to be going about things the wrong way. Name aside, you might simply think it’s rather ad hoc; the nLab article says it initially sounds like centipede mathematics.

I don’t know whether semicartesian monoidal categories are truly necessary to the development of magnitude homology. But I do know that they’re a more reasonable and less ad hoc concept than they might seem, because:

**Theorem** A semicartesian monoidal category is the same thing as a monoidal category with projections.

So if you believe that “monoidal category with projections” is a reasonable or natural concept, you’re forced to believe the same about semicartesian monoidal categories.

I’m going to keep this post light and sketchy. A **monoidal category with
projections** is a monoidal category $V = (V, \otimes, I)$ together with a
distinguished pair of maps

$\pi^1_{X, Y} \colon X \otimes Y \to X, \qquad \pi^2_{X, Y} \colon X \otimes Y \to Y$

for each pair of objects $X$ and $Y$. We might call these “projections”. The projections are required to satisfy whatever equations they satisfy when $\otimes$ is categorical product $\times$ and the unit object $I$ is terminal. For instance, if you have three objects $X$, $Y$ and $Z$, then I can think of two ways to build a “projection” map $X \otimes Y \otimes Z \to X$:

think of $X \otimes Y \otimes Z$ as $X \otimes (Y \otimes Z)$ and take $\pi^1_{X, Y \otimes Z}$; or

think of $X \otimes Y \otimes Z$ as $(X \otimes Y) \otimes Z$, use $\pi^1_{X \otimes Y, Z}$ to project down to $X \otimes Y$, then use $\pi^1_{X, Y}$ to project from there to $X$.

One of the axioms for a monoidal category with projections is that these two maps are equal. You can guess the others.
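Written out with the associator made explicit (the post above suppresses the associativity isomorphisms; this rendering is mine), the axiom just described says that the following composites agree:

```latex
\pi^1_{X,\, Y \otimes Z}
  \;=\;
\pi^1_{X, Y} \,\circ\, \pi^1_{X \otimes Y,\, Z} \,\circ\, \alpha^{-1}_{X, Y, Z}
  \;\colon\; X \otimes (Y \otimes Z) \longrightarrow X,
```

where $\alpha_{X, Y, Z} \colon (X \otimes Y) \otimes Z \to X \otimes (Y \otimes Z)$ is the associator. When $\otimes$ is the cartesian product, both sides are the map forgetting the $Y$ and $Z$ factors, which is why the axiom holds automatically in the motivating example.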

A monoidal category is said to be **cartesian** if its monoidal structure
is given by the categorical (“cartesian”) product. So, any cartesian monoidal
category becomes a monoidal category with projections in an obvious way:
take the projections $\pi^i_{X, Y}$ to be the usual product-projections.

That’s the motivating example of a monoidal category with projections, but there are others. For instance, take the ordered set $(\mathbb{N}, \geq)$, and view it as a category in the usual way but with a reversal of direction: there’s one object for each natural number $n$, and there’s a map $n \to m$ iff $n \geq m$. It’s monoidal under addition, with $0$ as the unit. Since $m + n \geq m$ and $m + n \geq n$ for all $m$ and $n$, we have maps $m + n \to m$ and $m + n \to n$.

In this way, $(\mathbb{N}, \geq)$ is a monoidal category with projections. But it’s not cartesian, since the categorical product of $m$ and $n$ in $(\mathbb{N}, \geq)$ is $\max\{m, n\}$, not $m + n$.
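As a sanity check, one can model this example on a computer. Here is a small illustrative Python sketch (the names are mine, not standard), representing the unique morphism $m \to n$ in $(\mathbb{N}, \geq)$ by the pair $(m, n)$ with $m \geq n$, and verifying the two-ways-to-project axiom on sample values:

```python
# Model the monoidal category (N, >=): objects are natural numbers,
# and there is a (unique) morphism m -> n exactly when m >= n.
# The tensor product is addition, with unit 0.

def hom(m, n):
    """Return the unique morphism m -> n if it exists, else None."""
    return (m, n) if m >= n else None

def compose(g, f):
    """Composition g . f of morphisms f: a -> b and g: b -> c."""
    assert f[1] == g[0], "domain/codomain mismatch"
    return (f[0], g[1])

def tensor_obj(m, n):
    return m + n

def proj1(m, n):
    """The projection m (+) n -> m, which exists since m + n >= m."""
    return hom(m + n, m)

# The axiom: projecting X (+) Y (+) Z -> X in one step agrees with
# projecting down to X (+) Y first and then to X.
def check_axiom(x, y, z):
    one_step = proj1(x, tensor_obj(y, z))            # X+Y+Z -> X
    two_step = compose(proj1(x, y),                  # X+Y   -> X
                       proj1(tensor_obj(x, y), z))   # X+Y+Z -> X+Y
    return one_step == two_step

print(all(check_axiom(x, y, z)
          for x in range(5) for y in range(5) for z in range(5)))
```

In a poset viewed as a category any two parallel morphisms are equal, so the axiom holds for free here; the check must print `True`.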

Now, a monoidal category $(V, \otimes, I)$ is **semicartesian** if the unit
object $I$ is terminal. Again, any cartesian monoidal category gives an
example, but this isn’t the only kind of example. And again, the ordered set
$(\mathbb{N}, \geq)$ demonstrates this: with the monoidal structure just
described, $0$ is the unit object, and it’s terminal.

The point of this post is:

**Theorem** A semicartesian monoidal category is the same thing as a monoidal category with projections.

I’ll state it no more precisely than that. I don’t know who this result is due to; the nLab page on semicartesian monoidal categories suggests it might be Eilenberg and Kelly, but I learned it from a Part III problem sheet of Peter Johnstone.

The proof goes roughly like this.

Start with a semicartesian monoidal category $V$. To build a monoidal category with projections, we have to define, for each $X$ and $Y$, a projection map $X \otimes Y \to X$ (and similarly for $Y$). Now, since $I$ is terminal, we have a unique map $Y \to I$. Tensoring with $X$ gives a map $X \otimes Y \to X \otimes I$. But $X \otimes I \cong X$, so we’re done. That is, $\pi^1_{X, Y}$ is the composite

$X \otimes Y \stackrel{X \otimes !}{\longrightarrow} X \otimes I \cong X.$

After a few checks, we see that this makes $V$ into a monoidal category with projections.

In the other direction, start with a monoidal category $V$ with projections. We need to show that $V$ is semicartesian. In other words, we have to prove that for each object $X$, there is exactly one map $X \to I$. There’s at least one, because we have

$X \cong X \otimes I \stackrel{\pi^2_{X, I}}{\longrightarrow} I.$

I’ll skip the proof that there’s at most one, but it uses the axiom that the projections are natural transformations. (I didn’t mention that axiom, but of course it’s there.)

So we now have a way of turning a semicartesian monoidal category into a monoidal category with projections and vice versa. To finish the proof of the theorem, we have to show that these two processes are mutually inverse. That’s straightforward.

Here’s something funny about all this. A monoidal category with
projections appears to be a monoidal category with extra *structure*, whereas
a semicartesian monoidal category is a monoidal category with a certain
*property*. The theorem tells us that in fact, there’s at most one possible
way to equip a monoidal category with projections (and there *is* a way if
and only if $I$ is terminal). So having projections turns out to be a
property, not structure.

And that is my defence of semicartesian monoidal categories.

“Tag…you’re it!” is a popular game to play with jets these days at particle accelerators like the LHC. These collimated sprays of radiation are common in various types of high-energy collisions and can present a nasty challenge to both theorists and experimentalists (for more on the basic ideas and importance of jet physics, see my July bite on the subject). The process of tagging a jet generally means identifying the type of particle that initiated the jet. Since jets provide a significant contribution to backgrounds at high energy colliders, identifying where they come from can make doing things like discovering new particles much easier. While identifying backgrounds to new physics is important, in this bite I want to focus on how theorists are now using jets to study the production of hadrons in a unique way.

Over the years, a host of theoretical tools have been developed for making the study of jets tractable. The key steps of “reconstructing” jets are:

- Choose a jet algorithm (i.e. basically pick a metric that decides which particles it thinks are “clustered”),
- Identify potential jet axes (i.e. the centers of the jets),
- Decide which particles are in/out of the jets based on your jet algorithm.
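The three steps above can be made concrete with a deliberately simplified Python sketch of sequential jet reconstruction (a toy cone-style algorithm, not a faithful anti-kt implementation; the particle list and radius parameter are made up for illustration):

```python
import math

# Toy particles: (pt, eta, phi). The values are illustrative only.
particles = [
    (50.0, 0.10, 0.20), (30.0, 0.15, 0.25), (5.0, 0.05, 0.15),  # one cluster
    (40.0, -1.00, 2.50), (8.0, -1.10, 2.60),                     # another
    (0.5, 2.00, -2.00),                                          # stray soft hit
]

def delta_r(p, q):
    """Angular distance in the (eta, phi) plane -- the usual jet 'metric'."""
    deta = p[1] - q[1]
    dphi = math.atan2(math.sin(p[2] - q[2]), math.cos(p[2] - q[2]))
    return math.hypot(deta, dphi)

def cluster(parts, R=0.4):
    """Greedy toy clustering: seed a jet axis on the hardest unused
    particle, then sweep up everything within radius R of that axis."""
    remaining = sorted(parts, reverse=True)      # hardest pt first
    jets = []
    while remaining:
        seed = remaining.pop(0)                  # step 2: pick a jet axis
        in_jet = [seed] + [p for p in remaining if delta_r(seed, p) < R]
        remaining = [p for p in remaining if delta_r(seed, p) >= R]
        jets.append(in_jet)                      # step 3: assign particles
    return jets

jets = cluster(particles)
print([round(sum(p[0] for p in j), 1) for j in jets])  # total pt per jet
```

Real algorithms like anti-kt instead merge particle pairs iteratively using a pt-weighted distance, but the choose-metric / find-axes / assign-particles structure is the same.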

Deciphering the particle content of a jet can often help to uncover what particle initiated the jet. While this is often enough for many analyses, one can ask the next obvious question: how are the momenta of the particles within the jet distributed? In other words, what does the inner geometry of the jet look like?

There are a number of observables that one can look at to study a jet’s geometry. These are generally referred to as **jet substructure observables**. Two basic examples are:

- **Jet shape**: This takes a jet of radius R and then identifies a sub-jet within it of radius r. By measuring the energy fraction contained within sub-jets of variable radius r, one can study where the majority of the jet’s energy/momentum is concentrated.
- **Jet mass**: By measuring the invariant mass of all of the particles in a jet (while simultaneously considering the jet’s energy and radius) one can gain insight into how focused a jet is.
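Assuming massless constituents with four-momenta reconstructed from (pt, eta, phi), both observables can be computed in a few lines; the constituent values below are illustrative, not from any experiment:

```python
import math

def four_momentum(pt, eta, phi):
    """(E, px, py, pz) for a massless particle given (pt, eta, phi)."""
    px, py = pt * math.cos(phi), pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    return (pt * math.cosh(eta), px, py, pz)

def jet_mass(constituents):
    """Invariant mass of the summed four-momentum: m^2 = E^2 - |p|^2."""
    vecs = [four_momentum(*c) for c in constituents]
    E, px, py, pz = map(sum, zip(*vecs))
    return math.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0))

def jet_shape(constituents, axis, r):
    """Fraction of the jet's pt inside a sub-jet of radius r about the axis."""
    def dr(p):
        dphi = math.atan2(math.sin(p[2] - axis[1]), math.cos(p[2] - axis[1]))
        return math.hypot(p[1] - axis[0], dphi)
    total = sum(p[0] for p in constituents)
    inner = sum(p[0] for p in constituents if dr(p) < r)
    return inner / total

jet = [(50.0, 0.10, 0.20), (30.0, 0.15, 0.25), (5.0, 0.40, 0.55)]  # (pt, eta, phi)
print(round(jet_mass(jet), 2))
print(round(jet_shape(jet, axis=(0.10, 0.20), r=0.2), 3))
```

A single hard particle gives zero jet mass and a shape fraction of one at any r; wider, more democratic jets have larger mass and a shape fraction that grows slowly with r.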

One way in which phenomenologists are utilizing jet substructure technology is in the study of hadron production. In arXiv:1406.2295, Baumgart et al. introduced a way to connect the world of jet physics with the world of **quarkonia**. These bound states of charm-anti-charm or bottom-anti-bottom quarks are the source of two things: great buzz words for impressing your friends, and several outstanding problems within the Standard Model. While we’ve been studying quarkonia such as the $J/\psi$ and the $\Upsilon$ for half a century, there are still a bunch of very basic questions we have about how they are produced (more on this topic in future bites).

This paper offers a fresh approach to studying the various ways in which quarkonia are produced at the LHC by focusing on how they are produced within jets. The wealth of available jet physics technology then provides a new family of interesting observables. The authors first describe the various mechanisms by which quarkonia are produced. In the formalism of non-relativistic QCD (NRQCD), the $J/\psi$, for example, is most frequently produced at the LHC (see Fig. 2) when a high-energy gluon splits into a $c\bar{c}$ pair in one of several possible angular momentum and color quantum states. This pair then ultimately undergoes non-perturbative effects (i.e. ones we can’t really calculate using standard techniques in quantum field theory) and becomes a color-singlet final-state particle (as any reasonably minded particle should do). While this model makes some sense, we have no idea how often quarkonia are produced via each mechanism.

This paper introduces a theoretical formalism that looks at the following question: what is the probability that a parton (quark/gluon) hadronizes into a jet with a certain substructure that contains a specific hadron carrying some fraction of the original parton’s energy? The authors show that the answer to this question is correlated with the answer to another question: how often are quarkonia produced via the different intermediate angular-momentum/color states of NRQCD? In other words, they show that studying the geometry of the jets that contain quarkonia may lead to answers to decades-old questions about how quarkonia are produced!

There are several other efforts to study hadron production through the lens of jet physics that have also made preliminary comparisons with ATLAS/CMS data (one such study will be the subject of my next bite). These studies look at the production of more general classes of hadrons, as well as different numbers of jets per event, and see promising results when compared with 7 TeV data from ATLAS and CMS.

The moral of this story is that jets are now being viewed less as a source of troublesome backgrounds to new physics and more as a laboratory for studying long-standing questions about the underlying nature of hadronization. Jet physics offers innovative ways to look at old problems, offering a host of new and exciting observables to study at the LHC and other experiments.

**Further Reading**

*The November Revolution:* https://www.slac.stanford.edu/history/pubs/gilmannov.pdf. This transcript of a talk provides some nice background on, amongst other things, the momentous discovery of the $J/\psi$ in 1974, in what is often referred to as the November Revolution.

*An Introduction to the NRQCD Factorization Approach to Heavy Quarkonium:* https://cds.cern.ch/record/319642/files/9702225.pdf. As good as it gets when it comes to outlines of the basics of this tried-and-true effective theory. This article will definitely take some familiarity with QFT, but provides a great outline of the basics of the NRQCD Lagrangian, fields, decays, etc.

*[This blog post was written jointly by Terry Tao and Will Sawin.]*

In the previous blog post, one of us (Terry) implicitly introduced a notion of rank for tensors which is a little different from the usual notion of tensor rank, and which (following BCCGNSU) we will call “slice rank”. This notion of rank could then be used to encode the Croot-Lev-Pach-Ellenberg-Gijswijt argument that uses the polynomial method to control capsets.

Since then, several papers have applied the slice rank method to further problems – to control tri-colored sum-free sets in abelian groups (BCCGNSU, KSS) and from there to the triangle removal lemma in vector spaces over finite fields (FL), to control sunflowers (NS), and to bound progression-free sets in $p$-groups (P).

In this post we investigate the notion of slice rank more systematically. In particular, we show how to give lower bounds for the slice rank. In many cases, we can show that the upper bounds on slice rank given in the aforementioned papers are sharp to within a subexponential factor. This still leaves open the possibility of getting a better bound for the original combinatorial problem using the slice rank of some other tensor, but for very long arithmetic progressions (at least eight terms), we show that the slice rank method cannot improve over the trivial bound using any tensor.

It will be convenient to work in a “basis independent” formalism, namely working in the category of abstract finite-dimensional vector spaces over a fixed field ${\bf F}$. (In the applications to the capset problem one takes ${\bf F}$ to be the finite field ${\bf F}_3$ of three elements, but most of the discussion here applies to arbitrary fields.) Given such vector spaces $V_1,\dots,V_k$, we can form the tensor product $\bigotimes_{i=1}^k V_i$, generated by the tensor products $v_1 \otimes \dots \otimes v_k$ with $v_i \in V_i$ for $i=1,\dots,k$, subject to the constraint that the tensor product operation is multilinear. For each $1 \le j \le k$, we have the smaller tensor products $\bigotimes_{1 \le i \le k;\, i \neq j} V_i$, as well as the tensor product

$\otimes_j \colon V_j \times \bigotimes_{1 \le i \le k;\, i \neq j} V_i \to \bigotimes_{i=1}^k V_i$

defined in the obvious fashion. Elements of $\bigotimes_{i=1}^k V_i$ of the form $v_j \otimes_j w$ for some $v_j \in V_j$ and $w \in \bigotimes_{1 \le i \le k;\, i \neq j} V_i$ will be called *rank one functions*, and the *slice rank* (or *rank* for short) $\operatorname{rank}(v)$ of an element $v$ of $\bigotimes_{i=1}^k V_i$ is defined to be the least nonnegative integer $r$ such that $v$ is a linear combination of $r$ rank one functions. If $V_1,\dots,V_k$ are finite-dimensional, then the rank is always well defined as a non-negative integer (in fact it cannot exceed $\min(\dim V_1, \dots, \dim V_k)$). It is also clearly subadditive:

$\operatorname{rank}(v + w) \le \operatorname{rank}(v) + \operatorname{rank}(w). \qquad (1)$

For $k=1$, $\operatorname{rank}(v)$ is $0$ when $v$ is zero, and $1$ otherwise. For $k=2$, $\operatorname{rank}(v)$ is the usual rank of the $2$-tensor $v \in V_1 \otimes V_2$ (which can for instance be identified with a linear map from $V_1$ to the dual space $V_2^*$). The usual notion of tensor rank for higher order tensors uses complete tensor products $v_1 \otimes v_2 \otimes \dots \otimes v_k$ as the rank one objects, rather than $v_j \otimes_j w$, giving a rank that is greater than or equal to the slice rank studied here.
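For $k=2$ the slice rank is just matrix rank, which is easy to compute. Here is a small self-contained sketch over the three-element field ${\bf F}_3$ (the field relevant to the capset application), using Gaussian elimination; the example tensor is mine:

```python
def rank_mod_p(rows, p=3):
    """Rank of a matrix over the finite field F_p (p prime), by Gaussian elimination."""
    A = [[x % p for x in row] for row in rows]
    rank, col = 0, 0
    ncols = len(A[0]) if A else 0
    while rank < len(A) and col < ncols:
        pivot = next((r for r in range(rank, len(A)) if A[r][col]), None)
        if pivot is None:
            col += 1
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        inv = pow(A[rank][col], p - 2, p)          # inverse in F_p via Fermat
        A[rank] = [(x * inv) % p for x in A[rank]]
        for r in range(len(A)):
            if r != rank and A[r][col]:
                A[r] = [(a - A[r][col] * b) % p for a, b in zip(A[r], A[rank])]
        rank += 1
        col += 1
    return rank

# A 2-tensor built as a combination of two rank-one functions u (x) u + 2 v (x) v,
# with u, v linearly independent: its (slice) rank should be 2.
u, v = [1, 2, 0], [0, 1, 1]
M = [[(u[i] * u[j] + 2 * v[i] * v[j]) % 3 for j in range(3)] for i in range(3)]
print(rank_mod_p(M))  # prints 2
```

For $k \ge 3$ no such simple algorithm is available, which is exactly why the combinatorial bounds below are useful.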

From basic linear algebra we have the following equivalences:

**Lemma 1** Let $V_1,\dots,V_k$ be finite-dimensional vector spaces over a field ${\bf F}$, let $v$ be an element of $V_1 \otimes \dots \otimes V_k$, and let $r$ be a non-negative integer. Then the following are equivalent:

- (i) One has $\operatorname{rank}(v) \le r$.
- (ii) One has a representation of the form
$v = \sum_{j=1}^k \sum_{s \in S_j} v_{j,s} \otimes_j w_{j,s},$
where $S_1,\dots,S_k$ are finite sets of total cardinality $|S_1| + \dots + |S_k|$ at most $r$, and $v_{j,s} \in V_j$ and $w_{j,s} \in \bigotimes_{1 \le i \le k;\, i \neq j} V_i$ for each $j = 1,\dots,k$ and $s \in S_j$.

- (iii) One has
$v \in \sum_{j=1}^k U_j \otimes_j \bigotimes_{1 \le i \le k;\, i \neq j} V_i,$
where for each $j = 1,\dots,k$, $U_j$ is a subspace of $V_j$, of total dimension $\dim U_1 + \dots + \dim U_k$ at most $r$, and we view $U_j \otimes_j \bigotimes_{1 \le i \le k;\, i \neq j} V_i$ as a subspace of $\bigotimes_{i=1}^k V_i$ in the obvious fashion.

- (iv) (Dual formulation) There exist subspaces $W_j$ of the dual space $V_j^*$ for $j = 1,\dots,k$, of total dimension at least $\dim V_1 + \dots + \dim V_k - r$, such that $v$ is orthogonal to $W_1 \otimes \dots \otimes W_k$, in the sense that one has the vanishing
$\langle v, w_1 \otimes \dots \otimes w_k \rangle = 0$
for all $w_j \in W_j$, where $\langle \cdot, \cdot \rangle$ is the obvious pairing.

*Proof:* The equivalence of (i) and (ii) is clear from the definition. To get from (ii) to (iii) one simply takes $U_j$ to be the span of the $v_{j,s}$, and conversely to get from (iii) to (ii) one takes the $v_{j,s}$ to be a basis of the $U_j$ and computes $w_{j,s}$ by using a basis for the tensor product consisting entirely of functions of the form $v_{j,s} \otimes_j w$ for various $w$. To pass from (iii) to (iv) one takes $W_j$ to be the annihilator of $U_j$, and conversely to pass from (iv) to (iii).

One corollary of the formulation (iv) is that the set of tensors of slice rank at most $r$ is Zariski closed (if the field ${\bf F}$ is algebraically closed), and so the slice rank itself is a lower semi-continuous function. This is in contrast to the usual tensor rank, which is not necessarily semicontinuous.

**Corollary 2** Let $V_1,\dots,V_k$ be finite-dimensional vector spaces over an algebraically closed field ${\bf F}$. Let $r$ be a nonnegative integer. The set of elements of $V_1 \otimes \dots \otimes V_k$ of slice rank at most $r$ is closed in the Zariski topology.

*Proof:* In view of Lemma 1(i) and (iv), this set is the union, over tuples of integers $d_1,\dots,d_k$ with $d_1 + \dots + d_k \ge \dim V_1 + \dots + \dim V_k - r$, of the projection from $\operatorname{Gr}(d_1, V_1^*) \times \dots \times \operatorname{Gr}(d_k, V_k^*) \times (V_1 \otimes \dots \otimes V_k)$ of the set of tuples $(W_1,\dots,W_k, v)$ with $v$ orthogonal to $W_1 \otimes \dots \otimes W_k$, where $\operatorname{Gr}(d, V)$ is the Grassmannian parameterizing $d$-dimensional subspaces of $V$.

One can check directly that the set of tuples $(W_1,\dots,W_k, v)$ with $v$ orthogonal to $W_1 \otimes \dots \otimes W_k$ is Zariski closed, using a set of equations of the form $\langle v, w_1 \otimes \dots \otimes w_k \rangle = 0$ locally on the product of Grassmannians. Hence, because the Grassmannian is a complete variety, the projection of this set to $V_1 \otimes \dots \otimes V_k$ is also Zariski closed. So the finite union over tuples $d_1,\dots,d_k$ of these projections is also Zariski closed.

We also have good behaviour with respect to linear transformations:

**Lemma 3** Let $V_1,\dots,V_k$ and $W_1,\dots,W_k$ be finite-dimensional vector spaces over a field ${\bf F}$, let $v$ be an element of $V_1 \otimes \dots \otimes V_k$, and for each $1 \le j \le k$, let $\phi_j \colon V_j \to W_j$ be a linear transformation, with $\phi := \phi_1 \otimes \dots \otimes \phi_k \colon V_1 \otimes \dots \otimes V_k \to W_1 \otimes \dots \otimes W_k$ the tensor product of these maps. Then

$\operatorname{rank}(\phi(v)) \le \operatorname{rank}(v). \qquad (2)$

Furthermore, if the $\phi_j$ are all injective, then one has equality in (2).

Thus, for instance, the rank of a tensor is intrinsic in the sense that it is unaffected by any enlargements of the spaces $V_1,\dots,V_k$.

*Proof:* The bound (2) is clear from the formulation (ii) of rank in Lemma 1. For equality, apply (2) to the injective maps $\phi_j$, as well as to some arbitrarily chosen left inverses of the $\phi_j$.

Computing the rank of a tensor is difficult in general; however, the problem becomes a combinatorial one if one has a suitably sparse representation of that tensor in some basis, where we will measure sparsity by the property of being an antichain.

**Proposition 4** Let $V_1,\dots,V_k$ be finite-dimensional vector spaces over a field ${\bf F}$. For each $1 \le j \le k$, let $(v_{j,s})_{s \in S_j}$ be a linearly independent family of vectors in $V_j$ indexed by a finite set $S_j$. Let $\Gamma$ be a subset of $S_1 \times \dots \times S_k$, and suppose that $v$ is a tensor of the form

$v = \sum_{(s_1,\dots,s_k) \in \Gamma} c_{s_1,\dots,s_k}\, v_{1,s_1} \otimes \dots \otimes v_{k,s_k}, \qquad (3)$

where for each $(s_1,\dots,s_k) \in \Gamma$, $c_{s_1,\dots,s_k}$ is a coefficient in ${\bf F}$. Then one has

$\operatorname{rank}(v) \le \min_{\Gamma = \Gamma_1 \cup \dots \cup \Gamma_k} \big( |\pi_1(\Gamma_1)| + \dots + |\pi_k(\Gamma_k)| \big), \qquad (4)$

where the minimum ranges over all coverings of $\Gamma$ by sets $\Gamma_1,\dots,\Gamma_k$, and $\pi_j \colon S_1 \times \dots \times S_k \to S_j$ for $j = 1,\dots,k$ are the projection maps.

Now suppose that the coefficients $c_{s_1,\dots,s_k}$ are all non-zero, that each of the $S_j$ are equipped with a total ordering $\le_j$, and $\Gamma'$ is the set of maximal elements of $\Gamma$, thus there do not exist distinct $(s_1,\dots,s_k) \in \Gamma'$, $(t_1,\dots,t_k) \in \Gamma$ such that $s_j \le_j t_j$ for all $j$. Then one has

$\operatorname{rank}(v) \ge \min_{\Gamma' = \Gamma_1 \cup \dots \cup \Gamma_k} \big( |\pi_1(\Gamma_1)| + \dots + |\pi_k(\Gamma_k)| \big). \qquad (5)$

In particular, if $\Gamma$ is an antichain (i.e. every element is maximal), then equality holds in (4).

*Proof:* By Lemma 3 (or by enlarging the bases as necessary), we may assume without loss of generality that each of the $V_j$ is spanned by the $v_{j,s}$. By relabeling, we can also assume that each $S_j$ is of the form

$S_j = \{1, \dots, |S_j|\}$

with the usual ordering, and by Lemma 3 we may take each $V_j$ to be ${\bf F}^{|S_j|}$, with $(v_{j,s})_{s \in S_j}$ the standard basis.

Let $r$ denote the rank of $v$. To show (4), it suffices to show the inequality

$r \le |\pi_1(\Gamma_1)| + \dots + |\pi_k(\Gamma_k)| \qquad (6)$

for any covering of $\Gamma$ by $\Gamma_1,\dots,\Gamma_k$. By removing repeated elements we may assume that the $\Gamma_j$ are disjoint. For each $1 \le j \le k$ and $t_j \in \pi_j(\Gamma_j)$, the tensor

$\sum_{(s_1,\dots,s_k) \in \Gamma_j;\ s_j = t_j} c_{s_1,\dots,s_k}\, v_{1,s_1} \otimes \dots \otimes v_{k,s_k}$

can (after collecting terms) be written as

$v_{j,t_j} \otimes_j w_{j,t_j}$

for some $w_{j,t_j} \in \bigotimes_{1 \le i \le k;\, i \neq j} {\bf F}^{|S_i|}$. Summing and using (1), we conclude the inequality (6).

Now assume that the coefficients $c_{s_1,\dots,s_k}$ are all non-zero and that $\Gamma'$ is the set of maximal elements of $\Gamma$. To conclude the proposition, it suffices to show that the reverse inequality

$r \ge |\pi_1(\Gamma_1)| + \dots + |\pi_k(\Gamma_k)| \qquad (7)$

holds for some covering $\Gamma' = \Gamma_1 \cup \dots \cup \Gamma_k$. By Lemma 1(iv), there exist subspaces $W_j$ of $({\bf F}^{|S_j|})^*$ whose dimensions $d_j := \dim W_j$ sum to

$d_1 + \dots + d_k \ge |S_1| + \dots + |S_k| - r, \qquad (8)$

and such that $v$ is orthogonal to $W_1 \otimes \dots \otimes W_k$. Using Gaussian elimination, one can find a basis $w_{j,1},\dots,w_{j,d_j}$ of $W_j$ whose representation in the standard dual basis of $({\bf F}^{|S_j|})^*$ is in row-echelon form. That is to say, there exist natural numbers

$1 \le t_{j,1} < \dots < t_{j,d_j} \le |S_j|$

such that for all $1 \le s \le d_j$, $w_{j,s}$ is a linear combination of the dual vectors $v_{j,t_{j,s}}^*, \dots, v_{j,|S_j|}^*$, with the $v_{j,t_{j,s}}^*$ coefficient equal to one.

We now claim that the set $\{ (t_{1,s_1},\dots,t_{k,s_k}) : 1 \le s_j \le d_j \text{ for all } j \}$ is disjoint from $\Gamma'$. Suppose for contradiction that this were not the case, thus there exists $1 \le s_j \le d_j$ for each $1 \le j \le k$ such that

$(t_{1,s_1},\dots,t_{k,s_k}) \in \Gamma'.$

As $\Gamma'$ is the set of maximal elements of $\Gamma$, this implies that

$(t'_1,\dots,t'_k) \notin \Gamma$

for any tuple $(t'_1,\dots,t'_k)$ with $t'_j \ge t_{j,s_j}$ for all $j$, other than $(t_{1,s_1},\dots,t_{k,s_k})$. On the other hand, we know that $w_{j,s_j}$ is a linear combination of $v_{j,t_{j,s_j}}^*, \dots, v_{j,|S_j|}^*$, with the $v_{j,t_{j,s_j}}^*$ coefficient one. We conclude that the tensor product $w_{1,s_1} \otimes \dots \otimes w_{k,s_k}$ is equal to

$v_{1,t_{1,s_1}}^* \otimes \dots \otimes v_{k,t_{k,s_k}}^*$

plus a linear combination of other tensor products $v_{1,t'_1}^* \otimes \dots \otimes v_{k,t'_k}^*$ with $(t'_1,\dots,t'_k)$ not in $\Gamma$. Taking inner products with (3), we conclude that $\langle v, w_{1,s_1} \otimes \dots \otimes w_{k,s_k} \rangle = c_{t_{1,s_1},\dots,t_{k,s_k}} \neq 0$, contradicting the fact that $v$ is orthogonal to $W_1 \otimes \dots \otimes W_k$. Thus we have the set $\{ (t_{1,s_1},\dots,t_{k,s_k}) : 1 \le s_j \le d_j \}$ disjoint from $\Gamma'$.

For each $1 \le j \le k$, let $\Gamma_j$ denote the set of tuples $(s_1,\dots,s_k)$ in $\Gamma'$ with $s_j$ not of the form $t_{j,s}$ for any $1 \le s \le d_j$. From the previous discussion we see that the $\Gamma_j$ cover $\Gamma'$, and we clearly have $|\pi_j(\Gamma_j)| \le |S_j| - d_j$, and hence from (8) we have (7) as claimed.

As an instance of this proposition, we recover the computation of diagonal rank from the previous blog post:

**Example 5** Let $V_1,\dots,V_k$ be finite-dimensional vector spaces over a field ${\bf F}$ for some $k \ge 2$. Let $d$ be a natural number, and for $1 \le j \le k$, let $(e_{j,1},\dots,e_{j,d})$ be a linearly independent set in $V_j$. Let $c_1,\dots,c_d$ be non-zero coefficients in ${\bf F}$. Then
$\sum_{t=1}^d c_t\, e_{1,t} \otimes \dots \otimes e_{k,t}$
has rank $d$. Indeed, one applies the proposition with the $S_j$ all equal to $\{1,\dots,d\}$, with $\Gamma$ the diagonal in $S_1 \times \dots \times S_k$; this is an antichain if we give one of the $S_j$ the standard ordering, and another of the $S_j$ the opposite ordering (and order the remaining $S_j$ arbitrarily). In this case, the $\pi_j$ are all bijective on $\Gamma$, and so it is clear that the minimum in (4) is simply $d$.

The combinatorial minimisation problem in the above proposition can be solved asymptotically when working with tensor powers, using the notion of the Shannon entropy $h(X)$ of a discrete random variable $X$.
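As a reminder, the Shannon entropy takes a few lines of Python (natural logarithm, matching the $\exp(\cdot)$ normalization in the rank bounds below):

```python
import math

def shannon_entropy(probs):
    """h(X) = sum over x of P(X=x) * log(1/P(X=x)), with the convention 0 log(1/0) = 0."""
    assert abs(sum(probs) - 1.0) < 1e-9 and all(p >= 0 for p in probs)
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

# Entropy is maximized by the uniform distribution: h = log(n) on n values...
print(shannon_entropy([0.25] * 4), math.log(4))
# ...and a deterministic variable has zero entropy.
print(shannon_entropy([1.0, 0.0, 0.0]))
```

The relevance here is that $\exp(h(X))$ behaves like an effective support size: for $X$ uniform on $n$ values it is exactly $n$, which is how the cardinality counts in the covering bounds turn into entropies in the tensor-power limit.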

**Proposition 6** Let $v$ be a tensor of the form (3) for some coefficients $c_{s_1,\dots,s_k}$. For each natural number $n$, let $v^{\otimes n}$ be the tensor power of $n$ copies of $v$, viewed as an element of $V_1^{\otimes n} \otimes \dots \otimes V_k^{\otimes n}$. Then

$\operatorname{rank}(v^{\otimes n}) \le \exp\big( (H + o(1)) n \big) \qquad (9)$

as $n \to \infty$, where

$H := \sup_{(X_1,\dots,X_k)} \min_{1 \le j \le k} h(X_j) \qquad (10)$

and $(X_1,\dots,X_k)$ range over the random variables taking values in $\Gamma$.

Now suppose that the coefficients $c_{s_1,\dots,s_k}$ are all non-zero and that each of the $S_j$ are equipped with a total ordering $\le_j$. Let $\Gamma'$ be the set of maximal elements of $\Gamma$ in the product ordering, and let

$H' := \sup_{(X_1,\dots,X_k)} \min_{1 \le j \le k} h(X_j),$

where now $(X_1,\dots,X_k)$ range over random variables taking values in $\Gamma'$. Then

$\operatorname{rank}(v^{\otimes n}) \ge \exp\big( (H' - o(1)) n \big) \qquad (11)$

as $n \to \infty$. In particular, if the maximizer in (10) is supported on the maximal elements of $\Gamma$ (which always holds if $\Gamma$ is an antichain in the product ordering), then equality holds in (9).

*Proof:* It will suffice to show that

$\min_{\Gamma^n = \Gamma_1 \cup \dots \cup \Gamma_k} \big( |\pi_1^{(n)}(\Gamma_1)| + \dots + |\pi_k^{(n)}(\Gamma_k)| \big) = \exp\big( (H + o(1)) n \big) \qquad (12)$

as $n \to \infty$, where $\pi_j^{(n)} \colon S_1^n \times \dots \times S_k^n \to S_j^n$ is the projection map. Then the same thing will apply to $\Gamma'$ and $H'$. Then applying Proposition 4, using the lexicographical ordering on $S_j^n$ and noting that, if $\Gamma'$ are the maximal elements of $\Gamma$, then $(\Gamma')^n$ are the maximal elements of $\Gamma^n$, we obtain both (9) and (11).

We first prove the lower bound. By compactness (and the continuity properties of entropy), we can find a random variable $(X_1,\dots,X_k)$ taking values in $\Gamma$ such that

$\min_{1 \le j \le k} h(X_j) = H.$

Let $\varepsilon = \varepsilon(n) > 0$ be a small positive quantity that goes to zero sufficiently slowly with $n$. Let $T = T_n$ denote the set of all tuples $(s^{(1)}, \dots, s^{(n)})$ in $\Gamma^n$ that are within $\varepsilon$ of being distributed according to the law of $(X_1,\dots,X_k)$, in the sense that for all $(s_1,\dots,s_k) \in \Gamma$, one has

$\left| \frac{1}{n} \#\{ 1 \le m \le n : s^{(m)} = (s_1,\dots,s_k) \} - {\bf P}\big( (X_1,\dots,X_k) = (s_1,\dots,s_k) \big) \right| \le \varepsilon.$

By the asymptotic equipartition property, the cardinality of $T$ can be computed to be

$\# T = \exp\big( (h(X_1,\dots,X_k) + o(1)) n \big) \qquad (13)$

if $\varepsilon$ goes to zero slowly enough. Similarly one has

$\# \pi_j^{(n)}(T) = \exp\big( (h(X_j) + o(1)) n \big). \qquad (14)$

Now let $\Gamma^n = \Gamma_1 \cup \dots \cup \Gamma_k$ be an arbitrary covering of $\Gamma^n$. By the pigeonhole principle, there exists $1 \le j \le k$ such that

$\#(\Gamma_j \cap T) \ge \frac{1}{k} \# T,$

which by (13) and (14) implies that

$\# \pi_j^{(n)}(\Gamma_j) \ge \exp\big( (h(X_j) - o(1)) n \big) \ge \exp\big( (H - o(1)) n \big)$

(noting that the factor $\frac{1}{k}$ can be absorbed into the $o(1)$ error). This gives the lower bound in (12).

Now we prove the upper bound. We can cover $\Gamma^n$ by $\exp(o(n))$ sets of the form $T$ as above, for various choices of random variables $(X_1,\dots,X_k)$ taking values in $\Gamma$ (since the number of empirical distributions of $n$ samples grows only polynomially in $n$). For each such random variable $(X_1,\dots,X_k)$, we can find $1 \le j \le k$ such that $h(X_j) \le H$; we then place all of $T$ in $\Gamma_j$. It is then clear that the $\Gamma_j$ cover $\Gamma^n$ and that

$\# \pi_j^{(n)}(\Gamma_j) \le \exp\big( (H + o(1)) n \big)$

for all $j = 1,\dots,k$, giving the required upper bound.

It is of interest to compute the quantity $H$ in (10). We have the following criterion for when a maximiser occurs:

**Proposition 7** Let $S_1,\dots,S_k$ be finite sets, and let $\Gamma \subset S_1 \times \dots \times S_k$ be non-empty. Let $H$ be the quantity in (10). Let $(X_1,\dots,X_k)$ be a random variable taking values in $\Gamma$, and let $\Gamma^*$ denote the essential range of $(X_1,\dots,X_k)$, that is to say the set of tuples $(s_1,\dots,s_k) \in \Gamma$ such that ${\bf P}\big( (X_1,\dots,X_k) = (s_1,\dots,s_k) \big)$ is non-zero. Then the following are equivalent:

- (i) $(X_1,\dots,X_k)$ attains the maximum in (10).
- (ii) There exist weights $w_1,\dots,w_k \ge 0$ and a finite quantity $D \ge 0$, such that $w_j = 0$ whenever $h(X_j) > \min_{1 \le i \le k} h(X_i)$, and such that
$\sum_{j=1}^k w_j \log \frac{1}{{\bf P}(X_j = s_j)} \le D \qquad (16)$
for all $(s_1,\dots,s_k) \in \Gamma$, with equality if $(s_1,\dots,s_k) \in \Gamma^*$. (In particular, $w_j$ must vanish if there exists a $(s_1,\dots,s_k) \in \Gamma$ with ${\bf P}(X_j = s_j) = 0$.)

Furthermore, when (i) and (ii) hold, one has

$D = (w_1 + \dots + w_k)\, H. \qquad (17)$

*Proof:* We first show that (i) implies (ii). The function is concave on . As a consequence, if we define to be the set of tuples such that there exists a random variable taking values in with , then is convex. On the other hand, by (10), is disjoint from the orthant . Thus, by the hyperplane separation theorem, we conclude that there exists a half-space

where are reals that are not all zero, and is another real, which contains on its boundary and in its interior, such that avoids the interior of the half-space. Since is also on the boundary of , we see that the are non-negative, and that whenever .

By construction, the quantity

is maximised when . At this point we could use the method of Lagrange multipliers to obtain the required constraints, but because we have some boundary conditions on the (namely, that the probability that they attain a given element of has to be non-negative) we will work things out by hand. Let be an element of , and an element of . For small enough, we can form a random variable taking values in , whose probability distribution is the same as that for except that the probability of attaining is increased by , and the probability of attaining is decreased by . If there is any for which and , then one can check that

for sufficiently small , contradicting the maximality of ; thus we have whenever . Taylor expansion then gives

for small , where

and similarly for . We conclude that for all and , thus there exists a quantity such that for all , and for all . By construction must be nonnegative. Sampling using the distribution of , one has

almost surely; taking expectations we conclude that

The inner sum is , which equals when is non-zero, giving (17).

Now we show conversely that (ii) implies (i). As noted previously, the function is concave on , with derivative . This gives the inequality

for any (note the right-hand side may be infinite when and ). Let be any random variable taking values in , then on applying the above inequality with and , multiplying by , and summing over and gives

By construction, one has

and

so to prove that (which would give (i)), it suffices to show that

or equivalently that the quantity

is maximised when . Since

it suffices to show this claim for the quantity

One can view this quantity as

By (ii), this quantity is bounded by , with equality if is equal to (and is in particular ranging in ), giving the claim.

The second half of the proof of Proposition 7 only uses the marginal distributions and the equation (16), not the actual distribution of , so it can also be used to prove an upper bound on when the exact maximizing distribution is not known, given suitable probability distributions in each variable. The logarithm of the probability distribution here plays the role that the weight functions do in BCCGNSU.

**Remark 8** Suppose one is in the situation of (i) and (ii) above; assume the nondegeneracy condition that is positive (or equivalently that is positive). We can assign a “degree” to each element by the formula

then every tuple in has total degree at most , and those tuples in have degree exactly . In particular, every tuple in has degree at most , and hence by (17), each such tuple has a -component of degree less than or equal to for some with . On the other hand, we can compute from (19) and the fact that for that . Thus, by asymptotic equipartition, and assuming , the number of “monomials” in of total degree at most is at most ; one can in fact use (19) and (18) to show that this is in fact an equality. This gives a direct way to cover by sets with , which is in the spirit of the Croot-Lev-Pach-Ellenberg-Gijswijt arguments from the previous post.

We can now show that the rank computation for the capset problem is sharp:

**Proposition 9** Let denote the space of functions from to . Then the function from to , viewed as an element of , has rank as , where is given by the formula

*Proof:* In , we have

Thus, if we let be the space of functions from to (with domain variable denoted respectively), and define the basis functions

of indexed by (with the usual ordering), respectively, and set to be the set

then is a linear combination of the with , and all coefficients non-zero. Then we have . We will show that the quantity of (10) agrees with the quantity of (20), and that the optimizing distribution is supported on , so that by Proposition 6 the rank of is .

To compute the quantity at (10), we use the criterion in Proposition 7. We take to be the random variable taking values in that attains each of the values with a probability of , and each of with a probability of ; then each of the attains the values of with probabilities respectively, so in particular is equal to the quantity in (20). If we now set and

we can verify the condition (16) with equality for all , which from (17) gives as desired.

This statement already follows from the result of Kleinberg-Sawin-Speyer, which gives a “tri-colored sum-free set” in of size , as the slice rank of this tensor is an upper bound for the size of a tri-colored sum-free set. If one were to go over the proofs more carefully to evaluate the subexponential factors, this argument would give a stronger lower bound than KSS, as it does not deal with the substantial loss that comes from Behrend’s construction. However, because it actually constructs a set, the KSS result rules out more possible approaches to give an exponential improvement of the upper bound for capsets. The lower bound on slice rank shows that the bound cannot be improved using only the slice rank of this particular tensor, whereas KSS shows that the bound cannot be improved using any method that does not take advantage of the “single-colored” nature of the problem.
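For orientation, the exponential rate that appears in these capset bounds has a known closed form in the literature (it is the constant in the Ellenberg–Gijswijt bound, also derived in BCCGNSU); quoting that closed form rather than deriving it here, a quick numerical check that it sits strictly below $3$:

```python
import math

# Closed form of the Ellenberg-Gijswijt / slice-rank capset exponent,
# as quoted from the literature (not derived in this sketch).
base = (3.0 / 8.0) * (207 + 33 * math.sqrt(33)) ** (1.0 / 3.0)
H = math.log(base)  # entropy rate: the slice rank grows like exp((H + o(1)) n)

print(round(base, 3))   # about 2.755, strictly below 3
print(base < 3.0)
```

The gap between this base and $3$ is exactly what gives an exponentially small upper bound $\exp((H + o(1))n)$ versus the trivial $3^n$ for capsets.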

We can also show that the slice rank upper bound in a result of Naslund-Sawin is similarly sharp:

**Proposition 10** Let denote the space of functions from to . Then the function from , viewed as an element of , has slice rank

*Proof:* Let and be a basis for the space of functions on , itself indexed by . Choose similar bases for and , with and .

Set . Then is a linear combination of the with , and all coefficients non-zero. Order the usual way so that is an antichain. We will show that the quantity of (10) is , so that applying the last statement of Proposition 6, we conclude that the rank of is .

Let be the random variable taking values in that attains each of the values with a probability of . Then each of the attains the value with probability and with probability , so

Setting and , we can verify the condition (16) with equality for all , which from (17) gives as desired.

We used a slightly different method in each of the last two results. In the first one, we use the most natural bases for all three vector spaces, and distinguish from its set of maximal elements . In the second one we modify one basis element slightly, with instead of the more obvious choice , which allows us to work with instead of . Because is an antichain, we do not need to distinguish and . Both methods in fact work with either problem, and they are both about equally difficult, but we include both as either might turn out to be substantially more convenient in future work.

**Proposition 11** Let $k \ge 8$ be a natural number and let $G$ be a finite abelian group. Let ${\bf F}$ be any field. Let $V$ denote the space of functions from $G$ to ${\bf F}$. Let $F$ be any ${\bf F}$-valued function on $G^k$ that is nonzero only when the elements of $(x_1,\dots,x_k)$ form a $k$-term arithmetic progression, and is nonzero on every $k$-term constant progression.

Then the slice rank of $F$ is $|G|$.

*Proof:* We apply Proposition 4, using the standard bases of . Let be the support of . Suppose that we have orderings on such that the constant progressions are maximal elements of and thus all constant progressions lie in . Then for any partition of , can contain at most constant progressions, and as all constant progressions must lie in one of the , we must have . By Proposition 4, this implies that the slice rank of is at least . Since is a tensor, the slice rank is at most , hence exactly .

So it is sufficient to find orderings on such that the constant progressions are maximal elements of . We make several simplifying reductions: We may as well assume that consists of all the -term arithmetic progressions, because if the constant progressions are maximal among the set of all progressions then they are maximal among its subset . So we are looking for an ordering in which the constant progressions are maximal among all -term arithmetic progressions. We may as well assume that is cyclic, because if for each cyclic group we have an ordering where constant progressions are maximal, then on an arbitrary finite abelian group the lexicographic product of these orderings is an ordering for which the constant progressions are maximal. We may assume , as if we have an -tuple of orderings where constant progressions are maximal, we may add arbitrary orderings and the constant progressions will remain maximal.

So it is sufficient to find orderings on the cyclic group such that the constant progressions are maximal elements of the set of -term progressions in in the -fold product ordering. To do that, let the first, second, third, and fifth orderings be the usual order on and let the fourth, sixth, seventh, and eighth orderings be the reverse of the usual order on .

Then let be a constant progression and for contradiction assume that is a progression greater than in this ordering. We may assume that , because otherwise we may reverse the order of the progression, which has the effect of reversing all eight orderings, and then apply the transformation , which again reverses the eight orderings, bringing us back to the original problem but with .

Take a representative of the residue class in the interval . We will abuse notation and call this . Observe that , and are all contained in the interval modulo . Take a representative of the residue class in the interval . Then is in the interval for some . The distance between any distinct pair of intervals of this type is greater than , but the distance between and is at most , so is in the interval . By the same reasoning, is in the interval . Therefore . But then the distance between and is at most , so by the same reasoning is in the interval . Because is between and , it also lies in the interval . Because is in the interval , and by assumption it is congruent mod to a number in the set greater than or equal to , it must be exactly . Then, remembering that and lie in , we have and , so , hence , thus , which contradicts the assumption that .
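The case analysis above can be spot-checked numerically. Here is a sketch (my own check, not from the post), encoding the mixed product order just described: coordinates 1, 2, 3, 5 carry the usual order on $\{0,\dots,N-1\}$ and coordinates 4, 6, 7, 8 the reverse, so a progression dominates a constant progression exactly when it is $\geq$ the constant on the first group of coordinates and $\leq$ it on the second.

```python
# Sketch (my own spot check, not from the original post): verify that, under
# the mixed product order above (0-indexed coordinates 0, 1, 2, 4 use the
# usual order on {0, ..., N-1}; coordinates 3, 5, 6, 7 use its reverse),
# no non-constant 8-term arithmetic progression mod N dominates a constant one.
def constant_progressions_maximal(N, k=8):
    usual = {0, 1, 2, 4}  # coordinates ordered the usual way
    for c in range(N):            # the constant progression (c, ..., c)
        for a in range(N):
            for d in range(1, N):  # d != 0, so the progression is non-constant
                y = [(a + j * d) % N for j in range(k)]
                # y >= (c, ..., c) in the product order iff y_j >= c on the
                # usual coordinates and y_j <= c on the reversed ones
                if all(y[j] >= c if j in usual else y[j] <= c for j in range(k)):
                    return False
    return True

# Spot-check a few moderate moduli (the proof covers every cyclic group).
assert all(constant_progressions_maximal(N) for N in range(2, 26))
```

This only samples small moduli, of course; the argument in the proof is what covers every $N$.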

In fact, given a -term progression mod and a constant, we can form a -term binary sequence with a for each step of the progression that is greater than the constant and a for each step that is less. Because a rotation map, viewed as a dynamical system, has zero topological entropy, the number of -term binary sequences that appear grows subexponentially in . Hence, for large enough , there must be at least one sequence that does not appear. In this proof we exploit a sequence that does not appear for .
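The entropy remark can be made concrete with a small enumeration (my own illustration, not from the post): for a fixed modulus, list every 8-step greater/less pattern produced by comparing an arithmetic progression to a constant. The pattern with 1s in positions 1, 2, 3, 5, which is the one the ordering argument above exploits, never occurs, so strictly fewer than $2^8$ of the possible patterns appear.

```python
# Sketch (my own illustration): enumerate which 8-step "greater/less" patterns
# occur when an arithmetic progression mod N is compared to a constant
# (bit j is '1' if the j-th term is strictly greater than the constant).
def step_patterns(N, k=8):
    patterns = set()
    for a in range(N):
        for d in range(N):
            for c in range(N):
                terms = ((a + j * d) % N for j in range(k))
                patterns.add(''.join('1' if t > c else '0' for t in terms))
    return patterns

pats = step_patterns(64)
assert '11101000' not in pats  # 1s at positions 1, 2, 3, 5: forbidden above
assert len(pats) < 2 ** 8      # so not every 8-bit sequence appears
```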

Filed under: expository, math.CO, math.RA Tagged: polynomial method, tensors, Will Sawin

I have spent part of the summer working with Megan Bedell (Chicago) to see if there is any evidence that radial velocity measurements with the *HARPS* instrument might be affected by calibration issues, or might be helped by taking some kind of hierarchical approach to calibration. We weren't building that hierarchical model; we were looking to see if there is evidence in the residuals for information that a hierarchical model could latch on to. We found nothing, to my surprise. I think this means that the *HARPS* pipelines are absolutely awesome. I think they are closed-source, so we can't do much but inspect the output.

Given this, we decided to start looking at stellar diagnostics—if it isn't the instrument calibration, then maybe it is actually the star itself: We need to ask whether we can see spectral signatures that predict radial velocity. This is a very general causal formulation of the problem: We do not expect that a star's spectrum will vary with the phase of an exoplanet's orbit (unless it is a very hot planet!), so if anything about the spectrum predicts the radial velocity, we have something to latch on to. The idea is that we might see the spectral signature of hot up-welling or cold down-welling at the stellar surface. There is much work in this area, but I am not sure that anyone has done anything truly data driven (in the style, for example, of *The Cannon*). We discussed first steps towards doing that, with Bedell taking on plotting tasks and me writing down some methodological ideas.
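A first data-driven test could be as simple as a held-out regression: a toy sketch (my own, entirely synthetic; it is not the method we settled on) of asking whether spectral features predict radial velocity better than the raw scatter.

```python
# Toy sketch (my own, synthetic data): does a linear function of "spectral
# features" predict radial velocity better than the baseline scatter?
import numpy as np

rng = np.random.default_rng(42)
n_epochs, n_pixels = 40, 8
spectra = rng.normal(size=(n_epochs, n_pixels))  # stand-in residual spectra
true_w = np.zeros(n_pixels)
true_w[0] = 0.5                                  # one pixel carries RV signal
rv = spectra @ true_w + 0.1 * rng.normal(size=n_epochs)

def loo_rms(X, y):
    """RMS of leave-one-out least-squares predictions of y from X."""
    errs = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        errs.append(y[i] - X[i] @ w)
    return float(np.sqrt(np.mean(np.square(errs))))

# If held-out prediction beats the plain standard deviation of the RVs,
# the spectra carry usable information.
print(loo_rms(spectra, rv), float(np.std(rv)))
```

The leave-one-out step matters: an in-sample fit would "predict" the radial velocities even from pure noise.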