- I guess the most dominant pitching performance I’ve seen in person? Quintana never *seemed* dominant. The Brewers hit a lot of balls hard. But a 3-hit complete game shutout is a 3-hit complete game shutout.
- A lot of Cubs fans. A *lot* a lot. My kids both agreed there were more Cubs than Brewers fans there, in a game that probably mattered more to Milwaukee.
- For Cubs fans to boo Ryan Braun in Wrigley Field is OK, I guess. To come to Miller Park and boo Ryan Braun is classless. Some of those people were wearing Sammy Sosa jerseys!
- This is the first time I’ve sat high up in the outfield. And the view was great, as it’s been from every other seat I’ve ever occupied there. A really nice design. If only the food were better.

There will be a conference on applied category theory!

• Applied Category Theory (ACT 2018). School 23–27 April 2018 and conference 30 April–4 May 2018 at the Lorentz Center in Leiden, the Netherlands. Organized by Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford).

The plenary speakers will be:

• Samson Abramsky (Oxford)

• John Baez (UC Riverside)

• Kathryn Hess (EPFL)

• Mehrnoosh Sadrzadeh (Queen Mary)

• David Spivak (MIT)

There will be a lot more to say as this progresses, but for now let me just quote from the conference website:

Applied Category Theory (ACT 2018) is a five-day workshop on applied category theory running from April 30 to May 4 at the Lorentz Center in Leiden, the Netherlands.

Towards an Integrative Science: in this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one scientific discipline can be reused in another. The aim of the workshop is to (1) explore the use of category theory within and across different disciplines, (2) create a more cohesive and collaborative ACT community, especially among early-stage researchers, and (3) accelerate research by outlining common goals and open problems for the field.

While the workshop will host discussions on a wide range of applications of category theory, there will be four special tracks on exciting new developments in the field:

1. Dynamical systems and networks

2. Systems biology

3. Cognition and AI

4. Causality

Accompanying the workshop will be an Adjoint Research School for early-career researchers. This will comprise a 16 week online seminar, followed by a 4 day research meeting at the Lorentz Center in the week prior to ACT 2018. Applications to the school will open prior to October 1, and are due November 1. Admissions will be notified by November 15.

Sincerely,

The organizers

Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford)

We welcome any feedback! Please send comments to this link.

## About Applied Category Theory

Category theory is a branch of mathematics originally developed to transport ideas from one branch of mathematics to another, e.g. from topology to algebra. Applied category theory refers to efforts to transport the ideas of category theory from mathematics to other disciplines in science, engineering, and industry.

This site originated from discussions at the Computational Category Theory Workshop at NIST on Sept. 28-29, 2015. It serves to collect and disseminate research, resources, and tools for the development of applied category theory, and hosts a blog for those involved in its study.

## The proposal:

Towards an Integrative Science

Category theory was developed in the 1940s to translate ideas from one field of mathematics, e.g. topology, to another field of mathematics, e.g. algebra. More recently, category theory has become an unexpectedly useful and economical tool for modeling a range of different disciplines, including programming language theory [10], quantum mechanics [2], systems biology [12], complex networks [5], database theory [7], and dynamical systems [14].

A category consists of a collection of objects together with a collection of maps between those objects, satisfying certain rules. Topologists and geometers use category theory to describe the passage from one mathematical structure to another, while category theorists are also interested in categories for their own sake. In computer science and physics, many types of categories (e.g. topoi or monoidal categories) are used to give a formal semantics of domain-specific phenomena (e.g. automata [3], or regular languages [11], or quantum protocols [2]). In the applied category theory community, a long-articulated vision understands categories as mathematical workspaces for the experimental sciences, similar to how they are used in topology and geometry [13]. This has proved true in certain fields, including computer science and mathematical physics, and we believe that these results can be extended in an exciting direction: we believe that category theory has the potential to bridge specific different fields, and moreover that developments in such fields (e.g. automata) can be transferred successfully into other fields (e.g. systems biology) through category theory. Already, for example, the categorical modeling of quantum processes has helped solve an important open problem in natural language processing [9].

In this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one discipline can be reused in another. Tangibly and in the short-term, we will bring together people from different disciplines in order to write an expository survey paper that grounds the varied research in applied category theory and lays out the parameters of the research program.

In formulating this research program, we are motivated by recent successes where category theory was used to model a wide range of phenomena across many disciplines, e.g. open dynamical systems (including open Markov processes and open chemical reaction networks), entropy and relative entropy [6], and descriptions of computer hardware [8]. Several talks will address some of these new developments. But we are also motivated by an open problem in applied category theory, one which was observed at the most recent workshop in applied category theory (Dagstuhl, Germany, in 2015): “a weakness of semantics/CT is that the definitions play a key role. Having the right definitions makes the theorems trivial, which is the opposite of hard subjects where they have combinatorial proofs of theorems (and simple definitions). […] In general, the audience agrees that people see category theorists only as reconstructing the things they knew already, and that is a disadvantage, because we do not give them a good reason to care enough” [1, pg. 61].

In this workshop, we wish to articulate a natural response to the above: instead of treating the reconstruction as a weakness, we should treat the use of categorical concepts as a natural part of transferring and integrating knowledge across disciplines. The restructuring employed in applied category theory cuts through jargon, helping to elucidate common themes across disciplines. Indeed, the drive for a common language and comparison of similar structures in algebra and topology is what led to the development of category theory in the first place, and recent hints show that this approach is not only useful between mathematical disciplines, but between scientific ones as well. For example, the ‘Rosetta Stone’ of Baez and Stay demonstrates how symmetric monoidal closed categories capture the common structure between logic, computation, and physics [4].

[1] Samson Abramsky, John C. Baez, Fabio Gadducci, and Viktor Winschel. Categorical methods at the crossroads. Report from Dagstuhl Perspectives Workshop 14182, 2014.

[2] Samson Abramsky and Bob Coecke. A categorical semantics of quantum protocols. In Handbook of Quantum Logic and Quantum Structures. Elsevier, Amsterdam, 2009.

[3] Michael A. Arbib and Ernest G. Manes. A categorist’s view of automata and systems. In Ernest G. Manes, editor, Category Theory Applied to Computation and Control. Springer, Berlin, 2005.

[4] John C. Baez and Mike Stay. Physics, topology, logic and computation: a Rosetta stone. In Bob Coecke, editor, New Structures for Physics. Springer, Berlin, 2011.

[5] John C. Baez and Brendan Fong. A compositional framework for passive linear networks. arXiv e-prints, 2015.

[6] John C. Baez, Tobias Fritz, and Tom Leinster. A characterization of entropy in terms of information loss. Entropy, 13(11):1945–1957, 2011.

[7] Michael Fleming, Ryan Gunther, and Robert Rosebrugh. A database of categories. Journal of Symbolic Computation, 35(2):127–135, 2003.

[8] Dan R. Ghica and Achim Jung. Categorical semantics of digital circuits. In Ruzica Piskac and Muralidhar Talupur, editors, Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design. Springer, Berlin, 2016.

[9] Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Stephen Pulman, and Bob Coecke. Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Logic and Algebraic Structures in Quantum Computing and Information. Cambridge University Press, Cambridge, 2013.

[10] Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55–92, 1991.

[11] Nicholas Pippenger. Regular languages and Stone duality. Theory of Computing Systems 30(2):121–134, 1997.

[12] Robert Rosen. The representation of biological systems from the standpoint of the theory of categories. Bulletin of Mathematical Biophysics, 20(4):317–341, 1958.

[13] David I. Spivak. Category Theory for Scientists. MIT Press, Cambridge MA, 2014.

[14] David I. Spivak, Christina Vasilakopoulou, and Patrick Schultz. Dynamical systems and sheaves. arXiv e-prints, 2016.

Last weekend, I gave a talk on big numbers, as well as a Q&A about quantum computing, at Festivaletteratura: one of the main European literary festivals, held every year in beautiful and historic Mantua, Italy. (For those who didn’t know, as I didn’t: this is the city where Virgil was born, and where Romeo gets banished in *Romeo and Juliet*. Its layout hasn’t substantially changed since the Middle Ages.)

I don’t know how much big numbers or quantum computing have to do with literature, but I relished the challenge of explaining these things to an audience that was not merely “popular” but humanistically rather than scientifically inclined. In this case, there was not only a math barrier, but *also* a language barrier, as the festival was mostly in Italian and only some of the attendees knew English, to varying degrees. The quantum computing session was live-translated into Italian (the challenge faced by the translator in not mangling this material provided a lot of free humor), but the big numbers talk wasn’t. What’s more, the talk was held outdoors, on the steps of a cathedral, with tons of background noise, including a bell that loudly chimed halfway through the talk. So if my own words weren’t simple and clear, forget it.

Anyway, in the rest of this post, I’ll share a writeup of my big numbers talk. The talk has substantial overlap with my “classic” Who Can Name The Bigger Number? essay from 1999. While I don’t mean to supersede or displace that essay, the truth is that I think and write somewhat differently than I did as a teenager (whuda thunk?), and I wanted to give Scott_{2017} a crack at material that Scott_{1999} has been over already. If nothing else, the new version is more up-to-date and less self-indulgent, and it includes points (for example, the relation between ordinal generalizations of the Busy Beaver function and the axioms of set theory) that I didn’t understand back in 1999.

For regular readers of this blog, I don’t know how much will be new here. But if you’re one of those people who keeps introducing themselves at social events by saying “I really love your blog, Scott, even though I don’t understand anything that’s in it”—something that’s always a bit awkward for me, because, uh, thanks, I guess, but what am I supposed to say next?—then **this lecture is for you**. I hope you’ll read it and understand it.

Thanks so much to Festivaletteratura organizer Matteo Polettini for inviting me, and to Fabrizio Illuminati for moderating the Q&A. I had a wonderful time in Mantua, although I confess there’s something about being Italian that I don’t understand. Namely: how do you derive any pleasure from international travel, if anywhere you go, the pizza, pasta, bread, cheese, ice cream, coffee, architecture, scenery, historical sights, and *pretty much everything else* all fall short of what you’re used to?

**Big Numbers**

by Scott Aaronson

Sept. 9, 2017

My four-year-old daughter sometimes comes to me and says something like: “daddy, I think I *finally* figured out what the biggest number is! Is it a million million million million million million million million thousand thousand thousand hundred hundred hundred hundred twenty eighty ninety eighty thirty a million?”

So I reply, “I’m not even sure exactly what number you named—but whatever it is, why not that number plus one?”

“Oh yeah,” she says. “So is *that* the biggest number?”

Of course there’s no biggest number, but it’s natural to wonder what are the biggest numbers we can name in a reasonable amount of time. Can I have two volunteers from the audience—ideally, two kids who like math?

[Two kids eventually come up. I draw a line down the middle of the blackboard, and place one kid on each side of it, each with a piece of chalk.]

So the game is, you each have ten seconds to write down the biggest number you can. You can’t write anything like “the other person’s number plus 1,” and you also can’t write infinity—it has to be finite. But other than that, you can write basically anything you want, as long as I’m able to understand exactly what number you’ve named. [These instructions are translated into Italian for the kids.]

Are you ready? On your mark, get set, GO!

[The kid on the left writes something like: 999999999

While the kid on the right writes something like: 11111111111111111

Looking at these, I comment:]

9 is bigger than 1, but 1 is a bit faster to write, and as you can see that makes the difference here! OK, let’s give our volunteers a round of applause.

[I didn’t plant the kids, but if I had, I couldn’t have designed a better jumping-off point.]

I’ve been fascinated by how to name huge numbers since I was a kid myself. When I was a teenager, I even wrote an essay on the subject, called Who Can Name the Bigger Number? That essay might *still* get more views than any of the research I’ve done in all the years since! I don’t know whether to be happy or sad about that.

I think the reason the essay remains so popular is that it shows up on Google whenever someone types something like “what is the biggest number?” Some of you might know that Google itself was named after the huge number called a googol: 10^{100}, or 1 followed by a hundred zeroes.

Of course, a googol isn’t even close to the biggest number we can name. For starters, there’s a googolplex, which is 1 followed by a googol zeroes. Then there’s a googolplexplex, which is 1 followed by a googolplex zeroes, and a googolplexplexplex, and so on. But one of the most basic lessons you’ll learn in this talk is that, when it comes to naming big numbers, whenever you find yourself just repeating the same operation over and over and over, it’s time to step back, and look for something new to do that transcends everything you were doing previously. (Applications to everyday life left as exercises for the listener.)

One of the first people to think about systems for naming huge numbers was Archimedes, who was Greek but lived in what’s now Italy (specifically Syracuse, Sicily) in the 200s BC. Archimedes wrote a sort of pop-science article—possibly history’s *first* pop-science article—called The Sand-Reckoner. In this remarkable piece, which was addressed to the King of Syracuse, Archimedes sets out to calculate an upper bound on the number of grains of sand needed to fill the entire universe, or at least the universe as known in antiquity. He thereby seeks to refute people who use “the number of sand grains” as a shorthand for uncountability and unknowability.

Of course, Archimedes was just guessing about the size of the universe, though he *did* use the best astronomy available in his time—namely, the work of Aristarchus, who anticipated Copernicus. Besides estimates for the size of the universe and of a sand grain, the other thing Archimedes needed was a way to name arbitrarily large numbers. Since he didn’t have Arabic numerals or scientific notation, his system was basically just to compose the word “myriad” (which means 10,000) into bigger and bigger chunks: a “myriad myriad” gets its own name, a “myriad myriad myriad” gets another, and so on. Using this system, Archimedes estimated that ~10^{63} sand grains would suffice to fill the universe. Ancient Hindu mathematicians were able to name similarly large numbers using similar notations. In some sense, the next really fundamental advances in naming big numbers wouldn’t occur until the 20^{th} century.

We’ll come to those advances, but before we do, I’d like to discuss another question that motivated Archimedes’ essay: namely, what are the biggest numbers *relevant to the physical world*?

For starters, how many atoms are in a human body? Anyone have a guess? About 10^{28}. (If you remember from high-school chemistry that a “mole” is 6×10^{23}, this is not hard to ballpark.)
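The ballpark from the mole can be sketched in a few lines of Python. This is only a rough illustration: it assumes a 70 kg body made entirely of water (H2O, three atoms per molecule), which is a simplification introduced here, not something from the talk.

```python
AVOGADRO = 6.022e23  # particles per mole

def atoms_in_body(mass_kg=70.0, molar_mass_g=18.015, atoms_per_molecule=3):
    """Ballpark atom count, pretending the body is pure water (an assumption)."""
    moles = mass_kg * 1000.0 / molar_mass_g          # grams divided by grams per mole
    return moles * atoms_per_molecule * AVOGADRO

print(f"{atoms_in_body():.1e} atoms")
```

Under these assumptions the result lands between 10^{27} and 10^{28}, consistent with the order of magnitude quoted above.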

How many stars are in our galaxy? Estimates vary, but let’s say a few hundred billion.

How many stars are in the entire observable universe? Something like 10^{23}.

How many *subatomic particles* are in the observable universe? No one knows for sure—for one thing, because we don’t know what the dark matter is made of—but 10^{90} is a reasonable estimate.

Some of you might be wondering: but for all anyone knows, couldn’t the universe be infinite? Couldn’t it have *infinitely* many stars and particles? The answer to that is interesting: indeed, no one knows whether space goes on forever or curves back on itself, like the surface of the earth. But because of the dark energy, discovered in 1998, it seems likely that even if space is infinite, we can only ever see a finite part of it. The dark energy is a force that pushes the galaxies apart. The further away they are from us, the faster they’re receding—with galaxies far enough away from us receding *faster than light*.

Right now, we can see the light from galaxies that are up to about 45 billion light-years away. (Why 45 billion light-years, you ask, if the universe itself is “only” 13.6 billion years old? Well, when the galaxies emitted the light, they were a lot closer to us than they are now! The universe expanded in the meantime.) If, as seems likely, the dark energy has the form of a cosmological constant, then there’s a somewhat further horizon, such that it’s not just that the galaxies beyond that can’t be seen by us right now—it’s that they can *never* be seen.

In practice, many big numbers come from the phenomenon of exponential growth. Here’s a graph showing the three functions n, n^{2}, and 2^{n}:

The difference is, n and even n^{2} grow in a more-or-less manageable way, but 2^{n} just shoots up off the screen. The shooting-up has real-life consequences—indeed, more important consequences than just about any other mathematical fact one can think of.
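For readers without the graph in front of them, a small table makes the same point. The sketch below simply tabulates the three functions; the helper name `growth_table` is made up for this illustration.

```python
def growth_table(ns):
    """Return rows (n, n**2, 2**n) for each n in ns."""
    return [(n, n**2, 2**n) for n in ns]

# Print the three functions side by side for a few doubling values of n.
for n, square, power in growth_table([1, 2, 4, 8, 16, 32, 64]):
    print(f"{n:>3} {square:>6} {power:>22}")
```

By n = 64 the square is still a four-digit number, while 2^{n} already has twenty digits.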

The current human population is about 7.5 billion (when I was a kid, it was more like 5 billion). Right now, the population is doubling about once every 64 years. If it continues to double at that rate, and humans don’t colonize other worlds, then you can calculate that, less than 3000 years from now, the entire earth, all the way down to the core, will be made of human flesh. I hope the people use deodorant!
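Here is one hedged way to check that figure. The earth's mass is a standard value; the 60 kg average body mass is an assumption chosen for illustration, and the model simply counts doublings until the total mass of people equals the mass of the earth.

```python
import math

EARTH_MASS_KG = 5.97e24   # standard figure for the mass of the earth
PERSON_MASS_KG = 60.0     # assumed average body mass (illustrative)

def years_until_earth_is_people(population=7.5e9, doubling_years=64.0):
    """Years of steady doubling until total human mass equals the earth's mass."""
    target_population = EARTH_MASS_KG / PERSON_MASS_KG
    doublings = math.log2(target_population / population)
    return doublings * doubling_years

print(round(years_until_earth_is_people()))
```

Only about 44 doublings are needed, which is why the answer comes out under 3000 years. Changing the assumed body mass by a factor of two shifts the result by just one doubling period.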

Nuclear chain reactions are a second example of exponential growth: one uranium or plutonium nucleus fissions and emits neutrons that cause, let’s say, two other nuclei to fission, which then cause *four* nuclei to fission, then 8, 16, 32, and so on, until boom, you’ve got your nuclear weapon (or your nuclear reactor, if you do something to slow the process down). A third example is compound interest, as with your bank account, or for that matter an entire country’s GDP. A fourth example is Moore’s Law, which is the thing that said that the number of components in a microprocessor doubled every 18 months (with other metrics, like memory, processing speed, etc., on similar exponential trajectories). Here at Festivaletteratura, there’s a “Hack Space,” where you can see state-of-the-art Olivetti personal computers from around 1980: huge desk-sized machines with maybe 16K of usable RAM. Moore’s Law is the thing that took us from those (and the even bigger, weaker computers before them) to the smartphone that’s in your pocket.

However, a general rule is that *any time we encounter exponential growth in our observed universe, it can’t last for long*. It *will* stop: if not sooner, then when it runs out of whatever resource it needs to continue, for example, food or land in the case of people, or fuel in the case of a nuclear reaction. OK, but what about Moore’s Law: what physical constraint will stop *it*?

By some definitions, Moore’s Law has *already* stopped: computers aren’t getting that much faster in terms of clock speed; they’re mostly just getting more and more parallel, with more and more cores on a chip. And it’s easy to see why: the speed of light is finite, which means the speed of a computer will always be limited by the size of its components. And transistors are now just 15 nanometers across; a couple orders of magnitude smaller and you’ll be dealing with individual atoms. And unless we leap really far into science fiction, it’s hard to imagine building a transistor smaller than one atom across!

OK, but what if we *do* leap really far into science fiction? Forget about engineering difficulties: is there any fundamental principle of *physics* that prevents us from making components smaller and smaller, and thereby making our computers faster and faster, without limit?

While no one has tested this directly, it appears from current physics that there *is* a fundamental limit to speed, and that it’s about 10^{43} operations per second, or one operation per Planck time. Likewise, it appears that there’s a fundamental limit to the density with which information can be stored, and that it’s about 10^{69} bits per square meter, or one bit per Planck area. (Surprisingly, the latter limit scales only with the surface area of a region, not with its volume.)
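Both orders of magnitude follow directly from the Planck units. The sketch below uses rounded CODATA values for the Planck time and Planck length and ignores factors of order one, so it should be read as an order-of-magnitude check only.

```python
PLANCK_TIME_S = 5.391e-44     # seconds (CODATA value, rounded)
PLANCK_LENGTH_M = 1.616e-35   # meters (CODATA value, rounded)

# One operation per Planck time.
ops_per_second = 1.0 / PLANCK_TIME_S

# One bit per Planck area (Planck length squared).
bits_per_square_meter = 1.0 / PLANCK_LENGTH_M**2

print(f"{ops_per_second:.1e} operations per second")
print(f"{bits_per_square_meter:.1e} bits per square meter")
```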

What would happen if you tried to build a faster computer than that, or a denser hard drive? The answer is: cycling through that many different states per second, or storing that many bits, would involve concentrating so much *energy* in so small a region, that the region would exceed what’s called its Schwarzschild radius. If you don’t know what that means, it’s just a fancy way of saying that your computer would collapse to a black hole. I’ve always liked that as Nature’s way of telling you not to do something!

Note that, on the modern view, a black hole *itself* is not only the densest possible object allowed by physics, but also the most efficient possible hard drive, storing ~10^{69} bits per square meter of its event horizon—though the bits are not so easy to retrieve! It’s also, in a certain sense, the fastest possible computer, since it really *does* cycle through 10^{43} states per second—though it might not be computing anything that anyone would care about.

We can also combine these fundamental limits on computer speed and storage capacity, with the limits that I mentioned earlier on the size of the observable universe, which come from the cosmological constant. If we do so, we get an upper bound of ~10^{122} on the number of bits that can ever be involved in *any* computation in our world, no matter how large: if we tried to do a bigger computation than that, the far parts of it would be receding away from us faster than the speed of light. In some sense, this 10^{122} is the most fundamental number that sets the scale of our universe: on the current conception of physics, everything you’ve ever seen or done, or will see or will do, can be represented by a sequence of at most 10^{122} ones and zeroes.

Having said that, in math, computer science, and many other fields (including physics itself), many of us meet bigger numbers than 10^{122} dozens of times before breakfast! How so? Mostly because we choose to ask, not about the number of *things that are*, but about the number of possible *ways they could be*—not about the size of ordinary 3-dimensional space, but the sizes of abstract spaces of possible configurations. And the latter are subject to exponential growth, continuing way beyond 10^{122}.

As an example, let’s ask: how many different novels could possibly be written (say, at most 400 pages long, with a normal-size font, yadda yadda)? Well, we could get a lower bound on the number just by walking around here at Festivaletteratura, but the number that *could* be written certainly far exceeds the number that have been written or ever will be. This was the subject of Jorge Luis Borges’ famous story The Library of Babel, which imagined an immense library containing every book that could possibly be written up to a certain length. Of course, the vast majority of the books are filled with meaningless nonsense, but among their number one can find all the great works of literature, books predicting the future of humanity in perfect detail, books predicting the future except with a single error, etc. etc. etc.

To get more quantitative, let’s simply ask: how many different ways are there to fill the *first page* of a novel? Let’s go ahead and assume that the page is filled with intelligible (or at least grammatical) English text, rather than arbitrary sequences of symbols, at a standard font size and page size. In that case, using standard estimates for the entropy (i.e., compressibility) of English, I estimated this morning that there are maybe ~10^{700} possibilities. So, forget about the rest of the novel: there are astronomically more possible *first pages* than could fit in the observable universe!
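One way an estimate of this kind could be reproduced is sketched below. Both parameters are assumptions picked for illustration (roughly 2300 characters on a page, and about 1 bit of entropy per character of English, in line with Shannon-style estimates); they are not necessarily the values used for the figure above.

```python
import math

def log10_possible_pages(chars_per_page=2300, bits_per_char=1.0):
    """Base-10 logarithm of the number of plausible pages.

    If each character carries bits_per_char bits of entropy, a page carries
    chars_per_page * bits_per_char bits, so there are about 2**bits pages.
    """
    total_bits = chars_per_page * bits_per_char
    return total_bits * math.log10(2)

print(f"about 10^{log10_possible_pages():.0f} possible first pages")
```

With these inputs the exponent comes out close to 700, and it scales linearly with either assumption.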

We could likewise ask: how many chess games could be played? I’ve seen estimates from 10^{40} up to 10^{120}, depending on whether we count only “sensible” games or also “absurd” ones (though in all cases, with a limit on the length of the game as might occur in a real competition). For Go, by contrast, which is played on a larger board (19×19 rather than 8×8) the estimates for the number of possible games seem to start at 10^{800} and only increase from there. This difference in magnitudes has *something* to do with why Go is a “harder” game than chess, why computers were able to beat the world chess champion already in 1997, but the world Go champion not until last year.

Or we could ask: given a thousand cities, how many routes are there for a salesman that visit each city exactly once? We write the answer as 1000!, pronounced “1000 factorial,” which just means 1000×999×998×…×2×1: there are 1000 choices for the first city, then 999 for the second city, 998 for the third, and so on. This number is about 4×10^{2567}. So again, more possible routes than atoms in the visible universe, yadda yadda.

But suppose the salesman is interested only in the *shortest* route that visits each city, given the distance between every city and every other. We could then ask: to find that shortest route, would a computer need to search exhaustively through all 1000! possibilities—or, maybe not all 1000!, maybe it could be a bit more clever than that, but at any rate, a number that grew exponentially with the number of cities n? Or could there be an algorithm that zeroed in on the shortest route dramatically faster: say, using a number of steps that grew only linearly or quadratically with the number of cities?
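To make the exhaustive approach concrete, here is a minimal brute-force sketch. It assumes the cities are points in the plane with straight-line distances, and it tries every one of the n! orderings, so it is only usable for a handful of cities. It illustrates the naive search, not any clever algorithm.

```python
import math
from itertools import permutations

def shortest_route(points):
    """Try every ordering of the points and return (length, route) of the shortest."""
    best_len, best_route = math.inf, None
    for route in permutations(points):
        # Sum the straight-line distances between consecutive cities.
        length = sum(math.dist(a, b) for a, b in zip(route, route[1:]))
        if length < best_len:
            best_len, best_route = length, route
    return best_len, best_route

cities = [(0, 0), (3, 0), (1, 0), (2, 0)]
length, route = shortest_route(cities)
print(length, route)
```

Four cities mean 24 orderings; ten cities already mean 3,628,800, and each added city multiplies the work again.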

This, modulo a few details, is one of the most famous unsolved problems in all of math and science. You may have heard of it; it’s called P versus NP. P (Polynomial-Time) is the class of problems that an ordinary digital computer can solve in a “reasonable” amount of time, where we define “reasonable” to mean, growing at most like the size of the problem (for example, the number of cities) raised to some fixed power. NP (Nondeterministic Polynomial-Time) is the class for which a computer can at least *recognize* a solution in polynomial-time. If P=NP, it would mean that for every combinatorial problem of this sort, for which a computer could recognize a valid solution—Sudoku puzzles, scheduling airline flights, fitting boxes into the trunk of a car, etc. etc.—there would be an algorithm that cut through the combinatorial explosion of possible solutions, and zeroed in on the best one. If P≠NP, it would mean that at least some problems of this kind required astronomical time, regardless of how cleverly we programmed our computers.

Most of us believe that P≠NP—indeed, I like to say that if we were physicists, we would’ve simply declared P≠NP a “law of nature,” and given ourselves Nobel Prizes for the discovery of the law! And if it turned out that P=NP, we’d just give ourselves more Nobel Prizes for the law’s overthrow. But because we’re mathematicians and computer scientists, we call it a “conjecture.”

Another famous example of an NP problem is: I give you (say) a 2000-digit number, and I ask you to find its prime factors. Multiplying two thousand-digit numbers is easy, at least for a computer, but factoring the product back into primes *seems* astronomically hard—at least, with our present-day computers running any known algorithm. Why does anyone care? Well, you might know that, any time you order something online—in fact, every time you see a little padlock icon in your web browser—your personal information, like (say) your credit card number, is being protected by a cryptographic code that depends on the belief that factoring huge numbers is hard, or a few closely-related beliefs. If P=NP, then those beliefs would be false, and indeed *all* cryptography that depends on hard math problems would be breakable in “reasonable” amounts of time.
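The asymmetry shows up even at toy scale. The sketch below multiplies two known primes (the Mersenne primes 2^{17}−1 and 2^{19}−1) in a single step, then recovers a factor by trial division, which is the most naive method and far from the best known algorithm. Its step count grows roughly like the square root of the number, that is, exponentially in the number of digits.

```python
def smallest_factor(n):
    """Return the smallest prime factor of n >= 2 by trial division."""
    if n % 2 == 0:
        return 2
    d = 3
    while d * d <= n:      # a composite n must have a factor no larger than sqrt(n)
        if n % d == 0:
            return d
        d += 2             # skip even candidates
    return n               # no divisor found, so n itself is prime

p, q = 2**17 - 1, 2**19 - 1   # both are Mersenne primes
product = p * q               # multiplying: one step
print(smallest_factor(product), product // smallest_factor(product))
```

For an 11-digit product this takes tens of thousands of trial divisions; for a 2000-digit product the same loop would need on the order of 10^{1000} of them.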

In the special case of factoring, though—and of the other number theory problems that underlie modern cryptography—it wouldn’t even take anything as shocking as P=NP for them to fall. Actually, that provides a good segue into another case where exponentials, and numbers vastly larger than 10^{122}, regularly arise in the real world: quantum mechanics.

Some of you might have heard that quantum mechanics is complicated or hard. But I can let you in on a secret, which is that it’s incredibly simple once you take the physics out of it! Indeed, I think of quantum mechanics as not exactly even “physics,” but more like an operating system that the rest of physics runs on as application programs. It’s a certain generalization of the rules of probability. In one sentence, the central thing quantum mechanics says is that, to fully describe a physical system, you have to assign a number called an “amplitude” to every *possible* configuration that the system could be found in. These amplitudes are used to calculate the probabilities that the system will be found in one configuration or another if you look at it. But the amplitudes aren’t themselves probabilities: rather than just going from 0 to 1, they can be positive or negative or even complex numbers.

For us, the key point is that, if we have a system with (say) a thousand interacting particles, then the rules of quantum mechanics say we need at least 2^{1000} amplitudes to describe it—which is way more than we could write down on pieces of paper filling the entire observable universe! In some sense, chemists and physicists knew about this immensity since 1926. But they knew it mainly as a practical problem: if you’re trying to simulate quantum mechanics on a conventional computer, then as far as we know, the resources needed to do so increase exponentially with the number of particles being simulated. Only in the 1980s did a few physicists, such as Richard Feynman and David Deutsch, suggest “turning the lemon into lemonade,” and building computers that *themselves* would exploit the exponential growth of amplitudes. Supposing we built such a computer, what would it be good for? At the time, the only obvious application was simulating quantum mechanics itself! And that’s probably *still* the most important application today.

In 1994, though, a guy named Peter Shor made a discovery that dramatically increased the level of interest in quantum computers. That discovery was that a quantum computer, if built, could factor an n-digit number using a number of steps that grows only like about n^{2}, rather than exponentially with n. The upshot is that, if and when practical quantum computers are built, they’ll be able to break almost all the cryptography that’s currently used to secure the Internet.

(Right now, only small quantum computers have been built; the record for using Shor’s algorithm is still to factor 21 into 3×7 with high statistical confidence! But Google is planning within the next year or so to build a chip with 49 quantum bits, or qubits, and other groups around the world are pursuing parallel efforts. Almost certainly, 49 qubits still won’t be enough to do anything *useful*, including codebreaking, but it might be enough to do something *classically hard*, in the sense of taking at least ~2^{49} or 563 trillion steps to simulate classically.)

I should stress, though, that for *other* NP problems—including breaking various other cryptographic codes, and solving the Traveling Salesman Problem, Sudoku, and the other combinatorial problems mentioned earlier—we don’t know any quantum algorithm analogous to Shor’s factoring algorithm. For these problems, we generally think that a quantum computer could solve them in roughly the *square root* of the number of steps that would be needed classically, because of another famous quantum algorithm called Grover’s algorithm. But getting an *exponential* quantum speedup for these problems would, at the least, require an additional breakthrough. No one has proved that such a breakthrough in quantum algorithms is impossible: indeed, no one has proved that it’s impossible even for *classical* algorithms; that’s the P vs. NP question! But most of us regard it as unlikely.

If we’re right, then the upshot is that quantum computers are not magic bullets: they might yield dramatic speedups for certain special problems (like factoring), but they won’t tame the curse of exponentiality, cut through to the optimal solution, every time we encounter a Library-of-Babel-like profusion of possibilities. For (say) the Traveling Salesman Problem with a thousand cities, even a quantum computer—which is the most powerful kind of computer rooted in known laws of physics—might, for all we know, take longer than the age of the universe to find the shortest route.

The truth is, though, the biggest numbers that show up in math are *way* bigger than anything we’ve discussed until now: bigger than 10^{122}, or even

$$ 2^{10^{122}}, $$

which is a rough estimate for the number of quantum-mechanical amplitudes needed to describe our observable universe.

For starters, there’s Skewes’ number, which the mathematician G. H. Hardy once called “the largest number which has ever served any definite purpose in mathematics.” Let π(x) be the number of prime numbers up to x: for example, π(10)=4, since we have 2, 3, 5, and 7. Then there’s a certain estimate for π(x) called li(x). It’s known that li(x) overestimates π(x) for an enormous range of x’s (up to trillions and beyond)—but then at some point, it crosses over and starts underestimating π(x) (then overestimates again, then underestimates, and so on). Skewes’ number is an upper bound on the location of the first such crossover point. In 1955, Skewes proved that the first crossover must happen before

$$ x = 10^{10^{10^{964}}}. $$

Note that this bound has since been substantially improved, to 1.4×10^{316}. But no matter: there are numbers vastly bigger even than Skewes’ original estimate, which have since shown up in Ramsey theory and other parts of logic and combinatorics to take Skewes’ number’s place.
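For a concrete feel for the overestimate at small x, here is a minimal sketch that compares π(x) with li(x). It assumes li means the logarithmic integral taken from 0, evaluated with the standard series γ + ln ln x + Σ (ln x)^k/(k·k!); the function names `prime_count` and `li` are illustrative only.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def prime_count(x):
    """pi(x): number of primes <= x, via a sieve of Eratosthenes."""
    if x < 2:
        return 0
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            # Cross out the multiples of p, starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)

def li(x, terms=200):
    """Logarithmic integral for x > 1, summed from its power series in ln x."""
    log_x = math.log(x)
    total = EULER_GAMMA + math.log(log_x)
    term = 1.0
    for k in range(1, terms + 1):
        term *= log_x / k  # term is now (ln x)^k / k!
        total += term / k
    return total

for x in (10, 1000, 10**6):
    print(x, prime_count(x), round(li(x), 3))
```

In every row printed here li(x) sits above π(x); the crossover that Skewes bounded lies far beyond anything a loop like this could reach.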

Alas, I won’t have time here to delve into specific (beautiful) examples of such numbers, such as Graham’s number. So in lieu of that, let me just tell you about the sorts of processes, going far beyond exponentiation, that tend to yield such numbers.

The starting point is to remember a sequence of operations we all learn about in elementary school, and then ask why the sequence suddenly and inexplicably stops.

As long as we’re only talking about positive integers, “multiplication” just means “repeated addition.” For example, 5×3 means 5 added to itself 3 times, or 5+5+5.

Likewise, “exponentiation” just means “repeated multiplication.” For example, 5^{3} means 5×5×5.

But what’s repeated exponentiation? For that we introduce a new operation, which we call *tetration*, and write like so: ^{3}5 means 5 raised to itself 3 times, or

$$ ^{3} 5 = 5^{5^5} = 5^{3125} \approx 1.9 \times 10^{2184}. $$

But we can keep going. Let x *pentated* to the y, or xPy, mean x tetrated to itself y times. Let x *sextated* to the y, or xSy, mean x pentated to itself y times, and so on.
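The ladder of operations can be sketched in a few lines. This assumes the usual right-to-left convention for towers and requires y ≥ 1; the name `hyper` and the numbering of levels (1 for addition, 2 for multiplication, 3 for exponentiation, 4 for tetration, 5 for pentation) are choices made for this illustration.

```python
def hyper(level, x, y):
    """Apply the level-th operation to x and y (y >= 1).

    level 1: x + y, level 2: x * y, level 3: x ** y,
    level 4: x tetrated to the y, level 5: x pentated to the y, ...
    """
    if level == 1:
        return x + y
    if level == 2:
        return x * y
    if level == 3:
        return x ** y
    # Each higher level repeats the level below it, y - 1 times,
    # always feeding the running result in as the second argument.
    result = x
    for _ in range(y - 1):
        result = hyper(level - 1, x, result)
    return result

print(hyper(2, 5, 3))                 # 5 + 5 + 5
print(hyper(3, 5, 3))                 # 5 * 5 * 5
print(len(str(hyper(4, 5, 3))))       # number of digits of 5 tetrated to the 3
print(hyper(5, 2, 3))                 # 2 pentated to the 3
```

Only tiny arguments are feasible: the values outgrow any computer's memory almost immediately once the level reaches 4 or 5.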

Then we can define the Ackermann function, invented by the mathematician Wilhelm Ackermann in 1928, which cuts across *all* these operations to get more rapid growth than we could with any one of them alone. In terms of the operations above, we can give a slightly nonstandard, but perfectly serviceable, definition of the Ackermann function as follows:

A(1) is 1+1=2.

A(2) is 2×2=4.

A(3) is 3 to the 3rd power, or 3^{3}=27.

Not very impressive so far! But wait…

A(4) is 4 tetrated to the 4, or

$$ ^{4}4 = 4^{4^{4^4}} = 4^{4^{256}} = BIG $$

A(5) is 5 pentated to the 5, which I won’t even *try* to simplify. A(6) is 6 sextated to the 6. And so on.
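As a rough sketch, the nonstandard definition above amounts to applying the n-th operation to n and n. The helper `hyper` and the digit estimate for A(4) are illustrative assumptions layered on the text, and only A(1) through A(3) can actually be evaluated.

```python
import math

def hyper(level, x, y):
    """Level 1 is +, 2 is *, 3 is **, 4 is tetration, and so on (y >= 1)."""
    if level == 1:
        return x + y
    if level == 2:
        return x * y
    if level == 3:
        return x ** y
    result = x
    for _ in range(y - 1):
        result = hyper(level - 1, x, result)
    return result

def A(n):
    """The talk's variant of the Ackermann function: n (n-th operation) n."""
    return hyper(n, n, n)

print([A(n) for n in (1, 2, 3)])

# A(4) = 4^(4^256) cannot be written out, but its digit count can be
# estimated: log10(A(4)) = 4^256 * log10(4).
approx_digits_of_A4 = 4**256 * math.log10(4)
print(f"A(4) has roughly {approx_digits_of_A4:.3e} decimal digits")
```

Calling `A(4)` directly would try to build a number with far more digits than there are atoms in the observable universe, so the sketch only estimates its size.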

More than just a curiosity, the Ackermann function actually shows up sometimes in math and theoretical computer science. For example, the *inverse* Ackermann function—a function α such that α(A(n))=n, which therefore grows as slowly as the Ackermann function grows quickly, and which is at most 4 for any n that would ever arise in the physical universe—sometimes appears in the running times of real-world algorithms.

In the meantime, though, the Ackermann function also has a more immediate application. Next time you find yourself in a biggest-number contest, like the one with which we opened this talk, you can just write A(1000), or even A(A(1000)) (after specifying that A means the Ackermann function above). You’ll win—*period*—unless your opponent has also heard of something Ackermann-like or beyond.

OK, but Ackermann is very far from the end of the story. If we want to go incomprehensibly beyond it, the starting point is the so-called “Berry Paradox”, which was first described by Bertrand Russell, though he said he learned it from a librarian named Berry. The Berry Paradox asks us to imagine leaping past exponentials, the Ackermann function, and every other particular system for naming huge numbers. Instead, why not just go straight for a single gambit that seems to beat everything else:

**The biggest number that can be specified using a hundred English words or fewer**

Why is this called a paradox? Well, do any of you see the problem here?

Right: if the above made sense, then we could just as well have written

**Twice the biggest number that can be specified using a hundred English words or fewer**

But *we just specified that number*—one that, by definition, takes more than a hundred words to specify—using far fewer than a hundred words! Whoa. What gives?

Most logicians would say the resolution of this paradox is simply that the concept of “specifying a number with English words” isn’t precisely defined, so phrases like the ones above don’t actually name definite numbers. And how do we know that the concept isn’t precisely defined? Why, because if it was, then it would lead to paradoxes like the Berry Paradox!

So if we want to escape the jaws of logical contradiction, then in this gambit, we ought to replace English by a clear, logical language: one that can be used to specify numbers in a completely unambiguous way. Like … oh, I know! Why not write:

**The biggest number that can be specified using a computer program that’s at most 1000 bytes long**

To make this work, there are just two issues we need to get out of the way. First, what does it mean to “specify” a number using a computer program? There are different things it could mean, but for concreteness, let’s say a computer program specifies a number N if, when you run it (with no input), the program runs for exactly N steps and then stops. A program that runs forever doesn’t specify any number.

The second issue is, which programming language do we have in mind: BASIC? C? Python? The answer is that it won’t much matter! The Church-Turing Thesis, one of the foundational ideas of computer science, implies that every “reasonable” programming language can emulate every other one. So the story here can be repeated with just about any programming language of your choice. For concreteness, though, we’ll pick one of the first and simplest programming languages, namely “Turing machine”—the language invented by Alan Turing all the way back in 1936!

In the Turing machine language, we imagine a one-dimensional tape divided into squares, extending infinitely in both directions, and with all squares initially containing a “0.” There’s also a tape head with n “internal states,” moving back and forth on the tape. Each internal state contains an instruction, and the only allowed instructions are: write a “0” in the current square, write a “1” in the current square, move one square left on the tape, move one square right on the tape, jump to a different internal state, halt, and do any of the previous conditional on whether the current square contains a “0” or a “1.”
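Here is a minimal simulator sketch. It assumes the standard table formulation, in which each (state, symbol) pair maps to a symbol to write, a direction to move, and a next state; that is a common variant rather than exactly the instruction set described above. The state names "A", "B", "H" and the example machine are choices for this illustration.

```python
def run_turing_machine(table, max_steps=10_000):
    """Run a machine on an all-0 tape, starting in state "A".

    table maps (state, symbol) -> (symbol_to_write, "L" or "R", next_state);
    the state "H" means halt. Returns (steps, ones_on_tape), with steps set
    to None if the machine has not halted within max_steps.
    """
    tape = {}  # sparse tape: squares not stored here hold 0
    position, state, steps = 0, "A", 0
    while state != "H" and steps < max_steps:
        write, move, state = table[(state, tape.get(position, 0))]
        tape[position] = write
        position += 1 if move == "R" else -1
        steps += 1
    return (steps if state == "H" else None), sum(tape.values())

# A well-known 2-state machine that halts after a few steps.
TWO_STATE = {
    ("A", 0): (1, "R", "B"),
    ("A", 1): (1, "L", "B"),
    ("B", 0): (1, "L", "A"),
    ("B", 1): (1, "R", "H"),
}

print(run_turing_machine(TWO_STATE))
```

Storing the tape as a dictionary keeps the "infinite in both directions" tape finite in memory, since only visited squares are recorded.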

Using Turing machines, in 1962 the mathematician Tibor Radó invented the so-called Busy Beaver function, or BB(n), which allowed naming *by far* the largest numbers anyone had yet named. BB(n) is defined as follows: consider all Turing machines with n internal states. Some of those machines run forever, when started on an all-0 input tape. Discard them. Among the ones that eventually halt, there must be some machine that runs for a maximum number of steps before halting. However many steps that is, that’s what we call BB(n), the n^{th} Busy Beaver number.
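For very small n, the definition can be tried out by brute force. The sketch below assumes the standard table formulation of Turing machines (each state and symbol gives a write, a move, and a next state, with the halting transition counted as a step) and replaces "runs forever" with a step cap, so in general it only yields a lower bound. The function names are illustrative.

```python
from itertools import product

HALT = -1  # pseudo-state meaning "stop"

def steps_until_halt(machine, max_steps):
    """Steps taken before halting on an all-0 tape, or None if the cap is hit."""
    tape, position, state = {}, 0, 0
    for step in range(1, max_steps + 1):
        write, move, next_state = machine[(state, tape.get(position, 0))]
        tape[position] = write
        position += move
        if next_state == HALT:
            return step
        state = next_state
    return None

def busy_beaver_lower_bound(n, max_steps):
    """Longest halting run found among all n-state machines, within the cap."""
    keys = [(state, symbol) for state in range(n) for symbol in (0, 1)]
    actions = [(write, move, nxt)
               for write in (0, 1)
               for move in (-1, 1)
               for nxt in list(range(n)) + [HALT]]
    best = 0
    for choice in product(actions, repeat=len(keys)):
        steps = steps_until_halt(dict(zip(keys, choice)), max_steps)
        if steps is not None and steps > best:
            best = steps
    return best

print(busy_beaver_lower_bound(1, 50))
print(busy_beaver_lower_bound(2, 50))
```

The number of machines grows like (4n+4)^{2n}, so this is already impractical in plain Python at n=3, and no fixed step cap can be trusted in general: deciding which machines never halt is exactly the hard part.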

The first few values of the Busy Beaver function have actually been calculated, so let’s see them.

BB(1) is 1. For a 1-state Turing machine on an all-0 tape, the choices are limited: either you halt in the very first step, or else you run forever.

BB(2) is 6, as isn’t *too* hard to verify by trying things out with pen and paper.

BB(3) is 21: that determination was already a research paper.

BB(4) is 107 (another research paper).

Much like with the Ackermann function, not very impressive yet! But wait:

BB(5) is not yet known, but it’s known to be at least 47,176,870.

BB(6) is at least 7.4×10^{36,534}.

BB(7) is at least

$$ 10^{10^{10^{10^{18,000,000}}}}. $$

Clearly we’re dealing with a monster here, but can we understand just how terrifying of a monster? Well, call a sequence f(1), f(2), … *computable*, if there’s some computer program that takes n as input, runs for a finite time, then halts with f(n) as its output. To illustrate, f(n)=n^{2}, f(n)=2^{n}, and even the Ackermann function that we saw before are all computable.

But I claim that the Busy Beaver function grows faster than *any* computable function. Since this talk should have at least *some* math in it, let’s see a proof of that claim.

Maybe the nicest way to see it is this: suppose, to the contrary, that there were a computable function f that grew at least as fast as the Busy Beaver function. Then by using that f, we could take the Berry Paradox from before, and turn it into an *actual* contradiction in mathematics! So for example, suppose the program to compute f were a thousand bytes long. Then we could write another program, not much longer than a thousand bytes, to run for (say) 2×f(1000000) steps: that program would just need to include a subroutine for f, plus a little extra code to feed that subroutine the input 1000000, and then to run for 2×f(1000000) steps. But by assumption, f(1000000) is at least the maximum number of steps that any program up to a million bytes long can run for—even though we just wrote a program, less than a million bytes long, that ran for more steps! This gives us our contradiction. The only possible conclusion is that the function f, and the program to compute it, couldn’t have existed in the first place.

(As an alternative, rather than arguing by contradiction, one could simply start with any computable function f, and then build programs that compute f(n) for various “hardwired” values of n, in order to show that BB(n) must grow at least as rapidly as f(n). Or, for yet a third proof, one can argue that, if any upper bound on the BB function were computable, then one could use that to solve the halting problem, which Turing famously showed to be uncomputable in 1936.)

In some sense, it’s not so surprising that the BB function should grow uncomputably quickly—because if it *were* computable, then huge swathes of mathematical truth would be laid bare to us. For example, suppose we wanted to know the truth or falsehood of the Goldbach Conjecture, which says that every even number 4 or greater can be written as a sum of two prime numbers. Then we’d just need to write a program that checked each even number one by one, and halted if and only if it found one that *wasn’t* a sum of two primes. Suppose that program corresponded to a Turing machine with N states. Then by definition, if it halted at all, it would have to halt after at most BB(N) steps. But that means that, if we *knew* BB(N)—or even any upper bound on BB(N)—then we could find out whether our program halts, by simply running it for the requisite number of steps and seeing. In that way we’d learn the truth or falsehood of Goldbach’s Conjecture—and similarly for the Riemann Hypothesis, and every other famous unproved mathematical conjecture (there are a lot of them) that can be phrased in terms of a computer program never halting.
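The Goldbach-checking program is easy to sketch. In this illustration the optional `limit` parameter is an addition so the search can be stopped; with `limit=None` it is the program from the argument, which halts if and only if a counterexample exists.

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True

def is_sum_of_two_primes(n):
    """True if n = p + q for some primes p <= q."""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

def goldbach_search(limit=None):
    """Check even numbers 4, 6, 8, ... and return the first counterexample.

    With limit=None this runs forever unless Goldbach's Conjecture is false.
    With a limit, it returns None when no counterexample is found up to it.
    """
    n = 4
    while limit is None or n <= limit:
        if not is_sum_of_two_primes(n):
            return n
        n += 2
    return None

print(goldbach_search(limit=2000))
```

Translating a program like this into a Turing machine with N states is what ties the conjecture to the value of BB(N).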

(Here, admittedly, I’m using “we could find” in an *extremely* theoretical sense. Even if someone handed you an N-state Turing machine that ran for BB(N) steps, the number BB(N) would be so hyper-mega-astronomical that, in practice, you could probably never distinguish the machine from one that simply ran forever. So the aforementioned “strategy” for proving Goldbach’s Conjecture or the Riemann Hypothesis would probably never yield fruit before the heat death of the universe, even though *in principle* it would reduce the task to a “mere finite calculation.”)

OK, you wanna know something else wild about the Busy Beaver function? In 2015, my former student Adam Yedidia and I wrote a paper where we proved that BB(8000), i.e., the 8000^{th} Busy Beaver number, *can’t be determined* using the usual axioms for mathematics, which are called Zermelo-Fraenkel (ZF) set theory. Nor can BB(8001) or any larger Busy Beaver number.

To be sure, BB(8000) *has* some definite value: there are finitely many 8000-state Turing machines, and each one either halts or runs forever, and among the ones that halt, there’s *some* maximum number of steps that any of them runs for. What we showed is that math, if it limits itself to the currently-accepted axioms, can never prove the value of BB(8000), even in principle.

The way we did that was by explicitly constructing an 8000-state Turing machine, which (in effect) enumerates all the consequences of the ZF axioms one after the next, and halts if and only if it ever finds a contradiction—that is, a proof of 0=1. Presumably set theory is actually consistent, and therefore our program runs forever. But if you *proved* the program ran forever, you’d also be proving the consistency of set theory. And has anyone heard of any obstacle to doing that? Of course, Gödel’s Incompleteness Theorem! Because of Gödel, if set theory is consistent (well, technically, also arithmetically sound), then it can’t prove our program either halts or runs forever. But that means set theory can’t determine BB(8000) either—because if it could do *that*, then it could also determine the behavior of our program.

To be clear, it was long understood that there’s *some* computer program that halts if and only if set theory is inconsistent—and therefore, that the axioms of set theory can determine at most k values of the Busy Beaver function, for *some* positive integer k. “All” Adam and I did was to prove the first explicit upper bound, k≤8000, which required a lot of optimizations and software engineering to get the number of states down to something reasonable (our initial estimate was more like k≤1,000,000). More recently, Stefan O’Rear has improved our bound—most recently, he says, to k≤1000, meaning that, at least by the lights of ZF set theory, fewer than a thousand values of the BB function can ever be known.

Meanwhile, let me remind you that, at present, only four values of the function *are* known! Could the value of BB(100) already be independent of set theory? What about BB(10)? BB(5)? Just how early in the sequence do you leap off into Platonic hyperspace? I don’t know the answer to that question but would love to.

Ah, you ask, but is there any number sequence that grows so fast, it blows *even the Busy Beavers* out of the water? There is!

Imagine a magic box into which you could feed in any positive integer n, and it would instantly spit out BB(n), the n^{th} Busy Beaver number. Computer scientists call such a box an “oracle.” Even though the BB function is uncomputable, it still makes mathematical sense to imagine a Turing machine that’s enhanced by the magical ability to access a BB oracle any time it wants: call this a “super Turing machine.” Then let SBB(n), or the n^{th} super Busy Beaver number, be the maximum number of steps that any n-state *super* Turing machine makes before halting, if given no input.

By simply repeating the reasoning for the ordinary BB function, one can show that, not only does SBB(n) grow faster than any computable function, it grows faster than *any function computable by super Turing machines* (for example, BB(n), BB(BB(n)), etc).

Let a super duper Turing machine be a Turing machine with access to an oracle for the super Busy Beaver numbers. Then you can use super duper Turing machines to define a super duper Busy Beaver function, which you can use in turn to define super duper pooper Turing machines, and so on!

Let “level-1 BB” be the ordinary BB function, let “level-2 BB” be the super BB function, let “level-3 BB” be the super duper BB function, and so on. Then clearly we can go to “level-k BB,” for any positive integer k.

But we need not stop even there! We can then go to level-ω BB. What’s ω? Mathematicians would say it’s the “first infinite ordinal”—the ordinals being a system where you can pass from any set of numbers you can possibly name (even an infinite set), to the next number larger than all of them. More concretely, the level-ω Busy Beaver function is simply the Busy Beaver function for Turing machines that are able, whenever they want, to call an oracle to compute the level-k Busy Beaver function, *for any positive integer k of their choice*.

But why stop there? We can then go to level-(ω+1) BB, which is just the Busy Beaver function for Turing machines that are able to call the level-ω Busy Beaver function as an oracle. And thence to level-(ω+2) BB, level-(ω+3) BB, etc., defined analogously. But then we can transcend that entire sequence and go to level-2ω BB, which involves Turing machines that can call level-(ω+k) BB as an oracle for any positive integer k. In the same way, we can pass to level-3ω BB, level-4ω BB, etc., until we transcend that entire sequence and pass to level-ω^{2} BB, which can call *any* of the previous ones as oracles. Then we have level-ω^{3} BB, level-ω^{4} BB, etc., until we transcend *that* whole sequence with level-ω^{ω} BB. But we’re still not done! For why not pass to level

$$ \omega^{\omega^{\omega}} $$,

level

$$ \omega^{\omega^{\omega^{\omega}}} $$,

etc., until we reach level

$$ \left. \omega^{\omega^{\omega^{.^{.^{.}}}}}\right\} _{\omega\text{ times}} $$?

(This last ordinal is also called ε_{0}.) And mathematicians know how to keep going even to way, way bigger ordinals than ε_{0}, which give rise to ever more rapidly-growing Busy Beaver sequences. Ordinals achieve something that on its face seems paradoxical, which is to systematize the concept of transcendence.

So then just how far can you push this? Alas, ultimately the answer depends on which axioms you assume for mathematics. The issue is this: once you get to sufficiently enormous ordinals, you need some systematic way to *specify* them, say by using computer programs. But then the question becomes which ordinals you can “prove to exist,” by giving a computer program together with a proof that the program does what it’s supposed to do. The more powerful the axiom system, the bigger the ordinals you can prove to exist in this way—but every axiom system will run out of gas at some point, only to be transcended, in Gödelian fashion, by a yet more powerful system that can name yet larger ordinals.

So for example, if we use Peano arithmetic—invented by the Italian mathematician Giuseppe Peano—then Gentzen proved in the 1930s that we can name any ordinals below ε_{0}, but not ε_{0} itself or anything beyond it. If we use ZF set theory, then we can name vastly bigger ordinals, but once again we’ll eventually run out of steam.

(Technical remark: some people have claimed that we can transcend this entire process by passing from first-order to second-order logic. But I fundamentally disagree, because with second-order logic, *which number you’ve named* could depend on the model of set theory, and therefore be impossible to pin down. With the ordinal Busy Beaver numbers, by contrast, the number you’ve named might be breathtakingly hopeless ever to compute—but provided the notations have been fixed, and the ordinals you refer to actually exist, at least we know there *is* a unique positive integer that you’re talking about.)

Anyway, the upshot of all of this is that, if you try to hold a name-the-biggest-number contest between two actual professionals who are trying to win, it will (alas) degenerate into an argument about the axioms of set theory. For the stronger the set theory you’re allowed to assume consistent, the bigger the ordinals you can name, therefore the faster-growing the BB functions you can define, therefore the bigger the actual numbers.

So, yes, in the end the biggest-number contest just becomes another Gödelian morass, but one can get surprisingly far before that happens.

In the meantime, our universe seems to limit us to at most 10^{122} choices that could ever be made, or experiences that could ever be had, by any one observer. Or fewer, if you believe that you won’t live until the heat death of the universe in some post-Singularity computer cloud, but for at most about 10^{2} years. In the meantime, the survival of the human race might hinge on people’s ability to understand much smaller numbers than 10^{122}: for example, a billion, a trillion, and other numbers that characterize the exponential growth of our civilization and the limits that we’re now running up against.

On a happier note, though, if our goal is to make math engaging to young people, or to build bridges between the quantitative and literary worlds, the way this festival is doing, it seems to me that it wouldn’t hurt to let people know about the vastness that’s out there. Thanks for your attention.

This T-shirt came to mind last September. I was standing in front of a large silver-colored table littered with wires, cylinders, and tubes. Greg Bentsen was pointing at components and explaining their functions. He works in Monika Schleier-Smith’s lab, as a PhD student, at Stanford.

Monika’s group manipulates rubidium atoms. A few thousand atoms sit in one of the cylinders. That cylinder contains another cylinder, an *optical* *cavity*, that contains the atoms. A mirror caps each of the cavity’s ends. Light in the cavity bounces off the mirrors.

Light bounces off your bathroom mirror similarly. But we can describe your bathroom’s light accurately with Maxwellian electrodynamics, a theory developed during the 1800s. We describe the cavity’s light with *quantum electrodynamics* (QED). Hence we call the lab’s set-up *cavity QED*.

The light interacts with the atoms, entangling with them. The entanglement imprints information about the atoms on the light. Suppose that light escaped from the cavity. Greg and friends could measure the light, then infer about the atoms’ quantum state.

A little light leaks through the mirrors, though most light bounces off. From leaked light, you can infer about the ensemble of atoms. You can’t infer about individual atoms. For example, consider an atom’s electrons. Each electron has a quantum property called a *spin*. We sometimes imagine the spin as an arrow that points upward or downward. Together, the electrons’ spins form the atom’s *joint spin*. You can tell, from leaked light, whether one atom’s spin points upward. But you can’t tell which atom’s spin points upward. You can’t see the atoms for the ensemble.

Monika’s team can. They’ve cut a hole in their cylinder. Light escapes the cavity through the hole. The light from the hole’s left-hand edge carries information about the leftmost atom, and so on. The team develops a photograph of the line of atoms. Imagine holding a photograph of a line of people. You can point to one person, and say, “Aha! *She’s* the xkcd fan.” Similarly, Greg and friends can point to one atom in their photograph and say, “Aha! *That* atom has an upward-pointing spin.” Monika’s team is developing *single-site imaging.*

Monika’s team plans to image atoms in such detail, they won’t need for light to leak through the mirrors. Light leakage creates problems, including by entangling the atoms with the world outside the cavity. Suppose you had to diminish the amount of light that leaks from a rubidium cavity. How should you proceed?

Tell the mirrors,

You should lengthen the cavity. Why? Imagine a photon, a particle of light, in the cavity. It zooms down the cavity’s length, hits a mirror, bounces off, retreats up the cavity’s length, hits the other mirror, and bounces off. The photon repeats this process until a mirror hit fails to generate a bounce. The mirror *transmits* the photon to the exterior; the photon leaks out. How can you reduce leaks? By preventing photons from hitting mirrors so often, by forcing the photons to zoom longer, by lengthening the cavity, by shifting the mirrors outward.

So Greg hinted, beside that silver-colored table in Monika’s lab. The hint struck a chord: I recognized the impulse to

The impulse had led me to Stanford.

Weeks earlier, I’d written my first paper about quantum chaos and information scrambling. I’d sat and read and calculated and read and sat and emailed and written. I needed to stand up, leave my cavity, and image my work from other perspectives.

Stanford physicists had written quantum-chaos papers I admired. So I visited, presented about my work, and talked. Patrick Hayden introduced me to a result that might help me apply my result to another problem. His group helped me simplify a mathematical expression. Monika reflected that a measurement scheme I’d proposed sounded not unreasonable for cavity QED.

And Greg led me to recognize the principle behind my visit: Sometimes, you have to

to move forward.

*With gratitude to Greg, Monika, Patrick, and the rest of Monika’s and Patrick’s groups for their time, consideration, explanations, and feedback. With thanks to Patrick and Stanford’s Institute for Theoretical Physics for their hospitality.*

The top quark is the heaviest known matter corpuscle we consider elementary.

Elementary is an overloaded word in English, so I need to explain what it means in the context of subatomic particles. If we grab a dictionary we get several possibilities, e.g.:

- elementary: pertaining to or dealing with elements, rudiments, or first principles

- elementary: of the nature of an ultimate constituent; uncompounded

- elementary: not decomposable into elements or other primary constituents

- elementary: simple

I want to show you a combinatorial interpretation of the reverse Bessel polynomials which I learnt from Alan Sokal. The sequence of reverse Bessel polynomials begins as follows.

$\begin{aligned} \theta_0(R)&=1\\ \theta_1(R)&=R+1\\ \theta_2(R)&=R^2+3R+3\\ \theta_3(R)&= R^3 +6R^2+15R+15 \end{aligned}$

To give you a flavour of the combinatorial interpretation we will prove, you can see that the second reverse Bessel polynomial can be read off the following set of ‘weighted Schröder paths’: multiply the weights together on each path and add up the resulting monomials.

In this post I’ll explain how to prove the general result, using a certain result about weighted Dyck paths that I’ll also prove. At the end I’ll leave some further questions for the budding enumerative combinatorialists amongst you.

These reverse Bessel polynomials have their origins in the theory of Bessel functions, but I’ve encountered them in the theory of magnitude, and they are key to a formula I give for the magnitude of an odd dimensional ball which I have just posted on the arxiv.

In that paper I use the combinatorial expression for these Bessel polynomials to prove facts about the magnitude.

Here, to simplify things slightly, I have used the standard reverse Bessel polynomials whereas in my paper I use a minor variant (see below).

I should add that a very similar expression can be given for the ordinary, unreversed Bessel polynomials; you just need a minor modification to the way the weights on the Schröder paths are defined. I will leave that as an exercise.

The reverse Bessel polynomials have many properties. In particular they satisfy the recursion relation $\theta_{i+1}(R)=R^2\theta_{i-1}(R) + (2i+1)\theta_{i}(R)$ and $\theta_i(R)$ satisfies the differential equation $R\theta_i^{\prime\prime}(R)-2(R+i)\theta_i^\prime(R)+2i\theta_i(R)=0.$ There’s an explicit formula: $\theta_i(R) = \sum_{t=0}^i \frac{(i+t)!}{(i-t)!\, t!\, 2^t}R^{i-t}.$
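As a quick sanity check, the recursion and the explicit formula can be compared numerically. This is only a sketch: polynomials are stored as coefficient lists with the constant term first, and the function names are made up for the illustration.

```python
from math import factorial

def theta_recursive(i):
    """Coefficients of theta_i, constant term first, from the recursion
    theta_{k+1} = R^2 * theta_{k-1} + (2k+1) * theta_k."""
    previous, current = [1], [1, 1]  # theta_0 = 1, theta_1 = R + 1
    if i == 0:
        return previous
    for k in range(1, i):
        shifted = [0, 0] + previous                  # R^2 * theta_{k-1}
        scaled = [(2 * k + 1) * c for c in current] + [0]  # pad to same length
        previous, current = current, [a + b for a, b in zip(shifted, scaled)]
    return current

def theta_explicit(i):
    """Coefficients of theta_i from the closed formula: the coefficient of
    R^(i-t) is (i+t)! / ((i-t)! t! 2^t)."""
    coefficients = [0] * (i + 1)
    for t in range(i + 1):
        coefficients[i - t] = factorial(i + t) // (
            factorial(i - t) * factorial(t) * 2**t
        )
    return coefficients

for i in range(4):
    print(i, theta_recursive(i), theta_explicit(i))
```

The two functions agree on the four polynomials listed at the start of the post, read with the constant term first.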

I’m interested in them because they appear in my formula for the magnitude of odd dimensional balls. To be more precise, in my formula I use the associated Sheffer polynomials, $(\chi_i(R))_{i=0}^\infty$; they are related by $\chi_i(R)=R\theta_{i-1}(R)$, so the coefficients are the same, but just moved around a bit. These polynomials have a similar but slightly more complicated combinatorial interpretation.

In my paper I prove that the magnitude of the $(2p+1)$-dimensional ball of radius $R$ has the following expression:

$\left|B^{2p+1}_R \right|= \frac{\det[\chi_{i+j+2}(R)]_{i,j=0}^p}{(2p+1)!\, R\,\det[\chi_{i+j}(R)]_{i,j=0}^p}$

As each polynomial $\chi_i(R)$ has a path counting interpretation, one can use the rather beautiful Lindström-Gessel-Viennot Lemma to give a path counting interpretation to the determinants in the above formula and find some explicit expression. I will probably blog about this another time. (Fellow host Qiaochu has also blogged about the LGV Lemma.)

Before getting on to Bessel polynomials and weighted Schröder paths, we need to look at counting weighted Dyck paths, which are simpler and more classical.

A **Dyck path** is a path in the lattice $\mathbb{Z}^2$ which starts at $(0,0)$, stays in the upper half plane, ends back on the $x$-axis at $(2{i},0)$ and has steps going either diagonally right and up or right and down. The integer $2{i}$ is called the length of the path. Let $D_{i}$ be the set of length $2{i}$ Dyck paths.

For each Dyck path $\sigma$, we will weight each edge going right and down, from $(x,y)$ to $(x+1,y-1)$, by $y$. Then we will take $w(\sigma)$, the weight of $\sigma$, to be the product of all the weights on its steps. Here are all five weighted Dyck paths of length six.

Famously, the number of Dyck paths of length $2{i}$ is given by the ${i}$th Catalan number; here, however, we are interested in the number of paths weighted by the weighting(!). If we sum over the weights of each of the above diagrams we get $6+4+2+2+1=15$. Note that this is $5\times 3 \times 1$. This is a pattern that holds in general.

Theorem A. (Françon and Viennot) The weighted count of length $2{i}$ Dyck paths is equal to the double factorial of $2{i} -1$: $\begin{aligned} \sum_{\sigma\in D_{i}} w(\sigma)&= (2{i} -1)\cdot (2{i} -3)\cdot (2{i}-5)\cdot \cdots\cdot 3\cdot 1 \\ &\eqqcolon (2{i} -1)!!. \end{aligned}$
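For small cases the theorem is easy to check by brute force. The following is a minimal Python sketch, with function names of my own choosing, which lists all Dyck paths of a given length, weights each down step by the height it starts from, and compares the total with the double factorial.

```python
def dyck_paths(i):
    """Yield every Dyck path of length 2*i as a tuple of +1 (up) and
    -1 (down) steps."""
    def extend(path, ups, downs):
        if ups == i and downs == i:
            yield tuple(path)
            return
        if ups < i:
            yield from extend(path + [1], ups + 1, downs)
        if downs < ups:  # never dip below the x-axis
            yield from extend(path + [-1], ups, downs + 1)
    yield from extend([], 0, 0)

def weight(path):
    """Product, over the down steps, of the height y the step starts from."""
    w, height = 1, 0
    for step in path:
        if step == -1:
            w *= height
        height += step
    return w

def odd_double_factorial(i):
    """(2i-1)!! = (2i-1)(2i-3)...3.1, with the empty product equal to 1."""
    result = 1
    for k in range(1, 2 * i, 2):
        result *= k
    return result

for i in range(7):
    assert sum(weight(p) for p in dyck_paths(i)) == odd_double_factorial(i)

print(sorted(weight(p) for p in dyck_paths(3)))  # [1, 2, 2, 4, 6]
```

The printed weights are those of the five paths of length six, which sum to $15$.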

The following is a nice combinatorial proof of this theorem that I found in a survey paper by Callan. (I was only previously aware of a high-tech proof involving continued fractions and a theorem of Gauss.)

The first thing to note is that the weight of a Dyck path is actually counting something. It is counting the ways of labelling each of the down steps in the diagram by a positive integer at most the height (i.e. the weight) of that step. We call such a labelling a **height labelling**. Note that we have no choice of weighting but we often have choice of height labelling. Here’s a height labelled Dyck path.

So the weighted count of Dyck paths of length $2{i}$ is precisely the number of height labelled Dyck paths of length $2{i}$. $\sum_{\sigma\in D_{i}} w(\sigma) = \#\{\text{height labelled paths of length }\,\,2{i}\}$

We are going to consider **marked** Dyck paths, which just means we single out a specific vertex. A path of length $2{i}$ has $2{i} + 1$ vertices. Thus

$\begin{aligned} \#\{\text{height labelled,}\,\, &\text{ MARKED paths of length }\,\,2{i}\}\\ &=(2{i}+1)\times\#\{\text{height labelled paths of length }\,\,2{i}\}. \end{aligned}$

Hence the theorem will follow by induction if we find a bijection

$\begin{aligned} \{\text{height labelled,}\,\,&\text{ paths of length }\,\,2{i} \}\\ &\cong \{\text{height labelled, MARKED paths of length }\,\,2{i}-2 \}. \end{aligned}$

Such a bijection can be constructed in the following way. Given a height labelled Dyck path, remove the left-hand step and the first step that has a label of one on it. On each down step between these two deleted steps decrease the label by one. Now join the two separated parts of the path together and mark the vertex at which they are joined. Here is an example of the process.

Working backwards it is easy to describe the inverse map. And so the theorem is proved.
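The bijection can also be checked by machine for small lengths. Here is a hedged Python sketch under an encoding I have made up for the purpose: an up step is recorded as `0` and a down step as its label, and I read "the left-hand step" as the very first step of the path. The mark is recorded as the index of the marked vertex in the shorter path.

```python
from itertools import product

def dyck_paths(i):
    """Yield every Dyck path of length 2*i as a tuple of +1/-1 steps."""
    def extend(path, ups, downs):
        if ups == i and downs == i:
            yield tuple(path)
            return
        if ups < i:
            yield from extend(path + [1], ups + 1, downs)
        if downs < ups:
            yield from extend(path + [-1], ups, downs + 1)
    yield from extend([], 0, 0)

def labelled_paths(i):
    """Yield height labelled Dyck paths of length 2*i.
    An up step is encoded as 0, a down step as its label in 1..height."""
    for path in dyck_paths(i):
        heights, h = [], 0
        for step in path:
            if step == -1:
                heights.append(h)  # height the down step starts from
            h += step
        for labels in product(*(range(1, y + 1) for y in heights)):
            it = iter(labels)
            yield tuple(0 if step == 1 else next(it) for step in path)

def remove_and_mark(lp):
    """Delete the first step and the first down step labelled 1, lower the
    labels in between by one, and return (new path, index of marked vertex)."""
    k = lp.index(1)  # position of the first down step labelled 1
    middle = tuple(x - 1 if x > 0 else 0 for x in lp[1:k])
    return middle + lp[k + 1:], len(middle)

for i in range(1, 6):
    sources = list(labelled_paths(i))
    images = {remove_and_mark(lp) for lp in sources}
    targets = {(q, m) for q in labelled_paths(i - 1) for m in range(2 * i - 1)}
    assert len(images) == len(sources)  # the map is injective
    assert images == targets            # and hits every marked labelled path
```

A down step starting from height one must carry the label one, so the path never returns to the axis before the deleted down step; that is why the lowered labels are still valid height labels.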

In order to give a path theoretic interpretation of reverse Bessel polynomials we will need to use Schröder paths. These are like Dyck paths except we allow a certain kind of flat step.

A **Schröder path** is a path in the lattice $\mathbb{Z}^2$ which starts at $(0,0)$, stays in the upper half plane, ends back on the $x$-axis at $(2{i},0)$ and has steps going either diagonally right and up, diagonally right and down, or horizontally two units to the right. The integer $2{i}$ is called the length of the path. Let $S_{i}$ be the set of all length $2{i}$ Schröder paths.

For each Schröder path $\sigma$, we will weight each edge going right and down, from $(x,y)$ to $(x+1,y-1)$ by $y$ and we will weight each flat edge by the indeterminate $R$. Then we will take $w(\sigma)$, the weight of $\sigma$, to be the product of all the weights on its steps.

Here is the picture of all six length four weighted Schröder paths again.

You were asked at the top of this post to check that the sum of the weights equals the second reverse Bessel polynomial. Of course that result generalizes!

The following theorem was shown to me by Alan Sokal. He proved it using continued fraction methods, but these essentially amount to the combinatorial proof I’m about to give.

Theorem B. The weighted count of length $2{i}$ Schröder paths is equal to the ${i}$th reverse Bessel polynomial: $\sum_{\sigma\in S_{i}} w(\sigma)= \theta_{i}(R).$

The idea is to observe that you can remove the flat steps from a weighted Schröder path to obtain a weighted Dyck path. If a Schröder path has length $2{i}$ and $t$ upward steps then it has $t$ downward steps and ${i}-t$ flat steps, so it has a total of ${i}+t$ steps. This means that there are $\binom{{i}+t}{{i}-t}$ length $2{i}$ Schröder paths with the same underlying length $2t$ Dyck path (we just choose where to insert the flat steps). Let’s write $S^t_{i}$ for the set of Schröder paths of length $2{i}$ with $t$ upward steps. $\begin{aligned} \sum_{\sigma\in S_{i}} w(\sigma) &= \sum_{t=0}^{i} \sum_{\sigma\in S^t_{i}} w(\sigma) = \sum_{t=0}^{i} \binom{{i}+t}{{i}-t}\sum_{\sigma'\in D_t} w(\sigma')R^{{i}-t}\\ &= \sum_{t=0}^{i} \binom{{i}+t}{{i}-t}(2t-1)!!\,R^{{i}-t}\\ &= \sum_{t=0}^{i} \frac{({i}+t)!}{({i}-t)!\,(2t)!}\frac{(2t)!}{2^t t!}R^{{i}-t}\\ &= \theta_{i}(R), \end{aligned}$ where the last equality comes from the formula for $\theta_{i}(R)$ given at the beginning of the post.

Thus we have the required combinatorial interpretation of the reverse Bessel polynomials.
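Theorem B can be checked by brute force for small lengths. The sketch below is plain Python with names of my own invention: it walks over all Schröder paths of length $2i$, records the weight of each one as an integer coefficient times a power of $R$, and compares the totals with the explicit formula for $\theta_i(R)$.

```python
from math import factorial

def theta(i):
    """Coefficients of theta_i(R); entry k is the coefficient of R**k."""
    coeffs = [0] * (i + 1)
    for t in range(i + 1):
        coeffs[i - t] = factorial(i + t) // (
            factorial(i - t) * factorial(t) * 2**t
        )
    return coeffs

def schroeder_weight_sums(i):
    """Weighted count of Schröder paths of length 2*i, returned as a list
    whose entry k is the coefficient of R**k (k = number of flat steps)."""
    coeffs = [0] * (i + 1)

    def extend(x, height, weight, flats):
        if x == 2 * i:
            if height == 0:
                coeffs[flats] += weight
            return
        if height > 2 * i - x:
            return  # too high to get back to the axis in time
        extend(x + 1, height + 1, weight, flats)               # up step
        if height > 0:
            extend(x + 1, height - 1, weight * height, flats)  # down step
        if x + 2 <= 2 * i:
            extend(x + 2, height, weight, flats + 1)           # flat step

    extend(0, 0, 1, 0)
    return coeffs

for i in range(7):
    assert schroeder_weight_sums(i) == theta(i)

print(schroeder_weight_sums(2))  # [3, 3, 1], i.e. R^2 + 3R + 3
```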

The first question that springs to mind for me is whether it is possible to give a bijective proof of Theorem B, similar in style, perhaps (or perhaps not), to the proof given of Theorem A, basically using the recursion relation $\theta_{i+1}(R)=R^2\theta_{i-1}(R) + (2i+1)\theta_{i}(R)$ rather than the explicit formula for the polynomials.

The second question would be whether the differential equation $R\theta_i^{\prime\prime}(R)-2(R+i)\theta_i^\prime(R)+2i\theta_i(R)=0$ has some sort of combinatorial interpretation in terms of paths.

I’m interested to hear if anyone has any thoughts.

Last time we proved Flajolet’s Fundamental Lemma about enumerating Dyck paths. This time I want to give some examples, in particular to relate this to what I wrote previously about Dyck paths, Schröder paths and what they have to do with reverse Bessel polynomials.

We’ll see that the generating function of the sequence of reverse Bessel polynomials $\left(\theta_i(R)\right)_{i=0}^\infty$ has the following continued fraction expansion.

$\sum_{i=0}^\infty \theta_i(R) \,t^i = \frac{1}{1-Rt- \frac{t}{1-Rt - \frac{2t}{1-Rt- \frac{3t}{1-\dots}}}}$

I’ll even give you a snippet of SageMath code so you can have a play around with this if you like.

Let’s just recall from last time that if we take Motzkin paths weighted by $a_i$s, $b_i$s and $c_i$s as in this example,

then when we sum the weightings of all Motzkin paths together we have the following continued fraction expression. $\sum_{\sigma\,\,\mathrm{Motzkin}} w_{a,b,c}(\sigma) = \frac{1} {1- c_{0} - \frac{a_{1} b_{1}} {1-c_{1} - \frac{a_{2} b_{2}} {1- c_2 - \frac{a_3 b_3} {1-\dots }}}} \in\mathbb{Z}[[a_i, b_i, c_i]]$

Flajolet’s Fundamental Lemma is very beautiful, but we want a power series going up in terms of path length. So let’s use another variable $t$ to keep track of path length. All three types of step in a Motzkin path have length one. We can set $a_i=\alpha_i t$, $b_i=\beta_i t$ and $c_i=\gamma_i t$. Then $\sum_{\sigma} w_{a, b, c}(\sigma)\in \mathbb{Z}[\alpha_i, \beta_i, \gamma_i][[t]]$, and the coefficient of $t^\ell$ will be the sum of the weights of Motzkin paths of length $\ell$. This coefficient will be a polynomial (rather than a power series) as there are only finitely many paths of a given length.

$\sum_{\ell=0}^\infty\left(\sum_{\sigma\,\,\text{Motzkin length}\,\,\ell} w_{\alpha,\beta,\gamma}(\sigma)\right)t^\ell = \frac{1} {1- \gamma_{0}t - \frac{\alpha_{1}\beta_1 t^2} {1-\gamma_{1}t - \frac{\alpha_{2}\beta_2 t^2} {1- \gamma_2t - \frac{\alpha_3 \beta_3 t^2} {1-\dots }}}}$

Such a continued fraction is called a **Jacobi** (or J-type) continued fraction. They crop up in the study of moments of orthogonal polynomials and also in birth-death processes.

For example, I believe that Euler proved the following Jacobi continued fraction expansion of the generating function of the factorials. $\sum_{\ell=0}^\infty \ell!\, t^\ell = \frac{1} {1- t - \frac{t^2} {1-3 t - \frac{4 t^2} {1- 5t - \frac{9 t^2} {1-\dots }}}}$ We can get the right hand side by taking $\alpha_i=\beta_i=i$ and $\gamma_i=2i+1$. Here is a Motzkin path weighted in that way.

The equation above is telling us that if we weight Motzkin paths in that way, then the weighted count of Motzkin paths of length $\ell$ is $\ell!$, and that deserves an exclamation mark! (You’re invited to verify this for Motzkin paths of length 4.)
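If you would rather let a computer do the verifying, here is a small Python sketch that counts weighted paths directly, without any continued fractions. It keeps a running total of the weight of all partial paths at each height and extends them one step at a time. The function name and the convention that an up step arriving at level $h$ carries $\alpha_h$ while a down step leaving level $h$ carries $\beta_h$ are my assumptions; since $\alpha_i=\beta_i$ here, only the product $\alpha_h\beta_h$ matters for the count.

```python
from math import factorial

def weighted_motzkin_counts(max_len, alpha, beta, gamma):
    """Weighted counts of Motzkin paths of length 0, 1, ..., max_len.

    Assumed convention: an up step arriving at level h weighs alpha(h),
    a down step leaving level h weighs beta(h), and a flat step at
    level h weighs gamma(h)."""
    # by_height[h] = total weight of partial paths currently at height h
    by_height = [1] + [0] * (max_len + 1)
    counts = [1]  # the empty path
    for _ in range(max_len):
        new = [0] * (max_len + 2)
        for h in range(max_len + 1):
            w = by_height[h]
            if w == 0:
                continue
            new[h] += w * gamma(h)          # flat step
            new[h + 1] += w * alpha(h + 1)  # up step
            if h > 0:
                new[h - 1] += w * beta(h)   # down step
        by_height = new
        counts.append(by_height[0])         # paths that are back on the axis
    return counts

counts = weighted_motzkin_counts(
    8, alpha=lambda h: h, beta=lambda h: h, gamma=lambda h: 2 * h + 1
)
assert counts == [factorial(n) for n in range(9)]
print(counts)
```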

I’ve put some SageMath code at the bottom of this post if you want to check the continued fraction equality numerically.

A Dyck path is a Motzkin path with no flat steps. So if we weight the flat steps in Motzkin paths with $0$ then a weighted count just counts the weighted Dyck paths. This means setting $\gamma_i=0$.

Also the weight $\alpha_i$ on an up step always appears with the weight $\beta_i$ on a corresponding down step (what goes up must come down!) so we can simplify things by just putting a weighting $\alpha_i\beta_i$ (which we’ll rename as $\alpha_i$) on the down step from level $i$ and putting a weighting of $1$ on each up step. We can call this weighting $w_\alpha$.

Putting this together we get the following, where we’ve noted that there are no Dyck paths of odd length.

$\sum_{n=0}^\infty\left(\sum_{\sigma\,\,\text{Dyck length}\,\,2n} w_\alpha(\sigma)\right)t^{2n} = \frac{1} {1- \frac{\alpha_{1} t^2} {1- \frac{\alpha_{2} t^2} {1- \frac{\alpha_3 t^2} {1-\dots }}}}$

This kind of continued fraction is called a **Stieltjes** (or S-type) continued fraction. Of course, we could replace $t^2$ by $t$ in the above, without any ill effect.

Previously we proved combinatorially that with the weighting where $\alpha_i=i$ the weighted count of Dyck paths of length $2n$ was precisely $(2n-1)!!$. This means that we have proved the following continued fraction expansion of the generating function of the odd double factorials.

$\sum_{n=0}^\infty (2n -1)!!\, t^{2n} = \frac{1} {1- \frac{ t^2} {1- \frac{2 t^2} {1- \frac{3 t^2} {1-\dots }}}}$

I believe this was originally proved by Gauss, but I have no idea how.

Again there’s some SageMath code at the end for you to see this in action.
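For readers without SageMath to hand, the continued fraction for the odd double factorials can also be checked with a few lines of plain Python. This is only a sketch: the truncated power series arithmetic is hand-rolled, the function names are mine, and I have used the variable $t$ in place of $t^2$, as permitted by the remark above. Truncating the fraction after $\alpha_m$ only affects coefficients beyond $t^m$, because a Dyck path of length $2n$ never climbs above height $n$.

```python
def series_inverse(a, n):
    """First n coefficients of 1/a(t), assuming a[0] == 1."""
    inv = [1] + [0] * (n - 1)
    for k in range(1, n):
        top = min(k, len(a) - 1)
        inv[k] = -sum(a[j] * inv[k - j] for j in range(1, top + 1))
    return inv

def s_fraction(alphas, n):
    """First n coefficients of the truncated Stieltjes fraction
    1/(1 - alphas[0]*t/(1 - alphas[1]*t/(...))), in the variable t."""
    f = [1] + [0] * (n - 1)  # the innermost tail is taken to be 1
    for alpha in reversed(alphas):
        # denominator 1 - alpha * t * f, truncated to n coefficients
        denom = [1] + [-alpha * c for c in f[: n - 1]]
        f = series_inverse(denom, n)
    return f

def odd_double_factorial(i):
    """(2i-1)!!, with the empty product equal to 1."""
    result = 1
    for k in range(1, 2 * i, 2):
        result *= k
    return result

n = 10
coefficients = s_fraction(list(range(1, n)), n)
assert coefficients == [odd_double_factorial(i) for i in range(n)]
print(coefficients)
```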

What I’m really interested in, you’ll remember, is reverse Bessel polynomials, and these give weighted counts of Schröder paths. Using continued fractions in this context is less standard than for Dyck paths and Motzkin paths as above, but it only requires a minor modification. I learnt about this from Alan Sokal.

The difference between Motzkin paths and Schröder paths is that the flat steps have length $2$ in Schröder paths. Remember that the power of $t$ was encoding the length, so we just have to assign $t^2$ to each flat step rather than $t$. So if we put $a_i= t$, $b_i=\alpha_i t$ and $c_i = \gamma_i t^2$ in Flajolet’s Fundamental Lemma then we get the following.

$\sum_{n=0}^\infty\left(\sum_{\sigma\,\,\text{Schröder length}\,\,2n} w_{\alpha,\gamma}(\sigma)\right)t^{2n} = \frac{1} {1- \gamma_{0}t^2 - \frac{\alpha_{1} t^2} {1-\gamma_{1}t^2 - \frac{\alpha_{2} t^2} {1- \gamma_2t^2 - \frac{\alpha_3 t^2} {1-\dots }}}}$

Here $w_{\alpha, \gamma}$ is the weighting where we put $\alpha_i$s on the down steps and $\gamma_i$s on the flat steps.

This kind of continued fraction is called a **Thron** (or T-type) continued fraction. Again, we could replace $t^2$ by $t$ in the above, without any ill effect.

We saw before that if we take the weighting, $w_{rBp}$, with $\alpha_i:=i$ and $\gamma_i:=R$, such as in the following picture,

then the weighted sum of Schröder paths of length $2n$ is precisely the $n$th reverse Bessel polynomial: $\theta_n(R)= \sum_{\sigma\,\,\text{Schröder length}\,\,2n} w_{rBp}(\sigma).$

Putting that together with the Thron continued fraction above we get the following Thron continued fraction expansion for the generating function of the reverse Bessel polynomials.

$\sum_{n=0}^\infty \theta_n(R) t^n = \frac{1}{1-Rt- \frac{t}{1-Rt - \frac{2t}{1-Rt- \frac{3t}{1-\dots}}}}$

This expression is given by Paul Barry, without any reference, in the formulas section of the entry in the Online Encyclopedia of Integer Sequences.

See the end of the post for some SageMath code to check this numerically.
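If you don’t have SageMath installed, a cruder check is possible in plain Python by substituting integer values for $R$, so that every coefficient becomes an integer. The sketch below does this with hand-rolled truncated power series; the function names are my own. Truncating the fraction after $\alpha_m$ leaves the coefficients up to $t^m$ untouched, because a Schröder path needs length at least $2m$ to reach height $m$.

```python
from math import factorial

def series_inverse(a, n):
    """First n coefficients of 1/a(t), assuming a[0] == 1."""
    inv = [1] + [0] * (n - 1)
    for k in range(1, n):
        top = min(k, len(a) - 1)
        inv[k] = -sum(a[j] * inv[k - j] for j in range(1, top + 1))
    return inv

def t_fraction(alphas, gammas, n):
    """First n coefficients in t of the truncated Thron fraction
    1/(1 - gammas[0]*t - alphas[0]*t/(1 - gammas[1]*t - alphas[1]*t/(...)))."""
    f = [1] + [0] * (n - 1)  # the innermost tail is taken to be 1
    for alpha, gamma in zip(reversed(alphas), reversed(gammas)):
        denom = [1] + [-alpha * c for c in f[: n - 1]]  # 1 - alpha*t*f
        if n > 1:
            denom[1] -= gamma                           # ... - gamma*t
        f = series_inverse(denom, n)
    return f

def theta_value(i, R):
    """theta_i evaluated at the integer R, via the explicit formula."""
    return sum(
        factorial(i + t) // (factorial(i - t) * factorial(t) * 2**t)
        * R ** (i - t)
        for t in range(i + 1)
    )

n = 8
for R in range(0, 5):
    cf = t_fraction(list(range(1, n)), [R] * (n - 1), n)
    assert cf == [theta_value(i, R) for i in range(n)]

print(t_fraction(list(range(1, n)), [1] * (n - 1), n))  # theta_i(1), i = 0..7
```

Agreement at a handful of integer values of $R$ is evidence rather than proof, but it catches most slips in the weights.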

In my recent magnitude paper I actually work backwards. I start with the continued fraction expansion as a given, and use Flajolet’s Fundamental Lemma to give the Schröder path interpretation of the reverse Bessel polynomials. Of course, I now know that I can bypass the use of continued fractions completely, and have a purely combinatorial proof of this interpretation. Regardless of that, however, the theory of lattice paths and continued fractions remains beautiful.

It’s quite easy to play around with these continued fractions in SageMath, at least to some finite order. I thought I’d let you have some code to get you started…

Here’s some SageMath code for you to check the Jacobi continued fraction expansion of the generating function of the factorials.

```
# T = Z[t]
T.<t> = PolynomialRing(ZZ)
# We'll take the truncated continued fraction to be in the
# ring of rational functions, P = Z(t)
P = Frac(T)

def j_ctd_frac(alphas, gammas):
    # Truncated Jacobi fraction: the innermost tail is replaced by 1
    if alphas == [] or gammas == []:
        return 1
    else:
        return P(1/(1 - gammas[0]*t - alphas[0]*t^2*
                    j_ctd_frac(alphas[1:], gammas[1:])))

cf(t) = j_ctd_frac([1, 4, 9, 16, 25, 36], [1, 3, 5, 7, 9, 11])
print(cf(t).series(t, 10))
```

The above code can be used to define a Stieltjes continued fraction and check out the expansion of Gauss on the odd double factorials.

```
def s_ctd_frac(alphas):
    # A Stieltjes fraction is a Jacobi fraction with all gammas zero
    gammas = [0]*len(alphas)
    return j_ctd_frac(alphas, gammas)

cf(t) = s_ctd_frac([1, 2, 3, 4, 5, 6])
print(cf(t).series(t, 13))
```

Here’s the code for getting the reverse Bessel polynomials from a Thron continued fraction.

```
S.<R> = PolynomialRing(ZZ)
T.<t> = PowerSeriesRing(S)

def t_ctd_frac(alphas, gammas):
    # Truncated Thron fraction: flat steps have length 2, hence gamma*t^2
    if alphas == [] or gammas == []:
        return 1
    else:
        return (1/(1 - gammas[0]*t^2 - alphas[0]*t^2*
                   t_ctd_frac(alphas[1:], gammas[1:])))

print(T(t_ctd_frac([1, 2, 3, 4, 5, 6], [R, R, R, R, R, R]),
        prec=13))
```

I’m running a special session on applied category theory, and now the program is available:

• Applied category theory, Fall Western Sectional Meeting of the AMS, 4-5 November 2017, U.C. Riverside.

This is going to be fun.

My former student Brendan Fong is now working with David Spivak at MIT, and they’re both coming. My collaborator John Foley at Metron is also coming: we’re working on the CASCADE project for designing networked systems.

Dmitry Vagner is coming from Duke: he wrote a paper with David and Eugene Lerman on operads and open dynamical systems. Christina Vasilakopoulou, who has worked with David and Patrick Schultz on dynamical systems, has just joined our group at UCR, so she will also be here. And the three of them have worked with Ryan Wisnesky on algebraic databases. Ryan will not be here, but his colleague Peter Gates will: together with David they have a startup called Categorical Informatics, which uses category theory to build sophisticated databases.

That’s not everyone—for example, most of my students will be speaking at this special session, and other people too—but that gives you a rough sense of some people involved. The conference is on a weekend, but John Foley and David Spivak and Brendan Fong and Dmitry Vagner are staying on for longer, so we’ll have some long conversations… and Brendan will explain decorated corelations in my Tuesday afternoon network theory seminar.

Here’s the program. Click on talk titles to see abstracts. For a multi-author talk, the person with the asterisk after their name is doing the talking. All the talks will be in Room 268 of the Highlander Union Building or ‘HUB’.

9:00 a.m.

A higher-order temporal logic for dynamical systems.

**David I. Spivak**, MIT

10:00 a.m.

Algebras of open dynamical systems on the operad of wiring diagrams.

**Dmitry Vagner***, Duke University

**David I. Spivak**, MIT

**Eugene Lerman**, University of Illinois at Urbana-Champaign

10:30 a.m.

Abstract dynamical systems.

**Christina Vasilakopoulou***, University of California, Riverside

**David Spivak**, MIT

**Patrick Schultz**, MIT

3:00 p.m.

Black boxes and decorated corelations.

**Brendan Fong**, MIT

4:00 p.m.

Compositional modelling of open reaction networks.

**Blake S. Pollard***, University of California, Riverside

**John C. Baez**, University of California, Riverside

4:30 p.m.

A bicategory of coarse-grained Markov processes.

**Kenny Courser**, University of California, Riverside

5:00 p.m.

A bicategorical syntax for pure state qubit quantum mechanics.

**Daniel M. Cicala**, University of California, Riverside

5:30 p.m.

Open systems in classical mechanics.

**Adam Yassine**, University of California, Riverside

9:00 a.m.

Controllability and observability: diagrams and duality.

**Jason Erbele**, Victor Valley College

9:30 a.m.

Frobenius monoids, weak bimonoids, and corelations.

**Brandon Coya**, University of California, Riverside

10:00 a.m.

Compositional design and tasking of networks.

**John D. Foley***, Metron, Inc.

**John C. Baez**, University of California, Riverside

**Joseph Moeller**, University of California, Riverside

**Blake S. Pollard**, University of California, Riverside

10:30 a.m.

Operads for modeling networks.

**Joseph Moeller***, University of California, Riverside

**John Foley**, Metron, Inc.

**John C. Baez**, University of California, Riverside

**Blake S. Pollard**, University of California, Riverside

2:00 p.m.

Reeb graph smoothing via cosheaves.

**Vin de Silva**, Department of Mathematics, Pomona College

3:00 p.m.

Knowledge representation in bicategories of relations.

**Evan Patterson***, Stanford University, Statistics Department

3:30 p.m.

The multiresolution analysis of flow graphs.

**Steve Huntsman***, BAE Systems

4:00 p.m.

Categorical logic as a foundation for reasoning under uncertainty.

**Ralph L. Wojtowicz***, Shepherd University

4:30 p.m.

Data modeling and integration using the open source tool Algebraic Query Language (AQL).

**Peter Y. Gates***, Categorical Informatics

**Ryan Wisnesky**, Categorical Informatics

We’re having a conference on applied category theory!

- Applied Category Theory (ACT 2018). Summer school April 23rd to 27th and conference April 30th to May 4th 2018 at the Lorentz Center in Leiden, the Netherlands. Organized by Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford).

The plenary speakers will be:

- Samson Abramsky (Oxford)
- John Baez (UC Riverside)
- Kathryn Hess (EPFL)
- Mehrnoosh Sadrzadeh (Queen Mary)
- David Spivak (MIT)

There will be a lot more to say as this progresses, but for now let me just quote from the conference website.

Applied Category Theory (ACT 2018) is a five-day workshop on applied category theory running from April 30 to May 4 at the Lorentz Center in Leiden, the Netherlands.

*Towards an integrative science*: in this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one scientific discipline can be reused in another. The aim of the workshop is to (1) explore the use of category theory within and across different disciplines, (2) create a more cohesive and collaborative ACT community, especially among early-stage researchers, and (3) accelerate research by outlining common goals and open problems for the field.

While the workshop will host talks on a wide range of applications of category theory, there will be four special tracks on exciting new developments in the field:

- Dynamical systems and networks
- Systems biology
- Cognition and AI
- Causality

Accompanying the workshop will be an Adjoint Research School for early-career researchers. This will comprise a 16-week online seminar, followed by a 4-day research meeting at the Lorentz Center in the week prior to ACT 2018. Applications to the school will open prior to October 1, and are due November 1. Applicants will be notified of admission decisions by November 15.

Sincerely,

The organizers

Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford)


Category theory is a branch of mathematics originally developed to transport ideas from one branch of mathematics to another, e.g. from topology to algebra. Applied category theory refers to efforts to transport the ideas of category theory from mathematics to other disciplines in science, engineering, and industry.

This site originated from discussions at the Computational Category Theory Workshop at NIST on Sept. 28-29, 2015. It serves to collect and disseminate research, resources, and tools for the development of applied category theory, and hosts a blog for those involved in its study.

Category theory was developed in the 1940s to translate ideas from one field of mathematics, e.g. topology, to another field of mathematics, e.g. algebra. More recently, category theory has become an unexpectedly useful and economical tool for modeling a range of different disciplines, including programming language theory [10], quantum mechanics [2], systems biology [12], complex networks [5], database theory [7], and dynamical systems [14].

A category consists of a collection of objects together with a collection of maps between those objects, satisfying certain rules. Topologists and geometers use category theory to describe the passage from one mathematical structure to another, while category theorists are also interested in categories for their own sake. In computer science and physics, many types of categories (e.g. topoi or monoidal categories) are used to give a formal semantics of domain-specific phenomena (e.g. automata [3], or regular languages [11], or quantum protocols [2]). In the applied category theory community, a long-articulated vision understands categories as mathematical workspaces for the experimental sciences, similar to how they are used in topology and geometry [13]. This has proved true in certain fields, including computer science and mathematical physics, and we believe that these results can be extended in an exciting direction: we believe that category theory has the potential to bridge specific different fields, and moreover that developments in such fields (e.g. automata) can be transferred successfully into other fields (e.g. systems biology) through category theory. Already, for example, the categorical modeling of quantum processes has helped solve an important open problem in natural language processing [9].

In this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one discipline can be reused in another. Tangibly and in the short-term, we will bring together people from different disciplines in order to write an expository survey paper that grounds the varied research in applied category theory and lays out the parameters of the research program.

In formulating this research program, we are motivated by recent successes where category theory was used to model a wide range of phenomena across many disciplines, e.g. open dynamical systems (including open Markov processes and open chemical reaction networks), entropy and relative entropy [6], and descriptions of computer hardware [8]. Several talks will address some of these new developments. But we are also motivated by an open problem in applied category theory, one which was observed at the most recent workshop in applied category theory (Dagstuhl, Germany, in 2015): “a weakness of semantics/CT is that the definitions play a key role. Having the right definitions makes the theorems trivial, which is the opposite of hard subjects where they have combinatorial proofs of theorems (and simple definitions). […] In general, the audience agrees that people see category theorists only as reconstructing the things they knew already, and that is a disadvantage, because we do not give them a good reason to care enough” [1, pg. 61].

In this workshop, we wish to articulate a natural response to the above: instead of treating the reconstruction as a weakness, we should treat the use of categorical concepts as a natural part of transferring and integrating knowledge across disciplines. The restructuring employed in applied category theory cuts through jargon, helping to elucidate common themes across disciplines. Indeed, the drive for a common language and comparison of similar structures in algebra and topology is what led to the development of category theory in the first place, and recent hints show that this approach is not only useful between mathematical disciplines, but between scientific ones as well. For example, the ‘Rosetta Stone’ of Baez and Stay demonstrates how symmetric monoidal closed categories capture the common structure between logic, computation, and physics [4].

[1] Samson Abramsky, John C. Baez, Fabio Gadducci, and Viktor Winschel. Categorical methods at the crossroads. Report from Dagstuhl Perspectives Workshop 14182, 2014.

[2] Samson Abramsky and Bob Coecke. A categorical semantics of quantum protocols. In Handbook of Quantum Logic and Quantum Structures. Elsevier, Amsterdam, 2009.

[3] Michael A. Arbib and Ernest G. Manes. A categorist’s view of automata and systems. In Ernest G. Manes, editor, Category Theory Applied to Computation and Control. Springer, Berlin, 2005.

[4] John C. Baez and Mike Stay. Physics, topology, logic and computation: a Rosetta stone. In Bob Coecke, editor, New Structures for Physics. Springer, Berlin, 2011.

[5] John C. Baez and Brendan Fong. A compositional framework for passive linear networks. arXiv e-prints, 2015.

[6] John C. Baez, Tobias Fritz, and Tom Leinster. A characterization of entropy in terms of information loss. Entropy, 13(11):1945-1957, 2011.

[7] Michael Fleming, Ryan Gunther, and Robert Rosebrugh. A database of categories. Journal of Symbolic Computation, 35(2):127-135, 2003.

[8] Dan R. Ghica and Achim Jung. Categorical semantics of digital circuits. In Ruzica Piskac and Muralidhar Talupur, editors, Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design. Springer, Berlin, 2016.

[9] Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Stephen Pulman, and Bob Coecke. Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Logic and Algebraic Structures in Quantum Computing and Information. Cambridge University Press, Cambridge, 2013.

[10] Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55-92, 1991.

[11] Nicholas Pippenger. Regular languages and Stone duality. Theory of Computing Systems, 30(2):121-134, 1997.

[12] Robert Rosen. The representation of biological systems from the standpoint of the theory of categories. Bulletin of Mathematical Biophysics, 20(4):317-341, 1958.

[13] David I. Spivak. Category Theory for Scientists. MIT Press, Cambridge MA, 2014.

[14] David I. Spivak, Christina Vasilakopoulou, and Patrick Schultz. Dynamical systems and sheaves. arXiv e-prints, 2016.


It has been in the news recently — or rather, the small corner of the news that is of particular interest to mathematicians — that Maryanthe Malliaris and Saharon Shelah recently had an unexpected breakthrough when they stumbled on a proof that two infinities were equal that had been conjectured, and widely believed, to be distinct. Or rather, since both were strictly between the cardinality of the natural numbers and the cardinality of the reals, they were widely believed to be distinct in some models of set theory where the continuum hypothesis fails.

A couple of days ago, John Baez was sufficiently irritated by a Quanta article on this development that he wrote a post on Google Plus in which he did a much better job of explaining what was going on. As a result of reading that, and following and participating in the ensuing discussion, I have got interested in the problem. In particular, as a complete non-expert, I am struck that a problem that looks purely combinatorial (though infinitary) should, according to Quanta, have a solution that involves highly non-trivial arguments in proof theory and model theory. It makes me wonder, again as a complete non-expert so probably very naively, whether there is a simpler purely combinatorial argument that the set theorists missed because they believed too strongly that the two infinities were different.

I certainly haven’t found such an argument, but I thought it might be worth at least setting out the problem, in case it appeals to anyone, and giving a few preliminary thoughts about it. I’m not expecting much from this, but if there’s a small chance that it leads to a fruitful mathematical discussion, then it’s worth doing. As I said above, I am indebted to John Baez and to several commenters on his post for being able to write much of what I write in this post, as can easily be checked if you read that discussion as well.

The problem concerns the structure you obtain when you take the power set of the natural numbers and quotient out by the relation “has a finite symmetric difference with”. That is, we regard two sets $A$ and $B$ as equivalent if you can turn $A$ into $B$ by removing finitely many elements and adding finitely many other elements.

It’s easy to check that this is an equivalence relation. We can also define a number of the usual set-theoretic operations. For example, writing $[A]$ for the equivalence class of $A$, we can set $[A]\cap[B]$ to be $[A\cap B]$, $[A]\cup[B]$ to be $[A\cup B]$, $[A]^c$ to be $[A^c]$, etc. It is easy to check that these operations are well-defined.

What about the subset relation? That too has an obvious definition. We don’t want to say that $[A]\subset[B]$ if $A\subset B$, since that is not well-defined. However, we can define $A$ to be *almost contained in* $B$ if the set $A\setminus B$ is finite, and then say that $[A]\subset[B]$ if $A$ is almost contained in $B$. This *is* well-defined and it’s also easy to check that it is true if and only if $[A]\cap[B]=[A]$, which is the sort of thing we’d like to happen if our finite-fuzz set theory is to resemble normal set theory as closely as possible.

I will use a non-standard piece of terminology and refer to an equivalence class of sets as an f-set, the “f” standing for “finite” or “fuzzy” (though these fuzzy sets are not to be confused with the usual definition of fuzzy sets, which I don’t know and probably never will know). I’ll also say things like “is f-contained in” (which means the same as “is almost contained in” except that it refers to the f-sets rather than to representatives of their equivalence classes).

So far so good, but things start to get a bit less satisfactory when we consider infinite intersections and unions. How are we to define $\bigcap_{n=1}^\infty[A_n]$, for example?

An obvious property we would like is that the intersection should be the largest f-set that is contained in all the $[A_n]$. However, simple examples show that there doesn’t have to be a largest f-set contained in all the $[A_n]$. Indeed, let $A_1\supset A_2\supset\dots$ be an infinite sequence of subsets of $\mathbb{N}$ such that $A_n\setminus A_{n+1}$ is infinite for every $n$. Then $A$ is almost contained in every $A_n$ if and only if $A\setminus A_n$ is finite for every $n$. Given any such set, we can find for each $n$ an element $b_n$ of $A_n\setminus A_{n+1}$ that is not contained in $A$ (since $A_n\setminus A_{n+1}$ is infinite but $A\setminus A_{n+1}$ is finite). Then the set $B=A\cup\{b_1,b_2,\dots\}$ is also almost contained in every $A_n$, and $[A]$ is properly contained in $[B]$ (in the obvious sense).

OK, we don’t seem to have a satisfactory definition of infinite intersections, but we could at least hope for a satisfactory definition of “has an empty intersection”. And indeed, there is an obvious one. Given a collection of f-sets $[A_\gamma]$, we say that its intersection is empty if the only f-set that is f-contained in every $[A_\gamma]$ is $[\emptyset]$. (Note that $[\emptyset]$ is the equivalence class of the empty set, which consists of all finite subsets of $\mathbb{N}$.) In terms of the sets rather than their equivalence classes, this is saying that there is no infinite set that is almost contained in every $A_\gamma$.

An important concept that appears in many places in mathematics, but particularly in set theory, is the *finite-intersection property*. A collection $\mathcal{A}$ of subsets of a set $X$ is said to have this property if $A_1\cap\dots\cap A_n$ is non-empty whenever $A_1,\dots,A_n\in\mathcal{A}$. This definition carries over to f-sets with no problem at all, since finite f-intersections were easy to define.

Let’s ask ourselves a little question here: can we find a collection of f-sets with the finite-intersection property but with an empty intersection? That is, no *finite* intersection is empty, but the intersection of *all* the f-sets *is* empty.

That should be pretty easy. For sets, there are very simple examples like $A_n=\{n,n+1,n+2,\dots\}$: finitely many of those have a non-empty intersection, but there is no non-empty set that’s contained in all of them.

Unfortunately, all those sets are the same if we turn them into f-sets. But there is an obvious way of adjusting the example: we just take sets $A_1\supset A_2\supset\dots$ such that $A_n\setminus A_{n+1}$ is infinite for each $n$ and $\bigcap_n A_n=\emptyset$. That ought to do the job once we turn each $A_n$ into its equivalence class $[A_n]$.

Except that it *doesn’t* do the job. In fact, we’ve already observed that we can just pick a set $B=\{b_1,b_2,\dots\}$ with $b_n\in A_n\setminus A_{n+1}$ and then $[B]$ will be a non-empty f-intersection of the $[A_n]$.

However, here’s an example that does work. We’ll take all f-sets $[A]$ such that $A$ has density 1. (This means that $n^{-1}|A\cap\{1,2,\dots,n\}|$ tends to 1 as $n$ tends to infinity.) Since the intersection of any two sets of density 1 has density 1 (a simple exercise), this collection of f-sets has the finite-intersection property. I claim that any f-set contained in all these f-sets must be $[\emptyset]$.
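One way to do the simple exercise about densities, as a sketch (writing $[n]$ for $\{1,2,\dots,n\}$, a shorthand introduced only here): for any two sets $A,B\subset\mathbb{N}$, inclusion-exclusion inside $[n]$ gives

$|A\cap B\cap[n]| \ge |A\cap[n]| + |B\cap[n]| - n.$

Dividing by $n$ and letting $n$ tend to infinity, the right-hand side tends to $1+1-1=1$ when $A$ and $B$ both have density 1, so $A\cap B$ has density 1 as well.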

Indeed, let $B$ be an infinite set and $b_1<b_2<\dots$ the enumeration of its elements in increasing order. We can pick a subsequence $b_{n_1},b_{n_2},\dots$ such that $b_{n_k}\ge 2^k$ for every $k$, and the corresponding subset $C=\{b_{n_1},b_{n_2},\dots\}$ is an infinite subset of $B$ with density zero. Therefore, $\mathbb{N}\setminus C$ is a set of density 1 that does not almost contain $B$.

The number of f-sets we took there in order to achieve an f-empty intersection was huge: the cardinality of the continuum. (That’s another easy exercise.) Did we really need that many? This innocent question leads straight to a definition that is needed in order to understand what Malliaris and Shelah did.

**Definition.** The cardinal **p** is the smallest cardinality of a collection $F$ of f-sets such that $F$ has the finite-intersection property but also $F$ has an empty f-intersection.

It is simple to prove that this cardinal is uncountable, but it is also known not to be as big as the cardinality of the continuum (where again this means that there are models of set theory — necessarily ones where CH fails — for which it is strictly smaller). So it is a rather nice intermediate cardinal, which partially explains its interest to set theorists.
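Here is a sketch of the uncountability claim, using the same diagonal idea as before (the names $B_n$ and $b_n$ are introduced just for this argument). Suppose $[A_1],[A_2],\dots$ is a countable collection of f-sets with the finite-intersection property. Then each set $B_n=A_1\cap\dots\cap A_n$ is infinite, so we can choose $b_1<b_2<\dots$ with $b_n\in B_n$. The set $B=\{b_1,b_2,\dots\}$ is infinite and

$B\setminus A_n\subset\{b_1,\dots,b_{n-1}\}$

for every $n$, so $[B]$ is a non-empty f-intersection. Hence no countable collection can have the finite-intersection property and an empty f-intersection.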

The cardinal **p** is one of the two infinities that Malliaris and Shelah proved were the same. The other one is closely related. Define a *tower* to be a collection $T$ of f-sets that does not contain $[\emptyset]$ and is totally ordered by inclusion. Note that a tower $T$ trivially satisfies the finite-intersection property: if $[A_1],\dots,[A_n]$ belong to $T$, then the smallest of the f-sets $[A_i]$ is the f-intersection and it isn’t f-empty. So let’s make another definition.

**Definition.** The cardinal **t** is the smallest cardinality of a tower that has an empty f-intersection.

Since a tower has the finite-intersection property, we are asking for something stronger than before, so it is at least as hard to obtain. It follows that **t** is at least as large as **p**.

And now we have the obvious question: is the inequality strict? As I have said, it was widely believed that it was, and a big surprise when Malliaris and Shelah proved that the two infinities were in fact equal.

What does this actually say? It says that if you can find a bunch of f-sets with the finite-intersection property and an empty f-intersection, then you can find a totally ordered example that has at most the cardinality of the collection you started with.

Why is this hard to prove? I don’t have a sophisticated answer to that question that would explain why it is hard to experts in set theory. I just want to think about why it might be hard to prove the statement using a naive approach.

An immediate indication that things might be difficult is that it isn’t terribly easy to give *any* example of a tower with an empty f-intersection, let alone one with small cardinality.

An indication of the problem we face was already present when I gave a failed attempt to construct a system of sets with the finite-intersection property and empty intersection. I took a nested sequence $[A_1]\supset[A_2]\supset\dots$ such that the sets $A_n$ had empty intersection, but that didn’t work because I could pick an element from each $A_n\setminus A_{n+1}$ and put those together to make a non-empty f-intersection. (I’m using “f-intersection” to mean any f-set f-contained in all the given f-sets. In general, we can’t choose a largest one, so it’s far from unique. The usual terminology would be to say that if $A$ is almost contained in every set from a collection of sets, then $A$ is a *pseudointersection* of that collection. But I’m trying to express as much as possible in terms of f-sets.)

Anyone who is familiar with ordinal hierarchies will see that there is an obvious thing we could do here. We could start as above, and then when we find the annoying f-intersection we simply add it to the tower and call it $[A_\omega]$. And then inside $[A_\omega]$ we can find another nested decreasing sequence of sets and call those $[A_{\omega+1}],[A_{\omega+2}],\dots$ and so on. Those will also have a non-empty f-intersection, which we could call $[A_{\omega\cdot 2}]$, and so on.

Let’s use this idea to prove that there do exist towers with empty f-intersections. I shall build a collection of non-empty f-sets $[A_\alpha]$ by transfinite induction. If I have already built $[A_\alpha]$, I let $[A_{\alpha+1}]$ be any non-empty f-set that is strictly f-contained in $[A_\alpha]$. That tells me how to build my sets at successor ordinals. If $\alpha$ is a limit ordinal, then I’ll take $[A_\alpha]$ to be a non-empty f-intersection of all the $[A_\beta]$ with $\beta<\alpha$.

But how am I so sure that such an f-intersection exists? I’m not, but if it doesn’t exist, then I’m very happy, as that means that the f-sets $[A_\beta]$ with $\beta<\alpha$ form a tower with empty f-intersection.

Since all the f-sets in this tower are distinct, the process has to terminate at some point, and that implies that a tower with empty f-intersection must exist.

For a lot of ordinal constructions like this, one can show that the process terminates at the first uncountable ordinal, $\omega_1$. To set theorists, this has extremely small cardinality: by definition, the smallest one after the cardinality of the natural numbers. In some models of set theory, there will be a dizzying array of cardinals between this and the cardinality of the continuum.

In our case it is not too hard to prove that the process doesn’t terminate *before* we get to the first uncountable ordinal. Indeed, if $\alpha$ is a countable limit ordinal, then we can take an increasing sequence of ordinals $\alpha_1<\alpha_2<\dots$ that tend to $\alpha$, pick an element $b_n$ from $A_{\alpha_n}\setminus A_{\alpha_{n+1}}$, and define $A_\alpha$ to be $\{b_1,b_2,\dots\}$.

However, there doesn’t seem to be any obvious argument to say that the f-sets $[A_\alpha]$ with $\alpha<\omega_1$ have an empty f-intersection, even if we make some effort to keep our sets small (for example, by defining $A_{\alpha+1}$ to consist of every other element of $A_\alpha$). In fact, we sort of know that there won’t be such an argument, because if there were, then it would show that there was a tower whose cardinality was that of the first uncountable ordinal. That would prove that **t** had this cardinality, and since **p** is uncountable (that is easy to check) we would immediately know that **p** and **t** were equal.

So that’s already an indication that something subtle is going on that you need to be a proper set theorist to understand properly.

But do we need to understand these funny cardinalities to solve the problem? We don’t need to know what they are — just to prove that they are the same. Perhaps that can still be done in a naive way.

So here’s a very naive idea. Let’s take a set $F$ of f-sets with the finite intersection property and empty f-intersection, and let’s try to build a tower $T$ with empty intersection using only sets from $F$. This would certainly be sufficient for showing that $T$ has cardinality at most that of $F$, and if $F$ has minimal cardinality it would show that **p**=**t**.

There’s almost no chance that this will work, but let’s at least see where it goes wrong, or runs into a brick wall.

At first things go swimmingly. Let $[A_1]\in F$. Then there must exist an f-set $[B]\in F$ that does not f-contain $[A_1]$, since otherwise $[A_1]$ itself would be a non-empty f-intersection for $F$. But then $[A_2]=[A_1]\cap[B]$ is a proper f-subset of $[A_1]$, and by the finite-intersection property it is not f-empty.

By iterating this argument, we can therefore obtain a nested sequence $[A_1]\supset[A_2]\supset\dots$ of f-sets in $F$.

The next thing we’d like to do is create $[A_\omega]$. And this, unsurprisingly, is where the brick wall is. Consider, for example, the case where $F$ consists of all f-sets of density 1. What if we stupidly chose $A_n$ in such a way that $\min A_n\ge 2^n$ for every $n$? Then our diagonal procedure (picking an element from each set) would yield a set of density zero. Of course, we could go for a different diagonal procedure. We would need to prove that for this particular $F$ and any nested sequence we can always find an f-intersection that belongs to $F$. That’s equivalent to saying that for any sequence $A_1\supset A_2\supset\dots$ of sets of density 1 we can find a set $A$ such that $A\setminus A_n$ is finite for every $n$ and $A$ has density 1.

That’s a fairly simple (but not trivial) exercise I think, but when I tried to write a proof straight down I failed — it’s more like a pen-and-paper job until you get the construction right. But here’s the real question I’d like to know the answer to right at this moment. It splits into two questions actually.

**Question 1.** *Let $F$ be a collection of f-sets with the finite-intersection property and no non-empty f-intersection. Let $[A_1]\supset[A_2]\supset\dots$ be a nested sequence of elements of $F$. Must this sequence have an f-intersection that belongs to $F$?*

**Question 2.** *If, as seems likely, the answer to Question 1 is no, must it at least be the case that there exists a nested sequence in $F$ with an f-intersection that also belongs to $F$?*

If the answer to Question 2 turned out to be yes, it would naturally lead to the following further question.

**Question 3.** *If the answer to Question 2 is yes, then how far can we go with it? For example, must $F$ contain a nested transfinite sequence of uncountable length?*

Unfortunately, even a positive answer to Question 3 would not be enough for us, for reasons I’ve already given. It might be the case that we can indeed build nice big towers in $F$, but that the arguments stop working once we reach the first uncountable ordinal. Indeed, it might well be known that there are sets $F$ with the finite-intersection property and no non-empty f-intersection that do not contain towers that are bigger than this. If that’s the case, it would give at least one serious reason for the problem being hard. It would tell us that we can’t prove the equality by just finding a suitable tower inside $F$: instead, we’d need to do something more indirect, constructing a tower $T$ and some non-obvious injection from $T$ to $F$. (It would be non-obvious because it would not preserve the subset relation.)

Another way the problem might be difficult is if $F$ does contain a tower with no non-empty f-intersection, but we can’t extend an arbitrary tower in $F$ to a tower with this property. Perhaps if we started off building our tower the wrong way, it would lead us down a path that had a dead end long before the tower was big enough, even though good paths and good towers did exist.

But these are just pure speculations on my part. I’m sure the answers to many of my questions are known. If so, I’ll be interested to hear about it, and to understand better why Malliaris and Shelah had to use big tools and a much less obvious argument than the kind of thing I was trying to do above.

The Department of Physics and Astronomy at Rice University in Houston, Texas, invites applications for a tenure-track faculty position (Assistant Professor level) in Theoretical Astro-Particle physics and/or Cosmology. The department seeks an outstanding individual whose research will complement and connect existing activities in Nuclear/Particle physics and Astrophysics groups at Rice University (see http://physics.rice.edu). This is the second position in a *Cosmic Frontier* effort that may eventually grow to three members. The successful applicant will be expected to develop an independent and vigorous research program, and teach graduate and undergraduate courses. A PhD in Physics, Astrophysics or related field is required.

Applicants should send the following: (i) cover letter; (ii) curriculum vitae (including electronic links to 2 relevant publications); (iii) research statement (4 pages or less); (iv) teaching statement (2 pages or less); and (v) the names, professional affiliations, and email addresses of three references. To apply, please visit: http://jobs.rice.edu/postings/11772. Applications will be accepted until the position is filled, but only those received by Dec 15, 2017 will be assured full consideration. The appointment is expected to start in July 2018. Further inquiries should be directed to the chair of the search committee, Prof. Paul Padley (padley@rice.edu).

Rice University is an Equal Opportunity Employer with commitment to diversity at all levels, and considers for employment qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national or ethnic origin, genetic information, disability or protected veteran status.

In my last post I talked about certain types of lattice paths with weightings on them and formulas for the weighted count of the paths; in particular, I was interested in expressing the reverse Bessel polynomials as a certain weighted count of Schröder paths. I alluded to some connection with continued fractions, and it is this connection that I want to explain here and in my next post.

In this post I want to prove Flajolet’s Fundamental Lemma. Alan Sokal calls this Flajolet’s Master Theorem, but Viennot takes the stance that it deserves the high accolade of being described as a ‘Fundamental Lemma’, citing Aigner and Ziegler in Proofs from THE BOOK:

“The essence of mathematics is proving theorems – and so, that is what mathematicians do: They prove theorems. But to tell the truth, what they really want to prove, once in their lifetime, is a Lemma, like the one by Fatou in analysis, the Lemma of Gauss in number theory, or the Burnside-Frobenius Lemma in combinatorics.

“Now what makes a mathematical statement a true Lemma? First, it should be applicable to a wide variety of instances, even seemingly unrelated problems. Secondly, the statement should, once you have seen it, be completely obvious. The reaction of the reader might well be one of faint envy: Why haven’t I noticed this before? And thirdly, on an esthetic level, the Lemma – including its proof – should be beautiful!”

Interestingly, Aigner and Ziegler were building up to describing a result of Viennot’s – the Gessel-Lindström-Viennot Lemma – as a fundamental lemma! (I hope to talk about that lemma in a later post.)

Anyway, Flajolet’s Fundamental Lemma that I will describe and prove below is about expressing the weighted count of paths that look like

as a continued fraction

$\frac{1} {1- c_{0} - \frac{a_{1} b_{1}} {1-c_{1} - \frac{a_{2} b_{2}} {1- c_2 - \frac{a_3 b_3} {1-\dots }}}}$

Next time I’ll give a few examples, including the connection with reverse Bessel polynomials.

We consider Motzkin paths, which are like the Dyck paths and Schröder paths we considered last time, but here the flat steps have length $1$.

A **Motzkin path**, then, is a lattice path in $\mathbb{N}^2$ starting at $(0, 0)$, having steps in the direction $(1,1)$, $(1,-1)$ or $(1,0)$. The path finishes at some $(\ell, 0)$. Here is a Motzkin path.

(Actually at this point the length of each step is a bit of a red herring, but let’s not worry about that.)

We want to count weighted paths, so we’re going to have to weight them. We’ll do it in a universal way to start with. Let $\{a_i\}_{i=1}^\infty$, $\{b_i\}_{i=1}^\infty$ and $\{c_i\}_{i=0}^\infty$ be three sets of commuting indeterminates. Now weight each step in a path in the following way. Each step going up to level $i$ will be given the weight $a_i$; each step going down from level $i$ will be given the weight $b_i$; and each flat step at level $i$ will be given weight $c_i$. Here’s the path from above with the weights marked on it.

The weight $w_{a,b,c}(\sigma)$ of a path $\sigma$ is just the product of the weights of each of its steps, so the weight of the above path is $c_0a_1^2b_1^2c_1a_2b_2$.

If you try to start writing down the sum of the weightings of all Motzkin paths you’ll get a power series that begins

$1 + c_0 + a_1b_1 + c_0^2 + 2a_1b_1c_0 + a_1b_1c_1 + c_0^3 + \dots \in\mathbb{Z}[[a_i, b_i, c_i]]$

Flajolet’s Fundamental Lemma will give us a formula for this power series.

In order to prove the result about the enumeration of weightings of all paths we will need to consider slightly more general paths that don’t just start on the $x$-axis. So define an $(h,k)$–path to be like a Motzkin path except that it starts at some point $(0, h)$, for $h\ge 0$, does not go below the line $y=h$ nor above the line $y=k$ and finishes at some point $(\ell, h)$. Let $P_h^k$ denote the set of all $(h,k)$–paths.

Here is a $(2,4)$–path with the weights marked on. Of course this is also, for instance, a $(2,13)$–path.

We want the weighted sum of all Motzkin paths, so in order to calculate that we will take $p_h^k$ to be the sum of all weights of $(h,k)$-paths: $p_h^k\coloneqq \sum_{\sigma\in P_h^k} w_{a,b,c}(\sigma)\in \mathbb{Z}[[a_i, b_i, c_i]].$ There is a beautifully simple expression for $p^k_h$.

Observe first that any path in $P_k^k$ is constrained to lie at level $k$ so must simply be a product of flat steps which all have weight $c_k$, thus

$p^k_k = 1 + c_k +c_k^2 + c_k^3+\dots = \frac{1}{1-c_k}.$

Given two paths $\sigma_1, \sigma_2\in P^k_h$ we can multiply them together simply by placing $\sigma_2$ after $\sigma_1$. The above pictured example is the product of three paths in $P_2^4$, the middle one being a flat path. Weighting is clearly preserved by this multiplication: $w_{a,b,c}(\sigma_1\sigma_2)=w_{a,b,c}(\sigma_1)w_{a,b,c}(\sigma_2)$.

An **indecomposable** $(h,k)$-path is a path which only returns to level $h$ at its finishing point, i.e. as the name suggests, it cannot be decomposed into a non-trivial product. It is clear that any path uniquely decomposes as a product of indecomposable paths. There are two types of non-trivial indecomposable $(h,k)$-paths: there is the single flat step; and there are the paths which are an up step followed by a path in $P_{h+1}^k$ followed by a down step back to level $h$. We let $I_{h}^k$ be the set of non-trivial indecomposable $(h,k)$-paths.

This all leads to the following argument to deduce an expression for the weighted count of all $(h,k)$-paths.

$\begin{aligned} p^k_h&=\sum_{\sigma\in P^k_h} w_{a,b,c}(\sigma)\\ &= \sum_{n=0}^\infty \sum_{\pi_1,\dots,\pi_n \in I^k_h} w_{a,b,c}(\pi_1\dots \pi_n)\\ &= \sum_{n=0}^\infty \sum_{\pi_1,\dots,\pi_n \in I^k_h} w_{a,b,c}(\pi_1)\dots w_{a,b,c}(\pi_n)\\ &= \frac{1}{1- \sum_{\pi\in I^k_h} w_{a,b,c}(\pi)} \\ &= \frac{1}{1- c_h - \sum_{\sigma\in P^k_{h+1}}a_{h+1} w_{a,b,c}(\sigma)b_{h+1}} \\ &= \frac{1}{1- c_h- a_{h+1}b_{h+1}\sum_{\sigma\in P^k_{h+1}} w_{a,b,c}(\sigma)} \\ &= \frac{1}{1- c_h - a_{h+1} b_{h+1}p_{h+1}^k} \end{aligned}$

This is a lovely recursive expression for the weighted count $p_h^k$. Using the fact $p^k_k=\frac{1}{1-c_k}$ that we gave above, we obtain the following.

**Lemma.** $p_h^k= \frac{1} {1- c_{h} - \frac{a_{h+1} b_{h+1}} {1-c_{h+1} - \frac{a_{h+2} b_{h+2}} {\qquad \frac{\vdots}{1- c_{k-1}-\frac{a_k b_k}{1-c_k}} }}}$
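As a quick sanity check of the Lemma, here is the smallest non-trivial case worked by hand (a sketch; the name $f$ is introduced only for this computation). Taking $h=0$ and $k=1$ gives

$p_0^1 = \frac{1}{1-c_0-\frac{a_1b_1}{1-c_1}} = \frac{1}{1-f}, \qquad f = c_0 + a_1b_1(1+c_1+c_1^2+\dots).$

Expanding $1+f+f^2+f^3+\dots$ and keeping only the terms of degree at most $3$ gives

$1 + c_0 + a_1b_1 + c_0^2 + 2a_1b_1c_0 + a_1b_1c_1 + c_0^3 + \dots$

This agrees with the start of the power series for all Motzkin paths written down earlier, as it should: a path of length at most $3$ never rises above level $1$.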

Now taking $h=0$ and letting $k\to \infty$ we get the following continued fraction expansion for the weighted count of all Motzkin paths starting at level $0$.

**Flajolet’s Fundamental Lemma.** $\sum_{\sigma\,\,\mathrm{Motzkin}} w_{a,b,c}(\sigma) = \frac{1} {1- c_{0} - \frac{a_{1} b_{1}} {1-c_{1} - \frac{a_{2} b_{2}} {1- c_2 - \frac{a_3 b_3} {1-\dots }}}} \in\mathbb{Z}[[a_i, b_i, c_i]]$

How lovely and simple is that?
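For readers who like to see such identities checked by machine, here is a minimal Python sketch that compares the two sides up to a fixed degree. It is only an illustration under my own assumptions: the function names (`multiply`, `geometric`, `continued_fraction`, `enumerate_paths`) and the representation of a monomial as a sorted tuple of variable names are hypothetical choices, not anything from Flajolet or Viennot. One side is built from the recursion $p_h^k = 1/(1-c_h-a_{h+1}b_{h+1}p_{h+1}^k)$, the other by brute-force enumeration of paths.

```python
from collections import Counter

# A monomial is a sorted tuple of variable names, e.g. ("a1", "b1", "c0");
# a truncated power series is a Counter mapping monomials to coefficients.

def multiply(p, q, max_degree):
    """Product of two series, discarding monomials of degree > max_degree."""
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            if len(m1) + len(m2) <= max_degree:
                out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

def geometric(f, max_degree):
    """Truncation of 1/(1-f) = 1 + f + f^2 + ..., assuming f has no constant term."""
    total = Counter({(): 1})
    power = Counter({(): 1})
    for _ in range(max_degree):
        power = multiply(power, f, max_degree)
        for monomial, coeff in power.items():
            total[monomial] += coeff
    return total

def continued_fraction(h, k, max_degree):
    """Truncated series for p_h^k, built bottom-up from p_k^k = 1/(1-c_k)."""
    p = geometric(Counter({("c%d" % k,): 1}), max_degree)
    for level in range(k - 1, h - 1, -1):
        ab = Counter({("a%d" % (level + 1), "b%d" % (level + 1)): 1})
        f = multiply(ab, p, max_degree)
        f[("c%d" % level,)] += 1
        p = geometric(f, max_degree)
    return p

def enumerate_paths(h, k, max_length):
    """Sum of the weights of all (h,k)-paths of length <= max_length."""
    total = Counter()

    def walk(level, length, monomial):
        if level == h:  # every return to level h is a complete path
            total[tuple(sorted(monomial))] += 1
        if length == max_length:
            return
        if level < k:   # up step to level+1 has weight a_{level+1}
            walk(level + 1, length + 1, monomial + ("a%d" % (level + 1),))
        if level > h:   # down step from level has weight b_{level}
            walk(level - 1, length + 1, monomial + ("b%d" % level,))
        # flat step at this level has weight c_{level}
        walk(level, length + 1, monomial + ("c%d" % level,))

    walk(h, 0, ())
    return total

N = 6
series = continued_fraction(0, N, N)
paths = enumerate_paths(0, N, N)
print(series == paths)
```

Taking $k=N$ is enough here because a Motzkin path of length at most $N$ cannot rise above level $N/2$, so the truncated continued fraction and the full one agree up to degree $N$.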

Next time I’ll give some examples and applications which include the Dyck paths and Schröder paths we looked at previously.

The Department of Physics and Astronomy at Rice University in Houston, TX invites applications for a tenure-track faculty position in experimental condensed matter physics. The department expects to make an appointment at the assistant professor level. This search seeks an outstanding individual whose research interest is in hard condensed matter systems, who will complement and extend existing experimental and theoretical activities in condensed matter physics on semiconductor and nanoscale structures, strongly correlated systems, topological matter, and related quantum materials (see http://physics.rice.edu/). A PhD in physics or related field is required.

Applicants to this search should submit the following: (1) cover letter; (2) curriculum vitae; (3) research statement; (4) teaching statement; and (5) the names, professional affiliations, and email addresses of three references. For full details and to apply, please visit: http://jobs.rice.edu/postings/11782. Applications will be accepted until the position is filled. The review of applications will begin **October 15 2017**, but all those received by December 1 2017 will be assured full consideration. The appointment is expected to start in July 2018. Further inquiries should be directed to the chair of the search committee, Prof. Emilia Morosan (emorosan@rice.edu).

Rice University is an Equal Opportunity Employer with commitment to diversity at all levels, and considers for employment qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national or ethnic origin, genetic information, disability or protected veteran status.


Our data-driven model for stars, *The Cannon*, is a regression. That is, it figures out how the labels generate the spectral pixels with a model for possible functional forms for that generation. I spent part of today building a Jupyter notebook to demonstrate that—when the assumptions underlying the regression are correct—the results of the regression are accurate (and precise). That is, the maximum-likelihood regression estimator is a good one. That isn't surprising; there are very general proofs; but it answers some questions (that my collaborators have) about cases where the labels (the regressors) are correlated in the training set.
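To make the point about correlated regressors concrete, here is a toy sketch in plain Python. It is emphatically not *The Cannon* and not the notebook mentioned above: it assumes a linear model with two regressors and Gaussian noise (so that maximum likelihood reduces to least squares), and the function names, the true coefficients, the noise level and the correlation of about 0.8 are all made up for illustration.

```python
import random

def fit_two_regressors(x1, x2, y):
    """Least-squares fit of y = b1*x1 + b2*x2 via the 2x2 normal equations."""
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    t1 = sum(u * w for u, w in zip(x1, y))
    t2 = sum(v * w for v, w in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2

def simulate(n=5000, true_b=(2.0, -1.0), noise=0.5, seed=0):
    """Generate correlated regressors, noisy data, and return the fitted coefficients."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    # x2 has unit variance and correlation of roughly 0.8 with x1
    x2 = [0.8 * u + 0.6 * rng.gauss(0, 1) for u in x1]
    y = [true_b[0] * u + true_b[1] * v + rng.gauss(0, noise)
         for u, v in zip(x1, x2)]
    return fit_two_regressors(x1, x2, y)

b1, b2 = simulate()
print(b1, b2)
```

When the model assumptions hold, the fitted coefficients should land close to the true values 2 and -1 despite the correlation; the correlation inflates the variance of the estimates but does not bias them.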