An Operadic Introduction to Entropy
Posted by Tom Leinster
Bless British trains. A two-hour delay with nothing to occupy me provided the perfect opportunity to figure out the relationships between some of the results that John, Tobias and I have come up with recently.
This post is intended to serve two purposes. First, for those who have been following, it ties together several of the theorems we’ve found. I hope it will make them seem less like a bunch of distinct but vaguely similar results and more like a coherent whole.
Second, for those who haven’t been following, it’s an introduction to entropy, and our recent results on it, for the categorically minded.
I will not assume:
- that you know anything whatsoever about entropy
- that you’ve read any other posts on this blog.
I will assume:
- that you know the definitions of operad and algebra for an operad
- that you know a bit of category theory, including roughly what category theorists mean when they use the word lax.
Operads, algebras and internal algebras
Let be an operad in . I’ll keep it open as to whether is symmetric or not. It makes sense to talk about -algebras in any category with finite products, and in particular in . But is actually a 2-category, so there are further variations available to us.
For a start, we can consider weak (or pseudo) -algebras, where the axioms only hold up to coherent isomorphism. In what follows, I won’t be scrupulously careful to distinguish between strict and weak -algebras. That’s much the same kind of carelessness as when we pretend that a monoidal category is strict.
But more importantly, we can vary the flavour of maps between -algebras. In particular, we can consider lax maps. Given -algebras and in , a lax map is a functor together with a natural transformation
for each , satisfying coherence conditions.
Now let be a -algebra in . An internal -algebra in is a lax map of -algebras, where is the terminal -algebra in . (Previously, I’ve sometimes called this a ‘lax point’ of . The internal algebra terminology is, I think, due to Michael Batanin.) The examples that follow shortly should explain the name.
Explicitly, an internal -algebra in consists of:
- an object
- for each operation , a map
satisfying the axioms
- for any operations and
- , where the on the left-hand side is the unit of the operad
- if we’re doing symmetric operads, whenever and .
Examples
- Take the terminal non-symmetric operad . Then a -algebra in is a monoidal category, and an internal algebra in is an internal monoid in .
- Fix a monoid , and let be the non-symmetric operad with no non-unary operations, and whose monoid of unary operations is . I’ll call this the monoid for -sets. Then a -algebra in is a category with a left action by . An internal algebra in is an object together with a map for each , satisfying the evident action-like axioms.
Although I started with an operad in , the concept of internal -algebra makes sense for an operad in any category with finite limits. Our -algebras would then be categories internal to (and equipped with a -algebra structure).
The only case we’ll need is , the category of topological spaces. So will be a topological operad, and our -algebras will be topological categories, that is, categories internal to . I’ll write for the category of topological categories. The explicit description of internal -algebras still holds, except that we add the condition that the assignment
defines a continuous map from to the space of maps in .
If you’re not comfortable with internal categories, all you need to extract from the previous two paragraphs is ‘some continuity conditions will be slapped on’.
I’ll introduce a particularly important topological operad, . The elements of are the probability distributions on an -element set:
This is the -simplex, and is topologized as such.
To define the composition, imagine that you have a distribution on and then further distributions on for each . From this you can create a new distribution on : first choose an element according to , then choose an element according to . Your output element is . That is,
The unit of the operad is, inevitably, the unique probability distribution .
I’ll also introduce a particularly important -algebra in .
As a topological category, it’s the additive monoid , regarded as a one-object category. In this post, when appears as a category, it will always mean this. (The order on could also be used to make it into a category, but that won’t come into this story.)
The -algebra structure is trivial on objects, since the category only has one object. On maps — that is, on real numbers — it’s given by
(, ).
Now, what’s an internal -algebra in ?
Well, according to our explicit description, it is:
- an object of the category — but only has one object
- an element for each probability distribution , which I’ll now write as for legibility
satisfying the axioms
- for any permutation (that is, is symmetric in its arguments)
- is continuous, for each .
This has now become a very explicit problem: find all the sequences of functions
satisfying these four axioms. Fortunately, there’s already a theorem solving this problem. It was found and proved by Fadeev in the 1950s. To state Fadeev’s theorem, we need the definition of Shannon entropy. For a probability distribution on a finite set, the Shannon entropy is
where by convention, . This gives a sequence of functions
It’s easy to verify that satisfies the four axioms above. And obviously the set of s satisfying those four axioms is closed under multiplication by nonnegative scalars, so satisfies them too for any . Fadeev proved the converse:
every satisfying the four axioms above is a nonnegative scalar multiple of Shannon entropy.
In this sense, the internal -algebras in ‘are’ the scalar multiples of Shannon entropy.
All the theorems described in the rest of this post will depend on this one.
The free -algebra containing an internal -algebra
Fix a symmetric operad , say in or . The free -algebra containing an internal -algebra is a -algebra in (or ) equipped with an internal -algebra and initial as such.
What does this mean? Well, first of all, is supposed to denote an object of , and is a family of maps , one for each operation of , as in the definition of internal algebra. The initiality means that for any -algebra and internal -algebra in , there is a unique (strict) map of -algebras such that
A slightly more abstract way to put it is that for all -algebras , there is an isomorphism
The two statements are equivalent by the Yoneda Lemma.
This being a universal property, it characterizes and uniquely up to isomorphism, if they exist. And they do exist: here’s an explicit description.
- An object of is a pair where and
-
a map in consists of:
- a map of finite sets, together with
- for each , an element
- where is a certain permutation describing the way in which the fibres are interleaved
- I’ll leave you to figure out what the -action on must be.
- , and I’ll also leave you to figure out what must be.
A few comments are in order. First, this description might seem a bit fearsome, but it’s not so bad; I’ll give a couple of examples in a moment. Second, if we were doing non-symmetric operads then we’d require to be order-preserving, and there wouldn’t be a permutation . Third, is only a weak -algebra. It would be strict except for the symmetries. But as I said earlier, I’m going to ignore the difference between weak and strict. Fourth, if we’re doing topological operads then is a category in , and I hope you can guess what the topology is.
Examples
- Let be the terminal non-symmetric operad, so that -algebras in are monoidal categories and internal -algebras are internal monoids. Then is the monoidal category of finite totally ordered sets (or a skeleton thereof), including the empty set. The ‘generic’ internal monoid in has , the one-element totally ordered set. It’s been well-known since at least Categories for the Working Mathematician that a monoidal functor from to another monoidal category amounts to an internal monoid in .
- Fix a monoid , and let be the monoid for -sets. If you regard as a category with single object , then is the slice category . In other words, the objects of are the elements of , and a map in is an element such that . If has cancellation then is the poset of elements of , ordered by divisibility.
Now for the most important example: the operad for probability distributions. An object of is a finite probability space. I’ll write a typical object as , where is a finite set and is a probability distribution on . A map in is almost just a measure-preserving map. Precisely, a map in consists of a map of sets together with, for each , a probability distribution on the fibre , such that
for each and . But if for all then this formula determines uniquely, and it follows that a map amounts to a map of sets preserving measure, that is, satisfying
I’ll write this category as , for two reasons: it’s a category formed freely from the operad , and it’s almost the same as the category of finite probability spaces.
The -algebra structure on can be described as follows. Let . When is applied to an -tuple of objects
of , the result is the object
where whenever . The action of on the maps in is defined in a similar way.
The universal property of implies that
strict maps of -algebras correspond to internal -algebras in .
But Fadeev’s theorem classifies the internal -algebras in : they correspond to the nonnegative scalar multiples of Shannon entropy. So this says:
strict maps of -algebras correspond to nonnegative scalar multiplies of Shannon entropy.
How they correspond can be extracted from what I’ve said so far, but let me bring it down to earth by stating it as explicitly as I know how. In it, I’ll denote maps in by Greek letters such as , to save me from writing pairs such as every time.
Theorem P Let be a function from the set of maps in to the set , satisfying the following conditions:
- Functoriality: for all composable maps , in
- Convex combinations: for all maps , in and all
- Continuity: is continuous.
Then there is some such that for all maps in ,
Conversely, for any , the function defined by this formula satisfies the conditions above.
Every part of this theorem comes directly from what we know so far, with the aid of a little routine calculation. For example, the convex combinations axiom (which says that defines a map of -algebras) should in principle involve convex combinations of any number of maps, but an easy induction shows that two suffices. The formula is perhaps a surprise, but again pops out if you work through the details of the correspondence.
I’ve called this Theorem P. Later there will be Theorems P, M and M, each one characterizing entropy in some categorical way. I’ll show you how they all connect up.
The actual category of finite probability spaces
You might not like the fact that Theorem P involves the category , which isn’t quite the same as the category of finite probability spaces. It’s only zero probabilities that prevent it from being the same. But we have to face the fact: it’s different.
So, can we find a Theorem P with instead of ? Yes, we can. To do so, let’s identify what makes the two categories different.
Let and be finite probability spaces. In both categories, a map involves a measure-preserving map . In , it’s exactly that, but in , it also involves a probability distribution on each fibre , satisfying a compatibility condition. That condition determines uniquely when , but not when .
Let’s say that a -algebra in is special (for want of a better word) if whenever we have and maps
in satisfying the condition that
(), then
This basically says that when you multiply a map by zero, the result doesn’t depend on . ‘Multiplying by zero’ doesn’t make sense in this world of convex combinations, but that’s the idea. For this reason, is special and is not.
Another special -algebra is the monoid , since if and with whenever , then .
You can easily force a non-special -algebra to be special, by identifying all the parallel pairs of maps of the kind that appear in the last displayed equation. In other words, the inclusion
has a left adjoint. (When I speak of -algebras, I will always mean algebras in .) And when this left adjoint is applied to , the result is .
This adjointness immediately tells us that
So by Theorem P,
strict maps of -algebras correspond to nonnegative scalar multiples of Shannon entropy.
Or explicitly:
Theorem P Let be a function from the set of maps in to the set , satisfying the following conditions:
- Functoriality: for all composable maps , in
- Convex combinations: for all maps , in and all
- Continuity: is continuous.
Then there is some such that for all maps in ,
Conversely, for any , the function defined by this formula satisfies the conditions above.
The theorem is now in a very widely understandable form: no operads, no funny categories.
From probability spaces to measure spaces
In some sense there’s not much difference between a measure space (at least, a finite one) and a probability space: unless the measure space has total measure zero, which is pretty boring, you can always scale it so that it’s a probability space. Nonetheless, there are some advantages and disadvantages to each. In some contexts, one seems more natural than the other.
Both Theorems P and P, concerning probability spaces, have analogues for measure spaces. I’ll explain them now.
To switch from probability to measure spaces, we switch operads.
The operad will be replaced by an operad I’ll call . First note that for any monoid , there’s an operad whose set of -ary operations is , and with multiplication given by
for and . Let be the operad resulting in this way from the multiplicative monoid .
A -algebra in consists of a set together with a map
for each , satisfying axioms. We might write this as
In fact, an -algebra in amounts to a commutative monoid equipped with an action by the multiplicative monoid , acting by monoid homomorphisms. So we do have
(which is what “acting by monoid homomorphisms” means), but we don’t have
There’s no chance of having either of these, first because the definition of the operad doesn’t mention the additive structure on , and second because these are equations of the type that an operad can’t encode: variables are duplicated or disappear from one side of the equation to the other. So, an -algebra is quite like a module over the rig , but not entirely.
From now on, I’ll regard as a topological operad (in the obvious way), and ‘algebras’ for it will always be (strict or weak) algebras in , as for .
There’s an obvious inclusion of operads . This induces a functor
which for general reasons has a left adjoint. Write for the image under this left adjoint of . Explicitly:
- an object of is a pair where is a finite set and is a measure on (with all subsets regarded as measurable), or equivalently an element of
- a map in consists of a map of sets together with a probability measure on the fibre for each , such that for each and .
As for , this category is almost the same as the category whose objects are finite sets equipped with a measure, and whose maps are the measure-preserving maps. Indeed, when each is nonzero, the maps in are exactly the measure-preserving maps. This resemblance to is the reason for the name . (It is not the free -algebra containing an internal -algebra.)
Once you start thinking about it, it’s pretty obvious what the -algebra structure of must be: it’s the only possible thing. I won’t write it out.
The additive monoid is naturally an -algebra, by taking linear combinations. Its underlying -algebra structure is the one we met before. So by adjointness,
But Theorem P tells us exactly what the right-hand side is. Hence
strict maps of -algebras correspond to nonnegative scalar multiples of Shannon entropy.
Again, the precise form of the correspondence follows from the details of what’s gone before. (Of course, I’m not imagining that you’re diligently writing out those details, but I am imagining that you’re imagining their existence.) When you unwind it all, you get the following explicit result. It refers to the Shannon entropy of an arbitrary measure space , where is a finite set. This is, by definition,
where .
Theorem M Let be a function from the set of maps in to the set , satisfying the following conditions:
- Functoriality: for all composable maps , in
- Additivity: for all maps , in
- Homogeneity: for all maps in and all
- Continuity: is continuous.
Then there is some such that for all maps in ,
Conversely, for any , the function defined by this formula satisfies the conditions above.
If you’re still reading, maybe you can guess what the fourth and final theorem will be. It is in fact the main theorem of the (first) paper that John, Tobias and I are writing, and it involves rather than this funny category .
So, imitating what we did for , let’s say that an -algebra is special if for all maps
in , the maps are equal. The inclusion
has a left adjoint, which sends the non-special algebra to the special algebra .
The -algebra is also special. So by adjointness,
which by Theorem M gives
strict maps of -algebras correspond to nonnegative scalar multiples of Shannon entropy.
Explicitly:
Theorem M Let be a function from the set of maps in to the set , satisfying the following conditions:
- Functoriality: for all composable maps , in
- Additivity: for all maps , in
- Homogeneity: for all maps in and all
- Continuity: is continuous.
Then there is some such that for all maps in ,
Conversely, for any , the function defined by this formula satisfies the conditions above.
Tying it all together
We began with the operad , and the fact that the internal -algebras in the -algebra correspond to the scalar multiples of Shannon entropy. This is Fadeev’s theorem from the 1950s, dressed up.
From there, we proved four closely related theorems: P, P, M and M. The purpose of this last section is to give an overview of how they’re related.
We have a commutative square of forgetful functors
(the one down the right-hand edge having been induced by the inclusion of operads ). As always, refers to algebras in .
Each of these forgetful functors has a left adjoint, giving another commutative square
There is a canonical object of , namely, the free -algebra containing an internal -algebra. Applying to the functors shown in this last square gives:
The universal property of tells us that the maps from it to are the internal -algebras in , which by Fadeev’s theorem are the scalar multiples of Shannon entropy. This is “Theorem P”. Then adjointness gives us three more theorems for free:
From this point of view, Theorem P (concerning ) is the fundamental theorem from which the rest follow. Theorem M (concerning , and included in our paper) is, at the other extreme, the height of refinement. Nevertheless, it can be stated in a totally direct and explicit way.
Re: An Operadic Introduction to Entropy
[in an earlier version of this post Tom had asked how to typeset “mapsto”-arrows that point in various directions, my reply below refers to that]
You need to explicitly code them as unicode characters: typing
produces
↤ ↥ ↦ or ↧