An Operadic Introduction to Entropy
Posted by Tom Leinster
Bless British trains. A two-hour delay with nothing to occupy me provided the perfect opportunity to figure out the relationships between some of the results that John, Tobias and I have come up with recently.
This post is intended to serve two purposes. First, for those who have been following, it ties together several of the theorems we’ve found. I hope it will make them seem less like a bunch of distinct but vaguely similar results and more like a coherent whole.
Second, for those who haven’t been following, it’s an introduction to entropy, and our recent results on it, for the categorically minded.
I will not assume:
 that you know anything whatsoever about entropy
 that you’ve read any other posts on this blog.
I will assume:
 that you know the definitions of operad and algebra for an operad
 that you know a bit of category theory, including roughly what category theorists mean when they use the word lax.
Operads, algebras and internal algebras
Let $P$ be an operad in $Set$. I’ll keep it open as to whether $P$ is symmetric or not. It makes sense to talk about $P$-algebras in any category with finite products, and in particular in $Cat$. But $Cat$ is actually a 2-category, so there are further variations available to us.
For a start, we can consider weak (or pseudo) $P$-algebras, where the axioms only hold up to coherent isomorphism. In what follows, I won’t be scrupulously careful to distinguish between strict and weak $P$-algebras. That’s much the same kind of carelessness as when we pretend that a monoidal category is strict.
But more importantly, we can vary the flavour of maps between $P$-algebras. In particular, we can consider lax maps. Given $P$-algebras $B$ and $A$ in $Cat$, a lax map $B \to A$ is a functor $F: B \to A$ together with a natural transformation
$\begin{matrix} P_n \times B^n &\stackrel{1 \times F^n}{\to} &P_n \times A^n \\ \downarrow &\Leftarrow &\downarrow \\ B &\stackrel{F}{\to} &A \end{matrix}$
for each $n \in \mathbb{N}$, satisfying coherence conditions.
Now let $A$ be a $P$-algebra in $Cat$. An internal $P$-algebra in $A$ is a lax map $1 \to A$ of $P$-algebras, where $1$ is the terminal $P$-algebra in $Cat$. (Previously, I’ve sometimes called this a ‘lax point’ of $A$. The internal algebra terminology is, I think, due to Michael Batanin.) The examples that follow shortly should explain the name.
Explicitly, an internal $P$-algebra in $A$ consists of:
 an object $a \in A$
 for each operation $p \in P_n$, a map $\alpha_p: p(a, \ldots, a) \to a$
satisfying the axioms
 $\alpha_{p \circ (r_1, \ldots, r_n)} = \alpha_p \circ p(\alpha_{r_1}, \ldots, \alpha_{r_n})$ for any operations $p \in P_n$ and $r_i \in P_{k_i}$
 $\alpha_1 = 1_a$, where the $1$ on the left-hand side is the unit of the operad
 if we’re doing symmetric operads, $\alpha_{\sigma p} = \alpha_p$ whenever $p \in P_n$ and $\sigma \in S_n$.
Examples
 Take the terminal non-symmetric operad $P$. Then a $P$-algebra $A$ in $Cat$ is a monoidal category, and an internal algebra in $A$ is an internal monoid in $A$.
 Fix a monoid $M$, and let $P$ be the non-symmetric operad with no non-unary operations, and whose monoid of unary operations is $M$. I’ll call this the operad for $M$-sets. Then a $P$-algebra in $Cat$ is a category $A$ with a left action by $M$. An internal algebra in $A$ is an object $a$ together with a map $\alpha_m: m\cdot a \to a$ for each $m \in M$, satisfying the evident action-like axioms.
Although I started with an operad $P$ in $Set$, the concept of internal $P$-algebra makes sense for an operad $P$ in any category $\mathcal{E}$ with finite limits. Our $P$-algebras would then be categories internal to $\mathcal{E}$ (and equipped with a $P$-algebra structure).
The only case we’ll need is $\mathcal{E} = Top$, the category of topological spaces. So $P$ will be a topological operad, and our $P$-algebras $A$ will be topological categories, that is, categories internal to $Top$. I’ll write $Cat(Top)$ for the category of topological categories. The explicit description of internal $P$-algebras still holds, except that we add the condition that the assignment
$p \mapsto \alpha_p$
defines a continuous map from $P_n$ to the space of maps in $A$.
If you’re not comfortable with internal categories, all you need to extract from the previous two paragraphs is ‘some continuity conditions will be slapped on’.
I’ll introduce a particularly important topological operad, $\mathbf{P}$. The elements of $\mathbf{P}_n$ are the probability distributions on an $n$-element set:
$\mathbf{P}_n = \{ \mathbf{p} \in \mathbb{R}^n : p_i \geq 0, \sum_i p_i = 1 \}.$
This is the $(n - 1)$-simplex, and is topologized as such.
To define the composition, imagine that you have a distribution $\mathbf{p}$ on $\{1, \ldots, n\}$ and then further distributions $\mathbf{r}_i$ on $\{1, \ldots, k_i\}$ for each $i \in \{1, \ldots, n\}$. From this you can create a new distribution on $\{1, \ldots, k_1 + \cdots + k_n\}$: first choose an element $i \in \{1, \ldots, n\}$ according to $\mathbf{p}$, then choose an element $j \in \{1, \ldots, k_i\}$ according to $\mathbf{r}_i$. Your output element is $k_1 + \cdots + k_{i - 1} + j$. That is,
$\mathbf{p} \circ (\mathbf{r}_1, \ldots, \mathbf{r}_n) = (p_1 r_{1 1}, \ldots, p_1 r_{1 k_1}, \ldots, p_n r_{n 1}, \ldots, p_n r_{n k_n}).$
The unit of the operad $\mathbf{P}$ is, inevitably, the unique probability distribution $(1) \in \mathbf{P}_1$.
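To make the composition concrete, here is a minimal Python sketch. It assumes distributions are represented as plain lists of floats; the function name `compose` is my own choice for illustration, not notation used elsewhere in this post.

```python
from math import isclose

def compose(p, rs):
    """Operadic composite p o (r_1, ..., r_n) of finite probability distributions.

    p is a list of n probabilities and rs is a list of n distributions;
    the result lists p_i * r_ij in the order i = 1..n, then j = 1..k_i.
    """
    if len(p) != len(rs):
        raise ValueError("need exactly one distribution r_i for each entry of p")
    return [p_i * r_ij for p_i, r_i in zip(p, rs) for r_ij in r_i]

p = [0.5, 0.5]
rs = [[0.5, 0.5], [1.0]]
q = compose(p, rs)
print(q)                      # [0.25, 0.25, 0.5]
print(isclose(sum(q), 1.0))   # True: the composite is again a distribution
```

Composing with the unit $(1)$ on either side returns the original distribution, as the unit axiom of an operad requires.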
I’ll also introduce a particularly important $\mathbf{P}$-algebra in $Cat(Top)$.
As a topological category, it’s the additive monoid $\mathbb{R}_+ = [0, \infty)$, regarded as a one-object category. In this post, when $\mathbb{R}_+$ appears as a category, it will always mean this. (The order on $\mathbb{R}_+$ could also be used to make it into a category, but that won’t come into this story.)
The $\mathbf{P}$-algebra structure is trivial on objects, since the category $\mathbb{R}_+$ only has one object. On maps (that is, on real numbers) it’s given by
$\mathbf{p}(x_1, \ldots, x_n) = \sum_i p_i x_i$
($\mathbf{p} \in \mathbf{P}_n$, $x_i \in \mathbb{R}_+$).
Now, what’s an internal $\mathbf{P}$-algebra in $\mathbb{R}_+$?
Well, according to our explicit description, it is:
 an object of the category $\mathbb{R}_+$ — but $\mathbb{R}_+$ only has one object
 an element $\alpha_\mathbf{p} \in \mathbb{R}_+$ for each probability distribution $\mathbf{p} \in \mathbf{P}_n$, which I’ll now write as $\alpha(\mathbf{p})$ for legibility
satisfying the axioms
 $\alpha(\mathbf{p} \circ (\mathbf{r}_1, \ldots, \mathbf{r}_n)) = \alpha(\mathbf{p}) + \sum_i p_i \alpha(\mathbf{r}_i)$
 $\alpha((1)) = 0$
 $\alpha(\sigma \mathbf{p}) = \alpha(\mathbf{p})$ for any permutation $\sigma$ (that is, $\alpha$ is symmetric in its arguments)
 $\alpha: \mathbf{P}_n \to \mathbb{R}_+$ is continuous, for each $n$.
This has now become a very explicit problem: find all the sequences of functions
$\Bigl( \mathbf{P}_n \stackrel{\alpha}{\to} \mathbb{R}_+ \Bigr)$
satisfying these four axioms. Fortunately, there’s already a theorem solving this problem. It was found and proved by Faddeev in the 1950s. To state Faddeev’s theorem, we need the definition of Shannon entropy. For a probability distribution $\mathbf{p} \in \mathbf{P}_n$ on a finite set, the Shannon entropy is
$H(\mathbf{p}) = - \sum_i p_i \ln(p_i)$
where by convention, $0 \ln(0) = 0$. This gives a sequence of functions
$\Bigl( \mathbf{P}_n \stackrel{H}{\to} \mathbb{R}_+ \Bigr).$
It’s easy to verify that $H$ satisfies the four axioms above. And obviously the set of $\alpha$s satisfying those four axioms is closed under multiplication by nonnegative scalars, so $c H$ satisfies them too for any $c \geq 0$. Faddeev proved the converse:
every $\alpha$ satisfying the four axioms above is a nonnegative scalar multiple of Shannon entropy.
In this sense, the internal $\mathbf{P}$-algebras in $\mathbb{R}_+$ ‘are’ the scalar multiples of Shannon entropy.
All the theorems described in the rest of this post will depend on this one.
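The first three axioms can be spot-checked numerically for $H$. The following Python sketch does so on one arbitrarily chosen example; it is an illustration under the assumption that distributions are lists of floats, not a proof, and the helper names are mine.

```python
from math import log, isclose

def shannon_entropy(p):
    """H(p) = -sum_i p_i ln(p_i), using the convention 0 ln(0) = 0."""
    return -sum(x * log(x) for x in p if x > 0)

def compose(p, rs):
    """Operadic composite: the list of all products p_i * r_ij."""
    return [p_i * r_ij for p_i, r_i in zip(p, rs) for r_ij in r_i]

p = [0.2, 0.3, 0.5]
rs = [[0.5, 0.5], [1.0], [0.1, 0.2, 0.7]]

# Axiom 1: H(p o (r_1, ..., r_n)) = H(p) + sum_i p_i H(r_i)
lhs = shannon_entropy(compose(p, rs))
rhs = shannon_entropy(p) + sum(p_i * shannon_entropy(r_i) for p_i, r_i in zip(p, rs))
chain_rule_holds = isclose(lhs, rhs)

# Axiom 2: H((1)) = 0
unit_axiom_holds = shannon_entropy([1.0]) == 0

# Axiom 3: H is unchanged by permuting the entries of p
symmetry_holds = isclose(shannon_entropy(p), shannon_entropy([0.5, 0.2, 0.3]))

print(chain_rule_holds, unit_axiom_holds, symmetry_holds)
```

The fourth axiom, continuity, is not something a finite check can confirm.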
The free $P$-algebra containing an internal $P$-algebra
Fix a symmetric operad $P$, say in $Set$ or $Top$. The free $P$-algebra containing an internal $P$-algebra is a $P$-algebra $D$ in $Cat$ (or $Cat(Top)$) equipped with an internal $P$-algebra $(d, \delta)$ and initial as such.
What does this mean? Well, first of all, $d$ is supposed to denote an object of $D$, and $\delta$ is a family of maps $\delta_p: p(d, \ldots, d) \to d$, one for each operation $p$ of $P$, as in the definition of internal algebra. The initiality means that for any $P$-algebra $A$ and internal $P$-algebra $(a, \alpha)$ in $A$, there is a unique (strict) map $F: D \to A$ of $P$-algebras such that
$F(d) = a, \quad F(\delta_p) = \alpha_p \text{ for all } p.$
A slightly more abstract way to put it is that for all $P$-algebras $A$, there is an isomorphism
$\{\text{strict maps } D \to A\} \cong \{\text{internal } P\text{-algebras in } A\}.$
The two statements are equivalent by the Yoneda Lemma.
This being a universal property, it characterizes $D$ and $(d, \delta)$ uniquely up to isomorphism, if they exist. And they do exist: here’s an explicit description.
 An object of $D$ is a pair $(n, p)$ where $n \in \mathbb{N}$ and $p \in P_n$.
 A map $(m, s) \to (n, p)$ in $D$ consists of a map $f: \{1, \ldots, m\} \to \{1, \ldots, n\}$ of finite sets together with, for each $i \in \{1, \ldots, n\}$, an element $r_i \in P_{f^{-1}(i)}$, such that $s = \tau\cdot (p \circ (r_1, \ldots, r_n))$, where $\tau \in S_m$ is a certain permutation describing the way in which the fibres $f^{-1}(i)$ are interleaved.
 I’ll leave you to figure out what the $P$-action on $D$ must be.
 $d = (1, 1_P)$, and I’ll also leave you to figure out what $\delta$ must be.
A few comments are in order. First, this description might seem a bit fearsome, but it’s not so bad; I’ll give a couple of examples in a moment. Second, if we were doing non-symmetric operads then we’d require $f$ to be order-preserving, and there wouldn’t be a permutation $\tau$. Third, $D$ is only a weak $P$-algebra. It would be strict except for the symmetries. But as I said earlier, I’m going to ignore the difference between weak and strict. Fourth, if we’re doing topological operads then $D$ is a category in $Top$, and I hope you can guess what the topology is.
Examples
 Let $P$ be the terminal non-symmetric operad, so that $P$-algebras in $Cat$ are monoidal categories and internal $P$-algebras are internal monoids. Then $D$ is the monoidal category of finite totally ordered sets (or a skeleton thereof), including the empty set. The ‘generic’ internal monoid $(d, \delta)$ in $D$ has $d = 1$, the one-element totally ordered set. It’s been well known since at least Categories for the Working Mathematician that a monoidal functor from $D$ to another monoidal category $A$ amounts to an internal monoid in $A$.
 Fix a monoid $M$, and let $P$ be the operad for $M$-sets. If you regard $M$ as a category with single object $\star$, then $D$ is the slice category $M/\star$. In other words, the objects of $D$ are the elements of $M$, and a map $s \to p$ in $D$ is an element $r \in M$ such that $s = p r$. If $M$ has cancellation then $D$ is the poset of elements of $M$, ordered by divisibility.
Now for the most important example: the operad $\mathbf{P}$ for probability distributions. An object of $D$ is a finite probability space. I’ll write a typical object as $(X, \mathbf{p})$, where $X$ is a finite set and $\mathbf{p} = (p_i)_{i \in X}$ is a probability distribution on $X$. A map in $D$ is almost just a measure-preserving map. Precisely, a map $(Y, \mathbf{s}) \to (X, \mathbf{p})$ in $D$ consists of a map of sets $f: Y \to X$ together with, for each $i \in X$, a probability distribution $\mathbf{r}_i$ on the fibre $f^{-1}(i)$, such that
$s_j = p_i r_{i j}$
for each $i \in X$ and $j \in f^{-1}(i)$. But if $p_i \neq 0$ for all $i \in X$ then this formula determines $\mathbf{r}_i$ uniquely, and it follows that a map $(Y, \mathbf{s}) \to (X, \mathbf{p})$ amounts to a map of sets $f: Y \to X$ preserving measure, that is, satisfying
$p_i = \sum_{j \in f^{-1}(i)} s_j.$
I’ll write this category $D$ as $\mathbf{FP}$, for two reasons: it’s a category formed freely from the operad $\mathbf{P}$, and it’s almost the same as the category of finite probability spaces.
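Here is a small Python sketch of how the fibre distributions $\mathbf{r}_i$ are recovered from a measure-preserving map, and of where the recovery fails. It assumes finite sets are indexed from 0, a function $f$ is a list of target indices, and the names `pushforward` and `fibre_distributions` are hypothetical helpers introduced only for this illustration.

```python
def pushforward(s, f, n):
    """Push the distribution s on {0..m-1} forward along f: {0..m-1} -> {0..n-1}.

    Returns p with p_i equal to the sum of s_j over the fibre of f above i.
    """
    p = [0.0] * n
    for j, s_j in enumerate(s):
        p[f[j]] += s_j
    return p

def fibre_distributions(s, f, p):
    """Solve s_j = p_i * r_ij for r_i on each fibre of f.

    When p_i == 0 the equation places no constraint on r_i, so None is
    returned for that fibre: this is the extra data a map in FP carries.
    """
    rs = {}
    for i, p_i in enumerate(p):
        if p_i == 0:
            rs[i] = None
        else:
            rs[i] = {j: s[j] / p_i for j in range(len(s)) if f[j] == i}
    return rs

s = [0.25, 0.25, 0.5]
f = [0, 0, 1]
p = pushforward(s, f, 2)
print(p)                             # [0.5, 0.5]
print(fibre_distributions(s, f, p))  # {0: {0: 0.5, 1: 0.5}, 1: {2: 1.0}}
```

If some point of $X$ has probability zero, the corresponding entry comes back as `None`, reflecting the fact that any distribution on that fibre would satisfy the compatibility condition.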
The $\mathbf{P}$-algebra structure on $\mathbf{FP}$ can be described as follows. Let $\mathbf{p} \in \mathbf{P}_n$. When $\mathbf{p}$ is applied to an $n$-tuple of objects
$(X_1, \mathbf{r}_1), \ldots, (X_n, \mathbf{r}_n)$
of $\mathbf{FP}$, the result is the object
$(X_1 + \cdots + X_n, \mathbf{s})$
where $s_j = p_i r_{i j}$ whenever $j \in X_i$. The action of $\mathbf{P}$ on the maps in $\mathbf{FP}$ is defined in a similar way.
The universal property of $\mathbf{FP}$ implies that
strict maps of $\mathbf{P}$-algebras $\mathbf{FP} \to \mathbb{R}_+$ correspond to internal $\mathbf{P}$-algebras in $\mathbb{R}_+$.
But Faddeev’s theorem classifies the internal $\mathbf{P}$-algebras in $\mathbb{R}_+$: they correspond to the nonnegative scalar multiples of Shannon entropy. So this says:
strict maps of $\mathbf{P}$-algebras $\mathbf{FP} \to \mathbb{R}_+$ correspond to nonnegative scalar multiples of Shannon entropy.
How they correspond can be extracted from what I’ve said so far, but let me bring it down to earth by stating it as explicitly as I know how. In it, I’ll denote maps in $\mathbf{FP}$ by Greek letters such as $\phi$, to save me from writing pairs such as $(f, \mathbf{r})$ every time.
Theorem P Let $F$ be a function from the set of maps in $\mathbf{FP}$ to the set $\mathbb{R}_+$, satisfying the following conditions:
 Functoriality: $F(\psi \circ \phi) = F(\psi) + F(\phi)$ for all composable maps $\psi$, $\phi$ in $\mathbf{FP}$
 Convex combinations: $F(\lambda \phi + (1 - \lambda) \psi) = \lambda F(\phi) + (1 - \lambda) F(\psi)$ for all maps $\phi$, $\psi$ in $\mathbf{FP}$ and all $\lambda \in [0, 1]$
 Continuity: $F$ is continuous.
Then there is some $c \geq 0$ such that for all maps $\phi: (Y, \mathbf{s}) \to (X, \mathbf{p})$ in $\mathbf{FP}$,
$F(\phi) = c(H(\mathbf{s}) - H(\mathbf{p})).$
Conversely, for any $c \geq 0$, the function $F$ defined by this formula satisfies the conditions above.
Every part of this theorem comes directly from what we know so far, with the aid of a little routine calculation. For example, the convex combinations axiom (which says that $F$ defines a map of $\mathbf{P}$-algebras) should in principle involve convex combinations of any number of maps, but an easy induction shows that two suffices. The formula $H(\mathbf{s}) - H(\mathbf{p})$ is perhaps a surprise, but again pops out if you work through the details of the correspondence.
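The converse direction of the theorem can be spot-checked numerically. The Python sketch below takes $c = 1$ and restricts to maps whose targets have no zero probabilities, so that a map is determined by the underlying function. A map is represented, purely as an assumption of this sketch, by a triple `(s, f, n)`: the source distribution, the function as a list of target indices, and the size of the target.

```python
from math import log, isclose

def shannon_entropy(p):
    """H(p) = -sum_i p_i ln(p_i), with 0 ln(0) = 0."""
    return -sum(x * log(x) for x in p if x > 0)

def pushforward(s, f, n):
    """Distribution on {0..n-1} obtained by summing s over the fibres of f."""
    p = [0.0] * n
    for j, s_j in enumerate(s):
        p[f[j]] += s_j
    return p

def entropy_loss(s, f, n):
    """F = H(s) - H(p) with c = 1, where p is the pushforward of s along f."""
    return shannon_entropy(s) - shannon_entropy(pushforward(s, f, n))

def convex_combination(lam, phi, psi):
    """lam*phi + (1 - lam)*psi, a map between disjoint unions."""
    (s1, f1, n1), (s2, f2, n2) = phi, psi
    s = [lam * x for x in s1] + [(1 - lam) * x for x in s2]
    f = list(f1) + [n1 + i for i in f2]
    return s, f, n1 + n2

# Functoriality: F(g o f) = F(g) + F(f)
s = [0.1, 0.2, 0.3, 0.4]
f = [0, 0, 1, 2]            # a function from a 4-element set to a 3-element set
g = [0, 1, 1]               # a function from a 3-element set to a 2-element set
p = pushforward(s, f, 3)
gf = [g[i] for i in f]      # the composite function
functorial = isclose(entropy_loss(s, gf, 2),
                     entropy_loss(p, g, 2) + entropy_loss(s, f, 3))

# Convex combinations: F(lam*phi + (1-lam)*psi) = lam*F(phi) + (1-lam)*F(psi)
phi = (s, f, 3)
psi = ([0.5, 0.25, 0.25], [0, 0, 0], 1)
lam = 0.25
convex = isclose(entropy_loss(*convex_combination(lam, phi, psi)),
                 lam * entropy_loss(*phi) + (1 - lam) * entropy_loss(*psi))

print(functorial, convex)
```

Functoriality is just a telescoping sum. The convex combinations condition is less obvious: it works because the term $H(\lambda, 1 - \lambda)$ appears in the entropy of both the source and the target, and cancels.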
I’ve called this Theorem P. Later there will be Theorems P$'$, M and M$'$, each one characterizing entropy in some categorical way. I’ll show you how they all connect up.
The actual category of finite probability spaces
You might not like the fact that Theorem P involves the category $\mathbf{FP}$, which isn’t quite the same as the category $FinProb$ of finite probability spaces. It’s only zero probabilities that prevent it from being the same. But we have to face the fact: it’s different.
So, can we find a Theorem P$'$ with $FinProb$ instead of $\mathbf{FP}$? Yes, we can. To do so, let’s identify what makes the two categories different.
Let $(Y, \mathbf{s})$ and $(X, \mathbf{p})$ be finite probability spaces. In both categories, a map $(Y, \mathbf{s}) \to (X, \mathbf{p})$ involves a measure-preserving map $f: Y \to X$. In $FinProb$, it’s exactly that, but in $\mathbf{FP}$, it also involves a probability distribution $\mathbf{r}_i$ on each fibre $f^{-1}(i)$, satisfying a compatibility condition. That condition determines $\mathbf{r}_i$ uniquely when $p_i \neq 0$, but not when $p_i = 0$.
Let’s say that a $\mathbf{P}$-algebra $A$ in $Cat(Top)$ is special (for want of a better word) if whenever we have $\mathbf{p} \in \mathbf{P}_n$ and maps
$\phi_1, \psi_1: a_1 \to b_1, \quad \ldots, \quad \phi_n, \psi_n: a_n \to b_n$
in $A$ satisfying the condition that
$p_i \neq 0 \quad \Rightarrow \quad \phi_i = \psi_i$
($1 \leq i \leq n$), then
$\mathbf{p}(\phi_1, \ldots, \phi_n) = \mathbf{p}(\psi_1, \ldots, \psi_n).$
This basically says that when you multiply a map $\phi: a \to b$ by zero, the result doesn’t depend on $\phi$. ‘Multiplying by zero’ doesn’t make sense in this world of convex combinations, but that’s the idea. For this reason, $FinProb$ is special and $\mathbf{FP}$ is not.
Another special $\mathbf{P}$-algebra is the monoid $\mathbb{R}_+$, since if $\mathbf{p} \in \mathbf{P}_n$ and $\mathbf{x}, \mathbf{y} \in \mathbb{R}_+^n$ with $x_i = y_i$ whenever $p_i \neq 0$, then $\sum p_i x_i = \sum p_i y_i$.
You can easily force a non-special $\mathbf{P}$-algebra to be special, by identifying all the parallel pairs of maps of the kind that appear in the last displayed equation. In other words, the inclusion
$SpecialAlg(\mathbf{P}) \hookrightarrow Alg(\mathbf{P})$
has a left adjoint. (When I speak of $\mathbf{P}$-algebras, I will always mean algebras in $Cat(Top)$.) And when this left adjoint is applied to $\mathbf{FP}$, the result is $FinProb$.
This adjointness immediately tells us that
$Hom_{Alg(\mathbf{P})}(FinProb, \mathbb{R}_+) \cong Hom_{Alg(\mathbf{P})}(\mathbf{FP}, \mathbb{R}_+).$
So by Theorem P,
strict maps of $\mathbf{P}$-algebras $FinProb \to \mathbb{R}_+$ correspond to nonnegative scalar multiples of Shannon entropy.
Or explicitly:
Theorem P$'$ Let $F$ be a function from the set of maps in $FinProb$ to the set $\mathbb{R}_+$, satisfying the following conditions:
 Functoriality: $F(g \circ f) = F(g) + F(f)$ for all composable maps $g$, $f$ in $FinProb$
 Convex combinations: $F(\lambda f + (1 - \lambda) g) = \lambda F(f) + (1 - \lambda) F(g)$ for all maps $f$, $g$ in $FinProb$ and all $\lambda \in [0, 1]$
 Continuity: $F$ is continuous.
Then there is some $c \geq 0$ such that for all maps $f: (Y, \mathbf{s}) \to (X, \mathbf{p})$ in $FinProb$,
$F(f) = c(H(\mathbf{s}) - H(\mathbf{p})).$
Conversely, for any $c \geq 0$, the function $F$ defined by this formula satisfies the conditions above.
The theorem is now in a very widely understandable form: no operads, no funny categories.
From probability spaces to measure spaces
In some sense there’s not much difference between a measure space (at least, a finite one) and a probability space: unless the measure space has total measure zero, which is pretty boring, you can always scale it so that it’s a probability space. Nonetheless, there are some advantages and disadvantages to each. In some contexts, one seems more natural than the other.
Both Theorems P and P$'$, concerning probability spaces, have analogues for measure spaces. I’ll explain them now.
To switch from probability to measure spaces, we switch operads.
The operad $\mathbf{P}$ will be replaced by an operad I’ll call $\mathbf{M}$. First note that for any monoid $G$, there’s an operad whose set of $n$-ary operations is $G^n$, and with multiplication given by
$\mathbf{g} \circ (\mathbf{h}_1, \ldots, \mathbf{h}_n) = (g_1 h_{1 1}, \ldots, g_1 h_{1 k_1}, \ldots, g_n h_{n 1}, \ldots, g_n h_{n k_n})$
for $\mathbf{g} \in G^n$ and $\mathbf{h}_i \in G^{k_i}$. Let $\mathbf{M}$ be the operad resulting in this way from the multiplicative monoid $[0, \infty)$.
An $\mathbf{M}$-algebra in $Set$ consists of a set $X$ together with a map
$[0, \infty)^n \times X^n \to X$
for each $n$, satisfying axioms. We might write this as
$((\mu_1, \ldots, \mu_n), (x_1, \ldots, x_n)) \mapsto \sum_i \mu_i x_i.$
In fact, an $\mathbf{M}$-algebra $X$ in $Set$ amounts to a commutative monoid equipped with an action by the multiplicative monoid $[0, \infty)$, acting by monoid homomorphisms. So we do have
$\lambda(x + y) = \lambda x + \lambda y, \quad \lambda 0 = 0$
(which is what “acting by monoid homomorphisms” means), but we don’t have
$(\lambda_1 + \lambda_2) x = \lambda_1 x + \lambda_2 x, \quad 0 x = 0.$
There’s no chance of having either of these, first because the definition of the operad $\mathbf{M}$ doesn’t mention the additive structure on $[0, \infty)$, and second because these are equations of the type that an operad can’t encode: variables are duplicated or disappear from one side of the equation to the other. So, an $\mathbf{M}$-algebra is quite like a module over the rig $[0, \infty)$, but not entirely.
From now on, I’ll regard $\mathbf{M}$ as a topological operad (in the obvious way), and ‘algebras’ for it will always be (strict or weak) algebras in $Cat(Top)$, as for $\mathbf{P}$.
There’s an obvious inclusion of operads $\mathbf{P} \hookrightarrow \mathbf{M}$. This induces a functor
$Alg(\mathbf{M}) \to Alg(\mathbf{P})$
which for general reasons has a left adjoint. Write $\mathbf{FM} \in Alg(\mathbf{M})$ for the image under this left adjoint of $\mathbf{FP} \in Alg(\mathbf{P})$. Explicitly:
 an object of $\mathbf{FM}$ is a pair $(X, \mu)$ where $X$ is a finite set and $\mu$ is a measure on $X$ (with all subsets regarded as measurable), or equivalently an element of $[0, \infty)^X$
 a map $(Y, \nu) \to (X, \mu)$ in $\mathbf{FM}$ consists of a map of sets $f: Y \to X$ together with a probability measure $\mathbf{r}_i$ on the fibre $f^{-1}(i)$ for each $i \in X$, such that $\nu_j = \mu_i r_{i j}$ for each $i \in X$ and $j \in f^{-1}(i)$.
As for $\mathbf{FP}$, this category $\mathbf{FM}$ is almost the same as the category $FinMeas$ whose objects are finite sets equipped with a measure, and whose maps are the measure-preserving maps. Indeed, when each $\mu_i$ is nonzero, the maps $(Y, \nu) \to (X, \mu)$ in $\mathbf{FM}$ are exactly the measure-preserving maps. This resemblance to $FinMeas$ is the reason for the name $\mathbf{FM}$. (It is not the free $\mathbf{M}$-algebra containing an internal $\mathbf{M}$-algebra.)
Once you start thinking about it, it’s pretty obvious what the $\mathbf{M}$-algebra structure of $\mathbf{FM}$ must be: it’s the only possible thing. I won’t write it out.
The additive monoid $\mathbb{R}_+$ is naturally an $\mathbf{M}$-algebra, by taking linear combinations. Its underlying $\mathbf{P}$-algebra structure is the one we met before. So by adjointness,
$Hom_{Alg(\mathbf{M})}(\mathbf{FM}, \mathbb{R}_+) \cong Hom_{Alg(\mathbf{P})}(\mathbf{FP}, \mathbb{R}_+).$
But Theorem P tells us exactly what the right-hand side is. Hence
strict maps of $\mathbf{M}$-algebras $\mathbf{FM} \to \mathbb{R}_+$ correspond to nonnegative scalar multiples of Shannon entropy.
Again, the precise form of the correspondence follows from the details of what’s gone before. (Of course, I’m not imagining that you’re diligently writing out those details, but I am imagining that you’re imagining their existence.) When you unwind it all, you get the following explicit result. It refers to the Shannon entropy of an arbitrary measure space $(X, \mu)$, where $X$ is a finite set. This is, by definition,
$H(\mu) = \Vert\mu\Vert\ln\Vert\mu\Vert - \sum_i \mu_i \ln(\mu_i)$
where $\Vert\mu\Vert = \sum_i \mu_i$.
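Two properties of this definition are worth seeing in action: it agrees with the earlier $H$ on probability measures, and it is homogeneous of degree one, so that $H(\mu) = \Vert\mu\Vert \, H(\mu / \Vert\mu\Vert)$ when $\Vert\mu\Vert \neq 0$. The following Python sketch checks both on one example. The function names are my own, and measures are assumed to be lists of nonnegative floats.

```python
from math import log, isclose

def shannon_entropy(p):
    """H(p) = -sum_i p_i ln(p_i) for a probability distribution p."""
    return -sum(x * log(x) for x in p if x > 0)

def measure_entropy(mu):
    """H(mu) = ||mu|| ln ||mu|| - sum_i mu_i ln(mu_i), with 0 ln(0) = 0."""
    total = sum(mu)
    norm_term = total * log(total) if total > 0 else 0.0
    return norm_term - sum(x * log(x) for x in mu if x > 0)

mu = [1.0, 2.0, 3.0]
total = sum(mu)
normalized = [x / total for x in mu]

# On a probability measure, the two definitions coincide.
agrees_on_probabilities = isclose(measure_entropy(normalized),
                                  shannon_entropy(normalized))

# H(mu) = ||mu|| * H(mu / ||mu||)
scales_with_total = isclose(measure_entropy(mu),
                            total * shannon_entropy(normalized))

# Homogeneity: H(lam * mu) = lam * H(mu)
lam = 2.5
homogeneous = isclose(measure_entropy([lam * x for x in mu]),
                      lam * measure_entropy(mu))

print(agrees_on_probabilities, scales_with_total, homogeneous)
```

The term $\Vert\mu\Vert \ln \Vert\mu\Vert$ is exactly what is needed to cancel the $\ln \lambda$ contributions that appear when $\mu$ is rescaled.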
Theorem M Let $F$ be a function from the set of maps in $\mathbf{FM}$ to the set $\mathbb{R}_+$, satisfying the following conditions:
 Functoriality: $F(\psi \circ \phi) = F(\psi) + F(\phi)$ for all composable maps $\psi$, $\phi$ in $\mathbf{FM}$
 Additivity: $F(\phi + \psi) = F(\phi) + F(\psi)$ for all maps $\phi$, $\psi$ in $\mathbf{FM}$
 Homogeneity: $F(\lambda\phi) = \lambda F(\phi)$ for all maps $\phi$ in $\mathbf{FM}$ and all $\lambda \in [0, \infty)$
 Continuity: $F$ is continuous.
Then there is some $c \geq 0$ such that for all maps $\phi: (Y, \nu) \to (X, \mu)$ in $\mathbf{FM}$,
$F(\phi) = c(H(\nu) - H(\mu)).$
Conversely, for any $c \geq 0$, the function $F$ defined by this formula satisfies the conditions above.
If you’re still reading, maybe you can guess what the fourth and final theorem will be. It is in fact the main theorem of the (first) paper that John, Tobias and I are writing, and it involves $FinMeas$ rather than this funny category $\mathbf{FM}$.
So, imitating what we did for $\mathbf{P}$, let’s say that an $\mathbf{M}$-algebra $A$ is special if for all maps
$\phi, \psi: a \to b$
in $A$, the maps $0\phi, 0\psi: 0 a \to 0 b$ are equal. The inclusion
$SpecialAlg(\mathbf{M}) \hookrightarrow Alg(\mathbf{M})$
has a left adjoint, which sends the nonspecial algebra $\mathbf{FM}$ to the special algebra $FinMeas$.
The $\mathbf{M}$-algebra $\mathbb{R}_+$ is also special. So by adjointness,
$Hom_{Alg(\mathbf{M})}(FinMeas, \mathbb{R}_+) \cong Hom_{Alg(\mathbf{M})}(\mathbf{FM}, \mathbb{R}_+)$
which by Theorem M gives
strict maps of $\mathbf{M}$-algebras $FinMeas \to \mathbb{R}_+$ correspond to nonnegative scalar multiples of Shannon entropy.
Explicitly:
Theorem M$'$ Let $F$ be a function from the set of maps in $FinMeas$ to the set $\mathbb{R}_+$, satisfying the following conditions:
 Functoriality: $F(g \circ f) = F(g) + F(f)$ for all composable maps $g$, $f$ in $FinMeas$
 Additivity: $F(f + g) = F(f) + F(g)$ for all maps $f$, $g$ in $FinMeas$
 Homogeneity: $F(\lambda f) = \lambda F(f)$ for all maps $f$ in $FinMeas$ and all $\lambda \in [0, \infty)$
 Continuity: $F$ is continuous.
Then there is some $c \geq 0$ such that for all maps $f: (Y, \nu) \to (X, \mu)$ in $FinMeas$,
$F(f) = c(H(\nu) - H(\mu)).$
Conversely, for any $c \geq 0$, the function $F$ defined by this formula satisfies the conditions above.
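As a sanity check on the converse, here is a Python sketch that tests additivity and homogeneity of $F(f) = H(\nu) - H(\mu)$ (taking $c = 1$) on one pair of measure-preserving maps. As an assumption of this sketch, a map in $FinMeas$ is represented by a triple `(nu, f, n)`: the source measure, the function as a list of target indices, and the size of the target. The helper names are hypothetical.

```python
from math import log, isclose

def measure_entropy(mu):
    """H(mu) = ||mu|| ln ||mu|| - sum_i mu_i ln(mu_i), with 0 ln(0) = 0."""
    total = sum(mu)
    norm_term = total * log(total) if total > 0 else 0.0
    return norm_term - sum(x * log(x) for x in mu if x > 0)

def pushforward(nu, f, n):
    """Measure on {0..n-1} obtained by summing nu over the fibres of f."""
    mu = [0.0] * n
    for j, nu_j in enumerate(nu):
        mu[f[j]] += nu_j
    return mu

def information_loss(nu, f, n):
    """F(f) = H(nu) - H(mu), where mu is the pushforward of nu along f."""
    return measure_entropy(nu) - measure_entropy(pushforward(nu, f, n))

def direct_sum(phi, psi):
    """The map phi + psi between disjoint unions."""
    (nu1, f1, n1), (nu2, f2, n2) = phi, psi
    return list(nu1) + list(nu2), list(f1) + [n1 + i for i in f2], n1 + n2

def scale(lam, phi):
    """The map lam * phi: same function, both measures multiplied by lam."""
    nu, f, n = phi
    return [lam * x for x in nu], f, n

phi = ([1.0, 2.0, 3.0], [0, 0, 1], 2)
psi = ([0.5, 1.5], [0, 0], 1)

additive = isclose(information_loss(*direct_sum(phi, psi)),
                   information_loss(*phi) + information_loss(*psi))
homogeneous = isclose(information_loss(*scale(4.0, phi)),
                      4.0 * information_loss(*phi))

print(additive, homogeneous)
```

Note that $H$ itself is not additive under disjoint union. Additivity of $F$ relies on the maps being measure-preserving: source and target have the same total measure, so the $\Vert\cdot\Vert \ln \Vert\cdot\Vert$ terms cancel in the difference.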
Tying it all together
We began with the operad $\mathbf{P}$, and the fact that the internal $\mathbf{P}$-algebras in the $\mathbf{P}$-algebra $\mathbb{R}_+$ correspond to the scalar multiples of Shannon entropy. This is Faddeev’s theorem from the 1950s, dressed up.
From there, we proved four closely related theorems: P, P$'$, M and M$'$. The purpose of this last section is to give an overview of how they’re related.
We have a commutative square of forgetful functors
$\begin{matrix} SpecialAlg(\mathbf{M}) &\hookrightarrow &Alg(\mathbf{M}) \\ \downarrow & &\downarrow \\ SpecialAlg(\mathbf{P}) &\hookrightarrow &Alg(\mathbf{P}) \end{matrix}$
(the one down the right-hand edge having been induced by the inclusion of operads $\mathbf{P} \hookrightarrow \mathbf{M}$). As always, $Alg$ refers to algebras in $Cat(Top)$.
Each of these forgetful functors has a left adjoint, giving another commutative square
$\begin{matrix} SpecialAlg(\mathbf{M}) &\leftarrow &Alg(\mathbf{M}) \\ \uparrow & &\uparrow \\ SpecialAlg(\mathbf{P}) &\leftarrow &Alg(\mathbf{P}). \end{matrix}$
There is a canonical object of $Alg(\mathbf{P})$, namely, the free $\mathbf{P}$-algebra $\mathbf{FP}$ containing an internal $\mathbf{P}$-algebra. Applying to $\mathbf{FP}$ the functors shown in this last square gives:
$\begin{matrix} FinMeas & ↤ &\mathbf{FM} \\ ↥ & & ↥ \\ FinProb & ↤ &\mathbf{FP}. \end{matrix}$
The universal property of $\mathbf{FP}$ tells us that the maps from it to $\mathbb{R}_+$ are the internal $\mathbf{P}$-algebras in $\mathbb{R}_+$, which by Faddeev’s theorem are the scalar multiples of Shannon entropy. This is “Theorem P”. Then adjointness gives us three more theorems for free:
$\begin{matrix} \text{Theorem M}' & ↤ &\text{Theorem M} \\ ↥ & & ↥ \\ \text{Theorem P}' & ↤ &\text{Theorem P}. \end{matrix}$
From this point of view, Theorem P (concerning $\mathbf{FP}$) is the fundamental theorem from which the rest follow. Theorem M$'$ (concerning $FinMeas$, and included in our paper) is, at the other extreme, the height of refinement. Nevertheless, it can be stated in a totally direct and explicit way.
Re: An Operadic Introduction to Entropy
[in an earlier version of this post Tom had asked how to typeset “mapsto” arrows that point in various directions; my reply below refers to that]
You need to explicitly code them as unicode characters: typing
produces
↤ ↥ ↦ or ↧