### Functional Equations II: Shannon Entropy

#### Posted by Tom Leinster

In the second instalment of the functional equations course that I’m teaching, I introduced Shannon entropy. I also showed that up to a constant factor, it’s uniquely characterized by a functional equation that it satisfies: the chain rule.

Notes for the course so far are here. For a quick summary of today’s session, read on.

You can read the full story in the notes, but here I’ll state the main result as concisely as I can.

For $n \geq 0$, let $\Delta_n$ denote the set of probability distributions
$\mathbf{p} = (p_1, \ldots, p_n)$ on $\{1, \ldots, n\}$. The **Shannon
entropy** of $\mathbf{p} \in \Delta_n$ is

$H(\mathbf{p}) = - \sum_{i \colon p_i \gt 0} p_i \log p_i.$
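This definition translates directly into code. Here is a minimal sketch (the function name `shannon_entropy` is my own); terms with $p_i = 0$ are skipped, matching the restriction to $p_i \gt 0$ in the sum above:

```python
from math import log

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum of p_i * log(p_i), in natural log units.

    Entries equal to zero contribute nothing and are skipped,
    following the convention 0 * log(0) = 0.
    """
    return -sum(x * log(x) for x in p if x > 0)
```

For instance, the uniform distribution on two points has entropy $\log 2$, and a point mass has entropy $0$.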

Now, given

$\mathbf{w} \in \Delta_n, \,\, \mathbf{p}^1 \in \Delta_{k_1}, \ldots, \mathbf{p}^n \in \Delta_{k_n},$

we obtain a composite distribution

$\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n) = (w_1 p^1_1, \ldots, w_1 p^1_{k_1}, \, \ldots, \, w_n p^n_1, \ldots, w_n p^n_{k_n}).$
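Concretely, each weight $w_i$ gets split up in the proportions given by $\mathbf{p}^i$. A small sketch of this composite operation (the helper name `compose` is my own):

```python
def compose(w, ps):
    """The composite distribution w o (p^1, ..., p^n):
    the i-th weight w_i is distributed in the proportions
    given by the distribution ps[i]."""
    return [w_i * p_ij for w_i, p in zip(w, ps) for p_ij in p]
```

The result is again a probability distribution: its entries are nonnegative and sum to $\sum_i w_i \sum_j p^i_j = 1$.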

The **chain rule** for $H$ states that

$H(\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n)) = H(\mathbf{w}) + \sum_{i = 1}^n w_i H(\mathbf{p}^i).$

So, $(H: \Delta_n \to \mathbb{R}^+)_{n \geq 1}$ is a sequence of continuous functions satisfying the chain rule. Clearly, the same is true of any nonnegative scalar multiple of $H$.
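As a quick numerical sanity check of the chain rule, here is a self-contained sketch (the helper names are my own, not from the notes):

```python
from math import log, isclose

def shannon_entropy(p):
    # H(p) = -sum of p_i * log(p_i), skipping zero entries
    return -sum(x * log(x) for x in p if x > 0)

def compose(w, ps):
    # The composite w o (p^1, ..., p^n)
    return [w_i * p_ij for w_i, p in zip(w, ps) for p_ij in p]

# An arbitrary example: w in Delta_3, p^i in Delta_2, Delta_1, Delta_2
w = [0.5, 0.3, 0.2]
ps = [[0.5, 0.5], [1.0], [0.2, 0.8]]

lhs = shannon_entropy(compose(w, ps))
rhs = shannon_entropy(w) + sum(w_i * shannon_entropy(p) for w_i, p in zip(w, ps))
assert isclose(lhs, rhs)
```

The same check passes for any choice of $\mathbf{w}$ and $\mathbf{p}^i$, as the chain rule guarantees.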

**Theorem** (Faddeev, 1956). The only sequences of continuous functions $(\Delta_n \to \mathbb{R}^+)_{n \geq 1}$ satisfying the chain rule are the scalar multiples of entropy.

One interesting aspect of the proof is where the difficulty lies. Let $(I \colon
\Delta_n \to \mathbb{R}^+)_{n \geq 1}$ be a sequence of continuous functions satisfying the chain
rule; we have to show that $I$ is proportional to $H$. All the effort and
ingenuity goes into showing that $I$ is proportional to $H$ *when
restricted to the uniform distributions*. In other words, the hard part is
to show that there exists a constant $c$ such that

$I(1/n, \ldots, 1/n) = c H(1/n, \ldots, 1/n)$

for all $n \geq 1$. But once *that’s* done, showing that $I(\mathbf{p}) =
c H(\mathbf{p})$ is a pushover. The notes show you how!
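Note that on the uniform distribution, $H(1/n, \ldots, 1/n) = \log n$, so the hard step amounts to showing $I(1/n, \ldots, 1/n) = c \log n$ for all $n$. A quick sketch confirming this value of $H$ (helper name my own):

```python
from math import log, isclose

def shannon_entropy(p):
    # H(p) = -sum of p_i * log(p_i), skipping zero entries
    return -sum(x * log(x) for x in p if x > 0)

# On the uniform distribution u_n = (1/n, ..., 1/n), each of the n terms
# contributes -(1/n) * log(1/n) = (log n)/n, so H(u_n) = log n.
for n in range(1, 10):
    u = [1.0 / n] * n
    assert isclose(shannon_entropy(u), log(n))
```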

## Re: Functional Equations II: Shannon Entropy

So who is your audience for this course?