Functional Equations II: Shannon Entropy
Posted by Tom Leinster
In the second instalment of the functional equations course that I’m teaching, I introduced Shannon entropy. I also showed that up to a constant factor, it’s uniquely characterized by a functional equation that it satisfies: the chain rule.
Notes for the course so far are here. For a quick summary of today’s session, read on.
You can read the full story in the notes, but here I’ll state the main result as concisely as I can.
For $n \geq 1$, let $\Delta_n$ denote the set of probability distributions $\mathbf{p} = (p_1, \ldots, p_n)$ on $\{1, \ldots, n\}$. The Shannon entropy of $\mathbf{p} \in \Delta_n$ is
$$H(\mathbf{p}) = -\sum_{i=1}^n p_i \log p_i.$$
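As a quick sanity check of the definition, here is a minimal sketch in Python (the function name `shannon_entropy` is my own; natural log is used, and any other fixed base just rescales $H$ by a constant):

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    assert abs(sum(p) - 1.0) < 1e-9, "p must be a probability distribution"
    return -sum(x * math.log(x) for x in p if x > 0)

print(shannon_entropy([0.5, 0.5]))  # log 2: a fair coin flip
print(shannon_entropy([1.0]))       # 0: a certain outcome carries no information
```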
Now, given
$$\mathbf{w} \in \Delta_n, \qquad \mathbf{p}^1 \in \Delta_{k_1}, \ \ldots, \ \mathbf{p}^n \in \Delta_{k_n},$$
we obtain a composite distribution
$$\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n) = (w_1 p^1_1, \ldots, w_1 p^1_{k_1}, \ \ldots, \ w_n p^n_1, \ldots, w_n p^n_{k_n}) \in \Delta_{k_1 + \cdots + k_n}.$$
The chain rule for $H$ states that
$$H\bigl(\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n)\bigr) = H(\mathbf{w}) + \sum_{i=1}^n w_i H(\mathbf{p}^i).$$
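The chain rule is easy to verify numerically. A small sketch (helper names `shannon_entropy` and `compose` are my own): pick a branch $i$ with probability $w_i$, then an outcome within branch $i$ according to $\mathbf{p}^i$, and compare the two sides of the identity.

```python
import math

def shannon_entropy(p):
    """H(p) = -sum_i p_i log p_i, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def compose(w, ps):
    """The composite distribution w ∘ (p^1, ..., p^n): w_i p^i_j, flattened."""
    return [w_i * p_ij for w_i, p in zip(w, ps) for p_ij in p]

w  = [0.5, 0.5]
ps = [[0.25, 0.75], [1/3, 1/3, 1/3]]

lhs = shannon_entropy(compose(w, ps))
rhs = shannon_entropy(w) + sum(w_i * shannon_entropy(p) for w_i, p in zip(w, ps))
print(abs(lhs - rhs) < 1e-9)  # the chain rule holds up to rounding
```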
So, $\bigl(H: \Delta_n \to \mathbb{R}^+\bigr)_{n \geq 1}$ is a sequence of continuous functions satisfying the chain rule. Clearly, the same is true of any nonnegative scalar multiple of $H$.
Theorem (Faddeev, 1956) The only sequences of continuous functions $\bigl(I: \Delta_n \to \mathbb{R}^+\bigr)_{n \geq 1}$ satisfying the chain rule are the scalar multiples of entropy.
One interesting aspect of the proof is where the difficulty lies. Let $\bigl(I: \Delta_n \to \mathbb{R}^+\bigr)$ be continuous functions satisfying the chain rule; we have to show that $I$ is proportional to $H$. All the effort and ingenuity goes into showing that $I$ is proportional to $H$ when restricted to the uniform distributions. In other words, the hard part is to show that there exists a constant $c$ such that
$$I(1/n, \ldots, 1/n) = c\, H(1/n, \ldots, 1/n)$$
for all $n \geq 1$. But once that's done, showing that $I = c H$ is a pushover. The notes show you how!
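On the uniform distributions themselves, $H$ takes a simple closed form: $H(1/n, \ldots, 1/n) = \log n$, since each of the $n$ terms contributes $-(1/n)\log(1/n)$. A quick numerical check (function name `shannon_entropy` is my own shorthand):

```python
import math

def shannon_entropy(p):
    """H(p) = -sum_i p_i log p_i, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

# H on the uniform distribution in Δ_n agrees with log n
for n in [1, 2, 5, 100]:
    u = [1 / n] * n
    print(n, shannon_entropy(u), math.log(n))
```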
Re: Functional Equations II: Shannon Entropy
So who is your audience for this course?