A Bit More About Entropy
Posted by David Corfield
There’s been a lot of interest in entropy of late around here. I thought I’d record what I’d found since it’s spread over a few posts.
Entropy, we have seen, can provide a measure of information loss under coarse-graining. Given a distribution over the restaurants in a town, if for each restaurant I specify a distribution over the dishes served there, then I can generate a distribution over all instances of restaurant and dish. On the other hand, from such a distribution over all the dishes served in the restaurants of the town, I can coarse-grain to give a distribution over restaurants. What Tom, John and Tobias show is that a sensible positive real-valued measure of what is lost under this coarse-graining is the difference between the entropies of the two distributions.
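To make that concrete, here is a quick numerical sketch with made-up numbers (two restaurants, five dishes): the information lost in coarse-graining from the distribution over dishes back to the distribution over restaurants is the difference of the two entropies, which the chain rule identifies with the expected entropy of the per-restaurant dish distributions.

```python
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Made-up example: a distribution over two restaurants ...
restaurants = {"A": 0.6, "B": 0.4}
# ... and, for each restaurant, a distribution over the dishes it serves.
dishes = {
    "A": {"soup": 0.5, "stew": 0.5},
    "B": {"pie": 0.25, "salad": 0.25, "curry": 0.5},
}

# Fine-grained distribution over (restaurant, dish) pairs.
joint = {(r, d): restaurants[r] * dishes[r][d]
         for r in restaurants for d in dishes[r]}

# Coarse-graining sums out the dishes and recovers the distribution over restaurants.
coarse = {}
for (r, _), pr in joint.items():
    coarse[r] = coarse.get(r, 0.0) + pr

information_loss = entropy(joint) - entropy(coarse)
expected_dish_entropy = sum(restaurants[r] * entropy(dishes[r]) for r in restaurants)

print(information_loss, expected_dish_entropy)  # the two numbers agree
assert abs(information_loss - expected_dish_entropy) < 1e-12
```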
Now the kind of measure-preserving mapping which takes a distribution over restaurants to a distribution over dishes in restaurants has been named by others a congruent embedding by a Markov mapping. They are part of a larger story in which entropy can be situated.
It starts with Čencov in Statistical decision rules and optimal inference (1982). The term congruent embedding comes from the way a measure-preserving map from distributions over restaurants to distributions over dishes in restaurants can be seen as an embedding of the simplex of distributions over $n$ things, $\Delta_n$, into the simplex of distributions over $m$ things, $\Delta_m$.
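For instance (an invented example with two restaurants and four dishes, the first serving its two dishes with equal probability and the second serving its two with probabilities 1/3 and 2/3), the map

$$(p_1, p_2) \;\mapsto\; \left(\tfrac{1}{2}p_1,\ \tfrac{1}{2}p_1,\ \tfrac{1}{3}p_2,\ \tfrac{2}{3}p_2\right)$$

is a congruent embedding of $\Delta_2$ into $\Delta_4$.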
Now Čencov showed that, up to a constant factor, the only metric on the manifolds $\Delta_n$ for which all congruent embeddings are isometries is

$$\langle u, v\rangle_p = \sum_i \frac{u_i v_i}{p_i}.$$
A simple calculation shows that this is equal to the Fisher information metric.
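For completeness, here is that calculation sketched, parametrizing the interior of the simplex by the first $n-1$ coordinates, so that $p_n = 1 - p_1 - \dots - p_{n-1}$. The Fisher information metric of this family is

$$g_{ij} = \sum_{k=1}^{n} p_k\,\frac{\partial \log p_k}{\partial p_i}\,\frac{\partial \log p_k}{\partial p_j} = \sum_{k=1}^{n} \frac{1}{p_k}\,\frac{\partial p_k}{\partial p_i}\,\frac{\partial p_k}{\partial p_j} = \frac{\delta_{ij}}{p_i} + \frac{1}{p_n},$$

using $\partial p_k/\partial p_i = \delta_{ki}$ for $k < n$ and $\partial p_n/\partial p_i = -1$. Contracting with tangent vectors $u$ and $v$, whose $n$-th components are $u_n = -\sum_{i<n} u_i$ and $v_n = -\sum_{i<n} v_i$, gives

$$\sum_{i,j=1}^{n-1} g_{ij}\, u_i v_j = \sum_{i=1}^{n-1} \frac{u_i v_i}{p_i} + \frac{u_n v_n}{p_n} = \sum_{i=1}^{n} \frac{u_i v_i}{p_i},$$

which is Čencov's metric.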
Campbell in An extended Čencov characterization of the information metric then showed that it's worth looking outside $\Delta_n$ to the full cone of positive measures, $\mathbb{R}^n_{>0}$. He extended Čencov's result to the positive cones by showing that metrics giving rise to isometries under congruent embeddings are highly constrained, and include

$$\langle u, v\rangle_x = |x| \sum_i \frac{u_i v_i}{x_i},$$

where $|x| = \sum_i x_i$. This metric has come to be known as the Shahshahani metric for reasons you can discover from Marc Harper's papers discussed on John's blog here.
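Note that on the simplex itself this agrees with Čencov's metric: at a point $p$ with $|p| = 1$,

$$\langle u, v\rangle_p = |p| \sum_i \frac{u_i v_i}{p_i} = \sum_i \frac{u_i v_i}{p_i}.$$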
So, vectors in the tangent plane at a point $p$ of the subspace of probability distributions, $\Delta_n \subset \mathbb{R}^n_{>0}$, have the form $u = \sum_i u_i e_i$, where $\sum_i u_i = 0$ and $\{e_i\}$ is the obvious basis.
The unit normal vector at $p$ for the Shahshahani metric is $n = \sum_i p_i e_i$, since

$$\langle n, n\rangle_p = |p| \sum_i \frac{p_i\, p_i}{p_i} = \sum_i p_i = 1$$

and

$$\langle n, u\rangle_p = |p| \sum_i \frac{p_i\, u_i}{p_i} = \sum_i u_i = 0,$$

using $|p| = 1$.
Now another very natural quantity in this set-up is the invariant vector field $X = -\sum_i x_i \log x_i\, e_i$. I found this after a discussion with Urs on cohomology and characteristic classes. It is invariant under the multiplicative action of $\mathbb{R}^n_{>0}$ on the cone.
An obvious thing to try now is to take the inner product at a point $p$ of $\Delta_n$ of $X$ and $n$. We find

$$\langle X, n\rangle_p = |p| \sum_i \frac{(-p_i \log p_i)\, p_i}{p_i} = -\sum_i p_i \log p_i = S(p),$$

the entropy of the distribution $p$.
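As a numerical sanity check on these formulas as written here (with a randomly generated $p$), a few lines of Python confirm that $n$ has unit length, that it is orthogonal to tangent vectors, and that $\langle X, n\rangle_p$ equals the Shannon entropy of $p$:

```python
import random
from math import log

def shahshahani(u, v, x):
    """Shahshahani inner product <u, v>_x = |x| * sum_i u_i v_i / x_i."""
    return sum(x) * sum(ui * vi / xi for ui, vi, xi in zip(u, v, x))

# A random probability distribution p, i.e. a point of the simplex.
raw = [random.uniform(0.1, 1.0) for _ in range(5)]
p = [r / sum(raw) for r in raw]

normal = p[:]                             # the unit normal n = sum_i p_i e_i at p
X = [-pi * log(pi) for pi in p]           # the vector field X at p (sign convention as above)
u = [1.0, -1.0, 0.0, 0.0, 0.0]            # a tangent vector: its components sum to zero

entropy = -sum(pi * log(pi) for pi in p)  # Shannon entropy of p (natural logarithm)

assert abs(shahshahani(normal, normal, p) - 1.0) < 1e-9  # <n, n>_p = 1
assert abs(shahshahani(normal, u, p)) < 1e-9             # <n, u>_p = 0
assert abs(shahshahani(X, normal, p) - entropy) < 1e-9   # <X, n>_p = S(p)
```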
Relative entropy seems to arise then as though you parallel transport the invariant vector $X(q)$ to $p$, and then compare the projections of it and of $X(p)$ onto the unit normal vector at $p$.
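To spell the heuristic out, suppose (and this is only an assumption about what the transport should do) that the transported copy of $X(q)$ at $p$ has components $-p_i \log q_i$. Then the difference of the two projections onto $n$ is

$$\Big(-\sum_i p_i \log q_i\Big) - \Big(-\sum_i p_i \log p_i\Big) = \sum_i p_i \log\frac{p_i}{q_i} = D(p \,\|\, q),$$

the relative entropy.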
I wonder if from the geometry of the situation we can see why the Fisher-Shahshahani metric emerges as the curvature of the relative entropy $D(p \,\|\, q)$.
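At the level of formulas, this is the familiar second-order expansion: for $u$ tangent to the simplex (so $\sum_i u_i = 0$),

$$D(p + \epsilon u \,\|\, p) = \sum_i (p_i + \epsilon u_i) \log\frac{p_i + \epsilon u_i}{p_i} = \frac{\epsilon^2}{2} \sum_i \frac{u_i^2}{p_i} + O(\epsilon^3),$$

so the Fisher-Shahshahani inner product is exactly the quadratic term of the relative entropy near $p$.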
Re: A Bit More About Entropy
I don’t get your calculation showing that the vector field $X$ is invariant under the multiplicative action of $\mathbb{R}^n_{>0}$. Where does the term involving $\log a_i$ come from? The only point (with positive coordinates) where $X$ vanishes is at $(1, 1, \dots, 1)$. However, this point is not invariant under the $\mathbb{R}^n_{>0}$-action. So how is it possible that $X$ is invariant?