Entropies vs. Means
Posted by Tom Leinster
If you’ve been watching this blog, you can’t help but have noticed the current entropy-fest. It started on John’s blog Azimuth, generated a lengthy new page on John’s patch of the nLab, and led first to this entry at the Café, then to this one.
Things have got pretty unruly. It’s good unruliness, in the same way that brainstorming is good, but in this post I want to do something to help those of us who are confused by the sheer mass of concepts, questions and results—which I suspect is all of us.
I want to describe a particular aspect of the geography of this landscape of ideas. Specifically, I’ll describe some connections between the concepts of entropy and mean.
This can be thought of as background to the project of finding categorical characterizations of entropy. There will be almost no category theory in this post.
I’ll begin by describing the most vague and the most superficial connections between entropy and means. Then I’ll build up to a more substantial connection that appeared in the comments on the first Café post, finishing with a connection that we haven’t seen here before.
Something vague

I’m interested in measures of size. This has occupied a large part of my mathematical life for the last few years. Means aren’t exactly a measure of size, but they almost are: the mean number of cameras owned by a citizen of Cameroon is the size of the set of Cameroonian cameras, divided by the size of the population. So I naturally got interested in means: see these two posts, for instance. On the other hand, entropy is also a kind of size measure, as I argued in these two posts. So the two concepts were already somewhat connected in my mind.
Something superficial

All I want to say here is: look at the definitions! Just look at them!
So, I’d better give you these definitions.
Basic definitions

I’ll write

$$\Delta_n = \Bigl\{ (p_1, \ldots, p_n) : p_i \geq 0, \; \sum_{i=1}^n p_i = 1 \Bigr\}$$

(which previously I’ve written as ). For each $t \in [-\infty, \infty]$, the power mean of order $t$ is the function

$$M_t : \Delta_n \times [0, \infty)^n \to [0, \infty)$$

defined for $t \neq 0, \pm\infty$ by

$$M_t(p, x) = \Bigl( \sum_{i : p_i > 0} p_i x_i^t \Bigr)^{1/t}.$$

Think of this as an average of $x = (x_1, \ldots, x_n)$, weighted by $p$. The three exceptional values of $t$ are handled by taking limits:

$$M_0(p, x) = \prod x_i^{p_i}, \qquad M_{-\infty}(p, x) = \min x_i, \qquad M_{\infty}(p, x) = \max x_i.$$

The minimum, product and maximum are, like the sum, taken over all $i$ such that $p_i > 0$. I’ll generally assume that $t \neq 0, \pm\infty$; these cases never cause trouble. So: the only definition you need to pay attention to is the one for generic $t$.
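Since the limiting cases can be fiddly, here is a minimal numerical sketch of the definition above (Python with NumPy; the function name `power_mean` is mine, not anything standard):

```python
import numpy as np

def power_mean(p, x, t):
    """Power mean M_t(p, x) = (sum_i p_i x_i^t)^(1/t) for generic t,
    with t = 0, +inf, -inf handled as the limiting cases."""
    p, x = np.asarray(p, float), np.asarray(x, float)
    keep = p > 0                        # min, product, max run over i with p_i > 0
    p, x = p[keep], x[keep]
    if t == 0:
        return float(np.prod(x ** p))   # weighted geometric mean
    if t == np.inf:
        return float(x.max())
    if t == -np.inf:
        return float(x.min())
    return float((p @ x ** t) ** (1 / t))

# Sanity checks: M_1 is the arithmetic mean, M_{-1} the harmonic mean,
# and M_t is (weakly) increasing in t.
p, x = [0.2, 0.3, 0.5], [1.0, 2.0, 4.0]
assert np.isclose(power_mean(p, x, 1), 2.8)
assert np.isclose(power_mean(p, x, -1), 1 / 0.475)
ms = [power_mean(p, x, t) for t in (-np.inf, -1, 0, 1, 2, np.inf)]
assert all(a <= b + 1e-12 for a, b in zip(ms, ms[1:]))
```

The final assertion checks the classical fact that $M_t(p, x)$ increases with $t$.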
Now for a definition of entropy… almost. Actually, I’m going to work with the closely related notion of diversity. For $q \in [-\infty, \infty]$, the diversity of order $q$ is the map

$$D_q : \Delta_n \to [1, \infty)$$

defined by

$$D_q(p) = \Bigl( \sum_{i : p_i > 0} p_i^q \Bigr)^{1/(1-q)}$$

for $q \neq 1, \pm\infty$, and again by taking limits in the exceptional cases:

$$D_1(p) = \prod p_i^{-p_i}, \qquad D_{-\infty}(p) = \frac{1}{\min p_i}, \qquad D_{\infty}(p) = \frac{1}{\max p_i},$$

where again the min, product and max are over all $i$ such that $p_i > 0$.
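Again, a small sketch may help fix the definition and its limits (same hedges as before: NumPy, my own function name; $D_1$ is the exponential of Shannon entropy):

```python
import numpy as np

def diversity(p, q):
    """Diversity (Hill number) D_q(p) = (sum_i p_i^q)^(1/(1-q)) for generic q,
    with q = 1, +inf, -inf handled as the limiting cases."""
    p = np.asarray(p, float)
    p = p[p > 0]                  # sums, min and max run over i with p_i > 0
    if q == 1:
        return float(np.exp(-(p @ np.log(p))))   # exp of Shannon entropy
    if q == np.inf:
        return float(1 / p.max())
    if q == -np.inf:
        return float(1 / p.min())
    return float(np.sum(p ** q) ** (1 / (1 - q)))

# A skewed community: one dominant species, three rare ones.
p = [0.85, 0.05, 0.05, 0.05]
for q in (0, 1, 2, np.inf):
    print(q, diversity(p, q))     # decreases in q: large q discounts rare species
# q = 0 counts the species present (4.0); q = inf sees only the dominant one (~1.18).
```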
The name ‘diversity’ originates from an ecological application. We think of $p \in \Delta_n$ as representing a community of $n$ species in proportions $p_1, \ldots, p_n$, and $D_q(p)$ as a measure of that community’s biodiversity. Different values of the parameter $q$ represent different opinions on how much importance should be assigned to rare or common species. (Newspaper stories on biodiversity typically focus on threats to rare species, but the balance of common species is also important for the healthy functioning of an ecosystem as a whole.) Theoretical ecologists often call $D_q(p)$ the Hill number of order $q$.
Now, many of you know $D_q$ not as ‘diversity’, but as Rényi extropy. I’d like to advocate the name ‘diversity’.
First, diversity is a fundamental concept and deserves a simple name. It’s much more general than just something from ecology: it applies whenever you have a collection of things divided into classes.
Second, ‘Rényi extropy’ is a terribly off-putting name. It assumes you already understand entropy (itself a significant task), then that you understand Rényi entropy (whose meaning you couldn’t possibly guess since it’s named after a person), and then that you’re familiar with the half-jokey usage of ‘extropy’ to mean the exponential of entropy. In contrast, diversity is something that can be understood directly, without knowing about entropy of any kind.
An enormously important property of diversity is that it is an effective number. This means that the value it assigns to the uniform distribution on a set is the cardinality of that set:

$$D_q(1/n, \ldots, 1/n) = n.$$
This is what distinguishes diversity from the various other functions of $\sum_i p_i^q$ that get used (e.g. Rényi entropy and the entropy variously named after Havrda, Charvát, Daróczy, Patil, Taillie and Tsallis). I recently gave a little explanation of why effective numbers are so important, and I gave a different explanation (using terminology differently) in this post on entropy, diversity and cardinality.
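Here is a two-line check of the effective-number property (same Python setup; for brevity this version handles only generic $q \neq 1$):

```python
import numpy as np

def diversity(p, q):     # generic-q case only (q != 1), as sketched above
    p = np.asarray(p, float)
    return float(np.sum(p ** q) ** (1 / (1 - q)))

# Effective number: on a uniform distribution, D_q returns the cardinality,
# whatever the order q.
for n in (2, 5, 100):
    uniform = np.full(n, 1 / n)
    for q in (0, 0.5, 2, 10):
        assert np.isclose(diversity(uniform, q), n)
```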
Something superficial, continued

Let me now go back to my superficial reason for thinking that means will be useful in the study of entropy and diversity. I declared: just look at the formulas! There’s an obvious resemblance. And in particular, look what happens in the definition of power mean when you put $x = p$ and $t = q - 1$:

$$M_{q-1}(p, p) = \Bigl( \sum_{i : p_i > 0} p_i^q \Bigr)^{1/(q-1)} = D_q(p)^{-1}.$$
This reminds me of some other things. To study quadratic forms $x \mapsto B(x, x)$, it’s really helpful to study the associated bilinear forms $(x, y) \mapsto B(x, y)$. Or, similarly, you’ll often be able to prove more about a Banach space if you know it’s a Hilbert space.
Moreover, there are reasons for thinking that something quite significant is going on in the step ‘put $x = p$’. I suspect that fundamentally, $x$ is a function on $\{1, \ldots, n\}$, but $p$ is a measure. By equating them we’re really taking advantage of the finiteness of our sets. For more general sets or spaces, we might need to keep $x$ and $p$ separate.
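For what it’s worth, the identity $D_q(p) = 1/M_{q-1}(p, p)$ is easy to test numerically (reusing the generic-case sketches above):

```python
import numpy as np

def power_mean(p, x, t):        # generic-t case, as above
    p, x = np.asarray(p, float), np.asarray(x, float)
    return float((p @ x ** t) ** (1 / t))

def diversity(p, q):            # generic-q case, as above
    p = np.asarray(p, float)
    return float(np.sum(p ** q) ** (1 / (1 - q)))

# D_q(p) = 1 / M_{q-1}(p, p): diversity is a reciprocal power mean of p,
# weighted by p itself.
rng = np.random.default_rng(0)
p = rng.random(6); p /= p.sum()
for q in (0.5, 2, 3):
    assert np.isclose(diversity(p, q), 1 / power_mean(p, p, q - 1))
```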
Something substantial

To explain this more substantial connection between diversity and means, I first need to explain how the simplices $\Delta_1, \Delta_2, \ldots$ form an operad $\Delta$.
If you know what an operad is, it’s enough for me to tell you that any convex subset $C$ of $\mathbb{R}^m$ is naturally a $\Delta$-algebra via the action

$$\Delta_n \times C^n \to C, \qquad (p, x_1, \ldots, x_n) \mapsto \sum_i p_i x_i$$

($n \in \mathbb{N}$). That should enable you to work out what the composition in $\Delta$ must be.
If you don’t know what an operad is, all you need to know is the following. An operad structure on the sequence $(\Delta_n)_{n \geq 1}$ of sets consists of a choice of map

$$\Delta_n \times \Delta_{k_1} \times \cdots \times \Delta_{k_n} \to \Delta_{k_1 + \cdots + k_n}$$

for each $n, k_1, \ldots, k_n$, satisfying some axioms. The map is written

$$(p, p^1, \ldots, p^n) \mapsto p \circ (p^1, \ldots, p^n)$$

and called composition. The particular operad structure that I have in mind has its composition defined by

$$p \circ (p^1, \ldots, p^n) = (p_1 p^1_1, \ldots, p_1 p^1_{k_1}, \; \ldots \; , p_n p^n_1, \ldots, p_n p^n_{k_n}).$$

So the composite $p \circ (p^1, \ldots, p^n)$ is obtained by putting the probability distributions $p^1, \ldots, p^n$ side by side, weighting them by $p_1, \ldots, p_n$ respectively.
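In code, the composition is just a weighted concatenation. A sketch (NumPy again; `compose` is my name for it):

```python
import numpy as np

def compose(p, ps):
    """Operadic composition: weight the i-th distribution ps[i] by p[i]
    and put the results side by side."""
    return np.concatenate([w * np.asarray(pi, float) for w, pi in zip(p, ps)])

p  = [0.5, 0.5]
p1 = [0.2, 0.8]            # a distribution on 2 elements
p2 = [1/3, 1/3, 1/3]       # a distribution on 3 elements
comp = compose(p, [p1, p2])
print(comp)                # [0.1, 0.4, 1/6, 1/6, 1/6]: a distribution on 5 elements
assert np.isclose(comp.sum(), 1.0)
```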
Here’s the formula for the diversity of a composite:

$$D_q(p \circ (p^1, \ldots, p^n)) = \Bigl( \sum_{i : p_i > 0} p_i^q \, D_q(p^i)^{1-q} \Bigr)^{1/(1-q)}.$$

Notice that the diversity of a composite depends only on $p$ and the diversities $D_q(p^1), \ldots, D_q(p^n)$, not on the distributions $p^1, \ldots, p^n$ themselves. Pushing that thought, you might hope that it wouldn’t depend on $p$ itself, only on its diversity $D_q(p)$; but it’s not to be.
(Here I’m assuming that $q \neq 1, \pm\infty$. I’ll let you work out those cases, or you can find them here. And you should take what I say about the case $q < 0$ with a pinch of salt; I haven’t paid much attention to it.)
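To see the formula in action, here is a numerical check, reassembling the earlier sketches (generic $q$ only):

```python
import numpy as np

def diversity(p, q):            # generic-q case, as above
    p = np.asarray(p, float); p = p[p > 0]
    return float(np.sum(p ** q) ** (1 / (1 - q)))

def compose(p, ps):             # operadic composition, as above
    return np.concatenate([w * np.asarray(pi, float) for w, pi in zip(p, ps)])

p  = np.array([0.3, 0.7])
ps = [np.array([0.2, 0.8]), np.array([0.5, 0.25, 0.25])]
q  = 2

lhs = diversity(compose(p, ps), q)
rhs = np.sum(p**q * np.array([diversity(pi, q) for pi in ps])**(1 - q)) ** (1 / (1 - q))
assert np.isclose(lhs, rhs)
```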
Digressing briefly, this expression can be written as a mean:

$$D_q(p \circ (p^1, \ldots, p^n)) = M_{1-q}(p, d/p),$$

where $d/p$ is the vector with $i$th component $D_q(p^i)/p_i$. I call this a digression because I don’t know whether this is a useful observation. It’s a different connection between the diversity of a composite and means that I want to point out here.
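The digression, too, survives a numerical test (same hedged sketches as above):

```python
import numpy as np

# Checking the digression: with d_i = D_q(p^i)/p_i, the composite diversity
# equals the power mean M_{1-q}(p, d). Generic-case helpers as above.
def diversity(p, q):
    p = np.asarray(p, float); p = p[p > 0]
    return float(np.sum(p ** q) ** (1 / (1 - q)))

def power_mean(p, x, t):
    p, x = np.asarray(p, float), np.asarray(x, float)
    return float((p @ x ** t) ** (1 / t))

def compose(p, ps):
    return np.concatenate([w * np.asarray(pi, float) for w, pi in zip(p, ps)])

p  = np.array([0.3, 0.7])
ps = [np.array([0.2, 0.8]), np.array([0.5, 0.25, 0.25])]
q  = 2
d  = np.array([diversity(pi, q) for pi in ps]) / p
assert np.isclose(diversity(compose(p, ps), q), power_mean(p, d, 1 - q))
```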
To explain that connection, I need a couple more bits of terminology. The partition function of a probability distribution $p \in \Delta_n$ is the function

$$Z_p : \mathbb{R} \to (0, \infty)$$

defined by

$$Z_p(q) = \sum_{i : p_i > 0} p_i^q.$$

Any probability distribution $p$ belongs to a one-parameter family of probability distributions $p^{(q)}$, defined by

$$p^{(q)}_i = \frac{p_i^q}{Z_p(q)}$$

where $q \in \mathbb{R}$. These are sometimes called the escort distributions of $p$. (In particular, $p^{(1)} = p$, so there’s something especially convenient about the case $q = 1$.)
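In code (same hedges; `partition` and `escort` are my names for these):

```python
import numpy as np

def partition(p, q):
    """Partition function Z_p(q): sum of p_i^q over the support of p."""
    p = np.asarray(p, float)
    return float(np.sum(p[p > 0] ** q))

def escort(p, q):
    """Escort distribution p^(q), with i-th entry p_i^q / Z_p(q).
    (Assumes p has full support; entries p_i = 0 need care when q <= 0.)"""
    p = np.asarray(p, float)
    return p ** q / partition(p, q)

p = np.array([0.7, 0.2, 0.1])
assert np.allclose(escort(p, 1), p)   # q = 1 recovers p itself
print(escort(p, 0))                   # q = 0: uniform on the support
print(escort(p, 5))                   # large q: concentrates on the mode
```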
A small amount of elementary algebra tells us that the diversity of a composite can be re-expressed as follows:

$$D_q(p \circ (p^1, \ldots, p^n)) = D_q(p) \cdot M_{1-q}\bigl( p^{(q)}, (D_q(p^1), \ldots, D_q(p^n)) \bigr).$$
This is the connection I’ve been building up to: the diversity of a composite expressed in terms of a power mean.
To understand this further, think of a large ecological community spread over several islands, with the special feature that no species can be found on more than one island. The distribution $p$ gives the relative sizes of the total populations on the different islands, and the distribution $p^i$ gives the relative abundances of the various species on the $i$th island.
Now, the formula tells us the diversity of the composite community in terms of the diversities of the islands and their relative sizes. More exactly, it expresses it as a product of two factors: the diversity between the islands ($D_q(p)$), and the average diversity within the islands ($M_{1-q}(p^{(q)}, (D_q(p^1), \ldots, D_q(p^n)))$).
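Putting the sketches together, we can verify the factorization numerically for a few values of $q$ (a toy check of the displayed identity, not anything from the literature):

```python
import numpy as np

# Generic-case helpers, as in the sketches above.
def diversity(p, q):
    p = np.asarray(p, float); p = p[p > 0]
    return float(np.sum(p ** q) ** (1 / (1 - q)))

def power_mean(p, x, t):
    p, x = np.asarray(p, float), np.asarray(x, float)
    return float((p @ x ** t) ** (1 / t))

def escort(p, q):
    p = np.asarray(p, float)
    return p ** q / np.sum(p ** q)

def compose(p, ps):
    return np.concatenate([w * np.asarray(pi, float) for w, pi in zip(p, ps)])

# Two 'islands' with disjoint species, relative sizes p.
p  = np.array([0.3, 0.7])
ps = [np.array([0.2, 0.8]), np.array([0.5, 0.25, 0.25])]
for q in (0.5, 2, 3):
    between = diversity(p, q)                     # diversity across islands
    within  = power_mean(escort(p, q),
                         np.array([diversity(pi, q) for pi in ps]), 1 - q)
    assert np.isclose(diversity(compose(p, ps), q), between * within)
```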
Something new

…where ‘new’ is in the sense of ‘new to this conversation’, not ‘new to the world’.
We’ve just seen how, for each real number $q$, the diversity of a composite decomposes as a product of two factors. The first factor is the diversity of $p$. The second is some kind of mean of the diversities of the $p^i$s, weighted by a distribution depending on $p$.
We know this because we have a formula for $D_q$. But what if we take the description in the previous paragraph as axiomatic? In other words, suppose that we have for each $n$ functions

$$D : \Delta_n \to [1, \infty), \qquad \xi : \Delta_n \to \Delta_n,$$

and some kind of ‘mean operation’ $M$, satisfying

$$D(p \circ (p^1, \ldots, p^n)) = D(p) \cdot M\bigl( \xi(p), (D(p^1), \ldots, D(p^n)) \bigr).$$

What does this tell us about $D$, $\xi$ and $M$? Could it even be that it forces $D = D_q$, $\xi(p) = p^{(q)}$ and $M = M_{1-q}$ for some $q$?
Well, it depends what you mean by ‘mean’. But that’s a subject that’s been well raked over, and there are several axiomatic characterizations of the power means out there. So let me skip that part of the question and assume immediately that $M = M_t$ for some $t \in \mathbb{R}$.
So now we’ve decided what our mean operation is, but we still have an undetermined thing $D$ called ‘diversity’ and an undetermined operation $\xi$ for turning one probability distribution into another. All we have by way of constraints is the equation above for the diversity of a composite, and perhaps we’ll also allow ourselves some further basic assumptions on diversity, such as continuity.
The theorem is that these meagre assumptions are enough to determine diversity uniquely.
Theorem (Routledge)  Let $t \in \mathbb{R}$. Let

$$\bigl( D : \Delta_n \to [1, \infty) \bigr)_{n \geq 1}, \qquad \bigl( \xi : \Delta_n \to \Delta_n \bigr)_{n \geq 1}$$

be families of functions such that

- $D$ is an effective number
- $D$ is symmetric
- $D$ is continuous
- $D(p \circ (p^1, \ldots, p^n)) = D(p) \cdot M_t\bigl( \xi(p), (D(p^1), \ldots, D(p^n)) \bigr)$ for all $p, p^1, \ldots, p^n$.

Then $D = D_{1-t}$ and $\xi(p) = p^{(1-t)}$.
This result appeared in
R. D. Routledge, Diversity indices: which ones are admissible? Journal of Theoretical Biology 76 (1979), 503–515.
And the moral is: diversity, hence entropy, can be uniquely characterized using means.
Postscript

This theorem is closer to the basic concerns of ecology than you might imagine. When a geographical area is divided into several zones, you can ask how much of the biological diversity of the area should be attributed to variation between the zones, and how much to variation within the zones. This is very like our island scenario above, but more complicated, since the same species may be present in multiple zones.
Ecologists talk about $\alpha$-diversity (the average within-zone diversity), $\beta$-diversity (the diversity between the zones), and $\gamma$-diversity (the global diversity, i.e. that of the whole community). The concept of $\beta$-diversity can play a part in conservation decisions. For example, if the $\beta$-diversity of our area is perceived or measured to be low, that means that some of the zones are quite similar to each other. In that case, it might not be important to conserve all of them: resources can be concentrated on just a few.
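In the disjoint-species island model above, the factorization we verified is exactly $\gamma = \beta \cdot \alpha$: global diversity is between-zone diversity times the (escort-weighted) average within-zone diversity. A toy computation, my own illustration rather than anything from Jost’s paper, which treats the harder general case where zones share species:

```python
import numpy as np

# Illustrative sketch only: in the disjoint-species island model,
# gamma = beta * alpha, with beta = D_q(p) and alpha the escort-weighted
# power mean of within-zone diversities.
def diversity(p, q):
    p = np.asarray(p, float); p = p[p > 0]
    return float(np.sum(p ** q) ** (1 / (1 - q)))

def power_mean(p, x, t):
    p, x = np.asarray(p, float), np.asarray(x, float)
    return float((p @ x ** t) ** (1 / t))

def compose(p, ps):
    return np.concatenate([w * np.asarray(pi, float) for w, pi in zip(p, ps)])

q  = 2
p  = np.array([0.5, 0.5])                          # two equal-sized zones
ps = [np.array([0.6, 0.4]), np.array([0.6, 0.4])]  # same shape, disjoint species
gamma = diversity(compose(p, ps), q)               # global diversity
esc   = p ** q / np.sum(p ** q)
alpha = power_mean(esc, np.array([diversity(pi, q) for pi in ps]), 1 - q)
beta  = gamma / alpha
print(alpha, beta, gamma)  # beta = 2: with disjoint species, the two zones
                           # always count as 2 effective zones
```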
The theorem tells us something about how $\alpha$-, $\beta$- and $\gamma$-diversity must be defined if simple and desirable properties are to hold. This story reached a definitive end in a quite recent paper:
Lou Jost, Partitioning diversity into independent alpha and beta components, Ecology 88 (2007), 2427–2439.
But Jost’s paper takes us beyond what we’re currently doing, so I’ll leave it there for now.
Re: Entropies vs. Means
Thank you, Tom. I have been reading the Rényi entropy posts, feeling more and more lost. But ‘diversity’ has helped me regain orientation.