July 24, 2021

Entropy and Diversity Is Out!

Posted by Tom Leinster

My new book, Entropy and Diversity: The Axiomatic Approach, is in the shops!

If you live in a place where browsing the shelves of an academic bookshop is possible, maybe you’ll find it there. If not, you can order the paperback or hardback from CUP. And you can always find it on the arXiv.

Paperback and hardback with flowers and foliage

I posted here when the book went up on the arXiv. It actually appeared in the shops a couple of months ago, but at the time all the bookshops here were closed by law and my feelings of celebration were dampened.

But today someone asked a question on MathOverflow that prompted me to write some stuff about the book and feel good about it again, so I’m going to share a version of that answer here. It was long for MathOverflow, but it’s shortish for a blog post.

There are lots of aspects of the book that I’m not going to write about here. My previous post summarizes the contents, and the book itself is quite discursive. Today, I’m just going to stick to that MathOverflow question (by Aidan Rocke) and what I answered.

Briefly put, the question asked:

Might there be a natural geometric interpretation of the exponential of entropy?

I answered more or less as follows.

The direct answer to your literal question is that I don’t know of a compelling geometric interpretation of the exponential of entropy. But the spirit of your question is more open, so I’ll explain (1) a non-geometric interpretation of the exponential of entropy, and (2) a geometric interpretation of the exponential of maximum entropy.

Diversity as the exponential of entropy

The exponential of entropy (Shannon entropy, or more generally Rényi entropy) has long been used by ecologists to quantify biological diversity. One takes a community with $n$ species and writes $\mathbf{p} = (p_1, \ldots, p_n)$ for their relative abundances, so that $\sum p_i = 1$. Then $D_q(\mathbf{p})$, the exponential of the Rényi entropy of $\mathbf{p}$ of order $q \in [0, \infty]$, is a measure of the diversity of the community, or the “effective number of species” in the community.

Ecologists call $D_q$ the Hill number of order $q$, after the ecologist Mark Hill, who introduced them in 1973 (acknowledging the prior work of Rényi). There is a precise mathematical sense in which the Hill numbers are the only well-behaved measures of diversity, at least if one is modelling an ecological community in this crude way. That’s Theorem 7.4.3 of my book. I won’t talk about that here.

Explicitly, for $q \in [0, \infty]$,

$$D_q(\mathbf{p}) = \biggl( \sum_{i:\,p_i \neq 0} p_i^q \biggr)^{1/(1 - q)}$$

($q \neq 1, \infty$). The two exceptional cases are defined by taking limits in $q$, which gives

$$D_1(\mathbf{p}) = \prod_{i:\, p_i \neq 0} p_i^{-p_i}$$

(the exponential of Shannon entropy) and

$$D_\infty(\mathbf{p}) = 1/\max_{i:\, p_i \neq 0} p_i.$$
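The three formulas above translate almost directly into code. Here is a minimal sketch in Python; the function name `hill_number` and the two sample distributions are illustrative choices, not anything from the book.

```python
import math

def hill_number(p, q):
    """Hill number D_q(p): the exponential of the Rényi entropy of order q."""
    support = [x for x in p if x > 0]  # all sums and products run over p_i != 0
    if q == 1:
        # Limit q -> 1: the exponential of Shannon entropy, prod p_i^(-p_i).
        return math.exp(-sum(x * math.log(x) for x in support))
    if q == math.inf:
        # Limit q -> infinity: reciprocal of the largest relative abundance.
        return 1.0 / max(support)
    return sum(x ** q for x in support) ** (1.0 / (1.0 - q))

# Illustrative distributions (made up for this sketch).
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1, 0.0]

for q in [0, 0.5, 1, 2, math.inf]:
    print(q, hill_number(uniform, q), hill_number(skewed, q))
```

For the uniform distribution on four species every order gives an effective number of species equal to 4 (up to rounding), while for the skewed distribution the value falls as $q$ grows.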

Rather than picking one $q$ to work with, it’s best to consider all of them. So, given an ecological community and its abundance distribution $\mathbf{p}$, we graph $D_q(\mathbf{p})$ against $q$.

This is called the diversity profile of the community, and is quite informative. Different values of the parameter $q$ tell you different things about the community. Specifically, low values of $q$ pay close attention to rare species, and high values of $q$ ignore them.

For example, here’s the diversity profile for the global community of great apes:

ape diversity profile

(from Figure 4.3 of my book). What does it tell us? At least two things:

  • The value at $q = 0$ is $8$, because there are $8$ species of great ape present on Earth. $D_0$ measures only presence or absence, so that a nearly extinct species contributes as much as a common one.

  • The graph drops very quickly to $1$ (or rather, imperceptibly more than $1$). This is because 99.9% of ape individuals are of a single species (humans, of course: we “outcompeted” the rest, to put it diplomatically). It’s only the very smallest values of $q$ that are affected by extremely rare species. Non-small $q$s barely notice such rare species, so from their point of view, there is essentially only $1$ species. That’s why $D_q(\mathbf{p}) \approx 1$ for most $q$.
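The shape of such a profile is easy to reproduce numerically. The sketch below uses HYPOTHETICAL abundances, not the real ape data: one species holds 99.9% of individuals and the remaining 0.1% is split evenly among seven others. The helper names `hill_number` and `diversity_profile` are illustrative.

```python
import math

def hill_number(p, q):
    """Hill number D_q(p), with the limiting cases q = 1 and q = infinity."""
    support = [x for x in p if x > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in support))
    if q == math.inf:
        return 1.0 / max(support)
    return sum(x ** q for x in support) ** (1.0 / (1.0 - q))

def diversity_profile(p, qs):
    """Sample the diversity profile q -> D_q(p) at the given orders."""
    return [(q, hill_number(p, q)) for q in qs]

# HYPOTHETICAL abundances: one dominant species, seven very rare ones.
dominated = [0.999] + [0.001 / 7] * 7

for q, d in diversity_profile(dominated, [0, 0.25, 0.5, 1, 2, math.inf]):
    print(f"q = {q:<5} D_q = {d:.4f}")
```

The profile starts at 8 for $q = 0$ and is already close to 1 by $q = 1$, which is the qualitative behaviour described in the two bullet points.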

Maximum diversity as a geometric invariant

A major drawback of the Hill numbers is that they pay no attention to how similar or dissimilar the species may be. “Diversity” should depend on the degree of variation between the species, not just their abundances. Christina Cobbold and I found a natural generalization of the Hill numbers that factors this in — similarity-sensitive diversity measures.

I won’t give the definition (see that last link or Chapter 6 of the book), but mathematically, this is basically a definition of the entropy or diversity of a probability distribution on a metric space. (As before, entropy is the log of diversity.) When all the distances are $\infty$, it reduces to the Rényi entropies/Hill numbers.

And there’s some serious geometric content here.

Let’s think about maximum diversity. Given a list of species of known similarities to one another (or mathematically, given a metric space) one can ask what the maximum possible value of the diversity is, maximizing over all possible species distributions $\mathbf{p}$. In other words, what’s the value of

$$\sup_{\mathbf{p}} D_q(\mathbf{p}),$$

where $D_q$ now denotes the similarity-sensitive (or metric-sensitive) diversity? Diversity is not usually maximized by the uniform distribution (e.g. see Example 6.3.1 in the book), so the question is not trivial.
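To make this concrete, here is a rough sketch of the similarity-sensitive diversity. The definition is not given in this post, so the code assumes the formula from the Leinster–Cobbold paper: with similarity matrix $Z_{ij} = e^{-d(i,j)}$, the diversity of order $q \neq 1, \infty$ is $\bigl(\sum_{i:\,p_i \neq 0} p_i (Z\mathbf{p})_i^{q-1}\bigr)^{1/(1-q)}$, with the other two orders obtained as limits. Chapter 6 of the book is the authoritative source. The three-species metric space is invented for illustration, and all function names are hypothetical.

```python
import math

def similarity_matrix(d):
    """Z_ij = exp(-d(i, j)); infinite distance gives similarity 0."""
    return [[math.exp(-dij) for dij in row] for row in d]

def similarity_diversity(p, Z, q):
    """Similarity-sensitive diversity of order q (formula assumed, see lead-in)."""
    n = len(p)
    # (Zp)_i: expected similarity between species i and a random individual.
    Zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    support = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return math.exp(-sum(p[i] * math.log(Zp[i]) for i in support))
    if q == math.inf:
        return 1.0 / max(Zp[i] for i in support)
    return sum(p[i] * Zp[i] ** (q - 1) for i in support) ** (1.0 / (1.0 - q))

inf = math.inf
# Invented example: species 0 and 1 are close together, species 2 is far away.
d = [[0.0, 0.1, inf],
     [0.1, 0.0, inf],
     [inf, inf, 0.0]]
Z = similarity_matrix(d)

uniform = [1 / 3, 1 / 3, 1 / 3]
weighted = [0.25, 0.25, 0.5]  # puts more weight on the distinctive species

print(similarity_diversity(uniform, Z, 2))
print(similarity_diversity(weighted, Z, 2))
```

In this toy space the weighted distribution has higher diversity of order 2 than the uniform one, because the uniform distribution spends two thirds of its mass on a pair of near-identical species. When every off-diagonal distance is infinite, $Z$ is the identity matrix and the function returns the ordinary Hill numbers.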

In principle, the answer depends on $q$. But magically, it doesn’t! Mark Meckes and I proved this. So the maximum diversity

$$D_{\text{max}}(X) := \sup_{\mathbf{p}} D_q(\mathbf{p})$$

is a well-defined real invariant of finite metric spaces $X$, independent of the choice of $q \in [0, \infty]$.

All this can be extended to compact metric spaces, as Emily Roff and I showed. So every compact metric space has a maximum diversity, which is a nonnegative real number.

What on earth is this invariant? There’s a lot we don’t yet know, but we do know that maximum diversity is closely related to some classical geometric invariants.

For instance, when $X \subseteq \mathbb{R}^n$ is compact,

$$\text{Vol}(X) = n! \omega_n \lim_{t \to \infty} \frac{D_{\text{max}}(tX)}{t^n},$$

where $\omega_n$ is the volume of the unit $n$-ball and $tX$ is $X$ scaled by a factor of $t$. This is Proposition 9.7 of my paper with Roff and follows from work of Juan Antonio Barceló and Tony Carbery. In short: maximum diversity determines volume.

Another example: Mark Meckes showed that the Minkowski dimension of a compact space $X \subseteq \mathbb{R}^n$ is given by

$$\dim_{\text{Mink}}(X) = \lim_{t \to \infty} \frac{\log D_{\text{max}}(tX)}{\log t}$$

(Theorem 7.1 here). So, maximum diversity determines Minkowski dimension too.

There’s much more to say about the geometric aspects of maximum diversity. Maximum diversity is closely related to another recent invariant of metric spaces, magnitude. Mark and I wrote a survey paper on the more geometric and analytic aspects of magnitude, and you can find more on all this in Chapter 6 of my book.

Postscript

Although diversity is closely related to entropy, the diversity viewpoint really opens up new mathematical questions that you don’t see from a purely information-theoretic standpoint. The mathematics of diversity is a rich, fertile and underexplored area, beckoning us to come and investigate.

Scholar studying Entropy and Diversity

Posted at July 24, 2021 8:51 PM UTC

15 Comments & 0 Trackbacks

Re: Entropy and Diversity Is Out!

The exponential of

$$\sum_{i \in X} p_i \ln p_i$$

is

$$\prod_{i \in X} {p_i}^{p_i}$$

and last week David Jaz-Myers told me a nice categorification of this quantity. Suppose $\pi \colon P \to X$ is any function between sets. Let $P_i$ be the fiber over $i \in X$:

$$P_i = \{y \in P: \; \pi(y) = i \}$$

We can think of $P$ as a ‘set over $X$’, and in the category of sets over $X$ we have

$$End(P) = \prod_{i \in X} {P_i}^{P_i}$$

In other words, this is the set of functions from $P$ to itself that preserve every fiber.

David told me that someone had just come out with a paper using this idea to think about entropy in new ways. I can’t remember who that was, but maybe someone can tell us.

Posted by: John Baez on July 25, 2021 5:24 AM

Re: Entropy and Diversity Is Out!

You may be thinking of Spivak and Hosgood’s new paper.

Coincidentally, because of seeing a different paper on information loss associated to a map, namely this one of Fullwood and Parzygnat, I was starting to think about how to interpret finite probability spaces with rational probabilities as given by combinatorial problems of counting points in fibres of a function, and how one should interpret entropy. And my colleague David Butler told me, when I shared my thoughts, he was trying to get such an approach into the intro probability bridging course for people coming to uni with insufficient mathematics background (and then replacing counting points with areas of the rectangles in a histogram — which is essentially what Spivak and Hosgood are doing in a much more highbrow way).

So these ideas are very attractive and natural from many points of view, and should be better known!

Posted by: David Roberts on July 25, 2021 7:41 AM

Re: Entropy and Diversity Is Out!

Oh, and one tiny reason why I was exploring this is that it helps make sense of the convention for calculating entropy that $0\cdot\log(0)=0$, since one is counting the number of functions $\emptyset\to \emptyset$, of which there is one.
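A quick sanity check of that count, as a small sketch: the helper `count_functions` is a hypothetical name, and it simply lists every possible tuple of values a function could take.

```python
from itertools import product

def count_functions(domain, codomain):
    """Count functions domain -> codomain by enumerating all tuples of values."""
    return sum(1 for _ in product(codomain, repeat=len(domain)))

# Exactly one function from the empty set to itself: the empty function.
print(count_functions([], []))
# This matches the integer convention 0 ** 0 == 1 in Python.
print(0 ** 0)
# By contrast, there is no function from a nonempty set to the empty set.
print(count_functions([1], []))
```

So an empty fiber contributes a factor of $0^0 = 1$ to the product, which on the logarithmic side is the convention $0 \cdot \log(0) = 0$.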

Posted by: David Roberts on July 25, 2021 7:46 AM

Re: Entropy and Diversity Is Out!

John very nearly wrote:

We can think of $P$ as a ‘set over $X$’, and in the category of sets over $X$ we have

$$End_X(P) = \prod_{i \in X} P_i^{P_i}.$$

(I’ve added an $X$ subscript to your $End$, to make clear they’re endomorphisms over $X$.)

Right. And we can try to take this further and get our hands on the number

$$D(\mathbf{p}) = \prod_{i \in X} p_i^{-p_i}$$

where (assuming $P$ and $X$ are finite)

$$p_i = \frac{|P_i|}{|P|}.$$

An easy observation:

$$D(\mathbf{p})^{-|P|} = \frac{|End_X(P)|}{|End(P)|},$$

that is,

$$D(\mathbf{p})^{-|P|} = \text{probability that a random endo of} \ P\ \text{is an endo over}\ X.$$
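For completeness, the observation can be unpacked in one line, assuming $P$ is finite and nonempty and using $\sum_{i \in X} |P_i| = |P|$ to split the denominator across the fibers (empty fibers contribute a factor of $0^0 = 1$):

$$
\begin{aligned}
\frac{|End_X(P)|}{|End(P)|}
&= \frac{\prod_{i \in X} |P_i|^{|P_i|}}{|P|^{|P|}}
= \prod_{i \in X} \biggl( \frac{|P_i|}{|P|} \biggr)^{|P_i|} \\
&= \prod_{i \in X} p_i^{|P| p_i}
= \biggl( \prod_{i \in X} p_i^{p_i} \biggr)^{|P|}
= D(\mathbf{p})^{-|P|}.
\end{aligned}
$$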

For example, consider the probability distribution

$$\mathbf{p} = (2/10, 5/10, 3/10).$$

Think of a 10-element set partitioned into three blocks. There are $10^{10}$ endomorphisms of the set, and $2^2 5^5 3^3$ of them preserve the blocks. The probability that a random endomorphism preserves the blocks is

$$\frac{2^2 5^5 3^3}{10^{10}} = D(\mathbf{p})^{-10}.$$
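This arithmetic can be checked mechanically. The sketch below computes the probability exactly with fractions, compares it with $D(\mathbf{p})^{-10}$ in floating point, and confirms the counting formula by brute force on smaller sets (enumerating all $10^{10}$ endomorphisms would be too slow). All function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import isclose, prod

def prob_block_preserving(block_sizes):
    """Exact probability that a uniformly random endomorphism preserves every block."""
    total = sum(block_sizes)
    preserving = prod(k ** k for k in block_sizes)  # |End_X(P)|
    return Fraction(preserving, total ** total)      # divided by |End(P)|

def diversity(block_sizes):
    """D(p) = prod p_i^(-p_i), where p_i = |P_i| / |P|."""
    total = sum(block_sizes)
    return prod((k / total) ** (-k / total) for k in block_sizes)

def brute_force_count(block_sizes):
    """Count block-preserving endomorphisms by enumeration (small sets only)."""
    block_of = [b for b, k in enumerate(block_sizes) for _ in range(k)]
    n = len(block_of)
    return sum(
        all(block_of[f[x]] == block_of[x] for x in range(n))
        for f in product(range(n), repeat=n)
    )

sizes = [2, 5, 3]
exact = prob_block_preserving(sizes)
print(exact, float(exact), diversity(sizes) ** -10)
print(isclose(float(exact), diversity(sizes) ** -sum(sizes), rel_tol=1e-9))
print(brute_force_count([1, 3]), 1 ** 1 * 3 ** 3)
```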

Joachim Kock and I spent a little bit of time trying to squeeze something out of this, but to no avail.

Posted by: Tom Leinster on July 25, 2021 10:53 AM

Re: Entropy and Diversity Is Out!

In our notation, this quantity $D(\mathbf{p})^{-|P|}$ would be $L(d)^{-A(d)}$, but this isn’t something that popped up when we were writing our paper, and I can’t think of a nice intuition for it off the top of my head…

Posted by: Tim Hosgood on July 25, 2021 5:49 PM

Re: Entropy and Diversity Is Out!

Here’s a blog article explaining the new paper I was talking about:

Posted by: John Baez on July 25, 2021 4:12 PM

Re: Entropy and Diversity Is Out!

I see that Tim and David use the term “length” to mean the diversity of order $1$, i.e. the exponential of Shannon entropy. They say in the abstract of their paper that “classically” it has also been called the “perplexity”, although they don’t seem to give a reference for that.

I’m not surprised to see this quantity going by many names, given (1) the extremely varied contexts in which entropy itself arises, and (2) the fact that the exponential of entropy is often a more natural quantity than entropy itself.

The name “perplexity” reminds me of expected surprise, although that’s a term for one kind of entropy rather than diversity (i.e. a logarithmic thing, not an exponential thing).

Posted by: Tom Leinster on July 25, 2021 5:06 PM

Re: Entropy and Diversity Is Out!

David and I searched for a name for this quantity for a while. Since this is far from my usual area, I had no idea, and could only think of looking on the Wikipedia page for Shannon entropy, where it didn’t seem to be mentioned. I can’t remember exactly, but I think David mentioned to me that he’d asked somebody about it and they’d mentioned the term “perplexity”, but I do admit to being rather lax with finding any citation to back this up!

(As for the term “length”, this is simply to fit in with the rest of the analogy, since we already have “width” and “area”, and these terms end up corresponding to the usual definitions for a certain rectangle that you can draw.)

Posted by: Tim Hosgood on July 25, 2021 5:40 PM

Re: Entropy and Diversity Is Out!

Just in case it sounds like I was telling you off, I wasn’t! It’s very hard to look up the name of something when you don’t already know its name (and if you did, you wouldn’t need to look it up). I experienced that myself a few years ago when I stumbled across Fermat quotients without knowing what they were called. Despite my best web-searching efforts, it was weeks or maybe months before I discovered their name.

And many of these entropic quantities have an absolute ton of names. For instance, the introduction to Chapter 3 of my book lists fourteen names for relative entropy.

I vaguely remember that the exponential of entropy has also been called “extropy”.

Posted by: Tom Leinster on July 25, 2021 6:05 PM

What’s in a name? That which we call an entropy…

Several years ago, a colleague and I were looking into axiomatic characterizations of Shannon entropy and how they might be abstracted away to provide a notion of “information” in contexts where a probability distribution has not been defined. It was pure luck that we learned that the conditions we’d arrived at defined something called the rank function of a polymatroid.

Posted by: Blake Stacey on July 25, 2021 9:20 PM

Re: Entropy and Diversity Is Out!

The difficulty of finding out the names for mathematical concepts is a peculiar phenomenon in an age where, in general, it’s so incredibly easy to find abundant information on even the most obscure topics.

In the phase when I was looking into Fermat quotients but didn’t know what they were called, I was pretty sure they had to have a well-known name. But that confidence didn’t help me find out what that name was. From the moment I did discover the name (I forget how), I had access to more information about them than I needed or will ever need. It’s a real step change.

Posted by: Tom Leinster on July 25, 2021 10:28 PM

Re: Entropy and Diversity Is Out!

These sorts of problems are exactly the sorts of things that Valeria et al are hoping to solve! https://topos.site/blog/2021/07/introducing-the-mathfoldr-project/

Posted by: Tim Hosgood on July 26, 2021 12:37 AM

Re: Entropy and Diversity Is Out!

This is a long shot: could

MR2400035 (2009b:11007) Jesse Elliott, Ring structures on groups of arithmetic functions, J. Number Theory 128 (2008) 709 – 730

be useful for this?

Posted by: jackjohnson on July 26, 2021 5:16 PM

I think this is a fancy way of interpreting Boltzmann’s entropy formula (in which apart from a constant, entropy is defined as the logarithm of the number of microstates that correspond to a macroscopic state).

Posted by: Steve Huntsman on July 26, 2021 2:32 PM

Re:

Thanks for reminding me of that, Steve! That’s a great example to keep in mind.

Posted by: David Roberts on July 27, 2021 12:54 AM
