Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

March 8, 2017

Functional Equations V: Expected Surprise

Posted by Tom Leinster

In today’s class I explained the concept of “expected surprise”, which also made an appearance on this blog back in 2008: Entropy, Diversity and Cardinality (Part 1). Expected surprise is a way of interpreting the qq-deformed entropies that I like to call “surprise entropies”, and that are usually and mistakenly attributed to Tsallis. These form a one-parameter family of deformations of ordinary Shannon entropy.

Also in this week’s session: qq-logarithms, and a sweet, unexpected surprise:

Surprise entropies are much easier to characterize than ordinary entropy!

For instance, all characterization theorems for Shannon entropy involve some regularity condition (continuity or at least measurability), whereas each of its qq-deformed cousins has an easy characterization that makes no regularity assumption at all.

It’s all on pages 18–21 of the course notes so far.

Posted at March 8, 2017 1:36 AM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2946

12 Comments & 0 Trackbacks

Re: Functional Equations V: Expected Surprise

Wow, that characterization of surprise entropies is indeed an unexpected surprise!

I agree that that particular theorem does not transfer to the case of Shannon entropy. Nevertheless, I’d like to propose a candidate characterization of Shannon entropy that doesn’t involve any regularity condition. After two paragraphs of prelude, I’ll give a sketch of how it could work. It’s not a complete sketch yet, since I don’t get continuity for free, but only boundedness on Δ 2\Delta_2; but in light of the equivalence of continuity and boundedness in the case of Cauchy’s functional equation, I’m hoping that this could be enough. Do you happen to know?

Basically, the new ingredient is that we can simulate any biased coin using two expected tosses of a fair coin, like this: use the fair coin to generate the infinite distribution (12,14,18,)(\tfrac{1}{2},\tfrac{1}{4},\tfrac{1}{8},\ldots) by tossing the coin repeatedly until it lands heads. Now for any bias p(0,1)p\in(0,1), the binary expansion p= ip i2 ip = \sum_i \frac{p_i}{2^i} shows that the biased coin (p,1p)(p,1-p) is a coarse-graining of the above infinite distribution.

More generally, I think that this kind of procedure lets us simulate any countable distribution with finite entropy using a finite expected number of fair coin tosses.

Now let Δ \Delta_\infty be the infinite-dimensional ‘simplex’ consisting of those countable distributions that have finite entropy. These should be precisely those distributions that can be generated from a finite expected number of fair coin tosses. My conjecture is that up to a scalar multiple, Shannon entropy is the only function H:Δ [0,)H:\Delta_\infty\to[0,\infty) which satisfies the chain rule. So why would boundedness on Δ 2\Delta_2 be automatic? That’s because of the above simulation argument: there’s a chain rule equation equating H(12,14,18,)H(\tfrac{1}{2},\tfrac{1}{4},\tfrac{1}{8},\ldots) to H(p,1p)H(p,1-p) plus some extra terms. Since these extra terms are nonnegative by assumption, we have H(p,1p)H(12,14,18,)=2H(12,12)H(p,1-p)\leq H(\tfrac{1}{2},\tfrac{1}{4},\tfrac{1}{8},\ldots) = 2H(\tfrac{1}{2},\tfrac{1}{2}) for all p(0,1)p\in(0,1), QED.

Posted by: Tobias Fritz on March 8, 2017 4:38 PM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

Wow, that characterization of surprise entropies is indeed an unexpected surprise!

Yes, it’s much simpler than the proof at the end of the document I sent you and John.

As it says in the notes, I extracted it from the book by Aczél and Daróczy. They rightly emphasize that it doesn’t need any kind of regularity condition. However, they spread the proof over about 30 pages, completely obscuring the fact that it can be done in less than ten lines. The version in my notes contains absolutely no ideas that aren’t in that book — it was just a matter of untangling the notation.

I like the “two expected tosses” construction, and I like your conjecture! A few questions, though. You’ve shown that

H(p,1p)H(1/2,1/4,) H(p, 1 - p) \leq H(1/2, 1/4, \ldots)

for all p(0,1)p \in (0, 1). But why is the right-hand side equal to 2H(1/2,1/2)2H(1/2, 1/2)? And is HH of any distribution bounded above by H(1/2,1/4,)H(1/2, 1/4, \ldots)? And, is there a way of describing Δ \Delta_\infty that doesn’t refer to a pre-existing notion of entropy?

Posted by: Tom Leinster on March 8, 2017 4:58 PM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

That sounds like a whole lot of work – I think that we should give more credit to those kinds of simplifications. (I confess to never actually having properly read the book, despite having worked on the subject…)

Anyway, let me get to your questions. Apologies for having swept a couple of things under the rug in my previous comment; it was already getting a bit long, and I also haven’t actually worked things out in all detail yet. (I’m not sure if I ever will; as a stand-alone result, it may not be interesting enough.)


But why is the right-hand side equal to 2H(1/2,1/2)2H(1/2, 1/2)?

That follows from an instance of the chain rule, H(1/2,1/4,)=H(1/2,1/2)+12H(1/2,1/4,). H(1/2,1/4,\ldots) = H(1/2,1/2) + \frac{1}{2} H(1/2,1/4,\ldots). Intuitively, tossing a fair coin until it lands heads is the same as tossing it once, and then, with probability 1/21/2, tossing it until it lands heads!


And is HH of any distribution bounded above by H(1/2,1/4,)H(1/2, 1/4, \ldots)?

No, only of the binary ones. Simulating an nn-ary distribution with a fair coin will in general require more than two expected tosses. But for any fixed nn, one can bound H(p)H(\mathbf{p}) for pΔ n\mathbf{p}\in\Delta_n either using a generalization of the n=2n=2 argument, or using the chain rule repeatedly to reduce to the binary case. With the latter, I think that one will get the bound H(p)2nH(1/2,1/2)H(\mathbf{p})\leq 2n H(1/2,1/2).


And, is there a way of describing Δ \Delta_\infty that doesn’t refer to a pre-existing notion of entropy?

Yes, I think so: it’s the set of all those distributions p\mathbf{p} that can be simulated using a finite expected number of tosses of a fair coin. My notion of ‘simulation’ here is this: you’re allowed to consider any finite or infinite binary decision tree, such that doing a random walk down the tree terminates almost surely, and such that every leaf node is labelled by an outcome of the distribution p\mathbf{p} that you want to simulate. The protocol then is like this: you start at the root node and toss the coin; depending on the outcome, you then either follow the left or the right edge down. You do this at every node until you end up at a leaf. There, you terminate by announcing the label of this leaf as the outcome of the protocol. Since you land at any given leaf with probability 2 depth2^{-depth}, the simulation is correct if the sum of all these probabilities matches the desired outcome probability specified by p\mathbf{p}. The expected number of fair coin tosses is given by the sum over all nodes, again weighted by 2 depth2^{-depth}.

Then, I think that the finite-entropy distributions are precisely those which can be simulated in this sense, using a finite expected number of tosses. But I haven’t actually proven this yet. Also, it may well be well-known to those who know things well.

Of course, the fair coin here could be replaced by any other nontrivial distribution (with finite entropy). It’s just simplest to explain things in terms of a fair coin.


So do you have an inkling on whether boundedness on Δ 2\Delta_2 could replace continuity in Faddeev’s characterization (Theorem 2.4 in your lecture notes)?

Posted by: Tobias Fritz on March 8, 2017 5:53 PM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

Thanks. My question about boundedness was a stupid one: of course, HH isn’t bounded! My excuse is that I was commenting fast, in a hurry to leave the office and get home. Serves me right :-)

So do you have an inkling on whether boundedness on Δ 2\Delta_2 could replace continuity in Faddeev’s characterization (Theorem 2.4 in your lecture notes)?

I don’t have an inkling, I’m afraid. I’m overawed by people who speak a foreign language well enough to use words like “inkling”, but that doesn’t help.

If I seriously wanted to weaken the continuity condition in Faddeev’s theorem by a boundedness condition, I might warm up by learning more about Cauchy’s equation. I know that any measurable additive function \mathbb{R} \to \mathbb{R} is linear, and that measurability can be weakened to “bounded on a set of positive measure”, but I’ve never read the proof of either of these theorems.

Incidentally:

I confess to never actually having properly read the book, despite having worked on the subject…

I definitely haven’t “read” Aczél and Daróczy’s book either. I don’t think it’s a book one reads so much as consults. At first I found it forbiddingly technical, and it’s true that the style is very particular, but it’s not so bad once you get used to it.

Posted by: Tom Leinster on March 8, 2017 6:18 PM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

I’m overawed by people who speak a foreign language well enough to use words like “inkling”

I learned this word here at the Café! That was some time before I learned ‘overawed’ ;)

I’ve just done a quick literature search. Turns out that Dároczy and Kátai have shown in 1970 that boundedness is indeed enough. (The paper starts at p.83.) A closely related 1975 article by Diderrich seems to establish the same result, but without assuming nonnegativity of the measure of information.

This comes quite close to establishing my conjecture. The main thing that’s missing is to extend the characterization from finitely supported distributions to the ones in Δ \Delta_\infty. I’m confident that this is possible, but it may still require a good deal of work. If anyone wants to give it a try, let me know so that I can provide some hints on how to go about it.

Posted by: Tobias Fritz on March 8, 2017 6:50 PM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

Thanks. I see that both those papers you found make reference to what Aczél and Daróczy call, in their book, the “Fundamental equation of information”:

f(x)+(1x)f(y1x)=f(y)+(1y)f(x1y) f(x) + (1 - x) f \Bigl(\frac{y}{1 - x}\Bigr) = f(y) + (1 - y) f \Bigl(\frac{x}{1 - y}\Bigr)

which always seems to come along with the further conditions that f(0)=f(1)f(0) = f(1) and f(1/2)=1f(1/2) = 1. Here ff is a real-valued function on [0,1][0, 1], and the equation is to hold for all x,y[0,1)x, y \in [0, 1) such that x+y1x + y \leq 1. The “obvious” solution ff is

f(x)=H(x,1x)=xlogx(1x)log(1x) f(x) = H(x, 1 - x) = - x \log x - (1 - x) \log(1 - x)

where logarithms are taken to base 2.

When I said before that I’d stripped out a lot of unnecessary stuff in Aczél and Daróczy’s characterization theorem for surprise entropy, this was part of what I stripped out. I confess, I don’t know what’s so “fundamental” about this equation.

I notice that where Daróczy and Kátai cite a certain German-language paper of Faddeev, they spell his name Faddejew. Rényi, citing the same paper, spells it Fadeev. I don’t have that paper myself, but I do have an earlier Russian-language version of it, where his name is spelt Фаддеев, and his Russian Wikipedia article confirms this spelling. (Incidentally, his son, the well-known physicist Ludvig Faddeev, died only eleven days ago.)

Posted by: Tom Leinster on March 9, 2017 7:34 PM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

So, now we want to know: are Elliptic Entropies interesting?

Posted by: Jesse C. McKeown on March 9, 2017 4:46 PM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

I don’t know — I’ve never heard of them! Can you tell us something about them to whet our appetites?

Posted by: Tom Leinster on March 9, 2017 7:36 PM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

I’m merely reacting to how, in the latest notes, you point out (in different words, as an exercise) that the qq-logarithm maps ordinary mulitplication to the Mulitplicative Formal Group: x+ 1qy=x+y+(1q)xy; x+_{1-q} y = x + y + (1-q) x y; so I was wondering if the third famous family of Formal Groups had turned up in quantified-diversity questions.

Posted by: Jesse C. McKeown on March 10, 2017 8:48 PM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

Sorry Jesse, you’re still being too cryptic for me. What is the “third famous family of Formal Groups”? What does this have to do with the phrase “elliptic entropies” that you used before?

Posted by: Tom Leinster on March 10, 2017 9:41 PM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

So, there are about three types of abelian group manifolds that locally look like \mathbb{C}: a maximal-rank discrete lattice subgroup can have rank 2 or 1 or 0; the rank 2 case is modeled by \mathbb{C}, (or because it doesn’t really care about complex multiplication, by 2\mathbb{R}^2) with the usual addition; and the rank-1 case is just as nicely ×\mathbb{C}^{\times}, the nonzero complex numbers with complex multiplication; and the rank 0 cases are the Elliptic Curves. As groups, they’re just tori, but they can have lots of different complex structures.

Anyways, the third family of formal group laws would be a power series x+ Ey=x+y+O(xy) x +_E y = x + y + O(x y) expressing an elliptic group in some natural chart near the identity on an elliptic curve EE. And as it happens, every formal group law over the rationals (or other field of characteristic zero) has its own logarithm over the same field. That might mean the distinction between them as formal group laws isn’t interesting, but the ln qln_q in your notes’ examples does different analytic things than ordinary lnln, so… I was just wondering.

It might even be a terribly silly wonder, to the extent that “logarithm of a formal group law” really can mean “any power series of the form x+O(x 2)x + O(x^2)”; but, say, taking a projective chart of a projective curve of genus 1 would narrow the playing field a bit.

Posted by: Jesse C. McKeown on March 11, 2017 12:13 AM | Permalink | Reply to this

Re: Functional Equations V: Expected Surprise

Thanks!

There was some discussion in class about the word “deformed”, which I’d used in reference to these surprise entropies. People were asking to what extent this fit in with more mainstream uses of the word “deformation”. I confessed that it was just something I’d made up.

On the other hand, there is perhaps some justification. The multiplicative formula for the qq-logarithm,

ln q(xy)=ln q(x)+ln q(y)+(1q)ln q(x)ln q(y), \ln_q(x y) = \ln_q(x) + \ln_q(y) + (1 - q) \ln_q(x) \ln_q(y),

gives rise to a multiplicative formula for the function d q:xxln q(x)d_q \colon x \mapsto x \ln_q(x),

d q(xy)=xd q(y)+yd q(x)+(1q)d q(x)d q(y), d_q(x y) = x d_q(y) + y d_q(x) + (1 - q) d_q(x) d_q (y),

which in turn gives rise to a multiplicative formula for the surprise entropy of order qq,

S q(pr)=S q(p)+S q(r)+(1q)S q(p)S q(r). S_q(\mathbf{p} \otimes \mathbf{r}) = S_q(\mathbf{p}) + S_q(\mathbf{r}) + (1 - q) S_q(\mathbf{p}) S_q(\mathbf{r}).

This is different from the multiplicative formula for entropy in my notes, but more resembles the kind of formulas in your comment.

The multiplicative formula for entropy in my notes was

S q(pr)=S q(p)+(p i q)S q(r). S_q(\mathbf{p} \otimes \mathbf{r}) = S_q(\mathbf{p}) + \Bigl( \sum p_i^q \Bigr) S_q(\mathbf{r}).

This has the notable feature that the right-hand side is non-symmetric in p\mathbf{p} and r\mathbf{r} except in the “classical” case q=1q = 1. So there are echoes of situations in deformation theory, where we have a classical algebra that’s commutative, but its deformations are all noncommutative.

Posted by: Tom Leinster on March 11, 2017 9:25 PM | Permalink | Reply to this

Post a New Comment