## February 28, 2017

### Functional Equations IV: A Simple Characterization of Relative Entropy

#### Posted by Tom Leinster

Relative entropy appears in mathematics, physics, engineering, and statistics under an unhelpful variety of names: relative information, information gain, information divergence, Kullback-Leibler divergence, and so on and on. This hints at how important it is.

In the functional equations course that I’m currently delivering, I stated and proved a theorem today that characterizes relative entropy uniquely:

**Theorem.** Relative entropy is essentially the only function that satisfies three trivial requirements and the chain rule.

You can find this on pages 15–19 of the course notes so far, including an explanation of the “chain rule” in terms of Swiss and Canadian French.
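For readers who like to see the chain rule in action, here is a quick numerical sanity check (a sketch, not from the notes; the function names are my own, and I use the standard conventions: natural logarithm, $0 \log 0 = 0$, and the composite distribution $p \circ (r_1, \dots, r_n)$ with entries $p_i r_{ij}$):

```python
from math import log

def rel_ent(p, q):
    """Relative entropy D(p || q) = sum_i p_i log(p_i / q_i), with 0 log 0 = 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def compose(p, rs):
    """Composite distribution p ∘ (r_1, ..., r_n): the flat list of p_i * r_i[j]."""
    return [pi * rij for pi, ri in zip(p, rs) for rij in ri]

# Arbitrary example distributions:
p, q = [0.3, 0.7], [0.5, 0.5]
rs = [[0.2, 0.8], [0.6, 0.4]]
ss = [[0.5, 0.5], [0.1, 0.9]]

# Chain rule: D(p∘r || q∘s) = D(p || q) + sum_i p_i D(r_i || s_i)
lhs = rel_ent(compose(p, rs), compose(q, ss))
rhs = rel_ent(p, q) + sum(pi * rel_ent(ri, si) for pi, ri, si in zip(p, rs, ss))
assert abs(lhs - rhs) < 1e-12
```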

I don’t know who this theorem is due to. I came up with it myself, I’m not aware of its existence in the literature, and it didn’t appear on this list until I added it. However, it could have been proved any time since the 1950s, and I bet someone’s done it before.

The proof I gave owes a great deal to the categorical characterization of relative entropy by John Baez and Tobias Fritz, which John blogged about here before.

Posted at February 28, 2017 7:05 PM UTC


### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

A problem I had in preparing for this week’s session was that I couldn’t think of a good example of two languages that use the same set of accents. If they use different accents, then the relative entropy one way round or the other is infinite, which makes things rather trivial. Swiss and Canadian French was the best I could do.
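The point about infinite relative entropy shows up already in a two-symbol toy example (the "accent frequency" distributions below are invented purely for illustration):

```python
from math import log, inf

def rel_ent(p, q):
    """D(p || q); infinite when p puts mass where q puts none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 log 0 = 0 by convention
        if qi == 0:
            return inf        # p charges a symbol q never uses
        total += pi * log(pi / qi)
    return total

# Toy frequency distributions over the accented symbols (é, ü):
french_like = [1.0, 0.0]   # uses é, never ü
german_like = [0.0, 1.0]   # uses ü, never é
print(rel_ent(french_like, german_like))  # inf — and inf the other way round too
```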

The only other suggestions I’ve been given are other variants of French, or the various versions of German or Dutch.

So, can anyone think of two different languages that use precisely the same accents?

Because some people reading this will be mathematicians, I’d better say explicitly: answers involving languages with no accents don’t count!

Posted by: Tom Leinster on February 28, 2017 7:58 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

Here are some examples, though they might be a bit esoteric:

Basque, Quechua, and Aymara each only use ñ.

Swedish and Finnish both use only å, ä, and ö.

Māori, Niuean, the official Romanization of Hindi, and the most common Romanization of Japanese all use only ā, ē, ī, ō, and ū. Hawaiian and Tahitian also use only these diacritics, but use ʻ or ’ as a distinct letter. Each of these languages also lacks some standard English letters, e.g. x.

Posted by: Arun Debray on March 1, 2017 3:03 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

Thanks — that’s impressive knowledge!

Posted by: Tom Leinster on March 2, 2017 9:45 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

The example of Danish and Norwegian is perhaps too degenerate? Their alphabet has å and ø as the only accented letters, plus the additional letter æ.

Posted by: Tobias Fritz on February 28, 2017 9:23 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

Aha, good one! Joachim Kock once told me the story of why Swedish has different accents from Danish and Norwegian; it’s more interesting than it might sound! Perhaps he’ll drop in and tell us.

Posted by: Tom Leinster on February 28, 2017 11:54 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

Swedish is more interesting than it might sound? That reminds me of Mark Twain’s comment about the music of Wagner.

(If you post an article about some deep math and make a trivial digression, people will talk endlessly about the latter. I’m actually fascinated by your new ideas on relative entropy. And yes, I’m deliberately misunderstanding your remark.)

Posted by: John Baez on March 2, 2017 9:27 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

Well, I thought I’d throw in a trivial digression to get people talking, at least. These posts haven’t exactly been inundated with comments. Honestly, seeing a post entitled “Such-and-Such Part IV” is going to put off most people, as they’ll probably assume they won’t understand anything unless they followed parts 1–3. And I’m not writing much to tempt readers in the posts themselves, because I’ve already used up my explanatory energy writing the actual notes. Anyway, that’s honestly fine: not every post has to provoke a long public conversation.

While I have your ear, let me ask you something that came up in Tuesday’s class. As you know, the characterization theorem that I presented there is based heavily on your paper with Tobias. In my notes, Lemma 2.16 is introduced this way:

> Now we come to a clever part of Baez and Fritz’s argument.

(It’s derived from your Lemma 14.) When I presented it in class, I could give no explanation whatsoever except that it works. That bothers me, as you can imagine.

How on earth did you and Tobias come up with it? What’s the idea?

Posted by: Tom Leinster on March 2, 2017 9:58 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

I’m afraid that I also can’t give any explanation whatsoever, except that it just works. If there were any intuition for it, I’m sure that we would have mentioned it in the paper. Finding that lemma took a good deal of hard work, toying around with ‘bootstrapping’ the theorem by proving it on larger and larger classes of morphisms.

I still find it somewhat surprising that we didn’t actually need to use induction on the cardinalities involved. This could enable a generalization of our argument to a suitable class of infinite probability spaces; in fact I’ve heard of somebody who seems to have done this, but haven’t seen any details yet. Your new theorem may well be relevant for this, too.

Posted by: Tobias Fritz on March 3, 2017 4:42 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

OK, thanks. I would have been pleased and impressed if you’d been able to offer an intuitive explanation, but I’m possibly even more impressed that you managed to find the argument without any overarching intuition to guide you.

In theorems that uniquely characterize information-theoretic quantities, it’s common to have some steps in the proof that look something like “let’s decompose $(1/2, 1/2, 0)$ in two different ways” (or the same but with some other little distribution). Your argument is also about decomposing a certain distribution in two ways, but it’s much more complicated. I tried to find a general framework to fit it in, but didn’t manage.

Posted by: Tom Leinster on March 6, 2017 5:40 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

I am very interested in the course you are teaching and I appreciate the link to the notes. Unfortunately I won’t have time until late May to actually read them. Any chance you could link the notes from your home page? I looked at your pages but couldn’t find any links to your FA course (except indirectly through the n-cat posts).

Posted by: Eugene on March 7, 2017 3:51 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

Thanks. There’s not much point me linking to notes from my home page just yet, as whatever I linked to would need to be replaced every week. But as long as I remember, I’ll put a copy of the finished notes (well, as “finished” as they’ll ever be) on my web page once the course is over. If I forget, I’d enjoy being reminded!

Posted by: Tom Leinster on March 7, 2017 6:25 PM

### Re: Functional Equations IV: A Simple Characterization of Relative Entropy

I just wrote up the characterization theorem for relative entropy:

Tom Leinster, A short characterization of relative entropy. arXiv:1712.04903, 10 pages, 2017.

As a bonus, I also threw in short and simple characterization theorems for $q$-logarithmic entropy and $q$-logarithmic relative entropy. These are often called “Tsallis (relative) entropy”, though they shouldn’t be, as other people defined them, used them, and proved theorems about them for twenty years before Tsallis rediscovered them. I called them “surprise entropy” and “surprise relative entropy” in the seminar course.
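For the curious, the $q$-logarithmic entropy can be computed in a couple of lines. This is a sketch using the usual convention $S_q(p) = \bigl(1 - \sum_i p_i^q\bigr)/(q-1)$ with natural logarithms; the function names are my own. As $q \to 1$ it recovers Shannon entropy:

```python
from math import log

def q_entropy(p, q):
    """q-logarithmic ("Tsallis") entropy S_q(p) = (1 - sum p_i^q) / (q - 1)."""
    return (1.0 - sum(pi ** q for pi in p if pi > 0)) / (q - 1.0)

def shannon(p):
    """Shannon entropy in nats, with 0 log 0 = 0."""
    return -sum(pi * log(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.25]
# For q near 1, S_q(p) is close to the Shannon entropy of p:
print(q_entropy(p, 1.000001), shannon(p))
```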

Posted by: Tom Leinster on December 14, 2017 8:25 PM
