
May 2, 2022

Shannon Entropy from Category Theory

Posted by John Baez

I’m giving a talk at Categorical Semantics of Entropy on Wednesday May 11th, 2022. You can watch it live on Zoom if you register, or watch a recording later. Here’s the idea:

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.
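
In symbols (following the 2011 paper with Fritz and Leinster cited below): a morphism $f \colon (X,p) \to (Y,q)$ in $FinProb$ is a measure-preserving function, meaning $q_j = \sum_{i \in f^{-1}(j)} p_i$, and its information loss is

$F(f) = H(p) - H(q), \qquad H(p) = -\sum_{i \in X} p_i \log p_i .$

The theorem says that any way of assigning numbers to morphisms that is functorial ($F(g \circ f) = F(g) + F(f)$), convex-linear and continuous must be a nonnegative multiple of this $F$.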

You can see the slides now, here.

It’s fun to talk about work that I did with Tobias Fritz and Tom Leinster here on the $n$-Café — I’ve never given a talk where I went into as much detail as I will now. In fact I will talk a bit about all these:

• John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, 2011.

• Tom Leinster, An operadic introduction to entropy, 2011.

• John Baez and Tobias Fritz, A Bayesian characterization of relative entropy, 2014.

• Tom Leinster, A short characterization of relative entropy, 2017.

• Nicolas Gagné and Prakash Panangaden, A categorical characterization of relative entropy on standard Borel spaces, 2017.

• Tom Leinster, Entropy and Diversity: the Axiomatic Approach, 2020.

• Arthur Parzygnat, A functorial characterization of von Neumann entropy, 2020.

• Arthur Parzygnat, Towards a functorial description of quantum relative entropy, 2021.

• Tai-Danae Bradley, Entropy as a topological operad derivation, 2021.

Posted at May 2, 2022 1:37 AM UTC

7 Comments & 0 Trackbacks

Matrices

It seems a little weird to be working with actual functions between finite sets, rather than matrices where each row is a probability measure. Is this what’s happening when you talk about “convex combinations of functions”? I couldn’t quite parse that slide.

Posted by: Allen Knutson on May 2, 2022 12:21 PM | Permalink | Reply to this

Re: Matrices

If my understanding is correct, the notion of a convex linear combination of morphisms is a (well-motivated) “notation trick”. With a different choice of notation it might be less confusing.

To make sense of it in a very wordy way, note that there is a forgetful functor:

(1) $F \colon \mathbf{FinMeas} \longrightarrow \mathbf{FinSet}$

acting on objects by $F[(X,p)] = X$, and on a morphism $f \colon (X, p) \to (Y, q)$ in $\mathbf{FinMeas}$ by forgetting the information about $p$ and $q$ and just remembering the underlying function $X \to Y$, which we denote $F(f)$.

As a remark: remember that a morphism in $\mathbf{FinMeas}$ is simply a morphism in $\mathbf{FinSet}$ that satisfies a condition determined by the extra data of the measures. Thus, the forgetful functor above preserves the underlying data but forgets the condition (which wouldn’t make sense in $\mathbf{FinSet}$ anyway).

With that in mind, for any two morphisms $f \colon (X, p) \to (X', p')$ and $g \colon (Y, q) \to (Y', q')$, the convex linear combination $\lambda f + (1 - \lambda) g$, for any $\lambda \in [0,1]$, is given by the data of its underlying function:

(2) $F(\lambda f + (1 - \lambda) g) = F(f) \coprod F(g) \colon X \coprod Y \longrightarrow X' \coprod Y'$

and it is guaranteed to satisfy the appropriate condition making it a morphism from $(X \coprod Y,\; \lambda p + (1-\lambda) q)$ to $(X' \coprod Y',\; \lambda p' + (1-\lambda) q')$. Note that the value of $\lambda$ doesn’t tell us anything about the underlying functions in $\mathbf{FinSet}$; rather, it tells us which objects we are considering in $\mathbf{FinMeas}$.
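
For concreteness, a minimal example: let $f \colon (\{a,b\}, (\tfrac{1}{2},\tfrac{1}{2})) \to (\{\ast\}, 1)$ be the unique map to a one-point space and let $g$ be the identity on $(\{c\}, 1)$. Then $\lambda f + (1-\lambda) g$ is the function sending $a, b \mapsto \ast$ and $c \mapsto c$, regarded as a morphism

$(\{a,b,c\},\; (\tfrac{\lambda}{2}, \tfrac{\lambda}{2}, 1-\lambda)) \longrightarrow (\{\ast, c\},\; (\lambda, 1-\lambda)),$

and the condition holds: the fibre over $\ast$ has total weight $\tfrac{\lambda}{2} + \tfrac{\lambda}{2} = \lambda$, and the fibre over $c$ has weight $1-\lambda$.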

That might clarify things a bit more.

Posted by: Tom Mainiero on May 12, 2022 5:36 PM | Permalink | Reply to this

Re: Matrices

To actually answer your question, though: I don’t think the convex combination construction on morphisms has anything to do with the linear structure you get when you interpret functions between finite sets as matrices.

Posted by: Tom Mainiero on May 12, 2022 6:05 PM | Permalink | Reply to this

Re: Matrices

Everything Tom just said is true.

Posted by: John Baez on May 13, 2022 8:06 PM | Permalink | Reply to this

Re: Matrices

Allen wrote:

It seems a little weird to be working with actual functions between finite sets, rather than matrices where each row is a probability measure.

The latter are called ‘stochastic maps’ and they don’t always decrease entropy. But measure-preserving functions — actual functions between finite sets — always decrease entropy. As I point out, the latter fact is an example of the ‘data processing inequality’. And my slides give this slogan for what’s going on:

Deterministic processing of random data always decreases entropy!
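
For instance (a worked example, not on the slides): take $p = (\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4})$ on a 3-element set and let $f$ merge the last two points, so the pushforward measure is $q = (\tfrac{1}{2}, \tfrac{1}{2})$. Using logarithms base 2,

$H(p) = 1.5 \text{ bits}, \qquad H(q) = 1 \text{ bit},$

so this deterministic map loses half a bit. By contrast, the stochastic map sending a point mass to $(\tfrac{1}{2}, \tfrac{1}{2})$ raises entropy from $0$ to $1$ bit.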

Allen wrote:

Is this what’s happening when you talk about “convex combinations of functions”? I couldn’t quite parse that slide.

Here’s what the slide says:


We can define convex linear combinations of objects in $FinProb$. For any $0 \le \lambda \le 1$, let

(1) $\lambda (X,p) \;+\; (1 - \lambda) (Y,q)$

be the disjoint union of $X$ and $Y$, with the probability distribution given by $\lambda p$ on $X$ and $(1-\lambda) q$ on $Y$.

We can also define convex linear combinations of morphisms.

(2) $f \colon (X,p) \to (X',p'), \qquad g \colon (Y,q) \to (Y', q')$

give

(3) $\lambda f + (1-\lambda) g \;\colon\; \lambda (X,p) + (1 - \lambda) (Y,q) \;\to\; \lambda (X',p') + (1 - \lambda) (Y',q')$

This is simply the function that equals $f$ on $X$ and $g$ on $Y$.


As I emphasized in my actual talk, the big scary-looking equation (3) is really no big deal: it just gives the source and target of a function between finite sets, and its name! And the function itself is very simple: it’s just the function that equals $f$ on the set $X$ and $g$ on the set $Y$.

The point is that taking convex linear combinations is functorial on $FinProb$: we can do it both to objects and to morphisms, and the usual rules of a functor apply. So what I’m really saying is that $FinProb$ is a category internal to the category of convex spaces.
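
If a sanity check helps, here is a tiny Python sketch (mine, not from the talk) that verifies convex-linearity of information loss on a small example, using only the definitions above:

```python
import math

def H(p):
    """Shannon entropy (in bits) of a probability vector p."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def pushforward(f, p, n_out):
    """Push p (on range(len(p))) forward along the function i -> f[i]."""
    q = [0.0] * n_out
    for i, x in enumerate(p):
        q[f[i]] += x
    return q

# f: ({0,1,2}, p) -> ({0,1}, q) merges points 1 and 2.
p = [0.5, 0.25, 0.25]
f = [0, 1, 1]
q = pushforward(f, p, 2)

# g: ({0,1}, r) -> ({0}, s) collapses everything to a point.
r = [0.5, 0.5]
g = [0, 0]
s = pushforward(g, r, 1)

lam = 0.3
# Convex combination of objects: disjoint union with rescaled weights.
pr = [lam * x for x in p] + [(1 - lam) * x for x in r]
# Convex combination of morphisms: f on the X part, g on the Y part
# (the Y part's target is shifted past the 2-element target of f).
fg = f + [2 + j for j in g]
qs = pushforward(fg, pr, 3)

lhs = H(pr) - H(qs)
rhs = lam * (H(p) - H(q)) + (1 - lam) * (H(r) - H(s))
print(lhs, rhs)  # both equal 0.85 (up to floating point)
```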

Posted by: John Baez on May 13, 2022 8:03 PM | Permalink | Reply to this

Slides

I don’t see the slides when I click on the link.

Posted by: Mozibur Rahman Ullah on May 11, 2022 7:04 AM | Permalink | Reply to this

Re: Slides

I do. But if you go here, you should be able to see the slides and a video both for my talk and Tai-Danae Bradley’s related talk “Operads and entropy”.

Posted by: John Baez on May 13, 2022 7:45 PM | Permalink | Reply to this
