## June 5, 2011

### Category-Theoretic Characterizations of Entropy III

#### Posted by John Baez

Some of us recently tried an experiment here: writing a paper on the blog. It was a great success, and now we’re basically done:

While the discussion leading up to this paper (here, here and here) was intense and erudite, the final result is simple and sweet. If the paper itself doesn’t make that clear to you, maybe this summary will:

All this concerns entropy for classical systems. But now Tobias and I are trying to generalize our work to quantum systems. They work a bit differently. Let me explain.

In what follows, I won’t assume any knowledge of quantum theory, or physics at all. I’ll try to define all my terms and state the problem we’re facing in a purely mathematical way. It’s pretty simple.

In our previous ‘classical’ conversation, we talked about the Shannon entropy of a probability measure on a finite set. Following a proposal by Tobias, in our new ‘quantum’ discussion we’ll talk about the von Neumann entropy of a state on a finite-dimensional C*-algebra. I think it’s easiest to understand this idea if we focus on how it generalizes what we did before.

The trick is to view our previous conversation a little bit differently. Instead of focusing on a finite set, we’ll now focus on the algebra of complex-valued functions on that finite set. As we’ll see, this kind of algebra is precisely a ‘commutative C*-algebra’. (Don’t worry, I’ll define that term in a minute.) And then we can try to generalize all our ideas to the not-necessarily-commutative case.

A lot of this is already known. Here’s how it goes.

Definition. A $\ast$-algebra is an associative algebra $A$ over the complex numbers (with multiplicative unit $1 \in A$) equipped with an operation $* : A \to A$ obeying $(a + b)^* = a^* + b^*$ $(a b)^* = b^* a^*$ $(\lambda a)^* = \overline{\lambda} a^*$ and $a^{* * } = a$ for all $a,b \in A$ and $\lambda \in \mathbb{C}$.

Definition. A C*-algebra is a $\ast$-algebra $A$ equipped with a norm making it into a Banach space and obeying $\|a b\| \le \|a \| \|b \|$ and $\|a^* a\| = \|a\|^2$ for all $a, b \in A$.

We will only be interested in finite-dimensional C*-algebras, and then the condition ‘making it into a Banach space’ is redundant.

If you want an example, fear not: finite-dimensional C*-algebras are easy to classify, so I will give you all the examples:

Theorem. Every finite-dimensional C*-algebra is isomorphic to a finite direct sum of matrix algebras.

To be a bit more explicit, the algebra $M_n(\mathbb{C})$ of complex $n \times n$ matrices can be made into a C*-algebra in a unique way. We define the norm of an $n \times n$ matrix $a$ to be $\|a\| = sup_{\|\psi\| = 1} \|a \psi\|$ where the supremum is taken over all unit vectors $\psi \in \mathbb{C}^n$. We define $a^*$ to be the adjoint (that is, conjugate transpose) of the matrix $a$. There is a straightforward way to make the direct sum of two C*-algebras into a C*-algebra, which I’ll gladly explain if you want. So, that gives us lots of finite-dimensional C*-algebras. Finally, we’ll only be interested in homomorphisms of C*-algebras that preserve both the algebra structure and the $\ast$ operation (but not necessarily the norm). So, ‘isomorphic’ means isomorphic in a way that preserves both these structures. Such an isomorphism automatically preserves the norm as well.

As a corollary, we have:

Theorem. Every finite-dimensional commutative C*-algebra is isomorphic to the algebra of complex functions on a finite set.

The reason is that only $1 \times 1$ matrix algebras are commutative. So, our previous result implies that a finite-dimensional commutative C*-algebra is isomorphic to a finite direct sum of $1 \times 1$ matrix algebras. But a finite direct sum of $1 \times 1$ matrix algebras is isomorphic to the algebra of complex functions on a finite set!

In case you’re a stickler for detail, I should add that the algebra of functions on a finite set can be made into a C*-algebra in a unique way, which I’ll again be glad to explain if you ask. If $X$ is a finite set, we’ll write $C(X)$ for the C*-algebra of complex-valued functions on $X$.

We can recover $X$ from $C(X)$. So, the idea is that finite-dimensional commutative C*-algebras really ‘are’ just finite sets. The process of generalizing our work from ‘classical’ to ‘quantum’ situations simply amounts to generalizing various ideas from commutative C*-algebras to not-necessarily-commutative ones. But we think of it as generalizing ideas from finite sets to finite direct sums of matrix algebras.

I should be a bit more precise here. If we have a map $f: X \to Y$ between finite sets, we can pull back any complex-valued function on $Y$ to one on $X$, and this gives a homomorphism from $C(Y)$ to $C(X)$. Note the contravariance! So we have a contravariant functor from the category of finite sets to the category of finite-dimensional commutative C*-algebras, say: $FinSet^{op} \to FinCommC^*$ and in fact this is an equivalence of categories. So, when I said that finite-dimensional commutative C*-algebras really ‘are’ just finite sets, I was lying slightly: there’s an ‘op’ we need to be careful about. Of course we also have an inclusion of categories $FinCommC^* \hookrightarrow FinC^*$ where we forget about commutativity, so we get an inclusion $FinSet^{op} \hookrightarrow FinC^*$ This is lets us take ideas about finite sets and try to generalize them to finite direct sums of matrix algebras.

And it’s well-known how to do this, when it comes to entropy! Let me summarize.

1. Given a complex function on a finite set we can sum it over that set. Similarly, given a matrix we can take its trace: the sum of its diagonal entries. Both these are special cases of the trace on a finite direct sum of matrix algebras: for this, you just take the trace of each matrix in the direct sum and add them all up. So, where you say $\sum_{i \in X}$ in our old work, now you’ll see $tr$.

2. A complex function on a finite set is nonnegative iff it’s of the form $a a^*$ for some other function on that set. So, we define an element of a C*-algebra to be nonnegative iff it’s of the form $a a^*$ for some other element of that C*-algebra. In particular, a matrix is nonnegative iff it has an orthonormal basis of eigenvectors with nonnegative eigenvalues.

3. A probability distribution on a finite set is a complex function on that set that’s nonnegative and sums to 1. So, we define a state on a finite-dimensional C*-algebra $A$ to be an element $p \in A$ that’s nonnegative and has $tr(p) = 1$.

4. Given a probability distribution on a finite set, say $p: X \to \mathbb{C}$, its Shannon entropy is $H(p) = - \sum_{i \in X} p_i \, \ln(p_i)$ So, given a state $p$ on a finite-dimensional C*-algebra $A$, we define its von Neumann entropy to be $H(p) = - tr(p \, \ln(p))$ Here you may wonder how we’re defining $\ln(p)$. It’s enough to explain this for matrix algebras, since if your C*-algebra is a finite direct sum of these, you just take the logarithm in each piece. If $p$ is nonnegative matrix it has a orthonormal basis of eigenvectors with nonnegative eigenvalues. So, to get $\ln(p)$, just keep the same eigenvectors and take the logarithm of each eigenvalue. The problem of taking the logarithm of zero is irrelevant, just as it was for Shannon entropy.

We studied Shannon entropy using $FinProb$, the category of finite sets equipped with probability distributions. What about von Neumann entropy? Let’s try the same thing and define a category $FinState$ to be the category of finite-dimensional C*-algebras equipped with states.

An object of $FinState$ is a pair $(A,p)$ where $A$ is a finite-dimensional C*-algebra and $p$ is a state on $A$. But what’s a morphism in $FinState$?

Given a probability distribution on a finite set $X$, we can push it forwards along a map $f: X \to Y$ and get a probability distribution on $Y$. But remember, we’ve got an inclusion of categories $FinSet^{op} \hookrightarrow FinC^*$ Beware the deadly op! So we should hope that given a state $q$ on a finite-dimensional C*-algebra $B$ and a homomorphism $f: A \to B$, we can pull $q$ back to get a state on $A$. And this is true. I’ll be glad to explain how if you ask.

So, a morphism $f: (A,p) \to (B,q)$ in $FinState$ is a homomorphism of C*-algebras $f: A \to B$ such that pulling back $q$ along $f$, we get $p$. With this definition we get an inclusion $FinProb^{op} \hookrightarrow FinState$ which embeds our old ‘classical’ story into the new ‘quantum’ story we’re telling now.

Now, given a morphism $f: (A,p) \to (B,q)$ in $FinState$ we can copy our previous work and define the entropy change $\Delta H(f) = H(q) - H(p)$ more or less as we did in our previous work, modulo the deadly ‘op’. But there’s one big difference right away: this change in entropy can be either positive or negative! Before it always took the same sign.

So, we might hope that $\Delta H$ gives a functor $\Delta H: FinState \to \mathbb{R}$ where we treat $\mathbb{R}$ as a category with one object, real numbers as morphisms, and addition as composition. And it’s true! Indeed, it’s utterly obvious.

So the fun starts when we try to characterize $\Delta H$ as the unique functor, up to a constant multiple, such that some conditions hold.

Before, when we did this for Shannon entropy, we needed two conditions (in addition to a condition that says we have a functor). Both those conditions have analogues for von Neumann entropy… and they’re both true!

1. Convex linearity. Given objects $(A,p)$ and $(B,q)$ in $FinState$, and a number $0 \le \lambda \le 1$, we can take a direct sum of C*-algebras and a convex linear combination of states to get an object $(A \oplus B, \lambda p + (1-\lambda) q)$. Given morphisms $f : (A,p) \to (A',p'), \qquad g: (B,q) \to (B',q')$ their direct sum gives a morphism $f \oplus g: (A \oplus B, \lambda p + (1-\lambda) q) \to (A' \oplus B', \lambda p' + (1-\lambda) q')$ To be cute, and to copy what we did in the Shannon entropy case, we could call this morphism $\lambda f + (1-\lambda) g$ Beware! We aren’t really taking a linear combination of the C*-algebra homomorphisms $f$ and $g$, so perhaps this is bad notation. But it’s cute, because then this condition holds: $\Delta H(\lambda f + (1-\lambda) g) = \lambda \Delta H(f) + (1-\lambda) \Delta H(g)$ At least I think it does—I’ll have to check.

2. Continuity. The entropy change $\Delta H(f)$ depends continuously on $f$. This is slightly subtler than in the Shannon case, because we can ‘slightly change’ a homomorphism between noncommutative C*-algebras, while we can’t ‘slightly change’ a map between finite sets: we can only slightly change the probability distributions on those sets. But it’s still true.

So, let us try to settle this question:

Question. Is every convex-linear and continuous functor $F: FinState \to \mathbb{R}$ a constant multiple of the change in von Neumann entropy? In other words, does there exist a constant $c \in \mathbb{R}$ such that $F(f) = c \; \Delta H(f)$ for all morphisms $f$ in $FinState$?

One fly in the ointment is that now entropy change can take either sign. This is why $F$ takes values in $\mathbb{R}$ rather than $\mathbb{R}^+$. And this means we can’t instantly conclude that $F$ of an isomorphism is zero, as we did before. Maybe we need more conditions now. Or maybe we just need to be smarter.

There’s a lot more to say, but this should at least start the conversation…

By the way:

Puzzle. If you wanted to classify finite-dimensional C*-algebras using pictures of some famous sort, what kind of pictures could you use?

Posted at June 5, 2011 3:08 AM UTC

TrackBack URL for this Entry:   http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2404

### Re: Category-Theoretic Characterizations of Entropy III

But there’s one big difference right away: this change in entropy can be either positive or negative!

There is Araki’s notion of relative entropy of states on von Neumann algebras (a reference is here). In the finite dimensional case this is

$S(\phi / \psi) = tr\left( \rho_\psi (log \psi - log \phi) \right) \,.$

This enjoys what is called the “strict positivity” property, which says that $\phi(1) = \psi(1)$ implies that $S(\phi/\psi) \geq 0$.

Posted by: Urs Schreiber on June 5, 2011 1:57 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

In the finite-dimensional case, Araki’s concept just boils down to ‘ordinary’ quantum relative entropy: when $\rho$ and $\sigma$ are density matrices on the same Hilbert space, then the quantum relative entropy is

$\tr[\rho(\log\rho-\log\sigma)]$

which reproduces classical relative entropy in the case that both density matrices are diagonal. And indeed, this is always non-negative.

But what does this have to do with the entropy change under morphisms in $\Fin\State$? I don’t see the relation.

Maybe we can discuss this with some concrete examples in mind which show that the entropy change can take on both signs.

Example $(-)$: The entropy change can be negative. This we already know from the classical case, so we can consider the classical example explained in the third paragraph of our paper. In $C^\ast$-algebra world, classical means commutative, so let’s consider the $C^\ast$-algebras $\mathbb{C}$ and $\mathbb{C}^2$. By linearity and preservation of the unit element, there’s only a single morphism $\mathbb{C}\rightarrow\mathbb{C}^2$ in $\FinC^\ast$. Under the contravariant equivalence to $\Fin\Set$, this is the unique map from a two-element set to a one-element set.

Following John’s definition from above, a state on $\mathbb{C}^2$ is of the form $(\lambda,1-\lambda)\in\mathbb{C}^2$ for $\lambda\in [0,1]$. On $\mathbb{C}$, there is a unique state, namely $1\in\mathbb{C}$. For any $\lambda$, these two states are compatible with the unique morphism $\mathbb{C}\rightarrow\mathbb{C}^2$.

So here, the entropy of the domain is $0$, while for non-trivial $\lambda$ it is non-zero on the codomain. Since we should think of the morphism as a process from the codomain to the domain, this is an entropy loss.

Example $(+)$: The entropy change can be positive. Let’s consider the $C^\ast$-algebras $\mathbb{C}^n$ and $M_n(\mathbb{C})$. There is a morphism $\mathbb{C}^n\rightarrow M_n(\mathbb{C})$ which corresponds to the inclusion of the subalgebra of diagonal matrices. Pulling back a state from $M_n(\mathbb{C})$ to $\mathbb{C}^n$ simply means restricting to the diagonal entries, which is a nice map $M_n(\mathbb{C})\rightarrow \mathbb{C}^n$.

Now we can for example consider the pure states $p\in M_n(\mathbb{C})$, which are defined to be the rank-$1$ projection operators. (It should be clear that these are states.) These all have an entropy of $0$. However, the diagonal entries of a generic pure state are an $n$-element probability distribution with non-zero entropy! In this case, we have a morphism in $\Fin\State$ with an entropy gain. In quantum mechanics, this phenomenon is part of the theory of decoherence. Or, a bit more elementary: pulling back to $\mathbb{C}^n$ turns a quantum state into the outcome probability distribution of a measurement! So, we recover a basic fact of quantum theory: a measurement on a quantum state can have entropy even when the quantum state does not, since the measurement outcome are not deterministic!

Posted by: Tobias Fritz on June 6, 2011 3:05 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

But what does this have to do with the entropy change under morphisms in $FinState$ ?

Ah, then I misunderstood the motivation. I had gotten the impression that there was bewilderment on why the sign could change, and just meant to point to the notion of entropy where it does not. Sorry for the confusion.

Posted by: Urs Schreiber on June 6, 2011 4:26 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

At one point I wanted to use relative entropy to make states into a Lawvere metric space, which is a category enriched over $[0,\infty]$. The relative entropy is always nonnegative:

$S(\phi / \psi ) \ge 0$

It’s not symmetric; in general

$S(\phi / \psi ) \ne S(\psi / \phi)$

but that’s okay for a Lawvere metric space. The real problem is that the triangle inequality fails!

If we try to cure this problem, we’re naturally led to the Fisher information metric.

Posted by: John Baez on June 7, 2011 10:40 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

The real problem is that the triangle inequality fails!

That’s because the relative entropy is a kind of squared distance, e.g., comparison to squared Hellinger distance, Pythagorean relation, etc.

A divergence is not necessarily symmetric, that is, the relation $D[P : Q] = D[Q : P]$ does not generally hold, nor does it satisfy the triangular inequality. It usually has the dimension of squared distance, and a Pythagorean-like relation holds in some cases. (Amari)

Posted by: David Corfield on June 7, 2011 12:30 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

John wrote:

The real problem is that the triangle inequality fails!

David wrote:

That’s because the relative entropy is a kind of squared distance…

Are you claiming that the square root of relative entropy obeys the triangle inequality? That would be great! But I have the sad feeling you’re not really claiming that.

Posted by: John Baez on June 7, 2011 12:41 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

In 2008, I asked David exactly this question (using the synonym “Kullback-Leibler divergence” for “relative entropy”). David replied:

I’ve never seen it said anywhere that the square root of the KL-divergence satisfies the triangle inequality, but then I’ve never seen it said that it fails.

Then Suresh Venkat replied (using the synonym “KL distance” for “relative entropy”):

Here’s what it is known: if you construct the symmetrized version of the KL distance, termed the Jensen-Shannon distance (JS(p,q) = 0.5(KL(p,m) + KL(q,m)), m = 0.5(p+q)), then this is also a Bregman divergence, AND its square root is a metric.

I don’t know what a Bregman divergence is, but perhaps that doesn’t matter here. Let me prettify Suresh’s formula: $JS(p, q) = \frac{1}{2} \Bigl( KL\Bigl(p, \frac{p + q}{2}\Bigr) + KL\Bigr(q, \frac{p + q}{2}\Bigr) \Bigr).$ According to Suresh, $\sqrt{JS(-, -)}$ is a metric.

I also don’t know how important the symmetrization is here. Probably Suresh is like most people in using the word “metric” to include the symmetry axiom, but some of us round here are more relaxed about that. It’s the triangle inequality we’re concerned with, anyway.

Posted by: Tom Leinster on June 7, 2011 1:58 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Also, the symmetrization that Suresh mentions is curiously elaborate. If someone said to me “symmetrize the function $\rho(-, -)$”, I’d think of either $\frac{1}{2} (\rho(p, q) + \rho(q, p))$ or, in some circumstances, $\max\{ \rho(p, q), \rho(q, p) \}.$ (Now that I think about it, I guess any kind of mean of $\rho(p, q)$ and $\rho(q, p)$ would be reasonable.) But I wouldn’t think of writing down Suresh’s formula. I don’t understand what it really means.

I strongly suspect that the square root of entropy does not satisfy the triangle inequality. It can’t be that hard, and if it were true I think David or Suresh would know about it.

Posted by: Tom Leinster on June 7, 2011 2:08 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I thought I might as well bash it out.

No, the square root of relative entropy does not obey the triangle inequality. Here’s an example using probability measures on a two-element set. Take $p = (0.9, 0.1), \quad q = (0.2, 0.8), \quad r = (0.1, 0.9).$ Then, writing $H(p \Vert q) = \sum_i p_i \log \frac{p_i}{q_i},$ for the relative entropy (Kullback-Leibler divergence), we have $\sqrt{H(p \Vert q)} + \sqrt{H(q \Vert r)} - \sqrt{H(p \Vert r)} = -0.0447... \lt 0.$ Here I’ve used natural logarithms, but choosing a different base only affects the quantity computed by a positive constant factor: it’s still negative.

Posted by: Tom Leinster on June 7, 2011 3:00 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Well done. I was just wondering whether it would be worth bashing out a case and you’ve done one.

In the same family (Amari’s $\alpha$-divergences) as the KL-divergence is the squared Hellinger distance, whose square root is unsurprisingly a distance.

There’s a poster to be had of lots of different distances for probability distributions, with pictures of their inventors. One for the office perhaps.

Posted by: David Corfield on June 7, 2011 4:09 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

That poster is pretty nice — though the immediate reaction of one of my colleagues was “they’re all men”.

Posted by: Tom Leinster on June 9, 2011 12:53 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I guess my immediate reaction to that is: “and therefore?” (Sorry, I’m reading this several weeks after the fact.)

I’m wondering whether your colleague has examples of women mathematicians who were omitted and shouldn’t have been?

Posted by: Todd Trimble on June 27, 2011 7:01 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I think it was just an inconsequential comment. (The word “though” in my comment might have suggested more criticism than was intended.) I don’t know of any women who should have been on there, and I suspect my colleague doesn’t either; it’s not an area of expertise for either of us.

What I found interesting about that brief interaction was that I hadn’t noticed they were all men. There are about 20 faces on that poster, mostly from the 20th century. If it had been a poster of 20 20th century journalists, say, and they’d all been men, I reckon I would have noticed pretty quickly. But for mathematicians it simply didn’t register.

Posted by: Tom Leinster on June 27, 2011 9:19 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Tom wrote:

No, the square root of relative entropy does not obey the triangle inequality.

Oh, darn! But it’s nice that you checked. I’m completely taking your word for it; I haven’t redone your calculation. I don’t doubt you, but it would be interesting to understand how the ‘symmetrization’ in the Jensen-Shannon divergence saves the day.

As David point out, there are lots of metrics on probability distributions. The problem is finding some with clear conceptual interpretations. I like relative entropy because I know what it means: it’s ‘the amount of information gained when you start out thinking the right probability distribution is $q$ and someone tells you it’s $p$’.

The Jensen-Shannon divergence is something like ‘the expected amount of information gained when you start out thinking the right probability distribution is $(p+q)/2$ and someone flips a coin and tells you it’s either $p$ or $q$’. That’s a bit less compelling.

Posted by: John Baez on June 8, 2011 10:54 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

All those re-readings and alpha homogeneity
let a q sneak in.

${H}_{\alpha }\left(\lambda p\right)={\lambda }^{\alpha }{H}_{q}\left(p\right)$

It was shown that 2 probability measures extend to the Cartesian product and H( p X q ) = H(p) + H(q).
Is there a similar natural extension of p to power set of p = 2^p ?
And then what is H( 2^p )? That is, does the evaluation of the map of numbers to points act as sum on subsets in such a way as to be compatible with the uniform measure to give the power set a probability measure?

William

Posted by: William Patton on June 5, 2011 8:46 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

William wrote:

All those re-readings and alpha homogeneity let a $q$ sneak in.

I finally understand what you meant! I see that Tom has already corrected this typo in the PDF version of our paper, but I just fixed it now in the wiki version.

Is there a similar natural extension of p to power set of p?

So you’re trying to turn a probability distribution $p$ on a set $X$ into a probability distribution on $2^X$ in some nice way? Here’s one way: you go through the elements of $X$ one at a time, and put each element $i \in X$ into your subset $S$ with probability $p_i$.

Posted by: John Baez on June 8, 2011 11:06 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I was hoping for something cool involving the layered structure of 2^X along the binomial coefficients. Maybe it could even involve the Hodge star map.

I am not sure I follow the hint you give.
Lets try the easy case of the set {1,2,3}
and the uniform probability u = 1/3 for each point or even the heavy probability 1,0,0. If I write with lots of braces, 2^X becomes

[
{{}} , empty layer
{ 1 , 2, 3 } , singleton layer
{ {1,2} , {1,3} , {2,3} } , pair layer
{ { 1,2,3} } , triple layer
]

So I hope for 8 numbers with the empty layer number being 0
0
f1(u) f1(u) f1(u)
f2({u,u}) f2({u,u}) f2({u,u})
f3({u,u,u})

I do not really see what to do for f2() with your description.
Certainly I could take f1 = pi*1, f2,f3 = 0 . That just gives me back my original 3 numbers and the other 4 are 0. It is not very satisfying.

I think the following does work, maybe it is what you said?
In the description beginning with all the braces, put the p(i) in place of each i.
Then replace each pair of braces with the mean by the count of the subset size.

This gives
[ uniform………heavy
0 ………….. 0
3u/3 ……….. 1/3
(3(2u/2))/3 …. ( 1/2+1/2+0 )/3
3u/3 ……….. 1/3
]
it seems to be that to remove the bracket here I use ordinary sum in the reals,
maybe this is fair because at this layer there are no more Braces{} to enforce mean.
There are only numbers left to sum - no more sets.

I got here because I am uncomfortable about the issue of ignoring the Domain X
of the measure space {X,B,u}. In some sense I want the differential du so that for finite sets, integral on X ( 1x du) = 1 for the characteristic function 1x of X.

In particular, if Y contains X then a measurable F on (X,Bx,sx) with sum 1 does not extend to (Y,By,sy) by mapping to 0 the elements of Y-X. Pushing down from Y to X does not work even if Fy is 0 on Y-X because dy is smaller than dx so sum Fdy = 1 implies sum Fdx >1.

One other strange thing happens with this form of measure ux,uy. There is an easy choice for sum of probabilities Fx,Fy for disjoint X,Y. The Union comes with the measure du(X+Y) = 1/(|X|+|Y|) so
sum (( pi |Y|duX + qj|X|duY )du(X+Y) ) = 1 is a fine probability on the union.

This is just supposed to be saying that if X is a 3 set any Y a 5 set then a probability on X is a function from X -> [0,inf] with sum = |X| ( or meanX = 1 ).
So for disjoint X,Y sumX is 3 and sumY is 5 implies sum(X+Y) = |X+Y| = 8.
This is a particular convex sum with lambda = |X|/|X+Y| but convexity has not yet reared its head.

Shouldn’t in some sense probabilities on an N-dim space depend on the volume form dN?
The simplex is kind of over normalized as the volume is half the unit N volume. But if I look at the vectors of length 2, then the volume of the N dimensional 2 box is 2^n times the unit Box.

Posted by: William Patton on June 8, 2011 11:52 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Maybe we can start with the following: given a functor $F:\Fin\State\rightarrow\mathbb{R}$, is there a function on $S$ on the objects of $\Fin\State$ such that

$F(f)=S(\cod(f))-S(\dom(f)) \qquad ?$

In the case of $\Fin\Prob$, we have done this in our paper by noting that the category has a terminal element, which gives us a natural choice for $S$ by assigning to each object the value of $F$ on the unique morphism to the terminal object.

Since I am still wearing my cohomology glasses, let me phrase this as follows: functoriality

$F(f)-F(g\circ f)+F(g) = 0$

means that $F$ is a $1$-cocyle over the edges of (the nerve of) $\Fin\Prob$. What we would like to know is whether this implies that $F$ is a coboundary!

So for $\Fin\Prob$, we have used the existence of a terminal object to show that first cohomology of the nerve vanishes, $H^1(\Fin\Prob)=0$. A terminal object in $\Fin\State$ does unfortunately not exist, so it’s not clear whether the statement still holds. At least we can reformulate the question as

$H^1(\Fin\State) = 0 \qquad ?$

And possibly now we can attack it with topological methods? After all, the nerve of (the skeleton of?) $\Fin\State$ is just some simplicial complex, and we need to know its cohomology groups!

A technical question: given equivalent categories, what is the relation between their nerves? Are they homotopy equivalent?

Posted by: Tobias Fritz on June 6, 2011 3:26 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I just received this email from John, asking to post it on his behalf since he’s unable to comment at the moment:

(begin John)

Tobias wrote:

So for $\Fin\Prob$, we have used the existence of a terminal object to show that first cohomology of the nerve vanishes, $H^1(\Fin\Prob)=0$. A terminal object in $\Fin\State$ does unfortunately not exist, so it’s not clear whether the statement still holds

Beware the evil ‘op’! $FinState$ doesn’t contain $FinProb$ as a full subcategory; it contains $FinProb^{op}$. The terminal object of $FinProb$ becomes the initial object of $FinState$. Namely, the initial C*-algebra, $\mathbb{C}$, equipped with its only possible state.

Having an initial object is also sufficient for the nerve of a category to be contractible.

(/end John)

Whoops! And I thought I had studied noncommutative geometry long enough to be able to avoid this sort of blunder. So yes, you’re of course right, and we can use the same argument to prove that any functor $F:\Fin\State\rightarrow\mathbb{R}$ is of the required form.

But then we also want to postulate convex linearity and continuity in order to (potentially) prove that differences of von Neumann entropy are the unique such functor (up to a scalar). I don’t know how we could prove or disprove this, though suspect that it’s more likely to be false than true.

Would it help to consider not just the pull-back of states from the codomain algebra to the domain algebra, but also a push-forward from the domain algebra to the codomain algebra? Defining this is simple: given a state as an element of the domain algebra, we can simply apply the homomorphism to get an element of the codomain algebra. When the former element is positive and has unit trace, then so does the latter.

This is inspired by Tom’s way of thinking in the classical case: while a measure on a finite set is ‘the same thing’ as a positive-valued function, the two are morally different. In particular, the compatibility with maps between spaces is different. Spinning this further, one can argue that what we really should do is to characterize relative entropy.

Of course this has been said several times before in this discussion. But I wonder whether doing this becomes necessary in the quantum case.

Ok, sorry for the rant, I have nothing better to say and shut up now ;)

Posted by: Tobias Fritz on June 6, 2011 6:20 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Tobias wrote:

A technical question: given equivalent categories, what is the relation between their nerves? Are they homotopy equivalent?

Sorry, I just saw this question now. Yes, they are: you can use the equivalence of categories to build a homotopy equivalence between the nerves. But the converse is false, obviously. The ‘walking morphism’:

$\bullet \to \bullet$

and the ‘walking isomorphism’:

$\bullet \stackrel{\sim}{\to} \bullet$

have homotopy equivalent nerves but are not equivalent as categories.

In fact there are two famous ‘model category’ structures on Cat: in one the weak equivalences are equivalences of categories, while in the other the weak equivalances are functors inducing homotopy equivalences of their nerves.

Posted by: John Baez on June 8, 2011 11:16 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

A technical question: given equivalent categories, what is the relation between their nerves? Are they homotopy equivalent?

Sorry, I just saw this question now. Yes, they are

By the way, information along these lines is being collected at geometric realization of categories.

Posted by: Urs Schreiber on June 8, 2011 1:30 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Posted by: Mike Shulman on June 6, 2011 11:29 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

On my other blog I wrote:

Like Frankenstein’s monster, our paper’s main result was initially jolted into life by huge blasts of power: in this case, not lightning but category theory. It was awesome to behold, but too scary for this blog.

First Tom Leinster realized that the concept of entropy fell out — unexpectedly, but very naturally — from considerations involving ‘operads’, which are collections of abstract operations. He was looking at a particular operad where the operations are ‘convex linear combinations’, and he discovered that this operad has entropy lurking in its heart. Then Tobias Fritz figured out a nice way to state Tom’s result without mentioning operads. By now we’ve taught the monster table manners, found it shoes that fit, and it’s ready for polite society.

In short, while the original idea was developed using operads, the main theorem can be stated rather nicely without operads, and that will let about 50 times as many people understand what it’s saying.

(I think that number is about right. I think any math grad student or any theoretical physicist could understand our paper now, for example.)

Posted by: John Baez on June 7, 2011 7:25 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I hope they don’t disappear forever. That operadic composition for probabilities has something to do with what’s called the branching property for information measures, which has something to do with the cocycle equation, which has something to do with cohomology.

In an ideal world, entropy will turn out to be so special as it’s the coboundary of a cocycle which is a representative of some nontrivial cohomology class.

Posted by: David Corfield on June 7, 2011 11:33 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

David wrote:

I hope they don’t disappear forever.

The blog articles still exist, you know! But I think we’d need to do something a bit more exciting with operads and entropy to justify a formal paper on that subject. There are too many papers to write and not enough time. Right now I’m imagining a rather short paper characterizing von Neumann entropy followed by a longer paper on ‘functorial thermodynamics’ (or whatever we decide to call it), which explains the category-theoretic view of Shannon entropy, Rényi entropy, Tsallis entropy, the partition function, and free energy. We’ve worked out a lot of that story, but need to organize it more clearly and write it up.

In an ideal world, entropy will turn out to be so special as it’s the coboundary of a cocycle which is a representative of some nontrivial cohomology class.

That sounds rather odd: the coboundary of a cocycle is zero. Entropy is special, and zero is special too, but I don’t think you’re trying to say that therefore entropy is zero.

Posted by: John Baez on June 7, 2011 12:02 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

That sounds rather odd: the coboundary of a cocycle is zero.

I wasn’t clear. This is about Tobias’s point that $f(x) = x log x$ is a cocycle multiplicatively, and its additive coboundary is Shannon entropy.

Posted by: David Corfield on June 7, 2011 3:57 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Sorry, what I meant by that overly terse comment was not that I was sad that the operads had vanished from the paper, but that I thought maybe an operadic approach would also shed light on the quantum story (and could perhaps then be excised from the published version also).

Posted by: Mike Shulman on June 7, 2011 3:48 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Here’s one way to characterize von Neumann entropy. It’s probably not the nicest, but it’s a way to get started.

Suppose we have a functor

$F: FinState \to \mathbb{R}$

By an argument Tobias gave, we know that there’s a function $E$ from objects of $FinState$ to real numbers such that for every morphism $f: (A,p) \to (B,q)$ in $FinStat$,

$F(f) = E(B,q) - E(A,p)$

We’re just trying to find conditions that guarantee that $E$ is a nonnegative multiple of the von Neumann entropy.

For starters, let’s assume $F$ is continuous and convex-linear as described above. Then it restricts to a functor

$F \circ I : FinProb^{op} \to \mathbb{R}$

which is also continuous and convex-linear, where $I$ the inclusion

$I: FinProb^{op} \to FinState$

Next, let’s assume that in fact

$F \circ I : FinProb^{op} \to \mathbb{R}_+$

In other words, assume that $F \circ I$ applied to any morphism in $FinProb^{op}$ gives a nonnegative number. By the main theorem in our previous paper, this implies that $F \circ I$ is a constant multiple of the change in Shannon entropy.

So, this means that the function $E(A,p)$ is a constant multiple of von Neumann entropy at least when $A$ is a commutative C*-algebra.

I don’t see how this implies the same result for noncommutative C*-algebras. But I haven’t found a counterexample. This question should be fairly easy to settle: are there funny ‘nonstandard’ notions of entropy for quantum systems that restrict to Shannon entropy for classical ones? But instead of thinking about that now, I just want to find some characterization of the usual von Neumann entropy.

So, I’ll use an idea from here:

Stephanie works at the CQT, so it’s nice to be using her work… and this is a very interesting paper for people trying to understand entropy more deeply.

(I’d never heard of the New Journal of Physics. Will they still call it that several centuries from now? Or will they rename it the Not-So-New Journal of Physics? Well, I guess they didn’t do that with New York…)

Anyway, translating their idea to our context, look at any object $(B,q) \in FinState$. This is a C*-algebra $B$ together with a state on it. There will always be at least one morphism

$i: (A,p) \to (B,q)$

coming from the inclusion of a maximal commutative sub-C*-algebra $A \subseteq B$. And it’s a fact that the von Neumann entropy of $q$ equals the infimum of the entropies of $p$ over all such $(A,p)$. So now let’s assume:

Condition X: For any object $(B,q) \in FinState$, the supremum of $F(i)$ over all morphisms $i: (A,p) \to (B,q)$ where $i$ is the inclusion of a maximal commutative sub-C*-algebra of $A \subseteq B$ equals zero.

There are some tricky signs to keep track of here, thanks to the ‘evil op’, but I think we want a supremum here! If I got this right, condition X guarantees that $E(B,q)$ is the infimum of $E(A,q)$ over all commutative C*-subalgebras $A \subseteq B$.

This completely determines $E$ (since we already know that for commutative C*-algebras $E$ is a multiple of Shannon entropy). And, by the fact I mentioned, it forces $E$ to be a multiple of the von Neumann entropy!

In short:

Theorem (?) Suppose we have a functor $F: FinState \to \mathbb{R}$ that is continuous and convex-linear, whose restriction to $FinProb^{op}$ is nonnegative, and that obeys Condition X. Then $F$ is a nonnegative multiple of the change in von Neumann entropy.

We can make Condition $X$ sound a bit more category-theoretic, as follows. Suppose $C$ is a full subcategory of $D$. Say a maximal subobject in $C$ of $d \in D$ is a subobject $c \hookrightarrow d$ that lies in $C$ and is maximal with this property. In other words, it’s not properly contained in any other subobject of $d$ that lies in $C$.

Then I believe $F$ obeys Condition X iff for any object of $FinState$, the supremum of $F(i)$ over all inclusions $i$ of maximal subobjects in $FinProb^{op}$ is zero.

Posted by: John Baez on June 7, 2011 8:12 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I should add, in case it’s not obvious, that this attempted characterization of von Neumann entropy amounts to saying “If you liked Shannon entropy, you’re gonna love von Neumann entropy.” In other words, we’ve got a nice characterization of Shannon entropy for $FinProb$, and now we’re showing that any well-behaved extension of this concept from $FinProb$ to $FinState$ must give the von Neumann entropy.

This seems to be the main way people justify von Neumann entropy. All the characterizations of von Neumann entropy that I’ve seen rely on the special ‘classical’ case of Shannon entropy somehow. Does anyone know that doesn’t?

It seems a bit unsatisfactory for the characterization of entropy in quantum theory to rely so heavily on its characterization in the classical case. The world is quantum-mechanical, after all.

I sort of like Tobias’ idea of using both pushforwards and pullbacks. Nonnegative elements of a finite-dimensional C*-algebra $A$ may be identified with nonnegative linear functionals on that algebra, thanks to the trace on $A$, which provides an vector space isomorphism $A \cong A^*$. So, an algebra homomorphism $A \to B$ gives rise to a linear map $B^* \to A^*$ as usual, but this can then be identified with a linear map $B \to A$, which is not an algebra homomorphism but sends nonnegative elements to nonnegative elements!

I’m hoping that somehow this idea, combined with the inclusions of maximal commutative C*-algebras in any C*-algebra, together with the fact that we know a characterization of entropy that works for finite-dimensional commutative C*-algebras (from our last paper), will somehow let us state a nice characterization of entropy for all finite-dimensional C*-algebras that doesn’t actually mention commutative ones.

Posted by: John Baez on June 7, 2011 9:16 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

This seems to be the main way people justify von Neumann entropy. All the characterizations of von Neumann entropy that I’ve seen rely on the special ‘classical’ case of Shannon entropy somehow. Does anyone know [one] that doesn’t?

Unfortunately I don’t, but it’s interesting in this connection to consider the characterization of the Schatten norms contained in the paper of Aubrun and Nechita that Tom blogged about recently. The Schatten norms are typically motivated by saying, “We already know that $L^p$ norms are important; these are the noncommutative version.” Aubrun and Nechita prove that the Schatten norms are uniquely characterized among unitarily invariant norms by the fact that they play nicely with tensor products. They derive this as a consequence of a related characterization of (commutative) $\ell^p$ norms, but the statement itself doesn’t mention the commutative case at all.

Posted by: Mark Meckes on June 7, 2011 3:38 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Thanks, Mark! So maybe using the relation between $L^p$ norms and Rényi or Tsallis entropy, which must extend to a relation between Schatten norms and the quantum version of Rényi or Tsallis entropy, we can parlay the characterization of Schatten norms into a characterization of quantum Rényi or Tsallis entropy.

The idea of ‘playing well with the tensor product’ is already incorporated in Ochs’ characterization of von Neumann entropy, as reported by Tobias here. So, that’s another sign that this is a handy condition.

Posted by: John Baez on June 8, 2011 4:57 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

When the Aubrun–Nechita paper first appeared, I asked Guillaume Aubrun if they had thought about how their results related to characterizations of Rényi entropy, even in the commutative case. (That was on my mind at the time because of the conversations going on at Azimuth.) He said they hadn’t yet, but would. One shortcoming of this kind of approach is that (as far as I see) both the results and the methods of the Aubrun–Nechita paper (and the earlier Fernández-González–Palazuelos–Pérez-García paper we learned about here) only apply to actual $\ell^p$ norms, i.e., in the case $p \ge 1$, which keeps us from being able to take the limit $p\to 0$ to get at Shannon and von Neumann entropy. Tom faced the same issue in his application of these results to characterizing $p$-means: he had to restrict to $p \ge 1$ although the $p$-means are of interest for all $p \in [-\infty, \infty]$.

Posted by: Mark Meckes on June 8, 2011 2:33 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Mark mentioned the unwelcome restriction to $p \geq 1$, although

the $p$-means are of interest for all $p \in [-\infty, \infty]$.

I don’t actually know of any characterization of the $p$-means ($=$ power means) that covers negative $p$.

In Hardy, Littlewood and Pólya’s book Inequalities, there’s a characterization of the $p$-means covering $p \in (0, \infty)$. (It takes a bit of work to extract it in elementary form, as it’s phrased in the language of Stieltjes integrals.) There must surely be other similar characterizations out there, and presumably someone’s tried to address the full $p \in [-\infty, \infty]$ case—or at least, $p \in (-\infty, \infty)$. But personally, I’ve never seen this done. Has anyone else?

Just to clarify a bit: there are at least two things that might be meant by “a characterization of the power means” or “a characterization of the Rényi/Tsallis/… entropies”. The question is: do you specify the order $p$/$\alpha$ beforehand?

For example, Hardy, Littlewood and Pólya’s theorem says: take a system of functions of a certain kind satisfying certain axioms (not mentioning anything called “$p$”). Then there exists $p \in (0, \infty)$ such that the system of functions is the power mean of order $p$.

Similarly, Aubrun and Nechita’s theorem says: take a system of norms of a certain kind satisfying a certain axiom (not mentioning anything called “$p$”). Then there exists $p \in [1, \infty]$ such that the system of norms is the norm of order $p$.

But our characterization of Tsallis entropy said: take $\alpha \in (0, \infty)$ and take a map of a certain kind satisfying certain axioms that do mention $\alpha$. Then the map is Tsallis entropy of order $\alpha$.

Posted by: Tom Leinster on June 8, 2011 3:14 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Concerning characterizations of von Neumann entropy which are not based on Shannon entropy, this seems to be one:

W. Ochs, A new axiomatic characterization of the von Neumann entropy, Reports on Mathematical Physics 8(1), pp. 109–120

It’s behind a paywall; if anyone has trouble accessing it, send me an email. BTW, it’s hilarious that Elsevier seems to locate Munich in France! This is almost as funny as the paper Bell’s inequality holds for all non-product states, which should have been titled “Bell’s inequality is violated by all non-product states”.

So far I have only skimmed Ochs’ characterization. But if I get it right, then his characterization of von Neumann entropy contains only the following conditions on a function $S$ which assigns to each density matrix a nonnegative number:

• It is invariant under isometries: if $\rho$ is a density matrix and $V$ is an isometry (an operator with $V^\ast V=1$, or in other words a unitary onto its range), then $S(V\rho V^\ast)=S(\rho)$. (Actually, Ochs talks about partial isometries, but that should be equivalent.)

• It is additive under tensor products: $S(\rho_1\otimes\rho_2)=S(\rho_1)+S(\rho_2)$. The tensor product here is the quantum analogue of the cartesian product of probability spaces.

• It is subadditive: if $\rho$ is a density matrix on a tensor product Hilbert space $H_1\otimes H_2$ and $\rho_1$ and $\rho_2$ are the reduced density matrices on $H_1$ and $H_2$, respectively, then $S(\rho)\leq S(\rho_1)+S(\rho_2)$.

• It’s continuous. (The condition described in the paper is actually weaker than this.)

To my mind, it’s quite surprising that these few conditions characterize von Neumann entropy! I will have to read the paper in some more detail.

Posted by: Tobias Fritz on June 7, 2011 4:10 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Sorry, the paper says that all these conditions are satisfied precisely by the scalar multiples of von Neumann entropy, not just by von Neumann entropy itself.

Posted by: Tobias Fritz on June 7, 2011 4:22 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Note: I’m checking in from my mobile just before nodding off so this might be total rubbish, but those conditions remind me of coherent risk measures in finance.

Posted by: Eric on June 7, 2011 5:02 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Thanks, Tobias! I hadn’t looked at that paper by W. Ochs; somehow the abstract looked unpromising, because I thought he was using ‘dispersion’ in a technical sense.

If he’s proved the result you state, there may not be much need for a better characterization of von Neumann entropy—it sounds like a strong result!

Of course, I imagine he’s considering von Neumann entropy only for states on matrix algebras, not arbitrary finite-dimensional C*-algebras. There could be a better result where you take advantage of the extra demand that entropy be nicely behaved on all finite-dimensional C*-algebras. My first theorem along those lines was not exactly thrilling, but it illustrates the idea: for example, we can try to use maximal commutative sub-C*-algebras to do something interesting.

If I ever visit “München, France” I’ll try to meet Ochs.

Posted by: John Baez on June 8, 2011 4:42 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

So, here’s a bit of commentary on Ochs’ paper ‘A new axiomatic characterization of the von Neumann entropy’.

(By the way, he listed his address as “München, GFR”. I bet the robots at Elsevier thought “GFR” was an abbreviation for France.)

With some preliminary work he shows that any entropy function meeting the conditions Tobias listed gives rise to a function $F$ sending each probability distribution $(p_1, \dots, p_n)$ to a nonnegative number. The reason is that any state on $M_n(\mathbb{C})$ is a nonnegative matrix with trace one; his assumptions let us diagonalize this matrix without changing the entropy, and the diagonal entries give a probability distribution $(p_1, \dots, p_n)$.

He then shows the conditions Tobias listed imply a bunch of conditions on this function $F$. They are:

(F1): $F(p_1, \dots, p_n)$ is invariant under permutations of the $p_i$.

(F2):

$F(p_1, \dots, p_n) = F(p_1, \dots, p_n,0)$

(F3): a condition saying $F$ is finite and not always zero—irrelevant from our viewpoint.

(F4):

$F(p_{11}, \dots p_{n m}) \le F(\sum_{j=1}^m p_{1j}, \dots, \sum_{j=1}^m p_{n j}) + F(\sum_{i=1}^n p_{i1}, \dots, \sum_{i=1}^n p_{i n})$

(F5):

$F(p_1 q_1, p_1 q_2, \dots, p_n q_m) = F(p_1, \dots, p_n) + F(q_1, \dots , q_m)$

(F6):

$lim_{q \to 0} F(q,1-q) = 0$

Conditions (F1) and (F2) come from the invariance of entropy under isometries. Condition (F4) follows from the subadditivity of entropy. Condition (F5) comes from its additivity under tensor products. Condition (F6) comes from continuity.

Then he says:

As Forte [5] and Aczél, Forte and Ng [1] have recently shown in remarkable papers, the function $F$ is already determined by the properties (F1) to (F6), and has the form

$F(p_1, \dots, p_n) = - a \sum_{i=1}^n p_i \, \ln p_i$

So, I think the ‘magic’ in Ochs’ result lies in the work of Aczél, Forte and Ng. I bet this work was yet another characterization of Shannon entropy. It’s a bit like Faddeev’s characterization, but different.

Posted by: John Baez on June 8, 2011 10:14 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

We’ve bumped up against Forte and Ng before here and following comment. What are the papers Forte [5] and Aczél, Forte and Ng [1]? I don’t have access to Ochs.

Posted by: David Corfield on June 8, 2011 10:32 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

David wrote:

What are the papers Forte [5] and Aczél, Forte and Ng [1]?

I was afraid someone was going to ask that. They are:

• B. Forte, in Symposia Math. Vol. XI, Academic Press, New York, 1976.

• J. Aczél, B. Forte and C. T. Ng, Why the Shannon and Hartley entropies are ‘natural’, Adv. Appl. Prob. 6 (1974), 131-146.

Or maybe page 121; I’m getting different opinions on the web. I can’t find the title of Forte’s first paper.

If anyone wants a bibliography of 578 papers on entropy and related issues, especially in the quantum case, there’s one by Christopher Fuchs here.

The Hartley function seems to be an example of how you can get your name on something if you write a paper about it at the right time.

Posted by: John Baez on June 8, 2011 11:26 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

As you know, I’ve looked into the mathematical use of ‘natural’, so it’s interesting to see it here in the title of a paper.

I wonder if Shannon entropy is one of those entities which combines many excellent qualities, only subsets of which are needed to characterise it, like Hazewinkel’s star example of the ring of symmetric functions in an infinity of indeterminates.

Posted by: David Corfield on June 8, 2011 12:57 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Since no one seems to have tackled the puzzle at the end of John’s post so far, let me try to give it a go.

Puzzle. If you wanted to classify finite-dimensional C*-algebras using pictures of some famous sort, what kind of pictures could you use?

I like to think of a matrix algebra $M_n(\mathbb{C})$ as the groupoid algebra of the groupoid on $n$ objects in which any two objects are isomorphic via a unique isomorphism. A finite-dimensional $C^\ast$-algebra as a direct sum of matrix algebras then is the category algebra of a disjoint union of such groupoids.

So, is the answer “graphs of groupoids” correct?

Posted by: Tobias Fritz on June 7, 2011 5:13 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Don’t you use Bratteli diagrams? Must make sure I got the spelling right.

Posted by: David Corfield on June 7, 2011 7:49 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I thought about that too, but then what’s so famous about those? Maybe they show up in other areas under different names. I could probably tell if I remembered what a Bratteli diagram actually is.

Posted by: Tobias Fritz on June 7, 2011 9:11 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

No, finite-dimensional C*-algebras are not classified by Bratteli diagrams… they’re classified by a really famous type of diagram. Guess! And then prove!

Bratteli diagrams are more complicated. As the Wikipedia article almost comes out and admits, these diagrams classify sequences of finite-dimensional C*-algebras with each one included into the next, like this:

$A_1 \hookrightarrow A_2 \hookrightarrow A_3 \hookrightarrow A_4 \hookrightarrow \cdots$

More precisely, Bratteli diagrams classify sequences of this sort up to the obvious notion of isomorphism. Obvious? Yes: a sequence of this sort gives a functor from the category

$\bullet \to \bullet \to \bullet \to \cdots$

to the category of finite-dimensional C*-algebras, and an isomorphism between such sequences is a natural isomorphism between such functors.

Anyway, while this stuff is fun, my question was about something simpler.

Posted by: John Baez on June 8, 2011 4:18 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Posted by: David Corfield on June 8, 2011 8:47 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

A Young diagram is a thing like this:

How would such things classify finite-dimensional C*-algebras?

Posted by: John Baez on June 8, 2011 9:25 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

The (not very deep) reasoning went: if every finite-dimensional C*-algebra is isomorphic to a finite direct sum of matrix algebras, and each finite-dimensional matrix algebra is of the form $M_n(\mathbb{C})$, then each could be associated to a multiset of positive integers.

Couldn’t I associate the diagram you included with the sum of $M_5(\mathbb{C})$, $M_4(\mathbb{C})$ and $M_1(\mathbb{C})$?

Posted by: David Corfield on June 8, 2011 9:37 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

You’re right! I was just testing you. I wanted to make sure you weren’t just saying “Well, he must mean either Dynkin diagrams or Young diagrams, and Dynkin diagrams couldn’t be right, so…”

Posted by: John Baez on June 8, 2011 10:26 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

(This is the first time I’m writing to a blog, so I’m sorry if I mess this up completely!)

I just read the paper and I was delighted! I’m currently investigating the direct opposite: Characterizing information loss of deterministic input-output systems via entropy. In particular, I use the conditional entropy of the input given the output as a measure of information loss. For static systems (i.e., described by a function, Y=g(X)) and iid input samples on a discrete alphabet the conditional entropy is equal to the difference between the entropy at the input and the entropy at the output:

H(X|Y) = H(X) - H(Y)

The nice thing about conditional entropy is that, under mild constraints, it can also act as a measure of information loss for continuous alphabets.

So it was great to see that Shannon entropy is indeed THE characterization for information loss, if certain properties are desired.

Thanks for that!
Bernhard

Posted by: Bernhard on June 10, 2011 9:45 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Nice! We do mention, rather unobtrusively, that our ‘information loss’ can always be seen as a conditional entropy. This is a point Tobias brought up. Here’s what we say:

We begin by establishing a useful formula for $F(f) = H(p) - H(q)$, where as usual $f$ is a morphism $p \to q$ in $\Fin\Prob$. Since $f$ is measure-preserving, we have $q_j = \sum_{i \in f^{-1}(j)} p_i.$ So \begin{aligned} \sum_j q_j \ln q_j &=& \sum_j \sum_{i \in f^{-1}(j)} p_i \ln q_j \\ &=& \sum_j \sum_{i \in f^{-1}(j)} p_i \ln q_{f(i)} \\ &=& \sum_i p_i \ln q_{f(i)} \end{aligned} where in the last step we note that summing over all points $i$ that map to $j$ and then summing over all $j$ is the same as summing over all $i$. So, \begin{aligned} F(f) &=& - \sum_i p_i\ln p_i + \sum_j q_j \ln q_j \\ &=& \sum_i ( -p_i \ln p_i + p_i \ln q_{f(i)}) \end{aligned} and thus $F(f) = \sum_{i \in X} p_i \ln \frac{q_{f(i)}}{p_i}$ where the quantity in the sum is defined to be zero when $p_i = 0$. If we think of $p$ and $q$ as the distributions of random variables $x \in X$ and $y \in Y$ with $y = f(x)$, then $F(f)$ is exactly the conditional entropy of $x$ given $y$. So, what we are calling ‘information loss’ is a special case of conditional entropy.

Posted by: John Baez on June 10, 2011 11:03 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Thanks for your response! I came to the same conclusion by the properties of entropies:

H(X)-H(Y) = H(X,Y) - H(Y) = H(X|Y),

where the first equality is due to the fact that Y is a function of X. The nice thing is that while H(X) and H(Y) are not defined for continuous random variables (or at least infinite), under mild constraints on the function the conditional entropy is still useful (and finite).

However, I owe you guys because you justified my choice of conditional entropy as a measure of information loss. Do you, by the way, intend to publish your results somewhere else aside from arXiv?

Posted by: Bernhard on June 14, 2011 8:16 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Bernhard, I just took a look at your brand new paper and was surprised to find out that it seems to be intimately related to what we have been doing!

In categorical language, what Bernhard and his coauthor do is the following: they consider a category where an object is a (discrete time and stationary) stochastic process and a morphism is a measure-preserving function between these, defined in a way which allows the function to have some memory about the past. Then they go on to define a $\mathbb{R}_{\geq 0}$-valued “information loss” functor on this category (Definition 2) and prove its functoriality (Theorem 3). Without having checked the details of this, this information loss functor should also be (convex) linear in the sense of our paper.

Now one could ask: which additional properties does one need in order to characterize this information loss functor up to a scalar multiple? Or are these requirements already enough?

Posted by: Tobias Fritz on June 17, 2011 3:22 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

A very relevant paper seems to be this:

Petz gives a few axioms necessary to characterize quantum relative entropy which was mentioned by Urs. He also works with finite-dimensional $C^\ast$-algebras!

I can report back once I’ve read the paper in more detail.

Posted by: Tobias Fritz on June 10, 2011 3:28 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Thanks for the reference.

I have also quickly added that to the $n$Lab entry entropy . Somebody should expand that entry further…

Posted by: Urs Schreiber on June 11, 2011 3:24 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I have been trying to understand Petz’s paper some further. One challenge in grasping his result is his relative quantum entropy analogue of the glomming formula; in case you want to know how it looks like, it’s property (i) in his paper and called “conditional expectation property”.

In order to make some progress in understanding this property, I have been figuring out what it states for classical relative entropy, which is also known as Kullback-Leibler divergence. Let $p$ and $p'$ be probability measures on a finite set $X$. Then their relative entropy is $S_X(p||p') = \sum_{x\in X} p_x(\log p_x-\log p'_x)$

I will now define a category $\Fin\Prob_\rel$ on which this relative entropy can be regarded as a functor. An object of $\Fin\Prob_\rel$ is defined to be a finite probability space $(X,p)$. A morphism $f:(X,p)\rightarrow (Y,q)$ is defined to be a measure-preserving function which comes equipped additionally with a probability measure $f^y$ on each fiber $f^{-1}(y)$. We may think of this structure as specifying conditional probabilities $P(x|y)$. No further compatibility condition is imposed.

Such a morphism defines another probability measure on $X$ whose weight on $x\in X$ is given by $f^{f(x)}_x q_{f(x)}$. Intuitively, this means to determine the distribution of $x\in X$ as the product of the distribution of $y\in Y$ (which is $q$) and the conditional distribution $P(x|y)$ which is given in terms of the $f^y$. I will denote this new measure by $f^\ast q$.

Then the claim is that there is a functor $\Fin\Prob_\rel\to\mathbb{R}_{\geq 0}$ defined by $(f:(X,p) \to (Y,q)) \mapsto S_X(p,f^\ast q)$ Functoriality of this expression means that relative entropy satisfies the formula $S_X(p,(g f)^\ast r)=S_Y(q,g^\ast r)+S_X(p,f^\ast q)$ for morphisms $f:(X,p) \to (Y,q)$ and $g:(Y,q)\to(Z,r)$. This formula is indeed correct and can be verified by a slightly tedious but straightforward calculation. It specializes to a glomming formula for relative entropy when $Z$ is the one-point space. Then one obtains precisely the classical case of Petz’s conditional expectation property.

This functoriality might pave the way for a characterization of (classical) relative entropy as a functor $\Fin\Prob_\rel\to\mathbb{R}_{\geq 0}$ preserving certain structure. I wonder how it relates to the observation that relative entropy also also arises from the cohomological constructions.

So should we continue with classical relative entropy first before looking further at the quantum case?

Posted by: Tobias Fritz on June 17, 2011 12:23 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I might be talking to myself here at the moment, but let me anyway try to explain a categorical characterization theorem of (Shannon) relative entropy, aka Kullback-Leibler divergence. This characterization is pretty much in lines with our paper.

Let $\Fin\Prob_\rel$ be the category described above: an object of $\Fin\Prob_\rel$ is a finite probability space $(X,p)$; a morphism $f:(X,p)\to (Y,q)$ is a measure-preserving function $X\to Y$ together with a probability measure on each fiber. Composition is the obvious one. This defines another probability measure $f^\ast q$ on $X$ by the obivous procedure which David has called a congruent embedding by a Markov mapping.

Example: let $Y$ be a one-element set. Then $q$ is unique. A morphism $f:(X,p)\to (Y,q)$ then has a unique underlying function, and this function is automatically measure-preserving. In this case, the additional data of a probability measure on each fiber just amounts to a second probability measure $f^\ast q$ on all of $X$.

Just like in $\Fin\Prob$, also in $\Fin\Prob_\rel$ there is an obvious notion of “convex direct sum” $\lambda x\oplus (1-\lambda)y$ of objects and of morphisms.

Theorem: let $S':\Fin\Prob_\rel\to \mathbb{R}_{\geq 0}$ be a measurable functor such that $S'(\lambda f\oplus (1-\lambda)g) = \lambda S'(f) + (1-\lambda) S'(g)$ holds. Then there is a constant $C\in\mathbb{R}_{\geq 0}$ such that $S'(f) = C\cdot S(p, f^\ast q)$ for all morphisms $f:(X,p)\to(Y,q)$, where $S(p,f^\ast q)$ is the relative Shannon entropy between $p$ and $f^\ast q$.

This result is contained in Petz’ paper, although Petz does it in even greater generality (the quantum case) and uses a different language.

This theorem does not mention any kind of information loss.

Do you think that this should be written down in detail?

Posted by: Tobias Fritz on June 20, 2011 5:53 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Tobias wrote:

I might be talking to myself here at the moment…

Not really, though I am quite distracted by the quantum gravity conference here in Zurich, and the presence of my former students Derek Wise and Jeffrey Morton, along with lots of other friends.

Do you think that this should be written down in detail?

Maybe. I have a few questions, though. I could probably answer them by reading Petz’ paper, but you could probably answer them more easily.

1. Does the proof of this theorem boil down to using Fadeev’s theorem, or does the function $p \ln p$ appear for some completely different reason?
2. Can you state a theorem like you’ve done in the quantum case as well? Right now I’m mostly interested in giving a nice category-theoretic characterization of entropy (or relative entropy) for quantum systems.
3. Do you think your theorem sheds new light on Petz’s result? Do you think it’s worth trying to publish it?
Posted by: John Baez on June 23, 2011 11:32 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Thanks for the reply, John! I’m happy to see that someone took note of what I wrote.

1. Does the proof of this theorem boil down to using Fadeev’s theorem, or does the function $p\ln p$ appear for some completely different reason?

Petz’s paper doesn’t even mention Faddeev’s theorem! Writing $F(x,y)$ for the relative entropy between (classical) binary probability distributions $(x,1-x)$ and $(y,1-y)$ (with $y\neq 0$ and $\neq 1$), he notes that his axioms imply the functional equation

$F(x,y)+(1-x)F\left(\frac{u}{1-x},\frac{v}{1-y}\right) = F(u,v) + (1-u) F\left( \frac{x}{1-u}, \frac{y}{1-v} \right)$

together with the condition $F(1/2,1/2)=0$. He refers to this paper for the “lengthy, but elementary analysis” that the only measurable solution is given by

$F(x,y) = x\log\frac{x}{y} + (1-x) \log\frac{1-x}{1-y}$

up to a scalar.

1. Can you state a theorem like you’ve done in the quantum case as well?

I have spent about two days trying to do this for von Neumann entropy, and failed. But yes, for relative von Neumann entropy it should be possible to do this along very much the same lines as I did above for classical relative entropy. I know how the quantum theorem should look like; so far I haven’t been able to prove that quantum relative entropy satisfies functoriality, which is essentially the only non-trivial part of the proof. The problem is that the formula $\log(xy)=\log x+\log y$ is pretty much wrong when $x$ and $y$ are elements of a $C^*$-algebra. Knowing how to prove stuff without using this equation will require a bit of digging in the literature on quantum relative entropy.

I’m a bit tired now and will not explain the unproven theorem in this comment; I hope to do so once I can prove it!

1. Do you think your theorem sheds new light on Petz’s result? Do you think it’s worth trying to publish it?

Hmm, hard to tell. Essentially my observation boils down to finding a category such that (the classical case of) Petz’s “conditional expectation property” is a special case of functoriality of relative entropy. It’s like in our paper: the glomming formula corresponds to functoriality for a composition of morphisms $X\to Y\to \ast$, so functoriality is a bit more general than the glomming formula. Same thing in the relative entropy case.

At the moment, it’s probably not enough to write a paper about it. I have only rephrased Petz’s result in category-theoretical terms; well, so far actually only the classical version of Petz’s result! This rephrasing might be interesting for people who care about category theory and about information theory; but a publishable result should probably appeal also to people interested in only either of the two subjects, right?

Posted by: Tobias Fritz on June 23, 2011 7:04 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

I can now state a category-theoretical characterization of quantum relative entropy. It’s basically just a reformulation of Petz’s result. At the moment I’ll have to be quite brief and do without explaining the intuitive meaning of the concepts involved.

Let’s define a category $\FinC^\ast_\rel$. An object $(A,\rho)$ of this category is a finite-dimensional $C^\ast$-algebra together with a distinguished state $\rho$. A morphism $f:(A,\rho)\to(B,\phi)$ is an inclusion of $C^\ast$-algebras $A\subseteq B$, such that $\phi_{|A}=\rho$, together with a conditional expectation $B\to A$; this is a map which has norm $1$, is an $A$-bimodule homomorphism and restricts to the identity on $A$. (In the commutative case, it corresponds to an “averaging over the fibers” operation.)

Using the conditional expectation, one can extend $\rho$ to a state on $B$; let’s write $f^*\rho$ for this.

Theorem (Petz): Let $S':\FinC^\ast_\rel\to\mathbb{R}_{\geq 0}$ be a (measurable) functor which satisfies convex linearity. Then $S'$ is a scalar multiple of the relative entropy

$S(f) = S(f^*\rho||\phi)$

for every morphism $f:(A,\rho)\to(B,\phi)$.

Posted by: Tobias Fritz on June 28, 2011 7:43 PM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Very nice! Do you know if the quantum version of ‘conditional expectation’ is an existing notion, or did you make it up? It’s a nice generalization of the classical version, which is often phrased in terms of a sub-$\sigma$-algebra of a $\sigma$-algebra. A set with a $\sigma$-algebra of subsets gives a commutative von Neumann algebra, so the classical notion of conditional expectiation is a special case of the notion for general von Neumann algebras.

Posted by: John Baez on June 29, 2011 7:59 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

For a $C^*$ algebra B and a $C^*$ subalgebra A of B a unital projection of norm one from B to A is called a conditional expectation in the literature without attribution to a particular author. I read it in David E. Evans and Yasuyuki Kawahigashi: “Quantum Symmetries on Operator Algebras” (It is listed in the index.) There is a more specific notion of conditional expectation for $II_1$ factors with trace, also mentioned in that book as theorem 5.25, which comes closer to the classical notion (no wonder).
Posted by: Tim van Beek on June 29, 2011 8:46 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Tim wrote:

For a $C^\ast$ algebra $B$ and a $C^\ast$ subalgebra $A$ of $B$ a unital projection of norm one from $B$ to $A$ is called a conditional expectation in the literature

Right, thanks. Petz himself also uses the term “conditional expectation” in his paper in the $C^\ast$-algebraic context. Since you seem to know about this, I have one question: sometimes an additional requirement is taken to be that the conditional expectation (let’s call it $E$) should be an $A$-bimodule map:

$E(a_1 b a_2) = a_1 E(b) a_2 \qquad \forall a_1,a_2\in A,\: b\in B$

In particular, this is what we need in the above theorem. I haven’t been able to figure out whether this condition is actually implied being a projection of norm $1$, and the literature I found is not very clear at this point. Do you know? Can you point to a reference?

John, what should we do with this? I’m going to be offline now for a week or two, but then will definitely get back to this. Some options would be: we could either write it down as it is, so that it would become a rather short paper of a more expository nature. Or we could try to find some abstract picture behind the category $\FinC^\ast_\rel$. Or we could try to take the present theorem as a starting point for going back to thinking about characterizing (non-relative) von Neumann entropy.

Posted by: Tobias Fritz on June 29, 2011 9:30 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Tobias wrote:

I haven’t been able to figure out whether this condition is actually implied being a projection of norm 1, and the literature I found is not very clear at this point.

I’m pretty sure that this is proven in Takesaki’s first book about the “Theory of Operator Algebras”, but I don’t have that one at hand right now. Thanks to the wonders of the internet, I have instead found what seems to be the original research paper that proves this:

(It is theorem 1 right on the first page.)

Posted by: Tim van Beek on June 29, 2011 10:18 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

Wow!! I wonder why the wonders of the internet didn’t turn that up for me. Probably I was too fixated on the term “conditional expectation”, which Tumiyama apparently doesn’t use.

Posted by: Tobias Fritz on June 29, 2011 10:38 AM | Permalink | Reply to this

### Re: Category-Theoretic Characterizations of Entropy III

For those who prefer reading Latexed documents: we put our paper on the arXiv.

Posted by: Tom Leinster on June 12, 2011 2:18 AM | Permalink | Reply to this

Post a New Comment