Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

March 21, 2019

Entropy mod p

Posted by Tom Leinster

I’ve just arXived a new paper:

Tom Leinster, Entropy modulo a prime, arXiv:1903.06961

clock with 13 hours

In it, you’ll find answers to these questions and more — and I’ll also answer them in this post:

  • What are logarithms and derivations mod pp?

  • What is the right definition of entropy for probability distributions whose “probabilities” are integers modulo a prime pp?

  • What is the right mod pp analogue of information loss?

  • What does it mean to say that log83(mod7)\log \sqrt{8} \equiv 3 \pmod{7}?

  • Why is entropy mod pp a polynomial?

  • What does all this have to do with polylogarithms and homology?

I’ve posted about entropy mod pp a couple of times before, during the time I was working the whole thing out, but I’ve tidied it up now enough to write it up. Here are my answers to the questions above.

  • What are logarithms and derivations mod pp?

What this question isn’t about is discrete logarithms.

What it is about is the following problem. If you wanted an analogue of the ordinary real logarithm in /p\mathbb{Z}/p\mathbb{Z}, your first thought would probably be to seek a group homomorphism

log:((/p) ×,)(/p,+). \log : \bigl((\mathbb{Z}/p\mathbb{Z})^\times, \cdot\bigr) \to (\mathbb{Z}/p\mathbb{Z}, +).

But that’s impossible: Lagrange’s theorem immediately implies that the only such homomorphism is trivial.

However, there’s a next-best thing. It’s the Fermat quotient, which up to a constant factor is the unique homomorphism

q:((/p 2) ×,)(/p,+). q: \bigl((\mathbb{Z}/p^2\mathbb{Z})^\times, \cdot\bigr) \to (\mathbb{Z}/p\mathbb{Z}, +).

It’s given explicitly by the formula

q(a)=a p11p, q(a) = \frac{a^{p - 1} - 1}{p},

which is an integer by Fermat’s little theorem. The Fermat quotient turns out to provide an acceptable substitute for the (non-existent) logarithm.

What about derivations? Over \mathbb{R}, the function

:x{xlogx if x>0 0 if x=0 \partial : x \mapsto \begin{cases} -x \log x &\text{if }\ x \gt 0 \\ 0 &\text{if }\ x = 0 \end{cases}

is a nonlinear derivation, in the sense that it satisfies the Leibniz rule

(xy)=x(y)+(x)y. \partial(x y) = x \partial(y) + \partial(x) y.

The correct analogue over /p\mathbb{Z}/p\mathbb{Z} is the function

:/p 2/p \partial: \mathbb{Z}/p^2\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}

given by

(a)=aa pp. \partial(a) = \frac{a - a^p}{p}.

This satisfies the same Leibniz rule. If aa is not divisible by pp then (a)=aq(a)\partial(a) = -a q(a), as you’d expect from the analogy with the real case. Otherwise, (a)=a/p\partial(a) = a/p.

  • What is the right definition of entropy for probability distributions whose “probabilities” are integers modulo a prime pp?

Ordinarily, a finite probability distribution π\pi is a finite sequence (π 1,,π n)(\pi_1, \ldots, \pi_n) of nonnegative reals summing to 11. What I’m interested in is the same kind of thing, but where π 1,,π n\pi_1, \ldots, \pi_n are elements of the ring /p\mathbb{Z}/p\mathbb{Z}.

This might seem odd. So it’s useful to be somewhat aware of the idea of “negative probabilties”, which has a long history in physics going back at least to Feynman. There are several nice papers on that topic, including Feynman’s and a couple by Blass and Gurevich, but here I’ll just link to a recent paper of Blass and Gurevich which gives lots of references and history on the first page. To quote from the abstract:

We argue that the right question is not what negative probabilities are but what they are for.

My attitude to “probabilities mod pp” is the same. Which isn’t to claim I know what they’re “for” — in fact, I’d love to find an application for this theory, even within mathematics — but it’s a much easier question to answer than “what are they?”

Anyway, I show in my paper that the right definition of the entropy H(π)H(\pi) of a probability distribution π=(π 1,,π n)\pi = (\pi_1, \ldots, \pi_n) mod pp is

H(π)=1p(1π i p)/p. H(\pi) = \frac{1}{p} \Bigl( 1 - \sum \pi_i^p \Bigr) \in \mathbb{Z}/p\mathbb{Z}.

At first, this looks like it doesn’t even make sense. The π i\pi_i are integers mod pp, so 1π i p1 - \sum \pi_i^p is an integer mod pp, so how can you divide it by pp? After all, p=0p = 0 in /p\mathbb{Z}/p\mathbb{Z}, and you can’t divide by zero.

What saves the day is the little lemma that if ab(modp)a \equiv b \pmod{p} then a pb p(modp 2)a^p \equiv b^p \pmod{p^2}. This is a pleasant exercise. The upshot is that the operation of raising to the power of pp can be regarded as a map

() p:/p/p 2. (-)^p : \mathbb{Z}/p\mathbb{Z} \to \mathbb{Z}/p^2\mathbb{Z}.

Interpreted in this way, 1π i p1 - \sum \pi_i^p is an element of /p 2\mathbb{Z}/p^2\mathbb{Z}. Since π i p=π i=1\sum \pi_i^p = \sum \pi_i = 1 as elements of /p\mathbb{Z}/p\mathbb{Z}, we can divide by pp, and the result is the entropy.

(Another way to put it is that H(π)=1p(1a i p)H(\pi) = \tfrac{1}{p}(1 - \sum a_i^p), where a 1,,a na_1, \ldots, a_n are integers representing π 1,,π n\pi_1, \ldots, \pi_n.)

Why’s this the right definition of entropy? Because up to a scalar multiple, it’s the one and only quantity that satisfies the same functional equation that characterizes real entropy — variously called the chain rule, recursivity, or the grouping law. All this is done precisely in the paper.

  • What is the right analogue of information loss mod pp?

There’s a quote about the Grothendieck–Riemann–Roch theorem I really like:

Grothendieck came along and said “No, the Riemann–Roch theorem is not a theorem about varieties, it’s a theorem about morphisms between varieties.”

(Nicholas Katz, quoted here).

In geometry and topology, this kind of conceptual shift is often described with the word relative. Instead of studying spaces XX, you study spaces XX relative to a base BB, which means maps XBX \to B of a suitable kind. The classical case of mere spaces is where BB is the one-point space.

Categorically, it’s the conceptual shift from objects to maps. Entropy is an invariant of probability spaces. What about maps between probability spaces — measure-preserving transformations? Is there a sensible numerical invariant of these, which reduces to entropy when the codomain is the one-point space?

John Baez, Tobias Fritz and I showed that the answer is yes. You can talk in a reasonable way about the information loss of a measure-preserving map between probability spaces. (The terminology makes sense if you think of such a map as a deterministic process: click the link for explanation!) It’s closely related to conditional entropy.

The shift from entropy to information loss has the following big advantage. The recursivity equation that characterizes entropy is a bit complicated. But John, Tobias and I proved that information loss is characterized by a couple of equations that are nothing more than linearity conditions. They’re easy:

  1. if you do two processes in sequence, the amount of information lost by the composite processes is the sum of the amount lost by each process individually;

  2. if you flip a probability-λ\lambda coin and do one or other of two different processes, the information lost overall is λ\lambda times the information loss of the first process plus 1λ1 - \lambda times the information lost by the second.

Everything I’ve just been saying about information loss is over \mathbb{R}. Over /p\mathbb{Z}/p\mathbb{Z}, it’s much harder to understand things information-theoretically. But I show in my new paper that there is a precisely analogous definition of information loss mod pp, and it satisfies a precisely analogous characterization theorem.

  • What does it mean to say that log83(mod7)\log \sqrt{8} \equiv 3 \pmod{7}?

The direct inspiration for my paper was a very short note by Maxim Kontsevich (which I’m grateful to Herbert Gangl for pointing out to me). In it, Kontsevich made an intriguing suggestion.

Consider a finite probability distribution

π=(π 1,,π n) \pi = (\pi_1, \ldots, \pi_n)

where all the probabilities π i\pi_i are rational numbers, say α i/β\alpha_i/\beta. Rational probabilities arise naturally: you get them in ball/urn or scenarios and other combinatorial problems.

We can, of course, regard the π i\pi_is as real numbers, and take the real entropy

H (π)= i:π i0π ilogπ i. H_\mathbb{R}(\pi) = - \sum_{i: \pi_i \neq 0} \pi_i \log \pi_i \in \mathbb{R}.

But there’s also something else we can do.

Only finitely many primes divide β\beta, and as long as pp doesn’t have the bad luck to be one of them, each π i=α i/β\pi_i = \alpha_i/\beta defines an element of /p\mathbb{Z}/p\mathbb{Z}. Hence π\pi defines a probability distribution mod pp, and we can take its mod pp entropy

H p(π)/p. H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}.

Kontsevich’s suggestion, playfully made, was to view H p(π)H_p(\pi) as the “residue mod pp” of the real (usually transcendental) number H (π)H_\mathbb{R}(\pi).

For instance, take

π=(1/4,1/4,1/2) \pi = (1/4, 1/4, 1/2)

and p=7p = 7. The real entropy of the distribution is

H (π)=14log4+14log4+12log2=log8. H_\mathbb{R}(\pi) = \frac{1}{4} \log 4 + \frac{1}{4} \log 4 + \frac{1}{2} \log 2 = \log \sqrt{8}.

In /7\mathbb{Z}/7\mathbb{Z},

π=(2,2,4), \pi = (2, 2, 4),

so

H 7(π)=17(1[2 7+2 7+4 7])=3/7. H_7(\pi) = \frac{1}{7} \Bigl( 1 - [2^7 + 2^7 + 4^7]\Bigr) = 3 \in \mathbb{Z}/7\mathbb{Z}.

In this sense,

log83(mod7). \log \sqrt{8} \equiv 3 \pmod{7}.

There’s a bit of work to do in order to see that Kontsevich’s suggestion really makes sense. For instance, if we’re going to view H p(π)/pH_p(\pi) \in \mathbb{Z}/p\mathbb{Z} as the “residue” of the real number H (π)H_\mathbb{R}(\pi), then it had better be the case that

H (π)=H (σ)H p(π)=H p(σ). H_\mathbb{R}(\pi) = H_\mathbb{R}(\sigma) \quad\Rightarrow\quad H_p(\pi) = H_p(\sigma).

And I show in the paper that it is.

  • Why is entropy mod pp a polynomial?

The formula for real entropy is not a polynomial; it involves logarithms. But over /p\mathbb{Z}/p\mathbb{Z}, entropy is a polynomial.

I don’t mean the formula 1p(1 iπ i p)\tfrac{1}{p} (1 - \sum_i \pi_i^p) above, which isn’t a polynomial in the usual sense: recall that we had to interpret it rather carefully in order for it to make sense. The genuine polynomial formula turns out to be

H(π)=π 1 r 1π n r nr 1!r n!, H(\pi) = - \sum \frac{\pi_1^{r_1} \cdots \pi_n^{r_n}}{r_1! \cdots r_n!},

where the sum is over all 0r 1,,r n<p0 \leq r_1, \ldots, r_n \lt p summing to pp.

It’s a triviality that entropy can be expressed as some polynomial, for the simple reason that every function K nKK^n \to K on a finite field KK can be. But what’s interesting is this particular polynomial.

Let’s call this polynomial hh: so

h(x 1,,x n)=x 1 r 1x n r nr 1!r n!. h(x_1, \ldots, x_n) = - \sum \frac{x_1^{r_1} \cdots x_n^{r_n}}{r_1! \cdots r_n!}.

By a small abuse of notation, hh is actually the name of a whole sequence of polynomials h(x 1,,x n)h(x_1, \ldots, x_n), one for each n1n \geq 1.

The one-variable polynomial h(x 1)h(x_1) is zero. The two-variable polynomial h(x 1,x 2)h(x_1, x_2) reduces to something simpler in the case where x 1+x 2=1x_1 + x_2 = 1 (which we care about because we’re interested in probability distributions):

h(x,1x)= 0<r<px rr. h(x, 1 - x) = \sum_{0 \lt r \lt p} \frac{x^r}{r}.

That’s not immediately obvious from the definition, but it’s another pleasant exercise.

  • What does this have to do with polylogarithms and homology?

The sum

h(x,1x)= 0<r<px rr h(x, 1 - x) = \sum_{0 \lt r \lt p} \frac{x^r}{r}

is, formally, a truncation of the power series of log(1x)-\log(1 - x). That’s a bit surprising, in that it’s the mod pp analogue of

H (x,1x)=xlogx(1x)log(1x), H_\mathbb{R}(x, 1 - x) = - x \log x - (1 - x) \log(1 - x),

which is not the same as log(1x)-\log(1 - x)! Nonetheless, it’s true.

There’s a classical theory of polylogarithms, which are the functions

z r1z rr s z \mapsto \sum_{r \geq 1} \frac{z^r}{r^s}

on the complex plane. Here ss is a complex parameter. When s=1s = 1, we recover the ordinary logarithm, essentially: it’s zlog(1z)z \mapsto -\log(1 - z).

Kontsevich’s note, and the paper of Elbaz-Vincent and Gangl in which Kontsevich’s note was published as an appendix, are both interested in “finite polylogarithms”. These are polynomials of the form

0<r<px rr s \sum_{0 \lt r \lt p} \frac{x^r}{r^s}

for integers s1s \geq 1. I know very little about this work — but its starting point is the case s=1s = 1 that comes out of the theory of entropy mod pp.

Finally, what’s the homological connection? We got into this a bit in the conversation Charles Rezk and I had on another recent post, and my comment there gives some links to related work. But the simplest thing to say is this.

Consider the entropy polynomials h(x 1,x 2,,x n)h(x_1, x_2, \ldots, x_n). We looked at the cases n=1n = 1 and n=2n = 2. What about n=3n = 3? It turns out that

h(x,y,z)=h(x,y)+h(x+y,z). h(x, y, z) = h(x, y) + h(x + y, z).

It’s obvious from the definition of hh that it’s a symmetric polynomial, so also

h(x,y,z)=h(y,z)+h(x,y+z). h(x, y, z) = h(y, z) + h(x, y + z).

Putting together these two equations gives

h(x,y)h(x,y+z)+h(x+y,z)h(y,z)=0. h(x, y) - h(x, y + z) + h(x + y, z) - h(y, z) = 0.

In other words, hh is a cocycle! And it can be shown that it’s a nontrivial cocycle — that is, you can’t find a jj satisfying

h(x,y)=j(x)j(x+y)+j(y). h(x, y) = j(x) - j(x + y) + j(y).

Work by people including Cathelineau, Kontsevich, Elbaz-Vincent, Gangl, Bennequin and Baudot exploits this homological connection (references here), but as I don’t know much about all that, I’ll stop there.

Posted at March 21, 2019 8:05 PM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3098

15 Comments & 0 Trackbacks

Re: Entropy mod p

No one’s biting, maybe because I’ve posted about this a couple of times before, or maybe because it’s hard to see what this is all for.

If that’s your reaction, I’m with you. I find it remarkable that the formalism of entropy over \mathbb{R} — so rich in meaning in information theory, physics, and many other subjects — has a precise analogue over /p\mathbb{Z}/p\mathbb{Z}, where no such meaning is apparent. Still, it’s frustrating not to be able to assign a meaning.

The trouble is not so much with entropy modulo a prime, but probabilities modulo a prime. How can you possibly understand a “probability” that belongs to /p\mathbb{Z}/p\mathbb{Z}?

I don’t know, but it’s useful to compare the situation with negative probabilities. I’ve already recommended a recent paper of Blass and Gurevich, the introduction to which is extremely lucid. And I’ve already quoted a line from the abstract:

We argue that the right question is not what negative probabilities are but what they are for.

Here’s some more from the introduction to their paper:

Quote from Blass and Gurevich paper

They go on to show how negative probabilities can be used (“what they are for”).

What I’m lacking is a demonstration of how probabilities mod pp, and entropy mod pp, can be used.

Posted by: Tom Leinster on March 23, 2019 4:23 PM | Permalink | Reply to this

Re: Entropy mod p

I wrote:

What I’m lacking is a demonstration of how probabilities mod pp, and entropy mod pp, can be used.

That’s actually not quite true. I can come up with artificial problems that can be “solved” with entropy mod pp, though probably there are easier solutions.

Here’s one. Suppose we have an infinite row of urns (1, 2, 3, …), each containing a number of balls. There are only finitely many balls in total, so if we record the number of balls in each urn then we get a sequence

a 1,a 2, a_1, a_2, \ldots

of nonnegative integers, all but finitely many of which are zero.

Suppose we allow ourselves to do the following moves:

  • If some ball contains three or more balls, we can transfer two of them to an empty urn and one more of them to another empty urn.

  • The inverse of the previous move.

  • If urns ii, jj and kk contain the same number of balls as each other, and urns ii', jj' and kk' also contain the same number of balls as each other, we can transfer one ball from urn ii to urn ii', one from urn jj to urn jj', and one from urn kk to urn kk'.

Given two sequences (a i)(a_i) and (b i)(b_i) as above, one can ask whether it’s possible to transform (a i)(a_i) into (b i)(b_i) using a sequence of these moves.

Obviously, a necessary condition for this is that a i\sum a_i = b i\sum b_i (since the moves preserve the total number of balls).

But it’s not sufficient. For example, the sequences

1,2,3,4,5,6,7,8,9,10,0,0,0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0, 0, \ldots

into

3,1,4,1,5,9,2,6,5,3,5,8,3,0,0,0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 3, 0, 0, 0, \ldots

both sum to 5555 (i.e. there are 5555 balls), but it is impossible to transform one to the other using the moves above.

One way to prove this is to use entropy mod 33. I’m definitely not claiming it’s the most easy or natural way, but you can do it, as follows.

Since 551(mod3)55 \equiv 1 \pmod{3}, any finite sequence summing to 5555 is a probability distribution mod 33, so we can take its entropy. Specifically, the entropy of such a sequence (a i)(a_i) is

13(1a i 3)/3. \frac{1}{3} \Bigl( 1 - \sum a_i^3 \Bigr) \in \mathbb{Z}/3\mathbb{Z}.

So, two sequences (a i)(a_i) and (b i)(b_i) have the same entropy mod 33 iff

a i 3b i 3(mod9). \sum a_i^3 \equiv \sum b_i^3 \pmod{9}.

It’s easy to see that each of the allowable moves leaves the sum of the cubes unchanged, mod 99. So, a necessary condition for one sequence to be transformable into another is that the sums of their cubes are the same, mod 99.

For the two sequences above, that’s not the case. For the first, the cube-sum is 2-2 mod 9 (so its entropy mod 3 is (1(2))/3=1(1- (-2))/3 = 1). For the second, the cube-sum is 11 mod 9 (so its entropy mod 3 is (11)/3=0(1 - 1)/3 = 0). Thus, the first can’t be transformed into the second.

Posted by: Tom Leinster on March 23, 2019 4:43 PM | Permalink | Reply to this

Re: Entropy mod p

Rest assured, I’m not being quiet out of a lack of interest! I’m just operating under the disadvantage that many of the fun things I know how to do with entropy over \mathbb{R} depend upon notions of joint, mutual and conditional entropy.

I know a bit about “negative probability”, since it relates to the topic I’ve been exploring in my recent guest posts (thanks again to John for inviting me!). In the application of “quasiprobability” to physics, one finds that intermediate quantities within calculations can go negative, while the final answer lies nicely in the interval [0,1][0,1]. I’ve been trying to think of an analogue where the intermediate quantities are probabilities mod pp, but I haven’t come up with one yet!

Posted by: Blake Stacey on March 23, 2019 10:30 PM | Permalink | Reply to this

Re: Entropy mod p

Thanks!

There should be notions of conditional entropy, mutual information, etc., that accompany the base notion of entropy mod pp.

I had a go at finding the right definition of relative entropy (this isn’t in the paper), and my conclusion was that in order to make it work, you had to define the relative entropy

H(πσ) H(\pi \| \sigma)

where π\pi is a probability distribution mod pp but σ\sigma is a probability distribution mod p 2p^2. Having them both be distributions mod pp seems to be impossible, in the sense that if you ask for relative entropy to obey the usual kinds of rules, there’s nothing nontrivial that does the job. I don’t know what to make of this except that perhaps it suggests passing to the limit and working with the pp-adic integers.

In the application of “quasiprobability” to physics, one finds that intermediate quantities within calculations can go negative, while the final answer lies nicely in the interval [0,1][0,1]. I’ve been trying to think of an analogue where the intermediate quantities are probabilities mod pp, but I haven’t come up with one yet!

I had a go at this too, but so far with no success. I also looked fruitlessly for an analogue of what Blass and Gurevich call “Piponi’s scenario”.

Posted by: Tom Leinster on March 25, 2019 10:23 AM | Permalink | Reply to this

Re: Entropy mod p

Blake wrote:

many of the fun things I know how to do with entropy over \mathbb{R} depend upon notions of joint, mutual and conditional entropy.

On reflection, there are immediate candidates for all of these notions.

After all, in ordinary information theory (i.e. over \mathbb{R}), all of these notions can be derived from Shannon entropy:

  • the joint entropy of a pair of random variables (X,Y)(X, Y) is just the Shannon entropy of the pair (X,Y)(X, Y) (itself a random variable)

  • conditional entropy is given in terms of joint and ordinary entropy, by H(X|Y)=H(X,Y)H(Y) H(X | Y) = H(X, Y) - H(Y)

  • mutual information is also given in terms of joint and ordinary entropy, by I(X;Y)=H(X)+H(Y)H(X,Y). I(X; Y) = H(X) + H(Y) - H(X, Y).

You could define joint entropy, conditional entropy and mutual information mod pp by these exact same formulas.

In what I hope is self-explanatory notation, the resulting definitions are: H(X) =1p(1 xP(x) p) H(X,Y) =1p(1 x,yP(x,y) p) H(X|Y) =1p( yP(y) p x,yP(x,y) p) I(X;Y) =1p(1+ x,yP(x,y) p xP(x) p yP(y) p). \begin{aligned} H(X) & = \frac{1}{p} \Bigl( 1 - \sum_x \mathbf{P}(x)^p \Bigr) \\ H(X, Y) & = \frac{1}{p} \Bigl( 1 - \sum_{x, y} \mathbf{P}(x, y)^p \Bigr) \\ H(X | Y) & = \frac{1}{p} \Bigl( \sum_y \mathbf{P}(y)^p - \sum_{x, y} \mathbf{P}(x, y)^p \Bigr) \\ I(X; Y) & = \frac{1}{p} \Bigl( 1 + \sum_{x, y} \mathbf{P}(x, y)^p - \sum_x \mathbf{P}(x)^p - \sum_y \mathbf{P}(y)^p \Bigr). \end{aligned} These are the formulas for ordinary entropy, joint entropy, conditional entropy and mutual information, respectively. As in the main post, all ppth powers are to be interpreted as integers mod p 2p^2 (recalling that the congruence class mod pp of an integer determines the congruence class mod p 2p^2 of its ppth power).

Of course, you then have to ask whether these definitions have satisfactory properties, analogous to those that hold in the real case. For instance, is it the case that H(X|Y)= y:P(Y=y)0P(Y=y)H(X|y) H(X | Y) = \sum_{y: \mathbf{P}(Y = y) \neq 0} \mathbf{P}(Y = y) H(X|y) (a formula that’s often taken as the definition of conditional entropy)? If my back-of-an-envelope calculations are right, it is indeed.

What next? What fun can be had?

Posted by: Tom Leinster on March 25, 2019 3:51 PM | Permalink | Reply to this

Re: Entropy mod p

Fascinating and mysterious stuff! I wish I had more time to think about this.

Over \mathbb{R}, we have plenty of entropic inequalities, whose complete classification is still an open problem. As a basic example, take H(XY)H(X)+H(Y)H(XY) \leq H(X) + H(Y), or equivalently the nonnegativity of mutual information. Is there something analogous for entropy mod pp? In other words, if you have given values for H(X)H(X) and H(Y)H(Y), are there any constraints on what H(XY)H(XY) can be?

Posted by: Tobias Fritz on March 25, 2019 4:22 PM | Permalink | Reply to this

Re: Entropy mod p

The issue of entropic inequalities does seem to be an important one that it would be good to have an analogue for. (And these inequalities can be subtle: Mutual information between two random variables is nonnegative, but the tertiary mutual information among three random variables doesn’t have to be.)

Posted by: Blake Stacey on March 25, 2019 11:25 PM | Permalink | Reply to this

Re: Entropy mod p

if you have given values for H(X)H(X) and H(Y)H(Y), are there any constraints on what H(XY)H(X Y) can be?

Interesting question.

Of course, no inequalities are going to happen as we’re in /p\mathbb{Z}/p\mathbb{Z}. But Tobias’s question above makes perfect sense.

I suspect the answer is no, there are no constraints. I spent a little while trying to show this, or more exactly that given any x,y,z/px, y, z \in \mathbb{Z}/p\mathbb{Z}, it’s possible to construct mod pp random variables XX and YY such that

H(X)=x,H(Y)=y,H(XY)=z. H(X) = x, \quad H(Y) = y, \quad H(X Y) = z.

I didn’t succeed, but I reckon it can be done.

As a piece of evidence for my belief, here’s something that happens mod pp but looks weird from the classical real viewpoint: two random variables XX and YY that both have entropy 00, but where the joint distribution has nonzero entropy. Take XX and YY to have joint distribution

(2 1 1 1 1 0 0 0 1) \begin{pmatrix} 2 &-1 &-1 \\ -1&1 &0 \\ 0 &0 &1 \end{pmatrix}

The distributions of XX and YY are the marginals, i.e. the column- and row-sums, which are (1,0,0)(1, 0, 0) and (0,0,1)(0, 0, 1). Both have entropy 00. But the joint entropy H(X,Y)H(X, Y) is

1p(1(2 p+3(1) p+21 p))=(22 p)/p \frac{1}{p} (1 - (2^p + 3\cdot (-1)^p + 2\cdot 1^p)) = (2 - 2^p)/p

(assuming p2p \neq 2). Now pp divides 22 p2 - 2^p, but p 2p^2 doesn’t divide 22 p2 - 2^p unless pp is a Wieferich prime, which are pretty rare. So H(X,Y)0H(X, Y) \neq 0 for most primes pp.

I don’t think we’ll need anything as sophisticated as Wieferich primes in order to show that H(X)H(X), H(Y)H(Y) and H(X,Y)H(X, Y) can take any triple of values. There’s probably some simple explicit construction. Well, that’s my guess anyway.

Posted by: Tom Leinster on March 26, 2019 1:27 PM | Permalink | Reply to this

Re: Entropy mod p

This is fascinating stuff! And while it all seems a bit strange, since nobody knows what to do with entropy mod pp yet, or what it means, I feel sure people will eventually figure it out: anything with formal properties this nice must be good for something, and must ‘mean’ something. You are just a bit ahead of your time here.

Have you thought at all about pp-adic entropy? I don’t know if it exists, though someone has studied the pp-adic entropy of a group action.

Posted by: John Baez on March 24, 2019 7:08 AM | Permalink | Reply to this

Re: Entropy mod p

That’s a nice thing to say :-)

I’ve only dipped my toes into the pp-adic idea. I think the paper of Deninger that you link to goes in another direction, as although his entropies take values in p\mathbb{Q}_p, his work doesn’t involve probabilities in p\mathbb{Q}_p.

It’s a bit hard to tell, as he’s not generalizing directly from the notion of the entropy of a probability distribution, but from the notion of the entropy of a group action. If I’m understanding it right, that ultimately depends on the notion of the entropy of a cover of a space (perhaps a partition of a measure space or a finite open cover of a topological space). So I think in principle that these are real probabilities. But I’m enough of a novice at this kind of stuff that I’m far from 100% sure.

Posted by: Tom Leinster on March 25, 2019 10:29 AM | Permalink | Reply to this

Re: Entropy mod p

Perhaps topological entropy has a mod pp version, if one takes the smallest cardinality of a subcover mod pp? I’m not sure if this is really a natural thing to do though.

Posted by: Tobias Fritz on March 25, 2019 4:27 PM | Permalink | Reply to this

Entropy mod p

If UU is a Z/p\mathbf{Z}/p-valued function on XX, maybe your entropy gives a notion of “Boltzmann distribution” attached to the “Hamiltonian” UU. Find the probability distribution that’s a stationary point for the entropy, instead of maximizing it.

If your probabilities are π 1\pi_1,…π n\pi_n, subject to π i=1\sum \pi_i = 1 and π iU(i)=U\sum \pi_i U(i) = \langle U\rangle, then the Lagrange multipliers way to find the “maximum entropy distribution” is to find the critical points of

H p(π 1,,π n)+α(1π i)+β(Uπ iU(i))H_p(\pi_1,\ldots,\pi_n) + \alpha(1 - \sum \pi_i) + \beta(\langle U \rangle - \sum \pi_i U(i))

that obey the same constraints. I think the derivative of H pH_p with respect to π i\pi_i (just treating it as a polynomial) is 1(p1)!((π i)π i p1)\frac{1}{(p-1)!}((\sum \pi_i) - \pi_i^{p-1}), so the Boltzmann distribution should solve

π i p1=(1+α+βU(i))\pi_i^{p-1} = (1 + \alpha + \beta U(i)) and π i=1\sum \pi_i = 1. Maybe take

Z(β)= i(1+βU(i)) 1/(p1)Z(\beta) = \sum_i (1 + \beta U(i))^{1/(p-1)}

(which is a well-defined series in Z/p[[β]]\mathbf{Z}/p[[\beta]]) for the “partition function.”

Posted by: DT on March 27, 2019 9:56 PM | Permalink | Reply to this

Re: Entropy mod p

The formula for real entropy is not a polynomial; it involves logarithms.

Morally it’s a polynomial :) log(x)=dxx+C\log(x)=\int\frac{dx}{x}+C. Integrate x 1+ϵx^{-1+\epsilon} and get x ϵϵ\frac{x^\epsilon}{\epsilon}. This blows up as ϵ0\epsilon\rightarrow 0. So renormalise by finding an appropriate counterterm to subtract. The term 1ϵ\frac{1}{\epsilon} will do. log(x)=lim ϵ0x ϵ1ϵ\log(x)=\lim_{\epsilon\rightarrow 0}\frac{x^\epsilon-1}{\epsilon} So log\log is “just a monomial added to a constant”. It’s tempting to say that modulo pp we should replace ϵ\epsilon in the limit with pp, which is just 00 modulo pp. But it doesn’t quite give the formula in the original article.

Posted by: Dan Piponi on March 28, 2019 9:10 PM | Permalink | Reply to this

Re: Entropy mod p

That’s an excellent point. Let me expand on it by mentioning something that you surely know about but not everyone does: the qq-logarithms and qq-logarithmic entropies.

For each qq \in \mathbb{R}, the qq-logarithm is the function

ln q:(0,) \ln_q : (0, \infty) \to \mathbb{R}

defined by

ln q(x)= 1 xt qdt. \ln_q(x) = \int_1^x t^{-q} d t.

So ln 1\ln_1 is the ordinary natural logarithm log\log, and for q1q \neq 1,

ln q(x)=x 1q11q. \ln_q(x) = \frac{x^{1 - q} - 1}{1 - q}.

It’s not too hard to show that ln q(x)\ln_q(x) is continuous in qq, and in particular,

log(x)=lim q1ln q(x) \log(x) = \lim_{q \to 1} \ \ln_q(x)

for each fixed xx. That’s exactly the formula you wrote, with a change of variables ϵ=1q\epsilon = 1 - q.

Ordinary Shannon entropy is defined in terms of logarithms:

H(π)=π ilog(1/π i). H(\pi) = \sum \pi_i \log(1/\pi_i).

We can get a family of deformations of Shannon entropy by replacing this logarithm by the qq-logarithm. The resulting qq-logarithmic entropy (usually misattributed to Tsallis) is

S q(π)=π iln q(1/π i). S_q(\pi) = \sum \pi_i \ln_q(1/\pi_i).

When q=1q = 1, this reduces to the Shannon definition, and for q1q \neq 1,

S q(π)=1q1(1π i q). S_q(\pi) = \frac{1}{q - 1} \Bigl( 1 - \sum \pi_i^q \Bigr).

I’d noticed the close resemblance between this formula and the formula for entropy mod pp,

H(π)=1p(1π i p). H(\pi) = \frac{1}{p} \Bigl( 1 - \sum \pi_i^p \Bigr).

If it wasn’t for the discrepancy in the constant terms, we’d be somehow replacing the continuous parameter qq with a prime number pp. I don’t know what to make of this.

Posted by: Tom Leinster on March 28, 2019 10:43 PM | Permalink | Reply to this

Re: Entropy mod p

A bit late to the discussion but I think the discrepancy in the constant term is purely the result of normalising in a non-standard way.

To explain this I need to explain how the pp-adic exponential and logarithm work: if you consider the usual power series definitions of log\log and exp\exp ie log(1+x)= n=1 (1) n+1x nn\log (1+x)=\sum_{n=1}^\infty (-1)^{n+1} \frac{x^n}{n} and expx= n=0 x nn!\exp x=\sum_{n=0}^\infty \frac{x^n}{n!} then log\log defines a group isomorphism (1+p p,)(p p,+)(1+p\mathbb{Z}_p,\cdot)\to (p\mathbb{Z}_p,+) with inverse exp\exp. Moreover since

(1) p ×(/p) ××(1+p p), \mathbb{Z}_p^\times\cong (\mathbb{Z}/p\mathbb{Z})^{\times}\times (1+p\mathbb{Z}_p),

log\log on 1+p p1+p\mathbb{Z}_p extends uniquely to a group homomorphism p ×p p\mathbb{Z}_p^\times\to p\mathbb{Z}_p; this sends all the p1p-1st roots of 11 to 00.

Now if you compose this extended log map with reduction mod p 2p^2 then its kernel is 1+p 2 p1+p^2\mathbb{Z}_p and you induce a group homomorphism p ×/(1+p 2 p)p p/p 2 p\mathbb{Z}_p^\times/(1+p^2\mathbb{Z}_p)\to p\mathbb{Z}_p/p^2\mathbb{Z}_p. The domain of this is naturally isomorphic to (/p 2) ×(\mathbb{Z}/p^2\mathbb{Z})^\times and the codomain is naturally isomorphic to /p\mathbb{Z}/p\mathbb{Z} (under division by pp). So you’d expect this to be your mod pp logarithm. But in fact they differ by a factor of 1-1.

So I’d propose a better way to normalise this mod pp logarithm is log(x)=1px p11p1=1p(1x p1).\log(x)=\frac{1}{p}\frac{x^{p-1}-1}{p-1}=-\frac{1}{p}(1-x^{p-1}). As hinted above the 1p\frac{1}{p} is just coming from the fact that you want the result to live in /p\mathbb{Z}/p\mathbb{Z} not (arguably its more natural home) p/p 2p\mathbb{Z}/p^2\mathbb{Z}.

As an extension to this already extended remark you can also view the extended log\log defined on p ×\mathbb{Z}_p^\times as log(x)=lim nx p n(p1)1p n(p1)\log(x)=\lim_{n\to \infty} \frac{x^{p^n(p-1)}-1}{p^n(p-1)} which is exactly parallel to the formula mentioned by Dan Piponi since p n(p1)0p^n(p-1)\to 0 and nn\to\infty.

I’d be surprised if you couldn’t extend all of the results in your paper to this pp-adic setting with very little effort although I haven’t made any attempt to do so for myself.

Posted by: Simon Wadsley on April 9, 2019 9:55 AM | Permalink | Reply to this

Post a New Comment