## March 21, 2019

### Entropy mod p

#### Posted by Tom Leinster

I’ve just arXived a new paper:

Tom Leinster, Entropy modulo a prime, arXiv:1903.06961

In it, you’ll find answers to these questions and more — and I’ll also answer them in this post:

• What are logarithms and derivations mod $p$?

• What is the right definition of entropy for probability distributions whose “probabilities” are integers modulo a prime $p$?

• What is the right mod $p$ analogue of information loss?

• What does it mean to say that $\log \sqrt{8} \equiv 3 \pmod{7}$?

• Why is entropy mod $p$ a polynomial?

• What does all this have to do with polylogarithms and homology?

I’ve posted about entropy mod $p$ a couple of times before, while I was working the whole thing out, but I’ve now tidied it up enough to write up properly. Here are my answers to the questions above.

• What are logarithms and derivations mod $p$?

What this question isn’t about is discrete logarithms.

What it is about is the following problem. If you wanted an analogue of the ordinary real logarithm in $\mathbb{Z}/p\mathbb{Z}$, your first thought would probably be to seek a group homomorphism

$\log : \bigl((\mathbb{Z}/p\mathbb{Z})^\times, \cdot\bigr) \to (\mathbb{Z}/p\mathbb{Z}, +).$

But that’s impossible: Lagrange’s theorem immediately implies that the only such homomorphism is trivial.

However, there’s a next-best thing. It’s the Fermat quotient, which up to a constant factor is the unique homomorphism

$q: \bigl((\mathbb{Z}/p^2\mathbb{Z})^\times, \cdot\bigr) \to (\mathbb{Z}/p\mathbb{Z}, +).$

It’s given explicitly by the formula

$q(a) = \frac{a^{p - 1} - 1}{p},$

which is an integer by Fermat’s little theorem. The Fermat quotient turns out to provide an acceptable substitute for the (non-existent) logarithm.
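As a quick sanity check (mine, not from the paper), here are a few lines of Python computing the Fermat quotient from an integer representative and verifying the homomorphism property; the name `fermat_quotient` is my own:

```python
def fermat_quotient(a: int, p: int) -> int:
    """q(a) = (a^(p-1) - 1)/p mod p, for an integer a not divisible by p."""
    assert a % p != 0
    # a^(p-1) mod p^2 pins down the numerator mod p^2, hence the quotient mod p
    return (pow(a, p - 1, p * p) - 1) // p % p

p = 7
# q is a homomorphism ((Z/p^2)^x, *) -> (Z/p, +):
for a in range(1, p * p):
    for b in range(1, p * p):
        if a % p != 0 and b % p != 0:
            assert fermat_quotient(a * b, p) == \
                   (fermat_quotient(a, p) + fermat_quotient(b, p)) % p
```

Since $q$ only depends on $a$ mod $p^2$, the loop really is checking the homomorphism property on $(\mathbb{Z}/p^2\mathbb{Z})^\times$.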

What about derivations? Over $\mathbb{R}$, the function

$\partial : x \mapsto \begin{cases} -x \log x &\text{if }\ x \gt 0 \\ 0 &\text{if }\ x = 0 \end{cases}$

is a nonlinear derivation, in the sense that it satisfies the Leibniz rule

$\partial(x y) = x \partial(y) + \partial(x) y.$

The correct analogue over $\mathbb{Z}/p\mathbb{Z}$ is the function

$\partial: \mathbb{Z}/p^2\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}$

given by

$\partial(a) = \frac{a - a^p}{p}.$

This satisfies the same Leibniz rule. If $a$ is not divisible by $p$ then $\partial(a) = -a q(a)$, as you’d expect from the analogy with the real case. Otherwise, $\partial(a) = a/p$.
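Here is a small computational check (again my own sketch, not from the paper) that this $\partial$ really does satisfy the Leibniz rule, treating elements of $\mathbb{Z}/p^2\mathbb{Z}$ as integer representatives:

```python
def d(a: int, p: int) -> int:
    """The derivation ∂(a) = (a - a^p)/p, mapping Z/p^2 -> Z/p."""
    return (a - pow(a, p, p * p)) // p % p

p = 7
# Leibniz rule ∂(xy) = x ∂(y) + ∂(x) y, with the outer x, y read mod p
for x in range(p * p):
    for y in range(p * p):
        assert d(x * y, p) == (x * d(y, p) + d(x, p) * y) % p

assert d(21, p) == 21 // p % p   # ∂(a) = a/p when p divides a
assert d(2, p) == (-2 * 2) % p   # ∂(a) = -a q(a); here q(2) = 2 mod 7
```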

• What is the right definition of entropy for probability distributions whose “probabilities” are integers modulo a prime $p$?

Ordinarily, a finite probability distribution $\pi$ is a finite sequence $(\pi_1, \ldots, \pi_n)$ of nonnegative reals summing to $1$. What I’m interested in is the same kind of thing, but where $\pi_1, \ldots, \pi_n$ are elements of the ring $\mathbb{Z}/p\mathbb{Z}$.

This might seem odd. So it’s useful to be somewhat aware of the idea of “negative probabilities”, which has a long history in physics going back at least to Feynman. There are several nice papers on that topic, including Feynman’s and a couple by Blass and Gurevich, but here I’ll just link to a recent paper of Blass and Gurevich which gives lots of references and history on the first page. To quote from the abstract:

We argue that the right question is not what negative probabilities are but what they are for.

My attitude to “probabilities mod $p$” is the same. Which isn’t to claim I know what they’re “for” — in fact, I’d love to find an application for this theory, even within mathematics — but it’s a much easier question to answer than “what are they?”

Anyway, I show in my paper that the right definition of the entropy $H(\pi)$ of a probability distribution $\pi = (\pi_1, \ldots, \pi_n)$ mod $p$ is

$H(\pi) = \frac{1}{p} \Bigl( 1 - \sum \pi_i^p \Bigr) \in \mathbb{Z}/p\mathbb{Z}.$

At first, this looks like it doesn’t even make sense. The $\pi_i$ are integers mod $p$, so $1 - \sum \pi_i^p$ is an integer mod $p$, so how can you divide it by $p$? After all, $p = 0$ in $\mathbb{Z}/p\mathbb{Z}$, and you can’t divide by zero.

What saves the day is the little lemma that if $a \equiv b \pmod{p}$ then $a^p \equiv b^p \pmod{p^2}$. This is a pleasant exercise. The upshot is that the operation of raising to the power of $p$ can be regarded as a map

$(-)^p : \mathbb{Z}/p\mathbb{Z} \to \mathbb{Z}/p^2\mathbb{Z}.$

Interpreted in this way, $1 - \sum \pi_i^p$ is an element of $\mathbb{Z}/p^2\mathbb{Z}$. Since $\sum \pi_i^p = \sum \pi_i = 1$ as elements of $\mathbb{Z}/p\mathbb{Z}$, we can divide by $p$, and the result is the entropy.

(Another way to put it is that $H(\pi) = \tfrac{1}{p}(1 - \sum a_i^p)$, where $a_1, \ldots, a_n$ are integers representing $\pi_1, \ldots, \pi_n$.)
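In code (a sketch of my own, with my own function name): compute the entropy from integer representatives, and check that the answer doesn’t depend on which representatives you pick:

```python
def entropy_mod_p(pi, p):
    """H(pi) = (1 - sum a_i^p)/p mod p, where the a_i are integer
    representatives of the probabilities; requires sum(pi) = 1 mod p."""
    assert sum(pi) % p == 1
    # pow(a, p, p*p) is the p-th power map Z/p -> Z/p^2 from the lemma above
    s = sum(pow(a, p, p * p) for a in pi) % (p * p)
    return (1 - s) // p % p

p = 7
assert entropy_mod_p([1], p) == 0   # a point mass has zero entropy
# changing representatives by multiples of p doesn't change the entropy:
assert entropy_mod_p([2, 2, 4], p) == entropy_mod_p([2 + 7, 2 + 70, 4 - 7], p)
```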

Why’s this the right definition of entropy? Because up to a scalar multiple, it’s the one and only quantity that satisfies the same functional equation that characterizes real entropy — variously called the chain rule, recursivity, or the grouping law. All this is done precisely in the paper.

• What is the right analogue of information loss mod $p$?

There’s a quote about the Grothendieck–Riemann–Roch theorem I really like:

Grothendieck came along and said “No, the Riemann–Roch theorem is not a theorem about varieties, it’s a theorem about morphisms between varieties.”

(Nicholas Katz, quoted here).

In geometry and topology, this kind of conceptual shift is often described with the word relative. Instead of studying spaces $X$, you study spaces $X$ relative to a base $B$, which means maps $X \to B$ of a suitable kind. The classical case of mere spaces is where $B$ is the one-point space.

Categorically, it’s the conceptual shift from objects to maps. Entropy is an invariant of probability spaces. What about maps between probability spaces — measure-preserving transformations? Is there a sensible numerical invariant of these, which reduces to entropy when the codomain is the one-point space?

John Baez, Tobias Fritz and I showed that the answer is yes. You can talk in a reasonable way about the information loss of a measure-preserving map between probability spaces. (The terminology makes sense if you think of such a map as a deterministic process: click the link for explanation!) It’s closely related to conditional entropy.

The shift from entropy to information loss has the following big advantage. The recursivity equation that characterizes entropy is a bit complicated. But John, Tobias and I proved that information loss is characterized by a couple of equations that are nothing more than linearity conditions. They’re easy:

1. if you do two processes in sequence, the amount of information lost by the composite process is the sum of the amounts lost by each process individually;

2. if you flip a probability-$\lambda$ coin and, depending on the outcome, do one of two different processes, the information lost overall is $\lambda$ times the information lost by the first process plus $1 - \lambda$ times the information lost by the second.

Everything I’ve just been saying about information loss is over $\mathbb{R}$. Over $\mathbb{Z}/p\mathbb{Z}$, it’s much harder to understand things information-theoretically. But I show in my new paper that there is a precisely analogous definition of information loss mod $p$, and it satisfies a precisely analogous characterization theorem.

• What does it mean to say that $\log \sqrt{8} \equiv 3 \pmod{7}$?

The direct inspiration for my paper was a very short note by Maxim Kontsevich (which I’m grateful to Herbert Gangl for pointing out to me). In it, Kontsevich made an intriguing suggestion.

Consider a finite probability distribution

$\pi = (\pi_1, \ldots, \pi_n)$

where all the probabilities $\pi_i$ are rational numbers, say $\alpha_i/\beta$. Rational probabilities arise naturally: you get them in ball-and-urn scenarios and other combinatorial problems.

We can, of course, regard the $\pi_i$s as real numbers, and take the real entropy

$H_\mathbb{R}(\pi) = - \sum_{i: \pi_i \neq 0} \pi_i \log \pi_i \in \mathbb{R}.$

But there’s also something else we can do.

Only finitely many primes divide $\beta$, and as long as $p$ doesn’t have the bad luck to be one of them, each $\pi_i = \alpha_i/\beta$ defines an element of $\mathbb{Z}/p\mathbb{Z}$. Hence $\pi$ defines a probability distribution mod $p$, and we can take its mod $p$ entropy

$H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}.$

Kontsevich’s suggestion, playfully made, was to view $H_p(\pi)$ as the “residue mod $p$” of the real (usually transcendental) number $H_\mathbb{R}(\pi)$.

For instance, take

$\pi = (1/4, 1/4, 1/2)$

and $p = 7$. The real entropy of the distribution is

$H_\mathbb{R}(\pi) = \frac{1}{4} \log 4 + \frac{1}{4} \log 4 + \frac{1}{2} \log 2 = \log \sqrt{8}.$

In $\mathbb{Z}/7\mathbb{Z}$,

$\pi = (2, 2, 4),$

so

$H_7(\pi) = \frac{1}{7} \Bigl( 1 - [2^7 + 2^7 + 4^7]\Bigr) = 3 \in \mathbb{Z}/7\mathbb{Z}.$

In this sense,

$\log \sqrt{8} \equiv 3 \pmod{7}.$
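To see this on a machine, here is a sketch of my own (the helper names `reduce_mod_p` and `entropy_mod_p` are mine):

```python
from fractions import Fraction
import math

def entropy_mod_p(pi, p):
    s = sum(pow(a, p, p * p) for a in pi) % (p * p)
    return (1 - s) // p % p

def reduce_mod_p(fracs, p):
    # a/b defines an element of Z/p as long as p doesn't divide b
    return [f.numerator * pow(f.denominator, -1, p) % p for f in fracs]

pi = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
residues = reduce_mod_p(pi, 7)    # [2, 2, 4]
H7 = entropy_mod_p(residues, 7)   # 3, i.e. "log sqrt(8) = 3 (mod 7)"

# the real entropy of the same distribution really is log sqrt(8)
H_real = -sum(float(x) * math.log(float(x)) for x in pi)
assert abs(H_real - math.log(math.sqrt(8))) < 1e-12
```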

There’s a bit of work to do in order to see that Kontsevich’s suggestion really makes sense. For instance, if we’re going to view $H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}$ as the “residue” of the real number $H_\mathbb{R}(\pi)$, then it had better be the case that

$H_\mathbb{R}(\pi) = H_\mathbb{R}(\sigma) \quad\Rightarrow\quad H_p(\pi) = H_p(\sigma).$

And I show in the paper that it is.

• Why is entropy mod $p$ a polynomial?

The formula for real entropy is not a polynomial; it involves logarithms. But over $\mathbb{Z}/p\mathbb{Z}$, entropy is a polynomial.

I don’t mean the formula $\tfrac{1}{p} (1 - \sum_i \pi_i^p)$ above, which isn’t a polynomial in the usual sense: recall that we had to interpret it rather carefully in order for it to make sense. The genuine polynomial formula turns out to be

$H(\pi) = - \sum \frac{\pi_1^{r_1} \cdots \pi_n^{r_n}}{r_1! \cdots r_n!},$

where the sum is over all $0 \leq r_1, \ldots, r_n \lt p$ summing to $p$.

It’s a triviality that entropy can be expressed as some polynomial, for the simple reason that every function $K^n \to K$ on a finite field $K$ can be. But what’s interesting is this particular polynomial.

Let’s call this polynomial $h$: so

$h(x_1, \ldots, x_n) = - \sum \frac{x_1^{r_1} \cdots x_n^{r_n}}{r_1! \cdots r_n!}.$

By a small abuse of notation, $h$ is actually the name of a whole sequence of polynomials $h(x_1, \ldots, x_n)$, one for each $n \geq 1$.

The one-variable polynomial $h(x_1)$ is zero. The two-variable polynomial $h(x_1, x_2)$ reduces to something simpler in the case where $x_1 + x_2 = 1$ (which we care about because we’re interested in probability distributions):

$h(x, 1 - x) = \sum_{0 \lt r \lt p} \frac{x^r}{r}.$

That’s not immediately obvious from the definition, but it’s another pleasant exercise.
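Both claims are easy to test by brute force. Here is a sketch (mine, with my own helper names) that checks, for $p = 5$, that the polynomial $h$ agrees with the entropy formula on distributions, and that $h(x, 1-x) = \sum_{0 \lt r \lt p} x^r/r$:

```python
from itertools import product
from math import factorial

def entropy_mod_p(dist, p):
    s = sum(pow(a, p, p * p) for a in dist) % (p * p)
    return (1 - s) // p % p

def h(xs, p):
    """h(x_1,...,x_n) = -sum x^r / (r_1! ... r_n!), over 0 <= r_i < p, sum r_i = p."""
    total = 0
    for rs in product(range(p), repeat=len(xs)):
        if sum(rs) != p:
            continue
        term = 1
        for x, r in zip(xs, rs):
            term = term * pow(x, r, p) * pow(factorial(r), -1, p) % p
        total = (total + term) % p
    return -total % p

p = 5
for x in range(p):
    for y in range(p):
        dist = [x, y, (1 - x - y) % p]   # a probability distribution mod p
        assert h(dist, p) == entropy_mod_p(dist, p)
    # h(x, 1-x) = sum_{0<r<p} x^r / r
    assert h([x, (1 - x) % p], p) == \
        sum(pow(x, r, p) * pow(r, -1, p) for r in range(1, p)) % p
```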

• What does this have to do with polylogarithms and homology?

The sum

$h(x, 1 - x) = \sum_{0 \lt r \lt p} \frac{x^r}{r}$

is, formally, a truncation of the power series of $-\log(1 - x)$. That’s a bit surprising, in that it’s the mod $p$ analogue of

$H_\mathbb{R}(x, 1 - x) = - x \log x - (1 - x) \log(1 - x),$

which is not the same as $-\log(1 - x)$! Nonetheless, it’s true.

There’s a classical theory of polylogarithms, which are the functions

$z \mapsto \sum_{r \geq 1} \frac{z^r}{r^s}$

on the complex plane. Here $s$ is a complex parameter. When $s = 1$, we recover the ordinary logarithm, essentially: it’s $z \mapsto -\log(1 - z)$.

Kontsevich’s note, and the paper of Elbaz-Vincent and Gangl in which Kontsevich’s note was published as an appendix, are both interested in “finite polylogarithms”. These are polynomials of the form

$\sum_{0 \lt r \lt p} \frac{x^r}{r^s}$

for integers $s \geq 1$. I know very little about this work — but its starting point is the case $s = 1$ that comes out of the theory of entropy mod $p$.

Finally, what’s the homological connection? We got into this a bit in the conversation Charles Rezk and I had on another recent post, and my comment there gives some links to related work. But the simplest thing to say is this.

Consider the entropy polynomials $h(x_1, x_2, \ldots, x_n)$. We looked at the cases $n = 1$ and $n = 2$. What about $n = 3$? It turns out that

$h(x, y, z) = h(x, y) + h(x + y, z).$

It’s obvious from the definition of $h$ that it’s a symmetric polynomial, so also

$h(x, y, z) = h(y, z) + h(x, y + z).$

Putting together these two equations gives

$h(x, y) - h(x, y + z) + h(x + y, z) - h(y, z) = 0.$

In other words, $h$ is a cocycle! And it can be shown that it’s a nontrivial cocycle — that is, you can’t find a $j$ satisfying

$h(x, y) = j(x) - j(x + y) + j(y).$
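Both facts can be checked by machine. Here is a sketch of my own that verifies the cocycle identity for $p = 5$, and for $p = 3$ exhaustively confirms that no cobounding $j$ exists:

```python
from itertools import product
from math import factorial

def h2(x, y, p):
    """h(x, y) = -sum_{0 < r < p} x^r y^(p-r) / (r! (p-r)!) mod p."""
    s = sum(pow(x, r, p) * pow(y, p - r, p)
            * pow(factorial(r) * factorial(p - r), -1, p)
            for r in range(1, p)) % p
    return -s % p

p = 5
# the cocycle identity h(x,y) - h(x,y+z) + h(x+y,z) - h(y,z) = 0
for x, y, z in product(range(p), repeat=3):
    assert (h2(x, y, p) - h2(x, (y + z) % p, p)
            + h2((x + y) % p, z, p) - h2(y, z, p)) % p == 0

# nontriviality: for p = 3, no j: Z/3 -> Z/3 satisfies h(x,y) = j(x) - j(x+y) + j(y)
p = 3
cobounds = any(
    all((j[x] - j[(x + y) % p] + j[y]) % p == h2(x, y, p)
        for x in range(p) for y in range(p))
    for j in product(range(p), repeat=p)
)
assert not cobounds
```

(For $p = 3$ there are only $3^3 = 27$ candidate functions $j$, so the exhaustive search is instant.)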

Work by people including Cathelineau, Kontsevich, Elbaz-Vincent, Gangl, Bennequin and Baudot exploits this homological connection (references here), but as I don’t know much about all that, I’ll stop there.

Posted at March 21, 2019 8:05 PM UTC


### Re: Entropy mod p

No one’s biting, maybe because I’ve posted about this a couple of times before, or maybe because it’s hard to see what this is all for.

If that’s your reaction, I’m with you. I find it remarkable that the formalism of entropy over $\mathbb{R}$ — so rich in meaning in information theory, physics, and many other subjects — has a precise analogue over $\mathbb{Z}/p\mathbb{Z}$, where no such meaning is apparent. Still, it’s frustrating not to be able to assign a meaning.

The trouble is not so much with entropy modulo a prime as with probabilities modulo a prime. How can you possibly understand a “probability” that belongs to $\mathbb{Z}/p\mathbb{Z}$?

I don’t know, but it’s useful to compare the situation with negative probabilities. I’ve already recommended a recent paper of Blass and Gurevich, the introduction to which is extremely lucid. And I’ve already quoted a line from the abstract:

We argue that the right question is not what negative probabilities are but what they are for.

In the introduction to their paper, they go on to show how negative probabilities can be used (“what they are for”).

What I’m lacking is a demonstration of how probabilities mod $p$, and entropy mod $p$, can be used.

Posted by: Tom Leinster on March 23, 2019 4:23 PM | Permalink | Reply to this

### Re: Entropy mod p

I wrote:

What I’m lacking is a demonstration of how probabilities mod $p$, and entropy mod $p$, can be used.

That’s actually not quite true. I can come up with artificial problems that can be “solved” with entropy mod $p$, though probably there are easier solutions.

Here’s one. Suppose we have an infinite row of urns (1, 2, 3, …), each containing a number of balls. There are only finitely many balls in total, so if we record the number of balls in each urn then we get a sequence

$a_1, a_2, \ldots$

of nonnegative integers, all but finitely many of which are zero.

Suppose we allow ourselves to do the following moves:

• If some urn contains three or more balls, we can transfer two of them to an empty urn and one more of them to another empty urn.

• The inverse of the previous move.

• If urns $i$, $j$ and $k$ contain the same number of balls as each other, and urns $i'$, $j'$ and $k'$ also contain the same number of balls as each other, we can transfer one ball from urn $i$ to urn $i'$, one from urn $j$ to urn $j'$, and one from urn $k$ to urn $k'$.

Given two sequences $(a_i)$ and $(b_i)$ as above, one can ask whether it’s possible to transform $(a_i)$ into $(b_i)$ using a sequence of these moves.

Obviously, a necessary condition for this is that $\sum a_i = \sum b_i$ (since the moves preserve the total number of balls).

But it’s not sufficient. For example, the sequences

$1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0, 0, \ldots$

and

$3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 3, 0, 0, 0, \ldots$

both sum to $55$ (i.e. there are $55$ balls), but it is impossible to transform one to the other using the moves above.

One way to prove this is to use entropy mod $3$. I’m definitely not claiming it’s the easiest or most natural way, but it can be done, as follows.

Since $55 \equiv 1 \pmod{3}$, any finite sequence summing to $55$ is a probability distribution mod $3$, so we can take its entropy. Specifically, the entropy of such a sequence $(a_i)$ is

$\frac{1}{3} \Bigl( 1 - \sum a_i^3 \Bigr) \in \mathbb{Z}/3\mathbb{Z}.$

So, two sequences $(a_i)$ and $(b_i)$ have the same entropy mod $3$ iff

$\sum a_i^3 \equiv \sum b_i^3 \pmod{9}.$

It’s easy to see that each of the allowable moves leaves the sum of the cubes unchanged, mod $9$. So, a necessary condition for one sequence to be transformable into another is that the sums of their cubes are the same, mod $9$.

For the two sequences above, that’s not the case. For the first, the cube-sum is $1$ mod 9 (so its entropy mod 3 is $(1 - 1)/3 = 0$). For the second, the cube-sum is $-2$ mod 9 (so its entropy mod 3 is $(1 - (-2))/3 = 1$). Thus, the first can’t be transformed into the second.
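Here is the arithmetic, machine-checked (a sketch of my own):

```python
a = list(range(1, 11))                        # 1, 2, ..., 10
b = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 3]  # the second sequence
assert sum(a) == sum(b) == 55                 # same number of balls...

def entropy_mod_3(seq):
    # valid because sum(seq) = 55 = 1 (mod 3)
    s = sum(pow(x, 3, 9) for x in seq) % 9
    return (1 - s) // 3 % 3

# ...but different entropies mod 3, so no sequence of moves works
assert entropy_mod_3(a) != entropy_mod_3(b)
```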

Posted by: Tom Leinster on March 23, 2019 4:43 PM | Permalink | Reply to this

### Re: Entropy mod p

Rest assured, I’m not being quiet out of a lack of interest! I’m just operating under the disadvantage that many of the fun things I know how to do with entropy over $\mathbb{R}$ depend upon notions of joint, mutual and conditional entropy.

I know a bit about “negative probability”, since it relates to the topic I’ve been exploring in my recent guest posts (thanks again to John for inviting me!). In the application of “quasiprobability” to physics, one finds that intermediate quantities within calculations can go negative, while the final answer lies nicely in the interval $[0,1]$. I’ve been trying to think of an analogue where the intermediate quantities are probabilities mod $p$, but I haven’t come up with one yet!

Posted by: Blake Stacey on March 23, 2019 10:30 PM | Permalink | Reply to this

### Re: Entropy mod p

Thanks!

There should be notions of conditional entropy, mutual information, etc., that accompany the base notion of entropy mod $p$.

I had a go at finding the right definition of relative entropy (this isn’t in the paper), and my conclusion was that in order to make it work, you had to define the relative entropy

$H(\pi \| \sigma)$

where $\pi$ is a probability distribution mod $p$ but $\sigma$ is a probability distribution mod $p^2$. Having them both be distributions mod $p$ seems to be impossible, in the sense that if you ask for relative entropy to obey the usual kinds of rules, there’s nothing nontrivial that does the job. I don’t know what to make of this except that perhaps it suggests passing to the limit and working with the $p$-adic integers.

In the application of “quasiprobability” to physics, one finds that intermediate quantities within calculations can go negative, while the final answer lies nicely in the interval $[0,1]$. I’ve been trying to think of an analogue where the intermediate quantities are probabilities mod $p$, but I haven’t come up with one yet!

I had a go at this too, but so far with no success. I also looked fruitlessly for an analogue of what Blass and Gurevich call “Piponi’s scenario”.

Posted by: Tom Leinster on March 25, 2019 10:23 AM | Permalink | Reply to this

### Re: Entropy mod p

Blake wrote:

many of the fun things I know how to do with entropy over $\mathbb{R}$ depend upon notions of joint, mutual and conditional entropy.

On reflection, there are immediate candidates for all of these notions.

After all, in ordinary information theory (i.e. over $\mathbb{R}$), all of these notions can be derived from Shannon entropy:

• the joint entropy of a pair of random variables $(X, Y)$ is just the Shannon entropy of the pair $(X, Y)$ (itself a random variable)

• conditional entropy is given in terms of joint and ordinary entropy, by $H(X | Y) = H(X, Y) - H(Y)$

• mutual information is also given in terms of joint and ordinary entropy, by $I(X; Y) = H(X) + H(Y) - H(X, Y).$

You could define joint entropy, conditional entropy and mutual information mod $p$ by these exact same formulas.

In what I hope is self-explanatory notation, the resulting definitions are:

$\begin{aligned} H(X) & = \frac{1}{p} \Bigl( 1 - \sum_x \mathbf{P}(x)^p \Bigr) \\ H(X, Y) & = \frac{1}{p} \Bigl( 1 - \sum_{x, y} \mathbf{P}(x, y)^p \Bigr) \\ H(X | Y) & = \frac{1}{p} \Bigl( \sum_y \mathbf{P}(y)^p - \sum_{x, y} \mathbf{P}(x, y)^p \Bigr) \\ I(X; Y) & = \frac{1}{p} \Bigl( 1 + \sum_{x, y} \mathbf{P}(x, y)^p - \sum_x \mathbf{P}(x)^p - \sum_y \mathbf{P}(y)^p \Bigr). \end{aligned}$

These are the formulas for ordinary entropy, joint entropy, conditional entropy and mutual information, respectively. As in the main post, all $p$th powers are to be interpreted as integers mod $p^2$ (recalling that the congruence class mod $p$ of an integer determines the congruence class mod $p^2$ of its $p$th power).

Of course, you then have to ask whether these definitions have satisfactory properties, analogous to those that hold in the real case. For instance, is it the case that $H(X | Y) = \sum_{y: \mathbf{P}(Y = y) \neq 0} \mathbf{P}(Y = y) H(X|y)$ (a formula that’s often taken as the definition of conditional entropy)? If my back-of-an-envelope calculations are right, it is indeed.
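That back-of-an-envelope check is easy to reproduce by machine. Here is a sketch of my own, using a small joint distribution mod $5$ and assuming each $\mathbf{P}(y)$ is invertible mod $p$ so that the conditional distributions make sense:

```python
p = 5
P = [[1, 1], [1, 3]]   # joint distribution P[x][y]; total = 6 = 1 (mod 5)

def H(dist):
    s = sum(pow(a, p, p * p) for a in dist) % (p * p)
    return (1 - s) // p % p

H_Y = H([sum(P[x][y] for x in range(2)) for y in range(2)])
H_XY = H([P[x][y] for x in range(2) for y in range(2)])
H_X_given_Y = (H_XY - H_Y) % p   # conditional entropy, as defined above

# grouping formula: H(X|Y) = sum_y P(y) H(X|y)
rhs = 0
for y in range(2):
    Py = sum(P[x][y] for x in range(2)) % p
    cond = [P[x][y] * pow(Py, -1, p) % p for x in range(2)]  # X given Y = y
    rhs = (rhs + Py * H(cond)) % p
assert rhs == H_X_given_Y
```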

What next? What fun can be had?

Posted by: Tom Leinster on March 25, 2019 3:51 PM | Permalink | Reply to this

### Re: Entropy mod p

Over $\mathbb{R}$, we have plenty of entropic inequalities, whose complete classification is still an open problem. As a basic example, take $H(XY) \leq H(X) + H(Y)$, or equivalently the nonnegativity of mutual information. Is there something analogous for entropy mod $p$? In other words, if you have given values for $H(X)$ and $H(Y)$, are there any constraints on what $H(XY)$ can be?

Posted by: Tobias Fritz on March 25, 2019 4:22 PM | Permalink | Reply to this

### Re: Entropy mod p

The issue of entropic inequalities does seem to be an important one that it would be good to have an analogue for. (And these inequalities can be subtle: the mutual information between two random variables is nonnegative, but the triple mutual information among three random variables doesn’t have to be.)

Posted by: Blake Stacey on March 25, 2019 11:25 PM | Permalink | Reply to this

### Re: Entropy mod p

if you have given values for $H(X)$ and $H(Y)$, are there any constraints on what $H(X Y)$ can be?

Interesting question.

Of course, no inequalities are going to happen as we’re in $\mathbb{Z}/p\mathbb{Z}$. But Tobias’s question above makes perfect sense.

I suspect the answer is no, there are no constraints. I spent a little while trying to show this, or more exactly that given any $x, y, z \in \mathbb{Z}/p\mathbb{Z}$, it’s possible to construct mod $p$ random variables $X$ and $Y$ such that

$H(X) = x, \quad H(Y) = y, \quad H(X Y) = z.$

I didn’t succeed, but I reckon it can be done.

As a piece of evidence for my belief, here’s something that happens mod $p$ but looks weird from the classical real viewpoint: two random variables $X$ and $Y$ that both have entropy $0$, but where the joint distribution has nonzero entropy. Take $X$ and $Y$ to have joint distribution

$\begin{pmatrix} 2 &-1 &-1 \\ -1&1 &0 \\ 0 &0 &1 \end{pmatrix}$

The distributions of $X$ and $Y$ are the marginals, i.e. the column- and row-sums, which are $(1, 0, 0)$ and $(0, 0, 1)$. Both have entropy $0$. But the joint entropy $H(X, Y)$ is

$\frac{1}{p} (1 - (2^p + 3\cdot (-1)^p + 2\cdot 1^p)) = (2 - 2^p)/p$

(assuming $p \neq 2$). Now $p$ divides $2 - 2^p$, but $p^2$ doesn’t divide $2 - 2^p$ unless $p$ is a Wieferich prime, and Wieferich primes are pretty rare. So $H(X, Y) \neq 0$ for most primes $p$.
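This is quick to confirm computationally (a sketch of my own), including the exception at the Wieferich prime $1093$, where $p^2$ divides $2^{p-1} - 1$:

```python
P = [[2, -1, -1], [-1, 1, 0], [0, 0, 1]]   # the joint distribution above

def H(dist, p):
    s = sum(pow(a, p, p * p) for a in dist) % (p * p)
    return (1 - s) // p % p

X_marg = [sum(col) for col in zip(*P)]     # column sums: (1, 0, 0)
Y_marg = [sum(row) for row in P]           # row sums:    (0, 0, 1)
flat = [e for row in P for e in row]

for p in [3, 5, 7, 11, 13]:
    assert H(X_marg, p) == 0 and H(Y_marg, p) == 0
    assert H(flat, p) != 0                 # joint entropy (2 - 2^p)/p is nonzero

assert H(flat, 1093) == 0                  # ...but it vanishes at a Wieferich prime
```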

I don’t think we’ll need anything as sophisticated as Wieferich primes in order to show that $H(X)$, $H(Y)$ and $H(X, Y)$ can take any triple of values. There’s probably some simple explicit construction. Well, that’s my guess anyway.

Posted by: Tom Leinster on March 26, 2019 1:27 PM | Permalink | Reply to this

### Re: Entropy mod p

This is fascinating stuff! And while it all seems a bit strange, since nobody knows what to do with entropy mod $p$ yet, or what it means, I feel sure people will eventually figure it out: anything with formal properties this nice must be good for something, and must ‘mean’ something. You are just a bit ahead of your time here.

Have you thought at all about $p$-adic entropy? I don’t know if it exists, though someone has studied the $p$-adic entropy of a group action.

Posted by: John Baez on March 24, 2019 7:08 AM | Permalink | Reply to this

### Re: Entropy mod p

That’s a nice thing to say :-)

I’ve only dipped my toes into the $p$-adic idea. I think the paper of Deninger that you link to goes in another direction, as although his entropies take values in $\mathbb{Q}_p$, his work doesn’t involve probabilities in $\mathbb{Q}_p$.

It’s a bit hard to tell, as he’s not generalizing directly from the notion of the entropy of a probability distribution, but from the notion of the entropy of a group action. If I’m understanding it right, that ultimately depends on the notion of the entropy of a cover of a space (perhaps a partition of a measure space or a finite open cover of a topological space). So I think in principle that these are real probabilities. But I’m enough of a novice at this kind of stuff that I’m far from 100% sure.

Posted by: Tom Leinster on March 25, 2019 10:29 AM | Permalink | Reply to this

### Re: Entropy mod p

Perhaps topological entropy has a mod $p$ version, if one takes the smallest cardinality of a subcover mod $p$? I’m not sure if this is really a natural thing to do though.

Posted by: Tobias Fritz on March 25, 2019 4:27 PM | Permalink | Reply to this

### Entropy mod p

If $U$ is a $\mathbf{Z}/p$-valued function on $X$, maybe your entropy gives a notion of “Boltzmann distribution” attached to the “Hamiltonian” $U$. Find the probability distribution that’s a stationary point for the entropy, instead of maximizing it.

If your probabilities are $\pi_1$,…$\pi_n$, subject to $\sum \pi_i = 1$ and $\sum \pi_i U(i) = \langle U\rangle$, then the Lagrange multipliers way to find the “maximum entropy distribution” is to find the critical points of

$H_p(\pi_1,\ldots,\pi_n) + \alpha(1 - \sum \pi_i) + \beta(\langle U \rangle - \sum \pi_i U(i))$

that obey the same constraints. I think the derivative of $H_p$ with respect to $\pi_i$ (just treating it as a polynomial) is $\frac{1}{(p-1)!}\bigl((\sum_j \pi_j)^{p-1} - \pi_i^{p-1}\bigr)$, so the Boltzmann distribution should solve

$\pi_i^{p-1} = (1 + \alpha + \beta U(i))$ and $\sum \pi_i = 1$. Maybe take

$Z(\beta) = \sum_i (1 + \beta U(i))^{1/(p-1)}$

(which is a well-defined series in $\mathbf{Z}/p[[\beta]]$) for the “partition function.”

Posted by: DT on March 27, 2019 9:56 PM | Permalink | Reply to this

### Re: Entropy mod p

The formula for real entropy is not a polynomial; it involves logarithms.

Morally it’s a polynomial :) $\log(x)=\int\frac{dx}{x}+C$. Integrate $x^{-1+\epsilon}$ and get $\frac{x^\epsilon}{\epsilon}$. This blows up as $\epsilon\rightarrow 0$, so renormalise by finding an appropriate counterterm to subtract; the term $\frac{1}{\epsilon}$ will do:

$\log(x)=\lim_{\epsilon\rightarrow 0}\frac{x^\epsilon-1}{\epsilon}.$

So $\log$ is “just a monomial added to a constant”. It’s tempting to say that modulo $p$ we should replace $\epsilon$ in the limit with $p$, which is just $0$ modulo $p$. But it doesn’t quite give the formula in the original article.

Posted by: Dan Piponi on March 28, 2019 9:10 PM | Permalink | Reply to this

### Re: Entropy mod p

That’s an excellent point. Let me expand on it by mentioning something that you surely know about but not everyone does: the $q$-logarithms and $q$-logarithmic entropies.

For each $q \in \mathbb{R}$, the $q$-logarithm is the function

$\ln_q : (0, \infty) \to \mathbb{R}$

defined by

$\ln_q(x) = \int_1^x t^{-q} d t.$

So $\ln_1$ is the ordinary natural logarithm $\log$, and for $q \neq 1$,

$\ln_q(x) = \frac{x^{1 - q} - 1}{1 - q}.$

It’s not too hard to show that $\ln_q(x)$ is continuous in $q$, and in particular,

$\log(x) = \lim_{q \to 1} \ \ln_q(x)$

for each fixed $x$. That’s exactly the formula you wrote, with a change of variables $\epsilon = 1 - q$.

Ordinary Shannon entropy is defined in terms of logarithms:

$H(\pi) = \sum \pi_i \log(1/\pi_i).$

We can get a family of deformations of Shannon entropy by replacing this logarithm by the $q$-logarithm. The resulting $q$-logarithmic entropy (usually misattributed to Tsallis) is

$S_q(\pi) = \sum \pi_i \ln_q(1/\pi_i).$

When $q = 1$, this reduces to the Shannon definition, and for $q \neq 1$,

$S_q(\pi) = \frac{1}{q - 1} \Bigl( 1 - \sum \pi_i^q \Bigr).$
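Numerically, the convergence to Shannon entropy as $q \to 1$ is easy to see. A quick sketch of my own, in Python:

```python
import math

def ln_q(x, q):
    """The q-logarithm: the integral of t^(-q) from 1 to x."""
    return math.log(x) if q == 1 else (x ** (1 - q) - 1) / (1 - q)

def S_q(pi, q):
    """q-logarithmic ("Tsallis") entropy, sum of p_i * ln_q(1/p_i)."""
    return sum(p * ln_q(1 / p, q) for p in pi if p > 0)

pi = [0.25, 0.25, 0.5]
shannon = -sum(p * math.log(p) for p in pi)
assert abs(S_q(pi, 1) - shannon) < 1e-12
assert abs(S_q(pi, 1 + 1e-7) - shannon) < 1e-5   # S_q -> Shannon as q -> 1
# the closed form (1 - sum pi^q)/(q - 1) for q != 1, here with q = 2:
assert abs(S_q(pi, 2) - (1 - sum(p ** 2 for p in pi))) < 1e-12
```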

I’d noticed the close resemblance between this formula and the formula for entropy mod $p$,

$H(\pi) = \frac{1}{p} \Bigl( 1 - \sum \pi_i^p \Bigr).$

If it wasn’t for the discrepancy in the constant terms, we’d be somehow replacing the continuous parameter $q$ with a prime number $p$. I don’t know what to make of this.

Posted by: Tom Leinster on March 28, 2019 10:43 PM | Permalink | Reply to this

### Re: Entropy mod p

A bit late to the discussion but I think the discrepancy in the constant term is purely the result of normalising in a non-standard way.

To explain this I need to explain how the $p$-adic exponential and logarithm work: if you consider the usual power series definitions of $\log$ and $\exp$, i.e. $\log (1+x)=\sum_{n=1}^\infty (-1)^{n+1} \frac{x^n}{n}$ and $\exp x=\sum_{n=0}^\infty \frac{x^n}{n!}$, then $\log$ defines a group isomorphism $(1+p\mathbb{Z}_p,\cdot)\to (p\mathbb{Z}_p,+)$ with inverse $\exp$. Moreover, since

$\mathbb{Z}_p^\times\cong (\mathbb{Z}/p\mathbb{Z})^{\times}\times (1+p\mathbb{Z}_p),$

$\log$ on $1+p\mathbb{Z}_p$ extends uniquely to a group homomorphism $\mathbb{Z}_p^\times\to p\mathbb{Z}_p$; this sends all the $(p-1)$st roots of unity to $0$.

Now if you compose this extended log map with reduction mod $p^2$ then its kernel is $1+p^2\mathbb{Z}_p$ and you induce a group homomorphism $\mathbb{Z}_p^\times/(1+p^2\mathbb{Z}_p)\to p\mathbb{Z}_p/p^2\mathbb{Z}_p$. The domain of this is naturally isomorphic to $(\mathbb{Z}/p^2\mathbb{Z})^\times$ and the codomain is naturally isomorphic to $\mathbb{Z}/p\mathbb{Z}$ (under division by $p$). So you’d expect this to be your mod $p$ logarithm. But in fact they differ by a factor of $-1$.

So I’d propose that a better way to normalise this mod $p$ logarithm is $\log(x)=\frac{1}{p}\,\frac{x^{p-1}-1}{p-1}=\frac{1}{p}\bigl(1-x^{p-1}\bigr).$ As hinted above, the $\frac{1}{p}$ is just coming from the fact that you want the result to live in $\mathbb{Z}/p\mathbb{Z}$ rather than (arguably its more natural home) $p\mathbb{Z}/p^2\mathbb{Z}$.

As an extension to this already extended remark, you can also view the extended $\log$ defined on $\mathbb{Z}_p^\times$ as $\log(x)=\lim_{n\to \infty} \frac{x^{p^n(p-1)}-1}{p^n(p-1)},$ which is exactly parallel to the formula mentioned by Dan Piponi, since $p^n(p-1)\to 0$ ($p$-adically) as $n\to\infty$.

I’d be surprised if you couldn’t extend all of the results in your paper to this $p$-adic setting with very little effort although I haven’t made any attempt to do so for myself.