### Entropy mod *p*

#### Posted by Tom Leinster

I’ve just arXived a new paper:

Tom Leinster, Entropy modulo a prime, arXiv:1903.06961

In it, you’ll find answers to these questions and more — and I’ll also answer them in this post:

- What are logarithms and derivations mod $p$?

- What is the right definition of entropy for probability distributions whose “probabilities” are integers modulo a prime $p$?

- What is the right mod $p$ analogue of information loss?

- What does it mean to say that $\log \sqrt{8} \equiv 3 \pmod{7}$?

- Why is entropy mod $p$ a polynomial?

- What does all this have to do with polylogarithms and homology?

I’ve posted about entropy mod $p$ a couple of times before, while I was working the whole thing out, but I’ve now tidied it all up enough to write it up properly. Here are my answers to the questions above.

- What are logarithms and derivations mod $p$?

What this question *isn’t* about is discrete
logarithms.

What it *is* about is the following problem. If you wanted an analogue of
the ordinary real logarithm in $\mathbb{Z}/p\mathbb{Z}$, your first thought
would probably be to seek a group homomorphism

$\log : \bigl((\mathbb{Z}/p\mathbb{Z})^\times, \cdot\bigr) \to (\mathbb{Z}/p\mathbb{Z}, +).$

But that’s impossible: Lagrange’s theorem immediately implies that the only such homomorphism is trivial.

However, there’s a next-best thing. It’s the Fermat quotient, which up to a constant factor is the unique homomorphism

$q: \bigl((\mathbb{Z}/p^2\mathbb{Z})^\times, \cdot\bigr) \to (\mathbb{Z}/p\mathbb{Z}, +).$

It’s given explicitly by the formula

$q(a) = \frac{a^{p - 1} - 1}{p},$

which is an integer by Fermat’s little theorem. The Fermat quotient turns out to provide an acceptable substitute for the (non-existent) logarithm.
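As a quick sanity check (a sketch, not code from the paper; the name `fermat_quotient` is mine), here is a computation confirming both the integrality and the homomorphism property $q(ab) = q(a) + q(b)$:

```python
# Sanity check for the Fermat quotient q(a) = (a^(p-1) - 1)/p,
# viewed as a map (Z/p^2 Z)^x -> Z/pZ.  A sketch, not code from the paper.

p = 7

def fermat_quotient(a: int, p: int) -> int:
    """q(a) for a coprime to p, as an element of Z/pZ."""
    num = pow(a, p - 1, p * p) - 1      # a^(p-1) - 1, reduced mod p^2
    assert num % p == 0                 # Fermat's little theorem
    return (num // p) % p

# Homomorphism property: q(ab) = q(a) + q(b) in Z/pZ.
for a in range(1, p * p):
    for b in range(1, p * p):
        if a % p != 0 and b % p != 0:
            assert fermat_quotient(a * b, p) == \
                (fermat_quotient(a, p) + fermat_quotient(b, p)) % p
```

The exhaustive loop checks the homomorphism property over all of $(\mathbb{Z}/49\mathbb{Z})^\times$.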

What about derivations? Over $\mathbb{R}$, the function

$\partial : x \mapsto \begin{cases} -x \log x &\text{if }\ x \gt 0 \\ 0 &\text{if }\ x = 0 \end{cases}$

is a **nonlinear derivation**, in the sense that it satisfies the Leibniz
rule

$\partial(x y) = x \partial(y) + \partial(x) y.$

The correct analogue over $\mathbb{Z}/p\mathbb{Z}$ is the function

$\partial: \mathbb{Z}/p^2\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}$

given by

$\partial(a) = \frac{a - a^p}{p}.$

This satisfies the same Leibniz rule. If $a$ is not divisible by $p$ then $\partial(a) = -a q(a)$, as you’d expect from the analogy with the real case. Otherwise, $\partial(a) = a/p$.
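Here is a small numeric check of the Leibniz rule and of the two cases just described (a sketch; `d` is my name for $\partial$):

```python
# Numeric check of the Leibniz rule for d(a) = (a - a^p)/p,
# viewed as a map Z/p^2 Z -> Z/pZ.  A sketch; "d" is my name for it.

p = 5

def d(a: int, p: int) -> int:
    num = (a - pow(a, p, p * p)) % (p * p)
    assert num % p == 0                 # since a^p = a mod p
    return (num // p) % p

# Leibniz rule d(ab) = a*d(b) + d(a)*b in Z/pZ.
for a in range(p * p):
    for b in range(p * p):
        assert d(a * b, p) == (a * d(b, p) + d(a, p) * b) % p

assert d(5, p) == 1                     # the case p | a: d(a) = a/p
```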

- What is the right definition of entropy for probability distributions whose “probabilities” are integers modulo a prime $p$?

Ordinarily, a finite probability distribution $\pi$ is a finite sequence $(\pi_1, \ldots, \pi_n)$ of nonnegative reals summing to $1$. What I’m interested in is the same kind of thing, but where $\pi_1, \ldots, \pi_n$ are elements of the ring $\mathbb{Z}/p\mathbb{Z}$.

This might seem odd. So it’s useful to be somewhat aware of the idea of “negative probabilities”, which has a long history in physics going back at least to Feynman. There are several nice papers on that topic, including Feynman’s and a couple by Blass and Gurevich, but here I’ll just link to a recent paper of Blass and Gurevich which gives lots of references and history on the first page. To quote from the abstract:

We argue that the right question is not what negative probabilities are but what they are for.

My attitude to “probabilities mod $p$” is the same. Which isn’t to claim I know what they’re “for” — in fact, I’d love to find an application for this theory, even within mathematics — but it’s a much easier question to answer than “what are they?”

Anyway, I show in my paper that the right definition of the entropy $H(\pi)$ of a probability distribution $\pi = (\pi_1, \ldots, \pi_n)$ mod $p$ is

$H(\pi) = \frac{1}{p} \Bigl( 1 - \sum \pi_i^p \Bigr) \in \mathbb{Z}/p\mathbb{Z}.$

At first, this looks like it doesn’t even make sense. The $\pi_i$ are integers mod $p$, so $1 - \sum \pi_i^p$ is an integer mod $p$, so how can you divide it by $p$? After all, $p = 0$ in $\mathbb{Z}/p\mathbb{Z}$, and you can’t divide by zero.

What saves the day is the little lemma that if $a \equiv b \pmod{p}$ then $a^p \equiv b^p \pmod{p^2}$. This is a pleasant exercise. The upshot is that the operation of raising to the power of $p$ can be regarded as a map

$(-)^p : \mathbb{Z}/p\mathbb{Z} \to \mathbb{Z}/p^2\mathbb{Z}.$

Interpreted in this way, $1 - \sum \pi_i^p$ is an element of $\mathbb{Z}/p^2\mathbb{Z}$. Since $\sum \pi_i^p = \sum \pi_i = 1$ as elements of $\mathbb{Z}/p\mathbb{Z}$, we can divide by $p$, and the result is the entropy.

(Another way to put it is that $H(\pi) = \tfrac{1}{p}(1 - \sum a_i^p)$, where $a_1, \ldots, a_n$ are integers representing $\pi_1, \ldots, \pi_n$.)
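In code, the definition via integer representatives looks like this (a sketch; the example distribution is $(1/2, 1/2)$ mod $5$, i.e. $(3, 3)$):

```python
# Entropy mod p via integer representatives: H = (1 - sum a_i^p)/p, mod p.
# A sketch, not code from the paper.

p = 5

def entropy_mod_p(rep, p):
    """rep: integer representatives of the probabilities, summing to 1 mod p."""
    assert sum(rep) % p == 1
    num = 1 - sum(pow(a, p) for a in rep)
    assert num % p == 0                 # divisible by p since sum(rep) = 1 mod p
    return (num // p) % p

print(entropy_mod_p([3, 3], p))         # 3

# Changing representatives by multiples of p doesn't change the answer --
# that's exactly the content of the little lemma above.
assert entropy_mod_p([8, -2], p) == entropy_mod_p([3, 3], p)
```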

Why’s this the right definition of entropy? Because up to a scalar
multiple, it’s the one and only quantity that satisfies the same functional
equation that characterizes real entropy — variously called the
*chain rule*, *recursivity*, or the *grouping law*. All this is done
precisely in the paper.
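For real entropy, the chain rule says that for a distribution $\pi$ on $n$ outcomes and distributions $\sigma^1, \ldots, \sigma^n$, the composite distribution $(\pi_i \sigma^i_j)$ has entropy $H(\pi) + \sum_i \pi_i H(\sigma^i)$. Here is a numeric spot-check (a sketch, not code from the paper) that the mod $p$ entropy satisfies the same identity:

```python
# Chain rule spot-check mod p = 5:
#   H(composite) = H(pi) + sum_i pi_i * H(sigma^i)  in Z/pZ.
# A sketch, not code from the paper.

p = 5

def H(rep, p):
    num = 1 - sum(pow(a, p) for a in rep)
    assert num % p == 0
    return (num // p) % p

pi = [2, 4]                     # sums to 1 mod 5
sigmas = [[3, 3], [1]]          # each sums to 1 mod 5

# Composite distribution (pi_i * sigma^i_j), here [6, 6, 4].
composite = [a * b for a, s in zip(pi, sigmas) for b in s]

lhs = H(composite, p)
rhs = (H(pi, p) + sum(a * H(s, p) for a, s in zip(pi, sigmas))) % p
assert lhs == rhs
print(lhs)                      # 0
```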

- What is the right analogue of information loss mod $p$?

There’s a quote about the Grothendieck–Riemann–Roch theorem I really like:

Grothendieck came along and said “No, the Riemann–Roch theorem is not a theorem about varieties, it’s a theorem about morphisms between varieties.”

(Nicholas Katz, quoted here).

In geometry and topology, this kind of conceptual shift is often described
with the word *relative*. Instead of studying spaces $X$, you study
spaces $X$ relative to a base $B$, which means maps $X \to B$ of a suitable
kind. The classical case of mere spaces is where $B$ is the one-point
space.

Categorically, it’s the conceptual shift from objects to maps. Entropy is an invariant of probability spaces. What about maps between probability spaces — measure-preserving transformations? Is there a sensible numerical invariant of these, which reduces to entropy when the codomain is the one-point space?

John Baez, Tobias Fritz and I showed
that the answer is yes. You can talk in a reasonable way about the *information loss* of a
measure-preserving map between probability spaces. (The terminology makes
sense if you think of such a map as a deterministic process: click the link
for explanation!) It’s closely related to conditional entropy.

The shift from entropy to information loss has the following big advantage. The recursivity equation that characterizes entropy is a bit complicated. But John, Tobias and I proved that information loss is characterized by a couple of equations that are nothing more than linearity conditions. They’re easy:

- if you do two processes in sequence, the amount of information lost by the composite process is the sum of the amounts lost by each process individually;

- if you flip a probability-$\lambda$ coin and do one or the other of two different processes accordingly, the information lost overall is $\lambda$ times the information lost by the first process plus $1 - \lambda$ times the information lost by the second.

Everything I’ve just been saying about information loss is over $\mathbb{R}$. Over $\mathbb{Z}/p\mathbb{Z}$, it’s much harder to understand things information-theoretically. But I show in my new paper that there is a precisely analogous definition of information loss mod $p$, and it satisfies a precisely analogous characterization theorem.

- What does it mean to say that $\log \sqrt{8} \equiv 3 \pmod{7}$?

The direct inspiration for my paper was a very short note by Maxim Kontsevich (which I’m grateful to Herbert Gangl for pointing out to me). In it, Kontsevich made an intriguing suggestion.

Consider a finite probability distribution

$\pi = (\pi_1, \ldots, \pi_n)$

where all the probabilities $\pi_i$ are rational numbers, say $\alpha_i/\beta$. Rational probabilities arise naturally: you get them in ball-and-urn scenarios and other combinatorial problems.

We can, of course, regard the $\pi_i$s as real numbers, and take the real entropy

$H_\mathbb{R}(\pi) = - \sum_{i: \pi_i \neq 0} \pi_i \log \pi_i \in \mathbb{R}.$

But there’s also something else we can do.

Only finitely many primes divide $\beta$, and as long as $p$ doesn’t have the bad luck to be one of them, each $\pi_i = \alpha_i/\beta$ defines an element of $\mathbb{Z}/p\mathbb{Z}$. Hence $\pi$ defines a probability distribution mod $p$, and we can take its mod $p$ entropy

$H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}.$

Kontsevich’s suggestion, playfully made, was to view $H_p(\pi)$ as the “residue mod $p$” of the real (usually transcendental) number $H_\mathbb{R}(\pi)$.

For instance, take

$\pi = (1/4, 1/4, 1/2)$

and $p = 7$. The real entropy of the distribution is

$H_\mathbb{R}(\pi) = \frac{1}{4} \log 4 + \frac{1}{4} \log 4 + \frac{1}{2} \log 2 = \log \sqrt{8}.$

In $\mathbb{Z}/7\mathbb{Z}$,

$\pi = (2, 2, 4),$

so

$H_7(\pi) = \frac{1}{7} \Bigl( 1 - [2^7 + 2^7 + 4^7]\Bigr) = 3 \in \mathbb{Z}/7\mathbb{Z}.$

In this sense,

$\log \sqrt{8} \equiv 3 \pmod{7}.$
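A few lines of code reproduce this computation (a sketch, not code from the paper):

```python
# Reproducing the example: pi = (1/4, 1/4, 1/2) is (2, 2, 4) in Z/7Z,
# and its mod 7 entropy is 3.  A sketch of the computation.

p = 7
assert pow(4, -1, p) == 2 and pow(2, -1, p) == 4   # 1/4 = 2 and 1/2 = 4 mod 7

num = 1 - (2**7 + 2**7 + 4**7)   # = -16639, divisible by 7
H7 = (num // p) % p
print(H7)                        # 3
```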

There’s a bit of work to do in order to see that Kontsevich’s suggestion really makes sense. For instance, if we’re going to view $H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}$ as the “residue” of the real number $H_\mathbb{R}(\pi)$, then it had better be the case that

$H_\mathbb{R}(\pi) = H_\mathbb{R}(\sigma) \quad\Rightarrow\quad H_p(\pi) = H_p(\sigma).$

And I show in the paper that it is.

- Why is entropy mod $p$ a polynomial?

The formula for real entropy is not a polynomial; it involves logarithms.
But over $\mathbb{Z}/p\mathbb{Z}$, entropy *is* a polynomial.

I don’t mean the formula $\tfrac{1}{p} (1 - \sum_i \pi_i^p)$ above, which
isn’t a polynomial in the usual sense: recall that we had to interpret it
rather carefully in order for it to make sense. The *genuine* polynomial
formula turns out to be

$H(\pi) = - \sum \frac{\pi_1^{r_1} \cdots \pi_n^{r_n}}{r_1! \cdots r_n!},$

where the sum is over all $0 \leq r_1, \ldots, r_n \lt p$ summing to $p$.
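Here is a brute-force check (a sketch, not code from the paper; function names are mine) that this polynomial formula agrees with the earlier division-by-$p$ formula:

```python
# Check that the polynomial formula for H agrees with the division-by-p
# formula, for all distributions (x, y, 1-x-y) mod p.  A sketch.

from itertools import product
from math import factorial

p = 5

def H_div(rep):
    """H via (1 - sum a^p)/p on integer representatives."""
    num = 1 - sum(pow(a, p) for a in rep)
    return (num // p) % p

def H_poly(rep):
    """H via the polynomial: -sum over r_1+...+r_n = p, 0 <= r_i < p,
    of x^r / (r_1! ... r_n!), computed in Z/pZ."""
    total = 0
    for rs in product(range(p), repeat=len(rep)):
        if sum(rs) != p:
            continue
        term = 1
        for x, r in zip(rep, rs):
            term = term * pow(x, r, p) * pow(factorial(r), -1, p) % p
        total = (total + term) % p
    return (-total) % p

for x in range(p):
    for y in range(p):
        rep = [x, y, (1 - x - y) % p]
        assert H_div(rep) == H_poly(rep)
```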

It’s a triviality that entropy can be expressed as *some* polynomial, for
the simple reason that every function $K^n \to K$ on a finite field $K$ can
be. But what’s interesting is this *particular* polynomial.

Let’s call this polynomial $h$: so

$h(x_1, \ldots, x_n) = - \sum \frac{x_1^{r_1} \cdots x_n^{r_n}}{r_1! \cdots r_n!}.$

By a small abuse of notation, $h$ is actually the name of a whole sequence of polynomials $h(x_1, \ldots, x_n)$, one for each $n \geq 1$.

The one-variable polynomial $h(x_1)$ is zero. The two-variable polynomial $h(x_1, x_2)$ reduces to something simpler in the case where $x_1 + x_2 = 1$ (which we care about because we’re interested in probability distributions):

$h(x, 1 - x) = \sum_{0 \lt r \lt p} \frac{x^r}{r}.$

That’s not immediately obvious from the definition, but it’s another pleasant exercise.
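If you’d rather check it by machine than by hand, here is a brute-force verification (a sketch, not code from the paper):

```python
# Check of h(x, 1-x) = sum_{0 < r < p} x^r / r, against the entropy
# formula itself.  A sketch, not code from the paper.

p = 7

def H(rep):
    num = 1 - sum(pow(a, p) for a in rep)
    return (num // p) % p

def truncated_log(x):
    """sum_{0 < r < p} x^r / r in Z/pZ."""
    return sum(pow(x, r, p) * pow(r, -1, p) for r in range(1, p)) % p

for x in range(p):
    assert H([x, (1 - x) % p]) == truncated_log(x)
```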

- What does this have to do with polylogarithms and homology?

The sum

$h(x, 1 - x) = \sum_{0 \lt r \lt p} \frac{x^r}{r}$

is, formally, a truncation of the power series of $-\log(1 - x)$. That’s a bit surprising, in that it’s the mod $p$ analogue of

$H_\mathbb{R}(x, 1 - x) = - x \log x - (1 - x) \log(1 - x),$

which is not the same as $-\log(1 - x)$! Nonetheless, it’s true.

There’s a classical theory of polylogarithms, which are the functions

$z \mapsto \sum_{r \geq 1} \frac{z^r}{r^s}$

on the complex plane. Here $s$ is a complex parameter. When $s = 1$, we recover the ordinary logarithm, essentially: it’s $z \mapsto -\log(1 - z)$.

Kontsevich’s note, and the paper of Elbaz-Vincent and Gangl in which Kontsevich’s note was published as an appendix, are both interested in “finite polylogarithms”. These are polynomials of the form

$\sum_{0 \lt r \lt p} \frac{x^r}{r^s}$

for integers $s \geq 1$. I know very little about this work — but its starting point is the case $s = 1$ that comes out of the theory of entropy mod $p$.

Finally, what’s the homological connection? We got into this a bit in the conversation Charles Rezk and I had on another recent post, and my comment there gives some links to related work. But the simplest thing to say is this.

Consider the entropy polynomials $h(x_1, x_2, \ldots, x_n)$. We looked at the cases $n = 1$ and $n = 2$. What about $n = 3$? It turns out that

$h(x, y, z) = h(x, y) + h(x + y, z).$

It’s obvious from the definition of $h$ that it’s a symmetric polynomial, so also

$h(x, y, z) = h(y, z) + h(x, y + z).$

Putting together these two equations gives

$h(x, y) - h(x, y + z) + h(x + y, z) - h(y, z) = 0.$

In other words, $h$ is a cocycle! And it can be shown that it’s a
nontrivial cocycle — that is, you *can’t* find a $j$ satisfying

$h(x, y) = j(x) - j(x + y) + j(y).$
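The cocycle equation itself is easy to verify by brute force (a sketch, not code from the paper; `h2` is my name for the two-variable polynomial):

```python
# Brute-force verification of the cocycle equation
#   h(x, y) - h(x, y+z) + h(x+y, z) - h(y, z) = 0  in Z/pZ.
# A sketch; h2 is my name for the two-variable entropy polynomial.

from math import factorial

p = 5

def h2(x, y):
    """h(x, y) = -sum over r+s = p, 0 < r, s < p, of x^r y^s / (r! s!)."""
    total = 0
    for r in range(1, p):
        s = p - r
        total += pow(x, r, p) * pow(y, s, p) * pow(factorial(r) * factorial(s), -1, p)
    return (-total) % p

for x in range(p):
    for y in range(p):
        for z in range(p):
            cocycle = h2(x, y) - h2(x, (y + z) % p) + h2((x + y) % p, z) - h2(y, z)
            assert cocycle % p == 0
```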

Work by people including Cathelineau, Kontsevich, Elbaz-Vincent, Gangl, Bennequin and Baudot exploits this homological connection (references here), but as I don’t know much about all that, I’ll stop there.

## Re: Entropy mod p

Blass and Gurevich go on to show how negative probabilities can be used (“what they are for”).

What I’m lacking is a demonstration of how probabilities mod $p$, and entropy mod $p$, can be used.