Entropy mod p
Posted by Tom Leinster
I’ve just arXived a new paper:
Tom Leinster, Entropy modulo a prime, arXiv:1903.06961
In it, you’ll find answers to these questions and more — and I’ll also answer them in this post:
What are logarithms and derivations mod $p$?
What is the right definition of entropy for probability distributions whose “probabilities” are integers modulo a prime $p$?
What is the right mod $p$ analogue of information loss?
What does it mean to say that the real entropy of a distribution has a residue mod $p$?
Why is entropy mod $p$ a polynomial?
What does all this have to do with polylogarithms and homology?
I’ve posted about entropy mod $p$ a couple of times before, during the time I was working the whole thing out, but I’ve now tidied it up enough to write it up. Here are my answers to the questions above.
- What are logarithms and derivations mod $p$?
What this question isn’t about is discrete logarithms.
What it is about is the following problem. If you wanted an analogue of the ordinary real logarithm in $\mathbb{Z}/p\mathbb{Z}$, your first thought would probably be to seek a group homomorphism
$$(\mathbb{Z}/p\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}$$
from the multiplicative group of nonzero elements to the additive group.
But that’s impossible: Lagrange’s theorem immediately implies that the only such homomorphism is trivial, since the order of the image would have to divide both $p - 1$ and $p$.
However, there’s a next-best thing. It’s the Fermat quotient, which up to a constant factor is the unique homomorphism
$$(\mathbb{Z}/p^2\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}.$$
It’s given explicitly by the formula
$$a \mapsto \frac{a^{p-1} - 1}{p},$$
which is an integer by Fermat’s little theorem. The Fermat quotient turns out to provide an acceptable substitute for the (non-existent) logarithm.
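Here’s a quick computational check of the log-like (homomorphism) property, as a Python sketch; I’m writing q for the Fermat quotient, and the choice $p = 7$ is arbitrary.

```python
# Check that the Fermat quotient q(a) = (a^(p-1) - 1)/p, reduced mod p,
# satisfies q(ab) = q(a) + q(b) mod p for all a, b coprime to p.
# It depends only on a mod p^2, so we let a, b range over 1, ..., p^2 - 1.
p = 7

def q(a):
    assert a % p != 0
    return ((pow(a, p - 1) - 1) // p) % p  # exact division, by Fermat's little theorem

for a in range(1, p * p):
    for b in range(1, p * p):
        if a % p != 0 and b % p != 0:
            assert q(a * b) == (q(a) + q(b)) % p

print("the Fermat quotient is 'logarithmic' mod", p)
```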
What about derivations? Over $\mathbb{R}$, the function
$$x \mapsto -x \log x$$
is a nonlinear derivation, in the sense that it satisfies the Leibniz rule
$$\partial(x y) = x\, \partial(y) + \partial(x)\, y.$$
The correct analogue over $\mathbb{Z}/p\mathbb{Z}$ is the function
$$\partial \colon \mathbb{Z}/p^2\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}$$
given by
$$\partial(a) = \frac{a - a^p}{p}.$$
This satisfies the same Leibniz rule. If $a$ is not divisible by $p$ then $\partial(a)$ is $-a$ times the Fermat quotient of $a$, as you’d expect from the analogy with the real case. Otherwise, $a$ is a multiple of $p$ in $\mathbb{Z}/p^2\mathbb{Z}$ and $\partial(a) = a/p$.
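And here is the corresponding Leibniz-rule check, again as a Python sketch with an arbitrary small prime; $\partial$ only depends on its argument mod $p^2$, so the representatives range over $0, \ldots, p^2 - 1$.

```python
# Check the Leibniz rule d(ab) = a*d(b) + d(a)*b (mod p)
# for the mod-p derivation d(a) = (a - a^p)/p, reduced mod p.
p = 5

def d(a):
    return ((a - pow(a, p)) // p) % p  # exact division, since a^p ≡ a (mod p)

for a in range(p * p):
    for b in range(p * p):
        assert d(a * b) == (a * d(b) + d(a) * b) % p

print("Leibniz rule holds mod", p)
```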
- What is the right definition of entropy for probability distributions whose “probabilities” are integers modulo a prime $p$?
Ordinarily, a finite probability distribution is a finite sequence $\pi = (\pi_1, \ldots, \pi_n)$ of nonnegative reals summing to $1$. What I’m interested in is the same kind of thing, but where $\pi_1, \ldots, \pi_n$ are elements of the ring $\mathbb{Z}/p\mathbb{Z}$.
This might seem odd. So it’s useful to be somewhat aware of the idea of “negative probabilities”, which has a long history in physics going back at least to Feynman. There are several nice papers on that topic, including Feynman’s and a couple by Blass and Gurevich, but here I’ll just link to a recent paper of Blass and Gurevich which gives lots of references and history on the first page. To quote from the abstract:
We argue that the right question is not what negative probabilities are but what they are for.
My attitude to “probabilities mod $p$” is the same. Which isn’t to claim I know what they’re “for” — in fact, I’d love to find an application for this theory, even within mathematics — but it’s a much easier question to answer than “what are they?”
Anyway, I show in my paper that the right definition of the entropy of a probability distribution $\pi = (\pi_1, \ldots, \pi_n)$ mod $p$ is
$$H(\pi) = \frac{1}{p} \Bigl( 1 - \sum_{i=1}^{n} \pi_i^p \Bigr).$$
At first, this looks like it doesn’t even make sense. The $\pi_i$ are integers mod $p$, so $1 - \sum_i \pi_i^p$ is an integer mod $p$, so how can you divide it by $p$? After all, $p = 0$ in $\mathbb{Z}/p\mathbb{Z}$, and you can’t divide by zero.
What saves the day is the little lemma that if $a \equiv b \pmod{p}$ then $a^p \equiv b^p \pmod{p^2}$. This is a pleasant exercise. The upshot is that the operation of raising to the power of $p$ can be regarded as a map
$$\mathbb{Z}/p\mathbb{Z} \to \mathbb{Z}/p^2\mathbb{Z}.$$
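(For completeness, the exercise is just the binomial theorem: if $a = b + kp$ then
$$a^p = (b + kp)^p = b^p + p\, b^{p-1} (kp) + \binom{p}{2} b^{p-2} (kp)^2 + \cdots \equiv b^p \pmod{p^2},$$
since every term after the first is divisible by $p^2$.)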
Interpreted in this way, $\sum_i \pi_i^p$ is an element of $\mathbb{Z}/p^2\mathbb{Z}$. Since $\sum_i \pi_i = 1$ as elements of $\mathbb{Z}/p\mathbb{Z}$, Fermat’s little theorem gives $\sum_i \pi_i^p \equiv 1 \pmod{p}$, so $1 - \sum_i \pi_i^p$ is a multiple of $p$ in $\mathbb{Z}/p^2\mathbb{Z}$. We can therefore divide it by $p$, and the result is the entropy, an element of $\mathbb{Z}/p\mathbb{Z}$.
(Another way to put it is that $H(\pi) = \frac{1}{p}\bigl(1 - \sum_i a_i^p\bigr) \bmod p$, where $a_1, \ldots, a_n$ are integers representing $\pi_1, \ldots, \pi_n$.)
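In code, the recipe via integer representatives is a one-liner. Here’s a Python sketch (the function name is mine):

```python
# Entropy mod p of a distribution given by integer representatives
# a_1, ..., a_n with a_1 + ... + a_n ≡ 1 (mod p).
def entropy_mod_p(a, p):
    assert sum(a) % p == 1, "not a probability distribution mod p"
    return ((1 - sum(pow(x, p) for x in a)) // p) % p  # exact division

# Example: the uniform distribution (1/2, 1/2) mod 3 is (2, 2):
print(entropy_mod_p([2, 2], 3))  # prints 1
```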
Why’s this the right definition of entropy? Because up to a scalar multiple, it’s the one and only quantity that satisfies the same functional equation that characterizes real entropy — variously called the chain rule, recursivity, or the grouping law. All this is done precisely in the paper.
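For reference, the chain rule I mean here, stated for real entropy in one standard form (the mod $p$ version is the same equation, read in $\mathbb{Z}/p\mathbb{Z}$): for a distribution $\pi$ on $n$ elements and distributions $\sigma^1, \ldots, \sigma^n$ on $k_1, \ldots, k_n$ elements,
$$H\bigl(\pi \circ (\sigma^1, \ldots, \sigma^n)\bigr) = H(\pi) + \sum_{i=1}^{n} \pi_i\, H(\sigma^i),$$
where $\pi \circ (\sigma^1, \ldots, \sigma^n)$ denotes the composite distribution $(\pi_1 \sigma^1_1, \ldots, \pi_1 \sigma^1_{k_1}, \ldots, \pi_n \sigma^n_1, \ldots, \pi_n \sigma^n_{k_n})$ on $k_1 + \cdots + k_n$ elements.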
- What is the right analogue of information loss mod $p$?
There’s a quote about the Grothendieck–Riemann–Roch theorem I really like:
Grothendieck came along and said “No, the Riemann–Roch theorem is not a theorem about varieties, it’s a theorem about morphisms between varieties.”
(Nicholas Katz, quoted here).
In geometry and topology, this kind of conceptual shift is often described with the word relative. Instead of studying spaces $X$, you study spaces relative to a base $B$, which means maps $X \to B$ of a suitable kind. The classical case of mere spaces is where $B$ is the one-point space.
Categorically, it’s the conceptual shift from objects to maps. Entropy is an invariant of probability spaces. What about maps between probability spaces — measure-preserving transformations? Is there a sensible numerical invariant of these, which reduces to entropy when the codomain is the one-point space?
John Baez, Tobias Fritz and I showed that the answer is yes. You can talk in a reasonable way about the information loss of a measure-preserving map between probability spaces. (The terminology makes sense if you think of such a map as a deterministic process: click the link for explanation!) It’s closely related to conditional entropy.
The shift from entropy to information loss has the following big advantage. The recursivity equation that characterizes entropy is a bit complicated. But John, Tobias and I proved that information loss is characterized by a couple of equations that are nothing more than linearity conditions. They’re easy:
- if you do two processes in sequence, the amount of information lost by the composite process is the sum of the amounts lost by each process individually;
- if you flip a probability-$\lambda$ coin and do one or other of two different processes, the information lost overall is $\lambda$ times the information loss of the first process plus $1 - \lambda$ times the information lost by the second (see the symbolic version just below).
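In symbols: write $L(f)$ for the information lost by a process $f$, and write $\lambda f \oplus (1 - \lambda) g$ as ad hoc shorthand for the coin-flip combination of processes $f$ and $g$. Then the two conditions read
$$L(f \circ g) = L(f) + L(g), \qquad L\bigl(\lambda f \oplus (1 - \lambda) g\bigr) = \lambda\, L(f) + (1 - \lambda)\, L(g),$$
and for a measure-preserving map of finite probability spaces, the information loss itself works out to be the entropy of the domain minus the entropy of the codomain.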
Everything I’ve just been saying about information loss is over $\mathbb{R}$. Over $\mathbb{Z}/p\mathbb{Z}$, it’s much harder to understand things information-theoretically. But I show in my new paper that there is a precisely analogous definition of information loss mod $p$, and it satisfies a precisely analogous characterization theorem.
- What does it mean to say that the real entropy of a distribution has a residue mod $p$?
The direct inspiration for my paper was a very short note by Maxim Kontsevich (which I’m grateful to Herbert Gangl for pointing out to me). In it, Kontsevich made an intriguing suggestion.
Consider a finite probability distribution
$$\pi = (\pi_1, \ldots, \pi_n)$$
where all the probabilities are rational numbers, say $\pi_i = a_i/b$ for integers $a_i$ and a common denominator $b$. Rational probabilities arise naturally: you get them in ball/urn scenarios and other combinatorial problems.
We can, of course, regard the $\pi_i$s as real numbers, and take the real entropy
$$H(\pi) = - \sum_{i=1}^{n} \pi_i \log \pi_i \in \mathbb{R}.$$
But there’s also something else we can do.
Only finitely many primes divide $b$, and as long as $p$ doesn’t have the bad luck to be one of them, each $\pi_i$ defines an element of $\mathbb{Z}/p\mathbb{Z}$. Hence $\pi$ defines a probability distribution mod $p$, and we can take its mod $p$ entropy
$$H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}.$$
Kontsevich’s suggestion, playfully made, was to view $H_p(\pi)$ as the “residue mod $p$” of the real (usually transcendental) number $H(\pi)$.
For instance, take any distribution $\pi$ with rational probabilities and any prime $p$ not dividing the denominators. Then the real entropy $H(\pi)$ and the mod $p$ entropy $H_p(\pi)$ can both be computed explicitly, and the second is to be thought of as the residue of the first.
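As an illustration (this particular distribution and prime are chosen purely for the sake of example), take $\pi = \bigl(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\bigr)$ and $p = 7$. The real entropy is
$$H(\pi) = \tfrac{1}{2} \log 2 + \tfrac{1}{3} \log 3 + \tfrac{1}{6} \log 6 = 1.0114\ldots$$
(in natural logarithms). In $\mathbb{Z}/7\mathbb{Z}$ we have $\tfrac{1}{2} = 4$, $\tfrac{1}{3} = 5$ and $\tfrac{1}{6} = 6$, so
$$H_7(\pi) = \tfrac{1}{7}\bigl(1 - 4^7 - 5^7 - 6^7\bigr) = \tfrac{1}{7}(1 - 374445) \equiv 2 \pmod{7}.$$
In this playful sense,
$$\tfrac{1}{2} \log 2 + \tfrac{1}{3} \log 3 + \tfrac{1}{6} \log 6 \equiv 2 \pmod{7}.$$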
There’s a bit of work to do in order to see that Kontsevich’s suggestion really makes sense. For instance, if we’re going to view $H_p(\pi)$ as the “residue” of the real number $H(\pi)$, then it had better be the case that
$$H(\pi) = H(\pi') \implies H_p(\pi) = H_p(\pi')$$
whenever $\pi$ and $\pi'$ are rational distributions whose denominators are prime to $p$.
And I show in the paper that it is.
- Why is entropy mod $p$ a polynomial?
The formula for real entropy is not a polynomial; it involves logarithms. But over $\mathbb{Z}/p\mathbb{Z}$, entropy is a polynomial.
I don’t mean the formula above, which isn’t a polynomial in the usual sense: recall that we had to interpret it rather carefully in order for it to make sense. The genuine polynomial formula turns out to be
$$- \sum \frac{\pi_1^{r_1} \cdots \pi_n^{r_n}}{r_1! \cdots r_n!},$$
where the sum is over all $r_1, \ldots, r_n \in \{0, 1, \ldots, p - 1\}$ summing to $p$.
It’s a triviality that entropy can be expressed as some polynomial, for the simple reason that every function on a finite field can be. But what’s interesting is this particular polynomial.
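If you like checking such things by machine, here is a brute-force comparison of the polynomial formula with the earlier definition, as a Python sketch for a small prime (the function names are mine; `pow(m, -1, p)` is the modular inverse, available in Python 3.8+):

```python
# Check that  -sum over r_1+...+r_n = p, 0 <= r_i < p, of
#   pi_1^{r_1} ... pi_n^{r_n} / (r_1! ... r_n!)
# agrees with (1 - sum pi_i^p)/p mod p, on all 3-element distributions mod p.
from itertools import product
from math import factorial

p = 5

def entropy_mod_p(a):
    return ((1 - sum(pow(x, p) for x in a)) // p) % p

def entropy_polynomial(a):
    total = 0
    for r in product(range(p), repeat=len(a)):
        if sum(r) == p:
            term = 1
            for x, e in zip(a, r):
                term = term * pow(x, e) * pow(factorial(e), -1, p) % p
            total = (total + term) % p
    return -total % p

for a in product(range(p), repeat=3):
    if sum(a) % p == 1:
        assert entropy_mod_p(a) == entropy_polynomial(a)

print("the two formulas agree mod", p)
```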
Let’s call this polynomial $H$: so
$$H(\pi_1, \ldots, \pi_n) = - \sum_{\substack{0 \le r_1, \ldots, r_n < p \\ r_1 + \cdots + r_n = p}} \frac{\pi_1^{r_1} \cdots \pi_n^{r_n}}{r_1! \cdots r_n!}.$$
By a small abuse of notation, $H$ is actually the name of a whole sequence of polynomials $H(\pi_1), H(\pi_1, \pi_2), \ldots$, one for each number of variables $n$.
The one-variable polynomial $H(\pi_1)$ is zero. The two-variable polynomial $H(\pi_1, \pi_2)$ reduces to something simpler in the case where $\pi_1 + \pi_2 = 1$ (which we care about because we’re interested in probability distributions): for odd $p$,
$$H(\pi, 1 - \pi) = \sum_{r=1}^{p-1} \frac{\pi^r}{r}.$$
That’s not immediately obvious from the definition, but it’s another pleasant exercise.
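(If you want to see how the exercise goes: for odd $p$, using the original definition of entropy with the representatives $\pi$ and $1 - \pi$, which genuinely sum to $1$,
$$H(\pi, 1 - \pi) = \frac{1 - \pi^p - (1 - \pi)^p}{p} = - \frac{1}{p} \sum_{r=1}^{p-1} \binom{p}{r} (-\pi)^r \equiv \sum_{r=1}^{p-1} \frac{\pi^r}{r} \pmod{p},$$
by $\frac{1}{p}\binom{p}{r} = \frac{1}{r}\binom{p-1}{r-1}$ and $\binom{p-1}{r-1} \equiv (-1)^{r-1} \pmod{p}$.)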
- What does this have to do with polylogarithms and homology?
The sum
$$\sum_{r=1}^{p-1} \frac{\pi^r}{r}$$
is, formally, a truncation of the power series of $-\log(1 - \pi)$. That’s a bit surprising, in that it’s the mod $p$ analogue of
$$- \pi \log \pi - (1 - \pi) \log (1 - \pi),$$
which is not the same as $-\log(1 - \pi)$! Nonetheless, it’s true.
There’s a classical theory of polylogarithms, which are the functions
$$\mathrm{Li}_s(z) = \sum_{k=1}^{\infty} \frac{z^k}{k^s}$$
on the complex plane. Here $s$ is a complex parameter. When $s = 1$, we recover the ordinary logarithm, essentially: it’s $\mathrm{Li}_1(z) = -\log(1 - z)$.
Kontsevich’s note, and the paper of Elbaz-Vincent and Gangl in which Kontsevich’s note was published as an appendix, are both interested in “finite polylogarithms”. These are polynomials of the form
$$\sum_{k=1}^{p-1} \frac{x^k}{k^s}$$
over $\mathbb{Z}/p\mathbb{Z}$, for integers $s \geq 1$. I know very little about this work — but its starting point is the case $s = 1$ that comes out of the theory of entropy mod $p$.
Finally, what’s the homological connection? We got into this a bit in the conversation Charles Rezk and I had on another recent post, and my comment there gives some links to related work. But the simplest thing to say is this.
Consider the entropy polynomials $H$. We looked at the cases $n = 1$ and $n = 2$. What about $n = 3$? It turns out that
$$H(x, y, z) = H(x + y, z) + H(x, y).$$
It’s obvious from the definition of $H$ that it’s a symmetric polynomial, so also
$$H(x, y, z) = H(x, y + z) + H(y, z).$$
Putting together these two equations gives
$$H(x + y, z) + H(x, y) = H(x, y + z) + H(y, z).$$
In other words, $H$ is a cocycle! And it can be shown that it’s a nontrivial cocycle — that is, you can’t find a $G$ satisfying
$$H(x, y) = G(x) - G(x + y) + G(y).$$
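Where does the first of those three-variable identities come from? One way to see it (a reformulation of the polynomial formula, via the multinomial theorem and Wilson’s theorem) is to write
$$H(x_1, \ldots, x_n) = \frac{(x_1 + \cdots + x_n)^p - x_1^p - \cdots - x_n^p}{p},$$
computed with integer representatives: the numerator is divisible by $p$ by the freshman’s dream, and the result mod $p$ is independent of the choice of representatives by the earlier lemma. In this form,
$$H(x, y, z) = \frac{(x + y + z)^p - (x + y)^p - z^p}{p} + \frac{(x + y)^p - x^p - y^p}{p} = H(x + y, z) + H(x, y).$$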
Work by people including Cathelineau, Kontsevich, Elbaz-Vincent, Gangl, Bennequin and Baudot exploits this homological connection (references here), but as I don’t know much about all that, I’ll stop there.
Re: Entropy mod p
They go on to show how negative probabilities can be used (“what they are for”).
What I’m lacking is a demonstration of how probabilities mod $p$, and entropy mod $p$, can be used.