Entropy Modulo a Prime
Posted by Tom Leinster
In 1995, the German geometer Friedrich Hirzebruch retired, and a private booklet was put together to mark the occasion. That booklet included a short note by Maxim Kontsevich entitled “The $1\tfrac{1}{2}$-logarithm”.
Kontsevich’s note didn’t become publicly available until five years later, when it was included as an appendix to a paper on polylogarithms by Philippe Elbaz-Vincent and Herbert Gangl. Towards the end, it contains the following provocative words:
Conclusion: If we have a random variable $\xi$ which takes finitely many values with all probabilities in $\mathbb{Q}$ then we can define not only the transcendental number $H(\xi)$ but also its “residues modulo $p$” for almost all primes $p$!
Kontsevich’s note was very short and omitted many details. I’ll put some flesh on those bones, showing how to make sense of the sentence above, and much more.
The “$H$” that Kontsevich uses here is the symbol for entropy — or more exactly, Shannon entropy. So, I’ll begin by recalling what that is. That will pave the way for what I really want to talk about, which is a kind of entropy for probability distributions where the “probabilities” are not real numbers, but elements of the field $\mathbb{Z}/p\mathbb{Z}$ of integers modulo a prime $p$.
Let $\pi = (\pi_1, \ldots, \pi_n)$ be a finite probability distribution. (It would be more usual to write a probability distribution as $p = (p_1, \ldots, p_n)$, but I want to reserve that letter for prime numbers.) The entropy of $\pi$ is
$$H_{\mathbb{R}}(\pi) = - \sum_{i \colon \pi_i \neq 0} \pi_i \log \pi_i.$$
Usually this is just written as $H(\pi)$, but I want to emphasize the role of the real numbers here: both the probabilities $\pi_i$ and the entropy $H_{\mathbb{R}}(\pi)$ belong to $\mathbb{R}$.
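For anyone who likes to compute along, here is a minimal Python sketch of the definition above (the function name `shannon_entropy` is just a label of mine; it uses natural logarithms and skips zero probabilities):

```python
from math import log

def shannon_entropy(pi):
    """Real Shannon entropy H_R(pi) of a finite distribution pi = (pi_1, ..., pi_n),
    in natural-log units, skipping terms with pi_i = 0."""
    assert abs(sum(pi) - 1) < 1e-9, "probabilities should sum to 1"
    return -sum(x * log(x) for x in pi if x > 0)

print(shannon_entropy([0.5, 0.5]))        # log(2) ≈ 0.693, a fair coin
print(shannon_entropy([1.0, 0.0, 0.0]))   # 0.0, a certain outcome
```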
There are applications of entropy in dozens of branches of science… but none will be relevant here! This is purely a mathematical story, though if anyone can think of any possible application or interpretation of entropy modulo a prime, I’d love to hear it.
The challenge now is to find the correct analogue of entropy when the field $\mathbb{R}$ is replaced by the field $\mathbb{Z}/p\mathbb{Z}$ of integers mod $p$, for any prime $p$. So, we want to define a kind of entropy
$$H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}$$
when $\pi = (\pi_1, \ldots, \pi_n)$ with each $\pi_i \in \mathbb{Z}/p\mathbb{Z}$.
We immediately run into an obstacle. Over $\mathbb{R}$, probabilities are required to be nonnegative. Indeed, the logarithm in the definition of entropy doesn’t make sense otherwise. But in $\mathbb{Z}/p\mathbb{Z}$, there is no notion of positive or negative. So, what are we even going to define the entropy of?
We take the simplest way out: ignore the problem. So, writing
$$\Pi_n = \bigl\{ (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n : \pi_1 + \cdots + \pi_n = 1 \bigr\},$$
we’re going to try to define
$$H_p \colon \Pi_n \to \mathbb{Z}/p\mathbb{Z}$$
for each $n \geq 1$.
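To see concretely how unconstrained these mod-$p$ “distributions” are, here is a small illustrative enumeration of $\Pi_n$ in Python (the prime $p = 5$ and the helper name `Pi_n` are my own choices):

```python
from itertools import product

p = 5  # a small prime, purely for illustration

def Pi_n(n):
    """All n-tuples over Z/pZ whose entries sum to 1 mod p: the set called Pi_n above."""
    return [pi for pi in product(range(p), repeat=n) if sum(pi) % p == 1]

# There are p^(n-1) such tuples; for n = 2 and p = 5:
print(Pi_n(2))   # [(0, 1), (1, 0), (2, 4), (3, 3), (4, 2)]
```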
Let’s try the most direct approach to doing this. That is, let’s stare at the formula defining real entropy…
$$H_{\mathbb{R}}(\pi) = - \sum_{i \colon \pi_i \neq 0} \pi_i \log \pi_i$$
… and try to write down the analogous formula over $\mathbb{Z}/p\mathbb{Z}$.
The immediate question is: what should play the role of the logarithm mod $p$?
The crucial property of the ordinary logarithm is that it converts multiplication into addition. Specifically, we’re concerned here with logarithms of nonzero probabilities, and $\log$ defines a homomorphism from the multiplicative group $(0, \infty)$ of positive reals to the additive group $\mathbb{R}$.
Mod $p$, then, we want a homomorphism from the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^\times$ of nonzero elements to the additive group $\mathbb{Z}/p\mathbb{Z}$. And here we hit another obstacle: a simple argument using Lagrange’s theorem shows that apart from the zero map, no such homomorphism exists. (The two groups have coprime orders $p - 1$ and $p$, so the image of any homomorphism must be trivial.)
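For a tiny prime, you can also confirm this by brute force. The following sketch (function name mine) searches through all maps from the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^\times$ to the additive group $\mathbb{Z}/p\mathbb{Z}$ and keeps only the homomorphisms:

```python
from itertools import product

def mult_to_add_homomorphisms(p):
    """All maps f: (Z/pZ)* -> Z/pZ with f(ab) = f(a) + f(b) mod p, found by brute force."""
    units = range(1, p)
    homs = []
    for values in product(range(p), repeat=p - 1):
        f = dict(zip(units, values))
        if all(f[a * b % p] == (f[a] + f[b]) % p for a in units for b in units):
            homs.append(f)
    return homs

# Only the zero map survives (shown here for p = 5; the search is exponential, so keep p small).
print(mult_to_add_homomorphisms(5))   # [{1: 0, 2: 0, 3: 0, 4: 0}]
```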
So, we seem to be stuck. Actually, we’re stuck in a way that often happens when you try to construct something new, working by analogy with something old: slavishly imitating the old situation, symbol for symbol, often doesn’t work. In the most interesting analogies, there are wrinkles.
To make some progress, instead of looking at the formula for entropy, let’s look at the properties of entropy.
The most important property is a kind of recursivity. In the language spoken by many patrons of the Café, finite probability distributions form an operad. Explicitly, this means the following.
Suppose I flip a coin. If it’s heads, I roll a die, and if it’s tails, I draw from a pack of cards. This is a two-stage process with 58 possible final outcomes: either the face of a die or a playing card. Assuming that the coin toss, die roll and card draw are all fair, the probability distribution on the 58 outcomes is
$$\bigl( \tfrac{1}{12}, \ldots, \tfrac{1}{12}, \tfrac{1}{104}, \ldots, \tfrac{1}{104} \bigr)$$
with $6$ copies of $\tfrac{1}{12}$ and $52$ copies of $\tfrac{1}{104}$. Generally, given a probability distribution $\gamma = (\gamma_1, \ldots, \gamma_n)$ on $n$ elements and, for each $i \in \{1, \ldots, n\}$, a probability distribution $\pi^i = (\pi^i_1, \ldots, \pi^i_{k_i})$ on $k_i$ elements, we get a composite distribution
$$\gamma \circ (\pi^1, \ldots, \pi^n) = \bigl( \gamma_1 \pi^1_1, \ldots, \gamma_1 \pi^1_{k_1}, \ \ldots, \ \gamma_n \pi^n_1, \ldots, \gamma_n \pi^n_{k_n} \bigr)$$
on $k_1 + \cdots + k_n$ elements.
For example, take the coin-die-card process above. Writing $u_n$ for the uniform distribution on $n$ elements, the final distribution on $58$ elements is $u_2 \circ (u_6, u_{52})$, which I wrote out explicitly above.
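In code, the composite operation is just “scale each $\pi^i$ by $\gamma_i$ and concatenate”. Here is a short sketch that reproduces the coin-die-card distribution exactly (the names `uniform` and `compose` are mine):

```python
from fractions import Fraction

def uniform(n):
    """The uniform distribution u_n on n elements."""
    return [Fraction(1, n)] * n

def compose(gamma, pis):
    """Operadic composite gamma o (pi^1, ..., pi^n): the i-th block of the result
    is the distribution pi^i scaled by gamma_i."""
    return [g * q for g, pi in zip(gamma, pis) for q in pi]

dist = compose(uniform(2), [uniform(6), uniform(52)])   # u_2 o (u_6, u_52)
print(len(dist))            # 58 outcomes
print(dist[0], dist[10])    # 1/12 (a die face), 1/104 (a playing card)
print(sum(dist))            # 1
```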
The important recursivity property of entropy is called the chain rule, and it states that
$$H_{\mathbb{R}}\bigl( \gamma \circ (\pi^1, \ldots, \pi^n) \bigr) = H_{\mathbb{R}}(\gamma) + \sum_{i=1}^n \gamma_i H_{\mathbb{R}}(\pi^i).$$
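Here is a quick, self-contained numerical check of the chain rule on the coin-die-card example (again, the helper names are mine):

```python
from math import log, isclose

def H(pi):
    """Real Shannon entropy, skipping zero probabilities."""
    return -sum(x * log(x) for x in pi if x > 0)

def compose(gamma, pis):
    """Operadic composite gamma o (pi^1, ..., pi^n)."""
    return [g * q for g, pi in zip(gamma, pis) for q in pi]

gamma = [0.5, 0.5]                    # the fair coin
pis = [[1/6] * 6, [1/52] * 52]        # the fair die and the fair deck of cards

lhs = H(compose(gamma, pis))
rhs = H(gamma) + sum(g * H(pi) for g, pi in zip(gamma, pis))
print(lhs, rhs, isclose(lhs, rhs))    # the two sides agree
```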
It’s easy to check that this is true in general. (It’s also nice to understand it in terms of information… but if I follow every tempting explanatory byway, I’ll run out of energy too soon.) And in fact, it characterizes entropy almost uniquely:
Theorem  Let $I$ be a function assigning a real number $I(\pi)$ to each finite probability distribution $\pi$. The following are equivalent:
(i) $I$ is continuous in $\pi$ and satisfies the chain rule;
(ii) $I = c H_{\mathbb{R}}$ for some constant $c \in \mathbb{R}$.
The theorem as stated is due to Faddeev, and I blogged about it earlier this year. In fact, you can weaken “continuous” to “measurable” (a theorem of Lee), but that refinement won’t be important here.
What is important is this. In our quest to imitate real entropy in $\mathbb{Z}/p\mathbb{Z}$, we now have something to aim for. Namely: we want a sequence of functions $H_p \colon \Pi_n \to \mathbb{Z}/p\mathbb{Z}$ satisfying the obvious analogue of the chain rule. And if we’re really lucky, there will be essentially only one such sequence.
We’ll discover that this is indeed the case. Once we’ve found the right definition of $H_p$ and proved this, we can very legitimately baptize $H_p$ as “entropy mod $p$” — no matter what weird and wonderful formula might be used to define it — because it has the same characteristic properties as entropy over $\mathbb{R}$.
I think I’ll leave you on that cliff-hanger. If you’d like to guess what the definition of entropy mod $p$ is, go ahead! Otherwise, I’ll tell you next time.
Re: Entropy Modulo a Prime
Speaking as a number theorist, it seems wrong to me that the analogues of $\mathbb{R}$ are the finite fields rather than the $p$-adic fields or $p$-adic integers.
An obvious advantage of working in the $p$-adics is that the logarithm is automatically well defined. Moreover, if we restrict our probabilities to take values in the $p$-adic integers, then the entropy is also automatically a $p$-adic integer and reduction makes sense.
I suppose the reduction of the $p$-adic entropy equals whatever you have in mind over the finite fields?