## December 14, 2017

### Entropy Modulo a Prime

#### Posted by Tom Leinster

In 1995, the German geometer Friedrich Hirzebruch retired, and a private booklet was put together to mark the occasion. That booklet included a short note by Maxim Kontsevich entitled “The $1\tfrac{1}{2}$-logarithm”.

Kontsevich’s note didn’t become publicly available until five years later, when it was included as an appendix to a paper on polylogarithms by Philippe Elbaz-Vincent and Herbert Gangl. Towards the end, it contains the following provocative words:

> Conclusion: If we have a random variable $\xi$ which takes finitely many values with all probabilities in $\mathbb{Q}$ then we can define not only the transcendental number $H(\xi)$ but also its “residues modulo $p$” for almost all primes $p$!

Kontsevich’s note was very short and omitted many details. I’ll put some flesh on those bones, showing how to make sense of the sentence above, and much more.

The “$H$” that Kontsevich uses here is the symbol for entropy — or more exactly, Shannon entropy. So, I’ll begin by recalling what that is. That will pave the way for what I really want to talk about, which is a kind of entropy for probability distributions where the “probabilities” are not real numbers, but elements of the field $\mathbb{Z}/p\mathbb{Z}$ of integers modulo a prime $p$.

Let $\pi = (\pi_1, \ldots, \pi_n)$ be a finite probability distribution. (It would be more usual to write a probability distribution as $p$, but I want to reserve that letter for prime numbers.) The entropy of $\pi$ is

$H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i.$

Usually this is just written as $H$, but I want to emphasize the role of the real numbers here: both the probabilities $\pi_i$ and the entropy $H_\mathbb{R}(\pi)$ belong to $\mathbb{R}$.
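The definition translates directly into code. Here is a minimal Python sketch (the function name is mine):

```python
from math import log

def shannon_entropy(pi):
    """H_R(pi) = -sum of pi_i * log(pi_i), omitting the zero entries."""
    return -sum(q * log(q) for q in pi if q != 0)

# A fair coin has entropy log 2:
print(shannon_entropy([0.5, 0.5]))  # ≈ 0.6931
```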

There are applications of entropy in dozens of branches of science… but none will be relevant here! This is purely a mathematical story, though if anyone can think of any possible application or interpretation of entropy modulo a prime, I’d love to hear it.

The challenge now is to find the correct analogue of entropy when the field $\mathbb{R}$ is replaced by the field $\mathbb{Z}/p\mathbb{Z}$ of integers mod $p$, for any prime $p$. So, we want to define a kind of entropy

$H_p(\pi_1, \ldots, \pi_n) \in \mathbb{Z}/p\mathbb{Z}$

when $\pi_i \in \mathbb{Z}/p\mathbb{Z}$.

We immediately run into an obstacle. Over $\mathbb{R}$, probabilities are required to be nonnegative. Indeed, the logarithm in the definition of entropy doesn’t make sense otherwise. But in $\mathbb{Z}/p\mathbb{Z}$, there is no notion of positive or negative. So, what are we even going to define the entropy of?

We take the simplest way out: ignore the problem. So, writing

$\Pi_n = \{ (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n : \pi_1 + \cdots + \pi_n = 1 \},$

we’re going to try to define

$H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}$

for each $\pi = (\pi_1, \ldots, \pi_n) \in \Pi_n$.
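For small $n$ and $p$, the set $\Pi_n$ can simply be enumerated; a quick Python sketch (names mine). Since the first $n-1$ coordinates may be chosen freely and the last is then forced, $\Pi_n$ has exactly $p^{n-1}$ elements.

```python
from itertools import product

def Pi(n, p):
    """All tuples pi in (Z/pZ)^n with pi_1 + ... + pi_n = 1 in Z/pZ."""
    return [pi for pi in product(range(p), repeat=n) if sum(pi) % p == 1]

# The first n-1 coordinates are free and the last is forced, so |Pi_n| = p^(n-1):
print(len(Pi(3, 5)))           # 25
print((1, 2, 2, 2) in Pi(4, 3))  # True: 1+2+2+2 = 7 ≡ 1 (mod 3)
```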

Let’s try the most direct approach to doing this. That is, let’s stare at the formula defining real entropy…

$H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i$

… and try to write down the analogous formula over $\mathbb{Z}/p\mathbb{Z}$.

The immediate question is: what should play the role of the logarithm mod $p$?

The crucial property of the ordinary logarithm is that it converts multiplication into addition. Specifically, we’re concerned here with logarithms of nonzero probabilities, and $\log$ defines a homomorphism from the multiplicative monoid $(0, 1]$ of nonzero probabilities to the additive group $\mathbb{R}$.

Mod $p$, then, we want a homomorphism from the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^\times$ of nonzero probabilities to the additive group $\mathbb{Z}/p\mathbb{Z}$. And here we hit another obstacle: a simple argument using Lagrange’s theorem shows that apart from the zero map, no such homomorphism exists.
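(The argument: the image of such a homomorphism is a subgroup of $\mathbb{Z}/p\mathbb{Z}$ whose order divides both $p-1$ and $p$, hence is trivial.) For small primes, this can also be confirmed by brute force; a Python sketch, assuming nothing beyond the definitions:

```python
def homs_to_additive(p):
    """Brute-force search for group homomorphisms f: (Z/pZ)^x -> (Z/pZ, +).
    Any such f is determined by its value a = f(g) on a generator g."""
    def is_generator(g):
        return len({pow(g, k, p) for k in range(1, p)}) == p - 1
    g = next(x for x in range(2, p) if is_generator(x))
    homs = []
    for a in range(p):  # candidate value f(g) = a
        f = {pow(g, k, p): (k * a) % p for k in range(p - 1)}
        # check the homomorphism property on all pairs
        if all(f[(x * y) % p] == (f[x] + f[y]) % p
               for x in range(1, p) for y in range(1, p)):
            homs.append(f)
    return homs

# Lagrange's theorem says only the zero map can exist; brute force agrees:
print(len(homs_to_additive(7)))  # 1
```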

So, we seem to be stuck. Actually, we’re stuck in a way that often happens when you try to construct something new, working by analogy with something old: slavishly imitating the old situation, symbol for symbol, often doesn’t work. In the most interesting analogies, there are wrinkles.

To make some progress, instead of looking at the formula for entropy, let’s look at the properties of entropy.

The most important property is a kind of recursivity. In the language spoken by many patrons of the Café, finite probability distributions form an operad. Explicitly, this means the following.

Suppose I flip a coin. If it’s heads, I roll a die, and if it’s tails, I draw from a pack of cards. This is a two-stage process with 58 possible final outcomes: either the face of a die or a playing card. Assuming that the coin toss, die roll and card draw are all fair, the probability distribution on the 58 outcomes is

$(1/12, \ldots, 1/12, 1/104, \ldots, 1/104),$

with $6$ copies of $1/12$ and $52$ copies of $1/104$. Generally, given a probability distribution $\gamma = (\gamma_1, \ldots, \gamma_n)$ on $n$ elements and, for each $i \in \{1, \ldots, n\}$, a probability distribution $\pi^i = (\pi^i_1, \ldots, \pi^i_{k_i})$ on $k_i$ elements, we get a composite distribution

$\gamma \circ (\pi^1, \ldots, \pi^n) = (\gamma_1 \pi^1_1, \ldots, \gamma_1 \pi^1_{k_1}, \ldots, \gamma_n \pi^n_1, \ldots, \gamma_n \pi^n_{k_n})$

on $k_1 + \cdots + k_n$ elements.

For example, take the coin-die-card process above. Writing $u_n$ for the uniform distribution on $n$ elements, the final distribution on $58$ elements is $u_2 \circ (u_6, u_{52})$, which I wrote out explicitly above.
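The operadic composition is easy to implement with exact rational arithmetic; a minimal Python sketch (helper names mine):

```python
from fractions import Fraction

def compose(gamma, *pis):
    """gamma ∘ (pi^1, ..., pi^n): the entry for outcome (i, j) is gamma_i * pi^i_j."""
    assert len(gamma) == len(pis)
    return [g * q for g, pi in zip(gamma, pis) for q in pi]

def u(n):
    """Uniform distribution on n elements."""
    return [Fraction(1, n)] * n

# The coin-die-card process u_2 ∘ (u_6, u_52):
dist = compose(u(2), u(6), u(52))
print(len(dist))          # 58 outcomes
print(dist[0], dist[-1])  # 1/12 and 1/104
print(sum(dist))          # 1
```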

The important recursivity property of entropy is called the chain rule, and it states that

$H_\mathbb{R}(\gamma \circ (\pi^1, \ldots, \pi^n)) = H_\mathbb{R}(\gamma) + \sum_{i = 1}^n \gamma_i H_\mathbb{R}(\pi^i).$
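As a sanity check, the chain rule can be verified numerically on random distributions; a self-contained Python sketch (all names mine):

```python
from math import log
from random import random

def H(pi):
    """Shannon entropy, skipping zero probabilities."""
    return -sum(q * log(q) for q in pi if q > 0)

def compose(gamma, *pis):
    """Operadic composite: the entry for outcome (i, j) is gamma_i * pi^i_j."""
    return [g * q for g, pi in zip(gamma, pis) for q in pi]

def rand_dist(n):
    """A random probability distribution on n elements."""
    w = [random() for _ in range(n)]
    return [x / sum(w) for x in w]

gamma = rand_dist(3)
pis = [rand_dist(2), rand_dist(4), rand_dist(5)]
lhs = H(compose(gamma, *pis))
rhs = H(gamma) + sum(g * H(pi) for g, pi in zip(gamma, pis))
print(abs(lhs - rhs) < 1e-10)  # True
```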

It’s easy to check that this is true. (It’s also nice to understand it in terms of information… but if I follow every tempting explanatory byway, I’ll run out of energy too soon.) And in fact, it characterizes entropy almost uniquely:

Theorem   Let $I$ be a function assigning a real number $I(\pi)$ to each finite probability distribution $\pi$. The following are equivalent:

• $I$ is continuous in $\pi$ and satisfies the chain rule;

• $I = c H_\mathbb{R}$ for some constant $c \in \mathbb{R}$.

The theorem as stated is due to Faddeev, and I blogged about it earlier this year. In fact, you can weaken “continuous” to “measurable” (a theorem of Lee), but that refinement won’t be important here.

What is important is this. In our quest to imitate real entropy in $\mathbb{Z}/p\mathbb{Z}$, we now have something to aim for. Namely: we want a sequence of functions $H_p : \Pi_n \to \mathbb{Z}/p\mathbb{Z}$ satisfying the obvious analogue of the chain rule. And if we’re really lucky, there will be essentially only one such sequence.

We’ll discover that this is indeed the case. Once we’ve found the right definition of $H_p$ and proved this, we can very legitimately baptize $H_p$ as “entropy mod $p$” — no matter what weird and wonderful formula might be used to define it — because it has the same characteristic properties as entropy over $\mathbb{R}$.

I think I’ll leave you on that cliff-hanger. If you’d like to guess what the definition of entropy mod $p$ is, go ahead! Otherwise, I’ll tell you next time.

Posted at December 14, 2017 11:00 PM UTC

### Re: Entropy Modulo a Prime

As a number theorist, it seems wrong to me that the analogues of $\mathbb{R}$ are the finite fields rather than the $p$-adic fields or $p$-adic integers.

An obvious advantage of working in the $p$-adics is that the logarithm is well-defined automatically. Moreover, if we restrict our probabilities to take values in the $p$-adic integers, then the entropy is also automatically a $p$-adic integer and reduction makes sense.

I suppose the reduction of the $p$-adic entropy equals whatever you have in mind over the finite fields?

Posted by: Asvin on December 14, 2017 11:39 PM

### Re: Entropy Modulo a Prime

> As a number theorist, it seems wrong to me that the analogues of $\mathbb{R}$ are the finite fields rather than the $p$-adic fields or $p$-adic integers.

Well, we can dream of replacing $\mathbb{R}$ with all sorts of things! Kontsevich’s note was specifically about entropy over the field with $p$ elements. One can speculate about other finite fields (and maybe someone’s already figured it out). And I agree that $\mathbb{Q}_p$ and $\mathbb{Z}_p$ are also natural contexts.

I’m not sure I know what $p$-adic entropy is. A few times, I’ve browsed a 2006 paper of Deninger, and there’s a definition of $p$-adic entropy at the top of page 2. But that’s a definition of the $p$-adic entropy of a group action (an important special case being $\mathbb{Z}$-actions, i.e. automorphisms), whereas here we’re talking about entropy of probability distributions. What’s the connection between the two?

Or to make it specific, how would you calculate (say) $H_3(1, 2, 2, 2)$ using the $3$-adic approach that you outline?

Posted by: Tom Leinster on December 15, 2017 12:16 AM

### Re: Entropy Modulo a Prime

> An obvious advantage of working in the $p$-adics is that the logarithm is well-defined automatically.

The usual logarithm formula converges for $\pi \in \mathbb{Q}_p$ iff $|1-\pi| \lt 1$, i.e. $\pi \in \mathbb{Z}^\times_p$. Because $\mathbb{Z}_p^\times$ is a group under multiplication, $\mathbb{Z}_p^\times$-valued measures on finite spaces are closed under the operadic composition Tom describes. And the same calculations which show the chain rule holds in the usual case should show it also holds here.

I’m not sure what to think of the condition of being a probability measure in this context.

• If we define a probability measure $\pi$ to be one with $\sum_k \pi(k) = 1$, then there are no $\mathbb{Z}^\times_p$-valued probability measures on an $n$-element set unless $n \equiv 1$ mod $p$. This seems like a strange restriction.

• Another option might be to define a probability measure $\pi$ to be one such that $|\sum_k \pi(k)| = 1$. With this definition, there are no $\mathbb{Z}^\times_p$-valued probability measures on $n$ unless $n$ is a unit mod $p$, and moreover in this case every $\mathbb{Z}^\times_p$-valued measure is a probability measure. That feels a little more natural.

• Unfortunately for my feelings of “naturalness”, it appears that the first option, but not the second, is closed under operadic composition. One might go with a hybrid option where one simply defines a probability measure on a finite set to be a $\mathbb{Z}_p^\times$-valued measure on a finite set whose cardinality is $1$ mod $p$.

Any of these options also starts to resemble the conditions for $\pi$ to be an $\mathbb{R}$-valued probability measure: first, to be in the radius of convergence of the logarithm formula, the values $\pi(k)$ should be positive [and $\lt 2$], while to be a probability measure the $\pi(k)$ should also add up to 1 (in absolute value, for the second or third option).

Posted by: Tim Campion on December 15, 2017 7:56 AM

### Re: Entropy Modulo a Prime

Just a few thoughts:

1. The $p$-adic logarithm is usually extended to all of $\mathbb{Q}_p$ by defining $\log p = 0$. I don’t know if that is the right thing to do in this case.

2. The interval $[0,1]$ is not all that natural from a number-theoretic/adelic perspective. Maybe instead of working with probabilities $\pi$, we can work with odds $\pi/(1-\pi)$, which have range $(0, \infty)$; that is a more natural thing to consider, since it is a connected subgroup of $\mathbb{R} \setminus \{0\}$.

At least in class field theory, the right analogues of this connected component are the $p$-adic numbers congruent to $1$ modulo a fixed power of $p$. Once again, I don’t know if this is the right thing to look at in this context.

Posted by: Asvin on December 15, 2017 9:17 AM

### Re: Entropy Modulo a Prime

Note that one needs to specify the value of the extension of the logarithm not just on the uniformiser, but on Teichmüller lifts. At least according to Wikipedia, the Iwasawa logarithm simply ignores the Teichmüller lift, which means (it seems to me) that we don’t understand anything better about the finite-field logarithm by using this lift.

Posted by: L Spice on December 16, 2017 11:38 PM

### Re: Entropy Modulo a Prime

You say that the $p$-adic logarithm $\log(x)$ is defined if and only if $|1 - x| < 1$, with which I agree; but then you say that the domain of the $p$-adic logarithm is $ℤ_p^⨉$, and I think that you must mean $1 + pℤ_p$. (If it were really defined on $ℤ_p^⨉$, then, assuming it behaved nicely in a sense to be obvious from the rest of this sentence, we’d have a reasonable candidate $𝔽_p^⨉ → ℤ_p^⨉ → ℤ_p → 𝔽_p$ for the finite-field logarithm.) (Sorry about those enormous multiplication signs; I can’t figure out how to get them reasonably sized.)

Posted by: L Spice on December 16, 2017 11:35 PM

### Re: Entropy Modulo a Prime

Re the multiplication signs: I’ve just been doing things like $\mathbb{Z}_p^\times$.

Posted by: Tom Leinster on December 16, 2017 11:46 PM

### Re: Entropy Modulo a Prime

Huh; I could have sworn I tried exactly that (except that I usually omit braces around my single-letter arguments), and the post just stopped entirely immediately before the first backslash. Maybe I should have been using a filter other than “Markdown with itex to MathML”?

As a test: $\mathbb{Z}_p^\times$. Huh, looks like it’s the braces that make all the difference. Here it is (or, rather, isn’t) without:

Posted by: L Spice on December 19, 2017 8:29 PM

### Re: Entropy Modulo a Prime

I leave the text filter on the default, which is indeed Markdown with itex to MathML. And I’m not enough of an expert to guess what was happening with your earlier posts, unfortunately!

Posted by: Tom Leinster on December 20, 2017 12:12 AM

### Re: Entropy Modulo a Prime

… the observation was that the power series $\log x = \sum_{n \ge 1} (-1)^{n+1} \frac{(x-1)^n}{n}$ only converges to define a function for $|1-x| \lt 1$; this is just as true in $\mathbb{R}$ or $\mathbb{C}$ as it is in $\mathbb{Q}_p$. One then makes reasonable choices to analytically extend the function so defined…

Posted by: Jesse C. McKeown on December 16, 2017 11:58 PM

### Re: Entropy Modulo a Prime

The exact comment to which I was responding was:

> The usual logarithm formula converges for $\pi \in \mathbb{Q}_p$ iff $|1 - \pi| < 1$, i.e. $\pi \in \mathbb{Z}_p^\times$.

Posted by: L Spice on December 19, 2017 8:36 PM

### Re: Entropy Modulo a Prime

OK, so where have we got to? As I understand it, the current status is as follows:

• I asked the question: how should we define the entropy of a “probability distribution mod $p$”, i.e., a finite tuple of elements of $\mathbb{Z}/p\mathbb{Z}$ summing to $1$? (That entropy should also be an element of $\mathbb{Z}/p\mathbb{Z}$.) I’ll answer the question in the next post, but I’m also interested in learning different approaches or viewpoints.

• Asvin mentioned something called “$p$-adic entropy”. I’d met that concept before, but only the concept of the $p$-adic entropy of a group action, not of anything like a probability distribution of $p$-adic numbers. I haven’t yet seen any literature on $p$-adic entropy of that kind (and would be glad to be pointed to some).

• Asvin also pointed out that there’s such a thing as the $p$-adic logarithm. So, given $\pi = (\pi_1, \ldots, \pi_n) \in \mathbb{Q}_p^n$ summing to $1$, we could attempt to define the entropy of $\pi$ by the usual formula — $H(\pi) = -\sum_{i: \pi_i \neq 0} \pi_i \log \pi_i$ — but now interpreting the logarithm as a $p$-adic logarithm. However…

• Tim pointed out that the power series defining the $p$-adic logarithm only converges in the disc of radius $1$ about $1$, and deduced that in this sense, the formula above is only well-defined if the cardinality of the support of $\pi$ is congruent to $1$ mod $p$. So, it’s usually not well-defined. But…

• Asvin then observed that the $p$-adic logarithm can be extended to $\mathbb{Q}_p^\times$ (or even $\mathbb{C}_p^\times$) in such a way that the usual functional equation $\log(z w) = \log(z) + \log(w)$ holds everywhere and $\log(p) = 0$. According to Wikipedia, this is called the Iwasawa logarithm.

• So, we can use the Iwasawa logarithm to define the entropy $H(\pi)$ of any finite tuple $\pi = (\pi_1, \ldots, \pi_n) \in \mathbb{C}_p^n$ summing to $1$. Simply take Shannon’s formula (above) and interpret $\log$ as the Iwasawa logarithm. Then $H(\pi)$ is also an element of $\mathbb{C}_p$.
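To make the last two bullets concrete, here is a finite-precision Python sketch of the Iwasawa logarithm on nonzero integers, viewed inside $\mathbb{Q}_p^\times$. This is my own reconstruction from the standard definitions, not anything from Kontsevich’s note: strip the powers of $p$ (so $\log p = 0$), divide out the Teichmüller part (which the logarithm also kills), and sum the usual series on the remaining $1$-unit. The `extra` padding of the working precision is a heuristic to absorb the digits lost to the powers of $p$ in the series denominators.

```python
def iwasawa_log_unit(u, p, k, extra=10):
    """Iwasawa logarithm of a p-adic unit u (an integer coprime to p), mod p^k."""
    assert u % p != 0
    K = k + extra  # working precision p^K
    M = p ** K
    # Teichmuller part: omega(u) = lim u^(p^m); u^(p^(K-1)) is correct mod p^K.
    teich = pow(u, p ** (K - 1), M)
    # One-unit part <u> = u / omega(u) is congruent to 1 mod p; log kills omega(u).
    z = (u * pow(teich, -1, M) - 1) % M
    total, zn = 0, 1
    for n in range(1, K + 1):  # log(1 + z) = sum over n of (-1)^(n+1) z^n / n
        zn = zn * z % M
        a, m = 0, n
        while m % p == 0:      # split n = p^a * m with m a p-adic unit
            m //= p
            a += 1
        term = (zn // p ** a) * pow(m, -1, M) % M
        total = (total + term if n % 2 else total - term) % M
    return total % p ** k

def iwasawa_log(x, p, k):
    """Extend to nonzero integers via log(p) = 0."""
    while x % p == 0:
        x //= p
    return iwasawa_log_unit(x, p, k)

print(iwasawa_log(4, 3, 2))  # 3, since log(1 + 3) ≡ 3 (mod 9)
print(iwasawa_log(3, 3, 4))  # 0, by the convention log(p) = 0
```

With this in hand, one could evaluate Shannon’s formula with the Iwasawa logarithm on any tuple of rationals summing to $1$, to whatever $p$-adic precision the padding supports.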

What I’m not getting is how you could use this definition of $p$-adic entropy to obtain a definition of the entropy of a probability distribution in $\mathbb{Z}/p\mathbb{Z}$. Asvin, could you explain? E.g. can we try the example of $p = 3$ and $\pi = (1, 2, 2, 2)$?

Posted by: Tom Leinster on December 15, 2017 7:16 PM
