
December 14, 2017

Entropy Modulo a Prime

Posted by Tom Leinster

In 1995, the German geometer Friedrich Hirzebruch retired, and a private booklet was put together to mark the occasion. That booklet included a short note by Maxim Kontsevich entitled “The $1\tfrac{1}{2}$-logarithm”.

Kontsevich’s note didn’t become publicly available until five years later, when it was included as an appendix to a paper on polylogarithms by Philippe Elbaz-Vincent and Herbert Gangl. Towards the end, it contains the following provocative words:

Conclusion: If we have a random variable $\xi$ which takes finitely many values with all probabilities in $\mathbb{Q}$ then we can define not only the transcendental number $H(\xi)$ but also its “residues modulo $p$” for almost all primes $p$!

Kontsevich’s note was very short and omitted many details. I’ll put some flesh on those bones, showing how to make sense of the sentence above, and much more.

The “$H$” that Kontsevich uses here is the symbol for entropy — or more exactly, Shannon entropy. So, I’ll begin by recalling what that is. That will pave the way for what I really want to talk about, which is a kind of entropy for probability distributions where the “probabilities” are not real numbers, but elements of the field $\mathbb{Z}/p\mathbb{Z}$ of integers modulo a prime $p$.

Let $\pi = (\pi_1, \ldots, \pi_n)$ be a finite probability distribution. (It would be more usual to write a probability distribution as $p$, but I want to reserve that letter for prime numbers.) The entropy of $\pi$ is

$$H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i.$$

Usually this is just written as $H$, but I want to emphasize the role of the real numbers here: both the probabilities $\pi_i$ and the entropy $H_\mathbb{R}(\pi)$ belong to $\mathbb{R}$.
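In code, the formula is short. Here is a small Python sketch of it (purely illustrative; the helper name is arbitrary):

```python
from math import log

def shannon_entropy(pi):
    """H(pi) = -sum_i pi_i log(pi_i), with zero probabilities skipped
    (the usual convention 0 log 0 = 0).  Natural logarithm, as above."""
    assert abs(sum(pi) - 1) < 1e-9, "probabilities should sum to 1"
    return -sum(q * log(q) for q in pi if q > 0)

# For example, a fair coin: shannon_entropy([0.5, 0.5]) == log(2).
```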

There are applications of entropy in dozens of branches of science… but none will be relevant here! This is purely a mathematical story, though if anyone can think of any possible application or interpretation of entropy modulo a prime, I’d love to hear it.

The challenge now is to find the correct analogue of entropy when the field $\mathbb{R}$ is replaced by the field $\mathbb{Z}/p\mathbb{Z}$ of integers mod $p$, for any prime $p$. So, we want to define a kind of entropy

$$H_p(\pi_1, \ldots, \pi_n) \in \mathbb{Z}/p\mathbb{Z}$$

when $\pi_i \in \mathbb{Z}/p\mathbb{Z}$.

We immediately run into an obstacle. Over $\mathbb{R}$, probabilities are required to be nonnegative. Indeed, the logarithm in the definition of entropy doesn’t make sense otherwise. But in $\mathbb{Z}/p\mathbb{Z}$, there is no notion of positive or negative. So, what are we even going to define the entropy of?

We take the simplest way out: ignore the problem. So, writing

$$\Pi_n = \{ (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n : \pi_1 + \cdots + \pi_n = 1 \},$$

we’re going to try to define

$$H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}$$

for each $\pi = (\pi_1, \ldots, \pi_n) \in \Pi_n$.
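If it helps to see these objects concretely, here is a small Python sketch (illustrative only) listing the elements of $\Pi_n$ for small $n$ and $p$:

```python
from itertools import product

def Pi(n, p):
    """All n-tuples of residues mod p summing to 1 mod p."""
    return [pi for pi in product(range(p), repeat=n) if sum(pi) % p == 1]

# There are p**(n-1) such tuples: e.g. len(Pi(3, 5)) == 25.
# Note that (1, 2, 2, 2) is in Pi(4, 3), even though its entries make no
# sense as real probabilities.
```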

Let’s try the most direct approach to doing this. That is, let’s stare at the formula defining real entropy…

$$H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i$$

… and try to write down the analogous formula over $\mathbb{Z}/p\mathbb{Z}$.

The immediate question is: what should play the role of the logarithm mod $p$?

The crucial property of the ordinary logarithm is that it converts multiplication into addition. Specifically, we’re concerned here with logarithms of nonzero probabilities, and $\log$ defines a homomorphism from the multiplicative monoid $(0, 1]$ of nonzero probabilities to the additive group $\mathbb{R}$.

Mod $p$, then, we want a homomorphism from the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^\times$ of nonzero probabilities to the additive group $\mathbb{Z}/p\mathbb{Z}$. And here we hit another obstacle: a simple argument using Lagrange’s theorem shows that apart from the zero map, no such homomorphism exists.
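(The argument: the image of such a homomorphism is a subgroup of $\mathbb{Z}/p\mathbb{Z}$ whose order divides both $p - 1$ and $p$, hence is trivial.) If you like seeing such things checked by machine, here is a brute-force Python sketch for small primes (illustrative only; the helper name is mine):

```python
from itertools import product

def mult_to_add_homs(p):
    """All maps f: {1, ..., p-1} -> Z/p with f(ab) = f(a) + f(b) mod p,
    found by brute force (feasible only for small p)."""
    units = range(1, p)
    homs = []
    for values in product(range(p), repeat=p - 1):
        f = dict(zip(units, values))
        if all(f[a * b % p] == (f[a] + f[b]) % p for a in units for b in units):
            homs.append(f)
    return homs

# mult_to_add_homs(5) returns a single map: the zero map.
```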

So, we seem to be stuck. Actually, we’re stuck in a way that often happens when you try to construct something new, working by analogy with something old: slavishly imitating the old situation, symbol for symbol, often doesn’t work. In the most interesting analogies, there are wrinkles.

To make some progress, instead of looking at the formula for entropy, let’s look at the properties of entropy.

The most important property is a kind of recursivity. In the language spoken by many patrons of the Café, finite probability distributions form an operad. Explicitly, this means the following.

Suppose I flip a coin. If it’s heads, I roll a die, and if it’s tails, I draw from a pack of cards. This is a two-stage process with 58 possible final outcomes: either the face of a die or a playing card. Assuming that the coin toss, die roll and card draw are all fair, the probability distribution on the 58 outcomes is

$$(1/12, \ldots, 1/12, 1/104, \ldots, 1/104),$$

with $6$ copies of $1/12$ and $52$ copies of $1/104$. Generally, given a probability distribution $\gamma = (\gamma_1, \ldots, \gamma_n)$ on $n$ elements and, for each $i \in \{1, \ldots, n\}$, a probability distribution $\pi^i = (\pi^i_1, \ldots, \pi^i_{k_i})$ on $k_i$ elements, we get a composite distribution

$$\gamma \circ (\pi^1, \ldots, \pi^n) = (\gamma_1 \pi^1_1, \ldots, \gamma_1 \pi^1_{k_1}, \ldots, \gamma_n \pi^n_1, \ldots, \gamma_n \pi^n_{k_n})$$

on $k_1 + \cdots + k_n$ elements.

For example, take the coin-die-card process above. Writing $u_n$ for the uniform distribution on $n$ elements, the final distribution on $58$ elements is $u_2 \circ (u_6, u_{52})$, which I wrote out explicitly above.
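Operadic composition is easy to play with in code. Here is a minimal Python sketch (the helper names are mine) reproducing the coin-die-card example:

```python
from fractions import Fraction

def uniform(n):
    """The uniform distribution u_n on n elements."""
    return [Fraction(1, n)] * n

def compose(gamma, pis):
    """The composite gamma o (pi^1, ..., pi^n): scale each pi^i by gamma_i
    and concatenate the results."""
    assert len(gamma) == len(pis)
    return [g * q for g, pi in zip(gamma, pis) for q in pi]

dist = compose(uniform(2), [uniform(6), uniform(52)])   # u_2 o (u_6, u_52)
assert len(dist) == 58
assert dist.count(Fraction(1, 12)) == 6
assert dist.count(Fraction(1, 104)) == 52
```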

The important recursivity property of entropy is called the chain rule, and it states that

$$H_\mathbb{R}(\gamma \circ (\pi^1, \ldots, \pi^n)) = H_\mathbb{R}(\gamma) + \sum_{i = 1}^n \gamma_i H_\mathbb{R}(\pi^i).$$
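Here, for instance, is a quick numerical sanity check of the chain rule on the coin-die-card example (a self-contained Python sketch; the names are just for illustration):

```python
from fractions import Fraction
from math import log

def H(pi):
    """Shannon entropy with natural log, skipping zero entries."""
    return -sum(float(q) * log(float(q)) for q in pi if q != 0)

u2, u6, u52 = ([Fraction(1, n)] * n for n in (2, 6, 52))
composite = [g * q for g, pi in zip(u2, (u6, u52)) for q in pi]

lhs = H(composite)                                        # H(u_2 o (u_6, u_52))
rhs = H(u2) + float(u2[0]) * H(u6) + float(u2[1]) * H(u52)
assert abs(lhs - rhs) < 1e-9
```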

It’s easy to check that this holds in general. (It’s also nice to understand it in terms of information… but if I follow every tempting explanatory byway, I’ll run out of energy too soon.) And in fact, it characterizes entropy almost uniquely:

Theorem   Let $I$ be a function assigning a real number $I(\pi)$ to each finite probability distribution $\pi$. The following are equivalent:

  • $I$ is continuous in $\pi$ and satisfies the chain rule;

  • $I = c H_\mathbb{R}$ for some constant $c \in \mathbb{R}$.

The theorem as stated is due to Faddeev, and I blogged about it earlier this year. In fact, you can weaken “continuous” to “measurable” (a theorem of Lee), but that refinement won’t be important here.

What is important is this. In our quest to imitate real entropy in $\mathbb{Z}/p\mathbb{Z}$, we now have something to aim for. Namely: we want a sequence of functions $H_p : \Pi_n \to \mathbb{Z}/p\mathbb{Z}$ (one for each $n$) satisfying the obvious analogue of the chain rule. And if we’re really lucky, there will be essentially only one such sequence.

We’ll discover that this is indeed the case. Once we’ve found the right definition of $H_p$ and proved this, we can very legitimately baptize $H_p$ as “entropy mod $p$” — no matter what weird and wonderful formula might be used to define it — because it has the same characteristic properties as entropy over $\mathbb{R}$.

I think I’ll leave you on that cliff-hanger. If you’d like to guess what the definition of entropy mod $p$ is, go ahead! Otherwise, I’ll tell you next time.

Posted at December 14, 2017 11:00 PM UTC


12 Comments & 0 Trackbacks

Re: Entropy Modulo a Prime

As a number theorist, it seems wrong to me that the analogues of $\mathbb{R}$ are the finite fields rather than the $p$-adic fields or $p$-adic integers.

An obvious advantage of working in the $p$-adics is that the logarithm is automatically well defined. Moreover, if we restrict our probabilities to take values in the $p$-adic integers, then the entropy is automatically a $p$-adic integer too, and reduction mod $p$ makes sense.

I suppose the reduction of the $p$-adic entropy equals whatever you have in mind over the finite fields?

Posted by: Asvin on December 14, 2017 11:39 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime

I’m glad there’s at least one number theorist reading!

As a number theorist, it seems wrong to me that the analogues of $\mathbb{R}$ are the finite fields rather than the $p$-adic fields or $p$-adic integers.

Well, we can dream of replacing $\mathbb{R}$ with all sorts of things! Kontsevich’s note was specifically about entropy over the field with $p$ elements. One can speculate about other finite fields (and maybe someone’s already figured it out). And I agree that $\mathbb{Q}_p$ and $\mathbb{Z}_p$ are also natural contexts.

I’m not sure I know what $p$-adic entropy is. A few times, I’ve browsed a 2006 paper of Deninger, and there’s a definition of $p$-adic entropy at the top of page 2. But that’s a definition of the $p$-adic entropy of a group action (an important special case being $\mathbb{Z}$-actions, i.e. automorphisms), whereas here we’re talking about entropy of probability distributions. What’s the connection between the two?

Or to make it specific, how would you calculate (say) $H_3(1, 2, 2, 2)$ using the $3$-adic approach that you outline?

Posted by: Tom Leinster on December 15, 2017 12:16 AM | Permalink | Reply to this

Re: Entropy Modulo a Prime

An obvious advantage of working in the $p$-adics is that the logarithm is automatically well defined.

The usual logarithm formula converges for $\pi \in \mathbb{Q}_p$ iff $|1-\pi| \lt 1$, i.e. $\pi \in \mathbb{Z}^\times_p$. Because $\mathbb{Z}_p^\times$ is a group under multiplication, $\mathbb{Z}_p^\times$-valued measures on finite spaces are closed under the operadic composition Tom describes. And the same calculations which show the chain rule holds in the usual case should show it also holds here.

I’m not sure what to think of the condition of being a probability measure in this context.

  • If we define a probability measure $\pi$ to be one with $\sum_k \pi(k) = 1$, then there are no $\mathbb{Z}^\times_p$-valued probability measures on an $n$-element set unless $n \equiv 1$ mod $p$. This seems like a strange restriction.

  • Another option might be to define a probability measure $\pi$ to be one such that $|\sum_k \pi(k)| = 1$. With this definition, there are no $\mathbb{Z}^\times_p$-valued probability measures on $n$ unless $n$ is a unit mod $p$, and moreover in this case every $\mathbb{Z}^\times_p$-valued measure is a probability measure. That feels a little more natural.

  • Unfortunately for my feelings of “naturalness”, it appears that the first, but not the second, option is closed under operadic composition. One might go with a hybrid option where one simply defines a probability measure on a finite set to be a $\mathbb{Z}_p^\times$-valued measure on a finite set whose cardinality is $1$ mod $p$.

Any of these options also starts to resemble the conditions for $\pi$ to be an $\mathbb{R}$-valued probability measure: first, to be in the radius of convergence of the logarithm formula, the values $\pi(k)$ should be positive [and $\lt 2$], while to be a probability measure the $\pi(k)$ should also add up to 1 (in absolute value, for the second or third option).

Posted by: Tim Campion on December 15, 2017 7:56 AM | Permalink | Reply to this

Re: Entropy Modulo a Prime

Just a few thoughts:

  1. The $p$-adic logarithm is usually extended to all of $\mathbb{Q}_p^\times$ by defining $\log p = 0$. I don’t know if that is the right thing to do in this case.

  2. The interval $[0,1]$ is not all that natural from a number-theoretic/adelic perspective. Maybe instead of working with probabilities, we can work with odds ($p/(1-p)$), which have range $(0,\infty)$; this is a more natural thing to consider since it is a connected subgroup of $\mathbb{R} \setminus \{0\}$.

At least in class field theory, the right analogues of this connected component are the $p$-adic numbers congruent to $1$ modulo a fixed power of $p$. Once again, I don’t know if this is the right thing to look at in this context.

Posted by: Asvin on December 15, 2017 9:17 AM | Permalink | Reply to this

Re: Entropy Modulo a Prime

Note that one needs to specify the value of the extension of the logarithm not just on the uniformiser, but on Teichmüller lifts. At least according to Wikipedia, the Iwasawa logarithm simply ignores the Teichmüller lift, which means (it seems to me) that we don’t understand anything better about the finite-field logarithm by using this lift.

Posted by: L Spice on December 16, 2017 11:38 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime

You say that the $p$-adic logarithm $\log(x)$ is defined if and only if $|1 - x| \lt 1$, with which I agree; but then you say that the domain of the $p$-adic logarithm is $\mathbb{Z}_p^\times$, and I think that you must mean $1 + p\mathbb{Z}_p$. (If it were really defined on $\mathbb{Z}_p^\times$, then, assuming it behaved nicely in a sense to be obvious from the rest of this sentence, we’d have a reasonable candidate $\mathbb{F}_p^\times \to \mathbb{Z}_p^\times \to \mathbb{Z}_p \to \mathbb{F}_p$ for the finite-field logarithm.) (Sorry about those enormous multiplication signs; I can’t figure out how to get them reasonably sized.)

Posted by: L Spice on December 16, 2017 11:35 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime

Re the multiplication signs: I’ve just been doing things like $\mathbb{Z}_p^\times$.

Posted by: Tom Leinster on December 16, 2017 11:46 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime

Huh; I could have sworn I tried exactly that (except that I usually omit braces around my single-letter arguments), and the post just stopped entirely immediately before the first backslash. Maybe I should have been using a filter other than “Markdown with itex to MathML”?

As a test: $\mathbb{Z}_p^\times$. Huh, looks like it’s the braces that make all the difference. Here it is (or, rather, isn’t) without: syntax error at token Z.

Posted by: L Spice on December 19, 2017 8:29 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime

I leave the text filter on the default, which is indeed Markdown with itex to MathML. And I’m not enough of an expert to guess what was happening with your earlier posts, unfortunately!

Posted by: Tom Leinster on December 20, 2017 12:12 AM | Permalink | Reply to this

Re: Entropy Modulo a Prime

… the observation was that the power series $\log(x) = -\sum_n \frac{(1-x)^n}{n}$ only converges to define a function for $|1-x| \lt 1$; this is just as true in $\mathbb{R}$ or $\mathbb{C}$ as it is in $\mathbb{Z}_{(p)}$. One then makes reasonable choices to analytically extend the function so defined…

Posted by: Jesse C. McKeown on December 16, 2017 11:58 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime

The exact comment to which I was responding was:

The usual logarithm formula converges for $\pi \in \mathbb{Q}_p$ iff $|1 - \pi| \lt 1$, i.e. $\pi \in \mathbb{Z}_p^\times$.

Posted by: L Spice on December 19, 2017 8:36 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime

OK, so where have we got to? As I understand it, the current status is as follows:

  • I asked the question: how should we define the entropy of a “probability distribution mod $p$”, i.e., a finite tuple of elements of $\mathbb{Z}/p\mathbb{Z}$ summing to $1$? (That entropy should also be an element of $\mathbb{Z}/p\mathbb{Z}$.) I’ll answer the question in the next post, but I’m also interested in learning different approaches or viewpoints.

  • Asvin mentioned something called “$p$-adic entropy”. I’d met that concept before, but only the concept of the $p$-adic entropy of a group action, not of anything like a probability distribution of $p$-adic numbers. I haven’t yet seen any literature on $p$-adic entropy of that kind (and would be glad to be pointed to some).

  • Asvin also pointed out that there’s such a thing as the $p$-adic logarithm. So, given $\pi = (\pi_1, \ldots, \pi_n) \in \mathbb{Q}_p^n$ summing to $1$, we could attempt to define the entropy of $\pi$ by the usual formula — $H(\pi) = -\sum_{i: \pi_i \neq 0} \pi_i \log \pi_i$ — but now interpreting the logarithm as a $p$-adic logarithm. However…

  • Tim pointed out that the power series defining the $p$-adic logarithm only converges in the disc of radius $1$ about $1$, and deduced that in this sense, the formula above is only well-defined if the cardinality of the support of $\pi$ is congruent to $1$ mod $p$. So, it’s usually not well-defined. But…

  • Asvin then observed that the $p$-adic logarithm can be extended to $\mathbb{Q}_p^\times$ (or even $\mathbb{C}_p^\times$) in such a way that the usual functional equation $\log(z w) = \log(z) + \log(w)$ holds everywhere and $\log(p) = 0$. According to Wikipedia, this is called the Iwasawa logarithm.

  • So, we can use the Iwasawa logarithm to define the entropy $H(\pi)$ of any finite tuple $\pi = (\pi_1, \ldots, \pi_n) \in \mathbb{C}_p^n$ summing to $1$. Simply take Shannon’s formula (above) and interpret $\log$ as the Iwasawa logarithm. Then $H(\pi)$ is also an element of $\mathbb{C}_p$. (There is a small computational sketch of the convergent case just after this list.)
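(For the convergent case, where every $\pi_i$ is a $1$-unit and no Iwasawa extension is needed, here is a rough Python sketch of the computation: it sums the series in exact rational arithmetic and reduces mod $p^k$. The helper names and the truncation bound are just for illustration.)

```python
from fractions import Fraction

def log_1unit(u, p, k, terms=None):
    """p-adic logarithm of a 1-unit u (u = 1 mod p), as an exact rational.
    Terms of the series beyond roughly 2k are divisible by p^k, so a
    generous truncation suffices for working mod p^k."""
    terms = terms or 4 * k + 10
    x = Fraction(u) - 1
    return sum(Fraction((-1) ** (n + 1), n) * x ** n for n in range(1, terms + 1))

def reduce_mod(q, p, k):
    """Reduce a rational with denominator prime to p modulo p^k.
    (Modular inverse via pow(..., -1, m) needs Python 3.8+.)"""
    return q.numerator * pow(q.denominator, -1, p ** k) % (p ** k)

def padic_entropy_of_1units(pi, p, k):
    """-sum pi_i log(pi_i) mod p^k, assuming each pi_i is a 1-unit and
    sum(pi) = 1 (which forces the length of pi to be 1 mod p)."""
    h = -sum(Fraction(x) * log_1unit(x, p, k) for x in pi)
    return reduce_mod(h, p, k)

# Example: p = 3, pi = (7, 4, -2, -8); each entry is a 1-unit and the sum is 1.
# padic_entropy_of_1units([7, 4, -2, -8], 3, 5) gives the entropy mod 3^5.
```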

What I’m not getting is how you could use this definition of $p$-adic entropy to obtain a definition of the entropy of a probability distribution in $\mathbb{Z}/p\mathbb{Z}$. Asvin, could you explain? E.g. can we try the example of $p = 3$ and $\pi = (1, 2, 2, 2)$?

Posted by: Tom Leinster on December 15, 2017 7:16 PM | Permalink | Reply to this
