Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

December 16, 2017

Entropy Modulo a Prime (Continued)

Posted by Tom Leinster

In the comments last time, a conversation got going about pp-adic entropy. But here I’ll return to the original subject: entropy modulo pp. I’ll answer the question:

Given a “probability distribution” mod pp, that is, a tuple π=(π 1,,π n)(/p) n \pi = (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n summing to 11, what is the right definition of its entropy H p(π)/p? H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}?

How will we know when we’ve got the right definition? As I explained last time, the acid test is whether it satisfies the chain rule

H p(γ(π 1,,π n))=H p(γ)+ i=1 nγ iH p(π i). H_p(\gamma \circ (\pi^1, \ldots, \pi^n)) = H_p(\gamma) + \sum_{i = 1}^n \gamma_i H_p(\pi^i).

This is supposed to hold for all γ=(γ 1,,γ n)Π n\gamma = (\gamma_1, \ldots, \gamma_n) \in \Pi_n and π i=(π 1 i,,π k i i)Π k i\pi^i = (\pi^i_1, \ldots, \pi^i_{k_i}) \in \Pi_{k_i}, where Π n\Pi_n is the hyperplane

Π n={(π 1,,π n)(/p) n:π 1++π n=1}, \Pi_n = \{ (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n : \pi_1 + \cdots + \pi_n = 1\},

whose elements we’re calling “probability distributions” mod pp. And if God is smiling on us, H pH_p will be essentially the only quantity that satisfies the chain rule. Then we’ll know we’ve got the right definition.

Black belts in functional equations will be able to use the chain rule and nothing else to work out what H pH_p must be. But the rest of us might like an extra clue, and we have one in the definition of real Shannon entropy:

H (π)= i:π i0π ilogπ i. H_\mathbb{R}(\pi) = - \sum_{i: \pi_i \neq 0} \pi_i \log \pi_i.

Now, we saw last time that there is no logarithm mod pp; that is, there is no group homomorphism

(/p) ×/p. (\mathbb{Z}/p\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}.

But there is a next-best thing: a homomorphism

(/p 2) ×/p. (\mathbb{Z}/p^2\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}.

This is called the Fermat quotient q pq_p, and it’s given by

q p(n)=n p11p/p. q_p(n) = \frac{n^{p - 1} - 1}{p} \in \mathbb{Z}/p\mathbb{Z}.

Let’s go through why this works.

The elements of /p 2\mathbb{Z}/p^2\mathbb{Z} are the congruence classes mod p 2p^2 of the integers not divisible by pp. Fermat’s little theorem says that whenever nn is not divisible by pp,

n p11p \frac{n^{p - 1} - 1}{p}

is an integer. This, or rather its congruence class mod pp, is the Fermat quotient. The congruence class of nn mod p 2p^2 determines the congruence class of n p11n^{p - 1} - 1 mod p 2p^2, and it therefore determines the congruence class of (n p11)/p(n^{p - 1} - 1)/p mod pp. So, q pq_p defines a function (/p 2) ×/p(\mathbb{Z}/p^2\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}. It’s a pleasant exercise to show that it’s a homomorphism. In other words, q pq_p has the log-like property

q p(mn)=q p(m)+q p(n) q_p(m n) = q_p(m) + q_p(n)

for all integers m,nm, n not divisible by pp.

In fact, it’s essentially unique as such. Any other homomorphism (/p 2) ×/p(\mathbb{Z}/p^2\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z} is a scalar multiple of q pq_p. (This follows from the classical theorem that the group (/p 2) ×(\mathbb{Z}/p^2\mathbb{Z})^\times is cyclic.) It’s just like the fact that up to a scalar multiple, the real logarithm is the unique measurable function log:(0,)R\log : (0, \infty) \to \R such that log(xy)=logx+logy\log(x y) = \log x + \log y, but here there’s nothing like measurability complicating things.

So: q pq_p functions as a kind of logarithm. Given a mod pp probability distribution π=Π n\pi = \in \Pi_n, we might therefore guess that the right definition of its entropy is

i:π i0π iq p(a i), - \sum_{i : \pi_i \neq 0} \pi_i q_p(a_i),

where a ia_i is an integer representing π i/p\pi_i \in \mathbb{Z}/p\mathbb{Z}.

However, this doesn’t work. It depends on the choice of representatives a ia_i.

To get the right answer, we’ll look at real entropy in a slightly different way. Define :[0,1]\partial_\mathbb{R}: [0, 1] \to \mathbb{R} by

(x)={xlogx if x0, 0 if x=0.. \partial_\mathbb{R}(x) = \begin{cases} - x \log x &if  x \neq 0, \\ 0 &if  x = 0. \end{cases}.

Then \partial_\mathbb{R} has the derivative-like property

(xy)=x (y)+ (x)y. \partial_\mathbb{R}(x y) = x \partial_\mathbb{R}(y) + \partial_\mathbb{R}(x) y.

A linear map with this property is called a derivation, so it’s reasonable to call \partial_\mathbb{R} a nonlinear derivation.

The observation that \partial_\mathbb{R} is a nonlinear derivation turns out to be quite useful. For instance, real entropy is given by

H (π)= i=1 n (π i) H_\mathbb{R}(\pi) = \sum_{i = 1}^n \partial_\mathbb{R}(\pi_i)

(πΠ n\pi \in \Pi_n), and verifying the chain rule for H H_\mathbb{R} is done most neatly using the derivation property of \partial_\mathbb{R}.

An equivalent formula for real entropy is

H (π)= i=1 n (π i) ( i=1 nπ i). H_\mathbb{R}(\pi) = \sum_{i = 1}^n \partial_\mathbb{R}(\pi_i) - \partial_\mathbb{R}\biggl( \sum_{i = 1}^n \pi_i \biggr).

This is a triviality: π i=1\sum \pi_i = 1, so (π i)=0\partial_\mathbb{R}\bigl( \sum \pi_i \bigr) = 0, so this is the same as the previous formula. But it’s also quite suggestive: H (π)H_\mathbb{R}(\pi) measures the extent to which the nonlinear derivation \partial_\mathbb{R} fails to preserve the sum π i\sum \pi_i.

Now let’s try to imitate this in /p\mathbb{Z}/p\mathbb{Z}. Since q pq_p plays a similar role to log\log, it’s natural to define

p(n)=nq p(n)=nn pp \partial_p(n) = -n q_p(n) = \frac{n - n^p}{p}

for integers nn not divisible by pp. But the last expression makes sense even if nn is divisible by pp. So, we can define a function

p:/p 2/p \partial_p : \mathbb{Z}/p^2\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}

by p(n)=(nn p)/p\partial_p(n) = (n - n^p)/p. (This is called a pp-derivation.) It’s easy to check that p\partial_p has the derivative-like property

p(mn)=m p(n)+ p(m)n. \partial_p(m n) = m \partial_p(n) + \partial_p(m) n.

And now we arrive at the long-awaited definition. The entropy mod pp of π=(π 1,,π n)\pi = (\pi_1, \ldots, \pi_n) is

H p(π)= i=1 n p(a i) p( i=1 na i), H_p(\pi) = \sum_{i = 1}^n \partial_p(a_i) - \partial_p\biggl( \sum_{i = 1}^n a_i \biggr),

where a ia_i \in \mathbb{Z} represents π i/p\pi_i \in \mathbb{Z}/p\mathbb{Z}. This is independent of the choice of representatives a ia_i. And when you work it out explicitly, it gives

H p(π)=1p(1 i=1 na i p). H_p(\pi) = \frac{1}{p} \biggl( 1 - \sum_{i = 1}^n a_i^p \biggr).

Just as in the real case, H pH_p satisfies the chain rule, which is most easily shown using the derivation property of p\partial_p.

Before I say any more, let’s have some examples.

  • In the real case, the uniform distribution u n=(1/n,,1/n)u_n = (1/n, \ldots, 1/n) has entropy logn\log n. Mod pp, this distribution only makes sense if pp does not divide nn (otherwise 1/n1/n is undefined); but assuming that, we do indeed have H p(u n)=q p(n)H_p(u_n) = q_p(n), as we’d expect.

  • When we take our prime pp to be 22, a probability distribution π\pi is just a sequence of bits like (0,0,1,0,1,1,1,0,1)(0, 0, 1, 0, 1, 1, 1, 0, 1) with an odd number of 11s. Its entropy H 2(π)/2H_2(\pi) \in \mathbb{Z}/2\mathbb{Z} turns out to be 00 if the number of 11s is congruent to 11 mod 44, and 11 if the number of 11s is congruent to 33 mod 44.

  • What about distributions on two elements? In other words, let α/p\alpha \in \mathbb{Z}/p\mathbb{Z} and put π=(α,1α)\pi = (\alpha, 1 - \alpha). What is H p(π)H_p(\pi)?

    It takes a bit of algebra to figure this out, but it’s not too hard, and the outcome is that for p2p \neq 2, H p(α,1α)= r=1 p1α rr. H_p(\alpha, 1 - \alpha) = \sum_{r = 1}^{p - 1} \frac{\alpha^r}{r}. This function was, in fact, the starting point of Kontsevich’s note, and it’s what he called the 1121\tfrac{1}{2}-logarithm.

We’ve now succeeded in finding a definition of entropy mod pp that satisfies the chain rule. That’s not quite enough, though. In principle, there could be loads of things satisfying the chain rule, in which case, what special status would ours have?

But in fact, up to the inevitable constant factor, our definition of entropy mod pp is the one and only definition satisfying the chain rule:

Theorem   Let (I:Π n/p)(I: \Pi_n \to \mathbb{Z}/p\mathbb{Z}) be a sequence of functions. Then II satisfies the chain rule if and only if I=cH pI = c H_p for some c/pc \in \mathbb{Z}/p\mathbb{Z}.

This is precisely analogous to the characterization theorem for real entropy, except that in the real case some analytic condition on II has to be imposed (continuity in Faddeev’s theorem, and measurability in the stronger theorem of Lee). So, this is excellent justification for calling H pH_p the entropy mod pp.

I’ll say nothing about the proof except the following. In Faddeev’s theorem over \mathbb{R}, the tricky part of the proof involves the fact that the sequence (logn) n1(\log n)_{n \geq 1} is not uniquely characterized up to a constant factor by the equation log(mn)=logm+logn\log(m n) = \log m + \log n; to make that work, you have to introduce some analytic condition. Over /p\mathbb{Z}/p\mathbb{Z}, the tricky part involves the fact that the domain of the “logarithm” (Fermat quotient) is not /p\mathbb{Z}/p\mathbb{Z}, but /p 2\mathbb{Z}/p^2\mathbb{Z}. So, analytic difficulties are replaced by number-theoretic difficulties.

Kontsevich didn’t actually write down a definition of entropy mod pp in his two-and-a-half page note. He did exactly enough to show that there must be a unique sensible such definition… and left it there! Of course he could have worked it out if he’d wanted to, and maybe he even did, but he didn’t write it up here.

Anyway, let’s return to the quotation from Kontsevich that I began my first post with:

Conclusion: If we have a random variable ξ\xi which takes finitely many values with all probabilities in \mathbb{Q} then we can define not only the transcendental number H(ξ)H(\xi) but also its “residues modulo pp” for almost all primes pp !

In the notation of these posts, he’s saying the following. Let

π=(π 1,,π n) \pi = (\pi_1, \ldots, \pi_n)

be a real probability distribution in which each π i\pi_i is rational. There are only finitely many primes that divide one or more of the denominators of π 1,,π n\pi_1, \ldots, \pi_n. For primes pp not belonging to this exceptional set, we can interpret π\pi as a probability distribution in /p\mathbb{Z}/p\mathbb{Z}. We can therefore take its mod pp entropy, H p(π)H_p(\pi).

Kontsevich is playfully suggesting that we view H p(π)/pH_p(\pi) \in \mathbb{Z}/p\mathbb{Z} as the residue class mod pp of H (π)H_\mathbb{R}(\pi) \in \mathbb{R}.

There is more to this than meets the eye! Different real probability distributions can have the same real entropy, so there’s a question of consistency. Kontsevich’s suggestion only makes sense if

H (π)=H (π)H p(π)=H p(π). H_\mathbb{R}(\pi) = H_\mathbb{R}(\pi') \implies H_p(\pi) = H_p(\pi').

And this is true! I have a proof, though I’m not convinced it’s optimal. Does anyone see an easy argument for this?

Let’s write (p)\mathcal{H}^{(p)} for the set of real numbers of the form H (π)H_\mathbb{R}(\pi), where π\pi is a real probability distribution whose probabilities π i\pi_i can all be expressed as fractions with denominator not divisible by pp. We’ve just seen that there’s a well-defined map

[.]: (p)/p [.] : \mathcal{H}^{(p)} \to \mathbb{Z}/p\mathbb{Z}

defined by

[H (π)]=H p(π). [H_\mathbb{R}(\pi)] = H_p(\pi).

For x (p)x \in \mathcal{H}^{(p)} \subseteq \mathbb{R}, we view [x][x] as the congruence class mod pp of xx. This notion of “congruence class” even behaves something like the ordinary notion, in the sense that [.][.] preserves addition.

(We can even go a bit further. Accompanying the characterization theorem for entropy mod pp, there is a characterization theorem for information loss mod pp, strictly analogous to the theorem that John Baez, Tobias Fritz and I proved over \mathbb{R}. I won’t review that stuff here, but the point is that an information loss is a difference of entropies, and this enables us to define the congruence class mod pp of the difference of two elements of (p)\mathcal{H}^{(p)}. The same additivity holds.)

There’s just one more thing. In a way, the definition of entropy mod pp is unsatisfactory. In order to define it, we had to step outside the world of /p\mathbb{Z}/p\mathbb{Z} by making arbitrary choices of representing integers, and then we had to show that the definition was independent of those choices. Can’t we do it directly?

In fact, we can. It’s a well-known miracle about finite fields KK that any function KKK \to K is a polynomial. It’s a slightly less well-known miracle that any function K nKK^n \to K, for any n0n \geq 0, is also a polynomial.

Of course, multiple polynomials can induce the same function. For instance, the polynomials x px^p and xx induce the same function /p/p\mathbb{Z}/p\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}. But it’s still possible to make a uniqueness statement. Given a function F:K nKF : K^n \to K, there’s a unique polynomial fK[x 1,,x n]f \in K[x_1, \ldots, x_n] that induces FF and is of degree less than the order of KK in each variable separately.

So, there must be a polynomial representing entropy, of order less than pp in each variable. And as it turns out, it’s this one:

H p(π 1,,π n)= 0r 1,,r n<p: r 1++r n=pπ 1 r 1π n r nr 1!r n!. H_p(\pi_1, \ldots, \pi_n) = - \sum_{\substack{0 \leq r_1, \ldots, r_n \lt p:\\r_1 + \cdots + r_n = p}} \frac{\pi_1^{r_1} \cdots \pi_n^{r_n}}{r_1! \cdots r_n!}.

You can check that when n=2n = 2, this is in fact the same polynomial r=1 p1π 1 r/r\sum_{r = 1}^{p - 1} \pi_1^r/r as we met before — Kontsevich’s 1121\tfrac{1}{2}-logarithm.

It’s striking that this direct formula for entropy modulo a prime looks quite unlike the formula for real entropy,

H (π)= i:π i0π ilogπ i. H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i.

It’s also striking that in the case n=2n = 2, the formula for real entropy is

H (α,1α)=αlogα(1α)log(1α), H_\mathbb{R}(\alpha, 1 - \alpha) = - \alpha \log \alpha - (1 - \alpha) \log(1 - \alpha),

whereas mod pp, we get

H p(α,1α)= r=1 p1α rr, H_p(\alpha, 1 - \alpha) = \sum_{r = 1}^{p - 1} \frac{\alpha^r}{r},

which is a truncation of the Taylor series of log(1α)-\log(1 - \alpha). And yet, the characterization theorems for entropy over \mathbb{R} and over /p\mathbb{Z}/p\mathbb{Z} are strictly analogous.

As I see it, there are two or three big open questions:

  • Entropy over \mathbb{R} can be understood, interpreted and applied in many ways. How can we understand, interpret or apply entropy mod pp?

  • Entropy over \mathbb{R} and entropy mod pp are defined in roughly analogous ways, and uniquely characterized by strictly analogous theorems. Is there a common generalization? That is, can we unify the two definitions and characterization theorems, perhaps proving a theorem about entropy over suitable fields?

Posted at December 16, 2017 4:53 PM UTC

TrackBack URL for this Entry:

19 Comments & 0 Trackbacks

Re: Entropy Modulo a Prime (Continued)

Typo: “definitin”.

Posted by: Blake Stacey on December 17, 2017 8:37 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Fixed. Thanks.

Posted by: Tom Leinster on December 17, 2017 9:10 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

I wonder, does this definition of entropy mod pp furnish a definition of mutual information akin to that familiar from the Shannon theory? Given two random variables XX and YY to which we can ascribe a joint probability π XY\pi_{X Y} with marginals π X\pi_X and π Y\pi_Y, can we write I p(X;Y)=H p(π X)+H p(π Y)H p(π XY)I_p(X;Y) = H_p(\pi_X) + H_p(\pi_Y) - H_p(\pi_{X Y}) and have it behave sensibly?

Posted by: Blake Stacey on December 17, 2017 11:23 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Good question!

What would count as behaving “sensibly”? All I can think of for now are symmetry (which is trivial) and various inequalities (which don’t make sense). What are the other important properties of mutual information?

Posted by: Tom Leinster on December 17, 2017 11:36 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

To begin to answer my own question: in the classical Shannon-type setting, mutual information can be expressed in terms of relative entropy in a couple of ways. Writing H()H(-\|-) for relative entropy,

I(X;Y)= yPr(y)H((X|y)X) I(X; Y) = \sum_y Pr(y) H\bigl( (X|y) \| X \bigr)

and also

I(X;Y)=H((X,Y)XY). I(X; Y) = H\bigl( (X, Y) \| X \otimes Y \bigr).

Here I’ve written everything in terms of random variables rather than distributions; my (X,Y)(X, Y) is your π XY\pi_{X Y}, and my XYX \otimes Y is the independent coupling of XX and YY.

It would be nice if the same equations held mod pp, with your definition of mutual information. But first we need a definition of relative entropy mod pp!

I spent a little while last week trying to find the right definition of relative entropy mod pp, without success. But these equations might provide a way forward.

Posted by: Tom Leinster on December 17, 2017 11:46 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Defining “sensibly” in this context is indeed tricky, since the properties of joint and mutual information that I’m used to leaning on are pretty much all inequalities that don’t make sense here!

The polynomial representation looks rather combinatorial, like a weighted counting of coincidences in repeated draws. But I haven’t yet been able to extract meaning from that.

Posted by: Blake Stacey on December 18, 2017 6:44 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Typo: around the middle of the page, when you quote Kontsevich’s conclusion, you mean to say “with all probabilities in \mathbb{Q}” instead of “with all probabilities in \mathbb{R}”.

Posted by: Martin on December 31, 2017 8:35 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Oops! That’s a bad one. Fixed now. Thanks!

Posted by: Tom Leinster on January 1, 2018 1:13 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Very interesting. This discussion seems closely related to Connes’ The Witt construction in characteristic one and Quantization; I don’t understand it yet, but Connes uses entropy to construct a “characteristic one” version of Witt vectors.

Posted by: Qiaochu Yuan on January 4, 2018 9:00 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Very interesting.


I was sort of depressed about the lack of response to these posts, because I think this story is both really neat (for what we do know) and really intriguing (for what we don’t). The credit for this must ultimately go to Kontsevich.

But later I realized a probable reason for the lack of response: that I didn’t give any way of understanding or interpreting entropy mod pp. And there’s a simple reason: I don’t have one!

Entropy mod pp is one of those things that I can only understand by analogy — in this case, the analogy with real entropy. Connes seems to be steering by analogy too, judging by the title of his section 6. But I’d love to have a direct interpretation of entropy mod pp.

Can you say any more about the relationship you see between Connes’s work and my posts? Browsing his paper for a few minutes, I can see some familiar-looking formulas, but nothing deeper.

Posted by: Tom Leinster on January 4, 2018 9:42 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Interesting stuff, Tom!

You have a connection between entropy mod pp and pp-derivations, which are closely related to Witt vectors, and hence a connection between entropy mod pp and Witt vectors. This reminds me a connection I know about between the ‘big’ Witt vectors, which somehow combine the usual Witt vectors for all primes simultaneously, and the entropy function.

This might be too big for a blog comment, but let me give it a shot.

First, the connection between pp-derivations and the usual Witt vector functor W (p)W^{(p)} with respect to a prime pp: (The pro’s usually call it the `pp-typical’ Witt vector functor, by the way.) It’s the right adjoint of the forgetful functor from the category of rings (commutative) equipped with a pp-derivation to the category of rings. This observation was first made by Andre Joyal, I believe. It could have been made before him, in that it follows from facts known at the time using a bit of category theory, but to the best of my knowledge it was not. It’s perhaps nice to compare it to the fact that the right adjoint of the forgetful functor from rings with an ordinary derivation to rings is given by the divided power series functor sending RR to the set of series na nt nn!\sum_n a_n \frac{t^n}{n!}, where a nRa_n\in R and t nn!\frac{t^n}{n!} is a formal symbol and the ring operations and derivation are what you’d expect.

Second, the big Witt vectors. The big Witt vector functor WW can be described in a few ways. One, it’s the right adjoint of the forgetful functor from lambda-rings to rings. This is analogous to the definition above of the pp-typical Witt vector functor, but the connection is tighter than that. This is because a lambda-ring has canonical family of pp-derivations δ p\delta_p for each prime pp. (Formally, δ p(x)=(ψ p(x)x p)/p\delta_p(x)=(\psi_p(x)-x^p)/p, where ψ p\psi_p is the pp-th Adams operation. So δ p\delta_p is a `witness’ to the Frobenius lift property of the Adams operation. More properly, δ p\delta_p is the operation associated to the symmetric function ((x 1 p+x 2 p+)(x 1+x 2+) p)/p((x_1^p+x_2^p+\cdots)-(x_1+x_2+\cdots)^p)/p.) Even further, the δ p\delta_p for varying pp satisfy certain commutation relations. They are what you get by rewriting the equation ψ pψ q=ψ qψ p\psi_p\circ\psi_q=\psi_q\circ\psi_p in terms of the δ\delta’s and solving for δ pδ q\delta_p\circ\delta_q. All the denominators will magically disappear. So a lambda-ring has commuting endomorphisms ψ p\psi_p, one for each prime pp, each of which reduces to the Frobenius map modulo its prime. (In fact, if you incant the right spells from category theory and the words ‘torsion free’, this can serve as another definition of lambda-ring.)

In particular, each big Witt vector ring W(R)W(R) comes with canonical commuting endomorphisms ψ p\psi_p, indexed by the primes, such that ψ p\psi_p reduces to the Frobenius map xx px\mapsto x^p in W(R)/pW(R)W(R)/ p W(R).

Third, we want to consider the big Witt vectors of \mathbb{R}. Actually, that’s too simple. It’s just the product ring {1,2,3,}\mathbb{R}^{\{1,2,3,\dots\}}, where each operator ψ p\psi_p acts through the exponent by scaling by pp. What we really want to consider is W( +)W(\mathbb{R}_+), the big Witt vectors of the semiring (or ‘rig’) +\mathbb{R}_+ of reals x0x \geq 0. Let me not get into the definition of this. (You can find the definition in my paper “Witt vectors, semirings, and total positivity”.) Instead, please just take my word that W( +)W(\mathbb{R}_+) is a sub-semiring of {1,2,3,}\mathbb{R}^{\{1,2,3,\dots\}} defined by certain complicated inequalities and it is stable under the endomorphisms ψ p\psi_p.

Now for the amazing thing: The family of endomorphisms ψ p\psi_p of W( +)W(\mathbb{R}_+), which is a priori defined only for prime numbers pp, can be naturally extended to a family of endomorphisms for all real p>1p &gt; 1 !! This is a consequence of the theory of `total positivity’ surrounding the Edrei-Thoma theorem.

Now let’s interpret this geometrically by taking Spec\mathrm{Spec} of everything. What I really mean is to just pass to the opposite category where we can use the language of geometry. Then Spec(W( +))\mathrm{Spec}(W(\mathbb{R}_+)) is some kind of space which has commuting endomorphisms ψ p\psi_p, for pp prime, which interpolate to real pp > 11. We might think of this as a flow parameterized by multiplicative time pp.

Now, this is very reminiscent of Deninger’s picture to prove the Riemann hypothesis! In it, he likes to look at the infinitesimal generator of the flow. If we do this here, the entropy function will come in.

How does this work? Up to some topological closure, any Witt vector in W( +)W(\mathbb{R}_+) can be written as a countable (or finite) sum n[a n]\sum_{n} [a_n], where a n>0a_n &gt; 0 is real and [a n]W( +)[a_n]\in W(\mathbb{R}_+) denotes the so-called Teichmueller representative of a na_n. (Let’s not worry about what this means, and let’s ignore ay convergence issues.) Then the Frobenius flow ψ p\psi_p, for p>1p &gt; 1 is given by ψ p( n[a n])= n[a n p] \psi_p(\sum_n [a_n]) = \sum_n [a_n^p] Now consider the derivative with respect to log(p)\log(p) (we consider log(p)\log(p) to make our flow additively parameterized). Since [][-] is a multiplicative function, we can write [a p]=[a] p[a^p]=[a]^p, and hence we have δ( n[a n])=lim p1 nd([a n] p)dlog(p)= n[a n]log[a n] \delta(\sum_n [a_n]) = \lim_{p\to 1} \sum_n \frac{d([a_n]^p)}{d\log(p)} = \sum_n[a_n]\log[a_n] And this is some Witt vector version of the entropy function!

I was really excited when I noticed this a few years ago. The main reason was the formal similarity to Deninger’s picture, but I haven’t been able to make anything of real substance out of it. Another reason is that Connes also has a connection between Witt vectors, semirings, and entropy, but this I haven’t really looked into.

Do you sense any connection to your post? Whereas I let pp tend to 11 in \mathbb{R}, you look close to pp in p\mathbb{Q}_p.

Posted by: James Borger on January 5, 2018 5:00 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Thanks for this excellent and detailed comment! Among other things, it reminded me that I saw Joyal talk about Witt vectors at Category Theory 2015 in Aveiro, Portugal. I remember enjoying it very much and thinking about it for a while afterwards. Looking at my notes now, I don’t see the punchline stated as clearly as you put it in your comment. I’m sure he was talking about the fact you mention; I just can’t have got the point straight in my head during his lecture.

Among other things, my notes say that if pp is invertible in a (commutative) ring AA than W (p)(A)W^{(p)}(A) is simply the product ring A {1,2,3,}A^{\{1, 2, 3, \ldots\}}. Is that right?

I’ll have to spend a while thinking about the rest of your post, and get back to you with more questions.

Posted by: Tom Leinster on January 6, 2018 12:31 AM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

You wrote, “if pp is invertible in a (commutative) ring AA than W (p)(A)W^{(p)}(A) is simply the product ring A {1,2,3,}A^{\{1,2,3,\dots\}}. Is that right?”

Yes, that’s true, although I’d prefer to write the exponent as the monoid ={0,1,2,}\mathbb{N}=\{0,1,2,\dots\}, or perhaps {1,p,p 2,}\{1,p,p^2,\dots\}. :)

There should be a pretty formal (i.e. category theoretic) proof. Rather than trying to cook one up, maybe I’ll just say that it should follow from two key facts: (i) giving a pp-derivation on a [1/p]\mathbb{Z}[1/p]-algebra is equivalent to giving an endomorphism (δ\delta corresponds to xx p+pδ(x)x\mapsto x^p+p\delta(x)) and hence an action of \mathbb{N}, and (ii) the forgetful functor from rings with pp-derivation to rings is both monadic and comonadic.

Posted by: James Borger on January 6, 2018 3:31 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

That’s funny — I had A A^\mathbb{N} in my notes from Joyal’s talk (with 00 \in \mathbb{N}), but I saw {1,2,}\mathbb{R}^{\{1, 2, \ldots\}} later in your post, so I thought I was conforming to your preferred convention by starting from 11. Obviously not! Anyway, glad that small point’s cleared up.

Posted by: Tom Leinster on January 6, 2018 4:24 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Also, here’s something about pp-adic polylogarithms. I don’t think it’s related to the main point of your posts, but there’s no harm in putting it here.

pp-adic analogues of polylogarithms have been studied quite a bit in the theory of ‘mixed Tate motives’. In fact, there are two important pp-adic analogues of the polylogarithms. The first is the usual jj-th polylogarithm:

Li j(x)= kx kk jLi_j(x)=\sum_k \frac{x^k}{k^j}

This will converge if xx is pp-adically sufficiently close to 00 but not in general because there are pp’s in the denominators of the coefficients. A closely related point is that you can’t reduce the series modulo pp. The simplest way of getting around this is just to drop the bad terms and define

Li j (p)(x)= pkx kk jLi^{(p)}_j(x)=\sum_{p\nmid k} \frac{x^k}{k^j}

Surprisingly, this series is also important, and it’s the second of the two analogues. There are no problems reducing this new series modulo pp, and it has the following relation to Kontsevich’s function: Li j (p)(x) k=1 p1x kk j11x pmodp, Li^{(p)}_j(x) \equiv \sum_{k=1}^{p-1} \frac{x^k}{k^j} \cdot \frac{1}{1-x^p} \mod p, which is why I brought it up.

How do these two series come up in geometry? Briefly, consider the variety obtained by first puncturing the projective line at 00 and \infty, and then gluing the points 11 and xx together transversely. Then the space of complex points of this variety is homotopy equivalent to a circle with two point glued together. So its H 1H^1 of has rank 22, and it is naturally an extension of H 1H^1 of the twice punctured projective line by H 0H^0 of the point. Such an extension is split if you view the cohomology as being a vector space or an abelian group. But in algebraic geometry, cohomology takes values in richer categories, such as those of Hodge structures or Galois representations, and these categories are not semi-simple. It turns out that in these categories the extension class of this H 1H^1 is given by the logarithm of xx, where the meaning of the logarithm depends on the cohomology theory. For mixed Hodge structures, the extension class is the usual log(x)\log(x). But let’s consider pp-adic cohomology theories instead. These involve both pp-adic differential equations and Frobenius-twisted linear algebra. If you focus on the differential equations, the usual pp-adic logarithm Li 1(x)Li_1(x) comes up, but if you focus on the Frobenius linear algebra, the second one Li 1 (p)(x)Li^{(p)}_1(x) naturally comes up. If you want to impress people, you could call the package of both of them the pp-adic realization of the motivic logarithm. The usual complex logarithm would be the Hodge realization.

For the polylogarithms, there is a similar but more complicated story. I’m not sure what relation there is to entropy mod pp, which was the main point of your posts. Maybe this would be clearer if I understood the relation between the usual logarithm and entropy at a deeper level.

Posted by: James Borger on January 5, 2018 5:14 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Sorry for the following trivial observations, but I couldn’t resist. :-)

In fact, we can. It’s a well-known miracle about finite fields KK that any function KKK \to K is a polynomial.

I’m guessing “miracle” is pretty facetious! In any case, my immediate thought here was Lagrange interpolation.

It’s a slightly less well-known miracle that any function K nKK^n \to K, for any n0n \geq 0, is also a polynomial.

I didn’t consciously know that, but here’s a cheap way of seeing it for those who like explicit formulas: if KK has qq elements, then the polynomial

p(x 1,,x n)= j=1 n(1x j q1)p(x_1, \ldots, x_n) = \prod_{j=1}^n (1 - x_j^{q-1})

has the property that p(0,0,,0)=1p(0, 0, \ldots, 0) = 1 and p(a 1,,a n)=0p(a_1, \ldots, a_n) = 0 whenever (a 1,,a n)(0,,0)(a_1, \ldots, a_n) \neq (0, \ldots, 0). Thus if the collection

(a i1,,a in)(a_{i 1}, \ldots, a_{i n})

for i=1,,q ni = 1, \ldots, q^n lists all q nq^n distinct points of K nK^n, then an arbitrary function f:K nKf: K^n \to K will be represented by the polynomial

i=1 q nf(a i1,,a in)p(x 1a i1,x na in).\sum_{i=1}^{q^n} f(a_{i 1}, \ldots, a_{i n}) \cdot p(x_1 - a_{i 1}, \ldots x_n - a_{i n}).

(More generally, I believe there are multivariable forms of Lagrange interpolation which work over any field KK, but I’d rather not embark on details at the moment. Roughly, I guess that if you are given distinct points a iK na_i \in K^n for i=1,,Mi = 1, \ldots, M, then vectors built out of entries of type j=1 na ij k\prod_{j = 1}^n a_{i j}^k can be arranged to be linearly independent so long as the upper bound of the exponents kk is taken to be sufficiently high. Then combine that with various tricks of Cramer’s rule type. At least that’s the idea I extracted looking at this.)

Posted by: Todd Trimble on January 6, 2018 7:59 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

That’s a nice short proof! I knew there existed a Lagrange interpolation proof that every nn-ary operation on a finite field is a polynomial, but I’d never actually gone through it. It’s simpler than I’d anticipated.

I’m guessing “miracle” is pretty facetious!

No, I was being sincere. Of course it’s easy to prove. But if one’s intuition about functions and polynomials on fields is based on \mathbb{R} (or indeed \mathbb{C}), then it does seem miraculous. It’s as if every function \mathbb{R} \to \mathbb{R} were a polynomial.

It’s like the way that when you do Fourier analysis for finite abelian groups, all the analytic complications present in the real cse (regularity conditions, approximations to the identity, etc.) vanish in a puff of smoke. Every function on a finite abelian group is a trigonometric polynomial — it’s that simple.

If I remember my undergraduate schedule correctly, I took Galois theory at about the same time I took numerical analysis. In one course I learned that every function from a finite field to itself is a polynomial, and in the other I learned about Lagrange interpolation. But the proof of the fact about fields was not the Lagrange one; it was the case n=1n = 1 of the proof I give below. I don’t think I learned the Lagrange proof until much later. Both courses were optional, so neither lecturer could assume that students in their audience was attending the other course.

Here’s the proof I had in mind that for a finite field KK and n0n \geq 0, every function K nKK^n \to K can be represented by a polynomial. In fact, it’s a proof that every function K nKK^n \to K can be uniquely represented by a polynomial of degree <q\lt q in each variable separately, where qq is the order of KK.

Let A nA_n denote the additive group of polynomials in K[x 1,,x n]K[x_1, \ldots, x_n] of degree <q\lt q in each variable separately. There’s a canonical function

R:A nSet(K n,K) R : A_n \to \mathbf{Set}(K^n, K)

that assigns to each polynomial the function that it represents. The claim is that RR is a bijection. The domain and codomain each have q q nq^{q^n} elements, so it suffices to prove that RR is injective. Now Set(K n,K)\mathbf{Set}(K^n, K) is a group under pointwise addition, and RR is a homomorphism, so it suffices to prove that kerR\ker R is trivial.

In other words, we just have to prove that any polynomial of degree <q\lt q in each of the variables x 1,,x nx_1, \ldots, x_n separately that vanishes everywhere must, in fact, be the zero polynomial. This is an easy induction on nn.

This is longer than your Lagrange proof, but it proves a bit more (namely, the uniqueness of the representing polynomial subject to it being of small enough degree).

Does anyone know anywhere in the literature where there’s a proof of the theorem I just proved? I spent a while looking, and found plenty of places where (1) the case n=1n = 1 is proved, or (2) the author asserts that the general case is well-known, but nowhere where the general case is actually proved. I do believe it is well-known, but I had a curiously hard time tracking it down.

Posted by: Tom Leinster on January 6, 2018 9:26 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

Sweet! I like that proof. But I think we can use the same idea as applied to the proof I gave: the polynomial I wrote down manifestly has the property that the degree in each variable is less than qq. This shows that the map A nSet(K n,K)A_n \to \mathbf{Set}(K^n, K) is surjective. A surjection between two finite sets of the same cardinality is a bijection.

Posted by: Todd Trimble on January 6, 2018 10:21 PM | Permalink | Reply to this

Re: Entropy Modulo a Prime (Continued)

I concede, that’s a shorter proof!

So, however much or little I manage to teach myself about Witt vectors in what remains of the evening, I can tell myself that I’ve learned a nice new piece of mathematics today.

Posted by: Tom Leinster on January 6, 2018 10:36 PM | Permalink | Reply to this

Post a New Comment