## December 16, 2017

### Entropy Modulo a Prime (Continued)

#### Posted by Tom Leinster

In the comments last time, a conversation got going about $p$-adic entropy. But here I’ll return to the original subject: entropy modulo $p$. I’ll answer the question:

Given a “probability distribution” mod $p$, that is, a tuple $\pi = (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n$ summing to $1$, what is the right definition of its entropy $H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}?$

How will we know when we’ve got the right definition? As I explained last time, the acid test is whether it satisfies the chain rule

$H_p(\gamma \circ (\pi^1, \ldots, \pi^n)) = H_p(\gamma) + \sum_{i = 1}^n \gamma_i H_p(\pi^i).$

This is supposed to hold for all $\gamma = (\gamma_1, \ldots, \gamma_n) \in \Pi_n$ and $\pi^i = (\pi^i_1, \ldots, \pi^i_{k_i}) \in \Pi_{k_i}$, where $\Pi_n$ is the hyperplane

$\Pi_n = \{ (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n : \pi_1 + \cdots + \pi_n = 1\},$

whose elements we’re calling “probability distributions” mod $p$. And if God is smiling on us, $H_p$ will be essentially the only quantity that satisfies the chain rule. Then we’ll know we’ve got the right definition.

Black belts in functional equations will be able to use the chain rule and nothing else to work out what $H_p$ must be. But the rest of us might like an extra clue, and we have one in the definition of real Shannon entropy:

$H_\mathbb{R}(\pi) = - \sum_{i: \pi_i \neq 0} \pi_i \log \pi_i.$

Now, we saw last time that there is no logarithm mod $p$; that is, there is no group homomorphism

$(\mathbb{Z}/p\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}.$

But there is a next-best thing: a homomorphism

$(\mathbb{Z}/p^2\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}.$

This is called the Fermat quotient $q_p$, and it’s given by

$q_p(n) = \frac{n^{p - 1} - 1}{p} \in \mathbb{Z}/p\mathbb{Z}.$

Let’s go through why this works.

The elements of $\mathbb{Z}/p^2\mathbb{Z}$ are the congruence classes mod $p^2$ of the integers not divisible by $p$. Fermat’s little theorem says that whenever $n$ is not divisible by $p$,

$\frac{n^{p - 1} - 1}{p}$

is an integer. This, or rather its congruence class mod $p$, is the Fermat quotient. The congruence class of $n$ mod $p^2$ determines the congruence class of $n^{p - 1} - 1$ mod $p^2$, and it therefore determines the congruence class of $(n^{p - 1} - 1)/p$ mod $p$. So, $q_p$ defines a function $(\mathbb{Z}/p^2\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}$. It’s a pleasant exercise to show that it’s a homomorphism. In other words, $q_p$ has the log-like property

$q_p(m n) = q_p(m) + q_p(n)$

for all integers $m, n$ not divisible by $p$.

In fact, it’s essentially unique as such. Any other homomorphism $(\mathbb{Z}/p^2\mathbb{Z})^\times \to \mathbb{Z}/p\mathbb{Z}$ is a scalar multiple of $q_p$. (This follows from the classical theorem that the group $(\mathbb{Z}/p^2\mathbb{Z})^\times$ is cyclic.) It’s just like the fact that up to a scalar multiple, the real logarithm is the unique measurable function $\log : (0, \infty) \to \R$ such that $\log(x y) = \log x + \log y$, but here there’s nothing like measurability complicating things.

So: $q_p$ functions as a kind of logarithm. Given a mod $p$ probability distribution $\pi = \in \Pi_n$, we might therefore guess that the right definition of its entropy is

$- \sum_{i : \pi_i \neq 0} \pi_i q_p(a_i),$

where $a_i$ is an integer representing $\pi_i \in \mathbb{Z}/p\mathbb{Z}$.

However, this doesn’t work. It depends on the choice of representatives $a_i$.

To get the right answer, we’ll look at real entropy in a slightly different way. Define $\partial_\mathbb{R}: [0, 1] \to \mathbb{R}$ by

$\partial_\mathbb{R}(x) = \begin{cases} - x \log x &if  x \neq 0, \\ 0 &if  x = 0. \end{cases}.$

Then $\partial_\mathbb{R}$ has the derivative-like property

$\partial_\mathbb{R}(x y) = x \partial_\mathbb{R}(y) + \partial_\mathbb{R}(x) y.$

A linear map with this property is called a derivation, so it’s reasonable to call $\partial_\mathbb{R}$ a nonlinear derivation.

The observation that $\partial_\mathbb{R}$ is a nonlinear derivation turns out to be quite useful. For instance, real entropy is given by

$H_\mathbb{R}(\pi) = \sum_{i = 1}^n \partial_\mathbb{R}(\pi_i)$

($\pi \in \Pi_n$), and verifying the chain rule for $H_\mathbb{R}$ is done most neatly using the derivation property of $\partial_\mathbb{R}$.

An equivalent formula for real entropy is

$H_\mathbb{R}(\pi) = \sum_{i = 1}^n \partial_\mathbb{R}(\pi_i) - \partial_\mathbb{R}\biggl( \sum_{i = 1}^n \pi_i \biggr).$

This is a triviality: $\sum \pi_i = 1$, so $\partial_\mathbb{R}\bigl( \sum \pi_i \bigr) = 0$, so this is the same as the previous formula. But it’s also quite suggestive: $H_\mathbb{R}(\pi)$ measures the extent to which the nonlinear derivation $\partial_\mathbb{R}$ fails to preserve the sum $\sum \pi_i$.

Now let’s try to imitate this in $\mathbb{Z}/p\mathbb{Z}$. Since $q_p$ plays a similar role to $\log$, it’s natural to define

$\partial_p(n) = -n q_p(n) = \frac{n - n^p}{p}$

for integers $n$ not divisible by $p$. But the last expression makes sense even if $n$ is divisible by $p$. So, we can define a function

$\partial_p : \mathbb{Z}/p^2\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}$

by $\partial_p(n) = (n - n^p)/p$. (This is called a $p$-derivation.) It’s easy to check that $\partial_p$ has the derivative-like property

$\partial_p(m n) = m \partial_p(n) + \partial_p(m) n.$

And now we arrive at the long-awaited definition. The entropy mod $p$ of $\pi = (\pi_1, \ldots, \pi_n)$ is

$H_p(\pi) = \sum_{i = 1}^n \partial_p(a_i) - \partial_p\biggl( \sum_{i = 1}^n a_i \biggr),$

where $a_i \in \mathbb{Z}$ represents $\pi_i \in \mathbb{Z}/p\mathbb{Z}$. This is independent of the choice of representatives $a_i$. And when you work it out explicitly, it gives

$H_p(\pi) = \frac{1}{p} \biggl( 1 - \sum_{i = 1}^n a_i^p \biggr).$

Just as in the real case, $H_p$ satisfies the chain rule, which is most easily shown using the derivation property of $\partial_p$.

Before I say any more, let’s have some examples.

• In the real case, the uniform distribution $u_n = (1/n, \ldots, 1/n)$ has entropy $\log n$. Mod $p$, this distribution only makes sense if $p$ does not divide $n$ (otherwise $1/n$ is undefined); but assuming that, we do indeed have $H_p(u_n) = q_p(n)$, as we’d expect.

• When we take our prime $p$ to be $2$, a probability distribution $\pi$ is just a sequence of bits like $(0, 0, 1, 0, 1, 1, 1, 0, 1)$ with an odd number of $1$s. Its entropy $H_2(\pi) \in \mathbb{Z}/2\mathbb{Z}$ turns out to be $0$ if the number of $1$s is congruent to $1$ mod $4$, and $1$ if the number of $1$s is congruent to $3$ mod $4$.

• What about distributions on two elements? In other words, let $\alpha \in \mathbb{Z}/p\mathbb{Z}$ and put $\pi = (\alpha, 1 - \alpha)$. What is $H_p(\pi)$?

It takes a bit of algebra to figure this out, but it’s not too hard, and the outcome is that for $p \neq 2$, $H_p(\alpha, 1 - \alpha) = \sum_{r = 1}^{p - 1} \frac{\alpha^r}{r}.$ This function was, in fact, the starting point of Kontsevich’s note, and it’s what he called the $1\tfrac{1}{2}$-logarithm.

We’ve now succeeded in finding a definition of entropy mod $p$ that satisfies the chain rule. That’s not quite enough, though. In principle, there could be loads of things satisfying the chain rule, in which case, what special status would ours have?

But in fact, up to the inevitable constant factor, our definition of entropy mod $p$ is the one and only definition satisfying the chain rule:

Theorem   Let $(I: \Pi_n \to \mathbb{Z}/p\mathbb{Z})$ be a sequence of functions. Then $I$ satisfies the chain rule if and only if $I = c H_p$ for some $c \in \mathbb{Z}/p\mathbb{Z}$.

This is precisely analogous to the characterization theorem for real entropy, except that in the real case some analytic condition on $I$ has to be imposed (continuity in Faddeev’s theorem, and measurability in the stronger theorem of Lee). So, this is excellent justification for calling $H_p$ the entropy mod $p$.

I’ll say nothing about the proof except the following. In Faddeev’s theorem over $\mathbb{R}$, the tricky part of the proof involves the fact that the sequence $(\log n)_{n \geq 1}$ is not uniquely characterized up to a constant factor by the equation $\log(m n) = \log m + \log n$; to make that work, you have to introduce some analytic condition. Over $\mathbb{Z}/p\mathbb{Z}$, the tricky part involves the fact that the domain of the “logarithm” (Fermat quotient) is not $\mathbb{Z}/p\mathbb{Z}$, but $\mathbb{Z}/p^2\mathbb{Z}$. So, analytic difficulties are replaced by number-theoretic difficulties.

Kontsevich didn’t actually write down a definition of entropy mod $p$ in his two-and-a-half page note. He did exactly enough to show that there must be a unique sensible such definition… and left it there! Of course he could have worked it out if he’d wanted to, and maybe he even did, but he didn’t write it up here.

Anyway, let’s return to the quotation from Kontsevich that I began my first post with:

Conclusion: If we have a random variable $\xi$ which takes finitely many values with all probabilities in $\mathbb{Q}$ then we can define not only the transcendental number $H(\xi)$ but also its “residues modulo $p$” for almost all primes $p$ !

In the notation of these posts, he’s saying the following. Let

$\pi = (\pi_1, \ldots, \pi_n)$

be a real probability distribution in which each $\pi_i$ is rational. There are only finitely many primes that divide one or more of the denominators of $\pi_1, \ldots, \pi_n$. For primes $p$ not belonging to this exceptional set, we can interpret $\pi$ as a probability distribution in $\mathbb{Z}/p\mathbb{Z}$. We can therefore take its mod $p$ entropy, $H_p(\pi)$.

Kontsevich is playfully suggesting that we view $H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}$ as the residue class mod $p$ of $H_\mathbb{R}(\pi) \in \mathbb{R}$.

There is more to this than meets the eye! Different real probability distributions can have the same real entropy, so there’s a question of consistency. Kontsevich’s suggestion only makes sense if

$H_\mathbb{R}(\pi) = H_\mathbb{R}(\pi') \implies H_p(\pi) = H_p(\pi').$

And this is true! I have a proof, though I’m not convinced it’s optimal. Does anyone see an easy argument for this?

Let’s write $\mathcal{H}^{(p)}$ for the set of real numbers of the form $H_\mathbb{R}(\pi)$, where $\pi$ is a real probability distribution whose probabilities $\pi_i$ can all be expressed as fractions with denominator not divisible by $p$. We’ve just seen that there’s a well-defined map

$[.] : \mathcal{H}^{(p)} \to \mathbb{Z}/p\mathbb{Z}$

defined by

$[H_\mathbb{R}(\pi)] = H_p(\pi).$

For $x \in \mathcal{H}^{(p)} \subseteq \mathbb{R}$, we view $[x]$ as the congruence class mod $p$ of $x$. This notion of “congruence class” even behaves something like the ordinary notion, in the sense that $[.]$ preserves addition.

(We can even go a bit further. Accompanying the characterization theorem for entropy mod $p$, there is a characterization theorem for information loss mod $p$, strictly analogous to the theorem that John Baez, Tobias Fritz and I proved over $\mathbb{R}$. I won’t review that stuff here, but the point is that an information loss is a difference of entropies, and this enables us to define the congruence class mod $p$ of the difference of two elements of $\mathcal{H}^{(p)}$. The same additivity holds.)

There’s just one more thing. In a way, the definition of entropy mod $p$ is unsatisfactory. In order to define it, we had to step outside the world of $\mathbb{Z}/p\mathbb{Z}$ by making arbitrary choices of representing integers, and then we had to show that the definition was independent of those choices. Can’t we do it directly?

In fact, we can. It’s a well-known miracle about finite fields $K$ that any function $K \to K$ is a polynomial. It’s a slightly less well-known miracle that any function $K^n \to K$, for any $n \geq 0$, is also a polynomial.

Of course, multiple polynomials can induce the same function. For instance, the polynomials $x^p$ and $x$ induce the same function $\mathbb{Z}/p\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}$. But it’s still possible to make a uniqueness statement. Given a function $F : K^n \to K$, there’s a unique polynomial $f \in K[x_1, \ldots, x_n]$ that induces $F$ and is of degree less than the order of $K$ in each variable separately.

So, there must be a polynomial representing entropy, of order less than $p$ in each variable. And as it turns out, it’s this one:

$H_p(\pi_1, \ldots, \pi_n) = - \sum_{\substack{0 \leq r_1, \ldots, r_n \lt p:\\r_1 + \cdots + r_n = p}} \frac{\pi_1^{r_1} \cdots \pi_n^{r_n}}{r_1! \cdots r_n!}.$

You can check that when $n = 2$, this is in fact the same polynomial $\sum_{r = 1}^{p - 1} \pi_1^r/r$ as we met before — Kontsevich’s $1\tfrac{1}{2}$-logarithm.

It’s striking that this direct formula for entropy modulo a prime looks quite unlike the formula for real entropy,

$H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i.$

It’s also striking that in the case $n = 2$, the formula for real entropy is

$H_\mathbb{R}(\alpha, 1 - \alpha) = - \alpha \log \alpha - (1 - \alpha) \log(1 - \alpha),$

whereas mod $p$, we get

$H_p(\alpha, 1 - \alpha) = \sum_{r = 1}^{p - 1} \frac{\alpha^r}{r},$

which is a truncation of the Taylor series of $-\log(1 - \alpha)$. And yet, the characterization theorems for entropy over $\mathbb{R}$ and over $\mathbb{Z}/p\mathbb{Z}$ are strictly analogous.

As I see it, there are two or three big open questions:

• Entropy over $\mathbb{R}$ can be understood, interpreted and applied in many ways. How can we understand, interpret or apply entropy mod $p$?

• Entropy over $\mathbb{R}$ and entropy mod $p$ are defined in roughly analogous ways, and uniquely characterized by strictly analogous theorems. Is there a common generalization? That is, can we unify the two definitions and characterization theorems, perhaps proving a theorem about entropy over suitable fields?

Posted at December 16, 2017 4:53 PM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3003

### Re: Entropy Modulo a Prime (Continued)

Typo: “definitin”.

Posted by: Blake Stacey on December 17, 2017 8:37 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Fixed. Thanks.

Posted by: Tom Leinster on December 17, 2017 9:10 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

I wonder, does this definition of entropy mod $p$ furnish a definition of mutual information akin to that familiar from the Shannon theory? Given two random variables $X$ and $Y$ to which we can ascribe a joint probability $\pi_{X Y}$ with marginals $\pi_X$ and $\pi_Y$, can we write $I_p(X;Y) = H_p(\pi_X) + H_p(\pi_Y) - H_p(\pi_{X Y})$ and have it behave sensibly?

Posted by: Blake Stacey on December 17, 2017 11:23 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Good question!

What would count as behaving “sensibly”? All I can think of for now are symmetry (which is trivial) and various inequalities (which don’t make sense). What are the other important properties of mutual information?

Posted by: Tom Leinster on December 17, 2017 11:36 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

To begin to answer my own question: in the classical Shannon-type setting, mutual information can be expressed in terms of relative entropy in a couple of ways. Writing $H(-\|-)$ for relative entropy,

$I(X; Y) = \sum_y Pr(y) H\bigl( (X|y) \| X \bigr)$

and also

$I(X; Y) = H\bigl( (X, Y) \| X \otimes Y \bigr).$

Here I’ve written everything in terms of random variables rather than distributions; my $(X, Y)$ is your $\pi_{X Y}$, and my $X \otimes Y$ is the independent coupling of $X$ and $Y$.

It would be nice if the same equations held mod $p$, with your definition of mutual information. But first we need a definition of relative entropy mod $p$!

I spent a little while last week trying to find the right definition of relative entropy mod $p$, without success. But these equations might provide a way forward.

Posted by: Tom Leinster on December 17, 2017 11:46 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Defining “sensibly” in this context is indeed tricky, since the properties of joint and mutual information that I’m used to leaning on are pretty much all inequalities that don’t make sense here!

The polynomial representation looks rather combinatorial, like a weighted counting of coincidences in repeated draws. But I haven’t yet been able to extract meaning from that.

Posted by: Blake Stacey on December 18, 2017 6:44 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Typo: around the middle of the page, when you quote Kontsevich’s conclusion, you mean to say “with all probabilities in $\mathbb{Q}$” instead of “with all probabilities in $\mathbb{R}$”.

Posted by: Martin on December 31, 2017 8:35 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Oops! That’s a bad one. Fixed now. Thanks!

Posted by: Tom Leinster on January 1, 2018 1:13 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Very interesting. This discussion seems closely related to Connes’ The Witt construction in characteristic one and Quantization; I don’t understand it yet, but Connes uses entropy to construct a “characteristic one” version of Witt vectors.

Posted by: Qiaochu Yuan on January 4, 2018 9:00 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Very interesting.

Thanks!

I was sort of depressed about the lack of response to these posts, because I think this story is both really neat (for what we do know) and really intriguing (for what we don’t). The credit for this must ultimately go to Kontsevich.

But later I realized a probable reason for the lack of response: that I didn’t give any way of understanding or interpreting entropy mod $p$. And there’s a simple reason: I don’t have one!

Entropy mod $p$ is one of those things that I can only understand by analogy — in this case, the analogy with real entropy. Connes seems to be steering by analogy too, judging by the title of his section 6. But I’d love to have a direct interpretation of entropy mod $p$.

Can you say any more about the relationship you see between Connes’s work and my posts? Browsing his paper for a few minutes, I can see some familiar-looking formulas, but nothing deeper.

Posted by: Tom Leinster on January 4, 2018 9:42 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Interesting stuff, Tom!

You have a connection between entropy mod $p$ and $p$-derivations, which are closely related to Witt vectors, and hence a connection between entropy mod $p$ and Witt vectors. This reminds me a connection I know about between the ‘big’ Witt vectors, which somehow combine the usual Witt vectors for all primes simultaneously, and the entropy function.

This might be too big for a blog comment, but let me give it a shot.

First, the connection between $p$-derivations and the usual Witt vector functor $W^{(p)}$ with respect to a prime $p$: (The pro’s usually call it the $p$-typical’ Witt vector functor, by the way.) It’s the right adjoint of the forgetful functor from the category of rings (commutative) equipped with a $p$-derivation to the category of rings. This observation was first made by Andre Joyal, I believe. It could have been made before him, in that it follows from facts known at the time using a bit of category theory, but to the best of my knowledge it was not. It’s perhaps nice to compare it to the fact that the right adjoint of the forgetful functor from rings with an ordinary derivation to rings is given by the divided power series functor sending $R$ to the set of series $\sum_n a_n \frac{t^n}{n!}$, where $a_n\in R$ and $\frac{t^n}{n!}$ is a formal symbol and the ring operations and derivation are what you’d expect.

Second, the big Witt vectors. The big Witt vector functor $W$ can be described in a few ways. One, it’s the right adjoint of the forgetful functor from lambda-rings to rings. This is analogous to the definition above of the $p$-typical Witt vector functor, but the connection is tighter than that. This is because a lambda-ring has canonical family of $p$-derivations $\delta_p$ for each prime $p$. (Formally, $\delta_p(x)=(\psi_p(x)-x^p)/p$, where $\psi_p$ is the $p$-th Adams operation. So $\delta_p$ is a witness’ to the Frobenius lift property of the Adams operation. More properly, $\delta_p$ is the operation associated to the symmetric function $((x_1^p+x_2^p+\cdots)-(x_1+x_2+\cdots)^p)/p$.) Even further, the $\delta_p$ for varying $p$ satisfy certain commutation relations. They are what you get by rewriting the equation $\psi_p\circ\psi_q=\psi_q\circ\psi_p$ in terms of the $\delta$’s and solving for $\delta_p\circ\delta_q$. All the denominators will magically disappear. So a lambda-ring has commuting endomorphisms $\psi_p$, one for each prime $p$, each of which reduces to the Frobenius map modulo its prime. (In fact, if you incant the right spells from category theory and the words ‘torsion free’, this can serve as another definition of lambda-ring.)

In particular, each big Witt vector ring $W(R)$ comes with canonical commuting endomorphisms $\psi_p$, indexed by the primes, such that $\psi_p$ reduces to the Frobenius map $x\mapsto x^p$ in $W(R)/ p W(R)$.

Third, we want to consider the big Witt vectors of $\mathbb{R}$. Actually, that’s too simple. It’s just the product ring $\mathbb{R}^{\{1,2,3,\dots\}}$, where each operator $\psi_p$ acts through the exponent by scaling by $p$. What we really want to consider is $W(\mathbb{R}_+)$, the big Witt vectors of the semiring (or ‘rig’) $\mathbb{R}_+$ of reals $x \geq 0$. Let me not get into the definition of this. (You can find the definition in my paper “Witt vectors, semirings, and total positivity”.) Instead, please just take my word that $W(\mathbb{R}_+)$ is a sub-semiring of $\mathbb{R}^{\{1,2,3,\dots\}}$ defined by certain complicated inequalities and it is stable under the endomorphisms $\psi_p$.

Now for the amazing thing: The family of endomorphisms $\psi_p$ of $W(\mathbb{R}_+)$, which is a priori defined only for prime numbers $p$, can be naturally extended to a family of endomorphisms for all real $p > 1$ !! This is a consequence of the theory of `total positivity’ surrounding the Edrei-Thoma theorem.

Now let’s interpret this geometrically by taking $\mathrm{Spec}$ of everything. What I really mean is to just pass to the opposite category where we can use the language of geometry. Then $\mathrm{Spec}(W(\mathbb{R}_+))$ is some kind of space which has commuting endomorphisms $\psi_p$, for $p$ prime, which interpolate to real $p$ > $1$. We might think of this as a flow parameterized by multiplicative time $p$.

Now, this is very reminiscent of Deninger’s picture to prove the Riemann hypothesis! In it, he likes to look at the infinitesimal generator of the flow. If we do this here, the entropy function will come in.

How does this work? Up to some topological closure, any Witt vector in $W(\mathbb{R}_+)$ can be written as a countable (or finite) sum $\sum_{n} [a_n]$, where $a_n > 0$ is real and $[a_n]\in W(\mathbb{R}_+)$ denotes the so-called Teichmueller representative of $a_n$. (Let’s not worry about what this means, and let’s ignore ay convergence issues.) Then the Frobenius flow $\psi_p$, for $p > 1$ is given by $\psi_p(\sum_n [a_n]) = \sum_n [a_n^p]$ Now consider the derivative with respect to $\log(p)$ (we consider $\log(p)$ to make our flow additively parameterized). Since $[-]$ is a multiplicative function, we can write $[a^p]=[a]^p$, and hence we have $\delta(\sum_n [a_n]) = \lim_{p\to 1} \sum_n \frac{d([a_n]^p)}{d\log(p)} = \sum_n[a_n]\log[a_n]$ And this is some Witt vector version of the entropy function!

I was really excited when I noticed this a few years ago. The main reason was the formal similarity to Deninger’s picture, but I haven’t been able to make anything of real substance out of it. Another reason is that Connes also has a connection between Witt vectors, semirings, and entropy, but this I haven’t really looked into.

Do you sense any connection to your post? Whereas I let $p$ tend to $1$ in $\mathbb{R}$, you look close to $p$ in $\mathbb{Q}_p$.

Posted by: James Borger on January 5, 2018 5:00 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Thanks for this excellent and detailed comment! Among other things, it reminded me that I saw Joyal talk about Witt vectors at Category Theory 2015 in Aveiro, Portugal. I remember enjoying it very much and thinking about it for a while afterwards. Looking at my notes now, I don’t see the punchline stated as clearly as you put it in your comment. I’m sure he was talking about the fact you mention; I just can’t have got the point straight in my head during his lecture.

Among other things, my notes say that if $p$ is invertible in a (commutative) ring $A$ than $W^{(p)}(A)$ is simply the product ring $A^{\{1, 2, 3, \ldots\}}$. Is that right?

I’ll have to spend a while thinking about the rest of your post, and get back to you with more questions.

Posted by: Tom Leinster on January 6, 2018 12:31 AM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

You wrote, “if $p$ is invertible in a (commutative) ring $A$ than $W^{(p)}(A)$ is simply the product ring $A^{\{1,2,3,\dots\}}$. Is that right?”

Yes, that’s true, although I’d prefer to write the exponent as the monoid $\mathbb{N}=\{0,1,2,\dots\}$, or perhaps $\{1,p,p^2,\dots\}$. :)

There should be a pretty formal (i.e. category theoretic) proof. Rather than trying to cook one up, maybe I’ll just say that it should follow from two key facts: (i) giving a $p$-derivation on a $\mathbb{Z}[1/p]$-algebra is equivalent to giving an endomorphism ($\delta$ corresponds to $x\mapsto x^p+p\delta(x)$) and hence an action of $\mathbb{N}$, and (ii) the forgetful functor from rings with $p$-derivation to rings is both monadic and comonadic.

Posted by: James Borger on January 6, 2018 3:31 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

That’s funny — I had $A^\mathbb{N}$ in my notes from Joyal’s talk (with $0 \in \mathbb{N}$), but I saw $\mathbb{R}^{\{1, 2, \ldots\}}$ later in your post, so I thought I was conforming to your preferred convention by starting from $1$. Obviously not! Anyway, glad that small point’s cleared up.

Posted by: Tom Leinster on January 6, 2018 4:24 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Also, here’s something about $p$-adic polylogarithms. I don’t think it’s related to the main point of your posts, but there’s no harm in putting it here.

$p$-adic analogues of polylogarithms have been studied quite a bit in the theory of ‘mixed Tate motives’. In fact, there are two important $p$-adic analogues of the polylogarithms. The first is the usual $j$-th polylogarithm:

$Li_j(x)=\sum_k \frac{x^k}{k^j}$

This will converge if $x$ is $p$-adically sufficiently close to $0$ but not in general because there are $p$’s in the denominators of the coefficients. A closely related point is that you can’t reduce the series modulo $p$. The simplest way of getting around this is just to drop the bad terms and define

$Li^{(p)}_j(x)=\sum_{p\nmid k} \frac{x^k}{k^j}$

Surprisingly, this series is also important, and it’s the second of the two analogues. There are no problems reducing this new series modulo $p$, and it has the following relation to Kontsevich’s function: $Li^{(p)}_j(x) \equiv \sum_{k=1}^{p-1} \frac{x^k}{k^j} \cdot \frac{1}{1-x^p} \mod p,$ which is why I brought it up.

How do these two series come up in geometry? Briefly, consider the variety obtained by first puncturing the projective line at $0$ and $\infty$, and then gluing the points $1$ and $x$ together transversely. Then the space of complex points of this variety is homotopy equivalent to a circle with two point glued together. So its $H^1$ of has rank $2$, and it is naturally an extension of $H^1$ of the twice punctured projective line by $H^0$ of the point. Such an extension is split if you view the cohomology as being a vector space or an abelian group. But in algebraic geometry, cohomology takes values in richer categories, such as those of Hodge structures or Galois representations, and these categories are not semi-simple. It turns out that in these categories the extension class of this $H^1$ is given by the logarithm of $x$, where the meaning of the logarithm depends on the cohomology theory. For mixed Hodge structures, the extension class is the usual $\log(x)$. But let’s consider $p$-adic cohomology theories instead. These involve both $p$-adic differential equations and Frobenius-twisted linear algebra. If you focus on the differential equations, the usual $p$-adic logarithm $Li_1(x)$ comes up, but if you focus on the Frobenius linear algebra, the second one $Li^{(p)}_1(x)$ naturally comes up. If you want to impress people, you could call the package of both of them the $p$-adic realization of the motivic logarithm. The usual complex logarithm would be the Hodge realization.

For the polylogarithms, there is a similar but more complicated story. I’m not sure what relation there is to entropy mod $p$, which was the main point of your posts. Maybe this would be clearer if I understood the relation between the usual logarithm and entropy at a deeper level.

Posted by: James Borger on January 5, 2018 5:14 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Sorry for the following trivial observations, but I couldn’t resist. :-)

In fact, we can. It’s a well-known miracle about finite fields $K$ that any function $K \to K$ is a polynomial.

I’m guessing “miracle” is pretty facetious! In any case, my immediate thought here was Lagrange interpolation.

It’s a slightly less well-known miracle that any function $K^n \to K$, for any $n \geq 0$, is also a polynomial.

I didn’t consciously know that, but here’s a cheap way of seeing it for those who like explicit formulas: if $K$ has $q$ elements, then the polynomial

$p(x_1, \ldots, x_n) = \prod_{j=1}^n (1 - x_j^{q-1})$

has the property that $p(0, 0, \ldots, 0) = 1$ and $p(a_1, \ldots, a_n) = 0$ whenever $(a_1, \ldots, a_n) \neq (0, \ldots, 0)$. Thus if the collection

$(a_{i 1}, \ldots, a_{i n})$

for $i = 1, \ldots, q^n$ lists all $q^n$ distinct points of $K^n$, then an arbitrary function $f: K^n \to K$ will be represented by the polynomial

$\sum_{i=1}^{q^n} f(a_{i 1}, \ldots, a_{i n}) \cdot p(x_1 - a_{i 1}, \ldots x_n - a_{i n}).$

(More generally, I believe there are multivariable forms of Lagrange interpolation which work over any field $K$, but I’d rather not embark on details at the moment. Roughly, I guess that if you are given distinct points $a_i \in K^n$ for $i = 1, \ldots, M$, then vectors built out of entries of type $\prod_{j = 1}^n a_{i j}^k$ can be arranged to be linearly independent so long as the upper bound of the exponents $k$ is taken to be sufficiently high. Then combine that with various tricks of Cramer’s rule type. At least that’s the idea I extracted looking at this.)

Posted by: Todd Trimble on January 6, 2018 7:59 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

That’s a nice short proof! I knew there existed a Lagrange interpolation proof that every $n$-ary operation on a finite field is a polynomial, but I’d never actually gone through it. It’s simpler than I’d anticipated.

I’m guessing “miracle” is pretty facetious!

No, I was being sincere. Of course it’s easy to prove. But if one’s intuition about functions and polynomials on fields is based on $\mathbb{R}$ (or indeed $\mathbb{C}$), then it does seem miraculous. It’s as if every function $\mathbb{R} \to \mathbb{R}$ were a polynomial.

It’s like the way that when you do Fourier analysis for finite abelian groups, all the analytic complications present in the real cse (regularity conditions, approximations to the identity, etc.) vanish in a puff of smoke. Every function on a finite abelian group is a trigonometric polynomial — it’s that simple.

If I remember my undergraduate schedule correctly, I took Galois theory at about the same time I took numerical analysis. In one course I learned that every function from a finite field to itself is a polynomial, and in the other I learned about Lagrange interpolation. But the proof of the fact about fields was not the Lagrange one; it was the case $n = 1$ of the proof I give below. I don’t think I learned the Lagrange proof until much later. Both courses were optional, so neither lecturer could assume that students in their audience was attending the other course.

Here’s the proof I had in mind that for a finite field $K$ and $n \geq 0$, every function $K^n \to K$ can be represented by a polynomial. In fact, it’s a proof that every function $K^n \to K$ can be uniquely represented by a polynomial of degree $\lt q$ in each variable separately, where $q$ is the order of $K$.

Let $A_n$ denote the additive group of polynomials in $K[x_1, \ldots, x_n]$ of degree $\lt q$ in each variable separately. There’s a canonical function

$R : A_n \to \mathbf{Set}(K^n, K)$

that assigns to each polynomial the function that it represents. The claim is that $R$ is a bijection. The domain and codomain each have $q^{q^n}$ elements, so it suffices to prove that $R$ is injective. Now $\mathbf{Set}(K^n, K)$ is a group under pointwise addition, and $R$ is a homomorphism, so it suffices to prove that $\ker R$ is trivial.

In other words, we just have to prove that any polynomial of degree $\lt q$ in each of the variables $x_1, \ldots, x_n$ separately that vanishes everywhere must, in fact, be the zero polynomial. This is an easy induction on $n$.

This is longer than your Lagrange proof, but it proves a bit more (namely, the uniqueness of the representing polynomial subject to it being of small enough degree).

Does anyone know anywhere in the literature where there’s a proof of the theorem I just proved? I spent a while looking, and found plenty of places where (1) the case $n = 1$ is proved, or (2) the author asserts that the general case is well-known, but nowhere where the general case is actually proved. I do believe it is well-known, but I had a curiously hard time tracking it down.

Posted by: Tom Leinster on January 6, 2018 9:26 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

Sweet! I like that proof. But I think we can use the same idea as applied to the proof I gave: the polynomial I wrote down manifestly has the property that the degree in each variable is less than $q$. This shows that the map $A_n \to \mathbf{Set}(K^n, K)$ is surjective. A surjection between two finite sets of the same cardinality is a bijection.

Posted by: Todd Trimble on January 6, 2018 10:21 PM | Permalink | Reply to this

### Re: Entropy Modulo a Prime (Continued)

I concede, that’s a shorter proof!

So, however much or little I manage to teach myself about Witt vectors in what remains of the evening, I can tell myself that I’ve learned a nice new piece of mathematics today.

Posted by: Tom Leinster on January 6, 2018 10:36 PM | Permalink | Reply to this

Post a New Comment