Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

March 10, 2019

How Much Work Can It Be To Add Some Things Up?

Posted by Tom Leinster

Here’s a puzzle for you. Calculating anything takes work. How much work? Well, obviously it depends how you’re doing the calculating. But let’s make some assumptions.

Imagine we have a machine that can add up any finite sequence x 1,,x nx_1, \ldots, x_n of nonnegative reals. Doing so costs W(x 1,,x n)W(x_1, \ldots, x_n) \in \mathbb{R} units of work — maybe measured in electricity, or potatoes, or whatever our machine uses as fuel. What could the function WW reasonably be?

Let’s make two assumptions.

The first is about adding up in stages. If you had a book filled with numbers, and you were given the dreary task of adding them all up, you might take a page by page approach: first you add up the numbers on each page, writing the total at the bottom of the page, then you add up all the page totals to give the final answer.

Obviously the two-stage approach gives the same result, but we’ll assume that in terms of work done, it’s a perfectly efficient strategy. To take a simple example (a two-page book), obviously

v+w+x+y+z=(v+w+x)+(y+z), v + w + x + y + z = (v + w + x) + (y + z),

and we’re assuming that

W(v,w,x,y,z)=W(v,w,x)+W(y,z)+W(v+w+x,y+z). W(v,w,x,y,z) = W(v,w,x) + W(y,z) + W(v+w+x,y+z).

The first term on the right-hand side indicates the work done to add up the three numbers v,w,xv, w, x on the first page. The second measures the work done to get the total of the numbers y,zy, z on the second page. The third term gives the work done in adding up the page totals to obtain the grand total. And in general, the equation is

W(x 11,,x 1k 1,,x n1,,x nk n)= i=1 nW(x i1,,x ik i)+W( j=1 k 1x 1j,, j=1 k nx nj) W(x_{11}, \ldots, x_{1 k_1}, \ldots, x_{n 1}, \ldots, x_{n k_n}) = \sum_{i = 1}^n W(x_{i 1}, \ldots, x_{i k_i}) + W\biggl( \sum_{j = 1}^{k_1} x_{1 j}, \ldots, \sum_{j = 1}^{k_n} x_{n j} \biggr)

for any x ij0x_{i j} \geq 0.

For our second assumption, we imagine that our machine is really doing some work, shifting around those numbers x ix_i like they were physical objects. I’m imagining something like one of those problems about piles of earth that optimal transport people talk about: x ix_i is the size of the iith pile, and adding them together involves moving them.

Mounds of rubble

On that basis, if the piles are all twice as big, then the process of adding them together take twice as much work. Generally:

W(cx 1,,cx n)=cW(x 1,,x n) W(c x_1, \ldots, c x_n) = c W(x_1, \ldots, x_n)

for any c,x i0c, x_i \geq 0.

And that’s it. Under these two assumptions, there’s basically only one function WW that will do — only one way to measure the work needed to add some things up. The challenge is to find it.

Posted at March 10, 2019 2:14 PM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3094

14 Comments & 0 Trackbacks

Re: How Much Work Can It Be To Add Some Things Up?

Just to get the ball rolling, I’m going to pessimistically guess W=0W=0.

Posted by: Yemon Choi on March 10, 2019 7:21 PM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

(To expand a little more on my silly answer: I had a hunch that the answer could be something to do with entropy just on sociological grounds, as a long-time reader of this blog; but I just stupidly misremembered the formula for entropy and hence got something which didn’t work)

Posted by: Yemon Choi on March 10, 2019 10:05 PM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

Well, of course you’re right that W=0W = 0 is a solution! More generally, it’s clear that the space of solutions is closed under taking scalar multiples.

And I suspected some people might guess it was basically entropy on sociological grounds, but there was nothing I could think of doing to avoid that…

Posted by: Tom Leinster on March 10, 2019 11:24 PM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

W(x 1,,x n)=x ilogx i(x i)log(x i)W(x_1, \dots, x_n) = \sum x_i \log x_i - (\sum x_i) \log (\sum x_i)

Continuity in the parameters should be enough to nail it down.

I think this is equivalent to the abstract characterization of entropy in terms of its behavior under conditioning.

Posted by: Alexander Shamov on March 10, 2019 8:06 PM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

BTW, here’s a way of guessing the answer - though perhaps not quite proving.

Suppose that instead of W(cx 1,,cx n)=cW(x 1,,x n)W(cx_1, \dots, cx_n) = c W(x_1, \dots, x_n) we required W(cx 1,,cx n)=c λW(x 1,,x n)W(cx_1, \dots, cx_n) = c^\lambda W(x_1, \dots, x_n) for some λ\lambda. Clearly, the multiplicative group of positive reals acts on the space of solutions of the functional equation, so looking for other eigenfunctions seems like a reasonable thing to do.

Denote w n:=W(1,,1)w_n := W(1, \dots, 1) (with nn copies of 11). By breaking an mnmn-tuple into an nn-tuple of mm-tuples we get

w mn=mw n+n λw mw_{mn} = m w_n + n^\lambda w_m

And by symmetry, also

w mn=nw m+m λw nw_{mn} = n w_m + m^\lambda w_n

Subtracting these, we get

w n=n λnw_n = n^\lambda - n, up to a multiplicative constant that doesn’t depend on nn that we can set to 11.

Now, if our x ix_i’s are integer multiples of some small ε\varepsilon, we can represent W(x 1,,x n)W(x_1, \dots, x_n) as an intermediate step in a calcuation of the sum of ε 1( ix i)\varepsilon^{-1} (\sum_i x_i) copies of ε\varepsilon, reducing everything to the sequence (w n)(w_n):

W(x 1,,x n)=ε λ(w ε 1x iw ε 1x i)W(x_1, \dots, x_n) = \varepsilon^{\lambda} \left( w_{\varepsilon^{-1} \sum x_i} - \sum w_{\varepsilon^{-1} x_i} \right)

so by a continuity argument:

W(x 1,,x n)=(x i) λx i λW(x_1, \dots, x_n) = (\sum x_i)^\lambda - \sum x_i^\lambda

The limit as λ1\lambda \to 1 is 00, but normalizing it by (λ1) 1(\lambda-1)^{-1}, we get our logarithms.

Posted by: Alexander Shamov on March 10, 2019 8:31 PM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

Yup!

That’s a nice argument. It’s an interesting fact that the various deformations of Shannon entropy are often easier to axiomatize than Shannon entropy itself, because you can use the kind of symmetry-breaking argument that you’ve used here. Some more examples of this technique are here.

You do need some sort of regularity assumption, otherwise you can replace log\log in your formula by flogf \circ \log where f:f : \mathbb{R} \to \mathbb{R} is some function that is additive but not \mathbb{R}-linear. (A classical argument using the axiom of choice shows that such functions exist.) Continuity is enough. Measurability is enough if you also assume that WW is symmetric in its arguments. I don’t know whether measurability is enough without that symmetry argument. If I had to guess, I’d say yes.

Posted by: Tom Leinster on March 10, 2019 11:29 PM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

Here’s how I worked through this.

You didn’t write down an equation for adding three things, but I think it should have the form W(x,y)+W(x+y,z)=W(x,y,z)=W(y,z)+W(x,y+z). W(x,y)+W(x+y,z)= W(x,y,z) = W(y,z) + W(x,y+z). That is, the function W:R 0×R 0RW\colon R_{\geq0}\times R_{\geq0}\to R is a 2-cocycle on the monoid (R 0,+)(R_{\geq0},+). I think this already implies all the “higher” identities (i.e., all the ways for computing the work for adding n4n\geq4 numbers come out the same).

I figured you probably wanted W(0,0)=0W(0,0)=0, i.e., WW is a normalized 2-cocycle. Even if not, we can write W(x,y)=W(x,y)+W(0,0)W(x,y)=W'(x,y)+W(0,0), where WW' is normalized 2-cocycle.

Let V(x,y):=W(x,y)W(y,x)V(x,y):= W(x,y)-W(y,x). Then VV is also a 2-cocycle (and is normalized). In fact, VV is biadditive: V(x+y,z)=V(x,z)+V(y,z),V(x,y+z)=V(x,y)+V(x,z) V(x+y,z)=V(x,z)+V(y,z),\qquad V(x,y+z)=V(x,y)+V(x,z) and antisymmetric: V(x,y)=V(y,x)V(x,y)=-V(y,x). (Note that any biadditive WW is a 2-cocycle.)

Now, if WW is actually a 2-cocycle on the group (R,+)(R,+), then I’m in good shape, because any such cocycle has the form W(x,y)=V(x,y)+df(x,y) W(x,y) = V(x,y) + df(x,y) where VV is an anti-symmetric bilinear function and dfdf is a 2-coboundary, i.e., f:RRf\colon R\to R is some function and df(x,y):=f(x+y)f(x)f(y).df(x,y):= f(x+y)-f(x)-f(y). I.e., group cohomology H 2(R,R)=H^2(R,R)= the group of alternating bilinear functions R×RRR \times R\to R, and two 2-cocycles W 1W_1 and W 2W_2 represent the same cohomology class if their anti-symmetrizations coincide.
I’m not entirely sure it’s the same for (R 0,+)(R_{\geq0}, +), but let’s go with it for now.

If we also require that WW be symmetric (which I don’t think you actually specified), then clearly V=0V=0, so W=dfW=df.

In which case, it comes down to having an ff such that df(cx,cy)=cdf(x,y)df(cx,cy)=c\,df(x,y). Clearly f(x)=μx+λxlogxf(x)=\mu x+\lambda x\log x works, and presumably continuity forces this, by Alexander’s argument, giving W(x,y)=λ[(x+y)log(x+y)xlogxylogy]W(x,y)=\lambda [(x+y)\log(x+y) - x\log x - y\log y].

I guess this remains the only answer even if we don’t require symmetry but enforce continuity, because there are no non-trivial continuous alternating 2-forms VV on R 0R_{\geq0} (and therefore none such that V(cx,cy)=cV(x,y)V(cx,cy)=cV(x,y)).

Posted by: Charles Rezk on March 10, 2019 10:35 PM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

Nice!

There are various papers that take this homological approach to entropy. I think Jean-Louis Cathelineau may have been first to see the connection, e.g. there’s something on p.58-59 of his paper Sur l’homologie de SL 2SL_2 a coefficents dans l’action adjointe, and it’s also right there at the start of his paper Remarques sur les différentielles des polylogarithmes uniformes. Maxim Kontsevich also came upon it in a privately-circulated note he wrote for Hirzebruch’s retirement celebrations, later published as an appendix to an article by Elbaz-Vincent and Gangl. And Pierre Baudot and Daniel Bennequin have a whole 65-page paper called The homological nature of entropy.

To clear up a couple of small points from your comment:

You didn’t write down an equation for adding three things

I’m not sure I understand what you mean by that, but the big displayed equation just before the picture of mounds of rubble includes what you need. Take n=2n = 2 and k 1=2,k 2=1k_1 = 2, k_2 = 1 to get

W(x,y,z)=W(x,y)+W(x+y,z). W(x, y, z) = W(x, y) + W(x + y, z).

And similarly, you can get

W(x,y,z)=W(y,z)+W(x,y+z). W(x, y, z) = W(y, z) + W(x, y + z).

Then as you say, equating the two expressions for W(x,y,z)W(x, y, z) gives the cocycle identity.

I figured you probably wanted W(0,0)=0W(0,0)=0

This follows from the homogeneity requirement:

W(0,0)=W(00,00)=0W(0,0)=0. W(0, 0) = W(0 \cdot 0, 0 \cdot 0) = 0 W(0, 0) = 0.

Posted by: Tom Leinster on March 11, 2019 12:17 AM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

I would read the equation for n=2n=2, k 1=2k_1=2, and k 2=1k_2=1 as W(x,y,z)=W(x,y)+W(z)+W(x+y,z). W(x,y,z)= W(x,y)+W(z)+W(x+y,z). So to get the above equation I need to assume that W(x)=0W(x)=0. Which is what was bothering me I think.

Posted by: Charles Rezk on March 11, 2019 1:03 AM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

You’re right, that is indeed what the equation says. I’m so used to assuming that W(x)=0W(x) = 0 that I didn’t even see I was assuming it.

Nevertheless, no further axioms are needed. You can get W(x)=0W(x) = 0 for free. Just note that the case n=1n = 1, k 1=1k_1 = 1 of the general equation is

W(x)=W(x)+W(x), W(x) = W(x) + W(x),

giving W(x)=0W(x) = 0.

Posted by: Tom Leinster on March 11, 2019 2:10 AM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

The homogeneity assumption W(cx 1,,cx n)=cW(x 1,,x n)W(c x_1,\ldots,c x_n) = c W(x_1, \ldots, x_n) means that WW is the unique homogeneous of degree 11 extension of a function on the set of probability distributions. Assuming that the x ijx_{i j} add up to 11, the other equation becomes one or another known characterization of entropy. But I haven’t thought things through enough to say whether it’s a classical characterization, or maybe this one.

Posted by: Mark Meckes on March 10, 2019 11:12 PM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

Yes, absolutely. And the explicit formula for that homogeneous extension is the formula at the start of Alexander’s first comment, up to a constant factor.

The theorem needed is more or less the classical Faddeev characterization of entropy. (At least, assuming WW is symmetric.)

I say “more or less” because he (like most other writers on these things) used a non-symmetric special case of the main equation in my post: in my notation above, it’s the case k 1=2k_1 = 2, k 2==k n=1k_2 = \cdots = k_n = 1. This forced him to also assume a symmetry axiom. In the presence of the symmetry axiom, this special case is equivalent to the general case by a mundane induction. But if you assume the general case, you don’t need symmetry.

Posted by: Tom Leinster on March 10, 2019 11:37 PM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

In the interests of categorification I wanted to look at this problem with the x ix_i restricted to the positive integers. I thought it would be easy to prove something like W(3,1)<W(2,2),W(3,1)\lt W(2,2), which we would expect from an entropy function. Then I fell into a rabbit hole…

Taking as unknown quantities the two-place functions W(a,b),W(a, b), there is a system of linear equations generated by the 2-cocycle condition: W(a,b)+W(a+b,c)=W(a,b+c)+W(b,c) W(a, b) + W(a+b, c) = W(a, b+c) + W(b, c) as well as the linearity condition cW(a,b)=W(ca,cb).cW(a, b) = W(ca, cb). This is an infinite set of equations, but we can truncate to the set of unknowns {W(a,b):a+bn}\{W(a, b) : a+b \le n\} for some upper bound n.n. I also restrict to W(a,b)=W(b,a).W(a, b)=W(b, a).

For example, truncating at n=6n=6 we have a linear system which has solution the kernel of the following 8×98\times 9 matrix [W(1,1) W(2,1) W(3,1) W(2,2) W(4,1) W(3,2) W(5,1) W(4,2) W(3,3) 1 1 1 1 0 0 0 0 0 0 1 0 1 1 1 0 0 0 1 0 1 0 1 1 0 0 0 0 1 0 0 0 1 1 0 1 1 0 0 0 1 0 1 1 0 0 2 0 0 0 0 0 1 0 2 0 0 1 0 0 0 0 0 3 0 0 0 0 0 0 0 1] \left[ \begin{array}{rrrrrrrrr} W(1,1) & W(2,1) & W(3,1) & W(2,2) & W(4,1) & W(3,2) & W(5,1) & W(4,2) & W(3,3) \\ -1 & 1 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 1 & 1 & -1 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & 1 & -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 & 1 & 1 & 0 & -1 \\ -1 & 0 & 0 & 0 & 1 & 0 & 1 & -1 & 0 \\ 0 & 2 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\ 2 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\ 3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \end{array} \right] The columns are labelled by the unkown. The kernel of this matrix has dimension 3. Fixing W(1,1)=1W(1,1)=1 I found there are 6 solutions: (1,1,2,2,1,2,2,2,3) (1,1,2,2,2,3,1,2,3) (1,2,1,2,1,1,4,4,3) (1,2,1,2,2,2,3,4,3) (1,2,1,2,3,3,2,4,3) (1,2,1,2,4,4,1,4,3) \begin{array}{c} (1, 1, 2, 2, 1, 2, 2, 2, 3)\\ (1, 1, 2, 2, 2, 3, 1, 2, 3)\\ (1, 2, 1, 2, 1, 1, 4, 4, 3)\\ (1, 2, 1, 2, 2, 2, 3, 4, 3)\\ (1, 2, 1, 2, 3, 3, 2, 4, 3)\\ (1, 2, 1, 2, 4, 4, 1, 4, 3) \end{array} Notice that for all of these W(3,1)W(2,2)W(3,1)\le W(2,2) so this is looking promising.

I did some numerics to push these calculations further. Here is a graph showing positive integer solutions truncating at n=16.n=16. This time I fixed W(1,1)=2W(1,1)=2 as there are no solutions with W(1,1)=1.W(1,1)=1. I have plotted only the values W(a,na),W(a,n-a), as compared to the real solution for W(a,na)W(a,n-a) in cyan. The x-axis is labelled by a1.a-1.

a graph

Posted by: Simon Burton on April 7, 2019 5:01 PM | Permalink | Reply to this

Re: How Much Work Can It Be To Add Some Things Up?

From numerical simulation it looks like the above system of equations truncated to nn has kernel with dimension equal to the number of prime factors in nn. I’m guessing this corresponds to the choice of value of log(p)\log(p) for prime pp.

Also, I finally found a proof that W(3,1)<W(2,2).W(3,1)\lt W(2,2). It uses the two equations W(2,1)+W(3,1)=W(1,1)+W(2,2) W(6,2)+W(8,1)=W(2,1)+W(6,3). \begin{array}{c} W(2,1) + W(3,1) = W(1,1) + W(2,2) \\ W(6,2) + W(8,1) = W(2,1) + W(6,3). \end{array}

These simplify to W(2,1)+W(3,1)=3W(1,1) 2W(3,1)+W(8,1)=4W(2,1). \begin{array}{c} W(2,1) + W(3,1) = 3W(1,1)\\ 2W(3,1) + W(8,1) = 4W(2,1). \end{array} Multiplying the first equation by 4 and then combining with the second equation we get 6W(3,1)+W(8,1)=12W(1,1)=6W(2,2) 6W(3,1) + W(8,1) = 12W(1,1) = 6W(2,2) and because W(a,b)>0W(a,b)\gt 0 the result follows, as long as we are willing to cancel the factor of 6.

All of this makes me think that entropy can be defined taking values in a rig, or 2-rig, or traced monoidal category, or some other gadget like that. Working just over the natural numbers I would hope to see the “real” entropy show up as a colimit, similar to how Paolo Perrone explains here .

Posted by: Simon Burton on April 12, 2019 3:54 PM | Permalink | Reply to this

Post a New Comment