## April 14, 2017

### Value

#### Posted by Tom Leinster

What is the value of the whole in terms of the values of the parts?

More specifically, given a finite set whose elements have assigned “values” $v_1, \ldots, v_n$ and assigned “sizes” $p_1, \ldots, p_n$ (normalized to sum to $1$), how can we assign a value $\sigma(\mathbf{p}, \mathbf{v})$ to the set in a coherent way?

This seems like a very general question. But in fact, just a few sensible requirements on the function $\sigma$ are enough to pin it down almost uniquely. And the answer turns out to be closely connected to existing mathematical concepts that you probably already know.

Let’s write

$\Delta_n = \Bigl\{ (p_1, \ldots, p_n) \in \mathbb{R}^n : p_i \geq 0, \sum p_i = 1 \Bigr\}$

for the set of probability distributions on $\{1, \ldots, n\}$. Assuming that our “values” are positive real numbers, we’re interested in sequences of functions

$\Bigl( \sigma \colon \Delta_n \times (0, \infty)^n \to (0, \infty) \Bigr)_{n \geq 1}$

that aggregate the values of the elements to give a value to the whole set. So, if the elements of the set have relative sizes $\mathbf{p} = (p_1, \ldots, p_n)$ and values $\mathbf{v} = (v_1, \ldots, v_n)$, then the value assigned to the whole set is $\sigma(\mathbf{p}, \mathbf{v})$.

Here are some properties that it would be reasonable for $\sigma$ to satisfy.

Homogeneity  The idea is that whatever “value” means, the value of the set and the value of the elements should be measured in the same units. For instance, if the elements are valued in kilograms then the set should be valued in kilograms too. A switch from kilograms to grams would then multiply both values by 1000. So, in general, we ask that

$\sigma(\mathbf{p}, c\mathbf{v}) = c \sigma(\mathbf{p}, \mathbf{v})$

for all $\mathbf{p} \in \Delta_n$, $\mathbf{v} \in (0, \infty)^n$ and $c \in (0, \infty)$.

Monotonicity  The values of the elements are supposed to make a positive contribution to the value of the whole, so we ask that if $v_i \leq v'_i$ for all $i$ then

$\sigma(\mathbf{p}, \mathbf{v}) \leq \sigma(\mathbf{p}, \mathbf{v}')$

for all $\mathbf{p} \in \Delta_n$.

Replication  Suppose that our $n$ elements have the same size and the same value, $v$. Then the value of the whole set should be $n v$. This property says, among other things, that $\sigma$ isn’t an average: putting in more elements of value $v$ increases the value of the whole set!

If $\sigma$ is homogeneous, we might as well assume that $v = 1$, in which case the requirement is that

$\sigma\bigl( (1/n, \ldots, 1/n), (1, \ldots, 1) \bigr) = n.$

Modularity  This one’s a basic logical axiom, best illustrated by an example.

Imagine that we’re very ambitious and wish to evaluate the entire planet — or at least, the part that’s land. And suppose we already know the values and relative sizes of every country.

We could, of course, simply put this data into $\sigma$ and get an answer immediately. But we could instead begin by evaluating each continent, and then compute the value of the planet using the values and sizes of the continents. If $\sigma$ is sensible, this should give the same answer.

The notation needed to express this formally is a bit heavy. Let $\mathbf{w} \in \Delta_n$; in our example, $n = 7$ (or however many continents there are) and $\mathbf{w} = (w_1, \ldots, w_7)$ encodes their relative sizes. For each $i = 1, \ldots, n$, let $\mathbf{p}^i \in \Delta_{k_i}$; in our example, $\mathbf{p}^i$ encodes the relative sizes of the countries on the $i$th continent. Then we get a probability distribution

$\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n) = (w_1 p^1_1, \ldots, w_1 p^1_{k_1}, \,\,\ldots, \,\, w_n p^n_1, \ldots, w_n p^n_{k_n}) \in \Delta_{k_1 + \cdots + k_n},$

which in our example encodes the relative sizes of all the countries on the planet. (Incidentally, this composition makes $(\Delta_n)$ into an operad, a fact that we’ve discussed many times before on this blog.) Also let

$\mathbf{v}^1 = (v^1_1, \ldots, v^1_{k_1}) \in (0, \infty)^{k_1}, \,\,\ldots,\,\, \mathbf{v}^n = (v^n_1, \ldots, v^n_{k_n}) \in (0, \infty)^{k_n}.$

In the example, $v^i_j$ is the value of the $j$th country on the $i$th continent. Then the value of the $i$th continent is $\sigma(\mathbf{p}^i, \mathbf{v}^i)$, so the axiom is that

$\sigma \bigl( \mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n), (v^1_1, \ldots, v^1_{k_1}, \ldots, v^n_1, \ldots, v^n_{k_n}) \bigr) = \sigma \Bigl( \mathbf{w}, \bigl( \sigma(\mathbf{p}^1, \mathbf{v}^1), \ldots, \sigma(\mathbf{p}^n, \mathbf{v}^n) \bigr) \Bigr).$

The left-hand side is the value of the planet calculated in a single step, and the right-hand side is its value when calculated in two steps, with continents as the intermediate stage.

Symmetry  It shouldn’t matter what order we list the elements in. So it’s natural to ask that

$\sigma(\mathbf{p}, \mathbf{v}) = \sigma(\mathbf{p} \tau, \mathbf{v} \tau)$

for any $\tau$ in the symmetric group $S_n$, where the right-hand side refers to the obvious $S_n$-actions.

Absent elements  Absent elements should count for nothing! In other words, if $p_1 = 0$ then we should have

$\sigma\bigl( (p_1, \ldots, p_n), (v_1, \ldots, v_n)\bigr) = \sigma\bigl( (p_2, \ldots, p_n), (v_2, \ldots, v_n)\bigr).$

This isn’t quite trivial. I haven’t yet given you any examples of the kind of function that $\sigma$ might be, but perhaps you already have in mind a simple one like this:

$\sigma(\mathbf{p}, \mathbf{v}) = v_1 + \cdots + v_n.$

In words, the value of the whole is simply the sum of the values of the parts, regardless of their sizes. But if $\sigma$ is to have the “absent elements” property, this won’t do. (Intuitively, if $p_i = 0$ then we shouldn’t count $v_i$ in the sum, because the $i$th element isn’t actually there.) So we’d better modify this example slightly, instead taking

$\sigma(\mathbf{p}, \mathbf{v}) = \sum_{i \,:\, p_i \gt 0} v_i.$

This function (or rather, sequence of functions) does have the “absent elements” property.
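It’s also straightforward to check that this example satisfies the modularity axiom. Here is a minimal numerical sketch in Python (the function names and the sample sizes and values are my own illustration, not from the post):

```python
def sigma0(p, v):
    # value of the whole = sum of the values of the elements
    # that are actually present (p_i > 0)
    return sum(vi for pi, vi in zip(p, v) if pi > 0)

def compose(w, ps):
    # operadic composition w o (p^1, ..., p^n): flatten the nested
    # distributions into one distribution on all the "countries"
    return [wi * pij for wi, pi in zip(w, ps) for pij in pi]

# two "continents" of equal size; the first holds two "countries"
w = [0.5, 0.5]
ps = [[0.2, 0.8], [1.0]]
vs = [[3.0, 4.0], [5.0]]

# one-step evaluation vs. two-step evaluation via the continents
lhs = sigma0(compose(w, ps), [vij for vi in vs for vij in vi])
rhs = sigma0(w, [sigma0(pi, vi) for pi, vi in zip(ps, vs)])
assert abs(lhs - rhs) < 1e-12  # both equal 12.0
```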

Continuity in positive probabilities  Finally, we ask that for each $\mathbf{v} \in (0, \infty)^n$, the function $\sigma(-, \mathbf{v})$ is continuous on the interior of the simplex $\Delta_n$, that is, continuous over those probability distributions $\mathbf{p}$ such that $p_1, \ldots, p_n \gt 0$.

Why only over the interior of the simplex? Basically because of natural examples of $\sigma$ like the one just given, which is continuous on the interior of the simplex but not the boundary. More generally, it’s sometimes useful to make a sharp, discontinuous distinction between the cases $p_i \gt 0$ (presence) and $p_i = 0$ (absence).

Arrow’s famous theorem states that a few apparently mild conditions on a voting system are, in fact, mutually contradictory. The mild conditions above are not mutually contradictory. In fact, there’s a one-parameter family $\sigma_q$ of functions each of which satisfies these conditions. For real $q \neq 1$, the definition is

$\sigma_q(\mathbf{p}, \mathbf{v}) = \Bigl( \sum_{i \,:\, p_i \gt 0} p_i^q v_i^{1 - q} \Bigr)^{1/(1 - q)}.$

For instance, $\sigma_0$ is the example of $\sigma$ given above.

The formula for $\sigma_q$ is obviously invalid at $q = 1$, but it converges to a limit as $q \to 1$, and we define $\sigma_1(\mathbf{p}, \mathbf{v})$ to be that limit. Explicitly, this gives

$\sigma_1(\mathbf{p}, \mathbf{v}) = \prod_{i \,:\, p_i \gt 0} (v_i/p_i)^{p_i}.$

In the same way, we can define $\sigma_{-\infty}$ and $\sigma_\infty$ as the appropriate limits:

$\sigma_{-\infty}(\mathbf{p}, \mathbf{v}) = \max_{i \,:\, p_i \gt 0} v_i/p_i, \qquad \sigma_{\infty}(\mathbf{p}, \mathbf{v}) = \min_{i \,:\, p_i \gt 0} v_i/p_i.$

And it’s easy to check that for each $q \in [-\infty, \infty]$, the function $\sigma_q$ satisfies all the natural conditions listed above.
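For readers who like to check such claims by computing, here is an illustrative Python sketch of the whole family $\sigma_q$, including the limiting cases (the function name and the spot-checks are mine, not from the course notes):

```python
import math

def sigma(q, p, v):
    """sigma_q(p, v): aggregate value of a set whose elements have
    relative sizes p (summing to 1) and positive values v.
    Indices with p_i == 0 are dropped ("absent elements")."""
    pairs = [(pi, vi) for pi, vi in zip(p, v) if pi > 0]
    if q == -math.inf:                      # limit q -> -infinity
        return max(vi / pi for pi, vi in pairs)
    if q == math.inf:                       # limit q -> +infinity
        return min(vi / pi for pi, vi in pairs)
    if q == 1:                              # limit q -> 1
        return math.prod((vi / pi) ** pi for pi, vi in pairs)
    return sum(pi ** q * vi ** (1 - q) for pi, vi in pairs) ** (1 / (1 - q))

# Replication: n equal elements of value 1 give sigma = n, for every q.
for q in (-math.inf, -2, 0, 0.5, 1, 2, math.inf):
    assert abs(sigma(q, [0.25] * 4, [1.0] * 4) - 4) < 1e-9

# Absent elements: a part of size 0 contributes nothing.
assert sigma(0, [0.0, 0.5, 0.5], [100.0, 2.0, 3.0]) == 5.0
```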

These functions $\sigma_q$ might be unfamiliar to you, but they have some special cases that are quite well-explored. In particular:

• Suppose you’re in a situation where the elements don’t have “sizes”. Then it would be natural to take $\mathbf{p}$ to be the uniform distribution $\mathbf{u}_n = (1/n, \ldots, 1/n)$. In that case, $\sigma_q(\mathbf{u}_n, \mathbf{v}) = n^{-q/(1 - q)} \cdot \bigl( \sum v_i^{1 - q} \bigr)^{1/(1 - q)}.$ When $q \leq 0$, this is exactly a constant times $\|\mathbf{v}\|_{1 - q}$, the $(1 - q)$-norm of the vector $\mathbf{v}$.

• Suppose you’re in a situation where the elements don’t have “values”. Then it would be natural to take $\mathbf{v}$ to be $\mathbf{1} = (1, \ldots, 1)$. In that case, $\sigma_q(\mathbf{p}, \mathbf{1}) = \bigl( \sum p_i^q \bigr)^{1/(1 - q)}.$ This is the quantity that ecologists know as the Hill number of order $q$ and use as a measure of biological diversity. Information theorists know it as the exponential of the Rényi entropy of order $q$, the special case $q = 1$ being Shannon entropy. And actually, the general formula for $\sigma_q$ is very closely related to Rényi relative entropy (which Wikipedia calls Rényi divergence).
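The second special case is easy to verify numerically. A short Python sketch (my own illustration; the function names are not standard) checks that the Hill number is the exponential of the Rényi entropy, and that for the uniform distribution it returns the “effective number” $n$:

```python
import math

def hill(q, p):
    """Hill number of order q (q != 1): sigma_q(p, (1, ..., 1))."""
    return sum(pi ** q for pi in p if pi > 0) ** (1 / (1 - q))

def renyi(q, p):
    """Rényi entropy of order q (natural logarithm), q != 1."""
    return math.log(sum(pi ** q for pi in p if pi > 0)) / (1 - q)

p = [0.5, 0.25, 0.25]
# The Hill number is the exponential of the Rényi entropy...
assert abs(hill(2, p) - math.exp(renyi(2, p))) < 1e-12
# ...and on the uniform distribution it counts the elements.
assert abs(hill(2, [0.2] * 5) - 5) < 1e-9
```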

Anyway, the big — and as far as I know, new — result is:

Theorem  The functions $\sigma_q$ are the only functions $\sigma$ with the seven properties above.

So although the properties above don’t seem that demanding, they actually force our notion of “aggregate value” to be given by one of the functions in the family $(\sigma_q)_{q \in [-\infty, \infty]}$. And although I didn’t even mention the notions of diversity or entropy in my justification of the axioms, they come out anyway as special cases.

I covered all this yesterday in the tenth and penultimate installment of the functional equations course that I’m giving. It’s written up on pages 38–42 of the notes so far. There you can also read how this relates to more realistic measures of biodiversity than the Hill numbers. Plus, you can see an outline of the (quite substantial) proof of the theorem above.

Posted at April 14, 2017 4:17 PM UTC


### Re: Value

This looks fascinating! But how am I supposed to think of such a notion of value? Concretely, what does it mean that each element of the set has an assigned probability $p_i$? The obvious interpretation is that the element $i$ is actually contained in the set only with probability $p_i$, and that all these probabilities are independent. This would suggest that taking the expectation value $\sum_i p_i v_i$ should be a reasonable notion of value, but this contradicts the replication property (as you note explicitly).

Perhaps the answer is that you want diversity to be a value in itself? Do you have in mind a way to motivate this without talking about diversity measures?

Oh, and there’s a small typo in the definition of $\sigma_1$, where the subscript $i$ should be in the exponent.

> evaluate the entire planet

A beautiful pun!

Posted by: Tobias Fritz on April 14, 2017 5:09 PM

### Re: Value

Thanks!

To be honest, I’m not sure how you’re supposed to think about value in general. I’d like some help developing my intuition about it.

One point I find helpful is Remark 5.18 of the notes: that if you regard each of the “parts” or “elements” $i$ as made up of a certain number of individuals, then $\sigma_q(\mathbf{p}, \mathbf{v})$ can be understood as the number of individuals times the average value per individual.

I’d tend to think of $p_i$ as proportions. A typical example (Example 5.16(iv) in the notes) is this. We have an ecological community of some kind, divided into $n$ subcommunities. These might be geographical sites, e.g. west of the river and east of the river. Then $p_i$ denotes the size of the $i$th subcommunity, normalized so that $\sum p_i = 1$. And $v_i$ denotes the diversity of the $i$th subcommunity. My general question then becomes: what is the value of the whole community in terms of the values of the subcommunities? That, as you more or less guessed, was how I started on this line of thought.

But I’d love to have better intuition. It would be really helpful to have some examples that have nothing to do with diversity. If you can think of any, please pass them on!

Posted by: Tom Leinster on April 14, 2017 5:32 PM

### Re: Value

My intuition also has a hard time reconciling the replication property with the normalization of the $p_i$’s. Normalizing the “total size” seems to say that we don’t care about the “total amount” but about some kind of weighted average, but replication says that’s not it at all.

Here’s another way to describe the formula for $\sigma_q$ that is probably in your notes (or at least in the omitted proof): take the $q^{\mathrm{th}}$ power of all the sizes, then use them as the weights in the formula for the weighted $(1 - q)$-power mean. (Of course it isn’t exactly a power mean any more, since the $q^{\mathrm{th}}$ powers of the sizes will no longer sum to $1$.)

Posted by: Mike Shulman on April 15, 2017 12:37 PM

### Re: Value

There’s another way to see the formula for $\sigma_q$ as a power mean:

$\sigma_q(\mathbf{p}, \mathbf{v}) = \Bigl( \sum p_i \Bigl(\frac{v_i}{p_i}\Bigr)^{1 - q} \Bigr)^{1/(1 - q)} = M_{1 - q}(\mathbf{p}, \mathbf{v}/\mathbf{p})$

where the quotient $\mathbf{v}/\mathbf{p}$ is defined coordinatewise and $M_{1 - q}$ is the power mean of order $1 - q$.

Suppose each “element” $i$ of our set is made up of a collection of $k_i$ individuals, and write $k = \sum_i k_i$. Then it’s reasonable to take $p_i = k_i/k$, and by the formula above,

$\sigma_q( \mathbf{p}, \mathbf{v} ) = k \cdot M_{1 - q}\Bigl(\mathbf{p}, \Bigl(\frac{v_1}{k_1}, \ldots, \frac{v_n}{k_n}\Bigr)\Bigr).$

This invites us to imagine taking the value of the $i$th element and sharing it evenly among the $k_i$ individuals that make up that element. Then each individual has a value of $v_i/k_i$, and the last displayed equation says that the value of the whole is the number of individuals times the average value of each individual.
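Both identities are easy to spot-check numerically. A small Python sketch (illustrative names and made-up data, assuming $q \neq 1$ and all $p_i \gt 0$):

```python
def power_mean(t, w, x):
    """Weighted power mean M_t(w, x) of order t (t != 0)."""
    return sum(wi * xi ** t for wi, xi in zip(w, x)) ** (1 / t)

def sigma(q, p, v):
    # sigma_q for q != 1, all p_i > 0 (no absent elements here)
    return sum(pi ** q * vi ** (1 - q) for pi, vi in zip(p, v)) ** (1 / (1 - q))

q = 0.5
p = [0.2, 0.3, 0.5]
v = [1.0, 4.0, 2.0]

# sigma_q(p, v) = M_{1-q}(p, v/p), with v/p taken coordinatewise
vp = [vi / pi for pi, vi in zip(p, v)]
assert abs(sigma(q, p, v) - power_mean(1 - q, p, vp)) < 1e-9

# With k_i individuals per element and p_i = k_i / k, the value of the
# whole is k times the average value per individual.
k = [2, 3, 5]  # sums to 10, matching p above
per_individual = [vi / ki for ki, vi in zip(k, v)]
assert abs(sigma(q, p, v) - 10 * power_mean(1 - q, p, per_individual)) < 1e-9
```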

It may be that “value” is not the best word. I’m open to suggestions!

Posted by: Tom Leinster on April 15, 2017 2:32 PM

### Re: Value

You mentioned this in passing:

> Incidentally, this composition makes $(\Delta_n)$ into an operad, a fact that we’ve discussed many times before on this blog.

but you didn’t say what all the properties of $\sigma$ mean in operadic language. I expect you noticed that most of them are quite natural:

• Modularity means that $\sigma$ makes $(0,\infty)$ into an algebra over the operad $(\Delta_n)$. (Actually, it doesn’t include the unit condition explicitly, but that follows from Homogeneity and Replication.)

• Symmetry means that this is an algebra for the symmetric operad $(\Delta_n)$.

• Absent Elements means that this is actually an algebra for the semicartesian operad $(\Delta_n)$.

• Homogeneity means that this is an algebra in the category of $(0,\infty)$-sets, where $(0,\infty)$ is a multiplicative monoid.

• Monotonicity means that it is an algebra in the category of posets.

• Continuity In Positive Probabilities means that if we restrict it to an action of the operad $(\Delta^\circ_n)$ of interiors of simplices then it is an algebra in the category of topological spaces. Absent Elements means that the action of all of $(\Delta_n)$ is determined by its action on $(\Delta^\circ_n)$; in fact I suspect that $(\Delta_n)$ is the free semicartesian operad on the symmetric operad $(\Delta^\circ_n)$.

So all together these properties say that we are making $(0,\infty)$ into an algebra for $(\Delta^\circ_n)$ in the category of topological ordered $(0,\infty)$-sets, where $(\Delta^\circ_n)$ is a topological operad (without order or $(0,\infty)$-action).

I can’t think of a nice operadic way to state Replication, though, which probably has something to do with its weirdness to me. I notice that Replication for a fixed value of $n$ is just a normalization condition; its nontriviality has to do with how the normalization of the actions at different values of $n$ are related.

Posted by: Mike Shulman on April 15, 2017 11:08 PM

### Re: Value

Thanks, Mike; that’s nice! The post above is very close to stuff I said at the functional equations course, where the audience is very mixed (a wide variety of different kinds of mathematician, plus a few biologists and physicists). That mixedness is really great, but prevents me from saying sleek categorical things like you just did.

I’ve been trying to think of how to state the replication principle in a nice categorical way, but without success. At one stage I thought the key might be to extend the $(\Delta_n)$-algebra structure $\sigma_q$ on $(0, \infty)$ to a $(0, \infty)^\bullet$-algebra structure, where by $(0, \infty)^\bullet$ I mean the operad whose set of $n$-ary operations is $(0, \infty)^n$ and which contains $(\Delta_n)$ as a suboperad. But I haven’t figured that out.

There’s a similar, related, challenge that I’ve come across before. The Shannon entropy of a finite probability distribution $\mathbf{p}$,

$H(\mathbf{p}) = - \sum_i p_i \log p_i,$

depends on the choice of base for the logarithm, and changing the base multiplies $H$ by a constant factor. Several axiomatic characterizations of Shannon entropy only characterize it up to a constant factor, or else include an axiom specifically to eliminate that factor. For instance,

$H(1/n, 1/n, \ldots, 1/n) = \log n,$

so including the axiom “$H(1/2, 1/2) = 1$” forces the base of the logarithm to be $2$.

One could choose to work not with entropy but with its “exponential”. Let me temporarily write $H_b(\mathbf{p})$ for the entropy defined using logarithms to base $b$. Then by the “exponential of entropy”, I mean

$D(\mathbf{p}) = b^{H_b(\mathbf{p})} = \frac{1}{p_1^{p_1} \cdots p_n^{p_n}}.$

This is independent of the choice of $b$! So whereas $H(\mathbf{p})$ is only well-defined up to a constant factor (if we allow $b$ to vary), $D(\mathbf{p})$ is really a unique, canonical thing. That’s one benefit of using $D$ rather than $H$. Another, related, benefit is the formula

$D(1/n, 1/n, \ldots, 1/n) = n,$

which again doesn’t contain a logarithm. This last equation is often called the “effective number” property, and is very close to the property I called “replication” in my post.

Now, obviously any axiomatic characterization of Shannon entropy $H$ can be translated into an axiomatic characterization of $D$. Since many of these characterizations only pin $H$ down up to a constant factor, they only pin $D$ down up to a constant power. However, there’s something special about $D$ itself as opposed to $D^c$ for any power $c \neq 1$, since getting $D(1/n, \ldots, 1/n)$ to come out as $n$ rather than $n^c$ is special. So we might find ourselves putting in the condition

$D(1/n, \ldots, 1/n) = n$

by hand.

This isn’t very satisfactory; it would be better if it could be understood as something categorically natural. But I don’t know how to do this. For example, back here I described a categorical characterization of $H$: Shannon entropy and its scalar multiples are exactly the internal $\mathbf{P}$-algebras in the categorical $\mathbf{P}$-algebra $\mathbb{R}_+$. (Here $\mathbf{P}$ is the operad $(\Delta_n)$, and all the terminology is defined in that post.) You can easily translate that into a categorical characterization of $D$ and its powers. But I don’t know a natural categorical way to put my finger on $D$ itself.

Posted by: Tom Leinster on April 17, 2017 11:21 AM

### Re: Value

Is there any particular reason you object to the scaling freedom in characterizations of entropy? Lots of perfectly respectable characterization theorems only work up to scaling: For each $n$-dimensional vector space $V$, there is a unique alternating $n$-linear form on $V^n$, up to a constant factor. There is a unique translation-invariant $\sigma$-finite measure on $\mathbb{R}^n$, up to a constant factor. There are standard choices of normalization in those cases, but I’m not sure if they’re exactly categorically natural.

Posted by: Mark Meckes on April 17, 2017 6:55 PM

### Re: Value

> Is there any particular reason you object to the scaling freedom in characterizations of entropy?

Sorry, I probably expressed myself badly. I don’t object to the scaling freedom in characterizations of entropy. What I object to — or more accurately, would like to find a way to bypass — is the freedom in characterizations of diversity $D$ (that is, the exponential of entropy).

The fundamental difference between $D(\mathbf{p}) = \prod_i p_i^{-p_i}$ and its logarithm $H(\mathbf{p}) = - \sum_i p_i \log p_i$ is that $D$ does not depend on a choice of base, but $H$ does. So while it seems reasonable to me that characterizations of $H$ leave the base unspecified (and therefore contain one degree of freedom), it seems like more of a shortcoming when characterizations of $D$ contain a corresponding degree of freedom.

So, I want to find a categorical viewpoint from which the “effective number” condition $D(1/n, \ldots, 1/n) = n$ is natural.

Posted by: Tom Leinster on April 17, 2017 7:11 PM

### Re: Value

Ah, I wasn’t reading closely enough. I agree, it would be very nice to find a categorical viewpoint that makes the effective number condition natural.

Posted by: Mark Meckes on April 17, 2017 7:28 PM
