### Value

#### Posted by Tom Leinster

What is the value of the whole in terms of the values of the parts?

More specifically, given a finite set whose elements have assigned “values” $v_1, \ldots, v_n$ and assigned “sizes” $p_1, \ldots, p_n$ (normalized to sum to $1$), how can we assign a value $\sigma(\mathbf{p}, \mathbf{v})$ to the set in a coherent way?

This seems like a very general question. But in fact, just a few sensible requirements on the function $\sigma$ are enough to pin it down almost uniquely. And the answer turns out to be closely connected to existing mathematical concepts that you probably already know.

Let’s write

$\Delta_n = \Bigl\{ (p_1, \ldots, p_n) \in \mathbb{R}^n : p_i \geq 0, \sum p_i = 1 \Bigr\}$

for the set of probability distributions on $\{1, \ldots, n\}$. Assuming that our “values” are positive real numbers, we’re interested in sequences of functions

$\Bigl( \sigma \colon \Delta_n \times (0, \infty)^n \to (0, \infty) \Bigr)_{n \geq 1}$

that aggregate the values of the elements to give a value to the whole set. So, if the elements of the set have relative sizes $\mathbf{p} = (p_1, \ldots, p_n)$ and values $\mathbf{v} = (v_1, \ldots, v_n)$, then the value assigned to the whole set is $\sigma(\mathbf{p}, \mathbf{v})$.

Here are some properties that it would be reasonable for $\sigma$ to satisfy.

**Homogeneity** The idea is that whatever “value” means, the value of
the set and the value of the elements should be measured in the same
units. For instance, if the elements are valued in kilograms then the set
should be valued in kilograms too. A switch from kilograms to grams would then
multiply both values by 1000. So, in general, we ask that

$\sigma(\mathbf{p}, c\mathbf{v}) = c \sigma(\mathbf{p}, \mathbf{v})$

for all $\mathbf{p} \in \Delta_n$, $\mathbf{v} \in (0, \infty)^n$ and $c \in (0, \infty)$.

**Monotonicity** The values of the elements are supposed to make a
*positive* contribution to the value of the whole, so we ask that if
$v_i \leq v'_i$ for all $i$ then

$\sigma(\mathbf{p}, \mathbf{v}) \leq \sigma(\mathbf{p}, \mathbf{v}')$

for all $\mathbf{p} \in \Delta_n$.

**Replication** Suppose that our $n$ elements have the same size and
the same value, $v$. Then the value of the whole set should be $n v$.
This property says, among other things, that $\sigma$ isn’t an *average*: putting in more
elements of value $v$ increases the value of the whole set!

If $\sigma$ is homogeneous, we might as well assume that $v = 1$, in which case the requirement is that

$\sigma\bigl( (1/n, \ldots, 1/n), (1, \ldots, 1) \bigr) = n.$

**Modularity** This one’s a basic logical axiom, best illustrated by
an example.

Imagine that we’re very ambitious and wish to evaluate the entire planet — or at least, the part that’s land. And suppose we already know the values and relative sizes of every country.

We could, of course, simply put this data into $\sigma$ and get an answer immediately.
But we could instead begin by evaluating each *continent*, and then
compute the value of the planet using the values and sizes of the
continents. If $\sigma$ is sensible, this should give the same answer.

The notation needed to express this formally is a bit heavy. Let $\mathbf{w} \in \Delta_n$; in our example, $n = 7$ (or however many continents there are) and $\mathbf{w} = (w_1, \ldots, w_7)$ encodes their relative sizes. For each $i = 1, \ldots, n$, let $\mathbf{p}^i \in \Delta_{k_i}$; in our example, $\mathbf{p}^i$ encodes the relative sizes of the countries on the $i$th continent. Then we get a probability distribution

$\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n) = (w_1 p^1_1, \ldots, w_1 p^1_{k_1}, \,\,\ldots, \,\, w_n p^n_1, \ldots, w_n p^n_{k_n}) \in \Delta_{k_1 + \cdots + k_n},$

which in our example encodes the relative sizes of all the countries on the planet. (Incidentally, this composition makes $(\Delta_n)$ into an operad, a fact that we’ve discussed many times before on this blog.) Also let

$\mathbf{v}^1 = (v^1_1, \ldots, v^1_{k_1}) \in (0, \infty)^{k_1}, \,\,\ldots,\,\, \mathbf{v}^n = (v^n_1, \ldots, v^n_{k_n}) \in (0, \infty)^{k_n}.$

In the example, $v^i_j$ is the value of the $j$th country on the $i$th continent. Then the value of the $i$th continent is $\sigma(\mathbf{p}^i, \mathbf{v}^i)$, so the axiom is that

$\sigma \bigl( \mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n), (v^1_1, \ldots, v^1_{k_1}, \ldots, v^n_1, \ldots, v^n_{k_n}) \bigr) = \sigma \Bigl( \mathbf{w}, \bigl( \sigma(\mathbf{p}^1, \mathbf{v}^1), \ldots, \sigma(\mathbf{p}^n, \mathbf{v}^n) \bigr) \Bigr).$

The left-hand side is the value of the planet calculated in a single step, and the right-hand side is its value when calculated in two steps, with continents as the intermediate stage.
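As a numeric sanity check (with hypothetical helper names), here is modularity verified for the simplest aggregator, the plain sum of the values, which reappears as an example later in the post:

```python
# Sanity check of modularity for the plain-sum aggregator
# sigma(p, v) = v_1 + ... + v_n (sizes ignored; all p_i > 0 here).

def sigma(p, v):
    """Aggregate values: here, simply the sum of the v_i."""
    return sum(v)

def compose(w, ps):
    """The operadic composition w o (p^1, ..., p^n)."""
    return [wi * pij for wi, pi in zip(w, ps) for pij in pi]

# Two "continents" with relative sizes w, each split into "countries".
w = [0.4, 0.6]
p1, p2 = [0.5, 0.5], [0.25, 0.75]
v1, v2 = [3.0, 1.0], [2.0, 4.0]

# One-step evaluation of the whole planet...
lhs = sigma(compose(w, [p1, p2]), v1 + v2)
# ...versus two steps, with continents as the intermediate stage.
rhs = sigma(w, [sigma(p1, v1), sigma(p2, v2)])
assert abs(lhs - rhs) < 1e-12   # modularity holds
```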

**Symmetry** It shouldn’t matter what order we list the elements
in. So it’s natural to ask that

$\sigma(\mathbf{p}, \mathbf{v}) = \sigma(\mathbf{p} \tau, \mathbf{v} \tau)$

for any $\tau$ in the symmetric group $S_n$, where the right-hand side refers to the obvious $S_n$-actions.

**Absent elements** should count for nothing! In other words, if $p_1 = 0$
then we should have

$\sigma\bigl( (p_1, \ldots, p_n), (v_1, \ldots, v_n)\bigr) = \sigma\bigl( (p_2, \ldots, p_n), (v_2, \ldots, v_n)\bigr).$

This isn’t *quite* trivial. I haven’t yet given you any examples of the kind of function that $\sigma$
might be, but perhaps you already have in mind a simple one like this:

$\sigma(\mathbf{p}, \mathbf{v}) = v_1 + \cdots + v_n.$

In words, the value of the whole is simply the sum of the values of the parts, regardless of their sizes. But if $\sigma$ is to have the “absent elements” property, this won’t do. (Intuitively, if $p_i = 0$ then we shouldn’t count $v_i$ in the sum, because the $i$th element isn’t actually there.) So we’d better modify this example slightly, instead taking

$\sigma(\mathbf{p}, \mathbf{v}) = \sum_{i \,:\, p_i \gt 0} v_i.$

This function (or rather, sequence of functions) *does* have the “absent elements” property.
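In code, this modified example is just a filtered sum; a minimal sketch (the name `sigma0` is mine, not from the post):

```python
def sigma0(p, v):
    """Value of the whole: sum of v_i over the elements actually present."""
    return sum(vi for pi, vi in zip(p, v) if pi > 0)

# The "absent elements" property: an element of size 0 contributes nothing,
# however large its value.
with_absent = sigma0([0.0, 0.5, 0.5], [100.0, 2.0, 3.0])
without     = sigma0([0.5, 0.5], [2.0, 3.0])
assert with_absent == without == 5.0
```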

**Continuity in positive probabilities** Finally, we ask that for
each $\mathbf{v} \in (0, \infty)^n$, the function $\sigma(-, \mathbf{v})$
is continuous on the interior of the simplex $\Delta_n$, that is,
continuous over those probability distributions
$\mathbf{p}$ such that $p_1, \ldots, p_n \gt 0$.

Why only over the *interior* of the simplex? Basically because of
natural examples of $\sigma$ like the one just given, which is continuous
on the interior of the simplex but not the boundary. Generally, it’s
sometimes useful to make a sharp, discontinuous distinction between the
cases $p_i \gt 0$ (presence) and $p_i = 0$ (absence).

Arrow’s famous theorem states that a few apparently mild conditions on a voting system are, in fact, mutually contradictory. The mild conditions above are not mutually contradictory. In fact, there’s a one-parameter family $\sigma_q$ of functions each of which satisfies these conditions. For real $q \neq 1$, the definition is

$\sigma_q(\mathbf{p}, \mathbf{v}) = \Bigl( \sum_{i \,:\, p_i \gt 0} p_i^q v_i^{1 - q} \Bigr)^{1/(1 - q)}.$

For instance, $\sigma_0$ is the example of $\sigma$ given above.

The formula for $\sigma_q$ is obviously invalid at $q = 1$, but it converges to a limit as $q \to 1$, and we define $\sigma_1(\mathbf{p}, \mathbf{v})$ to be that limit. Explicitly, this gives

$\sigma_1(\mathbf{p}, \mathbf{v}) = \prod_{i \,:\, p_i \gt 0} (v_i/p_i)^{p_i}.$
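As a quick numerical check that this product formula really is the $q \to 1$ limit (a sketch with made-up data):

```python
from math import prod

def sigma_q(q, p, v):
    """sigma_q for q != 1, over the elements with p_i > 0."""
    s = sum(pi**q * vi**(1 - q) for pi, vi in zip(p, v) if pi > 0)
    return s ** (1 / (1 - q))

def sigma_1(p, v):
    """The q -> 1 limit: a weighted geometric mean of the v_i / p_i."""
    return prod((vi / pi) ** pi for pi, vi in zip(p, v) if pi > 0)

p = [0.2, 0.3, 0.5]
v = [1.0, 4.0, 2.0]
# sigma_q approaches sigma_1 as q -> 1 from either side.
assert abs(sigma_q(0.999, p, v) - sigma_1(p, v)) < 1e-2
assert abs(sigma_q(1.001, p, v) - sigma_1(p, v)) < 1e-2
```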

In the same way, we can define $\sigma_{-\infty}$ and $\sigma_\infty$ as the appropriate limits:

$\sigma_{-\infty}(\mathbf{p}, \mathbf{v}) = \max_{i \,:\, p_i \gt 0} v_i/p_i, \qquad \sigma_{\infty}(\mathbf{p}, \mathbf{v}) = \min_{i \,:\, p_i \gt 0} v_i/p_i.$

And it’s easy to check that for each $q \in [-\infty, \infty]$, the function $\sigma_q$ satisfies all the natural conditions listed above.
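To make the family concrete, here is a small Python sketch (the name `sigma` is mine, not from the post) implementing $\sigma_q$ for all $q \in [-\infty, \infty]$ and spot-checking a few of the listed properties numerically:

```python
from math import prod, inf

def sigma(q, p, v):
    """The value sigma_q(p, v), for any q in [-inf, inf]."""
    pairs = [(pi, vi) for pi, vi in zip(p, v) if pi > 0]
    if q == 1:
        return prod((vi / pi) ** pi for pi, vi in pairs)
    if q == -inf:
        return max(vi / pi for pi, vi in pairs)
    if q == inf:
        return min(vi / pi for pi, vi in pairs)
    return sum(pi**q * vi**(1 - q) for pi, vi in pairs) ** (1 / (1 - q))

p, v = [0.2, 0.3, 0.5], [1.0, 4.0, 2.0]
for q in [-inf, -2, 0, 0.5, 1, 2, inf]:
    # Homogeneity: scaling every value by c scales the total by c.
    assert abs(sigma(q, p, [10 * vi for vi in v]) - 10 * sigma(q, p, v)) < 1e-9
    # Symmetry: reordering the elements doesn't change the total.
    assert abs(sigma(q, p[::-1], v[::-1]) - sigma(q, p, v)) < 1e-9
# Replication: n equal-sized elements of value 1 give total n.
assert abs(sigma(2, [0.25] * 4, [1.0] * 4) - 4.0) < 1e-9
```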

These functions $\sigma_q$ might be unfamiliar to you, but they have some special cases that are quite well-explored. In particular:

Suppose you’re in a situation where the elements don’t have “sizes”. Then it would be natural to take $\mathbf{p}$ to be the uniform distribution $\mathbf{u}_n = (1/n, \ldots, 1/n)$. In that case, $\sigma_q(\mathbf{u}_n, \mathbf{v}) = \mathrm{const} \cdot \bigl( \sum v_i^{1 - q} \bigr)^{1/(1 - q)},$ where the constant is a certain power of $n$. When $q \leq 0$, this is exactly a constant times $\|\mathbf{v}\|_{1 - q}$, the $(1 - q)$-norm of the vector $\mathbf{v}$.
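A quick numerical illustration of the norm connection, taking $q = -1$ (so $1 - q = 2$, the Euclidean norm), in which case the constant works out to $\sqrt{n}$; the data is made up:

```python
def sigma_q(q, p, v):
    """sigma_q for finite q != 1, over elements with p_i > 0."""
    s = sum(pi**q * vi**(1 - q) for pi, vi in zip(p, v) if pi > 0)
    return s ** (1 / (1 - q))

n = 4
u = [1 / n] * n                     # uniform distribution: no size data
v = [1.0, 2.0, 3.0, 4.0]

# For q = -1, sigma_q(u, v) is sqrt(n) times the 2-norm of v.
two_norm = sum(vi**2 for vi in v) ** 0.5
assert abs(sigma_q(-1, u, v) - n**0.5 * two_norm) < 1e-9
```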

Suppose you’re in a situation where the elements don’t have “values”. Then it would be natural to take $\mathbf{v}$ to be $\mathbf{1} = (1, \ldots, 1)$. In that case, $\sigma_q(\mathbf{p}, \mathbf{1}) = \bigl( \sum p_i^q \bigr)^{1/(1 - q)}.$ This is the quantity that ecologists know as the Hill number of order $q$ and use as a measure of biological diversity. Information theorists know it as the exponential of the Rényi entropy of order $q$, the special case $q = 1$ being Shannon entropy. And actually, the *general* formula for $\sigma_q$ is very closely related to Rényi relative entropy (which Wikipedia calls Rényi divergence).
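As a sketch of the special case just described (species abundances made up for illustration), the Hill number of order $q$ coincides with $\sigma_q(\mathbf{p}, \mathbf{1})$ and with the exponential of the Rényi entropy:

```python
from math import log, exp

def sigma_q(q, p, v):
    """sigma_q for finite q != 1, over elements with p_i > 0."""
    s = sum(pi**q * vi**(1 - q) for pi, vi in zip(p, v) if pi > 0)
    return s ** (1 / (1 - q))

p = [0.5, 0.3, 0.2]                 # species relative abundances
ones = [1.0] * len(p)

# Hill number of order q, written directly from its usual formula...
q = 2
hill = sum(pi**q for pi in p) ** (1 / (1 - q))
# ...equals sigma_q(p, 1), and also exp of the Renyi entropy of order q.
renyi = log(sum(pi**q for pi in p)) / (1 - q)
assert abs(sigma_q(q, p, ones) - hill) < 1e-12
assert abs(exp(renyi) - hill) < 1e-12
```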

Anyway, the big — and as far as I know, new — result is:

**Theorem** The functions $\sigma_q$ are the only functions $\sigma$ with the seven properties above.

So although the properties above don’t seem that demanding, they actually force our notion of “aggregate value” to be given by one of the functions in the family $(\sigma_q)_{q \in [-\infty, \infty]}$. And although I didn’t even mention the notions of diversity or entropy in my justification of the axioms, they come out anyway as special cases.

I covered all this yesterday in the tenth and penultimate installment of the functional equations course that I’m giving. It’s written up on pages 38–42 of the notes so far. There you can also read how this relates to more realistic measures of biodiversity than the Hill numbers. Plus, you can see an outline of the (quite substantial) proof of the theorem above.

## Re: Value

This looks fascinating! But how am I supposed to think of such a notion of value? Concretely, what does it mean that each element of the set has an assigned probability $p_i$? The obvious interpretation is that the element $i$ is actually contained in the set only with probability $p_i$, and that all these probabilities are independent. This would suggest that taking the expectation value $\sum_i p_i v_i$ should be a reasonable notion of value, but this contradicts the replication property (as you note explicitly).

Perhaps the answer is that you want diversity to be a value in itself? Do you have in mind a way to motivate this without talking about diversity measures?

Oh, and there’s a small typo in the definition of $\sigma_1$, where the subscript $i$ should be in the exponent.
