What is the value of the whole in terms of the values of the parts?

More specifically, given a finite set whose elements have assigned
“values” $v_1, \ldots, v_n$ and assigned “sizes” $p_1, \ldots, p_n$ (normalized to sum to $1$), how can
we assign a value $\sigma(\mathbf{p}, \mathbf{v})$ to the set in a
coherent way?

This seems like a very general question. But in fact, just a few sensible
requirements on the function $\sigma$ are enough to pin it down almost
uniquely. And the answer turns out to be closely connected to
existing mathematical concepts that you probably already know.

Let’s write

$\Delta_n = \Bigl\{ (p_1, \ldots, p_n) \in \mathbb{R}^n :
p_i \geq 0, \sum p_i = 1 \Bigr\}$

for the set of probability distributions on $\{1, \ldots, n\}$. Assuming that our
“values” are positive real numbers, we’re interested in sequences of
functions

$\Bigl(
\sigma \colon \Delta_n \times (0, \infty)^n \to (0, \infty)
\Bigr)_{n \geq 1}$

that aggregate the values of the elements to give a value to the whole
set. So, if the elements of the set have relative sizes $\mathbf{p} =
(p_1, \ldots, p_n)$ and values $\mathbf{v} = (v_1, \ldots, v_n)$, then the
value assigned to the whole set is $\sigma(\mathbf{p}, \mathbf{v})$.

Here are some properties that it would be reasonable for $\sigma$ to
satisfy.

**Homogeneity** The idea is that whatever “value” means, the value of
the set and the value of the elements should be measured in the same
units. For instance, if the elements are valued in kilograms then the set
should be valued in kilograms too. A switch from kilograms to grams would then
multiply both values by 1000. So, in general, we ask that

$\sigma(\mathbf{p}, c\mathbf{v})
=
c \sigma(\mathbf{p}, \mathbf{v})$

for all $\mathbf{p} \in \Delta_n$, $\mathbf{v} \in (0, \infty)^n$ and $c
\in (0, \infty)$.

**Monotonicity** The values of the elements are supposed to make a
*positive* contribution to the value of the whole, so we ask that if
$v_i \leq v'_i$ for all $i$ then

$\sigma(\mathbf{p}, \mathbf{v}) \leq \sigma(\mathbf{p}, \mathbf{v}')$

for all $\mathbf{p} \in \Delta_n$.

**Replication** Suppose that our $n$ elements have the same size and
the same value, $v$. Then the value of the whole set should be $n v$.
This property says, among other things, that $\sigma$ isn’t an *average*: putting in more
elements of value $v$ increases the value of the whole set!

If $\sigma$ is homogeneous, we might as well assume that $v =
1$, in which case the requirement is that

$\sigma\bigl( (1/n, \ldots, 1/n), (1, \ldots, 1) \bigr) = n.$

**Modularity** This one’s a basic logical axiom, best illustrated by
an example.

Imagine that we’re very ambitious and wish to evaluate
the entire planet — or at least, the part that’s land. And suppose
we already know the values and relative sizes of every country.

We could, of course, simply put this data into $\sigma$ and get an answer immediately.
But we could instead begin by evaluating each *continent*, and then
compute the value of the planet using the values and sizes of the
continents. If $\sigma$ is sensible, this should give the same answer.

The notation needed to express this formally is a bit heavy. Let
$\mathbf{w} \in \Delta_n$; in our example, $n = 7$ (or however many continents there are) and $\mathbf{w} = (w_1, \ldots, w_7)$ encodes their
relative sizes. For each $i = 1, \ldots, n$, let $\mathbf{p}^i \in
\Delta_{k_i}$; in our example, $\mathbf{p}^i$ encodes the relative sizes of the
countries on the $i$th continent. Then we get a probability distribution

$\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n)
=
(w_1 p^1_1, \ldots, w_1 p^1_{k_1},
\,\,\ldots, \,\,
w_n p^n_1, \ldots, w_n p^n_{k_n})
\in
\Delta_{k_1 + \cdots + k_n},$

which in our example encodes the relative sizes of all the countries on the
planet. (Incidentally, this composition makes $(\Delta_n)$ into an operad,
a fact that we’ve discussed many times
before
on this blog.) Also let

$\mathbf{v}^1 = (v^1_1, \ldots, v^1_{k_1}) \in (0, \infty)^{k_1},
\,\,\ldots,\,\,
\mathbf{v}^n = (v^n_1, \ldots, v^n_{k_n}) \in (0, \infty)^{k_n}.$

In the example, $v^i_j$ is the value of the $j$th country on the $i$th
continent. Then the value of the $i$th continent is $\sigma(\mathbf{p}^i, \mathbf{v}^i)$, so the axiom is that

$\sigma
\bigl(
\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n),
(v^1_1, \ldots, v^1_{k_1}, \ldots, v^n_1, \ldots, v^n_{k_n})
\bigr)
=
\sigma \Bigl( \mathbf{w},
\bigl( \sigma(\mathbf{p}^1, \mathbf{v}^1), \ldots, \sigma(\mathbf{p}^n,
\mathbf{v}^n) \bigr)
\Bigr).$

The left-hand side is the value of the planet calculated in a single step,
and the right-hand side is its value when calculated in two steps,
with continents as the intermediate stage.
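If you like checking such things by machine, here's a quick Python sketch (mine, not part of the post): it implements the composition $\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n)$ and verifies both sides of the modularity equation for the simplest candidate aggregator, the plain sum of values. (All the names and numbers here are made up for illustration.)

```python
def compose(w, ps):
    """The operadic composition w o (p^1, ..., p^n): scale each
    block p^i by its weight w_i and concatenate the results."""
    return [wi * pij for wi, pi in zip(w, ps) for pij in pi]

def naive_sigma(p, v):
    # the simplest candidate aggregator: total value, ignoring sizes
    return sum(v)

w = [0.5, 0.3, 0.2]                      # sizes of three "continents"
ps = [[0.6, 0.4], [1.0], [0.25, 0.75]]   # country sizes within each
vs = [[2.0, 3.0], [5.0], [1.0, 4.0]]     # country values within each

# left-hand side: evaluate the whole planet in one step
flat_v = [vij for vi in vs for vij in vi]
lhs = naive_sigma(compose(w, ps), flat_v)

# right-hand side: evaluate each continent first, then the planet
rhs = naive_sigma(w, [naive_sigma(pi, vi) for pi, vi in zip(ps, vs)])

assert abs(lhs - rhs) < 1e-12            # modularity holds for this sigma
```

(The plain sum passes modularity, though as we'll see in a moment it fails a different axiom.)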

**Symmetry** It shouldn’t matter what order we list the elements
in. So it’s natural to ask that

$\sigma(\mathbf{p}, \mathbf{v})
=
\sigma(\mathbf{p} \tau, \mathbf{v} \tau)$

for any $\tau$ in the symmetric group $S_n$, where the right-hand side
refers to the obvious $S_n$-actions.

**Absent elements** should count for nothing! In other words, if $p_1 = 0$
then we should have

$\sigma\bigl( (p_1, \ldots, p_n), (v_1, \ldots, v_n)\bigr)
=
\sigma\bigl( (p_2, \ldots, p_n), (v_2, \ldots, v_n)\bigr).$

This isn’t *quite* trivial.

I haven’t yet given you any examples of the kind of function that $\sigma$
might be, but perhaps you already have in mind a simple one like this:

$\sigma(\mathbf{p}, \mathbf{v}) = v_1 + \cdots + v_n.$

In words, the value of the whole is simply the sum of the values of the
parts, regardless of their sizes. But if $\sigma$ is to have the “absent
elements” property, this won’t do. (Intuitively, if $p_i = 0$ then we
shouldn’t count $v_i$ in the sum, because the $i$th element isn’t actually
there.) So we’d better modify this example slightly, instead taking

$\sigma(\mathbf{p}, \mathbf{v}) = \sum_{i \,:\, p_i \gt 0} v_i.$

This function (or rather, sequence of functions) *does* have the “absent elements” property.
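As a machine-checkable sanity check (a sketch of my own, not code from the post), here's that modified example in Python, together with a verification of the "absent elements" property:

```python
def sigma0(p, v):
    # sum the values of the elements that are actually present (p_i > 0)
    return sum(vi for pi, vi in zip(p, v) if pi > 0)

# dropping a zero-probability element leaves the value unchanged
assert sigma0([0.0, 0.5, 0.5], [10.0, 2.0, 3.0]) == sigma0([0.5, 0.5], [2.0, 3.0])
```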

**Continuity in positive probabilities** Finally, we ask that for
each $\mathbf{v} \in (0, \infty)^n$, the function $\sigma(-, \mathbf{v})$
is continuous on the interior of the simplex $\Delta_n$, that is,
continuous over those probability distributions
$\mathbf{p}$ such that $p_1, \ldots, p_n \gt 0$.

Why only over the *interior* of the simplex? Basically because of
natural examples of $\sigma$ like the one just given, which is continuous
on the interior of the simplex but not the boundary. Generally, it’s
sometimes useful to make a sharp, discontinuous distinction between the
cases $p_i \gt 0$ (presence) and $p_i = 0$ (absence).

Arrow’s famous
theorem
states that a few apparently mild conditions on a voting system are, in
fact, mutually contradictory. The mild conditions above are not mutually
contradictory. In fact, there’s a one-parameter family $\sigma_q$ of
functions each of which satisfies these conditions. For real $q \neq 1$,
the definition is

$\sigma_q(\mathbf{p}, \mathbf{v})
=
\Bigl( \sum_{i \,:\, p_i \gt 0} p_i^q v_i^{1 - q} \Bigr)^{1/(1 - q)}.$

For instance, $\sigma_0$ is the example of $\sigma$ given above.
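Here's a small Python sketch of $\sigma_q$ (again mine, not from the post), spot-checking that $\sigma_0$ agrees with the "sum of the present values" example, and that a couple of the axioms hold:

```python
def sigma_q(p, v, q):
    """sigma_q(p, v) = (sum_{i : p_i > 0} p_i^q v_i^(1-q))^(1/(1-q)),
    for real q != 1."""
    s = sum(pi**q * vi**(1 - q) for pi, vi in zip(p, v) if pi > 0)
    return s ** (1 / (1 - q))

p = [0.2, 0.3, 0.5]
v = [1.0, 4.0, 2.0]

# q = 0 recovers the "sum of the present values" example above
assert abs(sigma_q(p, v, 0) - sum(v)) < 1e-12

# replication: n equal-sized elements of value 1 aggregate to n
n = 5
assert abs(sigma_q([1 / n] * n, [1.0] * n, 2) - n) < 1e-12

# homogeneity: scaling all the values by c scales the whole by c
c = 3.0
assert abs(sigma_q(p, [c * vi for vi in v], 2) - c * sigma_q(p, v, 2)) < 1e-12
```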

The formula for $\sigma_q$ is obviously invalid at $q = 1$, but it converges to a limit as $q
\to 1$, and we define $\sigma_1(\mathbf{p}, \mathbf{v})$ to be that limit.
Explicitly, this gives

$\sigma_1(\mathbf{p}, \mathbf{v})
=
\prod_{i \,:\, p_i \gt 0} (v_i/p_i)^{p_i}.$

In the same way, we can define $\sigma_{-\infty}$ and $\sigma_\infty$ as
the appropriate limits:

$\sigma_{-\infty}(\mathbf{p}, \mathbf{v})
=
\max_{i \,:\, p_i \gt 0} v_i/p_i,
\qquad
\sigma_{\infty}(\mathbf{p}, \mathbf{v})
=
\min_{i \,:\, p_i \gt 0} v_i/p_i.$
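These limits can be seen numerically too. A quick sketch of my own (the convergence to $\min$ and $\max$ is slow, so moderate values of $q$ only get within a few percent):

```python
import math

def sigma_q(p, v, q):
    # sigma_q for real q != 1, summing only over elements with p_i > 0
    s = sum(pi**q * vi**(1 - q) for pi, vi in zip(p, v) if pi > 0)
    return s ** (1 / (1 - q))

def sigma_1(p, v):
    # the q = 1 case, defined as the limit of sigma_q as q -> 1
    return math.prod((vi / pi) ** pi for pi, vi in zip(p, v) if pi > 0)

p = [0.2, 0.3, 0.5]
v = [1.0, 4.0, 2.0]
ratios = [vi / pi for pi, vi in zip(p, v)]

# q -> 1: the generic formula approaches the product formula
assert abs(sigma_q(p, v, 1 + 1e-7) - sigma_1(p, v)) < 1e-4

# q -> +infinity gives min(v_i / p_i); q -> -infinity gives max
assert abs(sigma_q(p, v, 40.0) / min(ratios) - 1) < 0.05
assert abs(sigma_q(p, v, -40.0) / max(ratios) - 1) < 0.05
```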

And it’s easy to check that for each $q \in [-\infty, \infty]$, the
function $\sigma_q$ satisfies all the natural conditions listed above.

These functions $\sigma_q$ might be unfamiliar to you, but they have some
special cases that are quite well-explored. In particular:

Suppose you’re in a situation where the elements don’t have “sizes”.
Then it would be natural to take $\mathbf{p}$ to be the uniform
distribution $\mathbf{u}_n = (1/n, \ldots, 1/n)$. In that case,
$\sigma_q(\mathbf{u}_n, \mathbf{v})
= \text{const} \cdot \bigl( \sum v_i^{1 - q} \bigr)^{1/(1 - q)},$
where the constant is a certain power of $n$. When $q \leq 0$, this is
exactly a constant times $\|\mathbf{v}\|_{1 - q}$, the $(1 -
q)$-norm
of the vector $\mathbf{v}$.
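For instance (a numerical sketch of mine, not from the post), taking $q = -1$ so that $1 - q = 2$, the constant is $n^{-q/(1-q)} = n^{1/2}$ and $\sigma_{-1}(\mathbf{u}_n, \mathbf{v})$ is $\sqrt{n}$ times the Euclidean norm:

```python
def sigma_q(p, v, q):
    s = sum(pi**q * vi**(1 - q) for pi, vi in zip(p, v) if pi > 0)
    return s ** (1 / (1 - q))

v = [3.0, 4.0]
n = len(v)
u = [1 / n] * n                         # the uniform distribution
q = -1.0                                # so 1 - q = 2: the 2-norm

norm2 = sum(vi**2 for vi in v) ** 0.5   # ||v||_2 = 5
const = n ** (-q / (1 - q))             # the promised power of n
assert abs(sigma_q(u, v, q) - const * norm2) < 1e-9
```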

Suppose you’re in a situation where the elements don’t have “values”.
Then it would be natural to take $\mathbf{v}$ to be $\mathbf{1} = (1,
\ldots, 1)$. In that case,
$\sigma_q(\mathbf{p}, \mathbf{1})
=
\bigl( \sum p_i^q \bigr)^{1/(1 - q)}.$
This is the quantity that ecologists know as the Hill number of order
$q$
and use as a measure of biological diversity. Information theorists know
it as the exponential of the Rényi
entropy of order $q$,
the special case $q = 1$ being Shannon entropy. And actually, the *general* formula for $\sigma_q$ is very closely related to Rényi relative entropy (which Wikipedia calls Rényi divergence).
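To make the diversity interpretation concrete, here's a short sketch of my own computing Hill numbers (the species proportions below are invented):

```python
import math

def hill(p, q):
    """Hill number of order q: sigma_q(p, 1) = (sum p_i^q)^(1/(1-q)),
    with the q = 1 case taken as the limit exp(Shannon entropy)."""
    p = [pi for pi in p if pi > 0]
    if q == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi**q for pi in p) ** (1 / (1 - q))

# a perfectly even community of 4 species has diversity 4 at every order
even = [0.25] * 4
assert all(abs(hill(even, q) - 4) < 1e-9 for q in (0, 0.5, 1, 2))

# q = 0 just counts species; higher q increasingly discounts rare ones
p = [0.7, 0.1, 0.1, 0.1]
assert hill(p, 0) == 4.0
assert hill(p, 2) < hill(p, 1) < hill(p, 0)
```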

Anyway, the big — and as far as I know, new — result is:

**Theorem** *The functions $\sigma_q$ are the only functions
$\sigma$ with the seven properties above.*

So although the properties above don’t seem that demanding, they actually
force our notion of “aggregate value” to be given by one of the functions
in the family $(\sigma_q)_{q \in [-\infty, \infty]}$. And although I
didn’t even mention the notions of diversity or entropy in my justification
of the axioms, they come out anyway as special cases.

I covered all this yesterday in the tenth and penultimate installment of the
functional equations course that I’m giving. It’s written up on
pages 38–42 of the notes so
far. There you can also read
how this relates to more realistic measures of
biodiversity
than the Hill numbers. Plus, you can see an outline of the (quite
substantial) proof of the theorem above.