Value
Posted by Tom Leinster
What is the value of the whole in terms of the values of the parts?
More specifically, given a finite set whose elements have assigned “values” $v_1, \ldots, v_n$ and assigned “sizes” $p_1, \ldots, p_n$ (normalized to sum to $1$), how can we assign a value $\sigma(p, v)$ to the set in a coherent way?
This seems like a very general question. But in fact, just a few sensible requirements on the function $\sigma$ are enough to pin it down almost uniquely. And the answer turns out to be closely connected to existing mathematical concepts that you probably already know.
Let’s write
$$\Delta_n = \Bigl\{ (p_1, \ldots, p_n) \in \mathbb{R}^n : p_i \geq 0, \ \sum_{i=1}^n p_i = 1 \Bigr\}$$
for the set of probability distributions on $\{1, \ldots, n\}$. Assuming that our “values” are positive real numbers, we’re interested in sequences of functions
$$\sigma \colon \Delta_n \times (0, \infty)^n \to (0, \infty) \qquad (n \geq 1)$$
that aggregate the values of the elements to give a value to the whole set. So, if the elements of the set have relative sizes $p = (p_1, \ldots, p_n)$ and values $v = (v_1, \ldots, v_n)$, then the value assigned to the whole set is $\sigma(p, v)$.
Here are some properties that it would be reasonable for $\sigma$ to satisfy.
Homogeneity The idea is that whatever “value” means, the value of the set and the value of the elements should be measured in the same units. For instance, if the elements are valued in kilograms then the set should be valued in kilograms too. A switch from kilograms to grams would then multiply both values by 1000. So, in general, we ask that
$$\sigma(p, cv) = c\,\sigma(p, v)$$
for all $p \in \Delta_n$, $v \in (0, \infty)^n$ and $c \in (0, \infty)$.
Monotonicity The values of the elements are supposed to make a positive contribution to the value of the whole, so we ask that if $v_i \leq v'_i$ for all $i$ then
$$\sigma(p, v) \leq \sigma(p, v')$$
for all $p \in \Delta_n$.
Replication Suppose that our $n$ elements have the same size and the same value, $t$. Then the value of the whole set should be $nt$. This property says, among other things, that $\sigma$ isn’t an average: putting in more elements of value $t$ increases the value of the whole set!
If $\sigma$ is homogeneous, we might as well assume that $t = 1$, in which case the requirement is that
$$\sigma\bigl(u_n, (1, \ldots, 1)\bigr) = n,$$
where $u_n = (1/n, \ldots, 1/n) \in \Delta_n$ is the uniform distribution.
Modularity This one’s a basic logical axiom, best illustrated by an example.
Imagine that we’re very ambitious and wish to evaluate the entire planet — or at least, the part that’s land. And suppose we already know the values and relative sizes of every country.
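We could, of course, simply put this data into $\sigma$ and get an answer immediately. But we could instead begin by evaluating each continent, and then compute the value of the planet using the values and sizes of the continents. If $\sigma$ is sensible, this should give the same answer.
The notation needed to express this formally is a bit heavy. Let $w \in \Delta_n$; in our example, $n = 7$ (or however many continents there are) and $w = (w_1, \ldots, w_n)$ encodes their relative sizes. For each $i = 1, \ldots, n$, let $p^i \in \Delta_{k_i}$; in our example, $p^i$ encodes the relative sizes of the countries on the $i$th continent. Then we get a probability distribution
$$w \circ (p^1, \ldots, p^n) = \bigl(w_1 p^1_1, \ldots, w_1 p^1_{k_1}, \ \ldots, \ w_n p^n_1, \ldots, w_n p^n_{k_n}\bigr) \in \Delta_{k_1 + \cdots + k_n},$$
which in our example encodes the relative sizes of all the countries on the planet. (Incidentally, this composition makes $(\Delta_n)_{n \geq 1}$ into an operad, a fact that we’ve discussed many times before on this blog.) Also let
$$v^1 = (v^1_1, \ldots, v^1_{k_1}) \in (0, \infty)^{k_1}, \ \ldots, \ v^n = (v^n_1, \ldots, v^n_{k_n}) \in (0, \infty)^{k_n},$$
and write $v^1 \oplus \cdots \oplus v^n \in (0, \infty)^{k_1 + \cdots + k_n}$ for their concatenation.
In the example, $v^i_j$ is the value of the $j$th country on the $i$th continent. Then the value of the $i$th continent is $\sigma(p^i, v^i)$, so the axiom is that
$$\sigma\bigl(w \circ (p^1, \ldots, p^n), \, v^1 \oplus \cdots \oplus v^n\bigr) = \sigma\bigl(w, (\sigma(p^1, v^1), \ldots, \sigma(p^n, v^n))\bigr).$$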
The left-hand side is the value of the planet calculated in a single step, and the right-hand side is its value when calculated in two steps, with continents as the intermediate stage.
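To make the composition $w \circ (p^1, \ldots, p^n)$ concrete, here is a small Python sketch: scale each $p^i$ by $w_i$, then concatenate. The function name `compose` and the two-continent toy data are illustrative assumptions only; exact `Fraction` arithmetic is used so that the result visibly sums to $1$.

```python
from fractions import Fraction


def compose(w, ps):
    """Operadic composite w o (p^1, ..., p^n): scale each p^i by w_i, then concatenate."""
    if len(w) != len(ps):
        raise ValueError("need exactly one distribution p^i for each entry w_i")
    return [w_i * p_ij for w_i, p_i in zip(w, ps) for p_ij in p_i]


# Hypothetical toy data: two "continents", containing 2 and 3 "countries".
w = [Fraction(1, 4), Fraction(3, 4)]
ps = [
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
]

composite = compose(w, ps)
print(composite)       # [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]
print(sum(composite))  # 1
```

The output has $k_1 + k_2 = 5$ entries, one per country, and each entry is the country's share of its continent multiplied by the continent's share of the planet.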
Symmetry It shouldn’t matter what order we list the elements in. So it’s natural to ask that
$$\sigma(p, v) = \sigma(p\tau, v\tau)$$
for any $\tau$ in the symmetric group $S_n$, where the right-hand side refers to the obvious $S_n$-actions.
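Absent elements should count for nothing! In other words, if $p_n = 0$ then we should have
$$\sigma\bigl((p_1, \ldots, p_n), (v_1, \ldots, v_n)\bigr) = \sigma\bigl((p_1, \ldots, p_{n-1}), (v_1, \ldots, v_{n-1})\bigr).$$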
This isn’t quite trivial. I haven’t yet given you any examples of the kind of function that $\sigma$ might be, but perhaps you already have in mind a simple one like this:
$$\sigma(p, v) = v_1 + \cdots + v_n.$$
In words, the value of the whole is simply the sum of the values of the parts, regardless of their sizes. But if $\sigma$ is to have the “absent elements” property, this won’t do. (Intuitively, if $p_i = 0$ then we shouldn’t count $v_i$ in the sum, because the $i$th element isn’t actually there.) So we’d better modify this example slightly, instead taking
$$\sigma(p, v) = \sum_{i \colon p_i > 0} v_i.$$
This function (or rather, sequence of functions) does have the “absent elements” property.
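The difference between the two candidates shows up in a couple of lines of Python. This is a sketch only; the names `sigma_naive` and `sigma_0` are my own labels. Each function is compared on a distribution whose last entry is $0$ against its value once that entry is deleted.

```python
def sigma_naive(p, v):
    """Sum of all the values, ignoring the sizes p entirely."""
    return sum(v)


def sigma_0(p, v):
    """Sum of the values of the elements that are actually present (p_i > 0)."""
    return sum(v_i for p_i, v_i in zip(p, v) if p_i > 0)


p = [0.5, 0.5, 0.0]   # the third element is absent
v = [2.0, 3.0, 10.0]

# "Absent elements": deleting an entry with p_n = 0 should not change the value.
print(sigma_naive(p, v), sigma_naive(p[:-1], v[:-1]))  # 15.0 5.0  (property fails)
print(sigma_0(p, v), sigma_0(p[:-1], v[:-1]))          # 5.0 5.0   (property holds)
```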
Continuity in positive probabilities Finally, we ask that for each $v \in (0, \infty)^n$, the function $\sigma(-, v)$ is continuous on the interior of the simplex $\Delta_n$, that is, continuous over those probability distributions $p$ such that $p_1, \ldots, p_n > 0$.
Why only over the interior of the simplex? Basically because of natural examples of $\sigma$ like the one just given, which is continuous on the interior of the simplex but not on the boundary. More generally, it’s sometimes useful to make a sharp, discontinuous distinction between the cases $p_i > 0$ (presence) and $p_i = 0$ (absence).
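Here is that discontinuity in numbers, using the sum over present elements (re-defined so the snippet stands alone; the name `sigma_0` is my own). Along the path $p = (\varepsilon, 1 - \varepsilon)$ the value stays at $v_1 + v_2$ for every $\varepsilon > 0$, then drops to $v_2$ at $\varepsilon = 0$.

```python
def sigma_0(p, v):
    """Sum of the values of the elements with p_i > 0."""
    return sum(v_i for p_i, v_i in zip(p, v) if p_i > 0)


v = (2.0, 3.0)
for eps in (0.1, 1e-3, 1e-9):
    # Interior of the simplex: both elements are present, value is 5.0 each time.
    print(eps, sigma_0((eps, 1 - eps), v))
# Boundary of the simplex: the first element is absent.
print(0.0, sigma_0((0.0, 1.0), v))  # 0.0 3.0
```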
Arrow’s famous theorem states that a few apparently mild conditions on a voting system are, in fact, mutually contradictory. The mild conditions above are not mutually contradictory. In fact, there’s a one-parameter family $(\sigma_q)_{q \in [-\infty, \infty]}$ of functions each of which satisfies these conditions. For real $q \neq 1$, the definition is
$$\sigma_q(p, v) = \Bigl(\sum_{i \colon p_i > 0} p_i^q v_i^{1-q}\Bigr)^{1/(1-q)}.$$
For instance, $\sigma_0$ is the example of $\sigma$ given above.
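The formula for $\sigma_q$ is obviously invalid at $q = 1$, but it converges to a limit as $q \to 1$, and we define $\sigma_1(p, v)$ to be that limit. Explicitly, this gives
$$\sigma_1(p, v) = \prod_{i \colon p_i > 0} \Bigl(\frac{v_i}{p_i}\Bigr)^{p_i}.$$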
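Where does this limit come from? A sketch of the computation, writing $\varepsilon = 1 - q$ and using $p_i^q v_i^{1-q} = p_i (v_i/p_i)^{\varepsilon}$, the fact that $\sum_{i \colon p_i > 0} p_i = 1$, and the expansion $x^{\varepsilon} = 1 + \varepsilon \log x + O(\varepsilon^2)$:
$$\begin{aligned}
\log \sigma_q(p, v) &= \frac{1}{\varepsilon} \log \sum_{i \colon p_i > 0} p_i \Bigl(\frac{v_i}{p_i}\Bigr)^{\varepsilon} \\
&= \frac{1}{\varepsilon} \log \Bigl(1 + \varepsilon \sum_{i \colon p_i > 0} p_i \log \frac{v_i}{p_i} + O(\varepsilon^2)\Bigr) \\
&= \sum_{i \colon p_i > 0} p_i \log \frac{v_i}{p_i} + O(\varepsilon).
\end{aligned}$$
Letting $\varepsilon \to 0$ and exponentiating gives $\sigma_1(p, v) = \prod_{i \colon p_i > 0} (v_i/p_i)^{p_i}$.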
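In the same way, we can define $\sigma_{-\infty}$ and $\sigma_\infty$ as the appropriate limits:
$$\sigma_{-\infty}(p, v) = \max_{i \colon p_i > 0} \frac{v_i}{p_i}, \qquad \sigma_\infty(p, v) = \min_{i \colon p_i > 0} \frac{v_i}{p_i}.$$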
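Here is a minimal Python sketch of the whole family, assuming floating-point inputs, a genuine probability vector $p$ and every $v_i > 0$. The name `sigma` and its argument order are my own choices, not anything from the notes. The cases $q = 1$ and $q = \pm\infty$ use the limiting formulas rather than the general one.

```python
import math


def sigma(q, p, v):
    """Aggregate value sigma_q(p, v) for q in [-inf, inf].

    Assumes p is a probability vector and every v_i > 0. Only the elements
    with p_i > 0 are used, matching the "absent elements" property.
    """
    pairs = [(p_i, v_i) for p_i, v_i in zip(p, v) if p_i > 0]
    if q == math.inf:
        return min(v_i / p_i for p_i, v_i in pairs)
    if q == -math.inf:
        return max(v_i / p_i for p_i, v_i in pairs)
    if q == 1:
        # Limit as q -> 1: product of (v_i / p_i) ** p_i, computed through logs.
        return math.exp(sum(p_i * math.log(v_i / p_i) for p_i, v_i in pairs))
    total = sum(p_i ** q * v_i ** (1 - q) for p_i, v_i in pairs)
    return total ** (1 / (1 - q))


p = (0.5, 0.25, 0.25)
v = (1.0, 2.0, 4.0)
for q in (-math.inf, 0, 1 - 1e-6, 1, 1 + 1e-6, 2, math.inf):
    print(q, sigma(q, p, v))
```

With these inputs the ratios $v_i/p_i$ are $2, 8, 16$, so $\sigma_{-\infty} = 16$, $\sigma_\infty = 2$, $\sigma_0 = 1 + 2 + 4 = 7$ and $\sigma_1 = 2^{0.5} \cdot 8^{0.25} \cdot 16^{0.25} = 2^{2.25}$; the values at $q = 1 \pm 10^{-6}$ should agree with $\sigma_1$ to several decimal places. The infinite cases are special-cased because the general formula has no meaning there; pushing $|q|$ very large in the general formula instead also risks floating-point overflow.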
And it’s easy to check that for each $q \in [-\infty, \infty]$, the function $\sigma_q$ satisfies all the natural conditions listed above.
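The claim that each $\sigma_q$ satisfies these conditions can at least be spot-checked numerically. The sketch below is not a proof: it evaluates each property at random inputs (with a fixed seed) for a handful of values of $q$, up to floating-point tolerance. Continuity is not tested, and the helper names (`check_properties`, `random_distribution`, `random_values`, `close`) are my own.

```python
import math
import random


def sigma(q, p, v):
    """sigma_q(p, v) for q in [-inf, inf], using only elements with p_i > 0."""
    pairs = [(p_i, v_i) for p_i, v_i in zip(p, v) if p_i > 0]
    if q == math.inf:
        return min(v_i / p_i for p_i, v_i in pairs)
    if q == -math.inf:
        return max(v_i / p_i for p_i, v_i in pairs)
    if q == 1:
        return math.exp(sum(p_i * math.log(v_i / p_i) for p_i, v_i in pairs))
    total = sum(p_i ** q * v_i ** (1 - q) for p_i, v_i in pairs)
    return total ** (1 / (1 - q))


def close(a, b):
    return math.isclose(a, b, rel_tol=1e-9)


def random_distribution(k, rng):
    """A random point in the interior of the simplex Delta_k."""
    x = [rng.random() + 0.01 for _ in range(k)]
    s = sum(x)
    return [x_i / s for x_i in x]


def random_values(k, rng):
    return [rng.uniform(0.5, 5.0) for _ in range(k)]


def check_properties(q, rng):
    """Spot-check the properties at random inputs; raises AssertionError on failure."""
    p, v = random_distribution(4, rng), random_values(4, rng)
    base = sigma(q, p, v)
    # Homogeneity: scaling all values by c scales the total by c.
    assert close(sigma(q, p, [3.0 * x for x in v]), 3.0 * base)
    # Monotonicity: increasing every value cannot decrease the total.
    assert sigma(q, p, [x + 0.1 for x in v]) >= base
    # Replication: n equal-sized elements of value 1 are worth n.
    assert all(close(sigma(q, [1 / n] * n, [1.0] * n), n) for n in range(1, 8))
    # Symmetry: permuting p and v together changes nothing.
    perm = [2, 0, 3, 1]
    assert close(sigma(q, [p[i] for i in perm], [v[i] for i in perm]), base)
    # Absent elements: appending an element of size 0 changes nothing.
    assert close(sigma(q, p + [0.0], v + [7.0]), base)
    # Modularity: evaluate in one step, or via intermediate "continents".
    sizes = (2, 3, 1)
    w = random_distribution(len(sizes), rng)
    ps = [random_distribution(k, rng) for k in sizes]
    vs = [random_values(k, rng) for k in sizes]
    composite = [w_i * x for w_i, p_i in zip(w, ps) for x in p_i]
    one_step = sigma(q, composite, [x for v_i in vs for x in v_i])
    two_step = sigma(q, w, [sigma(q, p_i, v_i) for p_i, v_i in zip(ps, vs)])
    assert close(one_step, two_step)


rng = random.Random(0)
for q in (-math.inf, -1.0, 0.0, 0.5, 1.0, 2.0, math.inf):
    check_properties(q, rng)
print("all spot-checks passed")
```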
These functions might be unfamiliar to you, but they have some special cases that are quite well-explored. In particular:
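Suppose you’re in a situation where the elements don’t have “sizes”. Then it would be natural to take $p$ to be the uniform distribution $u_n = (1/n, \ldots, 1/n)$. In that case, for $q \neq 1$,
$$\sigma_q(u_n, v) = \text{const} \cdot \Bigl(\sum_{i=1}^n v_i^{1-q}\Bigr)^{1/(1-q)},$$
where the constant is a certain power of $n$ (namely $n^{q/(q-1)}$). When $q \leq 0$, this is exactly a constant times $\|v\|_{1-q}$, the $(1-q)$-norm of the vector $v$.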
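A quick numerical check of this, assuming real $q \neq 1$: for the uniform distribution every $p_i^q$ equals $n^{-q}$, which pulls out of the sum and then out of the $1/(1-q)$ power as $n^{-q/(1-q)} = n^{q/(q-1)}$. The helper names `sigma` and `lp_norm` are illustrative only.

```python
import math


def sigma(q, p, v):
    """sigma_q(p, v) for real q != 1 (the limiting cases are not needed here)."""
    total = sum(p_i ** q * v_i ** (1 - q) for p_i, v_i in zip(p, v) if p_i > 0)
    return total ** (1 / (1 - q))


def lp_norm(v, r):
    """(sum |v_i|^r)^(1/r); a genuine norm only when r >= 1."""
    return sum(abs(x) ** r for x in v) ** (1 / r)


v = [1.0, 2.0, 3.0, 4.0]
n = len(v)
u = [1 / n] * n  # the uniform distribution u_n

for q in (0.0, -0.5, -1.0, -2.0):
    const = n ** (q / (q - 1))  # the power of n in front
    print(q, math.isclose(sigma(q, u, v), const * lp_norm(v, 1 - q)))  # True
```

At $q = 0$ the constant is $1$ and this recovers the plain sum $\|v\|_1 = 10$. For $0 < q < 1$ the same identity holds algebraically, but $\|\cdot\|_{1-q}$ is no longer a norm, which is why the text restricts to $q \leq 0$.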
Suppose you’re in a situation where the elements don’t have “values”. Then it would be natural to take $v$ to be $(1, \ldots, 1)$. In that case,
$$\sigma_q\bigl(p, (1, \ldots, 1)\bigr) = \Bigl(\sum_{i \colon p_i > 0} p_i^q\Bigr)^{1/(1-q)}.$$
This is the quantity that ecologists know as the Hill number of order $q$ and use as a measure of biological diversity. Information theorists know it as the exponential of the Rényi entropy of order $q$, the special case $q = 1$ being Shannon entropy. And actually, the general formula for $\sigma_q$ is very closely related to Rényi relative entropy (which Wikipedia calls Rényi divergence).
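To see these identifications side by side, here is a sketch. It assumes $v = (1, \ldots, 1)$ as in the text, natural logarithms, and the standard formulas $H_q(p) = \frac{1}{1-q} \log \sum_i p_i^q$ for Rényi entropy and $H(p) = -\sum_i p_i \log p_i$ for Shannon entropy. The function names and the example community are my own.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in nats."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)


def renyi_entropy(q, p):
    """Renyi entropy of order q in nats, for real q != 1."""
    return math.log(sum(p_i ** q for p_i in p if p_i > 0)) / (1 - q)


def hill_number(q, p):
    """sigma_q(p, (1, ..., 1)), the Hill number of order q, for q in [-inf, inf]."""
    p = [p_i for p_i in p if p_i > 0]  # absent species are ignored
    if q == math.inf:
        return 1 / max(p)
    if q == -math.inf:
        return 1 / min(p)
    if q == 1:
        return math.exp(shannon_entropy(p))
    return sum(p_i ** q for p_i in p) ** (1 / (1 - q))


p = [0.5, 0.25, 0.25, 0.0]  # a hypothetical community with one absent species
print(hill_number(0, p))    # 3.0: the number of species present
for q in (0.5, 2.0, 3.0):
    print(q, math.isclose(hill_number(q, p), math.exp(renyi_entropy(q, p))))  # True
print(math.isclose(hill_number(1, p), 2 ** 1.5))  # True: exp(Shannon entropy)
```

The order-$0$ Hill number simply counts the species present, while $q = \infty$ gives $1/\max_i p_i$, which depends only on the most abundant species.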
Anyway, the big — and as far as I know, new — result is:
Theorem The functions $\sigma_q$ ($q \in [-\infty, \infty]$) are the only functions $\sigma$ with the seven properties above.
So although the properties above don’t seem that demanding, they actually force our notion of “aggregate value” to be given by one of the functions in the family $(\sigma_q)_{q \in [-\infty, \infty]}$. And although I didn’t even mention the notions of diversity or entropy in my justification of the axioms, they come out anyway as special cases.
I covered all this yesterday in the tenth and penultimate installment of the functional equations course that I’m giving. It’s written up on pages 38–42 of the notes so far. There you can also read how this relates to more realistic measures of biodiversity than the Hill numbers. Plus, you can see an outline of the (quite substantial) proof of the theorem above.
Re: Value
This looks fascinating! But how am I supposed to think of such a notion of value? Concretely, what does it mean that each element $i$ of the set has an assigned probability $p_i$? The obvious interpretation is that the element $i$ is actually contained in the set only with probability $p_i$, and that all these probabilities are independent. This would suggest that taking the expectation value $\sum_i p_i v_i$ should be a reasonable notion of value, but this contradicts the replication property (as you note explicitly).
Perhaps the answer is that you want diversity to be a value in itself? Do you have in mind a way to motivate this without talking about diversity measures?
Oh, and there’s a small typo in the definition of , where the subscript should be in the exponent.
A beautiful pun!