### Means

#### Posted by Tom Leinster

I’ve been interested for a while in abstract notions of size, and this has got me thinking about abstract notions of measure—hence, inescapably, of integration. Now integration can be thought of as an averaging process: for instance, the mean value of a function $f: [0, 1] \to \mathbf{R}$ is simply $\int_0^1 f$. So means provide another way to think about measure and integral (and probability, indeed).

For most types of integration there are restrictions on the functions involved:
you can only integrate an *integrable* function. But I’m going to
describe three slightly unusual settings in which the challenge is to find a
notion of integration, or mean, in which *every* function is allowed.
Here they are:

- The mean of binary digits
- Amenable groups
- Arrow’s Theorem on voting systems.

Then there’s a postscript on voting and enriched categories.

### The mean of binary digits

Let’s start with something really minimal and structureless. Fix a set $X$. Is there any meaningful way of assigning to each function $f: X \to \{0, 1\}$ a ‘mean value’ $\mu(f) \in \{0, 1\}$?

It depends what mean means. Certainly the mean of a constant function should be that constant, so the first axiom is:

a.if $f$ has constant value $s$ then $\mu(f) = s$ (where $s \in \{0, 1\}$)

Means are usually linear, so we now want to say something like ‘$\mu(f + g) = \mu(f) + \mu(g)$’. We can’t necessarily add together binary digits (since $1 + 1 = 2$), but it’s OK if $f$ and $g$ have disjoint supports. So the second and final axiom is:

b.if $f, g: X \to \{0, 1\}$ and there is no $x \in X$ for which $f(x) = g(x) = 1$ then $\mu(f + g) = \mu(f) + \mu(g)$.

A **mean** on $X$ is a function $\mu: \{0, 1\}^X \to \{0, 1\}$ satisfying
these two axioms.

There are lots of equivalent formulations of this concept.

For instance, $\{0,
1\}$ is a Boolean algebra, so $\{0, 1\}^X$ is too, and you can equivalently
replace **b** with the superficially stronger axiom that $\mu$ preserves the
whole Boolean algebra structure:
$\begin{aligned}
f \leq g &\implies \mu(f) \leq \mu(g); \\
\mu(f \vee g) &= \mu(f) \vee \mu(g); \\
\mu(f \wedge g) &= \mu(f) \wedge \mu(g); \\
\mu(1 - f) &= 1 - \mu(f)
\end{aligned}$
where $\vee = max$ and $\wedge = min$.

Or, you can switch from integration to measure. It’s perhaps unfortunate that
the letter $\mu$ is commonly used both for mean and for measure, but I’ll take
advantage of this coincidence. For subsets $Y$ of $X$, I’ll write
$\mu(Y)$ to mean $\mu(\chi_Y)$, where $\chi_Y$ is the characteristic function
of $Y$. Writing $P$ for powerset,
$\mu$ then becomes a map $P(X) \to \{0, 1\}$.
Axiom **a** then says that $\mu(\emptyset) = 0$ and $\mu(X) = 1$, and axiom
**b** can be stated as the inclusion-exclusion principle:
$\mu(Y \cup Z) = \mu(Y) + \mu(Z) - \mu(Y \cap Z)$
($Y, Z \subseteq X$). Such a $\mu$ is also called a **valuation**.

Or, since the mean value is restricted to lie in $\{0, 1\}$, you might as well
just record those subsets $Y \subseteq X$ for which $\mu(Y) = 1$. Let’s write
$\mathcal{U} = \{ Y \subseteq X: \mu(Y) = 1\},$
and say that a subset $Y \subseteq X$ is **large** if $Y \in
\mathcal{U}$, and **small** if $X \setminus Y \in \mathcal{U}$. Axioms
**a** and **b** are then equivalent to:

- every subset of $X$ is either large or small, but not both
- every subset of a small set is small
- a finite union of small sets is small.

A set $\mathcal{U}$ of subsets of $X$ with these three
properties is called an **ultrafilter** on $X$. So, means are the same as
ultrafilters.

In all three of the situations that I’ll describe, the basic question is this:

Do nontrivial means exist?

In the current situation, ‘nontrivial’ means the following. There is, for each
$x \in X$, a trivial mean $\mu$ given by
$\mu(f) = f(x).$
If you’re thinking of $f$ as a function $X \to \{0, 1\}$, this is evaluation at
$x$. If you’re thinking of $f$ as an element of the product set $\{0, 1\}^X =
\{0, 1\} \times \{0, 1\} \times \cdots$, it’s projection onto the $x$th
factor. Regarding $\mu$ as a measure, it’s the point measure, or Dirac delta,
at $x$. Regarding $\mu$ as an ultrafilter, it’s the
**principal ultrafilter** at
$x$: a subset is large if and only if it contains $x$.

It’s an easy exercise to show that when $X$ is finite, these are the
*only* means. When $X$ is infinite, the situation is more interesting.
There *are* nontrivial means, but you can’t construct any: you have to
invoke some form of the Axiom of Choice. (I’ll assume Choice throughout.) So
the answer to the question is:

No if $X$ is finite; yes if $X$ is infinite.

### Amenable groups

Now let’s try the same thing, but giving $X$ the structure of a group. The mean should respect the group structure in the following sense. Given $x \in X$, write $\begin{aligned} \tau_x: X& \to &X \\ y & \mapsto &x y \end{aligned}$ for left translation by $X$. Then we require

c.$\mu(f \circ \tau_x) = \mu(f)$, for all $f: X \to \{ 0, 1 \}$ and $x \in X$.

Thinking in terms of measure, this says that the measure of a subset is the same as the measure of each of its translates.

But there’s an instant problem. For example, taking $X = \mathbf{Z}$, we have $\mu(2\mathbf{Z}) = \mu(1 + 2\mathbf{Z})$ —the even and odd numbers have the same measure. But the disjoint union of $2\mathbf{Z}$ and $1 + 2\mathbf{Z}$ is $\mathbf{Z}$, which has measure $1$. So $2\mu(2\mathbf{Z}) = 1$. Similarly, $n\mu(n\mathbf{Z}) = 1$ for any positive integer $n$, and if $X$ is a finite group of order $n$ then $n\mu(\{x\}) = 1$ for all $x \in X$. All of these things are impossible if mean values are restricted to lie in $\{0, 1\}$.

So, we make the question more interesting by allowing $\mu$ to take values in
the real interval $[0, 1]$. Precisely: a **mean** on a group $X$ is a
function
$\mu: \{0, 1\}^X \to [0, 1]$
satisfying axioms **a**, **b** and **c**. A group is **amenable**
if it admits at least one mean. (There are no ‘trivial’
means to be excluded.)

The word ‘amenable’ is kind of a pun. In ordinary English the word means ‘willing’ or ‘easy to work with’. But said out loud (by me, at least) it sounds like ‘a-mean-able’.

(You might think that there should be separate definitions of *left
amenable* and *right amenable*, because of the asymmetry in the
definition above. But the two are equivalent. Any group $X$ is isomorphic
to its opposite $X^{op}$, the same set with the multiplication reversed. Hence
$X$ is left amenable iff $X^{op}$ is left amenable. On the other hand,
$X^{op}$ is left amenable iff $X$ is right amenable.)

The theory of amenable groups has been intensively studied. There are a zillion equivalent ways of stating the definition, and there are close and fascinating links to all sorts of topics in algebra, topology and analysis. Here I’ll just say a few simple things.

Every finite group is amenable. In fact, there is exactly one mean on a finite group: it’s the mean in the obvious sense.

Every abelian group is amenable. This is not at all obvious, as far as I know. If you haven’t seen this before, I encourage you to stop and think for a few minutes about how you might prove that $\mathbf{Z}$ is amenable: how would you define a mean, or a measure?

…

From the arguments above, the measure needs to be set up so that any coset of $n\mathbf{Z}$ has measure $1/n$. What about, say, the set of all primes in $\mathbf{Z}$? You’d imagine that it would probably have measure $0$, since the primes get sparser and sparser, but that’s not so clear.

The standard way to ‘construct’ a measure, which I won’t describe, uses the concept of Følner sequence. It’s a non-constructive definition. Among other things, it involves choosing a nonprincipal ultrafilter on the set $\mathbf{N}$—in other words, a nontrivial mean on $\mathbf{N}$, in the sense of the first part above.

That’s just $\mathbf{Z}$. Showing that *every* abelian group is amenable
usually seems to be proved by bootstrapping your way up from $\mathbf{Z}$ to
finitely generated abelian groups, and from there to all abelian groups. At
one point I tried to discover whether the proof could be streamlined, and made
a bit of progress, thanks to nice
people at MathOverflow.

The free group on two or more elements is *not* amenable. It’s a pleasant
exercise to show this.

There are some groups whose status is famously unknown. This is so for Thompson’s group $F$. Two papers on this question were arXived last year: one claiming to prove that $F$ is amenable, and the other claiming to prove that it is not. Danny Calegari had an excellent blog post on this affair. We also discussed it a bit here at the Café.

So in this second situation, the existence of means is a very substantial issue.

### Arrow’s Theorem

So far we’ve discussed means of *numbers*. Now we’re going to discuss
means of *structures*. Given a set $C$ and a bunch of group structures on
$C$, what is the mean group structure? Given a bunch of topologies on $C$,
what is the mean topology?

This might sound like madness. But there are some structures for which it is a reasonable question. Arrow’s Theorem concerns the case of total orders.

Here’s a standard way of narrating Arrow’s Theorem. In an election, there are a finite set $X$ of voters and a finite set $C$ of candidates. Each voter chooses a total order on $C$. The outcome of the election is also to be a total order on $C$. Is there any reasonable system for deciding the outcome on the basis of the votes? Arrow’s Theorem says ‘no’. And it’s a very strong ‘no’, in that the interpretation of the word ‘reasonable’ is dramatically generous.

When we come to define ‘reasonable’, there will be very few concessions to equality, either between voters or between candidates. Perhaps men’s votes count for more than women’s, or the government has used its power to rig the system in its favour. But it needn’t be anything reprehensible. Perhaps the voters are the members of a committee and the chair has a casting vote, so that not all voters are equal. Or perhaps the vote is on various options for constitutional change, and the system gives preference to the status quo.

You can find plenty of expositions of Arrow’s Theorem, but what follows is one that I haven’t seen before, using categorical language.

Fix finite sets $X$ (the voters) and $C$ (the candidates). Given a set $A$,
write $T(A)$ for the set of total orders on $A$. A
**mean** or **voting system** for $X$ and $C$ is a function
$\mu: T(C)^X \to T(C)$
satisfying axioms **i** and **ii** below.

It will be convenient to write $X = \{1, \ldots, n\}$, so that an element of $T(C)^X = T(C)^n$ may be written as $(\leq_1, \ldots, \leq_n)$ where each $\leq_i$ is a total order on the set $C$ of candidates.

Total orders have the following key property: when you choose a total order on $A$, you also implicitly choose a total order on each subset of $A$. So for any subset $B \subseteq A$, there is a restriction map $T(A) \to T(B)$. This makes $T$ into a functor $T: P(C)^{op} \to Set,$ where $P(C)$ is the powerset of $C$, ordered by inclusion.

First axiom:

i.$\mu$ extends to a natural transformation $\begin{aligned} &\stackrel{T(-)^n}{\to} & \\ P(C)^{op} &\Downarrow \mu_\bullet & Set. \\ &\stackrel{T\quad}{\to} & \end{aligned}$

In other words, there is a natural transformation $\mu_\bullet: T(-)^n \to T$ whose $C$-component is $\mu: T(C)^n \to T(C)$. It’s not hard to show that there can be at most one such extension. (Use the fact that any total order on a subset of $C$ can be extended to a total order on $C$ itself.)

But what does this *mean*? In its traditional formulation, this axiom
rejoices in the name of the Independence
of Irrelevant Alternatives. (It’s a wonderful name, though in my opinion
it’s hard to beat the splendour of
Sylvester’s
Law of Inertia.)
Suppose there is an election with three candidates: $C = \{ a, b, c \}$.
For some reason the election has to be re-run, with the same voters
and the same candidates. In between the two elections, candidates $a$ and $b$
do nothing interesting, but candidate $c$ makes outrageous
statements. The effect is that in the second election, no voter changes their
opinion on whether $a$ is better than $b$, but some voters change their
opinion about $c$. Independence of Irrelevant Alternatives says that in the
outcomes, $a$ and
$b$ are placed in the same
order both times. (Candidate $c$ is the ‘irrelevant alternative’.)

Briefly put: the axiom says that the voting system is compatible with restricting to subsets of the candidates. This is why it has something to do with naturality.

There is a canonical natural transformation $\begin{aligned} &\stackrel{T(-)^n}{\to} & \\ P(C)^{op} &\Uparrow \Delta & Set \\ &\stackrel{T\quad}{\to} & \end{aligned}$ —the diagonal. Its component $\Delta_A: T(A) \to T(A)^n$ at $A \subseteq C$ sends $\leq \in T(A)$ to $(\leq, \ldots, \leq) \in T(A)^n$.

Second axiom: if everyone puts the candidates in the same order, then the outcome of the election also puts the candidates in that order. That is:

ii.$\mu_\bullet \circ \Delta = 1_T$.

This is not the axiomatization of voting systems to be found in most statements of Arrow’s Theorem, but unless I’m mistaken, it’s equivalent.

There are trivial means or voting systems: dictatorships. For any $x \in X = \{1, \ldots, n\}$, there is a mean $\mu$ given by $\mu(\leq_1, \ldots, \leq_n) = \leq_x.$ The corresponding natural transformation $\mu_\bullet$ is the $x$th projection, which is obviously a one-sided inverse of $\Delta$.

Do nontrivial means exist?

If there are only two candidates, yes. For example, choose any percentage $p > 0$, and declare the first candidate to be the winner iff they are preferred by $\geq p$ percent of the electorate. But otherwise, no:

Theorem (Arrow)Let $X$ and $C$ be finite sets with $|X| > 0$ and $|C| > 2$. Then every mean for $X$ and $C$ is trivial.

I’ve interpreted this as a theorem about voting, and that’s the most common interpretation. But I have a couple of reservations about that.

The first is that Arrow’s Theorem is, in a way, *too* strong.
If you talk to someone in the street about what a fair voting system is,
they’ll probably expect things about all voters having an equal voice and all
candidates being treated equally. None of this comes into Arrow’s Theorem.
It’s qualitative rather than quantitative.

Now, there *are* interesting
quantitative aspects of voting theory, such as
Condorcet’s
paradox. Suppose that there are three candidates, $a$, $b$ and $c$, and
three voters, $1$, $2$ and $3$, who vote as follows:
$\begin{aligned}
1: &a > b > c \\
2: &b > c > a \\
3: &c > a > b.
\end{aligned}$
Most voters put $a$ above $b$, so the final outcome should do so too. But
also, most voters put $b$ above $c$, and most voters put $c$ above $a$. So the
outcome should put $a$ above $b$ above $c$ above $a$. *Zut
alors!* Clearly the only fair outcome ranks $a$, $b$ and $c$
equally, an option that’s not available: the outcome has to be a
total order.

But this difficulty has nothing to do with Arrow’s Theorem, since the hypotheses there don’t demand any kind of numerical fairness; they don’t demand that majority opinion prevails.

The second reservation is that real voting systems very seldom fit the hypotheses.

The input is unrealistic: as a voter, I’ve never had to place a
total order on a list of candidates. Either I’ve simply had to put an X in a
box, or I’ve had
to write numbers in boxes—*but there’s always been the option of
leaving some boxes blank*. In the latter case, it’s a total order on a
*subset* of the set of candidates. I’m sure there are real situations
in which you do have put a total order on the whole set of candidates,
but it’s by no means the only possibility available.

The output is also unrealistic. In most political elections, a predetermined number $k$ of candidates will be elected, and the rest will not. (Often $k = 1$.) All that’s necessary is a fair way of deciding who the top $k$ candidates are. There’s no need to come up with an order relation.

### Postscript: voting systems and enriched categories

The problem of finding good voting systems has been studied at great length. For example, a quick browse reveals a bunch of impossibility theorems related to Arrow’s: the Gibbard–Satterthwaite Theorem, the Duggan–Schwartz Theorem, the Holmström Theorem, … . I know nothing about all this. But perhaps there’s something to be gained by thinking about means of structures.

How do I want to be able to vote? Not by imposing a total order. There
are probably a few candidates that I want to put near the top, and a few about
which I know nothing, and a few I actively oppose and want to put near the
bottom. For some pairs of candidates
I have an opinion about who is better; for some I’d say they’re about the same;
for some, I don’t know enough to compare them. So what I
want is to impose a preorder. (A **preorder** is a reflexive transitive
relation; it
is a partial order if $a \leq b \leq a$ implies $a = b$.)

In the definition above of mean or voting system, we used just one key property of total orders: a total order on a set induces a total order on each subset, in a functorial way. Preorders also have this key property. So we can replace the total orders functor $T$ by the preorders functor $O$ throughout, and ask, again:

Do nontrivial means exist?

As usual, I have to say what ‘trivial’ means. There is for each nonempty
subset $K \subseteq X$ (which I like to think of as a
cabal)
a mean $\mu_K: O(C)^n \to O(C)$, defined as follows. Let $(\leq_1, \ldots,
\leq_n) \in O(C)^n$, and write
$\leq = \mu_K(\leq_1, \ldots, \leq_n).$
Then
$a \leq b \quad \text{ iff } \quad a \leq_k b \quad\text{for all} \quad k \in K.$
Call these the **trivial** means. They’re maybe not *that* trivial:
e.g. the one in which $K = X$ is fair in every way, but in a large
population it will very likely put the discrete (antichain) order on the set of
candidates, making it useless.

Conjecture 1For $|C| > 2$, every mean is trivial.

(If there is an infinite number $n = |X|$ of voters, there are nontrivial means got by choosing nontrivial filters on $X$; but we are supposing that $X$ is finite.)

Conjecture 2Conjecture 1 has already been settled.

This stuff is well worked-over!

But this still isn’t *really* how I want to be able to vote. I don’t only
want to be able to preorder the candidates; I want to be able to say *how
much* I prefer this candidate to that one. Visualizing my ordered set as a
Hasse diagram, I want to put lengths on the edges.

For example, the UK has a white supremacist party called the British National Party. If there’s ever a BNP candidate in an election that I’m voting in, I want not only to put that candidate at the bottom of the diagram: I want to assign a length of $\infty$ to all the edges going down to that bottom vertex. (And I reckon that if that possibility was available, far-right parties would find much less success in elections. The strong antipathy that most people have towards them would be taken into account by the voting system, which at present only allows us to ignore them.)

So, for each pair $(a, b)$ of candidates, I assign a non-negative real number $d(a, b)$, specifying how much I prefer $a$ to $b$. That is, I place on the set $C$ of candidates a metric in the sense of Lawvere: a function $d: C \times C \to [0, \infty]$ satisfying $d(a, a) = 0$ and $d(a, b) + d(b, c) \geq d(a, c)$.

A metric on a set determines a metric on every subset. So we can define ‘mean’ in the usual way, imitating the case of total orders. And we can ask, yet again:

Do nontrivial means exist?

And the answer is emphatically yes! There’s even one that’s fair, in every sense of the word that I can think of. At the election, each voter chooses a metric on $C$. Writing $M(C)$ for the set of metrics on $C$, this gives an element $(d_1, \ldots, d_n) \in M(C)^n.$ Now define $d \in M(C)$ by $d(a, b) = \frac{1}{n}(d_1(a, b) + \cdots + d_n(a, b)).$ What could be fairer than that?

Of course, this is a bit tongue-in-cheek. In real elections, not everyone manages to successfully put an X in a box; so it could be argued that it is too much to expect every citizen to specify a Lawvere metric. And you can imagine the candidates standing around the day after the election, saying “OK, the voters have metrized us. Now: who gets to be president?”

But ignoring this…

…the theory here is really about enriched categories. Preorders are categories enriched in the two-element ordered set $0 < 1$. Metric spaces (in this generalized sense) are categories enriched in the ordered set $([0, \infty], \geq)$. In both cases, some monoidal category $\mathbf{V}$ has been fixed, and each voter chooses a $\mathbf{V}$-category structure on $C$ (as object-set). And enriched categories have the key property: a $\mathbf{V}$-category structure on a set determines a $\mathbf{V}$-category structure on every subset.

So, let $\mathbf{V}$ be a monoidal category. Fix finite sets $X = \{1, \ldots, n\}$ (voters) and $C$ (candidates).

Given a set $A$, write $E(A)$ for the set of $\mathbf{V}$-category structures on the object-set $A$. Then $E$ is a functor $E: P(C)^{op} \to Set$ in a natural way.

The diagonal functor $\delta: \mathbf{V} \to \mathbf{V}^n$ is lax (indeed,
strict) monoidal,
so induces a functor $\mathbf{V}–Cat \to \mathbf{V}^n–Cat$. A
$\mathbf{V}^n$-category structure on a set $A$ is just an $n$-tuple of
$\mathbf{V}$-category structures on $A$, so there is an induced natural
transformation
$\begin{aligned}
&\stackrel{E(-)^n}{\to} & \\
P(C)^{op} &\Uparrow \Delta & Set \\
&\stackrel{E\quad}{\to} &
\end{aligned}$
A **mean** for $X$ and $C$ is a natural
transformation $\mu: E(-)^n \to E$
such that $\mu \circ \Delta = 1_E$.

For both the examples above—preorders and metric spaces—I used the following method to construct the means. Suppose that we have a lax monoidal functor $m: \mathbf{V}^n \to \mathbf{V}$ such that $m \circ \delta = 1_\mathbf{V}$. Then, just as the functor $\delta$ induced the natural transformation $\Delta$, the functor $m$ induces a natural transformation $\mu: E(-)^n \to E$. And functoriality gives $\mu \circ \Delta = 1_E$: so $\mu$ is a mean.

When $\mathbf{V} = (0 < 1)$, such an $m$ is a **filter** on $X$ (like an
ultrafilter, but without the axiom that every subset is either large or
small). For example, every subset $K \subseteq X$ determines a filter $m_K$: a
subset $Y \subset X$ is large (i.e. $m_K(\chi_X) = 1$) iff $Y \supseteq K$.
The resulting mean is the ‘cabal’ mean $\mu_K$ defined above.

(Using finiteness of $X$, you can show that these are the *only* filters
on $X$. But that doesn’t close off the possibility that there are other means
on $X$, not arising in this way. That’s Conjecture 1.)

When $\mathbf{V} = [0, \infty]$, such an $m$ is an order-preserving function
$m: [0, \infty]^n \to [0, \infty]$ satisfying
$m(x_1, \ldots, x_n) + m(y_1, \ldots, y_n)
\geq
m(x_1 + y_1, \ldots, x_n + y_n)$
and $m(x, \ldots, x) = x$. The resulting mean $\mu$ is given by
$\mu(d_1, \ldots, d_n) = d$
where
$d(a, b) = m(d_1(a, b), \ldots, d_n(a, b))$
($a, b \in C$). Among these $m$s are those for which the inequality is an
equality, and they are precisely the weighted means in the usual real-number
sense: functions of the form
$m: (x_1, \ldots, x_n) \mapsto p_1 x_1 + \cdots + p_n x_n$
where $p_i \geq 0$ and $\sum_i p_i = 1$. And among *these* is the
standard mean,
with $p_i = 1/n$. This gives the fair voting system $\mu$ described above.

I have no idea how you might take the mean of sets, or of vector spaces, or of objects of other popular enriching categories $\mathbf{V}$.

## Re: Means

Two unrelated comments:

(1) I don’t have the link, but I think there is some related discussion a while back over on Tao’s blog comparing Arrow’s theorem to (non)existence of nontrivial ultrafilters.

(2) Another concern when designing a voting system is whether it creates an incentive for voters to misrepresent their opinions. For example, in your metric voting, where the computer takes the mean, any party that I have ranked infinitely lower than another party will not win. If I am a partisan, I might reasonably decide that I’d rather no party win, and rank everyone except my party infinitely low, with the understanding that someone else will do the same for their party. This leads to a situation, if I’m understanding your algorithm correctly, in which there is no government. Note that if someone else does as I say, then I should as well, and conversely if no one else does, then I might as well, because it makes my party win. So voters are forced to play a Prisoners’ Dilemma game. An example of where parties make such decisions is the US Congress.