Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

August 7, 2010

Means

Posted by Tom Leinster

I’ve been interested for a while in abstract notions of size, and this has got me thinking about abstract notions of measure—hence, inescapably, of integration. Now integration can be thought of as an averaging process: for instance, the mean value of a function f:[0,1]Rf: [0, 1] \to \mathbf{R} is simply 0 1f\int_0^1 f. So means provide another way to think about measure and integral (and probability, indeed).

For most types of integration there are restrictions on the functions involved: you can only integrate an integrable function. But I’m going to describe three slightly unusual settings in which the challenge is to find a notion of integration, or mean, in which every function is allowed. Here they are:

  1. The mean of binary digits
  2. Amenable groups
  3. Arrow’s Theorem on voting systems.

Then there’s a postscript on voting and enriched categories.

 

The mean of binary digits

Let’s start with something really minimal and structureless. Fix a set XX. Is there any meaningful way of assigning to each function f:X{0,1} f: X \to \{0, 1\} a ‘mean value’ μ(f){0,1}\mu(f) \in \{0, 1\}?

It depends what mean means. Certainly the mean of a constant function should be that constant, so the first axiom is:

a.  if ff has constant value ss then μ(f)=s\mu(f) = s (where s{0,1}s \in \{0, 1\})

Means are usually linear, so we now want to say something like ‘μ(f+g)=μ(f)+μ(g)\mu(f + g) = \mu(f) + \mu(g)’. We can’t necessarily add together binary digits (since 1+1=21 + 1 = 2), but it’s OK if ff and gg have disjoint supports. So the second and final axiom is:

b.  if f,g:X{0,1}f, g: X \to \{0, 1\} and there is no xXx \in X for which f(x)=g(x)=1f(x) = g(x) = 1 then μ(f+g)=μ(f)+μ(g)\mu(f + g) = \mu(f) + \mu(g).

A mean on XX is a function μ:{0,1} X{0,1}\mu: \{0, 1\}^X \to \{0, 1\} satisfying these two axioms.

There are lots of equivalent formulations of this concept.

For instance, {0,1}\{0, 1\} is a Boolean algebra, so {0,1} X\{0, 1\}^X is too, and you can equivalently replace b with the superficially stronger axiom that μ\mu preserves the whole Boolean algebra structure: fg μ(f)μ(g); μ(fg) =μ(f)μ(g); μ(fg) =μ(f)μ(g); μ(1f) =1μ(f) \begin{aligned} f \leq g &\implies \mu(f) \leq \mu(g); \\ \mu(f \vee g) &= \mu(f) \vee \mu(g); \\ \mu(f \wedge g) &= \mu(f) \wedge \mu(g); \\ \mu(1 - f) &= 1 - \mu(f) \end{aligned} where =max\vee = max and =min\wedge = min.

Or, you can switch from integration to measure. It’s perhaps unfortunate that the letter μ\mu is commonly used both for mean and for measure, but I’ll take advantage of this coincidence. For subsets YY of XX, I’ll write μ(Y)\mu(Y) to mean μ(χ Y)\mu(\chi_Y), where χ Y\chi_Y is the characteristic function of YY. Writing PP for powerset, μ\mu then becomes a map P(X){0,1}P(X) \to \{0, 1\}. Axiom a then says that μ()=0\mu(\emptyset) = 0 and μ(X)=1\mu(X) = 1, and axiom b can be stated as the inclusion-exclusion principle: μ(YZ)=μ(Y)+μ(Z)μ(YZ) \mu(Y \cup Z) = \mu(Y) + \mu(Z) - \mu(Y \cap Z) (Y,ZXY, Z \subseteq X). Such a μ\mu is also called a valuation.

Or, since the mean value is restricted to lie in {0,1}\{0, 1\}, you might as well just record those subsets YXY \subseteq X for which μ(Y)=1\mu(Y) = 1. Let’s write 𝒰={YX:μ(Y)=1}, \mathcal{U} = \{ Y \subseteq X: \mu(Y) = 1\}, and say that a subset YXY \subseteq X is large if Y𝒰Y \in \mathcal{U}, and small if XY𝒰X \setminus Y \in \mathcal{U}. Axioms a and b are then equivalent to:

  • every subset of XX is either large or small, but not both
  • every subset of a small set is small
  • a finite union of small sets is small.

A set 𝒰\mathcal{U} of subsets of XX with these three properties is called an ultrafilter on XX. So, means are the same as ultrafilters.

In all three of the situations that I’ll describe, the basic question is this:

Do nontrivial means exist?

In the current situation, ‘nontrivial’ means the following. There is, for each xXx \in X, a trivial mean μ\mu given by μ(f)=f(x). \mu(f) = f(x). If you’re thinking of ff as a function X{0,1}X \to \{0, 1\}, this is evaluation at xx. If you’re thinking of ff as an element of the product set {0,1} X={0,1}×{0,1}×\{0, 1\}^X = \{0, 1\} \times \{0, 1\} \times \cdots, it’s projection onto the xxth factor. Regarding μ\mu as a measure, it’s the point measure, or Dirac delta, at xx. Regarding μ\mu as an ultrafilter, it’s the principal ultrafilter at xx: a subset is large if and only if it contains xx.

It’s an easy exercise to show that when XX is finite, these are the only means. When XX is infinite, the situation is more interesting. There are nontrivial means, but you can’t construct any: you have to invoke some form of the Axiom of Choice. (I’ll assume Choice throughout.) So the answer to the question is:

No if XX is finite; yes if XX is infinite.

Amenable groups

Now let’s try the same thing, but giving XX the structure of a group. The mean should respect the group structure in the following sense. Given xXx \in X, write τ x:X X y xy \begin{aligned} \tau_x: X& \to &X \\ y & \mapsto &x y \end{aligned} for left translation by XX. Then we require

c.  μ(fτ x)=μ(f)\mu(f \circ \tau_x) = \mu(f), for all f:X{0,1}f: X \to \{ 0, 1 \} and xXx \in X.

Thinking in terms of measure, this says that the measure of a subset is the same as the measure of each of its translates.

But there’s an instant problem. For example, taking X=ZX = \mathbf{Z}, we have μ(2Z)=μ(1+2Z) \mu(2\mathbf{Z}) = \mu(1 + 2\mathbf{Z}) —the even and odd numbers have the same measure. But the disjoint union of 2Z2\mathbf{Z} and 1+2Z1 + 2\mathbf{Z} is Z\mathbf{Z}, which has measure 11. So 2μ(2Z)=12\mu(2\mathbf{Z}) = 1. Similarly, nμ(nZ)=1n\mu(n\mathbf{Z}) = 1 for any positive integer nn, and if XX is a finite group of order nn then nμ({x})=1n\mu(\{x\}) = 1 for all xXx \in X. All of these things are impossible if mean values are restricted to lie in {0,1}\{0, 1\}.

So, we make the question more interesting by allowing μ\mu to take values in the real interval [0,1][0, 1]. Precisely: a mean on a group XX is a function μ:{0,1} X[0,1] \mu: \{0, 1\}^X \to [0, 1] satisfying axioms a, b and c. A group is amenable if it admits at least one mean. (There are no ‘trivial’ means to be excluded.)

The word ‘amenable’ is kind of a pun. In ordinary English the word means ‘willing’ or ‘easy to work with’. But said out loud (by me, at least) it sounds like ‘a-mean-able’.

(You might think that there should be separate definitions of left amenable and right amenable, because of the asymmetry in the definition above. But the two are equivalent. Any group XX is isomorphic to its opposite X opX^{op}, the same set with the multiplication reversed. Hence XX is left amenable iff X opX^{op} is left amenable. On the other hand, X opX^{op} is left amenable iff XX is right amenable.)

The theory of amenable groups has been intensively studied. There are a zillion equivalent ways of stating the definition, and there are close and fascinating links to all sorts of topics in algebra, topology and analysis. Here I’ll just say a few simple things.

Every finite group is amenable. In fact, there is exactly one mean on a finite group: it’s the mean in the obvious sense.

Every abelian group is amenable. This is not at all obvious, as far as I know. If you haven’t seen this before, I encourage you to stop and think for a few minutes about how you might prove that Z\mathbf{Z} is amenable: how would you define a mean, or a measure?

From the arguments above, the measure needs to be set up so that any coset of nZn\mathbf{Z} has measure 1/n1/n. What about, say, the set of all primes in Z\mathbf{Z}? You’d imagine that it would probably have measure 00, since the primes get sparser and sparser, but that’s not so clear.

The standard way to ‘construct’ a measure, which I won’t describe, uses the concept of Følner sequence. It’s a non-constructive definition. Among other things, it involves choosing a nonprincipal ultrafilter on the set N\mathbf{N}—in other words, a nontrivial mean on N\mathbf{N}, in the sense of the first part above.

That’s just Z\mathbf{Z}. Showing that every abelian group is amenable usually seems to be proved by bootstrapping your way up from Z\mathbf{Z} to finitely generated abelian groups, and from there to all abelian groups. At one point I tried to discover whether the proof could be streamlined, and made a bit of progress, thanks to nice people at MathOverflow.

The free group on two or more elements is not amenable. It’s a pleasant exercise to show this.

There are some groups whose status is famously unknown. This is so for Thompson’s group FF. Two papers on this question were arXived last year: one claiming to prove that FF is amenable, and the other claiming to prove that it is not. Danny Calegari had an excellent blog post on this affair. We also discussed it a bit here at the Café.

So in this second situation, the existence of means is a very substantial issue.

Arrow’s Theorem

So far we’ve discussed means of numbers. Now we’re going to discuss means of structures. Given a set CC and a bunch of group structures on CC, what is the mean group structure? Given a bunch of topologies on CC, what is the mean topology?

This might sound like madness. But there are some structures for which it is a reasonable question. Arrow’s Theorem concerns the case of total orders.

Here’s a standard way of narrating Arrow’s Theorem. In an election, there are a finite set XX of voters and a finite set CC of candidates. Each voter chooses a total order on CC. The outcome of the election is also to be a total order on CC. Is there any reasonable system for deciding the outcome on the basis of the votes? Arrow’s Theorem says ‘no’. And it’s a very strong ‘no’, in that the interpretation of the word ‘reasonable’ is dramatically generous.

When we come to define ‘reasonable’, there will be very few concessions to equality, either between voters or between candidates. Perhaps men’s votes count for more than women’s, or the government has used its power to rig the system in its favour. But it needn’t be anything reprehensible. Perhaps the voters are the members of a committee and the chair has a casting vote, so that not all voters are equal. Or perhaps the vote is on various options for constitutional change, and the system gives preference to the status quo.

You can find plenty of expositions of Arrow’s Theorem, but what follows is one that I haven’t seen before, using categorical language.

Fix finite sets XX (the voters) and CC (the candidates). Given a set AA, write T(A)T(A) for the set of total orders on AA. A mean or voting system for XX and CC is a function μ:T(C) XT(C) \mu: T(C)^X \to T(C) satisfying axioms i and ii below.

It will be convenient to write X={1,,n}X = \{1, \ldots, n\}, so that an element of T(C) X=T(C) nT(C)^X = T(C)^n may be written as ( 1,, n) (\leq_1, \ldots, \leq_n) where each i\leq_i is a total order on the set CC of candidates.

Total orders have the following key property: when you choose a total order on AA, you also implicitly choose a total order on each subset of AA. So for any subset BAB \subseteq A, there is a restriction map T(A)T(B)T(A) \to T(B). This makes TT into a functor T:P(C) opSet, T: P(C)^{op} \to Set, where P(C)P(C) is the powerset of CC, ordered by inclusion.

First axiom:

i.  μ\mu extends to a natural transformation T() n P(C) op μ Set. T \begin{aligned} &\stackrel{T(-)^n}{\to} & \\ P(C)^{op} &\Downarrow \mu_\bullet & Set. \\ &\stackrel{T\quad}{\to} & \end{aligned}

In other words, there is a natural transformation μ :T() nT\mu_\bullet: T(-)^n \to T whose CC-component is μ:T(C) nT(C)\mu: T(C)^n \to T(C). It’s not hard to show that there can be at most one such extension. (Use the fact that any total order on a subset of CC can be extended to a total order on CC itself.)

But what does this mean? In its traditional formulation, this axiom rejoices in the name of the Independence of Irrelevant Alternatives. (It’s a wonderful name, though in my opinion it’s hard to beat the splendour of Sylvester’s Law of Inertia.) Suppose there is an election with three candidates: C={a,b,c}C = \{ a, b, c \}. For some reason the election has to be re-run, with the same voters and the same candidates. In between the two elections, candidates aa and bb do nothing interesting, but candidate cc makes outrageous statements. The effect is that in the second election, no voter changes their opinion on whether aa is better than bb, but some voters change their opinion about cc. Independence of Irrelevant Alternatives says that in the outcomes, aa and bb are placed in the same order both times. (Candidate cc is the ‘irrelevant alternative’.)

Briefly put: the axiom says that the voting system is compatible with restricting to subsets of the candidates. This is why it has something to do with naturality.

There is a canonical natural transformation T() n P(C) op Δ Set T \begin{aligned} &\stackrel{T(-)^n}{\to} & \\ P(C)^{op} &\Uparrow \Delta & Set \\ &\stackrel{T\quad}{\to} & \end{aligned} —the diagonal. Its component Δ A:T(A)T(A) n\Delta_A: T(A) \to T(A)^n at ACA \subseteq C sends T(A)\leq \in T(A) to (,,)T(A) n(\leq, \ldots, \leq) \in T(A)^n.

Second axiom: if everyone puts the candidates in the same order, then the outcome of the election also puts the candidates in that order. That is:

ii. μ Δ=1 T\mu_\bullet \circ \Delta = 1_T.

This is not the axiomatization of voting systems to be found in most statements of Arrow’s Theorem, but unless I’m mistaken, it’s equivalent.

There are trivial means or voting systems: dictatorships. For any xX={1,,n}x \in X = \{1, \ldots, n\}, there is a mean μ\mu given by μ( 1,, n)= x. \mu(\leq_1, \ldots, \leq_n) = \leq_x. The corresponding natural transformation μ \mu_\bullet is the xxth projection, which is obviously a one-sided inverse of Δ\Delta.

Do nontrivial means exist?

If there are only two candidates, yes. For example, choose any percentage p>0p > 0, and declare the first candidate to be the winner iff they are preferred by p\geq p percent of the electorate. But otherwise, no:

Theorem (Arrow)  Let XX and CC be finite sets with |X|> 0|X| >  0 and |C|> 2|C| >  2. Then every mean for XX and CC is trivial.

I’ve interpreted this as a theorem about voting, and that’s the most common interpretation. But I have a couple of reservations about that.

The first is that Arrow’s Theorem is, in a way, too strong. If you talk to someone in the street about what a fair voting system is, they’ll probably expect things about all voters having an equal voice and all candidates being treated equally. None of this comes into Arrow’s Theorem. It’s qualitative rather than quantitative.

Now, there are interesting quantitative aspects of voting theory, such as Condorcet’s paradox. Suppose that there are three candidates, aa, bb and cc, and three voters, 11, 22 and 33, who vote as follows: 1: a>b>c 2: b>c>a 3: c>a>b. \begin{aligned} 1: &a > b > c \\ 2: &b > c > a \\ 3: &c > a > b. \end{aligned} Most voters put aa above bb, so the final outcome should do so too. But also, most voters put bb above cc, and most voters put cc above aa. So the outcome should put aa above bb above cc above aa. Zut alors! Clearly the only fair outcome ranks aa, bb and cc equally, an option that’s not available: the outcome has to be a total order.

But this difficulty has nothing to do with Arrow’s Theorem, since the hypotheses there don’t demand any kind of numerical fairness; they don’t demand that majority opinion prevails.

The second reservation is that real voting systems very seldom fit the hypotheses.

The input is unrealistic: as a voter, I’ve never had to place a total order on a list of candidates. Either I’ve simply had to put an X in a box, or I’ve had to write numbers in boxes—but there’s always been the option of leaving some boxes blank. In the latter case, it’s a total order on a subset of the set of candidates. I’m sure there are real situations in which you do have put a total order on the whole set of candidates, but it’s by no means the only possibility available.

The output is also unrealistic. In most political elections, a predetermined number kk of candidates will be elected, and the rest will not. (Often k=1k = 1.) All that’s necessary is a fair way of deciding who the top kk candidates are. There’s no need to come up with an order relation.

Postscript: voting systems and enriched categories

The problem of finding good voting systems has been studied at great length. For example, a quick browse reveals a bunch of impossibility theorems related to Arrow’s: the Gibbard–Satterthwaite Theorem, the Duggan–Schwartz Theorem, the Holmström Theorem, … . I know nothing about all this. But perhaps there’s something to be gained by thinking about means of structures.

How do I want to be able to vote? Not by imposing a total order. There are probably a few candidates that I want to put near the top, and a few about which I know nothing, and a few I actively oppose and want to put near the bottom. For some pairs of candidates I have an opinion about who is better; for some I’d say they’re about the same; for some, I don’t know enough to compare them. So what I want is to impose a preorder. (A preorder is a reflexive transitive relation; it is a partial order if abaa \leq b \leq a implies a=ba = b.)

In the definition above of mean or voting system, we used just one key property of total orders: a total order on a set induces a total order on each subset, in a functorial way. Preorders also have this key property. So we can replace the total orders functor TT by the preorders functor OO throughout, and ask, again:

Do nontrivial means exist?

As usual, I have to say what ‘trivial’ means. There is for each nonempty subset KXK \subseteq X (which I like to think of as a cabal) a mean μ K:O(C) nO(C)\mu_K: O(C)^n \to O(C), defined as follows. Let ( 1,, n)O(C) n(\leq_1, \ldots, \leq_n) \in O(C)^n, and write =μ K( 1,, n). \leq = \mu_K(\leq_1, \ldots, \leq_n). Then ab iff a kbfor allkK. a \leq b \quad \text{ iff } \quad a \leq_k b \quad\text{for all} \quad k \in K. Call these the trivial means. They’re maybe not that trivial: e.g. the one in which K=XK = X is fair in every way, but in a large population it will very likely put the discrete (antichain) order on the set of candidates, making it useless.

Conjecture 1  For |C|> 2|C| >  2, every mean is trivial.

(If there is an infinite number n=|X|n = |X| of voters, there are nontrivial means got by choosing nontrivial filters on XX; but we are supposing that XX is finite.)

Conjecture 2  Conjecture 1 has already been settled.

This stuff is well worked-over!

But this still isn’t really how I want to be able to vote. I don’t only want to be able to preorder the candidates; I want to be able to say how much I prefer this candidate to that one. Visualizing my ordered set as a Hasse diagram, I want to put lengths on the edges.

For example, the UK has a white supremacist party called the British National Party. If there’s ever a BNP candidate in an election that I’m voting in, I want not only to put that candidate at the bottom of the diagram: I want to assign a length of \infty to all the edges going down to that bottom vertex. (And I reckon that if that possibility was available, far-right parties would find much less success in elections. The strong antipathy that most people have towards them would be taken into account by the voting system, which at present only allows us to ignore them.)

So, for each pair (a,b)(a, b) of candidates, I assign a non-negative real number d(a,b)d(a, b), specifying how much I prefer aa to bb. That is, I place on the set CC of candidates a metric in the sense of Lawvere: a function d:C×C[0,] d: C \times C \to [0, \infty] satisfying d(a,a)=0d(a, a) = 0 and d(a,b)+d(b,c)d(a,c)d(a, b) + d(b, c) \geq d(a, c).

A metric on a set determines a metric on every subset. So we can define ‘mean’ in the usual way, imitating the case of total orders. And we can ask, yet again:

Do nontrivial means exist?

And the answer is emphatically yes! There’s even one that’s fair, in every sense of the word that I can think of. At the election, each voter chooses a metric on CC. Writing M(C)M(C) for the set of metrics on CC, this gives an element (d 1,,d n)M(C) n. (d_1, \ldots, d_n) \in M(C)^n. Now define dM(C)d \in M(C) by d(a,b)=1n(d 1(a,b)++d n(a,b)). d(a, b) = \frac{1}{n}(d_1(a, b) + \cdots + d_n(a, b)). What could be fairer than that?

Of course, this is a bit tongue-in-cheek. In real elections, not everyone manages to successfully put an X in a box; so it could be argued that it is too much to expect every citizen to specify a Lawvere metric. And you can imagine the candidates standing around the day after the election, saying “OK, the voters have metrized us. Now: who gets to be president?”

But ignoring this…

…the theory here is really about enriched categories. Preorders are categories enriched in the two-element ordered set 0<10 &lt; 1. Metric spaces (in this generalized sense) are categories enriched in the ordered set ([0,],)([0, \infty], \geq). In both cases, some monoidal category V\mathbf{V} has been fixed, and each voter chooses a V\mathbf{V}-category structure on CC (as object-set). And enriched categories have the key property: a V\mathbf{V}-category structure on a set determines a V\mathbf{V}-category structure on every subset.

So, let V\mathbf{V} be a monoidal category. Fix finite sets X={1,,n}X = \{1, \ldots, n\} (voters) and CC (candidates).

Given a set AA, write E(A)E(A) for the set of V\mathbf{V}-category structures on the object-set AA. Then EE is a functor E:P(C) opSet E: P(C)^{op} \to Set in a natural way.

The diagonal functor δ:VV n\delta: \mathbf{V} \to \mathbf{V}^n is lax (indeed, strict) monoidal, so induces a functor VCatV nCat\mathbf{V}&ndash;Cat \to \mathbf{V}^n&ndash;Cat. A V n\mathbf{V}^n-category structure on a set AA is just an nn-tuple of V\mathbf{V}-category structures on AA, so there is an induced natural transformation E() n P(C) op Δ Set E \begin{aligned} &\stackrel{E(-)^n}{\to} & \\ P(C)^{op} &\Uparrow \Delta & Set \\ &\stackrel{E\quad}{\to} & \end{aligned} A mean for XX and CC is a natural transformation μ:E() nE\mu: E(-)^n \to E such that μΔ=1 E\mu \circ \Delta = 1_E.

For both the examples above—preorders and metric spaces—I used the following method to construct the means. Suppose that we have a lax monoidal functor m:V nV m: \mathbf{V}^n \to \mathbf{V} such that mδ=1 Vm \circ \delta = 1_\mathbf{V}. Then, just as the functor δ\delta induced the natural transformation Δ\Delta, the functor mm induces a natural transformation μ:E() nE\mu: E(-)^n \to E. And functoriality gives μΔ=1 E\mu \circ \Delta = 1_E: so μ\mu is a mean.

When V=(0<1)\mathbf{V} = (0 &lt; 1), such an mm is a filter on XX (like an ultrafilter, but without the axiom that every subset is either large or small). For example, every subset KXK \subseteq X determines a filter m Km_K: a subset YXY \subset X is large (i.e. m K(χ X)=1m_K(\chi_X) = 1) iff YKY \supseteq K. The resulting mean is the ‘cabal’ mean μ K\mu_K defined above.

(Using finiteness of XX, you can show that these are the only filters on XX. But that doesn’t close off the possibility that there are other means on XX, not arising in this way. That’s Conjecture 1.)

When V=[0,]\mathbf{V} = [0, \infty], such an mm is an order-preserving function m:[0,] n[0,]m: [0, \infty]^n \to [0, \infty] satisfying m(x 1,,x n)+m(y 1,,y n)m(x 1+y 1,,x n+y n) m(x_1, \ldots, x_n) + m(y_1, \ldots, y_n) \geq m(x_1 + y_1, \ldots, x_n + y_n) and m(x,,x)=xm(x, \ldots, x) = x. The resulting mean μ\mu is given by μ(d 1,,d n)=d \mu(d_1, \ldots, d_n) = d where d(a,b)=m(d 1(a,b),,d n(a,b)) d(a, b) = m(d_1(a, b), \ldots, d_n(a, b)) (a,bCa, b \in C). Among these mms are those for which the inequality is an equality, and they are precisely the weighted means in the usual real-number sense: functions of the form m:(x 1,,x n)p 1x 1++p nx n m: (x_1, \ldots, x_n) \mapsto p_1 x_1 + \cdots + p_n x_n where p i0p_i \geq 0 and ip i=1\sum_i p_i = 1. And among these is the standard mean, with p i=1/np_i = 1/n. This gives the fair voting system μ\mu described above.

I have no idea how you might take the mean of sets, or of vector spaces, or of objects of other popular enriching categories V\mathbf{V}.

Posted at August 7, 2010 4:00 AM UTC

TrackBack URL for this Entry:   http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2254

20 Comments & 1 Trackback

Re: Means

Two unrelated comments:

(1) I don’t have the link, but I think there is some related discussion a while back over on Tao’s blog comparing Arrow’s theorem to (non)existence of nontrivial ultrafilters.

(2) Another concern when designing a voting system is whether it creates an incentive for voters to misrepresent their opinions. For example, in your metric voting, where the computer takes the mean, any party that I have ranked infinitely lower than another party will not win. If I am a partisan, I might reasonably decide that I’d rather no party win, and rank everyone except my party infinitely low, with the understanding that someone else will do the same for their party. This leads to a situation, if I’m understanding your algorithm correctly, in which there is no government. Note that if someone else does as I say, then I should as well, and conversely if no one else does, then I might as well, because it makes my party win. So voters are forced to play a Prisoners’ Dilemma game. An example of where parties make such decisions is the US Congress.

Posted by: Theo on August 7, 2010 5:28 PM | Permalink | Reply to this

Re: Means

Hi Theo,

(1) Thanks. I hadn’t seen that blog entry of Tao’s, but I guess you mean this one. I enjoyed the beginning part, where he muses in general terms about what makes hard analysis so difficult. I haven’t got much further yet, but I browsed ahead and saw that he was making connections between ultrafilters and voting.

(2) Yes, I think you do understand correctly. Suppose that for each pair (a,b)(a, b) of distinct candidates, there exists a voter who infinitely prefers aa to bb. (This is likely in a large population.) Then under the ‘fair’ system that I described, the final outcome is the metric dd on the set of candidates in which d(a,b)=d(a, b) = \infty for all aba \neq b. So the government would have to be everyone or no one.

This highlights the difference between a fair system (one in which all voters and all candidates are treated equally) and what might be called a faithful system (one that faithfully reflects the wishes of the voters). To take an extreme example: there is a system that always ranks all the candidates equally, no matter how people vote. This is fair but wholly unfaithful: even if the entire electorate votes Smith for president, the presidency will be shared.

One fix is to have voters measure their preference on a scale of 00 to 11, say. (I’m not sure that the preferences should then be called ‘distances’.)

An example of where parties make such decisions is the US Congress.

Can you explain what you mean by this?

Posted by: Tom Leinster on August 7, 2010 6:37 PM | Permalink | Reply to this

Re: Means

This is going to be one of those posts that describes a slightly different problem to what you’ve described, but which there’s a small chance may be a source of ideas for your domain…

Part of the motivation for an average statistic is to be in some sense a typical representative of a (sometimes implicit) ensemble. As you’ve noted, one of the things about a generalised arithmetic mean is that its attractiveness basically depends on relative distances being constant in the presence of global transformations. Part of the motivation for other notions of average is to avoid this issue. Another issue is that sometimes pairs of elements may not have an “universally accepted” distance (eg, is dd((wealthy,healthy),(wealthy,unhealthy)) bigger or smaller than dd((wealthy,healthy),(poor,healthy))?), which is again a problem for a mean.

There’s a strong interest in the machine learning and decision making under uncertainty theory communities (and from those who want to make decisions now in the presence of uncertainty rather than just procrastinate, eg, ecologists, etc) in the question of what one can conclude from a poset (probably some techniques only require a pre-order, but that’s not seen as important) that one views as incomplete knowledge of a definite one of its linear extensions (ie, total order). In particular, two definitely known elements always generate a constraint (called a “comparable relation” in mathematical posets, but generally called a “constraint” in computer science) in the poset, otherwise you may get a constraint or may not depending if the uncertiainty overlaps or not. Then one can ask questions about its “set of linear extensions” that one hopes one can use as a surrogate for knowledge of the “unknown true total order”. (This is not strictly related to voting, although clearly a plethora of voting schemes, none contradicting Arrow’s theorem, could be constructed from these elements.)

To pick a bonkers example, suppose you’re at a wine shop and have got a list of wine prices, some of which are given as ranges (eg, 50-60 pounds: you don’t know what you’ll be charged, but it’ll be in that range). You follow the buying rule that the top 10 prices are for gullible rich people, but from then on price pretty much correlates with quality. Then you might be interested in what the most probable (ie, mode average) price to be 11th in the list is given your uncertain knowledge. Or if you’re buying 5 bottles of wine you might want to know what the most probable (ie, mode average) set of elements to be in positions 11-15 is. (Or you might just say `Screw this, I’m going to a transparently priced wine shop’, except no-one ever seems to do that in contrived `illustrative examples’.) In both of these cases, one can straightforwardly define the mode over the set of all linear extensions of the poset, and indeed could define the expectation (although without knowing if it satisfies invariance properties, or indeed what sort of invariance is desirable.) What may make this useful in practice is that a particular question of a poset with a given degree of uncertainty may have an “average” answer which is much more “representative” than the pairwise uncertainties suggests. (As a contrived example, consider a poset where two elements AA and BB have identically uncertain values and all other elements have exact values less than the minimum value of AA. Then the question “give me the average set of 2 highest valued elements” is more representative than “give me the average highest valued element”.)

There’s two ways in which “measure” can come from a poset. There’s the classical “structural” measure denoted by the “constraints on comparable elements” in the poset. It looks like, given a pairwise probability of each relative ordering between incomparable elements, treating each one independently gives a consistent probability distribution over the linear extensions of the poset. With this one can attempt to use various Monte Carlo sampling techniques to generate independent samples from the linear extensions and hence approximate any quantity that can be expressed in terms of the probability distribution. (Incidentally, all of these seem to assume that the probability/measure of a given linear extension comes from the poset in this way. If by any chance anyone is aware of any work on looking at probabilistic models over linear extensions of posets where each individual linear extension has a contribution to its probability that comes from some arbitrary function (ie, black box), I’d be very interested to know about it.)

One example reference to this sort of stuff is

Ranking with Uncertain Scores, Mohamed A. Soliman and Ihab F. Ilyas

although a google search with terms like “poset”, “linear extension”, “ranking” and “CDF” will bring up lots of others.

Of course, all this lacks both the theoretical guarantees of the axiom based construction and linearity of the mean. Taking your final line in the sense above rather than the categorical framework you’re looking at, as a really wild, untested thought one might be able to define modes and other probability based statistics over any structure which one can regard as describing the uncertain knowledge about some other structure. For example, I’m sure that you’re familiar with all the possible parses of “time flies like an arrow”, which one could store as a tree where nodes represent “this letter string with this interpretation” and branching denotes different interpretations in remaining parts. One could then ask “what is the mode average part of speech of flies?” and get “noun” (I think). (Again, to stress it’s clear this is a different notion of a an average to your looking at the mean.) Likewise one might be able to formulate averages of elements of those bitstreams which would produce a given Lempel-Ziv-Welch codebook.

Posted by: bane on August 8, 2010 9:21 AM | Permalink | Reply to this

Re: Means

Oops: even after proofing, some mistakes:

In both of these cases, one can straightforwardly define the mode over the set of all linear extensions of the poset

should extend “derived from your knowledge of the prices and price ranges”.

There’s the classical “structural” measure denoted by the “constraints on comparable elements” in the poset.

should be

There’s the classical “structural” measure implicitly embodied by the “constraints on comparable elements” in the poset.

Posted by: bane on August 8, 2010 9:37 AM | Permalink | Reply to this

Re: Means

First, this is beautiful.

A minor quibble: It is an overstatement to say that the Condorcet Paradox has nothing to do with Arrow’s Theorem. The standard proof of Arrow’s Theorem (at least the one I teach my students) has the Condorcet Paradox at its core.

(We start as follows: With the voters’ preferences you’ve listed, assume WLOG that b ranks above a in the associated social ordering; then prove a lemma that says that says that b will continue to rank above a in the social ordering for any triple of input orderings such that voter 2 ranks b above a. Summarize this by saying that voter 2 is a “dictator for b over a”. Now argue that if voter 2 is a dictator for b over a, he must also be a dictator for c over a and for b over c, and hence (by repeating the argument) for any pair of candidates at all. This proves the theorem with 3 voters; the general case consists of a reduction to this case.)

Because your argument is essentially Arrow’s (though in much more beautiful language) I think it must also make essential use, at some level, of the Condorcet Paradox. I’ll try to think about the best way to make this explicit.

Posted by: Steven E. Landsburg on August 10, 2010 3:44 PM | Permalink | Reply to this

standrad citations, before Arrow; Re: Means

Weisstein, Eric W. “Social Choice Theory.” From MathWorld–A Wolfram Web Resource.

“The theory of analyzing a decision between a collection of alternatives made by a collection of n voters with separate opinions. Any choice for the entire group should reflect the desires of the individual voters to the extent possible.”

“Fair choice procedures usually satisfy anonymity (invariance under permutation of voters), duality (each alternative receives equal weight for a single vote), and monotonicity (a change favorable for X does not hurt X). Simple majority vote is anonymous, dual, and monotone. May’s theorem states a stronger result.”

Taylor, A. Mathematics and Politics: Strategy, Voting, Power, and Proof. New York: Springer-Verlag, 1995.

Young, S. C.; Taylor, A. D.; and Zwicker, W. S. “Counting Quota Systems: A Combinatorial Question from Social Choice Theory.” Math. Mag. 68, 331-342,

Oh, and see also:

A tight quantitative version of Arrow’s impossibility theorem

Nathan Keller
(Submitted on 20 Mar 2010)

Abstract: The well-known Impossibility Theorem of Arrow asserts that any Generalized Social Welfare Function (GSWF) with at least three alternatives, which satisfies Independence of Irrelevant Alternatives (IIA) and Unanimity and is not a dictatorship, is necessarily non-transitive. In 2002, Kalai asked whether one can obtain the following quantitative version of the theorem: For any $\epsilon>0$, there exists $\delta=\delta(\epsilon)$ such that if a GSWF on three alternatives satisfies the IIA condition and its probability of non-transitive outcome is at most $\delta$, then the GSWF is at most $\epsilon$-far from being a dictatorship or from breaching the Unanimity condition. In 2009, Mossel proved such quantitative version, with $\delta(\epsilon)=\exp(-C/\epsilon^{21})$, and generalized it to GSWFs with $k$ alternatives, for all $k \geq 3$. In this paper we show that the quantitative version holds with $\delta(\epsilon)=C \cdot \epsilon^3$, and that this result is tight up to logarithmic factors. Furthermore, our result (like Mossel’s) generalizes to GSWFs with $k$ alternatives. Our proof is based on the works of Kalai and Mossel, but uses also an additional ingredient: a combination of the Bonami-Beckner hypercontractive inequality with a reverse hypercontractive inequality due to Borell, applied to find simultaneously upper bounds and lower bounds on the “noise correlation” between Boolean functions on the discrete cube.

Posted by: Jonathan Vos Post on August 10, 2010 7:00 PM | Permalink | Reply to this

Re: Means

Thanks, Steven. I confess that I haven’t done more than skim the proof of Arrow’s Theorem, so it was foolhardy of me to claim that Condorcet’s paradox had nothing to do with it… as you’ve shown.

Posted by: Tom Leinster on August 10, 2010 9:19 PM | Permalink | Reply to this

Re: Means

Interesting! If convex spaces are the right setting for weighted means, should they be seen as forming a more restricted setting than the setting for simple means?

Posted by: David Corfield on August 11, 2010 2:30 PM | Permalink | Reply to this

Re: Means

Could you formulate this question in some more detail? In what sense are weighted means a more ‘restricted’ setting than simple means? By a ‘simple mean’, you mean simply the unweighted average of a couple of numbers?

My personal philosophy is to consider an averaging operation to be defined in terms of the structure of algebra over a suitable monad. For example, the convexity monad describes weighted averages, and its algebras are the convex spaces. So a convex space is a set together with a weighted averages operation.

To put it generally, are there any monads like this behind the different concepts of ‘average’ treated by Tom? (I don’t mean the ultrafilter monad here, although this might be related.)

For example, what about ultrafilters? Well, there’s an obvious candidate monad for interpreting an ultrafilter as an algebra: for any set XX, consider the functor SetSet,AA XSet\rightarrow Set,\: A\mapsto A^X which takes the XX-fold product of a set. An ultrafilter on XX is then in particular an algebra structure on {0,1}\{0,1\} for this functor. There is also an obvious way to turn this functor into a monad, thanks to the diagonal map XX×XX\rightarrow X\times X and the terminal map X{*}X\rightarrow\{\ast\}. These yield the ‘inclusion of constant functions’ map AA XA\rightarrow A^X and the ‘restriction to the diagonal’ map A X×XA XA^{X\times X}\rightarrow A^X. Then the first ultrafilter axiom (reproducing the value of a constant function) is obviously equivalent to compatibility of the algebra structure map with the unit of the monad.

But, unfortunately, when also postulating compatibility with the monad multiplication, then the only algebra structures on {0,1}\{0,1\} of this monad are the principal ultrafilters… so does the above philosophy fail in this case?

Posted by: Tobias Fritz on August 11, 2010 6:29 PM | Permalink | Reply to this

Re: Means

Could you formulate this question in some more detail?

I didn’t have anything very profound in mind, just that in the convex space situation you need an nn-simplex worth of means to average nn elements, whereas in Tom’s case it was a single mean.

Is the mean for metrics on the set of candidates, Tom describes, a point in the simplex of weighted means? Do such metrics form a convex space?

Does

I have no idea how you might take the mean of sets, or of vector spaces, or of objects of other popular enriching categories V\mathbf{V}

have something to do with their not forming convex spaces?

Posted by: David Corfield on August 12, 2010 10:14 AM | Permalink | Reply to this

Re: Means

David wrote, essentially:

Do metrics on a given set form a convex space?

Yes. More than that, the set of metrics on a given set is a convex cone: the sum of two metrics is a metric, and you can multiply a metric by a non-negative scalar to obtain a new metric.

Posted by: Tom Leinster on August 12, 2010 1:29 PM | Permalink | Reply to this

Re: Means

Were we to think about means of spaces, we might look back to considering distances between groupoids, or between ω\omega-groupoids. Regarding the latter, overlap of Postnikov towers might point a way to closeness.

We also considered another approach to defining distances between spaces.

Posted by: David Corfield on August 12, 2010 1:53 PM | Permalink | Reply to this

Re: Means

I said in the post that, given a bunch of metrics on a set CC, taking the ordinary mean gives a new metric on CC. If I understand correctly, David’s question about convex spaces asks whether you can also take weighted means; and yes, you can.

But there’s a whole lot more you can do. As I said in the post, you can take a kind of mean “based” on any function m:[0,] n[0,] m: [0, \infty]^n \to [0, \infty] satisfying m(x 1,,x n)+m(y 1,,y n)m(x 1+y 1,,x n+y n) m(x_1, \ldots, x_n) + m(y_1, \ldots, y_n) \geq m(x_1 + y_1, \ldots, x_n + y_n) and m(x,,x)=xm(x, \ldots, x) = x. Here nn is the number of metrics that we’re trying to combine, and “based” means that we’ll define the mm-mean dd of d 1,,d nd_1, \ldots, d_n by d(a,b)=m(d 1(a,b),,d n(a,b)) d(a, b) = m(d_1(a, b), \ldots, d_n(a, b)) (a,bCa, b \in C).

The ordinary mean comes about by putting m(x 1,,x n)=1n(x 1++x n). m(x_1, \ldots, x_n) = \frac{1}{n}(x_1 + \cdots + x_n). The weighted mean comes about by putting m(x 1,,x n)=w ix i m(x_1, \ldots, x_n) = \sum w_i x_i where w 1,,w nw_1, \ldots, w_n are non-negative weights summing to 11.

But these are just the mms giving equality in the inequality above. So you can do a lot more. Now, the inequality looks very much like one of the axioms for a norm. Indeed, given any norm \Vert\cdot\Vert on R n\mathbf{R}^n, and any non-negative scalars λ i\lambda_i, you can put m(x 1,,x n)=(λ 1x 1,,λ nx n). m(x_1, \dots, x_n) = \Vert (\lambda _1 x_1, \ldots, \lambda_n x_n) \Vert. We were also supposed to have m(x,,x)=xm(x, \ldots, x) = x, which places one constraint on the choice of λ 1,,λ n\lambda_1, \ldots, \lambda_n.

For example, take the pp-norm p\Vert\cdot\Vert_p for some p1p \geq 1. Writing w i=λ i pw_i = \lambda_i^p, m(x 1,,x n) =(w 1 1/px 1,,w n 1/px n) p =((w i 1/px i) p) 1/p =(w ix i p) 1/p \begin{aligned} m(x_1, \ldots, x_n) & = \Vert (w_1^{1/p} x_1, \ldots, w_n^{1/p} x_n) \Vert_p \\ & = (\sum (w_i^{1/p} x_i)^p)^{1/p} \\ & = (\sum w_i x_i^p)^{1/p} \end{aligned} and the constraint m(x,,x)=xm(x, \ldots, x) = x becomes iw i=1\sum_i w_i = 1.

This is the generalized mean or power mean of x 1,,x nx_1, \ldots, x_n, of order pp, weighted by w 1,,w nw_1, \ldots, w_n. The theory of power means is very well-understood; you can find a systematic exposition in the book Inequalities of Hardy, Littlewood and Pólya.

A fair voting system is one in which the weights are equal: w 1==w n=1/nw_1 = \cdots = w_n = 1/n.

So there is a whole family of fair voting systems, indexed by the real numbers p1p \geq 1. The system of order pp takes ‘votes’ (metrics) d 1,,d nd_1, \ldots, d_n and produces the ‘outcome’ dd given by d(a,b)=(1nd i(a,b) p) 1/p. d(a, b) = (\frac{1}{n} \sum d_i(a, b)^p)^{1/p}. In other words: transform the distances d i(a,b)d_i(a, b) by taking the ppth power, then take the ordinary mean, then apply the inverse transformation.

The fair system I mentioned before is the case p=1p = 1.

As you vary pp, you vary the amount of attention that the system pays to mildly held views. For example, the limiting case p=p = \infty is d(a,b)=max id i(a,b). d(a, b) = \max_i d_i(a, b). This system cares only about the strongest opinions. Recall that aa and bb represent candidates, ii runs over the set of voters, and the larger d i(a,b)d_i(a, b) is, the more voter ii prefers candidate aa to candidate bb.

How much attention should a good voting system should pay to mildly, as opposed to strongly, held views? Should strength of opinion matter? In all real-life political voting systems that I know of, it’s not taken to matter at all. But perhaps there are situations where it is taken to matter, and I’m not being imaginative enough in what I think of as a ‘voting system’.

There’s a closely analogous issue in the quantification of biodiversity. Different ecologists have different opinions on how much emphasis should be placed on rare, as opposed to common, species. This causes genuine, non-artificial, disagreement over what the word ‘diversity’ should mean. See, for instance, slides 42–53 and 87 of my talk at Category Theory 2010.

Posted by: Tom Leinster on August 12, 2010 2:23 PM | Permalink | Reply to this

Re: Means

There is a sense in which strength of opinion matters in many real-life political voting systems: wherever voting is not mandatory, you need to care enough to bother to vote. This results in strength of opinion being measured in a very rough way, since it’s reduced to a value in {0,1}\{0,1\}.

Posted by: Mark Meckes on August 12, 2010 2:48 PM | Permalink | Reply to this

Re: Means

True. And there’s a corresponding thing involving biological diversity measures. The simplest and most widespread measure of diversity is species richness, which is just the number of species present. It’s totally insensitive to whether a species is rare or common; all that matters is that there’s at least one. So the contribution of each species to the diversity measure is just a value in {0,1}\{0, 1\}.

On another note, I guess enthusiasts for voting reform must make the argument that if the voting system were better, more people would bother to vote. Where proportional representation isn’t used, many people will judge that their vote can’t change anything. And to go back to my example of extremist parties, there are plenty of people who don’t bother voting because they don’t see much difference between the major political parties, but they might bother if they were able to use their vote to express a strong negative opinion about a party.

Posted by: Tom Leinster on August 13, 2010 10:36 PM | Permalink | Reply to this

Re: Means

Steve Landsburg pointed me to this page and I found the approach very interesting. I just wanted to pass along a couple of references.

I believe Conjectures 1 and 2 are both true. The reference is John Weymark, 1984, “Arrow’s theorem with social quasi-orderings,” Public Choice, link.

Also, it looks like the following paper develops a similar line of thought to the topic: Hans Keiding, 1981, “The categorical approach to social choice theory,” Mathematical Social Sciences, link.

Thanks for the interesting post!

Mark

Posted by: Mark Fey on August 11, 2010 10:08 PM | Permalink | Reply to this

Re: Means

Thanks!

(I’m particularly happy to learn that my conjectures are correct, because I’ve had a long run of conjectures at this blog that have all turned out to be false. I was beginning to think that I’d do better if, whenever I felt tempted to make a public conjecture, I negated it first.)

I see that what I called a ‘cabal’ is close to what’s called an oligarchy in the literature. (Weymark attributes it to a 1969 paper by Gibbard.)

The paper by Keiding looks very interesting, and it does seem to have a lot in common with what I wrote. Actually, I was pleasantly surprised by the paper: it’s written with significantly more categorical expertise than I was expecting. And he seems to get some good theorems out.

Posted by: Tom Leinster on August 14, 2010 12:28 AM | Permalink | Reply to this

Re: Means

I haven’t studied Weymark’s paper in enough detail to be sure just what he is doing, but he seems to be addressing a slightly different case; for example, he appears to require that each voter should specify a complete quasiorder.

In any case, conjecture 1 is false (and we must therefore hope that conjecture 2 is false also). Suppose, for example, that there are 2 voters. Voter 1 knows less than voter 2, but his opinions are easier to obtain. So we always rank aba \leq b if voter 1 strictly prefers bb to aa, but if voter 1 rates aa and bb equally (in the sense that a 1ba \leq_1 b and b 1ab \leq_1 a) then we contact voter 2 and rank aa and bb according to his opinions. It is easy to check that this gives a nontrivial mean.

So means don’t quite correspond to filters on the set of voters. Instead, as it turns out, they correspond to weaker structures I’ll call colanders. Since sets such as {i|a ib}\{i| a \leq_i b\} will be important, I’ll call this set S abS_{ab}. To motivate the defintion of colanders, observe first of all that, in the counterexample above, to decide whether aba \leq b we need to know not only the set S abS_{ab}, but also the set S baS_{ba}. To model this situation a colander will not just be a set of subsets of XX; it will be a set of pairs of subsets of XX. For any colander 𝒞\mathcal{C}, the mean μ 𝒞\mu_{\mathcal{C}}corresponding to 𝒞\mathcal{C} will have aba \leq b iff (S ab,S ba)𝒞(S_{ab}, S_{ba}) \in \mathcal{C}.

With this in mind, consider the typical proof that, given a filter \mathcal{F} on XX, we get a mean by taking aba \leq b iff S abS_{ab} \in \mathcal{F}. Transitivity follows from the fact that, for any aa, bb and cc, S abS bcS acS_{ab} \cap S_{bc} \subseteq S_{ac}. But since also S baS cbS caS_{ba} \cap S_{cb} \subseteq S_{ca}, we still get transitivity if we use the rule specified above for any set of pairs of subsets of XX closed under (pointwise) intersection and (pointwise) extension.

In fact, we can do even better than this. If a ica \leq_i c but a iba \nleq_i b then we must also have b icb \ngeq_i c, and vice versa, so the ways in which (S ac,S ca)(S_{ac}, S_{ca}) can extend (S abS bc,S baS cb)(S_{ab} \cap S_{bc}, S_{ba} \cap S_{cb}) are limited. More precisely, let’s say that (P,P)(Q,Q)(P, P') \sqsubseteq (Q, Q') iff (PP)Q=P(P \cup P') \cap Q = P and (PP)Q=P(P \cup P') \cap Q' = P'. Then we always have (S abS bc,S baS cb)(S ac,S ca)(S_{ab} \cap S_{bc}, S_{ba} \cap S_{cb}) \sqsubseteq (S_{ac}, S_{ca}).

Define a colander to be a set 𝒞\mathcal{C} of pairs of subsets of XX such that

  • (X,X)𝒞(X, X) \in {\mathcal{C}}.
  • (X,)𝒞(X, \emptyset) \in \mathcal{C}.
  • 𝒞\mathcal{C} is closed under (pointwise) intersection.
  • 𝒞\mathcal{C} is upward-closed with respect to \sqsubseteq.

The argument of the last few paragraphs show that the last couple of conditions imply that μ 𝒞\mu_{\mathcal{C}} sends lists of quasiorders to quasiorders. The first couple of conditions imply that μ 𝒞\mu_{\mathcal{C}} satisfies axiom ii, and axiom i trivially holds. Thus for any colander 𝒞\mathcal{C}, μ 𝒞\mu_{\mathcal{C}} is a mean.

Conversely, assuming that there are at least 3 candidates, we can show that any mean arises from a colander in this way. Suppose we have some mean μ\mu. For each aba \neq b in CC, whether μ\mu dictates aba \leq b can only depend on the sets S abS_{ab} and S baS_{ba}. Let 𝒞 ab\mathcal{C}_{ab} be the set of pairs (P,P)(P, P') such that if S ab=PS_{ab} = P and S ba=PS_{ba} = P' then μ\mu dictates that aba \leq b.

The first step is to show that all the sets 𝒞 ab\mathcal{C}_{ab} are equal. To see this, it is enough to show that 𝒞 ab\mathcal{C}_{ab} stays the same when we vary bb (then the same fact about aa follows by symmetry). So suppose aa, bb and bb' are distinct elements of CC. Let PP and PP' be sets and consider the sequence of quasiorders on the set {a,b,b}\{a, b, b'\} in which the iith voter ranks

  • a iba \leq_i b iff iPi \in P.
  • b iab \leq_i a iff iPi \in P'.
  • a iba \leq_i b' iff iPi \in P.
  • b iab' \leq_i a iff iPi \in P'.
  • b ibb \leq_i b' always.
  • b ibb' \leq_i b always.

Then the restriction of μ\mu to {a,b,b}\{a, b, b'\} must send this list to some quasiorder \leq. The last 2 conditions above ensure that bbb \leq b' and bbb' \leq b, so that aba \leq b iff aba \leq b'. But aba \leq b iff (P,P)𝒞 ab(P, P') \in \mathcal{C}_{ab} and aba \leq b' iff (P,P)𝒞 ab(P, P') \in \mathcal{C}_{ab'}. Since (P,P)(P, P') was arbitrary, we may deduce 𝒞 ab=𝒞 ab\mathcal{C}_{ab} = \mathcal{C}_{ab'}.

Since all the 𝒞 ab\mathcal{C}_{ab} are equal, we’ll denote their common value by 𝒞\mathcal{C}. It is now enough to prove that 𝒞\mathcal{C} is a colander. The first two conditions follow immediately from the fact that μ\mu satisfies ii. To show that 𝒞\mathcal{C} is closed under (pointwise) intersection, pick any elements (P,P)(P, P') and (Q,Q)(Q, Q') of 𝒞\mathcal{C} and any three distinct elements aa, bb and cc of CC. Consider the sequence of quasiorders on {a,b,c}\{a, b, c\} in which the iith voter ranks

  • a iba \leq_i b iff iPi \in P.
  • b iab \leq_i a iff iPi \in P'.
  • b icb \leq_i c iff iQi \in Q.
  • c ibc \leq_i b iff iQi \in Q'.
  • a ica \leq_i c iff iPQi \in P \cap Q.
  • c iac \leq_i a iff iPQi \in P' \cap Q'.

Let \leq be the overall quasiorder μ\mu gives in this case. Since (P,P)𝒞(P, P') \in \mathcal{C} we have aba \leq b. Since (Q,Q)𝒞(Q, Q') \in \mathcal{C} we have bcb \leq c. By transitivity, aca \leq c, and so (PQ,PQ)𝒞(P \cap Q, P' \cap Q') \in \mathcal{C}.

Finally, to show that 𝒞\mathcal{C} is upward-closed with respect to \sqsubseteq, let (P,P)𝒞(P, P') \in \mathcal{C}, let (Q,Q)(P,P)(Q, Q') \sqsupseteq (P, P') and let aa, bb and cc be any distinct elements of CC. Consider the sequence of quasiorders on {a,b,c}\{a, b, c\} in which the iith voter ranks

  • a iba \leq_i b iff iPi \in P.
  • b iab \leq_i a iff iPi \in P'.
  • b icb \leq_i c iff iPi \in P.
  • c ibc \leq_i b iff iPi \in P'.
  • a ica \leq_i c iff iQi \in Q.
  • c iac \leq_i a iff iQi \in Q'.

Let \leq be the overall quasiorder μ\mu gives in this case. Since (P,P)𝒞(P, P') \in \mathcal{C} we have aba \leq b and bcb \leq c. By transitivity, aca \leq c, and so (Q,Q)𝒞(Q, Q') \in \mathcal{C}.

Thus 𝒞\mathcal{C} is a colander with μ=μ 𝒞\mu = \mu_{\mathcal{C}}, as required.

I’m not sure whether the set of colanders on XX has a simpler characterisation than the one I gave, even for XX a finite set.

Posted by: Nathan Bowler on August 16, 2010 4:59 PM | Permalink | Reply to this

Re: Means

Nathan wrote:

conjecture 1 is false

so I guess I’d better implement my plan of negating any conjecture I feel tempted to make.

Thanks for settling it, anyway!

Incidentally, what Weymark calls a quasi-ordering and Nathan calls a quasiorder is what we usually call a preorder round these parts. It’s a reflexive transitive relation.

Posted by: Tom Leinster on August 16, 2010 9:20 PM | Permalink | Reply to this

Re: Means

Off the post’s topic but perhaps of interest to Café folk: MathML support has now been turned on in WebKit, the Web browsing engine that runs Safari and Google Chrome.

http://webkit.org/blog/1366/announcing…mathml/

My first impression from looking at posts here is that it doesn’t fully cover MathML or, at least, doesn’t match Firefox’s rendering yet. For example, in Mike Stay’s post on Wick rotation, most necessary math formatting happens, but: it can’t find glyphs for some characters (𝒰, bold capital A); characters that shouldn’t be slanted (blackboard bold R) are; there’s too little horizontal spacing generally and too much to the left of sub/superscripts; display math is pushed to the left instead of centered; and labels to the right of expressions (starting with “unitless parametrization”) are italicized and contain no spaces, like variable names.

This would seem to be the ideal community of MathML advocates to tell Apple, Google, and co. what’s wrong or what you want at bugs.webkit.org.

I’m viewing this on a Mac, with the nightly WebKit build from nightly.webkit.org. Frequent (weekly?) WebKit builds for Windows are also available there. Dev builds of Chrome get WebKit updates on Google’s schedule, not immediately.

(Further offtopic, I have a gadget that converts LaTeX in pages to SVG (for browsers supporting it; PNGs for those that don’t) using John Forkosh’s mathTeX and pydvi2svg: mathcache.appspot.com. Adding SVG support to it was a suggestion that originally came up here. I know y’all have reasons for using MathML; only noting it since the feature was conceived in these here comment threads and y’all do math on the Web.)

Posted by: Randall Farmer on August 18, 2010 3:20 AM | Permalink | Reply to this
Read the post Definitions of Ultrafilter
Weblog: The n-Category Café
Excerpt: Several equivalent definitions of ultrafilter, including a particularly simple one that I'd like to find a reference for.
Tracked: July 2, 2011 11:53 PM

Post a New Comment