Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

March 2, 2011

Characterizing the Generalized Means

Posted by Tom Leinster

Generalized means are things like arithmetic means and geometric means. They can be ‘fair’, giving all their inputs equal status, or they can be weighted. I guess the first major result on them was the theorem that the arithmetic mean is always greater than or equal to the geometric mean.

Another, later, landmark was the 1934 book Inequalities of Hardy, Littlewood and Pólya, where they proved a characterization theorem for generalized means. It looks like this:

If you have some sort of ‘averaging operation’ with all the properties you’d expect of something called an averaging operation, then there aren’t many options: it must be of a certain prescribed form.

That’s ancient history. It could be even more ancient than Hardy, Littlewood and Pólya: I don’t know whether the characterization in their book is due to them, or whether it’s older still.

Yesterday, however, I posted about a new theorem of Guillaume Aubrun and Ion Nechita that gives a startlingly simple characterization of the p-norms. Since p-norms and generalized means are closely related, I wondered, out loud, whether it might be possible to deduce from their result a simple new characterization of generalized means. And if I’m not mistaken, the answer is yes.

To say it more precisely: unless I’ve made a mistake, it’s possible to derive from their result a simple characterization of the generalized means of orders 1. (I’ll explain what that means in a moment.) Whether it’s new is harder to know; I’m not familiar enough with the literature to be confident on that.

A quick write-up is here.

Here’s the statement. I’ll write Δ I={w I:w i0,  iw i=1} for the set of probability distributions on a finite set I. For any real p1, wΔ I and x I, we can form the mean of order p of the numbers x is, weighted by the w is: M p(w,x)=( iw ix i p) 1/p. There’s a sensible way to do it for p= too. These operations M p are the generalized means that we’re going to characterize.

Here we go. A system of means M consists of a function M:Δ I×[0,) I[0,) for each finite set I, satisfying:

  • Normalization:  M((1),(c))=c for all c[0,). (This tells you what the average of a one-member family is.)
  • Monotonicity:  M(w,x)M(w,y) if x iy i for all i.
  • Functoriality:  M(fw,x)=M(w,xf) whenever f:IJ is a map of finite sets, wΔ I and x[0,) J. I haven’t defined the notation fw and xf, nor have I explained the intuitive idea behind this slick notation; you can figure it out for yourself or click the link.

A system of means is multiplicative if M((v iw j) i,j,(x iy j) i,j)=M(v,x)M(w,y) whenever vΔ I, wΔ J, x[0,) I, y[0,) J. It satisfies the triangle inequality if M(w,x+y)M(w,x)+M(w,y).

For each p[1,], we have the generalized mean M p of order p. It’s a multiplicative system of norms satisfying the triangle inequality. And there are no others:

Theorem  Every multiplicative system of means satisfying the triangle inequality is of the form M p for some p[1,].

Proof here, I hope.

Posted at March 2, 2011 11:59 PM UTC

TrackBack URL for this Entry:   http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2366

9 Comments & 0 Trackbacks

Re: Characterizing the Generalized Means

Very nice! I haven’t had a chance to look at the proof yet, but here’s a tiny correction to the post: in the statement of the theorem you wrote “norms” when you meant “means”.

Posted by: Mark Meckes on March 3, 2011 2:35 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Thanks! It was all done in a hurry, so let’s hope it stands up.

Typo fixed too. Talk about spoiling the punchline.

Posted by: Tom Leinster on March 3, 2011 2:44 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Now that you’ve used that startlingly simple characterization of L p norms to get a startlingly simple characterization of generalized means, how about using it to get a startlingly simple characterization of Rényi entropies?

The formula for Rényi entropies looks a bit like the formula for generalized means.

The multiplicativity axiom for a system of means looks like the additivity axiom for Rényi entropy. There’s a logarithm in the definition of Rényi entropy, which converts multiplication to addition. As you’ve pointed out, this isn’t such a big deal.

And if you need the extra flexibility given by those ‘weights’ to prove your theorem about generalize means, maybe you can incorporate them using the already-known idea of ‘relative’ Rényi entropy, also known as the ‘Rényi divergence’. The formula on the link would look more promising if we rewrote it as

R p(w,w)=1p1log iw i(w iw i) p

which looks quite similar to your

M p(w,x)=( iw ix i p) 1/p.

if we set x i=w i/w i.

So, I’m pretty optimistic. I hope you do it!

Posted by: John Baez on March 9, 2011 2:40 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Oh. Only now did I read this. My optimism is unshaken, but I guess the idea was already in the air.

Posted by: John Baez on March 9, 2011 3:43 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Thanks for the thoughts, John. Yes indeed. In fact my whole reason for being interested in this is that I’m interested in characterizations of Rényi entropy and divergence. That in turn is because of my interest in measures of diversity. (The formulas in these two posts from a couple of years ago will also make you think of p-norms, generalized means and Rényi entropy; indeed Rényi entropy is mentioned there.)

Here’s a sticking point. Aubrun and Nechita’s characterization is all about norms, and in order for the p-norm to be a norm we need p to be 1.

On the other hand, I’m most interested in Rényi entropy when “p” is a real number 1. In the ecological application, we have n species in proportions w 1,,w n, we have an n×n matrix Z indicating the similarities between species (with entries in [0,1]), and we have a parameter q0 reflecting the importance attached to common vs. rare species. The diversity of order q of the community is D q Z(w)=(w i(Zw) i q1) 1/(1q). Writing x i=1/(Zw) i, this is equal to M 1q(w,x). The “p” here is 1q, so being most interested in q[0,] means being most interested in p[,1].

Negative values of q are probably also interesting, but I definitely want to be able to handle nonnegative values, which are precisely those excluded by the hypotheses of the Aubrun–Nechita theorem.

So my hope is that Aubrun and Nechita’s argument can be adapted to cover generalized means of order p[,1]. I had a quick try at this, but then other things came along and I had to put it on hold.

Posted by: Tom Leinster on March 9, 2011 4:02 PM | Permalink | Reply to this

Re: Characterizing the Generalized Means

I think I can now weaken the triangle inequality to the convexity condition M(w,(x+y)/2)max{M(w,x),M(w,y)}. The updated proof is in the pdf.

Posted by: Tom Leinster on March 11, 2011 4:08 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

I put a note on the arXiv.

Posted by: Tom Leinster on March 15, 2011 1:09 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Hi Tom,
There is a vast literature on functional equations, and questions such characterizing generalized means have been investigated in depth. I don’t know how this connects to your results, but you might benefit by looking into the books of Janos Aczel, for example (“Functional equations in several variables”, 1989 (with Dhombres), or “Functional equations and their applications”, 1966). There is also a beautiful little book by Einar Hille and the last chapter (if I remember correctly) deals with this. I don’t think it was ever officially published, but a copy can be found in the mathematics library of the Technion, Israel, if you know someone there.
In any case, these are enjoyable reads.
best regards,
Orr

Posted by: Orr Shalit on March 22, 2011 3:44 PM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Hi Orr,

Thanks very much for the references. I’ve been discovering the (many) works of Aczél on functional equations, but I didn’t know about the book with Dhombres, or the book by Hilles.

It’s always difficult for an outsider to a subject to quickly discover whether a result is new or not. Aubrun and Nechita’s use of probabilistic methods seems to be quite unlike anything I’ve seen in the work of Aczél and co, but then again, most of that work is decades old.

I suppose it’s possible that A and N’s characterization of the p-norms is new but my characterization of the power means isn’t, though it seems slightly unlikely. But as you say, there are already several known characterizations of the power means out there — and some of them are very nice.

Posted by: Tom Leinster on March 24, 2011 2:11 PM | Permalink | Reply to this

Post a New Comment