Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

March 2, 2011

Characterizing the Generalized Means

Posted by Tom Leinster

Generalized means are things like arithmetic means and geometric means. They can be ‘fair’, giving all their inputs equal status, or they can be weighted. I guess the first major result on them was the theorem that the arithmetic mean is always greater than or equal to the geometric mean.

Another, later, landmark was the 1934 book Inequalities of Hardy, Littlewood and Pólya, where they proved a characterization theorem for generalized means. It looks like this:

If you have some sort of ‘averaging operation’ with all the properties you’d expect of something called an averaging operation, then there aren’t many options: it must be of a certain prescribed form.

That’s ancient history. It could be even more ancient than Hardy, Littlewood and Pólya: I don’t know whether the characterization in their book is due to them, or whether it’s older still.

Yesterday, however, I posted about a new theorem of Guillaume Aubrun and Ion Nechita that gives a startlingly simple characterization of the pp-norms. Since pp-norms and generalized means are closely related, I wondered, out loud, whether it might be possible to deduce from their result a simple new characterization of generalized means. And if I’m not mistaken, the answer is yes.

To say it more precisely: unless I’ve made a mistake, it’s possible to derive from their result a simple characterization of the generalized means of orders 1\geq 1. (I’ll explain what that means in a moment.) Whether it’s new is harder to know; I’m not familiar enough with the literature to be confident on that.

A quick write-up is here.

Here’s the statement. I’ll write Δ I={w I:w i0,  iw i=1} \Delta_I = \{ w \in \mathbb{R}^I : w_i \geq 0,   \sum_i w_i = 1 \} for the set of probability distributions on a finite set II. For any real p1p \geq 1, wΔ Iw \in \Delta_I and x Ix \in \mathbb{R}^I, we can form the mean of order pp of the numbers x ix_is, weighted by the w iw_is: M p(w,x)=( iw ix i p) 1/p. M_p(w, x) = (\sum_i w_i x_i^p)^{1/p}. There’s a sensible way to do it for p=p = \infty too. These operations M pM_p are the generalized means that we’re going to characterize.

Here we go. A system of means MM consists of a function M:Δ I×[0,) I[0,)M: \Delta_I \times [0, \infty)^I \to [0, \infty) for each finite set II, satisfying:

  • Normalization:  M((1),(c))=cM((1), (c)) = c for all c[0,)c \in [0, \infty). (This tells you what the average of a one-member family is.)
  • Monotonicity:  M(w,x)M(w,y)M(w, x) \leq M(w, y) if x iy ix_i \leq y_i for all ii.
  • Functoriality:  M(fw,x)=M(w,xf)M(f w, x) = M(w, x f) whenever f:IJf: I \to J is a map of finite sets, wΔ Iw \in \Delta_I and x[0,) Jx \in [0, \infty)^J. I haven’t defined the notation fwf w and xfx f, nor have I explained the intuitive idea behind this slick notation; you can figure it out for yourself or click the link.

A system of means is multiplicative if M((v iw j) i,j,(x iy j) i,j)=M(v,x)M(w,y) M((v_i w_j)_{i, j}, (x_i y_j)_{i, j}) = M(v, x) M(w, y) whenever vΔ Iv \in \Delta_I, wΔ Jw \in \Delta_J, x[0,) Ix \in [0, \infty)^I, y[0,) Jy \in [0, \infty)^J. It satisfies the triangle inequality if M(w,x+y)M(w,x)+M(w,y). M(w, x + y) \leq M(w, x) + M(w, y).

For each p[1,]p \in [1, \infty], we have the generalized mean M pM_p of order pp. It’s a multiplicative system of norms satisfying the triangle inequality. And there are no others:

Theorem  Every multiplicative system of means satisfying the triangle inequality is of the form M pM_p for some p[1,]p \in [1, \infty].

Proof here, I hope.

Posted at March 2, 2011 11:59 PM UTC

TrackBack URL for this Entry:   http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2366

9 Comments & 0 Trackbacks

Re: Characterizing the Generalized Means

Very nice! I haven’t had a chance to look at the proof yet, but here’s a tiny correction to the post: in the statement of the theorem you wrote “norms” when you meant “means”.

Posted by: Mark Meckes on March 3, 2011 2:35 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Thanks! It was all done in a hurry, so let’s hope it stands up.

Typo fixed too. Talk about spoiling the punchline.

Posted by: Tom Leinster on March 3, 2011 2:44 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Now that you’ve used that startlingly simple characterization of L pL^p norms to get a startlingly simple characterization of generalized means, how about using it to get a startlingly simple characterization of Rényi entropies?

The formula for Rényi entropies looks a bit like the formula for generalized means.

The multiplicativity axiom for a system of means looks like the additivity axiom for Rényi entropy. There’s a logarithm in the definition of Rényi entropy, which converts multiplication to addition. As you’ve pointed out, this isn’t such a big deal.

And if you need the extra flexibility given by those ‘weights’ to prove your theorem about generalize means, maybe you can incorporate them using the already-known idea of ‘relative’ Rényi entropy, also known as the ‘Rényi divergence’. The formula on the link would look more promising if we rewrote it as

R p(w,w)=1p1log iw i(w iw i) pR_p(w,w') = \frac{1}{p - 1} \; \log \sum_i w_i \left(\frac{w'_i}{w_i}\right)^p

which looks quite similar to your

M p(w,x)=( iw ix i p) 1/p. M_p(w, x) = (\sum_i w_i x_i^p)^{1/p}.

if we set x i=w i/w ix_i = w'_i/w_i.

So, I’m pretty optimistic. I hope you do it!

Posted by: John Baez on March 9, 2011 2:40 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Oh. Only now did I read this. My optimism is unshaken, but I guess the idea was already in the air.

Posted by: John Baez on March 9, 2011 3:43 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Thanks for the thoughts, John. Yes indeed. In fact my whole reason for being interested in this is that I’m interested in characterizations of Rényi entropy and divergence. That in turn is because of my interest in measures of diversity. (The formulas in these two posts from a couple of years ago will also make you think of pp-norms, generalized means and Rényi entropy; indeed Rényi entropy is mentioned there.)

Here’s a sticking point. Aubrun and Nechita’s characterization is all about norms, and in order for the pp-norm to be a norm we need pp to be 1\geq 1.

On the other hand, I’m most interested in Rényi entropy when “pp” is a real number 1\leq 1. In the ecological application, we have nn species in proportions w 1,,w nw_1, \ldots, w_n, we have an n×nn \times n matrix ZZ indicating the similarities between species (with entries in [0,1][0, 1]), and we have a parameter q0q \geq 0 reflecting the importance attached to common vs. rare species. The diversity of order qq of the community is D q Z(w)=(w i(Zw) i q1) 1/(1q). D_q^Z(w) = \left( \sum w_i (Z w)_i^{q - 1} \right)^{1/(1 - q)}. Writing x i=1/(Zw) ix_i = 1/(Z w)_i, this is equal to M 1q(w,x). M_{1 - q}(w, x). The “pp” here is 1q1 - q, so being most interested in q[0,]q \in [0, \infty] means being most interested in p[,1]p \in [-\infty, 1].

Negative values of qq are probably also interesting, but I definitely want to be able to handle nonnegative values, which are precisely those excluded by the hypotheses of the Aubrun–Nechita theorem.

So my hope is that Aubrun and Nechita’s argument can be adapted to cover generalized means of order p[,1]p \in [-\infty, 1]. I had a quick try at this, but then other things came along and I had to put it on hold.

Posted by: Tom Leinster on March 9, 2011 4:02 PM | Permalink | Reply to this

Re: Characterizing the Generalized Means

I think I can now weaken the triangle inequality to the convexity condition M(w,(x+y)/2)max{M(w,x),M(w,y)}. M(w, (x + y)/2) \leq max \{ M(w, x), M(w, y) \}. The updated proof is in the pdf.

Posted by: Tom Leinster on March 11, 2011 4:08 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

I put a note on the arXiv.

Posted by: Tom Leinster on March 15, 2011 1:09 AM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Hi Tom,
There is a vast literature on functional equations, and questions such characterizing generalized means have been investigated in depth. I don’t know how this connects to your results, but you might benefit by looking into the books of Janos Aczel, for example (“Functional equations in several variables”, 1989 (with Dhombres), or “Functional equations and their applications”, 1966). There is also a beautiful little book by Einar Hille and the last chapter (if I remember correctly) deals with this. I don’t think it was ever officially published, but a copy can be found in the mathematics library of the Technion, Israel, if you know someone there.
In any case, these are enjoyable reads.
best regards,
Orr

Posted by: Orr Shalit on March 22, 2011 3:44 PM | Permalink | Reply to this

Re: Characterizing the Generalized Means

Hi Orr,

Thanks very much for the references. I’ve been discovering the (many) works of Aczél on functional equations, but I didn’t know about the book with Dhombres, or the book by Hilles.

It’s always difficult for an outsider to a subject to quickly discover whether a result is new or not. Aubrun and Nechita’s use of probabilistic methods seems to be quite unlike anything I’ve seen in the work of Aczél and co, but then again, most of that work is decades old.

I suppose it’s possible that A and N’s characterization of the p-norms is new but my characterization of the power means isn’t, though it seems slightly unlikely. But as you say, there are already several known characterizations of the power means out there — and some of them are very nice.

Posted by: Tom Leinster on March 24, 2011 2:11 PM | Permalink | Reply to this

Post a New Comment