
November 6, 2006

Infinite-Dimensional Exponential Families

Posted by David Corfield

Back on my old blog I posted a few times on information geometry (1, 2, 3, 4). One key idea is the duality between projecting from a prior distribution onto the manifold of distributions, a specified set of whose moments match those of the empirical distribution, and projecting from the empirical distribution onto the corresponding exponential family. Legendre transforms govern this duality.
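In the finite-dimensional case this duality can be made explicit (a standard convex-duality sketch, not specific to the posts linked above). An exponential family with sufficient statistic $T$ and log-partition function $\psi$ has mean parameters given by the gradient of $\psi$, and the Legendre transform links the two coordinate systems:

```latex
p_\theta(x) = \exp\bigl(\langle\theta, T(x)\rangle - \psi(\theta)\bigr)\,p_0(x),
\qquad
\mu(\theta) := \mathbb{E}_{p_\theta}[T(X)] = \nabla\psi(\theta),

\psi^*(\mu) = \sup_{\theta}\,\bigl(\langle\theta,\mu\rangle - \psi(\theta)\bigr),
\qquad
\nabla\psi(\hat\theta) = \hat\mu = \tfrac{1}{n}\sum_{i=1}^{n} T(x_i).
```

Maximum entropy subject to the moment constraints picks out a member of the exponential family, while maximum likelihood within the family matches the model's moments to the empirical ones; $\psi$ and $\psi^*$ are Legendre conjugates, which is the duality referred to above.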

Now, one of the most important developments in machine learning over the past decade has been the use of kernel methods. For example, in the support vector machine (SVM) approach to classification, the data space is mapped into a feature space, a reproducing kernel Hilbert space. A linear classifier is then chosen in this feature space which does the best job at separating points with different labels. This classifier corresponds to a nonlinear decision boundary in the original space. The ‘Bayesian’ analogue employs Gaussian processes (GP).
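To illustrate the kernel idea concretely, here is a toy sketch (kernel ridge regression on the labels rather than the actual SVM margin optimization; the data and parameter values are made up). The learned classifier is a function in the RKHS, linear in feature space but nonlinear in the original space:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Toy data: two clusters in the plane, labelled +1 and -1.
X = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Kernel ridge "classifier": solve (K + lam*I) alpha = y.
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)

def decide(x_new):
    """Sign of the RKHS function f(x) = sum_i alpha_i k(x_i, x)."""
    k = rbf_kernel(np.atleast_2d(x_new), X)[0]
    return np.sign(k @ alpha)

print(decide([0.1, 0.0]))   # near the +1 cluster -> 1.0
print(decide([2.0, 2.05]))  # near the -1 cluster -> -1.0
```

The Gaussian-process analogue replaces the ridge solve with a posterior over functions, but the representation of the solution as a kernel expansion over the data points is the same.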

Putting the two ideas together what we need is nonparametric information geometry, involving projection onto infinite-dimensional exponential families. Various people have worked to find maximal exponential families using Orlicz spaces. But to capture SVM and GP methods it looks like we want a more restricted form of exponential family, where the manifold of models is locally isomorphic to a reproducing kernel Hilbert space. Someone working on this, Kenji Fukumizu, is now in Tübingen, so I’m hoping to learn a few things from him over the next few days.

Information geometry, both finite and infinite-dimensional, has a quantum equivalent, see, e.g., here. Does anyone know what the major achievements of this field are?

Posted at November 6, 2006 11:42 AM UTC


8 Comments & 2 Trackbacks

Re: Infinite-Dimensional Exponential Families

Hi David,

I remember trying to dig up information about this topic while I was in Tuebingen.
I found some interesting work about this in
Pistone, G. and Sempi, C. (1995), ‘An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one’, The Annals of Statistics 23(5), 1543–1561.

This is a tough paper, but seems to be a good starting point (there are other works by the authors going in this direction).

Of course, this does not go in the direction of RKHS, but at least lays down the proper foundations for infinite dimensional information geometry (thus generalizing the stuff in Amari’s book).

Cheers,
Olivier.

Posted by: Olivier Bousquet on November 6, 2006 4:50 PM | Permalink | Reply to this

Re: Infinite-Dimensional Exponential Families

Yes. Though to explain to others why we need to look for something smaller: the likelihood function with finite samples is not continuous on Pistone–Sempi-style Banach manifolds of distributions. On the other hand, evaluation at a point is a continuous functional on a reproducing kernel Hilbert space, so it makes sense to build a manifold on these spaces. There’s still the problem of ill-posedness if you want to use maximum likelihood estimation, since this means choosing an estimator in an infinite-dimensional space using only finitely many constraints. Fukumizu’s solution is to form a series of finite-dimensional submanifolds of increasing dimension as the number of samples increases.
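In symbols (standard RKHS facts, stated for readers unfamiliar with them): the reproducing property makes point evaluation a bounded linear functional,

```latex
f(x) = \langle f, k(x,\cdot)\rangle_{\mathcal H},
\qquad
|f(x)| \;\le\; \|f\|_{\mathcal H}\,\sqrt{k(x,x)},
```

so a log-likelihood built from the evaluations $f(x_1),\dots,f(x_n)$ varies continuously with $f$ in the RKHS norm, which is exactly what fails on the Pistone–Sempi manifolds.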

Posted by: David Corfield on November 7, 2006 10:21 AM | Permalink | Reply to this

Re: Infinite-Dimensional Exponential Families

There’s still the problem of ill-posedness if you want to use maximum likelihood estimation, since this means choosing an estimator in an infinite-dimensional space using only finitely many constraints. Fukumizu’s solution is to form a series of finite-dimensional submanifolds of increasing dimension as the number of samples increases.
Indeed, it comes down to sampling. I think a decent way to start is to ‘initialize’ some estimation process with a prior, observe a sample, maximize the likelihood of that sample, then receive another sample and calculate the conditional measure given the second observation conditional on the first, the third conditional on the second (which is conditional on the first), and so on. Eventually one would expect the influence of distant samples to diminish as the sample size increases, so at some point you can drop samples from the distant past to create some sort of moving probabilistic average. This stuff is more in the realm of inference theory than probability. M.M. Rao claims that all of stochastic inference boils down to a two-person zero-sum game where the other player is deemed, enigmatically, “nature”, as in von Neumann and Morgenstern’s Theory of Games and Economic Behavior. Very interesting stuff.
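A minimal sketch of the ‘moving probabilistic average’ idea (my own toy construction, not anything from Rao: a Gaussian mean estimate over a sliding window, so that distant samples eventually stop influencing the estimate):

```python
from collections import deque

def make_window_estimator(window=5, prior_mean=0.0):
    """Sliding-window estimator: only the most recent `window`
    observations are retained, so the distant past drops out."""
    buf = deque(maxlen=window)

    def update(x):
        buf.append(x)
        # MLE of a Gaussian mean from the retained window; the prior
        # would only matter before any data arrives (safeguard here).
        return sum(buf) / len(buf) if buf else prior_mean

    return update

update = make_window_estimator(window=3)
for x in [0.0, 0.0, 0.0, 6.0, 6.0, 6.0]:
    est = update(x)
print(est)  # only the last 3 samples remain, so the estimate is 6.0
```

A fully Bayesian version would propagate a posterior instead of a point estimate, but the windowing idea is the same.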

Posted by: Stephen Crowley on May 1, 2007 10:24 PM | Permalink | Reply to this

Re: Infinite-Dimensional Exponential Families

> This stuff is more in the realm of inference theory than probability. M.M. Rao claims that all of stochastic inference boils down to a two-person zero-sum game where the other player is deemed, enigmatically, “nature”, as in von Neumann and Morgenstern’s Theory of Games and Economic Behavior

Do you happen to know if Rao’s claim is in any way related to Volodya Vovk’s game-theoretic probability? My feeling about this competitive online style of probability is that it doesn’t leave much room for “inference” per se, the “experts” being presupposed.

On an unrelated, but hopefully not too provocative note to David: where does one draw the line between “set theoretic dust” (cf your fine book) and “measure theoretic smoosh” (my own perhaps unfair take on this whole categorical probability business)?

Personally I’d love to read more categorical or logical approaches to the Polya/Jaynes/Cox axioms, if anyone knows of any. So far Youssef’s exotic probability references are about the furthest I’ve seen this taken. And on the computer science side, Arnborg’s paper on the plausibility of a probability provides the most convincing Cox-style derivation of probability I’ve seen.

Posted by: Allan E on May 2, 2007 3:14 AM | Permalink | Reply to this

Re: Infinite-Dimensional Exponential Families

Do you happen to know if Rao’s claim is in any way related to Volodya Vovk’s game-theoretic probability? My feeling about this competitive online style of probability is that it doesn’t leave much room for “inference” per se, the “experts” being presupposed.
Just looked at Vovk’s stuff; it doesn’t appear to be related. Wald seems to be the one who really developed statistical decision theory along these lines.

Posted by: Stephen Crowley on May 2, 2007 4:43 PM | Permalink | Reply to this

Re: Infinite-Dimensional Exponential Families

I’m not sure what Rao meant by that statement. Certainly there are minimax results in statistical inference. If I know Nature has chosen a fixed unknown distribution and gives me an iid sample, how do I ensure that I achieve the least loss, whatever this distribution is?

There’s certainly a link between Vovk’s work and Peter Grünwald’s work on Minimum Description Length and minimum-regret results. The latter has results of the form: whatever sequence Nature throws at me, guessing each time a new data point comes in, I can ensure that I do only so much worse than the best of a set of competitor experts, where that best expert is selected after all the data is known.
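A minimal sketch of that expert setting (the exponentially weighted average forecaster, a standard algorithm in this literature; the loss sequence here is made up): the learner mixes the experts by weights that decay exponentially in their cumulative loss, and its regret against the best expert stays bounded regardless of the sequence.

```python
import math

def hedge(expert_losses, eta=0.5):
    """Exponentially weighted average forecaster: each round, weight
    experts by exp(-eta * cumulative loss), suffer the mixture loss."""
    n = len(expert_losses[0])
    cum = [0.0] * n          # cumulative loss of each expert
    total = 0.0              # learner's cumulative loss
    for losses in expert_losses:   # one row of per-expert losses per round
        w = [math.exp(-eta * c) for c in cum]
        z = sum(w)
        total += sum(wi / z * li for wi, li in zip(w, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return total, min(cum)   # learner vs best expert in hindsight

# Expert 0 is always right, expert 1 always wrong, over 20 rounds.
ours, best = hedge([[0.0, 1.0]] * 20)
print(ours - best)  # regret stays bounded as the number of rounds grows
```

The classical bound is regret at most $(\ln N)/\eta + \eta T/8$ for $N$ experts over $T$ rounds, optimized by tuning $\eta \sim \sqrt{(\ln N)/T}$.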

I’m not sure what you mean by ‘this whole categorical probability business’. There have been objections to measure theory as the basic language of probability theory, rather like those against set theory for mathematics. By ‘categorical’ do you mean the category theoretic treatment with monads?

Posted by: David Corfield on May 9, 2007 8:46 AM | Permalink | Reply to this

Re: Infinite-Dimensional Exponential Families

This thesis of Cena is described by Pistone as an improvement of his earlier work.

Posted by: David Corfield on November 7, 2006 11:23 AM | Permalink | Reply to this
Read the post NIPS 2006
Weblog: The n-Category Café
Excerpt: NIPS 2006 conference, Vancouver
Tracked: November 27, 2006 4:27 PM
Read the post Probability in Amsterdam
Weblog: The n-Category Café
Excerpt: Foundations of the Formal Sciences conference in Amsterdam
Tracked: May 1, 2007 9:07 AM

Re: Infinite-Dimensional Exponential Families

Note, if I ever come back to this, Density Estimation in Infinite Dimensional Exponential Families.

Posted by: David Corfield on December 13, 2013 1:13 PM | Permalink | Reply to this
