Common Applications
Posted by David Corfield
I’ve been reading some of Jorg Lemm’s papers in recent days. He’s written a book - Bayesian Field Theory - which I don’t have access to, but he had written a paper of the same name earlier. In it (page 6, note 1) he remarks that:
statistical field theories, which encompass quantum
mechanics and quantum field theory in their Euclidean formulation, are technically
similar to a nonparametric Bayesian approach.
It is intriguing to see so many constructions of mathematical physics - mean field methods, diffusion models, free energy - find a use in learning theory. But what to make of it? If we think it needs an explanation at all, we might say that perhaps it’s telling us that we only have a limited number of tools, so should expect to use them time and again. If we were washed up on a desert island with just a knife in our pocket, we’d find a host of uses for it, with little in common between them, e.g., opening a clam and sharpening a stick.
David Ruelle favoured this kind of explanation about multiple application in “Is our mathematics natural? The case of equilibrium statistical mechanics.” Bull. Amer. Math. Soc. 19, 259-268 (1988). Our minds have a limited repertoire, which explains why mathematicians keep bumping into the same constructions. Closer to this blog, a similar question is why the deeper reaches of number theory (Langlands programme) and quantum field theory (duality) are so closely related. In Mathematics in the 20th Century, Michael Atiyah’s predictions for the 21st century went thus:
What about the 21st century? I have said the 21st century might be
the era of quantum mathematics or, if you like, of infinite dimensional
mathematics. What could this mean? Quantum
mathematics could mean, if we get that far, ‘understanding properly
the analysis, geometry, topology, algebra of various non-linear
function spaces’, and by ‘understanding properly’ I mean
understanding it in such a way as to get quite rigorous proofs of all
the beautiful things the physicists have been speculating about.
This work requires generalising the duality between position and momentum in classical mechanics:
This replaces a space by its dual space, and in linear
theories that duality is just the Fourier transform. But in non-linear theories, how to replace a Fourier transform is one of the big challenges. Large parts of
mathematics are concerned with how to generalise dualities in nonlinear
situations. Physicists seem to be able to do so in a remarkable way in their string
theories and in M-theory…understanding those non-linear dualities does seem to
be one of the big challenges of the next century as well. (Atiyah 2002: 14-15, my emphasis)
Again, is this just a sign of our limited repertoire? (Perhaps Atiyah might have said that the problem is also how to categorify such dualities.)
A second strain of explanation for multiple application of a piece of mathematics, on the other hand, is that the things it is applied to really are similar. It is no accident that the same tools work in different situations when the tasks are very similar. With regard to commonalities between Bayesian statistics and physics, Edwin Jaynes would favour this latter explanation. Recently this has been expressed by Caticha in The Information Geometry of Space and Time:
The point of view that has been prevalent among scientists is that the laws
of physics mirror the laws of nature. The reflection might be imperfect, a
mere approximation to the real thing, but it is a reflection nonetheless. The
connection between physics and nature could, however, be less direct. The laws
of physics could be mere rules for processing information about nature. If this
second point of view turns out to be correct one would expect many aspects
of physics to mirror the structure of theories of inference. Indeed, it should
be possible to derive the “laws of physics” appropriate to a certain problem
by applying standard rules of inference to the information that happens to be
relevant to the problem at hand.
Noting that statistical mechanics and quantum mechanics can be largely constructed by considering them as ways of manipulating information, Caticha goes on to take on general relativity.
Now, John Baez raises an interesting example of commonality of structure in this comment, between natural selection and Bayesian inference. I could imagine explanations by both of the strains above.
- The structure of Bayes’ theorem (which doesn’t require you to be a Bayesian to use) is a very simple one relevant in many combinatorial situations, which is how we like to think about the world.
- Evolution is a kind of learning.
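One way to make the shared structure concrete, assuming the discrete-time replicator equation as the model of selection (my assumption here, not something spelled out above): reweight each hypothesis by its likelihood and renormalise, or reweight each type by its fitness and renormalise. The sketch below uses made-up numbers and hypothetical function names purely to show that the two updates are the same arithmetic.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Posterior over hypotheses: prior times likelihood, renormalised."""
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    return [p * l / evidence for p, l in zip(prior, likelihood)]


def replicator_step(frequencies, fitness):
    """Discrete-time replicator step: each type's share is rescaled
    by its fitness relative to the population mean fitness."""
    mean_fitness = sum(x * f for x, f in zip(frequencies, fitness))
    return [x * f / mean_fitness for x, f in zip(frequencies, fitness)]


# Illustrative numbers only: three hypotheses (or three types).
prior = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihood = [Fraction(1, 10), Fraction(3, 10), Fraction(6, 10)]

posterior = bayes_update(prior, likelihood)
next_generation = replicator_step(prior, likelihood)

# Same rule under two names: "evidence" plays the role of mean fitness.
print(posterior == next_generation)
```

Read this way, the prior is the current population, the likelihood is the fitness, and the normalising evidence is the mean fitness. Whether that counts as a limited repertoire or a genuine similarity is exactly the question of the post.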
Posted at December 21, 2006 3:14 PM UTC
TrackBack URL for this Entry: http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/1084
Re: Common Applications
Interesting observations.
A tiny comment:
I think everybody will agree that the general pattern of statistical mechanics is indeed more about inference than about nature per se. But at some point you want to apply all this to a particular case. Usually this amounts to specifying a Hamiltonian function.
And the precise details of that function are what encode information about nature.
So there is a bit of information about nature - encoded in a Hamiltonian - and then there are means to extract certain parts of that information (entropy maximization, etc.).
Interestingly, while quantum mechanics is in a way nothing but statistical mechanics analytically continued to the complex plane, we usually tend to regard not just the Hamiltonian in quantum mechanics as encoding information about nature, but also the rest of the formalism.
Whether that “rest of the formalism” is really just a manifestation of our thinking or a genuine aspect of nature is hotly debated in all those discussions concerning the “interpretation of quantum mechanics”.
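For readers who want the "analytic continuation" remark in symbols, here is the standard textbook form of it (the notation is mine, not the commenter's): the partition function at inverse temperature $\beta$ is the trace of the time-evolution operator evaluated at an imaginary time.

```latex
\begin{align}
  Z(\beta) &= \operatorname{Tr}\, e^{-\beta H}, &
  U(t) &= e^{-iHt/\hbar}, \\
  Z(\beta) &= \operatorname{Tr}\, U(-i\hbar\beta).
\end{align}
```

Substituting $t = -i\hbar\beta$ into $U(t)$ gives $e^{-\beta H}$, so the same Hamiltonian $H$ carries the "information about nature" in both settings.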
Re: Common Applications
In connection with Fuchs’ viewpoint, see this discussion by Ray Streater, entitled “Locality in the EPR experiment”. It starts this way:
I. The von Neumann collapse postulate
In this section, we show that the postulate of von Neumann, that on measurement the wave function collapses to an eigenstate of the observable being measured, follows from Bayes’s rule for conditioning probabilities in classical probability.
Re: Common Applications
See the following, which cites Caticha’s paper:
Re: Common Applications
I love this discussion. It always brings me joy to see someone mention Jaynes as well :)
I’m now thinking about this stuff from a completely different perspective, i.e. finance and economics, but I think similar arguments carry over (as unobvious as that may sound). Jaynes similarly applied his ideas to economics. I think he was a visionary and will someday be recognized as such by a larger audience. I only wish I had stumbled onto his work earlier so that I might have met him.
Best regards,
Eric
Re: Common Applications
In connection with the application of constructions of mathematical physics to machine learning, as well as Bayesianism, I thought some recent work of Shalizi and Crutchfield should be mentioned:
Pattern Discovery and Computational Mechanics
Computational mechanics is a method for discovering, describing and quantifying patterns, using tools from statistical physics. It constructs optimal, minimal models of stochastic processes and their underlying causal structures. These models tell us about the intrinsic computation embedded within a process—how it stores and transforms information. Here we summarize the mathematics of computational mechanics, especially recent optimality and uniqueness results. We also expound the principles and motivations underlying computational mechanics, emphasizing its connections to the minimum description length principle, PAC theory, and other aspects of machine learning.
Shalizi’s views on Bayesian approaches (to use a crude cover term) are interesting; I think he can be fairly described as a skeptic, as indicated here, here, and here (Pet peeves: Physicists who do not distinguish between a random variable (“X = the roll of a die”) and the value it takes (“x=5”). People who report numbers without error-bars or confidence-intervals. Bayesians.)
Ubiquitous Duality
Weblog: The n-Category Café
Excerpt: I'm in one of those phases where everywhere I look I see the same thing. It's Fourier duality and its cousins, a family which crops up here with amazing regularity. Back in August, John wrote: So, amazingly enough, Fourier duality...
Tracked: January 11, 2007 2:18 PM
Re: Common Applications
A discussion of complexity by Murray Gell-Mann, summarizing material in his book The Quark and the Jaguar, echoes the connections made in this post. Note in particular the emphasis on the relevance of a particular notion of regularities in this excerpt:
A measure that corresponds much better to what is usually meant by complexity in ordinary conversation, as well as in scientific discourse, refers not to the length of the most concise description of an entity (which is roughly what AIC is), but to the length of a concise description of a set of the entity’s regularities. Thus something almost entirely random, with practically no regularities, would have effective complexity near zero. So would something completely regular, such as a bit string consisting entirely of zeroes. Effective complexity can be high only in a region intermediate between total order and complete disorder.
There can exist no procedure for finding the set of all regularities of an entity. But classes of regularities can be identified. Finding regularities typically refers to taking the available data about the entity, processing it in some manner into, say, a bit string, and then dividing that string into parts in a particular way and looking for mutual AIC among the parts. If a string is divided into two parts, for example, the mutual AIC can be taken to be the sum of the AIC’s of the parts minus the AIC of the whole. An amount of mutual algorithmic information content above a certain threshold can be considered diagnostic of a regularity. Given the identified regularities, the corresponding effective complexity is the AIC of a description of those regularities.
More precisely, any particular regularities may be regarded as embedding the entity in question in a set of entities sharing the regularities and differing only in other respects. In general, the regularities associate a probability with each entity in the set. (The probabilities are in many cases all equal but they may differ from one member of the set to another.) The effective complexity of the regularities can then be defined as the AIC of the description of the set of entities and their probabilities. (Specifying a given entity, such as the original one, requires additional information.)
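The "sum of the AIC's of the parts minus the AIC of the whole" recipe in the excerpt can be tried out in a rough way. AIC itself is uncomputable, so the sketch below swaps in zlib-compressed length as a crude stand-in; that substitution, the function names, and the test strings are all my assumptions, not anything Gell-Mann proposes.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Crude, computable stand-in for algorithmic information content."""
    return len(zlib.compress(data, 9))


def mutual_information_proxy(left: bytes, right: bytes) -> int:
    """Sum of the parts' compressed lengths minus that of the whole."""
    return (compressed_length(left)
            + compressed_length(right)
            - compressed_length(left + right))


rng = random.Random(0)  # fixed seed so the run is repeatable


def noise(n: int) -> bytes:
    """n pseudo-random bytes, which zlib cannot compress appreciably."""
    return bytes(rng.getrandbits(8) for _ in range(n))


a = noise(2000)
b = noise(2000)

# Two unrelated halves: knowing one tells the compressor nothing about the other.
independent = mutual_information_proxy(a, b)

# Two identical halves: the second is fully predictable from the first.
shared = mutual_information_proxy(a, a)

print(independent, shared)
```

The repeated string scores far higher than the pair of unrelated strings, which is the sense in which mutual information above a threshold is "diagnostic of a regularity". A real compressor only detects the regularities it was built to find, so this is an illustration of the bookkeeping rather than a measure of effective complexity.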
I can’t resist quoting from another of Gell-Mann’s essays posted at SFI, entitled “Nature Conformable To Herself”, which is highly relevant to the topic of David’s post:
To answer those questions, we need to deal first with the widespread notion that all scientific theory is nothing but a set of constructs with which the human mind attempts to grasp reality, a notion associated with the German philosopher Immanuel Kant. Although I had heard of that belief many times, I first came into collision with it thirty-six years ago in Paris.
At that time, I was a visiting professor at the Collège de France, founded by Francis I more than four hundred years earlier. (As far as I know, I was the first visiting professor in the history of that venerable institution.) My office was in the laboratory of experimental physics established by Francis Perrin, a well-known scientist who was a permanent professor at the Collège. On visits to the offices of the junior experimentalists down the hall, I noticed that they spent a certain amount of time drawing little pictures in their notebooks, which I assumed at first must be diagrams of experimental apparatus. Many of the drawings turned out, however, to be sketches of a gallows for hanging the vice-director of the lab, whose rigid ideas drove them crazy.
I soon got to know the sous-directeur, and we conversed on various subjects, one of which was Project Ozma, an early attempt to detect possible signals from other technical civilizations on planets orbiting nearby stars. The corresponding project nowadays is called the Search for Extraterrestrial Intelligence. We discussed how communication might take place if alien intelligences broadcasting signals were close enough to the solar system, assuming that both interlocutors would have the patience to wait years for the signals to be transmitted back and forth. I suggested that we might try beep, beep-beep, beep-beep-beep, etc. to indicate the numbers 1, 2, 3, and so forth, and then perhaps 1, 2, 3,…42, 44…..60, 62…….92, for the atomic numbers of the 90 chemical elements that are stable—1 to 92 except for 43 and 61. “Wait,” said the sous-directeur, “that is absurd. Those numbers up to 92 would mean nothing to such aliens…. Why, if they have 90 stable chemical elements as we do, then they must also have the Eiffel Tower and Brigitte Bardot.”
That is how I became acquainted with the fact that French schools taught a kind of neo-Kantian philosophy, according to which the laws of nature are nothing but Kantian “categories” used by the human mind to describe reality.
Category Theoretic Probability Theory
Weblog: The n-Category Café
Excerpt: Having noticed (e.g., here and here) that what I do in my day job (statistical learning theory) has much to do with my hobby (things discussed here), I ought to be thinking about probability theory in category theoretic terms....
Tracked: February 7, 2007 11:57 AM