Category Theoretic Probability Theory II
Posted by David Corfield
Encouraged by the comment of a statistician as eminent as Phil Dawid, I shall continue with what category theory has to say about probability theory. Recall that we were considering a monad on the category of measurable spaces, . For a number of reasons, researchers prefer to work with , the category of Polish spaces, i.e., separable metric spaces for which a complete metric exists. We have an endofunctor , which sends a space, , to the space of probability measures on the Borel sets of . is equipped with the weakest topology which makes continuous for any , a bounded, continuous, real function on . We also need a map :
Now, there are interesting developments of this idea, such as Abramsky et al.’s Nuclear and Trace Ideals in Tensored ∗-Categories, which looks to work Giry’s monad into a context even more closely resembling the category of relations, but let’s look about to see if we can make contact with some probability theory. Well, a natural kind of thing to do would be to work out the algebras for the monad. And this is just what Ernst-Erich Doberkat does in Characterizing the Eilenberg-Moore Algebras for a Monad of Stochastic Relations. What you’re looking for are measurable maps between and , such that the ‘fibres’ are convex and closed, and such that , the delta distribution on , is in the fibre over . And there’s another condition which requires a compact subset of to be sent to a compact subset of .
Now, as ever, will support an algebra, . This is the analogue of a free group being an algebra of the group monad. But just as there are many interesting groups which are not free, we should want to find algebras of Giry’s monad which are not of the form. Doberkat shows that for such an algebra must be connected, and suggests this example
.
I haven’t worked out the details, but I have the suspicion that this might be . After all, probability measures on are just binomial, parameterised by .
The other example he gives has a bounded, closed and convex subset of , and probability measures being sent to their barycentre. It wouldn’t surprise me if this were the paradigmatic example of an algebra for the Giry monad. It subsumes the example above.
Finding centres of gravity also occurs in Bayesian information geometry. As shown on page 4 of Hichem Snoussi and Ali Mohammad-Djafari’s Information geometry and prior selection, given a prior distribution over probability distributions, a best estimator for the generating distribution after data has been received is expressible as the barycentre of the posterior distribution. But as we’re working in , this barycentre only makes sense with respect to a geometry on it. Now, the ‘sensible’ geometries form a 1-dimensional family, although only for one member of this family, corresponding to the Kullback-Liebler divergence, is the geometry of convex, which I guess is what we need for an algebra. Perhaps this could be fixed, otherwise it’s yet another point for the KL-divergence.
Update: Doberkat has a longer article on Eilenberg-Moore algebras of the Giry monad as item 5 here. (Unfortunately, the monograph ‘Stochastic Relations: Foundations for Markov Transition Systems’ doesn’t appear to be available.) There are two monads being treated here, one which sends a Polish space to the space of all probability measures, the other to the space of all subprobability measures. The extra structure relating to these monads, is that of a (positive) convex structure. In the case of a convex structure, this intuitively captures the idea that a weighted sum of points in the space has barycentre within the space.
Doberkat’s work relates to the category of Polish spaces with continuous maps. He notes that it would be interesting to develop the theory for the general case of Borel measurable maps.
Posted at February 13, 2007 6:37 PM UTC
TrackBack URL for this Entry: http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/1163
Re: Category Theoretic Probability Theory II
Nice.. I wish I would have read this before.. I basically created my own formulation of these types of ‘stochastic categories’ (they are really determinstic!) as I was creating ways to analytically solved infinite-dimensional problems ‘exactly’ and without numerical methods except at the final stage of evaluating.. had to use loads of measure theory, capacities, lifting maps, harmonic and complex analysis, the Weil topology, analytical number theory and some other stuff.. I absolutely hate reading this stuff abstractly if no concrete examples are given.. give me some concrete generating functions and something I can *compute*! But, now that I have actual concrete realizations of things I can go back and find out what abstract concept they implement, sort of a backwards approach but it works.
Re: Category Theoretic Probability Theory II
Something intriguing about this monad perspective is that it fits neatly with a Bayesian outlook. Bayesians like to think in terms of distibutions over distributions and form averages. For example, ask me about a coin you’re about to toss, and, so long as I take all exchangeable sequences (permutations) to be equally likely, I can represent my belief in a distribution over , the binomial parameter. Ask me for my probability that heads will be tossed and I’ll calculate the expectation of that distribution.
I said
Now, the ‘sensible’ geometries form a 1-dimensional family, although only for one member of this family, corresponding to the Kullback-Liebler divergence, is the geometry of convex, which I guess is what we need for an algebra. Perhaps this could be fixed, otherwise it’s yet another point for the KL-divergence.
One way out is to observe that for , where parameterises the family of geometries, the space of subprobability measures is -convex. Hence the map taking a distribution over subprobability measures to its barycentre with respect to one of these geometries constitutes an Eilenberg-Moore algebra for the subprobability monad.
Re: Category Theoretic Probability Theory II
This monadic perspective is exactly how some people model probability distributions in the programming language Haskell. There’s at least one Haskell library that does this. Of course in Haskell we’re dealing with finite spaces and so the definition of μ there involves only a finite summation. But in essence it’s the same thing.
Re: Category Theoretic Probability Theory II
If you’ll permit a question from a humble cognitive scientist…
In the article “The converse of a stochastic relation” Doberkat seems to be reinventing Bayesian inference in the category theoretic setting. Indeed, the first equation in that paper is (i think) just Bayes’ rule. Yet he never mentions Bayesian learning… so am I missing something here?
If not, to what extent is the Giry monad the “right” structure to describe the logic of Bayesian induction? Should we Bayesians be describing our models in the internal language of one of the categories associated with the Giry monad? If so, which and what does that language it look like?
Read the post
Category Theory in Machine Learning
Weblog: The n-Category Café
Excerpt: Does category theory have a future in machine learning?
Tracked: September 5, 2007 4:03 PM
Read the post
Progic
Weblog: The n-Category Café
Excerpt: My colleague here in Canterbury Jon Williamson is part of an international research group, progicnet, whose aim is to find a good integration of probability theory and first-order logic. For one reason or another, some technical projects get counted...
Tracked: September 18, 2007 11:38 AM
Re: Category Theoretic Probability Theory II
Roland Friedrich and I were thinking on Giry-Lawvere monad and the variants thereof in few meetings in last two years. We never thought enough through to finish a research but we realized that there are many interesting twins of the Giry monad, for example when one considers projection valued measures instead of scalar valued and then one sees some connections to the subject of coherent states, the subject I was earlier working on in connection to quantum groups. We checked few new analytic details which are more nontrivial when the measure is valued in projective operators rather than scalars, but I do not have any complete account of this. We discussed many other measure like concepts which could benefit from this approach.
For newcomers all that business of the Giry monad is an elaborate version of thinking via characteristic functions like in the primitive case of power set monad, and also delta functions, thus the idea has to do with classifying objects on one side and with measure theoretic concepts on analysis side, and also of reproducing kernels in coherent states and evolution story business. Giry was a student of Lawvere and she wrote an elaboration of an earlier preprint of Lawvere and published it as
Giry, Michèle
A categorical approach to probability theory. Categorical aspects of topology and analysis (Ottawa, Ont., 1980),
pp. 68–85,
Lecture Notes in Math., 915, Springer.
MR83k:60007
There are some analytic incorrectnesses there I was told. Prof. Voevodsky gave some talk at IHES where he had approached some biology-motivated stochastics via a refined Giry approach, I was told by Roland who got then interested in. A priori I think the approach is not giving something deep, but it is an interesting source of analogies. There are many papers after Giry, mainly in the computer science literature. Roland drew my attention to Doberkat’s paper as one of very few which are actually good among the mentioned papers. Though good part of the paper is only a survey.
Re: Category Theoretic Probability Theory II
Nice.. I wish I would have read this before.. I basically created my own formulation of these types of ‘stochastic categories’ (they are really determinstic!) as I was creating ways to analytically solved infinite-dimensional problems ‘exactly’ and without numerical methods except at the final stage of evaluating.. had to use loads of measure theory, capacities, lifting maps, harmonic and complex analysis, the Weil topology, analytical number theory and some other stuff.. I absolutely hate reading this stuff abstractly if no concrete examples are given.. give me some concrete generating functions and something I can *compute*! But, now that I have actual concrete realizations of things I can go back and find out what abstract concept they implement, sort of a backwards approach but it works.