### Category Theoretic Probability Theory II

#### Posted by David Corfield

Encouraged by the comment of a statistician as eminent as Phil Dawid, I shall continue with what category theory has to say about probability theory. Recall that we were considering a monad on the category of measurable spaces, $Meas$. For a number of reasons, researchers prefer to work with $Pol$, the category of Polish spaces, i.e., separable metric spaces for which a complete metric exists. We have an endofunctor $P$, which sends a space, $X$, to the space of probability measures on the Borel sets of $X$. $P(X)$ is equipped with the weakest topology which makes $\tau \mapsto \int_{X}f d\tau$ continuous for any $f$, a bounded, continuous, real function on $X$. We also need a map $\mu_{X}: P(P(X)) \to P(X)$:

$\mu_X (M)(A) := \int_{P(X)} \tau(A) M(d\tau)$

Now, there are interesting developments of this idea, such as Abramsky et al.’s Nuclear and Trace Ideals in Tensored ∗-Categories, which looks to work Giry’s monad into a context even more closely resembling the category of relations, but let’s look about to see if we can make contact with some probability theory. Well, a natural kind of thing to do would be to work out the algebras for the monad. And this is just what Ernst-Erich Doberkat does in Characterizing the Eilenberg-Moore Algebras for a Monad of Stochastic Relations. What you’re looking for are measurable maps between $P(X)$ and $X$, such that the ‘fibres’ are convex and closed, and such that $\delta_{x}$, the delta distribution on $x$, is in the fibre over $x$. And there’s another condition which requires a compact subset of $P(X)$ to be sent to a compact subset of $X$.

Now, as ever, $P(X)$ will support an algebra, $\mu_{X}: P(P(X)) \to P(X)$. This is the analogue of a free group being an algebra of the group monad. But just as there are many interesting groups which are not free, we should want to find algebras of Giry’s monad which are not of the $\mu_{X}$ form. Doberkat shows that for such an algebra $X$ must be connected, and suggests this example

$h: P([0, 1]) \ni \tau \mapsto \int_{0}^{1} t \tau (dt) \in [0, 1]$.

I haven’t worked out the details, but I have the suspicion that this might be $\mu_{\{0, 1\}}$. After all, probability measures on $\{0, 1\}$ are just binomial, parameterised by $p \in [0, 1]$.

The other example he gives has $X$ a bounded, closed and convex subset of $\mathbb{R}^n$, and probability measures being sent to their barycentre. It wouldn’t surprise me if this were the paradigmatic example of an algebra for the Giry monad. It subsumes the example above.

Finding centres of gravity also occurs in *Bayesian information geometry*. As shown on page 4 of Hichem Snoussi and Ali Mohammad-Djafari’s Information geometry and prior selection, given a prior distribution over probability distributions, a best estimator for the generating distribution after data has been received is expressible as the barycentre of the posterior distribution. But as we’re working in $P(X)$, this barycentre only makes sense with respect to a geometry on it. Now, the ‘sensible’ geometries form a 1-dimensional family, although only for one member of this family, corresponding to the Kullback-Liebler divergence, is the geometry of $P(X)$ convex, which I guess is what we need for an algebra. Perhaps this could be fixed, otherwise it’s yet another point for the KL-divergence.

**Update:** Doberkat has a longer article on Eilenberg-Moore algebras of the Giry monad as item 5 here. (Unfortunately, the monograph ‘Stochastic Relations: Foundations for Markov Transition Systems’ doesn’t appear to be available.) There are two monads being treated here, one which sends a Polish space to the space of all probability measures, the other to the space of all subprobability measures. The extra structure relating to these monads, is that of a (positive) convex structure. In the case of a convex structure, this intuitively captures the idea that a weighted sum of points in the space has barycentre within the space.

Doberkat’s work relates to the category of Polish spaces with continuous maps. He notes that it would be interesting to develop the theory for the general case of Borel measurable maps.

## Re: Category Theoretic Probability Theory II

Nice.. I wish I would have read this before.. I basically created my own formulation of these types of ‘stochastic categories’ (they are really determinstic!) as I was creating ways to analytically solved infinite-dimensional problems ‘exactly’ and without numerical methods except at the final stage of evaluating.. had to use loads of measure theory, capacities, lifting maps, harmonic and complex analysis, the Weil topology, analytical number theory and some other stuff.. I absolutely hate reading this stuff abstractly if no concrete examples are given.. give me some concrete generating functions and something I can *compute*! But, now that I have actual concrete realizations of things I can go back and find out what abstract concept they implement, sort of a backwards approach but it works.