

September 14, 2010

What Is This Category Enriched In?

Posted by David Corfield

I’ve never felt completely happy with enriched category theory, so perhaps people could help me think through this example.

This question got me wondering again what can be said about a category of conditional probabilities. Let's take as our objects finite sets, and as morphisms $f: A \to B$ conditional probability distributions $M_{i j} = P(b_j | a_i)$, that is, row stochastic matrices. This is an arrow in the Kleisli category of the Giry monad. An equivalent category with column stochastic matrices is described by Tobias Fritz in A presentation of the category of stochastic matrices.

Now $Hom(A, B)$ has quite some structure. If $|A| = m$ and $|B| = n$, then $Hom(A, B)$ is the $m$-fold product of $(n - 1)$-simplices. On each of these simplices, representing the space of probability distributions over $B$, there is defined a very natural metric – the Fisher information metric. On $Hom(A, B)$ we can then choose the product Fisher metric, as described in Lebanon's Axiomatic Geometry of Conditional Models.
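
Concretely, on the interior of the simplex the Fisher metric in these coordinates is $g_p(u, v) = \sum_k u_k v_k / p_k$ for tangent vectors whose components sum to zero, and the product metric on $Hom(A, B)$ just sums this over rows. A minimal numpy sketch (all numbers are made up for illustration):

```python
import numpy as np

def fisher_inner(p, u, v):
    """Fisher information inner product at p in the interior of the simplex,
    for tangent vectors u, v (components summing to zero):
    g_p(u, v) = sum_k u_k v_k / p_k."""
    return float(np.sum(u * v / p))

def product_fisher(P, U, V):
    """Product Fisher metric on Hom(A, B): sum the row-wise Fisher inner
    products of a row-stochastic matrix P with tangent matrices U, V."""
    return sum(fisher_inner(P[i], U[i], V[i]) for i in range(P.shape[0]))

# A made-up point and tangent vectors on the 2-simplex:
p = np.array([0.5, 0.3, 0.2])
u = np.array([0.1, -0.1, 0.0])   # components sum to zero
v = np.array([0.0, 0.1, -0.1])

print(fisher_inner(p, u, v))
print(product_fisher(np.array([p, p]), np.array([u, v]), np.array([u, v])))
```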

So now, how do I go about working out what this category may be said to be enriched in? Given $M$ in $Hom(A, B)$ and $N$ in $Hom(B, C)$, there is a composite $M N$ in $Hom(A, C)$, corresponding to $P(c_k | a_i) = \sum_j P(b_j | a_i) \cdot P(c_k | b_j)$. So I think I need to come up with a category which has objects including these Hom spaces, and a monoidal product to cater for the pairs of composable morphisms, such that composition is an arrow in that category.
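
As a sanity check on this composition rule: composing two row-stochastic matrices is just matrix multiplication, and the result is again row stochastic. A minimal numpy sketch with made-up entries:

```python
import numpy as np

# Hypothetical conditional distributions: M is P(b|a), N is P(c|b).
# Rows index the conditioning variable, so each row sums to 1.
M = np.array([[0.2, 0.8],
              [0.5, 0.5],
              [1.0, 0.0]])       # an element of Hom(A, B), |A| = 3, |B| = 2
N = np.array([[0.1, 0.6, 0.3],
              [0.7, 0.2, 0.1]])  # an element of Hom(B, C), |C| = 3

# Composition: P(c|a) = sum_b P(b|a) P(c|b), i.e. the matrix product.
MN = M @ N

print(MN.shape)        # (3, 3): an element of Hom(A, C)
print(MN.sum(axis=1))  # each row sums to 1, so MN is again row stochastic
```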

I take it that the category of convex spaces and convex mappings won’t do, because the composition from pairs of composable morphisms is not convex. How about Riemannian manifolds with corners and whatever the right morphisms for these are?

Posted at September 14, 2010 11:18 AM UTC


20 Comments & 0 Trackbacks

Re: What Is This Category Enriched In?

This category (actually a small variation described below) is enriched over partially additive monoids. In fact the category can be defined more generally. Start with the category Mes of measurable spaces (sets equipped with a sigma-algebra), with measurable functions as morphisms. Instead of Giry's monad, use the small variation where one takes all *sub*-probability distributions. The Kleisli category of this is something called SRel (stochastic relations). The homsets are partially additive monoids under pointwise sum – partial because you cannot add a family of morphisms if the total measure adds up to more than 1. Arbib and Manes describe the theory of partially additive categories at some length, and this is an example. I have used this for semantics of probabilistic programming languages. [Sorry, I am MathML illiterate, otherwise I would have posted more details.]
Prakash

Posted by: Prakash Panangaden on September 14, 2010 2:25 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

Thanks for that, Prakash. Your work was what got me started on category-theoretic probability theory.

I suppose the question of my post might well have multiple answers. Given your choice of subprobability measures, you are looking, as it were, at the truncated cone of measures, which as you say can be added if sums remain bounded by 1. Others have thought to look at all unnormalized positive measures (the full cone).

Certainly in the latter case a Riemannian metric can still be defined, so I guess that metric can be restricted to your case. So I wonder if we might find a place for that as part of an enrichment story.

Hmm, I wonder what structure is possessed by the collection of monoidal categories for which a given category may be equipped with an enrichment over them.

Posted by: David Corfield on September 14, 2010 2:43 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

David wrote:

Others have thought to look at all unnormalized positive measures (the full cone).

If I were going in this direction I might use the full cone of unnormalized finite positive measures.

First, the extra word 'finite' doesn't do anything if you're working with measures on finite sets, as you did at the start of your post. It only matters in more general situations.

Second, a finite positive measure can be normalized to a probability measure (err, unless it’s zero — I need to think about that).

Third, and most importantly, this idea is the basis of classical statistical mechanics: give me a Hamiltonian function $H: X \to \mathbb{R}$ on a measure space $(X, \mu)$, and with luck the integral

$$Z = \int e^{-\beta H} \, d\mu$$

is finite for a large range of inverse temperatures $\beta = 1/T$. This integral is called the partition function, and it contains vast amounts of interesting physics. The normalized measure

$$e^{-\beta H} \mu / Z$$

is a probability measure that says how likely it is for the system's state to lie in any measurable subset of $X$ at temperature $T$. But because there's so much physical information in $Z$, I'd prefer to treat the finite measure

$$e^{-\beta H} \mu$$

as fundamental. The idea is that relative probabilities are more basic than probabilities.
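
On a finite state space this normalization step is a two-line computation; here is a sketch with a made-up Hamiltonian (counting measure assumed):

```python
import numpy as np

# Hypothetical Hamiltonian values H(x) on a finite state space X,
# with counting measure as the reference measure mu.
H = np.array([0.0, 1.0, 2.0])
beta = 1.0                    # inverse temperature, beta = 1/T

weights = np.exp(-beta * H)   # the unnormalized measure e^{-beta H} mu
Z = weights.sum()             # partition function Z = sum_x e^{-beta H(x)}
p = weights / Z               # the normalized (Gibbs) probability measure

print(Z)
print(p.sum())                # 1.0 up to rounding
```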

This stuff, and its relation to the rig of nonnegative real numbers $\mathbb{R}^+$, is discussed here.

If you go down this road, you'll get categories enriched over the rig $Mat(\mathbb{R}^+)$. But maybe you don't want to go down the road?

There are, indeed, several parallel roads here, and it’s good to describe them in parallel. Tobias Fritz has laid out the framework, I think. But it would be great if you explained this stuff clearly to people who don’t have category theory in their veins already.

Posted by: John Baez on September 28, 2010 2:18 AM | Permalink | Reply to this

Re: What Is This Category Enriched In?

That integral for $Z$ may be intractable, so people certainly opt for the unnormalized version sometimes. See, e.g., LeCun et al., A Tutorial on Energy-Based Learning.

The general point I was wondering about was what’s to be gained by construing a category as enriched. What kinds of thing does it make you look for?

On specifics, I'd still like to understand whether the Fisher information metric fits into the enriched story. On the other hand, I've just come across people using a different kind of metric on probability spaces. Sriperumbudur et al. use an integral probability metric. This distance between two distributions $P$ and $Q$ is the maximum over functions from a specified class of the difference between the expectations of a function according to $P$ and according to $Q$:

$$\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \int_M f \, dP - \int_M f \, dQ \right| .$$
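
For finite distributions and a finite function class, this supremum is a one-liner to evaluate. A sketch with made-up $P$, $Q$ and $\mathcal{F}$:

```python
import numpy as np

# Two made-up distributions P, Q on the points {0, 1, 2}.
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.3, 0.5])

# A small, hypothetical function class F: row k lists the values f(0), f(1), f(2).
F = np.array([[1.0, 0.0, 0.0],    # indicator of {0}
              [1.0, 1.0, 0.0],    # indicator of {0, 1}
              [0.0, 1.0, 2.0]])   # some other bounded function

# gamma_F(P, Q) = sup_{f in F} | E_P[f] - E_Q[f] |
gamma = np.max(np.abs(F @ P - F @ Q))
print(gamma)
```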

That reminds me of distances in Connes’ noncommutative geometry.

Posted by: David Corfield on September 28, 2010 11:21 AM | Permalink | Reply to this

Re: What Is This Category Enriched In?

There is a huge literature in probability dealing with metrics on probability measures of the kind you just described. In fact, almost all of the popular metrics are of precisely that kind. Here are just a couple of the best-known examples:

If $\mathcal{F} = \{ I_{(-\infty, x]} \mid x \in \mathbb{R} \}$, where $I_A$ is the indicator function of a set $A$, then $\gamma_\mathcal{F}$ is the classical Kolmogorov metric, which is probably the most widely used way of measuring similarity of probability distributions (cf. the Kolmogorov-Smirnov test in statistics).

If $\mathcal{F} = \{ f: \mathbb{R} \to \mathbb{R} \mid f \in C(\mathbb{R}), \Vert f \Vert_\infty \le 1 \}$, then $\gamma_\mathcal{F}(P, Q) = 2 \sup \{ P(A) - Q(A) \mid A \subseteq \mathbb{R} \text{ is measurable} \}$ is the total variation metric, which is the same as the metric induced by viewing the space $M(\mathbb{R})$ of finite signed regular Borel measures as the dual of $(C(\mathbb{R}), \Vert \cdot \Vert_\infty)$.

I don’t know anything about noncommutative geometry, but it doesn’t surprise me that something similar arises there, because as the last example shows, this kind of distance arises naturally in functional analysis.

All this isn't necessarily so different from information-theoretic ways of comparing distributions. In particular, the relative entropy $H(P|Q)$ can be expressed as $$H(P|Q) = \sup \Big\{ \int \log(f) \, dP \;\Big|\; \int f \, dQ \le 1 \Big\} .$$ I don't know off-hand if there is a similar formula for mutual information, but I bet there is.
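
For discrete distributions one can check this variational formula numerically: the density $f = dP/dQ$ saturates the constraint $\int f \, dQ \le 1$ and attains the supremum, recovering $\sum_x P(x) \log(P(x)/Q(x))$. A sketch with made-up $P$ and $Q$:

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])   # made-up discrete distributions
Q = np.array([0.4, 0.4, 0.2])

# Direct relative entropy: H(P|Q) = sum_x P(x) log(P(x)/Q(x))
H_direct = float(np.sum(P * np.log(P / Q)))

def objective(f):
    """The variational objective: integral of log(f) dP, admissible
    only when the integral of f dQ is at most 1."""
    assert np.sum(f * Q) <= 1.0 + 1e-9
    return float(np.sum(np.log(f) * P))

# The density f = dP/dQ meets the constraint with equality and attains the sup:
assert np.isclose(np.sum((P / Q) * Q), 1.0)
print(objective(P / Q))        # equals H_direct

# Any other admissible f does no better, e.g. the constant function 1:
print(objective(np.ones(3)))   # gives 0, which is at most H_direct
```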

Posted by: Mark Meckes on September 28, 2010 2:38 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

For the comparison of this distance formula with noncommutative geometry, see A View on Optimal Transport from Noncommutative Geometry.

As briefly mentioned here (page 20 and beyond), I think that the right framework for this kind of formula is that of "convex Chu spaces". I would like to know whether there are examples of convex Chu spaces where neither of the two convex sets is a set of probability measures. Noncommutative geometry gives examples of this kind, but is there anything interesting beyond?

Posted by: Tobias Fritz on September 28, 2010 5:28 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

Thanks for the reference! Looking at the abstract, I see that I should have included another important example above: if $\mathcal{F}$ is the class of functions with Lipschitz constant $\le 1$, then $\gamma_\mathcal{F}$ is the Wasserstein distance (also known as the Kantorovitch-Rubinstein distance, among other names).
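
For equal-size empirical samples on the line, this Lipschitz-ball case has a simple closed form: $\gamma_\mathcal{F}$ reduces to the mean absolute difference of the sorted samples. A small numpy sketch (sample values made up):

```python
import numpy as np

def wasserstein1(x, y):
    """W1 distance between two equal-size empirical samples on the line.
    With F the Lipschitz-1 functions, gamma_F equals the mean absolute
    difference of the sorted samples (the L1 distance between CDFs)."""
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y)))

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5, 2.5])
print(wasserstein1(x, y))   # 0.5, since y is x shifted by 0.5
```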

Posted by: Mark Meckes on September 28, 2010 6:12 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

In the paper I mentioned above, they list 7 different choices for $\mathcal{F}$ and then introduce an eighth type, where $\mathcal{F}$ is the unit ball in a reproducing kernel Hilbert space. This idea came from people who use kernels in machine learning.
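
With $\mathcal{F}$ the unit ball of an RKHS, $\gamma_\mathcal{F}$ is what the machine learning literature calls the maximum mean discrepancy (MMD), and for samples it has a closed form in the kernel alone. A sketch with a Gaussian kernel and made-up samples (this is the standard biased estimator, not anything from the paper itself):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix k(x_i, y_j) for 1-d sample arrays x, y."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * sigma ** 2))

def mmd_squared(x, y, sigma=1.0):
    """Biased estimator of the squared MMD between two samples, with F
    the unit ball of the Gaussian-kernel RKHS:
    mean k(x, x') + mean k(y, y') - 2 mean k(x, y)."""
    Kxx = gaussian_kernel(x, x, sigma).mean()
    Kyy = gaussian_kernel(y, y, sigma).mean()
    Kxy = gaussian_kernel(x, y, sigma).mean()
    return float(Kxx + Kyy - 2 * Kxy)

x = np.array([0.0, 0.1, -0.1])
y = np.array([3.0, 3.1, 2.9])
print(mmd_squared(x, y))   # large for well-separated samples
print(mmd_squared(x, x))   # 0 for identical samples
```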

Posted by: David Corfield on September 29, 2010 12:14 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

Yes, that's perfectly fine, and so we get a whole bunch of metrics on the set of probability measures.

In noncommutative geometry, though, one doesn't deal with probability measures any more, but rather with states on some noncommutative $C^*$-algebra. In particular, the space of states ceases to be "classical", meaning that its elements are not merely convex combinations or integrals of extremal points. So this is an example of what I was looking for: something where both the space of states and the space of functions (your $\mathcal{F}$) are non-classical in this sense.

Posted by: Tobias Fritz on September 29, 2010 1:56 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

I guess above you meant categories enriched over the rig $\mathbb{R}^+$?

It might be worth mentioning that there are some attempts at reconstructing quantum mechanics from some physically reasonable axioms in terms of categories enriched over $\mathbb{R}^+$, as I just happened to learn from Lluis Masanes. The corresponding quantum-mechanical category is the one where objects are natural numbers, and the morphisms between $n$ and $m$ are the completely positive trace-nonincreasing maps between the matrix algebras $M_n(\mathbb{C})$ and $M_m(\mathbb{C})$; these are also known as quantum operations. This category is the quantum analog of what Prakash described above. Now, given any category enriched over $\mathbb{R}^+$, which properties does it need to satisfy in order to be equivalent to the category of quantum operations, and what is the operational interpretation of these properties? Hardy's recent paper goes in this direction.

Posted by: Tobias Fritz on September 28, 2010 5:16 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

Prakash, do you mean you don't know how to get maths in your comments? If so, then you should select "itex to MathML with parbreaks" in the Text Filter and then just pretend you're writing LaTeX. So

One famous equation is $e^{2\pi i} = 1$.

becomes

One famous equation is $e^{2\pi i} = 1$.

Posted by: Simon Willerton on September 14, 2010 5:01 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

Thanks, Simon. If the next line typesets correctly: $e^{\pi i} = -1$, then I will be able to type maths into blogs. Thanks again, Prakash
Posted by: Prakash on September 15, 2010 8:19 AM | Permalink | Reply to this

Re: What Is This Category Enriched In?

Not all blogs; the software behind this blog is not widely used. You can also use LaTeX (with a different trick that I forget offhand) on wordpress blogs.

Posted by: Mark Meckes on September 15, 2010 12:00 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

On wordpress blogs, you must use the word “latex” immediately after the opening dollar sign (no space!):

$latex [math here]$

Posted by: Mike Stay on September 16, 2010 11:07 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

Here is a very crazy way to get this category too; if anyone can relate this to something more standard, let me know.

You start with a dagger symmetric monoidal category $C$, then take dagger special commutative Frobenius algebras $(X, \; d: X \to X \otimes X, \; e: X \to I)$ to be the objects of a new category, with the morphisms of type $X \to Y$ those $f: X \to Y$ in $C$ satisfying:

1. the morphism

$$(X \otimes d_Y^\dagger) \circ (X \otimes f \otimes Y) \circ (d_X \otimes Y) : X \otimes Y \to X \otimes Y$$

is positive, i.e., it factors as $g^\dagger \circ g$ for some $g: C \to X \otimes Y$

2. $f$ is normalized, i.e., $e_Y \circ f = e_X$

If you take $C$ to be the category of finite-dimensional complex Hilbert spaces, then you obtain the category of stochastic matrices.

The purpose of this construction was to get classical probabilities out of quantum amplitudes; details are here.
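
I won't reproduce the Frobenius-algebra machinery here, but the down-to-earth phenomenon it packages, classical probabilities out of quantum amplitudes, is the Born rule: entrywise squared moduli of a unitary matrix form a stochastic matrix. A sketch (the particular unitary is made up):

```python
import numpy as np

# A made-up unitary matrix of quantum amplitudes (a Hadamard gate).
U = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)

assert np.allclose(U.conj().T @ U, np.eye(2))   # check U is unitary

# Born rule: entrywise squared moduli give classical transition probabilities.
S = np.abs(U) ** 2
print(S)                  # a stochastic matrix (here doubly stochastic)
print(S.sum(axis=1))      # rows sum to 1
```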

Posted by: bob on September 14, 2010 3:50 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

Well, if I’m not mistaken this category is enriched over convex spaces – one just needs to choose the monoidal product right!

For a start, consider the category $Mat(R)$ of matrices over a commutative ring $R$ (or a commutative rig $R$). The obvious monoidal product on the category of $R$-modules is the tensor product, and tensor product modules are those universally representing bilinear mappings. Then since matrix multiplication is $R$-bilinear, $Mat(R)$ is clearly enriched in $R$-modules, right?

The present case is an instance of this if one takes $R$ to be a (commutative) generalized ring: then convex spaces are the modules of the generalized ring, and there is a tensor product defined in terms of universal objects for 'biconvex' maps. The multiplication of stochastic matrices is such a 'biconvex' map. Therefore, the category of stochastic matrices is enriched over convex spaces with tensor product.
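
One can check numerically that multiplication of stochastic matrices is 'biconvex', i.e. convex-linear in each argument separately, but not convex-linear on the cartesian product of the Hom spaces, which is why a tensor product rather than a product is needed. A sketch with randomly generated stochastic matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(m, n):
    """A random row-stochastic m-by-n matrix (made-up test data)."""
    A = rng.random((m, n))
    return A / A.sum(axis=1, keepdims=True)

M1, M2 = random_stochastic(3, 2), random_stochastic(3, 2)
N1, N2 = random_stochastic(2, 4), random_stochastic(2, 4)
t = 0.3

# Biconvex: composition is convex-linear in each argument separately...
assert np.allclose((t * M1 + (1 - t) * M2) @ N1,
                   t * (M1 @ N1) + (1 - t) * (M2 @ N1))
assert np.allclose(M1 @ (t * N1 + (1 - t) * N2),
                   t * (M1 @ N1) + (1 - t) * (M1 @ N2))

# ...but not convex-linear in the pair: mixing both arguments at once differs.
lhs = (t * M1 + (1 - t) * M2) @ (t * N1 + (1 - t) * N2)
rhs = t * (M1 @ N1) + (1 - t) * (M2 @ N2)
print(np.allclose(lhs, rhs))   # generically False
```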

Posted by: Tobias Fritz on September 15, 2010 11:03 AM | Permalink | Reply to this

Re: What Is This Category Enriched In?

Ah, OK, good. Two thoughts then:

1) This convex structure of $Hom(A|B) = \{ P(a_i|b_j) \mid P \text{ is a conditional probability distribution} \}$ rests on a notion of straight line which is a geodesic for only one element of a one-parameter family of affine connections compatible with the Fisher information metric. It is the $m$-affine connection mentioned at information geometry. Another famous connection is the $e$-affine connection, where geodesics lie in exponential families. The $e$- and $m$-connections are dual to each other. Corresponding distances are the Kullback-Leibler and inverse Kullback-Leibler divergences. The Expectation-Maximization algorithm can be seen as alternating projections along such geodesics.

Can we have convexity for other geodesics, especially for the $e$-connection?

2) What does showing a category to be enriched gain you? Are there natural questions you then ask?

Posted by: David Corfield on September 15, 2010 12:42 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

You mean that the geodesics of one of these connections would yield non-standard convex combinations? Unfortunately no: the affinely parametrized geodesics of a Riemannian manifold define a convex space structure only if the manifold is flat. This is easy to prove by looking at a small embedded triangle in the manifold and using the associativity axiom of convex spaces. So if one wants to retain the associativity axiom, then obtaining a convex space only works for Riemannian manifolds which are isometric to convex subsets of Euclidean space $\mathbb{R}^n$.

Posted by: Tobias Fritz on September 16, 2010 5:45 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

The sort of thing to understand is Rodriguez’s A Geometric Theory of Ignorance:

The extended space $\tilde{\mathcal{P}}$ is $\delta$-flat and $\delta$-convex for all $\delta \in [0, 1]$. The space of all the discrete distributions on a finite set of atoms is also intrinsically $\delta$-flat for all $\delta$. Exponential family models are intrinsically $(0, 1)$-flat,

and Snoussi’s The geometry of prior selection

Embedding the model $\mathcal{Q}$ in the whole space of finite measures $\tilde{\mathcal{P}}$, not only the space of probability distributions $\mathcal{P}$, many results can be proven easily, for the main reason that $\tilde{\mathcal{P}}$ is $\delta$-flat and $\delta$-convex for all $\delta$ in $[0, 1]$, whereas $\mathcal{P}$ is $\delta$-flat for only $\delta \in \{0, 1\}$ and $\delta$-convex for $\delta = 1$.

Posted by: David Corfield on September 16, 2010 8:30 PM | Permalink | Reply to this

Re: What Is This Category Enriched In?

It seems that this may be related to recent work here in Nijmegen.

The Kleisli category has the structure of what is called a convex category by Jacobs.

Moreover, Prakash's partially additive structure seems to arise through effect algebras. Jacobs and Mandemakers recently developed a totality operation on such effect algebras, which may also be relevant here.

Bas

Posted by: Bas Spitters on September 16, 2010 3:09 PM | Permalink | Reply to this
