

October 22, 2014

Where Do Probability Measures Come From?

Posted by Tom Leinster

Guest post by Tom Avery

Tom (here Tom means me, not him — Tom) has written several times about a piece of categorical machinery that, when given an appropriate input, churns out some well-known mathematical concepts. This machine is the process of constructing the codensity monad of a functor.

In this post, I’ll give another example of a well-known concept that arises as a codensity monad; namely probability measures. This is something that I’ve just written a paper about.

The Giry monads

Write $\mathbf{Meas}$ for the category of measurable spaces (sets equipped with a $\sigma$-algebra of subsets) and measurable maps. I’ll also write $I$ for the unit interval $[0,1]$, equipped with the Borel $\sigma$-algebra.

Let $\Omega \in \mathbf{Meas}$. There are lots of different probability measures we can put on $\Omega$; write $G\Omega$ for the set of all of them.

Is $G\Omega$ a measurable space? Yes: an element of $G\Omega$ is a function that sends measurable subsets of $\Omega$ to numbers in $I$. Turning this around, we have, for each measurable $A \subseteq \Omega$, an evaluation map $ev_A \colon G\Omega \to I$. Let’s give $G\Omega$ the smallest $\sigma$-algebra such that all of these are measurable.

Is $G$ a functor? Yes: given a measurable map $g \colon \Omega \to \Omega'$ and $\pi \in G\Omega$, we can define the pushforward $G g(\pi)$ of $\pi$ along $g$ by

$$G g(\pi)(A') = \pi(g^{-1} A')$$

for measurable $A' \subseteq \Omega'$.
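For intuition, here is a minimal Python sketch of the pushforward in the simplest case, assuming $\Omega$ is a finite set with its discrete $\sigma$-algebra, so that every subset is measurable and a probability measure is determined by its values on points. Representing a measure as a dict from points to exact `Fraction` probabilities is a choice made for this sketch, and the names `measure` and `pushforward` are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical, not from the post.
# Assumes a finite Omega with the discrete sigma-algebra.
from fractions import Fraction


def measure(pi, A):
    """pi(A) for a subset A of a finite space: sum the point masses in A."""
    return sum((p for x, p in pi.items() if x in A), Fraction(0))


def pushforward(pi, g):
    """Pushforward G g(pi) of pi along g, so (G g(pi))(A') = pi(g^{-1} A').

    On a finite space it suffices to push each point mass forward: the mass
    of a point y is the total mass of its preimage g^{-1}{y}. Points of
    Omega' outside the image of g have mass 0 and are omitted.
    """
    result = {}
    for x, p in pi.items():
        y = g(x)
        result[y] = result.get(y, Fraction(0)) + p
    return result


# Push a measure on {1, 2, 3} forward along the parity map.
pi = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}
parity = lambda n: n % 2
image = pushforward(pi, parity)  # odd points get mass 3/4, even points 1/4
```

The defining equation can be checked on any subset: `measure(image, {1})` agrees with `measure(pi, {1, 3})`, the measure of the preimage of $\{1\}$.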

Is $G$ a monad? Yes: given $\omega \in \Omega$ we can define $\eta(\omega) \in G\Omega$ by

$$\eta(\omega)(A) = \chi_A (\omega)$$

where $A$ is a measurable subset of $\Omega$ and $\chi_A$ is its characteristic function. In other words, $\eta(\omega)$ is the Dirac measure at $\omega$. Given $\rho \in G G\Omega$, let

$$\mu(\rho)(A) = \int_{G\Omega} ev_A \,\mathrm{d}\rho$$

for measurable $A \subseteq \Omega$, where $ev_A \colon G\Omega \to I$ is as above.

This is the Giry monad $\mathbb{G} = (G,\eta,\mu)$, first defined (unsurprisingly) by Giry in “A categorical approach to probability theory”.
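Continuing with a finite $\Omega$ and its discrete $\sigma$-algebra (an assumption of this sketch, under which the integral defining $\mu$ becomes a finite weighted sum), the unit and multiplication can be written out directly. This is essentially the finite distribution monad; the names `dirac` and `multiply` are hypothetical, and a measure on $G\Omega$ is stored as a list of (measure, weight) pairs only because Python dicts are not hashable.

```python
# Illustrative sketch; helper names are hypothetical, not from the post.
# Assumes a finite Omega with the discrete sigma-algebra.
from fractions import Fraction


def dirac(omega):
    """Unit eta: the Dirac measure at omega, so eta(omega)(A) = chi_A(omega)."""
    return {omega: Fraction(1)}


def multiply(rho):
    """Multiplication mu for finitely supported rho in G G(Omega).

    rho is a list of (pi, weight) pairs. Integrating ev_A against rho gives
    mu(rho)(A) = sum of weight * pi(A), a weighted average of the pi's,
    which can be computed point by point.
    """
    result = {}
    for pi, weight in rho:
        for omega, p in pi.items():
            result[omega] = result.get(omega, Fraction(0)) + weight * p
    return result


fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
# Choose the fair coin or the always-heads coin with probability 1/2 each.
rho = [(fair, Fraction(1, 2)), (dirac("H"), Fraction(1, 2))]
mixed = multiply(rho)  # heads with probability 3/4, tails with 1/4
```

The unit laws can be spot-checked on this example: `multiply([(fair, Fraction(1))])` (that is, $\mu \circ \eta_{G\Omega}$) and `multiply([(dirac(x), p) for x, p in fair.items()])` (that is, $\mu \circ G\eta$) both return `fair`.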

A finitely additive probability measure $\pi$ is just like a probability measure, except that it is only well-behaved with respect to finite disjoint unions, rather than arbitrary countable disjoint unions. More precisely, rather than having

$$\pi\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \pi(A_i)$$

for disjoint $A_i$, we just have

$$\pi\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \pi(A_i)$$

for disjoint $A_i$.

We could repeat the definition of the Giry monad with “probability measure” replaced by “finitely additive probability measure”; doing so would give the finitely additive Giry monad $\mathbb{F} = (F,\eta,\mu)$. Every probability measure is a finitely additive probability measure, but not all finitely additive probability measures are probability measures. So $\mathbb{G}$ is a proper submonad of $\mathbb{F}$.

The Kleisli category of $\mathbb{G}$ is quite interesting. Its objects are just the measurable spaces, and the morphisms are a kind of non-deterministic map called a Markov kernel or conditional probability distribution. As a special case, a discrete space equipped with an endomorphism in the Kleisli category is a discrete-time Markov chain.
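To make the Kleisli picture concrete, here is a small Python sketch assuming finite state spaces with discrete $\sigma$-algebras. A kernel $X \to GY$ is then a table of probabilities, and Kleisli composition sums over the intermediate state (the Chapman–Kolmogorov equation), which for finitely many states is just multiplication of transition matrices. The name `kleisli_compose` and the dict-of-dicts encoding are hypothetical choices for this sketch.

```python
# Illustrative sketch; helper names are hypothetical, not from the post.
# Assumes finite state spaces with the discrete sigma-algebra.
from fractions import Fraction


def kleisli_compose(k2, k1):
    """Kleisli composite of Markov kernels k1: X -> G(Y) and k2: Y -> G(Z).

    k[x][y] is the probability of moving from x to y. The composite applies
    k1 first and then k2, summing over the intermediate point y:
        (k2 . k1)[x][z] = sum over y of k1[x][y] * k2[y][z].
    """
    composite = {}
    for x, row in k1.items():
        out = {}
        for y, p in row.items():
            for z, q in k2[y].items():
                out[z] = out.get(z, Fraction(0)) + p * q
        composite[x] = out
    return composite


# A two-state Markov chain: an endomorphism of {"a", "b"} in the Kleisli category.
step = {
    "a": {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    "b": {"a": Fraction(1)},
}
two_steps = kleisli_compose(step, step)
```

Here `two_steps["a"]` is the distribution of the chain two steps after starting at `"a"`: it is at `"a"` with probability 3/4 and at `"b"` with probability 1/4. The kernel sending each state to its Dirac measure acts as the identity for this composition, matching the unit $\eta$.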

I’ll explain how the Giry monads arise as codensity monads, but first I’d like to mention a connection with another example of a codensity monad; namely the ultrafilter monad.

An ultrafilter $\mathcal{U}$ on a set $X$ is a set of subsets of $X$ satisfying some properties. So $\mathcal{U}$ is a subset of the powerset $\mathcal{P}X$ of $X$, and is therefore determined by its characteristic function, which takes values in $\{0,1\} \subseteq I$. In other words, an ultrafilter on $X$ can be thought of as a special function

$$\mathcal{P}X \to I.$$

It turns out that “special function” here means “finitely additive probability measure defined on all of $\mathcal{P}X$ and taking values in $\{0,1\}$”.
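When $X$ is finite every ultrafilter is principal, so the claim can be checked by brute force in that special case. The following Python sketch enumerates every function $\mathcal{P}X \to \{0,1\}$ for a three-element $X$, keeps those that are finitely additive probability measures, and compares them with the principal ultrafilters $\{A : x \in A\}$. It is only a sanity check under that finiteness assumption and says nothing about non-principal ultrafilters on infinite sets; all helper names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical, not from the post.
# Assumes X is finite, so every subset counts and brute force is feasible.
from itertools import chain, combinations, product


def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    xs = list(X)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, k) for k in range(len(xs) + 1))]


def is_finitely_additive_probability(m, X):
    """Check m(X) = 1 and m(A | B) = m(A) + m(B) for disjoint A, B.

    Additivity for pairs of disjoint sets gives additivity for any finite
    disjoint family by induction; taking A = B = empty set forces m(empty) = 0.
    """
    if m[frozenset(X)] != 1:
        return False
    return all(m[A | B] == m[A] + m[B]
               for A in m for B in m if not (A & B))


def zero_one_measures(X):
    """All functions P(X) -> {0, 1} that are finitely additive probability measures."""
    subsets = powerset(X)
    found = []
    for values in product((0, 1), repeat=len(subsets)):
        m = dict(zip(subsets, values))
        if is_finitely_additive_probability(m, X):
            found.append(m)
    return found


def principal_ultrafilter(x, X):
    """Characteristic function of the principal ultrafilter {A : x in A}."""
    return {A: int(x in A) for A in powerset(X)}


X = {0, 1, 2}
found = zero_one_measures(X)
```

For this three-element `X` the search finds exactly three functions, one principal ultrafilter for each point: additivity forces exactly one singleton to have value $1$, and that point then decides every other value.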

So the ultrafilter monad on $\mathbf{Set}$ (which sends a set to the set of ultrafilters on it) is a primitive version of the finitely additive Giry monad. With this in mind, and given the fact that the ultrafilter monad is the codensity monad of the inclusion of the category of finite sets into the category of sets, it is not that surprising that the Giry monads are also codensity monads. In particular, we might expect $\mathbb{F}$ to be the codensity monad of some functor involving spaces that are “finite” in some sense, and for $\mathbb{G}$ we’ll need to include some information pertaining to countable additivity.

Integration operators

If you have a measure on a space then you can integrate functions on that space. The converse is also true: if you have a way of integrating functions on a space then you can extract a measure.

There are various ways of making this precise, the most famous of which is the Riesz-Markov-Kakutani Representation Theorem:

Theorem. Let $X$ be a compact Hausdorff space. Then the space of finite, signed Borel measures on $X$ is canonically isomorphic to

$$\mathbf{NVS}(\mathbf{Top}(X,\mathbb{R}),\mathbb{R})$$

as a normed vector space, where $\mathbf{Top}$ is the category of topological spaces, and $\mathbf{NVS}$ is the category of normed vector spaces.

Given a finite, signed Borel measure $\pi$ on $X$, the corresponding map $\mathbf{Top}(X,\mathbb{R}) \to \mathbb{R}$ sends a function to its integral with respect to $\pi$. There are various different versions of this theorem that go by the same name.

My paper contains the following more modest version, which is a correction of a claim by Sturtz.

Proposition. Finitely additive probability measures on a measurable space $\Omega$ are canonically in bijection with functions $\phi \colon \mathbf{Meas}(\Omega,I) \to I$ that are

  • affine: if $f,g \in \mathbf{Meas}(\Omega,I)$ and $r \in I$ then

$$\phi(r f + (1-r)g) = r\phi(f) + (1-r)\phi(g),$$

and

  • weakly averaging: if $\bar{r}$ denotes the constant function with value $r$ then $\phi(\bar{r}) = r$.

Call such a function a finitely additive integration operator. The bijection restricts to a correspondence between (countably additive) probability measures and functions $\phi$ that additionally

  • respect limits: if $f_n \in \mathbf{Meas}(\Omega,I)$ is a sequence of functions converging pointwise to $0$ then $\phi(f_n)$ converges to $0$.

Call such a function an integration operator. The integration operator corresponding to a probability measure $\pi$ sends a function $f$ to

$$\int_{\Omega} f \,\mathrm{d}\pi,$$

which justifies the name. In the other direction, given an integration operator $\phi$, the value of the corresponding probability measure on a measurable set $A \subseteq \Omega$ is $\phi(\chi_A)$.
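On a finite space with the discrete $\sigma$-algebra (an assumption of this sketch), finite and countable additivity coincide, and the limit condition holds automatically because a pointwise limit on finitely many points passes through a finite sum. The two directions of the bijection then look like this in Python; `integration_operator` and `measure_from_operator` are hypothetical names, and the last lines spot-check the affine and weakly averaging conditions on one example rather than proving them.

```python
# Illustrative sketch; helper names are hypothetical, not from the post.
# Assumes a finite Omega with the discrete sigma-algebra.
from fractions import Fraction


def integration_operator(pi):
    """phi(f) = integral of f d pi, for pi a dict of point masses and
    f a function Omega -> [0, 1]."""
    def phi(f):
        return sum((f(omega) * p for omega, p in pi.items()), Fraction(0))
    return phi


def measure_from_operator(phi, A):
    """Recover the measure of A as phi(chi_A)."""
    return phi(lambda omega: Fraction(1) if omega in A else Fraction(0))


pi = {"x": Fraction(1, 3), "y": Fraction(2, 3)}
phi = integration_operator(pi)

f = lambda w: {"x": Fraction(1), "y": Fraction(1, 2)}[w]
g = lambda w: Fraction(1, 4)
r = Fraction(1, 3)

affine = phi(lambda w: r * f(w) + (1 - r) * g(w)) == r * phi(f) + (1 - r) * phi(g)
weakly_averaging = phi(lambda w: Fraction(2, 5)) == Fraction(2, 5)
```

Both checks come out true here, and `measure_from_operator(phi, {"y"})` returns `Fraction(2, 3)`, the mass that `pi` assigns to `"y"`, so the round trip measure → operator → measure is the identity on this example.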

These bijections are measurable (with respect to a natural $\sigma$-algebra on the set of finitely additive integration operators) and natural in $\Omega$, so they define isomorphisms of endofunctors of $\mathbf{Meas}$. Hence we can transfer the monad structures across the isomorphisms, and obtain descriptions of the Giry monads in terms of integration operators.

The Giry monads via codensity monads

So far so good. But what does this have to do with codensity monads? First let’s recall the definition of a codensity monad. I won’t go into a great deal of detail; for more information see Tom’s first post on the topic.

Let $U \colon \mathbb{C} \to \mathcal{M}$ be a functor. The codensity monad of $U$ is the right Kan extension of $U$ along itself. This consists of a functor $T^U \colon \mathcal{M} \to \mathcal{M}$ satisfying a universal property, which equips $T^U$ with a canonical monad structure. The codensity monad doesn’t always exist, but it will whenever $\mathbb{C}$ is small and $\mathcal{M}$ is complete. You can think of $T^U$ as a generalisation of the monad induced by the adjunction between $U$ and its left adjoint that makes sense when the left adjoint doesn’t exist. In particular, when the left adjoint does exist, the two monads coincide.

The end formula for right Kan extensions gives

$$T^U m = \int_{c \in \mathbb{C}} [\mathcal{M}(m,U c),U c],$$

where $[\mathcal{M}(m,U c),U c]$ denotes the $\mathcal{M}(m,U c)$ power of $U c$ in $\mathcal{M}$, i.e. the product of $\mathcal{M}(m,U c)$ (a set) copies of $U c$ (an object of $\mathcal{M}$) in $\mathcal{M}$.

It doesn’t matter too much if you’re not familiar with ends, because we can give an explicit description of $T^U m$ in the case that $\mathcal{M} = \mathbf{Meas}$: the elements of $T^U\Omega$ are families $\alpha$ of functions

$$\alpha_c \colon \mathbf{Meas}(\Omega, U c) \to U c$$

that are natural in $c \in \mathbb{C}$. For each $c \in \mathbb{C}$ and measurable $f \colon \Omega \to U c$ we have $ev_f \colon T^U \Omega \to U c$ mapping $\alpha$ to $\alpha_c (f)$. The $\sigma$-algebra on $T^U \Omega$ is the smallest such that each of these maps is measurable.

All that’s left is to say what we should choose $\mathbb{C}$ and $U$ to be in order to get the Giry monads.

A subset $c$ of a real vector space $V$ is convex if for any $x,y \in c$ and $r \in I$ the convex combination $r x + (1-r)y$ is also in $c$, and a map $h \colon c \to c'$ between convex sets is called affine if it preserves convex combinations. So there’s a category of convex sets and affine maps between them. We will be interested in certain full subcategories of this.

Let $d_0$ be the (convex) set of sequences in $I$ that converge to $0$ (it is a subset of the vector space $c_0$ of all real sequences converging to $0$). Now we can define the categories of interest:

  • Let $\mathbb{C}$ be the category whose objects are all finite powers $I^n$ of $I$, with all affine maps between them.

  • Let $\mathbb{D}$ be the category whose objects are all finite powers of $I$, together with $d_0$, and all affine maps between them.

All the objects of $\mathbb{C}$ and $\mathbb{D}$ can be considered as measurable spaces (as subspaces of powers of $I$), and all the affine maps between them are then measurable, so we have (faithful but not full) inclusions $U \colon \mathbb{C} \to \mathbf{Meas}$ and $V \colon \mathbb{D} \to \mathbf{Meas}$.

Theorem. The codensity monad of $U$ is the finitely additive Giry monad, and the codensity monad of $V$ is the Giry monad.

Why should this be true? Let’s start with $U$. An element of $T^U \Omega$ is a family of functions

$$\alpha_{I^n} \colon \mathbf{Meas}(\Omega,I^n) \to I^n.$$

But a map into $I^n$ is determined by its composites with the projections to $I$, and these projections are affine. This means that $\alpha$ is completely determined by $\alpha_{I}$, and the other components are obtained by applying $\alpha_{I}$ separately in each coordinate. In other words, an element of $T^U \Omega$ is a special sort of function

$$\mathbf{Meas}(\Omega, I) \to I.$$

Look familiar? As you might guess, the functions with the above domain and codomain that define elements of $T^U \Omega$ are precisely the finitely additive integration operators.

The affine and weakly averaging properties of $\alpha_{I}$ are enforced by naturality with respect to certain affine maps. For example, the naturality square involving the affine map

$$r\pi_1 + (1-r)\pi_2 \colon I^2 \to I$$

(where the $\pi_i$ are the projections) forces $\alpha_I$ to preserve convex combinations of the form $r f + (1-r)g$. The weakly averaging condition comes from naturality with respect to constant maps.
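The naturality square can be traced numerically. This Python sketch assumes a two-point $\Omega$ with the discrete $\sigma$-algebra and builds $\alpha_{I^2}$ from a candidate $\alpha_I$ by applying it in each coordinate. It then compares the two paths around the square for $h = r\pi_1 + (1-r)\pi_2$, which are exactly the two sides of the affineness equation. The candidate `supremum` is a hypothetical non-example chosen for illustration (it is weakly averaging but not affine); all names are assumptions of the sketch.

```python
# Illustrative sketch; helper names are hypothetical, not from the post.
# Assumes a two-point Omega with the discrete sigma-algebra.
from fractions import Fraction

OMEGA = [0, 1]


def integrate(pi):
    """Candidate alpha_I: integration against a probability measure pi on OMEGA."""
    return lambda f: sum((f(w) * pi[w] for w in OMEGA), Fraction(0))


def supremum(f):
    """Hypothetical non-affine candidate alpha_I (weakly averaging, not affine)."""
    return max(f(w) for w in OMEGA)


def square_commutes(alpha_I, f, g, r):
    """Compare both paths around the naturality square for
    h = r*pi_1 + (1-r)*pi_2 : I^2 -> I, starting from <f, g> : OMEGA -> I^2."""
    h = lambda xy: r * xy[0] + (1 - r) * xy[1]
    pair = lambda w: (f(w), g(w))
    # alpha_{I^2} applies alpha_I in each coordinate (naturality in the projections).
    alpha_I2 = lambda fg: (alpha_I(lambda w: fg(w)[0]),
                           alpha_I(lambda w: fg(w)[1]))
    compose_then_apply = alpha_I(lambda w: h(pair(w)))  # alpha_I(r f + (1-r) g)
    apply_then_compose = h(alpha_I2(pair))  # r alpha_I(f) + (1-r) alpha_I(g)
    return compose_then_apply == apply_then_compose


f = lambda w: Fraction(w)
g = lambda w: Fraction(1 - w)
r = Fraction(1, 3)
fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}
```

With `integrate(fair)` both paths give $1/2$, so the square commutes. With `supremum` one path gives $2/3$ and the other $1$, so `supremum` cannot be the $I$-component of an element of $T^U\Omega$.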

How is the situation different for $T^V$? As before, $\alpha \in T^V \Omega$ is determined by $\alpha_I$, and $\alpha_{d_0}$ is obtained by applying $\alpha_I$ in each coordinate, thanks to naturality with respect to the projections. A measurable map $f \colon \Omega \to d_0$ is a sequence of maps $f_n \colon \Omega \to I$ converging pointwise to $0$, and

$$\alpha_{d_0}(f) = (\alpha_I(f_n))_{n=1}^{\infty}.$$

But $\alpha_{d_0}(f) \in d_0$, so $\alpha_I(f_n)$ must converge to $0$. So $\alpha_I$ is an integration operator!
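To see the limit condition in one concrete countably additive case, take $\Omega = \mathbb{N}$ with the geometric probability measure $\pi(\{k\}) = p(1-p)^k$, and let $f_n$ be the characteristic function of $\{k : k \geq n\}$, which converges pointwise to $0$. Then $\alpha_I(f_n) = \pi(\{k : k \geq n\}) = (1-p)^n$, so $\alpha_{d_0}(f)$ is the sequence $((1-p)^n)_n$, which does lie in $d_0$. The Python sketch below (the choice of measure and all names are illustrative assumptions, and indexing starts at $n = 0$ for convenience) checks the closed form against a truncated sum, which for $n \le N$ equals $(1-p)^n - (1-p)^N$.

```python
# Illustrative sketch; the measure and all names are hypothetical choices.
# Omega = N with the geometric measure pi({k}) = p * (1-p)**k.
from fractions import Fraction


def geometric_mass(k, p):
    """Mass of the point k in N under the geometric measure."""
    return p * (1 - p) ** k


def f(n):
    """f_n: characteristic function of {k : k >= n}; f_n(k) = 0 once n > k."""
    return lambda k: 1 if k >= n else 0


def truncated_integral(fn, p, cutoff):
    """Sum of fn(k) * mass(k) over k < cutoff, approximating the integral of fn."""
    return sum((fn(k) * geometric_mass(k, p) for k in range(cutoff)), Fraction(0))


def alpha_d0_prefix(p, length):
    """First terms of (alpha_I(f_n))_n, using the closed form (1-p)**n."""
    return [(1 - p) ** n for n in range(length)]


p = Fraction(1, 2)
prefix = alpha_d0_prefix(p, 6)  # 1, 1/2, 1/4, 1/8, 1/16, 1/32
```

For example, `truncated_integral(f(3), p, 10)` is $1/8 - 1/1024$, the tail mass from $3$ onwards minus the neglected mass beyond $10$.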

The rest of the proof consists of checking that these assignments $\alpha \mapsto \alpha_{I}$ really do define isomorphisms of monads.

It’s natural to wonder how much you can alter the categories $\mathbb{C}$ and $\mathbb{D}$ without changing the codensity monads. Here’s a result to that effect:

Proposition. The categories $\mathbb{C}$ and $\mathbb{D}$ can be replaced by the monoids of affine endomorphisms of $I^2$ and $d_0$ respectively (regarded as 1-object categories, with the evident functors to $\mathbf{Meas}$) without changing the codensity monads.

This gives categories of convex sets that are minimal such that their inclusions into $\mathbf{Meas}$ give rise to the Giry monads. Here I mean minimal in the sense that they contain the fewest objects with all affine maps between them. They are not uniquely minimal; there are other convex sets whose monoids of affine endomorphisms also give rise to the Giry monads.

This result gives yet another characterisation of (finitely and countably) additive probability measures: a probability measure on $\Omega$ is an $\mathrm{End}(d_0)$-set morphism

$$\mathbf{Meas}(\Omega,d_0) \to d_0,$$

where $\mathrm{End}(d_0)$ is the monoid of affine endomorphisms of $d_0$. Similarly for finitely additive probability measures, with $d_0$ replaced by $I^2$.

What about maximal categories of convex sets giving rise to the Giry monads? I don’t have a definitive answer to this question, but you can at least throw in all bounded, convex subsets of Euclidean space:

Proposition. Let $\mathbb{C}'$ be the category of all bounded, convex subsets of $\mathbb{R}^n$ (where $n$ varies) and affine maps. Let $\mathbb{D}'$ be $\mathbb{C}'$ but with $d_0$ adjoined. Then replacing $\mathbb{C}$ by $\mathbb{C}'$ and $\mathbb{D}$ by $\mathbb{D}'$ does not change the codensity monads.

The definition of $\mathbb{D}'$ is a bit unsatisfying; $d_0$ feels (and literally is) tacked on. It would be nice to have a characterisation of all the subsets of $\mathbb{R}^{\mathbb{N}}$ (or indeed all the convex sets) that can be included in $\mathbb{D}'$. But so far I haven’t found one.

Posted at October 22, 2014 2:29 PM UTC


22 Comments & 0 Trackbacks

Re: Where Do Probability Measures Come From?

I believe the answer to the question you pose, “What about maximal categories of convex sets giving rise to the Giry monads?”, is the category of convex spaces. As you keenly noted, I forgot to add the continuity at zero condition to the description of the monad $P$ in my paper :( Simply adding the continuity at zero condition (or equivalently, continuity from below or continuity from above) in characterizing the subfunctor of the double dualization monad, you obtain countable additivity, and that monad is naturally isomorphic to the Giry monad. The arguments in my paper remain unchanged. Hence we have a functor from the category of convex spaces to the category of measurable spaces, which I denoted by $\iota$. The monad $P$ is the right Kan extension of $\iota$ along itself (Theorem 6.2).
Posted by: kirk sturtz on October 23, 2014 8:12 AM

Re: Where Do Probability Measures Come From?

It may well be the case that your argument can be adapted by adding the continuity requirement in the definitions of $\mathcal{P}$ and $\iota$ (in the notation of Theorem 6.2 of your paper, one would then need to show that each $\eta(z)$ satisfies this continuity condition, but it’s certainly believable that this could be done similarly to the other properties). As it stands your proof certainly works for the finitely additive Giry monad. Sorry, I should have pointed that out. In that case, the Giry monad is the codensity monad of a functor from the category of all convex spaces to $\mathbf{Meas}$.

However, your functor $\iota$ is quite different from my functor $V$. Writing $\mathbf{Cvx}$ for that category of convex spaces, $\iota$ sends a convex space $c$ to the subset of $\mathbf{Cvx}(\mathbf{Cvx}(c,I),I)$ consisting of the affine maps that are additionally weakly averaging (and possibly also limit-preserving?), equipped with a suitable $\sigma$-algebra.

On the other hand, for $c \in \mathbb{D}$, $V c$ has the same underlying set as $c$, with a suitable $\sigma$-algebra.

So I suppose what I was really asking was “What’s the largest extension $\mathbb{D}'$ of $\mathbb{D}$ equipped with a functor $V' \colon \mathbb{D}' \to \mathbf{Meas}$ such that the codensity monad of $V'$ is the Giry monad and the evident triangle between $\mathbb{D}$, $\mathbb{D}'$ and $\mathbf{Meas}$ commutes?”

This is a more difficult question, partly because it’s not even clear what the best way to equip an arbitrary convex space with a σ\sigma-algebra is.

Posted by: Tom Avery on October 23, 2014 10:37 AM

Re: Where Do Probability Measures Come From?

If you are thinking of the functor $V$ as being an insertion of a convex vector space into $\mathbf{Meas}$ then, if the space is also bounded, you can use the generalized metric given by Lawvere and further developed by his student Meng in her dissertation. However, if the space is not bounded, the resulting $\sigma$-algebras are trivial, e.g., using that generalized metric $d(A,B) = -\log \sup_{t \in [0,1]} t$ subject to … His initial reference to this metric is in his Metric Spaces, Generalized Logic, and Closed Categories paper, but the presentation in Meng’s dissertation is much clearer. I can send you a photocopy of that if you do not have access to it; the results in Meng’s dissertation are unfortunately not readily available in the open literature. Her dissertation makes some nice connections between the category of convex spaces and the category of measurable spaces. It is an extension of Lawvere’s generalized metric space paper mentioned above. If you are looking at an arbitrary convex space then obtaining an extension of $\mathbb{D} \hookrightarrow \mathbf{Meas}$ is indeed a difficult question. In a paper by Borger and Kemp they show that the interval $(-\infty,\infty]$ is a cogenerator for the category of convex spaces, which gives a separability condition on these functionals that is relevant to obtaining an extension.

With regards to the continuity condition, showing $\eta(z)$ satisfies the continuity condition is straightforward. Indeed, if it were not true then your results would also be in error, as our monads on $\mathbf{Meas}$ are identical. (Your subfunctor $G \hookrightarrow F$ is my functor $P$.)

Posted by: kirk sturtz on October 23, 2014 3:24 PM

Re: Where Do Probability Measures Come From?

Perhaps Meng’s thesis could be hosted somewhere online, if this is not objectionable to her.

Posted by: David Roberts on October 24, 2014 1:07 AM

Re: Where Do Probability Measures Come From?

Thanks for sending a copy of Meng’s thesis. It looks interesting! I agree it would be good for it to be available online somewhere.

I am happy to believe that it’s straightforward to check the continuity condition for $\eta(z)$; I just haven’t checked it myself. I’m not sure I agree that my result depends on it, though. Since $\eta(z)$ is defined in terms of $\iota$, and $\iota$ doesn’t appear in my argument, I don’t see how it’s relevant. Of course, I need to show that something defined in terms of $V$ satisfies the continuity condition, and I do (this is also straightforward).

Posted by: Tom Avery on October 24, 2014 9:42 AM

Re: Where Do Probability Measures Come From?

Kirk emailed Meng and got her permission to post a copy on the nLab. I’ve converted the scan to a djvu, and will upload it. I’m thinking the page metric space would be a start, but not sure where else. Maybe a page for Meng herself?

Posted by: David Roberts on November 6, 2014 3:53 AM

Re: Where Do Probability Measures Come From?

Interesting! I observe that your category $\mathbb{C}$ is, among other things, a Lawvere theory, and thus the functor $U$ exhibits $I$ as a model of this theory. That makes me wonder: suppose I have an arbitrary Lawvere theory and a functor exhibiting a model of it; is the codensity monad of that functor interesting? Should I already know the answer?

Posted by: Mike Shulman on October 23, 2014 8:45 AM

Re: Where Do Probability Measures Come From?

An easy calculation for $\mathcal{M} = \mathbf{Set}$ shows that the resulting $T^U : \mathbf{Set} \to \mathbf{Set}$ is given by $T^U (X) \cong [\mathbb{C}, \mathbf{Set}](U^X, U)$, and since $\mathbb{C}$-algebras are a full subcategory of $[\mathbb{C}, \mathbf{Set}]$, when $U$ is a $\mathbb{C}$-algebra, we could equally well write $T^U (X) \cong \mathrm{Hom}(U^X, U)$, which looks a lot like a double dualisation!

Indeed, it is a subfunctor of $X \mapsto [[X, U(1)], U(1)]$, where $U(1)$ is the underlying set of the $\mathbb{C}$-algebra $U$. I haven’t checked, but I would guess that $T^U$ is actually a submonad of the “endomorphism” monad of $U(1)$, and on that basis, I would guess that monad homomorphisms $S \to T^U$ correspond to $S$-algebra structures on $U$ as a $\mathbb{C}$-algebra.

Posted by: Zhen Lin on October 23, 2014 10:45 AM

Re: Where Do Probability Measures Come From?

Nice! This looks very similar to the “Monad of Schwartz distributions” considered by Anders Kock in “Commutative monads as a theory of distributions”, Section 11.

One thing that puzzles me slightly is that Kock requires the original monad (in this case the monad corresponding to $\mathbb{C}$) to be commutative before defining the monad of Schwartz distributions, whereas commutativity doesn’t seem to play a role in what you’ve just said. Perhaps commutativity just gives some better properties, or comes from the fact that Kock is considering strong monads on arbitrary Cartesian closed categories rather than just $\mathbf{Set}$.

Posted by: Tom Avery on October 23, 2014 11:46 AM

Re: Where Do Probability Measures Come From?

Out of interest, do you know if this Lawvere theory has been studied at all previously? Possibly related: if you replace cubes $I^n$ with simplices $\Delta^n$ you get the opposite of the Lawvere theory for convex spaces, and this still gives rise to the finitely additive Giry monad in the same way. But I hadn’t considered $\mathbb{C}$ itself as a Lawvere theory.

Posted by: Tom Avery on October 23, 2014 11:08 AM

Re: Where Do Probability Measures Come From?

I would like to comment on the wider significance of Tom’s paper, as well as my work in this area. Tom’s presentation is beautiful for its simplicity and avoids the symmetric monoidal closed property that I stress in my paper. My use of the SMC property is deliberate as I am pursuing categorical probability theory on more general categories. By stressing the SMC property it becomes readily obvious that (Bayesian) probability theory can be applied to any category with the properties of (1) SMC category, (2) a convex ordered object. The 2nd condition may be unnecessary - yet tbd. This work combined with previous work on Bayesian Machine Learning - which is simply Bayesian probability theory on function spaces (which requires SMC) - shows that it is possible to do probability theory in a very general categorical setting. I should mention Bob Coecke’s & Robert Spekkens work, Picturing classical and quantum Bayesian inference, which showed that for FINITE spaces one has probability theory on SMC categories (I don’t recall the exact details; but they recognized probability theory applied in a more general setting).

In another comment below Tom noted/questioned the role of the commutativity of the Giry monad and Anders Kock’s work. The commutativity of the Giry monad, which amounts simply to Tonelli’s as well as Fubini’s Theorem on double integrals, is not required for probability theory. The theory readily extends to non-commutative probability theory and only requires the SMC property - the Cartesian Closed property is not necessary. For example, IF the category of lattices is SMC (I am not sure of the answer to this question - the category may need to be cut down???) then the theory immediately applies to Lat and one observes non-commutative probability on ortho-complemented lattices used in quantum mechanics. The details of this proposed work need to be filled in, but it provides ample opportunities for anyone interested in categorical probability theory.

Posted by: Kirk Sturtz on October 24, 2014 1:56 PM

Re: Where Do Probability Measures Come From?

…previous work on Bayesian Machine Learning - which is simply Bayesian probability theory on function spaces

Do you mean here Gaussian processes?

Posted by: David Corfield on October 26, 2014 10:30 AM

Re: Where Do Probability Measures Come From?

It includes Gaussian Processes (the most practical case) but I do not believe it is limited to Gaussian Processes. The analytic solution for the GP is spelled out categorically in the paper “Bayesian Machine Learning via Category Theory” by Jared Culbertson and myself, where the inference map can also be explicitly constructed. A quick glance at the figures related to parametric and nonparametric models in Section 7 suffices to get the idea. One simply needs the function spaces $Y^X$… the details are (yawn) an implementation of what the ML community already knows, but the categorical characterization shows the general framework for thinking about these issues categorically. In that paper the Kleisli category, which is SMwCC (weakly closed), is sufficient to do Bayesian ML.

Posted by: kirk sturtz on October 26, 2014 11:49 AM

Re: Where Do Probability Measures Come From?

Ah, I hadn’t put together that you were one of the authors of that paper I’d made a note to read.

I was once much more involved in this area. I had an idea back then that one could think about Gaussian process use in terms of Bayesian information geometry and infinite-dimensional exponential families.

Posted by: David Corfield on October 26, 2014 1:18 PM

Re: Where Do Probability Measures Come From?

Kirk Sturtz: How does your proposal relate to, say, The Bayesian interpretation of quantum physics ?

Posted by: Bas Spitters on October 26, 2014 3:42 PM

Re: Where Do Probability Measures Come From?

Provided you have an SMCC to model your equations you are good to go; it is a modeling problem. For modeling QM the underlying SMCC could be taken to be any SMCC; I believe the Rosetta Stone paper goes into detail on these. Your knowledge far outstrips anything I could say here, but SMCCs are relevant here. (Disclaimer: I want to emphatically state my complete ignorance of QM… but lack of knowledge and skill has never stopped me before.) Let me try to explain a bit further with regard to the QM problem.

The most elementary approach to a Bayesian viewpoint on QM is via Bohmian Mechanics (de Broglie-Bohm theory/pilot wave theory). This theory is deterministic if the initial configuration state is known. However, as I recall, if one takes a uniform distribution on the initial configuration state then one obtains results identical to those of the traditional approach that every physicist is taught. From this perspective the sampling distribution $\mathcal{S}$ is deterministic while the prior probability $P_H$ requires a stochastic model. (The generic Bayesian model is shown in Figure 2 of the above-mentioned paper. Moving to function spaces you get Figure 17, the same “triangle”.) At that point the problem is like any other Bayesian model. All the work here is in constructing the sampling distribution via Schrödinger’s equation. Of course, when the sampling distribution composed with the prior gets nasty, MCMC methods are required to compute the inference map.

Posted by: kirk sturtz on October 26, 2014 7:44 PM

Re: Where Do Probability Measures Come From?

respect limits: if $f_n \in \mathbf{Meas}(\Omega,I)$ is a sequence of functions converging pointwise to $0$ then $\phi(f_n)$ converges to $0$.

Does it make any difference to require this for all nets and not merely for sequences?

Posted by: Toby Bartels on October 26, 2014 4:02 AM

Re: Where Do Probability Measures Come From?

Yes, I think so. Requiring this for arbitrary nets amounts to a strengthening of the monotone convergence theorem, which not all probability measures satisfy. The following is a counterexample for nets whose underlying directed set is the first uncountable ordinal $\omega_1$.

Let $\Omega$ be the set of countable ordinals, that is, $\Omega$ is the underlying set of $\omega_1$. Equip $\Omega$ with the countable/cocountable $\sigma$-algebra. Let $\pi$ be the probability measure on $\Omega$ taking value $1$ on cocountable sets and $0$ on countable sets.

For each $\beta < \omega_1$ let $f_{\beta}(\gamma) = 0$ if $\gamma < \beta$ and $1$ if $\gamma \geq \beta$.

The integral of each $f_{\beta}$ with respect to $\pi$ is $1$, since they take the value $1$ almost everywhere.

But their pointwise $\omega_1$-limit is $0$.

Posted by: Tom Avery on October 27, 2014 11:56 AM

Re: Where Do Probability Measures Come From?

This is neat stuff, Tom! I’m particularly struck by the role that convexity seems to play.

It also seems like there’s a lot of room to vary things in this construction:

  • Have you thought about functors whose codensity monads might yield signed measures, unbounded measures, complex-valued measures, etc? Actually, I don’t even know which of these constructions are monads, let alone codensity monads…

  • $U$ and $V$ equip $I^n$ with the Borel $\sigma$-algebra. If you use the Lebesgue $\sigma$-algebra instead, what happens to the codensity monad?

  • It also seems strange to me to consider the finitely additive monad $\mathbb{F}$ on measurable spaces, simply because a $\sigma$-algebra satisfies axioms about countable unions and intersections which seem unnecessary for the definition of a finitely additive measure. Does $\mathbb{F}$ arise as a codensity monad on the category of all algebras (where an algebra is a set equipped with a Boolean subalgebra of its powerset)?

Posted by: Tim Campion on October 27, 2014 2:49 AM

Re: Where Do Probability Measures Come From?

I don’t know which of these constructions are monads, let alone codensity monads…

I guess what you had in mind here when you said “codensity monad” was “codensity monad of some functor similar to the one above”. But in case anyone reading this is misled, I can’t resist pointing out:

Every monad is a codensity monad.

That’s simply because every monad is induced by some adjunction $F \dashv G$, and that monad is then the codensity monad of $G$.

I’m beginning to think that the terminology “codensity monad” is too weighty. Perhaps it would be better to speak of the induced monad of a functor, or simply the monad of a functor.

Posted by: Tom Leinster on October 27, 2014 9:52 AM

Re: Where Do Probability Measures Come From?

Thanks Tim! Interesting questions. I’ve thought/am planning to think a bit about some of these.

  • Not all of these constructions are monads (at least, not in the way that most obviously generalises the Giry monads). The problem is with the multiplication, which in the Giry monads is defined using integration. But, for example, when you try an analogous definition for finite, positive measures, the integrals that should define the multiplication may be infinite. On the other hand, I think positive, possibly infinite measures do form a monad, because every measurable function taking values in $[0,\infty]$ has an integral in $[0,\infty]$. I haven’t (yet) thought very hard about whether this monad can be realised as a codensity monad in an interesting way.

  • Again, interesting question. I don’t know the answer.

  • I’m fairly sure all the finitely additive stuff should work in the same way for sets equipped with Boolean algebras (but I haven’t checked thoroughly). I mainly used $\sigma$-algebras for both so that I could describe $\mathbb{G}$ as a submonad of $\mathbb{F}$, and for brevity.

Posted by: Tom Avery on October 27, 2014 3:03 PM

Re: Where Do Probability Measures Come From?

I believe Tim’s question, from my “subfunctor of the double dualization monad” point of view, is that the monad $T$ defined on component $X$ by

$$T(X) = \{ I^X \stackrel{G}{\longrightarrow} I \mid G \text{ is weakly averaging, affine, and continuous at } \emptyset \}$$

should be able to be generalized to

$$T(X) = \{ K^X \stackrel{G}{\longrightarrow} K \mid G \text{ is weakly averaging, affine, and continuous at } \emptyset \},$$

where $K$ is $[0,\infty]$, $[-1,1]$, the closed unit disk (viewed in the complex plane), etc. The key point is (seemingly) that $K$ needs to be convex and have a partial order.

Is $T$ still a subfunctor of the double dualization monad? If so, then as Tom L. reiterates, they certainly come from some functor mapping into $\mathbf{Meas}$.

Abstractly, we should be able to call those elements of the monad $T$ (assuming it is a submonad) “$K$-valued (probability) measures”.

Posted by: kirk sturtz on October 27, 2014 5:19 PM
