September 18, 2007

Progic

Posted by David Corfield

My colleague here in Canterbury Jon Williamson is part of an international research group, progicnet, whose aim is to find a good integration of probability theory and first-order logic. For one reason or another, some technical projects get counted as philosophy, while others don’t. Where I’m sure I’d have a hard time getting funding for 2-geometry, progicnet’s goals are philosophically pukka.

Now, as Jon says,

We see then that there are a plethora of combinations of probability and logic, and that these approaches are being investigated in some detail.

But is there a ‘natural’ way of doing it? In the quest to have philosophy take notice of category theory, all we need do is use it to blend probability theory and logic powerfully. Does anyone know of any progress along these lines? Perhaps involving the probability monad?

On the logic front, there’s a huge amount already worked out category theoretically, such as Lawvere’s observations on the adjunctions (p. 12) between quantification and substitution. Here, formulae sit in fibres above the product of the types of their free variables.

First,

$\psi(y) \vdash_{Y} \forall x \phi(x, y) if and only if \psi(y) \vdash_{X \times Y} \phi(x, y),$

where the second $\psi(y)$ is the result of projecting it from the fibre above $Y$ to the fibre above $X \times Y$.

Second,

$\exists x \phi(x, y) \vdash_{Y} \psi(y) if and only if \phi(x, y) \vdash_{X \times Y} \psi(y).$

Now how do we go about getting a probabilistic valuation on the sentences, i.e., the formulae in the fibre above the empty set?

Posted at September 18, 2007 11:32 AM UTC

TrackBack URL for this Entry:   http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/1426

Re: Progic

Sometimes I feel I’ve finally learned British English. For example, I’ve not only learned that an “anorak” is a waterproof coat; I also understand what people mean by saying “he’s a real anorak”.

But then someone says something like “philosophically pukka” — and my bubble of hubris is rudely popped.

“Pukka” sounds bad, but from context I’m guessing it’s good.

Posted by: John Baez on September 18, 2007 8:08 PM | Permalink | Reply to this

Re: Progic

Yes, pukka is good. It came into into english in the days of the Raj from a hindi word meaning cooked or ripe. When I was a schoolboy in the 70’s it was a cool way to say ‘proper’ or ‘the real thing!’

Posted by: Geoff Clohessy on September 18, 2007 8:40 PM | Permalink | Reply to this

Re: Progic

I did not know it had anything to do with cooking. The phrase in my mind is “pukka wallah”. Don’t ask me what this means.

I also think you would have more chance of understanding this if you went to a public school.

Posted by: Bruce Westbury on September 19, 2007 8:15 PM | Permalink | Reply to this

Re: Progic

You’re thinking of punkah wallah, the boy who pulls the ceiling fan.

I don’t think the use of ‘pukka’ is so much a class matter. It’s more a case of when it was made popular. The chef Jamie Oliver likes to use it.

I’ll get back to the thread itself when I’ve come up for air from my new job.

Posted by: David Corfield on September 20, 2007 1:45 PM | Permalink | Reply to this

Re: Progic

I’m feeling sorry for David. It must be disappointing to post a blog entry about connections between probability theory and logic and trigger nothing but a discussion about the word ‘pukka’.

So, let’s talk a bit about that probability monad.

Roughly speaking, this is a functor sending any space $X$ to the space of probability measures on $X$.

‘Any space’ — but what sort of space? David said we should use certain nice topological spaces called Polish spaces. I’m sure this has advantages, but it’s a bit technical. I think we should postpone such technicalities until we understand simpler issues.

So, let’s work with a simpler monad

$M: Set \to Set$

sending any set $X$ to the set $M X$ whose elements are certain very nice measures on $X$: finite positive linear combinations of Dirac delta measures.

The reason I like $M X$ is that it’s the free $[0,\infty)$-module on $X$!

Here $[0,\infty)$ is not a ring, but a rig with the usual addition of real numbers as $+$ and the usual multiplication as $\times$. We can talk about $R$-modules for $R$ a rig, just as we can when it’s a ring. The free $R$-module on $X$ consists of finite $R$-linear combinations of elements of $X$. When $R = [0,\infty)$, these are the same as finite positive linear combinations of Dirac delta measures on $X$.

David should like this, because it begins to develop a bridge between the probability monad and the wonderful world of matrix mechanics over arbitrary rigs.

For any rig $R$ there’s a monad

$T_R: Set \to Set$

sending any set $X$ to the underlying set of the free $R$-module on $X$. I just explained why

$M \cong T_{[0,\infty)}$

But, it’s also interesting to take $R$ to be the rig of real numbers, or complex numbers, or truth values — or the rig of costs $\mathbb{R}^{min}$, or the temperature-dependent rig $\mathbb{R}^T$. As David knows and the links show, all these are interrelated in a wonderful way. Probability theory is just part of this big picture, which also includes classical mechanics, quantum mechanics and Boolean logic.

Or is it???

You’ll notice my monad

$M : Set \to Set$

sends a finite set $X$ not to the set of probability measures on $X$, but to the set of all measures on $X$.

That’s fine if we’re willing to accept the concept of ‘relative probabilities’, which aren’t necessarily normalized so their sum equals 1.

But what if we annoyingly insisted on working with probabilities that sum to 1? Then we’d get the monad

$P : Set \to Set$

sending any set $X$ to the set of probability measures on $X$ that are finite linear combinations of Dirac delta measures. This is a watered-down version of David’s probability monad.

Now, there’s no rig $R$ such that

$P \cong T_R$

The annoying condition that probabilities must sum to 1 stands in the way.

But, I believe there’s a generalized ring that does the job — a generalized ring in the sense of Nikolai Durov. This is the guy who generalized algebraic geometry to handle weird generalized rings like ‘the field with one element’. David already has already written about this.

Indeed, if you read my summary of Durov’s enormous book, you’ll see that one of his generalized rings is nothing but a monad with a certain property!

And, I think my probability monad

$P : Set \to Set$

simply is a generalized ring according to Durov’s definition! Unless I’m confused, it’s even a ‘commutative’ generalized ring!

So, everything David likes is part of the same big picture: the probability monad, the field with one element, the rig of truth values, the rig of costs, and so on. It’s all about linear algebra over Durov’s generalized rings. In particular, using these generalized rings, we can really talk about a ‘generalized ring of probabilities’, even though probabilities must sum to 1 — a condition we could never demand for arbitrary finite sets of elements of a mere rig.

If we’re going to combine probability theory and logic in a really nice way, maybe we should take advantage of this big picture. It may not be philosophically pukka, but it’s certainly pukka.

Posted by: John Baez on September 20, 2007 3:09 AM | Permalink | Reply to this

Re: Progic

In the film “Harvey”, Mr. Wilson looks up the definition of a word pronounced very much like “pukka” in an encyclopedia in the nuthouse reception area. He reads,

“Pooka – from old Celtic mythology. A fairy spirit in animal form, always very large. The pooka appears here and there, now and then, to this one and that one. A benign but mischievous creature, very fond of rumpots, crackpots, and how are you, Mr. Wilson?”

Posted by: Doug Jones on September 20, 2007 6:32 AM | Permalink | Reply to this

Re: Progic

Now David has his revenge. I posted a truly amazing comment explaining how to unify Boolean logic and probability theory in terms of modules of generalized rings, and I get a reply about… the word ‘pukka’!

Posted by: John Baez on September 21, 2007 2:53 AM | Permalink | Reply to this

Re: Progic

Thanks for reminding me about that comment!

One problem I personally have with the longer comments is that sometimes I don’t have time to read them, and then a day or two later when I do have a bit, there is often a whole new slew of comments, and I forget about the old ones.

Many of these comments are really follow-up posts. It might be nice to have some way of promoting big comments so they don’t get lost. Maybe my real problem is that the Firefox RSS set up isn’t as flexible as, say, a mail program, where you can leave messages in your inbox for future reading.

Anyway, my two cents.

Posted by: James on September 21, 2007 4:29 AM | Permalink | Reply to this

Re: Progic

The upside of the many amazingly well thought-out comments in this place is that they’ll provide rich fodder for the n-wikification program…

Posted by: Allan E on September 21, 2007 4:38 AM | Permalink | Reply to this

Re: Progic

A key element of Durov’s framework – that it’s about algebraic monads (section 0.4.4) – is why you had to opt for the finiteness of the support of the probability measure. This would have to be altered for a full treatment of probability theory.

So the question is what that is relevant to probability theory has been captured already by formulating it as about a generalized ring?

Something to wonder about are the operations to be had. We can’t have a straightforward sum or product of probability distribution, since they wouldn’t sum to 1.

Posted by: David Corfield on September 21, 2007 9:52 AM | Permalink | Reply to this

Re: Progic

David wrote:

So the question is what that is relevant to probability theory has been captured already by formulating it as about a generalized ring?

Pretty much everything except the ‘yucky analysis stuff’ where you replace finite sums by integrals. The basic ideas all shine through.

Lest anyone feel the need to argue that probability theory requires analysis and that analysis isn’t yucky, I hasten to agree. I like analysis and I’ve done a lot of it. However, I’ve learned that one can more quickly sketch out interesting category-theoretic ideas by retreating to contexts where analysis doesn’t show up. First lay out the algebraic ‘skeleton’ of what you’re trying to do; then clothe it in the analytic ‘meat’.

For example: make sure you understand a bit about finite-dimensional Hilbert spaces before you tackle the infinite-dimensional ones. Similarly for 2-Hilbert spaces (I’m writing a paper on the infinite-dimensional ones now). I could list lots more examples.

So: there’s surely an analysis-flavored enhancement of Durov’s theory of generalized rings which handles your full-fledged probability monad, Polish spaces and all. But: to outline grand schemes it’s best to retreat to the watered-down probability monad I described.

Something to wonder about are the operations to be had. We can’t have a straightforward sum or product of probability distribution, since they wouldn’t sum to 1.

The algebraic monad I described gives you exactly the right $n$-ary operations on probability measures: convex linear combinations! For example, for each $p \in [0,1]$ there’s a binary operation, which given two probability measures $\mu$ and $\nu$ spits out $p \mu + (1-p) \nu$ These are all the binary operations. More generally, there’s an $(n-1)$-simplex of $n$-ary operations.

It’s really just when we get to the infinitary operations that the watered-down probability monad I described becomes impoverished compared to the probability monad you described! Mine doesn’t have any infinitary operations. Yours allows you to take any collection of probability measures $\mu_x$ where $x$ ranges over a Polish space $X$, and any probability measure $p$ on $X$, and form a new probability measure on $X$, like this: $\int_X \mu_x \; d p(x)$ It’s fun to think about this stuff, but only later, when you’re ready for analysis. One must learn to add before doing integrals!

This is how it always goes — which is one thing that slows down the applications of category theory to probability and all branches of analysis. Analysis requires an ultra-macho version of category theory that can only come after one has mastered the ordinary stuff; most people who like analysis never even learn the ordinary stuff.

Posted by: John Baez on September 21, 2007 12:57 PM | Permalink | Reply to this

Re: Progic

Let me just add that the monad I described:

$P : Set \to Set$

sending any set $X$ to the set of formal linear combinations

$P X = \{ \sum_i p_i x_i : \; x_i \in X,\; p_i \ge 0, \; \sum_i p_i = 1, \; only finitely \; many \; p_i \; nonzero \}$

is often known as the monad for convex sets. Its algebras are convex sets.

The free convex set on an $n$-element set is a simplex with $n$ vertices. Annoyingly, this is called the $(n-1)$-simplex. (Note that the $(-1)$-simplex is the empty set.)

By general abstract nonsense about monads coming from algebraic theories (known as algebraic monads), the $(n-1)$-simplex is also the set of $n$-ary operations in the algebraic theory for convex sets.

There’s also an algebraic monad for affine spaces, where one drops the condition $p_i \ge 0$. Here instead of the $(n-1)$-simplex we get an $(n-1)$-dimensional affine space.

Posted by: John Baez on September 21, 2007 1:34 PM | Permalink | Reply to this

Re: Progic

Perhaps we should follow up David Mumford’s proposal for a stochastic predicate calculus.

It should have the syntax of standard predicate calculus except that we have two kinds of variables in it: the ordinary predicates and constants and quantiable free variables $x$ but also a set of random constants $\underline{x}$. In addition, it comes with a truth value function $p$ mapping all formulas $F$ without free variables to real numbers between $0$ and $1$. If the formula $F$ has only ordinary variables in it, then $p(F) \in \{0, 1\}$. Formal semantics for this theory would make the random constants functions on probability spaces so that a formula would define a subset of the product of these spaces, hence have a probability. (pp. 11-12)

Posted by: David Corfield on September 21, 2007 12:56 PM | Permalink | Reply to this

Re: Progic

David wrote:

Perhaps we should follow up David Mumford’s proposal for a stochastic predicate calculus.

I have a feeling his proposal is related to the stuff I’m talking about, but I can’t quite put my finger on the connection. I’d need to understand the relation between Boolean algebra and the predicate calculus and then generalize it to the probabilistic case.

Here’s my first try — watch me crash and burn:

Propositional logic is summarized by the ‘free Boolean algebra’ monad. The predicate calculus goes a wee bit further. Lawvere worked out exactly how. We start with a Boolean algebra $B_n$ of $n$-ary predicates for each finite set $n$. Then we say there’s a ‘variable substitution’ operation

$B_f : B_m \to B_n$

for each function $f: m \to n$. For example, if $f$ is the unique function $f: 2 \to 1$, then $B_f$ sends any 2-ary predicate $P(x_1, x_2)$ to the 1-ary predicate $P(x_1, x_1)$. At this point we might as well admit $B$ is a functor from $FinSet$ to $BoolAlg$. Then, we get quantifiers as adjoints to the substitution operations — that was Lawvere’s stroke of genius:

• F.W. Lawvere, Adjointness in foundations, Dialectica, 23 (1969), 281–296

Now we want to generalize all this to the ‘stochastic’ world!

To do this, I’d like to see the ‘free Boolean algebra’ monad as a special case of the algebraic monads I’ve been talking about, and figure out how to generalize everything I just sketched to an arbitrary algebraic monad.

Then, if we replaced the ‘free Boolean algebra’ by the ‘free convex set’ monad, we might get a stochastic predicate calculus.

But at this point I realize something is seriously screwed up with my plan! The algebraic theory for convex sets doesn’t contain the ‘and’, ‘or’ and ‘not’ operations we had in Boolean algebras. It only has the convex linear combination operations!

In fact, I now see I was completely confused. Boolean algebras are sets of propositions, while convex sets are sets of states. We perform logical operations on propositions, but linear combination (or ‘mixture’) operations on states. Why was I trying to combine them??? They’re ‘dual’ in some way.

Posted by: John Baez on September 21, 2007 2:29 PM | Permalink | Reply to this

Re: Progic

Hmm, is the logic part of this right? I mean do you really talk about

$n$-ary predicates for each finite set $n$.

I thought the idea was to choose a cartesian closed base category like Sets, then look at all predicates over an object in it.

So if we have $X$ the set of dogs, predicates typed over $X$ are unary, like being a poodle. Typed $n$-ary predicates correspond to a product of $n$ sets.

The adjunction process starts from observing that a function $f: X \to Y$ spits out a predicate of $X$ for every predicate of $Y$. So Let $Y$ be the set of people, and $f$ be the function which sends a dog to its owner. (Each dog is owned by one owner.) Then the predicate ‘is French’ gets sent to the predicate ‘is owned by a Frenchman’.

Let’s try to work out the adjunctions. If from being a poodle it follows that you were owned by a frenchman, then being an owner of a poodle implies you are French. So this functor sends dog predicates to human predicates ‘owner of a – dog’.

Now, the other one. If from being owned by a Frenchman, it followed that you were a poodle, then from being a Frenchman, it follows that if you own a dog then it’s a poodle. So this functor sends dog predicates to human predicates ‘owner only of – dogs’ (which includes owning no dogs).

Does anything similar happen with the probability functor?

Posted by: David Corfield on September 22, 2007 5:46 PM | Permalink | Reply to this

Re: Progic

Just one little thing for now. I was talking about the ‘untyped’ (= monotyped) predicate calculus, where to specify the ‘arity’ of a predicate we just need a number $n$. Lawvere realized that the nice way to do this is via a functor

$B :FinSet \to BoolAlg$

This eliminates the heavy reliance on syntax that dominates the old-fashioned approach — ‘variables’, ‘quantifiers’, ‘well-formed formulas’ and all that junk.

$B$ sends any finite set $n$ to the set $B_n$ of $n$-ary predicates in your theory, which form a Boolean algebra under the usual logical operations. And, it sends any function $f: n \to m$ to a Boolean algebra homomorphism $B_f: B_n \to B_m$ which describes ‘substitution of variables’ in the old approach. I gave an example above, but here’s another: if $f: 3 \to 2$ is the function with $f(0) = 1, \; f(1) = 0, \; f(3) = 0$ then $B_f$ maps any 3-ary predicate $P(x_0,x_1,x_2)$ to the 2-ary predicate $P(x_1,x_0, x_0)$. This is clearly a Boolean algebra homomorphism.

The nice part is that a Boolean algebra is itself a specially nice sort of category (since it’s a poset with implication as $\le$), and a Boolean algebra homomorphism is a specially nice sort of functor (since it preserves implication). So, it makes sense to demand that

$B_f : B_n \to B_m$

has both a left and a right adjoint. And, this demand then gives you universal and existential quantifiers!

If this is old hat to you, sorry. If it’s not, I leave it as a puzzle to work out what the left adjoint of $B_f$ should be like (in the simplest examples), and whether this is universal or existential quantification. Ditto for the right adjoint.

Jim has been telling me about this stuff forever, so I thought I’d finally put it to good use in an attempt to blend probability theory and the predicate calculus into some form of ‘progic’. I still think it should be helpful. However, my first attempt was a belly flop.

Posted by: John Baez on September 22, 2007 7:07 PM | Permalink | Reply to this

Re: Progic

Ok, so you’re telling a similar story to mine, but with a single untyped universe, $U$, of entities. So you have $n$-ary predicates defined over $U^n$. And the maps you deal with go from $U^m$ to $U^n$.

Hmm, do you need that trick of finding a $U$ for which $U^U \equiv U$ to get implication working (‘Adjointness in Foundations’ p. 11)? Aren’t typed logics nicer?

But the more important point is that when we now think about probabilities, it’s not really a distribution over entities in some universe we care about. All that probability monad stuff concerns precisely that. So a morphism between $X$ and $Y$ in the Kleisli category is a function between $X$ and $P Y$, otherwise known as a conditional distribution $P(y | x)$. If $X = \{*\}$, we have a distribution over $Y$.

However, to achieve progic’s aims, I don’t think we’re looking to model distributions over, say, a set of dogs, as though there’s some specific dog, and a chance that it’s Rex, Fido, or Rover. I think it’s more a case of a set of dogs and probabilistic predicates, say, ‘being a poodle’, such that for each dog there’s a number between $0$ and $1$ corresponding to our belief that that dog is a poodle.

So Pr(poodle(Rex)) = 0.4, Pr(poodle(Fifi)) = 0.8, etc. In other words instead of a map from $X$ to $\{0, 1\}$, we have a map from $X$ to $[0, 1]$.

Now, unlike in the case of an ordinary predicate, knowing these probabilistic predicates for each member of $X$ doesn’t tell us what happens on $X^n$. So there’s no reason to have Pr(poodle(Fifi) and poodle(Rex)) = Pr(poodle(Fifi)) $\times$ Pr(poodle(Rex)). Why assume independence? I might know that Rex and Fifi are owned by the same person, and when finding out that Rex is a poodle, it will increase my belief that Fifi is too.

Jon Williamson’s approach is to seek maximally uncertain distributions which satisfy all known constraints.

Posted by: David Corfield on September 23, 2007 12:11 PM | Permalink | Reply to this

Re: Progic

This has surely got to be a useful observation:

As a final example of a hyperdoctrine, we mention the one in which types are finite categories and terms arbitrary functors between them, while $P(A) = S^A$, where $S$ is the category of finite sets and mappings, with substitution as the special Godement multiplication. Quantification must then consist of generalized limits and colimits…By focusing on those $A$ having one object and all morphisms invertible, one sees that this hyperdoctrine includes the theory of permutation groups; in fact, such $A$ are groups and a “property” of $A$ is nothing but a representation of A by permutations. Quantification yields “induced representations” and implication gives a kind of “intertwining representation”. Deductions are of course equivariant maps. (Adjointness in Foundations, p. 14)

Posted by: David Corfield on September 23, 2007 3:56 PM | Permalink | Reply to this
Read the post Progic II
Weblog: The n-Category Café
Excerpt: More on merging probability theory and logic
Tracked: September 25, 2007 9:48 AM
Read the post Progic IV
Weblog: The n-Category Café
Excerpt: More on unity probability theory and logic
Tracked: October 9, 2007 9:15 AM
Read the post What Can Category Theory Do For Philosophy?
Weblog: The n-Category Café
Excerpt: A possible philosophical meeting on category theory
Tracked: December 6, 2012 9:56 AM
Read the post Jual Beli Grosir Supplier Toko Reseller Dropship Distributor Online Murah Terbaru 2014 Termurah di Batam Jakarta
Weblog: Jual Beli Grosir Supplier Toko Reseller Dropship Distributor Online Murah Terbaru 2014 Termurah di Batam Jakarta
Excerpt: [...]Nice blog here! Also your website a lot up fast![...]
Tracked: May 13, 2014 9:00 AM

Post a New Comment