Category Theory in Machine Learning

September 5, 2007

Category Theory in Machine Learning

Posted by David Corfield

I was asked recently by someone for my opinion on the possibility that category theory might prove useful in machine learning. First of all, I wouldn’t want to give the impression that there are signs of any imminent breakthrough. But let me open a thread by suggesting one or two places where categories may come into play.

For other areas of computer science the task would be easier. Category theory features prominently in theoretical computer science as described in books such as Barr and Wells’ Category Theory for Computing Science. Then there’s Johnson and Rosebrugh’s work on databases.

As for machine learning itself, perhaps one of the most promising channels is through probability theory. One advantage of working with the Bayesian approach to machine learning is that it brings with it what I take to be more beautiful mathematics. Take a look at this paper on statistical learning theory. It belongs to the side of the cultural divide where category theory doesn’t flourish. If, on the other hand, we encounter mathematics of the culture I prefer, it is not unlikely that category theory will find its uses.

In a couple of posts (I and II) I discussed a construction of probability theory in terms of a monad. It struck me there that the natural inclination of the Bayesian to think about distributions over distributions fits this construction well. For example, the hot topic of Dirichlet processes are distributions over distributions.

Graphical models, which include directed graphs, are another hot topic. If we remember that a category is a kind of directed graph, perhaps something can be done here. Graphical models can also be blended with probabilities. I once tried to think of Bayesian networks, a result of this blend, as forming a symmetric monoidal category.

Another dimension to spaces of probability distributions is that they can be studied by differential geometry in a field known as information geometry. In this list there are some references to the use of information geometry in machine learning. As a distant propect, perhaps category theoretic aspects to differential geometry could come to play a role.

If Lemm were right,

Statistical field theories, which encompass quantum mechanics and quantum field theory in their Euclidean formulation, are technically similar to a nonparametric Bayesian approach,

and we’re right here about category theory and such field theories, perhaps something interesting could happen.

Another speculative thought was to tie the kernels appearing in machine learning to John’s Tale of Groupoidification. Perhaps this might be done to encode invariances more intelligently. Currently, RBF kernels get used a lot, even though they don’t encode your background knowledge well. For example, two images varying just in one pixel are close in the space of images, so if one is classified as a ‘3’, there is a high probability that the other is too. But shift an image two pixels to the right and the images are far apart in the space of images, so the kernel is agnostic about what the label of one images means for the other. One needs to encode this invariance in the kernel.

Two people who very much believe that the kernels used in machine learning are not the right ones for the tasks we need to perform in the world are Lecun and Bengio. The problem is with the shallowness of the architecture, they say here. Instead they advocated neural nets with deep architectures. These have that ‘catastrophic’ behaviour that small changes in the weights may lead to very different performance.

A neural net architecture is a mapping from space of weights to a certain space of functions, but this is not a 1-1 mapping. Many weight settings may correspond to the same function. Singularity theory can be used to study some of the properties, as Watanabe does. That’s the kind of mathematics where category theory should show.

There is also some work directly applying category theory to neural networks, such as here, but I haven’t followed it.

All in all, you can see that there are no sure fire bets. I very much doubt we’re as far advanced as a mathematical physicist wondering about categories in their field in 1993. If anyone else has some reasons to be optimistic, do let us know.

Posted at September 5, 2007 1:32 PM UTC

TrackBack URL for this Entry: https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/1418

35 Comments & 1 Trackback

Re: Category Theory in Machine Learning

Hi David.
This might be a somewhat silly question, but have people done much work on using the technology of type theory with machine learning? That’s generally where a lot of the uses of categories come in to CS, at least the kind of CS I seem to read about.

Also, I had never heard of information geometry before. The idea of the metric being related to the “amount of information” between two configurations is fun. I’m surprised I haven’t seen it come up more often in quantum mechanics.

Posted by: Creighton Hogg on September 5, 2007 3:34 PM | Permalink

Re: Category Theory in Machine Learning

Not at all a silly question. As far as I’m aware there’s no work done here, but it would have the advantage of being able to tap into an enormous literature. People do try to get machines to parse sentences grammatically. Perhaps that would be a place to start.

As for information geometry, there’s even a quantum version, as discussed at this Fields Institute workshop. One of the people with whom John dined yesterday evening, Ray Streater, has contributed to the field.

Posted by: David Corfield on September 5, 2007 8:38 PM | Permalink

Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

On a somewhat related note:

As far as I am aware, Nils Baas (usually mentioned here in relation to his work on classifying spaces of 2-vector bundles) is seriously thinking about applying $n$ -category theory – in the context of his framework of hyperstructures – to biology.

I hear that Baas’s talk at Categories in Geometry and Physics will be about this.

There is

N. Baas, C. Emmeche, On Emergence and Explanation

but I have yet to read it. The basic idea seems to be this:

We think that the present framework is quite useful for analysing the nature of emergence - in particular the dependence on observational mechanisms. In order to put this into a more mathematical framework category theory is very useful. We let the systems be represented as objects in a category and the interactions as morphisms. A complex system with interactions is then represented by a diagram. Since the morphisms are represented by arrows, this may be viewed as a process oriented representation.

Posted by: Urs Schreiber on September 5, 2007 9:03 PM | Permalink

Re: Category Theory in Machine Learning

If I remember rightly Andrée Ehresmann did look at category theory with the idea of using it for modelling parts of machine learning.

Posted by: Tim Porter on September 6, 2007 5:58 PM | Permalink

Re: Category Theory in Machine Learning

Yes, she works on Memory Evolutive Systems. Nils Baas has written a paper with her and Jean-Paul Vanbremeersch.

Posted by: David Corfield on September 6, 2007 7:51 PM | Permalink

Re: Category Theory in Machine Learning

“immanent breakthrough”

Either a typo, or a neat play on words.

Posted by: Dan Piponi on September 5, 2007 11:38 PM | Permalink

Re: Category Theory in Machine Learning

Typo. It certainly wasn’t a Deleuzian trope.

I’ll fix it.

Posted by: David Corfield on September 6, 2007 10:32 AM | Permalink

Re: Category Theory in Machine Learning

I like the idea that all forms of generalisation are adjunctions. It makes some kind of sense to me - the idea being that an adjunction gives what is in some sense the ‘best’ way to represent an object in one category using an object in another category. Much of machine learning could loosely be thought of in this way.

Posted by: Dan Piponi on September 5, 2007 11:45 PM | Permalink

Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

Dan Piponi wrote:

the idea being that an adjunction gives what is in some sense the ‘best’ way to represent an object in one category using an object in another category.

We can make this very precise by saying that the search for a left or right adjoint to a functor $F: C \to D$ is the search for the ‘best approximation’, from either above or below, to the — possibly nonexistent! — inverse of $F$ .

For example, if $C$ and $D$ are partially ordered sets, $F$ has a right adjoint if we can find for each $d \in D$ a greatest possible $c \in C$ with $F(c) \le d$ . This is what I mean by a ‘best approximation from below’ to the possibly nonexistent object $F^{-1}(d)$ . We then define the right adjoint of $F$ to be the $G : D \to C$ that picks out this best possible $c$ for each $d \in D$ .

The whole business is a bit subtler for categories that aren’t posets, but it’s still quite standard stuff — except for the realization that an adjoint functor is ‘the best possible reply to an impossible demand’.

I’m not sure I believe analogy is best modeled in terms of adjoint functors. I’ve always thought of analogy as being like ‘proportion’ — that is, equations of the form

$\frac{A}{B} = \frac{C}{X} \qquad (solve for X)$

According to Wikipedia:

αναλογια (analogia) originally meant ‘proportionality’, in the mathematical sense, and it was indeed sometimes translated to Latin as proportio. From there analogy was understood as identity of relation between any two ordered pairs, whether of mathematical nature or not.

This is all about finding one side of a commutative square given the other three. I don’t see what this has to do with adjoint functors — in any precise way, that is.

Nonetheless, there’s some relation to adjoint functors, since you could still say every analogy is the best possible reply to an impossible demand — an impossible demand that would be possible if the necessary inverses existed.

For example:

poetry:music::the Iliad:???

The demand here is: if poetry were music, the Iliad would be what?

If we had inverses at our disposal, we could write this as a proportion:

$\frac{poetry}{music} = \frac{the Iliad}{x}$

Then, with luck, we could solve for $x$ :

$x = {the Iliad} \frac{music}{poetry}$

However, the necessary inverses don’t really exist, so we need to use analogy to guess an answer to this impossible problem.

Mathematical puzzle: what do proportions like A:B::C:X, or commutative squares, have to do with adjoint functors? Presumably a good answer would take everything I just said and get it to fit together better!

Posted by: John Baez on September 6, 2007 2:26 AM | Permalink

Re: Category Theory in Machine Learning

I think spans or pushouts might be the way to model analogy.

Posted by: David Corfield on September 6, 2007 11:15 AM | Permalink

Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

I think spans or pushouts might be the way to model analogy.

I have seen this statement before. But am not sure I entirely follow what it is supposed to mean in detail. But it sounds very interesting.

Can you give me a simple example, which illustrates this idea that an analogy is a span?

I really like the general idea that it is possible to understand informal everyday concepts like “analogy” in a precise manner. This is similar to how the informal everyday concepts “stuff”, “structure”, “property” are nicely formalized by the forgetful functor game. In that case, I find a couple of examples very helpful to understand how it works.

Same probably here. So: does anyone have a nice example for understanding the statement that “an analogy is a span”?

Posted by: Urs Schreiber on September 6, 2007 4:12 PM | Permalink

Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

Childhood is to old age as morning is to evening.

Represented by a span in some category of linearly ordered sets from $\bullet \to \bullet$ to a life and a day. Or something like that.

Posted by: David Corfield on September 6, 2007 8:13 PM | Permalink

Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

Ah, thanks, I see.

The (faithful, I guess) image of category $C$ in category $D$ is a “ $C$ -situation in $D$ ”, or a “ $C$ -state of affairs in $D$ ”.

and all $C$ -situations are analagous.

Okay, great. Thanks.

Posted by: Urs Schreiber on September 6, 2007 8:20 PM | Permalink

Human paths in Laerning Theory, development; Re: Category Theory in Machine Learning

‘The child is father to the man’. Hopkins, Gerard Manley. 1918. Poems.

Hopkins is commenting on Wordsworth’s original sentiment, namely a poetic statement of an otherwise commonplace observation: what you are, and feel, and think, learn, and believe as a child creates a path you will take into adulthood.

Superficially, this is an adjoint of the conventional chronology “The Man is father of the child” but does a kind of recategorification in replacing time spans with paths, and the implication that many potential paths exist. I’m not saying that these poets predicted feynman diagrams or holonomy, but I did want to indicate that metaphor is a higher level of n-category than naive models suggest.

Posted by: Jonathan Vos Post on September 6, 2007 8:25 PM | Permalink

Re: Category Theory in Machine Learning

Oops, I see John already made the point I was making.

Posted by: Paul Smith on September 6, 2007 2:27 PM | Permalink

Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

Interesting stuff. I’ve been pondering categorical ways of thinking about analogies for a while. First of all I think that the example JB gives is a slightly degenerate example of an analogy: analogies of this form don’t have much structure to them. Anyway, let me solve the analogy problem.

poetry:music::the Iliad:???

The Illiad relates to poetry in a very specific way: it is a prototypical, famous example of a poem. We can think of this as defining a map “take prototypical example of”, and apply it to music. Thus Instead of writing

$x = the Iliad \frac{music}{poetry}$ I would write $x = (\frac{the Iliad}{poetry})music = PrtEx(music) = Beethoven's Symphony No. 9$

Where PrtEx is shorthand for the “take prototypical example” function.

The reason I say that this is a slightly degenerate example of an analogy is that it has almost no structure to it, and is based on knowing what “the Iliad” is. Most useful analogies (i.e. ones that people will actually use when they are trying to explain something) have a lot more structure. For example, suppose you are trying to explain the concept of choosing a password to encrypt a file to your grandmother. She doesn’t understand what encryption is, so you make an analogy with a concept she does understand: the concept of locking a valuable item in a safe. In order to make this analogy to her, you have to map each concept in the “encryption” scenario to an object in the “safe” scenario. Thus:

encryption <—–> safe

password <—–> key for safe

plaintext <—–> valuable item

to encrypt <—–> to lock the safe

to decrypt <—–> to unlock the safe

At first glance it looks like an analogy is simply a bijection of sets. In fact this not the case - to start with these are very structured sets, and most of the work that the analogy does is done by the way the concepts on the “known” side of the analogy link into other concepts. For example, the concept of locking a safe “interfaces” with the concept of other people wanting to steal things. Your granny will know that she should not tell anyone what the password is because she already knows that if she had a safe then she would be careful who she gave the key to. She transports her existing understanding across the analogy.

In summary, an analogy is a way of mapping structure from some “known” situation to an “unknown” situation; to put it another way, an analogy is a “homomorphism of concepts”. This is a lot like what functors do in category theory, which was what originally motivated me to think about formalizing analogy with category theory.

My personal theory is that the above idea can be used in AI as follows: a fully fledged concept can be built up as a sequence of partial analogies to some basic set of concepts. In fact Lakoff and Johnson have written an interesting book called Metaphors We Live By which outlines how a few important basic concepts are analogically related to all sorts of other concepts.

Posted by: Roko on September 17, 2007 9:18 PM | Permalink

Re: Category Theory in Machine Learning

The analogy fits to the degree it matches closely with the primary categorizing descriptions/words.

Joseph Campbell describes the archetypal “Hero’s Journey”. The Iliad describes the journey of a king and return to his kingdom, so it belongs to a Hero’s Journey. Another example is “The Lord of the Rings” particularly the third book “Return Of The King”. I think the epoch of music that matches with ancient Greece are Classical Music offerings often using Latin.

I think the kind of music which matches the Iliad archetype is a mirror canon, an inversion of the music found earlier in the piece, which can be found in GEB by Hofstadter in his discussion of analogy and self-reference. Bach prepared “The Musical Offering” for King Frederick the II, which contains fugues and canons. They could be imagined as corresponding to the adventures of Ulysses. So my answer is that
x = The Musical Offering composed by Bach.

This type of analogical reasoning is pretty hard to capture by computer program. I favor Lakoff’s theory of embodied mathematics because it incorporates a sense of primal origin, archetypal thinking, as does Memory Evolutive Systems, which features Category Theory, that others have mentioned.

Posted by: Stephen Harris on October 20, 2011 8:53 AM | Permalink

Re: Category Theory in Machine Learning

Iliad and Odyssey are both epics and part of the “Western canon” earliest significant Western literature. x = Das Rheingold part 1 of 4 operas composed by Richard Wagner.

Posted by: Stephen Harris on October 20, 2011 10:41 AM | Permalink

Re: Category Theory in Machine Learning

Dan’s remark reminds me of one perspective I have on adjunctions–that they are related to approximations. The simplest example illustrating this is to take the embedding of the integers in the reals but view this as a functor from one category to another:
both sets are partially ordered sets and a poset is viewed as a category by having an arrow x—>y whenever y is greater than or equal to x. The inclusion functor in this case has a left and right adjoint, and these are the ceiling and floor functions. Now think of the ceiling and floor functions as approximating a real number by an integer.

There is more to be said but that’s probably enough for now.

Posted by: Paul Smith on September 6, 2007 2:24 PM | Permalink

Re: Category Theory in Machine Learning

I had some ideas about this about the time of my book with Cordier:
Shape Theory: Categorical Methods of Approximation,
in 1989.
There has been a little written about it by myself and others but the problem of incorporating Kernel type constructions is still a hard one as far as I can see.

It may help to look at some of the work recorded in the Dagstuhl seminars on representation of spaces:
http://drops.dagstuhl.de/portals/index.php?semnr=04351
Some of this is fairly standard TCS but the use for instance by formal concept analysis of lattice theoretic stuff is serious artificial intelligence.

Posted by: Tim Porter on September 6, 2007 12:53 PM | Permalink

Re: Category Theory in Machine Learning

For the record, we should also note attempts to use category theory in cognitive psychology:

The Logical Foundations of Cognition edited by John Macnamara and Gonzalo E. Reyes, Oxford University Press, 1994.

Language, Logic, and Concepts edited by Ray Jackendoff, Paul Bloom and Karen Wynn, MIT Press, 1999.

Lawvere contributes to both.

Posted by: David Corfield on September 10, 2007 3:03 PM | Permalink

Category Theory in Evolutionary Computation Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

Though it’s not machine learning, evolutionary computation, and particularly coevolutionary computation, has several places where category theory could contribute quite a lot. I’ve formulated most of my own ideas about coevolution in terms of preorders, many of which could conceivably and usefully be replaced by categories. Besides that, there are places where I’ve seen what amount to functors and sheaves. I’m sorry about the long comment, but I wanted to give some setup and intuition.

A typical formulation of a coevolutionary algorithm starts with an interactive domain. You can view one of these as a function $p:S\times T\rightarrow R$ . When an element of $S$ (candidate solution) interacts with an eleemnt of $T$ (test), it receives a result from the preorder $R$ . The order on $R$ allows results to be compared; higher values are better for the candidate solution. Payoff matrices as used in game theory can be written in this way, though there is no need for $R$ to contain numbers. $p$ might encode the outcome of a game of chess, and $p(me,you)$ might be $lose$ , where $lose \lt draw \lt win$ (it might be useful to encode why I lost, and make $R$ a full-blown category). Since I brought up chess, I should say that games make up a good fraction of example applications, so I’m going to lapse into using words like “strategy” and “play” when to be more general I should be saying “solution” and “interact.”

One of the most important observations here is that in most games of interest, no single element of $S$ actually suffices as a “solution.” Rather, some mixture (distribution – think Nash equilibrium strategy), bag, or other structure built from elements of $S$ usually works better (I should say that I’m ignoring $T$ for now for simplicity’s sake…but there is a lot to say about it). In rock-paper-scissors, for instance, none of the “primitive” or “atomic” strategies rock, paper, or scissors, is a very good one to play; picking one of them at random with equal probability at each interaction is usually a better strategy. Well, that is just a probability distribution over $S=\{rock,paper,scissors\}$ with all choices equiprobable. The collection of all probability distributions over $S$ is what we’re really searching in this case.

So while most algorithms move through elements of $S$ (and $T$ ), what we really seek is an element of some structure built from $S$ . The structures, like mixtures, are almost always free structures built from $S$ . They’re often called configurations, so we might as well write $CS$ for configurations built from $S$ and recognize $C$ as a functor (it’s not completely clear to me whether $C$ should be covariant or contravariant, but in examples it seems the most natural thing to do is treat it as contravariant; there are usually restrictions or “degeneracies” which make more sense than inclusions/emebeddings/…). Calling it “free” suggests an adjoint, and indeed there’s often a forgetful functor lying around too.

The state of an algorithm at time $t$ is usually a pair of subsets $S_t\subset S$ and $T_t\subset T$ (in fact, “populations” in coevolutionary algorithms are usually multisets/bags or distributions, so think of $S_t$ and $T_t$ as supports for simplicity). As noted above, we draw potential solutions from some structure of configurations $CS$ . However, what we actually have on hand when the algorithm is running isn’t all of $S$ , but $S_t$ . Thus, the structures actually available at time $t$ are those in $CS_t$ , so that if we stopped the algorithm and asked it for the best thing it had found, it’d have to give us its best guess from $CS_t$ . It’s as if we know about rock and paper but haven’t discovered scissors yet. From the local perspective afforded by the state $\{rock,paper\}$ , paper would seem like the best strategy to play, even though from our global perspective (which the algorithm does not have) we know this is not the case. This local/global distinction smells like a sheaf to me, though I’ve never been able to formulate this as a sheaf in a useful way. But the important question is: are there generic statements which can be made about whether a particular algorithm, when progressing from $S_t$ to $S_{t+1}$ to … is getting “closer to” or creating a better and better approximation to the best solution in $CS$ (where its guess are are drawn from $CS_t$ , $CS_{t+1}$ , …)?

One more word, about $T$ . Currying $p$ leads to a function $\lambda_S:T\rightarrow R^S$ via which we recognize that elements of $T$ name functions $S\rightarrow R$ (which is interesting to think about: you, as a chess player, act as a function from the set of other chess players to the preorder $\{lose\lt draw\lt win\}$ !). Each $t\in T$ induces a preorder on $S$ therefore by pullback of the order on $R$ through $t$ -as-function. Any subset $T^'\subset T$ can be thought of as inducing an order on $S$ (in a variety of ways…) by “integrating” these individual orders from the $t\in T^'$ . Is there some natural way to turn that information into an order (or some other sense of direction) on $CS_t$ ? Can that then be used to identify potential solutions from $CS_t$ which have some hope of being good approximations for the best we could find in $CS$ ?

Posted by: Anthony Bucci on September 23, 2007 7:12 AM | Permalink

Brilliant! Re: Category Theory in Evolutionary Computation Re: Category Theory in Machine Learning

What Anthony Bucci writes above is incomplete but brilliant. It is what I was stumbling about trying to say when I was taking Category Theory at the University of Massachusetts, 1973-1977, and writing papers on molecular cycbernetics (studying actual microbiology under Bruce Levin) and doing extensive research on learning theory and Artificial Intelligence under the guidance of Oliver Selfridge (“father of Machine Perception”, William Kilmer, Ed Riseman, and others, and studying how Chess Masters and Intenational Masters learn Chess (with Danny Kopec, who then became Chess advisor to the University of Edinburgh’s AI group) plus making regular trips to Cambridge MA to speak with Minsky and others at the AI Lab.

Posted by: Jonathan Vos Post on September 23, 2007 8:15 PM | Permalink

Re: Brilliant! Re: Category Theory in Evolutionary Computation Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

Well, I wouldn’t accept “brilliant,” but incomplete is certainly true. Lately I’ve been reading up on the Chu construction, as there’s an obvious connection between functions of form

p:S\times T\rightarrow R

and objects in

\mathrm{Chu}(Set,R)

. The morphisms in

\mathrm{Chu}(Set,R)

are close to, but not quite, the “useful” monotonic maps between the candidate sets. The Chu category should allow sensible comparison between different views

(S',T')

and

(S'',T''

) of “the same” function

p:S\times T\rightarrow R

. It also suggests that the tensor of

S

and

T

, and not their Cartesian product, is the appropriate way of representing the space of possible interactions when the entities involved become more complicated than just elements of a set. Since the tensor of two state spaces seems to be the right way to represent joint systems in many other contexts, it’d be nice if that proved true here as well.

Posted by: Anthony Bucci on October 10, 2007 3:12 PM | Permalink

Re: Re: Re: Your Boat, Gently Down the Stream

Could you guys please stop creating ever longer subject headers for your comments? The universe might explode.

Posted by: John Baez on October 11, 2007 3:31 AM | Permalink

Chu

$MathML-enabled post (click for more details).$

(How’s that, John?)

Interesting: as far as I can recall, this is the first time the Chu construction has ever come up at the Café. (Edit: no, it seems I’ve blabbed about them a little before.)

For those who didn’t know, the Chu construction is a marvelous construction which contains many categorical dualities under one big self-dual umbrella (read: $*$ -autonomous category). Specifically, I have in mind the category of Chu spaces, which embodies a large number of concrete dualities in which the 2-element set appears as a Janusian or ambimorphic object, but the construction is rather more general, as Anthony indicates.

Let’s start with Chu spaces. A Chu space is a pair of sets $(A, B)$ equipped with a function $\langle -, - \rangle: A \times B \to 2$ , which we think of as a ‘pairing’. (Sometimes, again as indicated by Anthony, one thinks of the elements of $A$ as strategies adopted by a Player and the elements of $B$ as ‘co-strategies’ adopted by an Opponent, and the pairing pits a strategy against a co-strategy and spits out an outcome, either 0 or 1, which says who wins.) A morphism from $(A, B)$ to $(X, Y)$ is a pair of functions $f: A \to X$ , $g: Y \to B$ , which are adjoint to one another:

$\langle f(a), y \rangle = \langle a, g(y) \rangle.$

The category of Chu spaces is self-dual: simply map $(A, B)$ to $(B, A)$ [with the obvious induced pairing], and $(f, g): (A, B) \to (X, Y)$ to $(g, f): (Y, X) \to (B, A)$ !

And, it is a very nice fact that many familiar concrete dualities embed (as full subcategories) in Chu spaces, in a duality-preserving manner. For example, there is a concrete duality between sets and complete atomic Boolean algebras: the dual of a set $X$ is a complete atomic Boolean algebrs $Y = hom(X, 2)$ , and the dual of a complete atomic Boolean algebra $Y$ is a set $X = hom(Y, 2)$ [the set of Boolean algebra maps preserving arbitrary infs and sups], and there is in this way a pairing $X \times Y \to 2$ between the underlying sets. Hence we obtain (full) embeddings

$Set \to Chu: X \mapsto (X, hom(X, 2))$

$CABool \to Chu: Y \mapsto (Y, hom(Y, 2))$

and the self-duality transposition of $Chu$ restricts to the concrete duality between $Set$ and $CABool$ . Other familiar dualities, such as between Boolean algebras and Stone spaces, between posets and completely distributive lattices, between finite linear orders and finite intervals, etc., embed fully in $Chu$ in this duality-compatible way.

Moreover this gives a sensible framework in which to speak of, for example, a structure-preserving map from a Stone space to a continuous lattice: just treat them all as Chu spaces! The underlying set of a Chu space structure $(A, B, \langle -, - \rangle)$ would be the first component $A$ , and the underlying set of the dual structure would be $B$ .

And, there’s more: the category of Chu spaces carries a nice closed symmetric monoidal category structure, in fact a star-autonomous category structure. I won’t go into all the details, but as one would imagine, the underlying set of the Chu function space $(X, Y)^{(A, B)}$ is the set of Chu space maps $(A, B) \to (X, Y)$ . Actually, with a little knowledge of star-autonomous categories, one can pretty much figure out how all this structure works, using the fact that the Chu space $(2, 1)$ (with the obvious pairing) is the dualizing object which turns the abstract self-duality of $Chu$ into a concrete one.

More generally, starting with a closed symmetric monoidal category $V$ with pullbacks, together with a chosen object $D$ (in place of 2) to play the role of a “dualizing object”, one can mimic this construction to produce a $*$ -autonomous category $Chu(V, D)$ . (See Barr’s paper below.)

The Chu construction has been something of a cottage industry in the past fifteen years or so. There is an embarrassment of riches to choose from, but the article by Vaughan Pratt gives a nice reader-friendly introduction with applications to game semantics of linear logic. Another early article by Barr (who was Chu’s adviser) with lots of categorical details and constructions (again with a view to linear logic) can be found here. Finally, as long as people are talking about Hopf algebras over at the Secret Blogging Seminar, I might as well mention a 1995 paper by Rick Blute on the connection between Hopf algebra representations and linear logic.

Posted by: Todd Trimble on October 11, 2007 4:11 PM | Permalink

Re: Category Theory in Machine Learning

Well well,

I have thought the idea in 2008 August and read several books how n-category theory and it’s relatives can be transfered into a machine learning model or/and help a machine learning approach in explaining several cognitive processes which machine can not do. N-Cat is very applicable to layered based generative models. However, the problem is efficiency.

Posted by: Hayri Volkan Agun on February 7, 2009 7:55 PM | Permalink

Re: Category Theory in Machine Learning

I’d love for us to find a robust use of n-categories in machine learning. I’ve been very impressed by Geoff Hinton’s work. Could hyperstructures help us with deep belief networks.

Take an hour to watch this video. Not only fascinating, but Hinton’s very funny.

Posted by: David Corfield on February 8, 2009 12:34 PM | Permalink

Re: Category Theory in Machine Learning

Regarding the comments on using type theory in machine learning, the following may be of interest: http://www.cs.cmu.edu/~fp/papers/popl05.pdf. The authors mention the Giry/Lawvere monad.

An explicit attempt to apply category theory to machine learning is
http://www.adjoint-functors.net/Wojtowicz_Fusion2009pp.pdf.

Posted by: Ralph Wojtowicz on December 23, 2009 5:14 AM | Permalink

Re: Category Theory in Machine Learning

Mmmmmm…
Stumbling upon this thread due to the recent comment by Ralph Wojtowicz elicits the weird feeling that category theory is “a solution in search of problems”.

Posted by: J-L Delatre on December 23, 2009 6:09 AM | Permalink

Re: Category Theory in Machine Learning

I think it’s more that there are so many different “educated guesses” about what the next good stepping stone in artificial reasoning/machine learning that there’s little “truly compelling” theories of learning utilising category theory yet.

(There’s also the issue that in many pratical machine learning projects, one ideally wants models that are only as general as absolutely necessary to solve the problem in order to both make learning from relatively restricted training data both pratically tractable but also reducing the model complexity increases the likelihood that a “sharp” model instance will be learned.)

Posted by: bane on December 23, 2009 6:12 PM | Permalink

Re: Category Theory in Machine Learning

there are so many different “educated guesses” about what the next good stepping stone in artificial reasoning/machine learning

My uneducated guess is that the current main stumbling block is in going from the informal to the formal, not only fuzziness but vagueness must be resolved.
Perhaps there is something to find in machine learning about feature extraction, synthesizing the hypothesis space from scratch (well… the raw data actually).
There is no lack of math crunching ready to fire once “the problem is well defined”, it’s the murky part which is still missing not the “high maths”.

Posted by: J-L Delatre on December 23, 2009 7:31 PM | Permalink

Read the post Causality in Machine Learning
Weblog: The n-Category Café
Excerpt: Introducing causality into machine learning
Tracked: November 8, 2021 10:22 AM

Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

A few years on, the field has come quite a way!

Here’s a current preprint:

Bruno Gavranović, Paul Lessard, Andrew Dudzik, Tamara von Glehn, João G. M. Araújo, Petar Veličković, Categorical Deep Learning: An Algebraic Theory of Architectures, (arXiv:2402.15332)

Posted by: David Corfield on February 26, 2024 11:51 AM | Permalink

Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

What might functorial representations offer deep learning interpretability?

Posted by: Doktoral Teknik Elektro on June 3, 2025 1:04 AM | Permalink

Re: Category Theory in Machine Learning

$MathML-enabled post (click for more details).$

I have no idea! I hope someone is thinking about this.

Posted by: John Baez on July 2, 2025 12:04 PM | Permalink

The n-Category Café

Skip to the Main Content

September 5, 2007