Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

June 6, 2021

Optimal Transport and Enriched Categories I

Posted by Simon Willerton

Last year, in an ACT@UCR talk, I spoke about the Fenchel–Legendre transform from a category theoretic perspective and I showed how convex functions arise as a ‘profunctor nucleus’ in the context of categories enriched over the extended real numbers [,][-\infty, \infty]. At the end of the talk I gave four other examples of things which arise as profunctor nuclei, the final one of which was I put in at the last minute and labelled as “tentative”. John Baez took up the scent and asked me to explain why it was “tentative”, the answer was because I hadn’t thought about it for a while. I decided to write it up here at the Café, but the intervening year has had me concentrate on keeping our department running during the you-know-what, so this has been gestating a while!

Anyway in this series of posts I want to explain how aspects of optimal transport problems can be thought of in terms of enriched category theory, profunctors and related constructions. The genesis of this was a conversation following a comment of mine to Mike Shulman’s classic post “Equipments”, in particular Tobias Fritz pointing me to Cédric Villani’s book “Optimal transport: old and new”.

In this first first post I want to state the optimal transport problem (in the finite, discrete setting) and then describe the dual problem. I’ll end with a little digression on Kantorovich and a plug for the book Red Plenty by Francis Spufford.

The optimal transport problem

Suppose that you want to transport material from ss suppliers S 1,,,S sS_1, ,\dots, S_s to rr receivers R 1,R rR_1,\dots R_r. You might imagine that this arises when you have a collective of bakeries and cafés, with ss bakeries supplying loaves of bread to rr cafés. Suppose that the cost of moving one unit of material (eg. one load of bread) from supplier S iS_i to receiver R jR_j is k ijR 0k_{i j}\in \R_{\ge 0}. Suppose also that the supply available at supplier S iS_i is σ i\sigma_i and the demand at receiver R jR_j is ρ j\rho_j.

Example. You can consider the following example. There are three bakeries - Fletcher’s, Gerry’s and Depot - in the North of the city supplying three cafés - Cawa, Tamper and Rude Shipyard - in the South of the city.


The supply available at a bakery is written above the name and the demand at a café is written below the name.

We will suppose that the cost matrix {k ij}\{k_{i j}\}, in pence, for transporting one loaf of bread is as follows.

 CawaTamperRude Shipyard

The supply and demand, in loaves, can be represented in the following way.

 CawaTamperRude ShipyardSupply
Fletcher's   350
Depot   200
Gerry's   100

A transport plan is a decision on how many units of material to ship from each supplier to each receiver such that demand is met and supply is not exceeded. In other words it is a choice α ijR 0\alpha_{i j}\in\R_{\ge 0} for each i,ji,j such that iα ijρ j\sum_{i}\alpha_{i j} \ge \rho_j and jα ijσ i\sum_{j}\alpha_{i j} \le \sigma_i. The cost of a transport plan is the quantity ijk ijα ij\sum_{i j}k_{i j}\alpha_{i j}.

An optimal transport plan is a transport plan which minimizes the total costs, ie. which minimizes ijk ijα ij\sum_{i j}k_{i j}\alpha_{i j} over all transport plans. The optimal cost is the cost of an optimal plan.

It should be reasonably obvious that in an optimal transport plan the demand constraint will be an equality, iα ij=ρ j\sum_{i}\alpha_{i j} = \rho_j, as, if the demand is exceeded at one of the recievers, say R j 0R_{j_0} then you can reduce the amount of material going to R j 0R_{j_0} and hence the transport cost, whilst still satisfying the demand and supply constraints. We will assume that the total demand equals the total supply.

Example. In the cafés and bakeries example you should be able to convince yourself that the following is an optimal transport plan.

CawaTamper Rude Shipyard

The total cost is then 21,300. You’ll see below that this is indeed optimal.

In the realm of linear programming problems, where such optimal transport problems live, each problem has an associated ‘dual’ problem. It is the dual of the transport problem that will be of interest to us in this series of posts. I will go into the general construction of dual problems next time and here will just state the dual problem in the optimal transport case.

The dual optimal transport problem

Statement of the dual problem

For given transport cost, demand and supply we have seen that the transport problem is to find a transport plan which minimizes the total cost. For abstract reasons (linear programming duality), which I’ll go into in the next post, there is an associated dual optimal transport problem which is to find a ‘optimal dual transport plan’ which maximizes total ‘revenue’. I’ll state what an optimal dual transport plan is and then give an intuitive way of viewing this dual problem.

Given transport cost {k ij}\{k_{i j}\}, demand {ρ i} i=1 r\{\rho_i\}_{i=1}^r and supply {σ j} j=1 s\{\sigma_j\}_{j=1}^s as above, a dual transport plan is choice of ‘prices’ {u i} i=1 r\{u_i\}_{i=1}^r and {v j} j=1 s\{v_j\}_{j=1}^s such that u i,v jR 0u_i,v_j\in \R_{\ge 0} for all i,ji,j and u iv jk iju_i-v_j\le k_{i j} for all i,ji,j. The revenue of a dual transport plan is the quantity i=1 ru iρ i j=1 sv jσ j\sum_{i=1}^r u_i \rho_i- \sum_{j=1}^s v_j\sigma_j.

An optimal dual transport plan is a dual transport plan which maximizes the total revenue, ie. which maximizes i=1 ru iρ i j=1 sv jσ j\sum_{i=1}^r u_i \rho_i- \sum_{j=1}^s v_j\sigma_j over all dual transport plans. The optimal revenue is the revenue of an optimal dual plan.

The general reason for interest in the dual problem is that by the easy part of linear programming duality (which I’ll go into next time) for a given transport context the revenue of any dual transport plan is less than or equal to the cost of any transport plan. In particular, the revenue of any dual transport plan gives a lower bound for the optimal cost of the problem. Moreover, by the hard part of linear programming duality, this is a strict bound in the following sense.

Linear programming duality. The optimal revenue of the dual transport problem is equal to the optimal cost of the original transport problem.

The fable

The following story is often told as a way to think about the dual problem in this case. I think that as an analogy it can be helpful, but it has its limits.

Suppose that an external transportation firm offers to take over the transportation of the material. They have an unrelated transport mechanism and their costs are unrelated to the costs of the usual transportation method. They have a different pricing structure, they do not charge depending on the route taken by the material, rather for each supplier they will pay a fixed price per unit and for each receiver they will charge a fixed price per unit. So for each unit from supplier S iS_i they will pay the company u iu_i and for each unit they deliver to receiver R jR_j they will charge v jv_j. For this to be competitive or feasible, this needs to be at least as cheap to ship something from supplier S iS_i to receiver R jR_j, in other words, u iv jk iju_i-v_j\le k_{i j} for each ii and jj. Given such a pricing structure or ‘dual plan’, the transport company’s total revenue is i=1 ru iρ i j=1 sv jσ j\sum_{i=1}^r u_i \rho_i- \sum_{j=1}^s v_j\sigma_j. This is the quantity they wish to maximize.

Example. You can check that the following gives a dual transport plan. This means checking that transferring a loaf from a given bakery to a given café is not more expensive than using the collective’s own transport system. For instance, under this plan, when transporting a loaf from Depot Bakery to the Rude Shipyard, the bakery gets 22p for the loaf and the café has to pay 47p for it. This means it costs the collective 25p, which is not more expensive than the previous 30p cost.

 CawaTamperRude ShipyardPrice
Fletcher's   0
Depot   22
Gerry's   25

Observe that adding a constant (eg. 100) to all of the prices will still give a dual plan and won’t alter the revenue. Note that, as you might expect, bread from the ‘furthest’ bakery is worth least and the ‘furthest’ café pays most for its bread.

You can also check that this leads to a revenue of 21,300, which is the same as the cost we found for the original transport problem. This means that we know we have an optimum for both problems!

Next time I’ll explain a little more about linear programming duality. After that I’ll come back to further constraints satisfied by optimal dual transport plans and link that in with enriched category theory. But for now I’ll end with a digression on one of the founders of linear programming duality.

A digression on Kantorovich

In 1975 Leonid Vitaliyevich Kantorovich, together with Tjalling Koopmans, was awarded the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (also known, incorrectly, as the Nobel Prize in Economics) for his work on linear programming. Kantorovich was born in Saint Petersburg in 1910 and became a brilliant mathematician at a young age, specializing particularly in functional analysis. Around 1939 he was asked by a plywood factory to help with optimizing the use of their various cutting machines in order to reduce the amount of waste. In the process of looking at this problem he discovered linear programming. Over the next few years he realised how this method could be used in central Soviet planning.

The Soviet Union had adopted a centralized, collectivized economy in the late 1920s under Stalin. Industry and agriculture were contolled by the state with the productivity levels set by a central organisation called Gosplan. For instance, Gosplan would tell each rubber factory how much rubber to produce every year, so that there would be enough rubber to provide tyres for the number of tractors that it was telling the tractor factories to produce. This is in contrast to free market systems in which productivity is determined locally in factories in response to demand (I’m undoubtably oversimplifying). Clearly, centrally fixing the productivity levels for all industries across a large country is a massive task and Kantorovich soon realised how linear programming could be used to do this in an optimal fashion.

Unfortunately, there was a significant political snag. I described above how when you consider the linear programming dual you are led to searching for ‘prices’ which maximize ‘revenue’. These prices are in some sense a mathematical artifact that don’t necessarily exist in the real world - the transportation company was just part of a fable to help understand the computation. However, these ‘shadow prices’ were seen as running counter to the dominant interpretation of Marxist philosophy of the time and Kantorovich’s work on applying optimization to central planning was heavily suppressed under the Stalinist regime. It wasn’t until 1959, after the death of Stalin and following Krushev’s de-Stalinization, that Kantorovich’s book “The Best Use of Economic Resources” finally appeared. In 1960 he moved to the newly-formed, prestigious Novosibirsk State University in Siberia, having been invited to create the Department of Computational Mathematics.

Kantorovich’s time in Novosibirsk is touched on in a rather splendid book about central Soviet planning

which was recommended to me by Martin Hyland. The book is a fictionalised account of a certain period of the planned economy of the Soviet Union and features real historical characters, in particular Kantorovich. You can get a flavour of the book in this review at the Guardian; but just get the book and read it. Never would I have suspected that a book on Soviet planned economy would be so gripping: it helps that Spufford can really write.

I hope that Kantorovich will reappear later in this series of posts.

Posted at June 6, 2021 11:42 AM UTC

TrackBack URL for this Entry:

10 Comments & 0 Trackbacks

Re: Optimal Transport and Enriched Categories I

Nice post; thanks!

I’m trying to get my head round linear programming duality. Let me tell you the very simple thing that’s already in my head, and then I want to (i) ask if it’s correct, and (ii) ask you to help me bridge the gap between the simple thing and what you wrote in the post.

The simple thing is about the isoperimetric inequality. For concreteness, let’s do it in the plane. The usual way to state it is: among all closed curves with a given perimeter, the one that maximizes the enclosed area is the circle. But an equivalent formulation is: among all closed curves enclosing a given area, the one that minimizes perimeter is the circle.

I’ve long heard optimization people talk about duality and lazily assumed that this was an example, without ever bothering to look it up. So my question (i) is: is it? Now that I type these words, I start suspecting that it might not be about duality in linear programming, given that areas don’t scale linearly. But perhaps it’s a dual optimization problem in some more general sense.

And then my question (ii) would be how to fit it into your framework.

Posted by: Tom Leinster on June 6, 2021 8:04 PM | Permalink | Reply to this

Re: Optimal Transport and Enriched Categories I

Hi Tom, the situation you’re discussing is a little beyond what I’ve currently got settled in my head.

The situation I’m discussing here is the finite dimensional linear case (at least for in the next few posts I’ve thought about!). I’ll go into more detail next time, but typically you want to minimize a linear function i=1 nc ix i\sum_{i=1}^n c_i x_i for x 0 nx\in \mathbb{R}_{\ge 0}^{n} subject to a set of mm inequality constraints i=1 nA jix ib j\sum_{i=1}^n A_{j i} x_i\ge b_j for j=1,,mj=1,\dots,m. So you have nn variables and mm constraints. The dual problem is a maximization problem with mm variables and nn constraints, namely maximize j=1 my jb j\sum_{j=1}^m y_j b_j over y 0 my\in \mathbb{R}_{\ge 0}^m subject to j=1 my jA jic i\sum_{j=1}^m y_j A_{j i} \le c_i for i=1,,ni=1,\dots,n.

In the non-linear, finite dimensional case you might have a function f: nf\colon \mathbb{R}^n \to \mathbb{R} of nn variables which you wish to minimize subject to mm constraints g j(x)0g_j(x)\ge 0 for j=1,,mj=1,\dots,m. The constraint functions give us a pairing between n\mathbb{R}^n and m\mathbb{R}^m, namely (x,y) j=1 my jg j(x)=y Tg(x)(x,y)\mapsto \sum_{j=1}^{m}y_j g_j(x)=y^\mathrm{T}g(x). This pairing is something which you might wish to think of as profunctor coming from the functor g: n mg\colon \mathbb{R}^n \to \mathbb{R}^m. In the Legendre-Fenchel vein, this pairing/profunctor gives rise to a transform from functions on n\mathbb{R}^n to functions on m\mathbb{R}^m, namely f(yinf x(f(x)+y Tg(x)))f\mapsto (y\mapsto \inf_x(f(x) + y^\mathrm{T}g(x))). And it’s this transformation (or conjugate) of ff that you wish to maximize over 0 m\mathbb{R}_{\ge 0}^m for the dual problem. I certainly haven’t got my head round lots of things here and to do that is on my to do list. I suspect I need to spend some time with Rockafellar’s Conjugate duality and optimization.

Interestingly, this duality is not the focus of this series of posts. Rather, we’re going to see a duality, of a similar flavour, within the dual problem in the case of optimal transport.

You ask about the isoperimetric problem and I don’t know if that fits into this framework. It’s certainly an infinite dimensional problem, but I believe what I’ve mentioned generalizes, with some care, obviously, to certain infinite dimensional problems.

One example of an infinite dimensional problem that I’m aware of is optimal transport where you don’t have finite sets of suppliers and receivers with supply and demand functions, but you have metric spaces of suppliers and receivers with supply and demand distributions. (Imagine a region with a certain distribution of sand on it and you wish to move this to a certain distribution of sand over another region using the least effort.) This is a linear problem and the linear programming duality here is known as Kantorovich duality. My student Callum Reader has been thinking in this direction and touched on this tangentially in his ACT 2020 talk last year when he spoke about the Kantorovich monad which is a version of the Giry monad for metric spaces.

Posted by: Simon Willerton on June 7, 2021 9:02 AM | Permalink | Reply to this

Re: Optimal Transport and Enriched Categories I

Thanks, that’s helpful. I liked the bakery/cafe example (which I guess was supplied to you by the nn-Category Bakery), but in truth, I probably prefer thinking about vector spaces to thinking about Sheffield bakeries and cafes. Though if thinking about vector spaces is in competition with visiting Sheffield bakeries and cafes, that’s a tougher one to call.

Posted by: Tom Leinster on June 7, 2021 4:55 PM | Permalink | Reply to this

Re: Optimal Transport and Enriched Categories I

The idea of using cafés and bakeries in an example was actually taken from Chapter 3 of Villani’s book, mainly because it fits nicely here. Villani describes Gaspard Monge as one of the founding fathers of optimal transport (before going on to Kantorovich), he notes that on Rue Monge in Paris there’s a bakery called “Le Boulanger de Monge” and on the basis of that uses a café and bakery example. His example is rather more abstract than mine.

Aside: as Villani notes, Monge also had his work suppressed for a period by the regime of the time.

Posted by: Simon Willerton on June 8, 2021 7:24 PM | Permalink | Reply to this

Re: Optimal Transport and Enriched Categories I

As soon as I read Simon’s comment, there flashed in my mind the passage in Mathematics Made Difficult where the principal character is Gasparde Mange, who rescues her brother the baker M. Boulangiere by an ingenious application of solving cubic equations (p. 137 and following), promising always to consume at least one cubic “variable loaf” each day.

Posted by: Todd Trimble on June 9, 2021 6:42 PM | Permalink | Reply to this

Re: Optimal Transport and Enriched Categories I

I don’t think the infinite-dimensionality is the issue here. We could ask the same questions with n-gons instead of curves. But the area and perimeter measures aren’t linear (in, say, the positions). Maybe you could change variables to make one of them linear but I’m pretty certain you couldn’t get both.

Posted by: Allen Knutson on June 8, 2021 3:32 PM | Permalink | Reply to this

Re: Optimal Transport and Enriched Categories I

Interesting how one side of a duality, ‘shadow prices’, falls foul of the prevailing ideology.

I remember a logician telling me that in Russia in the 30s they’d disguise their work as algebra. This is mentioned also in Adrian Mathias’s Logic and Terror

by the early thirties, formal logic was well under attack, and Soviet logicians such as Markov had to give their papers titles suggesting that the content was not logic but group theory or other harmless concern.

Criticism was expressed in claims such as:

Formal logic is always a most trustworthy weapon in the hands of the predominant exploiting classes, a bastion of religion and obscurantism.


the laws of formal logic are opposed to the law of dialectical logic. Formal logic is empty, poor, abstract, for the laws and categories which it sets up do not correspond to objective reality.

Funny to see Mathias’s article end

the school of Lawvere attempts to achieve a mathematical dialectic with its doctrines of topos theory. Comically, the logic underlying much of topos theory is often intuitionistic: a wholly unexpected link between the self-absorption that produced intuitionism and the tyranny that promoted dialectic.

Posted by: David Corfield on June 8, 2021 8:53 AM | Permalink | Reply to this

Re: Optimal Transport and Enriched Categories I

Funny to see Mathias’s article end

the school of Lawvere attempts to achieve a mathematical dialectic with its doctrines of topos theory. Comically, the logic underlying much of topos theory is often intuitionistic: a wholly unexpected link between the self-absorption that produced intuitionism and the tyranny that promoted dialectic.

Funny, strange, sad. The article as a whole is very interesting, but this sneer comes as if out of nowhere and (probably no one here needs convincing) is not a proper argument against the real advances of topos theory. It reads as pure polemic and rhetoric – not the first time I’ve seen such a thing from Mathias.

Posted by: Todd Trimble on June 9, 2021 1:46 AM | Permalink | Reply to this

Re: Optimal Transport and Enriched Categories I

Little did he think that a mere “unexpected link” would grow into cohesive homotopy type theory!

Posted by: David Corfield on June 9, 2021 7:44 AM | Permalink | Reply to this

Re: Optimal Transport and Enriched Categories I

Looks like a nice introduction to a fun series of posts! I wish I had as much time for the café as in the past.

I think there are ways to think of the dual problem in terms of Lipschitz functions and enriched profunctors, and I guess you’re going to tell us about this in future installments. I’m very curious to see how strong the enriched category approach is going to be: can you use it to prove that the dual problem has the same optimal value, or do you merely get weak duality in the form of an inequality? (No need to give it away yet if you want to keep the suspense.)

Posted by: Tobias Fritz on June 9, 2021 3:41 AM | Permalink | Reply to this

Post a New Comment