### Optimal Transport and Enriched Categories I

#### Posted by Simon Willerton

Last year, in an ACT@UCR talk, I spoke about the Fenchel–Legendre transform from a category theoretic perspective and I showed how convex functions arise as a ‘profunctor nucleus’ in the context of categories enriched over the extended real numbers $[-\infty, \infty]$. At the end of the talk I gave four other examples of things which arise as profunctor nuclei, the final one of which was I put in at the last minute and labelled as “tentative”. John Baez took up the scent and asked me to explain why it was “tentative”, the answer was because I hadn’t thought about it for a while. I decided to write it up here at the Café, but the intervening year has had me concentrate on keeping our department running during the you-know-what, so this has been gestating a while!

Anyway in this series of posts I want to explain how aspects of optimal transport problems can be thought of in terms of enriched category theory, profunctors and related constructions. The genesis of this was a conversation following a comment of mine to Mike Shulman’s classic post “Equipments”, in particular Tobias Fritz pointing me to Cédric Villani’s book “Optimal transport: old and new”.

In this first first post I want to state the optimal transport problem (in the finite, discrete setting) and then describe the dual problem. I’ll end with a little digression on Kantorovich and a plug for the book Red Plenty by Francis Spufford.

### The optimal transport problem

Suppose that you want to transport material from $s$ suppliers $S_1, ,\dots, S_s$ to $r$ receivers $R_1,\dots R_r$. You might imagine that this arises when you have a collective of bakeries and cafés, with $s$ bakeries supplying loaves of bread to $r$ cafés. Suppose that the cost of moving one unit of material (eg. one load of bread) from supplier $S_i$ to receiver $R_j$ is $k_{i j}\in \R_{\ge 0}$. Suppose also that the supply available at supplier $S_i$ is $\sigma_i$ and the demand at receiver $R_j$ is $\rho_j$.

Example.You can consider the following example. There are three bakeries - Fletcher’s, Gerry’s and Depot - in the North of the city supplying three cafés - Cawa, Tamper and Rude Shipyard - in the South of the city.The supply available at a bakery is written above the name and the demand at a café is written below the name.

We will suppose that the cost matrix $\{k_{i j}\}$, in pence, for transporting one loaf of bread is as follows.

Cawa Tamper Rude Shipyard Fletcher's 39 44 47 Depot 22 22 30 Gerry's 14 25 29 The supply and demand, in loaves, can be represented in the following way.

Cawa Tamper Rude Shipyard SupplyFletcher's 350 Depot 200 Gerry's 100 Demand200 250 200

A **transport plan** is a decision on how many units of material to ship from each supplier to each receiver such that demand is met and supply is not exceeded. In other words it is a choice $\alpha_{i j}\in\R_{\ge 0}$ for each $i,j$ such that $\sum_{i}\alpha_{i j} \ge \rho_j$ and $\sum_{j}\alpha_{i j} \le \sigma_i$. The **cost** of a transport plan is the quantity $\sum_{i j}k_{i j}\alpha_{i j}$.

An **optimal transport plan** is a transport plan which minimizes the total costs, ie. which minimizes $\sum_{i j}k_{i j}\alpha_{i j}$ over all transport plans. The **optimal cost** is the cost of an optimal plan.

It should be reasonably obvious that in an optimal transport plan the demand constraint will be an *equality*, $\sum_{i}\alpha_{i j} = \rho_j$, as, if the demand is exceeded at one of the recievers, say $R_{j_0}$ then you can reduce the amount of material going to $R_{j_0}$ and hence the transport cost, whilst still satisfying the demand and supply constraints. We will assume that the total demand equals the total supply.

Example.In the cafés and bakeries example you should be able to convince yourself that the following is an optimal transport plan.

Cawa Tamper Rude Shipyard Fletcher's 100 50 200 Depot 0 200 0 Gerry's 100 0 0 The total cost is then 21,300. You’ll see below that this is indeed optimal.

In the realm of linear programming problems, where such optimal transport problems live, each problem has an associated ‘dual’ problem. It is the dual of the transport problem that will be of interest to us in this series of posts. I will go into the general construction of dual problems next time and here will just state the dual problem in the optimal transport case.

### The dual optimal transport problem

#### Statement of the dual problem

For given transport cost, demand and supply we have seen that the transport problem is to find a transport plan which *minimizes* the total cost. For abstract reasons (linear programming duality), which I’ll go into in the next post, there is an associated **dual optimal transport problem** which is to find a ‘optimal dual transport plan’ which *maximizes* total ‘revenue’. I’ll state what an optimal dual transport plan is and then give an intuitive way of viewing this dual problem.

Given transport cost $\{k_{i j}\}$, demand $\{\rho_i\}_{i=1}^r$ and supply $\{\sigma_j\}_{j=1}^s$ as above, a **dual transport plan** is choice of ‘prices’ $\{u_i\}_{i=1}^r$ and $\{v_j\}_{j=1}^s$ such that $u_i,v_j\in \R_{\ge 0}$ for all $i,j$ and $u_i-v_j\le k_{i j}$ for all $i,j$. The **revenue** of a dual transport plan is the quantity $\sum_{i=1}^r u_i \rho_i- \sum_{j=1}^s v_j\sigma_j$.

An **optimal dual transport plan** is a dual transport plan which maximizes the total revenue, ie. which maximizes $\sum_{i=1}^r u_i \rho_i- \sum_{j=1}^s v_j\sigma_j$ over all dual transport plans. The **optimal revenue** is the revenue of an optimal dual plan.

The general reason for interest in the dual problem is that by the easy part of linear programming duality (which I’ll go into next time) for a given transport context the revenue of any dual transport plan is less than or equal to the cost of any transport plan. In particular, the revenue of any dual transport plan gives a lower bound for the optimal cost of the problem. Moreover, by the hard part of linear programming duality, this is a strict bound in the following sense.

Linear programming duality.The optimal revenue of the dual transport problem is equal to the optimal cost of the original transport problem.

#### The fable

The following story is often told as a way to think about the dual problem in this case. I think that as an analogy it can be helpful, but it has its limits.

Suppose that an external transportation firm offers to take over the transportation of the material. They have an unrelated transport mechanism and their costs are unrelated to the costs of the usual transportation method. They have a different pricing structure, they do not charge depending on the route taken by the material, rather for each supplier they will pay a fixed price per unit and for each receiver they will charge a fixed price per unit. So for each unit from supplier $S_i$ they will pay the company $u_i$ and for each unit they deliver to receiver $R_j$ they will charge $v_j$. For this to be competitive or feasible, this needs to be at least as cheap to ship something from supplier $S_i$ to receiver $R_j$, in other words, $u_i-v_j\le k_{i j}$ for each $i$ and $j$. Given such a pricing structure or ‘dual plan’, the transport company’s total revenue is $\sum_{i=1}^r u_i \rho_i- \sum_{j=1}^s v_j\sigma_j$. This is the quantity they wish to maximize.

Example.You can check that the following gives a dual transport plan. This means checking that transferring a loaf from a given bakery to a given café is not more expensive than using the collective’s own transport system. For instance, under this plan, when transporting a loaf from Depot Bakery to the Rude Shipyard, the bakery gets 22p for the loaf and the café has to pay 47p for it. This means it costs the collective 25p, which is not more expensive than the previous 30p cost.

Cawa Tamper Rude Shipyard PriceFletcher's 0 Depot 22 Gerry's 25 Price39 44 47 Observe that adding a constant (eg. 100) to all of the prices will still give a dual plan and won’t alter the revenue. Note that, as you might expect, bread from the ‘furthest’ bakery is worth least and the ‘furthest’ café pays most for its bread.

You can also check that this leads to a revenue of 21,300, which is the same as the cost we found for the original transport problem. This means that we know we have an optimum for both problems!

Next time I’ll explain a little more about linear programming duality. After that I’ll come back to further constraints satisfied by optimal dual transport plans and link that in with enriched category theory. But for now I’ll end with a digression on one of the founders of linear programming duality.

### A digression on Kantorovich

In 1975 Leonid Vitaliyevich Kantorovich, together with Tjalling Koopmans, was awarded the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (also known, incorrectly, as the Nobel Prize in Economics) for his work on linear programming. Kantorovich was born in Saint Petersburg in 1910 and became a brilliant mathematician at a young age, specializing particularly in functional analysis. Around 1939 he was asked by a plywood factory to help with optimizing the use of their various cutting machines in order to reduce the amount of waste. In the process of looking at this problem he discovered linear programming. Over the next few years he realised how this method could be used in central Soviet planning.

The Soviet Union had adopted a centralized, collectivized economy in the late 1920s under Stalin. Industry and agriculture were contolled by the state with the productivity levels set by a central organisation called Gosplan. For instance, Gosplan would tell each rubber factory how much rubber to produce every year, so that there would be enough rubber to provide tyres for the number of tractors that it was telling the tractor factories to produce. This is in contrast to free market systems in which productivity is determined locally in factories in response to demand (I’m undoubtably oversimplifying). Clearly, centrally fixing the productivity levels for all industries across a large country is a massive task and Kantorovich soon realised how linear programming could be used to do this in an optimal fashion.

Unfortunately, there was a significant political snag. I described above how when you consider the linear programming dual you are led to searching for ‘prices’ which maximize ‘revenue’. These prices are in some sense a mathematical artifact that don’t necessarily exist in the real world - the transportation company was just part of a fable to help understand the computation. However, these ‘shadow prices’ were seen as running counter to the dominant interpretation of Marxist philosophy of the time and Kantorovich’s work on applying optimization to central planning was heavily suppressed under the Stalinist regime. It wasn’t until 1959, after the death of Stalin and following Krushev’s de-Stalinization, that Kantorovich’s book “The Best Use of Economic Resources” finally appeared. In 1960 he moved to the newly-formed, prestigious Novosibirsk State University in Siberia, having been invited to create the Department of Computational Mathematics.

Kantorovich’s time in Novosibirsk is touched on in a rather splendid book about central Soviet planning

which was recommended to me by Martin Hyland. The book is a fictionalised account of a certain period of the planned economy of the Soviet Union and features real historical characters, in particular Kantorovich. You can get a flavour of the book in this review at the Guardian; but just get the book and read it. Never would I have suspected that a book on Soviet planned economy would be so gripping: it helps that Spufford can really write.

I hope that Kantorovich will reappear later in this series of posts.

## Re: Optimal Transport and Enriched Categories I

Nice post; thanks!

I’m trying to get my head round linear programming duality. Let me tell you the very simple thing that’s already in my head, and then I want to (i) ask if it’s correct, and (ii) ask you to help me bridge the gap between the simple thing and what you wrote in the post.

The simple thing is about the isoperimetric inequality. For concreteness, let’s do it in the plane. The usual way to state it is: among all closed curves with a given perimeter, the one that

maximizesthe enclosed area is the circle. But an equivalent formulation is: among all closed curves enclosing a given area, the one thatminimizesperimeter is the circle.I’ve long heard optimization people talk about duality and lazily assumed that this was an example, without ever bothering to look it up. So my question (i) is: is it? Now that I type these words, I start suspecting that it might not be about duality in

linearprogramming, given that areas don’t scale linearly. But perhaps it’s a dual optimization problem in some more general sense.And then my question (ii) would be how to fit it into your framework.