### Enrichment and the Legendre–Fenchel Transform I

#### Posted by Simon Willerton

The Legendre–Fenchel transform, or Fenchel transform, or convex conjugation, is, in its naivest form, a duality between convex functions on a vector space and convex functions on the dual space. It is of central importance in convex optimization theory and in physics it is used to switch between Hamiltonian and Lagrangian perspectives.

Suppose that $V$ is a real vector space and that
$f\colon V\to [-\infty ,+\infty ]$
is a function then the **Fenchel transform** is the function
$f^{\ast }\colon V^{#}\to [-\infty ,+\infty ]$
defined on the dual vector space $V^{#}$ by
$f^{\ast }(k)\coloneqq \sup _{x\in V}\big \{ \langle k,x\rangle -f(x)\big \} .$

If you’re a regular reader then you will be unsurprised when I say that I want to show how it naturally arises from enriched category theory constructions. I’ll show that in the next post. In this post I’ll give a little introduction to the Legendre–Fenchel transform.

There is probably no best way to introduce the Legendre–Fenchel transform. The only treatment that I knew for many years was in Arnold’s book *Mathematical Methods of Classical Mechanics*, but I have recently come across the convex optimization literature and would highly recommend Hugo Touchette’s *The Fenchel Transform in a Nutshell* — my treatment here is heavily influenced by this paper. I will talk mainly about the one-dimensional case as I think that gives a lot of the intuition.

We will start, as Legendre did, with the special case of a strictly convex differentiable function $f\colon \mathbb{R}\to \mathbb{R}$; for instance, the function $x^{2}+1/2$ pictured on the left hand side above. The derviative of $f$ is strictly increasing and so the function $f$ can be parametrized by the derivative $k =d f/d x$ instead of the parameter $x$. Indeed we can write the parameter $x$ in terms of the slope $k$, $x=x(k)$. The Legendre-Fenchel transform $f^{*}$ can then be defined to satisfy $\langle k,x \rangle = f(x) +f^{\ast }(k),$ where the angle brackets mean the pairing between a vector space and its dual. In this one-dimensional case, where $x$ and $k$ are thought of as real numbers, we just have $\langle k,x \rangle = k x$.

As $x$ is a function of $k$ we can rewrite this as $f^{\ast }(k)\coloneqq \langle k,x(k) \rangle -f(x(k)).$ So the Legendre-Fenchel transform encodes the function is a different way. By differentiating this equation you can see that the $d f^{\ast }/d k=x(k)$, thus we have interchanged the abcissa (the horizontal co-ordinate) and the slope. So if $f$ has derivative $k_{0}$ at $x_{0}$ then $f^{\ast }$ has derivative $x_{0}$ at $k_{0}$. This is illustrated in the above picture.

I believe this is what Legendre did and then that what Fenchel did was to generalize this to non-differentiable functions.

For non-differentiable functions, we can’t talk about tangent lines and derivatives, but instead can talk about *supporting lines*. A supporting line is one which touches the graph at at least one point and never goes above the graph. (The fact that we’re singling out lines not going above the graph means that we have *convex* functions in mind.)

For instance, at the point $(x_{0},f(x_{0}))$ the graph pictured below has no tangent line, but has supporting lines with gradient from $k_{1}$ to $k_{2}$. A convex function will have at least one supporting line at each point.

It transpires that the right way to generalize the transform to this non-differentiable case is to define it as follows: $f^{\ast }(k)\coloneqq \sup _{x\in \mathbb{R}}\big \{ \langle k,x\rangle -f(x)\big \} .$ In this case, if $f$ has a supporting line of slope $k_{0}$ at $x_{0}$ then $f^{\ast }$ has a supporting line of slope $x_{0}$ at $k_{0}$. In the picture above, at $x_{0}$, the function $f$ has supporting lines with slope from $k_{1}$ to $k_{2}$: correspondingly, the function $f^{\ast }$ has supporting lines with slope $x_{0}$ all the way from $k_{1}$ to $k_{2}$.

If we allow the function $f$ to be not strictly convex then the transform will not alway be finite. For example, if $f(x)\coloneqq ax+b$ then we have $f^{\ast }(a)=-b$ and $f^{\ast }(k)=+\infty$ for $k\ne a$. So we will allow functions taking values in the extended real numbers: $\overline{\mathbb{R}}\coloneqq [-\infty ,+\infty ]$.

We can use the above definition to get the transform of *any* function $f\colon \mathbb{R}\to \overline{\mathbb{R}}$, whether convex or not, but the resulting transform $f^{\ast }$ is *always* convex. (When there are infinite values involved we can also say that $f^{\ast }$ is lower semi-continuous, but I’ll absorb that into my definition of convex for functions taking infinite values.)

Everything we’ve done for one-dimensional $\mathbb{R}$ easily generalizes to any finite dimensional real vector space $V$, where we should say ‘supporting hyperplane’ instead of ‘supporting line’. From that we get a transform between sets of functions
$(\text {--})^{\ast }\colon \mathrm{Fun}(V,\overline{\mathbb{R}})\to \mathrm{Fun}(V^{#},\overline{\mathbb{R}}),$
where $V^{#}$ is the vector space dual of $V$. Similarly, we have a *reverse* transform going the other way, which is traditionally also denoted with a star
$(\text {--})^{\ast }\colon \mathrm{Fun}(V^{#},\overline{\mathbb{R}})\to \mathrm{Fun}(V,\overline{\mathbb{R}}),$
for $g\colon V^{#}\to \overline{\mathbb{R}}$ we define
$g^{\ast }(x)\coloneqq \sup _{k\in V^{#}}\big \{ \langle k,x\rangle -g(k)\big \} .$

This pair of transforms have some rather nice properties, for instance, they are order reversing. We can put a partial order on any set of functions to $\overline{\mathbb{R}}$ by defining $f_{1}\ge f_{2}$ if $f_{1}(x)\ge f_{2}(x)$ for all $x$. Then
$f_{1}\ge f_{2} \quad \Rightarrow \quad f_{2}^{\ast }\ge f_{1}^{\ast }.$
Also for any function $f$ we have
$f^{\ast }=f^{\ast \ast \ast }$
which implies that the operator $f\mapsto f^{\ast \ast }$ is idempotent:
$f^{\ast \ast }=f^{\ast \ast \ast \ast }.$
This means that $f\mapsto f^{\ast \ast }$ is a *closure* operation. What it actually does is take the convex envelope of $f$, which is the largest convex function less than or equal to $f$. Here’s an example.

This gives that if $f$ is already a convex function then $f^{\ast \ast }=f$. And as a consequence the Legendre–Fenchel transform and the reverse transform restrict to an order reversing bijection between convex functions on $V$ and convex functions on its dual $V^{#}$. $\mathrm{Cvx}(V,\overline{\mathbb{R}})\cong \mathrm{Cvx}(V^{#},\overline{\mathbb{R}}).$

There are many other things that can be said about the transform, such as Fenchel duality and the role it plays in optimization, but I don’t understand such things to my own satisfaction yet.

Next time I’ll explain how most of the above structure drops out of the nucleus construction in enriched category theory.

## Re: Enrichment and the Legendre-Fenchel Transform I

Physics question: what’s the quantum counterpart to one’s classical Hamiltonian being convex?

I.e. what property does the “corresponding” Hermitian operator on $L^2(R^n)$ have?