Planet Musings

October 23, 2018

John Baez: Category Theory Course

I’m teaching a course on category theory at U.C. Riverside, and since my website is still suffering from reduced functionality I’ll put the course notes here for now. I taught an introductory course on category theory in 2016, but this one is a bit more advanced.

The hand-written notes here are by Christian Williams. They are probably best seen as a reminder to myself as to what I’d like to include in a short book someday.

Lecture 1: What is pure mathematics all about? The importance of free structures.

Lecture 2: The natural numbers as a free structure. Adjoint functors.

Lecture 3: Adjoint functors in terms of unit and counit.

Lecture 4: 2-Categories. Adjunctions.

Lecture 5: 2-Categories and string diagrams. Composing adjunctions.

Lecture 6: The ‘main spine’ of mathematics. Getting a monad from an adjunction.

Lecture 7: Definition of a monad. Getting a monad from an adjunction. The augmented simplex category.

Lecture 8: The walking monad, the augmented simplex category and the simplex category.

Lecture 9: Simplicial abelian groups from simplicial sets. Chain complexes from simplicial abelian groups.

Lecture 10: The Dold-Kan theorem: the category of simplicial abelian groups is equivalent to the category of chain complexes of abelian groups. The homology of a chain complex.

Terence Tao: 254A, Notes 1: Local well-posedness of the Navier-Stokes equations

We now begin the rigorous theory of the incompressible Navier-Stokes equations

\displaystyle  \partial_t u + (u \cdot \nabla) u = \nu \Delta u - \nabla p \ \ \ \ \ (1)

\displaystyle  \nabla \cdot u = 0,

where {\nu>0} is a given constant (the kinematic viscosity, or viscosity for short), {u: I \times {\bf R}^d \rightarrow {\bf R}^d} is an unknown vector field (the velocity field), and {p: I \times {\bf R}^d \rightarrow {\bf R}} is an unknown scalar field (the pressure field). Here {I} is a time interval, usually of the form {[0,T]} or {[0,T)}. We will either be interested in spatially decaying situations, in which {u(t,x)} decays to zero as {x \rightarrow \infty}, or {{\bf Z}^d}-periodic (or periodic for short) settings, in which one has {u(t, x+n) = u(t,x)} for all {n \in {\bf Z}^d}. (One can also require the pressure {p} to be periodic as well; this brings up a small subtlety in the uniqueness theory for these equations, which we will address later in this set of notes.) As is usual, we abuse notation by identifying a {{\bf Z}^d}-periodic function on {{\bf R}^d} with a function on the torus {{\bf R}^d/{\bf Z}^d}.

In order for the system (1) to even make sense, one requires some level of regularity on the unknown fields {u,p}; this turns out to be a relatively important technical issue that will require some attention later in this set of notes, and we will end up transforming (1) into other forms that are more suitable for candidate solutions of lower regularity. Our focus here will be on local existence of these solutions in a short time interval {[0,T]} or {[0,T)}, for some {T>0}. (One could in principle also consider solutions that extend to negative times, but it turns out that the equations are not time-reversible, and the forward evolution is significantly more natural to study than the backwards one.) The study of the Euler equations, in which {\nu=0}, will be deferred to subsequent lecture notes.

As the unknown fields involve a time parameter {t}, and the first equation of (1) involves time derivatives of {u}, the system (1) should be viewed as describing an evolution for the velocity field {u}. (As we shall see later, the pressure {p} is not really an independent dynamical field, as it can essentially be expressed in terms of the velocity field without requiring any differentiation or integration in time.) As such, the natural question to study for this system is the initial value problem, in which an initial velocity field {u_0: {\bf R}^d \rightarrow {\bf R}^d} is specified, and one wishes to locate a solution {(u,p)} to the system (1) with initial condition

\displaystyle  u(0,x) = u_0(x) \ \ \ \ \ (2)

for {x \in {\bf R}^d}. Of course, in order for this initial condition to be compatible with the second equation in (1), we need the compatibility condition

\displaystyle  \nabla \cdot u_0 = 0 \ \ \ \ \ (3)

and one should also impose some regularity, decay, and/or periodicity hypotheses on {u_0} in order to be compatible with corresponding level of regularity etc. on the solution {u}.

The fundamental questions in the local theory of an evolution equation are those of existence, uniqueness, and continuous dependence. In the context of the Navier-Stokes equations, these questions can be phrased (somewhat broadly) as follows:

  • (a) (Local existence) Given suitable initial data {u_0}, does there exist a solution {(u,p)} to the above initial value problem that exists for some time {T>0}? What can one say about the time {T} of existence? How regular is the solution?
  • (b) (Uniqueness) Is it possible to have two solutions {(u,p), (u',p')} of a certain regularity class to the same initial value problem on a common time interval {[0,T)}? To what extent does the answer to this question depend on the regularity assumed on one or both of the solutions? Does one need to normalise the solutions beforehand in order to obtain uniqueness?
  • (c) (Continuous dependence on data) If one perturbs the initial conditions {u_0} by a small amount, what happens to the solution {(u,p)} and on the time of existence {T}? (This question tends to only be sensible once one has a reasonable uniqueness theory.)

The answers to these questions tend to be more complicated than a simple “Yes” or “No”, for instance they can depend on the precise regularity hypotheses one wishes to impose on the data and on the solution, and even on exactly how one interprets the concept of a “solution”. However, once one settles on such a set of hypotheses, it generally happens that one either gets a “strong” theory (in which one has existence, uniqueness, and continuous dependence on the data), a “weak” theory (in which one has existence of somewhat low-quality solutions, but with only limited uniqueness results (or even some spectacular failures of uniqueness) and almost no continuous dependence on data), or no satisfactory theory whatsoever. In the first case, we say (roughly speaking) that the initial value problem is locally well-posed, and one can then try to build upon the theory to explore more interesting topics such as global existence and asymptotics, classifying potential blowup, rigorous justification of conservation laws, and so forth. With a weak local theory, it becomes much more difficult to address these latter sorts of questions, and there are serious analytic pitfalls that one could fall into if one tries too strenuously to treat weak solutions as if they were strong. (For instance, conservation laws that are rigorously justified for strong, high-regularity solutions may well fail for weak, low-regularity ones.) Also, even if one is primarily interested in solutions at one level of regularity, the well-posedness theory at another level of regularity can be very helpful; for instance, if one is interested in smooth solutions in {{\bf R}^d}, it turns out that the well-posedness theory at the critical regularity of {\dot H^{\frac{d}{2}-1}({\bf R}^d)} can be used to establish globally smooth solutions from small initial data. As such, it can become quite important to know what kind of local theory one can obtain for a given equation.

This set of notes will focus on the “strong” theory, in which a substantial amount of regularity is assumed in the initial data and solution, giving a satisfactory (albeit largely local-in-time) well-posedness theory. “Weak” solutions will be considered in later notes.

The Navier-Stokes equations are not the simplest of partial differential equations to study, in part because they are an amalgam of three more basic equations, which behave rather differently from each other (for instance the first equation is nonlinear, while the latter two are linear):

  • (a) Transport equations such as {\partial_t u + (u \cdot \nabla) u = 0}.
  • (b) Diffusion equations (or heat equations) such as {\partial_t u = \nu \Delta u}.
  • (c) Systems such as {v = F - \nabla p}, {\nabla \cdot v = 0}, which (for want of a better name) we will call Leray systems.

Accordingly, we will devote some time to getting some preliminary understanding of the linear diffusion and Leray systems before returning to the theory for the Navier-Stokes equation. Transport systems will be discussed further in subsequent notes; in this set of notes, we will instead focus on a more basic example of nonlinear equations, namely the first-order ordinary differential equation

\displaystyle  \partial_t u = F(u) \ \ \ \ \ (4)

where {u: I \rightarrow V} takes values in some finite-dimensional (real or complex) vector space {V} on some time interval {I}, and {F: V \rightarrow V} is a given linear or nonlinear function. (Here, we use “interval” to denote a connected non-empty subset of {{\bf R}}; in particular, we allow intervals to be half-infinite or infinite, or to be open, closed, or half-open.) Fundamental results in this area include the Picard existence and uniqueness theorem, the Duhamel formula, and Grönwall’s inequality; they will serve as motivation for the approach to local well-posedness that we will adopt in this set of notes. (There are other ways to construct strong or weak solutions for Navier-Stokes and Euler equations, which we will discuss in later notes.)

A key role in our treatment here will be played by the fundamental theorem of calculus (in various forms and variations). Roughly speaking, this theorem, and its variants, allow us to recast differential equations (such as (1) or (4)) as integral equations. Such integral equations are less tractable algebraically than their differential counterparts (for instance, they are not ideal for verifying conservation laws), but are significantly more convenient for well-posedness theory, basically because integration tends to increase the regularity of a function, while differentiation reduces it. (Indeed, the problem of “losing derivatives”, or more precisely “losing regularity”, is a key obstacle that one often has to address when trying to establish well-posedness for PDE, particularly those that are quite nonlinear and with rough initial data, though for nonlinear parabolic equations such as Navier-Stokes the obstacle is not as serious as it is for some other PDE, due to the smoothing effects of the heat equation.)

One weakness of the methods deployed here is that the quantitative bounds produced deteriorate to the point of uselessness in the inviscid limit {\nu \rightarrow 0}, rendering these techniques unsuitable for analysing the Euler equations in which {\nu=0}. However, some of the methods developed in later notes have bounds that remain uniform in the {\nu \rightarrow 0} limit, allowing one to also treat the Euler equations.

In this and subsequent set of notes, we use the following asymptotic notation (a variant of Vinogradov notation that is commonly used in PDE and harmonic analysis). The statement {X \lesssim Y}, {Y \gtrsim X}, or {X = O(Y)} will be used to denote an estimate of the form {|X| \leq CY} (or equivalently {Y \geq C^{-1} |X|}) for some constant {C}, and {X \sim Y} will be used to denote the estimates {X \lesssim Y \lesssim X}. If the constant {C} depends on other parameters (such as the dimension {d}), this will be indicated by subscripts, thus for instance {X \lesssim_d Y} denotes the estimate {|X| \leq C_d Y} for some {C_d} depending on {d}.

— 1. Ordinary differential equations —

We now study solutions to ordinary differential equations (4), focusing in particular on the initial value problem when the initial state {u(0) = u_0 \in V} is specified. We restrict attention to strong solutions {u: I \rightarrow V}, in which {u} is continuously differentiable ({C^1}) in the time variable, so that the derivative {\partial_t} in (4) can be interpreted as the classical (strong) derivative, and one has the classical fundamental theorem of calculus

\displaystyle  u(t_2) - u(t_1) = \int_{t_1}^{t_2} \partial_t u(t)\ dt \ \ \ \ \ (5)

whenever {t_1,t_2 \in I} (in this post we use the signed definite integral, thus {\int_{t_1}^{t_2} = -\int_{t_2}^{t_1}}).

We begin with homogeneous linear equations

\displaystyle  \partial_t u = L u

where {L: V \rightarrow V} is a linear operator. Using the integrating factor {e^{-tL}}, where {e^{-tL}: V \rightarrow V} is the matrix exponential of {-tL}, and noting that {\frac{d}{dt} e^{-tL} = -L e^{-tL} = -e^{-tL} L}, we see that this equation is equivalent to

\displaystyle  \partial_t (e^{-tL} u) = 0

and hence from the fundamental theorem of calculus we see that if {u(0) = u_0} then we have the unique global solution {e^{-tL} u = u_0}, or equivalently

\displaystyle  u(t) = e^{tL} u_0.
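This closed-form solution can be sanity-checked numerically. The sketch below (plain Python; the rotation generator {L}, the truncation depth, and all helper names are illustrative choices, not from the text) applies a truncated Taylor series for the matrix exponential to the generator of planar rotations, for which {e^{tL}} is known exactly to be rotation by angle {t}.

```python
import math

def mat_vec(M, v):
    # multiply a 2x2 matrix by a 2-vector
    return [M[0][0]*v[0] + M[0][1]*v[1],
            M[1][0]*v[0] + M[1][1]*v[1]]

def expm_apply(L, t, u0, terms=30):
    # approximate e^{tL} u0 by the truncated Taylor series
    # sum_{k=0}^{terms-1} (tL)^k u0 / k!
    result = [0.0, 0.0]
    term = list(u0)
    for k in range(terms):
        result[0] += term[0]
        result[1] += term[1]
        term = mat_vec(L, term)
        term = [t*term[0]/(k+1), t*term[1]/(k+1)]
    return result

# L generates rotations: e^{tL} = [[cos t, -sin t], [sin t, cos t]]
L = [[0.0, -1.0], [1.0, 0.0]]
u0 = [1.0, 0.0]
t = 0.7
u = expm_apply(L, t, u0)
exact = [math.cos(t), math.sin(t)]
```

The agreement of the series with the exact rotation illustrates that {u(t) = e^{tL} u_0} indeed propagates the initial data by the flow of {\partial_t u = Lu}.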

More generally, if one wishes to solve the inhomogeneous linear equation

\displaystyle  \partial_t u = L u + F

for some continuous {F: {\bf R} \rightarrow V} with initial condition {u(0) = u_0}, then from the fundamental theorem of calculus we have a unique global solution given by

\displaystyle  e^{-tL} u = u_0 + \int_0^t e^{-sL} F(s)\ ds

or equivalently one has Duhamel’s formula

\displaystyle  u(t) = e^{tL} u_0 + \int_0^t e^{(t-s) L} F(s)\ ds, \ \ \ \ \ (6)

which is continuously differentiable in time if {F} is continuous. Intuitively, the first term {e^{tL} u_0} represents the contribution of the initial data {u_0} to the solution {u(t)} at time {t} (with the {e^{tL}} factor representing the evolution from time {0} to time {t}), while the integrand {e^{(t-s)L} F(s)} represents the contribution of the forcing term {F(s)} at time {s} to the solution {u(t)} at time {t} (with the {e^{(t-s)L}} factor representing the evolution from time {s} to time {t}).
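Formula (6) is easy to test on a concrete scalar example. In the sketch below (a hedged illustration; the test problem {u' = u + 1}, the trapezoid quadrature, and the function names are my choices, not from the text), the Duhamel integral is evaluated numerically and compared with the exact solution.

```python
import math

def duhamel(a, u0, f, t, n=2000):
    # scalar Duhamel formula: u(t) = e^{ta} u0 + int_0^t e^{(t-s)a} f(s) ds,
    # with the integral approximated by the trapezoid rule on n subintervals
    h = t / n
    integral = 0.5 * (math.exp(t*a)*f(0.0) + f(t))
    for k in range(1, n):
        s = k*h
        integral += math.exp((t - s)*a) * f(s)
    integral *= h
    return math.exp(t*a)*u0 + integral

# test problem: u' = u + 1, u(0) = 1, whose exact solution is u(t) = 2 e^t - 1
t = 1.0
approx = duhamel(1.0, 1.0, lambda s: 1.0, t)
exact = 2*math.exp(t) - 1
```

The first term carries the initial data forward by {e^{ta}}, and each slice of the integral carries the forcing at time {s} forward by {e^{(t-s)a}}, exactly as described above.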

One can apply a similar analysis to the differential inequality

\displaystyle  \partial_t u(t) \leq A(t) u(t) + F(t) \ \ \ \ \ (7)

where {u: I \rightarrow {\bf R}} is now a scalar continuously differentiable function, {F, A: I \rightarrow {\bf R}} are continuous functions, and {I} is an interval containing {0} as its left endpoint; we also assume an initial condition {u(0) = u_0 \in {\bf R}}. Here, the natural integrating factor is {t \mapsto \exp( - \int_0^t A(t')\ dt' )}, whose derivative is {t \mapsto - A(t) \exp( - \int_0^t A(t')\ dt' )} by the chain rule and the fundamental theorem of calculus. Applying this integrating factor to (7), we may write it as

\displaystyle  \partial_t ( \exp( - \int_0^t A(t')\ dt' ) u(t) ) \leq \exp( - \int_0^t A(t')\ dt' ) F(t)

and hence by the fundamental theorem of calculus we have

\displaystyle  \exp( - \int_0^t A(t')\ dt' ) u(t) \leq u_0 + \int_0^t \exp( - \int_0^{s} A(t')\ dt' )F(s) \ ds

or equivalently

\displaystyle  u(t) \leq \exp( \int_0^t A(t')\ dt' ) u_0 + \int_0^t \exp( \int_s^t A(t')\ dt' ) F(s)\ ds \ \ \ \ \ (8)

for all {t \in I} (compare with (6)). This is the differential form of Grönwall’s inequality. In the homogeneous case {F=0}, the inequality of course simplifies to

\displaystyle  u(t) \leq \exp( \int_0^t A(t')\ dt' ) u_0. \ \ \ \ \ (9)

We continue assuming that {F=0} for simplicity. From the fundamental theorem of calculus, (7) (and the initial condition {u_0}) implies the integral inequality

\displaystyle  u(t) \leq u_0 + \int_0^t A(s) u(s)\ ds, \ \ \ \ \ (10)

although the converse implication of (7) from (10) is false in general. Nevertheless, there is an analogue of (9) just assuming the weaker inequality (10), and not requiring any differentiability on {u}, at least when all functions involved are non-negative:

Lemma 1 (Integral form of Grönwall inequality) Let {I} be an interval containing {0} as left endpoint, let {u_0 \in [0,+\infty)}, and let {u, A: I \rightarrow [0,+\infty)} be continuous functions obeying the inequality (10) for all {t \in I}. Then one has (9) for all {t \in I}.

Proof: From (10) and the fundamental theorem of calculus, the function {t \mapsto u_0 + \int_0^t A(s) u(s)\ ds} is continuously differentiable and obeys the differential inequality

\displaystyle  \frac{d}{dt}( u_0 + \int_0^t A(s) u(s)\ ds ) = A(t) u(t) \leq A(t) (u_0 + \int_0^t A(s) u(s)\ ds)

(note here that we use the hypothesis that {A(t)} is non-negative). Applying the differential form (9) of Grönwall’s inequality, we conclude that

\displaystyle  u_0 + \int_0^t A(s) u(s)\ ds \leq \exp( \int_0^t A(t')\ dt' ) u_0.

The claim now follows from (10). \Box
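Grönwall’s inequality can be illustrated numerically. In the sketch below (an illustrative setup of my choosing, not from the text), we integrate {u' = A(t) u - g(t)} with {g \geq 0} by forward Euler; since the solution stays non-negative, it satisfies the differential inequality (7) with {F = 0}, so the bound (9) must hold, and for this test problem the exact solution is also available for comparison.

```python
import math

def euler(A, g, u0, T, n=100000):
    # forward Euler for u' = A(t) u - g(t), u(0) = u0, on [0, T]
    h = T / n
    u = u0
    t = 0.0
    for _ in range(n):
        u = u + h*(A(t)*u - g(t))
        t += h
    return u

A = lambda t: 1.0      # coefficient in the differential inequality
g = lambda t: 0.5      # non-negative "slack", so u' <= A(t) u
u0 = 1.0
T = 2.0
u_T = euler(A, g, u0, T)
bound = math.exp(T) * u0          # Gronwall bound exp(int_0^T A dt') u0
exact = 0.5*math.exp(T) + 0.5     # exact solution of u' = u - 1/2, u(0) = 1
```

As expected, the computed {u(T)} sits below the Grönwall bound {e^T u_0}, with the gap coming from the discarded non-negative term {g}.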

Exercise 2 Relax the hypotheses of continuity on {u,A} to that of being measurable and bounded on compact intervals. (You will need tools such as the fundamental theorem of calculus for absolutely continuous or Lipschitz functions, covered for instance in this previous set of notes.)

Grönwall’s inequality is an excellent tool for bounding the growth of a solution to an ODE or PDE, or the difference between two such solutions. Here is a basic example, one half of the Picard (or Picard-Lindelöf) theorem:

Theorem 3 (Picard uniqueness theorem) Let {I} be an interval, let {V} be a finite-dimensional vector space, let {F: V \rightarrow V} be a function that is Lipschitz continuous on every bounded subset of {V}, and let {u, v: I \rightarrow V} be continuously differentiable solutions to the ODE (4), thus

\displaystyle  \partial_t u = F(u); \quad \partial_t v = F(v)

on {I}. If {u(t_0) = v(t_0)} for some {t_0 \in I}, then {u} and {v} agree identically on {I}, thus {u(t)=v(t)} for all {t \in I}.

Proof: By translating {I} and {t_0} we may assume without loss of generality that {t_0=0}. By splitting {I} into at most two intervals, we may assume that {0} is either the left or right endpoint of {I}; by applying the time reversal symmetry of replacing {u,v} by {t \mapsto u(-t), t \mapsto v(-t)} respectively, and also replacing {I, F} by {-I} and {-F}, we may assume without loss of generality that {0} is the left endpoint of {I}. Finally, by writing {I} as the union of compact intervals with left endpoint {0}, we may assume without loss of generality that {I} is compact. In particular, {u,v} are bounded and hence {F} is Lipschitz continuous with some finite Lipschitz constant {K} on the ranges of {u} and {v}.

From the fundamental theorem of calculus we have

\displaystyle  u(t) = u(t_0) + \int_{t_0}^t F(u(s))\ ds


\displaystyle  v(t) = v(t_0) + \int_{t_0}^t F(v(s))\ ds

for every {t \in I}; subtracting, we conclude

\displaystyle  u(t) - v(t) = \int_{t_0}^t F(u(s)) - F(v(s))\ ds.

Applying the Lipschitz property of {F} and the triangle inequality, we conclude that

\displaystyle  |u(t)-v(t)| \leq K \int_{t_0}^t |u(s)-v(s)|\ ds.

By the integral form of Grönwall’s inequality, we conclude that

\displaystyle  |u(t)-v(t)| \leq \exp( \int_{t_0}^t K\ ds) \cdot 0 = 0

and the claim follows. \Box

Remark 4 The same result applies for infinite-dimensional normed vector spaces {V}, at least if one requires {u,v} to be continuously differentiable in the strong (Fréchet) sense; the proof is identical.

Exercise 5 (Comparison principle) Let {F: {\bf R} \rightarrow {\bf R}} be a function that is Lipschitz continuous on compact intervals. Let {I} be an interval, and let {u,v: I \rightarrow {\bf R}} be continuously differentiable functions such that

\displaystyle  \partial_t u(t) \leq F(u(t))


\displaystyle  \partial_t v(t) \geq F(v(t))

for all {t \in I}.

  • (a) Suppose that {u(t_0) \leq v(t_0)} for some {t_0 \in I}. Show that {u(t) \leq v(t)} for all {t \in I} with {t \geq t_0}. (Hint: there are several ways to proceed here. One is to try to verify the hypotheses of Grönwall’s inequality for the quantity {\max(u(t)-v(t),0)} or {\max(u(t)-v(t),0)^2}.)
  • (b) Suppose that {u(t_0) < v(t_0)} for some {t_0 \in I}. Show that {u(t) < v(t)} for all {t \in I} with {t \geq t_0}.

Now we turn to the existence side of the Picard theorem.

Theorem 6 (Picard existence theorem) Let {V} be a finite dimensional normed vector space, let {R>0}, and let {u_0 \in V} lie in the closed ball {\overline{B(0,R)} := \{ u \in V: \|u\| \leq R \}}. Let {F: V \rightarrow V} be a function which has a Lipschitz constant of {K} on the ball {\overline{B(0,2R)}}. If one sets

\displaystyle  T := \frac{1}{2K + \|F(0)\|/R}, \ \ \ \ \ (11)

then there exists a continuously differentiable solution {u: [-T,T] \rightarrow V} to the ODE (4) with initial data {u(0) = u_0} such that {u(t) \in \overline{B(0,2R)}} for all {t \in [-T,T]}.

Note that the solution produced by this theorem is unique on {[-T,T]}, thanks to Theorem 3. We will be primarily concerned with the case {F(0)=0}, in which case the time of existence {T} simplifies to {T = \frac{1}{2K}}.

Proof: Using the fundamental theorem of calculus, we write (4) (with initial condition {u(0)=u_0}) in integral form as

\displaystyle  u(t) = u_0 + \int_0^t F(u(s))\ ds. \ \ \ \ \ (12)

Indeed, if {u} is continuously differentiable and solves (4) with {u(0)=u_0} on {[-T,T]}, then (12) holds on {[-T,T]}. Conversely, if {u} is continuous and solves (12) on {[-T,T]}, then by the fundamental theorem of calculus the right-hand side of (12) (and hence {u}) is continuously differentiable and solves (4) with {u(0)=u_0}. Thus it suffices to solve the integral equation (12) with a solution taking values in {\overline{B(0,2R)}}.

We can view this as a fixed point problem. Let {X = C([-T,T] \rightarrow \overline{B(0,2R)})} denote the space of continuous functions from {[-T,T]} to {\overline{B(0,2R)}}. We give this the uniform metric

\displaystyle  d(u,v) := \sup_{t \in [-T,T]} \| u(t) - v(t) \|.

As is well known, {X} becomes a complete metric space with this metric. Let {\Phi: X \rightarrow X} denote the map

\displaystyle  \Phi(u)(t) := u_0 + \int_0^t F(u(s))\ ds.

Let us first verify that {\Phi} does map {X} to {X}. If {u \in X}, then {\Phi(u)} is clearly continuous. For any {t \in [-T,T]}, one has from the triangle inequality that

\displaystyle  \| \Phi(u)(t) \| \leq \| u_0 \| + T \sup_{s \in [-T,T]} \| F(u(s)) \|

\displaystyle  \leq R + T (\|F(0)\| + 2RK)

\displaystyle  \leq 2R

by choice of {T}, hence {\Phi(u) \in X} as claimed. A similar argument shows that {\Phi} is in fact a contraction on {X}. Namely, if {u,v \in X}, then

\displaystyle  \|\Phi(u)(t) - \Phi(v)(t)\| = \| \int_0^t F(u(s)) - F(v(s)) \ ds \|

\displaystyle  \leq T \sup_{s \in [-T,T]} K \|u(s)-v(s)\|

\displaystyle  \leq TK d(u,v)

and hence {d(\Phi(u),\Phi(v)) \leq \frac{1}{2} d(u,v)} by choice of {T}. Applying the contraction mapping theorem, we obtain a fixed point {u \in X} to the equation {u = \Phi(u)}, which is precisely (12), and the claim follows. \Box
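The contraction argument above is constructive: iterating {\Phi} from any starting guess converges to the solution. The sketch below (a discretized Picard iteration in plain Python; the grid size, iteration count, test problem {u' = u}, and function names are illustrative assumptions, not from the text) applies the map {\Phi(u)(t) = u_0 + \int_0^t F(u(s))\ ds} on a grid, with the integral computed by the trapezoid rule.

```python
import math

def picard_iterate(F, u0, T, n_grid=200, n_iter=60):
    # discretize [0, T] into n_grid steps and iterate
    #   Phi(u)(t) = u0 + int_0^t F(u(s)) ds
    # with the integral accumulated by the trapezoid rule
    h = T / n_grid
    u = [u0] * (n_grid + 1)          # initial guess: the constant function u0
    for _ in range(n_iter):
        new = [u0] * (n_grid + 1)
        acc = 0.0
        for k in range(1, n_grid + 1):
            acc += 0.5 * h * (F(u[k-1]) + F(u[k]))
            new[k] = u0 + acc
        u = new
    return u

# test problem: u' = u, u(0) = 1, whose solution is e^t
T = 1.0
u = picard_iterate(lambda x: x, 1.0, T)
approx = u[-1]
exact = math.exp(T)
```

Each iteration of {\Phi} reproduces one more Taylor coefficient of the solution, mirroring how the contraction estimate halves the distance between successive iterates.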

Remark 7 The proof extends without difficulty to infinite dimensional Banach spaces {V}. Up to a multiplicative constant, the result is sharp. For instance, consider the linear ODE {\partial_t u = Ku} for some {K>0}, with {\|u_0\|=R}. Here, the function {u \mapsto Ku} is of course Lipschitz with constant {K} on all of {V}, and the solution is of the form {u(t) = e^{Kt} u_0}, hence {u} will exit {\overline{B(0,2R)}} in time {\frac{\log 2}{K}}, which is only larger than the time {\frac{1}{2K}} given by the above theorem by a multiplicative constant.

We can iterate the Picard existence theorem (and combine it with the uniqueness theorem) to conclude that there is a maximal Cauchy development {u: (T_-, T_+) \rightarrow V} to the ODE (4) with initial data {u(0)=u_0}, with the solution diverging to infinity (or “blowing up”) at the endpoint {T_+} if this endpoint is finite, and similarly for {T_-} (thus one has a dichotomy between global existence and finite time blowup). More precisely:

Theorem 8 (Maximal Cauchy development) Let {V} be a finite dimensional normed vector space, let {u_0 \in V}, and let {F: V \rightarrow V} be a function which is Lipschitz on bounded sets. Then there exists {-\infty \leq T_- < 0 < T_+ \leq +\infty} and a continuously differentiable solution {u: (T_-, T_+) \rightarrow V} to (4) with {u(0) = u_0}, such that {\lim_{t \rightarrow T_+^-} \|u(t)\| = \infty} if {T_+} is finite, and {\lim_{t \rightarrow T_-^+} \|u(t)\| = \infty} if {T_-} is finite. Furthermore, {T_-, T_+}, and {u} are unique.

Proof: Uniqueness follows easily from Theorem 3. For existence, let {I} be the union of all the intervals containing {0} for which there is a continuously differentiable solution to (4) with {u(0)=u_0}. From Theorem 6 {I} contains a neighbourhood of the origin. From Theorem 3 one can glue all the solutions together to obtain a continuously differentiable solution {u: I \rightarrow V} to (4) with {u(0)=u_0}. If {t_0} is contained in {I}, then by Theorem 6 (and time translation) one could find a solution {\tilde u: (t_0-\varepsilon,t_0+\varepsilon) \rightarrow V} to (4) in a neighbourhood of {t_0} such that {u(t_0) = \tilde u(t_0)}; by Theorem 3 we must then have {(t_0-\varepsilon,t_0+\varepsilon) \subset I}, otherwise we could glue {\tilde u} to {u} and obtain a solution on a larger domain than {I}, contradicting the definition of {I}. Thus {I} is open, and is of the form {I = (T_-, T_+)} for some {-\infty \leq T_- < 0 < T_+ \leq +\infty}.

Suppose for contradiction that {T_+} is finite and {\|u(t)\|} does not go to infinity as {t \rightarrow T_+^-}. Then there exists a finite {R} and a sequence {t_n \nearrow T_+} such that {\|u(t_n)\| \leq R}. Let {K} be the Lipschitz constant of {F} on {\overline{B(0,2R)}}. By Theorem 6, for each {n} one can find a solution {u_n} to (4) on {(t_n-T, t_n+T)} with {u_n(t_n) = u(t_n)}, where {T := \frac{1}{2K + \|F(0)\|/R}} does not depend on {n}. For {n} large enough, this and Theorem 3 allow us to extend the solution {u} outside of {I}, contradicting the definition of {I}. Thus we have {\lim_{t \rightarrow T_+^-} \|u(t)\| = \infty} when {T_+} is finite, and a similar argument gives {\lim_{t \rightarrow T_-^+} \|u(t)\| = \infty} when {T_-} is finite. \Box

Remark 9 Theorem 6 gives a more quantitative description of the blowup: if {T_+} is finite, then for any {T_- < t < T_+}, one must have

\displaystyle  T_+ - t > \frac{1}{2K_{2\|u(t)\|} + \|F(0)\|/\|u(t)\|}

where {K_{2\|u(t)\|}} is the Lipschitz constant of {F} on {\overline{B(0,2\|u(t)\|)}}. This can be used to give some explicit lower bound on blowup rates. For instance, if {F(0)=0} and {F(u)} behaves like {|u|^p} for some {p>1} in the sense that the Lipschitz constant of {F} on {\overline{B(0,R)}} is {O(R^{p-1})} for any {R>0}, then we obtain a lower bound

\displaystyle  \| u(t) \| \gtrsim \frac{1}{(T_+-t)^{1/(p-1)}} \ \ \ \ \ (13)

as {t \nearrow T_+}, if {T_+} is finite, and similarly when {T_-} is finite. This type of blowup rate is sharp. For instance, consider the scalar ODE

\displaystyle  \partial_t u = |u|^p

where {u} takes values in {{\bf R}} and {p>1} is fixed. Then for any {T_+ \in {\bf R}}, one has explicit solutions on {(-\infty,T_+)} of the form

\displaystyle  u(t) = \frac{c}{(T_+-t)^{1/(p-1)}}

where {c := \frac{1}{(p-1)^{1/(p-1)}}} is a positive constant depending only on {p}. The blowup rate at {T_+} is consistent with (13) and also with (11).
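One can verify this explicit solution by direct computation. The sketch below (a finite-difference check in plain Python; the choices {p = 3}, {T_+ = 2}, the sample time, and the step size are illustrative assumptions) compares a centered difference quotient for {\partial_t u} against {|u|^p}.

```python
# finite-difference check that u(t) = c (T - t)^{-1/(p-1)}, with
# c = (p-1)^{-1/(p-1)}, solves u' = |u|^p on (-inf, T)
p = 3.0
T = 2.0
c = (p - 1.0)**(-1.0/(p - 1.0))

def u(t):
    return c * (T - t)**(-1.0/(p - 1.0))

t0 = 1.0       # a sample time well before the blowup at t = T
h = 1e-6
lhs = (u(t0 + h) - u(t0 - h)) / (2*h)   # centered difference for u'(t0)
rhs = abs(u(t0))**p
```

The two sides agree to high accuracy, confirming that the normalization {c = (p-1)^{-1/(p-1)}} is exactly what makes the power-law ansatz solve the ODE.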

Exercise 10 (Higher regularity) Let the notation and hypotheses be as in Theorem 8. Suppose that {F: V \rightarrow V} is {k} times continuously differentiable for some natural number {k}. Show that the maximal Cauchy development {u} is {k+1} times continuously differentiable. In particular, if {F} is smooth, then so is {u}.

Exercise 11 (Lipschitz continuous dependence on data) Let {V} be a finite-dimensional normed vector space.

  • (a) Let {R>0}, let {F: V \rightarrow V} be a function which has a Lipschitz constant of {K} on the ball {\overline{B(0,2R)}}, and let {T} be the quantity (11). If {u_0, v_0 \in \overline{B(0,R)}}, and {u,v: [-T,T] \rightarrow V} are the solutions to (4) with {u(0)=u_0, v(0)=v_0} given by Theorem 6, show that

    \displaystyle  \sup_{t \in [-T,T]} \|u(t)-v(t)\| \leq 2 \|u_0-v_0\|.

  • (b) Let {F: V \rightarrow V} be a function which is Lipschitz on bounded sets, let {u_0 \in V}, and let {u: (T_-, T_+) \rightarrow V} be the maximal Cauchy development of (4) with initial data {u(0)=u_0} given by Theorem 8. Show that for any compact interval {I \subset (T_-,T_+)} containing {0}, there exists an open neighbourhood {U} of {u_0}, such that for any {v_0 \in U}, there exists a solution {v: I \rightarrow V} of (4) with initial data {v(0)=v_0}. Furthermore, the map from {v_0} to {v} is a Lipschitz continuous map from {U} to {C(I \rightarrow V)}.

Exercise 12 (Non-autonomous Picard theorem) Let {V} be a finite-dimensional normed vector space, and let {F: {\bf R} \times V \rightarrow V} be a function which is Lipschitz on bounded sets. Let {u_0 \in V}. Show that there exist {-\infty \leq T_- < 0 < T_+ \leq +\infty} and a continuously differentiable function {u: (T_-,T_+) \rightarrow V} solving the non-autonomous ODE

\displaystyle  \partial_t u(t) = F(t,u(t))

for {t \in (T_-,T_+)} with initial data {u(0) = u_0}; furthermore one has {\lim_{t \rightarrow T_+^-} \|u(t)\| = \infty} if {T_+} is finite, and {\lim_{t \rightarrow T_-^+} \|u(t)\| = \infty} if {T_-} is finite. Finally, show that {T_-, T_+, u} are unique. (Hint: this could be done by repeating all of the previous arguments, but there is also a way to deduce this non-autonomous version of the Picard theorem directly from the Picard theorem by adding one extra dimension to the space {V}.)

The above theory is symmetric with respect to the time reversal of replacing {t \mapsto u(t)} with {t \mapsto u(-t)} and {F} with {-F}. However, one can break this symmetry by introducing a dissipative linear term, in which case one only obtains the forward-in-time portion of the Picard existence theorem:

Exercise 13 Let {V} be a finite dimensional normed vector space, let {R>0}, and let {u_0 \in V} lie in the closed ball {\overline{B(0,R)} := \{ u \in V: \|u\| \leq R \}}. Let {F: V \rightarrow V} be a function which has a Lipschitz constant of {K} on the ball {\overline{B(0,2R)}}. Let {T} be the quantity in (11). Let {L: V \rightarrow V} be a linear operator obeying the dissipative estimates

\displaystyle  \| e^{tL} u \| \leq \| u \|

for all {u \in V} and {t \geq 0}. Show that there exists a continuously differentiable solution {u: [0,T] \rightarrow V} to the ODE

\displaystyle  \partial_t u = Lu + F(u) \ \ \ \ \ (14)

with initial data {u(0) = u_0} such that {u(t) \in \overline{B(0,2R)}} for all {t \in [0,T]}.

Remark 14 With the hypotheses of the above exercise, one can also solve the ODE backwards in time by an amount {\frac{1}{2K + 2 \|L\|_{op} + \|F(0)\|/R}}, where {\|L\|_{op}} denotes the operator norm of {L}. However, in the limit as the operator norm of {L} goes to infinity, the amount to which one can evolve backwards in time goes to zero, whereas the time in which one can evolve forwards in time remains bounded away from zero, thus breaking the time symmetry.

— 2. Leray systems —

Now we discuss the Leray system of equations

\displaystyle  v = F - \nabla p; \quad \nabla \cdot v = 0 \ \ \ \ \ (15)

where {F: {\bf R}^d \rightarrow {\bf R}^d} is given, and the vector field {v: {\bf R}^d \rightarrow {\bf R}^d} and the scalar field {p: {\bf R}^d \rightarrow {\bf R}} are unknown. In other words, we wish to decompose a specified function {F} as the sum of a gradient {\nabla p} and a divergence-free vector field {v}. We will use the usual Lebesgue spaces {L^q(X \rightarrow {\bf R}^m)} of measurable functions {f: X \rightarrow {\bf R}^m} (up to almost everywhere equivalence) defined on some measure space {(X,\mu)} (which in our case will always be either {{\bf R}^d} or {{\bf R}^d/{\bf Z}^d} with Lebesgue measure) such that the {L^q} norm {(\int_X |f|^q\ d\mu)^{1/q}} is finite. (For {q=\infty}, the {L^\infty} norm is defined instead to be the essential supremum of {|f|}.)

Proceeding purely formally, we could solve this system by taking the divergence of the first equation to conclude that

\displaystyle 0 = \nabla \cdot F - \Delta p

where {\Delta p = \nabla \cdot (\nabla p)} is the Laplacian of {p}, and then we could formally solve for {p} as

\displaystyle  p = \Delta^{-1} (\nabla \cdot F) \ \ \ \ \ (16)

and then solve for {v} as

\displaystyle  v = F - \nabla \Delta^{-1} (\nabla \cdot F). \ \ \ \ \ (17)

However, if one wishes to justify this rigorously one runs into the issue that the Laplacian {\Delta} is not quite invertible. To sort this out and make this problem well-defined, we need to specify the regularity and decay one wishes to impose on the data {F} and on the solution {v, p}. To begin with, let us suppose that {v,F,p} are all smooth.

We first understand the uniqueness theory for this problem. By linearity, this amounts to solving the homogeneous equation when {F=0}, thus we wish to classify the smooth fields {v: {\bf R}^d \rightarrow {\bf R}^d} and {p: {\bf R}^d \rightarrow {\bf R}} solving the system

\displaystyle  v = -\nabla p; \quad \nabla \cdot v = 0.

Of course, we can eliminate {v} and write this as a single equation

\displaystyle \Delta p = 0

That is to say, the solutions to this equation arise by selecting {p} to be a (smooth) harmonic function, and {v} to be the negative gradient of {p}. This is consistent with our preceding discussion that identified the potential lack of invertibility of {\Delta} as a key issue.

By linearity, this implies that (smooth) solutions {(v,p)} to the system (15) are only unique up to the addition of an arbitrary harmonic function to {p}, and the subtraction of the gradient of that harmonic function from {v}.

We can largely eliminate this lack of uniqueness by imposing further requirements on {F,p,v}. For instance, suppose in addition that we require {F,p,v} to all be {{\bf Z}^d}-periodic (or periodic for short), thus

\displaystyle  F(x+n) = F(x), p(x+n) = p(x), v(x+n) = v(x)

for {x \in {\bf R}^d} and {n \in {\bf Z}^d}. Then the only freedom we have is to modify {p} by an arbitrary periodic harmonic function (and to subtract the gradient of that function from {v}). However, by Liouville’s theorem, the only periodic harmonic functions are the constants, whose gradient vanishes. Thus the only freedom in this setting is to add a constant to {p}. This freedom will be almost irrelevant when we consider the Euler and Navier-Stokes equations, since it is only the gradient of the pressure which appears in those equations, rather than the pressure itself. Nevertheless, if one wishes, one could remove this freedom by requiring that {p} be of mean zero: {\int_{{\bf R}^d/{\bf Z}^d} p(x)\ dx = 0}.

Now suppose instead that we only require that {F} and {v} be {{\bf Z}^d}-periodic, but do not require {p} to be {{\bf Z}^d}-periodic. Then we have the freedom to modify {p} by a harmonic function {u} which need not be {{\bf Z}^d}-periodic, but whose gradient {\nabla u} is {{\bf Z}^d}-periodic. Since the gradient of a harmonic function is also harmonic, {\nabla u} has to be constant, and so {u} is an affine-linear function. Conversely, all affine-linear functions are harmonic, and their gradients are constant and thus also {{\bf Z}^d}-periodic. Thus, one has the freedom in this setting to add an arbitrary affine-linear function to {p}, and subtract the constant gradient of that function from {v}.

Instead of periodicity, one can also impose decay conditions on the various functions. Suppose for instance that we require the pressure to lie in an {L^q({\bf R}^d \rightarrow {\bf R})} space for some {1 \leq q < \infty}; roughly speaking, this forces the pressure to decay to zero at infinity “on the average”. Then we only have the freedom to modify {p} by a harmonic function {u} that is also in the {L^q} class (and modify {v} by the negative gradient of this harmonic function). However, the mean value property of harmonic functions implies that

\displaystyle  u(x) = \frac{1}{|B(x,R)|} \int_{B(x,R)} u(y)\ dy

for any ball {B(x,R)} of some radius {R} centred around {x}, where {|B(x,R)|} denotes the measure of the ball. By Hölder’s inequality, we conclude that

\displaystyle  |u(x)| \leq \frac{1}{|B(x,R)|} |B(x,R)|^{1-1/q} \|u\|_{L^q({\bf R}^d)}.

Sending {R \rightarrow \infty} we conclude that {u} vanishes identically; thus there are no non-trivial harmonic functions in {L^q({\bf R}^d \rightarrow {\bf R})}. Thus there is uniqueness for the problem (15) if we require the pressure {p} to lie in {L^q({\bf R}^d \rightarrow {\bf R})}. If instead we require the vector field {v} to be in {L^q({\bf R}^d \rightarrow {\bf R}^d)}, then we can modify {p} by a harmonic function {u} with {\nabla u} in {L^q({\bf R}^d \rightarrow {\bf R}^{d^2})}, thus {\nabla u} vanishes identically and hence {u} is constant. So if we require {v \in L^q({\bf R}^d \rightarrow {\bf R}^d)} then we only have the freedom to adjust {p} by arbitrary constants.

Having discussed uniqueness, we now turn to existence. We begin with the periodic setting in which {F,v,p} are required to be {{\bf Z}^d}-periodic and smooth, so that they can also be viewed (by slight abuse of notation) as functions on the torus {{\bf R}^d/{\bf Z}^d}. The system (15) is linear and translation-invariant, which strongly suggests that one solve the system using the Fourier transform (which tends to diagonalise linear translation-invariant equations, because the plane waves {x \mapsto e^{2\pi i k \cdot x}} that underlie the Fourier transform are the eigenfunctions of translation.) Indeed, we may expand {F,v,p} as Fourier series

\displaystyle  F(x) = \sum_{k \in {\bf Z}^d} \hat F(k) e^{2\pi i k \cdot x}

\displaystyle  v(x) = \sum_{k \in {\bf Z}^d} \hat v(k) e^{2\pi i k \cdot x}

\displaystyle  p(x) = \sum_{k \in {\bf Z}^d} \hat p(k) e^{2\pi i k \cdot x}

where the Fourier coefficients {\hat F(k) \in {\bf R}^d}, {\hat v(k) \in{\bf R}^d}, {\hat p(k) \in {\bf R}} are given by the formulae

\displaystyle  \hat F(k) = \int_{{\bf R}^d/{\bf Z}^d} F(x) e^{-2\pi i k \cdot x}\ dx

\displaystyle  \hat v(k) = \int_{{\bf R}^d/{\bf Z}^d} v(x) e^{-2\pi i k \cdot x}\ dx

\displaystyle  \hat p(k) = \int_{{\bf R}^d/{\bf Z}^d} p(x) e^{-2\pi i k \cdot x}\ dx.

When {F,v,p} are smooth, then {\hat F(k), \hat v(k), \hat p(k)} are rapidly decreasing as {k \rightarrow \infty}, which will allow us to justify manipulations such as interchanging summation and derivatives without difficulty. Expanding out (15) in Fourier series and then comparing Fourier coefficients (which are unique for smooth functions), we obtain the system

\displaystyle  \hat v(k) = \hat F(k) - 2\pi i k \hat p(k) \ \ \ \ \ (18)

\displaystyle  2\pi i k \cdot \hat v(k) = 0 \ \ \ \ \ (19)

for each {k \in {\bf Z}^d}. As mentioned above, the Fourier transform has diagonalised the system (15), in that there are no interactions between different frequencies {k \in {\bf Z}^d}, and we now have a decoupled system of vector equations. To solve these equations, we can take the inner product of both sides of (18) with {k} and apply (19) to conclude that

\displaystyle  0 = k \cdot \hat F(k) - 2\pi i |k|^2 \hat p(k).

For non-zero {k}, we can then solve for {\hat p(k)} and hence {\hat v(k)} by the formulae

\displaystyle  \hat p(k) = \frac{k}{2\pi i |k|^2} \cdot \hat F(k)


\displaystyle  \hat v(k) = \hat F(k) - k (\frac{k}{|k|^2} \cdot \hat F(k)).

For {k = 0}, these formulae no longer apply; however from (18) we see that {\hat v(0) = \hat F(0)}, while {\hat p(0)} can be arbitrary (which corresponds to the aforementioned freedom to add an arbitrary constant to {p}). Thus we have the explicit general solution

\displaystyle  p(x) = C + \sum_{k \in {\bf Z}^d \backslash \{0\}} \frac{k}{2\pi i |k|^2} \cdot \hat F(k) e^{2\pi i k \cdot x}

\displaystyle  v(x) = \hat F(0) + \sum_{k \in {\bf Z}^d \backslash \{0\}} (\hat F(k) - k (\frac{k}{|k|^2} \cdot \hat F(k))) e^{2\pi i k \cdot x},

where {C} is an arbitrary constant. Note that if {F} is smooth, then {\hat F(k)} is rapidly decreasing and the functions {p,v} defined by the above formulae are also smooth.

We can write the above general solution in a form similar to (16), (17) as

\displaystyle  p = C + \Delta^{-1} (\nabla \cdot F)

\displaystyle  v = F - \nabla \Delta^{-1} (\nabla \cdot F)

where, by definition, the inverse Laplacian {\Delta^{-1}} of a smooth periodic function {f: {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}} of mean zero is given by the Fourier series formula

\displaystyle  \Delta^{-1} f(x) := \sum_{k \in {\bf Z}^d \backslash \{0\}} \frac{1}{-4\pi^2 |k|^2} \hat f(k) e^{2\pi i k \cdot x}.

(Note that {\nabla \cdot F} automatically has mean zero.) It is easy to see that {\Delta \Delta^{-1} f = f} for such functions {f}, thus justifying the choice of notation. We refer to {F - \nabla \Delta^{-1} (\nabla \cdot F)} as the (periodic) Leray projection of {F} and denote it by {\mathbb{P}(F)}, thus in the above solution we have {v = \mathbb{P}(F)}. By construction, {\mathbb{P}(F)} is divergence-free, and {\mathbb{P}(F)} vanishes whenever {F} is a gradient {F = \nabla p}.

If we require {F,v} to be {{\bf Z}^d}-periodic, but do not require {p} to be {{\bf Z}^d}-periodic, then by the previous uniqueness discussion, the general solution is now

\displaystyle  p(x) = C + \Delta^{-1} (\nabla \cdot F)(x) + w \cdot x

\displaystyle  v(x) = F(x) - \nabla \Delta^{-1} (\nabla \cdot F)(x) - w = \mathbb{P}(F) - w

where {C \in {\bf R}} and {w \in {\bf R}^d} are arbitrary.

The above discussion was for smooth periodic functions {F,p,v}, but one can make the same construction in other function spaces. For instance, recall that for any {s \geq 0}, the Sobolev space {H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^m)} consists of those elements {f} of {L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^m)} whose Sobolev norm

\displaystyle  \| f \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^m)} := (\sum_{k \in {\bf Z}^d} \langle k \rangle^{2s} |\hat f(k)|^2)^{1/2}

is finite, where we use the “Japanese bracket” convention {\langle k\rangle := (1 + |k|^2)^{1/2}}. (One can also define Sobolev spaces for negative {s}, but we will not need them here.) Basic properties of these Sobolev spaces can be found in this previous post. From comparing Fourier coefficients we see that the operators {\Delta^{-1}(\nabla \cdot)} and {\nabla \Delta^{-1} (\nabla \cdot)} defined for smooth periodic functions can be extended without difficulty to {H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} (taking values in {H^{s+1}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} and {H^s({\bf R}^d/{\bf Z}^d \rightarrow{\bf R}^d)} respectively), with bounds of the form

\displaystyle  \| \nabla \Delta^{-1} (\nabla \cdot F)\|_{H^{s}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} \lesssim \| \Delta^{-1} (\nabla \cdot F)\|_{H^{s+1}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}

\displaystyle \lesssim \|F\|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}.

Thus, if {F \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}, then one can solve (15) (in the sense of distributions, at least) with some {v \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} and {p \in H^{s+1}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}, with bounds

\displaystyle  \| v \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}, \| p \|_{H^{s+1}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \lesssim \|F\|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}.

In particular, the Leray projection {\mathbb{P}} is bounded on {H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}. (In fact it is a contraction; see Exercise 16.)

One can argue similarly in the non-periodic setting, as long as one avoids the one-dimensional case {d=1} which contains some technical divergences. Recall (see e.g., these previous lecture notes on this blog) that functions {f \in L^2({\bf R}^d \rightarrow {\bf R}^m)} have a Fourier transform {\hat f \in L^2({\bf R}^d \rightarrow {\bf R}^m)}, which for {f} in the dense subclass {L^1({\bf R}^d \rightarrow {\bf R}^m) \cap L^2({\bf R}^d \rightarrow {\bf R}^m)} of {L^2({\bf R}^d \rightarrow {\bf R}^m)} is defined by the formula

\displaystyle  \hat f(\xi) := \int_{{\bf R}^d} f(x) e^{-2\pi i x \cdot \xi}\ dx

and then is extended to the rest of {L^2({\bf R}^d \rightarrow {\bf R}^m)} by continuous extension in the {L^2} topology, taking advantage of the Plancherel identity

\displaystyle  \|\hat f\|_{L^2({\bf R}^d \rightarrow {\bf R}^m)} = \| f \|_{L^2({\bf R}^d \rightarrow {\bf R}^m)}. \ \ \ \ \ (20)

The Fourier transform is then extended to tempered distributions in the usual fashion (see this previous set of notes).

We then define the Sobolev space {H^s({\bf R}^d \rightarrow {\bf R}^m)} for {s \geq 0} to be the collection of those functions {f \in L^2({\bf R}^d \rightarrow {\bf R}^m)} for which the norm

\displaystyle  \| f \|_{H^s({\bf R}^d \rightarrow {\bf R}^m)} := (\int_{{\bf R}^d} \langle \xi \rangle^{2s} |\hat f(\xi)|^2\ d\xi)^{1/2}

is finite; equivalently, one has

\displaystyle  \| f \|_{H^s({\bf R}^d \rightarrow {\bf R}^m)} = \| \langle \nabla \rangle^s f \|_{L^2({\bf R}^d \rightarrow{\bf R}^m)}

where the Fourier multiplier {\langle \nabla \rangle^s} is defined by

\displaystyle  \widehat{\langle \nabla \rangle^s f}(\xi) = \langle \xi \rangle^s \hat f(\xi).

For any vector-valued function {F: {\bf R}^d \rightarrow {\bf R}^d} in the Schwartz class, we define {\Delta^{-1} (\nabla \cdot F)} to be the scalar tempered distribution whose (distributional) Fourier transform is given by the formula

\displaystyle  \widehat{\Delta^{-1} (\nabla \cdot F)}(\xi) = -\frac{2\pi i \xi \cdot \hat F(\xi)}{4\pi^2 |\xi|^2} \ \ \ \ \ (21)

and define the Leray projection {\mathbb{P} F} to be the vector-valued distribution

\displaystyle  \mathbb{P} F = F - \nabla \Delta^{-1} (\nabla \cdot F) \ \ \ \ \ (22)

or in terms of the (distributional) Fourier transform

\displaystyle  \widehat{\mathbb{P} F}(\xi) = \hat F(\xi) - \frac{\xi}{|\xi|^2} \xi \cdot \hat F(\xi).

Then by using the well-known relationship

\displaystyle  \widehat{\partial_j F}(\xi) = 2\pi i \xi_j \hat F(\xi)

between (distributional) derivatives and (distributional) Fourier transforms we see that the tempered distributions

\displaystyle  p = \Delta^{-1} (\nabla \cdot F), v = F - \nabla \Delta^{-1} (\nabla \cdot F)

solve the equation (15) in the distributional sense, and hence also in the classical sense since {p,v} have rapidly decreasing Fourier transforms and are thus smooth.

As in the periodic case we see that we have the bound

\displaystyle  \| \mathbb{P}(F) \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)} \lesssim \| F \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}

for all Schwartz vector fields {F} (in fact {\mathbb{P}} is again a contraction), so we can extend the Leray projection without difficulty to {H^s({\bf R}^d \rightarrow {\bf R}^d)} functions. The operator {\Delta^{-1} (\nabla \cdot)} can similarly be extended continuously to a map from {H^s({\bf R}^d \rightarrow {\bf R}^m)} to the space {\{ f: \nabla f \in H^s({\bf R}^d \rightarrow {\bf R}^d)\}} of scalar tempered distributions with gradient in {H^s({\bf R}^d \rightarrow {\bf R}^d)}, although we will not need to work directly with the pressure much in this course. This allows us to solve (15) in a distributional sense for all {F \in H^s({\bf R}^d \rightarrow {\bf R}^d)}.
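The Fourier-side reason for the contraction property is elementary linear algebra: at each frequency {\xi}, the symbol of {\mathbb{P}} is the matrix {I - \xi \xi^T / |\xi|^2}, which is the orthogonal projection onto the hyperplane {\xi^\perp} and thus has operator norm at most {1}. A small numerical check of this (the dimension {3} and the random test frequencies are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(100):
    xi = rng.standard_normal(3)
    # symbol of the Leray projection at frequency xi
    P = np.eye(3) - np.outer(xi, xi) / np.dot(xi, xi)
    assert np.allclose(P @ P, P)       # idempotent: a projection
    assert np.allclose(P, P.T)         # symmetric: an *orthogonal* projection
    assert np.allclose(P @ xi, 0)      # annihilates the gradient direction
    assert np.linalg.norm(P, 2) <= 1 + 1e-12   # operator norm <= 1
```

Since the {H^s} norms are weighted {L^2} norms of the Fourier transform, a pointwise-in-frequency norm bound of {1} immediately gives the contraction property on every {H^s}.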

Remark 15 (Remark removed due to inaccuracy.)

Exercise 16 (Hodge decomposition) Define the following three subspaces of the Hilbert space {L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}:

  • {d{\mathcal E}(\Omega^0)} is the space of all elements of {L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} of the form {u = \nabla f} (in the sense of distributions) for some {f \in H^1({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})};
  • {{\mathcal H}^1} is the space of all elements of {L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} that are weakly harmonic in the sense that {\Delta u = 0} (in the sense of distributions).
  • {d^*{\mathcal E}(\Omega^2)} is the space of all elements {u = (u_1,\dots,u_n)} of {L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} which take the form

    \displaystyle  u_i = \partial_j \omega_{ij}

    (with the usual summation conventions) for some tensor {(\omega_{ij})_{1 \leq i,j \leq d} \in H^1({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^{d^2})} obeying the antisymmetry property {\omega_{ji} = -\omega_{ij}}.

Show that these three spaces are closed subspaces of {L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}, and that one has the orthogonal decomposition

\displaystyle  L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d) = d{\mathcal E}(\Omega^0) \oplus {\mathcal H}^1 \oplus d^*{\mathcal E}(\Omega^2).

Exercise 17 (Helmholtz decomposition) Define the following two subspaces of the Hilbert space {L^2({\bf R}^d \rightarrow {\bf R}^d)}:

  • {H_{\mathrm{df}}} is the space of functions {u \in L^2({\bf R}^d \rightarrow {\bf R}^d)} which are divergence-free, by which we mean that {\nabla \cdot u = 0} in the sense of distributions.
  • {H_{\mathrm{cf}}} is the space of functions {u \in L^2({\bf R}^d \rightarrow {\bf R}^d)} which are curl-free, by which we mean that {\nabla \wedge u = 0} in the sense of distributions, where {\nabla \wedge u} is the rank two tensor with components {(\nabla \wedge u)_{ij} := \partial_i u_j - \partial_j u_i}.
  • (a) Show that these two spaces are closed subspaces of {L^2({\bf R}^d \rightarrow {\bf R}^d)}, and one has the orthogonal decomposition

    \displaystyle  L^2({\bf R}^d \rightarrow {\bf R}^d) = H_{\mathrm{df}} \oplus H_{\mathrm{cf}}.

    This is known as the Helmholtz decomposition (particularly in the three-dimensional case {d=3}, in which one can interpret {\nabla \wedge u} as the curl of {u}).

  • (b) Show that on {L^2({\bf R}^d \rightarrow {\bf R}^d)}, the Leray projection {\mathbb{P}} is the orthogonal projection to {H_{\mathrm{df}}}.
  • (c) Show that the Leray projection is a contraction on {H^s({\bf R}^d \rightarrow {\bf R}^d)} for all {s \geq 0}.

Exercise 18 (Singular integral form of Leray projection) Let {d \geq 3}. Then the function {x \mapsto \frac{1}{|x|^{d-2}}} is locally integrable and thus well-defined as a distribution.

Remark 19 One can also solve (15) in {L^q}-based Sobolev spaces for exponents {1 < q < \infty} other than {q=2} by using Calderón-Zygmund theory and the singular integral form of the Leray projection given in Exercise 18. However, we will try to avoid having to rely on this theory in these notes.

— 3. The heat equation —

We now turn to the study of the heat equation

\displaystyle  \partial_t u = \nu \Delta u \ \ \ \ \ (23)

on a spacetime region {[0,T] \times {\bf R}^d}, with initial data {u(0) = u_0}, where {\nu>0} is a fixed constant; we also consider the inhomogeneous analog

\displaystyle  \partial_t u = \nu \Delta u + F \ \ \ \ \ (24)

with some forcing term {F: [0,T] \times {\bf R}^d \rightarrow {\bf R}}.

Formally, the solution to the initial value problem for (23) should be given by {u(t) = e^{\nu t \Delta} u_0}, and (by the Duhamel formula (6)) the solution to (24) should similarly be

\displaystyle  u(t) = e^{\nu t \Delta} u_0 + \int_0^t e^{\nu (t-s) \Delta} F(s)\ ds;

but there are subtleties arising from the unbounded nature of {\Delta}.

The first issue is that even if {u_0} vanishes and {u} is required to be smooth without any decay hypothesis at infinity, one can have non-uniqueness. The following counterexample is basically due to Tychonoff:

Exercise 20 (Tychonoff example) Let {1 < \theta < 2} and {\nu > 0}.

  • (a) Show that there exists smooth, compactly supported function {\phi: {\bf R} \rightarrow {\bf R}}, not identically zero, obeying the derivative bounds

    \displaystyle  |\phi^{(k)}(t)| \leq (Ck)^{\theta k}

    for all {k \geq 0} and {t \in {\bf R}}. (Hint: one can construct {\phi = \psi_1 * \psi_2 * \dots} as the convolution of an infinite number of approximate identities {\psi_n}, where each {\psi_n} is supported on an interval of length {n^{-\theta}}, and use the identity {\frac{d}{dt}(f*g) = (\frac{d}{dt} f) * g = f * \frac{d}{dt} g} repeatedly. To justify things rigorously, one may need to first work with finite convolutions and take limits.)

  • (b) With {\phi} as in part (a), show that the function

    \displaystyle  u(t,x) := \sum_{k=0}^\infty \frac{x^{2k}}{\nu^k (2k)!} \phi^{(k)}(t)

    is well-defined as a smooth function on {{\bf R} \times {\bf R}} that is compactly supported in time, and obeys the heat equation (23) for {d=1} without being identically zero.

  • (c) Show that the initial value problem to (23) is not unique (for any dimension {d \geq 1}) if {u} is only required to be smooth, even if {u_0} vanishes.
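The termwise bookkeeping behind part (b) can be checked mechanically: applying {\nu \partial_x^2} to the {k}-th term of the series turns it into the {(k-1)}-th term with {\phi^{(k)}} in place of {\phi^{(k-1)}}, so the series formally satisfies {\partial_t u = \nu \partial_x^2 u}. A short script verifying the coefficient identity behind this:

```python
from math import factorial

# Applying nu * d^2/dx^2 to the k-th term x^{2k} phi^{(k)}(t) / (nu^k (2k)!)
# of the Tychonoff series gives x^{2k-2} phi^{(k)}(t) / (nu^{k-1} (2k-2)!),
# i.e. the (k-1)-th term with one extra time derivative on phi.  Check the
# coefficient identity
#   nu * (2k)(2k-1) / (nu^k (2k)!)  ==  1 / (nu^{k-1} (2k-2)!)
nu = 3.0  # arbitrary test value of the viscosity
for k in range(1, 15):
    lhs = nu * (2 * k) * (2 * k - 1) / (nu ** k * factorial(2 * k))
    rhs = 1.0 / (nu ** (k - 1) * factorial(2 * k - 2))
    assert abs(lhs - rhs) <= 1e-12 * rhs
```

The derivative bounds from part (a) are what make the resulting doubly-infinite bookkeeping converge absolutely, so the formal termwise computation is legitimate.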

Exercise 21 (Kowalevski example)

  • (a) Let {u_0: {\bf R} \rightarrow {\bf R}} be the function {u_0(x) := \frac{1}{1+x^2}}. Show that there does not exist any solution {u: {\bf R} \times {\bf R} \rightarrow {\bf R}} to (23) that is jointly real analytic in {t,x} at {0} (that is to say, it can be expressed as an absolutely convergent power series in {t,x} in a neighbourhood of {0}).
  • (b) Modify the above example by replacing {\frac{1}{1+x^2}} by a function that extends to an entire function on {{\bf C}} (as opposed to {z \mapsto \frac{1}{1+z^2}}, which has poles at {\pm i}).

This classic example, due to Sofia Kowalevski, demonstrates the need for some hypotheses on the PDE in order to invoke the Cauchy-Kowalevski theorem.

One can recover uniqueness (forwards in time) by imposing some growth condition at infinity. We give a simple example of this, which illustrates a basic tool in the subject, namely the energy method, which is based on understanding the rate of change of various “energy” integrals of integrands which primarily involve quadratic expressions of the solution or its derivatives. The reason for favouring quadratic expressions is that they are more likely to produce integrals with a definite sign (positive definite or negative definite), such as (squares of) {L^2} norms or higher Sobolev norms of the solution, particularly after suitable application of integration by parts.

Proposition 22 (Uniqueness with energy bounds) Let {T>0}, and let {u, v: [0,T] \times {\bf R}^d \rightarrow {\bf R}^m} be smooth solutions to (24) with common initial data {u(0) = v(0) = u_0} and forcing term {F: [0,T] \times {\bf R}^d \rightarrow {\bf R}^m} such that the norm

\displaystyle  \| u \|_{L^\infty_t L^2_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^m)} := \sup_{t \in [0,T]} \|u(t)\|_{L^2({\bf R}^d \rightarrow {\bf R}^m)}

of {u} is finite, and similarly for {v}. Then {u=v}.

Proof: As the heat equation (23) is linear, we may subtract {v} from {u} and assume without loss of generality that {v=0}, {u_0=0}, and {F=0}. By working with each component separately we may take {m=1}.

Let {\eta: {\bf R}^d \rightarrow {\bf R}} be a non-negative test function supported on {B(0,2)} that equals {1} on {B(0,1)}. Let {R>0} be a parameter, and consider the “energy” (or more precisely, “local mass”)

\displaystyle  E_R(t) := \int_{{\bf R}^d} u(t,x)^2 \eta(x/R)\ dx

for {t \in [0,T]}. As {u(0)=u_0=0}, we have {E_R(0)=0}. As {u} is smooth and {\eta} is compactly supported, {E_R(t)} depends smoothly on {t}, and we can differentiate under the integral sign to obtain

\displaystyle  \partial_t E_R(t) = 2 \int_{{\bf R}^d} u(t,x) \partial_t u(t,x) \eta(x/R)\ dx.

Using (23) we thus have

\displaystyle  \partial_t E_R(t) = 2 \nu \int_{{\bf R}^d} u(t,x) \partial_i \partial_i u(t,x) \eta(x/R)\ dx

using the usual summation conventions.

A basic rule of thumb in the energy method is this: whenever one is faced with an integral in which one term in the integrand has much lower regularity (or much less control on regularity) than any other, due to a large number of derivatives placed on that term, one should integrate by parts to move one or more derivatives off of that term to other terms in order to make the distribution of derivatives more balanced (which, as we shall see, tends to make the integrals easier to estimate, or to ascribe a definite sign to). Accordingly, we integrate by parts to write

\displaystyle  \partial_t E_R(t) = - 2 \nu \int_{{\bf R}^d} \partial_i u(t,x) \partial_i u(t,x) \eta(x/R)\ dx

\displaystyle  - 2 \nu R^{-1} \int_{{\bf R}^d} u(t,x) \partial_i u(t,x) \partial_i \eta(x/R)\ dx.

The first term is non-positive, thus we may discard it to obtain the inequality

\displaystyle  \partial_t E_R(t) \leq - 2 \nu R^{-1} \int_{{\bf R}^d} u(t,x) \partial_i u(t,x) \partial_i \eta(x/R)\ dx.

Another rule of thumb in the energy method is to keep an eye out for opportunities to express some expression appearing in the integrand as a total derivative. In this case, we can write

\displaystyle  2 u(t,x) \partial_i u(t,x) = \partial_i ( u(t,x)^2 )

and then integrate by parts to move the derivative onto the much more slowly varying function {\eta(x/R)} to conclude

\displaystyle  \partial_t E_R(t) \leq \nu R^{-2} \int_{{\bf R}^d} u(t,x)^2 \partial_i \partial_i \eta(x/R)\ dx.

In particular we have a bound of the form

\displaystyle  \partial_t E_R(t) \lesssim_{\nu,\eta} R^{-2} \| u \|_{L^\infty_t L^2_x([0,T] \times {\bf R}^d)}^2

where the subscript indicates that the implied constant can depend on {\nu} and {\eta}. Since {E_R(0)=0}, we conclude from the fundamental theorem of calculus that

\displaystyle  E_R(t) \lesssim_{\nu,\eta,T} R^{-2} \| u \|_{L^\infty_t L^2_x([0,T] \times {\bf R}^d)}^2

for all {t \in [0,T]} (note how it is important here that we evolve forwards in time, rather than backwards). Sending {R \rightarrow \infty} and using the dominated convergence theorem, we conclude that

\displaystyle  \int_{{\bf R}^d} u(t,x)^2\ dx \lesssim_{\nu,\eta,T} 0

and thus {u} vanishes identically, as required. \Box

Now we turn to existence for the heat equation, restricting attention to forward in time solutions. Formally, if one solves the heat equation (23), then on taking spatial Fourier transforms

\displaystyle  \hat u(t,\xi) := \int_{{\bf R}^d} u(t,x) e^{-2\pi i \xi \cdot x}\ dx

the equation transforms to the ODE

\displaystyle  \partial_t \hat u(t,\xi) = - 4\pi^2 \nu |\xi|^2 \hat u(t,\xi)

which when combined with the initial condition {u(0) = u_0} gives

\displaystyle  \hat u(t,\xi) = e^{- 4\pi^2 \nu |\xi|^2 t} \hat u_0(\xi)

and hence by the Fourier inversion formula we arrive (formally, at least) at the representation

\displaystyle  u(t,x) = \int_{{\bf R}^d} e^{- 4\pi^2 \nu |\xi|^2 t} \hat u_0(\xi) e^{2\pi i \xi \cdot x}\ d\xi. \ \ \ \ \ (25)

As we are assuming forward time evolution {t \geq 0}, the exponential factor {e^{- 4\pi^2 \nu |\xi|^2 t}} here is bounded. If {u_0:{\bf R}^d \rightarrow {\bf R}^m} is a Schwartz function, then {\hat u_0} is also Schwartz, and this formula is certainly well-defined and smooth in both time and space (and rapidly decreasing in space for any fixed time), and in particular lies in {L^\infty_t L^2_x([0,+\infty) \times {\bf R}^d)}; one can easily justify differentiation under the integral sign to conclude that (23) is indeed verified, and the Fourier inversion formula shows that we have the initial data condition {u(0)=u_0}. So this is the unique solution to the initial value problem (23) for the heat equation that lies in {L^\infty_t L^2_x}. By definition we declare the right-hand side of (25) to be {e^{\nu t\Delta} u_0}, thus

\displaystyle  e^{\nu t \Delta} u_0(x) = \int_{{\bf R}^d} e^{- 4\pi^2 \nu |\xi|^2 t} \hat u_0(\xi) e^{2\pi i \xi \cdot x}\ d\xi \ \ \ \ \ (26)

for all {t \geq 0} and all Schwartz functions {u_0}; equivalently, one has

\displaystyle  \widehat{e^{\nu t \Delta} u_0}(\xi) = e^{- 4\pi^2 \nu |\xi|^2 t} \hat u_0(\xi). \ \ \ \ \ (27)

(One can justify this choice of notation using the functional calculus of the self-adjoint operator {\Delta}, as discussed for instance in this previous blog post, but we will not do so here since the Fourier transform is available as a substitute.) It is also clear from (27) that {e^{\nu t\Delta}} commutes with other Fourier multipliers such as {\langle \nabla \rangle^s} or constant-coefficient differential operators, on Schwartz functions at least.

From (27) and Plancherel’s theorem we see that {e^{\nu t \Delta}} for {t \geq 0} is a contraction in (the Schwartz functions of) {L^2({\bf R}^d \rightarrow {\bf R}^m)}, and more generally in {H^s({\bf R}^d \rightarrow {\bf R}^m)} for any {s \geq 0}, thus

\displaystyle  \| e^{\nu t \Delta} u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^m)} \leq \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^m)}

for any Schwartz {u_0: {\bf R}^d \rightarrow {\bf R}^m} and any {s, t \geq 0}. Thus by density one can extend the heat propagator {e^{\nu t \Delta}} for {t \geq 0} to all of {L^2({\bf R}^d \rightarrow {\bf R}^m)}, in a fashion that is a contraction on {L^2({\bf R}^d \rightarrow {\bf R}^m)} and more generally on {H^s({\bf R}^d \rightarrow {\bf R}^m)}. By a limiting argument, (27) holds almost everywhere for all {u_0 \in L^2({\bf R}^d \rightarrow {\bf R}^m)}.

There is also a smoothing effect:

Exercise 23 (Smoothing effect) Let {s' \geq s \geq 0}. Show that

\displaystyle  \| e^{\nu t \Delta} u_0 \|_{H^{s'}({\bf R}^d \rightarrow {\bf R}^m)} \lesssim_{s,s'} (1+(\nu t)^{-\frac{s'-s}{2}}) \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^m)}

for all {u_0 \in H^s({\bf R}^d \rightarrow {\bf R}^m)} and {\nu, t > 0}.
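Exercise 23 reduces, via Plancherel, to bounding the multiplier {\langle \xi \rangle^{s'-s} e^{-4\pi^2 \nu t |\xi|^2}}. The supremum of this multiplier can be computed exactly from its critical point by calculus, and the following sketch checks the claimed shape of the bound numerically over a wide range of {\nu t} (the exponent {s'-s = 2} is an arbitrary test choice):

```python
import math

def multiplier_sup(a, b):
    """sup over r >= 0 of (1 + r^2)^(a/2) * exp(-b r^2), where a = s' - s >= 0
    and b = 4 pi^2 nu t > 0.  Writing rho = r^2, calculus gives the critical
    point rho* = a/(2b) - 1 when this is positive, and rho* = 0 otherwise."""
    rho = max(a / (2 * b) - 1.0, 0.0)
    return (1 + rho) ** (a / 2) * math.exp(-b * rho)

# check the Exercise 23 shape of the bound: sup <= 1 + (nu t)^{-(s'-s)/2},
# here with test exponent s' - s = 2, across many scales of nu t
a = 2.0
for nut in [10.0 ** e for e in range(-8, 9)]:
    b = 4 * math.pi ** 2 * nut
    assert multiplier_sup(a, b) <= 1 + nut ** (-a / 2)
```

For {\nu t \gtrsim 1} the supremum is just {1} (the Gaussian factor dominates), while for small {\nu t} it blows up like {(\nu t)^{-(s'-s)/2}}, matching the two terms of the estimate.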

Exercise 24 (Fundamental solution for the heat equation) For {u_0 \in L^2({\bf R}^d \rightarrow {\bf R}^m)} and {t>0}, establish the identity

\displaystyle  e^{\nu t \Delta} u_0(x) = \frac{1}{(4\pi \nu t)^{d/2}} \int_{{\bf R}^d} e^{-\frac{|x-y|^2}{4\nu t}} u_0(y)\ dy

for almost every {x \in {\bf R}^d}. (Hint: first work with Schwartz functions. Either compute the Fourier transform explicitly, or verify directly that the heat equation initial value problem is solved by the right-hand side.) Conclude in particular that (after modification on a measure zero set if necessary) {e^{\nu t \Delta} u_0} is smooth for any {t>0}.
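One way to sanity-check this exercise numerically is to evolve a Gaussian, for which the heat flow has a closed form (convolving Gaussians adds variances, so {e^{\nu t \Delta}} takes a Gaussian of variance {\sigma^2} to one of variance {\sigma^2 + 2\nu t}, up to an amplitude factor), and compare against the Fourier multiplier (27) implemented with the FFT on a large periodic grid standing in for {{\bf R}} (the box size, resolution, and parameter values below are arbitrary choices):

```python
import numpy as np

nu, t, s = 0.7, 0.3, 1.0
L, N = 40.0, 2048                      # large periodic box standing in for R
x = (np.arange(N) - N // 2) * (L / N)
u0 = np.exp(-x ** 2 / (2 * s ** 2))    # Gaussian initial data

# heat propagator via the Fourier multiplier (27)
xi = np.fft.fftfreq(N, d=L / N)        # frequencies, in cycles per unit length
u_fft = np.real(np.fft.ifft(np.exp(-4 * np.pi ** 2 * nu * t * xi ** 2) * np.fft.fft(u0)))

# exact evolution: the variance s^2 becomes s^2 + 2 nu t
s2 = s ** 2 + 2 * nu * t
u_exact = (s / np.sqrt(s2)) * np.exp(-x ** 2 / (2 * s2))

assert np.abs(u_fft - u_exact).max() < 1e-10
assert (u_fft ** 2).sum() <= (u0 ** 2).sum()   # e^{nu t Delta} is an L^2 contraction
```

The agreement is at machine precision because the Gaussian tails are negligible at the edges of the box, so the periodization error is invisible.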

Exercise 25 (Ill-posedness of the backwards heat equation) Show that there exists a Schwartz function {u_0: {\bf R}^d \rightarrow {\bf R}} with the property that there is no solution {u \in L^\infty_t L^2_x([-T,0] \times {\bf R}^d \rightarrow {\bf R})} to (23) with final data {u(0)=u_0} for any {-T < 0}. (Hint: choose {u_0} so that the Fourier transform {\hat u_0} decays somewhat, but not extremely rapidly. Then argue by contradiction using (27).)

Exercise 26 (Continuity in the strong operator topology) For any {s \geq 0}, let {C^0_t H^s_x([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^m)} denote the Banach space of functions {u: [0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^m} such that for each {t}, {u(t)} lies in {H^s_x({\bf R}^d \rightarrow {\bf R}^m)} and varies continuously and boundedly in {t} in the strong topology, with norm

\displaystyle  \| u \|_{C^0_t H^s_x([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^m)} := \sup_{t \in [0,+\infty)} \| u(t) \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^m)}.

Show that if {u_0 \in H^s_x({\bf R}^d \rightarrow {\bf R}^m)} and {u(t) = e^{\nu t \Delta} u_0} solves the heat equation on {[0,+\infty) \times {\bf R}^d}, then {u \in C^0_t H^s_x([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^m)} with

\displaystyle  \| u \|_{C^0_t H^s_x([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^m)} \leq \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^m)}.

Similar considerations apply to the inhomogeneous heat equation (24). If {u_0: {\bf R}^d \rightarrow {\bf R}^m} and {F: [0,T] \times {\bf R}^d \rightarrow {\bf R}^m} are Schwartz for some {T>0}, then the function {u: [0,T] \times {\bf R}^d \rightarrow {\bf R}^m} defined by the Duhamel formula

\displaystyle  u(t) := e^{\nu t \Delta} u_0 + \int_0^t e^{\nu(t-s) \Delta} F(s)\ ds \ \ \ \ \ (28)

can easily be verified to also be Schwartz and solve (24) with initial data {u_0}; by Proposition 22, this is the only such solution in {L^\infty_t L^2_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^m)}. It also obeys good estimates:

Exercise 27 (Energy estimates) Let {u_0: {\bf R}^d \rightarrow {\bf R}^m} and {F, G: [0,T] \times {\bf R}^d \rightarrow {\bf R}^m} be Schwartz functions for some {T>0}, and let {u} be the solution to the equation

\displaystyle  \partial_t u = \nu \Delta u + F + \nabla G

with initial condition {u(0) = u_0} given by the Duhamel formula. For any {s \geq 0}, establish the energy estimate

\displaystyle  \| u \|_{C^0_t H^s_x([0,T] \times {\bf R}^d\rightarrow {\bf R}^m)} + \nu^{1/2} \| \nabla u \|_{L^2_t H^s_x([0,T] \times {\bf R}^d\rightarrow {\bf R}^m)} \ \ \ \ \ (29)

\displaystyle  \lesssim \|u_0\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^m)} + \| F \|_{L^1_t H^s_x([0,T] \times {\bf R}^d\rightarrow {\bf R}^m)}

\displaystyle  + \nu^{-1/2} \| G \|_{L^2_t H^s_x([0,T] \times {\bf R}^d\rightarrow {\bf R}^m)}

in two different ways:

  • (i) By using the Fourier representation (27) and Plancherel’s formula;
  • (ii) By using energy methods as in the proof of Proposition 22. (Hint: first reduce to the case {s=0}. You may find the arithmetic mean-geometric mean inequality {ab \leq \frac{1}{2} a^2 + \frac{1}{2} b^2} to be useful.)

Here of course we are using the norms

\displaystyle  \| F \|_{L^1_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^m)} := \int_{[0,T]} \| F(t) \|_{H^s({\bf R}^d \rightarrow {\bf R}^m)}\ dt


\displaystyle  \| G \|_{L^2_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^m)} := (\int_{[0,T]} \| G(t) \|_{H^s({\bf R}^d \rightarrow {\bf R}^m)}^2\ dt)^{1/2}.

The energy estimate contains some smoothing effects similar (though not identical) to those in Exercise 23, since it shows that {u} can in principle be one degree of regularity smoother than {u_0} (if one averages in time in an {L^2} sense, and the viscosity {\nu} is not sent to zero), and two degrees of regularity smoother than the forcing term {\nabla G} (with the same caveats). As we shall shortly see, this smoothing effect will allow us to handle the nonlinear terms in the Navier-Stokes equations for the purposes of setting up a local well-posedness theory.
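The one-derivative gain in the {L^2_t} sense can be seen mode by mode: by Plancherel, the contribution of a frequency {k \neq 0} to {\nu \| \nabla u \|_{L^2_t L^2_x}^2} for the free heat flow is governed by the identity {\nu \int_0^\infty |k|^2 e^{-2\nu t |k|^2}\ dt = \frac{1}{2}}, uniformly in {k} and {\nu}. A quick numerical confirmation of this identity (the quadrature grid and the values of {\nu} and {k} below are arbitrary illustrative choices):

```python
import numpy as np

# Per-mode identity behind the nu^{1/2} ||grad u||_{L^2_t H^s} term: for
# the free heat flow and any frequency k != 0,
#   nu * int_0^infty k^2 exp(-2 nu t k^2) dt = 1/2,
# uniformly in k and nu.  Quadrature grid and parameter values are
# arbitrary illustrative choices.
nu = 0.37
vals = []
for k in [1.0, 3.0, 10.0]:
    Tmax = 20.0 / (2 * nu * k**2)      # truncate where the tail is e^{-20}
    t = np.linspace(0.0, Tmax, 200001)
    f = nu * k**2 * np.exp(-2 * nu * t * k**2)
    vals.append((t[1] - t[0]) * (f.sum() - 0.5 * (f[0] + f[-1])))  # trapezoid
print(vals)     # each value close to 1/2
```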

Exercise 28 (Distributional solution) Let {s \geq 0}, let {u_0 \in H^s({\bf R}^d \rightarrow {\bf R})}, and let {F \in C^0_t H^s_x( [0,T] \times {\bf R}^d \rightarrow {\bf R})} for some {T>0}. Let {u \in C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R})} be given by the Duhamel formula (28). Show that (24) is true in the spacetime distributional sense, or more precisely that

\displaystyle  \langle \partial_t u(t), \phi \rangle = \langle \nu\Delta u + F, \phi \rangle \ \ \ \ \ (30)

in the sense of spacetime distributions for any test function {\phi: [0,T] \times {\bf R}^d \rightarrow {\bf R}} supported in the interior of {[0,T] \times {\bf R}^d}.

Pretty much all of the above discussion can be extended to the periodic setting:

Exercise 29 Let {d,m \geq 1} and {\nu > 0}.

Remark 30 The heat equation for negative viscosities {\nu < 0} can be transformed into a positive viscosity heat equation by time reversal: if {u} solves the equation {\partial_t u = - \nu \Delta u}, then {(t,x) \mapsto u(-t, x)} solves the equation {\partial_t u = \nu \Delta u}. Thus one can solve negative viscosity heat equations (also known as backwards heat equations) backwards in time, but one tends not to have well-posedness forwards in time. In a similar spirit, if {\nu} is positive, one can normalise it to (say) {1} by an appropriate rescaling of the time variable, {t \mapsto t/\nu}. However, we will generally keep the parameter {\nu} non-normalised in preparation for understanding the limit as {\nu \rightarrow 0}.

— 4. Local well-posedness for Navier-Stokes —

We now have all the ingredients necessary to create a local well-posedness theory for the Navier-Stokes equations (1).

We first dispose of the one-dimensional case {d=1}, which is rather degenerate as incompressible one-dimensional fluids are somewhat boring. Namely, suppose that one had a smooth solution to the one-dimensional Navier-Stokes equations

\displaystyle  \partial_t u + u \partial_x u = - \partial_x p + \nu \partial_{xx} u

\displaystyle  \partial_x u = 0.

The second equation implies that {u} is just a function of time, {u = u(t)}, and the first equation becomes

\displaystyle  u'(t) = - \partial_x p(t,x).

To solve this equation, one can set {u: {\bf R} \rightarrow {\bf R}} to be an arbitrary smooth function of time, and then set

\displaystyle  p(t,x) = a(t) - u'(t) x

for an arbitrary smooth function {a: {\bf R} \rightarrow {\bf R}}. If one requires the pressure to be bounded, then {u'} vanishes identically, and then {u} is constant in time, which among other things shows that the initial value problem is (rather trivially) well-posed in the category of smooth solutions, up to the ability to alter the pressure by an arbitrary constant {a(t)}. On the other hand, if one does not require the pressure to stay bounded, then one has a lot less uniqueness, since the function {u(t)} is essentially unconstrained.
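The computation above can be confirmed symbolically; the following sympy sketch (with {u} and {a} arbitrary smooth functions of time) checks that the pair {u = u(t)}, {p = a(t) - u'(t) x} has vanishing residual in the one-dimensional momentum equation.

```python
import sympy as sp

# The 1D computation above, verified symbolically: with u = u(t) and
# p(t,x) = a(t) - u'(t) x (u, a arbitrary smooth functions of time),
# the residual of the 1D momentum equation vanishes identically.
t, x, nu = sp.symbols('t x nu')
u = sp.Function('u')(t)
a = sp.Function('a')(t)
p = a - sp.diff(u, t) * x

residual = sp.diff(u, t) + u * sp.diff(u, x) + sp.diff(p, x) \
    - nu * sp.diff(u, x, 2)
print(sp.simplify(residual))   # 0
```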

Now we work in two or higher dimensions {d \geq 2}, and consider solutions to (1) on the spacetime region {[0,T] \times {\bf R}^d}. To begin with, we assume that {u: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d} is smooth and periodic in space: {u(t,x+n) = u(t,x)} for {n \in {\bf Z}^d}; we assume {p} is smooth but do not place any periodicity hypotheses on it. Then, by (1), {\nabla p} is periodic. In particular, for any {n \in {\bf Z}^d} and {t}, the function {x \mapsto p(t,x+n)-p(t,x)} has vanishing gradient and is thus constant in {x}, so that

\displaystyle  p(t,x+n) - p(t,x) = q_n(t)

for all {x \in {\bf R}^d} and some function {q_n(t)} of {t}. The map {n \mapsto q_n(t)} is a homomorphism for fixed {t}, so we can write {q_n(t) = n \cdot q(t)} for some {q: [0,T] \rightarrow {\bf R}^d}, which will be smooth since {p} is smooth. We thus have {p(t,x) = p_0(t,x) + x \cdot q(t)} for some smooth {{\bf Z}^d}-periodic function {p_0}. By subtracting off the mean, we can further decompose

\displaystyle  p(t,x) = p_1(t,x) + x \cdot q(t) + r(t)

for some smooth function {r: [0,T] \rightarrow {\bf R}} and some smooth {{\bf Z}^d}-periodic function {p_1: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}} which has mean zero at every time.

Note that one can simply omit the constant term {r(t)} from the pressure without affecting the system (1). One can also eliminate the linear term {x \cdot q(t)} by the following “generalised Galilean transformation”. If {u, p, p_1, q, r} are as above, and one lets

\displaystyle  Q(t) := \int_0^t q(s)\ ds

be the primitive of {q}, and

\displaystyle  Q_2(t) := \int_0^t Q(s)\ ds

be the primitive of {Q}, then a short calculation reveals that the smooth function {u_2: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} defined by

\displaystyle  u_2(t, x) := u(t, x + Q_2(t)) - Q(t)

and the smooth function {p_2: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}} defined by

\displaystyle  p_2(t,x) := p_1(t,x + Q_2(t))

solves the Navier-Stokes equations

\displaystyle  \partial_t u_2 + u_2 \cdot \nabla u_2 = - \nabla p_2 + \nu \Delta u_2

\displaystyle  \nabla \cdot u_2 = 0

with {u_2} having the same initial data as {u}; conversely, if {(u_2,p_2)} is a solution to Navier-Stokes, then so is {(u,p)}. In particular this reveals a lack of uniqueness for the periodic Navier-Stokes equations that is essentially the same lack of uniqueness that is present for the Leray system: one can add an arbitrary spatially affine function to the pressure {p} by applying a suitable Galilean transform to {u}. On the other hand, we can eliminate this lack of uniqueness by requiring that the pressure be normalised in the sense that {q(t)=0} and {r(t)=0}, that is to say we require {p} to be {{\bf Z}^d}-periodic and mean zero. The above discussion shows that any smooth solution to Navier-Stokes with {u} periodic can be transformed by a Galilean transformation to one in which the pressure is normalised.

Once the pressure is normalised, it turns out that one can recover uniqueness (much as was the case with the Leray system):

Theorem 31 (Uniqueness with normalised pressure) Let {(u_1,p_1), (u_2,p_2)} be two smooth periodic solutions to (1) on {[0,T] \times {\bf R}^d} with normalised pressure such that {u_1(0)=u_2(0)}. Then {(u_1,p_1)=(u_2,p_2)}.

Proof: We use the energy method. Write {w = u_1 - u_2}, then subtracting (1) for {(u_2,p_2)} from {(u_1,p_1)} we see that {w: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d} is smooth with

\displaystyle  \partial_t w + (u_1 \cdot \nabla) w + (w \cdot \nabla) u_2 = \nu \Delta w - \nabla (p_1 - p_2)


\displaystyle  \nabla \cdot w = 0.

Now we consider the energy {E(t) := \int_{{\bf R}^d/{\bf Z}^d} |w(t,x)|^2\ dx}. This varies smoothly with {t}, and we can differentiate under the integral sign to obtain

\displaystyle  \partial_t E(t) = 2 \int_{{\bf R}^d/{\bf Z}^d} w(t,x) \cdot \partial_t w(t,x)\ dx

\displaystyle  = A + B + C + D


\displaystyle  A := - 2 \int_{{\bf R}^d/{\bf Z}^d} w \cdot (u_1 \cdot \nabla) w\ dx

\displaystyle  B := -2 \int_{{\bf R}^d/{\bf Z}^d} w\cdot (w \cdot \nabla) u_2\ dx

\displaystyle  C := 2\nu \int_{{\bf R}^d/{\bf Z}^d} w \cdot \partial_i \partial_i w\ dx

\displaystyle  D := - 2 \int_{{\bf R}^d/{\bf Z}^d} w \cdot \nabla (p_1-p_2)\ dx

and we have omitted the explicit dependence on {t} and {x} for brevity.

For {A}, we observe the total derivative {2 w \cdot (u_1 \cdot \nabla) w = (u_1 \cdot \nabla) |w|^2} and integrate by parts to conclude that

\displaystyle  A = \int_{{\bf R}^d/{\bf Z}^d} (\nabla \cdot u_1) |w|^2\ dx = 0

since {u_1} is divergence-free. Similarly, integration by parts shows that {D} vanishes since {w} is divergence-free. Another integration by parts gives

\displaystyle  C = - 2\nu \int_{{\bf R}^d/{\bf Z}^d} \partial_i w \cdot \partial_i w\ dx

and hence {C \leq 0}. Finally, from Hölder’s inequality we have

\displaystyle  B \leq 2 E(t) \|\nabla u_2 \|_{L^\infty_t L^\infty_x([0,T] \times {\bf R}^d/{\bf Z}^d)}

and hence

\displaystyle  \partial_t E(t) \leq 2 E(t) \|\nabla u_2 \|_{L^\infty_t L^\infty_x([0,T] \times {\bf R}^d/{\bf Z}^d)}.

Since {E(0)=0}, we conclude from Gronwall’s inequality that {E(t) \leq 0} for all {t \in [0,T]}, and hence {w} is identically zero, thus {u_1=u_2}. Substituting this into (1) we conclude that {\nabla p_1 = \nabla p_2}; as {p_1,p_2} have mean zero, we conclude (e.g., from Fourier inversion) that {p_1=p_2}, and the claim follows. \Box
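The final step of the proof is the standard Gronwall argument: a quantity obeying {E' \leq 2KE} with {E(0) = \varepsilon} is bounded by {\varepsilon e^{2Kt}}, and letting {\varepsilon \rightarrow 0} forces {E \equiv 0}. A toy numerical illustration of this scaling (the values of {K}, {T}, and {\varepsilon} below are arbitrary choices):

```python
import numpy as np

# Gronwall scaling: if E' <= 2 K E and E(0) = eps then E(t) <= eps e^{2Kt};
# forward Euler on the extremal case E' = 2 K E respects the bound, and
# eps -> 0 forces E -> 0.  K, T, eps values are arbitrary.
K, T, steps = 1.5, 1.0, 10000
dt = T / steps
epss = [1e-3, 1e-6, 1e-9]
finals = []
for eps in epss:
    E = eps
    for _ in range(steps):
        E += dt * 2 * K * E        # Euler step; (1 + 2K dt)^n <= e^{2KT}
    finals.append(E)
print(finals)    # each close to eps * e^{2KT}, shrinking with eps
```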

Now we turn to existence in the periodic setting, assuming normalised pressure. For various technical reasons, it is convenient to reduce to the case when the velocity field {u} has zero mean. Observe that the right-hand sides {\nu \Delta u}, {\nabla p} of (1) have zero mean on {{\bf R}^d/{\bf Z}^d}, thanks to integration by parts. A further integration by parts, using the divergence-free condition {\nabla \cdot u = 0}, reveals that the transport term {(u \cdot \nabla) u} also has zero mean:

\displaystyle  \int_{{\bf R}^d/{\bf Z}^d} ((u \cdot \nabla) u)_i\ dx = \int_{{\bf R}^d/{\bf Z}^d} u_j \partial_j u_i\ dx

\displaystyle  = - \int_{{\bf R}^d/{\bf Z}^d} (\partial_j u_j) u_i\ dx

\displaystyle  = 0.

Thus, we see that the mean {\int_{{\bf R}^d/{\bf Z}^d} u(t,x)\ dx} is a conserved integral of motion: if {v_0 := \int_{{\bf R}^d/{\bf Z}^d} u_0(x)\ dx} is the mean initial velocity, and {(u,p)} is a solution to (1) (obeying some minimal regularity hypothesis), then {u(t)} continues to have mean velocity {v_0} for all subsequent times. On the other hand, if {(u,p)} is a smooth periodic solution to (1) with normalised pressure and initial velocity {u_0}, then the Galilean transform {(\tilde u, \tilde p)} defined by

\displaystyle  \tilde u(t,x) := u(t, x + v_0 t) - v_0

\displaystyle  \tilde p(t,x) := p(t, x + v_0 t)

can be easily verified to be a smooth periodic solution to (1) with normalised pressure and initial velocity {u_0 - v_0}. Of course, one can reconstruct {(u,p)} from {(\tilde u,\tilde p)} by the inverse transformation

\displaystyle  u(t,x) = \tilde u(t, x - v_0 t) + v_0

\displaystyle  p(t,x) = \tilde p(t, x - v_0 t).

Thus, up to this simple transformation, solving the initial value problem for (1) for {u_0} is equivalent to that of {u_0-v_0}, so we may assume without loss of generality that the initial velocity (and hence the velocity at all subsequent times) has zero mean.
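This Galilean transformation can be checked symbolically on an explicit solution. The sympy sketch below verifies that the (period {1}) Taylor-Green flow solves (1) in dimension {d=2}, and that its Galilean shift {u(t,x-v_0 t) + v_0}, {p(t,x-v_0 t)} solves (1) as well, matching the inverse transformation above. (The choice of Taylor-Green as the test solution is ours, and is not needed for the theory.)

```python
import sympy as sp

# Taylor-Green on the unit torus solves (1) with d = 2, and so does its
# Galilean shift u(t, x - v0 t) + v0, p(t, x - v0 t); checked symbolically.
# (Taylor-Green is our choice of explicit test solution.)
t, x, y, nu = sp.symbols('t x y nu')
v1, v2 = sp.symbols('v1 v2')       # components of the mean velocity v0
k = 2 * sp.pi

def residuals(u1, u2, p):
    # residuals of the momentum equations and the divergence-free condition
    lap = lambda f: sp.diff(f, x, 2) + sp.diff(f, y, 2)
    r1 = sp.diff(u1, t) + u1*sp.diff(u1, x) + u2*sp.diff(u1, y) \
        + sp.diff(p, x) - nu*lap(u1)
    r2 = sp.diff(u2, t) + u1*sp.diff(u2, x) + u2*sp.diff(u2, y) \
        + sp.diff(p, y) - nu*lap(u2)
    div = sp.diff(u1, x) + sp.diff(u2, y)
    return [sp.simplify(r) for r in (r1, r2, div)]

F = sp.exp(-2 * nu * k**2 * t)
u1 = sp.cos(k*x) * sp.sin(k*y) * F
u2 = -sp.sin(k*x) * sp.cos(k*y) * F
p = -sp.Rational(1, 4) * (sp.cos(2*k*x) + sp.cos(2*k*y)) * F**2

print(residuals(u1, u2, p))                    # [0, 0, 0]

shift = {x: x - v1*t, y: y - v2*t}             # Galilean shift
print(residuals(u1.subs(shift) + v1,
                u2.subs(shift) + v2,
                p.subs(shift)))                # [0, 0, 0]
```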

A general rule of thumb is that whenever an integral of a solution to a PDE can be proven to vanish (or be equal to boundary terms) by integration by parts, it is because the integrand can be rewritten in “divergence form” – as the divergence of a tensor of one higher rank. (This is because the integration by parts identity {\int f \partial_i g = - \int (\partial_i f) g} arises from the divergence form {\partial_i(fg)} of the expression {f \partial_i g + (\partial_i f) g}.) Thus we expect the transport term {(u \cdot \nabla) u} to be in divergence form. Indeed, in components we have

\displaystyle  ((u \cdot \nabla) u)_i = u_j \partial_j u_i;

since we have the divergence-free condition {\partial_j u_j=0}, we thus have from the Leibniz rule that

\displaystyle  ((u \cdot \nabla) u)_i = \partial_j (u_j u_i).

We write this in coordinate-free notation as

\displaystyle  (u \cdot \nabla) u = \nabla \cdot (u \otimes u)

where {u \otimes u} is the tensor product {(u \otimes u)_{ji} := u_j u_i} and {\nabla \cdot (u \otimes u)} denotes the divergence

\displaystyle  (\nabla \cdot (u \otimes u))_i = \partial_j (u \otimes u)_{ji}.
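The identity {(u \cdot \nabla) u = \nabla \cdot (u \otimes u)} for divergence-free {u} can be verified symbolically; the sketch below does this in two dimensions for the (automatically divergence-free) field {u = (\partial_y \psi, -\partial_x \psi)} arising from an arbitrary stream function {\psi}.

```python
import sympy as sp

# Symbolic check of (u . grad) u = div(u tensor u) for divergence-free u,
# in d = 2 with u = (d_y psi, -d_x psi) for an arbitrary stream function psi.
x, y = sp.symbols('x y')
psi = sp.Function('psi')(x, y)
u = [sp.diff(psi, y), -sp.diff(psi, x)]     # automatically divergence-free

residues = []
for i in range(2):
    # ((u . grad) u)_i = u_j d_j u_i, versus d_j (u_j u_i)
    transport = sum(u[j] * sp.diff(u[i], (x, y)[j]) for j in range(2))
    divform = sum(sp.diff(u[j] * u[i], (x, y)[j]) for j in range(2))
    residues.append(sp.simplify(transport - divform))
print(residues)    # [0, 0]
```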

Thus we can rewrite (1) as the system

\displaystyle  \partial_t u + \nabla \cdot (u \otimes u) = \nu \Delta u - \nabla p \ \ \ \ \ (32)

\displaystyle  \nabla \cdot u = 0.

Next, we observe that we can use the Leray projection operator {\mathbb{P}} to eliminate the role of the (normalised) pressure. Namely, if {(u,p)} are a smooth periodic solution to (1) with normalised pressure, then on applying {\mathbb{P}} (which preserves divergence-free vector fields such as {\partial_t u} and {\Delta u}, but annihilates gradients such as {\nabla p}) we conclude an equation that does not involve the pressure at all:

\displaystyle  \partial_t u + \mathbb{P}( \nabla \cdot (u \otimes u) ) = \nu \Delta u. \ \ \ \ \ (33)

Conversely, suppose that one has a smooth periodic solution {u: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} to (33) with initial condition {u(0)=u_0} for some smooth periodic divergence-free vector field {u_0: {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d}. Taking divergences of both sides of (33), we then conclude that

\displaystyle  \partial_t (\nabla \cdot u) = \nu \Delta (\nabla \cdot u),

that is to say {\nabla \cdot u} obeys the heat equation (23). Since {\nabla \cdot u} is periodic, smooth, and vanishes at {0}, we see from Exercise 29(c) that {\nabla \cdot u} vanishes on all of {[0,T] \times {\bf R}^d/{\bf Z}^d}, thus {u} is divergence free on the entire time interval {[0,T]}. From (33) and (22) we thus see that if one defines {p: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}} to be the function

\displaystyle  p := - \Delta^{-1} (\nabla \cdot \nabla \cdot (u \otimes u))

(which can easily be verified to be a smooth function in both space and time) then {(u,p)} is a smooth periodic solution to (1) with normalised pressure and initial condition {u(0)=u_0} (and is thus the unique solution to this system, thanks to Theorem 31). Thus, the problem of finding a smooth solution to (1) in the smooth periodic setting with normalised pressure and divergence-free initial data {u(0)=u_0} is equivalent to that of solving (33) with the same initial data.
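On the torus the Leray projector acts diagonally in Fourier space: {\widehat{\mathbb{P} u}(\xi) = \hat u(\xi) - \frac{\xi (\xi \cdot \hat u(\xi))}{|\xi|^2}} for {\xi \neq 0}. The numpy sketch below (grid size and test fields are arbitrary illustrative choices) checks the two properties used above: {\mathbb{P}} annihilates gradients and fixes divergence-free fields.

```python
import numpy as np

# Leray projector on the 2D torus, mode by mode in Fourier space:
#   (P u)^(xi) = u^(xi) - xi (xi . u^(xi)) / |xi|^2   for xi != 0.
# We check that P kills gradients and fixes divergence-free fields.
N = 32
k = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)
kx, ky = np.meshgrid(k, k, indexing='ij')
k2 = kx**2 + ky**2
k2safe = np.where(k2 == 0, 1.0, k2)   # avoid 0/0; xi = 0 mode is untouched

def leray(u1, u2):
    f1, f2 = np.fft.fft2(u1), np.fft.fft2(u2)
    dot = (kx * f1 + ky * f2) / k2safe
    return (np.real(np.fft.ifft2(f1 - kx * dot)),
            np.real(np.fft.ifft2(f2 - ky * dot)))

x = np.arange(N) / N
X, Y = np.meshgrid(x, x, indexing='ij')

# a gradient field: grad p with p = sin(2 pi x) cos(4 pi y)
p = np.sin(2 * np.pi * X) * np.cos(4 * np.pi * Y)
gp1 = np.real(np.fft.ifft2(1j * kx * np.fft.fft2(p)))
gp2 = np.real(np.fft.ifft2(1j * ky * np.fft.fft2(p)))
P_gp = leray(gp1, gp2)
print(max(np.abs(P_gp[0]).max(), np.abs(P_gp[1]).max()))     # round-off size

# a divergence-free field: (d_y psi, -d_x psi)
psi = np.cos(2 * np.pi * X) * np.cos(2 * np.pi * Y)
w1 = np.real(np.fft.ifft2(1j * ky * np.fft.fft2(psi)))
w2 = -np.real(np.fft.ifft2(1j * kx * np.fft.fft2(psi)))
P_w = leray(w1, w2)
print(np.abs(P_w[0] - w1).max(), np.abs(P_w[1] - w2).max())  # round-off size
```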

By Duhamel’s formula (Exercise 29(c)), any smooth solution {u: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} to the initial value problem (33) with {u(0)=u_0} obeys the Duhamel formula

\displaystyle  u(t) = e^{\nu t \Delta} u_0 - \int_0^t e^{\nu (t-s) \Delta} \mathbb{P}( \nabla \cdot (u(s) \otimes u(s)) )\ ds. \ \ \ \ \ (34)

(The operator {e^{\nu (t-s) \Delta} \mathbb{P}} is sometimes referred to as the Oseen operator in the literature.) Conversely, a smooth solution {u: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} to (34) will solve the initial value problem (33) with initial data {u(0)=u_0}.

To obtain existence of smooth periodic solutions (with normalised pressure) to the Navier-Stokes equations with given smooth divergence-free periodic initial data {u_0}, it thus suffices to find a smooth periodic solution to the integral equation (34). We will achieve this by a two-step procedure:

  • (i) (Existence at finite regularity) Construct a solution {u} to (34) in a certain function space with a finite amount of regularity (assuming that the initial data {u_0} has a similarly finite amount of regularity); and then
  • (ii) (Propagation of regularity) show that if {u_0} is in fact smooth, then the solution constructed in (i) is also smooth.

The reason for this two-step procedure is that one wishes to solve (34) using iteration-type methods (such as the contraction mapping theorem that was used to prove the Picard existence theorem); however the function space {C^\infty([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} that one ultimately wishes the solution to lie in is not well adapted to such iteration (for instance, it is not a Banach space, being merely a Fréchet space). Instead, we iterate in an auxiliary lower regularity space first, and then “bootstrap” the lower regularity to the desired higher regularity. Observe that the same situation occurred with the Picard existence theorem, where one performed the iteration in the low regularity space {C([0,T] \rightarrow \overline{B(0,2R)})}, even though ultimately one desired the solution to be continuously differentiable or even smooth.

Of course, to run this procedure, one actually has to write down an explicit function space in which one will perform the iteration argument. Selection of this space is actually a non-trivial matter and often requires a substantial amount of trial and error, as well as experience with similar iteration arguments for other PDE. Often one is guided by the function space theory for the linearised counterpart of the PDE, which in this case is the heat equation (23). As such, the following definition can be at least partially motivated by the energy estimates in Exercise 29(d).

Definition 32 (Mild solution) Let {s \geq 0}, {T>0}, and let {u_0 \in H^{s}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0} be divergence-free, where {H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0} denotes the subspace of {H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} consisting of mean zero functions. An {H^s}-mild solution (or Fujita-Kato mild solution) to the Navier-Stokes equations with initial data {u_0} is a function {u: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} in the function space

\displaystyle  u \in C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0

\displaystyle  \cap L^2_t H^{s+1}_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0

that obeys the integral equation (34) (in the sense of distributions) for all {t \in [0,T]}. We say that {u} is a mild solution on {[0,T_*)} if it is a mild solution on {[0,T]} for every {0 < T < T_*}.

Remark 33 The definition of a mild solution could be extended to those choices of initial data {u_0} that are not divergence-free, but then this solution concept no longer has any direct connection with the Navier-Stokes equations, so we will not consider such “solutions” here. Similarly, one could also consider mild solutions without the mean zero hypothesis, but the function space estimates are slightly less favourable in this setting and so we shall restrict attention to mean zero solutions only.

Note that the regularity on {u} places {u \otimes u} in {L^\infty_t L^1_x} (with plenty of room to spare), which is more than enough regularity to make sense of the right-hand side of (34). One can also define mild solutions for other function spaces than the one provided here, but we focus on this notion for now, which was introduced in the work of Fujita and Kato. We record a simple compatibility property of mild solutions:

Exercise 34 (Splitting) Let {s \geq 0}, {T>0}, let {u_0 \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0} be divergence-free, and let

\displaystyle u \in C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0

\displaystyle \cap L^2_t H^{s+1}_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0.

Let {0 < \tau < T}. Show that the following are equivalent:

  • (i) {u: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} is an {H^s} mild solution to the Navier-Stokes equations on {[0,T]} with initial data {u_0}.
  • (ii) {u: [0,\tau] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} is an {H^s} mild solution to the Navier-Stokes equations on {[0,\tau]} with initial data {u_0}, and the translated function {u_\tau: [0,T-\tau] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} defined by {u_\tau(t,x) := u(t+\tau,x)} is an {H^s} mild solution to the Navier-Stokes equations with initial condition {u(\tau)}.

To use this notion of a mild solution, we will need the following harmonic analysis estimate:

Proposition 35 (Product estimate) Let {s \geq 0}, and let {u,v \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}) \cap L^\infty({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}. Then one has {uv \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}, with the estimate

\displaystyle  \| uv \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \lesssim_{d,s} \| u \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \| v \|_{L^\infty({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}

\displaystyle + \| u \|_{L^\infty({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \| v \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}.

When {s=0} this claim follows immediately from Hölder’s inequality. For {s=1} the claim is similarly immediate from the Leibniz rule {\nabla(uv) = (\nabla u) v + u (\nabla v)} and the triangle and Hölder inequalities (noting that {\| u \|_{H^1({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}} is comparable to {\| u \|_{L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} + \| \nabla u \|_{L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}}). For more general {s} the claim is not quite so immediate (for instance, when {s=2} one runs into difficulties controlling the intermediate term {2 (\nabla u) \otimes (\nabla v)} arising in the Leibniz expansion of {\nabla^2(uv)}). Nevertheless the bound is still true. However, to prove it we will need to introduce a tool from harmonic analysis, namely Littlewood-Paley theory, and we defer the proof to the appendix.

We also need a simple case of Sobolev embedding:

Exercise 36 (Sobolev embedding)

  • (a) If {s > \frac{d}{2}}, show that for any {u \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}, one has {u \in L^\infty({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} with

    \displaystyle  \| u \|_{L^\infty({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \lesssim_{d,s} \| u \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}.

  • (b) Show that the inequality fails at {s = \frac{d}{2}}.
  • (c) Establish the same statements with {{\bf R}^d/{\bf Z}^d} replaced by {{\bf R}^d} throughout.

In particular, combining this exercise with Proposition 35 we see that for {s > \frac{d}{2}}, {H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} is a Banach algebra:

\displaystyle  \| uv \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \lesssim_{d,s} \| u \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \| v \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}. \ \ \ \ \ (35)
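As an entirely optional numerical illustration of the algebra property (35): computing the {H^1} norm on the unit circle by FFT (so {d=1}, {s = 1 > \frac{1}{2}}), the ratio {\|uv\|_{H^1} / (\|u\|_{H^1} \|v\|_{H^1})} stays bounded for sample functions. The discretisation and the test functions below are arbitrary choices.

```python
import numpy as np

# H^1 norms on the unit circle via FFT, testing the algebra property
# ||uv||_{H^1} <~ ||u||_{H^1} ||v||_{H^1}.  Sample functions and the
# discrete norm are arbitrary illustrative choices.
N = 256
x = np.arange(N) / N
k = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)

def H1_norm(u):
    uhat = np.fft.fft(u) / N             # Fourier coefficients
    return np.sqrt(np.sum((1 + k**2) * np.abs(uhat)**2))

pairs = [(np.sin(2*np.pi*x), np.cos(6*np.pi*x)),
         (np.exp(np.cos(2*np.pi*x)),
          np.sin(4*np.pi*x) + 0.3*np.cos(10*np.pi*x))]
ratios = [H1_norm(u*v) / (H1_norm(u) * H1_norm(v)) for u, v in pairs]
print(ratios)        # bounded ratios (here below 1)
```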

Now we can construct mild solutions at high regularities {s > \frac{d}{2}}.

Theorem 37 (Local well-posedness of mild solutions at high regularity) Let {s > \frac{d}{2}}, and let {u_0 \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0} be divergence-free. Then there exists a time

\displaystyle  T \gg_{d,s} \frac{\nu}{\|u_0\|_{H^s_x({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}^2},

and an {H^s} mild solution {u: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} to (34). Furthermore, this mild solution is unique.

The hypothesis {s > \frac{d}{2}} is not optimal; we return to this point later in these notes.

Proof: We begin with existence. We can write (34) in the fixed point form

\displaystyle  u = \Phi(u)


\displaystyle  \Phi(u)(t) := e^{\nu t \Delta} u_0 - \int_0^t e^{\nu (t-s) \Delta} \mathbb{P}( \nabla \cdot (u(s) \otimes u(s)) )\ ds. \ \ \ \ \ (36)

We remark that this expression automatically has mean zero since {u_0} has mean zero. Let {X_T} denote the function space

\displaystyle  X_T := C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0 \cap L^2_t H^{s+1}_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0

with norm

\displaystyle  \|u\|_{X_T} := \| u \|_{C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{1/2} \| u \|_{L^2_t H^{s+1}_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}.

This is a Banach space. Because of the mean zero restriction on {X_T}, we may estimate

\displaystyle  \|u\|_{X_T} \lesssim_{s,d} \| u \|_{C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{1/2} \| \nabla u \|_{L^2_t H^{s}_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^{d^2})}.

Note that if {u \in X_T}, then {u \otimes u \in C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^{d^2})} by (35), which by Exercise 29(d) (and the fact that {\mathbb{P}} commutes with {\nabla} and is a contraction on {H^s}) implies that {\Phi(u) \in X_T}. Thus {\Phi} is a map from {X_T} to {X_T}. In fact we can obtain more quantitative control on this map. By using Exercise 29(d), (35), and the Hölder bound

\displaystyle  \| F \|_{L^2_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \leq T^{1/2} \| F \|_{C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \ \ \ \ \ (37)

we have

\displaystyle  \| \Phi(u) \|_{X_T} \lesssim \|u_0\|_{H^s_x({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{-1/2} \| \mathbb{P} (u \otimes u) \|_{L^2_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d\rightarrow {\bf R}^{d^2})}

\displaystyle  \lesssim \|u_0\|_{H^s_x({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{-1/2} T^{1/2} \| u \otimes u \|_{C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d\rightarrow {\bf R}^{d^2})}

\displaystyle  \lesssim_{d,s} \|u_0\|_{H^s_x({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{-1/2} T^{1/2} \| u \|_{C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d\rightarrow {\bf R}^d)}^2

\displaystyle  \lesssim \|u_0\|_{H^s_x({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{-1/2} T^{1/2} \| u \|_{X_T}^2.

Thus, if we set {R := C_{d,s} \|u_0\|_{H^s_x({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}} for a suitably large constant {C_{d,s} > 0}, and set {T := c_{d,s} \frac{\nu}{\|u_0\|_{H^s_x({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}^2}} for a sufficiently small constant {c_{d,s} > 0}, then {\Phi} maps the closed ball {\overline{B_{X_T}(0,R)}} in {X_T} to itself. Furthermore, for {u,v \in X_T}, we have by similar arguments to above

\displaystyle  \| \Phi(u) - \Phi(v) \|_{X_T} \lesssim \nu^{-1/2} \| \mathbb{P} (u \otimes u - v \otimes v) \|_{L^2_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d\rightarrow {\bf R}^{d^2})}

\displaystyle  \lesssim \nu^{-1/2} T^{1/2} \| u \otimes u - v \otimes v\|_{C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d\rightarrow {\bf R}^{d^2})}

\displaystyle  \lesssim_{d,s} \nu^{-1/2} T^{1/2} \| u - v\|_{C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d\rightarrow {\bf R}^d)}

\displaystyle (\| u \|_{C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d\rightarrow {\bf R}^d)} + \| v\|_{C^0_t H^s_x([0,T] \times {\bf R}^d/{\bf Z}^d\rightarrow {\bf R}^d)})

\displaystyle  \lesssim_{d,s} \nu^{-1/2} T^{1/2} \| u - v\|_{X_T} (\|u\|_{X_T} + \|v\|_{X_T})

and hence if the constant {c_{d,s}} is chosen small enough, {\Phi} is also a contraction (with constant, say, {\frac{1}{2}}) on {\overline{B(0,R)}}. Thus there exists {u \in \overline{B(0,R)}} such that {u=\Phi(u)}, thus {u} is an {H^s} mild solution.

Now we show that it is the only mild solution. Suppose for contradiction that there is another {H^s} mild solution {v} with the same initial data {u_0}. This solution {v} might not lie in {\overline{B_{X_T}(0,R)}}, but it will lie in {\overline{B_{X_{T}}(0,R')}} for some {R'>R}. By the same arguments as above, if {0 < T' < T} is sufficiently small depending on {d,s,R'} then {\Phi} will be a contraction on {\overline{B_{X_{T'}}(0,R')}}, which implies that {u} and {v} agree on {[0,T']}. Now we apply Exercise 34 to advance in time by {T'} and iterate this process (noting that {T'} depends on {d,s,R'} but does not otherwise depend on {u} or {v}) until one concludes that {u=v} on all of {[0,T]}. \Box

Iterating this as in the proof of Theorem 8, we have

Theorem 38 (Maximal Cauchy development) Let {s > \frac{d}{2}}, and let {u_0 \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0} be divergence-free. Then there exists a time {0 < T_* \leq \infty} and an {H^s} mild solution {u: [0,T_*) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} to (34), such that if {T_* < \infty} then {\|u(t)\|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} \rightarrow \infty} as {t \rightarrow T_*^-}. Furthermore, {T_*} and {u} are unique.

In principle, if the initial data {u_0} belongs to multiple Sobolev spaces {u_0 \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0} the maximal time of existence {T_*} could depend on {s} (so that the solution exits different regularity classes {H^s} at different times). However, this is not the case, because there is an {s}-independent blowup criterion:

Proposition 39 (Blowup criterion) Let {s, u_0, T_*, u} be as in Theorem 38. If {T_* < \infty}, then {\| u \|_{L^\infty_t L^\infty_x([0,T_*))} = \infty}.

Note from Exercise 36 that {\| u \|_{L^\infty_t L^\infty_x([0,T])}} is finite for any {0 \leq T < T_*}. This shows that {T_*} is the unique time at which the {L^\infty_t L^\infty_x} norm “blows up” (becomes infinite) and thus {T_*} is independent of {s}.

Proof: Suppose for contradiction that {T_* < \infty} but that the quantity {A := \| u \|_{L^\infty_t L^\infty_x([0,T_*))}} was finite. Let {0 < \tau_1 < \tau_2 < T_*} be parameters to be optimised later. We define the norm

\displaystyle  \|u\|_{X_{[\tau_1,\tau_2]}} := \| u \|_{C^0_t H^s_x([\tau_1,\tau_2] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{1/2} \| u \|_{L^2_t H^{s+1}_x([\tau_1,\tau_2] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}.

As {u} is a mild solution, this expression is finite.

We adapt the proof of Theorem 37. Using Exercise 29(d) (and Exercise 34) we have

\displaystyle  \|u\|_{X_{[\tau_1,\tau_2]}} \lesssim_{s,d} \| u(\tau_1) \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{-1/2} \| \mathbb{P}(u \otimes u) \|_{L^2_t H^s_x([\tau_1,\tau_2] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^{d^2})}.

Again we discard {\mathbb{P}} and use (a variant of) (37) to conclude

\displaystyle  \|u\|_{X_{[\tau_1,\tau_2]}} \lesssim_{s,d} \| u(\tau_1) \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{-1/2} (\tau_2-\tau_1)^{1/2} \| u \otimes u \|_{L^\infty_t H^s_x([\tau_1,\tau_2] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^{d^2})}.

If we now use Proposition 35 in place of (35), we conclude that

\displaystyle  \|u\|_{X_{[\tau_1,\tau_2]}} \lesssim_{s,d} \| u(\tau_1) \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{-1/2} (\tau_2-\tau_1)^{1/2} A \| u \|_{L^\infty_t H^s_x([\tau_1,\tau_2] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}

\displaystyle  \lesssim_{s,d} \| u(\tau_1) \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \nu^{-1/2} (T_*-\tau_1)^{1/2} A \|u\|_{X_{[\tau_1,\tau_2]}}.

If we choose {\tau_1} to be sufficiently close to {T_*} (depending on {A} and {\nu}), we can absorb the second term on the RHS into the LHS and conclude that

\displaystyle  \|u\|_{X_{[\tau_1,\tau_2]}} \lesssim_{s,d} \| u(\tau_1) \|_{H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}.

In particular, {\|u(\tau_2)\|_{H^s_x({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}} stays bounded as {\tau_2 \rightarrow T_*}, contradicting Theorem 38. \Box

Corollary 40 (Existence of smooth solutions) If {u_0: {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} is smooth and divergence free then there is a {0 < T_* \leq \infty} and a smooth periodic solution {(u,p)} to the Navier-Stokes equations on {[0,T_*) \times {\bf R}^d/{\bf Z}^d} with normalised pressure such that if {T_*<\infty}, then {\| u \|_{L^\infty_t L^\infty_x([0,T_*))} = \infty}. Furthermore, {T_*} and {(u,p)} are unique.

Proof: As discussed previously, we may assume without loss of generality that {u_0} has mean zero. As {u_0} is periodic and smooth, it lies in {H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0} for every {s}. From the preceding discussion we already have {0 < T_* \leq \infty} and an {H^s} mild solution {u} on {[0,T_*)} for every {s > \frac{d}{2}}, and with {\|u\|_{L^\infty_t L^\infty_x([0,T_*))} = \infty} if {T_*} is finite. It will suffice to show that {u} is smooth, since we know from preceding discussion that a smooth solution to (33) can be converted to a smooth solution to (1).

By Exercise 29, one has

\displaystyle  \partial_t u = \nu \Delta u + \nabla \cdot \mathbb{P} (u \otimes u)

in the sense of spacetime distributions. The right-hand side lies in {C^0_t H^s_x} for every {s}, hence the left-hand side does also; this makes {u} lie in {C^1_t H^s_x}. It is then easy to see that this implies that the right-hand side of the above equation lies in {C^1_t H^s_x} for every {s}, and so {u} now lies in {C^2_t H^s_x} for every {s}. Iterating this (and using Sobolev embedding) we conclude that {u} is smooth in space and time, giving the claim. \Box

Remark 41 When {d=3}, it is a notorious open problem whether the maximal lifespan {T_*} given by the above corollary is always infinite.

Exercise 42 (Instantaneous smoothing) Let {s > \frac{d}{2}}, let {u_0 \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0} be divergence-free, and let {u: [0,T_*) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} be the maximal Cauchy development provided by Theorem 38. Show that {u} is smooth on {(0,T_*) \times {\bf R}^d/{\bf Z}^d} (note the omission of the initial time {t=0}). (Hint: first show that {t \mapsto u(t+\varepsilon)} is a {H^{s+1}} mild solution for arbitrarily small {0 < \varepsilon < T_*}.)

Exercise 43 (Lipschitz continuous dependence on initial data) Let {s > \frac{d}{2}}, let {T>0}, and let {u_0 \in H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0} be divergence-free. Suppose one has an {H^s} mild solution {u: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} to the Navier-Stokes equations with initial data {u_0}. Show that there is a neighbourhood {U} of {u_0} in (the divergence-free elements of) {H^s({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0}, such that for every {v_0 \in U}, there exists an {H^s} mild solution {v: [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} to the Navier-Stokes equations with initial data {v_0}, with the map from {v_0} to {v} Lipschitz continuous (using the {H^s} metric for the initial data {v_0} and the {C^0_t H^s} metric for the solution {v}).

Now we discuss the issue of relaxing the regularity condition {s > \frac{d}{2}} in the above theory. The main inefficiency in the above arguments is the use of the crude estimate (37), which sacrifices some of the {L^p} exponent in time in exchange for extracting a positive power of the lifespan {T} that can be used to create a contraction mapping, as long as {T} is small enough. It turns out that by using a different energy estimate than Exercise 29(d), one can avoid such an exchange, allowing one to construct solutions at lower regularity, and in particular at the “critical” regularity of {H^{\frac{d}{2}-1}}. Furthermore, in the category of smooth solutions, one can even achieve the desirable goal of ensuring that the time of existence {T} is infinite – but only provided that the initial data is small. More precisely,

Proposition 44 Let {u_0 \in H^{\frac{d}{2}-1}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})^0} and let {F \in L^1_t H^{\frac{d}{2}-1}_x([0,\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R})^0}. Then the function {u: [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}} defined by the Duhamel formula

\displaystyle  u(t) = e^{\nu t\Delta} u_0 + \int_0^t e^{\nu(t-s)\Delta} F(s)\ ds

also has mean zero for all {t}, and obeys the estimates

\displaystyle  \| u \|_{C^0_t H^{\frac{d}{2}-1}([0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} + \nu^{1/2} \| u \|_{L^2_t H^{\frac{d}{2}}_x([0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}

\displaystyle  + \nu^{1/2} \| u \|_{L^2_t L^\infty_x([0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}

\displaystyle  \lesssim_d \| u_0 \|_{H^{\frac{d}{2}-1}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} + \|F \|_{L^1_t H^{\frac{d}{2}-1}_x([0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}.

Proof: By Minkowski’s integral inequality, it will suffice to establish the bounds in the case {F=0}. The first two norms on the left-hand side are already controlled by Exercise 29(d), so it remains to establish the estimate

\displaystyle  \nu^{1/2} \| u \|_{L^2_t L^\infty_x([0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \lesssim_d \| u_0 \|_{H^{\frac{d}{2}-1}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}.

By working with the rescaled function {\tilde u(t,x) := u(\nu^{-1} t, x)} (and also rescaling {\tilde F(t,x) = \nu^{-1} F(\nu^{-1} t, x)}), we may normalise {\nu=1}. By a limiting argument we may assume without loss of generality that {u_0} is Schwartz. We cannot directly apply Exercise 36 here due to the failure of endpoint Sobolev embedding; nevertheless we may argue as follows. For any {t>0}, we see from (31), the mean zero hypothesis, and the triangle inequality that

\displaystyle  \| u(t) \|_{L^\infty({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} \leq \sum_{k \in {\bf Z}^d \backslash \{0\}} e^{- 4\pi^2 |k|^2 t} |\hat u_0(k)|,

and hence by Cauchy-Schwarz and the bound

\displaystyle  \sum_{k \in {\bf Z}^d \backslash \{0\}} e^{- 4\pi^2 |k|^2 t} \lesssim_d t^{-d/2}

(which can be verified using the integral test for {t < 1}, while for {t \geq 1} the left-hand side decays exponentially in {t} and is certainly {O_d(t^{-d/2})}) we have

\displaystyle  \| u(t) \|_{L^\infty({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}^2 \lesssim_d t^{-d/2} \sum_{k \in {\bf Z}^d \backslash \{0\}} e^{- 4\pi^2 |k|^2 t} |\hat u_0(k)|^2.

Integrating this in {t} using Fubini’s theorem, we conclude

\displaystyle  \| u \|_{L^2_t L^\infty_x([0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}^2 \lesssim_d \sum_{k \in {\bf Z}^d \backslash \{0\}} |k|^{d-2} |\hat u_0(k)|^2,

and the claim follows. \Box
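As a quick numerical sanity check (not part of the proof), one can verify the heat-kernel bound {\sum_{k \in {\bf Z}^d \backslash \{0\}} e^{-4\pi^2 |k|^2 t} \lesssim_d t^{-d/2}} used above by truncating the lattice sum; the sketch below does this for {d=2}, with an arbitrary truncation radius and arbitrary test values of {t}.

```python
# Truncated lattice sum for d = 2: sum over k in Z^2 \ {0}, |k_i| <= K,
# of exp(-4 pi^2 |k|^2 t).  For the values of t tested here the tail
# beyond K = 60 is utterly negligible.
import math

def heat_sum(t, K=60):
    """Truncated sum of exp(-4 pi^2 |k|^2 t) over k in Z^2 \ {0}."""
    total = 0.0
    for k1 in range(-K, K + 1):
        for k2 in range(-K, K + 1):
            if k1 == 0 and k2 == 0:
                continue
            total += math.exp(-4 * math.pi**2 * (k1 * k1 + k2 * k2) * t)
    return total

# The product heat_sum(t) * t^{d/2} (here d = 2) should stay bounded
# as t -> 0, consistent with the claimed t^{-d/2} bound.
ratios = [heat_sum(t) * t for t in (0.1, 0.01, 0.001)]
print(ratios)
```

One expects the ratios to approach the Gaussian-integral constant {\int_{{\bf R}^2} e^{-4\pi^2|x|^2}\ dx = \frac{1}{4\pi}} as {t \rightarrow 0}.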

This gives the following small data global existence result, also due to Fujita and Kato:

Theorem 45 (Small data global existence) Suppose that {u_0 \in H^{\frac{d}{2}-1}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0} is divergence-free with norm at most {\varepsilon_d \nu}, where {\varepsilon_d>0} is a sufficiently small constant depending only on {d}. Then there exists a {H^{\frac{d}{2}-1}} mild solution to the Navier-Stokes equations on {[0,+\infty)}. Furthermore, if {u_0} is smooth, then this mild solution is also smooth.

Proof: By working with the rescaled function {\tilde u(t,x) := \nu^{-1} u(\nu^{-1} t, x)}, we may normalise {\nu=1}. Let {X} denote the Banach space of functions

\displaystyle  X = C^0_t H^{d/2-1}_x( [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0 \cap L^2_t H^{d/2}_x( [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0 \cap L^2_t L^\infty_x( [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)

with the obvious norm

\displaystyle  \|u\|_X = \|u\|_{C^0_t H^{d/2-1}_x( [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \|u\|_{L^2_t H^{d/2}_x( [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}

\displaystyle  + \|u\|_{L^2_t L^\infty_x( [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}.

Let {\Phi} be the Duhamel operator (36). If {u \in X}, then by Proposition 44 and Proposition 35 one has

\displaystyle  \| \Phi(u) \|_X \lesssim_d \varepsilon_d + \| \mathbb{P}( \nabla \cdot (u \otimes u) ) \|_{L^1_t H^{\frac{d}{2}-1}( [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}

\displaystyle  \lesssim_d \varepsilon_d + \| u \otimes u \|_{L^1_t H^{\frac{d}{2}}( [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^{d^2})}

\displaystyle  \lesssim_d \varepsilon_d + \| u \|_{L^2_t H^{\frac{d}{2}}_x( [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^{d})} \| u \|_{L^2_t L^\infty_x( [0,+\infty) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^{d})}

\displaystyle  \lesssim_d \varepsilon_d + \| u \|_X^2.

In particular, {\Phi} maps {X} to {X}. A similar argument establishes the bound

\displaystyle  \| \Phi(u) - \Phi(v) \|_X \lesssim_d \| u-v\|_X ( \|u\|_X + \|v\|_X )

for all {u,v \in X}. For {\varepsilon_d} small enough, {\Phi} will be a contraction on {\overline{B(0, C_d \varepsilon_d)}} for some absolute constant {C_d} depending only on {d}, and hence has a fixed point {u = \Phi(u)} which will be the desired {H^{\frac{d}{2}-1}} mild solution.
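The quadratic self-bound {\| \Phi(u) \|_X \lesssim_d \varepsilon_d + \|u\|_X^2} is what drives the contraction argument. As a toy illustration (a scalar caricature, not the actual PDE iteration), one can watch the model map {x \mapsto \varepsilon + x^2} converge to a fixed point of size {O(\varepsilon)} when {\varepsilon} is small:

```python
# Scalar toy model of the small-data fixed point argument: the map
# x -> eps + x^2 mimics the bound ||Phi(u)||_X <= C(eps_d + ||u||_X^2)
# (constants suppressed).  For small eps, iteration from 0 converges to
# the small root (1 - sqrt(1 - 4*eps))/2 = eps + O(eps^2).

def iterate_duhamel_toy(eps, n_steps=100):
    """Iterate x -> eps + x^2 starting from x = 0."""
    x = 0.0
    for _ in range(n_steps):
        x = eps + x * x
    return x

x_star = iterate_duhamel_toy(1e-3)
print(x_star)  # close to eps = 1e-3, as the contraction heuristic predicts
```

For {\varepsilon} above the threshold {1/4} the toy iteration escapes to infinity, mirroring the failure of the contraction for large data.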

Now suppose that {u_0} is smooth. Let {s > \frac{d}{2}}, and let {u: [0,T_*) \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d} be the maximal Cauchy development provided by Theorem 38. For any {0 < T \leq T_*}, if one defines

\displaystyle  \|u\|_{X_T} = \|u\|_{C^0_t H^{d/2-1}_x( [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \|u\|_{L^2_t H^{d/2}_x( [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \|u\|_{L^2_t L^\infty_x( [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}

then the preceding arguments give

\displaystyle  \|u\|_{X_T} \lesssim_d \varepsilon_d + \| u \|_{X_T}^2,

thus either {\|u\|_{X_T} \lesssim_d \varepsilon_d} or {\|u\|_{X_T} \gtrsim_d 1}. On the other hand, {\|u\|_{X_T}} depends continuously on {T} and converges to {\|u_0\|_{H^{d/2-1}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} \leq \varepsilon_d} as {T \rightarrow 0}. For {\varepsilon_d} small enough, this implies that {\|u\|_{X_T} \lesssim_d \varepsilon_d} for all {T} (this is an example of a “continuity argument”). Next, if we set

\displaystyle  \|u\|_{X^s_T} = \|u\|_{C^0_t H^{s}_x( [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \|u\|_{L^2_t H^{s+1}_x( [0,T] \times {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}

then repeating the previous arguments also gives

\displaystyle  \| u \|_{X^s_T} \lesssim_d \| u_0 \|_{H^{s}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} + \| u \|_{X_T} \| u \|_{X^s_T};

as {\|u\|_{X^s_T}} is finite and {\|u\|_{X_T} \lesssim_d \varepsilon_d}, we conclude (for {\varepsilon_d} small enough) that

\displaystyle  \| u \|_{X^s_T} \lesssim_d \| u_0 \|_{H^{s}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}.

In particular we have

\displaystyle  \| u(t) \|_{H^{s}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)} \lesssim_d \| u_0 \|_{H^{s}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)}

for all {0 < t < T_*}, and hence by Theorem 38 we have {T_* = \infty}. The argument used to prove Corollary 40 shows that {u} is smooth, and the claim follows. \Box

Remark 46 Modifications of this argument also allow one to establish local existence of {H^{\frac{d}{2}-1}} mild solutions when the initial data {u_0} lies in {H^{\frac{d}{2}-1}({\bf R}^d/{\bf Z}^d \rightarrow {\bf R}^d)^0}, but has large norm rather than norm less than {\varepsilon_d}. However, in this case one does not have a lower bound on the time of existence that depends only on the norm of the data, as was the case with Theorem 37. Further modification of the argument also allows one to extend Theorem 38 to the entire “subcritical” range of regularities {s > \frac{d}{2}-1}. See the paper of Fujita and Kato for details.

We now turn attention to the non-periodic case in two and higher dimensions {d \geq 2}. The theory is largely identical, though with some minor technical differences. Unlike the periodic case, we will not attempt to reduce to the case of {u} having mean zero (indeed, we will not even assume that {u} is absolutely integrable, so that the mean might not even be well defined).

In the periodic case, we focused initially on smooth solutions. Smoothness is not sufficient by itself in the non-periodic setting to provide a good well-posedness theory, as we already saw in Section 3 when discussing the linear heat equation; some additional decay at spatial infinity is needed. There is some flexibility as to how much decay to prescribe. Let us say that a solution {(u,p)} to Navier-Stokes is classical if {u: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d} and {p: [0,T] \times {\bf R}^d \rightarrow {\bf R}} are smooth, and furthermore {u} lies in {C^0_t H^s_x( [0,T] \times {\bf R}^d \rightarrow {\bf R}^d )} for every {s}.

Now we work on normalising the pressure. Suppose {(u,p)} is a classical solution. As before we may write the Navier-Stokes equation in divergence form as (32). Taking a further divergence we obtain the equation

\displaystyle  \Delta p = - \nabla \cdot \nabla \cdot (u \otimes u).

The function {u \otimes u} belongs to {C^0_t H^s_x} for every {s}, so if we define the normalised pressure

\displaystyle  p_0 := - \Delta^{-1} (\nabla \cdot \nabla \cdot (u \otimes u))

via the Fourier transform as

\displaystyle  \widehat{p_0}(\xi) = -\frac{\xi_i \xi_j}{|\xi|^2} \widehat{u_i u_j}(\xi)

then {p_0} will also belong to {C^0_t H^s_x} for every {s}. We then have {p = p_0 + h} for some smooth harmonic function {h: [0,T] \times {\bf R}^d \rightarrow {\bf R}}. To control this harmonic function, we return to (32), which we write as

\displaystyle  \nabla h = -\partial_t u + F

where

\displaystyle  F := - \nabla \cdot (u \otimes u) + \nu \Delta u - \nabla p_0

and apply the fundamental theorem of calculus to conclude that

\displaystyle  \int_{t_1}^{t_2} \nabla h(t)\ dt = - u(t_2) + u(t_1) + \int_{t_1}^{t_2} F(t)\ dt.

The left-hand side is harmonic (thanks to differentiating under the integral sign), and the right-hand side lies in {L^2({\bf R}^d \rightarrow {\bf R})} (in fact it is in {H^s} for every {s}), hence both sides vanish. By the fundamental theorem of calculus this implies that {\nabla h(t)} vanishes identically, thus {h(t)} is constant in space. One can then subtract it from the pressure without affecting (1). Thus, in the category of classical solutions, at least, we may assume without loss of generality that we have normalised pressure

\displaystyle  p = - \Delta^{-1} (\nabla \cdot \nabla \cdot (u \otimes u))

in which case the Navier-Stokes equations may be written as before as (33). (See also this paper of mine for some variants of this argument.)

Exercise 47 (Uniqueness with normalised pressure) Let {(u_1,p_1), (u_2,p_2)} be two classical solutions to (1) on {[0,T] \times {\bf R}^d} with normalised pressure such that {u_1(0)=u_2(0)}. Show that {(u_1,p_1)=(u_2,p_2)}.

We can now define the notion of a Fujita-Kato {H^s} mild solution as before, except that we replace all mention of the torus {{\bf R}^d/{\bf Z}^d} with the Euclidean space {{\bf R}^d}, and omit all requirements for the solution to be of mean zero. As stated in the appendix, the product estimate in Proposition 35 continues to hold in {{\bf R}^d}, so one can obtain the analogue of Theorem 37, Theorem 38, Proposition 39, and Corollary 40 on {{\bf R}^d} by repeating the proofs with the obvious changes; we leave the details as an exercise for the interested reader.

Exercise 48 Establish an analogue of Proposition 44 on {{\bf R}^d}, using the homogeneous Sobolev space {\dot H^{\frac{d}{2}-1}({\bf R}^d \rightarrow {\bf R})} defined to be the closure of the Schwartz functions {f: {\bf R}^d \rightarrow {\bf R}} with respect to the norm

\displaystyle  \| f \|_{\dot H^{\frac{d}{2}-1}({\bf R}^d \rightarrow {\bf R})} := (\int_{{\bf R}^d} |\xi|^{d-2} |\hat f(\xi)|^2\ d\xi)^{1/2},

and use this to state and prove an analogue of Theorem 45.

— 5. Heuristics —

There are several further extensions of these types of local and global existence results for smooth solutions, in which the role of the Sobolev spaces {H^s} here is replaced by other function spaces. For instance, in three dimensions in the non-periodic setting, the role of the critical space {\dot H^{1/2}({\bf R}^3 \rightarrow {\bf R}^3)} was taken over by the larger critical space {L^3({\bf R}^3 \rightarrow {\bf R}^3)} by Kato, and by the even larger space {BMO^{-1}({\bf R}^3 \rightarrow {\bf R}^3)} by Koch and Tataru, who also gave evidence that the latter space is essentially the limit of the method; in even larger spaces such as the Besov space {B^{-1}_{\infty,\infty}}, there are constructions of Bourgain and Pavlovic that demonstrate ill-posedness in the sense of “norm inflation” – solutions that start from arbitrarily small norm data but end up being arbitrarily large in arbitrarily small amounts of time. (This grossly abbreviated history skips over dozens of other results, both positive and negative, in yet further function spaces, such as Morrey spaces or Besov spaces. See for instance the recent text of Lemarie-Rieusset for a survey.)

Rather than detail these other results, let us present instead a scaling heuristic which can be used to interpret these results (and can clarify why all the positive well-posedness results discussed here involve either “subcritical” or “critical” function spaces, rather than “supercritical” ones). For simplicity we restrict our discussion to the non-periodic setting {{\bf R}^d}, although the discussion here could also be adapted without much difficulty to the periodic setting (which effectively just imposes an additional constraint {N \gtrsim 1} on the frequency parameter {N} to be introduced below).

In this heuristic discussion, we assume that at any given time {t}, the velocity field {u(t)} is primarily located at a certain frequency {N = N(t)} (or equivalently, at a certain wavelength {1/N = 1/N(t)}) in the sense that the spatial Fourier transform {\hat u(t)} is largely concentrated in the region {|\xi| \sim N}. We also assume that at this time, the solution has an amplitude {A(t)}, in the sense that {u(t,x)} tends to be of order {A(t)} in magnitude in the region where it is concentrated. (We are deliberately leaving terms such as “concentrated” vague for the purposes of this discussion.) Using this ansatz, one can then heuristically compute the magnitude of various terms in the Navier-Stokes equations (1) or the projected version (33). For instance, if {u} has amplitude {\sim A} and frequency {\sim N}, then {\Delta u} should have amplitude {\sim AN^2} (and frequency {\sim N}), since the Laplacian operator {\Delta} multiplies the Fourier transform {\hat u(t,\xi)} by {4\pi^2 |\xi|^2 \sim N^2}; one can also take a more “physical space” viewpoint and view the second derivatives in {\Delta} as being roughly like dividing out by the wavelength {1/N} twice. Thus we see that the viscosity term {\nu \Delta u} in (1) or (33) should have size about {\nu AN^2}. Similarly, the expression {u \otimes u} in (33) should have magnitude {\sim A^2} and frequency {\sim N} (or maybe slightly less due to cancellation), so {\nabla \cdot (u \otimes u)} and hence {\mathbb{P} \nabla \cdot (u \otimes u)} should have magnitude {\sim A^2 N}. The terms {(u \cdot \nabla) u} and {\nabla p} in (1) can similarly be computed to have magnitude {\sim A^2 N}. Finally, if the solution oscillates (or blows up) in time in intervals of length {T} (which one can think of as the natural time scale for the solution), then the term {\partial_t u} should have magnitude {\sim A/T}.

This leads to the following heuristics:

  • If {\nu A N^2 \gg A^2 N} (or equivalently if {A \ll \nu N}), then the viscosity term {\nu \Delta u} dominates the nonlinear terms in (1) or (33), and one should expect the Navier-Stokes equations to behave like the heat equation (23) in this regime. In particular solutions should exist and maintain (or even improve) their regularity as long as this regime persists. To balance the equation (1) or (33), one expects {A/T \sim \nu A N^2}, so the natural time scale here is {T \sim \frac{1}{\nu N^2}}.
  • If {\nu A N^2 \ll A^2 N} (or equivalently if {A \gg \nu N}), then nonlinear effects dominate, and the behaviour is likely to be quite different to that of the heat equation. One now expects {A/T \sim A^2 N}, so the natural time scale here is {T \sim \frac{1}{AN}}. In particular, one could theoretically have blowup or other bad behaviour after this time scale.

As a general rule of thumb, the known well-posedness theory for the Navier-Stokes equation is only applicable when the hypotheses on the initial data (and on the timescale being considered) is compatible either with the viscosity-dominated regime {A \ll \nu N}, or the time-limited regime {T \ll \frac{1}{AN}}. Outside of these regimes, we expect the evolution to be highly nonlinear in nature, and techniques such as the ones in this set of notes, which are primarily based on approximating the evolution by the linear heat flow, are not expected to apply.

Let’s discuss some of the results in this set of notes using these heuristics. Suppose we are given that the initial data {u_0} is bounded in {H^s} norm by some bound {B}:

\displaystyle  \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)} \leq B.

As in the above heuristics, we assume that {u_0} exhibits some amplitude {A} and frequency {N}. Heuristically, the {H^s} norm of {u_0} should resemble {(1+N)^s} times the {L^2} norm of {u_0}, which should be roughly {A V^{1/2}}, where {V} is the volume of the region in which {u_0} is concentrated. Thus we morally have a bound of the form

\displaystyle  (1+N)^s A V^{1/2} \lesssim B.

To use this bound, we invoke (at a heuristic level) the uncertainty principle {\Delta x \Delta \xi \gtrsim 1}, which indicates that the data {u_0} should be spatially spread out at a scale of at least the wavelength {1/N}, which implies that the volume {V} should be at least {\gtrsim 1/N^d}. Thus we have

\displaystyle  (1+N)^s N^{-\frac{d}{2}} A \lesssim B.

Suppose that {s \geq \frac{d}{2}}; then we have the crude bound

\displaystyle  (1+N)^s \geq N^{\frac{d}{2}}, \ \ \ \ \ (38)

so we expect to have an amplitude bound {A \lesssim B}. If we are in the nonlinear regime {A \gg \nu N}, this implies that {AN \lesssim B^2 / \nu}, and so the natural time scale {T \sim \frac{1}{AN}} here is lower bounded by {T \gtrsim \frac{\nu}{B^2}}. This matches up with the local existence time given in Theorem 37 (or the non-periodic analogue of this theorem). However, the use of the crude bound (38) suggests that one can make improvements to this bound when {s} is far from {\frac{d}{2}}:

Exercise 49 If {s > \frac{d}{2}-1}, make a heuristic argument as to why the optimal lower bound for the time of existence {T} for the Navier-Stokes equation in terms of the {H^s({\bf R}^d \rightarrow {\bf R}^d)} norm of the initial data {u_0} should take the form

\displaystyle  T \gtrsim \frac{1}{\nu (\| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}/\nu)^{\frac{2}{s-\frac{d}{2}+1}}}.

In a similar spirit, suppose we have the smallness hypothesis

\displaystyle  \| u_0 \|_{\dot H^{\frac{d}{2}-1}({\bf R}^d \rightarrow {\bf R}^d)} \leq \varepsilon_d \nu

on the critical norm {\| u_0 \|_{\dot H^{\frac{d}{2}-1}({\bf R}^d \rightarrow {\bf R}^d)}}, then a similar analysis to above leads to

\displaystyle  N^{\frac{d}{2}-1} N^{-\frac{d}{2}} A \lesssim \varepsilon_d \nu

and hence we will be in the viscosity dominated regime {A \ll \nu N} if {\varepsilon_d} is small enough, regardless of what time scale {T} one uses; this is consistent with the global existence result in Theorem 45. On the other hand, if the norm {\| u_0 \|_{\dot H^{\frac{d}{2}-1}({\bf R}^d \rightarrow {\bf R}^d)}} is much larger than {\nu}, then {A} can be larger than {\nu N}, and we can fail to be in the viscosity dominated regime at any choice of frequency {N}; setting {A} to be a large multiple of {\nu N} and sending {N} to infinity, we see that the natural time scale {T} could be arbitrarily small.

Finally, if one only controls a supercritical norm such as {\| u_0 \|_{H^{\frac{d}{2}-1-\delta}({\bf R}^d \rightarrow {\bf R}^d)}} for some {\delta > 0}, this gives a bound on a quantity of the form {A N^{-1-\delta}}, which allows one to leave the viscosity dominated regime {A \ll \nu N} (with plenty of room to spare) when {N} is large, creating examples of initial data for which the natural time scale can be made arbitrarily small. As {N} increases (restricting to, say, powers of two), the supercritical norm of these examples decays geometrically, so one can superimpose an infinite number of these examples together, leading to a choice of initial data with arbitrarily small supercritical norm for which the natural time scale is in fact zero. This strongly suggests that there is no good local well-posedness theory at such regularities.

Exercise 50 Discuss the product estimate in Proposition 35, the Sobolev estimate in Exercise 36, and the energy estimates in Exercise 29(d) and Proposition 44 using the above heuristics.

Remark 51 These heuristics can also be used to locate errors in many purported solutions to the Navier-Stokes global regularity problem that proceed through a sequence of estimates on a Navier-Stokes solution. At some point, the estimates have to rule out the scenario that the solution {u} leaves the viscosity-dominated regime {A \ll \nu N} at larger and larger frequencies {N} (and at smaller and smaller time scales {T}), with the time scales converging to zero to achieve a finite time blowup. If the estimates in the proposed solution are strong enough to heuristically rule out this scenario by the end of the argument, but not at the beginning of the argument, then there must be some step inside the argument where one moves from “supercritical” estimates that are too weak to rule out this scenario, to “critical” or “subcritical” estimates which are capable of doing so. This step is often where the error in the argument may be found.

The above heuristics are closely tied to the classification of various function space norms as being “subcritical”, “supercritical”, or “critical”. Roughly speaking, a norm is subcritical if bounding that norm heuristically places one in the linear-dominated regime (which, for Navier-Stokes, is the viscosity-dominated regime) at high frequencies; critical if control of the norm very nearly places one in the linear-dominated regime at high frequencies; and supercritical if control of the norm completely fails to place one in the linear-dominated regime at high frequencies. When the equation in question enjoys a scaling symmetry, the distinction between subcritical, supercritical, and critical norms can be made by seeing how the top-order component of these norms varies with respect to scaling a function to be high frequency. In the case of the Navier-Stokes equations (1), the scaling {(u,p) \mapsto (u_\lambda,p_\lambda)} is given by the formulae

\displaystyle  u_\lambda(t,x) := \lambda u( \lambda^2 t, \lambda x) \ \ \ \ \ (39)

\displaystyle  p_\lambda(t,x) := \lambda^2 p(\lambda^2 t, \lambda x) \ \ \ \ \ (40)

with the initial data {u_0} similarly being scaled to

\displaystyle  u_{0,\lambda}(x) := \lambda u_0(\lambda x).

Here {\lambda > 0} is a scaling parameter; as {\lambda \rightarrow \infty}, the functions {u_\lambda, p_\lambda, u_{0,\lambda}} are being sent to increasingly fine scales (i.e., high frequencies). One easily checks that if {(u,p)} solves the Navier-Stokes equations (1) with initial data {u_0}, then {(u_\lambda,p_\lambda)} solves the same equations with initial data {u_{0,\lambda}}; similarly for other formulations of the Navier-Stokes equations such as (33) or (34). (In terms of the parameters {A,N,T,V} from the previous heuristic discussion, this scaling corresponds to the map {(A,N,T,V) \mapsto (\lambda A, \lambda N, \lambda^{-2} T, \lambda^{-d} V)}.)

Typically, if one considers a function space norm of {u_{0,\lambda}} (or of {u_\lambda} or {p_\lambda}) in the limit {\lambda \rightarrow \infty}, the top order behaviour will be given by some power {\lambda^\alpha} of {\lambda}. A norm is called subcritical if the exponent {\alpha} is positive, supercritical if the exponent is negative, and critical if the exponent is zero. For instance, one can calculate the Fourier transform

\displaystyle  \widehat{u_{0,\lambda}}(\xi) = \lambda^{1-d} \hat u_0(\xi/\lambda)

and hence

\displaystyle  \| u_{0,\lambda} \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}^2 = \lambda^{2-2d} \int_{{\bf R}^d} \langle \xi \rangle^{2s} |\hat u_0(\xi/\lambda)|^2\ d\xi

\displaystyle  = \lambda^{2-d} \int_{{\bf R}^d} \langle \lambda \xi \rangle^{2s} |\hat u_0(\xi)|^2\ d\xi.

As {\lambda \rightarrow \infty}, this expression behaves like {\lambda^{2-d+2s}} to top order; hence the {H^s} norm is subcritical when {s > \frac{d}{2}-1}, supercritical when {s < \frac{d}{2}-1}, and critical when {s = \frac{d}{2}-1}.
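One can check this scaling exponent numerically. The sketch below (an illustration assuming Gaussian data {u_0(x) = e^{-\pi x^2}} in dimension {d=1}, whose Fourier transform is again {e^{-\pi \xi^2}}) approximates the squared {H^s} norm of {u_{0,\lambda}} by quadrature and confirms the top-order growth {\lambda^{2-d+2s}}:

```python
# Riemann-sum check of the scaling law for the squared H^s norm of
# u_{0,lambda}(x) = lambda * u_0(lambda x), with u_0 a Gaussian, d = 1.
import math

def hs_norm_sq_rescaled(lam, s=2.0, n=20000, L=5.0):
    """Approximate lambda^{2-d} * int <lam*xi>^{2s} |u0_hat(xi)|^2 dxi
    for u0_hat(xi) = exp(-pi xi^2), d = 1 (midpoint rule on [-L, L])."""
    h = 2 * L / n
    total = 0.0
    for i in range(n):
        xi = -L + (i + 0.5) * h
        total += (1 + (lam * xi) ** 2) ** s * math.exp(-2 * math.pi * xi * xi)
    return lam * total * h  # lambda^{2-d} prefactor with d = 1

s = 2.0
ratio = hs_norm_sq_rescaled(32.0, s) / hs_norm_sq_rescaled(16.0, s)
# doubling lambda should multiply the squared norm by about
# 2^{2 - d + 2s} = 2^5 at top order
print(math.log2(ratio))
```

The logged exponent comes out just below {5}, the deficit reflecting the lower-order terms in {\langle \lambda \xi \rangle^{2s}}.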

Another way to phrase this classification is to use dimensional analysis. If we use {L} to denote the unit of length, and {T} the unit of time, then the velocity field {u} should have units {L T^{-1}}, and the terms {\partial_t u} and {(u \cdot \nabla) u} in (1) then have units {L T^{-2}}. To be dimensionally consistent, the kinematic viscosity {\nu} must then have the units {L^2 T^{-1}}, and the pressure {p} should have units {L^2 T^{-2}}. (This differs from the usual units given in physics to the pressure, which is {M L^{2-d} T^{-2}} where {M} is the unit of mass; the discrepancy comes from the choice to normalise the density, which usually has units {M L^{-d}}, to equal {1}.) If we fix {\nu} to be a dimensionless constant such as {1}, this forces a relation {T = L^2} between the time and length units, so now {u} and {p} have the units {L^{-1}} and {L^{-2}} respectively (compare with (39) and (40)). Of course {u_0} will then also have units {L^{-1}}. One can then declare a function space norm of {u_0}, {u}, or {p} to be subcritical if its top order term has units of a negative power of {L}, supercritical if this is a positive power of {L}, and critical if it is dimensionless. For instance, the top order term in {\| u_{0} \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}} is the {L^2} norm of {|\nabla|^s u_0}; as {|\nabla|^s u_0} has the units of {L^{1-s}}, and Lebesgue measure {dx} has the units of {L^d}, we see that {\| u_{0} \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}} has the units of {L^{1-s} L^{d/2}}, giving the same division into subcritical, supercritical, and critical spaces as before.
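The dimensional bookkeeping above can be mechanised: track each quantity as a pair of exponents {(a,b)} for units {L^a T^b}, and confirm that every term of the momentum equation (1) carries the units {L T^{-2}}. (The representation below is just an illustrative device.)

```python
# Units tracked as (length_exponent, time_exponent); multiplying
# quantities adds exponents.

def mul(a, b):
    return (a[0] + b[0], a[1] + b[1])

u = (1, -1)     # velocity: L T^{-1}
d_dt = (0, -1)  # time derivative: T^{-1}
grad = (-1, 0)  # spatial derivative: L^{-1}
nu = (2, -1)    # kinematic viscosity: L^2 T^{-1}
p = (2, -2)     # pressure (density normalised to 1): L^2 T^{-2}

terms = {
    "du/dt": mul(d_dt, u),
    "(u.grad)u": mul(u, mul(grad, u)),
    "nu*Laplacian(u)": mul(nu, mul(grad, mul(grad, u))),
    "grad p": mul(grad, p),
}
print(terms)  # every term should be (1, -2), i.e. L T^{-2}
```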

— 6. Appendix: some Littlewood-Paley theory —

We now prove Proposition 35. By a limiting argument it suffices to establish the claim for smooth {u,v}. The claim is immediate from Hölder’s inequality when {s=0}, so we will assume {s>0}. For brevity we shall abbreviate {L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})} as {L^2}, and similarly for {H^s}, etc.

We use the technique of Littlewood-Paley projections. Let {\phi: {\bf R}^d \rightarrow {\bf R}} be an even bump function (depending only on {d}) that equals {1} on {B(0,1/2)} and is supported on {B(0,1)}; for the purposes of asymptotic notation, any bound that depends on {\phi} can thus be thought of as depending on {d} instead. For any dyadic integer {N} (by which we mean an integer that is a power of {2}), define the Littlewood-Paley projections {P_{\leq N}, P_N} on periodic smooth functions {f: {\bf R}^d/{\bf Z}^d \rightarrow {\bf R}} by the formulae

\displaystyle  P_{\leq N} f(x) := \sum_{k \in {\bf Z}^d} \phi( k/N ) \hat f(k) e^{2\pi i k \cdot x}

and (for {N > 1})

\displaystyle  P_N := P_{\leq N} - P_{\leq N/2}

so one has the Littlewood-Paley decomposition

\displaystyle  f = P_{\leq 1} f + \sum_{N>1} P_N f.

Here and in the sequel {N} is always understood to be restricted to be a dyadic integer.
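As a concrete illustration, here is the decomposition implemented on Fourier coefficients in pure Python (for {d=1}, with an illustrative choice of the bump function {\phi}); the telescoping identity {P_{\leq 1} + \sum_{1 < N \leq N_{\max}} P_N = P_{\leq N_{\max}}} recovers {f} exactly once {N_{\max}} exceeds twice the highest frequency present:

```python
# Littlewood-Paley projections acting on a dict of Fourier coefficients
# {k: f_hat(k)}.  The bump phi equals 1 on |xi| <= 1/2, vanishes for
# |xi| >= 1, and is smooth in between (a standard exp(-1/x) glue).
import math

def phi(xi):
    a = abs(xi)
    if a <= 0.5:
        return 1.0
    if a >= 1.0:
        return 0.0
    def g(x):
        return math.exp(-1.0 / x) if x > 0 else 0.0
    t = (1.0 - a) / 0.5  # runs from 1 (at |xi| = 1/2) to 0 (at |xi| = 1)
    return g(t) / (g(t) + g(1.0 - t))

def P_leq(N, fhat):
    """P_{<=N}: multiply each coefficient fhat[k] by phi(k/N)."""
    return {k: phi(k / N) * c for k, c in fhat.items()}

def P_band(N, fhat):
    """P_N := P_{<=N} - P_{<=N/2} for a dyadic integer N > 1."""
    lo, hi = P_leq(N // 2, fhat), P_leq(N, fhat)
    return {k: hi[k] - lo[k] for k in fhat}

# a test function given by its Fourier coefficients on |k| <= 20
fhat = {k: 1.0 / (1 + k * k) for k in range(-20, 21)}

# telescoping: f = P_{<=1} f + sum over dyadic 1 < N <= 64 of P_N f
recon = P_leq(1, fhat)
N = 2
while N <= 64:
    band = P_band(N, fhat)
    recon = {k: recon[k] + band[k] for k in fhat}
    N *= 2
err = max(abs(recon[k] - fhat[k]) for k in fhat)
print(err)  # essentially zero, since phi(k/64) = 1 for all |k| <= 20
```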

The key point of this decomposition is that the {L^p} and Sobolev norms of the individual components of this decomposition are easier to estimate than the original function {f}. The following estimates in particular will suffice for our applications:

Exercise 52 (Basic Littlewood-Paley estimates)

  • (a) For any dyadic integer {N}, show that

    \displaystyle  P_{\leq N} f(x) = \int_{{\bf R}^d} \check \phi(y) f(x - \frac{y}{N})\ dy

    where {\check \phi(x) := \int_{{\bf R}^d} \phi(\xi) e^{2\pi i \xi \cdot x}\ d\xi} is the inverse Fourier transform of {\phi}. In particular if {f} is real-valued then so are {P_{\leq N} f} and {P_N f}. Conclude the Bernstein inequality

    \displaystyle  \| P_{\leq N} f \|_{L^q} \lesssim_{d} N^{\frac{d}{p}-\frac{d}{q}} \| f \|_{L^p}

    for all smooth functions {f: {\bf R}^d/{\bf Z}^d \rightarrow{\bf R}}, all {N \geq 1} and {1 \leq p \leq q \leq \infty}; in particular

    \displaystyle  \| P_{\leq N} f \|_{L^p} \lesssim_{d} \| f \|_{L^p}.

    By the triangle inequality, the same estimates also hold for {P_N}, {N > 1}.

  • (b) For any {s \geq 0}, show that

    \displaystyle  \| f \|_{H^s} \sim_{d,s} \| P_{\leq 1} f \|_{L^2} + \left( \sum_{N>1} N^{2s} \| P_N f \|_{L^2}^2\right)^{1/2}

Remark 53 The more advanced Littlewood-Paley inequality, which is usually proven using the Calderón-Zygmund theory of singular integrals, asserts that

\displaystyle  \| f \|_{L^p} \sim_{d,p} \| |P_{\leq 1} f| + (\sum_{N > 1} |P_N f|^2)^{1/2} \|_{L^p}

for any {1 < p < \infty}. However, we will not use this estimate here.

We return now to the proof of Proposition 35. Let {\phi} be as above. By Exercise 52, it suffices to establish the bounds

\displaystyle  \| P_{\leq 1}( uv) \|_{L^2} \lesssim_{d,s} \| u \|_{H^s} \| v \|_{L^\infty} + \| u \|_{L^\infty} \| v \|_{H^s} \ \ \ \ \ (41)


\displaystyle  \sum_{N>1} N^{2s} \| P_{N}( uv) \|_{L^2}^2 \lesssim_{d,s} \| u \|_{H^s}^2 \| v \|_{L^\infty}^2 + \| u \|_{L^\infty}^2 \| v \|_{H^s}^2. \ \ \ \ \ (42)

The estimate (41) follows by dropping {P_{\leq 1}} (using Exercise 52) and applying Hölder’s inequality, so we turn to (42). We may restrict attention to those terms where {N>8} (say) since the other terms can be treated by the same argument used to prove (41).

The basic strategy here is to split the product {uv} (or the component {P_N(uv)} of this product) into paraproducts in which some constraint is imposed between the frequencies of the {u} and {v} terms. There are many ways to achieve this splitting; we will use

\displaystyle  P_N(uv) = P_N( (P_{\leq N/8}u) v ) + \sum_{M>N/8} P_N( (P_M u) v).

By the triangle inequality, it suffices to show the estimates

\displaystyle  \sum_{N>8} N^{2s} \| P_{N}( (P_{\leq N/8} u) v) \|_{L^2}^2 \lesssim_{d,s} \| u \|_{L^\infty}^2 \| v \|_{H^s}^2 \ \ \ \ \ (43)


\displaystyle  \sum_{N>8} N^{2s} (\sum_{M>N/8} \| P_{N}( (P_M u) v) \|_{L^2})^2 \lesssim_{d,s} \| u \|_{H^s}^2 \| v \|_{L^\infty}^2 \ \ \ \ \ (44)

We begin with (43). We can expand further

\displaystyle  P_N( (P_{\leq N/8} u) v ) = P_N( (P_{\leq N/8} u) P_{\leq 1} v ) + \sum_{M>1} P_N( (P_{\leq N/8} u) P_M v ).

The key point now is that (by inspecting the Fourier series expansions) the first term on the RHS vanishes, and the summands in the second term also vanish unless {M \sim N}. Thus

\displaystyle  N^{2s} \| P_{N}( (P_{\leq N/8} u) v) \|_{L^2}^2 \lesssim \sum_{M \sim N} N^{2s} \| P_{N}( (P_{\leq N/8} u) (P_M v)) \|_{L^2}^2

\displaystyle  \lesssim_d \sum_{M \sim N} M^{2s} \| (P_{\leq N/8} u) (P_M v) \|_{L^2}^2

\displaystyle  \lesssim_d \sum_{M \sim N} M^{2s} \| P_{\leq N/8} u \|_{L^\infty}^2 \| P_M v \|_{L^2}^2

\displaystyle  \lesssim_d \| u \|_{L^\infty}^2 \sum_{M \sim N} M^{2s} \| P_M v \|_{L^2}^2

and the claim follows by summing in {N}, interchanging the summations, and using Exercise 52. Now we prove (44). We bound

\displaystyle  N^{2s} (\sum_{M>N/8} \| P_{N}( (P_M u) v) \|_{L^2})^2 \lesssim_d N^{2s} (\sum_{M>N/8} \| (P_M u) v \|_{L^2})^2

\displaystyle  \lesssim_d N^{2s} (\sum_{M > N/8} \| P_M u \|_{L^2} \| v \|_{L^\infty})^2

\displaystyle  \lesssim_{d,s} \| v\|_{L^\infty}^2 \sum_{M > N/8} (N/M)^{s} M^{2s} \| P_M u \|_{L^2}^2

using Cauchy-Schwarz, and the claim again follows by summing in {N}, interchanging the summations, and using Exercise 52.
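The splitting into paraproducts used above is an exact identity (only the estimates require work), and this can be confirmed numerically in one dimension. In the following sketch (cutoff, grid size, and band-limits are our choices, not from the text), the low-high piece plus the high-frequency pieces reproduce {P_N(uv)} to machine precision:

```python
import numpy as np

def phi(xi):
    # even cutoff: 1 on |xi| <= 1/2, 0 on |xi| >= 1 (cosine taper for the demo)
    t = np.clip(2 * np.abs(xi) - 1, 0, 1)
    return 0.5 * (1 + np.cos(np.pi * t))

n = 512
rng = np.random.default_rng(0)
# band-limited random real fields, so the product uv is still resolved on the grid
u = np.fft.irfft(rng.standard_normal(33) + 1j * rng.standard_normal(33), n)
v = np.fft.irfft(rng.standard_normal(33) + 1j * rng.standard_normal(33), n)
k = np.fft.fftfreq(n, d=1 / n)

def P_leq(N, f):
    return np.fft.ifft(phi(k / N) * np.fft.fft(f)).real

def P(N, f):
    return P_leq(N, f) - P_leq(N // 2, f)

N = 32
lhs = P(N, u * v)
rhs = P(N, P_leq(N // 8, u) * v)  # low-high piece
M = 2 * (N // 8)  # smallest dyadic M > N/8
while M <= n:
    rhs = rhs + P(N, P(M, u) * v)  # high-frequency pieces
    M *= 2
assert np.allclose(lhs, rhs, atol=1e-10)
```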

There is an essentially identical theory in the non-periodic setting, in which the role of smooth periodic functions is now played by Schwartz functions, the Littlewood-Paley projections {P_{\leq N}} are now defined as

\displaystyle  P_{\leq N} f(x) := \int_{{\bf R}^d} \phi( \xi/N ) \hat f(\xi) e^{2\pi i \xi \cdot x}\ d\xi,

and {P_N} is defined as before.

Exercise 54 (Non-periodic Littlewood-Paley theory) With {L^2} now denoting {L^2({\bf R}^d \rightarrow {\bf R})} instead of {L^2({\bf R}^d/{\bf Z}^d \rightarrow {\bf R})}, and similarly for other function spaces, establish the non-periodic analogue of Exercise 52 for Schwartz functions {f}.

In particular, one obtains the non-periodic analogue of Proposition 35 by repeating the proof verbatim.

David HoggKnutson, calibration, hot stars

Heather Knutson (Caltech) arrived for a week of hacking on exoplanet and brown-dwarf spectroscopy. She has a number of things she has brought for our consideration. But the one that seems to be sticking is the inadequacy of her theory-driven or physical tellurics model. It has systematic residuals. We are going to explore options for tweaking the model using a data-driven fit to the residuals. This is a structure that I would like to try also for The Cannon: Instead of making a data-driven model for the stellar spectra, we could make a data-driven model for the residuals of the spectra away from best-fit models. And the parameters for the physics-driven model and the data-driven model could be tied together (or not) in various clever ways. So much idea.

At lunch, Anthony Pullen (NYU) gave a great talk about foreground mitigation in line-intensity mapping experiments. He went through all the kinds of auto-correlations, cross-correlations, and de-correlations that can be done to remove or mitigate foregrounds. The talk reminded me of many conversations I have had over my life about self-calibration, which led me to think about whether we could replace the cross-correlation parts of his model with a kind of self-calibration. Worth thinking about!

Late in the day, Benjamin Pope (NYU) and I came up with a good plan for looking at hot stars in Kepler. We could look at modeling them as a mixture of asteroseismic modes, spacecraft systematics, and planets. And then probably find nothing! But find nothing better than it has been found before. I like that kind of project.

David Hoggready to submit!

I worked on the weekend to finish my paper with Eilers (MPIA) and Rix (MPIA). It is ready to submit! And yet I can't push my changes properly to GitHub because they are (in a very rare moment) down! I made some compromises in finishing up this paper; I can only justify them by promising myself I will address the final issues while the referee considers the manuscript.

Jordan EllenbergThe greatest Red Sox / Dodger

After game 2 it was already clear this was an NLCS so great it had to go seven, and it did.  But game seven wasn’t a great game seven.  After six hard-fought games, the Brewers never really mounted a threat, and went down 5-1.  Keenest pain of all was that I got what I’d been waiting for the whole series: a chance for my beloved Jonathan Schoop to be the hero.  He came in to pinch hit for starter Jhoulys Chacin in the bottom of the second, with two on and two out and the Brewers down by 1.  Schoop grounded out.  He was 0 for the postseason in 6 plate appearances.

So here we have it, a Red Sox / Dodgers series, and so it’s time for my annual post about what player had the best combined career for both teams.  (Last year:  Jimmy Wynn was the greatest Astro/Dodger.)

The greatest Red Sox / Dodger?  A player I’d never heard of, even though he was just a little before my time:  Reggie Smith.  Played in one World Series for the Red Sox (1967) and three for the Dodgers (1977,1978,1981).  Went to the All-Star Game with both teams.  Hit 300 home runs, cannon of an arm in the outfield, got 0.7% of the vote the one and only time he was up for the Hall of Fame.  Well, here’s his all time distinction; with 34.2 WAR for the Red Sox and 19.4 for the Dodgers, he’s the greatest Red Sox / Dodger of all time.

Surprisingly, given how old these teams are, the top Red Sox / Dodgers of all time are mostly recent players.  Derek Lowe is the top pitcher (19.4 WAR for Boston, 13.3 for LA.)  Adrian Gonzalez, Manny Ramirez, and Adrian Beltre are also worthy of mention.   The only old-time player who was a contender was Dutch Leonard, who actually pitched for Boston in the last Red Sox – Dodgers World Series in 1916, notching a complete game win.  But that guy never actually pitched for the Dodgers!  My search got confused because it turns out there were two Dutch Leonards, the second of whom was a Dodger to start his career.  Doesn’t count!

October 22, 2018

BackreactionString theory pros and cons [video - no singing!]

As I mentioned, I used the prize money I got from the recent FQXi essay contest to buy a new video camera. The main reason for doing this was not better music videos (though there’s that), but that my old camcorder doesn’t have a mic port and the audio quality of the built-in mic is miserable to say the least. Turned out, however, that none of my microphones actually worked with the new camera,

Jordan EllenbergNaser Talebizadeh Sardari, Hecke eigenvalues, and Chabauty in the deformation space

Naser Sardari is finishing a postdoc at Wisconsin this year and just gave a beautiful talk about his new paper.  Now Naser thinks of this as a paper about automorphic forms — and it is — but I want to argue that it is also a paper which develops an unexpected new form of the Chabauty method!  As I will now explain.  Tell me if you buy it.

First of all, what does Naser prove?  As the title might suggest, it’s a statement about the multiplicity of Hecke eigenvalues a_p; in this post, we’re just going to talk about the eigenvalue zero.  The Hecke operator T_p acts on the space of weight-k modular forms on Gamma_0(N); how many zero eigenvectors can it have, as k goes to infinity with N,p fixed?  If you believe conjectures of Maeda type, you might expect that the Hecke algebra acts irreducibly on the space S_k(Gamma_0(N)); of course this doesn’t rule out that one particular Hecke operator might have some zeroes, but it should make it seem pretty unlikely.

And indeed, Naser proves that the number of zero eigenvectors is bounded independently of k, and even gives an explicit upper bound. (When the desired value of a_p is nonzero, T_p has finite slope and we can reduce to a problem about modular forms in a single p-adic family; in that context, a uniform bound is easy, and one can even show that the number of such forms of weight <k grows very very very very slowly with k, where each "very" is a log; this is worked out on Frank Calegari’s blog. On the other hand, as Naser points out below in comments, if you ask about the “Hecke angle” a_p/p^{(k-1)/2}, we don’t know how to get any really good bound in the nonzero case. I think the conjecture is that you always expect finite multiplicity in either setting even if you range over all k.)

What I find most striking is the method of proof and its similarity to the Chabauty method!  Let me explain.  The basic idea of Naser’s paper is to set this up in the language of deformation theory, with the goal of bounding the number of weight-k p-adic Galois representations rho which could be the representations attached to weight-k forms with T_p = 0.

We can pin down the possible reductions mod p of such a form to a finite number of possibilities, and this number is independent of k, so let’s fix a residual representation rhobar once and for all.

The argument takes place in R_loc, the ring of deformations of rhobar|G_{Q_p}.  And when I say “the ring of deformations” I mean “the ring of deformations subject to whatever conditions are important,” I’m just drawing a cartoon here.  Anyway, R_loc is some big p-adic power series ring; or we can think of the p-adic affine space Spec R_loc, whose Z_p-points we can think of as the space of deformations of rhobar to p-adic local representations.  This turns out to be 5-dimensional in Naser’s case.

Inside Spec R_loc, we have the space of local representations which extend to global ones; let’s call this locus Spec R_glob.  This is still a p-adic manifold but it’s cut out by global arithmetic conditions and its dimension will be given by some computation in Galois cohomology over Q; it turns out to be 3.

But also inside Spec R_loc, we have a submanifold Z cut out by the condition that a_p is not just 0 mod p, it is 0 on the nose, and that the determinant is the kth power of cyclotomic for the particular k-th power you have in mind.  This manifold, which is 2-dimensional, is something you could define without ever knowing there was such a thing as Q; it’s just some closed locus in the deformation space of rhobar|Gal(Q_p).

But the restriction of rho to Gal(Q_p) is a point psi of R_loc which has to lie in both these two spaces, the local one which expresses the condition “psi looks like the representation of Gal(Q_p) attached to a weight-k modular form with a_p = 0” and the global one which expresses the condition “psi is the restriction to Gal(Q_p) of a representation of Gal(Q) unramified away from some specified set of primes.”  So psi lies in the intersection of the 3-dimensional locus and the 2-dimensional locus in 5-space, and the miracle is that you can prove this intersection is transverse, which means it consists of a finite set of points, and what’s more, it is a set of points whose cardinality you can explicitly bound!

If this sounds familiar, it’s because it’s just like Chabauty.  There, you have a curve C and its Jacobian J.  The analogue of R_loc is J(Q_p), or rather let’s say a neighborhood of the identity in J(Q_p) which looks like affine space Q_p^g.

The analogue of R_glob is (the p-adic closure of) J(Q), which is a proper subspace of dimension r, where r is the rank of J(Q), something you can compute or at least bound by Galois cohomology over Q.  (Of course it can’t be a proper subspace of dimension r if r >= g, which is why Chabauty doesn’t work in that case!)

The analogue of Z is C(Q_p); this is something defined purely p-adically, a locus you could talk about even if you had no idea your C/Q_p were secretly the local manifestation of a curve over Q.

And any rational point of C, considered as a point in J(Q_p), has to lie in both C(Q_p) and J(Q), whose dimensions are 1 and at most g-1, and once again the key technical tool is that this intersection can be shown to be transverse, whence finite, so C(Q) is finite and you have Mordell’s conjecture in the case r < g.  And, as Coleman observed decades after Chabauty, this method even allows you to get an explicit bound on the number of points of C(Q), though not an effective way to compute them.

I think this is a very cool confluence indeed!  In the last ten years we've seen a huge amount of work refining Chabauty; Matt Baker discusses some of it on his blog, and then there’s the whole nonabelian Chabauty direction launched by Minhyong Kim and pushed forward by Jen Balakrishnan and Netan Dogra and many others.  Are there other situations in which we can get meaningful results from “deformation-theoretic Chabauty,” and are the new technical advances in Chabauty methods relevant in this context?

David Hoggtarget selection; rock and metal

At Flatiron we have purchased a share in the Terra Hunting Experiment, which will be a big, long-term radial-velocity monitoring program with HARPS3. Today Megan Bedell (Flatiron) and I had a conversation about target selection for that survey. There are many choices that could be made in target selection that could make populations or astrophysics inferences very difficult or even impossible later. These conversations remind me of the great and hard work that went in to target selection in the SDSS family of surveys.

The day ended with a great talk by Leslie Rogers (Chicago) about the things that set planet sizes (as a function of mass). She always phrases her results in terms of what isn't rocky, because of the one-sided-ness of some or most of the composition-related observational uncertainties, but it sure looks to my eyes like the smallest planets are rock and metal, like the Earth. She has one extremely good case, which is orbiting so close to its host star that tidal-disruption arguments come in to play! She also was optimistic that transit-timing information might be informative in the near future. There were jokes about water planets and soda-water planets, because many planets that are rich in water are also expected to be very rich in CO2.

October 20, 2018

David Hoggconvexity in machine learning

Thursdays are low-research! But there was a great NYU Physics Colloquium at the end of the day by Eric Vanden-Eijnden (NYU) about the mathematical properties of neural networks. I would say “deep learning” but in fact the networks that are most amenable to mathematical analysis are actually shallow and wide.

I am not sure I fully understood EVE's talk, but if I did, he can show the following: Although the optimization of the network (which is a shallow but wide fully connected logistic network, maybe) is not in any sense convex, and although the model is non-identifiable, with certain (or any?) convex loss function, and with enough data (maybe), the optimum of the loss is convex in the approximation of the model to the function it is trying to emulate.

If anything even close to this is true it is extremely important: Can an optimization be non-convex in the parameter space of a function but convex in the function space? I am sure there are trivial examples, but non-trivially? This might relate to things I have wondered about bi-linear models and related, previously.

October 19, 2018

Matt von HippelA Micrographia of Beastly Feynman Diagrams

Earlier this year, I had a paper about the weird multi-dimensional curves you get when you try to compute trickier and trickier Feynman diagrams. These curves were “Calabi-Yau”, a type of curve string theorists have studied as a way to curl up extra dimensions to preserve something called supersymmetry. At the time, string theorists asked me why Calabi-Yau curves showed up in these Feynman diagrams. Do they also have something to do with supersymmetry?

I still don’t know the general answer. I don’t know if all Feynman diagrams have Calabi-Yau curves hidden in them, or if only some do. But for a specific class of diagrams, I now know the reason. In this week’s paper, with Jacob Bourjaily, Andrew McLeod, and Matthias Wilhelm, we prove it.

We just needed to look at some more exotic beasts to figure it out.


Like this guy!

Meet the tardigrade. In biology, they’re incredibly tenacious microscopic animals, able to withstand the most extreme of temperatures and the radiation of outer space. In physics, we’re using their name for a class of Feynman diagrams.


A clear resemblance!

There is a long history of physicists using whimsical animal names for Feynman diagrams, from the penguin to the seagull (no relation). We chose to stick with microscopic organisms: in addition to the tardigrades, we have paramecia and amoebas, even a rogue coccolithophore.

The diagrams we look at have one thing in common, which is key to our proof: the number of lines on the inside of the diagram (“propagators”, which represent “virtual particles”) is related to the number of “loops” in the diagram, as well as the dimension. When these three numbers are related in the right way, it becomes relatively simple to show that any curves we find when computing the Feynman diagram have to be Calabi-Yau.

This includes the most well-known case of Calabi-Yaus showing up in Feynman diagrams, in so-called “banana” or “sunrise” graphs. It’s closely related to some of the cases examined by mathematicians, and our argument ended up pretty close to one made back in 2009 by the mathematician Francis Brown for a different class of diagrams. Oddly enough, neither argument works for the “traintrack” diagrams from our last paper. The tardigrades, paramecia, and amoebas are “more beastly” than those traintracks: their Calabi-Yau curves have more dimensions. In fact, we can show they have the most dimensions possible at each loop, provided all of our particles are massless. In some sense, tardigrades are “as beastly as you can get”.

We still don’t know whether all Feynman diagrams have Calabi-Yau curves, or just these. We’re not even sure how much it matters: it could be that the Calabi-Yau property is a red herring here, noticed because it’s interesting to string theorists but not so informative for us. We don’t understand Calabi-Yaus all that well yet ourselves, so we’ve been looking around at textbooks to try to figure out what people know. One of those textbooks was our inspiration for the “bestiary” in our title, an author whose whimsy we heartily approve of.

Like the classical bestiary, we hope that ours conveys a wholesome moral. There are much stranger beasts in the world of Feynman diagrams than anyone suspected.

n-Category Café Analysis in Higher Gauge Theory

Higher gauge theory has the potential to describe the behavior of 1-dimensional objects and higher-dimensional membranes much as ordinary gauge theory describes the behavior of point particles. But ordinary gauge theory is also a source of fascinating differential equations, which yield interesting results about topology if one uses enough analysis to prove rigorous results about their solutions. What about higher gauge theory?

Andreas Gastel has a new paper studying higher gauge theory using some techniques of analysis that are commonly used in ordinary gauge theory. He’s finding some interesting similarities but also some ways in which higher gauge theory is simpler:

Abstract. We study the problem of finding good gauges for connections in higher gauge theories. We find that, for 2-connections in strict 2-gauge theory and 3-connections in 3-gauge theory, there are local “Coulomb gauges” that are more canonical than in classical gauge theory. In particular, they are essentially unique, and no smallness of curvature is needed in the critical dimensions. We give natural definitions of 2-Yang-Mills and 3-Yang-Mills theory and find that the choice of good gauges makes them essentially linear. As an application, (anti-)selfdual 2-connections over B^6 are always 2-Yang-Mills, and (anti-)selfdual 3-connections over B^8 are always 3-Yang-Mills.

I think this is great. I don’t know if the 2-Yang Mills and 3-Yang Mills equations studied here are interesting in physics — they are, in any event, being studied on Euclidean \mathbb{R}^n — but they might be interesting in differential geometry. And I’m very glad that Gastel read and tried to fix my old paper on higher Yang–Mills theory.

More generally, I think it’s great to get analysts involved in studying higher structures. Categorically-minded mathematicians and homotopy type theorists shouldn’t just soar up to the paradise of \infty-categories and leave their colleagues stranded on Earth in some mathematical equivalent of the Rapture.

There are interesting discoveries to be made by bringing other mathematicians into the conversation! For example, Gastel writes:

… we have a 2-Yang-Mills functional that really resembles Yang–Mills. Therefore, it is tempting to ask whether there is a higher form of Uhlenbeck’s gauge theorem, controlling norms like \|A\|_{W^{2,2}}+\|B\|_{W^{1,2}} by \|Z_{A,B}\|_{L^2} in dimensions m \le 6 once a suitable 2-gauge is fixed, maybe under a smallness condition for the latter norm. One of our results will be that this really works. But, surprisingly enough for the author, it turns out that the good 2-gauge exists without a smallness condition, and moreover the transformed 2-connection has a canonical form that has the potential to simplify the theory very much.

This kind of blend of ideas makes me happy: Sobolev spaces, differential geometry and higher categories!

Here Z_{A,B} is a kind of ‘curvature’ of the 2-connection (A,B), which consists of a Lie-algebra-valued 1-form A and a Lie-algebra-valued 2-form B:

Z_{A,B} = d B + \underline{\alpha}(A) \wedge B

A and B take values in different Lie algebras, in general, because they come from a Lie 2-algebra. But the first of these Lie algebras acts as derivations of the second via \underline{\alpha}, and this lets us form the wedge product \underline{\alpha}(A) \wedge B.

The point, therefore, is that if the L^2 norm of the curvature is small, we can find a gauge in which A and B are small in a suitable sense. This is a generalization of a fundamental result by Uhlenbeck for ordinary gauge theories, proved in 1982.

October 18, 2018


BackreactionFirst stars spell trouble for dark matter

The HERA telescope array in South Africa. In the beginning, reheating created a hot plasma of elementary particles. The plasma expanded, cooled, and emitted the cosmic background radiation. Then gravity made the plasma clump, and darkness was upon the face of the deep, whatever that means. Cosmologists call it the “dark ages,” the period in the early universe where matter is

David HoggIs the Milky Way halo really a thing?

A very low-research day was saved by Suroor Gandhi (NYU) who showed me work she is doing with Melissa Ness (Columbia) on stellar chemistry and kinematics. We discussed the question of whether the Milky Way stellar halo really looks like a distinct kinematic and chemical component (as it should!) or whether it just looks like some kind of continuous extension of the disk (which it should not, but does). Interesting, and how to dig deeper?

October 17, 2018

Robert HellingBavarian electoral system

Last Sunday, we had the election for the federal state of Bavaria. Since the electoral system is kind of odd (but not as odd as first past the post), I would like to analyse how some variations in the rules (assuming the actual distribution of votes) would have worked out. So, first, here is how the seats are actually distributed: Each voter gets two ballots: On the first ballot, each party lists one candidate from the local constituency and you can select one. On the second ballot, you can vote for a party list (it's even more complicated because there, too, you can select individual candidates to determine the positions on the list, but let's ignore that for today).

Then in each constituency, the votes on ballot one are counted. The candidate with the most votes (as in first past the post) gets elected to parliament directly (and is called a "direct candidate"). Then the votes for each party on both ballots (this is where the system differs from the federal elections) are summed up overall. All votes for parties with less than 5% of the grand total of all votes are discarded (actually including their direct candidates, but this is not of particular concern here). Let's call the rest the "reduced total". The seats are then distributed according to the fraction of each party in this reduced total.

Of course the first problem is that you can only distribute seats in integer multiples of 1. This is solved using the Hare-Niemeyer-method: You first distribute the integer parts. This clearly leaves fewer seats open than the number of parties. Those you then give to the parties where the rounding error to the integer below was greatest. Check out the wikipedia page explaining how this can lead to a party losing seats when the total number of seats available is increased.

Because this is what happens in the next step: Remember that we already allocated a number of seats to constituency winners in the first round. Those count towards the number of seats that each party is supposed to get in step two according to its fraction of the votes. Now it can happen that a party has won more direct candidates than the seats allocated to it in step two. If that happens, more seats are added to the total number of seats and distributed according to the rules of step two until each party has been allocated at least as many seats as it has direct candidates. This happens in particular if one party is stronger than all the other ones, leading to that party winning almost all direct candidates (as happened in Bavaria to the CSU, which won all direct candidates except five in Munich and one in Würzburg that were won by the Greens).

A final complication is that Bavaria is split into seven electoral districts, and the above procedure is carried out for each district separately. So the rounding and seat-adding procedures happen seven times.
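The two allocation steps described above (largest-remainder apportionment, then enlarging the house until every party keeps its direct mandates) can be sketched as follows. This is a deliberately simplified single-district toy model with made-up party names, not a reimplementation of the official procedure or of the perl script mentioned below:

```python
from math import floor

def hare_niemeyer(votes, seats):
    """Largest-remainder method: distribute `seats` proportionally to `votes`."""
    total = sum(votes.values())
    quotas = {p: seats * v / total for p, v in votes.items()}
    alloc = {p: floor(q) for p, q in quotas.items()}
    # hand the leftover seats to the parties with the largest fractional remainders
    leftover = seats - sum(alloc.values())
    for p in sorted(quotas, key=lambda p: quotas[p] - alloc[p], reverse=True)[:leftover]:
        alloc[p] += 1
    return alloc

def allocate_with_overhang(votes, base_seats, direct_mandates):
    """Grow the house until every party gets at least its direct mandates."""
    seats = base_seats
    while True:
        alloc = hare_niemeyer(votes, seats)
        if all(alloc[p] >= direct_mandates.get(p, 0) for p in votes):
            return seats, alloc
        seats += 1
```

For example, with votes {"A": 50, "B": 30, "C": 20}, 10 base seats, and 6 direct mandates for party A, the house grows to 11 seats so that A's largest remainder yields its sixth seat.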

Sunday's election resulted in the following distribution of seats:

After the whole procedure, there are 205 seats distributed as follows

  • CSU 85 (41.5% of seats)
  • SPD 22 (10.7% of seats)
  • FW 27 (13.2% of seats)
  • GREENS 38 (18.5% of seats)
  • FDP 11 (5.4% of seats)
  • AFD 22 (10.7% of seats)
You can find all the total of votes on this page.

Now, for example, one can calculate the distribution without districts, throwing everything into a single super-district. Then there are 208 seats distributed as

  • CSU 85 (40.8%)
  • SPD 22 (10.6%)
  • FW 26 (12.5%)
  • GREENS 40 (19.2%)
  • FDP 12 (5.8%)
  • AFD 23 (11.1%)
You can see that in particular the CSU, the party with the most votes, profits from doing the rounding seven times rather than just once, while the last three parties would benefit from giving up districts.

But then there is actually an issue of negative vote weight. The Greens are particularly strong in Munich, where they managed to win five direct seats. If those seats had instead gone to the CSU (as elsewhere), the number of seats for Oberbayern, the district Munich belongs to, would have had to be increased to accommodate the additional CSU direct mandates. That increases the weight of Oberbayern relative to the other districts, which in turn benefits the Greens, since they are particularly strong in Oberbayern. So if I give all the direct mandates to the CSU (without modifying the total vote counts), I get the following distribution of 221 seats:
  • CSU 91 (41.2%)
  • SPD 24 (10.9%)
  • FW 28 (12.6%)
  • GREENS 42 (19.0%)
  • FDP 12 (5.4%)
  • AFD 24 (10.9%)
That is, the Greens would have gotten a higher fraction of seats if they had won fewer constituencies. Voting for Green candidates in Munich actually hurt the party as a whole!

The effect is not big enough to change majorities (the CSU and FW are likely to form a coalition), but still, the constitutional court does not look kindly on (predictable) negative vote weight. Let's see if somebody challenges this election and what that would lead to.

The Perl script I used to do this analysis is here.

The above analysis in the last point is not entirely fair, since not winning a constituency means getting fewer votes, which are then missing from the grand total. Taking this into account makes the effect smaller. In fact, subtracting from the Greens the votes by which they led in the constituencies they won leads to an almost zero effect:

Seats: 220
  • CSU 91 (41.4%)
  • SPD 24 (10.9%)
  • FW 28 (12.7%)
  • GREENS 41 (18.6%)
  • FDP 12 (5.4%)
  • AFD 24 (10.9%)
Letting the Greens win München-Mitte (a newly created constituency that was supposed to act like a bad bank for the CSU, absorbing central Munich's more left-leaning voters; do I hear somebody say "Gerrymandering"?) yields

Seats: 217
  • CSU 90 (41.5%)
  • SPD 23 (10.6%)
  • FW 28 (12.9%)
  • GREENS 41 (18.9%)
  • FDP 12 (5.5%)
  • AFD 23 (10.6%)
Or letting them win all of their constituencies except Moosach and Würzburg-Stadt, where their leads were smallest:

Seats: 210
  • CSU 87 (41.4%)
  • SPD 22 (10.5%)
  • FW 27 (12.9%)
  • GREENS 40 (19.0%)
  • FDP 11 (5.2%)
  • AFD 23 (11.0%)

October 16, 2018

Jordan EllenbergNLCS game 2: Dodgers 4, Brewers 3

In 35 years of watching baseball I had never been to a postseason game, until this Saturday, when I was able to get two tickets to Game 2 of the National League Championship Series through a wonderful terrific beautiful friend with connections.

First of all, I salute whoever the free spirit was who slammed a Zima right before entering Miller Park.

The game started at 3pm; in late afternoon with the roof shut at Miller Park there’s a slant-line of sunlight across the field which is lovely to look at and probably terrible to hit in.

And indeed there wasn’t a lot of hitting to start with. Wade Miley, once a bad Oriole, now a good Brewer, never looked dominant, giving up lots of hard-hit balls including a shot by David Freese in the first that Lorenzo Cain hauled back in from over the wall, but somehow pitched 5 2/3 innings while allowing only 2 hits (and collecting a single himself.) Hyun-jin Ryu matched him zero for zero. Every seat in Miller Park full, everyone attentive to the game, a level of attention I’ve never seen there. The guy behind us kept saying “NASTY, throw something NASTY.” CJ believes he sees Marlins Man in the front row — he’s right! Brewers get runners on second and third with one out, Dodgers intentionally walk Yelich to load the bases, (wave of boos), Braun delivers the RBI groundout but can’t score any more. Travis Shaw hits a solo shot to deepest center, the Brewers go up 3-0, and people start to smell win, but the Dodgers lineup has good hitters all the way down to #8 and the usually reliable Milwaukee bullpen starts to crack. Jeremy Jeffress comes in with runners on first and second and nobody out, immediately gives up a single to Joc Pederson, now they’re loaded, still nobody out, Brewers up 3-1. Manny Machado, on third base, keeps jumping off the bag, trying to distract Jeffress. But Jeffress strikes out Yasiel Puig, who’s so angry he smashes his bat over his knee. Crowd exults. Then he walks light-hitting catcher Austin Barnes to force in a run. Nobody’s up in the bullpen. Crowd panics. Yasmani Grandal comes in to hit in the pitcher’s spot and Jeffress somehow gets the double play ball and is out of it. But the next inning, Jeffress stays in a little too long; Chris Taylor leads off with a lucky little dink of an infield single and then Turner muscles a ball out to the short corner in left field; 4-3 Dodgers and it stays that way.

But the Brewers do threaten. 43,000 Brewers fans want to see Yelich get one more chance to be the hero. Hernan Perez draws a walk in the bottom of the ninth, steals while Cain strikes out. So Yelich gets to bat with 2 outs and a runner in scoring position. He grounds out. Crowd deflates. But that’s all you can ask of a baseball game, right? The hitter you want in the situation you want with the game on the line and whatever happens happens. Great baseball. Great team. I hope they win it all.  Maybe I’ll try to be there when they do.

Jordan EllenbergDid I like Bleak House?

It’s like asking if I like New York.  It’s big!  A lot of different things are in it.  Some things are monumental and wonderful, some things have an offhand arresting beauty, some things smell bad.

Minor thoughts after break — this book just came out 165 years ago and I want to spare you spoilers.

    • The court case, Jarndyce and Jarndyce, that makes up the center of the book, promises the person who masters it great power, but it insidiously and inevitably corrupts everyone who comes in contact with it.  In other words I sort of see it as the basis for the Ring in Tolkien! And in the end the person most fully corrupted by it ends up participating in its destruction.
    • A lot of space in this book is taken up by the question of “What is the right way to be a woman?  What is the right way to be a man?”  Dickens offers you a lot of pretty different examples of men he sees as properly manly; the gentle and endlessly patient John Jarndyce, the simple-yet-deeply-good-and-loves-his-mother-a-lot soldier George, the selfless do-gooder Woodcourt, the comically presented and finally viewed as noble Lord Dedlock, and so on.  But in this book there is one and only one way to be good at being a woman, which is reiterated again and again — not to make trouble.  Not to ask for anything, not to disagree, to make of oneself a kind of item of service.  Here’s Inspector Bucket, praising Esther Summerson in much the same kind of language she uses aspirationally about herself:
      “Lord! You’re no trouble at all. I never see a young woman in any station of society — and I’ve seen many elevated ones too — conduct herself like you have conducted yourself since you was called out of your bed. You’re a pattern, you know, that’s what you are,” said Mr. Bucket warmly; “you’re a pattern.”

      I told him I was very glad, as indeed I was, to have been no hindrance to him, and that I hoped I should be none now.

      “My dear,” he returned, “when a young lady is as mild as she’s game, and as game as she’s mild, that’s all I ask, and more than I expect. She then becomes a queen, and that’s about what you are yourself.”
    • The character of Skimpole is the best thing in this, maybe because he’s one of the few characters who’s not a fixed type.  He’s charming and innocent, until he’s not.  Bucket again:  “Whenever a person proclaims to you ‘In worldly matters I’m a child,’ you consider that that person is only a-crying off from being held accountable and that you have got that person’s number, and it’s Number One.”
    • There’s a spontaneous combustion in the middle of this book!  Better still, before the spontaneous combustion reveal, there’s twenty pages of characters wandering around saying things like “my goodness, it certainly does smell like greasy smoke tonight, they must be cooking mutton at the pub.”  In the very last scene before Krook’s charred body is discovered, another character, directly upstairs from Krook’s room, puts his hand on the windowsill and finds it covered with a foul oily yellow ichor.  The whole thing is extremely metal.

      In the section that follows there’s a weird and kind of boring catalog of contemporary accounts of spontaneous combustions, which is apparently there because scientists hassled Dickens about the ridiculousness of the scene and he felt the need to defend himself in the subsequently published chapter.

    • Lots of sentence fragments in this book, moody description in a telegraphic style I think of as more modern than the 1850s (did Dickens invent this?)  Here’s the opening:

      London. Michaelmas term lately over, and the Lord Chancellor sitting in Lincoln’s Inn Hall. Implacable November weather. As much mud in the streets as if the waters had but newly retired from the face of the earth, and it would not be wonderful to meet a Megalosaurus, forty feet long or so, waddling like an elephantine lizard up Holborn Hill. Smoke lowering down from chimney-pots, making a soft black drizzle, with flakes of soot in it as big as full-grown snowflakes — gone into mourning, one might imagine, for the death of the sun. Dogs, undistinguishable in mire. Horses, scarcely better; splashed to their very blinkers.

    • And here’s the closing, which lands again on this question of “what does it mean to be good at being a woman”:

      “My dear Dame Durden,” said Allan, drawing my arm through his, “do you ever look in the glass?”

      “You know I do; you see me do it.”

      “And don’t you know that you are prettier than you ever were?”

      “I did not know that; I am not certain that I know it now. But I know that my dearest little pets are very pretty, and that my darling is very beautiful, and that my husband is very handsome, and that my guardian has the brightest and most benevolent face that ever was seen, and that they can very well do without much beauty in me — even supposing —.”

      I struggle with this paragraph on a purely literal level. I don’t understand the final sentence fragment. What is elided? Even supposing what?

October 15, 2018

Terence Tao254A, Notes 3: Local well-posedness for the Euler equations

We now turn to the local existence theory for the initial value problem for the incompressible Euler equations

\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p \ \ \ \ \ (1)


\displaystyle \nabla \cdot u = 0

\displaystyle u(0,x) = u_0(x).

For sake of discussion we will just work in the non-periodic domain {{\bf R}^d}, {d \geq 2}, although the arguments here can be adapted without much difficulty to the periodic setting. We will only work with solutions in which the pressure {p} is normalised in the usual fashion:

\displaystyle p = - \Delta^{-1} \nabla \cdot \nabla \cdot (u \otimes u). \ \ \ \ \ (2)


Formally, the Euler equations (with normalised pressure) arise as the vanishing viscosity limit {\nu \rightarrow 0} of the Navier-Stokes equations

\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p + \nu \Delta u \ \ \ \ \ (3)


\displaystyle \nabla \cdot u = 0

\displaystyle p = - \Delta^{-1} \nabla \cdot \nabla \cdot (u \otimes u)

\displaystyle u(0,x) = u_0(x)

that was studied in previous notes. However, because most of the bounds established in previous notes, either on the lifespan {T_*} of the solution or on the size of the solution itself, depended on {\nu}, it is not immediate how to justify passing to the limit and obtain either a strong well-posedness theory or a weak solution theory for the limiting equation (1). (For instance, weak solutions to the Navier-Stokes equations (or the approximate solutions used to create such weak solutions) have {\nabla u} lying in {L^2_{t,loc} L^2_x} for {\nu>0}, but the bound on the norm is {O(\nu^{-1/2})} and so one could lose this regularity in the limit {\nu \rightarrow 0}, at which point it is not clear how to ensure that the nonlinear term {u_j u} still converges in the sense of distributions to what one expects.)

Nevertheless, by carefully using the energy method (which we will do loosely following an approach of Bertozzi and Majda), it is still possible to obtain local-in-time estimates on (high-regularity) solutions to (3) that are uniform in the limit {\nu \rightarrow 0}. Such a priori estimates can then be combined with a number of variants of these estimates to obtain a satisfactory local well-posedness theory for the Euler equations. Among other things, we will be able to establish the Beale-Kato-Majda criterion – smooth solutions to the Euler (or Navier-Stokes) equations can be continued indefinitely unless the integral

\displaystyle \int_0^{T_*} \| \omega(t) \|_{L^\infty_x( {\bf R}^d \rightarrow \wedge^2 {\bf R}^d )}\ dt

becomes infinite at the final time {T_*}, where {\omega := \nabla \wedge u} is the vorticity field. The vorticity has the important property that it is transported by the Euler flow, and in two spatial dimensions it can be used to establish global regularity for both the Euler and Navier-Stokes equations in these settings. (Unfortunately, in three and higher dimensions the phenomenon of vortex stretching has frustrated all attempts to date to use the vorticity transport property to establish global regularity of either equation in this setting.)
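To make the transport property explicit in the two-dimensional case: the vorticity is then a scalar, and taking the curl of the Euler equations eliminates the pressure. This standard computation (not carried out in the text above) gives

```latex
% With \omega = \partial_1 u_2 - \partial_2 u_1 the (scalar) vorticity in d = 2,
% taking the curl of the Euler equations eliminates the pressure:
\partial_t \omega + (u \cdot \nabla) \omega = 0.
% For Navier-Stokes one gets instead
% \partial_t \omega + (u \cdot \nabla) \omega = \nu \Delta \omega,
% and in either case \| \omega(t) \|_{L^\infty_x} \leq \| \omega(0) \|_{L^\infty_x}
% (by transport, resp. the maximum principle), so the Beale-Kato-Majda
% integral stays finite on any compact time interval in two dimensions.
```

This is the mechanism behind the two-dimensional global regularity claim; in three dimensions the right-hand side acquires the vortex-stretching term {(\omega \cdot \nabla) u}, which destroys the {L^\infty} bound.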

There is a rather different approach to establishing local well-posedness for the Euler equations, which relies on the vorticity-stream formulation of these equations. This will be discussed in a later set of notes.

— 1. A priori bounds —

We now develop some a priori bounds for very smooth solutions to Navier-Stokes that are uniform in the viscosity {\nu}. Define an {H^\infty} function to be a function that lies in every {H^s} space; similarly define an {L^p_t H^\infty_x} function to be a function that lies in {L^p_t H^s_x} for every {s}. Given divergence-free {H^\infty({\bf R}^d \rightarrow {\bf R}^d)} initial data {u_0}, an {H^\infty} mild solution {u} to the Navier-Stokes initial value problem (3) is a solution that is an {H^s} mild solution for all {s}. From the (non-periodic version of) Corollary 40 of Notes 1, we know that for any {H^\infty({\bf R}^d \rightarrow {\bf R}^d)} divergence-free initial data {u_0}, there is a unique {H^\infty} maximal Cauchy development {u: [0,T_*) \times {\bf R}^d \rightarrow {\bf R}^d}, with {\|u\|_{L^\infty_t L^\infty_x([0,T_*) \times {\bf R}^d)}} infinite if {T_*} is finite.

Here are our first bounds:

Theorem 1 (A priori bound) Let {u: [0,T_*) \times {\bf R}^d \rightarrow {\bf R}^d} be an {H^\infty} maximal Cauchy development to (3) with initial data {u_0}.

  • (i) For any integer {s > \frac{d}{2}+1}, we have

    \displaystyle T_* \gg_{s,d} \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}^{-1}.

    Furthermore, if {0 \leq t \leq c_{s,d} \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}^{-1}} for a sufficiently small constant {c_{s,d}>0} depending only on {s,d}, then

    \displaystyle \| u(t) \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}.

  • (ii) For any {0 < T < T_*} and integer {s \geq 0}, one has

    \displaystyle \| u \|_{L^\infty_t H^s_x([0,T] \times{\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d}

    \displaystyle \exp( O_{s,d}( \| \nabla u \|_{L^1_t L^\infty_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^{d^2})} ) ) \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}.

The hypothesis that {s} is integer can be dropped by more heavily exploiting the theory of paraproducts, but we shall restrict attention to integer {s} for simplicity.

We now prove this theorem using the energy method. Using the Navier-Stokes equations, we see that {u, p} and {\partial_t u} all lie in {L^\infty_t H^\infty_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} for any {0 < T < T_*}; an easy iteration argument then shows that the same is true for all higher derivatives of {u,p} also. This will make it easy to justify the differentiation under the integral sign that we shall shortly perform.

Let {s \geq 0} be an integer. For each time {t \in [0,T)}, we introduce the energy-type quantity

\displaystyle E(t) := \sum_{m=0}^s \frac{1}{2} \int_{{\bf R}^d} |\nabla^m u(t,x)|^2\ dx.

Here we think of {\nabla^m u} as taking values in the Euclidean space {{\bf R}^{d^{m+1}}}. This quantity is of course comparable to {\| u(t) \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}^2}, up to constants depending on {d,s}. It is easy to verify that {E(t)} is continuously differentiable in time, with derivative

\displaystyle \partial_t E(t) = \sum_{m=0}^s \int_{{\bf R}^d} \nabla^m u \cdot \nabla^m \partial_t u\ dx,

where we suppress explicit dependence on {t,x} in the integrand for brevity. We now try to bound this quantity in terms of {E(t)}. We expand the right-hand side in coordinates using (3) to obtain

\displaystyle \partial_t E(t) = -A - B +C


\displaystyle A := \sum_{m=0}^s \int_{{\bf R}^d} \nabla^m u_i \cdot \nabla^m (u_j \partial_j u_i)\ dx

\displaystyle B := \sum_{m=0}^s \int_{{\bf R}^d} \nabla^m u_i \cdot \nabla^m \partial_i p\ dx

\displaystyle C := \nu \sum_{m=0}^s \int_{{\bf R}^d} \nabla^m u_i \cdot \nabla^m \partial_j \partial_j u_i\ dx.

For {B}, we can integrate by parts to move the {\partial_i} operator onto {u_i} and use the divergence-free nature {\partial_i u_i=0} of {u} to conclude that {B=0}. Similarly, we may integrate by parts for {C} to move one copy of {\partial_j} over to the other factor in the integrand to conclude

\displaystyle C = - \nu \sum_{m=0}^s \int_{{\bf R}^d} |\nabla^{m+1} u|^2\ dx

so in particular {C \leq 0} (note that as we are seeking bounds that are uniform in {\nu}, we can’t get much further use out of {C} beyond this bound). Thus we have

\displaystyle \partial_t E(t) \leq -A.

Now we expand out {A} using the Leibniz rule. There is one dangerous term, in which all the derivatives in {\nabla^m (u_j \partial_j u_i)} fall on the {u_i} factor, giving rise to the expression

\displaystyle \sum_{m=0}^s \int_{{\bf R}^d} u_j \nabla^m u_i \cdot \nabla^m \partial_j u_i\ dx.

But we can locate a total derivative to write this as

\displaystyle \frac{1}{2} \sum_{m=0}^s \int_{{\bf R}^d} u_j \partial_j |\nabla^m u|^2\ dx,

and then an integration by parts using {\partial_j u_j=0} as before shows that this term vanishes. Estimating the remaining contributions to {A} using the triangle inequality, we arrive at the bound

\displaystyle |A| \lesssim_{s,d} \sum_{m=1}^s \sum_{a=1}^m \int_{{\bf R}^d} |\nabla^m u| |\nabla^a u| |\nabla^{m-a+1} u|\ dx.

At this point we now need a variant of Proposition 35 from Notes 1:

Exercise 2 Let {a,b \geq 0} be integers. For any {f,g \in H^\infty({\bf R}^d \rightarrow {\bf R})}, show that

\displaystyle \| |\nabla^a f| |\nabla^b g| \|_{L^2({\bf R}^d \rightarrow {\bf R})} \lesssim_{a,b,d} \| f \|_{L^\infty({\bf R}^d \rightarrow {\bf R})} \| g \|_{H^{a+b}({\bf R}^d \rightarrow {\bf R})}

\displaystyle + \| f \|_{H^{a+b}({\bf R}^d \rightarrow {\bf R})} \| g \|_{L^\infty({\bf R}^d \rightarrow {\bf R})}.

(Hint: for {a=0} or {b=0}, use Hölder’s inequality. Otherwise, use a suitable Littlewood-Paley decomposition.)

Using this exercise and Hölder’s inequality, we see that

\displaystyle \int_{{\bf R}^d} |\nabla^m u| |\nabla^a u| |\nabla^{m-a+1} u| \lesssim_{a,m,d} \| \nabla^m u \|_{L^2({\bf R}^d \rightarrow {\bf R}^{d^{m+1}})} \| \nabla u \|_{L^\infty({\bf R}^d \rightarrow {\bf R}^{d^2})}

\displaystyle \| \nabla^m u \|_{L^2({\bf R}^d \rightarrow {\bf R}^{d^{m+1}})}

and thus

\displaystyle \partial_t E(t) \leq O_{s,d}( E(t) \| \nabla u(t) \|_{L^\infty({\bf R}^d \rightarrow {\bf R}^{d^2})} ). \ \ \ \ \ (4)


By Gronwall’s inequality we conclude that

\displaystyle E(t) \leq E(0) \exp( O_{s,d}( \| \nabla u \|_{L^1_t L^\infty_x( [0,T] \times {\bf R}^d \rightarrow {\bf R}^{d^2} )} ) )

for any {0 < T < T_*} and {t \in [0,T]}, which gives part (ii).

Now assume {s > \frac{d}{2}+1}. Then we have the Sobolev embedding

\displaystyle \| \nabla u(t) \|_{L^\infty({\bf R}^d \rightarrow {\bf R}^{d^2})} \lesssim_{s,d} E(t)^{1/2}

which when inserted into (4) yields the differential inequality

\displaystyle \partial_t E(t) \leq O_{s,d}( E(t)^{3/2} )

or equivalently

\displaystyle \partial_t E(t)^{-1/2} \geq - C_{s,d}

for some constant {C_{s,d}} (strictly speaking one should work with {(\varepsilon + E(t))^{-1/2}} for some small {\varepsilon>0} which one sends to zero later, if one wants to avoid the possibility that {E(t)} vanishes, but we will ignore this small technicality for sake of exposition.) Since {E(0)^{-1/2} \gtrsim_{s,d} \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}^{-1}}, we conclude that {E(t)} stays bounded for a time interval of the form {0 \leq t < \min( c_{s,d} \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}^{-1}, T_*)}; this, together with the blowup criterion that {\|u(t)\|_{H^s}} must go to infinity as {t \rightarrow T_*}, gives part (i).
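To spell out the step from the differential inequality to the lifespan bound, one can integrate directly (a routine computation, under the simplifying assumption that {E(t)} is strictly positive as discussed above):

```latex
% From \partial_t (E(t)^{-1/2}) \geq - C_{s,d} we get, for 0 \leq t < T_*,
E(t)^{-1/2} \geq E(0)^{-1/2} - C_{s,d}\, t,
% hence
E(t) \leq \frac{E(0)}{\bigl(1 - C_{s,d}\, E(0)^{1/2}\, t\bigr)^2}
\quad \text{for } 0 \leq t < \frac{1}{C_{s,d}} E(0)^{-1/2}.
% Since E(0)^{1/2} \lesssim_{s,d} \| u_0 \|_{H^s}, the energy E(t), and hence
% \| u(t) \|_{H^s}, stays bounded for 0 \leq t \leq c_{s,d} \| u_0 \|_{H^s}^{-1}.
```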

As a consequence, we can now obtain local existence for the Euler equations from smooth data:

Corollary 3 (Local existence for smooth solutions) Let {u_0 \in H^\infty({\bf R}^d \rightarrow {\bf R}^d)} be divergence-free. Let {s > \frac{d}{2}+1} be an integer, and set

\displaystyle T := c_{s,d} \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}^{-1}.

Then there is a smooth solution {u: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d}, {p: [0,T] \times {\bf R}^d \rightarrow {\bf R}} to (1) with all derivatives of {u,p} in {L^\infty_t H^\infty([0,T] \times {\bf R}^d \rightarrow {\bf R}^m)} for appropriate {m}. Furthermore, for any integer {s' \geq 0}, one has

\displaystyle \| u \|_{L^\infty_t H^{s'}_x([0,T] \times{\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,s',d} \| u_0 \|_{H^{s'}_x({\bf R}^d \rightarrow {\bf R}^d)}. \ \ \ \ \ (5)


Proof: We use the compactness method, which will be more powerful here than in the last section because we have much higher regularity uniform bounds (but they are only local in time rather than global). Let {\nu_n > 0} be a sequence of viscosities going to zero. By the local existence theory for Navier-Stokes (Corollary 40 of Notes 1), for each {n} we have a maximal Cauchy development {u^{(n)}: [0,T^{(n)}_*) \times {\bf R}^d \rightarrow {\bf R}^d}, {p^{(n)}: [0,T^{(n)}_*) \times {\bf R}^d \rightarrow {\bf R}} to the Navier-Stokes initial value problem (3) with viscosity {\nu_n} and initial data {u_0}. From Theorem 1(i), we have {T^{(n)}_* \geq T} for all {n} (if {c_{s,d}} is small enough), and

\displaystyle \| u^{(n)} \|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}

for all {n}. By Sobolev embedding, this implies that

\displaystyle \| \nabla u^{(n)} \|_{L^\infty_t L^\infty_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^{d^2})} \lesssim_{s,d} \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)},

and then by Theorem 1(ii) one has

\displaystyle \| u^{(n)} \|_{L^\infty_t H^{s'}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^{d})} \lesssim_{s,s',d} \| u_0 \|_{H^{s'}({\bf R}^d \rightarrow {\bf R}^d)} \ \ \ \ \ (6)


for every integer {s'}. Thus, for each {s'}, {u^{(n)}} is bounded in {L^\infty_t H^{s'}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^{d})}, uniformly in {n}. By repeatedly using (3) and product estimates for Sobolev spaces, we see the same is true for {p^{(n)}}, and for all higher derivatives of {u^{(n)}, p^{(n)}}. In particular, all derivatives of {u^{(n)}, p^{(n)}} are equicontinuous.

Using weak compactness (Proposition 2 of Notes 2), one can pass to a subsequence such that {u^{(n)}, p^{(n)}} converge weakly to some limits {u, p}, such that {u,p} and all their derivatives lie in {L^\infty_t H^{s'}_x} on {[0,T] \times {\bf R}^d}; in particular, {u,p} are smooth. From the Arzelá-Ascoli theorem (and Proposition 3 of Notes 2), {u^{(n)}} and {p^{(n)}} converge locally uniformly to {u,p}, and similarly for all derivatives of {u,p}. One can then take limits in (3) and conclude that {u,p} solve (1). The bound (5) follows from taking limits in (6). \Box

Remark 4 We are able to easily pass to the zero viscosity limit here because our domain {{\bf R}^d} has no boundary. In the presence of a boundary, we cannot freely differentiate in space as casually as we have been doing above, and one no longer has bounds on higher derivatives on {u} and {p} near the boundary that are uniform in the viscosity. Instead, it is possible for the fluid to form a thin boundary layer that has a non-trivial effect on the limiting dynamics. We hope to return to this topic in a future set of notes.

We have constructed a local smooth solution to the Euler equations from smooth data, but have not yet established uniqueness or continuous dependence on the data; related to the latter point, we have not extended the construction to larger classes of initial data than the smooth class {H^\infty}. To accomplish these tasks we need a further a priori estimate, now involving differences of two solutions, rather than just bounding a single solution:

Theorem 5 (A priori bound for differences) Let {R>0}, let {s > \frac{d}{2}+1} be an integer, and let {u_0, v_0 \in H^\infty({\bf R}^d \rightarrow {\bf R}^d)} be divergence-free with {H^s({\bf R}^d \rightarrow {\bf R}^d)} norm at most {R}. Let

\displaystyle 0 < T \leq c_{s,d} R^{-1}

where {c_{s,d}>0} is sufficiently small depending on {s,d}. Let {u: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d} and {p: [0,T] \times {\bf R}^d \rightarrow {\bf R}} be an {H^\infty} solution to (1) with initial data {u_0} (this exists thanks to Corollary 3), and let {v: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d} and {q: [0,T] \times {\bf R}^d \rightarrow {\bf R}} be an {H^\infty} solution to (1) with initial data {v_0}. Then one has

\displaystyle \|u-v\|_{L^\infty_t H^{s-1}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \|u_0-v_0\|_{H^{s-1}_x({\bf R}^d \rightarrow {\bf R}^d)} \ \ \ \ \ (7)



\displaystyle \|u-v\|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \|u_0-v_0\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)} \ \ \ \ \ (8)


\displaystyle + T \|u_0-v_0\|_{H^{s-1}_x({\bf R}^d \rightarrow {\bf R}^d)} \| v_0 \|_{H^{s+1}({\bf R}^d \rightarrow {\bf R}^d)}.

Note the asymmetry between {u} and {v} in (8): this estimate requires control on the initial data {v_0} in the high regularity space {H^{s+1}} in order to be usable, but has no such requirement on the initial data {u_0}. This asymmetry will be important in some later applications.

Proof: From Corollary 3 we have

\displaystyle \| u \|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}, \| v \|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} R \ \ \ \ \ (9)



\displaystyle \| v \|_{L^\infty_t H^{s+1}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \| v_0 \|_{H^{s+1}_x({\bf R}^d \rightarrow {\bf R}^d)}. \ \ \ \ \ (10)


Now we need bounds on the difference {w := u-v}. Initially we have {w(0)=w_0}, where {w_0 := u_0-v_0}. To evolve later in time, we will need to use the energy method. Subtracting (1) for {(u,p)} and {(v,q)}, we have

\displaystyle \partial_t w + w \cdot \nabla v + u \cdot \nabla w = - \nabla (p-q)

\displaystyle \nabla \cdot w = 0.

By hypothesis, all derivatives of {w} and {p-q} lie in {L^\infty_t H^\infty_x} on {[0,T] \times {\bf R}^d}, which will allow us to justify the manipulations below without difficulty. We introduce the low regularity energy for the difference:

\displaystyle E^{s-1}(t) := \sum_{m=0}^{s-1} \frac{1}{2} \int_{{\bf R}^d} |\nabla^m w(t,x)|^2\ dx.

Arguing as in the proof of Theorem 1, we see that

\displaystyle \partial_t E^{s-1}(t) = -A - B


\displaystyle A := \sum_{m=0}^{s-1} \int_{{\bf R}^d} \nabla^m w_i \cdot \nabla^m (w_j \partial_j v_i + u_j \partial_j w_i)\ dx

\displaystyle B := \sum_{m=0}^{s-1} \int_{{\bf R}^d} \nabla^m w_i \cdot \nabla^m \partial_i (p-q)\ dx.

As before, the divergence-free nature of {w} ensures that {B} vanishes. For {A}, we use the Leibniz rule and again extract out the dangerous term

\displaystyle \sum_{m=0}^{s-1} \int_{{\bf R}^d} u_j \nabla^m w_i \cdot \nabla^m \partial_j w_i\ dx,

which again vanishes by integration by parts. We then use the triangle inequality to bound

\displaystyle |A| \lesssim_{s,d} \sum_{m=0}^{s-1} \sum_{a=0}^m \int_{{\bf R}^d} |\nabla^m w| |\nabla^a w| |\nabla^{m-a+1} v| \ dx

\displaystyle + \sum_{m=1}^{s-1} \sum_{a=1}^m \int_{{\bf R}^d} |\nabla^m w| |\nabla^a w| |\nabla^{m-a+1} v| + |\nabla^m w| |\nabla^a u| |\nabla^{m-a+1} w| \ dx.

Using Exercise 2 and Hölder, we may bound this by

\displaystyle \lesssim_{s,d} \sum_{m=0}^{s-1} \| \nabla^m w\|_{L^2} ( \| w \|_{L^\infty} \| \nabla^{m+1} v \|_{L^2} + \| \nabla^m w \|_{L^2} \| \nabla v \|_{L^\infty})

\displaystyle + \sum_{m=1}^{s-1} \| \nabla^m w\|_{L^2} ( \| \nabla u \|_{L^\infty} \| \nabla^{m} w \|_{L^2} + \| \nabla^{m+1} u \|_{L^2} \| w \|_{L^\infty})

which by Sobolev embedding gives

\displaystyle \lesssim_{s,d} E^{s-1}(t) ( \| v(t) \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)} + \| u(t) \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)} ).

Applying (9) and Gronwall’s inequality, we conclude that

\displaystyle E^{s-1}(t) \lesssim_{s,d} E^{s-1}(0)

for {0 \leq t \leq T}, and (7) follows.

Now we work with the high regularity energy

\displaystyle E^{s}(t) := \sum_{m=0}^{s} \frac{1}{2} \int_{{\bf R}^d} |\nabla^m w(t,x)|^2\ dx.

Arguing as before we have

\displaystyle \partial_t E^s(t) \lesssim_{s,d} \sum_{m=0}^{s} \sum_{a=0}^m \int_{{\bf R}^d} |\nabla^m w| |\nabla^a w| |\nabla^{m-a+1} v| \ dx

\displaystyle + \sum_{m=1}^{s} \sum_{a=1}^m \int_{{\bf R}^d} |\nabla^m w| |\nabla^a w| |\nabla^{m-a+1} v| + |\nabla^m w| |\nabla^a u| |\nabla^{m-a+1} w| \ dx.

Using Exercise 2 and Hölder, we may bound this by

\displaystyle \lesssim_{s,d} \sum_{m=0}^{s} \| \nabla^m w\|_{L^2} ( \| w \|_{L^\infty} \| \nabla^{m+1} v \|_{L^2} + \| \nabla^m w \|_{L^2} \| \nabla v \|_{L^\infty})

\displaystyle + \sum_{m=1}^{s} \| \nabla^m w\|_{L^2} ( \| \nabla u \|_{L^\infty} \| \nabla^{m} w \|_{L^2} + \| \nabla^{m} u \|_{L^2} \| \nabla w \|_{L^\infty}).

Using Sobolev embedding we thus have

\displaystyle \partial_t E^s(t) \lesssim_{s,d} E^s(t)^{1/2} E^{s-1}(t)^{1/2} \|v(t)\|_{H^{s+1}} + E^s(t) \|v(t)\|_{H^s}

\displaystyle + E^s(t) \|u(t)\|_{H^s}

and hence by (9), (10), (7)

\displaystyle \partial_t E^s(t) \lesssim_{s,d} E^s(t)^{1/2} \| w_0 \|_{H^{s-1}} \|v_0\|_{H^{s+1}} + R E^s(t).

By the chain rule, we obtain

\displaystyle \partial_t (E^s(t)^{1/2}) \lesssim_{s,d} \| w_0 \|_{H^{s-1}} \|v_0\|_{H^{s+1}} + R E^s(t)^{1/2}

(one can work with {(\varepsilon + E^s(t))^{1/2}} in place of {E^s(t)^{1/2}} and then send {\varepsilon \rightarrow 0} later if one wishes to avoid a lack of differentiability at {0}). By Gronwall’s inequality, we conclude that

\displaystyle E^s(t)^{1/2} \lesssim_{s,d} E^s(0)^{1/2} + R^{-1} \| w_0 \|_{H^{s-1}} \|v_0\|_{H^{s+1}}

for all {0 \leq t \leq T}, and (8) follows. \Box

By specialising (7) (or (8)) to the case where {u_0=v_0}, we see the solution constructed in Corollary 3 is unique. Now we can extend to wider classes of initial data than {H^\infty} initial data. The following result is essentially due to Kato and to Swann (with a similar result obtained by different methods by Ebin-Marsden):

Proposition 6 Let {s > \frac{d}{2}+1} be an integer, and let {u_0 \in H^s({\bf R}^d \rightarrow {\bf R}^d)} be divergence-free. Set

\displaystyle T := c_{s,d} \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}^{-1}

where {c_{s,d}>0} is sufficiently small depending on {s,d}. Let {u_0^{(n)} \in H^\infty({\bf R}^d \rightarrow {\bf R}^d)} be a sequence of divergence-free vector fields converging to {u_0} in {H^s} norm (for instance, one could apply Littlewood-Paley projections to {u_0}). Let {u^{(n)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d}, {p^{(n)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}} be the associated solutions to (1) provided by Corollary 3 (these are well-defined for {n} large enough). Then {u^{(n)}} and {p^{(n)}} converge in {L^\infty_t H^s_x} norm on {[0,T] \times {\bf R}^d} to limits {u \in C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}, {p \in C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R})} respectively, which solve (1) in a distributional sense.

Proof: We use a variant of Kato’s argument (see also the paper of Bona and Smith for a related technique). It will suffice to show that the {u^{(n)}} form a Cauchy sequence in {C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}, since the algebra properties of {H^s} then give the same for {p^{(n)}}, and one can then easily take limits (in this relatively high regularity setting) to obtain the limiting solution {u,p} that solves (1) in a distributional sense.

Let {N} be a large dyadic integer. By Corollary 3, we may find an {H^\infty} solution {v^{(N)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d}, {q^{(N)}} to the Euler equations (1) with initial data {P_{\leq N} u_0} (which lies in {H^\infty}). From Theorem 5, one has

\displaystyle \|u^{(n)}-v^{(N)}\|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \|u_0^{(n)}- P_{\leq N} u_0\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}

\displaystyle + T \|u_0^{(n)}- P_{\leq N} u_0\|_{H^{s-1}_x({\bf R}^d \rightarrow {\bf R}^d)} \| P_{\leq N} u_0 \|_{H^{s+1}({\bf R}^d \rightarrow {\bf R}^d)}.

Applying the triangle inequality and then taking limit superior, we conclude that

\displaystyle \limsup_{n,m \rightarrow \infty} \|u^{(n)}-u^{(m)}\|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \|u_0- P_{\leq N} u_0\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}

\displaystyle + T \|u_0 - P_{\leq N} u_0\|_{H^{s-1}_x({\bf R}^d \rightarrow {\bf R}^d)} \| P_{\leq N} u_0 \|_{H^{s+1}({\bf R}^d \rightarrow {\bf R}^d)}.

But by Plancherel’s theorem and dominated convergence we see that

\displaystyle N \|u_0- P_{\leq N} u_0\|_{H^{s-1}_x({\bf R}^d \rightarrow {\bf R}^d)} \rightarrow 0

\displaystyle \|u_0- P_{\leq N} u_0\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)} \rightarrow 0

\displaystyle N^{-1} \|P_{\leq N} u_0\|_{H^{s+1}_x({\bf R}^d \rightarrow {\bf R}^d)} \rightarrow 0

as {N \rightarrow \infty}, and hence

\displaystyle \limsup_{n,m \rightarrow \infty} \|u^{(n)}-u^{(m)}\|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} = 0,

giving the claim. \Box
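The three limits used in this argument can be illustrated numerically in a one-dimensional caricature, replacing {u_0} by a function whose Fourier coefficients decay just fast enough to lie in {H^s} but not {H^{s+1}} (the exponent {0.6} below is an arbitrary illustrative choice):

```python
import numpy as np

s = 2.0
k = np.arange(1, 2_000_001, dtype=float)
c2 = k ** (-2 * (s + 0.6))                   # |c_k|^2: lies in H^s, not H^{s+1}

def Hnorm(mask, sigma):                      # H^sigma norm of the masked piece
    return np.sqrt(np.sum((1 + k**2) ** sigma * c2 * mask))

rows = []
for N in [10, 100, 1000, 10000]:
    tail, low = k > N, k <= N
    rows.append((N * Hnorm(tail, s - 1),     # N * ||u0 - P_{<=N} u0||_{H^{s-1}}
                 Hnorm(tail, s),             # ||u0 - P_{<=N} u0||_{H^s}
                 Hnorm(low, s + 1) / N))     # N^{-1} * ||P_{<=N} u0||_{H^{s+1}}
    print(N, rows[-1])                       # all three columns decay in N
```

The decay is slow (like {N^{-0.1}} for this choice of coefficients), but it is decay, which is all the argument requires.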

Remark 7 Since the sequence {(u^{(n)}, p^{(n)})} can converge to at most one limit {(u,p)}, we see that the solution {(u,p)} to (1) is unique in the class of distributional solutions that are limits of smooth solutions (with initial data of those solutions converging to {u_0} in {H^s}). However, this leaves open the possibility that there are other distributional solutions that do not arise as the limits of smooth solutions (or as limits of smooth solutions whose initial data only converge to {u_0} in a weaker sense). It is possible to recover some uniqueness results for fairly weak solutions to the Euler equations if one also assumes some additional regularity on the fields {u,p} (or on related fields such as the vorticity {\omega = \nabla \wedge u}). In two dimensions, for instance, there is a celebrated theorem of Yudovich that weak solutions to 2D Euler are unique if one has an {L^\infty} bound on the vorticity. In higher dimensions one can also obtain uniqueness results if one assumes that the solution is in a high-regularity space such as {C^0_t H^s_x}, {s > \frac{d}{2}+1}. See for instance this paper of Chae for an example of such a result.

Exercise 8 (Continuous dependence on initial data) Let {s > \frac{d}{2}+1} be an integer, let {R>0}, and set {T := c_{s,d} R^{-1}}, where {c_{s,d}>0} is sufficiently small depending on {s,d}. Let {B} be the closed ball of radius {R} around the origin of divergence-free vector fields {u_0} in {H^s_x({\bf R}^d \rightarrow {\bf R}^d)}. The above proposition provides a solution {u \in C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} to the associated initial value problem. Show that the map from {u_0} to {u} is a continuous map from {B} to {C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}.

Remark 9 The continuity result provided by the above exercise is not as strong as in Navier-Stokes, where the solution map is in fact Lipschitz continuous (see e.g., Exercise 43 of Notes 1). In fact for the Euler equations, which is classified as a “quasilinear” equation rather than a “semilinear” one due to the lack of the dissipative term {\nu \Delta u} in the equation, the solution map is not expected to be uniformly continuous on this ball, let alone Lipschitz continuous. See this previous blog post for some more discussion.

Exercise 10 (Maximal Cauchy development) Let {s > \frac{d}{2}+1} be an integer, and let {u_0 \in H^s_x({\bf R}^d \rightarrow {\bf R}^d)} be divergence free. Show that there exists a unique {T_*>0} and unique {u \in C^0_{t,loc} H^s([0,T_*) \times {\bf R}^d \rightarrow {\bf R}^d)}, {p \in C^0_{t,loc} H^s([0,T_*) \times {\bf R}^d \rightarrow {\bf R})} with the following properties:

Furthermore, show that {T_*, u, p} do not depend on the particular choice of {s}, in the sense that if {u_0} belongs to both {H^s} and {H^{s'}} for two integers {s,s' > \frac{d}{2}+1} then the time {T_*} and the fields {u,p} produced by the above claims are the same for both {s} and {s'}.

We will refine part (iii) of the above exercise in the next section. It is a major open problem as to whether the case {T_* < \infty} (i.e., finite time blowup) can actually occur. (It is important here that we have some spatial decay at infinity, as represented here by the presence of the {H^s_x} norm; when the solution is allowed to diverge at spatial infinity, it is not difficult to construct smooth solutions to the Euler equations that blow up in finite time; see e.g., this article of Stuart for an example.)

Remark 11 The condition {s > \frac{d}{2}+1} that recurs in the above results can be explained using the heuristics from Section 5 of Notes 1. Assume that at a given time {t}, the velocity field {u} fluctuates at a spatial frequency {N \gtrsim 1}, with the fluctuations being of amplitude {A}. (We however permit the velocity field {u} to contain a “bulk” low frequency component which can have much larger amplitude than {A}; for instance, the first component {u_1} of {u} might take the form {u_1 = B + A \cos( N x_2)} where {B} is a quantity much larger than {A}.) Suppose one considers the trajectories of two particles {P,Q} whose separation at time zero is comparable to the wavelength {1/N} of the frequency oscillation. Then the velocities of {P,Q} will differ by about {A}, so one would expect the particles to stay roughly the same distance from each other up to time {\sim \frac{1}{AN}}, and then exhibit more complicated and unpredictable behaviour after that point. Thus the natural time scale {T} here is {T \sim \frac{1}{AN}}, so one only expects to have a reasonable local well-posedness theory in the regime

\displaystyle \frac{1}{AN} \gtrsim 1. \ \ \ \ \ (11)


On the other hand, if {u_0} lies in {H^s}, and the frequency {N} fluctuations are spread out over a set of volume {V}, the heuristics from the previous notes predict that

\displaystyle N^s A V^{1/2} \lesssim 1.

The uncertainty principle predicts {V \gtrsim N^{-d}}, and so

\displaystyle \frac{1}{AN} \gtrsim N^{s - \frac{d}{2} - 1}.

Thus we force the regime (11) to occur if {s > \frac{d}{2}+1}, and barely have a chance of doing so in the endpoint case {s = \frac{d}{2}+1}, but would not expect to have a local theory (at least using the sort of techniques deployed in this section) for {s < \frac{d}{2} + 1}.
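The algebra in this heuristic is easily checked symbolically: saturating the two constraints (the {H^s} bound and the uncertainty principle) and forming the natural timescale recovers the exponent {s - \frac{d}{2} - 1}. A trivial but reassuring sympy computation:

```python
import sympy as sp

s, d, N = sp.symbols('s d N', positive=True)
V = N ** (-d)                        # uncertainty principle: smallest volume
A = N ** (-s) / sp.sqrt(V)           # largest amplitude allowed by N^s A V^{1/2} ~ 1
timescale = 1 / (A * N)              # natural timescale 1/(AN)
print(sp.simplify(timescale / N ** (s - d / 2 - 1)))   # 1
```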

Exercise 12 Use similar heuristics to explain the relevance of quantities of the form {\| \nabla u \|_{L^1_t L^\infty_x}} that occurs in various places in this section.

Because the solutions constructed in Exercise 10 are limits (in rather strong topologies) of smooth solutions, it is fairly easy to extend estimates and conservation laws that are known for smooth solutions to these slightly less regular solutions. For instance:

Exercise 13 Let {s, u_0, T_*, u, p} be as in Exercise 10.

  • (i) (Energy conservation) Show that {\|u(t)\|_{L^2_x({\bf R}^d \rightarrow {\bf R}^d)} = \| u_0 \|_{L^2_x({\bf R}^d \rightarrow {\bf R}^d)}} for all {0 \leq t < T_*}.
  • (ii) Show that

    \displaystyle \| u(t) \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d}

    \displaystyle \exp( O_{s,d}( \int_0^t \| \nabla u(t')\|_{L^\infty_x({\bf R}^d \rightarrow {\bf R}^{d^2})}\ dt' )) \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}

    for all {0 \leq t < T_*}.

Exercise 14 (Vanishing viscosity limit) Let the notation and hypotheses be as in Corollary 3. For each {\nu>0}, let {u^{(\nu)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d}, {p^{(\nu)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}} be the solution to (3) with this choice of viscosity and with initial data {u_0}. Show that as {\nu \rightarrow 0}, {u^{(\nu)}} and {p^{(\nu)}} converge locally uniformly to {u,p}, and similarly for all derivatives of {u^{(\nu)}} and {p^{(\nu)}}. (In other words, there is actually no need to pass to a subsequence as is done in the proof of Corollary 3.) Hint: apply the energy method to control the difference {u^{(\nu)} - u}.

Exercise 15 (Local existence for forced Euler) Let {u_0 \in H^\infty_x({\bf R}^d \rightarrow {\bf R}^d)} be divergence-free, and let {F \in C^\infty_{t,loc} H^\infty([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^d)}, thus {F} is smooth and for any {T>0} and any integer {j \geq 0} and {s>0}, {\partial_t^j F \in C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}. Show that there exists {T>0} and a smooth solution {(u,p)} to the forced Euler equation

\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p + F

\displaystyle \nabla \cdot u = 0

\displaystyle p = - \Delta^{-1} \nabla \cdot \nabla \cdot (u \otimes u)

\displaystyle u(0) = u_0.

Note: one will first need a local existence theory for the forced Navier-Stokes equation. It is also possible to develop forced analogues of most of the other results in this section, but we will not detail this here.

— 2. The Beale-Kato-Majda blowup criterion —

In Exercise 10 we saw that we could continue {H^s} solutions, {s > \frac{d}{2}+1}, to the Euler equations indefinitely in time, unless the integral {\int_0^{T_*} \| \nabla u(t) \|_{L^\infty_x({\bf R}^d \rightarrow {\bf R}^{d^2})}\ dt} became infinite at some finite time {T_*}. There is an important refinement of this blowup criterion, due to Beale, Kato, and Majda, in which the tensor {\nabla u} is replaced by the vorticity two-form (or vorticity, for short)

\displaystyle \omega := \nabla \wedge u,

that is to say {\omega} is essentially the anti-symmetric component of {\nabla u}. Whereas {\nabla u} is the tensor field

\displaystyle (\nabla u)_{ij} = \partial_i u_j,

{\omega} is the anti-symmetric tensor field

\displaystyle \omega_{ij} = \partial_i u_j - \partial_j u_i. \ \ \ \ \ (12)


Remark 16 In two dimensions, {\omega} is essentially a scalar, since {\omega_{11}=\omega_{22}=0} and {\omega_{12} = -\omega_{21}}. As such, it is common in fluid mechanics to refer to the scalar field {\omega_{12} = \partial_1 u_2 - \partial_2 u_1} as the vorticity, rather than the two form {\omega}. In three dimensions, there are three independent components {\omega_{23}, \omega_{31}, \omega_{12}} of the vorticity, and it is common to view {\omega} as a vector field {\vec \omega = (\omega_{23}, \omega_{31}, \omega_{12})} rather than a two-form in this case (actually, to be precise {\omega} would be a pseudovector field rather than a vector field, because it behaves slightly differently to vectors with respect to changes of coordinate). With this interpretation, the vorticity is now the curl of the velocity field {u}. From a differential geometry viewpoint, one can view the two-form {\omega} as an antisymmetric bilinear map from vector fields {X,Y} to scalar functions {\omega(X,Y)}, and the relation between the vorticity two-form {\omega} and the vorticity (pseudo-)vector field {\vec \omega} in {{\bf R}^3} is given by the relation

\displaystyle \omega(X,Y) = \mathrm{vol}( \vec \omega, X, Y )

for arbitrary vector fields {X,Y}, where {\mathrm{vol} = dx_1 \wedge dx_2 \wedge dx_3} is the volume form on {{\bf R}^3}, which can be viewed in three dimensions as an antisymmetric trilinear form on vector fields. The fact that {\vec \omega} is a pseudovector rather than a vector then arises from the fact that the volume form changes sign upon applying a reflection.
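The relation between the two-form and the pseudovector can be checked numerically at a single point: take any antisymmetric matrix for {\omega}, form {\vec \omega = (\omega_{23}, \omega_{31}, \omega_{12})}, and compare {\omega(X,Y) = \sum_{ij} \omega_{ij} X^i Y^j} with the determinant {\mathrm{vol}(\vec \omega, X, Y)}. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
Om = M - M.T                                   # antisymmetric: Om[i,j] = omega_ij
w = np.array([Om[1, 2], Om[2, 0], Om[0, 1]])   # (omega_23, omega_31, omega_12), 0-indexed
X, Y = rng.standard_normal(3), rng.standard_normal(3)

lhs = X @ Om @ Y                               # omega(X, Y) = sum_ij omega_ij X^i Y^j
rhs = np.linalg.det(np.stack([w, X, Y]))       # vol(omega_vec, X, Y)
print(np.isclose(lhs, rhs))                    # True
```

Expanding the determinant as {\vec \omega \cdot (X \times Y)} and comparing coefficients of {\omega_{12}, \omega_{23}, \omega_{31}} gives the identity exactly, which is what the numerical check confirms.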

The point is that vorticity behaves better under the Euler flow than the full derivative {\nabla u}. Indeed, if one takes a smooth solution to the Euler equation in coordinates

\displaystyle \partial_t u_j + u_k \partial_k u_j = -\partial_j p

and applies {\partial_i} to both sides, one obtains

\displaystyle \partial_t \partial_i u_j + \partial_i u_k \partial_k u_j + u_k \partial_k \partial_i u_j = -\partial_i \partial_j p.

If one interchanges {i,j} and then subtracts, the pressure terms disappear, and one is left with

\displaystyle \partial_t \omega_{ij} + \partial_i u_k \partial_k u_j - \partial_j u_k \partial_k u_i + u_k \partial_k \omega_{ij} = 0

which we can rearrange using the material derivative {D_t = \partial_t + u_k \partial_k} as

\displaystyle D_t \omega_{ij} - \partial_j u_k \partial_k u_i + \partial_i u_k \partial_k u_j = 0.

Writing {\partial_k u_i = -\omega_{ik} + \partial_i u_k} and {\partial_k u_j = - \omega_{jk} + \partial_j u_k}, this becomes the vorticity equation

\displaystyle D_t \omega_{ij} + \omega_{ik} \partial_j u_k - \omega_{jk} \partial_i u_k = 0. \ \ \ \ \ (13)


The vorticity equation is particularly simple in two and three dimensions:

Exercise 17 (Transport of vorticity) Let {u,p} be a smooth solution to the Euler equations in {{\bf R}^d}, and let {\omega} be the vorticity two-form.

  • (i) If {d=2}, show that

    \displaystyle D_t \omega_{12} = 0.

  • (ii) If {d=3}, show that

    \displaystyle D_t \vec \omega = (\vec \omega \cdot \nabla) u

    where {\vec \omega = (\omega_{23}, \omega_{31}, \omega_{12})} is the vorticity pseudovector.
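Part (i) amounts to the vanishing of the vortex-stretching term in two dimensions, which one can confirm symbolically: for any divergence-free field (here generated from an arbitrary stream function), the term {\omega_{1k} \partial_2 u_k - \omega_{2k} \partial_1 u_k} from (13) collapses to {\omega_{12} (\nabla \cdot u) = 0}. A sympy sketch:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
psi = sp.sin(x1) * sp.cos(2 * x2) + x1**2 * x2     # arbitrary stream function
u = [sp.diff(psi, x2), -sp.diff(psi, x1)]          # u = (d2 psi, -d1 psi): div-free
xv = [x1, x2]
assert sp.simplify(sp.diff(u[0], x1) + sp.diff(u[1], x2)) == 0   # div u = 0

omega = [[sp.diff(u[j], xv[i]) - sp.diff(u[i], xv[j]) for j in range(2)]
         for i in range(2)]                        # omega_ij = d_i u_j - d_j u_i
# stretching term from (13) with i=1, j=2:
stretch = sum(omega[0][k] * sp.diff(u[k], x2) - omega[1][k] * sp.diff(u[k], x1)
              for k in range(2))
print(sp.simplify(stretch))                        # 0: 2D vorticity is purely transported
```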

Remark 18 One can interpret the vorticity equation in the language of differential geometry, which is a more convenient formalism when working on more general Riemannian manifolds than {{\bf R}^d}. To be consistent with the conventions of differential geometry, we now write the components of the velocity field {u} as {u^i} rather than {u_i} (and the coordinates of {{\bf R}^d} as {x^i} rather than {x_i}). Define the covelocity {1}-form {v} as

\displaystyle v = \eta_{ij} u^i dx^j

where {\eta_{ij}} is the Euclidean metric tensor (in the standard coordinates, {\eta_{ij} = \delta_{ij}} is the Kronecker delta, though {\eta_{ij}} can take other values than {\delta_{ij}} if one uses a different coordinate system). Thus in coordinates, {v_i = \eta_{ij} u^j}; the covelocity field is thus the musical isomorphism applied to the velocity field. The vorticity {2}-form {\omega} can then be interpreted as the exterior derivative of the covelocity, thus

\displaystyle \omega = dv

or in coordinates

\displaystyle \omega_{ij} = \partial_i v_j - \partial_j v_i.

The Euler equations can be rearranged as

\displaystyle \partial_t v + \mathcal{L}_u v = - d \tilde p, \ \ \ \ \ (14)


where {\mathcal{L}_u} is the Lie derivative along {u}, which for {1}-forms is given in coordinates as

\displaystyle \mathcal{L}_u v_i = u^j \partial_j v_i + (\partial_i u^j) v_j

and {\tilde p} is the modified pressure

\displaystyle \tilde p := p - \frac{1}{2} u^j v_j.

If one takes exterior derivatives of both sides of (14) using the basic differential geometry identities {d \mathcal{L}_u = \mathcal{L}_u d} and {dd = 0}, one obtains the vorticity equation

\displaystyle \partial_t \omega + \mathcal{L}_u \omega = 0

where the Lie derivative for {2}-forms is given in coordinates as

\displaystyle \mathcal{L}_u \omega_{ik} = u^j \partial_j \omega_{ik} + (\partial_i u^j) \omega_{jk} + (\partial_k u^j) \omega_{ij}

and so we recover (13) after some relabeling.

We now present the Beale-Kato-Majda condition.

Theorem 19 (Beale-Kato-Majda) Let {s > \frac{d}{2}+1} be an integer, and let {u_0 \in H^s_x({\bf R}^d \rightarrow {\bf R}^d)} be divergence free. Let {u \in C^0_{t,loc} H^s([0,T_*) \times {\bf R}^d \rightarrow {\bf R}^d)}, {p \in C^0_{t,loc} H^s([0,T_*) \times {\bf R}^d \rightarrow {\bf R})} be the maximal Cauchy development from Exercise 10, and let {\omega} be the vorticity.

The double exponential in (i) is not a typo! It is an open question, though, whether this double exponential bound can be improved at all, even in the simplest case of two spatial dimensions.

We turn to the proof of this theorem. Part (ii) will be implied by part (i), since if {\| \omega\|_{L^1_t L^\infty_x( [0,T_*) \times {\bf R}^d \rightarrow \bigwedge^2 {\bf R}^d )}} is finite then part (i) gives a uniform bound on {\|u(t)\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}} as {t \rightarrow T_*}, preventing finite time blowup. So it suffices to prove part (i). To do this, it suffices to do so for {H^\infty} solutions, since one can then pass to a limit (using the strong continuity in {C^0_t H^s_x}) to establish the general case. In particular, we can now assume that {u,p,u_0} are smooth.

We would like to convert control on {\omega} back to control of the full derivative {\nabla u}. If one takes the divergence {\partial_i \omega_{ij}} of the vorticity using (12) and the divergence-free nature {\partial_i u_i = 0} of {u}, we see that

\displaystyle \partial_i \omega_{ij} = \Delta u_j.

Thus, we can recover the derivative {\partial_k u_j} from the vorticity by the formula

\displaystyle \partial_k u_j = \Delta^{-1} \partial_i \partial_k \omega_{ij}, \ \ \ \ \ (16)


where one can define {\Delta^{-1} \partial_i \partial_k} via the Fourier transform as a multiplier bounded on every {H^s} space.
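Formula (16) is easy to verify numerically on the torus (a periodic stand-in for the decaying setting, assuming only the formula itself): build a divergence-free field from a stream function, compute its vorticity, and recover {\partial_1 u_2} by applying the multiplier with symbol {\xi_1 \xi_1/|\xi|^2} to {\omega_{12}}. A sketch:

```python
import numpy as np

n = 64
xs = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(xs, xs, indexing='ij')
k = np.fft.fftfreq(n, d=1.0 / n)               # integer frequencies
K1, K2 = np.meshgrid(k, k, indexing='ij')

def d(f, K):                                   # spectral partial derivative
    return np.real(np.fft.ifft2(1j * K * np.fft.fft2(f)))

psi = np.sin(X) * np.cos(3 * Y) + np.cos(2 * X) * np.sin(Y)   # stream function
u1, u2 = d(psi, K2), -d(psi, K1)               # divergence-free velocity
om = d(u2, K1) - d(u1, K2)                     # vorticity omega_12

# (16) with k=1, j=2: only i=1 contributes since omega_22 = 0, so
# d_1 u_2 = Delta^{-1} d_1 d_1 omega_12, Fourier symbol xi_1^2 / |xi|^2.
xi2 = K1**2 + K2**2
mult = np.where(xi2 > 0, K1 * K1 / np.where(xi2 > 0, xi2, 1.0), 0.0)
rec = np.real(np.fft.ifft2(mult * np.fft.fft2(om)))
print(np.max(np.abs(rec - d(u2, K1))))         # tiny (spectral accuracy)
```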

If the operators {\Delta^{-1} \partial_i \partial_k} were bounded in {L^\infty_x({\bf R}^d \rightarrow {\bf R})}, then we would have

\displaystyle \| \nabla u(t) \|_{L^\infty({\bf R}^d \rightarrow {\bf R}^{d^2})} \lesssim_d \| \omega(t)\|_{L^\infty({\bf R}^d \rightarrow \bigwedge^2 {\bf R}^d)}

and the claimed bound (15) would follow from Theorem 1(ii) (with one exponential to spare). Unfortunately, {\Delta^{-1} \partial_i \partial_k} is not quite bounded on {L^\infty}. Indeed, from Exercise 18 of Notes 1 we have the formula

\displaystyle \Delta^{-1} \partial_i \partial_k \phi(y) = \lim_{\varepsilon \rightarrow 0} \int_{|x| > \varepsilon} K_{ik}(x) \phi(x+y)\ dx + \frac{\delta_{ik}}{d} \phi(y)

for any test function {\phi} and {y \in {\bf R}^d}, where {K_{ik}} is the singular kernel

\displaystyle K_{ik}(x) := -\frac{1}{|S^{d-1}|} (\frac{d x_i x_k}{|x|^{d+2}} - \frac{\delta_{ik}}{|x|^d}).

If one sets {\phi} to be a smooth approximation to the signum {\mathrm{sgn}(K_{ik})} restricted to an annulus {\varepsilon \leq |x| \leq R}, we conclude that the operator norm of {\Delta^{-1} \partial_i \partial_k} on {L^\infty} is at least as large as

\displaystyle \int_{\varepsilon \leq |x| \leq R} |K_{ik}(x)|\ dx.

But one can calculate using polar coordinates that this expression diverges like {\log \frac{R}{\varepsilon}} in the limit {\varepsilon \rightarrow 0}, {R \rightarrow \infty}, giving unboundedness.
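This logarithmic divergence can be seen numerically in the two-dimensional case, where {|S^1| = 2\pi} and {K_{11}} in polar coordinates is {-\cos(2\theta)/(2\pi r^2)}: the radial integral contributes {\log \frac{R}{\varepsilon}} and the angular integral contributes {4}, giving {\frac{2}{\pi} \log \frac{R}{\varepsilon}}. A sketch:

```python
import numpy as np

def kernel_L1_on_annulus(eps, R, nr=2000, nt=720):
    """Midpoint-rule integral of |K_11| over eps <= |x| <= R in dimension 2."""
    L = np.log(R) - np.log(eps)
    lr = np.log(eps) + (np.arange(nr) + 0.5) * L / nr     # midpoints in log r
    th = (np.arange(nt) + 0.5) * 2 * np.pi / nt           # midpoints in theta
    r = np.exp(lr)[:, None]
    x1 = r * np.cos(th)[None, :]
    absK = np.abs(-(1 / (2 * np.pi)) * (2 * x1**2 / r**4 - 1 / r**2))
    # area element: r dr dtheta = r^2 d(log r) dtheta
    return (absK * r**2).sum() * (L / nr) * (2 * np.pi / nt)

for R in [10.0, 100.0, 1000.0]:
    print(R, kernel_L1_on_annulus(1.0, R) / np.log(R))    # ratio -> 2/pi ~ 0.6366
```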

As it turns out, though, the Gronwall argument used to establish Theorem 1(ii) can just barely tolerate an additional “logarithmic loss” of the above form, albeit at the cost of worsening the exponential term to a double exponential one. The key lemma is the following result that quantifies the logarithmic divergence indicated by the previous calculation, and is similar in spirit to a well known inequality of Brezis and Wainger.

Lemma 20 (Near-boundedness of {\Delta^{-1} \partial_i \partial_k}) For any {\phi \in H^\infty_x({\bf R}^d \rightarrow {\bf R})} and {s > \frac{d}{2}}, one has

\displaystyle \| \Delta^{-1} \partial_i \partial_k \phi \|_{L^\infty({\bf R}^d \rightarrow {\bf R})} \lesssim_{s,d} \| \phi \|_{L^\infty({\bf R}^d \rightarrow {\bf R})} \log(2 + \| \phi \|_{H^s_x({\bf R}^d \rightarrow {\bf R})}) \ \ \ \ \ (17)


\displaystyle + \| \phi \|_{L^2_x({\bf R}^d \rightarrow {\bf R})} + 1.

The lower order terms {\| \phi \|_{L^2_x({\bf R}^d \rightarrow {\bf R})} + 1} will be easily dealt with in practice; the main point is that one can almost bound the {L^\infty} norm of {\Delta^{-1} \partial_i \partial_k \phi} by that of {\phi}, up to a logarithmic factor.

Proof: By a limiting argument we may assume that {\phi} is a test function. We apply Littlewood-Paley decomposition to write

\displaystyle \phi = P_{\leq 1} \phi + \sum_{N>1} P_N \phi

and hence by the triangle inequality we may bound the left-hand side of (17) by

\displaystyle \| P_{\leq 1} \Delta^{-1} \partial_i \partial_k \phi \|_{L^\infty} + \sum_{N>1} \| P_{N} \Delta^{-1} \partial_i \partial_k \phi \|_{L^\infty}

where we omit the domain and range from the function space norms for brevity.

By Bernstein’s inequality we have

\displaystyle \| P_{\leq 1} \Delta^{-1} \partial_i \partial_k \phi \|_{L^\infty} \lesssim_d \| \Delta^{-1} \partial_i \partial_k \phi \|_{L^2} \lesssim_d \| \phi \|_{L^2}.

Also, from Bernstein and Plancherel we have

\displaystyle \| P_{N} \Delta^{-1} \partial_i \partial_k \phi \|_{L^\infty} \lesssim_d N^{d/2} \| P_{N} \Delta^{-1} \partial_i \partial_k \phi \|_{L^2}

\displaystyle \lesssim_d N^{d/2} \| P_{N} \phi \|_{L^2}

\displaystyle \lesssim_d N^{d/2-s} \| \phi \|_{H^s}

and hence by geometric series we have

\displaystyle \sum_{N > N_0} \| P_{N} \Delta^{-1} \partial_i \partial_k \phi \|_{L^\infty} \lesssim_{s,d} N_0^{d/2-s} \| \phi \|_{H^s}

for any {N_0>1}. This gives an acceptable contribution if we select {N_0 := (2+\| \phi \|_{H^s})^{1/(s-d/2)}}. This leaves {O_{s,d}( \log(2 + \| \phi\|_{H^s}) )} remaining values of {N} to control, so if one can bound

\displaystyle \| P_{N} \Delta^{-1} \partial_i \partial_k \phi \|_{L^\infty} \lesssim_d \| \phi \|_{L^\infty} \ \ \ \ \ (18)


for each {N > 1}, we will be done.

Observe from applying the scaling {x \mapsto Nx/2} (that is, replacing {x \mapsto \phi(x)} with {x \mapsto \phi(2x/N)}) that to prove (18) for all {N} it suffices to do so for {N=2}. By Fourier analysis, the function {P_2 \Delta^{-1} \partial_i \partial_k \phi} is the convolution of {\phi} with the inverse Fourier transform {K} of the function

\displaystyle \xi \mapsto (\psi(\xi/2) - \psi(\xi)) \frac{\xi_i \xi_k}{|\xi|^2},

where {\psi} is the Fourier symbol of the Littlewood-Paley projection {P_{\leq 1}}. This function is a test function (it is smooth and supported on an annulus), so {K} is a Schwartz function, and the claim now follows from Young’s inequality. \Box

We return now to the proof of (15). We adapt the proof of Proposition 1(i). As in that proposition, we introduce the higher energy

\displaystyle E(t) := \frac{1}{2} \sum_{k=0}^{s} \int_{{\bf R}^d} |\nabla^k u|^2\ dx, \quad \hbox{so that} \quad \partial_t E(t) = \sum_{k=0}^{s} \int_{{\bf R}^d} \nabla^k u \cdot \nabla^k \partial_t u\ dx.

We no longer have the viscosity term as {\nu=0}, but that term was discarded anyway in the analysis. From (4) we have

\displaystyle \partial_t E(t) \leq O_{s,d}( E(t) \| \nabla u(t) \|_{L^\infty} ).

Applying (16) and Lemma 20, one thus has

\displaystyle \partial_t E(t) \leq O_{s,d}( E(t) (\| \omega(t) \|_{L^\infty} \log(2 + E(t)) + \| u(t) \|_{L^2} + 1) ).

From Exercise 13 one has

\displaystyle \| u(t) \|_{L^2} = \| u_0 \|_{L^2}.

By the chain rule, one then has

\displaystyle \partial_t \log(2+E(t)) \leq O_{s,d}( \| \omega(t) \|_{L^\infty} \log(2 + E(t)) + \| u_0 \|_{L^2} + 1 )

and hence by Gronwall’s inequality one has

\displaystyle \log(2+E(T)) \lesssim_{s,d} ( T (\|u_0\|_{L^2}+1) + \log(2+E(0)) ) \exp( O_{s,d}( \|\omega \|_{L^1_t L^\infty_x([0,T] \times {\bf R}^d \rightarrow \bigwedge^2 {\bf R}^d)} ) ).

The claim (15) follows.
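Unwinding the logarithm makes the double exponential explicit (schematically, with all constants depending only on {s,d} written as {C}):

```latex
\log(2+E(T)) \;\le\; C\big( T(\|u_0\|_{L^2}+1) + \log(2+E(0)) \big)\,
  \exp\big( C \|\omega\|_{L^1_t L^\infty_x} \big)
\quad\Longrightarrow\quad
E(T) \;\le\; \exp\Big( C\big( T(\|u_0\|_{L^2}+1) + \log(2+E(0)) \big)\,
  \exp\big( C \|\omega\|_{L^1_t L^\infty_x} \big) \Big),
```

so {E(T)} (and hence {\|u(T)\|_{H^s}}) is controlled by a quantity that is doubly exponential in {\| \omega \|_{L^1_t L^\infty_x}}.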

Remark 21 The Beale-Kato-Majda criterion can be sharpened a little bit, by replacing the sup norm {\|\omega(t) \|_{L^\infty({\bf R}^d \rightarrow \bigwedge^2 {\bf R}^d)}} with slightly smaller norms, such as the bounded mean oscillation (BMO) norm of {\omega(t)}, basically by improving the right-hand side of Lemma 20 slightly. See for instance this paper of Planchon and the references therein.

Remark 22 An inspection of the proof of Theorem 19 reveals that the same result holds if the Euler equations are replaced by the Navier-Stokes equations; the energy estimates acquire an additional “{C}” term by doing so (as in the proof of Proposition 1), but the sign of that term is favorable.

We now apply the Beale-Kato-Majda criterion to obtain global well-posedness for the Euler equations in two dimensions:

Theorem 23 (Global well-posedness) Let {u_0, s, T_*, u, p} be as in Exercise 10. If {d=2}, then {T_* = +\infty}.

This theorem will be immediate from Theorem 19 and the following conservation law:

Proposition 24 (Conservation of vorticity distribution) Let {u_0, s, T_*, u, p} be as in Exercise 10 with {d=2}. Then one has

\displaystyle \| \omega_{12}(t) \|_{L^q({\bf R}^2 \rightarrow {\bf R})} = \| \omega_{12}(0) \|_{L^q({\bf R}^2 \rightarrow {\bf R})}

for all {2 \leq q \leq \infty} and {0 \leq t < T_*}.

Proof: By a limiting argument it suffices to show the claim for {q < \infty}, thus we need to show

\displaystyle \int_{{\bf R}^2} |\omega_{12}(t, x)|^q\ dx = \int_{{\bf R}^2} |\omega_{12}(0, x)|^q\ dx.

By another limiting argument we can take {u} to be an {H^\infty} solution. By the monotone convergence theorem (and Sobolev embedding), it suffices to show that

\displaystyle \int_{{\bf R}^2} F( \omega_{12}(t, x) )\ dx = \int_{{\bf R}^2} F( \omega_{12}(0, x) )\ dx

whenever {F: {\bf R} \rightarrow {\bf R}} is a test function that vanishes in a neighbourhood of the origin {0}. Note that as {\omega_{12}} and all its derivatives are in {L^\infty_t H^\infty_x} on {[0,T] \times {\bf R}^2} for every {0 < T < T_*}, it is Lipschitz in space and time, which among other things implies that the level sets {\{ (t,x) \in [0,T] \times {\bf R}^2: |\omega_{12}| \geq \varepsilon \}} are compact for every {\varepsilon>0}, and so {F(\omega_{12})} is smooth and compactly supported in {[0,T] \times {\bf R}^2}. We may therefore differentiate under the integral sign to obtain

\displaystyle \partial_t \int_{{\bf R}^2} F( \omega_{12}(t, x) )\ dx = \int_{{\bf R}^2} F'( \omega_{12} ) \partial_t \omega_{12}\ dx

where we omit explicit dependence on {t,x} for brevity. By Exercise 17(i), the right-hand side is

\displaystyle \int_{{\bf R}^2} F'( \omega_{12} ) (u \cdot \nabla) \omega_{12}\ dx

which one can write as a total derivative

\displaystyle \int_{{\bf R}^2} (u \cdot \nabla) F(\omega_{12})\ dx

which vanishes thanks to integration by parts and the divergence-free nature of {u}. The claim follows. \Box

The above proposition shows that in two dimensions, {\| \omega(t)\|_{L^\infty({\bf R}^2 \rightarrow \bigwedge^2{\bf R}^2)}} is constant, and so the integral {\int_0^{T_*} \| \omega(t)\|_{L^\infty({\bf R}^2 \rightarrow \bigwedge^2{\bf R}^2)}\ dt} cannot diverge for finite {T_*}. Applying Theorem 19, we obtain Theorem 23. We remark that global regularity for two-dimensional Euler was established well before the Beale-Kato-Majda theorem, starting with the work of Wolibner.

One can adapt this argument to the Navier-Stokes equations:

Exercise 25 Let {s > 2} be an integer, let {\nu>0}, let {u_0 \in H^s({\bf R}^2 \rightarrow {\bf R}^2)} be divergence-free, and let {u: [0,T_*) \times {\bf R}^2 \rightarrow {\bf R}^2}, {p: [0,T_*) \times {\bf R}^2 \rightarrow {\bf R}} be a maximal Cauchy development to the Navier-Stokes equations with initial data {u_0}. Let {\omega} be the vorticity.

Remark 26 There are other ways to establish global regularity for two-dimensional Navier-Stokes (originally due to Ladyzhenskaya); for instance, the {L^2} bound on the vorticity in Exercise 25(ii), combined with energy conservation, gives a uniform {H^1} bound on the velocity field, which can then be inserted into (the non-periodic version of) Theorem 38 of Notes 1.

Remark 27 If {t \mapsto u(t,x), t \mapsto p(t,x)} solve the Euler equations on some time interval {I} with initial data {x \mapsto u_0(x)}, then the time-reversed fields {t \mapsto -u(-t,x), t \mapsto p(-t,x)} solve the Euler equations on the reflected interval {-I} with initial data {x \mapsto -u_0(x)}. Because of this time reversal symmetry, the local and global well-posedness theory for the Euler equations can also be extended backwards in time; for instance, in two dimensions any {H^\infty} divergence free initial data {u_0} leads to an {H^\infty} solution to the Euler equations on the whole time interval {(-\infty,\infty)}. However, the Navier-Stokes equations are very much not time-reversible in this fashion.

n-Category Café Topoi of G-sets

I’m thinking about finite groups these days, from a Klein geometry perspective where we think of a group G as a source of G-sets. Since the category of G-sets is a topos, this lets us translate concepts, facts and questions about groups into concepts, facts and questions about topoi. I’m not at all good at this, so here are a bunch of basic questions.

For any group G the category of G-sets is a Boolean topos, which means basically that its internal logic obeys the principle of excluded middle.

  • Which Boolean topoi are equivalent to the category of G-sets for some group G?

  • Which are equivalent to the category of G-sets for a finite group G?

It might be easiest to start by characterizing the categories of G-sets where G is a groupoid, and then add an extra condition to force G to be a group.

The category G Set comes with a forgetful functor U \colon G Set \to Set.

  • Is the group of natural automorphisms of U just G?

This should be easy to check, I’m just feeling lazy. If some result like this is true, how come people talk so much about the Tannaka–Krein reconstruction theorem and not so much about this simpler thing? (Maybe it’s just too obvious.)
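For a small group one can probe this by brute force. On the left regular G-set, the G-equivariant self-maps are exactly the right translations, and a bijection commuting with all right translations must be a left translation, recovering a copy of G. A sketch for G = S₃ (this only examines the regular object rather than a full natural transformation, but that object already pins the automorphism down):

```python
from itertools import permutations

G = list(permutations(range(3)))                       # S_3 as permutation tuples
mul = lambda g, h: tuple(g[h[i]] for i in range(3))    # composition g o h

# Equivariant self-maps of the left regular G-set are the right translations.
# Find all bijections of G commuting with every right translation:
candidates = []
for images in permutations(G):
    sigma = dict(zip(G, images))
    if all(sigma[mul(h, kk)] == mul(sigma[h], kk) for h in G for kk in G):
        candidates.append(sigma)

left = [{h: mul(g, h) for h in G} for g in G]          # left translations
print(len(candidates), all(c in left for c in candidates))   # 6 True
```

Setting h to the identity in the commutation condition shows sigma is left translation by sigma(e), which is the whole argument in miniature.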

Whenever we have a homomorphism f \colon H \to G we get an obvious functor

f^\ast \colon G Set \to H Set

This is part of an essential geometric morphism, which means that it has both a right and left adjoint. By this means we can actually get a 2-functor from the 2-category of groups (yeah, it’s a 2-category since groups can be seen as one-object categories) to the 2-category Topos_{ess} consisting of topoi, essential geometric morphisms and natural transformations. If I’m reading the nLab correctly, this makes G Set into a full sub-2-category of Topos_{ess}. This makes it all the more interesting to know which topoi are equivalent to categories of G-sets.

  • What properties characterize essential geometric morphisms of the form i^\ast \colon G Set \to H Set when i \colon H \to G is the inclusion of a subgroup?

Whenever we have this, we get a transitive G-set G/H, which is thus a special object in G Set. These objects are just the atoms in G Set: that is, the objects whose only subobjects are themselves and the initial object. Indeed G Set is an atomic topos, meaning that every object is a coproduct of atoms. That’s just a fancy way of saying that every G-set can be broken into orbits, which are transitive G-sets.
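The orbit decomposition is easy to compute in a small example. Here S₃ acts on the singletons and pairs of {0,1,2}, and the G-set splits into two atoms (the orbit of singletons and the orbit of pairs):

```python
from itertools import permutations

G = list(permutations(range(3)))                       # S_3
Xset = [frozenset(s) for s in [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]]
act = lambda g, A: frozenset(g[a] for a in A)          # induced action on subsets

def orbits(G, Xset, act):
    remaining, orbs = set(Xset), []
    while remaining:
        x = next(iter(remaining))
        orb = {act(g, x) for g in G}                   # sweep out one atom
        orbs.append(orb)
        remaining -= orb
    return orbs

orbs = orbits(G, Xset, act)
print([sorted(sorted(A) for A in o) for o in orbs])    # two orbits: singletons, pairs
```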


  • What properties characterize essential geometric morphisms of the form i^\ast \colon G Set \to H Set when i \colon H \to G is the inclusion of a normal subgroup?

In this case G/H is a group with a surjection p \colon G \to G/H, so we get another topos (G/H)Set and essential geometric morphisms

Set \longrightarrow (G/H)Set \stackrel{p^\ast}{\longrightarrow} GSet \stackrel{i^\ast}{\longrightarrow} HSet \longrightarrow Set

  • What properties characterize essential geometric morphisms of the form p^\ast for p a surjective homomorphism of groups?

  • Is there a concept of ‘short exact sequence’ of essential geometric morphisms such that the above sequence is an example?

Well, my questions could go on all day, but this is enough for now!

BackreactionDear Dr B: What do you actually live from?

Some weeks ago a friend emailed me to say he was shocked – shocked! – to hear I had lost my job. This sudden unemployment was news to me, but not as big a surprise as you may think. I was indeed unemployed for two months last year, not because I said rude things about other people’s theories, but simply because someone forgot to renew my contract. Or maybe I forgot to ask that it be renewed. Or

October 14, 2018

Mark Chu-CarrollAnother Stab at Category Theory: Starting with Monoids

Introduction and Motivation

One thing that I keep bumping up against as an engineer who loves functional programming is category theory. It often seems like there are two kinds of functional programmers: people who came into functional programming via engineering, and people who came into functional programming via math. The problem is that a lot of the really interesting work in languages and libraries for functional programming is being done on the mathematical side, but for people on the engineering side, it's impenetrable: it's as if it's written in a whole different language, and even basic discussions about programming go off the rails, because the basic abstractions don't make any sense if you don't know category theory.

But how do you learn category theory? It seems impenetrable to mere humans. For example, one of the textbooks on category theory that several people told me was the most approachable starts chapter one with the line:

A group extension of an abelian group H by an abelian group G consists of a group E together with an inclusion of G \hookrightarrow E as a normal subgroup and a surjective homomorphism E \twoheadrightarrow H that displays H as the quotient group E/G.

If you're not a professional mathematician, then that is pure gobbledygook. But that seems to be typical of how initiates of category theory talk about it. The basic concepts, though, while abstract, really aren't all that tricky. In many ways, it feels a lot like set theory: there's a simple conceptual framework, on which you can build extremely complicated formalisms. The difference is that while many people have spent years figuring out how to make the basics of set theory accessible to lay-people, that effort hasn't yet been applied to category theory.

What’s the point?

Ok, so why should you care about category theory?

Category theory is a different way of thinking, and it’s a language for talking about abstractions. The heart of engineering is abstraction. We take problems, and turn them into abstract structures. We look at the structures we create, and recognize commonalities between those structures, and then we create new abstractions based on the commonalities. The hardest part of designing a good library is identifying the right abstractions.

Category theory is a tool for talking about structures, which is particularly well suited to thinking about software. In category theory, we think in terms of arrows, where arrows are mappings between objects. We’ll see what that means in detail later, but the gist of it is that one example of arrows mapping between objects is functions mapping between data types in a computer program.

Category theory is built on thinking with arrows, and building structures using arrows. It's about looking at mathematical constructions built with arrows, and in those structures, figuring out what the fundamental parts are. When we abstract enough, we can start to see that things that look very different are really just different realizations of the same underlying structure. Category theory gives us a language and a set of tools for doing that kind of abstraction – and then we can take the abstract structures that we identify, and turn them into code – into very generic libraries that express deep, fundamental structure.

Start with an Example: Monoids

Monoids in Code

We’ll get started by looking at a simple mathematical structure called a monoid, and how we can implement it in code; and then, we’ll move on to take an informal look at how it works in terms of categories.

Most of the categorical abstractions in Scala are implemented using something called a typeclass, so we'll start by looking at typeclasses. Typeclasses aren't a category-theoretical notion, but they make it much, much easier to build categorical structures. And they do give us a bit of categorical flavor: a typeclass defines a kind of metatype – that is, a type of types – and as we'll see, that kind of self-reflective abstraction is a key part of category theory.

The easiest way to think about typeclasses is that they’re a kind of metatype – literally, as the name suggests, they define classes where the elements of those classes are types. So a typeclass provides an interface that a type must provide in order to be an instance of the metatype. Just like you can implement an interface in a type by providing implementations of its methods, you can implement a typeclass by providing implementations of its operations.

In Scala, you implement the operations of a typeclass using a language construct called an implicit parameter. The implicit parameter attaches the typeclass operations to a meta-object that can be passed around the program invisibly, providing the typeclass's operations.

Let’s take a look at an example. An operation that comes up very frequently in any kind of data processing code is reduction: taking a collection of values of some type, and combining them into a single value. Taking the sum of a list of integers, the product of an array of floats, and the concatenation of a list of strings are all examples of reduction. Under the covers, these are all similar: they’re taking an ordered group of values, and performing an operation on them. Let’s look at a couple of examples of this:

def reduceFloats(floats: List[Float]): Float =
    floats.foldRight(0.0f)((x, y) => x + y)

def reduceStrings(strings: Seq[String]): String =
    strings.foldRight("")((x, y) => x.concat(y))

When you look at the code, they look very similar. They’re both just instantiations of the same structural pattern:

def reduceX(xes: List[X]): X =
    xes.foldRight(xIdentity)((a, b) => Xcombiner(a, b))

The types are different; the actual operation used to combine the values is different; the base value in the code is different. But they’re both built on the same pattern:

  • There's a type of values we want to combine: Float or String. Everything we care about in reduction is connected with this type.
  • There’s a collection of values that we want to combine, from left to right. In one case, that’s a List[Float], and in the other, it’s a Seq[String]. The type doesn’t matter, as long as we can iterate over it.
  • There’s an identity value that we can use as a starting point for building the result; 0 for the floats, and "" (the empty string) for the strings.
  • There’s an operation to combine two values: + for the floats, and concat for the strings.

We can capture that concept by writing an interface (a trait, in Scala terms) that captures it; that interface is called a typeclass. It happens that this concept of reducible values is called a monoid in abstract algebra, so that’s the name we’ll use.

trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
}

We can read that as saying “A is a monoid if there are implementations of empty and combine that meet these constraints”. Given the declaration of the typeclass, we can implement it as an object which provides those operations for a particular type:

object FloatAdditionMonoid extends Monoid[Float] {
    def empty: Float = 0.0f
    def combine(x: Float, y: Float): Float = x + y
}

object StringConcatMonoid extends Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x.concat(y)
}

FloatAdditionMonoid implements the typeclass Monoid for the type Float. And since we can write an implementation of Monoid for Float or String, we can say that the types Float and String are instances of the typeclass Monoid.

Using our implementation of Monoid, we can write a single, generic reduction operator now:

def reduce[A](values: Seq[A], monoid: Monoid[A]): A =
    values.foldRight(monoid.empty)(monoid.combine)

We can use that to reduce a list of floats:

reduce(List(1.0f, 3.14f, 2.718f, 1.414f, 1.732f), FloatAdditionMonoid)

And we can do a bit better than that! We can set up an implicit, so that we don’t need to pass the monoid implementation around. In Scala, an implicit is a kind of dynamically scoped value. For a given type, there can be one implicit value of that type in effect at any point in the code. If a function takes an implicit parameter of that type, then the nearest definition in the execution stack will automatically be inserted if the parameter isn’t passed explicitly.

def reduce[A](values: Seq[A])(implicit A: Monoid[A]): A =
    values.foldRight(A.empty)(A.combine)

And as long as there's a definition of the Monoid for a type A in scope, we can use it now by just writing:

implicit object FloatAdditionMonoid extends Monoid[Float] {
    def empty: Float = 0.0f
    def combine(x: Float, y: Float): Float = x + y
}

val floats: List[Float] = ...
val result = reduce(floats)

Now, anywhere that the FloatAdditionMonoid declaration is imported, you can call reduce on any sequence of floats, and the implicit value will automatically be inserted.
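Putting all of those pieces together, a self-contained version of the monoid typeclass and the generic reducer might look like this (a sketch consistent with the snippets above, runnable as a Scala script):

```scala
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

object Monoids {
  // Floats form a monoid under addition, with 0 as the identity.
  implicit object FloatAdditionMonoid extends Monoid[Float] {
    def empty: Float = 0.0f
    def combine(x: Float, y: Float): Float = x + y
  }

  // Strings form a monoid under concatenation, with "" as the identity.
  implicit object StringConcatMonoid extends Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x.concat(y)
  }

  // The generic reduction: fold with the monoid's identity and combiner.
  def reduce[A](values: Seq[A])(implicit m: Monoid[A]): A =
    values.foldRight(m.empty)(m.combine)
}

import Monoids._

val total    = reduce(List(1.0f, 2.0f, 3.0f))   // uses FloatAdditionMonoid
val sentence = reduce(Seq("mon", "oid"))        // uses StringConcatMonoid
```

The same reduce call picks up whichever monoid instance is in implicit scope for the element type.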

Using this idea of a monoid, we've captured the concept of reduction in a common abstraction. Our notion of reduction doesn't care about whether we're reducing strings by concatenation, integers by addition, floats by multiplication, or sets by union. Those are all valid uses of the concept of a monoid, and they're all easy to implement using the monoid typeclass. The concept of monoid isn't a difficult one, but at the same time, it's not necessarily something that most of us would have thought about as an abstraction.

We’ve got a typeclass for a monoid; now, we’ll try to connect it into category theory. It’s a bit tricky, so we won’t cover it all at once. We’ll look at it a little bit now, and we’ll come back to it in a later lesson, after we’ve absorbed a bit more.

From Sets to Arrows

For most of us, if we’ve heard of monoids, we’ve heard of them in terms of set theory and abstract algebra. So in that domain, what’s a monoid?

A monoid is a triple (V, 1, *), where:

  • V is a set of values;
  • 1 is a value in V;
  • * is a total binary operator where:
    • 1 is an identity of *: For any value v \in V: v*1 = 1*v = v.
    • * is associative: for any values v, w, x \in V:  (v * w) * x = v * (w * x)

That’s all just a formal way of saying that a monoid is a set with a binary associative operator and an identity value. The set of integers can form a monoid with addition as the operator, and 0 as identity. Real numbers can be a monoid with multiplication and 1. Strings can be a monoid with concatenation as the operator, and empty string as identity.

But we can look at it in a different way, too, by thinking entirely in terms of functions.
Let’s forget about the numbers as individual values, and instead, let’s think about them in functional terms. Every number is a function which adds itself to its parameter. So “2” isn’t a number, it’s a function which adds two to anything.

How can we tell that 2 is a function which adds two to things?

If we compose it with 3 (the function that adds three to things), we get 5 (the function that adds five to things). And how do we know that? Because it’s the same thing that we get if we compose 3 with 1, and then compose the result of that with 1 again. 3+1+1=5, and 3+2=5. We can also tell that it’s 2, because if we just take 1, and compose it with itself, what we’ll get back is the object that we call 2.

In this scheme, all of the numbers are related not by arithmetic, not by an underlying concept of quantity or cardinality or ordinality, but only by how they compose with each other. We can’t see anything else – all we have are these functions. But we can recognize that they are the natural numbers that we’re familiar with.

Looking at it this way, we can think of the world of natural numbers as a single point, which represents the set of all natural numbers. And around that point, we’ve got lots and lots of arrows, each of which goes from that point back to itself. Each of those arrows represents one number. The way we tell them apart is by understanding which arrow we get back when we compose them. Take any arrow from that point back to that point, and compose it with the arrow 0, and what do you get? The arrow you started with. Take any arrow that you want, and compose it with 2. What do you get? You get the same thing that you’d get if you composed it with 1, and then composed it with one again.

That dot, with those arrows, is a category.
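Here's a toy rendering of that picture in code (my illustration, not part of the original argument): a single object, with each natural number represented only as a composable arrow.

```scala
// Each arrow goes from the one object back to itself; we model an
// arrow as a function, and numbers are distinguished only by how
// they compose with one another.
type Arrow = Int => Int

val zero: Arrow = x => x         // the identity arrow
val one: Arrow  = x => x + 1     // the generating arrow

// Composition of arrows: do f, then g.
def compose(f: Arrow, g: Arrow): Arrow = f andThen g

val two   = compose(one, one)    // "1 composed with 1"
val three = compose(two, one)
val five  = compose(three, two)  // same as composing 3 with 1 and 1 again

// Composing any arrow with `zero` gives back the same arrow.
val stillFive = compose(five, zero)
```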

What kind of advantage do we get in going from the algebraic notion of a set with a binary operation, to the categorical notion of an object with a bunch of composable arrows? It allows us to understand a monoid purely as a structure, without having to think about what the objects are, or what the operator means.

Now, let’s jump back to our monoid typeclass for a moment.

trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
}

We can understand this as being a programmable interface for the categorical object that we just described. All we need to do is read “:” as “is an arrow in”: It says that A is a monoid if:

  • It has an element called empty which is an arrow in A.
  • It has an operation called combine which, given any two arrows in A, composes them into a new arrow in A.

There are, of course, other conditions – combine needs to be associative, and empty needs to behave as the identity value. But just as, when we write an interface for, say, a binary search tree, the interface only defines the structure and not the ordering condition, so here the typeclass defines the functional structure of the categorical object, not the logical conditions.

This is what categories are really all about: tearing things down to a simple core, where everything is expressed in terms of arrows. It’s almost reasoning in functions, except that it’s even more abstract than that: the arrows don’t need to be functions – they just need to be composable mappings from things to things.

Deeper Into Arrows

We can abstract a bit more, and look at the entire construction, including the identity and associativity constraints, entirely in terms of arrows. To really understand this, we'll need to spend some time diving deeper into the actual theory of categories, but as a preview, we can describe a monoid with the following pair of diagrams (copied from Wikipedia):

In these diagrams, any two paths between the same start- and end-nodes are equivalent (up to isomorphism). When you understand how to read these diagrams, they really do define everything that we care about for monoids.

For now, we’ll just run through and name the parts – and then later, in another lesson, we’ll come back, and we’ll look at this in more detail.

  • \mu is an arrow from M\times M \rightarrow M, which we’ll call a multiplication operator.
  • \eta is an arrow from I \rightarrow M, called unit.
  • \alpha is an arrow from (M\times M)\times M \rightarrow M \times (M\times M) which represents the associativity property of the monoid.
  • \lambda is a morphism which represents the left identity property of the monoid (that is, 1*x=x), and \rho is a morphism representing the right identity property (x*1=x).

This diagram, using these arrows, is a way of representing all of the key properties of a monoid via nothing but arrows and composition. It says, among other things, that:

  • (M \times M) \times M composes with multiplication to be M \times M.
    That is, applying multiplication to (M \times M) \times M evaluates to (M \times M).
  • (M \times M) \times M composed with associativity can become M \times (M \times M).
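In equational form (a standard paraphrase of those diagrams, with \mathrm{id}_M denoting the identity arrow on M), the two diagrams assert:

```latex
\mu \circ (\mu \times \mathrm{id}_M) \;=\; \mu \circ (\mathrm{id}_M \times \mu) \circ \alpha
\qquad \text{(associativity)}

\mu \circ (\eta \times \mathrm{id}_M) \;=\; \lambda,
\qquad
\mu \circ (\mathrm{id}_M \times \eta) \;=\; \rho
\qquad \text{(left and right unit)}
```

Each equation says that the two paths around the corresponding diagram agree.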

So it’s a monoid – but it’s a higher level monoid. In this, M isn’t just an object in a category: it’s an entire category. These arrows are arrows between categories in a category of categories.

What we'll see when we get deeper into category theory is how powerful this kind of abstraction can get. We'll often see a sequence of abstractions, where we start with a simple concept (like monoid), and find a way to express it in terms of arrows between objects in a category. But then, we'll lift it up, and look at how we can see it not just as a relation between objects in a category, but as a different kind of relation between categories, by constructing the same thing using a category of categories. And then we'll abstract even further, and construct the same thing using mappings between categories of categories.

Doug NatelsonFaculty position at Rice - theoretical biological physics

Faculty position in Theoretical Biological Physics at Rice University

As part of the Vision for the Second Century (V2C2), which is focused on investments in research excellence, Rice University seeks faculty members, preferably at the assistant professor level, starting as early as July 1, 2019, in all areas of Theoretical Biological Physics. Successful candidates will lead dynamic, innovative, and independent research programs supported by external funding, and will excel in teaching at the graduate and undergraduate levels, while embracing Rice’s culture of excellence and diversity.  This search will consider applicants from all science and engineering disciplines. Ideal candidates will pursue research with strong intellectual overlap with physics, chemistry, biosciences, bioengineering, chemical and biomolecular engineering, or other related disciplines. Applicants pursuing all styles of theory and computation integrating the physical and life sciences are encouraged to apply.

For full details and to apply, please visit  Applicants should please submit the following materials: (1) cover letter, (2) curriculum vitae, (3) research statement, (4) statement of teaching philosophy, and (5) the names and contact information for three references. Application review will commence no later than November 30, 2018 and continue until the positions are filled. Candidates must have a PhD or equivalent degree and outstanding potential in research and teaching. We particularly encourage applications from women and members of historically underrepresented groups who bring diverse cultural experiences and who are especially qualified to mentor and advise members of our diverse student population.

Rice University, located in Houston, Texas, is an Equal Opportunity Employer with commitment to diversity at all levels, and considers for employment qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national or ethnic origin, genetic information, disability, or protected veteran status.

October 13, 2018

John BaezLebesgue Universal Covering Problem (Part 3)

Back in 2015, I reported some progress on this difficult problem in plane geometry. I’m happy to report some more.

First, remember the story. A subset of the plane has diameter 1 if the distance between any two points in this set is ≤ 1. A universal covering is a convex subset of the plane that can cover a translated, reflected and/or rotated version of every subset of the plane with diameter 1. In 1914, the famous mathematician Henri Lebesgue sent a letter to a fellow named Pál, challenging him to find the universal covering with the least area.

Pál worked on this problem, and 6 years later he published a paper on it. He found a very nice universal covering: a regular hexagon in which one can inscribe a circle of diameter 1. This has area

\sqrt{3}/2 \approx 0.86602540\dots
But he also found a universal covering with less area, by removing two triangles from this hexagon—for example, the triangles C1C2C3 and E1E2E3 here:

The resulting universal covering has area

2 - 2/\sqrt{3} \approx 0.84529946\dots
In 1936, Sprague went on to prove that more area could be removed from another corner of Pál’s original hexagon, giving a universal covering of area

0.844137708436\dots
In 1992, Hansen took these reductions even further by removing two more pieces from Pál’s hexagon. Each piece is a thin sliver bounded by two straight lines and an arc. The first piece is tiny. The second is downright microscopic!

Hansen claimed the areas of these regions were 4 · 10^{-11} and 6 · 10^{-18}. This turned out to be wrong. The actual areas are 3.7507 · 10^{-11} and 8.4460 · 10^{-21}. The resulting universal covering had an area of

0.844137708398\dots
This tiny improvement over Sprague’s work led Klee and Wagon to write:

it does seem safe to guess that progress on [this problem], which has been painfully slow in the past, may be even more painfully slow in the future.

However, in 2015 Philip Gibbs found a way to remove about a million times more area than Hansen’s larger region: a whopping 2.233 · 10^{-5}. This gave a universal covering with area

0.8441153\dots
Karine Bagdasaryan and I helped Gibbs write up a rigorous proof of this result, and we published it here:

• John Baez, Karine Bagdasaryan and Philip Gibbs, The Lebesgue universal covering problem, Journal of Computational Geometry 6 (2015), 288–299.

Greg Egan played an instrumental role as well, catching various computational errors.

At the time Philip was sure he could remove even more area, at the expense of a more complicated proof. Since the proof was already quite complicated, we decided to stick with what we had.

But this week I met Philip at The philosophy and physics of Noether’s theorems, a wonderful workshop in London which deserves a full blog article of its own. It turns out that he has gone further: he claims to have found a vastly better universal covering, with area

0.8440935944\dots
This is an improvement of 2.178245 × 10^{-5} over our earlier work—roughly equal to our improvement over Hansen.

You can read his argument here:

• Philip Gibbs, An upper bound for Lebesgue’s universal covering problem, 22 January 2018.

I say ‘claims’ not because I doubt his result—he’s clearly a master at this kind of mathematics!—but because I haven’t checked it and it’s easy to make mistakes, for example mistakes in computing the areas of the shapes removed.

It seems we are closing in on the final result; however, Philip Gibbs believes there is still room for improvement, so I expect it will take at least a decade or two to solve this problem… unless, of course, some mathematicians start working on it full-time, which could speed things up considerably.

October 12, 2018

Terence Tao254A, Notes 0: Physical derivation of the incompressible Euler and Navier-Stokes equations

This coming fall quarter, I am teaching a class on topics in the mathematical theory of incompressible fluid equations, focusing particularly on the incompressible Euler and Navier-Stokes equations. These two equations are by no means the only equations used to model fluids, but I will focus on these two equations in this course to narrow the focus down to something manageable. I have not fully decided on the choice of topics to cover in this course, but I would probably begin with some core topics such as local well-posedness theory and blowup criteria, conservation laws, and construction of weak solutions, then move on to some topics such as boundary layers and the Prandtl equations, the Euler-Poincare-Arnold interpretation of the Euler equations as an infinite dimensional geodesic flow, and some discussion of the Onsager conjecture. I will probably also continue to more advanced and recent topics in the winter quarter.

In this initial set of notes, we begin by reviewing the physical derivation of the Euler and Navier-Stokes equations from the first principles of Newtonian mechanics, and specifically from Newton’s famous three laws of motion. Strictly speaking, this derivation is not needed for the mathematical analysis of these equations, which can be viewed if one wishes as an arbitrarily chosen system of partial differential equations without any physical motivation; however, I feel that the derivation sheds some insight and intuition on these equations, and is also worth knowing on purely intellectual grounds regardless of its mathematical consequences. I also find it instructive to actually see the journey from Newton’s law

\displaystyle F = ma

to the seemingly rather different-looking law

\displaystyle \partial_t u + (u \cdot \nabla) u = -\nabla p + \nu \Delta u

\displaystyle \nabla \cdot u = 0

for incompressible Navier-Stokes (or, if one drops the viscosity term {\nu \Delta u}, the Euler equations).

Our discussion in this set of notes is physical rather than mathematical, and so we will not be working at mathematical levels of rigour and precision. In particular we will be fairly casual about interchanging summations, limits, and integrals, we will manipulate approximate identities {X \approx Y} as if they were exact identities (e.g., by differentiating both sides of the approximate identity), and we will not attempt to verify any regularity or convergence hypotheses in the expressions being manipulated. (The same holds for the exercises in this text, which also do not need to be justified at mathematical levels of rigour.) Of course, once we resume the mathematical portion of this course in subsequent notes, such issues will be an important focus of careful attention. This is a basic division of labour in mathematical modeling: non-rigorous heuristic reasoning is used to derive a mathematical model from physical (or other “real-life”) principles, but once a precise model is obtained, the analysis of that model should be completely rigorous if at all possible (even if this requires applying the model to regimes which do not correspond to the original physical motivation of that model). See the discussion by John Ball quoted at the end of these slides of Gero Friesecke for an expansion of these points.

Note: our treatment here will differ slightly from that presented in many fluid mechanics texts, in that it will emphasise first-principles derivations from many-particle systems, rather than relying on bulk laws of physics, such as the laws of thermodynamics, which we will not cover here. (However, the derivations from bulk laws tend to be more robust, in that they are not as reliant on assumptions about the particular interactions between particles. In particular, the physical hypotheses we assume in this post are probably quite a bit stronger than the minimal assumptions needed to justify the Euler or Navier-Stokes equations, which can hold even in situations in which one or more of the hypotheses assumed here break down.)

— 1. From Newton’s laws to the Euler and Navier-Stokes equations —

For obvious reasons, the derivation of the equations of fluid mechanics is customarily presented in the three dimensional setting {d=3} (and sometimes also in the two-dimensional setting {d=2}), but actually the general dimensional case is not that much more difficult (and in some ways clearer, as it reveals that the derivation does not depend on any structures specific to three dimensions, such as the cross product), so for this derivation we will work in the spatial domain {{\bf R}^d} for arbitrary {d}. One could also work with bounded domains {\Omega \subset {\bf R}^d}, or periodic domains such as {{\bf R}^d/{\bf Z}^d}; the derivation is basically the same, thanks to the local nature of the forces of fluid mechanics, except at the boundary {\partial \Omega} where the situation is more subtle (and may be discussed in more detail in later posts). For sake of notational simplicity, we will assume that the time variable {t} ranges over the entire real line {{\bf R}}; again, since the laws of classical mechanics are local in time, one could just as well restrict {t} to some sub-interval of this line, such as {[0,T)} for some time {T>0}.

Our starting point is Newton’s second law {F=ma}, which (partially) describes the motion of a particle {P} of some fixed mass {m > 0} moving in the spatial domain {{\bf R}^d}. (Here we assume that the mass {m} of a particle does not vary with time; in particular, our discussion will be purely non-relativistic in nature, though it is possible to derive a relativistic version of the Euler equations by variants of the arguments given here.) We write Newton’s second law as the ordinary differential equation

\displaystyle m \frac{d^2}{dt^2} x(t) = F(t)

where {x: {\bf R} \rightarrow {\bf R}^d} is the trajectory of the particle (thus {x(t) \in {\bf R}^d} denotes the position of the particle at time {t}), and {F: {\bf R} \rightarrow {\bf R}^d} is the force applied to that particle. If we write {x_1,\dots,x_d: {\bf R} \rightarrow {\bf R}} for the {d} coordinates of the vector-valued function {x: {\bf R} \rightarrow {\bf R}^d}, and similarly write {F_1,\dots,F_d: {\bf R} \rightarrow {\bf R}} for the components of {F}, we therefore have

\displaystyle m \frac{d^2}{dt^2} x_i(t) = F_i(t)

where we adopt in this section the convention that the indices {i,j,k,l} are always understood to range from {1} to {d}.

If one has some collection {(P^{(a)})_{a \in A}} of particles instead of a single particle, indexed by some set of labels {A} (e.g. the numbers from {1} to {N}, if there are a finite number {N} of particles; for unbounded domains such as {{\bf R}^d} one can also imagine situations in which {A} is infinite), then for each {a \in A}, the {a^{th}} particle {P^{(a)}} has some mass {m^{(a)} > 0}, some trajectory {x^{(a)}: {\bf R} \rightarrow {\bf R}^d} (with components {x^{(a)}_i: {\bf R} \rightarrow {\bf R}}), and some applied force {F^{(a)}: {\bf R} \rightarrow {\bf R}^d} (with components {F^{(a)}_i: {\bf R} \rightarrow {\bf R}}); we thus have the equation of motion

\displaystyle m^{(a)} \frac{d^2}{dt^2} x^{(a)}(t) = F^{(a)}(t)

or in components

\displaystyle m^{(a)} \frac{d^2}{dt^2} x^{(a)}_i(t) = F^{(a)}_i(t).

In this section we adopt the convention that the indices {a,b,c} are always understood to range over the set of labels {A} for the particles; in particular, their role should not be confused with those of the coordinate indices {i,j,k}.

Newton’s second law does not, by itself, completely specify the evolution of a system of {N} particles, because it does not specify exactly how the forces {F^{(a)}} depend on the current state of the system. For {N} particles, the current state at a given time {t} is given by the positions {x^{(b)}(t), b \in A} of all the particles {P^{(b)}}, as well as their velocities {\frac{d}{dt} x^{(b)}(t), b \in A}; we assume for simplicity that the particles have no further physical characteristics or internal structure of relevance that would require more state variables than these. (No higher derivatives need to be specified beyond the first, thanks to Newton’s second law. On the other hand, specifying position alone is insufficient to describe the state of the system; this was noticed as far back as Zeno in his paradox of the arrow, which in retrospect can be viewed as a precursor to Newton’s second law insofar as it demonstrated that the laws of motion needed to be second-order in time (in contrast, for instance, to Aristotelian physics, which was morally first-order in nature).) At a fundamental level, the dependency of forces on the current state is governed by the laws of physics for such forces; for instance, if the particles interact primarily through electrostatic forces, then one needs the laws of electrostatics to describe these forces. (In some cases, such as electromagnetic interactions, one cannot accurately model the situation purely in terms of interacting particles, and the equations of motion will then involve some additional mediating fields such as the electromagnetic field; but we will ignore this possibility in the current discussion for sake of simplicity.)

Fortunately, thanks to other laws of physics, and in particular Newton’s other two laws of motion, one can still obtain partial information about the forces {F^{(a)}} without having to analyse the fundamental laws producing these forces. For instance, Newton’s first law of motion (when combined with the second) tells us that a single particle {P^{(a)}} does not exert any force on itself; the net force on {P^{(a)}} only arises from interaction with other particles {P^{(b)}}, {b \neq a} (for this discussion we neglect external forces, such as gravity, although one could easily incorporate such forces into this discussion; see Exercise 3 below). We will assume that the only forces present are pair interactions coming from individual pairs of particles {(P^{(a)},P^{(b)})}; it is theoretically possible that one could have more complicated interactions between, say, a triplet {(P^{(a)},P^{(b)}, P^{(c)})} of particles that do not simply arise from the interactions between the three pairs {(P^{(a)},P^{(b)})}, {(P^{(a)},P^{(c)})}, {(P^{(b)},P^{(c)})}, but we will not consider this possibility here. We also assume that the net force on a particle is just the sum of all the interacting forces (i.e., the force addition law contains no nonlinear terms). This gives us a decomposition

\displaystyle F^{(a)} = \sum_{b: b \neq a} F^{(ab)}

of the net force {F^{(a)}: {\bf R} \rightarrow {\bf R}^d} on a particle {P^{(a)}} into the interaction force {F^{(ab)}: {\bf R} \rightarrow {\bf R}^d} exerted on {P^{(a)}} by another particle {P^{(b)}}. Thus the equation of motion is now

\displaystyle m^{(a)} \frac{d^2}{dt^2} x^{(a)}(t) = \sum_{b: b \neq a} F^{(ab)}(t) \ \ \ \ \ (1)


Of course, this description is still incomplete, because we have not specified exactly what the interaction forces {F^{(ab)}} are. But one important constraint on these forces is provided by Newton’s third law

\displaystyle F^{(ba)} = - F^{(ab)}. \ \ \ \ \ (2)


This already gives some restrictions on the possible dynamics. For instance, it implies (formally, at least) that the total momentum

\displaystyle \sum_a m^{(a)} \frac{d}{dt} x^{(a)} \ \ \ \ \ (3)


(which takes values in {{\bf R}^d}) is conserved in time:

\displaystyle \frac{d}{dt} \sum_a m^{(a)} \frac{d}{dt} x^{(a)} = \sum_a m^{(a)} \frac{d^2}{dt^2} x^{(a)}

\displaystyle = \sum_a \sum_{b \neq a} F^{(ab)}

\displaystyle = \frac{1}{2} \sum_{a,b: a \neq b} (F^{(ab)} + F^{(ba)})

\displaystyle = 0.
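
This cancellation is easy to see numerically. In the toy simulation below, the masses, initial data, and the gravity-like interaction law are all arbitrary illustrative choices; the point is only that the pair forces satisfy (2) exactly, so the total momentum (3) is conserved up to floating-point rounding.

```python
import random

random.seed(0)
d, N = 3, 5  # spatial dimension and number of particles (toy values)

# arbitrary masses, positions and velocities
m = [random.uniform(1.0, 2.0) for _ in range(N)]
x = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(N)]
v = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(N)]

def pair_force(a, b):
    """Force on particle a from particle b: the central force
    f^(ab) (x^(b) - x^(a)) with f^(ab) = |x^(b) - x^(a)|^{-3}
    (a gravity-like attraction), so that F^(ba) = -F^(ab) exactly."""
    r = [x[b][i] - x[a][i] for i in range(d)]
    r2 = sum(c * c for c in r)
    return [c / r2 ** 1.5 for c in r]

def step(dt):
    """One symplectic Euler step for the system (1)."""
    F = [[sum(pair_force(a, b)[i] for b in range(N) if b != a)
          for i in range(d)] for a in range(N)]
    for a in range(N):
        for i in range(d):
            v[a][i] += dt * F[a][i] / m[a]
            x[a][i] += dt * v[a][i]

def total_momentum():
    return [sum(m[a] * v[a][i] for a in range(N)) for i in range(d)]

p0 = total_momentum()
for _ in range(100):
    step(1e-3)
drift = max(abs(p - q) for p, q in zip(total_momentum(), p0))  # ~ rounding error
```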

We will also assume that the interaction force {F^{(ab)}} between a pair {P^{(a)}, P^{(b)}} of particles is parallel to the displacement {x^{(a)}-x^{(b)}} between the pair; in other words, we assume the torque {(x^{(a)} - x^{(b)}) \wedge F^{(ab)}} created by this force vanishes, thus

\displaystyle (x^{(a)} - x^{(b)}) \wedge F^{(ab)} = 0. \ \ \ \ \ (4)


Here {\wedge: {\bf R}^d \times {\bf R}^d \rightarrow \bigwedge^2 {\bf R}^d} is the exterior product on {{\bf R}^d} (which in three dimensions can be transformed if one wishes into the cross product, but is well defined in all dimensions). Algebraically, {\wedge} is the universal alternating bilinear form on {{\bf R}^d}; in terms of the standard basis {e_1,\dots,e_d} of {{\bf R}^d}, the wedge product {x \wedge y} of two vectors {x = \sum_i x_i e_i}, {y = \sum_j y_j e_j} is given by

\displaystyle x \wedge y = \sum_{i,j} x_i y_j e_i \wedge e_j = \sum_{i,j: i < j} (x_i y_j - x_j y_i) (e_i \wedge e_j)

and the vector space {\bigwedge^2 {\bf R}^d} is the formal span of the basic wedge products {e_i \wedge e_j} for {i<j}.
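
As a concrete illustration, here is a minimal implementation of these wedge-product components (the function name and sample vectors are of course arbitrary); antisymmetry and the vanishing of {x \wedge x} can then be checked directly.

```python
def wedge(x, y):
    """Components (x wedge y)_{ij} = x_i y_j - x_j y_i for i < j,
    stored in a dict keyed by the index pair (i, j)."""
    d = len(x)
    return {(i, j): x[i] * y[j] - x[j] * y[i]
            for i in range(d) for j in range(i + 1, d)}

x, y = [1.0, 2.0, 3.0], [0.5, -1.0, 4.0]

# alternating bilinear form: x wedge y = -(y wedge x), and x wedge x = 0
antisym = all(wedge(x, y)[k] == -wedge(y, x)[k] for k in wedge(x, y))
self_zero = all(c == 0.0 for c in wedge(x, x).values())

# in three dimensions the components recover the cross product up to
# sign and ordering; e.g. (x wedge y)_{01} is the third cross-product entry
cross_z = x[0] * y[1] - x[1] * y[0]
```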

One consequence of the absence (4) of torque is the conservation of total angular momentum

\displaystyle \sum_a m^{(a)} x^{(a)} \wedge \frac{d}{dt} x^{(a)} \ \ \ \ \ (5)


(around the spatial origin {x=0}). Indeed, we may calculate

\displaystyle \frac{d}{dt} \sum_a m^{(a)} x^{(a)} \wedge \frac{d}{dt} x^{(a)} = \sum_a m^{(a)} \frac{d}{dt} x^{(a)} \wedge \frac{d}{dt} x^{(a)} + m^{(a)} x^{(a)} \wedge \frac{d^2}{dt^2} x^{(a)}

\displaystyle = \sum_a x^{(a)} \wedge m^{(a)} \frac{d^2}{dt^2} x^{(a)}

\displaystyle = \sum_{a,b: a \neq b} x^{(a)} \wedge F^{(ab)}

\displaystyle = \frac{1}{2} \sum_{a,b: a \neq b} (x^{(a)} \wedge F^{(ab)} + x^{(b)} \wedge F^{(ba)})

\displaystyle = \frac{1}{2} \sum_{a,b: a \neq b} (x^{(a)}-x^{(b)}) \wedge F^{(ab)}

\displaystyle =0.

Note that there is nothing special about the spatial origin {x=0}; the angular momentum

\displaystyle \sum_a m^{(a)} (x^{(a)} - x_0) \wedge \frac{d}{dt} x^{(a)}

around any other point {x_0 \in {\bf R}^d} is also conserved in time, as is clear from repeating the above calculation, or by combining the existing conservation laws for (3) and (5).
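
The key algebraic step can also be checked numerically: for central pair forces {F^{(ab)} = f^{(ab)} (x^{(b)} - x^{(a)})} with symmetric scalars {f^{(ab)} = f^{(ba)}}, the total torque {\sum_a x^{(a)} \wedge F^{(a)}} vanishes identically. A toy check, with random positions and interaction scalars chosen purely for illustration:

```python
import random

random.seed(1)
d, N = 3, 6  # toy dimension and particle count
x = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(N)]

# symmetric interaction scalars f^(ab) = f^(ba), otherwise arbitrary
f = {}
for a in range(N):
    for b in range(a + 1, N):
        f[a, b] = f[b, a] = random.uniform(-1.0, 1.0)

def wedge(u, v):
    return {(i, j): u[i] * v[j] - u[j] * v[i]
            for i in range(d) for j in range(i + 1, d)}

def net_force(a):
    """Net central force on particle a: sum over b of f^(ab) (x^(b) - x^(a))."""
    return [sum(f[a, b] * (x[b][i] - x[a][i]) for b in range(N) if b != a)
            for i in range(d)]

# total torque about the origin, sum_a x^(a) wedge F^(a)
torque = {k: 0.0 for k in wedge(x[0], x[0])}
for a in range(N):
    for k, c in wedge(x[a], net_force(a)).items():
        torque[k] += c

max_torque = max(abs(c) for c in torque.values())  # zero up to rounding
```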

Now we pass from particle mechanics to continuum mechanics, by considering the limiting (bulk) behaviour of many-particle systems as the number {N} of particles per unit volume goes to infinity. (In physically realistic scenarios, {N} will be comparable to Avogadro’s constant, which seems large enough that such limits should be a good approximation to the truth.) To take such limits, we assume that the distribution of particles (and various properties of these particles, such as their velocities and net interaction forces) is approximated in a certain bulk sense by continuous fields. For instance, the mass distribution of a system of particles {(P^{(a)})_{a \in A}} at a given time {t} is given by the discrete measure

\displaystyle \mu_{\mathrm{mass}}(t) := \sum_a m^{(a)} \delta_{x^{(a)}(t)} \ \ \ \ \ (6)


where {\delta_{x_0}} denotes the Dirac measure (or distribution). We will assume that at each time {t}, we have a “bulk” approximation

\displaystyle \mu_{\mathrm{mass}}(t) \approx \rho(t,x)\ dx \ \ \ \ \ (7)


by some continuous measure {\rho(t,x)\ dx}, where the density function {\rho: {\bf R} \times {\bf R}^d \rightarrow {\bf R}^+} is some smooth function of time and space, and {dx} denotes Lebesgue measure on {{\bf R}^d}. What does bulk approximation mean? One could work with various notions of approximation, but we will adopt the viewpoint of the theory of distributions (as reviewed for instance in these old lecture notes of mine) and consider approximation against test functions in spacetime, thus we assume that

\displaystyle \int_{\bf R} \int_{{\bf R}^d} \psi(t,x)\ d \mu_{\mathrm{mass}}(t)(x)\ dt \approx \int_{\bf R} \int_{{\bf R}^d} \psi(t,x) \rho(t,x)\ dx dt. \ \ \ \ \ (8)


for all spacetime test functions {\psi \in C^\infty_c( {\bf R} \times {\bf R}^d )}. (One could also work with purely spatial test functions at each fixed time {t}, or work with “infinitesimal” parallelepipeds or similar domains instead of using test functions; the arguments look slightly different when doing so, but the final equations of motion obtained are the same in all cases. See Exercise 1 for an example of this.) We will be deliberately vague as to what {\approx} means, other than to say that the approximation should only be considered accurate (in the sense that it becomes exact in the limit {N \rightarrow \infty}) when the test function {\psi} varies only at “macroscopic” (or “bulk”) spatial scales; in particular, it should not oscillate with a wavelength that goes to zero as {N} goes to infinity. (For instance, one certainly expects the approximation (7) to break down if one tries to test it on scales comparable to the mean spacing between particles.)

Applying (6) and evaluating the delta integrations, the approximation (8) becomes

\displaystyle \int_{\bf R} \sum_a m^{(a)} \psi(t,x^{(a)}(t))\ dt \approx \int_{\bf R} \int_{{\bf R}^d} \psi(t,x) \rho(t,x)\ dx dt. \ \ \ \ \ (9)
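
To see how an approximation of the form (9) emerges, one can sample particle positions from a given density and compare the empirical sum against the integral; the Monte Carlo error decays like {N^{-1/2}}. The Gaussian density and test function below are arbitrary illustrative choices:

```python
import math, random

random.seed(2)

def psi(x):
    """A smooth, slowly varying ('macroscopic') test function."""
    return math.exp(-x * x / 4.0)

# exact value of int psi(x) rho(x) dx for the standard Gaussian density
# rho(x) = e^{-x^2/2}/sqrt(2 pi): a Gaussian integral, equal to sqrt(2/3)
exact = math.sqrt(2.0 / 3.0)

errors = []
for N in (100, 10000):
    # N particles of mass 1/N with positions drawn from rho
    pts = [random.gauss(0.0, 1.0) for _ in range(N)]
    empirical = sum(psi(q) for q in pts) / N  # sum_a m^(a) psi(x^(a))
    errors.append(abs(empirical - exact))
```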


In a physical liquid, particles in a given small region of space tend to move at nearly identical velocities (as opposed to gases, where Brownian motion effects lead one to expect velocities to be distributed stochastically, for instance in a Maxwellian distribution). To model this, we assume that there exists a smooth velocity field {u: {\bf R} \times {\bf R}^d \rightarrow {\bf R}^d} for which we have the approximation

\displaystyle \frac{d}{dt} x^{(a)}(t) \approx u( t, x^{(a)}(t) ) \ \ \ \ \ (10)


for all particles {P^{(a)}} and all times {t}. (When stochastic effects are significant, the continuum limit of the fluid will be the Boltzmann equations rather than the Euler or Navier-Stokes equations; however the latter equations can still emerge as an approximation of the former in various regimes. See also Remark 7 below.)

Implicit in our model of many-particle interactions is the conservation of mass: each particle {P^{(a)}} has a fixed mass {m^{(a)}}, and no particle is created or destroyed by the evolution. This conservation of mass, when combined with the approximations (9) and (10), gives rise to a certain differential equation relating the density function {\rho} and the velocity field {u}. To see this, first observe from the fundamental theorem of calculus in time that

\displaystyle \int_{\bf R} \frac{d}{dt}(\int_{{\bf R}^d} \psi(t,x) \ d\mu_{\mathrm{mass}}(t)(x)) dt = 0 \ \ \ \ \ (11)


or equivalently (after applying (6) and evaluating the delta integrations)

\displaystyle \int_{\bf R} \frac{d}{dt} \sum_a m^{(a)} \psi(t,x^{(a)}(t))\ dt = 0 \ \ \ \ \ (12)


for any test function {\psi} (note that we now take {\psi} to be compactly supported in both space and time). By the chain rule and (10), we have

\displaystyle \frac{d}{dt} \psi(t,x^{(a)}(t)) = (\partial_t \psi)(t,x^{(a)}(t)) + \frac{d}{dt} x^{(a)}(t) \cdot (\nabla \psi)(t, x^{(a)}(t))

\displaystyle \approx (\partial_t \psi + u \cdot \nabla \psi)(t, x^{(a)}(t))

for any particle {P^{(a)}} and any time {t}, where {\partial_t} denotes the partial derivative with respect to the time variable, {\nabla = (\partial_i)_{i=1,\dots,d}} denotes the spatial gradient (with {\partial_i} denoting partial differentiation in the {x_i} coordinate), and {\cdot} denotes the Euclidean inner product. (One could also use notation here that avoids explicit use of Euclidean structure, for instance writing {d\psi(u)} in place of {u \cdot \nabla \psi}, but it is traditional to use Euclidean notation in fluid mechanics.) This particular combination of derivatives {\partial_t + u \cdot \nabla} appears so often in the subject that we will give it a special name, the material derivative {D_t}:

\displaystyle D_t := \partial_t + u \cdot \nabla.

We have thus obtained the approximation

\displaystyle \frac{d}{dt} \psi(t,x^{(a)}(t)) \approx (D_t \psi)(t,x^{(a)}(t)), \ \ \ \ \ (13)


which on insertion back into (12) yields

\displaystyle \int_{\bf R} \sum_a m^{(a)} (D_t \psi)(t,x^{(a)}(t))\ dt \approx 0.
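
The chain-rule identity (13) underlying this step can be sanity-checked on an explicit flow. In the toy example below, the velocity field {u(t,x) = x} (in one dimension), its trajectories {x(t) = x_0 e^t}, and the observable {\psi(t,x) = tx} are all illustrative choices:

```python
import math

def u(t, x):
    """Toy one-dimensional velocity field u(t, x) = x."""
    return x

def traj(t, x0):
    """Its flow: the solution of dx/dt = u(t, x) is x(t) = x0 e^t."""
    return x0 * math.exp(t)

def psi(t, x):
    """A toy observable (smooth, though not compactly supported)."""
    return t * x

def D_t_psi(t, x):
    """Material derivative D_t psi = partial_t psi + u partial_x psi;
    here partial_t psi = x and partial_x psi = t."""
    return x + u(t, x) * t

# compare d/dt psi(t, x(t)) (centered difference) with (D_t psi)(t, x(t))
t0, x0, h = 0.7, 1.3, 1e-6
lhs = (psi(t0 + h, traj(t0 + h, x0)) - psi(t0 - h, traj(t0 - h, x0))) / (2 * h)
rhs = D_t_psi(t0, traj(t0, x0))
err = abs(lhs - rhs)
```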

The material derivative {D_t \psi} of a test function {\psi} will still be a test function. Therefore we can use our field approximation (9) to conclude that

\displaystyle \int_{\bf R} \int_{{\bf R}^d} (D_t \psi)(t,x) \rho(t,x)\ dx dt \approx 0

for all test functions {\psi}. The left-hand side consists entirely of the limiting fields, so on taking limits {N \rightarrow \infty} we should therefore have the exact equation

\displaystyle \int_{\bf R} \int_{{\bf R}^d} (D_t \psi)(t,x) \rho(t,x)\ dx dt = 0.

We can integrate by parts to obtain

\displaystyle \int_{\bf R} \int_{{\bf R}^d} \psi(t,x) (D_t^* \rho)(t,x)\ dx dt = 0

where {D_t^*} is the adjoint material derivative

\displaystyle D_t^* \rho := - \partial_t \rho - \nabla \cdot (\rho u).

Since the test function {\psi} was arbitrary, we conclude the continuity equation

\displaystyle D_t^* \rho = 0 \ \ \ \ \ (14)


or equivalently (and more customarily)

\displaystyle \partial_t \rho + \nabla \cdot (\rho u) = 0. \ \ \ \ \ (15)
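
One can test the continuity equation on a manufactured solution: for the (compressible) one-dimensional velocity field {u(t,x) = x}, the density {\rho(t,x) = e^{-t} g(x e^{-t})} solves (15) for any profile {g}. The following sketch, with an arbitrarily chosen Gaussian profile, verifies this with centered finite differences:

```python
import math

def u(t, x):
    """A compressible toy velocity field in one dimension."""
    return x

def rho(t, x):
    """Density transported by u: rho(t, x) = e^{-t} g(x e^{-t}) with a
    Gaussian profile g; this solves the 1D continuity equation exactly."""
    s = x * math.exp(-t)
    return math.exp(-t) * math.exp(-s * s)

def residual(t, x, h=1e-5):
    """partial_t rho + partial_x (rho u), by centered differences."""
    dt_rho = (rho(t + h, x) - rho(t - h, x)) / (2 * h)
    dx_flux = (rho(t, x + h) * u(t, x + h)
               - rho(t, x - h) * u(t, x - h)) / (2 * h)
    return dt_rho + dx_flux

max_residual = max(abs(residual(0.3, x)) for x in (-1.0, 0.0, 0.5, 2.0))
```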


We will eventually specialise to the case of incompressible fluids in which the density {\rho} is a non-zero constant in both space and time. (In some texts, one uses incompressibility to refer only to constancy of {\rho} along trajectories: {D_t \rho = 0}. But in this course we always use incompressibility to refer to homogeneous incompressibility, in which {\rho} is constant in both space and time.) In this incompressible case, the continuity equation (15) simplifies to a divergence-free condition on the velocity:

\displaystyle \nabla \cdot u = 0. \ \ \ \ \ (16)


For now, though, we allow for the possibility of compressibility by allowing {\rho} to vary in space and time. We also note that by integrating (15) in space, we formally obtain conservation of the total mass

\displaystyle \int_{{\bf R}^d} \rho(t,x)\ dx

since on differentiating under the integral sign and then integrating by parts we formally have

\displaystyle \partial_t \int_{{\bf R}^d} \rho(t,x)\ dx = 0.

Of course, this conservation law degenerates in the incompressible case, since the total mass {\int_{{\bf R}^d} \rho(t,x)\ dx} is manifestly an infinite constant. (In periodic settings, for instance if one is working in {{\bf R}^d/{\bf Z}^d} instead of {{\bf R}^d}, the total mass {\int_{{\bf R}^d/{\bf Z}^d} \rho(t,x)\ dx} is manifestly a finite constant in the incompressible case.)

Exercise 1

  • (i) Assume that the spacetime bulk approximation (8) is replaced by the spatial bulk approximation

    \displaystyle \int_{{\bf R}^d} \psi(x)\ d \mu_{\mathrm{mass}}(t)(x) \approx \int_{{\bf R}^d} \psi(x) \rho(t,x)\ dx

    for any time {t} and any spatial test function {\psi \in C^\infty_c({\bf R}^d)}. Give an alternate heuristic derivation of the continuity equation (15) in this case, without using any integration in time. (Feel free to differentiate under the integral sign.)

  • (ii) Assume that the spacetime bulk approximation (8) is replaced by the spatial bulk approximation

    \displaystyle \int_\Omega\ d\mu_{\mathrm{mass}}(t)(x) \approx \int_\Omega \rho(t,x)\ dx

    for any time {t} and any “reasonable” set {\Omega} (e.g., a rectangular box). Give an alternate heuristic derivation of the continuity equation (15) in this case. (Feel free to introduce infinitesimals and argue non-rigorously with them.)

We can repeat the above analysis with the mass distribution (6) replaced by the momentum distribution

\displaystyle \mu_{\mathrm{momentum}}(t) := \sum_a m^{(a)} (\frac{d}{dt} x^{(a)}(t)) \delta_{x^{(a)}(t)}, \ \ \ \ \ (17)


thus we now wish to exploit the conservation of momentum rather than conservation of mass. The measure {\mu_{\mathrm{momentum}}} is a vector-valued measure, or equivalently a vector {\mu_{\mathrm{momentum}} = (\mu_{\mathrm{momentum},1},\dots,\mu_{\mathrm{momentum},d})} of scalar measures

\displaystyle \mu_{\mathrm{momentum},i} := \sum_a m^{(a)} (\frac{d}{dt} x^{(a)}_i(t)) \delta_{x^{(a)}(t)}.

Instead of starting with the identity (11), we begin with the momentum counterpart

\displaystyle \int_{\bf R} \frac{d}{dt}(\int_{{\bf R}^d} \psi(t,x) \ d\mu_{\mathrm{momentum}}(t)(x)) dt = 0

which on applying (17) and evaluating the delta integrations becomes

\displaystyle \int_{\bf R} \frac{d}{dt} \sum_a m^{(a)} (\frac{d}{dt} x^{(a)}(t)) \psi(t,x^{(a)}(t))\ dt = 0.

Using the product rule and (13), the left-hand side is approximately

\displaystyle \int_{\bf R} \sum_a m^{(a)} (\frac{d^2}{dt^2} x^{(a)}(t)) \psi(t,x^{(a)}(t)) + m^{(a)} \frac{d}{dt} x^{(a)}(t) D_t \psi(t, x^{(a)}(t))\ dt. \ \ \ \ \ (18)


Applying (1) and (10), we conclude

\displaystyle \int_{\bf R} \sum_{a,b: b \neq a} F^{(ab)}(t) \psi(t,x^{(a)}(t)) + m^{(a)} (u D_t \psi)(t, x^{(a)}(t))\ dt \approx 0.

We can evaluate the second term using (9) to obtain

\displaystyle \int_{\bf R} \sum_{a,b: b \neq a} F^{(ab)}(t) \psi(t,x^{(a)}(t))\ dt + \int_{\bf R} \int_{{\bf R}^d} (u D_t \psi)(t, x) \rho(t,x)\ dx dt \approx 0.

What about the first term? We can use symmetry and Newton’s third law (2) to write

\displaystyle \sum_{a,b: b \neq a} F^{(ab)}(t) \psi(t,x^{(a)}(t))

\displaystyle = \frac{1}{2} \sum_{a,b: b \neq a} (F^{(ab)}(t) \psi(t,x^{(a)}(t)) + F^{(ba)}(t) \psi(t,x^{(b)}(t)))

\displaystyle = \frac{1}{2} \sum_{a,b: b \neq a} F^{(ab)}(t) ( \psi(t,x^{(a)}(t)) -\psi(t,x^{(b)}(t)) ).

Now we make the further physical assumption that the only significant interactions between particles {P^{(a)}, P^{(b)}} are short-range interactions, in which {x^{(a)}(t)} and {x^{(b)}(t)} are very close to each other. With this hypothesis, it is then plausible to make the Taylor approximation

\displaystyle \psi(t,x^{(a)}(t)) -\psi(t,x^{(b)}(t)) \approx (x^{(a)}(t)-x^{(b)}(t)) \cdot \nabla \psi( t, x^{(a)}(t) ).

We thus have

\displaystyle \frac{1}{2} \int_{\bf R} \sum_{a,b: b \neq a} F^{(ab)}(t) (x^{(a)}(t)-x^{(b)}(t)) \cdot \nabla \psi( t, x^{(a)}(t) ) \ dt

\displaystyle + \int_{\bf R} \int_{{\bf R}^d} (u D_t \psi)(t, x) \rho(t,x)\ dx dt \approx 0.

We write this in coordinates as

\displaystyle -\int_{\bf R} \sum_a \Sigma^{(a)}_{ij}(t) \partial_j \psi(t, x^{(a)}(t))\ dt \ \ \ \ \ (19)


\displaystyle + \int_{\bf R} \int_{{\bf R}^d} (u_i D_t \psi)(t,x) \rho(t,x)\ dx dt \approx 0

for {i=1,\dots,d}, where we use the Einstein convention that indices {i,j,k} are implicitly summed over {1,\dots,d} if they are repeated in an expression, and the stress {\Sigma^{(a)}(t)} on the particle {P^{(a)}} at time {t} is the rank {2}-tensor defined by the formula

\displaystyle \Sigma^{(a)}_{ij}(t) := \frac{1}{2} \sum_{b: b \neq a} F^{(ab)}_i(t) (x^{(b)}_j(t) - x^{(a)}_j(t)) \ \ \ \ \ (20)


where {F^{(ab)}_1,\dots,F^{(ab)}_d} denote the components of {F^{(ab)}}. Recall from the torque-free hypothesis (4) that {F^{(ab)}} is parallel to {x^{(b)}-x^{(a)}}, thus we could write {F^{(ab)}(t) = f^{(ab)}(t) (x^{(b)}(t)-x^{(a)}(t))} for some scalar {f^{(ab)}(t)}. Thus we have

\displaystyle \Sigma^{(a)}_{ij}(t) = \frac{1}{2} \sum_{b: b \neq a} f^{(ab)}(t) (x^{(b)}_i(t) - x^{(a)}_i(t)) (x^{(b)}_j(t) - x^{(a)}_j(t)).

In particular, we see that the torque-free hypothesis makes the stress tensor symmetric:

\displaystyle \Sigma^{(a)}_{ij} = \Sigma^{(a)}_{ji}.
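
This symmetry can be confirmed directly from the definition (20). The sketch below builds the stress of a random configuration of particles with central pair forces (all values arbitrary) and checks that the resulting matrix is symmetric up to rounding error:

```python
import random

random.seed(3)
d, N = 3, 5  # toy values
x = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(N)]

# symmetric interaction scalars f^(ab) = f^(ba) (arbitrary values)
f = {}
for a in range(N):
    for b in range(a + 1, N):
        f[a, b] = f[b, a] = random.uniform(-1.0, 1.0)

def stress(a):
    """Sigma^(a)_{ij} from (20), with F^(ab) = f^(ab) (x^(b) - x^(a))."""
    S = [[0.0] * d for _ in range(d)]
    for b in range(N):
        if b == a:
            continue
        for i in range(d):
            for j in range(d):
                S[i][j] += 0.5 * f[a, b] * (x[b][i] - x[a][i]) * (x[b][j] - x[a][j])
    return S

S = stress(0)
asym = max(abs(S[i][j] - S[j][i]) for i in range(d) for j in range(d))
```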

To proceed further, we make the assumption (similar to (9)) that the stress tensor {\Sigma^{(a)}} (or more precisely, the measure {\sum_a \Sigma^{(a)} \delta_{x^{(a)}(t)}}) is approximated in the bulk by a smooth tensor field {\sigma: {\bf R} \times {\bf R}^d \rightarrow {\bf R}^d \otimes {\bf R}^d} (with components {\sigma_{ij}: {\bf R} \times {\bf R}^d \rightarrow {\bf R}} for {i,j=1,\dots,d}), in the sense that

\displaystyle \int_{\bf R} \sum_a \Sigma^{(a)}_{ij}(t) \psi(t,x^{(a)}(t))\ dt \approx \int_{\bf R} \int_{{\bf R}^d} \psi(t,x) \sigma_{ij}(t,x)\ dx dt. \ \ \ \ \ (21)


The tensor {\sigma_{ij}} is known as the Cauchy stress tensor. Since {\Sigma^{(a)}} is symmetric in {i,j}, the right-hand side of (21) is also symmetric in {i,j}, which by the arbitrariness of {\psi} implies that the tensor {\sigma} is symmetric also:

\displaystyle \sigma_{ij} = \sigma_{ji}.

This is also known as Cauchy’s second law of motion.

Inserting the bulk approximation (21) into (19), we arrive at

\displaystyle \int_{\bf R} \int_{{\bf R}^d} (-\sigma_{ij} \partial_j \psi)(t, x)\ dt dx + \int_{\bf R} \int_{{\bf R}^d} (u_i D_t \psi)(t,x) \rho(t,x)\ dx dt \approx 0

for any test function {\psi} and {i=1,\dots,d}. Taking limits as {N \rightarrow \infty} we obtain the exact equation

\displaystyle \int_{\bf R} \int_{{\bf R}^d} (-\sigma_{ij} \partial_j \psi)(t, x)\ dt dx + \int_{\bf R} \int_{{\bf R}^d} (u_i D_t \psi)(t,x) \rho(t,x)\ dx dt = 0

and then integrating by parts we have

\displaystyle \int_{\bf R} \int_{{\bf R}^d} \psi(t,x) (\partial_j \sigma_{ij} + D^*_t (\rho u_i))(t,x)\ dx dt = 0;

as the test function {\psi} is arbitrary, we conclude the Cauchy momentum equation

\displaystyle \partial_j \sigma_{ij} + D^*_t (\rho u_i) = 0.

From the Leibniz rule (in an adjoint form) we see that

\displaystyle D^*_t (\rho u_i) = (D^*_t \rho) u_i - \rho D_t u_i;

using (14) we can thus also write the Cauchy momentum equation in the more conventional form

\displaystyle D_t u_i = \frac{1}{\rho} \partial_j \sigma_{ij}. \ \ \ \ \ (22)


(This is a dynamical version of Cauchy’s first law of motion.)

To summarise so far, the unknown density field {\rho} and velocity field {u} obey two equations of motion: the continuity equation (15) (or (14)) and the momentum equation (22). As the former is a scalar equation and the latter is a vector equation, this is {d+1} equations for {d+1} unknowns, which looks good – so long as the stress tensor {\sigma} is known. However, the stress tensor is not given to us in advance, and so further physical assumptions on the underlying fluid are needed to derive additional equations to yield a more complete set of equations of motion.

One of the simplest such assumptions is isotropy – that, in the vicinity of a given particle {P^{(a)}} at a given point in time, the distribution of the nearby particles {P^{(b)}} (and of the forces {F^{(ab)}}) is effectively rotationally symmetric, in the sense that rotation of the fluid around that particle does not significantly affect the net stresses acting on the particle. To give more mathematical meaning to this assumption, let us fix {a} and {t}, and let us set {x^{(a)}(t)} to be the spatial origin {0} for simplicity. In particular the stress tensor {\Sigma^{(a)}_{ij}(t)} now simplifies a little to

\displaystyle \Sigma^{(a)}_{ij}(t) = \frac{1}{2} \sum_{b: b \neq a} f^{(ab)}(t) x^{(b)}_i(t) x^{(b)}_j(t);

viewing this tensor as a {d \times d} symmetric matrix, we can also write

\displaystyle \Sigma^{(a)}(t) = \frac{1}{2} \sum_{b: b \neq a} f^{(ab)}(t) x^{(b)}(t) x^{(b)}(t)^T

where we now think of the vector {x^{(b)}(t)} as a {d}-dimensional column vector, and {x^T} denotes the transpose of {x}.

Imagine that we rotate the fluid around this spatial origin using some rotation matrix {U \in SO(d)}, thus replacing {x^{(b)}(t)} with {U x^{(b)}(t)} (and hence {x^{(b)}(t)^T} is replaced with {x^{(b)}(t)^T U^T}). If we assume that interaction forces are rotationally symmetric, the interaction scalars {f^{(ab)}} should not be affected by this rotation. As such, {\Sigma^{(a)}(t)} would be replaced with {U \Sigma^{(a)}(t) U^T}. If we assume isotropy, though, this rotated fluid should generate the same stress as the original fluid, thus we have

\displaystyle \Sigma^{(a)}(t) =U \Sigma^{(a)}(t) U^T

for all rotation matrices {U}, that is to say that {\Sigma^{(a)}} commutes with all rotations. This implies that all eigenspaces of {\Sigma^{(a)}} are rotation-invariant, but the only rotation-invariant subspaces of {{\bf R}^d} are {\{0\}} and {{\bf R}^d}. Thus the spectral decomposition of the symmetric matrix {\Sigma^{(a)}} only involves a single eigenspace {{\bf R}^d}, or equivalently {\Sigma^{(a)}} is a multiple of the identity. In coordinates, we have

\displaystyle \Sigma^{(a)}_{ij}(t) = - p^{(a)}(t) \delta_{ij}

for some scalar {p^{(a)}} (known as the pressure exerted on the particle {P^{(a)}}), where {\delta_{ij}} denotes the Kronecker delta (the negative sign here is for compatibility with other physical definitions of pressure). Passing from the individual stresses {\Sigma^{(a)}} to the stress field {\sigma}, we see that {\sigma_{ij}} is also rotationally invariant and thus is also a multiple of the identity, thus

\displaystyle \sigma_{ij}(t,x) = -p(t,x) \delta_{ij} \ \ \ \ \ (23)


for some field {p: {\bf R} \times {\bf R}^d \rightarrow {\bf R}}, which we call the pressure field, and which we assume to be smooth. The equations (15), (22) now become the Euler equations

\displaystyle \partial_t u + (u \cdot \nabla) u = - \frac{1}{\rho} \nabla p \ \ \ \ \ (24)


\displaystyle \partial_t \rho + \nabla \cdot (\rho u) = 0. \ \ \ \ \ (25)


This is still an underdetermined system, being {d+1} equations for {d+2} unknowns (two scalar fields {\rho,p} and one vector field {u}). But if we assume incompressibility (normalising {\rho=1} for simplicity), we obtain the incompressible Euler equations

\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p \ \ \ \ \ (26)


\displaystyle \nabla \cdot u = 0.

Without incompressibility, one can still reach a determined system of equations if one postulates a relationship (known as an equation of state) between the pressure {p} and the density {\rho} (in some physical situations one also needs to introduce further thermodynamic variables, such as temperature or entropy, which also influence this relationship and obey their own equation of motion). Alternatively, one can proceed using an analysis of the energy conservation law, similar to how (15) arose from conservation of mass and (22) arose from conservation of momentum, though at the end of the day one would still need an equation of state connecting energy density to other thermodynamic variables. For details, see Exercise 4 below.

Now we consider relaxing the assumption of isotropy in the stress. Many fluids, in addition to experiencing isotropic stress coming from a scalar pressure field, also experience additional shear stress associated to strain in the fluid – distortion in the shape of the fluid arising from fluctuations in the velocity field. One can thus postulate a generalisation to (23) of the form

\displaystyle \sigma_{ij}(t,x) = -p(t,x) \delta_{ij} + \tau_{ij}( u(t,x), \nabla u(t,x) ) \ \ \ \ \ (27)


where {\tau_{ij}: {\bf R}^d \times {\bf R}^{d^2} \rightarrow {\bf R}} is some function of the velocity {u(t,x) \in {\bf R}^d} at the point {(t,x)} in spacetime, as well as the first derivatives {\nabla u = (\partial_i u_j)_{1 \leq i,j \leq d} \in {\bf R}^{d^2}}, that arises from changes in shape. (Here, we assume that the response to stress is autonomous – it does not depend directly on time, location, or other statistics of the fluid, such as pressure, except insofar as those variables are related to the quantities {u(t,x)} and {\nabla u(t,x)}. One can of course consider more complex models in which there is a dependence on such quantities or on higher derivatives {\nabla^2 u, \nabla^3 u}, etc. of the velocity, but we will not do so here.)

It is a general principle in physics that functional relationships between physical quantities, such as the one in (27), can be very heavily constrained by requiring the relationship to be invariant with respect to various physically natural symmetries. This is certainly the case for (27). First of all, we can impose Galilean invariance: if one changes to a different inertial frame of reference, thus adding a constant vector {u_0} to the velocity field {u(t,x)} (and not affecting the gradient {\nabla u} at all), this should not actually introduce any new stresses on the fluid. This leads to the postulate

\displaystyle \tau_{ij}( u + u_0, \nabla u ) = \tau_{ij}( u, \nabla u )

for any {u_0}, and hence {\tau_{ij}} should not actually depend on the velocity {u(t,x)} and should only depend on the first derivative. Thus we now write {\tau_{ij}( \nabla u(t,x) )} for {\tau_{ij}(u(t,x), \nabla u(t,x))} (thus {\tau_{ij}} is now a function from {{\bf R}^{d^2}} to {{\bf R}}).

If {u=0} then there should be no additional shear stress, so we should have {\tau_{ij}(0) = 0}. We now make a key assumption that the fluid is a Newtonian fluid, in that the linear term in the Taylor expansion of {\tau} dominates, or in other words we assume that {\tau_{ij}} is a linear function of {\nabla u}. (One can certainly study the mechanics of non-Newtonian fluids as well, in which {\tau} depends nonlinearly on {\nabla u}, or even on past values of {\nabla u}, but these are no longer governed by the Navier-Stokes equations and will not be considered further here.) One can also think of {\tau} as a linear map from the space of {d \times d} matrices (which is the space where {\nabla u(t,x)} takes values) to the space of {d \times d} matrices. In coefficients, this means we are postulating a relationship of the form

\displaystyle \tau_{ij} = \mu_{ijkl} \partial_k u_l(t,x)

for some constants {\mu_{ijkl}} (recall we are using the Einstein summation conventions). This looks like a lot of unspecified constants, but again we can use physical principles to impose significant constraints. Firstly, because stress is symmetric in {i} and {j}, the coefficients {\mu_{ijkl}} must also be symmetric in {i} and {j}: {\mu_{ijkl} = \mu_{jikl}}. Next, let us for simplicity set {x} to be the spatial origin {x=0}, and consider a rotating velocity field of the form

\displaystyle u(t,x) = A x

for some constant-coefficient anti-symmetric matrix {A}, or in coordinates

\displaystyle u_i(t,x) = A_{ij} x_j.

The derivative field {\partial_i u_j} is then just the anti-symmetric matrix {A_{ji} = -A_{ij}}. This corresponds to fluids moving according to the rotating trajectories {x(t) = \exp(tA) x(0)}, where {\exp} denotes the matrix exponential. (For instance, in two dimensions, the velocity field {u(t, x_1,x_2) = (-x_2, x_1)} gives rise to trajectories {x_1(t) = \cos(t) x_1(0) - \sin(t) x_2(0)}, {x_2(t) = \sin(t) x_1(0) + \cos(t) x_2(0)} corresponding to counter-clockwise rotation around the origin.)

Exercise 2 When {A} is anti-symmetric, show that the matrix {\exp( tA )} is orthogonal for all {t}, thus

\displaystyle \exp( tA ) \exp( tA )^T = \exp(tA)^T \exp(tA) = 1

for all {t \in {\bf R}}, where {1} denotes the identity matrix. (Hint: differentiate the expressions appearing in the above equation with respect to time.) Also show that {\exp(tA)} is a rotation matrix, that is to say an orthogonal matrix of determinant {+1}.
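
For readers who want a numerical illustration of Exercise 2 (not a substitute for the proof), the following sketch computes {\exp(A)} for a random anti-symmetric {3 \times 3} matrix by a truncated power series and checks orthogonality and unit determinant:

```python
import random

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=30):
    """exp(A) by a truncated power series (adequate for small matrices)."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[sum(term[i][l] * A[l][j] for l in range(n)) / k
                 for j in range(n)] for i in range(n)]  # now A^k / k!
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

# a random anti-symmetric 3 x 3 matrix (entries arbitrary)
random.seed(4)
w = [random.uniform(-1.0, 1.0) for _ in range(3)]
A = [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]

Q = mat_exp(A)
Qt = [list(col) for col in zip(*Q)]
QQt = mat_mul(Q, Qt)
ortho_err = max(abs(QQt[i][j] - (1.0 if i == j else 0.0))
                for i in range(3) for j in range(3))
det = (Q[0][0] * (Q[1][1] * Q[2][2] - Q[1][2] * Q[2][1])
       - Q[0][1] * (Q[1][0] * Q[2][2] - Q[1][2] * Q[2][0])
       + Q[0][2] * (Q[1][0] * Q[2][1] - Q[1][1] * Q[2][0]))
```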

As {\exp(tA)} is orthogonal, it describes a rigid motion. Thus this rotating motion does not change the shape of the fluid, and so should not give rise to any shear stress. That is to say, the linear map {\tau} should vanish when applied to an anti-symmetric matrix, or in coordinates {\mu_{ijkl} = \mu_{ijlk}}. Another way of saying this is that {\tau(\nabla u)} only depends on {\nabla u} through its symmetric component {\frac{1}{2} (\nabla u + (\nabla u)^T)}, known as the rate-of-strain tensor (or the deformation tensor). (The anti-symmetric part {\frac{1}{2} (\nabla u - (\nabla u)^T)} does not cause any strain, but instead measures the infinitesimal rotation of the fluid; up to trivial factors such as {\frac{1}{2}}, it is essentially the vorticity of the fluid, which will be an important field to study in subsequent notes.)

To constrain the behaviour of {\tau} further, we introduce a hypothesis of rotational (and reflectional) symmetry. If one rotates the fluid by a rotation matrix {U \in SO(d)} around the origin, then if the original fluid had a velocity of {u(t,x)} at {(t,x)}, the new fluid should have a velocity of {U u(t,x)} at {(t,Ux)}, thus the new velocity field {\tilde u} is given by the formula

\displaystyle \tilde u(t,x) = U u(t, U^T x)

and the derivative {\nabla \tilde u} of this velocity field at the origin {0} is then related to the original derivative {\nabla u} by the formula

\displaystyle \nabla \tilde u(t,0) = U \nabla u(t, 0) U^T.

Meanwhile, as discussed in the analysis of the isotropic case, the new stress {\tilde \tau} at the origin {0} is related to the original stress {\tau} by the same relation:

\displaystyle \tilde \tau = U \tau U^T.

This means that the linear map {\tau} is rotationally equivariant in the sense that

\displaystyle \tau( U A U^T ) = U \tau(A) U^T \ \ \ \ \ (28)


for any matrix {A}. Actually the same argument also applies for reflections, so one could also take {U} in the orthogonal group {O(d)} rather than the special orthogonal group.

This severely constrains the possible behaviour of {\tau}. First consider applying {\tau} to the rank {1} matrix {e_1 e_1^T}, where {e_1} is the first basis (column) vector of {{\bf R}^d}. The equivariance property (28) then implies that {\tau(e_1 e_1^T)} is invariant with respect to any rotation or reflection of the remaining coordinates {x_2,\dots,x_d}. As in the isotropic analysis, this implies that the lower right {(d-1) \times (d-1)} minor of {\tau(e_1 e_1^T)} is a multiple of the identity; when {d \geq 3} it also implies that the upper right entries {\tau_{1i}(e_1 e_1^T)} and lower left entries {\tau_{i1}(e_1 e_1^T)} for {i \neq 1} vanish (one can also obtain this by applying (28) with {U} the reflection in the {x_1} variable). Thus we have

\displaystyle \tau_{ij}(e_1 e_1^T) = 2 \mu \delta_{1i} \delta_{1j} + \lambda \delta_{ij}

for some constant scalars {\mu,\lambda} (known as the dynamic viscosity and second viscosity respectively); in matrix form we have

\displaystyle \tau(e_1 e_1^T) = 2 \mu e_1 e_1^T + \lambda I

where {I} is the identity matrix. Applying equivariance, we conclude that

\displaystyle \tau(v v^T) = 2 \mu v v^T + \lambda I

for any unit vector {v}; applying the spectral theorem this implies that

\displaystyle \tau(A) = 2 \mu A + \lambda \mathrm{tr}(A) I

for any symmetric matrix {A}. Since {\tau(A)} is already known to vanish for anti-symmetric matrices, upon decomposing a general matrix {A} into the symmetric part {\frac{1}{2} (A + A^T)} and anti-symmetric part {\frac{1}{2} (A - A^T)} (with the latter having trace zero) we conclude that

\displaystyle \tau(A) = \mu (A + A^T) + \lambda \mathrm{tr}(A) I

for an arbitrary matrix {A}. In particular

\displaystyle \tau(\nabla u) = \mu ( \nabla u + (\nabla u)^T ) + \lambda (\nabla \cdot u) I.

In the incompressible case {\nabla \cdot u = 0}, the second term vanishes, and this equation simply says that the shear stress is proportional to the (rate of) strain.
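As a quick numerical sanity check (not part of the derivation), one can verify that the map {A \mapsto \mu(A + A^T) + \lambda \mathrm{tr}(A) I} indeed satisfies the equivariance property (28) for a randomly drawn rotation {U \in SO(d)}, and that it vanishes on anti-symmetric matrices. The values of {\mu}, {\lambda} below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
mu, lam = 1.3, 0.7  # arbitrary illustrative viscosity coefficients

def tau(A):
    # the stress law derived above: tau(A) = mu (A + A^T) + lambda tr(A) I
    return mu * (A + A.T) + lam * np.trace(A) * np.eye(d)

# random rotation U in SO(d): orthogonalize a Gaussian matrix, fix signs,
# then flip one column if needed so that det(U) = +1
Q, R = np.linalg.qr(rng.standard_normal((d, d)))
U = Q @ np.diag(np.sign(np.diag(R)))
if np.linalg.det(U) < 0:
    U[:, 0] = -U[:, 0]

A = rng.standard_normal((d, d))

# equivariance (28): tau(U A U^T) = U tau(A) U^T
assert np.allclose(tau(U @ A @ U.T), U @ tau(A) @ U.T)

# tau vanishes on the anti-symmetric part of A
assert np.allclose(tau(A - A.T), np.zeros((d, d)))
```

The equivariance holds exactly here because {U (A + A^T) U^T = UAU^T + (UAU^T)^T}, {\mathrm{tr}(UAU^T) = \mathrm{tr}(A)}, and {U I U^T = I}; the check also goes through for reflections, i.e. for any {U \in O(d)}.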

Inserting the above law back into (27) and then (15), (22), we obtain the Navier-Stokes equations

\displaystyle \partial_t u + (u \cdot \nabla) u = - \frac{1}{\rho} \nabla p + \frac{\mu}{\rho} (\Delta u + \nabla (\nabla \cdot u) ) + \frac{\lambda}{\rho} \nabla(\nabla \cdot u)

\displaystyle \partial_t \rho + \nabla \cdot (\rho u) = 0,

where {\Delta u = \partial_i \partial_i u} is the spatial Laplacian. In the incompressible case, setting {\nu := \frac{\mu}{\rho}} (this ratio is known as the kinematic viscosity), and also normalising {\rho=1} for simplicity, this simplifies to the incompressible Navier-Stokes equations

\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p + \nu \Delta u

\displaystyle \nabla \cdot u = 0.

Of course, the incompressible Euler equations arise as the special case when the viscosity {\nu} is set to zero. For physical fluids, {\nu} is positive, though it can be so small that the Euler equations serve as a good approximation. Negative values of {\nu} are mathematically possible, but physically unrealistic for several reasons (for instance, the total energy of the fluid would increase over time, rather than dissipate over time), and the equations also become mathematically quite ill-behaved in this case (as they carry essentially all of the pathologies of the backwards heat equation).
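The backwards-heat pathology can be seen in a one-mode caricature (illustrative only; the nonlinear and pressure terms are ignored): under the linear heat part of the equation, a Fourier mode with wavenumber {k} is multiplied by {e^{-\nu |k|^2 t}}, which decays for {\nu > 0} but grows, faster and faster in {k}, for {\nu < 0}.

```python
import math

def mode_amplitude(nu, k, t):
    # amplification factor of the Fourier mode with wavenumber k at time t
    # under the heat part of the equation: exp(-nu * |k|^2 * t)
    return math.exp(-nu * k**2 * t)

# positive viscosity damps high frequencies
assert mode_amplitude(0.01, 10.0, 1.0) < 1.0
# negative viscosity amplifies them
assert mode_amplitude(-0.01, 10.0, 1.0) > 1.0
# and the amplification is unbounded in the frequency k
assert mode_amplitude(-0.01, 100.0, 1.0) > 1e40
```

Since the growth factor at fixed {t > 0} is unbounded as {|k| \rightarrow \infty}, no amount of smoothness of the initial data at a fixed regularity scale controls the solution, which is exactly the ill-posedness of the backwards heat equation.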

Exercise 3 In a constant gravity field oriented in the direction {-e_d}, each particle {P^{(a)}} will experience an external gravitational force {- m^{(a)} g e_d}, where {g>0} is a fixed constant. Argue that the incompressible Euler equations in the presence of such a gravitational field should be modified to

\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p - g e_d

\displaystyle \nabla \cdot u = 0

and the incompressible Navier-Stokes equations should similarly be modified to

\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p + \nu \Delta u - g e_d

\displaystyle \nabla \cdot u = 0.

Exercise 4

Remark 5 In the literature, the relationship between the functional relationship (33) and the equation of state (34) is usually derived instead using the laws of thermodynamics. However, as the above exercise demonstrates, it is also possible to recover this relationship from first principles. In the case of an (isoentropic) ideal gas, the laws of thermodynamics can be used to establish an equation of state of the form {p = C \rho^\gamma} for some constants {C,\gamma} with {\gamma \neq 1}, as well as the corresponding functional relationship {e = \frac{C}{\gamma - 1} \rho^{\gamma-1} = \frac{p}{(\gamma-1) \rho}}, so that the internal energy density is {\frac{p}{\gamma-1}}. This is of course consistent with (33) and (34), after choosing {F} appropriately.

Exercise 6 Suppose one has a (compressible or incompressible) fluid obeying the velocity approximation (10), the stress approximation (21), the isotropy condition (23), and the torque free condition (4). Assume also that all interactions are short range. Derive the angular momentum equation

\displaystyle D_t ( x_i u_j - x_j u_i ) = \frac{1}{\rho} ( \partial_i (x_j p) - \partial_j (x_i p ) )

for {i,j= 1,\dots,d} in two different ways:

  • (a) From a heuristic analysis of the angular momentum distribution

    \displaystyle \mu_{\mathrm{ang}, ij} := \sum_a \frac{1}{2} m^{(a)} ( x^{(a)}_i(t) \frac{d}{dt} x^{(a)}_j(t) - x^{(a)}_j(t) \frac{d}{dt} x^{(a)}_i(t)) \delta_{x^{(a)}(t)}

    analogous to how the mass, momentum, and energy distributions were analysed previously; and

  • (b) Directly from the system (24), (25).

  • (c) Conclude in particular that the total angular momentum

    \displaystyle \int_{{\bf R}^d} (x_i u_j(t,x) - x_j u_i(t,x)) \rho(t,x)\ dx

    is formally conserved in time.
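One small algebraic step implicit in the angular momentum equation of Exercise 6 is that {\partial_i (x_j p) - \partial_j (x_i p) = x_j \partial_i p - x_i \partial_j p}, since the {\delta_{ij} p} terms produced by the product rule cancel. This can be checked symbolically; the snippet below is just that algebraic check, with {p} an arbitrary smooth function of two coordinates.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
p = sp.Function('p')(x1, x2)  # arbitrary smooth pressure field

# product rule: d_1(x_2 p) - d_2(x_1 p) versus x_2 d_1 p - x_1 d_2 p
lhs = sp.diff(x2 * p, x1) - sp.diff(x1 * p, x2)
rhs = x2 * sp.diff(p, x1) - x1 * sp.diff(p, x2)

assert sp.simplify(lhs - rhs) == 0  # the delta terms cancel identically
```

Combined with {D_t x_i = u_i} and the material form of the (incompressible Euler) momentum equation {D_t u = -\frac{1}{\rho} \nabla p}, this identity converts {x_i D_t u_j - x_j D_t u_i} into the right-hand side of the angular momentum equation.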

Remark 7 In this set of notes we made the rather strong assumption (10) that the velocities of particles could be approximated by a smooth function {u(t,x)} of their time and position. In practice, most fluids will violate this hypothesis due to thermal fluctuations in the velocity. However, one can still proceed with a similar derivation assuming that the velocities behave on the average like a smooth function {u(t,x)}, in the sense that

\displaystyle \int_{\bf R} \sum_a m^{(a)} (\frac{d}{dt} x^{(a)}(t)) \psi(t,x^{(a)}(t))\ dt \ \ \ \ \ (35)


\displaystyle \approx \int_{\bf R} \int_{{\bf R}^d} \psi(t,x) u(t,x) \rho(t,x)\ dx dt

for any test function {\psi}. The approximation (13) now has to be replaced by the more general

\displaystyle \frac{d}{dt} \psi(t,x^{(a)}(t)) \approx (D_t \psi)(t,x^{(a)}(t)) + w^{(a)}(t) \cdot \nabla \psi(t, x^{(a)}(t))


where

\displaystyle w^{(a)}(t) := \frac{d}{dt} x^{(a)}(t) - u(t, x^{(a)}(t))

is the deviation of the particle velocity from its mean. The second term in (18) now needs to be replaced by the more complicated expression

\displaystyle \int_{\bf R} \sum_a m^{(a)} (u(t,x^{(a)}(t)) + w^{(a)}(t)) \times \ \ \ \ \ (36)


\displaystyle \times (D_t \psi(t, x^{(a)}(t)) + w^{(a)}(t) \cdot \nabla \psi(t, x^{(a)}(t)))\ dt.

From (35), (9) one has

\displaystyle \int_{\bf R} \sum_a m^{(a)} w^{(a)}(t) \psi(t,x^{(a)}(t))\ dt \approx 0

for any test function {\psi}. This allows us to heuristically drop the cross terms from (36) involving a single factor of {w^{(a)}}, and simplify this expression (up to negligible errors) as

\displaystyle \int_{\bf R} \sum_a m^{(a)} u(t, x^{(a)}(t)) D_t \psi(t, x^{(a)}(t))\ dt

\displaystyle + \int_{\bf R} \sum_a m^{(a)} w^{(a)}(t) (w^{(a)}(t) \cdot \nabla) \psi(t, x^{(a)}(t))\ dt.

Repeating the analysis after (18), one eventually arrives again at (19), except that one has to add an additional term

\displaystyle - w^{(a)}_i(t) w^{(a)}_j(t)

to the stress tensor {\Sigma^{(a)}_{ij}(t)} of a single particle {P^{(a)}}. However, this term is still symmetric, and one can still continue most of the heuristic analysis in this post after suitable adjustments to the various physical hypotheses (for instance, assuming some form of the molecular chaos hypothesis to be able to neglect some correlation terms between {w^{(a)}} and other quantities). We leave the details to the interested reader.
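To see concretely how averaging fluctuations produces an extra stress of this form, here is a toy Monte Carlo sketch (purely hypothetical data, not from the text): particles share a common mean velocity {u} and carry independent isotropic fluctuations {w^{(a)}} of variance {\sigma^2}, and the averaged outer product of the deviations recovers the {\langle w_i w_j \rangle} contribution. In turbulence modelling this averaged quantity is known as the Reynolds stress.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical toy data: mean flow u plus isotropic thermal fluctuations
d, n = 3, 200_000
u_mean = np.array([1.0, 0.0, 0.0])
sigma = 0.5
velocities = u_mean + sigma * rng.standard_normal((n, d))

# deviations w^{(a)} from the empirical mean flow, and the averaged
# fluctuation stress <w_i w_j> (the analogue of the extra term above)
w_est = velocities - velocities.mean(axis=0)
reynolds = (w_est[:, :, None] * w_est[:, None, :]).mean(axis=0)

# for isotropic fluctuations <w_i w_j> is close to sigma^2 * identity,
# so the extra stress acts like an additional isotropic pressure
assert np.allclose(reynolds, sigma**2 * np.eye(d), atol=0.01)
```

The resulting matrix is symmetric by construction, matching the remark that the extra term does not disturb the symmetry of the stress tensor; anisotropic or correlated fluctuations would instead produce off-diagonal entries, which is where the molecular chaos hypothesis enters.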

Matt von HippelThe Amplitudes Assembly Line

In the amplitudes field, we calculate probabilities for particles to interact.

We’re trying to improve on the old-school way of doing this, a kind of standard assembly line. First, you define your theory, writing down something called a Lagrangian. Then you start drawing Feynman diagrams, starting with the simplest “tree” diagrams and moving on to more complicated “loops”. Using rules derived from your Lagrangian, you translate these Feynman diagrams into a set of integrals. Do the integrals, and finally you have your answer.

Our field is a big tent, with many different approaches. Despite that, a kind of standard picture has emerged. It’s not the best we can do, and it’s certainly not what everyone is doing. But it’s in the back of our minds, a default to compare against and improve on. It’s the amplitudes assembly line: an “industrial” process that takes raw assumptions and builds particle physics probabilities.


  1. Start with some simple assumptions about your particles (what mass do they have? what is their spin?) and your theory (minimally, it should obey special relativity). Using that, find the simplest “trees”, involving only three particles: one particle splitting into two, or two particles merging into one.
  2. With the three-particle trees, you can now build up trees with any number of particles, using a technique called BCFW (named after its inventors, Ruth Britto, Freddy Cachazo, Bo Feng, and Edward Witten).
  3. Now that you’ve got trees with any number of particles, it’s time to get loops! As it turns out, you can stitch together your trees into loops, using a technique called generalized unitarity. To do this, you have to know what kinds of integrals are allowed to show up in your result, and a fair amount of effort in the field goes into figuring out a better “basis” of integrals.
  4. (Optional) Generalized unitarity will tell you which integrals you need to do, but those integrals may be related to each other. By understanding where these relations come from, you can reduce to a basis of fewer “master” integrals. You can also try to aim for integrals with particular special properties; quite a lot of effort goes into improving this basis as well. The end goal is to make the final step as easy as possible:
  5. Do the integrals! If you just want to get a number out, you can use numerical methods. Otherwise, there’s a wide variety of choices available. Methods that use differential equations are probably the most popular right now, but I’m a fan of other options.

Some people work to improve one step in this process, making it as efficient as possible. Others skip one step, or all of them, replacing them with deeper ideas. Either way, the amplitudes assembly line is the background: our current industrial machine, churning out predictions.

Doug NatelsonShort items

A few interesting things I've found this past week:

  • The connection between particle spin and quantum statistics (fermions = half-integer spin, bosons = integer spin) is subtle, as I've mentioned before.  This week I happened upon a neat set of slides (pdf) by Jonathan Bain on this topic.  He looks at how we should think about why a pretty restrictive result from non-interacting relativistic quantum field theories has such profound, general implications.  He has a book on this, too.  
  • There is a new book about the impact of condensed matter physics on the world and why it's the comparatively unsung branch of the discipline.   I have a copy on the way; once I read it I'll post a review.
  • It's also worth reading about why mathematics as a discipline is viewed the way it is culturally.
  • This is a really well-written article about turbulence, and why it's hard even though it's "just \(\mathbf{F} = m\mathbf{a}\)" for little blobs of fluid.
  • Humanoid robots are getting more and more impressive.  I would really like to know the power consumption of one of those, though, given that the old ones used to have either big external power cables or on-board diesel engines.  The robot apocalypse is less scary if they have to recharge every ten minutes of operating time.
  • I always wondered if fidget spinners were good for something.

October 11, 2018

BackreactionYes, women in science still have a disadvantage.

Women today still face obstacles men don’t encounter and often don’t notice. I see this every day at my front door, in physics, where women are still underrepresented. Among the sciences, it’s physics where the gender-balance is most skewed. While women are catching up on PhDs, with the ratio now at roughly 20% (US data), women are more likely to leave higher education for good. Among faculty,

Mark Chu-CarrollMental Health Day: A Taste of Living with Social Anxiety

It’s world mental health day. I’ve been meaning to do some more writing about social anxiety, and this seems like an appropriate day for that.

This isn’t easy to write about. A big part of social anxiety, to me, is that I’m afraid of how people will react to me. So talking about the things that are wrong with me is hard, and not exactly a lot of fun. But I try to do it, because I think it’s important. It’s useful for me to confront this; it’s important for other people with social anxiety to see and hear that they’re not alone; and it’s important to fight the general stigma against mental illness. I still struggle with my social anxiety – but I’m also happily married, with a great job and a successful career: I’m a walking demonstration of the fact that you can have mental illnesses like depression and social anxiety disorder, and still have a good, happy, full life.

In the past, I’ve tried to explain what it’s like to live with social anxiety. I’m going to try to expand on that a bit, and walk you through a particularly hard example of it that I’m trying to deal with right now.

What I’ve said before is that SA, for me, is a deeply seated belief that there’s something wrong with me, and whenever I’m socially interacting with people, I’m afraid that they’re going to realize what a freak I am.

That’s kind-of true, and it’s also kind-of not. This is difficult to put into words, because the actual feeling is almost a physical reaction, not a thought, so it’s not really linguistic. Yes, I am constantly on edge when I’m interacting socially. I am constantly afraid in social situations. The hard part to explain is that I don’t even know what I’m afraid of. There’s no specific bad outcome that I’m imagining. I can often relate the fear back to things that I’ve experienced in the past – but I don’t experience the fear and anxiety now as being fear/anxiety that those specific things, or things like them, will re-occur. I’m just afraid.

Here’s where I