Planet Musings

December 18, 2018

David Hogg: actions, planet spectroscopy, dust

Discussions continued at Flatiron about Galactic dynamics and actions. We laid out uses for actions and then discussed more results from Beane (Flatiron) on the inconsistency of actions when you use the wrong coordinate system or the wrong potential.

The Stars meeting featured various interesting discussions. But during a discussion led by Kreidberg (Harvard) about temperature-mapping hot rocky planets, I had an idea: We could use the strong absorption lines in stellar spectra to increase the planet-to-star brightness ratio. If we have full-phase coverage with high-resolution spectroscopy, we can look for the hot planet to “fill in” some of the absorption lines near full phase, and the amount it fills in lines at different wavelengths would tell us the temperature (or low-resolution spectrum) of the planet! I want to do this with our HARPS data and our wobble pipeline!
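A back-of-the-envelope sketch of why the absorption lines help: in a line core of fractional depth D, the star contributes only (1 - D) of its continuum flux, so a planet whose own spectrum has no line at that (Doppler-shifted) wavelength is brighter relative to the star by a factor 1 / (1 - D). The toy model below assumes blackbody spectra for both bodies and ignores phase, Doppler shifts, and the planet's own lines; the temperatures, the area ratio, and the function names are all hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
C = 2.99792458e8     # speed of light [m / s]
K_B = 1.380649e-23   # Boltzmann constant [J / K]


def planck(wavelength_m, temperature_k):
    """Blackbody spectral radiance B_lambda(T) in SI units."""
    x = H * C / (wavelength_m * K_B * temperature_k)
    return 2.0 * H * C ** 2 / wavelength_m ** 5 / math.expm1(x)


def planet_to_star_ratio(wavelength_m, t_star, t_planet, area_ratio, line_depth=0.0):
    """Planet flux divided by the stellar flux remaining at this wavelength.

    Toy assumptions: both bodies radiate as blackbodies, area_ratio = (R_planet / R_star)^2,
    the stellar line removes a fraction line_depth of the continuum, and the planet
    has no line at the same (Doppler-shifted) wavelength.
    """
    star = planck(wavelength_m, t_star) * (1.0 - line_depth)
    planet = area_ratio * planck(wavelength_m, t_planet)
    return planet / star


# Hypothetical numbers: a Sun-like star, a 2500 K rocky planet, (R_p / R_*)^2 = 2e-4.
T_STAR, T_PLANET, AREA_RATIO = 5800.0, 2500.0, 2e-4
for wavelength in (5e-7, 1e-6):
    continuum = planet_to_star_ratio(wavelength, T_STAR, T_PLANET, AREA_RATIO)
    core = planet_to_star_ratio(wavelength, T_STAR, T_PLANET, AREA_RATIO, line_depth=0.9)
    print(f"{wavelength * 1e9:.0f} nm: continuum {continuum:.2e}, line core {core:.2e}")
```

In this toy model the enhancement factor depends only on the line depth, so comparing the fill-in of lines at different wavelengths traces the wavelength dependence of the planet-to-star flux ratio, and hence the planet temperature.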

In the afternoon, Boris Leistedt (NYU) and I made a plan with David Blei (Columbia) and Andrew Miller (Columbia) to build our 3-d dust model out of dust measurements. There are many problems to solve! But we are starting by assuming that Leistedt's data-driven dust measurements are correct and have Gaussian noise, that the stellar positions are well known, and that the dust field can be represented by a Gaussian process. In terms of challenges, we are starting with the scaling problem: How do we make things run on millions or hundreds of millions of stars at a time? One point of dispute was which line-of-sight integral of the dust corresponds to the extinction.

David Hogg: are jets beamed? correlation function slowness

Today Kate Alexander (Harvard) gave the Astro Seminar. She talked about the observational properties of jets across wavelengths, but especially in the radio, and about unresolved jets, which are understood through their spectral energy distributions. One point that came up is that there still seems to be a beaming puzzle: The models of the observations imply high beaming factors, but off-axis examples are very hard to find. So is the model ruled out? MacFadyen (NYU) implied yes, even though he is one of the principal authors of the theories! I think this is a super-important area for multi-messenger and time-domain astrophysics.

Before lunch, Kate Storey-Fisher (NYU) and I had an absolutely great discussion with Roman Scoccimarro (NYU) about our correlation function estimator. He started off very skeptical and ended up a huge fan, which was fun to see, because I am pretty stoked about it! But then he said something off-topic but super-interesting: He has a standard experience on huge projects of the following form: While the correlation-function team is waiting for the data center to compute the correlation-function estimator (which involves an enormous pair-count operation in data and (much more importantly) random catalogs), he computes the power spectrum for the same data sample on his laptop! And yet the correlation function and the power spectrum are (in principle) the same information! What gives?

The answer—which I have to say I haven't fully figured out yet—is in part that the standard power-spectrum estimation doesn't consider explicitly the off-diagonal (k not equal to k-prime) mode cross-correlations, and in part that the standard power-spectrum estimation assumes that the window function is simple enough that a random catalog is not necessary. Those are huge approximations! However, if they are good enough for the power spectrum on baryon-acoustic scales, then they must be good enough for the correlation function on those same scales and maybe we can build a far, far faster estimator?
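The expense that Scoccimarro is contrasting with is easiest to see in a brute-force pair-count estimator. The sketch below implements the standard Landy-Szalay estimator xi = (DD - 2 DR + RR) / RR with normalised pair counts (not the Storey-Fisher and Hogg estimator discussed above, and not how production codes organise the pair search, which typically use trees or grids). Every count is a double loop, so the work scales as the product of catalog sizes, and the random catalog, usually much larger than the data, dominates. Catalog sizes, bin edges, and function names are hypothetical.

```python
import bisect
import math
import random
from itertools import combinations, product


def _add_to_bin(r, edges, counts):
    """Increment the half-open separation bin [edges[b], edges[b+1]) containing r, if any."""
    b = bisect.bisect_right(edges, r) - 1
    if 0 <= b < len(counts):
        counts[b] += 1


def auto_pair_counts(points, edges):
    """Count distinct pairs within one catalog by separation: O(N^2) work."""
    counts = [0] * (len(edges) - 1)
    for p, q in combinations(points, 2):
        _add_to_bin(math.dist(p, q), edges, counts)
    return counts


def cross_pair_counts(points_a, points_b, edges):
    """Count pairs with one point from each catalog: O(N_a N_b) work."""
    counts = [0] * (len(edges) - 1)
    for p, q in product(points_a, points_b):
        _add_to_bin(math.dist(p, q), edges, counts)
    return counts


def landy_szalay(data, randoms, edges):
    """xi(r) = (DD - 2 DR + RR) / RR, each count normalised by its total number of pairs."""
    nd, nr = len(data), len(randoms)
    dd = [c / (nd * (nd - 1) / 2) for c in auto_pair_counts(data, edges)]
    rr = [c / (nr * (nr - 1) / 2) for c in auto_pair_counts(randoms, edges)]
    dr = [c / (nd * nr) for c in cross_pair_counts(data, randoms, edges)]
    return [(d - 2 * x + r) / r if r > 0 else float("nan") for d, x, r in zip(dd, dr, rr)]


# Unclustered toy "data" in a unit cube, so xi should come out close to zero.
rng = random.Random(42)
data = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
randoms = [(rng.random(), rng.random(), rng.random()) for _ in range(400)]
xi = landy_szalay(data, randoms, [0.1, 0.2, 0.3])
print(xi)
```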

December 17, 2018

Terence Tao: 255B, Notes 1: The Lagrangian formulation of the Euler equations

These lecture notes are a continuation of the 254A lecture notes from the previous quarter.

We consider the Euler equations for incompressible fluid flow on a Euclidean space {{\bf R}^d}; we will label {{\bf R}^d} as the “Eulerian space” {{\bf R}^d_E} (or “Euclidean space”, or “physical space”) to distinguish it from the “Lagrangian space” {{\bf R}^d_L} (or “labels space”) that we will introduce shortly (but the reader is free to also ignore the {E} or {L} subscripts if he or she wishes). Elements of Eulerian space {{\bf R}^d_E} will be referred to by symbols such as {x}; we use {dx} to denote Lebesgue measure on {{\bf R}^d_E}, use {x^1,\dots,x^d} for the {d} coordinates of {x}, and use indices such as {i,j,k} to index these coordinates (with the usual summation conventions); for instance, {\partial_i} denotes partial differentiation along the {x^i} coordinate. (We use superscripts for coordinates {x^i} instead of subscripts {x_i} to be compatible with some differential geometry notation that we will use shortly; in particular, when using the summation notation, we will now be matching subscripts with superscripts for the pair of indices being summed.)

In Eulerian coordinates, the Euler equations read

\displaystyle  \partial_t u + u \cdot \nabla u = - \nabla p \ \ \ \ \ (1)

\displaystyle  \nabla \cdot u = 0

where {u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E} is the velocity field and {p: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}} is the pressure field. These are functions of time {t \in [0,T)} and of the spatial location variable {x \in {\bf R}^d_E}. We will refer to the coordinates {(t,x) = (t,x^1,\dots,x^d)} as Eulerian coordinates. However, if one reviews the physical derivation of the Euler equations from 254A Notes 0, before one takes the continuum limit, the fundamental unknowns were not the velocity field {u} or the pressure field {p}, but rather the trajectories {(x^{(a)}(t))_{a \in A}}, which can be thought of as a single function {x: [0,T) \times A \rightarrow {\bf R}^d_E} from the coordinates {(t,a)} (where {t} is a time and {a} is an element of the label set {A}) to {{\bf R}^d_E}. The relationship between the trajectories {x^{(a)}(t) = x(t,a)} and the velocity field was given by the informal relationship

\displaystyle  \partial_t x(t,a) \approx u( t, x(t,a) ). \ \ \ \ \ (2)

We will refer to the coordinates {(t,a)} as (discrete) Lagrangian coordinates for describing the fluid.

In view of this, it is natural to ask whether there is an alternate way to formulate the continuum limit of incompressible inviscid fluids, by using a continuous version {(t,a)} of the Lagrangian coordinates, rather than Eulerian coordinates. This is indeed the case. Suppose for instance one has a smooth solution {u, p} to the Euler equations on a spacetime slab {[0,T) \times {\bf R}^d} in Eulerian coordinates; assume furthermore that the velocity field {u} is uniformly bounded. We introduce another copy {{\bf R}^d_L} of {{\bf R}^d}, which we call Lagrangian space or labels space; we use symbols such as {a} to refer to elements of this space, {da} to denote Lebesgue measure on {{\bf R}^d_L}, and {a^1,\dots,a^d} to refer to the {d} coordinates of {a}. We use indices such as {\alpha,\beta,\gamma} to index these coordinates, thus for instance {\partial_\alpha} denotes partial differentiation along the {a^\alpha} coordinate. We will use summation conventions for both the Eulerian coordinates {i,j,k} and the Lagrangian coordinates {\alpha,\beta,\gamma}, with an index being summed if it appears as both a subscript and a superscript in the same term. While {{\bf R}^d_L} and {{\bf R}^d_E} are of course isomorphic, we will try to refrain from identifying them, except perhaps at the initial time {t=0} in order to fix the initialisation of Lagrangian coordinates.

Given a smooth and bounded velocity field {u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E}, define a trajectory map for this velocity to be any smooth map {X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E} that obeys the ODE

\displaystyle  \partial_t X(t,a) = u( t, X(t,a) ); \ \ \ \ \ (3)

in view of (2), this describes the trajectory (in {{\bf R}^d_E}) of a particle labeled by an element {a} of {{\bf R}^d_L}. From the Picard existence theorem and the hypothesis that {u} is smooth and bounded, such a map exists and is unique as long as one specifies the initial location {X(0,a)} assigned to each label {a}. Traditionally, one chooses the initial condition

\displaystyle  X(0,a) = a \ \ \ \ \ (4)

for {a \in {\bf R}^d_L}, so that we label each particle by its initial location at time {t=0}; we are also free to specify other initial conditions for the trajectory map if we please. Indeed, we have the freedom to “permute” the labels {a \in {\bf R}^d_L} by an arbitrary diffeomorphism: if {X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E} is a trajectory map, and {\pi: {\bf R}^d_L \rightarrow{\bf R}^d_L} is any diffeomorphism (a smooth map whose inverse exists and is also smooth), then the map {X \circ \pi: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E} is also a trajectory map, albeit one with different initial conditions {X(0,a)}.
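The definitions (3) and (4) can be made concrete with a minimal numerical sketch: integrate the trajectory ODE for a given label {a} with classical fourth-order Runge-Kutta steps, starting from {X(0,a) = a}. The shear flow {u(t,x) = (\sin x^2, 0)} in two dimensions is a hypothetical choice, made only because it is smooth, bounded, and has the exact trajectory map {X(t,a) = (a^1 + t \sin a^2, a^2)} to compare against; the function names are likewise illustrative.

```python
import math


def shear_velocity(t, x):
    """Hypothetical smooth, bounded velocity field u(t, x) = (sin x^2, 0) on R^2."""
    return (math.sin(x[1]), 0.0)


def rk4_trajectory(u, a, t_final, steps=1000):
    """Approximate X(t_final, a), where dX/dt = u(t, X) and X(0, a) = a as in (3)-(4)."""
    dt = t_final / steps
    t, x = 0.0, tuple(a)
    for _ in range(steps):
        k1 = u(t, x)
        k2 = u(t + dt / 2, tuple(xi + dt / 2 * ki for xi, ki in zip(x, k1)))
        k3 = u(t + dt / 2, tuple(xi + dt / 2 * ki for xi, ki in zip(x, k2)))
        k4 = u(t + dt, tuple(xi + dt * ki for xi, ki in zip(x, k3)))
        x = tuple(xi + dt / 6 * (p + 2 * q + 2 * r + s)
                  for xi, p, q, r, s in zip(x, k1, k2, k3, k4))
        t += dt
    return x


a = (0.3, 1.2)
numerical = rk4_trajectory(shear_velocity, a, t_final=2.0)
exact = (a[0] + 2.0 * math.sin(a[1]), a[1])
print(numerical, exact)
```

In this sketch, relabelling by a diffeomorphism {\pi} amounts to calling `rk4_trajectory(u, pi(a), t)` in place of `rk4_trajectory(u, a, t)`, which is the computational counterpart of the statement that {X \circ \pi} is again a trajectory map.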

Despite the popularity of the initial condition (4), we will try to keep conceptually separate the Eulerian space {{\bf R}^d_E} from the Lagrangian space {{\bf R}^d_L}, as they play different physical roles in the interpretation of the fluid; for instance, while the Euclidean metric {d\eta^2 = dx^1 dx^1 + \dots + dx^d dx^d} is an important feature of Eulerian space {{\bf R}^d_E}, it is not a geometrically natural structure to use in Lagrangian space {{\bf R}^d_L}. We have the following more general version of Exercise 8 from 254A Notes 2:

Exercise 1 Let {u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E} be smooth and bounded.

  • If {X_0: {\bf R}^d_L \rightarrow {\bf R}^d_E} is a smooth map, show that there exists a unique smooth trajectory map {X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E} with initial condition {X(0,a) = X_0(a)} for all {a \in {\bf R}^d_L}.
  • Show that if {X_0} is a diffeomorphism and {t \in [0,T)}, then the map {X(t): a \mapsto X(t,a)} is also a diffeomorphism.

Remark 2 The first of the Euler equations (1) can now be written in the form

\displaystyle  \frac{d^2}{dt^2} X(t,a) = - \nabla p( t, X(t,a) ) \ \ \ \ \ (5)

which can be viewed as a continuum limit of Newton’s second law {m^{(a)} \frac{d^2}{dt^2} x^{(a)}(t) = F^{(a)}(t)}.

Call a diffeomorphism {Y: {\bf R}^d_L \rightarrow {\bf R}^d_E} (oriented) volume preserving if one has the equation

\displaystyle  \mathrm{det}( \nabla Y )(a) = 1 \ \ \ \ \ (6)

for all {a \in {\bf R}^d_L}, where the total differential {\nabla Y} is the {d \times d} matrix with entries {\partial_\alpha Y^i} for {\alpha = 1,\dots,d} and {i=1,\dots,d}, where {Y^1,\dots,Y^d:{\bf R}^d_L \rightarrow {\bf R}} are the components of {Y}. (If one wishes, one can also view {\nabla Y} as a linear transformation from the tangent space {T_a {\bf R}^d_L} of Lagrangian space at {a} to the tangent space {T_{Y(a)} {\bf R}^d_E} of Eulerian space at {Y(a)}.) Equivalently, {Y} is orientation preserving and one has a Jacobian-free change of variables formula

\displaystyle  \int_{{\bf R}^d_L} f( Y(a) )\ da = \int_{{\bf R}^d_E} f(x)\ dx

for all {f \in C_c({\bf R}^d_E \rightarrow {\bf R})}, which is in turn equivalent to {Y(E) \subset {\bf R}^d_E} having the same Lebesgue measure as {E} for any measurable set {E \subset {\bf R}^d_L}.

The divergence-free condition {\nabla \cdot u = 0} then can be nicely expressed in terms of volume-preserving properties of the trajectory maps {X}, in a manner which confirms the interpretation of this condition as an incompressibility condition on the fluid:

Lemma 3 Let {u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E} be smooth and bounded, let {X_0: {\bf R}^d_L \rightarrow {\bf R}^d_E} be a volume-preserving diffeomorphism, and let {X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E} be the trajectory map. Then the following are equivalent:

  • {\nabla \cdot u = 0} on {[0,T) \times {\bf R}^d_E}.
  • {X(t): {\bf R}^d_L \rightarrow {\bf R}^d_E} is volume-preserving for all {t \in [0,T)}.

Proof: Since {X_0} is orientation-preserving, we see from continuity that {X(t)} is also orientation-preserving. Suppose first that {X(t)} is volume-preserving for all {t \in [0,T)}. Then for any {f \in C^\infty_c({\bf R}^d_E \rightarrow {\bf R})} we have the conservation law

\displaystyle  \int_{{\bf R}^d_L} f( X(t,a) )\ da = \int_{{\bf R}^d_E} f(x)\ dx

for all {t \in [0,T)}. Differentiating in time using the chain rule and (3) we conclude that

\displaystyle  \int_{{\bf R}^d_L} (u(t) \cdot \nabla f)( X(t,a)) \ da = 0

for all {t \in [0,T)}, and hence by change of variables

\displaystyle  \int_{{\bf R}^d_E} (u(t) \cdot \nabla f)(x) \ dx = 0

which by integration by parts gives

\displaystyle  \int_{{\bf R}^d_E} (\nabla \cdot u(t,x)) f(x)\ dx = 0

for all {f \in C^\infty_c({\bf R}^d_E \rightarrow {\bf R})} and {t \in [0,T)}, so {u} is divergence-free.

To prove the converse implication, it is convenient to introduce the labels map {A:[0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_L}, defined by setting {A(t): {\bf R}^d_E \rightarrow {\bf R}^d_L} to be the inverse of the diffeomorphism {X(t): {\bf R}^d_L \rightarrow {\bf R}^d_E}, thus

\displaystyle  X(t, A(t, x)) = x

for all {(t,x) \in [0,T) \times {\bf R}^d_E}. By the implicit function theorem, {A} is smooth, and by differentiating the above equation in time using (3) we see that

\displaystyle  D_t A(t,x) = 0

where {D_t} is the usual material derivative

\displaystyle  D_t := \partial_t + u \cdot \nabla \ \ \ \ \ (7)

acting on functions on {[0,T) \times {\bf R}^d_E}. If {u} is divergence-free, we have from integration by parts that

\displaystyle  \partial_t \int_{{\bf R}^d_E} \phi(t,x)\ dx = \int_{{\bf R}^d_E} D_t \phi(t,x)\ dx

for any test function {\phi: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}}. In particular, for any {g \in C^\infty_c({\bf R}^d_L \rightarrow {\bf R})}, we can calculate

\displaystyle \partial_t \int_{{\bf R}^d_E} g( A(t,x) )\ dx = \int_{{\bf R}^d_E} D_t (g(A(t,x)))\ dx

\displaystyle  = \int_{{\bf R}^d_E} 0\ dx

and hence

\displaystyle  \int_{{\bf R}^d_E} g(A(t,x))\ dx = \int_{{\bf R}^d_E} g(A(0,x))\ dx

for any {t \in [0,T)}. Since {X_0} is volume-preserving, so is {A(0)}, thus

\displaystyle  \int_{{\bf R}^d_E} g \circ A(t)\ dx = \int_{{\bf R}^d_L} g\ da.

Thus {A(t)} is volume-preserving, and hence {X(t)} is also. \Box
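Lemma 3 can also be checked numerically. The sketch below approximates the trajectory map with Runge-Kutta steps, estimates {\det(\nabla X(t))(a)} by central differences in the label {a}, and compares a divergence-free field (generated, as a hypothetical example, by the stream function {\psi = \sin x^1 \sin x^2}) with the compressible field {(\sin x^1, 0)}, whose divergence {\cos x^1} does not vanish. Both fields are time-independent purely to keep the code short.

```python
import math


def cellular_velocity(x):
    """Divergence-free field u = (d_2 psi, -d_1 psi) for the stream function psi = sin x^1 sin x^2."""
    return [math.sin(x[0]) * math.cos(x[1]), -math.cos(x[0]) * math.sin(x[1])]


def compressible_velocity(x):
    """A field with divergence cos x^1, which does not vanish identically."""
    return [math.sin(x[0]), 0.0]


def flow(u, a, t_final, steps=2000):
    """RK4 approximation to X(t_final, a) for dX/dt = u(X), X(0, a) = a."""
    dt = t_final / steps
    x = list(a)
    for _ in range(steps):
        k1 = u(x)
        k2 = u([xi + dt / 2 * k for xi, k in zip(x, k1)])
        k3 = u([xi + dt / 2 * k for xi, k in zip(x, k2)])
        k4 = u([xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt / 6 * (p + 2 * q + 2 * r + s)
             for xi, p, q, r, s in zip(x, k1, k2, k3, k4)]
    return x


def jacobian_det(u, a, t_final, h=1e-5):
    """det(nabla X(t_final))(a) in two dimensions, by central differences in the label a."""
    columns = []  # columns[alpha][i] approximates d_alpha X^i
    for alpha in range(2):
        plus, minus = list(a), list(a)
        plus[alpha] += h
        minus[alpha] -= h
        x_plus, x_minus = flow(u, plus, t_final), flow(u, minus, t_final)
        columns.append([(p - m) / (2 * h) for p, m in zip(x_plus, x_minus)])
    return columns[0][0] * columns[1][1] - columns[1][0] * columns[0][1]


det_incompressible = jacobian_det(cellular_velocity, (0.4, 0.7), 1.0)
det_compressible = jacobian_det(compressible_velocity, (1.0, 0.0), 1.0)
print(det_incompressible, det_compressible)
```

For the compressible field the ODE can be solved exactly ({\tan(X^1/2) = e^t \tan(a^1/2)}), which gives a Jacobian determinant of about {1.101} at {a = (1,0)}, {t=1}, in agreement with the numerical value and visibly different from {1}.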

Exercise 4 Let {M: [0,T) \rightarrow \mathrm{GL}_d({\bf R})} be a continuously differentiable map from the time interval {[0,T)} to the general linear group {\mathrm{GL}_d({\bf R})} of invertible {d \times d} matrices. Establish Jacobi’s formula

\displaystyle  \partial_t \det(M(t)) = \det(M(t)) \mathrm{tr}( M(t)^{-1} \partial_t M(t) )

and use this and (6) to give an alternate proof of Lemma 3 that does not involve any integration.
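Jacobi’s formula is easy to sanity-check numerically before proving it. The sketch below (a consistency check, not a proof) takes a hypothetical smooth path {M(t)} in {\mathrm{GL}_3({\bf R})}, differentiates {\det M(t)} with a central difference, and compares the result with {\det(M(t)) \mathrm{tr}( M(t)^{-1} \partial_t M(t) )}.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    """Inverse of an invertible 3x3 matrix: adjugate (transposed cofactors) over determinant."""
    d = det3(m)
    cof = [[m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
            - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3]
            for j in range(3)] for i in range(3)]
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]


def M(t):
    """A hypothetical smooth path in GL_3(R), invertible near t = 0.5."""
    return [[1 + t, t * t, 0.0], [math.sin(t), 2.0, t], [0.0, math.cos(t), 3.0]]


def dM(t):
    """Entrywise time derivative of M(t)."""
    return [[1.0, 2 * t, 0.0], [math.cos(t), 0.0, 1.0], [0.0, -math.sin(t), 0.0]]


t, h = 0.5, 1e-5
lhs = (det3(M(t + h)) - det3(M(t - h))) / (2 * h)        # d/dt det M(t), numerically
inverse, derivative = inv3(M(t)), dM(t)
trace = sum(inverse[i][k] * derivative[k][i] for i in range(3) for k in range(3))
rhs = det3(M(t)) * trace                                  # det(M) tr(M^{-1} dM/dt)
print(lhs, rhs)
```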

Remark 5 One can view the use of Lagrangian coordinates as an extension of the method of characteristics. Indeed, from the chain rule we see that for any smooth function {f: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}} of Eulerian spacetime, one has

\displaystyle  \frac{d}{dt} f(t,X(t,a)) = (D_t f)(t,X(t,a))

and hence any transport equation that in Eulerian coordinates takes the form

\displaystyle  D_t f = g

for smooth functions {f,g: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}} of Eulerian spacetime is equivalent to the ODE

\displaystyle  \frac{d}{dt} F = G

where {F,G: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}} are the smooth functions of Lagrangian spacetime defined by

\displaystyle  F(t,a) := f(t,X(t,a)); \quad G(t,a) := g(t,X(t,a)).
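Remark 5 translates directly into a solution procedure. In the following sketch, set for simplicity in one dimension with a hypothetical constant velocity {u = c}, so that {X(t,a) = a + ct}, the transport equation {D_t f = g} is solved by integrating {\frac{d}{dt} F = G} along each trajectory with Simpson’s rule and then converting back to Eulerian coordinates via the label {a = x - ct}. The data {f(0,x) = e^{-x^2}} and {g(t,x) = \cos x} are illustrative choices.

```python
import math

C = 0.7  # hypothetical constant velocity, so the trajectory map is X(t, a) = a + C t


def f0(x):
    """Hypothetical initial data f(0, x)."""
    return math.exp(-x * x)


def g(t, x):
    """Hypothetical source term in the transport equation D_t f = g."""
    return math.cos(x)


def lagrangian_solution(t, a, steps=400):
    """F(t, a) = f0(a) + integral_0^t g(s, X(s, a)) ds, by Simpson's rule (steps must be even)."""
    h = t / steps
    total = g(0.0, a) + g(t, a + C * t)
    for n in range(1, steps):
        s = n * h
        total += (4 if n % 2 else 2) * g(s, a + C * s)
    return f0(a) + total * h / 3


def eulerian_solution(t, x):
    """f(t, x) = F(t, A(t, x)), where the label map here is A(t, x) = x - C t."""
    return lagrangian_solution(t, x - C * t)


# Check the PDE: D_t f = f_t + C f_x should reproduce g, up to finite-difference error.
t, x, eps = 1.0, 0.3, 1e-4
f_t = (eulerian_solution(t + eps, x) - eulerian_solution(t - eps, x)) / (2 * eps)
f_x = (eulerian_solution(t, x + eps) - eulerian_solution(t, x - eps)) / (2 * eps)
residual = f_t + C * f_x - g(t, x)
print(residual)
```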

In this set of notes we recall some basic differential geometry notation, particularly with regard to pullbacks and Lie derivatives of differential forms and similar fields on manifolds such as {{\bf R}^d_E} and {{\bf R}^d_L}, and explore how the Euler equations look in this notation. Our discussion will be entirely formal in nature; we will assume that all functions have enough smoothness and decay at infinity to justify the relevant calculations. (It is possible to work rigorously in Lagrangian coordinates – see for instance the work of Ebin and Marsden – but we will not do so here.) As a general rule, Lagrangian coordinates tend to be somewhat less convenient to use than Eulerian coordinates for establishing the basic analytic properties of the Euler equations, such as local existence, uniqueness, and continuous dependence on the data; however, they are quite good at clarifying the more algebraic properties of these equations, such as conservation laws and the variational nature of the equations. It may well be that in the future we will be able to use the Lagrangian formalism more effectively on the analytic side of the subject also.

Remark 6 One can also write the Navier-Stokes equations in Lagrangian coordinates, but the equations are not expressed in a favourable form in these coordinates, as the Laplacian {\Delta} appearing in the viscosity term becomes replaced with a time-varying Laplace-Beltrami operator. As such, we will not discuss the Lagrangian coordinate formulation of Navier-Stokes here.

— 1. Pullbacks and Lie derivatives —

In order to efficiently change coordinates, it is convenient to use the language of differential geometry, which is designed to be almost entirely independent of the choice of coordinates. We therefore spend some time recalling the basic concepts of differential geometry that we will need. Our presentation will be based on explicitly working in coordinates; there are of course more coordinate-free approaches to the subject (for instance setting up the machinery of vector bundles, or of derivations), but we will not adopt these approaches here.

Throughout this section, we fix a diffeomorphism {Y: {\bf R}^d_L \rightarrow {\bf R}^d_E} from Lagrangian space {{\bf R}^d_L} to Eulerian space {{\bf R}^d_E}; one can for instance take {Y = X(t)} where {X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E} is a trajectory map and {t \in [0,T)} is some time. Then all the differential geometry structures on Eulerian space {{\bf R}^d_E} can be pulled back via {Y} to Lagrangian space {{\bf R}^d_L}. For instance, a physical point {x \in {\bf R}^d_E} can be pulled back to a label {Y^* x := Y^{-1}(x) \in {\bf R}^d_L}, and similarly a subset {E \subset {\bf R}^d_E} of physical space can be pulled back to a subset {Y^* E := Y^{-1}(E) \subset {\bf R}^d_L} of label space. A scalar field {f: {\bf R}^d_E \rightarrow {\bf R}} can be pulled back to a scalar field {Y^* f: {\bf R}^d_L \rightarrow {\bf R}}, defined by pre-composition:

\displaystyle  Y^* f(a) := f(Y(a)).

These operations are all compatible with each other in various ways; for instance, if {x \in {\bf R}^d_E}, {E \subset {\bf R}^d_E}, {f: {\bf R}^d_E \rightarrow {\bf R}}, and {c \in {\bf R}}, then

  • {x \in E} if and only if {Y^* x \in Y^* E}.
  • {f(x) = c} if and only if {Y^* f( Y^* x ) = c}.
  • The map {E \mapsto Y^* E} is an isomorphism of {\sigma}-algebras.
  • The map {f \mapsto Y^* f} is an algebra isomorphism.

Differential forms. The next family of structures we will pull back is that of differential forms, which we will define using coordinates. (See also my previous notes on this topic for more discussion on differential forms.) For any {k \geq 0}, a {k}-form {\omega} on {{\bf R}^d_E} will be defined as a family of functions {\omega_{i_1 \dots i_k}: {\bf R}^d_E \rightarrow {\bf R}} for {i_1,\dots,i_k \in \{1,\dots,d\}} which is totally antisymmetric with respect to permutations of the indices {i_1,\dots,i_k}; thus if one interchanges {i_j} and {i_{j'}} for any {1 \leq j < j' \leq k}, then {\omega_{i_1 \dots i_k}} flips to {-\omega_{i_1 \dots i_k}}. Thus for instance

  • A {0}-form is just a scalar field {\omega: {\bf R}^d_E \rightarrow {\bf R}};
  • A {1}-form, when viewed in coordinates, is a collection {\omega_i: {\bf R}^d_E \rightarrow {\bf R}} of {d} scalar functions;
  • A {2}-form, when viewed in coordinates, is a collection {\omega_{ij}: {\bf R}^d_E \rightarrow {\bf R}} of {d^2} scalar functions with {\omega_{ji} = -\omega_{ij}} (so in particular {\omega_{ii}=0});
  • A {3}-form, when viewed in coordinates, is a collection {\omega_{ijk}: {\bf R}^d_E \rightarrow {\bf R}} of {d^3} scalar functions with {\omega_{jik} = -\omega_{ijk}}, {\omega_{ikj} = -\omega_{ijk}}, and {\omega_{kji} = -\omega_{ijk}}.

The antisymmetry makes the component {\omega_{i_1 \dots i_k}} of a {k}-form vanish whenever two of the indices agree. In particular, if {k>d}, then the only {k}-form that exists is the zero {k}-form {0}. A {d}-form is also known as a volume form; amongst all such forms we isolate the standard volume form {d\mathrm{vol}_E}, defined by setting {(d\mathrm{vol}_E)_{\sigma(1) \dots \sigma(d)} := \mathrm{sgn}(\sigma)} for any permutation {\sigma: \{1,\dots,d\} \rightarrow \{1,\dots,d\}} (with {\mathrm{sgn}(\sigma)\in \{-1,+1\}} being the sign of the permutation), and setting all other components of {d\mathrm{vol}_E} equal to zero. For instance, in three dimensions one has {(d\mathrm{vol}_E)_{i_1 i_2 i_3}} equal to {+1} when {(i_1,i_2,i_3) = (1,2,3), (2,3,1), (3,1,2)}, {-1} when {(i_1,i_2,i_3) = (1,3,2), (3,2,1), (2,1,3)}, and {0} otherwise. We use {\Omega^k({\bf R}^d_E)} to denote the space of {k}-forms on {{\bf R}^d_E}.
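The sign convention for {d\mathrm{vol}_E} can be tabulated mechanically. The sketch below stores the components of a form at a single point in a dictionary keyed by (1-based) index tuples, a representation chosen here only for illustration, with missing keys read as zero components.

```python
from itertools import permutations


def permutation_sign(perm):
    """Sign of a permutation of (0, ..., d-1), computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def standard_volume_form(d):
    """Nonzero components of dvol_E on R^d, keyed by 1-based index tuples."""
    return {tuple(i + 1 for i in p): permutation_sign(p) for p in permutations(range(d))}


def component(form, indices):
    """The component form_{i_1 ... i_k}; index tuples not stored (e.g. with repeats) give 0."""
    return form.get(tuple(indices), 0)


vol3 = standard_volume_form(3)
for indices in [(1, 2, 3), (2, 3, 1), (3, 1, 2), (1, 3, 2), (3, 2, 1), (2, 1, 3), (1, 1, 2)]:
    print(indices, component(vol3, indices))
```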

If {f: {\bf R}^d_E \rightarrow {\bf R}} is a scalar field and {\omega \in \Omega^k({\bf R}^d_E)}, we can define the product {f\omega} by pointwise multiplication of components:

\displaystyle  (f\omega)_{i_1 \dots i_k}(x) := f(x) \omega_{i_1 \dots i_k}(x).

More generally, given two forms {\omega \in \Omega^k({\bf R}^d_E)}, {\theta \in \Omega^l({\bf R}^d_E)}, we define the wedge product {\omega \wedge \theta \in \Omega^{k+l}({\bf R}^d_E)} to be the {k+l}-form given by the formula

\displaystyle  (\omega \wedge \theta)_{i_1 \dots i_{k+l}}(x) := \frac{1}{k! l!} \sum_{\sigma \in S_{k+l}} \mathrm{sgn}(\sigma) \omega_{i_{\sigma(1)} \dots i_{\sigma(k)}}(x) \theta_{i_{\sigma(k+1)} \dots i_{\sigma(k+l)}}(x)

where {S_{k+l}} is the symmetric group of permutations on {\{1,\dots,k+l\}}. For instance, for a scalar field {f: {\bf R}^d_E \rightarrow {\bf R}} (so {f \in \Omega^0({\bf R}^d_E)}), {f \wedge \omega = \omega \wedge f = f \omega}. Similarly, if {\theta,\eta \in \Omega^1({\bf R}^d_E)} and {\omega \in \Omega^2({\bf R}^d_E)}, we have the pointwise identities

\displaystyle  (\theta \wedge \eta)_{ij} = \theta_i \eta_j - \theta_j \eta_i

\displaystyle  (\theta \wedge \omega)_{ijk} = \theta_i \omega_{jk} - \theta_j \omega_{ik} + \theta_k \omega_{ij}

\displaystyle  (\omega \wedge \theta)_{ijk} = \omega_{ij} \theta_k - \omega_{ik} \theta_j + \omega_{jk} \theta_i.

Exercise 7 Show that the wedge product is a bilinear map from {\Omega^k({\bf R}^d_E) \times \Omega^l({\bf R}^d_E)} to {\Omega^{k+l}({\bf R}^d_E)} that obeys the supercommutative property

\displaystyle  \omega \wedge \theta = (-1)^{kl} \theta \wedge \omega

for {\omega \in \Omega^k({\bf R}^d_E)} and {\theta \in \Omega^l({\bf R}^d_E)}, and the associative property

\displaystyle  (\omega \wedge \theta) \wedge \eta = \omega \wedge (\theta \wedge \eta)

for {\omega \in \Omega^k({\bf R}^d_E)}, {\theta \in \Omega^l({\bf R}^d_E)}, {\eta \in \Omega^m({\bf R}^d_E)}. (In other words, the space of formal linear combinations of forms, graded by the parity of the order of the forms, is a supercommutative algebra. Very roughly speaking, the prefix “super” means that “odd order objects anticommute with each other rather than commute”.)
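The definition of the wedge product and the identities of Exercise 7 can be spot-checked by brute force at a single point. In the sketch below a form is represented, purely for illustration, by a Python function from 0-based index tuples to its components; the sum over {S_{k+l}} is taken literally, with exact rational arithmetic, and the example forms are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def sign(perm):
    """Sign of a permutation of (0, ..., n-1), by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def wedge(omega, k, theta, l, d):
    """All components of omega ^ theta (a (k+l)-form on R^d) from the defining formula."""
    result = {}
    for idx in product(range(d), repeat=k + l):
        total = 0
        for perm in permutations(range(k + l)):
            i = [idx[p] for p in perm]
            total += sign(perm) * omega(tuple(i[:k])) * theta(tuple(i[k:]))
        result[idx] = Fraction(total, factorial(k) * factorial(l))
    return result


# Hypothetical constant-coefficient forms on R^3 (0-based indices).
theta = lambda idx: [1, 2, 3][idx[0]]            # 1-form
eta = lambda idx: [4, 5, 6][idx[0]]              # 1-form
A = [[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]         # antisymmetric matrix
omega = lambda idx: A[idx[0]][idx[1]]            # 2-form

theta_eta = wedge(theta, 1, eta, 1, 3)
eta_theta = wedge(eta, 1, theta, 1, 3)
theta_omega = wedge(theta, 1, omega, 2, 3)
omega_theta = wedge(omega, 2, theta, 1, 3)
print(theta_eta[(0, 1)], theta_omega[(0, 1, 2)])
```

Here {(\theta \wedge \eta)_{12} = \theta_1 \eta_2 - \theta_2 \eta_1 = -3} and {(\theta \wedge \omega)_{123} = \theta_1 \omega_{23} - \theta_2 \omega_{13} + \theta_3 \omega_{12} = 2} (in the 1-based notation of the text); the two 1-forms anticommute, while the {1}-form and the {2}-form commute, as {(-1)^{kl}} predicts.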

If {\omega \in \Omega^k({\bf R}^d_E)} is continuously differentiable, we define the exterior derivative {d\omega \in \Omega^{k+1}({\bf R}^d_E)} in coordinates as

\displaystyle  (d\omega)_{i_1 \dots i_{k+1}} := \sum_{j=1}^{k+1} (-1)^{j-1} \partial_{i_j} \omega_{i_1 \dots i_{j-1} i_{j+1} \dots i_{k+1}}. \ \ \ \ \ (8)

It is easy to verify that this is indeed a {k+1}-form. Thus for instance:

  • If {f \in \Omega^0({\bf R}^d_E)} is a continuously differentiable scalar field, then {(df)_i = \partial_i f}.
  • If {\theta \in \Omega^1({\bf R}^d_E)} is a continuously differentiable {1}-form, then {(d\theta)_{ij} = \partial_i \theta_j - \partial_j \theta_i}.
  • If {\omega \in \Omega^2({\bf R}^d_E)} is a continuously differentiable {2}-form, then {(d\omega)_{ijk} = \partial_i \omega_{jk} - \partial_j \omega_{ik} + \partial_k \omega_{ij}}.

Exercise 8 If {\omega \in \Omega^k({\bf R}^d_E)} and {\theta \in \Omega^l({\bf R}^d_E)} are continuously differentiable, establish the antiderivation (or super-Leibniz) law

\displaystyle  d( \omega \wedge \theta ) = (d\omega) \wedge \theta + (-1)^k \omega \wedge d\theta \ \ \ \ \ (9)

and if {\omega} is twice continuously differentiable, establish the chain complex law

\displaystyle  d d \omega = 0. \ \ \ \ \ (10)
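The coordinate formula (8) is easy to implement, and the chain complex law (10) can then be observed numerically. In the sketch below (an illustration, not a symbolic implementation) a {k}-form is a Python function of a 0-based index tuple and a point, partial derivatives are replaced by central differences with step {h}, and the example fields are hypothetical. Since central differences in different coordinate directions commute, {dd} vanishes up to rounding error, mirroring the role played by the symmetry of second derivatives in (10).

```python
import math


def exterior_derivative(omega, k, h=1e-4):
    """d(omega) via formula (8), with each partial derivative replaced by a central difference.

    omega(idx, x) returns the component omega_idx(x) for a tuple idx of k 0-based indices.
    """
    def d_omega(idx, x):
        total = 0.0
        for j in range(k + 1):
            rest = idx[:j] + idx[j + 1:]        # omit the j-th index
            i = idx[j]
            x_plus, x_minus = list(x), list(x)
            x_plus[i] += h
            x_minus[i] -= h
            partial = (omega(rest, x_plus) - omega(rest, x_minus)) / (2 * h)
            total += (-1) ** j * partial        # (-1)^(j-1) for the 1-based j of (8)
        return total
    return d_omega


def f(idx, x):
    """Hypothetical 0-form f = (x^1)^2 x^2 + sin x^3 on R^3."""
    return x[0] ** 2 * x[1] + math.sin(x[2])


def theta(idx, x):
    """Hypothetical 1-form with components (x^2 x^3, sin x^1, x^1 x^2)."""
    return [x[1] * x[2], math.sin(x[0]), x[0] * x[1]][idx[0]]


df = exterior_derivative(f, 0)
ddf = exterior_derivative(df, 1)
dtheta = exterior_derivative(theta, 1)
ddtheta = exterior_derivative(dtheta, 2)
x = (0.3, -0.8, 1.1)
print(df((0,), x), ddf((0, 1), x), ddtheta((0, 1, 2), x))
```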

Each of the coordinates {x^i}, {i=1,\dots,d}, can be viewed as a scalar field {(x^1,\dots,x^d) \mapsto x^i}. In particular, the exterior derivatives {dx^i}, {i=1,\dots,d}, are {1}-forms. It is easy to verify the identity

\displaystyle  \omega = \frac{1}{k!} \omega_{i_1 \dots i_k} dx^{i_1} \wedge \dots \wedge dx^{i_k}

for any {\omega \in \Omega^k({\bf R}^d_E)} with the usual summation conventions (which, in this differential geometry formalism, assert that we sum indices whenever they appear as a subscript-superscript pair). In particular the volume form {d\mathrm{vol}_E} can be written as

\displaystyle  d\mathrm{vol}_E = dx^1 \wedge \dots \wedge dx^d.

One can of course define differential forms on Lagrangian space {{\bf R}^d_L} as well, changing the indices from Roman to Greek. For instance, if {\theta \in \Omega^1({\bf R}^d_L)} is continuously differentiable, then {d\theta \in \Omega^2({\bf R}^d_L)} is given in coordinates as

\displaystyle  (d\theta)_{\alpha \beta} = \partial_\alpha \theta_\beta - \partial_\beta \theta_\alpha.

If {\omega \in \Omega^k({\bf R}^d_E)}, we define the pullback form {Y^* \omega \in \Omega^k({\bf R}^d_L)} by the formula

\displaystyle  (Y^* \omega)_{\alpha_1 \dots \alpha_k}(a) := \omega_{i_1 \dots i_k}(Y(a)) \partial_{\alpha_1} Y^{i_1}(a) \dots \partial_{\alpha_k} Y^{i_k}(a) \ \ \ \ \ (11)

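with the usual summation conventions. Thus for instance:

  • If {f \in \Omega^0({\bf R}^d_E)} is a scalar field, then {Y^* f = f \circ Y}, in agreement with the earlier definition of {Y^* f}.
  • If {\theta \in \Omega^1({\bf R}^d_E)} is a {1}-form, then {(Y^* \theta)_\alpha(a) = \theta_i(Y(a)) \partial_\alpha Y^i(a)}.
  • If {\omega \in \Omega^2({\bf R}^d_E)} is a {2}-form, then {(Y^* \omega)_{\alpha \beta}(a) = \omega_{ij}(Y(a)) \partial_\alpha Y^i(a) \partial_\beta Y^j(a)}.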

It is easy to see that pullback {Y^*} is a linear map from {\Omega^k({\bf R}^d_E)} to {\Omega^k({\bf R}^d_L)}. It also preserves the exterior algebra and exterior derivative:

Exercise 9 Let {\omega \in \Omega^k({\bf R}^d_E)} and {\theta \in \Omega^l({\bf R}^d_E)}. Show that

\displaystyle  Y^* (\omega \wedge \theta) = (Y^* \omega) \wedge (Y^* \theta),

and if {\omega} is continuously differentiable, show that

\displaystyle  Y^* d \omega = d Y^* \omega.

One can integrate {k}-forms on oriented {k}-manifolds. Suppose for instance that an oriented {k}-manifold {M \subset {\bf R}^d_E} has a parameterisation {\{ \phi(a): a \in U \}}, where {U} is an open subset of {{\bf R}^k_L} and {\phi: U \rightarrow {\bf R}^d_E} is an injective immersion. Then any continuous compactly supported {k}-form {\omega \in \Omega^k({\bf R}^d_E)} can be integrated on {M} by the formula

\displaystyle  \int_M \omega := \int_U \omega_{i_1 \dots i_k}(\phi(a))\ \partial_1 \phi^{i_1}(a) \dots \partial_k \phi^{i_k}(a)\ da

with the usual summation conventions. It can be shown that this definition is independent of the choice of (orientation-preserving) parameterisation. For a more general manifold {M}, one can use a partition of unity to decompose the integral {\int_M \omega} into integrals over parameterised manifolds, and define the total integral to be the sum of the components; again, one can show (after some tedious calculation) that this is independent of the choice of partition of unity and of parameterisations. If {M} is all of {{\bf R}^d_E} (parameterised by the identity map, with the standard orientation), and {f \in C_c({\bf R}^d_E \rightarrow {\bf R})}, then we have the identity

\displaystyle  \int_{{\bf R}^d_E} f\ d\mathrm{vol}_E = \int_{{\bf R}^d_E} f(x)\ dx \ \ \ \ \ (13)

linking integration on differential forms with the Lebesgue (or Riemann) integral. We also record Stokes’ theorem

\displaystyle  \int_{\partial \Omega} \omega = \int_{\Omega} d \omega \ \ \ \ \ (14)

whenever {\Omega} is a smooth orientable {k+1}-manifold with smooth boundary {\partial \Omega}, and {\omega} is a continuously differentiable, compactly supported {k}-form. The regularity conditions on {\Omega,\omega} here can often be relaxed by the usual limiting arguments; for the purposes of this set of notes, we shall proceed formally and assume that identities such as (14) hold for all manifolds {\Omega} and forms {\omega} under consideration.
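For {k=0}, Stokes’ theorem (14) reduces to the fundamental theorem of calculus along a curve: if {\Omega} is an oriented curve from {p} to {q}, then {\partial \Omega} consists of {q} with a plus sign and {p} with a minus sign, and (14) reads {\int_\Omega df = f(q) - f(p)}. The sketch below checks this using the parameterised integral of a {1}-form, {\int_0^1 \theta_i(\phi(s)) \frac{d}{ds} \phi^i(s)\ ds}, evaluated by Simpson’s rule; the curve {\phi(s) = (\cos s, \sin s, s^2)} and the function {f = x^1 x^2 + (x^3)^2} are hypothetical.

```python
import math


def integrate_one_form(theta, phi, dphi, s0, s1, steps=1000):
    """Integral of a 1-form theta over the curve phi([s0, s1]), by Simpson's rule (even steps).

    theta(x) returns the components (theta_1, ..., theta_d) at x; dphi(s) is the velocity of phi.
    """
    h = (s1 - s0) / steps

    def integrand(s):
        return sum(t_i * v_i for t_i, v_i in zip(theta(phi(s)), dphi(s)))

    total = integrand(s0) + integrand(s1)
    for n in range(1, steps):
        total += (4 if n % 2 else 2) * integrand(s0 + n * h)
    return total * h / 3


def f(x):
    return x[0] * x[1] + x[2] ** 2


def df(x):
    """Components of the 1-form df."""
    return (x[1], x[0], 2 * x[2])


def phi(s):
    return (math.cos(s), math.sin(s), s * s)


def dphi(s):
    return (-math.sin(s), math.cos(s), 2 * s)


lhs = f(phi(1.0)) - f(phi(0.0))                        # integral of f over the oriented boundary
rhs = integrate_one_form(df, phi, dphi, 0.0, 1.0)      # integral of df over the curve
print(lhs, rhs)
```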

From the change of variables formula we see that pullback also respects integration on manifolds, in that

\displaystyle  \int_{Y^* \Omega} Y^* \omega = \int_\Omega \omega \ \ \ \ \ (15)

whenever {\Omega} is a smooth orientable {k}-manifold, and {\omega} a continuous compactly supported {k}-form.

Exercise 10 Establish the identity

\displaystyle  Y^* d\mathrm{vol}_E = \mathrm{det}(\nabla Y) d\mathrm{vol}_L.

Conclude in particular that {Y} is volume-preserving if and only if

\displaystyle  Y^* d\mathrm{vol}_E = d\mathrm{vol}_L.
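For a linear map {Y(a) = Ma} one has {\partial_\alpha Y^i = M^i_\alpha} everywhere, so Exercise 10 reduces to the Leibniz expansion of the determinant; for a nonlinear {Y} the same computation applies pointwise with {M = \nabla Y(a)}. The sketch below evaluates the single independent component {(Y^* d\mathrm{vol}_E)_{1 \dots d}} directly from (11) and compares it with a determinant computed independently by Gaussian elimination; the matrices are hypothetical examples.

```python
from fractions import Fraction
from itertools import permutations


def permutation_sign(perm):
    """Sign of a permutation of (0, ..., d-1), by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def pullback_volume_component(m):
    """(Y^* dvol_E)_{1...d} from (11) for Y(a) = M a, where d_alpha Y^i = M[i][alpha].

    Only index tuples (i_1, ..., i_d) that are permutations contribute, since the
    components of dvol_E vanish whenever two indices agree.
    """
    total = 0
    for perm in permutations(range(len(m))):
        term = permutation_sign(perm)            # (dvol_E)_{i_1 ... i_d}
        for alpha, i in enumerate(perm):
            term *= m[i][alpha]                  # d_alpha Y^{i_alpha}
        total += term
    return total


def determinant(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in m]
    n, det = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return det


M = [[2, 1, 0], [0, 3, 1], [1, 0, 1]]           # det M = 7: not volume-preserving
SHEAR = [[1, 5, 0], [0, 1, 0], [0, 0, 1]]       # det = 1: volume-preserving
print(pullback_volume_component(M), determinant(M))
print(pullback_volume_component(SHEAR), determinant(SHEAR))
```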

Vector fields. Having pulled back differential forms, we now pull back vector fields. A vector field {Z} on {{\bf R}^d_E}, when viewed in coordinates, is a collection {Z^i: {\bf R}^d_E \rightarrow {\bf R}}, {i=1,\dots,d} of scalar functions; superficially, this resembles a {1}-form {\theta \in \Omega^1({\bf R}^d_E)}, except that we use superscripts {Z^i} instead of subscripts {\theta_i} to denote the components. On the other hand, we will transform vector fields under pullback in a different manner from {1}-forms. For each {i}, a basic example of a vector field is the coordinate vector field {\frac{d}{dx^i}}, defined by setting {(\frac{d}{dx^i})^j} to equal {1} when {i=j} and {0} otherwise. Then every vector field {Z} may be written as

\displaystyle  Z = Z^i \frac{d}{dx^i}

where we multiply scalar functions against vector fields in the obvious fashion. The space of all vector fields will be denoted {\Gamma(T {\bf R}^d_E)}. One can of course define vector fields on {{\bf R}^d_L} similarly.

The pullback {Y^* Z} of {Z} is defined to be the unique vector field {Y^* Z \in \Gamma(T {\bf R}^d_L)} such that

\displaystyle  Z^i(Y(a)) = (Y^* Z)^\alpha(a) \partial_\alpha Y^i(a) \ \ \ \ \ (16)

for all {a \in {\bf R}^d_L} (so that {Z} is the pushforward {Y_*(Y^* Z)} of {Y^* Z}). Equivalently, if {(\nabla Y)^{-1}} is the inverse matrix to the total differential {\nabla Y} (which we recall in coordinates is {(\nabla Y)^i_\alpha := \partial_\alpha Y^i}), so that

\displaystyle  ((\nabla Y)^{-1})^\alpha_i (\nabla Y)^i_\beta = \delta^\alpha_\beta, \quad (\nabla Y)^i_\alpha ((\nabla Y)^{-1})^\alpha_j = \delta^i_j

with {\delta} denoting the Kronecker delta, then

\displaystyle  (Y^* Z)^\alpha(a) = Z^i(Y(a)) ((\nabla Y)^{-1})^\alpha_i(a).

From the inverse function theorem one can also write

\displaystyle  ((\nabla Y)^{-1})^\alpha_i(a) = (\partial_i (Y^{-1})^\alpha)( Y(a) ),

thus {Z} is also the pullback of {Y^* Z} by {Y^{-1}}.
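The different transformation laws for {1}-forms and vector fields are what make scalars such as {\theta(Z) = \theta_i Z^i} behave well under pullback (this is the case {k=1} of Exercise 11(i) below). The sketch below checks this, together with (16) itself, in exact arithmetic for a hypothetical linear map {Y(a) = Ma} on {{\bf R}^2}, for which {\nabla Y = M} at every point; the particular {M}, {Z}, and {\theta} are illustrative.

```python
from fractions import Fraction

M = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]   # hypothetical map Y(a) = M a


def inverse_2x2(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


M_INV = inverse_2x2(M)


def Y(a):
    return [sum(M[i][al] * a[al] for al in range(2)) for i in range(2)]


def Z(x):
    """Hypothetical vector field on R^2_E, components Z^i."""
    return [x[1], x[0] * x[0]]


def theta(x):
    """Hypothetical 1-form on R^2_E, components theta_i."""
    return [x[0] + 1, 3 * x[1]]


def pullback_vector(field, a):
    """(Y^* Z)^alpha(a) = ((nabla Y)^{-1})^alpha_i(a) Z^i(Y(a))."""
    z = field(Y(a))
    return [sum(M_INV[al][i] * z[i] for i in range(2)) for al in range(2)]


def pullback_one_form(form, a):
    """(Y^* theta)_alpha(a) = theta_i(Y(a)) d_alpha Y^i(a), i.e. (11) with k = 1."""
    t = form(Y(a))
    return [sum(t[i] * M[i][al] for i in range(2)) for al in range(2)]


a = [Fraction(1, 2), Fraction(-3)]
pulled_z = pullback_vector(Z, a)
pushed_back = [sum(M[i][al] * pulled_z[al] for al in range(2)) for i in range(2)]  # right side of (16)
lhs = sum(t * z for t, z in zip(theta(Y(a)), Z(Y(a))))                   # theta(Z) at Y(a)
rhs = sum(t * z for t, z in zip(pullback_one_form(theta, a), pulled_z))  # (Y^* theta)(Y^* Z) at a
print(lhs, rhs)
```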

If {\omega \in \Omega^k({\bf R}^d_E)} is a {k}-form and {Z_1,\dots,Z_k \in \Gamma(T{\bf R}^d_E)} are vector fields, one can form the scalar field {\omega(Z_1,\dots,Z_k): {\bf R}^d \rightarrow {\bf R}} by the formula

\displaystyle  \omega(Z_1,\dots,Z_k) := \omega_{i_1 \dots i_k} Z^{i_1}_1 \dots Z^{i_k}_k.

Thus for instance if {\omega \in \Omega^2({\bf R}^d_E)} is a {2}-form and {Z,W \in \Gamma(T{\bf R}^d_E)} are vector fields, then

\displaystyle  \omega(Z,W) = \omega_{ij} Z^i W^j.

It is clear that {\omega(Z_1,\dots,Z_k)} is a totally antisymmetric multilinear function of {Z_1,\dots,Z_k}. If {\omega \in \Omega^k({\bf R}^d_E)} is a {k}-form for some {k \geq 1} and {Z \in \Gamma(T{\bf R}^d_E)} is a vector field, we define the contraction (or interior product) {Z \neg \omega \in \Omega^{k-1}({\bf R}^d_E)} in coordinates by the formula

\displaystyle  (Z \neg \omega)_{i_2 \dots i_k} := Z^{i_1} \omega_{i_1 \dots i_k}

or equivalently that

\displaystyle  (Z_1 \neg \omega)(Z_2,\dots,Z_k) = \omega(Z_1,\dots,Z_k)

for {Z_1,\dots,Z_k \in \Gamma(T{\bf R}^d_E)}. Thus for instance if {\omega \in \Omega^2({\bf R}^d_E)} is a {2}-form, and {Z \in \Gamma(T{\bf R}^d_E)} is a vector field, then {Z \neg \omega \in \Omega^1({\bf R}^d_E)} is the {1}-form

\displaystyle  (Z \neg \omega)_i = Z^j \omega_{ji}.

If {Z \in \Gamma(T{\bf R}^d_E)} is a vector field and {f \in \Omega^0({\bf R}^d_E)} is a continuously differentiable scalar field, then {Z \neg df = df(Z)} is just the directional derivative of {f} along the vector field {Z}:

\displaystyle  Z \neg df = Z^i \partial_i f.
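At a single point, these contraction formulas are plain index sums. The minimal Python sketch below uses arbitrary sample components for a {2}-form on {{\bf R}^3} and checks two things: that {(Z \neg \omega)(W) = \omega(Z,W)}, and that {(Z \neg \omega)(Z) = \omega(Z,Z) = 0} by antisymmetry. The helper names are hypothetical.

```python
def contract(Z, omega):
    """1-form Z -| omega with components (Z -| omega)_i = Z^j omega_{ji}."""
    d = len(Z)
    return [sum(Z[j] * omega[j][i] for j in range(d)) for i in range(d)]


def evaluate(omega, Z, W):
    """Scalar omega(Z, W) = omega_{ij} Z^i W^j."""
    d = len(Z)
    return sum(omega[i][j] * Z[i] * W[j] for i in range(d) for j in range(d))


# Arbitrary antisymmetric components omega_{ij} = -omega_{ji} of a 2-form on R^3.
omega = [[0.0, 1.5, -2.0],
         [-1.5, 0.0, 0.5],
         [2.0, -0.5, 0.0]]
Z = [1.0, 2.0, 3.0]
W = [-1.0, 0.0, 4.0]

theta = contract(Z, omega)                              # the 1-form Z -| omega
lhs = sum(theta[i] * W[i] for i in range(3))            # (Z -| omega)(W) -> -7.0
rhs = evaluate(omega, Z, W)                             # omega(Z, W)     -> -7.0
self_pairing = sum(theta[i] * Z[i] for i in range(3))   # omega(Z, Z) = 0
```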

The contraction {Z \neg \omega} is also denoted {\iota_Z \omega} in the literature. If one contracts a vector field {Z} against the standard volume form {d\mathrm{vol}_E}, one obtains a {d-1}-form which we will call (by slight abuse of notation) the Hodge dual {*Z} of {Z}:

\displaystyle  *Z := Z \neg d\mathrm{vol}_E.

This can easily be seen to be a bijection between vector fields and {d-1}-forms. The inverse of this operation will also be denoted by the Hodge star {*}:

\displaystyle  *(Z \neg d\mathrm{vol}_E) := Z.

In a similar spirit, the Hodge dual of a scalar field {f: {\bf R}^d_E \rightarrow {\bf R}} will be defined as the volume form

\displaystyle  *f := f d\mathrm{vol}_E

and conversely the Hodge dual of a volume form is a scalar field:

\displaystyle  *(f d\mathrm{vol}_E) = f.

More generally one can form a Hodge duality relationship between {k}-vector fields and {d-k}-forms for any {0 \leq k \leq d}, but we will not do so here as we will not have much use for the notion of a {k}-vector field for any {k>1}.
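In {d=3} the duality between vector fields and {2}-forms reduces to the Levi-Civita symbol. Since {d\mathrm{vol}_E} has components {\epsilon_{ijk}}, one has {(*Z)_{jk} = Z^i \epsilon_{ijk}}, and the inverse map is {(*\beta)^i = \frac{1}{2} \epsilon_{ijk} \beta_{jk}}. The minimal Python sketch below works at a single point, uses 0-based indices and arbitrary sample values, and checks that the two maps invert each other.

```python
def eps(i, j, k):
    """Levi-Civita symbol for 0-based indices in {0, 1, 2}: components of dvol_E."""
    return (i - j) * (j - k) * (k - i) // 2


def star_vector(Z):
    """2-form *Z := Z -| dvol_E, with components (*Z)_{jk} = Z^i eps_{ijk}."""
    return [[sum(Z[i] * eps(i, j, k) for i in range(3)) for k in range(3)]
            for j in range(3)]


def star_two_form(beta):
    """Vector field *beta: the unique Z with Z -| dvol_E = beta."""
    return [0.5 * sum(eps(i, j, k) * beta[j][k] for j in range(3) for k in range(3))
            for i in range(3)]


Z = [1.0, -2.0, 0.5]
beta = star_vector(Z)           # an antisymmetric 3x3 array
Z_again = star_two_form(beta)   # should reproduce Z

# The other direction, starting from an arbitrary antisymmetric beta.
beta2 = [[0.0, 3.0, -1.0], [-3.0, 0.0, 2.0], [1.0, -2.0, 0.0]]
beta2_again = star_vector(star_two_form(beta2))
```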

These operations behave well under pullback (if one assumes volume preservation in the case of the Hodge star):

Exercise 11

  • (i) If {\omega \in \Omega^k({\bf R}^d_E)} and {Z_1,\dots,Z_k \in \Gamma(T{\bf R}^d_E)}, show that

    \displaystyle  Y^* ( \omega(Z_1,\dots,Z_k) ) = (Y^* \omega)(Y^* Z_1, \dots, Y^* Z_k).

  • (ii) If {\omega \in \Omega^k({\bf R}^d_E)} for some {k \geq 1} and {Z \in \Gamma(T{\bf R}^d_E)}, show that

    \displaystyle  Y^*( Z \neg \omega ) = (Y^* Z) \neg (Y^* \omega).

  • (iii) If {Y} is volume-preserving, show that

    \displaystyle  Y^*( * T ) = * Y^* T

    whenever {T} is a scalar field, vector field, {d-1}-form, or {d}-form on {{\bf R}^d_E}.

Riemannian metrics. A Riemannian metric {g} on {{\bf R}^d_E}, when expressed in coordinates, is a collection of scalar functions {g_{ij}: {\bf R}^d_E \rightarrow {\bf R}} such that for each point {x \in {\bf R}^d_E}, the matrix {(g_{ij}(x))_{1 \leq i,j \leq d}} is symmetric and strictly positive definite. In particular it has an inverse metric {g^{-1}}, which is a collection of scalar functions {(g^{-1})^{ij}(x) = g^{ij}(x)} such that

\displaystyle  g^{ij} g_{jk} = \delta^i_k

where {\delta} denotes the Kronecker delta; here we have abused notation (and followed the conventions of general relativity) by allowing the inverse sign on the metric to be omitted when expressed in coordinates (relying instead on the superscripting of the indices, as opposed to subscripting, to indicate the metric inversion). The Euclidean metric {\eta} is an example of a metric tensor, with {\eta_{ij}} equal to {1} when {i=j} and zero otherwise; the coefficients {\eta^{ij}} of the inverse Euclidean metric {\eta^{-1}} are similarly equal to {1} when {i=j} and {0} otherwise. Given two vector fields {Z,W \in \Gamma(T{\bf R}^d_E)} and a Riemannian metric {g}, we can form the scalar field {g(Z,W)} by

\displaystyle  g(Z,W) := g_{ij} Z^i W^j;

this is a symmetric bilinear form in {Z,W}.

We can define the pullback metric {Y^* g} by the formula

\displaystyle  (Y^* g)_{\alpha \beta}(a) := g_{ij}(Y(a)) \partial_\alpha Y^i(a) \partial_\beta Y^j(a); \ \ \ \ \ (17)

this is easily seen to be a Riemannian metric on {{\bf R}^d_L}, and one has the compatibility property

\displaystyle  Y^*( g(Z,W) ) = (Y^* g)(Y^* Z, Y^* W)

for all {Z,W \in \Gamma(T{\bf R}^d_E)}. It is then not difficult to check that if we pull back the inverse metric {g^{-1}} by the formula

\displaystyle  (Y^*(g^{-1}))^{\alpha \beta}(a) := g^{ij}(Y(a)) ((\nabla Y)^{-1})^\alpha_i(a) ((\nabla Y)^{-1})^\beta_j(a)

then we have the expected relationship

\displaystyle  Y^*(g^{-1}) = (Y^* g)^{-1}.
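At each point, the identity {Y^*(g^{-1}) = (Y^* g)^{-1}} is the matrix identity {(J^T G J)^{-1} = J^{-1} G^{-1} J^{-T}}, where {J = \nabla Y}. The minimal Python sketch below checks this numerically in {d=2}. The metric {g} and the map {Y} are arbitrary illustrative choices; {g} is positive definite everywhere because its determinant is {2 + 2(x^1)^2 + (x^2)^2}. Derivatives are taken by central differences.

```python
import math


def g(x):
    """Illustrative Riemannian metric on R^2_E (symmetric, positive definite)."""
    x1, x2 = x
    return [[1.0 + x1 * x1, x1 * x2], [x1 * x2, 2.0 + x2 * x2]]


def Y(a):
    """Illustrative local diffeomorphism from R^2_L to R^2_E."""
    a1, a2 = a
    return [a1 + 0.3 * math.sin(a2), a2 + 0.2 * a1 * a1]


def jacobian(F, a, h=1e-6):
    """Central differences: J[i][alpha] ~ d_alpha F^i(a)."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for alpha in range(2):
        ap, am = list(a), list(a)
        ap[alpha] += h
        am[alpha] -= h
        Fp, Fm = F(ap), F(am)
        for i in range(2):
            J[i][alpha] = (Fp[i] - Fm[i]) / (2 * h)
    return J


def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (s, t) = M
    det = p * t - q * s
    return [[t / det, -q / det], [-s / det, p / det]]


a = [0.4, -1.1]
J = jacobian(Y, a)
Jinv = inv2(J)
G = g(Y(a))
Ginv = inv2(G)

# (17): (Y^* g)_{ab} = g_{ij}(Y(a)) d_a Y^i d_b Y^j
pull_g = [[sum(G[i][j] * J[i][al] * J[j][be] for i in range(2) for j in range(2))
           for be in range(2)] for al in range(2)]

# (Y^* g^{-1})^{ab} = g^{ij}(Y(a)) ((nabla Y)^{-1})^a_i ((nabla Y)^{-1})^b_j
pull_ginv = [[sum(Ginv[i][j] * Jinv[al][i] * Jinv[be][j] for i in range(2) for j in range(2))
              for be in range(2)] for al in range(2)]

expected = inv2(pull_g)  # (Y^* g)^{-1}
```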

Exercise 12 If {\pi: {\bf R}^d_L \rightarrow {\bf R}^d_L} is a diffeomorphism, show that

\displaystyle  (Y \circ \pi)^* \omega = \pi^* Y^* \omega

for any {\omega \in \Omega^k({\bf R}^d_E)}, and similarly

\displaystyle  (Y \circ \pi)^* Z = \pi^* Y^* Z

for any {Z \in \Gamma(T{\bf R}^d_E)}, and

\displaystyle  (Y \circ \pi)^* g = \pi^* Y^* g

for any Riemannian metric {g}.

Exercise 13 Show that {Y: {\bf R}^d_L \rightarrow {\bf R}^d_E} is an isometry (with respect to the Euclidean metric on both {{\bf R}^d_L} and {{\bf R}^d_E}) if and only if {Y^* \eta = \eta}.

Every Riemannian metric {g} induces a musical isomorphism between vector fields on {{\bf R}^d_E} and {1}-forms: if {Z \in \Gamma(T {\bf R}^d_E)} is a vector field, the associated {1}-form {g \cdot Z \in \Omega^1({\bf R}^d_E)} (also denoted {Z^\flat_g} or simply {Z^\flat}) is defined in coordinates as

\displaystyle  (g \cdot Z)_i := g_{ij} Z^j

and similarly if {\theta \in \Omega^1({\bf R}^d_E)}, the associated vector field {g^{-1} \cdot \theta \in \Gamma(T{\bf R}^d_E)} (also denoted {\theta^\sharp_g} or {\theta^\sharp}) is defined in coordinates as

\displaystyle  (g^{-1} \cdot \theta)^i := g^{ij} \theta_j.

These operations clearly invert each other: {g^{-1} \cdot g \cdot Z = Z} and {g \cdot g^{-1} \cdot \theta = \theta}. Note that {g \cdot Z} can still be defined if {g} is not positive definite, though it might not be an isomorphism in this case. Observe the identities

\displaystyle  g(W,Z) = W \neg (g \cdot Z) = Z \neg (g \cdot W) = (g \cdot Z)(W) = (g \cdot W)(Z). \ \ \ \ \ (18)
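At a single point, the musical isomorphism is multiplication by the matrix {(g_{ij})} or by its inverse. The Python sketch below lowers and then raises an index and checks (18). The metric components and vectors are arbitrary sample values, not taken from the text.

```python
def flat(g, Z):
    """(g . Z)_i = g_{ij} Z^j."""
    return [sum(g[i][j] * Z[j] for j in range(len(Z))) for i in range(len(Z))]


def sharp(ginv, theta):
    """(g^{-1} . theta)^i = g^{ij} theta_j."""
    return [sum(ginv[i][j] * theta[j] for j in range(len(theta))) for i in range(len(theta))]


def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (s, t) = M
    det = p * t - q * s
    return [[t / det, -q / det], [-s / det, p / det]]


g = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, positive definite (det = 5)
ginv = inv2(g)
Z = [1.0, -2.0]
W = [0.5, 4.0]

Z_flat = flat(g, Z)            # [0.0, -5.0]
Z_back = sharp(ginv, Z_flat)   # g^{-1} . g . Z, should equal Z

g_WZ = sum(g[i][j] * W[i] * Z[j] for i in range(2) for j in range(2))  # g(W, Z)
via_Z = sum(Z_flat[i] * W[i] for i in range(2))                         # (g . Z)(W)
via_W = sum(flat(g, W)[i] * Z[i] for i in range(2))                     # (g . W)(Z)
```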

The musical isomorphism interacts well with pullback, provided that one also pulls back the metric {g}:

Exercise 14 If {g} is a Riemannian metric, show that

\displaystyle  Y^*( g \cdot Z ) = Y^* g \cdot Y^* Z

for all {Z \in \Gamma(T {\bf R}^d_E)}, and

\displaystyle  Y^*( g^{-1} \cdot \theta ) = (Y^* g)^{-1} \cdot Y^* \theta.

for all {\theta \in \Omega^1({\bf R}^d_E)}.

We can now interpret some classical operations on vector fields in this differential geometry notation. For instance, if {Z,W \in \Gamma(T{\bf R}^d_E)} are vector fields, the dot product {Z \cdot W: {\bf R}^d_E \rightarrow {\bf R}} can be written as

\displaystyle  Z \cdot W = \eta(Z,W) = Z \neg (\eta \cdot W) = (\eta \cdot W)(Z)

and also

\displaystyle  Z \cdot W = *( (\eta \cdot Z) \wedge *W ),

and for {d=3}, the cross product {Z \times W: {\bf R}^3_E \rightarrow {\bf R}^3_E} can be written in differential geometry notation as

\displaystyle  Z \times W = *((\eta \cdot Z) \wedge (\eta \cdot W)).
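These two formulas can be checked in components. The Python sketch below works at a single point in {d=3} with arbitrary sample vectors and 0-based indices. It assumes conventions consistent with {\omega(Z,W) = \omega_{ij} Z^i W^j}: {(\theta \wedge \phi)_{ij} = \theta_i \phi_j - \theta_j \phi_i} for {1}-forms; the {d\mathrm{vol}_E}-coefficient of {\theta \wedge \beta}, for a {1}-form {\theta} and a {2}-form {\beta}, is {\theta_1 \beta_{23} + \theta_2 \beta_{31} + \theta_3 \beta_{12}}; and {(*\beta)^i = \frac{1}{2} \epsilon_{ijk} \beta_{jk}}. Since {\eta} is the identity matrix, {\eta \cdot Z} has the same components as {Z}.

```python
def eps(i, j, k):
    """Levi-Civita symbol (0-based), i.e. the components of dvol_E."""
    return (i - j) * (j - k) * (k - i) // 2


def lower(Z):
    """eta . Z: with the Euclidean metric the components are unchanged."""
    return list(Z)


def wedge11(theta, phi):
    """(theta ^ phi)_{ij} = theta_i phi_j - theta_j phi_i."""
    return [[theta[i] * phi[j] - theta[j] * phi[i] for j in range(3)] for i in range(3)]


def star_vector(W):
    """*W = W -| dvol_E, a 2-form with components W^i eps_{ijk}."""
    return [[sum(W[i] * eps(i, j, k) for i in range(3)) for k in range(3)] for j in range(3)]


def star_two_form(beta):
    """Vector field Z with Z -| dvol_E = beta."""
    return [0.5 * sum(eps(i, j, k) * beta[j][k] for j in range(3) for k in range(3))
            for i in range(3)]


def star_wedge12(theta, beta):
    """*(theta ^ beta) for a 1-form theta and 2-form beta: the coefficient of dvol_E."""
    return theta[0] * beta[1][2] + theta[1] * beta[2][0] + theta[2] * beta[0][1]


Z = [1.0, 2.0, -1.0]
W = [3.0, -0.5, 4.0]

cross_forms = star_two_form(wedge11(lower(Z), lower(W)))   # *((eta.Z) ^ (eta.W))
dot_forms = star_wedge12(lower(Z), star_vector(W))         # *((eta.Z) ^ *W)

cross_direct = [Z[1] * W[2] - Z[2] * W[1],
                Z[2] * W[0] - Z[0] * W[2],
                Z[0] * W[1] - Z[1] * W[0]]                 # [7.5, -7.0, -6.5]
dot_direct = sum(z * w for z, w in zip(Z, W))              # -2.0
```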

Exercise 15 Formulate a definition for the pullback {Y^* T} of a rank {(k,l)} tensor field {T} (which in coordinates would be given by {T^{i_1 \dots i_k}_{j_1 \dots j_l}} for {i_1,\dots,i_k,j_1,\dots,j_l \in \{1,\dots,d\}}) that generalises the pullback of differential forms, vector fields, and Riemannian metrics. Argue why your definition is the natural one.

Lie derivatives. If {Z \in \Gamma(T {\bf R}^d_E)} is a continuously differentiable vector field and {\omega \in \Omega^k( {\bf R}^d_E )} is a continuously differentiable {k}-form, we define the Lie derivative {{\mathcal L}_Z \omega \in \Omega^k({\bf R}^d_E)} of {\omega} along {Z} by the Cartan formula

\displaystyle  {\mathcal L}_Z \omega := Z \neg d\omega + d(Z \neg \omega) \ \ \ \ \ (19)

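with the convention that {d(Z \neg \omega)} vanishes if {\omega} is a {0}-form. Thus for instance:

  • (i) If {f \in \Omega^0({\bf R}^d_E)} is a continuously differentiable scalar field, then {{\mathcal L}_Z f = Z \neg df} is just the directional derivative of {f} along {Z}:

    \displaystyle  {\mathcal L}_Z f = Z^i \partial_i f.

  • (ii) If {\theta \in \Omega^1({\bf R}^d_E)} is a continuously differentiable {1}-form, then {{\mathcal L}_Z \theta} is the {1}-form

    \displaystyle  ({\mathcal L}_Z \theta)_i = Z^j \partial_j \theta_i + (\partial_i Z^j) \theta_j. \ \ \ \ \ (20)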

One can interpret the Lie derivative as the infinitesimal version of pullback:

Exercise 16 Let {u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E} be smooth and bounded (so that {u(t)} can be viewed as a smooth vector field on {{\bf R}^d_E} for each {t}), and let {X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E} be a trajectory map. If {\omega \in \Omega^k({\bf R}^d_E)} is a smooth {k}-form, show that

\displaystyle  \partial_t (X(t)^* \omega) = X(t)^*( {\mathcal L}_{u(t)} \omega ).

More generally, if {\omega(t) \in \Omega^k({\bf R}^d_E)} is a smooth {k}-form that varies smoothly in {t}, show that

\displaystyle  \partial_t (X(t)^* \omega(t)) = X(t)^*( {\mathcal D}_t \omega(t) )

where {{\mathcal D}_t} denotes the material Lie derivative

\displaystyle  {\mathcal D}_t := \partial_t + {\mathcal L}_{u(t)}.

Note that the material Lie derivative specialises to the material derivative when applied to scalar fields. The above exercise shows that the trajectory map intertwines the ordinary time derivative {\partial_t} with the material (Lie) derivative.

Remark 17 If one lets {(t,x) \mapsto \exp(tZ) x} be the trajectory map associated to a time-independent vector field {Z} with initial condition (4) (thus {\exp(0 Z) x = x} and {\frac{d}{dt} \exp(tZ) x = Z( \exp(tZ) x)}), then the above exercise shows that {{\mathcal L}_Z \omega = \frac{d}{dt} \exp(tZ)^* \omega|_{t=0}} for any differential form {\omega}. This can be used as an alternate definition of the Lie derivative {{\mathcal L}_Z}. It has the advantage of readily extending to tensors other than differential forms, for which the Cartan formula is not available.

The Lie derivative behaves very well with respect to exterior product and exterior derivative:

Exercise 18 Let {Z \in \Gamma(T {\bf R}^d_E)} be continuously differentiable, and let {\omega \in \Omega^k({\bf R}^d_E), \alpha \in \Omega^l({\bf R}^d_E)} also be continuously differentiable. Establish the Leibniz rule

\displaystyle  {\mathcal L}_Z( \omega \wedge \alpha ) = ({\mathcal L}_Z \omega) \wedge \alpha + \omega \wedge {\mathcal L}_Z \alpha.

If {\omega} is twice continuously differentiable, also establish the commutativity

\displaystyle  {\mathcal L}_Z d \omega = d {\mathcal L}_Z \omega

of exterior derivative and Lie derivative.

Exercise 19 Let {Z \in \Gamma(T {\bf R}^d_E)} be continuously differentiable. Show that

\displaystyle  {\mathcal L}_Z d\mathrm{vol}_E = \mathrm{div}(Z) d\mathrm{vol}_E

where {\mathrm{div}(Z) := \partial_i Z^i} is the divergence of {Z}. Use this and Exercise 16 to give an alternate proof of Lemma 3.

Exercise 20 Let {Z \in \Gamma(T {\bf R}^d_E)} be continuously differentiable. For any smooth compactly supported volume form {\omega}, show that

\displaystyle  \int_{{\bf R}^d_E} {\mathcal L}_Z \omega = 0.

Conclude in particular that if {Z} is divergence-free then

\displaystyle  \int_{{\bf R}^d_E} ({\mathcal L}_Z f) d\mathrm{vol}_E = 0

for any {f\in C^\infty_c({\bf R}^d_E \rightarrow {\bf R})}.

The Lie derivative {{\mathcal L}_Z W \in \Gamma(T {\bf R}^d_E) } of a continuously differentiable vector field {W \in\Gamma(T {\bf R}^d_E)} is defined in coordinates as

\displaystyle  ({\mathcal L}_Z W)^i := Z^j \partial_j W^i - W^j \partial_j Z^i

and the Lie derivative {{\mathcal L}_Z g} of a continuously differentiable rank {(0,2)} tensor {g} is defined in coordinates as

\displaystyle  ({\mathcal L}_Z g)_{ij} := Z^k \partial_k g_{ij} + (\partial_i Z^k) g_{kj} + (\partial_j Z^k) g_{ik}.

Thus for instance the Lie derivative of the Euclidean metric {\eta_{ij}} is expressible in coordinates as

\displaystyle  ({\mathcal L}_Z \eta)_{ij} = \partial_i Z^j + \partial_j Z^i \ \ \ \ \ (21)

(compare with the deformation tensor used in Notes 0).

We have similar properties to Exercise 18:

Exercise 21 Let {Z \in \Gamma(T {\bf R}^d_E)} be continuously differentiable.

Exercise 22 If {Z \in \Gamma(T {\bf R}^d_E)} is continuously differentiable, establish the identity

\displaystyle  Y^* {\mathcal L}_Z \phi = {\mathcal L}_{Y^* Z} Y^* \phi

whenever {\phi} is a continuously differentiable differential form, vector field, or metric tensor.

Exercise 23 If {Z,W \in \Gamma(T {\bf R}^d_E)} are smooth, define the Lie bracket {[Z,W] \in \Gamma(T {\bf R}^d_E)} by the formula

\displaystyle  [Z,W] := {\mathcal L}_Z W.

Establish the anti-symmetry {[Z,W] = -[W,Z]} (so in particular {[Z,Z]=0}) and the Jacobi identity

\displaystyle  [[Z_1,Z_2],Z_3] + [[Z_2,Z_3],Z_1] + [[Z_3,Z_1],Z_2] = 0,

and also

\displaystyle  {\mathcal L}_{[Z,W]} \phi = {\mathcal L}_Z {\mathcal L}_W \phi - {\mathcal L}_W {\mathcal L}_Z \phi

whenever {Z,W,Z_1,Z_2,Z_3 \in \Gamma(T {\bf R}^d_E)} are smooth, and {\phi} is a smooth differential form, vector field, or metric tensor.

Exercise 24 Formulate a definition for the Lie derivative {{\mathcal L}_Z T} of a (continuously differentiable) rank {(k,l)} tensor field {T} along a vector field {Z} that generalises the Lie derivative of differential forms, vector fields, and Riemannian metrics. Argue why your definition is the natural one.

— 2. The Euler equations in differential geometry notation —

Now we write the Euler equations (1) in the differential geometry language developed in the previous section. This will make it relatively painless to change coordinates. As in the rest of the notes, we work formally, assuming that all fields are smooth enough to justify the manipulations below.

The Euler equations involve a time-dependent scalar field {p(t)}, which can be viewed as an element of {\Omega^0({\bf R}^d_E)}, and a time-dependent velocity field {u(t)}, which can be viewed as an element of {\Gamma(T {\bf R}^d_E)}. The second of the Euler equations simply asserts that this vector field is divergence-free:

\displaystyle  \mathrm{div} u(t) = 0

or equivalently (by Exercise 19 and the definition of material Lie derivative {{\mathcal D}_t = \partial_t + {\mathcal L}_u})

\displaystyle  {\mathcal D}_t d\mathrm{vol}_E = 0.

For the first equation, it is convenient to work instead with the covelocity field {v(t) \in \Omega^1({\bf R}^d_E)}, formed by applying the Euclidean musical isomorphism to {u(t)}:

\displaystyle  v(t) = \eta \cdot u(t).

In coordinates, we have {v_i = u^i}. The first Euler equation can then be written in coordinates as

\displaystyle  \partial_t v_i + u^j \partial_j v_i = -\partial_i p.

The left-hand side is close to the {i} component of the material Lie derivative {{\mathcal D}_t v = \partial_t v + {\mathcal L}_u v} of {v}. Indeed, from (20) we have

\displaystyle  ({\mathcal D}_t v)_i = \partial_t v_i + u^j \partial_j v_i + (\partial_i u^j) v_j

and so the first Euler equation becomes

\displaystyle  ({\mathcal D}_t v)_i = - \partial_i p + (\partial_i u^j) v_j.

Since {u^j = v_j}, we can express the right-hand side as a total derivative {- \partial_i \tilde p}, where {\tilde p} is the modified pressure

\displaystyle  \tilde p := p - \frac{1}{2} u^j v_j = p - \frac{1}{2} \eta(u,u).
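The step from (20) to the modified pressure can be sanity-checked numerically. The Python sketch below takes an arbitrary smooth field {u} on {{\bf R}^3}, which need not be divergence-free for these identities, and sets {v = \eta \cdot u}, so that {v_i = u^i}. It then evaluates {{\mathcal L}_u v} at one point in two ways: from the Cartan formula (19), using {(dv)_{ij} = \partial_i v_j - \partial_j v_i}, and from the coordinate formula (20). It also checks that {(\partial_i u^j) v_j = \frac{1}{2} \partial_i (u^j v_j)}. The field, the point, and the finite-difference step are illustrative choices.

```python
import math


def u(x):
    """Illustrative smooth vector field on R^3_E; v = eta . u has v_i = u^i."""
    x1, x2, x3 = x
    return [math.sin(x2), x1 * x3, math.cos(x1) + x2 * x2]


def jacobian(F, x, h=1e-6):
    """Central differences: J[i][k] ~ d_k F^i(x)."""
    J = [[0.0] * 3 for _ in range(3)]
    for k in range(3):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(3):
            J[i][k] = (Fp[i] - Fm[i]) / (2 * h)
    return J


def gradient(f, x, h=1e-6):
    """Central differences: grad[k] ~ d_k f(x)."""
    grad = []
    for k in range(3):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def u_dot_v(y):
    """The scalar u -| v = u^j v_j = |u|^2."""
    return sum(c * c for c in u(y))


x = [0.3, -0.8, 1.2]
U = u(x)
Du = jacobian(u, x)  # Du[j][k] ~ d_k u^j = d_k v_j

# Cartan formula (19): L_u v = u -| dv + d(u -| v), with (dv)_{ij} = d_i v_j - d_j v_i.
interior = [sum(U[i] * (Du[j][i] - Du[i][j]) for i in range(3)) for j in range(3)]
exact_part = gradient(u_dot_v, x)
cartan = [interior[j] + exact_part[j] for j in range(3)]

# Coordinate formula (20): (L_u v)_j = u^i d_i v_j + (d_j u^i) v_i.
coords = [sum(U[i] * Du[j][i] + Du[i][j] * U[i] for i in range(3)) for j in range(3)]

# The term (d_j u^i) v_i equals (1/2) d_j (u^i v_i), which produces the modified pressure.
extra = [sum(Du[i][j] * U[i] for i in range(3)) for j in range(3)]
half_grad = [0.5 * c for c in exact_part]
```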

We thus see that the Euler equations can be transformed to the system

\displaystyle  {\mathcal D}_t v = - d \tilde p \ \ \ \ \ (22)

\displaystyle  v = \eta \cdot u \ \ \ \ \ (23)

\displaystyle  {\mathcal D}_t d\mathrm{vol}_E = 0. \ \ \ \ \ (24)

Using the Cartan formula (19), one can also write (22) as

\displaystyle  \partial_t v + u \neg dv = - d p' \ \ \ \ \ (25)

where {p'} is another modification of the pressure:

\displaystyle  p' = \tilde p + u \neg v = p + \frac{1}{2} \eta(u,u).

In coordinates, (25) becomes

\displaystyle  \partial_t v_j + u^i (\partial_i v_j - \partial_j v_i) = - \partial_j p'. \ \ \ \ \ (26)

One advantage of the formulation (22)-(24) is that one can pull back by an arbitrary diffeomorphic change of coordinates (both time-dependent and time-independent), with the only things potentially changing being the material Lie derivative {{\mathcal D}_t}, the metric {\eta}, and the volume form {d\mathrm{vol}_E}. (Another, related, advantage is that this formulation readily suggests an extension to more general Riemannian manifolds, by replacing {\eta} with a general Riemannian metric and {d\mathrm{vol}_E} with the associated volume form, without the need to explicitly introduce other Riemannian geometry concepts such as covariant derivatives or Christoffel symbols.)

For instance, suppose {d=3}, and we wish to view the Euler equations in cylindrical coordinates {(r,\theta,z) \in [0,+\infty) \times {\bf R}/2\pi {\bf Z} \times {\bf R}}, thus pulling back under the time-independent map {Y: [0,+\infty) \times {\bf R}/2\pi {\bf Z} \times {\bf R} \rightarrow {\bf R}^3_E} defined by

\displaystyle  Y(r, \theta,z) := (r \cos \theta, r \sin \theta, z ).

Strictly speaking, this is not a diffeomorphism due to singularities at {r=0}, but we ignore this issue for now by only working away from the {z} axis {r=0}. As is well known, the metric {d\eta^2 = (dx^1)^2 + (dx^2)^2 + (dx^3)^2} pulls back under this change of coordinates {x^1 = r \cos \theta, x^2 = r \sin \theta, x^3 = z} as

\displaystyle  d(Y^* \eta)^2 = dr^2 + r^2 d\theta^2 + dz^2,

thus the pullback metric {Y^* \eta} is diagonal in {r,\theta,z} coordinates with entries

\displaystyle  (Y^* \eta)_{rr} = 1; \quad (Y^* \eta)_{\theta \theta} = r^2; \quad (Y^* \eta)_{zz} = 1.

The volume form {d\mathrm{vol}_E = dx^1 \wedge dx^2 \wedge dx^3} similarly pulls back to the familiar cylindrical coordinate volume form

\displaystyle  Y^* d\mathrm{vol}_E = r dr \wedge d \theta \wedge dz.
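Both pullback formulas follow from the Jacobian of {Y}. By (17), the pulled-back metric has matrix {(\nabla Y)^T (\nabla Y)}, and the volume form picks up the factor {\det \nabla Y = r}. Below is a minimal numerical check in Python at an arbitrary point away from the {z} axis, with derivatives approximated by central differences.

```python
import math


def Y(a):
    """Cylindrical coordinates (r, theta, z) -> (x^1, x^2, x^3)."""
    r, theta, z = a
    return [r * math.cos(theta), r * math.sin(theta), z]


def jacobian(F, a, h=1e-6):
    """Central differences: J[i][alpha] ~ d_alpha F^i(a)."""
    J = [[0.0] * 3 for _ in range(3)]
    for al in range(3):
        ap, am = list(a), list(a)
        ap[al] += h
        am[al] -= h
        Fp, Fm = F(ap), F(am)
        for i in range(3):
            J[i][al] = (Fp[i] - Fm[i]) / (2 * h)
    return J


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


a = [1.7, 2.4, -0.5]  # a point with r > 0
J = jacobian(Y, a)

# (Y^* eta)_{alpha beta} = sum_i d_alpha Y^i d_beta Y^i, since eta_{ij} is the identity.
pulled_metric = [[sum(J[i][al] * J[i][be] for i in range(3)) for be in range(3)]
                 for al in range(3)]
volume_factor = det3(J)  # coefficient of dr ^ dtheta ^ dz
```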

If (by slight abuse of notation) we write the components of {Y^* u} as {u^r, u^\theta, u^z}, and the components of {Y^* v} as {v_r, v_\theta, v_z}, then the second equation (23) in the current formulation of the Euler equations now becomes

\displaystyle  v_r = u^r; \quad v_\theta = r^2 u^\theta; \quad v_z = u^z \ \ \ \ \ (27)

and the third equation (24) is

\displaystyle  {\mathcal L}_u ( r dr \wedge d \theta \wedge dz ) = 0

which by the product rule and Exercise 19 becomes

\displaystyle  {\mathcal L}_u(r) + r \mathrm{div} u = 0

or after expanding in coordinates

\displaystyle  u^r + r (\partial_r u^r + \partial_\theta u^\theta + \partial_z u^z) = 0.

If one substitutes (27) into the coordinate form (26), written in the {r,\theta,z} coordinates, to eliminate the {v} variables, we see that the cylindrical coordinate form of the Euler equations is

\displaystyle  \partial_t u^r + u^\theta (\partial_\theta u^r - \partial_r(r^2 u^\theta)) + u^z (\partial_z u^r - \partial_r u^z) = - \partial_r p' \ \ \ \ \ (28)

\displaystyle  \partial_t (r^2 u^\theta) + u^r (\partial_r (r^2 u^\theta) - \partial_\theta u^r) + u^z (\partial_z (r^2 u^\theta) - \partial_\theta u^z) = - \partial_\theta p' \ \ \ \ \ (29)

\displaystyle  \partial_t u^z + u^r (\partial_r u^z - \partial_z u^r) + u^\theta (\partial_\theta u^z - \partial_z (r^2 u^\theta)) = - \partial_z p' \ \ \ \ \ (30)

\displaystyle  u^r + r (\partial_r u^r + \partial_\theta u^\theta + \partial_z u^z) = 0. \ \ \ \ \ (31)

One should compare how readily one can derive these equations using the differential geometry formalism with the more pedestrian approach using the chain rule:

Exercise 25 Starting with a smooth solution {(u,p)} to the Euler equations (1) in {{\bf R}^3_E}, and transforming to cylindrical coordinates {(r,\theta,z)}, establish the chain rule formulae

\displaystyle  u^1 = u^r \cos \theta - r u^\theta \sin \theta

\displaystyle  u^2 = u^r \sin \theta + r u^\theta \cos \theta

\displaystyle  u^3 = u^z

\displaystyle  \partial_1 = \cos \theta \partial_r - \frac{\sin \theta}{r} \partial_\theta

\displaystyle  \partial_2 = \sin \theta \partial_r + \frac{\cos \theta}{r} \partial_\theta

\displaystyle  \partial_3 = \partial_z

and use this and the identity

\displaystyle  p' := p + \frac{1}{2} (u^1 u^1 + u^2 u^2 + u^3 u^3)

to rederive the system (28)-(31) (away from the {z} axis) without using the language of differential geometry.

Exercise 26 Turkington coordinates {(x,y,\zeta) \in {\bf R} \times [0,+\infty) \times {\bf R}/2\pi {\bf Z}} are a variant of cylindrical coordinates {(r,\theta,z) \in [0,+\infty) \times {\bf R}/2\pi{\bf Z} \times {\bf R}}, defined by the formulae

\displaystyle  (x,y,\zeta) := (z,r^2/2, \theta);

the advantage of these coordinates is that the map from Cartesian coordinates {(x^1,x^2,x^3)} to Turkington coordinates {(x,y,\zeta)} is volume preserving. Show that in these coordinates, the Euler equations become

\displaystyle  \partial_t u^x + u^y (\partial_y u^x - \partial_x(\frac{u^y}{2y})) + u^\zeta (\partial_\zeta u^x - \partial_x(2y u^\zeta)) = - \partial_x p'

\displaystyle  \partial_t (\frac{u^y}{2y}) + u^x (\partial_x (\frac{u^y}{2y}) - \partial_y u^x) + u^\zeta (\partial_\zeta (\frac{u^y}{2y}) - \partial_y (2y u^\zeta)) = - \partial_y p'

\displaystyle  \partial_t (2yu^\zeta) + u^x (\partial_x (2yu^\zeta) - \partial_\zeta u^x) + u^y (\partial_y (2yu^\zeta) - \partial_\zeta (\frac{u^y}{2y})) = - \partial_\zeta p'

\displaystyle  \partial_x u^x + \partial_y u^y + \partial_\zeta u^\zeta = 0.

(These coordinates are particularly useful for studying solutions to Euler that are “axisymmetric with swirl”, in the sense that the fields {u^x, u^y, u^\zeta, p'} do not depend on the {\zeta} variable, so that all the terms involving {\partial_\zeta} vanish; one can specialise further to the case of solutions that are “axisymmetric without swirl”, in which case {u^\zeta} also vanishes.)
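The volume-preservation claim, and the factors {2y} and {\frac{1}{2y}} in the equations above, can be traced to the pulled-back Euclidean metric. Writing {x^1 = \sqrt{2y} \cos \zeta, x^2 = \sqrt{2y} \sin \zeta, x^3 = x}, one finds the metric {(dx)^2 + \frac{(dy)^2}{2y} + 2y (d\zeta)^2} and Jacobian determinant {1}. The Python sketch below checks both numerically at one arbitrary point with {y > 0}. It only illustrates these two facts and is not a solution of the exercise.

```python
import math


def Y(a):
    """Turkington coordinates (x, y, zeta) -> Cartesian (x^1, x^2, x^3), for y > 0."""
    x, y, zeta = a
    rho = math.sqrt(2.0 * y)  # rho equals r, since y = r^2 / 2
    return [rho * math.cos(zeta), rho * math.sin(zeta), x]


def jacobian(F, a, h=1e-6):
    """Central differences: J[i][alpha] ~ d_alpha F^i(a)."""
    J = [[0.0] * 3 for _ in range(3)]
    for al in range(3):
        ap, am = list(a), list(a)
        ap[al] += h
        am[al] -= h
        Fp, Fm = F(ap), F(am)
        for i in range(3):
            J[i][al] = (Fp[i] - Fm[i]) / (2 * h)
    return J


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


a = [0.2, 0.9, 1.3]  # (x, y, zeta) with y > 0
J = jacobian(Y, a)
pulled_metric = [[sum(J[i][al] * J[i][be] for i in range(3)) for be in range(3)]
                 for al in range(3)]
jac_det = det3(J)  # should be 1: the change of variables is volume preserving
```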

We can use the differential geometry formalism to formally verify the conservation laws of the Euler equations. We begin with conservation of energy

\displaystyle  E(t) := \frac{1}{2} \int_{{\bf R}^d_E} |u|^2\ d\mathrm{vol}_E = \frac{1}{2} \int_{{\bf R}^d_E} (\eta^{-1} \cdot v) \neg v\ d\mathrm{vol}_E.

Formally differentiating this in time (and noting that the form {(\eta^{-1} \cdot w) \neg v = \eta^{ij} v_i w_j} is symmetric in {v,w}) we have

\displaystyle  \partial_t E(t) = \int_{{\bf R}^d_E} (\eta^{-1} \cdot v) \neg \partial_t v \ d\mathrm{vol}_E = \int_{{\bf R}^d_E} u \neg \partial_t v\ d\mathrm{vol}_E.

Using (22), we can write

\displaystyle  u \neg \partial_t v = - u \neg {\mathcal L}_u v - u \neg d\tilde p.

From the Cartan formula (19) one has {u \neg d\tilde p = {\mathcal L}_u \tilde p}; from Exercise 23 one has {{\mathcal L}_u u = 0}, and hence by the Leibniz rule (Exercise 21(i)) we can write {u \neg \partial_t v} as a total derivative:

\displaystyle  u \neg \partial_t v = {\mathcal L}_u( - u \neg v - \tilde p ).

From Exercise 20 (recalling that {u} is divergence-free) we thus formally obtain the conservation law {\partial_t E = 0}.

Now suppose that {Z \in \Gamma(T {\bf R}^d_E)} is a time-independent vector field that is a Killing vector field for the Euclidean metric {\eta}, by which we mean that

\displaystyle  {\mathcal L}_Z \eta = 0.

Taking traces in (21), this implies in particular that {Z} is divergence-free, or equivalently

\displaystyle  {\mathcal L}_Z d\mathrm{vol}_E = 0.

(Geometrically, this implication arises because the volume form {d\mathrm{vol}_E} can be constructed from the Euclidean metric {\eta} (up to a choice of orientation).) Consider the formal quantity

\displaystyle  M(t) := \int_{{\bf R}^d_E} (Z \neg v)\ d\mathrm{vol}_E.

As {v} is the only time-dependent quantity here, we may formally differentiate to obtain

\displaystyle  \partial_t M(t) = \int_{{\bf R}^d_E} (Z \neg \partial_t v)\ d\mathrm{vol}_E

Using (22), the right-hand side is

\displaystyle  - \int_{{\bf R}^d_E} (Z \neg {\mathcal L}_u v) + (Z \neg d\tilde p)\ d\mathrm{vol}_E.

By Cartan’s formula, {Z \neg d\tilde p} is a total derivative {{\mathcal L}_Z \tilde p}, and hence this contribution to the integral formally vanishes as {Z} is divergence-free. The quantity {Z \neg {\mathcal L}_u v} can be written using the Leibniz rule as the difference of the total derivative {{\mathcal L}_u (Z \neg v)} and the quantity {{\mathcal L}_u Z \neg v}. The former quantity also gives no contribution to the integral as {u} is divergence-free, thus

\displaystyle  \partial_t M(t) = \int_{{\bf R}^d_E} {\mathcal L}_u Z \neg v\ d\mathrm{vol}_E.

By Exercise 23, we have {{\mathcal L}_u Z = - {\mathcal L}_Z u = - {\mathcal L}_Z(\eta^{-1} \cdot v)}. Since {\eta} (and hence {\eta^{-1}}) is annihilated by {{\mathcal L}_Z}, and the form {(\eta^{-1} \cdot v) \neg w = \eta^{ij} v_i w_j} is symmetric in {v,w}, we can express {{\mathcal L}_Z(\eta^{-1} \cdot v) \neg v} as a total derivative

\displaystyle  {\mathcal L}_Z(\eta^{-1} \cdot v) \neg v = \frac{1}{2} {\mathcal L}_Z ( (\eta^{-1} \cdot v) \neg v ),

and so this integral also vanishes. Thus we obtain the conservation law {\partial_t M(t) = 0}. If we set the Killing vector field {Z} equal to the constant vector field {\frac{d}{dx^i}} for some {i=1,\dots,d}, we obtain conservation of the momentum components

\displaystyle  \int_{{\bf R}^d_E} u^i\ d\mathrm{vol}_E

for {i=1,\dots,d}; if we instead set the Killing vector field {Z} equal to the rotation vector field {x^i \frac{d}{dx^j} - x^j \frac{d}{dx^i}} (which one can easily verify to be Killing using (21)), we obtain conservation of the angular momentum components

\displaystyle  \int_{{\bf R}^d_E} x^i u^j - x^j u^i\ d\mathrm{vol}_E

for {i,j =1,\dots,d}. Unfortunately, this essentially exhausts the supply of Killing vector fields:

Exercise 27 Let {Z} be a smooth Killing vector field of the Euclidean metric {\eta}. Show that {Z} is a linear combination (with real coefficients) of the constant vector fields {\frac{d}{dx^i}}, {i=1,\dots,d} and the rotation vector fields {x^i \frac{d}{dx^j} - x^j \frac{d}{dx^i}}, {i,j=1,\dots,d}. (Hint: use (21) to show that all the second derivatives of components of {Z} vanish.)
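Formula (21) makes the Killing condition easy to test numerically. The Python sketch below evaluates {\partial_i Z^j + \partial_j Z^i} by central differences in {d=3} at one sample point for three fields: a constant field, the rotation field {x^1 \frac{d}{dx^2} - x^2 \frac{d}{dx^1}}, and, for contrast, the dilation field {x^i \frac{d}{dx^i}}, which is not Killing because its deformation tensor is {2\eta}. The sample point and fields are illustrative choices. A check at one point only illustrates formula (21); it does not prove anything.

```python
def deformation(Z, x, h=1e-6):
    """(L_Z eta)_{ij} = d_i Z^j + d_j Z^i, by central differences."""
    d = len(x)
    D = [[0.0] * d for _ in range(d)]  # D[j][i] ~ d_i Z^j
    for i in range(d):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        Zp, Zm = Z(xp), Z(xm)
        for j in range(d):
            D[j][i] = (Zp[j] - Zm[j]) / (2 * h)
    return [[D[j][i] + D[i][j] for j in range(d)] for i in range(d)]


def translation(x):
    """A constant vector field."""
    return [1.0, -2.0, 0.5]


def rotation(x):
    """x^1 d/dx^2 - x^2 d/dx^1, i.e. components (-x[1], x[0], 0) with 0-based indices."""
    return [-x[1], x[0], 0.0]


def dilation(x):
    """x^i d/dx^i; its deformation tensor is 2 eta, so it is not Killing."""
    return list(x)


def max_abs(M):
    return max(abs(c) for row in M for c in row)


x = [0.7, -1.3, 2.1]
L_trans = deformation(translation, x)
L_rot = deformation(rotation, x)
L_dil = deformation(dilation, x)
```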

The vorticity {2}-form {\omega(t) \in \Omega^2( {\bf R}^d_E)} is defined as the exterior derivative of the covelocity:

\displaystyle  \omega := d v.

It already made an appearance in Notes 3 from the previous quarter. Taking exterior derivatives of (22) using (10) and Exercise 21 we obtain the appealingly simple vorticity equation

\displaystyle  {\mathcal D}_t \omega = 0. \ \ \ \ \ (32)

In two and three dimensions we may take the Hodge dual {*\omega} of the vorticity {2}-form to obtain either a scalar field (in dimension {d=2}) or a vector field (in dimension {d=3}), and then Exercise 21(iv) implies that

\displaystyle  {\mathcal D}_t *\omega = 0. \ \ \ \ \ (33)

In two dimensions, this gives us a lot of conservation laws, since one can apply the scalar chain rule to then formally conclude that

\displaystyle  {\mathcal D}_t F( *\omega) = 0

for any {F: {\bf R} \rightarrow {\bf R}}, which upon integration on {{\bf R}^2_E} using Exercise 20 gives the conservation law

\displaystyle  \partial_t \int_{{\bf R}^2_E} F(*\omega)\ d\mathrm{vol}_E = 0

for any such function {F}. Thus for instance the {L^p({\bf R}^2_E \rightarrow {\bf R})} norms of {*\omega} are formally conserved for every {0 < p < \infty}, and hence also for {p=\infty} by a limiting argument, recovering Proposition 24 from Notes 3 of the previous quarter.

In three dimensions there is also an interesting conservation law involving the vorticity. Observe that the wedge product {v \wedge \omega} of the covelocity and the vorticity is a {3}-form and can thus be integrated over {{\bf R}^3_E}. The helicity

\displaystyle  H(t) := \int_{{\bf R}^3_E} v(t) \wedge \omega(t) \ \ \ \ \ (34)

is a formally conserved quantity of the Euler equations. Indeed, formally differentiating and using Exercise 20 we have

\displaystyle  \partial_t H(t) = \int_{{\bf R}^3_E} {\mathcal D}_t ( v \wedge \omega).

From the Leibniz rule and (32) we have

\displaystyle  {\mathcal D}_t ( v \wedge \omega) = ({\mathcal D}_t v) \wedge \omega.

Applying (22) we can write this expression as {-d\tilde p \wedge \omega}. From (10) we have {d\omega=0}, hence this expression is also a total derivative {-d(\tilde p \omega)}. From Stokes’ theorem (14) we thus formally obtain the conservation of helicity: {H(t) = H(0)} (first observed by Moreau).

Exercise 28 Formally verify the conservation of momentum, angular momentum, and helicity directly from the original form (1) of the Euler equations.

Exercise 29 In even dimensions {d \geq 2}, show that the integral {\int_{{\bf R}^d_E} \bigwedge^{d/2} \omega(t)} (formed by taking the exterior product of {d/2} copies of {\omega}) is conserved by the flow, while in odd dimensions {d \geq 3}, show that the generalised helicity {\int_{{\bf R}^d_E} v(t) \wedge \bigwedge^{\frac{d-1}{2}} \omega(t)} is conserved by the flow. (This observation is due to Denis Serre, as well as unpublished work of Tartar.)

As it turns out, there are no further conservation laws for the Euler equations that are linear or quadratic integrals of the velocity field and its derivatives, at least in three dimensions; see this paper of Denis Serre. In particular, the Euler equations are not believed to be completely integrable.

Exercise 30 Let {u: [0,T) \times {\bf R}^3_E \rightarrow {\bf R}^3_E} be a smooth solution to the Euler equations in three dimensions {{\bf R}^3_E}, let {*\omega} be the vorticity vector field, and let {f: [0,T) \times {\bf R}^3_E \rightarrow {\bf R}} be an arbitrary smooth scalar field. Establish Ertel’s theorem

\displaystyle  D_t( *\omega \cdot \nabla f ) = *\omega \cdot \nabla(D_t f).

Exercise 31 (Clebsch variables) Let {u: [0,T) \times {\bf R}^3_E \rightarrow {\bf R}^3_E} be a smooth solution to the Euler equations. Suppose that at time zero, the covelocity {v(0)} takes the form

\displaystyle  v(0) = \sum_{j=1}^k \theta_j(0) d \varphi_j(0)

for some smooth scalar fields {\theta_j(0), \varphi_j(0): {\bf R}^d_E \rightarrow {\bf R}}. Show that at all subsequent times {t}, the covelocity takes the form

\displaystyle  v(t) = \sum_{j=1}^k \theta_j(t) d \varphi_j(t) + d n(t)

where {\theta_j(t), \varphi_j(t), n(t): {\bf R}^d_E \rightarrow {\bf R}} are smooth scalar fields obeying the transport equations

\displaystyle  D_t \theta_j(t) = D_t \varphi_j(t) = 0.

(The classical Clebsch variables take {k=1}, but as was observed by Constantin, the analysis also extends without difficulty to the case {k>1}.)

— 3. Viewing the Euler equations in Lagrangian coordinates —

Throughout this section, {(u,p)} is a smooth solution to the Euler equations on {[0,T) \times {\bf R}^d_E}, and {X} is a trajectory map.

We pull back the Euler equations (22), (23), (24) to create a Lagrangian velocity field {U: [0,T) \rightarrow \Gamma(T {\bf R}^d_L)}, a Lagrangian covelocity field {V: [0,T) \rightarrow \Omega^1({\bf R}^d_L)}, a Lagrangian modified pressure field {\tilde P: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}}, and a Lagrangian vorticity field {\Omega: [0,T) \rightarrow \Omega^2({\bf R}^d_L)} by the formulae

\displaystyle  U(t) := X(t)^* u(t)

\displaystyle  V(t) := X(t)^* v(t)

\displaystyle  \Omega(t) := X(t)^* \omega(t) \ \ \ \ \ (35)

\displaystyle  \tilde P(t) := X(t)^* \tilde p(t).

By Exercise 16, the Euler equations now take the form

\displaystyle  \partial_t V = - d \tilde P \ \ \ \ \ (36)

\displaystyle  \mathrm{div} U = 0

\displaystyle  V = (X^* \eta) \cdot U

and the vorticity is given by

\displaystyle  \Omega = d V

and obeys the vorticity equation

\displaystyle  \partial_t \Omega = 0.

We thus see that in Lagrangian coordinates, the vorticity is a pointwise conserved quantity:

\displaystyle  \Omega_{\alpha \beta}(t, a) = \Omega_{\alpha \beta}(0, a). \ \ \ \ \ (37)

This lets us solve for the Eulerian vorticity {\omega_{ij}} in terms of the trajectory map. Indeed, from (12), (35) we have

\displaystyle  \Omega_{\alpha \beta}(0,a) = \Omega_{\alpha \beta}(t,a) = \omega_{ij}(t,X(t,a)) \partial_\alpha X(t)^i(a) \partial_\beta X(t)^j(a);

applying the inverse {(\nabla X(t))^{-1}} of the linear transformation {\nabla X(t)}, we thus obtain the Cauchy vorticity formula

\displaystyle  \omega_{ij}(t,X(t,a)) = \Omega_{\alpha \beta}(0,a) (\nabla X(t)^{-1})^\alpha_i(a) (\nabla X(t)^{-1})^\beta_j(a). \ \ \ \ \ (38)

If we normalise the trajectory map by (4), then {\Omega(0) = \omega(0)}, and we thus have

\displaystyle  \omega_{ij}(t,X(t,a)) = \omega_{\alpha \beta}(0,a) (\nabla X(t)^{-1})^\alpha_i(a) (\nabla X(t)^{-1})^\beta_j(a). \ \ \ \ \ (39)

Thus for instance, we see that the support of the vorticity is transported by the flow:

\displaystyle  \mathrm{supp}(\omega(t)) = X(t)( \mathrm{supp}(\omega(0)) ).

Among other things, this shows that the volume and topology of the support of the vorticity remain constant in time. It also suggests that the Euler equations admit a number of “vortex patch” solutions in which the vorticity is compactly supported.
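Once {\nabla X(t)} is known at a point, the Cauchy vorticity formula (39) is pure linear algebra. For illustration only, the Python sketch below assumes a hypothetical deformation gradient {\nabla X(t,a)}, built as a product of shear matrices. Such a product has determinant {1}, as a volume-preserving flow requires, and its inverse is explicit. The sketch applies (39) and checks the two special cases stated in Exercise 32: in {d=2} the component {\omega_{12}} is unchanged, while in {d=3} the Hodge dual is stretched, {*\omega^i = *\omega^\alpha(0) \partial_\alpha X^i}.

```python
def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def shear(d, i, j, c):
    """Unit-determinant shear matrix I + c e_i e_j^T (requires i != j); its inverse is shear(d, i, j, -c)."""
    M = identity(d)
    M[i][j] += c
    return M


def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[r][k] * B[k][s] for k in range(m)) for s in range(p)] for r in range(n)]


def cauchy_vorticity(Omega0, Jinv):
    """Formula (39): omega_{ij} = Omega_{ab}(0) ((nabla X)^{-1})^a_i ((nabla X)^{-1})^b_j."""
    d = len(Omega0)
    return [[sum(Omega0[a][b] * Jinv[a][i] * Jinv[b][j] for a in range(d) for b in range(d))
             for j in range(d)] for i in range(d)]


def eps(i, j, k):
    """Levi-Civita symbol (0-based)."""
    return (i - j) * (j - k) * (k - i) // 2


# d = 2: hypothetical J = nabla X(t, a) with J[i][alpha] = d_alpha X^i, and its inverse.
J2 = matmul(shear(2, 0, 1, 0.7), shear(2, 1, 0, -0.4))
J2inv = matmul(shear(2, 1, 0, 0.4), shear(2, 0, 1, -0.7))
Omega2 = [[0.0, 1.3], [-1.3, 0.0]]
omega2 = cauchy_vorticity(Omega2, J2inv)  # omega2[0][1] should equal 1.3

# d = 3: initial vorticity given through its Hodge dual W0, with Omega_{ab} = W0^c eps_{cab}.
J3 = matmul(matmul(shear(3, 0, 1, 0.5), shear(3, 1, 2, -1.2)), shear(3, 2, 0, 0.8))
J3inv = matmul(matmul(shear(3, 2, 0, -0.8), shear(3, 1, 2, 1.2)), shear(3, 0, 1, -0.5))
W0 = [0.4, -1.0, 2.0]
Omega3 = [[sum(W0[c] * eps(c, a, b) for c in range(3)) for b in range(3)] for a in range(3)]
omega3 = cauchy_vorticity(Omega3, J3inv)
star_omega3 = [0.5 * sum(eps(k, i, j) * omega3[i][j] for i in range(3) for j in range(3))
               for k in range(3)]
stretched = [sum(J3[k][al] * W0[al] for al in range(3)) for k in range(3)]  # d_alpha X^k *omega^alpha(0)
```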

Exercise 32 Assume the normalisation (4).

  • (i) In the two-dimensional case {d=2}, show that the Cauchy vorticity formula simplifies to

    \displaystyle  \omega_{12}(t,X(t,a)) = \omega_{12}(0, a).

    Thus in this case, vorticity is simply transported by the flow.

  • (ii) In the three-dimensional case {d=3}, show that the Cauchy vorticity formula can be written using the Hodge dual {*\omega} of the vorticity as

    \displaystyle  *\omega^i(t, X(t,a)) = *\omega^\alpha(0,a) \partial_\alpha X^i( t, a ).

    Thus we see that the vorticity is transported and also stretched by the flow, with the stretching given by the matrix {\partial_\alpha X^i}.
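As a numerical sanity check of Exercise 32 (not a proof), one can evaluate the right-hand side of (39) at a single point, treating the Jacobian {\nabla X(t)(a)} as a given matrix {F} with determinant one. The sketch below assumes the convention {F[i,\alpha] = \partial_\alpha X^i} and the Hodge dual convention {*\omega^1 = \omega_{23}}, {*\omega^2 = \omega_{31}}, {*\omega^3 = \omega_{12}}; the specific matrices are arbitrary illustrative choices.

```python
import numpy as np

def cauchy_vorticity(omega0, F):
    """Right-hand side of (39) in matrix form: omega(t) = F^{-T} omega0 F^{-1},
    where F[i, alpha] = d_alpha X^i(t, a) and omega0 = omega(0, a)."""
    Finv = np.linalg.inv(F)
    return Finv.T @ omega0 @ Finv

def hodge_dual_3d(omega):
    """Vector *omega with *omega^1 = omega_23, *omega^2 = omega_31, *omega^3 = omega_12."""
    return np.array([omega[1, 2], omega[2, 0], omega[0, 1]])

def from_dual_3d(w):
    """Antisymmetric matrix with omega_ij = eps_ijk w^k (inverse of hodge_dual_3d)."""
    return np.array([[0.0, w[2], -w[1]],
                     [-w[2], 0.0, w[0]],
                     [w[1], -w[0], 0.0]])

# d = 2: a shear-like Jacobian with determinant exactly 1.
F2 = np.array([[2.0, 1.0], [1.0, 1.0]])
omega0_2d = np.array([[0.0, 0.7], [-0.7, 0.0]])
omega_2d = cauchy_vorticity(omega0_2d, F2)      # Exercise 32(i): omega_12 is unchanged

# d = 3: a random Jacobian rescaled to determinant 1 (volume preserving).
rng = np.random.default_rng(1)
F3 = np.eye(3) + 0.3 * rng.standard_normal((3, 3))
F3 /= np.cbrt(np.linalg.det(F3))
w0 = np.array([0.3, -1.2, 0.5])
w_t = hodge_dual_3d(cauchy_vorticity(from_dual_3d(w0), F3))  # Exercise 32(ii): equals F3 @ w0

print(omega_2d[0, 1], w_t, F3 @ w0)
```

The 3D check rests on the linear-algebra identity {F^T [Fw]_\times F = \det(F) [w]_\times}, which is why the determinant-one normalisation matters.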

One can also phrase the conservation of vorticity in an integral form. If {S} is a two-dimensional oriented surface in {{\bf R}^3_L} that does not vary in time, then from (37) we see that the integral

\displaystyle  \int_S \Omega(t)

is formally conserved in time:

\displaystyle  \int_S \Omega(t) = \int_S \Omega(0).

Composing this with the trajectory map {X(t)} using (35), we conclude that

\displaystyle  \int_{X(t)(S)} \omega(t) = \int_{X(0)(S)} \omega(0).

Writing {\omega = dv} and using Stokes’ theorem (14), we arrive at the Kelvin circulation theorem

\displaystyle  \int_{X(t)(\partial S)} v(t) = \int_{X(0)(\partial S)} v(0).

The integral of the covelocity {v} along a loop {\gamma} is known as the circulation of the fluid along the loop; the Kelvin circulation theorem then asserts that this circulation remains constant over time as long as the loop evolves along the flow.
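A toy numerical illustration of the Kelvin circulation theorem, under loudly stated assumptions: take {d=2} and a hypothetical linear field {u(x) = Ax} with {\mathrm{tr} A = 0}. By Cayley-Hamilton {A^2 = -\det(A) I}, so {(u \cdot \nabla) u = A^2 x = -\nabla p} with {p = \det(A)|x|^2/2}, giving a steady (but unbounded, hence only formal) Euler solution whose trajectory map is {X(t,a) = e^{tA} a}. The sketch advects the unit circle and computes the circulation with the trapezoid rule; by Stokes it should stay equal to {(A_{21} - A_{12}) \pi}.

```python
import numpy as np

# Hypothetical traceless matrix; u(x) = A x is divergence-free.
A = np.array([[0.5, -1.0],
              [0.2, -0.5]])

def flow_map(t):
    """exp(tA) for traceless 2x2 A, using A @ A = mu^2 I with mu^2 = -det(A)."""
    mu = np.sqrt(complex(-np.linalg.det(A)))
    M = np.cosh(mu * t) * np.eye(2) + (np.sinh(mu * t) / mu) * A
    return M.real

def circulation(t, n=400):
    """Trapezoid-rule circulation of v = A x around X(t)(unit circle)."""
    theta = 2 * np.pi * np.arange(n) / n
    loop0 = np.stack([np.cos(theta), np.sin(theta)])      # material loop at t = 0
    tangent0 = np.stack([-np.sin(theta), np.cos(theta)])  # d(loop0)/d(theta)
    M = flow_map(t)
    loop, tangent = M @ loop0, M @ tangent0               # X(t, a) = M a is linear in a
    v = A @ loop                                          # Euclidean metric: covelocity = velocity
    return np.sum(v * tangent) * (2 * np.pi / n)

for t in (0.0, 1.0, 3.0):
    print(t, circulation(t), (A[1, 0] - A[0, 1]) * np.pi)
```

The loop is stretched into an ellipse of the same area, while the printed circulation should agree with {1.2\pi} up to roundoff at every time.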

Exercise 33 (Cauchy invariants)

For more discussion of Cauchy’s investigation of the Cauchy invariants and vorticity formula, see this article of Frisch and Villone.

Exercise 34 (Transport of vorticity lines) Suppose we are in three dimensions {d=3}, so that the Hodge dual {* \omega} of vorticity is a vector field. A smooth curve {\gamma} (either infinite on both ends, or a closed loop) in {{\bf R}^3_E} is said to be a vortex line (or vortex ring, in the case of a closed loop) at time {t} if at every point {x} of the curve {\gamma}, the tangent to {\gamma} at {x} is parallel to the vorticity vector {*\omega(t,x)} at that point. Suppose that the trajectory map is normalised using (4). Show that if {\gamma} is a vortex line at time {0}, then {X(t)(\gamma)} is a vortex line at any other time {t}; thus, vortex lines (or vortex rings) flow along with the fluid.

Exercise 35 (Conservation of helicity in Lagrangian coordinates)

  • (i) In any dimension, establish the identity

    \displaystyle  \partial_t ( V(t) \wedge \Omega(t) ) = - d( P' \Omega(t) )

    in Lagrangian spacetime.

  • (ii) Conclude that in three dimensions {d=3}, the quantity

    \displaystyle  \int_{{\bf R}^3} V(t) \wedge \Omega(t)

    is formally conserved in time. Explain why this conserved quantity is the same as the helicity (34).

  • (iii) Continue assuming {d=3}. Define a vortex tube at time {t} to be a region {T \subset {\bf R}^3_E} in which, at every point {x} on the boundary {\partial T}, the vorticity vector field {*\omega(t,x)} is tangent to {\partial T}. Show that if {X(0)(T)} is a vortex tube at time {0}, then {X(t)(T)} is a vortex tube at time {t}, and the helicity {\int_{X(t)(T)} v(t) \wedge \omega(t)\ dt} on the vortex tube is formally conserved in time.
  • (iv) Let {d=3}. If the covelocity {v} can be expressed in Clebsch variables (Exercise 31) with {k=1}, show that the local helicity {\int_{X(t)(T)} v(t) \wedge \omega(t)\ dt} formally vanishes on every vortex tube {T}. This provides an obstruction to the existence of {k=1} Clebsch variables. (On the other hand, it is easy to find Clebsch variables on {{\bf R}^d} with {k=d} for an arbitrary covelocity {v}, simply by setting {\varphi_j} equal to the coordinate functions {\varphi_j(x) = x^j}.)

Exercise 36 In the three-dimensional case {d=3}, show that the material derivative {D_t} commutes with the operation {*\omega \cdot \nabla} of differentiation along the (Hodge dual of the) vorticity.

The Cauchy vorticity formula (39) can be used to obtain an integral representation for the velocity {u} in terms of the trajectory map {X}, leading to the vorticity-stream formulation of the Euler equations. Recall from 254A Notes 3 that if one takes the divergence of the (Eulerian) vorticity {\omega}, one obtains the Laplacian of the (Eulerian) covelocity {v}:

\displaystyle  \Delta v_j = \partial^i \omega_{ij},

where {\partial^i := \eta^{ik} \partial_k} are the partial derivatives raised by the Euclidean metric. For {d > 2}, we can use the fundamental solution {\frac{-1}{(d-2)|S^{d-1}|} \frac{1}{|x|^{d-2}}} of the Laplacian (see Exercise 18 of 254A Notes 1), where {|S^{d-1}|} denotes the surface area of the unit sphere in {{\bf R}^d}, to conclude that (formally, at least)

\displaystyle  v_j(t,x) = \frac{-1}{(d-2)|S^{d-1}|} \int_{{\bf R}^d_E} \frac{\partial^i \omega_{ij}(t,y)}{|x-y|^{d-2}}\ d\mathrm{vol}_E(y).

Integrating by parts (after first removing a small ball around {x}, and observing that the boundary terms from this ball go to zero as one shrinks the radius to zero), one obtains the Biot-Savart law

\displaystyle  v_j(t,x) = \frac{1}{|S^{d-1}|} \int_{{\bf R}^d_E} \frac{(x^i-y^i) \omega_{ij}(t,y)}{|x-y|^{d}}\ d\mathrm{vol}_E(y)

for the covelocity, or equivalently

\displaystyle  u^k(t,x) = \frac{1}{|S^{d-1}|} \int_{{\bf R}^d_E} \frac{\eta^{jk} (x^i-y^i) \omega_{ij}(t,y)}{|x-y|^{d}}\ d\mathrm{vol}_E(y)

for the velocity.

Exercise 37 Show that this law is also valid in the two-dimensional case {d=2}.

Changing to Lagrangian variables, we conclude that

\displaystyle  u^k(t,X(t,a)) = \frac{1}{|S^{d-1}|} \int_{{\bf R}^d_L}

\displaystyle  \eta^{jk} \frac{(X^i(t,a)-X^i(t,b)) \omega_{ij}(t,X(t,b))}{|X(t,a)-X(t,b)|^{d}}\ d\mathrm{vol}_L(b).

Using the Cauchy vorticity formula (39) (assuming the normalisation (4)), we obtain

\displaystyle  u^k(t,X(t,a)) = \frac{1}{|S^{d-1}|} \int_{{\bf R}^d_L}

\displaystyle  \eta^{jk} \frac{(X^i(t,a)-X^i(t,b)) (\nabla X(t)^{-1})^\alpha_i(b) (\nabla X(t)^{-1})^\beta_j(b)}{|X(t,a)-X(t,b)|^{d}} \omega_{\alpha \beta}(0,b) \ d\mathrm{vol}_L(b).

Combining this with (3), we obtain an integral-differential equation for the evolution of the trajectory map:

\displaystyle  \partial_t X^k(t,a) = \frac{1}{|S^{d-1}|} \int_{{\bf R}^d_L} \ \ \ \ \ (42)

\displaystyle  \eta^{jk} \frac{(X^i(t,a)-X^i(t,b)) (\nabla X(t)^{-1})^\alpha_i(b) (\nabla X(t)^{-1})^\beta_j(b)}{|X(t,a)-X(t,b)|^{d}} \omega_{\alpha \beta}(0,b) \ d\mathrm{vol}_L(b).

This is known as the vorticity-stream formulation of the Euler equations. In two and three dimensions, the formulation can be simplified using the alternate forms of the vorticity formula in Exercise 32. While the equation (42) looks complicated, it is actually well suited for Picard-type iteration arguments (of the type used in 254A Notes 1), due to the relatively small number of derivatives on the right-hand side. Indeed, it turns out that one can iterate this equation with the trajectory map placed in function spaces such as {C^0_t C^{1,\alpha}_x( [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E)}; see Chapter 4 of Bertozzi-Majda for details.
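To make the two-dimensional Biot-Savart kernel concrete, here is a minimal sketch for point vortices, i.e. {\omega_{12} = \sum_k \Gamma_k \delta_{y_k}}. This is an assumption chosen for illustration only: point vortices are far too singular for the function spaces above, and the resulting Helmholtz-Kirchhoff point-vortex system is only a formal relative of (42). For {d=2} the normalising constant is {1/(2\pi)}, the reciprocal of the length of the unit circle, and the Euclidean metric makes velocity and covelocity coincide.

```python
import numpy as np

def biot_savart_2d(x, positions, strengths):
    """Velocity at x induced by point vortices omega_12 = Gamma_k * delta(y - y_k):
    u(x) = (1 / 2pi) * sum_k Gamma_k * (-(x2 - y2), x1 - y1) / |x - y_k|^2.
    The point x must not coincide with any vortex position."""
    u = np.zeros(2)
    for y, gamma in zip(positions, strengths):
        r = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        u += gamma / (2 * np.pi) * np.array([-r[1], r[0]]) / (r @ r)
    return u

# A single vortex of strength 2*pi at the origin: speed Gamma / (2 pi r) = 1/2 at r = 2.
print(biot_savart_2d((2.0, 0.0), [(0.0, 0.0)], [2 * np.pi]))

# Velocity induced on a vortex at (1, 0) by a unit vortex at (-1, 0).
print(biot_savart_2d((1.0, 0.0), [(-1.0, 0.0)], [1.0]))
```

The first call gives a purely tangential velocity of size {1/2}, in agreement with the formula {\Gamma/(2\pi r)}.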

Remark 38 Because of the ability to solve the Euler equations in Lagrangian coordinates by an iteration method, the local well-posedness theory is slightly stronger in some respects in Lagrangian coordinates than it is in Eulerian coordinates. For instance, in this paper of Constantin, Kukavica and Vicol it is shown that Lagrangian coordinate Euler equations are well-posed in Gevrey spaces, while Eulerian coordinate Euler equations are not. It also happens that the trajectory maps {X(t,a)} are real-analytic in {t} even if the initial data is merely smooth; see for instance this paper of Constantin-Vicol-Wu and the references therein.

— 4. Variational characterisation of the Euler equations —

Our computations in this section will be even more formal than in previous sections.

From Exercise 1, a (smooth, bounded) vector field {u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E} (together with a choice of initial map {X_0: {\bf R}^d_L \rightarrow {\bf R}^d_E}) gives rise to a trajectory map {X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}. From Lemma 3, we see that {X(t)} is volume preserving for all times {t \in [0,T)} if and only if {X_0} is volume preserving and {u} is divergence-free. Given such a trajectory map, let us formally define the Lagrangian {\mathcal{L}(X)} by the formula

\displaystyle  \mathcal{L}(X) := \frac{1}{2} \int_0^T \int_{{\bf R}^d_L} |\partial_t X|_\eta^2\ d\mathrm{vol}_L dt \ \ \ \ \ (43)

\displaystyle  = \frac{1}{2} \int_0^T \int_{{\bf R}^d_L} \eta_{ij} (\partial_t X^i) \partial_t X^j\ d\mathrm{vol}_L dt.

As observed by Arnold, the Euler equations can be viewed as the Euler-Lagrange equations for this Lagrangian, subject to the constraint that the trajectory map is always volume-preserving:

Proposition 39 Let {u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E} be a smooth bounded divergence-free vector field with a volume-preserving trajectory map {X(t)}. Then the following are formally equivalent:

  • (i) There is a pressure field {p: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}} such that {(u,p)} solves the Euler equations.
  • (ii) The trajectory map {X} is a critical point of the Lagrangian {\mathcal{L}(X)} with respect to all compactly supported infinitesimal perturbations of {X} in {(0,T) \times {\bf R}^d_L} that preserve the volume-preserving nature of the trajectory map.

Proof: First suppose that (i) holds. Consider an infinitesimal deformation {X + \varepsilon Y + O(\varepsilon^2)} of the trajectory map, with {Y} compactly supported in {(0,T) \times {\bf R}^d_L}, where one can view {\varepsilon} either as an infinitesimal or as a parameter tending to zero (in this formal analysis we will not bother to make the setup more precise than this). If this deformation is still volume-preserving, then we have

\displaystyle  \det( \nabla X + \varepsilon \nabla Y ) = 1 + O(\varepsilon^2);

differentiating at {\varepsilon=0} using Exercise 4 we see that

\displaystyle  \mathrm{tr}( (\nabla X)^{-1} \nabla Y ) = 0

or in coordinates

\displaystyle  ((\nabla X)^{-1})^\alpha_i \partial_\alpha Y^i = 0. \ \ \ \ \ (44)

Writing {y(t,x) := Y(t,X(t)^{-1}(x))}, we thus see from the chain rule that the Eulerian vector field {y: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E} is divergence-free

\displaystyle  \partial_i y^i = 0. \ \ \ \ \ (45)
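The differentiation step above uses Jacobi's formula {\frac{d}{d\varepsilon} \det(F + \varepsilon G)|_{\varepsilon=0} = \det(F)\, \mathrm{tr}(F^{-1} G)}. The sketch below checks it at a single point by central differences, with {F} standing in for {\nabla X} and {G} for {\nabla Y} (random matrices, a purely illustrative assumption that ignores the requirement that {G} be a gradient). It also shows a hypothetical pointwise projection of {G} onto the constraint {\mathrm{tr}(F^{-1} G) = 0}, the linear-algebra content of (44).

```python
import numpy as np

def jacobi_derivative(F, G):
    """d/d(eps) det(F + eps G) at eps = 0, via det(F) * tr(F^{-1} G)."""
    return np.linalg.det(F) * np.trace(np.linalg.solve(F, G))

def finite_difference(F, G, h=1e-6):
    """Central-difference approximation to the same derivative."""
    return (np.linalg.det(F + h * G) - np.linalg.det(F - h * G)) / (2 * h)

def project_volume_preserving(F, G):
    """Subtract a multiple of F so that tr(F^{-1} G) = 0 (pointwise analogue of (44))."""
    d = F.shape[0]
    return G - (np.trace(np.linalg.solve(F, G)) / d) * F

rng = np.random.default_rng(0)
F = np.eye(3) + 0.3 * rng.standard_normal((3, 3))  # stands in for nabla X at one point
G = rng.standard_normal((3, 3))                    # stands in for nabla Y at that point
G0 = project_volume_preserving(F, G)

print(jacobi_derivative(F, G), finite_difference(F, G))
print(jacobi_derivative(F, G0))                    # first variation of det vanishes
```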

Now, let us compute the infinitesimal variation of the Lagrangian:

\displaystyle  \frac{d}{d\varepsilon} \mathcal{L}(X + \varepsilon Y + O(\varepsilon^2)) |_{\varepsilon=0}.

Formally differentiating under the integral sign, this expression becomes

\displaystyle  \frac{1}{2} \int_0^T \int_{{\bf R}^d_L} \eta_{ij} (\partial_t X^i) \partial_t Y^j + \eta_{ij} (\partial_t Y^i) \partial_t X^j\ d\mathrm{vol}_L dt

which by symmetry simplifies to

\displaystyle  \int_0^T \int_{{\bf R}^d_L} \eta_{ij} (\partial_t X^i) \partial_t Y^j\ d\mathrm{vol}_L dt

We integrate by parts in time to move the derivative off of the perturbation {Y}, to arrive at

\displaystyle  - \int_0^T \int_{{\bf R}^d_L} \eta_{ij} (\partial_{t}^2 X^i) Y^j\ d\mathrm{vol}_L dt. \ \ \ \ \ (46)

Using Newton’s first law (41), this becomes

\displaystyle  \int_0^T \int_{{\bf R}^d_L} ((\nabla X)^{-1})^\beta_j (\partial_\beta P) Y^j\ d\mathrm{vol}_L dt.

Writing {P = X(t)^* p}, we can change to Eulerian variables to obtain

\displaystyle  \int_0^T \int_{{\bf R}^d_E} \partial_j p y^j\ d\mathrm{vol}_E dt.

We can now integrate by parts and use (45) and conclude that this variation vanishes. Thus {X} is a formal critical point of the Lagrangian.

Conversely, if {X} is a formal critical point, then the above analysis shows that the expression (46) vanishes whenever {y} obeys (45). Changing variables to Euclidean space, this expression becomes

\displaystyle  - \int_0^T \int_{{\bf R}^d_E} \eta_{ij} (\partial_{t}^2 X^i \circ X(t)^{-1}) y^j\ d\mathrm{vol}_E dt.

Hodge theory (cf. Exercise 16 of 254A Notes 1) then implies (formally) that {\eta_{ij} (\partial_{t}^2 X^i \circ X(t)^{-1})} must be a differential {-\partial_j p}, which is equivalent to Newton’s first law (41), which is in turn equivalent to the Euler equations (recalling that {u} is assumed to be divergence-free). \Box
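The Hodge-theoretic step can be seen explicitly in a periodic setting (an assumption: a torus {[0,2\pi)^2} replaces {{\bf R}^d}). There the Leray projection, which multiplies the Fourier coefficient at frequency {k} by {I - k k^T/|k|^2}, splits every field into a divergence-free part and a gradient, and the two parts are {L^2}-orthogonal; so a field orthogonal to all divergence-free fields has zero divergence-free part and is a gradient. The sketch below verifies the three properties for one band-limited test field chosen for illustration.

```python
import numpy as np

N, L = 32, 2 * np.pi
x = np.arange(N) * L / N
X, Y = np.meshgrid(x, x, indexing="ij")
k = np.fft.fftfreq(N, d=L / N) * 2 * np.pi
KX, KY = np.meshgrid(k, k, indexing="ij")
K2 = KX**2 + KY**2
K2_safe = np.where(K2 == 0, 1.0, K2)  # at k = 0 the numerator below vanishes anyway

def ddx(f, K):
    """Spectral derivative of a periodic grid function (K = KX or KY)."""
    return np.real(np.fft.ifft2(1j * K * np.fft.fft2(f)))

def leray(f1, f2):
    """Divergence-free part of (f1, f2): apply I - k k^T / |k|^2 in Fourier space."""
    g1, g2 = np.fft.fft2(f1), np.fft.fft2(f2)
    k_dot_g = (KX * g1 + KY * g2) / K2_safe
    return (np.real(np.fft.ifft2(g1 - KX * k_dot_g)),
            np.real(np.fft.ifft2(g2 - KY * k_dot_g)))

f1 = np.sin(X) * np.cos(2 * Y) + np.cos(Y)
f2 = np.sin(X + Y)
df1, df2 = leray(f1, f2)            # divergence-free part
gr1, gr2 = f1 - df1, f2 - df2       # remainder, which should be a gradient

div_df = np.max(np.abs(ddx(df1, KX) + ddx(df2, KY)))
curl_gr = np.max(np.abs(ddx(gr2, KX) - ddx(gr1, KY)))
inner = np.sum(df1 * gr1 + df2 * gr2) * (L / N) ** 2
print(div_df, curl_gr, inner)       # all three vanish up to roundoff
```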

Remark 40 The above analysis reveals that the pressure field {p} can be interpreted as a Lagrange multiplier arising from the constraint that the trajectory map be volume-preserving.

Following Arnold, one can use Proposition 39 to formally interpret the Euler equations as a geodesic flow on an infinite dimensional Riemannian manifold. Indeed, for a finite-dimensional Riemannian manifold {(M,g)}, it is well known that (constant speed) geodesics {\gamma: [0,T) \rightarrow M} are formal critical points of the energy functional

\displaystyle  \mathcal{L}(\gamma) := \frac{1}{2} \int_0^T |\partial_t \gamma(t)|_g^2\ dt.

Thus we see that if we formally take {M} to be the infinite-dimensional space of volume-preserving diffeomorphisms {X: {\bf R}^d_L \rightarrow {\bf R}^d_E}, with the formal Riemannian metric {g(X)} at a point {X \in M} in the directions of two infinitesimal perturbations {Y,Z: {\bf R}^d_L \rightarrow {\bf R}^d_E} defined by

\displaystyle  g(X)( Y, Z) := \int_{{\bf R}^d_L} \eta(Y,Z)\ d\mathrm{vol}_L

then Proposition 39 asserts, formally, that solutions to the Euler equations coincide with constant speed geodesic flows on {M}. As it turns out, a number of other physical equations, including several further fluid equations, also have such a geodesic interpretation, such as Burgers’ equation, the Korteweg-de Vries equation, and the Camassa-Holm equations; see for instance this paper of Vizman for a survey. In principle this means that the tools of Riemannian geometry could be deployed to obtain a better understanding of the Euler equations (and of the other equations mentioned above), although to date this has proven to be somewhat elusive (except when discussing conservation laws, as in Remark 41 below) for a number of reasons, not the least of which is that rigorous Riemannian geometry on infinite-dimensional manifolds is technically quite problematic. (Nevertheless, one can at least recover the local existence theory for the Euler equations this way; see the aforementioned work of Ebin and Marsden.)

Remark 41 Noether’s theorem tells us that one should expect a one-to-one correspondence between symmetries of a Lagrangian {{\mathcal L}} and conservation laws of the corresponding Euler-Lagrange equation. Applying this to Proposition 39, we conclude that the conservation laws of the Euler equations should correspond to symmetries of the Lagrangian (43). There are basically two obvious symmetries of this Lagrangian; one coming from isometries of Eulerian spacetime {{\bf R} \times {\bf R}^d_E}, and in particular time translation, spatial translation, and spatial rotation; and the other coming from volume-preserving diffeomorphisms of Lagrangian space {{\bf R}^d_L}. One can check that time translation corresponds to energy conservation, spatial translation corresponds to momentum conservation, and spatial rotation corresponds to angular momentum conservation, while Lagrangian diffeomorphism invariance corresponds to conservation of Lagrangian vorticity (or equivalently, the Cauchy vorticity formula). In three dimensions, if one specialises to the specific Lagrangian diffeomorphism created by flow along the vorticity vector field {*\omega}, one also recovers conservation of helicity; see this previous blog post for more discussion.

Remark 42 There are also Hamiltonian formulations of the Euler equations that do not correspond exactly to the geodesic flow interpretation here; see this paper of Olver. Again, one can explain each of the known conservation laws for the Euler equations in terms of symmetries of the Hamiltonian.

Further discussion of the geodesic flow interpretation of the Euler equations may be found in this previous blog post.

Jordan EllenbergIt’s right by the airport

I went to California last week to talk math and machine learning with Ben Recht (have you read his awesome blogstravaganza about reinforcement learning and control?) My first time on the brand-new Madison – San Francisco direct flight (the long-time wish of Silicon Isthmus finally realized!) That flight only goes once a day, which means I landed at SFO at 6:15, in the middle of rush hour, which meant getting to Berkeley by car was going to take almost an hour and a half.  So maybe it made more sense to have dinner near SFO and then go to the East Bay.  But where can you have dinner near SFO?

Well, here’s what I learned.  When I was at MSRI for the Galois Groups and Fundamental Groups semester in 1999, there was an amazing Chinese restaurant in Albany, CA called China Village.  I learned about it from my favorite website at the time, Chowhound.  China Village is still there and apparently still great, but the original chef, Zongyi Liu, left long ago.  Chowhound, too, is still there, but a thin shadow of its old self.  When I checked Chowhound this week, though, I learned something fantastic — Liu is back and cooking in Millbrae!  At Royal Feast, a 10-minute drive from SFO.  So what started as a plan to dodge traffic turned into the best Chinese meal I’ve eaten in forever.  Now I’m thinking I’ll probably stop there every time I fly to San Francisco!  And it’s right by the Millbrae BART station, so if you’re going into the city, it’s as convenient as being at the airport.

So that got me thinking:  what are good things to know about that are right near the airport in other cities?  The neighborhood around the airport is often kind of unpromising, so it’s good to have some prior knowledge of places worth stopping.  And I actually have a pretty decent list!

LAX:  This is easy — you can go to the beach!  Dockweiler State Beach is maybe 5 minutes from the airport.  It’s a state park, not developed, so there’s no boardwalk, no snack stand, and, when I went there, no people.  You just walk down to the ocean and look at the waves and every thirty seconds or so a jumbo jet blasts by overhead on its way to Asia because did I mention 5 minutes from the airport?  You’re right under the takeoff path.  And it’s great.  A sensory experience like no other beach there is.  I just stood there for an hour thinking about math.

BOSTON:  There is lots of great pizza in Boston, of course, but Santarpio’s in East Boston might be the very best I’ve had, and it’s only 7 minutes from Logan airport.  Stop there and get takeout on your way unless you want to bring yet another $13 cup of Legal Seafood chowder on your flight.

MILWAUKEE:  I have already blogged about the unexpectedly excellent Jalapeño Loco, literally across the street from the airport.  Best chile en nogada in the great state of Wisconsin.

SEATTLE:  The Museum of Flight isn’t quite as close to Sea-Tac as some of these other attractions are to their airports — 12 minutes away per Google Maps.  But it’s very worth seeing, especially if you happen to be landing in Seattle with an aircraft-mad 11-year-old in tow.

MADISON:  “The best barbecue in Madison, Wisconsin” is not going to impress my friends south of the Mason-Dixon line, or even my friends south of the Beloit-Rockford line, but Smoky Jon’s, just north of the airport on Packers Avenue (not named for the football team, but for the actual packers who worked at the Oscar Mayer plant that stood on this road until 2017) is the real thing, good enough for out of town visitors and definitely better than what’s on offer at MSN.

CHICAGO:  No, O’Hare is terrible in this way as in every other way.  I once got stuck there for the night and tried to find something exciting in the area to do or eat.  I didn’t succeed.

You guys travel a lot — you must have some good ones!  Put them in the comments.

December 16, 2018

Terence Tao254A, Notes 3: Local well-posedness for the Euler equations

We now turn to the local existence theory for the initial value problem for the incompressible Euler equations

\displaystyle  \partial_t u + (u \cdot \nabla) u = - \nabla p \ \ \ \ \ (1)

\displaystyle  \nabla \cdot u = 0

\displaystyle  u(0,x) = u_0(x).

For sake of discussion we will just work in the non-periodic domain {{\bf R}^d}, {d \geq 2}, although the arguments here can be adapted without much difficulty to the periodic setting. We will only work with solutions in which the pressure {p} is normalised in the usual fashion:

\displaystyle  p = - \Delta^{-1} \nabla \cdot \nabla \cdot (u \otimes u). \ \ \ \ \ (2)
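In the periodic setting mentioned above, the normalisation (2) is easy to compute spectrally: since {\partial_i \partial_j} has symbol {-k_i k_j} and {\Delta^{-1}} has symbol {-1/|k|^2}, one gets {\hat p(k) = -\frac{k_i k_j}{|k|^2} \widehat{u_i u_j}(k)}, with the zero mode set to zero. The sketch below assumes the box {[0,2\pi)^2} and tests the formula on the Taylor-Green field {u = (\sin x \cos y, -\cos x \sin y)}, a steady Euler solution whose mean-zero pressure is {\frac{1}{4}(\cos 2x + \cos 2y)}.

```python
import numpy as np

def normalised_pressure(u, L=2 * np.pi):
    """p = -Delta^{-1} d_i d_j (u_i u_j) on the periodic box [0, L)^d, mean zero.
    u has shape (d, N, ..., N). Fourier symbol: -(k_i k_j / |k|^2)."""
    d, N = u.shape[0], u.shape[1]
    k1d = np.fft.fftfreq(N, d=L / N) * 2 * np.pi
    K = np.meshgrid(*([k1d] * d), indexing="ij")
    ksq = sum(Ki**2 for Ki in K)
    ksq_safe = np.where(ksq == 0, 1.0, ksq)   # the k = 0 mode is zeroed below
    p_hat = np.zeros(ksq.shape, dtype=complex)
    for i in range(d):
        for j in range(d):
            p_hat -= K[i] * K[j] * np.fft.fftn(u[i] * u[j]) / ksq_safe
    p_hat[(0,) * d] = 0.0
    return np.real(np.fft.ifftn(p_hat))

N = 32
x = np.arange(N) * 2 * np.pi / N
X, Y = np.meshgrid(x, x, indexing="ij")
u = np.stack([np.sin(X) * np.cos(Y), -np.cos(X) * np.sin(Y)])  # divergence-free
p = normalised_pressure(u)
p_expected = 0.25 * (np.cos(2 * X) + np.cos(2 * Y))
print(np.max(np.abs(p - p_expected)))   # roundoff-level discrepancy
```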

Formally, the Euler equations (with normalised pressure) arise as the vanishing viscosity limit {\nu \rightarrow 0} of the Navier-Stokes equations

\displaystyle  \partial_t u + (u \cdot \nabla) u = - \nabla p + \nu \Delta u \ \ \ \ \ (3)

\displaystyle  \nabla \cdot u = 0

\displaystyle  p = - \Delta^{-1} \nabla \cdot \nabla \cdot (u \otimes u)

\displaystyle  u(0,x) = u_0(x)

that was studied in previous notes. However, because most of the bounds established in previous notes, either on the lifespan {T_*} of the solution or on the size of the solution itself, depended on {\nu}, it is not immediate how to justify passing to the limit and obtain either a strong well-posedness theory or a weak solution theory for the limiting equation (1). (For instance, weak solutions to the Navier-Stokes equations (or the approximate solutions used to create such weak solutions) have {\nabla u} lying in {L^2_{t,loc} L^2_x} for {\nu>0}, but the bound on the norm is {O(\nu^{-1/2})} and so one could lose this regularity in the limit {\nu \rightarrow 0}, at which point it is not clear how to ensure that the nonlinear term {u_j u} still converges in the sense of distributions to what one expects.)

Nevertheless, by carefully using the energy method (which we will do loosely following an approach of Bertozzi and Majda), it is still possible to obtain local-in-time estimates on (high-regularity) solutions to (3) that are uniform in the limit {\nu \rightarrow 0}. Such a priori estimates can then be combined with a number of variants of these estimates to obtain a satisfactory local well-posedness theory for the Euler equations. Among other things, we will be able to establish the Beale-Kato-Majda criterion – smooth solutions to the Euler (or Navier-Stokes) equations can be continued indefinitely unless the integral

\displaystyle  \int_0^{T_*} \| \omega(t) \|_{L^\infty_x( {\bf R}^d \rightarrow \wedge^2 {\bf R}^d )}\ dt

becomes infinite at the final time {T_*}, where {\omega := \nabla \wedge u} is the vorticity field. The vorticity has the important property that it is transported by the Euler flow, and in two spatial dimensions this can be used to establish global regularity for both the Euler and Navier-Stokes equations. (Unfortunately, in three and higher dimensions the phenomenon of vortex stretching has frustrated all attempts to date to use the vorticity transport property to establish global regularity of either equation in this setting.)

There is a rather different approach to establishing local well-posedness for the Euler equations, which relies on the vorticity-stream formulation of these equations. This will be discussed in a later set of notes.

— 1. A priori bounds —

We now develop some a priori bounds for very smooth solutions to Navier-Stokes that are uniform in the viscosity {\nu}. Define an {H^\infty} function to be a function that lies in every {H^s} space; similarly define an {L^p_t H^\infty_x} function to be a function that lies in {L^p_t H^s_x} for every {s}. Given divergence-free {H^\infty({\bf R}^d \rightarrow {\bf R}^d)} initial data {u_0}, an {H^\infty} mild solution {u} to the Navier-Stokes initial value problem (3) is a solution that is an {H^s} mild solution for all {s}. From the (non-periodic version of) Corollary 40 of Notes 1, we know that for any {H^\infty({\bf R}^d \rightarrow {\bf R}^d)} divergence-free initial data {u_0}, there is a unique {H^\infty} maximal Cauchy development {u: [0,T_*) \times {\bf R}^d \rightarrow {\bf R}^d}, with {\|u\|_{L^\infty_t L^\infty_x([0,T_*) \times {\bf R}^d)}} infinite if {T_*} is finite.

Here are our first bounds:

Theorem 1 (A priori bound) Let {u: [0,T_*) \times {\bf R}^d \rightarrow {\bf R}^d} be an {H^\infty} maximal Cauchy development to (3) with initial data {u_0}.

  • (i) For any integer {s > \frac{d}{2}+1}, we have

    \displaystyle  T_* \gtrsim_{s,d} \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}^{-1}.

    Furthermore, if {0 \leq t \leq c_{s,d} \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}^{-1}} for a sufficiently small constant {c_{s,d}>0} depending only on {s,d}, then

    \displaystyle  \| u(t) \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}.

  • (ii) For any {0 < T < T_*} and integer {s \geq 0}, one has

    \displaystyle  \| u \|_{L^\infty_t H^s_x([0,T] \times{\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d}

    \displaystyle \exp( O_{s,d}( \| \nabla u \|_{L^1_t L^\infty_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^{d^2})} ) ) \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}.

The hypothesis that {s} is integer can be dropped by more heavily exploiting the theory of paraproducts, but we shall restrict attention to integer {s} for simplicity.

We now prove this theorem using the energy method. Using the Navier-Stokes equations, we see that {u, p} and {\partial_t u} all lie in {L^\infty_t H^\infty_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} for any {0 < T < T_*}; an easy iteration argument then shows that the same is true for all higher derivatives of {u,p} also. This will make it easy to justify the differentiation under the integral sign that we shall shortly perform.

Let {s \geq 0} be an integer. For each time {t \in [0,T)}, we introduce the energy-type quantity

\displaystyle  E(t) := \sum_{m=0}^s \frac{1}{2} \int_{{\bf R}^d} |\nabla^m u(t,x)|^2\ dx.

Here we think of {\nabla^m u} as taking values in the Euclidean space {{\bf R}^{d^{m+1}}}. This quantity is of course comparable to {\| u(t) \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}^2}, up to constants depending on {d,s}. It is easy to verify that {E(t)} is continuously differentiable in time, with derivative

\displaystyle  \partial_t E(t) = \sum_{m=0}^s \int_{{\bf R}^d} \nabla^m u \cdot \nabla^m \partial_t u\ dx,

where we suppress explicit dependence on {t,x} in the integrand for brevity. We now try to bound this quantity in terms of {E(t)}. We expand the right-hand side in coordinates using (3) to obtain

\displaystyle  \partial_t E(t) = -A - B +C

where

\displaystyle  A := \sum_{m=0}^s \int_{{\bf R}^d} \nabla^m u_i \cdot \nabla^m (u_j \partial_j u_i)\ dx

\displaystyle  B := \sum_{m=0}^s \int_{{\bf R}^d} \nabla^m u_i \cdot \nabla^m \partial_i p\ dx

\displaystyle  C := \nu \sum_{m=0}^s \int_{{\bf R}^d} \nabla^m u_i \cdot \nabla^m \partial_j \partial_j u_i\ dx.

For {B}, we can integrate by parts to move the {\partial_i} operator onto {u_i} and use the divergence-free nature {\partial_i u_i=0} of {u} to conclude that {B=0}. Similarly, we may integrate by parts for {C} to move one copy of {\partial_j} over to the other factor in the integrand to conclude

\displaystyle  C = - \nu \sum_{m=0}^s \int_{{\bf R}^d} |\nabla^{m+1} u|^2\ dx

so in particular {C \leq 0} (note that as we are seeking bounds that are uniform in {\nu}, we can’t get much further use out of {C} beyond this bound). Thus we have

\displaystyle  \partial_t E(t) \leq -A.

Now we expand out {A} using the Leibniz rule. There is one dangerous term, in which all the derivatives in {\nabla^m (u_j \partial_j u_i)} fall on the {u_i} factor, giving rise to the expression

\displaystyle  \sum_{m=0}^s \int_{{\bf R}^d} u_j \nabla^m u_i \cdot \nabla^m \partial_j u_i\ dx.

But we can locate a total derivative to write this as

\displaystyle  \frac{1}{2} \sum_{m=0}^s \int_{{\bf R}^d} u_j \partial_j |\nabla^m u|^2\ dx,

and then an integration by parts using {\partial_j u_j=0} as before shows that this term vanishes. Estimating the remaining contributions to {A} using the triangle inequality, we arrive at the bound

\displaystyle  |A| \lesssim_{s,d} \sum_{m=1}^s \sum_{a=1}^m \int_{{\bf R}^d} |\nabla^m u| |\nabla^a u| |\nabla^{m-a+1} u|\ dx.
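The cancellation of the dangerous term rests on the identity {\int u_j \partial_j g\ dx = -\int (\partial_j u_j) g\ dx = 0} for divergence-free {u}, applied with {g = |\nabla^m u|^2}. The sketch below checks this numerically on the torus {[0,2\pi)^2} (an assumption replacing {{\bf R}^d}), with a hypothetical smooth positive {g} standing in for {|\nabla^m u|^2}, and contrasts it with a field that is not divergence-free.

```python
import numpy as np

N, L = 64, 2 * np.pi
x = np.arange(N) * L / N
X, Y = np.meshgrid(x, x, indexing="ij")
k = np.fft.fftfreq(N, d=L / N) * 2 * np.pi
KX, KY = np.meshgrid(k, k, indexing="ij")

def ddx(f, K):
    """Spectral derivative of a periodic grid function (K = KX or KY)."""
    return np.real(np.fft.ifft2(1j * K * np.fft.fft2(f)))

def transport_integral(w1, w2, g):
    """Riemann sum for the integral of w_j d_j g over the torus."""
    return np.sum(w1 * ddx(g, KX) + w2 * ddx(g, KY)) * (L / N) ** 2

g = np.exp(2 * np.sin(X) + np.cos(2 * Y))                # stand-in for |nabla^m u|^2
u1, u2 = np.sin(X) * np.cos(Y), -np.cos(X) * np.sin(Y)   # divergence-free
w1, w2 = np.cos(X), np.zeros_like(X)                     # divergence sin(X) != 0

I_div_free = transport_integral(u1, u2, g)   # vanishes up to roundoff
I_generic = transport_integral(w1, w2, g)    # equals the integral of sin(X) g, clearly positive
print(I_div_free, I_generic)
```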

At this point we now need a variant of Proposition 35 from Notes 1:

Exercise 2 Let {a,b \geq 0} be integers. For any {f,g \in H^\infty({\bf R}^d \rightarrow {\bf R})}, show that

\displaystyle  \| |\nabla^a f| |\nabla^b g| \|_{L^2({\bf R}^d \rightarrow {\bf R})} \lesssim_{a,b,d} \| f \|_{L^\infty({\bf R}^d \rightarrow {\bf R})} \| g \|_{H^{a+b}({\bf R}^d \rightarrow {\bf R})}

\displaystyle + \| f \|_{H^{a+b}({\bf R}^d \rightarrow {\bf R})} \| g \|_{L^\infty({\bf R}^d \rightarrow {\bf R})}.

(Hint: for {a=0} or {b=0}, use Hölder’s inequality. Otherwise, use a suitable Littlewood-Paley decomposition.)

Using this exercise and Hölder’s inequality, we see that

\displaystyle  \int_{{\bf R}^d} |\nabla^m u| |\nabla^a u| |\nabla^{m-a+1} u| \lesssim_{a,m,d} \| \nabla^m u \|_{L^2({\bf R}^d \rightarrow {\bf R}^{d^{m+1}})} \| \nabla u \|_{L^\infty({\bf R}^d \rightarrow {\bf R}^{d^2})}

\displaystyle \| \nabla^m u \|_{L^2({\bf R}^d \rightarrow {\bf R}^{d^{m+1}})}

and thus

\displaystyle  \partial_t E(t) \leq O_{s,d}( E(t) \| \nabla u(t) \|_{L^\infty({\bf R}^d \rightarrow {\bf R}^{d^2})} ). \ \ \ \ \ (4)

By Gronwall’s inequality we conclude that

\displaystyle  E(t) \leq E(0) \exp( O_{s,d}( \| \nabla u \|_{L^1_t L^\infty_x( [0,T] \times {\bf R}^d \rightarrow {\bf R}^{d^2} )} ) )

for any {0 < T < T_*} and {t \in [0,T]}, which gives part (ii).

Now assume {s > \frac{d}{2}+1}. Then we have the Sobolev embedding

\displaystyle  \| \nabla u(t) \|_{L^\infty({\bf R}^d \rightarrow {\bf R}^{d^2})} \lesssim_{s,d} E(t)^{1/2}

which when inserted into (4) yields the differential inequality

\displaystyle  \partial_t E(t) \leq O_{s,d}( E(t)^{3/2} )

or equivalently

\displaystyle  \partial_t E(t)^{-1/2} \geq - C_{s,d}

for some constant {C_{s,d}} (strictly speaking one should work with {(\varepsilon + E(t))^{-1/2}} for some small {\varepsilon>0} which one sends to zero later, if one wants to avoid the possibility that {E(t)} vanishes, but we will ignore this small technicality for sake of exposition.) Since {E(0)^{-1/2} \gtrsim_{s,d} \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}^{-1}}, we conclude that {E(t)} stays bounded for a time interval of the form {0 \leq t < \min( c_{s,d} \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}^{-1}, T_*)}; this, together with the blowup criterion that {\|u(t)\|_{H^s}} must go to infinity as {t \rightarrow T_*}, gives part (i).
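The comparison argument can be illustrated by integrating the extremal case {\partial_t E = K E^{3/2}} of the differential inequality, where {K} is a hypothetical stand-in for the implied constant (so {C_{s,d} = K/2}). Its exact solution {E(t) = (E(0)^{-1/2} - Kt/2)^{-2}} blows up at {t = 2/(K E(0)^{1/2})}, which scales like {\| u_0 \|_{H^s}^{-1}} as in Theorem 1(i). The sketch compares a classical fourth-order Runge-Kutta integration with this closed form at half the blowup time.

```python
import numpy as np

def rk4(f, y0, t_end, n):
    """Classical fourth-order Runge-Kutta for y' = f(t, y) on [0, t_end] with n steps."""
    h, t, y = t_end / n, 0.0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y = y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

K, E0 = 1.0, 1.0                     # hypothetical constant and initial energy
T_blowup = 2 / (K * np.sqrt(E0))     # time at which E^{-1/2} = E0^{-1/2} - K t / 2 hits zero

def E_exact(t):
    return (E0 ** -0.5 - K * t / 2) ** -2

t = 0.5 * T_blowup
E_num = rk4(lambda s, E: K * E ** 1.5, E0, t, 1000)
print(E_num, E_exact(t))             # both approximately 4
```

Any {E} obeying the inequality stays below this extremal solution, which is why {E} remains bounded on an interval of length comparable to {E(0)^{-1/2}}.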

As a consequence, we can now obtain local existence for the Euler equations from smooth data:

Corollary 3 (Local existence for smooth solutions) Let {u_0 \in H^\infty({\bf R}^d \rightarrow {\bf R}^d)} be divergence-free. Let {s > \frac{d}{2}+1} be an integer, and set

\displaystyle  T := c_{s,d} \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}^{-1}.

Then there is a smooth solution {u: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d}, {p: [0,T] \times {\bf R}^d \rightarrow {\bf R}} to (1) with all derivatives of {u,p} in {L^\infty_t H^\infty([0,T] \times {\bf R}^d \rightarrow {\bf R}^m)} for appropriate {m}. Furthermore, for any integer {s' \geq 0}, one has

\displaystyle  \| u \|_{L^\infty_t H^{s'}_x([0,T] \times{\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,s',d} \| u_0 \|_{H^{s'}_x({\bf R}^d \rightarrow {\bf R}^d)}. \ \ \ \ \ (5)

Proof: We use the compactness method, which will be more powerful here than in the last section because we have much higher regularity uniform bounds (but they are only local in time rather than global). Let {\nu_n > 0} be a sequence of viscosities going to zero. By the local existence theory for Navier-Stokes (Corollary 40 of Notes 1), for each {n} we have a maximal Cauchy development {u^{(n)}: [0,T^{(n)}_*) \times {\bf R}^d \rightarrow {\bf R}^d}, {p^{(n)}: [0,T^{(n)}_*) \times {\bf R}^d \rightarrow {\bf R}} to the Navier-Stokes initial value problem (3) with viscosity {\nu_n} and initial data {u_0}. From Theorem 1(i), we have {T^{(n)}_* \geq T} for all {n} (if {c_{s,d}} is small enough), and

\displaystyle  \| u^{(n)} \|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)}

for all {n}. By Sobolev embedding, this implies that

\displaystyle  \| \nabla u^{(n)} \|_{L^\infty_t L^\infty_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^{d^2})} \lesssim_{s,d} \| u_0 \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)},

and then by Theorem 1(ii) one has

\displaystyle  \| u^{(n)} \|_{L^\infty_t H^{s'}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^{d})} \lesssim_{s,s',d} \| u_0 \|_{H^{s'}({\bf R}^d \rightarrow {\bf R}^d)} \ \ \ \ \ (6)

for every integer {s' \geq 0}. Thus, for each {s'}, {u^{(n)}} is bounded in {L^\infty_t H^{s'}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^{d})}, uniformly in {n}. By repeatedly using (3) and product estimates for Sobolev spaces, we see the same is true for {p^{(n)}}, and for all higher derivatives of {u^{(n)}, p^{(n)}}. In particular, all derivatives of {u^{(n)}, p^{(n)}} are equicontinuous.

Using weak compactness (Proposition 2 of Notes 2), one can pass to a subsequence such that {u^{(n)}, p^{(n)}} converge weakly to some limits {u, p}, such that {u,p} and all their derivatives lie in {L^\infty_t H^{s'}_x} on {[0,T] \times {\bf R}^d}; in particular, {u,p} are smooth. From the Arzelà-Ascoli theorem (and Proposition 3 of Notes 2), {u^{(n)}} and {p^{(n)}} converge locally uniformly to {u,p}, and similarly all derivatives of {u^{(n)}, p^{(n)}} converge locally uniformly to the corresponding derivatives of {u,p}. One can then take limits in (3) (the viscous term {\nu_n \Delta u^{(n)}} goes to zero since {\nu_n \rightarrow 0}) and conclude that {u,p} solve (1). The bound (5) follows from taking limits in (6). \Box

Remark 4 We are able to easily pass to the zero viscosity limit here because our domain {{\bf R}^d} has no boundary. In the presence of a boundary, we cannot freely differentiate in space as casually as we have been doing above, and one no longer has bounds on higher derivatives on {u} and {p} near the boundary that are uniform in the viscosity. Instead, it is possible for the fluid to form a thin boundary layer that has a non-trivial effect on the limiting dynamics. We hope to return to this topic in a future set of notes.

We have constructed a local smooth solution to the Euler equations from smooth data, but have not yet established uniqueness or continuous dependence on the data; related to the latter point, we have not extended the construction to larger classes of initial data than the smooth class {H^\infty}. To accomplish these tasks we need a further a priori estimate, now involving differences of two solutions, rather than just bounding a single solution:

Theorem 5 (A priori bound for differences) Let {R>0}, let {s > \frac{d}{2}+1} be an integer, and let {u_0, v_0 \in H^\infty({\bf R}^d \rightarrow {\bf R}^d)} be divergence-free with {H^s({\bf R}^d \rightarrow {\bf R}^d)} norm at most {R}. Let

\displaystyle  0 < T \leq c_{s,d} R^{-1}

where {c_{s,d}>0} is sufficiently small depending on {s,d}. Let {u: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d} and {p: [0,T] \times {\bf R}^d \rightarrow {\bf R}} be an {H^\infty} solution to (1) with initial data {u_0} (this exists thanks to Corollary 3), and let {v: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d} and {q: [0,T] \times {\bf R}^d \rightarrow {\bf R}} be an {H^\infty} solution to (1) with initial data {v_0}. Then one has

\displaystyle  \|u-v\|_{L^\infty_t H^{s-1}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \|u_0-v_0\|_{H^{s-1}_x({\bf R}^d \rightarrow {\bf R}^d)} \ \ \ \ \ (7)

and

\displaystyle  \|u-v\|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \|u_0-v_0\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)} \ \ \ \ \ (8)

\displaystyle  + T \|u_0-v_0\|_{H^{s-1}_x({\bf R}^d \rightarrow {\bf R}^d)} \| v_0 \|_{H^{s+1}({\bf R}^d \rightarrow {\bf R}^d)}.

Note the asymmetry between {u} and {v} in (8): this estimate requires control on the initial data {v_0} in the high regularity space {H^{s+1}} in order to be usable, but has no such requirement on the initial data {u_0}. This asymmetry will be important in some later applications.

Proof: From Corollary 3 we have

\displaystyle  \| u \|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}, \| v \|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} R \ \ \ \ \ (9)

and

\displaystyle  \| v \|_{L^\infty_t H^{s+1}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \| v_0 \|_{H^{s+1}_x({\bf R}^d \rightarrow {\bf R}^d)}. \ \ \ \ \ (10)

Now we need bounds on the difference {w := u-v}. Initially we have {w(0)=w_0}, where {w_0 := u_0-v_0}. To evolve later in time, we will need to use the energy method. Subtracting (1) for {(u,p)} and {(v,q)}, we have

\displaystyle  \partial_t w + w \cdot \nabla v + u \cdot \nabla w = - \nabla (p-q)

\displaystyle  \nabla \cdot w = 0.
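For the reader who wants the intermediate step: the nonlinear terms combine through the following elementary identity (obtained by adding and subtracting {(u \cdot \nabla) v}), while {\nabla \cdot w = 0} is immediate from the linearity of the divergence.

\displaystyle  (u \cdot \nabla) u - (v \cdot \nabla) v = (u \cdot \nabla)(u-v) + ((u-v) \cdot \nabla) v = u \cdot \nabla w + w \cdot \nabla v.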

By hypothesis, all derivatives of {w} and {p-q} lie in {L^\infty_t H^\infty_x} on {[0,T] \times {\bf R}^d}, which will allow us to justify the manipulations below without difficulty. We introduce the low regularity energy for the difference:

\displaystyle  E^{s-1}(t) = \frac{1}{2} \sum_{m=0}^{s-1} \int_{{\bf R}^d} \nabla^m w \cdot \nabla^m w\ dx.

Arguing as in the proof of Proposition 1, we see that

\displaystyle  \partial_t E^{s-1}(t) = -A - B

where

\displaystyle  A := \sum_{m=0}^{s-1} \int_{{\bf R}^d} \nabla^m w_i \cdot \nabla^m (w_j \partial_j v_i + u_j \partial_j w_i)\ dx

\displaystyle  B := \sum_{m=0}^{s-1} \int_{{\bf R}^d} \nabla^m w_i \cdot \nabla^m \partial_i (p-q)\ dx.

As before, the divergence-free nature of {w} ensures that {B} vanishes. For {A}, we use the Leibniz rule and again extract out the dangerous term

\displaystyle  \sum_{m=0}^{s-1} \int_{{\bf R}^d} u_j \nabla^m w_i \cdot \nabla^m \partial_j w_i\ dx,

which again vanishes by integration by parts. We then use the triangle inequality to bound

\displaystyle  |A| \lesssim_{s,d} \sum_{m=0}^{s-1} \sum_{a=0}^m \int_{{\bf R}^d} |\nabla^m w| |\nabla^a w| |\nabla^{m-a+1} v| \ dx

\displaystyle  + \sum_{m=1}^{s-1} \sum_{a=1}^m \int_{{\bf R}^d} |\nabla^m w| |\nabla^a w| |\nabla^{m-a+1} v| + |\nabla^m w| |\nabla^a u| |\nabla^{m-a+1} w| \ dx.

The key point here is that at most {s-1} derivatives are being applied to {w} at any given time, although the full {s} derivatives may also hit {u} or {v}. Using Exercise 2 and Hölder, we may bound the above expression by

\displaystyle  \lesssim_{s,d} \sum_{m=0}^{s-1} \| \nabla^m w\|_{L^2} ( \| w \|_{L^\infty} \| \nabla^{m+1} v \|_{L^2} + \| \nabla^m w \|_{L^2} \| \nabla v \|_{L^\infty})

\displaystyle  + \sum_{m=1}^{s-1} \| \nabla^m w\|_{L^2} ( \| \nabla u \|_{L^\infty} \| \nabla^{m} w \|_{L^2} + \| \nabla^{m+1} u \|_{L^2} \| w \|_{L^\infty})

which by Sobolev embedding gives

\displaystyle  \lesssim_{s,d} E^{s-1}(t) ( \| v(t) \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)} + \| u(t) \|_{H^s({\bf R}^d \rightarrow {\bf R}^d)} ).

Applying (9) and Gronwall’s inequality, we conclude that

\displaystyle  E^{s-1}(t) \lesssim_{s,d} E^{s-1}(0)

for {0 \leq t \leq T}, and (7) follows.
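Spelled out, the Gronwall step runs as follows (a routine sketch; {C'_{s,d}} here is just a name for the implied constant in the previous display). Since {\partial_t E^{s-1}(t) = -A \leq |A|}, combining that display with (9) gives {\partial_t E^{s-1}(t) \leq C'_{s,d} R E^{s-1}(t)} for {0 \leq t \leq T}, and hence

\displaystyle  E^{s-1}(t) \leq E^{s-1}(0) \exp( C'_{s,d} R t ) \leq \exp( C'_{s,d} c_{s,d} ) E^{s-1}(0),

where we used {t \leq T \leq c_{s,d} R^{-1}}. Since {E^{s-1}(t)} is comparable to {\|w(t)\|_{H^{s-1}}^2}, taking square roots yields (7).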

Now we work with the high regularity energy

\displaystyle  E^{s}(t) = \frac{1}{2} \sum_{m=0}^{s} \int_{{\bf R}^d} \nabla^m w \cdot \nabla^m w\ dx.

Arguing as before we have

\displaystyle  \partial_t E^s(t) \lesssim_{s,d} \sum_{m=0}^{s} \sum_{a=0}^m \int_{{\bf R}^d} |\nabla^m w| |\nabla^a w| |\nabla^{m-a+1} v| \ dx

\displaystyle  + \sum_{m=1}^{s} \sum_{a=1}^m \int_{{\bf R}^d} |\nabla^m w| |\nabla^a w| |\nabla^{m-a+1} v| + |\nabla^m w| |\nabla^a u| |\nabla^{m-a+1} w| \ dx.

Using Exercise 2 and Hölder, we may bound this by

\displaystyle  \lesssim_{s,d} \sum_{m=0}^{s} \| \nabla^m w\|_{L^2} ( \| w \|_{L^\infty} \| \nabla^{m+1} v \|_{L^2} + \| \nabla^m w \|_{L^2} \| \nabla v \|_{L^\infty})

\displaystyle  + \sum_{m=1}^{s} \| \nabla^m w\|_{L^2} ( \| \nabla u \|_{L^\infty} \| \nabla^{m} w \|_{L^2} + \| \nabla^{m} u \|_{L^2} \| \nabla w \|_{L^\infty}).

Using Sobolev embedding we thus have

\displaystyle  \partial_t E^s(t) \lesssim_{s,d} E^s(t)^{1/2} E^{s-1}(t)^{1/2} \|v(t)\|_{H^{s+1}} + E^s(t) \|v(t)\|_{H^s}

\displaystyle  + E^s(t) \|u(t)\|_{H^s} + E^s(t) \|u(t) \|_{H^s}

and hence by (9), (10), (7)

\displaystyle  \partial_t E^s(t) \lesssim_{s,d} E^s(t)^{1/2} \| w_0 \|_{H^{s-1}} \|v_0\|_{H^{s+1}} + R E^s(t).

By the chain rule, we obtain

\displaystyle  \partial_t (E^s(t)^{1/2}) \lesssim_{s,d} \| w_0 \|_{H^{s-1}} \|v_0\|_{H^{s+1}} + R E^s(t)^{1/2}

(one can work with {(\varepsilon + E^s(t))^{1/2}} in place of {E^s(t)^{1/2}} and then send {\varepsilon \rightarrow 0} later if one wishes to avoid a lack of differentiability at {0}). By Gronwall’s inequality, we conclude that

\displaystyle  E^s(t)^{1/2} \lesssim_{s,d} E^s(0)^{1/2} + T \| w_0 \|_{H^{s-1}} \|v_0\|_{H^{s+1}}

for all {0 \leq t \leq T}, and (8) follows. \Box
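As a quick numerical sanity check of this last Gronwall step, one can look at the worst case of the differential inequality. The sketch below is purely illustrative and uses hypothetical parameter values. In that worst case {y = E^s(t)^{1/2}} obeys {y' = a + R y}, with {a} standing in for {\| w_0 \|_{H^{s-1}} \|v_0\|_{H^{s+1}}}. Because {RT \leq c_{s,d}}, the factor {e^{RT}} is {O(1)}, so the solution stays below {(y(0) + aT) e^{RT} \lesssim y(0) + aT}.

```python
import math

def rk4(rhs, y0, t_end, steps):
    """Classical fourth-order Runge-Kutta scheme for the scalar ODE y' = rhs(t, y)."""
    h = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h * k1 / 2)
        k3 = rhs(t + h / 2, y + h * k2 / 2)
        k4 = rhs(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

def gronwall_bound(y0, a, R, t):
    """Gronwall bound for y' <= a + R y:  y(t) <= (y0 + a t) e^{R t}."""
    return (y0 + a * t) * math.exp(R * t)

# Hypothetical values: R plays the role of the H^s radius, and T = c/R with c = 0.1.
R, a, y0 = 5.0, 0.3, 0.01
T = 0.1 / R
y_T = rk4(lambda t, y: a + R * y, y0, T, 1000)
exact = (y0 + a / R) * math.exp(R * T) - a / R  # closed form in the equality case
print(y_T, exact, gronwall_bound(y0, a, R, T))
```

The numerical solution agrees with the closed form and sits below the bound. Shrinking {c} (here 0.1) pushes the bound toward {y(0) + aT}.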

By specialising (7) (or (8)) to the case where {u_0=v_0}, we see that the solution constructed in Corollary 3 is unique. We can now extend the theory to a wider class of initial data than {H^\infty}. The following result is essentially due to Kato and to Swann (with a similar result obtained by different methods by Ebin-Marsden):

Proposition 6 Let {s > \frac{d}{2}+1} be an integer, and let {u_0 \in H^s({\bf R}^d \rightarrow {\bf R}^d)} be divergence-free. Set

\displaystyle  T := c_{s,d} \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}^{-1}

where {c_{s,d}>0} is sufficiently small depending on {s,d}. Let {u_0^{(n)} \in H^\infty({\bf R}^d \rightarrow {\bf R}^d)} be a sequence of divergence-free vector fields converging to {u_0} in {H^s} norm (for instance, one could apply Littlewood-Paley projections to {u_0}). Let {u^{(n)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d}, {p^{(n)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}} be the associated solutions to (1) provided by Corollary 3 (these are well-defined for {n} large enough). Then {u^{(n)}} and {p^{(n)}} converge in {L^\infty_t H^s_x} norm on {[0,T] \times {\bf R}^d \rightarrow {\bf R}^d} to limits {u \in C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}, {p \in C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R})} respectively, which solve (1) in a distributional sense.

Proof: We use a variant of Kato’s argument (see also the paper of Bona and Smith for a related technique). It will suffice to show that the {u^{(n)}} form a Cauchy sequence in {C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}, since the algebra properties of {H^s} then give the same for {p^{(n)}}, and one can then easily take limits (in this relatively high regularity setting) to obtain the limiting solution {u,p} that solves (1) in a distributional sense.

Let {N} be a large dyadic integer. By Corollary 3, we may find an {H^\infty} solution {v^{(N)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d}, {q^{(N)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}} to the Euler equations (1) with initial data {P_{\leq N} u_0} (which lies in {H^\infty}). From Theorem 5, one has

\displaystyle  \|u^{(n)}-v^{(N)}\|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \|u_0^{(n)}- P_{\leq N} u_0\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}

\displaystyle  + T \|u_0^{(n)}- P_{\leq N} u_0\|_{H^{s-1}_x({\bf R}^d \rightarrow {\bf R}^d)} \| P_{\leq N} u_0 \|_{H^{s+1}({\bf R}^d \rightarrow {\bf R}^d)}.

Applying the triangle inequality {\|u^{(n)}-u^{(m)}\| \leq \|u^{(n)}-v^{(N)}\| + \|u^{(m)}-v^{(N)}\|}, using {u_0^{(n)} \rightarrow u_0} in {H^s}, and then taking the limit superior as {n,m \rightarrow \infty}, we conclude that

\displaystyle  \limsup_{n,m \rightarrow \infty} \|u^{(n)}-u^{(m)}\|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d} \|u_0- P_{\leq N} u_0\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}

\displaystyle  + T \|u_0 - P_{\leq N} u_0\|_{H^{s-1}_x({\bf R}^d \rightarrow {\bf R}^d)} \| P_{\leq N} u_0 \|_{H^{s+1}({\bf R}^d \rightarrow {\bf R}^d)}.

But by Plancherel’s theorem and dominated convergence we see that

\displaystyle  N \|u_0- P_{\leq N} u_0\|_{H^{s-1}_x({\bf R}^d \rightarrow {\bf R}^d)} \rightarrow 0

\displaystyle  \|u_0- P_{\leq N} u_0\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)} \rightarrow 0

\displaystyle  N^{-1} \|P_{\leq N} u_0\|_{H^{s+1}_x({\bf R}^d \rightarrow {\bf R}^d)} \rightarrow 0

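as {N \rightarrow \infty}. The second term on the right-hand side of the previous bound equals {T ( N \|u_0 - P_{\leq N} u_0\|_{H^{s-1}_x} ) ( N^{-1} \| P_{\leq N} u_0 \|_{H^{s+1}_x} )}, so the whole right-hand side goes to zero as {N \rightarrow \infty}. Since the left-hand side does not depend on {N}, we conclude that

\displaystyle  \limsup_{n,m \rightarrow \infty} \|u^{(n)}-u^{(m)}\|_{L^\infty_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} = 0,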

giving the claim. \Box
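To see the three limits used in this proof in a concrete case, here is a small Python sketch. It makes two simplifying assumptions that are not in the text. First, it works with a one-dimensional, one-sided Fourier series, so that Plancherel reduces to weighted sums of squared coefficients. Second, it replaces the smooth projection {P_{\leq N}} by a sharp cutoff {|k| \leq N}. The hypothetical coefficients {c_k = (1+k)^{-(s+1/2+\epsilon)}} give a function that lies in {H^s} but not in {H^{s+1}}.

```python
import math

S = 2            # hypothetical regularity index s
EPS = 0.25       # c_k = (1+k)^{-(s+1/2+EPS)}: in H^s but not in H^{s+1}
K_MAX = 100_000  # truncation of the one-sided Fourier series

coeffs = [0.0] + [(1 + k) ** -(S + 0.5 + EPS) for k in range(1, K_MAX + 1)]

def weighted_sq(power, ks):
    """Plancherel: sum of (1+k^2)^power |c_k|^2 over the frequencies ks."""
    return sum((1 + k * k) ** power * coeffs[k] ** 2 for k in ks)

def lp_quantities(N):
    """The three quantities from the proof, for a sharp cutoff at frequency N."""
    high = range(N + 1, K_MAX + 1)  # removed by the cutoff: u_0 - P_{<=N} u_0
    low = range(1, N + 1)           # kept by the cutoff: P_{<=N} u_0
    tail_s = math.sqrt(weighted_sq(S, high))
    n_tail = N * math.sqrt(weighted_sq(S - 1, high))
    low_high = math.sqrt(weighted_sq(S + 1, low)) / N
    return tail_s, n_tail, low_high

results = {N: lp_quantities(N) for N in (10, 100, 1000)}
for N, (tail_s, n_tail, low_high) in results.items():
    print(N, tail_s, n_tail, low_high, n_tail * low_high)
```

All three quantities shrink as {N} grows, but the last one decays only slowly (roughly like {N^{-\epsilon}} for these coefficients). This is why the argument pairs it with the faster-decaying factor {N \|u_0 - P_{\leq N} u_0\|_{H^{s-1}}}.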

Remark 7 Since the sequence {(u^{(n)}, p^{(n)})} can converge to at most one limit {(u,p)}, we see that the solution {(u,p)} to (1) is unique in the class of distributional solutions that are limits of smooth solutions (with initial data of those solutions converging to {u_0} in {H^s}). However, this leaves open the possibility that there are other distributional solutions that do not arise as the limits of smooth solutions (or as limits of smooth solutions whose initial data only converge to {u_0} in a weaker sense). It is possible to recover some uniqueness results for fairly weak solutions to the Euler equations if one also assumes some additional regularity on the fields {u,p} (or on related fields such as the vorticity {\omega = \nabla \wedge u}). In two dimensions, for instance, there is a celebrated theorem of Yudovich that weak solutions to 2D Euler are unique if one has an {L^\infty} bound on the vorticity. In higher dimensions one can also obtain uniqueness results if one assumes that the solution is in a high-regularity space such as {C^0_t H^s_x}, {s > \frac{d}{2}+1}. See for instance this paper of Chae for an example of such a result.

Exercise 8 (Continuous dependence on initial data) Let {s > \frac{d}{2}+1} be an integer, let {R>0}, and set {T := c_{s,d} R^{-1}}, where {c_{s,d}>0} is sufficiently small depending on {s,d}. Let {B} be the closed ball of radius {R} around the origin of divergence-free vector fields {u_0} in {H^s_x({\bf R}^d \rightarrow {\bf R}^d)}. The above proposition provides a solution {u \in C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)} to the associated initial value problem. Show that the map from {u_0} to {u} is a continuous map from {B} to {C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}.

Remark 9 The continuity result provided by the above exercise is not as strong as in Navier-Stokes, where the solution map is in fact Lipschitz continuous (see e.g., Exercise 43 of Notes 1). In fact for the Euler equations, which is classified as a “quasilinear” equation rather than a “semilinear” one due to the lack of the dissipative term {\nu \Delta u} in the equation, the solution map is not expected to be uniformly continuous on this ball, let alone Lipschitz continuous. See this previous blog post for some more discussion.

Exercise 10 (Maximal Cauchy development) Let {s > \frac{d}{2}+1} be an integer, and let {u_0 \in H^s_x({\bf R}^d \rightarrow {\bf R}^d)} be divergence free. Show that there exists a unique {T_*>0} and unique {u \in C^0_{t,loc} H^s([0,T_*) \times {\bf R}^d \rightarrow {\bf R}^d)}, {p \in C^0_{t,loc} H^s([0,T_*) \times {\bf R}^d \rightarrow {\bf R})} with the following properties:

Furthermore, show that {T_*, u, p} do not depend on the particular choice of {s}, in the sense that if {u_0} belongs to both {H^s} and {H^{s'}} for two integers {s,s' > \frac{d}{2}+1} then the time {T_*} and the fields {u,p} produced by the above claims are the same for both {s} and {s'}.

We will refine part (iii) of the above exercise in the next section. It is a major open problem whether the case {T_* < \infty} (i.e., finite time blowup) can actually occur. (It is important here that we have some spatial decay at infinity, as represented here by the presence of the {H^s_x} norm; when the solution is allowed to diverge at spatial infinity, it is not difficult to construct smooth solutions to the Euler equations that blow up in finite time; see e.g., this article of Stuart for an example.)

Remark 11 The condition {s > \frac{d}{2}+1} that recurs in the above results can be explained using the heuristics from Section 5 of Notes 1. Assume that at a given time {t}, the velocity field {u} fluctuates at a spatial frequency {N \gtrsim 1}, with the fluctuations being of amplitude {A}. (We however permit the velocity field {u} to contain a “bulk” low frequency component which can have much higher amplitude than {A}; for instance, the first component {u_1} of {u} might take the form {u_1 = B + A \cos( N x_2)} where {B} is a quantity much larger than {A}.) Suppose one considers the trajectories of two particles {P,Q} whose separation at time zero is comparable to the wavelength {1/N} of the frequency oscillation. Then the velocities of {P,Q} will differ by about {A}, so one would expect the particles to stay roughly the same distance from each other up to time {\sim \frac{1}{AN}}, and then exhibit more complicated and unpredictable behaviour after that point. Thus the natural time scale {T} here is {T \sim \frac{1}{AN}}, so one only expects to have a reasonable local well-posedness theory in the regime

\displaystyle  \frac{1}{AN} \gtrsim 1. \ \ \ \ \ (11)

On the other hand, if {u_0} lies in {H^s}, and the frequency {N} fluctuations are spread out over a set of volume {V}, the heuristics from the previous notes predict that

\displaystyle  N^s A V^{1/2} \lesssim 1.

The uncertainty principle predicts {V \gtrsim N^{-d}}, and so

\displaystyle  \frac{1}{AN} \gtrsim N^{s - \frac{d}{2} - 1}.

Thus we force the regime (11) to occur if {s > \frac{d}{2}+1}, and barely have a chance of doing so in the endpoint case {s = \frac{d}{2}+1}, but would not expect to have a local theory (at least using the sort of techniques deployed in this section) for {s < \frac{d}{2} + 1}.
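The arithmetic of this heuristic can be tabulated directly. The sketch below uses a hypothetical helper, time_scale, with all implied constants set to one. It takes the fluctuation volume {V = N^{-d}}, the smallest value allowed by the uncertainty principle, and takes {A} as large as the constraint {N^s A V^{1/2} \lesssim 1} permits.

```python
def time_scale(N, s, d):
    """Heuristic time scale 1/(A N) from Remark 11, all implied constants set to one.

    The fluctuation volume is V = N^{-d} (uncertainty principle), and the
    amplitude A is as large as N^s A V^{1/2} <= 1 allows.
    """
    volume = N ** (-d)
    amplitude = N ** (-s) * volume ** (-0.5)
    return 1.0 / (amplitude * N)

# In d = 3 the threshold is s = d/2 + 1 = 2.5.
for s in (2, 2.5, 3):
    print(s, [round(time_scale(2.0 ** j, s, 3), 3) for j in (2, 6, 10)])
```

Above the threshold the time scale grows with {N}. At the endpoint it is pinned at one, and below the threshold it collapses as {N \rightarrow \infty}, matching the trichotomy described above.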

Exercise 12 Use similar heuristics to explain the relevance of quantities of the form {\| \nabla u \|_{L^1_t L^\infty_x}} that occur in various places in this section.

Because the solutions constructed in Exercise 10 are limits (in rather strong topologies) of smooth solutions, it is fairly easy to extend estimates and conservation laws that are known for smooth solutions to these slightly less regular solutions. For instance:

Exercise 13 Let {s, u_0, T_*, u, p} be as in Exercise 10.

  • (i) (Energy conservation) Show that {\|u(t)\|_{L^2_x({\bf R}^d \rightarrow {\bf R}^d)} = \| u_0 \|_{L^2_x({\bf R}^d \rightarrow {\bf R}^d)}} for all {0 \leq t < T_*}.
  • (ii) Show that

    \displaystyle  \| u(t) \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)} \lesssim_{s,d}

    \displaystyle  \exp( O_{s,d}( \int_0^t \| \nabla u(t')\|_{L^\infty_x({\bf R}^d \rightarrow {\bf R}^{d^2})}\ dt' )) \| u_0 \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}

    for all {0 \leq t < T_*}.

Exercise 14 (Vanishing viscosity limit) Let the notation and hypotheses be as in Corollary 3. For each {\nu>0}, let {u^{(\nu)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}^d}, {p^{(\nu)}: [0,T] \times {\bf R}^d \rightarrow {\bf R}} be the solution to (3) with this choice of viscosity and with initial data {u_0}. Show that as {\nu \rightarrow 0}, {u^{(\nu)}} and {p^{(\nu)}} converge locally uniformly to {u,p}, and similarly for all derivatives of {u^{(\nu)}} and {p^{(\nu)}}. (In other words, there is actually no need to pass to a subsequence as is done in the proof of Corollary 3.) Hint: apply the energy method to control the difference {u^{(\nu)} - u}.

Exercise 15 (Local existence for forced Euler) Let {u_0 \in H^\infty_x({\bf R}^d \rightarrow {\bf R}^d)} be divergence-free, and let {F \in C^\infty_{t,loc} H^\infty([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^d)}, thus {F} is smooth and for any {T>0} and any integer {j \geq 0} and {s>0}, {\partial_t^j F \in C^0_t H^s_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}. Show that there exists {T>0} and a smooth solution {(u,p)} to the forced Euler equation

\displaystyle  \partial_t u + (u \cdot \nabla) u = - \nabla p + F

\displaystyle  \nabla \cdot u = 0

\displaystyle  p = - \Delta^{-1} \nabla \cdot \nabla \cdot (u \otimes u)

\displaystyle  u(0) = u_0.

Note: one will first need a local existence theory for the forced Navier-Stokes equation. It is also possible to develop forced analogues of most of the other results in this section, but we will not detail this here.

— 2. The Beale-Kato-Majda blowup criterion —

In Exercise 10 we saw that we could continue {H^s} solutions ({s > \frac{d}{2}+1}) to the Euler equations indefinitely in time, unless the integral {\int_0^{T_*} \| \nabla u(t) \|_{L^\infty_x({\bf R}^d \rightarrow {\bf R}^{d^2})}\ dt} became infinite at some finite time {T_*}. There is an important refinement of this blowup criterion, due to Beale, Kato, and Majda, in which the tensor {\nabla u} is replaced by the vorticity two-form (or vorticity, for short)

\displaystyle  \omega := \nabla \wedge u,

that is to say {\omega} is essentially the anti-symmetric component of {\nabla u}. Whereas {\nabla u} is the tensor field

\displaystyle  (\nabla u)_{ij} = \partial_i u_j,

{\omega} is the anti-symmetric tensor field

\displaystyle  \omega_{ij} = \partial_i u_j - \partial_j u_i. \ \ \ \ \ (12)

Remark 16 In two dimensions, {\omega} is essentially a scalar, since {\omega_{11}=\omega_{22}=0} and {\omega_{12} = -\omega_{21}}. As such, it is common in fluid mechanics to refer to the scalar field {\omega_{12} = \partial_1 u_2 - \partial_2 u_1} as the vorticity, rather than the two-form {\omega}. In three dimensions, there are three independent components {\omega_{23}, \omega_{31}, \omega_{12}} of the vorticity, and it is common to view {\omega} as a vector field {\vec \omega = (\omega_{23}, \omega_{31}, \omega_{12})} rather than a two-form in this case (actually, to be precise, {\vec \omega} would be a pseudovector field rather than a vector field, because it behaves slightly differently to vectors with respect to changes of coordinate). With this interpretation, the vorticity is now the curl of the velocity field {u}. From a differential geometry viewpoint, one can view the two-form {\omega} as an antisymmetric bilinear map from vector fields {X,Y} to scalar functions {\omega(X,Y)}, and the relation between the vorticity two-form {\omega} and the vorticity (pseudo-)vector field {\vec \omega} in {{\bf R}^3} is given by the identity

\displaystyle  \omega(X,Y) = \mathrm{vol}( \vec \omega, X, Y )

for arbitrary vector fields {X,Y}, where {\mathrm{vol} = dx_1 \wedge dx_2 \wedge dx_3} is the volume form on {{\bf R}^3}, which can be viewed in three dimensions as an antisymmetric trilinear form on vector fields. The fact that {\vec \omega} is a pseudovector rather than a vector then arises from the fact that the volume form changes sign upon applying a reflection.
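As a concrete illustration of (12) and of the pseudovector convention above, the following Python sketch approximates {\omega_{ij} = \partial_i u_j - \partial_j u_i} at a point by central differences. It then checks that {(\omega_{23}, \omega_{31}, \omega_{12})} reproduces the familiar curl. The step size and the two sample velocity fields are hypothetical choices made for illustration, and indices in the code are 0-based.

```python
def vorticity(u, x, h=1e-5):
    """Approximate omega_ij = d_i u_j - d_j u_i at the point x by central differences.

    u maps a list of d coordinates to a sequence of d velocity components;
    indices are 0-based, so omega[0][1] is omega_12 in the notation of (12).
    """
    d = len(x)
    grad = [[0.0] * d for _ in range(d)]  # grad[i][j] approximates d_i u_j
    for i in range(d):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        up, um = u(xp), u(xm)
        for j in range(d):
            grad[i][j] = (up[j] - um[j]) / (2 * h)
    return [[grad[i][j] - grad[j][i] for j in range(d)] for i in range(d)]

def pseudovector(omega):
    """Three-dimensional vorticity vector (omega_23, omega_31, omega_12)."""
    return (omega[1][2], omega[2][0], omega[0][1])

def rotation(x):
    """Rigid rotation about the x_3 axis; its curl is (0, 0, 2)."""
    return (-x[1], x[0], 0.0)

def shear_field(x):
    """Divergence-free field (x_2 x_3, 0, x_1^2); its curl is (0, x_2 - 2 x_1, -x_3)."""
    return (x[1] * x[2], 0.0, x[0] ** 2)

print(pseudovector(vorticity(rotation, [0.3, -0.7, 1.1])))
print(pseudovector(vorticity(shear_field, [1.0, 3.0, 2.0])))
```

Since the sample fields are polynomials of degree at most two, central differences are exact up to rounding. For a rigid rotation in the plane, the scalar vorticity {\omega_{12}} is the constant 2, twice the angular velocity.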

The point is that vorticity behaves better under the Euler flow than the full derivative {\nabla u}. Indeed, if one takes a smooth solution to the Euler equation in coordinates

\displaystyle  \partial_t u_j + u_k \partial_k u_j = -\partial_j p

and applies {\partial_i} to both sides, one obtains

\displaystyle  \partial_t \partial_i u_j + \partial_i u_k \partial_k u_j + u_k \partial_k \partial_i u_j = -\partial_i \partial_j p.

If one interchanges {i,j} and then subtracts, the pressure terms disappear, and one is left with

\displaystyle  \partial_t \omega_{ij} + \partial_i u_k \partial_k u_j - \partial_j u_k \partial_k u_i + u_k \partial_k \omega_{ij} = 0

which we can rearrange using the material derivative {D_t = \partial_t + u_k \partial_k} as

\displaystyle  D_t \omega_{ij} - \partial_j u_k \partial_k u_i + \partial_i u_k \partial_k u_j = 0.

Writing {\partial_k u_i = -\omega_{ik} + \partial_i u_k} and {\partial_k u_j = - \omega_{jk} + \partial_j u_k}, this becomes the vorticity equation

\displaystyle  D_t \omega_{ij} + \omega_{ik} \partial_j u_k - \omega_{jk} \partial_i u_k = 0. \ \ \ \ \ (13)

The vorticity equation is particularly simple in two and three dimensions:

Exercise 17 (Transport of vorticity) Let {u,p} be a smooth solution to Euler equation in {{\bf R}^d}, and let {\omega} be the vorticity two-form.

  • (i) If {d=2}, show that

    \displaystyle  D_t \omega_{12} = 0.

  • (ii) If {d=3}, show that

    \displaystyle  D_t \vec \omega = (\vec \omega \cdot \nabla) u

    where {\vec \omega = (\omega_{23}, \omega_{31}, \omega_{12})} is the vorticity pseudovector.

Remark 18 One can interpret the vorticity equation in the language of differential geometry, which is a more convenient formalism when working on more general Riemannian manifolds than {{\bf R}^d}. To be consistent with the conventions of differential geometry, we now write the components of the velocity field {u} as {u^i} rather than {u_i} (and the coordinates of {{\bf R}^d} as {x^i} rather than {x_i}). Define the covelocity {1}-form {v} as

\displaystyle  v = \eta_{ij} u^i dx^j

where {\eta_{ij}} is the Euclidean metric tensor (in the standard coordinates, {\eta_{ij} = \delta_{ij}} is the Kronecker delta, though {\eta_{ij}} can take other values than {\delta_{ij}} if one uses a different coordinate system). Thus in coordinates, {v_i = \eta_{ij} u^j}; in other words, the covelocity field is obtained from the velocity field by the musical isomorphism (lowering an index). The vorticity {2}-form {\omega} can then be interpreted as the exterior derivative of the covelocity, thus

\displaystyle  \omega = dv

or in coordinates

\displaystyle  \omega_{ij} = \partial_i v_j - \partial_j v_i.

The Euler equations can be rearranged as

\displaystyle  \partial_t v + \mathcal{L}_u v = - d \tilde p, \ \ \ \ \ (14)

where {\mathcal{L}_u} is the Lie derivative along {u}, which for {1}-forms is given in coordinates as

\displaystyle  \mathcal{L}_u v_i = u^j \partial_j v_i + (\partial_i u^j) v_j

and {\tilde p} is the modified pressure

\displaystyle  \tilde p := p - \frac{1}{2} u^j v_j.
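To check (14) in the standard Euclidean coordinates (where {v_i = u^i}), note that {(\partial_i u^j) v_j = \partial_i ( \frac{1}{2} u^j v_j )}. Hence, using the Euler equations,

\displaystyle  \partial_t v_i + \mathcal{L}_u v_i = \partial_t u^i + u^j \partial_j u^i + \partial_i ( \frac{1}{2} u^j v_j ) = - \partial_i p + \partial_i ( \frac{1}{2} u^j v_j ) = - \partial_i \tilde p.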

If one takes exterior derivatives of both sides of (14) using the basic differential geometry identities {d \mathcal{L}_u = \mathcal{L}_u d} and {dd = 0}, one obtains the vorticity equation

\displaystyle  \partial_t \omega + \mathcal{L}_u \omega = 0

where the Lie derivative for {2}-forms is given in coordinates as

\displaystyle  \mathcal{L}_u \omega_{ik} = u^j \partial_j \omega_{ik} + (\partial_i u^j) \omega_{jk} + (\partial_k u^j) \omega_{ij}

and so we recover (13) after some relabeling.

We now present the Beale-Kato-Majda condition.

Theorem 19 (Beale-Kato-Majda) Let {s > \frac{d}{2}+1} be an integer, and let {u_0 \in H^s_x({\bf R}^d \rightarrow {\bf R}^d)} be divergence free. Let {u \in C^0_{t,loc} H^s([0,T_*) \times {\bf R}^d \rightarrow {\bf R}^d)}, {p \in C^0_{t,loc} H^s([0,T_*) \times {\bf R}^d \rightarrow {\bf R})} be the maximal Cauchy development from Exercise 10, and let {\omega} be the vorticity.

The double exponential in (i) is not a typo! It is an open question, though, whether this double exponential bound can be improved at all, even in the simplest case of two spatial dimensions.

We turn to the proof of this theorem. Part (ii) will be implied by part (i), since if {\| \omega\|_{L^1_t L^\infty_x( [0,T_*) \times {\bf R}^d \rightarrow {\bf R}^{d^2} )}} is finite then part (i) gives a uniform bound on {\|u(t)\|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}} as {t \rightarrow T_*}, preventing finite time blowup. So it suffices to prove part (i). To do this, it suffices to do so for {H^\infty} solutions, since one can then pass to a limit (using the strong continuity in {C^0_t H^s_x}) to establish the general case. In particular, we can now assume that {u,p,u_0} are smooth.

We would like to convert control on {\omega} back to control of the full derivative {\nabla u}. If one takes divergences {\partial_i \omega_{ij}} of the vorticity using (12) and the divergence-free nature {\partial_i u_i = 0} of {u}, we see that

\displaystyle  \partial_i \omega_{ij} = \Delta u_j.
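Explicitly, by (12) and the divergence-free condition,

\displaystyle  \partial_i \omega_{ij} = \partial_i \partial_i u_j - \partial_j \partial_i u_i = \Delta u_j - \partial_j ( \nabla \cdot u ) = \Delta u_j.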

Thus, we can recover the derivative {\partial_k u_j} from the vorticity by the formula

\displaystyle  \partial_k u_j = \Delta^{-1} \partial_i \partial_k \omega_{ij}, \ \ \ \ \ (16)

where one can define {\Delta^{-1} \partial_i \partial_k} via the Fourier transform as a multiplier bounded on every {H^s} space.

If the operators {\Delta^{-1} \partial_i \partial_k} were bounded on {L^\infty_x({\bf R}^d \rightarrow {\bf R})}, then we would have

\displaystyle  \| \nabla u(t) \|_{L^\infty({\bf R}^d \rightarrow {\bf R}^{d^2})} \lesssim_d \| \omega(t)\|_{L^\infty({\bf R}^d \rightarrow \bigwedge^2 {\bf R}^d)}

and the claimed bound (15) would follow from Theorem 1(ii) (with one exponential to spare). Unfortunately, {\Delta^{-1} \partial_i \partial_k} is not quite bounded on {L^\infty}. Indeed, from Exercise 18 of Notes 1 we have the formula

\displaystyle  \Delta^{-1} \partial_i \partial_k \phi(y) = \lim_{\varepsilon \rightarrow 0} \int_{|x| > \varepsilon} K_{ik}(x) \phi(x+y)\ dx + \frac{\delta_{ik}}{d} \phi(y)

for any test function {\phi} and {y \in {\bf R}^d}, where {K_{ik}} is the singular kernel

\displaystyle K_{ik}(x) := -\frac{1}{|S^{d-1}|} (\frac{d x_i x_k}{|x|^{d+2}} - \frac{\delta_{ik}}{|x|^d}).

If one sets {\phi} to be a smooth approximation to the signum function {\mathrm{sgn}(K_{ik})} restricted to an annulus {\varepsilon \leq |x| \leq R}, and evaluates at {y=0} (so that the local term {\frac{\delta_{ik}}{d} \phi(0)} vanishes), we conclude that the operator norm of {\Delta^{-1} \partial_i \partial_k} on {L^\infty} is at least as large as

\displaystyle  \int_{\varepsilon \leq |x| \leq R} |K_{ik}(x)|\ dx.

But one can calculate using polar coordinates that this expression diverges like {\log \frac{R}{\varepsilon}} in the limit {\varepsilon \rightarrow 0}, {R \rightarrow \infty}, giving unboundedness.
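The logarithmic divergence can also be seen numerically. The Python sketch below evaluates {\int_{\varepsilon \leq |x| \leq R} |K_{12}(x)|\ dx} in {d=2} by the midpoint rule in polar coordinates, using the variable {\log r}. The grid sizes are arbitrary illustrative choices. A short polar-coordinate calculation gives exactly {\frac{2}{\pi} \log \frac{R}{\varepsilon}} in this case, since {|K_{12}(x)| = \frac{|\sin 2\theta|}{2\pi r^2}}. As a side check, the trace {\sum_i K_{ii}} vanishes identically, consistent with {\sum_i \Delta^{-1} \partial_i \partial_i} being the identity: the local terms {\frac{\delta_{ii}}{d}} sum to one.

```python
import math

def sphere_area(d):
    """Surface measure |S^{d-1}| = 2 pi^{d/2} / Gamma(d/2) of the unit sphere in R^d."""
    return 2 * math.pi ** (d / 2) / math.gamma(d / 2)

def kernel(i, k, x):
    """K_ik(x) = -(1/|S^{d-1}|) (d x_i x_k / |x|^{d+2} - delta_ik / |x|^d), 0-based indices."""
    d = len(x)
    r2 = sum(t * t for t in x)
    delta = 1.0 if i == k else 0.0
    return -(d * x[i] * x[k] / r2 ** ((d + 2) / 2) - delta / r2 ** (d / 2)) / sphere_area(d)

def annulus_integral(eps, R, n_theta=1000, n_u=20):
    """Midpoint rule for the integral of |K_12| over eps <= |x| <= R in d = 2.

    Polar coordinates with u = log r, so the area element r dr dtheta is r^2 du dtheta.
    """
    d_theta = 2 * math.pi / n_theta
    d_u = math.log(R / eps) / n_u
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        for b in range(n_u):
            r = eps * math.exp((b + 0.5) * d_u)
            x = (r * math.cos(theta), r * math.sin(theta))
            total += abs(kernel(0, 1, x)) * r * r * d_u * d_theta
    return total

for eps, R in ((1e-2, 1e2), (1e-4, 1e4), (1e-6, 1e6)):
    print(eps, R, annulus_integral(eps, R) / math.log(R / eps), 2 / math.pi)
```

The ratio to {\log \frac{R}{\varepsilon}} is essentially the constant {\frac{2}{\pi}}, so the integral grows without bound as the annulus widens.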

As it turns out, though, the Gronwall argument used to establish Theorem 1(ii) can just barely tolerate an additional “logarithmic loss” of the above form, albeit at the cost of worsening the exponential term to a double exponential one. The key lemma is the following result, which quantifies the logarithmic divergence indicated by the previous calculation and is similar in spirit to a well-known inequality of Brezis and Wainger.

Lemma 20 (Near-boundedness of {\Delta^{-1} \partial_i \partial_k}) For any {s > \frac{d}{2}+1}, one has

\displaystyle  \| \Delta^{-1} \partial_i \partial_k \omega_{ij} \|_{L^\infty({\bf R}^d \rightarrow {\bf R}^{d^2})} \lesssim_{s,d} \| \omega \|_{L^\infty({\bf R}^d \rightarrow \bigwedge^2 {\bf R}^d)} \log(2 + \| u \|_{H^s_x({\bf R}^d \rightarrow {\bf R}^d)}) \ \ \ \ \ (17)

\displaystyle  + \| u \|_{L^2_x({\bf R}^d \rightarrow {\bf R}^d)} + 1.

The lower order terms {\| u \|_{L^2_x({\bf R}^d \rightarrow {\bf R}^d)} + 1} will be easily dealt with in practice; the main point is that one can almost bound the {L^\infty} norm of {\Delta^{-1} \partial_i \partial_k \omega} by that of {\omega}, up to a logarithmic factor.

Proof: By a limiting argument we may assume that {u} (and hence {\omega}) is a test function. We apply Littlewood-Paley decomposition to write

\displaystyle \omega = P_{\leq 1} \omega + \sum_{N>1} P_N \omega

and hence by the triangle inequality we may bound the left-hand side of (17) by

\displaystyle  \| P_{\leq 1} \Delta^{-1} \partial_i \partial_k \omega_{ij} \|_{L^\infty} + \sum_{N>1} \| P_{N} \Delta^{-1} \partial_i \partial_k \omega_{ij} \|_{L^\infty}

where we omit the domain and range from the function space norms for brevity.

By Bernstein’s inequality we have

\displaystyle  \| P_{\leq 1} \Delta^{-1} \partial_i \partial_k \omega_{ij} \|_{L^\infty} \lesssim_d \| P_{\leq 1} \Delta^{-1} \partial_i \partial_k \omega_{ij} \|_{L^2} \lesssim_d \| P_{\leq 1} \omega \|_{L^2}

\displaystyle  \lesssim_d \| u \|_{L^2}.

Also, from Bernstein and Plancherel we have

\displaystyle  \| P_{N} \Delta^{-1} \partial_i \partial_k \omega_{ij} \|_{L^\infty} \lesssim_d N^{d/2} \| P_{N} \Delta^{-1} \partial_i \partial_k \omega_{ij} \|_{L^2}

\displaystyle  \lesssim_d N^{d/2} \| P_{N} \omega \|_{L^2}

\displaystyle  \lesssim_d N^{\frac{d}{2}+1-s} \| u \|_{H^s}

and hence by geometric series we have

\displaystyle  \sum_{N > N_0} \| P_{N} \Delta^{-1} \partial_i \partial_k \omega_{ij} \|_{L^\infty} \lesssim_{s,d} N_0^{\frac{d}{2}+1-s} \| u \|_{H^s}

for any {N_0>1}. This gives an acceptable contribution if we select {N_0 := (2+\| u \|_{H^s})^{1/(s-d/2-1)}}, since then {N_0^{\frac{d}{2}+1-s} \| u \|_{H^s} = \frac{\| u \|_{H^s}}{2+\| u \|_{H^s}} \leq 1}. This leaves {O_{s,d}( \log(2 + \| u\|_{H^s}))} remaining values of {N} to control, so if one can bound

\displaystyle  \| P_{N} \Delta^{-1} \partial_i \partial_k \phi \|_{L^\infty} \lesssim_d \| \phi \|_{L^\infty} \ \ \ \ \ (18)

for each {N > 1}, we will be done.
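The choice of {N_0} can be sanity-checked numerically. The sketch below is a toy model, not part of the proof. It sets all implied constants to {1} and takes the dyadic pieces to be of size {N^{d/2+1-s} H}, where the extra power of {N} reflects that {\omega} is one derivative of {u}. Here {H} is a hypothetical value of {\| u \|_{H^s}}, and the model requires {s > d/2+1}. The code sums the tail over dyadic {N > N_0} and counts the dyadic {N} in {(1, N_0]} that remain to be handled by (18).

```python
import math

def high_frequency_tail(s, d, H, extra_terms=2000):
    """Sum of N**(d/2 + 1 - s) * H over dyadic N = 2**j > N_0 (truncated after extra_terms terms)."""
    a = s - d / 2 - 1                   # decay exponent of the dyadic pieces
    if a <= 0:
        raise ValueError("need s > d/2 + 1")
    N0 = (2 + H) ** (1 / a)
    j0 = math.floor(math.log2(N0)) + 1  # first j with 2**j > N_0
    return sum(2.0 ** (-a * j) * H for j in range(j0, j0 + extra_terms))

def remaining_scales(s, d, H):
    """Number of dyadic N = 2, 4, 8, ... with N <= N_0."""
    a = s - d / 2 - 1
    N0 = (2 + H) ** (1 / a)
    return math.floor(math.log2(N0))
```

For instance, with {s=3}, {d=2} and {H=10^{12}}, the tail stays below {1/(1-2^{-1}) = 2} and `remaining_scales(3, 2, 1e12)` gives 39. This count grows only logarithmically in {H}, which produces the logarithm in (17).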

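Observe from applying the scaling {x \mapsto Nx/2} (that is, replacing {x \mapsto \phi(x)} with {x \mapsto \phi(2x/N)}) that to prove (18) for all {N} it suffices to do so for {N=2}. By Fourier analysis, the function {P_2 \Delta^{-1} \partial_i \partial_k \phi} is the convolution of {\phi} with the inverse Fourier transform {K} of the function below. In this function (with a slight abuse of notation), {\phi} denotes the Littlewood-Paley bump function defining the projections {P_N}, rather than the function in (18):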

\displaystyle  \xi \mapsto (\phi(\xi/2) - \phi(\xi)) \frac{\xi_i \xi_k}{|\xi|^2}.

This function is a test function, so {K} is a Schwartz function, and the claim now follows from Young’s inequality. \Box
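For the record, here is the short computation behind the multiplier, assuming the Fourier convention {\hat f(\xi) = \int_{{\bf R}^d} f(x) e^{-2\pi i x \cdot \xi}\ dx}. Other conventions give the same multiplier, since the constants cancel. One has {\widehat{\partial_j f}(\xi) = 2\pi i \xi_j \hat f(\xi)} and {\widehat{\Delta f}(\xi) = -4\pi^2 |\xi|^2 \hat f(\xi)}, so that

\displaystyle  \widehat{\Delta^{-1} \partial_i \partial_k f}(\xi) = \frac{(2\pi i \xi_i)(2\pi i \xi_k)}{-4\pi^2 |\xi|^2} \hat f(\xi) = \frac{\xi_i \xi_k}{|\xi|^2} \hat f(\xi).

This multiplier is homogeneous of degree zero. Assuming {P_N} has symbol {\phi(\xi/N) - \phi(2\xi/N)} (consistent with the symbol of {P_2} above), the symbol of {P_N} at {\xi} equals the symbol of {P_2} at {2\xi/N}. This is the Fourier-side form of the reduction to {N=2}.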

We return now to the proof of (15). We adapt the proof of Proposition 1(i). As in that proposition, we introduce the higher energy

\displaystyle  E(t) := \frac{1}{2} \sum_{m=0}^s \int_{{\bf R}^d} |\nabla^m u(t,x)|^2\ dx,

so that

\displaystyle  \partial_t E(t) = \sum_{m=0}^s \int_{{\bf R}^d} \nabla^m u \cdot \nabla^m \partial_t u\ dx.

We no longer have the viscosity term as {\nu=0}, but that term was discarded anyway in the analysis. From (4) we have

\displaystyle  \partial_t E(t) \leq O_{s,d}( E(t) \| \nabla u(t) \|_{L^\infty} ).

Applying (16) and Lemma 20, one thus has

\displaystyle  \partial_t E(t) \leq O_{s,d}( E(t) (\| \omega(t) \|_{L^\infty} \log(2 + E(t)) + \| u(t) \|_{L^2} + 1) ).

From Exercise 13 one has

\displaystyle  \| u(t) \|_{L^2} = \| u_0 \|_{L^2}.

By the chain rule, one then has

\displaystyle  \partial_t \log(2+E(t)) \leq O_{s,d}( \| \omega(t) \|_{L^\infty} \log(2 + E(t)) + \| u_0 \|_{L^2} + 1 )

and hence by Gronwall’s inequality one has

\displaystyle  \log(2+E(T)) \leq \left( \log(2+E(0)) + O_{s,d}( T (\|u_0\|_{L^2}+1) ) \right)

\displaystyle  \times \exp( O_{s,d}( \|\omega \|_{L^1_t L^\infty_x([0,T] \times {\bf R}^d \rightarrow \bigwedge^2 {\bf R}^d)} ) ).

The claim (15) follows.
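The Gronwall step can be illustrated on the model inequality {y' \leq a(t) y + b}. Here {y = \log(2+E)}, {a(t)} plays the role of {C\|\omega(t)\|_{L^\infty}}, and {b} plays the role of {C(\|u_0\|_{L^2}+1)}. The sketch below uses hypothetical choices of {a}, {b} and {y(0)}, none of which come from the source. It integrates the extremal case {y' = a(t) y + b} with a classical Runge-Kutta step and compares the result with {(y(0) + bT)\exp(\int_0^T a)}. Because the additive term is also amplified by the exponential factor, the solution typically exceeds {y(0)\exp(\int_0^T a) + bT}.

```python
import math

def rk4(f, y0, T, steps=20000):
    """Classical fourth-order Runge-Kutta for y' = f(t, y) on [0, T]."""
    h = T / steps
    t, y = 0.0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

# Hypothetical model data; a(t) >= 0 stands in for C * ||omega(t)||_{L^infty}.
def a(t):
    return 1.0 + math.sin(t) ** 2

def A(t):
    """Antiderivative of a with A(0) = 0."""
    return 1.5 * t - math.sin(2 * t) / 4

b, y0, T = 0.5, math.log(2.0), 2.0
yT = rk4(lambda t, y: a(t) * y + b, y0, T)
gronwall_bound = (y0 + b * T) * math.exp(A(T))
naive_additive = y0 * math.exp(A(T)) + b * T
```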

Remark 21 The Beale-Kato-Majda criterion can be sharpened a little bit, by replacing the sup norm {\|\omega(t) \|_{L^\infty({\bf R}^d \rightarrow \bigwedge^2 {\bf R}^d)}} with slightly smaller norms, such as the bounded mean oscillation (BMO) norm of {\omega(t)}, basically by improving the right-hand side of Lemma 20 slightly. See for instance this paper of Planchon and the references therein.

Remark 22 An inspection of the proof of Theorem 19 reveals that the same result holds if the Euler equations are replaced by the Navier-Stokes equations. The energy estimates acquire an additional viscosity term by doing so (as in the proof of Proposition 1), but the sign of that term is favorable.

We now apply the Beale-Kato-Majda criterion to obtain global well-posedness for the Euler equations in two dimensions:

Theorem 23 (Global well-posedness) Let {u_0, s, T_*, u, p} be as in Exercise 10. If {d=2}, then {T_* = +\infty}.

This theorem will be immediate from Theorem 19 and the following conservation law:

Proposition 24 (Conservation of vorticity distribution) Let {u_0, s, T_*, u, p} be as in Exercise 10 with {d=2}. Then one has

\displaystyle  \| \omega_{12}(t) \|_{L^q({\bf R}^2 \rightarrow {\bf R})} = \| \omega_{12}(0) \|_{L^q({\bf R}^2 \rightarrow {\bf R})}

for all {2 \leq q \leq \infty} and {0 \leq t < T_*}.

Proof: By a limiting argument it suffices to show the claim for {q < \infty}, thus we need to show

\displaystyle  \int_{{\bf R}^2} |\omega_{12}(t, x)|^q\ dx = \int_{{\bf R}^2} |\omega_{12}(0, x)|^q\ dx.

By another limiting argument we can take {u} to be an {H^\infty} solution. By the monotone convergence theorem (and Sobolev embedding), it suffices to show that

\displaystyle  \int_{{\bf R}^2} F( \omega_{12}(t, x) )\ dx = \int_{{\bf R}^2} F( \omega_{12}(0, x) )\ dx

whenever {F: {\bf R} \rightarrow {\bf R}} is a test function that vanishes in a neighbourhood of the origin {0}. Note that {\omega_{12}} and all its derivatives are in {L^\infty_t H^\infty_x} on {[0,T] \times {\bf R}^2} for every {0 < T < T_*}. Sobolev embedding therefore shows that {\omega_{12}(t,x) \rightarrow 0} as {|x| \rightarrow \infty}, uniformly for {t \in [0,T]}. Since {F} vanishes near {0}, it follows that {F(\omega_{12})} is smooth and compactly supported in {[0,T] \times {\bf R}^2}. We may therefore differentiate under the integral sign to obtain

\displaystyle  \partial_t \int_{{\bf R}^2} F( \omega_{12}(t, x) )\ dx = \int_{{\bf R}^2} F'( \omega_{12} ) \partial_t \omega_{12}\ dx

where we omit explicit dependence on {t,x} for brevity. By Exercise 17(i), the right-hand side is

\displaystyle  -\int_{{\bf R}^2} F'( \omega_{12} ) (u \cdot \nabla) \omega_{12}\ dx

which one can write as the integral of a total derivative

\displaystyle  -\int_{{\bf R}^2} (u \cdot \nabla) F(\omega_{12})\ dx

which vanishes thanks to integration by parts and the divergence-free nature of {u}. The claim follows. \Box
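A discrete analogue of this last step can be checked directly. This is a minimal sketch whose grid, stream function and choice of {F} are all illustrative assumptions, not from the source. On a periodic {n \times n} grid with centered differences, let {u := (-D_2 \psi, D_1 \psi)} for an arbitrary stream function {\psi}. Then the discrete divergence of {u} vanishes, and summation by parts makes the grid sum of {(u \cdot \nabla_h) G} vanish for every grid function {G}. Here we take {G = F(\omega)} with the hypothetical choice {F(w) = w^4}. On a finite periodic grid, {F} does not need to vanish near {0} for this identity.

```python
import random

def d1(f, h):
    """Centered difference in the first grid index, periodic boundary."""
    n = len(f)
    return [[(f[(i + 1) % n][j] - f[(i - 1) % n][j]) / (2 * h) for j in range(n)] for i in range(n)]

def d2(f, h):
    """Centered difference in the second grid index, periodic boundary."""
    n = len(f)
    return [[(f[i][(j + 1) % n] - f[i][(j - 1) % n]) / (2 * h) for j in range(n)] for i in range(n)]

def velocity(psi, h):
    """Discrete perpendicular gradient u = (-D_2 psi, D_1 psi) of a stream function."""
    return [[-v for v in row] for row in d2(psi, h)], d1(psi, h)

def discrete_divergence(psi, h):
    """D_1 u_1 + D_2 u_2, which vanishes because the difference operators commute."""
    u1, u2 = velocity(psi, h)
    a, b = d1(u1, h), d2(u2, h)
    return [[ai + bi for ai, bi in zip(ra, rb)] for ra, rb in zip(a, b)]

def transport_integral(psi, omega, F, h):
    """Riemann sum of (u . grad_h) F(omega) over the periodic grid."""
    u1, u2 = velocity(psi, h)
    G = [[F(w) for w in row] for row in omega]
    G1, G2 = d1(G, h), d2(G, h)
    n = len(psi)
    return h * h * sum(u1[i][j] * G1[i][j] + u2[i][j] * G2[i][j] for i in range(n) for j in range(n))

random.seed(0)
n, h = 32, 1.0 / 32
psi = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]    # hypothetical stream function
omega = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]  # hypothetical vorticity values
total = transport_integral(psi, omega, lambda w: w ** 4, h)             # F(w) = w^4, hypothetical
```

Up to rounding, `total` vanishes. The cancellation uses only the divergence-free structure of {u}, not any property of {F}.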

The above proposition shows that in two dimensions, {\| \omega(t)\|_{L^\infty({\bf R}^2 \rightarrow \bigwedge^2{\bf R}^2)}} is constant, and so the integral {\int_0^{T_*} \| \omega(t)\|_{L^\infty({\bf R}^2 \rightarrow \bigwedge^2{\bf R}^2)}\ dt} cannot diverge for finite {T_*}. Applying Theorem 19, we obtain Theorem 23. We remark that global regularity for two-dimensional Euler was established well before the Beale-Kato-Majda theorem, starting with the work of Wolibner.

One can adapt this argument to the Navier-Stokes equations:

Exercise 25 Let {s > 2} be an integer, let {\nu>0}, let {u_0 \in H^s({\bf R}^2 \rightarrow {\bf R}^2)} be divergence-free, and let {u: [0,T_*) \times {\bf R}^2 \rightarrow {\bf R}^2}, {p: [0,T_*) \times {\bf R}^2 \rightarrow {\bf R}} be a maximal Cauchy development to the Navier-Stokes equations with initial data {u_0}. Let {\omega} be the vorticity.

Remark 26 There are other ways to establish global regularity for two-dimensional Navier-Stokes (originally due to Ladyzhenskaya); for instance, the {L^2} bound on the vorticity in Exercise 25(ii), combined with energy conservation, gives a uniform {H^1} bound on the velocity field, which can then be inserted into (the non-periodic version of) Theorem 38 of Notes 1.

Remark 27 If {t \mapsto u(t,x), t \mapsto p(t,x)} solve the Euler equations on some time interval {I} with initial data {x \mapsto u_0(x)}, then the time-reversed fields {t \mapsto -u(-t,x), t \mapsto p(-t,x)} solve the Euler equations on the reflected interval {-I} with initial data {x \mapsto -u_0(x)}. Because of this time reversal symmetry, the local and global well-posedness theory for the Euler equations can also be extended backwards in time; for instance, in two dimensions any {H^\infty} divergence free initial data {u_0} leads to an {H^\infty} solution to the Euler equations on the whole time interval {(-\infty,\infty)}. However, the Navier-Stokes equations are very much not time-reversible in this fashion.

Terence TaoEmbedding the Heisenberg group into a bounded dimensional Euclidean space with optimal distortion

I’ve just uploaded to the arXiv my paper “Embedding the Heisenberg group into a bounded dimensional Euclidean space with optimal distortion“, submitted to Revista Matematica Iberoamericana. This paper concerns the extent to which one can accurately embed the metric structure of the Heisenberg group

\displaystyle H := \begin{pmatrix} 1 & {\bf R} & {\bf R} \\ 0 & 1 & {\bf R} \\ 0 & 0 & 1 \end{pmatrix}

into Euclidean space, which we can write as {\{ [x,y,z]: x,y,z \in {\bf R} \}} with the notation

\displaystyle [x,y,z] := \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}.

Here we give {H} the right-invariant Carnot-Carathéodory metric {d} coming from the right-invariant vector fields

\displaystyle X := \frac{\partial}{\partial x} + y \frac{\partial}{\partial z}; \quad Y := \frac{\partial}{\partial y}

but not from the commutator vector field

\displaystyle Z := [Y,X] = \frac{\partial}{\partial z}.
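The coordinate formulas can be checked mechanically. The sketch below (helper names hypothetical) does three things:

- reads the group law {[x,y,z][a,b,c] = [x+a, y+b, z+c+xb]} off the matrix notation;
- recovers {X} and {Y} as the velocity fields of left multiplication by {[s,0,0]} and {[0,s,0]}, since fields generated by left translations are right-invariant;
- computes the bracket {[Y,X]} from Jacobians.

It uses exact rational arithmetic. The difference quotients are then exact, because every map being differentiated is affine in the varied coordinate.

```python
from fractions import Fraction as Fr

def as_matrix(p):
    """[x, y, z] as the upper unitriangular 3x3 matrix of the text."""
    x, y, z = p
    return [[1, x, z], [0, 1, y], [0, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def mul(p, q):
    """Group law [x,y,z][a,b,c] = [x+a, y+b, z+c+xb] in coordinates."""
    (x, y, z), (a, b, c) = p, q
    return (x + a, y + b, z + c + x * b)

def left_translation_field(gen, p, s=Fr(1, 1000)):
    """Velocity at s = 0 of s -> [s*gen] * p (exact: the orbit is affine in s).
    Fields generated by left translations are right-invariant."""
    q = mul(tuple(s * g for g in gen), p)
    return tuple((qi - pi) / s for qi, pi in zip(q, p))

def X(p):
    return left_translation_field((1, 0, 0), p)   # expected: (1, 0, y)

def Y(p):
    return left_translation_field((0, 1, 0), p)   # expected: (0, 1, 0)

def jacobian(V, p, h=Fr(1, 1000)):
    """J[i][j] = dV^i/dp_j by forward differences (exact for affine fields)."""
    base = V(p)
    J = [[Fr(0)] * 3 for _ in range(3)]
    for j in range(3):
        q = list(p)
        q[j] += h
        Vq = V(tuple(q))
        for i in range(3):
            J[i][j] = (Vq[i] - base[i]) / h
    return J

def bracket(V, W, p):
    """Coefficients of the commutator [V, W] = VW - WV, namely DW(p)V(p) - DV(p)W(p)."""
    JV, JW, v, w = jacobian(V, p), jacobian(W, p), V(p), W(p)
    return tuple(sum(JW[i][j] * v[j] - JV[i][j] * w[j] for j in range(3)) for i in range(3))
```

At any point {p}, `bracket(Y, X, p)` returns {(0,0,1)}, that is {Z = \partial/\partial z}.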

This gives {H} the geometry of a Carnot group. As observed by Semmes, it follows from the Carnot group differentiation theory of Pansu that there is no bilipschitz map from {(H,d)} to any Euclidean space {{\bf R}^D} or even to {\ell^2}. Such a map must be differentiable almost everywhere in the sense of Carnot groups; in particular, the derivative map annihilates {Z} almost everywhere, which is incompatible with being bilipschitz.

On the other hand, if one snowflakes the Heisenberg group by replacing the metric {d} with {d^{1-\varepsilon}} for some {0 < \varepsilon < 1}, then it follows from the general theory of Assouad on embedding snowflaked metrics of doubling spaces that {(H,d^{1-\varepsilon})} may be embedded in a bilipschitz fashion into {\ell^2}, or even to {{\bf R}^{D_\varepsilon}} for some {D_\varepsilon} depending on {\varepsilon}.

Of course, the distortion of this bilipschitz embedding must degenerate in the limit {\varepsilon \rightarrow 0}. From the work of Austin-Naor-Tessera and Naor-Neiman it follows that {(H,d^{1-\varepsilon})} may be embedded into {\ell^2} with a distortion of {O( \varepsilon^{-1/2} )}, but no better. The Naor-Neiman paper also embeds {(H,d^{1-\varepsilon})} into a finite-dimensional space {{\bf R}^D} with {D} independent of {\varepsilon}, but at the cost of worsening the distortion to {O(\varepsilon^{-1})}. They then posed the question of whether this worsening of the distortion is necessary.

The main result of this paper answers this question in the negative:

Theorem 1 There exists an absolute constant {D} such that {(H,d^{1-\varepsilon})} may be embedded into {{\bf R}^D} in a bilipschitz fashion with distortion {O(\varepsilon^{-1/2})} for any {0 < \varepsilon \leq 1/2}.

To motivate the proof of this theorem, let us first present a bilipschitz map {\Phi: {\bf R} \rightarrow \ell^2} from the snowflaked line {({\bf R},d_{\bf R}^{1-\varepsilon})} (with {d_{\bf R}} being the usual metric on {{\bf R}}) into complex Hilbert space {\ell^2({\bf C})}. The map is given explicitly as a Weierstrass type function

\displaystyle \Phi(x) := \sum_{k \in {\bf Z}} 2^{-\varepsilon k} (\phi_k(x) - \phi_k(0))

where for each {k}, {\phi_k: {\bf R} \rightarrow \ell^2} is the function

\displaystyle \phi_k(x) := 2^k e^{2\pi i x / 2^k} e_k

and {(e_k)_{k \in {\bf Z}}} are an orthonormal basis for {\ell^2({\bf C})}. The subtracting of the constant {\phi_k(0)} is purely in order to make the sum convergent as {k \rightarrow \infty}. If {x,y \in {\bf R}} are such that {2^{k_0-2} \leq d_{\bf R}(x,y) \leq 2^{k_0-1}} for some integer {k_0}, one can easily check the bounds

\displaystyle 2^{-\varepsilon k} |\phi_k(x) - \phi_k(y)| \lesssim d_{\bf R}(x,y)^{1-\varepsilon} \min( 2^{-(1-\varepsilon) (k_0-k)}, 2^{-\varepsilon (k-k_0)} )

with the lower bound

\displaystyle 2^{-\varepsilon k_0} |\phi_{k_0}(x) - \phi_{k_0}(y)| \gtrsim d_{\bf R}(x,y)^{1-\varepsilon}

at which point one finds that

\displaystyle d_{\bf R}(x,y)^{1-\varepsilon} \lesssim |\Phi(x) - \Phi(y)| \lesssim \varepsilon^{-1/2} d_{\bf R}(x,y)^{1-\varepsilon}

as desired.
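These bounds can be seen numerically. Write {d = |x-y|}. Since the {e_k} are orthonormal and {|e^{2\pi i x/2^k} - e^{2\pi i y/2^k}| = 2|\sin(\pi d/2^k)|}, one has {|\Phi(x)-\Phi(y)|^2 = \sum_k 2^{2(1-\varepsilon)k} \cdot 4\sin^2(\pi d/2^k)}, a sum that can be evaluated directly. The sketch below truncates the sum over {k} to a finite range (an assumption of ours; the neglected terms are negligible) and returns the ratio {|\Phi(x)-\Phi(y)|/d^{1-\varepsilon}}. The function name is hypothetical.

```python
import math

def snowflake_ratio(d, eps, k_min=-200):
    """|Phi(x) - Phi(y)| / d**(1 - eps) for |x - y| = d, with the k-sum truncated."""
    k_max = int(60 / eps) + 100               # beyond this, 2**(-2*eps*k) is negligible
    total = 0.0
    for k in range(k_min, k_max + 1):
        t = math.ldexp(math.pi * d, -k)       # pi * d / 2**k, computed without forming 2**k
        sinc = math.sin(t) / t if t != 0.0 else 1.0
        chord = 2.0 * math.pi * d * abs(sinc) # equals 2**k * |e^{2 pi i d / 2**k} - 1|
        total += 2.0 ** (-2.0 * eps * k) * chord ** 2
    return math.sqrt(total) / d ** (1 - eps)
```

The ratio stays above an absolute constant, coming from the single term {k = k_0}. Meanwhile `snowflake_ratio(d, eps) * math.sqrt(eps)` stays bounded as {\varepsilon} decreases, matching the {\varepsilon^{-1/2}} distortion.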

The key here was that each function {\phi_k} oscillated at a different spatial scale {2^k}, and the functions were all orthogonal to each other (so that the upper bound involved a factor of {\varepsilon^{-1/2}} rather than {\varepsilon^{-1}}). One can replicate this example for the Heisenberg group without much difficulty. Indeed, if we let {\Gamma := \{ [a,b,c]: a,b,c \in {\bf Z} \}} be the discrete Heisenberg group, then the nilmanifold {H/\Gamma} is a three-dimensional smooth compact manifold; thus, by the Whitney embedding theorem, it smoothly embeds into {{\bf R}^6}. This gives a smooth immersion {\phi: H \rightarrow {\bf R}^6} which is {\Gamma}-automorphic in the sense that {\phi(p\gamma) = \phi(p)} for all {p \in H} and {\gamma \in \Gamma}. If one then defines {\phi_k: H \rightarrow \ell^2 \otimes {\bf R}^6} to be the function

\displaystyle \phi_k(p) := 2^k \phi( \delta_{2^{-k}}(p) ) \otimes e_k

where {\delta_\lambda: H \rightarrow H} is the scaling map

\displaystyle \delta_\lambda([x,y,z]) := [\lambda x, \lambda y, \lambda^2 z],

then one can repeat the previous arguments to obtain the required bilipschitz bounds

\displaystyle d(p,q)^{1-\varepsilon} \lesssim |\Phi(p) - \Phi(q)| \lesssim \varepsilon^{-1/2} d(p,q)^{1-\varepsilon}

for the function

\displaystyle \Phi(p) :=\sum_{k \in {\bf Z}} 2^{-\varepsilon k} (\phi_k(p) - \phi_k(0)).
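The rescaled maps {\phi_k} are periodic at scale {2^k} because each dilation is a group automorphism. Since {\delta_\lambda(pq) = \delta_\lambda(p)\delta_\lambda(q)}, one has {\delta_{2^{-k}}(p \cdot \delta_{2^k}(\gamma)) = \delta_{2^{-k}}(p) \gamma}. Hence {\phi_k(p \cdot \delta_{2^k}(\gamma)) = \phi_k(p)} for {\gamma \in \Gamma}. Below is a small exact-arithmetic check of these identities, using the group law {[x,y,z][a,b,c] = [x+a,y+b,z+c+xb]} read off from the matrix notation (function names hypothetical):

```python
from fractions import Fraction as Fr

def mul(p, q):
    """Heisenberg group law in the coordinates [x, y, z]."""
    (x, y, z), (a, b, c) = p, q
    return (x + a, y + b, z + c + x * b)

def dilate(lam, p):
    """delta_lambda([x, y, z]) = [lambda x, lambda y, lambda^2 z]."""
    x, y, z = p
    return (lam * x, lam * y, lam * lam * z)

def rescaled_lattice_shift(k, p, gamma):
    """Return delta_{2^-k}(p * delta_{2^k}(gamma)) and delta_{2^-k}(p) * gamma.
    They agree, which makes p -> phi(delta_{2^-k}(p)) invariant under delta_{2^k}(Gamma)."""
    s = Fr(2) ** k
    lhs = dilate(1 / s, mul(p, dilate(s, gamma)))
    rhs = mul(dilate(1 / s, p), gamma)
    return lhs, rhs
```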

To adapt this construction to bounded dimension, the main obstruction was the requirement that the {\phi_k} took values in orthogonal subspaces. But if one works things out carefully, it is enough to require the weaker orthogonality requirement

\displaystyle B( \phi_{k_0}, \sum_{k>k_0} 2^{-\varepsilon(k-k_0)} \phi_k ) = 0

for all {k_0 \in {\bf Z}}, where {B(\phi, \psi): H \rightarrow {\bf R}^2} is the bilinear form

\displaystyle B(\phi,\psi) := (X \phi \cdot X \psi, Y \phi \cdot Y \psi ).

One can then try to construct the {\phi_k: H \rightarrow {\bf R}^D} for bounded dimension {D} by an iterative argument. After some standard reductions, the problem becomes this (roughly speaking): given a smooth, slowly varying function {\psi: H \rightarrow {\bf R}^{D}} whose derivatives obey certain quantitative upper and lower bounds, construct a smooth oscillating function {\phi: H \rightarrow {\bf R}^{D}} whose derivatives also obey certain quantitative upper and lower bounds, and which obeys the equation

\displaystyle B(\phi,\psi) = 0. \ \ \ \ \ (1)


We view this as an underdetermined system of differential equations for {\phi} (two equations in {D} unknowns; after some reductions, our {D} can be taken to be the explicit value {36}). The trivial solution {\phi=0} to this equation will be inadmissible for our purposes due to the lower bounds we will require on {\phi} (in order to obtain the quantitative immersion property mentioned previously, as well as for a stronger “freeness” property that is needed to close the iteration). Because this construction will need to be iterated, it will be essential that the regularity control on {\phi} is the same as that on {\psi}; one cannot afford to “lose derivatives” when passing from {\psi} to {\phi}.

This problem has some formal similarities with the isometric embedding problem (discussed for instance in this previous post), which can be viewed as the problem of solving an equation of the form {Q(\phi,\phi) = g}, where {(M,g)} is a Riemannian manifold and {Q} is the bilinear form

\displaystyle Q(\phi,\psi)_{ij} = \partial_i \phi \cdot \partial_j \psi.

The isometric embedding problem also has the key obstacle that naive attempts to solve the equation {Q(\phi,\phi)=g} iteratively can lead to an undesirable “loss of derivatives” that prevents one from iterating indefinitely. This obstacle was famously resolved by the Nash-Moser iteration scheme in which one alternates between perturbatively adjusting an approximate solution to improve the residual error term, and mollifying the resulting perturbation to counteract the loss of derivatives. The current equation (1) differs in some key respects from the isometric embedding equation {Q(\phi,\phi)=g}, in particular being linear in the unknown field {\phi} rather than quadratic; nevertheless the key obstacle is the same, namely that naive attempts to solve either equation lose derivatives. Our approach to solving (1) was inspired by the Nash-Moser scheme; in retrospect, I also found similarities with Uchiyama’s constructive proof of the Fefferman-Stein decomposition theorem, discussed in this previous post (and in this recent one).

To motivate this iteration, we first express {B(\phi,\psi)} using the product rule in a form that does not place derivatives directly on the unknown {\phi}:

\displaystyle B(\phi,\psi) = \left( W(\phi \cdot W \psi) - \phi \cdot WW \psi\right)_{W = X,Y} \ \ \ \ \ (2)


This reveals that one can construct solutions {\phi} to (1) by solving the system of equations

\displaystyle \phi \cdot W \psi = \phi \cdot WW \psi = 0 \ \ \ \ \ (3)


for {W \in \{X, Y \}}. Because this system is zeroth order in {\phi}, this can easily be done by linear algebra (even in the presence of a forcing term {B(\phi,\psi)=F}) if one imposes a “freeness” condition (analogous to the notion of a free embedding in the isometric embedding problem) that {X \psi(p), Y \psi(p), XX \psi(p), YY \psi(p)} are linearly independent at each point {p}, which (together with some other technical conditions of a similar nature) one then adds to the list of upper and lower bounds required on {\psi} (with a related bound then imposed on {\phi}, in order to close the iteration). However, as mentioned previously, there is a “loss of derivatives” problem with this construction: due to the presence of the differential operators {W} in (3), a solution {\phi} constructed by this method can only be expected to have two degrees less regularity than {\psi} at best, which makes this construction unsuitable for iteration.
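At a single point {p}, (3) with a forcing term {B(\phi,\psi) = F} is a small linear system. By (2), it suffices to impose {\phi \cdot W\psi = 0} and {\phi \cdot WW\psi = -F_W} for {W \in \{X,Y\}}. So with {w_1,\dots,w_4} denoting {X\psi(p), Y\psi(p), XX\psi(p), YY\psi(p)}, one looks for {\phi(p) \in {\bf R}^D} with prescribed inner products against the {w_j}.

The sketch below assumes freeness (the {w_j} linearly independent) and uses a hypothetical helper `solve_pointwise`. It proceeds in two steps:

- the minimal-norm solution lies in the span of the {w_j} and is found from the Gram system;
- adding the component of an arbitrary seed vector orthogonal to every {w_j} keeps the constraints while keeping {\phi(p)} away from zero, loosely mirroring the role of the lower bounds on {\phi}.

The dimension {D=6} and all data are illustrative only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_linear(G, f):
    """Gaussian elimination with partial pivoting for a small square system G c = f."""
    n = len(G)
    M = [list(row) + [rhs] for row, rhs in zip(G, f)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("vectors are not linearly independent (freeness fails)")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][j] * c[j] for j in range(r + 1, n))) / M[r][r]
    return c

def solve_pointwise(ws, targets, seed):
    """Return phi with dot(phi, ws[j]) == targets[j], plus the part of seed orthogonal to all ws."""
    D = len(seed)
    G = [[dot(wi, wj) for wj in ws] for wi in ws]   # Gram matrix: invertible iff ws independent
    coeffs = solve_linear(G, targets)
    particular = [sum(c * w[i] for c, w in zip(coeffs, ws)) for i in range(D)]
    proj = solve_linear(G, [dot(seed, w) for w in ws])
    free = [seed[i] - sum(c * w[i] for c, w in zip(proj, ws)) for i in range(D)]
    return [a + b for a, b in zip(particular, free)]
```

No derivatives of {\phi} appear, which is why this step is "zeroth order". The regularity of the output is limited only by that of the coefficients {w_j}, and these involve two derivatives of {\psi}.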

To get around this obstacle (which also prominently appears when solving (linearisations of) the isometric embedding equation {Q(\phi,\phi)=g}), we instead first construct a smooth, low-frequency solution {\phi_{\leq N_0} \colon H \rightarrow {\bf R}^{D}} to a low-frequency equation

\displaystyle B( \phi_{\leq N_0}, P_{\leq N_0} \psi ) = 0 \ \ \ \ \ (4)


where {P_{\leq N_0} \psi} is a mollification of {\psi} (of Littlewood-Paley type) applied at a small spatial scale {1/N_0} for some {N_0}, and then gradually relax the frequency cutoff {P_{\leq N_0}} to deform this low frequency solution {\phi_{\leq N_0}} to a solution {\phi} of the actual equation (1).

We will construct the low-frequency solution {\phi_{\leq N_0}} rather explicitly, in three steps:

- use the Whitney embedding theorem to construct an initial oscillating map {f} into a very low dimensional space {{\bf R}^6};
- compose it with a Veronese type embedding into a slightly larger dimensional space {{\bf R}^{27}} to obtain a required “freeness” property;
- compose further with a slowly varying isometry {U(p) \colon {\bf R}^{27} \rightarrow {\bf R}^{36}}, which depends on {P_{\leq N_0} \psi} and is constructed by a quantitative topological lemma (relying ultimately on the vanishing of the first few homotopy groups of high-dimensional spheres), in order to obtain the required orthogonality (4).

(This sort of “quantitative null-homotopy” was first proposed by Gromov, with some recent progress on optimal bounds by Chambers-Manin-Weinberger and by Chambers-Dotterer-Manin-Weinberger. We will not need these more advanced results here: one can rely on the classical qualitative vanishing {\pi_k(S^d)=0} for {k < d}, together with a compactness argument, to obtain (ineffective) quantitative bounds, which suffice for this application.)
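The freeness produced by a Veronese-type map can be illustrated with the standard quadratic Veronese map {V: {\bf R}^6 \rightarrow {\bf R}^{27}}, which sends {x} to the coordinates {x_i} followed by all products {x_i x_j} with {i \leq j}. The dimension count {6 + 21 = 27} matches the text, but whether this is exactly the map used in the paper is an assumption. The sketch checks, in exact arithmetic at a sample point, that the 6 first partial derivatives and 21 second partial derivatives of {V} are linearly independent and so span all of {{\bf R}^{27}}. This is the kind of independence that lets a zeroth-order linear system like (3) be solved pointwise. Function names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import combinations_with_replacement

PAIRS = list(combinations_with_replacement(range(6), 2))   # the 21 index pairs i <= j

def veronese(x):
    """Quadratic Veronese-type map R^6 -> R^27: (x_i) followed by (x_i x_j) for i <= j."""
    return list(x) + [x[i] * x[j] for i, j in PAIRS]

def first_derivative(x, k):
    """Exact partial derivative dV/dx_k at x."""
    lin = [Fr(int(i == k)) for i in range(6)]
    quad = [(x[j] if i == k else 0) + (x[i] if j == k else 0) for i, j in PAIRS]
    return lin + quad

def second_derivative(k, l):
    """Exact second partial derivative d^2V/dx_k dx_l (independent of x)."""
    lin = [Fr(0)] * 6
    quad = [Fr(int(i == k and j == l) + int(i == l and j == k)) for i, j in PAIRS]
    return lin + quad

def rank(vectors):
    """Rank over the rationals by exact Gauss-Jordan elimination."""
    rows = [[Fr(a) for a in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r
```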

To perform the deformation of {\phi_{\leq N_0}} into {\phi}, we must solve what is essentially the linearised equation

\displaystyle B( \dot \phi, \psi ) + B( \phi, \dot \psi ) = 0 \ \ \ \ \ (5)


of (1) when {\phi}, {\psi} (viewed as low frequency functions) are both being deformed at some rates {\dot \phi, \dot \psi} (which should be viewed as high frequency functions). To avoid losing derivatives, the magnitude of the deformation {\dot \phi} in {\phi} should not be significantly greater than the magnitude of the deformation {\dot \psi} in {\psi}, when measured in the same function space norms.

As before, if one directly solves the difference equation (5) using a naive application of (2) with {B(\phi,\dot \psi)} treated as a forcing term, one will lose at least one derivative of regularity when passing from {\dot \psi} to {\dot \phi}. However, observe that (2) (and the symmetry {B(\phi, \dot \psi) = B(\dot \psi,\phi)}) can be used to obtain the identity

\displaystyle B( \dot \phi, \psi ) + B( \phi, \dot \psi ) = \left( W(\dot \phi \cdot W \psi + \dot \psi \cdot W \phi) - (\dot \phi \cdot WW \psi + \dot \psi \cdot WW \phi)\right)_{W = X,Y} \ \ \ \ \ (6)


and then one can solve (5) by solving the system of equations

\displaystyle \dot \phi \cdot W \psi = - \dot \psi \cdot W \phi

for {W \in \{X,XX,Y,YY\}}. The key point here is that this system is zeroth order in both {\dot \phi} and {\dot \psi}, so one can solve this system without losing any derivatives when passing from {\dot \psi} to {\dot \phi}; compare this situation with that of the superficially similar system

\displaystyle \dot \phi \cdot W \psi = - \phi \cdot W \dot \psi

that one would obtain from naively linearising (3) without exploiting the symmetry of {B}. There is still however one residual “loss of derivatives” problem arising from the presence of a differential operator {W} on the {\phi} term, which prevents one from directly evolving this iteration scheme in time without losing regularity in {\phi}. It is here that we borrow the final key idea of the Nash-Moser scheme, which is to replace {\phi} by a mollified version {P_{\leq N} \phi} of itself (where the projection {P_{\leq N}} depends on the time parameter). This creates an error term in (5), but it turns out that this error term is quite small and smooth (being a “high-high paraproduct” of {\nabla \phi} and {\nabla\psi}, it ends up being far more regular than either {\phi} or {\psi}, even with the presence of the derivatives) and can be iterated away provided that the initial frequency cutoff {N_0} is large and the function {\psi} has a fairly high (but finite) amount of regularity (we will eventually use the Hölder space {C^{20,\alpha}} on the Heisenberg group to measure this).

December 15, 2018

David Hoggcross-correlations of maps.

It is a low-research time of year! But a highlight today was the PhD candidacy exam of Shengqi Yang (NYU), who is working with Anthony Pullen (NYU) on new data analysis techniques for cosmology. She is doing a number of things, but the part I am most interested in is manipulation of combinations of cross- and auto-correlation functions to determine other cross- and auto-correlation functions. These combinations are very simple and valuable! And you can combine observed and theoretical functions as you see fit. I got an argument started in the room about the conditions under which these relationships are exact, or true in the limit, or approximations. I would like to understand that better!

December 14, 2018

John BaezApplied Category Theory Seminar

We’re going to have a seminar on applied category theory here at U. C. Riverside! My students have been thinking hard about category theory for a few years, but they’ve decided it’s time to get deeper into applications. Christian Williams, in particular, seems to have caught my zeal for trying to develop new math to help save the planet.

We’ll try to videotape the talks to make it easier for you to follow along. I’ll also start discussions here and/or on the Azimuth Forum. It’ll work best if you read the papers we’re talking about and then join these discussions. Ask questions, and answer any questions you can!

Here’s how the schedule of talks is shaping up so far. I’ll add more information as it becomes available, either here or on a webpage devoted to the task.

January 8, 2019: John Baez – Mathematics in the 21st century

I’ll give an updated synthesized version of these earlier talks of mine, so check out these slides and the links:

The mathematics of planet Earth.

What is climate change?

Props in network theory.

January 15, 2019: Jonathan Lorand – Problems in symplectic linear algebra

Lorand is visiting U. C. Riverside to work with me on applications of symplectic geometry to chemistry. Here is the abstract of his talk:

In this talk we will look at various examples of classification problems in symplectic linear algebra: conjugacy classes in the symplectic group and its Lie algebra, linear lagrangian relations up to conjugation, tuples of (co)isotropic subspaces. I will explain how many such problems can be encoded using the theory of symplectic poset representations, and will discuss some general results of this theory. Finally, I will recast this discussion from a broader category-theoretic perspective.

January 22, 2019: Christina Vasilakopoulou – Wiring diagrams

Vasilakopoulou, a visiting professor here, previously worked with David Spivak. So, we really want to figure out how two frameworks for dealing with networks relate: Brendan Fong’s ‘decorated cospans’, and Spivak’s ‘monoidal category of wiring diagrams’. Since Fong is now working with Spivak they’ve probably figured it out already! But anyway, Vasilakopoulou will give a talk on systems as algebras for the wiring diagram monoidal category. It will be based on this paper:

• Patrick Schultz, David I. Spivak and Christina Vasilakopoulou, Dynamical systems and sheaves.

but she will focus more on the algebraic description (and conditions for deterministic/total systems) rather than the sheaf theoretic aspect of the input types. This work builds on earlier papers such as these:

• David I. Spivak, The operad of wiring diagrams: formalizing a graphical language for databases, recursion, and plug-and-play circuits.

• Dmitry Vagner, David I. Spivak and Eugene Lerman, Algebras of open dynamical systems on the operad of wiring diagrams.

January 29, 2019: Daniel Cicala – Dynamical systems on networks

Cicala will discuss a topic from this paper:

• Mason A. Porter and James P. Gleeson, Dynamical systems on networks: a tutorial.

His leading choice is a model for social contagion (e.g. opinions) which is discussed in more detail here:

• Duncan J. Watts, A simple model of global cascades on random networks.

BackreactionDon’t ask what science can do for you.

Among the more peculiar side-effects of publishing a book are the many people who suddenly recall we once met. There are weird fellows who write to say they mulled ten years over a single sentence I once spoke with them. There are awkward close-encounters from conferences I’d rather have forgotten about. There are people who I have either indeed forgotten about or didn’t actually meet. And

Matt von HippelInterdisciplinarity Is Good for the Soul

Interdisciplinary research is trendy these days. Grant agencies love it, for one. But talking to people in other fields isn’t just promoted by the authorities: like eating your vegetables, it’s good for you too.

If you talk only to people from your own field, you can lose track of what matters in the wider world. There’s a feedback effect where everyone in a field works on what everyone else in the field finds interesting, and the field spirals inward. “Interesting” starts meaning what everyone else is working on, without fulfilling any other criteria. Interdisciplinary contacts hold that back: not only can they call bullshit when you’re deep in your field’s arcane weirdness, they can also point out things that are more interesting than you expected, ideas that your field has seen so often they look boring but that are actually more surprising or useful than you realize.

Interdisciplinary research is good for self-esteem, too. As a young researcher, you can easily spend all your time talking to people who know more about your field than you do. Branching out reminds you of how much you’ve learned: all that specialized knowledge may be entry-level in your field, but it still puts you ahead of the rest of the world. Even as a grad student, you can be someone else’s guest expert if the right topic comes up.

December 13, 2018

Tim GowersTaylor and Francis doing Trump’s dirty work for him

The following story arrived in my email inbox (and those of many others) this morning. Apparently a paper was submitted to the Taylor and Francis journal Dynamical Systems, and was accepted. The published version was prepared, and it had got to the stage where a DOI had been assigned. Then the authors received a letter explaining that “following internal sanctions process checks” the article could not after all be published because one of them was based in Iran.

I don’t know what the legal consequences would have been if Taylor and Francis had simply gone ahead and published, but my hunch is that they are being unduly cautious. I wonder if they turned down any papers by Russian authors after the invasion of Ukraine.

This is not an isolated incident. An Iranian PhD student who applied for funding to go to a mathematics conference in Rome was told that “we are unable to provide financial support for Iranians due to administrative difficulties”.

I’m not sure what one can do about this, but at the very least it should be generally known that it is happening.

Update. Taylor and Francis have now reversed their decision.

BackreactionNew experiment cannot reproduce long-standing dark matter anomaly

Close-up of the COSINE detector [Credits: COSINE collaboration]

To correctly fit observations, physicists’ best current theory for the universe needs a new type of matter, the so-called “dark matter.” According to this theory, our galaxy – as most other galaxies – is contained in a spherical cloud of this dark stuff. Exactly what dark matter is made of, however, we still don’t know. The more

December 12, 2018

David Hoggactions useful? and stars meeting

In the morning I met with Gus Beane (Flatiron) to discuss his use and understanding of actions in empirical work on the Milky Way, following up on the blow-up of last week. We discussed the point that small issues with the Galactocentric coordinate system could totally mess up any action calculation, even if the actions make sense, and the point that there are many possible galaxies we might live in for which the actions don't even make sense. We vowed to move the conversation / argument going on at Flatiron towards the question of what we are trying to achieve with these calculations. Are they just orbit labels? Or are they quasi-invariants? Or are we using them to match up stars that are far apart?

Stars Meeting was a whirlwind of interesting things! Kim Bott (UW) told us about polarimetry for exoplanet discovery and characterization. Evan Bauer (UCSB) told us about accretion signatures on white dwarfs and how they have probably been mis-interpreted (but in a way that makes accretion more important!). And Suroor Gandhi (NYU) told us about relationships between ages, abundances, and actions in the Milky Way that suggest that there are interesting relationships at all ages, and at all abundances. Hard to summarize, but there are lots of things to think about in there.

David Hoggnothing

Absolutely nothing! It's job season!

December 11, 2018

Doug NatelsonRice Academy of Fellows, 2019

Just in case....

Rice has a competitive endowed postdoctoral program, the Rice Academy of Fellows.  There are five slots for the coming year (application deadline of January 3).  It's a very nice program, though like all such things it's challenging to get a slot.  If someone is interested in trying this to work with me, I'd be happy to talk - the best approach would be to email me.

Dmitry PodolskyTop Ten Open Problems in Physics

What is the ultimate purpose of my work as a theoretical physicist and, if you want, of my existence itself? Is it serving the community of other physicists, like organizing and participating in conferences? Nop. Then maybe teaching future physicists at the University, encouraging young people to enter the exciting field of physics? Not quite. Writing good papers?  Ei.  Maybe blogging? Sorry but nein. I think… the ultimate purpose of my work is solving unsolved mysteries in physics. I am afraid this and only this makes my work enjoyable for me, makes it fun. For the sake of future reference, let me list here the most important (from my point of view), hard and interesting unsolved problems in physics.

1. The nature of time

This problem is so fundamental that ultimately I have no idea how to even approach it. I mean, it’s like answering the question of what we are born for and why we have to die. Why does the 4-dimensional spacetime we live in have three spacelike coordinates and one timelike one? Well, I mean – it is rather clear why it does not have 2 timelike coordinates (otherwise, time machines could be possible). But why is there even 1 timelike coordinate and not zero of them?

What is the nature of the arrow of time? What happened to the Universe 14 billion years ago: what was at the beginning of time, and what was before the beginning? Was there anything at all? And is it even meaningful to ask this kind of question? How can classical chaos and loss of information be reconciled with quantum mechanics and the S-matrix in quantum field theory, both of which are unitary and therefore forbid any loss of information? Does CP-violation (and hence T-violation, since CPT is conserved) have anything to do with the emergence of the arrow of time? Do horizons, black holes, and any non-trivial causal structure of spacetime play any role?

Asking all these questions, I feel ridiculously stupid; that’s why I put the problem of time first in my list.

2. Cosmological constant. Inflation


It seems that the Universe is currently expanding with nearly constant acceleration, driven by the cosmological constant (the very same one that Einstein once called “the biggest blunder” of his life) or by something that behaves approximately like a cosmological constant. A very surprising fact is that although the associated energy density makes up about 70% of the total energy density in the Universe, it is still vastly smaller (by some 120 orders of magnitude) than the natural energy scale associated with gravitational effects, i.e., the Planckian energy density $\rho_{\rm Pl} \sim M_{\rm Pl}^4$.
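A rough back-of-the-envelope check shows where a number like this comes from. This is a sketch under stated assumptions. The Hubble constant is assumed to be 70 km/s/Mpc, the 70% dark-energy fraction is the one quoted above, the constants are rounded SI values, and the Planck energy density is taken as $c^7/(\hbar G^2)$. The function names are ours, for illustration only.

```python
import math

# Rounded SI constants
C = 2.998e8        # speed of light, m/s
G = 6.674e-11      # Newton's constant, m^3 kg^-1 s^-2
HBAR = 1.0546e-34  # reduced Planck constant, J s
MPC_M = 3.0857e22  # one megaparsec, m

def dark_energy_density(h0_km_s_mpc: float = 70.0, omega_lambda: float = 0.7) -> float:
    """Energy density of the cosmological constant in J/m^3 (H0 and Omega_Lambda are assumed inputs)."""
    h0 = h0_km_s_mpc * 1e3 / MPC_M              # Hubble constant in 1/s
    rho_crit = 3 * h0**2 / (8 * math.pi * G)    # critical mass density, kg/m^3
    return omega_lambda * rho_crit * C**2       # mass density -> energy density

def planck_energy_density(reduced: bool = False) -> float:
    """Planck energy density c^7 / (hbar G^2) in J/m^3; 'reduced' uses the reduced Planck mass."""
    rho = C**7 / (HBAR * G**2)
    return rho / (8 * math.pi) ** 2 if reduced else rho

for reduced in (False, True):
    ratio = dark_energy_density() / planck_energy_density(reduced)
    print(f"reduced Planck mass: {reduced} -> ratio ~ 10^{math.log10(ratio):.1f}")
```

With the ordinary Planck density, the ratio comes out near $10^{-123}$. With the reduced Planck mass, which brings in an extra factor of $(8\pi)^2$, it comes out near $10^{-120}$. This is why the gap is quoted variously as 120 to 123 orders of magnitude.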

The present accelerated expansion of the Universe seems even more interesting given that the very early Universe (some 13.8 billion years ago) also passed through a stage of accelerated expansion, but the associated energy density was much higher at that time, although still not as high as the Planckian energy density.

To be honest, after years of research and observations (cosmology has become a precision science nowadays), we still do not have the slightest idea what is behind the present accelerated expansion of the Universe, or what explains the huge (in fact, the largest) hierarchy of scales that seems relevant for gravitational physics.

3. Turbulence


I have written about it many times, but it is worth writing about once more. We don’t know how to treat developed turbulence analytically. We mumble something about 2-dimensional scalar turbulence and weak turbulence, but ultimately we don’t even know whether a general solution of the Navier-Stokes equations exists, and whether it stays smooth or develops finite-time singularities in the fluid velocity field (note that the Clay Mathematics Institute currently offers 1 million USD for the solution of this problem).

Ultimately, I think, the very heart of the problem is this: out of the (kinematic) viscosity $\nu$, the velocity of the flow $v$ and a linear scale of the flow $L$, one can construct a single dimensionless combination, called the Reynolds number:

$$Re = \frac{v L}{\nu}. \qquad (1)$$

Essentially, it defines whether the flow of a fluid is turbulent or laminar (smooth) – it becomes turbulent when the Reynolds number gets larger than 100.
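Here is a small Python sketch of equation (1). The function names and the example numbers are our own assumptions: water, with $\nu \approx 10^{-6}$ m²/s, flowing in a 5 cm pipe. The turbulence threshold is kept as a parameter set to the value of 100 quoted above, because in practice the onset of turbulence depends on the geometry of the flow.

```python
def reynolds_number(velocity_m_s: float, length_m: float, kinematic_viscosity_m2_s: float) -> float:
    """Re = v * L / nu (dimensionless)."""
    return velocity_m_s * length_m / kinematic_viscosity_m2_s

# Threshold quoted in the text; the real transition value depends on the flow geometry.
TURBULENCE_THRESHOLD = 100.0

def flow_regime(re: float, threshold: float = TURBULENCE_THRESHOLD) -> str:
    """Classify a flow as laminar or turbulent by comparing Re with the threshold."""
    return "turbulent" if re > threshold else "laminar"

# Hypothetical example: water (nu ~ 1.0e-6 m^2/s) in a pipe 5 cm across
for v in (0.001, 0.01, 1.0):
    re = reynolds_number(v, 0.05, 1.0e-6)
    print(f"v = {v} m/s -> Re = {re:.0f} ({flow_regime(re)})")
```

Because $\nu$ for water is so small, Re climbs past any such threshold at very modest speeds. This is why turbulence is the norm rather than the exception in practical pipe flows.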

Now, usually in physics all dimensionless combinations are of order 1. Why does the combination (1) instead have to be of order 100 for the physics of the fluid to become interesting? If we knew the answer to this question, we would advance quite a bit in understanding developed turbulence.

Why is the latter so critically important for our health and wealth? I am Russian, so let me talk a bit about oil pipes. What would be the optimal velocity of the oil flow through a pipe to maximize the income of GAZPROM? If I pump the oil at a small velocity, the oil flows through the pipe slowly, and so does the money into my pocket. If I pump the oil at a very large velocity, then at some point turbulence and turbulent viscosity become important, and I start to lose lots and lots of energy to pumping (viscosity is essentially the equivalent of friction).

As you see, the stakes here are much higher than the 1 million USD offered by the Clay Mathematics Institute.

4. Confinement. Quark-gluon plasma.


From studies of deep inelastic scattering, we have known since the end of the 1960s that hadrons (like protons and neutrons) have constituents called quarks. And yet we are unable to break a hadron into its constituents and get quarks in a free state. We say that quarks are confined within the hadron. Why? We have many different ideas, but so far we are never 100% sure that a given idea is correct. The theory of quarks and gluons (which mediate the interaction between quarks) has been known since the beginning of the 1970s; it is called quantum chromodynamics. Somehow, we are unable to take it and prove by hand that quarks get confined at large distances, although lattice computer simulations do seem to imply confinement.

What is the physics behind confinement of quarks? Is it dual Meissner effect? Is it specific behavior of instanton liquid? Is it something more exotic? We just don’t know – any single idea seems to have its own advantages and serious drawbacks.

Note that the confinement of quarks is closely related to the problem of the existence of a mass gap in Yang-Mills theories, another problem that the Clay Mathematics Institute is willing to pay 1 million USD for.

5. String theory. M-theory. Dualities.


String theory and its 11-dimensional generalization, M-theory, seem to be the ultimate Theory of Everything: very promising and very powerful. Indeed, the spectrum of strings contains the graviton (so the low-energy effective action of string theory is ultimately the Einstein-Hilbert action plus matter fields, which describe the large-scale structure of the Universe observable to us). Compactifications of heterotic string theories also seem to contain all the matter fields that we observe in Nature, and to describe the electroweak and strong interactions.

Although the amount of energy and time invested in the development of string theory by the physics community is enormous, many questions have remained unanswered for decades. How do we reconcile string theory with the accelerated expansion of the Universe that we currently observe? How do we technically analyze the perturbation theory of superstrings (beyond, say, four loops)? How do we deal with string theory on curved or time-dependent backgrounds?

The latter question is of special importance. The reason is that Yang-Mills theories, and quantum chromodynamics in particular, behave as string theories in the regime of strong coupling (which is of particular interest if we want to solve problem 4 above). However, these string theories should be defined on curved backgrounds, like the Anti-de Sitter background, for example (Anti-de Sitter is the spacetime of constant negative curvature). We understand somewhat (qualitatively) how to deal with such string theories, but a full, detailed, technical understanding is yet to be achieved. And it may well take 20 or 30 or 100 years to achieve it.

6. Black holes, information loss paradox


If the amount of matter within a given 3-volume becomes large enough, the generated gravitational field gets so strong (or, to say it better, spacetime gets curved so much) that it becomes impossible for rays of light to leave that 3-volume. We say that a black hole has formed. It swallows all objects trapped by its gravitational field, and nothing can escape from it once trapped, with the exception of Hawking radiation.
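How much matter counts as “large enough”? For a non-rotating, uncharged mass, the standard answer is the Schwarzschild radius $r_s = 2GM/c^2$: compress the mass inside this radius and light can no longer get out. The short sketch below evaluates it. It is our own illustration with rounded constants, and the Earth's mass is an added example.

```python
G = 6.674e-11     # Newton's constant, m^3 kg^-1 s^-2
C = 2.998e8       # speed of light, m/s
M_SUN = 1.989e30  # solar mass, kg
M_EARTH = 5.972e24  # Earth mass, kg

def schwarzschild_radius(mass_kg: float) -> float:
    """r_s = 2 G M / c^2 for a non-rotating, uncharged mass, in metres."""
    return 2 * G * mass_kg / C**2

for name, mass in [("Sun", M_SUN), ("Earth", M_EARTH)]:
    print(f"{name}: r_s = {schwarzschild_radius(mass):.3g} m")
```

The Sun would have to be squeezed into a ball about 3 km in radius, and the Earth into one smaller than a centimetre. This illustrates how extreme the required compression is.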

As Hawking showed long ago, since the gravitational field near a black hole is very strong, it can produce pairs of particles and antiparticles (electrons and positrons, for example). If the positron of the produced pair falls into the black hole, the electron can acquire sufficient energy to escape the gravitational field of the black hole, and we can later detect it. Therefore, there is a constant outgoing flux of particles emitted by the black hole, and the black hole gradually evaporates.

Now, the tricky thing is that the spectrum of Hawking radiation is thermal, i.e., it is characterized by just a single number: the temperature. What if we drop many different objects into the black hole, and it then evaporates completely by emitting Hawking radiation? Is the information about the objects that fell into the black hole ultimately lost? After 30 years of study we honestly still have no answer, although lots of ideas are on the market.
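The single number that characterizes the radiation is the Hawking temperature. For a non-rotating, uncharged black hole it is $T_H = \hbar c^3 / (8\pi G M k_B)$. The sketch below evaluates it with rounded constants. The function name and the sample masses are ours.

```python
import math

HBAR = 1.0546e-34  # reduced Planck constant, J s
C = 2.998e8        # speed of light, m/s
G = 6.674e-11      # Newton's constant, m^3 kg^-1 s^-2
K_B = 1.3807e-23   # Boltzmann constant, J/K
M_SUN = 1.989e30   # solar mass, kg

def hawking_temperature(mass_kg: float) -> float:
    """T_H = hbar c^3 / (8 pi G M k_B) for a Schwarzschild black hole, in kelvin."""
    return HBAR * C**3 / (8 * math.pi * G * mass_kg * K_B)

for label, mass in [("1 solar mass", M_SUN), ("1e12 kg", 1e12)]:
    print(f"{label}: T_H = {hawking_temperature(mass):.3g} K")
```

A solar-mass black hole comes out at about $6 \times 10^{-8}$ K. That is far colder than the 2.7 K microwave background, so such a hole currently absorbs more than it radiates. Because $T_H \propto 1/M$, small black holes are much hotter and evaporate faster.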

The question of information loss is actually a very important one, because all the quantum field theories (and string theory!) we are dealing with are unitary, that is, they preserve information.

7. Thermonuclear fusion


But let us get somewhat more practical. Once we construct a thermonuclear reactor, we will get tons of energy almost for free. Indeed, the fusion reactions we are interested in, which fuse light nuclei such as hydrogen isotopes into helium, run with a huge excess of energy; in fact, the Sun shines due to this excess and will keep shining for another 6 billion years or so. Although we have been trying to solve the problem of thermonuclear fusion in practice for more than 50 years already, we have not yet been able to run a self-sustaining thermonuclear reaction, let alone one with a net energy output.
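A mass-defect calculation makes the “huge excess of energy” concrete. For illustration we pick deuterium-tritium fusion, D + T → ⁴He + n, the reaction most reactor designs aim for; this choice is our assumption. The atomic masses are rounded standard tabulated values, and the helper name is hypothetical.

```python
# Atomic masses in unified atomic mass units (u), rounded tabulated values.
# Electron masses cancel here: two electrons on each side (D + T atoms vs. He-4 atom).
MASS_U = {
    "D": 2.014102,    # deuterium
    "T": 3.016049,    # tritium
    "He4": 4.002603,  # helium-4
    "n": 1.008665,    # free neutron
}
U_TO_MEV = 931.494  # energy equivalent of 1 u, in MeV

def q_value_mev(reactants, products):
    """Energy released (MeV) = (total mass before - total mass after) * c^2."""
    mass_before = sum(MASS_U[p] for p in reactants)
    mass_after = sum(MASS_U[p] for p in products)
    return (mass_before - mass_after) * U_TO_MEV

q = q_value_mev(["D", "T"], ["He4", "n"])
print(f"D + T -> He-4 + n releases about {q:.1f} MeV")
```

Each reaction releases about 17.6 MeV, millions of times more than a typical chemical reaction, which releases a few eV.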

Honestly, I don’t think that the problem of thermonuclear fusion really belongs in the list, since its theoretical aspects are fairly well understood by now. But the promises are so incredibly high… And Steven Chu is the Head of the Department of Energy.

Anyway, there are tons of very complicated technical problems that do not allow us to start and run a thermonuclear reaction today. For one, the plasma is unstable. One can try to control it with magnetic fields, but we need really powerful (superconducting) magnets for it, more powerful than the ones that broke at the LHC several months ago. And this brings us to the next problem:

8. High temperature superconductivity


Maybe even room-temperature superconductivity. Once we thought that this was impossible, but we are no longer so sure of it after the discoveries of the 1980s (some cuprate superconductors reach critical temperatures as high as 138 K). It seems, though, that high-temperature superconductors are not described by the canonical Bardeen-Cooper-Schrieffer (BCS) theory, that is, possibly the mechanism of superconductivity is not related to the interaction between electrons and phonons (phonons are quanta of vibrations of the crystal lattice). Another possibility is that the electron-phonon interaction in high-$T_c$ superconductors is so strong that we lose analytic control over the theory.

In any event, if we understand the nature of high-$T_c$ superconductivity in cuprate-perovskite ceramics, room-temperature superconductivity may also become possible, and cheap electricity and flying cars (as well as other flying junk) will enter our everyday life.

9. Dark matter


Almost 26% of the total energy density of the Universe is in the form of matter that does not emit light. We can feel it gravitationally (we see how it holds stars in galaxies and how it affects the expansion of the Universe), but we are not able to detect it directly, which is quite frustrating. Does dark matter consist of some heavy particles (observations actually seem to suggest so) that were born in the early Universe and do not interact at all with the matter we consist of, except through gravity? Does dark matter consist of supersymmetric partners of the Standard Model fields? Is it an inflaton condensate? We don’t know.

10. Ultra high energy cosmic rays


Ultra-high energy cosmic rays (UHECRs) are cosmic rays with energies above roughly $10^{18}$ eV. They strike the atmosphere and produce spectacular showers of daughter particles, and that is how we detect them (about a hundred events per year or so).

The problem is that the sources of UHECRs are unknown. They cannot be of cosmological origin (that is, they are not created in the early Universe, when energies were high, and then travel to us over very large distances): the Greisen-Zatsepin-Kuzmin (GZK) cutoff forbids that. (The physics behind GZK is basically that ultra-high energy cosmic rays flying over cosmological distances should scatter on quanta of the cosmic microwave background radiation and inevitably lose energy.) On the other hand, if they are not of cosmological origin, we should be able to trace their trajectories back, yet we don’t quite see where their sources on the sky are. Recently, the Pierre Auger Collaboration has claimed that active galactic nuclei may be the source (see the picture above), but the community is not yet quite sure of that.
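The GZK argument can be turned into a rough number. We assume the dominant loss channel is pion photoproduction, p + γ_CMB → p + π⁰; this channel is our choice for illustration. For a head-on collision, the threshold proton energy is then $E_{\rm th} \approx m_\pi c^2 (2 m_p c^2 + m_\pi c^2) / (4 E_\gamma)$. We also assume a typical CMB photon energy of about $6.4 \times 10^{-4}$ eV. With these assumptions, the sketch below gives a threshold near $10^{20}$ eV.

```python
M_PROTON_EV = 938.272e6  # proton rest energy, eV
M_PION0_EV = 134.977e6   # neutral pion rest energy, eV

def gzk_threshold_ev(photon_energy_ev: float) -> float:
    """Head-on threshold for p + gamma -> p + pi0: E = m_pi (2 m_p + m_pi) / (4 E_gamma), in eV."""
    return M_PION0_EV * (2 * M_PROTON_EV + M_PION0_EV) / (4 * photon_energy_ev)

# Assumed typical CMB photon energy, roughly 2.7 k_B T at T = 2.725 K
E_CMB_TYPICAL_EV = 6.4e-4
print(f"threshold ~ {gzk_threshold_ev(E_CMB_TYPICAL_EV):.2e} eV")
```

The commonly quoted cutoff sits somewhat lower, around $5 \times 10^{19}$ eV. The reason is that the energetic tail of the thermal CMB spectrum lowers the effective threshold.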

Bonus problem.* Metallic hydrogen and exotic matter

Source:  University of Rochester

Okay, maybe some problems (like 7 or 10) do not quite deserve to be included in the list, so I have to compensate somehow. What you see in the picture above is an artist’s impression of an ocean on Jupiter. The ocean is made of metallic hydrogen.

As you may remember, Jupiter is a gas giant (basically, the statement is that a planet cannot become too large unless it consists mostly of gas), and so are Saturn, Uranus and Neptune. However, near the core the planet should be solid, and the solid core is expected to be covered by liquid metallic hydrogen. It is metallic because under very strong pressure any material loses hold of its electrons and becomes a metal. It is liquid since the proton (that is, a hydrogen atom with its electron ripped off) is much lighter than a $^4$He nucleus, and helium remains liquid even at very low temperatures. Moreover, we expect metallic hydrogen to be superconducting, with a critical temperature that may be rather high, 100-200 K (so the Bonus problem is somewhat related to problem 8).

There are many technical problems related to the production of metallic hydrogen. The main one is, of course, pressure: we need it to be as high as hundreds of GPa. It seems that some progress has recently been achieved simultaneously by several groups (LNBL, for one). In particular, it seems to be proven that metallic hydrogen is superconducting.

Instead of conclusion

I seriously doubt that science (and theoretical physics in particular!) is near its completion as some people like to suggest. I believe that even this extremely short list of 11 problems can keep us busy for decades (if not centuries if we consider the problems 1, 5 and 6 really seriously). I am also sure that my list is terribly incomplete, so please feel very welcome to rearrange the order of problems in this list as you like or point out some other problems that I missed due to my ignorance.

The post Top Ten Open Problems in Physics appeared first on None Equilibrium.

Jordan EllenbergRingo Starr rebukes the Stoics

I’ve been reading Marcus Aurelius and he keeps returning to the theme that one must live “according to one’s nature” in order to live a good life.  He really believes in nature.  In fact, he reasons as follows:  nature wouldn’t cause bad things to happen to the virtuous as well as the wicked, and we see that both the virtuous and the wicked often die young, so early death must not be a bad thing.

Apparently this focus on doing what is according to one’s nature is a standard feature of Stoic philosophy.  It makes me think of this song, one of the few times the Beatles let Ringo sing.  It’s not even a Beatles original; it’s a cover of a Buck Owens hit from a couple of years previously.  Released as a B-side to “Yesterday” and then on the Help! LP.

Ringo has a different view on the virtues of acting according to one’s nature:

They’re gonna put me in the movies
They’re gonna make a big star out of me
We’ll make a film about a man that’s sad and lonely
And all I gotta do is act naturally
Well, I’ll bet you I’m a-gonna be a big star
Might win an Oscar you can’t never tell
The movie’s gonna make me a big star,
‘Cause I can play the part so well
Well, I hope you come and see me in the movie
Then I’ll know that you will plainly see
The biggest fool that’s ever hit the big time
And all I gotta do is act naturally

December 10, 2018

Dmitry PodolskyBuilding Skyward: How Physics Made Possible the Vertical Rise of Modern Cities.

There is a new measure of the advancement and modernity of cities around the world. It is not an economic or financial yardstick. Instead, the degree of modernization is now seen on the skyline, in the upward growth of the spaces that people dwell in. Towns and cities across the world have increasingly expanded vertically in recent years. The era of skyscrapers has arrived: buildings that literally touch the clouds, tower over any other man-made structure and cast shadows several kilometers long.

This vertical growth might be driven by innovation or by intense competition between countries. It might be because of land scarcity and skyrocketing real-estate prices. But one trend has emerged: the most advanced cities are building high instead of wide.


Skyscrapers have drawn in tourism, commerce, business and trade. A skyscraper becomes the icon of its city. It brings fame, functionality and profits. It has become the quintessential marker of an advanced and modern world.

These man-made wonders do not simply emerge from the ground. They took the most talented engineers, architects, contractors and businessmen. They took hard work, perseverance and will. They took countless working hours. They took blood, sweat and tears.

However, the most important element in the mix is science, engineering physics to be exact.

Engineering Physics is the discipline devoted to creating and optimizing engineering solutions through enhanced understanding and integrated application of mathematical, scientific, statistical, and engineering principles. It is a branch of applied physics that bridges the gap between theoretical science and practical engineering with emphasis in research and development, design, and analysis.

Building skyward takes expertise in materials science, applied mechanics, mechanical physics, electrical engineering, aerodynamics, energy and solid-state physics.

We’ll look into 2 main innovations in applied physics that allowed skyscrapers to grow from a few storeys to the heights of the modern skyscraper.

Innovation 1: Elevators


Before buildings could go taller, engineers had to solve the problem of bringing people higher. The first dilemma that struck physicists and engineers was the challenge of lifting people to higher floors efficiently and safely. Elevators already existed, but no one could guarantee the safety of the passengers riding them. The solution came from the invention of Elisha Graves Otis, who in 1854, during the World’s Fair in New York, rode a cab hung from an elevator rope that was secured to a powerful wagon spring mounted on top of the cab. The spring was connected to a set of metal prongs on each side of the elevator, and the prongs ran along guide rails fitted with a row of teeth. If the rope broke, the tension on the spring was released: the spring relaxed and forced the metal prongs into the teeth, locking the cab in place.

The solution used physics to counter one of nature’s most basic forces: gravity.

Three forces act on an elevator: the force of gravity on the elevator itself, the downward normal force from the passengers standing on its floor, and the upward tension in the cable holding the elevator.

This application traces its roots to Newton’s second law. When the elevator is stationary or moving at constant speed, its acceleration is 0, and a scale inside it reads a passenger’s true weight. When the elevator accelerates upward, the floor must push harder on the passengers, which adds force to the scale and increases their apparent weight. When the elevator accelerates downward, the acceleration is negative, subtracting force from the scale and decreasing the apparent weight.
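Here is a minimal sketch of the Newton’s-second-law bookkeeping above. It assumes a simple cab hanging from a single cable, with no counterweight or friction. The passenger mass and accelerations are hypothetical values chosen for illustration.

```python
G = 9.81  # gravitational acceleration, m/s^2

def apparent_weight(mass_kg: float, accel_up: float) -> float:
    """Force the floor (or a scale) exerts on a passenger: N = m (g + a).
    accel_up > 0 means accelerating upward, < 0 accelerating downward, 0 at rest or constant speed."""
    return mass_kg * (G + accel_up)

def cable_tension(cab_kg: float, load_kg: float, accel_up: float) -> float:
    """Cable tension giving cab plus load the acceleration accel_up (no counterweight, no friction)."""
    return (cab_kg + load_kg) * (G + accel_up)

passenger = 70.0  # kg, hypothetical passenger
for a in (0.0, 1.5, -1.5):
    print(f"a = {a:+.1f} m/s^2 -> scale reads {apparent_weight(passenger, a):.1f} N")
```

The same factor $(g + a)$ sets the load on the cable, so the upward-accelerating phase is the one that sizes the cable and its safety margin.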

Understanding Newton’s laws has allowed engineers to calculate and establish safe parameters for elevators carrying passengers up and down the tallest structures on earth.

Innovation 2: Aerodynamics

As engineers set out to build taller buildings, they faced another problem: the wind. At a glance, wind hitting a concrete building with reinforced steel frames seems an ordinary occurrence. However, as buildings go higher, the forces of the wind grow dangerously stronger. In this case, simply making the building stronger and more durable is not an easy way out, so engineers turned to aerodynamics.

Aerodynamics is the study of the properties of moving air, and especially of the interaction between the air and solid bodies moving through it. Engineers had to match the design of the building to the natural forces of the wind.

This innovation is best exemplified by the Burj Dubai tower (since renamed the Burj Khalifa). The Burj Dubai currently stands as the tallest man-made structure in the world, at 2,717 feet. The development around it houses 30,000 residences spread out over 19 residential towers, an artificial lake, nine hotels and a shopping mall. The engineers who designed and built this mammoth structure took aerodynamics into unprecedented territory.


Instead of making the building sleeker, thinner and smoother so that the wind could easily glide past it, they designed its angles to “break down” the wind that smashes into it 2,000 feet in the air. The design reduces the forces of the wind by deflecting it and disrupting the powerful vortices. The building has separate stalks, which top out unevenly around the central spire. This unusual, odd-looking design deflects the wind around the structure and prevents it from forming organized whirlpools of air current, or vortices.
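The “organized whirlpools” are what engineers call vortex shedding. A common rule of thumb is that vortices peel off a bluff body at a frequency $f \approx St \cdot U / D$. Here $U$ is the wind speed, $D$ is the building’s width and $St$, the Strouhal number, is roughly 0.2. This rule is an assumption here, not a figure from the Burj project. The sketch uses a hypothetical wind speed and hypothetical widths to illustrate one plausible reading of the design. When the width changes with height, different parts of the tower shed vortices at different frequencies, so no single organized rhythm can build up along the whole building.

```python
STROUHAL = 0.2  # typical Strouhal number for bluff bodies (assumed rule of thumb)

def shedding_frequency(wind_speed_m_s: float, width_m: float, strouhal: float = STROUHAL) -> float:
    """Vortex-shedding frequency f = St * U / D, in hertz."""
    return strouhal * wind_speed_m_s / width_m

wind = 40.0  # m/s, hypothetical design wind speed
for width in (60.0, 45.0, 30.0, 15.0):  # hypothetical widths at different heights
    print(f"width {width:4.0f} m -> vortices shed at ~{shedding_frequency(wind, width):.2f} Hz")
```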

One of the wind-engineering specialists for the skyscraper, Jason Garber, said: “the amount of motion you’d expect is on the order of 1/200 to 1/500 times its height.” For the Burj Khalifa, this translates into about two to four meters. “It’s not much, but certainly enough to make residents queasy if they can sense this motion. That’s why one of the chief concerns of architects and engineers is acceleration, which can result in perceptible forces on the human body.”
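The arithmetic behind the quoted range is easy to check. The sketch converts the 2,717-foot height given earlier to metres and applies the 1/500 to 1/200 rule of thumb from the quote. The variable names are ours.

```python
FEET_TO_M = 0.3048
height_m = 2717 * FEET_TO_M   # height quoted in the text, converted to metres

# Rule of thumb quoted above: expected motion of 1/500 to 1/200 of the height
sway_low_m = height_m / 500
sway_high_m = height_m / 200
print(f"{height_m:.0f} m tall -> expected sway of about {sway_low_m:.1f} to {sway_high_m:.1f} m")
```

This gives roughly 1.7 to 4.1 m, in line with the “two to four meters” quoted.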

The techniques employed at the Burj Dubai relate back to the law of conservation of energy: energy can be neither created nor destroyed. Air exerts no force, apart from its static pressure, unless it is in motion; when it moves, however, its force becomes apparent. Redirecting the wind made it possible to break its force into small, manageable and safe pieces.
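In fluid terms, this energy argument is essentially Bernoulli’s principle. The extra pressure that moving air can exert on a surface facing it scales with the dynamic pressure $q = \tfrac{1}{2}\rho v^2$, so doubling the wind speed quadruples the load. Here is a minimal sketch assuming sea-level air density (1.225 kg/m³). Real design loads also include drag coefficients and gust factors, which are left out here.

```python
RHO_AIR = 1.225  # kg/m^3, standard sea-level air density (assumed)

def dynamic_pressure(wind_speed_m_s: float, rho: float = RHO_AIR) -> float:
    """q = 1/2 * rho * v^2, in pascals."""
    return 0.5 * rho * wind_speed_m_s ** 2

for v in (10, 20, 40):
    print(f"{v} m/s wind -> q = {dynamic_pressure(v):.0f} Pa")
```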

Most of us have been inside a tall building, and many have worked or lived in a high-rise structure. It is easy to miss the science behind the technology that brought us to new heights. But as much as we are in awe of these skyscraping buildings’ physical structure and functionality, we should also be in awe of the basic principles that allowed them to exist.


The post Building Skyward: How Physics Made Possible the Vertical Rise of Modern Cities. appeared first on None Equilibrium.

Dmitry PodolskyIn Space, After Chaos There is Birth: Mysterious Formation of Star Clusters

The stars that we see in a clear night sky are conceived in a pandemonium of atoms. Atoms of light elements are squeezed under enough pressure for their nuclei to undergo fusion. These little twinkling bits of light adorning the night sky are beautiful to gaze at. However, their beauty is the result of mayhem: atomic behavior that is patternless and random, in which small changes in conditions can give rise to a myriad of possibilities.

Stars are born as the result of a balance of forces. Gravity squeezes atoms so tightly that nuclear fusion ignites, and this fusion produces an outward, expanding pressure. Once the outward pressure and the inward pull of gravity balance, the newborn star can live for millions, if not billions, of years.

On paper, this sounds as simple as lighting a match. However, the upheaval involved is beyond imagination: the biggest explosion ever produced by human beings is just a tiny spark compared with the process of giving birth to a star.

Such awe-inspiring events have left physical cosmologists and astrophysicists craving more details. They have found some indicators and explanations for the different outcomes of star birth.

Star clusters have received renewed attention in physical cosmology and astrophysics, thanks to technology and relevant discoveries accumulated over the years.

The Understanding of How Star Clusters are Born

Source: NASA

Stars start from large clouds of molecular gas. Because they condense out of a shared cloud, stars form in groups or clusters. Gravitational interactions allow the stars to exchange energy. Some become runaway stars that escape the cluster’s gravitational pull. The rest remain bound and exist as a collection of stars orbiting one another for an indefinite period of time.

A single star is born from a giant, cold cloud of molecular gas and dust, and such a star has dozens or even hundreds of stellar siblings in its cluster.

When a cluster is young, the brightest members are O, B and A stars. Young clusters in our Galaxy are called open clusters due to their loose appearance. They usually contain between 100 and 1,000 members.

Traditional models hold that the force of gravity may be solely responsible for the formation of stars and star clusters. More recent observations suggest that magnetic fields, turbulence, or both are also involved and may even dominate the creation process. But just what triggers the events that lead to the formation of star clusters?

Gravity has a Hand, But Not Most of the Time.

While gravity plays a very important role in keeping clusters together, scientists have recently found other causes of star birth. Besides gravity, magnetic fields and turbulence, collisions between giant molecular clouds have been found to spark the formation of star clusters.

At the forefront of the discovery is SOFIA, the Stratospheric Observatory for Infrared Astronomy, a joint project of the National Aeronautics and Space Administration (NASA) and the German Aerospace Center. SOFIA is a modified Boeing 747SP aircraft that houses a 2.7-meter (106-inch) reflecting telescope (with an effective diameter of 2.5 meters, or 100 inches). Flying into the stratosphere at 38,000-45,000 feet takes SOFIA above 99 percent of Earth’s infrared-blocking atmosphere, allowing astronomers to study the solar system and beyond in ways that are not possible with ground-based telescopes.

Source: NASA

Astronomers using SOFIA’s instruments measured the motions of ionized carbon around a molecular cloud in which stars are forming. They found two distinct components of molecular gas colliding with each other at the staggering speed of 20,000 miles per hour.

The distribution and velocities of the molecular and ionized gases were found to be consistent with the team’s model simulations of cloud collisions. This suggests that the cluster formed as a huge mass of gas was compressed by the collision, with a shock wave created as the clouds hit one another.

Thomas Bisbas, a postdoctoral researcher at the University of Virginia, Charlottesville, Virginia, and the lead author of the paper describing these new results, said: “Stars are powered by nuclear reactions that create new chemical elements.”

“The very existence of life on earth is the product of a star that exploded billions of years ago, but we still don’t know how these stars — including our own sun — form,” he added.

Universe’s Own Way of Giving Birth

The universe, with all the mysteries that shroud its vast expanse, has its own unique ways of giving birth. Astronomers are delving deeply into simulations and theories that could match their findings. In the case of SOFIA, observations at near-, mid- and far-infrared wavelengths have brought new ideas about how the universe brings a star to life.

“These star formation models are difficult to assess observationally,” said Jonathan Tan, a professor at Chalmers University of Technology in Gothenburg, Sweden, and the University of Virginia, and a lead researcher on the paper. “We’re at a fascinating point in the project, where the data we are getting with SOFIA can really test the simulations.”

The next step for astronomers is to gather larger amounts of data and reach a scientific consensus on the mechanism responsible for driving the creation of star clusters.

The breakthrough sparked by SOFIA suggests that the universe, and the limitless chaos that erupts in it from time to time, has its own way of giving birth. According to the Big Bang theory, chaos of this kind, in a hot, dense early universe where the first light nuclei were forged by fusion, ultimately gave birth to our galaxy.

Unravelling the actual physical cause of star birth is a step towards determining the possibility of life, maybe even civilizations, thriving around one of those stars in a cluster.


The post In Space, After Chaos There is Birth: Mysterious Formation of Star Clusters appeared first on None Equilibrium.

Dmitry PodolskyTo Study Small, You Must Build Big: Why are Scientists Building Big Colliders for the Smallest Collisions?

We were once taught, maybe back in grade school, that the smallest unit of matter is the atom. The word atom comes from the Greek word “atomos,” which means indivisible, and atoms were said to be made of protons, neutrons and electrons. An atom can be illustrated by drawing a spherical core enveloped by overlapping oval lines that serve as the orbits of the electrons. This iconic diagram of the atom has inspired the logos of many scientific organizations and fascinated the brightest minds in science.

Scientists have shown that atoms were just the starting point. Each atom is another universe worth digging into.

While it is an exciting feat to look deeper into atoms, dissecting one is not as easy as using a scalpel to cut open a frog during one of your high school lab experiments. It requires knowledge, effort and funding whose closest comparison is space exploration.

It is an effort on the grandest scale to test the predictions of different theories of particle physics, including measuring the properties of the Higgs boson.

Building Big for Particles


The mysteries stored inside the atom are being unearthed in a facility that is 27 kilometers long, straddles two countries and requires thousands of people to operate.

Its size is hard to fathom. Let’s simply describe it as the largest machine on earth.

The Large Hadron Collider (LHC) is the most powerful particle accelerator ever built and the biggest machine on earth. The accelerator is situated in a tunnel 100 meters underground at CERN, the European Organization for Nuclear Research, on the Franco-Swiss border near Geneva, Switzerland.

Its purpose is to propel charged particles to extremely high speeds and energies and then store them in beams. Each beam is composed of charged particles moving at nearly the speed of light. Through this process, scientists can produce high-energy particles that are useful for fundamental and applied research in the sciences.

The LHC consists of a 27-kilometre ring of superconducting magnets with a number of accelerating structures boosting the energy of the particles along the way.

It happens in almost a snap, in an invisible environment clad in tubes and magnets. Two high-energy particle beams travel at close to the speed of light before they are made to collide. These particles (protons or, in some runs, heavy ions such as lead ions) are guided around the accelerator ring by a strong magnetic field maintained by superconducting electromagnets. The electromagnets are made from coils of special electric cable that operate in a superconducting state.

This allows electricity to be conducted efficiently, with little to no resistance or loss of energy.

Putting particles on such an energetic collision course is not possible in a small tube. To uncover the unexplored constituents of the atom, massive magnetic coils spanning kilometres are needed: the particles need to gain enough momentum to ‘break’ or ‘react’ upon collision.

The energy required to run the machine is comparable to putting 4000 coal-powered power plants together. The magnets must also be kept extremely cold to remain superconducting, which is why the LHC chills them to ‑271.3°C, a temperature colder than outer space, using liquid helium as the main coolant.
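To put the quoted magnet temperature in perspective, it can be converted to kelvin and compared with the temperature of the cosmic microwave background, roughly 2.725 K, which sets the temperature of deep space. The CMB value is our own added reference figure, not taken from the article; a minimal sketch:

```python
# Convert the LHC magnet temperature from Celsius to kelvin and compare it
# with the cosmic microwave background (CMB) temperature of deep space.
# The CMB value (about 2.725 K) is an added reference figure, not from the article.
MAGNET_TEMP_C = -271.3
CMB_TEMP_K = 2.725


def celsius_to_kelvin(temp_c: float) -> float:
    """Kelvin = degrees Celsius + 273.15."""
    return temp_c + 273.15


magnet_temp_k = celsius_to_kelvin(MAGNET_TEMP_C)
print(f"Magnet temperature: {magnet_temp_k:.2f} K")
print("Colder than deep space:", magnet_temp_k < CMB_TEMP_K)
```

The magnets sit at under 2 K, below the roughly 2.7 K of empty space.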

Building to Add Meaning to the Standard Model

Source: Digital Trends

The ultimate goal of this gigantic machine is to give scientists an avenue to contribute to the Standard Model of Particle Physics.

The Standard Model describes how the building blocks of matter interact through three of the four fundamental forces. All findings in particle physics have been summed up and boiled down to this model.

It describes the electromagnetic, weak and strong forces in the universe and classifies the elementary particles.

It classifies and describes the following elementary particles:

  • Quarks. Particles that have mass and exhibit spin. They combine to form all hadrons (baryons and mesons), i.e., all particles that interact by means of the strong force, the force that binds the components of the nucleus. Physicists group the quarks into three pairs: up/down, charm/strange and top/bottom.
  • Leptons. Particles of half-integer spin that do not take part in strong interactions. Two main classes of leptons exist: charged leptons and neutral leptons (neutrinos).
  • Bosons. Particles that have zero or integer spin and follow the statistical description given by S. N. Bose and Einstein. They include fundamental particles such as photons, gluons, and W and Z bosons.
  • Higgs boson. The elementary particle in the Standard Model produced by the quantum excitation of the Higgs field, often nicknamed the “God Particle.” No experiment had observed this long-theorized particle until July 4, 2012, when CERN announced that the LHC had found evidence of it. The Higgs field is believed to give the elementary particles their mass.


Since the discovery of the Higgs boson, the questions have been piling up. What scientists know for now is that the particle exists and can be produced using their gigantic magnetic coils and kilometres-long tubes.

Building Even Bigger

The biggest questions require an even bigger collider. The discovery of the “God Particle” has spurred more interest in the field of particle physics, but the best findings can only come from the most capable colliders.

Circular Electron Positron Collider
Source: Vision Times

CERN’s counterpart in the Far East, China’s Institute of High Energy Physics, has announced the release, a few weeks ago, of the conceptual design report for its Circular Electron Positron Collider (CEPC).

The two-volume report, containing sophisticated technical details, shows that the collider will be built to exceed the capabilities of the LHC. According to the report, ten years of operation would yield one million Higgs bosons, one hundred million W bosons and close to one trillion Z bosons. Billions of bottom quarks, charm quarks and tau leptons will also be produced in the decays of the Z bosons.


The post To Study Small, You Must Build Big: Why are Scientists Building Big Colliders for the Smallest Collisions? appeared first on None Equilibrium.

Dmitry PodolskyA Nonequilibrium Energy and Environment Special Part 2: The Key to Green Energy is Putting it into People’s Smartphones.

“This is the first of a two-part series detailing the possibility of small-scale energy production to help curb the use of fossil fuels and, ultimately, global warming. Explored here are two promising technologies that, even at their infancy stage, can make everyone a contributor to the goal of reducing our carbon dioxide footprint.”

The rise of smartphones has disrupted the lives of billions of people. Just as its functions can be reached at the touch of a button, the smartphone has crept into human life as quickly as the flick of a switch. Few can imagine life without the bright, wide and intuitive screens of smartphones. Fewer can live without one.

Smartphones haven’t been around for long, but they have risen to become a basic necessity for people around the world. It did not take long for them to dominate the telecommunications sector.

The innovation and technology poured into improving the smartphone have satiated the taste buds of the tech-savvy generation. The rise of the smartphone seems inevitable, and nobody appears to be stopping it. Well, why would anyone?

It has brought functionality, convenience and a whole lot of cool stuff right into the palm of our hands. It has made communicating through a device easier and more accessible.

However, scientists have sounded the alarm about the negative impact of smartphones on our planet.

A study by researchers at McMaster University, published in the Journal of Cleaner Production, analyzed the carbon impact of the entire Information and Communication Technology (ICT) industry and found that smartphones are slowly killing the planet. The blame is pinned on the smartphone because of its life cycle. According to the study, a smartphone is more or less disposable: an average smartphone is used for two years, then disposed of or replaced with a newer model.

Smartphones, like energy consumption, are not harmful in themselves. Only a small carbon footprint is tied to actually using a smartphone, as its main energy requirement is charging its battery.

The harm is rooted in the manufacturing of smartphones, and it is amplified by the smartphone’s short life cycle. In 2017 alone, 175,780 smartphones were sold every hour. More than half of the 1.54 billion smartphones sold last year are destined for disposal or have already been replaced by a newer model, and by 2020, 90% of these phones will have ended their life cycle.
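The hourly and yearly sales figures quoted above are consistent with each other, as a quick back-of-the-envelope check shows. This sketch assumes sales are spread evenly over the 8,760 hours of 2017 (not a leap year):

```python
# Back-of-the-envelope check: do 175,780 smartphones sold per hour
# add up to roughly 1.54 billion smartphones sold in 2017?
UNITS_PER_HOUR = 175_780
HOURS_IN_2017 = 365 * 24  # 2017 was not a leap year: 8,760 hours

annual_units = UNITS_PER_HOUR * HOURS_IN_2017
print(f"{annual_units:,} units, about {annual_units / 1e9:.2f} billion")
```

The product comes to about 1.54 billion units, matching the annual figure.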


The worrisome figure is that 85% to 95% of the device’s carbon emissions come from the mined minerals that make up a smartphone. The big players here are the lithium-ion batteries, the glass and the semiconductors inside it.

As smartphone design shifts toward bigger, brighter and better screens, the carbon footprint of producing these devices also shoots up. A testament to this is Apple’s published report stating that building an iPhone 7 Plus creates roughly 10% more CO2 than an iPhone 6s, while a standard iPhone 7 creates roughly 10% less than a 6s.

Manufacturing is just the first prong of the smartphone’s two-pronged environmental impact. The second, more insidious one comes from the servers and data centers behind smartphones. These huge facilities alone are responsible for 45% of the ICT industry’s carbon emissions. That is because every message, every Facebook like, every Instagram post and every Tweet requires a server to execute the commands, and these servers, or the cloud, are not cheap to operate. Nor can their carbon emissions be neglected.

Do we take a step back and just stop?

If your answer is yes, you may bid farewell to instant messages, you may say goodbye to Facebook likes, you may stop tweeting and you may halt taking selfies.

Sounds difficult and unfair, right? Yes, we agree that humans will not stop using the smartphone. But what can we do? What can you do?

What we can do has been done before. In fact, it is the reason for the immense popularity of smartphones: designers can innovate on the build and functionality of the smartphone as we know it today.

As avid students, researchers and followers of science, especially physics, the environmental team of Nonequilibrium has been searching for the needle in the haystack and trying to think outside the box.

Now that almost every function has been squeezed into a smartphone’s rectangular body, we look at one additional function that could make it not just smarter, but greener.

Harvesting Energy from Glass Panels
Source: MSU Today

Much of a smartphone is made of glass: the capacitive touchscreen, the camera lens and, on premium models, the back panel. The glass also comes in different types, such as Corning’s Gorilla Glass.

A few bright minds who are not even associated with the smartphone industry have made a breakthrough in the production of glass: a glass that can harvest energy from the sun.

Researchers and scientists from Michigan State University (MSU) have floated the use of transparent solar panels: thin sheets of see-through glass with a solar-harvesting system based on organic molecules, developed by Richard Lunt, the Johansen Crosby Endowed Associate Professor of Chemical Engineering and Materials Science at MSU, and his team to absorb invisible wavelengths of sunlight. The team claims that the glass can be set to absorb ultraviolet and near-infrared wavelengths and convert this energy into electricity.

According to the researchers, the glass they have been testing fits the requirements of a smartphone well, because it would be made of a transparent luminescent solar concentrator that could generate solar energy on any clear surface without affecting the view.

Once this new breed of solar panel has been refined, it could replace the glass used in smartphones. Smartphone manufacturers could then plug in a chipset that collects the energy from the glass.

This is, in theory, more convenient than the new buzz of wireless charging. If tech giants like Apple, Samsung, Huawei and LG put their fine touch on it, we could be seeing a smartphone that does not need any charger: a smartphone that is energy self-sufficient.

“We analyzed their potential and show that by harvesting only invisible light, these devices can provide a similar electricity-generation potential as rooftop solar while providing additional functionality to enhance the efficiency of buildings, automobiles and mobile electronics,” Lunt wrote in the research.



Humans are great at adapting to new technologies and trends. We are great at clinging to our basic necessities and finding ways to acquire them. Clothing, as mentioned in the first part of this report, is a basic necessity for a human being to survive, and to be decent. Smartphones, as discussed here, have become a necessity in our technologically advancing world.

When we cannot live without these things, incorporating our green energy goals into them is a win for both sides. We continue to enjoy what we enjoy, while each and every one of us contributes to curbing our use of fossil fuels.

These two research efforts, which caught the attention of our writers and editors, are possible and actionable ways to lessen our carbon footprint, so we may continue our existence on this planet without compromising what we love.

The post A Nonequilibrium Energy and Environment Special Part 2: The Key to Green Energy is Putting it into People’s Smartphones. appeared first on None Equilibrium.

December 09, 2018

Terence Tao254A, Supplemental: Weak solutions from the perspective of nonstandard analysis (optional)

Note: this post is not required reading for this course, or for the sequel course in the winter quarter.

In Notes 2, we reviewed the classical construction of Leray of global weak solutions to the Navier-Stokes equations. We did not quite follow Leray’s original proof, in that the notes relied more heavily on the machinery of Littlewood-Paley projections, which have become increasingly common tools in modern PDE. On the other hand, we did use the same “exploiting compactness to pass to weakly convergent subsequence” strategy that is the standard one in the PDE literature used to construct weak solutions.

As I discussed in a previous post, the manipulation of sequences and their limits is analogous to a “cheap” version of nonstandard analysis in which one uses the Fréchet filter rather than an ultrafilter to construct the nonstandard universe. (The manipulation of generalised functions of Colombeau-type can also be comfortably interpreted within this sort of cheap nonstandard analysis.) Augmenting the manipulation of sequences with the right to pass to subsequences whenever convenient is then analogous to a sort of “lazy” nonstandard analysis, in which the implied ultrafilter is never actually constructed as a “completed object”, but is instead lazily evaluated, in the sense that whenever membership of a given subsequence of the natural numbers in the ultrafilter needs to be determined, one either passes to that subsequence (thus placing it in the ultrafilter) or the complement of the sequence (placing it out of the ultrafilter). This process can be viewed as the initial portion of the transfinite induction that one usually uses to construct ultrafilters (as discussed using a voting metaphor in this post), except that there is generally no need in any given application to perform the induction for any uncountable ordinal (or indeed for most of the countable ordinals also).
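To make the “cheap” and “lazy” frameworks concrete, here is a minimal Python sketch. A nonstandard real is modelled as a sequence; “holds for all sufficiently large {n}” (the Fréchet filter) is approximated by checking the tail of a finite window of indices; and the lazy ultrafilter is modelled by passing to the subsequence where a property holds whenever that subsequence still meets the tail of the window, or to the complement otherwise. The finite window and the helper names `eventually` and `pass_to_subsequence` are illustrative assumptions, since genuine “eventually” statements cannot be decided by a finite computation.

```python
from typing import Callable, List

RealSeq = Callable[[int], float]
Predicate = Callable[[int], bool]


def eventually(pred: Predicate, indices: List[int]) -> bool:
    """Proxy for 'pred(n) holds for all but finitely many n in indices'
    (the Frechet filter): only the second half of the finite window is checked."""
    tail = indices[len(indices) // 2:]
    return all(pred(n) for n in tail)


def pass_to_subsequence(pred: Predicate, indices: List[int]) -> List[int]:
    """Lazily settle pred: keep the indices where it holds if it holds somewhere
    in the tail of the window (proxy for 'infinitely many'), else keep the
    complement. Either way, the kept index set is placed 'in the ultrafilter'."""
    tail = indices[len(indices) // 2:]
    if any(pred(n) for n in tail):
        return [n for n in indices if pred(n)]
    return [n for n in indices if not pred(n)]


x: RealSeq = lambda n: (-1) ** n           # the sequence -1, 1, -1, 1, ...
window = list(range(1, 1001))              # finite stand-in for the natural numbers
positive: Predicate = lambda n: x(n) > 0
negative: Predicate = lambda n: x(n) < 0

# The Frechet filter does not decide the sign of x: neither sign holds eventually.
print(eventually(positive, window), eventually(negative, window))

# Passing to a subsequence settles the question.
sub = pass_to_subsequence(positive, window)
print(eventually(positive, sub), sub[:5])
```

The example shows why the Fréchet filter alone gives only a “cheap” theory: the sign of {(-1)^n} is undecided until one commits to a subsequence, which is exactly one step of lazily evaluating an ultrafilter.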

On the other hand, it is also possible to work directly in the orthodox framework of nonstandard analysis when constructing weak solutions. This leads to an approach to the subject which is largely equivalent to the usual subsequence-based approach, though there are some minor technical differences (for instance, the subsequence approach occasionally requires one to work with separable function spaces, whereas in the ultrafilter approach the reliance on separability is largely eliminated, particularly if one imposes a strong notion of saturation on the nonstandard universe). The subject acquires a more “algebraic” flavour, as the quintessential analysis operation of taking a limit is replaced with the “standard part” operation, which is an algebra homomorphism. The notion of a sequence is replaced by the distinction between standard and nonstandard objects, and the need to pass to subsequences disappears entirely. Also, the distinction between “bounded sequences” and “convergent sequences” is largely eradicated, particularly when the space that the sequences ranged in enjoys some compactness properties on bounded sets. Also, in this framework, the notorious non-uniqueness features of weak solutions can be “blamed” on the non-uniqueness of the nonstandard extension of the standard universe (as well as on the multiple possible ways to construct nonstandard mollifications of the original standard PDE). However, many of these changes are largely cosmetic; switching from a subsequence-based theory to a nonstandard analysis-based theory does not seem to bring one significantly closer for instance to the global regularity problem for Navier-Stokes, but it could have been an alternate path for the historical development and presentation of the subject.

In any case, I would like to present below the fold this nonstandard analysis perspective, quickly translating the relevant components of real analysis, functional analysis, and distributional theory that we need to this perspective, and then use it to re-prove Leray’s theorem on existence of global weak solutions to Navier-Stokes.

— 1. Quick review of nonstandard analysis —

In this section we quickly review the aspects of nonstandard analysis that we need. Let {{\mathfrak U}} denote the “standard” universe of “standard” mathematical objects; this includes what one might think of as “primitive” standard objects such as (standard) numbers and (standard) points, but also sets of standard objects (such as the set {{\bf R}} of real numbers, or the Euclidean space {{\bf R}^d}), or functions {f: X \rightarrow Y} from one standard space to another, or function spaces such as {L^p({\bf R}^d \rightarrow {\bf R})} of such functions (possibly quotiented out by almost everywhere equivalence), and so forth. In short, {{\mathfrak U}} should contain all the standard objects that one generally works with in analysis. One can require that this universe obey various axioms (e.g. the Zermelo-Fraenkel-Choice axioms of set theory), but we will not be particularly concerned with the precise properties of this universe (we won’t even need to know whether {{\mathfrak U}} is a set or a proper class).

What nonstandard analysis does is take this standard universe {{\mathfrak U}} of standard objects and embed it in a larger nonstandard universe {{}^* {\mathfrak U}} of nonstandard objects which has similar properties to the standard one, but also some additional properties. As discussed in this previous post, the relationship between the standard universe {{\mathfrak U}} and the nonstandard universe {{}^* {\mathfrak U}} is somewhat analogous to that between the rationals {{\bf Q}} and its metric completion {{\bf R}}; most of the algebraic properties of {{\bf Q}} carry over to {{\bf R}}, but {{\bf R}} also has some additional completeness and (local) compactness properties that {{\bf Q}} lacks. Also, one should think of {{}^* {\mathfrak U}} as being far “larger” than {{\mathfrak U}}, in much the same way that {{\bf R}} is larger than {{\bf Q}} in various senses, for instance in the sense of cardinality.

There is one important subtlety concerning the nonstandard universe {{}^* {\mathfrak U}}: it comes with a more restrictive notion of subset (or of function) than the “external” notion of subset or function that one has if one views {{}^* {\mathfrak U}} from some external metatheory (e.g., if one places both {{\mathfrak U}} and {{}^* {\mathfrak U}} inside a very large model of ZFC). Thus, for instance, an externally defined subset of the nonstandard reals {{}^* {\bf R}} may or may not be an internal subset of these reals (in particular, the embedded copy of the standard reals {{\bf R}} is not an internal subset of {{}^* {\bf R}}, being merely an external subset instead); similarly, an externally defined function from {{}^* {\bf R}} to {{}^* {\bf R}} need not be an internal function (for instance, the standard part function {\mathrm{st}} will be external rather than internal). The relationship between internal sets/functions and external sets/functions in nonstandard analysis is somewhat analogous to the relationship between measurable sets/functions and arbitrary sets/functions in measure theory.

The reals {{\bf R}} can be constructed from the rationals {{\bf Q}} in a number of ways, such as by forming Cauchy sequences in {{\bf Q}} and quotienting out by the sequences that converge to zero; similarly, the nonstandard universe {{}^* {\mathfrak U}} can be formed from the standard one {{\mathfrak U}} in a number of ways, such as by forming arbitrary sequences in {{\mathfrak U}} and quotienting out by a non-principal ultrafilter. See for instance this previous post for details. However, much as the precise construction of the reals {{\bf R}} is often of little direct importance in applications, we will not need to care too much about how the nonstandard universe is constructed. Rather, the following properties of this universe will be used:

  • (i) (Embedding) Every standard object, space, operation, or function {x} in {{\mathfrak U}} has a nonstandard counterpart {{}^* x} in {{}^* {\mathfrak U}}. For instance, if {x} is a real number in the set {{\bf R}} of standard reals, then {{}^* x} will be an element of the set {{}^* {\bf R}} of nonstandard reals; if {f: {\bf R}^d \rightarrow {\bf R}} is a standard function, then {{}^* f: {}^*({\bf R}^d) \rightarrow {}^* {\bf R}} is a nonstandard function from the nonstandard Euclidean space {{}^*({\bf R}^d)} to the nonstandard reals {{}^* {\bf R}}. The standard addition operation {+: {\bf R} \times {\bf R} \rightarrow {\bf R}} on the standard reals {{\bf R}} induces a nonstandard addition operation {{}^* +: {}^* {\bf R} \times {}^* {\bf R} \rightarrow {}^* {\bf R}} on the nonstandard reals, though to avoid notational clutter we will write {{}^* +} as {+}, and similarly for other basic mathematical operations. Similarly, the norm function {\|\|: L^p({\bf R}^d \rightarrow {\bf R}) \rightarrow [0,+\infty)} has a nonstandard counterpart {{}^* \| \|: {}^* L^p({\bf R}^d \rightarrow {\bf R}) \rightarrow {}^* [0,+\infty)} that assigns a nonstandard non-negative real {{}^* \| f \| \in {}^* [0,+\infty)} to any nonstandard {L^p({\bf R}^d \rightarrow {\bf R})} function {f \in {}^* L^p({\bf R}^d \rightarrow {\bf R})}. (To avoid notational clutter, we will often abuse notation by identifying {x} with {{}^* x} for various “primitive” mathematical objects {x} such as real numbers, arithmetic operations such as {+}, or functions such as {f: {\bf R}^d \rightarrow {\bf R}}, unless we have a pressing need to carefully distinguish a standard object {x} from its representative {{}^* x} in the nonstandard universe.)
  • (ii) (Transfer) If {P(x_1,\dots,x_k)} is a standard predicate in first order logic involving some finite number of standard objects {x_1,\dots,x_k} (with {k} a fixed standard natural number), and possibly some quantification over standard sets, and {{}^* P} is the nonstandard version of the predicate in which one quantifies over nonstandard sets, then {P(x_1,\dots,x_k)} is true if and only if {{}^* P( {}^* x_1, \dots, {}^* x_k)} is true. Important caveat: the predicate {P} needs to be internal to the mathematical language used internally to both {{\mathfrak U}} and {{}^* {\mathfrak U}} separately; it is not allowed to use external concepts dependent on the way in which {{\mathfrak U}} embeds into {{}^* {\mathfrak U}}, or how either universe embeds into an external metatheory.
  • (iii) ({\aleph_1}-saturation) Let {k,l} be standard natural numbers, and suppose that for each standard natural number {n}, {{}^* P_n(x_1,\dots,x_k,c_1,\dots,c_l)} is a nonstandard predicate on {k} nonstandard variables {x_1,\dots,x_k} and nonstandard constants {c_1,\dots,c_l}. If any finite collection of the predicates {{}^* P_n} are simultaneously satisfiable (thus, for each standard {N}, there exist nonstandard objects {x_1^{(N)},\dots,x_k^{(N)}} such that {{}^* P_n(x_1^{(N)},\dots,x_k^{(N)},c_1,\dots,c_l)} holds for all {1 \leq n \leq N}), then the entire collection {{}^* P_n} is simultaneously satisfiable (thus there exist nonstandard objects {x_1,\dots,x_k} such that {{}^* P_n(x_1,\dots,x_k,c_1,\dots,c_l)} holds for all {n \in {\bf N}}).

The {\aleph_1}-saturation property (also informally referred to as countable saturation, though this is technically a slight misnomer) resembles the finite intersection property that characterises compactness of topological spaces (and can thus be viewed as somewhat analogous to the local compactness property for the reals {{\bf R}}), except that the finite intersection property involves arbitrary families of (closed) sets, whereas the {\aleph_1}-saturation property requires the collection of predicates involved to be countable. It is possible to construct nonstandard models with a higher degree of saturation (where one can use more predicates {P_n}, as long as the total number does not exceed some cardinal {\kappa} which relates to the size of the nonstandard universe {{}^* {\mathfrak U}}), for instance by replacing the sequences used to construct the nonstandard universe with tuples ranging over a larger cardinality set. This may potentially be useful for certain types of analysis, for instance ones involving non-separable spaces, or Fréchet spaces involving an uncountable number of seminorms.

Let us take for granted the existence of a nonstandard universe obeying the embedding, transfer, and saturation properties, and see what we can do with them. Firstly, transfer shows that the map {x \mapsto {}^* x} is injective: {x = y} if and only if {{}^* x = {}^* y}. The field axioms of the standard reals {{\bf R}} can be phrased in the language of first-order logic, and hence by transfer the nonstandard reals {{}^* {\bf R}} also form a field. For instance, the assertion “For every non-zero standard real {x}, there exists a standard real {y} such that {xy=1}” transfers over to “For every non-zero nonstandard real {x}, there exists a nonstandard real {y} such that {xy=1}“. If {d} is a standard natural number, one can transfer the statement “{(x_1,\dots,x_d) = (y_1,\dots,y_d)} if and only if {x_1=y_1, \dots, x_d=y_d}” from standard tuples to nonstandard tuples; among other things, this gives the nice identification {{}^* ({\bf R}^d) = ({}^* {\bf R})^d} when {d} is a standard natural number. (The situation is more subtle when {d} is a nonstandard natural number, but in most PDE applications one works in a fixed dimension {d} and will not need to deal with this subtlety.) As one final example, “If {f \in L^p({\bf R}^d)}, then {\|f\|_{L^p({\bf R}^d \rightarrow {\bf R})}=0} holds if and only if {f=0}” transfers to “If {f \in {}^* L^p({\bf R}^d)}, then {{}^* \|f\|_{{}^* L^p({\bf R}^d \rightarrow {\bf R})}=0} holds if and only if {f=0}“. More generally, basic inequalities such as Hölder’s inequality, Sobolev embedding, or the Bernstein inequalities transfer over to the nonstandard setting without difficulty.

As a basic example of saturation, for each standard natural number {n} let {P_n} denote the statement “There exists a nonstandard real {x} such that {x>n}“. These statements are finitely satisfiable, hence by {\aleph_1}-saturation they are jointly satisfiable, thus there exists a nonstandard real {x} which is unbounded in the sense that it is larger than every standard natural number (and hence also than every standard real number, by the Archimedean property of the reals). Similarly, there exist nonstandard real numbers {x} which are non-zero but still infinitesimal in the sense that {|x| \leq \varepsilon} for every standard real {\varepsilon>0}.
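In the sequential dictionary below, the unbounded real produced by saturation corresponds to a sequence such as {x_n = n}, which eventually exceeds any given standard bound, and a non-zero infinitesimal corresponds to a sequence such as {y_n = 1/n}. The following sketch exhibits, for a few sample standard bounds {C} and tolerances {\varepsilon} (our own choices), an index beyond which the required inequality holds, and spot-checks it on a finite window. The helper names are hypothetical, and a finite check is of course only an illustration, not a proof.

```python
import math
from fractions import Fraction

# Sequential model (illustrative assumption): the unbounded nonstandard real is
# represented by x_n = n, and the non-zero infinitesimal by y_n = 1/n.


def unbounded_witness(C: Fraction) -> int:
    """An index N such that x_n = n > C for every n >= N."""
    return math.floor(C) + 1


def infinitesimal_witness(eps: Fraction) -> int:
    """An index N such that 0 < y_n = 1/n <= eps for every n >= N (needs eps > 0)."""
    return math.ceil(1 / eps)


def holds_on_window(pred, start: int, length: int = 1000) -> bool:
    """Spot-check pred(n) on the finite window start <= n < start + length."""
    return all(pred(n) for n in range(start, start + length))


for C in [Fraction(7, 2), Fraction(10), Fraction(10**6)]:
    N = unbounded_witness(C)
    print(f"x_n > {C} for n >= {N}:", holds_on_window(lambda n: n > C, N))

for eps in [Fraction(1, 2), Fraction(1, 10**6)]:
    N = infinitesimal_witness(eps)
    print(f"0 < y_n <= {eps} for n >= {N}:",
          holds_on_window(lambda n: 0 < Fraction(1, n) <= eps, N))
```

Exact `Fraction` arithmetic is used so that the comparisons with {C} and {\varepsilon} are not affected by floating-point rounding.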

On the other hand, one cannot apply the saturation property to the statements “There exists a nonstandard real {x} such that {x \in {\bf R}} and {x>n}“, since {{\bf R}} is not known to be an internal subset of the nonstandard universe {{}^* {\mathfrak U}} and so cannot be used as a constant for the purposes of saturation. (Indeed, since this sequence of statements is finitely satisfiable but not jointly satisfiable, this is a proof that {{\bf R}} is not an internal subset of {{}^* {\bf R}}, and must instead be viewed only as an external subset.)

Now we develop analogues of the sequential-based theory of limits in nonstandard analysis. The following dictionary may be helpful to keep in mind when comparing the two:

Nonstandard analysis | Sequential analysis
Standard real {x} | A real number {x}
Nonstandard real {x} | A sequence {x_n} of reals
Embedding {{}^* x} of standard real {x} | A constant sequence {x_n = x} of reals
Internal set {A} of nonstandard reals | A sequence {A_n} of subsets of reals
Embedding {{}^* A} of standard set {A} of reals | A constant sequence {A_n=A} of subsets of reals
External set {A \subset {}^* {\bf R}} | A collection of sequences of reals
Internal function {f: {}^* {\bf R}^d \rightarrow {}^* {\bf R}} | A sequence {f_n: {\bf R}^d \rightarrow {\bf R}} of functions
Embedding {{}^* f} of a standard function {f: {\bf R}^d \rightarrow {\bf R}} | A constant sequence {f_n = f} of functions
External function {f: {}^* {\bf R}^d \rightarrow {}^* {\bf R}} | A map from sequences of vectors to sequences of reals
Equality {x=y} of nonstandard reals | After passing to a subsequence, {x_n=y_n} for all {n}
{x = O(1)} | {x_n} is bounded
{x = o(1)} | {x_n} converges to zero (possibly after passing to subsequence)
{O({\bf R})} | Bounded sequences
{o({\bf R})} | Sequences converging to zero (possibly after passing to subsequence)
{{\bf R} \oplus o({\bf R})} | Convergent sequences (possibly after passing to subsequence)
{O({\bf R}) = {\bf R} \oplus o({\bf R})} | Bolzano-Weierstrass theorem
Standard part {\mathrm{st}(x)} of bounded real {x} | Limit {\lim_{n \rightarrow\infty} x_n} of bounded sequence {x_n} (possibly after passing to subsequence)

Note in particular that in the nonstandard analysis formalism there is no need to repeatedly pass to subsequences, as is often the case in sequential-based analysis.

A nonstandard real {x} is said to be bounded if one has {|x| \leq C} for some standard {C > 0}. In this case, we write {x=O(1)}, and let {O({\bf R})} denote the set of all bounded reals. It is an external subring of {{}^*{\bf R}} that in turn contains {{\bf R}} as an external subring.

A nonstandard real {x} is said to be infinitesimal if one has {|x| \leq \varepsilon} for all standard {\varepsilon>0}. In this case, we write {x=o(1)}, and let {o({\bf R})} denote the set of all infinitesimal reals. This is another external subring (in fact, an ideal) of {O({\bf R})}, and {O({\bf R}), o({\bf R}), {\bf R}} can be viewed as external vector spaces over {{\bf R}}.

The Bolzano-Weierstrass theorem is fundamental to orthodox real analysis. Its counterpart in nonstandard analysis is

Theorem 1 (Nonstandard version of Bolzano-Weierstrass) As external vector spaces over {{\bf R}}, we have the decomposition {O({\bf R}) = {\bf R} \oplus o({\bf R})}.

Proof: The only real which is simultaneously standard and infinitesimal is zero, so {{\bf R} \cap o({\bf R}) = \{0\}}. It thus suffices to show that every bounded real {x} can be written in the form {x = \alpha + o(1)} for some standard {\alpha}. But the set {\{ y \in {\bf R}: y \leq x \}} is a Dedekind cut; setting {\alpha \in{\bf R}} to be the supremum of this cut, we have {\alpha - 1/n \leq x \leq \alpha + 1/n} for all standard natural numbers {n}, hence {x=\alpha+o(1)} as desired. \Box
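On the sequential side of the dictionary, the standard part of a bounded real corresponds to the limit of a bounded sequence after passing to a subsequence, and the classical bisection proof of Bolzano-Weierstrass does exactly this: each step keeps a half-interval containing infinitely many terms, settling one more “ultrafilter” membership question lazily. The sketch below runs that bisection on a finite window of indices, using “some term from the tail of the current index list” as a stand-in for “infinitely many terms” (an assumption; the function name `bisect_limit_point` and the test sequence are our own illustrative choices).

```python
from typing import Callable, List, Tuple


def bisect_limit_point(x: Callable[[int], float], indices: List[int],
                       lo: float, hi: float,
                       steps: int) -> Tuple[float, float, List[int]]:
    """Bisection proof of Bolzano-Weierstrass, run on a finite window of indices.

    Assumes lo <= x(n) <= hi for all n in indices. Each step keeps the left half
    of [lo, hi] if it contains terms from the tail of the current index list
    (a proxy for 'infinitely many terms'), and otherwise the right half; the
    index list is cut down to the terms in the kept half, i.e. we pass to a
    subsequence.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        tail = indices[len(indices) // 2:]
        if any(lo <= x(n) <= mid for n in tail):
            indices = [n for n in indices if lo <= x(n) <= mid]
            hi = mid
        else:
            indices = [n for n in indices if mid < x(n) <= hi]
            lo = mid
    return lo, hi, indices


# A bounded sequence with two limit points, -1 and 1: x_n = (-1)^n (1 + 1/n).
x = lambda n: (-1) ** n * (1 + 1 / n)
lo, hi, sub = bisect_limit_point(x, list(range(1, 10_001)), -2.0, 2.0, steps=10)
print(f"limit point in [{lo}, {hi}], along the subsequence starting {sub[:3]}")
```

Here the procedure isolates the limit point {-1} along the odd indices because ties are broken in favour of the left half; breaking them the other way would isolate {1} instead, a small echo of the dependence of the standard part on the choice of ultrafilter. The number of steps is kept small so that the interval stays wider than the resolution of the finite window.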

If {x \in O({\bf R})} and {x = \alpha + o(1)} for some standard real {\alpha}, we call {\alpha} the standard part of {x} and denote it by {\mathrm{st}(x)}: thus {\mathrm{st}: O({\bf R}) \rightarrow {\bf R}} is the linear projection from {O({\bf R})} to {{\bf R}} with kernel {o({\bf R})}. It is an algebra homomorphism (this is the analogue of the usual limit laws in real analysis).
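The homomorphism property follows from Theorem 1 once one notes that {o({\bf R})} is an ideal of {O({\bf R})}: a bounded real times an infinitesimal, and an infinitesimal times an infinitesimal, are again infinitesimal. A short worked derivation for sums and products:

```latex
\begin{aligned}
x &= \alpha + \varepsilon, \quad y = \beta + \delta, \qquad \alpha,\beta \in {\bf R},\ \varepsilon,\delta \in o({\bf R}),\\
x + y &= (\alpha+\beta) + (\varepsilon+\delta), \qquad \varepsilon+\delta \in o({\bf R}),\\
xy &= \alpha\beta + (\alpha\delta + \beta\varepsilon + \varepsilon\delta), \qquad \alpha\delta + \beta\varepsilon + \varepsilon\delta \in o({\bf R}),\\
\Rightarrow\quad \mathrm{st}(x+y) &= \mathrm{st}(x) + \mathrm{st}(y), \qquad \mathrm{st}(xy) = \mathrm{st}(x)\,\mathrm{st}(y).
\end{aligned}
```

On the sequential side these are the familiar limit laws {\lim (x_n+y_n) = \lim x_n + \lim y_n} and {\lim x_n y_n = \lim x_n \lim y_n}.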

In real analysis, we know that continuous functions on a compact set that are pointwise bounded are automatically uniformly bounded. There is a handy analogue of this fact in nonstandard analysis:

Lemma 2 (Pointwise bounded/infinitesimal internal functions are uniformly bounded/infinitesimal) Let {f: A \rightarrow {}^*{\bf R}} be an internal function.

  • (i) If {f(x) = O(1)} for all {x \in A}, then there is a standard {C>0} such that {|f(x)| \leq C} for all {x \in A}.
  • (ii) If {f(x) = o(1)} for all {x \in A}, then there is an infinitesimal {\varepsilon>0} such that {|f(x)| \leq \varepsilon} for all {x \in A}.

Proof: Suppose (i) were not the case, then the predicates “{x \in A} and {|f(x)| \geq n}” would be finitely satisfiable, hence jointly satisfiable by {\aleph_1}-saturation. But then there would exist {x \in A} such that {|f(x)| \geq n} for all {n}, contradicting the hypothesis that {f(x) = O(1)}.

For (ii), observe that the predicates “{0 < \varepsilon \leq 1/n} is a nonstandard real such that {|f(x)| \leq \varepsilon} for all {x \in A}” are finitely satisfiable, hence jointly satisfiable by {\aleph_1}-saturation, giving the claim. \Box

Because the rationals are dense in the reals, we see (from saturation) that every standard real number can be expressed as the standard part of a bounded rational, thus {{\bf R} \equiv O({\bf Q})/o({\bf Q})}. This can in fact be viewed as a way to construct the reals; it is a minor variant of the standard construction of the reals as the space of Cauchy sequences of rationals, quotiented out by Cauchy equivalence.
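The quotient {O({\bf Q})/o({\bf Q})} has a familiar sequential shadow: two bounded sequences of rationals represent the same real when their difference tends to zero. The sketch below builds two different rational approximations of {\sqrt{2}}, Newton's iteration and decimal truncation via `math.isqrt`, and uses exact `Fraction` arithmetic to show their difference shrinking, so both would have the same standard part. The choice of {\sqrt{2}} and of these two schemes is purely illustrative.

```python
import math
from fractions import Fraction


def newton_sqrt2(k: int) -> Fraction:
    """k-th Newton iterate q -> (q + 2/q) / 2 for sqrt(2), starting from q = 1."""
    q = Fraction(1)
    for _ in range(k):
        q = (q + 2 / q) / 2
    return q


def truncated_sqrt2(k: int) -> Fraction:
    """sqrt(2) truncated to k decimal places: floor(sqrt(2) * 10^k) / 10^k."""
    scale = 10 ** k
    return Fraction(math.isqrt(2 * scale * scale), scale)


for k in range(1, 6):
    a, b = newton_sqrt2(k), truncated_sqrt2(k)
    # Both are bounded rational sequences whose difference tends to zero, so
    # they represent the same element of O(Q)/o(Q), namely sqrt(2).
    print(k, float(a), float(b), float(abs(a - b)))
```

Neither sequence ever reaches {\sqrt{2}}, which is not rational; only the class of bounded rationals modulo infinitesimal ones does.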

Closely related to Lemma 2 is the overspill (or underspill) principle:

Lemma 3 Let {P(x)} be an internal predicate of a nonnegative nonstandard real number {x}.

  • (i) (Overspill) If {P(x)} is true for arbitrarily large standard {x}, then it is also true for at least one unbounded {x}.
  • (ii) (Underspill) If {P(x)} is true for arbitrarily small standard {x>0}, then it is also true for at least one infinitesimal {x}.

Proof: To prove (i), observe that the predicates “{P(x)} and {x \geq n}” for {n=1,2,\dots} a standard natural number are finitely satisfiable, hence jointly satisfiable by {\aleph_1}-saturation, and the claim follows. The claim (ii) is proven similarly, using {0 < x < 1/n} instead of {x \geq n}. \Box

Corollary 4 Let {f: {}^* {\bf R} \rightarrow {}^* {\bf R}} be an internal function. If {f(x) = o(1)} for all standard {x > 0}, then one has {f(x) = o(1)} for at least one unbounded {x>0}.

Proof: Apply Lemma 3 to the predicate {|f(x)| \leq 1/x}. \Box
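To spell out this one-line proof: the predicate {P(x)} given by “{|f(x)| \leq 1/x}” is internal because {f} is. For each standard {x>0}, the hypothesis gives {|f(x)| = o(1) \leq 1/x}, so {P(x)} holds for arbitrarily large standard {x}. Overspill (Lemma 3(i)) then produces an unbounded {x} with

\displaystyle  |f(x)| \leq \frac{1}{x} = o(1),

which is the claim.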

The overspill principle and its analogues correspond, roughly speaking, to the “diagonalisation” arguments that are common in sequential analysis, for instance in the proof of the Arzelà-Ascoli theorem.

— 2. Some nonstandard functional analysis —

In the discussion of the previous section, the real numbers {{\bf R}} could be replaced by the complex numbers {{\bf C}} or finite-dimensional vector spaces {{\bf R}^d} (with {d} a standard natural number) with essentially no change in theory. However, the situation becomes a bit more subtle when one works with infinite-dimensional spaces, such as the function spaces that are commonly used in PDE.

Let {X} be a standard normed vector space with norm {\| \|_X}. Then we can form the nonstandard space {{}^* X} with a nonstandard (or internal) norm {{}^* \| \|_{{}^* X}: {}^* X \rightarrow {}^*[0,+\infty)}. This is not quite a normed vector space when viewed externally, because the nonstandard norm {{}^* \| \|_{{}^* X}} takes values in the nonstandard nonnegative reals {{}^*[0,+\infty)} rather than the standard nonnegative reals {[0,+\infty)}. However, we can form the subspace {O(X)} of {{}^* X} consisting of those vectors {x \in {}^* X} which are strongly bounded in the sense that {{}^* \|x\|_{{}^* X} = O(1)}. This is an external real subspace of {{}^* X} that contains {X}. It comes with a seminorm {\| \|_X: O(X) \rightarrow [0,+\infty)} defined by

\displaystyle  \| x \|_X := \mathrm{st} {}^* \| x\|_{{}^* X}.

It is easy to see that this is a seminorm. The null space of this seminorm is the subspace {o(X)} of {{}^* X} consisting of those vectors {x \in {}^* X} which are strongly infinitesimal (in {X}) in the sense that {{}^* \|x\|_{{}^* X} = o(1)}; we say two elements of {{}^* X} are strongly equivalent (in {X}) if their difference is strongly infinitesimal. In finite dimensions, the nonstandard Bolzano-Weierstrass theorem gives the equality {O(X) = X \oplus o(X)}; in infinite dimensions, {X} is no longer locally compact, and we only have the inclusion:

\displaystyle  X \oplus o(X) \subset O(X).

In general we expect {O(X)} to be significantly larger than {X} (this is the nonstandard analogue of the sequential analysis phenomenon that most bounded sequences in {X} will fail to have convergent subsequences). For instance, if {H} is a standard Hilbert space with an orthonormal system {e_n, n \in {\bf N}}, and {N} is an unbounded natural number, one can check that {e_N} lies in {O(H)} but is not in {H \oplus o(H)}. The quotient space {O(X)/o(X)} is a normed vector space that contains {X} as an isometric subspace and is known as the nonstandard hull of {X}, but we will not explicitly use this space much in these notes.
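Here is one way to carry out the check for {e_N}, sketched for convenience. If {g \in H} is standard, then Bessel's inequality {\sum_n |\langle e_n, g\rangle|^2 \leq \|g\|_H^2} shows that {\langle e_n, g \rangle \rightarrow 0} as {n \rightarrow \infty}, so by transfer {\langle e_N, g\rangle = o(1)} for unbounded {N}. Hence

\displaystyle  {}^* \|e_N - g\|_{{}^* H}^2 = 1 - 2 \mathrm{Re} \langle e_N, g \rangle + \|g\|_H^2 = 1 + \|g\|_H^2 + o(1) \geq 1 + o(1),

so {e_N - g} is never strongly infinitesimal. Meanwhile, {{}^* \|e_N\|_{{}^* H} = 1} places {e_N} in {O(H)}.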

In functional analysis one often has an embedding {X \subset Y} of standard function spaces {X,Y}, with an inequality of the form {\| f \|_Y \leq C \|f\|_X} for all {f \in X} and some constant {C>0}. For instance, one has the Sobolev embeddings {H^1({\bf R}^d \rightarrow {\bf R}) \subset L^p({\bf R}^d \rightarrow {\bf R})} whenever {2 \leq p \leq \infty}, {\frac{1}{p} \geq \frac{1}{2}-\frac{1}{d}}, and {(d,p) \neq (2,\infty)}. One easily sees that such an embedding {X \subset Y} induces also embeddings {O(X) \subset O(Y)} and {o(X) \subset o(Y)}.
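The conditions on {p} in the Sobolev example can be motivated by a scaling heuristic. This is a standard necessary-condition check, not a proof of the embedding. Writing {f_\lambda(x) := f(\lambda x)} for {\lambda>0}, one has

\displaystyle  \|f_\lambda\|_{L^p} = \lambda^{-d/p} \|f\|_{L^p}, \quad \|f_\lambda\|_{L^2} = \lambda^{-d/2} \|f\|_{L^2}, \quad \|\nabla f_\lambda\|_{L^2} = \lambda^{1-d/2} \|\nabla f\|_{L^2}.

An inequality {\|f_\lambda\|_{L^p} \leq C (\|f_\lambda\|_{L^2} + \|\nabla f_\lambda\|_{L^2})} uniform in {\lambda} therefore forces {-d/p \geq -d/2} (letting {\lambda \rightarrow 0}) and {-d/p \leq 1 - d/2} (letting {\lambda \rightarrow \infty}), that is,

\displaystyle  p \geq 2, \quad \frac{1}{p} \geq \frac{1}{2} - \frac{1}{d}.

The excluded endpoint {(d,p) = (2,\infty)} is not detected by scaling. It fails because {H^1({\bf R}^2)} contains unbounded functions, such as a suitably cut-off {\log \log (1/|x|)}.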

If the embedding {X \subset Y} is compact – so that bounded subsets in {X} are precompact in {Y} – then we can partially recover the missing inclusion in the Bolzano-Weierstrass theorem. This follows from

Lemma 5 (Compactness) Let {K} be a standard compact subset (in the strong topology) of a standard normed vector space {X}. Then one has the inclusion

\displaystyle  {}^* K \subset K \oplus o(X),

that is to say every {f \in {}^* K} can be decomposed (uniquely) as {f = g+h} with {g \in K} and {h \in o(X)}.

This is the nonstandard analysis analogue of the assertion that compact subsets of a normed vector space are sequentially compact.

Proof: Uniqueness is clear (since non-zero standard elements of {X} have non-zero standard, hence non-infinitesimal, nonstandard norm), so we turn to existence. If existence failed for some {f \in {}^* K}, then for every {g \in K} there would exist a standard {\varepsilon_g>0} such that {{}^* \|f - g \|_{{}^* X} \geq \varepsilon_g}. By compactness of {K}, one can then find a standard natural number {k} and standard {g_1,\dots,g_k \in K} such that for all {F \in K}, one has {\|F-g_i \|_X < \varepsilon_{g_i}} for some {i=1,\dots,k}. By transfer (viewing {g_1,\dots,g_k} and {\varepsilon_{g_1},\dots,\varepsilon_{g_k}} as constants), this implies that for all {F \in {}^* K}, one has {{}^* \|F-g_i \|_{{}^* X} < \varepsilon_{g_i}} for some {i=1,\dots,k}. Applying this with {F=f}, we obtain a contradiction. \Box

We also note an easy converse inclusion: if {U} is a standard open subset of a standard normed vector space {X}, then

\displaystyle  U \oplus o(X) \subset {}^* U.
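This inclusion is a direct consequence of transfer. If {g \in U}, then since {U} is open there is a standard {r>0} with {B(g,r) \subset U}, and transfer gives {{}^* B(g,r) \subset {}^* U}. If {h \in o(X)}, then {{}^* \|h\|_{{}^* X} < r}, so

\displaystyle  g + h \in {}^* B(g,r) \subset {}^* U.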

Exercise 6 Suppose that the standard normed vector space {X} is separable. Establish the converse implications, that a standard subset {K} of {X} is compact whenever {{}^* K \subset K \oplus o(X)}, and a standard subset {U} is open whenever {U \oplus o(X) \subset {}^* U}. (The hypothesis of separability can be relaxed if one imposes stronger saturation properties on the nonstandard universe than {\aleph_1}-saturation.)

Exercise 7 Let {f: K \rightarrow {\bf R}} be a standard function on a standard subset {K} of a normed vector space {X}, and let {{}^* f: {}^* K \rightarrow {}^* {\bf R}} be the nonstandard counterpart.

  • (i) Show that {f} is bounded if and only if {{}^* f(x) = O(1)} for all {x \in {}^* K}.
  • (ii) Show that {f} is continuous (in the strong topology) if and only if {{}^* f(y) = f(x) + o(1)} whenever {x \in K}, {y \in {}^* K} are such that {y} is strongly equivalent to {x}.
  • (iii) Show that {f} is uniformly continuous (in the strong topology) if and only if {{}^* f(y) = {}^* f(x) + o(1)} whenever {x, y \in {}^* K} are such that {y} is strongly equivalent to {x}.
  • (iv) If {K} is compact, show that {K = \mathrm{st}({}^* K)}. Conclude the well-known fact that a standard continuous function on a compact set {K} is uniformly continuous and bounded.

Now we have

Theorem 8 (Compact embeddings) If {X \subset Y} is a compact embedding of standard normed vector spaces {X,Y}, then

\displaystyle  O(X) \subset Y \oplus o(Y).

If furthermore the closed unit ball of {X} is compact (not just precompact) in {Y}, we can sharpen this to

\displaystyle  O(X) \subset X \oplus o(Y).

This is the analogue of Proposition 3 of Notes 2.

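Proof: Since {X \cap o(Y) \subset Y \cap o(Y) = \{0\}}, decompositions {f = g + h} with {g \in Y} and {h \in o(Y)} are unique. It therefore suffices to show that every {f \in O(X)} admits such a decomposition, with {g \in X} in the case of the second claim. Let {C} be a standard constant with {{}^* \|f\|_{{}^* X} \leq C}, and let {B} be the closed unit ball of {X}, so that {f \in {}^*(CB)}. The claim is then immediate from Lemma 5 applied in {Y} to the compact set {K} given by the closure of {CB} in {Y}; when {B} is itself compact in {Y}, this closure is just {CB \subset X}. \Box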

From the above theorem and the Arzelà-Ascoli theorem, we see, for instance, that a nonstandard Lipschitz function from {[0,1]} to {{\bf R}} with bounded nonstandard Lipschitz norm can be expressed as the sum of a standard Lipschitz function and a function which is infinitesimal in the uniform norm. Here is a more general version of this latter assertion:

Exercise 9 (Nonstandard version of Arzelà-Ascoli) Let {U \subset {\bf R}^d} be a standard open set, and let {f: {}^* U \rightarrow {}^* {\bf R}} be a nonstandard function obeying the following axioms:

  • (i) (Pointwise boundedness) For all {x \in U}, we have {f(x) = O(1)}.
  • (ii) (Pointwise equicontinuity) For all {x \in U}, we have {f(y) = f(x) + o(1)} whenever {y \in {}^* U} and {y = x + o(1)}.

Let {\mathrm{st} f: U \rightarrow {\bf R}} denote the function

\displaystyle  (\mathrm{st} f)(x) := \mathrm{st}(f(x))

(this is well-defined by pointwise boundedness). Then {\mathrm{st} f: U \rightarrow {\bf R}} is continuous, and its nonstandard representative {{}^* (\mathrm{st}(f)): {}^* U \rightarrow {}^* {\bf R}} is locally infinitesimally close to {f} in the sense that

\displaystyle  f(x) = ({}^* (\mathrm{st} f))(x) + o(1)

whenever {x \in U + o(1)}. Conclude, in particular, that for every standard compact {K \subset U} there exists an infinitesimal {\varepsilon_K = o(1)} such that

\displaystyle  |f(x) - ({}^* (\mathrm{st} f))(x)| \leq \varepsilon_K

for all {x \in {}^* K}.

We remark that the above discussion for normed vector spaces also extends without difficulty to Fréchet spaces {X} whose topology is given by at most countably many seminorms {\| \|_\alpha, \alpha \in A}. In that case, {O(X)} consists of those {f \in {}^* X} with {{}^* \|f\|_\alpha = O(1)} for all {\alpha \in A}, and {o(X)} consists of those {f \in {}^* X} with {{}^* \|f\|_\alpha = o(1)} for all {\alpha \in A}.

Now suppose we work with the dual space {X^*} of a normed vector space {X}. (Here, unfortunately, we have a clash of notation, as the asterisk will now be used both to denote nonstandard representative and dual; hopefully the mathematics will still be unambiguous.) A nonstandard element {\lambda} of {{}^*(X^*)} (thus, a nonstandardly continuous linear functional {\lambda: {}^* X \rightarrow {}^* {\bf R}}) is said to be weak*-bounded if {\lambda(f) = O(1)} for all {f \in X}, and weak*-infinitesimal if {\lambda(f) = o(1)} for all {f \in X}. The space of weak*-bounded elements of {{}^*(X^*)} will be denoted {O_{w^*}(X^*)}, and the space of weak*-infinitesimal elements will be denoted {o_{w^*}(X^*)}. These are related to the strong counterparts {O(X^*), o(X^*)} of these spaces by the inclusions

\displaystyle  O(X^*) \subset O_{w^*}(X^*); \quad o(X^*) \subset o_{w^*}(X^*);

we also have the inclusion

\displaystyle  X^* \oplus o_{w^*}(X^*) \subset O_{w^*}(X^*).

For instance, if {H = H^*} is a standard Hilbert space with orthonormal system {e_n, n \in {\bf N}}, and {N} is an unbounded natural number, then {N e_N} lies in {O_{w^*}(H)} but not in {O(H)}, {o(H)}, or {o_{w^*}(H)}, while {N^{1/2} e_N} lies in {o_{w^*}(H)} (and hence also in {O_{w^*}(H)}) but not in {O(H)} or {o(H)}.

(One could also develop similar notations in which one uses weak topologies instead of the weak* topology, but we will not need the weak topology in these notes.)

We have the following nonstandard version of the Banach-Alaoglu theorem:

Theorem 10 (Nonstandard version of Banach-Alaoglu) If {X} is a normed vector space with dual {X^*}, then we have the inclusion

\displaystyle  O(X^*) \subset X^* \oplus o_{w^*}(X^*).

Proof: Since {X^* \cap o_{w^*}(X^*) = \{0\}}, it suffices to show that every {\lambda \in O(X^*)} can be decomposed as {\lambda = \xi + \eta} for some {\xi \in X^*} and {\eta \in o_{w^*}(X^*)}.

Let {\lambda \in O(X^*)}; thus there is a standard {C} such that {|\lambda(x)| \leq C {}^* \|x\|_{{}^* X}} for all {x \in {}^* X}. In particular, if we define {\xi(f) := \mathrm{st} \lambda(f)} for {f \in X}, then {\xi} depends linearly on {f} and {|\xi(f)| \leq C \|f\|_X} for all {f \in X}; thus {\xi} is an element of {X^*}. Setting {\eta := \lambda - \xi}, we see from the construction that {\eta(f) = o(1)} for all {f \in X}, giving the claim. \Box

Theorem 10 can be compared with Proposition 2 of Notes 2, except in this nonstandard analysis setting, no separability is required on the predual space {X}.

If {T: X \rightarrow Y} is a standard bounded linear operator between normed vector spaces, then one has a nonstandard linear operator {{}^* T: {}^* X \rightarrow {}^* Y}. It is easy to see that this operator maps {O(X)} to {O(Y)} and {o(X)} to {o(Y)}. The adjoint operator {T^*: Y^* \rightarrow X^*} similarly maps {O(Y^*)} to {O(X^*)} and {o(Y^*)} to {o(X^*)}, but also takes {O_{w^*}(Y^*)} to {O_{w^*}(X^*)} and {o_{w^*}(Y^*)} to {o_{w^*}(X^*)}.

A (nonstandard) linear operator {P: O(X^*) \rightarrow O(X^*)} will be said to be an approximate identity on {O(X^*)} if {I-P} maps {O(X^*)} to {o_{w^*}(X^*)}. Here are two basic and useful examples of such approximate identities:

Exercise 11 (Frequency and spatial localisation as approximate identities)

  • (i) For standard {1 < p \leq \infty} and unbounded {N > 0}, show that the nonstandard Littlewood-Paley projection {P_{\leq N}} is an approximate identity on {O(L^p({\bf R}^d))}, and more generally on the Sobolev spaces {O(W^{p,k}({\bf R}^d))} for any standard natural number {k}.
  • (ii) For standard {1 < p < \infty}, unbounded {R>0}, and a standard test function {\psi \in C^\infty_c({\bf R}^d)} that equals {1} near the origin, show that the nonstandard spatial truncation operator {f(\cdot) \mapsto \psi(\cdot/R) f(\cdot)} is an approximate identity on {O(L^p({\bf R}^d))}, and more generally on the Sobolev spaces {O(W^{p,k}({\bf R}^d))} for any standard natural number {k}. What happens at {p=\infty}?
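The mechanism behind part (i) is duality. For a real, symmetric Fourier multiplier one has {\langle (I - P_{\leq N}) f, g \rangle = \langle f, (I-P_{\leq N}) g \rangle}, and the right-hand side is small for a fixed smooth {g}, however rough the bounded {f} is. The following numpy sketch illustrates this in a simplified model adopted purely for convenience. It works on the periodic torus instead of {{\bf R}^d}, uses a sharp Fourier cutoff instead of a smooth Littlewood-Paley multiplier, and looks at the {p=\infty} pairing with an {L^1} tail. The helper names are hypothetical.

```python
import numpy as np

def sharp_low_pass(samples: np.ndarray, n_cut: int) -> np.ndarray:
    """Keep the Fourier modes |k| <= n_cut of a periodic sample (sharp cutoff, not a smooth LP multiplier)."""
    m = samples.size
    k = np.fft.fftfreq(m, d=1.0 / m)            # integer frequencies -m/2, ..., m/2 - 1
    return np.fft.ifft(np.fft.fft(samples) * (np.abs(k) <= n_cut)).real

M = 512
x = 2 * np.pi * np.arange(M) / M
g = np.exp(np.cos(x - 0.5))     # fixed smooth test function on the torus
f = np.sign(np.sin(x))          # rough function, bounded by 1 in sup norm

tails, pairings = {}, {}
for n in (1, 2, 4, 8):
    g_tail = g - sharp_low_pass(g, n)
    tails[n] = float(np.mean(np.abs(g_tail)))       # normalised L^1 norm of (I - P)g
    # normalised pairing <(I - P)f, g>, which equals <f, (I - P)g> by Parseval
    pairings[n] = float(np.mean((f - sharp_low_pass(f, n)) * g))
    # duality bound: |<(I - P)f, g>| <= sup|f| * ||(I - P)g||_1 <= tails[n]
    print(n, tails[n], pairings[n])
```

The tail of the smooth {g} decays very rapidly in {n}. This is what drives the weak* smallness of {(I-P_{\leq N}) f} once {N} is unbounded.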

— 3. Nonstandard analysis and distributions —

Let {U \subset {\bf R}^d} be a standard open set. The dual of the standard space {C^\infty_c(U \rightarrow {\bf R})} of test functions on {U} is the standard space {C^\infty_c(U \rightarrow {\bf R})^*} of distributions. Like any other standard space, it has a nonstandard counterpart {{}^*(C^\infty_c(U \rightarrow {\bf R})^*)}, whose elements are the nonstandard distributions.

A nonstandard distribution {\lambda \in {}^*(C^\infty_c(U \rightarrow {\bf R})^*)} will be said to be weakly bounded if, for any standard compact set {K \subset U}, there is a standard natural number {k} and standard {C>0} such that one has the bound

\displaystyle  |\lambda(f)| \leq C \| f \|_{C^k(K \rightarrow {\bf R})}

for all standard {f\in C^k(K \rightarrow {\bf R})}. (It would be slightly more accurate to use the terminology “weak-* bounded” instead of “weakly bounded”, but we will omit the asterisk here to make the notation a little less clunky. Similarly for the related concepts below.) Thus, for instance, any standard distribution is weakly bounded, and if {X} is any normed vector space structure on {C^\infty_c(U \rightarrow {\bf R})} that is continuous in the test function topology, and {\lambda \in O(X^*)}, then {\lambda} will be weakly bounded. We say that a nonstandard distribution {\lambda} is weakly infinitesimal if one has {\lambda(f) = o(1)} for all standard {f \in C^\infty_c(U \rightarrow {\bf R})}. For instance, if {X} is any normed vector space structure on {C^\infty_c(U \rightarrow {\bf R})} and {\lambda \in o(X^*)}, then {\lambda} will be weakly infinitesimal. We say that two nonstandard distributions {\lambda, \lambda'} are weakly equivalent, and write {\lambda \sim \lambda'}, if they differ by a weakly infinitesimal distribution, thus

\displaystyle  \lambda(f) = \lambda'(f) + o(1)

for all standard test functions {f \in C^\infty_c(U \rightarrow {\bf R})}.

If {\lambda} is a weakly bounded distribution, we can define the weakly standard part {\mathrm{st}_w(\lambda) \in C^\infty_c(U \rightarrow {\bf R})^*} to be the distribution defined by the formula

\displaystyle  (\mathrm{st}_w(\lambda))(f) := \mathrm{st}(\lambda(f)).

This is the unique standard distribution that is weakly equivalent to {\lambda}. As such, it must be compatible with the other decompositions of the preceding section. For instance, if {X} is a normed vector space structure on {C^\infty_c(U \rightarrow {\bf R})} and {\lambda \in O(X^*)}, then the decomposition {\lambda = \mathrm{st}_w(\lambda) + (\lambda - \mathrm{st}_w(\lambda) )} must agree with the one in Theorem 10, thus {\mathrm{st}_w(\lambda) \in X^*} and {\lambda - \mathrm{st}_w(\lambda) = o_{w^*}(X^*)}. In particular, if {X^*} embeds compactly into a normed vector space {Y}, we also have {\lambda - \mathrm{st}_w(\lambda) = o(Y)}, thus weak equivalence in {O(X^*)} implies strong equivalence in {O(Y)}. By the definition of the dual norm we also conclude the Fatou-type inequality

\displaystyle  \| \mathrm{st}_w(\lambda) \|_{X^*} \leq \mathrm{st} \| \lambda \|_{{}^* X^*} \ \ \ \ \ (1)

whenever {\lambda \in O(X^*)}.

Informally, {\mathrm{st}_w(\lambda)} represents the portion of {\lambda} that one can “observe” at standard physical scales and standard frequency scales, ignoring all components of {\lambda} that are at unbounded or infinitesimal physical or frequency scales. The following examples may help illustrate this point:

Example 12 Let {N} be an unbounded natural number, and on {{\bf R}} let {\lambda} be the nonstandard distribution {\lambda(x) := \sin(Nx)}. Then {\lambda} is weakly infinitesimal, so {\mathrm{st}_w(\lambda) = 0}. This is in contrast to the pointwise standard part {\mathrm{st} \lambda(x) = \mathrm{st}(\sin(Nx))}, which is a rather wild (and almost certainly non-measurable) function from {{\bf R}} to {[-1,1]}. The nonstandard distribution {N \sin(Nx)} is not even pointwise bounded in general, so it does not have a pointwise standard part, but it is still weakly infinitesimal, and so its weakly standard part also vanishes. The same is true if one replaces {\sin(Nx)} by {\psi(x-N)} for some standard bump function {\psi \in C^\infty_c({\bf R})}.
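To verify the claim about {N \sin(Nx)}, integrate by parts twice against a standard test function {f \in C^\infty_c({\bf R})}; the boundary terms vanish by compact support:

\displaystyle  \int_{{}^* {\bf R}} N \sin(Nx) f(x)\ dx = \int_{{}^* {\bf R}} \cos(Nx) f'(x)\ dx = -\frac{1}{N} \int_{{}^* {\bf R}} \sin(Nx) f''(x)\ dx = O(1/N) = o(1).

The same computation with only one integration by parts shows that {\sin(Nx)} itself pairs with {f} to give {O(1/N)}.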

The following table gives the analogy between these nonstandard analysis concepts, and the more familiar ones from sequential weak compactness theory:

Nonstandard analysis | Sequential analysis
Standard distribution {\lambda} | Distribution {\lambda}
Nonstandard distribution {\lambda} | Sequence of distributions {\lambda_n}
Embedding {{}^*\lambda} of standard distribution {\lambda} | Constant sequence {\lambda_n = \lambda}
{\lambda = O(X)} | {\lambda_n} is bounded in {X} norm
{\lambda = o(X)} | {\lambda_n} converges strongly to zero in {X}
{\lambda \in X +o(X)} | {\lambda_n} converges in {X} norm (possibly after passing to a subsequence)
{\lambda = O_{w^*}(X^*)} | {\lambda_n} is weak-* bounded in {X^*}
{\lambda = o_{w^*}(X^*)} | {\lambda_n} converges weak-* to zero in {X^*}
{\lambda} is weakly infinitesimal | {\lambda_n} converges to zero in distribution
{\mathrm{st}_w(\lambda)} | Weak limit of {\lambda_n} (possibly after passing to a subsequence)
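To see the “converges to zero in distribution” row in action on the sequential side, the following Python sketch (an illustration added here, not taken from the notes) pairs {\sin(nx)} and {n \sin(nx)} against the shifted Gaussian {e^{-(x-1)^2}}. The Gaussian is a Schwartz function standing in for a test function. This is an assumption made so that the exact value {\int \sin(nx) e^{-(x-1)^2}\ dx = \sqrt{\pi} e^{-n^2/4} \sin(n)} is available for comparison. Both pairings tend to zero as {n \rightarrow \infty}, even though {\sup_x |n \sin(nx)| = n} grows; this is the sequential shadow of Example 12. The function names are hypothetical.

```python
import math

def simpson(func, a: float, b: float, n: int = 4000) -> float:
    """Composite Simpson rule for the integral of func over [a, b]."""
    if n % 2:
        n += 1                      # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def pairing(freq: int, amplitude: float = 1.0, shift: float = 1.0) -> float:
    """Numerically pair amplitude*sin(freq*x) with the Gaussian exp(-(x - shift)^2)."""
    def integrand(x: float) -> float:
        return amplitude * math.sin(freq * x) * math.exp(-(x - shift) ** 2)
    # the Gaussian is below exp(-144) outside [shift - 12, shift + 12]
    return simpson(integrand, shift - 12.0, shift + 12.0)

def exact(freq: int, amplitude: float = 1.0, shift: float = 1.0) -> float:
    """Closed form: amplitude * sqrt(pi) * exp(-freq^2/4) * sin(freq*shift)."""
    return amplitude * math.sqrt(math.pi) * math.exp(-freq ** 2 / 4) * math.sin(freq * shift)

for n in (1, 2, 4, 8, 16):
    # sup|n sin(nx)| = n grows, yet both pairings tend to zero
    print(n, pairing(n), pairing(n, amplitude=n), exact(n, amplitude=n))
```

With a compactly supported bump in place of the Gaussian, the decay in {n} would be slower but still faster than any power of {n}, in line with the integration-by-parts argument for Example 12.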

Exercise 13 Let {f \in O( L^2( U \rightarrow {\bf R}) )}. Establish the Pythagorean identity

\displaystyle  \|f\|_{L^2(U \rightarrow {\bf R})}^2 = \| \mathrm{st}_w(f) \|_{L^2(U \rightarrow {\bf R})}^2 + \| f - \mathrm{st}_w(f) \|_{L^2(U \rightarrow {\bf R})}^2.

This identity can be used as a starting point for the theory of concentration compactness, as discussed in these notes. What happens in other {L^p} spaces?
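One possible route to the identity is the following expansion, sketched under the convention {\|F\|_X := \mathrm{st} {}^* \|F\|_{{}^* X}} on {O(X)} from Section 2. Write {g} for the standard part of {f} appearing in the identity (that is, its weakly standard part), which lies in {L^2} by Theorem 10. Then

\displaystyle  {}^* \|f\|_{{}^* L^2}^2 = \|g\|_{L^2}^2 + {}^* \|f - g\|_{{}^* L^2}^2 + 2 \langle f - g, g \rangle_{{}^* L^2}.

The difference {f - g} is weakly infinitesimal and bounded in {L^2}, and {g} can be approximated in {L^2} to within any standard {\varepsilon>0} by a standard test function. Hence the cross term is {O(\varepsilon) + o(1)} for every standard {\varepsilon}, that is, {o(1)}. Taking standard parts gives the identity.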

Exercise 14 Show that every standard distribution {\lambda \in C^\infty_c(U \rightarrow {\bf R})^*} is the weakly standard part of some weakly bounded nonstandard test function {f \in {}^* C^\infty_c(U \rightarrow {\bf R})}. (Hint: When {U} is {{\bf R}^d}, one can convolve with a nonstandard approximation to the identity and also apply a nonstandard spatial cutoff. When {U} is a proper subset of {{\bf R}^d}, one also has to smoothly cut off outside an infinitesimal neighbourhood of the boundary of {U} if one wants to make the convolution well defined.) This result can be viewed as analogous to the previous observation that every standard real is the standard part of a bounded rational. This also provides an alternate way to construct distributions, as weakly bounded nonstandard test functions up to weak equivalence.

Exercise 15 (Arzelà-Ascoli, nonstandard distributional form) Let {f \in {}^* C(U \rightarrow {\bf R})} be a nonstandard continuous function obeying the pointwise boundedness and pointwise equicontinuity axioms from Exercise 9. Show that {f} is also a weakly bounded distribution and that the weakly standard part of {f} agrees with the pointwise standard part: {\mathrm{st}_w f = \mathrm{st} f}. In particular, if {f} is also weakly infinitesimal, conclude that {\| f \|_{{}^* C(K)} = o(1)} for every standard compact set {K \subset U}.

— 4. Leray-Hopf solutions to Navier-Stokes —

For {u_0 \in L^2({\bf R}^d \rightarrow {\bf R}^d)} that is divergence-free, recall that a Leray-Hopf solution to the Navier-Stokes equations on {[0,+\infty) \times {\bf R}^d} is a distribution {u \in L^\infty_t L^2_x( {\bf R} \times {\bf R}^d \rightarrow {\bf R}^d )} vanishing outside of {[0,+\infty) \times {\bf R}^d} that has the additional regularity

\displaystyle  \nabla u \in L^2_t L^2_x({\bf R} \times {\bf R}^d \rightarrow {\bf R}^{d^2})

obeying the energy inequality

\displaystyle  \frac{1}{2} \int_{{\bf R}^d} |u(T,x)|^2\ dx + \nu \int_0^T \int_{{\bf R}^d} |\nabla u(t,x)|^2\ dx dt \leq \frac{1}{2} \int_{{\bf R}^d} |u_0(x)|^2\ dx \ \ \ \ \ (2)

for almost all {T \in [0,+\infty)}, and solves the equation

\displaystyle  \partial_t u + \partial_j \mathbb{P} (u_j u) = \nu \Delta u + \delta(t) u_0(x) \ \ \ \ \ (3)

in the sense of distributions. We now give a nonstandard interpretation of this concept:

Proposition 16 (Nonstandard interpretation of Leray-Hopf solution) Let {u_0 \in L^2({\bf R}^d \rightarrow {\bf R}^d)} be divergence-free. Let {u: {}^* ([0,+\infty) \times {\bf R}^d) \rightarrow {}^* {\bf R}^d} be a nonstandard smooth function obeying the following properties:

Then after extending {u} by zero to {{}^*({\bf R} \times {\bf R}^d)}, {u} is weakly bounded and {\mathrm{st}_w u} is a Leray-Hopf solution to Navier-Stokes. Conversely, every such Leray-Hopf solution arises in this fashion.

Thus, roughly speaking, Leray-Hopf solutions arise from nonstandard strong solutions to Navier-Stokes in which one permits weakly infinitesimal changes to the initial data and forcing term, which are arbitrary save for the constraint that these changes do not add more than an infinitesimal amount of energy to the system, and which also obey a technical but weak condition on the time derivative. Thus, for instance, one can insert a nonstandard frequency mollification at an unbounded frequency cutoff {N}, or a spatial truncation at an unbounded spatial scale {R}, without difficulty (so long as one checks that such modifications do not introduce more than an infinitesimal amount of energy into the system), which can be used to recover the standard construction of Leray-Hopf solutions.

Proof: First suppose that {u} obeys all the axioms (i)-(iv). We now repeat the arguments used to prove Theorem 14 of Notes 2, but translated into the language of nonstandard analysis. (All the key inputs are still basically the same; the changes are almost entirely in the surrounding formalism.)

From (i) we see that

\displaystyle  u \in O( L^\infty_t L^2_x([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^d))

and

\displaystyle  \nabla u \in O( L^2_t L^2_x([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^{d^2}))

which implies that

\displaystyle  \mathrm{st}_w u \in L^\infty_t L^2_x([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^d)

and

\displaystyle  \nabla \mathrm{st}_w u \in L^2_t L^2_x([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^{d^2}).

Also, for every standard {0 < T < \infty} we have

\displaystyle  u \in O( L^2_t H^1_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d))

and hence by Sobolev embedding

\displaystyle  u \in O( L^2_t L^p_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d))

for {p>2} sufficiently close to {2}. From Hölder this gives

\displaystyle  u_j u \in O( L^1_t L^{p/2}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d))
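For the record, the Hölder step here is

\displaystyle  \| u_j u \|_{L^1_t L^{p/2}_x} \leq \| u_j \|_{L^2_t L^p_x} \| u \|_{L^2_t L^p_x},

using {1 = \frac{1}{2} + \frac{1}{2}} in time and {\frac{2}{p} = \frac{1}{p} + \frac{1}{p}} in space. The requirement that {p>2} be close to {2} is what makes the preceding embedding {H^1({\bf R}^d) \subset L^p({\bf R}^d)} available in every dimension {d}, via the condition {\frac{1}{p} \geq \frac{1}{2} - \frac{1}{d}} from Section 2.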

and then by the boundedness of the Leray projector

\displaystyle  \mathbb{P}( u_j u ) \in O( L^1_t L^{p/2}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)).

We have

\displaystyle  \partial_t u + \partial_j \mathbb{P}(u_j u) = \nu \Delta u + F + \delta(t) u(0,x)

in the sense of nonstandard distributions. Taking weakly standard parts, we will obtain a weak solution to the Navier-Stokes equations as long as

\displaystyle  \mathrm{st}_w \mathbb{P}(u_j u) = \mathbb{P}(\mathrm{st}_w u_j \mathrm{st}_w u).

Both sides lie in {L^1_t L^{p/2}_x([0,T] \times {\bf R}^d \rightarrow {\bf R}^d)}, but the equality requires some care. Applying a standard test function {\phi \in C^\infty_c( {\bf R} \times {\bf R}^d \rightarrow {\bf R} )}, it suffices to show that

\displaystyle  \int_{{}^*([0,T] \times {\bf R}^d)} u_j u \cdot \mathbb{P} \phi\ dx dt = \int_{[0,T] \times {\bf R}^d} \mathrm{st}_w u_j \mathrm{st}_w u \cdot \mathbb{P} \phi\ dx dt + o(1) \ \ \ \ \ (6)

for all such {\phi}, which we can assume to be supported in {[0,T] \times {\bf R}^d}. One can check that {\mathbb{P} \phi} lies in {L^\infty_t L^{(p/2)'}_x([0,T] \times {\bf R}^d)}. Suppose we can show, for every standard compact set {K \subset {\bf R}^d}, that

\displaystyle  \int_{{}^*([0,T] \times K)} u_j u \cdot \mathbb{P} \phi\ dx dt = \int_{[0,T] \times K} \mathrm{st}_w u_j \mathrm{st}_w u \cdot \mathbb{P} \phi\ dx dt + o(1).

Then, applying Corollary 4 with {K} the closed ball {\overline{B(0,R)}}, the same claim holds for some unbounded {R}. Since the nonstandard {L^\infty_t L^{(p/2)'}_x} norm of {\mathbb{P} \phi} outside of the unbounded ball {B(0,R)} is infinitesimal, we then conclude (6).

Fix {K}. By Hölder’s inequality, it now suffices to show that

\displaystyle  \| u_j u - \mathrm{st}_w u_j \mathrm{st}_w u \|_{L^1_t L^{p/2}_x( {}^* ([0,T] \times K ) )} = o(1),

that is to say {u_j u} is strongly equivalent to {\mathrm{st}_w u_j \mathrm{st}_w u} in {L^1_t L^{p/2}_x([0,T] \times K)}. By another application of Hölder, it suffices to show that {u} is strongly equivalent to {\mathrm{st}_w u} in {L^2_t L^p_x([0,T] \times K)}.

For unbounded {N}, {P_{\leq N} \mathrm{st}_w u} is strongly equivalent to {\mathrm{st}_w u} in {L^2_t L^p_x([0,T] \times K)}. Since {u} is bounded in {L^2_t H^1_x([0,T] \times {\bf R}^d)}, we also see from Bernstein’s inequality and the triangle inequality that {P_{\leq N} u} is strongly equivalent to {u} in {L^2_t L^p_x([0,T] \times K)}. Thus it suffices to show, for at least one unbounded {N}, that {P_{\leq N} u} is strongly equivalent to {P_{\leq N} \mathrm{st}_w u} in {L^2_t L^p_x([0,T] \times K)}. By overspill, it suffices to do this for arbitrarily large standard {N}. By Bernstein’s inequality, the difference {P_{\leq N} u - P_{\leq N} \mathrm{st}_w u} is bounded in {L^\infty_t L^\infty_x} and has space derivative bounded in {L^\infty_t L^\infty_x}, while by hypothesis its time derivative is bounded in {L^2_t L^\infty_x} on {{}^*( [0,T] \times {\bf R}^d )}. Hence it is equicontinuous, by the fundamental theorem of calculus and Cauchy-Schwarz. By Exercise 15, we conclude that {P_{\leq N} u - P_{\leq N} \mathrm{st}_w u} is strongly infinitesimal in {L^\infty_t L^\infty_x([0,T] \times K)}, and hence also in {L^2_t L^p_x([0,T] \times K)}, as required.

To show the energy inequality for {\mathrm{st}_w u}, we again repeat the arguments from Notes 2. If {0 < T < \infty} is standard and {\eta_\varepsilon \in C^\infty_c({\bf R} \rightarrow {\bf R})} is a standard non-negative test function supported on {[T,T+\varepsilon]} of total mass one for some small standard {\varepsilon>0}, then from averaging (4) we have

\displaystyle  \frac{1}{2} \int_{T}^{T+\varepsilon} \int_{{}^* {\bf R}^d} \eta_\varepsilon(t) |u(t,x)|^2\ dx dt + \nu \int_{{}^* [0,T]} \int_{{}^* {\bf R}^d} |\nabla u(t,x)|^2\ dx dt

\displaystyle \leq \frac{1}{2} \int_{{}^* {\bf R}^d} |u_0(x)|^2\ dx + o(1).

Taking weakly standard parts using (1) we conclude that

\displaystyle  \frac{1}{2} \int_{T}^{T+\varepsilon} \int_{{\bf R}^d} \eta_\varepsilon(t) |\mathrm{st}_w u(t,x)|^2\ dx dt + \nu \int_{[0,T]} \int_{{\bf R}^d} |\nabla \mathrm{st}_w u(t,x)|^2\ dx dt

\displaystyle \leq \frac{1}{2} \int_{{\bf R}^d} |u_0(x)|^2\ dx.

The energy inequality (2) then follows from the Lebesgue differentiation theorem (which, incidentally, can also be translated into a nonstandard analysis framework, as discussed to some extent in this previous post).

Now we establish the converse direction. Let {v} be a Leray-Hopf solution. Then from Sobolev and Hölder we have

\displaystyle  v_j v \in L^2_t L^{q}_x([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^d)
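The exponent {q} can be made explicit, at least on bounded time intervals {[0,T]}, which is all that is used below. Combining {v \in L^\infty_t L^2_x} with {v \in L^2_t L^p_x} (by Sobolev, for {p>2} close to {2}) via Hölder gives

\displaystyle  \|v_j v\|_{L^2_t L^q_x} \leq \|v_j\|_{L^\infty_t L^2_x} \|v\|_{L^2_t L^p_x}, \qquad \frac{1}{q} = \frac{1}{2} + \frac{1}{p}, \quad q = \frac{2p}{p+2},

and {q \in (1,2)}, with {q \rightarrow 1^+} as {p \rightarrow 2^+}.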

for some {q>1} sufficiently close to {1}, and hence from the weak Navier-Stokes equation and Bernstein’s inequality we have

\displaystyle  \partial_t P_{\leq N} v \in L^2_t L^\infty_x([0,+\infty) \times {\bf R}^d \rightarrow {\bf R}^d) \ \ \ \ \ (7)

for every standard {N} (but without claiming the bound to be uniform in {N}).

Now let {\varepsilon>0} be infinitesimal and {N_1>0} be unbounded, let {\phi \in C^\infty_c({\bf R} \rightarrow {\bf R})} be a standard non-negative test function of total mass {1} supported on {[0,1]}, and let {u} be the nonstandard function

\displaystyle  u(t,x) := \int_{{}^*[0,1]} \phi(s) P_{\leq N_1} v(t + \varepsilon s, x )\ ds

arising from performing a frequency truncation in space and a smooth averaging in time. This is a nonstandard smooth function. From Minkowski’s inequality, (2), and the non-expansive nature of {P_{\leq N}} on {L^2}, we see that we have the energy inequality (4) (in fact we do not even need the {o(1)} error here). From Exercise 19 of Notes 2, we know that {v(t)} converges strongly in {L^2} to {u_0} as {t \rightarrow 0}. Hence {P_{\leq N_1} v(\varepsilon s, x)} is strongly equivalent to {u_0} in {L^2} for all {s \in {}^* [0,1]} (using also that {P_{\leq N_1} u_0} is strongly equivalent to {u_0}, as {N_1} is unbounded), and hence so is {u(0)}. From (7) and Minkowski’s inequality, we also conclude property (iii) of the proposition.

The only remaining thing to verify is property (iv), which we will do assuming that {\varepsilon} is sufficiently small depending on {N_1}. From (3) we see that on {{}^* ([0,+\infty) \times {\bf R}^d)}, we have (in the classical sense) that

\displaystyle  \partial_t u(t,x) - \nu \Delta u(t,x) = - \partial_j \int_{{}^*[0,1]} \phi(s) P_{\leq N_1} {\mathbb P}( v_j v )(t + \varepsilon s, x )\ ds

and so (5) holds with forcing term

\displaystyle  F = \partial_j \Big( \int_{{}^*[0,1]} \int_{{}^*[0,1]} \phi(s) \phi(s') {\mathbb P}( P_{\leq N_1} v_j(t+\varepsilon s) P_{\leq N_1} v(t+\varepsilon s') )(x )\ ds ds'

\displaystyle - \int_{{}^*[0,1]} \phi(s) P_{\leq N_1} {\mathbb P}( v_j v )(t + \varepsilon s, x )\ ds \Big).

To show that {F} is weakly infinitesimal, it suffices as before to show that

\displaystyle  G_j := \int_{{}^*[0,1]} \int_{{}^*[0,1]} \phi(s) \phi(s') P_{\leq N_1} v_j(t+\varepsilon s, x) P_{\leq N_1} v(t+\varepsilon s', x)\ ds ds'

\displaystyle - \int_{{}^*[0,1]} \phi(s) P_{\leq N_1}( v_j v )(t + \varepsilon s, x )\ ds

is strongly infinitesimal in {L^1_t L^{p/2}([0,T] \times K)} for every standard {T} and compact {K}. But from (7) we know that {P_{\leq N} v(t+\varepsilon s,x) = P_{\leq N} v(t,x) + o(1)} for all {s \in {}^* [0,1]}, {t \in {}^* [0,T]}, and {x \in {}^* {\bf R}^d}, and hence by overspill one has

\displaystyle P_{\leq N_1} v(t+\varepsilon s,x) = P_{\leq N_1} v(t,x) + o(1)

for the same range of {s,t,x} if {\varepsilon} is sufficiently small depending on {N_1}. Thus {G_j} is strongly equivalent in {L^\infty_t L^\infty_x([0,T] \times K)} (and hence in {L^1_t L^{p/2}([0,T] \times K)}) to the commutator-type expression

\displaystyle  P_{\leq N_1} v_j P_{\leq N_1} v - P_{\leq N_1}(v_j v).

But from Bernstein’s inequality and the {L^2_t H^1_x} boundedness of {v}, we know that {P_{\leq N_1} v} is strongly equivalent to {v} in {L^2_t L^p_x([0,T] \times {\bf R}^d)}, so by Hölder it suffices to show that {P_{>N_1}(v_j v)} is strongly infinitesimal in {L^1_t L^{p/2}([0,T] \times K)}. This follows from the fact that {v} is strongly equivalent to {P_{\leq N_1/4} v} in {L^2_t L^p_x}, and that {P_{>N_1}( P_{\leq N_1/4} v_j P_{\leq N_1/4} v)} vanishes. \Box

Remark 17 The above proof shows that we can in fact demand stronger regularity on the time derivative {\partial_t u} than is required in Proposition 16(iii) if desired; for instance, one can place {\partial_t u} in {L^2_t H^{-1}_x + L^1_t L^{p/2}_x} for {p>2} close enough to {2}.

Exercise 18 State and prove a more traditional analogue of this proposition that asserts (roughly speaking) that any weak limit of a sequence of smooth solutions to Navier-Stokes with changes in initial data and forcing term that converge weakly to zero, which asymptotically obeys the energy inequality, and which has some weak uniform control on the time derivative, will produce a Leray-Hopf solution, and conversely that every Leray-Hopf solution arises in this fashion.

Exercise 19 Translate the proof of weak-strong uniqueness from Proposition 20 of Notes 2 to nonstandard analysis, by first using Proposition 16 to interpret the weak solution as the weakly standard part of a strong nonstandard approximate solution. (One will need the improved control on {\partial_t u} mentioned in Remark 17.)

John BaezCategory Theory Course

I’m teaching a course on category theory at U.C. Riverside, and since my website is still suffering from reduced functionality I’ll put the course notes here for now. I taught an introductory course on category theory in 2016, but this one is a bit more advanced.

The hand-written notes here are by Christian Williams. They are probably best seen as a reminder to myself as to what I’d like to include in a short book someday.

Lecture 1: What is pure mathematics all about? The importance of free structures.

Lecture 2: The natural numbers as a free structure. Adjoint functors.

Lecture 3: Adjoint functors in terms of unit and counit.

Lecture 4: 2-Categories. Adjunctions.

Lecture 5: 2-Categories and string diagrams. Composing adjunctions.

Lecture 6: The ‘main spine’ of mathematics. Getting a monad from an adjunction.

Lecture 7: Definition of a monad. Getting a monad from an adjunction. The augmented simplex category.

Lecture 8: The walking monad, the augmented simplex category and the simplex category.

Lecture 9: Simplicial abelian groups from simplicial sets. Chain complexes from simplicial abelian groups.

Lecture 10: The Dold-Thom theorem: the category of simplicial abelian groups is equivalent to the category of chain complexes of abelian groups. The homology of a chain complex.

Lecture 12: The bar construction: getting a simplicial object from an adjunction. The bar construction for G-sets, previewed.

Lecture 13: The adjunction between G-sets and sets.

Lecture 14: The bar construction for groups.

Lecture 15: The simplicial set \mathbb{E}G obtained by applying the bar construction to the one-point G-set, its geometric realization EG = |\mathbb{E}G|, and the free simplicial abelian group \mathbb{Z}[\mathbb{E}G].

Lecture 16: The chain complex C(G) coming from the simplicial abelian group \mathbb{Z}[\mathbb{E}G], its homology, and the definition of group cohomology H^n(G,A) with coefficients in a G-module.

Lecture 17: Extensions of groups. The Jordan-Hölder theorem. How an extension of a group G by an abelian group A gives an action of G on A and a 2-cocycle c \colon G^2 \to A.

Lecture 18: Classifying abelian extensions of groups. Direct products, semidirect products, central extensions and general abelian extensions. The groups of order 8 as abelian extensions.

Lecture 19: Group cohomology. The chain complex for the cohomology of G with coefficients in A, starting from the bar construction, and leading to the 2-cocycles used in classifying abelian extensions. The classification of extensions of G by A in terms of H^2(G,A).

Lecture 20: Examples of group cohomology: nilpotent groups and the fracture theorem. Higher-dimensional algebra and homotopification: the nerve of a category and the nerve of a topological space. \mathbb{E}G as the nerve of the translation groupoid G/\!/G. BG = EG/G as the walking space with fundamental group G.

Lecture 21: Homotopification and higher algebra. Internalizing concepts in categories with finite products. Pushing forward internalized structures using functors that preserve finite products. Why the ‘discrete category on a set’ functor \mathrm{Disc} \colon \mathrm{Set} \to \mathrm{Cat}, the ‘nerve of a category’ functor \mathrm{N} \colon \mathrm{Cat} \to \mathrm{Set}^{\Delta^{\mathrm{op}}}, and the ‘geometric realization of a simplicial set’ functor |\cdot| \colon \mathrm{Set}^{\Delta^{\mathrm{op}}} \to \mathrm{Top} preserve products.

Lecture 22: Monoidal categories. Strict monoidal categories as monoids in \mathrm{Cat} or one-object 2-categories. The periodic table of strict n-categories. General ‘weak’ monoidal categories.

Lecture 23: 2-Groups. The periodic table of weak n-categories. The stabilization hypothesis. The homotopy hypothesis. Classifying 2-groups with G as the group of objects and A as the abelian group of automorphisms of the unit object in terms of H^3(G,A). The Eckmann–Hilton argument.

December 08, 2018

Terence TaoFourier uniformity of bounded multiplicative functions in short intervals on average

Kaisa Matomäki, Maksym Radziwill, and I just uploaded to the arXiv our paper “Fourier uniformity of bounded multiplicative functions in short intervals on average”. This paper is the outcome of our attempts during the MSRI program in analytic number theory last year to attack the local Fourier uniformity conjecture for the Liouville function {\lambda}. This conjecture generalises a landmark result of Matomäki and Radziwill, who show (among other things) that one has the asymptotic

\displaystyle  \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n)|\ dx = o(HX) \ \ \ \ \ (1)

whenever {X \rightarrow \infty} and {H = H(X)} goes to infinity as {X \rightarrow \infty}. Informally, this says that the Liouville function has small mean for almost all short intervals {[x,x+H]}. The remarkable thing about this theorem is that there is no lower bound on how {H} goes to infinity with {X}; one can take for instance {H = \log\log\log X}. This lack of lower bound was crucial when I applied this result (or more precisely, a generalisation of this result to arbitrary non-pretentious bounded multiplicative functions) a few years ago to solve the Erdős discrepancy problem, as well as a logarithmically averaged two-point Chowla conjecture, which implies for instance that

\displaystyle  \sum_{n \leq X} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log X).
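To get a concrete feel for (1), here is a small numerical sketch in plain Python (all function names are made up for illustration). It replaces the integral over {x} by an average over integers {x \in [X,2X)}, computes {\lambda} with a smallest-prime-factor sieve, and also evaluates the log-averaged two-point sum above, normalized by {\log X}. A finite computation like this cannot verify an {o(\cdot)} statement; it only shows the quantities involved.

```python
import math

def liouville_table(n_max):
    """Return a list lam with lam[n] = lambda(n) = (-1)^Omega(n) for 1 <= n <= n_max."""
    # Smallest-prime-factor sieve.
    spf = list(range(n_max + 1))
    for i in range(2, math.isqrt(n_max) + 1):
        if spf[i] == i:  # i is prime
            for j in range(i * i, n_max + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [0] * (n_max + 1)  # lam[0] is unused
    if n_max >= 1:
        lam[1] = 1
    for n in range(2, n_max + 1):
        # Complete multiplicativity: lambda(n) = lambda(p) lambda(n/p) = -lambda(n/p).
        lam[n] = -lam[n // spf[n]]
    return lam

def short_interval_mean(lam, X, H):
    """Discrete analogue of the left side of (1) divided by HX: the average over
    integers x in [X, 2X) of |sum_{x <= n <= x+H} lambda(n)| / H.
    Requires len(lam) > 2X + H."""
    window = sum(lam[X:X + H + 1])  # n = X, ..., X + H
    total = 0
    for x in range(X, 2 * X):
        total += abs(window)
        window += lam[x + H + 1] - lam[x]  # slide the window to [x+1, x+H+1]
    return total / (X * H)

def log_two_point_ratio(lam, X):
    """(sum_{n <= X} lambda(n) lambda(n+1) / n) / log X; requires len(lam) > X + 1."""
    return sum(lam[n] * lam[n + 1] / n for n in range(1, X + 1)) / math.log(X)

if __name__ == "__main__":
    X, H = 10**5, 100
    lam = liouville_table(2 * X + H)
    print("short-interval mean:", short_interval_mean(lam, X, H))
    print("log-averaged two-point ratio:", log_two_point_ratio(lam, X))
```

The sliding window makes the average cost {O(X)} rather than {O(XH)}; any other way of summing the windows would give the same numbers.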

The local Fourier uniformity conjecture asserts the stronger asymptotic

\displaystyle  \int_X^{2X} \sup_{\alpha \in {\bf R}} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|\ dx = o(HX) \ \ \ \ \ (2)

under the same hypotheses on {H} and {X}. As I worked out in a previous paper, this conjecture would imply a logarithmically averaged three-point Chowla conjecture, implying for instance that

\displaystyle  \sum_{n \leq X} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log X).

This particular bound also follows from some slightly different arguments of Joni Teräväinen and myself, but the implication would also work for other non-pretentious bounded multiplicative functions, whereas the arguments of Joni and myself rely more heavily on the specific properties of the Liouville function (in particular that {\lambda(p)=-1} for all primes {p}).
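A similar sketch can be written for (2), under the same caveats and again with hypothetical function names. The supremum over {\alpha \in {\bf R}} is approximated by a maximum over a grid of {4(H+1)} points in {[0,1)}; this grid explores every frequency because {e(-\alpha n)} depends only on {\alpha} mod 1 when {n} is an integer, and the grid maximum is a lower bound for the true supremum. The grid size and the sampling step in {x} are arbitrary choices.

```python
import cmath
import math

def liouville(n):
    """lambda(n) = (-1)^Omega(n) by trial division (adequate for small n)."""
    omega, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            omega += 1
        d += 1
    if n > 1:
        omega += 1
    return -1 if omega % 2 else 1

def e(theta):
    """e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * math.pi * theta)

def sup_over_alpha(values, start, grid):
    """max over alpha in {0, 1/grid, ..., (grid-1)/grid} of
    |sum_k values[k] e(-alpha (start + k))|, a lower bound for the sup over real alpha."""
    best = 0.0
    for j in range(grid):
        alpha = j / grid
        s = sum(v * e(-alpha * (start + k)) for k, v in enumerate(values))
        best = max(best, abs(s))
    return best

def local_fourier_mean(X, H, step=1, grid=None):
    """Discrete analogue of the left side of (2) divided by HX: the average over
    x in [X, 2X) (sampled every `step`) of
    sup_alpha |sum_{x <= n <= x+H} lambda(n) e(-alpha n)| / H."""
    grid = grid or 4 * (H + 1)  # oversample the frequency circle
    lam = {n: liouville(n) for n in range(X, 2 * X + H + 1)}
    xs = range(X, 2 * X, step)
    total = sum(sup_over_alpha([lam[n] for n in range(x, x + H + 1)], x, grid) for x in xs)
    return total / (len(xs) * H)

if __name__ == "__main__":
    print(local_fourier_mean(2000, 20, step=10))
```

Because {\alpha = 0} is on the grid, this quantity is always at least the corresponding discrete version of (1), mirroring the fact that (2) is the stronger statement.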

There is also a higher order version of the local Fourier uniformity conjecture in which the linear phase {{}e(-\alpha n)} is replaced with a polynomial phase such as {e(-\alpha_d n^d - \dots - \alpha_1 n - \alpha_0)}, or more generally a nilsequence {\overline{F(g(n) \Gamma)}}; as shown in my previous paper, this conjecture implies (and is in fact equivalent to, after logarithmic averaging) a logarithmically averaged version of the full Chowla conjecture (not just the two-point or three-point versions), as well as a logarithmically averaged version of the Sarnak conjecture.

The main result of the current paper is to obtain some cases of the local Fourier uniformity conjecture:

Theorem 1 The asymptotic (2) is true when {H = X^\theta} for a fixed {\theta > 0}.

Previously this was known for {\theta > 5/8} by the work of Zhan (who in fact proved the stronger pointwise assertion {\sup_{\alpha \in {\bf R}} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|= o(H)} for {X \leq x \leq 2X} in this case). In a previous paper with Kaisa and Maksym, we also proved a weak version

\displaystyle  \sup_{\alpha \in {\bf R}} \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|\ dx = o(HX) \ \ \ \ \ (3)

of (2) for any {H} growing arbitrarily slowly with {X}; this is stronger than (1) (and is in fact proven by a variant of the method) but significantly weaker than (2), because in the latter the worst-case {\alpha} is permitted to depend on the {x} parameter, whereas in (3) {\alpha} must remain independent of {x}.

Unfortunately, the restriction {H = X^\theta} is not strong enough to give applications to Chowla-type conjectures (one would need something more like {H = \log^\theta X} for this). However, it can still be used to control some sums that had not previously been manageable. For instance, a quick application of the circle method lets one use the above theorem to derive the asymptotic

\displaystyle  \sum_{h \leq H} \sum_{n \leq X} \lambda(n) \Lambda(n+h) \Lambda(n+2h) = o( H X )

whenever {H = X^\theta} for a fixed {\theta > 0}, where {\Lambda} is the von Mangoldt function. Amusingly, the seemingly simpler question of establishing the expected asymptotic for

\displaystyle  \sum_{h \leq H} \sum_{n \leq X} \Lambda(n+h) \Lambda(n+2h)

is only known in the range {\theta \geq 1/6} (from the work of Zaccagnini). Thus we have a rare example of a number theory sum that becomes easier to control when one inserts a Liouville function!
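For the two sums involving the von Mangoldt function, a brute-force sketch (illustrative only, with hypothetical helper names) can compute both the sum twisted by {\lambda} and the untwisted sum, each normalized by {HX}. Since {|\lambda(n)| \leq 1} and {\Lambda \geq 0}, the twisted sum is never larger in absolute value than the untwisted one; at such small scales the numbers say nothing rigorous about either asymptotic.

```python
import math

def liouville_and_mangoldt(n_max):
    """Return (lam, Lam) with lam[n] = lambda(n) and Lam[n] = Lambda(n) for 1 <= n <= n_max."""
    spf = list(range(n_max + 1))  # smallest-prime-factor sieve
    for i in range(2, math.isqrt(n_max) + 1):
        if spf[i] == i:
            for j in range(i * i, n_max + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [0] * (n_max + 1)
    Lam = [0.0] * (n_max + 1)
    if n_max >= 1:
        lam[1] = 1
    for n in range(2, n_max + 1):
        p = spf[n]
        lam[n] = -lam[n // p]
        m = n
        while m % p == 0:
            m //= p
        if m == 1:  # n is a power of the prime p
            Lam[n] = math.log(p)
    return lam, Lam

def triple_sums(X, H):
    """Return (twisted, untwisted) divided by H X, where
    twisted   = sum_{h<=H} sum_{n<=X} lambda(n) Lambda(n+h) Lambda(n+2h),
    untwisted = sum_{h<=H} sum_{n<=X} Lambda(n+h) Lambda(n+2h)."""
    lam, Lam = liouville_and_mangoldt(X + 2 * H)
    twisted = untwisted = 0.0
    for h in range(1, H + 1):
        for n in range(1, X + 1):
            w = Lam[n + h] * Lam[n + 2 * h]
            if w:
                untwisted += w
                twisted += lam[n] * w
    return twisted / (H * X), untwisted / (H * X)

if __name__ == "__main__":
    print(triple_sums(10**4, 50))
```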

We now give an informal description of the strategy of proof of the theorem (though for numerous technical reasons, the actual proof deviates in some respects from the description given here). If (2) failed, then for many values of {x \in [X,2X]} we would have the lower bound

\displaystyle  |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha_x n)| \gg 1

for some frequency {\alpha_x \in{\bf R}}. We informally describe this correlation between {\lambda(n)} and {e(\alpha_x n)} by writing

\displaystyle  \lambda(n) \approx e(\alpha_x n) \ \ \ \ \ (4)

for {n \in [x,x+H]} (informally, one should view this as asserting that {\lambda(n)} “behaves like” a constant multiple of {e(\alpha_x n)}). For sake of discussion, suppose we have this relationship for all {x \in [X,2X]}, not just many.

As mentioned before, the main difficulty here is to understand how {\alpha_x} varies with {x}. As it turns out, the multiplicativity properties of the Liouville function place a significant constraint on this dependence. Indeed, let {p} be a fairly small prime (e.g. of size {H^\varepsilon} for some {\varepsilon>0}); using the identity {\lambda(np) = \lambda(n) \lambda(p) = - \lambda(n)} for the Liouville function, one can conclude (at least heuristically) from (4) that

\displaystyle  \lambda(n) \approx e(\alpha_x n p)

for {n \in [x/p, x/p + H/p]}. (In practice, we will have this sort of claim for many primes {p} rather than all primes {p}, after using tools such as the Turán-Kubilius inequality, but we ignore this distinction for this informal argument.)

Now let {x, y \in [X,2X]} and {p,q \sim P} be primes comparable to some fixed range {P = H^\varepsilon} such that

\displaystyle  x/p = y/q + O( H/P). \ \ \ \ \ (5)

Then we have both

\displaystyle  \lambda(n) \approx e(\alpha_x n p)

and

\displaystyle  \lambda(n) \approx e(\alpha_y n q)

on essentially the same range of {n} (two nearby intervals of length {\sim H/P}). This suggests that the frequencies {p \alpha_x} and {q \alpha_y} should be close to each other modulo {1}; in particular, one should expect the relationship

\displaystyle  p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } 1. \ \ \ \ \ (6)

Comparing this with (5) one is led to the expectation that {\alpha_x} should depend inversely on {x} in some sense (for instance one can check that

\displaystyle  \alpha_x = T/x \ \ \ \ \ (7)

would solve (6) if {T = O( X^2 / H^2 )}; by Taylor expansion, this would correspond to a global approximation of the form {\lambda(n) \approx n^{iT}}). One now has a problem of an additive combinatorial flavour (or of a “local to global” flavour), namely to leverage the relation (6) to obtain global control on {\alpha_x} that resembles (7).
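To see where the condition on {T} comes from, here is the short calculation for the ansatz (7), under the heuristic that {x/p} and {y/q} are both of size about {X/P} and using (5):

```latex
p\alpha_x - q\alpha_y
  = T\left(\frac{p}{x} - \frac{q}{y}\right)
  = T\,\frac{y/q - x/p}{(x/p)(y/q)}
  = O\!\left(T \cdot \frac{H}{P} \cdot \frac{P^2}{X^2}\right)
  = O\!\left(\frac{THP}{X^2}\right).
```

This is {O(P/H)}, even on the real line rather than just modulo {1}, provided {T = O(X^2/H^2)}, which matches the size of {T_x} that appears later in the argument.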

A key obstacle in solving (6) efficiently is the fact that one only knows that {p \alpha_x} and {q \alpha_y} are close modulo {1}, rather than close on the real line. One can start resolving this problem by the Chinese remainder theorem, using the fact that we have the freedom to shift (say) {\alpha_y} by an arbitrary integer. After doing so, one can arrange matters so that one in fact has the relationship

\displaystyle  p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } p \ \ \ \ \ (8)

whenever {x,y \in [X,2X]} and {p,q \sim P} obey (5). (This may force {\alpha_y} to become extremely large, on the order of {\prod_{p \sim P} p}, but this will not concern us.)
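The integer bookkeeping in this step can be sketched in code. Writing (6) as {p\alpha_x - q\alpha_y = n_p + O(P/H)} for some integer {n_p}, replacing {\alpha_y} by {\alpha_y + k} gives (8) as soon as {qk \equiv n_p \hbox{ mod } p}, and the Chinese remainder theorem supplies one integer {k} that works for all the primes {p} at once, assuming each {p} is coprime to {q} (e.g. {p \neq q}). The residues {n_p} below are made-up values and `integer_shift` is a hypothetical helper; the point is only that {k} can be as large as {\prod p}.

```python
from math import prod

def crt(residues, moduli):
    """Chinese remainder theorem: the unique m in [0, prod(moduli)) with
    m = r (mod n) for each pair (r, n); the moduli must be pairwise coprime."""
    M = prod(moduli)
    m = 0
    for r, n in zip(residues, moduli):
        Mn = M // n
        m += r * Mn * pow(Mn, -1, n)  # modular inverse; needs Python 3.8+
    return m % M

def integer_shift(q, targets):
    """Hypothetical helper: given targets {p: n_p} for primes p coprime to q, return the
    least k >= 0 with q*k = n_p (mod p) for every p, i.e. one integer shift of alpha_y
    that absorbs all the prescribed integer discrepancies at once."""
    moduli = list(targets)
    residues = [targets[p] * pow(q, -1, p) % p for p in moduli]
    return crt(residues, moduli)

if __name__ == "__main__":
    targets = {13: 5, 17: 3, 19: 8}  # made-up discrepancies n_p, for illustration only
    k = integer_shift(11, targets)
    print(k, "< modulus", prod(targets.keys()))
```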

Now suppose that we have {y,y' \in [X,2X]} and primes {q,q' \sim P} such that

\displaystyle  y/q = y'/q' + O(H/P). \ \ \ \ \ (9)

For every prime {p \sim P}, we can find an {x} such that {x/p} is within {O(H/P)} of both {y/q} and {y'/q'}. Applying (8) twice we obtain

\displaystyle  p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } p

and

\displaystyle  p \alpha_x = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } p

and thus by the triangle inequality we have

\displaystyle  q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } p

for all {p \sim P}; hence by the Chinese remainder theorem

\displaystyle  q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } \prod_{p \sim P} p.

In practice, in the regime {H = X^\theta} that we are considering, the modulus {\prod_{p \sim P} p} is so huge we can effectively ignore it (in the spirit of the Lefschetz principle); so let us pretend that we in fact have

\displaystyle  q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \ \ \ \ \ (10)

whenever {y,y' \in [X,2X]} and {q,q' \sim P} obey (9).

Now let {k} be an integer to be chosen later, and suppose we have primes {p_1,\dots,p_k,q_1,\dots,q_k \sim P} such that the difference

\displaystyle  q = |p_1 \dots p_k - q_1 \dots q_k|

is small but non-zero. If {k} is chosen so that

\displaystyle  P^k \approx \frac{X}{H}

(where one is somewhat loose about what {\approx} means), then one can find real numbers {x_1,\dots,x_k \sim X} such that

\displaystyle  \frac{x_j}{p_j} = \frac{x_{j+1}}{q_j} + O( \frac{H}{P} )

for {j=1,\dots,k}, with the convention that {x_{k+1} = x_1}. We then have

\displaystyle  p_j \alpha_{x_j} = q_j \alpha_{x_{j+1}} + O( \frac{P}{H} )

which telescopes to

\displaystyle  p_1 \dots p_k \alpha_{x_1} = q_1 \dots q_k \alpha_{x_1} + O( \frac{P^k}{H} )

and thus

\displaystyle  q \alpha_{x_1} = O( \frac{P^k}{H} )

and hence

\displaystyle  \alpha_{x_1} = O( \frac{P^k}{H} ) \approx O( \frac{X}{H^2} ).
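The telescoping argument can be mimicked at toy scale. The sketch below (hypothetical helper names; tiny parameters {P = 10}, {k = 2} chosen only so that it runs instantly) finds the two products of {k} primes from {[P,2P)} that are closest together, and follows the chain {x_{j+1} = q_j x_j / p_j}, for which {x_j/p_j = x_{j+1}/q_j} holds exactly. After {k} steps the chain misses its starting point by {x_1 (q_1 \dots q_k - p_1 \dots p_k)/(p_1 \dots p_k)}; heuristically, with {x_1 \sim X} and {P^k \approx X/H}, this is of size about {qH}, which is why a small difference {q} is needed to close the cycle up to the allowed error {O(H/P)}. The search says nothing about how small {q} can be made in the actual regime {H = X^\theta}.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import isqrt, prod

def primes_in(lo, hi):
    """Primes p with lo <= p < hi, by trial division."""
    return [n for n in range(max(lo, 2), hi)
            if all(n % d for d in range(2, isqrt(n) + 1))]

def closest_products(P, k):
    """Among products of k primes from [P, 2P) (repetition allowed), return
    (q, ps, qs) where prod(ps) - prod(qs) = q is the smallest positive difference."""
    prods = sorted((prod(c), c) for c in combinations_with_replacement(primes_in(P, 2 * P), k))
    best = None
    for (a, ca), (b, cb) in zip(prods, prods[1:]):
        # b > a always: distinct multisets of primes have distinct products.
        if best is None or b - a < best[0]:
            best = (b - a, cb, ca)
    return best

def cycle_drift(x1, ps, qs):
    """Follow x_{j+1} = q_j x_j / p_j (so x_j / p_j = x_{j+1} / q_j exactly) and return
    x_{k+1} - x_1, which equals x_1 (prod(qs) - prod(ps)) / prod(ps)."""
    x = Fraction(x1)
    for p, q in zip(ps, qs):
        x = q * x / p
    return x - x1

if __name__ == "__main__":
    q, ps, qs = closest_products(10, 2)
    print(q, ps, qs, cycle_drift(1000, ps, qs))
```

Exact rational arithmetic via `Fraction` is used so that the drift formula can be compared without rounding error.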

In particular, for each {x \sim X}, we expect to be able to write

\displaystyle  \alpha_x = \frac{T_x}{x} + O( \frac{1}{H} )

for some {T_x = O( \frac{X^2}{H^2} )}. This quantity {T_x} can vary with {x}; but from (10) and a short calculation we see that

\displaystyle  T_y = T_{y'} + O( \frac{X}{H} )

whenever {y, y' \in [X,2X]} obey (9) for some {q,q' \sim P}.

Now imagine a “graph” in which the vertices are elements {y} of {[X,2X]}, and two elements {y,y'} are joined by an edge if (9) holds for some {q,q' \sim P}. Because of exponential sum estimates on {\sum_{q \sim P} q^{it}}, this graph turns out to essentially be an “expander” in the sense that any two vertices {y,y' \in [X,2X]} can be connected (in multiple ways) by fairly short paths in this graph (if one is allowed to modify one of {y} or {y'} by {O(H)}). As a consequence, we can assume that this quantity {T_y} is essentially constant in {y} (cf. the application of the ergodic theorem in this previous blog post), thus we now have

\displaystyle  \alpha_x = \frac{T}{x} + O(\frac{1}{H} )

for most {x \in [X,2X]} and some {T = O(X^2/H^2)}. By Taylor expansion, this implies that

\displaystyle  \lambda(n) \approx n^{iT}

on {[x,x+H]} for most {x}, thus

\displaystyle  \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) n^{-iT}|\ dx \gg HX.

But this can be shown to contradict the Matomäki-Radziwill theorem (because the multiplicative function {n \mapsto \lambda(n) n^{-iT}} is known to be non-pretentious).
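The final Taylor-expansion step can be checked numerically. The sketch below (hypothetical function name) measures, on a single interval {[x,x+H]}, how far {n^{iT}} is from a constant multiple of a linear phase. With the normalization {e(\theta) = e^{2\pi i \theta}} the matching frequency is {T/(2\pi x)}; the informal discussion above suppresses such constant factors. The quadratic Taylor term {T(n-x)^2/(2x^2)} stays {O(1)} when {T = O(X^2/H^2)}, so one expects a small error for {T} a small multiple of {X^2/H^2} and no useful approximation for {T} much larger.

```python
import cmath
import math

def max_linearization_error(T, x, H):
    """max over integers n in [x, x+H] of |n^{iT} - x^{iT} exp(iT (n - x)/x)|, i.e. the
    error in replacing n^{iT} on [x, x+H] by a constant times e(alpha n) with
    alpha = T / (2 pi x), where e(theta) = exp(2 pi i theta)."""
    worst = 0.0
    for n in range(x, x + H + 1):
        u = (n - x) / x
        # Since |x^{iT}| = 1, the error is |1 - exp(iT (log(1+u) - u))|.
        phase_gap = T * (math.log1p(u) - u)
        worst = max(worst, abs(1 - cmath.exp(1j * phase_gap)))
    return worst

if __name__ == "__main__":
    X, H = 10**4, 100
    for c in (0.1, 1.0, 100.0):
        T = c * X**2 / H**2  # T measured in units of X^2/H^2
        print(c, max_linearization_error(T, X, H))
```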

December 07, 2018

Matt von HippelBook Review: We Have No Idea

I have no idea how I’m going to review this book.

Ok fine, I have some idea.

Jorge Cham writes Piled Higher and Deeper, a webcomic with possibly the most accurate depiction of grad school available. Daniel Whiteson is a professor at the University of California, Irvine, and a member of the ATLAS collaboration (one of the two big groups that make measurements at the Large Hadron Collider). Together, they’ve written a popular science book covering everything we don’t know about fundamental physics.

Writing a book about what we don’t know is an unusual choice, and there was a real risk it would end up as just a superficial gimmick. The pie chart on the cover presents the most famous “things physicists don’t know”, dark matter and dark energy. If they had just stuck to those this would have been a pretty ordinary popular physics book.

Refreshingly, they don’t do that. After blazing through dark matter and dark energy in the first three chapters, the rest of the book focuses on a variety of other scientific mysteries.

The book contains a mix of problems that get serious research attention (matter-antimatter asymmetry, high-energy cosmic rays) and more blue-sky “what if” questions (does matter have to be made out of particles?). As a theorist, I’m not sure that all of these questions are actually mysterious (we do have some explanation of the weird “1/3” charges of quarks, and I’d like to think we understand why mass includes binding energy), but even in these cases what we really know is that they follow from “sensible assumptions”, and one could just as easily ask “what if” about those assumptions instead. Overall, these “what if” questions make the book unique, and it would be a much weaker book without them.

“We Have No Idea” is strongest when the authors actually have some idea, i.e. when Whiteson is discussing experimental particle physics. It gets weaker on other topics, where the authors seem to rely more on others’ popular treatments (their discussion of “pixels of space-time” motivated me to write this post). Still, they at least seem to have asked the right people, and their accounts are on the more accurate end of typical pop science. (Closer to Quanta than IFLScience.)

The book’s humor really ties it together, often in surprisingly subtle ways. Each chapter has its own running joke, initially a throwaway line that grows into metaphors for everything the chapter discusses. It’s a great way to help the audience visualize without introducing too many new concepts at once. If there’s one thing cartoonists can teach science communicators, it’s the value of repetition.

I liked “We Have No Idea”. It could have been more daring, or more thorough, but it was still charming and honest and fun. If you’re looking for a Christmas present to explain physics to your relatives, you won’t go wrong with this book.

Doug NatelsonShoucheng Zhang, 1963-2018

Shocking and saddening news this week about the death of Shoucheng Zhang, Stanford condensed matter theorist who had made extremely high impact contributions to multiple topics in the field.    He began his research career looking at rather exotic physics; string theory was all the rage, and this was one of his first papers.  His first single-author paper, according to scopus, is this Phys Rev Letter looking at the possibility of an exotic (Higgs-related) form of superconductivity on a type of topological defect in spacetime.  Like many high energy theorists of the day, he made the transition to condensed matter physics, where his interests in topology and field theory were present throughout his research career.  Zhang made important contributions on the fractional quantum Hall effect (and here and here), the problem of high temperature superconductivity in the copper oxides (here), and most recently and famously, the quantum spin Hall effect (here for example).   He'd won a ton of major prizes, and was credibly in the running for a share of a future Nobel regarding topological materials and quantum spin Hall physics.

I had the good fortune to take one quarter of "introduction to many-body physics" (basically quantum field theory from the condensed matter perspective) from him at Stanford.  His clear lectures, his excellent penmanship at the whiteboard, and his ever-present white cricket sweater are standout memories even after 24 years.  He was always pleasant and enthusiastic when I'd see him.  In addition to his own scholarly output, Zhang had a huge, lasting impact on the community through mentorship of his students and postdocs.  His loss is deeply felt.  Depression is a terrible illness, and it can affect anyone - hopefully increased awareness and treatment will make tragic events like this less likely in the future.

BackreactionNo, negative masses have not revolutionized cosmology

Figure from arXiv:1712.07962

A lot of people have asked me to comment on a paper by Jamie Farnes, titled

A Unifying Theory of Dark Energy and Dark Matter: Negative Masses and Matter Creation within a Modified ΛCDM Framework
J.S. Farnes
Astronomy and Astrophysics 620, A92 (2018)
arXiv:1712.07962 [physics.gen-ph]

Farnes is a postdoc fellow at the Oxford e-Research center and has previously

BackreactionCERN produces marketing video for new collider and it’s full of lies

The Large Hadron Collider (LHC) just completed its second run. Besides a few anomalies, there’s nothing new in the data. After the discovery of the Higgs-boson, there is also no good reason for why there should be something else to find, neither at the LHC nor at higher energies, not up until 15 orders of magnitude higher than what we can reach now. But of course there may be something, whether

John BaezSecond Symposium on Compositional Structures

I’ve been asleep at the switch; this announcement is probably too late for anyone outside the UK. But still, it’s great to see how applied category theory is taking off! And this conference is part of a series, so if you miss this one you can still go to the next.

Second Symposium on Compositional Structures (SYCO2), 17-18 December 2018, University of Strathclyde, Glasgow.

Accepted presentations


Please register asap so that catering can be arranged. Late registrants might go hungry.

Invited speakers

• Corina Cirstea, University of Southampton – Quantitative Coalgebras for Optimal Synthesis

• Martha Lewis, University of Amsterdam – Compositionality in Semantic Spaces


The Symposium on Compositional Structures (SYCO) is an interdisciplinary series of meetings aiming to support the growing community of researchers interested in the phenomenon of compositionality, from both applied and abstract perspectives, and in particular where category theory serves as a unifying common language. The first SYCO was held at the School of Computer Science, University of Birmingham, 20-21 September, 2018, attracting 70 participants.

We welcome submissions from researchers across computer science, mathematics, physics, philosophy, and beyond, with the aim of fostering friendly discussion, disseminating new ideas, and spreading knowledge between fields. Submission is encouraged for both mature research and work in progress, and by both established academics and junior researchers, including students.

Submission is easy, with no format requirements or page restrictions. The meeting does not have proceedings, so work can be submitted even if it has been submitted or published elsewhere.

While no list of topics could be exhaustive, SYCO welcomes submissions with a compositional focus related to any of the following areas, in particular from the perspective of category theory:

• logical methods in computer science, including classical and quantum programming, type theory, concurrency, natural language processing and machine learning;

• graphical calculi, including string diagrams, Petri nets and reaction networks;

• languages and frameworks, including process algebras, proof nets, type theory and game semantics;

• abstract algebra and pure category theory, including monoidal category theory, higher category theory, operads, polygraphs, and relationships to homotopy theory;

• quantum algebra, including quantum computation and representation theory;

• tools and techniques, including rewriting, formal proofs and proof assistants, and game theory;

• industrial applications, including case studies and real-world problem

This new series aims to bring together the communities behind many previous successful events which have taken place over the last decade, including “Categories, Logic and Physics”, “Categories, Logic and Physics (Scotland)”, “Higher-Dimensional Rewriting and Applications”, “String Diagrams in Computation, Logic and Physics”, “Applied Category Theory”, “Simons Workshop on Compositionality”, and the “Peripatetic Seminar in Sheaves and Logic”.

SYCO will be a regular fixture in the academic calendar, running regularly throughout the year, and becoming over time a recognized venue for presentation and discussion of results in an informal and friendly atmosphere. To help create this community, and to avoid the need to make difficult choices between strong submissions, in the event that more good-quality submissions are received than can be accommodated in the timetable, the programme committee may choose to defer some submissions to a future meeting, rather than reject them. This would be done based largely on submission order, giving an incentive for early submission, but would also take into account other requirements, such as ensuring a broad scientific programme. Deferred submissions would be accepted for presentation at any future SYCO meeting without the need for peer review. This will allow us to ensure that speakers have enough time to present their ideas, without creating an unnecessarily competitive reviewing process. Meetings would be held sufficiently frequently to avoid a backlog of deferred papers.


Programme committee

Ross Duncan, University of Strathclyde
Fabrizio Romano Genovese, Statebox and University of Oxford
Jules Hedges, University of Oxford
Chris Heunen, University of Edinburgh
Dominic Horsman, University of Grenoble
Aleks Kissinger, Radboud University Nijmegen
Eliana Lorch, University of Oxford
Guy McCusker, University of Bath
Samuel Mimram, École Polytechnique
Koko Muroya, RIMS, Kyoto University & University of Birmingham
Paulo Oliva, Queen Mary
Nina Otter, UCLA
Simona Paoli, University of Leicester
Robin Piedeleu, University of Oxford and UCL
Julian Rathke, University of Southampton
Bernhard Reus, University of Sussex
David Reutter, University of Oxford
Mehrnoosh Sadrzadeh, Queen Mary
Pawel Sobocinski, University of Southampton (chair)
Jamie Vicary, University of Birmingham and University of Oxford (co-chair)

December 04, 2018

John BaezGeometric Quantization (Part 1)

I can’t help thinking about geometric quantization. I feel it holds some lessons about the relation between classical and quantum mechanics that we haven’t fully absorbed yet. I want to play my cards fairly close to my chest, because there are some interesting ideas I haven’t fully explored yet… but still, there are also plenty of ‘well-known’ clues that I can afford to explain.

The first one is this. As beginners, we start by thinking of geometric quantization as a procedure for taking a symplectic manifold and constructing a Hilbert space: that is, taking a space of classical states and constructing the corresponding space of quantum states. We soon learn that this procedure requires additional data as its input: a symplectic manifold is not enough. We learn that it works much better to start with a Kähler manifold equipped with a holomorphic hermitian line bundle with a connection whose curvature is the imaginary part of the Kähler structure. Then the space of holomorphic sections of that line bundle gives the Hilbert space we seek.

That’s quite a mouthful—but it makes for such a nice story that I’d love to write a bunch of blog articles explaining it with lots of examples. Unfortunately I don’t have time, so try these:

• Matthias Blau, Symplectic geometry and geometric quantization.

• A. Echeverria-Enriquez, M.C. Munoz-Lecanda, N. Roman-Roy, C. Victoria-Monge, Mathematical foundations of geometric quantization.

But there’s a flip side to this story which indicates that something big and mysterious is going on. Geometric quantization is not just a procedure for converting a space of classical states into a space of quantum states. It also reveals that a space of quantum states can be seen as a space of classical states!

To reach this realization, we must admit that quantum states are not really vectors in a Hilbert space H; from a certain point of view they are really 1-dimensional subspaces of a Hilbert space, so the set of quantum states I’m talking about is the projective space PH. But this projective space, at least when it’s finite-dimensional, turns out to be the simplest example of that complicated thing I mentioned: a Kähler manifold equipped with a holomorphic hermitian line bundle whose curvature is the imaginary part of the Kähler structure!

So a space of quantum states is an example of a space of classical states—equipped with precisely all the complicated extra structure that lets us geometrically quantize it!

At this point, if you don’t already know the answer, you should be asking: and what do we get when we geometrically quantize it?

The answer is exciting only in that it’s surprisingly dull: when we geometrically quantize PH, we get back the Hilbert space H.
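The passage from H to PH can be made concrete in finite dimensions. The sketch below is an illustration only (it does not perform geometric quantization, and its function names are made up): it represents a ray by the projector |ψ⟩⟨ψ|/⟨ψ|ψ⟩ and checks that both this projector and the expectation-value function ⟨ψ|A|ψ⟩/⟨ψ|ψ⟩, which in the geometric formulation of quantum mechanics plays the role of a real-valued classical observable on PH, are unchanged when ψ is multiplied by a nonzero complex number.

```python
def projector(psi):
    """|psi><psi| / <psi|psi> as a nested list. It is unchanged when psi is multiplied by
    any nonzero complex number, so it depends only on the ray C psi, i.e. a point of PH."""
    norm2 = sum(abs(a) ** 2 for a in psi)
    return [[a * b.conjugate() / norm2 for b in psi] for a in psi]

def expectation(A, psi):
    """<psi|A|psi> / <psi|psi> for a Hermitian matrix A: a real-valued function on PH."""
    norm2 = sum(abs(a) ** 2 for a in psi)
    A_psi = [sum(A[i][j] * psi[j] for j in range(len(psi))) for i in range(len(psi))]
    return (sum(psi[i].conjugate() * A_psi[i] for i in range(len(psi))) / norm2).real

if __name__ == "__main__":
    psi = [1 + 0j, 1j]
    phi = [(2 - 3j) * a for a in psi]  # same ray, different vector
    sigma_y = [[0, -1j], [1j, 0]]      # a Hermitian observable on C^2
    diff = max(abs(a - b) for ra, rb in zip(projector(psi), projector(phi))
               for a, b in zip(ra, rb))
    print(diff, expectation(sigma_y, psi), expectation(sigma_y, phi))
```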

You may have heard of ‘second quantization’, where we take a quantum system, treat it as classical, and quantize it again. In the usual story of second quantization, the new quantum system we get is more complicated than the original one… and we can repeat this procedure again and again, and keep getting more interesting things:

• John Baez, Nth quantization.

The story I’m telling now is different. I’m saying that when we take a quantum system with Hilbert space H, we can think of it as a classical system whose symplectic manifold of states is PH, but then we can geometrically quantize this and get H back.

The two stories are not in contradiction, because they rely on two different notions of what it means to ‘think of a quantum system as classical’. In today’s story that means getting a symplectic manifold PH from a Hilbert space H. In the other story we use the fact that H itself is a symplectic manifold!

I should explain the relation of these two stories, but that would be a big digression from today’s intended blog article: indeed I’m already regretting having drifted off course. I only brought up this other story to heighten the mystery I’m talking about now: namely, that when we geometrically quantize the space PH, we get H back.

The math is not mysterious here; it’s the physical meaning of the math that’s mysterious. The math seems to be telling us that contrary to what they say in school, quantum systems are special classical systems, with the special property that when you quantize them nothing new happens!

This idea is not mine; it goes back at least to Kibble, the guy who with Higgs invented the method whereby the Higgs boson does its work:

• Tom W. B. Kibble, Geometrization of quantum mechanics, Comm. Math. Phys. 65 (1979), 189–201.

This led to a slow, quiet line of research that continues to this day. I find this particular paper especially clear and helpful:

• Abhay Ashtekar, Troy A. Schilling, Geometrical formulation of quantum mechanics, in On Einstein’s Path, Springer, Berlin, 1999, pp. 23–65.

so if you’re wondering what the hell I’m talking about, this is probably the best place to start. To whet your appetite, here’s the abstract:

Abstract. States of a quantum mechanical system are represented by rays in a complex Hilbert space. The space of rays has, naturally, the structure of a Kähler manifold. This leads to a geometrical formulation of the postulates of quantum mechanics which, although equivalent to the standard algebraic formulation, has a very different appearance. In particular, states are now represented by points of a symplectic manifold (which happens to have, in addition, a compatible Riemannian metric), observables are represented by certain real-valued functions on this space and the Schrödinger evolution is captured by the symplectic flow generated by a Hamiltonian function. There is thus a remarkable similarity with the standard symplectic formulation of classical mechanics. Features—such as uncertainties and state vector reductions—which are specific to quantum mechanics can also be formulated geometrically but now refer to the Riemannian metric—a structure which is absent in classical mechanics. The geometrical formulation sheds considerable light on a number of issues such as the second quantization procedure, the role of coherent states in semi-classical considerations and the WKB approximation. More importantly, it suggests generalizations of quantum mechanics. The simplest among these are equivalent to the dynamical generalizations that have appeared in the literature. The geometrical reformulation provides a unified framework to discuss these and to correct a misconception. Finally, it also suggests directions in which more radical generalizations may be found.
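The abstract's central claim can be checked in the simplest case, a single qubit, where H = C² and PH is the Bloch sphere. The sketch below is a minimal illustration. It assumes ħ = 1 and the Hamiltonian H = (ω/2)σ_z, and the function names are hypothetical. It checks two things. First, the map from states to points of PH ignores overall phase. Second, Schrödinger evolution of the state matches a classical rotation of the sphere about the z axis. That rotation is the Hamiltonian flow of the expectation value ⟨H⟩ = (ω/2)z, up to the normalization and orientation conventions chosen for the symplectic form.

```python
import cmath
import math


def bloch_vector(a, b):
    """Map a normalized qubit state a|0> + b|1> to a point on the Bloch sphere.

    Multiplying (a, b) by a common phase leaves the result unchanged, so this
    is really a function on the projective space PH = CP^1.
    """
    ab = a.conjugate() * b
    return (2 * ab.real, 2 * ab.imag, abs(a) ** 2 - abs(b) ** 2)


def schrodinger_evolve(a, b, omega, t):
    """Exact evolution for time t under H = (omega / 2) * sigma_z, with hbar = 1."""
    return a * cmath.exp(-0.5j * omega * t), b * cmath.exp(0.5j * omega * t)


def classical_rotate(r, omega, t):
    """Rotate a point of the sphere about the z axis by the angle omega * t."""
    x, y, z = r
    c, s = math.cos(omega * t), math.sin(omega * t)
    return (c * x - s * y, s * x + c * y, z)


if __name__ == "__main__":
    a = complex(math.cos(0.3))
    b = math.sin(0.3) * cmath.exp(0.5j)
    omega, t = 2.0, 0.7

    quantum = bloch_vector(*schrodinger_evolve(a, b, omega, t))
    classical = classical_rotate(bloch_vector(a, b), omega, t)
    print(quantum)
    print(classical)  # agrees with the line above up to rounding
```

Nothing here quantizes anything. It only shows that the quantum dynamics, viewed on PH, looks like an ordinary classical Hamiltonian flow.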

Personally I’m not interested in the generalizations of quantum mechanics: I’m more interested in what this circle of ideas means for quantum mechanics.

One rather cynical thought is this: when we start our studies with geometric quantization, we naively hope to extract a space of quantum states from a space of classical states, e.g. a symplectic manifold. But we then discover that to do this in a systematic way, we need to equip our symplectic manifold with lots of bells and whistles. Should it really be a surprise that when we’re done, the bells and whistles we need are exactly what a space of quantum states has?

I think this indeed dissolves some of the mystery. It’s a bit like the parable of ‘stone soup’: you can make a tasty soup out of just a stone… if you season it with some vegetables, some herbs, some salt and such.

However, perhaps because by nature I’m an optimist, I also think there are interesting things to be learned from the tight relation between quantum and classical mechanics that appears in geometric quantization. And I hope to talk more about those in future articles.

John Preskill: Theoretical physics has not gone to the dogs.

I was surprised to learn, last week, that my profession has gone to the dogs. I’d introduced myself to a nonscientist as a theoretical physicist.

“I think,” he said, “that theoretical physics has lost its way in symmetry and beauty and math. It’s too far from experiments to be science.”

The accusation triggered an identity crisis. I lost my faith in my work, bit my nails to the quick, and enrolled in workshops about machine learning and Chinese.

Or I might have, if all theoretical physicists pursued quantum gravity.

Quantum-gravity physicists attempt to reconcile two physical theories, quantum mechanics and general relativity. Quantum theory manifests on small length scales, such as atoms’ and electrons’. General relativity manifests in massive systems, such as the solar system. A few settings unite smallness with massiveness, such as black holes and the universe’s origin. Understanding these settings requires a unification of quantum theory and general relativity.

Try to unify the theories, and you’ll find yourself writing equations that contain infinities. Such infinities can’t describe physical reality, but they’ve withstood decades of onslaughts. For guidance, many quantum-gravity theorists appeal to mathematical symmetries. Symmetries, they reason, helped 20th-century particle theorists predict experimental outcomes with accuracies better than any achieved with any other scientific theory. Perhaps symmetries can extend particle physics to a theory of quantum gravity.

Some physicists have criticized certain approaches to quantum gravity, certain approaches to high-energy physics more generally, and the high-energy community’s philosophy and sociology. Much criticism has centered on string theory, according to which our space-time has up to 26 dimensions, most too small for you to notice. Critics include Lee Smolin, the author of The Trouble with Physics, Peter Woit, who blogs on Not Even Wrong, and Sabine Hossenfelder, who published Lost in Math this year. This article contains no criticism of their crusade. I see merit in arguments of theirs, as in arguments of string theorists.

Science requires criticism to progress. So thank goodness that Smolin, Woit, Hossenfelder, and others are criticizing string theory. Thank goodness that the criticized respond. Thank goodness that debate rages, like the occasional wildfire needed to maintain a forest’s health.

The debate might appear to impugn the integrity of theoretical physics. But quantum gravity constitutes one pot in the greenhouse of theoretical physics. Theoretical physicists study lasers, star formation, atomic clocks, biological cells, gravitational waves, artificial materials, and more. Theoretical physicists are explaining, guiding, and collaborating on experiments. So many successes have piled up recently, I had trouble picking examples for this article. 

I’ve blogged before about one example: fluctuation relations. These equalities generalize the second law of thermodynamics, which illuminates why time flows in just one direction. Fluctuation relations also provide a route to measuring free-energy differences, energetic quantities applied in pharmacology, biology, and chemistry. Experimentalists have shown, over the past 15 years, that fluctuation relations govern RNA, DNA, electronic systems, and trapped ions (electrically charged atoms). 
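One of the best-known fluctuation relations is the Jarzynski equality, ⟨e^(−βW)⟩ = e^(−βΔF). It turns repeated measurements of the work W done in a nonequilibrium process into an estimate of the free-energy difference ΔF. The sketch below is a minimal illustration with a hypothetical function name. It assumes a Gaussian work distribution, chosen only because its exact answer, ΔF = μ − βσ²/2, is known in closed form. Real experiments use measured work values instead of simulated ones. Applying Jensen's inequality to the same equality gives ⟨W⟩ ≥ ΔF, which is the second-law statement that fluctuation relations generalize.

```python
import math
import random


def jarzynski_delta_f(works, beta):
    """Estimate the free-energy difference from work samples.

    Uses the Jarzynski equality <exp(-beta W)> = exp(-beta * dF), i.e.
    dF = -(1/beta) * log(mean(exp(-beta * W))), evaluated with a shift by the
    smallest work value so the exponentials never overflow.
    """
    w_min = min(works)
    total = sum(math.exp(-beta * (w - w_min)) for w in works)
    return w_min - math.log(total / len(works)) / beta


if __name__ == "__main__":
    random.seed(0)
    beta, mu, sigma = 1.0, 2.0, 1.0
    # Hypothetical Gaussian work distribution; for it, dF = mu - beta * sigma**2 / 2 exactly.
    works = [random.gauss(mu, sigma) for _ in range(100_000)]
    mean_work = sum(works) / len(works)
    estimate = jarzynski_delta_f(works, beta)
    print(f"mean work    = {mean_work:.3f}")  # second law: <W> >= dF
    print(f"estimated dF = {estimate:.3f}")
    print(f"exact dF     = {mu - beta * sigma**2 / 2:.3f}")
```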

Second, experimentalists are exercising, over quantum systems, control that physicists didn’t dream of decades ago. Harvard physicists can position over 50 atoms however they please, using tweezers formed from light. Google has built a noisy quantum computer of 72 superconducting qubits, circuits through which charge flows without resistance. Trapped ions, defects in diamond, photonic systems, and topological materials are also breaking barriers. These experiments advance partially due to motivation from theorists and partially through collaborations with theorists. In turn, experimental data guide theorists’ explanations and our proposals of experiments.

In one example, theorists teamed with experimentalists to probe quantum correlations spread across space and time. In another example, theorists posited a mechanism by which superconducting qubits interact with a hot environment. Other illustrations from the past five years include discrete time crystals, many-body scars, magic-angle materials, and quantum chaos. 

These collaborations even offer hope for steering quantum gravity with experiments. Certain quantum-gravity systems share properties with certain many-particle quantum systems. We call this similarity “the AdS/CFT duality.” Experimentalists have many-particle quantum systems and are stretching those systems toward the AdS/CFT regime. Experimental results, with the duality, might illuminate where quantum-gravity theorists should and shouldn’t search. Perhaps no such experiments will take place for decades. Perhaps AdS/CFT can’t shed light on our universe. But theorists and experimentalists are partnering to try.

These illustrations demonstrate that theoretical physics, on the whole, remains healthy, grounded, and thriving. This thriving is failing to register with part of the public. Evidence thwacked me in the face last week, as explained at the start of this article. The Wall Street Journal published another example last month: John Horgan wrote that “physics, which should serve as the bedrock of science, is in some respects the most troubled field of” science. The evidence presented consists of one neighborhood in the theoretical fraction of the metropolis of physics: string and multiverse models.

Horgan’s article reflects decades of experience in science journalism, a field I respect. I sympathize, moreover, with those who interface so much with quantum gravity that the subfield appears to eclipse the rest of theoretical physics. Horgan was reviewing books by Stephen Hawking and Martin Rees, who discuss string and multiverse models. Smolin, Woit, Hossenfelder, and others garner much press, which they deserve: They provoke debate and articulate their messages eloquently. Such press can blot out, say, profiles of the theoretical astrophysicists licking their lips over gravitational-wave data.

If any theory bears flaws, those flaws need correcting. But most theoretical physicists don’t pursue quantum gravity, let alone string theory. Any flaws of string theory do not mar all theoretical physics. These points need a megaphone, because misconceptions about theoretical physics endanger society. First, companies need workers who have technical skills and critical reasoning. Both come from training in theoretical physics. Besmirching theoretical physics can divert students from programs that can benefit the economy and nurture thoughtful citizens.¹

Second, some nonscientists are attempting to discredit the scientific community for political gain. Misconceptions about theoretical physics can appear to support these nonscientists’ claims. The ensuing confusion can lead astray voters and parents who face choices about vaccination, global health, national security, and budget allocations.

Last week, I heard that my profession has wandered too far from experiments. Hours earlier, I’d skyped with an experimentalist with whom I’m collaborating. A disconnect separates the reality of theoretical physicists from impressions harbored by part of the public. Let’s clear up the misconceptions. Theoretical physics, as a whole, remains healthy, grounded, and thriving.



¹ Nurturing thoughtful citizens also takes humanities, social-science, language, and arts programs.

December 02, 2018

Doug Natelson: Late Thanksgiving physics: Split peas and sandcastles

Last week, when I was doing some cooking for the US Thanksgiving holiday, I was making a really good vegetarian side dish (seriously, try it), and I saw something that I thought was pretty remarkable, and it turns out that a Nature paper had been written about it.

The recipe involves green split peas, and the first step is to rinse these little dried lozenge-shaped particles (maybe 4 mm in diameter, maybe 2 mm thick) in water to remove any excess dust or starch.  So, I put the dried peas in a wire mesh strainer, rinsed them with running water, and dumped them into a saucepan.  Unsurprisingly, the wet split peas remained stuck together in a hemispherical shape that exactly mimicked the contours of the strainer.  This is a phenomenon familiar to anyone who has ever built a sandcastle - wet particulates adhere together.  

The physics behind this adhesion is surface tension.  Because water molecules have an attractive interaction with each other, in the absence of any other interactions, liquid water will settle into a shape that minimizes the area of the water-vapor interface.  That's why water forms spherical blobs in microgravity.  It costs about 72 mJ of energy per square meter (72 mJ/m²) to create water-air interface.  Forming a water-split pea interface is comparatively favored energetically, because of attractive interactions between the polar water molecules and the mostly cellulose split pea surface.  

For a sense of scale, creating water-air interface with the area of one split pea (surface area roughly 2.5e-5 m²) would take about 2 microjoules of energy (0.072 J/m² × 2.5e-5 m² ≈ 1.8e-6 J).  The mass of the split pea half I'm considering, assuming a density similar to water, is around 25 mg = 2.5e-5 kg.  So, lifting such a split pea by about its own height (2 mm) requires an energy of \(mgh \sim\) 2.5e-5*9.807*2e-3 ≈ 0.5 microjoules.  The fact that this is comparable to (but smaller than) the surface energy of the water-air interface of a wet split pea tells you that you should not be surprised that water coatings can hold wet split peas up against the force of gravity.
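The order-of-magnitude comparison above is easy to redo in a few lines. This sketch uses the post's numbers and takes the lift height as the pea's thickness, about 2 mm. The cylinder model of the pea is an added assumption: a solid, water-density disk 4 mm across and 2 mm thick. It is there only to check the quoted 25 mg mass. The constant and function names are illustrative.

```python
import math

# Numbers quoted in the post; the cylinder model used for the mass check is an assumption.
GAMMA_WATER = 0.072  # surface energy of the water-air interface, J/m^2
AREA = 2.5e-5        # surface area of one split pea, m^2
MASS = 2.5e-5        # mass of one split pea, kg (about 25 mg)
G = 9.807            # gravitational acceleration, m/s^2
HEIGHT = 2e-3        # lift height, roughly the pea's thickness, m


def surface_energy(gamma, area):
    """Energy needed to create a water-air interface of the given area."""
    return gamma * area


def lift_energy(mass, g, height):
    """Gravitational potential energy m g h."""
    return mass * g * height


def cylinder_mass(diameter, thickness, density=1000.0):
    """Mass of a disk-shaped pea, treated as a solid cylinder of the given density."""
    return density * math.pi * (diameter / 2) ** 2 * thickness


if __name__ == "__main__":
    e_surf = surface_energy(GAMMA_WATER, AREA)
    e_lift = lift_energy(MASS, G, HEIGHT)
    print(f"surface energy ~ {e_surf * 1e6:.1f} uJ")
    print(f"lift energy    ~ {e_lift * 1e6:.2f} uJ")
    print(f"ratio          ~ {e_surf / e_lift:.1f}")
    print(f"cylinder mass  ~ {cylinder_mass(4e-3, 2e-3) * 1e6:.0f} mg")
```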

What I saw next surprised me: even as I started adding the 3.5 cups of water mentioned in the recipe, the hemispherical split pea "sandcastle" stayed together, even when I prodded it with a cooking spoon.  A few minutes of internet search confirmed that this effect is surprising enough to merit its own Nature Materials paper, with its own News and Views article. The transition from cohering wet grains to a flowing slurry turns out to happen at really high water fractions.  Neat physics, and the richness of a system as simple as grains/beads, water, and air is impressive.

November 30, 2018

Backreaction: Do women in physics get fewer citations than men?

Yesterday, I gave a seminar about the results of a little side project that I did with two collaborators, Tobias Mistele and Tom Price. We analyzed publication data in some sub-disciplines of physics and looked for differences in citations to papers with male and female authors. This was to follow up on the previously noted discrepancy between the arXiv data and the Inspire data that we found

Matt von Hippel: Pan Narrans Scientificus

As scientists, we want to describe the world as objectively as possible. We try to focus on what we can establish conclusively, to leave out excessive speculation and stick to cold, hard facts.

Then we have to write application letters.

Stick to the raw, un-embellished facts, and an application letter would just be a list: these papers in these journals, these talks and awards. Though we may sometimes wish applications worked that way, we don’t live in that kind of world. To apply for a job or a grant, we can’t just stick to the most easily measured facts. We have to tell a story.

The author Terry Pratchett called humans Pan Narrans, the Storytelling Ape. Stories aren’t just for fun, they’re how we see the world, how we organize our perceptions and actions. Without a story, the world doesn’t make sense. And that applies even to scientists.

Applications work best when they tell a story: how did you get here, and where are you going? Scientific papers, similarly, require some sort of narrative: what did you do, and why did you do it? When teaching or writing about science, we almost never just present the facts. We try to fit it into a story, one that presents the facts but also makes sense, in that deliciously human way. A story, more than mere facts, lets us project to the future, anticipating what you’ll do with that grant money or how others will take your research in new directions.

It’s important to remember, though, that stories aren’t actually facts. You can’t get too attached to one story, you have to be willing to shift as new facts come in. Those facts can be scientific measurements, but they can also be steps in your career. You aren’t going to tell the same story when applying to grad school as when you’re trying for tenure, and that’s not just because you’ll have more to tell. The facts of your life will be organized in new ways, rearranging in importance as the story shifts.

Keep your stories in mind as you write or do science. Think about your narrative, the story you’re using to understand the world. Think about what it predicts, how the next step in the story should go. And be ready to start a new story when you need to.

November 28, 2018

John Baez: Stratospheric Controlled Perturbation Experiment

I have predicted for a while that as the issue of climate change becomes ever more urgent, the public attitude regarding geoengineering will at some point undergo a phase transition. For a long time it seems the general attitude has been that deliberately interfering with the Earth’s climate on a large scale is “unthinkable”: beyond the pale. I predict that at some point this will flip and the general attitude will become: “how soon can we do it?”

The danger then is that we rush headlong into something untested that we’ll regret.

For a while I’ve been advocating research in geoengineering, to prevent a big mistake like this. Those who consider it “unthinkable” often object to such research, but I think preventing research is not a good long-term policy. I think it actually makes it more likely that at some point, when enough people become really desperate about climate change, we will do something rash without enough information about the possible effects.

Anyway, one can argue about this all day: I can see the arguments for both sides. But here is some news: scientists will soon study how calcium carbonate disperses when you dump a little into the atmosphere:

First sun-dimming experiment will test a way to cool Earth, Nature, 27 November 2018.

It’s a good article—read it! Here’s the key idea:

If all goes as planned, the Harvard team will be the first in the world to move solar geoengineering out of the lab and into the stratosphere, with a project called the Stratospheric Controlled Perturbation Experiment (SCoPEx). The first phase — a US$3-million test involving two flights of a steerable balloon 20 kilometres above the southwest United States — could launch as early as the first half of 2019. Once in place, the experiment would release small plumes of calcium carbonate, each of around 100 grams, roughly equivalent to the amount found in an average bottle of off-the-shelf antacid. The balloon would then turn around to observe how the particles disperse.

The test itself is extremely modest. Dai, whose doctoral work over the past four years has involved building a tabletop device to simulate and measure chemical reactions in the stratosphere in advance of the experiment, does not stress about concerns over such research. “I’m studying a chemical substance,” she says. “It’s not like it’s a nuclear bomb.”

Nevertheless, the experiment will be the first to fly under the banner of solar geoengineering. And so it is under intense scrutiny, including from some environmental groups, who say such efforts are a dangerous distraction from addressing the only permanent solution to climate change: reducing greenhouse-gas emissions. The scientific outcome of SCoPEx doesn’t really matter, says Jim Thomas, co-executive director of the ETC Group, an environmental advocacy organization in Val-David, near Montreal, Canada, that opposes geoengineering: “This is as much an experiment in changing social norms and crossing a line as it is a science experiment.”

Aware of this attention, the team is moving slowly and is working to set up clear oversight for the experiment, in the form of an external advisory committee to review the project. Some say that such a framework, which could pave the way for future experiments, is even more important than the results of this one test. “SCoPEx is the first out of the gate, and it is triggering an important conversation about what independent guidance, advice and oversight should look like,” says Peter Frumhoff, chief climate scientist at the Union of Concerned Scientists in Cambridge, Massachusetts, and a member of an independent panel that has been charged with selecting the head of the advisory committee. “Getting it done right is far more important than getting it done quickly.”

For more on SCoPEx, including a FAQ, go here:

Stratospheric Controlled Perturbation Experiment (SCoPEx), Keutsch Group, Harvard.

November 25, 2018

Jordan Ellenberg: On not staying in your lane

This week I’ve been thinking about some problems outside my usual zone of expertise — namely, questions about the mapping class group and the Johnson kernel.  This has been my week:

  • Three days of trying to prove a cohomology class is nonzero;
  • Then over Thanksgiving I worked out an argument that it was zero and was confused about that for a couple of days because I feel quite deeply that it shouldn’t be zero;
  • This morning I was able to get myself to a kind of philosophical peace with the class being zero and was working out which refined version of the class might not be zero;
  • This afternoon I was able to find the mistake in my argument that the class was zero so now I hope it’s not zero again.
  • But I still don’t know.

There’s a certain frustration, knowing that I’ve spent a week trying to compute something which some decently large number of mathematicians could probably sit down and just do, because they know their way around this landscape.  But on the other hand, I would never want to give up the part of math research that involves learning new things as if I were a grad student.  It is not the most efficient way, in the short term, to figure out whether this class is zero or not, but I think it probably helps me do math better in a global sense that I spend some of my weeks stumbling around unfamiliar rooms in the dark.  Of course I might just be rationalizing something I enjoy doing.  Even if it’s frustrating.  Man, I hope that class isn’t zero.

Doug Natelson: Fundamental units and condensed matter

As was discussed in many places over the last two weeks, the official definition of the kilogram has now been changed, to a version directly connected to Planck's constant, \(h\).  The NIST description of this is very good, and I am unlikely to do better.  Through the use of a special type of balance (a Kibble or watt balance), the mass can be related back to \(h\) via the dissipation of electrical power in the form of \(V^{2}/R\).  A point that I haven't seen anyone emphasize in their coverage:  Both the volt and the ohm are standardized in terms of condensed matter phenomena - there is a deep, profound connection between emergent condensed matter effects and our whole fundamental set of units (a link that needs to be updated to include the new definition of kg).
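A Kibble balance links mass to electrical quantities by working in two modes. In weighing mode, a current \(I\) through a coil of wire length \(L\) in a magnetic field \(B\) holds the mass up, so \(mg = BLI\). In moving mode, the same coil is driven through the field at speed \(v\), and the induced voltage is \(V = BLv\). Eliminating the hard-to-measure product \(BL\) gives \(mgv = VI\): electrical power equals mechanical power. Below is a minimal sketch of that bookkeeping with hypothetical function names. The numerical readings are made up for illustration. In practice the current is measured as a voltage across a calibrated resistor, which is why the electrical side takes a \(V^{2}/R\)-like form.

```python
def kibble_mass(voltage, current, g, velocity):
    """Mass from the Kibble balance relation m g v = V I.

    voltage: induced coil voltage in moving mode (V)
    current: coil current in weighing mode (A)
    g: local gravitational acceleration (m/s^2)
    velocity: coil speed in moving mode (m/s)
    """
    return voltage * current / (g * velocity)


def kibble_mass_from_resistor(voltage, resistor_voltage, resistance, g, velocity):
    """Same relation, with the current inferred as I = V_R / R across a resistor R."""
    return kibble_mass(voltage, resistor_voltage / resistance, g, velocity)


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    m = kibble_mass_from_resistor(voltage=1.0, resistor_voltage=1.0, resistance=100.0,
                                  g=9.8, velocity=2e-3)
    print(f"m = {m:.4f} kg")
```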

Voltage \(V\) is standardized in terms of the Josephson effect.  In a superconductor, electrons pair up and condense into a quantum state that is described by a complex number called the order parameter, with a magnitude and a phase.  The magnitude is related to the density of pairs.  The phase is related to the coherent response of all the pairs, and only takes on a well-defined value below the superconducting transition.  In a junction between superconductors (say a thin tunneling barrier of insulator), a dc voltage difference between the two sides causes the phase to "wind" as a function of time, leading to an ac current with a frequency of \(2eV/h\).  Alternatively, applying an ac voltage of known frequency \(f\) can generate a dc voltage at integer multiples of \(h f/2e\).  The superconducting phase is an emergent quantity, well defined only when the number of pairs is large.
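Both relations in this paragraph come down to the single constant \(2e/h\), the Josephson constant. The revised SI fixes \(e\) and \(h\) exactly, so a few lines reproduce the numbers that make the effect such a good voltage standard. A dc volt corresponds to roughly 484 THz of oscillation, and a gigahertz drive gives steps of tens of microvolts. The function names are hypothetical, and the 10 GHz drive frequency is just an example.

```python
# Exact values of e and h in the revised SI.
E_CHARGE = 1.602176634e-19  # elementary charge, C
H_PLANCK = 6.62607015e-34   # Planck constant, J s


def josephson_frequency(voltage):
    """ac Josephson frequency f = 2 e V / h for a dc voltage V across the junction."""
    return 2 * E_CHARGE * voltage / H_PLANCK


def shapiro_step_voltage(n, frequency):
    """dc voltage of the n-th step, V_n = n h f / (2 e), under ac drive at frequency f."""
    return n * H_PLANCK * frequency / (2 * E_CHARGE)


if __name__ == "__main__":
    print(f"2e/h = {josephson_frequency(1.0) / 1e9:.4f} GHz per volt")
    print(f"first step at 10 GHz drive: {shapiro_step_voltage(1, 10e9) * 1e6:.3f} uV")
```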

The ohm (\(\Omega\)) is standardized in terms of the integer quantum Hall effect.  Electrons confined to a relatively clean 2D layer and placed in a large magnetic field show plateaus in the Hall resistance (the ratio of transverse voltage to longitudinal current) at values \(h/(ie^{2})\), where \(i\) is an integer; equivalently, the Hall conductance is quantized at integer multiples of \(e^{2}/h\).  The reason for picking out those particular values is deeply connected to topology, and is independent of the details of the material system.  You can see the integer QHE in many systems, one reason why it's good to use as a standard.  The existence of the plateaus, and therefore really accurate quantization, in actual measurements of the Hall conductance requires disorder.  Precise Hall quantization is likewise an emergent phenomenon.
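The plateau values are fixed by fundamental constants alone. The Hall resistance on the \(i\)-th plateau is \(R_K/i\), where \(R_K = h/e^{2}\) is the von Klitzing constant, so the Hall conductance comes in integer multiples of \(e^{2}/h\). This minimal sketch computes the first few plateaus from the exact SI values of \(e\) and \(h\). The function name is hypothetical.

```python
# Exact values of e and h in the revised SI.
E_CHARGE = 1.602176634e-19  # elementary charge, C
H_PLANCK = 6.62607015e-34   # Planck constant, J s

R_K = H_PLANCK / E_CHARGE**2  # von Klitzing constant h/e^2, ohms


def hall_plateau_resistance(i):
    """Hall resistance on the i-th integer quantum Hall plateau, R_K / i."""
    if i < 1:
        raise ValueError("plateau index must be a positive integer")
    return R_K / i


if __name__ == "__main__":
    for i in range(1, 5):
        print(f"i = {i}: R_xy = {hall_plateau_resistance(i):.3f} ohm")
```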

Interesting that the fundamental definition of the kilogram is deeply connected to two experimental phenomena that are only quantized to high precision because they emerge in condensed matter.

November 23, 2018

Matt von Hippel: How to Get a “Minimum Scale” Without Pixels

Zoom in, and the world gets stranger. Down past atoms, past protons and neutrons, far past the smallest scales we can probe at the Large Hadron Collider, we get to the scale at which quantum gravity matters: the Planck scale.

Weird things happen at the Planck scale. Space and time stop making sense. Read certain pop science articles, and they’ll tell you the Planck scale is the smallest scale, the scale where space and time are quantized, the “pixels of the universe”.

That last sentence, by the way, is not actually how the Planck scale works. In fact, there’s pretty good evidence that the universe doesn’t have “pixels”, that space and time are not quantized in that way. Even very tiny pixels would change the speed of light, making it different for different colors. Tiny effects like that add up, and astronomers would almost certainly have noticed an effect from even Planck-scale pixels. Unless your idea of “pixels” is fairly unusual, it’s already been ruled out.
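A back-of-envelope sketch shows why "tiny effects add up". It assumes a toy model in which pixelation makes a photon of energy E travel at c(1 − E/E_P), where E_P ≈ 1.22 × 10^19 GeV is the Planck energy. Real proposals differ in form and size, cosmological expansion is ignored, and the photon energy and travel time are illustrative. The suppression factor is about 10^−18. Over a billion years of travel, that still becomes a delay of a few hundredths of a second between high- and low-energy photons. This is the kind of spread astronomers look for in gamma-ray bursts.

```python
SECONDS_PER_YEAR = 3.15576e7  # Julian year, s
PLANCK_ENERGY_GEV = 1.22e19   # Planck energy, approximately, GeV


def arrival_delay(photon_energy_gev, travel_time_years, scale_gev=PLANCK_ENERGY_GEV):
    """Extra travel time for a photon slowed to c * (1 - E / E_scale).

    Toy model only: a linear, energy-dependent speed of light, with cosmological
    expansion ignored. To first order, delay = (E / E_scale) * (distance / c).
    """
    return (photon_energy_gev / scale_gev) * travel_time_years * SECONDS_PER_YEAR


if __name__ == "__main__":
    delay = arrival_delay(photon_energy_gev=10.0, travel_time_years=1e9)
    print(f"10 GeV photon after a billion years: delayed by about {delay:.3f} s")
```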

If the Planck scale isn’t the scale of the “pixels of the universe”, why do people keep saying it is?

Part of the problem is that the real story is vaguer. We don’t know what happens at the Planck scale. It’s not just that we don’t know which theory of quantum gravity is right: we don’t even know what different quantum gravity proposals predict. People are trying to figure it out, and there are some more or less viable ideas, but ultimately all we know is that at the Planck scale our description of space-time should break down.

“Our description breaks down” is unfortunately not very catchy. Certainly, it’s less catchy than “pixels of the universe”. Part of the problem is that most people don’t know what “our description breaks down” actually means.

So if that’s the part that’s puzzling you, maybe an example would help. This won’t be the full answer, though it could be part of the story. What it will be is an example of what “our description breaks down” can actually mean, how there can be a scale beyond which space-time stops making sense without there being “pixels”.

The example comes from string theory, from a concept called “T duality”. In string theory, “extra” dimensions beyond our usual three space and one time are curled up small, so that traveling along them just gets you back where you started. Instead of particles, there are strings, with length close to the Planck length.

Picture a loop of string in a small extra dimension. What can it do?


One thing it can do is move along the extra dimension. Since it has to end up back where it started, it can’t just move at any speed it wants. It turns out that the smaller the extra dimension, the more energy the string has when it spins around it.

The other thing it can do is wrap around the extra dimension. If it wraps around, the string has more energy if the dimension is larger, like a rubber band stretched around a pipe.

The string can do either or both of these multiple times. It can wrap many times around the extra dimension, or move in a quicker circle around it, or both at once. And if you calculate the energy of these combinations, you notice something: a string wound around a big circle has the same energy as a string moving around a small circle. In particular, you get the same energy on a circle of radius R, and a circle of radius l^2/R, where l is the length of the string.
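The energy matching is easy to check in a toy version of the calculation. Following the description above, momentum around the circle costs energy growing like 1/R, and winding costs energy growing like R. The squared mass of a state with momentum number n and winding number w is taken to be (n/R)² + (wR/l²)². This assumes units with ħ = c = 1 and ignores the string's oscillations. The sketch below, with hypothetical function names, compares the resulting spectra for radius R and radius l²/R.

```python
import itertools
import math


def squared_masses(radius, string_length, max_quantum=3):
    """Sorted squared masses (n/R)^2 + (w R / l^2)^2 for momentum n and winding w.

    Toy version of a closed-string spectrum on a circle of radius R, in units
    hbar = c = 1, ignoring oscillator excitations; l is the string length.
    """
    quanta = range(-max_quantum, max_quantum + 1)
    return sorted(
        (n / radius) ** 2 + (w * radius / string_length**2) ** 2
        for n, w in itertools.product(quanta, repeat=2)
    )


def spectra_match(r1, r2, string_length, max_quantum=3):
    """True if circles of radius r1 and r2 give the same list of squared masses."""
    a = squared_masses(r1, string_length, max_quantum)
    b = squared_masses(r2, string_length, max_quantum)
    return all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(a, b))


if __name__ == "__main__":
    l, R = 1.0, 0.37
    print(spectra_match(R, l**2 / R, l))  # dual radius: same spectrum
    print(spectra_match(R, 2 * R, l))     # unrelated radius: different spectrum
```

Swapping R for l²/R turns every momentum term into a winding term and vice versa, so the two lists agree term by term once n and w are relabeled.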

It turns out it’s not just the energy that’s the same: for everything that happens on a circle of radius R, there’s a matching description with a circle of radius l^2/R, with wrapping and moving swapped. We say that the two descriptions are dual: two seemingly different pictures that turn out to be completely physically indistinguishable.

Since the two pictures are indistinguishable, it doesn’t actually make sense to talk about dimensions smaller than the length of the string. It’s not that they can’t exist, or that they’re smaller than the “pixels of the universe”: it’s just that any description you write down of such a small dimension could just as easily have been of a larger, dual dimension. It’s that your picture, of one obvious size of the curled up dimension, broke down and stopped making sense.

As I mentioned, this isn’t the whole picture of what happens at the Planck scale, even in string theory. It is an example of a broader idea that string theorists are investigating, that in order to understand space-time at the smallest scales you need to understand many different dual descriptions. And hopefully, it’s something you can hold in your mind, a specific example of what “our description breaks down” can actually mean in practice, without pixels.

November 22, 2018

Sean Carroll: Thanksgiving

This year we give thanks for an historically influential set of celestial bodies, the moons of Jupiter. (We’ve previously given thanks for the Standard Model Lagrangian, Hubble’s Law, the Spin-Statistics Theorem, conservation of momentum, effective field theory, the error bar, gauge symmetry, Landauer’s Principle, the Fourier Transform, Riemannian Geometry, the speed of light, and the Jarzynski equality.)

For a change of pace this year, I went to Twitter and asked for suggestions for what to give thanks for in this annual post. There were a number of good suggestions, but two stood out above the rest: @etandel suggested Noether’s Theorem, and @OscarDelDiablo suggested the moons of Jupiter. Noether’s Theorem, according to which symmetries imply conserved quantities, would be a great choice, but in order to actually explain it I should probably first explain the principle of least action. Maybe some other year.

And to be precise, I’m not going to bother to give thanks for all of Jupiter’s moons. 78 Jovian satellites have been discovered thus far, and most of them are just lucky pieces of space debris that wandered into Jupiter’s gravity well and never escaped. It’s the heavy hitters — the four Galilean satellites — that we’ll be concerned with here. They deserve our thanks, for at least three different reasons!

Reason One: Displacing Earth from the center of the Solar System

Galileo discovered the four largest moons of Jupiter — Io, Europa, Ganymede, and Callisto — back in 1610, and wrote about his findings in Sidereus Nuncius (The Starry Messenger). They were the first celestial bodies to be discovered using that new technological advance, the telescope. But more importantly for our present purposes, it was immediately obvious that these new objects were orbiting around Jupiter, not around the Earth.

All this was happening not long after Copernicus had published his heliocentric model of the Solar System in 1543, offering an alternative to the prevailing Ptolemaic geocentric model. Both models were pretty good at fitting the known observations of planetary motions, and both required an elaborate system of circular orbits and epicycles — the realization that planetary orbits should be thought of as ellipses didn’t come along until Kepler published Astronomia Nova in 1609. As everyone knows, the debate over whether the Earth or the Sun should be thought of as the center of the universe was a heated one, with the Roman Catholic Church prohibiting Copernicus’s book in 1616, and the Inquisition putting Galileo on trial in 1633.

Strictly speaking, the existence of moons orbiting Jupiter is equally compatible with a heliocentric or geocentric model. After all, there’s nothing wrong with thinking that the Earth is the center of the Solar System, but that other objects can have satellites. However, the discovery brought about an important psychological shift. Sure, you can put the Earth at the center and still allow for satellites around other planets. But a big part of the motivation for putting Earth at the center was that the Earth wasn’t “just another planet.” It was supposed to be the thing around which everything else moved. (Remember that we didn’t have Newtonian mechanics at the time; physics was still largely an Aristotelian story of natures and purposes, not a bunch of objects obeying mindless differential equations.)

The Galilean moons changed that. If other objects have satellites, then Earth isn’t that special. And if it’s not that special, why have it at the center of the universe? Galileo offered up other arguments against the prevailing picture, from the phases of Venus to mountains on the Moon, and of course once Kepler’s ellipses came along the whole thing made much more mathematical sense than Ptolemy’s epicycles. Thus began one of the great revolutions in our understanding of our place in the cosmos.

Reason Two: Measuring the speed of light

Time is what clocks measure. And a clock, when you come right down to it, is something that does the same thing over and over again in a predictable fashion with respect to other clocks. That sounds circular, but it’s a nontrivial fact about our universe that it is filled with clocks. And some of the best natural clocks are the motions of heavenly bodies. As soon as we knew about the moons of Jupiter, scientists realized that they had a new clock to play with: by accurately observing the positions of all four moons, you could work out what time it must be. Galileo himself proposed that such observations could be used by sailors to determine their longitude, a notoriously difficult problem.

Danish astronomer Ole Rømer noted a puzzle when trying to use eclipses of Io to measure time: despite the fact that the orbit should be an accurate clock, the actual timings seemed to change with the time of year. Being a careful observational scientist, he deduced that the period between eclipses was longer when the Earth was moving away from Jupiter, and shorter when the two planets were drawing closer together. An obvious explanation presented itself: the light wasn’t traveling instantaneously from Jupiter and Io to us here on Earth, but rather took some time. By figuring out exactly how the period between eclipses varied, we could then deduce what the speed of light must be.

Rømer’s answer was that light traveled at about 220,000 kilometers per second. That’s pretty good! The right answer is 299,792 km/sec, about 36% greater than Rømer’s value. For comparison purposes, when Edwin Hubble first calculated the Hubble constant, he derived a value of about 500 km/sec/Mpc, whereas now we know the right answer is about 70 km/sec/Mpc. Using astronomical observations to determine fundamental parameters of the universe isn’t easy, especially if you’re the first one to do it.
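The arithmetic behind Rømer’s reasoning can be checked with a minimal Python sketch. The inputs are assumptions, not Rømer’s own figures: modern values for the astronomical unit and for Io’s period (about 1.769 days), Earth’s mean orbital speed of about 29.8 km/s, and the roughly 22-minute delay across Earth’s orbit that is often attributed to him. Jupiter’s own motion is ignored.

```python
# Back-of-the-envelope sketch of the eclipse-timing argument.
# All constants are assumed round values, not Rømer's historical data.
AU_KM = 149_597_870.7          # 1 astronomical unit in km (modern value)
C_TRUE_KM_S = 299_792.458      # modern speed of light in km/s
IO_PERIOD_S = 1.769 * 86_400   # Io's orbital period (~1.769 days) in seconds
EARTH_SPEED_KM_S = 29.8        # Earth's mean orbital speed in km/s


def eclipse_period_shift(radial_speed_km_s: float, c_km_s: float = C_TRUE_KM_S) -> float:
    """Extra seconds between successive eclipses while Earth recedes from Jupiter.

    During one Io orbit Earth moves radial_speed * P farther away, so the light
    from each eclipse has that much extra distance to travel.
    """
    return IO_PERIOD_S * radial_speed_km_s / c_km_s


def light_speed_from_delay(delay_s: float) -> float:
    """Speed of light implied by a delay across the full diameter of Earth's orbit."""
    return 2 * AU_KM / delay_s


shift = eclipse_period_shift(EARTH_SPEED_KM_S)   # about 15 s per eclipse at most
c_from_delay = light_speed_from_delay(22 * 60)   # about 227,000 km/s
excess_pct = (C_TRUE_KM_S / 220_000 - 1) * 100   # about 36%

print(f"max period shift per eclipse: {shift:.1f} s")
print(f"c from a 22-minute delay: {c_from_delay:,.0f} km/s")
print(f"modern c exceeds 220,000 km/s by {excess_pct:.0f}%")
```

The first number shows why a single eclipse reveals almost nothing (a shift of roughly a quarter of a minute), while the small delays accumulate over many successive eclipses into minutes.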

Reason Three: Looking for life

Here in the present day, Jupiter’s moons have not lost their fascination or importance. As we’ve been able to study them in greater detail, we’ve learned a lot about the history and nature of the Solar System more generally. And one of the most exciting prospects is that one or more of these moons might harbor life.

It used to be common to think about the possibilities for life outside Earth in terms of a “habitable zone,” the region around a star where temperatures allowed planets to have liquid water. (Many scientists think that liquid water is a necessity for life to exist — but maybe we’re just being parochial about that.) In our Solar System, Earth is smack-dab in the middle of the habitable zone, and Mars just sneaks in. Both Venus and Jupiter are outside, on opposite ends.

But there’s more than one way to have liquid water. It turns out that both Europa and Ganymede, as well as Saturn’s moons Titan and Enceladus, are plausible homes for large liquid oceans. Europa, in particular, is thought to possess a considerable volume of liquid water underneath an icy crust: approximately two or three times as much water as in all the oceans on Earth. The point is that solar radiation isn’t the only way to heat up water and keep it at liquid temperatures. On Europa, it’s likely that heat is generated by the tidal pull from Jupiter, which stretches and distorts the moon’s crust as it orbits.

Does that mean there could be life there? Maybe! Nobody really knows. Smart money says that we’re more likely to find life in a wet environment like Europa than in a dry one like Mars. And we’re going to look: the Europa Clipper mission is scheduled for launch by 2025.

If you can’t wait until then, go back and watch the movie Europa Report. And while you do, give thanks to Galileo and his discovery of these fascinating celestial bodies.

November 20, 2018

John Preskill: A Roman in a Modern Court

Yesterday I spent some time wondering how to explain the modern economy to an ancient Roman brought forward from the first millennium BCE. For now I’ll assume language isn’t a barrier, but not much more. Here’s my rough take:

“There have been five really important discoveries since you left.

First, every living thing has a tiny blueprint inside it. We learned how to rewrite those, and now we can make crops that resist pests, grow healthy, and take minimal effort to cultivate. The same tool also let us make creatures that manufacture medicine, as well as animals different from anything that existed before. Food became cheap because of this.

Second, we learned that hot air and steam expand. This means you can burn oil or coal and use that to push air around, which in turn can push against solid objects. With this we’ve made vehicles that can go the span of the Empire from Rome to Londinium and back in hours rather than weeks. Similar mechanisms can be used to work farms, forge metal, and so on. Manufactured goods became cheap as a result.

Third, we discovered an invisible fluid that lives in metals. It flows unimaginably quickly and with minimal force through even very narrow channels, so by pushing on it in one city it may be made to move almost instantly in another. That lets you treat energy as a kind of commodity, generated in one place and delivered wherever it’s needed, rather than something produced specifically for each device.

Fourth, we found that this fluid can be pushed around by light, including a kind human eyes can’t see. This lets a device make light in one place and push on the fluid in a different device with no metal in between. Communication became fast, cheap, and easy.

Finally, and this one takes some explaining, our machines can make decisions. Imagine you had a channel for water with a fork. You can insert a blade to control which route the water takes. If you attach that blade to a lever you can change the direction of the flow. If you dip that lever in another channel of water, then what flows in one channel can set which way another channel goes. It turns out that that’s all you need to make simple decisions like “If water is in this channel, flow down that other one,” which can then be turned into useful statements like “Put water in this channel if you’re attacked. It’ll redirect the other channel and release boiling oil.” With enough of these water switches you can do really complicated things like tracking money, searching for patterns, predicting the weather, and so on. While water is hard to work with, you can make these channels and switches almost perfect for the invisible fluid, and you can make them tiny, vastly smaller than the width of a hair. A device that fits in your hand might have more switches than there are grains in a cubic meter of sand. The number of switches we’ve made so far outnumbers all the grains of sand on Earth, and we’re just getting started.”
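The water-switch construction is essentially how logic gates work. Here is a toy Python model of it, purely illustrative: the names switch, merge, and_gate, not_gate and half_adder are hypothetical, and treating two channels that join into one as an OR is an assumption of the sketch. A handful of switches is already enough to add two one-bit numbers, a first step toward “tracking money.”

```python
# Toy model of the "water switch": a control channel sets a blade that routes
# a second channel's flow left or right. True = water flowing, False = dry.
# All names here are hypothetical illustrations.


def switch(control: bool, inflow: bool) -> tuple[bool, bool]:
    """Return (left_outlet, right_outlet); water goes right iff control is wet."""
    if not inflow:
        return (False, False)
    return (False, True) if control else (True, False)


def merge(a: bool, b: bool) -> bool:
    """Two channels joining: the outlet is wet if either input is wet (OR)."""
    return a or b


def and_gate(a: bool, b: bool) -> bool:
    # b's water reaches the right outlet only if b flows AND a pushes the blade over.
    return switch(a, b)[1]


def not_gate(a: bool) -> bool:
    # An always-flowing supply leaves by the left outlet exactly when a is dry.
    return switch(a, True)[0]


def half_adder(a: bool, b: bool) -> tuple[bool, bool]:
    """Add two one-bit numbers; return (sum_bit, carry_bit)."""
    sum_bit = merge(and_gate(a, not_gate(b)), and_gate(not_gate(a), b))  # exclusive or
    carry_bit = and_gate(a, b)
    return sum_bit, carry_bit


for a in (False, True):
    for b in (False, True):
        s, c = half_adder(a, b)
        print(f"{int(a)} + {int(b)} = {int(c)}{int(s)} (binary)")
```

Chaining half adders (with one more OR for the carries) gives adders for numbers of any length, which is the sense in which “enough of these water switches” can do arithmetic.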

November 18, 2018

n-Category Café: Modal Types Revisited

We’ve discussed the prospects for adding modalities to type theory for many a year, e.g., here at the Café back at Modal Types, and frequently at the nLab. So now I’ve written up some thoughts on what philosophy might make of modal types in this preprint. My debt to the people who helped work out these ideas will be acknowledged when I publish the book.

This is to be the fourth chapter of a book which provides reasons for philosophy to embrace modal homotopy type theory. The book takes in order the components: types, dependency, homotopy, and finally modality.

The chapter ends all too briefly with mention of Mike Shulman et al.’s project, which he described in his post – What Is an n-Theory?. I’m convinced this is the way to go.

PS. I already know of the typo on line 8 of page 4.

November 17, 2018

Tim Gowers: Worrying news from Turkey

One should of course be concerned when anybody is detained for spurious reasons, but when that person is a noted mathematician, the shock is greater. Six academics have recently been detained in Turkey, of whom one, Betül Tanbay, is due to become vice president of the European Mathematical Society in January. I do not know of any petitions for their release, but if they are not released very quickly I hope that there will be a strong reaction. The EMS has issued the following statement.

The European Mathematical Society is outraged at the news that the Turkish police have detained, in Istanbul on the morning of 16th November 2018, Professor Betül Tanbay, a member of the EMS Executive Committee. We are incredulous at the subsequent press release from the Istanbul Security Directorate accusing her of links to organized crime and attempts to topple the Turkish government.

Professor Tanbay is a distinguished mathematician and a Vice President Elect of the European Mathematical Society, due to assume that role from January 2019. We have known her for many years as a talented scientist and teacher, a former President of the Turkish Mathematical Society, an open-minded citizen, and a true democrat. She may not hesitate to exercise her freedom of speech, a lawful right that any decent country guarantees its citizens, but it is preposterous to suggest that she could be involved in violent or criminal activities.

We demand that Professor Tanbay is immediately freed from detention, and we call on the whole European research community to raise its voice against this shameful mistreatment of our colleague, so frighteningly reminiscent of our continent’s darkest times.

Update. I have just seen this on Twitter:

Police freed 8 people, incl. professors Turgut Tarhanli and Betul Tanbay, while barring them from overseas travel, & is still questioning 6 others

November 16, 2018

Matt von Hippel: My Other Brain (And My Other Other Brain)

What does a theoretical physicist do all day? We sit and think.

Most of us can’t do all that thinking in our heads, though. Maybe Stephen Hawking could, but the rest of us need to visualize what we’re thinking. Our memories, too, are all too finite, prone to forget what we’re doing midway through a calculation.

So rather than just use our imagination and memory, we use another imagination, another memory: a piece of paper. Writing is the simplest “other brain” we have access to, but even by itself it’s a big improvement, adding weeks of memory and the ability to “see” long calculations at work.

But even augmented by writing, our brains are limited. We can only calculate so fast. What’s more, we get bored: doing the same thing mechanically over and over is not something our brains like to do.

Luckily, in the modern era we have access to other brains: computers.

As I write, the “other brain” sitting on my desk works out a long calculation. Using programs like Mathematica or Maple, or more serious programming languages, I can tell my “other brain” to do something and it will do it, quickly and without getting bored.

My “other brain” is limited too. It has only so much memory, only so much speed, it can only do so many calculations at once. While it’s thinking, though, I can find yet another brain to think at the same time. Sometimes that’s just my desktop, sitting back in my office in Denmark. Sometimes I have access to clusters, blobs of synchronized brains to do my bidding.

While I’m writing this, my “brains” are doing five different calculations (not counting any my “real brain” might be doing). I’m sitting and thinking, as a theoretical physicist should.

November 15, 2018

n-Category Café: Magnitude: A Bibliography

I’ve just done something I’ve been meaning to do for ages: compiled a bibliography of all the publications on magnitude that I know about. More people have written about it than I’d realized!

This isn’t an exercise in citation-gathering; I’ve only included a paper if magnitude is the central subject or a major theme.

I’ve included works on magnitude of ordinary, un-enriched, categories, in which context magnitude is usually called Euler characteristic. But I haven’t included works on the diversity measures that are closely related to magnitude.

Enjoy! And let me know in the comments if I’ve missed anything.

November 13, 2018

Doug Natelson: Blog stats weirdness

This blog is hosted on blogger, google's free blogging platform.  There are a couple of ways to get statistics about the blog, like rates of visits and where they're from.  One approach is to start from the nanoscale views blogger homepage and click "stats", which gives me an overview of hit rates, traffic sources, etc.  The other approach is to go to and look at the more official information compiled by google's tracking code. 

The blogger stats data has always looked weird relative to the analytics information, with "stats" showing far more hits per day - probably tracking every search engine robot that crawls the web, not just real hits.  This is a new one, though:  On "stats" for referring traffic, number one is google, and number three is Peter Woit's blog.  Those both make sense, but in second place there is a site that I didn't recognize, and it appears to be associated with hardcore pornography (!).  That site doesn't show up at all on the analytics page, where number one is google, number two is direct linking, and number three is again Woit's blog.  Weird.  Very likely that this is the result of a script trying to put porn spam in comments on thousands of blogs.  Update:  As I pointed out on social media to some friends, it's not that this blog is porn - it's just that someone somewhere thinks readers of this blog probably like porn.  :-)

November 12, 2018

n-Category Café: A Well Ordering Is A Consistent Choice Function

Well orderings have slightly perplexed me for a long time, so every now and then I have a go at seeing if I can understand them better. The insight I’m about to explain doesn’t resolve my perplexity, it’s pretty trivial, and I’m sure it’s well known to lots of people. But it does provide a fresh perspective on well orderings, and no one ever taught me it, so I thought I’d jot it down here.

In short: the axiom of choice allows you to choose one element from each nonempty subset of any given set. A well ordering on a set is a way of making such a choice in a consistent way.

Write P'(X) for the set of nonempty subsets of a set X. One formulation of the axiom of choice is that for any set X, there is a function h: P'(X) \to X such that h(A) \in A for all A \in P'(X).

But if we think of h as a piece of algebraic structure on the set X, it’s natural to ask that h behaves in a consistent way. For example, given two nonempty subsets A, B \subseteq X, how can we choose an element of A \cup B?

  • We could, quite simply, take h(A \cup B) \in A \cup B.

  • Alternatively, we could first take h(A) \in A and h(B) \in B, then use h to choose an element of \{h(A), h(B)\}. The result of this two-step process is h(\{h(A), h(B)\}).

A weak form of the “consistency” I’m talking about is that these two methods give the same outcome:

h(A \cup B) = h(\{h(A), h(B)\})

for all A, B \in P'(X). The strong form is similar, but with arbitrary unions instead of just binary ones:

h\Bigl( \bigcup \Omega \Bigr) = h\Bigl( \bigl\{ h(A) : A \in \Omega \bigr\} \Bigr)

for all \Omega \in P'P'(X).

Let’s say that a function h: P'(X) \to X satisfying the weak or strong consistency law is a weakly or strongly consistent choice function on X.
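On a finite set the weak consistency law can simply be checked by brute force. The Python sketch below (all helper names are hypothetical) tests h(A \cup B) = h(\{h(A), h(B)\}) over every pair of nonempty subsets of a four-element set. Taking the least element passes; a made-up rule that picks the largest element of two-element sets and the smallest element otherwise is still a choice function, but it fails the law.

```python
from itertools import combinations


def nonempty_subsets(xs):
    """All nonempty subsets of xs as frozensets, i.e. P'(X) for a finite X."""
    xs = list(xs)
    return [frozenset(c) for r in range(1, len(xs) + 1) for c in combinations(xs, r)]


def is_choice_function(h, xs) -> bool:
    """h(A) must be an element of A for every nonempty A."""
    return all(h(a) in a for a in nonempty_subsets(xs))


def is_weakly_consistent(h, xs) -> bool:
    """Check h(A | B) == h({h(A), h(B)}) for all nonempty A, B."""
    subsets = nonempty_subsets(xs)
    return all(
        h(a | b) == h(frozenset({h(a), h(b)}))
        for a in subsets
        for b in subsets
    )


def h_least(subset):
    # least element under the usual order on integers
    return min(subset)


def h_mixed(subset):
    # hypothetical rule: largest element of a pair, smallest element otherwise
    return max(subset) if len(subset) == 2 else min(subset)


X = range(4)
for name, h in [("least", h_least), ("mixed", h_mixed)]:
    print(name, is_choice_function(h, X), is_weakly_consistent(h, X))
```

For h_mixed a witness is A = {0}, B = {1, 2}: h_mixed picks 0 from A \cup B, but the two-step process picks h_mixed({0, 2}) = 2.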

The central point is this:

A consistent choice function on a set X is the same thing as a well ordering on X.

That’s true for consistent choice functions in both the weak and the strong sense — they turn out to be equivalent.

The proof is a pleasant little exercise. Given a well ordering \leq on X, define h: P'(X) \to X by taking h(A) to be the least element of A. It’s easy to see that this is a consistent choice function. In the other direction, given a consistent choice function h on X, define \leq by

x \leq y \Leftrightarrow h(\{x, y\}) = x.

You can convince yourself that \leq is a well ordering and that h(A) is the least element of A, for any nonempty A \subseteq X. The final task, also easy, is to show that the two constructions (of a consistent choice function from a well ordering and vice versa) are mutually inverse. And that’s that.

(For anyone following in enough detail to wonder about the difference between weak and strong: you only need to assume that h is a weakly consistent choice function in order to prove that the resulting relation \leq is a well ordering, but if you start with a well ordering \leq, it’s clear that the resulting function h is strongly consistent. So weak is equivalent to strong.)
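The reverse construction can be checked the same way on a finite set, where every total order is automatically a well ordering. This Python sketch (hypothetical names again) builds the relation x \leq y \Leftrightarrow h(\{x, y\}) = x from a choice function, then tests whether it is a total order under which h(A) is the least element of every nonempty A. The consistent example picks the least element under the ranking 2 < 0 < 3 < 1, an arbitrary choice made for illustration.

```python
from functools import cmp_to_key
from itertools import combinations


def nonempty_subsets(xs):
    """All nonempty subsets of xs as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(1, len(xs) + 1) for c in combinations(xs, r)]


def order_from_choice(h):
    """x <= y iff h({x, y}) == x, as in the construction above."""
    return lambda x, y: h(frozenset({x, y})) == x


def recovers_well_order(h, xs) -> bool:
    """True if <= is a total order on xs and h(A) is the least element of each A."""
    xs = list(xs)
    leq = order_from_choice(h)
    for x in xs:
        for y in xs:
            if not (leq(x, y) or leq(y, x)):           # totality (covers reflexivity)
                return False
            if x != y and leq(x, y) and leq(y, x):     # antisymmetry
                return False
            for z in xs:
                if leq(x, y) and leq(y, z) and not leq(x, z):  # transitivity
                    return False
    return all(all(leq(h(a), x) for x in a) for a in nonempty_subsets(xs))


RANK = {2: 0, 0: 1, 3: 2, 1: 3}   # arbitrary ranking chosen for illustration


def h_ranked(subset):
    return min(subset, key=RANK.__getitem__)


def h_mixed(subset):
    # inconsistent rule: largest element of a pair, smallest element otherwise
    return max(subset) if len(subset) == 2 else min(subset)


X = range(4)
leq = order_from_choice(h_ranked)
order = sorted(X, key=cmp_to_key(lambda x, y: 0 if x == y else (-1 if leq(x, y) else 1)))
print(recovers_well_order(h_ranked, X), order)
print(recovers_well_order(h_mixed, X))
```

The inconsistent h_mixed fails in an instructive way: its choices on pairs do define a total order (the reverse of the usual one), but on larger sets it picks the smallest integer, which is not the least element for that order.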

For me, the moral of the story is as follows. As everyone who’s done some set theory knows, if we assume the axiom of choice then every set can be well ordered. Understanding well orderings as consistent choice functions, this says the following:

If we’re willing to assume that it’s possible to choose an element of each nonempty subset of a set, then in fact it’s possible to make the choice in a consistent way.

People like to joke that the axiom of choice is obviously true, and that the well orderability of every set is obviously false. (Or they used to, at least.) The theorem on well ordering is derived from the axiom of choice by an entirely uncontroversial chain of reasoning, so I’ve always taken that joke to be the equivalent of throwing one’s hands up in despair: isn’t math weird! Look how this highly plausible statement implies an implausible one!

So the joke expresses a breakdown in many people’s intuitions. And with well orderings understood in the way I’ve described, we can specify the point at which the breakdown occurs: it’s in the gap between making a choice and making a consistent choice.

November 10, 2018

Tim Gowers: A quasirandomness implication

This is a bit of a niche post, since its target audience is people who are familiar with quasirandom graphs and like proofs of an analytic flavour. Very roughly, a quasirandom graph is one that behaves like a random graph of the same density. It turns out that there are many ways that one can interpret the phrase “behaves like a random graph” and, less obviously, that they are all in a certain sense equivalent. This realization dates back to seminal papers of Thomason, and of Chung, Graham and Wilson.

I was lecturing on the topic recently, and proving that certain of the quasirandomness properties all implied each other. In some cases, the proofs are quite a bit easier if you assume that the graph is regular, and in the past I have sometimes made my life easier by dealing just with that case. But that had the unfortunate consequence that when I lectured on Szemerédi’s regularity lemma, I couldn’t just say “Note that the condition on the regular pairs is just saying that they have quasirandomness property n” and have as a consequence all the other quasirandomness properties. So this year I was determined to deal with the general case, and determined to find clean proofs of all the implications. There is one that took me quite a bit of time, but I got there in the end. It is very likely to be out there in the literature somewhere, but I haven’t found it, so it seems suitable for a blog post. I can be sure of at least one interested reader, which is the future me when I find that I’ve forgotten the argument (except that actually I have now found quite a conceptual way of expressing it, so it’s just conceivable that it will stick around in the more accessible part of my long-term memory).

The implication in question, which I’ll state for bipartite graphs, concerns the following two properties. I’ll state them qualitatively first, and then give more precise versions. Let G be a bipartite graph of density \delta with (finite) vertex sets X and Y.

Property 1. If A\subset X and B\subset Y, then the number of edges between A and B is roughly \delta|A||B|.

Property 2. The number of 4-cycles (or more precisely ordered quadruples (x_1,y_1,x_2,y_2) such that x_iy_j is an edge for all four choices of i,j) is roughly \delta^4|X|^2|Y|^2.

A common way of expressing property 1 is to say that the density of the subgraph induced by A and B is approximately \delta as long as the sets A and B are not too small. However, the following formulation leads to considerably tidier proofs if one wants to use analytic arguments. I think of G as a function, so G(x,y)=1 if xy is an edge of the graph and 0 otherwise, and I identify A and B with their characteristic functions. Then the condition is that if A has density \alpha in X and B has density \beta in Y, then

|\mathbb E_{x,y}G(x,y)A(x)B(y)-\delta\alpha\beta|\leq c_1

This condition is interesting when c_1 is small. It might seem more natural to write c_1\alpha\beta on the right-hand side, but then one has to add some condition such as that \alpha,\beta\geq c_1 in order to obtain a condition that follows from the other conditions. If one simply leaves the right-hand side as c_1 (which may depend on \delta), then one obtains a condition that automatically gives a non-trivial statement when A and B are large and a trivial one when they are small.

As for Property 2, the most natural analytic way of expressing it is the inequality

\displaystyle \mathop{\mathbb E}_{x_1,x_2\in X}\mathop{\mathbb E}_{y_1,y_2\in Y}G(x_1,y_1)G(x_1,y_2)G(x_2,y_1)G(x_2,y_2)\leq\delta^4(1+c_2)

An easy Cauchy-Schwarz argument proves a lower bound of \delta^4, so this does indeed imply that the number of labelled 4-cycles is approximately \delta^4|X|^2|Y|^2.
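To see Property 2 in numbers, here is a short NumPy sketch (the function name is hypothetical). It uses the fact that summing over y_1,y_2 first turns the 4-cycle expectation into \mathbb E_{x_1,x_2} of the squared normalised codegree of x_1 and x_2, which is where the Cauchy-Schwarz lower bound \delta^4 comes from. It compares a random bipartite graph of density about 0.3 with a “two blocks” graph made of two disjoint complete bipartite graphs, which has density 1/2 but twice as many 4-cycles as \delta^4 predicts.

```python
import numpy as np


def c4_density(G: np.ndarray) -> float:
    """E over x1,x2 in X and y1,y2 in Y of G(x1,y1)G(x1,y2)G(x2,y1)G(x2,y2).

    Summing over y1, y2 first gives sum_{x1,x2} codeg(x1,x2)^2, where
    codeg = G @ G.T counts common neighbours in Y.
    """
    nx, ny = G.shape
    codeg = G @ G.T
    return float((codeg ** 2).sum()) / (nx ** 2 * ny ** 2)


n = 200
rng = np.random.default_rng(0)
random_G = (rng.random((n, n)) < 0.3).astype(np.int64)   # density about 0.3

block_G = np.zeros((n, n), dtype=np.int64)                 # two disjoint complete bipartite blocks
block_G[: n // 2, : n // 2] = 1
block_G[n // 2 :, n // 2 :] = 1

ratios = {}
for name, G in [("random", random_G), ("two blocks", block_G)]:
    delta = G.mean()
    ratios[name] = c4_density(G) / delta ** 4   # Cauchy-Schwarz: always >= 1
    print(f"{name}: delta = {delta:.3f}, 4-cycle density / delta^4 = {ratios[name]:.3f}")
```

The random graph’s ratio comes out a little above 1, mostly because the ordered count includes degenerate quadruples with x_1=x_2 or y_1=y_2; for fixed density their effect shrinks as the graph grows. The two-block graph gives exactly 2.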

The equivalence between the two statements is that if one holds for a sufficiently small c_i, then the other holds for a c_j that is as small as you want.

In fact, both implications are significantly easier in the regular case, but I found a satisfactory way of deducing the first from the second a few years ago and won’t repeat it here, as it requires a few lemmas and some calculation. But it is given as Theorem 5.3 in these notes, and the proof in question can be found by working backwards. (The particular point that is easier in the regular case is Corollary 5.2, because then the function g' mentioned there and in Lemma 5.1 is identically zero.)

What I want to concentrate on is deducing the second property from the first. Let me first give the proof in the regular case, which is very short and sweet. We do it by proving the contrapositive. So let’s assume that every vertex in X has degree \delta|Y|, that every vertex in Y has degree \delta|X|, and that

\displaystyle \mathop{\mathbb E}_{x_1,x_2\in X}\mathop{\mathbb E}_{y_1,y_2\in Y}G(x_1,y_1)G(x_1,y_2)G(x_2,y_1)G(x_2,y_2)>\delta^4(1+c_2)

For each fixed x_2,y_2, the expectation over x_1,y_1 on the left-hand side is zero if G(x_2,y_2)=0, and G(x_2,y_2)=1 for only a proportion \delta of the pairs (x_2,y_2). It follows that there exists some choice of x_2,y_2 such that

\displaystyle \mathop{\mathbb E}_{x_1\in X}\mathop{\mathbb E}_{y_1\in Y}G(x_1,y_1)G(x_1,y_2)G(x_2,y_1)>\delta^3(1+c_2)

We now set A=\{x:G(x,y_2)=1\} and B=\{y:G(x_2,y)=1\}. Then A and B both have density \delta (by the regularity assumption), and the above inequality tells us that

\mathbb E_{x_1,y_1}G(x_1,y_1)A(x_1)B(y_1)-\delta^3>c_2\delta^3,

so we obtain the desired conclusion with c_1=c_2\delta^3. Or to put it another way, if we assume Property 1 with constant c_1, then we obtain Property 2 with c_2=\delta^{-3}c_1 (which in practice means that c_1 should be small compared with \delta^3 in order to obtain a useful inequality from Property 2).
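The regular-case argument is constructive enough to run. This NumPy sketch (hypothetical names again) computes \mathbb E_{x_1,y_1}G(x_1,y_1)G(x_1,y_2)G(x_2,y_1) for every pair (x_2,y_2) at once as the matrix (G G^T G)/(|X||Y|), picks the best pair among the edges, sets A and B to the two neighbourhoods, and reports the Property 1 discrepancy \mathbb E_{x,y}G(x,y)A(x)B(y)-\delta\alpha\beta. It is tried on the two-block graph, which is regular with \delta=1/2 and has 4-cycle density 2\delta^4, so c_2=1 and the argument predicts a discrepancy of at least c_2\delta^3=1/8.

```python
import numpy as np


def neighbourhood_witness(G: np.ndarray):
    """Return (alpha, beta, discrepancy) for A = N(y2), B = N(x2) at the best edge x2y2."""
    nx, ny = G.shape
    # inner[x2, y2] = E_{x1,y1} G(x1,y1) G(x1,y2) G(x2,y1)
    inner = (G @ G.T @ G) / (nx * ny)
    # Only pairs with G(x2,y2) = 1 contribute to the 4-cycle average, so search among edges.
    masked = np.where(G == 1, inner, -np.inf)
    x2, y2 = np.unravel_index(np.argmax(masked), masked.shape)
    A = G[:, y2]          # indicator of the neighbourhood of y2 in X
    B = G[x2, :]          # indicator of the neighbourhood of x2 in Y
    delta, alpha, beta = G.mean(), A.mean(), B.mean()
    edges_AB = float(A @ G @ B) / (nx * ny)   # E_{x,y} G(x,y) A(x) B(y)
    return alpha, beta, edges_AB - delta * alpha * beta


n = 200
block_G = np.zeros((n, n), dtype=np.int64)   # regular: every degree is n/2
block_G[: n // 2, : n // 2] = 1
block_G[n // 2 :, n // 2 :] = 1

alpha, beta, disc = neighbourhood_witness(block_G)
print(f"alpha = {alpha}, beta = {beta}, discrepancy = {disc}")
```

On a non-regular graph the same code still runs, but alpha and beta can exceed \delta, which is exactly the difficulty that the non-regular case has to deal with.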

The difficulty when G is not regular is that A and B may have densities larger than \delta, and then the inequality

\mathbb E_{x_1,y_1}G(x_1,y_1)A(x_1)B(y_1)-\delta^3>c_2\delta^3,

no longer gives us what we need. The way I used to get round this was what I think of as the “disgusting” approach, which goes roughly as follows. Suppose that many vertices in X have degree substantially larger than \delta|Y|, and let A be the set of all such vertices. Then the number of edges between A and Y is too large, and we get the inequality we are looking for (with B=Y). We can say something similar about vertices in Y, so either we are done or G is at least approximately regular, and in particular almost all neighbourhoods are of density not much greater than \delta. Then one runs an approximate version of the simple averaging argument above, arguing in an ugly way that the contribution to the average from “bad” choices (x_2,y_2) is small enough that there must be at least one “good” choice.

To obtain a cleaner proof, I’ll begin with a non-essential step, but one that I think clarifies what is going on and shows that it’s not just some calculation that magically gives the desired answer. It is to interpret the quantity

\displaystyle \mathop{\mathbb E}_{x_1,x_2\in X}\mathop{\mathbb E}_{y_1,y_2\in Y}G(x_1,y_1)G(x_1,y_2)G(x_2,y_1)G(x_2,y_2)

as the probability that the four pairs x_iy_j are all edges of G if x_1,x_2 are chosen independently at random from X and y_1,y_2 are chosen independently at random from Y. Then our hypothesis is that

\mathbb P[\text{all } x_iy_j \text{ are edges}]-\delta^4>c_2\delta^4

Let us now split up the left-hand side as follows. It is the sum of

\mathbb P[\text{all } x_iy_j \text{ are edges}]-\delta\,\mathbb P[x_1y_2,x_2y_1,x_2y_2 \text{ are edges}]

\delta\,\mathbb P[x_1y_1,x_1y_2,x_2y_1 \text{ are edges}]-\delta^2\,\mathbb P[x_1y_2,x_2y_1 \text{ are edges}],

where I have used the fact that the pr